Introduction to Business Analytics

Undergraduate course on data analysis and machine learning for business decision-making. Covers R programming fundamentals, data wrangling with the tidyverse, exploratory data analysis, clustering, classification and regression trees, and an introduction to AI tools, LLMs and prompt engineering for analytics work.

Instructor: Eduard F. Martínez-González

Institution: Universidad ICESI — Department of Economics

Course code: 06278-ECO

Program: Undergraduate

Term: 2026-01

Original title (in Spanish): Introducción al Business Analytics.

All course materials — theory decks, in-class tasks, guided practices, and datasets — are published on this site as each week is released. The slides are built with Quarto and embed runnable R through webR, so students can execute every example directly in the browser, with no local installation required.

Course description

The course introduces students to the workflow of an analyst: how to turn raw data into evidence that supports business decisions. Students learn to program in R, manipulate and visualize data with the tidyverse, run exploratory data analysis, and apply foundational machine learning methods (clustering and decision trees) to real business datasets. The last unit covers the practical use of AI tools — large language models, agents, and prompt design — for analytics work.

Learning outcomes

By the end of the course, students will be able to:

  • Program in R from the ground up — objects, data types, vectors, data frames, functions, and package management — and organize a reproducible, script-based workflow.
  • Manipulate and summarize data with dplyr (select, filter, arrange, mutate, group_by/summarise) and build clear, layered visualizations with ggplot2.
  • Diagnose and clean messy real-world data — missing values, duplicates, inconsistent categories — and carry an exploratory data analysis through from raw file to insight.
  • Explain the core ideas of machine learning — supervised vs. unsupervised, the train/test split, overfitting, baselines, cross-validation, and evaluation metrics — and avoid common pitfalls such as data leakage.
  • Apply foundational ML methods to business problems: k-means clustering for segmentation, and classification and regression trees for prediction, reading their results through confusion matrices and error metrics (MAE, RMSE).
  • Use AI tools (LLMs, skills, agents) critically and write effective prompts for analytics work.

Schedule

Unit 2 — Foundations of R and Programming

Week 3 — Foundations of R and Programming (Fundamentos de R y Programación). The RStudio interface (the four panels and a reproducible workflow); R as a calculator (arithmetic, operator precedence, comparison and logical operators); data types (numeric, character, logical) and special values (NA, NULL, Inf, NaN); objects and assignment, naming conventions (snake_case), and inspecting an object’s class, type and structure; managing the Environment; functions and the help system; installing and loading packages (including pacman); vectors (creation, indexing, filtering) and data frames (creation, inspection, row/column indexing, filtering). Theory slides · In-class task
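A minimal base-R sketch of the Week 3 topics: special values, vector indexing and filtering, and data frame inspection. All objects and values here (x, prices, sales_demo) are invented for illustration and do not come from the course materials.

```r
# Special values: a missing value (NA) propagates unless it is removed
x <- c(10, NA, 30)
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 20
is.nan(0 / 0)           # TRUE: 0/0 is "not a number"
1 / 0                   # Inf
length(NULL)            # 0: NULL is the empty object

# Vectors: creation, indexing by position, filtering by condition
prices <- c(2500, 4000, 1800, 5200)
class(prices)             # "numeric"
prices[2]                 # 4000
prices[prices > 2000]     # 2500 4000 5200

# Data frames: creation, inspection, row/column indexing, filtering
sales_demo <- data.frame(
  product = c("coffee", "tea", "juice"),
  units   = c(12, 5, 8),
  stringsAsFactors = FALSE
)
str(sales_demo)                              # structure: 3 obs. of 2 variables
sales_demo[1, ]                              # first row
sales_demo$units                             # one column as a vector
sales_demo[sales_demo$units > 6, "product"]  # "coffee" "juice"
```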

Unit 3 — Data Wrangling and Visualization

Week 4 — Data Transformation with dplyr (Transformación de Datos con dplyr). The dplyr grammar of data manipulation: select(), rename(), filter() (comparison operators and multiple conditions), arrange()/desc(), mutate() (single and multiple columns, conditional labels), and summarise() with group_by() for grouped descriptives; the pipe-based workflow, working directory, and common errors. Theory slides · Task R script · Dataset (cafeteria.csv) · Dataset (ventas.csv)
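The verbs listed for Week 4 can be chained into a single pipeline, roughly as sketched below. The orders data frame and its columns are hypothetical stand-ins; the actual columns of cafeteria.csv and ventas.csv are not described on this page. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical order data (not the course dataset)
orders <- data.frame(
  product  = c("latte", "espresso", "latte", "tea", "espresso", "tea"),
  category = c("coffee", "coffee", "coffee", "tea", "coffee", "tea"),
  units    = c(3, 2, 1, 4, 5, 2),
  price    = c(4.5, 3.0, 4.5, 2.5, 3.0, 2.5),
  stringsAsFactors = FALSE
)

summary_by_category <- orders %>%
  select(product, category, units, price) %>%      # keep the columns we need
  filter(units >= 2) %>%                           # drop very small orders
  mutate(
    revenue = units * price,                       # new numeric column
    size    = if_else(units >= 4, "large", "small")  # conditional label
  ) %>%
  group_by(category) %>%                           # one group per category
  summarise(
    total_revenue = sum(revenue),
    n_orders      = n()
  ) %>%
  arrange(desc(total_revenue))                     # highest revenue first

summary_by_category
# coffee: total_revenue 34.5, n_orders 3
# tea:    total_revenue 15.0, n_orders 2
```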

Week 5 — Data Visualization with ggplot2 (Visualización de Datos con ggplot2). The grammar of graphics (data, aesthetics, geometries) and building a plot layer by layer; aesthetic mappings inside vs. outside aes(); choosing geometries by the question being asked; customization, facets for multi-group comparison, and communication best practices. Theory slides
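The difference between mapping an aesthetic inside aes() and setting it outside can be sketched as follows. The store_sales data frame is invented for illustration, and the sketch assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data (not a course dataset)
store_sales <- data.frame(
  units   = c(5, 8, 12, 4, 9, 15),
  revenue = c(50, 85, 130, 38, 92, 160),
  store   = c("north", "north", "north", "south", "south", "south")
)

# Inside aes(): color is MAPPED to a variable, so each store gets its own
# color and ggplot2 adds a legend
p_mapped <- ggplot(store_sales, aes(x = units, y = revenue, color = store)) +
  geom_point(size = 3)

# Outside aes(): color is SET to a constant, so every point is the same color;
# facets split the stores into separate panels instead
p_fixed <- ggplot(store_sales, aes(x = units, y = revenue)) +
  geom_point(color = "steelblue", size = 3) +
  facet_wrap(~ store) +
  labs(title = "Revenue by units sold", x = "Units", y = "Revenue")
```

Printing p_mapped or p_fixed in an R session draws the corresponding plot.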

Unit 4 — Data Sources, Quality and EDA

Week 6 — Data Sources, Data Quality and Exploratory Analysis (Fuentes, Calidad de Datos y Análisis Exploratorio). Data sources; diagnosing data quality with skimr on a deliberately “dirty” dataset — missing values, exact duplicates, and inconsistent categories detected with unique(), table() and duplicated(); cleaning each problem step by step; and running an exploratory data analysis on the cleaned data. Theory slides · Dataset (ventas_raw.csv)
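A small base-R sketch of the three diagnostics named above (missing values, exact duplicates, inconsistent categories) and one possible cleaning step for each. The raw data frame is a made-up miniature, not ventas_raw.csv, and the cleaning rules shown are only one reasonable choice; skimr is left out so the sketch runs with base R alone.

```r
# Hypothetical "dirty" data (not the course dataset)
raw <- data.frame(
  id     = c(1, 2, 2, 3, 4),
  city   = c("Cali", "cali", "cali", "Bogota ", NA),
  amount = c(100, 250, 250, NA, 80),
  stringsAsFactors = FALSE
)

# Diagnose
colSums(is.na(raw))               # missing values per column: id 0, city 1, amount 1
sum(duplicated(raw))              # 1 exact duplicate row
unique(raw$city)                  # "Cali" "cali" "Bogota " NA
table(raw$city, useNA = "ifany")  # counts per spelling, including NA

# Clean, one problem at a time
clean <- raw[!duplicated(raw), ]             # 1. drop exact duplicates
clean$city <- trimws(tolower(clean$city))    # 2. standardize category spelling
complete <- clean[complete.cases(clean), ]   # 3. keep only fully observed rows

unique(clean$city)   # "cali" "bogota" NA
nrow(complete)       # 2
```

Dropping incomplete rows is the simplest option; whether to drop, impute, or flag missing values depends on the analysis.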

Unit 6 — Machine Learning Foundations

Week 9 — Machine Learning Foundations (Fundamentos de Machine Learning). What machine learning is and the problems it solves; supervised vs. unsupervised learning and classification vs. regression; the vocabulary of target, features and observations; the train/test split and why we separate data; overfitting and underfitting, baselines, cross-validation and data leakage; the modeling pipeline and evaluation metrics, closing with a guided application that compares models. Theory slides
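The train/test split and the comparison against a baseline can be sketched in base R as below. The data are simulated, the variable names (ad_spend, sales) are hypothetical, and a linear regression via lm() stands in for whatever models the guided application actually compares.

```r
set.seed(123)  # make the random split reproducible

# Simulated data: one feature (ad_spend) and a numeric target (sales)
n <- 200
ad_spend <- runif(n, 0, 10)
sales    <- 5 + 3 * ad_spend + rnorm(n, sd = 1)
data_all <- data.frame(ad_spend, sales)

# 80/20 train/test split
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- data_all[train_idx, ]
test  <- data_all[-train_idx, ]

# Baseline: always predict the mean of the TRAINING target
# (using the test data here would be a form of data leakage)
baseline_pred <- rep(mean(train$sales), nrow(test))

# Model: fitted on the training set only, evaluated on the unseen test set
fit        <- lm(sales ~ ad_spend, data = train)
model_pred <- predict(fit, newdata = test)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_baseline <- rmse(test$sales, baseline_pred)
rmse_model    <- rmse(test$sales, model_pred)

c(baseline = rmse_baseline, model = rmse_model)
```

A model is only worth keeping if its test error is clearly lower than the baseline's.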

Unit 7 — Clustering

Week 10 — Clustering: Foundations and Metrics (Clustering: Fundamentos y Métricas). Why segment; the intuition behind k-means (similarity as distance, the four-step algorithm); choosing k with the elbow and silhouette methods and the interpretability criterion; profiling and naming clusters; high-level alternatives to k-means; and the full segmentation pipeline. The guided practice segments songs using Spotify audio features. Theory slides · Practice · Practice R script
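A rough sketch of k-means segmentation with the elbow method, using the kmeans() function from base R's stats package. The songs data are simulated, and the two columns (energy, tempo) are assumed for illustration; the variables used in the guided practice may differ. The silhouette method is not shown.

```r
set.seed(42)

# Simulated songs in three loose groups (not the course data)
songs <- data.frame(
  energy = c(rnorm(30, 0.2, 0.05), rnorm(30, 0.5, 0.05), rnorm(30, 0.8, 0.05)),
  tempo  = c(rnorm(30, 80, 5),     rnorm(30, 120, 5),    rnorm(30, 160, 5))
)

# Scale first: k-means relies on distances, so features measured in large
# units (tempo) would otherwise dominate features in small units (energy)
songs_scaled <- scale(songs)

# Elbow method: total within-cluster sum of squares for k = 1..6
wss <- sapply(1:6, function(k) {
  kmeans(songs_scaled, centers = k, nstart = 25)$tot.withinss
})
round(wss, 1)  # look for the k after which the drop flattens out

# Fit the chosen k and profile each cluster by its average features
fit <- kmeans(songs_scaled, centers = 3, nstart = 25)
profile <- aggregate(songs, by = list(cluster = fit$cluster), FUN = mean)
profile
```

Plotting wss against k makes the elbow visible; the cluster profile is what supports naming each segment.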

Unit 8 — Classification Trees

Week 11 — Classification Trees (Árboles de Clasificación). The decision tree as a series of questions (nodes, leaves, depth) and how it learns; the effect of tree depth; reading a classifier with the confusion matrix — the 2×2 table and the metrics derived from it (accuracy, precision, recall); and turning the result into a business decision. The guided practice follows an 11-step pipeline on a credit-classification dataset. Theory slides · Practice · Dataset (credito_clasificacion.csv) · Data prep script
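The 2×2 confusion matrix and its three metrics can be worked through by hand in base R. The labels below ("default" as the positive class, "pay" as the negative class) and the ten predictions are invented for illustration and are not taken from credito_clasificacion.csv.

```r
lvls <- c("default", "pay")

# Ten hypothetical customers: what happened vs. what the classifier predicted
actual <- factor(
  c("default", "default", "default", "default",
    "pay", "pay", "pay", "pay", "pay", "pay"),
  levels = lvls
)
predicted <- factor(
  c("default", "default", "default", "pay",
    "default", "pay", "pay", "pay", "pay", "pay"),
  levels = lvls
)

# Rows are predictions, columns are actual outcomes
cm <- table(Predicted = predicted, Actual = actual)
cm

tp <- cm["default", "default"]  # predicted default, did default: 3
fp <- cm["default", "pay"]      # predicted default, actually paid: 1
fn <- cm["pay", "default"]      # predicted pay, actually defaulted: 1
tn <- cm["pay", "pay"]          # predicted pay, did pay: 5

accuracy  <- (tp + tn) / sum(cm)  # 0.80: share of all predictions that are right
precision <- tp / (tp + fp)       # 0.75: of predicted defaults, share that defaulted
recall    <- tp / (tp + fn)       # 0.75: of actual defaults, share that were caught
```

Which metric matters most is a business question: missing a default (low recall) and rejecting a good customer (low precision) have different costs.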

Unit 9 — Regression Trees

Week 12 — Regression Trees (Árboles de Regresión). From classification to regression — what changes when the target is a number; how a regression tree predicts through homogeneous groups and their averages; the role of depth; regression metrics (MAE and RMSE, and when to prefer each); and interpreting the metrics for the business. The guided practice predicts student grades on a regression dataset. Theory slides · Practice · Dataset (notas_regresion.csv) · Data prep script
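MAE and RMSE differ in how they treat large errors, which a four-observation example makes concrete. The grades and predictions below are made up and are not from notas_regresion.csv.

```r
# Hypothetical actual grades and model predictions
actual    <- c(3.0, 4.0, 2.5, 4.5)
predicted <- c(3.5, 3.5, 2.5, 2.5)

errors <- actual - predicted   # -0.5  0.5  0.0  2.0

# MAE: the average size of the error, every error weighted equally
mae <- mean(abs(errors))       # (0.5 + 0.5 + 0 + 2) / 4 = 0.75

# RMSE: errors are squared before averaging, so large misses weigh more
rmse <- sqrt(mean(errors^2))   # sqrt((0.25 + 0.25 + 0 + 4) / 4) = sqrt(1.125), about 1.06

c(MAE = mae, RMSE = rmse)
```

The single 2-point miss pulls RMSE well above MAE. RMSE is therefore the stricter choice when large errors are especially costly, while MAE is easier to read as a typical error in grade points.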

Unit 10 — AI, LLMs and Prompting

Week 15 — Special handouts on AI tools for analytics. Three self-contained handouts on using AI in analytics work.

Methodology

Each unit pairs a theory deck — concepts plus runnable R examples that execute in the browser via webR — with a hands-on component: an in-class task in the foundational units and a guided practice on a real business dataset in the machine-learning units. The clustering and tree practices follow an explicit step-by-step pipeline (load → explore → split → train → evaluate → compare against a baseline → interpret for the business), so students reproduce the full analyst workflow from end to end.

Datasets and tools

The course is built around a small set of business datasets: ventas.csv and the deliberately messy ventas_raw.csv for wrangling, cleaning and EDA, plus cafeteria.csv, credito_clasificacion.csv and notas_regresion.csv for the tasks and ML practices. The core R toolkit is dplyr and ggplot2 (tidyverse), skimr for data-quality diagnostics, and pacman for package management, complemented by interactive explainers (a k-means playground and interactive decision-tree explorers).


Additional weeks and materials are added as the semester progresses.