Q12 — AWS AIF-C01 Ch.2


A company is building a machine learning model. It has collected raw data and is analyzing it by creating correlation matrices, computing statistical summaries, and visualizing distributions. At which stage of the machine learning pipeline is the company currently operating?

Correct Answer: C. Exploratory data analysis (EDA)

Explanation

Exploratory data analysis (EDA) is the stage, immediately after data collection, in which analysts examine raw data to understand its structure, patterns, relationships, and anomalies before any modeling begins. Core EDA activities include computing descriptive statistics (e.g., mean, variance), generating correlation matrices to assess how features relate to one another, and visualizing distributions (e.g., histograms, box plots) to detect skewness or outliers. The company is performing exactly these tasks (correlation analysis, statistical summaries, and distribution visualization), so it is squarely in the EDA stage. The other stages do different work: data preprocessing cleans and transforms data, feature engineering derives new features from existing ones, and hyperparameter tuning adjusts model settings during training and evaluation, after the data work is done.
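The three EDA activities named in the question can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed workflow: the dataset, the column names (`age`, `income`), and the helper functions are all hypothetical, and the text histogram stands in for a real plot. In practice the same steps are often done with pandas (`DataFrame.describe()`, `DataFrame.corr()`) and a plotting library.

```python
import math
import statistics
from collections import Counter

# Hypothetical toy dataset, invented for illustration only.
# The last income value (150) is a deliberately planted outlier.
data = {
    "age": [23, 35, 41, 29, 52, 47, 38, 31],
    "income": [32, 54, 61, 45, 80, 72, 58, 150],  # in thousands
}


def summarize(values):
    """Statistical summary of one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations between all columns, as a nested dict."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def text_histogram(values, bin_width):
    """Crude text histogram: one row per non-empty bin, one '#' per value."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return [
        f"{start:>4}-{start + bin_width - 1:<4} {'#' * counts[start]}"
        for start in sorted(counts)
    ]


def iqr_outliers(values):
    """Values outside 1.5 * IQR of the quartiles (the usual box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


for name, column in data.items():
    print(name, summarize(column))
    print("\n".join(text_histogram(column, bin_width=20)))
    print("outliers:", iqr_outliers(column))

print(correlation_matrix(data))
```

Nothing in this sketch changes the data or trains a model; it only describes what is there. That is what separates EDA from preprocessing and feature engineering, which would act on findings such as the flagged outlier.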