Q71 — AWS AIF-C01 Ch.2
A data scientist is building an ML pipeline to train a text classification model. The data scientist has already collected the data for training. Which component of the ML lifecycle must be completed first?
- A. Model training
- B. Feature engineering ✓
- C. Model validation
- D. Model monitoring
Correct Answer: B. Feature engineering
Explanation
Analysis:
- A. Model training comes after feature engineering; it fits the ML model on the processed features.
- B. Feature engineering transforms raw data into feature vectors suitable for ML algorithms. For text classification, meaningful features such as bag-of-words or TF-IDF must be extracted from the raw text before it can serve as model input, so this is the first step once the data has been collected.
- C. Model validation comes after model training and evaluates performance metrics on held-out test data.
- D. Model monitoring takes place after deployment; it continuously tracks model performance and detects problems such as concept drift.
Conclusion: with the training data already collected, a text classification pipeline starts with feature engineering, then proceeds to model training, validation, and monitoring. The correct choice is B. Feature engineering.
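To make the feature engineering step concrete, here is a minimal sketch of turning raw text into TF-IDF feature vectors using only the Python standard library. The function names (`tokenize`, `tfidf_vectors`), the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions chosen for illustration; they are not part of the exam question, and a real pipeline would more likely rely on a library or managed service.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors): one TF-IDF weight per vocabulary term per document."""
    tokenized = [tokenize(doc) for doc in docs]
    # Fixed, sorted vocabulary so every vector has the same column order.
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency (assumed variant): rarer terms weigh more.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero on empty documents
        # Term frequency (count / document length) scaled by IDF.
        vectors.append([counts[term] / total * idf[term] for term in vocab])
    return vocab, vectors


# Hypothetical raw training texts; labels would be attached separately.
docs = [
    "refund my order",
    "great product great price",
    "where is my order",
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is a fixed-length numeric representation of one document, which is the form a training algorithm can consume. Terms that appear in fewer documents (here `refund`) receive a higher weight than terms shared across documents (here `order`), which is the intuition behind TF-IDF.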