Q76 — AWS AIF-C01 Ch.2

Question 76 of 100 | ← Chapter 2

Which type of ML involves training a model to maximize cumulative reward based on feedback received from an environment?

A. Supervised learning
B. Unsupervised learning
C. Semi-supervised learning
D. Reinforcement learning ✓

Correct Answer: D. Reinforcement learning

Explanation

Analysis: A. Supervised learning trains models using labeled data (input-output pairs) to predict or classify new data—it does not involve environmental feedback or cumulative rewards. B. Unsupervised learning discovers inherent patterns or structures in unlabeled data—no environmental feedback or rewards are involved. C. Semi-supervised learning leverages both limited labeled and abundant unlabeled data—but still does not involve environmental feedback or reward maximization. D. Reinforcement learning trains an agent to learn optimal actions by interacting with an environment and receiving feedback (rewards or penalties), aiming to maximize long-term cumulative reward. In reinforcement learning, the agent observes a state, takes an action, transitions to a new state, and receives a reward. Its goal is to learn a policy that selects optimal actions across states to maximize expected future cumulative reward. Therefore, reinforcement learning is precisely the ML paradigm that trains a model (agent) to maximize cumulative reward derived from environmental feedback.