Q22 — AWS AIF-C01 Ch.2
Question 22 of 100
A social media company wants to use a large language model (LLM) for content moderation. The company wants to evaluate whether the LLM's output contains bias, including potential discrimination against specific demographic groups or individuals. Which data source should the company use to evaluate the LLM's output with the least management overhead?
- A. User-generated content
- B. Moderation logs
- C. Content moderation guidelines
- D. Benchmark dataset ✓
Correct Answer: D. Benchmark dataset
Explanation
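This question tests your knowledge of data sources for evaluating LLM outputs. To check whether an LLM's output is biased or discriminates against specific demographic groups, the best fit is a benchmark dataset: a prebuilt, standardized collection of test samples, often tagged by demographic attribute, that covers a wide range of scenarios (public bias benchmarks such as CrowS-Pairs and BBQ are examples). Because the data is already curated and labeled, the company can run systematic, repeatable evaluations without collecting or annotating data itself, which keeps management overhead low.
The other options need more work or do not fit the task. User-generated content is noisy, unstructured, and unlabeled, so it would have to be collected, cleaned, and annotated before it could measure bias. Moderation logs record past decisions, are not standardized, and may themselves reflect bias. Content moderation guidelines are policy documents, not a source of evaluation data. Therefore, the benchmark dataset is the correct choice.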
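To make the idea concrete, here is a minimal Python sketch of how a labeled benchmark supports a bias check. It assumes each benchmark row carries the text, a demographic group tag, and an expected moderation label. The row format, the sample rows, and the model function `biased_stub_model` are hypothetical placeholders, not a real benchmark, dataset, or AWS API. The sketch computes, per group, the share of benign rows the model wrongly flags, and then the largest gap between groups.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical row format: (text, demographic_group, expected_label).
# Placeholder rows only; a real benchmark dataset is curated and much larger.
Row = Tuple[str, str, str]

SAMPLE_BENCHMARK: List[Row] = [
    ("People from group A love cooking.", "A", "allow"),
    ("People from group B love cooking.", "B", "allow"),
    ("My neighbor from group A plays chess.", "A", "allow"),
    ("My neighbor from group B plays chess.", "B", "allow"),
]


def biased_stub_model(text: str) -> str:
    """Hypothetical stand-in for an LLM moderation call.

    It deliberately flags any text that mentions group B, so the
    evaluation below has a disparity to detect.
    """
    return "flag" if "group B" in text else "allow"


def false_flag_rate_by_group(
    rows: List[Row], moderate: Callable[[str], str]
) -> Dict[str, float]:
    """Share of benign rows (expected 'allow') that the model flags, per group."""
    benign: Dict[str, int] = defaultdict(int)
    wrongly_flagged: Dict[str, int] = defaultdict(int)
    for text, group, expected in rows:
        if expected != "allow":
            continue  # only benign content can be a false positive
        benign[group] += 1
        if moderate(text) == "flag":
            wrongly_flagged[group] += 1
    return {g: wrongly_flagged[g] / benign[g] for g in benign}


def max_gap(rates: Dict[str, float]) -> float:
    """Largest difference in false-flag rate between any two groups."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    rates = false_flag_rate_by_group(SAMPLE_BENCHMARK, biased_stub_model)
    print(rates)           # {'A': 0.0, 'B': 1.0}
    print(max_gap(rates))  # 1.0
```

Because the benchmark already supplies the group tags and expected labels, the evaluation reduces to running the model and counting. With user-generated content or moderation logs, someone would first have to collect, clean, and label that data, which is exactly the management overhead the question asks you to minimize.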