Q79 — AWS AIF-C01 Ch.1
Question 79 of 100
A company built a generative AI solution using large language models (LLMs) to translate training manuals from English into other languages. The company wants to assess the solution’s accuracy by reviewing the generated translated text. Which model evaluation strategy satisfies this requirement?
- A. Bilingual Evaluation Understudy (BLEU) ✓
- B. Root Mean Square Error (RMSE)
- C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
- D. F1 score
Correct Answer: A. Bilingual Evaluation Understudy (BLEU)
Explanation
To assess translation accuracy, the metric must measure how closely the machine-translated output matches human-written reference translations. BLEU is a standard metric for machine translation quality: it scores the n-gram overlap between the candidate and reference texts, using clipped n-gram precision combined with a penalty for output that is too short. RMSE applies to regression tasks with numeric predictions, ROUGE is used primarily for summarization, and F1 score is suited to classification, so none of them fits translation evaluation. Therefore, A is correct.
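To make the n-gram overlap idea concrete, here is a minimal sketch of a sentence-level BLEU score in Python. It assumes whitespace tokenization, a single reference translation, and no smoothing. The function names `ngrams` and `bleu` are illustrative and do not come from any library. Real evaluations would more likely use an established implementation such as sacreBLEU or NLTK, which handle tokenization, multiple references, and corpus-level scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Illustrative sentence-level BLEU: one reference, no smoothing."""
    cand = candidate.split()  # assumption: simple whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count at its count in the reference,
        # so repeating a matching word cannot inflate the score.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on the mat", reference))  # identical text
    print(bleu("the cat sat on a mat", reference))    # one word differs
```

A score near 1 means the candidate shares most of its n-grams with the reference, and a score near 0 means little overlap. Because the score depends on exact word matches, a valid translation that is phrased differently from the reference can still score low.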