Q68 — AWS AIF-C01 Ch.1
A company built a solution using generative AI. The solution uses a large language model (LLM) to translate training manuals from English into other languages. The company wants to assess the solution’s accuracy by reviewing the generated text. Which model evaluation strategy satisfies these requirements?
- A. Bilingual Evaluation Understudy (BLEU) ✓
- B. Root Mean Square Error (RMSE)
- C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
- D. F1 Score
Correct Answer: A. Bilingual Evaluation Understudy (BLEU)
Explanation
This question tests knowledge of model evaluation metrics. BLEU scores machine translation output by measuring n-gram overlap between the generated text and one or more human reference translations, so it fits the task of assessing translations from English into other languages. RMSE measures error on continuous numerical predictions, as in regression. ROUGE is recall-oriented and is used mainly to evaluate summarization. F1 Score combines precision and recall and is typically applied to classification tasks. Therefore, option A, Bilingual Evaluation Understudy (BLEU), is the most suitable choice for evaluating translation accuracy.
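To make the idea behind BLEU concrete, here is a minimal sketch of a sentence-level score. It assumes a single reference translation, whitespace tokenization, n-grams up to length 2, and no smoothing. The function names `ngrams` and `bleu` and the Spanish example sentences are illustrative only and are not part of the exam material. Production evaluations normally use an established library and corpus-level scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: one reference, no smoothing (assumption)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: a model translation scored against a human reference.
reference = "el gato duerme sobre la cama"
candidate = "el gato duerme en la cama"
score = bleu(candidate, reference)
print(round(score, 3))  # 0.707
```

The candidate shares 5 of 6 unigrams and 3 of 5 bigrams with the reference, so the score is the geometric mean of 5/6 and 3/5, about 0.707. An exact match scores 1.0, and a translation with no shared words scores 0.0.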