Q81 — AWS AIF-C01 Ch.1

A company built a generative AI solution using large language models (LLMs) to translate training manuals from English into other languages. The company wants to assess the solution’s accuracy by reviewing the generated translated text. Which model evaluation strategy satisfies this requirement?

Correct Answer: A. Bilingual Evaluation Understudy (BLEU)

Explanation

Evaluating translation accuracy requires a metric that compares candidate translations against human reference translations. BLEU is a widely adopted machine translation metric: it computes clipped (modified) n-gram precision against one or more reference translations and applies a brevity penalty to output that is shorter than the reference. The other metrics fit different tasks: root mean squared error (RMSE) measures error in numeric predictions for regression, ROUGE is recall-oriented and designed primarily for summarization, and F1 score combines precision and recall for classification. None of these is the standard choice for assessing translation quality. Therefore, A is correct.
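To make the n-gram precision idea concrete, here is a minimal sketch of a sentence-level BLEU score in plain Python. It is a simplification for illustration, not a reference implementation: it assumes whitespace-tokenized input, applies no smoothing, and the function names (ngrams, modified_precision, bleu) are hypothetical. Real evaluations are usually run at corpus level with an established tool such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the single reference that contains it most."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (assumption)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()
# Unigram precision 5/6, bigram precision 3/5, no brevity penalty.
print(round(bleu(candidate, [reference], max_n=2), 4))  # 0.7071
```

The clipping step is what keeps a degenerate output such as "the the the the" from scoring well, and the brevity penalty keeps a very short but precise translation from scoring well. Both are reasons BLEU suits translation better than a raw precision count.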