Question

A company is developing an ML model to generate natural-language responses for a customer service chatbot. It needs to evaluate how similar the model’s generated responses are to subject-matter expert (SME) responses. The company has a dataset of SME-validated question-answer pairs. Which metric should the company use to evaluate model performance?

A. BERTScore

B. Mean Squared Error (MSE)

C. Perplexity

D. F1 Score

Q82 — AWS AIF-C01 Ch.3

Correct Answer: A. BERTScore
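
To make the choice concrete, here is a minimal sketch of the matching idea behind BERTScore: each token in the generated response is compared with each token in the SME response by cosine similarity of their embeddings, the best match per token is kept, and the results are averaged into precision, recall, and F1. The embedding table, the example sentences, and the function names below are hypothetical and exist only for illustration. Real BERTScore takes contextual embeddings from a pretrained BERT-style model, so treat this as a toy illustration rather than a real implementation.

```python
import math

# Hypothetical toy embeddings (an assumption for illustration only).
# Real BERTScore uses contextual embeddings from a pretrained model.
TOY_EMBEDDINGS = {
    "reset": (1.0, 0.0, 0.0),
    "restore": (0.9, 0.1, 0.0),
    "your": (0.0, 1.0, 0.0),
    "password": (0.0, 0.0, 1.0),
    "passcode": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_score(candidate, reference):
    """Greedy-matching precision, recall, and F1 over token embeddings."""
    cand = [TOY_EMBEDDINGS[token] for token in candidate]
    ref = [TOY_EMBEDDINGS[token] for token in reference]
    # Precision: each candidate token takes its best match in the reference.
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token takes its best match in the candidate.
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


sme_answer = ["reset", "your", "password"]        # SME-validated reference
model_answer = ["restore", "your", "passcode"]    # generated response

precision, recall, f1 = greedy_match_score(model_answer, sme_answer)

# Exact token overlap, for contrast: only "your" appears in both answers.
exact_overlap = len(set(model_answer) & set(sme_answer)) / len(model_answer)

print(f"embedding-based F1: {f1:.3f}")
print(f"exact token overlap: {exact_overlap:.3f}")
```

In this toy setup the paraphrased answer scores close to 1 on the embedding-based measure, while exact token overlap credits only one of the three tokens. That gap is why an embedding-based similarity metric suits comparing generated responses with SME responses.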

Explanation