Q89 — AWS DEA-C01 Ch.1
A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job. The data engineer has set the maximum concurrency for the AWS Glue job to 1. The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs. What is the likely reason the AWS Glue job is reprocessing the files?
- A. The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly.
- B. The maximum concurrency for the AWS Glue job is set to 1.
- C. The data engineer incorrectly specified an older version of AWS Glue for the Glue job.
- D. The AWS Glue job does not have a required commit statement. ✓
Correct Answer: D. The AWS Glue job does not have a required commit statement.
Explanation
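In AWS Glue, job bookmarks ensure that a job processes only new or changed data. The bookmark state is persisted only when the script calls job.commit() at the end of a run, after job.init() at the start. If the commit statement is missing, the job can still write its output to Amazon Redshift, but it never records its progress, so subsequent runs reprocess S3 files that were already loaded. Option D is therefore correct: the AWS Glue job lacks the required commit statement, which is the likely cause of the repeated processing.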
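The mechanism can be illustrated with a small toy model in plain Python. This is a sketch only and does not use the AWS Glue API: ToyBookmarkJob, run_job, and the example-bucket paths are hypothetical names invented for illustration. In a real Glue script the corresponding calls are job.init(...) at the start and job.commit() at the end.

```python
class ToyBookmarkJob:
    """Toy stand-in for a bookmarked job. NOT the AWS Glue API."""

    def __init__(self, persisted_state):
        # persisted_state: set of file keys recorded by earlier committed runs
        self._persisted = persisted_state
        self._seen_this_run = set()

    def read_new_files(self, available_files):
        # Return only files that no committed run has recorded yet
        new_files = [f for f in available_files if f not in self._persisted]
        self._seen_this_run.update(new_files)
        return new_files

    def commit(self):
        # Persist progress; plays the role that job.commit() plays in Glue
        self._persisted.update(self._seen_this_run)


def run_job(state, files, call_commit):
    """Run one toy job over `files`, optionally committing the bookmark."""
    job = ToyBookmarkJob(state)
    processed = job.read_new_files(files)
    if call_commit:
        job.commit()
    return processed


# Hypothetical input files
files = ["s3://example-bucket/a.csv", "s3://example-bucket/b.csv"]

# Without commit: the second run sees the same files again
state_without_commit = set()
first = run_job(state_without_commit, files, call_commit=False)
second = run_job(state_without_commit, files, call_commit=False)

# With commit: the second run finds nothing new
state_with_commit = set()
run_job(state_with_commit, files, call_commit=True)
third = run_job(state_with_commit, files, call_commit=True)

print(second)
print(third)
```

In the run without a commit, the second pass returns both files again, which matches the symptom described in the question. Once the commit is called, the second pass returns an empty list.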