Q75 — AWS DEA-C01 Ch.1
A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data. The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data. Which solution will meet this requirement with the LEAST coding effort?
- A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
- B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data. ✓
- C. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
- D. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.
Correct Answer: B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
Explanation
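Job bookmarks (option B) are the lowest-effort way to process only the daily incremental data. The bookmark state is updated after each run to track previously processed data, so subsequent runs of the ETL jobs handle only the new data. Option A, creating an ETL job that reads the S3 file status and logs it in Amazon DynamoDB, requires additional development and configuration work. Option C, enabling job metrics, only tracks processed objects in Amazon CloudWatch; it does not by itself limit a run to incremental data. Option D, configuring the ETL jobs to delete processed objects from Amazon S3 after each run, risks data loss and is not the most effective way to process only incremental data. Therefore, the answer is B.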
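
To make the bookmark idea concrete, here is a minimal toy model in plain Python. It is not Glue's implementation: `S3Object`, `ToyBookmark`, and `run_job` are hypothetical names invented for illustration, and the model assumes new data is identified purely by each object's last-modified time. In a real Glue job no such code is written; bookmarks are enabled with the job parameter `--job-bookmark-option` set to `job-bookmark-enable`, the DynamicFrame read is given a `transformation_ctx`, and the script calls `job.init()` at the start and `job.commit()` at the end.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class S3Object:
    """Hypothetical stand-in for one entry of an S3 listing."""
    key: str
    last_modified: datetime


@dataclass
class ToyBookmark:
    """Hypothetical stand-in for the state a bookmark persists between runs."""
    high_water_mark: datetime = datetime.min.replace(tzinfo=timezone.utc)

    def new_objects(self, listing):
        # Keep only objects modified after the last committed run.
        return [o for o in listing if o.last_modified > self.high_water_mark]

    def commit(self, processed):
        # Advance the state only once the run has finished,
        # loosely analogous to calling job.commit() in a Glue script.
        if processed:
            self.high_water_mark = max(o.last_modified for o in processed)


def run_job(bookmark, listing):
    """Process only the objects the bookmark has not seen yet."""
    batch = bookmark.new_objects(listing)
    # ... validate, transform, and load `batch` here ...
    bookmark.commit(batch)
    return [o.key for o in batch]
```

Running `run_job` twice against the same listing returns an empty batch the second time, which is the behavior the question asks for: each daily run sees only what arrived since the previous one.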