Q47 — AWS SAA-C03 Ch.4

Q242. A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket. New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run. What should the solutions architect do to prevent AWS Glue from reprocessing old data?

Correct Answer: A. Edit the job to use job bookmarks.

Explanation

To prevent AWS Glue from reprocessing old data, the solutions architect should edit the job to use job bookmarks, so option A is correct. Without bookmarks, AWS Glue processes all data in the input location every time the job runs, which adds unnecessary processing time and cost. With job bookmarks enabled, AWS Glue persists state information from each run. For Amazon S3 sources, it uses the last-modified time of the objects to decide which ones still need processing, so each run picks up only the data added since the previous successful run.

Option B suggests editing the job to delete data after the data is processed. This would keep old data out of later runs, but it destroys the source data, which is a poor fit if the data must be preserved for other purposes or if there are regulatory requirements for data retention.

Option C suggests editing the job by setting the NumberOfWorkers field to 1. This does not work: NumberOfWorkers controls how much compute capacity the job is given, not which data it reads. The job would still reprocess everything in the bucket, only with fewer resources.

Option D suggests using a FindMatches machine learning (ML) transform. FindMatches identifies matching or duplicate records within a dataset. It does not control which input data the job reads, so it does not prevent AWS Glue from reprocessing old data.

Job bookmarks are therefore the only option that addresses the problem directly, and they do so without deleting source data or changing the job's capacity.
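
The bookmark idea described above can be sketched as a small, simplified model in Python. This is not AWS Glue's implementation, only an illustration of the principle: store a marker after each run and select only objects modified after it. The names S3Object, select_new_objects, and day are hypothetical helpers invented for this sketch. The one real AWS setting shown is the job parameter --job-bookmark-option with the value job-bookmark-enable, which is how bookmarks are turned on for a job.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Real AWS Glue job parameter that enables bookmarks for a job.
# Everything below this line is a simplified model, not Glue code.
BOOKMARK_ARGS = {"--job-bookmark-option": "job-bookmark-enable"}


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for an object listed from the input bucket."""
    key: str
    last_modified: datetime


def select_new_objects(
    objects: List[S3Object], bookmark: Optional[datetime]
) -> Tuple[List[S3Object], Optional[datetime]]:
    """Return the objects modified after the bookmark, plus the updated bookmark."""
    new = [o for o in objects if bookmark is None or o.last_modified > bookmark]
    if new:
        # Advance the bookmark to the newest object handled in this run.
        bookmark = max(o.last_modified for o in new)
    return new, bookmark


def day(n: int) -> datetime:
    """Hypothetical helper: a UTC timestamp for day n of January 2024."""
    return datetime(2024, 1, n, tzinfo=timezone.utc)


# First daily run: no bookmark yet, so everything is processed.
bucket = [S3Object("data/day1.xml", day(1)), S3Object("data/day2.xml", day(2))]
first_run, bookmark = select_new_objects(bucket, None)

# A new file arrives; the second run sees only that file.
bucket.append(S3Object("data/day3.xml", day(3)))
second_run, bookmark = select_new_objects(bucket, bookmark)

print([o.key for o in first_run])   # ['data/day1.xml', 'data/day2.xml']
print([o.key for o in second_run])  # ['data/day3.xml']
```

In a real Glue script, enabling the parameter is not enough on its own: the script also needs to call job.init() at the start and job.commit() at the end, and give each source a transformation_ctx, so that Glue can record and look up the bookmark state.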