Q35 — AWS DEA-C01 Ch.1
Question 35 of 100
A company needs to partition the Amazon S3 storage that the company uses for a data lake. The partitioning will use a path of the S3 object keys in the following format: s3://bucket/prefix/year=2023/month=01/day=01. A data engineer must ensure that the AWS Glue Data Catalog synchronizes with the S3 storage when the company adds new partitions to the bucket. Which solution will meet these requirements with the LEAST latency?
- A. Schedule an AWS Glue crawler to run every morning.
- B. Manually run the AWS Glue CreatePartition API twice each day.
- C. Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call. ✓
- D. Run the MSCK REPAIR TABLE command from the AWS Glue console.
Correct Answer: C. Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call.
Explanation
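To minimize latency and keep the AWS Glue Data Catalog synchronized with Amazon S3, the ideal approach is to register each partition as soon as its data is written to S3. Option C has the code that writes to S3 also call the Boto3 AWS Glue create_partition API, so the catalog is updated immediately and latency is lowest. The other options cannot synchronize immediately: a crawler scheduled every morning (A) and a manual API call twice each day (B) leave new partitions unregistered until the next run, and the MSCK REPAIR TABLE command (D) must be run on demand after the data has landed. Option C is therefore the solution that meets the requirements with the least latency.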
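The approach in option C can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names `partition_location` and `register_partition`, and the database and table names `datalake_db` and `events`, are hypothetical. It assumes the table already exists in the Data Catalog with partition keys in the order year, month, day, and that the writer calls the helper right after its S3 upload succeeds.

```python
import copy


def partition_location(table_location, year, month, day):
    """Build the Hive-style S3 prefix for one daily partition."""
    return f"{table_location.rstrip('/')}/year={year}/month={month}/day={day}/"


def register_partition(glue_client, database, table, year, month, day):
    """Add one partition to the Data Catalog right after its data is written."""
    # Reuse the table's storage descriptor (columns, formats, SerDe) so the
    # new partition matches the table definition; only the location changes.
    table_def = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    descriptor = copy.deepcopy(table_def["StorageDescriptor"])
    descriptor["Location"] = partition_location(
        descriptor["Location"], year, month, day
    )

    # Values must follow the order of the table's partition keys
    # (assumed here to be year, month, day).
    glue_client.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput={
            "Values": [year, month, day],
            "StorageDescriptor": descriptor,
        },
    )
    return descriptor["Location"]


def main():
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    glue = boto3.client("glue")
    # Hypothetical database and table names, called after the S3 write completes.
    register_partition(glue, "datalake_db", "events", "2023", "01", "01")
```

Because the catalog update happens in the same code path as the write, there is no wait for a schedule or a manual step. A production writer would also need to handle the case where the partition is already registered, since create_partition raises AlreadyExistsException on a duplicate.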