Q11 — AWS DEA-C01 Ch.1
Question 11 of 100
A company stores daily records of the financial performance of investment portfolios in .csv format in an Amazon S3 bucket. A data engineer uses AWS Glue crawlers to crawl the S3 data. The data engineer must make the S3 data accessible daily in the AWS Glue Data Catalog. Which solution will meet these requirements?
- A. Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Configure the output destination to a new path in the existing S3 bucket.
- B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Specify a database name for the output. ✓
- C. Create an IAM role that includes the AmazonS3FullAccess policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Specify a database name for the output.
- D. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Allocate data processing units (DPUs) to run the crawler every day. Configure the output destination to a new path in the existing S3 bucket.
Correct Answer: B. Create an IAM role that includes the AWSGlueServiceRole policy. Associate the role with the crawler. Specify the S3 bucket path of the source data as the crawler's data store. Create a daily schedule to run the crawler. Specify a database name for the output.
Explanation
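An AWS Glue crawler must be associated with an IAM role that has the appropriate permissions. The AWSGlueServiceRole managed policy grants the permissions the crawler needs to perform Glue operations and update the Data Catalog, along with the S3 access that Glue itself relies on. In practice, read access to the specific source bucket is usually added to the same role through a narrowly scoped S3 policy. The AmazonS3FullAccess policy is overly broad and unnecessary, and on its own it grants none of the Glue permissions the crawler requires, which rules out options A and C.

Running the crawler every day is done by creating a schedule, not by manually allocating DPUs. Crawler capacity is managed by AWS automatically, so options C and D describe a setting that is not how a recurring crawl is configured. Specifying a database name ensures that the metadata the crawler produces is stored in the Glue Data Catalog. A crawler writes table definitions to a catalog database; it does not export data to an S3 path, which rules out options A and D.

The AWS documentation describes the same three pieces: the AWSGlueServiceRole policy for the service permissions, a schedule for recurring runs, and a Data Catalog database as the crawler's output. Option B is the only choice that combines all three.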
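As a rough illustration of option B, the sketch below assembles the arguments for the Glue `CreateCrawler` call through boto3: an IAM role, an S3 target, a daily cron schedule, and a Data Catalog database name. The crawler name, role ARN, account ID, bucket path, database name, and the 06:00 UTC run time are all hypothetical placeholders, and the helper functions are illustrative rather than part of any AWS library.

```python
def build_daily_crawler_request(name, role_arn, database, s3_path, hour_utc=6):
    """Return keyword arguments for the Glue CreateCrawler API call.

    All argument values are supplied by the caller; nothing here is
    specific to the company in the question.
    """
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": name,
        # Role that carries the AWSGlueServiceRole policy (plus read access
        # to the source bucket).
        "Role": role_arn,
        # The crawler's output is a Data Catalog database, not an S3 path.
        "DatabaseName": database,
        # The S3 path of the source data is the crawler's data store.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue cron format: minutes hours day-of-month month day-of-week year.
        "Schedule": f"cron(0 {hour_utc} * * ? *)",
    }


def create_daily_crawler(request):
    """Send the request to AWS Glue (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)


# Hypothetical example values, for illustration only.
request = build_daily_crawler_request(
    name="portfolio-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="portfolio_db",
    s3_path="s3://example-portfolio-bucket/daily/",
)
print(request["Schedule"])
```

Note that the request contains no DPU setting and no output S3 path: the schedule drives the daily run, and `DatabaseName` determines where the resulting table metadata lands.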