Q29 - AWS SAA-C03, Chapter 3

Question 29 of 65

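Q159. A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?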

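Correct answer: D. Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.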

Explanation

To convert hundreds of .csv files per hour into Apache Parquet format and place the output in an S3 bucket with the least operational overhead, a solutions architect should create a single AWS Glue ETL job that performs the conversion, and use a small AWS Lambda function, invoked by each S3 PUT event, only to start that job. Therefore, option D is the correct answer.

Option A has a Lambda function perform the conversion itself on each S3 PUT event. With hundreds of 1 GB files arriving every hour, that means writing and maintaining custom conversion code and keeping every run within Lambda's memory and 15-minute execution limits, which raises operational overhead and cost.

Option B uses an Apache Spark job, which requires additional infrastructure to run and manage, more than this per-file conversion needs.

Option C combines AWS Glue with Amazon Athena, which adds unnecessary complexity for this scenario.

AWS Glue is a serverless ETL service, so the job in option D can convert the .csv files to Parquet and write the output to S3 without any servers or clusters to provision or manage. The Lambda function contains no conversion logic; it only hands the uploaded object to the Glue job. This approach reduces operational overhead and provides a scalable, cost-effective solution.
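As a rough illustration of the trigger half of option D, the sketch below shows a Lambda handler that starts a Glue job run for each uploaded .csv object. It is a minimal sketch under assumptions, not a reference implementation: the job name `csv-to-parquet` and the job arguments `--source_bucket` and `--source_key` are hypothetical and would have to match whatever the actual Glue job script reads. The optional `glue_client` parameter is added here only so the handler can be exercised without AWS credentials.

```python
import urllib.parse

# Hypothetical Glue job name; the real job would be created separately.
GLUE_JOB_NAME = "csv-to-parquet"


def build_job_arguments(event):
    """Return one Glue argument dict per .csv object in an S3 event."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # ignore anything that is not a .csv upload
        # Argument names are assumptions; the Glue script must read the same ones.
        runs.append({"--source_bucket": bucket, "--source_key": key})
    return runs


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run per uploaded .csv file and return the run IDs."""
    if glue_client is None:
        import boto3  # imported lazily so the module loads without boto3

        glue_client = boto3.client("glue")

    run_ids = []
    for arguments in build_job_arguments(event):
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            Arguments=arguments,
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

The handler deliberately does no file processing: it only forwards the bucket and key, so the 1 GB files never pass through Lambda and the conversion work stays inside the Glue job.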