Q64 — AWS DEA-C01 Ch.1

A data engineer needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB. Which solution will meet these requirements MOST cost-effectively?

Correct Answer: C. Write an AWS Glue PySpark job. Use Apache Spark to transform the data.

Explanation

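AWS Glue is purpose-built for ETL workloads and is especially well suited to processing data stored in Amazon S3. For S3 objects smaller than 100 MB, an AWS Glue PySpark job can extract, transform, and load the data efficiently and economically. By comparison, running a custom Python application on an Amazon EKS cluster, running a PySpark script on an Amazon EMR cluster, or processing the data with pandas in an AWS Glue Python shell job is less cost-effective than an AWS Glue PySpark job in this scenario. Option C is therefore the answer that best meets the requirements at the lowest cost.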