Q38 — AWS DEA-C01 Ch.1

A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data. Which solution will meet these requirements with the LEAST operational overhead?

Correct Answer: B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.

Explanation

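Answer B is the best choice. S3 Select is designed to query only the required columns directly from objects in an S3 bucket (such as data in Apache Parquet format) without loading the entire dataset, so it carries the least operational overhead. Option A, which uses an AWS Lambda function to load the data into a pandas DataFrame and then query it, is comparatively complex and can add significant overhead. Option C, which uses an AWS Glue DataBrew project, requires more preparation work. Option D, which first runs an AWS Glue crawler and then queries the data in Athena, involves more steps and more overhead. Overall, option B meets the requirement with the least operational overhead.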
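
As a rough illustration of the S3 Select approach in answer B, the sketch below uses the boto3 `select_object_content` call to pull a single column from one Parquet object. The helper names `build_select_request` and `read_column`, and the bucket, key, and column names in the usage note, are hypothetical and chosen only for illustration. The sketch assumes valid AWS credentials and a column name that comes from a trusted source.

```python
def build_select_request(bucket, key, column):
    """Build the keyword arguments for an S3 Select call on a Parquet object.

    Hypothetical helper for illustration; `column` is assumed to be trusted input.
    """
    # Double quotes keep the column name case-sensitive in the S3 Select SQL dialect.
    expression = f'SELECT s."{column}" FROM S3Object s'
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Input is Parquet; results come back as CSV text.
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"CSV": {}},
    }


def read_column(bucket, key, column):
    """Run the query and return the selected column as CSV text."""
    # Imported lazily so the request builder can be used without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_request(bucket, key, column))

    chunks = []
    # The response payload is an event stream; only Records events carry data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

A call such as `read_column("example-bucket", "data/part-0000.parquet", "customer_id")` would return the values of that one column as newline-separated CSV text. Nothing needs to be provisioned beforehand (no Lambda function, DataBrew project, or Glue crawler), which is why this option has the lowest operational overhead for a one-time task.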