Q66 — AWS DEA-C01 Ch.1

A company stores 10 to 15 TB of uncompressed .csv files in Amazon S3. The company is evaluating Amazon Athena as a one-time query engine. The company wants to transform the data to optimize query runtime and storage costs. Which file format and compression solution will meet these requirements for Athena queries?

Correct Answer: C. Apache Parquet format compressed with Snappy

Explanation

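Apache Parquet is a columnar storage format well suited to large-scale analytics. Because Athena bills by the amount of data scanned, a columnar layout lets a query read only the columns it references, which cuts both runtime and cost. Snappy is a fast compression codec that shrinks the files further with little decompression overhead. By contrast, .csv is a row-based text format: it is not an efficient storage layout, and compressing it does not stop Athena from reading every column. JSON is likewise less efficient than Parquet to store and query, including for complex data structures. Avro is also widely used for data storage, but it is row-based, so with Athena it trails Parquet with Snappy in both query performance and compression efficiency. Option C therefore best meets the goal of optimizing query runtime and reducing storage costs.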
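
One way to perform the transformation is an Athena CREATE TABLE AS SELECT (CTAS) statement that rewrites the CSV-backed table as Snappy-compressed Parquet. The sketch below only builds the SQL text. The table names and the S3 location are hypothetical placeholders, not part of the question, and the helper `build_ctas` is an illustration rather than an AWS API.

```python
def build_ctas(source_table: str, target_table: str, target_location: str) -> str:
    """Return an Athena CTAS statement that rewrites a table as Snappy-compressed Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"            # columnar output format
        "    write_compression = 'SNAPPY',\n"  # compression codec for the new files
        f"    external_location = '{target_location}'\n"
        ") AS\n"
        f"SELECT * FROM {source_table}"
    )


# Hypothetical table names and bucket, for illustration only.
sql = build_ctas(
    source_table="raw_csv_events",
    target_table="events_parquet",
    target_location="s3://example-bucket/events-parquet/",
)
print(sql)
```

The resulting statement could be run from the Athena console or submitted through an SDK. Later queries against the new table would then scan only the Parquet columns they reference.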