Q66 — AWS DEA-C01 Ch.1
Question 66 of 100
A company stores 10 to 15 TB of uncompressed .csv files in Amazon S3. The company is evaluating Amazon Athena as a one-time query engine. The company wants to transform the data to optimize query runtime and storage costs. Which file format and compression solution will meet these requirements for Athena queries?
- A. .csv format compressed with zip
- B. JSON format compressed with bzip2
- C. Apache Parquet format compressed with Snappy ✓
- D. Apache Avro format compressed with LZO
Correct Answer: C. Apache Parquet format compressed with Snappy
Explanation
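Apache Parquet is a columnar storage format designed for large-scale analytics. Athena bills by the amount of data scanned, and a columnar layout lets it read only the columns a query references, which cuts both runtime and cost. Snappy is a fast compression codec that shrinks the files further while staying cheap to decompress. By contrast, .csv is a row-oriented text format, so it is not an optimal storage format and a zip archive does little for query performance. JSON is also row-oriented text and is less efficient than Parquet for storing and querying complex data structures. Avro is widely used for data storage, but it is a row-based format, so with Athena it falls behind Parquet with Snappy in both query performance and compression efficiency. Option C therefore best meets the requirements to optimize query runtime and reduce storage costs.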
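
As a rough illustration of how the conversion in option C might be carried out, the sketch below builds an Athena CTAS (CREATE TABLE AS SELECT) statement that rewrites an existing CSV-backed table as Snappy-compressed Parquet. The question does not prescribe a conversion method, so this is only one possible approach. The helper `build_ctas`, the table names, and the bucket path are hypothetical and chosen for illustration. The `format` and `write_compression` table properties are assumed to be available in the Athena engine version in use.

```python
def build_ctas(source_table: str, target_table: str, s3_location: str) -> str:
    """Return an Athena CTAS statement that rewrites source_table
    as Snappy-compressed Parquet files under s3_location."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"             # columnar output format
        "    write_compression = 'SNAPPY',\n"   # compression codec for the new files
        f"    external_location = '{s3_location}'\n"
        ")\n"
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Hypothetical table names and bucket, for illustration only.
    print(build_ctas(
        "raw_csv_events",
        "events_parquet",
        "s3://example-bucket/events-parquet/",
    ))
```

The generated statement could then be run in the Athena console or submitted through an SDK. Later queries against the new table would scan only the Parquet columns they reference.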