Q9 — AWS DEA-C01 Ch.1
A data engineer needs Amazon Athena queries to finish faster. The data engineer notices that all the files the Athena queries use are currently stored in uncompressed .csv format. The data engineer also notices that users perform most queries by selecting a specific column. Which solution will MOST speed up the Athena query performance?
- A. Change the data format from .csv to JSON format. Apply Snappy compression.
- B. Compress the .csv files by using Snappy compression.
- C. Change the data format from .csv to Apache Parquet. Apply Snappy compression. ✓
- D. Compress the .csv files by using gzip compression.
Correct Answer: C. Change the data format from .csv to Apache Parquet. Apply Snappy compression.
Explanation
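Answer C is correct. In this scenario, converting the data from .csv to Apache Parquet and applying Snappy compression significantly improves Athena query performance. Parquet is an efficient columnar storage format, which suits queries that select specific columns: Athena reads only the columns a query references, so it scans less data. Snappy offers a reasonable compression ratio with fast decompression, which suits query-time processing. Option A's JSON format is row-oriented and generally less suited to query optimization than Parquet. Option B only compresses the .csv files, which helps far less than changing the format. Option D's gzip compression decompresses relatively slowly, and the files remain row-oriented .csv. Option C is therefore the best choice.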
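To make the columnar argument concrete, here is a minimal sketch that uses only the Python standard library. It does not write real Parquet or apply Snappy. It only compares how many bytes a single-column query has to read when the same table is stored row by row (as CSV) versus column by column. The table contents, the column names, and the helper functions `row_layout_bytes` and `column_layout_bytes` are all hypothetical, chosen for illustration.

```python
import csv
import io

# Hypothetical sample table: 1,000 rows with 4 columns.
header = ("id", "name", "region", "amount")
rows = [(i, f"user{i}", i % 50, round(i * 1.5, 2)) for i in range(1000)]


def row_layout_bytes(header, rows):
    """Serialize the table as CSV and return its total size in bytes.

    In a row-oriented file the values of one column are interleaved with
    all the others, so a query on one column still scans the whole file.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def column_layout_bytes(header, rows):
    """Serialize each column as its own chunk and return bytes per column.

    This is a simplified stand-in for a columnar format: each column's
    values are stored contiguously and can be read independently.
    """
    sizes = {}
    for idx, name in enumerate(header):
        chunk = "\n".join(str(row[idx]) for row in rows)
        sizes[name] = len(chunk.encode("utf-8"))
    return sizes


csv_scan = row_layout_bytes(header, rows)   # bytes read from the CSV layout
col_sizes = column_layout_bytes(header, rows)
amount_scan = col_sizes["amount"]           # bytes read for SELECT amount

print(f"row layout, query on 'amount':      {csv_scan} bytes scanned")
print(f"columnar layout, query on 'amount': {amount_scan} bytes scanned")
```

The columnar layout reads only the `amount` chunk, while the row layout reads every byte of the file. Real Parquet adds per-column encoding and compression on top of this, so the gap in practice is typically larger than this sketch suggests.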