Q52 — AWS DEA-C01 Ch.1
A financial company wants to use Amazon Athena to run on-demand SQL queries on a petabyte-scale dataset to support a business intelligence (BI) application. An AWS Glue job that runs during non-business hours updates the dataset once every day. The BI application has a standard data refresh frequency of 1 hour to comply with company policies. A data engineer wants to cost optimize the company's use of Amazon Athena without adding any additional infrastructure costs. Which solution will meet these requirements with the LEAST operational overhead?
- A. Configure an Amazon S3 Lifecycle policy to move data to the S3 Glacier Deep Archive storage class after 1 day.
- B. Use the query result reuse feature of Amazon Athena for the SQL queries. ✓
- C. Add an Amazon ElastiCache cluster between the BI application and Athena.
- D. Change the format of the files that are in the dataset to Apache Parquet.
Correct Answer: B. Use the query result reuse feature of Amazon Athena for the SQL queries.
Explanation
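The goal is to lower Athena costs while adding no infrastructure and as little operational work as possible. Two facts in the scenario matter: the AWS Glue job updates the dataset only once a day, and the BI application refreshes its data every hour. Between daily updates, the hourly queries therefore scan the same data and return the same results.
- A. Moving the data to S3 Glacier Deep Archive does not meet the requirements. Archived objects are not immediately accessible, so the data would no longer be available for on-demand queries.
- B. Athena's query result reuse feature is the best fit. When the same query is repeated within a configured maximum age, Athena returns the stored result instead of scanning the dataset again, which avoids the cost of rescanning. A maximum age of 60 minutes matches the 1-hour refresh policy. The feature is a per-query setting, so it adds no infrastructure cost and almost no operational overhead.
- C. Adding an Amazon ElastiCache cluster might improve query performance, but it introduces new infrastructure that must be paid for and managed, which violates the requirement of no additional infrastructure costs.
- D. Converting the files to Apache Parquet can improve query efficiency and reduce the amount of data scanned, but it requires reprocessing a petabyte-scale dataset and changing the existing pipeline. That is more operational overhead than option B.
Option B is therefore the correct answer: it reduces cost without new infrastructure and with the least operational overhead.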
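As a rough illustration of option B, the sketch below shows how query result reuse could be requested through boto3's Athena `start_query_execution` call using its `ResultReuseConfiguration` parameter. The helper names (`build_reuse_request`, `run_with_reuse`), the database `bi_db`, and the workgroup `primary` are hypothetical and not part of the question. It also assumes the workgroup already has a query result location configured and runs an Athena engine version that supports result reuse.

```python
from typing import Any, Dict


def build_reuse_request(sql: str, database: str, workgroup: str,
                        max_age_minutes: int = 60) -> Dict[str, Any]:
    """Build keyword arguments for Athena StartQueryExecution with result reuse on.

    Hypothetical helper: the 60-minute default mirrors the 1-hour BI refresh
    policy described in the question.
    """
    if max_age_minutes < 0:
        raise ValueError("max_age_minutes must be non-negative")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup,
        # Ask Athena to return a stored result if an identical query
        # finished within the last `max_age_minutes` minutes.
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


def run_with_reuse(sql: str, database: str, workgroup: str,
                   max_age_minutes: int = 60) -> str:
    """Submit the query and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_reuse_request(sql, database, workgroup, max_age_minutes)
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Assumed names for illustration only.
    request = build_reuse_request("SELECT 1", "bi_db", "primary")
    print(request["ResultReuseConfiguration"])
```

Keeping the request construction separate from the API call means the reuse settings can be inspected or unit tested without AWS access. In the scenario, the BI application's hourly queries would be submitted this way, so repeats inside the reuse window are served from stored results.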