Q21 — AWS DEA-C01 Ch.1
A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3. Which solution will meet this requirement MOST cost-effectively?
- A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
- B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
- C. Use Amazon Athena Federated Query to join the data from all data sources. ✓
- D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.
Correct Answer: C. Use Amazon Athena Federated Query to join the data from all data sources.
Explanation
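For a one-time analysis that joins data from multiple sources (Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3), option C is the most cost-effective. Option A is expensive: a provisioned Amazon EMR cluster has to be set up and paid for while it runs, which is hard to justify for a single job. Option B adds complexity and likely cost, because the data must first be copied out of DynamoDB, RDS, and Redshift into S3. Option D is not a good fit for this one-time task: Redshift Spectrum is designed for querying data in Amazon S3 from Redshift, not for reading DynamoDB or RDS tables directly. Amazon Athena Federated Query, by contrast, connects to each data source through a data source connector and queries the data in place, so for a one-time analysis it is both cheaper and more convenient. The answer is therefore C.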
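As an illustration of what option C looks like in practice, here is a minimal Python sketch that builds a single Athena SQL statement joining tables from four catalogs and, optionally, submits it with boto3. It assumes a data source connector has already been registered in Athena for each non-S3 source. All catalog, database, table, and column names below (`ddb_catalog`, `sales.customers`, `customer_id`, and so on) are hypothetical placeholders, not values from the question.

```python
def build_federated_join_sql(dynamo_cat, rds_cat, redshift_cat, s3_cat="awsdatacatalog"):
    """Return one SQL statement that joins tables from four Athena catalogs.

    Every database, table, and column name here is a made-up placeholder.
    Tables are addressed as "catalog"."database"."table".
    """
    return f"""
SELECT o.order_id, c.customer_name, r.region_name, e.event_type
FROM "{dynamo_cat}"."default"."orders" AS o
JOIN "{rds_cat}"."sales"."customers" AS c
  ON o.customer_id = c.customer_id
JOIN "{redshift_cat}"."public"."regions" AS r
  ON c.region_id = r.region_id
JOIN "{s3_cat}"."analytics"."events" AS e
  ON o.order_id = e.order_id
""".strip()


def run_query(sql, output_location):
    """Submit the query to Athena and return its execution ID.

    Requires boto3 and AWS credentials; output_location is an S3 URI
    (hypothetical example: "s3://my-athena-results/") where Athena
    writes the query results.
    """
    import boto3  # imported lazily so the SQL builder works without boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Only prints the SQL; nothing is sent to AWS here.
    print(build_federated_join_sql("ddb_catalog", "rds_catalog", "redshift_catalog"))
```

Because the query runs on demand, there is no cluster to provision and no copy step to maintain, which is what makes this approach attractive for a one-time job.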