Q60 — AWS DEA-C01 Ch.1

A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL. Which solution will meet these requirements with the LEAST operational overhead?

Correct Answer: A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.

Explanation

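To let data scientists query all data sources with SQL-like syntax, option A is the best fit and carries the least operational overhead. AWS Glue can automatically discover and classify data stored in S3 in different formats, including JSON and .csv, and store the resulting metadata in the AWS Glue Data Catalog. Amazon Athena supports standard SQL queries against data stored in S3, and also supports PartiQL for querying data in JSON format. This approach requires no data format conversion, which reduces processing time and storage requirements. Athena is serverless and billed per query, so there are no servers to manage, which makes it well suited to infrequent or ad hoc big-data analysis.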
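The workflow in option A can be sketched in Python as two request payloads: one for a Glue crawler and one for an Athena query. This is a minimal illustration, not a reference implementation. The parameter names follow the boto3 `create_crawler` and `start_query_execution` calls, but every resource name (bucket, role ARN, database, connection, table) is a hypothetical placeholder, and the helper functions are invented for this example. Reaching the non-S3 sources from Athena usually also involves a federated query data source connector, which is not shown here.

```python
import time
from typing import Any, Dict, Tuple


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_path: str, jdbc_connection: str,
                          jdbc_path: str, dynamodb_table: str) -> Dict[str, Any]:
    """Build keyword arguments for glue_client.create_crawler(**request)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {
            "S3Targets": [{"Path": s3_path}],
            "JdbcTargets": [{"ConnectionName": jdbc_connection, "Path": jdbc_path}],
            "DynamoDBTargets": [{"Path": dynamodb_table}],
        },
    }


def build_athena_request(sql: str, database: str, output_location: str,
                         catalog: str = "AwsDataCatalog") -> Dict[str, Any]:
    """Build keyword arguments for athena_client.start_query_execution(**request)."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(athena_client: Any, request: Dict[str, Any],
              poll_seconds: float = 1.0) -> Tuple[str, str]:
    """Start a query and poll until it reaches a terminal state."""
    query_id = athena_client.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    crawler = build_crawler_request(
        name="example-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="example_db",
        s3_path="s3://example-bucket/datasets/",
        jdbc_connection="example-sqlserver-connection",
        jdbc_path="exampledb/dbo/%",
        dynamodb_table="example_table",
    )
    query = build_athena_request(
        sql="SELECT * FROM orders_csv LIMIT 10",
        database="example_db",
        output_location="s3://example-bucket/athena-results/",
    )
    print(crawler)
    print(query)
```

With real credentials, the payloads would be passed to `boto3.client("glue").create_crawler(**crawler)` and to `run_query(boto3.client("athena"), query)`. Keeping the payload builders separate from the client calls lets the request shape be checked without touching AWS.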