Q48 — AWS DEA-C01 Ch.1
A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR. Which solution will meet these requirements with the LEAST operational overhead?
- A. Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.
- B. Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
- C. Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
- D. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation. ✓
Correct Answer: D. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.
Explanation
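AWS Lake Formation provides a single place to manage data lake permissions. It supports row-level and column-level access control and integrates with Amazon Athena, Redshift Spectrum, and Amazon EMR. S3 access policies cannot directly enforce fine-grained control at the row or column level (A). Apache Ranger requires additional configuration and applies mainly to EMR environments, and option B offers access only through Apache Pig rather than the three required engines (B). Redshift is not suited to serve as data lake storage, and its security policies do not apply to queries issued from external services (C). By centralizing permission management, Lake Formation simplifies access control across multiple query services and reduces maintenance effort. According to the AWS documentation, Lake Formation lets you define fine-grained access policies directly on the data lake, with no need to configure each service separately.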
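To make the row-level and column-level control in option D concrete, here is a minimal sketch of how such a rule could be expressed with boto3 as a Lake Formation data cells filter plus a `SELECT` grant. The account ID, database, table, filter name, column names, row expression, and role ARN are all hypothetical placeholders and do not come from the question. The payloads are built as plain dictionaries so they can be inspected without AWS access; `apply_filter_and_grant` is defined but never called here.

```python
# Sketch only: every concrete value below (account ID, names, ARN) is a
# hypothetical placeholder chosen for illustration.

def build_cells_filter(catalog_id, database, table, name, row_expr, columns):
    """Build the TableData payload for a Lake Formation data cells filter.

    row_expr limits which rows are visible; columns limits which columns are.
    """
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": name,
        "RowFilter": {"FilterExpression": row_expr},
        "ColumnNames": list(columns),
    }


def build_grant(principal_arn, cells_filter):
    """Build the arguments that grant SELECT on the filter to one principal."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "DataCellsFilter": {
                "TableCatalogId": cells_filter["TableCatalogId"],
                "DatabaseName": cells_filter["DatabaseName"],
                "TableName": cells_filter["TableName"],
                "Name": cells_filter["Name"],
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_filter_and_grant(cells_filter, grant):
    """Send both requests to Lake Formation (needs AWS credentials)."""
    import boto3  # imported lazily so the payloads can be built offline

    lf = boto3.client("lakeformation")
    lf.create_data_cells_filter(TableData=cells_filter)
    lf.grant_permissions(**grant)


# Hypothetical example: an EU analytics team sees only EU rows and three columns.
cells_filter = build_cells_filter(
    catalog_id="111122223333",
    database="sales_db",
    table="orders",
    name="eu_team_filter",
    row_expr="region = 'EU'",
    columns=["order_id", "region", "amount"],
)
grant = build_grant("arn:aws:iam::111122223333:role/EuAnalyticsRole", cells_filter)
```

Because the grant is attached to the filter in Lake Formation rather than to any one engine, integrated services such as Athena, Redshift Spectrum, and EMR would be expected to see the same restricted view, which is the basis for the low operational overhead claimed for option D.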