Q63 — AWS DEA-C01 Ch.1

A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII. To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset. Which solution will meet the requirements with the LEAST operational overhead?

Correct Answer: B. Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.

Explanation

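To comply with regulations and avoid sharing personally identifiable information (PII) unnecessarily, the company needs a solution that redacts PII dynamically according to the needs of each application that accesses the dataset. With the requirement of least operational overhead in mind, consider each option in turn:

Option A proposes creating multiple copies of the dataset, each with a different level of redaction, to suit the needs of different applications. This approach incurs data duplication and storage overhead, and it is operationally complex.

Option B proposes using S3 Object Lambda. This approach applies the redaction logic at read time, so no additional copies of the dataset are needed. It reduces storage requirements and redacts the data on the fly according to the needs of each application, with the least operational overhead.

Option C is similar to option A: it uses AWS Glue to transform the data and create multiple copies of the dataset, which likewise carries high operational overhead.

Option D proposes using API Gateway and redacting PII dynamically through REST API calls. Although this approach offers some flexibility, it adds the overhead of extra network calls and the complexity of managing an API.

Therefore, given the requirement to minimize operational overhead, option B (using S3 Object Lambda) is the most suitable solution.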
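
As a rough illustration of option B, the following is a minimal sketch of an S3 Object Lambda function that redacts PII at read time. It assumes the dataset is a UTF-8 CSV file with a header row; the column names in `PII_COLUMNS` and the `REDACTED` mask are hypothetical and not taken from the question. The handler reads the original object from the presigned URL that S3 Object Lambda passes in `getObjectContext`, then returns the transformed body with `write_get_object_response`.

```python
import csv
import io
import urllib.request

# Hypothetical PII column names; replace with the real dataset schema.
PII_COLUMNS = {"email", "phone", "address"}


def redact_csv(text, pii_columns=PII_COLUMNS, mask="REDACTED"):
    """Return CSV text with every value in a PII column replaced by mask."""
    reader = csv.DictReader(io.StringIO(text))
    if not reader.fieldnames:
        # Empty input: nothing to redact.
        return text
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(
            {col: (mask if col in pii_columns else value) for col, value in row.items()}
        )
    return out.getvalue()


def lambda_handler(event, context):
    # Imported here so redact_csv can be exercised locally without boto3 installed.
    import boto3

    ctx = event["getObjectContext"]

    # S3 Object Lambda supplies a presigned URL for the original object.
    with urllib.request.urlopen(ctx["inputS3Url"]) as resp:
        original = resp.read().decode("utf-8")

    redacted = redact_csv(original)

    # Send the transformed object back to the caller of GetObject.
    boto3.client("s3").write_get_object_response(
        Body=redacted.encode("utf-8"),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"status_code": 200}
```

Under this sketch, the analytics application would read through the S3 Object Lambda endpoint and receive redacted rows, while an application that legitimately needs the PII could keep reading the bucket directly. Only one copy of the data is stored in either case.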