Q17 — AWS DEA-C01 Ch.1

A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?

Correct Answer: B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.

Explanation

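Option B is the best choice because the Detect PII transform in AWS Glue Studio identifies personally identifiable information (PII) and can obfuscate it directly in the same job. An AWS Step Functions state machine then orchestrates the data pipeline that ingests the processed data into the S3 data lake, so the overall workflow stays simple and needs comparatively little operational effort. By contrast, option A requires creating a Lambda transform function and using an SDK to obfuscate the data, which is more complex. Option C adds complexity by creating AWS Glue Data Quality rules to perform the obfuscation. Option D first ingests the dataset into DynamoDB and then uses a Lambda function to process it and ingest it into the S3 data lake, which involves more steps. Option B is therefore the correct answer.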
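To make the orchestration part of option B concrete, here is a minimal sketch of a Step Functions state machine definition that runs a single AWS Glue job and waits for it to finish. It assumes the Detect PII transform and the obfuscation step already live inside a Glue Studio job; the job name `pii-obfuscation-job` and the state name `RunPiiGlueJob` are hypothetical and do not come from the question. The sketch only builds the definition as JSON and does not call AWS.

```python
import json


def build_state_machine_definition(glue_job_name: str) -> dict:
    """Build an Amazon States Language definition with one Glue task.

    The Glue job itself (assumed to contain the Detect PII transform and
    the obfuscation step) is created separately in AWS Glue Studio.
    """
    return {
        "Comment": "Run a Glue job that detects and obfuscates PII, then writes to S3",
        "StartAt": "RunPiiGlueJob",
        "States": {
            "RunPiiGlueJob": {
                "Type": "Task",
                # Step Functions service integration for Glue; the ".sync"
                # suffix makes the state wait until the job run completes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "End": True,
            }
        },
    }


# Hypothetical job name, used only for illustration.
definition = build_state_machine_definition("pii-obfuscation-job")
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string is the kind of value that could be supplied as the `definition` when creating the state machine (for example through the console or boto3's `create_state_machine`), together with a name and an IAM role that is allowed to start the Glue job. A real pipeline would likely add retry and error-handling states, which are omitted here.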