Q17 — AWS DEA-C01 Ch.1
Question 17 of 100
A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?
- A. Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
- B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake. ✓
- C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
- D. Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.
Correct Answer: B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
Explanation
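Option B is the best answer because the Detect PII transform in AWS Glue Studio identifies personally identifiable information (PII) with little setup, and the PII can then be obfuscated directly in the same job. Using an AWS Step Functions state machine to orchestrate the pipeline that ingests the processed data into the S3 data lake keeps the overall workflow simple, so the operational effort is relatively low. By contrast, option A requires writing a Lambda transform function and using an SDK to perform the obfuscation, which is more complex. Option C adds complexity by relying on an AWS Glue Data Quality rule for obfuscation, and Data Quality rules evaluate data against checks rather than transform it. Option D first ingests the dataset into DynamoDB and then uses a Lambda function to process it and ingest it into the S3 data lake, which adds unnecessary steps. Therefore, option B is the correct answer.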
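The orchestration half of option B can be sketched as a Step Functions definition. The Python below is a minimal sketch, assuming the Detect PII transform, the obfuscation step, and the S3 write are all configured in a single AWS Glue Studio job; the job name `mask-pii-and-load-to-s3` and the retry settings are hypothetical. It builds the Amazon States Language (ASL) definition as a dictionary and serializes it to JSON. The `arn:aws:states:::glue:startJobRun.sync` resource tells Step Functions to start the Glue job run and wait for it to finish.

```python
import json

# Hypothetical job name: replace with the name of your own Glue Studio job.
GLUE_JOB_NAME = "mask-pii-and-load-to-s3"


def build_state_machine(job_name: str) -> dict:
    """Return an ASL definition with a single task that runs one Glue job
    and waits for the job run to complete (the .sync integration)."""
    return {
        "Comment": "Run the Glue job that detects and masks PII, then writes to S3",
        "StartAt": "RunPiiMaskingJob",
        "States": {
            "RunPiiMaskingJob": {
                "Type": "Task",
                # .sync makes Step Functions wait until the Glue job run finishes
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                # Hypothetical retry policy: retry any error twice with backoff
                "Retry": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    # Print the definition as JSON, ready to paste into the console or an API call
    print(json.dumps(build_state_machine(GLUE_JOB_NAME), indent=2))
```

One way to deploy it is to pass the JSON string as the `definition` argument of the boto3 Step Functions client's `create_state_machine` call, together with a `name` and the ARN of an IAM role that is allowed to start the Glue job. Because detection and masking stay inside the Glue job, the state machine needs only one task.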