Q26 — AWS SAP-C02 Ch.3

Question 26 of 75 | ← Chapter 3

Q251. A company is developing a gene reporting device that will collect genomic information to assist researchers with collecting large samples of data from a diverse population. The device will push 8 KB of genomic data every second to a data platform that will need to process and analyze the data and provide information back to researchers. The data platform must meet the following requirements. Provide near-real-time analytics of the inbound genomic data Ensure the data is flexible, parallel, and durable Deliver results of processing to a data warehouse. Which strategy should a solutions architect use to meet these requirements?

Correct Answer: B. Use Amazon Kinesis Data Streams to collect the inbound sensor data, analyze the data with Kinesis clients, and save the results to an Amazon Redshift cluster using Amazon EMR.

Explanation

Option B would be the best strategy to meet the requirements of providing near-real-time analytics of the inbound genomic data, ensuring the data is flexible, parallel, and durable and delivering results of processing to a data warehouse. The solution involves using Amazon Kinesis Data Streams to collect the inbound sensor data. The data can be analyzed with Kinesis clients, which can process and transform the data in real-time before saving the results to an Amazon Redshift cluster using Amazon EMR. This ensures that the data is processed and analyzed quickly and stored in a durable and scalable data warehouse for further analysis by researchers. Option A uses Amazon Kinesis Data Firehose, which is intended for simpler data ingestion use cases and does not provide real-time data processing capabilities. Option C uses Amazon S3, which is intended for storing objects rather than processing streaming data in real-time. Option D involves using an Amazon API Gateway to put requests into an Amazon SQS queue, which may add unnecessary overhead and latency to the system. Additionally, AWS Lambda has a maximum execution time limit of 15 minutes, which may not be sufficient for analyzing large amounts of data.