Q7 — AWS DOP-C02 Ch.1
Question 7 of 100
A development team manages the infrastructure for an application that uses long-running processes to consume items from an Amazon Simple Queue Service (Amazon SQS) queue. The application is deployed to an Auto Scaling group. The application recently encountered an issue where item processing duration increased significantly. The queue grew beyond its expected size, disrupting business processes. Application logs are sent to a third-party tool. The team currently subscribes to an Amazon Simple Notification Service (Amazon SNS) topic for alerts. They need to be notified when the queue exceeds its expected size.
- A. Create a 1-hour period Amazon CloudWatch alarm that triggers when the average value exceeds the expected threshold. Configure the alarm to notify the SNS topic.
- B. Create a 1-hour period Amazon CloudWatch alarm that triggers when the sum of the metric value exceeds the expected threshold. Configure the alarm to notify the SNS topic. ✓
- C. Create an AWS Lambda function that retrieves the ApproximateNumberOfMessagesVisible queue attribute and publishes it as a new CloudWatch custom metric. Create an Amazon EventBridge rule that runs the Lambda function every 5 minutes. Create a 1-hour CloudWatch metric alarm with a static threshold on the new custom metric’s sum.
- D. Create an AWS Lambda function that checks the ApproximateNumberOfMessagesVisible queue attribute and compares it against an expected size defined in the function. Create an Amazon EventBridge rule that runs the Lambda function every 5 minutes. Send a notification to the SNS topic when the queue attribute exceeds the expected size.
Correct Answer: B. Create a 1-hour period Amazon CloudWatch alarm that triggers when the sum of the metric value exceeds the expected threshold. Configure the alarm to notify the SNS topic.
Explanation
Amazon SQS publishes queue metrics, including ApproximateNumberOfMessagesVisible, to CloudWatch automatically, so no custom data collection is required. A CloudWatch alarm on that metric can notify the existing SNS topic directly, which is what Option B does. With a 1-hour period, a brief burst of messages contributes little to the aggregate, while a backlog that persists keeps adding to it, so the alarm captures sustained overload rather than transient spikes. Because ApproximateNumberOfMessagesVisible is a point-in-time count, the Sum statistic adds up every sample reported during the hour; the threshold must therefore be expressed in those units rather than as a raw queue depth. Option A relies on Average, which can dilute the impact of surges. Options C and D add Lambda functions and EventBridge schedules (and, in C, a custom metric) to reproduce data that CloudWatch already holds, which means more components to maintain and up to 5 minutes between checks. Option B therefore meets the requirement with the least operational overhead.
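As a rough illustration of Option B, the alarm could be defined with the parameters that CloudWatch's `put_metric_alarm` call accepts. This is a minimal sketch, not the exam's reference solution: the helper names `build_queue_backlog_alarm` and `create_alarm`, the default alarm name, the queue name, and the topic ARN are all hypothetical, and the threshold is left to the caller because the question does not state one.

```python
from typing import Any, Dict


def build_queue_backlog_alarm(
    queue_name: str,
    sns_topic_arn: str,
    threshold: float,
    alarm_name: str = "sqs-queue-backlog",  # hypothetical alarm name
) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm (Option B)."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/SQS",  # namespace of the metrics SQS publishes
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",  # adds every sample reported within the period
        "Period": 3600,  # 1 hour, in seconds
        "EvaluationPeriods": 1,
        # Assumption: the caller supplies a threshold scaled for Sum,
        # not a raw queue depth.
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify the existing SNS topic
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create the alarm. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)


if __name__ == "__main__":
    # Placeholder queue name, topic ARN, and threshold for illustration only.
    example = build_queue_backlog_alarm(
        "orders-queue",
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",
        threshold=5000,
    )
    print(example["MetricName"], example["Statistic"], example["Period"])
```

Separating the parameter builder from the API call keeps the alarm definition inspectable without AWS access; no Lambda function or EventBridge rule is involved, which is the point of preferring Option B over C and D.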