Q53 — AWS AIF-C01 Ch.1
A company wants to deploy a language model to create an inference application on edge devices. The inference must achieve the lowest possible latency. Which solution satisfies these requirements?
- A. Deploying an optimized small language model (SLM) on the edge device. ✓
- B. Deploying an optimized large language model (LLM) on the edge device.
- C. Integrating a centralized SLM API for asynchronous communication with the edge device.
- D. Integrating a centralized LLM API for asynchronous communication with the edge device.
Correct Answer: A. Deploying an optimized small language model (SLM) on the edge device.
Explanation
This question tests how to choose a deployment strategy for inference on edge devices. Edge hardware has limited compute, memory, and power, so running an optimized small language model (SLM) directly on the device (Option A) is the best fit: an SLM has fewer parameters, needs fewer resources per inference, and therefore responds faster than an LLM on the same constrained hardware. An optimized LLM on the device (Option B) still demands far more compute and memory, so it is likely to respond slowly and may not fit on the device at all. The centralized APIs (Options C and D) add a network round trip to every request, and asynchronous communication means the response is not returned immediately, so neither can meet the lowest-latency requirement regardless of model size.
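The comparison above can be sketched as a toy latency model in which total latency is model compute time plus any network round trip. This is only an illustration: the `Deployment` class, the `fastest` helper, and every millisecond figure are hypothetical placeholders, not measurements or AWS numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    compute_ms: float      # assumed time to run one inference
    network_rtt_ms: float  # assumed round trip; 0 when the model runs on the device

    @property
    def latency_ms(self) -> float:
        # Toy model: end-to-end latency = compute time + network round trip.
        return self.compute_ms + self.network_rtt_ms


# All figures below are invented for illustration only.
OPTIONS = [
    Deployment("A: SLM on device", compute_ms=40.0, network_rtt_ms=0.0),
    Deployment("B: LLM on device", compute_ms=900.0, network_rtt_ms=0.0),
    Deployment("C: centralized SLM API", compute_ms=10.0, network_rtt_ms=120.0),
    Deployment("D: centralized LLM API", compute_ms=60.0, network_rtt_ms=120.0),
]


def fastest(options):
    """Return the deployment with the lowest modeled latency."""
    return min(options, key=lambda d: d.latency_ms)


if __name__ == "__main__":
    for d in sorted(OPTIONS, key=lambda d: d.latency_ms):
        print(f"{d.name}: {d.latency_ms:.0f} ms")
    print("Lowest latency:", fastest(OPTIONS).name)
```

Under these assumed figures Option A comes out lowest because it avoids the network round trip and keeps compute time small. The ranking depends entirely on the placeholder values, so the sketch illustrates the reasoning rather than proving it.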