Q82 — AWS AIF-C01 Ch.1
Question 82 of 100
A company wants to use a language model to create an application for inference on edge devices. The inference must have the lowest possible latency. Which solution meets these requirements?
- A. Deploy an optimized small language model (SLM) on the edge device. ✓
- B. Deploy an optimized large language model (LLM) on the edge device.
- C. Integrate a centralized small language model (SLM) API for asynchronous communication with the edge device.
- D. Integrate a centralized large language model (LLM) API for asynchronous communication with the edge device.
Correct Answer: A. Deploy an optimized small language model (SLM) on the edge device.
Explanation
Running the model directly on the edge device removes the network round trip, so latency depends only on how fast the device can run the model. Small language models (SLMs) have far fewer parameters than large language models (LLMs), and optimizations such as lightweight architectures, quantization, and pruning shrink them further and speed up inference, which makes them suitable for resource-constrained edge hardware. LLMs, while more capable, are typically unsuitable for direct deployment on edge devices (option B): their size and compute requirements lead to higher latency and resource consumption, if they fit on the device at all. Options C and D rely on a centralized API, so every request pays network latency on top of inference time, and asynchronous communication is not designed for the fastest possible response.
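As a rough illustration of why model size and quantization matter on edge hardware, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The parameter counts (1 billion for an SLM, 70 billion for an LLM) are assumed values chosen for illustration, not figures from the question, and the helper name `weight_footprint_gb` is hypothetical. Real deployments also need memory for activations and runtime overhead.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed, illustrative parameter counts (not taken from the question).
SLM_PARAMS = 1_000_000_000    # hypothetical 1B-parameter small model
LLM_PARAMS = 70_000_000_000   # hypothetical 70B-parameter large model

if __name__ == "__main__":
    for name, params in [("SLM", SLM_PARAMS), ("LLM", LLM_PARAMS)]:
        # 16-bit is a common unquantized precision; 8-bit and 4-bit are
        # common quantized precisions.
        for bits in (16, 8, 4):
            size = weight_footprint_gb(params, bits)
            print(f"{name} at {bits}-bit: {size:.1f} GB")
```

Under these assumptions, the small model quantized to 4 bits needs about half a gigabyte for its weights, while the large model still needs tens of gigabytes even after the same quantization. That gap is the practical reason option A fits edge devices and option B generally does not.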