Q82 — AWS AIF-C01 Ch.1


A company wants to use a language model to create an application for inference on edge devices. The inference must have the lowest possible latency. Which solution meets these requirements?

Correct Answer: A. Deploy an optimized small language model (SLM) on the edge device.

Explanation

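To meet the lowest-latency requirement on edge devices, the best choice is a small, optimized language model that runs directly on the device. Running inference locally removes the network round trip to a cloud-hosted model endpoint. A small language model (SLM) uses a lightweight architecture with far fewer parameters than a large language model (LLM), so each inference needs less computation and memory. Optimization techniques such as quantization (storing weights at lower numeric precision) and pruning (removing low-importance weights) shrink the model further and speed it up, making it suitable for resource-constrained edge hardware. In contrast, LLMs, while more capable, are typically unsuitable for direct deployment on edge devices: their size and compute demands increase latency and resource consumption, and serving them remotely instead reintroduces network latency.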
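To make the size argument concrete, here is a minimal Python sketch. It does two things. First, it estimates weight-only memory for a model at a given precision. Second, it applies a simple symmetric per-tensor int8 quantization to a few weights. The model sizes and weight values are hypothetical, chosen only for illustration. The helper names (`weight_memory_gb`, `quantize_int8`, `dequantize_int8`) are likewise illustrative. The sketch also assumes a scheme that is only one of several options, and real edge toolchains use more sophisticated methods.

```python
# Illustrative sketch only: model sizes and weights below are hypothetical.

def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate memory for the weights alone (ignores activations and caches)."""
    return num_params * bits_per_param / 8 / 1e9


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step, clamped to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    # Hypothetical configurations for comparing weight footprints.
    configs = [
        ("70B-parameter model, fp16", 70_000_000_000, 16),
        ("3B-parameter model, fp16", 3_000_000_000, 16),
        ("3B-parameter model, int8", 3_000_000_000, 8),
    ]
    for name, params, bits in configs:
        print(f"{name}: ~{weight_memory_gb(params, bits):.1f} GB of weights")

    weights = [0.42, -1.27, 0.003, 0.88, -0.5]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"int8 values={q}, scale={scale:.4f}, max abs error={max_err:.4f}")
```

Each int8 weight takes one byte instead of two for fp16, halving the weight footprint at the cost of a small, bounded rounding error (at most half of `scale` per weight). Combined with a smaller parameter count, this is why an optimized SLM can fit and run quickly on edge hardware.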