SLM (Small Language Model)

SLM (Small Language Model) is a general term for language models with relatively few parameters, roughly a few billion up to around ten billion (the boundary is not strict). Compared to LLMs, they can run inference and be fine-tuned with far fewer computational resources.

"Small" Does Not Mean Weak

In the world of LLMs, "bigger means smarter" has long been conventional wisdom. GPT-4 is estimated at about 1.8 trillion parameters (an unofficial figure; OpenAI has not disclosed it), while SLMs sit at around 1B–10B, a gap of two to three orders of magnitude. Since 2025, however, this conventional wisdom has been rapidly crumbling.
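A rough sense of what this parameter gap means in hardware terms comes from counting the memory needed just to store the weights. The sketch below assumes 16-bit weights (2 bytes per parameter), uses the unofficial 1.8T estimate quoted above, and ignores activations, KV cache, and runtime overhead; the helper names are illustrative, not from any library.

```python
import math

BYTES_PER_PARAM_FP16 = 2  # assumption: 16-bit (FP16/BF16) weights


def weight_memory_gb(num_params: float, bytes_per_param: float = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed only to hold the weights, in GB (1 GB = 1e9 bytes).

    Activations, KV cache, and runtime overhead are deliberately ignored.
    """
    return num_params * bytes_per_param / 1e9


def orders_of_magnitude(large: float, small: float) -> float:
    """log10 of the size ratio between two models."""
    return math.log10(large / small)


GPT4_ESTIMATE = 1.8e12  # unofficial estimate, not a disclosed figure

if __name__ == "__main__":
    for slm_params in (1e9, 10e9):
        print(
            f"{slm_params / 1e9:.0f}B params: "
            f"{weight_memory_gb(slm_params):.0f} GB of weights, "
            f"{orders_of_magnitude(GPT4_ESTIMATE, slm_params):.2f} orders of magnitude smaller"
        )
    print(f"1.8T params: {weight_memory_gb(GPT4_ESTIMATE):.0f} GB of weights")
```

Under these assumptions a 1B model needs about 2 GB and a 10B model about 20 GB for its weights, versus about 3,600 GB for 1.8T parameters; the log10 ratios come out at roughly 3.26 and 2.26.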

Microsoft's Phi-4 (14B) has achieved scores rivaling GPT-4o on several reasoning benchmarks. Google's Gemma 3, available in sizes from 1B to 27B, delivers unusually high performance for its parameter count. Improvements in model architecture and careful curation of high-quality training data have made "small but sufficient for specific tasks" a reality.

Where Are They Being Used?

SLMs have three primary battlegrounds.

Edge devices: Environments with limited compute and memory, such as smartphones, IoT gateways, and embedded systems. Apple's on-device models, which run inference directly on the iPhone, are a prime example of SLMs in action.

Cost optimization: Using GPT-4-class models for routine tasks like classification, summarization, and data extraction is overkill. An SLM can often cut inference costs to a tenth or less, depending on the model and deployment.

Latency requirements: Scenarios that demand responses within tens of milliseconds, such as real-time chat, voice response, and game AI. Because far fewer parameters must be loaded and multiplied for each generated token, inference is much faster.
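The latency advantage can be made concrete with a common back-of-the-envelope estimate: when a dense model generates text for a single request, every weight is read from memory once per token, so memory bandwidth sets a floor on per-token latency. The sketch below assumes 16-bit weights and a hypothetical device bandwidth of 1,000 GB/s; the function name `min_ms_per_token` is illustrative, and the estimate ignores compute time, KV-cache reads, and framework overhead.

```python
def min_ms_per_token(num_params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Lower bound on per-token decode latency, in milliseconds.

    Assumes a dense model decoded at batch size 1, where every weight is read
    from memory once per generated token and memory bandwidth is the only
    bottleneck. Compute, KV-cache reads, and overhead are ignored.
    """
    weight_bytes = num_params * bytes_per_param
    seconds = weight_bytes / (bandwidth_gb_s * 1e9)
    return seconds * 1000


BANDWIDTH_GB_S = 1000.0  # hypothetical memory bandwidth, not a specific device

if __name__ == "__main__":
    for name, params in [("1B", 1e9), ("7B", 7e9), ("70B", 70e9)]:
        ms = min_ms_per_token(params, 2, BANDWIDTH_GB_S)  # 2 bytes/param (16-bit)
        print(f"{name}: >= {ms:.0f} ms/token (<= {1000 / ms:.0f} tokens/s)")
```

Under these assumptions the floor scales linearly with the number of weight bytes, so at the same precision and on the same hardware a 7B model's bound is one-tenth of a 70B model's. Real systems land above these floors.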

How to Use SLMs and LLMs Differently

LLMs still hold the advantage when general-purpose capability is needed: complex reasoning, multilingual support, and long-form generation. When the task can be narrowed down, however, a fine-tuned SLM can outperform an LLM on accuracy, speed, and cost simultaneously.

In practice, a standard workflow is emerging: "first prototype with an LLM API, then, once the task is well defined, distill it into an SLM to reduce costs." Distillation is the technique of training a smaller (student) model using the outputs of a larger (teacher) model as training targets.
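Distillation comes in several forms. The simplest version of the workflow above is to collect the teacher LLM's generated answers and fine-tune the SLM on them as ordinary labeled data. When the teacher's full output probabilities are available (for example, with an open-weight teacher run locally), the classic soft-target loss of Hinton et al. (2015) can be used instead. The sketch below shows that loss for a single prediction in plain Python; the temperature default of 2.0 and the example logits are arbitrary illustrative choices, and a real training loop would compute it over batches in an autodiff framework, usually mixed with a loss on the true labels.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled log-softmax, computed stably via the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """Soft-target loss: KL(teacher || student) on temperature-softened
    distributions, multiplied by temperature**2 so the gradient magnitude
    does not shrink as the temperature is raised."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher soft labels
    log_q = log_softmax(student_logits, temperature)  # student predictions
    return temperature**2 * sum(
        math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)
    )


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # made-up logits for a 3-way classification task
    print(distillation_loss([3.5, 1.2, 0.4], teacher))  # student roughly agrees
    print(distillation_loss([0.5, 1.0, 4.0], teacher))  # student disagrees
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to wrong answers, information that a single hard label cannot convey.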