Sparse Model

A Sparse Model is a general term for neural network architectures that activate only a subset of the model's parameters during inference, rather than all of them. A representative example is MoE (Mixture of Experts), which adopts a scaling strategy distinct from that of Dense Models: it increases the total parameter count while keeping per-token inference cost low.
The Meaning of "Sparsity"
In the context of neural networks, "sparse" describes a state in which only a small fraction of a network's connections or parameters are actually used in a given computation. A Dense Model uses all of its parameters for every input, whereas a Sparse Model uses only a subset of them; in conditional-computation designs such as MoE, that subset changes from input to input.
An intuitive way to understand this mechanism is to imagine a large library. A Dense Model is like a librarian who re-reads the entire collection for every question, while a Sparse Model is like a librarian who consults only the relevant shelves depending on the question.
Relationship with MoE
The dominant form of Sparse Models today is the MoE architecture. In MoE, a learned router assigns each input token to a small number of experts (often one or two; Mixtral 8x7B, for example, uses the top 2 of 8), and the experts that are not selected perform no computation for that token.
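The routing step can be illustrated with a minimal sketch in plain Python, so it runs without a deep-learning framework. It assumes a simple linear router and toy experts that just scale their input; the names moe_forward, router_weights, and make_expert are hypothetical. The gate is a softmax over the top-k router scores only, the pattern Mixtral uses, and the call log shows that unselected experts are never invoked.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: Vector,
    router_weights: List[Vector],  # hypothetical linear router: one row per expert
    experts: List[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[Vector, List[int]]:
    """Route one token to its top-k experts and return the gated mix of their outputs."""
    # Router score for each expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Indices of the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected experts' scores only, so the k gates sum to 1.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)  # only the selected experts are ever called
        out = [o + gate * v for o, v in zip(out, y)]
    return out, top


# Usage: four toy experts that scale their input and log every call.
calls: List[int] = []


def make_expert(idx: int, scale: float) -> Callable[[Vector], Vector]:
    def expert(v: Vector) -> Vector:
        calls.append(idx)  # record which experts actually ran
        return [scale * x for x in v]
    return expert


experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
router = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
out, selected = moe_forward([2.0, 1.0], router, experts, k=2)
print(selected, calls)  # → [0, 2] [0, 2]
```

Experts 1 and 3 never appear in the call log: their parameters exist but cost nothing for this token, which is the source of MoE's compute savings.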
However, Sparse Models are not limited to MoE. Unstructured sparsity, which zeroes out a large fraction (often the majority) of individual weights, and techniques that dynamically disable specific attention heads also fall within this category. MoE is currently the most widely deployed form among them.
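The text does not say how weights are chosen for zeroing; one common approach is magnitude pruning, sketched below under that assumption. The function names magnitude_prune and density are hypothetical, and the matrix is a toy example.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Return a copy of `weights` with the `sparsity` fraction of smallest-magnitude entries set to 0."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    positions = [(r, c) for r, row in enumerate(weights) for c in range(len(row))]
    n_prune = int(len(positions) * sparsity)
    # Sort every (row, col) position by the absolute value of its weight, smallest first.
    positions.sort(key=lambda rc: abs(weights[rc[0]][rc[1]]))
    pruned = [row[:] for row in weights]  # copy so the input is left untouched
    for r, c in positions[:n_prune]:
        pruned[r][c] = 0.0
    return pruned


def density(weights: Matrix) -> float:
    """Fraction of entries that are nonzero."""
    total = sum(len(row) for row in weights)
    nonzero = sum(1 for row in weights for w in row if w != 0.0)
    return nonzero / total


W = [[0.9, -0.05, 0.3], [-0.01, 0.6, -0.2]]
W_sparse = magnitude_prune(W, sparsity=0.5)
print(W_sparse)           # → [[0.9, 0.0, 0.3], [0.0, 0.6, 0.0]]
print(density(W_sparse))  # → 0.5
```

Unlike MoE routing, the zeroed positions here are fixed once pruning is done: the same weights are skipped for every input, and turning that into real speedups generally requires sparse kernels or hardware support.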
Criteria for Choosing Between Sparse and Dense Models
The main advantage of Sparse Models is that they can hold more parameters, and therefore more "knowledge", at the same per-token compute cost. Mixtral 8x7B, for example, has 46.7B total parameters but only 12.9B active parameters per token, so its per-token compute is comparable to that of a roughly 13B dense model, while its benchmark performance is reported to be on par with 70B-class dense models.
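The two Mixtral figures can be split into shared and per-expert parameters with a little algebra. This sketch assumes every MoE layer has 8 identical experts with top-2 routing, and that all remaining parameters (attention, embeddings, norms, router) are shared and active for every token, so total = S + 8E and active = S + 2E. The function name split_params is hypothetical.

```python
from typing import Tuple


def split_params(total_b: float, active_b: float, n_experts: int, top_k: int) -> Tuple[float, float]:
    """Solve total = S + n*E and active = S + k*E for shared S and per-expert E (billions)."""
    # Subtracting the two equations eliminates S: total - active = (n - k) * E.
    per_expert = (total_b - active_b) / (n_experts - top_k)
    shared = active_b - top_k * per_expert
    return shared, per_expert


shared, per_expert = split_params(46.7, 12.9, n_experts=8, top_k=2)
print(f"shared ≈ {shared:.2f}B, per expert ≈ {per_expert:.2f}B")  # → shared ≈ 1.63B, per expert ≈ 5.63B
print(f"active fraction ≈ {12.9 / 46.7:.1%}")                      # → active fraction ≈ 27.6%
```

Under these assumptions, the result also explains why the total is 46.7B rather than a naive 8 × 7B = 56B: the shared layers are counted once, not once per expert.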
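On the other hand, Sparse Models bring their own challenges. Balancing load across experts is difficult: when the router sends most tokens to a few experts, those experts become bottlenecks while the others sit idle or remain undertrained, which erodes the benefits of sparsity. In addition, all experts must typically reside in GPU memory even though only a few are used per token, so memory requirements scale with the total parameter count (46.7B for Mixtral 8x7B) rather than the active one.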
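One widely used mitigation, not described in the text above, is an auxiliary load-balancing loss added during training. The sketch below follows the Switch Transformer formulation (top-1 routing; loss = α · N · Σ f_i · P_i, where f_i is the fraction of tokens dispatched to expert i, P_i is the mean router probability for expert i, and α = 0.01). The function name and toy logits are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits: List[List[float]], alpha: float = 0.01) -> float:
    """Switch-Transformer-style auxiliary loss: alpha * N * sum_i f_i * P_i."""
    n_tokens = len(router_logits)
    n_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]
    # f_i: fraction of tokens whose top-1 expert is i.
    counts = [0] * n_experts
    for p in probs:
        counts[max(range(n_experts), key=lambda i: p[i])] += 1
    f = [c / n_tokens for c in counts]
    # P_i: average router probability assigned to expert i across all tokens.
    P = [sum(p[i] for p in probs) / n_tokens for i in range(n_experts)]
    return alpha * n_experts * sum(fi * pi for fi, pi in zip(f, P))


# Four tokens, four experts. Balanced: each token prefers a different expert.
balanced = [[5.0 if i == t else 0.0 for i in range(4)] for t in range(4)]
# Collapsed: every token prefers expert 0.
collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]

print(round(load_balancing_loss(balanced), 4))   # → 0.01
print(round(load_balancing_loss(collapsed), 4))  # → 0.0392
```

The loss equals α when routing is perfectly uniform and grows as tokens collapse onto fewer experts, so adding it to the training objective nudges the router toward even expert usage.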