A Sparse Model is a general term for neural network architectures that activate only a subset of the model's parameters during inference, rather than all of them. A representative example is MoE (Mixture of Experts), which adopts a scaling strategy distinct from that of Dense Models: increasing the total parameter count while keeping the cost of inference low.
## The Meaning of "Sparsity"

In the context of neural networks, "sparse" refers to a state in which only a small fraction of a network's connections or parameters are actually used. While a Dense Model uses all of its parameters for every input, a Sparse Model activates a different small subset of parameters depending on the input. An intuitive analogy is a large library: a Dense Model is a librarian who re-reads the entire collection for every question, while a Sparse Model is a librarian who consults only the shelves relevant to that question.

## Relationship with MoE

The dominant form of Sparse Model today is the MoE architecture. In MoE, a router assigns each input token to a small number of experts (typically 2–4), and the experts that are not selected skip computation entirely. Sparse Models are not limited to MoE, however: "unstructured sparsity," which sets the majority of weights to zero, and techniques that dynamically disable specific attention heads also fall within the category. MoE is simply the most practically mature form among them.

## Criteria for Choosing Between Sparse and Dense Models

The advantage of Sparse Models is clear: they can hold more "knowledge" at the same inference cost. Mixtral 8x7B has 46.7B total parameters but only 12.9B active parameters, so its inference cost is comparable to that of a 13B-class Dense Model while its performance approaches that of a 70B-class model.

There are also challenges. Designing effective load balancing among experts is difficult, and when inputs concentrate on a few experts, the benefits of sparsity diminish. Furthermore, all experts must still be loaded into GPU memory, so memory efficiency is less straightforward than the active-parameter count alone suggests.
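The routing mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration, not any particular library's implementation: a router scores every expert per token, only the top-k experts are kept, their gate weights are renormalized with a softmax, and the unselected experts do no work at all. The function names (`top_k_routing`, `moe_forward`) are invented for this sketch.

```python
import numpy as np

def top_k_routing(logits: np.ndarray, k: int = 2):
    """Select the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    # Indices of the k highest-scoring experts for each token
    idx = np.argsort(logits, axis=-1)[:, -k:]
    top = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts (standard top-k gating)
    e = np.exp(top - top.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)
    return idx, weights

def moe_forward(x: np.ndarray, experts: list, router_w: np.ndarray, k: int = 2):
    """Sparse MoE layer: each token is processed by only k of the experts."""
    logits = x @ router_w                      # (num_tokens, num_experts)
    idx, w = top_k_routing(logits, k)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            # Experts not in idx[t] skip computation for this token entirely
            out[t] += w[t, j] * experts[idx[t, j]](x[t])
    return out
```

With 8 experts and k=2, each token pays for 2 expert evaluations regardless of how many experts exist, which is exactly how total capacity grows while per-token compute stays flat.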


A Dense Model is a neural network architecture in which all of the model's parameters are used for computation during inference. In contrast to MoE (Mixture of Experts), which activates only a subset of experts, a Dense Model always involves all weights in computation regardless of the input.

MoE (Mixture of Experts) is an architecture that contains multiple "expert" subnetworks within a model, activating only a subset of them for each input, thereby increasing the total number of parameters while keeping inference costs low.
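The cost advantage can be made concrete with a little arithmetic. A sketch, assuming per-token cost is the shared (non-expert) parameters plus top_k of num_experts expert blocks; the shared-parameter value of 1.63B used below is back-solved from Mixtral 8x7B's published 46.7B total / 12.9B active figures, not an official number.

```python
def active_params(total: float, shared: float, num_experts: int, top_k: int) -> float:
    """Parameters touched per token (in billions): shared layers plus
    top_k of num_experts expert blocks."""
    expert_pool = total - shared               # parameters held by all experts combined
    return shared + (top_k / num_experts) * expert_pool

# Mixtral 8x7B: 46.7B total, top-2 of 8 experts, shared ~1.63B (estimated)
print(round(active_params(46.7, 1.63, num_experts=8, top_k=2), 1))  # -> 12.9
```

Note the dense limit: if every parameter is shared (`shared == total`), the formula collapses to the total, which is exactly the Dense Model case.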

SLM (Small Language Model) is a general term for language models with parameter counts ranging from roughly a few billion up to about ten billion, characterized by the ability to perform inference and fine-tuning with far fewer computational resources than LLMs require.
