A Dense Model is a neural network architecture in which all of the model's parameters are used for computation during inference. In contrast to MoE (Mixture of Experts), which activates only a subset of experts, a Dense Model always involves all weights in computation regardless of the input.
## Why Is It Called "Dense"?

In the world of neural networks, there has long been a convention of referring to fully connected layers as dense layers. The term Dense Model succinctly captures a structural characteristic of the architecture: the entire model is composed of these "gap-free" connections, meaning all parameters are activated on every inference pass.

The counterpart that emerged is the MoE (Mixture of Experts) architecture. In MoE, a routing mechanism selects only a small number of experts for each input token, leaving the rest dormant. As a result, even when the total parameter count is the same, the computational cost (FLOPs) of inference is significantly reduced. This is the mechanism behind the description of Mixtral 8x7B as having "46.7B parameters, but only 12.9B active parameters."

## Strengths and Limitations of Dense Models

The greatest advantage of Dense Models is their design simplicity. There are no routing imbalances or expert load-balancing issues to worry about, which makes training highly stable. Major models such as the Llama 3 series continue to adopt the Dense architecture precisely because this stability carries significant weight in large-scale training.

On the other hand, the unavoidable drawback is that parameter count translates directly into inference cost. A Dense Model with 70B parameters must read and compute all 70B weights on every inference pass. If equivalent quality can be achieved with MoE, the inference cost can sometimes be cut to a fraction of that.

## Decision Criteria in Practice

When selecting a model, it is more practical to evaluate fitness for the workload than to frame the choice as a binary opposition between Dense and MoE. For latency-sensitive real-time dialogue, or for tasks with input/output patterns so diverse that bias toward specific experts is unpredictable, the predictable computational cost of Dense Models makes them easier to operate.
Conversely, for batch inference over large volumes of text, or other throughput-oriented scenarios, the computational efficiency of MoE comes into its own.

In the author's experience, when switching models in a production environment, the factor with the greatest impact was not parameter count itself but how the model fits into GPU memory. A Dense 70B model barely fits across two A100 80GB GPUs, whereas a MoE model with 13B active parameters can run on a single card, and this difference has a decisive effect on infrastructure costs.
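The figures quoted above can be sanity-checked with back-of-envelope arithmetic. The shared/per-expert split below is derived from the quoted totals, not an official breakdown, and the memory estimate assumes fp16 weights with no KV cache or runtime overhead:

```python
# Mixtral 8x7B: 8 experts, top-2 routing, 46.7B total, 12.9B active (quoted figures).
# Model as: total = shared + 8 * per_expert, active = shared + 2 * per_expert.
per_expert = (46.7 - 12.9) / 6       # ~5.63B parameters per expert FFN (derived, not official)
shared = 46.7 - 8 * per_expert       # ~1.63B shared weights (attention, embeddings, ...)
active = shared + 2 * per_expert     # recovers ~12.9B active parameters

# Weight memory at fp16 (2 bytes per parameter), ignoring KV cache and activations:
dense_70b_gb = 70e9 * 2 / 1e9        # 140 GB -> just fits across two 80 GB A100s
```

Note that a MoE model still needs all of its weights resident in memory; the saving from sparse activation is in compute per token, not in weight footprint, which is why quantization or offloading often enters the picture when fitting one on a single card.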

A Sparse Model is a general term for neural network architectures that activate only a subset of the model's parameters during inference, rather than all of them. A representative example is MoE (Mixture of Experts). This approach takes a scaling strategy distinct from that of Dense Models: increasing the total parameter count while keeping inference costs low.

MoE (Mixture of Experts) is an architecture that contains multiple "expert" subnetworks within a model, activating only a subset of them for each input, thereby increasing the total number of parameters while keeping inference costs low.
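The routing described above can be sketched in a few lines. This is an illustrative top-k gate over simple linear "experts", not any particular model's implementation; the function name, shapes, and expert structure are assumptions for the sketch:

```python
import numpy as np

def moe_layer(x, experts, gate_w, k=2):
    """Route one token vector x through the top-k of several linear 'experts'.

    experts: list of (W, b) pairs, each acting as x @ W + b
    gate_w:  (d, n_experts) routing matrix producing one logit per expert
    """
    logits = x @ gate_w                          # one routing score per expert
    top_k = np.argsort(logits)[-k:]              # indices of the k highest-scoring experts
    gate = np.exp(logits[top_k] - logits[top_k].max())
    gate = gate / gate.sum()                     # softmax over the selected experts only
    # Only the chosen experts compute; the remaining experts stay dormant.
    out = np.zeros_like(x)
    for w, i in zip(gate, top_k):
        W, b = experts[i]
        out = out + w * (x @ W + b)
    return out
```

With 8 experts and k=2, only a quarter of the expert parameters touch each token, which is exactly the compute saving that Sparse/MoE architectures trade on.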

An open-weight model is a language model whose trained weights (parameters) are publicly released and can be freely downloaded for use in inference and fine-tuning.

LLM (Large Language Model) is a general term for neural network models pre-trained on massive amounts of text data, containing billions to trillions of parameters, capable of understanding and generating natural language with high accuracy.