An open-weight model is a language model whose trained weights (parameters) are publicly released and can be freely downloaded for use in inference and fine-tuning.
Although often confused, open-weight and open-source are not the same thing. For a model to be open-source in the full sense, the training code, training data, and training procedures must all be publicly available, allowing anyone to reproduce or modify it. Open-weight is a narrower concept, referring specifically to the trained weight files being publicly available.
Meta's Llama 3, for example, makes its model weights public, but the details of the datasets used for training remain undisclosed, and commercial use is subject to conditions based on monthly active user counts. Mistral likewise publishes its weights, while its licenses vary by model, mixing Apache 2.0 with proprietary terms. Strictly speaking, it is more accurate to call such models "open-weight" rather than open-source.
Having the weights on hand means that inference can be run entirely under your own organization's control. This has three key implications:
Freedom to Customize: You can fine-tune on your own data to create models specialized for specific domains, enabling deep customization that is impossible through an API. With parameter-efficient fine-tuning (PEFT) methods such as LoRA, fine-tuning becomes practical even on a single consumer-grade GPU.
Ensuring Data Sovereignty: Since no data is sent to external parties during inference, the model can be applied to tasks involving confidential information. This is why adoption is growing in heavily regulated industries such as finance, healthcare, and legal services.
Avoiding Vendor Lock-in: You are not dependent on a specific API provider. Your organization's AI infrastructure can be decoupled from the risks of pricing changes or service discontinuation.
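As a concrete illustration of the customization freedom above, a LoRA fine-tune can be configured in a few lines with Hugging Face's peft library. This is a minimal configuration sketch, not a complete training script; the model name and the target module names (`q_proj`, `v_proj`) are assumptions that vary by architecture, and loading the weights requires accepting the model's license.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load an open-weight base model (name assumed for illustration)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Attach trainable low-rank adapters to the attention projections only
config = LoraConfig(
    r=8,                                  # rank of the delta matrices
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # module names vary by architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% trainable
```

From here, the wrapped model can be passed to any standard training loop; only the adapter weights receive gradients, which is what makes a single consumer GPU sufficient.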
Meta's Llama 4 series spans a wide range of sizes, from Scout (17B active / 109B total) to Behemoth (288B active / 2T total), and adopts a Mixture of Experts (MoE) architecture. Google's Gemma 3 follows a lightweight approach ranging from 1B to 27B. Mistral delivers commercial-grade performance with Mistral Large 2 while also releasing lightweight versions in parallel. From China, DeepSeek-V3 and Qwen 2.5 are making their presence felt with strong multilingual performance.
When selecting a model, there is more to evaluate than performance alone. License terms (whether commercial use is permitted, user count restrictions), supported languages, and required hardware specifications must all be carefully examined in advance.
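Hardware requirements in particular can be estimated with a simple rule of thumb: weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache and activations. A rough sketch follows; the 20% overhead figure is an illustrative assumption, not a measured value.

```python
def vram_gb(n_params_billion, bytes_per_param, overhead=1.2):
    """Rough VRAM estimate: weights only, plus ~20% for KV cache/activations."""
    return n_params_billion * bytes_per_param * overhead

# fp16 uses 2 bytes per parameter; 4-bit quantization uses 0.5
for name, params in [("Gemma 3 27B", 27), ("Llama 4 Scout (109B total)", 109)]:
    for precision, nbytes in [("fp16", 2), ("int4", 0.5)]:
        print(f"{name} @ {precision}: ~{vram_gb(params, nbytes):.0f} GB")
```

The estimate makes clear why quantization matters for local deployment: a 27B model that needs roughly 65 GB in fp16 fits on a single 24 GB consumer GPU at 4-bit precision. Note that for MoE models like Scout, memory scales with total parameters even though compute scales with active parameters.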



LoRA (Low-Rank Adaptation) is a technique that inserts low-rank delta matrices into the weight matrices of large language models and trains only those deltas, enabling fine-tuning by adding approximately 0.1–1% of the total model parameters.
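The definition above can be made concrete with a small numerical sketch. The 4096x4096 matrix size and rank 8 are illustrative assumptions, roughly matching one attention projection in a 7B-class model.

```python
import numpy as np

d, k = 4096, 4096      # shape of one frozen weight matrix
r = 8                  # LoRA rank, r << min(d, k)
alpha = 16             # scaling factor

W = np.random.randn(d, k) * 0.02   # frozen pretrained weight
A = np.random.randn(r, k) * 0.02   # trainable, small random init
B = np.zeros((d, r))               # trainable, initialized to zero

# Effective weight in the forward pass: W + (alpha / r) * B @ A.
# Because B starts at zero, training begins exactly at the pretrained weights.
W_eff = W + (alpha / r) * (B @ A)

full = W.size            # parameters updated by full fine-tuning
lora = A.size + B.size   # parameters updated by LoRA
print(f"trainable fraction: {lora / full:.4%}")  # prints "trainable fraction: 0.3906%"
```

The delta matrices add only `r * (d + k)` parameters per weight matrix, which is where the roughly 0.1-1% figure comes from; smaller ranks push the fraction lower at some cost in adaptation capacity.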

LLM (Large Language Model) is a general term for neural network models pre-trained on massive amounts of text data, containing billions to trillions of parameters, capable of understanding and generating natural language with high accuracy.

A Dense Model is a neural network architecture in which all of the model's parameters are used for computation during inference. In contrast to MoE (Mixture of Experts), which activates only a subset of experts, a Dense Model always involves all weights in computation regardless of the input.

A base model (Foundation Model) is a general-purpose AI model pre-trained on large-scale datasets. Rather than being specialized for a specific task, it functions as a "foundation" that can be adapted to a wide range of applications through fine-tuning or prompt engineering.

A Sparse Model is a general term for neural network architectures that activate only a subset of the model's parameters during inference, rather than all of them. A representative example is MoE (Mixture of Experts), which adopts a scaling strategy distinct from that of Dense Models — increasing the total parameter count while keeping inference costs low.
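The routing idea behind a sparse MoE layer can be sketched in a few lines of NumPy. The layer sizes and top-2 routing here are illustrative assumptions; production routers add load balancing, batching, and capacity limits.

```python
import numpy as np

def moe_layer(x, experts, gate_w, top_k=2):
    """Sparse MoE: route the input to its top_k experts only."""
    logits = x @ gate_w                    # router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the selected experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts
    # Only the top_k expert matrices participate in this computation
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 64, 8
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal(d)

y = moe_layer(x, experts, gate_w, top_k=2)
total = n_experts * d * d   # total expert parameters (what must fit in memory)
active = 2 * d * d          # parameters actually used for this input
print(f"active fraction: {active / total:.0%}")  # prints "active fraction: 25%"
```

This is the scaling strategy the glossary entry describes: a dense model would multiply by all eight expert matrices on every input, while the sparse layer touches only two, so total capacity grows without a proportional increase in inference compute.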