A base model (Foundation Model) is a general-purpose AI model pre-trained on large-scale datasets. Rather than being specialized for a specific task, it functions as a "foundation" that can be adapted to a wide range of applications through fine-tuning or prompt engineering.
## Models as a "Foundation"

The term "Foundation Model," coined by Stanford University in 2021, draws an analogy to the foundation of a building: a single foundation can support the construction of diverse applications, including chatbots, code generation, translation, and summarization. Major LLMs such as GPT, Claude, Llama, and Gemini are all base models, pre-trained on trillions of tokens of text data. This pre-training instills the "groundwork" of language structure, world knowledge, and reasoning capabilities.

## Methods of Customization

There are multiple ways to adapt a base model for specific tasks. The most accessible is prompt engineering, which refines the instruction text without modifying the model itself. Next is fine-tuning, which adjusts the model's weights using task-specific data; LoRA and QLoRA are techniques that significantly reduce the cost of this process. For deeper adaptation, continued pre-training can be used to incorporate domain-specific knowledge into the model. This approach is sometimes used in fields with extensive specialized terminology, such as medicine and law.

## Open-Weight vs. Proprietary

Base models fall broadly into two categories. Open-weight models, such as Meta's Llama and Mistral, make their weights publicly available, while proprietary models, such as OpenAI's GPT and Anthropic's Claude, are accessible only via API. Fine-tuning or running a local LLM in-house requires an open-weight model; when API access is sufficient, a proprietary model may offer lower operational costs.
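To make the LoRA idea concrete, here is a minimal NumPy sketch of the core mechanism: the pre-trained weight matrix stays frozen, and only a low-rank update is trained. The matrix sizes, rank, and scaling factor below are illustrative choices, not values from any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight matrix (e.g. one attention projection).
d, k = 512, 512
W = rng.standard_normal((d, k))

# LoRA: learn a low-rank update (alpha/r) * B @ A instead of touching W.
r, alpha = 8, 16
A = rng.standard_normal((r, k)) * 0.01  # trainable
B = np.zeros((d, r))                    # trainable; zero-init so the update starts at 0

def forward(x):
    # Base output plus the scaled low-rank correction.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((1, k))
full_params = W.size                    # 262,144
lora_params = A.size + B.size           # 8,192 -> ~3.1% of the full matrix
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model; training then moves only the 2·r·d adapter parameters rather than all d·k weights.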


"Sparse model" is a general term for neural network architectures that activate only a subset of the model's parameters during inference, rather than all of them. A representative example is MoE (Mixture of Experts), which adopts a scaling strategy distinct from that of dense models: increasing the total parameter count while keeping inference cost low.
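The MoE idea can be sketched in a few lines of NumPy: a router scores every expert, but only the top-k experts actually run for a given token. The dimensions, expert count, and linear "experts" below are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small linear layer; only top_k of them run per token.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
router = rng.standard_normal((n_experts, d_model)) * 0.1

def moe_forward(x):
    scores = router @ x                   # one routing score per expert
    chosen = np.argsort(scores)[-top_k:]  # indices of the top_k experts
    w = np.exp(scores[chosen])
    w /= w.sum()                          # softmax over the chosen experts only
    # Combine only the chosen experts' outputs, weighted by the router.
    return sum(wi * (experts[i] @ x) for wi, i in zip(w, chosen))

x = rng.standard_normal(d_model)
y = moe_forward(x)
```

Here only 2 of the 8 experts execute per token, so compute per token scales with top_k while total capacity scales with n_experts; production MoE layers add refinements such as load-balancing losses, but the routing principle is the same.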

MCP (Model Context Protocol) is a standard protocol that enables AI agents to connect to external tools, databases, and APIs. It is an open standard developed by Anthropic and donated to the Linux Foundation's Agentic AI Foundation.

An open-weight model is a language model whose trained weights (parameters) are publicly released and can be freely downloaded for use in inference and fine-tuning.


PEFT (Parameter-Efficient Fine-Tuning) is a family of techniques that adapt a pre-trained model by updating only a small fraction of its parameters, dramatically reducing the cost of customizing an AI model, often by 90% or more.
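The cost reduction comes chiefly from shrinking the set of trainable parameters. A back-of-the-envelope calculation shows the scale of the savings; the model size, layer count, and matrix shapes below are illustrative assumptions, not measurements of any specific model.

```python
# Hypothetical 7B-parameter model: full fine-tuning touches every weight.
total_params = 7_000_000_000

# Suppose LoRA (rank r=8) is applied to 32 layers x 4 projection matrices,
# each 4096 x 4096, with two low-rank factors per matrix.
r, layers, mats, dim = 8, 32, 4, 4096
lora_params = layers * mats * 2 * r * dim  # 8,388,608 trainable parameters

fraction = lora_params / total_params      # ~0.12% of the full model
```

Under these assumptions, well under 1% of the weights are trained, which is why optimizer state, gradient memory, and checkpoint sizes shrink so sharply compared with full fine-tuning.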