Base Model (Foundation Model)

A base model (Foundation Model) is a general-purpose AI model pre-trained on large-scale datasets. Rather than being specialized for a specific task, it functions as a "foundation" that can be adapted to a wide range of applications through fine-tuning or prompt engineering.

Models as a "Foundation"

The term "Foundation Model," coined in 2021 by researchers at Stanford University's Center for Research on Foundation Models (CRFM), draws an analogy to the foundation of a building. The idea is that a single foundation can support the construction of diverse applications, including chatbots, code generation, translation, and summarization.

Major LLM families such as GPT, Claude, Llama, and Gemini are all built on base models pre-trained on very large text corpora, often trillions of tokens. This pre-training instills the "groundwork" of language structure, world knowledge, and reasoning capabilities, which later adaptation builds on.

Methods of Customization

There are multiple ways to adapt a base model for specific tasks.

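The most accessible is prompt engineering, which refines the instruction text without modifying the model itself. Next is fine-tuning, which adjusts the model's weights using task-specific data. LoRA and QLoRA significantly reduce the cost of fine-tuning: LoRA freezes the original weights and trains only small low-rank adapter matrices, and QLoRA additionally keeps the frozen base model in a quantized (typically 4-bit) form to save memory.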
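To make the cost reduction concrete, here is a minimal sketch of the LoRA idea in plain Python. It is an illustration, not the implementation used by any particular library. The function names, the 4096 x 4096 layer size, and the rank of 8 are assumptions chosen for the example. Instead of updating a full weight matrix W, LoRA trains two small factors B and A and uses W + (alpha / r) * B A as the effective weight.

```python
# Sketch of the LoRA idea (all names and sizes here are illustrative assumptions).
# The frozen weight W has shape (d_out x d_in). Only two small factors are trained:
# B with shape (d_out x r) and A with shape (r x d_in).

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Number of trainable values when the whole matrix W is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Number of trainable values in the two low-rank factors B and A."""
    return rank * (d_out + d_in)


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, b, a, alpha: float, rank: int):
    """Return W + (alpha / rank) * (B @ A), the weight used at inference time."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Compare trainable parameter counts for one hypothetical 4096 x 4096 layer.
full = full_finetune_params(4096, 4096)
lora = lora_params(4096, 4096, rank=8)
print(f"full fine-tuning: {full:,} trainable values")
print(f"LoRA (rank 8):    {lora:,} trainable values ({lora / full:.2%} of full)")
```

Under these assumed sizes, the adapter holds well under 1% of the values that full fine-tuning would update for that layer. The actual savings depend on the chosen rank and on which layers receive adapters.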

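For deeper adaptation, continued pre-training can be employed: the model is trained further on a large corpus of domain text, using the same kind of objective as the original pre-training, so that domain-specific knowledge is incorporated into the model. This approach is sometimes used in fields with extensive specialized terminology, such as medicine and law.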

Open-Weight vs. Proprietary

Base models fall broadly into two categories. "Open-weight models," such as Meta's Llama and Mistral AI's Mistral models, make their model weights publicly available, while proprietary models, such as OpenAI's GPT and Anthropic's Claude, keep their weights private and are accessed through an API or a hosted product.

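Fine-tuning on your own infrastructure or running an LLM locally requires an open-weight model, because both need direct access to the weights. When API access is sufficient, proprietary models may offer lower operational costs, since there is no serving infrastructure to maintain.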