Fine-tuning refers to the process of providing additional training data to a pre-trained machine learning model in order to adapt it to a specific task or domain.
"Generically intelligent, but unaware of our business operations" — this is a wall you will almost certainly hit when trying to deploy an LLM in practice. Fine-tuning is the process of tailoring this general-purpose model to your organization's specific needs.
Historically, this has been established as a standard NLP workflow since the era of BERT (around 2018). The two-stage learning framework — learning the general structure of language through pre-training, then learning task-specific patterns through fine-tuning — remains unchanged today. What has changed is the scale of models and the associated cost challenges.
Modern fine-tuning broadly falls into three categories.
Full FT updates all parameters of the model. It can achieve the highest accuracy, but for a 70B model, it typically requires 8 or more A100 80GB GPUs, and training can take several days. This is suited for research institutions and big tech companies with ample budget and time.
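The GPU requirement above can be sanity-checked with a back-of-the-envelope memory estimate. The sketch below assumes mixed-precision Adam training, where each parameter commonly costs about 16 bytes (2-byte weights, 2-byte gradients, plus fp32 master weights and two fp32 optimizer moments); activations are excluded, so the real footprint is higher.

```python
# Rough memory estimate for full fine-tuning a 70B model
# (assumption: ~16 bytes/parameter with mixed-precision Adam,
#  activations and buffers not counted).
params = 70e9
bytes_per_param = 16  # 2 (weights) + 2 (grads) + 4 + 4 + 4 (fp32 Adam state)
total_gb = params * bytes_per_param / 1e9
print(f"~{total_gb:.0f} GB of GPU memory")  # ~1120 GB
```

Eight A100 80 GB cards provide only 640 GB, which is why full fine-tuning at this scale also relies on sharding or offloading techniques — hence "8 or more".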
PEFT (LoRA / QLoRA, etc.) updates only a subset of parameters. It achieves accuracy approaching Full FT on many tasks at 1/10 to 1/100 of the cost. Since 2024, it has become the dominant approach in practical applications.
Instruction Tuning is somewhat different in nature — it teaches the model the ability to follow instructions. The reason ChatGPT can engage in natural dialogue is also a result of fine-tuning the base model on a large number of instruction-response pairs.
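To make "instruction-response pairs" concrete, here is a minimal sketch of how such a pair is often serialized into a single supervised training string. The template shown is an assumption (an Alpaca-style prompt format), not a fixed standard; each project defines its own.

```python
# Turn one instruction-response pair into a training text.
# The "### Instruction:/### Response:" template is an illustrative
# convention, not a universal format.

def format_example(instruction: str, response: str) -> str:
    """Join an instruction-response pair into one supervised training string."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
        f"{response}"
    )

pair = {
    "instruction": "Summarize our refund policy in one sentence.",
    "response": "Refunds are issued within 14 days of purchase.",
}
print(format_example(pair["instruction"], pair["response"]))
```

Fine-tuning on many such strings — with the loss typically computed only on the response portion — is what teaches a base model to follow instructions.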
Regardless of which method you choose, the quality of training data determines everything. A thousand carefully annotated data points will yield better results than ten thousand pieces of rough data — this is a lesson the author has learned time and again firsthand.


PEFT (Parameter-Efficient Fine-Tuning) is a collective term for fine-tuning methods that adapt a large language model to a specific task with minimal computational resources and data, by updating only a subset of the model's parameters rather than all of them.

LoRA (Low-Rank Adaptation) is a technique that inserts low-rank delta matrices into the weight matrices of large language models and trains only those deltas, enabling fine-tuning by adding approximately 0.1–1% of the total model parameters.
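The mechanics can be shown in a few lines of NumPy. This is a toy sketch with made-up dimensions: the frozen weight W is augmented by a trainable low-rank update (alpha / r) * B @ A, and only A and B are trained. B is initialized to zero so the model's behavior is unchanged at the start of training.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 512, 512, 8, 16  # toy sizes for illustration

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable, zero-initialized

def lora_forward(x):
    # Base path plus the low-rank delta; with B = 0 this equals W @ x.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity delta at init

# Trainable parameters: A and B together hold 2*r*d values,
# versus d*d for the full matrix.
share = (A.size + B.size) / W.size
print(f"trainable share: {share:.1%}")  # 3.1% at these toy sizes
```

With realistic model dimensions (d in the thousands) and LoRA applied only to selected weight matrices, the trainable share drops into the 0.1–1% range cited above.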

Knowledge Distillation is a technique that transfers knowledge from a large teacher model to a small student model, creating a lightweight yet high-accuracy model.
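A common way to do this transfer is to train the student to match the teacher's temperature-softened output distribution. The sketch below, on toy logits, computes the standard KL-divergence distillation loss (scaled by T² following Hinton et al.); the arrays and temperature are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher_T || student_T), scaled by T^2 as in Hinton et al."""
    p = softmax(teacher_logits / T)  # softened teacher targets
    q = softmax(student_logits / T)  # softened student predictions
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([4.0, 1.0, 0.5])   # toy logits
student_far = np.array([0.0, 2.0, 1.0])
student_near = np.array([3.5, 1.0, 0.5])

# A student whose logits are closer to the teacher's incurs a lower loss.
assert distill_loss(teacher, student_near) < distill_loss(teacher, student_far)
```

In practice this soft-target loss is usually mixed with the ordinary cross-entropy on ground-truth labels, with the temperature controlling how much of the teacher's "dark knowledge" about non-target classes is passed on.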


What is PEFT (Parameter-Efficient Fine-Tuning)? A Technology That Reduces AI Model Customization Costs by 90%

A base model (Foundation Model) is a general-purpose AI model pre-trained on large-scale datasets. Rather than being specialized for a specific task, it functions as a "foundation" that can be adapted to a wide range of applications through fine-tuning or prompt engineering.