Fine-tuning

Fine-tuning is the process of further training a pre-trained machine learning model on additional data in order to adapt it to a specific task or domain.

"Generally intelligent, but unaware of our business operations": this is a wall you will almost certainly hit when trying to deploy an LLM in practice. Fine-tuning is how you tailor a general-purpose model to your organization's specific needs.

Fine-tuning has been a standard NLP workflow since the BERT era (around 2018). The two-stage framework, in which pre-training teaches the general structure of language and fine-tuning then layers task-specific patterns on top of it, remains unchanged today. What has changed is the scale of the models and the cost that comes with it.

Modern fine-tuning broadly falls into three categories.

Full FT (full fine-tuning) updates all parameters of the model. It can achieve the highest accuracy, but for a 70B model it typically requires at least eight A100 80GB GPUs, and often more once gradients and optimizer state are counted, and training can take several days. It is best suited to research institutions and large technology companies with ample budget and time.
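The hardware figure above can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes mixed-precision training with the Adam optimizer and the common rule of thumb of about 16 bytes per parameter; it ignores activations and any memory-saving techniques, so it is a rough guide rather than a measurement. The function names are illustrative.

```python
import math

# Assumed per-parameter memory for mixed-precision Adam training
# (a rule of thumb, not a measurement; activations are ignored).
BYTES_WEIGHTS_FP16 = 2
BYTES_GRADS_FP16 = 2
BYTES_OPTIMIZER_FP32 = 12  # fp32 master weights + Adam momentum + Adam variance

def full_ft_memory_gb(num_params: float) -> float:
    """Estimate training-state memory for full fine-tuning, in GB (1e9 bytes)."""
    bytes_per_param = BYTES_WEIGHTS_FP16 + BYTES_GRADS_FP16 + BYTES_OPTIMIZER_FP32
    return num_params * bytes_per_param / 1e9

def min_gpus(num_params: float, gpu_memory_gb: float = 80.0) -> int:
    """Smallest number of GPUs whose combined memory holds the training state."""
    return math.ceil(full_ft_memory_gb(num_params) / gpu_memory_gb)

total_gb = full_ft_memory_gb(70e9)  # 1120.0 GB for a 70B model
gpus = min_gpus(70e9)               # 14 GPUs with 80 GB each
print(f"{total_gb:.0f} GB of training state, at least {gpus} x 80 GB GPUs")
```

Under these assumptions the training state alone exceeds the memory of eight 80GB cards, which suggests that an eight-GPU setup relies on techniques such as CPU offloading or memory-efficient optimizers.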

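PEFT (parameter-efficient fine-tuning; LoRA, QLoRA, etc.) freezes most of the model and trains only a small set of parameters, either a subset of the existing weights or small added modules. On many tasks it approaches the accuracy of Full FT at roughly 1/10 to 1/100 of the cost. Since 2024, it has increasingly become the dominant approach in practical applications.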
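To make the idea concrete, here is a minimal pure-Python sketch of the LoRA update, assuming the usual formulation in which a frozen weight matrix W is augmented by a low-rank product scaled by alpha / r. The dimensions, helper names, and initialization values are illustrative assumptions; real projects would use a library rather than hand-written matrix code.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

def lora_trainable_params(d_in, d_out, r):
    """Parameters in A (r x d_in) plus B (d_out x r)."""
    return r * (d_in + d_out)

# Toy sizes chosen only for illustration.
d_in, d_out, r, alpha = 6, 4, 2, 4
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable
B = [[0.0] * r for _ in range(d_out)]                                # trainable, starts at zero
x = [rng.gauss(0, 1) for _ in range(d_in)]

y_base = matvec(W, x)
y_lora = lora_forward(W, A, B, x, alpha, r)  # equals y_base while B is zero

full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, 8)
print(f"trainable fraction for one 4096x4096 layer at r=8: {lora / full:.4%}")
```

Because B starts at zero, the adapted model initially behaves exactly like the base model, and training only has to learn the small matrices A and B. For a single 4096 x 4096 layer at rank 8 that is 65,536 trainable values instead of about 16.8 million.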

Instruction Tuning is somewhat different in nature: it is defined by what the model is taught rather than by how many parameters are updated, and it teaches the model to follow instructions. ChatGPT's ability to engage in natural dialogue is in large part the result of fine-tuning a base model on a large number of instruction-response pairs.
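As a rough illustration of what instruction-response pairs look like as training data, the sketch below turns a couple of made-up pairs into training text. The template string, field names, and sample pairs are all hypothetical; in practice you would use the prompt format that your base model and training library expect.

```python
# Hypothetical prompt template, used here only for illustration.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def build_example(instruction: str, response: str) -> dict:
    """Format one instruction-response pair into prompt, completion, and full text."""
    prompt = TEMPLATE.format(instruction=instruction)
    return {"prompt": prompt, "completion": response, "text": prompt + response}

# Made-up sample pairs; a real dataset would contain thousands of these.
pairs = [
    {"instruction": "Summarize: The meeting moved from Thursday to Friday.",
     "response": "The meeting is now on Friday."},
    {"instruction": "Translate to French: Good morning.",
     "response": "Bonjour."},
]

dataset = [build_example(p["instruction"], p["response"]) for p in pairs]
print(dataset[0]["text"])
```

Keeping the prompt and completion separate is a deliberate choice: many training setups compute the loss only on the completion, so that the model learns to produce responses rather than to reproduce instructions.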

Whichever method you choose, the quality of the training data matters more than anything else. A thousand carefully annotated examples will often yield better results than ten thousand rough ones. This is a lesson the author has learned firsthand time and again.