Knowledge Distillation

A technique that transfers knowledge from a large teacher model to a small student model, creating a lightweight yet high-accuracy model.
What is Knowledge Distillation?
Knowledge distillation is a technique in which a smaller "student model" is trained to reproduce the output distribution of a larger "teacher model", using the teacher's predictions as training targets. By mimicking the teacher's predictions, the student can retain much of the teacher's accuracy with far fewer parameters.
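The idea of training on the teacher's output distribution can be sketched as a loss function. The following is a minimal illustration of one common soft-label formulation, not a definitive implementation: the function names (`softmax`, `kl_divergence`, `distillation_loss`) and the `temperature` and `alpha` parameters are assumptions introduced here for illustration, and a real training loop would compute this with a deep learning framework over batches.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    alpha and temperature are illustrative hyperparameters (assumed values).
    """
    # Soft targets: the student matches the teacher's softened distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    soft_loss = kl_divergence(p_teacher, p_student_soft) * temperature ** 2

    # Hard targets: ordinary cross-entropy against the ground-truth label.
    p_student = softmax(student_logits)
    hard_loss = -math.log(p_student[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss

# Hypothetical logits for a 3-class problem where class 0 is correct.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(teacher, close_student, true_label=0))
print(distillation_loss(teacher, far_student, true_label=0))
```

A student whose logits resemble the teacher's receives a lower loss than one that disagrees with it, which is the signal that pushes the student toward the teacher's behavior during training.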
Why is Distillation Necessary?
Deploying an LLM with tens of billions of parameters directly in production turns GPU cost and latency into business constraints. Training a small model from scratch avoids those costs, but such a model rarely reaches the accuracy of a large one. Distillation is a practical way to ease this trade-off.
For example, Microsoft's Phi series trains small models largely on synthetic data generated by larger models, and Microsoft reports benchmark results competitive with considerably larger models even though Phi is an SLM (Small Language Model).
Differences from Fine-Tuning
Fine-tuning adjusts the weights of an existing model to specialize it for a specific task and leaves the model size unchanged. Distillation differs in that it produces a smaller model. In practice, a two-stage pipeline is becoming increasingly common: the model is first made smaller through distillation and then adapted to a business domain using LoRA or a similar method.
Limitations of Distillation
The student inherits the teacher's weaknesses: tasks that the teacher model struggles with will also be difficult for the student model. In addition, because a large volume of outputs must be generated from the teacher model, the computational cost of the distillation process itself can be substantial.
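To get a feel for the generation cost, a back-of-the-envelope estimate can help. The sketch below assumes cost scales linearly with the number of tokens the teacher generates and ignores prompt tokens; the function name and every figure in the example are hypothetical placeholders, not measurements.

```python
def teacher_generation_cost(num_examples, avg_output_tokens, price_per_million_tokens):
    """Rough cost of generating a distillation dataset from a teacher model.

    Assumes a flat price per generated token and ignores prompt (input) tokens.
    """
    total_tokens = num_examples * avg_output_tokens
    return total_tokens / 1_000_000 * price_per_million_tokens

# Hypothetical figures: 500,000 examples, 400 output tokens each,
# 10.0 currency units per million generated tokens.
print(teacher_generation_cost(500_000, 400, 10.0))  # 2000.0
```

Substituting real dataset sizes and real serving prices shows whether generating the teacher outputs is a minor or a dominant part of the project budget.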