Quantization

An optimization technique that shrinks a model by lowering the precision of its parameters (for example, from 16-bit to 4-bit), making inference feasible on limited computational resources.
What is Quantization?
Quantization is an optimization technique that reduces the numerical precision of a model's weight parameters (e.g., 32-bit floating point → 4-bit integer) to compress model size and memory usage.
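As a rough illustration of what "reducing precision" means, the sketch below maps a handful of floating-point weights to 4-bit signed integers with a single shared scale factor (symmetric absmax quantization). This is only one simple scheme, chosen here for clarity; the function names and sample weights are hypothetical, and production methods typically quantize in small groups with additional refinements.

```python
def quantize_absmax(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


weights = [0.12, -0.53, 0.98, -1.40, 0.07]     # hypothetical sample weights
q, scale = quantize_absmax(weights, bits=4)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(q)          # small integers that fit in 4 bits
print(restored)   # close to, but not exactly, the original weights
```

Each weight is now stored as a small integer plus one shared scale, and the rounding error per weight is at most half a quantization step.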
Intuitive Understanding
Quantization is similar to lowering a photo's image quality to reduce its file size. Each parameter carries less information, yet the model's overall performance is preserved to a surprisingly high degree. For example, applying 4-bit quantization to a 70B-parameter model stored in 16-bit precision shrinks the memory needed for its weights from approximately 140 GB to around 35 GB, making inference possible without expensive GPU clusters.
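The 140 GB and 35 GB figures follow from simple arithmetic: parameter count times bits per parameter, divided by 8 bits per byte. A minimal sketch, assuming 1 GB = 10^9 bytes and counting the weights only (the helper name is hypothetical):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(70e9, bits):.0f} GB")
```

Actual usage is somewhat higher, since quantized formats also store scale factors and inference needs memory for activations and the KV cache.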
Types of Quantization
| Method | Characteristics |
|---|---|
| Post-Training Quantization (PTQ) | Quantizes an already-trained model without retraining. Straightforward, but accuracy can degrade noticeably, especially at low bit-widths. |
| Quantization-Aware Training (QAT) | Simulates quantization during training so the model learns to tolerate it. Generally more accurate than PTQ, but incurs training cost. |
| GPTQ / AWQ / GGUF | GPTQ and AWQ are post-training quantization methods designed for LLMs; GGUF is a file format for storing quantized models. All three are widely used to distribute local LLMs. |
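QLoRA is a technique that combines quantization with LoRA: the base model's weights are kept frozen in 4-bit form while small LoRA adapter weights are trained in higher precision, enabling fine-tuning with far less memory.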
Practical Decision Criteria
Multiple studies have reported that quantizing a larger model tends to yield higher performance than running a smaller model at full precision within the same memory budget. When selecting a model for edge AI environments, the best choice is therefore found by exploring combinations of model size and quantization bit-width rather than fixing either one in advance.
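One way to start that exploration is to list which combinations of model size and bit-width fit a given memory budget at all. The sketch below is illustrative only: the candidate sizes, the 24 GB budget, and the function name are assumptions, it counts weights only, and it says nothing about accuracy, which still has to be measured on the target task.

```python
def fitting_configs(param_counts, bit_widths, budget_gb):
    """Return (params, bits, weight_gb) tuples whose weights fit the budget."""
    configs = []
    for n in param_counts:
        for bits in bit_widths:
            gb = n * bits / 8 / 1e9            # weights only, 1 GB = 10**9 bytes
            if gb <= budget_gb:
                configs.append((n, bits, gb))
    # List larger models first, then higher precision within the same size.
    return sorted(configs, key=lambda c: (-c[0], -c[1]))


# Hypothetical candidates: 7B, 13B, and 70B models on a 24 GB device.
candidates = fitting_configs([7e9, 13e9, 70e9], [16, 8, 4], budget_gb=24)
for n, bits, gb in candidates:
    print(f"{n / 1e9:.0f}B @ {bits}-bit: {gb:.1f} GB")
```

The surviving configurations are the ones worth benchmarking against each other, for example a 13B model at 8-bit versus a 7B model at 16-bit.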