QLoRA (Quantized LoRA) is a method that combines LoRA with 4-bit quantization, enabling fine-tuning of large language models even on consumer-grade GPUs.
QLoRA, announced in 2023, was a direct answer to practitioners' complaint that "we don't have enough GPUs." The core idea is simple: quantize the base model's weights to 4-bit to cut GPU memory dramatically, then train only the LoRA adapters in 16-bit precision. In other words, it follows a two-stage design philosophy: load light, train precise.

In concrete numbers, loading a 65B-parameter model at full precision requires multiple A100 80GB GPUs, whereas QLoRA fits it on a single card. A 7B model can even be trained on an RTX 3090 or RTX 4090 (24GB each), and cloud GPU rental costs can often drop to less than a tenth of full fine-tuning.

There are caveats, however: the accuracy loss from 4-bit quantization is not zero. In the author's own experiments, the gap versus full-precision LoRA was negligible for simple classification and summarization tasks, but scores dropped by roughly 1–3% on tasks requiring mathematical reasoning or sustained logical argument in long-form text. In practice, a rational approach is to start with QLoRA and switch to full-precision LoRA only if accuracy falls short.
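The two-stage idea, quantized frozen base weights plus a small trainable adapter, can be sketched in a few lines of NumPy. This is a simplified illustration: real QLoRA uses the nonlinear NF4 codebook with double quantization, while the sketch below uses plain blockwise absmax quantization, and all function names are my own.

```python
import numpy as np

def quantize_4bit(w, block_size=64):
    """Blockwise absmax quantization: each block of weights is mapped to
    4-bit signed integers in [-7, 7], storing one fp32 scale per block."""
    flat = w.reshape(-1, block_size)
    absmax = np.abs(flat).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0                        # avoid division by zero
    q = np.round(flat / absmax * 7).astype(np.int8)  # 4-bit range, stored in int8
    return q, absmax

def dequantize_4bit(q, absmax, shape):
    """Reconstruct an approximate fp32 weight from the 4-bit codes."""
    return (q.astype(np.float32) / 7 * absmax).reshape(shape)

def qlora_forward(x, q, absmax, shape, A, B):
    """y = x @ dequant(W).T + x @ A.T @ B.T
    The quantized base weight is frozen; only the LoRA pair (A, B) trains."""
    W = dequantize_4bit(q, absmax, shape)
    return x @ W.T + (x @ A.T) @ B.T

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 64)).astype(np.float32)             # frozen base weight
q, absmax = quantize_4bit(W)
A = rng.normal(scale=0.01, size=(4, 64)).astype(np.float32)  # rank-4 adapter
B = np.zeros((16, 4), dtype=np.float32)                      # B starts at zero, as in LoRA
x = rng.normal(size=(2, 64)).astype(np.float32)
y = qlora_forward(x, q, absmax, W.shape, A, B)
```

Because `B` is initialized to zero, the adapted model starts out identical to the quantized base model; training then moves only `A` and `B`, which is why optimizer state stays tiny even for large base models.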


A2A (Agent-to-Agent Protocol) is a communication protocol that enables different AI agents to perform capability discovery, task delegation, and state synchronization, published by Google in April 2025.
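To make "capability discovery" and "task delegation" concrete, here is a hedged local sketch of the flow: one agent reads another's published capability card, checks for a skill, and hands off a task. The dict fields and function names are illustrative only, not the exact A2A schema, and a real exchange would happen over HTTP rather than in-process.

```python
# Capability card published by a remote agent for discovery.
# Field names are illustrative, not the official A2A schema.
agent_card = {
    "name": "translator-agent",
    "description": "Translates documents between languages",
    "skills": [{"id": "translate", "description": "Translate text"}],
}

def discover_skill(card, skill_id):
    """Capability discovery: does the remote agent offer this skill?"""
    return any(s["id"] == skill_id for s in card["skills"])

def delegate_task(card, skill_id, payload):
    """Task delegation: hand work to the remote agent and track its state."""
    if not discover_skill(card, skill_id):
        return {"state": "rejected"}
    # A real A2A exchange would go over the network; we simulate it locally.
    return {"state": "completed", "result": f"translated:{payload}"}

task = delegate_task(agent_card, "translate", "hello")
```

The returned `state` field stands in for the protocol's state synchronization: the delegating agent polls or receives updates until the task reaches a terminal state.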

Agent Skills are reusable instruction sets defined to enable AI agents to perform specific tasks or areas of expertise, functioning as modular units that extend the capabilities of an agent.

Agentic AI is a general term for AI systems that interpret goals and autonomously repeat the cycle of planning, executing, and verifying actions without requiring step-by-step human instruction.
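The plan-execute-verify cycle described above can be sketched as a simple loop. This is a toy illustration with hypothetical function names: the "planner" here is trivial, whereas a real agentic system would call an LLM to choose actions and judge completion.

```python
def plan(state):
    """Toy planner: always choose the one available action.
    A real agent would pick an action from the goal and history."""
    return "increment"

def verify(state):
    """Check whether the goal has been satisfied."""
    return state["value"] >= state["goal"]

def run_agent(goal, tools, max_steps=20):
    """Repeat plan -> execute -> verify until the goal is met or budget runs out."""
    state = {"goal": goal, "history": [], "value": 0}
    for _ in range(max_steps):
        action = plan(state)                    # decide the next step
        result = tools[action](state)           # execute it with a tool
        state["history"].append((action, result))
        if verify(state):                       # stop once the goal is reached
            return state, True
    return state, False

# Toy goal: reach the value 5 by repeatedly applying an "increment" tool.
tools = {"increment": lambda s: s.__setitem__("value", s["value"] + 1)}
final, done = run_agent(5, tools)
```

The key contrast with step-by-step instruction is that the caller supplies only the goal and the tools; the loop itself decides how many actions to take and when to stop.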
