GPU (Graphics Processing Unit)

A GPU (Graphics Processing Unit) is a semiconductor chip that processes large volumes of parallel computations at high speed. Originally designed for rendering graphics, its parallel computing capabilities are well-suited for AI training and inference, making it an indispensable hardware component for LLM training and fine-tuning.
Why GPU Instead of CPU
CPUs are optimized for complex sequential processing and typically have only a few dozen cores. GPUs, on the other hand, can execute simple operations simultaneously across thousands to tens of thousands of cores. Neural network training is fundamentally a repetition of matrix operations, and this processing pattern aligns well with the parallel architecture of GPUs.
For example, when training a 70B-parameter dense model, a gradient must be computed for every parameter at every training step, and most of these computations are independent of one another. Work that would take months on a CPU processing it sequentially can be completed in days to weeks on a GPU cluster.
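To make the "matrix operations parallelize well" point concrete, here is a minimal sketch in plain Python. The function name `matmul` and the nested-list matrix representation are illustrative assumptions, not how any GPU library is implemented. What it shows is that each output cell depends only on one row of `a` and one column of `b`, so no cell has to wait for another.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists (illustrative only)."""
    inner, cols = len(b), len(b[0])
    # Each out[i][j] is an independent dot product: it reads row i of `a`
    # and column j of `b`, and never reads another output cell.
    # A CPU walks these cells one after another; a GPU can hand each
    # cell (or tile of cells) to a different thread at the same time.
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

A 2x2 product has only 4 independent cells, but the weight matrices of a large model have millions, which is where thousands of GPU cores pay off.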
The Constraint of VRAM
When discussing GPUs in the context of AI, VRAM (Video RAM) is just as important as computational performance. Model weights and activations must reside in VRAM while the GPU computes on them, so VRAM capacity effectively determines the upper limit on model size.
A single NVIDIA A100 (80GB) can hold the weights of a model with roughly 40B parameters in FP16 (2 bytes per parameter). Running a 70B dense model therefore requires at least 2 cards, and training one requires 8 or more, because gradients and optimizer state must be kept in memory alongside the weights. LoRA and QLoRA attract so much attention because they can dramatically reduce VRAM consumption.
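The arithmetic behind these figures can be sketched as follows. This is a rough estimate under stated assumptions: it counts the weights only (no activations, KV cache, gradients, or optimizer state), treats 1 GB as 10^9 bytes, and uses hypothetical helper names `weight_memory_gb` and `min_cards_for_weights`.

```python
import math

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params_billions, dtype="fp16"):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB
    return params_billions * BYTES_PER_PARAM[dtype]


def min_cards_for_weights(params_billions, dtype="fp16", vram_gb=80):
    """Smallest number of cards whose combined VRAM holds the weights."""
    return math.ceil(weight_memory_gb(params_billions, dtype) / vram_gb)


if __name__ == "__main__":
    print(weight_memory_gb(40))               # 40B in FP16 fills one 80GB card
    print(min_cards_for_weights(70))          # 70B in FP16 needs 2 cards
    print(weight_memory_gb(70, dtype="int4")) # 4-bit weights, as QLoRA uses
```

Real deployments need headroom beyond this lower bound, which is why the card counts for training are much higher than the weight-only estimate suggests.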
Cloud vs. On-Premises
GPUs are expensive, with a single NVIDIA H100 costing several million yen. For this reason, many companies rent cloud GPUs (AWS, GCP, Azure) on demand. When running large volumes of inference continuously, however, an on-premises setup can be more cost-efficient, so the choice between the two is a critical decision when operating local LLMs.
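One simple way to frame the decision is a break-even calculation: how many hours of use it takes before buying a card becomes cheaper than renting one. The sketch below is a deliberately simplified model, and every number in the usage example is a hypothetical placeholder rather than a real price. The function name `break_even_hours` and its parameters are likewise assumptions for illustration.

```python
def break_even_hours(purchase_cost, onprem_hourly_cost, cloud_hourly_rate):
    """Hours of use after which owning a GPU is cheaper than renting.

    purchase_cost      -- up-front price of the card
    onprem_hourly_cost -- running cost per hour when owned (power, cooling, ...)
    cloud_hourly_rate  -- rental price per hour for a comparable cloud GPU
    All three must be in the same currency.
    """
    saving_per_hour = cloud_hourly_rate - onprem_hourly_cost
    if saving_per_hour <= 0:
        # Owning never pays off if each hour costs as much as renting.
        return float("inf")
    return purchase_cost / saving_per_hour


if __name__ == "__main__":
    # Hypothetical figures in yen, chosen only to show the calculation.
    hours = break_even_hours(
        purchase_cost=5_000_000,
        onprem_hourly_cost=100,
        cloud_hourly_rate=500,
    )
    print(hours, "hours, about", round(hours / 24), "days of continuous use")
```

The model ignores depreciation, staffing, and idle time, so it is a starting point for the comparison rather than a full cost analysis. It does capture the pattern described above: the more continuously the GPU is used, the sooner ownership breaks even.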