A token is the smallest unit used by an LLM when processing text. It is not necessarily a whole word; it can include parts of words, symbols, and spaces — essentially the fragments resulting from splitting text based on the model's vocabulary.
When you hear "token," you might think of words, but in practice they are a bit more granular. The English word "unbelievable" can be split into 3 tokens: "un", "believ", and "able". Japanese is even more complex — a single hiragana character may become one token, while a single kanji character can consume 2–3 tokens.
This splitting process is called tokenization, and each model uses a different algorithm (BPE, SentencePiece, etc.). This is why the token count for the same text can vary depending on the model.
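To make the subword idea concrete, here is a minimal sketch of greedy longest-match tokenization over a toy vocabulary. The vocabulary below is purely illustrative; real tokenizers (BPE, SentencePiece) learn theirs from data, and the single-character fallback stands in for the byte-level fallback real models use.

```python
# Toy vocabulary — an assumption for illustration, not any model's real vocab.
TOY_VOCAB = {"un", "believ", "able", "token", "ization"}

def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest matching vocabulary entries, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Fall back to a single character (byte-level fallback in real models).
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("unbelievable", TOY_VOCAB))  # → ['un', 'believ', 'able']
```

With a different vocabulary the same string would split differently, which is exactly why token counts vary across models.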
The cost and performance of LLMs are almost entirely determined by token count. API pricing typically follows a pay-per-use model based on the number of input and output tokens, and the context window (the amount of text a model can handle at once) is also defined in terms of tokens.
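A pay-per-use bill is straightforward to estimate once you know the token counts. The sketch below uses hypothetical placeholder prices, not the actual rates of any provider.

```python
# Hypothetical prices for illustration only — check your provider's rate card.
PRICE_PER_1M_INPUT = 3.00    # USD per 1M input tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per 1M output tokens (assumed)

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from input/output token counts."""
    return (input_tokens / 1_000_000 * PRICE_PER_1M_INPUT
            + output_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT)

# 50k input tokens and 10k output tokens:
print(f"${api_cost(50_000, 10_000):.2f}")  # → $0.30
```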
Token count is also directly tied to inference speed. In dense models, every parameter participates in processing each token, so the computational load grows proportionally with token count. This constraint is why long-document summarization tasks often require techniques to compress the input.
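The linear scaling can be sketched with a common rule of thumb: a dense transformer's forward pass costs roughly 2 FLOPs per parameter per token. The exact constant varies with architecture; this is a planning heuristic, not a precise measurement.

```python
# Rule-of-thumb compute estimate for a dense model's forward pass:
# roughly 2 FLOPs per parameter per token processed.
def forward_flops(n_params: int, n_tokens: int) -> float:
    return 2.0 * n_params * n_tokens

# A (hypothetical) 7B-parameter dense model over 1,000 tokens:
print(f"{forward_flops(7_000_000_000, 1_000):.1e} FLOPs")

# Doubling the token count doubles the estimated compute — the linear
# scaling described above.
assert forward_flops(7_000_000_000, 2_000) == 2 * forward_flops(7_000_000_000, 1_000)
```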
For English, a commonly used rule of thumb is "1 token ≈ 4 characters ≈ 0.75 words." Japanese is less token-efficient, tending to consume 1.5–2 times as many tokens as English for the same meaning. When designing multilingual systems, this difference must be factored into cost estimates.
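These rules of thumb translate directly into a rough estimator. The helpers below are planning heuristics only; actual counts depend on the model's tokenizer, and the 1.75× multiplier for Japanese is simply the midpoint of the 1.5–2× range given above.

```python
def estimate_tokens_en(text: str) -> int:
    """Rough English token estimate: 1 token ≈ 4 characters."""
    return round(len(text) / 4)

def estimate_tokens_ja(english_equivalent_tokens: int) -> int:
    """Scale an English estimate to Japanese: midpoint of the 1.5–2x range."""
    return round(english_equivalent_tokens * 1.75)

en = estimate_tokens_en("The quick brown fox jumps over the lazy dog")
print(en, estimate_tokens_ja(en))
```

When budgeting a multilingual system, applying the language multiplier to both the context-window check and the cost estimate avoids surprises later.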


BPE (Byte Pair Encoding) is an algorithm that iteratively merges frequently occurring patterns in text and splits it into subword units. It directly affects the input/output cost and processing speed of LLMs; for low-resource languages, an insufficient dedicated vocabulary forces decomposition down to the byte level.
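A minimal sketch of one BPE training step: count adjacent symbol pairs across a corpus and merge the most frequent pair everywhere. Real implementations repeat this until a target vocabulary size is reached; the tiny three-word corpus here is illustrative only.

```python
from collections import Counter

def most_frequent_pair(words: list[list[str]]) -> tuple[str, str]:
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter()
    for symbols in words:
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(words: list[list[str]], pair: tuple[str, str]) -> list[list[str]]:
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = []
    for symbols in words:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged.append(out)
    return merged

corpus = [list("lower"), list("lowest"), list("low")]
pair = most_frequent_pair(corpus)  # ('l', 'o') appears in all three words
print(merge_pair(corpus, pair))
```

Repeating the count-and-merge loop is what gradually builds subwords like "low" and "est" out of individual characters.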

SLM (Small Language Model) is a general term for language models with roughly a few billion to around ten billion parameters, characterized by the ability to perform inference and fine-tuning with far fewer computational resources than LLMs.

LLM (Large Language Model) is a general term for neural network models pre-trained on massive amounts of text data, containing billions to trillions of parameters, capable of understanding and generating natural language with high accuracy.


Local LLM / SLM Deployment Comparison — AI Utilization Without Cloud API Dependency