Token

A token is the smallest unit of text an LLM processes. It is not necessarily a whole word: it can be part of a word, a symbol, or whitespace. In short, tokens are the fragments produced when text is split according to the model's vocabulary.

Different from Words

The word "token" may suggest whole words, but in practice tokens are usually finer-grained. The English word "unbelievable", for example, might be split into 3 tokens: "un", "believ", and "able" (the exact split depends on the tokenizer). Japanese is more complicated: a single hiragana character may become one token, while a single kanji character can consume 2–3 tokens.
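One plausible reason a kanji can cost several tokens is sketched below. It assumes a byte-level tokenizer (the text does not say which kind is meant): when a character has no merged entry in the vocabulary, such a tokenizer falls back to the character's UTF-8 bytes. The helper name `utf8_byte_count` is made up for this illustration.

```python
def utf8_byte_count(ch: str) -> int:
    """Return how many bytes the given string occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# An ASCII letter is 1 byte; hiragana and common kanji are 3 bytes each.
# Under the byte-level assumption, 3 bytes is the worst-case token cost
# for such a character when the vocabulary has no merged entry for it.
for ch in ["a", "あ", "漢"]:
    print(ch, utf8_byte_count(ch))
```

Frequent characters such as hiragana usually do have a merged vocabulary entry, which is consistent with them mapping to a single token.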

This splitting process is called tokenization. Models differ in both the algorithm (BPE, WordPiece, or unigram, often implemented with a library such as SentencePiece) and the vocabulary they learn. This is why the token count for the same text can vary depending on the model.
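To make the dependence on vocabulary concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a deliberate simplification and is not BPE or any production algorithm. The two vocabularies, `VOCAB_A` and `VOCAB_B`, are invented for illustration.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match split; unknown characters become 1-char tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        # A single character is always accepted as a last resort.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


# Two hypothetical vocabularies standing in for two different models.
VOCAB_A = {"un", "believ", "able"}
VOCAB_B = {"unbeliev", "able"}

print(tokenize("unbelievable", VOCAB_A))  # ['un', 'believ', 'able']
print(tokenize("unbelievable", VOCAB_B))  # ['unbeliev', 'able']
```

The same word costs 3 tokens under one vocabulary and 2 under the other, which is the effect described above.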

Why Token Count Matters

The cost and practical limits of an LLM are largely determined by token count. API pricing is typically usage-based, billed on the number of input and output tokens, and the context window (the amount of text a model can handle at once) is also defined in terms of tokens.
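The billing and context-window arithmetic can be sketched as follows. The prices and window size are hypothetical placeholders, not any provider's actual rates. The sketch also assumes that input and output tokens share a single context window, which is common but not universal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one request under simple per-token billing."""
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


def fits_context(input_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """Assumes input and output tokens share the same window."""
    return input_tokens + max_output_tokens <= context_window


pricing = Pricing(input_per_million=3.0, output_per_million=15.0)  # made-up rates
print(request_cost(2_000, 500, pricing))
print(fits_context(2_000, 500, context_window=8_192))
```

Output tokens are often priced higher than input tokens, so the two counts are kept separate rather than summed.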

Token count is also directly tied to inference speed. In dense models, all parameters are involved in processing each token, so the computational load grows at least in proportion to the token count. This constraint is why long-document summarization tasks often require techniques for compressing the input.
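As a rough illustration of this proportionality, the sketch below uses a commonly cited approximation: a dense model's forward pass costs about 2 floating-point operations per parameter per token. This ignores attention over the sequence, so treat it as a lower-bound estimate. The function name and the 7-billion-parameter figure are assumptions for the example.

```python
def dense_forward_flops(num_params: int, num_tokens: int) -> int:
    """Approximate forward-pass FLOPs: ~2 per parameter per token.

    Rough rule of thumb only; attention cost over the sequence is ignored.
    """
    return 2 * num_params * num_tokens


PARAMS = 7_000_000_000  # hypothetical 7B-parameter dense model

short = dense_forward_flops(PARAMS, 1_000)
long = dense_forward_flops(PARAMS, 2_000)
print(short, long, long / short)  # doubling the tokens doubles the estimate
```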

Practical Estimation

For English, a commonly used rule of thumb is 1 token ≈ 4 characters ≈ 0.75 words. Japanese is less token-efficient and tends to consume roughly 1.5–2 times as many tokens as English for the same meaning, though the exact ratio depends on the tokenizer. When designing multilingual systems, factor this difference into cost estimates.
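The rule of thumb translates directly into a few helper functions. This is a back-of-the-envelope sketch: the function names are made up, and the constants are the heuristics quoted above, not measured values. Use the model's actual tokenizer for real budgets.

```python
import math


def tokens_from_chars(num_chars: int) -> int:
    """English heuristic: about 4 characters per token."""
    return math.ceil(num_chars / 4)


def tokens_from_words(num_words: int) -> int:
    """English heuristic: about 0.75 words per token."""
    return math.ceil(num_words / 0.75)


def japanese_token_range(english_tokens: int) -> tuple[int, int]:
    """Heuristic range: 1.5x to 2x the English token count for the same meaning."""
    return math.ceil(english_tokens * 1.5), math.ceil(english_tokens * 2)


english = tokens_from_words(75)
print(english)                        # 100
print(tokens_from_chars(400))         # 100
print(japanese_token_range(english))  # (150, 200)
```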