A token is the smallest unit used by an LLM when processing text. It is not necessarily a whole word; it can include parts of words, symbols, and spaces — essentially the fragments resulting from splitting text based on the model's vocabulary.
## Different from Words

When you hear "token," you might think of words, but in practice tokens are a bit more granular. The English word "unbelievable" can be split into three tokens: "un", "believ", and "able". Japanese is even more complex: a single hiragana character may be one token, while a single kanji character can consume 2–3 tokens.

This splitting process is called tokenization, and each model uses a different algorithm (BPE, SentencePiece, etc.). This is why the token count for the same text can vary depending on the model.

## Why Token Count Matters

The cost and performance of LLMs are almost entirely determined by token count. API pricing typically follows a pay-per-use model based on the number of input and output tokens, and the context window (the amount of text a model can handle at once) is also defined in terms of tokens.

Token count is also directly tied to inference speed. In dense models, all parameters are involved in processing each token, so the computational load grows in proportion to the token count. This constraint is why techniques to compress input are often required for long-document summarization tasks.

## Practical Estimation

For English, a commonly used rule of thumb is "1 token ≈ 4 characters ≈ 0.75 words." Japanese is less token-efficient, tending to consume 1.5–2 times as many tokens as English for the same meaning. When designing multilingual systems, this difference must be factored into cost estimates.
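The rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the 1.5-tokens-per-character figure for Japanese is an illustrative assumption derived from the hiragana/kanji observations above, and actual counts always depend on the specific model's tokenizer.

```python
def estimate_tokens(text: str, lang: str = "en") -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    if lang == "ja":
        # Crude assumption: ~1.5 tokens per character, since a hiragana
        # character is often 1 token and a kanji can be 2-3 tokens.
        return max(1, round(len(text) * 1.5))
    # English rule of thumb: 1 token is roughly 4 characters.
    return max(1, round(len(text) / 4))

print(estimate_tokens("unbelievable"))  # 12 characters -> ~3 tokens
```

For anything cost-sensitive, use the model vendor's own tokenizer to count exactly rather than a heuristic like this.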


SLM (Small Language Model) is a general term for language models with parameter counts ranging from roughly a few billion up to about ten billion, characterized by the ability to run inference and fine-tuning with far fewer computational resources than LLMs.

LLM (Large Language Model) is a general term for neural network models pre-trained on massive amounts of text data, containing billions to trillions of parameters, capable of understanding and generating natural language with high accuracy.

A local LLM refers to an operating setup in which a large language model runs directly on one's own server or PC rather than through a cloud API.



Chunk size refers to the size (in number of tokens or characters) of the unit into which documents are split when stored in a vector store within a RAG pipeline. It is a critical parameter that directly affects retrieval accuracy and answer quality.
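As a minimal sketch of the idea, the splitting step can be done on character counts with an overlap between consecutive chunks, so that a sentence cut at a boundary still appears whole in one of them. The `chunk_text` helper and its default sizes are illustrative assumptions; production RAG pipelines typically split on token counts or sentence boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks, with `overlap`
    characters shared between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than chunk_size so chunks overlap.
        start += chunk_size - overlap
    return chunks
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; larger chunks preserve context but dilute the embedding, which is why chunk size is usually tuned per corpus.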