Semantic Search

Semantic search is a retrieval method that ranks results by the semantic proximity between a query and documents. Rather than matching keyword strings, it uses an embedding model to map text to vectors and measures relevance with a similarity or distance function such as cosine similarity.
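As a rough illustration of the similarity measure mentioned above, the following sketch computes cosine similarity in plain Python. The three-dimensional vectors are hand-picked stand-ins, not output from any real embedding model, and the names `query`, `doc_close`, and `doc_far` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings", chosen by hand for illustration only.
query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 0.8]   # points in almost the same direction as the query
doc_far = [0.0, 1.0, 0.0]     # orthogonal to the query

print(cosine_similarity(query, doc_close))  # close to 1
print(cosine_similarity(query, doc_far))    # 0.0
```

Cosine similarity depends only on the direction of the vectors, not their length, which is one reason it is a common choice for comparing embeddings.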

Traditional keyword search (sparse models, typified by BM25) scores a document by whether, and how often, the words in the query appear in it. A search for "automobile" returns documents containing "automobile," but it misses documents that use only synonyms such as "car" or "auto."
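The vocabulary-mismatch problem can be seen in a small BM25 scorer. This is a minimal sketch, assuming a naive lowercase tokenizer and the commonly used defaults `k1=1.5` and `b=0.75`; the three documents are made up for illustration, and production engines add stemming, stop words, and other refinements.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption of this sketch): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not documents:
        return []
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            f = tf[term]
            if f == 0:
                continue  # a term that does not appear contributes nothing
            # The +1 inside the log keeps IDF non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores

docs = [
    "Automobile maintenance schedule",
    "How to wash your car",
    "Auto insurance basics",
]
print(bm25_scores("automobile", docs))
```

Only the first document receives a non-zero score. The "car" and "auto" documents score exactly zero because none of their tokens equals "automobile".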

Semantic search addresses this limitation. It converts text into vectors of hundreds to thousands of dimensions using an embedding model, then performs nearest-neighbor search in a vector database. "I want to improve my automobile's fuel efficiency" and "ways to reduce a car's gasoline consumption" share almost no vocabulary, yet they are mapped to nearby positions in semantic space and therefore match.
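The nearest-neighbor step can be sketched as an exact, brute-force search over an in-memory list. The document IDs and vectors below are hypothetical toy values; real embeddings have far more dimensions, and vector databases typically use approximate indexes rather than scanning every vector.

```python
import heapq
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def top_k(query_vec, index, k=2):
    """Exact nearest-neighbor search by cosine similarity.

    index is a list of (doc_id, vector) pairs; returns the k best
    (doc_id, similarity) pairs, most similar first.
    """
    q = normalize(query_vec)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(vec))), doc_id)
        for doc_id, vec in index
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]

# Hypothetical index with hand-picked 3-dimensional vectors.
index = [
    ("fuel-tips", [0.9, 0.1, 0.0]),
    ("car-wash", [0.5, 0.5, 0.1]),
    ("tax-guide", [0.0, 0.1, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]

for doc_id, score in top_k(query_vec, index, k=2):
    print(doc_id, round(score, 3))
```

The scan is linear in the number of stored vectors, which is fine for a few thousand documents but is the part an approximate index replaces at larger scale.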

Where It Excels and Where It Falls Short

Semantic search excels at paraphrasing, synonyms, and concept-level queries. It delivers high recall for queries that are worded differently but share the same intent, such as "steps for the resignation process" and "what to do when leaving a company." It is well suited to internal knowledge bases and FAQ search.

On the other hand, it struggles with queries that require exact vocabulary matches, such as model numbers (XR-990), legal statute numbers, or program code. In embedding space, "XR-990" and "XR-991" may be mapped to nearly identical positions, making them indistinguishable. To compensate for this weakness, hybrid search combining semantic search with BM25 has been widely adopted in practice.
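One common way to combine the two result lists is reciprocal rank fusion (RRF). The text above does not say which fusion method is used, so treat this as an assumed example rather than the method the article has in mind; weighted score blending is another option. The rankings are hypothetical, and `k=60` is a conventional default for the smoothing constant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break ties by doc_id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical results for the query "XR-990".
# The semantic ranker confuses the near-identical model numbers.
semantic_ranking = ["XR-991", "XR-990", "XR-100"]
# The keyword ranker returns only the exact token match.
bm25_ranking = ["XR-990"]

print(reciprocal_rank_fusion([semantic_ranking, bm25_ranking]))
```

Because RRF uses only rank positions, it avoids having to put BM25 scores and cosine similarities on a common scale.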

Role in RAG

In RAG (Retrieval-Augmented Generation), semantic search serves as the core of the retrieval phase. The user's question is vectorized, semantically relevant chunks are retrieved from an external knowledge base, and these are passed to the LLM. If retrieval accuracy is low at this stage, the LLM generates its response from irrelevant documents, which can lead to hallucinations.
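The retrieval phase can be sketched end to end as follows. Note the loud assumption: `embed` here is a bag-of-words stand-in so the example runs without any model, which means it is lexical, not semantic. In a real pipeline it would be replaced by a call to an embedding model. The function names, chunks, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    # STAND-IN for a real embedding model (assumption of this sketch):
    # a sparse bag-of-words vector, used only to keep the example runnable.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Vectorize the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks, k=2):
    """Assemble the text that would be passed to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

chunks = [
    "Employees must submit a resignation letter 30 days before leaving.",
    "Cafeteria opens at 11:00 and closes at 14:00.",
    "Return your laptop and badge on your last day before leaving.",
]
print(build_prompt("What are the steps for leaving the company?", chunks))
```

Whatever `retrieve` returns is all the LLM sees as context, which is why a poor ranking at this step carries straight through to the generated answer.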

In practice, retrieval quality depends most on two choices: the embedding model (whether multilingual support is needed, and whether domain-specific fine-tuning helps) and the chunk size. In the author's experience, simply changing the chunk size from 256 tokens to 512 tokens with the same model has shifted Recall@10 by more than 10 points. For that reason, the model and chunk size should be evaluated together rather than tuned separately.
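The two quantities discussed above, chunk size and Recall@10, can be sketched as small helpers. This is an illustrative sketch only: `chunk_tokens` assumes the text has already been tokenized (real token counts depend on the embedding model's tokenizer), and the document IDs in the usage lines are made up.

```python
def chunk_tokens(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def recall_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the relevant items that appear in the top-k retrieved items."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    hits = relevant & set(retrieved_ids[:k])
    return len(hits) / len(relevant)

def mean_recall_at_k(runs, k=10):
    """Average Recall@k over (retrieved_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)

# Hypothetical usage: whitespace tokens stand in for model tokens.
tokens = "one two three four five six seven eight nine ten".split()
print(chunk_tokens(tokens, 4))

# Two made-up queries with their retrieved and relevant document IDs.
runs = [
    (["d1", "d7", "d3"], ["d1", "d3"]),   # both relevant documents found
    (["d4", "d9", "d2"], ["d2", "d5"]),   # one of two found
]
print(mean_recall_at_k(runs, k=10))
```

To compare 256-token and 512-token chunks, the same query set would be run against an index built at each size, with each model under consideration. Since chunk boundaries move when the size changes, it is usually easier to record relevance against source documents than against chunk IDs.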