Semantic search is a method that returns results based on the "semantic proximity" between a query and documents. Rather than matching keyword strings, it converts text into a vector space using embeddings and measures relevance with similarity measures such as cosine similarity.
Traditional keyword search (sparse retrieval, typified by BM25) directly evaluates whether the words in a query appear in a document. Searching for "automobile" will return documents containing "automobile," but it cannot retrieve documents that use synonyms such as "car" or "auto."
Semantic search transcends this limitation. It converts text into vectors of hundreds to thousands of dimensions using an embedding model, then performs nearest-neighbor search on a vector database. "I want to improve my automobile's fuel efficiency" and "ways to reduce a car's gasoline consumption" share almost no overlapping vocabulary, yet they are mapped to nearby positions in semantic space and will therefore match.
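The nearest-neighbor step can be sketched in a few lines of plain Python. The four-dimensional vectors below are invented for illustration (real embedding models emit hundreds to thousands of dimensions, as noted above), so treat this as a shape of the computation, not a working retriever.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (made-up numbers standing in for a real model's output).
docs = {
    "reduce a car's gasoline consumption": [0.9, 0.8, 0.1, 0.0],
    "history of the printing press":       [0.0, 0.1, 0.9, 0.8],
}
query = [0.8, 0.9, 0.0, 0.1]  # "improve my automobile's fuel efficiency"

# Nearest-neighbor search: rank documents by similarity to the query vector.
ranked = sorted(docs.items(),
                key=lambda kv: cosine_similarity(query, kv[1]),
                reverse=True)
print(ranked[0][0])  # the semantically closest document
```

Despite sharing no vocabulary with the query, the fuel-consumption document ranks first because its vector points in nearly the same direction.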
Semantic search excels at paraphrasing, synonyms, and concept-level queries. It delivers high recall for queries that differ in expression but share the same intent—such as "steps for the resignation process" and "what to do when leaving a company." It pairs well with internal knowledge bases and FAQ search.
On the other hand, it struggles with queries that require exact vocabulary matches, such as model numbers (XR-990), legal statute numbers, or program code. In embedding space, "XR-990" and "XR-991" may be mapped to nearly identical positions, making them indistinguishable. To compensate for this weakness, hybrid search combining semantic search with BM25 has been widely adopted in practice.
In RAG (Retrieval-Augmented Generation), semantic search serves as the core of the retrieval phase. The user's question is vectorized, semantically relevant chunks are retrieved from an external knowledge base, and these are passed to the LLM. If retrieval accuracy is low at this stage, the LLM generates responses based on irrelevant documents, leading to hallucinations.
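The retrieval phase described above can be sketched end-to-end. The `embed` function here is a toy stand-in (it just counts a few hand-picked term prefixes) so the example runs without any external model; in a real system it would call an embedding model and the chunks would live in a vector database.

```python
import math

# Toy stand-in for an embedding model: count matches against a few
# hand-picked term prefixes. Purely illustrative, not a real embedder.
KEYWORDS = ["resignation", "leave", "company", "fuel", "car"]

def embed(text):
    words = text.lower().split()
    return [sum(w.startswith(k) for w in words) for k in KEYWORDS]

def top_k(query, chunks, k=2):
    """Retrieve the k chunks most similar to the query by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)
    q = embed(query)
    return sorted(chunks, key=lambda c: cos(q, embed(c)), reverse=True)[:k]

chunks = [
    "Submit the resignation form to HR two weeks before leaving.",
    "The cafeteria menu changes every Monday.",
]
question = "What are the steps when leaving the company?"
context = "\n".join(top_k(question, chunks, k=1))
prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
```

The assembled `prompt` is what gets passed to the LLM; if `top_k` had returned the cafeteria chunk instead, the model would have nothing relevant to ground its answer on.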
The practical keys to improving retrieval quality lie in selecting the right embedding model (whether multilingual support is needed, or whether domain-specific fine-tuning is effective) and in designing chunk sizes. In the author's experience, simply changing the chunk size from 256 tokens to 512 tokens with the same model has shifted Recall@10 by more than 10 points. Evaluating the model and chunk size together has become a cardinal rule.
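Recall@k itself is simple to compute. The two retrieval runs below are invented solely to show the calculation (they are not the author's measurements): given the set of chunk ids known to be relevant to a query, Recall@10 is the fraction of them that appear in the top 10 results.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant chunk ids found among the top-k retrieved ids."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

# Hypothetical top-10 lists for one query under two chunking configs.
relevant = ["c1", "c2", "c3", "c4"]
run_256 = ["c9", "c1", "c8", "c7", "c6", "c5", "c0", "ca", "cb", "cc"]
run_512 = ["c1", "c2", "c9", "c3", "c8", "c7", "c6", "c5", "c0", "ca"]

print(recall_at_k(run_256, relevant))  # 0.25
print(recall_at_k(run_512, relevant))  # 0.75
```

In a real evaluation this is averaged over a labeled query set, once per (model, chunk size) combination, which is why the two should be tuned together.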



A2A (Agent2Agent Protocol) is a communication protocol, published by Google in April 2025, that enables different AI agents to perform capability discovery, task delegation, and state synchronization.

Acceptance testing is a testing method that verifies whether developed features meet business requirements and user stories, from the perspective of the product owner and stakeholders.

AES-256 is the variant of AES (Advanced Encryption Standard), the symmetric-key cipher standardized by the National Institute of Standards and Technology (NIST), that uses a 256-bit key, the longest and strongest key length the standard defines.

Agent orchestration is a mechanism that controls task distribution, state management, and coordination flows among multiple AI agents.

Agent Skills are reusable instruction sets defined to enable AI agents to perform specific tasks or areas of expertise, functioning as modular units that extend the capabilities of an agent.