
A vector database is a specialized database that stores data such as text and images as numerical vectors, enabling high-speed search based on semantic similarity.
With the widespread adoption of AI systems leveraging LLMs (Large Language Models), vector databases have rapidly gained attention as core infrastructure for RAG (Retrieval-Augmented Generation). There is a growing interest in vector databases as traditional RDBs and keyword search increasingly fall short of meeting requirements for "finding semantically similar documents."
This article provides a systematic explanation covering everything from the basic mechanisms and comparisons of major products to RAG applications and common failure patterns during implementation. It is intended to help engineers, architects, and product managers driving AI adoption develop a framework for selecting the right product for their organization's needs.
A vector database is a specialized database that stores text, images, audio, and other data as numerical arrays (vectors), enabling high-speed search based on semantic similarity. Taking a fundamentally different approach from traditional RDB keyword matching, vector databases have emerged as core infrastructure for AI development against the backdrop of the growing adoption of LLMs and RAG systems. The following sections explain the mechanisms, use cases, and product comparisons in order.
A vector database is a database that stores data such as text, images, and audio as arrays of floating-point numbers with hundreds to thousands of dimensions (vectors), and enables search based on the distance or similarity between vectors. While traditional RDBs (relational databases) excel at "exact matches" and "prefix matches," the defining characteristic of a vector database is its ability to return results based on semantic similarity.
Key Differences from Traditional RDBs
For example, if you search for "recommend a smartphone" in an RDB, only rows containing the keyword "smartphone" will be returned. With a vector DB, documents written as "mobile phone" or "handheld device" can also be retrieved as semantically similar results.
Vector databases are not a replacement for RDBs; they are a complementary solution specialized for semantic search over unstructured data. In RAG systems combined with LLMs (Large Language Models), this nearest neighbor search is the core function that determines the accuracy of responses.
Embedding is a technology that converts data such as text, images, and audio into multi-dimensional numerical vectors in which semantic similarity is expressed as distance. The embeddings of "dog" and "cat" are positioned closer together in vector space than those of "dog" and "car." This property—where closeness in meaning equals closeness in distance—is the essence of vector search.
The Embedding Generation Process
How Vector Search Works
Queries are also vectorized using the same model, and similarity is calculated against the vectors stored in the database. The two most representative similarity metrics are cosine similarity, which measures the alignment of vector directions (widely used in text search), and Euclidean distance, which measures straight-line distance (suited for images and audio).
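The two metrics can be computed in a few lines. The sketch below uses hand-made three-dimensional toy vectors purely for illustration; real embedding models produce hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vectors' magnitudes:
    # measures alignment of direction, independent of vector length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two points in vector space.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy embeddings: "dog" and "cat" point in similar directions, "car" does not.
dog = [0.9, 0.8, 0.1]
cat = [0.8, 0.9, 0.2]
car = [0.1, 0.2, 0.9]

print(cosine_similarity(dog, cat))   # close to 1.0: semantically similar
print(cosine_similarity(dog, car))   # much smaller: dissimilar
print(euclidean_distance(dog, cat))  # small distance: close in space
```

Note that for normalized vectors, ranking by cosine similarity and ranking by Euclidean distance produce the same order, which is why many products expose either metric interchangeably.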
As data volume grows, brute-force search comparing all entries becomes impractical, so Approximate Nearest Neighbor (ANN) algorithms are used. The representative HNSW algorithm uses a hierarchical graph structure to dramatically improve search speed at a slight cost to accuracy.
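To make the trade-off concrete, here is the brute-force search that ANN algorithms approximate: every stored vector is compared against the query, which costs O(N) per query. The store and vectors are illustrative toys, not a specific product's API.

```python
import math

def brute_force_search(query, vectors, k=2):
    """Exhaustive nearest-neighbor search: score every stored vector
    against the query and return the top-k. HNSW and other ANN methods
    avoid this full scan at a slight cost to accuracy."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

store = {
    "doc_smartphone": [0.9, 0.1, 0.2],
    "doc_handheld":   [0.8, 0.2, 0.3],
    "doc_recipe":     [0.1, 0.9, 0.1],
}
# The query vector is close to the two phone-related documents.
print(brute_force_search([0.85, 0.15, 0.25], store, k=2))
```

An ANN index such as HNSW returns (approximately) the same top-k list while inspecting only a small fraction of the stored vectors.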
It is important to note that embeddings for queries and documents must always be generated using the same model. If different models are used, the structure of the vector space changes, rendering similarity calculations meaningless—a pitfall that is well known to be easy to overlook during implementation.
Behind the rapid surge in attention lies a major turning point: the widespread adoption of LLMs (Large Language Models). As generative AI entered the practical stage, demand exploded for the ability to "have the model answer questions using our own data." Vector databases have become an indispensable component as the core infrastructure for RAG (Retrieval-Augmented Generation), the primary means of achieving this.
The main factors driving demand are the need to have LLMs answer from proprietary, up-to-date data and the inability of traditional keyword search to capture semantic similarity.
The limitations of traditional keyword search—its inability to capture "semantically similar" information—have also become apparent in practice. "Contract termination procedure" and "cancellation flow" carry the same meaning, yet full-text match search treats them as entirely different. Vector search clearly demonstrates its practical value by bridging this gap.
With these technological and business tailwinds converging, vector databases have come to be regarded as an essential, if often invisible, backbone of AI systems.
Vector databases are rapidly expanding their scope of application as core infrastructure for LLM-powered systems. While the most representative use case is internal document search via RAG, they are also increasingly being used as the memory foundation for recommendation engines and AI agents. Since the design philosophy required differs by use case, it is worth first gaining an understanding of the overall picture.
RAG (Retrieval-Augmented Generation) is a design pattern that compensates for the knowledge limitations of LLMs (Large Language Models). It dynamically retrieves internal documents and up-to-date information not included in training data at inference time, enabling higher answer accuracy while suppressing hallucinations.
Vector databases play a central role as the "search engine" of this RAG system. The processing flow is as follows: (1) the user's question is converted into a vector with an embedding model, (2) nearest neighbor search retrieves the most relevant documents from the vector database, (3) the retrieved documents are inserted into the prompt along with the question, and (4) the LLM generates an answer grounded in that context.
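The retrieval-then-generation flow can be sketched end to end in plain Python. The embedding function below is a toy word-count stand-in for a real embedding model, and all names (embed, retrieve, build_prompt) are illustrative, not any product's API.

```python
def embed(text):
    # Toy "embedding": counts of a few topic words. A real system would
    # call an embedding model and get hundreds of dimensions back.
    vocab = ["refund", "shipping", "password"]
    lowered = text.lower()
    return [lowered.count(term) for term in vocab]

def retrieve(query, documents, k=1):
    # Nearest-neighbor step: rank stored documents by dot-product
    # similarity with the query vector.
    q = embed(query)
    scored = sorted(
        documents,
        key=lambda doc: sum(a * b for a, b in zip(q, embed(doc))),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, documents):
    # Grounding step: retrieved text goes into the prompt so the LLM
    # answers from it rather than from training data alone.
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refund requests are accepted within 30 days.",
    "Shipping takes 3-5 business days.",
    "Reset your password from the account page.",
]
print(build_prompt("How do I get a refund?", docs))
```

The structure is the same in production systems; only the toy pieces are replaced by an embedding model, a vector database, and an LLM call.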
This mechanism is particularly effective in scenarios involving frequently updated information. Keeping up with internal regulations and technical specifications via fine-tuning is prohibitively costly. By simply adding or updating documents in the vector database, the knowledge referenced by the LLM can be reflected in near real time.
This is also noteworthy from a grounding perspective. Passing the metadata of retrieved documents (source, last updated timestamp) to the LLM makes it easier to explicitly state the basis for answers, which tends to facilitate accountability in enterprise use cases.
Vector databases play a central role not only in RAG but also in the areas of similar document search and recommendations. Their greatest strength is the ability to return results based on semantic proximity rather than keyword matching.
Application to Similar Document Search
In fields such as law, medicine, and research, where large volumes of specialized documents are handled, there is significant demand for tasks such as "finding case law with the same intent" or "retrieving similar case reports." Conventional full-text search is reported to struggle with variations in notation and synonym barriers, but vector search using embeddings measures similarity by distance in semantic space, making it easier to surface related documents even when the wording differs.
Representative use cases include retrieving case law with similar intent in the legal field, finding similar case reports in medicine, and surfacing related prior work in research.
Application to Recommendations
On e-commerce sites and video platforms, vectorizing the characteristics of products and content and combining them with nearest neighbor search against user behavior history vectors tends to enable personalization even with limited data. The ability to mitigate the cold-start problem for new users and newly added items is also attracting attention.
There is also strong compatibility with multimodal search, which embeds text, images, and audio into a shared embedding space, enabling cross-modal use cases such as "search by image and return similar products." Since recommendation quality depends heavily on the choice of embedding model and feature design, these points warrant careful consideration at the time of implementation.
External memory is essential for AI agents to maintain context across multiple tasks. Because LLMs are constrained by their context window, they are not well suited for long-term information accumulation. Vector databases function as that external memory, effectively extending an agent's "capacity to remember."
Use cases can be broadly categorized into three types: short-term memory that retains conversation and task context, long-term memory that accumulates knowledge and past results, and shared memory referenced by multiple agents.
In multi-agent systems, having multiple agents reference a shared vector database tends to facilitate smooth knowledge handoff between agents and reduce redundant processing.
One important consideration is memory freshness management. There is a risk that outdated information mixed into search results will degrade decision-making accuracy. A design that combines TTL settings and metadata filtering to periodically clean up unnecessary memories is required.
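The freshness-management idea can be illustrated with a minimal sketch: each memory entry carries a timestamp, and entries older than a TTL are dropped before they can contaminate search results. Products such as Redis provide this natively via key expiry; the class below is a toy, not a real product's API.

```python
import time

class MemoryStore:
    """Toy agent-memory store with TTL-based cleanup."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = []  # list of (timestamp, text) pairs

    def add(self, text, now=None):
        self.entries.append((now if now is not None else time.time(), text))

    def cleanup(self, now=None):
        # Drop memories whose age exceeds the TTL so stale information
        # never reaches downstream search or decision-making.
        now = now if now is not None else time.time()
        self.entries = [(ts, t) for ts, t in self.entries
                        if now - ts <= self.ttl]

    def texts(self):
        return [t for _, t in self.entries]

store = MemoryStore(ttl_seconds=3600)
store.add("user prefers concise answers", now=0)       # old memory
store.add("current task: summarize Q3 report", now=5000)
store.cleanup(now=5400)
print(store.texts())  # only the entry younger than one hour remains
```

In a real deployment the same pruning is typically combined with metadata filters (source, topic) so that cleanup can be selective rather than purely age-based.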
One of the most common points of confusion in product selection is determining the comparison criteria—that is, deciding what to evaluate against. Because vector databases differ in their areas of strength from product to product, relying on a simple specification comparison can easily lead to misjudgments. Prioritizing comparison criteria according to the scale of the project, search requirements, and operational structure is what leads to a selection decision you won't regret.
To prevent product selection failures, it is important to evaluate scalability, latency, and cost as three independent axes.
Scalability is measured by how response performance changes as the number of vectors increases.
Latency is directly tied to the choice of ANN (Approximate Nearest Neighbor) algorithm. HNSW is fast but memory-intensive, while IVF-based approaches are more memory-efficient but prone to accuracy trade-offs. When latency requirements are strict, products designed for in-memory operation (such as Redis or Qdrant's memory mode) become viable options.
Cost should be compared not just by monthly fees, but by Total Cost of Ownership (TCO). Managed services often use consumption-based pricing tied to query volume, which carries the risk of costs spiking during traffic surges. Open-source options carry no licensing fees, but the labor costs of infrastructure management must not be overlooked.
Since the priority of these three axes varies depending on the nature of the project, the most effective approach is to finalize requirements first, then narrow down product options.
Vector search alone handles proper nouns and abbreviations poorly. Specialized terms such as "ISO 27001" and "GPT" are often poorly represented in the embedding space, and cases of missed results with pure similarity search have been reported.
Hybrid search — combining sparse search methods such as BM25 with vector search — compensates for this weakness. By integrating scores from both using RRF (Reciprocal Rank Fusion), it is possible to leverage both keyword matching and semantic similarity.
The support status for each product is as follows: Weaviate and Qdrant support hybrid search natively, while pgvector requires combining vector search with PostgreSQL full-text search (tsvector), with score integration left to application-level implementation.
During product selection, it is worth confirming whether the capability is a native feature or requires implementation at the application layer. The latter tends to introduce operational costs such as RRF parameter tuning and increased search latency. For RAG systems that make heavy use of specialized terminology, prioritizing products with native support is recommended.
The three main categories of options are managed services, open-source, and existing database extensions. Since operational overhead and cost structures differ significantly by category, it is important to compare them in alignment with the project's phase and requirements. The following subsections organize the characteristics and suitable use cases of representative products within each category.
The greatest strength of managed service offerings is the ability to delegate infrastructure management to a vendor. There is no need to handle server provisioning or scaling in-house, making it easier to focus on AI application development.
Pinecone is widely adopted as a fully managed SaaS solution dedicated to vector databases.
However, access to custom infrastructure is restricted, which can limit flexibility in cases where fine-grained tuning is required. Pricing figures are reference values at the time of writing; always check the latest pricing page for current information.
Weaviate Cloud is a service that provides the open-source Weaviate in a managed environment.
Compared to Pinecone, it has more configuration options and a somewhat steeper learning curve, but has been reported to offer an expressive advantage in enterprise environments with complex data models. Both services offer free tiers, making them easy to use for evaluation during the PoC phase.
For teams seeking flexible configurations while keeping costs down, open-source options are a strong choice. The ability to deploy on their own infrastructure and modify source code represents the key differentiator from managed service offerings.
Qdrant is characterized by its lightweight Rust-based architecture, with reports of low latency and low memory usage. It excels at filtered search combining metadata conditions with Approximate Nearest Neighbor (ANN) search, and its low startup cost during the PoC phase — thanks to single Docker image local deployment — is a notable advantage.
Milvus features a distributed architecture designed for horizontal scaling, making it well-suited for large-scale use cases handling billions of vectors.
Chroma offers high affinity with the Python API, and integration with LangChain and LlamaIndex can be completed with virtually zero configuration. However, operational track records at scales exceeding several million records tend to be more limited compared to Milvus or Qdrant, so when scale requirements are clear, early evaluation is advisable.
Recommended selection guidelines by use case: Qdrant for low-latency filtered search with a smooth path from PoC to production, Milvus for large-scale deployments handling billions of vectors, and Chroma for rapid prototyping with LangChain or LlamaIndex.
It is recommended to consult the official documentation of each project for licensing details and the latest specifications.
The greatest strength of the extension-based approach is its ability to add vector search while keeping existing infrastructure intact. For teams looking to minimize operational costs for new services, it represents a realistic first step.
pgvector operates as a PostgreSQL extension module, allowing vector search to be executed with the same familiar SQL syntax.
Search is accelerated by creating an ivfflat or hnsw index on the vector column. However, latency tends to degrade with datasets exceeding tens of millions of records. If large-scale operation is on the horizon, it is worth considering an early migration to a dedicated vector database.
Redis (the RediSearch module in Redis Stack) is characterized by low latency through in-memory processing. Use cases have been reported in real-time recommendations requiring millisecond-level responses, as well as short-term memory infrastructure for AI agents. The ability to automatically delete vectors via TTL settings is also a practical advantage. On the other hand, PostgreSQL is often better suited for knowledge bases requiring long-term storage.
A simple decision framework for choosing between the two is sufficient: "pgvector for additions to an existing PostgreSQL environment, Redis when speed is the priority."
There is no "universally correct answer" in product selection — the optimal choice varies depending on the project phase, scale, and the team's operational capabilities. Comparing products solely on features risks overlooking operational costs and learning costs. The following sections organize the selection approach across two stages: "PoC / small-scale validation" and "production operation / enterprise."
In the PoC (proof of concept) or small-scale validation phase, the top priority is "how quickly hypotheses can be validated." When time is consumed by infrastructure setup, confirming the viability of use cases tends to get pushed back.
Key selection criteria at this stage are ease of setup, the ability to run locally, and a generous free tier.
Chroma can be launched in-memory locally, and a full RAG workflow can be verified in just a few dozen lines of code. Its ease of use — allowing search queries to be tested from day one of prototyping — makes it well-suited for small-scale validation.
Qdrant can be launched with a single Docker image and comes with a well-maintained REST API. Because it is designed with a path from PoC to production in mind, it tends to be a good fit for teams that want to "move directly to production after validation."
For those who want to try a managed service, Pinecone's free tier is an option, but given its limitations on the number of indexes and vectors, caution is advised when validating behavior with large-scale data (check the official page for the latest restrictions).
One often-overlooked consideration is verifying compatibility with the embedding model. Since search accuracy varies significantly depending on the number of dimensions and whether normalization is applied, using the model intended for production from the PoC stage is a prerequisite for meaningful accuracy comparisons across phases.
In the production operation phase, different evaluation criteria are required compared to the PoC stage. The three key dimensions to evaluate are availability, security, and operational cost.
Priority selection criteria include SLA guarantees, authentication and encryption support, and automated backup and scale-out.
Advantages of managed service offerings
Pinecone and Weaviate Cloud are designed so that the service provider handles backups and auto-scaling. When the operational team's resources are limited, this tends to make it easier to keep total cost of ownership (TCO) down.
Considerations for self-hosted deployments
When deploying Milvus or Qdrant on-premises, operational automation via Kubernetes operators is effectively a prerequisite. Milvus in particular has many distributed components, and cases have been reported where operational proficiency directly impacts search latency and stability.
Compatibility with existing stacks
In environments already running PostgreSQL in production, extending with pgvector can be an option for reducing migration costs. However, at scales exceeding tens of millions of records, a performance gap in search tends to emerge compared to dedicated products, so pre-validation through load testing is recommended.
While the technical setup for vector database adoption has become easier, cases have been reported where systems end up in a state of "running but not delivering accurate results." The cause in most cases lies not in the infrastructure, but in data design and search logic. The following sections take a closer look at two failure patterns that frequently occur in practice.
A pitfall that tends to be overlooked during implementation is a mismatch between chunk size and the embedding model. No matter how capable the chosen model is, retrieval accuracy tends to drop significantly if the chunking design is misaligned.
Typical Patterns Where Mismatches Occur
For example, all-MiniLM-L6-v2 has an upper limit of around 256 tokens, while text-embedding-ada-002 supports up to 8,192 tokens. Exceeding the limit leads to truncation of the trailing content or errors, so it is essential to verify the specifications for each model in advance (check the official documentation).
Key Countermeasures
Deciding "which model to use" and "the chunking strategy" together at the design stage is the most straightforward way to reduce rework in later stages.
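A minimal chunking sketch shows the model-limit alignment in practice. The tokenizer here is a toy that splits on whitespace; real tokenizers count tokens differently, so the limits below are illustrative and should be checked against the actual model's documentation.

```python
def chunk_text(text, max_tokens=256, overlap=32):
    """Split text into overlapping chunks that each fit the model's
    input limit. Overlap preserves context across chunk boundaries."""
    if max_tokens <= overlap:
        raise ValueError("max_tokens must exceed overlap")
    tokens = text.split()  # toy tokenizer: one word = one token
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already covers the end of the text
    return chunks

document = " ".join(f"word{i}" for i in range(600))
chunks = chunk_text(document, max_tokens=256, overlap=32)
print(len(chunks))                                  # 600 words -> 3 chunks
print(all(len(c.split()) <= 256 for c in chunks))   # no chunk exceeds the limit
```

Because max_tokens here is tied directly to the embedding model's limit, changing the model later forces re-chunking and re-indexing, which is exactly why the two decisions belong together at the design stage.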
The performance of a vector database is heavily influenced not only by the quality of the embeddings but also by index design. Misconfiguration tends to degrade the overall answer quality of a RAG system.
Commonly reported failure patterns are as follows:
Setting HNSW parameters such as ef_construction or m too low speeds up index construction but tends to significantly reduce recall during search.
One aspect that is easy to overlook is index fragmentation. As data is continuously added and updated, retrieval accuracy gradually degrades. In systems such as Milvus, periodic compaction is recommended and should ideally be incorporated into the operational workflow.
Index design is not a one-time decision; continuously revisiting it as data volume grows is key to maintaining accuracy.
Vector search alone tends to have limitations when it comes to exact keyword matching and handling specialized terminology. To further improve the answer quality of a RAG system, it is effective to strengthen the retrieval layer itself. The following sections cover two approaches: hybrid search, which combines BM25 and vector search with score integration via RRF, and GraphRAG, which leverages knowledge graphs.
Vector search alone is weak at exact matching of proper nouns and code, while sparse search methods such as BM25 cannot capture semantic similarity. Hybrid search is designed to compensate for these weaknesses, and RRF (Reciprocal Rank Fusion) is widely adopted as the algorithm for integrating scores.
RRF integrates results using rank rather than score magnitude.
Its practical strength lies in the ability to combine retrieval methods with different scales without normalization.
Concretely, BM25 tends to be better at picking up proper nouns such as "pgvector," while vector search excels at semantic queries like "What DB can scale cost-effectively?" Integrating the two with RRF tends to surface documents that either method alone would have missed.
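RRF itself is only a few lines: each retriever contributes 1 / (k + rank) per document, and the scores are summed across retrievers. The rankings below are illustrative; k=60 is the constant commonly used in practice.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: rankings is a list of ranked
    document-id lists, best first. Only ranks are used, so retrievers
    with incompatible score scales never need normalization."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 favours the exact term "pgvector"; vector search favours
# semantically related documents. A document ranked well by both
# retrievers rises to the top of the fused list.
bm25_ranking   = ["doc_pgvector", "doc_postgres", "doc_redis"]
vector_ranking = ["doc_scaling", "doc_pgvector", "doc_postgres"]

print(rrf_fuse([bm25_ranking, vector_ranking]))
```

Note how "doc_pgvector", which appears near the top of both lists, outranks "doc_scaling" even though the latter is first in one ranking: agreement between retrievers is rewarded.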
On the implementation side, Weaviate and Qdrant natively support hybrid search, allowing the balance between vector and sparse retrieval to be adjusted via an alpha value. With pgvector, a similar configuration can be achieved by combining it with a full-text search index, though the tuning effort increases.
It should be noted, however, that hybrid search is ultimately a means of raising the baseline retrieval accuracy. The quality of the embedding model and the chunk size design are prerequisites; if the quality of the input data is poor, the effect will be limited.
Vector search alone struggles to capture "relationships between concepts." GraphRAG is attracting attention as a technique to address this weakness.
GraphRAG is an approach that combines a Knowledge Graph with RAG (Retrieval-Augmented Generation). Entities extracted from documents are structured as nodes and their relationships as edges, allowing candidates narrowed down by vector search to be explored further on the graph. Its greatest strength is the ability to pass context to an LLM (Large Language Model) that would not be visible through simple embedding distance alone.
The main cases where improvements have been reported are multi-hop questions that require connecting facts across multiple documents, and queries that hinge on the relationships between entities.
It is also important to be aware of the considerations when adopting GraphRAG. The cost of building and updating the graph is higher than that of a vector index, and the accuracy of entity extraction directly affects graph quality. A practical approach is to first measure the accuracy baseline using vector search alone, then layer in GraphRAG incrementally for scenarios that require multi-hop reasoning.
We address two questions that frequently arise when considering the adoption of a vector database. "How does it differ from existing databases?" and "Can it be integrated into our own system?" are topics that tend to serve as key criteria in product selection.
Vector databases and cache databases are often confused as both serving the role of "fast data access." However, their design philosophies and areas of strength are entirely different.
Characteristics of cache databases (e.g., Redis): they return a stored value only when the key matches exactly, and are optimized for sub-millisecond access to frequently used data.
Characteristics of vector databases: they index embeddings and return the entries semantically closest to a query, even when no exact match exists.
A concrete example makes this easier to understand. If a user enters "Tell me how to save electricity," a cache database returns nothing unless that exact key exists. A vector database, on the other hand, can discover documents that are semantically close, such as those covering "energy-saving measures" or "reducing electricity bills."
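The contrast between the two access models can be made concrete in a few lines. The embeddings are hand-made two-dimensional toys, and the data is purely illustrative.

```python
import math

# Exact-key access model: a cache returns a value only if the key matches.
cache = {"how to save electricity": "See the energy-saving guide."}

# Similarity access model: a vector store returns the nearest entry.
vector_store = {
    "energy-saving measures":     [0.9, 0.1],
    "reducing electricity bills": [0.8, 0.2],
    "pasta recipes":              [0.1, 0.9],
}

def nearest(query_vec, store):
    # Return the stored key whose vector is most similar to the query.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(y * y for y in b)))
    return max(store, key=lambda key: cos(query_vec, store[key]))

query = "tell me how to save electricity"
print(cache.get(query))                      # None: no exact key match
print(nearest([0.85, 0.15], vector_store))   # semantically closest entry
```

The combined pattern described above simply layers the two: check the cache for the exact query first, fall back to vector search on a miss, and cache the result.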
In practice, many RAG systems combine both: semantic search is performed via the vector database, while the results of frequent queries are temporarily stored in a cache database to reduce latency. Rather than choosing one or the other, the realistic approach is to use them together with clearly defined roles.
To state the conclusion upfront: by introducing the pgvector extension, PostgreSQL can be used as a vector database. Since existing schemas and SQL assets can be used as-is, this is a practical option for teams looking to minimize migration costs.
However, "usable as-is" comes with conditions.
Setup itself is simple: run CREATE EXTENSION vector; and add a vector-type column to the target table.
Performance at scale is also a concern. When handling more than several million vectors, memory management and VACUUM processing tend to become bottlenecks. Dedicated vector databases are designed with large-scale workloads in mind, so the performance gap tends to widen as data volume grows.
On the other hand, pgvector works well for small-to-medium-scale RAG systems and during the PoC phase. Use cases such as implementing personalized search by JOINing with existing user tables can actually become more complex to implement with a dedicated database.
Starting with pgvector and switching to a dedicated database once bottlenecks become apparent is a low-risk, practical approach.
Selecting a vector database based solely on name recognition is an area where regret is common. Drawing on the content of this article, we outline three key points that should drive your decision-making.
① Choose a product that matches your current phase
In the PoC phase, prioritize ease of setup and a generous free tier. Chroma and Qdrant can be launched locally in a matter of minutes, allowing you to focus on experimenting with embeddings. In production, SLA guarantees, authentication and encryption support, and ease of scaling out become essential. Since the cost of switching between phases can be surprisingly high, it is advisable to choose with future requirements in mind.
② Verify support for hybrid search
It has been reported that vector search alone can suffer accuracy degradation on queries containing technical terms or proper nouns. Whether a solution supports hybrid search — integrating BM25 and vector search via RRF — has a significant impact on the quality of a RAG system. Weaviate and Qdrant offer native support, while pgvector requires a separate implementation, which is worth factoring into your decision.
③ Lock in alignment with your embedding model first
Chunk size, the dimensionality of the embedding model, and its maximum token count form the foundation of index design. Changing these after the fact requires re-indexing all data. The right order is to decide on the model first, then design the database and index around it.
A vector database can be thought of as the "memory" of an AI system, and the quality of your selection directly affects the performance of your RAG pipelines and AI agents. Use the three points above as your framework, and choose the product that best fits your organization's use case.

Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).