Hybrid search is a technique that combines keyword-based full-text search (such as BM25) with vector search (semantic search), leveraging the strengths of both to improve retrieval accuracy.
The single biggest factor in RAG answer quality is whether the relevant documents are retrieved in the first place. Vector search alone struggles with exact matches for identifiers and proper nouns like "ISO 27001," while BM25 alone cannot handle semantic paraphrases like "international standard for information security." Hybrid search compensates for both weaknesses.
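The most common implementation pattern runs BM25 and vector search independently, then merges the two result lists using RRF (Reciprocal Rank Fusion). For each document, RRF sums the reciprocal of its rank in each list, usually with a smoothing constant k added to the rank (k = 60 is a common default), and sorts documents by that total. Because it uses only ranks, the raw scores of the two methods do not need to be normalized against each other. The formula is simple, yet it often outperforms either search method alone.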
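The fusion step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `reciprocal_rank_fusion`, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two retrievers, best match first.
bm25_hits = ["doc_iso", "doc_audit", "doc_policy"]
vector_hits = ["doc_security", "doc_iso", "doc_policy"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear in both lists ("doc_iso", "doc_policy") rise above documents that only one retriever found, even when one of those was ranked first by its retriever.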
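Combining pgvector with PostgreSQL's full-text search enables hybrid search without additional infrastructure. PostgreSQL's built-in ranking functions (ts_rank and ts_rank_cd) are not BM25, but they fill the same keyword-matching role. On Supabase, a practical approach is to keep both a vector column and a tsvector column in the same table and compute both rankings within a single SQL query.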
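A single-query version of this pattern might look like the sketch below. Everything schema-related is hypothetical: a table named `documents` with columns `id`, `embedding` (pgvector) and `fts` (tsvector), the `english` text search configuration, and the helper names. The query ranks each side separately and fuses the ranks with RRF; the keyword side uses `ts_rank_cd`, which is PostgreSQL's own ranking function rather than BM25. The `conn` argument is assumed to be a DB-API style connection such as one from psycopg.

```python
# Hypothetical schema: documents(id, embedding vector, fts tsvector).
HYBRID_SQL = """
WITH semantic AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS rank_ix
    FROM documents
    ORDER BY embedding <=> %(qvec)s::vector
    LIMIT %(candidates)s
),
keyword AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY ts_rank_cd(fts, q) DESC) AS rank_ix
    FROM documents, websearch_to_tsquery('english', %(qtext)s) AS q
    WHERE fts @@ q
    ORDER BY ts_rank_cd(fts, q) DESC
    LIMIT %(candidates)s
)
SELECT COALESCE(s.id, kw.id) AS id,
       COALESCE(1.0 / (%(k)s + s.rank_ix), 0.0)
     + COALESCE(1.0 / (%(k)s + kw.rank_ix), 0.0) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword kw ON s.id = kw.id
ORDER BY rrf_score DESC
LIMIT %(limit)s
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def hybrid_search(conn, query_text, query_embedding, limit=10, candidates=50, k=60):
    """Run the hybrid query on a DB-API style connection and return its rows."""
    params = {
        "qtext": query_text,
        "qvec": to_vector_literal(query_embedding),
        "candidates": candidates,  # rows taken from each side before fusion
        "k": k,                    # RRF smoothing constant (assumed default)
        "limit": limit,
    }
    with conn.cursor() as cur:
        cur.execute(HYBRID_SQL, params)
        return cur.fetchall()
```

The query embedding has to come from the same embedding model that populated the `embedding` column. In practice the vector column would usually have an HNSW or IVFFlat index and the tsvector column a GIN index.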
Chunk size design also affects accuracy. Smaller chunks make each embedding more focused, which improves vector search precision, but they give BM25 fewer terms to match and lose surrounding context. In practice, chunks of 500-1000 tokens that overlap with adjacent segments are common.
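Overlapping chunks can be produced with a sliding window, as in the sketch below. The function name `chunk_tokens` and the defaults (800 tokens per chunk, 100 tokens of overlap) are assumptions chosen to fall inside the range mentioned above, and whitespace splitting stands in for a real tokenizer.

```python
def chunk_tokens(tokens, chunk_size=800, overlap=100):
    """Split a token list into windows that share `overlap` tokens with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Whitespace splitting is only a stand-in for a model-specific tokenizer.
text = "hybrid search combines keyword search with vector search " * 300
chunks = chunk_tokens(text.split())
print(len(chunks), [len(c) for c in chunks])
```

Each chunk would then be embedded for the vector column and indexed as text for keyword search, so both retrievers operate on the same units.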


BM25 (Best Matching 25) is a probabilistic information retrieval algorithm that scores the relevance of each document to a query based on term frequency within the document, how rare each term is across the collection (inverse document frequency), and document length.
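To make the scoring concrete, here is a small sketch of one common BM25 variant. The parameter values `k1=1.5` and `b=0.75` are typical defaults assumed for illustration, the IDF uses the smoothed form with `+ 1` inside the logarithm so it never goes negative, and the tiny corpus is made up.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up corpus, already tokenized.
docs = [
    ["iso", "27001", "audit"],
    ["security", "policy"],
    ["iso", "standard", "security", "policy"],
]
scores = bm25_scores(["iso", "27001"], docs)
print(scores)
```

The first document matches both query terms and scores highest, the third matches only "iso" and is longer, and the second matches nothing and scores zero.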

A vector database stores text, images, and other data as numerical vectors (embeddings) and provides fast search based on semantic similarity.

Embedding is a technique that transforms unstructured data such as text, images, and audio into fixed-length numerical vectors while preserving semantic relationships.

Gemini Embedding 2 is a multimodal embedding model developed by Google, capable of converting text, images, video, audio, and documents into a single vector space.