BM25

BM25 (Best Matching 25) is a probabilistic ranking function that scores how relevant each document is to a query, based on how often the query terms appear in the document, how rare those terms are across the collection, and how long the document is.

As an Evolution of TF-IDF

BM25 is a ranking function that extends TF-IDF, and it has long served as the default scoring algorithm in major full-text search engines such as Elasticsearch and Apache Solr. Its defining characteristic is a saturation function that tempers the intuition that "the more frequently a term appears in a document, the more relevant it is." As term frequency increases, a term's contribution to the score does not rise without bound: each additional occurrence adds less than the previous one, and the contribution approaches a fixed upper limit.
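The saturation effect can be seen in a minimal sketch. It assumes the standard BM25 term-frequency component, tf * (k1 + 1) / (tf + k1), with document-length normalization left out for simplicity; k1 is the saturation parameter (set to 1.2 here), and the function name tf_saturation is a hypothetical helper chosen for illustration.

```python
def tf_saturation(tf: float, k1: float = 1.2) -> float:
    """Term-frequency component of BM25 without length normalization.

    The value grows with tf but never reaches k1 + 1.
    """
    return tf * (k1 + 1) / (tf + k1)


# Each additional occurrence adds less than the previous one.
for tf in (1, 2, 5, 20, 100):
    print(f"tf={tf:>3}  contribution={tf_saturation(tf):.3f}")
```

With k1=1.2 the contribution stays below 2.2 no matter how often the term occurs, whereas a raw TF-IDF weight would keep growing with every occurrence.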

There are two primary parameters. k1 controls how quickly term frequency saturates (lower values saturate sooner), while b, which ranges from 0 to 1, sets the strength of document-length normalization (0 disables it, 1 applies it fully). The common defaults (k1=1.2, b=0.75) are often used as-is, though tuning them can improve retrieval accuracy for domain-specific corpora.
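To make the roles of k1 and b concrete, here is a minimal sketch of a BM25 scorer over pre-tokenized documents. It assumes the widely used Okapi form of the formula with the IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); production engines may differ in details such as constant factors. The function name bm25_scores and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a tokenized query."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Average document length; fall back to 1.0 if every document is empty.
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        # b interpolates between no length normalization (0) and full (1).
        length_norm = 1 - b + b * len(d) / avgdl
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "bm25 ranks documents by term frequency".split(),
    "vector search ranks documents by embedding similarity".split(),
    ("bm25 bm25 bm25 bm25 and more bm25 in a much longer document "
     "about search engines and ranking").split(),
]
scores = bm25_scores(["bm25"], docs)
print(scores)
```

Raising b penalizes the long third document more heavily, and lowering k1 makes its repeated occurrences of "bm25" count for less; trying a few values on a sample of real queries is a reasonable way to start tuning.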

Role in RAG Pipelines

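Even amid the growing attention on vector search (semantic search), BM25 remains strong for exact keyword matching and retrieval of technical terminology. In practice, hybrid search has become a common pattern: BM25 and vector search are run side by side, and their ranked result lists are merged with RRF (Reciprocal Rank Fusion). RRF works from each document's rank position in each list rather than from the raw scores, so the two retrievers' scores do not need to be on a comparable scale.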
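The fusion step can be sketched as follows. Each document receives 1 / (k + rank) from every list it appears in, and the sums determine the final order. The constant k=60 is a commonly used value and is an assumption here, as are the function name reciprocal_rank_fusion and the sample document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list."""
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]      # hypothetical BM25 results
vector_ranking = ["doc_b", "doc_d", "doc_a"]    # hypothetical vector results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists (doc_b and doc_a in this example) rise to the top, while documents found by only one retriever still appear further down.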

For queries where exact string matching matters more than meaning, such as those involving proper nouns or model numbers, a hybrid configuration that includes BM25 tends to deliver more consistent results than vector search alone.