mirror of
https://github.com/lancedb/lancedb.git
synced 2026-01-05 19:32:56 +00:00
- Enforce all rerankers always return _relevance_score. This was already loosely done in tests before but based on user feedback its better to always have _relevance_score present in all reranked results - Deprecate LinearCombinationReranker in docs. And also fix a case where it would not return _relevance_score if one result set was missing
2.1 KiB
2.1 KiB
Reciprocal Rank Fusion Reranker
This is the default re-ranker used by LanceDB hybrid search. Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores by leveraging the positions/rank of the documents. The implementation follows this paper.
!!! note Supported Query Types: Hybrid
import numpy
import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector
from lancedb.rerankers import RRFReranker
embedder = get_registry().get("sentence-transformers").create()
db = lancedb.connect("~/.lancedb")
class Schema(LanceModel):
text: str = embedder.SourceField()
vector: Vector(embedder.ndims()) = embedder.VectorField()
data = [
{"text": "hello world"},
{"text": "goodbye world"}
]
tbl = db.create_table("test", schema=Schema, mode="overwrite")
tbl.add(data)
reranker = RRFReranker()
# Run hybrid search with a reranker
tbl.create_fts_index("text", replace=True)
result = tbl.search("hello", query_type="hybrid").rerank(reranker=reranker).to_list()
Accepted Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
K |
int |
60 |
A constant used in the RRF formula (default is 60). Experiments indicate that k = 60 was near-optimal, but that the choice is not critical |
return_score |
str | "relevance" |
Options are "relevance" or "all". The type of score to return. If "relevance", will return only the _relevance_score. If "all", will return all scores from the vector and FTS search along with the relevance score. |
Supported Scores for each query type
You can specify the type of scores you want the reranker to return. The following are the supported scores for each query type:
Hybrid Search
return_score |
Status | Description |
|---|---|---|
relevance |
✅ Supported | Returned rows only have the _relevance_score column |
all |
✅ Supported | Returned rows have vector(_distance) and FTS(score) along with Hybrid Search score(_relevance_score) |