mirror of https://github.com/lancedb/lancedb.git synced 2026-01-05 19:32:56 +00:00

Ayush Chaurasia 86978e7588 feat!: enforce all rerankers always return relevance score & deprecate linear combination fixes (#1687)

- Enforce all rerankers always return _relevance_score. This was already
loosely done in tests before, but based on user feedback it's better to
always have _relevance_score present in all reranked results
- Deprecate LinearCombinationReranker in docs. Also fix a case where
it would not return _relevance_score if one result set was missing

2024-09-23 12:12:02 +05:30


Reciprocal Rank Fusion Reranker

This is the default reranker used by LanceDB hybrid search. Reciprocal Rank Fusion (RRF) is an algorithm that merges several ranked result lists (here, the vector search and full-text search results) using each document's rank (position) in those lists rather than its raw search score. The implementation follows this paper.
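To make the ranking idea concrete, here is a minimal pure-Python sketch of RRF as it is commonly stated: each document scores the sum of `1 / (k + rank)` over every list it appears in, with `rank` counted from 1. The function name `rrf_fuse` and the document ids are hypothetical, chosen for illustration only; LanceDB's own implementation may differ in details such as rank indexing or tie handling.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    where rank is the 1-based position of the document in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical result orders from the two searches (illustrative ids only).
vector_hits = ["doc_a", "doc_b", "doc_c"]
fts_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([vector_hits, fts_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this sketch `doc_b` ends up first because it ranks highly in both lists, even though it tops only one of them. Raw distances and full-text scores never enter the computation, which is why RRF needs no score normalization between the two searches.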

!!! note
    Supported Query Types: Hybrid

import lancedb
from lancedb.embeddings import get_registry
from lancedb.pydantic import LanceModel, Vector
from lancedb.rerankers import RRFReranker

# Embedding function used to vectorize the `text` field automatically
embedder = get_registry().get("sentence-transformers").create()
db = lancedb.connect("~/.lancedb")

class Schema(LanceModel):
    text: str = embedder.SourceField()
    vector: Vector(embedder.ndims()) = embedder.VectorField()

data = [
    {"text": "hello world"},
    {"text": "goodbye world"},
]
tbl = db.create_table("test", schema=Schema, mode="overwrite")
tbl.add(data)

# Hybrid search needs a full-text search index on the text column
tbl.create_fts_index("text", replace=True)

# Run hybrid search with a reranker
reranker = RRFReranker()
result = tbl.search("hello", query_type="hybrid").rerank(reranker=reranker).to_list()

Accepted Arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `K` | `int` | `60` | The constant used in the RRF formula. Experiments indicate that k = 60 was near-optimal, but that the choice is not critical. |
| `return_score` | `str` | `"relevance"` | The type of score to return. Options are `"relevance"` or `"all"`. If `"relevance"`, only the `_relevance_score` is returned. If `"all"`, all scores from the vector and FTS search are returned along with the relevance score. |
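As a rough illustration of what the constant does, the sketch below compares how much a single result list contributes for a document at rank 1 versus rank 10 under different values of `K`. It assumes the usual `1 / (K + rank)` form with 1-based ranks; `rrf_contribution` is a hypothetical helper, not part of LanceDB.

```python
def rrf_contribution(rank, k):
    """Score that one result list contributes for a document at a 1-based rank."""
    return 1.0 / (k + rank)

for k in (1, 60, 1000):
    top = rrf_contribution(1, k)
    tenth = rrf_contribution(10, k)
    # The ratio shows how strongly rank 1 is favored over rank 10.
    print(f"K={k:>4}: rank 1 -> {top:.5f}, rank 10 -> {tenth:.5f}, ratio {top / tenth:.2f}")
```

A small `K` lets the top few positions dominate the fused score, while a large `K` flattens the differences between ranks so that appearing in both lists matters more than the exact position in either.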

Supported Scores for each query type

You can specify the type of scores you want the reranker to return. The following are the supported scores for each query type:

Hybrid Search

| `return_score` | Status | Description |
| --- | --- | --- |
| `relevance` | ✅ Supported | The only score column in the returned rows is `_relevance_score` |
| `all` | ✅ Supported | Returned rows have the vector (`_distance`) and FTS (`score`) scores along with the hybrid search score (`_relevance_score`) |
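The difference between the two modes can be pictured with plain dictionaries shaped like the rows returned by `to_list()`. This is only a sketch: `select_scores` is a hypothetical helper that mimics the table above, and the numeric values are made up for illustration rather than taken from a real query.

```python
# Hypothetical rows shaped like hybrid search output with return_score="all".
# All numbers are invented for illustration.
rows_all = [
    {"text": "hello world", "_distance": 0.2, "score": 1.5, "_relevance_score": 0.03},
    {"text": "goodbye world", "_distance": 0.6, "score": 0.5, "_relevance_score": 0.02},
]

def select_scores(rows, return_score="relevance"):
    """Mimic the two modes: drop the per-search scores unless "all" is requested."""
    if return_score == "all":
        return [dict(row) for row in rows]
    if return_score == "relevance":
        dropped = {"_distance", "score"}
        return [{key: value for key, value in row.items() if key not in dropped}
                for row in rows]
    raise ValueError('return_score must be "relevance" or "all"')

print(select_scores(rows_all, "relevance")[0])
print(select_scores(rows_all, "all")[0])
```

Either way `_relevance_score` is present, so downstream code can sort or filter on it without checking which mode was used.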
