This PR:
- Adds missing license headers
- Integrates with answerdotai Rerankers package
- Updates ColbertReranker to subclass answerdotai package. This is done
to keep backwards compatibility as some users might be used to importing
ColbertReranker directly
- Set `trust_remote_code` to ` True` by default in CrossEncoder and
sentence-transformer based rerankers
- Update ColBertReranker architecture: The current implementation
doesn't use the right arch. This PR uses the implementation in Rerankers
library. Fixes https://github.com/lancedb/lancedb/issues/1546
Benchmark diff (hit rate):
Hybrid - 91 vs 87
reranked vector - 85 vs 80
- Reranking in FTS is basically disabled in main after last week's FTS
updates. I think there's no blocker in supporting that?
- Allow overriding accelerators: Most transformer based Rerankers and
Embedding automatically select device. This PR allows overriding those
settings by passing `device`. Fixes:
https://github.com/lancedb/lancedb/issues/1487
---------
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
Lance now supports FTS, so add it into lancedb Python, TypeScript and
Rust SDKs.
For Python, we still use tantivy based FTS by default because the lance
FTS index now misses some features of tantivy.
For Python:
- Support to create lance based FTS index
- Support to specify columns for full text search (only available for
lance based FTS index)
For TypeScript:
- Change the search method so that it can accept both string and vector
- Support full text search
For Rust
- Support full text search
The others:
- Update the FTS doc
BREAKING CHANGE:
- for Python, this renames the attached score column of FTS from "score"
to "_score", this could be a breaking change for users that rely the
scores
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Currently targeting the following usage:
```
from lancedb.rerankers import CrossEncoderReranker
reranker = CrossEncoderReranker()
query = "hello"
res1 = table.search(query, vector_column_name="vector").limit(3)
res2 = table.search(query, vector_column_name="text_vector").limit(3)
res3 = table.search(query, vector_column_name="meta_vector").limit(3)
reranked = reranker.rerank_multivector(
[res1, res2, res3],
deduplicate=True,
query=query # some reranker models need query
)
```
- This implements rerank_multivector function in the base reranker so
that all rerankers that implement rerank_vector will automatically have
multivector reranking support
- Special case for RRF reranker that just uses its existing
rerank_hybrid fcn to multi-vector reranking.
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
It's useful to see the underlying query plan for debugging purposes.
This exposes LanceScanner's `explain_plan` function. Addresses #1288
---------
Co-authored-by: Will Jones <willjones127@gmail.com>