lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-01-03 10:22:56 +00:00


Ayush Chaurasia 3ffed89793 feat(python): Hybrid search & Reranker API (#824)

Based on https://github.com/lancedb/lancedb/pull/713.
- The Reranker API can also be plugged into vector-only or FTS-only search, but this PR doesn't do that (see example: https://txt.cohere.com/rerank/).


### Default reranker: `LinearCombinationReranker(weight=0.7, fill=1.0)`

```python
# `table` is an existing LanceDB table with a vector column and a full-text-search index
table.search("hello", query_type="hybrid").rerank(normalize="score").to_pandas()
```
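The `normalize` argument controls how the vector and FTS scores are brought onto a common scale before they are combined. The following plain-Python sketch assumes that `"score"` means min-max scaling of the raw scores, and that `"rank"` means replacing each score with its rank position and then scaling. The helper names are hypothetical, and this is not LanceDB's internal code.

```python
def min_max(scores):
    """Scale scores linearly into [0, 1]; constant inputs all map to 0.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def to_ranks(scores):
    """Replace each score by its 0-based rank (the smallest score gets rank 0)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def normalize(scores, mode="score"):
    """Hypothetical helper mirroring the two `normalize` modes."""
    if mode == "score":
        return min_max(scores)
    if mode == "rank":
        return min_max(to_ranks(scores))
    raise ValueError(f"unknown mode: {mode!r}")


vector_distances = [0.2, 0.9, 0.4]
print(normalize(vector_distances, "score"))
print(normalize(vector_distances, "rank"))  # → [0.0, 1.0, 0.5]
```

Rank normalization discards the gaps between scores and keeps only their order. This can make it more robust when the two searches produce scores with very different distributions.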
### Available rerankers
LinearCombinationReranker
```python
from lancedb.rerankers import LinearCombinationReranker

# Same as the default
table.search("hello", query_type="hybrid").rerank(
    normalize="score",
    reranker=LinearCombinationReranker(),
).to_pandas()

# With custom params
reranker = LinearCombinationReranker(weight=0.3, fill=1.0)
table.search("hello", query_type="hybrid").rerank(
    normalize="score",
    reranker=reranker,
).to_pandas()
```
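To make `weight` and `fill` concrete, here is a plain-Python sketch of a weighted linear combination over normalized distances, where lower is better. It rests on these assumptions:

- `weight` applies to the vector score and `1 - weight` to the FTS score.
- `fill` stands in for a missing score, which happens when a row was returned by only one of the two searches.

Treat these semantics as an approximation, not LanceDB's exact formula.

```python
def combine(vector_dist, fts_dist, weight=0.7, fill=1.0):
    """Blend two normalized distances (lower = better) into one relevance score.

    A missing score (None) is replaced by `fill`; with fill=1.0 a row found by
    only one search is treated as maximally distant in the other.
    """
    v = fill if vector_dist is None else vector_dist
    f = fill if fts_dist is None else fts_dist
    # Convert the blended distance back into a relevance (higher = better).
    return 1.0 - (weight * v + (1.0 - weight) * f)


# row_id -> (normalized vector distance, normalized FTS distance)
candidates = {
    "a": (0.1, 0.2),   # found by both searches
    "b": (0.0, None),  # vector-only hit
    "c": (None, 0.0),  # FTS-only hit
}

for w in (0.7, 0.3):
    ranked = sorted(candidates, key=lambda r: combine(*candidates[r], weight=w), reverse=True)
    print(w, ranked)
```

With the default `weight=0.7`, the vector-only hit `b` outranks the FTS-only hit `c`. With `weight=0.3`, as in the custom-params example, `c` overtakes `b`.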

Cohere Reranker
```python
from lancedb.rerankers import CohereReranker

# Default model; English and multilingual models are supported. See the docstring for other params.
table.search("hello", query_type="hybrid").rerank(
    normalize="rank",  # "score" or "rank"
    reranker=CohereReranker(),
).to_pandas()
```

CrossEncoderReranker

```python
from lancedb.rerankers import CrossEncoderReranker

table.search("hello", query_type="hybrid").rerank(
    normalize="rank",
    reranker=CrossEncoderReranker(),
).to_pandas()
```
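Model-based rerankers such as the Cohere and cross-encoder ones score each (query, document) pair jointly and then reorder the candidates by that score. The sketch below shows only that control flow. `score_pair` is a hypothetical stand-in for the real model call, implemented here as a toy word-overlap score. The `text` field name is also an assumption.

```python
from typing import Callable, Dict, List


def rerank_by_pair_score(
    query: str,
    candidates: List[Dict],
    score_pair: Callable[[str, str], float],
    text_field: str = "text",
) -> List[Dict]:
    """Score every (query, candidate text) pair and sort best-first."""
    scored = [
        {**row, "_relevance_score": score_pair(query, row[text_field])}
        for row in candidates
    ]
    return sorted(scored, key=lambda row: row["_relevance_score"], reverse=True)


def toy_overlap_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a cross-encoder: fraction of query words found in text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


rows = [
    {"_rowid": 1, "text": "goodbye world"},
    {"_rowid": 2, "text": "hello there world"},
]
for row in rerank_by_pair_score("hello world", rows, toy_overlap_score):
    print(row["_rowid"], row["_relevance_score"])
```

Scoring pairs jointly is what makes these rerankers more accurate, and also what makes them slower. This is why they are typically applied only to the short candidate list that a hybrid search returns.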

## Using a custom Reranker
```python
import pyarrow as pa

from lancedb.rerankers import Reranker


class CustomReranker(Reranker):
    def rerank_hybrid(self, query: str, vector_results: pa.Table, fts_results: pa.Table) -> pa.Table:
        # Merge with the built-in helper (or use custom combination logic)
        combined_res = self.merge_results(vector_results, fts_results)
        # Custom rerank logic here
        return combined_res
```
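`merge_results` is the built-in helper that the custom reranker above calls. The following plain-Python sketch shows roughly what such a merge does. It assumes that each result row carries a `_rowid`, and that a row returned by both searches should appear once with both of its scores kept. The function name and the `_distance`/`score` field names are illustrative assumptions, not LanceDB's implementation.

```python
def merge_by_rowid(vector_results, fts_results):
    """Union two result lists, de-duplicating on `_rowid`.

    Rows found by both searches are merged into one new dict, so the vector
    distance and the FTS score end up side by side. Order: vector hits first,
    then FTS-only hits. The input dicts are not modified.
    """
    merged = {}
    for row in vector_results + fts_results:
        merged.setdefault(row["_rowid"], {}).update(row)
    return list(merged.values())


vector_results = [
    {"_rowid": 1, "item": "foo", "_distance": 0.1},
    {"_rowid": 2, "item": "bar", "_distance": 0.4},
]
fts_results = [
    {"_rowid": 2, "item": "bar", "score": 3.2},
    {"_rowid": 3, "item": "baz", "score": 1.5},
]
for row in merge_by_rowid(vector_results, fts_results):
    print(row)
```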

- [x] Expand testing
- [x] Make sure usage makes sense
- [x] Run simple benchmarks for correctness (seeing weird results from the Cohere reranker in the toy example)
- Support diverse rerankers by default:
  - [x] Cross encoding
  - [x] Cohere
  - [x] Reciprocal Rank Fusion
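Reciprocal Rank Fusion, listed above, ranks documents by summing 1 / (k + rank) over every result list they appear in. Below is a sketch of the standard formulation, using the commonly cited k = 60 and 1-based ranks. LanceDB's own RRF reranker may differ in details such as the default `k` or its input types.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of row ids using RRF.

    Each list is ordered best-first; a row at 1-based position r contributes
    1 / (k + r). A row missing from a list gets no contribution from it.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, row_id in enumerate(results, start=1):
            scores[row_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_ranking = [1, 2, 3]  # row ids, best first
fts_ranking = [3, 2, 4]
for row_id, score in reciprocal_rank_fusion([vector_ranking, fts_ranking]):
    print(row_id, round(score, 5))
```

Row 3 slightly edges out row 2 in this example: ranking first in one list and third in the other beats ranking second in both. RRF only needs ranks, so it sidesteps score normalization entirely.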

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
Co-authored-by: Prashanth Rao <35005448+prrao87@users.noreply.github.com>

2024-01-30 19:10:33 +05:30

| Name | Last commit | Date |
|---|---|---|
| .github | doc: use snippet for rust code example and make sure rust examples run through CI (#885) | 2024-01-28 14:30:30 -08:00 |
| | chore: set error handling to immediate (#686) | 2023-12-06 14:20:46 -08:00 |
| docs | feat(python): Hybrid search & Reranker API (#824) | 2024-01-30 19:10:33 +05:30 |
| node | chore: convert all js doc test to use snippet. (#881) | 2024-01-28 11:39:25 -08:00 |
| nodejs | feat(napi): Issue queries as node SDK (#868) | 2024-01-25 22:14:14 -08:00 |
| python | feat(python): Hybrid search & Reranker API (#824) | 2024-01-30 19:10:33 +05:30 |
| rust | doc: use snippet for rust code example and make sure rust examples run through CI (#885) | 2024-01-28 14:30:30 -08:00 |
| .bumpversion.cfg | Bump version: 0.4.5 → 0.4.6 | 2024-01-26 22:40:36 +00:00 |
| .gitignore | feat: rework NodeJS SDK using napi (#847) | 2024-01-23 15:14:45 -08:00 |
| .pre-commit-config.yaml | Handle NaN input data (#241) | 2023-07-04 20:00:46 -07:00 |
| Cargo.toml | chore: upgrade lance, pylance and datafusion (#879) | 2024-01-27 12:31:38 -08:00 |
| docker-compose.yml | add health check to wait for all service ready before next step (#501) | 2023-09-18 15:17:45 -04:00 |
| LICENSE | initial commit | 2023-03-17 18:15:19 -07:00 |
| README.md | docs: Add badges (#694) | 2023-12-08 20:55:04 +05:30 |

README.md

Developer-friendly, serverless vector database for AI applications

LanceDB is an open-source vector-search database built on persistent storage, which greatly simplifies the retrieval, filtering and management of embeddings.

The key features of LanceDB include:

- Production-scale vector search with no servers to manage.
- Store, query and filter vectors, metadata and multi-modal data (text, images, videos, point clouds, and more).
- Support for vector similarity search, full-text search and SQL.
- Native Python and JavaScript/TypeScript support.
- Zero-copy, automatic versioning: manage versions of your data without needing extra infrastructure.
- GPU support for building vector indexes(*).
- Ecosystem integrations with LangChain 🦜️🔗, LlamaIndex 🦙, Apache Arrow, Pandas, Polars, DuckDB and more on the way.

LanceDB's core is written in Rust 🦀 and is built using Lance, an open-source columnar format designed for performant ML workloads.

Quick Start

JavaScript

```shell
npm install vectordb
```

```javascript
const lancedb = require('vectordb');

// CommonJS modules don't allow top-level await, so wrap the calls in an async function
(async () => {
  const db = await lancedb.connect('data/sample-lancedb');
  const table = await db.createTable('vectors', [
    { id: 1, vector: [0.1, 0.2], item: "foo", price: 10 },
    { id: 2, vector: [1.1, 1.2], item: "bar", price: 50 },
  ]);

  const query = table.search([0.1, 0.3]).limit(2);
  const results = await query.execute();
  console.log(results);
})();
```

Python

```shell
pip install lancedb
```

```python
import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)
table = db.create_table(
    "my_table",
    data=[
        {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
        {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
    ],
)
result = table.search([100, 100]).limit(2).to_pandas()
```
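Conceptually, `table.search([100, 100]).limit(2)` returns the rows whose vectors lie closest to the query vector. The sketch below reproduces this with an exhaustive scan in plain Python, assuming Euclidean (L2) distance as the metric. LanceDB's reported `_distance` value may be scaled differently (for example, squared), but a monotonic transform like squaring does not change the ordering. The helper names are hypothetical.

```python
import math

data = [
    {"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
    {"vector": [5.9, 26.5], "item": "bar", "price": 20.0},
]


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, query, limit):
    """Exhaustive k-nearest-neighbour scan: score every row, keep the closest."""
    scored = [{**row, "_distance": l2(row["vector"], query)} for row in rows]
    return sorted(scored, key=lambda row: row["_distance"])[:limit]


for row in brute_force_search(data, [100, 100], limit=2):
    print(row["item"], round(row["_distance"], 2))
```

Here `bar` comes first because its vector is closer to `[100, 100]` than `foo`'s is.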

Blogs, Tutorials & Videos

Languages

- Rust 42.7%
- Python 42%
- TypeScript 14.2%
- Shell 0.6%
- Java 0.3%
