Support hybrid search in both rust and node SDKs.
- Adds a new rerankers package to rust LanceDB, with the implementation
of the default RRF reranker
- Adds a new hybrid package to lancedb, with some helper methods related
to hybrid search such as normalizing scores and converting score column
to rank columns
- Adds capability to LanceDB VectorQuery to perform hybrid search if it
has both a nearest vector and full text search parameters.
- Adds wrappers for reranker implementations to nodejs SDK.
Additional rerankers will be added in followup PRs
https://github.com/lancedb/lancedb/issues/1921
---
Notes about how the rust rerankers are wrapped for calling from JS:
I wanted to keep the core reranker logic, and the invocation of the
reranker by the query code, in Rust. This aligns with the philosophy of
the new node SDK where it's just a thin wrapper around Rust. However, I
also wanted to have support for users who want to add custom rerankers
written in Javascript.
When we add a reranker to the query from Javascript, it adds a special
Rust reranker that has a callback to the Javascript code (which could
then turn around and call an underlying Rust reranker implementation if
desired). This adds a bit of complexity, but overall I think it moves us
in the right direction of having the majority of the query logic in the
underlying Rust SDK while keeping the option open to support custom
Javascript Rerankers.
binary vectors and hamming distance can work on only IVF_FLAT, so
introduce them all in this PR.
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
* Upgrades our toolchain file to v1.83.0, since many dependencies now
have MSRV of 1.81.0
* Reverts Rust changes from #1946 that were working around this in a
dumb way
* Adding an MSRV check
* Reduce MSRV back to 1.78.0
### Changes to sync API
* Updated `LanceTable` and `LanceDBConnection` reprs
* Add `storage_options`, `data_storage_version`, and
`enable_v2_manifest_paths` to sync create table API.
* Add `storage_options` to `open_table` in sync API.
* Add `list_indices()` and `index_stats()` to sync API
* `create_table()` will now create only 1 version when data is passed.
Previously it would always create two versions: 1 to create an empty
table and 1 to add data to it.
### Changes to async API
* Add `embedding_functions` to async `create_table()` API.
* Added `head()` to async API
### Refactors
* Refactor index parameters into dataclasses so they are easier to use
from Python
* Moved most tests to use an in-memory DB so we don't need to create so
many temp directories
Closes#1792Closes#1932
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
* One doctest was running for > 60 seconds in CI, since it was
(unsuccessfully) trying to connect to LanceDB Cloud.
* Fixed the example for `Query::full_text_query()`, which was incorrect.
Previously, whenever `Table.add()` was called, we would write and
re-open the underlying dataset. This was bad for performance, as it
reset the table cache and initiated a lot of IO. It also could be the
source of bugs, since we didn't necessarily pass all the necessary
connection options down when re-opening the table.
Closes#1655
Hello, this is a simple PR that supports `rustls-tls` feature.
The `reqwest`\`s default TLS `default-tls` is enabled by default, to
dismiss the side-effect.
The user can use `rustls-tls` like this:
```toml
lancedb = { version = "*", default-features = false, features = ["rustls-tls"] }
```