Lance now supports full-text search (FTS), so this adds it to the lancedb
Python, TypeScript and Rust SDKs.
For Python, tantivy-based FTS remains the default because the Lance
FTS index still lacks some of tantivy's features.
For Python:
- Support creating a Lance-based FTS index
- Support specifying the columns for full text search (only available for
the Lance-based FTS index)
For TypeScript:
- Change the search method so that it can accept both string and vector
- Support full text search
For Rust:
- Support full text search
Other changes:
- Update the FTS doc
BREAKING CHANGE:
- for Python, this renames the score column attached to FTS results from
"score" to "_score", which could be a breaking change for users that rely
on the scores
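
For callers that need to work on both sides of the rename, a minimal sketch of reading the score under either name. It assumes result rows have already been converted to plain dicts, and `fts_score` is a hypothetical helper, not part of lancedb:

```python
def fts_score(row):
    """Return the FTS score from a result row given as a dict.

    Hypothetical helper: prefers the new "_score" column and falls back
    to the old "score" column for results produced before the rename.
    """
    if "_score" in row:
        return row["_score"]
    if "score" in row:
        return row["score"]
    raise KeyError("row has neither '_score' nor 'score'")


# Illustrative rows only, not real query output.
new_row = {"text": "puppy", "_score": 1.5}
old_row = {"text": "puppy", "score": 1.5}
```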
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
It's useful to see the underlying query plan for debugging purposes.
This exposes LanceScanner's `explain_plan` function. Addresses #1288
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
This allows users to specify URIs like:
```
s3+ddb://my_bucket/path?ddbTableName=myCommitTable
```
and it will support concurrent writes in S3.
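
To illustrate how such a URI breaks down, here is a small sketch using only Python's standard `urllib.parse`. It is not how lancedb parses the URI internally; `split_ddb_uri` is a hypothetical helper, and `ddbTableName` is assumed to name the DynamoDB table:

```python
from urllib.parse import parse_qs, urlsplit


def split_ddb_uri(uri):
    """Split an s3+ddb URI into (bucket, path, table name)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3+ddb":
        raise ValueError(f"expected an s3+ddb URI, got scheme {parts.scheme!r}")
    # parse_qs maps each query key to a list of values.
    tables = parse_qs(parts.query).get("ddbTableName")
    if not tables:
        raise ValueError("missing ddbTableName query parameter")
    return parts.netloc, parts.path.lstrip("/"), tables[0]


bucket, path, commit_table = split_ddb_uri(
    "s3+ddb://my_bucket/path?ddbTableName=myCommitTable"
)
```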
* [x] Add dynamodb integration tests
* [x] Add modifications to get it working in Python sync API
* [x] Add a section to the documentation describing how to configure it.
Closes #534
---------
Co-authored-by: universalmind303 <cory.grinstead@gmail.com>
- fix some clippy errors from CI running a different toolchain.
- add some safety notes about some unsafe blocks.
- lock the toolchain so that it is consistent across dev and CI.
Most of the time we don't need to reload. Taking the write lock and
performing IO while holding it is not an ideal pattern.
This PR tries to make the critical section of `.write()` happen less
frequently.
This isn't the ideal solution, which would not lock until the new
dataset has been loaded, but that would require too much refactoring.
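
The idea can be sketched roughly as follows. This is a simplified Python illustration rather than the actual change: `CachedDataset`, `load` and `is_stale` are hypothetical names, and a plain mutex stands in for the reader-writer lock because Python's standard library has none. As in the PR, the reload still happens while the lock is held.

```python
import threading


class CachedDataset:
    """Hypothetical cache that reloads only when the data is stale."""

    def __init__(self, load, is_stale):
        self._load = load          # performs the (slow) IO
        self._is_stale = is_stale  # cheap staleness check
        self._lock = threading.Lock()
        self._value = load()

    def get(self):
        # Fast path: most calls need no reload and take no lock at all.
        value = self._value
        if not self._is_stale(value):
            return value
        # Slow path: take the exclusive lock only when a reload looks needed.
        with self._lock:
            # Re-check, since another thread may have reloaded in the meantime.
            if self._is_stale(self._value):
                self._value = self._load()
            return self._value


# Tiny demonstration with an in-memory "latest version" counter.
latest = [1]
loads = []


def load():
    loads.append(latest[0])
    return latest[0]


cache = CachedDataset(load, is_stale=lambda v: v != latest[0])
cache.get()
cache.get()      # still fresh, no reload
latest[0] = 2
cache.get()      # stale, reloads once
```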
- changed the error message for table.search when the query vector has
the wrong dimension
- added missing fields for listIndices and indexStats to be consistent
with the Python API; the node integration test will be updated to match
Part of https://github.com/lancedb/lancedb/issues/994.
Adds the ability to use the OpenAI embedding functions.
The example can be run with:
```sh
> export OPENAI_API_KEY="sk-..."
> cargo run --example openai --features=openai
```
which should output
```
Closest match: Winter Parka
```
The optimize function is crucial for getting good performance when
building a large-scale dataset, but it was only exposed in Rust
(many sync Python users are probably doing this via to_lance today).
This PR adds the optimize function to Node.js and to Python.
I left the function marked experimental because I think there will
likely be changes to optimization (e.g. if we add features like
"optimize on write"). I also only exposed the `cleanup_older_than`
configuration parameter, since it is very commonly used; the rest have
sensible defaults, and we have no concrete reason to recommend
different values for them anyway.
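
A usage sketch for Python, assuming the table object exposes `optimize(cleanup_older_than=...)` and that the parameter takes a `datetime.timedelta`. `optimize_table` and the stub table used to exercise it are hypothetical and only there so the sketch runs without a database:

```python
from datetime import timedelta


def optimize_table(table, keep_days=7):
    """Call optimize on `table` with a cleanup threshold of `keep_days` days."""
    # Assumption: cleanup_older_than is a timedelta age threshold.
    table.optimize(cleanup_older_than=timedelta(days=keep_days))


class _StubTable:
    """Stand-in for a real table; records the arguments it was called with."""

    def __init__(self):
        self.calls = []

    def optimize(self, cleanup_older_than=None):
        self.calls.append(cleanup_older_than)


stub = _StubTable()
optimize_table(stub, keep_days=1)
```

With a real table, the same call would be `optimize_table(table, keep_days=1)`.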