I only modified docs pages that aren't touched by any existing unmerged
PRs, so hopefully there are no merge conflicts!
1. The `tantivy-py` version specified in the docs doesn't work (`pip
install` fails), but with the latest versions of pip and wheel installed
on my Mac, I was able to just `pip install tantivy`, and FTS works great
for me. I updated the docs page accordingly in 7ca4b757ce, but I can
pin another specific version if this breaks any tests.
2. The Python `.add()` method should take a list of dicts as the first
option (to align with the JS API); additionally, users can pass an
existing pandas DataFrame to add to a table. Hope this makes sense.
3. I've had multiple conversations with users who don't realize that the
terms "exhaustive", "flat" and "KNN" all refer to the same kind of
search, so I've updated the wording of this section to clarify this.
4. Fixed typos and improved clarity in the ANN indexes page.
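As an illustration of why these terms are interchangeable, here is a minimal brute-force sketch in plain Python. The names `l2_distance` and `flat_knn` are hypothetical and not part of LanceDB's API. The point is only that an exhaustive (flat, kNN) search scores every stored vector against the query, so it is exact but its cost grows linearly with the table size, whereas an ANN index inspects only a subset of the vectors.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flat_knn(vectors: List[Sequence[float]], query: Sequence[float],
             k: int) -> List[Tuple[int, float]]:
    """Exhaustive ("flat", brute-force kNN) search.

    Every vector is scored against the query; the k closest are returned
    as (row index, distance) pairs sorted by ascending distance.
    """
    scored = [(i, l2_distance(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

With `vectors = [[0, 0], [1, 1], [5, 5], [0.5, 0]]`, calling `flat_knn(vectors, [0, 0], 2)` returns `[(0, 0.0), (3, 0.5)]`.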
We have experimental support for prefiltering (without ANN) in pylance.
This means that a filter can now be applied BEFORE the vector search is
performed. Enable it by passing `prefilter=True` to the query's
`.where()` method, i.e. `.where(filter_string, prefilter=True)`.
Limitations:
- When connecting to LanceDB Cloud, `prefilter=True` raises NotImplemented
- When an ANN index is present, `prefilter=True` raises NotImplemented
- This option is not available for full-text search queries
- This option is not available for empty search queries (filter/projection only)
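To make the before/after distinction concrete, the sketch below models both orders on a tiny in-memory table. It is a hypothetical illustration in plain Python, not pylance's implementation. `search`, the row layout and the predicate are all assumptions, and the non-prefilter path is modeled as selecting the top `k` rows by distance first and filtering afterwards.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical row layout: a dict with a "vector" column plus metadata.
Row = Dict[str, object]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(rows: List[Row], query: Sequence[float], k: int,
           predicate: Callable[[Row], bool], prefilter: bool) -> List[Row]:
    """Exhaustive kNN with the filter applied either before or after ranking."""
    def by_distance(row: Row) -> float:
        return _distance(row["vector"], query)

    if prefilter:
        # Filter first, then rank only the rows that passed.
        candidates = [r for r in rows if predicate(r)]
        return sorted(candidates, key=by_distance)[:k]
    # Post-filter: rank everything, keep the top k, then drop failing rows.
    top_k = sorted(rows, key=by_distance)[:k]
    return [r for r in top_k if predicate(r)]


rows = [
    {"id": 1, "vector": [0.0], "color": "red"},
    {"id": 2, "vector": [0.1], "color": "red"},
    {"id": 3, "vector": [5.0], "color": "blue"},
]
is_blue = lambda r: r["color"] == "blue"
print([r["id"] for r in search(rows, [0.0], 2, is_blue, prefilter=True)])   # [3]
print([r["id"] for r in search(rows, [0.0], 2, is_blue, prefilter=False)])  # []
```

Post-filtering returns nothing here because both nearest rows fail the filter. With a selective filter, post-filtering can return fewer than `k` results or none, which is the case prefiltering addresses.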
Additional changes in this PR:
- Bump pylance to v0.8.0, which supports the experimental prefiltering.
---------
Co-authored-by: Chang She <chang@lancedb.com>
The `attr` project is unrelated to `attrs`, which also provides the
`attr` namespace (see also <https://hynek.me/articles/import-attrs/>).
It used to _usually_ work because attrs is a dependency of aiohttp and
somehow took precedence over `attr`'s `attr`.
Yes, sorry, it's a mess.
This PR upgrades lance to `0.7.5`, which includes fixes for searching an
empty dataset.
This PR also adds two tests to the Node SDK to make sure searching an
empty dataset does not throw.
Co-authored-by: rmeng <rob@lancedb.com>
In #486, `connect` started converting paths into URIs. However, that PR
didn't handle relative paths and prepended `file://` to them.
This PR makes the parsing strategy more predictable: if a path is
provided instead of a URL, we don't do anything special with it.
`engine` and engine parameters may only be specified when a URL with a
scheme is provided.
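The parsing rule can be sketched as follows. This is a hypothetical Python helper, not LanceDB's function, built on the standard library's `urllib.parse`. A target without a scheme is returned verbatim, with no `file://` prefix, and an `engine` parameter is rejected unless a scheme is present.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_connect_target(target: str) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Split a hypothetical connect() argument into (base, engine, engine_params)."""
    parsed = urlparse(target)
    # Keep the last value for each query key, e.g. "?engine=ddb" -> {"engine": "ddb"}.
    params = {key: values[-1] for key, values in parse_qs(parsed.query).items()}
    if not parsed.scheme:
        if "engine" in params:
            raise ValueError("engine may only be specified for a URL with a scheme")
        # Plain path (relative or absolute): pass it through untouched.
        return target, None, {}
    engine = params.pop("engine", None)
    # Drop the query string from the base URI; its parameters are returned separately.
    base = parsed._replace(query="").geturl()
    return base, engine, params
```

Here are two example calls:

- `parse_connect_target("./data/my-db")` returns `("./data/my-db", None, {})`.
- `parse_connect_target("s3://bucket?engine=ddb&ddbTableName=t")` returns `("s3://bucket", "ddb", {"ddbTableName": "t"})`.

The sketch does not special-case Windows drive letters, which `urlparse` reports as a one-letter scheme.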
Co-authored-by: rmeng <rob@lancedb.com>
AWS integration tests were flaky because we didn't wait for the
services to become healthy: we only waited for the localstack service.
This PR also waits for its sub-services.
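The fix amounts to polling every sub-service, not just the top-level localstack container, until each one reports healthy. Here is a minimal Python sketch, assuming each service exposes some boolean health check. The `checks` mapping, the service names and the timeouts are hypothetical, and the sketch leaves out how a check actually probes localstack.

```python
import time
from typing import Callable, Dict


def wait_until_healthy(checks: Dict[str, Callable[[], bool]],
                       timeout_s: float = 60.0, interval_s: float = 1.0) -> None:
    """Poll every service's health check until all pass or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    pending = dict(checks)
    while pending:
        # Keep only the services whose check still fails.
        pending = {name: check for name, check in pending.items() if not check()}
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"services not healthy: {sorted(pending)}")
        time.sleep(interval_s)
```

Here is a typical call, with hypothetical check functions:

```python
wait_until_healthy({"dynamodb": ddb_ok, "s3": s3_ok})
```

The call blocks until both checks return `True`. It raises `TimeoutError` naming the services that are still unhealthy.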
# WARNING: specifying engine is NOT a publicly supported feature in
lancedb yet. THE API WILL CHANGE.
This PR exposes DynamoDB-based commits to `vectordb` and the JS SDK
(Python will follow in another PR since it's on a different release
track).
This PR also adds an AWS integration test using `localstack`.
## What?
This PR adds URI parameters to the DB connection string. Users may
specify `engine` in the connection string to tell LanceDB to use an
external store when reading and writing a table. Users may also pass
any parameters required by the commitStore in the connection string;
these parameters are propagated to Lance.
For example,
```
vectordb.connect("s3://my-db-bucket?engine=ddb&ddbTableName=my-commit-table")
```
automatically converts the table path to
```
s3+ddb://my-db-bucket/my_table.lance?&ddbTableName=my-commit-table
```
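The conversion can be sketched with Python's `urllib.parse`. The helper name `table_uri` is hypothetical, and this is not the `vectordb` implementation. It assumes the `engine` value is appended to the scheme with `+` and every other query parameter is forwarded unchanged. Unlike the example output above, it emits `?ddbTableName=...` without the stray `&`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def table_uri(db_uri: str, table_name: str) -> str:
    """Build a hypothetical table URI from a DB connection string."""
    parts = urlsplit(db_uri)
    query = parse_qsl(parts.query)
    # Pull out the engine; everything else is forwarded as-is.
    engine = next((v for k, v in query if k == "engine"), None)
    rest = [(k, v) for k, v in query if k != "engine"]
    scheme = f"{parts.scheme}+{engine}" if engine else parts.scheme
    path = parts.path.rstrip("/") + f"/{table_name}.lance"
    return urlunsplit((scheme, parts.netloc, path, urlencode(rest), ""))
```

Without an `engine` parameter, the URI keeps its original scheme. For example, `table_uri("s3://bucket/prefix", "t")` gives `"s3://bucket/prefix/t.lance"`.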
1. Support persistent embedding functions so users can search using just
a query string
2. Add fixed-size list conversion for multiple vector columns
3. Add support for empty queries (just apply select/where/limit)
4. Refactor and simplify some of the data prep code
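Item 3 can be pictured with a small sketch. `empty_query` and the row layout are assumptions, not LanceDB's API. With no query vector there is nothing to rank, so the query reduces to filtering, limiting and projecting rows in storage order.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


def empty_query(rows: List[Row],
                where: Optional[Callable[[Row], bool]] = None,
                select: Optional[Sequence[str]] = None,
                limit: Optional[int] = None) -> List[Row]:
    """Hypothetical query without a vector: filter, limit, then project."""
    # No vector means no distance computation and no ranking step.
    out = [r for r in rows if where is None or where(r)]
    if limit is not None:
        out = out[:limit]
    if select is not None:
        out = [{col: r[col] for col in select} for r in out]
    return out
```

For example, with rows whose `price` values are 5, 15 and 25, the following call returns only the `id` of the first row priced above 10:

```python
empty_query(rows, where=lambda r: r["price"] > 10, select=["id"], limit=1)
```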
---------
Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Combine delete and append to make a temporary update feature that is
only enabled for local Python LanceDB.
This is temporary because it first has to load the data that matches
the where clause into memory, which is technically unbounded.
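The delete-then-append approach can be sketched on an in-memory list of rows. This `update` is a hypothetical illustration, not the lancedb method. It shows why memory use is unbounded: every row matching the where clause is materialized before anything is written back.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def update(rows: List[Row], where: Callable[[Row], bool],
           values: Dict[str, object]) -> List[Row]:
    """Hypothetical update built from delete + append; returns the new table contents."""
    # 1. Load every matching row into memory (as copies): the unbounded step.
    matched = [dict(r) for r in rows if where(r)]
    # 2. "Delete": keep only the rows that do not match.
    kept = [r for r in rows if not where(r)]
    # 3. Apply the new values and "append" the rewritten rows at the end.
    for r in matched:
        r.update(values)
    return kept + matched
```

One side effect, at least in this sketch, is that updated rows move to the end of the table because they are re-appended.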
---------
Co-authored-by: Chang She <chang@lancedb.com>
This reverts commit 87e9a0250f.
I triggered the nodejs release commit GHA by mistake. Reverting it.
The tag will be removed manually.
Co-authored-by: Chang She <chang@lancedb.com>
Previously, if you needed to add a column to a table, you had to
rewrite the whole table. Instead, we now use the merge functionality
from the Lance format to incrementally add columns from another table
or DataFrame.
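Logically, adding columns this way is a left join on a key column. The sketch below models only that logical result on in-memory rows. `merge_columns` and its `on` parameter are hypothetical. The sketch says nothing about how Lance physically stores the new columns, which is what avoids the full rewrite.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_columns(left: List[Row], right: List[Row], on: str) -> List[Row]:
    """Left-join `right` onto `left` by the key column `on`, adding only new columns."""
    by_key = {r[on]: r for r in right}
    # Columns contributed by `right`, in first-seen order, excluding the join key.
    new_cols = [c for c in dict.fromkeys(k for r in right for k in r) if c != on]
    merged = []
    for row in left:
        match = by_key.get(row[on], {})
        # Existing columns are untouched; unmatched rows get None for new columns.
        additions = {c: match.get(c) for c in new_cols if c not in row}
        merged.append({**row, **additions})
    return merged
```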
---------
Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Previously, the temporary restore feature required copying data. The
new restore feature in pylance does not.
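One way to see why restore no longer needs to copy data: suppose each table version is just a manifest listing immutable data files. Then restoring an old version only requires committing a new manifest that references the old files. The toy model below assumes exactly that. The `ToyDataset` class and the file names are hypothetical and are not pylance's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDataset:
    """Toy model: each version is a manifest listing immutable data files."""
    versions: Dict[int, List[str]] = field(default_factory=dict)

    def commit(self, files: List[str]) -> int:
        # A new version only writes a new (small) manifest.
        version = max(self.versions, default=0) + 1
        self.versions[version] = list(files)
        return version

    def restore(self, version: int) -> int:
        # Restore = commit a manifest that references the old version's files.
        # No data file is read or copied, and later versions stay in history.
        return self.commit(self.versions[version])
```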
---------
Co-authored-by: Chang She <chang@lancedb.com>
Co-authored-by: Weston Pace <weston.pace@gmail.com>