Fix broken link to embedding functions
testing: broken link was verified after local docs build to have been
repaired
---------
Co-authored-by: Chang She <chang@lancedb.com>
This PR adds an overview of embeddings docs:
- 2 ways to vectorize your data using lancedb - explicit & implicit
- explicit - manually vectorize your data using `wit_embedding` function
- Implicit - automatically vectorize your data as it comes by ingesting
your embedding function details as table metadata
- Multi-modal example w/ disappearing embedding function
Add `to_list` to return query results as list of python dict (so we're
not too pandas-centric). Closes#555
Add `to_pandas` API and add deprecation warning on `to_df`. Closes#545
Co-authored-by: Chang She <chang@lancedb.com>
I only modified those docs pages that are untouched in existing unmerged
PRs, so hopefully there are no merge conflicts!
1. The `tantivy-py` version specified in the docs doesn't work (pip
install fails), but with the latest version of pip and wheel installed
on my Mac, I was able to just `pip install tantivy` and FTS works great
for me. I updated the docs page to include this in
7ca4b757ce but can always modify to
another specific version in case this breaks any tests.
2. The `.add()` method for Python should take in a list of dicts as the
first option (to also align with the JS API), and additionally, users
can pass an existing pandas DataFrame to add to a table. Hope this makes
sense.
3. I've had multiple conversations with users who are unclear that the
terms "exhaustive", "flat" and "KNN" are all the same kind of search, so
I've updated the verbiage of this section to clarify this.
4. Fixed typos and improved clarity in the ANN indexes page.
We have experimental support for prefiltering (without ANN) in pylance.
This means that we can now apply a filter BEFORE vector search is
performed. This can be done via the `.where(filter_string,
prefilter=True)` kwargs of the query.
Limitations:
- When connecting to LanceDB cloud, `prefilter=True` will raise
NotImplemented
- When an ANN index is present, `prefilter=True` will raise
NotImplemented
- This option is not available for full text search query
- This option is not available for empty search query (just
filter/project)
Additional changes in this PR:
- Bump pylance version to v0.8.0 which supports the experimental
prefiltering.
---------
Co-authored-by: Chang She <chang@lancedb.com>
The `attr` project is unrelated to `attrs` that also provides the `attr`
namespace (see also <https://hynek.me/articles/import-attrs/>).
It used to _usually_ work, because attrs is a dependency of aiohttp and
somehow took precedence over `attr`'s `attr`.
Yes, sorry, it's a mess.
This PR upgrade lance to `0.7.5`, which include fixes for searching an
empty dataset.
This PR also adds two tests in node SDK to make sure searching empty
dataset do no throw
Co-authored-by: rmeng <rob@lancedb.com>
in #486 `connect` started converting path into uri. However, the PR
didn't handle relative path and appended `file://` to relative path.
This PR changes the parsing strat to be more rational. If a path is
provided instead of url, we do not try anythinng special.
engine and engine params may only be specified when a url with schema is
provided
Co-authored-by: rmeng <rob@lancedb.com>
aws integration tests are flaky because we didn't wait for the services
to become healthy. (we only waited for the localstack service, this PR
adds wait for sub services)