When running `index_stats()` for an FTS index, users would get the
deserialization error:
```
InvalidInput { message: "error deserializing index statistics: unknown variant `Inverted`, expected one of `IvfPq`, `IvfHnswPq`, `IvfHnswSq`, `BTree`, `Bitmap`, `LabelList`, `FTS` at line 1 column 24" }
```
This exposes the `LANCEDB_LOG` environment variable in node, so that
users can now turn on logging.
In addition, fixes a bug where only the top-level error from Rust was
being shown. This PR makes sure the full error chain is included in the
error message. In the future, will improve this so the error chain is
set on the [cause](https://nodejs.org/api/errors.html#errorcause)
property of JS errors https://github.com/lancedb/lancedb/issues/1779Fixes#1774
BREAKING CHANGE: default tokenizer no longer does stemming or stop-word
removal. Users should explicitly turn that option on in the future.
- upgrade lance to 0.19.1
- update the FTS docs
- update the FTS API
Upstream change notes:
https://github.com/lancedb/lance/releases/tag/v0.19.1
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Closes#1741
If we checkout a version, we need to make a `HEAD` request to get the
size of the manifest. The new `checkout_latest()` code path can skip
this IOP. This makes the refresh slightly faster.
## user story
fixes https://github.com/lancedb/lancedb/issues/1480https://github.com/invl/retry has not had an update in 8 years, one if
its sub-dependencies via requirements.txt
(https://github.com/pytest-dev/py) is no longer maintained and has a
high severity vulnerability (CVE-2022-42969).
retry is only used for a single function in the python codebase for a
deprecated helper function `with_embeddings`, which was created for an
older tutorial (https://github.com/lancedb/lancedb/pull/12) [but is now
deprecated](https://lancedb.github.io/lancedb/embeddings/legacy/).
## changes
i backported a limited range of functionality of the `@retry()`
decorator directly into lancedb so that we no longer have a dependency
to the `retry` package.
## tests
```
/Users/james/src/lancedb/python $ ruff check .
All checks passed!
/Users/james/src/lancedb/python $ pytest python/tests/test_embeddings.py
python/tests/test_embeddings.py .......s.... [100%]
================================================================ 11 passed, 1 skipped, 2 warnings in 7.08s ================================================================
```
* Adds nicer errors to remote SDK, that expose useful properties like
`request_id` and `status_code`.
* Makes sure the Python tracebacks print nicely by mapping the `source`
field from a Rust error to the `__cause__` field.
A few bugs uncovered by integration tests:
* We didn't prepend `/v1` to the Table endpoint URLs
* `/create_index` takes `metric_type` not `distance_type`. (This is also
an error in the OpenAPI docs.)
* `/create_index` expects the `metric_type` parameter to always be
lowercase.
* We were writing an IPC file message when we were supposed to send an
IPC stream message.
This PR ports over advanced client configuration present in the Python
`RestfulLanceDBClient` to the Rust one. The goal is to have feature
parity so we can replace the implementation.
* [x] Request timeout
* [x] Retries with backoff
* [x] Request id generation
* [x] User agent (with default tied to library version ✨ )
* [x] Table existence cache
* [ ] Deferred: ~Request id customization (should this just pick up OTEL
trace ids?)~
Fixes#1684
Resovles #1709. Adds `trust_remote_code` as a parameter to the
`TransformersEmbeddingFunction` class with a default of False. Updated
relevant documentation with the same.