* One doctest was running for > 60 seconds in CI, since it was
(unsuccessfully) trying to connect to LanceDB Cloud.
* Fixed the example for `Query::full_text_query()`, which was incorrect.
Previously, whenever `Table.add()` was called, we would write and
re-open the underlying dataset. This was bad for performance, as it
reset the table cache and initiated a lot of IO. It also could be the
source of bugs, since we didn't necessarily pass all the necessary
connection options down when re-opening the table.
Closes#1655
Hello, this is a simple PR that supports `rustls-tls` feature.
The `reqwest`\`s default TLS `default-tls` is enabled by default, to
dismiss the side-effect.
The user can use `rustls-tls` like this:
```toml
lancedb = { version = "*", default-features = false, features = ["rustls-tls"] }
```
Allows users to pass multiple query vector as part of a single query
plan. This just runs the queries in parallel without any further
optimization. It's mostly a convenience.
Previously, I think this was only handled by the sync Python remote API.
This makes it common across all SDKs.
Closes https://github.com/lancedb/lancedb/issues/1803
```python
>>> import lancedb
>>> import asyncio
>>>
>>> async def main():
... db = await lancedb.connect_async("./demo")
... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite")
... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas()
...
>>> asyncio.run(main())
query_index id vector _distance
0 2 2 [4.0, 5.0, 6.0] 0.0
1 1 2 [4.0, 5.0, 6.0] 0.0
2 0 1 [1.0, 2.0, 3.0] 0.0
```
we don't really need these trait in lancedb, but all fields in `Index`
implement the 2 traits, so do it for possibility to use `Index`
somewhere
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
* Replaces Python implementation of Remote SDK with Rust one.
* Drops dependency on `attrs` and `cachetools`. Makes `requests` an
optional dependency used only for embeddings feature.
* Adds dependency on `nest-asyncio`. This was required to get hybrid
search working.
* Deprecate `request_thread_pool` parameter. We now use the tokio
threadpool.
* Stop caching the `schema` on a remote table. Schema is mutable and
there's no mechanism in place to invalidate the cache.
* Removed the client-side resolution of the vector column. We should
already be resolving this server-side.
* `open_table` uses `POST` not `GET`
* `update` uses `predicate` key not `only_if`
* For FTS search, vector cannot be omitted. It must be passed as empty.
* Added logging of JSON request bodies to debug level logging.
Fixes several minor issues with Rust remote SDK:
* Delete uses `predicate` not `filter` as parameter
* Update does not return the row value in remote SDK
* Update takes tuples
* Content type returned by query node is wrong, so we shouldn't validate
it. https://github.com/lancedb/sophon/issues/2742
* Data returned by query endpoint is actually an Arrow IPC file, not IPC
stream.
When running `index_stats()` for an FTS index, users would get the
deserialization error:
```
InvalidInput { message: "error deserializing index statistics: unknown variant `Inverted`, expected one of `IvfPq`, `IvfHnswPq`, `IvfHnswSq`, `BTree`, `Bitmap`, `LabelList`, `FTS` at line 1 column 24" }
```
This exposes the `LANCEDB_LOG` environment variable in node, so that
users can now turn on logging.
In addition, fixes a bug where only the top-level error from Rust was
being shown. This PR makes sure the full error chain is included in the
error message. In the future, will improve this so the error chain is
set on the [cause](https://nodejs.org/api/errors.html#errorcause)
property of JS errors https://github.com/lancedb/lancedb/issues/1779Fixes#1774
BREAKING CHANGE: default tokenizer no longer does stemming or stop-word
removal. Users should explicitly turn that option on in the future.
- upgrade lance to 0.19.1
- update the FTS docs
- update the FTS API
Upstream change notes:
https://github.com/lancedb/lance/releases/tag/v0.19.1
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Closes#1741
If we checkout a version, we need to make a `HEAD` request to get the
size of the manifest. The new `checkout_latest()` code path can skip
this IOP. This makes the refresh slightly faster.