* Test that we can insert subschemas (omit nullable columns) in Python.
* More work is needed to support this in Node. See:
https://github.com/lancedb/lancedb/issues/1832
* Test that we can insert data with nullable schema but no nulls in
non-nullable schema.
* Add `"null"` option for `on_bad_vectors` where we fill with null if
the vector is bad.
* Make null values not considered bad if the field itself is nullable.
Allows users to pass multiple query vector as part of a single query
plan. This just runs the queries in parallel without any further
optimization. It's mostly a convenience.
Previously, I think this was only handled by the sync Python remote API.
This makes it common across all SDKs.
Closes https://github.com/lancedb/lancedb/issues/1803
```python
>>> import lancedb
>>> import asyncio
>>>
>>> async def main():
... db = await lancedb.connect_async("./demo")
... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite")
... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas()
...
>>> asyncio.run(main())
query_index id vector _distance
0 2 2 [4.0, 5.0, 6.0] 0.0
1 1 2 [4.0, 5.0, 6.0] 0.0
2 0 1 [1.0, 2.0, 3.0] 0.0
```
This is done as setup for a PR that will fix the OpenAI dependency
issue.
* [x] FTS examples
* [x] Setup mock openai
* [x] Ran `npm audit fix`
* [x] sentences embeddings test
* [x] Double check formatting of docs examples
Hello team,
I'm the maintainer of [Anteon](https://github.com/getanteon/anteon). We
have created Gurubase.io with the mission of building a centralized,
open-source tool-focused knowledge base. Essentially, each "guru" is
equipped with custom knowledge to answer user questions based on
collected data related to that tool.
I wanted to update you that I've manually added the [LanceDB
Guru](https://gurubase.io/g/lancedb) to Gurubase. LanceDB Guru uses the
data from this repo and data from the
[docs](https://lancedb.github.io/lancedb/) to answer questions by
leveraging the LLM.
In this PR, I showcased the "LanceDB Guru", which highlights that
LanceDB now has an AI assistant available to help users with their
questions. Please let me know your thoughts on this contribution.
Additionally, if you want me to disable LanceDB Guru in Gurubase, just
let me know that's totally fine.
Signed-off-by: Kursat Aktas <kursat.ce@gmail.com>
we don't really need these trait in lancedb, but all fields in `Index`
implement the 2 traits, so do it for possibility to use `Index`
somewhere
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
* Replaces Python implementation of Remote SDK with Rust one.
* Drops dependency on `attrs` and `cachetools`. Makes `requests` an
optional dependency used only for embeddings feature.
* Adds dependency on `nest-asyncio`. This was required to get hybrid
search working.
* Deprecate `request_thread_pool` parameter. We now use the tokio
threadpool.
* Stop caching the `schema` on a remote table. Schema is mutable and
there's no mechanism in place to invalidate the cache.
* Removed the client-side resolution of the vector column. We should
already be resolving this server-side.
* `open_table` uses `POST` not `GET`
* `update` uses `predicate` key not `only_if`
* For FTS search, vector cannot be omitted. It must be passed as empty.
* Added logging of JSON request bodies to debug level logging.
Sometimes it is acceptable to users to only search indexed data and skip
and new un-indexed data. For example, if un-indexed data will be shortly
indexed and they don't mind the delay. In these cases, we can save a lot
of CPU time in search, and provide better latency. Users can activate
this on queries using `fast_search()`.
Fixes several minor issues with Rust remote SDK:
* Delete uses `predicate` not `filter` as parameter
* Update does not return the row value in remote SDK
* Update takes tuples
* Content type returned by query node is wrong, so we shouldn't validate
it. https://github.com/lancedb/sophon/issues/2742
* Data returned by query endpoint is actually an Arrow IPC file, not IPC
stream.
When running `index_stats()` for an FTS index, users would get the
deserialization error:
```
InvalidInput { message: "error deserializing index statistics: unknown variant `Inverted`, expected one of `IvfPq`, `IvfHnswPq`, `IvfHnswSq`, `BTree`, `Bitmap`, `LabelList`, `FTS` at line 1 column 24" }
```
This exposes the `LANCEDB_LOG` environment variable in node, so that
users can now turn on logging.
In addition, fixes a bug where only the top-level error from Rust was
being shown. This PR makes sure the full error chain is included in the
error message. In the future, will improve this so the error chain is
set on the [cause](https://nodejs.org/api/errors.html#errorcause)
property of JS errors https://github.com/lancedb/lancedb/issues/1779Fixes#1774