It seems that `RecordBatch::with_schema` is unable to remove schema
metadata from a batch. It fails with the error `target schema is not
superset of current schema`.
I'm not sure how the `test_metadata_erased` test is passing. Strangely,
the metadata was not present by the time the batch arrived at the
metadata eraser. I think maybe the schema metadata is only present in
the batch if there is a filter.
I've created a new unit test that makes sure the metadata is erased if
we have a filter also
In earlier PRs (#1886, #1191) we made the default limit 10 regardless of
the query type. This was confusing for users and in many cases a
breaking change. Users would have queries that used to return all
results, but instead only returned the first 10, causing silent bugs.
Part of the cause was consistency: the Python sync API seems to have
always had a limit of 10, while newer APIs (Python async and Nodejs)
didn't.
This PR sets the default limit only for searches (vector search, FTS),
while letting scans (even with filters) be unbounded. It does this
consistently for all SDKs.
Fixes#1983Fixes#1852Fixes#2141
This also changes the pylance pin from `==0.23.2` to `~=0.23.2` which
should allow the pylance dependency to float a little. The pylance
dependency is actually not used for much anymore and so it should be
tolerant of patch changes.
BREAKING CHANGE: embedding function implementations in Node need to now
call `resolveVariables()` in their constructors and should **not**
implement `toJSON()`.
This tries to address the handling of secrets. In Node, they are
currently lost. In Python, they are currently leaked into the table
schema metadata.
This PR introduces an in-memory variable store on the function registry.
It also allows embedding function definitions to label certain config
values as "sensitive", and the preprocessing logic will raise an error
if users try to pass in hard-coded values.
Closes#2110Closes#521
---------
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Reviving #1966.
Closes#1938
The `search()` method can apply embeddings for the user. This simplifies
hybrid search, so instead of writing:
```python
vector_query = embeddings.compute_query_embeddings("flower moon")[0]
await (
async_tbl.query()
.nearest_to(vector_query)
.nearest_to_text("flower moon")
.to_pandas()
)
```
You can write:
```python
await (await async_tbl.search("flower moon", query_type="hybrid")).to_pandas()
```
Unfortunately, we had to do a double-await here because `search()` needs
to be async. This is because it often needs to do IO to retrieve and run
an embedding function.
Address usage mistakes in
https://github.com/lancedb/lancedb/issues/2135.
* Add example of how to use `LanceModel` and `Vector` decorator
* Add test for pydantic doc
* Fix the example to directly use LanceModel instead of calling
`MyModel.to_arrow_schema()` in the example.
* Add cross-reference link to pydantic doc site
* Configure mkdocs to watch code changes in python directory.
we found a bug that flat KNN plan node's stats is not in right order as
fields in schema, it would cause an error if querying with distance
range and new unindexed rows.
we've fixed this in lance so add this test for verifying it works
Signed-off-by: BubbleCal <bubble-cal@outlook.com>