## Summary
This PR adds the missing `name` parameter to `create_scalar_index` and
`create_fts_index` methods in the Python SDK, which was inadvertently
omitted when it was added to `create_index` in PR #2586.
## Changes
- Add `name: Optional[str] = None` parameter to abstract
`Table.create_scalar_index` and `Table.create_fts_index` methods
- Update `LanceTable` implementation to accept and pass the `name`
parameter to the underlying Rust layer
- Update `RemoteTable` implementation to accept and pass the `name`
parameter
- Enhanced tests to verify custom index names work correctly for both
scalar and FTS indices
- When `name` is not provided, default names are generated (e.g.,
`{column}_idx`)
## Test plan
- [x] Added test cases for custom names in scalar index creation
- [x] Added test cases for custom names in FTS index creation
- [x] Verified existing tests continue to pass
- [x] Code formatting and linting checks pass
This ensures API consistency across all index creation methods in the
LanceDB Python SDK.
Fixes#2616🤖 Generated with [Claude Code](https://claude.ai/code)
---------
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Adds an overall `timeout` parameter to `TimeoutConfig` that limits the
total time for the entire request
- Can be set via config or `LANCE_CLIENT_TIMEOUT` environment variable
- Exposed in Python and Node.js bindings
- Includes comprehensive tests
## Test plan
- [x] Unit tests for Rust TimeoutConfig
- [x] Integration tests for Python bindings
- [x] Integration tests for Node.js bindings
- [x] All existing tests pass
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
just noticed that we're doing a 'return' instead of a 'raise' while
trying to get remote functionality working for my project. I went ahead
and implemented tests for both of the unimplemented functions (to_pandas
and to_arrow) while I was in there.
---------
Co-authored-by: Cyrus Attoun <jattoun1@gmail.com>
This exposes the maximum_nprobes and minimum_nprobes feature that was
added in https://github.com/lancedb/lance/pull/3903
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added support for specifying minimum and maximum probe counts in
vector search queries, allowing finer control over search behavior.
- Users can now independently set minimum and maximum probes for vector
and hybrid queries via new methods and parameters in Python, Node.js,
and Rust APIs.
- **Bug Fixes**
- Improved parameter validation to ensure correct usage of minimum and
maximum probe values.
- **Tests**
- Expanded test coverage to validate correct handling, serialization,
and error cases for the new probe parameters.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This moves the __len__ method from LanceTable and RemoteTable to Table
so that child classes don't need to implement their own. In the process,
it fixes the implementation of RemoteTable's length method, which was
previously missing a return statement.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Refactor**
- Centralized the table length functionality in the base table class,
simplifying subclass behavior.
- Removed redundant or non-functional length methods from specific table
classes.
- **Tests**
- Added a new test to verify correct table length reporting for remote
tables.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
* Add a new "table stats" API to expose basic table and fragment
statistics with local and remote table implementations
### Questions
* This is using `calculate_data_stats` to determine total bytes in the
table. This seems like a potentially expensive operation - are there any
concerns about performance for large datasets?
### Notes
* bytes_on_disk seems to be stored at the column level but there does
not seem to be a way to easily calculate total bytes per fragment. This
may need to be added in lance before we can support fragment size
(bytes) statistics.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Added a method to retrieve comprehensive table statistics, including
total rows, index counts, storage size, and detailed fragment size
metrics such as minimum, maximum, mean, and percentiles.
- Enabled fetching of table statistics from remote sources through
asynchronous requests.
- Extended table interfaces across Python, Rust, and Node.js to support
synchronous and asynchronous retrieval of table statistics.
- **Tests**
- Introduced tests to verify the accuracy of the new table statistics
feature for both populated and empty tables.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
* Add new wait_for_index() table operation that polls until indices are
created/fully indexed
* Add an optional wait timeout parameter to all create_index operations
* Python and NodeJS interfaces
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
## Summary by CodeRabbit
- **New Features**
- Added optional waiting for index creation completion with configurable
timeout.
- Introduced methods to poll and wait for indices to be fully built
across sync and async tables.
- Extended index creation APIs to accept a wait timeout parameter.
- **Bug Fixes**
- Added a new timeout error variant for improved error reporting on
index operations.
- **Tests**
- Added tests covering successful index readiness waiting, timeout
scenarios, and missing index cases.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Chores**
- Updated dependency versions for improved performance and
compatibility.
- **New Features**
- Added support for structured full-text search with expanded query
types (e.g., match, phrase, boost, multi-match) and flexible input
formats.
- Introduced a new method to check server support for structural
full-text search features.
- Enhanced the query system with new classes and interfaces for handling
various full-text queries.
- Expanded the functionality of existing methods to accept more complex
query structures, including updates to method signatures.
- **Bug Fixes**
- Improved error handling and reporting for full-text search queries.
- **Refactor**
- Enhanced query processing with streamlined input handling and improved
error reporting, ensuring more robust and consistent search results
across platforms.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>
This PR adds a `to_query_object` method to the various query builders
(except not hybrid queries yet). This makes it possible to inspect the
query that is built.
In addition this PR does some normalization between the sync and async
query paths. A few custom defaults were removed in favor of None (with
the default getting set once, in rust).
Also, the synchronous to_batches method will now actually stream results
Also, the remote API now defaults to prefiltering
Reviving #1966.
Closes#1938
The `search()` method can apply embeddings for the user. This simplifies
hybrid search, so instead of writing:
```python
vector_query = embeddings.compute_query_embeddings("flower moon")[0]
await (
async_tbl.query()
.nearest_to(vector_query)
.nearest_to_text("flower moon")
.to_pandas()
)
```
You can write:
```python
await (await async_tbl.search("flower moon", query_type="hybrid")).to_pandas()
```
Unfortunately, we had to do a double-await here because `search()` needs
to be async. This is because it often needs to do IO to retrieve and run
an embedding function.
Closes#1106
Unfortunately, these need to be set at the connection level. I
investigated whether if we let users provide a callback they could use
`AsyncLocalStorage` to access their context. However, it doesn't seem
like NAPI supports this right now. I filed an issue:
https://github.com/napi-rs/napi-rs/issues/2456
This PR aims to fix#2047 by doing the following things:
- Add a distance_type parameter to the sync query builders of Python
SDK.
- Make metric an alias to distance_type.
Users who call the remote SDK from code that uses futures (either
`ThreadPoolExecutor` or `asyncio`) can get odd errors like:
```
Traceback (most recent call last):
File "/usr/lib/python3.12/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
RuntimeError: cannot enter context: <_contextvars.Context object at 0x7cfe94cdc900> is already entered
```
This PR fixes that by executing all LanceDB futures in a dedicated
thread pool running on a background thread. That way, it doesn't
interact with their threadpool.
Allows users to pass multiple query vector as part of a single query
plan. This just runs the queries in parallel without any further
optimization. It's mostly a convenience.
Previously, I think this was only handled by the sync Python remote API.
This makes it common across all SDKs.
Closes https://github.com/lancedb/lancedb/issues/1803
```python
>>> import lancedb
>>> import asyncio
>>>
>>> async def main():
... db = await lancedb.connect_async("./demo")
... table = await db.create_table("demo", [{"id": 1, "vector": [1, 2, 3]}, {"id": 2, "vector": [4, 5, 6]}], mode="overwrite")
... return await table.query().nearest_to([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 6.0]]).limit(1).to_pandas()
...
>>> asyncio.run(main())
query_index id vector _distance
0 2 2 [4.0, 5.0, 6.0] 0.0
1 1 2 [4.0, 5.0, 6.0] 0.0
2 0 1 [1.0, 2.0, 3.0] 0.0
```
* Replaces Python implementation of Remote SDK with Rust one.
* Drops dependency on `attrs` and `cachetools`. Makes `requests` an
optional dependency used only for embeddings feature.
* Adds dependency on `nest-asyncio`. This was required to get hybrid
search working.
* Deprecate `request_thread_pool` parameter. We now use the tokio
threadpool.
* Stop caching the `schema` on a remote table. Schema is mutable and
there's no mechanism in place to invalidate the cache.
* Removed the client-side resolution of the vector column. We should
already be resolving this server-side.
* Adds nicer errors to remote SDK, that expose useful properties like
`request_id` and `status_code`.
* Makes sure the Python tracebacks print nicely by mapping the `source`
field from a Rust error to the `__cause__` field.
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.
The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.
Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.