Mirror of https://github.com/lancedb/lancedb.git
Synced 2026-07-04 03:20:40 +00:00
python-v0.34.0-beta.5
2631 Commits
d9f9a51668
feat: skills to connect and update column metadata (#3541)
Two skills help people connect to, and manage column metadata on, a server that implements the [REST API](https://lance.org/format/catalog/rest/).
`lancedb-column-metadata` was built with the [Claude skill creator](https://claude.com/plugins/skill-creator). Without the skill, the agent usually called at least one method that didn't exist and usually did not set `"replace": "false"`. So while the base case is already pretty good, adding this skill improves things somewhat.
`lancedb-connect` should help with most agentic workflows, because "finding all the things you need to connect to your server" can be the hardest part.
c187ff7712
chore: ignore pyo3 advisories RUSTSEC-2026-0176/0177 in cargo-deny (#3542)
dfbe5becaa
chore: update lance dependency to v8.0.0-beta.12 (#3538)
Updates the Rust workspace Lance crates and Java lance-core to v8.0.0-beta.12. No compatibility fixes were required; validation passed with `cargo clippy` and `cargo fmt`. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.12
49815da933
refactor: extract create_index module from table.rs (#3521)
## Summary
- Extracts the `create_index` code cluster from `table.rs` into a new `rust/lancedb/src/table/create_index.rs` submodule, continuing the work from #2949.
- Moves 8 `NativeTable` inherent methods (`load_indices`, `validate_index_type`, `build_ivf_params`, `get_num_sub_vectors`, `get_vector_dimension`, `resolve_index_field`, `make_index_params`, `get_index_type_for_field`) and 11 associated tests into the new module.
- Reduces `table.rs` from ~5009 to ~3804 lines (-1205 lines) with no behavioral changes.
## Test plan
Unit tests.
f8caef3aca
feat(bindings): expose new IndexConfig fields in Python and Node.js (#3534)
## Summary
Surfaces the rich per-index metadata added in #3497 to the Python and Node.js language bindings. Closes #3495.
New optional fields exposed on `IndexConfig` in both bindings:
- `index_uuid` / `indexUuid`: UUID of the first index segment
- `type_url` / `typeUrl`: protobuf type URL for the index
- `created_at` / `createdAt`: creation timestamp (milliseconds since Unix epoch)
- `num_indexed_rows` / `numIndexedRows`: rows covered by the index
- `num_unindexed_rows` / `numUnindexedRows`: rows not yet indexed
- `size_bytes` / `sizeBytes`: total index file size in bytes
- `num_segments` / `numSegments`: number of index segments
- `index_version` / `indexVersion`: on-disk format version
- `index_details` / `indexDetails`: type-specific JSON details string
All fields are `None`/`undefined` for remote tables (which don't yet surface this metadata through the server response).
## Changes
- `python/src/index.rs`: extend `IndexConfig` pyclass; update `From` impl; update `__getitem__`
- `python/python/lancedb/_lancedb.pyi`: add type hints for new fields
- `python/python/tests/test_table.py`: new `test_index_config_fields` test
- `nodejs/src/table.rs`: extend `IndexConfig` napi struct; update `From` impl
- `nodejs/__test__/table.test.ts`: new test; update existing `toEqual` assertions to `expect.objectContaining` to accommodate new fields
## Test plan
- [x] Python: `uv run --extra tests pytest python/tests/test_table.py::test_index_config_fields`
- [x] Node.js: `pnpm test __test__/table.test.ts`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
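As a rough illustration of consuming the new optional fields from Python, the sketch below converts `created_at` (milliseconds since the Unix epoch) to a timezone-aware `datetime` and computes the fraction of rows an index covers, returning `None` whenever a field is absent, as it is for remote tables. `IndexConfigView`, `created_at_datetime`, and `index_coverage` are hypothetical helpers; they only read attributes named like the Python fields listed above, so they may also work on a real `IndexConfig`, but that is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class IndexConfigView:
    """Hypothetical stand-in exposing a subset of the new IndexConfig fields."""

    name: str
    created_at: Optional[int] = None  # milliseconds since the Unix epoch
    num_indexed_rows: Optional[int] = None
    num_unindexed_rows: Optional[int] = None


def created_at_datetime(cfg: Any) -> Optional[datetime]:
    """Convert created_at (ms since epoch) to an aware UTC datetime, or None."""
    ms = getattr(cfg, "created_at", None)
    if ms is None:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def index_coverage(cfg: Any) -> Optional[float]:
    """Fraction of rows covered by the index, or None if the counts are missing."""
    indexed = getattr(cfg, "num_indexed_rows", None)
    unindexed = getattr(cfg, "num_unindexed_rows", None)
    if indexed is None or unindexed is None:
        return None
    total = indexed + unindexed
    # Assumption: treat an empty table as fully covered to avoid dividing by zero.
    return 1.0 if total == 0 else indexed / total


local = IndexConfigView("vec_idx", created_at=0, num_indexed_rows=75, num_unindexed_rows=25)
remote = IndexConfigView("vec_idx")  # remote tables leave the new fields unset
```

Checking for `None` before doing arithmetic keeps the same code path usable for local and remote tables.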
40f3e22600
feat: support rename_table on LanceNamespaceDatabase (#3520)
## Summary
Closes #3412
Implements `rename_table` for `LanceNamespaceDatabase` (sync and async Python) and the Rust `NamespaceDatabase` backend. Previously these raised `NotImplementedError`; this PR delegates to the `LanceNamespace.rename_table` method, which is part of the lance-namespace spec.
### Changes
- **`rust/lancedb/src/database/namespace.rs`**: Remove the `NotImplementedError` stub for `rename_table`. Build a `RenameTableRequest` (with `id`, `new_table_name`, and optionally `new_namespace_id`) and call `self.namespace.rename_table(...)`, mirroring the existing `drop_table` pattern.
- **`python/python/lancedb/namespace.py`**: Import `RenameTableRequest` from `lance_namespace`. Replace the `raise NotImplementedError` in both `LanceNamespaceDatabase.rename_table` (sync) and `AsyncLanceNamespaceDatabase.rename_table` (async) with a call to `self._namespace_client.rename_table(request)`.
- **`python/python/tests/test_namespace.py`**: Replace the `test_rename_table_not_supported` test (which checked for `NotImplementedError`) with `test_rename_table`, which:
  1. Creates a table in a namespace
  2. Calls `rename_table` with `cur_namespace_path` and `new_namespace_path`
  3. Asserts the old name is gone from `table_names()`
  4. Asserts the new name appears in `table_names()`
  5. Verifies the renamed table can be opened
## Test plan
- [ ] Existing namespace tests pass in CI (all rely on `lance.namespace.DirectoryNamespace`, which requires the full lance package)
- [ ] `test_rename_table` exercises the full rename path: create → rename → verify old gone → verify new present → open
- [ ] Rust build passes with the updated `namespace.rs` (requires Rust toolchain in CI)
04480c274a
test(python): add nested field regression matrix tests (#3518)
## Summary
Closes #3406
Adds a regression matrix in `python/python/tests/test_nested_fields.py` that exercises the full nested-field index lifecycle for both the sync and async Python table APIs. The tests fail if any implementation regresses to leaf-only field names in `list_indices`, `index_stats`, search, or filter results.
## Test scenarios covered
**Index types:** BTree scalar, IvfPq vector, FTS
**Field-name edge cases (per acceptance criteria):**
- `rowId`: camelCase top-level field
- `` `row-id` ``: hyphenated top-level field (escaped)
- `` parent.`leaf.name` ``: struct leaf whose name contains a literal dot
- `MetaData.userId`: mixed-case nested path
- `` `meta-data`.`user-id` ``: hyphenated struct with hyphenated leaf
**Lifecycle operations per index type:**
- `create_index` / `create_scalar_index` / `create_fts_index`
- `list_indices` → verify canonical full dotted path (not leaf name)
- `index_stats` → verify row count and index type
- Filtered scan (`WHERE nested.field = value`)
- Vector search via nested embedding column
- FTS search via nested text column
- `add` (append), then re-check index listing
- `optimize`, then re-check index listing
**Both sync and async APIs** are covered in parallel test classes.
## Notes
Lance forbids top-level field names that contain a literal `.`, so the `` `a.b` `` acceptance-criterion variant is exercised as a *struct leaf* field (`` parent.`leaf.name` ``) rather than a top-level column.
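The field-name cases above appear to follow one rule: dots separate path segments, and a segment that is not a plain identifier is wrapped in backticks. The sketch below formats and parses such paths so canonical names can be round-tripped in tests. The quoting rule (plain identifiers match `[A-Za-z_][A-Za-z0-9_]*`, and a literal backtick is doubled inside a quoted segment) is an assumption, not LanceDB's documented grammar, and `format_path` / `parse_path` are hypothetical helpers.

```python
import re
from typing import List

# Assumption: segments matching this pattern are written without backticks.
_PLAIN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def escape_segment(name: str) -> str:
    if _PLAIN.fullmatch(name):
        return name
    # Assumption: a literal backtick is doubled inside a quoted segment.
    return "`" + name.replace("`", "``") + "`"


def format_path(segments: List[str]) -> str:
    return ".".join(escape_segment(s) for s in segments)


def parse_path(path: str) -> List[str]:
    segments: List[str] = []
    buf: List[str] = []
    quoted = False
    i = 0
    while i < len(path):
        ch = path[i]
        if quoted:
            if ch == "`":
                if i + 1 < len(path) and path[i + 1] == "`":
                    buf.append("`")  # doubled backtick -> literal backtick
                    i += 2
                    continue
                quoted = False  # closing backtick
            else:
                buf.append(ch)  # dots inside backticks are part of the name
        elif ch == "`":
            quoted = True
        elif ch == ".":
            segments.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    if quoted:
        raise ValueError(f"unterminated backtick in {path!r}")
    segments.append("".join(buf))
    return segments
```

With this rule, `format_path(["parent", "leaf.name"])` yields ``parent.`leaf.name` ``, matching the escaped form listed in the edge cases.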
ae7f2cbfe8
feat(python): accept Expr in Table.delete and merge when_not_matched_by_source_delete (#3524)
Another small pain point I hit while integrating with paperless-ngx. The read paths, `table.search()` and `table.query()`, already accepted an `Expr`, but the write paths `Table.delete` and `merge_insert(...).when_not_matched_by_source_delete` did not. This PR aims to close that gap so reads and writes can both use `Expr`, instead of one side needing to build a SQL string.
4fb7c92e86
chore: update lance dependency to v8.0.0-beta.11 (#3533)
Updates Lance dependencies to v8.0.0-beta.11 and refreshes the Rust and Java lock/config files. This also adapts namespace external manifest store call sites to the new table-root-aware constructor required by Lance. Triggering tag: https://github.com/lancedb/lance/releases/tag/v8.0.0-beta.11
f03abc27e3
feat: expand IndexConfig with rich per-index metadata (#3497)
`IndexConfig` (returned by `Table::list_indices`) previously exposed only `name`, `index_type`, and `columns`. Lance's `describe_indices` provides richer per-index info cheaply (it reads manifest metadata, which is often cached), so this PR surfaces it.
Adds these `Option<T>` fields to `lancedb::index::IndexConfig`, populated in `NativeTable::list_indices` from the `IndexDescription`:
- `index_uuid`: UUID of the first segment
- `type_url`: protobuf type URL (`IndexDescription::type_url`)
- `created_at`: minimum creation time across segments
- `num_indexed_rows`: approximate rows indexed across segments
- `num_unindexed_rows`: table row count minus `num_indexed_rows`
- `size_bytes`: total size of index files across segments
- `num_segments`: number of segments making up the index
- `index_version`: on-disk index format version (first segment)
- `index_details`: index-type-specific details as JSON
This field set mirrors the lance-namespace `IndexContent` contract (lance-format/lance-namespace#348) so client and server agree on the same shape.
Note that these fields are populated **locally** via `describe_indices`: `NativeTable::list_indices` reads the dataset directly and does not depend on the namespace spec change. `RemoteTable` leaves the new fields `None` until a follow-up wires them through the server response (#3494). Exposing them in the bindings will also be a follow-up: #3495
Existing `list_indices` tests in `rust/lancedb/src/table.rs` are extended to assert the new fields.
Fixes #3492
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
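The list above describes how per-segment information rolls up into one `IndexConfig`. The following is a minimal Python sketch of that aggregation, assuming a hypothetical `Segment` record per index segment; it is not the Rust implementation. Two choices are assumptions: `num_indexed_rows` is a plain sum, and `num_unindexed_rows` is clamped at zero because the indexed count is described as approximate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical per-segment description."""

    uuid: str
    created_at: int  # ms since the Unix epoch
    num_rows: int
    size_bytes: int
    index_version: int


@dataclass
class IndexSummary:
    index_uuid: Optional[str]
    created_at: Optional[int]
    num_indexed_rows: Optional[int]
    num_unindexed_rows: Optional[int]
    size_bytes: Optional[int]
    num_segments: int
    index_version: Optional[int]


def summarize(segments: List[Segment], table_rows: int) -> IndexSummary:
    if not segments:
        return IndexSummary(None, None, None, None, None, 0, None)
    indexed = sum(s.num_rows for s in segments)
    return IndexSummary(
        index_uuid=segments[0].uuid,  # uuid of the first segment
        created_at=min(s.created_at for s in segments),  # earliest segment
        num_indexed_rows=indexed,
        num_unindexed_rows=max(table_rows - indexed, 0),  # clamp: assumption
        size_bytes=sum(s.size_bytes for s in segments),
        num_segments=len(segments),
        index_version=segments[0].index_version,  # first segment's version
    )
```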
85d9c1ce63
feat: adds isin support to the 'Expr' builder (#3523)
The `Expr` builder already includes a lot of useful filtering options (`eq`, `ne`, `gt`/`gte`, `lt`/`lte`, `and_`, `or_`, `contains`, `cast`), but it was missing a membership test like `isin`. This PR adds that support as minimally as possible, allowing easy filtering for membership in a list without having to build it from a series of `where` expressions. I didn't see anything in CONTRIBUTING.md about needing a feature request or issue first, so I just made the change; my apologies if I missed that somewhere. Thanks for the vector store; we're using it now in paperless-ngx.
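To show why a membership operator is more compact than chaining equality checks, here is a toy expression builder that renders SQL and can also evaluate a row. `ToyExpr`, `Eq`, `Or`, `IsIn`, and `isin` are hypothetical names for illustration only; this is not LanceDB's `Expr` class, and the SQL rendering is a simplified assumption.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping, Tuple


def _lit(value: Any) -> str:
    """Render a Python value as a SQL literal (strings get single quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


class ToyExpr:
    """Hypothetical expression base class; not LanceDB's Expr."""

    def to_sql(self) -> str:
        raise NotImplementedError

    def evaluate(self, row: Mapping[str, Any]) -> bool:
        raise NotImplementedError

    def or_(self, other: "ToyExpr") -> "ToyExpr":
        return Or(self, other)


@dataclass(frozen=True)
class Eq(ToyExpr):
    column: str
    value: Any

    def to_sql(self) -> str:
        return f"{self.column} = {_lit(self.value)}"

    def evaluate(self, row: Mapping[str, Any]) -> bool:
        return row[self.column] == self.value


@dataclass(frozen=True)
class Or(ToyExpr):
    left: ToyExpr
    right: ToyExpr

    def to_sql(self) -> str:
        return f"({self.left.to_sql()} OR {self.right.to_sql()})"

    def evaluate(self, row: Mapping[str, Any]) -> bool:
        return self.left.evaluate(row) or self.right.evaluate(row)


@dataclass(frozen=True)
class IsIn(ToyExpr):
    column: str
    values: Tuple[Any, ...]

    def to_sql(self) -> str:
        return f"{self.column} IN ({', '.join(_lit(v) for v in self.values)})"

    def evaluate(self, row: Mapping[str, Any]) -> bool:
        return row[self.column] in self.values


def isin(column: str, values: Iterable[Any]) -> IsIn:
    return IsIn(column, tuple(values))
```

Both `Eq("tag", 1).or_(Eq("tag", 2)).or_(Eq("tag", 3))` and `isin("tag", [1, 2, 3])` select the same rows, but the second stays a single flat `IN` clause however long the list grows.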
d786e39fdc
chore(deps): bump the rust-minor-patch group across 1 directory with 7 updates (#3531)
Bumps the rust-minor-patch group with 7 updates in the / directory:
| Package | From | To |
| --- | --- | --- |
| [log](https://github.com/rust-lang/log) | `0.4.31` | `0.4.32` |
| [regex](https://github.com/rust-lang/regex) | `1.12.3` | `1.12.4` |
| [chrono](https://github.com/chronotope/chrono) | `0.4.44` | `0.4.45` |
| [serde_with](https://github.com/jonasbb/serde_with) | `3.20.0` | `3.21.0` |
| [http](https://github.com/hyperium/http) | `1.4.1` | `1.4.2` |
| [uuid](https://github.com/uuid-rs/uuid) | `1.23.2` | `1.23.3` |
| [napi](https://github.com/napi-rs/napi-rs) | `3.9.0` | `3.9.1` |
Updates `log` from 0.4.31 to 0.4.32
**Release notes**
*Sourced from [log's releases](https://github.com/rust-lang/log/releases).*
> ## 0.4.32
> ## What's Changed
> - Support `Value` -> string conversions with `kv` + `std` features instead of `kv_std` by [`@tisonkun`](https://github.com/tisonkun) in [rust-lang/log#729](https://redirect.github.com/rust-lang/log/pull/729)
> - Prepare for 0.4.32 release by [`@KodrAus`](https://github.com/KodrAus) in [rust-lang/log#730](https://redirect.github.com/rust-lang/log/pull/730)
>
> **Full Changelog**: https://github.com/rust-lang/log/compare/0.4.31...0.4.32
**Changelog**
*Sourced from [log's changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md).*
> ## [0.4.32] - 2026-06-04
> ### What's Changed
> - Support `Value` -> string conversions with `kv` + `std` features instead of `kv_std` by [`@tisonkun`](https://github.com/tisonkun) in [rust-lang/log#729](https://redirect.github.com/rust-lang/log/pull/729)
>
> **Full Changelog**: https://github.com/rust-lang/log/compare/0.4.31...0.4.32
8373318e89
feat: support FM-Index scalar index for substring search (#3532)
Adds an FM-Index — a scalar index over string and binary columns that
accelerates substring search (`contains(col, 'needle')`), distinct from
the tokenized `FTS` index — across the Rust core and the Python and
TypeScript bindings.
## Rust
- `Index::Fm(FmIndexBuilder)` and `IndexType::Fm`.
- `make_index_params` maps `Index::Fm` to Lance's
`ScalarIndexParams::for_builtin(BuiltinIndexType::Fm)`.
- `supported_fm_data_type` validates
`Utf8`/`LargeUtf8`/`Binary`/`LargeBinary` columns.
- `list_indices` round-trips the type (`"Fm"` → `IndexType::Fm`); the
remote wire type is `"FM"`.
## Python
Adds `lancedb.index.Fm`, accepted by `create_index`:
```python
from lancedb.index import Fm
await tbl.create_index("text", config=Fm())
```
## TypeScript
Adds the `Index.fm()` factory:
```ts
await tbl.createIndex("text", { config: Index.fm() });
```
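To make substring search over an FM-index concrete, here is a textbook sketch in Python: it builds the Burrows-Wheeler transform from a naively sorted suffix array, then answers `contains`-style queries by backward search. It illustrates the general technique only. It is not Lance's implementation, and a production index would typically use compressed occurrence tables and a sampled suffix array instead of the full arrays kept here. `build_fm_index` and `find` are hypothetical names, and the text is assumed not to contain `\0`, which serves as the sentinel.

```python
from typing import Dict, List, Tuple

FmIndex = Tuple[List[int], Dict[str, int], Dict[str, List[int]]]


def build_fm_index(text: str) -> FmIndex:
    s = text + "\0"  # sentinel: sorts before every other character
    # Naive suffix array; fine for a toy example, far too slow for real data.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character preceding each sorted suffix (s[-1] is the sentinel).
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # c_table[ch]: number of characters in s that sort strictly before ch.
    c_table: Dict[str, int] = {}
    total = 0
    for ch in alphabet:
        c_table[ch] = total
        total += s.count(ch)
    # occ[ch][k]: occurrences of ch in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for k, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][k + 1] = occ[ch][k] + (1 if b == ch else 0)
    return sa, c_table, occ


def find(index: FmIndex, pattern: str) -> List[int]:
    """Return the sorted start offsets of pattern in the original text."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # backward search: extend the match leftward
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


idx = build_fm_index("banana")
```

Under this sketch, `contains(col, 'needle')` would be true for a value exactly when `find` returns a non-empty list, and no tokenization is involved, unlike the FTS index.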
8308cca05e
chore: update lance dependency to v8.0.0-beta.9 (#3527)
Updates Lance dependencies to v8.0.0-beta.9. Includes the required Rust compatibility fix for Lance's updated vector index UUID API. Triggering tag: https://github.com/lancedb/lance/releases/tag/v8.0.0-beta.9
566b67a634
fix: support LargeList label list indexes (#3529)
## Summary
This PR extends nested-field regression coverage across Rust local/remote, Python sync/async, and Node so that canonical escaped paths stay consistent across scalar, vector, and FTS index lifecycle behavior.
It also aligns LanceDB's LabelList type gate with Lance by accepting `LargeList<primitive>` columns, while keeping `List<Struct<...>>` unsupported until Lance defines stable membership semantics for struct labels.
Part of #3406.
9c12fb6437
fix(nodejs): treat NAPI_RS_FORCE_WASI as truthy only when set to 'true' (#3519)
## Summary
Fixes the `NAPI_RS_FORCE_WASI=false` issue by upgrading `@napi-rs/cli` from `3.5.1` to `3.7.0`. Closes #3267
## Root Cause
In the `native.js` loader generated by `napi build`, the check was:
```js
if (!nativeBinding || process.env.NAPI_RS_FORCE_WASI) {
```
In JavaScript, any non-empty string is truthy, so `NAPI_RS_FORCE_WASI=false` (a non-empty string) inadvertently triggered the WASI fallback path. This caused an `ENOENT` error when `lancedb.wasi.cjs` was not present.
## Fix
`@napi-rs/cli@3.7.0` ([napi-rs/napi-rs#3236](https://github.com/napi-rs/napi-rs/pull/3236)) introduced a tri-state check in the template that generates `native.js`:
**Before (generated by @napi-rs/cli@3.5.1):**
```js
if (!nativeBinding || process.env.NAPI_RS_FORCE_WASI) {
```
**After (generated by @napi-rs/cli@3.7.0):**
```js
const forceWasi =
  process.env.NAPI_RS_FORCE_WASI === 'true' ||
  process.env.NAPI_RS_FORCE_WASI === 'error'
if (!nativeBinding || forceWasi) {
```
Only the literal string `'true'` (or `'error'` for strict mode) now activates the WASI path. All other values, including `'false'`, `'0'`, or an unset variable, behave as if WASI is not forced.
## Changes
- `nodejs/package.json`: bump `@napi-rs/cli` from `3.5.1` to `3.7.0`
- `nodejs/package-lock.json` / `nodejs/pnpm-lock.yaml`: update lock files to match
The fix is in the upstream napi-rs tool; the generated `native.js` is not committed to this repository and is produced at build time by `napi build`.
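The same pitfall exists in Python, where any non-empty string is also truthy. The sketch below contrasts the old truthiness check with the new exact-match check. The helper names are hypothetical, and the environment is passed in as a mapping so the behavior is easy to test; in real code you would pass `os.environ`.

```python
from typing import Mapping

FLAG = "NAPI_RS_FORCE_WASI"


def naive_force_wasi(env: Mapping[str, str]) -> bool:
    # Old behavior: any non-empty string counts as "on", including "false".
    return bool(env.get(FLAG))


def force_wasi(env: Mapping[str, str]) -> bool:
    # New behavior: only the exact strings "true" or "error" opt in;
    # "false", "0", other values, and an unset variable all mean "not forced".
    return env.get(FLAG) in ("true", "error")
```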
f260d3bf12
fix(util): convert numpy scalars in value_to_sql (#3522)
## What's broken
`Table.update(values={...})` raises `NotImplementedError: SQL conversion
is not implemented for this type` when a value is a numpy scalar such as
`np.int64`, `np.int32`, `np.float32`, or `np.bool_`. These arise
naturally from indexing an ndarray or a pandas int/bool column.
`np.float64` happens to work (it subclasses `float`), which makes the
failure inconsistent and surprising.
```python
import numpy as np
import pandas as pd
# `t` is an existing LanceDB table with an integer `id` column.
df = pd.DataFrame({"id": np.array([10, 20], dtype="int32")})
t.update(where="id = 1", values={"id": df["id"].iloc[0]})  # np.int32
# -> NotImplementedError: SQL conversion is not implemented for this type
```
## Why it happens
`value_to_sql` is a `singledispatch` with handlers only for native
Python types and `np.ndarray`; numpy `integer`/`floating`/`bool_`
scalars aren't Python subclasses, so they fall through to the
`NotImplementedError` base.
## Fix
Register handlers for `np.bool_`, `np.integer`, and `np.floating` that
delegate to the existing native handlers.
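The fix relies on `functools.singledispatch` choosing a handler from the argument's class and its MRO. The sketch below reproduces that pattern without depending on numpy. `Int32Scalar` is a hypothetical stand-in for `np.int32`: like numpy scalars, it is not a subclass of `int` but exposes `.item()` to get a native Python value. The handlers and their SQL renderings are illustrative, not LanceDB's exact `value_to_sql` output.

```python
from functools import singledispatch


@singledispatch
def value_to_sql(value) -> str:
    # Base case: reached by any type without a more specific handler.
    raise NotImplementedError("SQL conversion is not implemented for this type")


@value_to_sql.register
def _(value: str) -> str:
    return "'" + value.replace("'", "''") + "'"


@value_to_sql.register
def _(value: bool) -> str:  # bool subclasses int; the more specific handler wins
    return "TRUE" if value else "FALSE"


@value_to_sql.register
def _(value: int) -> str:
    return str(value)


@value_to_sql.register
def _(value: float) -> str:
    return repr(value)


class Int32Scalar:
    """Hypothetical stand-in for np.int32: not an int subclass, has .item()."""

    def __init__(self, value: int) -> None:
        self._value = value

    def item(self) -> int:
        return self._value


@value_to_sql.register
def _(value: Int32Scalar) -> str:
    # Delegate to the native handler, mirroring the approach described above.
    return value_to_sql(value.item())
```

Without the last registration, `value_to_sql(Int32Scalar(10))` would fall through to the base case and raise, which is the failure mode the PR describes.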
## Test
`value_to_sql` on `np.int32/int64/float32/float64/bool_` all convert;
`np.int32` raised before.
Co-authored-by: Ishaan Samantray <ishaansamantray@Ishaans-MacBook-Pro.local>
d9018067b3
feat: support checking out a version on a branch (#3504)
### Description
Stacked on #3490. Adds an optional version to branch checkout across the Rust core and the Python and TypeScript SDKs, so you can open a specific version on a branch ("version V of branch B"), not just the branch's latest version.
Rust
```rust
// Open version 3 of branch "exp" (a read-only view): check out from an
// existing table, or open it directly from the connection.
let exp_v3 = table.checkout_branch("exp", Some(3)).await?;
let exp_v3 = db.open_table("items").branch("exp").version(3).execute().await?;
// checkout_latest re-attaches to the branch's writable HEAD.
exp_v3.checkout_latest().await?;
// With no branch, a version opens main at that version.
let main_v3 = db.open_table("items").version(3).execute().await?;
```
Python
```python
# Open version 3 of branch "exp" (a read-only view): check out from an
# existing table, or open it directly from the connection.
branch_v3 = await table.branches.checkout("exp", version=3)
branch_v3 = await db.open_table("items", branch="exp", version=3)
# checkout_latest re-attaches to the branch's writable HEAD.
await branch_v3.checkout_latest()
# With no branch, a version opens main at that version.
main_v3 = await db.open_table("items", version=3)
```
TypeScript
```typescript
// Open version 3 of branch "exp" (a read-only view): check out from an
// existing table, or open it directly from the connection.
const branchV3 = await (await table.branches()).checkout("exp", 3);
const opened = await db.openTable("items", undefined, { branch: "exp", version: 3 });
// checkoutLatest re-attaches to the branch's writable HEAD.
await branchV3.checkoutLatest();
// With no branch, a version opens main at that version.
const mainV3 = await db.openTable("items", undefined, { version: 3 });
```
### Testing
- Added unit tests (Rust, Python sync + async, TypeScript): branch-scoped resolution at a version number shared with `main` and with another branch, read-only enforcement on a pinned handle, `checkout_latest` recovery to the branch's HEAD, fork-point reads, and the nonexistent-version/branch error paths.
- Ran smoke tests against the Python and TypeScript SDKs on a local machine.
53517b3aaa
feat: add table branch support (#3490)
### Description
Adds first-class support for table branches across the Rust core and the
Python and TypeScript SDKs.
Rust
```rust
use lance::dataset::refs::Ref;
// Create a branch from main and write to it — main is untouched.
let exp = table.create_branch("exp", Ref::Version(None, None)).await?;
exp.add(batches).await?;
// Reopen the branch later: check out from a table, or open it directly.
let exp = table.checkout_branch("exp").await?;
let exp = db.open_table("items").branch("exp").execute().await?;
let branches = table.list_branches().await?;
table.delete_branch("exp").await?;
```
Python
```python
# Create a branch from main and write to it
branch = await table.branches.create("exp", from_ref="main")
await branch.add(data)
# Reopen the branch later: check out from a table, or open it directly.
branch = await table.branches.checkout("exp")
branch = await db.open_table("items", branch="exp")
await table.branches.list()
await table.branches.delete("exp")
```
TypeScript
```typescript
const branches = await table.branches();
// Create a branch from main and write to it
const branch = await branches.create("exp");
await branch.add(data);
// Reopen the branch later: check out from a table, or open it directly.
const checkedOut = await branches.checkout("exp");
const opened = await db.openTable("items", undefined, { branch: "exp" });
await branches.list();
await branches.delete("exp");
```
### Testing
- Added unit tests
- Ran smoke tests against the Python and TypeScript SDKs on a local machine
### Next steps
- Add RemoteTable support
- Add branch comparison support
- Add branch merging support
3e25f584eb
fix(python): push down namespace full reads (#3516)
## Bug Fix
### What is the bug?
Namespace-backed `LanceTable.to_arrow()` full-table reads bypassed the existing `QueryTable` server-side query path and called the lower-level table `to_arrow()` implementation directly. In Geneva/Sophon this could fail while parsing the Arrow IPC response for `hist.get_table().to_arrow()` / `to_pandas()`, even though `hist.get_table().search().to_arrow()` worked.
### What issues or incorrect behavior does the bug cause?
Full-table reads on namespace-backed tables with `QueryTable` pushdown could fail with Arrow IPC parse errors, while query/search reads on the same table succeeded. Since `to_pandas()` delegates through `to_arrow()` for non-blob/native cases, pandas export was affected too.
### How does this PR fix the problem?
When `QueryTable` pushdown is enabled, sync and async table `to_arrow()` now construct a plain no-filter, no-limit, all-columns query and execute it through the table-level `_execute_query()` path. `AsyncTable` now preserves namespace context from async namespace connections, so async full reads can make the same pushdown decision. Non-namespace tables and namespace tables without `QueryTable` pushdown keep their existing behavior.
### Tests
- `uv run --extra tests --extra dev --no-sync ruff check python/lancedb/table.py python/lancedb/namespace.py python/tests/test_namespace.py`
- `uv run --extra tests --extra dev --no-sync ruff format python/lancedb/table.py python/lancedb/namespace.py python/tests/test_namespace.py`
- `uv run --extra tests --extra dev --no-sync pytest python/tests/test_namespace.py::TestPushdownOperations::test_lance_table_to_arrow_uses_query_pushdown python/tests/test_namespace.py::TestAsyncPushdownOperations::test_async_table_to_arrow_uses_query_pushdown python/tests/test_namespace.py::test_local_table_to_arrow_and_to_pandas_are_unchanged -q`
- `uv run --extra tests --extra dev --no-sync pytest python/tests/test_namespace.py -q`
59fbfd4158
chore: update lance dependency to v8.0.0-beta.6 (#3510)
Updates LanceDB's Lance dependencies from v8.0.0-beta.5 to v8.0.0-beta.6 and refreshes Cargo metadata. No compatibility fixes were required; Java lance-core was bumped to 8.0.0-beta.6 as well. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.6
f37e698e2f
chore: update lance dependency to v8.0.0-beta.5 (#3508)
Updates Lance dependencies from v8.0.0-beta.4 to v8.0.0-beta.5 across the Rust workspace and the Java lance-core version. No compatibility code changes were required; clippy and rustfmt pass after installing the missing runner components. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.5
09b1bbc12a
refactor!: drop unused loss field from IndexStatistics (#3496)
BREAKING CHANGE: direct Rust users lose the `IndexStatistics::loss` field. Python and Node.js consumers are unaffected in practice for remote tables (the value was always `None`/absent), but the attribute is gone for local tables too.
`IndexStatistics::loss` was local-only: LanceDB Cloud never returned it, so `RemoteTable::index_stats` always set `loss: None`. It's vestigial; this PR removes it.
- Remove `loss` from `IndexStatistics` and the internal `IndexMetadata` in `rust/lancedb/src/index.rs`, plus the summing logic in `NativeTable::index_stats`.
- Drop `loss` from the Python and Node.js bindings (and their tests/docs).
Fixes #3493
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
c484b24e51
chore: update lance dependency to v8.0.0-beta.4 (#3507)
Updates LanceDB's Lance dependencies to Lance v8.0.0-beta.4. Includes the required compatibility fix for the new Lance file writer finish summary API. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.4
3868965413
fix(python): run AsyncTable.search embeddings on a dedicated executor (#3459)
## Summary
`AsyncTable.search()` computes the query embedding with `loop.run_in_executor(None, ...)`, which uses asyncio's **default** `ThreadPoolExecutor`. That pool is shared with all other `run_in_executor(None, ...)` work, so a slow embedding call (a heavy local model or an HTTP request to an embeddings API) ties up those threads and starves unrelated async I/O under concurrent load.
This PR moves the (potentially blocking) embedding call onto a **dedicated executor**, isolating it from the default pool. Closes #3310.
## Problem
`python/lancedb/table.py`, `AsyncTable.search()`:
```python
return (
    await loop.run_in_executor(
        None,  # asyncio's default executor, shared with other blocking I/O
        embedding.function.compute_query_embeddings_with_retry,
        query,
    )
)[0]
```
Under load, concurrent searches whose embeddings block (or any other code using the default executor) contend for the same small thread pool.
## Change
- Add a dedicated `ThreadPoolExecutor(thread_name_prefix="lancedb-embedding")` in `background_loop.py`, exposed via `embedding_executor()`.
- Use it in `AsyncTable.search()`'s `make_embedding` instead of the default executor.
- Reset the executor in the existing `_reset_after_fork` hook: its worker threads don't survive `fork()`, same as the background event loop. It's recreated lazily, so this is cheap.
## Design notes
The issue asked whether maintainers preferred a configurable executor, a dedicated internal one, or another approach (there was no response in the thread). I went with a **dedicated internal executor**: it fixes the starvation with no public API change and stays consistent with the existing `LOOP` singleton. Making the pool size configurable would be an easy follow-up if preferred.
Scope is limited to `search()`. The broader need for real async support in embedding functions (including `add()`) is tracked separately in #3268.
## Testing
- Added `test_async_search_runs_embedding_on_dedicated_executor`: patches the embedding function to record the executing thread during an async search and asserts it runs on a `lancedb-embedding` thread. Verified it **fails** against the previous `run_in_executor(None, ...)` and passes with the fix.
- `ruff format`, `ruff check`, and `pyright` pass on the changed files.
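The change list above can be sketched as a lazily created module-level pool that is dropped in a fork hook. This is a minimal illustration, not the code in `background_loop.py`. `compute_query_embedding` is a hypothetical blocking stand-in that also returns its thread name for demonstration, and registering the reset via `os.register_at_fork` (Unix-only) is our assumption about how such a hook could be wired.

```python
import asyncio
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

_embedding_executor: Optional[ThreadPoolExecutor] = None
_lock = threading.Lock()


def embedding_executor() -> ThreadPoolExecutor:
    """Return the dedicated pool, creating it lazily on first use."""
    global _embedding_executor
    with _lock:
        if _embedding_executor is None:
            _embedding_executor = ThreadPoolExecutor(thread_name_prefix="lancedb-embedding")
        return _embedding_executor


def _reset_after_fork() -> None:
    # Worker threads do not survive fork(); drop the pool so the child
    # recreates it lazily, and replace the lock in case it was held.
    global _embedding_executor, _lock
    _embedding_executor = None
    _lock = threading.Lock()


if hasattr(os, "register_at_fork"):  # not available on Windows
    os.register_at_fork(after_in_child=_reset_after_fork)


def compute_query_embedding(query: str) -> Tuple[List[float], str]:
    # Hypothetical blocking embedding call; returns the thread name for the demo.
    return [float(len(query))], threading.current_thread().name


async def make_embedding(query: str) -> Tuple[List[float], str]:
    loop = asyncio.get_running_loop()
    # The dedicated pool, not None (asyncio's shared default executor).
    return await loop.run_in_executor(embedding_executor(), compute_query_embedding, query)
```

Because slow embedding calls can only occupy threads in this pool, other `run_in_executor(None, ...)` work keeps its own threads.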
c13ebc6796
feat(remote): implement set/unset_lsm_write_spec REST variant (#3501)
## Summary
Wires `RemoteTable::set_lsm_write_spec` / `unset_lsm_write_spec` to the sophon REST endpoints added in [lancedb/sophon#6181](https://github.com/lancedb/sophon/pull/6181), replacing the previous `NotSupported` stubs.
- `set_lsm_write_spec` maps the `LsmWriteSpec` onto sophon's request DTO, which carries a mode-tagged `sharding` (`unsharded` / `bucket` / `identity`), `maintained_indexes`, and `writer_config_defaults`, then POSTs to `/v1/table/{name}/set_lsm_write_spec/`.
- `unset_lsm_write_spec` POSTs to `/v1/table/{name}/unset_lsm_write_spec/`.
- Both call `check_mutable` first, matching the other remote mutations.
- `maintained_indexes` is sent verbatim (an empty list means "no maintained indexes", matching native semantics).
## Testing
- Added mocked-endpoint unit tests for unsharded / bucket / identity set and for unset.
- `cargo check --features remote --tests` passes.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
4b287fd9c4
chore: update lance dependency to v8.0.0-beta.2 (#3500)
Updates Lance dependencies to v8.0.0-beta.2 across the Rust workspace and Java lance-core metadata. The update was generated with `ci/update_lance_dependency.py` and required no compatibility code changes. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.2
## ⛔ Merge blocker: legal review required
This bump pulls in a new transitive **dev/profiling** dependency chain `inferno v0.11.21` → `pprof v0.15.0` → `lance-testing`, and `inferno` is licensed **CDDL-1.0** (copyleft). To get `cargo-deny` green, `CDDL-1.0` was added to the `deny.toml` allow list.
**Do not merge until legal has reviewed and signed off on allowing CDDL-1.0.** The dependency is dev/test-only and not distributed, but the allow-list addition still requires legal approval per our policy.
---------
Co-authored-by: Daniel Rammer <hamersaw@protonmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
64194ea8ad
fix(python): make LanceDBClientError pickleable (#3470)
## Summary
- Add `__reduce__` methods to `LanceDBClientError` and `RetryError` so that instances can be pickled and unpickled correctly
- `HttpError` inherits the fix from `LanceDBClientError` since it has no additional `__init__` parameters
- Add tests verifying pickle roundtrip for all three exception classes
Fixes #3447
## Test plan
- [x] Verified pickle roundtrip for `LanceDBClientError` with and without `status_code`
- [x] Verified pickle roundtrip for `HttpError` (subclass, no extra init params)
- [x] Verified pickle roundtrip for `RetryError` (subclass with many extra params)
- [ ] CI tests pass
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
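Why `__reduce__` is needed: by default, an exception is rebuilt on unpickling as `cls(*self.args)`, and `self.args` only holds what was passed to `super().__init__`. If `__init__` requires more arguments, loading fails. The sketch below uses hypothetical classes (`ClientError`, `HttpError`, `RetryingError`) with assumed constructor signatures; they are not LanceDB's actual definitions. It shows the three cases from the summary: the base fix, a subclass that inherits it, and a subclass with extra parameters that needs its own `__reduce__`.

```python
import pickle
from typing import Optional


class ClientError(Exception):
    """Hypothetical client error whose __init__ needs more than one argument."""

    def __init__(self, message: str, request_id: str, status_code: Optional[int] = None):
        super().__init__(message)  # only `message` lands in self.args
        self.message = message
        self.request_id = request_id
        self.status_code = status_code

    def __reduce__(self):
        # Without this, unpickling calls cls(message), which fails because
        # request_id is a required argument.
        return (self.__class__, (self.message, self.request_id, self.status_code))


class HttpError(ClientError):
    """Adds no __init__ parameters, so the inherited __reduce__ suffices."""


class RetryingError(ClientError):
    """Adds constructor parameters, so it needs its own __reduce__."""

    def __init__(self, message, request_id, status_code=None, attempts=0, max_attempts=3):
        super().__init__(message, request_id, status_code)
        self.attempts = attempts
        self.max_attempts = max_attempts

    def __reduce__(self):
        return (
            self.__class__,
            (self.message, self.request_id, self.status_code, self.attempts, self.max_attempts),
        )


def roundtrip(err: Exception) -> Exception:
    return pickle.loads(pickle.dumps(err))
```

Using `self.__class__` in the base `__reduce__` is what lets `HttpError` round-trip as an `HttpError` rather than as its parent class.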
e6c5de1a58
chore(deps): bump the rust-minor-patch group with 3 updates (#3499)
Bumps the rust-minor-patch group with 3 updates: [log](https://github.com/rust-lang/log), [test-log](https://github.com/d-e-s-o/test-log) and [serial_test](https://github.com/palfrey/serial_test).
Updates `log` from 0.4.30 to 0.4.31
**Release notes**
*Sourced from [log's releases](https://github.com/rust-lang/log/releases).*
> ## 0.4.31
> ## What's Changed
> - fix typos in kv compile errors and log documentation by [`@Isvane`](https://github.com/Isvane) in [rust-lang/log#726](https://redirect.github.com/rust-lang/log/pull/726)
> - Leverage static str key when possible by [`@tisonkun`](https://github.com/tisonkun) in [rust-lang/log#727](https://redirect.github.com/rust-lang/log/pull/727)
> - Prepare for 0.4.31 release by [`@KodrAus`](https://github.com/KodrAus) in [rust-lang/log#728](https://redirect.github.com/rust-lang/log/pull/728)
>
> ## New Contributors
> - [`@Isvane`](https://github.com/Isvane) made their first contribution in [rust-lang/log#726](https://redirect.github.com/rust-lang/log/pull/726)
>
> **Full Changelog**: https://github.com/rust-lang/log/compare/0.4.30...0.4.31
**Changelog**
*Sourced from [log's changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md).*
> ## [0.4.31] - 2026-06-02
> ## What's Changed
> - Leverage static str key when possible by [`@tisonkun`](https://github.com/tisonkun) in [rust-lang/log#727](https://redirect.github.com/rust-lang/log/pull/727)
>
> ## New Contributors
> - [`@Isvane`](https://github.com/Isvane) made their first contribution in [rust-lang/log#726](https://redirect.github.com/rust-lang/log/pull/726)
>
> **Full Changelog**: https://github.com/rust-lang/log/compare/0.4.30...0.4.31
>
> ## [Unreleased]
||
|
|
39a9f3e1e9 | Bump version: 0.30.1-beta.1 → 0.30.1-beta.2 | ||
|
|
952055d428 | Bump version: 0.33.1-beta.1 → 0.33.1-beta.2 python-v0.33.1-beta.2 | ||
|
|
927ba2c948 |
fix(python): route blob query pandas through scanner (#3491)
## Bug Fix
### What is the bug?
`QueryBuilder.to_pandas(blob_mode="descriptions")` could still fall back to `self.to_arrow()` for query outputs that contain blob columns. Custom query subclasses or wrappers may implement `to_arrow()` in ways that are incompatible with pandas blob-description conversion, which can surface as low-level Arrow/list-batch conversion failures.
### What issues or incorrect behavior does the bug cause?
Callers have to carry local `to_pandas` overrides or plain-scan adapter special cases for blob descriptions, and scanner-only kwargs such as row addresses and fragment selection have no representation in LanceDB query state.
### How does this PR fix the problem?
This PR routes `to_pandas()` for blob-output queries through the Lance scanner path in the `lazy`, `bytes`, and `descriptions` modes whenever the query is a scanner-backed plain scan. For `blob_mode="descriptions"` with `flatten`, it collects the scanner's Arrow/table output, applies LanceDB `flatten_columns`, and converts the result to pandas. Non-plain blob query shapes now fail with a clear unsupported error instead of falling into subclass `to_arrow()` behavior.
It also adds Python query state and builder methods for scanner-only plain-scan parameters:
- `with_row_address()` for `_rowaddr`
- `with_fragments(...)` for Lance fragment objects
- `fragment_ids([...])` as a convenience wrapper that resolves IDs to Lance fragments
## Validation
- `cd python && uv run --no-sync ruff format --check python/lancedb/query.py python/tests/test_query.py`
- `cd python && uv run --no-sync ruff check python/lancedb/query.py python/tests/test_query.py`
Targeted pytest was intentionally not run locally per maintainer request. |
||
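The three scanner-only parameters from #3491 can be pictured as plain query state that is later translated into scanner keyword arguments. This is a hypothetical sketch, not LanceDB's implementation: `PlainScanQuery`, `Fragment`, and `scanner_kwargs()` are illustrative names, and the `with_row_address`/`fragments` keyword names are assumptions about what the scanner call accepts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a Lance fragment object."""
    fragment_id: int


@dataclass
class PlainScanQuery:
    """Hypothetical query state for scanner-only plain-scan parameters."""
    available: Dict[int, Fragment]  # fragment ID -> fragment object
    row_address: bool = False
    fragments: Optional[List[Fragment]] = None

    def with_row_address(self) -> "PlainScanQuery":
        # Ask the scanner to also return the `_rowaddr` column.
        self.row_address = True
        return self

    def with_fragments(self, fragments: List[Fragment]) -> "PlainScanQuery":
        # Restrict the scan to explicit fragment objects.
        self.fragments = list(fragments)
        return self

    def fragment_ids(self, ids: List[int]) -> "PlainScanQuery":
        # Convenience wrapper: resolve IDs to fragment objects, then delegate.
        unknown = [i for i in ids if i not in self.available]
        if unknown:
            raise ValueError(f"unknown fragment IDs: {unknown}")
        return self.with_fragments([self.available[i] for i in ids])

    def scanner_kwargs(self) -> dict:
        # Translate the query state into keyword arguments for a scanner call.
        kwargs: dict = {}
        if self.row_address:
            kwargs["with_row_address"] = True
        if self.fragments is not None:
            kwargs["fragments"] = self.fragments
        return kwargs


# Example: request row addresses and scan only fragments 2 and 0.
query = PlainScanQuery({i: Fragment(i) for i in range(3)})
kwargs = query.with_row_address().fragment_ids([2, 0]).scanner_kwargs()
```

Keeping `fragment_ids` as a thin wrapper over `with_fragments` means fragment selection is stored in exactly one place in the query state.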
|
|
415d199c15 |
feat(rust): support datafusion expressions for merge insert predicates (#3444)
### Description
This PR exposes native DataFusion expression support in the Rust SDK's `MergeInsertBuilder` via two new builder methods: `when_matched_update_all_expr` and `when_not_matched_by_source_delete_expr`.
For remote LanceDB tables (where operations are serialized over HTTP/JSON to the SaaS backend), native DataFusion expression trees cannot be represented in the request, so the SDK returns a `NotSupported` error instead.
### Key Changes
- **`MergeFilter` Enum**: Introduced a helper enum that stores either a SQL string or a native `datafusion_expr::Expr`.
- **`MergeInsertBuilder`**: Changed the `when_matched_update_all_filt` and `when_not_matched_by_source_delete_filt` fields to store the new enum, and added the `when_matched_update_all_expr` and `when_not_matched_by_source_delete_expr` builder methods.
- **Execution & Remote Dispatch**: Local execution dispatches on the filter variant; remote table request conversion rejects expression filters with a clean `NotSupported` error.
- **Testing**: Added a `test_merge_insert_expr` unit test covering conditional updates and deletes with programmatically built DataFusion expressions.
### Verification
- Added integration test `test_merge_insert_expr`, which compiles and passes.
- Formatted and linted the code.
Closes #3416 |
||
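The `MergeFilter` idea (one field that holds either a SQL string or a native expression, with remote request conversion rejecting the expression variant) can be sketched in Python. This is a hypothetical analogue of the Rust enum, not the SDK's code: a Python callable stands in for a `datafusion_expr::Expr`, and `NotSupportedError`, `to_remote_request`, and the `"filter"` key are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class SqlFilter:
    """Filter given as a SQL string, which serializes naturally to JSON."""
    sql: str


@dataclass(frozen=True)
class ExprFilter:
    """Filter given as an in-memory expression (a callable stands in for an Expr)."""
    predicate: Callable[[Row, Row], bool]


# Hypothetical analogue of the Rust `MergeFilter` enum.
MergeFilter = Union[SqlFilter, ExprFilter]


class NotSupportedError(Exception):
    """Raised when a remote table cannot represent a request."""


def to_remote_request(filt: MergeFilter) -> dict:
    """Convert a merge filter into a JSON-friendly request body for a remote table."""
    if isinstance(filt, SqlFilter):
        return {"filter": filt.sql}  # hypothetical key name
    # An expression tree has no JSON form in this sketch, so reject it explicitly.
    raise NotSupportedError("expression filters are not supported for remote tables")
```

Rejecting the expression variant at request-conversion time surfaces the limitation before anything is sent over the wire.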
|
|
a16676e05f |
ci: update python lockfile weekly (#3498)
Makes sure we regularly pick up security fixes, along with other useful dependency bumps, in the lockfile. |
||
|
|
4e44262499 |
test(python): add regression test for nullable struct with None (#2654) (#3483)
## Summary
Regression test for [issue #2654](https://github.com/lancedb/lancedb/issues/2654): a nullable struct column whose first batch contains only `None` values crashed in `_align_field_types` with `AttributeError: 'pyarrow.lib.DataType' object has no attribute 'fields'`. The actual fix landed in #3394, but no test was added with it. This PR adds the reproducer from the issue as a test.
## Test plan
- `test_add_nullable_struct_with_none`: creates a table with a nullable struct column, adds a row with a non-null struct value, then adds a row with `None` for the struct field, and verifies that both rows land correctly.
- Uses Lance file format v2.1 (`new_table_data_storage_version="2.1"`) because nullable structs aren't supported on v2.0.
## Related
- #3028 (the original fix attempt, now superseded) |
||
|
|
632375faf1 |
docs: add cross-SDK parity guidance for code review (#3464)
Adds a `REVIEW.md` at the repo root with cross-SDK parity guidance for automated code review. The Claude Code review feature automatically loads `REVIEW.md` as review-only context. This is intentionally a semantic nudge rather than a deterministic check: it relies on the reviewer reading the sibling SDK, so it should catch most, but not necessarily all, gaps. |
||
|
|
9969191d0d |
fix(rerankers): guard against empty vector_results in RRFReranker.rerank_multivector (#3467)
## What's broken
Calling `RRFReranker().rerank_multivector([])` crashes with `IndexError: list index out of range` because the method indexes `vector_results[0]` without first checking that the list is non-empty.
```python
from lancedb.rerankers import RRFReranker

RRFReranker().rerank_multivector([])
# IndexError: list index out of range
```
## Why it happens
The type-homogeneity check uses `vector_results[0]` as the reference type but never guards against an empty list. `all(...)` returns `True` when the iterable is empty, so the check passes vacuously and execution reaches the unguarded `vector_results[0]` access on the lines that follow.
## Fix
Add an explicit empty-list check before any indexing. |
||
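Why the original guard missed the empty case is easy to see in isolation: `all()` over an empty iterable is `True`. The sketch below fuses plain ranked ID lists with the standard Reciprocal Rank Fusion score, the sum of 1/(k + rank) over lists, and puts the empty check first. It is hypothetical: the real method operates on LanceDB result tables, `rrf_fuse` is an illustrative name, raising `ValueError` mirrors the MRR fix rather than anything this PR states, and `k=60` is a commonly used default, not a claim about `RRFReranker`'s parameters.

```python
from typing import Dict, List, Sequence


def rrf_fuse(result_lists: Sequence[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion over several ranked lists of document IDs (sketch)."""
    # Guard first: all() over an empty iterable is vacuously True, so the
    # homogeneity check below would never catch an empty input on its own.
    if not result_lists:
        raise ValueError("result_lists must not be empty")
    reference = type(result_lists[0])
    if not all(isinstance(r, reference) for r in result_lists):
        raise ValueError("all result lists must have the same type")
    scores: Dict[str, float] = {}
    for ranked in result_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


fused = rrf_fuse([["a", "b"], ["b", "c"]])
ranking = sorted(fused, key=fused.get, reverse=True)
```

Here `"b"` appears in both lists, so it collects two reciprocal-rank terms and ranks first.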
|
|
1e7326cd8c |
fix(rerankers/mrr): raise ValueError on empty vector_results list (#3469)
## What's broken
`MRRReranker.rerank_multivector([])` raises `IndexError: list index out
of range`. The `all()` type-homogeneity check on line 128 passes
vacuously on an empty iterable, so execution reaches line 134, which
accesses `vector_results[0]` unconditionally with no prior guard for an
empty list.
## Why it happens
`all()` over an empty iterable returns `True`, so the type check
silently passes and execution falls through to `vector_results[0]` which
crashes.
## Fix
Added a two-line guard at the top of `rerank_multivector` that raises a
clear `ValueError("vector_results must not be empty")` before any
indexing occurs.
## Test
Added `test_mrr_reranker_empty_input` in `test_rerankers.py` which calls
`rerank_multivector([])` and asserts that a `ValueError` with the
message "must not be empty" is raised.
Fixes #3468
Co-authored-by: Aegis Dev <aegis@devteamaegis.com>
|
||
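The guard from this fix can be shown alongside a minimal mean-reciprocal-rank fusion. This sketch is hypothetical: it fuses plain ranked ID lists rather than LanceDB result tables, `mrr_fuse` is an illustrative name, and scoring a document as 0 in a list where it is absent is an assumption, not a description of `MRRReranker`. Only the error type and message come from the PR.

```python
from typing import Dict, List, Sequence


def mrr_fuse(vector_results: Sequence[List[str]]) -> Dict[str, float]:
    """Average each document's reciprocal rank across ranked ID lists (sketch)."""
    # The guard from the fix: fail with a clear message before any indexing.
    if not vector_results:
        raise ValueError("vector_results must not be empty")
    totals: Dict[str, float] = {}
    for ranked in vector_results:
        for rank, doc_id in enumerate(ranked, start=1):
            totals[doc_id] = totals.get(doc_id, 0.0) + 1.0 / rank
    # A document missing from a list contributes 0, so divide by the list count.
    n = len(vector_results)
    return {doc_id: total / n for doc_id, total in totals.items()}


fused = mrr_fuse([["a", "b"], ["a"]])
```

The assertion in `test_mrr_reranker_empty_input` (a `ValueError` whose message contains "must not be empty") applies to this sketch unchanged.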
|
|
9483b534af | Bump version: 0.30.1-beta.0 → 0.30.1-beta.1 | ||
|
|
ac3411e81e | Bump version: 0.33.1-beta.0 → 0.33.1-beta.1 python-v0.33.1-beta.1 | ||
|
|
6f18eb4cce |
feat(python): support blob modes in query to_pandas (#3487)
## Feature
- What is the new feature?
  - Adds `blob_mode` support to the sync and async Python query `to_pandas()` APIs.
  - Enables plain scan queries to return blob columns as lazy `BlobFile` objects, raw bytes, or blob descriptions.
  - Lets namespace-backed local tables use Lance's native blob-aware pandas conversion for lazy blobs.
- Why do we need this feature?
  - Table and Lance dataset/scanner APIs already support blob-aware pandas conversion, but LanceDB query builders did not expose that capability.
  - Geneva and other callers should be able to use query-level `to_pandas(blob_mode=...)` without manually constructing Lance scanners.
- How does it work?
  - Plain scan queries route through the Lance scanner's native `to_pandas(blob_mode=...)`, preserving filter, projection, limit, offset, row ID, and alias/expression projection behavior.
  - Non-native query shapes keep the existing Arrow fallback semantics and raise a clear error when they return blob columns with `blob_mode="lazy"` or `blob_mode="bytes"`.
  - Focused tests cover table/query blob modes, filter/select/limit/offset/alias query cases, async query behavior, vector-query error boundaries, and namespace-backed lazy blobs.
## Validation
- `cd python && .venv/bin/maturin develop --uv --extras tests,dev --profile dev`
- `cd python && uv run --frozen --no-sync pytest python/tests/test_table.py::test_table_to_pandas_blob_modes python/tests/test_table.py::test_async_table_to_pandas_blob_bytes python/tests/test_query.py::test_plain_scan_query_to_pandas_blob_modes python/tests/test_query.py::test_plain_scan_query_to_pandas_blob_projection python/tests/test_query.py::test_async_plain_scan_query_to_pandas_blob_projection python/tests/test_query.py::test_vector_query_to_pandas_blob_mode_requires_native_path python/tests/test_namespace.py::TestNamespaceConnection::test_table_to_pandas_blob_lazy_through_namespace -q`
- `cd python && uv run --frozen --no-sync ruff format --check .`
- `cd python && uv run --frozen --no-sync ruff check .`
- `git diff --check` |
||
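The routing rules in "How does it work?" amount to a small decision function. The sketch below is hypothetical (`QueryShape` and `choose_to_pandas_path` are illustrative names, and the path labels are just strings), and it encodes the behavior as described for this PR; #3491 later made non-plain blob query shapes fail with an unsupported error instead of falling back.

```python
from dataclasses import dataclass

BLOB_MODES = ("lazy", "bytes", "descriptions")


@dataclass(frozen=True)
class QueryShape:
    """Hypothetical summary of what a query looks like."""
    is_plain_scan: bool         # e.g. no vector or full-text search
    returns_blob_columns: bool  # output includes at least one blob column


def choose_to_pandas_path(shape: QueryShape, blob_mode: str) -> str:
    """Pick a conversion path for to_pandas(blob_mode=...) (sketch)."""
    if blob_mode not in BLOB_MODES:
        raise ValueError(f"unknown blob_mode: {blob_mode!r}")
    if shape.is_plain_scan:
        # Plain scans use the Lance scanner's native blob-aware conversion.
        return "native_scanner"
    if shape.returns_blob_columns and blob_mode in ("lazy", "bytes"):
        # Lazy handles and raw bytes need the native path, so fail clearly.
        raise NotImplementedError(f"blob_mode={blob_mode!r} requires a plain scan query")
    # Everything else keeps the existing Arrow fallback.
    return "arrow_fallback"
```

Checking the plain-scan case first means every supported mode takes the native path whenever it is available.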
|
|
379684391e |
feat: deprecate replace_field_metadata for update_field_metadata (#3484)
### Summary
Deprecates the Python `replace_field_metadata` (on `Table` and `AsyncTable`) in favor of `update_field_metadata`. This mirrors Lance, which already deprecated `Dataset.replace_field_metadata` in favor of `update_field_metadata`.
Stacked on top of #3482, since this was the follow-up task after adding `update_field_metadata`. |
||
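A common Python pattern for this kind of deprecation is a thin alias that emits a `DeprecationWarning` and delegates to the new method. The class below is a hypothetical sketch, not `lancedb.Table`; the `replace` flag on `update_field_metadata` is an assumption used here to preserve the old whole-map replacement semantics, not a documented lancedb parameter.

```python
import warnings
from typing import Dict


class FieldMetadataTable:
    """Toy table that stores per-field metadata (hypothetical)."""

    def __init__(self) -> None:
        self.field_metadata: Dict[str, Dict[str, str]] = {}

    def update_field_metadata(
        self, field: str, metadata: Dict[str, str], replace: bool = False
    ) -> None:
        # Merge into the existing map, or start from scratch when replace=True.
        current = {} if replace else self.field_metadata.get(field, {})
        self.field_metadata[field] = {**current, **metadata}

    def replace_field_metadata(self, field: str, metadata: Dict[str, str]) -> None:
        # Deprecated alias: warn at the caller's line, then keep the old behavior.
        warnings.warn(
            "replace_field_metadata is deprecated; use update_field_metadata instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.update_field_metadata(field, metadata, replace=True)
```

`stacklevel=2` attributes the warning to the code that called the deprecated method rather than to the alias itself.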
|
|
d065be0474 |
feat: add update_field_metadata to edit per-field metadata (#3482)
### Summary
Adds `update_field_metadata` to the client SDK (Rust core, Python, and TypeScript) so clients can edit per-field (column) Arrow metadata (`schema.fields[].metadata`).
### Testing
- Added unit tests.
- Ran E2E tests against a local server on both local and remote tables (set → merge → delete), across Python sync/async and TypeScript.
### Next steps
- Deprecate `replace_field_metadata` in the Python lancedb package in favor of this method (TypeScript didn't have a `replace_field_metadata` method). This matches Lance's API direction: Lance already deprecated `replace_field_metadata` in favor of `update_field_metadata`. |
||
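The set → merge → delete sequence exercised in testing can be illustrated with a small pure function over one field's metadata map. The semantics here are assumptions for illustration only: new keys are set, existing keys are overwritten, and a value of `None` deletes the key. `apply_metadata_update` is a hypothetical name, not the SDK signature.

```python
from typing import Dict, Mapping, Optional


def apply_metadata_update(
    existing: Mapping[str, str], updates: Mapping[str, Optional[str]]
) -> Dict[str, str]:
    """Return a new metadata map with `updates` merged into `existing` (sketch)."""
    result = dict(existing)  # never mutate the caller's map
    for key, value in updates.items():
        if value is None:
            result.pop(key, None)  # delete; a missing key is not an error
        else:
            result[key] = value  # set a new key or overwrite an existing one
    return result


# set -> merge -> delete on a single column's metadata
step1 = apply_metadata_update({}, {"unit": "cm"})
step2 = apply_metadata_update(step1, {"source": "sensor-a"})
step3 = apply_metadata_update(step2, {"unit": None})
```

Returning a fresh map keeps each step independent, which makes the three-stage sequence easy to check.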
|
|
7b874905fd |
ci: move Lance dependency bump flow into skill (#3475)
Moves the Lance dependency bump process into an in-repository skill so local agents and GitHub Actions share the same workflow definition. The update workflow is now an explicit entry point that takes an optional tag; latest-release resolution, duplicate PR handling, Java/Rust dependency updates, and Sophon follow-up are documented in the skill and backed by a small deterministic helper. |
||
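One piece of that flow, latest-release resolution, depends on ordering version tags correctly: `beta.12` must sort after `beta.3`, and a final release must sort after its betas. The helper below is a hypothetical sketch of that ordering, not the repository's helper; the accepted tag shape (`vX.Y.Z` with an optional `-beta.N` suffix) is an assumption based on tags in this log such as `v7.2.0-beta.3` and `v8.0.0-beta.12`.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed tag shapes: "v7.2.0" or "v7.2.0-beta.3".
TAG_RE = re.compile(r"v(\d+)\.(\d+)\.(\d+)(?:-beta\.(\d+))?")

Key = Tuple[int, int, int, int, int]


def tag_key(tag: str) -> Optional[Key]:
    """Sort key for a version tag, or None if the tag doesn't match."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    major, minor, patch, beta = m.groups()
    # Compare beta numbers numerically; a final release sorts after its betas.
    is_final = 1 if beta is None else 0
    return (int(major), int(minor), int(patch), is_final, int(beta or 0))


def latest_tag(tags: Iterable[str], include_prereleases: bool = True) -> Optional[str]:
    """Return the newest matching tag, or None if nothing matches."""
    best: Optional[Tuple[Key, str]] = None
    for tag in tags:
        key = tag_key(tag)
        if key is None or (not include_prereleases and key[3] == 0):
            continue  # skip unrecognized tags and, optionally, betas
        if best is None or key > best[0]:
            best = (key, tag)
    return None if best is None else best[1]
```

Comparing integer tuples rather than raw strings is what keeps `beta.12` ahead of `beta.3`.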
|
|
a327044e2f |
feat(python): support remote tables in PyTorch dataloaders (#3432)
This PR makes remote LanceDB tables usable from PyTorch multiprocessing workers. Remote tables now carry enough safe JSON connection state to reopen themselves after pickle/spawn or fork, and permutations lazily rebuild their reader from restored tables instead of trying to reuse process-local handles. This addresses the remote-table gap in the PyTorch dataset path while preserving the explicit connection factory escape hatch for custom worker-side credential loading or non-serializable header providers. Validated with targeted remote table, permutation, and PyTorch DataLoader tests. |
||
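The pickling approach described above can be sketched with `__getstate__`/`__setstate__`: serialize only JSON-safe connection parameters and rebuild the process-local client lazily in the worker. `RemoteTableHandle` and its fields are hypothetical; a `threading.Lock` stands in for a live, unpicklable client, and how a worker obtains credentials is out of scope here (the PR keeps an explicit connection factory for that case).

```python
import json
import pickle
import threading
from typing import Any, Dict, Optional


class RemoteTableHandle:
    """Hypothetical remote table that can reopen itself after pickling."""

    def __init__(self, uri: str, table_name: str, region: str) -> None:
        self.uri = uri
        self.table_name = table_name
        self.region = region
        self._client: Optional[Dict[str, Any]] = None  # process-local, never pickled

    @property
    def client(self) -> Dict[str, Any]:
        # Lazily (re)create the process-local client on first use.
        if self._client is None:
            self._client = {"uri": self.uri, "lock": threading.Lock()}
        return self._client

    def __getstate__(self) -> str:
        # Ship only JSON-safe connection state; the live client is dropped.
        return json.dumps(
            {"uri": self.uri, "table_name": self.table_name, "region": self.region}
        )

    def __setstate__(self, state: str) -> None:
        # Rebuild from the JSON state; the client reconnects lazily on first use.
        self.__init__(**json.loads(state))


table = RemoteTableHandle("db://example-db", "items", "us-east-1")
_ = table.client  # open a client in the parent process
restored = pickle.loads(pickle.dumps(table))  # roughly what a spawned worker receives
```

Without the custom `__getstate__`, pickling would fail on the lock inside the client; with it, each worker simply opens its own client.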
|
|
f20ec99dec | Bump version: 0.30.0-beta.1 → 0.30.1-beta.0 | ||
|
|
60f961584c | Bump version: 0.33.0-beta.1 → 0.33.1-beta.0 python-v0.33.1-beta.0 | ||
|
|
ac699d7ecf |
chore: bump lance to 7.2.0-beta.3 (#3471)
This updates the workspace Lance dependencies from `v7.1.0-beta.4` to `v7.2.0-beta.3` and refreshes `Cargo.lock`. The lockfile now points at Lance commit `7c070f760fa8e24c8015cb2afbd22c5e6b7898e8` and includes the transitive dependency updates required by the new beta. |
||
|
|
968277be79 |
chore(deps): bump the rust-minor-patch group with 5 updates (#3465)
Bumps the rust-minor-patch group with 5 updates:

| Package | From | To |
| --- | --- | --- |
| [log](https://github.com/rust-lang/log) | `0.4.29` | `0.4.30` |
| [serde_json](https://github.com/serde-rs/json) | `1.0.149` | `1.0.150` |
| [http](https://github.com/hyperium/http) | `1.4.0` | `1.4.1` |
| [uuid](https://github.com/uuid-rs/uuid) | `1.23.1` | `1.23.2` |
| [aws-smithy-runtime](https://github.com/smithy-lang/smithy-rs) | `1.11.1` | `1.11.3` |

Updates `log` from 0.4.29 to 0.4.30.
Release notes (sourced from [log's releases](https://github.com/rust-lang/log/releases)):
## 0.4.30
### What's Changed
- Support capturing of `std::net` types by @KodrAus in [rust-lang/log#724](https://redirect.github.com/rust-lang/log/pull/724)
### New Contributors
- @V0ldek made their first contribution in [rust-lang/log#720](https://redirect.github.com/rust-lang/log/pull/720)
- @woodruffw made their first contribution in [rust-lang/log#723](https://redirect.github.com/rust-lang/log/pull/723)
**Full Changelog**: https://github.com/rust-lang/log/compare/0.4.29...0.4.30
### Notable Changes
- MSRV is bumped to 1.71.0 in [rust-lang/log#723](https://redirect.github.com/rust-lang/log/pull/723)
Changelog (sourced from [log's changelog](https://github.com/rust-lang/log/blob/master/CHANGELOG.md)):
## [0.4.30] - 2026-05-21
### What's Changed
- Support capturing of `std::net` types by @KodrAus in [rust-lang/log#724](https://redirect.github.com/rust-lang/log/pull/724)
### New Contributors
- @V0ldek made their first contribution in [rust-lang/log#720](https://redirect.github.com/rust-lang/log/pull/720)
- @woodruffw made their first contribution in [rust-lang/log#723](https://redirect.github.com/rust-lang/log/pull/723)
**Full Changelog**: https://github.com/rust-lang/log/compare/0.4.29...0.4.30
### Notable Changes
- MSRV is bumped to 1.71.0 in [rust-lang/log#723](https://redirect.github.com/rust-lang/log/pull/723)
Commits: … |
||
|
|
5638907fa5 |
chore: update Lance to v7.2.0-beta.1 (#3461)
Update the Rust workspace Lance git dependencies and Java lance-core dependency to v7.2.0-beta.1. This keeps LanceDB aligned with the latest Lance beta release and refreshes the Cargo lockfile for the new Lance dependency graph. |