lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-06-02 20:00:46 +00:00

Author	SHA1	Message	Date
Xuanwo	ac699d7ecf	chore: bump lance to 7.2.0-beta.3 (#3471 ) This updates the workspace Lance dependencies from `v7.1.0-beta.4` to `v7.2.0-beta.3` and refreshes `Cargo.lock`. The lockfile now points at Lance commit `7c070f760fa8e24c8015cb2afbd22c5e6b7898e8` and includes the transitive dependency updates required by the new beta.	2026-06-01 20:40:07 +08:00
Xuanwo	5638907fa5	chore: update Lance to v7.2.0-beta.1 (#3461 ) Update the Rust workspace Lance git dependencies and Java lance-core dependency to v7.2.0-beta.1. This keeps LanceDB aligned with the latest Lance beta release and refreshes the Cargo lockfile for the new Lance dependency graph.	2026-05-30 00:18:22 +08:00
Heng Ge	048f52c2aa	feat(table): route merge_insert through the MemWAL LSM write path (#3354 ) ## Summary When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append) instead of the standard merge path. - `use_lsm_write` — a `merge_insert` builder option, default `true`; set it `false` to use the standard path for a call even when a spec is set. - `assume_pre_sharded` — a `merge_insert` builder option, default `false`; skips the per-row shard check and routes by the first row only. - `close_lsm_writers` — drains and closes the table's cached MemWAL shard writers. - The `merge_insert` `on` columns default to, and are validated against, the table's unenforced primary key. - Shard writers are cached alongside the dataset (in `DatasetConsistencyWrapper`) and reused for the session. - `MergeResult` gains `num_rows` — on the LSM path the insert/update breakdown is unknown until compaction, so only the total is reported. Routing covers all three sharding strategies — bucket (murmur3, Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets a single shard; the whole input is collected and validated before a single atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched. Bindings: Python (`merge_insert(...).use_lsm_write(...)` / `.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript (`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`, `Table.closeLsmWriters`). ## Context Reconstructed from the original #3354 branch onto current `main`: the branch predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and has been rebuilt on that merged foundation. Depends on Lance `v7.0.0-beta.13`. The MemWAL read path (reading un-flushed shard data back into queries) and remote (LanceDB Cloud) LSM support are follow-ups. --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-05-29 08:48:11 -07:00
Will Jones	458dcabbd2	chore: upgrade Rust toolchain to 1.95.0 (#3390 ) Bumps the pinned toolchain in `rust-toolchain.toml` from 1.94.0 to 1.95.0. Fixes new lints surfaced by clippy on 1.95.0: - `manual_checked_ops` — fragment size mean in `table.rs` uses `checked_div` - `explicit_counter_loop` — shuffle test loop in `shuffle.rs` No rustc warnings were introduced. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-29 08:21:45 -07:00
Will Jones	ab982d7f65	perf: migrate list_indices to use Lance's describe_indices (#3108 ) This needs https://github.com/lance-format/lance/pull/6099 to work. Closes #3140 --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-28 16:41:05 -07:00
Will Jones	a9f49c8150	fix: allow appending arrow.json data into lance.json tables (#3429 ) When a table is created with `pa.json_()` (PyArrow's JSON extension type), it is stored internally as `lance.json` (LargeBinary with `lance.json` extension metadata). Calling `table.add()` with `pa.json_()` data failed with: ``` RuntimeError: lance error: Append with different schema: `data` should have type json but type was large_binary ``` `build_field_exprs` in `rust/lancedb/src/table/datafusion/cast.rs` saw that the input field (`Utf8` with `arrow.json` metadata) differed from the table field (`LargeBinary` with `lance.json` metadata). Since `can_cast_types(Utf8, LargeBinary)` is true, it inserted a DataFusion `Utf8 → LargeBinary` cast. That cast preserved the input field's `arrow.json` extension metadata instead of adopting the table's `lance.json` metadata, so lance-core detected a schema mismatch and rejected the append. This adds a special case in `build_field_exprs`: when the input is `arrow.json` and the table field is `lance.json`, the expression is passed through unchanged. Lance-core's write path already handles the `arrow.json → lance.json` conversion (including JSONB encoding), so no DataFusion cast is needed. Fixes #3144 Continues #3291 from a fork (the original author's branch could not be pushed to). The original commits are preserved; an additional commit fixes the CI failures on that PR — formatting, a missing trait import, and read-back assertions that assumed binary storage when a lance.json column is read back as `Utf8`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: yunju.lly <yunju.lly@antgroup.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-26 19:24:28 -07:00
Jack Ye	a7d9f2e99d	fix: remove primary key constraint from MemWAL bucket sharding (#3435 ) ## Summary - Bump lance dependency from `v7.0.0-beta.13` to `v7.0.0-rc.1` - Remove PK constraint from `LsmWriteSpec::Bucket` docs and `Table::set_lsm_write_spec` docs - Remove test assertions that expected rejection when no PK is set or when bucket column != PK Closes https://github.com/lance-format/lance/issues/6917	2026-05-26 17:35:28 -07:00
Brendan Clement	15e75804c4	feat(remote): send read freshness headers for remote table consistency (#3439 ) Closes client side work of #3370 ### Summary - Plumbs `read_consistency_interval` from `ConnectBuilder` through `RestfulLanceDbClient` so remote reads attach an `x-lancedb-min-timestamp` freshness header. None = no header (default), zero = "now", positive = `now - interval`. - Adds per-table `FreshnessState` on `RemoteTable`: write responses (`update`, `delete`, `merge_insert`, `add_columns`, `alter_columns`, `drop_columns`) track the committed version, and the next read sends `x-lancedb-min-version` so the server's cache honors read-your-write. - `checkout(v)` / `checkout_tag(t)` / `checkout_latest()` / `restore()` reset the freshness state appropriately; the validating `/describe/` and tag-resolve requests are sent without freshness headers so they don't carry stale state. - Updates Rust, Python, and Node docstrings and calls out that stronger consistency raises per-read latency and cost. ### Testing - Unit tests cover default behavior, interval=0, positive interval, checkout_latest baseline, min_version-after-write, checkout clears state, and the two no-stale-header invariants on `checkout(v)` and `checkout_tag(t)`. - Ran smoke tests against local remote table to verify functionality	2026-05-26 13:38:07 -07:00
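The three `read_consistency_interval` cases in the commit above (none, zero, positive) can be sketched as follows. This is a minimal illustration, not the client's code: the helper name `min_timestamp_header` is hypothetical, and encoding the timestamp as ISO 8601 is an assumption, since the commit message only names the header.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

HEADER = "x-lancedb-min-timestamp"  # header name taken from the commit message


def min_timestamp_header(interval: Optional[timedelta], now: datetime) -> dict:
    """Hypothetical helper: map a read consistency interval to request headers."""
    if interval is None:
        # Default: no freshness requirement, so no header is attached.
        return {}
    # A zero interval yields "now"; a positive one yields now - interval.
    # The ISO 8601 encoding is an assumption of this sketch.
    return {HEADER: (now - interval).isoformat()}


now = datetime(2026, 5, 26, 12, 0, 0, tzinfo=timezone.utc)
no_header = min_timestamp_header(None, now)
strong = min_timestamp_header(timedelta(0), now)
relaxed = min_timestamp_header(timedelta(seconds=30), now)
```

A smaller interval asks the server for fresher data, which is why the commit notes that stronger consistency raises per-read latency and cost.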
Yuval Lifshitz	df2b6d3dd4	feat(rust): support DataFusion Expr for table row deletions (#3415 ) Changes the parameter of `delete` to a `Predicate` that can be constructed from a DataFusion `Expr`, from `str` (to support SQL predicates), or from `String` (to support the Python and JavaScript bindings). Passing a DataFusion `Expr` avoids the overhead of serializing to SQL and re-parsing when callers already have an `Expr` (e.g. from query planning). The native implementation uses Lance's `DeleteBuilder::from_expr`. The remote implementation converts the `Expr` to SQL via `expr_to_sql_string` before sending it to the server, consistent with the existing query and count_rows paths. Closes #3204 Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com> Co-authored-by: Claude Code <noreply@anthropic.com>	2026-05-26 11:49:54 -07:00
Will Jones	da2a1c4a2c	test(rust): fix flaky env-var-dependent client tests (#3426 ) The `test_resolve_user_id_*` tests in `remote/client.rs` mutate the process-global `LANCEDB_USER_ID` and `LANCEDB_USER_ID_ENV_KEY` environment variables. cargo runs tests in a binary across multiple threads, so one test's `remove_var` can race another's `set_var` between when it's set and when `resolve_user_id()` reads it. This surfaced as an intermittent failure of `test_resolve_user_id_from_env_key` on Windows CI: ``` assertion `left == right` failed left: None right: Some("custom-env-user-id") ``` Annotates the five env-mutating tests with `serial_test`'s `#[serial(user_id_env)]` so they run serially with respect to each other. Should be backported to `release/v0.28` (CI for #3421 hit this same flake). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 10:35:15 -07:00
Lance Release	7168d64af1	Bump version: 0.30.0-beta.0 → 0.30.0-beta.1	2026-05-22 10:09:01 +00:00
Xuanwo	a0001043b6	fix: canonicalize remote nested field paths (#3430 ) Fixes #3407. Remote tables now resolve create-index field paths against the table schema before sending requests, so nested, escaped, and case-insensitive inputs use the same canonical path contract as local tables. Remote `list_indices()` also canonicalizes returned columns against the current schema, and the remote query tests lock explicit nested vector and FTS request payloads.	2026-05-22 15:23:00 +08:00
Lance Release	1bb7acb74f	Bump version: 0.29.1-beta.0 → 0.30.0-beta.0	2026-05-21 21:36:18 +00:00
Xuanwo	d5dc4c0f06	fix: discover nested vector columns by default (#3423 ) LanceDB default vector column discovery only considered top-level fields, so tables with a single nested vector leaf still required users to pass an explicit field path. This updates Rust and Python discovery to recurse into struct fields, return canonical field paths, and preserve actionable errors when no default or multiple defaults exist. The explicit nested path flow for index creation and search remains supported across Rust, Python, and Node, with regression coverage for single nested vector leaves, multiple candidate leaves, and schemas without vector leaves. Closes #3405.	2026-05-21 19:02:41 +08:00
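The recursive discovery described in the commit above might look roughly like the sketch below. It uses a toy schema (nested dicts with type-name strings) rather than Arrow, the function names are hypothetical, and escaping of literal-dot field names is left out.

```python
def vector_leaf_paths(schema: dict, prefix: str = "") -> list:
    """Collect dotted paths of every vector leaf, recursing into structs."""
    paths = []
    for name, field_type in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field_type, dict):
            # A dict models a struct field: descend into its children.
            paths.extend(vector_leaf_paths(field_type, path))
        elif field_type == "vector":
            paths.append(path)
    return paths


def default_vector_column(schema: dict) -> str:
    """Return the single vector leaf, or raise an actionable error."""
    candidates = vector_leaf_paths(schema)
    if not candidates:
        raise ValueError("no vector column found; pass the column explicitly")
    if len(candidates) > 1:
        raise ValueError(f"multiple vector columns {candidates}; pass one explicitly")
    return candidates[0]


schema = {"id": "int64", "doc": {"text": "string", "embedding": "vector"}}
found = default_vector_column(schema)
```

With exactly one nested vector leaf the caller no longer needs to name it; with zero or several, the error says what to pass.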
Xuanwo	2eba7ebd02	fix: return canonical nested index paths (#3413 ) Index metadata APIs now resolve stored field ids back to Lance canonical field paths instead of leaf names, so nested indexes such as `metadata.user_id` and escaped literal-dot fields round-trip through `list_indices()`. Native index creation also canonicalizes the input path before handing it to Lance, keeping local metadata consistent with the field-path contract while remote responses continue to expose server-provided canonical columns. Fixes #3403.	2026-05-21 00:20:47 +08:00
Xuanwo	5bfde47a8e	fix: support nested field paths in native index creation (#3408 ) Native index creation was resolving requested columns through top-level Arrow schema lookup before handing the request to Lance, which rejected nested paths and could collapse a nested field to its leaf name. This PR resolves index targets with Lance field-path semantics, passes the canonical path through to Lance, and reports indexed columns from field ids as canonical full paths. This also removes the Python native FTS guard that rejected dotted paths so scalar, vector, and FTS index creation share the same nested-field contract. Related to #3402.	2026-05-20 11:15:15 +08:00
Weston Pace	01e272c0b0	fix(rust): match embedding scannable columns by name (#3410 ) Fixes #3136. ## Summary - `WithEmbeddingsScannable::scan_as_stream` matched columns positionally against the table schema, so a `CastError` was raised whenever the computed batch order differed from the table schema order. - The mismatch surfaced when `add_columns` added a new physical column after an embedding column: the table schema became `[..., embedding, extra]`, but `compute_embeddings_for_batch` always appends embeddings at the end, producing `[..., extra, embedding]`. Position 2 then tried to cast e.g. `score: Float64` → `embedding: FixedSizeList` and failed. - Now we look each output column up by name in the result batch, which is order-independent. If a non-embedding column required by the table schema is missing from the input, we return a clear `InvalidInput` error instead of a confusing cast error. ## Reproduction (from the issue) ```text Table created with: [id, text, text_vec(embedding)] add_columns("score") → schema: [id, text, text_vec, score] table.add([id, text, score]) → BEFORE: CastError on position 2 AFTER: succeeds, embedding is computed ``` ## Tests - `data::scannable.rs::test_with_embeddings_scannable_column_added_after_embedding` — unit test exercising the exact column-order mismatch via `WithEmbeddingsScannable::with_schema`. - `data::scannable.rs::test_with_embeddings_scannable_missing_required_column` — covers the new "missing column" error path. - `table::add_data.rs::test_add_with_embeddings_after_add_columns` — end-to-end regression test mirroring the reproduction in the issue (create table with embedding → `add_columns` → `table.add`). 
## Test plan - [x] `cargo check --quiet --features remote --tests --examples` - [x] `cargo clippy --quiet --features remote --tests --examples` - [x] `cargo fmt --all` - [x] `cargo test --quiet --features remote -p lancedb embedding` — 18 tests pass 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 15:08:12 -07:00
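The switch from positional to name-based matching in the commit above can be illustrated with a small sketch. Plain dicts stand in for record batches, and `align_by_name` is a hypothetical helper, not the Rust implementation.

```python
def align_by_name(batch: dict, table_columns: list) -> dict:
    """Reorder a computed batch to the table's column order, looking up by name."""
    missing = [name for name in table_columns if name not in batch]
    if missing:
        # A clear input error instead of a confusing positional cast failure.
        raise ValueError(f"input is missing required columns: {missing}")
    return {name: batch[name] for name in table_columns}


# Table schema after add_columns("score"): the embedding sits before "score".
table_columns = ["id", "text", "text_vec", "score"]
# The computed batch has the embedding appended at the end instead.
computed = {"id": [1], "text": ["hi"], "score": [0.5], "text_vec": [[0.1, 0.2]]}
aligned = align_by_name(computed, table_columns)
```

Because each column is looked up by name, the order in which embeddings are appended no longer matters.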
Lance Release	53c2164b84	Bump version: 0.29.0 → 0.29.1-beta.0	2026-05-18 22:07:52 +00:00
Drew Gallardo	aac6c62459	feat(python): add public take_offsets method on Permutation (#3375 ) Closes #3243. This PR exposes a new public API, `Permutation.take_offsets(offsets: list[int])`, since users previously had to call `__getitems__` directly to batch-fetch rows by position. The name matches the existing `Table.take_offsets` pattern, and the dunder methods `__getitem__` and `__getitems__` now delegate to it. It also fixes a parse error when `PermutationReader::take_offsets` receives an empty list: it now returns an empty `RecordBatch` with the correct schema instead. The fix is bundled here because without it the new public API fails on a perfectly reasonable input. `__getitems__` is preserved since PyTorch's batched DataLoader requires it. ### Testing - Added 3 new Rust tests for empty offsets, covering a permutation table with Select::All, Select::Columns, and the identity path - Added 3 new Python tests for the public API, including a happy case and empty input on both identity and permutation. clippy, format, and check are all clean. cc: @westonpace	2026-05-18 09:35:56 -07:00
Weston Pace	8df2fff75f	ci: bump version after 0.29 release (#3378 ) The 0.29 release happened on a branch because the main line had already moved past the 6.0.0 stable lance release. As a result the version bump commits ended up on the branch. This merges those commits back into main. --------- Co-authored-by: Lance Release <lance-dev@lancedb.com>	2026-05-18 05:34:33 -07:00
Heng Ge	0d30b31998	feat: support setting LSM write spec for a table (#3396 ) ## Summary Split out from #3354 Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec` to install and clear the spec that selects Lance's MemWAL LSM-style write path for `merge_insert`. `LsmWriteSpec` offers three sharding strategies, all built on Lance's `InitializeMemWalBuilder`: - `LsmWriteSpec::bucket(column, num_buckets)` — hash-bucket sharding by the single-column unenforced primary key. - `LsmWriteSpec::identity(column)` — identity sharding by the raw value of a scalar column. - `LsmWriteSpec::unsharded()` — a single MemWAL shard. Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL keeps up to date as rows are appended) and `with_writer_config_defaults(...)` (default `ShardWriter` configuration recorded in the MemWAL index, so every writer starts from the same defaults). All variants require the table to have an unenforced primary key. - `set_lsm_write_spec` installs the spec by initializing the MemWAL index; `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to the standard `merge_insert` path. `unset` is idempotent. - Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`, `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` / `"unsharded"`). `RemoteTable` returns `NotSupported`. The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a follow-up — this PR only installs and clears the spec.	2026-05-18 00:11:33 -07:00
Heng Ge	6a431ff0a0	feat: support setting unenforced primary key (#3394 ) ## Summary Adds `Table::set_unenforced_primary_key` — records a single column as the table's unenforced primary key in Lance schema field metadata. "Unenforced" means LanceDB does not check uniqueness on write; the key is metadata that `merge_insert` consumes. - Single-column only; the column must exist and have a supported dtype (Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary). The API accepts an iterable for binding ergonomics but requires exactly one column — compound keys are rejected. - The primary key is immutable: calling this on a table that already has an unenforced primary key is rejected. Concurrent writers racing to set the key fail at commit time rather than silently overriding it. - `RemoteTable` returns `NotSupported`. - Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and TypeScript (`Table.setUnenforcedPrimaryKey`). ## Context Split out from #3354 per review feedback, so the unenforced primary key and the `merge_insert` sharding spec land as separate reviewable PRs. No Lance dependency bump — `main` is already on v7.0.0-beta.10, which includes the field-metadata round-trip fix the API relies on. Enforcing primary-key immutability at the Lance commit layer (so the cross-column concurrent race is also rejected) is a companion Lance change: lance-format/lance#6810.	2026-05-16 23:12:55 -07:00
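The rules listed in the commit above (single column, supported dtype, immutable once set) can be sketched with a toy class. `ToyTable` is a hypothetical stand-in, dtypes are plain strings, and the order of the checks is an assumption; the real validation lives in the Rust `Table::set_unenforced_primary_key`.

```python
SUPPORTED_KEY_TYPES = {
    "Int32", "Int64", "Utf8", "LargeUtf8",
    "Binary", "LargeBinary", "FixedSizeBinary",
}


class ToyTable:
    """Hypothetical stand-in for a table; not the LanceDB `Table` class."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> dtype name
        self.primary_key = None

    def set_unenforced_primary_key(self, columns):
        columns = list(columns)  # an iterable is accepted for ergonomics
        if self.primary_key is not None:
            raise ValueError("primary key is already set and is immutable")
        if len(columns) != 1:
            raise ValueError("exactly one primary key column is required")
        column = columns[0]
        if column not in self.schema:
            raise ValueError(f"column {column!r} does not exist")
        if self.schema[column] not in SUPPORTED_KEY_TYPES:
            raise ValueError(f"unsupported key dtype {self.schema[column]}")
        # Only metadata is recorded; uniqueness is never checked on write.
        self.primary_key = column


table = ToyTable({"id": "Int64", "score": "Float64"})
table.set_unenforced_primary_key(["id"])
```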
Xin Sun	ab2c5adf5e	feat(nodejs): add order_by method to Query (#3123 )	2026-05-16 22:49:08 -07:00
Shengan Zhang	64aeee84a8	feat(python): support `bytes` in `lit()` expressions (#3387 ) Closes #3261. ## Summary Adds `bytes` to the accepted types of `lancedb.expr.lit()` so that binary scalars can be used in filter / projection expressions. The previous attempt in #3235 had to be reverted because DataFusion's SQL unparser does not support `Binary` / `LargeBinary` scalars, so any expression containing such a literal would fail in both `to_sql()` and `__repr__`. ## How `expr_to_sql_string` now has two paths: - Fast path (no binary literals): delegate to DataFusion's unparser unchanged. - Slow path: rewrite each `Binary(Some(bytes))` literal in the tree to a unique string-literal placeholder, run the unparser, then substitute `'<placeholder>'` with `X'<HEX>'` in the resulting SQL. `Binary(None)` / `LargeBinary(None)` are rewritten to `ScalarValue::Null` so the unparser emits plain `NULL`. This keeps DataFusion as the single source of truth for operator and function serialization, so binary literals work in every expression node type the unparser already supports — including nested cases like `contains(col("data"), lit(b"\xff"))`, `NOT (col == lit(b"..."))`, and `col.cast(...) == lit(b"...")`. ## Changes - `rust/lancedb/src/expr/sql.rs`: placeholder-substitution implementation. - `rust/lancedb/src/expr.rs`: 4 new unit tests covering binary literals in equality, compound predicates, scalar function calls, negation, and `NULL` binary literals. - `python/src/expr.rs`: `expr_lit` accepts `PyBytes` and produces `ScalarValue::Binary`. - `python/Cargo.toml` + `Cargo.lock`: pull in `datafusion-common` for `ScalarValue`. - `python/python/lancedb/expr.py`: extend `ExprLike` and `lit()` type annotations / docstrings with `bytes`. - `python/python/lancedb/_lancedb.pyi`: update `expr_lit` stub. - `python/tests/test_expr.py`: unit tests for `to_sql` / `repr` of binary literals and an integration test against a real `pa.binary()` column for equality / inequality / compound filters. 
## Example ```python from lancedb.expr import col, lit, func # Equality against a binary column col("payload") == lit(b"\xca\xfe") # Expr((payload = X'CAFE')) # Nested inside a function call (previously failed) func("contains", col("data"), lit(b"\xff")) # Expr(contains(data, X'FF')) # repr() no longer crashes repr(lit(b"\xde\xad\xbe\xef")) # "Expr(X'DEADBEEF')" ``` ## Verification - [x] `cargo test -p lancedb --lib expr::` — 12/12 pass (was 9; +3 new tests) - [x] `cargo check --features remote --tests --examples` — clean - [x] `cargo clippy --features remote --tests --examples` — no warnings - [x] `cargo fmt --all -- --check` — clean - [x] `pytest python/tests/test_expr.py` — 76/76 pass (was 74; +2 new tests) - [x] `ruff check python` / `ruff format --check python` — clean ## Follow-ups (not in this PR) Issue #3261 also raises the possibility of a truncated `__repr__` for very large binary literals. This PR keeps `__repr__` exact (it forwards to `to_sql()`), since truncating display output would diverge from the SQL that actually gets executed. A display-only truncation could be added in a follow-up by giving `__repr__` its own renderer. Made with [Cursor](https://cursor.com) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-14 15:24:52 -07:00
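The placeholder-substitution path described in the commit above can be sketched in a few lines. Everything here is a toy: `toy_unparse` stands in for DataFusion's unparser (it rejects bytes the same way), expressions are plain tuples, and the placeholder format is an assumption chosen for illustration.

```python
def toy_unparse(expr: tuple) -> str:
    """Stand-in for a SQL unparser that cannot serialize binary literals."""
    kind = expr[0]
    if kind == "col":
        return expr[1]
    if kind == "lit":
        value = expr[1]
        if value is None:
            return "NULL"
        if isinstance(value, bytes):
            raise TypeError("unparser does not support binary literals")
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)
    _, op, left, right = expr  # ("binop", op, left, right)
    return f"({toy_unparse(left)} {op} {toy_unparse(right)})"


def to_sql(expr: tuple) -> str:
    """Swap binary literals for string placeholders, unparse, then patch in X'..'."""
    placeholders = {}

    def rewrite(node: tuple) -> tuple:
        if node[0] == "lit" and isinstance(node[1], bytes):
            token = f"__binary_placeholder_{len(placeholders)}__"
            placeholders[token] = node[1]
            return ("lit", token)
        if node[0] == "binop":
            return ("binop", node[1], rewrite(node[2]), rewrite(node[3]))
        return node

    sql = toy_unparse(rewrite(expr))
    for token, raw in placeholders.items():
        # Replace the quoted placeholder with a hex literal.
        sql = sql.replace(f"'{token}'", f"X'{raw.hex().upper()}'")
    return sql


sql = to_sql(("binop", "=", ("col", "payload"), ("lit", b"\xca\xfe")))
```

The unparser stays the single source of truth for operators; only the literal text is patched afterwards. A real implementation would also need placeholders that cannot collide with user-supplied strings.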
Justin Miller	5b45e44ce3	fix(rust): map lance-namespace errors to TableNotFound / TableAlreadyExists (#3385 ) ## Summary `LanceNamespaceDatabase::open_table` and `create_table` were squashing `NamespaceError::TableNotFound` and `TableAlreadyExists` into generic `Error::Runtime`, so callers couldn't distinguish a missing-table or duplicate-table error from any other internal failure. Downstream this surfaced to geneva-style code as HTTP 500 / "internal server error" on operations that should have been 400/404 — see [ENT-1235](https://linear.app/lancedb/issue/ENT-1235/fix-ns-errors-for-create-tableopen-table). This PR walks the boxed-error chain from `lance::Error::Namespace` down to the inner `NamespaceError` and maps its `ErrorCode` onto the proper `lancedb::Error` variant: - `NamespaceError::TableNotFound` → `Error::TableNotFound { name, source }` - `NamespaceError::TableAlreadyExists` → `Error::TableAlreadyExists { name }` - everything else → `Error::Runtime` (unchanged behavior for the long tail) It also replaces the existing `e.to_string().contains("already exists")` string match in `LanceNamespaceDatabase::create_table` with a downcast on the `NamespaceError` code. That string-match happened to work for the `dir` backend but isn't guaranteed to match the REST namespace backend's error format; the downcast works for both. The chain-walk is needed because `DatasetBuilder::from_namespace` re-wraps the inner namespace error in a fresh `lance::Error::Namespace`, so a single top-level downcast misses it. ## How this helps geneva Geneva's workaround (linked in the parent issue) currently has to use `except Exception:` with a `# todo: this is too broad` comment, plus `str(e).lower().contains("already exists")` string matching, because the namespace-impl path raised a generic `RuntimeError`. 
After this PR: - `db.open_table("missing")` raises `ValueError("Table 'missing' was not found")` (via the existing Python binding mapping of `TableNotFound` → `PyValueError`) — geneva can catch `ValueError` cleanly. - `db.create_table("dup")` raises `ValueError("Table 'dup' already exists")` reliably across both `dir` and REST backends, so the existing string match becomes deterministic. In phalanx (the sophon REST server), `LanceDBError::TableNotFound` and `LanceDBError::TableAlreadyExists` already map directly to HTTP 404 and HTTP 400 respectively — see [phalanx/src/error.rs:77-94](https://github.com/lancedb/sophon/blob/main/src/rust/phalanx/src/error.rs#L77). No phalanx code change is needed for the bug fix; the previous 500 came from phalanx's string-match fallback not finding `"namespace"` AND `"not found"` in the `Runtime` error's debug-formatted message. ## Follow-up [ENT-1246](https://linear.app/lancedb/issue/ENT-1246/remove-dead-namespace-error-string-matching-in-phalanx) — after this lands and phalanx picks up the new lancedb, the string-matching fallback for table errors in `src/rust/phalanx/src/error.rs` (lines 99-168, 236-256, 502-514) and `src/rust/phalanx/src/rest/table/create_table.rs` (lines 224-241) becomes dead code and can be removed. The `// TODO: Refactor for better namespace error handling` comment at phalanx/src/error.rs:96-98 is exactly what this PR addresses on the lancedb side; ENT-1246 finishes the cleanup on the sophon side. 
## Test plan - [x] `cargo test --quiet --features remote -p lancedb --lib` — all 495 lib tests pass, including 4 new tests in `database::namespace::tests`: - `test_namespace_table_not_found` — extended to assert `Error::TableNotFound` (was just `is_err()`) - `test_namespace_open_table_not_found_at_root` — covers the root-namespace path - `test_namespace_create_table_already_exists` — covers child namespace - `test_namespace_create_table_already_exists_at_root` — covers root namespace - [x] `cargo clippy --quiet --features remote --tests` — clean - [x] `cargo fmt --all` — clean - [x] Manually confirmed (via test failures before the fix) that the two `open_table` tests were returning `Error::Runtime { message: "Failed to get table info from namespace: Namespace { source: TableNotFound { ... } }" }` prior to this change. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 15:19:23 -07:00
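The chain walk described in the commit above has a close analogue in Python's exception chaining, sketched below. This is an analogy under stated assumptions, not the Rust code: the exception classes and the `map_error` helper are hypothetical, and `__cause__` plays the role of the boxed error source.

```python
class NamespaceTableNotFound(Exception):
    """Hypothetical inner error raised by a namespace backend."""


class NamespaceTableAlreadyExists(Exception):
    """Hypothetical inner error raised by a namespace backend."""


class TableNotFound(ValueError):
    pass


class TableAlreadyExists(ValueError):
    pass


def wrap(inner: BaseException, message: str) -> RuntimeError:
    """Re-wrap an error, as an intermediate layer might."""
    outer = RuntimeError(message)
    outer.__cause__ = inner
    return outer


def find_in_chain(err, kind):
    """Walk the cause chain until an error of the wanted kind is found."""
    while err is not None:
        if isinstance(err, kind):
            return err
        err = err.__cause__
    return None


def map_error(err: BaseException, name: str) -> Exception:
    if find_in_chain(err, NamespaceTableNotFound):
        return TableNotFound(f"Table '{name}' was not found")
    if find_in_chain(err, NamespaceTableAlreadyExists):
        return TableAlreadyExists(f"Table '{name}' already exists")
    return RuntimeError(str(err))  # unchanged behavior for everything else


# Two layers of wrapping: checking only the top-level error would miss it.
err = wrap(wrap(NamespaceTableNotFound("missing"), "namespace"), "namespace")
mapped = map_error(err, "missing")
```

Matching on the error type rather than on message text is what makes the mapping independent of each backend's error format.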
Shengan Zhang	650f173236	feat(python): add IVF_HNSW_FLAT vector index support (#3366 ) ## Summary Wire up `IVF_HNSW_FLAT` in the Rust core and Python SDK. The index was documented at https://docs.lancedb.com/indexing/vector-index but `lancedb.Table.create_index(index_type="IVF_HNSW_FLAT")` raised `ValueError: Unknown index type IVF_HNSW_FLAT`; the underlying `pylance` already accepted it, and only the LanceDB wrapper was missing the wiring. Rust core (`rust/lancedb`): - Add `Index::IvfHnswFlat` / `IndexType::IvfHnswFlat` variants and the `IvfHnswFlatIndexBuilder` (modelled on `IvfHnswSqIndexBuilder`). - Build Lance params via the existing `VectorIndexParams::ivf_hnsw(...)` helper, keeping symmetry with the other `IVF_HNSW_` variants. - Forward the variant in `RemoteTable::create_index` and add two parametrised tests (default + customised config) for the JSON serialisation. - New `NativeTable` integration test (`test_create_index_ivf_hnsw_flat`). Python binding (`python/`): - New `HnswFlat` dataclass + backwards-compat `IvfHnswFlat` alias. - PyO3 `extract_index_params` recognises the `HnswFlat` config. - `LanceTable.create_index(index_type="IVF_HNSW_FLAT", …)` and the sync `RemoteTable.create_index` both dispatch to the new config. - `IndexStatistics.index_type` `Literal` and `_lancedb.pyi` stubs cover the new type so `pyright`/`make check` stays clean. - Async integration tests (`HnswFlat` + `IvfHnswFlat` alias) and a sync dispatcher test, mirroring the existing `IVF_HNSW_SQ` coverage. - Existing `test_index_statistics_index_type_lists_all_supported_values` updated to include `IVF_HNSW_FLAT`. A matching Node.js / TypeScript binding is in a follow-up PR. 
Closes #3331 ## Test plan - [ ] `cargo check --quiet --features remote --tests --examples` - [ ] `cargo test --quiet --features remote -p lancedb` (covers the new `test_create_index_ivf_hnsw_flat` and the two new parametrised `RemoteTable::create_index` cases) - [ ] `cargo fmt --all` / `cargo clippy --quiet --features remote --tests --examples` - [ ] `cd python && make develop && make check && make test` (covers the two new async tests, the alias test, the dispatcher test, and the updated `test_index_statistics_index_type_lists_all_supported_values` assertion)	2026-05-11 15:08:32 -07:00
Xuanwo	9b21c136c6	feat(python): support model-backed native FTS tokenizers (#3289 ) This wires Lance's existing `jieba/` and `lindera/` native FTS tokenizers through the Python SDK instead of leaving them behind disabled features and narrow public typing. It also documents the `LANCE_LANGUAGE_MODEL_HOME` model layout and adds Python coverage for successful CJK indexing plus missing-model error guidance. Closes #2168.	2026-05-08 23:53:14 +08:00
Heng Ge	694aa48e19	fix(database): drop spurious trailing `?` from listing-database URIs (#3357 ) ## Summary `url::Url::query_pairs_mut()` leaves the URL with `query=Some("")` after `.clear()` even when the input had no query string. The listing-database connect path then captured that empty query into `ListingDatabase::query_string`, and `table_uri()` blindly appended `?<query>` to every per-table URI — producing URIs like `s3://bucket/prefix/foo.lance?`. The trailing `?` is benign for normal table operations, but it breaks any caller that constructs a sub-path from the table URI. In particular, MemWAL flushes write to `<table_uri>/_mem_wal/<shard>/<rand>_gen_<n>`, which `url::Url::parse` then re-parses as `path=<base table>` + `query=/_mem_wal/...`. `Dataset::write` resolves the base table dataset, finds it already exists, and fails with `Dataset already exists: …_gen_1` on the very first MemTable flush (observed deterministically against S3 across all merge_insert LSM modes; tracked in [lance-format/lance#6713](https://github.com/lance-format/lance/pull/6715)). ## Fix Treat `Some("")` query the same as no query when capturing `query_string`. A real `?foo=bar` query is still propagated unchanged. Adds a regression test covering both the empty-query and non-empty-query paths. ## Verification - `url::Url::parse("s3://bucket/prefix/").query()` → `None`, but after `query_pairs_mut().clear()` → `Some("")`. Confirmed in a standalone repro. - Without this fix, every `table_uri()` for an `s3://`-style connection ends with `?`, breaking MemWAL and any future sub-path consumer in the same way. - New unit test `test_table_uri_url_path_has_no_trailing_question_mark` exercises both code paths.	2026-05-07 23:29:29 -07:00
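The failure mode and the fix in the commit above can be reproduced with Python's standard URL parser. This is a sketch of the idea only: `table_uri` here is a hypothetical helper, not the Rust `ListingDatabase::table_uri`, and the sub-path is shortened for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def table_uri(base: str, name: str, query: Optional[str]) -> str:
    """Build a per-table URI, appending a query only when it is non-empty."""
    uri = f"{base.rstrip('/')}/{name}.lance"
    # Treat an empty query ("") the same as no query (None).
    if query:
        uri += f"?{query}"
    return uri


fixed = table_uri("s3://bucket/prefix/", "foo", "")
with_query = table_uri("s3://bucket/prefix/", "foo", "foo=bar")

# Why the stray "?" matters: a sub-path appended after it parses as a query,
# so the path still points at the base table.
broken = urlsplit("s3://bucket/prefix/foo.lance?" + "/_mem_wal/shard/file")
```

In the broken case the parsed path is still the base table, which matches the "Dataset already exists" failure described in the commit message.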
LanceDB Robot	47a34f5cca	chore: update lance dependency to v7.0.0-beta.4 (#3348 ) ## Summary - Update Lance Rust dependencies to `v7.0.0-beta.4` using `ci/set_lance_version.py`. - Update the Java `lance-core` dependency property to `7.0.0-beta.4`. - Align LanceDB with dependency updates required by Lance 7, including `object_store` 0.13 API compatibility. Triggering tag: https://github.com/lance-format/lance/releases/tag/v7.0.0-beta.4 ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all`	2026-05-05 18:36:39 -07:00
Lance Release	c091243d5b	Bump version: 0.28.0-beta.10 → 0.28.0-beta.11	2026-04-29 17:53:49 +00:00
LanceDB Robot	4a5341edb1	chore: update lance dependency to v6.0.0-beta.7 (#3334 ) ## Summary - Update Lance Rust dependencies to `6.0.0-beta.7` using `ci/set_lance_version.py`. - Update Java `lance-core.version` to `6.0.0-beta.7`. - Align Arrow/DataFusion/PyO3 dependency versions and apply required compatibility fixes for the Lance upgrade. Triggering tag: [v6.0.0-beta.7](https://github.com/lance-format/lance/releases/tag/v6.0.0-beta.7) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all`	2026-04-29 10:52:25 -07:00
Jack Ye	25dfe2cfd4	feat: add manifest-enabled directory namespace mode (#3332 ) Adds manifest_enabled for local/native connections so directory namespace manifests can be the source of truth, including migration from directory listing and Azure credential vending feature wiring. Also exposes the option through Rust, Python, and Node bindings with focused validation.	2026-04-29 09:22:06 -07:00
Lance Release	4dcd7f4314	Bump version: 0.28.0-beta.9 → 0.28.0-beta.10	2026-04-28 13:29:26 +00:00
Jack Ye	a92ae0ded5	fix: enable hostname verification by default (#3304 ) ## Summary - make `TlsConfig::default()` enable hostname verification by default - align the Rust default with the documented Python and Node behavior - update the Rust unit test to lock in the safe default	2026-04-21 08:39:03 -07:00
Lance Release	75b0a8e0a3	Bump version: 0.28.0-beta.8 → 0.28.0-beta.9	2026-04-19 20:39:29 +00:00
Jack Ye	2a1df8edcf	fix(rust): materialize declared namespace tables on create (#3288 ) ## Summary - handle `declare_table` already-exists conflicts in the Rust namespace database create path - reuse declared-but-not-materialized table metadata instead of failing create mode - preserve overwrite behavior while allowing declared Geneva system tables to be materialized	2026-04-19 13:25:53 -07:00
Lance Release	be48ada352	Bump version: 0.28.0-beta.7 → 0.28.0-beta.8	2026-04-19 04:19:10 +00:00
Jack Ye	f909df3e87	fix(python): use namespace-backed rust connection for namespace tables (#3286 ) So far, I have been using a hacky approach to create and open namespace-backed tables: get the table's location, then use a temporary lancedb connection to create or open it. This worked for features like credentials vending but no longer fully works for the managed versioning feature; Geneva tests have recently been failing intermittently, and various patches have not addressed the root cause. This PR fixes the root cause and implements a proper Rust binding for it. Specifically: - build a real Rust namespace-backed connection from the Python namespace client - route namespace table create/open through that connection instead of resolved-location temp connections - keep namespace client naming consistent in the Rust bridge and preserve federated namespace + DuckDB behavior	2026-04-18 21:17:52 -07:00
Lance Release	d715bbb588	Bump version: 0.28.0-beta.6 → 0.28.0-beta.7	2026-04-17 08:12:27 +00:00
Lance Release	11af763fcd	Bump version: 0.28.0-beta.5 → 0.28.0-beta.6	2026-04-16 18:57:28 +00:00
Xuanwo	b7c0b5987c	chore: upgrade lance to 6.0.0-beta.1 (#3281 )	2026-04-17 02:51:58 +08:00
Jack Ye	97a4b38f19	feat(rust): support nested namespace ops in listing db (#3279 ) ## Summary - delegate child-namespace `ListingDatabase` operations through an eagerly initialized `LanceNamespaceDatabase` - support nested namespace create/open/list/drop flows without requiring callers to inject explicit locations - add `namespace_client_properties` plumbing for local and namespace connections so directory namespace settings like `table_version_tracking_enabled` can be configured - add regression tests for nested namespace ops and namespace client property propagation	2026-04-16 10:12:28 -07:00
Gezi-lzq	10879d99b8	docs: fix broken documentation links (#3278 )	2026-04-15 20:56:59 +08:00
Lance Release	4e6a1d5dce	Bump version: 0.28.0-beta.4 → 0.28.0-beta.5	2026-04-12 23:51:14 +00:00
Lance Release	c6ae0de3ee	Bump version: 0.28.0-beta.3 → 0.28.0-beta.4	2026-04-12 03:57:58 +00:00
Lance Release	359710a0bf	Bump version: 0.28.0-beta.2 → 0.28.0-beta.3	2026-04-11 22:44:52 +00:00
Lance Release	df354abae4	Bump version: 0.28.0-beta.1 → 0.28.0-beta.2	2026-04-11 07:06:00 +00:00
Will Jones	2807ad6854	chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257 ) Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI failures caused by the AWS SDK's MSRV requirement. No lint fixes were needed. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 07:57:47 -07:00
Jack Ye	a898dc81c2	feat: add user_id field to ClientConfig for user identification (#3240 ) ## Summary - Add a `user_id` field to `ClientConfig` that allows users to identify themselves to LanceDB Cloud/Enterprise - The user_id is sent as the `x-lancedb-user-id` HTTP header in all requests - Supports three configuration methods: - Direct assignment via `ClientConfig.user_id` - Environment variable `LANCEDB_USER_ID` - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY` Closes #3230 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-06 11:20:10 -07:00
Lance Release	de3f8097e7	Bump version: 0.28.0-beta.0 → 0.28.0-beta.1	2026-04-05 02:51:18 +00:00

733 Commits