733 Commits

Author SHA1 Message Date
Xuanwo
ac699d7ecf chore: bump lance to 7.2.0-beta.3 (#3471)
This updates the workspace Lance dependencies from `v7.1.0-beta.4` to
`v7.2.0-beta.3` and refreshes `Cargo.lock`.

The lockfile now points at Lance commit
`7c070f760fa8e24c8015cb2afbd22c5e6b7898e8` and includes the transitive
dependency updates required by the new beta.
2026-06-01 20:40:07 +08:00
Xuanwo
5638907fa5 chore: update Lance to v7.2.0-beta.1 (#3461)
Update the Rust workspace Lance git dependencies and Java lance-core
dependency to v7.2.0-beta.1.

This keeps LanceDB aligned with the latest Lance beta release and
refreshes the Cargo lockfile for the new Lance dependency graph.
2026-05-30 00:18:22 +08:00
Heng Ge
048f52c2aa feat(table): route merge_insert through the MemWAL LSM write path (#3354)
## Summary

When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert
calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append)
instead of the standard merge path.

- **`use_lsm_write`**: a `merge_insert` builder option, default `true`; set it
  to `false` to use the standard path for a call even when a spec is set.
- **`assume_pre_sharded`**: a `merge_insert` builder option, default `false`;
  skips the per-row shard check and routes by the first row only.
- **`close_lsm_writers`**: drains and closes the table's cached MemWAL shard
  writers.
- The `merge_insert` **`on`** columns default to, and are validated against,
  the table's unenforced primary key.
- Shard writers are cached alongside the dataset (in
  `DatasetConsistencyWrapper`) and reused for the session.
- `MergeResult` gains **`num_rows`**: on the LSM path the insert/update
  breakdown is unknown until compaction, so only the total is reported.

Routing covers all three sharding strategies: bucket (murmur3,
Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets
a single shard; the whole input is collected and validated before a single
atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched.
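
The collect-then-validate behaviour can be modelled in plain Python. This is a minimal sketch, not the LanceDB implementation: `bucket_of`, `FakeShardWriter`, and `route_batch` are hypothetical names, and `zlib.crc32` is only a stand-in for the murmur3 hash that the bucket strategy is described as using.

```python
import zlib


def bucket_of(key: str, num_buckets: int) -> int:
    # Stand-in hash (assumption): the real bucket strategy uses murmur3.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


class FakeShardWriter:
    """Hypothetical stand-in for a MemWAL shard writer."""

    def __init__(self):
        self.rows = []

    def put(self, rows):
        # A single call appends the whole, already validated batch.
        self.rows.extend(rows)


def route_batch(rows, key, num_buckets, writers, assume_pre_sharded=False):
    """Validate the whole batch first, then issue exactly one put."""
    if not rows:
        return 0
    # The target shard is taken from the first row.
    target = bucket_of(rows[0][key], num_buckets)
    if not assume_pre_sharded:
        for row in rows:
            if bucket_of(row[key], num_buckets) != target:
                # Raised before any write, so no writer is touched.
                raise ValueError("batch spans more than one shard")
    writers.setdefault(target, FakeShardWriter()).put(list(rows))
    return len(rows)
```

Because validation happens before the writer is even looked up, a rejected batch leaves `writers` unchanged, which mirrors the "validation failure leaves the MemWAL untouched" guarantee.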

Bindings: Python (`merge_insert(...).use_lsm_write(...)` /
`.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript
(`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`,
`Table.closeLsmWriters`).

## Context

Reconstructed from the original #3354 branch onto current `main`: the branch
predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and
has been rebuilt on that merged foundation. Depends on Lance `v7.0.0-beta.13`.

The MemWAL read path (reading un-flushed shard data back into queries) and
remote (LanceDB Cloud) LSM support are follow-ups.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-05-29 08:48:11 -07:00
Will Jones
458dcabbd2 chore: upgrade Rust toolchain to 1.95.0 (#3390)
Bumps the pinned toolchain in `rust-toolchain.toml` from 1.94.0 to
1.95.0.

Fixes new lints surfaced by clippy on 1.95.0:

- `manual_checked_ops` — fragment size mean in `table.rs` uses
`checked_div`
- `explicit_counter_loop` — shuffle test loop in `shuffle.rs`

No rustc warnings were introduced.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 08:21:45 -07:00
Will Jones
ab982d7f65 perf: migrate list_indices to use Lance's describe_indices (#3108)
This needs https://github.com/lance-format/lance/pull/6099 to work.

Closes #3140

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 16:41:05 -07:00
Will Jones
a9f49c8150 fix: allow appending arrow.json data into lance.json tables (#3429)
When a table is created with `pa.json_()` (PyArrow's JSON extension
type),
it is stored internally as `lance.json` (LargeBinary with `lance.json`
extension metadata). Calling `table.add()` with `pa.json_()` data failed
with:

```
RuntimeError: lance error: Append with different schema:
  `data` should have type json but type was large_binary
```

`build_field_exprs` in `rust/lancedb/src/table/datafusion/cast.rs` saw
that
the input field (`Utf8` with `arrow.json` metadata) differed from the
table
field (`LargeBinary` with `lance.json` metadata). Since
`can_cast_types(Utf8, LargeBinary)` is true, it inserted a DataFusion
`Utf8 → LargeBinary` cast. That cast preserved the input field's
`arrow.json`
extension metadata instead of adopting the table's `lance.json`
metadata, so
lance-core detected a schema mismatch and rejected the append.

This adds a special case in `build_field_exprs`: when the input is
`arrow.json` and the table field is `lance.json`, the expression is
passed
through unchanged. Lance-core's write path already handles the
`arrow.json → lance.json` conversion (including JSONB encoding), so no
DataFusion cast is needed.
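
The decision can be illustrated with a small Python model. This is a simplified sketch, not the code in `cast.rs`: fields are plain dicts, and `needs_cast` is a hypothetical helper name.

```python
def needs_cast(input_field: dict, table_field: dict) -> bool:
    """Return True when a cast expression should be inserted for this field."""
    in_ext = input_field.get("extension")
    table_ext = table_field.get("extension")
    # Special case from the fix: arrow.json input going into a lance.json
    # column is passed through unchanged; the write path converts it later.
    if in_ext == "arrow.json" and table_ext == "lance.json":
        return False
    # Otherwise cast only when the storage types differ.
    return input_field["type"] != table_field["type"]
```

With this rule, a `Utf8`/`arrow.json` input against a `LargeBinary`/`lance.json` column yields no cast, while an ordinary `Utf8` input against a `LargeBinary` column still does.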

Fixes #3144

Continues #3291 from a fork (the original author's branch could not be
pushed to). The original commits are preserved; an additional commit
fixes
the CI failures on that PR — formatting, a missing trait import, and
read-back assertions that assumed binary storage when a lance.json
column
is read back as `Utf8`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: yunju.lly <yunju.lly@antgroup.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 19:24:28 -07:00
Jack Ye
a7d9f2e99d fix: remove primary key constraint from MemWAL bucket sharding (#3435)
## Summary

- Bump lance dependency from `v7.0.0-beta.13` to `v7.0.0-rc.1`
- Remove PK constraint from `LsmWriteSpec::Bucket` docs and
`Table::set_lsm_write_spec` docs
- Remove test assertions that expected rejection when no PK is set or
when bucket column != PK

Closes https://github.com/lance-format/lance/issues/6917
2026-05-26 17:35:28 -07:00
Brendan Clement
15e75804c4 feat(remote): send read freshness headers for remote table consistency (#3439)
Closes the client-side work of #3370.

### Summary
- Plumbs `read_consistency_interval` from `ConnectBuilder` through
`RestfulLanceDbClient` so remote reads attach an
`x-lancedb-min-timestamp` freshness header. None = no header (default),
zero = "now", positive = `now - interval`.
- Adds per-table `FreshnessState` on `RemoteTable`: write responses
(`update`, `delete`, `merge_insert`, `add_columns`, `alter_columns`,
`drop_columns`) track the committed version, and the next read sends
`x-lancedb-min-version` so the server's cache honors read-your-write.
- `checkout(v)` / `checkout_tag(t)` / `checkout_latest()` / `restore()`
reset the freshness state appropriately; the validating `/describe/` and
tag-resolve requests are sent without freshness headers so they don't
carry stale state.
- Updates Rust, Python, and Node docstrings and calls out that stronger
consistency raises per-read latency and cost.
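
The three interval cases in the first bullet can be sketched as follows. This is an illustration only: `min_timestamp_header` is a hypothetical helper, and the epoch-milliseconds encoding of the header value is an assumption, not something the PR text specifies.

```python
import time
from typing import Optional


def min_timestamp_header(interval_s: Optional[float],
                         now: Optional[float] = None) -> dict:
    """Build the freshness header for one read.

    None -> no header, 0 -> "now", positive -> now - interval.
    The epoch-milliseconds value format is assumed for illustration.
    """
    if interval_s is None:
        return {}
    if interval_s < 0:
        raise ValueError("interval must be non-negative")
    now = time.time() if now is None else now
    return {"x-lancedb-min-timestamp": str(int((now - interval_s) * 1000))}
```

Passing `now` explicitly keeps the function deterministic in tests; callers would normally omit it.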

### Testing
- Unit tests cover default behavior, interval=0, positive interval,
checkout_latest baseline, min_version-after-write, checkout clears
state, and the two no-stale-header invariants on `checkout(v)` and
`checkout_tag(t)`.
- Ran smoke tests against a local remote table to verify functionality.
2026-05-26 13:38:07 -07:00
Yuval Lifshitz
df2b6d3dd4 feat(rust): support DataFusion Expr for table row deletions (#3415)
Changes the parameter of `delete` to a `Predicate` that can be constructed
from a DataFusion `Expr`, from a `str` (to support SQL predicates), or from a
`String` (to support the Python and JavaScript bindings). Passing a DataFusion
`Expr` avoids the overhead of serializing to SQL and re-parsing when callers
already have an `Expr` (e.g. from query planning).

The native implementation uses lance's `DeleteBuilder::from_expr`. The
remote implementation converts the Expr to SQL via `expr_to_sql_string`
before sending to the server, consistent with the existing query and
count_rows paths.

Closes #3204

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
Co-authored-by: Claude Code <noreply@anthropic.com>
2026-05-26 11:49:54 -07:00
Will Jones
da2a1c4a2c test(rust): fix flaky env-var-dependent client tests (#3426)
The `test_resolve_user_id_*` tests in `remote/client.rs` mutate the
process-global `LANCEDB_USER_ID` and `LANCEDB_USER_ID_ENV_KEY`
environment variables. Cargo runs the tests in a binary across multiple
threads, so one test's `remove_var` can land between another test's
`set_var` and the moment `resolve_user_id()` reads the variable.

This surfaced as an intermittent failure of
`test_resolve_user_id_from_env_key` on Windows CI:

```
assertion `left == right` failed
  left: None
 right: Some("custom-env-user-id")
```

Annotates the five env-mutating tests with `serial_test`'s
`#[serial(user_id_env)]` so they run serially with respect to each
other.
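
The same idea can be shown in Python as an analogy. This sketch is not the `serial_test` crate: `serial` is a hypothetical decorator built on per-key locks, and `DEMO_USER_ID` is a made-up variable name.

```python
import functools
import os
import threading
from collections import defaultdict

_LOCKS = defaultdict(threading.Lock)
_LOCKS_GUARD = threading.Lock()


def serial(key):
    """Run every function decorated with the same key one at a time."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            with _LOCKS_GUARD:      # protect the lock registry itself
                lock = _LOCKS[key]
            with lock:              # serialize callers sharing this key
                return fn(*args, **kwargs)
        return wrapper
    return decorate


@serial("user_id_env")
def set_then_read(value):
    # Without the shared lock, another thread could remove the variable
    # between the write and the read below.
    os.environ["DEMO_USER_ID"] = value
    try:
        return os.environ.get("DEMO_USER_ID")
    finally:
        os.environ.pop("DEMO_USER_ID", None)
```

Only functions that share the key wait on each other, so unrelated tests keep running in parallel.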

Should be backported to `release/v0.28` (CI for #3421 hit this same
flake).

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:35:15 -07:00
Lance Release
7168d64af1 Bump version: 0.30.0-beta.0 → 0.30.0-beta.1 2026-05-22 10:09:01 +00:00
Xuanwo
a0001043b6 fix: canonicalize remote nested field paths (#3430)
Fixes #3407.

Remote tables now resolve create-index field paths against the table
schema before sending requests, so nested, escaped, and case-insensitive
inputs use the same canonical path contract as local tables. Remote
`list_indices()` also canonicalizes returned columns against the current
schema, and the remote query tests lock explicit nested vector and FTS
request payloads.
2026-05-22 15:23:00 +08:00
Lance Release
1bb7acb74f Bump version: 0.29.1-beta.0 → 0.30.0-beta.0 2026-05-21 21:36:18 +00:00
Xuanwo
d5dc4c0f06 fix: discover nested vector columns by default (#3423)
LanceDB default vector column discovery only considered top-level
fields, so tables with a single nested vector leaf still required users
to pass an explicit field path. This updates Rust and Python discovery
to recurse into struct fields, return canonical field paths, and
preserve actionable errors when no default or multiple defaults exist.

The explicit nested path flow for index creation and search remains
supported across Rust, Python, and Node, with regression coverage for
single nested vector leaves, multiple candidate leaves, and schemas
without vector leaves.
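
A minimal Python sketch of the recursive discovery, under simplifying assumptions: the schema is a nested dict in which a dict value is a struct and the string `"vector"` marks a vector leaf. `vector_paths` and `default_vector_column` are hypothetical names, and the escaping of field names that contain literal dots is not modelled.

```python
def vector_paths(fields, prefix=""):
    """Collect dotted paths of every vector leaf, recursing into structs."""
    paths = []
    for name, spec in fields.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(spec, dict):      # struct field: recurse
            paths.extend(vector_paths(spec, path))
        elif spec == "vector":          # vector leaf
            paths.append(path)
    return paths


def default_vector_column(schema):
    """Return the single vector path, or raise an actionable error."""
    paths = vector_paths(schema)
    if not paths:
        raise ValueError("no vector column found; pass one explicitly")
    if len(paths) > 1:
        raise ValueError(f"multiple vector columns {paths}; pass one explicitly")
    return paths[0]
```

For example, `default_vector_column({"id": "int64", "doc": {"embedding": "vector"}})` resolves to `"doc.embedding"`, while a schema with two vector leaves raises and lists both candidates.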

Closes #3405.
2026-05-21 19:02:41 +08:00
Xuanwo
2eba7ebd02 fix: return canonical nested index paths (#3413)
Index metadata APIs now resolve stored field ids back to Lance canonical
field paths instead of leaf names, so nested indexes such as
`metadata.user_id` and escaped literal-dot fields round-trip through
`list_indices()`. Native index creation also canonicalizes the input
path before handing it to Lance, keeping local metadata consistent with
the field-path contract while remote responses continue to expose
server-provided canonical columns.

Fixes #3403.
2026-05-21 00:20:47 +08:00
Xuanwo
5bfde47a8e fix: support nested field paths in native index creation (#3408)
Native index creation was resolving requested columns through top-level
Arrow schema lookup before handing the request to Lance, which rejected
nested paths and could collapse a nested field to its leaf name. This PR
resolves index targets with Lance field-path semantics, passes the
canonical path through to Lance, and reports indexed columns from field
ids as canonical full paths.

This also removes the Python native FTS guard that rejected dotted paths
so scalar, vector, and FTS index creation share the same nested-field
contract. Related to #3402.
2026-05-20 11:15:15 +08:00
Weston Pace
01e272c0b0 fix(rust): match embedding scannable columns by name (#3410)
Fixes #3136.

## Summary

- `WithEmbeddingsScannable::scan_as_stream` matched columns positionally
  against the table schema, so a `CastError` was raised whenever the
  computed batch order differed from the table schema order.
- The mismatch surfaced when `add_columns` added a new physical column
  **after** an embedding column: the table schema became
  `[..., embedding, extra]`, but `compute_embeddings_for_batch` always
  appends embeddings at the end, producing `[..., extra, embedding]`.
  Position 2 then tried to cast e.g. `score: Float64` →
  `embedding: FixedSizeList` and failed.
- Now we look each output column up by name in the result batch, which
  is order-independent. If a non-embedding column required by the table
  schema is missing from the input, we return a clear `InvalidInput`
  error instead of a confusing cast error.
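
The name-based lookup can be sketched in a few lines of Python. This is a simplified model, not the Rust code: a batch is a dict from column name to values, and `align_to_schema` is a hypothetical helper.

```python
def align_to_schema(batch: dict, schema_names: list) -> dict:
    """Reorder columns to the table schema order by name, not by position."""
    missing = [name for name in schema_names if name not in batch]
    if missing:
        # A clear input error instead of a confusing cast failure.
        raise ValueError(f"input is missing required columns: {missing}")
    return {name: batch[name] for name in schema_names}
```

A batch ordered `[id, text, score, text_vec]` aligns cleanly to a schema ordered `[id, text, text_vec, score]`, which is the case that failed under positional matching.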

## Reproduction (from the issue)

```text
Table created with:           [id, text, text_vec(embedding)]
add_columns("score")        → schema: [id, text, text_vec, score]
table.add([id, text, score]) → BEFORE: CastError on position 2
                               AFTER:  succeeds, embedding is computed
```

## Tests

- `data::scannable.rs::test_with_embeddings_scannable_column_added_after_embedding`:
  unit test exercising the exact column-order mismatch via
  `WithEmbeddingsScannable::with_schema`.
- `data::scannable.rs::test_with_embeddings_scannable_missing_required_column`:
  covers the new "missing column" error path.
- `table::add_data.rs::test_add_with_embeddings_after_add_columns`:
  end-to-end regression test mirroring the reproduction in the issue
  (create table with embedding → `add_columns` → `table.add`).

## Test plan

- [x] `cargo check --quiet --features remote --tests --examples`
- [x] `cargo clippy --quiet --features remote --tests --examples`
- [x] `cargo fmt --all`
- [x] `cargo test --quiet --features remote -p lancedb embedding` — 18
tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 15:08:12 -07:00
Lance Release
53c2164b84 Bump version: 0.29.0 → 0.29.1-beta.0 2026-05-18 22:07:52 +00:00
Drew Gallardo
aac6c62459 feat(python): add public take_offsets method on Permutation (#3375)
Closes #3243.

This PR exposes a new public API, `Permutation.take_offsets(offsets:
list[int])`. Previously, users had to call `__getitems__` directly to
batch-fetch rows by position.

The name matches the existing `Table.take_offsets` pattern, and the dunder
methods `__getitem__` and `__getitems__` now delegate to it.

It also fixes a parse error when `PermutationReader::take_offsets` gets an
empty list: the call now returns an empty `RecordBatch` with the correct
schema. The fix is bundled here because without it the new public API fails
on a perfectly reasonable input.

`__getitems__` is preserved since PyTorch's batched DataLoader requires
it.
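
The delegation pattern can be sketched with a toy class. This is not the LanceDB `Permutation`: rows are plain dicts, a batch is a dict of column lists, and the constructor arguments are assumptions made for the example.

```python
class Permutation:
    """Toy model: rows is a list of dicts, order is the permuted row index."""

    def __init__(self, rows, order, columns):
        self._rows = rows
        self._order = order
        self._columns = columns

    def take_offsets(self, offsets):
        # Empty input yields an empty batch that still carries every column.
        if not offsets:
            return {name: [] for name in self._columns}
        picked = [self._rows[self._order[i]] for i in offsets]
        return {name: [row[name] for row in picked] for name in self._columns}

    def __getitems__(self, offsets):
        # Kept because PyTorch's batched DataLoader calls this hook.
        return self.take_offsets(list(offsets))

    def __getitem__(self, offset):
        batch = self.take_offsets([offset])
        return {name: values[0] for name, values in batch.items()}
```

Both dunder methods are thin wrappers, so any behaviour fix in `take_offsets` (such as the empty-list case) applies to all three entry points.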

### Testing

- Added 3 new Rust tests for empty offsets including permutation table
with Select::All, Select::Columns, and identity path
- Added 3 new Python tests for the public API including a happy case,
and empty input on both identity and permutation

clippy, format, check all clean!

cc: @westonpace
2026-05-18 09:35:56 -07:00
Weston Pace
8df2fff75f ci: bump version after 0.29 release (#3378)
The 0.29 release happened on a branch because the main line had already
moved past the 6.0.0 stable lance release. As a result the version bump
commits ended up on the branch. This merges those commits back into
main.

---------

Co-authored-by: Lance Release <lance-dev@lancedb.com>
2026-05-18 05:34:33 -07:00
Heng Ge
0d30b31998 feat: support setting LSM write spec for a table (#3396)
## Summary

Split out from #3354

Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec`
to install and clear the spec that selects Lance's MemWAL LSM-style write path
for `merge_insert`.

`LsmWriteSpec` offers three sharding strategies, all built on Lance's
`InitializeMemWalBuilder`:

- `LsmWriteSpec::bucket(column, num_buckets)` — hash-bucket sharding by
the
  single-column unenforced primary key.
- `LsmWriteSpec::identity(column)` — identity sharding by the raw value
of a
  scalar column.
- `LsmWriteSpec::unsharded()` — a single MemWAL shard.

Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL
keeps up to date as rows are appended) and `with_writer_config_defaults(...)`
(default `ShardWriter` configuration recorded in the MemWAL index, so every
writer starts from the same defaults). All variants require the table to have
an unenforced primary key.

- `set_lsm_write_spec` installs the spec by initializing the MemWAL
index;
`unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting
to
  the standard `merge_insert` path. `unset` is idempotent.
- Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`,
  `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript
  (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` /
  `"unsharded"`). `RemoteTable` returns `NotSupported`.

The actual `merge_insert` LSM dispatch and `ShardWriter` write path are
a
follow-up — this PR only installs and clears the spec.
2026-05-18 00:11:33 -07:00
Heng Ge
6a431ff0a0 feat: support setting unenforced primary key (#3394)
## Summary

Adds `Table::set_unenforced_primary_key` — records a single column as
the
table's unenforced primary key in Lance schema field metadata.
"Unenforced"
means LanceDB does not check uniqueness on write; the key is metadata
that
`merge_insert` consumes.

- Single-column only; the column must exist and have a supported dtype
(Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary).
The
API accepts an iterable for binding ergonomics but requires exactly one
  column — compound keys are rejected.
- The primary key is immutable: calling this on a table that already has
an
unenforced primary key is rejected. Concurrent writers racing to set the
key
  fail at commit time rather than silently overriding it.
- `RemoteTable` returns `NotSupported`.
- Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and
TypeScript
  (`Table.setUnenforcedPrimaryKey`).
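
The validation rules listed above can be modelled in Python. This is a sketch under stated assumptions, not the LanceDB implementation: `ToyTable` is hypothetical, dtypes are plain strings, and the concurrent-writer check at commit time is not modelled.

```python
SUPPORTED_DTYPES = {
    "Int32", "Int64", "Utf8", "LargeUtf8",
    "Binary", "LargeBinary", "FixedSizeBinary",
}


class ToyTable:
    """Hypothetical table holding a column-name -> dtype mapping."""

    def __init__(self, schema):
        self.schema = dict(schema)
        self.primary_key = None

    def set_unenforced_primary_key(self, columns):
        columns = list(columns)  # an iterable is accepted for ergonomics
        if len(columns) != 1:
            raise ValueError("exactly one primary key column is required")
        if self.primary_key is not None:
            raise ValueError("primary key is already set and is immutable")
        column = columns[0]
        if column not in self.schema:
            raise ValueError(f"column {column!r} does not exist")
        if self.schema[column] not in SUPPORTED_DTYPES:
            raise ValueError(f"unsupported dtype {self.schema[column]!r}")
        # Only metadata is recorded; uniqueness is never checked on write.
        self.primary_key = column
```

Note that a bare string is itself an iterable of characters, so callers of this sketch should pass a list such as `["id"]`.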

## Context

Split out from #3354 per review feedback, so the unenforced primary key and
the `merge_insert` sharding spec land as separate reviewable PRs.

No Lance dependency bump is needed: `main` is already on v7.0.0-beta.10, which
includes the field-metadata round-trip fix the API relies on. Enforcing
primary-key immutability at the Lance commit layer (so the cross-column
concurrent race is also rejected) is a companion Lance change:
lance-format/lance#6810.
2026-05-16 23:12:55 -07:00
Xin Sun
ab2c5adf5e feat(nodejs): add order_by method to Query (#3123) 2026-05-16 22:49:08 -07:00
Shengan Zhang
64aeee84a8 feat(python): support bytes in lit() expressions (#3387)
Closes #3261.

## Summary

Adds `bytes` to the accepted types of `lancedb.expr.lit()` so that
binary scalars can be used in filter / projection expressions. The
previous attempt in #3235 had to be reverted because DataFusion's SQL
unparser does not support `Binary` / `LargeBinary` scalars, so any
expression containing such a literal would fail in both `to_sql()` and
`__repr__`.

## How

`expr_to_sql_string` now has two paths:

- **Fast path** (no binary literals): delegate to DataFusion's unparser
unchanged.
- **Slow path**: rewrite each `Binary(Some(bytes))` literal in the tree
to a unique string-literal placeholder, run the unparser, then
substitute `'<placeholder>'` with `X'<HEX>'` in the resulting SQL.
`Binary(None)` / `LargeBinary(None)` are rewritten to
`ScalarValue::Null` so the unparser emits plain `NULL`.

This keeps DataFusion as the single source of truth for operator and
function serialization, so binary literals work in every expression node
type the unparser already supports — including nested cases like
`contains(col("data"), lit(b"\xff"))`, `NOT (col == lit(b"..."))`, and
`col.cast(...) == lit(b"...")`.
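
The placeholder technique can be demonstrated end to end with a toy expression tree. This is a minimal sketch, not the code in `sql.rs`: expressions are tuples, `unparse` is a stand-in for DataFusion's unparser that deliberately rejects bytes, and `expr_to_sql` is a hypothetical name.

```python
import uuid


def unparse(expr):
    """Toy unparser (stand-in for DataFusion's): rejects bytes literals."""
    kind = expr[0]
    if kind == "col":
        return expr[1]
    if kind == "lit":
        value = expr[1]
        if value is None:
            return "NULL"
        if isinstance(value, bytes):
            raise TypeError("binary literals are not supported")
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)
    if kind == "eq":
        return f"({unparse(expr[1])} = {unparse(expr[2])})"
    if kind == "not":
        return f"NOT {unparse(expr[1])}"
    raise ValueError(f"unknown node {kind!r}")


def expr_to_sql(expr):
    placeholders = {}

    def rewrite(node):
        # Swap each bytes literal for a unique string-literal placeholder.
        if node[0] == "lit" and isinstance(node[1], bytes):
            token = f"__bin_{uuid.uuid4().hex}__"
            placeholders[token] = node[1]
            return ("lit", token)
        if node[0] in ("col", "lit"):
            return node
        return (node[0],) + tuple(rewrite(child) for child in node[1:])

    sql = unparse(rewrite(expr))
    # Replace each quoted placeholder with a hex literal.
    for token, raw in placeholders.items():
        sql = sql.replace(f"'{token}'", f"X'{raw.hex().upper()}'")
    return sql
```

Because the unparser only ever sees string literals, every node type it already handles works with binary values without extra per-node code.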

## Changes

- `rust/lancedb/src/expr/sql.rs`: placeholder-substitution
implementation.
- `rust/lancedb/src/expr.rs`: 4 new unit tests covering binary literals
in equality, compound predicates, scalar function calls, negation, and
`NULL` binary literals.
- `python/src/expr.rs`: `expr_lit` accepts `PyBytes` and produces
`ScalarValue::Binary`.
- `python/Cargo.toml` + `Cargo.lock`: pull in `datafusion-common` for
`ScalarValue`.
- `python/python/lancedb/expr.py`: extend `ExprLike` and `lit()` type
annotations / docstrings with `bytes`.
- `python/python/lancedb/_lancedb.pyi`: update `expr_lit` stub.
- `python/tests/test_expr.py`: unit tests for `to_sql` / `repr` of
binary literals and an integration test against a real `pa.binary()`
column for equality / inequality / compound filters.

## Example

```python
from lancedb.expr import col, lit, func

# Equality against a binary column
col("payload") == lit(b"\xca\xfe")
# Expr((payload = X'CAFE'))

# Nested inside a function call (previously failed)
func("contains", col("data"), lit(b"\xff"))
# Expr(contains(data, X'FF'))

# repr() no longer crashes
repr(lit(b"\xde\xad\xbe\xef"))
# "Expr(X'DEADBEEF')"
```

## Verification

- [x] `cargo test -p lancedb --lib expr::` — 12/12 pass (was 9; +3 new
tests)
- [x] `cargo check --features remote --tests --examples` — clean
- [x] `cargo clippy --features remote --tests --examples` — no warnings
- [x] `cargo fmt --all -- --check` — clean
- [x] `pytest python/tests/test_expr.py` — 76/76 pass (was 74; +2 new
tests)
- [x] `ruff check python` / `ruff format --check python` — clean

## Follow-ups (not in this PR)

Issue #3261 also raises the possibility of a *truncated* `__repr__` for
very large binary literals. This PR keeps `__repr__` exact (it forwards
to `to_sql()`), since truncating display output would diverge from the
SQL that actually gets executed. A display-only truncation could be
added in a follow-up by giving `__repr__` its own renderer.

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 15:24:52 -07:00
Justin Miller
5b45e44ce3 fix(rust): map lance-namespace errors to TableNotFound / TableAlreadyExists (#3385)
## Summary

`LanceNamespaceDatabase::open_table` and `create_table` were squashing
`NamespaceError::TableNotFound` and `TableAlreadyExists` into generic
`Error::Runtime`, so callers couldn't distinguish a missing-table or
duplicate-table error from any other internal failure. Downstream this
surfaced to geneva-style code as HTTP 500 / "internal server error" on
operations that should have been 400/404 — see
[ENT-1235](https://linear.app/lancedb/issue/ENT-1235/fix-ns-errors-for-create-tableopen-table).

This PR walks the boxed-error chain from `lance::Error::Namespace` down
to the inner `NamespaceError` and maps its `ErrorCode` onto the proper
`lancedb::Error` variant:

- `NamespaceError::TableNotFound` → `Error::TableNotFound { name, source
}`
- `NamespaceError::TableAlreadyExists` → `Error::TableAlreadyExists {
name }`
- everything else → `Error::Runtime` (unchanged behavior for the long
tail)

It also replaces the existing `e.to_string().contains("already exists")`
string match in `LanceNamespaceDatabase::create_table` with a downcast
on the `NamespaceError` code. That string-match happened to work for the
`dir` backend but isn't guaranteed to match the REST namespace backend's
error format; the downcast works for both.

The chain-walk is needed because `DatasetBuilder::from_namespace`
re-wraps the inner namespace error in a fresh `lance::Error::Namespace`,
so a single top-level downcast misses it.
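
The chain walk has a close Python analogue in following `__cause__` links. The sketch below is illustrative only: the exception classes and `map_namespace_error` are hypothetical stand-ins for the Rust error types, and error codes are modelled as plain strings.

```python
class NamespaceError(Exception):
    """Hypothetical inner error carrying a machine-readable code."""

    def __init__(self, code, table):
        super().__init__(f"{code}: {table}")
        self.code = code
        self.table = table


class TableNotFound(Exception):
    pass


class TableAlreadyExists(Exception):
    pass


def map_namespace_error(err):
    """Walk the __cause__ chain and map the first NamespaceError found."""
    current, seen = err, set()
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        if isinstance(current, NamespaceError):
            if current.code == "TableNotFound":
                return TableNotFound(current.table)
            if current.code == "TableAlreadyExists":
                return TableAlreadyExists(current.table)
            break  # any other code falls through to the generic error
        current = current.__cause__
    return RuntimeError(str(err))
```

A check on the outermost exception alone would miss an inner error that has been wrapped twice, which is why the loop keeps descending.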

## How this helps geneva

Geneva's workaround (linked in the parent issue) currently has to use
`except Exception:` with a `# todo: this is too broad` comment, plus
`str(e).lower().contains("already exists")` string matching, because the
namespace-impl path raised a generic `RuntimeError`. After this PR:

- `db.open_table("missing")` raises `ValueError("Table 'missing' was not
found")` (via the existing Python binding mapping of `TableNotFound` →
`PyValueError`) — geneva can catch `ValueError` cleanly.
- `db.create_table("dup")` raises `ValueError("Table 'dup' already
exists")` reliably across both `dir` and REST backends, so the existing
string match becomes deterministic.

In phalanx (the sophon REST server), `LanceDBError::TableNotFound` and
`LanceDBError::TableAlreadyExists` already map directly to HTTP 404 and
HTTP 400 respectively — see
[phalanx/src/error.rs:77-94](https://github.com/lancedb/sophon/blob/main/src/rust/phalanx/src/error.rs#L77).
No phalanx code change is needed for the bug fix; the previous 500 came
from phalanx's string-match fallback not finding `"namespace"` AND `"not
found"` in the `Runtime` error's debug-formatted message.

## Follow-up


[ENT-1246](https://linear.app/lancedb/issue/ENT-1246/remove-dead-namespace-error-string-matching-in-phalanx)
— after this lands and phalanx picks up the new lancedb, the
string-matching fallback for table errors in
`src/rust/phalanx/src/error.rs` (lines 99-168, 236-256, 502-514) and
`src/rust/phalanx/src/rest/table/create_table.rs` (lines 224-241)
becomes dead code and can be removed. The `// TODO: Refactor for better
namespace error handling` comment at phalanx/src/error.rs:96-98 is
exactly what this PR addresses on the lancedb side; ENT-1246 finishes
the cleanup on the sophon side.

## Test plan

- [x] `cargo test --quiet --features remote -p lancedb --lib` — all 495
lib tests pass, including 4 new tests in `database::namespace::tests`:
- `test_namespace_table_not_found` — extended to assert
`Error::TableNotFound` (was just `is_err()`)
- `test_namespace_open_table_not_found_at_root` — covers the
root-namespace path
- `test_namespace_create_table_already_exists` — covers child namespace
- `test_namespace_create_table_already_exists_at_root` — covers root
namespace
- [x] `cargo clippy --quiet --features remote --tests` — clean
- [x] `cargo fmt --all` — clean
- [x] Manually confirmed (via test failures before the fix) that the two
`open_table` tests were returning `Error::Runtime { message: "Failed to
get table info from namespace: Namespace { source: TableNotFound { ... }
}" }` prior to this change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:19:23 -07:00
Shengan Zhang
650f173236 feat(python): add IVF_HNSW_FLAT vector index support (#3366)
## Summary

Wire up `IVF_HNSW_FLAT` in the Rust core and Python SDK. The index was
documented at https://docs.lancedb.com/indexing/vector-index but
`lancedb.Table.create_index(index_type="IVF_HNSW_FLAT")` raised
`ValueError: Unknown index type IVF_HNSW_FLAT` — the underlying
`pylance` already accepted it, only the LanceDB wrapper was missing the
wiring.

**Rust core (`rust/lancedb`):**
- Add `Index::IvfHnswFlat` / `IndexType::IvfHnswFlat` variants and the
`IvfHnswFlatIndexBuilder` (modelled on `IvfHnswSqIndexBuilder`).
- Build Lance params via the existing `VectorIndexParams::ivf_hnsw(...)`
helper, keeping symmetry with the other `IVF_HNSW_*` variants.
- Forward the variant in `RemoteTable::create_index` and add two
parametrised tests (default + customised config) for the JSON
serialisation.
- New `NativeTable` integration test
(`test_create_index_ivf_hnsw_flat`).

**Python binding (`python/`):**
- New `HnswFlat` dataclass + backwards-compat `IvfHnswFlat` alias.
- PyO3 `extract_index_params` recognises the `HnswFlat` config.
- `LanceTable.create_index(index_type="IVF_HNSW_FLAT", …)` and the sync
`RemoteTable.create_index` both dispatch to the new config.
- `IndexStatistics.index_type` `Literal` and `_lancedb.pyi` stubs cover
the new type so `pyright`/`make check` stays clean.
- Async integration tests (`HnswFlat` + `IvfHnswFlat` alias) and a sync
dispatcher test, mirroring the existing `IVF_HNSW_SQ` coverage.
- Existing `test_index_statistics_index_type_lists_all_supported_values`
updated to include `IVF_HNSW_FLAT`.

A matching Node.js / TypeScript binding is in a follow-up PR.

Closes #3331

## Test plan

- [ ] `cargo check --quiet --features remote --tests --examples`
- [ ] `cargo test --quiet --features remote -p lancedb` (covers the
new `test_create_index_ivf_hnsw_flat` and the two new parametrised
`RemoteTable::create_index` cases)
- [ ] `cargo fmt --all` / `cargo clippy --quiet --features remote
--tests --examples`
- [ ] `cd python && make develop && make check && make test` (covers
the two new async tests, the alias test, the dispatcher test, and the
updated `test_index_statistics_index_type_lists_all_supported_values`
assertion)
2026-05-11 15:08:32 -07:00
Xuanwo
9b21c136c6 feat(python): support model-backed native FTS tokenizers (#3289)
This wires Lance's existing `jieba/*` and `lindera/*` native FTS
tokenizers through the Python SDK instead of leaving them behind
disabled features and narrow public typing. It also documents the
`LANCE_LANGUAGE_MODEL_HOME` model layout and adds Python coverage for
successful CJK indexing plus missing-model error guidance.

Closes #2168.
2026-05-08 23:53:14 +08:00
Heng Ge
694aa48e19 fix(database): drop spurious trailing ? from listing-database URIs (#3357)
## Summary

`url::Url::query_pairs_mut()` leaves the URL with `query=Some("")` after
`.clear()` even when the input had no query string. The listing-database
connect path then captured that empty query into
`ListingDatabase::query_string`, and `table_uri()` blindly appended
`?<query>` to every per-table URI — producing URIs like
`s3://bucket/prefix/foo.lance?`.

The trailing `?` is benign for normal table operations, but it breaks
any caller that constructs a sub-path from the table URI. In particular,
MemWAL flushes write to `<table_uri>/_mem_wal/<shard>/<rand>_gen_<n>`,
which `url::Url::parse` then re-parses as `path=<base table>` +
`query=/_mem_wal/...`. `Dataset::write` resolves the base table dataset,
finds it already exists, and fails with `Dataset already exists:
…_gen_1` on the very first MemTable flush (observed deterministically
against S3 across all merge_insert LSM modes; tracked in
[lance-format/lance#6713](https://github.com/lance-format/lance/pull/6715)).

## Fix

Treat `Some("")` query the same as no query when capturing
`query_string`. A real `?foo=bar` query is still propagated unchanged.

Adds a regression test covering both the empty-query and non-empty-query
paths.
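
Both the failure mode and the fix can be reproduced with the Python standard library. This is a sketch, not the Rust code: `table_uri` is a hypothetical helper, and the `query` argument models the captured `query_string`, where `None` and `""` correspond to Rust's `None` and `Some("")`.

```python
from typing import Optional
from urllib.parse import urlsplit


def table_uri(base: str, name: str, query: Optional[str]) -> str:
    """Build a per-table URI, appending '?query' only for a real query."""
    uri = f"{base.rstrip('/')}/{name}.lance"
    if query:  # None and "" both mean "no query"
        uri += f"?{query}"
    return uri


# Failure mode: a sub-path appended after a trailing '?' is parsed as the
# query, so the path still points at the base table.
broken = "s3://bucket/prefix/foo.lance?" + "/_mem_wal/shard/ab_gen_1"
parts = urlsplit(broken)
```

Here `parts.path` is the base table path and the MemWAL sub-path ends up in `parts.query`, matching the misparse described in the summary.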

## Verification

- `url::Url::parse("s3://bucket/prefix/").query()` → `None`, but after
`query_pairs_mut().clear()` → `Some("")`. Confirmed in a standalone
repro.
- Without this fix, every `table_uri()` for an `s3://`-style connection
ends with `?`, breaking MemWAL and any future sub-path consumer in the
same way.
- New unit test `test_table_uri_url_path_has_no_trailing_question_mark`
exercises both code paths.
2026-05-07 23:29:29 -07:00
LanceDB Robot
47a34f5cca chore: update lance dependency to v7.0.0-beta.4 (#3348)
## Summary
- Update Lance Rust dependencies to `v7.0.0-beta.4` using
`ci/set_lance_version.py`.
- Update the Java `lance-core` dependency property to `7.0.0-beta.4`.
- Align LanceDB with dependency updates required by Lance 7, including
`object_store` 0.13 API compatibility.

Triggering tag:
https://github.com/lance-format/lance/releases/tag/v7.0.0-beta.4

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`
2026-05-05 18:36:39 -07:00
Lance Release
c091243d5b Bump version: 0.28.0-beta.10 → 0.28.0-beta.11 2026-04-29 17:53:49 +00:00
LanceDB Robot
4a5341edb1 chore: update lance dependency to v6.0.0-beta.7 (#3334)
## Summary
- Update Lance Rust dependencies to `6.0.0-beta.7` using
`ci/set_lance_version.py`.
- Update Java `lance-core.version` to `6.0.0-beta.7`.
- Align Arrow/DataFusion/PyO3 dependency versions and apply required
compatibility fixes for the Lance upgrade.

Triggering tag:
[v6.0.0-beta.7](https://github.com/lance-format/lance/releases/tag/v6.0.0-beta.7)

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`
2026-04-29 10:52:25 -07:00
Jack Ye
25dfe2cfd4 feat: add manifest-enabled directory namespace mode (#3332)
Adds `manifest_enabled` for local/native connections so directory
namespace manifests can be the source of truth, including migration from
directory listing and Azure credential vending feature wiring. Also
exposes the option through Rust, Python, and Node bindings with focused
validation.
2026-04-29 09:22:06 -07:00
Lance Release
4dcd7f4314 Bump version: 0.28.0-beta.9 → 0.28.0-beta.10 2026-04-28 13:29:26 +00:00
Jack Ye
a92ae0ded5 fix: enable hostname verification by default (#3304)
## Summary

- make `TlsConfig::default()` enable hostname verification by default
- align the Rust default with the documented Python and Node behavior
- update the Rust unit test to lock in the safe default
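
A safe default of this kind can be sketched as follows. This is an illustration only: `TlsConfigSketch` and its `assert_hostname` field are assumed names, not the actual `TlsConfig` definition, which may have more fields.

```rust
/// Hypothetical stand-in for a TLS configuration struct.
#[derive(Debug, Clone, PartialEq)]
struct TlsConfigSketch {
    /// Assumed field name: whether the server hostname is checked
    /// against the certificate.
    assert_hostname: bool,
}

impl Default for TlsConfigSketch {
    /// Verification is on unless the caller explicitly opts out.
    fn default() -> Self {
        Self { assert_hostname: true }
    }
}
```

With this shape, disabling verification needs an explicit `assert_hostname: false` at the call site, so the insecure choice is visible in code review.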
2026-04-21 08:39:03 -07:00
Lance Release
75b0a8e0a3 Bump version: 0.28.0-beta.8 → 0.28.0-beta.9 2026-04-19 20:39:29 +00:00
Jack Ye
2a1df8edcf fix(rust): materialize declared namespace tables on create (#3288)
## Summary
- handle `declare_table` already-exists conflicts in the Rust namespace
database create path
- reuse declared-but-not-materialized table metadata instead of failing
create mode
- preserve overwrite behavior while allowing declared Geneva system
tables to be materialized
2026-04-19 13:25:53 -07:00
Lance Release
be48ada352 Bump version: 0.28.0-beta.7 → 0.28.0-beta.8 2026-04-19 04:19:10 +00:00
Jack Ye
f909df3e87 fix(python): use namespace-backed rust connection for namespace tables (#3286)
So far, I have been using a hacky approach to create and open
namespace-backed tables: get the table's location, then use a temporary
lancedb connection to create or open it. This worked for features like
credentials vending but no longer fully works for the managed
versioning feature. Recently, geneva tests have been failing
intermittently, and various patches have not addressed the root cause.
This PR fully fixes the problem and implements a proper Rust binding
for it. Specifically:

- build a real Rust namespace-backed connection from the Python
namespace client
- route namespace table create/open through that connection instead of
resolved-location temp connections
- keep namespace client naming consistent in the Rust bridge and
preserve federated namespace + DuckDB behavior
2026-04-18 21:17:52 -07:00
Lance Release
d715bbb588 Bump version: 0.28.0-beta.6 → 0.28.0-beta.7 2026-04-17 08:12:27 +00:00
Lance Release
11af763fcd Bump version: 0.28.0-beta.5 → 0.28.0-beta.6 2026-04-16 18:57:28 +00:00
Xuanwo
b7c0b5987c chore: upgrade lance to 6.0.0-beta.1 (#3281) 2026-04-17 02:51:58 +08:00
Jack Ye
97a4b38f19 feat(rust): support nested namespace ops in listing db (#3279)
## Summary
- delegate child-namespace `ListingDatabase` operations through an
eagerly initialized `LanceNamespaceDatabase`
- support nested namespace create/open/list/drop flows without requiring
callers to inject explicit locations
- add `namespace_client_properties` plumbing for local and namespace
connections so directory namespace settings like
`table_version_tracking_enabled` can be configured
- add regression tests for nested namespace ops and namespace client
property propagation
2026-04-16 10:12:28 -07:00
Gezi-lzq
10879d99b8 docs: fix broken documentation links (#3278) 2026-04-15 20:56:59 +08:00
Lance Release
4e6a1d5dce Bump version: 0.28.0-beta.4 → 0.28.0-beta.5 2026-04-12 23:51:14 +00:00
Lance Release
c6ae0de3ee Bump version: 0.28.0-beta.3 → 0.28.0-beta.4 2026-04-12 03:57:58 +00:00
Lance Release
359710a0bf Bump version: 0.28.0-beta.2 → 0.28.0-beta.3 2026-04-11 22:44:52 +00:00
Lance Release
df354abae4 Bump version: 0.28.0-beta.1 → 0.28.0-beta.2 2026-04-11 07:06:00 +00:00
Will Jones
2807ad6854 chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257)
Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI
failures caused by the AWS SDK's MSRV requirement. No lint fixes were
needed.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 07:57:47 -07:00
Jack Ye
a898dc81c2 feat: add user_id field to ClientConfig for user identification (#3240)
## Summary

- Add a `user_id` field to `ClientConfig` that allows users to identify
themselves to LanceDB Cloud/Enterprise
- The `user_id` is sent as the `x-lancedb-user-id` HTTP header in all
requests
- Supports three configuration methods:
  - Direct assignment via `ClientConfig.user_id`
  - Environment variable `LANCEDB_USER_ID`
  - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY`
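
The three configuration methods can be sketched as a single resolution function. This is a rough sketch under stated assumptions: the precedence order (direct value, then `LANCEDB_USER_ID`, then the variable named by `LANCEDB_USER_ID_ENV_KEY`) is assumed rather than taken from the PR, and `resolve_user_id` and `user_id_header` are hypothetical names. The environment is passed in as a closure so the logic can be exercised without touching real process state.

```rust
/// Hypothetical resolver. Assumed precedence:
/// 1. a value set directly on the config,
/// 2. the `LANCEDB_USER_ID` environment variable,
/// 3. the variable whose name is stored in `LANCEDB_USER_ID_ENV_KEY`.
fn resolve_user_id<F>(explicit: Option<&str>, lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(id) = explicit {
        return Some(id.to_string());
    }
    if let Some(id) = lookup("LANCEDB_USER_ID") {
        return Some(id);
    }
    // Indirect lookup: the first variable holds the name of the second.
    let key = lookup("LANCEDB_USER_ID_ENV_KEY")?;
    lookup(&key)
}

/// Build the header pair that carries the resolved id on a request.
fn user_id_header(user_id: &str) -> (&'static str, String) {
    ("x-lancedb-user-id", user_id.to_string())
}
```

In a real process the closure would typically be `|k| std::env::var(k).ok()`.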

Closes #3230

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-06 11:20:10 -07:00
Lance Release
de3f8097e7 Bump version: 0.28.0-beta.0 → 0.28.0-beta.1 2026-04-05 02:51:18 +00:00