lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 11:00:40 +00:00

Author	SHA1	Message	Date
Wyatt Alt	142ac835d3	client: job_history() and errors() over REST (SHOW JOB HISTORY / SHOW ERRORS) The client exposed list_jobs/get_job/cancel_job but not the durable job history or the per-row UDF errors, so those SQL/REST surfaces had no SDK equivalent. Add job_history(job_id=None) and errors(job_id=None, table=None) through every layer: - Database trait + Connection API (JobHistoryInfo, JobErrorInfo types). - Remote REST impl: GET /v1/job/history (?job=) and GET /v1/job/errors (?job=&table=), with serde response types + From mappings. - pyo3 bindings + pyclasses JobHistoryEntry / JobErrorEntry, registered. - Python sync + async db.py wrappers. Mirrors the existing list_jobs plumbing exactly. Remote-handler test asserts the GET paths, query filters, and response parsing for both. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:35 -07:00
Wyatt Alt	3f44f93e92	job wait(): poll by id via get_job (point access) instead of list_jobs JobHandle/AsyncJobHandle now poll conn.get_job(id, table) -- one job -- instead of list_jobs() + client-side filter over every active job. The job's table is threaded in from refresh_column / MV refresh as an O(1) lookup hint. Plumbs get_job through the Database trait (default not_supported), RemoteDatabase (GET /v1/job/{id}?table=...), the Connection wrapper, and the pyo3 binding + db.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:35 -07:00
Wyatt Alt	78aa005093	client: slice 3 -- thread table_lineage through the remote client + pyo3 A new Database::table_lineage(TableLineageRequest) -> Result<String> threaded end to end: default not_supported in the trait; the remote impl issues GET /v1/table/{name}/lineage with column/direction/depth query params and returns the body verbatim; connection.rs exposes a pub wrapper; the pyo3 binding hands the JSON string to Python. The lineage payload is carried as opaque JSON on purpose: the open-source lancedb client must not depend on the sophon-internal derived_jobs crate that defines the lineage schema, so the wire format is the contract and the Python layer deserializes it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	5810974b37	feat(client): Table.load_columns() REST client for LOAD COLUMNS Geneva Table.load_columns() parity on the REST-only client. Fills existing columns from an external Parquet/Lance/IPC source by primary-key join. - BaseTable::load_columns default (NotSupported) + public Table::load_columns, taking a LoadColumnsRequest (source uris/format/storage_options, target/source key, (target, source?) column mappings, on_missing, worker/batch/commit knobs). - Remote impl POSTs to /v1/table/{id}/load_columns with the matching body; mock test asserts the request shape. - PyO3 binding + Python remote Table.load_columns(source, pk, columns, *, source_format, source_pk, on_missing, ...) accepting a column list or {target: source} dict. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	8b38500b07	feat(view): full=True force-rebuild on refresh_materialized_view View.refresh(full=True) (sync + async) now works -- it previously raised NotImplementedError. Thread the flag through the client: RefreshMaterializedViewRequest.full -> the REST body (RemoteRefreshMaterializedViewRequest.full); pyo3 refresh_materialized_view(full=...); Connection.refresh_materialized_view(name, full=) sync + async. A full refresh forces a recompute-and-replace and preserves the view's indexes (reindexed by the distributed indexer). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	b9f33ba1c9	feat(refresh): priority as a per-refresh knob; fix batch_size on RemoteTable Thread priority (Kueue tier) through refresh_column at every layer (Python sync+async + RemoteTable -> pyo3 -> Rust client trait/public/remote -> REST body), mirroring num_workers/batch_size. The function keeps its priority as a default; the per-refresh value overrides. Also adds the previously-missed batch_size to RemoteTable.refresh_column (the REST sync path). cargo check (lancedb --features remote --tests, lancedb-python) + ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	d4f4fef3ba	feat(refresh): batch_size is a per-refresh knob (refresh_column), not a function-only option batch_size / num_workers / max_workers are invocation concerns (how to schedule THIS refresh), so expose batch_size on refresh_column through every layer (Python sync+async -> pyo3 -> Rust client -> the REST RefreshColumnRequest.batch_size, which the handler already forwards into the backfill). num_workers/max_workers were already invocation-placed; batch_size was the gap. The function may still carry a default; the refresh override wins (extends the batch_size_override model). Both crates cargo-check clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	127054069a	feat(mv): partition_by option on create_materialized_view / create_view Thread an optional partition_by through the client: CreateMaterializedViewRequest -> REST body -> pyo3 binding -> Python create_materialized_view/create_view kwarg (sync + async). The server partitions the view's table function by the named source column -- by IVF index clusters if the column is indexed (image-dedup), else by distinct value. Unifies Geneva's partition_by + partition_by_indexed_column into one knob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	e93476f0e0	feat: explain_refresh_materialized_view over REST (EXPLAIN REFRESH SDK) Database trait gains explain_refresh_materialized_view (default NotSupported) returning an MvRefreshPlan; RemoteDatabase POSTs /v1/materialized_view/{name}/explain_refresh; Connection method; pyo3 MvRefreshPlan pyclass + binding; sync+async python wrappers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	2b41fce033	feat: cancel_job over REST (Database::cancel_job + remote impl + pyo3 + python) Exposes the existing server-side CANCEL JOB (CoordinatorCatalog::cancel_job) as a REST-backed SDK method: Database trait default NotSupported, RemoteDatabase POSTs /v1/job/{id}/cancel, pyo3 binding, sync+async python wrappers. Best-effort: a missing job returns false, not an error. Mock-HTTP unit test in test_derived_compute_routes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-29 22:35:34 -07:00
Wyatt Alt	04948fc4f6	feat: computed columns as a param on add_columns Per the interface design: computed columns are parameters on the existing add_columns operation, not a separate method. - BaseTable::add_computed_columns((name, sql_type) pairs + a f(args) expression) -- default NotSupported; RemoteTable posts 'computed' entries to the existing /v1/table/{id}/add_columns route. - python add_columns gains computed= on LanceTable, RemoteTable, and AsyncTable: tbl.add_columns(computed={'doubled': ('FLOAT', 'double_it(val)')}); grouped by expression so struct-returning functions' columns land adjacently.	2026-06-29 22:35:34 -07:00
Wyatt Alt	ff3c7111b9	feat: SDK surface for functions, materialized views, jobs, refresh_column Adds the derived-compute interface to the SDK: - Database trait: create/list/drop_function, create/refresh/alter/drop/list_materialized_view, list_jobs -- default implementations return Error::NotSupported (NotImplementedError in python), so existing Database impls are unaffected; local single-node implementations are planned. BaseTable gains refresh_column with the same default. - RemoteDatabase/RemoteTable implement them against the server REST routes (/v1/function/, /v1/materialized_view/, /v1/job/list, /v1/table/{id}/refresh_column), with mock-HTTP unit tests. - Connection/Table public methods, pyo3 bindings (FunctionInfo, MaterializedViewInfo, JobInfo pyclasses), and python wrappers: sync on the DBConnection base (shared by local and remote connections), async on AsyncConnection; refresh_column on LanceTable, RemoteTable, and AsyncTable.	2026-06-29 22:35:34 -07:00
Lance Release	e01777070d	Bump version: 0.31.0-beta.3 → 0.31.0-beta.4	2026-06-29 11:12:18 +00:00
Jack Ye	3df3043563	feat(rust): add OAuth header provider (#3579 ) ## Summary Add the Rust OAuth header provider for remote LanceDB connections. This supports client credentials and Azure managed identity flows, handles token caching and refresh, redacts secrets in Debug output, and wires `ConnectBuilder::oauth_config()` into the remote client while rejecting ambiguous API-key/header-provider combinations.	2026-06-26 23:57:16 -07:00
Ryan Green	8a5cd74e48	fix: ensure read freshness provider is built into namespace client (#3571 ) By default the read freshness provider was not included in the namespace client, preventing the read freshness headers from being included in the request. This prevents checkout_latest() from working as expected when using the namespace client. This fix ensures the provider is built into the client when the namespace impl and properties are provided.	2026-06-25 21:47:55 -07:00
Lance Release	448d5ec20f	Bump version: 0.31.0-beta.2 → 0.31.0-beta.3	2026-06-25 01:55:06 +00:00
Jack Ye	fe287dc98c	fix(remote): support namespace clients with dynamic headers Bridge LanceDB dynamic header providers into Lance Namespace dynamic context providers for live remote namespace clients.	2026-06-24 15:30:00 -07:00
Jack Ye	411568b72c	fix(remote): omit empty api key header (#3573 ) ## Summary Skip inserting the x-api-key header when the configured API key is empty. This lets bearer-token or other dynamic-header authentication avoid sending an empty static API key header alongside the real auth header.	2026-06-24 13:25:59 -07:00
Lance Release	0749532c3c	Bump version: 0.31.0-beta.1 → 0.31.0-beta.2	2026-06-23 16:23:08 +00:00
Drew Gallardo	41ac32a344	feat(rust): add blob read and materialization APIs (#3562 ) This PR is for the Read path against blob v2. #3528 handles declare + write, and this adds materialization on local tables. - blob_columns() - fetch_blobs(column, row_ids) → bytes - fetch_blob_files(column, row_ids) → lazy handles - Pass _rowid from query().with_row_id(). Remote returns NotSupported (for now). ### Use cases search, grab row ids, materialize images: ```rust let row_ids = /* _rowid from hits */; let images = table.fetch_blobs("image", &row_ids).await?; ``` Large blobs: open handles, read only what you need: ```rust let handles = table.fetch_blob_files("image", &row_ids).await?; let bytes = handles[0].as_ref().unwrap().read().await?; ``` Filter then batch fetch: collect ids from a filter, one call. Multiple blob columns: image and thumbnail independently. Row ids from before compact: still resolve. ### Alignment note Lance `read_blobs` drops null rows. We descriptor-take first, read non-null ids, re-expand to match input order. Null and zero-length blobs come back null/None. Bytes path sets `preserve_order(true)`. So I added: ``` TODO(lance): expose selection_index or an aligned execute so we can drop the pre-read. ``` ### Tests `cargo test -p lancedb --test blob_integration` - 30 tests covering nulls, reorder, dups, cross-fragment bytes + files, compact, delete, legacy v1 errors. --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-23 06:58:26 -07:00
Drew Gallardo	ba1ef34481	feat(rust): add blob v2 schema declaration and write path (#3528 ) First Rust PR for #3231. Lance already stores blob v2. This adds the LanceDB write side. ```rust let schema = Schema::new(vec![ Field::new("id", DataType::Int64, false), lancedb::blob("image", true), ]); let table = db.create_table("photos", schema).execute().await?; table.add(batch_with_large_binary_image_column).execute().await?; ``` Read/materialize and Python are follow-up PRs. ### Testing - cargo test -p lancedb --test blob_integration - cargo test -p lancedb blob:: datafusion::blob_coerce - cargo test -p lancedb (591 passed) - cargo clippy --features remote --tests --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-06-19 12:33:15 -07:00
Will Jones	85d870b397	fix: parse RFC 3339 created_at and improve IndexConfig repr (#3558 ) The server now serializes an index's `created_at` as an RFC 3339 string (e.g. `"2026-06-18T21:37:36.637Z"`), but the client deserializer only accepted a unix timestamp in milliseconds. This caused `list_indices` to fail with: ``` Failed to parse list_indices response: invalid type: string "2026-06-18T21:37:36.637Z", expected a unix timestamp in milliseconds ``` This PR replaces the fixed millisecond deserializer with a custom one that accepts both an RFC 3339 string (current server) and a unix-millisecond integer (legacy deployments), so the client works against any server version. It also improves the `IndexConfig` repr in the Python bindings. Previously it printed only three fields (`Index(FTS, columns=["text"], name="text_idx")`), hiding the metadata that `list_indices` returns. It now renders every populated field, omitting any that are `None`. Each value is valid Python — integer counts use `_` thousands separators and `created_at` uses the `datetime` repr — so values round-trip. 
The real repr is a single line; it's wrapped here for readability: ```python >>> table.list_indices() [IndexConfig( name="text_idx", index_type="FTS", columns=["text"], index_uuid="aefd3e00-2f95-4bdc-92ac-06de84442bf1", type_url="/lance.table.InvertedIndexDetails", created_at=datetime.datetime(2026, 6, 18, 21, 37, 36, 637000, tzinfo=datetime.timezone.utc), num_indexed_rows=2, size_bytes=3_669, num_segments=1, index_version=1, index_details={ 'lance_tokenizer': None, 'base_tokenizer': 'simple', 'language': 'English', 'with_position': False, 'max_token_length': 40, 'lower_case': True, 'stem': True, 'remove_stop_words': True, 'custom_stop_words': None, 'ascii_folding': True, 'min_ngram_length': 3, 'max_ngram_length': 3, 'prefix_only': False, }, )] ``` Fixes #3556 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 10:40:56 -07:00
Lance Release	113f187c2d	Bump version: 0.31.0-beta.0 → 0.31.0-beta.1	2026-06-19 16:00:59 +00:00
Lance Release	e81356089a	Bump version: 0.30.1-beta.2 → 0.31.0-beta.0	2026-06-18 18:43:22 +00:00
Armaan Sandhu	1f8ebef3cd	fix(rust): return typed errors instead of panicking in Bedrock embedding path (#3512 ) Closes #3506 ## Problem The Bedrock embedding compute path (`rust/lancedb/src/embeddings/bedrock.rs`) panics instead of returning a typed error in several places: - `serde_json::to_vec(&request_body).unwrap()`: request serialization. - `block_in_place(...).unwrap()`: the AWS `invoke_model` send result; any API error terminates the worker instead of propagating. - `v.as_f64().unwrap() as f32`: panics on non-numeric values in the returned embedding array. - `Handle::current()` + `block_in_place` assume a multi-threaded Tokio runtime and panic when that assumption does not hold (no runtime, or a current-thread runtime). Malformed payloads, non-numeric embedding values, or an incompatible runtime should surface as typed errors and never panic. ## Fix - Serialize the request body before the blocking section so a serialization failure returns `Error::Runtime` via `?`. - Map the `invoke_model` send error to `Error::Runtime` instead of `unwrap`. - Add a `json_array_to_f32` helper that converts the response array to `Vec<f32>`, returning `Error::Runtime` for a missing/non-array field or a non-numeric element (used by both the Titan and Cohere paths). - Add `current_multi_thread_handle()` (`Handle::try_current()` + a `RuntimeFlavor::CurrentThread` guard) so an absent or incompatible runtime returns a typed error rather than panicking in `block_in_place`. Scope note: the sibling `openai.rs` provider uses the same `block_in_place` + `block_on` bridge, so the bridge pattern itself is kept; this change only removes the panic paths that are specific to the Bedrock provider. ## Testing Added 6 unit tests (no AWS credentials required): - `json_array_to_f32`: valid numbers, non-array payload, and non-numeric element. - `current_multi_thread_handle`: errors with no runtime, errors on a current-thread runtime, and succeeds on a multi-threaded runtime. 
All pass; `cargo fmt` and `cargo clippy` clean. Build/test with `--features bedrock,lance/protoc`.	2026-06-17 15:06:44 -07:00
Ghxst ☠️	394bb34fa2	fix(rust): report local write progress bytes from Lance (#3422 ) Fixes #3360. This updates native table writes so local write progress uses Lance writer byte stats instead of Arrow in-memory batch size once write bytes are available. The change wires the existing `WriteProgressTracker` into `InsertExec` for native `add` writes, installs a Lance `WriteProgressFn` only when no lower-level callback is already configured, and keeps the existing public `InsertExec::new` signature unchanged. Validation: - `cargo test -p lancedb --features remote table::write_progress::tests::test_progress_uses_lance_write_bytes_for_local_tables -- --nocapture` passed: 1 passed, 0 failed. - `cargo test -p lancedb --features remote table::write_progress::tests -- --nocapture` passed: 7 passed, 0 failed. - `cargo check --quiet --features remote --tests --examples` passed. - `cargo fmt --all --check` passed. - `git diff --check` passed. - `git diff \| gitleaks stdin --no-banner --redact --timeout 30` passed: no leaks found. I did not run the full `cargo test --quiet --features remote --tests` suite. Co-authored-by: Ghxst <200635707+GHX5T-SOL@users.noreply.github.com>	2026-06-17 12:05:59 -07:00
Drew Gallardo	1bead6960c	fix: pin mock clock in eventual consistency test (#3547 ) This PR fixes a flaky test I hit on Windows in #3528. `test_eventual_consistency_background_refresh` was failing with `v_cached` expected 1, got 2. An earlier PR swapped `tokio::time::sleep(300ms)` for `clock::advance_by(300ms)`, which is fine in itself, but the test never pinned the clock, so the first `get()` locks `cached_at` on wall time. If CI runs slowly enough, the TTL expires before the value assertion in the test. The fix adds a `pin()` call before the first `get()`; after that the clock can be advanced manually with no problems. I first tried pinning in `BackgroundCache::new()`, but that broke another test, `test_reload_resets_consistency_timer`, which uses real `tokio::time::sleep` and needs the wall clock after `clear_mock()`. So the pin stays in this test only. This should unblock us. Failing instances: - https://github.com/lancedb/lancedb/actions/runs/27567527236/job/81495265474?pr=3528 - https://github.com/lancedb/lancedb/actions/runs/27560366489/job/81470414928	2026-06-17 11:56:40 -07:00
Brendan Clement	0abf641733	feat: send read-freshness signal on the lance-namespace path (#3551 ) ### Description `db://`-style connections that use the lance-namespace path (`LanceNamespaceDatabase` → `NativeTable` + the lance-namespace REST client) never sent a read-freshness signal. Against a server configured to serve cached table metadata up to some staleness window, this allows stale-read-after-write across handles and processes. The remote table path already solved this (#3439). This brings the namespace path to parity. The namespace REST client doesn't let callers attach headers directly, but it forwards a `DynamicContextProvider`'s `headers.*` context entries as HTTP headers per request. So: - A shared per-table baseline map is created before the namespace client is built and installed on the `ConnectBuilder` via a context provider. - On read operations the provider emits `x-lancedb-min-timestamp = max(baseline, now − read_consistency_interval)` (RFC3339), keyed by the operation's `object_id`. - Each table handle bumps its baseline (monotonically) on `checkout_latest()`, `restore()`, and every data/schema write. `checkout_latest()` is the primary hook: consumers refresh a handle there after writing elsewhere, then poll. Read operations that carry the floor: `describe_table`, `list_table_versions`, `query_table`, `list_tables`. `list_table_versions` is what resolves "latest" for managed-versioning tables (`get_latest_version`), so it's the op that makes `checkout_latest()` actually observe a prior write. `describe_table_version` is excluded (pinned to an immutable version). This mirrors #3439 (timestamp baseline, `max(baseline, now − interval)`, monotonic); no `min_version` and no body channel, since the namespace path has no version-returning write responses. 
### Testing - Unit tests for `compute_min_timestamp` / `next_freshness_baseline` and the provider (header at/after a bumped baseline; nothing for an empty baseline + no interval; interval floor applies; non-read ops emit nothing; `list_tables` uses only the interval floor). - Verified end-to-end against a local server that honors the header: reads carry `x-lancedb-min-timestamp`, writes don't, and read-your-write holds.	2026-06-17 13:30:53 -04:00
Yang Cen	b46a44f873	feat(query): add approx mode to vector queries (#3549 ) ## Feature ### What is the new feature? Adds Rust core API support for configuring vector query approximation mode with `ApproxMode::{Fast, Normal, Accurate}`. ### Why do we need this feature? Lance already exposes `lance_index::vector::ApproxMode` and scanner support for controlling the speed/accuracy tradeoff for approximate vector search. LanceDB Rust queries need to expose and pass this setting through for local/native and remote vector searches. ### How does it work? - Adds public `ApproxMode` in `rust/lancedb`, with lowercase serde, `Default::Normal`, parse/display, and conversions to/from Lance's `ApproxMode`. - Adds `approx_mode: Option<ApproxMode>` to `VectorQueryRequest` and a `VectorQuery::approx_mode(...)` builder. - Applies the mode to native/local Lance scanners after `nearest(...)` when explicitly set. - Sends `approx_mode` in remote query JSON only when explicitly set; default requests omit it. ## Validation - `cargo fmt --all` - `cargo test --quiet --features remote approx_mode` - `cargo test --quiet --features remote test_query_vector_default_values` - `cargo check --quiet --features remote --tests --examples` - `git diff --check`	2026-06-17 19:28:36 +08:00
Brendan Clement	f76b075d13	feat: add table branch support to remote tables and Python/TS bindings (#3540 ) ### Description Adds branch support for RemoteTable by threading a branch selector onto every operation the data plane accepts it on. Exposes the currentBranch to nodejs and python through the bindings. Matching the server handlers, the branch rides as: - a `?branch=` query parameter for Arrow-body and query-only ops (insert, merge_insert, multipart_*, version/list, drop_index) - a `branch` field in the JSON body for everything else (count_rows, query, update, delete, create_index, column ops, index list/stats, stats, restore, describe, tags create/update) A main-branch handle (`branch == None`) produces byte-identical requests to before: no `branch` field and no `?branch=`. - Handle-per-branch: `create_branch` / `checkout_branch` return a new handle with fresh caches and reset version/freshness state, mirroring `NativeTable`. - `create_branch` maps 409 to already-exists, 400 to invalid, and 404 to not-found with source context, and sends without retry so the 409 stays observable. - `Ref` translation covers version, version-number (relative to the handle's branch), and tag (resolved via the tags endpoint); `"main"` and empty normalize to the main branch. - Python branch handles persist their branch (and pinned version) across pickle/fork, so a forked or pickled handle reopens on its branch rather than silently reverting to main. ### Tests - Rust mock tests per op category (query-param and body mechanisms, branch CRUD, error paths, backward-compat). - Python sync branch CRUD, `open_table(branch=)`, and a pickle round-trip regression test.	2026-06-15 18:07:40 -04:00
Will Jones	6219975222	perf: drop N+1 in RemoteTable::list_indices (#3535 ) `RemoteTable::list_indices` currently makes one `/index/list/` call plus one `/index/{name}/stats/` call per index just to recover `index_type`. When the server returns `index_type` directly in the `/index/list/` response, all enriched fields are used and the per-index stats fan-out is skipped entirely. When `index_type` is absent (legacy servers), the existing stats fallback runs as before. This is content-based: no version header required. ## Changes - `RemoteTable::parse_index_list_response` replaces the old split between enriched and legacy parsers. A single struct deserializes both old and new response shapes, with all fields except `index_name` and `columns` optional. `index_type` acts as the sentinel: present → use enriched fields directly; absent → call `/index/{name}/stats/`. ## Tests Added `test_list_indices_enriched` covering: - All enriched fields populated correctly when `index_type` is in the list response - Optional fields absent from the response deserialize as `None` - Stats endpoint is not called (panics if hit), verifying the fan-out is eliminated Existing `test_list_indices` and `test_list_indices_nested_field_paths` exercise the legacy path unchanged. ## Depends on - #3497 (expand `IndexConfig`) — already merged - Server-side enriched response support Closes #3494 Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-15 09:21:17 -07:00
Xuanyi Li	49815da933	refactor: extract create_index module from table.rs (#3521 ) ## Summary - Extracts the `create_index` code cluster from `table.rs` into a new `rust/lancedb/src/table/create_index.rs` submodule, continuing the work from #2949. - Moves 8 `NativeTable` inherent methods (`load_indices`, `validate_index_type`, `build_ivf_params`, `get_num_sub_vectors`, `get_vector_dimension`, `resolve_index_field`, `make_index_params`, `get_index_type_for_field`) and 11 associated tests into the new module. - Reduces `table.rs` from ~5009 to ~3804 lines (-1205 lines) with no behavioral changes. ## Test plan UT	2026-06-11 14:06:44 -07:00
nuthalapativarun	40f3e22600	feat: support rename_table on LanceNamespaceDatabase (#3520 ) ## Summary Closes #3412 Implements `rename_table` for `LanceNamespaceDatabase` (sync and async Python) and the Rust `NamespaceDatabase` backend. Previously these raised `NotImplementedError`; this PR delegates to the `LanceNamespace.rename_table` method which is part of the lance-namespace spec. ### Changes - `rust/lancedb/src/database/namespace.rs`: Remove the `NotImplementedError` stub for `rename_table`. Build a `RenameTableRequest` (with `id`, `new_table_name`, and optionally `new_namespace_id`) and call `self.namespace.rename_table(...)`, mirroring the existing `drop_table` pattern. - `python/python/lancedb/namespace.py`: Import `RenameTableRequest` from `lance_namespace`. Replace the `raise NotImplementedError` in both `LanceNamespaceDatabase.rename_table` (sync) and `AsyncLanceNamespaceDatabase.rename_table` (async) with a call to `self._namespace_client.rename_table(request)`. - `python/python/tests/test_namespace.py`: Replace the `test_rename_table_not_supported` test (which checked for `NotImplementedError`) with `test_rename_table`, which: 1. Creates a table in a namespace 2. Calls `rename_table` with `cur_namespace_path` and `new_namespace_path` 3. Asserts the old name is gone from `table_names()` 4. Asserts the new name appears in `table_names()` 5. Verifies the renamed table can be opened ## Test plan - [ ] Existing namespace tests pass in CI (all rely on `lance.namespace.DirectoryNamespace` which requires the full lance package) - [ ] `test_rename_table` exercises the full rename path: create → rename → verify old gone → verify new present → open - [ ] Rust build passes with the updated `namespace.rs` (requires Rust toolchain in CI)	2026-06-11 11:41:07 -07:00
LanceDB Robot	4fb7c92e86	chore: update lance dependency to v8.0.0-beta.11 (#3533 ) Updates Lance dependencies to v8.0.0-beta.11 and refreshes the Rust and Java lock/config files. This also adapts namespace external manifest store call sites to the new table-root-aware constructor required by Lance. Triggering tag: https://github.com/lancedb/lance/releases/tag/v8.0.0-beta.11	2026-06-10 17:53:58 -07:00
Will Jones	f03abc27e3	feat: expand IndexConfig with rich per-index metadata (#3497 ) `IndexConfig` (returned by `Table::list_indices`) previously exposed only `name`, `index_type`, and `columns`. Lance's `describe_indices` provides richer per-index info cheaply (reads manifest metadata, often cached), so this surfaces it. Adds these `Option<T>` fields to `lancedb::index::IndexConfig`, populated in `NativeTable::list_indices` from the `IndexDescription`: - `index_uuid`: uuid of the first segment - `type_url`: protobuf type URL (`IndexDescription::type_url`) - `created_at`: minimum creation time across segments - `num_indexed_rows`: approximate rows indexed across segments - `num_unindexed_rows`: table row count minus `num_indexed_rows` - `size_bytes`: total size of index files across segments - `num_segments`: number of segments making up the index - `index_version`: on-disk index format version (first segment) - `index_details`: index-type-specific details as JSON This field set mirrors the lance-namespace `IndexContent` contract (lance-format/lance-namespace#348) so client and server agree on the same shape. Note these are populated locally via `describe_indices` — `NativeTable::list_indices` reads the dataset directly and does not depend on the namespace spec change. `RemoteTable` leaves the new fields `None` until a follow-up wires them through the server response (#3494). Bindings exposure will also be a follow up: #3495 Existing `list_indices` tests in `rust/lancedb/src/table.rs` are extended to assert the new fields. Fixes #3492 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 16:14:33 -07:00
Trenton H	85d9c1ce63	feat: adds isin support to the 'Expr' builder (#3523 ) The `Expr` builder already includes a lot of useful filtering options, `eq, ne, gt/gte, lt/lte, and_, or_, contains, cast`, but it was missing a membership test like `isin`. This PR adds that support, as minimally as possible, allowing easy filtering for membership in a list without needing a series of `where` expressions. I didn't see anything in CONTRIBUTING.md about needing a feature request or issue first, so I just made the change. My apologies if I missed that somewhere. Thanks for the vector store, we're using it now in paperless-ngx.	2026-06-10 15:28:19 -07:00
Jack Ye	8373318e89	feat: support FM-Index scalar index for substring search (#3532 ) Adds an FM-Index — a scalar index over string and binary columns that accelerates substring search (`contains(col, 'needle')`), distinct from the tokenized `FTS` index — across the Rust core and the Python and TypeScript bindings. ## Rust - `Index::Fm(FmIndexBuilder)` and `IndexType::Fm`. - `make_index_params` maps `Index::Fm` to Lance's `ScalarIndexParams::for_builtin(BuiltinIndexType::Fm)`. - `supported_fm_data_type` validates `Utf8`/`LargeUtf8`/`Binary`/`LargeBinary` columns. - `list_indices` round-trips the type (`"Fm"` → `IndexType::Fm`); the remote wire type is `"FM"`. ## Python Adds `lancedb.index.Fm`, accepted by `create_index`: ```python from lancedb.index import Fm await tbl.create_index("text", config=Fm()) ``` ## TypeScript Adds the `Index.fm()` factory: ```ts await tbl.createIndex("text", { config: Index.fm() }); ```	2026-06-10 12:28:20 -07:00
LanceDB Robot	8308cca05e	chore: update lance dependency to v8.0.0-beta.9 (#3527 ) Updates Lance dependencies to v8.0.0-beta.9. Includes the required Rust compatibility fix for Lance's updated vector index UUID API. Triggering tag: https://github.com/lancedb/lance/releases/tag/v8.0.0-beta.9	2026-06-10 10:10:11 -07:00
Xuanwo	566b67a634	fix: support LargeList label list indexes (#3529 ) ## Summary This PR extends nested-field regression coverage across Rust local/remote, Python sync/async, and Node so canonical escaped paths stay consistent across scalar, vector, and FTS index lifecycle behavior. It also aligns LanceDB's LabelList type gate with Lance by accepting `LargeList<primitive>` columns while keeping `List<Struct<...>>` unsupported until Lance defines stable membership semantics for struct labels. Part of #3406.	2026-06-10 23:53:56 +08:00
Brendan Clement	d9018067b3	feat: support checking out a version on a branch (#3504 ) ### Description Stacked on #3490. Adds an optional version to branch checkout across the Rust core and the Python and TypeScript SDKs, so you can open a specific version on a branch ("version V of branch B"), not just the branch's latest version Rust ```rust // Open version 3 of branch "exp" (a read-only view): check out from an // existing table, or open it directly from the connection. let exp_v3 = table.checkout_branch("exp", Some(3)).await?; let exp_v3 = db.open_table("items").branch("exp").version(3).execute().await?; // checkout_latest re-attaches to the branch's writable HEAD. exp_v3.checkout_latest().await?; // With no branch, a version opens main at that version. let main_v3 = db.open_table("items").version(3).execute().await?; ``` Python ```python # Open version 3 of branch "exp" (a read-only view): check out from an # existing table, or open it directly from the connection. branch_v3 = await table.branches.checkout("exp", version=3) branch_v3 = await db.open_table("items", branch="exp", version=3) # checkout_latest re-attaches to the branch's writable HEAD. await branch_v3.checkout_latest() # With no branch, a version opens main at that version. main_v3 = await db.open_table("items", version=3) ``` TypeScript ```typescript // Open version 3 of branch "exp" (a read-only view): check out from an // existing table, or open it directly from the connection. const branchV3 = await (await table.branches()).checkout("exp", 3); const opened = await db.openTable("items", undefined, { branch: "exp", version: 3 }); // checkoutLatest re-attaches to the branch's writable HEAD. await branchV3.checkoutLatest(); // With no branch, a version opens main at that version. 
const mainV3 = await db.openTable("items", undefined, { version: 3 }); ``` ### Testing - Added unit tests (Rust, Python sync + async, TypeScript): branch-scoped resolution at a version number shared with `main` and with another branch, read-only enforcement on a pinned handle, `checkout_latest` recovery to the branch's HEAD, fork-point reads, and the nonexistent-version/branch error paths. - Ran smoke tests against the Python and TypeScript SDKs on local machine.	2026-06-08 17:36:38 -07:00
Brendan Clement	53517b3aaa	feat: add table branch support (#3490 ) ### Description Adds first-class support for table branches across the Rust core and the Python and TypeScript SDKs. Rust ```rust use lance::dataset::refs::Ref; // Create a branch from main and write to it — main is untouched. let exp = table.create_branch("exp", Ref::Version(None, None)).await?; exp.add(batches).await?; // Reopen the branch later: check out from a table, or open it directly. let exp = table.checkout_branch("exp").await?; let exp = db.open_table("items").branch("exp").execute().await?; let branches = table.list_branches().await?; table.delete_branch("exp").await?; ``` Python ```python # Create a branch from main and write to it branch = await table.branches.create("exp", from_ref="main") await branch.add(data) # Reopen the branch later: check out from a table, or open it directly. branch = await table.branches.checkout("exp") branch = await db.open_table("items", branch="exp") await table.branches.list() await table.branches.delete("exp") ``` TypeScript ```typescript const branches = await table.branches(); // Create a branch from main and write to it const branch = await branches.create("exp"); await branch.add(data); // Reopen the branch later: check out from a table, or open it directly. const checkedOut = await branches.checkout("exp"); const opened = await db.openTable("items", undefined, { branch: "exp" }); await branches.list(); await branches.delete("exp"); ``` ### Testing - Added unit tests - ran smoke tests against python and typescript sdks on local machine ### Next steps - Add RemoteTable support - Add Branch Comparison support - Merge Branching support	2026-06-08 16:26:46 -07:00
Will Jones	09b1bbc12a	refactor!: drop unused loss field from IndexStatistics (#3496 ) BREAKING CHANGE: direct Rust users lose the `IndexStatistics::loss` field. Python and Node.js consumers are unaffected in practice for remote tables (the value was always `None`/absent), but the attribute is gone for local tables too. `IndexStatistics::loss` was local-only — LanceDB Cloud never returned it, so `RemoteTable::index_stats` always set `loss: None`. It's vestigial; this removes it. - Remove `loss` from `IndexStatistics` and the internal `IndexMetadata` in `rust/lancedb/src/index.rs`, plus the summing logic in `NativeTable::index_stats`. - Drop `loss` from the Python and Node.js bindings (and their tests/docs). Fixes #3493 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 07:52:40 -07:00
LanceDB Robot	c484b24e51	chore: update lance dependency to v8.0.0-beta.4 (#3507 ) Updates LanceDB Lance dependencies to Lance v8.0.0-beta.4. Includes the required compatibility fix for the new Lance file writer finish summary API. Lance tag: https://github.com/lance-format/lance/releases/tag/v8.0.0-beta.4	2026-06-05 08:28:14 -05:00
Dan Rammer	c13ebc6796	feat(remote): implement set/unset_lsm_write_spec REST variant (#3501 ) ## Summary Wires `RemoteTable::set_lsm_write_spec` / `unset_lsm_write_spec` to the sophon REST endpoints added in [lancedb/sophon#6181](https://github.com/lancedb/sophon/pull/6181), replacing the previous `NotSupported` stubs. - `set_lsm_write_spec` maps the `LsmWriteSpec` onto sophon's request DTO — mode-tagged `sharding` (`unsharded` / `bucket` / `identity`), `maintained_indexes`, and `writer_config_defaults` — and POSTs to `/v1/table/{name}/set_lsm_write_spec/`. - `unset_lsm_write_spec` POSTs to `/v1/table/{name}/unset_lsm_write_spec/`. - Both call `check_mutable` first, matching the other remote mutations. - `maintained_indexes` is sent verbatim (an empty list means "no maintained indexes", matching native semantics). ## Testing - Added mocked-endpoint unit tests for unsharded / bucket / identity set and for unset. - `cargo check --features remote --tests` passes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-04 21:47:52 -05:00
Lance Release	39a9f3e1e9	Bump version: 0.30.1-beta.1 → 0.30.1-beta.2	2026-06-04 06:05:35 +00:00
Armaan Sandhu	415d199c15	feat(rust): support datafusion expressions for merge insert predicates (#3444 ) ### Description This PR exposes native DataFusion expression support in the Rust SDK's `MergeInsertBuilder` via two new builder methods: `when_matched_update_all_expr` and `when_not_matched_by_source_delete_expr`. For remote LanceDB tables (where operations are serialized over HTTP/JSON to the SaaS backend), native DataFusion expression trees cannot be executed directly. The SDK handles this gracefully by returning a `NotSupported` error. ### Key Changes - `MergeFilter` Enum: Introduced a helper enum to store either a SQL string or a native `datafusion_expr::Expr`. - `MergeInsertBuilder`: Updated `when_matched_update_all_filt` and `when_not_matched_by_source_delete_filt` fields to store the new enum, and added `when_matched_update_all_expr` and `when_not_matched_by_source_delete_expr` builder methods. - Execution & Remote Dispatch: Dispatched the filter variants during local execution, and rejected expression filters with a clean `NotSupported` error in remote table request conversion. - Testing: Added a `test_merge_insert_expr` unit test covering conditional updates and deletes with programmatically built DataFusion expressions. ### Verification - Added integration test `test_merge_insert_expr` which successfully compiles and passes. - Formatted and linted the code. Closes #3416	2026-06-03 15:47:51 -07:00
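The dispatch described in the 415d199c15 message (a filter is either a SQL string or a native expression, and the remote path rejects the latter) can be sketched as follows. This is a Python illustration of the pattern only, not the Rust SDK's code: `SqlFilter`, `ExprFilter`, `NotSupported`, and `to_remote_request` are hypothetical names, and the request dictionary shape is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SqlFilter:
    """Filter given as a SQL string; can be serialized to JSON."""
    sql: str


@dataclass(frozen=True)
class ExprFilter:
    """Filter given as a native expression tree (opaque placeholder here)."""
    expr: object


# Hypothetical analogue of the MergeFilter enum named in the commit message.
MergeFilter = Union[SqlFilter, ExprFilter]


class NotSupported(Exception):
    """Raised when an operation cannot be performed on a remote table."""


def to_remote_request(filt: MergeFilter) -> dict:
    """Convert a filter into a JSON-ready payload, or reject it.

    The payload key is an assumption for illustration, not a documented
    wire format.
    """
    if isinstance(filt, SqlFilter):
        return {"when_matched_update_all_filt": filt.sql}
    # Expression trees cannot be serialized over HTTP/JSON in this sketch.
    raise NotSupported("expression filters are not supported on remote tables")
```

Keeping both variants in one type lets local execution accept either form while the remote conversion fails early with a clear error instead of sending a request the server cannot evaluate.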
Lance Release	9483b534af	Bump version: 0.30.1-beta.0 → 0.30.1-beta.1	2026-06-03 11:17:37 +00:00
Brendan Clement	379684391e	feat: deprecate replace_field_metadata for update_field_metadata (#3484 ) ### Summary Deprecates the Python replace_field_metadata (on Table and AsyncTable) in favor of update_field_metadata. Mirrors Lance, which already deprecated Dataset.replace_field_metadata for update_field_metadata. Stacked on top of #3482 as this was a follow-up task after adding update_field_metadata	2026-06-02 14:02:22 -07:00
Brendan Clement	d065be0474	feat: add update_field_metadata to edit per-field metadata (#3482) ### Summary Adds update_field_metadata to the client SDK (Rust core, Python, and TypeScript) so clients can edit per-field (column) Arrow metadata (schema.fields[].metadata). ### Testing - added unit tests - ran E2E against a local server on both local and remote tables (set → merge → delete), across Python sync/async and TypeScript ### Next steps - deprecate replace_field_metadata in the Python lancedb in favor of this (TypeScript didn't have a replace_field_metadata method). This matches Lance's API direction (Lance already deprecated replace_field_metadata for update_field_metadata)	2026-06-02 07:00:00 -07:00
Lance Release	f20ec99dec	Bump version: 0.30.0-beta.1 → 0.30.1-beta.0	2026-06-01 12:41:45 +00:00

783 Commits