- new lineage.py: Lineage / Node / Edge / FunctionRef dataclasses that parse the
server's lineage JSON, with to_dict(), to_graphviz() (drift edges dashed+red),
and _repr_html_(); plus .functions() / .stale() helpers.
- Connection.lineage(table, column=, direction=, depth=) (sync + async) calls the
pyo3 table_lineage binding and deserializes into Lineage.
- Table.lineage(column=, ...) via the table's job connection; MaterializedView /
AsyncMaterializedView .lineage() delegate to the backing table (the server
already includes the view's sources + downstream dependents).
- export the new types.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
refresh_column returned the bare job-id str, so callers had to wrap it:
db.job(tbl.refresh_column("c")).wait(). Mirror MaterializedView.refresh() and
return a JobHandle directly, so tbl.refresh_column("c").wait() / .status() / .id
work without the wrapper. (db.job(job_id) stays for reconnecting by a stored id.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A new Database::table_lineage(TableLineageRequest) -> Result<String> threaded
end to end: default not_supported in the trait; the remote impl issues
GET /v1/table/{name}/lineage with column/direction/depth query params and
returns the body verbatim; connection.rs exposes a pub wrapper; the pyo3
binding hands the JSON string to Python.
The lineage payload is carried as opaque JSON on purpose: the open-source
lancedb client must not depend on the sophon-internal derived_jobs crate that
defines the lineage schema, so the wire format is the contract and the Python
layer deserializes it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sync _refresh_materialized_view called self._conn.refresh_materialized_view
(no underscore); the async method is _refresh_materialized_view, so
MaterializedView.refresh() raised AttributeError. Add the underscore.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Refresh is a submit-a-job verb, so its only public surface should be
MaterializedView.refresh() / AsyncMaterializedView.refresh() (which return a
job handle). Rename the connection methods to _refresh_materialized_view and
have the handles call that, so the raw by-name refresh is no longer advertised
on the connection. The pyo3 native binding is unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The sync RemoteDBConnection.create_materialized_view assembled the SELECT but
called the async create_materialized_view with the query as the 2nd positional
arg, which binds to `source=` (query= is keyword-only). Every call then failed
the "needs either query= or both source and select" validation. Pass query=query.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- create_materialized_view now takes either query= or source+select (folds in
the old create_view builder) and returns a MaterializedView handle whose
.wait() blocks on initial population. create_view is removed -- it was
misnamed (it built a *materialized* view, while CREATE VIEW means the plain
non-materialized view the engine also supports).
- MaterializedView.refresh() and the remote Table.refresh_column() now return a
JobHandle directly, so tbl.refresh_column("c").wait() needs no db.job(...)
wrapper. db.job(id) is narrowed to reconnect-by-id (stored id / SQL / REST).
- rename View/AsyncView -> MaterializedView/AsyncMaterializedView (+ exports).
- tighten the replace path: only a not-found error on the pre-drop is benign;
real failures (perms/server) now surface instead of being swallowed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Geneva Table.load_columns() parity on the REST-only client. Fills existing
columns from an external Parquet/Lance/IPC source by primary-key join.
- BaseTable::load_columns default (NotSupported) + public Table::load_columns,
taking a LoadColumnsRequest (source uris/format/storage_options, target/source
key, (target, source?) column mappings, on_missing, worker/batch/commit knobs).
- Remote impl POSTs to /v1/table/{id}/load_columns with the matching body;
mock test asserts the request shape.
- PyO3 binding + Python remote Table.load_columns(source, pk, columns, *,
source_format, source_pk, on_missing, ...) accepting a column list or
{target: source} dict.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
View.refresh(full=True) (sync + async) now works -- it previously raised
NotImplementedError. Thread the flag through the client: RefreshMaterialized-
ViewRequest.full -> the REST body (RemoteRefreshMaterializedViewRequest.full);
pyo3 refresh_materialized_view(full=...); Connection.refresh_materialized_view(
name, full=) sync + async. A full refresh forces a recompute-and-replace and
preserves the view's indexes (reindexed by the distributed indexer).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add View.create_index / create_scalar_index / create_fts_index / search
as pass-throughs to open_table(name). A materialized view is a real Lance
dataset; these let it be indexed and searched like any other table,
closing the parity gap with Geneva (whose create_materialized_view returns
a first-class Table).
The server-side create_index handler records indexes declared on a view so
they survive a full refresh (which overwrites the dataset, dropping its
indices); that re-apply is wired in the sophon engine.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thread priority (Kueue tier) through refresh_column at every layer (Python sync+async
+ RemoteTable -> pyo3 -> Rust client trait/public/remote -> REST body), mirroring
num_workers/batch_size. The function keeps its priority as a default; the per-refresh
value overrides. Also adds the previously-missed batch_size to RemoteTable.refresh_column
(the REST sync path). cargo check (lancedb --features remote --tests, lancedb-python) +
ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
batch_size / num_workers / max_workers are invocation concerns (how to schedule THIS
refresh), so expose batch_size on refresh_column through every layer (Python sync+async
-> pyo3 -> Rust client -> the REST RefreshColumnRequest.batch_size, which the handler
already forwards into the backfill). num_workers/max_workers were already invocation-
placed; batch_size was the gap. The function may still carry a default; the refresh
override wins (extends the batch_size_override model). Both crates cargo-check clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A computed column is an expression over a registered function applied to input
columns, not a UDF coupled to a column. fn("data") already returned the expression
string "fn(data)"; make it a ColumnExpr (a str subclass) that also carries the
function's return type, so add_columns(computed={"vec": embed("data")}) declares the
column with no hand-written type. _normalize_computed handles the new form (and tuple
keys for STRUCT fan-out) and keeps the legacy {col: (sql_type, expression)} tuple.
add_computed_column is deprecated (delegates, with a DeprecationWarning). The function
stays decoupled from columns -- register once, apply anywhere.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Thread an optional partition_by through the client: CreateMaterializedViewRequest
-> REST body -> pyo3 binding -> Python create_materialized_view/create_view
kwarg (sync + async). The server partitions the view's table function by the
named source column -- by IVF index clusters if the column is indexed
(image-dedup), else by distinct value. Unifies Geneva's partition_by +
partition_by_indexed_column into one knob.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirrors the sync ergonomics on the async surface: AsyncConnection
create_function(udf, replace=)/create_view/job; AsyncTable.add_computed_column;
AsyncView + AsyncJobHandle (await + asyncio.sleep; shared submission-prefix
matcher with the sync JobHandle). Decorator + REST routes are shared/already
validated; this is the async wrapper layer. Exported from the package root.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
db.job(id) gets the submission id the refresh/backfill endpoints return,
but list_jobs / cancel report the agent's manifest id
(<table>-<type>-<first 8 of submission id>). JobHandle now matches that
(exact id or submission prefix) so wait()/progress() truly track, and
cancel() cancels by the resolved canonical id instead of the unusable
submission uuid.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the @udf/@table_udf decorator + type inference into lancedb as
lancedb.udf (Apache-2.0), and adds the ergonomic glue to the existing
connection/table so there's no separate object model:
- create_function() accepts a Udf (and a replace= flag)
- Table.add_computed_column(column, udf)
- create_view(name, source, select, ...) -> View (assembles the SELECT)
- Connection.job(job_id) -> JobHandle
- View / JobHandle are thin references over a connection
Exports udf/table_udf/Udf/JobHandle/View from the package root. The
operations stay the existing remote-only methods (enterprise/cloud); the
decorator works locally.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Exposes the existing server-side CANCEL JOB (CoordinatorCatalog::cancel_job)
as a REST-backed SDK method: Database trait default NotSupported,
RemoteDatabase POSTs /v1/job/{id}/cancel, pyo3 binding, sync+async python
wrappers. Best-effort: a missing job returns false, not an error. Mock-HTTP
unit test in test_derived_compute_routes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per the interface design: computed columns are parameters on the
existing add_columns operation, not a separate method.
- BaseTable::add_computed_columns((name, sql_type) pairs + a f(args)
expression) -- default NotSupported; RemoteTable posts 'computed'
entries to the existing /v1/table/{id}/add_columns route.
- python add_columns gains computed= on LanceTable, RemoteTable, and
AsyncTable: tbl.add_columns(computed={'doubled': ('FLOAT',
'double_it(val)')}); grouped by expression so struct-returning
functions' columns land adjacently.
Adds the derived-compute interface to the SDK:
- Database trait: create/list/drop_function, create/refresh/alter/
drop/list_materialized_view, list_jobs -- default implementations
return Error::NotSupported (NotImplementedError in python), so
existing Database impls are unaffected; local single-node
implementations are planned. BaseTable gains refresh_column with
the same default.
- RemoteDatabase/RemoteTable implement them against the server REST
routes (/v1/function/*, /v1/materialized_view/*, /v1/job/list,
/v1/table/{id}/refresh_column), with mock-HTTP unit tests.
- Connection/Table public methods, pyo3 bindings (FunctionInfo,
MaterializedViewInfo, JobInfo pyclasses), and python wrappers:
sync on the DBConnection base (shared by local and remote
connections), async on AsyncConnection; refresh_column on
LanceTable, RemoteTable, and AsyncTable.
Fixes#2934
## Problem
Passing a `RemoteTable` to `permutation_builder()` raises a cryptic
`AttributeError`:
```
AttributeError: 'RemoteTable' object has no attribute '_inner'
```
This leaves users confused about what went wrong and why.
## Root Cause
`PermutationBuilder.__init__()` calls `async_permutation_builder(table)`
which accesses `table._inner` — the underlying Rust Lance table object.
`RemoteTable` connects to LanceDB Cloud/Enterprise and does not have a
local `_inner` attribute, making permutations fundamentally unsupported
on remote tables.
## Solution
Added an early check in `PermutationBuilder.__init__()` that verifies
the table has `_inner` before calling the Rust function, raising a clear
`TypeError` with an explanation of why permutations don't work on remote
tables.
## Verification
- Syntax validated with `ast.parse()`
- Structural verification: single call site (`permutation_builder()`),
guard placed before Rust FFI call
- Error message tested with mock: `MockRemoteTable()` correctly triggers
`TypeError`
## Changelog
| Date | Change | Author |
|------|--------|--------|
| 2026-06-28 | Added remote table guard in PermutationBuilder.__init__ |
rtmalikian |
### Files Changed
- python/python/lancedb/permutation.py — Added `hasattr(table,
"_inner")` check with clear error
---
**About the Author:** Raphael Malikian — Clinical AI Solutions
Architect. I specialise in building and fixing AI/ML systems for
healthcare, including vector databases, RAG pipelines, and clinical NLP.
If you need help with your project or think I can add value to your
organisation, feel free to reach out — I'd love to connect.
📧rtmalikian@gmail.com🔗 GitHub: https://github.com/rtmalikian🔗 LinkedIn:
http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a
---
**Disclosure:** This code was developed with assistance from
deepseek-v4-pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed, tested against the actual codebase, and verified for
correctness.
Signed-off-by: rtmalikian <rtmalikian@gmail.com>
Expose the merged Rust OAuth header provider through the Python async
connection path.
Includes:
- Python OAuthConfig and OAuthFlowType public config objects
- PyO3 conversion into the Rust OAuthConfig
- connect_async(oauth_config=...) plumbing
- repr redaction coverage for client_secret
Local validation: cargo fmt --all; ruff format/check on touched Python
files.
By default the read freshness provider was not included in the namespace
client, preventing the read freshness headers from being included in the
request. This prevents checkout_latest() from working as expected when
using the namespace client.
This fix ensures the provided is built into the client when the
namespace impl and properties are provided.
Fixes#3563
## Summary
- Add `stacklevel=2` to 10 `warnings.warn()` calls across 4 files
- Fix broken message concatenation in `table.py` where the second string
was incorrectly passed as the `category` parameter
## Problem
Multiple `warnings.warn()` calls in the `python/lancedb/` codebase were
missing the `stacklevel` parameter. Without `stacklevel=2`, warnings
point to library internals instead of the caller's code, making it
impossible for users to identify which of their function calls triggered
the warning.
Additionally, two calls in `table.py` (lines 3411 and 3420) had a more
serious bug: the deprecation message was split across two separate
string arguments, causing the second string to be passed as the
`category` parameter instead of being concatenated with the first
string. This would cause `TypeError` when the warning was triggered.
## Changes
| File | Fixes | Description |
|------|-------|-------------|
| `embeddings/colpali.py` | 1 | Add `stacklevel=2` to
`use_token_pooling` deprecation warning |
| `remote/db.py` | 3 | Add `stacklevel=2` to `request_thread_pool`,
`connection_timeout`, `read_timeout` deprecation warnings |
| `remote/table.py` | 3 | Add `stacklevel=2` to `cleanup_old_versions`,
`compact_files`, `optimize` no-op warnings |
| `table.py` | 3 | Fix broken message concatenation for
`data_storage_version` and `enable_v2_manifest_paths` deprecation
warnings + add `stacklevel=2` to `retrain` deprecation warning |
## Verification
```python
# All warnings.warn() calls now have stacklevel
python3 -c "import ast, os; ..."
# Result: All warnings.warn() calls now have stacklevel!
```
## Changelog
| Date | Change | Author |
|------|--------|--------|
| 2026-06-20 | Fix missing stacklevel=2 in 10 warnings.warn() calls +
fix broken message concatenation | rtmalikian |
### Files Changed
- `python/python/lancedb/embeddings/colpali.py` — Add stacklevel=2
- `python/python/lancedb/remote/db.py` — Add stacklevel=2 to 3
deprecation warnings
- `python/python/lancedb/remote/table.py` — Add stacklevel=2 to 3 no-op
warnings
- `python/python/lancedb/table.py` — Fix broken message concatenation +
add stacklevel=2
### Verification
- AST-based audit confirms all `warnings.warn()` calls now include
`stacklevel=2`
- Syntax check passes for all 4 modified files
---
**About the Author:** Raphael Malikian — Clinical AI Solutions
Architect. I specialise in building and fixing AI/ML systems for
healthcare, including vector databases, RAG pipelines, and clinical NLP.
If you need help with your project or think I can add value to your
organisation, feel free to reach out — I'd love to connect.
📧rtmalikian@gmail.com🔗 GitHub: https://github.com/rtmalikian🔗 LinkedIn:
http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a
---
**Disclosure:** This code was developed with assistance from **Hermes
Agent** (Nous Research). All changes were reviewed, tested against the
actual codebase, and verified for correctness.
Signed-off-by: rtmalikian <rtmalikian@gmail.com>
The server now serializes an index's `created_at` as an RFC 3339 string
(e.g. `"2026-06-18T21:37:36.637Z"`), but the client deserializer only
accepted a unix timestamp in milliseconds. This caused `list_indices` to
fail with:
```
Failed to parse list_indices response: invalid type: string "2026-06-18T21:37:36.637Z", expected a unix timestamp in milliseconds
```
This PR replaces the fixed millisecond deserializer with a custom one
that accepts both an RFC 3339 string (current server) and a
unix-millisecond integer (legacy deployments), so the client works
against any server version.
It also improves the `IndexConfig` repr in the Python bindings.
Previously it printed only three fields (`Index(FTS, columns=["text"],
name="text_idx")`), hiding the metadata that `list_indices` returns. It
now renders every populated field, omitting any that are `None`. Each
value is valid Python — integer counts use `_` thousands separators and
`created_at` uses the `datetime` repr — so values round-trip. The real
repr is a single line; it's wrapped here for readability:
```python
>>> table.list_indices()
[IndexConfig(
name="text_idx",
index_type="FTS",
columns=["text"],
index_uuid="aefd3e00-2f95-4bdc-92ac-06de84442bf1",
type_url="/lance.table.InvertedIndexDetails",
created_at=datetime.datetime(2026, 6, 18, 21, 37, 36, 637000, tzinfo=datetime.timezone.utc),
num_indexed_rows=2,
size_bytes=3_669,
num_segments=1,
index_version=1,
index_details={
'lance_tokenizer': None,
'base_tokenizer': 'simple',
'language': 'English',
'with_position': False,
'max_token_length': 40,
'lower_case': True,
'stem': True,
'remove_stop_words': True,
'custom_stop_words': None,
'ascii_folding': True,
'min_ngram_length': 3,
'max_ngram_length': 3,
'prefix_only': False,
},
)]
```
Fixes#3556🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Create release commit" workflow (`make-release-commit.yml`) has
failed on its last two runs; no release tags have been created since
June 4. Since this workflow creates the tag that the cargo/npm/pypi/java
publish workflows trigger off of, all recent releases are effectively
blocked.
The workflow installs `bump-my-version` unpinned. Version `1.4.0` added
a check that refuses to run `pre_commit_hooks` containing shell syntax
(pipes, `&&`, `if`, variable expansion) unless `allow_shell_hooks =
true` is set. Both bumpversion configs use such hooks:
- `python/.bumpversion.toml` — updates `Cargo.lock` after the bump
(fails first)
- `.bumpversion.toml` — runs `mvn versions:set` for the Java packages
The job dies at the version-bump step with:
> Hook '…' contains shell syntax (pipes, redirects, or variable
expansion). Set `allow_shell_hooks = true` in your configuration to
enable shell execution…
This sets `allow_shell_hooks = true` in both configs to restore the
previous behavior.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary
- clarify the Python error for passing a single dictionary to table
creation/add paths
- add a regression test for `create_table(..., data=dict)` so it points
users to a list of dictionaries
Fixes#409
## Testing
- `python -m pytest python/tests/test_table.py -q`
- `python -m ruff format python/lancedb/table.py
python/lancedb/scannable.py python/tests/test_table.py`
- `python -m ruff check python/lancedb/table.py
python/lancedb/scannable.py python/tests/test_table.py`
### Bug
`value_to_sql({...})` builds a DataFusion `named_struct(...)` literal
but interpolates the struct field names directly as `f"'{k}'"`. A field
name that contains a single quote therefore produces invalid SQL:
```python
>>> from lancedb.util import value_to_sql
>>> value_to_sql({"it's": 1})
"named_struct('it's', 1)" # invalid SQL — the quote terminates the literal
```
String *values* are already escaped (single quotes doubled) by the `str`
branch of `value_to_sql`, so keys and values were handled
inconsistently. This affects `Table.update(values={...})` /
`merge_insert` when a struct column has a field name containing `'`.
### Fix
Render the key through `value_to_sql(str(k))` so field names are escaped
exactly like string values:
```python
>>> value_to_sql({"it's": 1})
"named_struct('it''s', 1)"
```
Keys without special characters are unchanged (`'a'` stays `'a'`), so
existing behavior is preserved.
### Verification
```
$ pytest python/tests/test_util.py -k value_to_sql_dict
```
The new `test_value_to_sql_dict_key_escaping` covers quoted keys (incl.
nested structs) and fails on `main` (`named_struct('it's', 1)`), passes
with this change; the existing `test_value_to_sql_dict` still passes.
Co-authored-by: JSap0914 <JSap0914@users.noreply.github.com>
Closes#3502
## Problem
A bare, unparameterised `typing.List` / `typing.Tuple` field crashes
`to_arrow_schema` with an opaque `AttributeError: __args__`:
```python
from typing import Tuple
from lancedb.pydantic import LanceModel
class Doc(LanceModel):
items: Tuple
Doc.to_arrow_schema() # AttributeError: __args__
```
In `_py_type_to_arrow_type`, the branch `elif getattr(py_type,
"__origin__", None) in (list, tuple)` is taken for a bare generic (its
`__origin__` is `list / tuple`), but the next line reads
`py_type.__args__[0]`, and a bare generic has no `__args__`. Other
unsupported types (e.g. `Dict[str, int]`) correctly raise a clear
`TypeError`, so this case is inconsistent.
Fix
Guard the element-type lookup with `getattr(py_type, "__args__", None)`
and raise a clear `TypeError` when it is missing, matching the existing
behavior for other unsupported types. Bare builtin list / tuple are
unaffected (their `__origin__` is `None`, so they already fall through
to the existing `TypeError`).
Testing
- Added `test_bare_generic_raises_type_error` covering both `List` and
`Tuple`.
- ruff format and ruff check clean.
### Description
Adding branch support for RemoteTable by threading a branch selector
onto every operation the data plane accepts it on. Exposes the
currentBranch to nodejs and python through the bindings.
Matching the server handlers, the branch rides as:
- a `?branch=` query parameter for Arrow-body and query-only ops
(insert, merge_insert, multipart_*, version/list, drop_index)
- a `branch` field in the JSON body for everything else (count_rows,
query, update, delete, create_index, column ops, index list/stats,
stats, restore, describe, tags create/update)
A main-branch handle (`branch == None`) produces byte-identical requests
to before: no `branch` field and no `?branch=`
- Handle-per-branch: `create_branch` / `checkout_branch` return a new
handle with fresh caches and reset version/freshness state, mirroring
`NativeTable`.
- `create_branch` maps 409 to already-exists, 400 to invalid, and 404 to
not-found with source context, and sends without retry so the 409 stays
observable.
- `Ref` translation covers version, version-number (relative to the
handle's branch), and tag (resolved via the tags endpoint); `"main"` and
empty normalize to the main branch.
- Python branch handles persist their branch (and pinned version) across
pickle/fork, so a forked or pickled handle reopens on its branch rather
than silently reverting to main.
### Tests
- Rust mock tests per op category (query-param and body mechanisms,
branch CRUD, error paths, backward-compat).
- Python sync branch CRUD, `open_table(branch=)`, and a pickle
round-trip regression test.
## Summary
Surfaces the rich per-index metadata added in #3497 to the Python and
Node.js language bindings. Closes#3495.
New optional fields exposed on `IndexConfig` in both bindings:
- `index_uuid` / `indexUuid` — UUID of the first index segment
- `type_url` / `typeUrl` — protobuf type URL for the index
- `created_at` / `createdAt` — creation timestamp (milliseconds since
Unix epoch)
- `num_indexed_rows` / `numIndexedRows` — rows covered by the index
- `num_unindexed_rows` / `numUnindexedRows` — rows not yet indexed
- `size_bytes` / `sizeBytes` — total index file size in bytes
- `num_segments` / `numSegments` — number of index segments
- `index_version` / `indexVersion` — on-disk format version
- `index_details` / `indexDetails` — type-specific JSON details string
All fields are `None`/`undefined` for remote tables (which don't yet
surface this metadata through the server response).
## Changes
- `python/src/index.rs`: extend `IndexConfig` pyclass; update `From`
impl; update `__getitem__`
- `python/python/lancedb/_lancedb.pyi`: add type hints for new fields
- `python/python/tests/test_table.py`: new `test_index_config_fields`
test
- `nodejs/src/table.rs`: extend `IndexConfig` napi struct; update `From`
impl
- `nodejs/__test__/table.test.ts`: new test; update existing `toEqual`
assertions to `expect.objectContaining` to accommodate new fields
## Test plan
- [x] Python: `uv run --extra tests pytest
python/tests/test_table.py::test_index_config_fields`
- [x] Node.js: `pnpm test __test__/table.test.ts`
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
Closes#3412
Implements `rename_table` for `LanceNamespaceDatabase` (sync and async
Python) and the Rust `NamespaceDatabase` backend. Previously these
raised `NotImplementedError`; this PR delegates to the
`LanceNamespace.rename_table` method which is part of the
lance-namespace spec.
### Changes
- **`rust/lancedb/src/database/namespace.rs`**: Remove the
`NotImplementedError` stub for `rename_table`. Build a
`RenameTableRequest` (with `id`, `new_table_name`, and optionally
`new_namespace_id`) and call `self.namespace.rename_table(...)`,
mirroring the existing `drop_table` pattern.
- **`python/python/lancedb/namespace.py`**: Import `RenameTableRequest`
from `lance_namespace`. Replace the `raise NotImplementedError` in both
`LanceNamespaceDatabase.rename_table` (sync) and
`AsyncLanceNamespaceDatabase.rename_table` (async) with a call to
`self._namespace_client.rename_table(request)`.
- **`python/python/tests/test_namespace.py`**: Replace the
`test_rename_table_not_supported` test (which checked for
`NotImplementedError`) with `test_rename_table`, which:
1. Creates a table in a namespace
2. Calls `rename_table` with `cur_namespace_path` and
`new_namespace_path`
3. Asserts the old name is gone from `table_names()`
4. Asserts the new name appears in `table_names()`
5. Verifies the renamed table can be opened
## Test plan
- [ ] Existing namespace tests pass in CI (all rely on
`lance.namespace.DirectoryNamespace` which requires the full lance
package)
- [ ] `test_rename_table` exercises the full rename path: create →
rename → verify old gone → verify new present → open
- [ ] Rust build passes with the updated `namespace.rs` (requires Rust
toolchain in CI)
## Summary
Closes#3406
Add a regression matrix in `python/python/tests/test_nested_fields.py`
that exercises the full nested field index lifecycle for both the sync
and async Python table APIs. The tests will fail if any implementation
regresses to leaf-only field names in `list_indices`, `index_stats`,
search, or filter results.
## Test scenarios covered
**Index types:** BTree scalar, IvfPq vector, FTS
**Field-name edge cases (per acceptance criteria):**
- `rowId` — camelCase top-level field
- `` `row-id` `` — hyphenated top-level field (escaped)
- `parent.`\``leaf.name`\`` ` — struct leaf whose name contains a
literal dot
- `MetaData.userId` — mixed-case nested path
- `` `meta-data`.`user-id` `` — hyphenated struct with hyphenated leaf
**Lifecycle operations per index type:**
- `create_index` / `create_scalar_index` / `create_fts_index`
- `list_indices` → verify canonical full dotted path (not leaf name)
- `index_stats` → verify row count and index type
- Filtered scan (`WHERE nested.field = value`)
- Vector search via nested embedding column
- FTS search via nested text column
- `add` (append) then re-check index listing
- `optimize` then re-check index listing
**Both sync and async APIs** are covered in parallel test classes.
## Notes
Lance forbids top-level field names that contain a literal `.`, so the
`` `a.b` `` acceptance-criterion variant is exercised as a *struct leaf*
field (`parent.`\``leaf.name`\``) rather than a top-level column.
Another little pain point as I was working to integrate with
paperless-ngx. The read path of table.search() or table.query() already
accepted an Expr, but write paths Table.delete and
merge_insert(...).when_not_matched_by_source_delete did not. This PR
attempts to close that gap, so writes and reads can both use Expr,
instead of one side needing to build a string.
The `Expr` build already includes a lot of useful filtering options,
`eq, ne, gt/gte, lt/lte, and_, or_, contains, cast`, but is was missing
a membership like `isin`. This PR adds that support, as minimally as
possible, allowing easy filtering for membership in a list, without
needing to be a series of `where` expressions.
I didn't see anything in CONTRIBUTING.md about needing a feature request
or issue first, so I just made the change. My apologies if I missed that
somewhere.
Thanks for the vector store, we're using it now in paperless-ngx.
Adds an FM-Index — a scalar index over string and binary columns that
accelerates substring search (`contains(col, 'needle')`), distinct from
the tokenized `FTS` index — across the Rust core and the Python and
TypeScript bindings.
## Rust
- `Index::Fm(FmIndexBuilder)` and `IndexType::Fm`.
- `make_index_params` maps `Index::Fm` to Lance's
`ScalarIndexParams::for_builtin(BuiltinIndexType::Fm)`.
- `supported_fm_data_type` validates
`Utf8`/`LargeUtf8`/`Binary`/`LargeBinary` columns.
- `list_indices` round-trips the type (`"Fm"` → `IndexType::Fm`); the
remote wire type is `"FM"`.
## Python
Adds `lancedb.index.Fm`, accepted by `create_index`:
```python
from lancedb.index import Fm
await tbl.create_index("text", config=Fm())
```
## TypeScript
Adds the `Index.fm()` factory:
```ts
await tbl.createIndex("text", { config: Index.fm() });
```
## Summary
This PR extends nested-field regression coverage across Rust
local/remote, Python sync/async, and Node so canonical escaped paths
stay consistent across scalar, vector, and FTS index lifecycle behavior.
It also aligns LanceDB's LabelList type gate with Lance by accepting
`LargeList<primitive>` columns while keeping `List<Struct<...>>`
unsupported until Lance defines stable membership semantics for struct
labels.
Part of #3406.
## What's broken
`Table.update(values={...})` raises `NotImplementedError: SQL conversion
is not implemented for this type` when a value is a numpy scalar such as
`np.int64`, `np.int32`, `np.float32`, or `np.bool_`. These arise
naturally from indexing an ndarray or a pandas int/bool column.
`np.float64` happens to work (it subclasses `float`), which makes the
failure inconsistent and surprising.
```python
df = pd.DataFrame({"id": np.array([10, 20], dtype="int32")})
t.update(where="id = 1", values={"id": df["id"].iloc[0]}) # np.int32
# -> NotImplementedError: SQL conversion is not implemented for this type
```
## Why it happens
`value_to_sql` is a `singledispatch` with handlers only for native
Python types and `np.ndarray`; numpy `integer`/`floating`/`bool_`
scalars aren't Python subclasses, so they fall through to the
`NotImplementedError` base.
## Fix
Register handlers for `np.bool_`, `np.integer`, and `np.floating` that
delegate to the existing native handlers.
## Test
`value_to_sql` on `np.int32/int64/float32/float64/bool_` all convert;
`np.int32` raised before.
Co-authored-by: Ishaan Samantray <ishaansamantray@Ishaans-MacBook-Pro.local>
### Description
Stacked on #3490. Adds an optional version to branch checkout across the
Rust core and the Python and TypeScript SDKs, so you can open a specific
version on a branch ("version V of branch B"), not just the branch's
latest version
Rust
```rust
// Open version 3 of branch "exp" (a read-only view): check out from an
// existing table, or open it directly from the connection.
let exp_v3 = table.checkout_branch("exp", Some(3)).await?;
let exp_v3 = db.open_table("items").branch("exp").version(3).execute().await?;
// checkout_latest re-attaches to the branch's writable HEAD.
exp_v3.checkout_latest().await?;
// With no branch, a version opens main at that version.
let main_v3 = db.open_table("items").version(3).execute().await?;
```
Python
```python
# Open version 3 of branch "exp" (a read-only view): check out from an
# existing table, or open it directly from the connection.
branch_v3 = await table.branches.checkout("exp", version=3)
branch_v3 = await db.open_table("items", branch="exp", version=3)
# checkout_latest re-attaches to the branch's writable HEAD.
await branch_v3.checkout_latest()
# With no branch, a version opens main at that version.
main_v3 = await db.open_table("items", version=3)
```
TypeScript
```typescript
// Open version 3 of branch "exp" (a read-only view): check out from an
// existing table, or open it directly from the connection.
const branchV3 = await (await table.branches()).checkout("exp", 3);
const opened = await db.openTable("items", undefined, { branch: "exp", version: 3 });
// checkoutLatest re-attaches to the branch's writable HEAD.
await branchV3.checkoutLatest();
// With no branch, a version opens main at that version.
const mainV3 = await db.openTable("items", undefined, { version: 3 });
```
### Testing
- Added unit tests (Rust, Python sync + async, TypeScript):
branch-scoped resolution at a version number shared with `main` and with
another branch, read-only enforcement on a pinned handle,
`checkout_latest` recovery to the branch's HEAD, fork-point reads, and
the nonexistent-version/branch error paths.
- Ran smoke tests against the Python and TypeScript SDKs on local
machine.
### Description
Adds first-class support for table branches across the Rust core and the
Python and TypeScript SDKs.
Rust
```rust
use lance::dataset::refs::Ref;
// Create a branch from main and write to it — main is untouched.
let exp = table.create_branch("exp", Ref::Version(None, None)).await?;
exp.add(batches).await?;
// Reopen the branch later: check out from a table, or open it directly.
let exp = table.checkout_branch("exp").await?;
let exp = db.open_table("items").branch("exp").execute().await?;
let branches = table.list_branches().await?;
table.delete_branch("exp").await?;
```
Python
```python
# Create a branch from main and write to it
branch = await table.branches.create("exp", from_ref="main")
await branch.add(data)
# Reopen the branch later: check out from a table, or open it directly.
branch = await table.branches.checkout("exp")
branch = await db.open_table("items", branch="exp")
await table.branches.list()
await table.branches.delete("exp")
```
TypeScript
```typescript
const branches = await table.branches();
// Create a branch from main and write to it
const branch = await branches.create("exp");
await branch.add(data);
// Reopen the branch later: check out from a table, or open it directly.
const checkedOut = await branches.checkout("exp");
const opened = await db.openTable("items", undefined, { branch: "exp" });
await branches.list();
await branches.delete("exp");
```
### Testing
- Added unit tests
- ran smoke tests against python and typescript sdks on local machine
### Next steps
- Add RemoteTable support
- Add Branch Comparison support
- Merge Branching support
## Bug Fix
### What is the bug?
Namespace-backed `LanceTable.to_arrow()` full-table reads bypassed the
existing `QueryTable` server-side query path and called the lower-level
table `to_arrow()` implementation directly. In Geneva/Sophon this could
fail while parsing the Arrow IPC response for
`hist.get_table().to_arrow()` / `to_pandas()`, even though
`hist.get_table().search().to_arrow()` worked.
### What issues or incorrect behavior does the bug cause?
Full-table reads on namespace-backed tables with `QueryTable` pushdown
could fail with Arrow IPC parse errors, while query/search reads on the
same table succeeded. Since `to_pandas()` delegates through `to_arrow()`
for non-blob/native cases, pandas export was affected too.
### How does this PR fix the problem?
When `QueryTable` pushdown is enabled, sync and async table `to_arrow()`
now construct a plain no-filter, no-limit, all-columns query and execute
it through the table-level `_execute_query()` path. `AsyncTable` now
preserves namespace context from async namespace connections so async
full reads can make the same pushdown decision. Non-namespace tables and
namespace tables without `QueryTable` pushdown keep their existing
behavior.
### Tests
- `uv run --extra tests --extra dev --no-sync ruff check
python/lancedb/table.py python/lancedb/namespace.py
python/tests/test_namespace.py`
- `uv run --extra tests --extra dev --no-sync ruff format
python/lancedb/table.py python/lancedb/namespace.py
python/tests/test_namespace.py`
- `uv run --extra tests --extra dev --no-sync pytest
python/tests/test_namespace.py::TestPushdownOperations::test_lance_table_to_arrow_uses_query_pushdown
python/tests/test_namespace.py::TestAsyncPushdownOperations::test_async_table_to_arrow_uses_query_pushdown
python/tests/test_namespace.py::test_local_table_to_arrow_and_to_pandas_are_unchanged
-q`
- `uv run --extra tests --extra dev --no-sync pytest
python/tests/test_namespace.py -q`
BREAKING CHANGE: direct Rust users lose the `IndexStatistics::loss`
field. Python and Node.js consumers are unaffected in practice for
remote tables (the value was always `None`/absent), but the attribute is
gone for local tables too.
`IndexStatistics::loss` was local-only — LanceDB Cloud never returned
it, so
`RemoteTable::index_stats` always set `loss: None`. It's vestigial; this
removes it.
- Remove `loss` from `IndexStatistics` and the internal `IndexMetadata`
in `rust/lancedb/src/index.rs`, plus the summing logic in
`NativeTable::index_stats`.
- Drop `loss` from the Python and Node.js bindings (and their
tests/docs).
Fixes#3493🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>