Commit Graph

971 Commits

Author SHA1 Message Date
Jack Ye
d059447feb refactor: rename deserialize to deserialize_conn, consistent pushdown naming
- Rename deserialize() -> deserialize_conn() for clarity
- Rename internal _pushdown_operations -> _namespace_client_pushdown_operations
  for consistency with the parameter name
- Rename serialized key "pushdown_operations" ->
  "namespace_client_pushdown_operations"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
6427024bcb fix: route list_namespaces through _namespace_conn for consistency
list_namespaces with empty path was going through Rust (ListingDatabase)
which doesn't see namespaces created via the directory namespace client.
Always delegate to _namespace_conn() so create/list are consistent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
be19a880a9 fix: format, lint, and update tests for namespace delegation
- Run ruff format on all changed files
- Fix F821 forward reference in _namespace_conn return type
- Update test_local_namespace_operations to verify operations succeed
  instead of expecting NotImplementedError (namespace ops now work on
  LanceDBConnection via directory namespace delegation)
- Remove test_local_create_namespace_not_supported and
  test_local_drop_namespace_not_supported (no longer applicable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
e7ed3d5dab fix: merge user storage_options in server-side create_table
_create_table_server_side was only passing self.storage_options
(connection-level) to CreateTableRequest, ignoring the user-provided
storage_options parameter. This caused per-table options like
new_table_data_storage_version to be silently dropped.

Fix both sync and async paths to merge user options on top of
connection options.
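
The merge order described above can be sketched as follows. This is a minimal illustration, not the actual `_create_table_server_side` code: `merge_storage_options` is a hypothetical helper name and the option values are made up.

```python
from typing import Optional


def merge_storage_options(
    connection_options: Optional[dict[str, str]],
    user_options: Optional[dict[str, str]],
) -> dict[str, str]:
    """Return connection-level options overlaid with per-table user options."""
    merged = dict(connection_options or {})
    # User-provided keys win over connection-level keys on conflict.
    merged.update(user_options or {})
    return merged


# Hypothetical option values, for illustration only.
conn_opts = {"region": "us-east-1", "timeout": "30s"}
user_opts = {"new_table_data_storage_version": "stable", "timeout": "60s"}
merged = merge_storage_options(conn_opts, user_opts)
```

Copying into a new dict keeps the connection-level options untouched, so one table's overrides do not leak into later calls.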

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
7ed4b059a6 refactor: rename serialize_to_json/from_serialized_json to serialize/deserialize
Simpler names, docs no longer reference JSON as the serialization format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
f36583d0c3 fix: keep original Rust path for root namespace operations
Only delegate to _namespace_conn() when namespace_path is non-empty.
Root namespace operations (list_namespaces, list_tables with empty
path) still go through the original Rust connection to avoid regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
fea2ef6a0a refactor: generalize worker property overrides with _lancedb_worker_ prefix
Replace the special-cased worker_uri key with a generic mechanism:
any namespace_client_properties key starting with _lancedb_worker_
has the prefix stripped and overrides the corresponding property
when for_worker=True.

e.g. _lancedb_worker_uri overrides uri in worker context.
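
A minimal sketch of the prefix mechanism, assuming the properties are a flat string-to-string dict. `resolve_properties` is a hypothetical name, and dropping the prefixed keys from the non-worker result is an assumption, not something the commit states.

```python
WORKER_PREFIX = "_lancedb_worker_"


def resolve_properties(properties: dict[str, str], for_worker: bool) -> dict[str, str]:
    """Apply `_lancedb_worker_*` overrides when building a worker connection."""
    # Assumption: prefixed keys are never forwarded under their prefixed name.
    resolved = {k: v for k, v in properties.items() if not k.startswith(WORKER_PREFIX)}
    if for_worker:
        for key, value in properties.items():
            if key.startswith(WORKER_PREFIX):
                # Strip the prefix and override the corresponding property.
                resolved[key[len(WORKER_PREFIX):]] = value
    return resolved


# Hypothetical URIs, for illustration only.
props = {"uri": "s3://bucket/db", "_lancedb_worker_uri": "http://worker-local:8080"}
driver_props = resolve_properties(props, for_worker=False)
worker_props = resolve_properties(props, for_worker=True)
```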

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
fc2a9726b2 refactor: delegate child namespace ops to LanceNamespaceDBConnection
Instead of reimplementing namespace logic (describe_table, merge
storage_options, etc.) in LanceDBConnection, delegate child namespace
operations to a LanceNamespaceDBConnection via _namespace_conn().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
e3893dacf8 feat: cache namespace_client and auto-delegate child namespace operations
LanceDBConnection now:
- Caches namespace_client() result to avoid repeated DirectoryNamespace builds
- Auto-delegates open_table/create_table with non-empty namespace_path
  through the directory namespace client
- Routes create_namespace/drop_namespace/describe_namespace/list_namespaces
  through the namespace client
- Routes list_tables/drop_table for child namespaces through namespace client

This enables local storage connections to transparently handle child
namespaces like ["__system"] without requiring a separate
LanceNamespaceDBConnection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
dd2a0ec48f feat: add serialize_to_json/from_serialized_json for DBConnection
Add serialization support to DBConnection classes so connections
can be reconstructed in remote workers without tracking namespace
params separately.

- DBConnection.serialize_to_json() base method
- LanceDBConnection: serializes uri, storage_options, read_consistency_interval
- LanceNamespaceDBConnection: stores namespace_client_impl/properties,
  serializes all connection params including pushdown_operations
- from_serialized_json() factory with for_worker flag for worker_uri swap
- connect_namespace() now passes impl/properties to connection for serialization
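
The round trip can be pictured roughly as below. This is a simplified stand-in, not the real `DBConnection` classes: `ConnectionParams` is hypothetical, `read_consistency_interval` is assumed to be a number of seconds, and `worker_uri` is modeled as a plain optional field.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ConnectionParams:
    """Hypothetical container for the parameters a connection serializes."""
    uri: str
    storage_options: dict = field(default_factory=dict)
    read_consistency_interval: Optional[float] = None  # seconds (assumption)
    worker_uri: Optional[str] = None


def serialize_to_json(params: ConnectionParams) -> str:
    return json.dumps(asdict(params))


def from_serialized_json(payload: str, for_worker: bool = False) -> ConnectionParams:
    data = json.loads(payload)
    # In worker context, swap in the worker-specific URI when one was provided.
    if for_worker and data.get("worker_uri"):
        data["uri"] = data["worker_uri"]
    return ConnectionParams(**data)


original = ConnectionParams(uri="s3://bucket/db", worker_uri="/mnt/cache/db")
payload = serialize_to_json(original)
on_driver = from_serialized_json(payload)
on_worker = from_serialized_json(payload, for_worker=True)
```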

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Lance Release
231f0655ce Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 2026-04-12 03:57:35 +00:00
Lance Release
1f1726369d Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 2026-04-11 22:44:25 +00:00
Lance Release
11bc674548 Bump version: 0.31.0-beta.1 → 0.31.0-beta.2 2026-04-11 07:05:36 +00:00
Dhruv Garg
4761fa9bcb fix(python): migrate gemini-text provider to google-genai sdk (#3250)
## Summary
- migrate gemini-text embedding provider from deprecated
google.generativeai to google.genai
- update Python embedding extra dependency to google-genai
- update default model name to gemini-embedding-001
- adapt embed calls to Client().models.embed_content(...)
- apply lint fixes from CI

## Related
- Closes #3191
2026-04-09 15:28:34 -07:00
lennylxx
4c2939d66e fix(python): guard against None before .decode() on split_names metadata key (#3229)
`.get(b"split_names", None).decode()` was called unconditionally in both
Permutations.__init__ and Permutation.from_tables(), crashing with
AttributeError when schema metadata existed but lacked the split_names
key. Guard the decode behind a None check and add regression tests.
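
A minimal sketch of the guard, assuming the metadata is a bytes-to-bytes dict as in Arrow schema metadata. `decode_split_names` is a hypothetical helper, not the function touched by this commit.

```python
from typing import Optional


def decode_split_names(metadata: Optional[dict[bytes, bytes]]) -> Optional[str]:
    """Decode the split_names entry, or return None when it is absent."""
    raw = (metadata or {}).get(b"split_names")
    # Calling .decode() on None is what raised AttributeError before the fix.
    return raw.decode() if raw is not None else None


with_key = decode_split_names({b"split_names": b'["train", "test"]'})
without_key = decode_split_names({b"other": b"value"})
```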
2026-04-08 16:04:13 -07:00
yaommen
a813ce2f71 fix(python): sanitize bad vectors before Arrow cast (#3158)
## Problem

`on_bad_vectors="drop"` is supposed to remove invalid vector rows before
write, but for some schema-defined vector columns it can still fail
later during Arrow cast instead of dropping the bad row.

Repro:
```python
class MySchema(LanceModel):
    text: str
    embedding: Vector(16)

table = db.create_table("test", schema=MySchema)
table.add(
    [
        {"text": "hello", "embedding": []},
        {"text": "bar", "embedding": [0.1] * 16},
    ],
    on_bad_vectors="drop",
)
```
Before:
```
RuntimeError
Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size.
```
After:
```
rows 1
texts ['bar']
```
## Solution

Make bad-vector sanitization use schema dimensions before cast, while
keeping the handling scoped to vector columns identified by schema
metadata or existing vector-name heuristics.

This also preserves existing integer vector inputs and avoids applying
on_bad_vectors to unrelated fixed-size float columns.
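
As a rough illustration of dropping rows by schema dimension before any cast, here is a plain-Python sketch. It is not the library's implementation: `drop_bad_vectors` is a hypothetical helper, it only checks list length, and it operates on dict rows rather than Arrow data.

```python
def drop_bad_vectors(rows: list[dict], column: str, dim: int) -> list[dict]:
    """Keep only rows whose vector column has exactly `dim` elements."""
    kept = []
    for row in rows:
        vector = row.get(column)
        # A missing, empty, or wrong-length vector counts as bad and is dropped.
        if isinstance(vector, (list, tuple)) and len(vector) == dim:
            kept.append(row)
    return kept


rows = [
    {"text": "hello", "embedding": []},
    {"text": "bar", "embedding": [0.1] * 16},
]
clean = drop_bad_vectors(rows, column="embedding", dim=16)
texts = [row["text"] for row in clean]
```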


Fixes #1670

Signed-off-by: yaommen <myanstu@163.com>
2026-04-08 09:09:41 -07:00
Jack Ye
a898dc81c2 feat: add user_id field to ClientConfig for user identification (#3240)
## Summary

- Add a `user_id` field to `ClientConfig` that allows users to identify
themselves to LanceDB Cloud/Enterprise
- The user_id is sent as the `x-lancedb-user-id` HTTP header in all
requests
- Supports three configuration methods:
  - Direct assignment via `ClientConfig.user_id`
  - Environment variable `LANCEDB_USER_ID`
  - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY`
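
One way the three sources could be resolved is sketched below. The precedence order (direct value, then `LANCEDB_USER_ID`, then the indirect key) is an assumption, and `resolve_user_id` and `build_headers` are hypothetical names rather than the client's API.

```python
import os
from typing import Mapping, Optional


def resolve_user_id(
    configured: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a user id from config or environment (precedence is assumed)."""
    env = os.environ if env is None else env
    if configured:
        return configured
    if env.get("LANCEDB_USER_ID"):
        return env["LANCEDB_USER_ID"]
    # Indirect lookup: this variable names another variable holding the id.
    indirect_key = env.get("LANCEDB_USER_ID_ENV_KEY")
    if indirect_key:
        return env.get(indirect_key)
    return None


def build_headers(user_id: Optional[str]) -> dict[str, str]:
    """Attach the header only when a user id is available."""
    return {"x-lancedb-user-id": user_id} if user_id else {}


# Hypothetical environment, passed explicitly so the example is deterministic.
fake_env = {"LANCEDB_USER_ID_ENV_KEY": "MY_APP_USER", "MY_APP_USER": "alice"}
headers = build_headers(resolve_user_id(env=fake_env))
```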

Closes #3230

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-06 11:20:10 -07:00
Lance Release
0ac59de5f1 Bump version: 0.31.0-beta.0 → 0.31.0-beta.1 2026-04-05 02:50:52 +00:00
LanceDB Robot
d082c2d2ac chore: update lance dependency to v5.0.0-beta.5 (#3237)
## Summary
- update Rust Lance workspace dependencies to `v5.0.0-beta.5` using
`ci/set_lance_version.py`
- update Java `lance-core` dependency property to `5.0.0-beta.5`
- refresh Cargo lockfile to the new Lance tag

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Upstream Tag
- https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-04-04 19:49:51 -07:00
Zelys
9d8699f99e feat(python): support Enum types in Pydantic to Arrow schema conversion (#3232)
## Summary

Fixes #1846.

Python `Enum` fields raised `TypeError: Converting Pydantic type to
Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a
Pydantic model to an Arrow schema.

The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an
Enum subclass is encountered, the value type of its members is inspected
and mapped to the appropriate Arrow type:

- `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()`
- `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()`
- Other homogeneous value types → the Arrow type for that Python type
- Mixed-value or empty enums → `pa.utf8()` (safe fallback)

This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns
used in practice.
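
The member inspection can be sketched without Arrow as follows. This is an illustration of the rule in the list above, not the code in `_pydantic_type_to_arrow_type`: `enum_value_type` is a hypothetical helper that returns the Python value type, which a caller would then map to an Arrow type (for example `str` to `pa.utf8()` and `int` to `pa.int64()`).

```python
from enum import Enum


def enum_value_type(enum_cls: type) -> type:
    """Return the common value type of an Enum, falling back to str."""
    value_types = {type(member.value) for member in enum_cls}
    if len(value_types) == 1:
        return value_types.pop()
    # Mixed-value or empty enums fall back to str (stored as strings).
    return str


class Status(str, Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


class Priority(int, Enum):
    LOW = 1
    HIGH = 2


class Mixed(Enum):
    A = 1
    B = "b"
```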

## Changes

- `python/python/lancedb/pydantic.py`: add Enum branch in
`_pydantic_type_to_arrow_type`
- `python/python/tests/test_pydantic.py`: add `test_enum_types` covering
`str`, `int`, and `Optional` Enum fields

## Note on #2395

PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a
dictionary-encoded type. This PR handles the broader `(str, Enum)` /
`(int, Enum)` mixin pattern that works across all Python versions and
stores values as their natural Arrow type.

AI assistance was used in developing this fix.
2026-04-03 10:40:49 -07:00
Lance Release
590c0c1e77 Bump version: 0.30.2 → 0.31.0-beta.0 2026-04-03 08:45:29 +00:00
Jack Ye
e26b22bcca refactor!: consolidate namespace related naming and enterprise integration (#3205)
1. Refactored every client (Rust core, Python, Node/TypeScript) so
“namespace” usage is explicit: code now keeps namespace paths
(namespace_path) separate from namespace clients (namespace_client).
Connections propagate the client, table creation routes through it, and
managed versioning defaults are resolved from namespace metadata. Python
gained LanceNamespaceDBConnection/async counterparts, and the
namespace-focused tests were rewritten to match the clarified API
surface.
2. Synchronized the workspace with Lance 5.0.0-beta.3 (see
https://github.com/lance-format/lance/pull/6186 for the upstream
namespace refactor), updating Cargo/uv lockfiles and ensuring all
bindings align with the new namespace semantics.
3. Added a namespace-backed code path to lancedb.connect() via new
keyword arguments (namespace_client_impl, namespace_client_properties,
plus the existing pushdown-ops flag). When those kwargs are supplied,
connect() delegates to connect_namespace, so users can opt into
namespace clients without changing APIs. (The async helper will gain
parity in a later change.)
2026-04-03 00:09:03 -07:00
Lance Release
5d550124bd Bump version: 0.30.2-beta.2 → 0.30.2 2026-03-31 21:25:04 +00:00
Lance Release
c57cb310a2 Bump version: 0.30.2-beta.1 → 0.30.2-beta.2 2026-03-31 21:25:02 +00:00
Dan Tasse
97754f5123 fix: change _client reference to _conn (#3188)
This code previously referenced `self._client`, which does not exist.
This change makes it correctly call `self._conn.close()`
2026-03-31 13:29:17 -07:00
Pratik Dey
7b1c063848 feat(python): add type-safe expression builder API (#3150)
Introduces col(), lit(), func(), and Expr class as alternatives to raw
SQL strings in .where() and .select(). Expressions are backed by
DataFusion's Expr AST and serialized to SQL for remote table compat.

Resolves:
- https://github.com/lancedb/lancedb/issues/3044 (Python APIs)
- https://github.com/lancedb/lancedb/issues/3043 (support for filter)
- https://github.com/lancedb/lancedb/issues/3045 (support for
projection)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 11:32:49 -07:00
Will Jones
e3d53dd185 fix(python): skip test_url_retrieve_downloads_image when PIL not installed (#3208)
The test added in #3190 unconditionally imports `PIL`, which is an
optional dependency. This causes CI failures in environments where
Pillow isn't installed (`ModuleNotFoundError: No module named 'PIL'`).

Use `pytest.importorskip` to skip gracefully when Pillow is unavailable.

Fixes CI failure on main.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:48:49 -07:00
Will Jones
66804e99fc fix(python): use correct exception types in namespace tests (#3206)
## Summary
- Namespace tests expected `RuntimeError` for table-not-found and
namespace-not-empty cases, but `lance_namespace` raises
`TableNotFoundError` and `NamespaceNotEmptyError` which inherit from
`Exception`, not `RuntimeError`.
- Updated `pytest.raises` to use the correct exception types.
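
The mismatch comes down to the exception hierarchy, which the sketch below reproduces with stand-in classes. The two error classes here are local placeholders that mirror the inheritance described above; they are not imported from `lance_namespace`.

```python
class TableNotFoundError(Exception):
    """Stand-in: inherits from Exception, not RuntimeError."""


class NamespaceNotEmptyError(Exception):
    """Stand-in: inherits from Exception, not RuntimeError."""


def is_caught_as(exc: Exception, expected_type: type) -> bool:
    """Report whether an `except expected_type` clause would catch `exc`."""
    try:
        raise exc
    except expected_type:
        return True
    except Exception:
        return False


caught_as_runtime = is_caught_as(TableNotFoundError("missing"), RuntimeError)
caught_as_specific = is_caught_as(TableNotFoundError("missing"), TableNotFoundError)
```

`pytest.raises(RuntimeError)` applies the same `isinstance` rule, so it does not match an exception that merely derives from `Exception`.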

## Test plan
- [x] CI passes on `test_namespace.py`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:55:54 -07:00
lennylxx
9f85d4c639 fix(embeddings): add missing urllib.request import in url_retrieve (#3190)
url_retrieve() calls urllib.request.urlopen() but only urllib.error was
imported, causing AttributeError for any HTTP URL input. This affects
open-clip, siglip, and jinaai embedding functions when processing image
URLs.

The bug has existed since the embeddings API refactor (#580) but was
masked because most users pass local file paths or bytes rather than
HTTP URLs.
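
A minimal sketch of the import requirement: `urllib.request` is a submodule and has to be imported explicitly before `urlopen` is used. `fetch_bytes` is a hypothetical helper, not the library's `url_retrieve`, and the `data:` URL is used only so the example runs without network access.

```python
import urllib.error
import urllib.request  # importing urllib.error alone does not provide this


def fetch_bytes(url: str) -> bytes:
    """Fetch the raw bytes behind a URL, wrapping URL errors."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.URLError as err:
        raise ConnectionError(f"could not fetch {url}") from err


# A data: URL is handled by the default opener, so no network is needed.
payload = fetch_bytes("data:text/plain;base64,aGVsbG8=")
```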
2026-03-30 12:03:44 -07:00
lif
4c44587af0 fix: table.add(mode='overwrite') infers vector column types (#3184)
Fixes #3183

## Summary

When `table.add(mode='overwrite')` is called, PyArrow infers input data
types (e.g. `list<double>`) which differ from the original table schema
(e.g. `fixed_size_list<float32>`). Previously, overwrite mode bypassed
`cast_to_table_schema()` entirely, so the inferred types replaced the
original schema, breaking vector search.

This fix builds a merged target schema for overwrite: columns present in
the existing table schema keep their original types, while columns
unique to the input pass through as-is. This way
`cast_to_table_schema()` is applied unconditionally, preserving vector
column types without blocking schema evolution.
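
The merged target schema can be modeled with plain dicts, as in the sketch below. The actual change lives in Rust (`add_data.rs`) and works on Arrow schemas; here a schema is simplified to a column-name-to-type-string mapping and `merged_overwrite_schema` is a hypothetical name.

```python
def merged_overwrite_schema(existing: dict[str, str], incoming: dict[str, str]) -> dict[str, str]:
    """Build the cast target for overwrite: known columns keep their table type."""
    return {
        # Columns already in the table keep the original type;
        # columns unique to the input pass through as inferred.
        name: existing.get(name, inferred)
        for name, inferred in incoming.items()
    }


table_schema = {"id": "int64", "vector": "fixed_size_list<float32>[16]"}
input_schema = {"id": "int64", "vector": "list<double>", "tag": "utf8"}
target = merged_overwrite_schema(table_schema, input_schema)
```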

## Changes

- `rust/lancedb/src/table/add_data.rs`: For overwrite mode, construct a
target schema by matching input columns against the existing table
schema, then cast. Non-overwrite (append) path is unchanged.
- Added `test_add_overwrite_preserves_vector_type` test that creates a
table with `fixed_size_list<float32>`, overwrites with `list<double>`
input, and asserts the original type is preserved.

## Test Plan

- `cargo test --features remote -p lancedb -- test_add_overwrite` — all
4 overwrite tests pass
- Full suite: 454 passed, 2 failed (pre-existing `remote::retry` flakes
unrelated to this change)

---------

Signed-off-by: majiayu000 <1835304752@qq.com>
2026-03-30 10:57:33 -07:00
lennylxx
1d1cafb59c fix(python): don't assign dict.update() return value in _sanitize_data (#3198)
dict.update() mutates in place and returns None. Assigning its result
caused with_metadata(None) to strip all schema metadata when embedding
metadata was merged during create_table with embedding_functions.
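
The pitfall is easy to reproduce in isolation. The helpers below are hypothetical and only illustrate the pattern; they are not the `_sanitize_data` code.

```python
def merge_metadata_buggy(existing: dict, extra: dict):
    # Bug: dict.update() mutates in place and returns None.
    return dict(existing).update(extra)


def merge_metadata_fixed(existing: dict, extra: dict) -> dict:
    merged = dict(existing)
    merged.update(extra)  # mutate first, then return the dict itself
    return merged


existing = {b"source": b"unit-test"}
extra = {b"embedding_functions": b"[]"}
buggy = merge_metadata_buggy(existing, extra)
fixed = merge_metadata_fixed(existing, extra)
```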
2026-03-30 10:15:45 -07:00
Dan Tasse
cca6a7c989 fix: raise instead of return ValueError (#3189)
A couple of code paths returned a ValueError instance instead of raising
it; they now raise it.
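
The difference is shown below with two hypothetical validators (not the functions changed in this commit). Returning the exception hands the caller an ordinary object, so the error goes unnoticed unless the caller inspects the return value.

```python
def check_limit_buggy(limit: int):
    if limit < 0:
        return ValueError("limit must be non-negative")  # silently returned
    return limit


def check_limit_fixed(limit: int) -> int:
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return limit


returned = check_limit_buggy(-1)

try:
    check_limit_fixed(-1)
    raised = False
except ValueError:
    raised = True
```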
2026-03-25 18:49:29 -07:00
Lance Release
76429730c0 Bump version: 0.30.2-beta.0 → 0.30.2-beta.1 2026-03-25 16:21:26 +00:00
Lance Release
f4d613565e Bump version: 0.30.1 → 0.30.2-beta.0 2026-03-25 03:22:55 +00:00
Will Jones
1d6e00b902 feat: progress bar for add() (#3067)
## Summary

Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.

### Usage

Pass `progress=True` to get an automatic tqdm bar:

```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```

Or pass a tqdm bar for more control:

```python
from tqdm import tqdm

with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```

Or use a callback for custom progress handling:

```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")

table.add(data, progress=on_progress)
```

In Rust:

```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```

### Details

- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.
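
The guard idea translates to Python as a context manager. The sketch below is only an analogy for the Rust `ActiveTaskGuard`, under the assumption that a lock-protected counter is an acceptable stand-in for atomics; `ActiveTaskCounter` is a hypothetical class.

```python
import threading
from contextlib import contextmanager


class ActiveTaskCounter:
    """Counts in-flight tasks; the count drops even when a task fails."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0

    @property
    def active(self) -> int:
        with self._lock:
            return self._active

    @contextmanager
    def task(self):
        with self._lock:
            self._active += 1  # increment on entry (guard creation)
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1  # decrement on exit (guard drop)


counter = ActiveTaskCounter()
with counter.task():
    during = counter.active
after = counter.active
```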

### TODO

- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 16:14:13 -07:00
Prashanth Rao
ed7e01a58b docs: fix rendering issues with missing index types in API docs (#3143)
## Problem

The generated Python API docs for
`lancedb.table.IndexStatistics.index_type` were misleading because
mkdocstrings renders that field’s type annotation directly, and the
existing `Literal[...]` listed only a subset of the actual canonical SDK
index type strings.

Current (missing index types):
<img width="823" height="83" alt="image"
src="https://github.com/user-attachments/assets/f6f29fe3-4c16-4d00-a4e9-28a7cd6e19ec"
/>


## Fix

- Update the `IndexStatistics.index_type` annotation in
`python/python/lancedb/table.py` to include the full supported set of
canonical values, so the generated docs show all valid index_type
strings inline.
- Add a small regression test in `python/python/tests/test_index.py` to
ensure the docs-facing annotation does not drift silently again in case
we add a new index/quantization type in the future.
- Bump mkdocs to 1.6, along with the Material theme version, to allow
access to more features such as hooks.

After fix (all index types are included and tested for in the
annotations):
<img width="1017" height="93" alt="image"
src="https://github.com/user-attachments/assets/66c74d5c-34b3-4b44-8173-3ee23e3648ac"
/>
2026-03-20 09:34:42 -07:00
Lance Release
f5b21c0aa4 Bump version: 0.30.1-beta.0 → 0.30.1 2026-03-20 00:35:03 +00:00
Lance Release
e927924d26 Bump version: 0.30.0 → 0.30.1-beta.0 2026-03-20 00:35:02 +00:00
marca116
3a200d77ef fix: pre-filtering on hybrid search (#3096)
Before this fix, hybrid search with a where filter silently inverted the
prefilter argument: prefilter=True performed post-filtering and
prefilter=False performed pre-filtering.
2026-03-16 21:48:42 -07:00
Lance Release
c89240b16c Bump version: 0.30.0-beta.6 → 0.30.0 2026-03-16 22:46:19 +00:00
Lance Release
099ff355a4 Bump version: 0.30.0-beta.5 → 0.30.0-beta.6 2026-03-16 22:46:17 +00:00
Weston Pace
25eb1fbfa4 fix: restore storage options on copy in localstack tests (#3148) 2026-03-16 14:02:19 -07:00
Mesut-Doner
c2e543f1b7 feat(rust): support Expr in projection query (#3069)
Follows the existing [`Select::Dynamic`] implementation.

Closes #3039
2026-03-13 12:54:26 -07:00
Weston Pace
216c1b5f77 docs: remove experimental label from optimize and warn about delete_unverified (#3128)
## Summary
- Removes the "Experimental API" section from `optimize` method
documentation across Rust, Python, and TypeScript
- Adds a warning to `delete_unverified` documentation in all bindings:
this should only be set to true if you can guarantee no other process is
working on the dataset, otherwise it could be corrupted
- Fixes a typo ("shoudl" → "should")

Closes #3125


🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:37:42 +08:00
Esteban Gutierrez
f951da2b00 feat: support prewarm_index and prewarm_data on remote tables (#3110)
## Summary

- Implement `RemoteTable.prewarm_data(columns)` calling `POST
/v1/table/{id}/page_cache/prewarm/`
- Implement `RemoteTable.prewarm_index(name)` calling `POST
/v1/table/{id}/index/{name}/prewarm/` (previously returned
`NotSupported`)
- Add `BaseTable::prewarm_data(columns)` trait method and `Table` public
API in Rust core
- Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`,
`RemoteTable`) for `prewarm_data`
- Add type stubs for `prewarm_index` and `prewarm_data` in
`_lancedb.pyi`
- Upgrade Lance to 3.0.0-rc.3 with breaking change fixes

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:39:39 -05:00
Esteban Gutierrez
6530d82690 chore: dependency updates and security fixes (#3116)
## Summary

- Update dependencies across Rust, Python, Node.js, Java, Docker, and
docs
- Pin unpinned dependency lower bounds to prevent silent downgrades
- Bump CI actions to current major versions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:04:27 -07:00
Lance Release
6de8f42dcd Bump version: 0.30.0-beta.4 → 0.30.0-beta.5 2026-03-09 19:56:15 +00:00
Will Jones
5c3bd68e58 feat: upgrade Lance to 3.0.0-rc.3 (#3104)
Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-03-09 12:55:20 -07:00
Xuanwo
68c07f333f chore: unify component README titles (#3066) 2026-03-09 21:47:58 +08:00
Lance Release
f31561c5bb Bump version: 0.30.0-beta.3 → 0.30.0-beta.4 2026-03-09 08:45:25 +00:00