Commit Graph

2450 Commits

Author SHA1 Message Date
Jack Ye
f54f5600ad refactor(python): use namespace-backed rust connection 2026-04-18 17:09:13 -07:00
Jack Ye
e34fe84c7f refactor(rust): align namespace client naming 2026-04-18 00:49:04 -07:00
Jack Ye
5b1f248257 fix(python): use rust namespace connection directly 2026-04-18 00:28:23 -07:00
Jack Ye
95e34d47b9 refactor(python): decouple namespace tables from temp connections 2026-04-17 20:30:52 -07:00
Jack Ye
a0defd448f fix(python): handle namespace tables via resolved locations 2026-04-17 15:39:43 -07:00
Jack Ye
0fadb65153 refactor(python): document temporary namespace connections 2026-04-17 14:44:39 -07:00
Jack Ye
15fbcf61fc fix(python): use declared location for namespace table open 2026-04-17 09:58:00 -07:00
Lance Release
d715bbb588 Bump version: 0.28.0-beta.6 → 0.28.0-beta.7 2026-04-17 08:12:27 +00:00
Lance Release
5ce3d8d141 Bump version: 0.31.0-beta.6 → 0.31.0-beta.7 python-v0.31.0-beta.7 2026-04-17 08:12:03 +00:00
Jack Ye
5eaac178b1 fix(python): pass namespace client on schema-only table create (#3283)
## Summary
- pass `namespace_client` through the Python create-table path
- ensure schema-only namespace table creation uses the namespace-aware
empty-table flow
- fix reopening namespace tables created without initial data
2026-04-17 01:11:18 -07:00
Lance Release
11af763fcd Bump version: 0.28.0-beta.5 → 0.28.0-beta.6 2026-04-16 18:57:28 +00:00
Lance Release
2ed5452e1c Bump version: 0.31.0-beta.5 → 0.31.0-beta.6 python-v0.31.0-beta.6 2026-04-16 18:57:05 +00:00
Xuanwo
b7c0b5987c chore: upgrade lance to 6.0.0-beta.1 (#3281) 2026-04-17 02:51:58 +08:00
Jack Ye
97a4b38f19 feat(rust): support nested namespace ops in listing db (#3279)
## Summary
- delegate child-namespace `ListingDatabase` operations through an
eagerly initialized `LanceNamespaceDatabase`
- support nested namespace create/open/list/drop flows without requiring
callers to inject explicit locations
- add `namespace_client_properties` plumbing for local and namespace
connections so directory namespace settings like
`table_version_tracking_enabled` can be configured
- add regression tests for nested namespace ops and namespace client
property propagation
2026-04-16 10:12:28 -07:00
Gezi-lzq
10879d99b8 docs: fix broken documentation links (#3278) 2026-04-15 20:56:59 +08:00
Lance Release
4e6a1d5dce Bump version: 0.28.0-beta.4 → 0.28.0-beta.5 2026-04-12 23:51:14 +00:00
Lance Release
13d2759356 Bump version: 0.31.0-beta.4 → 0.31.0-beta.5 python-v0.31.0-beta.5 2026-04-12 23:50:50 +00:00
Jack Ye
7f52ec8c36 feat(python): support child namepsace operations and json serialization for LanceDBConnection (#3265)
## Summary

Add connection serialization and child namespace support to
`LanceDBConnection`.

- `DBConnection.serialize()` / `lancedb.deserialize()` for connection
reconstruction in remote workers
- Cache `namespace_client()` in `LanceDBConnection` to avoid repeated
DirectoryNamespace builds
- `LanceDBConnection` transparently delegates child namespace operations
(open_table, create_table, list_tables, drop_table, create_namespace,
etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()`
- Root namespace operations still go through the original Rust path
- Generic worker property override mechanism: any
`namespace_client_properties` key prefixed with `_lancedb_worker_` has
the prefix stripped and overrides the corresponding property when
`deserialize(data, for_worker=True)`
- `LanceNamespaceDBConnection` stores
`namespace_client_impl`/`namespace_client_properties` for serialization
roundtrip

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:49:45 -07:00
Lance Release
c6ae0de3ee Bump version: 0.28.0-beta.3 → 0.28.0-beta.4 2026-04-12 03:57:58 +00:00
Lance Release
231f0655ce Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 python-v0.31.0-beta.4 2026-04-12 03:57:35 +00:00
LanceDB Robot
8c52977c59 chore: update lance dependency to v5.1.0-beta.3 (#3266)
## Summary
- Bump Rust Lance dependencies to `v5.1.0-beta.3` using
`ci/set_lance_version.py`.
- Update Java `lance-core.version` to `5.1.0-beta.3` in `java/pom.xml`.
- Refresh `Cargo.lock` metadata to the `v5.1.0-beta.3` Lance git tag.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Upstream Tag
- https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.3
2026-04-11 20:56:49 -07:00
Lance Release
359710a0bf Bump version: 0.28.0-beta.2 → 0.28.0-beta.3 2026-04-11 22:44:52 +00:00
Lance Release
1f1726369d Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 python-v0.31.0-beta.3 2026-04-11 22:44:25 +00:00
Lance Release
df354abae4 Bump version: 0.28.0-beta.1 → 0.28.0-beta.2 2026-04-11 07:06:00 +00:00
Lance Release
11bc674548 Bump version: 0.31.0-beta.1 → 0.31.0-beta.2 python-v0.31.0-beta.2 2026-04-11 07:05:36 +00:00
LanceDB Robot
5593460823 chore: update lance dependency to v5.1.0-beta.2 (#3263)
## Summary
- Bump Lance Rust workspace dependencies from `5.0.0-beta.5` to
`5.1.0-beta.2` using `ci/set_lance_version.py`.
- Update Java `lance-core.version` in `java/pom.xml` to `5.1.0-beta.2`.
- Refresh `Cargo.lock` to match the new Lance tag.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
(passes)
- `cargo fmt --all` (passes)

## Triggering Tag
- https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.2
2026-04-11 00:04:43 -07:00
Will Jones
2807ad6854 chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257)
Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI
failures caused by the AWS SDK's MSRV requirement. No lint fixes were
needed.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 07:57:47 -07:00
Dhruv Garg
4761fa9bcb fix(python): migrate gemini-text provider to google-genai sdk (#3250)
## Summary
- migrate gemini-text embedding provider from deprecated
google.generativeai to google.genai
- update Python embedding extra dependency to google-genai
- update default model name to gemini-embedding-001
- adapt embed calls to Client().models.embed_content(...)
- apply lint fixes from CI

## Related
- Closes #3191
2026-04-09 15:28:34 -07:00
lennylxx
4c2939d66e fix(python): guard against None before .decode() on split_names metadata key (#3229)
`.get(b"split_names", None).decode()` was called unconditionally in both
Permutations.__init__ and Permutation.from_tables(), crashing with
AttributeError when schema metadata existed but lacked the split_names
key. Guard the decode behind a None check and add regression tests.
2026-04-08 16:04:13 -07:00
yaommen
a813ce2f71 fix(python): sanitize bad vectors before Arrow cast (#3158)
## Problem

`on_bad_vectors="drop"` is supposed to remove invalid vector rows before
write, but for some schema-defined vector columns it can still fail
later during Arrow cast instead of dropping the bad row.

Repro:
```python
class MySchema(LanceModel):
    text: str
    embedding: Vector(16)

table = db.create_table("test", schema=MySchema)
table.add(
    [
        {"text": "hello", "embedding": []},
        {"text": "bar", "embedding": [0.1] * 16},
    ],
    on_bad_vectors="drop",
)
```
Before:
```
RuntimeError
Arrow error: C Data interface error: Invalid: ListType can only be casted to FixedSizeListType if the lists are all the expected size.
```
After:
```
rows 1
texts ['bar']
```
## Solution

Make bad-vector sanitization use schema dimensions before cast, while
keeping the handling scoped to vector columns identified by schema
metadata or existing vector-name heuristics.

This also preserves existing integer vector inputs and avoids applying
on_bad_vectors to unrelated fixed-size float columns.


Fixes #1670

Signed-off-by: yaommen <myanstu@163.com>
2026-04-08 09:09:41 -07:00
Jack Ye
a898dc81c2 feat: add user_id field to ClientConfig for user identification (#3240)
## Summary

- Add a `user_id` field to `ClientConfig` that allows users to identify
themselves to LanceDB Cloud/Enterprise
- The user_id is sent as the `x-lancedb-user-id` HTTP header in all
requests
- Supports three configuration methods:
  - Direct assignment via `ClientConfig.user_id`
  - Environment variable `LANCEDB_USER_ID`
  - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY`

Closes #3230

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-04-06 11:20:10 -07:00
Lance Release
de3f8097e7 Bump version: 0.28.0-beta.0 → 0.28.0-beta.1 2026-04-05 02:51:18 +00:00
Lance Release
0ac59de5f1 Bump version: 0.31.0-beta.0 → 0.31.0-beta.1 python-v0.31.0-beta.1 2026-04-05 02:50:52 +00:00
LanceDB Robot
d082c2d2ac chore: update lance dependency to v5.0.0-beta.5 (#3237)
## Summary
- update Rust Lance workspace dependencies to `v5.0.0-beta.5` using
`ci/set_lance_version.py`
- update Java `lance-core` dependency property to `5.0.0-beta.5`
- refresh Cargo lockfile to the new Lance tag

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Upstream Tag
- https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-04-04 19:49:51 -07:00
Zelys
9d8699f99e feat(python): support Enum types in Pydantic to Arrow schema conversion (#3232)
## Summary

Fixes #1846.

Python `Enum` fields raised `TypeError: Converting Pydantic type to
Arrow Type: unsupported type <enum 'SomethingTypes'>` when converting a
Pydantic model to an Arrow schema.

The fix adds Enum detection in `_pydantic_type_to_arrow_type`. When an
Enum subclass is encountered, the value type of its members is inspected
and mapped to the appropriate Arrow type:

- `str`-valued enums (e.g. `class Status(str, Enum)`) → `pa.utf8()`
- `int`-valued enums (e.g. `class Priority(int, Enum)`) → `pa.int64()`
- Other homogeneous value types → the Arrow type for that Python type
- Mixed-value or empty enums → `pa.utf8()` (safe fallback)

This covers the common `(str, Enum)` and `(int, Enum)` mixin patterns
used in practice.

## Changes

- `python/python/lancedb/pydantic.py`: add Enum branch in
`_pydantic_type_to_arrow_type`
- `python/python/tests/test_pydantic.py`: add `test_enum_types` covering
`str`, `int`, and `Optional` Enum fields

## Note on #2395

PR #2395 handles `StrEnum` (Python 3.11+) specifically, using a
dictionary-encoded type. This PR handles the broader `(str, Enum)` /
`(int, Enum)` mixin pattern that works across all Python versions and
stores values as their natural Arrow type.

AI assistance was used in developing this fix.
2026-04-03 10:40:49 -07:00
Lance Release
aa2c7b3591 Bump version: 0.27.2 → 0.28.0-beta.0 2026-04-03 08:45:56 +00:00
Lance Release
590c0c1e77 Bump version: 0.30.2 → 0.31.0-beta.0 python-v0.31.0-beta.0 2026-04-03 08:45:29 +00:00
LanceDB Robot
382ecd65e3 chore: update lance dependency to v5.0.0-beta.4 (#3234)
## Summary
- Update Rust Lance workspace dependencies to `v5.0.0-beta.4` using
`ci/set_lance_version.py` (including lockfile refresh).
- Update Java `lance-core` dependency property to `5.0.0-beta.4` in
`java/pom.xml`.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Triggering tag
- https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.4
2026-04-03 01:33:36 -07:00
Jack Ye
e26b22bcca refactor!: consolidate namespace related naming and enterprise integration (#3205)
1. Refactored every client (Rust core, Python, Node/TypeScript) so
“namespace” usage is explicit: code now keeps namespace paths
(namespace_path) separate from namespace clients (namespace_client).
Connections propagate the client, table creation routes through it, and
managed versioning defaults are resolved from namespace metadata. Python
gained LanceNamespaceDBConnection/async counterparts, and the
namespace-focused tests were rewritten to match the clarified API
surface.
2. Synchronized the workspace with Lance 5.0.0-beta.3 (see
https://github.com/lance-format/lance/pull/6186 for the upstream
namespace refactor), updating Cargo/uv lockfiles and ensuring all
bindings align with the new namespace semantics.
3. Added a namespace-backed code path to lancedb.connect() via new
keyword arguments (namespace_client_impl, namespace_client_properties,
plus the existing pushdown-ops flag). When those kwargs are supplied,
connect() delegates to connect_namespace, so users can opt into
namespace clients without changing APIs. (The async helper will gain
parity in a later change)
2026-04-03 00:09:03 -07:00
Lance Release
3ba46135a5 Bump version: 0.27.2-beta.2 → 0.27.2 2026-03-31 21:26:04 +00:00
Lance Release
f903d07887 Bump version: 0.27.2-beta.1 → 0.27.2-beta.2 2026-03-31 21:25:36 +00:00
Lance Release
5d550124bd Bump version: 0.30.2-beta.2 → 0.30.2 python-v0.30.2 2026-03-31 21:25:04 +00:00
Lance Release
c57cb310a2 Bump version: 0.30.2-beta.1 → 0.30.2-beta.2 2026-03-31 21:25:02 +00:00
Dan Tasse
97754f5123 fix: change _client reference to _conn (#3188)
This code previously referenced `self._client`, which does not exist.
This change makes it correctly call `self._conn.close()`
2026-03-31 13:29:17 -07:00
Pratik Dey
7b1c063848 feat(python): add type-safe expression builder API (#3150)
Introduces col(), lit(), func(), and Expr class as alternatives to raw
SQL strings in .where() and .select(). Expressions are backed by
DataFusion's Expr AST and serialized to SQL for remote table compat.

Resolves: 
- https://github.com/lancedb/lancedb/issues/3044 (python api's)
- https://github.com/lancedb/lancedb/issues/3043 (support for filter)
- https://github.com/lancedb/lancedb/issues/3045 (support for
projection)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 11:32:49 -07:00
Will Jones
c7f189f27b chore: upgrade lance to stable 4.0.0 (#3207)
Bumps all lance-* workspace dependencies from `4.0.0-rc.3` (git source)
to the stable `4.0.0` release on crates.io, removing the `git`/`tag`
overrides.

No code changes were required — compiles and passes clippy cleanly.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 17:05:45 -07:00
yaommen
a0a2942ad5 fix: respect max_batch_length for Rust vector and hybrid queries (#3172)
Fixes #1540

I could not reproduce this on current `main` from Python, but I could
still reproduce it from the Rust SDK.

Python no longer reproduces because the current Python vector/hybrid
query paths re-chunk results into a `pyarrow.Table` before returning
batches. Rust still reproduced because `max_batch_length` was passed
into planning/scanning, but vector search could still emit larger
`RecordBatch`es later in execution (for example after KNN / TopK), so it
was not enforced on the final Rust output stream.

This PR enforces `max_batch_length` on the final Rust query output
stream and adds Rust regression coverage.

Before the fix, the Rust repro produced:
`num_batches=2, max_batch=8192, min_batch=1808, all_le_100=false`

After the fix, the same repro produces batches `<= 100`.

## Runnable Rust repro

Before this fix, current `main` could still return batches like `[8192,
1808]` here even with `max_batch_length = 100`:

```rust
use std::sync::Arc;

use arrow_array::{
    types::Float32Type, FixedSizeListArray, RecordBatch, RecordBatchReader, StringArray,
};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lancedb::query::{ExecutableQuery, QueryBase, QueryExecutionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tmp = tempfile::tempdir()?;
    let uri = tmp.path().to_str().unwrap();

    let rows = 10_000;
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Utf8, false),
        Field::new(
            "vector",
            DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4),
            false,
        ),
    ]));

    let ids = StringArray::from_iter_values((0..rows).map(|i| format!("row-{i}")));
    let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
        (0..rows).map(|i| Some(vec![Some(i as f32), Some(1.0), Some(2.0), Some(3.0)])),
        4,
    );
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(ids), Arc::new(vectors)])?;
    let reader: Box<dyn RecordBatchReader + Send> = Box::new(
        arrow_array::RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema),
    );

    let db = lancedb::connect(uri).execute().await?;
    let table = db.create_table("test", reader).execute().await?;

    let mut opts = QueryExecutionOptions::default();
    opts.max_batch_length = 100;

    let mut stream = table
        .query()
        .nearest_to(vec![0.0, 1.0, 2.0, 3.0])?
        .limit(rows)
        .execute_with_options(opts)
        .await?;

    let mut sizes = Vec::new();
    while let Some(batch) = stream.try_next().await? {
        sizes.push(batch.num_rows());
    }

    println!("{sizes:?}");
    Ok(())
}
```

Signed-off-by: yaommen <myanstu@163.com>
2026-03-30 15:43:58 -07:00
Will Jones
e3d53dd185 fix(python): skip test_url_retrieve_downloads_image when PIL not installed (#3208)
The test added in #3190 unconditionally imports `PIL`, which is an
optional dependency. This causes CI failures in environments where
Pillow isn't installed (`ModuleNotFoundError: No module named 'PIL'`).

Use `pytest.importorskip` to skip gracefully when Pillow is unavailable.

Fixes CI failure on main.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 14:48:49 -07:00
Will Jones
66804e99fc fix(python): use correct exception types in namespace tests (#3206)
## Summary
- Namespace tests expected `RuntimeError` for table-not-found and
namespace-not-empty cases, but `lance_namespace` raises
`TableNotFoundError` and `NamespaceNotEmptyError` which inherit from
`Exception`, not `RuntimeError`.
- Updated `pytest.raises` to use the correct exception types.

## Test plan
- [x] CI passes on `test_namespace.py`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 12:55:54 -07:00
lennylxx
9f85d4c639 fix(embeddings): add missing urllib.request import in url_retrieve (#3190)
url_retrieve() calls urllib.request.urlopen() but only urllib.error was
imported, causing AttributeError for any HTTP URL input. This affects
open-clip, siglip, and jinaai embedding functions when processing image
URLs.

The bug has existed since the embeddings API refactor (#580) but was
masked because most users pass local file paths or bytes rather than
HTTP URLs.
2026-03-30 12:03:44 -07:00