lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-06-22 05:30:39 +00:00

Author	SHA1	Message	Date
Ryan Green	e1334954d7	fix: overflow using sys.maxsize for k in query with namespace connection (#3561)	2026-06-19 12:57:10 -02:30
nuthalapativarun	40f3e22600	feat: support rename_table on LanceNamespaceDatabase (#3520 ) ## Summary Closes #3412 Implements `rename_table` for `LanceNamespaceDatabase` (sync and async Python) and the Rust `NamespaceDatabase` backend. Previously these raised `NotImplementedError`; this PR delegates to the `LanceNamespace.rename_table` method which is part of the lance-namespace spec. ### Changes - `rust/lancedb/src/database/namespace.rs`: Remove the `NotImplementedError` stub for `rename_table`. Build a `RenameTableRequest` (with `id`, `new_table_name`, and optionally `new_namespace_id`) and call `self.namespace.rename_table(...)`, mirroring the existing `drop_table` pattern. - `python/python/lancedb/namespace.py`: Import `RenameTableRequest` from `lance_namespace`. Replace the `raise NotImplementedError` in both `LanceNamespaceDatabase.rename_table` (sync) and `AsyncLanceNamespaceDatabase.rename_table` (async) with a call to `self._namespace_client.rename_table(request)`. - `python/python/tests/test_namespace.py`: Replace the `test_rename_table_not_supported` test (which checked for `NotImplementedError`) with `test_rename_table`, which: 1. Creates a table in a namespace 2. Calls `rename_table` with `cur_namespace_path` and `new_namespace_path` 3. Asserts the old name is gone from `table_names()` 4. Asserts the new name appears in `table_names()` 5. Verifies the renamed table can be opened ## Test plan - [ ] Existing namespace tests pass in CI (all rely on `lance.namespace.DirectoryNamespace` which requires the full lance package) - [ ] `test_rename_table` exercises the full rename path: create → rename → verify old gone → verify new present → open - [ ] Rust build passes with the updated `namespace.rs` (requires Rust toolchain in CI)	2026-06-11 11:41:07 -07:00
Brendan Clement	53517b3aaa	feat: add table branch support (#3490 ) ### Description Adds first-class support for table branches across the Rust core and the Python and TypeScript SDKs. Rust ```rust use lance::dataset::refs::Ref; // Create a branch from main and write to it — main is untouched. let exp = table.create_branch("exp", Ref::Version(None, None)).await?; exp.add(batches).await?; // Reopen the branch later: check out from a table, or open it directly. let exp = table.checkout_branch("exp").await?; let exp = db.open_table("items").branch("exp").execute().await?; let branches = table.list_branches().await?; table.delete_branch("exp").await?; ``` Python ```python # Create a branch from main and write to it branch = await table.branches.create("exp", from_ref="main") await branch.add(data) # Reopen the branch later: check out from a table, or open it directly. branch = await table.branches.checkout("exp") branch = await db.open_table("items", branch="exp") await table.branches.list() await table.branches.delete("exp") ``` TypeScript ```typescript const branches = await table.branches(); // Create a branch from main and write to it const branch = await branches.create("exp"); await branch.add(data); // Reopen the branch later: check out from a table, or open it directly. const checkedOut = await branches.checkout("exp"); const opened = await db.openTable("items", undefined, { branch: "exp" }); await branches.list(); await branches.delete("exp"); ``` ### Testing - Added unit tests - ran smoke tests against python and typescript sdks on local machine ### Next steps - Add RemoteTable support - Add Branch Comparison support - Merge Branching support	2026-06-08 16:26:46 -07:00
Yang Cen	3e25f584eb	fix(python): push down namespace full reads (#3516 ) ## Bug Fix ### What is the bug? Namespace-backed `LanceTable.to_arrow()` full-table reads bypassed the existing `QueryTable` server-side query path and called the lower-level table `to_arrow()` implementation directly. In Geneva/Sophon this could fail while parsing the Arrow IPC response for `hist.get_table().to_arrow()` / `to_pandas()`, even though `hist.get_table().search().to_arrow()` worked. ### What issues or incorrect behavior does the bug cause? Full-table reads on namespace-backed tables with `QueryTable` pushdown could fail with Arrow IPC parse errors, while query/search reads on the same table succeeded. Since `to_pandas()` delegates through `to_arrow()` for non-blob/native cases, pandas export was affected too. ### How does this PR fix the problem? When `QueryTable` pushdown is enabled, sync and async table `to_arrow()` now construct a plain no-filter, no-limit, all-columns query and execute it through the table-level `_execute_query()` path. `AsyncTable` now preserves namespace context from async namespace connections so async full reads can make the same pushdown decision. Non-namespace tables and namespace tables without `QueryTable` pushdown keep their existing behavior. 
### Tests - `uv run --extra tests --extra dev --no-sync ruff check python/lancedb/table.py python/lancedb/namespace.py python/tests/test_namespace.py` - `uv run --extra tests --extra dev --no-sync ruff format python/lancedb/table.py python/lancedb/namespace.py python/tests/test_namespace.py` - `uv run --extra tests --extra dev --no-sync pytest python/tests/test_namespace.py::TestPushdownOperations::test_lance_table_to_arrow_uses_query_pushdown python/tests/test_namespace.py::TestAsyncPushdownOperations::test_async_table_to_arrow_uses_query_pushdown python/tests/test_namespace.py::test_local_table_to_arrow_and_to_pandas_are_unchanged -q` - `uv run --extra tests --extra dev --no-sync pytest python/tests/test_namespace.py -q`	2026-06-08 19:48:40 +08:00
Yang Cen	6f18eb4cce	feat(python): support blob modes in query to_pandas (#3487 ) ## Feature - What is the new feature? - Adds `blob_mode` support to sync and async Python query `to_pandas()` APIs. - Enables plain scan queries to return blob columns as lazy `BlobFile` objects, raw bytes, or blob descriptions. - Lets namespace-backed local tables use Lance native blob-aware pandas conversion for lazy blobs. - Why do we need this feature? - Table and Lance dataset/scanner APIs already support blob-aware pandas conversion, but LanceDB query builders did not expose that capability. - Geneva and other callers should be able to use query-level `to_pandas(blob_mode=...)` without manually constructing Lance scanners. - How does it work? - Plain scan queries route through Lance scanner native `to_pandas(blob_mode=...)`, preserving filter, projection, limit, offset, row id, and alias/expression projection behavior. - Non-native query shapes keep existing Arrow fallback semantics and raise a clear error when they return blob columns with `blob_mode="lazy"` or `blob_mode="bytes"`. - Focused tests cover table/query blob modes, filter/select/limit/offset/alias query cases, async query behavior, vector-query error boundaries, and namespace-backed lazy blobs. 
## Validation - `cd python && .venv/bin/maturin develop --uv --extras tests,dev --profile dev` - `cd python && uv run --frozen --no-sync pytest python/tests/test_table.py::test_table_to_pandas_blob_modes python/tests/test_table.py::test_async_table_to_pandas_blob_bytes python/tests/test_query.py::test_plain_scan_query_to_pandas_blob_modes python/tests/test_query.py::test_plain_scan_query_to_pandas_blob_projection python/tests/test_query.py::test_async_plain_scan_query_to_pandas_blob_projection python/tests/test_query.py::test_vector_query_to_pandas_blob_mode_requires_native_path python/tests/test_namespace.py::TestNamespaceConnection::test_table_to_pandas_blob_lazy_through_namespace -q` - `cd python && uv run --frozen --no-sync ruff format --check .` - `cd python && uv run --frozen --no-sync ruff check .` - `git diff --check`	2026-06-03 19:15:44 +08:00
Jack Ye	7f52ec8c36	feat(python): support child namespace operations and json serialization for LanceDBConnection (#3265) ## Summary Add connection serialization and child namespace support to `LanceDBConnection`. - `DBConnection.serialize()` / `lancedb.deserialize()` for connection reconstruction in remote workers - Cache `namespace_client()` in `LanceDBConnection` to avoid repeated DirectoryNamespace builds - `LanceDBConnection` transparently delegates child namespace operations (open_table, create_table, list_tables, drop_table, create_namespace, etc.) to `LanceNamespaceDBConnection` via `_namespace_conn()` - Root namespace operations still go through the original Rust path - Generic worker property override mechanism: any `namespace_client_properties` key prefixed with `_lancedb_worker_` has the prefix stripped and overrides the corresponding property when `deserialize(data, for_worker=True)` is called - `LanceNamespaceDBConnection` stores `namespace_client_impl`/`namespace_client_properties` for serialization roundtrip --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 16:49:45 -07:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Will Jones	66804e99fc	fix(python): use correct exception types in namespace tests (#3206 ) ## Summary - Namespace tests expected `RuntimeError` for table-not-found and namespace-not-empty cases, but `lance_namespace` raises `TableNotFoundError` and `NamespaceNotEmptyError` which inherit from `Exception`, not `RuntimeError`. - Updated `pytest.raises` to use the correct exception types. ## Test plan - [x] CI passes on `test_namespace.py` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:55:54 -07:00
Jack Ye	d1efc6ad8a	refactor!: use namespace models directly for namespace operations (#2806) 1. Use the generated models in lance-namespace as request and response models to avoid multiple layers of conversion 2. Make sure the API is consistent with the namespace spec 3. Deprecate the table_names API in favor of the namespace list_tables API, which supports full pagination without requiring sorted table names 4. Add the describe_namespace API, which was missing from the original implementation	2025-12-02 22:41:04 -08:00
Jack Ye	1b78ccedaf	feat: support async namespace connection (#2788) Also fixes two bugs: 1. makes the storage options provider serializable in Ray 2. fixes the wrong URI returned by table.to_table() for namespace-backed tables	2025-11-19 12:23:50 -08:00
Jack Ye	e47f552a86	feat: support namespace credentials vending (#2778 ) Based on https://github.com/lancedb/lance/pull/4984 1. Bump to 1.0.0-beta.2 2. Use DirectoryNamespace in lance to perform all testing in python and rust for much better coverage 3. Refactor `ListingDatabase` to be able to accept location and namespace. This is because we have to leverage listing database (local lancedb connection) for using namespace, namespace only resolves the location and storage options but we don't want to bind all the way to rust since user will plug-in namespace from python side. And thus `ListingDatabase` needs to be able to accept location and namespace that are created from namespace connection. 4. For credentials vending, we also pass storage options provider all the way to rust layer, and the rust layer calls back to the python function to fetch next storage option. This is exactly the same thing we did in pylance.	2025-11-17 00:42:24 -08:00
LuQQiu	199904ab35	chore: update lance dependency to v0.38.3-beta.11 (#2749 ) ## Summary - Updated all Lance dependencies from v0.38.3-beta.9 to v0.38.3-beta.11 - Migrated `lance-namespace-impls` to use new granular cloud provider features (`dir-aws`, `dir-gcp`, `dir-azure`, `dir-oss`) instead of deprecated `dir` feature - Updated namespace connection API to use `ConnectBuilder` instead of deprecated `connect()` function ## API Changes The Lance team refactored the `lance-namespace-impls` package in v0.38.3-beta.11: 1. Feature flags: The single `dir` feature was split into cloud provider-specific features: - `dir-aws` for AWS S3 support - `dir-gcp` for Google Cloud Storage support - `dir-azure` for Azure Blob Storage support - `dir-oss` for Alibaba Cloud OSS support 2. Connection API: The `connect()` function was replaced with a `ConnectBuilder` pattern for more flexibility ## Testing - ✅ Ran `cargo clippy --workspace --tests --all-features -- -D warnings` - no warnings - ✅ Ran `cargo fmt --all` - code formatted - ✅ All changes verified and committed ## Related This update was triggered by the Lance release: https://github.com/lancedb/lance/releases/tag/v0.38.3-beta.11 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-10-27 19:10:26 -07:00
Wyatt Alt	a9ea785b15	fix: remote python sdk namespace typing (#2620) This changes the default values for some namespace parameters in the remote python SDK from None to [], to match the underlying code it calls. Prior to this commit, failing to supply "namespace" with the remote SDK would cause an error because the underlying code it dispatches to does not consider None to be valid input.	2025-09-02 16:32:32 -07:00
Jack Ye	faf8973624	feat!: support multi-level namespace (#2603) This PR adds support for multi-level namespaces in a LanceDB database, according to the Lance Namespace spec. Users can create namespaces inside a database connection and perform create, drop, list, and list_tables operations in a namespace. (Other operations such as update and describe will follow in a later PR.) The three types of database connections behave as follows: 1. Local database connections will continue to have just a flat list of tables for backwards compatibility. 2. Remote database connections will make REST API calls according to the APIs in the Lance Namespace spec. 3. Lance Namespace connections will invoke the corresponding operations against the specific namespace implementation, which could have different behaviors regarding these APIs. All the table APIs now take an identifier instead of a name, for example `/v1/table/{name}/create` is now `/v1/table/{id}/create`. If a table is directly in the root namespace, the API call is identical. If the table is in a namespace, then the full table ID should be used, with `$` as the default delimiter (`.` is a special character and creates issues with URL parsing, so `$` is used), for example `/v1/table/ns1$table1/create`. If a different delimiter is needed, the user can configure `id_delimiter` in the client config, and it is sent as a query parameter, for example `/v1/table/ns1__table1/create?delimiter=__` The Python and TypeScript APIs are kept backwards compatible, but the following Rust APIs are not: 1. `Connection::drop_table(&self, name: impl AsRef<str>) -> Result<()>` is now `Connection::drop_table(&self, name: impl AsRef<str>, namespace: &[String]) -> Result<()>` 2. `Connection::drop_all_tables(&self) -> Result<()>` is now `Connection::drop_all_tables(&self, name: impl AsRef<str>) -> Result<()>`	2025-08-27 12:07:55 -07:00
Jack Ye	04285a4a4e	feat(python): integrate with lance namespace (#2599) This PR integrates `lancedb` with `lance-namespace` so that users can use the LanceDB client to access Lance tables in any catalog service. In general, we expect most of the logic to be delegated to the existing `LanceDBConnection` and `LanceTable`, but the namespace implementation controls how a table is created and dropped, and describes where the table is stored along with any related storage options such as access credentials. The implementation currently only supports a single-level namespace that directly contains tables. We will introduce nested namespace support in a separate PR. Users are expected to use it in the following way: ```python >>> import lancedb >>> import pyarrow as pa >>> # Connect using GlueNamespace >>> db = lancedb.connect_namespace("glue", {"catalog_id": "123456789012"}) >>> # Create a table with schema >>> schema = pa.schema([ ... pa.field("id", pa.int64()), ... pa.field("vector", pa.list_(pa.float32(), 2)) ... ]) >>> table = db.create_table("my_table", schema=schema) >>> # List tables >>> db.table_names() ['my_table'] ```	2025-08-20 15:46:16 -07:00
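
The table identifier scheme described in faf8973624 (#2603) can be illustrated with a small sketch. This is not LanceDB's client code: the helper names `table_id` and `create_table_path` are hypothetical, and the only behavior taken from the commit message is the `$` default delimiter and the `delimiter` query parameter for non-default delimiters.

```python
DEFAULT_DELIMITER = "$"  # default delimiter named in the commit message


def table_id(namespace, name, delimiter=DEFAULT_DELIMITER):
    """Join namespace levels and the table name into one identifier (hypothetical helper)."""
    return delimiter.join([*namespace, name])


def create_table_path(namespace, name, delimiter=DEFAULT_DELIMITER):
    """Build the REST path for a table create call (hypothetical helper)."""
    path = f"/v1/table/{table_id(namespace, name, delimiter)}/create"
    # A non-default delimiter is passed along as a query parameter.
    if delimiter != DEFAULT_DELIMITER:
        path += f"?delimiter={delimiter}"
    return path


root_path = create_table_path([], "table1")
nested_path = create_table_path(["ns1"], "table1")
custom_path = create_table_path(["ns1"], "table1", delimiter="__")
```

A table in the root namespace keeps the same path as before the change, which is why the commit describes the root-level API call as identical.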

15 Commits
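
The worker property override described in 7f52ec8c36 (#3265) can be sketched as follows. This is an illustration only, not the library's implementation: the function name `resolve_properties` is hypothetical, and dropping the prefixed keys from the resolved result is an assumption the commit message does not confirm.

```python
WORKER_PREFIX = "_lancedb_worker_"  # prefix named in the commit message


def resolve_properties(properties, for_worker=False):
    """Return namespace client properties, applying worker overrides on request.

    Hypothetical helper. Assumption: prefixed keys never appear in the result.
    """
    # Start from the properties that carry no worker prefix.
    resolved = {
        key: value
        for key, value in properties.items()
        if not key.startswith(WORKER_PREFIX)
    }
    if for_worker:
        # Strip the prefix and let the value override the plain property.
        for key, value in properties.items():
            if key.startswith(WORKER_PREFIX):
                resolved[key[len(WORKER_PREFIX):]] = value
    return resolved


props = {"root": "/driver/data", "_lancedb_worker_root": "/worker/data", "region": "us"}
driver_view = resolve_properties(props)
worker_view = resolve_properties(props, for_worker=True)
```

Under these assumptions, the driver and a remote worker can share one serialized connection while resolving a property such as a storage root differently.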