lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-19 04:50:40 +00:00

Author	SHA1	Message	Date
Jack Ye	25dfe2cfd4	feat: add manifest-enabled directory namespace mode (#3332) Adds `manifest_enabled` for local/native connections so directory namespace manifests can be the source of truth, including migration from directory listing and Azure credential vending feature wiring. Also exposes the option through Rust, Python, and Node bindings with focused validation.	2026-04-29 09:22:06 -07:00
Lance Release	4dcd7f4314	Bump version: 0.28.0-beta.9 → 0.28.0-beta.10	2026-04-28 13:29:26 +00:00
Jack Ye	a92ae0ded5	fix: enable hostname verification by default (#3304) ## Summary - make `TlsConfig::default()` enable hostname verification by default - align the Rust default with the documented Python and Node behavior - update the Rust unit test to lock in the safe default	2026-04-21 08:39:03 -07:00
Lance Release	75b0a8e0a3	Bump version: 0.28.0-beta.8 → 0.28.0-beta.9	2026-04-19 20:39:29 +00:00
Jack Ye	2a1df8edcf	fix(rust): materialize declared namespace tables on create (#3288) ## Summary - handle `declare_table` already-exists conflicts in the Rust namespace database create path - reuse declared-but-not-materialized table metadata instead of failing create mode - preserve overwrite behavior while allowing declared Geneva system tables to be materialized	2026-04-19 13:25:53 -07:00
Lance Release	be48ada352	Bump version: 0.28.0-beta.7 → 0.28.0-beta.8	2026-04-19 04:19:10 +00:00
Jack Ye	f909df3e87	fix(python): use namespace-backed rust connection for namespace tables (#3286) Until now, I had been using a hacky approach to create and open a namespace-backed table: resolve the table's location, then use a temporary lancedb connection to create or open it. This worked for features like credential vending but no longer fully works for the managed versioning feature; Geneva tests have recently been failing intermittently, and various patches have not addressed the root cause. This PR fixes the problem properly by implementing a real Rust binding for it. Specifically: - build a real Rust namespace-backed connection from the Python namespace client - route namespace table create/open through that connection instead of resolved-location temp connections - keep namespace client naming consistent in the Rust bridge and preserve federated namespace + DuckDB behavior	2026-04-18 21:17:52 -07:00
Lance Release	d715bbb588	Bump version: 0.28.0-beta.6 → 0.28.0-beta.7	2026-04-17 08:12:27 +00:00
Lance Release	11af763fcd	Bump version: 0.28.0-beta.5 → 0.28.0-beta.6	2026-04-16 18:57:28 +00:00
Xuanwo	b7c0b5987c	chore: upgrade lance to 6.0.0-beta.1 (#3281)	2026-04-17 02:51:58 +08:00
Jack Ye	97a4b38f19	feat(rust): support nested namespace ops in listing db (#3279) ## Summary - delegate child-namespace `ListingDatabase` operations through an eagerly initialized `LanceNamespaceDatabase` - support nested namespace create/open/list/drop flows without requiring callers to inject explicit locations - add `namespace_client_properties` plumbing for local and namespace connections so directory namespace settings like `table_version_tracking_enabled` can be configured - add regression tests for nested namespace ops and namespace client property propagation	2026-04-16 10:12:28 -07:00
Gezi-lzq	10879d99b8	docs: fix broken documentation links (#3278)	2026-04-15 20:56:59 +08:00
Lance Release	4e6a1d5dce	Bump version: 0.28.0-beta.4 → 0.28.0-beta.5	2026-04-12 23:51:14 +00:00
Lance Release	c6ae0de3ee	Bump version: 0.28.0-beta.3 → 0.28.0-beta.4	2026-04-12 03:57:58 +00:00
Lance Release	359710a0bf	Bump version: 0.28.0-beta.2 → 0.28.0-beta.3	2026-04-11 22:44:52 +00:00
Lance Release	df354abae4	Bump version: 0.28.0-beta.1 → 0.28.0-beta.2	2026-04-11 07:06:00 +00:00
Will Jones	2807ad6854	chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257) Bumps the Rust toolchain to 1.94.0 (latest installed) to fix CI failures caused by the AWS SDK's MSRV requirement. No lint fixes were needed. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-10 07:57:47 -07:00
Jack Ye	a898dc81c2	feat: add user_id field to ClientConfig for user identification (#3240) ## Summary - Add a `user_id` field to `ClientConfig` that allows users to identify themselves to LanceDB Cloud/Enterprise - The user_id is sent as the `x-lancedb-user-id` HTTP header in all requests - Supports three configuration methods: - Direct assignment via `ClientConfig.user_id` - Environment variable `LANCEDB_USER_ID` - Indirect env var lookup via `LANCEDB_USER_ID_ENV_KEY` Closes #3230 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-04-06 11:20:10 -07:00
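The commit lists three ways to supply `user_id` but does not say how they interact when more than one is set. The sketch below assumes this precedence: an explicit value first, then `LANCEDB_USER_ID`, then the variable whose *name* is stored in `LANCEDB_USER_ID_ENV_KEY`. That order and the `resolve_user_id` / `user_id_headers` helpers are assumptions. Only the variable names and the `x-lancedb-user-id` header come from the commit message.

```python
import os
from typing import Mapping, Optional


def resolve_user_id(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical resolver; the precedence order is an assumption."""
    env = os.environ if env is None else env
    # 1. A value assigned directly on the config wins (assumed).
    if explicit:
        return explicit
    # 2. Then the LANCEDB_USER_ID environment variable.
    direct = env.get("LANCEDB_USER_ID")
    if direct:
        return direct
    # 3. Finally, LANCEDB_USER_ID_ENV_KEY names *another* variable to read.
    key = env.get("LANCEDB_USER_ID_ENV_KEY")
    if key:
        return env.get(key) or None
    return None


def user_id_headers(user_id: Optional[str]) -> dict:
    # Per the commit, the id is sent as the x-lancedb-user-id HTTP header.
    return {"x-lancedb-user-id": user_id} if user_id else {}
```

With the indirect form, a deployment can keep its own variable name (for example `MY_USER`) and set `LANCEDB_USER_ID_ENV_KEY=MY_USER` to point at it.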
Lance Release	de3f8097e7	Bump version: 0.28.0-beta.0 → 0.28.0-beta.1	2026-04-05 02:51:18 +00:00
LanceDB Robot	d082c2d2ac	chore: update lance dependency to v5.0.0-beta.5 (#3237) ## Summary - update Rust Lance workspace dependencies to `v5.0.0-beta.5` using `ci/set_lance_version.py` - update Java `lance-core` dependency property to `5.0.0-beta.5` - refresh Cargo lockfile to the new Lance tag ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Upstream Tag - https://github.com/lance-format/lance/releases/tag/v5.0.0-beta.5 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-04-04 19:49:51 -07:00
Lance Release	aa2c7b3591	Bump version: 0.27.2 → 0.28.0-beta.0	2026-04-03 08:45:56 +00:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change.)	2026-04-03 00:09:03 -07:00
Lance Release	3ba46135a5	Bump version: 0.27.2-beta.2 → 0.27.2	2026-03-31 21:26:04 +00:00
Lance Release	f903d07887	Bump version: 0.27.2-beta.1 → 0.27.2-beta.2	2026-03-31 21:25:36 +00:00
Pratik Dey	7b1c063848	feat(python): add type-safe expression builder API (#3150) Introduces col(), lit(), func(), and an Expr class as alternatives to raw SQL strings in .where() and .select(). Expressions are backed by DataFusion's Expr AST and serialized to SQL for remote table compatibility. Resolves: - https://github.com/lancedb/lancedb/issues/3044 (Python APIs) - https://github.com/lancedb/lancedb/issues/3043 (support for filter) - https://github.com/lancedb/lancedb/issues/3045 (support for projection) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 11:32:49 -07:00
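The commit says expressions form an AST that is serialized to SQL, so remote tables can still receive a plain filter string. The toy sketch below shows only that serialization idea. `Expr`, `Column`, `Literal`, and `BinOp` here are hypothetical stand-ins. They are not lancedb's `col()`/`lit()`/`Expr`, which are backed by DataFusion.

```python
from typing import Any


class Expr:
    """Hypothetical expression node: operators build a tree instead of evaluating."""

    def to_sql(self) -> str:
        raise NotImplementedError

    def _binary(self, op: str, other: Any) -> "BinOp":
        # Wrap plain Python values so both sides of the operator are Expr nodes.
        rhs = other if isinstance(other, Expr) else Literal(other)
        return BinOp(self, op, rhs)

    def __gt__(self, other: Any) -> "BinOp":
        return self._binary(">", other)

    def __lt__(self, other: Any) -> "BinOp":
        return self._binary("<", other)

    def __eq__(self, other: Any) -> "BinOp":  # type: ignore[override]
        return self._binary("=", other)

    def __and__(self, other: Any) -> "BinOp":
        return self._binary("AND", other)

    def __or__(self, other: Any) -> "BinOp":
        return self._binary("OR", other)


class Column(Expr):
    def __init__(self, name: str) -> None:
        self.name = name

    def to_sql(self) -> str:
        # Double-quote identifiers, escaping embedded double quotes.
        return '"' + self.name.replace('"', '""') + '"'


class Literal(Expr):
    def __init__(self, value: Any) -> None:
        self.value = value

    def to_sql(self) -> str:
        v = self.value
        if v is None:
            return "NULL"
        if isinstance(v, bool):  # check bool before int: bool subclasses int
            return "TRUE" if v else "FALSE"
        if isinstance(v, (int, float)):
            return repr(v)
        # Single-quote strings, escaping embedded single quotes.
        return "'" + str(v).replace("'", "''") + "'"


class BinOp(Expr):
    def __init__(self, left: Expr, op: str, right: Expr) -> None:
        self.left, self.op, self.right = left, op, right

    def to_sql(self) -> str:
        # Parenthesize every node so operator precedence is always explicit.
        return f"({self.left.to_sql()} {self.op} {self.right.to_sql()})"
```

For example, `((Column("age") > 30) & (Column("name") == "O'Brien")).to_sql()` produces `(("age" > 30) AND ("name" = 'O''Brien'))`. The sketch uses `&` because Python's `and` cannot be overloaded. Building a tree rather than concatenating strings is what makes the quoting and escaping uniform.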
yaommen	a0a2942ad5	fix: respect max_batch_length for Rust vector and hybrid queries (#3172)

Fixes #1540

I could not reproduce this on current `main` from Python, but I could still reproduce it from the Rust SDK. Python no longer reproduces because the current Python vector/hybrid query paths re-chunk results into a `pyarrow.Table` before returning batches. Rust still reproduced because `max_batch_length` was passed into planning/scanning, but vector search could still emit larger `RecordBatch`es later in execution (for example after KNN / TopK), so it was not enforced on the final Rust output stream.

This PR enforces `max_batch_length` on the final Rust query output stream and adds Rust regression coverage. Before the fix, the Rust repro produced: `num_batches=2, max_batch=8192, min_batch=1808, all_le_100=false`. After the fix, the same repro produces batches `<= 100`.

## Runnable Rust repro

Before this fix, current `main` could still return batches like `[8192, 1808]` here even with `max_batch_length = 100`:

```rust
use std::sync::Arc;

use arrow_array::{
    types::Float32Type, FixedSizeListArray, RecordBatch, RecordBatchReader, StringArray,
};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lancedb::query::{ExecutableQuery, QueryBase, QueryExecutionOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let tmp = tempfile::tempdir()?;
    let uri = tmp.path().to_str().unwrap();
    let rows = 10_000;

    // Schema: a string id plus a 4-dimensional float32 vector column.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Utf8, false),
        Field::new(
            "vector",
            DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4),
            false,
        ),
    ]));

    let ids = StringArray::from_iter_values((0..rows).map(|i| format!("row-{i}")));
    let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
        (0..rows).map(|i| Some(vec![Some(i as f32), Some(1.0), Some(2.0), Some(3.0)])),
        4,
    );
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(ids), Arc::new(vectors)])?;
    let reader: Box<dyn RecordBatchReader + Send> = Box::new(
        arrow_array::RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema),
    );

    let db = lancedb::connect(uri).execute().await?;
    let table = db.create_table("test", reader).execute().await?;

    // Ask for at most 100 rows per output batch.
    let mut opts = QueryExecutionOptions::default();
    opts.max_batch_length = 100;

    let mut stream = table
        .query()
        .nearest_to(vec![0.0, 1.0, 2.0, 3.0])?
        .limit(rows)
        .execute_with_options(opts)
        .await?;

    // Record the row count of every batch the stream yields.
    let mut sizes = Vec::new();
    while let Some(batch) = stream.try_next().await? {
        sizes.push(batch.num_rows());
    }
    println!("{sizes:?}");
    Ok(())
}
```

Signed-off-by: yaommen <myanstu@163.com>	2026-03-30 15:43:58 -07:00
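The fix enforces `max_batch_length` on the final output stream instead of relying on planning/scanning alone. The minimal Python sketch below shows that post-processing step and treats each batch as a plain sequence. It is not the Rust implementation, and `enforce_max_batch_length` is a hypothetical name. It only splits oversized batches and does not merge small ones.

```python
from typing import Iterable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def enforce_max_batch_length(
    batches: Iterable[Sequence[T]], max_batch_length: int
) -> Iterator[Sequence[T]]:
    """Split any batch longer than max_batch_length; shorter batches pass through."""
    if max_batch_length <= 0:
        raise ValueError("max_batch_length must be positive")
    for batch in batches:
        # Slice the batch into consecutive windows of at most max_batch_length rows.
        for start in range(0, len(batch), max_batch_length):
            yield batch[start:start + max_batch_length]
```

Applied to the batch sizes from the repro (`[8192, 1808]`) with a limit of 100, this yields 82 + 19 = 101 batches covering all 10,000 rows, and none of them exceeds 100 rows.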
lennylxx	74f457a0f2	fix(rust): handle Mutex lock poisoning gracefully across codebase (#3196) Replace ~30 production `lock().unwrap()` calls that would cascade-panic on a poisoned Mutex. Functions returning `Result` now propagate the poison as an error via `?` (leveraging the existing `From<PoisonError>` impl). Functions without a `Result` return recover via `unwrap_or_else(|e| e.into_inner())`, which is safe because the guarded data (counters, caches, RNG state) remains logically valid after a panic.	2026-03-30 09:25:18 -07:00
Lance Release	ad96489114	Bump version: 0.27.2-beta.0 → 0.27.2-beta.1	2026-03-25 16:22:09 +00:00
Lance Release	61de47f3a5	Bump version: 0.27.1 → 0.27.2-beta.0	2026-03-25 03:23:28 +00:00
Wyatt Alt	410ab9b6fe	Revert "feat: allow passing azure client/tenant ID through remote SDK" (#3185) Reverts lancedb/lancedb#3102	2026-03-24 20:17:40 -07:00
Will Jones	1d6e00b902	feat: progress bar for `add()` (#3067)

## Summary

Adds progress reporting for `table.add()` so users can track large write operations. The progress callback is available in Rust, Python (sync and async), and through the PyO3 bindings.

### Usage

Pass `progress=True` to get an automatic tqdm bar:

```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```

Or pass a tqdm bar for more control:

```python
from tqdm import tqdm

with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```

Or use a callback for custom progress handling:

```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")

table.add(data, progress=on_progress)
```

In Rust:

```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```

### Details

- `WriteProgress` struct in Rust with getters for `elapsed`, `output_rows`, `output_bytes`, `total_rows`, `active_tasks`, `total_tasks`, and `done`. Fields are private behind getters so new fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`) that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than in-memory Arrow size. For local writes it uses in-memory Arrow size as a proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization (active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`), so progress bars are always finalized.

### TODO

- Track actual bytes written to disk for local tables. This requires Lance to expose a progress callback from its write path. See lance-format/lance#6247.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-23 16:14:13 -07:00
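The Details section describes three mechanisms: shared counters behind a lock, an RAII guard for active tasks, and a `done` event that fires even on error. The Python analog below is a minimal sketch of those ideas. `ProgressTracker`, `active_task`, and `write_all` are hypothetical names, not lancedb classes. A context manager stands in for the Rust drop guard, and `try`/`finally` stands in for `FinishOnDrop`. Python has no atomics, so one lock guards every counter. The snapshot keys match the callback dictionary used in the Python example.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Iterable, Optional, Tuple


class ProgressTracker:
    """Hypothetical Python analog of the WriteProgressTracker described above."""

    def __init__(self, callback: Callable[[dict], None],
                 total_rows: Optional[int], total_tasks: int) -> None:
        self._callback = callback
        self._lock = threading.Lock()
        self._start = time.monotonic()
        self._total_rows = total_rows
        self._total_tasks = total_tasks
        self._output_rows = 0
        self._output_bytes = 0
        self._active_tasks = 0

    def _snapshot(self, done: bool) -> dict:
        # Caller must hold self._lock so the counters are read consistently.
        return {
            "elapsed": time.monotonic() - self._start,
            "output_rows": self._output_rows,
            "output_bytes": self._output_bytes,
            "total_rows": self._total_rows,
            "active_tasks": self._active_tasks,
            "total_tasks": self._total_tasks,
            "done": done,
        }

    @contextmanager
    def active_task(self):
        # Guard-style tracking: increment on entry, decrement on exit, even on error.
        with self._lock:
            self._active_tasks += 1
        try:
            yield
        finally:
            with self._lock:
                self._active_tasks -= 1

    def record(self, rows: int, nbytes: int) -> None:
        with self._lock:
            self._output_rows += rows
            self._output_bytes += nbytes
            snap = self._snapshot(done=False)
        self._callback(snap)  # invoke the user callback outside the lock

    def finish(self) -> None:
        with self._lock:
            snap = self._snapshot(done=True)
        self._callback(snap)


def write_all(batches: Iterable[Tuple[int, int]], tracker: ProgressTracker) -> None:
    """Sequential stand-in for the parallel write path; each item is (rows, bytes)."""
    try:
        for rows, nbytes in batches:
            with tracker.active_task():
                tracker.record(rows, nbytes)  # placeholder for actually writing data
    finally:
        tracker.finish()  # the final done=True event fires even if a write raised
```

Invoking the callback after releasing the lock keeps a slow callback, such as a tqdm redraw, from blocking other writers.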
Esteban Gutierrez	a0228036ae	ci: fix unused PreprocessingOutput (#3180) Simple CI fix for an unused import of `PreprocessingOutput` in table.rs. Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>	2026-03-23 13:45:44 -07:00
Will Jones	e6fd8d071e	feat(rust): parallel inserts for remote tables via multipart write (#3071) Similar to https://github.com/lancedb/lancedb/pull/3062, we can write to remote tables in parallel if the input data source is large enough. We take advantage of new endpoints coming in server version 0.4.0, which allow writing data in multiple requests and then committing at the end in a single request. To make testing easier, I also introduce a `write_parallelism` parameter. In the future, we can expose it in Python and NodeJS so users can manually specify the parallelism they get. Closes #2861 --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-20 13:19:07 -07:00
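The multipart flow described in this commit has three steps: split the input, send each part in its own request, then make everything visible with one commit request. The sketch below shows that shape with a thread pool. `upload_part` and `commit` are hypothetical callables, because the commit does not name the server endpoints. The size threshold that decides when parallel writing kicks in is also unspecified, so the sketch omits it. The parameter name `write_parallelism` echoes the commit, but this is not the Rust signature.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Sequence


def split_into_parts(rows: Sequence, write_parallelism: int) -> List[Sequence]:
    """Split rows into at most `write_parallelism` contiguous, near-equal parts."""
    if write_parallelism < 1:
        raise ValueError("write_parallelism must be >= 1")
    n_parts = min(write_parallelism, len(rows)) or 1
    base, extra = divmod(len(rows), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` parts each take one additional row.
        end = start + base + (1 if i < extra else 0)
        parts.append(rows[start:end])
        start = end
    return parts


def multipart_write(
    rows: Sequence,
    write_parallelism: int,
    upload_part: Callable[[int, Sequence], Any],  # hypothetical: one request per part
    commit: Callable[[List[Any]], Any],           # hypothetical: single final request
) -> Any:
    parts = split_into_parts(rows, write_parallelism)
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # Upload parts concurrently; map() returns results in part order.
        part_ids = list(pool.map(upload_part, range(len(parts)), parts))
    # Nothing is visible to readers until this one commit succeeds.
    return commit(part_ids)
```

Deferring visibility to a single commit lets a failed part upload abort the whole write without leaving partial data behind. That is the main reason to prefer this shape over several independent appends.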
Lance Release	3450ccaf7f	Bump version: 0.27.1-beta.0 → 0.27.1	2026-03-20 00:35:36 +00:00
Lance Release	9b229f1e7c	Bump version: 0.27.0 → 0.27.1-beta.0	2026-03-20 00:35:19 +00:00
Lance Release	bd09c53938	Bump version: 0.27.0-beta.6 → 0.27.0	2026-03-16 22:47:06 +00:00
Lance Release	0b18e33180	Bump version: 0.27.0-beta.5 → 0.27.0-beta.6	2026-03-16 22:46:48 +00:00
Mesut-Doner	c2e543f1b7	feat(rust): support Expr in projection query (#3069) Follows the existing [`Select::Dynamic`] implementation. Closes #3039	2026-03-13 12:54:26 -07:00
Weston Pace	216c1b5f77	docs: remove experimental label from optimize and warn about delete_unverified (#3128) ## Summary - Removes the "Experimental API" section from `optimize` method documentation across Rust, Python, and TypeScript - Adds a warning to `delete_unverified` documentation in all bindings: this should only be set to true if you can guarantee no other process is working on the dataset, otherwise it could be corrupted - Fixes a typo ("shoudl" → "should") Closes #3125 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 14:37:42 +08:00
Esteban Gutierrez	f951da2b00	feat: support prewarm_index and prewarm_data on remote tables (#3110) ## Summary - Implement `RemoteTable.prewarm_data(columns)` calling `POST /v1/table/{id}/page_cache/prewarm/` - Implement `RemoteTable.prewarm_index(name)` calling `POST /v1/table/{id}/index/{name}/prewarm/` (previously returned `NotSupported`) - Add `BaseTable::prewarm_data(columns)` trait method and `Table` public API in Rust core - Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`, `RemoteTable`) for `prewarm_data` - Add type stubs for `prewarm_index` and `prewarm_data` in `_lancedb.pyi` - Upgrade Lance to 3.0.0-rc.3 with breaking change fixes Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-10 15:39:39 -05:00
Lance Release	b3fc9c444f	Bump version: 0.27.0-beta.4 → 0.27.0-beta.5	2026-03-09 19:58:12 +00:00
Will Jones	5c3bd68e58	feat: upgrade Lance to 3.0.0-rc.3 (#3104) Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-03-09 12:55:20 -07:00
Xuanwo	68c07f333f	chore: unify component README titles (#3066)	2026-03-09 21:47:58 +08:00
Lance Release	814a379e08	Bump version: 0.27.0-beta.3 → 0.27.0-beta.4	2026-03-09 08:47:17 +00:00
Jack Ye	e0c5ceac03	fix: propagate managed versioning for namespace connection (#3111) Without this fix, if a user directly uses the native table for operations like `add_columns`, the namespace is not actually propagated, even when the table is configured to use a namespace DB connection. The fix brings lancedb's Python binding up to date, follows an implementation similar to https://github.com/lance-format/lance/pull/5968, and makes sure the namespace is fully propagated through all related calls. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-09 01:44:31 -07:00
Will Jones	b75991eb07	fix: propagate cast errors in `add()` (#3075) When we write data with `add()`, we cast the input data to the table's schema. However, we were using "safe" mode, which turns cast errors into nulls. For example, if you pass `u64::MAX` into a `u32` field, it writes null instead of returning an overflow error. Now the overflow error is propagated. This matches the behavior of other systems such as DuckDB. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-05 20:24:50 -08:00
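The difference between the old "safe" behavior and the new behavior is easy to see on the `u64::MAX` → `u32` example from the commit. The plain-Python illustration below does not use Arrow's cast kernels, and `cast_to_u32` is a hypothetical helper. It shows only the two semantics: out-of-range values silently become null, or the cast fails loudly.

```python
from typing import List, Optional

U32_MAX = 2**32 - 1


def cast_to_u32(values: List[Optional[int]], safe: bool) -> List[Optional[int]]:
    """Hypothetical cast of integer values into the u32 range."""
    out: List[Optional[int]] = []
    for v in values:
        if v is None or 0 <= v <= U32_MAX:
            out.append(v)  # in range (or already null): keep as-is
        elif safe:
            # Old "safe" behavior: the overflowing value silently becomes null.
            out.append(None)
        else:
            # New behavior: surface the overflow instead of losing data.
            raise OverflowError(f"value {v} does not fit in u32")
    return out
```

With `safe=True`, `cast_to_u32([1, 2**64 - 1], safe=True)` returns `[1, None]`, which drops data without any signal. With `safe=False`, the same call raises `OverflowError`.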
Wyatt Alt	97ca9bb943	feat: allow passing azure client/tenant ID through remote SDK (#3102) Prior to this commit, we supported passing the Azure storage account name to the lancedb remote SDK through headers. This adds support for client ID and tenant ID as well.	2026-03-04 11:11:36 -08:00
Xuanwo	fa1b04f341	chore: migrate Rust crates to edition 2024 and fix clippy warnings (#3098) This PR migrates all Rust crates in the workspace to Rust 2024 edition and addresses the resulting compatibility updates. It also fixes all clippy warnings surfaced by the workspace checks so the codebase remains warning-free under the current lint configuration. Context: - Scope: workspace edition bump (`2021` -> `2024`) plus follow-up refactors required by new edition and clippy rules. - Validation: `cargo fmt --all` and `cargo clippy --quiet --features remote --tests --examples -- -D warnings` both pass.	2026-03-03 16:23:29 -08:00
Wyatt Alt	bc7b344fa4	feat: add support for remote index params (#3087) Prior to this commit, the remote SDK did not support the full set of index parameters. This extends the SDK to support them.	2026-03-02 11:14:28 -08:00
Wyatt Alt	cf81b6419f	feat: add `num_deleted_rows` to delete result (#3077)	2026-03-02 08:37:14 -08:00


702 Commits