Fixes #2716
## Summary
Add support for querying with Float16Array, Float64Array, and Uint8Array
vectors in the Node.js SDK, eliminating precision loss from the previous
`Float32Array.from()` conversion.
## Implementation
Follows @wjones127's [5-step
plan](https://github.com/lancedb/lancedb/issues/2716#issuecomment-3447750543):
### Rust (`nodejs/src/query.rs`)
1. `bytes_to_arrow_array(data: Uint8Array, dtype: String)` helper that:
- Creates an Arrow `Buffer` from the raw bytes
- Wraps it in a typed `ScalarBuffer<T>` based on the dtype enum
- Constructs a `PrimitiveArray` and returns `Arc<dyn Array>`
2. `nearest_to_raw(data, dtype)` and `add_query_vector_raw(data, dtype)`
NAPI methods that pass the type-erased array to the core
`nearest_to`/`add_query_vector`, which already accept `impl
IntoQueryVector` for `Arc<dyn Array>`
### TypeScript (`nodejs/lancedb/query.ts`, `arrow.ts`)
3. Extended `IntoVector` type to include `Uint8Array` (and
`Float16Array` via runtime check for Node 22+)
4. `extractVectorBuffer()` helper detects non-Float32 typed arrays and
extracts their underlying byte buffer + dtype string
5. `nearestTo()` and `addQueryVector()` route through the raw NAPI path
when the input is Float16/Float64/Uint8
### Backward compatibility
Existing `Float32Array` and `number[]` inputs are unchanged -- they
still use the original `nearest_to(Float32Array)` NAPI method. The new
raw path is only used when a non-Float32 typed array is detected.
## Usage
```typescript
// Float16Array (Node 22+) -- no precision loss
const f16vec = new Float16Array([0.1, 0.2, 0.3]);
const f16Results = await table.query().nearestTo(f16vec).limit(10).toArray();

// Float64Array -- no precision loss
const f64vec = new Float64Array([0.1, 0.2, 0.3]);
const f64Results = await table.query().nearestTo(f64vec).limit(10).toArray();

// Uint8Array (binary embeddings)
const u8vec = new Uint8Array([1, 0, 1, 1, 0]);
const u8Results = await table.query().nearestTo(u8vec).limit(10).toArray();

// Existing usage unchanged
const results = await table.query().nearestTo([0.1, 0.2, 0.3]).limit(10).toArray();
```
## Note on dependencies
The Rust side uses the `arrow_array`, `arrow_buffer`, and `half` crates.
These should already be in the dependency tree via `lancedb` core, but
`Cargo.toml` may need explicit entries for `half` and the arrow
sub-crates in the nodejs workspace.
---------
Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Fixes #3183
## Summary
When `table.add(mode='overwrite')` is called, PyArrow infers input data
types (e.g. `list<double>`) which differ from the original table schema
(e.g. `fixed_size_list<float32>`). Previously, overwrite mode bypassed
`cast_to_table_schema()` entirely, so the inferred types replaced the
original schema, breaking vector search.
This fix builds a merged target schema for overwrite: columns present in
the existing table schema keep their original types, while columns
unique to the input pass through as-is. This way
`cast_to_table_schema()` is applied unconditionally, preserving vector
column types without blocking schema evolution.
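As a concrete illustration, here is a minimal Python repro of the
failure mode (database path, table, and column names are made up for
the example):
```python
import lancedb
import pyarrow as pa

db = lancedb.connect("/tmp/overwrite-demo")
schema = pa.schema([pa.field("vector", pa.list_(pa.float32(), 3))])
table = db.create_table("t", schema=schema)

# PyArrow infers list<double> for plain Python lists; before this fix,
# overwrite replaced the stored fixed_size_list<float32> with that.
table.add([{"vector": [0.1, 0.2, 0.3]}], mode="overwrite")
assert table.schema.field("vector").type == pa.list_(pa.float32(), 3)
```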
## Changes
- `rust/lancedb/src/table/add_data.rs`: For overwrite mode, construct a
target schema by matching input columns against the existing table
schema, then cast. Non-overwrite (append) path is unchanged.
- Added `test_add_overwrite_preserves_vector_type` test that creates a
table with `fixed_size_list<float32>`, overwrites with `list<double>`
input, and asserts the original type is preserved.
## Test Plan
- `cargo test --features remote -p lancedb -- test_add_overwrite` — all
4 overwrite tests pass
- Full suite: 454 passed, 2 failed (pre-existing `remote::retry` flakes
unrelated to this change)
---------
Signed-off-by: majiayu000 <1835304752@qq.com>
`dict.update()` mutates in place and returns `None`. Assigning its
result caused `with_metadata(None)` to strip all schema metadata when
embedding metadata was merged during `create_table` with
`embedding_functions`.
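For reference, a minimal pyarrow sketch of the pattern (names are
illustrative):
```python
import pyarrow as pa

schema = pa.schema([pa.field("text", pa.string())], metadata={"a": "1"})
merged = dict(schema.metadata or {})
new_meta = {b"embedding_functions": b"[...]"}

# Buggy: dict.update() returns None, so this is with_metadata(None)
# and the schema loses all of its metadata:
#   schema = schema.with_metadata(merged.update(new_meta))

# Fixed: mutate first, then pass the dict itself.
merged.update(new_meta)
schema = schema.with_metadata(merged)
```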
This patch mitigates template injection vulnerabilities in GitHub
Workflows by replacing direct references with an environment variable.
Aikido used AI to generate this PR.
High confidence: Aikido has a robust set of benchmarks for similar
fixes, and they are proven to be effective.
Co-authored-by: aikido-autofix[bot] <119856028+aikido-autofix[bot]@users.noreply.github.com>
Replace ~30 production `lock().unwrap()` calls that would cascade-panic
on a poisoned Mutex. Functions returning `Result` now propagate the
poison as an error via `?` (leveraging the existing `From<PoisonError>`
impl). Functions without a `Result` return recover via
`unwrap_or_else(|e| e.into_inner())`, which is safe because the guarded
data (counters, caches, RNG state) remains logically valid after a
panic.
## Summary
Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.
### Usage
Pass `progress=True` to get an automatic tqdm bar:
```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```
Or pass a tqdm bar for more control:
```python
from tqdm import tqdm
with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```
Or use a callback for custom progress handling:
```python
def on_progress(p):
print(f"{p['output_rows']}/{p['total_rows']} rows, "
f"{p['active_tasks']}/{p['total_tasks']} workers, "
f"done={p['done']}")
table.add(data, progress=on_progress)
```
In Rust:
```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```
### Details
- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.
### TODO
- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Lance v4.1.0-beta requires the `default-https-client` feature on
`aws-sdk-dynamodb` and `aws-sdk-s3`, which was introduced in the March
2025 AWS SDK release. Update all AWS SDK pins to versions from the
same AWS SDK release to maintain internal dependency compatibility.
Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
Similar to https://github.com/lancedb/lancedb/pull/3062, we can write in
parallel to remote tables if the input data source is large enough.
We take advantage of new endpoints coming in server version 0.4.0,
which allow writing data in multiple requests and then committing at
the end in a single request.
To make testing easier, I also introduce a `write_parallelism`
parameter. In the future, we can expose that in Python and NodeJS so
users can manually specify the parallelism they get.
Closes #2861
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Problem
The generated Python API docs for
`lancedb.table.IndexStatistics.index_type` were misleading because
mkdocstrings renders that field’s type annotation directly, and the
existing `Literal[...]` listed only a subset of the actual canonical SDK
index type strings.
Current (missing index types):
<img width="823" height="83" alt="image"
src="https://github.com/user-attachments/assets/f6f29fe3-4c16-4d00-a4e9-28a7cd6e19ec"
/>
## Fix
- Update the `IndexStatistics.index_type` annotation in
`python/python/lancedb/table.py` to include the full supported set of
canonical values, so the generated docs show all valid index_type
strings inline.
- Add a small regression test in `python/python/tests/test_index.py` to
ensure the docs-facing annotation does not drift silently again in case
we add a new index/quantization type in the future.
- Bump mkdocs (to 1.6) and the material theme versions to allow access
to more features like hooks
After fix (all index types are included and tested for in the
annotations):
<img width="1017" height="93" alt="image"
src="https://github.com/user-attachments/assets/66c74d5c-34b3-4b44-8173-3ee23e3648ac"
/>
When Lance 3.0.0 was released, the `check_lance_release.py` script did
not make a PR for it because the script's pick for the latest release
was a pre-release. This change may not be perfect, but it always ranks
stable releases above non-stable releases.
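The ranking rule can be sketched with `packaging.version` (the script's
actual logic may differ in detail):
```python
from packaging.version import Version

tags = ["3.0.0", "3.1.0-beta.1"]
# A stable release always outranks any pre-release, regardless of number.
latest = max(tags, key=lambda t: (not Version(t).is_prerelease, Version(t)))
assert latest == "3.0.0"
```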
When using hybrid search with a where filter, the prefilter argument is
silently inverted. Passing prefilter=True actually performs
post-filtering, and prefilter=False actually performs pre-filtering.
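A sketch of the inversion as observed from the Python API (assumes an
embedding function is configured so a plain string query works for
hybrid search; the query and filter are illustrative):
```python
results = (
    table.search("red dress", query_type="hybrid")
    .where("price < 100", prefilter=True)  # bug: actually post-filters
    .limit(10)
    .to_pandas()
)
# ...while prefilter=False actually pre-filters.
```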
## Summary
- Update all 14 lance crates from `3.0.0-rc.3` (git source) to `3.0.0`
(crates.io release)
- Remove git/tag source references since 3.0.0 is published on crates.io
## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [ ] CI passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Upgrade LocalStack from 3.3 to 4.0 in `docker-compose.yml` to fix S3
integration test failures in CI
- Version 3.3 has compatibility issues with newer Python 3.13 and
updated boto3 dependencies
- Matches the LocalStack version used successfully in the lance
repository
## Test plan
- [ ] Verify `docker compose up --detach --wait` completes successfully
in CI
- [ ] All tests in `test_s3.py` pass (5 tests)
- [ ] All `@pytest.mark.s3_test` tests in
`test_namespace_integration.py` pass (7 tests)
- [ ] No regressions in non-integration test jobs (Mac, Windows)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Move away from buildjet, which is shutting down runners for GHA [^1]
* Add `Cargo.lock` to build jobs, so when we upgrade locked dependencies
we check the builds actually pass. CI started failing because
dependencies were changed in #3116 without running all build jobs.
* Add fixes for aws-lc-rs build in NodeJS.
[^1]: https://buildjet.com/for-github-actions/blog/we-are-shutting-down
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for passing field/data type information into the
`add_columns()` method, bringing parity with the Python bindings. The
method now accepts:
- `AddColumnsSql[]` - SQL expressions (existing functionality)
- `Field` - single Arrow field with an explicit data type
- `Field[]` - array of Arrow fields with explicit data types
- `Schema` - Arrow schema with explicit data types
New columns added via `Field`/`Schema` are initialized with null
values. All field-based columns must be nullable due to the null
initialization.
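For comparison, a hedged sketch of the Python form this mirrors (column
names and expressions are illustrative):
```python
import pyarrow as pa

# SQL expressions (existing functionality)
table.add_columns({"double_price": "price * 2"})

# Explicit types; the new columns are initialized with nulls, so they
# must be nullable.
table.add_columns(pa.field("model", pa.string()))
table.add_columns([pa.field("a", pa.int32()), pa.field("b", pa.float64())])
table.add_columns(pa.schema([pa.field("tags", pa.list_(pa.string()))]))
```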
Resolves #3107
---------
Signed-off-by: Pratik <pratikrocks.dey11@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Removes the "Experimental API" section from `optimize` method
documentation across Rust, Python, and TypeScript
- Adds a warning to `delete_unverified` documentation in all bindings:
this should only be set to true if you can guarantee no other process is
working on the dataset, otherwise it could be corrupted
- Fixes a typo ("shoudl" → "should")
Closes #3125

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Implement `RemoteTable.prewarm_data(columns)` calling `POST
/v1/table/{id}/page_cache/prewarm/`
- Implement `RemoteTable.prewarm_index(name)` calling `POST
/v1/table/{id}/index/{name}/prewarm/` (previously returned
`NotSupported`)
- Add `BaseTable::prewarm_data(columns)` trait method and `Table` public
API in Rust core
- Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`,
`RemoteTable`) for `prewarm_data`
- Add type stubs for `prewarm_index` and `prewarm_data` in
`_lancedb.pyi`
- Upgrade Lance to 3.0.0-rc.3 with breaking change fixes
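A hypothetical async Python usage sketch of the new methods (table,
column, and index names are made up; see the bindings listed above for
the actual entry points):
```python
import lancedb

async def warm(db_uri: str) -> None:
    db = await lancedb.connect_async(db_uri)
    table = await db.open_table("my_table")
    await table.prewarm_data(["vector"])     # warm page cache for columns
    await table.prewarm_index("vector_idx")  # warm a specific index
```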
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Update dependencies across Rust, Python, Node.js, Java, Docker, and
docs
- Pin unpinned dependency lower bounds to prevent silent downgrades
- Bump CI actions to current major versions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When we create tables without using Arrow by parsing JS records, we
always infer to float64. Often embeddings are not float64, and it would
be nice to use the native type without requiring users to pull in
Arrow. We can use JS's builtin Float32Array to do this. This PR also
adds support for UInt8/16/32 and Int8/16/32 arrays.
Closes #3115
Without this fix, if a user directly uses the native table for
operations like `add_columns`, the namespace is not actually propagated
through, even when the connection is configured to use a namespace db
connection.
The fix brings lancedb's Python binding up to date with an
implementation similar to
https://github.com/lance-format/lance/pull/5968, making sure the
namespace is fully propagated through all the related calls.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When we write data with `add()`, we cast input data to the table's
schema. However, we were using "safe" mode, which propagates errors as
nulls. For example, if you pass `u64::MAX` into a field that is a
`u32`, it would just write null instead of raising an overflow error.
Now the overflow propagates as an error. This is the same behavior as
other systems like DuckDB.
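A sketch of the new behavior as seen from Python (schema and value are
illustrative):
```python
import lancedb
import pyarrow as pa

db = lancedb.connect("/tmp/overflow-demo")
table = db.create_table("t", schema=pa.schema([pa.field("x", pa.uint32())]))

data = pa.table({"x": pa.array([2**64 - 1], type=pa.uint64())})
table.add(data)  # previously wrote null silently; now raises on overflow
```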
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Prior to this commit we supported passing the Azure storage account
name to the LanceDB remote SDK through headers. This adds support for
client ID and tenant ID as well.