lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-03-24 17:40:41 +00:00

Author	SHA1	Message	Date
Lance Release	071f467571	Bump version: 0.29.0-beta.0 → 0.29.0 python-v0.29.0	2026-02-06 18:07:49 +00:00
Lance Release	f83aa25119	Bump version: 0.28.0-beta.0 → 0.29.0-beta.0	2026-02-06 18:07:48 +00:00
Jack Ye	0a8fe4d026	ci: fix python version for latest release (#2989 ) It was accidentally corrupted in https://github.com/lancedb/lancedb/pull/2972	2026-02-06 10:07:03 -08:00
Jack Ye	3ad7be9825	fix: remove x86_64-apple-darwin from list of npm triples (#2987 ) Missed during https://github.com/lancedb/lancedb/pull/2987	2026-02-06 09:43:44 -08:00
LanceDB Robot	589041d842	feat: update lance dependency to v2.0.0 (#2985 ) ## Summary - Bump Lance Rust crates to v2.0.0 (from v2.0.0-rc.4) and update Java `lance-core` to 2.0.0. - Verified `cargo clippy --workspace --tests --all-features -- -D warnings` and `cargo fmt --all`. - Triggering tag: v2.0.0.	2026-02-05 17:39:32 -08:00
Jack Ye	2e4cd56ab1	ci: auto-publish lancedb java sdk (#2986 ) Avoid the need to manually approve an artifact release in Maven Central	2026-02-05 16:30:32 -08:00
Jack Ye	6fd8586fa7	fix: avoid force push in codex workflows to work with v0.95.0 git safety (#2981 ) ## Summary - Codex CLI v0.95.0 ([PR #10258](https://github.com/openai/codex/pull/10258)) hardened git command safety so force push (`git push -f`, `--force`, `--force-with-lease`, `+refspec`) now requires approval, which blocks it in non-interactive `exec` mode. - This broke the [codex-update-lance-dependency](https://github.com/lancedb/lancedb/actions/runs/21727536000/job/62673436482) workflow — the job succeeded but failed to push the branch or create the PR. - Replace force push with `gh api` branch deletion followed by regular `git push`. - Also update the script to bump Java lance-core version which was missing previously ## Test plan - [x] Re-run the `Codex Update Lance Dependency` workflow with a test tag to verify the push and PR creation succeed: https://github.com/lancedb/lancedb/pull/2983 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-05 15:57:45 -08:00
Jack Ye	6329b57604	docs: update nodejs docs for storage options APIs (#2978 ) Regenerate TypeScript docs to include the new initialStorageOptions() and latestStorageOptions() methods added in #2966. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 16:07:58 -08:00
Will Jones	c51b13e70f	ci: fix publish failure notifications being skipped (#2976 ) ## Summary The `report-failure` jobs in npm, cargo, and pypi publish workflows checked for `release` or `workflow_dispatch` events, but these workflows are triggered by tag pushes where `github.event_name` is `push`. The condition was never true, so failure notifications were silently skipped. - Use `startsWith(github.ref, 'refs/tags/...')` to match actual tag triggers - Add `failure()` to only notify on actual failures This matches the pattern already used by `java-publish.yml`. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 11:22:27 -08:00
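The condition change described in #2976 can be modeled in a few lines. This is only a sketch in Python, not the workflow YAML itself: `old_should_report`, `new_should_report`, and the `tag_prefix` parameter are hypothetical names, and the prefix default keeps only the part of the ref that the commit message spells out.

```python
def old_should_report(event_name: str) -> bool:
    # Old check: only fired for these event names, but tag pushes
    # arrive with event_name == "push", so this was never true.
    return event_name in ("release", "workflow_dispatch")


def new_should_report(ref: str, any_job_failed: bool,
                      tag_prefix: str = "refs/tags/") -> bool:
    # New check: mirrors `failure() && startsWith(github.ref, ...)`.
    # `tag_prefix` is an assumption; the real workflows use a longer prefix.
    return any_job_failed and ref.startswith(tag_prefix)


print(old_should_report("push"))
print(new_should_report("refs/tags/python-v0.29.0", any_job_failed=True))
```

With a tag push, the old predicate never matches while the new one matches only when a job has actually failed.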
Jack Ye	0859312b83	feat: add initial and latest storage options apis (#2966 ) Expose `initial_storage_options()` and `latest_storage_options()` in lance Dataset, in lancedb rust, python and typescript SDKs. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 10:31:39 -08:00
Weston Pace	a6e8ec8d48	ci: remove npm auth token to allow trusted publisher (#2975 )	2026-02-04 07:28:42 -08:00
Jack Ye	bd2c6d0763	chore: update lance dependency to v2.0.0-rc.4 (#2972 )	2026-02-03 14:38:39 -08:00
Will Jones	fbf4a53475	feat(rust): implement `TableProvider::insert_into()` for LanceDB tables (#2939 ) Implements `InsertExec` and `RemoteInsertExec` to support running inserts in DataFusion. ## Context In https://github.com/lancedb/lancedb/pull/2929, I've prototyped moving the insert pipeline into DataFusion. This will enable parallelism at two levels: 1. Running preprocessing, such as casting the input schema or computing embeddings 2. Writing out files This PR is just the first part of running the actual writes. In the end, the plans might look like: ``` InsertExec RepartitionExec num_partitions=<write_parallelism> ProjectionExec vector=compute_embedding() RepartitionExec num_partitions=<num_cpus> DataSourceExec ``` where `num_cpus` is used to take advantage of all cores, while `write_parallelism` might be less than `num_cpus` if there are too few rows to want to split writes across `num_cpus` files. Later PRs will move the preprocessing steps into DataFusion, and then hook this up to the `Table::add()` implementations. ## Relation to future SQL work We eventually plan on having the Remote SDK go through a FlightSQL endpoint. Then for most queries we will send just the SQL string to the server, and not run any sort of DataFusion plan on the client. However, I think writes will be a little special, especially bulk writes where we need to upload large streams of data and likely want parallelism. So we'll have different code paths for writes, and I think using DataFusion makes sense, especially as long as we are doing the pre-processing on the client side still.	2026-02-03 10:38:02 -08:00
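The relationship between `num_cpus` and `write_parallelism` in #2939 could be sketched roughly as follows. This is an illustration only: the function name and the `min_rows_per_file` threshold are assumptions and are not taken from the LanceDB code.

```python
import math
import os


def choose_write_parallelism(num_rows, num_cpus=None, min_rows_per_file=100_000):
    """Pick how many files to write in parallel (hypothetical heuristic).

    Use at most one writer per CPU, but fewer when there are too few rows
    to justify splitting the data across that many files.
    """
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    wanted = math.ceil(num_rows / min_rows_per_file)
    return max(1, min(num_cpus, wanted))


print(choose_write_parallelism(250_000, num_cpus=8))
```

Preprocessing would still be repartitioned across all `num_cpus`; only the write stage is narrowed.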
Vedant Madane	d3e15f3e17	fix(node): allow bigint[] for takeRowIds (#2916 ) ## Summary This PR changes takeRowIds to accept bigint[] instead of number[], matching the type of _rowid returned by withRowId(). ## Problem When retrieving row IDs using `withRowId()` and querying them back with takeRowIds(), users get an error because: 1. _rowid values are returned as JavaScript bigint 2. takeRowIds() expected number[] 3. NAPI failed to convert: Error: Failed to convert napi value BigInt into rust type i64 ## Reproduction ```js import lancedb from '@lancedb/lancedb'; const db = await lancedb.connect('memory://'); const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0] }]); const results = await table.query().withRowId().toArray(); const rowIds = results.map(row => row._rowid); console.log('types:', rowIds.map(id => typeof id)); // ['bigint'] await table.takeRowIds(rowIds).toArray(); // ❌ Error before fix ``` ## Solution - Updated TypeScript signature from takeRowIds(rowIds: number[]) to takeRowIds(rowIds: bigint[]) - Updated Rust NAPI binding to accept Vec<BigInt> and convert using get_u64() Fixes #2722 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-03 10:09:51 -08:00
ChinmayGowda71	9c017d8348	refactor: extract update logic to src/table/update.rs (#2964 ) References #2949 Part 2 of table.rs refactor. Moved UpdateResult, UpdateBuilder, and execution logic to src/table/update.rs. No functional changes API remains identical. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-03 09:54:19 -08:00
Rashid Ul Islam	c3cc2530b7	feat(python): expose fast_search in synchronous API (Fixes #2612 ) (#2962 ) Fixes #2612 This PR exposes the private _fast_search attribute via a public fast_search() method in the synchronous LanceVectorQueryBuilder. Previously, enabling fast search in the sync API required accessing a private member (query._fast_search = True). This change aligns the synchronous API with the Async and Remote APIs, allowing for cleaner, more Pythonic method chaining. Changes: Added fast_search() method to LanceVectorQueryBuilder in python/python/lancedb/query.py. Added a unit test verifying the flag works with high-dimensional data (2560 dims) and chaining. Example Usage: Before: ``` query = table.search(vector) query._fast_search = True # Private attribute usage results = query.limit(10).to_pandas() ``` After: ``` results = ( table.search(vector) .fast_search() .limit(10) .to_pandas() ) ``` Verification: I have added a test case (test_fast_search_high_dimension) that replicates the scenario described in the issue (2560 dimensions, cosine distance) to ensure the pipeline constructs the query correctly without errors. Checklist: - [ ] I have added tests to cover my changes. - [ ] All new and existing tests passed. - [ ] Documentation has been updated (inline docstrings). Signed-off-by: Rashidul Islam <rasidulislam71@gmail.com>	2026-02-03 09:17:27 -08:00
Lance Release	571295b0d9	Bump version: 0.24.1 → 0.25.0-beta.0	2026-02-03 04:48:34 +00:00
Lance Release	972c682857	Bump version: 0.27.1 → 0.28.0-beta.0 python-v0.28.0-beta.0	2026-02-03 04:47:20 +00:00
LuQQiu	4f8ee82730	chore: update lance core java version to 1.0.4 (#2971 )	2026-02-02 20:43:36 -08:00
Will Jones	131024839f	fix: include _rowid in hash and calculated split projections (#2965 ) ## Summary - PR #2957 changed the permutation builder to only select `_rowid` from the base table, but `Splitter::project()` for hash and calculated splits replaced the selection entirely, dropping `_rowid`. - Include `_rowid` in the column selections for hash and calculated split projections. - Fix a Python test that queried the permutation table for base table columns no longer materialized. Fixes the `test_split_hash`, `test_split_hash_with_discard`, `test_split_calculated`, `test_shuffle_combined_with_splits`, and `test_filter_with_splits` failures in `test_permutation.py`. ## Test plan - [x] `cargo test -p lancedb -- permutation` (22 passed) - [x] `pytest python/tests/test_permutation.py` (46 passed) - [x] `npm test __test__/permutation.test.ts` (20 passed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-02 16:27:58 -08:00
ChinmayGowda71	3c7ddf4d0c	refactor: modularize table.rs and extract delete logic (#2952 ) References #2949 Moved DeleteResult and delete() implementation to src/table/delete.rs. No functional changes. Added a test delete which works. Will work on refactoring update next.	2026-02-02 11:54:49 -08:00
Siyuan Huang	461176f9f2	docs: update REST API link in README.md (#2906 ) Fix broken REST API docs link in README.md by replacing https://docs.lancedb.com/api-reference/introduction (404) with https://docs.lancedb.com/api-reference/rest	2026-01-30 15:49:41 -08:00
Aman Harsh	3b8996bb69	fix(python): cancel remote queries on sync API interruption (#2913 ) Fixes #2898 Problem: Sync API cancellations didn’t stop remote query coroutines, so requests could continue after interrupt. Changes: - Cancel run_coroutine_threadsafe futures on any BaseException in the sync background loop - Update cancellation test to avoid starting a real background thread and cover GeneratorExit	2026-01-30 15:47:18 -08:00
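The cancellation pattern described in #2913 can be sketched with the standard library alone. This is a minimal illustration, not the LanceDB implementation: `run_sync` and its `timeout` parameter are hypothetical, and the timeout exists only to make an interruption easy to trigger.

```python
import asyncio


def run_sync(loop, coro, timeout=None):
    """Block on `coro` running in a background `loop`.

    If the wait is interrupted by any BaseException (KeyboardInterrupt,
    GeneratorExit, a timeout, ...), cancel the future so the coroutine
    does not keep running after the caller has given up.
    """
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        return future.result(timeout)
    except BaseException:
        # Cancelling the concurrent future propagates to the asyncio task.
        future.cancel()
        raise
```

The loop is expected to be running in another thread, for example one started with `threading.Thread(target=loop.run_forever, daemon=True)`.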
Mesut-Doner	3755064e93	fix(rust): support embeddings in create_empty_table (#2961 ) Fixes the Rust SDK's `create_empty_table` to properly support embedding column definitions, bringing it to parity with the Python SDK. ## Problem The Rust SDK's `Connection::create_empty_table` did not support setting embedding columns. When using `.add_embedding()` on the builder, the embedding column definitions were lost because `TableDefinition::new_from_schema(schema)` marks all columns as physical only, without embedding metadata. The Python SDK worked around this by creating an empty record batch with proper schema metadata rather than using `create_empty_table` directly. ## Solution Modified `CreateTableBuilder<false>` to handle embeddings Closes #2759	2026-01-30 15:44:18 -08:00
Xin Sun	8773b865a9	fix(python): uses PIL incorrectly and may raise AttributeError (#2954 ) Importing `PIL` alone does not guarantee that the `Image` submodule is loaded. In a clean environment where no other code has imported `PIL.Image` before, `PIL.Image` does not exist on the `PIL` package, which leads to the AttributeError.	2026-01-30 15:33:10 -08:00
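The import issue in #2954 comes down to loading the submodule explicitly rather than relying on `import PIL` alone. A minimal sketch, assuming Pillow may or may not be installed; `load_pil_image` is a hypothetical helper, not a function from the LanceDB code.

```python
import importlib


def load_pil_image():
    """Return the PIL.Image module, or None if Pillow is not installed.

    Importing "PIL.Image" by its full name (equivalent to `import PIL.Image`)
    guarantees the submodule is loaded, so attribute access on it is safe.
    """
    try:
        return importlib.import_module("PIL.Image")
    except ImportError:
        return None


image_module = load_pil_image()
print("Pillow available:", image_module is not None)
```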
fzowl	1ee29675b3	feat(python): adding VoyageAI v4 models (#2959 ) Adding VoyageAI v4 models - with these, I added unit tests - added example code (tested!)	2026-01-30 15:16:03 -08:00
Weston Pace	9be28448f5	fix: don't store all columns in the permutation table (#2957 ) The permutation table was always intended to be a small table of row id pointers (and split id). However, it was accidentally doing a full materialization of the base table 🤦 This PR changes the permutation builder to only store row id and split id.	2026-01-29 16:06:36 -08:00
Lei Xu	357197bacc	chore!: change support python version from 3.10 to 3.13 (#2955 ) Python 3.9 has been EOL since Oct 2025, and the last two pyarrow builds were against python3.10-3.13. * This PR is contributed by codex-gpt5.2	2026-01-30 01:47:50 +08:00
Lei Xu	ad51e2dd1f	fix: support pydantic list of structs or optional struct (#2953 ) Closes #2950 This code is generated by codex-gpt5.2	2026-01-28 21:08:18 -08:00
Weston Pace	e9e904783c	feat: allow the permutation builder memory limit to be configured by env var (#2946 ) Running into issues with DF sorting again. This will at least allow the memory limit to be set large to bypass problems.	2026-01-28 09:02:59 +05:30
Lance Release	8500b16eca	Bump version: 0.24.1-beta.0 → 0.24.1	2026-01-26 23:39:18 +00:00
Lance Release	57e7282342	Bump version: 0.24.0 → 0.24.1-beta.0	2026-01-26 23:38:50 +00:00
Lance Release	cc5f8070d7	Bump version: 0.27.1-beta.0 → 0.27.1 python-v0.27.1	2026-01-26 23:38:24 +00:00
Lance Release	dc0fb01f6b	Bump version: 0.27.0 → 0.27.1-beta.0	2026-01-26 23:38:23 +00:00
LanceDB Robot	94b7781551	feat: update lance dependency to v1.0.4 (#2944 ) ## Summary - bump Lance dependencies to v1.0.4 - run `cargo clippy --workspace --tests --all-features -- -D warnings` - run `cargo fmt --all` ## Testing - `cargo clippy --workspace --tests --all-features -- -D warnings` ## Reference - https://github.com/lance-format/lance/releases/tag/v1.0.4	2026-01-26 15:37:28 -08:00
Jack Ye	7bf020b3d5	chore: fix clippy when remote flag is not set (#2943 ) Also add a step in CI to ensure this does not happen in the future	2026-01-26 13:59:31 -08:00
LanceDB Robot	12a98479dc	chore: update lance dependency to v1.0.4-rc.1 (#2942 ) ## Summary - bump Lance dependencies to v1.0.4-rc.1 - verified `cargo clippy --workspace --tests --all-features -- -D warnings` - ran `cargo fmt --all` ## References - https://github.com/lance-format/lance/releases/tag/v1.0.4-rc.1	2026-01-26 12:17:22 -08:00
Jack Ye	e4552e577a	chore(revert): revert update lance dependency to v2.0.0-rc.1 (#2936 ) (#2941 ) This reverts commit `bd84bba14d`, so that we can bump version to 1.0.4-rc.1	2026-01-26 11:13:59 -08:00
Will Jones	f979a902ad	ci(rust): fix MSRV check (#2940 ) Realized our MSRV check was inert because `rust-toolchain.toml` was overriding the Rust version. We set the `RUSTUP_TOOLCHAIN` environment variable, which overrides that. Also needed to update to MSRV 1.88 (due to dependencies like Lance and DataFusion) and fix some clippy warnings.	2026-01-23 15:57:09 -08:00
Colin Patrick McCabe	5a7a8da567	feat: check AZURE_STORAGE_ACCOUNT_NAME in remote conns (#2918 ) Unlike in Amazon S3, in Azure bucket names are not globally unique. Instead, the combination of (storage_account_name, bucket_name) is unique. Therefore, when using Azure blob store, we always need a way to configure the storage account name. One way is to use the storage_options hash map and set azure_storage_account_name. Another way is to set an environment variable, AZURE_STORAGE_ACCOUNT_NAME. Prior to this PR, the second way (environment variable) did not work with remote connections. This is because the existing code that checks for these environment variables happens inside the Azure object store implementation itself, which does not run locally when using remote connections. This PR addresses that situation by adding a check of the environment variable. This functions as a default if the relevant storage option is not set in the storage_options hash map.	2026-01-22 13:36:05 -08:00
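The fallback order described in #2918 can be sketched as below. This is an illustration under assumptions: `resolve_azure_account_name` and its `env` parameter are hypothetical names; only the `azure_storage_account_name` option key and the `AZURE_STORAGE_ACCOUNT_NAME` variable come from the commit message.

```python
import os


def resolve_azure_account_name(storage_options, env=None):
    """Return the Azure storage account name, or None if not configured.

    An explicit storage option wins; the environment variable acts only
    as a default when the option is not set.
    """
    env = os.environ if env is None else env
    if "azure_storage_account_name" in storage_options:
        return storage_options["azure_storage_account_name"]
    return env.get("AZURE_STORAGE_ACCOUNT_NAME")


print(resolve_azure_account_name({"azure_storage_account_name": "acct-a"}))
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.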
Jack Ye	0db8176445	test: fix failing remote doctest reference to aws feature (#2935 ) Closes https://github.com/lancedb/lancedb/issues/2933	2026-01-22 13:17:03 -08:00
LanceDB Robot	bd84bba14d	chore: update lance dependency to v2.0.0-rc.1 (#2936 ) ## Summary - bump Lance dependencies to v2.0.0-rc.1 (git tag) - align Arrow/DataFusion/PyO3 versions for the new Lance release - update Python bindings for PyO3 0.26 (attach API + Py<PyAny>) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Reference - https://github.com/lance-format/lance/releases/tag/v2.0.0-rc.1 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: BubbleCal <bubble_cal@outlook.com>	2026-01-22 13:14:38 -08:00
Lance Release	ac07f8068c	Bump version: 0.24.0-beta.1 → 0.24.0	2026-01-22 01:10:15 +00:00
Lance Release	bba362d372	Bump version: 0.24.0-beta.0 → 0.24.0-beta.1	2026-01-22 01:09:53 +00:00
Lance Release	042bc22468	Bump version: 0.27.0-beta.1 → 0.27.0 python-v0.27.0	2026-01-22 01:09:32 +00:00
Lance Release	68569906c6	Bump version: 0.27.0-beta.0 → 0.27.0-beta.1	2026-01-22 01:09:31 +00:00
LanceDB Robot	c71c1fc822	feat: update lance dependency to v1.0.3 (#2932 ) ## Summary - bump Lance dependency to v1.0.3 - refresh Cargo metadata and lockfile ## Verification - cargo clippy --workspace --tests --all-features -- -D warnings - cargo fmt --all ## Release - https://github.com/lance-format/lance/releases/tag/v1.0.3	2026-01-21 17:08:24 -08:00
Jack Ye	4a6a0c856e	ci: fix codex version bump title and summary (#2931 ) 1. use feat for releases, chore for prereleases 2. do not have literal `\n` in summary	2026-01-21 15:45:28 -08:00
Jack Ye	f124c9d8d2	test: string type conversion in pandas 3.0+ (#2928 ) Pandas 3.0+ string now converts to Arrow large_utf8. This PR mainly makes sure our test accounts for the difference across the pandas versions when constructing schema.	2026-01-21 13:40:48 -08:00
Jack Ye	4e65748abf	chore: update lance dependency to v1.0.3-rc.1 (#2927 ) Supersedes https://github.com/lancedb/lancedb/pull/2925 We accidentally upgraded lance to 2.0.0-beta.8. This PR reverts that first and then bumps to 1.0.3-rc.1 --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-21 11:52:07 -08:00

2288 Commits