Commit Graph

2288 Commits

Author SHA1 Message Date
Lance Release
071f467571 Bump version: 0.29.0-beta.0 → 0.29.0 python-v0.29.0 2026-02-06 18:07:49 +00:00
Lance Release
f83aa25119 Bump version: 0.28.0-beta.0 → 0.29.0-beta.0 2026-02-06 18:07:48 +00:00
Jack Ye
0a8fe4d026 ci: fix python version for latest release (#2989)
It was accidentally corrupted in
https://github.com/lancedb/lancedb/pull/2972
2026-02-06 10:07:03 -08:00
Jack Ye
3ad7be9825 fix: remove x86_64-apple-darwin from list of npm triples (#2987)
Missed during https://github.com/lancedb/lancedb/pull/2987
2026-02-06 09:43:44 -08:00
LanceDB Robot
589041d842 feat: update lance dependency to v2.0.0 (#2985)
## Summary
- Bump Lance Rust crates to v2.0.0 (from v2.0.0-rc.4) and update Java
`lance-core` to 2.0.0.
- Verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all`.
- Triggering tag: v2.0.0.
2026-02-05 17:39:32 -08:00
Jack Ye
2e4cd56ab1 ci: auto-publish lancedb java sdk (#2986)
Avoid the need to manually approve an artifact release in Maven Central
2026-02-05 16:30:32 -08:00
Jack Ye
6fd8586fa7 fix: avoid force push in codex workflows to work with v0.95.0 git safety (#2981)
## Summary
- Codex CLI v0.95.0 ([PR
#10258](https://github.com/openai/codex/pull/10258)) hardened git
command safety so force push (`git push -f`, `--force`,
`--force-with-lease`, `+refspec`) now requires approval, which blocks it
in non-interactive `exec` mode.
- This broke the
[codex-update-lance-dependency](https://github.com/lancedb/lancedb/actions/runs/21727536000/job/62673436482)
workflow — the job succeeded but failed to push the branch or create the
PR.
- Replace force push with `gh api` branch deletion followed by regular
`git push`.
- Also update the script to bump Java lance-core version which was
missing previously

## Test plan
- [x] Re-run the `Codex Update Lance Dependency` workflow with a test
tag to verify the push and PR creation succeed:
https://github.com/lancedb/lancedb/pull/2983

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:57:45 -08:00
Jack Ye
6329b57604 docs: update nodejs docs for storage options APIs (#2978)
Regenerate TypeScript docs to include the new initialStorageOptions()
and latestStorageOptions() methods added in #2966.

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 16:07:58 -08:00
Will Jones
c51b13e70f ci: fix publish failure notifications being skipped (#2976)
## Summary

The `report-failure` jobs in npm, cargo, and pypi publish workflows
checked for
`release` or `workflow_dispatch` events, but these workflows are
triggered by tag
pushes where `github.event_name` is `push`. The condition was never
true, so failure
notifications were silently skipped.

- Use `startsWith(github.ref, 'refs/tags/...')` to match actual tag
triggers
- Add `failure()` to only notify on actual failures

This matches the pattern already used by `java-publish.yml`.
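
The difference between the old and new conditions can be modelled in a few lines of Python. This is only an illustrative sketch, not the workflow YAML itself: the function names and the `refs/tags/example-` prefix are hypothetical, since the real tag prefix is elided in the summary above.

```python
def old_condition(event_name: str) -> bool:
    # Old check: only matched release or manually dispatched runs.
    return event_name in ("release", "workflow_dispatch")


def new_condition(ref: str, job_failed: bool, tag_prefix: str) -> bool:
    # New check: the ref must be a matching tag AND an upstream job must have failed.
    return job_failed and ref.startswith(tag_prefix)


# A tag push arrives with event_name == "push", so the old check never matched.
event_name = "push"
ref = "refs/tags/example-v1.2.3"  # hypothetical tag ref
print(old_condition(event_name))
print(new_condition(ref, job_failed=True, tag_prefix="refs/tags/example-"))
```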

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 11:22:27 -08:00
Jack Ye
0859312b83 feat: add initial and latest storage options apis (#2966)
Expose `initial_storage_options()` and `latest_storage_options()` in
lance Dataset, in lancedb rust, python and typescript SDKs.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 10:31:39 -08:00
Weston Pace
a6e8ec8d48 ci: remove npm auth token to allow trusted publisher (#2975) 2026-02-04 07:28:42 -08:00
Jack Ye
bd2c6d0763 chore: update lance dependency to v2.0.0-rc.4 (#2972) 2026-02-03 14:38:39 -08:00
Will Jones
fbf4a53475 feat(rust): implement TableProvider::insert_into() for LanceDB tables (#2939)
Implements `InsertExec` and `RemoteInsertExec` to support running
inserts in DataFusion.

## Context

In https://github.com/lancedb/lancedb/pull/2929, I've prototyped moving
the insert pipeline into DataFusion. This will enable parallelism at two
levels:

1. Running preprocessing, such as casting the input schema or computing
embeddings
2. Writing out files

This PR is just the first part of running the actual writes. In the end,
the plans might look like:

```
InsertExec
  RepartitionExec num_partitions=<write_parallelism>
    ProjectionExec vector=compute_embedding()
      RepartitionExec num_partitions=<num_cpus>
        DataSourceExec
```

where `num_cpus` is used to take advantage of all cores, while
`write_parallelism` might be less than `num_cpus` if there are too few
rows to want to split writes across `num_cpus` files.
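
One way to picture the relationship between the two settings is the sketch below. It is an assumption for illustration only: `choose_write_parallelism` and `target_rows_per_file` are hypothetical names, and the PR does not say how `write_parallelism` is actually derived.

```python
import math


def choose_write_parallelism(num_rows: int, num_cpus: int, target_rows_per_file: int) -> int:
    """Cap the number of write partitions so small inputs are not split across many files."""
    if num_rows <= 0:
        return 1
    # Number of files needed if each file holds roughly target_rows_per_file rows.
    files_needed = math.ceil(num_rows / target_rows_per_file)
    # Never exceed the core count, and always keep at least one writer.
    return max(1, min(num_cpus, files_needed))


print(choose_write_parallelism(10_000, num_cpus=8, target_rows_per_file=1_000_000))
print(choose_write_parallelism(50_000_000, num_cpus=8, target_rows_per_file=1_000_000))
```

With few rows the result collapses to a single writer, while a large input saturates at `num_cpus`.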

Later PRs will move the preprocessing steps into DataFusion, and then
hook this up to the `Table::add()` implementations.

## Relation to future SQL work

We eventually plan on having the Remote SDK go through a FlightSQL
endpoint. Then for most queries we will send just the SQL string to the
server, and not run any sort of DataFusion plan on the client.

However, I think writes will be a little special, especially bulk writes
where we need to upload large streams of data and likely want
parallelism. So we'll have different code paths for writes, and I think
using DataFusion makes sense, especially as long as we are doing the
pre-processing on the client side still.
2026-02-03 10:38:02 -08:00
Vedant Madane
d3e15f3e17 fix(node): allow bigint[] for takeRowIds (#2916)
## Summary

This PR changes takeRowIds to accept bigint[] instead of 
number[], matching the type of _rowid returned by withRowId().

## Problem

When retrieving row IDs using `withRowId()` and querying them back with
takeRowIds(), users get an error because:

1. _rowid values are returned as JavaScript bigint
2. takeRowIds() expected number[]
3. NAPI failed to convert: Error: Failed to convert napi value BigInt
into rust type i64

## Reproduction

```js
import lancedb from '@lancedb/lancedb';

const db = await lancedb.connect('memory://');
const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0] }]);

const results = await table.query().withRowId().toArray();
const rowIds = results.map(row => row._rowid);

console.log('types:', rowIds.map(id => typeof id)); // ['bigint']
await table.takeRowIds(rowIds).toArray(); // ❌ Error before fix
```

## Solution

- Updated TypeScript signature from takeRowIds(rowIds: number[]) to
takeRowIds(rowIds: bigint[])
- Updated Rust NAPI binding to accept Vec<BigInt> and convert using
get_u64()

Fixes #2722

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-03 10:09:51 -08:00
ChinmayGowda71
9c017d8348 refactor: extract update logic to src/table/update.rs (#2964)
References #2949. Part 2 of the table.rs refactor: moved UpdateResult,
UpdateBuilder, and the execution logic to src/table/update.rs. No functional
changes; the API remains identical.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-03 09:54:19 -08:00
Rashid Ul Islam
c3cc2530b7 feat(python): expose fast_search in synchronous API (Fixes #2612) (#2962)
Fixes #2612

This PR exposes the private _fast_search attribute via a public
fast_search() method in the synchronous LanceVectorQueryBuilder.

Previously, enabling fast search in the sync API required accessing a
private member (query._fast_search = True). This change aligns the
synchronous API with the Async and Remote APIs, allowing for cleaner,
more Pythonic method chaining.

Changes:
- Added fast_search() method to LanceVectorQueryBuilder in
python/python/lancedb/query.py.
- Added a unit test verifying the flag works with high-dimensional data
(2560 dims) and chaining.

Example Usage:

Before:

```
query = table.search(vector)
query._fast_search = True  # Private attribute usage
results = query.limit(10).to_pandas()
```

After:

```
results = (
    table.search(vector)
    .fast_search()
    .limit(10)
    .to_pandas()
)
```

Verification:
I have added a test case (test_fast_search_high_dimension) that
replicates the scenario described in the issue (2560 dimensions, cosine
distance) to ensure the pipeline constructs the query correctly without
errors.

Checklist:

- [ ]  I have added tests to cover my changes.
- [ ]  All new and existing tests passed.
- [ ]  Documentation has been updated (inline docstrings).

Signed-off-by: Rashidul Islam <rasidulislam71@gmail.com>
2026-02-03 09:17:27 -08:00
Lance Release
571295b0d9 Bump version: 0.24.1 → 0.25.0-beta.0 2026-02-03 04:48:34 +00:00
Lance Release
972c682857 Bump version: 0.27.1 → 0.28.0-beta.0 python-v0.28.0-beta.0 2026-02-03 04:47:20 +00:00
LuQQiu
4f8ee82730 chore: update lance core java version to 1.0.4 (#2971) 2026-02-02 20:43:36 -08:00
Will Jones
131024839f fix: include _rowid in hash and calculated split projections (#2965)
## Summary

- PR #2957 changed the permutation builder to only select `_rowid` from
the base table, but `Splitter::project()` for hash and calculated splits
replaced the selection entirely, dropping `_rowid`.
- Include `_rowid` in the column selections for hash and calculated
split projections.
- Fix a Python test that queried the permutation table for base table
columns no longer materialized.

Fixes the `test_split_hash`, `test_split_hash_with_discard`,
`test_split_calculated`, `test_shuffle_combined_with_splits`, and
`test_filter_with_splits` failures in `test_permutation.py`.

## Test plan

- [x] `cargo test -p lancedb -- permutation` (22 passed)
- [x] `pytest python/tests/test_permutation.py` (46 passed)
- [x] `npm test __test__/permutation.test.ts` (20 passed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 16:27:58 -08:00
ChinmayGowda71
3c7ddf4d0c refactor: modularize table.rs and extract delete logic (#2952)
References #2949. Moved DeleteResult and the delete() implementation to
src/table/delete.rs. No functional changes. Added a delete test, which
passes. Will work on refactoring update next.
2026-02-02 11:54:49 -08:00
Siyuan Huang
461176f9f2 docs: update REST API link in README.md (#2906)
Fix broken REST API docs link in README.md by replacing
https://docs.lancedb.com/api-reference/introduction (404) with
https://docs.lancedb.com/api-reference/rest
2026-01-30 15:49:41 -08:00
Aman Harsh
3b8996bb69 fix(python): cancel remote queries on sync API interruption (#2913)
Fixes #2898 

Problem:
Sync API cancellations didn’t stop remote query coroutines, so requests
could continue after interrupt.

Changes:
- Cancel run_coroutine_threadsafe futures on any BaseException in the
sync background loop
- Update cancellation test to avoid starting a real background thread
and cover GeneratorExit
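
The cancellation pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the LanceDB implementation: `start_background_loop` and `run_sync` are hypothetical helper names.

```python
import asyncio
import threading


def start_background_loop() -> asyncio.AbstractEventLoop:
    # Run an event loop forever on a daemon thread (hypothetical helper).
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def run_sync(coro, loop, timeout=None):
    # Submit the coroutine to the background loop and block on the result.
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        return future.result(timeout)
    except BaseException:
        # Covers KeyboardInterrupt, GeneratorExit, timeouts, etc.
        # Cancelling the future also cancels the task on the loop.
        future.cancel()
        raise


loop = start_background_loop()
print(run_sync(asyncio.sleep(0, result="done"), loop))
```

Catching `BaseException` rather than `Exception` matters here because `KeyboardInterrupt` and `GeneratorExit` do not derive from `Exception`.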
2026-01-30 15:47:18 -08:00
Mesut-Doner
3755064e93 fix(rust): support embeddings in create_empty_table (#2961)
Fixes the Rust SDK's `create_empty_table` to properly support embedding
column definitions, bringing it to parity with the Python SDK.

## Problem

The Rust SDK's `Connection::create_empty_table` did not support setting
embedding columns. When using `.add_embedding()` on the builder, the
embedding column definitions were lost because
`TableDefinition::new_from_schema(schema)` marks all columns as physical
only, without embedding metadata.

The Python SDK worked around this by creating an empty record batch with
proper schema metadata rather than using `create_empty_table` directly.

## Solution
Modified `CreateTableBuilder<false>` to handle embeddings

Closes #2759
2026-01-30 15:44:18 -08:00
Xin Sun
8773b865a9 fix(python): uses PIL incorrectly and may raise AttributeError (#2954)
Importing `PIL` alone does not guarantee that the `Image` submodule is
loaded. In a clean environment where no other code has imported
`PIL.Image` before, `PIL.Image` does not exist on the `PIL` package,
which leads to the AttributeError.
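
The underlying import behaviour can be reproduced without Pillow installed. The sketch below builds a throwaway package (`demo_pkg` with an `image` submodule, both hypothetical stand-ins for `PIL` and `PIL.Image`) and shows that the submodule only becomes an attribute of the package once it is imported explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary package with one submodule, mirroring the PIL / PIL.Image layout.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")          # empty: imports no submodules
(pkg_dir / "image.py").write_text("VALUE = 1\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg

# Importing the package alone does not load the submodule.
loaded_before = hasattr(demo_pkg, "image")

import demo_pkg.image  # explicit submodule import, like `from PIL import Image`

loaded_after = hasattr(demo_pkg, "image")
print(loaded_before, loaded_after)
```

In real code the equivalent safe form is an explicit `from PIL import Image` (or `import PIL.Image`) before touching `PIL.Image`.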
2026-01-30 15:33:10 -08:00
fzowl
1ee29675b3 feat(python): adding VoyageAI v4 models (#2959)
Adding VoyageAI v4 models
 - with these, I added unit tests
 - added example code (tested!)
2026-01-30 15:16:03 -08:00
Weston Pace
9be28448f5 fix: don't store all columns in the permutation table (#2957)
The permutation table was always intended to be a small table of row id
pointers (and split id). However, it was accidentally doing a full
materialization of the base table 🤦

This PR changes the permutation builder to only store row id and split
id.
2026-01-29 16:06:36 -08:00
Lei Xu
357197bacc chore!: change support python version from 3.10 to 3.13 (#2955)
Python 3.9 has been EOL since Oct 2025, and the last two pyarrow builds were
against Python 3.10-3.13.

* This PR is contributed by codex-gpt5.2
2026-01-30 01:47:50 +08:00
Lei Xu
ad51e2dd1f fix: support pydantic list of structs or optional struct (#2953)
Closes #2950

*This code is generated by codex-gpt5.2*
2026-01-28 21:08:18 -08:00
Weston Pace
e9e904783c feat: allow the permutation builder memory limit to be configured by env var (#2946)
Running into issues with DF sorting again. This will at least allow the
memory limit to be set large to bypass problems.
2026-01-28 09:02:59 +05:30
Lance Release
8500b16eca Bump version: 0.24.1-beta.0 → 0.24.1 2026-01-26 23:39:18 +00:00
Lance Release
57e7282342 Bump version: 0.24.0 → 0.24.1-beta.0 2026-01-26 23:38:50 +00:00
Lance Release
cc5f8070d7 Bump version: 0.27.1-beta.0 → 0.27.1 python-v0.27.1 2026-01-26 23:38:24 +00:00
Lance Release
dc0fb01f6b Bump version: 0.27.0 → 0.27.1-beta.0 2026-01-26 23:38:23 +00:00
LanceDB Robot
94b7781551 feat: update lance dependency to v1.0.4 (#2944)
## Summary
- bump Lance dependencies to v1.0.4
- run `cargo clippy --workspace --tests --all-features -- -D warnings`
- run `cargo fmt --all`

## Testing
- `cargo clippy --workspace --tests --all-features -- -D warnings`

## Reference
- https://github.com/lance-format/lance/releases/tag/v1.0.4
2026-01-26 15:37:28 -08:00
Jack Ye
7bf020b3d5 chore: fix clippy when remote flag is not set (#2943)
Also add a step in CI to ensure this does not happen in the future
2026-01-26 13:59:31 -08:00
LanceDB Robot
12a98479dc chore: update lance dependency to v1.0.4-rc.1 (#2942)
## Summary
- bump Lance dependencies to v1.0.4-rc.1
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings`
- ran `cargo fmt --all`

## References
- https://github.com/lance-format/lance/releases/tag/v1.0.4-rc.1
2026-01-26 12:17:22 -08:00
Jack Ye
e4552e577a chore(revert): revert update lance dependency to v2.0.0-rc.1 (#2936) (#2941)
This reverts commit bd84bba14d, so that we
can bump version to 1.0.4-rc.1
2026-01-26 11:13:59 -08:00
Will Jones
f979a902ad ci(rust): fix MSRV check (#2940)
Realized our MSRV check was inert because `rust-toolchain.toml` was
overriding the Rust version. We now set the `RUSTUP_TOOLCHAIN` environment
variable, which takes precedence over `rust-toolchain.toml`.

Also needed to update to MSRV 1.88 (due to dependencies like Lance and
DataFusion) and fix some clippy warnings.
2026-01-23 15:57:09 -08:00
Colin Patrick McCabe
5a7a8da567 feat: check AZURE_STORAGE_ACCOUNT_NAME in remote conns (#2918)
Unlike in Amazon S3, in Azure bucket names are not globally unique.
Instead, the combination of (storage_account_name, bucket_name) is
unique.

Therefore, when using Azure blob store, we always need a way to
configure the storage account name. One way is to use the
storage_options hash map and set azure_storage_account_name. Another way
is to set an environment variable, AZURE_STORAGE_ACCOUNT_NAME.

Prior to this PR, the second way (environment variable) did not work
with remote connections. This is because the existing code that checks
for these environment variables happens inside the Azure object store
implementation itself, which does not run locally when using remote
connections.

This PR addresses that situation by adding a check of the environment
variable. This functions as a default if the relevant storage option is
not set in the storage_options hash map.
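
The fallback order described above can be sketched as follows. The option key and environment variable name come from the description; the function name `resolve_azure_account_name` is hypothetical, and the real check lives in the Rust client.

```python
import os
from typing import Mapping, Optional


def resolve_azure_account_name(
    storage_options: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    # An explicit storage option always wins.
    explicit = storage_options.get("azure_storage_account_name")
    if explicit:
        return explicit
    # Otherwise fall back to the environment variable, if it is set.
    return env.get("AZURE_STORAGE_ACCOUNT_NAME")


fake_env = {"AZURE_STORAGE_ACCOUNT_NAME": "from-env"}
print(resolve_azure_account_name({}, fake_env))
print(resolve_azure_account_name({"azure_storage_account_name": "explicit"}, fake_env))
```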
2026-01-22 13:36:05 -08:00
Jack Ye
0db8176445 test: fix failing remote doctest reference to aws feature (#2935)
Closes https://github.com/lancedb/lancedb/issues/2933
2026-01-22 13:17:03 -08:00
LanceDB Robot
bd84bba14d chore: update lance dependency to v2.0.0-rc.1 (#2936)
## Summary
- bump Lance dependencies to v2.0.0-rc.1 (git tag)
- align Arrow/DataFusion/PyO3 versions for the new Lance release
- update Python bindings for PyO3 0.26 (attach API + Py<PyAny>)

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Reference
- https://github.com/lance-format/lance/releases/tag/v2.0.0-rc.1

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: BubbleCal <bubble_cal@outlook.com>
2026-01-22 13:14:38 -08:00
Lance Release
ac07f8068c Bump version: 0.24.0-beta.1 → 0.24.0 2026-01-22 01:10:15 +00:00
Lance Release
bba362d372 Bump version: 0.24.0-beta.0 → 0.24.0-beta.1 2026-01-22 01:09:53 +00:00
Lance Release
042bc22468 Bump version: 0.27.0-beta.1 → 0.27.0 python-v0.27.0 2026-01-22 01:09:32 +00:00
Lance Release
68569906c6 Bump version: 0.27.0-beta.0 → 0.27.0-beta.1 2026-01-22 01:09:31 +00:00
LanceDB Robot
c71c1fc822 feat: update lance dependency to v1.0.3 (#2932)
## Summary
- bump Lance dependency to v1.0.3
- refresh Cargo metadata and lockfile

## Verification
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all

## Release
- https://github.com/lance-format/lance/releases/tag/v1.0.3
2026-01-21 17:08:24 -08:00
Jack Ye
4a6a0c856e ci: fix codex version bump title and summary (#2931)
1. use feat for releases, chore for prereleases
2. do not have literal `\n` in summary
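
Rule 1 can be illustrated with a small helper. This is a sketch under the assumption that a prerelease is detected by a SemVer-style hyphenated suffix such as `-rc.1`; the function name `bump_title` is hypothetical, and the actual workflow script may decide differently.

```python
def bump_title(version: str) -> str:
    # Drop a leading "v" and any "+build" metadata before inspecting the version.
    core = version.lstrip("v").split("+", 1)[0]
    # SemVer prereleases carry a hyphenated suffix, e.g. "1.0.4-rc.1".
    is_prerelease = "-" in core
    prefix = "chore" if is_prerelease else "feat"
    return f"{prefix}: update lance dependency to v{core}"


print(bump_title("v2.0.0"))
print(bump_title("v1.0.4-rc.1"))
```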
2026-01-21 15:45:28 -08:00
Jack Ye
f124c9d8d2 test: string type conversion in pandas 3.0+ (#2928)
Pandas 3.0+ string now converts to Arrow large_utf8. This PR mainly
makes sure our test accounts for the difference across the pandas
versions when constructing schema.
2026-01-21 13:40:48 -08:00
Jack Ye
4e65748abf chore: update lance dependency to v1.0.3-rc.1 (#2927)
Supersedes https://github.com/lancedb/lancedb/pull/2925

We accidentally upgraded lance to 2.0.0-beta.8. This PR reverts that
first and then bumps to 1.0.3-rc.1

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-21 11:52:07 -08:00