## Summary
- Codex CLI v0.95.0 ([PR
#10258](https://github.com/openai/codex/pull/10258)) hardened git
command safety so force push (`git push -f`, `--force`,
`--force-with-lease`, `+refspec`) now requires approval, which blocks it
in non-interactive `exec` mode.
- This broke the
[codex-update-lance-dependency](https://github.com/lancedb/lancedb/actions/runs/21727536000/job/62673436482)
workflow — the job succeeded but failed to push the branch or create the
PR.
- Replace force push with `gh api` branch deletion followed by regular
`git push`.
- Also update the script to bump Java lance-core version which was
missing previously
## Test plan
- [x] Re-run the `Codex Update Lance Dependency` workflow with a test
tag to verify the push and PR creation succeed:
https://github.com/lancedb/lancedb/pull/2983🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Regenerate TypeScript docs to include the new initialStorageOptions()
and latestStorageOptions() methods added in #2966.
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
## Summary
The `report-failure` jobs in npm, cargo, and pypi publish workflows
checked for
`release` or `workflow_dispatch` events, but these workflows are
triggered by tag
pushes where `github.event_name` is `push`. The condition was never
true, so failure
notifications were silently skipped.
- Use `startsWith(github.ref, 'refs/tags/...')` to match actual tag
triggers
- Add `failure()` to only notify on actual failures
This matches the pattern already used by `java-publish.yml`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Expose `initial_storage_options()` and `latest_storage_options()` in
lance Dataset, in lancedb rust, python and typescript SDKs.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Implements `InsertExec` and `RemoteInsertExec` to support running
inserts in DataFusion.
## Context
In https://github.com/lancedb/lancedb/pull/2929, I've prototyped moving
the insert pipeline into DataFusion. This will enable parallelism at two
levels:
1. Running preprocessing, such as casting the input schema or computing
embeddings
2. Writing out files
This PR is just the first part of running the actual writes. In the end,
the plans might look like:
```
InsertExec
RepartitionExec num_partitions=<write_parallelism>
ProjectionExec vector=compute_embedding()
RepartitionExec num_partitions=<num_cpus>
DataSourceExec
```
where `num_cpus` is used to take advantage of all cores, while
`write_parallelism` might be less than `num_cpus` if there are too few
rows to want to split writes across `num_cpus` files.
Later PRs will move the preprocessing steps into DataFusion, and then
hook this up to the `Table::add()` implementations.
## Relation to future SQL work
We eventually plan on having the Remote SDK go through a FlightSQL
endpoint. Then for most queries we will send just the SQL string to the
server, and not run any sort of DataFusion plan on the client.
However, I think writes will be a little special, especially bulk writes
where we need to upload large streams of data and likely want
parallelism. So we'll have different code paths for writes, and I think
using DataFusion makes sense, especially as long as we are doing the
pre-processing on the client side still.
## Summary
This PR changes takeRowIds to accept bigint[] instead of
number[], matching the type of _rowid returned by withRowId().
## Problem
When retrieving row IDs using \withRowId()\ and querying them back with
takeRowIds(), users get an error because:
1. _rowid values are returned as JavaScript bigint
2. takeRowIds() expected number[]
3. NAPI failed to convert: Error: Failed to convert napi value BigInt
into rust type i64
## Reproduction
\\\js
import lancedb from '@lancedb/lancedb';
const db = await lancedb.connect('memory://');
const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0]
}]);
const results = await table.query().withRowId().toArray();
const rowIds = results.map(row => row._rowid);
console.log('types:', rowIds.map(id => typeof id)); // ['bigint']
await table.takeRowIds(rowIds).toArray(); // ⌠Error before fix
\\\
## Solution
- Updated TypeScript signature from takeRowIds(rowIds: number[]) to
takeRowIds(rowIds: bigint[])
- Updated Rust NAPI binding to accept Vec<BigInt> and convert using
get_u64()
Fixes#2722
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
References #2949 Part 2 of table.rs refactor. Moved UpdateResult,
UpdateBuilder, and execution logic to src/table/update.rs. No functional
changes API remains identical.
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
Fixes#2612
This PR exposes the private _fast_search attribute via a public
fast_search() method in the synchronous LanceVectorQueryBuilder.
Previously, enabling fast search in the sync API required accessing a
private member (query._fast_search = True). This change aligns the
synchronous API with the Async and Remote APIs, allowing for cleaner,
more Pythonic method chaining.
Changes:
Added fast_search() method to LanceVectorQueryBuilder in
python/python/lancedb/query.py.
Added a unit test verifying the flag works with high-dimensional data
(2560 dims) and chaining.
Example Usage:
Before:
```
query = table.search(vector)
query._fast_search = True # Private attribute usage
results = query.limit(10).to_pandas()
```
After:
```
results = (
table.search(vector)
.fast_search()
.limit(10)
.to_pandas()
)
```
Verification:
I have added a test case (test_fast_search_high_dimension) that
replicates the scenario described in the issue (2560 dimensions, cosine
distance) to ensure the pipeline constructs the query correctly without
errors.
Checklist:
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
- [ ] Documentation has been updated (inline docstrings).
Signed-off-by: Rashidul Islam <rasidulislam71@gmail.com>
## Summary
- PR #2957 changed the permutation builder to only select `_rowid` from
the base table, but `Splitter::project()` for hash and calculated splits
replaced the selection entirely, dropping `_rowid`.
- Include `_rowid` in the column selections for hash and calculated
split projections.
- Fix a Python test that queried the permutation table for base table
columns no longer materialized.
Fixes the `test_split_hash`, `test_split_hash_with_discard`,
`test_split_calculated`, `test_shuffle_combined_with_splits`, and
`test_filter_with_splits` failures in `test_permutation.py`.
## Test plan
- [x] `cargo test -p lancedb -- permutation` (22 passed)
- [x] `pytest python/tests/test_permutation.py` (46 passed)
- [x] `npm test __test__/permutation.test.ts` (20 passed)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
References #2949 Moved DeleteResult and delete() implementation to
src/table/delete.rs. No functional changes. Added a test delete which
works. Will work on refactoring update next.
Fixes#2898
Problem:
Sync API cancellations didn’t stop remote query coroutines, so requests
could continue after interrupt.
Changes:
- Cancel run_coroutine_threadsafe futures on any BaseException in the
sync background loop
- Update cancellation test to avoid starting a real background thread
and cover GeneratorExit
Fixes the Rust SDK's `create_empty_table` to properly support embedding
column definitions, bringing it to parity with the Python SDK.
## Problem
The Rust SDK's `Connection::create_empty_table` did not support setting
embedding columns. When using `.add_embedding()` on the builder, the
embedding column definitions were lost because
`TableDefinition::new_from_schema(schema)` marks all columns as physical
only, without embedding metadata.
The Python SDK worked around this by creating an empty record batch with
proper schema metadata rather than using `create_empty_table` directly.
## Solution
Modified `CreateTableBuilder<false>` to handle embeddings
Closes#2759
Importing `PIL` alone does not guarantee that the `Image` submodule is
loaded. In a clean environment where no other code has imported
`PIL.Image` before, `PIL.Image` does not exist on the `PIL` package,
which leads to the AttributeError.
The permutation table was always intended to be a small table of row id
pointers (and split id). However, it was accidentally doing a full
materialization of the base table 🤦
This PR changes the permutation builder to only store row id and split
id.
Realized our MSRV check was inert because `rust-toolchain.toml` was
overriding the Rust version. We set the `RUSTUP_TOOLCHAIN` environment
variable, which overrides that.
Also needed to update to MSRV 1.88 (due to dependencies like Lance and
DataFusion) and fix some clippy warnings.
Unlike in Amazon S3, in Azure bucket names are not globally unique.
Instead, the combination of (storage_account_name, bucket_name) is
unique.
Therefore, when using Azure blob store, we always need a way to
configure the storage account name. One way is to use the
storage_options hash map and set azure_storage_account_name. Another way
is to set an environment variable, AZURE_STORAGE_ACCOUNT_NAME.
Prior to this PR, the second way (environment variable) did not work
with remote connections. This is because the existing code that checks
for these environment variables happens inside the Azure object store
implementation itself, which does not run locally when using remote
connections.
This PR addresses that situation by adding a check of the environment
variable. This functions as a default if the relevant storage option is
not set in the storage_options hash map.
Pandas 3.0+ string now converts to Arrow large_utf8. This PR mainly
makes sure our test accounts for the difference across the pandas
versions when constructing schema.