## Summary
- Update all 14 lance crates from `3.0.0-rc.3` (git source) to `3.0.0`
(crates.io release)
- Remove git/tag source references since 3.0.0 is published on crates.io
## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [ ] CI passes
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Upgrade LocalStack from 3.3 to 4.0 in `docker-compose.yml` to fix S3
integration test failures in CI
- Version 3.3 has compatibility issues with newer Python 3.13 and
updated boto3 dependencies
- Matches the LocalStack version used successfully in the lance
repository
## Test plan
- [ ] Verify `docker compose up --detach --wait` completes successfully
in CI
- [ ] All tests in `test_s3.py` pass (5 tests)
- [ ] All `@pytest.mark.s3_test` tests in
`test_namespace_integration.py` pass (7 tests)
- [ ] No regressions in non-integration test jobs (Mac, Windows)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Move away from buildjet, which is shutting down runners for GHA [^1]
* Add `Cargo.lock` to build jobs, so when we upgrade locked dependencies
we check the builds actually pass. CI started failing because
dependencies were changed in #3116 without running all build jobs.
* Add fixes for aws-lc-rs build in NodeJS.
[^1]: https://buildjet.com/for-github-actions/blog/we-are-shutting-down
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for passing field/data type information into add_columns()
method, bringing parity with Python bindings. The method now accepts:
- AddColumnsSql[] - SQL expressions (existing functionality)
- Field - single Arrow field with explicit data type
- Field[] - array of Arrow fields with explicit data types
- Schema - Arrow schema with explicit data types
New columns added via Field/Schema are initialized with null values. All
field-based columns must be nullable due to null initialization.
Resolves#3107
---------
Signed-off-by: Pratik <pratikrocks.dey11@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
## Summary
- Removes the "Experimental API" section from `optimize` method
documentation across Rust, Python, and TypeScript
- Adds a warning to `delete_unverified` documentation in all bindings:
this should only be set to true if you can guarantee no other process is
working on the dataset, otherwise it could be corrupted
- Fixes a typo ("shoudl" → "should")
Closes#3125🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Implement `RemoteTable.prewarm_data(columns)` calling `POST
/v1/table/{id}/page_cache/prewarm/`
- Implement `RemoteTable.prewarm_index(name)` calling `POST
/v1/table/{id}/index/{name}/prewarm/` (previously returned
`NotSupported`)
- Add `BaseTable::prewarm_data(columns)` trait method and `Table` public
API in Rust core
- Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`,
`RemoteTable`) for `prewarm_data`
- Add type stubs for `prewarm_index` and `prewarm_data` in
`_lancedb.pyi`
- Upgrade Lance to 3.0.0-rc.3 with breaking change fixes
Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Update dependencies across Rust, Python, Node.js, Java, Docker, and
docs
- Pin unpinned dependency lower bounds to prevent silent downgrades
- Bump CI actions to current major versions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When we create tables without using Arrow by parsing JS records we
always infer to float64. Many times embeddings are not float64 and it
would be nice to be able to use the native type without requiring users
to pull in Arrow. We can utilize JS's builtin Float32Array to do this.
This PR also adds support for UInt8/16/32 and Int8/16/32 arrays as well.
Closes#3115
Without this fix, if user directly use the native table to do operations
like `add_columns`, even if it is configured to use namespace db
connection, it is not really propagated through.
The fix is to bring lancedb's python binding up to date and do a similar
implementation as https://github.com/lance-format/lance/pull/5968, and
make sure the namespace is fully propagated through all the related
calls.
---------
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
When we write data with `add()`, we can input data to the table's
schema. However, we were using "safe" mode, which propagates errors as
nulls. For example, if you pass `u64::max` into a field that is a `u32`,
it will just write null instead of giving overflow error. Now it
propagates the overflow. This is the same behavior as other systems like
DuckDB.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Prior to this commit we supported passing the azure storage account name
to the lancedb remote SDK through headers. This adds support for client
ID and tenant ID as well.
This PR migrates all Rust crates in the workspace to Rust 2024 edition
and addresses the resulting compatibility updates. It also fixes all
clippy warnings surfaced by the workspace checks so the codebase remains
warning-free under the current lint configuration.
Context:
- Scope: workspace edition bump (`2021` -> `2024`) plus follow-up
refactors required by new edition and clippy rules.
- Validation: `cargo fmt --all` and `cargo clippy --quiet --features
remote --tests --examples -- -D warnings` both pass.
## Summary
- Add `@value_to_sql.register(dict)` handler that converts Python dicts
to DataFusion's `named_struct()` SQL syntax
- Enables updating struct-typed columns via `table.update(values={"col":
{"field_a": 1, "field_b": "hello"}})`
- Recursively handles nested structs, lists, nulls, and all existing
scalar types
Closes#1363
## Details
The `named_struct` function was introduced in DataFusion 38 and is now
available (LanceDB uses DataFusion 52.1). The implementation follows the
existing `singledispatch` pattern in `util.py`.
**Example conversion:**
```python
value_to_sql({"field_a": 1, "field_b": "hello"})
# => "named_struct('field_a', 1, 'field_b', 'hello')"
```
## Test plan
- [x] Unit tests for flat struct, nested struct, list inside struct,
mixed types, null values, and empty dict
- [ ] CI integration tests with actual table.update() on struct columns
🔗 [DataFusion named_struct
docs](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct)
This PR fixes the npm publish dry-run failure for prerelease versions
without changing the existing workflow trigger behavior. The publish
step now detects prerelease versions from `nodejs/package.json` and
always appends `--tag preview` when needed.
Context:
- On `main` pushes, the workflow still runs `npm publish --dry-run` by
design.
- Recent failures were caused by prerelease versions (for example
`0.27.0-beta.3`) running without `--tag`, which npm rejects.
- The previous `refs/tags/v...-beta...` check did not apply on branch
pushes, so dry-run could fail even though release tags worked.
We don't necessarily need to do this, but one user was confused having
used `fast_search=True` as a keyword argument for vector searches, but
being unable to do so for FTS, even after the most recent changes. I
think this is the only discrepancy in where that is possible.
## Summary
Adds a Rust expression builder API as a type-safe alternative to SQL
strings for query filters.
## Motivation
Filtering with raw SQL strings can be awkward when using variables and
special types:
Closes #3038
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
When input data is sufficiently large, we automatically split up into
parallel writes using a round-robin exchange operator. We sample the
first batch to determine data width, and target size of 1 million rows
or 2GB, whichever is smaller.
This hooks up a new writer implementation for the `add()` method. The
main immediate benefit is it allows streaming requests to remote tables,
and at the same time allowing retries for most inputs.
In NodeJS, we always convert the data to `Vec<RecordBatch>`, so it's
always retry-able.
For Python, all are retry-able, except `Iterator` and
`pa.RecordBatchReader`, which can only be consumed once. Some, like
`pa.datasets.Dataset` are retry-able *and* streaming.
A lot of the changes here are to make the new DataFusion write pipeline
maintain the same behavior as the existing Python-based preprocessing,
such as:
* casting input data to target schema
* rejecting NaN values if `on_bad_vectors="error"`
* applying embedding functions.
In future PRs, we'll enhance these by moving the embedding calls into
DataFusion and making sure we parallelize them. See:
https://github.com/lancedb/lancedb/issues/3048
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Upgrades `@napi-rs/cli` from v2 to v3, `napi`/`napi-derive` Rust
crates to 3.x
- Fixes a bug
([napi-rs#1170](https://github.com/napi-rs/napi-rs/issues/1170)) where
the CLI failed to locate the built `.node` binary when a custom Cargo
target directory is set (via `config.toml`)
## Changes
**package.json / CLI**:
- `napi.name` → `napi.binaryName`, `napi.triples` → `napi.targets`
- Removed `--no-const-enum` flag and fixed output dir arg
- `napi universal` → `napi universalize`
**Rust API migration**:
- `#[napi::module_init]` → `#[napi_derive::module_init]`
- `napi::JsObject` → `Object`, `.get::<_, T>()` → `.get::<T>()`
- `ErrorStrategy` removed; `ThreadsafeFunction` now takes an explicit
`Return` type with `CalleeHandled = false` const generic
- `JsFunction` + `create_threadsafe_function` replaced by typed
`Function<Args, Return>` + `build_threadsafe_function().build()`
- `RerankerCallbacks` struct removed (`Function<'env,...>` can't be
stored in structs); `VectorQuery::rerank` now accepts the function
directly
- `ClassInstance::clone()` now returns `ClassInstance`, fixed with
explicit deref
- `Vec<u8>` in `#[napi(object)]` now maps to `Array<number>` in v3;
changed to `Buffer` to preserve the TypeScript `Buffer` type
**TypeScript**:
- `inner.rerank({ rerankHybrid: async (_, args) => ... })` →
`inner.rerank(async (args) => ...)`
- Header provider callback wrapped in `async` to match stricter typed
constructor signature
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
`DatasetConsistencyWrapper::update()` only stored datasets with a
strictly newer
version. This caused `migrate_manifest_paths_v2` to silently drop its
update since
the migration renames files without bumping the dataset version. The
subsequent
`uses_v2_manifest_paths()` call would then return the stale cached
dataset.
Changed the version check from `>` to `>=` so same-version updates are
accepted.
## Test plan
- [x] Existing `test_create_table_v2_manifest_paths_async` Python test
should pass
- [x] Existing `should be able to migrate tables to the V2 manifest
paths` NodeJS test should pass
- [x] All dataset wrapper unit tests pass locally
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>