Commit Graph

2442 Commits

Author SHA1 Message Date
Weston Pace
874b74dd3c feat: update lance dependency to v4.0.0-rc.3 (#3187)
## Summary
- Update all lance workspace dependencies from v3.0.1 (crates.io) to
v4.0.0-rc.3 (git tag)
- Pin AWS SDK transitive dependencies to versions compatible with Rust
1.91.0 MSRV

## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [x] Python bindings compile (`cargo check -p lancedb-python`)
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 09:20:29 -07:00
Lance Release
61de47f3a5 Bump version: 0.27.1 → 0.27.2-beta.0 2026-03-25 03:23:28 +00:00
Lance Release
f4d613565e Bump version: 0.30.1 → 0.30.2-beta.0 python-v0.30.2-beta.0 2026-03-25 03:22:55 +00:00
Wyatt Alt
410ab9b6fe Revert "feat: allow passing azure client/tenant ID through remote SDK" (#3185)
Reverts lancedb/lancedb#3102
2026-03-24 20:17:40 -07:00
Will Jones
1d6e00b902 feat: progress bar for add() (#3067)
## Summary

Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.

### Usage

Pass `progress=True` to get an automatic tqdm bar:

```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```

Or pass a tqdm bar for more control:

```python
from tqdm import tqdm

with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```

Or use a callback for custom progress handling:

```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")

table.add(data, progress=on_progress)
```

In Rust:

```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```

### Details

- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.

### TODO

- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 16:14:13 -07:00
Esteban Gutierrez
a0228036ae ci: fix unused PreprocessingOutput (#3180)
Simple fix to for CI due unused import of PreprocessingOutput in
table.rs

Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
2026-03-23 13:45:44 -07:00
Esteban Gutierrez
d8fc071a7d fix(ci): bump AWS SDK MSRV pins to March 2025 release (#3179)
Lance v4.1.0-beta requires the default-https-client feature on
aws-sdk-dynamodb and aws-sdk-s3, which was introduced in the March
2025 AWS SDK release. Update all AWS SDK pins to versions from the
same AWS SDK release to maintain internal dependency compatibility.

Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
2026-03-23 15:30:33 -05:00
Will Jones
e6fd8d071e feat(rust): parallel inserts for remote tables via multipart write (#3071)
Similar to https://github.com/lancedb/lancedb/pull/3062, we can write in
parallel to remote tables if the input data source is large enough.

We take advantage of new endpoints coming in server version 0.4.0, which
allow writing data in multiple requests, and the committing at the end
in a single request.

To make testing easier, I also introduce a `write_parallelism`
parameter. In the future, we can expose that in Python and NodeJS so
users can manually specify the parallelism they get.

Closes #2861

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 13:19:07 -07:00
LanceDB Robot
670dcca551 feat: update lance dependency to v3.0.1 (#3168)
## Summary
- Updated Lance Rust workspace dependencies to `3.0.1` using
`ci/set_lance_version.py`.
- Updated Java `lance-core` dependency property in `java/pom.xml` to
`3.0.1`.
- Refreshed `Cargo.lock` entries for Lance crates at `3.0.1`.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Trigger
- Tag:
[`refs/tags/v3.0.1`](https://github.com/lancedb/lance/tree/v3.0.1)

Co-authored-by: Esteban Gutierrez <estebangtz@gmail.com>
2026-03-20 09:53:20 -07:00
Prashanth Rao
ed7e01a58b docs: fix rendering issues with missing index types in API docs (#3143)
## Problem

The generated Python API docs for
`lancedb.table.IndexStatistics.index_type` were misleading because
mkdocstrings renders that field’s type annotation directly, and the
existing `Literal[...]` listed only a subset of the actual canonical SDK
index type strings.

Current (missing index types):
<img width="823" height="83" alt="image"
src="https://github.com/user-attachments/assets/f6f29fe3-4c16-4d00-a4e9-28a7cd6e19ec"
/>


## Fix

- Update the `IndexStatistics.index_type` annotation in
`python/python/lancedb/table.py` to include the full supported set of
canonical values, so the generated docs show all valid index_type
strings inline.
- Add a small regression test in `python/python/tests/test_index.py` to
ensure the docs-facing annotation does not drift silently again in case
we add a new index/quantization type in the future.
- Bumps mkdocs and material theme versions to mkdocs 1.6 to allow access
to more features like hooks

After fix (all index types are included and tested for in the
annotations):
<img width="1017" height="93" alt="image"
src="https://github.com/user-attachments/assets/66c74d5c-34b3-4b44-8173-3ee23e3648ac"
/>
2026-03-20 09:34:42 -07:00
Lance Release
3450ccaf7f Bump version: 0.27.1-beta.0 → 0.27.1 2026-03-20 00:35:36 +00:00
Lance Release
9b229f1e7c Bump version: 0.27.0 → 0.27.1-beta.0 2026-03-20 00:35:19 +00:00
Lance Release
f5b21c0aa4 Bump version: 0.30.1-beta.0 → 0.30.1 python-v0.30.1 2026-03-20 00:35:03 +00:00
Lance Release
e927924d26 Bump version: 0.30.0 → 0.30.1-beta.0 2026-03-20 00:35:02 +00:00
Weston Pace
11a4966bfc feat: upgrade lance dependency to v3.0.1 (#3157)
## Summary
- Upgrade all lance-* dependencies from v3.0.0 to v3.0.1 (stable, from
crates.io)

## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [x] `cargo fmt --all --check` passes
- [ ] CI tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 17:30:46 -07:00
Weston Pace
dd5aaa72dc ci: modify check_lance_release.py to prefer stable releases over betas (#3146)
When Lance 3.0.0 released the check_lance_release.py script did not make
a PR for it because it was a pre-release. This change may not be perfect
but it always ranks stable releases above non-stable releases.
2026-03-17 09:21:30 -07:00
marca116
3a200d77ef fix: pre-filtering on hybrid search (#3096)
When using hybrid search with a where filter, the prefilter argument is
silently inverted. Passing prefilter=True actually performs
post-filtering, and prefilter=False actually performs pre-filtering.
2026-03-16 21:48:42 -07:00
Lance Release
bd09c53938 Bump version: 0.27.0-beta.6 → 0.27.0 2026-03-16 22:47:06 +00:00
Lance Release
0b18e33180 Bump version: 0.27.0-beta.5 → 0.27.0-beta.6 2026-03-16 22:46:48 +00:00
Lance Release
c89240b16c Bump version: 0.30.0-beta.6 → 0.30.0 python-v0.30.0 2026-03-16 22:46:19 +00:00
Lance Release
099ff355a4 Bump version: 0.30.0-beta.5 → 0.30.0-beta.6 2026-03-16 22:46:17 +00:00
Weston Pace
c5995fda67 feat: update lance dependency to 3.0.0 release (#3137)
## Summary
- Update all 14 lance crates from `3.0.0-rc.3` (git source) to `3.0.0`
(crates.io release)
- Remove git/tag source references since 3.0.0 is published on crates.io

## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 15:29:18 -07:00
Weston Pace
25eb1fbfa4 fix: restore storage options on copy in localstack tests (#3148) 2026-03-16 14:02:19 -07:00
Weston Pace
4ac41c5c3f fix(ci): upgrade LocalStack to 4.0 for S3 integration tests (#3147)
## Summary
- Upgrade LocalStack from 3.3 to 4.0 in `docker-compose.yml` to fix S3
integration test failures in CI
- Version 3.3 has compatibility issues with newer Python 3.13 and
updated boto3 dependencies
- Matches the LocalStack version used successfully in the lance
repository

## Test plan
- [ ] Verify `docker compose up --detach --wait` completes successfully
in CI
- [ ] All tests in `test_s3.py` pass (5 tests)
- [ ] All `@pytest.mark.s3_test` tests in
`test_namespace_integration.py` pass (7 tests)
- [ ] No regressions in non-integration test jobs (Mac, Windows)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 09:02:11 -07:00
Will Jones
9a5b0398ec chore: fix ci (#3139)
* Move away from buildjet, which is shutting down runners for GHA [^1]
* Add `Cargo.lock` to build jobs, so when we upgrade locked dependencies
we check the builds actually pass. CI started failing because
dependencies were changed in #3116 without running all build jobs.
* Add fixes for aws-lc-rs build in NodeJS.

[^1]: https://buildjet.com/for-github-actions/blog/we-are-shutting-down

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 06:25:40 -07:00
Pratik Dey
d1d720d08a feat(nodejs): support field/data type input in add_columns() method (#3114)
Add support for passing field/data type information into add_columns()
method, bringing parity with Python bindings. The method now accepts:

- AddColumnsSql[] - SQL expressions (existing functionality)
- Field - single Arrow field with explicit data type
- Field[] - array of Arrow fields with explicit data types
- Schema - Arrow schema with explicit data types

New columns added via Field/Schema are initialized with null values. All
field-based columns must be nullable due to null initialization.

Resolves #3107

---------

Signed-off-by: Pratik <pratikrocks.dey11@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-13 12:57:14 -07:00
Mesut-Doner
c2e543f1b7 feat(rust): support Expr in projection query (#3069)
Referred and followed [`Select::Dynamic`] implementation. 

Closes #3039
2026-03-13 12:54:26 -07:00
Weston Pace
216c1b5f77 docs: remove experimental label from optimize and warn about delete_unverified (#3128)
## Summary
- Removes the "Experimental API" section from `optimize` method
documentation across Rust, Python, and TypeScript
- Adds a warning to `delete_unverified` documentation in all bindings:
this should only be set to true if you can guarantee no other process is
working on the dataset, otherwise it could be corrupted
- Fixes a typo ("shoudl" → "should")

Closes #3125


🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:37:42 +08:00
Xin Sun
fc1867da83 chore: remove the duplicate snafu-derive dependency in the lockfile (#3124) 2026-03-10 21:43:51 -07:00
Esteban Gutierrez
f951da2b00 feat: support prewarm_index and prewarm_data on remote tables (#3110)
## Summary

- Implement `RemoteTable.prewarm_data(columns)` calling `POST
/v1/table/{id}/page_cache/prewarm/`
- Implement `RemoteTable.prewarm_index(name)` calling `POST
/v1/table/{id}/index/{name}/prewarm/` (previously returned
`NotSupported`)
- Add `BaseTable::prewarm_data(columns)` trait method and `Table` public
API in Rust core
- Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`,
`RemoteTable`) for `prewarm_data`
- Add type stubs for `prewarm_index` and `prewarm_data` in
`_lancedb.pyi`
- Upgrade Lance to 3.0.0-rc.3 with breaking change fixes

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:39:39 -05:00
Esteban Gutierrez
6530d82690 chore: dependency updates and security fixes (#3116)
## Summary

- Update dependencies across Rust, Python, Node.js, Java, Docker, and
docs
- Pin unpinned dependency lower bounds to prevent silent downgrades
- Bump CI actions to current major versions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:04:27 -07:00
Lance Release
b3fc9c444f Bump version: 0.27.0-beta.4 → 0.27.0-beta.5 2026-03-09 19:58:12 +00:00
Lance Release
6de8f42dcd Bump version: 0.30.0-beta.4 → 0.30.0-beta.5 python-v0.30.0-beta.5 2026-03-09 19:56:15 +00:00
Will Jones
5c3bd68e58 feat: upgrade Lance to 3.0.0-rc.3 (#3104)
Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-03-09 12:55:20 -07:00
Weston Pace
4be85444f0 feat: infer js native arrays (#3119)
When we create tables without using Arrow by parsing JS records we
always infer to float64. Many times embeddings are not float64 and it
would be nice to be able to use the native type without requiring users
to pull in Arrow. We can utilize JS's builtin Float32Array to do this.
This PR also adds support for UInt8/16/32 and Int8/16/32 arrays as well.

Closes #3115
2026-03-09 10:13:59 -07:00
Xuanwo
68c07f333f chore: unify component README titles (#3066) 2026-03-09 21:47:58 +08:00
Lance Release
814a379e08 Bump version: 0.27.0-beta.3 → 0.27.0-beta.4 2026-03-09 08:47:17 +00:00
Lance Release
f31561c5bb Bump version: 0.30.0-beta.3 → 0.30.0-beta.4 python-v0.30.0-beta.4 2026-03-09 08:45:25 +00:00
Jack Ye
e0c5ceac03 fix: propagate managed versioning for namespace connection (#3111)
Without this fix, if user directly use the native table to do operations
like `add_columns`, even if it is configured to use namespace db
connection, it is not really propagated through.

The fix is to bring lancedb's python binding up to date and do a similar
implementation as https://github.com/lance-format/lance/pull/5968, and
make sure the namespace is fully propagated through all the related
calls.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-09 01:44:31 -07:00
Prashanth Rao
e93bb3355a docs: add meth/func names to mkdocstrings (#3101)
LanceDB's SDK API docs do not currently show method names under any
given object, and this makes it harder to quickly understand and find
relevant method names for a given class. Geneva docs show the available
methods in the right navigation.

This PR standardizes the appearance of the LanceDB SDK API in the docs
to be more similar to Geneva's.
<img width="1386" height="792" alt="image"
src="https://github.com/user-attachments/assets/30816591-d6d5-495d-886d-e234beeb6059"
/>

<img width="897" height="540" alt="image"
src="https://github.com/user-attachments/assets/d5491b6b-c7bf-4d3b-8b15-1a1a7700e7c9"
/>
2026-03-06 08:54:45 -08:00
Will Jones
b75991eb07 fix: propagate cast errors in add() (#3075)
When we write data with `add()`, we can input data to the table's
schema. However, we were using "safe" mode, which propagates errors as
nulls. For example, if you pass `u64::max` into a field that is a `u32`,
it will just write null instead of giving overflow error. Now it
propagates the overflow. This is the same behavior as other systems like
DuckDB.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 20:24:50 -08:00
Wyatt Alt
97ca9bb943 feat: allow passing azure client/tenant ID through remote SDK (#3102)
Prior to this commit we supported passing the azure storage account name
to the lancedb remote SDK through headers. This adds support for client
ID and tenant ID as well.
2026-03-04 11:11:36 -08:00
Xuanwo
fa1b04f341 chore: migrate Rust crates to edition 2024 and fix clippy warnings (#3098)
This PR migrates all Rust crates in the workspace to Rust 2024 edition
and addresses the resulting compatibility updates. It also fixes all
clippy warnings surfaced by the workspace checks so the codebase remains
warning-free under the current lint configuration.

Context:
- Scope: workspace edition bump (`2021` -> `2024`) plus follow-up
refactors required by new edition and clippy rules.
- Validation: `cargo fmt --all` and `cargo clippy --quiet --features
remote --tests --examples -- -D warnings` both pass.
2026-03-03 16:23:29 -08:00
mrncstt
367abe99d2 feat(python): support dict to SQL struct conversion in table.update() (#3089)
## Summary

- Add `@value_to_sql.register(dict)` handler that converts Python dicts
to DataFusion's `named_struct()` SQL syntax
- Enables updating struct-typed columns via `table.update(values={"col":
{"field_a": 1, "field_b": "hello"}})`
- Recursively handles nested structs, lists, nulls, and all existing
scalar types

Closes #1363

## Details

The `named_struct` function was introduced in DataFusion 38 and is now
available (LanceDB uses DataFusion 52.1). The implementation follows the
existing `singledispatch` pattern in `util.py`.

**Example conversion:**
```python
value_to_sql({"field_a": 1, "field_b": "hello"})
# => "named_struct('field_a', 1, 'field_b', 'hello')"
```

## Test plan

- [x] Unit tests for flat struct, nested struct, list inside struct,
mixed types, null values, and empty dict
- [ ] CI integration tests with actual table.update() on struct columns

🔗 [DataFusion named_struct
docs](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct)
2026-03-03 13:36:08 -08:00
Xuanwo
52ce2c995c fix(ci): only run npm publish on release tags (#3093)
This PR fixes the npm publish dry-run failure for prerelease versions
without changing the existing workflow trigger behavior. The publish
step now detects prerelease versions from `nodejs/package.json` and
always appends `--tag preview` when needed.

Context:
- On `main` pushes, the workflow still runs `npm publish --dry-run` by
design.
- Recent failures were caused by prerelease versions (for example
`0.27.0-beta.3`) running without `--tag`, which npm rejects.
- The previous `refs/tags/v...-beta...` check did not apply on branch
pushes, so dry-run could fail even though release tags worked.
2026-03-04 01:35:10 +08:00
Sean Mackrory
e71a00998c ci: add regression test for fastSearch in FTS queries in TypeScript (#3090)
We recently added support for this for the Python bindings, and wanted
to confirm this already worked as expected in the TS bindings.
2026-03-03 07:09:09 -08:00
Sean Mackrory
39a2ac0a1c feat: add parity between fast_search keyword argument between vector and FTS searches (#3091)
We don't necessarily need to do this, but one user was confused having
used `fast_search=True` as a keyword argument for vector searches, but
being unable to do so for FTS, even after the most recent changes. I
think this is the only discrepancy in where that is possible.
2026-03-03 05:21:36 -08:00
Wyatt Alt
bc7b344fa4 feat: add support for remote index params (#3087)
Prior to this commit the remote SDK did not support the full set of
index parameters. This extends the SDK to support them.
2026-03-02 11:14:28 -08:00
Will Jones
f91d2f5fec ci(python): pin maturin to work around bug (#3088)
Work around for https://github.com/PyO3/maturin/issues/3059
2026-03-02 09:38:54 -08:00
Wyatt Alt
cf81b6419f feat: add num_deleted_rows to delete result (#3077) 2026-03-02 08:37:14 -08:00