Compare commits

...

49 Commits

Author SHA1 Message Date
Dan Tasse
cfa057351f lint 2026-03-25 11:49:13 -04:00
Dan Tasse
a5a80acf8b fix: change _client reference to _conn 2026-03-25 11:44:13 -04:00
Will Jones
1d6e00b902 feat: progress bar for add() (#3067)
## Summary

Adds progress reporting for `table.add()` so users can track large write
operations. The progress callback is available in Rust, Python (sync and
async), and through the PyO3 bindings.

### Usage

Pass `progress=True` to get an automatic tqdm bar:

```python
table.add(data, progress=True)
# 100%|██████████| 1000000/1000000 [00:12<00:00, 82345 rows/s, 45.2 MB/s | 4/4 workers]
```

Or pass a tqdm bar for more control:

```python
from tqdm import tqdm

with tqdm(unit=" rows") as pbar:
    table.add(data, progress=pbar)
```

Or use a callback for custom progress handling:

```python
def on_progress(p):
    print(f"{p['output_rows']}/{p['total_rows']} rows, "
          f"{p['active_tasks']}/{p['total_tasks']} workers, "
          f"done={p['done']}")

table.add(data, progress=on_progress)
```

In Rust:

```rust
table.add(data)
    .progress(|p| println!("{}/{:?} rows", p.output_rows(), p.total_rows()))
    .execute()
    .await?;
```

### Details

- `WriteProgress` struct in Rust with getters for `elapsed`,
`output_rows`, `output_bytes`, `total_rows`, `active_tasks`,
`total_tasks`, and `done`. Fields are private behind getters so new
fields can be added without breaking changes.
- `WriteProgressTracker` tracks progress across parallel write tasks
using a mutex for row/byte counts and atomics for active task counts.
- Active task tracking uses an RAII guard pattern (`ActiveTaskGuard`)
that increments on creation and decrements on drop.
- For remote writes, `output_bytes` reflects IPC wire bytes rather than
in-memory Arrow size. For local writes it uses in-memory Arrow size as a
proxy (see TODO below).
- tqdm postfix displays throughput (MB/s) and worker utilization
(active/total).
- The `done` callback always fires, even on error (via `FinishOnDrop`),
so progress bars are always finalized.

### TODO

- Track actual bytes written to disk for local tables. This requires
Lance to expose a progress callback from its write path. See
lance-format/lance#6247.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 16:14:13 -07:00
Esteban Gutierrez
a0228036ae ci: fix unused PreprocessingOutput (#3180)
Simple fix to for CI due unused import of PreprocessingOutput in
table.rs

Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
2026-03-23 13:45:44 -07:00
Esteban Gutierrez
d8fc071a7d fix(ci): bump AWS SDK MSRV pins to March 2025 release (#3179)
Lance v4.1.0-beta requires the default-https-client feature on
aws-sdk-dynamodb and aws-sdk-s3, which was introduced in the March
2025 AWS SDK release. Update all AWS SDK pins to versions from the
same AWS SDK release to maintain internal dependency compatibility.

Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
2026-03-23 15:30:33 -05:00
Will Jones
e6fd8d071e feat(rust): parallel inserts for remote tables via multipart write (#3071)
Similar to https://github.com/lancedb/lancedb/pull/3062, we can write in
parallel to remote tables if the input data source is large enough.

We take advantage of new endpoints coming in server version 0.4.0, which
allow writing data in multiple requests, and the committing at the end
in a single request.

To make testing easier, I also introduce a `write_parallelism`
parameter. In the future, we can expose that in Python and NodeJS so
users can manually specify the parallelism they get.

Closes #2861

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 13:19:07 -07:00
LanceDB Robot
670dcca551 feat: update lance dependency to v3.0.1 (#3168)
## Summary
- Updated Lance Rust workspace dependencies to `3.0.1` using
`ci/set_lance_version.py`.
- Updated Java `lance-core` dependency property in `java/pom.xml` to
`3.0.1`.
- Refreshed `Cargo.lock` entries for Lance crates at `3.0.1`.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Trigger
- Tag:
[`refs/tags/v3.0.1`](https://github.com/lancedb/lance/tree/v3.0.1)

Co-authored-by: Esteban Gutierrez <estebangtz@gmail.com>
2026-03-20 09:53:20 -07:00
Prashanth Rao
ed7e01a58b docs: fix rendering issues with missing index types in API docs (#3143)
## Problem

The generated Python API docs for
`lancedb.table.IndexStatistics.index_type` were misleading because
mkdocstrings renders that field’s type annotation directly, and the
existing `Literal[...]` listed only a subset of the actual canonical SDK
index type strings.

Current (missing index types):
<img width="823" height="83" alt="image"
src="https://github.com/user-attachments/assets/f6f29fe3-4c16-4d00-a4e9-28a7cd6e19ec"
/>


## Fix

- Update the `IndexStatistics.index_type` annotation in
`python/python/lancedb/table.py` to include the full supported set of
canonical values, so the generated docs show all valid index_type
strings inline.
- Add a small regression test in `python/python/tests/test_index.py` to
ensure the docs-facing annotation does not drift silently again in case
we add a new index/quantization type in the future.
- Bumps mkdocs and material theme versions to mkdocs 1.6 to allow access
to more features like hooks

After fix (all index types are included and tested for in the
annotations):
<img width="1017" height="93" alt="image"
src="https://github.com/user-attachments/assets/66c74d5c-34b3-4b44-8173-3ee23e3648ac"
/>
2026-03-20 09:34:42 -07:00
Lance Release
3450ccaf7f Bump version: 0.27.1-beta.0 → 0.27.1 2026-03-20 00:35:36 +00:00
Lance Release
9b229f1e7c Bump version: 0.27.0 → 0.27.1-beta.0 2026-03-20 00:35:19 +00:00
Lance Release
f5b21c0aa4 Bump version: 0.30.1-beta.0 → 0.30.1 2026-03-20 00:35:03 +00:00
Lance Release
e927924d26 Bump version: 0.30.0 → 0.30.1-beta.0 2026-03-20 00:35:02 +00:00
Weston Pace
11a4966bfc feat: upgrade lance dependency to v3.0.1 (#3157)
## Summary
- Upgrade all lance-* dependencies from v3.0.0 to v3.0.1 (stable, from
crates.io)

## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [x] `cargo fmt --all --check` passes
- [ ] CI tests pass

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 17:30:46 -07:00
Weston Pace
dd5aaa72dc ci: modify check_lance_release.py to prefer stable releases over betas (#3146)
When Lance 3.0.0 released the check_lance_release.py script did not make
a PR for it because it was a pre-release. This change may not be perfect
but it always ranks stable releases above non-stable releases.
2026-03-17 09:21:30 -07:00
marca116
3a200d77ef fix: pre-filtering on hybrid search (#3096)
When using hybrid search with a where filter, the prefilter argument is
silently inverted. Passing prefilter=True actually performs
post-filtering, and prefilter=False actually performs pre-filtering.
2026-03-16 21:48:42 -07:00
Lance Release
bd09c53938 Bump version: 0.27.0-beta.6 → 0.27.0 2026-03-16 22:47:06 +00:00
Lance Release
0b18e33180 Bump version: 0.27.0-beta.5 → 0.27.0-beta.6 2026-03-16 22:46:48 +00:00
Lance Release
c89240b16c Bump version: 0.30.0-beta.6 → 0.30.0 2026-03-16 22:46:19 +00:00
Lance Release
099ff355a4 Bump version: 0.30.0-beta.5 → 0.30.0-beta.6 2026-03-16 22:46:17 +00:00
Weston Pace
c5995fda67 feat: update lance dependency to 3.0.0 release (#3137)
## Summary
- Update all 14 lance crates from `3.0.0-rc.3` (git source) to `3.0.0`
(crates.io release)
- Remove git/tag source references since 3.0.0 is published on crates.io

## Test plan
- [x] `cargo check --features remote --tests --examples` passes
- [x] `cargo clippy --features remote --tests --examples` passes
- [ ] CI passes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 15:29:18 -07:00
Weston Pace
25eb1fbfa4 fix: restore storage options on copy in localstack tests (#3148) 2026-03-16 14:02:19 -07:00
Weston Pace
4ac41c5c3f fix(ci): upgrade LocalStack to 4.0 for S3 integration tests (#3147)
## Summary
- Upgrade LocalStack from 3.3 to 4.0 in `docker-compose.yml` to fix S3
integration test failures in CI
- Version 3.3 has compatibility issues with newer Python 3.13 and
updated boto3 dependencies
- Matches the LocalStack version used successfully in the lance
repository

## Test plan
- [ ] Verify `docker compose up --detach --wait` completes successfully
in CI
- [ ] All tests in `test_s3.py` pass (5 tests)
- [ ] All `@pytest.mark.s3_test` tests in
`test_namespace_integration.py` pass (7 tests)
- [ ] No regressions in non-integration test jobs (Mac, Windows)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 09:02:11 -07:00
Will Jones
9a5b0398ec chore: fix ci (#3139)
* Move away from buildjet, which is shutting down runners for GHA [^1]
* Add `Cargo.lock` to build jobs, so when we upgrade locked dependencies
we check the builds actually pass. CI started failing because
dependencies were changed in #3116 without running all build jobs.
* Add fixes for aws-lc-rs build in NodeJS.

[^1]: https://buildjet.com/for-github-actions/blog/we-are-shutting-down

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 06:25:40 -07:00
Pratik Dey
d1d720d08a feat(nodejs): support field/data type input in add_columns() method (#3114)
Add support for passing field/data type information into add_columns()
method, bringing parity with Python bindings. The method now accepts:

- AddColumnsSql[] - SQL expressions (existing functionality)
- Field - single Arrow field with explicit data type
- Field[] - array of Arrow fields with explicit data types
- Schema - Arrow schema with explicit data types

New columns added via Field/Schema are initialized with null values. All
field-based columns must be nullable due to null initialization.

Resolves #3107

---------

Signed-off-by: Pratik <pratikrocks.dey11@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-03-13 12:57:14 -07:00
Mesut-Doner
c2e543f1b7 feat(rust): support Expr in projection query (#3069)
Referred and followed [`Select::Dynamic`] implementation. 

Closes #3039
2026-03-13 12:54:26 -07:00
Weston Pace
216c1b5f77 docs: remove experimental label from optimize and warn about delete_unverified (#3128)
## Summary
- Removes the "Experimental API" section from `optimize` method
documentation across Rust, Python, and TypeScript
- Adds a warning to `delete_unverified` documentation in all bindings:
this should only be set to true if you can guarantee no other process is
working on the dataset, otherwise it could be corrupted
- Fixes a typo ("shoudl" → "should")

Closes #3125


🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:37:42 +08:00
Xin Sun
fc1867da83 chore: remove the duplicate snafu-derive dependency in the lockfile (#3124) 2026-03-10 21:43:51 -07:00
Esteban Gutierrez
f951da2b00 feat: support prewarm_index and prewarm_data on remote tables (#3110)
## Summary

- Implement `RemoteTable.prewarm_data(columns)` calling `POST
/v1/table/{id}/page_cache/prewarm/`
- Implement `RemoteTable.prewarm_index(name)` calling `POST
/v1/table/{id}/index/{name}/prewarm/` (previously returned
`NotSupported`)
- Add `BaseTable::prewarm_data(columns)` trait method and `Table` public
API in Rust core
- Add PyO3 bindings and Python API (`AsyncTable`, `LanceTable`,
`RemoteTable`) for `prewarm_data`
- Add type stubs for `prewarm_index` and `prewarm_data` in
`_lancedb.pyi`
- Upgrade Lance to 3.0.0-rc.3 with breaking change fixes

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 15:39:39 -05:00
Esteban Gutierrez
6530d82690 chore: dependency updates and security fixes (#3116)
## Summary

- Update dependencies across Rust, Python, Node.js, Java, Docker, and
docs
- Pin unpinned dependency lower bounds to prevent silent downgrades
- Bump CI actions to current major versions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 20:04:27 -07:00
Lance Release
b3fc9c444f Bump version: 0.27.0-beta.4 → 0.27.0-beta.5 2026-03-09 19:58:12 +00:00
Lance Release
6de8f42dcd Bump version: 0.30.0-beta.4 → 0.30.0-beta.5 2026-03-09 19:56:15 +00:00
Will Jones
5c3bd68e58 feat: upgrade Lance to 3.0.0-rc.3 (#3104)
Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-03-09 12:55:20 -07:00
Weston Pace
4be85444f0 feat: infer js native arrays (#3119)
When we create tables without using Arrow by parsing JS records we
always infer to float64. Many times embeddings are not float64 and it
would be nice to be able to use the native type without requiring users
to pull in Arrow. We can utilize JS's builtin Float32Array to do this.
This PR also adds support for UInt8/16/32 and Int8/16/32 arrays as well.

Closes #3115
2026-03-09 10:13:59 -07:00
Xuanwo
68c07f333f chore: unify component README titles (#3066) 2026-03-09 21:47:58 +08:00
Lance Release
814a379e08 Bump version: 0.27.0-beta.3 → 0.27.0-beta.4 2026-03-09 08:47:17 +00:00
Lance Release
f31561c5bb Bump version: 0.30.0-beta.3 → 0.30.0-beta.4 2026-03-09 08:45:25 +00:00
Jack Ye
e0c5ceac03 fix: propagate managed versioning for namespace connection (#3111)
Without this fix, if user directly use the native table to do operations
like `add_columns`, even if it is configured to use namespace db
connection, it is not really propagated through.

The fix is to bring lancedb's python binding up to date and do a similar
implementation as https://github.com/lance-format/lance/pull/5968, and
make sure the namespace is fully propagated through all the related
calls.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-09 01:44:31 -07:00
Prashanth Rao
e93bb3355a docs: add meth/func names to mkdocstrings (#3101)
LanceDB's SDK API docs do not currently show method names under any
given object, and this makes it harder to quickly understand and find
relevant method names for a given class. Geneva docs show the available
methods in the right navigation.

This PR standardizes the appearance of the LanceDB SDK API in the docs
to be more similar to Geneva's.
<img width="1386" height="792" alt="image"
src="https://github.com/user-attachments/assets/30816591-d6d5-495d-886d-e234beeb6059"
/>

<img width="897" height="540" alt="image"
src="https://github.com/user-attachments/assets/d5491b6b-c7bf-4d3b-8b15-1a1a7700e7c9"
/>
2026-03-06 08:54:45 -08:00
Will Jones
b75991eb07 fix: propagate cast errors in add() (#3075)
When we write data with `add()`, we can input data to the table's
schema. However, we were using "safe" mode, which propagates errors as
nulls. For example, if you pass `u64::max` into a field that is a `u32`,
it will just write null instead of giving overflow error. Now it
propagates the overflow. This is the same behavior as other systems like
DuckDB.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 20:24:50 -08:00
Wyatt Alt
97ca9bb943 feat: allow passing azure client/tenant ID through remote SDK (#3102)
Prior to this commit we supported passing the azure storage account name
to the lancedb remote SDK through headers. This adds support for client
ID and tenant ID as well.
2026-03-04 11:11:36 -08:00
Xuanwo
fa1b04f341 chore: migrate Rust crates to edition 2024 and fix clippy warnings (#3098)
This PR migrates all Rust crates in the workspace to Rust 2024 edition
and addresses the resulting compatibility updates. It also fixes all
clippy warnings surfaced by the workspace checks so the codebase remains
warning-free under the current lint configuration.

Context:
- Scope: workspace edition bump (`2021` -> `2024`) plus follow-up
refactors required by new edition and clippy rules.
- Validation: `cargo fmt --all` and `cargo clippy --quiet --features
remote --tests --examples -- -D warnings` both pass.
2026-03-03 16:23:29 -08:00
mrncstt
367abe99d2 feat(python): support dict to SQL struct conversion in table.update() (#3089)
## Summary

- Add `@value_to_sql.register(dict)` handler that converts Python dicts
to DataFusion's `named_struct()` SQL syntax
- Enables updating struct-typed columns via `table.update(values={"col":
{"field_a": 1, "field_b": "hello"}})`
- Recursively handles nested structs, lists, nulls, and all existing
scalar types

Closes #1363

## Details

The `named_struct` function was introduced in DataFusion 38 and is now
available (LanceDB uses DataFusion 52.1). The implementation follows the
existing `singledispatch` pattern in `util.py`.

**Example conversion:**
```python
value_to_sql({"field_a": 1, "field_b": "hello"})
# => "named_struct('field_a', 1, 'field_b', 'hello')"
```

## Test plan

- [x] Unit tests for flat struct, nested struct, list inside struct,
mixed types, null values, and empty dict
- [ ] CI integration tests with actual table.update() on struct columns

🔗 [DataFusion named_struct
docs](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct)
2026-03-03 13:36:08 -08:00
Xuanwo
52ce2c995c fix(ci): only run npm publish on release tags (#3093)
This PR fixes the npm publish dry-run failure for prerelease versions
without changing the existing workflow trigger behavior. The publish
step now detects prerelease versions from `nodejs/package.json` and
always appends `--tag preview` when needed.

Context:
- On `main` pushes, the workflow still runs `npm publish --dry-run` by
design.
- Recent failures were caused by prerelease versions (for example
`0.27.0-beta.3`) running without `--tag`, which npm rejects.
- The previous `refs/tags/v...-beta...` check did not apply on branch
pushes, so dry-run could fail even though release tags worked.
2026-03-04 01:35:10 +08:00
Sean Mackrory
e71a00998c ci: add regression test for fastSearch in FTS queries in TypeScript (#3090)
We recently added support for this for the Python bindings, and wanted
to confirm this already worked as expected in the TS bindings.
2026-03-03 07:09:09 -08:00
Sean Mackrory
39a2ac0a1c feat: add parity between fast_search keyword argument between vector and FTS searches (#3091)
We don't necessarily need to do this, but one user was confused having
used `fast_search=True` as a keyword argument for vector searches, but
being unable to do so for FTS, even after the most recent changes. I
think this is the only discrepancy in where that is possible.
2026-03-03 05:21:36 -08:00
Wyatt Alt
bc7b344fa4 feat: add support for remote index params (#3087)
Prior to this commit the remote SDK did not support the full set of
index parameters. This extends the SDK to support them.
2026-03-02 11:14:28 -08:00
Will Jones
f91d2f5fec ci(python): pin maturin to work around bug (#3088)
Work around for https://github.com/PyO3/maturin/issues/3059
2026-03-02 09:38:54 -08:00
Wyatt Alt
cf81b6419f feat: add num_deleted_rows to delete result (#3077) 2026-03-02 08:37:14 -08:00
Lance Release
0498ac1f2f Bump version: 0.27.0-beta.2 → 0.27.0-beta.3 2026-02-28 01:31:51 +00:00
143 changed files with 6709 additions and 4477 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.27.0-beta.2"
current_version = "0.27.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -29,6 +29,7 @@ runs:
if: ${{ inputs.arm-build == 'false' }}
uses: PyO3/maturin-action@v1
with:
maturin-version: "1.12.4"
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
@@ -44,6 +45,7 @@ runs:
if: ${{ inputs.arm-build == 'true' }}
uses: PyO3/maturin-action@v1
with:
maturin-version: "1.12.4"
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"

View File

@@ -20,6 +20,7 @@ runs:
uses: PyO3/maturin-action@v1
with:
command: build
maturin-version: "1.12.4"
# TODO: pass through interpreter
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"

View File

@@ -25,6 +25,7 @@ runs:
uses: PyO3/maturin-action@v1
with:
command: build
maturin-version: "1.12.4"
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python

View File

@@ -15,7 +15,7 @@ jobs:
name: Label PR
runs-on: ubuntu-latest
steps:
- uses: srvaroa/labeler@master
- uses: srvaroa/labeler@v1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
commitlint:
@@ -24,7 +24,7 @@ jobs:
name: Verify PR title / description conforms to semantic-release
runs-on: ubuntu-latest
steps:
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
with:
node-version: "18"
# These rules are disabled because Github will always ensure there
@@ -47,7 +47,7 @@ jobs:
${{ github.event.pull_request.body }}
- if: failure()
uses: actions/github-script@v6
uses: actions/github-script@v7
with:
script: |
const message = `**ACTION NEEDED**

View File

@@ -53,7 +53,7 @@ jobs:
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
- name: Set up node
uses: actions/setup-node@v3
uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'
@@ -68,7 +68,7 @@ jobs:
run: |
PYTHONPATH=. mkdocs build
- name: Setup Pages
uses: actions/configure-pages@v2
uses: actions/configure-pages@v5
- name: Upload artifact
uses: actions/upload-pages-artifact@v3
with:

View File

@@ -7,6 +7,7 @@ on:
pull_request:
paths:
- Cargo.toml
- Cargo.lock
- nodejs/**
- rust/**
- docs/src/js/**
@@ -37,7 +38,7 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'
@@ -77,7 +78,7 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
name: Setup Node.js 20 for build
with:
# @napi-rs/cli v3 requires Node >= 20.12 (via @inquirer/prompts@8).
@@ -94,7 +95,7 @@ jobs:
run: |
npm ci --include=optional
npm run build:debug -- --profile ci
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
name: Setup Node.js ${{ matrix.node-version }} for test
with:
node-version: ${{ matrix.node-version }}
@@ -143,7 +144,7 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
- uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'

View File

@@ -19,6 +19,7 @@ on:
paths:
- .github/workflows/npm-publish.yml
- Cargo.toml # Change in dependency frequently breaks builds
- Cargo.lock
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
@@ -124,7 +125,12 @@ jobs:
pre_build: |-
set -e &&
apt-get update &&
apt-get install -y protobuf-compiler pkg-config
apt-get install -y protobuf-compiler pkg-config &&
# The base image (manylinux2014-cross) sets TARGET_CC to the old
# GCC 4.8 cross-compiler. aws-lc-sys checks TARGET_CC before CC,
# so it picks up GCC even though the napi-rs image sets CC=clang.
# Override to use the image's clang-18 which supports -fuse-ld=lld.
export TARGET_CC=clang TARGET_CXX=clang++
- target: x86_64-unknown-linux-musl
# This one seems to need some extra memory
host: ubuntu-2404-8x-x64
@@ -144,9 +150,10 @@ jobs:
set -e &&
apt-get update &&
apt-get install -y protobuf-compiler pkg-config &&
# https://github.com/aws/aws-lc-rs/issues/737#issuecomment-2725918627
ln -s /usr/aarch64-unknown-linux-gnu/lib/gcc/aarch64-unknown-linux-gnu/4.8.5/crtbeginS.o /usr/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot/usr/lib/crtbeginS.o &&
ln -s /usr/aarch64-unknown-linux-gnu/lib/gcc /usr/aarch64-unknown-linux-gnu/aarch64-unknown-linux-gnu/sysroot/usr/lib/gcc &&
export TARGET_CC=clang TARGET_CXX=clang++ &&
# The manylinux2014 sysroot has glibc 2.17 headers which lack
# AT_HWCAP2 (added in Linux 3.17). Define it for aws-lc-sys.
export CFLAGS="$CFLAGS -DAT_HWCAP2=26" &&
rustup target add aarch64-unknown-linux-gnu
- target: aarch64-unknown-linux-musl
host: ubuntu-2404-8x-x64
@@ -266,7 +273,7 @@ jobs:
- target: x86_64-unknown-linux-gnu
host: ubuntu-latest
- target: aarch64-unknown-linux-gnu
host: buildjet-16vcpu-ubuntu-2204-arm
host: ubuntu-2404-8x-arm64
node:
- '20'
runs-on: ${{ matrix.settings.host }}
@@ -356,7 +363,8 @@ jobs:
if [[ $DRY_RUN == "true" ]]; then
ARGS="$ARGS --dry-run"
fi
if [[ $GITHUB_REF =~ refs/tags/v(.*)-beta.* ]]; then
VERSION=$(node -p "require('./package.json').version")
if [[ $VERSION == *-* ]]; then
ARGS="$ARGS --tag preview"
fi
npm publish $ARGS

View File

@@ -9,6 +9,7 @@ on:
paths:
- .github/workflows/pypi-publish.yml
- Cargo.toml # Change in dependency frequently breaks builds
- Cargo.lock
env:
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"

View File

@@ -7,9 +7,14 @@ on:
pull_request:
paths:
- Cargo.toml
- Cargo.lock
- python/**
- rust/**
- .github/workflows/python.yml
- .github/workflows/build_linux_wheel/**
- .github/workflows/build_mac_wheel/**
- .github/workflows/build_windows_wheel/**
- .github/workflows/run_tests/**
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

View File

@@ -7,6 +7,7 @@ on:
pull_request:
paths:
- Cargo.toml
- Cargo.lock
- rust/**
- .github/workflows/rust.yml
@@ -206,14 +207,14 @@ jobs:
- name: Downgrade dependencies
# These packages have newer requirements for MSRV
run: |
cargo update -p aws-sdk-bedrockruntime --precise 1.64.0
cargo update -p aws-sdk-dynamodb --precise 1.55.0
cargo update -p aws-config --precise 1.5.10
cargo update -p aws-sdk-kms --precise 1.51.0
cargo update -p aws-sdk-s3 --precise 1.65.0
cargo update -p aws-sdk-sso --precise 1.50.0
cargo update -p aws-sdk-ssooidc --precise 1.51.0
cargo update -p aws-sdk-sts --precise 1.51.0
cargo update -p aws-sdk-bedrockruntime --precise 1.77.0
cargo update -p aws-sdk-dynamodb --precise 1.68.0
cargo update -p aws-config --precise 1.6.0
cargo update -p aws-sdk-kms --precise 1.63.0
cargo update -p aws-sdk-s3 --precise 1.79.0
cargo update -p aws-sdk-sso --precise 1.62.0
cargo update -p aws-sdk-ssooidc --precise 1.63.0
cargo update -p aws-sdk-sts --precise 1.63.0
cargo update -p home --precise 0.5.9
- name: cargo +${{ matrix.msrv }} check
env:

426
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -5,7 +5,7 @@ exclude = ["python"]
resolver = "2"
[workspace.package]
edition = "2021"
edition = "2024"
authors = ["LanceDB Devs <dev@lancedb.com>"]
license = "Apache-2.0"
repository = "https://github.com/lancedb/lancedb"
@@ -15,20 +15,20 @@ categories = ["database-implementations"]
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance = { version = "=3.0.1", default-features = false }
lance-core = { version = "=3.0.1" }
lance-datagen = { version = "=3.0.1" }
lance-file = { version = "=3.0.1" }
lance-io = { version = "=3.0.1", default-features = false }
lance-index = { version = "=3.0.1" }
lance-linalg = { version = "=3.0.1" }
lance-namespace = { version = "=3.0.1" }
lance-namespace-impls = { version = "=3.0.1", default-features = false }
lance-table = { version = "=3.0.1" }
lance-testing = { version = "=3.0.1" }
lance-datafusion = { version = "=3.0.1" }
lance-encoding = { version = "=3.0.1" }
lance-arrow = { version = "=3.0.1" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "57.2", optional = false }

View File

@@ -3,6 +3,7 @@
from __future__ import annotations
import argparse
import functools
import json
import os
import re
@@ -26,6 +27,7 @@ SEMVER_RE = re.compile(
)
@functools.total_ordering
@dataclass(frozen=True)
class SemVer:
major: int
@@ -156,7 +158,9 @@ def read_current_version(repo_root: Path) -> str:
def determine_latest_tag(tags: Iterable[TagInfo]) -> TagInfo:
return max(tags, key=lambda tag: tag.semver)
# Stable releases (no prerelease) are always preferred over pre-releases.
# Within each group, standard semver ordering applies.
return max(tags, key=lambda tag: (not tag.semver.prerelease, tag.semver))
def write_outputs(args: argparse.Namespace, payload: dict) -> None:

View File

@@ -1,7 +1,7 @@
version: "3.9"
services:
localstack:
image: localstack/localstack:3.3
image: localstack/localstack:4.0
ports:
- 4566:4566
environment:

View File

@@ -1,27 +1,27 @@
#Simple base dockerfile that supports basic dependencies required to run lance with FTS and Hybrid Search
#Usage docker build -t lancedb:latest -f Dockerfile .
FROM python:3.10-slim-buster
# Simple base dockerfile that supports basic dependencies required to run lance with FTS and Hybrid Search
# Usage: docker build -t lancedb:latest -f Dockerfile .
FROM python:3.12-slim-bookworm
# Install Rust
RUN apt-get update && apt-get install -y curl build-essential && \
curl https://sh.rustup.rs -sSf | sh -s -- -y
# Set the environment variable for Rust
ENV PATH="/root/.cargo/bin:${PATH}"
# Install protobuf compiler
RUN apt-get install -y protobuf-compiler && \
# Install build dependencies in a single layer
RUN apt-get update && \
apt-get install -y --no-install-recommends \
curl \
build-essential \
protobuf-compiler \
git \
ca-certificates && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get -y update &&\
apt-get -y upgrade && \
apt-get -y install git
# Install Rust (pinned installer, non-interactive)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable --profile minimal
# Set the environment variable for Rust
ENV PATH="/root/.cargo/bin:${PATH}"
# Verify installations
RUN python --version && \
rustc --version && \
protoc --version
RUN pip install tantivy lancedb
RUN pip install --no-cache-dir tantivy lancedb

View File

@@ -52,14 +52,21 @@ plugins:
options:
docstring_style: numpy
heading_level: 3
show_source: true
show_symbol_type_in_heading: true
show_signature_annotations: true
show_root_heading: true
show_docstring_examples: true
show_docstring_attributes: false
show_docstring_other_parameters: true
show_symbol_type_heading: true
show_labels: false
show_if_no_docstring: true
show_source: false
members_order: source
docstring_section_style: list
signature_crossrefs: true
separate_signature: true
filters:
- "!^_"
import:
# for cross references
- https://arrow.apache.org/docs/objects.inv
@@ -113,7 +120,7 @@ markdown_extensions:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- markdown.extensions.toc:
toc_depth: 3
toc_depth: 4
permalink: true
permalink_title: Anchor link to this section

View File

@@ -1,9 +1,9 @@
mkdocs==1.5.3
mkdocs==1.6.1
mkdocs-jupyter==0.24.1
mkdocs-material==9.5.3
mkdocs-autorefs<=1.0
mkdocstrings[python]==0.25.2
griffe
mkdocs-render-swagger-plugin
pydantic
mkdocs-redirects
mkdocs-material==9.6.23
mkdocs-autorefs>=0.5,<=1.0
mkdocstrings[python]>=0.24,<1.0
griffe>=0.40,<1.0
mkdocs-render-swagger-plugin>=0.1.0
pydantic>=2.0,<3.0
mkdocs-redirects>=1.2.0

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.27.0-beta.2</version>
<version>0.27.1</version>
</dependency>
```

View File

@@ -71,11 +71,12 @@ Add new columns with defined values.
#### Parameters
* **newColumnTransforms**: [`AddColumnsSql`](../interfaces/AddColumnsSql.md)[]
pairs of column names and
the SQL expression to use to calculate the value of the new column. These
expressions will be evaluated for each row in the table, and can
reference existing columns in the table.
* **newColumnTransforms**: `Field`&lt;`any`&gt; \| `Field`&lt;`any`&gt;[] \| `Schema`&lt;`any`&gt; \| [`AddColumnsSql`](../interfaces/AddColumnsSql.md)[]
Either:
- An array of objects with column names and SQL expressions to calculate values
- A single Arrow Field defining one column with its data type (column will be initialized with null values)
- An array of Arrow Fields defining columns with their data types (columns will be initialized with null values)
- An Arrow Schema defining columns with their data types (columns will be initialized with null values)
#### Returns
@@ -484,19 +485,7 @@ Modeled after ``VACUUM`` in PostgreSQL.
- Prune: Removes old versions of the dataset
- Index: Optimizes the indices, adding new data to existing indices
Experimental API
----------------
The optimization process is undergoing active development and may change.
Our goal with these changes is to improve the performance of optimization and
reduce the complexity.
That being said, it is essential today to run optimize if you want the best
performance. It should be stable and safe to use in production, but it our
hope that the API may be simplified (or not even need to be called) in the
future.
The frequency an application shoudl call optimize is based on the frequency of
The frequency an application should call optimize is based on the frequency of
data modifications. If data is frequently added, deleted, or updated then
optimize should be run frequently. A good rule of thumb is to run optimize if
you have added or modified 100,000 or more records or run more than 20 data

View File

@@ -8,6 +8,14 @@
## Properties
### numDeletedRows
```ts
numDeletedRows: number;
```
***
### version
```ts

View File

@@ -37,3 +37,12 @@ tbl.optimize({cleanupOlderThan: new Date()});
```ts
deleteUnverified: boolean;
```
Because they may be part of an in-progress transaction, files newer than
7 days old are not deleted by default. If you are sure that there are no
in-progress transactions, then you can set this to true to delete all
files older than `cleanupOlderThan`.
**WARNING**: This should only be set to true if you can guarantee that
no other process is currently working on this dataset. Otherwise the
dataset could be put into a corrupted state.

View File

@@ -1,4 +1,4 @@
# LanceDB Java SDK
# LanceDB Java Enterprise Client
## Configuration and Initialization

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.27.0-beta.2</version>
<version>0.27.1-final.0</version>
<relativePath>../pom.xml</relativePath>
</parent>
@@ -56,21 +56,21 @@
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>2.24.3</version>
<version>2.25.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.24.3</version>
<version>2.25.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.24.3</version>
<version>2.25.3</version>
<scope>test</scope>
</dependency>
</dependencies>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.27.0-beta.2</version>
<version>0.27.1-final.0</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>3.1.0-beta.2</lance-core.version>
<lance-core.version>3.0.1</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
@@ -111,7 +111,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<version>3.3.1</version>
<executions>
<execution>
<id>attach-sources</id>
@@ -124,7 +124,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>2.9.1</version>
<version>3.11.2</version>
<executions>
<execution>
<id>attach-javadocs</id>
@@ -178,15 +178,15 @@
<plugins>
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
<version>3.4.1</version>
</plugin>
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
<version>3.3.1</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<version>3.14.0</version>
<configuration>
<compilerArgs>
<arg>-h</arg>
@@ -205,11 +205,11 @@
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
<version>3.4.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
<version>3.1.3</version>
</plugin>
<plugin>
<groupId>com.diffplug.spotless</groupId>
@@ -327,7 +327,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.5</version>
<version>3.2.7</version>
<executions>
<execution>
<id>sign-artifacts</id>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.27.0-beta.2"
version = "0.27.1"
license.workspace = true
description.workspace = true
repository.workspace = true
@@ -25,12 +25,12 @@ napi = { version = "3.8.3", default-features = false, features = [
] }
napi-derive = "3.5.2"
# Prevent dynamic linking of lzma, which comes from datafusion
lzma-sys = { version = "*", features = ["static"] }
lzma-sys = { version = "0.1", features = ["static"] }
log.workspace = true
# Workaround for build failure until we can fix it.
aws-lc-sys = "=0.28.0"
aws-lc-rs = "=1.13.0"
# Pin to resolve build failures; update periodically for security patches.
aws-lc-sys = "=0.38.0"
aws-lc-rs = "=1.16.1"
[build-dependencies]
napi-build = "2.3.1"

View File

@@ -63,6 +63,7 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
tableFromIPC,
DataType,
Dictionary,
Uint8: ArrowUint8,
// biome-ignore lint/suspicious/noExplicitAny: <explanation>
} = <any>arrow;
type Schema = ApacheArrow["Schema"];
@@ -362,6 +363,38 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
).toEqual(new Float64().toString());
});
it("will infer FixedSizeList<Float32> from Float32Array values", async function () {
const table = makeArrowTable([
{ id: "a", vector: new Float32Array([0.1, 0.2, 0.3]) },
{ id: "b", vector: new Float32Array([0.4, 0.5, 0.6]) },
]);
expect(DataType.isFixedSizeList(table.getChild("vector")?.type)).toBe(
true,
);
const vectorType = table.getChild("vector")?.type;
expect(vectorType.listSize).toBe(3);
expect(vectorType.children[0].type.toString()).toEqual(
new Float32().toString(),
);
});
it("will infer FixedSizeList<Uint8> from Uint8Array values", async function () {
const table = makeArrowTable([
{ id: "a", vector: new Uint8Array([1, 2, 3]) },
{ id: "b", vector: new Uint8Array([4, 5, 6]) },
]);
expect(DataType.isFixedSizeList(table.getChild("vector")?.type)).toBe(
true,
);
const vectorType = table.getChild("vector")?.type;
expect(vectorType.listSize).toBe(3);
expect(vectorType.children[0].type.toString()).toEqual(
new ArrowUint8().toString(),
);
});
it("will use dictionary encoded strings if asked", async function () {
const table = makeArrowTable([{ str: "hello" }]);
expect(DataType.isUtf8(table.getChild("str")?.type)).toBe(true);

View File

@@ -1259,6 +1259,98 @@ describe("schema evolution", function () {
expect(await table.schema()).toEqual(expectedSchema);
});
it("can add columns with schema for explicit data types", async function () {
const con = await connect(tmpDir.name);
const table = await con.createTable("vectors", [
{ id: 1n, vector: [0.1, 0.2] },
]);
// Define schema for new columns with explicit data types
// Note: All columns must be nullable when using addColumns with Schema
// because they are initially populated with null values
const newColumnsSchema = new Schema([
new Field("price", new Float64(), true),
new Field("category", new Utf8(), true),
new Field("rating", new Int32(), true),
]);
const result = await table.addColumns(newColumnsSchema);
expect(result).toHaveProperty("version");
expect(result.version).toBe(2);
const expectedSchema = new Schema([
new Field("id", new Int64(), true),
new Field(
"vector",
new FixedSizeList(2, new Field("item", new Float32(), true)),
true,
),
new Field("price", new Float64(), true),
new Field("category", new Utf8(), true),
new Field("rating", new Int32(), true),
]);
expect(await table.schema()).toEqual(expectedSchema);
// Verify that new columns are populated with null values
const results = await table.query().toArray();
expect(results).toHaveLength(1);
expect(results[0].price).toBeNull();
expect(results[0].category).toBeNull();
expect(results[0].rating).toBeNull();
});
it("can add a single column using Field", async function () {
const con = await connect(tmpDir.name);
const table = await con.createTable("vectors", [
{ id: 1n, vector: [0.1, 0.2] },
]);
// Add a single field
const priceField = new Field("price", new Float64(), true);
const result = await table.addColumns(priceField);
expect(result).toHaveProperty("version");
expect(result.version).toBe(2);
const expectedSchema = new Schema([
new Field("id", new Int64(), true),
new Field(
"vector",
new FixedSizeList(2, new Field("item", new Float32(), true)),
true,
),
new Field("price", new Float64(), true),
]);
expect(await table.schema()).toEqual(expectedSchema);
});
it("can add multiple columns using array of Fields", async function () {
const con = await connect(tmpDir.name);
const table = await con.createTable("vectors", [
{ id: 1n, vector: [0.1, 0.2] },
]);
// Add multiple fields as array
const fields = [
new Field("price", new Float64(), true),
new Field("category", new Utf8(), true),
];
const result = await table.addColumns(fields);
expect(result).toHaveProperty("version");
expect(result.version).toBe(2);
const expectedSchema = new Schema([
new Field("id", new Int64(), true),
new Field(
"vector",
new FixedSizeList(2, new Field("item", new Float32(), true)),
true,
),
new Field("price", new Float64(), true),
new Field("category", new Utf8(), true),
]);
expect(await table.schema()).toEqual(expectedSchema);
});
it("can alter the columns in the schema", async function () {
const con = await connect(tmpDir.name);
const schema = new Schema([
@@ -1697,6 +1789,65 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(results2[0].text).toBe(data[1].text);
});
test("full text search fast search", async () => {
const db = await connect(tmpDir.name);
const data = [{ text: "hello world", vector: [0.1, 0.2, 0.3], id: 1 }];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts(),
});
// Insert unindexed data after creating the index.
await table.add([{ text: "xyz", vector: [0.4, 0.5, 0.6], id: 2 }]);
const withFlatSearch = await table
.search("xyz", "fts")
.limit(10)
.toArray();
expect(withFlatSearch.length).toBeGreaterThan(0);
const fastSearchResults = await table
.search("xyz", "fts")
.fastSearch()
.limit(10)
.toArray();
expect(fastSearchResults.length).toBe(0);
const nearestToTextFastSearch = await table
.query()
.nearestToText("xyz")
.fastSearch()
.limit(10)
.toArray();
expect(nearestToTextFastSearch.length).toBe(0);
// fastSearch should be chainable with other methods.
const chainedFastSearch = await table
.search("xyz", "fts")
.fastSearch()
.select(["text"])
.limit(5)
.toArray();
expect(chainedFastSearch.length).toBe(0);
await table.optimize();
const indexedFastSearch = await table
.search("xyz", "fts")
.fastSearch()
.limit(10)
.toArray();
expect(indexedFastSearch.length).toBeGreaterThan(0);
const indexedNearestToTextFastSearch = await table
.query()
.nearestToText("xyz")
.fastSearch()
.limit(10)
.toArray();
expect(indexedNearestToTextFastSearch.length).toBeGreaterThan(0);
});
test("prewarm full text search index", async () => {
const db = await connect(tmpDir.name);
const data = [
@@ -2145,3 +2296,36 @@ describe("when creating an empty table", () => {
expect((actualSchema.fields[1].type as Float64).precision).toBe(2);
});
});
// Ensure we can create float32 arrays without using Arrow
// by utilizing native JS TypedArray support
//
// https://github.com/lancedb/lancedb/issues/3115
describe("when creating a table with Float32Array vectors", () => {
let tmpDir: tmp.DirResult;
beforeEach(() => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
});
afterEach(() => {
tmpDir.removeCallback();
});
it("should persist Float32Array as FixedSizeList<Float32> in the LanceDB schema", async () => {
const db = await connect(tmpDir.name);
const table = await db.createTable("test", [
{ id: "a", vector: new Float32Array([0.1, 0.2, 0.3]) },
{ id: "b", vector: new Float32Array([0.4, 0.5, 0.6]) },
]);
const schema = await table.schema();
const vectorField = schema.fields.find((f) => f.name === "vector");
expect(vectorField).toBeDefined();
expect(vectorField!.type).toBeInstanceOf(FixedSizeList);
const fsl = vectorField!.type as FixedSizeList;
expect(fsl.listSize).toBe(3);
expect(fsl.children[0].type.typeId).toBe(Type.Float);
// precision: HALF=0, SINGLE=1, DOUBLE=2
expect((fsl.children[0].type as Float32).precision).toBe(1);
});
});

View File

@@ -30,12 +30,15 @@
"x64",
"arm64"
],
"dev": true,
"license": "Apache-2.0",
"optional": true,
"os": [
"darwin",
"linux",
"win32"
],
"peer": true,
"dependencies": {
"reflect-metadata": "^0.2.2"
},
@@ -91,14 +94,15 @@
}
},
"node_modules/@babel/code-frame": {
"version": "7.26.2",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.26.2.tgz",
"integrity": "sha512-RJlIHRueQgwWitWgF8OdFYGZX328Ax5BCemNGlqHfplnRT9ESi8JkFlvaVYbS+UubVY6dpv87Fs2u5M29iNFVQ==",
"version": "7.29.0",
"resolved": "https://registry.npmjs.org/@babel/code-frame/-/code-frame-7.29.0.tgz",
"integrity": "sha512-9NhCeYjq9+3uxgdtp20LSiJXJvN0FeCtNGpJxuMFZ1Kv3cWUNb6DOhJwUvcVCzKGR66cw4njwM6hrJLqgOwbcw==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/helper-validator-identifier": "^7.25.9",
"@babel/helper-validator-identifier": "^7.28.5",
"js-tokens": "^4.0.0",
"picocolors": "^1.0.0"
"picocolors": "^1.1.1"
},
"engines": {
"node": ">=6.9.0"
@@ -233,19 +237,21 @@
}
},
"node_modules/@babel/helper-string-parser": {
"version": "7.25.9",
"resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.25.9.tgz",
"integrity": "sha512-4A/SCr/2KLd5jrtOMFzaKjVtAei3+2r/NChoBNoZ3EyP/+GlhoaEGoWOZUmFmoITP7zOJyHIMm+DYRd8o3PvHA==",
"version": "7.27.1",
"resolved": "https://registry.npmjs.org/@babel/helper-string-parser/-/helper-string-parser-7.27.1.tgz",
"integrity": "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/helper-validator-identifier": {
"version": "7.25.9",
"resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.25.9.tgz",
"integrity": "sha512-Ed61U6XJc3CVRfkERJWDz4dJwKe7iLmmJsbOGu9wSloNSFttHV0I8g6UAgb7qnK5ly5bGLPd4oXZlxCdANBOWQ==",
"version": "7.28.5",
"resolved": "https://registry.npmjs.org/@babel/helper-validator-identifier/-/helper-validator-identifier-7.28.5.tgz",
"integrity": "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q==",
"dev": true,
"license": "MIT",
"engines": {
"node": ">=6.9.0"
}
@@ -260,25 +266,27 @@
}
},
"node_modules/@babel/helpers": {
"version": "7.26.0",
"resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.26.0.tgz",
"integrity": "sha512-tbhNuIxNcVb21pInl3ZSjksLCvgdZy9KwJ8brv993QtIVKJBBkYXz4q4ZbAv31GdnC+R90np23L5FbEBlthAEw==",
"version": "7.28.6",
"resolved": "https://registry.npmjs.org/@babel/helpers/-/helpers-7.28.6.tgz",
"integrity": "sha512-xOBvwq86HHdB7WUDTfKfT/Vuxh7gElQ+Sfti2Cy6yIWNW05P8iUslOVcZ4/sKbE+/jQaukQAdz/gf3724kYdqw==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/template": "^7.25.9",
"@babel/types": "^7.26.0"
"@babel/template": "^7.28.6",
"@babel/types": "^7.28.6"
},
"engines": {
"node": ">=6.9.0"
}
},
"node_modules/@babel/parser": {
"version": "7.26.2",
"resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.26.2.tgz",
"integrity": "sha512-DWMCZH9WA4Maitz2q21SRKHo9QXZxkDsbNZoVD62gusNtNBBqDg9i7uOhASfTfIGNzW+O+r7+jAlM8dwphcJKQ==",
"version": "7.29.0",
"resolved": "https://registry.npmjs.org/@babel/parser/-/parser-7.29.0.tgz",
"integrity": "sha512-IyDgFV5GeDUVX4YdF/3CPULtVGSXXMLh1xVIgdCgxApktqnQV0r7/8Nqthg+8YLGaAtdyIlo2qIdZrbCv4+7ww==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/types": "^7.26.0"
"@babel/types": "^7.29.0"
},
"bin": {
"parser": "bin/babel-parser.js"
@@ -510,14 +518,15 @@
}
},
"node_modules/@babel/template": {
"version": "7.25.9",
"resolved": "https://registry.npmjs.org/@babel/template/-/template-7.25.9.tgz",
"integrity": "sha512-9DGttpmPvIxBb/2uwpVo3dqJ+O6RooAFOS+lB+xDqoE2PVCE8nfoHMdZLpfCQRLwvohzXISPZcgxt80xLfsuwg==",
"version": "7.28.6",
"resolved": "https://registry.npmjs.org/@babel/template/-/template-7.28.6.tgz",
"integrity": "sha512-YA6Ma2KsCdGb+WC6UpBVFJGXL58MDA6oyONbjyF/+5sBgxY/dwkhLogbMT2GXXyU84/IhRw/2D1Os1B/giz+BQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/code-frame": "^7.25.9",
"@babel/parser": "^7.25.9",
"@babel/types": "^7.25.9"
"@babel/code-frame": "^7.28.6",
"@babel/parser": "^7.28.6",
"@babel/types": "^7.28.6"
},
"engines": {
"node": ">=6.9.0"
@@ -542,13 +551,14 @@
}
},
"node_modules/@babel/types": {
"version": "7.26.0",
"resolved": "https://registry.npmjs.org/@babel/types/-/types-7.26.0.tgz",
"integrity": "sha512-Z/yiTPj+lDVnF7lWeKCIJzaIkI0vYO87dMpZ4bg4TDrFe4XXLFWL1TbXU27gBP3QccxV9mZICCrnjnYlJjXHOA==",
"version": "7.29.0",
"resolved": "https://registry.npmjs.org/@babel/types/-/types-7.29.0.tgz",
"integrity": "sha512-LwdZHpScM4Qz8Xw2iKSzS+cfglZzJGvofQICy7W7v4caru4EaAmyUuO6BGrbyQ2mYV11W0U8j5mBhd14dd3B0A==",
"dev": true,
"license": "MIT",
"dependencies": {
"@babel/helper-string-parser": "^7.25.9",
"@babel/helper-validator-identifier": "^7.25.9"
"@babel/helper-string-parser": "^7.27.1",
"@babel/helper-validator-identifier": "^7.28.5"
},
"engines": {
"node": ">=6.9.0"
@@ -1151,95 +1161,6 @@
"url": "https://opencollective.com/libvips"
}
},
"node_modules/@isaacs/cliui": {
"version": "8.0.2",
"resolved": "https://registry.npmjs.org/@isaacs/cliui/-/cliui-8.0.2.tgz",
"integrity": "sha512-O8jcjabXaleOG9DQ0+ARXWZBTfnP4WNAqzuiJK7ll44AmxGKv/J2M4TPjxjY3znBCfvBXFzucm1twdyFybFqEA==",
"dependencies": {
"string-width": "^5.1.2",
"string-width-cjs": "npm:string-width@^4.2.0",
"strip-ansi": "^7.0.1",
"strip-ansi-cjs": "npm:strip-ansi@^6.0.1",
"wrap-ansi": "^8.1.0",
"wrap-ansi-cjs": "npm:wrap-ansi@^7.0.0"
},
"engines": {
"node": ">=12"
}
},
"node_modules/@isaacs/cliui/node_modules/ansi-regex": {
"version": "6.1.0",
"resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-6.1.0.tgz",
"integrity": "sha512-7HSX4QQb4CspciLpVFwyRe79O3xsIZDDLER21kERQ71oaPodF8jL725AgJMFAYbooIqolJoRLuM81SpeUkpkvA==",
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/chalk/ansi-regex?sponsor=1"
}
},
"node_modules/@isaacs/cliui/node_modules/ansi-styles": {
"version": "6.2.1",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-6.2.1.tgz",
"integrity": "sha512-bN798gFfQX+viw3R7yrGWRqnrN2oRkEkUjjl4JNn4E8GxxbjtG3FbrEIIY3l8/hrwUwIeCZvi4QuOTP4MErVug==",
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/chalk/ansi-styles?sponsor=1"
}
},
"node_modules/@isaacs/cliui/node_modules/emoji-regex": {
"version": "9.2.2",
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-9.2.2.tgz",
"integrity": "sha512-L18DaJsXSUk2+42pv8mLs5jJT2hqFkFE4j21wOmgbUqsZ2hL72NsUU785g9RXgo3s0ZNgVl42TiHp3ZtOv/Vyg=="
},
"node_modules/@isaacs/cliui/node_modules/string-width": {
"version": "5.1.2",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-5.1.2.tgz",
"integrity": "sha512-HnLOCR3vjcY8beoNLtcjZ5/nxn2afmME6lhrDrebokqMap+XbeW8n9TXpPDOqdGK5qcI3oT0GKTW6wC7EMiVqA==",
"dependencies": {
"eastasianwidth": "^0.2.0",
"emoji-regex": "^9.2.2",
"strip-ansi": "^7.0.1"
},
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/@isaacs/cliui/node_modules/strip-ansi": {
"version": "7.1.0",
"resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-7.1.0.tgz",
"integrity": "sha512-iq6eVVI64nQQTRYq2KtEg2d2uU7LElhTJwsH4YzIHZshxlgZms/wIc4VoDQTlG/IvVIrBKG06CrZnp0qv7hkcQ==",
"dependencies": {
"ansi-regex": "^6.0.1"
},
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/chalk/strip-ansi?sponsor=1"
}
},
"node_modules/@isaacs/cliui/node_modules/wrap-ansi": {
"version": "8.1.0",
"resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-8.1.0.tgz",
"integrity": "sha512-si7QWI6zUMq56bESFvagtmzMdGOtoxfR+Sez11Mobfc7tm+VkUckk9bW2UeffTGVUbOksxmSw0AA2gs8g71NCQ==",
"dependencies": {
"ansi-styles": "^6.1.0",
"string-width": "^5.0.1",
"strip-ansi": "^7.0.1"
},
"engines": {
"node": ">=12"
},
"funding": {
"url": "https://github.com/chalk/wrap-ansi?sponsor=1"
}
},
"node_modules/@isaacs/fs-minipass": {
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/@isaacs/fs-minipass/-/fs-minipass-4.0.1.tgz",
@@ -1606,15 +1527,6 @@
"resolved": "../dist",
"link": true
},
"node_modules/@pkgjs/parseargs": {
"version": "0.11.0",
"resolved": "https://registry.npmjs.org/@pkgjs/parseargs/-/parseargs-0.11.0.tgz",
"integrity": "sha512-+1VkjdD0QBLPodGrJUeqarH8VAIvQODIbwh9XpP5Syisf7YoQgsJKPNFoqqLQlu+VQ/tVSshMR6loPMn8U+dPg==",
"optional": true,
"engines": {
"node": ">=14"
}
},
"node_modules/@protobufjs/aspromise": {
"version": "1.1.2",
"resolved": "https://registry.npmjs.org/@protobufjs/aspromise/-/aspromise-1.1.2.tgz",
@@ -1846,6 +1758,7 @@
"version": "5.0.1",
"resolved": "https://registry.npmjs.org/ansi-regex/-/ansi-regex-5.0.1.tgz",
"integrity": "sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==",
"dev": true,
"engines": {
"node": ">=8"
}
@@ -1854,6 +1767,7 @@
"version": "4.3.0",
"resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
"integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
"dev": true,
"dependencies": {
"color-convert": "^2.0.1"
},
@@ -2019,13 +1933,15 @@
"node_modules/balanced-match": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/balanced-match/-/balanced-match-1.0.2.tgz",
"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw=="
"integrity": "sha512-3oSeUO0TMV67hN1AmbXsK4yaqU7tjiHlbxRDZOpH0KW9+CeX4bRAaX0Anxt0tx2MrpRpWwQaPwIlISEJhYU5Pw==",
"dev": true
},
"node_modules/brace-expansion": {
"version": "1.1.11",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.11.tgz",
"integrity": "sha512-iCuPHDFgrHX7H2vEI/5xpz07zSHB00TpugqhmYtVmMO6518mCuRMoOYFldEBl0g187ufozdaHgWKcYFb61qGiA==",
"version": "1.1.12",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.12.tgz",
"integrity": "sha512-9T9UjW3r0UW5c1Q7GTwllptXwhvYmEzFhzMfZ9H7FQWt+uZePjZPjBP/W1ZEyZ1twGWom5/56TF4lPcqjnDHcg==",
"dev": true,
"license": "MIT",
"dependencies": {
"balanced-match": "^1.0.0",
"concat-map": "0.0.1"
@@ -2102,6 +2018,19 @@
"integrity": "sha512-E+XQCRwSbaaiChtv6k6Dwgc+bx+Bs6vuKJHHl5kox/BaKbhiXzqQOwK4cO22yElGp2OCmjwVhT3HmxgyPGnJfQ==",
"dev": true
},
"node_modules/call-bind-apply-helpers": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz",
"integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0",
"function-bind": "^1.1.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/callsites": {
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/callsites/-/callsites-3.1.0.tgz",
@@ -2298,9 +2227,11 @@
}
},
"node_modules/cross-spawn": {
"version": "7.0.3",
"resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.3.tgz",
"integrity": "sha512-iRDPJKUPVEND7dHPO8rkbOnPpyDygcDFtWjpeWNCgy8WP2rXcxXL8TskReQl6OrB2G7+UJrags1q15Fudc7G6w==",
"version": "7.0.6",
"resolved": "https://registry.npmjs.org/cross-spawn/-/cross-spawn-7.0.6.tgz",
"integrity": "sha512-uV2QOWP2nWzsy2aMp8aRibhi9dlzF5Hgh5SHaB9OiTGEyDTiJJyx0uy51QXdyWbtAHNua4XJzUKca3OzKUd3vA==",
"dev": true,
"license": "MIT",
"dependencies": {
"path-key": "^3.1.0",
"shebang-command": "^2.0.0",
@@ -2384,10 +2315,19 @@
"node": "^14.15.0 || ^16.10.0 || >=18.0.0"
}
},
"node_modules/eastasianwidth": {
"version": "0.2.0",
"resolved": "https://registry.npmjs.org/eastasianwidth/-/eastasianwidth-0.2.0.tgz",
"integrity": "sha512-I88TYZWc9XiYHRQ4/3c5rjjfgkjhLyW2luGIheGERbNQ6OY7yTybanSpDXZa8y7VUP9YmDcYa+eyq4ca7iLqWA=="
"node_modules/dunder-proto": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/dunder-proto/-/dunder-proto-1.0.1.tgz",
"integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==",
"license": "MIT",
"dependencies": {
"call-bind-apply-helpers": "^1.0.1",
"es-errors": "^1.3.0",
"gopd": "^1.2.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/ejs": {
"version": "3.1.10",
@@ -2425,7 +2365,8 @@
"node_modules/emoji-regex": {
"version": "8.0.0",
"resolved": "https://registry.npmjs.org/emoji-regex/-/emoji-regex-8.0.0.tgz",
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A=="
"integrity": "sha512-MSjYzcWNOA0ewAHpz0MxpYFvwg6yjy1NG3xteoqz644VCo/RPgnr1/GGt+ic3iJTzQ8Eu3TdM14SawnVUmGE6A==",
"dev": true
},
"node_modules/error-ex": {
"version": "1.3.2",
@@ -2442,6 +2383,51 @@
"integrity": "sha512-zz06S8t0ozoDXMG+ube26zeCTNXcKIPJZJi8hBrF4idCLms4CG9QtK7qBl1boi5ODzFpjswb5JPmHCbMpjaYzg==",
"dev": true
},
"node_modules/es-define-property": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/es-define-property/-/es-define-property-1.0.1.tgz",
"integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-errors": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/es-errors/-/es-errors-1.3.0.tgz",
"integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-object-atoms": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/es-object-atoms/-/es-object-atoms-1.1.1.tgz",
"integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/es-set-tostringtag": {
"version": "2.1.0",
"resolved": "https://registry.npmjs.org/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz",
"integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==",
"license": "MIT",
"dependencies": {
"es-errors": "^1.3.0",
"get-intrinsic": "^1.2.6",
"has-tostringtag": "^1.0.2",
"hasown": "^2.0.2"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/escalade": {
"version": "3.2.0",
"resolved": "https://registry.npmjs.org/escalade/-/escalade-3.2.0.tgz",
@@ -2554,19 +2540,21 @@
}
},
"node_modules/filelist/node_modules/brace-expansion": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.1.tgz",
"integrity": "sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA==",
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.2.tgz",
"integrity": "sha512-Jt0vHyM+jmUBqojB7E1NIYadt0vI0Qxjxd2TErW94wDz+E2LAm5vKMXXwg6ZZBTHPuUlDgQHKXvjGBdfcF1ZDQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"balanced-match": "^1.0.0"
}
},
"node_modules/filelist/node_modules/minimatch": {
"version": "5.1.6",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-5.1.6.tgz",
"integrity": "sha512-lKwV/1brpG6mBUFHtb7NUmtABCb2WZZmm2wNiOA5hAb8VdCS4B3dtMWyvcoViccwAW/COERjXLt0zP1zXUN26g==",
"version": "5.1.9",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-5.1.9.tgz",
"integrity": "sha512-7o1wEA2RyMP7Iu7GNba9vc0RWWGACJOCZBJX2GJWip0ikV+wcOsgVuY9uE8CPiyQhkGFSlhuSkZPavN7u1c2Fw==",
"dev": true,
"license": "ISC",
"dependencies": {
"brace-expansion": "^2.0.1"
},
@@ -2604,39 +2592,16 @@
"resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-1.12.0.tgz",
"integrity": "sha512-c7CZADjRcl6j0PlvFy0ZqXQ67qSEZfrVPynmnL+2zPc+NtMvrF8Y0QceMo7QqnSPc7+uWjUIAbvCQ5WIKlMVdQ=="
},
"node_modules/foreground-child": {
"version": "3.3.0",
"resolved": "https://registry.npmjs.org/foreground-child/-/foreground-child-3.3.0.tgz",
"integrity": "sha512-Ld2g8rrAyMYFXBhEqMz8ZAHBi4J4uS1i/CxGMDnjyFWddMXLVcDp051DZfu+t7+ab7Wv6SMqpWmyFIj5UbfFvg==",
"dependencies": {
"cross-spawn": "^7.0.0",
"signal-exit": "^4.0.1"
},
"engines": {
"node": ">=14"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/foreground-child/node_modules/signal-exit": {
"version": "4.1.0",
"resolved": "https://registry.npmjs.org/signal-exit/-/signal-exit-4.1.0.tgz",
"integrity": "sha512-bzyZ1e88w9O1iNJbKnOlvYTrWPDl46O1bG0D3XInv+9tkPrxrN8jUUTiFlDkkmKWgn1M6CfIA13SuGqOa9Korw==",
"engines": {
"node": ">=14"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/form-data": {
"version": "4.0.1",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.1.tgz",
"integrity": "sha512-tzN8e4TX8+kkxGPK8D5u0FNmjPUjw3lwC9lSLxxoB/+GtsJG91CO8bSWy73APlgAZzZbXEYZJuxjkHH2w+Ezhw==",
"version": "4.0.5",
"resolved": "https://registry.npmjs.org/form-data/-/form-data-4.0.5.tgz",
"integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==",
"license": "MIT",
"dependencies": {
"asynckit": "^0.4.0",
"combined-stream": "^1.0.8",
"es-set-tostringtag": "^2.1.0",
"hasown": "^2.0.2",
"mime-types": "^2.1.12"
},
"engines": {
@@ -2684,7 +2649,6 @@
"version": "1.1.2",
"resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz",
"integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==",
"dev": true,
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
@@ -2707,6 +2671,30 @@
"node": "6.* || 8.* || >= 10.*"
}
},
"node_modules/get-intrinsic": {
"version": "1.3.0",
"resolved": "https://registry.npmjs.org/get-intrinsic/-/get-intrinsic-1.3.0.tgz",
"integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==",
"license": "MIT",
"dependencies": {
"call-bind-apply-helpers": "^1.0.2",
"es-define-property": "^1.0.1",
"es-errors": "^1.3.0",
"es-object-atoms": "^1.1.1",
"function-bind": "^1.1.2",
"get-proto": "^1.0.1",
"gopd": "^1.2.0",
"has-symbols": "^1.1.0",
"hasown": "^2.0.2",
"math-intrinsics": "^1.1.0"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/get-package-type": {
"version": "0.1.0",
"resolved": "https://registry.npmjs.org/get-package-type/-/get-package-type-0.1.0.tgz",
@@ -2716,6 +2704,19 @@
"node": ">=8.0.0"
}
},
"node_modules/get-proto": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/get-proto/-/get-proto-1.0.1.tgz",
"integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==",
"license": "MIT",
"dependencies": {
"dunder-proto": "^1.0.1",
"es-object-atoms": "^1.0.0"
},
"engines": {
"node": ">= 0.4"
}
},
"node_modules/get-stream": {
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/get-stream/-/get-stream-6.0.1.tgz",
@@ -2758,6 +2759,18 @@
"node": ">=4"
}
},
"node_modules/gopd": {
"version": "1.2.0",
"resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz",
"integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/graceful-fs": {
"version": "4.2.11",
"resolved": "https://registry.npmjs.org/graceful-fs/-/graceful-fs-4.2.11.tgz",
@@ -2778,11 +2791,37 @@
"node": ">=8"
}
},
"node_modules/has-symbols": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/has-symbols/-/has-symbols-1.1.0.tgz",
"integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/has-tostringtag": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/has-tostringtag/-/has-tostringtag-1.0.2.tgz",
"integrity": "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw==",
"license": "MIT",
"dependencies": {
"has-symbols": "^1.0.3"
},
"engines": {
"node": ">= 0.4"
},
"funding": {
"url": "https://github.com/sponsors/ljharb"
}
},
"node_modules/hasown": {
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/hasown/-/hasown-2.0.2.tgz",
"integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==",
"dev": true,
"dependencies": {
"function-bind": "^1.1.2"
},
@@ -2882,6 +2921,7 @@
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/is-fullwidth-code-point/-/is-fullwidth-code-point-3.0.0.tgz",
"integrity": "sha512-zymm5+u+sCsSWyD9qNaejV3DFvhCKclKdizYaJUuHA83RLjb7nSuGnddCHGv0hk+KY7BMAlsWeK4Ueg6EV6XQg==",
"dev": true,
"engines": {
"node": ">=8"
}
@@ -2919,7 +2959,8 @@
"node_modules/isexe": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/isexe/-/isexe-2.0.0.tgz",
"integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw=="
"integrity": "sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==",
"dev": true
},
"node_modules/istanbul-lib-coverage": {
"version": "3.2.2",
@@ -2987,20 +3028,6 @@
"node": ">=8"
}
},
"node_modules/jackspeak": {
"version": "3.4.3",
"resolved": "https://registry.npmjs.org/jackspeak/-/jackspeak-3.4.3.tgz",
"integrity": "sha512-OGlZQpz2yfahA/Rd1Y8Cd9SIEsqvXkLVoSw/cgwhnhFMDbsQFeZYoJJ7bIZBS9BcamUW96asq/npPWugM+RQBw==",
"dependencies": {
"@isaacs/cliui": "^8.0.2"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
},
"optionalDependencies": {
"@pkgjs/parseargs": "^0.11.0"
}
},
"node_modules/jake": {
"version": "10.9.2",
"resolved": "https://registry.npmjs.org/jake/-/jake-10.9.2.tgz",
@@ -3605,10 +3632,11 @@
"dev": true
},
"node_modules/js-yaml": {
"version": "3.14.1",
"resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-3.14.1.tgz",
"integrity": "sha512-okMH7OXXJ7YrN9Ok3/SXrnu4iX9yOk+25nqX4imS2npuvTYDmo/QEZoqwZkYaIDk3jVvBOTOIEgEhaLOynBS9g==",
"version": "3.14.2",
"resolved": "https://registry.npmjs.org/js-yaml/-/js-yaml-3.14.2.tgz",
"integrity": "sha512-PMSmkqxr106Xa156c2M265Z+FTrPl+oxd/rgOQy2tijQeK5TxQ43psO1ZCwhVOSdnn+RzkzlRz/eY4BgJBYVpg==",
"dev": true,
"license": "MIT",
"dependencies": {
"argparse": "^1.0.7",
"esprima": "^4.0.0"
@@ -3728,6 +3756,15 @@
"tmpl": "1.0.5"
}
},
"node_modules/math-intrinsics": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz",
"integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==",
"license": "MIT",
"engines": {
"node": ">= 0.4"
}
},
"node_modules/merge-stream": {
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/merge-stream/-/merge-stream-2.0.0.tgz",
@@ -3776,10 +3813,11 @@
}
},
"node_modules/minimatch": {
"version": "3.1.2",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.2.tgz",
"integrity": "sha512-J7p63hRiAjw1NDEww1W7i37+ByIrOWO5XQQAzZ3VOcL0PNybwpfmV/N05zFAzwQ9USyEcX6t3UO+K5aqBQOIHw==",
"version": "3.1.5",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-3.1.5.tgz",
"integrity": "sha512-VgjWUsnnT6n+NUk6eZq77zeFdpW2LWDzP6zFGrCbHXiYNul5Dzqk2HHQ5uFH2DNW5Xbp8+jVzaeNt94ssEEl4w==",
"dev": true,
"license": "ISC",
"dependencies": {
"brace-expansion": "^1.1.7"
},
@@ -3796,31 +3834,17 @@
}
},
"node_modules/minizlib": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/minizlib/-/minizlib-3.0.1.tgz",
"integrity": "sha512-umcy022ILvb5/3Djuu8LWeqUa8D68JaBzlttKeMWen48SjabqS3iY5w/vzeMzMUNhLDifyhbOwKDSznB1vvrwg==",
"version": "3.1.0",
"resolved": "https://registry.npmjs.org/minizlib/-/minizlib-3.1.0.tgz",
"integrity": "sha512-KZxYo1BUkWD2TVFLr0MQoM8vUUigWD3LlD83a/75BqC+4qE0Hb1Vo5v1FgcfaNXvfXzr+5EhQ6ing/CaBijTlw==",
"license": "MIT",
"dependencies": {
"minipass": "^7.0.4",
"rimraf": "^5.0.5"
"minipass": "^7.1.2"
},
"engines": {
"node": ">= 18"
}
},
"node_modules/mkdirp": {
"version": "3.0.1",
"resolved": "https://registry.npmjs.org/mkdirp/-/mkdirp-3.0.1.tgz",
"integrity": "sha512-+NsyUUAZDmo6YVHzL/stxSu3t9YS1iljliy3BSDrXJ/dkn1KYdmtZODGGjLcc9XLgVVpH4KshHB8XmZgMhaBXg==",
"bin": {
"mkdirp": "dist/cjs/src/bin.js"
},
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/ms": {
"version": "2.1.3",
"resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
@@ -4010,11 +4034,6 @@
"node": ">=6"
}
},
"node_modules/package-json-from-dist": {
"version": "1.0.1",
"resolved": "https://registry.npmjs.org/package-json-from-dist/-/package-json-from-dist-1.0.1.tgz",
"integrity": "sha512-UEZIS3/by4OC8vL3P2dTXRETpebLI2NiI5vIrjaD/5UtrkFX/tNbwjTSRAGC/+7CAo2pIcBaRgWmcBBHcsaCIw=="
},
"node_modules/parse-json": {
"version": "5.2.0",
"resolved": "https://registry.npmjs.org/parse-json/-/parse-json-5.2.0.tgz",
@@ -4055,6 +4074,7 @@
"version": "3.1.1",
"resolved": "https://registry.npmjs.org/path-key/-/path-key-3.1.1.tgz",
"integrity": "sha512-ojmeN0qd+y0jszEtoY48r0Peq5dwMEkIlCOu6Q5f41lfkswXuKtYrhgoTpLnyIcHm24Uhqx+5Tqm2InSwLhE6Q==",
"dev": true,
"engines": {
"node": ">=8"
}
@@ -4065,26 +4085,6 @@
"integrity": "sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw==",
"dev": true
},
"node_modules/path-scurry": {
"version": "1.11.1",
"resolved": "https://registry.npmjs.org/path-scurry/-/path-scurry-1.11.1.tgz",
"integrity": "sha512-Xa4Nw17FS9ApQFJ9umLiJS4orGjm7ZzwUrwamcGQuHSzDyth9boKDaycYdDcZDuqYATXw4HFXgaqWTctW/v1HA==",
"dependencies": {
"lru-cache": "^10.2.0",
"minipass": "^5.0.0 || ^6.0.2 || ^7.0.0"
},
"engines": {
"node": ">=16 || 14 >=14.18"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/path-scurry/node_modules/lru-cache": {
"version": "10.4.3",
"resolved": "https://registry.npmjs.org/lru-cache/-/lru-cache-10.4.3.tgz",
"integrity": "sha512-JNAzZcXrCt42VGLuYz0zfAzDfAvJWW6AfYlDBQyDV5DClI2m5sAmK+OIO7s59XfsRsWHp02jAJrRadPRGTt6SQ=="
},
"node_modules/picocolors": {
"version": "1.1.1",
"resolved": "https://registry.npmjs.org/picocolors/-/picocolors-1.1.1.tgz",
@@ -4246,61 +4246,6 @@
"node": ">=10"
}
},
"node_modules/rimraf": {
"version": "5.0.10",
"resolved": "https://registry.npmjs.org/rimraf/-/rimraf-5.0.10.tgz",
"integrity": "sha512-l0OE8wL34P4nJH/H2ffoaniAokM2qSmrtXHmlpvYr5AVVX8msAyW0l8NVJFDxlSK4u3Uh/f41cQheDVdnYijwQ==",
"dependencies": {
"glob": "^10.3.7"
},
"bin": {
"rimraf": "dist/esm/bin.mjs"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/rimraf/node_modules/brace-expansion": {
"version": "2.0.1",
"resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-2.0.1.tgz",
"integrity": "sha512-XnAIvQ8eM+kC6aULx6wuQiwVsnzsi9d3WxzV3FpWTGA19F621kwdbsAcFKXgKUHZWsy+mY6iL1sHTxWEFCytDA==",
"dependencies": {
"balanced-match": "^1.0.0"
}
},
"node_modules/rimraf/node_modules/glob": {
"version": "10.4.5",
"resolved": "https://registry.npmjs.org/glob/-/glob-10.4.5.tgz",
"integrity": "sha512-7Bv8RF0k6xjo7d4A/PxYLbUCfb6c+Vpd2/mB2yRDlew7Jb5hEXiCD9ibfO7wpk8i4sevK6DFny9h7EYbM3/sHg==",
"dependencies": {
"foreground-child": "^3.1.0",
"jackspeak": "^3.1.2",
"minimatch": "^9.0.4",
"minipass": "^7.1.2",
"package-json-from-dist": "^1.0.0",
"path-scurry": "^1.11.1"
},
"bin": {
"glob": "dist/esm/bin.mjs"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/rimraf/node_modules/minimatch": {
"version": "9.0.5",
"resolved": "https://registry.npmjs.org/minimatch/-/minimatch-9.0.5.tgz",
"integrity": "sha512-G6T0ZX48xgozx7587koeX9Ys2NYy6Gmv//P89sEte9V9whIapMNF4idKxnW2QtCcLiTWlb/wfCabAtAFWhhBow==",
"dependencies": {
"brace-expansion": "^2.0.1"
},
"engines": {
"node": ">=16 || 14 >=14.17"
},
"funding": {
"url": "https://github.com/sponsors/isaacs"
}
},
"node_modules/semver": {
"version": "7.6.3",
"resolved": "https://registry.npmjs.org/semver/-/semver-7.6.3.tgz",
@@ -4354,6 +4299,7 @@
"version": "2.0.0",
"resolved": "https://registry.npmjs.org/shebang-command/-/shebang-command-2.0.0.tgz",
"integrity": "sha512-kHxr2zZpYtdmrN1qDjrrX/Z1rR1kG8Dx+gkpK1G4eXmvXswmcE1hTWBWYUzlraYw1/yZp6YuDY77YtvbN0dmDA==",
"dev": true,
"dependencies": {
"shebang-regex": "^3.0.0"
},
@@ -4365,6 +4311,7 @@
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/shebang-regex/-/shebang-regex-3.0.0.tgz",
"integrity": "sha512-7++dFhtcx3353uBaq8DDR4NuxBetBzC7ZQOhmTQInHEd6bSrXdiEyzCvG07Z44UYdLShWUyXt5M/yhz8ekcb1A==",
"dev": true,
"engines": {
"node": ">=8"
}
@@ -4452,20 +4399,7 @@
"version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
"integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
"dependencies": {
"emoji-regex": "^8.0.0",
"is-fullwidth-code-point": "^3.0.0",
"strip-ansi": "^6.0.1"
},
"engines": {
"node": ">=8"
}
},
"node_modules/string-width-cjs": {
"name": "string-width",
"version": "4.2.3",
"resolved": "https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz",
"integrity": "sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==",
"dev": true,
"dependencies": {
"emoji-regex": "^8.0.0",
"is-fullwidth-code-point": "^3.0.0",
@@ -4479,18 +4413,7 @@
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
"integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
"dependencies": {
"ansi-regex": "^5.0.1"
},
"engines": {
"node": ">=8"
}
},
"node_modules/strip-ansi-cjs": {
"name": "strip-ansi",
"version": "6.0.1",
"resolved": "https://registry.npmjs.org/strip-ansi/-/strip-ansi-6.0.1.tgz",
"integrity": "sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==",
"dev": true,
"dependencies": {
"ansi-regex": "^5.0.1"
},
@@ -4541,15 +4464,15 @@
}
},
"node_modules/tar": {
"version": "7.4.3",
"resolved": "https://registry.npmjs.org/tar/-/tar-7.4.3.tgz",
"integrity": "sha512-5S7Va8hKfV7W5U6g3aYxXmlPoZVAwUMy9AOKyF2fVuZa2UD3qZjg578OrLRt8PcNN1PleVaL/5/yYATNL0ICUw==",
"version": "7.5.10",
"resolved": "https://registry.npmjs.org/tar/-/tar-7.5.10.tgz",
"integrity": "sha512-8mOPs1//5q/rlkNSPcCegA6hiHJYDmSLEI8aMH/CdSQJNWztHC9WHNam5zdQlfpTwB9Xp7IBEsHfV5LKMJGVAw==",
"license": "BlueOak-1.0.0",
"dependencies": {
"@isaacs/fs-minipass": "^4.0.0",
"chownr": "^3.0.0",
"minipass": "^7.1.2",
"minizlib": "^3.0.1",
"mkdirp": "^3.0.1",
"minizlib": "^3.1.0",
"yallist": "^5.0.0"
},
"engines": {
@@ -4782,6 +4705,7 @@
"version": "2.0.2",
"resolved": "https://registry.npmjs.org/which/-/which-2.0.2.tgz",
"integrity": "sha512-BLI3Tl1TW3Pvl70l3yq3Y64i+awpwXqsGBYWkkqMtnbXgrMD+yj7rhW0kuEDxzJaYXGjEW5ogapKNMEKNMjibA==",
"dev": true,
"dependencies": {
"isexe": "^2.0.0"
},
@@ -4809,23 +4733,6 @@
"url": "https://github.com/chalk/wrap-ansi?sponsor=1"
}
},
"node_modules/wrap-ansi-cjs": {
"name": "wrap-ansi",
"version": "7.0.0",
"resolved": "https://registry.npmjs.org/wrap-ansi/-/wrap-ansi-7.0.0.tgz",
"integrity": "sha512-YVGIj2kamLSTxw6NsZjoBxfSwsn0ycdesmc4p+Q21c5zPuZ1pl+NfxVdxPtdHvmNVOQ6XSYG4AUtyt/Fi7D16Q==",
"dependencies": {
"ansi-styles": "^4.0.0",
"string-width": "^4.1.0",
"strip-ansi": "^6.0.0"
},
"engines": {
"node": ">=10"
},
"funding": {
"url": "https://github.com/chalk/wrap-ansi?sponsor=1"
}
},
"node_modules/wrappy": {
"version": "1.0.2",
"resolved": "https://registry.npmjs.org/wrappy/-/wrappy-1.0.2.tgz",

View File

@@ -20,6 +20,8 @@ import {
Float32,
Float64,
Int,
Int8,
Int16,
Int32,
Int64,
LargeBinary,
@@ -35,6 +37,8 @@ import {
Timestamp,
Type,
Uint8,
Uint16,
Uint32,
Utf8,
Vector,
makeVector as arrowMakeVector,
@@ -529,7 +533,8 @@ function isObject(value: unknown): value is Record<string, unknown> {
!(value instanceof Date) &&
!(value instanceof Set) &&
!(value instanceof Map) &&
!(value instanceof Buffer)
!(value instanceof Buffer) &&
!ArrayBuffer.isView(value)
);
}
@@ -588,6 +593,13 @@ function inferType(
return new Bool();
} else if (value instanceof Buffer) {
return new Binary();
} else if (ArrayBuffer.isView(value) && !(value instanceof DataView)) {
const info = typedArrayToArrowType(value);
if (info !== undefined) {
const child = new Field("item", info.elementType, true);
return new FixedSizeList(info.length, child);
}
return undefined;
} else if (Array.isArray(value)) {
if (value.length === 0) {
return undefined; // Without any values we can't infer the type
@@ -746,6 +758,32 @@ function makeListVector(lists: unknown[][]): Vector<unknown> {
return listBuilder.finish().toVector();
}
/**
* Map a JS TypedArray instance to the corresponding Arrow element DataType
* and its length. Returns undefined if the value is not a recognized TypedArray.
*/
function typedArrayToArrowType(
value: ArrayBufferView,
): { elementType: DataType; length: number } | undefined {
if (value instanceof Float32Array)
return { elementType: new Float32(), length: value.length };
if (value instanceof Float64Array)
return { elementType: new Float64(), length: value.length };
if (value instanceof Uint8Array)
return { elementType: new Uint8(), length: value.length };
if (value instanceof Uint16Array)
return { elementType: new Uint16(), length: value.length };
if (value instanceof Uint32Array)
return { elementType: new Uint32(), length: value.length };
if (value instanceof Int8Array)
return { elementType: new Int8(), length: value.length };
if (value instanceof Int16Array)
return { elementType: new Int16(), length: value.length };
if (value instanceof Int32Array)
return { elementType: new Int32(), length: value.length };
return undefined;
}
/** Helper function to convert an Array of JS values to an Arrow Vector */
function makeVector(
values: unknown[],
@@ -814,6 +852,16 @@ function makeVector(
"makeVector cannot infer the type if all values are null or undefined",
);
}
if (ArrayBuffer.isView(sampleValue) && !(sampleValue instanceof DataView)) {
const info = typedArrayToArrowType(sampleValue);
if (info !== undefined) {
const fslType = new FixedSizeList(
info.length,
new Field("item", info.elementType, true),
);
return vectorFromArray(values, fslType);
}
}
if (Array.isArray(sampleValue)) {
// Default Arrow inference doesn't handle list types
return makeListVector(values as unknown[][]);

View File

@@ -5,12 +5,15 @@ import {
Table as ArrowTable,
Data,
DataType,
Field,
IntoVector,
MultiVector,
Schema,
dataTypeToJson,
fromDataToBuffer,
fromTableToBuffer,
isMultiVector,
makeEmptyTable,
tableFromIPC,
} from "./arrow";
@@ -84,6 +87,16 @@ export interface OptimizeOptions {
* tbl.optimize({cleanupOlderThan: new Date()});
*/
cleanupOlderThan: Date;
/**
* Because they may be part of an in-progress transaction, files newer than
* 7 days old are not deleted by default. If you are sure that there are no
* in-progress transactions, then you can set this to true to delete all
* files older than `cleanupOlderThan`.
*
* **WARNING**: This should only be set to true if you can guarantee that
* no other process is currently working on this dataset. Otherwise the
* dataset could be put into a corrupted state.
*/
deleteUnverified: boolean;
}
@@ -381,15 +394,16 @@ export abstract class Table {
abstract vectorSearch(vector: IntoVector | MultiVector): VectorQuery;
/**
* Add new columns with defined values.
* @param {AddColumnsSql[]} newColumnTransforms pairs of column names and
* the SQL expression to use to calculate the value of the new column. These
* expressions will be evaluated for each row in the table, and can
* reference existing columns in the table.
* @param {AddColumnsSql[] | Field | Field[] | Schema} newColumnTransforms Either:
* - An array of objects with column names and SQL expressions to calculate values
* - A single Arrow Field defining one column with its data type (column will be initialized with null values)
* - An array of Arrow Fields defining columns with their data types (columns will be initialized with null values)
* - An Arrow Schema defining columns with their data types (columns will be initialized with null values)
* @returns {Promise<AddColumnsResult>} A promise that resolves to an object
* containing the new version number of the table after adding the columns.
*/
abstract addColumns(
newColumnTransforms: AddColumnsSql[],
newColumnTransforms: AddColumnsSql[] | Field | Field[] | Schema,
): Promise<AddColumnsResult>;
/**
@@ -501,19 +515,7 @@ export abstract class Table {
* - Index: Optimizes the indices, adding new data to existing indices
*
*
* Experimental API
* ----------------
*
* The optimization process is undergoing active development and may change.
* Our goal with these changes is to improve the performance of optimization and
* reduce the complexity.
*
* That being said, it is essential today to run optimize if you want the best
* performance. It should be stable and safe to use in production, but it our
* hope that the API may be simplified (or not even need to be called) in the
* future.
*
* The frequency an application shoudl call optimize is based on the frequency of
* The frequency an application should call optimize is based on the frequency of
* data modifications. If data is frequently added, deleted, or updated then
* optimize should be run frequently. A good rule of thumb is to run optimize if
* you have added or modified 100,000 or more records or run more than 20 data
@@ -806,9 +808,40 @@ export class LocalTable extends Table {
// TODO: Support BatchUDF
async addColumns(
newColumnTransforms: AddColumnsSql[],
newColumnTransforms: AddColumnsSql[] | Field | Field[] | Schema,
): Promise<AddColumnsResult> {
return await this.inner.addColumns(newColumnTransforms);
// Handle single Field -> convert to array of Fields
if (newColumnTransforms instanceof Field) {
newColumnTransforms = [newColumnTransforms];
}
// Handle array of Fields -> convert to Schema
if (
Array.isArray(newColumnTransforms) &&
newColumnTransforms.length > 0 &&
newColumnTransforms[0] instanceof Field
) {
const fields = newColumnTransforms as Field[];
newColumnTransforms = new Schema(fields);
}
// Handle Schema -> use schema-based approach
if (newColumnTransforms instanceof Schema) {
const schema = newColumnTransforms;
// Convert schema to buffer using Arrow IPC format
const emptyTable = makeEmptyTable(schema);
const schemaBuf = await fromTableToBuffer(emptyTable);
return await this.inner.addColumnsWithSchema(schemaBuf);
}
// Handle SQL expressions (existing functionality)
if (Array.isArray(newColumnTransforms)) {
return await this.inner.addColumns(
newColumnTransforms as AddColumnsSql[],
);
}
throw new Error("Invalid input type for addColumns");
}
async alterColumns(

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.27.0-beta.2",
"version": "0.27.1",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

4289
nodejs/package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.27.0-beta.2",
"version": "0.27.1",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -8,10 +8,10 @@ use lancedb::database::{CreateTableMode, Database};
use napi::bindgen_prelude::*;
use napi_derive::*;
use crate::ConnectionOptions;
use crate::error::NapiErrorExt;
use crate::header::JsHeaderProvider;
use crate::table::Table;
use crate::ConnectionOptions;
use lancedb::connection::{ConnectBuilder, Connection as LanceDBConnection};
use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};

View File

@@ -3,12 +3,12 @@
use std::sync::Mutex;
use lancedb::index::Index as LanceDbIndex;
use lancedb::index::scalar::{BTreeIndexBuilder, FtsIndexBuilder};
use lancedb::index::vector::{
IvfFlatIndexBuilder, IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder,
IvfRqIndexBuilder,
};
use lancedb::index::Index as LanceDbIndex;
use napi_derive::napi;
use crate::util::parse_distance_type;

View File

@@ -17,8 +17,8 @@ use lancedb::query::VectorQuery as LanceDbVectorQuery;
use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::error::convert_error;
use crate::error::NapiErrorExt;
use crate::error::convert_error;
use crate::iterator::RecordBatchIterator;
use crate::rerankers::RerankHybridCallbackArgs;
use crate::rerankers::Reranker;
@@ -551,15 +551,12 @@ fn parse_fts_query(query: Object) -> napi::Result<FullTextSearchQuery> {
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
if let Some(cols) = columns
&& !cols.is_empty()
{
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!("Failed to set full text search columns: {}", e))
})?;
}
Ok(query)
} else {

View File

@@ -95,7 +95,7 @@ impl napi::bindgen_prelude::FromNapiValue for Session {
napi_val: napi::sys::napi_value,
) -> napi::Result<Self> {
let object: napi::bindgen_prelude::ClassInstance<Self> =
napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)?;
unsafe { napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)? };
Ok((*object).clone())
}
}

View File

@@ -3,7 +3,7 @@
use std::collections::HashMap;
use lancedb::ipc::ipc_file_to_batches;
use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};
use lancedb::table::{
AddDataMode, ColumnAlteration as LanceColumnAlteration, Duration, NewColumnTransform,
OptimizeAction, OptimizeOptions, Table as LanceDbTable,
@@ -279,6 +279,23 @@ impl Table {
Ok(res.into())
}
#[napi(catch_unwind)]
pub async fn add_columns_with_schema(
&self,
schema_buf: Buffer,
) -> napi::Result<AddColumnsResult> {
let schema = ipc_file_to_schema(schema_buf.to_vec())
.map_err(|e| napi::Error::from_reason(format!("Failed to read IPC schema: {}", e)))?;
let transforms = NewColumnTransform::AllNulls(schema);
let res = self
.inner_ref()?
.add_columns(transforms, None)
.await
.default_error()?;
Ok(res.into())
}
#[napi(catch_unwind)]
pub async fn alter_columns(
&self,
@@ -753,12 +770,14 @@ impl From<lancedb::table::AddResult> for AddResult {
#[napi(object)]
pub struct DeleteResult {
pub num_deleted_rows: i64,
pub version: i64,
}
impl From<lancedb::table::DeleteResult> for DeleteResult {
fn from(value: lancedb::table::DeleteResult) -> Self {
Self {
num_deleted_rows: value.num_deleted_rows as i64,
version: value.version as i64,
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.30.0-beta.3"
current_version = "0.30.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.30.0-beta.3"
version = "0.30.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
@@ -16,11 +16,14 @@ crate-type = ["cdylib"]
[dependencies]
arrow = { version = "57.2", features = ["pyarrow"] }
async-trait = "0.1"
bytes = "1"
lancedb = { path = "../rust/lancedb", default-features = false }
lance-core.workspace = true
lance-namespace.workspace = true
lance-namespace-impls.workspace = true
lance-io.workspace = true
env_logger.workspace = true
log.workspace = true
pyo3 = { version = "0.26", features = ["extension-module", "abi3-py39"] }
pyo3-async-runtimes = { version = "0.26", features = [
"attributes",
@@ -28,6 +31,8 @@ pyo3-async-runtimes = { version = "0.26", features = [
] }
pin-project = "1.1.5"
futures.workspace = true
serde = "1"
serde_json = "1"
snafu.workspace = true
tokio = { version = "1.40", features = ["sync"] }

View File

@@ -1,4 +1,4 @@
# LanceDB
# LanceDB Python SDK
A Python library for [LanceDB](https://github.com/lancedb/lancedb).

View File

@@ -3,10 +3,10 @@ name = "lancedb"
# version in Cargo.toml
dynamic = ["version"]
dependencies = [
"deprecation",
"numpy",
"deprecation>=2.1.0",
"numpy>=1.24.0",
"overrides>=0.7; python_version<'3.12'",
"packaging",
"packaging>=23.0",
"pyarrow>=16",
"pydantic>=1.10",
"tqdm>=4.27.0",
@@ -45,51 +45,51 @@ repository = "https://github.com/lancedb/lancedb"
[project.optional-dependencies]
pylance = [
"pylance>=1.0.0b14",
"pylance>=4.0.0b7",
]
tests = [
"aiohttp",
"boto3",
"aiohttp>=3.9.0",
"boto3>=1.28.57",
"pandas>=1.4",
"pytest",
"pytest-mock",
"pytest-asyncio",
"duckdb",
"pytz",
"pytest>=7.0",
"pytest-mock>=3.10",
"pytest-asyncio>=0.21",
"duckdb>=0.9.0",
"pytz>=2023.3",
"polars>=0.19, <=1.3.0",
"tantivy",
"pyarrow-stubs",
"pylance>=1.0.0b14,<3.0.0",
"requests",
"datafusion<52",
"tantivy>=0.20.0",
"pyarrow-stubs>=16.0",
"pylance>=4.0.0b7",
"requests>=2.31.0",
"datafusion>=52,<53",
]
dev = [
"ruff",
"pre-commit",
"pyright",
"ruff>=0.3.0",
"pre-commit>=3.5.0",
"pyright>=1.1.350",
'typing-extensions>=4.0.0; python_version < "3.11"',
]
docs = ["mkdocs", "mkdocs-jupyter", "mkdocs-material", "mkdocstrings-python"]
clip = ["torch", "pillow", "open-clip-torch"]
siglip = ["torch", "pillow", "transformers>=4.41.0","sentencepiece"]
clip = ["torch", "pillow>=12.1.1", "open-clip-torch"]
siglip = ["torch", "pillow>=12.1.1", "transformers>=4.41.0","sentencepiece"]
embeddings = [
"requests>=2.31.0",
"openai>=1.6.1",
"sentence-transformers",
"torch",
"pillow",
"open-clip-torch",
"cohere",
"sentence-transformers>=2.2.0",
"torch>=2.0.0",
"pillow>=12.1.1",
"open-clip-torch>=2.20.0",
"cohere>=4.0",
"colpali-engine>=0.3.10",
"huggingface_hub",
"InstructorEmbedding",
"google.generativeai",
"huggingface_hub>=0.19.0",
"InstructorEmbedding>=1.0.1",
"google.generativeai>=0.3.0",
"boto3>=1.28.57",
"awscli>=1.29.57",
"awscli>=1.44.38",
"botocore>=1.31.57",
'ibm-watsonx-ai>=1.1.2; python_version >= "3.10"',
"ollama>=0.3.0",
"sentencepiece"
"sentencepiece>=0.1.99"
]
azure = ["adlfs>=2024.2.0"]

View File

@@ -135,7 +135,10 @@ class Table:
def close(self) -> None: ...
async def schema(self) -> pa.Schema: ...
async def add(
self, data: pa.RecordBatchReader, mode: Literal["append", "overwrite"]
self,
data: pa.RecordBatchReader,
mode: Literal["append", "overwrite"],
progress: Optional[Any] = None,
) -> AddResult: ...
async def update(
self, updates: Dict[str, str], where: Optional[str]
@@ -166,6 +169,8 @@ class Table:
async def checkout(self, version: Union[int, str]): ...
async def checkout_latest(self): ...
async def restore(self, version: Optional[Union[int, str]] = None): ...
async def prewarm_index(self, index_name: str) -> None: ...
async def prewarm_data(self, columns: Optional[List[str]] = None) -> None: ...
async def list_indices(self) -> list[IndexConfig]: ...
async def delete(self, filter: str) -> DeleteResult: ...
async def add_columns(self, columns: list[tuple[str, str]]) -> AddColumnsResult: ...

View File

@@ -8,7 +8,7 @@ from abc import abstractmethod
from datetime import timedelta
from pathlib import Path
import sys
from typing import TYPE_CHECKING, Dict, Iterable, List, Literal, Optional, Union
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Literal, Optional, Union
if sys.version_info >= (3, 12):
from typing import override
@@ -1541,6 +1541,8 @@ class AsyncConnection(object):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
) -> AsyncTable:
"""Open a Lance Table in the database.
@@ -1573,6 +1575,9 @@ class AsyncConnection(object):
The explicit location (URI) of the table. If provided, the table will be
opened from this location instead of deriving it from the database URI
and table name.
managed_versioning: bool, optional
Whether managed versioning is enabled for this table. If provided,
avoids a redundant describe_table call when namespace_client is set.
Returns
-------
@@ -1587,6 +1592,8 @@ class AsyncConnection(object):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
return AsyncTable(table)

View File

@@ -12,7 +12,7 @@ from __future__ import annotations
import asyncio
import sys
from typing import Dict, Iterable, List, Optional, Union
from typing import Any, Dict, Iterable, List, Optional, Union
if sys.version_info >= (3, 12):
from typing import override
@@ -240,7 +240,7 @@ class LanceNamespaceDBConnection(DBConnection):
session : Optional[Session]
A session to use for this connection
"""
self._ns = namespace
self._namespace_client = namespace
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
@@ -269,7 +269,7 @@ class LanceNamespaceDBConnection(DBConnection):
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
@override
@@ -309,7 +309,9 @@ class LanceNamespaceDBConnection(DBConnection):
# Try to describe the table first to see if it exists
try:
describe_request = DescribeTableRequest(id=table_id)
describe_response = self._ns.describe_table(describe_request)
describe_response = self._namespace_client.describe_table(
describe_request
)
location = describe_response.location
namespace_storage_options = describe_response.storage_options
except Exception:
@@ -323,7 +325,7 @@ class LanceNamespaceDBConnection(DBConnection):
location=None,
properties=self.storage_options if self.storage_options else None,
)
declare_response = self._ns.declare_table(declare_request)
declare_response = self._namespace_client.declare_table(declare_request)
if not declare_response.location:
raise ValueError(
@@ -353,7 +355,7 @@ class LanceNamespaceDBConnection(DBConnection):
# Only create if namespace returned storage_options (not None)
if storage_options_provider is None and namespace_storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
@@ -371,6 +373,7 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options=merged_storage_options,
storage_options_provider=storage_options_provider,
location=location,
namespace_client=self._namespace_client,
)
return tbl
@@ -389,7 +392,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
response = self._namespace_client.describe_table(request)
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -402,10 +405,14 @@ class LanceNamespaceDBConnection(DBConnection):
# Only create if namespace returned storage_options (not None)
if storage_options_provider is None and response.storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
# Pass managed_versioning to avoid redundant describe_table call in Rust.
# Convert None to False since we already have the answer from describe_table.
managed_versioning = response.managed_versioning is True
return self._lance_table_from_uri(
name,
response.location,
@@ -413,6 +420,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options=merged_storage_options,
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
namespace_client=self._namespace_client,
managed_versioning=managed_versioning,
)
@override
@@ -422,7 +431,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
self._namespace_client.drop_table(request)
@override
def rename_table(
@@ -484,7 +493,7 @@ class LanceNamespaceDBConnection(DBConnection):
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
@@ -520,7 +529,7 @@ class LanceNamespaceDBConnection(DBConnection):
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -555,7 +564,7 @@ class LanceNamespaceDBConnection(DBConnection):
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
response = self._namespace_client.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
@@ -581,7 +590,7 @@ class LanceNamespaceDBConnection(DBConnection):
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -615,7 +624,7 @@ class LanceNamespaceDBConnection(DBConnection):
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
@@ -630,6 +639,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
) -> LanceTable:
# Open a table directly from a URI using the location parameter
# Note: storage_options should already be merged by the caller
@@ -643,6 +654,8 @@ class LanceNamespaceDBConnection(DBConnection):
)
# Open the table using the temporary connection with the location parameter
# Pass namespace_client to enable managed versioning support
# Pass managed_versioning to avoid redundant describe_table call
return LanceTable.open(
temp_conn,
name,
@@ -651,6 +664,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=table_uri,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
@@ -685,7 +700,7 @@ class AsyncLanceNamespaceDBConnection:
session : Optional[Session]
A session to use for this connection
"""
self._ns = namespace
self._namespace_client = namespace
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
@@ -713,7 +728,7 @@ class AsyncLanceNamespaceDBConnection:
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
async def create_table(
@@ -750,7 +765,9 @@ class AsyncLanceNamespaceDBConnection:
# Try to describe the table first to see if it exists
try:
describe_request = DescribeTableRequest(id=table_id)
describe_response = self._ns.describe_table(describe_request)
describe_response = self._namespace_client.describe_table(
describe_request
)
location = describe_response.location
namespace_storage_options = describe_response.storage_options
except Exception:
@@ -764,7 +781,7 @@ class AsyncLanceNamespaceDBConnection:
location=None,
properties=self.storage_options if self.storage_options else None,
)
declare_response = self._ns.declare_table(declare_request)
declare_response = self._namespace_client.declare_table(declare_request)
if not declare_response.location:
raise ValueError(
@@ -797,7 +814,7 @@ class AsyncLanceNamespaceDBConnection:
and namespace_storage_options is not None
):
provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
else:
@@ -817,6 +834,7 @@ class AsyncLanceNamespaceDBConnection:
storage_options=merged_storage_options,
storage_options_provider=provider,
location=location,
namespace_client=self._namespace_client,
)
lance_table = await asyncio.to_thread(_create_table)
@@ -837,7 +855,7 @@ class AsyncLanceNamespaceDBConnection:
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
response = self._namespace_client.describe_table(request)
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -849,10 +867,14 @@ class AsyncLanceNamespaceDBConnection:
# Create a storage options provider if not provided by user
if storage_options_provider is None and response.storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
# Capture managed_versioning from describe response.
# Convert None to False since we already have the answer from describe_table.
managed_versioning = response.managed_versioning is True
# Open table in a thread
def _open_table():
temp_conn = LanceDBConnection(
@@ -870,6 +892,8 @@ class AsyncLanceNamespaceDBConnection:
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=response.location,
namespace_client=self._namespace_client,
managed_versioning=managed_versioning,
)
lance_table = await asyncio.to_thread(_open_table)
@@ -881,7 +905,7 @@ class AsyncLanceNamespaceDBConnection:
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
self._namespace_client.drop_table(request)
async def rename_table(
self,
@@ -943,7 +967,7 @@ class AsyncLanceNamespaceDBConnection:
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
@@ -978,7 +1002,7 @@ class AsyncLanceNamespaceDBConnection:
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -1012,7 +1036,7 @@ class AsyncLanceNamespaceDBConnection:
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
response = self._namespace_client.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
@@ -1039,7 +1063,7 @@ class AsyncLanceNamespaceDBConnection:
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -1072,7 +1096,7 @@ class AsyncLanceNamespaceDBConnection:
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,

View File

@@ -606,6 +606,7 @@ class LanceQueryBuilder(ABC):
query,
ordering_field_name=ordering_field_name,
fts_columns=fts_columns,
fast_search=fast_search,
)
if isinstance(query, list):
@@ -1456,13 +1457,14 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
query: str | FullTextQuery,
ordering_field_name: Optional[str] = None,
fts_columns: Optional[Union[str, List[str]]] = None,
fast_search: bool = None,
):
super().__init__(table)
self._query = query
self._phrase_query = False
self.ordering_field_name = ordering_field_name
self._reranker = None
self._fast_search = None
self._fast_search = fast_search
if isinstance(fts_columns, str):
fts_columns = [fts_columns]
self._fts_columns = fts_columns
@@ -2203,8 +2205,8 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
self._vector_query.select(self._columns)
self._fts_query.select(self._columns)
if self._where:
self._vector_query.where(self._where, self._postfilter)
self._fts_query.where(self._where, self._postfilter)
self._vector_query.where(self._where, not self._postfilter)
self._fts_query.where(self._where, not self._postfilter)
if self._with_row_id:
self._vector_query.with_row_id(True)
self._fts_query.with_row_id(True)

View File

@@ -568,4 +568,4 @@ class RemoteDBConnection(DBConnection):
async def close(self):
"""Close the connection to the database."""
self._client.close()
self._conn.close()

View File

@@ -4,7 +4,7 @@
from datetime import timedelta
import logging
from functools import cached_property
from typing import Dict, Iterable, List, Optional, Union, Literal
from typing import Any, Callable, Dict, Iterable, List, Optional, Union, Literal
import warnings
from lancedb._lancedb import (
@@ -35,6 +35,7 @@ import pyarrow as pa
from lancedb.common import DATA, VEC, VECTOR_COLUMN_NAME
from lancedb.merge import LanceMergeInsertBuilder
from lancedb.embeddings import EmbeddingFunctionRegistry
from lancedb.table import _normalize_progress
from ..query import LanceVectorQueryBuilder, LanceQueryBuilder, LanceTakeQueryBuilder
from ..table import AsyncTable, IndexStatistics, Query, Table, Tags
@@ -218,8 +219,6 @@ class RemoteTable(Table):
train: bool = True,
):
"""Create an index on the table.
Currently, the only parameters that matter are
the metric and the vector column name.
Parameters
----------
@@ -250,11 +249,6 @@ class RemoteTable(Table):
>>> table.create_index("l2", "vector") # doctest: +SKIP
"""
if num_sub_vectors is not None:
logging.warning(
"num_sub_vectors is not supported on LanceDB cloud."
"This parameter will be tuned automatically."
)
if accelerator is not None:
logging.warning(
"GPU accelerator is not yet supported on LanceDB cloud."
@@ -315,6 +309,7 @@ class RemoteTable(Table):
mode: str = "append",
on_bad_vectors: str = "error",
fill_value: float = 0.0,
progress: Optional[Union[bool, Callable, Any]] = None,
) -> AddResult:
"""Add more data to the [Table](Table). It has the same API signature as
the OSS version.
@@ -337,17 +332,29 @@ class RemoteTable(Table):
One of "error", "drop", "fill".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
progress: bool, callable, or tqdm-like, optional
A callback or tqdm-compatible progress bar. See
:meth:`Table.add` for details.
Returns
-------
AddResult
An object containing the new version number of the table after adding data.
"""
return LOOP.run(
self._table.add(
data, mode=mode, on_bad_vectors=on_bad_vectors, fill_value=fill_value
progress, owns = _normalize_progress(progress)
try:
return LOOP.run(
self._table.add(
data,
mode=mode,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
progress=progress,
)
)
)
finally:
if owns:
progress.close()
def search(
self,
@@ -647,6 +654,45 @@ class RemoteTable(Table):
def drop_index(self, index_name: str):
return LOOP.run(self._table.drop_index(index_name))
def prewarm_index(self, name: str) -> None:
"""Prewarm an index in the table.
This is a hint to the database that the index will be accessed in the
future and should be loaded into memory if possible. This can reduce
cold-start latency for subsequent queries.
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
Parameters
----------
name: str
The name of the index to prewarm
"""
return LOOP.run(self._table.prewarm_index(name))
def prewarm_data(self, columns: Optional[List[str]] = None) -> None:
"""Prewarm data for the table.
This is a hint to the database that the given columns will be accessed
in the future and the database should prefetch the data if possible.
Currently only supported on remote tables.
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
This operation has a large upfront cost but can speed up future queries
that need to fetch the given columns. Large columns such as embeddings
or binary data may not be practical to prewarm. This feature is intended
for workloads that issue many queries against the same columns.
Parameters
----------
columns: list of str, optional
The columns to prewarm. If None, all columns are prewarmed.
"""
return LOOP.run(self._table.prewarm_data(columns))
def wait_for_index(
self, index_names: Iterable[str], timeout: timedelta = timedelta(seconds=300)
):

View File

@@ -14,6 +14,7 @@ from functools import cached_property
from typing import (
TYPE_CHECKING,
Any,
Callable,
Dict,
Iterable,
List,
@@ -556,6 +557,21 @@ def _table_uri(base: str, table_name: str) -> str:
return join_uri(base, f"{table_name}.lance")
def _normalize_progress(progress):
"""Normalize a ``progress`` parameter for :meth:`Table.add`.
Returns ``(progress_obj, owns)`` where *owns* is True when we created a
tqdm bar that the caller must close.
"""
if progress is True:
from tqdm.auto import tqdm
return tqdm(unit=" rows"), True
if progress is False or progress is None:
return None, False
return progress, False
class Table(ABC):
"""
A Table is a collection of Records in a LanceDB Database.
@@ -974,6 +990,7 @@ class Table(ABC):
mode: AddMode = "append",
on_bad_vectors: OnBadVectorsType = "error",
fill_value: float = 0.0,
progress: Optional[Union[bool, Callable, Any]] = None,
) -> AddResult:
"""Add more data to the [Table](Table).
@@ -995,6 +1012,29 @@ class Table(ABC):
One of "error", "drop", "fill".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
progress: bool, callable, or tqdm-like, optional
Progress reporting during the add operation. Can be:
- ``True`` to automatically create and display a tqdm progress
bar (requires ``tqdm`` to be installed)::
table.add(data, progress=True)
- A **callable** that receives a dict with keys ``output_rows``,
``output_bytes``, ``total_rows``, ``elapsed_seconds``,
``active_tasks``, ``total_tasks``, and ``done``::
def on_progress(p):
print(f"{p['output_rows']}/{p['total_rows']} rows, "
f"{p['active_tasks']}/{p['total_tasks']} workers")
table.add(data, progress=on_progress)
- A **tqdm-compatible** progress bar whose ``total`` and
``update()`` will be called automatically. The postfix shows
write throughput (MB/s) and active worker count::
with tqdm() as pbar:
table.add(data, progress=pbar)
Returns
-------
@@ -1331,7 +1371,7 @@ class Table(ABC):
1 2 [3.0, 4.0]
2 3 [5.0, 6.0]
>>> table.delete("x = 2")
DeleteResult(version=2)
DeleteResult(num_deleted_rows=1, version=2)
>>> table.to_pandas()
x vector
0 1 [1.0, 2.0]
@@ -1345,7 +1385,7 @@ class Table(ABC):
>>> to_remove
'1, 5'
>>> table.delete(f"x IN ({to_remove})")
DeleteResult(version=3)
DeleteResult(num_deleted_rows=1, version=3)
>>> table.to_pandas()
x vector
0 3 [5.0, 6.0]
@@ -1506,22 +1546,17 @@ class Table(ABC):
in-progress operation (e.g. appending new data) and these files will not
be deleted unless they are at least 7 days old. If delete_unverified is True
then these files will be deleted regardless of their age.
.. warning::
This should only be set to True if you can guarantee that no other
process is currently working on this dataset. Otherwise the dataset
could be put into a corrupted state.
retrain: bool, default False
This parameter is no longer used and is deprecated.
Experimental API
----------------
The optimization process is undergoing active development and may change.
Our goal with these changes is to improve the performance of optimization and
reduce the complexity.
That being said, it is essential today to run optimize if you want the best
performance. It should be stable and safe to use in production, but it our
hope that the API may be simplified (or not even need to be called) in the
future.
The frequency an application shoudl call optimize is based on the frequency of
The frequency an application should call optimize is based on the frequency of
data modifications. If data is frequently added, deleted, or updated then
optimize should be run frequently. A good rule of thumb is to run optimize if
you have added or modified 100,000 or more records or run more than 20 data
@@ -1746,6 +1781,8 @@ class LanceTable(Table):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
_async: AsyncTable = None,
):
if namespace is None:
@@ -1753,6 +1790,7 @@ class LanceTable(Table):
self._conn = connection
self._namespace = namespace
self._location = location # Store location for use in _dataset_path
self._namespace_client = namespace_client
if _async is not None:
self._table = _async
else:
@@ -1764,6 +1802,8 @@ class LanceTable(Table):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
)
@@ -1806,6 +1846,8 @@ class LanceTable(Table):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
):
if namespace is None:
namespace = []
@@ -1817,6 +1859,8 @@ class LanceTable(Table):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
# check the dataset exists
@@ -1848,6 +1892,16 @@ class LanceTable(Table):
"Please install with `pip install pylance`."
)
if self._namespace_client is not None:
table_id = self._namespace + [self.name]
return lance.dataset(
version=self.version,
storage_options=self._conn.storage_options,
namespace=self._namespace_client,
table_id=table_id,
**kwargs,
)
return lance.dataset(
self._dataset_path,
version=self.version,
@@ -2200,12 +2254,18 @@ class LanceTable(Table):
def prewarm_index(self, name: str) -> None:
"""
Prewarms an index in the table
Prewarm an index in the table.
This loads the entire index into memory
This is a hint to the database that the index will be accessed in the
future and should be loaded into memory if possible. This can reduce
cold-start latency for subsequent queries.
If the index does not fit into the available cache this call
may be wasteful
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
It is generally wasteful to call this if the index does not fit into the
available cache. Not all index types support prewarming; unsupported
indices will silently ignore the request.
Parameters
----------
@@ -2214,6 +2274,29 @@ class LanceTable(Table):
"""
return LOOP.run(self._table.prewarm_index(name))
def prewarm_data(self, columns: Optional[List[str]] = None) -> None:
"""
Prewarm data for the table.
This is a hint to the database that the given columns will be accessed
in the future and the database should prefetch the data if possible.
Currently only supported on remote tables.
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
This operation has a large upfront cost but can speed up future queries
that need to fetch the given columns. Large columns such as embeddings
or binary data may not be practical to prewarm. This feature is intended
for workloads that issue many queries against the same columns.
Parameters
----------
columns: list of str, optional
The columns to prewarm. If None, all columns are prewarmed.
"""
return LOOP.run(self._table.prewarm_data(columns))
def wait_for_index(
self, index_names: Iterable[str], timeout: timedelta = timedelta(seconds=300)
) -> None:
@@ -2449,6 +2532,7 @@ class LanceTable(Table):
mode: AddMode = "append",
on_bad_vectors: OnBadVectorsType = "error",
fill_value: float = 0.0,
progress: Optional[Union[bool, Callable, Any]] = None,
) -> AddResult:
"""Add data to the table.
If vector columns are missing and the table
@@ -2467,17 +2551,29 @@ class LanceTable(Table):
One of "error", "drop", "fill", "null".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
progress: bool, callable, or tqdm-like, optional
A callback or tqdm-compatible progress bar. See
:meth:`Table.add` for details.
Returns
-------
int
The number of vectors in the table.
"""
return LOOP.run(
self._table.add(
data, mode=mode, on_bad_vectors=on_bad_vectors, fill_value=fill_value
progress, owns = _normalize_progress(progress)
try:
return LOOP.run(
self._table.add(
data,
mode=mode,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
progress=progress,
)
)
)
finally:
if owns:
progress.close()
def merge(
self,
@@ -2713,6 +2809,7 @@ class LanceTable(Table):
data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
):
"""
Create a new table.
@@ -2773,6 +2870,7 @@ class LanceTable(Table):
self._conn = db
self._namespace = namespace
self._location = location
self._namespace_client = namespace_client
if data_storage_version is not None:
warnings.warn(
@@ -2997,22 +3095,17 @@ class LanceTable(Table):
in-progress operation (e.g. appending new data) and these files will not
be deleted unless they are at least 7 days old. If delete_unverified is True
then these files will be deleted regardless of their age.
.. warning::
This should only be set to True if you can guarantee that no other
process is currently working on this dataset. Otherwise the dataset
could be put into a corrupted state.
retrain: bool, default False
This parameter is no longer used and is deprecated.
Experimental API
----------------
The optimization process is undergoing active development and may change.
Our goal with these changes is to improve the performance of optimization and
reduce the complexity.
That being said, it is essential today to run optimize if you want the best
performance. It should be stable and safe to use in production, but it our
hope that the API may be simplified (or not even need to be called) in the
future.
The frequency an application shoudl call optimize is based on the frequency of
The frequency an application should call optimize is based on the frequency of
data modifications. If data is frequently added, deleted, or updated then
optimize should be run frequently. A good rule of thumb is to run optimize if
you have added or modified 100,000 or more records or run more than 20 data
@@ -3613,19 +3706,47 @@ class AsyncTable:
"""
Prewarm an index in the table.
This is a hint to the database that the index will be accessed in the
future and should be loaded into memory if possible. This can reduce
cold-start latency for subsequent queries.
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
It is generally wasteful to call this if the index does not fit into the
available cache. Not all index types support prewarming; unsupported
indices will silently ignore the request.
Parameters
----------
name: str
The name of the index to prewarm
Notes
-----
This will load the index into memory. This may reduce the cold-start time for
future queries. If the index does not fit in the cache then this call may be
wasteful.
"""
await self._inner.prewarm_index(name)
async def prewarm_data(self, columns: Optional[List[str]] = None) -> None:
"""
Prewarm data for the table.
This is a hint to the database that the given columns will be accessed
in the future and the database should prefetch the data if possible.
Currently only supported on remote tables.
This call initiates prewarming and returns once the request is accepted.
It is idempotent and safe to call from multiple clients concurrently.
This operation has a large upfront cost but can speed up future queries
that need to fetch the given columns. Large columns such as embeddings
or binary data may not be practical to prewarm. This feature is intended
for workloads that issue many queries against the same columns.
Parameters
----------
columns: list of str, optional
The columns to prewarm. If None, all columns are prewarmed.
"""
await self._inner.prewarm_data(columns)
async def wait_for_index(
self, index_names: Iterable[str], timeout: timedelta = timedelta(seconds=300)
) -> None:
@@ -3701,6 +3822,7 @@ class AsyncTable:
mode: Optional[Literal["append", "overwrite"]] = "append",
on_bad_vectors: Optional[OnBadVectorsType] = None,
fill_value: Optional[float] = None,
progress: Optional[Union[bool, Callable, Any]] = None,
) -> AddResult:
"""Add more data to the [Table](Table).
@@ -3722,6 +3844,9 @@ class AsyncTable:
One of "error", "drop", "fill", "null".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
progress: callable or tqdm-like, optional
A callback or tqdm-compatible progress bar. See
:meth:`Table.add` for details.
"""
schema = await self.schema()
@@ -3745,8 +3870,9 @@ class AsyncTable:
)
_register_optional_converters()
data = to_scannable(data)
progress, owns = _normalize_progress(progress)
try:
return await self._inner.add(data, mode or "append")
return await self._inner.add(data, mode or "append", progress=progress)
except RuntimeError as e:
if "Cast error" in str(e):
raise ValueError(e)
@@ -3754,6 +3880,9 @@ class AsyncTable:
raise ValueError(e)
else:
raise
finally:
if owns:
progress.close()
def merge_insert(self, on: Union[str, Iterable[str]]) -> LanceMergeInsertBuilder:
"""
@@ -4215,7 +4344,7 @@ class AsyncTable:
1 2 [3.0, 4.0]
2 3 [5.0, 6.0]
>>> table.delete("x = 2")
DeleteResult(version=2)
DeleteResult(num_deleted_rows=1, version=2)
>>> table.to_pandas()
x vector
0 1 [1.0, 2.0]
@@ -4229,7 +4358,7 @@ class AsyncTable:
>>> to_remove
'1, 5'
>>> table.delete(f"x IN ({to_remove})")
DeleteResult(version=3)
DeleteResult(num_deleted_rows=1, version=3)
>>> table.to_pandas()
x vector
0 3 [5.0, 6.0]
@@ -4552,22 +4681,17 @@ class AsyncTable:
in-progress operation (e.g. appending new data) and these files will not
be deleted unless they are at least 7 days old. If delete_unverified is True
then these files will be deleted regardless of their age.
.. warning::
This should only be set to True if you can guarantee that no other
process is currently working on this dataset. Otherwise the dataset
could be put into a corrupted state.
retrain: bool, default False
This parameter is no longer used and is deprecated.
Experimental API
----------------
The optimization process is undergoing active development and may change.
Our goal with these changes is to improve the performance of optimization and
reduce the complexity.
That being said, it is essential today to run optimize if you want the best
performance. It should be stable and safe to use in production, but it our
hope that the API may be simplified (or not even need to be called) in the
future.
The frequency an application shoudl call optimize is based on the frequency of
The frequency an application should call optimize is based on the frequency of
data modifications. If data is frequently added, deleted, or updated then
optimize should be run frequently. A good rule of thumb is to run optimize if
you have added or modified 100,000 or more records or run more than 20 data
@@ -4688,7 +4812,16 @@ class IndexStatistics:
num_indexed_rows: int
num_unindexed_rows: int
index_type: Literal[
"IVF_PQ", "IVF_HNSW_PQ", "IVF_HNSW_SQ", "FTS", "BTREE", "BITMAP", "LABEL_LIST"
"IVF_FLAT",
"IVF_SQ",
"IVF_PQ",
"IVF_RQ",
"IVF_HNSW_SQ",
"IVF_HNSW_PQ",
"FTS",
"BTREE",
"BITMAP",
"LABEL_LIST",
]
distance_type: Optional[Literal["l2", "cosine", "dot"]] = None
num_indices: Optional[int] = None

View File

@@ -324,6 +324,16 @@ def _(value: list):
return "[" + ", ".join(map(value_to_sql, value)) + "]"
@value_to_sql.register(dict)
def _(value: dict):
# https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct
return (
"named_struct("
+ ", ".join(f"'{k}', {value_to_sql(v)}" for k, v in value.items())
+ ")"
)
@value_to_sql.register(np.ndarray)
def _(value: np.ndarray):
return value_to_sql(value.tolist())

View File

@@ -27,6 +27,7 @@ from lancedb.query import (
PhraseQuery,
BooleanQuery,
Occur,
LanceFtsQueryBuilder,
)
import numpy as np
import pyarrow as pa
@@ -920,6 +921,10 @@ def test_fts_fast_search(table):
assert query.limit == 5
assert query.columns == ["text"]
# fast_search should be enabled by keyword argument too
query = LanceFtsQueryBuilder(table, "xyz", fast_search=True).to_query_object()
assert query.fast_search is True
# Verify it executes without error and skips unindexed data
results = table.search("xyz", query_type="fts").fast_search().limit(5).to_list()
assert len(results) == 0

View File

@@ -177,6 +177,60 @@ async def test_analyze_plan(table: AsyncTable):
assert "metrics=" in res
@pytest.fixture
def table_with_id(tmpdir_factory) -> Table:
tmp_path = str(tmpdir_factory.mktemp("data"))
db = lancedb.connect(tmp_path)
data = pa.table(
{
"id": pa.array([1, 2, 3, 4], type=pa.int64()),
"text": pa.array(["a", "b", "cat", "dog"]),
"vector": pa.array(
[[0.1, 0.1], [2, 2], [-0.1, -0.1], [0.5, -0.5]],
type=pa.list_(pa.float32(), list_size=2),
),
}
)
table = db.create_table("test_with_id", data)
table.create_fts_index("text", with_position=False, use_tantivy=False)
return table
def test_hybrid_prefilter_explain_plan(table_with_id: Table):
"""
Verify that the prefilter logic is not inverted in LanceHybridQueryBuilder.
"""
plan_prefilter = (
table_with_id.search(query_type="hybrid")
.vector([0.0, 0.0])
.text("dog")
.where("id = 1", prefilter=True)
.limit(2)
.explain_plan(verbose=True)
)
plan_postfilter = (
table_with_id.search(query_type="hybrid")
.vector([0.0, 0.0])
.text("dog")
.where("id = 1", prefilter=False)
.limit(2)
.explain_plan(verbose=True)
)
# prefilter=True: filter is pushed into the LanceRead scan.
# The FTS sub-plan exposes this as "full_filter=id = Int64(1)" inside LanceRead.
assert "full_filter=id = Int64(1)" in plan_prefilter, (
f"Should push the filter into the scan.\nPlan:\n{plan_prefilter}"
)
# prefilter=False: filter is applied as a separate FilterExec after the search.
# The filter must NOT be embedded in the scan.
assert "full_filter=id = Int64(1)" not in plan_postfilter, (
f"Should NOT push the filter into the scan.\nPlan:\n{plan_postfilter}"
)
def test_normalize_scores():
cases = [
(pa.array([0.1, 0.4]), pa.array([0.0, 1.0])),

View File

@@ -3,6 +3,7 @@
from datetime import timedelta
import random
from typing import get_args, get_type_hints
import pyarrow as pa
import pytest
@@ -22,6 +23,7 @@ from lancedb.index import (
HnswSq,
FTS,
)
from lancedb.table import IndexStatistics
@pytest_asyncio.fixture
@@ -283,3 +285,23 @@ async def test_create_index_with_binary_vectors(binary_table: AsyncTable):
for v in range(256):
res = await binary_table.query().nearest_to([v] * 128).to_arrow()
assert res["id"][0].as_py() == v
def test_index_statistics_index_type_lists_all_supported_values():
expected_index_types = {
"IVF_FLAT",
"IVF_SQ",
"IVF_PQ",
"IVF_RQ",
"IVF_HNSW_SQ",
"IVF_HNSW_PQ",
"FTS",
"BTREE",
"BITMAP",
"LABEL_LIST",
}
assert (
set(get_args(get_type_hints(IndexStatistics)["index_type"]))
== expected_index_types
)

View File

@@ -147,7 +147,12 @@ class TrackingNamespace(LanceNamespace):
This simulates a credential rotation system where each call returns
new credentials that expire after credential_expires_in_seconds.
"""
modified = copy.deepcopy(storage_options) if storage_options else {}
# Start from base storage options (endpoint, region, allow_http, etc.)
# because DirectoryNamespace returns None for storage_options from
# describe_table/declare_table when no credential vendor is configured.
modified = copy.deepcopy(self.base_storage_options)
if storage_options:
modified.update(storage_options)
# Increment credentials to simulate rotation
modified["aws_access_key_id"] = f"AKID_{count}"

View File

@@ -1201,6 +1201,18 @@ async def test_header_provider_overrides_static_headers():
await db.table_names()
def test_close():
"""Test that close() works without AttributeError."""
import asyncio
def handler(req):
req.send_response(200)
req.end_headers()
with mock_lancedb_connection(handler) as db:
asyncio.run(db.close())
@pytest.mark.parametrize("exception", [KeyboardInterrupt, SystemExit, GeneratorExit])
def test_background_loop_cancellation(exception):
"""Test that BackgroundEventLoop.run() cancels the future on interrupt."""

View File

@@ -326,6 +326,24 @@ def test_add_struct(mem_db: DBConnection):
table = mem_db.create_table("test2", schema=schema)
table.add(data)
struct_type = pa.struct(
[
("b", pa.int64()),
("a", pa.int64()),
]
)
expected = pa.table(
{
"s_list": [
[
pa.scalar({"b": 1, "a": 2}, type=struct_type),
pa.scalar({"b": 4, "a": None}, type=struct_type),
]
],
}
)
assert table.to_arrow() == expected
def test_add_subschema(mem_db: DBConnection):
schema = pa.schema(
@@ -509,6 +527,102 @@ async def test_add_async(mem_db_async: AsyncConnection):
assert await table.count_rows() == 3
def test_add_progress_callback(mem_db: DBConnection):
table = mem_db.create_table(
"test",
data=[{"id": 1}, {"id": 2}],
)
updates = []
table.add([{"id": 3}, {"id": 4}], progress=lambda p: updates.append(dict(p)))
assert len(table) == 4
# The done callback always fires, so we should always get at least one.
assert len(updates) >= 1, "expected at least one progress callback"
for p in updates:
assert "output_rows" in p
assert "output_bytes" in p
assert "total_rows" in p
assert "elapsed_seconds" in p
assert "active_tasks" in p
assert "total_tasks" in p
assert "done" in p
# The last callback should have done=True.
assert updates[-1]["done"] is True
def test_add_progress_tqdm_like(mem_db: DBConnection):
"""Test that a tqdm-like object gets total set and update() called."""
class FakeBar:
def __init__(self):
self.total = None
self.n = 0
self.postfix = None
def update(self, n):
self.n += n
def set_postfix_str(self, s):
self.postfix = s
def refresh(self):
pass
table = mem_db.create_table(
"test",
data=[{"id": 1}, {"id": 2}],
)
bar = FakeBar()
table.add([{"id": 3}, {"id": 4}], progress=bar)
assert len(table) == 4
# Postfix should contain throughput and worker count
if bar.postfix is not None:
assert "MB/s" in bar.postfix
assert "workers" in bar.postfix
def test_add_progress_bool(mem_db: DBConnection):
"""Test that progress=True creates and closes a tqdm bar automatically."""
table = mem_db.create_table(
"test",
data=[{"id": 1}, {"id": 2}],
)
table.add([{"id": 3}, {"id": 4}], progress=True)
assert len(table) == 4
# progress=False should be the same as None
table.add([{"id": 5}], progress=False)
assert len(table) == 5
@pytest.mark.asyncio
async def test_add_progress_callback_async(mem_db_async: AsyncConnection):
"""Progress callbacks work through the async path too."""
table = await mem_db_async.create_table("test", data=[{"id": 1}, {"id": 2}])
updates = []
await table.add([{"id": 3}, {"id": 4}], progress=lambda p: updates.append(dict(p)))
assert await table.count_rows() == 4
assert len(updates) >= 1
assert updates[-1]["done"] is True
def test_add_progress_callback_error(mem_db: DBConnection):
"""A failing callback must not prevent the write from succeeding."""
table = mem_db.create_table("test", data=[{"id": 1}, {"id": 2}])
def bad_callback(p):
raise RuntimeError("boom")
table.add([{"id": 3}, {"id": 4}], progress=bad_callback)
assert len(table) == 4
def test_polars(mem_db: DBConnection):
data = {
"vector": [[3.1, 4.1], [5.9, 26.5]],

View File

@@ -121,6 +121,32 @@ def test_value_to_sql_string(tmp_path):
assert table.to_pandas().query("search == @value")["replace"].item() == value
def test_value_to_sql_dict():
# Simple flat struct
assert value_to_sql({"a": 1, "b": "hello"}) == "named_struct('a', 1, 'b', 'hello')"
# Nested struct
assert (
value_to_sql({"outer": {"inner": 1}})
== "named_struct('outer', named_struct('inner', 1))"
)
# List inside struct
assert value_to_sql({"a": [1, 2]}) == "named_struct('a', [1, 2])"
# Mixed types
assert (
value_to_sql({"name": "test", "count": 42, "rate": 3.14, "active": True})
== "named_struct('name', 'test', 'count', 42, 'rate', 3.14, 'active', TRUE)"
)
# Null value inside struct
assert value_to_sql({"a": None}) == "named_struct('a', NULL)"
# Empty dict
assert value_to_sql({}) == "named_struct()"
def test_append_vector_columns():
registry = EmbeddingFunctionRegistry.get_instance()
registry.register("test")(MockTextEmbeddingFunction)

View File

@@ -10,7 +10,7 @@ use arrow::{
use futures::stream::StreamExt;
use lancedb::arrow::SendableRecordBatchStream;
use pyo3::{
exceptions::PyStopAsyncIteration, pyclass, pymethods, Bound, Py, PyAny, PyRef, PyResult, Python,
Bound, Py, PyAny, PyRef, PyResult, Python, exceptions::PyStopAsyncIteration, pyclass, pymethods,
};
use pyo3_async_runtimes::tokio::future_into_py;

View File

@@ -9,15 +9,16 @@ use lancedb::{
database::{CreateTableMode, Database, ReadConsistency},
};
use pyo3::{
Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
exceptions::{PyRuntimeError, PyValueError},
pyclass, pyfunction, pymethods,
types::{PyDict, PyDictMethods},
Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
use crate::{
error::PythonErrorExt, storage_options::py_object_to_storage_options_provider, table::Table,
error::PythonErrorExt, namespace::extract_namespace_arc,
storage_options::py_object_to_storage_options_provider, table::Table,
};
#[pyclass]
@@ -182,7 +183,8 @@ impl Connection {
})
}
#[pyo3(signature = (name, namespace=vec![], storage_options = None, storage_options_provider=None, index_cache_size = None, location=None))]
#[allow(clippy::too_many_arguments)]
#[pyo3(signature = (name, namespace=vec![], storage_options = None, storage_options_provider=None, index_cache_size = None, location=None, namespace_client=None, managed_versioning=None))]
pub fn open_table(
self_: PyRef<'_, Self>,
name: String,
@@ -191,11 +193,13 @@ impl Connection {
storage_options_provider: Option<Py<PyAny>>,
index_cache_size: Option<u32>,
location: Option<String>,
namespace_client: Option<Py<PyAny>>,
managed_versioning: Option<bool>,
) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.get_inner()?.clone();
let mut builder = inner.open_table(name);
builder = builder.namespace(namespace);
builder = builder.namespace(namespace.clone());
if let Some(storage_options) = storage_options {
builder = builder.storage_options(storage_options);
}
@@ -209,6 +213,20 @@ impl Connection {
if let Some(location) = location {
builder = builder.location(location);
}
// Extract namespace client from Python object if provided
let ns_client = if let Some(ns_obj) = namespace_client {
let py = self_.py();
Some(extract_namespace_arc(py, ns_obj)?)
} else {
None
};
if let Some(ns_client) = ns_client {
builder = builder.namespace_client(ns_client);
}
// Pass managed_versioning if provided to avoid redundant describe_table call
if let Some(enabled) = managed_versioning {
builder = builder.managed_versioning(enabled);
}
future_into_py(self_.py(), async move {
let table = builder.execute().await.infer_error()?;

View File

@@ -2,10 +2,10 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use pyo3::{
PyErr, PyResult, Python,
exceptions::{PyIOError, PyNotImplementedError, PyOSError, PyRuntimeError, PyValueError},
intern,
types::{PyAnyMethods, PyNone},
PyErr, PyResult, Python,
};
use lancedb::error::Error as LanceError;

View File

@@ -3,17 +3,17 @@
use lancedb::index::vector::{IvfFlatIndexBuilder, IvfRqIndexBuilder, IvfSqIndexBuilder};
use lancedb::index::{
Index as LanceDbIndex,
scalar::{BTreeIndexBuilder, FtsIndexBuilder},
vector::{IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder},
Index as LanceDbIndex,
};
use pyo3::types::PyStringMethods;
use pyo3::IntoPyObject;
use pyo3::types::PyStringMethods;
use pyo3::{
Bound, FromPyObject, PyAny, PyResult, Python,
exceptions::{PyKeyError, PyValueError},
intern, pyclass, pymethods,
types::PyAnyMethods,
Bound, FromPyObject, PyAny, PyResult, Python,
};
use crate::util::parse_distance_type;
@@ -41,7 +41,12 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
let inner_opts = FtsIndexBuilder::default()
.base_tokenizer(params.base_tokenizer)
.language(&params.language)
.map_err(|_| PyValueError::new_err(format!("LanceDB does not support the requested language: '{}'", params.language)))?
.map_err(|_| {
PyValueError::new_err(format!(
"LanceDB does not support the requested language: '{}'",
params.language
))
})?
.with_position(params.with_position)
.lower_case(params.lower_case)
.max_token_length(params.max_token_length)
@@ -52,7 +57,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
.ngram_max_length(params.ngram_max_length)
.ngram_prefix_only(params.prefix_only);
Ok(LanceDbIndex::FTS(inner_opts))
},
}
"IvfFlat" => {
let params = source.extract::<IvfFlatParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -64,10 +69,11 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_flat_builder = ivf_flat_builder.num_partitions(num_partitions);
}
if let Some(target_partition_size) = params.target_partition_size {
ivf_flat_builder = ivf_flat_builder.target_partition_size(target_partition_size);
ivf_flat_builder =
ivf_flat_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfFlat(ivf_flat_builder))
},
}
"IvfPq" => {
let params = source.extract::<IvfPqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -86,7 +92,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_pq_builder = ivf_pq_builder.num_sub_vectors(num_sub_vectors);
}
Ok(LanceDbIndex::IvfPq(ivf_pq_builder))
},
}
"IvfSq" => {
let params = source.extract::<IvfSqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -101,7 +107,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_sq_builder = ivf_sq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfSq(ivf_sq_builder))
},
}
"IvfRq" => {
let params = source.extract::<IvfRqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -117,7 +123,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_rq_builder = ivf_rq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfRq(ivf_rq_builder))
},
}
"HnswPq" => {
let params = source.extract::<IvfHnswPqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -138,7 +144,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
hnsw_pq_builder = hnsw_pq_builder.num_sub_vectors(num_sub_vectors);
}
Ok(LanceDbIndex::IvfHnswPq(hnsw_pq_builder))
},
}
"HnswSq" => {
let params = source.extract::<IvfHnswSqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -155,7 +161,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
hnsw_sq_builder = hnsw_sq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfHnswSq(hnsw_sq_builder))
},
}
not_supported => Err(PyValueError::new_err(format!(
"Invalid index type '{}'. Must be one of BTree, Bitmap, LabelList, FTS, IvfPq, IvfSq, IvfHnswPq, or IvfHnswSq",
not_supported

View File

@@ -2,14 +2,14 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use arrow::RecordBatchStream;
use connection::{connect, Connection};
use connection::{Connection, connect};
use env_logger::Env;
use index::IndexConfig;
use permutation::{PyAsyncPermutationBuilder, PyPermutationReader};
use pyo3::{
pymodule,
Bound, PyResult, Python, pymodule,
types::{PyModule, PyModuleMethods},
wrap_pyfunction, Bound, PyResult, Python,
wrap_pyfunction,
};
use query::{FTSQuery, HybridQuery, Query, VectorQuery};
use session::Session;
@@ -23,6 +23,7 @@ pub mod connection;
pub mod error;
pub mod header;
pub mod index;
pub mod namespace;
pub mod permutation;
pub mod query;
pub mod session;

696
python/src/namespace.rs Normal file
View File

@@ -0,0 +1,696 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
//! Namespace utilities for Python bindings
use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use bytes::Bytes;
use lance_namespace::LanceNamespace as LanceNamespaceTrait;
use lance_namespace::models::*;
use pyo3::prelude::*;
use pyo3::types::PyDict;
/// Wrapper that allows any Python object implementing LanceNamespace protocol
/// to be used as a Rust LanceNamespace.
///
/// This is similar to PyLanceNamespace in lance's Python bindings - it wraps a Python
/// object and calls back into Python when namespace methods are invoked.
pub struct PyLanceNamespace {
py_namespace: Arc<Py<PyAny>>,
namespace_id: String,
}
impl PyLanceNamespace {
/// Create a new PyLanceNamespace wrapper around a Python namespace object.
pub fn new(_py: Python<'_>, py_namespace: &Bound<'_, PyAny>) -> PyResult<Self> {
let namespace_id = py_namespace
.call_method0("namespace_id")?
.extract::<String>()?;
Ok(Self {
py_namespace: Arc::new(py_namespace.clone().unbind()),
namespace_id,
})
}
/// Create an Arc<dyn LanceNamespace> from a Python namespace object.
pub fn create_arc(
py: Python<'_>,
py_namespace: &Bound<'_, PyAny>,
) -> PyResult<Arc<dyn LanceNamespaceTrait>> {
let wrapper = Self::new(py, py_namespace)?;
Ok(Arc::new(wrapper))
}
}
impl std::fmt::Debug for PyLanceNamespace {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "PyLanceNamespace {{ id: {} }}", self.namespace_id)
}
}
/// Get or create the DictWithModelDump class in Python.
/// This class acts like a dict but also has model_dump() method.
/// This allows it to work with both:
/// - depythonize (which expects a dict/Mapping)
/// - Python code that calls .model_dump() (like DirectoryNamespace wrapper)
fn get_dict_with_model_dump_class(py: Python<'_>) -> PyResult<Bound<'_, PyAny>> {
// Use a module-level cache via __builtins__
let builtins = py.import("builtins")?;
if builtins.hasattr("_DictWithModelDump")? {
return builtins.getattr("_DictWithModelDump");
}
// Create the class using exec
let locals = PyDict::new(py);
py.run(
c"class DictWithModelDump(dict):
def model_dump(self):
return dict(self)",
None,
Some(&locals),
)?;
let class = locals.get_item("DictWithModelDump")?.ok_or_else(|| {
pyo3::exceptions::PyRuntimeError::new_err("Failed to create DictWithModelDump class")
})?;
// Cache it
builtins.setattr("_DictWithModelDump", &class)?;
Ok(class)
}
/// Helper to call a Python namespace method with JSON serialization.
/// For methods that take a request and return a response.
/// Uses DictWithModelDump to pass a dict that also has model_dump() method,
/// making it compatible with both depythonize and Python wrappers.
async fn call_py_method<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: serde::de::DeserializeOwned + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(format!(
"Failed to serialize request for {}: {}",
method_name, e
))
})?;
let response_json = tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump so it works with both depythonize and .model_dump()
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
// Convert response to dict, then to JSON
// Pydantic models have model_dump() method
let result_dict = if result.bind(py).hasattr("model_dump")? {
result.call_method0(py, "model_dump")?
} else {
result
};
let response_json: String = json_module
.call_method1("dumps", (result_dict,))?
.extract()?;
Ok::<_, PyErr>(response_json)
})
})
.await
.map_err(|e| lance_core::Error::io(format!("Task join error for {}: {}", method_name, e)))?
.map_err(|e: PyErr| lance_core::Error::io(format!("Python error in {}: {}", method_name, e)))?;
serde_json::from_str(&response_json).map_err(|e| {
lance_core::Error::io(format!(
"Failed to deserialize response from {}: {}",
method_name, e
))
})
}
/// Helper for methods that return () on success
async fn call_py_method_unit<Req>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<()>
where
Req: serde::Serialize + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(format!(
"Failed to serialize request for {}: {}",
method_name, e
))
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
py_namespace.call_method1(py, method_name, (request_arg,))?;
Ok::<_, PyErr>(())
})
})
.await
.map_err(|e| lance_core::Error::io(format!("Task join error for {}: {}", method_name, e)))?
.map_err(|e: PyErr| lance_core::Error::io(format!("Python error in {}: {}", method_name, e)))
}
/// Helper for methods that return a primitive type
async fn call_py_method_primitive<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: for<'py> pyo3::FromPyObject<'py> + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(format!(
"Failed to serialize request for {}: {}",
method_name, e
))
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
let value: Resp = result.extract(py)?;
Ok::<_, PyErr>(value)
})
})
.await
.map_err(|e| lance_core::Error::io(format!("Task join error for {}: {}", method_name, e)))?
.map_err(|e: PyErr| lance_core::Error::io(format!("Python error in {}: {}", method_name, e)))
}
/// Helper for methods that return Bytes
async fn call_py_method_bytes<Req>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Bytes>
where
Req: serde::Serialize + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(format!(
"Failed to serialize request for {}: {}",
method_name, e
))
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
let bytes_data: Vec<u8> = result.extract(py)?;
Ok::<_, PyErr>(Bytes::from(bytes_data))
})
})
.await
.map_err(|e| lance_core::Error::io(format!("Task join error for {}: {}", method_name, e)))?
.map_err(|e: PyErr| lance_core::Error::io(format!("Python error in {}: {}", method_name, e)))
}
/// Helper for methods that take request + data and return a response
async fn call_py_method_with_data<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
data: Bytes,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: serde::de::DeserializeOwned + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(format!(
"Failed to serialize request for {}: {}",
method_name, e
))
})?;
let response_json = tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Pass request and bytes to Python method
let py_bytes = pyo3::types::PyBytes::new(py, &data);
let result = py_namespace.call_method1(py, method_name, (request_arg, py_bytes))?;
// Convert response dict to JSON
let response_json: String = json_module.call_method1("dumps", (result,))?.extract()?;
Ok::<_, PyErr>(response_json)
})
})
.await
.map_err(|e| lance_core::Error::io(format!("Task join error for {}: {}", method_name, e)))?
.map_err(|e: PyErr| lance_core::Error::io(format!("Python error in {}: {}", method_name, e)))?;
serde_json::from_str(&response_json).map_err(|e| {
lance_core::Error::io(format!(
"Failed to deserialize response from {}: {}",
method_name, e
))
})
}
#[async_trait]
impl LanceNamespaceTrait for PyLanceNamespace {
fn namespace_id(&self) -> String {
self.namespace_id.clone()
}
async fn list_namespaces(
&self,
request: ListNamespacesRequest,
) -> lance_core::Result<ListNamespacesResponse> {
call_py_method(self.py_namespace.clone(), "list_namespaces", request).await
}
async fn describe_namespace(
&self,
request: DescribeNamespaceRequest,
) -> lance_core::Result<DescribeNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "describe_namespace", request).await
}
async fn create_namespace(
&self,
request: CreateNamespaceRequest,
) -> lance_core::Result<CreateNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "create_namespace", request).await
}
async fn drop_namespace(
&self,
request: DropNamespaceRequest,
) -> lance_core::Result<DropNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "drop_namespace", request).await
}
async fn namespace_exists(&self, request: NamespaceExistsRequest) -> lance_core::Result<()> {
call_py_method_unit(self.py_namespace.clone(), "namespace_exists", request).await
}
async fn list_tables(
&self,
request: ListTablesRequest,
) -> lance_core::Result<ListTablesResponse> {
call_py_method(self.py_namespace.clone(), "list_tables", request).await
}
async fn describe_table(
&self,
request: DescribeTableRequest,
) -> lance_core::Result<DescribeTableResponse> {
call_py_method(self.py_namespace.clone(), "describe_table", request).await
}
async fn register_table(
&self,
request: RegisterTableRequest,
) -> lance_core::Result<RegisterTableResponse> {
call_py_method(self.py_namespace.clone(), "register_table", request).await
}
async fn table_exists(&self, request: TableExistsRequest) -> lance_core::Result<()> {
call_py_method_unit(self.py_namespace.clone(), "table_exists", request).await
}
async fn drop_table(&self, request: DropTableRequest) -> lance_core::Result<DropTableResponse> {
call_py_method(self.py_namespace.clone(), "drop_table", request).await
}
async fn deregister_table(
&self,
request: DeregisterTableRequest,
) -> lance_core::Result<DeregisterTableResponse> {
call_py_method(self.py_namespace.clone(), "deregister_table", request).await
}
async fn count_table_rows(&self, request: CountTableRowsRequest) -> lance_core::Result<i64> {
call_py_method_primitive(self.py_namespace.clone(), "count_table_rows", request).await
}
async fn create_table(
&self,
request: CreateTableRequest,
request_data: Bytes,
) -> lance_core::Result<CreateTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"create_table",
request,
request_data,
)
.await
}
async fn declare_table(
&self,
request: DeclareTableRequest,
) -> lance_core::Result<DeclareTableResponse> {
call_py_method(self.py_namespace.clone(), "declare_table", request).await
}
async fn insert_into_table(
&self,
request: InsertIntoTableRequest,
request_data: Bytes,
) -> lance_core::Result<InsertIntoTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"insert_into_table",
request,
request_data,
)
.await
}
async fn merge_insert_into_table(
&self,
request: MergeInsertIntoTableRequest,
request_data: Bytes,
) -> lance_core::Result<MergeInsertIntoTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"merge_insert_into_table",
request,
request_data,
)
.await
}
async fn update_table(
&self,
request: UpdateTableRequest,
) -> lance_core::Result<UpdateTableResponse> {
call_py_method(self.py_namespace.clone(), "update_table", request).await
}
async fn delete_from_table(
&self,
request: DeleteFromTableRequest,
) -> lance_core::Result<DeleteFromTableResponse> {
call_py_method(self.py_namespace.clone(), "delete_from_table", request).await
}
async fn query_table(&self, request: QueryTableRequest) -> lance_core::Result<Bytes> {
call_py_method_bytes(self.py_namespace.clone(), "query_table", request).await
}
async fn create_table_index(
&self,
request: CreateTableIndexRequest,
) -> lance_core::Result<CreateTableIndexResponse> {
call_py_method(self.py_namespace.clone(), "create_table_index", request).await
}
async fn list_table_indices(
&self,
request: ListTableIndicesRequest,
) -> lance_core::Result<ListTableIndicesResponse> {
call_py_method(self.py_namespace.clone(), "list_table_indices", request).await
}
async fn describe_table_index_stats(
&self,
request: DescribeTableIndexStatsRequest,
) -> lance_core::Result<DescribeTableIndexStatsResponse> {
call_py_method(
self.py_namespace.clone(),
"describe_table_index_stats",
request,
)
.await
}
async fn describe_transaction(
&self,
request: DescribeTransactionRequest,
) -> lance_core::Result<DescribeTransactionResponse> {
call_py_method(self.py_namespace.clone(), "describe_transaction", request).await
}
async fn alter_transaction(
&self,
request: AlterTransactionRequest,
) -> lance_core::Result<AlterTransactionResponse> {
call_py_method(self.py_namespace.clone(), "alter_transaction", request).await
}
async fn create_table_scalar_index(
&self,
request: CreateTableIndexRequest,
) -> lance_core::Result<CreateTableScalarIndexResponse> {
call_py_method(
self.py_namespace.clone(),
"create_table_scalar_index",
request,
)
.await
}
async fn drop_table_index(
&self,
request: DropTableIndexRequest,
) -> lance_core::Result<DropTableIndexResponse> {
call_py_method(self.py_namespace.clone(), "drop_table_index", request).await
}
async fn list_all_tables(
&self,
request: ListTablesRequest,
) -> lance_core::Result<ListTablesResponse> {
call_py_method(self.py_namespace.clone(), "list_all_tables", request).await
}
async fn restore_table(
&self,
request: RestoreTableRequest,
) -> lance_core::Result<RestoreTableResponse> {
call_py_method(self.py_namespace.clone(), "restore_table", request).await
}
async fn rename_table(
&self,
request: RenameTableRequest,
) -> lance_core::Result<RenameTableResponse> {
call_py_method(self.py_namespace.clone(), "rename_table", request).await
}
async fn list_table_versions(
&self,
request: ListTableVersionsRequest,
) -> lance_core::Result<ListTableVersionsResponse> {
call_py_method(self.py_namespace.clone(), "list_table_versions", request).await
}
async fn create_table_version(
&self,
request: CreateTableVersionRequest,
) -> lance_core::Result<CreateTableVersionResponse> {
call_py_method(self.py_namespace.clone(), "create_table_version", request).await
}
async fn describe_table_version(
&self,
request: DescribeTableVersionRequest,
) -> lance_core::Result<DescribeTableVersionResponse> {
call_py_method(self.py_namespace.clone(), "describe_table_version", request).await
}
async fn batch_delete_table_versions(
&self,
request: BatchDeleteTableVersionsRequest,
) -> lance_core::Result<BatchDeleteTableVersionsResponse> {
call_py_method(
self.py_namespace.clone(),
"batch_delete_table_versions",
request,
)
.await
}
async fn update_table_schema_metadata(
&self,
request: UpdateTableSchemaMetadataRequest,
) -> lance_core::Result<UpdateTableSchemaMetadataResponse> {
call_py_method(
self.py_namespace.clone(),
"update_table_schema_metadata",
request,
)
.await
}
async fn get_table_stats(
&self,
request: GetTableStatsRequest,
) -> lance_core::Result<GetTableStatsResponse> {
call_py_method(self.py_namespace.clone(), "get_table_stats", request).await
}
async fn explain_table_query_plan(
&self,
request: ExplainTableQueryPlanRequest,
) -> lance_core::Result<String> {
call_py_method_primitive(
self.py_namespace.clone(),
"explain_table_query_plan",
request,
)
.await
}
async fn analyze_table_query_plan(
&self,
request: AnalyzeTableQueryPlanRequest,
) -> lance_core::Result<String> {
call_py_method_primitive(
self.py_namespace.clone(),
"analyze_table_query_plan",
request,
)
.await
}
async fn alter_table_add_columns(
&self,
request: AlterTableAddColumnsRequest,
) -> lance_core::Result<AlterTableAddColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_add_columns",
request,
)
.await
}
async fn alter_table_alter_columns(
&self,
request: AlterTableAlterColumnsRequest,
) -> lance_core::Result<AlterTableAlterColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_alter_columns",
request,
)
.await
}
async fn alter_table_drop_columns(
&self,
request: AlterTableDropColumnsRequest,
) -> lance_core::Result<AlterTableDropColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_drop_columns",
request,
)
.await
}
async fn list_table_tags(
&self,
request: ListTableTagsRequest,
) -> lance_core::Result<ListTableTagsResponse> {
call_py_method(self.py_namespace.clone(), "list_table_tags", request).await
}
async fn create_table_tag(
&self,
request: CreateTableTagRequest,
) -> lance_core::Result<CreateTableTagResponse> {
call_py_method(self.py_namespace.clone(), "create_table_tag", request).await
}
async fn delete_table_tag(
&self,
request: DeleteTableTagRequest,
) -> lance_core::Result<DeleteTableTagResponse> {
call_py_method(self.py_namespace.clone(), "delete_table_tag", request).await
}
async fn update_table_tag(
&self,
request: UpdateTableTagRequest,
) -> lance_core::Result<UpdateTableTagResponse> {
call_py_method(self.py_namespace.clone(), "update_table_tag", request).await
}
async fn get_table_tag_version(
&self,
request: GetTableTagVersionRequest,
) -> lance_core::Result<GetTableTagVersionResponse> {
call_py_method(self.py_namespace.clone(), "get_table_tag_version", request).await
}
}
/// Convert Python dict to HashMap<String, String>
#[allow(dead_code)]
fn dict_to_hashmap(dict: &Bound<'_, PyDict>) -> PyResult<HashMap<String, String>> {
let mut map = HashMap::new();
for (key, value) in dict.iter() {
let key_str: String = key.extract()?;
let value_str: String = value.extract()?;
map.insert(key_str, value_str);
}
Ok(map)
}
/// Extract an Arc<dyn LanceNamespace> from a Python namespace object.
///
/// This function wraps any Python namespace object with PyLanceNamespace.
/// The PyLanceNamespace wrapper uses DictWithModelDump to pass requests,
/// which works with both:
/// - Native namespaces (DirectoryNamespace, RestNamespace) that use depythonize (expects dict)
/// - Custom Python implementations that call .model_dump() on the request
pub fn extract_namespace_arc(
py: Python<'_>,
ns: Py<PyAny>,
) -> PyResult<Arc<dyn LanceNamespaceTrait>> {
let ns_ref = ns.bind(py);
PyLanceNamespace::create_arc(py, ns_ref)
}

View File

@@ -16,10 +16,10 @@ use lancedb::{
query::Select,
};
use pyo3::{
Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
exceptions::PyRuntimeError,
pyclass, pymethods,
types::{PyAnyMethods, PyDict, PyDictMethods, PyType},
Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;

View File

@@ -4,9 +4,9 @@
use std::sync::Arc;
use std::time::Duration;
use arrow::array::make_array;
use arrow::array::Array;
use arrow::array::ArrayData;
use arrow::array::make_array;
use arrow::pyarrow::FromPyArrow;
use arrow::pyarrow::IntoPyArrow;
use arrow::pyarrow::ToPyArrow;
@@ -22,23 +22,23 @@ use lancedb::query::{
VectorQuery as LanceDbVectorQuery,
};
use lancedb::table::AnyQuery;
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pyfunction;
use pyo3::pymethods;
use pyo3::types::PyList;
use pyo3::types::{PyDict, PyString};
use pyo3::Bound;
use pyo3::IntoPyObject;
use pyo3::PyAny;
use pyo3::PyRef;
use pyo3::PyResult;
use pyo3::Python;
use pyo3::{exceptions::PyRuntimeError, FromPyObject};
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pyfunction;
use pyo3::pymethods;
use pyo3::types::PyList;
use pyo3::types::{PyDict, PyString};
use pyo3::{FromPyObject, exceptions::PyRuntimeError};
use pyo3::{PyErr, pyclass};
use pyo3::{
exceptions::{PyNotImplementedError, PyValueError},
intern,
};
use pyo3::{pyclass, PyErr};
use pyo3_async_runtimes::tokio::future_into_py;
use crate::util::parse_distance_type;
@@ -316,6 +316,19 @@ impl<'py> IntoPyObject<'py> for PySelect {
Select::All => Ok(py.None().into_bound(py).into_any()),
Select::Columns(columns) => Ok(columns.into_pyobject(py)?.into_any()),
Select::Dynamic(columns) => Ok(columns.into_pyobject(py)?.into_any()),
Select::Expr(pairs) => {
// Serialize DataFusion Expr -> SQL string so Python sees the same
// format as Select::Dynamic: a list of (name, sql_string) tuples.
let sql_pairs: PyResult<Vec<(String, String)>> = pairs
.into_iter()
.map(|(name, expr)| {
lancedb::expr::expr_to_sql_string(&expr)
.map(|sql| (name, sql))
.map_err(|e| PyRuntimeError::new_err(e.to_string()))
})
.collect();
Ok(sql_pairs?.into_pyobject(py)?.into_any())
}
}
}
}

View File

@@ -4,7 +4,7 @@
use std::sync::Arc;
use lancedb::{ObjectStoreRegistry, Session as LanceSession};
use pyo3::{pyclass, pymethods, PyResult};
use pyo3::{PyResult, pyclass, pymethods};
/// A session for managing caches and object stores across LanceDB operations.
///

View File

@@ -66,13 +66,10 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {
.inner
.bind(py)
.call_method0("fetch_storage_options")
.map_err(|e| lance_core::Error::IO {
source: Box::new(std::io::Error::other(format!(
"Failed to call fetch_storage_options: {}",
e
))),
location: snafu::location!(),
})?;
.map_err(|e| lance_core::Error::io_source(Box::new(std::io::Error::other(format!(
"Failed to call fetch_storage_options: {}",
e
)))))?;
// If result is None, return None
if result.is_none() {
@@ -81,26 +78,19 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {
// Extract the result dict - should be a flat Map<String, String>
let result_dict = result.downcast::<PyDict>().map_err(|_| {
lance_core::Error::InvalidInput {
source: "fetch_storage_options() must return None or a dict of string key-value pairs".into(),
location: snafu::location!(),
}
lance_core::Error::invalid_input(
"fetch_storage_options() must return a dict of string key-value pairs or None",
)
})?;
// Convert all entries to HashMap<String, String>
let mut storage_options = HashMap::new();
for (key, value) in result_dict.iter() {
let key_str: String = key.extract().map_err(|e| {
lance_core::Error::InvalidInput {
source: format!("Storage option key must be a string: {}", e).into(),
location: snafu::location!(),
}
lance_core::Error::invalid_input(format!("Storage option key must be a string: {}", e))
})?;
let value_str: String = value.extract().map_err(|e| {
lance_core::Error::InvalidInput {
source: format!("Storage option value must be a string: {}", e).into(),
location: snafu::location!(),
}
lance_core::Error::invalid_input(format!("Storage option value must be a string: {}", e))
})?;
storage_options.insert(key_str, value_str);
}
@@ -109,13 +99,10 @@ impl StorageOptionsProvider for PyStorageOptionsProviderWrapper {
})
})
.await
.map_err(|e| lance_core::Error::IO {
source: Box::new(std::io::Error::other(format!(
"Task join error: {}",
e
))),
location: snafu::location!(),
})?
.map_err(|e| lance_core::Error::io_source(Box::new(std::io::Error::other(format!(
"Task join error: {}",
e
)))))?
}
fn provider_id(&self) -> String {

View File

@@ -5,7 +5,7 @@ use std::{collections::HashMap, sync::Arc};
use crate::{
connection::Connection,
error::PythonErrorExt,
index::{extract_index_params, IndexConfig},
index::{IndexConfig, extract_index_params},
query::{Query, TakeQuery},
table::scannable::PyScannable,
};
@@ -19,10 +19,10 @@ use lancedb::table::{
Table as LanceDbTable,
};
use pyo3::{
Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
exceptions::{PyKeyError, PyRuntimeError, PyValueError},
pyclass, pymethods,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
@@ -112,19 +112,24 @@ impl From<lancedb::table::AddResult> for AddResult {
#[pyclass(get_all)]
#[derive(Clone, Debug)]
pub struct DeleteResult {
pub num_deleted_rows: u64,
pub version: u64,
}
#[pymethods]
impl DeleteResult {
pub fn __repr__(&self) -> String {
format!("DeleteResult(version={})", self.version)
format!(
"DeleteResult(num_deleted_rows={}, version={})",
self.num_deleted_rows, self.version
)
}
}
impl From<lancedb::table::DeleteResult> for DeleteResult {
fn from(result: lancedb::table::DeleteResult) -> Self {
Self {
num_deleted_rows: result.num_deleted_rows,
version: result.version,
}
}
@@ -294,10 +299,12 @@ impl Table {
})
}
#[pyo3(signature = (data, mode, progress=None))]
pub fn add<'a>(
self_: PyRef<'a, Self>,
data: PyScannable,
mode: String,
progress: Option<Py<PyAny>>,
) -> PyResult<Bound<'a, PyAny>> {
let mut op = self_.inner_ref()?.add(data);
if mode == "append" {
@@ -307,6 +314,81 @@ impl Table {
} else {
return Err(PyValueError::new_err(format!("Invalid mode: {}", mode)));
}
if let Some(progress_obj) = progress {
let is_callable = Python::attach(|py| progress_obj.bind(py).is_callable());
if is_callable {
// Callback: call with a dict of progress info.
op = op.progress(move |p| {
Python::attach(|py| {
let dict = PyDict::new(py);
if let Err(e) = dict
.set_item("output_rows", p.output_rows())
.and_then(|_| dict.set_item("output_bytes", p.output_bytes()))
.and_then(|_| dict.set_item("total_rows", p.total_rows()))
.and_then(|_| {
dict.set_item("elapsed_seconds", p.elapsed().as_secs_f64())
})
.and_then(|_| dict.set_item("active_tasks", p.active_tasks()))
.and_then(|_| dict.set_item("total_tasks", p.total_tasks()))
.and_then(|_| dict.set_item("done", p.done()))
{
log::warn!("progress dict error: {e}");
return;
}
if let Err(e) = progress_obj.call1(py, (dict,)) {
log::warn!("progress callback error: {e}");
}
});
});
} else {
// tqdm-like: has update() method.
let mut last_rows: usize = 0;
let mut total_set = false;
op = op.progress(move |p| {
let current = p.output_rows();
let prev = last_rows;
last_rows = current;
Python::attach(|py| {
if let Some(total) = p.total_rows()
&& !total_set
{
if let Err(e) = progress_obj.setattr(py, "total", total) {
log::warn!("progress setattr error: {e}");
}
total_set = true;
}
let delta = current.saturating_sub(prev);
if delta > 0 {
if let Err(e) = progress_obj.call_method1(py, "update", (delta,)) {
log::warn!("progress update error: {e}");
}
// Show throughput and active workers in tqdm postfix.
let elapsed = p.elapsed().as_secs_f64();
if elapsed > 0.0 {
let mb_per_sec = p.output_bytes() as f64 / elapsed / 1_000_000.0;
let postfix = format!(
"{:.1} MB/s | {}/{} workers",
mb_per_sec,
p.active_tasks(),
p.total_tasks()
);
if let Err(e) =
progress_obj.call_method1(py, "set_postfix_str", (postfix,))
{
log::warn!("progress set_postfix_str error: {e}");
}
}
}
if p.done() {
// Force a final refresh so the bar shows completion.
if let Err(e) = progress_obj.call_method0(py, "refresh") {
log::warn!("progress refresh error: {e}");
}
}
});
});
}
}
future_into_py(self_.py(), async move {
let result = op.execute().await.infer_error()?;
@@ -421,6 +503,17 @@ impl Table {
})
}
pub fn prewarm_data(
self_: PyRef<'_, Self>,
columns: Option<Vec<String>>,
) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
inner.prewarm_data(columns).await.infer_error()?;
Ok(())
})
}
pub fn list_indices(self_: PyRef<'_, Self>) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
@@ -537,7 +630,7 @@ impl Table {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
let versions = inner.list_versions().await.infer_error()?;
let versions_as_dict = Python::attach(|py| {
Python::attach(|py| {
versions
.iter()
.map(|v| {
@@ -554,9 +647,7 @@ impl Table {
Ok(dict.unbind())
})
.collect::<PyResult<Vec<_>>>()
});
versions_as_dict
})
})
}

View File

@@ -10,11 +10,11 @@ use arrow::{
};
use futures::StreamExt;
use lancedb::{
Error,
arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
data::scannable::Scannable,
Error,
};
use pyo3::{types::PyAnyMethods, FromPyObject, Py, PyAny, Python};
use pyo3::{FromPyObject, Py, PyAny, Python, types::PyAnyMethods};
/// Adapter that implements Scannable for a Python reader factory callable.
///
@@ -99,15 +99,15 @@ impl Scannable for PyScannable {
// Channel closed. Check if the task panicked — a panic
// drops the sender without sending an error, so without
// this check we'd silently return a truncated stream.
if let Some(handle) = join_handle {
if let Err(join_err) = handle.await {
return Some((
Err(Error::Runtime {
message: format!("Reader task panicked: {}", join_err),
}),
(rx, None),
));
}
if let Some(handle) = join_handle
&& let Err(join_err) = handle.await
{
return Some((
Err(Error::Runtime {
message: format!("Reader task panicked: {}", join_err),
}),
(rx, None),
));
}
None
}

View File

@@ -5,8 +5,9 @@ use std::sync::Mutex;
use lancedb::DistanceType;
use pyo3::{
PyResult,
exceptions::{PyRuntimeError, PyValueError},
pyfunction, PyResult,
pyfunction,
};
/// A wrapper around a rust builder

4
python/uv.lock generated
View File

@@ -2006,7 +2006,7 @@ requires-dist = [
{ name = "botocore", marker = "extra == 'embeddings'", specifier = ">=1.31.57" },
{ name = "cohere", marker = "extra == 'embeddings'" },
{ name = "colpali-engine", marker = "extra == 'embeddings'", specifier = ">=0.3.10" },
{ name = "datafusion", marker = "extra == 'tests'" },
{ name = "datafusion", marker = "extra == 'tests'", specifier = "<52" },
{ name = "deprecation" },
{ name = "duckdb", marker = "extra == 'tests'" },
{ name = "google-generativeai", marker = "extra == 'embeddings'" },
@@ -2035,7 +2035,7 @@ requires-dist = [
{ name = "pyarrow-stubs", marker = "extra == 'tests'" },
{ name = "pydantic", specifier = ">=1.10" },
{ name = "pylance", marker = "extra == 'pylance'", specifier = ">=1.0.0b14" },
{ name = "pylance", marker = "extra == 'tests'", specifier = ">=1.0.0b14" },
{ name = "pylance", marker = "extra == 'tests'", specifier = ">=1.0.0b14,<3.0.0" },
{ name = "pyright", marker = "extra == 'dev'" },
{ name = "pytest", marker = "extra == 'tests'" },
{ name = "pytest-asyncio", marker = "extra == 'tests'" },

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.27.0-beta.2"
version = "0.27.1"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -1,4 +1,4 @@
# LanceDB Rust
# LanceDB Rust SDK
<a href="https://crates.io/crates/vectordb">![img](https://img.shields.io/crates/v/vectordb)</a>
<a href="https://docs.rs/vectordb/latest/vectordb/">![Docs.rs](https://img.shields.io/docsrs/vectordb)</a>

View File

@@ -9,10 +9,9 @@ use aws_config::Region;
use aws_sdk_bedrockruntime::Client;
use futures::StreamExt;
use lancedb::{
connect,
embeddings::{bedrock::BedrockEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
Result, connect,
embeddings::{EmbeddingDefinition, EmbeddingFunction, bedrock::BedrockEmbeddingFunction},
query::{ExecutableQuery, QueryBase},
Result,
};
#[tokio::main]

View File

@@ -10,10 +10,10 @@ use futures::TryStreamExt;
use lance_index::scalar::FullTextSearchQuery;
use lancedb::connection::Connection;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::index::Index;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::query::{ExecutableQuery, QueryBase};
use lancedb::{connect, Result, Table};
use lancedb::{Result, Table, connect};
use rand::random;
#[tokio::main]
@@ -46,19 +46,21 @@ fn create_some_records() -> Result<Box<dyn arrow_array::RecordBatchReader + Send
.collect::<Vec<_>>();
let n_terms = 3;
let batches = RecordBatchIterator::new(
vec![RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
(0..n_terms)
.map(|_| words[random::<u32>() as usize % words.len()])
.collect::<Vec<_>>()
.join(" ")
}))),
],
)
.unwrap()]
vec![
RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
(0..n_terms)
.map(|_| words[random::<u32>() as usize % words.len()])
.collect::<Vec<_>>()
.join(" ")
}))),
],
)
.unwrap(),
]
.into_iter()
.map(Ok),
schema.clone(),

View File

@@ -5,16 +5,15 @@ use arrow_array::{RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lance_index::scalar::FullTextSearchQuery;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::index::Index;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::{
connect,
Result, Table, connect,
embeddings::{
sentence_transformers::SentenceTransformersEmbeddings, EmbeddingDefinition,
EmbeddingFunction,
EmbeddingDefinition, EmbeddingFunction,
sentence_transformers::SentenceTransformersEmbeddings,
},
query::{QueryBase, QueryExecutionOptions},
Result, Table,
};
use std::{iter::once, sync::Arc};

View File

@@ -14,10 +14,10 @@ use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lancedb::connection::Connection;
use lancedb::index::vector::IvfPqIndexBuilder;
use lancedb::index::Index;
use lancedb::index::vector::IvfPqIndexBuilder;
use lancedb::query::{ExecutableQuery, QueryBase};
use lancedb::{connect, DistanceType, Result, Table};
use lancedb::{DistanceType, Result, Table, connect};
#[tokio::main]
async fn main() -> Result<()> {
@@ -51,19 +51,21 @@ fn create_some_records() -> Result<Box<dyn arrow_array::RecordBatchReader + Send
// Create a RecordBatch stream.
let batches = RecordBatchIterator::new(
vec![RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(
FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
(0..TOTAL).map(|_| Some(vec![Some(1.0); DIM])),
DIM as i32,
vec![
RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(
FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
(0..TOTAL).map(|_| Some(vec![Some(1.0); DIM])),
DIM as i32,
),
),
),
],
)
.unwrap()]
],
)
.unwrap(),
]
.into_iter()
.map(Ok),
schema.clone(),

View File

@@ -8,10 +8,9 @@ use std::{iter::once, sync::Arc};
use arrow_array::{RecordBatch, StringArray};
use futures::StreamExt;
use lancedb::{
connect,
embeddings::{openai::OpenAIEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
Result, connect,
embeddings::{EmbeddingDefinition, EmbeddingFunction, openai::OpenAIEmbeddingFunction},
query::{ExecutableQuery, QueryBase},
Result,
};
// --8<-- [end:imports]

View File

@@ -7,13 +7,12 @@ use arrow_array::{RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::StreamExt;
use lancedb::{
connect,
Result, connect,
embeddings::{
sentence_transformers::SentenceTransformersEmbeddings, EmbeddingDefinition,
EmbeddingFunction,
EmbeddingDefinition, EmbeddingFunction,
sentence_transformers::SentenceTransformersEmbeddings,
},
query::{ExecutableQuery, QueryBase},
Result,
};
#[tokio::main]

View File

@@ -14,7 +14,7 @@ use futures::TryStreamExt;
use lancedb::connection::Connection;
use lancedb::index::Index;
use lancedb::query::{ExecutableQuery, QueryBase};
use lancedb::{connect, Result, Table as LanceDbTable};
use lancedb::{Result, Table as LanceDbTable, connect};
#[tokio::main]
async fn main() -> Result<()> {

View File

@@ -12,7 +12,7 @@ use lance_datagen::{BatchCount, BatchGeneratorBuilder, RowCount};
#[cfg(feature = "polars")]
use {crate::polars_arrow_convertors, polars::frame::ArrowChunk, polars::prelude::DataFrame};
use crate::{error::Result, Error};
use crate::{Error, error::Result};
/// An iterator of batches that also has a schema
pub trait RecordBatchReader: Iterator<Item = Result<arrow_array::RecordBatch>> {

View File

@@ -17,6 +17,7 @@ use lance_namespace::models::{
#[cfg(feature = "aws")]
use object_store::aws::AwsCredential;
use crate::Table;
use crate::connection::create_table::CreateTableBuilder;
use crate::data::scannable::Scannable;
use crate::database::listing::ListingDatabase;
@@ -31,7 +32,6 @@ use crate::remote::{
client::ClientConfig,
db::{OPT_REMOTE_API_KEY, OPT_REMOTE_HOST_OVERRIDE, OPT_REMOTE_REGION},
};
use crate::Table;
use lance::io::ObjectStoreParams;
pub use lance_encoding::version::LanceFileVersion;
#[cfg(feature = "remote")]
@@ -136,6 +136,7 @@ impl OpenTableBuilder {
lance_read_params: None,
location: None,
namespace_client: None,
managed_versioning: None,
},
embedding_registry,
}
@@ -235,6 +236,29 @@ impl OpenTableBuilder {
self
}
/// Set a namespace client for managed versioning support.
///
/// When a namespace client is provided and the table has `managed_versioning` enabled,
/// the table will use the namespace's commit handler to notify the namespace of
/// version changes. This enables features like event emission for table modifications.
pub fn namespace_client(mut self, client: Arc<dyn lance_namespace::LanceNamespace>) -> Self {
self.request.namespace_client = Some(client);
self
}
/// Set whether managed versioning is enabled for this table.
///
/// When set to `Some(true)`, the table will use namespace-managed commits.
/// When set to `Some(false)`, the table will use local commits even if namespace_client is set.
/// When set to `None` (default), the value will be fetched from the namespace if namespace_client is set.
///
/// This is typically set when the caller has already queried the namespace and knows the
/// managed_versioning status, avoiding a redundant describe_table call.
pub fn managed_versioning(mut self, enabled: bool) -> Self {
self.request.managed_versioning = Some(enabled);
self
}
/// Open the table
pub async fn execute(self) -> Result<Table> {
let table = self.parent.open_table(self.request).await?;
@@ -294,6 +318,12 @@ impl CloneTableBuilder {
self
}
/// Set a namespace client for managed versioning support.
pub fn namespace_client(mut self, client: Arc<dyn lance_namespace::LanceNamespace>) -> Self {
self.request.namespace_client = Some(client);
self
}
/// Execute the clone operation
pub async fn execute(self) -> Result<Table> {
let parent = self.parent.clone();
@@ -566,8 +596,11 @@ pub struct ConnectBuilder {
}
#[cfg(feature = "remote")]
const ENV_VARS_TO_STORAGE_OPTS: [(&str, &str); 1] =
[("AZURE_STORAGE_ACCOUNT_NAME", "azure_storage_account_name")];
const ENV_VARS_TO_STORAGE_OPTS: [(&str, &str); 3] = [
("AZURE_STORAGE_ACCOUNT_NAME", "azure_storage_account_name"),
("AZURE_CLIENT_ID", "azure_client_id"),
("AZURE_TENANT_ID", "azure_tenant_id"),
];
impl ConnectBuilder {
/// Create a new [`ConnectOptions`] with the given database URI.
@@ -758,10 +791,10 @@ impl ConnectBuilder {
options: &mut HashMap<String, String>,
) {
for (env_key, opt_key) in env_var_to_remote_storage_option {
if let Ok(env_value) = std::env::var(env_key) {
if !options.contains_key(*opt_key) {
options.insert((*opt_key).to_string(), env_value);
}
if let Ok(env_value) = std::env::var(env_key)
&& !options.contains_key(*opt_key)
{
options.insert((*opt_key).to_string(), env_value);
}
}
}
@@ -1011,14 +1044,13 @@ mod tests {
#[cfg(feature = "remote")]
#[test]
fn test_apply_env_defaults() {
let env_key = "TEST_APPLY_ENV_DEFAULTS_ENVIRONMENT_VARIABLE_ENV_KEY";
let env_val = "TEST_APPLY_ENV_DEFAULTS_ENVIRONMENT_VARIABLE_ENV_VAL";
let env_key = "PATH";
let env_val = std::env::var(env_key).expect("PATH should be set in test environment");
let opts_key = "test_apply_env_defaults_environment_variable_opts_key";
std::env::set_var(env_key, env_val);
let mut options = HashMap::new();
ConnectBuilder::apply_env_defaults(&[(env_key, opts_key)], &mut options);
assert_eq!(Some(&env_val.to_string()), options.get(opts_key));
assert_eq!(Some(&env_val), options.get(opts_key));
options.insert(opts_key.to_string(), "EXPLICIT-VALUE".to_string());
ConnectBuilder::apply_env_defaults(&[(env_key, opts_key)], &mut options);

View File

@@ -6,12 +6,12 @@ use std::sync::Arc;
use lance_io::object_store::StorageOptionsProvider;
use crate::{
Error, Result, Table,
connection::{merge_storage_options, set_storage_options_provider},
data::scannable::{Scannable, WithEmbeddingsScannable},
database::{CreateTableMode, CreateTableRequest, Database},
embeddings::{EmbeddingDefinition, EmbeddingFunction, EmbeddingRegistry},
table::WriteOptions,
Error, Result, Table,
};
pub struct CreateTableBuilder {
@@ -167,7 +167,7 @@ impl CreateTableBuilder {
#[cfg(test)]
mod tests {
use arrow_array::{
record_batch, Array, FixedSizeListArray, Float32Array, RecordBatch, RecordBatchIterator,
Array, FixedSizeListArray, Float32Array, RecordBatch, RecordBatchIterator, record_batch,
};
use arrow_schema::{ArrowError, DataType, Field, Schema};
use futures::TryStreamExt;
@@ -380,11 +380,12 @@ mod tests {
.await
.unwrap();
let other_schema = Arc::new(Schema::new(vec![Field::new("y", DataType::Int32, false)]));
assert!(db
.create_empty_table("test", other_schema.clone())
.execute()
.await
.is_err()); // TODO: assert what this error is
assert!(
db.create_empty_table("test", other_schema.clone())
.execute()
.await
.is_err()
); // TODO: assert what this error is
let overwritten = db
.create_empty_table("test", other_schema.clone())
.mode(CreateTableMode::Overwrite)

View File

@@ -5,9 +5,9 @@ use std::collections::HashMap;
use arrow::compute::kernels::{aggregate::bool_and, length::length};
use arrow_array::{
Array, GenericListArray, OffsetSizeTrait, PrimitiveArray, RecordBatchReader,
cast::AsArray,
types::{ArrowPrimitiveType, Int32Type, Int64Type},
Array, GenericListArray, OffsetSizeTrait, PrimitiveArray, RecordBatchReader,
};
use arrow_ord::cmp::eq;
use arrow_schema::DataType;
@@ -78,7 +78,7 @@ pub fn infer_vector_columns(
_ => {
return Err(Error::Schema {
message: format!("Column {} is not a list", col_name),
})
});
}
} {
if let Some(Some(prev_dim)) = columns_to_infer.get(&col_name) {
@@ -102,8 +102,8 @@ mod tests {
use super::*;
use arrow_array::{
types::{Float32Type, Float64Type},
FixedSizeListArray, Float32Array, ListArray, RecordBatch, RecordBatchIterator, StringArray,
types::{Float32Type, Float64Type},
};
use arrow_schema::{DataType, Field, Schema};
use std::{sync::Arc, vec};

View File

@@ -4,10 +4,10 @@
use std::{iter::repeat_with, sync::Arc};
use arrow_array::{
cast::AsArray,
types::{Float16Type, Float32Type, Float64Type, Int32Type, Int64Type},
Array, ArrowNumericType, FixedSizeListArray, PrimitiveArray, RecordBatch, RecordBatchIterator,
RecordBatchReader,
cast::AsArray,
types::{Float16Type, Float32Type, Float64Type, Int32Type, Int64Type},
};
use arrow_cast::{can_cast_types, cast};
use arrow_schema::{ArrowError, DataType, Field, Schema};
@@ -184,7 +184,7 @@ mod tests {
use std::sync::Arc;
use arrow_array::{
FixedSizeListArray, Float16Array, Float32Array, Float64Array, Int32Array, Int8Array,
FixedSizeListArray, Float16Array, Float32Array, Float64Array, Int8Array, Int32Array,
RecordBatch, RecordBatchIterator, StringArray,
};
use arrow_schema::Field;

View File

@@ -13,16 +13,16 @@ use crate::arrow::{
SendableRecordBatchStream, SendableRecordBatchStreamExt, SimpleRecordBatchStream,
};
use crate::embeddings::{
compute_embeddings_for_batch, compute_output_schema, EmbeddingDefinition, EmbeddingFunction,
EmbeddingRegistry,
EmbeddingDefinition, EmbeddingFunction, EmbeddingRegistry, compute_embeddings_for_batch,
compute_output_schema,
};
use crate::table::{ColumnDefinition, ColumnKind, TableDefinition};
use crate::{Error, Result};
use arrow_array::{ArrayRef, RecordBatch, RecordBatchIterator, RecordBatchReader};
use arrow_schema::{ArrowError, SchemaRef};
use async_trait::async_trait;
use futures::stream::once;
use futures::StreamExt;
use futures::stream::once;
use lance_datafusion::utils::StreamingWriteSource;
pub trait Scannable: Send {

View File

@@ -19,12 +19,12 @@ use std::sync::Arc;
use std::time::Duration;
use lance::dataset::ReadParams;
use lance_namespace::LanceNamespace;
use lance_namespace::models::{
CreateNamespaceRequest, CreateNamespaceResponse, DescribeNamespaceRequest,
DescribeNamespaceResponse, DropNamespaceRequest, DropNamespaceResponse, ListNamespacesRequest,
ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
};
use lance_namespace::LanceNamespace;
use crate::data::scannable::Scannable;
use crate::error::Result;
@@ -66,6 +66,10 @@ pub struct OpenTableRequest {
/// Optional namespace client for server-side query execution.
/// When set, queries will be executed on the namespace server instead of locally.
pub namespace_client: Option<Arc<dyn LanceNamespace>>,
/// Whether managed versioning is enabled for this table.
/// When Some(true), the table will use namespace-managed commits instead of local commits.
/// When None and namespace_client is provided, the value will be fetched from the namespace.
pub managed_versioning: Option<bool>,
}
impl std::fmt::Debug for OpenTableRequest {
@@ -77,6 +81,7 @@ impl std::fmt::Debug for OpenTableRequest {
.field("lance_read_params", &self.lance_read_params)
.field("location", &self.location)
.field("namespace_client", &self.namespace_client)
.field("managed_versioning", &self.managed_versioning)
.finish()
}
}
@@ -161,6 +166,9 @@ pub struct CloneTableRequest {
/// Whether to perform a shallow clone (true) or deep clone (false). Defaults to true.
/// Currently only shallow clone is supported.
pub is_shallow: bool,
/// Optional namespace client for managed versioning support.
/// When set, enables the commit handler to track table versions through the namespace.
pub namespace_client: Option<Arc<dyn LanceNamespace>>,
}
impl CloneTableRequest {
@@ -172,6 +180,7 @@ impl CloneTableRequest {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
}
}
}

View File

@@ -8,7 +8,7 @@ use std::path::Path;
use std::{collections::HashMap, sync::Arc};
use lance::dataset::refs::Ref;
use lance::dataset::{builder::DatasetBuilder, ReadParams, WriteMode};
use lance::dataset::{ReadParams, WriteMode, builder::DatasetBuilder};
use lance::io::{ObjectStore, ObjectStoreParams, WrappingObjectStore};
use lance_datafusion::utils::StreamingWriteSource;
use lance_encoding::version::LanceFileVersion;
@@ -669,6 +669,7 @@ impl ListingDatabase {
lance_read_params: None,
location: None,
namespace_client: None,
managed_versioning: None,
};
let req = (callback)(req);
let table = self.open_table(req).await?;
@@ -869,6 +870,7 @@ impl Database for ListingDatabase {
Some(write_params),
self.read_consistency_interval,
request.namespace_client,
false, // server_side_query_enabled - listing database doesn't support server-side queries
)
.await
{
@@ -946,7 +948,9 @@ impl Database for ListingDatabase {
self.store_wrapper.clone(),
None,
self.read_consistency_interval,
None,
request.namespace_client,
false, // server_side_query_enabled - listing database doesn't support server-side queries
None, // managed_versioning - will be queried if namespace_client is provided
)
.await?;
@@ -1022,6 +1026,8 @@ impl Database for ListingDatabase {
Some(read_params),
self.read_consistency_interval,
request.namespace_client,
false, // server_side_query_enabled - listing database doesn't support server-side queries
request.managed_versioning, // Pass through managed_versioning from request
)
.await?,
);
@@ -1097,11 +1103,11 @@ impl Database for ListingDatabase {
#[cfg(test)]
mod tests {
use super::*;
use crate::Table;
use crate::connection::ConnectRequest;
use crate::data::scannable::Scannable;
use crate::database::{CreateTableMode, CreateTableRequest};
use crate::table::WriteOptions;
use crate::Table;
use arrow_array::{Int32Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use std::path::PathBuf;
@@ -1162,6 +1168,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();
@@ -1222,6 +1229,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();
@@ -1281,6 +1289,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await;
@@ -1317,6 +1326,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: false, // Request deep clone
namespace_client: None,
})
.await;
@@ -1357,6 +1367,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await;
@@ -1397,6 +1408,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await;
@@ -1416,6 +1428,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await;
@@ -1452,6 +1465,7 @@ mod tests {
source_version: Some(1),
source_tag: Some("v1.0".to_string()),
is_shallow: true,
namespace_client: None,
})
.await;
@@ -1525,6 +1539,7 @@ mod tests {
source_version: Some(initial_version),
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();
@@ -1603,6 +1618,7 @@ mod tests {
source_version: None,
source_tag: Some("v1.0".to_string()),
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();
@@ -1654,6 +1670,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();
@@ -1746,6 +1763,7 @@ mod tests {
source_version: None,
source_tag: None,
is_shallow: true,
namespace_client: None,
})
.await
.unwrap();

View File

@@ -7,18 +7,20 @@ use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use lance::io::commit::namespace_manifest::LanceNamespaceExternalManifestStore;
use lance_io::object_store::{ObjectStoreParams, StorageOptionsAccessor};
use lance_namespace::{
models::{
CreateEmptyTableRequest, CreateNamespaceRequest, CreateNamespaceResponse,
DeclareTableRequest, DescribeNamespaceRequest, DescribeNamespaceResponse,
DescribeTableRequest, DropNamespaceRequest, DropNamespaceResponse, DropTableRequest,
ListNamespacesRequest, ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
},
LanceNamespace,
models::{
CreateNamespaceRequest, CreateNamespaceResponse, DeclareTableRequest,
DescribeNamespaceRequest, DescribeNamespaceResponse, DescribeTableRequest,
DropNamespaceRequest, DropNamespaceResponse, DropTableRequest, ListNamespacesRequest,
ListNamespacesResponse, ListTablesRequest, ListTablesResponse,
},
};
use lance_namespace_impls::ConnectBuilder;
use log::warn;
use lance_table::io::commit::CommitHandler;
use lance_table::io::commit::external_manifest::ExternalManifestCommitHandler;
use crate::database::ReadConsistency;
use crate::error::{Error, Result};
@@ -206,83 +208,48 @@ impl Database for LanceNamespaceDatabase {
let mut table_id = request.namespace.clone();
table_id.push(request.name.clone());
// Try declare_table first, falling back to create_empty_table for backwards
// compatibility with older namespace clients that don't support declare_table
let declare_request = DeclareTableRequest {
id: Some(table_id.clone()),
..Default::default()
};
let (location, initial_storage_options) =
match self.namespace.declare_table(declare_request).await {
Ok(response) => {
let loc = response.location.ok_or_else(|| Error::Runtime {
message: "Table location is missing from declare_table response"
.to_string(),
})?;
// Use storage options from response, fall back to self.storage_options
let opts = response
.storage_options
.or_else(|| Some(self.storage_options.clone()))
.filter(|o| !o.is_empty());
(loc, opts)
}
Err(e) => {
// Check if the error is "not supported" and try create_empty_table as fallback
let err_str = e.to_string().to_lowercase();
if err_str.contains("not supported") || err_str.contains("not implemented") {
warn!(
"declare_table is not supported by the namespace client, \
falling back to deprecated create_empty_table. \
create_empty_table is deprecated and will be removed in Lance 3.0.0. \
Please upgrade your namespace client to support declare_table."
);
#[allow(deprecated)]
let create_empty_request = CreateEmptyTableRequest {
id: Some(table_id.clone()),
..Default::default()
};
let (location, initial_storage_options, managed_versioning) = {
let response = self.namespace.declare_table(declare_request).await?;
let loc = response.location.ok_or_else(|| Error::Runtime {
message: "Table location is missing from declare_table response".to_string(),
})?;
// Use storage options from response, fall back to self.storage_options
let opts = response
.storage_options
.or_else(|| Some(self.storage_options.clone()))
.filter(|o| !o.is_empty());
(loc, opts, response.managed_versioning)
};
#[allow(deprecated)]
let create_response = self
.namespace
.create_empty_table(create_empty_request)
.await
.map_err(|e| Error::Runtime {
message: format!("Failed to create empty table: {}", e),
})?;
// Build write params with storage options and commit handler
let mut params = request.write_options.lance_write_params.unwrap_or_default();
let loc = create_response.location.ok_or_else(|| Error::Runtime {
message: "Table location is missing from create_empty_table response"
.to_string(),
})?;
// For deprecated path, use self.storage_options
let opts = if self.storage_options.is_empty() {
None
} else {
Some(self.storage_options.clone())
};
(loc, opts)
} else {
return Err(Error::Runtime {
message: format!("Failed to declare table: {}", e),
});
}
}
};
let write_params = if let Some(storage_opts) = initial_storage_options {
let mut params = request.write_options.lance_write_params.unwrap_or_default();
// Set up storage options if provided
if let Some(storage_opts) = initial_storage_options {
let store_params = params
.store_params
.get_or_insert_with(ObjectStoreParams::default);
store_params.storage_options_accessor = Some(Arc::new(
StorageOptionsAccessor::with_static_options(storage_opts),
));
Some(params)
} else {
request.write_options.lance_write_params
};
}
// Set up commit handler when managed_versioning is enabled
if managed_versioning == Some(true) {
let external_store =
LanceNamespaceExternalManifestStore::new(self.namespace.clone(), table_id.clone());
let commit_handler: Arc<dyn CommitHandler> = Arc::new(ExternalManifestCommitHandler {
external_manifest_store: Arc::new(external_store),
});
params.commit_handler = Some(commit_handler);
}
let write_params = Some(params);
let native_table = NativeTable::create_from_namespace(
self.namespace.clone(),

View File

@@ -11,16 +11,16 @@ use lance_core::ROW_ID;
use lance_datafusion::exec::SessionContextExt;
use crate::{
Error, Result, Table,
arrow::{SendableRecordBatchStream, SendableRecordBatchStreamExt, SimpleRecordBatchStream},
connect,
database::{CreateTableRequest, Database},
dataloader::permutation::{
shuffle::{Shuffler, ShufflerConfig},
split::{SplitStrategy, Splitter, SPLIT_ID_COLUMN},
util::{rename_column, TemporaryDirectory},
split::{SPLIT_ID_COLUMN, SplitStrategy, Splitter},
util::{TemporaryDirectory, rename_column},
},
query::{ExecutableQuery, QueryBase, Select},
Error, Result, Table,
};
pub const SRC_ROW_ID_COL: &str = "row_id";

View File

@@ -25,8 +25,8 @@ use futures::{StreamExt, TryStreamExt};
use lance::dataset::scanner::DatasetRecordBatchStream;
use lance::io::RecordBatchStream;
use lance_arrow::RecordBatchExt;
use lance_core::error::LanceOptionExt;
use lance_core::ROW_ID;
use lance_core::error::LanceOptionExt;
use std::collections::HashMap;
use std::sync::Arc;
@@ -339,6 +339,12 @@ impl PermutationReader {
}
Ok(false)
}
Select::Expr(columns) => {
// For Expr projections, we check if any alias is _rowid.
// We can't validate the expression itself (it may differ from _rowid)
// but we allow it through; the column will be included.
Ok(columns.iter().any(|(alias, _)| alias == ROW_ID))
}
}
}
@@ -500,10 +506,10 @@ mod tests {
use rand::seq::SliceRandom;
use crate::{
Table,
arrow::SendableRecordBatchStream,
query::{ExecutableQuery, QueryBase},
test_utils::datagen::{virtual_table, LanceDbDatagenExt},
Table,
test_utils::datagen::{LanceDbDatagenExt, virtual_table},
};
use super::*;

View File

@@ -18,12 +18,12 @@ use lance_io::{
scheduler::{ScanScheduler, SchedulerConfig},
utils::CachedFileSize,
};
use rand::{seq::SliceRandom, Rng, RngCore};
use rand::{Rng, RngCore, seq::SliceRandom};
use crate::{
arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
dataloader::permutation::util::{non_crypto_rng, TemporaryDirectory},
Error, Result,
arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
dataloader::permutation::util::{TemporaryDirectory, non_crypto_rng},
};
#[derive(Debug, Clone)]
@@ -281,7 +281,7 @@ mod tests {
use datafusion_expr::col;
use futures::TryStreamExt;
use lance_datagen::{BatchCount, BatchGeneratorBuilder, ByteCount, RowCount, Seed};
use rand::{rngs::SmallRng, SeedableRng};
use rand::{SeedableRng, rngs::SmallRng};
fn test_gen() -> BatchGeneratorBuilder {
lance_datagen::gen_batch()

View File

@@ -2,8 +2,8 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::sync::{
atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
Arc,
atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering},
};
use arrow_array::{Array, BooleanArray, RecordBatch, UInt64Array};
@@ -15,13 +15,13 @@ use lance_arrow::SchemaExt;
use lance_core::ROW_ID;
use crate::{
Error, Result,
arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
dataloader::{
permutation::shuffle::{Shuffler, ShufflerConfig},
permutation::util::TemporaryDirectory,
},
query::{Query, QueryBase, Select},
Error, Result,
};
pub const SPLIT_ID_COLUMN: &str = "split_id";

Some files were not shown because too many files have changed in this diff Show More