Compare commits

..

74 Commits

Author SHA1 Message Date
Lance Release
f31561c5bb Bump version: 0.30.0-beta.3 → 0.30.0-beta.4 2026-03-09 08:45:25 +00:00
Jack Ye
e0c5ceac03 fix: propagate managed versioning for namespace connection (#3111)
Without this fix, if user directly use the native table to do operations
like `add_columns`, even if it is configured to use namespace db
connection, it is not really propagated through.

The fix is to bring lancedb's python binding up to date and do a similar
implementation as https://github.com/lance-format/lance/pull/5968, and
make sure the namespace is fully propagated through all the related
calls.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-03-09 01:44:31 -07:00
Prashanth Rao
e93bb3355a docs: add meth/func names to mkdocstrings (#3101)
LanceDB's SDK API docs do not currently show method names under any
given object, and this makes it harder to quickly understand and find
relevant method names for a given class. Geneva docs show the available
methods in the right navigation.

This PR standardizes the appearance of the LanceDB SDK API in the docs
to be more similar to Geneva's.
<img width="1386" height="792" alt="image"
src="https://github.com/user-attachments/assets/30816591-d6d5-495d-886d-e234beeb6059"
/>

<img width="897" height="540" alt="image"
src="https://github.com/user-attachments/assets/d5491b6b-c7bf-4d3b-8b15-1a1a7700e7c9"
/>
2026-03-06 08:54:45 -08:00
Will Jones
b75991eb07 fix: propagate cast errors in add() (#3075)
When we write data with `add()`, we can input data to the table's
schema. However, we were using "safe" mode, which propagates errors as
nulls. For example, if you pass `u64::max` into a field that is a `u32`,
it will just write null instead of giving overflow error. Now it
propagates the overflow. This is the same behavior as other systems like
DuckDB.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 20:24:50 -08:00
Wyatt Alt
97ca9bb943 feat: allow passing azure client/tenant ID through remote SDK (#3102)
Prior to this commit we supported passing the azure storage account name
to the lancedb remote SDK through headers. This adds support for client
ID and tenant ID as well.
2026-03-04 11:11:36 -08:00
Xuanwo
fa1b04f341 chore: migrate Rust crates to edition 2024 and fix clippy warnings (#3098)
This PR migrates all Rust crates in the workspace to Rust 2024 edition
and addresses the resulting compatibility updates. It also fixes all
clippy warnings surfaced by the workspace checks so the codebase remains
warning-free under the current lint configuration.

Context:
- Scope: workspace edition bump (`2021` -> `2024`) plus follow-up
refactors required by new edition and clippy rules.
- Validation: `cargo fmt --all` and `cargo clippy --quiet --features
remote --tests --examples -- -D warnings` both pass.
2026-03-03 16:23:29 -08:00
mrncstt
367abe99d2 feat(python): support dict to SQL struct conversion in table.update() (#3089)
## Summary

- Add `@value_to_sql.register(dict)` handler that converts Python dicts
to DataFusion's `named_struct()` SQL syntax
- Enables updating struct-typed columns via `table.update(values={"col":
{"field_a": 1, "field_b": "hello"}})`
- Recursively handles nested structs, lists, nulls, and all existing
scalar types

Closes #1363

## Details

The `named_struct` function was introduced in DataFusion 38 and is now
available (LanceDB uses DataFusion 52.1). The implementation follows the
existing `singledispatch` pattern in `util.py`.

**Example conversion:**
```python
value_to_sql({"field_a": 1, "field_b": "hello"})
# => "named_struct('field_a', 1, 'field_b', 'hello')"
```

## Test plan

- [x] Unit tests for flat struct, nested struct, list inside struct,
mixed types, null values, and empty dict
- [ ] CI integration tests with actual table.update() on struct columns

🔗 [DataFusion named_struct
docs](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct)
2026-03-03 13:36:08 -08:00
Xuanwo
52ce2c995c fix(ci): only run npm publish on release tags (#3093)
This PR fixes the npm publish dry-run failure for prerelease versions
without changing the existing workflow trigger behavior. The publish
step now detects prerelease versions from `nodejs/package.json` and
always appends `--tag preview` when needed.

Context:
- On `main` pushes, the workflow still runs `npm publish --dry-run` by
design.
- Recent failures were caused by prerelease versions (for example
`0.27.0-beta.3`) running without `--tag`, which npm rejects.
- The previous `refs/tags/v...-beta...` check did not apply on branch
pushes, so dry-run could fail even though release tags worked.
2026-03-04 01:35:10 +08:00
Sean Mackrory
e71a00998c ci: add regression test for fastSearch in FTS queries in TypeScript (#3090)
We recently added support for this for the Python bindings, and wanted
to confirm this already worked as expected in the TS bindings.
2026-03-03 07:09:09 -08:00
Sean Mackrory
39a2ac0a1c feat: add parity between fast_search keyword argument between vector and FTS searches (#3091)
We don't necessarily need to do this, but one user was confused having
used `fast_search=True` as a keyword argument for vector searches, but
being unable to do so for FTS, even after the most recent changes. I
think this is the only discrepancy in where that is possible.
2026-03-03 05:21:36 -08:00
Wyatt Alt
bc7b344fa4 feat: add support for remote index params (#3087)
Prior to this commit the remote SDK did not support the full set of
index parameters. This extends the SDK to support them.
2026-03-02 11:14:28 -08:00
Will Jones
f91d2f5fec ci(python): pin maturin to work around bug (#3088)
Work around for https://github.com/PyO3/maturin/issues/3059
2026-03-02 09:38:54 -08:00
Wyatt Alt
cf81b6419f feat: add num_deleted_rows to delete result (#3077) 2026-03-02 08:37:14 -08:00
Lance Release
0498ac1f2f Bump version: 0.27.0-beta.2 → 0.27.0-beta.3 2026-02-28 01:31:51 +00:00
Lance Release
aeb1c3ee6a Bump version: 0.30.0-beta.2 → 0.30.0-beta.3 2026-02-28 01:29:53 +00:00
Weston Pace
f9ae46c0e7 feat: upgrade lance to 3.0.0-rc.2 and add bindings for fast_search (#3083) 2026-02-27 17:27:01 -08:00
Will Jones
84bf022fb1 fix(python): pin pylance to make datafusion table provider match version (#3080) 2026-02-27 13:34:05 -08:00
Will Jones
310967eceb ci(rust): fix linux job (#3076) 2026-02-26 19:25:46 -08:00
Jack Ye
154dbeee2a chore: fix clippy for PreprocessingOutput without remote feature (#3070)
Fix clippy:

```
error: fields `overwrite` and `rescannable` are never read
Error:    --> /home/runner/work/xxxx/xxxx/src/lancedb/rust/lancedb/src/table/add_data.rs:158:9
    |
156 | pub struct PreprocessingOutput {
    |            ------------------- fields in this struct
157 |     pub plan: Arc<dyn datafusion_physical_plan::ExecutionPlan>,
158 |     pub overwrite: bool,
    |         ^^^^^^^^^
159 |     pub rescannable: bool,
    |         ^^^^^^^^^^^
    |
    = note: `-D dead-code` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(dead_code)]`
```
2026-02-25 14:59:32 -08:00
Lance Release
c9c08ac8b9 Bump version: 0.27.0-beta.1 → 0.27.0-beta.2 2026-02-25 07:47:54 +00:00
Lance Release
e253f5d9b6 Bump version: 0.30.0-beta.1 → 0.30.0-beta.2 2026-02-25 07:46:06 +00:00
LanceDB Robot
05b4fb0990 chore: update lance dependency to v3.1.0-beta.2 (#3068)
## Summary
- Bump Lance Rust workspace dependencies to `v3.1.0-beta.2` via
`ci/set_lance_version.py`.
- Update Java `lance-core.version` to `3.1.0-beta.2`.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Release Reference
- refs/tags/v3.1.0-beta.2
2026-02-24 23:02:22 -08:00
Mesut-Doner
613b9c1099 feat(rust): add expression builder API for type-safe query filters (#3032)
## Summary

Adds a Rust expression builder API as a type-safe alternative to SQL
strings for query filters.

## Motivation

Filtering with raw SQL strings can be awkward when using variables and
special types:


Closes  #3038

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-24 18:44:03 -08:00
Will Jones
d5948576b9 feat: parallel inserts for local tables (#3062)
When input data is sufficiently large, we automatically split up into
parallel writes using a round-robin exchange operator. We sample the
first batch to determine data width, and target size of 1 million rows
or 2GB, whichever is smaller.
2026-02-24 12:26:51 -08:00
Will Jones
0d3fc7860a ci: fix python DataFusion test (#3060) 2026-02-24 07:59:12 -08:00
Weston Pace
531cec075c fix: don't expect all offsets to fit in one batch in permutation reader (#3065)
This would cause takes against large permutations to fail
2026-02-24 06:32:54 -08:00
Will Jones
0e486511fa feat: hook up new writer for insert (#3029)
This hooks up a new writer implementation for the `add()` method. The
main immediate benefit is it allows streaming requests to remote tables,
and at the same time allowing retries for most inputs.

In NodeJS, we always convert the data to `Vec<RecordBatch>`, so it's
always retry-able.

For Python, all are retry-able, except `Iterator` and
`pa.RecordBatchReader`, which can only be consumed once. Some, like
`pa.datasets.Dataset` are retry-able *and* streaming.

A lot of the changes here are to make the new DataFusion write pipeline
maintain the same behavior as the existing Python-based preprocessing,
such as:

* casting input data to target schema
* rejecting NaN values if `on_bad_vectors="error"`
* applying embedding functions.

In future PRs, we'll enhance these by moving the embedding calls into
DataFusion and making sure we parallelize them. See:
https://github.com/lancedb/lancedb/issues/3048

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-23 14:43:31 -08:00
Will Jones
367262662d feat(nodejs): upgrade napi-rs from v2 to v3 (#3057)
## Summary

- Upgrades `@napi-rs/cli` from v2 to v3, `napi`/`napi-derive` Rust
crates to 3.x
- Fixes a bug
([napi-rs#1170](https://github.com/napi-rs/napi-rs/issues/1170)) where
the CLI failed to locate the built `.node` binary when a custom Cargo
target directory is set (via `config.toml`)

## Changes

**package.json / CLI**:
- `napi.name` → `napi.binaryName`, `napi.triples` → `napi.targets`
- Removed `--no-const-enum` flag and fixed output dir arg
- `napi universal` → `napi universalize`

**Rust API migration**:
- `#[napi::module_init]` → `#[napi_derive::module_init]`
- `napi::JsObject` → `Object`, `.get::<_, T>()` → `.get::<T>()`
- `ErrorStrategy` removed; `ThreadsafeFunction` now takes an explicit
`Return` type with `CalleeHandled = false` const generic
- `JsFunction` + `create_threadsafe_function` replaced by typed
`Function<Args, Return>` + `build_threadsafe_function().build()`
- `RerankerCallbacks` struct removed (`Function<'env,...>` can't be
stored in structs); `VectorQuery::rerank` now accepts the function
directly
- `ClassInstance::clone()` now returns `ClassInstance`, fixed with
explicit deref
- `Vec<u8>` in `#[napi(object)]` now maps to `Array<number>` in v3;
changed to `Buffer` to preserve the TypeScript `Buffer` type

**TypeScript**:
- `inner.rerank({ rerankHybrid: async (_, args) => ... })` →
`inner.rerank(async (args) => ...)`
- Header provider callback wrapped in `async` to match stricter typed
constructor signature

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-23 14:42:55 -08:00
Lance Release
11efaf46ae Bump version: 0.27.0-beta.0 → 0.27.0-beta.1 2026-02-23 18:34:48 +00:00
Lance Release
1ea22ee5ef Bump version: 0.30.0-beta.0 → 0.30.0-beta.1 2026-02-23 18:33:28 +00:00
LanceDB Robot
8cef8806e9 chore: update lance dependency to v3.0.0-beta.5 (#3058)
## Summary
- Bump Lance Rust dependencies and Java `lance-core` to v3.0.0-beta.5
(refs/tags/v3.0.0-beta.5).
- Update workspace toolchain and dependency defaults needed for the new
Lance release.
- Resolve new clippy lint defaults introduced by the toolchain update.

## Validation
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-02-23 00:39:30 -08:00
Will Jones
a3cd7fce69 fix: update DatasetConsistencyWrapper to accept same-version updates (#3055)
## Summary

`DatasetConsistencyWrapper::update()` only stored datasets with a
strictly newer
version. This caused `migrate_manifest_paths_v2` to silently drop its
update since
the migration renames files without bumping the dataset version. The
subsequent
`uses_v2_manifest_paths()` call would then return the stale cached
dataset.

Changed the version check from `>` to `>=` so same-version updates are
accepted.

## Test plan

- [x] Existing `test_create_table_v2_manifest_paths_async` Python test
should pass
- [x] Existing `should be able to migrate tables to the V2 manifest
paths` NodeJS test should pass
- [x] All dataset wrapper unit tests pass locally


🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 16:01:15 -08:00
Will Jones
48ddc833dd feat: check for dataset updates in the background (#3021)
This updates `DatasetConsistencyWrapper` to block less:

1. `DatasetConsistencyWrapper::get()` just returns `Arc<Dataset>` now,
instead of a guard that blocks writes.
`DatasetConsistencyWrapper::get_mut()` is gone; now write methods just
use `get()` and then later call `update()` with the new version. This
means a given table handle can do concurrent reads **and** writes.
2. In weak consistency mode, will check for dataset updates in the
background, instead of blocking calls to `get()`.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-20 11:18:33 -08:00
Varun Chawla
2802764092 fix(embeddings): stop retrying OpenAI 401 authentication errors (#2995)
## Summary
Fixes #1679

This PR prevents the OpenAI embedding function from retrying when
receiving a 401 Unauthorized error. Authentication errors are permanent
failures that won't be fixed by retrying, yet the current implementation
retries all exceptions up to 7 times by default.

## Changes
- Modified `retry_with_exponential_backoff` in `utils.py` to check for
non-retryable errors before retrying
- Added `_is_non_retryable_error` helper function that detects:
  - Exceptions with name `AuthenticationError` (OpenAI's 401 error)
  - Exceptions with `status_code` attribute of 401 or 403
- Enhanced OpenAI embeddings to explicitly catch and re-raise
`AuthenticationError` with better logging
- Added unit test `test_openai_no_retry_on_401` to verify authentication
errors don't trigger retries

## Test Plan
- Added test that verifies:
  1. A function raising `AuthenticationError` is only called once
  2. No retry delays occur (sleep is never called)
- Existing tests continue to pass
- Formatting applied via `make format`

## Example Behavior

**Before**: With an invalid API key, users would see 7 retry attempts
over ~2 minutes:
```
WARNING:root:Error occurred: Error code: 401 - {'error': {'message': 'Incorrect API key provided...'}}
 Retrying in 3.97 seconds (retry 1 of 7)
WARNING:root:Error occurred: Error code: 401...
 Retrying in 7.94 seconds (retry 2 of 7)
...
```

**After**: With an invalid API key, the error is raised immediately:
```
ERROR:root:Authentication failed: Invalid API key provided
AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided...'}}
```

This provides better UX and prevents unnecessary API calls that would
fail anyway.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-19 09:20:54 -08:00
Weston Pace
37bbb0dba1 fix: allow permutation reader to work with remote tables as well (#3047)
Fixed one more spot that was relying on `_inner`.
2026-02-19 00:41:41 +05:30
Prashanth Rao
155ec16161 fix: deprecate outdated files for embedding registry (#3037)
There are old and outdated files in our embedding registry that can
confuse coding agents. This PR deprecates the following files that have
newer, more modern methods to generate such embeddings.

- Deprecate `embeddings/siglip.py` 
- Deprecate `embeddings/gte.py` 

## Why this change?

Per a discussion with @AyushExel, the [embedding registry directory
](1840aa7edc/python/python/lancedb/embeddings)
in the LanceDB repo has a number of outdated files that need to be
deprecated.

See https://github.com/lancedb/docs/issues/85 for the docs gaps that
identified this.
- Add note in `openclip` docs that it can be used for SigLip embeddings,
which it now supports
- Add note in the `sentence-transformers` page that ALL text embedding
models on Hugging Face can be used
2026-02-18 12:04:39 -05:00
Weston Pace
636b8b5bbd fix: allow permutation reader to be used with remote tables (#3019)
There were two issues:

1. The python code needs to get access to the underlying rust table to
setup the permutation reader and the attributes involved in this differ
between the python local table and remote table objects.
~~2. The remote table was sending projection dictionaries as arrays of
tuples and (on LanceDB cloud at least) it does not appear this is how
rest servers are setup to receive them.~~ (this is now fixed as #3023)

~~Leaving as draft as this is built on
https://github.com/lancedb/lancedb/pull/3016~~
2026-02-18 05:44:08 -08:00
Omair Afzal
715b81c86b fix(python): graceful handling of empty result sets in hybrid search (#3030)
## Problem

When applying hard filters that result in zero matches, hybrid search
crashes with `IndexError: list index out of range` during reranking.
This happens because empty result tables are passed through the full
reranker pipeline, which expects at least one result.

Traceback from the issue:
```
lancedb/query.py: in _combine_hybrid_results
    results = reranker.rerank_hybrid(fts_query, vector_results, fts_results)
lancedb/rerankers/answerdotai.py: in rerank_hybrid
    combined_results = self._rerank(combined_results, query)
...
IndexError: list index out of range
```

## Fix

Added an early return in `_combine_hybrid_results` when both vector and
FTS results are empty. Instead of passing empty tables through
normalization, reranking, and score restoration (which can fail in
various ways), we now build a properly-typed empty result table with the
`_relevance_score` column and return it directly.

## Test

Added `test_empty_hybrid_result_reranker` that exercises
`_combine_hybrid_results` directly with empty vector and FTS tables,
verifying:
- Returns empty table with correct schema  
- Includes `_relevance_score` column
- Respects `with_row_ids` flag

Closes #2425
2026-02-17 11:37:10 -08:00
Omair Afzal
7e1616376e refactor: extract merge_insert into table/merge.rs submodule (#3031)
Completes the **merge_insert.rs** checklist item from #2949.

## Changes

- Moved `MergeResult` struct from `table.rs` to `table/merge.rs`
- Moved the `NativeTable::merge_insert` implementation into
`merge::execute_merge_insert()`, with the trait impl now delegating to
it (same pattern as `delete.rs`)
- Moved `test_merge_insert` and `test_merge_insert_use_index` tests into
`table/merge.rs`
- Improved moved tests to use `memory://` URIs instead of temporary
directories
- Cleaned up unused imports from `table.rs` (`FutureExt`,
`TryFutureExt`, `Either`, `WhenMatched`, `WhenNotMatchedBySource`,
`LanceMergeInsertBuilder`)
- `MergeResult` is re-exported from `table.rs` so the public API is
unchanged

## Testing

`cargo build -p lancedb` compiles cleanly with no warnings.
2026-02-17 11:36:53 -08:00
ChinmayGowda71
d5ac5b949a refactor(rust): extract query logic to src/table/query.rs (#3035)
References #2949 Moved query logic and helpers from table.rs to
query.rs. Refactored tests using guidelines and added coverage for multi
vector plan structure.
2026-02-17 09:04:21 -08:00
Lance Release
7be6f45e0b Bump version: 0.26.2 → 0.27.0-beta.0 2026-02-17 00:28:24 +00:00
Lance Release
d9e2d51f51 Bump version: 0.29.2 → 0.30.0-beta.0 2026-02-17 00:27:45 +00:00
LuQQiu
e081708cce fix: non-stopping dataset version check after passing the first consistency check interval (#3034)
When a table has a read consistency interval, queries within the
interval skip the version check. Once the interval expires, a list call
checks for new versions. If the version hasn't changed, the timer should
reset so the next interval begins, but it didn't. The timer stayed
expired, so every query after that triggered a list call, even though
nothing changed.

This affects all read operations (queries, schema lookups, searches) on
tables with read_consistency_interval set. Each operation adds a
list("_versions/") call to object storage, adding latency proportional
to the store's list performance. For high-QPS workloads, this can
saturate object store list throughput and significantly degrade query
latency.

Bug flow:
1. Every read operation (query, schema, search) calls
ensure_up_to_date()
2. ensure_up_to_date() calls is_up_to_date(), which compares
last_consistency_check.elapsed() against
   read_consistency_interval
  3. If the interval has expired, it calls reload()
4. reload() calls need_reload(), which calls latest_version_id() — this
is the list IOP
  (list("_versions/"))
5. If no new version, reload() returns early without resetting
last_consistency_check
6. On the next query, step 2 sees the stale timer again → step 3 → step
4 → another list IOP
  7. This repeats on every query forever
2026-02-16 15:49:14 -08:00
Will Jones
2d60ea6938 perf(remote): cache schema of remote tables (#3015)
Caches the schema of remote tables and invalidates the cache when:

1. After 30 second TTL
2. When we do an operation that changes schema (e.g. add_columns) or
checks out a different version (e.g. checkout_version)
3. When we get a 400, 404, or 500 reponse

If the schema is retrieved close to the TTL, we optimistically fetch the
schema in the background. This means a continuous stream of queries will
never have the schema fetch on the critical path.

Closes #3014

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-13 15:21:04 -08:00
Jack Ye
dcb1443143 ci: add codex fix ci workflow (#3022)
Similar to the lance one added recently:
https://github.com/lance-format/lance/actions/workflows/codex-fix-ci.yml
2026-02-13 14:20:02 -08:00
Will Jones
c0230f91d2 feat(rust)!: accept RecordBatch, Vec<RecordBatch> in create_table() and Table.add() (#2948)
BREAKING CHANGE: Arbitrary `impl RecordBatchReader` is no longer
accepted, it must be made into `Box<dyn RecordBatchReader>`.

This PR replaces `IntoArrow` with a new trait `Scannable` to define
input row data. This provides the following advantages:

1. **We can implement `Scannable` for more types than `IntoArrow`, such
as `RecordBatch` and `Vec<RecordBatch>`.** The `IntoArrow` trait was
implemented for arbitrary `T: RecordBatchReader`, and the Rust compiler
would prevent us from implementing it for foreign types like
`RecordBatch` because (theoretically) those types might implement
`RecordBatchReader` in the future. That's why we implement `Scannable`
for `Box<dyn RecordBatchReader>` instead; since it's a concrete type it
doesn't block implementing for other foreign types.
2. **We can potentially replay `Scannable` values**. Previously, we had
to choose between buffering all data in memory and supporting retries of
writes. But because `Scannable` things can optionally support
re-scanning, we now have a way of supporting retries while also
streaming.
3. **`Scannable` can provide hints like `num_rows`, which can be used to
schedule parallel writers.** Without knowing the total number of rows,
it's difficult to know whether it's worth writing multiple files in
parallel.

We don't yet fully take advantage of (2) and (3) yet, but will in future
PRs. For (2), in order to be ready to leverage this, we need to hook the
`Scannable` implementation up to Python and NodeJS bindings. Right now
they always pass down a stream, but we want to make sure they support
retries when possible. And for (3), this will need to be hooked up to
#2939 and to a pipeline for running pre-processing steps (like embedding
generation).

## Other changes

* Moved `create_table` and `add_data` into their own modules. I've
created a follow up issue to split up `table.rs` further, as it's by far
the largest file: https://github.com/lancedb/lancedb/issues/2949
* Eliminated the `HAS_DATA` generic for `CreateTableBuilder`. I didn't
see any public-facing places where we differentiated methods, which is
why I felt this simplification was okay.
* Added an `Error::External` variant and integrated some conversions to
allow certain errors to pass through transparently. This will fully work
once we upgrade Lance and get to take advantage of changes in
https://github.com/lance-format/lance/pull/5606
* Added LZ4 compression support for write requests to remote endpoints.
I checked and this has been supported on the server for > 1 year.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 14:18:36 -08:00
LanceDB Robot
5d629c9ecb feat: update lance dependency to v2.0.1 (#3027)
## Summary
- Bump Lance Rust workspace dependencies to v2.0.1 and update Java
`lance-core` version.
- Verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all`.
- Triggering tag: https://github.com/lancedb/lance/releases/tag/v2.0.1
2026-02-13 13:53:02 -08:00
Weston Pace
14973ac9d1 fix: support dynamic projection on remote table (#3023)
The remote server expects an object (`{"alias": "col"}`) and the client
was previously sending a list of tuples `[["alias", "col"]]`
2026-02-13 10:10:56 -08:00
Weston Pace
70cbee6293 feat: improve Permutation pytorch integration (#3016)
This changes around the output format of `Permutation` in some breaking
ways but I think the API is still new enough to be considered
experimental.

1. In order to align with both huggingface's dataset and torch's
expectations the default output format is now a list of dicts
(row-major) instead of a dict of lists (column-major). I've added a
python_col option which will return the dict of lists.
2. In order to align with pytorch's expectation the `torch` format is
now a list of tensors (row-major) instead of a 2D tensor (column-major).
I've added a torch_col option which will return the 2D tensor instead.

Added tests for torch integration with Permutation

~~Leaving draft until https://github.com/lancedb/lancedb/pull/3013
merges as this is built on top of that~~
2026-02-12 13:41:14 -08:00
Weston Pace
02783bf440 feat: add a getitems implementation for the permutation (#3013) 2026-02-12 05:36:11 -08:00
Dhruv
4323ca0147 feat: show reranker info in hybrid search explain plan (#3006)
Closes #3000

The hybrid search `explain_plan` now shows the reranker as the top-level
node with
the vector and FTS sub-plans indented underneath, instead of just
listing them
separately with no reranker context.

**Before:**
```
Vector Search Plan:
ProjectionExec: ...
FTS Search Plan:
ProjectionExec: ...
```

**After:**
```
RRFReranker(K=60)
  Vector Search Plan:
  ProjectionExec: ...
  FTS Search Plan:
  ProjectionExec: ...
```

Other rerankers display similarly ; e.g.
`LinearCombinationReranker(weight=0.7, fill=1.0)`,
`MRRReranker(weight_vector=0.5, weight_fts=0.5)`,
`CohereReranker(model_name=name)`.

---------

Signed-off-by: dask-58 <googldhruv@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-10 11:45:39 -08:00
Dhruv
bd3dd6a8e5 fix: improve error message for multi-field FTS index creation (#3005)
Fixes #2999

The error message previously said `"field_names must be a string when
use_tantivy=False"` implying they should use the to be deprecated
tantivy backend #2998.

Updated the error message and docstring to instead guide users to create
a separate FTS index for each field

Signed-off-by: dask-58 <googldhruv@gmail.com>
2026-02-09 16:28:50 -08:00
Abhishek
3c1162612e refactor: extract optimize logic from table.rs into submodule (#2979)
## Summary

Continues the modularization effort of table operations as outlined in
#2949.

- Extracts optimization operations (`OptimizeAction`, `OptimizeStats`,
`execute_optimize`, `compact_files_impl`, `cleanup_old_versions`,
`optimize_indices`) from
  `table.rs` into `table/optimize.rs`
  - Public API remains unchanged via re-exports
- Adds comprehensive tests including error cases with message assertions

  ## Test plan

  - [x] All new optimization tests pass
  - [x] All existing tests pass
  - [x] `cargo clippy` passes with no warnings
  - [x] `cargo fmt --check` passes

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-02-09 16:22:57 -08:00
Jack Ye
53c7c560c9 feat: add third party licenses lists (#3010)
The files are generated with `make licenses`, currently expected to run
manually. In the future, some automations could be built.
2026-02-09 16:16:46 -08:00
Lance Release
de4f77800d Bump version: 0.26.2-beta.0 → 0.26.2 2026-02-09 06:06:22 +00:00
Lance Release
b6ab721cf7 Bump version: 0.26.1 → 0.26.2-beta.0 2026-02-09 06:06:03 +00:00
Lance Release
027d53500b Bump version: 0.29.2-beta.0 → 0.29.2 2026-02-09 06:05:42 +00:00
Lance Release
9098f47e73 Bump version: 0.29.1 → 0.29.2-beta.0 2026-02-09 06:05:40 +00:00
Jack Ye
826a3e5ee9 ci(nodejs): add repository field to package.json for npm provenance (#3003)
## Summary

- Added `repository` field to all nodejs package.json files (main
package + 7 platform-specific packages)
- This fixes the npm publish E422 error where sigstore provenance
verification fails because the repository.url was empty

## Root Cause

Failing CI:
https://github.com/lancedb/lancedb/actions/runs/21770794768/job/62821570260

npm's sigstore provenance verification requires the `repository.url`
field in package.json to match the GitHub repository URL from the
provenance bundle. The platform-specific packages
(`@lancedb/lancedb-darwin-arm64`, etc.) were missing this field
entirely, causing the publish to fail with:

```
npm error 422 Unprocessable Entity - Error verifying sigstore provenance bundle: 
Failed to validate repository information: package.json: "repository.url" is "", 
expected to match "https://github.com/lancedb/lancedb" from provenance
```

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-08 22:04:32 -08:00
Lance Release
9fac56252e Bump version: 0.26.1-beta.0 → 0.26.1 2026-02-07 00:33:18 +00:00
Lance Release
c55ca20c1b Bump version: 0.26.0 → 0.26.1-beta.0 2026-02-07 00:33:02 +00:00
Lance Release
5cdb15feef Bump version: 0.29.1-beta.0 → 0.29.1 2026-02-07 00:32:44 +00:00
Lance Release
7a3eea927f Bump version: 0.29.0 → 0.29.1-beta.0 2026-02-07 00:32:42 +00:00
Jack Ye
5dd9b072d8 ci: upgrade node version for publishing (#2993)
Trusted publishing requires npm >=11.5.1, which means node>=24.

Also need `npm config set provenance true` to fully enable it
2026-02-06 16:30:46 -08:00
Abhishek
6dde379d44 refactor: extract schema evolution logic from table.rs into submodule (#2973)
Continues the modularization effort of schema evolution operations as
outlined in #2949

## Summary
- Extracts schema evolution operations (add_columns, alter_columns,
drop_columns) from `table.rs` into `table/schema_evolution.rs`
- Public API remains unchanged via re-exports
## Test plan
- [x] All new schema evolution tests pass
- [x] All existing tests pass
- [x] `cargo clippy` passes with no warnings
  - [x] `cargo fmt --check` passes
2026-02-06 11:33:18 -08:00
Lance Release
55f09ef1cd Bump version: 0.26.0-beta.0 → 0.26.0 2026-02-06 18:08:30 +00:00
Lance Release
e9d8651d18 Bump version: 0.25.0-beta.0 → 0.26.0-beta.0 2026-02-06 18:08:08 +00:00
Lance Release
071f467571 Bump version: 0.29.0-beta.0 → 0.29.0 2026-02-06 18:07:49 +00:00
Lance Release
f83aa25119 Bump version: 0.28.0-beta.0 → 0.29.0-beta.0 2026-02-06 18:07:48 +00:00
Jack Ye
0a8fe4d026 ci: fix python version for latest release (#2989)
It was accidentally corrupted in
https://github.com/lancedb/lancedb/pull/2972
2026-02-06 10:07:03 -08:00
Jack Ye
3ad7be9825 fix: remove x86_64-apple-darwin from list of npm triples (#2987)
Missed during https://github.com/lancedb/lancedb/pull/2987
2026-02-06 09:43:44 -08:00
LanceDB Robot
589041d842 feat: update lance dependency to v2.0.0 (#2985)
## Summary
- Bump Lance Rust crates to v2.0.0 (from v2.0.0-rc.4) and update Java
`lance-core` to 2.0.0.
- Verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all`.
- Triggering tag: v2.0.0.
2026-02-05 17:39:32 -08:00
Jack Ye
2e4cd56ab1 ci: auto-publish lancedb java sdk (#2986)
Avoid the need to manually approve an artifact release in Maven Central
2026-02-05 16:30:32 -08:00
Jack Ye
6fd8586fa7 fix: avoid force push in codex workflows to work with v0.95.0 git safety (#2981)
## Summary
- Codex CLI v0.95.0 ([PR
#10258](https://github.com/openai/codex/pull/10258)) hardened git
command safety so force push (`git push -f`, `--force`,
`--force-with-lease`, `+refspec`) now requires approval, which blocks it
in non-interactive `exec` mode.
- This broke the
[codex-update-lance-dependency](https://github.com/lancedb/lancedb/actions/runs/21727536000/job/62673436482)
workflow — the job succeeded but failed to push the branch or create the
PR.
- Replace force push with `gh api` branch deletion followed by regular
`git push`.
- Also update the script to bump Java lance-core version which was
missing previously

## Test plan
- [x] Re-run the `Codex Update Lance Dependency` workflow with a test
tag to verify the push and PR creation succeed:
https://github.com/lancedb/lancedb/pull/2983

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 15:57:45 -08:00
168 changed files with 66326 additions and 4673 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.25.0-beta.0"
current_version = "0.27.0-beta.3"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -29,6 +29,7 @@ runs:
if: ${{ inputs.arm-build == 'false' }}
uses: PyO3/maturin-action@v1
with:
maturin-version: "1.12.4"
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
@@ -44,6 +45,7 @@ runs:
if: ${{ inputs.arm-build == 'true' }}
uses: PyO3/maturin-action@v1
with:
maturin-version: "1.12.4"
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"

View File

@@ -20,6 +20,7 @@ runs:
uses: PyO3/maturin-action@v1
with:
command: build
maturin-version: "1.12.4"
# TODO: pass through interpreter
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"

View File

@@ -25,6 +25,7 @@ runs:
uses: PyO3/maturin-action@v1
with:
command: build
maturin-version: "1.12.4"
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python

173
.github/workflows/codex-fix-ci.yml vendored Normal file
View File

@@ -0,0 +1,173 @@
name: Codex Fix CI
on:
workflow_dispatch:
inputs:
workflow_run_url:
description: "Failing CI workflow run URL (e.g., https://github.com/lancedb/lancedb/actions/runs/12345678)"
required: true
type: string
branch:
description: "Branch to fix (e.g., main, release/v2.0, or feature-branch)"
required: true
type: string
guidelines:
description: "Additional guidelines for the fix (optional)"
required: false
type: string
permissions:
contents: write
pull-requests: write
actions: read
jobs:
fix-ci:
runs-on: warp-ubuntu-latest-x64-4x
timeout-minutes: 60
env:
CC: clang
CXX: clang++
steps:
- name: Show inputs
run: |
echo "workflow_run_url = ${{ inputs.workflow_run_url }}"
echo "branch = ${{ inputs.branch }}"
echo "guidelines = ${{ inputs.guidelines }}"
- name: Checkout Repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.branch }}
fetch-depth: 0
persist-credentials: true
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: 20
- name: Install Codex CLI
run: npm install -g @openai/codex
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable
with:
toolchain: stable
components: clippy, rustfmt
- uses: Swatinem/rust-cache@v2
- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y protobuf-compiler libssl-dev
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Python dependencies
run: |
pip install maturin ruff pytest pyarrow pandas polars
- name: Set up Java
uses: actions/setup-java@v4
with:
distribution: temurin
java-version: '11'
cache: maven
- name: Install Node.js dependencies for TypeScript bindings
run: |
cd nodejs
npm ci
- name: Configure git user
run: |
git config user.name "lancedb automation"
git config user.email "robot@lancedb.com"
- name: Run Codex to fix CI failure
env:
WORKFLOW_RUN_URL: ${{ inputs.workflow_run_url }}
BRANCH: ${{ inputs.branch }}
GUIDELINES: ${{ inputs.guidelines }}
GITHUB_TOKEN: ${{ secrets.ROBOT_TOKEN }}
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }}
run: |
set -euo pipefail
cat <<EOF >/tmp/codex-prompt.txt
You are running inside the lancedb repository on a GitHub Actions runner. Your task is to fix a CI failure.
Input parameters:
- Failing workflow run URL: ${WORKFLOW_RUN_URL}
- Branch to fix: ${BRANCH}
- Additional guidelines: ${GUIDELINES:-"None provided"}
Follow these steps exactly:
1. Extract the run ID from the workflow URL. The URL format is https://github.com/lancedb/lancedb/actions/runs/<run_id>.
2. Use "gh run view <run_id> --json jobs,conclusion,name" to get information about the failed run.
3. Identify which jobs failed. For each failed job, use "gh run view <run_id> --job <job_id> --log-failed" to get the failure logs.
4. Analyze the failure logs to understand what went wrong. Common failures include:
- Compilation errors
- Test failures
- Clippy warnings treated as errors
- Formatting issues
- Dependency issues
5. Based on the analysis, fix the issues in the codebase:
- For compilation errors: Fix the code that doesn't compile
- For test failures: Fix the failing tests or the code they test
- For clippy warnings: Apply the suggested fixes
- For formatting issues: Run "cargo fmt --all"
- For other issues: Apply appropriate fixes
6. After making fixes, verify them locally:
- Run "cargo fmt --all" to ensure formatting is correct
- Run "cargo clippy --workspace --tests --all-features -- -D warnings" to check for issues
- Run ONLY the specific failing tests to confirm they pass now:
- For Rust test failures: Run the specific test with "cargo test -p <crate> <test_name>"
- For Python test failures: Build with "cd python && maturin develop" then run "pytest <specific_test_file>::<test_name>"
- For Java test failures: Run "cd java && mvn test -Dtest=<TestClass>#<testMethod>"
- For TypeScript test failures: Run "cd nodejs && npm run build && npm test -- --testNamePattern='<test_name>'"
- Do NOT run the full test suite - only run the tests that were failing
7. If the additional guidelines are provided, follow them as well.
8. Inspect "git status --short" and "git diff" to review your changes.
9. Create a fix branch: "git checkout -b codex/fix-ci-<run_id>".
10. Stage all changes with "git add -A" and commit with message "fix: resolve CI failures from run <run_id>".
11. Push the branch: "git push origin codex/fix-ci-<run_id>". If the remote branch exists, delete it first with "gh api -X DELETE repos/lancedb/lancedb/git/refs/heads/codex/fix-ci-<run_id>" then push. Do NOT use "git push --force" or "git push -f".
12. Create a pull request targeting "${BRANCH}":
- Title: "ci: <short summary describing the fix>" (e.g., "ci: fix clippy warnings in lancedb" or "ci: resolve test flakiness in vector search")
- First, write the PR body to /tmp/pr-body.md using a heredoc (cat <<'PREOF' > /tmp/pr-body.md). The body should include:
- Link to the failing workflow run
- Summary of what failed
- Description of the fixes applied
- Then run "gh pr create --base ${BRANCH} --body-file /tmp/pr-body.md".
13. Display the new PR URL, "git status --short", and a summary of what was fixed.
Constraints:
- Use bash commands for all operations.
- Do not merge the PR.
- Do not modify GitHub workflow files unless they are the cause of the failure.
- If any command fails, diagnose and attempt to fix the issue instead of aborting immediately.
- If you cannot fix the issue automatically, create the PR anyway with a clear explanation of what you tried and what remains to be fixed.
- env "GH_TOKEN" is available, use "gh" tools for GitHub-related operations.
EOF
printenv OPENAI_API_KEY | codex login --with-api-key
codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"

View File

@@ -8,6 +8,7 @@ on:
paths:
- Cargo.toml
- nodejs/**
- rust/**
- docs/src/js/**
- .github/workflows/nodejs.yml
- docker-compose.yml
@@ -77,8 +78,11 @@ jobs:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v3
name: Setup Node.js 20 for build
with:
node-version: ${{ matrix.node-version }}
# @napi-rs/cli v3 requires Node >= 20.12 (via @inquirer/prompts@8).
# Build always on Node 20; tests run on the matrix version below.
node-version: 20
cache: 'npm'
cache-dependency-path: nodejs/package-lock.json
- uses: Swatinem/rust-cache@v2
@@ -86,12 +90,16 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
- uses: actions/setup-node@v3
name: Setup Node.js ${{ matrix.node-version }} for test
with:
node-version: ${{ matrix.node-version }}
- name: Compile TypeScript
run: npm run tsc
- name: Setup localstack
working-directory: .
run: docker compose up --detach --wait
@@ -144,7 +152,6 @@ jobs:
- name: Install dependencies
run: |
brew install protobuf
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci --include=optional

View File

@@ -128,16 +128,13 @@ jobs:
- target: x86_64-unknown-linux-musl
# This one seems to need some extra memory
host: ubuntu-2404-8x-x64
# https://github.com/napi-rs/napi-rs/blob/main/alpine.Dockerfile
docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine
features: fp16kernels
pre_build: |-
set -e &&
apk add protobuf-dev curl &&
ln -s /usr/lib/gcc/x86_64-alpine-linux-musl/14.2.0/crtbeginS.o /usr/lib/crtbeginS.o &&
ln -s /usr/lib/libgcc_s.so /usr/lib/libgcc.so &&
CC=gcc &&
CXX=g++
sudo apt-get update &&
sudo apt-get install -y protobuf-compiler pkg-config &&
rustup target add x86_64-unknown-linux-musl &&
export EXTRA_ARGS="-x"
- target: aarch64-unknown-linux-gnu
host: ubuntu-2404-8x-x64
# https://github.com/napi-rs/napi-rs/blob/main/debian-aarch64.Dockerfile
@@ -153,15 +150,13 @@ jobs:
rustup target add aarch64-unknown-linux-gnu
- target: aarch64-unknown-linux-musl
host: ubuntu-2404-8x-x64
# https://github.com/napi-rs/napi-rs/blob/main/alpine.Dockerfile
docker: ghcr.io/napi-rs/napi-rs/nodejs-rust:lts-alpine
features: ","
pre_build: |-
set -e &&
apk add protobuf-dev &&
sudo apt-get update &&
sudo apt-get install -y protobuf-compiler &&
rustup target add aarch64-unknown-linux-musl &&
export CC_aarch64_unknown_linux_musl=aarch64-linux-musl-gcc &&
export CXX_aarch64_unknown_linux_musl=aarch64-linux-musl-g++
export EXTRA_ARGS="-x"
name: build - ${{ matrix.settings.target }}
runs-on: ${{ matrix.settings.host }}
defaults:
@@ -192,12 +187,18 @@ jobs:
.cargo-cache
target/
key: nodejs-${{ matrix.settings.target }}-cargo-${{ matrix.settings.host }}
- name: Setup toolchain
run: ${{ matrix.settings.setup }}
if: ${{ matrix.settings.setup }}
shell: bash
- name: Install dependencies
run: npm ci
- name: Install Zig
uses: mlugg/setup-zig@v2
if: ${{ contains(matrix.settings.target, 'musl') }}
with:
version: 0.14.1
- name: Install cargo-zigbuild
uses: taiki-e/install-action@v2
if: ${{ contains(matrix.settings.target, 'musl') }}
with:
tool: cargo-zigbuild
- name: Build in docker
uses: addnab/docker-run-action@v3
if: ${{ matrix.settings.docker }}
@@ -210,24 +211,24 @@ jobs:
run: |
set -e
${{ matrix.settings.pre_build }}
npx napi build --platform --release --no-const-enum \
npx napi build --platform --release \
--features ${{ matrix.settings.features }} \
--target ${{ matrix.settings.target }} \
--dts ../lancedb/native.d.ts \
--js ../lancedb/native.js \
--strip \
dist/
--output-dir dist/
- name: Build
run: |
${{ matrix.settings.pre_build }}
npx napi build --platform --release --no-const-enum \
npx napi build --platform --release \
--features ${{ matrix.settings.features }} \
--target ${{ matrix.settings.target }} \
--dts ../lancedb/native.d.ts \
--js ../lancedb/native.js \
--strip \
$EXTRA_ARGS \
dist/
--output-dir dist/
if: ${{ !matrix.settings.docker }}
shell: bash
- name: Upload artifact
@@ -318,7 +319,7 @@ jobs:
- name: Setup node
uses: actions/setup-node@v4
with:
node-version: 20
node-version: 24
cache: npm
cache-dependency-path: nodejs/package-lock.json
registry-url: "https://registry.npmjs.org"
@@ -350,11 +351,13 @@ jobs:
env:
DRY_RUN: ${{ !startsWith(github.ref, 'refs/tags/v') }}
run: |
npm config set provenance true
ARGS="--access public"
if [[ $DRY_RUN == "true" ]]; then
ARGS="$ARGS --dry-run"
fi
if [[ $GITHUB_REF =~ refs/tags/v(.*)-beta.* ]]; then
VERSION=$(node -p "require('./package.json').version")
if [[ $VERSION == *-* ]]; then
ARGS="$ARGS --tag preview"
fi
npm publish $ARGS

View File

@@ -8,7 +8,12 @@ on:
paths:
- Cargo.toml
- python/**
- rust/**
- .github/workflows/python.yml
- .github/workflows/build_linux_wheel/**
- .github/workflows/build_mac_wheel/**
- .github/workflows/build_windows_wheel/**
- .github/workflows/run_tests/**
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

View File

@@ -100,7 +100,9 @@ jobs:
lfs: true
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: sudo apt install -y protobuf-compiler libssl-dev
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- uses: rui314/setup-mold@v1
- name: Make Swap
run: |
@@ -183,7 +185,7 @@ jobs:
runs-on: ubuntu-24.04
strategy:
matrix:
msrv: ["1.88.0"] # This should match up with rust-version in Cargo.toml
msrv: ["1.91.0"] # This should match up with rust-version in Cargo.toml
env:
# Need up-to-date compilers for kernels
CC: clang-18

864
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -5,30 +5,30 @@ exclude = ["python"]
resolver = "2"
[workspace.package]
edition = "2021"
edition = "2024"
authors = ["LanceDB Devs <dev@lancedb.com>"]
license = "Apache-2.0"
repository = "https://github.com/lancedb/lancedb"
description = "Serverless, low-latency vector database for AI applications"
keywords = ["lancedb", "lance", "database", "vector", "search"]
categories = ["database-implementations"]
rust-version = "1.88.0"
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=2.0.0-rc.4", default-features = false, "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=2.0.0-rc.4", default-features = false, "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=2.0.0-rc.4", default-features = false, "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=2.0.0-rc.4", "tag" = "v2.0.0-rc.4", "git" = "https://github.com/lance-format/lance.git" }
lance = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=3.0.0-rc.2", default-features = false, "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=3.0.0-rc.2", "tag" = "v3.0.0-rc.2", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "57.2", optional = false }
@@ -40,13 +40,15 @@ arrow-schema = "57.2"
arrow-select = "57.2"
arrow-cast = "57.2"
async-trait = "0"
datafusion = { version = "51.0", default-features = false }
datafusion-catalog = "51.0"
datafusion-common = { version = "51.0", default-features = false }
datafusion-execution = "51.0"
datafusion-expr = "51.0"
datafusion-physical-plan = "51.0"
datafusion-physical-expr = "51.0"
datafusion = { version = "52.1", default-features = false }
datafusion-catalog = "52.1"
datafusion-common = { version = "52.1", default-features = false }
datafusion-execution = "52.1"
datafusion-expr = "52.1"
datafusion-functions = "52.1"
datafusion-physical-plan = "52.1"
datafusion-physical-expr = "52.1"
datafusion-sql = "52.1"
env_logger = "0.11"
half = { "version" = "2.7.1", default-features = false, features = [
"num-traits",

9
Makefile Normal file
View File

@@ -0,0 +1,9 @@
.PHONY: licenses
licenses:
cargo about generate about.hbs -o RUST_THIRD_PARTY_LICENSES.html -c about.toml
cd python && cargo about generate ../about.hbs -o RUST_THIRD_PARTY_LICENSES.html -c ../about.toml
cd python && uv sync --all-extras && uv tool run pip-licenses --python .venv/bin/python --format=markdown --with-urls --output-file=PYTHON_THIRD_PARTY_LICENSES.md
cd nodejs && cargo about generate ../about.hbs -o RUST_THIRD_PARTY_LICENSES.html -c ../about.toml
cd nodejs && npx license-checker --markdown --out NODEJS_THIRD_PARTY_LICENSES.md
cd java && ./mvnw license:aggregate-add-third-party -q

15276
RUST_THIRD_PARTY_LICENSES.html Normal file

File diff suppressed because it is too large Load Diff

70
about.hbs Normal file
View File

@@ -0,0 +1,70 @@
<html>
<head>
<style>
@media (prefers-color-scheme: dark) {
body {
background: #333;
color: white;
}
a {
color: skyblue;
}
}
.container {
font-family: sans-serif;
max-width: 800px;
margin: 0 auto;
}
.intro {
text-align: center;
}
.licenses-list {
list-style-type: none;
margin: 0;
padding: 0;
}
.license-used-by {
margin-top: -10px;
}
.license-text {
max-height: 200px;
overflow-y: scroll;
white-space: pre-wrap;
}
</style>
</head>
<body>
<main class="container">
<div class="intro">
<h1>Third Party Licenses</h1>
<p>This page lists the licenses of the projects used in cargo-about.</p>
</div>
<h2>Overview of licenses:</h2>
<ul class="licenses-overview">
{{#each overview}}
<li><a href="#{{id}}">{{name}}</a> ({{count}})</li>
{{/each}}
</ul>
<h2>All license text:</h2>
<ul class="licenses-list">
{{#each licenses}}
<li class="license">
<h3 id="{{id}}">{{name}}</h3>
<h4>Used by:</h4>
<ul class="license-used-by">
{{#each used_by}}
<li><a href="{{#if crate.repository}} {{crate.repository}} {{else}} https://crates.io/crates/{{crate.name}} {{/if}}">{{crate.name}} {{crate.version}}</a></li>
{{/each}}
</ul>
<pre class="license-text">{{text}}</pre>
</li>
{{/each}}
</ul>
</main>
</body>
</html>

18
about.toml Normal file
View File

@@ -0,0 +1,18 @@
accepted = [
"0BSD",
"Apache-2.0",
"Apache-2.0 WITH LLVM-exception",
"BSD-2-Clause",
"BSD-3-Clause",
"BSL-1.0",
"bzip2-1.0.6",
"CC0-1.0",
"CDDL-1.0",
"CDLA-Permissive-2.0",
"ISC",
"MIT",
"MPL-2.0",
"OpenSSL",
"Unicode-3.0",
"Zlib",
]

View File

@@ -52,14 +52,21 @@ plugins:
options:
docstring_style: numpy
heading_level: 3
show_source: true
show_symbol_type_in_heading: true
show_signature_annotations: true
show_root_heading: true
show_docstring_examples: true
show_docstring_attributes: false
show_docstring_other_parameters: true
show_symbol_type_heading: true
show_labels: false
show_if_no_docstring: true
show_source: false
members_order: source
docstring_section_style: list
signature_crossrefs: true
separate_signature: true
filters:
- "!^_"
import:
# for cross references
- https://arrow.apache.org/docs/objects.inv
@@ -113,7 +120,7 @@ markdown_extensions:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- markdown.extensions.toc:
toc_depth: 3
toc_depth: 4
permalink: true
permalink_title: Anchor link to this section

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.25.0-beta.0</version>
<version>0.27.0-beta.3</version>
</dependency>
```

View File

@@ -8,6 +8,14 @@
## Properties
### numDeletedRows
```ts
numDeletedRows: number;
```
***
### version
```ts

View File

@@ -0,0 +1,71 @@
List of third-party dependencies grouped by their license type.
Apache 2.0:
* error-prone annotations (com.google.errorprone:error_prone_annotations:2.28.0 - https://errorprone.info/error_prone_annotations)
Apache License 2.0:
* JsonNullable Jackson module (org.openapitools:jackson-databind-nullable:0.2.6 - https://github.com/OpenAPITools/jackson-databind-nullable)
Apache License V2.0:
* FlatBuffers Java API (com.google.flatbuffers:flatbuffers-java:23.5.26 - https://github.com/google/flatbuffers)
Apache License, Version 2.0:
* Apache Commons Codec (commons-codec:commons-codec:1.15 - https://commons.apache.org/proper/commons-codec/)
* Apache HttpClient (org.apache.httpcomponents.client5:httpclient5:5.2.1 - https://hc.apache.org/httpcomponents-client-5.0.x/5.2.1/httpclient5/)
* Apache HttpComponents Core HTTP/1.1 (org.apache.httpcomponents.core5:httpcore5:5.2 - https://hc.apache.org/httpcomponents-core-5.2.x/5.2/httpcore5/)
* Apache HttpComponents Core HTTP/2 (org.apache.httpcomponents.core5:httpcore5-h2:5.2 - https://hc.apache.org/httpcomponents-core-5.2.x/5.2/httpcore5-h2/)
* Arrow Format (org.apache.arrow:arrow-format:15.0.0 - https://arrow.apache.org/arrow-format/)
* Arrow Java C Data Interface (org.apache.arrow:arrow-c-data:15.0.0 - https://arrow.apache.org/arrow-c-data/)
* Arrow Java Dataset (org.apache.arrow:arrow-dataset:15.0.0 - https://arrow.apache.org/arrow-dataset/)
* Arrow Memory - Core (org.apache.arrow:arrow-memory-core:15.0.0 - https://arrow.apache.org/arrow-memory/arrow-memory-core/)
* Arrow Memory - Netty (org.apache.arrow:arrow-memory-netty:15.0.0 - https://arrow.apache.org/arrow-memory/arrow-memory-netty/)
* Arrow Vectors (org.apache.arrow:arrow-vector:15.0.0 - https://arrow.apache.org/arrow-vector/)
* Guava: Google Core Libraries for Java (com.google.guava:guava:33.3.1-jre - https://github.com/google/guava)
* J2ObjC Annotations (com.google.j2objc:j2objc-annotations:3.0.0 - https://github.com/google/j2objc/)
* Netty/Buffer (io.netty:netty-buffer:4.1.104.Final - https://netty.io/netty-buffer/)
* Netty/Common (io.netty:netty-common:4.1.104.Final - https://netty.io/netty-common/)
Apache-2.0:
* Apache Commons Lang (org.apache.commons:commons-lang3:3.18.0 - https://commons.apache.org/proper/commons-lang/)
* lance-namespace-apache-client (org.lance:lance-namespace-apache-client:0.4.5 - https://github.com/openapitools/openapi-generator)
* lance-namespace-core (org.lance:lance-namespace-core:0.4.5 - https://lance.org/format/namespace/lance-namespace-core/)
EDL 1.0:
* Jakarta Activation API jar (jakarta.activation:jakarta.activation-api:1.2.2 - https://github.com/eclipse-ee4j/jaf/jakarta.activation-api)
Eclipse Distribution License - v 1.0:
* Eclipse Collections API (org.eclipse.collections:eclipse-collections-api:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections-api)
* Eclipse Collections Main Library (org.eclipse.collections:eclipse-collections:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections)
* Jakarta XML Binding API (jakarta.xml.bind:jakarta.xml.bind-api:2.3.3 - https://github.com/eclipse-ee4j/jaxb-api/jakarta.xml.bind-api)
Eclipse Public License - v 1.0:
* Eclipse Collections API (org.eclipse.collections:eclipse-collections-api:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections-api)
* Eclipse Collections Main Library (org.eclipse.collections:eclipse-collections:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections)
The Apache Software License, Version 2.0:
* FindBugs-jsr305 (com.google.code.findbugs:jsr305:3.0.2 - http://findbugs.sourceforge.net/)
* Guava InternalFutureFailureAccess and InternalFutures (com.google.guava:failureaccess:1.0.2 - https://github.com/google/guava/failureaccess)
* Guava ListenableFuture only (com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava - https://github.com/google/guava/listenablefuture)
* Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.16.0 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
* Jackson module: Old JAXB Annotations (javax.xml.bind) (com.fasterxml.jackson.module:jackson-module-jaxb-annotations:2.17.1 - https://github.com/FasterXML/jackson-modules-base)
* Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.16.0 - https://github.com/FasterXML/jackson)
* Jackson-core (com.fasterxml.jackson.core:jackson-core:2.16.0 - https://github.com/FasterXML/jackson-core)
* jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.15.2 - https://github.com/FasterXML/jackson)
* Jackson-JAXRS: base (com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:2.17.1 - https://github.com/FasterXML/jackson-jaxrs-providers/jackson-jaxrs-base)
* Jackson-JAXRS: JSON (com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:2.17.1 - https://github.com/FasterXML/jackson-jaxrs-providers/jackson-jaxrs-json-provider)
* JAR JNI Loader (org.questdb:jar-jni:1.1.1 - https://github.com/questdb/rust-maven-plugin)
* Lance Core (org.lance:lance-core:2.0.0 - https://lance.org/)
The MIT License:
* Checker Qual (org.checkerframework:checker-qual:3.43.0 - https://checkerframework.org/)

View File

@@ -0,0 +1,71 @@
List of third-party dependencies grouped by their license type.
Apache 2.0:
* error-prone annotations (com.google.errorprone:error_prone_annotations:2.28.0 - https://errorprone.info/error_prone_annotations)
Apache License 2.0:
* JsonNullable Jackson module (org.openapitools:jackson-databind-nullable:0.2.6 - https://github.com/OpenAPITools/jackson-databind-nullable)
Apache License V2.0:
* FlatBuffers Java API (com.google.flatbuffers:flatbuffers-java:23.5.26 - https://github.com/google/flatbuffers)
Apache License, Version 2.0:
* Apache Commons Codec (commons-codec:commons-codec:1.15 - https://commons.apache.org/proper/commons-codec/)
* Apache HttpClient (org.apache.httpcomponents.client5:httpclient5:5.2.1 - https://hc.apache.org/httpcomponents-client-5.0.x/5.2.1/httpclient5/)
* Apache HttpComponents Core HTTP/1.1 (org.apache.httpcomponents.core5:httpcore5:5.2 - https://hc.apache.org/httpcomponents-core-5.2.x/5.2/httpcore5/)
* Apache HttpComponents Core HTTP/2 (org.apache.httpcomponents.core5:httpcore5-h2:5.2 - https://hc.apache.org/httpcomponents-core-5.2.x/5.2/httpcore5-h2/)
* Arrow Format (org.apache.arrow:arrow-format:15.0.0 - https://arrow.apache.org/arrow-format/)
* Arrow Java C Data Interface (org.apache.arrow:arrow-c-data:15.0.0 - https://arrow.apache.org/arrow-c-data/)
* Arrow Java Dataset (org.apache.arrow:arrow-dataset:15.0.0 - https://arrow.apache.org/arrow-dataset/)
* Arrow Memory - Core (org.apache.arrow:arrow-memory-core:15.0.0 - https://arrow.apache.org/arrow-memory/arrow-memory-core/)
* Arrow Memory - Netty (org.apache.arrow:arrow-memory-netty:15.0.0 - https://arrow.apache.org/arrow-memory/arrow-memory-netty/)
* Arrow Vectors (org.apache.arrow:arrow-vector:15.0.0 - https://arrow.apache.org/arrow-vector/)
* Guava: Google Core Libraries for Java (com.google.guava:guava:33.3.1-jre - https://github.com/google/guava)
* J2ObjC Annotations (com.google.j2objc:j2objc-annotations:3.0.0 - https://github.com/google/j2objc/)
* Netty/Buffer (io.netty:netty-buffer:4.1.104.Final - https://netty.io/netty-buffer/)
* Netty/Common (io.netty:netty-common:4.1.104.Final - https://netty.io/netty-common/)
Apache-2.0:
* Apache Commons Lang (org.apache.commons:commons-lang3:3.18.0 - https://commons.apache.org/proper/commons-lang/)
* lance-namespace-apache-client (org.lance:lance-namespace-apache-client:0.4.5 - https://github.com/openapitools/openapi-generator)
* lance-namespace-core (org.lance:lance-namespace-core:0.4.5 - https://lance.org/format/namespace/lance-namespace-core/)
EDL 1.0:
* Jakarta Activation API jar (jakarta.activation:jakarta.activation-api:1.2.2 - https://github.com/eclipse-ee4j/jaf/jakarta.activation-api)
Eclipse Distribution License - v 1.0:
* Eclipse Collections API (org.eclipse.collections:eclipse-collections-api:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections-api)
* Eclipse Collections Main Library (org.eclipse.collections:eclipse-collections:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections)
* Jakarta XML Binding API (jakarta.xml.bind:jakarta.xml.bind-api:2.3.3 - https://github.com/eclipse-ee4j/jaxb-api/jakarta.xml.bind-api)
Eclipse Public License - v 1.0:
* Eclipse Collections API (org.eclipse.collections:eclipse-collections-api:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections-api)
* Eclipse Collections Main Library (org.eclipse.collections:eclipse-collections:11.1.0 - https://github.com/eclipse/eclipse-collections/eclipse-collections)
The Apache Software License, Version 2.0:
* FindBugs-jsr305 (com.google.code.findbugs:jsr305:3.0.2 - http://findbugs.sourceforge.net/)
* Guava InternalFutureFailureAccess and InternalFutures (com.google.guava:failureaccess:1.0.2 - https://github.com/google/guava/failureaccess)
* Guava ListenableFuture only (com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava - https://github.com/google/guava/listenablefuture)
* Jackson datatype: JSR310 (com.fasterxml.jackson.datatype:jackson-datatype-jsr310:2.16.0 - https://github.com/FasterXML/jackson-modules-java8/jackson-datatype-jsr310)
* Jackson module: Old JAXB Annotations (javax.xml.bind) (com.fasterxml.jackson.module:jackson-module-jaxb-annotations:2.17.1 - https://github.com/FasterXML/jackson-modules-base)
* Jackson-annotations (com.fasterxml.jackson.core:jackson-annotations:2.16.0 - https://github.com/FasterXML/jackson)
* Jackson-core (com.fasterxml.jackson.core:jackson-core:2.16.0 - https://github.com/FasterXML/jackson-core)
* jackson-databind (com.fasterxml.jackson.core:jackson-databind:2.15.2 - https://github.com/FasterXML/jackson)
* Jackson-JAXRS: base (com.fasterxml.jackson.jaxrs:jackson-jaxrs-base:2.17.1 - https://github.com/FasterXML/jackson-jaxrs-providers/jackson-jaxrs-base)
* Jackson-JAXRS: JSON (com.fasterxml.jackson.jaxrs:jackson-jaxrs-json-provider:2.17.1 - https://github.com/FasterXML/jackson-jaxrs-providers/jackson-jaxrs-json-provider)
* JAR JNI Loader (org.questdb:jar-jni:1.1.1 - https://github.com/questdb/rust-maven-plugin)
* Lance Core (org.lance:lance-core:2.0.0 - https://lance.org/)
The MIT License:
* Checker Qual (org.checkerframework:checker-qual:3.43.0 - https://checkerframework.org/)

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.25.0-beta.0</version>
<version>0.27.0-beta.3</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.25.0-beta.0</version>
<version>0.27.0-beta.3</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>1.0.4</lance-core.version>
<lance-core.version>3.1.0-beta.2</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
@@ -160,6 +160,19 @@
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>license-maven-plugin</artifactId>
<version>2.4.0</version>
<configuration>
<outputDirectory>${project.basedir}</outputDirectory>
<thirdPartyFilename>JAVA_THIRD_PARTY_LICENSES.md</thirdPartyFilename>
<fileTemplate>/org/codehaus/mojo/license/third-party-file-groupByLicense.ftl</fileTemplate>
<includedScopes>compile,runtime</includedScopes>
<excludedScopes>test,provided</excludedScopes>
<sortArtifactByName>true</sortArtifactByName>
</configuration>
</plugin>
</plugins>
<pluginManagement>
<plugins>
@@ -292,11 +305,12 @@
<plugin>
<groupId>org.sonatype.central</groupId>
<artifactId>central-publishing-maven-plugin</artifactId>
<version>0.4.0</version>
<version>0.8.0</version>
<extensions>true</extensions>
<configuration>
<publishingServerId>ossrh</publishingServerId>
<tokenAuth>true</tokenAuth>
<autoPublish>true</autoPublish>
</configuration>
</plugin>
<plugin>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.25.0-beta.0"
version = "0.27.0-beta.3"
license.workspace = true
description.workspace = true
repository.workspace = true
@@ -19,11 +19,11 @@ arrow-schema.workspace = true
env_logger.workspace = true
futures.workspace = true
lancedb = { path = "../rust/lancedb", default-features = false }
napi = { version = "2.16.8", default-features = false, features = [
napi = { version = "3.8.3", default-features = false, features = [
"napi9",
"async"
] }
napi-derive = "2.16.4"
napi-derive = "3.5.2"
# Prevent dynamic linking of lzma, which comes from datafusion
lzma-sys = { version = "*", features = ["static"] }
log.workspace = true
@@ -33,7 +33,7 @@ aws-lc-sys = "=0.28.0"
aws-lc-rs = "=1.13.0"
[build-dependencies]
napi-build = "2.1"
napi-build = "2.3.1"
[features]
default = ["remote", "lancedb/aws", "lancedb/gcs", "lancedb/azure", "lancedb/dynamodb", "lancedb/oss", "lancedb/huggingface"]

View File

@@ -0,0 +1,668 @@
[@75lb/deep-merge@1.1.2](https://github.com/75lb/deep-merge) - MIT
[@aashutoshrathi/word-wrap@1.2.6](https://github.com/aashutoshrathi/word-wrap) - MIT
[@ampproject/remapping@2.2.1](https://github.com/ampproject/remapping) - Apache-2.0
[@aws-crypto/crc32@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/crc32c@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/ie11-detection@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/sha1-browser@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/sha256-browser@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/sha256-browser@5.2.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/sha256-js@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/sha256-js@5.2.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/supports-web-crypto@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/supports-web-crypto@5.2.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/util@3.0.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-crypto/util@5.2.0](https://github.com/aws/aws-sdk-js-crypto-helpers) - Apache-2.0
[@aws-sdk/client-dynamodb@3.602.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-kms@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-s3@3.550.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sso-oidc@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sso-oidc@3.600.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sso@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sso@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sts@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/client-sts@3.600.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/core@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/core@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-env@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-env@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-http@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-http@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-ini@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-ini@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-node@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-node@3.600.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-process@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-process@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-sso@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-sso@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-web-identity@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/credential-provider-web-identity@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/endpoint-cache@3.572.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-bucket-endpoint@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-endpoint-discovery@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-expect-continue@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-flexible-checksums@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-host-header@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-host-header@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-location-constraint@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-logger@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-logger@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-recursion-detection@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-recursion-detection@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-sdk-s3@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-signing@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-ssec@3.537.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-user-agent@3.540.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/middleware-user-agent@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/region-config-resolver@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/region-config-resolver@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/signature-v4-multi-region@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/token-providers@3.549.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/token-providers@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/types@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/types@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-arn-parser@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-endpoints@3.540.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-endpoints@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-locate-window@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-user-agent-browser@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-user-agent-browser@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-user-agent-node@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-user-agent-node@3.598.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/util-utf8-browser@3.259.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@aws-sdk/xml-builder@3.535.0](https://github.com/aws/aws-sdk-js-v3) - Apache-2.0
[@babel/code-frame@7.26.2](https://github.com/babel/babel) - MIT
[@babel/compat-data@7.23.5](https://github.com/babel/babel) - MIT
[@babel/core@7.23.7](https://github.com/babel/babel) - MIT
[@babel/generator@7.23.6](https://github.com/babel/babel) - MIT
[@babel/helper-compilation-targets@7.23.6](https://github.com/babel/babel) - MIT
[@babel/helper-environment-visitor@7.22.20](https://github.com/babel/babel) - MIT
[@babel/helper-function-name@7.23.0](https://github.com/babel/babel) - MIT
[@babel/helper-hoist-variables@7.22.5](https://github.com/babel/babel) - MIT
[@babel/helper-module-imports@7.22.15](https://github.com/babel/babel) - MIT
[@babel/helper-module-transforms@7.23.3](https://github.com/babel/babel) - MIT
[@babel/helper-plugin-utils@7.22.5](https://github.com/babel/babel) - MIT
[@babel/helper-simple-access@7.22.5](https://github.com/babel/babel) - MIT
[@babel/helper-split-export-declaration@7.22.6](https://github.com/babel/babel) - MIT
[@babel/helper-string-parser@7.25.9](https://github.com/babel/babel) - MIT
[@babel/helper-validator-identifier@7.25.9](https://github.com/babel/babel) - MIT
[@babel/helper-validator-option@7.23.5](https://github.com/babel/babel) - MIT
[@babel/helpers@7.27.0](https://github.com/babel/babel) - MIT
[@babel/parser@7.27.0](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-async-generators@7.8.4](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-async-generators) - MIT
[@babel/plugin-syntax-bigint@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-bigint) - MIT
[@babel/plugin-syntax-class-properties@7.12.13](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-import-meta@7.10.4](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-json-strings@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-json-strings) - MIT
[@babel/plugin-syntax-jsx@7.23.3](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-logical-assignment-operators@7.10.4](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-nullish-coalescing-operator@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-nullish-coalescing-operator) - MIT
[@babel/plugin-syntax-numeric-separator@7.10.4](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-object-rest-spread@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-object-rest-spread) - MIT
[@babel/plugin-syntax-optional-catch-binding@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-optional-catch-binding) - MIT
[@babel/plugin-syntax-optional-chaining@7.8.3](https://github.com/babel/babel/tree/master/packages/babel-plugin-syntax-optional-chaining) - MIT
[@babel/plugin-syntax-top-level-await@7.14.5](https://github.com/babel/babel) - MIT
[@babel/plugin-syntax-typescript@7.23.3](https://github.com/babel/babel) - MIT
[@babel/template@7.27.0](https://github.com/babel/babel) - MIT
[@babel/traverse@7.23.7](https://github.com/babel/babel) - MIT
[@babel/types@7.27.0](https://github.com/babel/babel) - MIT
[@bcoe/v8-coverage@0.2.3](https://github.com/demurgos/v8-coverage) - MIT
[@biomejs/biome@1.8.3](https://github.com/biomejs/biome) - MIT OR Apache-2.0
[@biomejs/cli-darwin-arm64@1.8.3](https://github.com/biomejs/biome) - MIT OR Apache-2.0
[@eslint-community/eslint-utils@4.4.0](https://github.com/eslint-community/eslint-utils) - MIT
[@eslint-community/regexpp@4.10.0](https://github.com/eslint-community/regexpp) - MIT
[@eslint/eslintrc@2.1.4](https://github.com/eslint/eslintrc) - MIT
[@eslint/js@8.57.0](https://github.com/eslint/eslint) - MIT
[@huggingface/jinja@0.3.2](https://github.com/huggingface/huggingface.js) - MIT
[@huggingface/transformers@3.0.2](https://github.com/huggingface/transformers.js) - Apache-2.0
[@humanwhocodes/config-array@0.11.14](https://github.com/humanwhocodes/config-array) - Apache-2.0
[@humanwhocodes/module-importer@1.0.1](https://github.com/humanwhocodes/module-importer) - Apache-2.0
[@humanwhocodes/object-schema@2.0.2](https://github.com/humanwhocodes/object-schema) - BSD-3-Clause
[@img/sharp-darwin-arm64@0.33.5](https://github.com/lovell/sharp) - Apache-2.0
[@img/sharp-libvips-darwin-arm64@1.0.4](https://github.com/lovell/sharp-libvips) - LGPL-3.0-or-later
[@isaacs/cliui@8.0.2](https://github.com/yargs/cliui) - ISC
[@isaacs/fs-minipass@4.0.1](https://github.com/npm/fs-minipass) - ISC
[@istanbuljs/load-nyc-config@1.1.0](https://github.com/istanbuljs/load-nyc-config) - ISC
[@istanbuljs/schema@0.1.3](https://github.com/istanbuljs/schema) - MIT
[@jest/console@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/core@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/environment@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/expect-utils@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/expect@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/fake-timers@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/globals@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/reporters@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/schemas@29.6.3](https://github.com/jestjs/jest) - MIT
[@jest/source-map@29.6.3](https://github.com/jestjs/jest) - MIT
[@jest/test-result@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/test-sequencer@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/transform@29.7.0](https://github.com/jestjs/jest) - MIT
[@jest/types@29.6.3](https://github.com/jestjs/jest) - MIT
[@jridgewell/gen-mapping@0.3.3](https://github.com/jridgewell/gen-mapping) - MIT
[@jridgewell/resolve-uri@3.1.1](https://github.com/jridgewell/resolve-uri) - MIT
[@jridgewell/set-array@1.1.2](https://github.com/jridgewell/set-array) - MIT
[@jridgewell/sourcemap-codec@1.4.15](https://github.com/jridgewell/sourcemap-codec) - MIT
[@jridgewell/trace-mapping@0.3.22](https://github.com/jridgewell/trace-mapping) - MIT
[@lancedb/lancedb@0.26.2](https://github.com/lancedb/lancedb) - Apache-2.0
[@napi-rs/cli@2.18.3](https://github.com/napi-rs/napi-rs) - MIT
[@nodelib/fs.scandir@2.1.5](https://github.com/nodelib/nodelib/tree/master/packages/fs/fs.scandir) - MIT
[@nodelib/fs.stat@2.0.5](https://github.com/nodelib/nodelib/tree/master/packages/fs/fs.stat) - MIT
[@nodelib/fs.walk@1.2.8](https://github.com/nodelib/nodelib/tree/master/packages/fs/fs.walk) - MIT
[@pkgjs/parseargs@0.11.0](https://github.com/pkgjs/parseargs) - MIT
[@protobufjs/aspromise@1.1.2](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/base64@1.1.2](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/codegen@2.0.4](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/eventemitter@1.1.0](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/fetch@1.1.0](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/float@1.0.2](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/inquire@1.1.0](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/path@1.1.2](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/pool@1.1.0](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@protobufjs/utf8@1.1.0](https://github.com/dcodeIO/protobuf.js) - BSD-3-Clause
[@shikijs/core@1.10.3](https://github.com/shikijs/shiki) - MIT
[@sinclair/typebox@0.27.8](https://github.com/sinclairzx81/typebox) - MIT
[@sinonjs/commons@3.0.1](https://github.com/sinonjs/commons) - BSD-3-Clause
[@sinonjs/fake-timers@10.3.0](https://github.com/sinonjs/fake-timers) - BSD-3-Clause
[@smithy/abort-controller@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/abort-controller@3.1.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/chunked-blob-reader-native@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/chunked-blob-reader@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/config-resolver@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/config-resolver@3.0.3](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/core@1.4.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/core@2.2.3](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/credential-provider-imds@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/credential-provider-imds@3.1.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/eventstream-codec@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/eventstream-serde-browser@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/eventstream-serde-config-resolver@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/eventstream-serde-node@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/eventstream-serde-universal@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/fetch-http-handler@2.5.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/fetch-http-handler@3.1.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/hash-blob-browser@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/hash-node@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/hash-node@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/hash-stream-node@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/invalid-dependency@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/invalid-dependency@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/is-array-buffer@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/is-array-buffer@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/md5-js@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-content-length@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-content-length@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-endpoint@2.5.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-endpoint@3.0.3](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-retry@2.3.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-retry@3.0.6](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-serde@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-serde@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-stack@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/middleware-stack@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/node-config-provider@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/node-config-provider@3.1.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/node-http-handler@2.5.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/node-http-handler@3.1.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/property-provider@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/property-provider@3.1.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/protocol-http@3.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/protocol-http@4.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/querystring-builder@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/querystring-builder@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/querystring-parser@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/querystring-parser@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/service-error-classification@2.1.5](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/service-error-classification@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/shared-ini-file-loader@2.4.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/shared-ini-file-loader@3.1.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/signature-v4@2.2.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/signature-v4@3.1.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/smithy-client@2.5.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/smithy-client@3.1.4](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/types@2.12.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/types@3.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/url-parser@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/url-parser@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-base64@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-base64@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-body-length-browser@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-body-length-browser@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-body-length-node@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-body-length-node@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-buffer-from@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-buffer-from@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-config-provider@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-config-provider@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-defaults-mode-browser@2.2.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-defaults-mode-browser@3.0.6](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-defaults-mode-node@2.3.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-defaults-mode-node@3.0.6](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-endpoints@1.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-endpoints@2.0.3](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-hex-encoding@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-hex-encoding@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-middleware@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-middleware@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-retry@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-retry@3.0.2](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-stream@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-stream@3.0.4](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-uri-escape@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-uri-escape@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-utf8@2.3.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-utf8@3.0.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-waiter@2.2.0](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@smithy/util-waiter@3.1.1](https://github.com/awslabs/smithy-typescript) - Apache-2.0
[@swc/helpers@0.5.12](https://github.com/swc-project/swc) - Apache-2.0
[@types/axios@0.14.0](https://github.com/mzabriskie/axios) - MIT
[@types/babel__core@7.20.5](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/babel__generator@7.6.8](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/babel__template@7.4.4](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/babel__traverse@7.20.5](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/command-line-args@5.2.3](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/command-line-usage@5.0.2](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/command-line-usage@5.0.4](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/graceful-fs@4.1.9](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/hast@3.0.4](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/istanbul-lib-coverage@2.0.6](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/istanbul-lib-report@3.0.3](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/istanbul-reports@3.0.4](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/jest@29.5.12](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/json-schema@7.0.15](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/node-fetch@2.6.11](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/node@18.19.26](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/node@20.16.10](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/node@20.17.9](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/node@22.7.4](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/semver@7.5.6](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/stack-utils@2.0.3](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/tmp@0.2.6](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/unist@3.0.2](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/yargs-parser@21.0.3](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@types/yargs@17.0.32](https://github.com/DefinitelyTyped/DefinitelyTyped) - MIT
[@typescript-eslint/eslint-plugin@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@typescript-eslint/parser@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - BSD-2-Clause
[@typescript-eslint/scope-manager@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@typescript-eslint/type-utils@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@typescript-eslint/types@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@typescript-eslint/typescript-estree@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - BSD-2-Clause
[@typescript-eslint/utils@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@typescript-eslint/visitor-keys@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[@ungap/structured-clone@1.2.0](https://github.com/ungap/structured-clone) - ISC
[abort-controller@3.0.0](https://github.com/mysticatea/abort-controller) - MIT
[acorn-jsx@5.3.2](https://github.com/acornjs/acorn-jsx) - MIT
[acorn@8.11.3](https://github.com/acornjs/acorn) - MIT
[agentkeepalive@4.5.0](https://github.com/node-modules/agentkeepalive) - MIT
[ajv@6.12.6](https://github.com/ajv-validator/ajv) - MIT
[ansi-escapes@4.3.2](https://github.com/sindresorhus/ansi-escapes) - MIT
[ansi-regex@5.0.1](https://github.com/chalk/ansi-regex) - MIT
[ansi-regex@6.1.0](https://github.com/chalk/ansi-regex) - MIT
[ansi-styles@4.3.0](https://github.com/chalk/ansi-styles) - MIT
[ansi-styles@5.2.0](https://github.com/chalk/ansi-styles) - MIT
[ansi-styles@6.2.1](https://github.com/chalk/ansi-styles) - MIT
[anymatch@3.1.3](https://github.com/micromatch/anymatch) - ISC
[apache-arrow@15.0.0](https://github.com/apache/arrow) - Apache-2.0
[apache-arrow@16.0.0](https://github.com/apache/arrow) - Apache-2.0
[apache-arrow@17.0.0](https://github.com/apache/arrow) - Apache-2.0
[apache-arrow@18.0.0](https://github.com/apache/arrow) - Apache-2.0
[argparse@1.0.10](https://github.com/nodeca/argparse) - MIT
[argparse@2.0.1](https://github.com/nodeca/argparse) - Python-2.0
[array-back@3.1.0](https://github.com/75lb/array-back) - MIT
[array-back@6.2.2](https://github.com/75lb/array-back) - MIT
[array-union@2.1.0](https://github.com/sindresorhus/array-union) - MIT
[asynckit@0.4.0](https://github.com/alexindigo/asynckit) - MIT
[axios@1.8.4](https://github.com/axios/axios) - MIT
[babel-jest@29.7.0](https://github.com/jestjs/jest) - MIT
[babel-plugin-istanbul@6.1.1](https://github.com/istanbuljs/babel-plugin-istanbul) - BSD-3-Clause
[babel-plugin-jest-hoist@29.6.3](https://github.com/jestjs/jest) - MIT
[babel-preset-current-node-syntax@1.0.1](https://github.com/nicolo-ribaudo/babel-preset-current-node-syntax) - MIT
[babel-preset-jest@29.6.3](https://github.com/jestjs/jest) - MIT
[balanced-match@1.0.2](https://github.com/juliangruber/balanced-match) - MIT
[base-64@0.1.0](https://github.com/mathiasbynens/base64) - MIT
[bowser@2.11.0](https://github.com/lancedikson/bowser) - MIT
[brace-expansion@1.1.11](https://github.com/juliangruber/brace-expansion) - MIT
[brace-expansion@2.0.1](https://github.com/juliangruber/brace-expansion) - MIT
[braces@3.0.3](https://github.com/micromatch/braces) - MIT
[browserslist@4.22.2](https://github.com/browserslist/browserslist) - MIT
[bs-logger@0.2.6](https://github.com/huafu/bs-logger) - MIT
[bser@2.1.1](https://github.com/facebook/watchman) - Apache-2.0
[buffer-from@1.1.2](https://github.com/LinusU/buffer-from) - MIT
[callsites@3.1.0](https://github.com/sindresorhus/callsites) - MIT
[camelcase@5.3.1](https://github.com/sindresorhus/camelcase) - MIT
[camelcase@6.3.0](https://github.com/sindresorhus/camelcase) - MIT
[caniuse-lite@1.0.30001579](https://github.com/browserslist/caniuse-lite) - CC-BY-4.0
[chalk-template@0.4.0](https://github.com/chalk/chalk-template) - MIT
[chalk@4.1.2](https://github.com/chalk/chalk) - MIT
[char-regex@1.0.2](https://github.com/Richienb/char-regex) - MIT
[charenc@0.0.2](https://github.com/pvorb/node-charenc) - BSD-3-Clause
[chownr@3.0.0](https://github.com/isaacs/chownr) - BlueOak-1.0.0
[ci-info@3.9.0](https://github.com/watson/ci-info) - MIT
[cjs-module-lexer@1.2.3](https://github.com/nodejs/cjs-module-lexer) - MIT
[cliui@8.0.1](https://github.com/yargs/cliui) - ISC
[co@4.6.0](https://github.com/tj/co) - MIT
[collect-v8-coverage@1.0.2](https://github.com/SimenB/collect-v8-coverage) - MIT
[color-convert@2.0.1](https://github.com/Qix-/color-convert) - MIT
[color-name@1.1.4](https://github.com/colorjs/color-name) - MIT
[color-string@1.9.1](https://github.com/Qix-/color-string) - MIT
[color@4.2.3](https://github.com/Qix-/color) - MIT
[combined-stream@1.0.8](https://github.com/felixge/node-combined-stream) - MIT
[command-line-args@5.2.1](https://github.com/75lb/command-line-args) - MIT
[command-line-usage@7.0.1](https://github.com/75lb/command-line-usage) - MIT
[concat-map@0.0.1](https://github.com/substack/node-concat-map) - MIT
[convert-source-map@2.0.0](https://github.com/thlorenz/convert-source-map) - MIT
[create-jest@29.7.0](https://github.com/jestjs/jest) - MIT
[cross-spawn@7.0.6](https://github.com/moxystudio/node-cross-spawn) - MIT
[crypt@0.0.2](https://github.com/pvorb/node-crypt) - BSD-3-Clause
[debug@4.3.4](https://github.com/debug-js/debug) - MIT
[dedent@1.5.1](https://github.com/dmnd/dedent) - MIT
[deep-is@0.1.4](https://github.com/thlorenz/deep-is) - MIT
[deepmerge@4.3.1](https://github.com/TehShrike/deepmerge) - MIT
[delayed-stream@1.0.0](https://github.com/felixge/node-delayed-stream) - MIT
[detect-libc@2.0.3](https://github.com/lovell/detect-libc) - Apache-2.0
[detect-newline@3.1.0](https://github.com/sindresorhus/detect-newline) - MIT
[diff-sequences@29.6.3](https://github.com/jestjs/jest) - MIT
[digest-fetch@1.3.0](https://github.com/devfans/digest-fetch) - ISC
[dir-glob@3.0.1](https://github.com/kevva/dir-glob) - MIT
[doctrine@3.0.0](https://github.com/eslint/doctrine) - Apache-2.0
[eastasianwidth@0.2.0](https://github.com/komagata/eastasianwidth) - MIT
[electron-to-chromium@1.4.642](https://github.com/kilian/electron-to-chromium) - ISC
[emittery@0.13.1](https://github.com/sindresorhus/emittery) - MIT
[emoji-regex@8.0.0](https://github.com/mathiasbynens/emoji-regex) - MIT
[emoji-regex@9.2.2](https://github.com/mathiasbynens/emoji-regex) - MIT
[entities@4.5.0](https://github.com/fb55/entities) - BSD-2-Clause
[error-ex@1.3.2](https://github.com/qix-/node-error-ex) - MIT
[escalade@3.1.1](https://github.com/lukeed/escalade) - MIT
[escape-string-regexp@2.0.0](https://github.com/sindresorhus/escape-string-regexp) - MIT
[escape-string-regexp@4.0.0](https://github.com/sindresorhus/escape-string-regexp) - MIT
[eslint-scope@7.2.2](https://github.com/eslint/eslint-scope) - BSD-2-Clause
[eslint-visitor-keys@3.4.3](https://github.com/eslint/eslint-visitor-keys) - Apache-2.0
[eslint@8.57.0](https://github.com/eslint/eslint) - MIT
[espree@9.6.1](https://github.com/eslint/espree) - BSD-2-Clause
[esprima@4.0.1](https://github.com/jquery/esprima) - BSD-2-Clause
[esquery@1.5.0](https://github.com/estools/esquery) - BSD-3-Clause
[esrecurse@4.3.0](https://github.com/estools/esrecurse) - BSD-2-Clause
[estraverse@5.3.0](https://github.com/estools/estraverse) - BSD-2-Clause
[esutils@2.0.3](https://github.com/estools/esutils) - BSD-2-Clause
[event-target-shim@5.0.1](https://github.com/mysticatea/event-target-shim) - MIT
[execa@5.1.1](https://github.com/sindresorhus/execa) - MIT
[exit@0.1.2](https://github.com/cowboy/node-exit) - MIT
[expect@29.7.0](https://github.com/jestjs/jest) - MIT
[fast-deep-equal@3.1.3](https://github.com/epoberezkin/fast-deep-equal) - MIT
[fast-glob@3.3.2](https://github.com/mrmlnc/fast-glob) - MIT
[fast-json-stable-stringify@2.1.0](https://github.com/epoberezkin/fast-json-stable-stringify) - MIT
[fast-levenshtein@2.0.6](https://github.com/hiddentao/fast-levenshtein) - MIT
[fast-xml-parser@4.2.5](https://github.com/NaturalIntelligence/fast-xml-parser) - MIT
[fastq@1.16.0](https://github.com/mcollina/fastq) - ISC
[fb-watchman@2.0.2](https://github.com/facebook/watchman) - Apache-2.0
[file-entry-cache@6.0.1](https://github.com/royriojas/file-entry-cache) - MIT
[fill-range@7.1.1](https://github.com/jonschlinkert/fill-range) - MIT
[find-replace@3.0.0](https://github.com/75lb/find-replace) - MIT
[find-up@4.1.0](https://github.com/sindresorhus/find-up) - MIT
[find-up@5.0.0](https://github.com/sindresorhus/find-up) - MIT
[flat-cache@3.2.0](https://github.com/jaredwray/flat-cache) - MIT
[flatbuffers@1.12.0](https://github.com/google/flatbuffers) - Apache*
[flatbuffers@23.5.26](https://github.com/google/flatbuffers) - Apache*
[flatbuffers@24.3.25](https://github.com/google/flatbuffers) - Apache-2.0
[flatted@3.2.9](https://github.com/WebReflection/flatted) - ISC
[follow-redirects@1.15.6](https://github.com/follow-redirects/follow-redirects) - MIT
[foreground-child@3.3.0](https://github.com/tapjs/foreground-child) - ISC
[form-data-encoder@1.7.2](https://github.com/octet-stream/form-data-encoder) - MIT
[form-data@4.0.0](https://github.com/form-data/form-data) - MIT
[formdata-node@4.4.1](https://github.com/octet-stream/form-data) - MIT
[fs.realpath@1.0.0](https://github.com/isaacs/fs.realpath) - ISC
[fsevents@2.3.3](https://github.com/fsevents/fsevents) - MIT
[function-bind@1.1.2](https://github.com/Raynos/function-bind) - MIT
[gensync@1.0.0-beta.2](https://github.com/loganfsmyth/gensync) - MIT
[get-caller-file@2.0.5](https://github.com/stefanpenner/get-caller-file) - ISC
[get-package-type@0.1.0](https://github.com/cfware/get-package-type) - MIT
[get-stream@6.0.1](https://github.com/sindresorhus/get-stream) - MIT
[glob-parent@5.1.2](https://github.com/gulpjs/glob-parent) - ISC
[glob-parent@6.0.2](https://github.com/gulpjs/glob-parent) - ISC
[glob@10.4.5](https://github.com/isaacs/node-glob) - ISC
[glob@7.2.3](https://github.com/isaacs/node-glob) - ISC
[globals@11.12.0](https://github.com/sindresorhus/globals) - MIT
[globals@13.24.0](https://github.com/sindresorhus/globals) - MIT
[globby@11.1.0](https://github.com/sindresorhus/globby) - MIT
[graceful-fs@4.2.11](https://github.com/isaacs/node-graceful-fs) - ISC
[graphemer@1.4.0](https://github.com/flmnt/graphemer) - MIT
[guid-typescript@1.0.9](https://github.com/NicolasDeveloper/guid-typescript) - ISC
[has-flag@4.0.0](https://github.com/sindresorhus/has-flag) - MIT
[hasown@2.0.0](https://github.com/inspect-js/hasOwn) - MIT
[html-escaper@2.0.2](https://github.com/WebReflection/html-escaper) - MIT
[human-signals@2.1.0](https://github.com/ehmicky/human-signals) - Apache-2.0
[humanize-ms@1.2.1](https://github.com/node-modules/humanize-ms) - MIT
[ignore@5.3.0](https://github.com/kaelzhang/node-ignore) - MIT
[import-fresh@3.3.0](https://github.com/sindresorhus/import-fresh) - MIT
[import-local@3.1.0](https://github.com/sindresorhus/import-local) - MIT
[imurmurhash@0.1.4](https://github.com/jensyt/imurmurhash-js) - MIT
[inflight@1.0.6](https://github.com/npm/inflight) - ISC
[inherits@2.0.4](https://github.com/isaacs/inherits) - ISC
[interpret@1.4.0](https://github.com/gulpjs/interpret) - MIT
[is-arrayish@0.2.1](https://github.com/qix-/node-is-arrayish) - MIT
[is-arrayish@0.3.2](https://github.com/qix-/node-is-arrayish) - MIT
[is-buffer@1.1.6](https://github.com/feross/is-buffer) - MIT
[is-core-module@2.13.1](https://github.com/inspect-js/is-core-module) - MIT
[is-extglob@2.1.1](https://github.com/jonschlinkert/is-extglob) - MIT
[is-fullwidth-code-point@3.0.0](https://github.com/sindresorhus/is-fullwidth-code-point) - MIT
[is-generator-fn@2.1.0](https://github.com/sindresorhus/is-generator-fn) - MIT
[is-glob@4.0.3](https://github.com/micromatch/is-glob) - MIT
[is-number@7.0.0](https://github.com/jonschlinkert/is-number) - MIT
[is-path-inside@3.0.3](https://github.com/sindresorhus/is-path-inside) - MIT
[is-stream@2.0.1](https://github.com/sindresorhus/is-stream) - MIT
[isexe@2.0.0](https://github.com/isaacs/isexe) - ISC
[istanbul-lib-coverage@3.2.2](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[istanbul-lib-instrument@5.2.1](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[istanbul-lib-instrument@6.0.1](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[istanbul-lib-report@3.0.1](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[istanbul-lib-source-maps@4.0.1](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[istanbul-reports@3.1.6](https://github.com/istanbuljs/istanbuljs) - BSD-3-Clause
[jackspeak@3.4.3](https://github.com/isaacs/jackspeak) - BlueOak-1.0.0
[jest-changed-files@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-circus@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-cli@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-config@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-diff@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-docblock@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-each@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-environment-node@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-get-type@29.6.3](https://github.com/jestjs/jest) - MIT
[jest-haste-map@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-leak-detector@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-matcher-utils@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-message-util@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-mock@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-pnp-resolver@1.2.3](https://github.com/arcanis/jest-pnp-resolver) - MIT
[jest-regex-util@29.6.3](https://github.com/jestjs/jest) - MIT
[jest-resolve-dependencies@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-resolve@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-runner@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-runtime@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-snapshot@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-util@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-validate@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-watcher@29.7.0](https://github.com/jestjs/jest) - MIT
[jest-worker@29.7.0](https://github.com/jestjs/jest) - MIT
[jest@29.7.0](https://github.com/jestjs/jest) - MIT
[js-tokens@4.0.0](https://github.com/lydell/js-tokens) - MIT
[js-yaml@3.14.1](https://github.com/nodeca/js-yaml) - MIT
[js-yaml@4.1.0](https://github.com/nodeca/js-yaml) - MIT
[jsesc@2.5.2](https://github.com/mathiasbynens/jsesc) - MIT
[json-bignum@0.0.3](https://github.com/datalanche/json-bignum) - MIT
[json-buffer@3.0.1](https://github.com/dominictarr/json-buffer) - MIT
[json-parse-even-better-errors@2.3.1](https://github.com/npm/json-parse-even-better-errors) - MIT
[json-schema-traverse@0.4.1](https://github.com/epoberezkin/json-schema-traverse) - MIT
[json-stable-stringify-without-jsonify@1.0.1](https://github.com/samn/json-stable-stringify) - MIT
[json5@2.2.3](https://github.com/json5/json5) - MIT
[keyv@4.5.4](https://github.com/jaredwray/keyv) - MIT
[kleur@3.0.3](https://github.com/lukeed/kleur) - MIT
[leven@3.1.0](https://github.com/sindresorhus/leven) - MIT
[levn@0.4.1](https://github.com/gkz/levn) - MIT
[lines-and-columns@1.2.4](https://github.com/eventualbuddha/lines-and-columns) - MIT
[linkify-it@5.0.0](https://github.com/markdown-it/linkify-it) - MIT
[locate-path@5.0.0](https://github.com/sindresorhus/locate-path) - MIT
[locate-path@6.0.0](https://github.com/sindresorhus/locate-path) - MIT
[lodash.camelcase@4.3.0](https://github.com/lodash/lodash) - MIT
[lodash.memoize@4.1.2](https://github.com/lodash/lodash) - MIT
[lodash.merge@4.6.2](https://github.com/lodash/lodash) - MIT
[lodash@4.17.21](https://github.com/lodash/lodash) - MIT
[long@5.2.3](https://github.com/dcodeIO/long.js) - Apache-2.0
[lru-cache@10.4.3](https://github.com/isaacs/node-lru-cache) - ISC
[lru-cache@5.1.1](https://github.com/isaacs/node-lru-cache) - ISC
[lunr@2.3.9](https://github.com/olivernn/lunr.js) - MIT
[make-dir@4.0.0](https://github.com/sindresorhus/make-dir) - MIT
[make-error@1.3.6](https://github.com/JsCommunity/make-error) - ISC
[makeerror@1.0.12](https://github.com/daaku/nodejs-makeerror) - BSD-3-Clause
[markdown-it@14.1.0](https://github.com/markdown-it/markdown-it) - MIT
[md5@2.3.0](https://github.com/pvorb/node-md5) - BSD-3-Clause
[mdurl@2.0.0](https://github.com/markdown-it/mdurl) - MIT
[merge-stream@2.0.0](https://github.com/grncdr/merge-stream) - MIT
[merge2@1.4.1](https://github.com/teambition/merge2) - MIT
[micromatch@4.0.8](https://github.com/micromatch/micromatch) - MIT
[mime-db@1.52.0](https://github.com/jshttp/mime-db) - MIT
[mime-types@2.1.35](https://github.com/jshttp/mime-types) - MIT
[mimic-fn@2.1.0](https://github.com/sindresorhus/mimic-fn) - MIT
[minimatch@3.1.2](https://github.com/isaacs/minimatch) - ISC
[minimatch@9.0.3](https://github.com/isaacs/minimatch) - ISC
[minimatch@9.0.5](https://github.com/isaacs/minimatch) - ISC
[minimist@1.2.8](https://github.com/minimistjs/minimist) - MIT
[minipass@7.1.2](https://github.com/isaacs/minipass) - ISC
[minizlib@3.0.1](https://github.com/isaacs/minizlib) - MIT
[mkdirp@3.0.1](https://github.com/isaacs/node-mkdirp) - MIT
[mnemonist@0.38.3](https://github.com/yomguithereal/mnemonist) - MIT
[ms@2.1.2](https://github.com/zeit/ms) - MIT
[ms@2.1.3](https://github.com/vercel/ms) - MIT
[natural-compare@1.4.0](https://github.com/litejs/natural-compare-lite) - MIT
[node-domexception@1.0.0](https://github.com/jimmywarting/node-domexception) - MIT
[node-fetch@2.7.0](https://github.com/bitinn/node-fetch) - MIT
[node-int64@0.4.0](https://github.com/broofa/node-int64) - MIT
[node-releases@2.0.14](https://github.com/chicoxyzzy/node-releases) - MIT
[normalize-path@3.0.0](https://github.com/jonschlinkert/normalize-path) - MIT
[npm-run-path@4.0.1](https://github.com/sindresorhus/npm-run-path) - MIT
[obliterator@1.6.1](https://github.com/yomguithereal/obliterator) - MIT
[once@1.4.0](https://github.com/isaacs/once) - ISC
[onetime@5.1.2](https://github.com/sindresorhus/onetime) - MIT
[onnxruntime-common@1.19.2](https://github.com/Microsoft/onnxruntime) - MIT
[onnxruntime-common@1.20.0-dev.20241016-2b8fc5529b](https://github.com/Microsoft/onnxruntime) - MIT
[onnxruntime-node@1.19.2](https://github.com/Microsoft/onnxruntime) - MIT
[onnxruntime-web@1.21.0-dev.20241024-d9ca84ef96](https://github.com/Microsoft/onnxruntime) - MIT
[openai@4.29.2](https://github.com/openai/openai-node) - Apache-2.0
[optionator@0.9.3](https://github.com/gkz/optionator) - MIT
[p-limit@2.3.0](https://github.com/sindresorhus/p-limit) - MIT
[p-limit@3.1.0](https://github.com/sindresorhus/p-limit) - MIT
[p-locate@4.1.0](https://github.com/sindresorhus/p-locate) - MIT
[p-locate@5.0.0](https://github.com/sindresorhus/p-locate) - MIT
[p-try@2.2.0](https://github.com/sindresorhus/p-try) - MIT
[package-json-from-dist@1.0.1](https://github.com/isaacs/package-json-from-dist) - BlueOak-1.0.0
[parent-module@1.0.1](https://github.com/sindresorhus/parent-module) - MIT
[parse-json@5.2.0](https://github.com/sindresorhus/parse-json) - MIT
[path-exists@4.0.0](https://github.com/sindresorhus/path-exists) - MIT
[path-is-absolute@1.0.1](https://github.com/sindresorhus/path-is-absolute) - MIT
[path-key@3.1.1](https://github.com/sindresorhus/path-key) - MIT
[path-parse@1.0.7](https://github.com/jbgutierrez/path-parse) - MIT
[path-scurry@1.11.1](https://github.com/isaacs/path-scurry) - BlueOak-1.0.0
[path-type@4.0.0](https://github.com/sindresorhus/path-type) - MIT
[picocolors@1.0.0](https://github.com/alexeyraspopov/picocolors) - ISC
[picomatch@2.3.1](https://github.com/micromatch/picomatch) - MIT
[pirates@4.0.6](https://github.com/danez/pirates) - MIT
[pkg-dir@4.2.0](https://github.com/sindresorhus/pkg-dir) - MIT
[platform@1.3.6](https://github.com/bestiejs/platform.js) - MIT
[prelude-ls@1.2.1](https://github.com/gkz/prelude-ls) - MIT
[pretty-format@29.7.0](https://github.com/jestjs/jest) - MIT
[prompts@2.4.2](https://github.com/terkelg/prompts) - MIT
[protobufjs@7.4.0](https://github.com/protobufjs/protobuf.js) - BSD-3-Clause
[proxy-from-env@1.1.0](https://github.com/Rob--W/proxy-from-env) - MIT
[punycode.js@2.3.1](https://github.com/mathiasbynens/punycode.js) - MIT
[punycode@2.3.1](https://github.com/mathiasbynens/punycode.js) - MIT
[pure-rand@6.0.4](https://github.com/dubzzz/pure-rand) - MIT
[queue-microtask@1.2.3](https://github.com/feross/queue-microtask) - MIT
[react-is@18.2.0](https://github.com/facebook/react) - MIT
[rechoir@0.6.2](https://github.com/tkellen/node-rechoir) - MIT
[reflect-metadata@0.2.2](https://github.com/rbuckton/reflect-metadata) - Apache-2.0
[require-directory@2.1.1](https://github.com/troygoode/node-require-directory) - MIT
[resolve-cwd@3.0.0](https://github.com/sindresorhus/resolve-cwd) - MIT
[resolve-from@4.0.0](https://github.com/sindresorhus/resolve-from) - MIT
[resolve-from@5.0.0](https://github.com/sindresorhus/resolve-from) - MIT
[resolve.exports@2.0.2](https://github.com/lukeed/resolve.exports) - MIT
[resolve@1.22.8](https://github.com/browserify/resolve) - MIT
[reusify@1.0.4](https://github.com/mcollina/reusify) - MIT
[rimraf@3.0.2](https://github.com/isaacs/rimraf) - ISC
[rimraf@5.0.10](https://github.com/isaacs/rimraf) - ISC
[run-parallel@1.2.0](https://github.com/feross/run-parallel) - MIT
[semver@6.3.1](https://github.com/npm/node-semver) - ISC
[semver@7.6.3](https://github.com/npm/node-semver) - ISC
[sharp@0.33.5](https://github.com/lovell/sharp) - Apache-2.0
[shebang-command@2.0.0](https://github.com/kevva/shebang-command) - MIT
[shebang-regex@3.0.0](https://github.com/sindresorhus/shebang-regex) - MIT
[shelljs@0.8.5](https://github.com/shelljs/shelljs) - BSD-3-Clause
[shiki@1.10.3](https://github.com/shikijs/shiki) - MIT
[shx@0.3.4](https://github.com/shelljs/shx) - MIT
[signal-exit@3.0.7](https://github.com/tapjs/signal-exit) - ISC
[signal-exit@4.1.0](https://github.com/tapjs/signal-exit) - ISC
[simple-swizzle@0.2.2](https://github.com/qix-/node-simple-swizzle) - MIT
[sisteransi@1.0.5](https://github.com/terkelg/sisteransi) - MIT
[slash@3.0.0](https://github.com/sindresorhus/slash) - MIT
[source-map-support@0.5.13](https://github.com/evanw/node-source-map-support) - MIT
[source-map@0.6.1](https://github.com/mozilla/source-map) - BSD-3-Clause
[sprintf-js@1.0.3](https://github.com/alexei/sprintf.js) - BSD-3-Clause
[stack-utils@2.0.6](https://github.com/tapjs/stack-utils) - MIT
[stream-read-all@3.0.1](https://github.com/75lb/stream-read-all) - MIT
[string-length@4.0.2](https://github.com/sindresorhus/string-length) - MIT
[string-width@4.2.3](https://github.com/sindresorhus/string-width) - MIT
[string-width@5.1.2](https://github.com/sindresorhus/string-width) - MIT
[strip-ansi@6.0.1](https://github.com/chalk/strip-ansi) - MIT
[strip-ansi@7.1.0](https://github.com/chalk/strip-ansi) - MIT
[strip-bom@4.0.0](https://github.com/sindresorhus/strip-bom) - MIT
[strip-final-newline@2.0.0](https://github.com/sindresorhus/strip-final-newline) - MIT
[strip-json-comments@3.1.1](https://github.com/sindresorhus/strip-json-comments) - MIT
[strnum@1.0.5](https://github.com/NaturalIntelligence/strnum) - MIT
[supports-color@7.2.0](https://github.com/chalk/supports-color) - MIT
[supports-color@8.1.1](https://github.com/chalk/supports-color) - MIT
[supports-preserve-symlinks-flag@1.0.0](https://github.com/inspect-js/node-supports-preserve-symlinks-flag) - MIT
[table-layout@3.0.2](https://github.com/75lb/table-layout) - MIT
[tar@7.4.3](https://github.com/isaacs/node-tar) - ISC
[test-exclude@6.0.0](https://github.com/istanbuljs/test-exclude) - ISC
[text-table@0.2.0](https://github.com/substack/text-table) - MIT
[tmp@0.2.3](https://github.com/raszi/node-tmp) - MIT
[tmpl@1.0.5](https://github.com/daaku/nodejs-tmpl) - BSD-3-Clause
[to-regex-range@5.0.1](https://github.com/micromatch/to-regex-range) - MIT
[tr46@0.0.3](https://github.com/Sebmaster/tr46.js) - MIT
[ts-api-utils@1.0.3](https://github.com/JoshuaKGoldberg/ts-api-utils) - MIT
[ts-jest@29.1.2](https://github.com/kulshekhar/ts-jest) - MIT
[tslib@1.14.1](https://github.com/Microsoft/tslib) - 0BSD
[tslib@2.6.2](https://github.com/Microsoft/tslib) - 0BSD
[type-check@0.4.0](https://github.com/gkz/type-check) - MIT
[type-detect@4.0.8](https://github.com/chaijs/type-detect) - MIT
[type-fest@0.20.2](https://github.com/sindresorhus/type-fest) - (MIT OR CC0-1.0)
[type-fest@0.21.3](https://github.com/sindresorhus/type-fest) - (MIT OR CC0-1.0)
[typedoc-plugin-markdown@4.2.1](https://github.com/typedoc2md/typedoc-plugin-markdown) - MIT
[typedoc@0.26.4](https://github.com/TypeStrong/TypeDoc) - Apache-2.0
[typescript-eslint@7.1.0](https://github.com/typescript-eslint/typescript-eslint) - MIT
[typescript@5.5.4](https://github.com/Microsoft/TypeScript) - Apache-2.0
[typical@4.0.0](https://github.com/75lb/typical) - MIT
[typical@7.1.1](https://github.com/75lb/typical) - MIT
[uc.micro@2.1.0](https://github.com/markdown-it/uc.micro) - MIT
[undici-types@5.26.5](https://github.com/nodejs/undici) - MIT
[undici-types@6.19.8](https://github.com/nodejs/undici) - MIT
[update-browserslist-db@1.0.13](https://github.com/browserslist/update-db) - MIT
[uri-js@4.4.1](https://github.com/garycourt/uri-js) - BSD-2-Clause
[uuid@9.0.1](https://github.com/uuidjs/uuid) - MIT
[v8-to-istanbul@9.2.0](https://github.com/istanbuljs/v8-to-istanbul) - ISC
[walker@1.0.8](https://github.com/daaku/nodejs-walker) - Apache-2.0
[web-streams-polyfill@3.3.3](https://github.com/MattiasBuelens/web-streams-polyfill) - MIT
[web-streams-polyfill@4.0.0-beta.3](https://github.com/MattiasBuelens/web-streams-polyfill) - MIT
[webidl-conversions@3.0.1](https://github.com/jsdom/webidl-conversions) - BSD-2-Clause
[whatwg-url@5.0.0](https://github.com/jsdom/whatwg-url) - MIT
[which@2.0.2](https://github.com/isaacs/node-which) - ISC
[wordwrapjs@5.1.0](https://github.com/75lb/wordwrapjs) - MIT
[wrap-ansi@7.0.0](https://github.com/chalk/wrap-ansi) - MIT
[wrap-ansi@8.1.0](https://github.com/chalk/wrap-ansi) - MIT
[wrappy@1.0.2](https://github.com/npm/wrappy) - ISC
[write-file-atomic@4.0.2](https://github.com/npm/write-file-atomic) - ISC
[y18n@5.0.8](https://github.com/yargs/y18n) - ISC
[yallist@3.1.1](https://github.com/isaacs/yallist) - ISC
[yallist@5.0.0](https://github.com/isaacs/yallist) - BlueOak-1.0.0
[yaml@2.4.5](https://github.com/eemeli/yaml) - ISC
[yargs-parser@21.1.1](https://github.com/yargs/yargs-parser) - ISC
[yargs@17.7.2](https://github.com/yargs/yargs) - MIT
[yocto-queue@0.1.0](https://github.com/sindresorhus/yocto-queue) - MIT

File diff suppressed because it is too large Load Diff

View File

@@ -1697,6 +1697,65 @@ describe.each([arrow15, arrow16, arrow17, arrow18])(
expect(results2[0].text).toBe(data[1].text);
});
test("full text search fast search", async () => {
const db = await connect(tmpDir.name);
const data = [{ text: "hello world", vector: [0.1, 0.2, 0.3], id: 1 }];
const table = await db.createTable("test", data);
await table.createIndex("text", {
config: Index.fts(),
});
// Insert unindexed data after creating the index.
await table.add([{ text: "xyz", vector: [0.4, 0.5, 0.6], id: 2 }]);
const withFlatSearch = await table
.search("xyz", "fts")
.limit(10)
.toArray();
expect(withFlatSearch.length).toBeGreaterThan(0);
const fastSearchResults = await table
.search("xyz", "fts")
.fastSearch()
.limit(10)
.toArray();
expect(fastSearchResults.length).toBe(0);
const nearestToTextFastSearch = await table
.query()
.nearestToText("xyz")
.fastSearch()
.limit(10)
.toArray();
expect(nearestToTextFastSearch.length).toBe(0);
// fastSearch should be chainable with other methods.
const chainedFastSearch = await table
.search("xyz", "fts")
.fastSearch()
.select(["text"])
.limit(5)
.toArray();
expect(chainedFastSearch.length).toBe(0);
await table.optimize();
const indexedFastSearch = await table
.search("xyz", "fts")
.fastSearch()
.limit(10)
.toArray();
expect(indexedFastSearch.length).toBeGreaterThan(0);
const indexedNearestToTextFastSearch = await table
.query()
.nearestToText("xyz")
.fastSearch()
.limit(10)
.toArray();
expect(indexedNearestToTextFastSearch.length).toBeGreaterThan(0);
});
test("prewarm full text search index", async () => {
const db = await connect(tmpDir.name);
const data = [

View File

@@ -273,7 +273,9 @@ export async function connect(
let nativeProvider: NativeJsHeaderProvider | undefined;
if (finalHeaderProvider) {
if (typeof finalHeaderProvider === "function") {
nativeProvider = new NativeJsHeaderProvider(finalHeaderProvider);
nativeProvider = new NativeJsHeaderProvider(async () =>
finalHeaderProvider(),
);
} else if (
finalHeaderProvider &&
typeof finalHeaderProvider.getHeaders === "function"

View File

@@ -684,19 +684,17 @@ export class VectorQuery extends StandardQueryBase<NativeVectorQuery> {
rerank(reranker: Reranker): VectorQuery {
super.doCall((inner) =>
inner.rerank({
rerankHybrid: async (_, args) => {
const vecResults = await fromBufferToRecordBatch(args.vecResults);
const ftsResults = await fromBufferToRecordBatch(args.ftsResults);
const result = await reranker.rerankHybrid(
args.query,
vecResults as RecordBatch,
ftsResults as RecordBatch,
);
inner.rerank(async (args) => {
const vecResults = await fromBufferToRecordBatch(args.vecResults);
const ftsResults = await fromBufferToRecordBatch(args.ftsResults);
const result = await reranker.rerankHybrid(
args.query,
vecResults as RecordBatch,
ftsResults as RecordBatch,
);
const buffer = fromRecordBatchToBuffer(result);
return buffer;
},
const buffer = fromRecordBatchToBuffer(result);
return buffer;
}),
);

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",
@@ -8,5 +8,9 @@
"license": "Apache-2.0",
"engines": {
"node": ">= 18"
},
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,3 +0,0 @@
# `@lancedb/lancedb-darwin-x64`
This is the **x86_64-apple-darwin** binary for `@lancedb/lancedb`

View File

@@ -1,12 +0,0 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.25.0-beta.0",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",
"files": ["lancedb.darwin-x64.node"],
"license": "Apache-2.0",
"engines": {
"node": ">= 18"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",
@@ -9,5 +9,9 @@
"engines": {
"node": ">= 18"
},
"libc": ["glibc"]
"libc": ["glibc"],
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",
@@ -9,5 +9,9 @@
"engines": {
"node": ">= 18"
},
"libc": ["musl"]
"libc": ["musl"],
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",
@@ -9,5 +9,9 @@
"engines": {
"node": ">= 18"
},
"libc": ["glibc"]
"libc": ["glibc"],
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",
@@ -9,5 +9,9 @@
"engines": {
"node": ">= 18"
},
"libc": ["musl"]
"libc": ["musl"],
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": [
"win32"
],
@@ -14,5 +14,9 @@
"license": "Apache-2.0",
"engines": {
"node": ">= 18"
},
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",
@@ -8,5 +8,9 @@
"license": "Apache-2.0",
"engines": {
"node": ">= 18"
},
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
}
}

1781
nodejs/package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.25.0-beta.0",
"version": "0.27.0-beta.3",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
@@ -21,29 +21,29 @@
},
"types": "dist/index.d.ts",
"napi": {
"name": "lancedb",
"triples": {
"defaults": false,
"additional": [
"x86_64-apple-darwin",
"aarch64-apple-darwin",
"x86_64-unknown-linux-gnu",
"aarch64-unknown-linux-gnu",
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-musl",
"x86_64-pc-windows-msvc",
"aarch64-pc-windows-msvc"
]
}
"binaryName": "lancedb",
"targets": [
"aarch64-apple-darwin",
"x86_64-unknown-linux-gnu",
"aarch64-unknown-linux-gnu",
"x86_64-unknown-linux-musl",
"aarch64-unknown-linux-musl",
"x86_64-pc-windows-msvc",
"aarch64-pc-windows-msvc"
]
},
"license": "Apache-2.0",
"repository": {
"type": "git",
"url": "https://github.com/lancedb/lancedb"
},
"devDependencies": {
"@aws-sdk/client-dynamodb": "^3.33.0",
"@aws-sdk/client-kms": "^3.33.0",
"@aws-sdk/client-s3": "^3.33.0",
"@biomejs/biome": "^1.7.3",
"@jest/globals": "^29.7.0",
"@napi-rs/cli": "^2.18.3",
"@napi-rs/cli": "^3.5.1",
"@types/axios": "^0.14.0",
"@types/jest": "^29.1.2",
"@types/node": "^22.7.4",
@@ -72,9 +72,9 @@
"os": ["darwin", "linux", "win32"],
"scripts": {
"artifacts": "napi artifacts",
"build:debug": "napi build --platform --no-const-enum --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
"build:debug": "napi build --platform --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir lancedb",
"postbuild:debug": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
"build:release": "napi build --platform --no-const-enum --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
"build:release": "napi build --platform --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir dist",
"postbuild:release": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
"build": "npm run build:debug && npm run tsc",
"build-release": "npm run build:release && npm run tsc",
@@ -88,7 +88,7 @@
"prepublishOnly": "napi prepublish -t npm",
"test": "jest --verbose",
"integration": "S3_TEST=1 npm run test",
"universal": "napi universal",
"universal": "napi universalize",
"version": "napi version"
},
"dependencies": {

View File

@@ -8,11 +8,12 @@ use lancedb::database::{CreateTableMode, Database};
use napi::bindgen_prelude::*;
use napi_derive::*;
use crate::ConnectionOptions;
use crate::error::NapiErrorExt;
use crate::header::JsHeaderProvider;
use crate::table::Table;
use crate::ConnectionOptions;
use lancedb::connection::{ConnectBuilder, Connection as LanceDBConnection};
use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};
#[napi]

View File

@@ -1,20 +1,19 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use napi::{
bindgen_prelude::*,
threadsafe_function::{ErrorStrategy, ThreadsafeFunction},
};
use napi::{bindgen_prelude::*, threadsafe_function::ThreadsafeFunction};
use napi_derive::napi;
use std::collections::HashMap;
use std::sync::Arc;
type GetHeadersFn = ThreadsafeFunction<(), Promise<HashMap<String, String>>, (), Status, false>;
/// JavaScript HeaderProvider implementation that wraps a JavaScript callback.
/// This is the only native header provider - all header provider implementations
/// should provide a JavaScript function that returns headers.
#[napi]
pub struct JsHeaderProvider {
get_headers_fn: Arc<ThreadsafeFunction<(), ErrorStrategy::CalleeHandled>>,
get_headers_fn: Arc<GetHeadersFn>,
}
impl Clone for JsHeaderProvider {
@@ -29,9 +28,12 @@ impl Clone for JsHeaderProvider {
impl JsHeaderProvider {
/// Create a new JsHeaderProvider from a JavaScript callback
#[napi(constructor)]
pub fn new(get_headers_callback: JsFunction) -> Result<Self> {
pub fn new(
get_headers_callback: Function<(), Promise<HashMap<String, String>>>,
) -> Result<Self> {
let get_headers_fn = get_headers_callback
.create_threadsafe_function(0, |ctx| Ok(vec![ctx.value]))
.build_threadsafe_function()
.build()
.map_err(|e| {
Error::new(
Status::GenericFailure,
@@ -51,7 +53,7 @@ impl lancedb::remote::HeaderProvider for JsHeaderProvider {
async fn get_headers(&self) -> lancedb::error::Result<HashMap<String, String>> {
// Call the JavaScript function asynchronously
let promise: Promise<HashMap<String, String>> =
self.get_headers_fn.call_async(Ok(())).await.map_err(|e| {
self.get_headers_fn.call_async(()).await.map_err(|e| {
lancedb::error::Error::Runtime {
message: format!("Failed to call JavaScript get_headers: {}", e),
}

View File

@@ -3,12 +3,12 @@
use std::sync::Mutex;
use lancedb::index::Index as LanceDbIndex;
use lancedb::index::scalar::{BTreeIndexBuilder, FtsIndexBuilder};
use lancedb::index::vector::{
IvfFlatIndexBuilder, IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder,
IvfRqIndexBuilder,
};
use lancedb::index::Index as LanceDbIndex;
use napi_derive::napi;
use crate::util::parse_distance_type;

View File

@@ -60,7 +60,7 @@ pub struct OpenTableOptions {
pub storage_options: Option<HashMap<String, String>>,
}
#[napi::module_init]
#[napi_derive::module_init]
fn init() {
let env = Env::new()
.filter_or("LANCEDB_LOG", "warn")

View File

@@ -17,11 +17,11 @@ use lancedb::query::VectorQuery as LanceDbVectorQuery;
use napi::bindgen_prelude::*;
use napi_derive::napi;
use crate::error::convert_error;
use crate::error::NapiErrorExt;
use crate::error::convert_error;
use crate::iterator::RecordBatchIterator;
use crate::rerankers::RerankHybridCallbackArgs;
use crate::rerankers::Reranker;
use crate::rerankers::RerankerCallbacks;
use crate::util::{parse_distance_type, schema_to_buffer};
#[napi]
@@ -42,7 +42,7 @@ impl Query {
}
#[napi]
pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> {
pub fn full_text_search(&mut self, query: Object) -> napi::Result<()> {
let query = parse_fts_query(query)?;
self.inner = self.inner.clone().full_text_search(query);
Ok(())
@@ -235,7 +235,7 @@ impl VectorQuery {
}
#[napi]
pub fn full_text_search(&mut self, query: napi::JsObject) -> napi::Result<()> {
pub fn full_text_search(&mut self, query: Object) -> napi::Result<()> {
let query = parse_fts_query(query)?;
self.inner = self.inner.clone().full_text_search(query);
Ok(())
@@ -272,11 +272,13 @@ impl VectorQuery {
}
#[napi]
pub fn rerank(&mut self, callbacks: RerankerCallbacks) {
self.inner = self
.inner
.clone()
.rerank(Arc::new(Reranker::new(callbacks)));
pub fn rerank(
&mut self,
rerank_hybrid: Function<RerankHybridCallbackArgs, Promise<Buffer>>,
) -> napi::Result<()> {
let reranker = Reranker::new(rerank_hybrid)?;
self.inner = self.inner.clone().rerank(Arc::new(reranker));
Ok(())
}
#[napi(catch_unwind)]
@@ -523,12 +525,12 @@ impl JsFullTextQuery {
}
}
fn parse_fts_query(query: napi::JsObject) -> napi::Result<FullTextSearchQuery> {
if let Ok(Some(query)) = query.get::<_, &JsFullTextQuery>("query") {
fn parse_fts_query(query: Object) -> napi::Result<FullTextSearchQuery> {
if let Ok(Some(query)) = query.get::<&JsFullTextQuery>("query") {
Ok(FullTextSearchQuery::new_query(query.inner.clone()))
} else if let Ok(Some(query_text)) = query.get::<_, String>("query") {
} else if let Ok(Some(query_text)) = query.get::<String>("query") {
let mut query_text = query_text;
let columns = query.get::<_, Option<Vec<String>>>("columns")?.flatten();
let columns = query.get::<Option<Vec<String>>>("columns")?.flatten();
let is_phrase =
query_text.len() >= 2 && query_text.starts_with('"') && query_text.ends_with('"');
@@ -549,15 +551,12 @@ fn parse_fts_query(query: napi::JsObject) -> napi::Result<FullTextSearchQuery> {
}
};
let mut query = FullTextSearchQuery::new_query(query);
if let Some(cols) = columns {
if !cols.is_empty() {
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!(
"Failed to set full text search columns: {}",
e
))
})?;
}
if let Some(cols) = columns
&& !cols.is_empty()
{
query = query.with_columns(&cols).map_err(|e| {
napi::Error::from_reason(format!("Failed to set full text search columns: {}", e))
})?;
}
Ok(query)
} else {

View File

@@ -3,10 +3,7 @@
use arrow_array::RecordBatch;
use async_trait::async_trait;
use napi::{
bindgen_prelude::*,
threadsafe_function::{ErrorStrategy, ThreadsafeFunction},
};
use napi::{bindgen_prelude::*, threadsafe_function::ThreadsafeFunction};
use napi_derive::napi;
use lancedb::ipc::batches_to_ipc_file;
@@ -15,27 +12,28 @@ use lancedb::{error::Error, ipc::ipc_file_to_batches};
use crate::error::NapiErrorExt;
type RerankHybridFn = ThreadsafeFunction<
RerankHybridCallbackArgs,
Promise<Buffer>,
RerankHybridCallbackArgs,
Status,
false,
>;
/// Reranker implementation that "wraps" a NodeJS Reranker implementation.
/// This contains references to the callbacks that can be used to invoke the
/// reranking methods on the NodeJS implementation and handles serializing the
/// record batches to Arrow IPC buffers.
#[napi]
pub struct Reranker {
/// callback to the Javascript which will call the rerankHybrid method of
/// some Reranker implementation
rerank_hybrid: ThreadsafeFunction<RerankHybridCallbackArgs, ErrorStrategy::CalleeHandled>,
rerank_hybrid: RerankHybridFn,
}
#[napi]
impl Reranker {
#[napi]
pub fn new(callbacks: RerankerCallbacks) -> Self {
let rerank_hybrid = callbacks
.rerank_hybrid
.create_threadsafe_function(0, move |ctx| Ok(vec![ctx.value]))
.unwrap();
Self { rerank_hybrid }
pub fn new(
rerank_hybrid: Function<RerankHybridCallbackArgs, Promise<Buffer>>,
) -> napi::Result<Self> {
let rerank_hybrid = rerank_hybrid.build_threadsafe_function().build()?;
Ok(Self { rerank_hybrid })
}
}
@@ -49,16 +47,16 @@ impl lancedb::rerankers::Reranker for Reranker {
) -> lancedb::error::Result<RecordBatch> {
let callback_args = RerankHybridCallbackArgs {
query: query.to_string(),
vec_results: batches_to_ipc_file(&[vector_results])?,
fts_results: batches_to_ipc_file(&[fts_results])?,
vec_results: Buffer::from(batches_to_ipc_file(&[vector_results])?.as_ref()),
fts_results: Buffer::from(batches_to_ipc_file(&[fts_results])?.as_ref()),
};
let promised_buffer: Promise<Buffer> = self
.rerank_hybrid
.call_async(Ok(callback_args))
.call_async(callback_args)
.await
.map_err(|e| Error::Runtime {
message: format!("napi error status={}, reason={}", e.status, e.reason),
})?;
message: format!("napi error status={}, reason={}", e.status, e.reason),
})?;
let buffer = promised_buffer.await.map_err(|e| Error::Runtime {
message: format!("napi error status={}, reason={}", e.status, e.reason),
})?;
@@ -77,16 +75,11 @@ impl std::fmt::Debug for Reranker {
}
}
#[napi(object)]
pub struct RerankerCallbacks {
pub rerank_hybrid: JsFunction,
}
#[napi(object)]
pub struct RerankHybridCallbackArgs {
pub query: String,
pub vec_results: Vec<u8>,
pub fts_results: Vec<u8>,
pub vec_results: Buffer,
pub fts_results: Buffer,
}
fn buffer_to_record_batch(buffer: Buffer) -> Result<RecordBatch> {

View File

@@ -95,8 +95,7 @@ impl napi::bindgen_prelude::FromNapiValue for Session {
napi_val: napi::sys::napi_value,
) -> napi::Result<Self> {
let object: napi::bindgen_prelude::ClassInstance<Self> =
napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)?;
let copy = object.clone();
Ok(copy)
unsafe { napi::bindgen_prelude::ClassInstance::from_napi_value(env, napi_val)? };
Ok((*object).clone())
}
}

View File

@@ -71,6 +71,17 @@ impl Table {
pub async fn add(&self, buf: Buffer, mode: String) -> napi::Result<AddResult> {
let batches = ipc_file_to_batches(buf.to_vec())
.map_err(|e| napi::Error::from_reason(format!("Failed to read IPC file: {}", e)))?;
let batches = batches
.into_iter()
.map(|batch| {
batch.map_err(|e| {
napi::Error::from_reason(format!(
"Failed to read record batch from IPC file: {}",
e
))
})
})
.collect::<Result<Vec<_>>>()?;
let mut op = self.inner_ref()?.add(batches);
op = if mode == "append" {
@@ -742,12 +753,14 @@ impl From<lancedb::table::AddResult> for AddResult {
#[napi(object)]
pub struct DeleteResult {
pub num_deleted_rows: i64,
pub version: i64,
}
impl From<lancedb::table::DeleteResult> for DeleteResult {
fn from(value: lancedb::table::DeleteResult) -> Self {
Self {
num_deleted_rows: value.num_deleted_rows as i64,
version: value.version as i64,
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.28.0-beta.0"
current_version = "0.30.0-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,13 +1,13 @@
[package]
name = "lancedb-python"
version = "0.27.0"
version = "0.30.0-beta.4"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
repository.workspace = true
keywords.workspace = true
categories.workspace = true
rust-version = "1.88.0"
rust-version = "1.91.0"
[lib]
name = "_lancedb"
@@ -16,9 +16,11 @@ crate-type = ["cdylib"]
[dependencies]
arrow = { version = "57.2", features = ["pyarrow"] }
async-trait = "0.1"
bytes = "1"
lancedb = { path = "../rust/lancedb", default-features = false }
lance-core.workspace = true
lance-namespace.workspace = true
lance-namespace-impls.workspace = true
lance-io.workspace = true
env_logger.workspace = true
pyo3 = { version = "0.26", features = ["extension-module", "abi3-py39"] }
@@ -28,6 +30,8 @@ pyo3-async-runtimes = { version = "0.26", features = [
] }
pin-project = "1.1.5"
futures.workspace = true
serde = "1"
serde_json = "1"
snafu.workspace = true
tokio = { version = "1.40", features = ["sync"] }

View File

@@ -0,0 +1,206 @@
| Name | Version | License | URL |
|--------------------------------|-----------------|--------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
| InstructorEmbedding | 1.0.1 | Apache License 2.0 | https://github.com/HKUNLP/instructor-embedding |
| Jinja2 | 3.1.6 | BSD License | https://github.com/pallets/jinja/ |
| Markdown | 3.10.2 | BSD-3-Clause | https://Python-Markdown.github.io/ |
| MarkupSafe | 3.0.3 | BSD-3-Clause | https://github.com/pallets/markupsafe/ |
| PyJWT | 2.11.0 | MIT | https://github.com/jpadilla/pyjwt |
| PyYAML | 6.0.3 | MIT License | https://pyyaml.org/ |
| Pygments | 2.19.2 | BSD License | https://pygments.org |
| accelerate | 1.12.0 | Apache Software License | https://github.com/huggingface/accelerate |
| adlfs | 2026.2.0 | BSD License | UNKNOWN |
| aiohappyeyeballs | 2.6.1 | Python Software Foundation License | https://github.com/aio-libs/aiohappyeyeballs |
| aiohttp | 3.13.3 | Apache-2.0 AND MIT | https://github.com/aio-libs/aiohttp |
| aiosignal | 1.4.0 | Apache Software License | https://github.com/aio-libs/aiosignal |
| annotated-types | 0.7.0 | MIT License | https://github.com/annotated-types/annotated-types |
| anyio | 4.12.1 | MIT | https://anyio.readthedocs.io/en/stable/versionhistory.html |
| appnope | 0.1.4 | BSD License | http://github.com/minrk/appnope |
| asttokens | 3.0.1 | Apache 2.0 | https://github.com/gristlabs/asttokens |
| attrs | 25.4.0 | MIT | https://www.attrs.org/en/stable/changelog.html |
| awscli | 1.44.35 | Apache Software License | http://aws.amazon.com/cli/ |
| azure-core | 1.38.0 | MIT License | https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/core/azure-core |
| azure-datalake-store | 0.0.53 | MIT License | https://github.com/Azure/azure-data-lake-store-python |
| azure-identity | 1.25.1 | MIT | https://github.com/Azure/azure-sdk-for-python |
| azure-storage-blob | 12.28.0 | MIT License | https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/storage/azure-storage-blob |
| babel | 2.18.0 | BSD License | https://babel.pocoo.org/ |
| backrefs | 6.1 | MIT | https://github.com/facelessuser/backrefs |
| beautifulsoup4 | 4.14.3 | MIT License | https://www.crummy.com/software/BeautifulSoup/bs4/ |
| bleach | 6.3.0 | Apache Software License | https://github.com/mozilla/bleach |
| boto3 | 1.42.45 | Apache-2.0 | https://github.com/boto/boto3 |
| botocore | 1.42.45 | Apache-2.0 | https://github.com/boto/botocore |
| cachetools | 7.0.0 | MIT | https://github.com/tkem/cachetools/ |
| certifi | 2026.1.4 | Mozilla Public License 2.0 (MPL 2.0) | https://github.com/certifi/python-certifi |
| cffi | 2.0.0 | MIT | https://cffi.readthedocs.io/en/latest/whatsnew.html |
| cfgv | 3.5.0 | MIT | https://github.com/asottile/cfgv |
| charset-normalizer | 3.4.4 | MIT | https://github.com/jawah/charset_normalizer/blob/master/CHANGELOG.md |
| click | 8.3.1 | BSD-3-Clause | https://github.com/pallets/click/ |
| cohere | 5.20.4 | MIT License | https://github.com/cohere-ai/cohere-python |
| colorama | 0.4.6 | BSD License | https://github.com/tartley/colorama |
| colpali_engine | 0.3.13 | MIT License | https://github.com/illuin-tech/colpali |
| comm | 0.2.3 | BSD License | https://github.com/ipython/comm |
| cryptography | 46.0.4 | Apache-2.0 OR BSD-3-Clause | https://github.com/pyca/cryptography |
| datafusion | 51.0.0 | Apache Software License | https://datafusion.apache.org/python |
| debugpy | 1.8.20 | MIT License | https://aka.ms/debugpy |
| decorator | 5.2.1 | BSD License | UNKNOWN |
| defusedxml | 0.7.1 | Python Software Foundation License | https://github.com/tiran/defusedxml |
| deprecation | 2.1.0 | Apache Software License | http://deprecation.readthedocs.io/ |
| distlib | 0.4.0 | Python Software Foundation License | https://github.com/pypa/distlib |
| distro | 1.9.0 | Apache Software License | https://github.com/python-distro/distro |
| docutils | 0.19 | BSD License; GNU General Public License (GPL); Public Domain; Python Software Foundation License | https://docutils.sourceforge.io/ |
| duckdb | 1.4.4 | MIT License | https://github.com/duckdb/duckdb-python |
| executing | 2.2.1 | MIT License | https://github.com/alexmojaki/executing |
| fastavro | 1.12.1 | MIT | https://github.com/fastavro/fastavro |
| fastjsonschema | 2.21.2 | BSD License | https://github.com/horejsek/python-fastjsonschema |
| filelock | 3.20.3 | Unlicense | https://github.com/tox-dev/py-filelock |
| frozenlist | 1.8.0 | Apache-2.0 | https://github.com/aio-libs/frozenlist |
| fsspec | 2026.2.0 | BSD-3-Clause | https://github.com/fsspec/filesystem_spec |
| ftfy | 6.3.1 | Apache-2.0 | https://ftfy.readthedocs.io/en/latest/ |
| ghp-import | 2.1.0 | Apache Software License | https://github.com/c-w/ghp-import |
| google-ai-generativelanguage | 0.6.15 | Apache Software License | https://github.com/googleapis/google-cloud-python/tree/main/packages/google-ai-generativelanguage |
| google-api-core | 2.25.2 | Apache Software License | https://github.com/googleapis/python-api-core |
| google-api-python-client | 2.189.0 | Apache Software License | https://github.com/googleapis/google-api-python-client/ |
| google-auth | 2.48.0 | Apache Software License | https://github.com/googleapis/google-auth-library-python |
| google-auth-httplib2 | 0.3.0 | Apache Software License | https://github.com/GoogleCloudPlatform/google-auth-library-python-httplib2 |
| google-generativeai | 0.8.6 | Apache Software License | https://github.com/google/generative-ai-python |
| googleapis-common-protos | 1.72.0 | Apache Software License | https://github.com/googleapis/google-cloud-python/tree/main/packages/googleapis-common-protos |
| griffe | 2.0.0 | ISC | https://mkdocstrings.github.io/griffe |
| griffecli | 2.0.0 | ISC | UNKNOWN |
| griffelib | 2.0.0 | ISC | UNKNOWN |
| grpcio | 1.78.0 | Apache-2.0 | https://grpc.io |
| grpcio-status | 1.71.2 | Apache Software License | https://grpc.io |
| h11 | 0.16.0 | MIT License | https://github.com/python-hyper/h11 |
| hf-xet | 1.2.0 | Apache-2.0 | https://github.com/huggingface/xet-core |
| httpcore | 1.0.9 | BSD-3-Clause | https://www.encode.io/httpcore/ |
| httplib2 | 0.31.2 | MIT License | https://github.com/httplib2/httplib2 |
| httpx | 0.28.1 | BSD License | https://github.com/encode/httpx |
| huggingface_hub | 0.36.2 | Apache Software License | https://github.com/huggingface/huggingface_hub |
| ibm-cos-sdk | 2.14.3 | Apache Software License | https://github.com/ibm/ibm-cos-sdk-python |
| ibm-cos-sdk-core | 2.14.3 | Apache Software License | https://github.com/ibm/ibm-cos-sdk-python-core |
| ibm-cos-sdk-s3transfer | 2.14.3 | Apache Software License | https://github.com/IBM/ibm-cos-sdk-python-s3transfer |
| ibm_watsonx_ai | 1.5.1 | BSD License | https://ibm.github.io/watsonx-ai-python-sdk/changelog.html |
| identify | 2.6.16 | MIT | https://github.com/pre-commit/identify |
| idna | 3.11 | BSD-3-Clause | https://github.com/kjd/idna |
| iniconfig | 2.3.0 | MIT | https://github.com/pytest-dev/iniconfig |
| ipykernel | 6.31.0 | BSD-3-Clause | https://ipython.org |
| ipython | 9.10.0 | BSD-3-Clause | https://ipython.org |
| ipython_pygments_lexers | 1.1.1 | BSD License | https://github.com/ipython/ipython-pygments-lexers |
| isodate | 0.7.2 | BSD License | https://github.com/gweis/isodate/ |
| jedi | 0.19.2 | MIT License | https://github.com/davidhalter/jedi |
| jiter | 0.13.0 | MIT License | https://github.com/pydantic/jiter/ |
| jmespath | 1.0.1 | MIT License | https://github.com/jmespath/jmespath.py |
| joblib | 1.5.3 | BSD-3-Clause | https://joblib.readthedocs.io |
| jsonschema | 4.26.0 | MIT | https://github.com/python-jsonschema/jsonschema |
| jsonschema-specifications | 2025.9.1 | MIT | https://github.com/python-jsonschema/jsonschema-specifications |
| jupyter_client | 8.8.0 | BSD License | https://jupyter.org |
| jupyter_core | 5.9.1 | BSD-3-Clause | https://jupyter.org |
| jupyterlab_pygments | 0.3.0 | BSD License | https://github.com/jupyterlab/jupyterlab_pygments |
| jupytext | 1.19.1 | MIT License | https://github.com/mwouts/jupytext |
| lance-namespace | 0.4.5 | Apache-2.0 | https://github.com/lance-format/lance-namespace |
| lance-namespace-urllib3-client | 0.4.5 | Apache-2.0 | https://github.com/lance-format/lance-namespace |
| lancedb | 0.29.2 | Apache Software License | https://github.com/lancedb/lancedb |
| lomond | 0.3.3 | BSD License | https://github.com/wildfoundry/dataplicity-lomond |
| markdown-it-py | 4.0.0 | MIT License | https://github.com/executablebooks/markdown-it-py |
| matplotlib-inline | 0.2.1 | UNKNOWN | https://github.com/ipython/matplotlib-inline |
| mdit-py-plugins | 0.5.0 | MIT License | https://github.com/executablebooks/mdit-py-plugins |
| mdurl | 0.1.2 | MIT License | https://github.com/executablebooks/mdurl |
| mergedeep | 1.3.4 | MIT License | https://github.com/clarketm/mergedeep |
| mistune | 3.2.0 | BSD License | https://github.com/lepture/mistune |
| mkdocs | 1.6.1 | BSD-2-Clause | https://github.com/mkdocs/mkdocs |
| mkdocs-autorefs | 1.4.3 | ISC | https://mkdocstrings.github.io/autorefs |
| mkdocs-get-deps | 0.2.0 | MIT | https://github.com/mkdocs/get-deps |
| mkdocs-jupyter | 0.25.1 | Apache-2.0 | https://github.com/danielfrg/mkdocs-jupyter |
| mkdocs-material | 9.7.1 | MIT | https://github.com/squidfunk/mkdocs-material |
| mkdocs-material-extensions | 1.3.1 | MIT | https://github.com/facelessuser/mkdocs-material-extensions |
| mkdocstrings | 1.0.3 | ISC | https://mkdocstrings.github.io |
| mkdocstrings-python | 2.0.2 | ISC | https://mkdocstrings.github.io/python |
| mpmath | 1.3.0 | BSD License | http://mpmath.org/ |
| msal | 1.34.0 | MIT License | https://github.com/AzureAD/microsoft-authentication-library-for-python |
| msal-extensions | 1.3.1 | MIT License | https://github.com/AzureAD/microsoft-authentication-extensions-for-python/releases |
| multidict | 6.7.1 | Apache License 2.0 | https://github.com/aio-libs/multidict |
| nbclient | 0.10.4 | BSD License | https://jupyter.org |
| nbconvert | 7.17.0 | BSD License | https://jupyter.org |
| nbformat | 5.10.4 | BSD License | https://jupyter.org |
| nest-asyncio | 1.6.0 | BSD License | https://github.com/erdewit/nest_asyncio |
| networkx | 3.6.1 | BSD-3-Clause | https://networkx.org/ |
| nodeenv | 1.10.0 | BSD License | https://github.com/ekalinin/nodeenv |
| numpy | 2.4.2 | BSD-3-Clause AND 0BSD AND MIT AND Zlib AND CC0-1.0 | https://numpy.org |
| ollama | 0.6.1 | MIT | https://ollama.com |
| open_clip_torch | 3.2.0 | MIT License | https://github.com/mlfoundations/open_clip |
| openai | 2.18.0 | Apache Software License | https://github.com/openai/openai-python |
| packaging | 26.0 | Apache-2.0 OR BSD-2-Clause | https://github.com/pypa/packaging |
| paginate | 0.5.7 | MIT License | https://github.com/Signum/paginate |
| pandas | 2.3.3 | BSD License | https://pandas.pydata.org |
| pandocfilters | 1.5.1 | BSD License | http://github.com/jgm/pandocfilters |
| parso | 0.8.6 | MIT License | https://github.com/davidhalter/parso |
| pathspec | 1.0.4 | Mozilla Public License 2.0 (MPL 2.0) | UNKNOWN |
| peft | 0.17.1 | Apache Software License | https://github.com/huggingface/peft |
| pexpect | 4.9.0 | ISC License (ISCL) | https://pexpect.readthedocs.io/ |
| pillow | 12.1.0 | MIT-CMU | https://python-pillow.github.io |
| platformdirs | 4.5.1 | MIT | https://github.com/tox-dev/platformdirs |
| pluggy | 1.6.0 | MIT License | UNKNOWN |
| polars | 1.3.0 | MIT License | https://www.pola.rs/ |
| pre_commit | 4.5.1 | MIT | https://github.com/pre-commit/pre-commit |
| prompt_toolkit | 3.0.52 | BSD License | https://github.com/prompt-toolkit/python-prompt-toolkit |
| propcache | 0.4.1 | Apache Software License | https://github.com/aio-libs/propcache |
| proto-plus | 1.27.1 | Apache Software License | https://github.com/googleapis/proto-plus-python |
| protobuf | 5.29.6 | 3-Clause BSD License | https://developers.google.com/protocol-buffers/ |
| psutil | 7.2.2 | BSD-3-Clause | https://github.com/giampaolo/psutil |
| ptyprocess | 0.7.0 | ISC License (ISCL) | https://github.com/pexpect/ptyprocess |
| pure_eval | 0.2.3 | MIT License | http://github.com/alexmojaki/pure_eval |
| pyarrow | 23.0.0 | Apache-2.0 | https://arrow.apache.org/ |
| pyarrow-stubs | 20.0.0.20251215 | BSD-2-Clause | https://github.com/zen-xu/pyarrow-stubs |
| pyasn1 | 0.6.2 | BSD-2-Clause | https://github.com/pyasn1/pyasn1 |
| pyasn1_modules | 0.4.2 | BSD License | https://github.com/pyasn1/pyasn1-modules |
| pycparser | 3.0 | BSD-3-Clause | https://github.com/eliben/pycparser |
| pydantic | 2.12.5 | MIT | https://github.com/pydantic/pydantic |
| pydantic_core | 2.41.5 | MIT | https://github.com/pydantic/pydantic-core |
| pylance | 2.0.0 | Apache Software License | UNKNOWN |
| pymdown-extensions | 10.20.1 | MIT | https://github.com/facelessuser/pymdown-extensions |
| pyparsing | 3.3.2 | MIT | https://github.com/pyparsing/pyparsing/ |
| pyright | 1.1.408 | MIT | https://github.com/RobertCraigie/pyright-python |
| pytest | 9.0.2 | MIT | https://docs.pytest.org/en/latest/ |
| pytest-asyncio | 1.3.0 | Apache-2.0 | https://github.com/pytest-dev/pytest-asyncio |
| pytest-mock | 3.15.1 | MIT License | https://github.com/pytest-dev/pytest-mock/ |
| python-dateutil | 2.9.0.post0 | Apache Software License; BSD License | https://github.com/dateutil/dateutil |
| pytz | 2025.2 | MIT License | http://pythonhosted.org/pytz |
| pyyaml_env_tag | 1.1 | MIT | https://github.com/waylan/pyyaml-env-tag |
| pyzmq | 27.1.0 | BSD License | https://pyzmq.readthedocs.org |
| referencing | 0.37.0 | MIT | https://github.com/python-jsonschema/referencing |
| regex | 2026.1.15 | Apache-2.0 AND CNRI-Python | https://github.com/mrabarnett/mrab-regex |
| requests | 2.32.5 | Apache Software License | https://requests.readthedocs.io |
| rpds-py | 0.30.0 | MIT | https://github.com/crate-py/rpds |
| rsa | 4.7.2 | Apache Software License | https://stuvel.eu/rsa |
| ruff | 0.15.0 | MIT License | https://docs.astral.sh/ruff |
| s3transfer | 0.16.0 | Apache Software License | https://github.com/boto/s3transfer |
| safetensors | 0.7.0 | Apache Software License | https://github.com/huggingface/safetensors |
| scikit-learn | 1.8.0 | BSD-3-Clause | https://scikit-learn.org |
| scipy | 1.17.0 | BSD License | https://scipy.org/ |
| sentence-transformers | 5.2.2 | Apache Software License | https://www.SBERT.net |
| sentencepiece | 0.2.1 | UNKNOWN | https://github.com/google/sentencepiece |
| six | 1.17.0 | MIT License | https://github.com/benjaminp/six |
| sniffio | 1.3.1 | Apache Software License; MIT License | https://github.com/python-trio/sniffio |
| soupsieve | 2.8.3 | MIT | https://github.com/facelessuser/soupsieve |
| stack-data | 0.6.3 | MIT License | http://github.com/alexmojaki/stack_data |
| sympy | 1.14.0 | BSD License | https://sympy.org |
| tabulate | 0.9.0 | MIT License | https://github.com/astanin/python-tabulate |
| tantivy | 0.25.1 | UNKNOWN | UNKNOWN |
| threadpoolctl | 3.6.0 | BSD License | https://github.com/joblib/threadpoolctl |
| timm | 1.0.24 | Apache Software License | https://github.com/huggingface/pytorch-image-models |
| tinycss2 | 1.4.0 | BSD License | https://www.courtbouillon.org/tinycss2 |
| tokenizers | 0.22.2 | Apache Software License | https://github.com/huggingface/tokenizers |
| torch | 2.8.0 | BSD License | https://pytorch.org/ |
| torchvision | 0.23.0 | BSD | https://github.com/pytorch/vision |
| tornado | 6.5.4 | Apache Software License | http://www.tornadoweb.org/ |
| tqdm | 4.67.3 | MPL-2.0 AND MIT | https://tqdm.github.io |
| traitlets | 5.14.3 | BSD License | https://github.com/ipython/traitlets |
| transformers | 4.57.6 | Apache Software License | https://github.com/huggingface/transformers |
| types-requests | 2.32.4.20260107 | Apache-2.0 | https://github.com/python/typeshed |
| typing-inspection | 0.4.2 | MIT | https://github.com/pydantic/typing-inspection |
| typing_extensions | 4.15.0 | PSF-2.0 | https://github.com/python/typing_extensions |
| tzdata | 2025.3 | Apache-2.0 | https://github.com/python/tzdata |
| uritemplate | 4.2.0 | BSD 3-Clause OR Apache-2.0 | https://uritemplate.readthedocs.org |
| urllib3 | 2.6.3 | MIT | https://github.com/urllib3/urllib3/blob/main/CHANGES.rst |
| virtualenv | 20.36.1 | MIT | https://github.com/pypa/virtualenv |
| watchdog | 6.0.0 | Apache Software License | https://github.com/gorakhargosh/watchdog |
| webencodings | 0.5.1 | BSD License | https://github.com/SimonSapin/python-webencodings |
| yarl | 1.22.0 | Apache Software License | https://github.com/aio-libs/yarl |

File diff suppressed because it is too large Load Diff

View File

@@ -45,7 +45,7 @@ repository = "https://github.com/lancedb/lancedb"
[project.optional-dependencies]
pylance = [
"pylance>=1.0.0b14",
"pylance>=4.0.0b7",
]
tests = [
"aiohttp",
@@ -59,9 +59,9 @@ tests = [
"polars>=0.19, <=1.3.0",
"tantivy",
"pyarrow-stubs",
"pylance>=1.0.0b14",
"pylance>=4.0.0b7",
"requests",
"datafusion",
"datafusion>=52,<53",
]
dev = [
"ruff",

View File

@@ -1,8 +1,10 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from functools import singledispatch
from typing import List, Optional, Tuple, Union
from lancedb.pydantic import LanceModel, model_to_dict
import pyarrow as pa
from ._lancedb import RecordBatchStream
@@ -80,3 +82,32 @@ def peek_reader(
yield from reader
return batch, pa.RecordBatchReader.from_batches(batch.schema, all_batches())
@singledispatch
def to_arrow(data) -> pa.Table:
"""Convert a single data object to a pa.Table."""
raise NotImplementedError(f"to_arrow not implemented for type {type(data)}")
@to_arrow.register(pa.RecordBatch)
def _arrow_from_batch(data: pa.RecordBatch) -> pa.Table:
return pa.Table.from_batches([data])
@to_arrow.register(pa.Table)
def _arrow_from_table(data: pa.Table) -> pa.Table:
return data
@to_arrow.register(list)
def _arrow_from_list(data: list) -> pa.Table:
if not data:
raise ValueError("Cannot create table from empty list without a schema")
if isinstance(data[0], LanceModel):
schema = data[0].__class__.to_arrow_schema()
dicts = [model_to_dict(d) for d in data]
return pa.Table.from_pylist(dicts, schema=schema)
return pa.Table.from_pylist(data)

View File

@@ -8,7 +8,7 @@ from abc import abstractmethod
from datetime import timedelta
from pathlib import Path
import sys
from typing import TYPE_CHECKING, Dict, Iterable, List, Literal, Optional, Union
from typing import TYPE_CHECKING, Any, Dict, Iterable, List, Literal, Optional, Union
if sys.version_info >= (3, 12):
from typing import override
@@ -1541,6 +1541,8 @@ class AsyncConnection(object):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
) -> AsyncTable:
"""Open a Lance Table in the database.
@@ -1573,6 +1575,9 @@ class AsyncConnection(object):
The explicit location (URI) of the table. If provided, the table will be
opened from this location instead of deriving it from the database URI
and table name.
managed_versioning: bool, optional
Whether managed versioning is enabled for this table. If provided,
avoids a redundant describe_table call when namespace_client is set.
Returns
-------
@@ -1587,6 +1592,8 @@ class AsyncConnection(object):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
return AsyncTable(table)

View File

@@ -2,6 +2,7 @@
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
import warnings
from typing import List, Union
import numpy as np
@@ -15,6 +16,8 @@ from .utils import weak_lru
@register("gte-text")
class GteEmbeddings(TextEmbeddingFunction):
"""
Deprecated: GTE embeddings should be used through sentence-transformers.
An embedding function that uses GTE-LARGE MLX format(for Apple silicon devices only)
as well as the standard cpu/gpu version from: https://huggingface.co/thenlper/gte-large.
@@ -61,6 +64,13 @@ class GteEmbeddings(TextEmbeddingFunction):
def __init__(self, **kwargs):
super().__init__(**kwargs)
warnings.warn(
"GTE embeddings as a standalone embedding function are deprecated. "
"Use the 'sentence-transformers' embedding function with a GTE model "
"instead.",
DeprecationWarning,
stacklevel=3,
)
self._ndims = None
if kwargs:
self.mlx = kwargs.get("mlx", False)

View File

@@ -110,6 +110,9 @@ class OpenAIEmbeddings(TextEmbeddingFunction):
valid_embeddings = {
idx: v.embedding for v, idx in zip(rs.data, valid_indices)
}
except openai.AuthenticationError:
logging.error("Authentication failed: Invalid API key provided")
raise
except openai.BadRequestError:
logging.exception("Bad request: %s", texts)
return [None] * len(texts)

View File

@@ -6,6 +6,7 @@ import io
import os
from typing import TYPE_CHECKING, List, Union
import urllib.parse as urlparse
import warnings
import numpy as np
import pyarrow as pa
@@ -24,6 +25,7 @@ if TYPE_CHECKING:
@register("siglip")
class SigLipEmbeddings(EmbeddingFunction):
# Deprecated: prefer CLIP embeddings via `open-clip`.
model_name: str = "google/siglip-base-patch16-224"
device: str = "cpu"
batch_size: int = 64
@@ -36,6 +38,12 @@ class SigLipEmbeddings(EmbeddingFunction):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
warnings.warn(
"SigLip embeddings are deprecated. Use CLIP embeddings via the "
"'open-clip' embedding function instead.",
DeprecationWarning,
stacklevel=3,
)
transformers = attempt_import_or_raise("transformers")
self._torch = attempt_import_or_raise("torch")

View File

@@ -269,6 +269,11 @@ def retry_with_exponential_backoff(
# and say that it is assumed that if this portion errors out, it's due
# to rate limit but the user should check the error message to be sure.
except Exception as e: # noqa: PERF203
# Don't retry on authentication errors (e.g., OpenAI 401)
# These are permanent failures that won't be fixed by retrying
if _is_non_retryable_error(e):
raise
num_retries += 1
if num_retries > max_retries:
@@ -289,6 +294,29 @@ def retry_with_exponential_backoff(
return wrapper
def _is_non_retryable_error(error: Exception) -> bool:
"""Check if an error should not be retried.
Args:
error: The exception to check
Returns:
True if the error should not be retried, False otherwise
"""
# Check for OpenAI authentication errors
error_type = type(error).__name__
if error_type == "AuthenticationError":
return True
# Check for other common non-retryable HTTP status codes
# 401 Unauthorized, 403 Forbidden
if hasattr(error, "status_code"):
if error.status_code in (401, 403):
return True
return False
def url_retrieve(url: str):
"""
Parameters

View File

@@ -12,7 +12,7 @@ from __future__ import annotations
import asyncio
import sys
from typing import Dict, Iterable, List, Optional, Union
from typing import Any, Dict, Iterable, List, Optional, Union
if sys.version_info >= (3, 12):
from typing import override
@@ -44,7 +44,7 @@ from lance_namespace import (
ListNamespacesRequest,
CreateNamespaceRequest,
DropNamespaceRequest,
CreateEmptyTableRequest,
DeclareTableRequest,
)
from lancedb.table import AsyncTable, LanceTable, Table
from lancedb.util import validate_table_name
@@ -240,7 +240,7 @@ class LanceNamespaceDBConnection(DBConnection):
session : Optional[Session]
A session to use for this connection
"""
self._ns = namespace
self._namespace_client = namespace
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
@@ -269,7 +269,7 @@ class LanceNamespaceDBConnection(DBConnection):
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
@override
@@ -309,7 +309,9 @@ class LanceNamespaceDBConnection(DBConnection):
# Try to describe the table first to see if it exists
try:
describe_request = DescribeTableRequest(id=table_id)
describe_response = self._ns.describe_table(describe_request)
describe_response = self._namespace_client.describe_table(
describe_request
)
location = describe_response.location
namespace_storage_options = describe_response.storage_options
except Exception:
@@ -318,20 +320,20 @@ class LanceNamespaceDBConnection(DBConnection):
if location is None:
# Table doesn't exist or mode is "create", reserve a new location
create_empty_request = CreateEmptyTableRequest(
declare_request = DeclareTableRequest(
id=table_id,
location=None,
properties=self.storage_options if self.storage_options else None,
)
create_empty_response = self._ns.create_empty_table(create_empty_request)
declare_response = self._namespace_client.declare_table(declare_request)
if not create_empty_response.location:
if not declare_response.location:
raise ValueError(
"Table location is missing from create_empty_table response"
"Table location is missing from declare_table response"
)
location = create_empty_response.location
namespace_storage_options = create_empty_response.storage_options
location = declare_response.location
namespace_storage_options = declare_response.storage_options
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -353,7 +355,7 @@ class LanceNamespaceDBConnection(DBConnection):
# Only create if namespace returned storage_options (not None)
if storage_options_provider is None and namespace_storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
@@ -371,6 +373,7 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options=merged_storage_options,
storage_options_provider=storage_options_provider,
location=location,
namespace_client=self._namespace_client,
)
return tbl
@@ -389,7 +392,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
response = self._namespace_client.describe_table(request)
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -402,10 +405,14 @@ class LanceNamespaceDBConnection(DBConnection):
# Only create if namespace returned storage_options (not None)
if storage_options_provider is None and response.storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
# Pass managed_versioning to avoid redundant describe_table call in Rust.
# Convert None to False since we already have the answer from describe_table.
managed_versioning = response.managed_versioning is True
return self._lance_table_from_uri(
name,
response.location,
@@ -413,6 +420,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options=merged_storage_options,
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
namespace_client=self._namespace_client,
managed_versioning=managed_versioning,
)
@override
@@ -422,7 +431,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
self._namespace_client.drop_table(request)
@override
def rename_table(
@@ -484,7 +493,7 @@ class LanceNamespaceDBConnection(DBConnection):
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
@@ -520,7 +529,7 @@ class LanceNamespaceDBConnection(DBConnection):
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -555,7 +564,7 @@ class LanceNamespaceDBConnection(DBConnection):
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
response = self._namespace_client.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
@@ -581,7 +590,7 @@ class LanceNamespaceDBConnection(DBConnection):
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -615,7 +624,7 @@ class LanceNamespaceDBConnection(DBConnection):
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
@@ -630,6 +639,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
) -> LanceTable:
# Open a table directly from a URI using the location parameter
# Note: storage_options should already be merged by the caller
@@ -643,6 +654,8 @@ class LanceNamespaceDBConnection(DBConnection):
)
# Open the table using the temporary connection with the location parameter
# Pass namespace_client to enable managed versioning support
# Pass managed_versioning to avoid redundant describe_table call
return LanceTable.open(
temp_conn,
name,
@@ -651,6 +664,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=table_uri,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
@@ -685,7 +700,7 @@ class AsyncLanceNamespaceDBConnection:
session : Optional[Session]
A session to use for this connection
"""
self._ns = namespace
self._namespace_client = namespace
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
@@ -713,7 +728,7 @@ class AsyncLanceNamespaceDBConnection:
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
async def create_table(
@@ -750,7 +765,9 @@ class AsyncLanceNamespaceDBConnection:
# Try to describe the table first to see if it exists
try:
describe_request = DescribeTableRequest(id=table_id)
describe_response = self._ns.describe_table(describe_request)
describe_response = self._namespace_client.describe_table(
describe_request
)
location = describe_response.location
namespace_storage_options = describe_response.storage_options
except Exception:
@@ -759,20 +776,20 @@ class AsyncLanceNamespaceDBConnection:
if location is None:
# Table doesn't exist or mode is "create", reserve a new location
create_empty_request = CreateEmptyTableRequest(
declare_request = DeclareTableRequest(
id=table_id,
location=None,
properties=self.storage_options if self.storage_options else None,
)
create_empty_response = self._ns.create_empty_table(create_empty_request)
declare_response = self._namespace_client.declare_table(declare_request)
if not create_empty_response.location:
if not declare_response.location:
raise ValueError(
"Table location is missing from create_empty_table response"
"Table location is missing from declare_table response"
)
location = create_empty_response.location
namespace_storage_options = create_empty_response.storage_options
location = declare_response.location
namespace_storage_options = declare_response.storage_options
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -797,7 +814,7 @@ class AsyncLanceNamespaceDBConnection:
and namespace_storage_options is not None
):
provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
else:
@@ -817,6 +834,7 @@ class AsyncLanceNamespaceDBConnection:
storage_options=merged_storage_options,
storage_options_provider=provider,
location=location,
namespace_client=self._namespace_client,
)
lance_table = await asyncio.to_thread(_create_table)
@@ -837,7 +855,7 @@ class AsyncLanceNamespaceDBConnection:
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
response = self._namespace_client.describe_table(request)
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
@@ -849,10 +867,14 @@ class AsyncLanceNamespaceDBConnection:
# Create a storage options provider if not provided by user
if storage_options_provider is None and response.storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
namespace=self._namespace_client,
table_id=table_id,
)
# Capture managed_versioning from describe response.
# Convert None to False since we already have the answer from describe_table.
managed_versioning = response.managed_versioning is True
# Open table in a thread
def _open_table():
temp_conn = LanceDBConnection(
@@ -870,6 +892,8 @@ class AsyncLanceNamespaceDBConnection:
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=response.location,
namespace_client=self._namespace_client,
managed_versioning=managed_versioning,
)
lance_table = await asyncio.to_thread(_open_table)
@@ -881,7 +905,7 @@ class AsyncLanceNamespaceDBConnection:
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
self._namespace_client.drop_table(request)
async def rename_table(
self,
@@ -943,7 +967,7 @@ class AsyncLanceNamespaceDBConnection:
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
@@ -978,7 +1002,7 @@ class AsyncLanceNamespaceDBConnection:
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -1012,7 +1036,7 @@ class AsyncLanceNamespaceDBConnection:
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
response = self._namespace_client.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
@@ -1039,7 +1063,7 @@ class AsyncLanceNamespaceDBConnection:
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@@ -1072,7 +1096,7 @@ class AsyncLanceNamespaceDBConnection:
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,

View File

@@ -9,7 +9,7 @@ import json
from ._lancedb import async_permutation_builder, PermutationReader
from .table import LanceTable
from .background_loop import LOOP
from .util import batch_to_tensor
from .util import batch_to_tensor, batch_to_tensor_rows
from typing import Any, Callable, Iterator, Literal, Optional, TYPE_CHECKING, Union
if TYPE_CHECKING:
@@ -333,7 +333,11 @@ class Transforms:
"""
@staticmethod
def arrow2python(batch: pa.RecordBatch) -> dict[str, list[Any]]:
def arrow2python(batch: pa.RecordBatch) -> list[dict[str, Any]]:
return batch.to_pylist()
@staticmethod
def arrow2pythoncol(batch: pa.RecordBatch) -> dict[str, list[Any]]:
return batch.to_pydict()
@staticmethod
@@ -687,7 +691,17 @@ class Permutation:
return
def with_format(
self, format: Literal["numpy", "python", "pandas", "arrow", "torch", "polars"]
self,
format: Literal[
"numpy",
"python",
"python_col",
"pandas",
"arrow",
"torch",
"torch_col",
"polars",
],
) -> "Permutation":
"""
Set the format for batches
@@ -696,16 +710,18 @@ class Permutation:
The format can be one of:
- "numpy" - the batch will be a dict of numpy arrays (one per column)
- "python" - the batch will be a dict of lists (one per column)
- "python" - the batch will be a list of dicts (one per row)
- "python_col" - the batch will be a dict of lists (one entry per column)
- "pandas" - the batch will be a pandas DataFrame
- "arrow" - the batch will be a pyarrow RecordBatch
- "torch" - the batch will be a two dimensional torch tensor
- "torch" - the batch will be a list of tensors, one per row
- "torch_col" - the batch will be a 2D torch tensor (first dim indexes columns)
- "polars" - the batch will be a polars DataFrame
Conversion may or may not involve a data copy. Lance uses Arrow internally
and so it is able to zero-copy to the arrow and polars.
and so it is able to zero-copy to the arrow and polars formats.
Conversion to torch will be zero-copy but will only support a subset of data
Conversion to torch_col will be zero-copy but will only support a subset of data
types (numeric types).
Conversion to numpy and/or pandas will typically be zero-copy for numeric
@@ -718,6 +734,8 @@ class Permutation:
assert format is not None, "format is required"
if format == "python":
return self.with_transform(Transforms.arrow2python)
if format == "python_col":
return self.with_transform(Transforms.arrow2pythoncol)
elif format == "numpy":
return self.with_transform(Transforms.arrow2numpy)
elif format == "pandas":
@@ -725,6 +743,8 @@ class Permutation:
elif format == "arrow":
return self.with_transform(Transforms.arrow2arrow)
elif format == "torch":
return self.with_transform(batch_to_tensor_rows)
elif format == "torch_col":
return self.with_transform(batch_to_tensor)
elif format == "polars":
return self.with_transform(Transforms.arrow2polars())
@@ -746,15 +766,20 @@ class Permutation:
def __getitem__(self, index: int) -> Any:
"""
Return a single row from the permutation
The output will always be a python dictionary regardless of the format.
This method is mostly useful for debugging and exploration. For actual
processing use [iter](#iter) or a torch data loader to perform batched
processing.
Returns a single row from the permutation by offset
"""
pass
return self.__getitems__([index])
def __getitems__(self, indices: list[int]) -> Any:
"""
Returns rows from the permutation by offset
"""
async def do_getitems():
return await self.reader.take_offsets(indices, selection=self.selection)
batch = LOOP.run(do_getitems())
return self.transform_fn(batch)
@deprecated(details="Use with_skip instead")
def skip(self, skip: int) -> "Permutation":

View File

@@ -606,6 +606,7 @@ class LanceQueryBuilder(ABC):
query,
ordering_field_name=ordering_field_name,
fts_columns=fts_columns,
fast_search=fast_search,
)
if isinstance(query, list):
@@ -1456,12 +1457,14 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
query: str | FullTextQuery,
ordering_field_name: Optional[str] = None,
fts_columns: Optional[Union[str, List[str]]] = None,
fast_search: bool = None,
):
super().__init__(table)
self._query = query
self._phrase_query = False
self.ordering_field_name = ordering_field_name
self._reranker = None
self._fast_search = fast_search
if isinstance(fts_columns, str):
fts_columns = [fts_columns]
self._fts_columns = fts_columns
@@ -1483,6 +1486,19 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
self._phrase_query = phrase_query
return self
def fast_search(self) -> LanceFtsQueryBuilder:
"""
Skip a flat search of unindexed data. This will improve
search performance but search results will not include unindexed data.
Returns
-------
LanceFtsQueryBuilder
The LanceFtsQueryBuilder object.
"""
self._fast_search = True
return self
def to_query_object(self) -> Query:
return Query(
columns=self._columns,
@@ -1494,6 +1510,7 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
query=self._query, columns=self._fts_columns
),
offset=self._offset,
fast_search=self._fast_search,
)
def output_schema(self) -> pa.Schema:
@@ -1782,6 +1799,26 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
vector_results = LanceHybridQueryBuilder._rank(vector_results, "_distance")
fts_results = LanceHybridQueryBuilder._rank(fts_results, "_score")
# If both result sets are empty (e.g. after hard filtering),
# return early to avoid errors in reranking or score restoration.
if vector_results.num_rows == 0 and fts_results.num_rows == 0:
# Build a minimal empty table with the _relevance_score column
combined_schema = pa.unify_schemas(
[vector_results.schema, fts_results.schema],
)
empty = pa.table(
{
col: pa.array([], type=combined_schema.field(col).type)
for col in combined_schema.names
}
)
empty = empty.append_column(
"_relevance_score", pa.array([], type=pa.float32())
)
if not with_row_ids and "_rowid" in empty.column_names:
empty = empty.drop(["_rowid"])
return empty
original_distances = None
original_scores = None
original_distance_row_ids = None
@@ -2118,19 +2155,17 @@ class LanceHybridQueryBuilder(LanceQueryBuilder):
""" # noqa: E501
self._create_query_builders()
results = ["Vector Search Plan:"]
results.append(
self._table._explain_plan(
self._vector_query.to_query_object(), verbose=verbose
)
reranker_label = str(self._reranker) if self._reranker else "No reranker"
vector_plan = self._table._explain_plan(
self._vector_query.to_query_object(), verbose=verbose
)
results.append("FTS Search Plan:")
results.append(
self._table._explain_plan(
self._fts_query.to_query_object(), verbose=verbose
)
fts_plan = self._table._explain_plan(
self._fts_query.to_query_object(), verbose=verbose
)
return "\n".join(results)
# Indent sub-plans under the reranker
indented_vector = "\n".join(" " + line for line in vector_plan.splitlines())
indented_fts = "\n".join(" " + line for line in fts_plan.splitlines())
return f"{reranker_label}\n {indented_vector}\n {indented_fts}"
def analyze_plan(self):
"""Execute the query and display with runtime metrics.
@@ -3164,23 +3199,20 @@ class AsyncHybridQuery(AsyncStandardQuery, AsyncVectorQueryBase):
... plan = await table.query().nearest_to([1.0, 2.0]).nearest_to_text("hello").explain_plan(True)
... print(plan)
>>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
Vector Search Plan:
ProjectionExec: expr=[vector@0 as vector, text@3 as text, _distance@2 as _distance]
Take: columns="vector, _rowid, _distance, (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
FilterExec: _distance@2 IS NOT NULL
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2
LanceRead: uri=..., projection=[vector], ...
<BLANKLINE>
FTS Search Plan:
ProjectionExec: expr=[vector@2 as vector, text@3 as text, _score@1 as _score]
Take: columns="_rowid, _score, (vector), (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
MatchQuery: column=text, query=hello
<BLANKLINE>
RRFReranker(K=60)
ProjectionExec: expr=[vector@0 as vector, text@3 as text, _distance@2 as _distance]
Take: columns="vector, _rowid, _distance, (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
FilterExec: _distance@2 IS NOT NULL
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2
LanceRead: uri=..., projection=[vector], ...
ProjectionExec: expr=[vector@2 as vector, text@3 as text, _score@1 as _score]
Take: columns="_rowid, _score, (vector), (text)"
CoalesceBatchesExec: target_batch_size=1024
GlobalLimitExec: skip=0, fetch=10
MatchQuery: column=text, query=hello
Parameters
----------
@@ -3192,12 +3224,12 @@ class AsyncHybridQuery(AsyncStandardQuery, AsyncVectorQueryBase):
plan : str
""" # noqa: E501
results = ["Vector Search Plan:"]
results.append(await self._inner.to_vector_query().explain_plan(verbose))
results.append("FTS Search Plan:")
results.append(await self._inner.to_fts_query().explain_plan(verbose))
return "\n".join(results)
vector_plan = await self._inner.to_vector_query().explain_plan(verbose)
fts_plan = await self._inner.to_fts_query().explain_plan(verbose)
# Indent sub-plans under the reranker
indented_vector = "\n".join(" " + line for line in vector_plan.splitlines())
indented_fts = "\n".join(" " + line for line in fts_plan.splitlines())
return f"{self._reranker}\n {indented_vector}\n {indented_fts}"
async def analyze_plan(self):
"""

View File

@@ -218,8 +218,6 @@ class RemoteTable(Table):
train: bool = True,
):
"""Create an index on the table.
Currently, the only parameters that matter are
the metric and the vector column name.
Parameters
----------
@@ -250,11 +248,6 @@ class RemoteTable(Table):
>>> table.create_index("l2", "vector") # doctest: +SKIP
"""
if num_sub_vectors is not None:
logging.warning(
"num_sub_vectors is not supported on LanceDB cloud."
"This parameter will be tuned automatically."
)
if accelerator is not None:
logging.warning(
"GPU accelerator is not yet supported on LanceDB cloud."

View File

@@ -42,10 +42,18 @@ class AnswerdotaiRerankers(Reranker):
rerankers = attempt_import_or_raise(
"rerankers"
) # import here for faster ops later
self.model_name = model_name
self.model_type = model_type
self.reranker = rerankers.Reranker(
model_name=model_name, model_type=model_type, **kwargs
)
def __str__(self):
return (
f"AnswerdotaiRerankers(model_type={self.model_type}, "
f"model_name={self.model_name})"
)
def _rerank(self, result_set: pa.Table, query: str):
result_set = self._handle_empty_results(result_set)
if len(result_set) == 0:

View File

@@ -40,6 +40,9 @@ class Reranker(ABC):
if ARROW_VERSION.major <= 13:
self._concat_tables_args = {"promote": True}
def __str__(self):
return self.__class__.__name__
def rerank_vector(
self,
query: str,

View File

@@ -44,6 +44,9 @@ class CohereReranker(Reranker):
self.top_n = top_n
self.api_key = api_key
def __str__(self):
return f"CohereReranker(model_name={self.model_name})"
@cached_property
def _client(self):
cohere = attempt_import_or_raise("cohere")

View File

@@ -50,6 +50,9 @@ class CrossEncoderReranker(Reranker):
if self.device is None:
self.device = "cuda" if torch.cuda.is_available() else "cpu"
def __str__(self):
return f"CrossEncoderReranker(model_name={self.model_name})"
@cached_property
def model(self):
sbert = attempt_import_or_raise("sentence_transformers")

View File

@@ -45,6 +45,9 @@ class JinaReranker(Reranker):
self.top_n = top_n
self.api_key = api_key
def __str__(self):
return f"JinaReranker(model_name={self.model_name})"
@cached_property
def _client(self):
import requests

View File

@@ -38,6 +38,9 @@ class LinearCombinationReranker(Reranker):
self.weight = weight
self.fill = fill
def __str__(self):
return f"LinearCombinationReranker(weight={self.weight}, fill={self.fill})"
def rerank_hybrid(
self,
query: str, # noqa: F821

View File

@@ -54,6 +54,12 @@ class MRRReranker(Reranker):
self.weight_vector = weight_vector
self.weight_fts = weight_fts
def __str__(self):
return (
f"MRRReranker(weight_vector={self.weight_vector}, "
f"weight_fts={self.weight_fts})"
)
def rerank_hybrid(
self,
query: str, # noqa: F821

View File

@@ -43,6 +43,9 @@ class OpenaiReranker(Reranker):
self.column = column
self.api_key = api_key
def __str__(self):
return f"OpenaiReranker(model_name={self.model_name})"
def _rerank(self, result_set: pa.Table, query: str):
result_set = self._handle_empty_results(result_set)
if len(result_set) == 0:

View File

@@ -36,6 +36,9 @@ class RRFReranker(Reranker):
super().__init__(return_score)
self.K = K
def __str__(self):
return f"RRFReranker(K={self.K})"
def rerank_hybrid(
self,
query: str, # noqa: F821

View File

@@ -52,6 +52,9 @@ class VoyageAIReranker(Reranker):
self.api_key = api_key
self.truncation = truncation
def __str__(self):
return f"VoyageAIReranker(model_name={self.model_name})"
@cached_property
def _client(self):
voyageai = attempt_import_or_raise("voyageai")

View File

@@ -0,0 +1,214 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from dataclasses import dataclass
from functools import singledispatch
import sys
from typing import Callable, Iterator, Optional
from lancedb.arrow import to_arrow
import pyarrow as pa
import pyarrow.dataset as ds
from .pydantic import LanceModel
@dataclass
class Scannable:
schema: pa.Schema
num_rows: Optional[int]
# Factory function to create a new reader each time (supports re-scanning)
reader: Callable[[], pa.RecordBatchReader]
# Whether reader can be called more than once. For example, an iterator can
# only be consumed once, while a DataFrame can be converted to a new reader
# each time.
rescannable: bool = True
@singledispatch
def to_scannable(data) -> Scannable:
# Fallback: try iterable protocol
if hasattr(data, "__iter__"):
return _from_iterable(iter(data))
raise NotImplementedError(f"to_scannable not implemented for type {type(data)}")
@to_scannable.register(pa.RecordBatchReader)
def _from_reader(data: pa.RecordBatchReader) -> Scannable:
# RecordBatchReader can only be consumed once - not rescannable
return Scannable(
schema=data.schema, num_rows=None, reader=lambda: data, rescannable=False
)
@to_scannable.register(pa.RecordBatch)
def _from_batch(data: pa.RecordBatch) -> Scannable:
return Scannable(
schema=data.schema,
num_rows=data.num_rows,
reader=lambda: pa.RecordBatchReader.from_batches(data.schema, [data]),
)
@to_scannable.register(pa.Table)
def _from_table(data: pa.Table) -> Scannable:
return Scannable(schema=data.schema, num_rows=data.num_rows, reader=data.to_reader)
@to_scannable.register(ds.Dataset)
def _from_dataset(data: ds.Dataset) -> Scannable:
return Scannable(
schema=data.schema,
num_rows=data.count_rows(),
reader=lambda: data.scanner().to_reader(),
)
@to_scannable.register(ds.Scanner)
def _from_scanner(data: ds.Scanner) -> Scannable:
# Scanner can only be consumed once - not rescannable
return Scannable(
schema=data.projected_schema,
num_rows=None,
reader=data.to_reader,
rescannable=False,
)
@to_scannable.register(list)
def _from_list(data: list) -> Scannable:
if not data:
raise ValueError("Cannot create table from empty list without a schema")
table = to_arrow(data)
return Scannable(
schema=table.schema, num_rows=table.num_rows, reader=table.to_reader
)
@to_scannable.register(dict)
def _from_dict(data: dict) -> Scannable:
raise ValueError("Cannot add a single dictionary to a table. Use a list.")
@to_scannable.register(LanceModel)
def _from_lance_model(data: LanceModel) -> Scannable:
raise ValueError("Cannot add a single LanceModel to a table. Use a list.")
def _from_iterable(data: Iterator) -> Scannable:
first_item = next(data, None)
if first_item is None:
raise ValueError("Cannot create table from empty iterator")
first = to_arrow(first_item)
schema = first.schema
def iter():
yield from first.to_batches()
for item in data:
batch = to_arrow(item)
if batch.schema != schema:
try:
batch = batch.cast(schema)
except pa.lib.ArrowInvalid:
raise ValueError(
f"Input iterator yielded a batch with schema that "
f"does not match the schema of other batches.\n"
f"Expected:\n{schema}\nGot:\n{batch.schema}"
)
yield from batch.to_batches()
reader = pa.RecordBatchReader.from_batches(schema, iter())
return to_scannable(reader)
_registered_modules: set[str] = set()
def _register_optional_converters():
"""Register converters for optional dependencies that are already imported."""
if "pandas" in sys.modules and "pandas" not in _registered_modules:
_registered_modules.add("pandas")
import pandas as pd
@to_arrow.register(pd.DataFrame)
def _arrow_from_pandas(data: pd.DataFrame) -> pa.Table:
table = pa.Table.from_pandas(data, preserve_index=False)
return table.replace_schema_metadata(None)
@to_scannable.register(pd.DataFrame)
def _from_pandas(data: pd.DataFrame) -> Scannable:
return to_scannable(_arrow_from_pandas(data))
if "polars" in sys.modules and "polars" not in _registered_modules:
_registered_modules.add("polars")
import polars as pl
@to_arrow.register(pl.DataFrame)
def _arrow_from_polars(data: pl.DataFrame) -> pa.Table:
return data.to_arrow()
@to_scannable.register(pl.DataFrame)
def _from_polars(data: pl.DataFrame) -> Scannable:
arrow = data.to_arrow()
return Scannable(
schema=arrow.schema, num_rows=len(data), reader=arrow.to_reader
)
@to_scannable.register(pl.LazyFrame)
def _from_polars_lazy(data: pl.LazyFrame) -> Scannable:
arrow = data.collect().to_arrow()
return Scannable(
schema=arrow.schema, num_rows=arrow.num_rows, reader=arrow.to_reader
)
if "datasets" in sys.modules and "datasets" not in _registered_modules:
_registered_modules.add("datasets")
from datasets import Dataset as HFDataset
from datasets import DatasetDict as HFDatasetDict
@to_scannable.register(HFDataset)
def _from_hf_dataset(data: HFDataset) -> Scannable:
table = data.data.table # Access underlying Arrow table
return Scannable(
schema=table.schema, num_rows=len(data), reader=table.to_reader
)
@to_scannable.register(HFDatasetDict)
def _from_hf_dataset_dict(data: HFDatasetDict) -> Scannable:
# HuggingFace DatasetDict: combine all splits with a 'split' column
schema = data[list(data.keys())[0]].features.arrow_schema
if "split" not in schema.names:
schema = schema.append(pa.field("split", pa.string()))
def gen():
for split_name, dataset in data.items():
for batch in dataset.data.to_batches():
split_arr = pa.array(
[split_name] * len(batch), type=pa.string()
)
yield pa.RecordBatch.from_arrays(
list(batch.columns) + [split_arr], schema=schema
)
total_rows = sum(len(dataset) for dataset in data.values())
return Scannable(
schema=schema,
num_rows=total_rows,
reader=lambda: pa.RecordBatchReader.from_batches(schema, gen()),
)
if "lance" in sys.modules and "lance" not in _registered_modules:
_registered_modules.add("lance")
import lance
@to_scannable.register(lance.LanceDataset)
def _from_lance(data: lance.LanceDataset) -> Scannable:
return Scannable(
schema=data.schema,
num_rows=data.count_rows(),
reader=lambda: data.scanner().to_reader(),
)
# Register on module load
_register_optional_converters()

View File

@@ -25,6 +25,8 @@ from typing import (
)
from urllib.parse import urlparse
from lancedb.scannable import _register_optional_converters, to_scannable
from . import __version__
from lancedb.arrow import peek_reader
from lancedb.background_loop import LOOP
@@ -904,7 +906,9 @@ class Table(ABC):
----------
field_names: str or list of str
The name(s) of the field to index.
can be only str if use_tantivy=True for now.
If ``use_tantivy`` is False (default), only a single field name
(str) is supported. To index multiple fields, create a separate
FTS index for each field.
replace: bool, default False
If True, replace the existing index if it exists. Note that this is
not yet an atomic operation; the index will be temporarily
@@ -1327,7 +1331,7 @@ class Table(ABC):
1 2 [3.0, 4.0]
2 3 [5.0, 6.0]
>>> table.delete("x = 2")
DeleteResult(version=2)
DeleteResult(num_deleted_rows=1, version=2)
>>> table.to_pandas()
x vector
0 1 [1.0, 2.0]
@@ -1341,7 +1345,7 @@ class Table(ABC):
>>> to_remove
'1, 5'
>>> table.delete(f"x IN ({to_remove})")
DeleteResult(version=3)
DeleteResult(num_deleted_rows=1, version=3)
>>> table.to_pandas()
x vector
0 3 [5.0, 6.0]
@@ -1742,6 +1746,8 @@ class LanceTable(Table):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
_async: AsyncTable = None,
):
if namespace is None:
@@ -1749,6 +1755,7 @@ class LanceTable(Table):
self._conn = connection
self._namespace = namespace
self._location = location # Store location for use in _dataset_path
self._namespace_client = namespace_client
if _async is not None:
self._table = _async
else:
@@ -1760,6 +1767,8 @@ class LanceTable(Table):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
)
@@ -1802,6 +1811,8 @@ class LanceTable(Table):
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
):
if namespace is None:
namespace = []
@@ -1813,6 +1824,8 @@ class LanceTable(Table):
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=location,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
)
# check the dataset exists
@@ -1844,6 +1857,16 @@ class LanceTable(Table):
"Please install with `pip install pylance`."
)
if self._namespace_client is not None:
table_id = self._namespace + [self.name]
return lance.dataset(
version=self.version,
storage_options=self._conn.storage_options,
namespace=self._namespace_client,
table_id=table_id,
**kwargs,
)
return lance.dataset(
self._dataset_path,
version=self.version,
@@ -2298,7 +2321,11 @@ class LanceTable(Table):
):
if not use_tantivy:
if not isinstance(field_names, str):
raise ValueError("field_names must be a string when use_tantivy=False")
raise ValueError(
"Native FTS indexes can only be created on a single field "
"at a time. To search over multiple text fields, create a "
"separate FTS index for each field."
)
if tokenizer_name is None:
tokenizer_configs = {
@@ -2705,6 +2732,7 @@ class LanceTable(Table):
data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None,
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
):
"""
Create a new table.
@@ -2765,6 +2793,7 @@ class LanceTable(Table):
self._conn = db
self._namespace = namespace
self._location = location
self._namespace_client = namespace_client
if data_storage_version is not None:
warnings.warn(
@@ -3721,18 +3750,31 @@ class AsyncTable:
on_bad_vectors = "error"
if fill_value is None:
fill_value = 0.0
data = _sanitize_data(
data,
schema,
metadata=schema.metadata,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
allow_subschema=True,
)
if isinstance(data, pa.Table):
data = data.to_reader()
return await self._inner.add(data, mode or "append")
# _santitize_data is an old code path, but we will use it until the
# new code path is ready.
if on_bad_vectors != "error" or (
schema.metadata is not None and b"embedding_functions" in schema.metadata
):
data = _sanitize_data(
data,
schema,
metadata=schema.metadata,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
allow_subschema=True,
)
_register_optional_converters()
data = to_scannable(data)
try:
return await self._inner.add(data, mode or "append")
except RuntimeError as e:
if "Cast error" in str(e):
raise ValueError(e)
elif "Vector column contains NaN" in str(e):
raise ValueError(e)
else:
raise
def merge_insert(self, on: Union[str, Iterable[str]]) -> LanceMergeInsertBuilder:
"""
@@ -4194,7 +4236,7 @@ class AsyncTable:
1 2 [3.0, 4.0]
2 3 [5.0, 6.0]
>>> table.delete("x = 2")
DeleteResult(version=2)
DeleteResult(num_deleted_rows=1, version=2)
>>> table.to_pandas()
x vector
0 1 [1.0, 2.0]
@@ -4208,7 +4250,7 @@ class AsyncTable:
>>> to_remove
'1, 5'
>>> table.delete(f"x IN ({to_remove})")
DeleteResult(version=3)
DeleteResult(num_deleted_rows=1, version=3)
>>> table.to_pandas()
x vector
0 3 [5.0, 6.0]

View File

@@ -324,6 +324,16 @@ def _(value: list):
return "[" + ", ".join(map(value_to_sql, value)) + "]"
@value_to_sql.register(dict)
def _(value: dict):
# https://datafusion.apache.org/user-guide/sql/scalar_functions.html#named-struct
return (
"named_struct("
+ ", ".join(f"'{k}', {value_to_sql(v)}" for k, v in value.items())
+ ")"
)
@value_to_sql.register(np.ndarray)
def _(value: np.ndarray):
return value_to_sql(value.tolist())
@@ -419,3 +429,22 @@ def batch_to_tensor(batch: pa.RecordBatch):
"""
torch = attempt_import_or_raise("torch", "torch")
return torch.stack([torch.from_dlpack(col) for col in batch.columns])
def batch_to_tensor_rows(batch: pa.RecordBatch):
"""
Convert a PyArrow RecordBatch to a list of PyTorch Tensor, one per row
Each column is converted to a tensor (using zero-copy via DLPack)
and the columns are then stacked into a single tensor. The 2D tensor
is then converted to a list of tensors, one per row
Fails if torch or numpy is not installed.
Fails if a column's data type is not supported by PyTorch.
"""
torch = attempt_import_or_raise("torch", "torch")
numpy = attempt_import_or_raise("numpy", "numpy")
columns = [col.to_numpy(zero_copy_only=False) for col in batch.columns]
stacked = torch.tensor(numpy.column_stack(columns))
rows = list(stacked.unbind(dim=0))
return rows

View File

@@ -515,3 +515,34 @@ def test_openai_propagates_api_key(monkeypatch):
query = "greetings"
actual = table.search(query).limit(1).to_pydantic(Words)[0]
assert len(actual.text) > 0
@patch("time.sleep")
def test_openai_no_retry_on_401(mock_sleep):
"""
Test that OpenAI embedding function does not retry on 401 authentication
errors.
"""
from lancedb.embeddings.utils import retry_with_exponential_backoff
# Create a mock that raises an AuthenticationError
class MockAuthenticationError(Exception):
"""Mock OpenAI AuthenticationError"""
pass
MockAuthenticationError.__name__ = "AuthenticationError"
mock_func = MagicMock(side_effect=MockAuthenticationError("Invalid API key"))
# Wrap the function with retry logic
wrapped_func = retry_with_exponential_backoff(mock_func, max_retries=3)
# Should raise without retrying
with pytest.raises(MockAuthenticationError):
wrapped_func()
# Verify that the function was only called once (no retries)
assert mock_func.call_count == 1
# Verify that sleep was never called (no retries)
assert mock_sleep.call_count == 0

View File

@@ -27,6 +27,7 @@ from lancedb.query import (
PhraseQuery,
BooleanQuery,
Occur,
LanceFtsQueryBuilder,
)
import numpy as np
import pyarrow as pa
@@ -882,3 +883,109 @@ def test_fts_query_to_json():
'"must_not":[]}}'
)
assert json_str == expected
def test_fts_fast_search(table):
table.create_fts_index("text", use_tantivy=False)
# Insert some unindexed data
table.add(
[
{
"text": "xyz",
"vector": [0 for _ in range(128)],
"id": 101,
"text2": "xyz",
"nested": {"text": "xyz"},
"count": 10,
}
]
)
# Without fast_search, the query object should not have fast_search set
builder = table.search("xyz", query_type="fts").limit(10)
query = builder.to_query_object()
assert query.fast_search is None
# With fast_search, the query object should have fast_search=True
builder = table.search("xyz", query_type="fts").fast_search().limit(10)
query = builder.to_query_object()
assert query.fast_search is True
# fast_search should be chainable with other methods
builder = (
table.search("xyz", query_type="fts").fast_search().select(["text"]).limit(5)
)
query = builder.to_query_object()
assert query.fast_search is True
assert query.limit == 5
assert query.columns == ["text"]
# fast_search should be enabled by keyword argument too
query = LanceFtsQueryBuilder(table, "xyz", fast_search=True).to_query_object()
assert query.fast_search is True
# Verify it executes without error and skips unindexed data
results = table.search("xyz", query_type="fts").fast_search().limit(5).to_list()
assert len(results) == 0
# Update index and verify it returns results
table.optimize()
results = table.search("xyz", query_type="fts").fast_search().limit(5).to_list()
assert len(results) > 0
@pytest.mark.asyncio
async def test_fts_fast_search_async(async_table):
await async_table.create_index("text", config=FTS())
# Insert some unindexed data
await async_table.add(
[
{
"text": "xyz",
"vector": [0 for _ in range(128)],
"id": 101,
"text2": "xyz",
"nested": {"text": "xyz"},
"count": 10,
}
]
)
# Without fast_search, should return results
results = await async_table.query().nearest_to_text("xyz").limit(5).to_list()
assert len(results) > 0
# With fast_search, should return no results data unindexed
fast_results = (
await async_table.query()
.nearest_to_text("xyz")
.fast_search()
.limit(5)
.to_list()
)
assert len(fast_results) == 0
# Update index and verify it returns results
await async_table.optimize()
fast_results = (
await async_table.query()
.nearest_to_text("xyz")
.fast_search()
.limit(5)
.to_list()
)
assert len(fast_results) > 0
# fast_search should be chainable with other methods
results = (
await async_table.query()
.nearest_to_text("xyz")
.fast_search()
.select(["text"])
.limit(5)
.to_list()
)
assert len(results) > 0

View File

@@ -163,9 +163,7 @@ async def test_explain_plan(table: AsyncTable):
table.query().nearest_to_text("dog").nearest_to([0.1, 0.1]).explain_plan(True)
)
assert "Vector Search Plan" in plan
assert "KNNVectorDistance" in plan
assert "FTS Search Plan" in plan
assert "LanceRead" in plan

View File

@@ -664,23 +664,20 @@ def test_iter_basic(some_permutation: Permutation):
expected_batches = (950 + batch_size - 1) // batch_size # ceiling division
assert len(batches) == expected_batches
# Check that all batches are dicts (default python format)
assert all(isinstance(batch, dict) for batch in batches)
# Check that all batches are lists of dicts (default python format)
assert all(isinstance(batch, list) for batch in batches)
# Check that batches have the correct structure
for batch in batches:
assert "id" in batch
assert "value" in batch
assert isinstance(batch["id"], list)
assert isinstance(batch["value"], list)
assert "id" in batch[0]
assert "value" in batch[0]
# Check that all batches except the last have the correct size
for batch in batches[:-1]:
assert len(batch["id"]) == batch_size
assert len(batch["value"]) == batch_size
assert len(batch) == batch_size
# Last batch might be smaller
assert len(batches[-1]["id"]) <= batch_size
assert len(batches[-1]) <= batch_size
def test_iter_skip_last_batch(some_permutation: Permutation):
@@ -699,11 +696,11 @@ def test_iter_skip_last_batch(some_permutation: Permutation):
if 950 % batch_size != 0:
assert len(batches_without_skip) == num_full_batches + 1
# Last batch should be smaller
assert len(batches_without_skip[-1]["id"]) == 950 % batch_size
assert len(batches_without_skip[-1]) == 950 % batch_size
# All batches with skip_last_batch should be full size
for batch in batches_with_skip:
assert len(batch["id"]) == batch_size
assert len(batch) == batch_size
def test_iter_different_batch_sizes(some_permutation: Permutation):
@@ -720,12 +717,12 @@ def test_iter_different_batch_sizes(some_permutation: Permutation):
# Test with batch size equal to total rows
single_batch = list(some_permutation.iter(950, skip_last_batch=False))
assert len(single_batch) == 1
assert len(single_batch[0]["id"]) == 950
assert len(single_batch[0]) == 950
# Test with batch size larger than total rows
oversized_batch = list(some_permutation.iter(10000, skip_last_batch=False))
assert len(oversized_batch) == 1
assert len(oversized_batch[0]["id"]) == 950
assert len(oversized_batch[0]) == 950
def test_dunder_iter(some_permutation: Permutation):
@@ -738,15 +735,13 @@ def test_dunder_iter(some_permutation: Permutation):
# All batches should be full size
for batch in batches:
assert len(batch["id"]) == 100
assert len(batch["value"]) == 100
assert len(batch) == 100
some_permutation = some_permutation.with_batch_size(400)
batches = list(some_permutation)
assert len(batches) == 2 # floor(950 / 400) since skip_last_batch=True
for batch in batches:
assert len(batch["id"]) == 400
assert len(batch["value"]) == 400
assert len(batch) == 400
def test_iter_with_different_formats(some_permutation: Permutation):
@@ -761,7 +756,7 @@ def test_iter_with_different_formats(some_permutation: Permutation):
# Test with python format (default)
python_perm = some_permutation.with_format("python")
python_batches = list(python_perm.iter(batch_size, skip_last_batch=False))
assert all(isinstance(batch, dict) for batch in python_batches)
assert all(isinstance(batch, list) for batch in python_batches)
# Test with pandas format
pandas_perm = some_permutation.with_format("pandas")
@@ -780,8 +775,8 @@ def test_iter_with_column_selection(some_permutation: Permutation):
# Check that batches only contain the id column
for batch in batches:
assert "id" in batch
assert "value" not in batch
assert "id" in batch[0]
assert "value" not in batch[0]
def test_iter_with_column_rename(some_permutation: Permutation):
@@ -791,9 +786,9 @@ def test_iter_with_column_rename(some_permutation: Permutation):
# Check that batches have the renamed column
for batch in batches:
assert "id" in batch
assert "data" in batch
assert "value" not in batch
assert "id" in batch[0]
assert "data" in batch[0]
assert "value" not in batch[0]
def test_iter_with_limit_offset(some_permutation: Permutation):
@@ -812,14 +807,14 @@ def test_iter_with_limit_offset(some_permutation: Permutation):
assert len(limit_batches) == 5
no_skip = some_permutation.iter(101, skip_last_batch=False)
row_100 = next(no_skip)["id"][100]
row_100 = next(no_skip)[100]["id"]
# Test with both limit and offset
limited_perm = some_permutation.with_skip(100).with_take(300)
limited_batches = list(limited_perm.iter(100, skip_last_batch=False))
# Should have 3 batches (300 / 100)
assert len(limited_batches) == 3
assert limited_batches[0]["id"][0] == row_100
assert limited_batches[0][0]["id"] == row_100
def test_iter_empty_permutation(mem_db):
@@ -842,7 +837,7 @@ def test_iter_single_row(mem_db):
# With skip_last_batch=False, should get one batch
batches = list(perm.iter(10, skip_last_batch=False))
assert len(batches) == 1
assert len(batches[0]["id"]) == 1
assert len(batches[0]) == 1
# With skip_last_batch=True, should skip the single row (since it's < batch_size)
batches_skip = list(perm.iter(10, skip_last_batch=True))
@@ -860,8 +855,7 @@ def test_identity_permutation(mem_db):
batches = list(permutation.iter(10, skip_last_batch=False))
assert len(batches) == 1
assert len(batches[0]["id"]) == 10
assert len(batches[0]["value"]) == 10
assert len(batches[0]) == 10
permutation = permutation.remove_columns(["value"])
assert permutation.num_columns == 1
@@ -904,10 +898,10 @@ def test_transform_fn(mem_db):
py_result = list(permutation.with_format("python").iter(10, skip_last_batch=False))[
0
]
assert len(py_result) == 2
assert len(py_result["id"]) == 10
assert len(py_result["value"]) == 10
assert isinstance(py_result, dict)
assert len(py_result) == 10
assert "id" in py_result[0]
assert "value" in py_result[0]
assert isinstance(py_result, list)
try:
import torch
@@ -915,9 +909,11 @@ def test_transform_fn(mem_db):
torch_result = list(
permutation.with_format("torch").iter(10, skip_last_batch=False)
)[0]
assert torch_result.shape == (2, 10)
assert torch_result.dtype == torch.int64
assert isinstance(torch_result, torch.Tensor)
assert isinstance(torch_result, list)
assert len(torch_result) == 10
assert isinstance(torch_result[0], torch.Tensor)
assert torch_result[0].shape == (2,)
assert torch_result[0].dtype == torch.int64
except ImportError:
# Skip check if torch is not installed
pass
@@ -945,3 +941,113 @@ def test_custom_transform(mem_db):
batch = batches[0]
assert batch == pa.record_batch([range(10)], ["id"])
def test_getitems_basic(some_permutation: Permutation):
"""Test __getitems__ returns correct rows by offset."""
result = some_permutation.__getitems__([0, 1, 2])
assert isinstance(result, list)
assert "id" in result[0]
assert "value" in result[0]
assert len(result) == 3
def test_getitems_single_index(some_permutation: Permutation):
"""Test __getitems__ with a single index."""
result = some_permutation.__getitems__([0])
assert len(result) == 1
def test_getitems_preserves_order(some_permutation: Permutation):
"""Test __getitems__ returns rows in the requested order."""
# Get rows in forward order
forward = some_permutation.__getitems__([0, 1, 2, 3, 4])
# Get the same rows in reverse order
reverse = some_permutation.__getitems__([4, 3, 2, 1, 0])
assert [r["id"] for r in forward] == list(reversed([r["id"] for r in reverse]))
assert [r["value"] for r in forward] == list(
reversed([r["value"] for r in reverse])
)
def test_getitems_non_contiguous(some_permutation: Permutation):
"""Test __getitems__ with non-contiguous indices."""
result = some_permutation.__getitems__([0, 10, 50, 100, 500])
assert len(result) == 5
# Each id/value pair should match what we'd get individually
for i, offset in enumerate([0, 10, 50, 100, 500]):
single = some_permutation.__getitems__([offset])
assert result[i]["id"] == single[0]["id"]
assert result[i]["value"] == single[0]["value"]
def test_getitems_with_column_selection(some_permutation: Permutation):
"""Test __getitems__ respects column selection."""
id_only = some_permutation.select_columns(["id"])
result = id_only.__getitems__([0, 1, 2])
assert "id" in result[0]
assert "value" not in result[0]
assert len(result) == 3
def test_getitems_with_column_rename(some_permutation: Permutation):
"""Test __getitems__ respects column renames."""
renamed = some_permutation.rename_column("value", "data")
result = renamed.__getitems__([0, 1])
assert "data" in result[0]
assert "value" not in result[0]
assert len(result) == 2
def test_getitems_with_format(some_permutation: Permutation):
"""Test __getitems__ applies the transform function."""
arrow_perm = some_permutation.with_format("arrow")
result = arrow_perm.__getitems__([0, 1, 2])
assert isinstance(result, pa.RecordBatch)
assert result.num_rows == 3
def test_getitems_with_custom_transform(some_permutation: Permutation):
"""Test __getitems__ with a custom transform."""
def transform(batch: pa.RecordBatch) -> list:
return batch.column("id").to_pylist()
custom = some_permutation.with_transform(transform)
result = custom.__getitems__([0, 1, 2])
assert isinstance(result, list)
assert len(result) == 3
def test_getitems_identity_permutation(mem_db):
"""Test __getitems__ on an identity permutation."""
tbl = mem_db.create_table(
"test_table", pa.table({"id": range(10), "value": range(10)})
)
perm = Permutation.identity(tbl)
result = perm.__getitems__([0, 5, 9])
assert [r["id"] for r in result] == [0, 5, 9]
assert [r["value"] for r in result] == [0, 5, 9]
def test_getitems_with_limit_offset(some_permutation: Permutation):
"""Test __getitems__ on a permutation with skip/take applied."""
limited = some_permutation.with_skip(100).with_take(200)
# Should be able to access offsets within the limited range
result = limited.__getitems__([0, 1, 199])
assert len(result) == 3
# The first item of the limited permutation should match offset 100 of original
full_result = some_permutation.__getitems__([100])
limited_result = limited.__getitems__([0])
assert limited_result[0]["id"] == full_result[0]["id"]
def test_getitems_invalid_offset(some_permutation: Permutation):
"""Test __getitems__ with an out-of-range offset raises an error."""
with pytest.raises(Exception):
some_permutation.__getitems__([999999])

View File

@@ -531,6 +531,78 @@ def test_empty_result_reranker():
)
def test_empty_hybrid_result_reranker():
"""Test that hybrid search with empty results after filtering doesn't crash.
Regression test for https://github.com/lancedb/lancedb/issues/2425
"""
from lancedb.query import LanceHybridQueryBuilder
# Simulate empty vector and FTS results with the expected schema
vector_schema = pa.schema(
[
("text", pa.string()),
("vector", pa.list_(pa.float32(), 4)),
("_rowid", pa.uint64()),
("_distance", pa.float32()),
]
)
fts_schema = pa.schema(
[
("text", pa.string()),
("vector", pa.list_(pa.float32(), 4)),
("_rowid", pa.uint64()),
("_score", pa.float32()),
]
)
empty_vector = pa.table(
{
"text": pa.array([], type=pa.string()),
"vector": pa.array([], type=pa.list_(pa.float32(), 4)),
"_rowid": pa.array([], type=pa.uint64()),
"_distance": pa.array([], type=pa.float32()),
},
schema=vector_schema,
)
empty_fts = pa.table(
{
"text": pa.array([], type=pa.string()),
"vector": pa.array([], type=pa.list_(pa.float32(), 4)),
"_rowid": pa.array([], type=pa.uint64()),
"_score": pa.array([], type=pa.float32()),
},
schema=fts_schema,
)
for reranker in [LinearCombinationReranker(), RRFReranker()]:
result = LanceHybridQueryBuilder._combine_hybrid_results(
fts_results=empty_fts,
vector_results=empty_vector,
norm="score",
fts_query="nonexistent query",
reranker=reranker,
limit=10,
with_row_ids=False,
)
assert len(result) == 0
assert "_relevance_score" in result.column_names
assert "_rowid" not in result.column_names
# Also test with with_row_ids=True
result = LanceHybridQueryBuilder._combine_hybrid_results(
fts_results=empty_fts,
vector_results=empty_vector,
norm="score",
fts_query="nonexistent query",
reranker=LinearCombinationReranker(),
limit=10,
with_row_ids=True,
)
assert len(result) == 0
assert "_relevance_score" in result.column_names
assert "_rowid" in result.column_names
@pytest.mark.parametrize("use_tantivy", [True, False])
def test_cross_encoder_reranker_return_all(tmp_path, use_tantivy):
pytest.importorskip("sentence_transformers")

View File

@@ -326,6 +326,24 @@ def test_add_struct(mem_db: DBConnection):
table = mem_db.create_table("test2", schema=schema)
table.add(data)
struct_type = pa.struct(
[
("b", pa.int64()),
("a", pa.int64()),
]
)
expected = pa.table(
{
"s_list": [
[
pa.scalar({"b": 1, "a": 2}, type=struct_type),
pa.scalar({"b": 4, "a": None}, type=struct_type),
]
],
}
)
assert table.to_arrow() == expected
def test_add_subschema(mem_db: DBConnection):
schema = pa.schema(
@@ -810,7 +828,7 @@ def test_create_index_name_and_train_parameters(
)
def test_add_with_nans(mem_db: DBConnection):
def test_create_with_nans(mem_db: DBConnection):
# by default we raise an error on bad input vectors
bad_data = [
{"vector": [np.nan], "item": "bar", "price": 20.0},
@@ -854,6 +872,57 @@ def test_add_with_nans(mem_db: DBConnection):
assert np.allclose(v, np.array([0.0, 0.0]))
def test_add_with_nans(mem_db: DBConnection):
schema = pa.schema(
[
pa.field("vector", pa.list_(pa.float32(), 2), nullable=True),
pa.field("item", pa.string(), nullable=True),
pa.field("price", pa.float64(), nullable=False),
],
)
table = mem_db.create_table("test", schema=schema)
# by default we raise an error on bad input vectors
bad_data = [
{"vector": [np.nan], "item": "bar", "price": 20.0},
{"vector": [5], "item": "bar", "price": 20.0},
{"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
{"vector": [np.nan, 5.0], "item": "bar", "price": 20.0},
]
for row in bad_data:
with pytest.raises(ValueError):
table.add(
data=[row],
)
table.add(
[
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [2.1, 4.1], "item": "foo", "price": 9.0},
{"vector": [np.nan], "item": "bar", "price": 20.0},
{"vector": [5], "item": "bar", "price": 20.0},
{"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
],
on_bad_vectors="drop",
)
assert len(table) == 2
table.delete("true")
# We can fill bad input with some value
table.add(
data=[
{"vector": [3.1, 4.1], "item": "foo", "price": 10.0},
{"vector": [np.nan], "item": "bar", "price": 20.0},
{"vector": [np.nan, np.nan], "item": "bar", "price": 20.0},
],
on_bad_vectors="fill",
fill_value=0.0,
)
assert len(table) == 3
arrow_tbl = table.search().where("item == 'bar'").to_arrow()
v = arrow_tbl["vector"].to_pylist()[0]
assert np.allclose(v, np.array([0.0, 0.0]))
def test_restore(mem_db: DBConnection):
table = mem_db.create_table(
"my_table",

View File

@@ -4,6 +4,7 @@
import pyarrow as pa
import pytest
from lancedb.util import tbl_to_tensor
from lancedb.permutation import Permutation
torch = pytest.importorskip("torch")
@@ -16,3 +17,26 @@ def test_table_dataloader(mem_db):
for batch in dataloader:
assert batch.size(0) == 1
assert batch.size(1) == 10
def test_permutation_dataloader(mem_db):
table = mem_db.create_table("test_table", pa.table({"a": range(1000)}))
permutation = Permutation.identity(table)
dataloader = torch.utils.data.DataLoader(permutation, batch_size=10, shuffle=True)
for batch in dataloader:
assert batch["a"].size(0) == 10
permutation = permutation.with_format("torch")
dataloader = torch.utils.data.DataLoader(permutation, batch_size=10, shuffle=True)
for batch in dataloader:
assert batch.size(0) == 10
assert batch.size(1) == 1
permutation = permutation.with_format("torch_col")
dataloader = torch.utils.data.DataLoader(
permutation, collate_fn=lambda x: x, batch_size=10, shuffle=True
)
for batch in dataloader:
assert batch.size(0) == 1
assert batch.size(1) == 10

View File

@@ -121,6 +121,32 @@ def test_value_to_sql_string(tmp_path):
assert table.to_pandas().query("search == @value")["replace"].item() == value
def test_value_to_sql_dict():
# Simple flat struct
assert value_to_sql({"a": 1, "b": "hello"}) == "named_struct('a', 1, 'b', 'hello')"
# Nested struct
assert (
value_to_sql({"outer": {"inner": 1}})
== "named_struct('outer', named_struct('inner', 1))"
)
# List inside struct
assert value_to_sql({"a": [1, 2]}) == "named_struct('a', [1, 2])"
# Mixed types
assert (
value_to_sql({"name": "test", "count": 42, "rate": 3.14, "active": True})
== "named_struct('name', 'test', 'count', 42, 'rate', 3.14, 'active', TRUE)"
)
# Null value inside struct
assert value_to_sql({"a": None}) == "named_struct('a', NULL)"
# Empty dict
assert value_to_sql({}) == "named_struct()"
def test_append_vector_columns():
registry = EmbeddingFunctionRegistry.get_instance()
registry.register("test")(MockTextEmbeddingFunction)
@@ -292,18 +318,14 @@ class TestModel(lancedb.pydantic.LanceModel):
lambda: pa.table({"a": [1], "b": [2]}),
lambda: pa.table({"a": [1], "b": [2]}).to_reader(),
lambda: iter(pa.table({"a": [1], "b": [2]}).to_batches()),
lambda: (
lance.write_dataset(
pa.table({"a": [1], "b": [2]}),
"memory://test",
)
),
lambda: (
lance.write_dataset(
pa.table({"a": [1], "b": [2]}),
"memory://test",
).scanner()
lambda: lance.write_dataset(
pa.table({"a": [1], "b": [2]}),
"memory://test",
),
lambda: lance.write_dataset(
pa.table({"a": [1], "b": [2]}),
"memory://test",
).scanner(),
lambda: pd.DataFrame({"a": [1], "b": [2]}),
lambda: pl.DataFrame({"a": [1], "b": [2]}),
lambda: pl.LazyFrame({"a": [1], "b": [2]}),

View File

@@ -10,7 +10,7 @@ use arrow::{
use futures::stream::StreamExt;
use lancedb::arrow::SendableRecordBatchStream;
use pyo3::{
exceptions::PyStopAsyncIteration, pyclass, pymethods, Bound, Py, PyAny, PyRef, PyResult, Python,
Bound, Py, PyAny, PyRef, PyResult, Python, exceptions::PyStopAsyncIteration, pyclass, pymethods,
};
use pyo3_async_runtimes::tokio::future_into_py;

View File

@@ -9,15 +9,16 @@ use lancedb::{
database::{CreateTableMode, Database, ReadConsistency},
};
use pyo3::{
Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
exceptions::{PyRuntimeError, PyValueError},
pyclass, pyfunction, pymethods,
types::{PyDict, PyDictMethods},
Bound, FromPyObject, Py, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
use crate::{
error::PythonErrorExt, storage_options::py_object_to_storage_options_provider, table::Table,
error::PythonErrorExt, namespace::extract_namespace_arc,
storage_options::py_object_to_storage_options_provider, table::Table,
};
#[pyclass]
@@ -121,7 +122,8 @@ impl Connection {
let mode = Self::parse_create_mode_str(mode)?;
let batches = ArrowArrayStreamReader::from_pyarrow_bound(&data)?;
let batches: Box<dyn arrow::array::RecordBatchReader + Send> =
Box::new(ArrowArrayStreamReader::from_pyarrow_bound(&data)?);
let mut builder = inner.create_table(name, batches).mode(mode);
@@ -181,7 +183,8 @@ impl Connection {
})
}
#[pyo3(signature = (name, namespace=vec![], storage_options = None, storage_options_provider=None, index_cache_size = None, location=None))]
#[allow(clippy::too_many_arguments)]
#[pyo3(signature = (name, namespace=vec![], storage_options = None, storage_options_provider=None, index_cache_size = None, location=None, namespace_client=None, managed_versioning=None))]
pub fn open_table(
self_: PyRef<'_, Self>,
name: String,
@@ -190,11 +193,13 @@ impl Connection {
storage_options_provider: Option<Py<PyAny>>,
index_cache_size: Option<u32>,
location: Option<String>,
namespace_client: Option<Py<PyAny>>,
managed_versioning: Option<bool>,
) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.get_inner()?.clone();
let mut builder = inner.open_table(name);
builder = builder.namespace(namespace);
builder = builder.namespace(namespace.clone());
if let Some(storage_options) = storage_options {
builder = builder.storage_options(storage_options);
}
@@ -208,6 +213,20 @@ impl Connection {
if let Some(location) = location {
builder = builder.location(location);
}
// Extract namespace client from Python object if provided
let ns_client = if let Some(ns_obj) = namespace_client {
let py = self_.py();
Some(extract_namespace_arc(py, ns_obj)?)
} else {
None
};
if let Some(ns_client) = ns_client {
builder = builder.namespace_client(ns_client);
}
// Pass managed_versioning if provided to avoid redundant describe_table call
if let Some(enabled) = managed_versioning {
builder = builder.managed_versioning(enabled);
}
future_into_py(self_.py(), async move {
let table = builder.execute().await.infer_error()?;

View File

@@ -2,10 +2,10 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use pyo3::{
PyErr, PyResult, Python,
exceptions::{PyIOError, PyNotImplementedError, PyOSError, PyRuntimeError, PyValueError},
intern,
types::{PyAnyMethods, PyNone},
PyErr, PyResult, Python,
};
use lancedb::error::Error as LanceError;

View File

@@ -3,17 +3,17 @@
use lancedb::index::vector::{IvfFlatIndexBuilder, IvfRqIndexBuilder, IvfSqIndexBuilder};
use lancedb::index::{
Index as LanceDbIndex,
scalar::{BTreeIndexBuilder, FtsIndexBuilder},
vector::{IvfHnswPqIndexBuilder, IvfHnswSqIndexBuilder, IvfPqIndexBuilder},
Index as LanceDbIndex,
};
use pyo3::types::PyStringMethods;
use pyo3::IntoPyObject;
use pyo3::types::PyStringMethods;
use pyo3::{
Bound, FromPyObject, PyAny, PyResult, Python,
exceptions::{PyKeyError, PyValueError},
intern, pyclass, pymethods,
types::PyAnyMethods,
Bound, FromPyObject, PyAny, PyResult, Python,
};
use crate::util::parse_distance_type;
@@ -41,7 +41,12 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
let inner_opts = FtsIndexBuilder::default()
.base_tokenizer(params.base_tokenizer)
.language(&params.language)
.map_err(|_| PyValueError::new_err(format!("LanceDB does not support the requested language: '{}'", params.language)))?
.map_err(|_| {
PyValueError::new_err(format!(
"LanceDB does not support the requested language: '{}'",
params.language
))
})?
.with_position(params.with_position)
.lower_case(params.lower_case)
.max_token_length(params.max_token_length)
@@ -52,7 +57,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
.ngram_max_length(params.ngram_max_length)
.ngram_prefix_only(params.prefix_only);
Ok(LanceDbIndex::FTS(inner_opts))
},
}
"IvfFlat" => {
let params = source.extract::<IvfFlatParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -64,10 +69,11 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_flat_builder = ivf_flat_builder.num_partitions(num_partitions);
}
if let Some(target_partition_size) = params.target_partition_size {
ivf_flat_builder = ivf_flat_builder.target_partition_size(target_partition_size);
ivf_flat_builder =
ivf_flat_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfFlat(ivf_flat_builder))
},
}
"IvfPq" => {
let params = source.extract::<IvfPqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -86,7 +92,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_pq_builder = ivf_pq_builder.num_sub_vectors(num_sub_vectors);
}
Ok(LanceDbIndex::IvfPq(ivf_pq_builder))
},
}
"IvfSq" => {
let params = source.extract::<IvfSqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -101,7 +107,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_sq_builder = ivf_sq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfSq(ivf_sq_builder))
},
}
"IvfRq" => {
let params = source.extract::<IvfRqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -117,7 +123,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
ivf_rq_builder = ivf_rq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfRq(ivf_rq_builder))
},
}
"HnswPq" => {
let params = source.extract::<IvfHnswPqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -138,7 +144,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
hnsw_pq_builder = hnsw_pq_builder.num_sub_vectors(num_sub_vectors);
}
Ok(LanceDbIndex::IvfHnswPq(hnsw_pq_builder))
},
}
"HnswSq" => {
let params = source.extract::<IvfHnswSqParams>()?;
let distance_type = parse_distance_type(params.distance_type)?;
@@ -155,7 +161,7 @@ pub fn extract_index_params(source: &Option<Bound<'_, PyAny>>) -> PyResult<Lance
hnsw_sq_builder = hnsw_sq_builder.target_partition_size(target_partition_size);
}
Ok(LanceDbIndex::IvfHnswSq(hnsw_sq_builder))
},
}
not_supported => Err(PyValueError::new_err(format!(
"Invalid index type '{}'. Must be one of BTree, Bitmap, LabelList, FTS, IvfPq, IvfSq, IvfHnswPq, or IvfHnswSq",
not_supported

View File

@@ -2,14 +2,14 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use arrow::RecordBatchStream;
use connection::{connect, Connection};
use connection::{Connection, connect};
use env_logger::Env;
use index::IndexConfig;
use permutation::{PyAsyncPermutationBuilder, PyPermutationReader};
use pyo3::{
pymodule,
Bound, PyResult, Python, pymodule,
types::{PyModule, PyModuleMethods},
wrap_pyfunction, Bound, PyResult, Python,
wrap_pyfunction,
};
use query::{FTSQuery, HybridQuery, Query, VectorQuery};
use session::Session;
@@ -23,6 +23,7 @@ pub mod connection;
pub mod error;
pub mod header;
pub mod index;
pub mod namespace;
pub mod permutation;
pub mod query;
pub mod session;

746
python/src/namespace.rs Normal file
View File

@@ -0,0 +1,746 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
//! Namespace utilities for Python bindings
use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use bytes::Bytes;
use lance_namespace::LanceNamespace as LanceNamespaceTrait;
use lance_namespace::models::*;
use pyo3::prelude::*;
use pyo3::types::PyDict;
/// Wrapper that allows any Python object implementing LanceNamespace protocol
/// to be used as a Rust LanceNamespace.
///
/// This is similar to PyLanceNamespace in lance's Python bindings - it wraps a Python
/// object and calls back into Python when namespace methods are invoked.
pub struct PyLanceNamespace {
py_namespace: Arc<Py<PyAny>>,
namespace_id: String,
}
impl PyLanceNamespace {
/// Create a new PyLanceNamespace wrapper around a Python namespace object.
pub fn new(_py: Python<'_>, py_namespace: &Bound<'_, PyAny>) -> PyResult<Self> {
let namespace_id = py_namespace
.call_method0("namespace_id")?
.extract::<String>()?;
Ok(Self {
py_namespace: Arc::new(py_namespace.clone().unbind()),
namespace_id,
})
}
/// Create an Arc<dyn LanceNamespace> from a Python namespace object.
pub fn create_arc(
py: Python<'_>,
py_namespace: &Bound<'_, PyAny>,
) -> PyResult<Arc<dyn LanceNamespaceTrait>> {
let wrapper = Self::new(py, py_namespace)?;
Ok(Arc::new(wrapper))
}
}
impl std::fmt::Debug for PyLanceNamespace {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "PyLanceNamespace {{ id: {} }}", self.namespace_id)
}
}
/// Get or create the DictWithModelDump class in Python.
/// This class acts like a dict but also has model_dump() method.
/// This allows it to work with both:
/// - depythonize (which expects a dict/Mapping)
/// - Python code that calls .model_dump() (like DirectoryNamespace wrapper)
fn get_dict_with_model_dump_class(py: Python<'_>) -> PyResult<Bound<'_, PyAny>> {
// Use a module-level cache via __builtins__
let builtins = py.import("builtins")?;
if builtins.hasattr("_DictWithModelDump")? {
return builtins.getattr("_DictWithModelDump");
}
// Create the class using exec
let locals = PyDict::new(py);
py.run(
c"class DictWithModelDump(dict):
def model_dump(self):
return dict(self)",
None,
Some(&locals),
)?;
let class = locals.get_item("DictWithModelDump")?.ok_or_else(|| {
pyo3::exceptions::PyRuntimeError::new_err("Failed to create DictWithModelDump class")
})?;
// Cache it
builtins.setattr("_DictWithModelDump", &class)?;
Ok(class)
}
/// Helper to call a Python namespace method with JSON serialization.
/// For methods that take a request and return a response.
/// Uses DictWithModelDump to pass a dict that also has model_dump() method,
/// making it compatible with both depythonize and Python wrappers.
async fn call_py_method<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: serde::de::DeserializeOwned + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(
format!("Failed to serialize request for {}: {}", method_name, e),
Default::default(),
)
})?;
let response_json = tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump so it works with both depythonize and .model_dump()
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
// Convert response to dict, then to JSON
// Pydantic models have model_dump() method
let result_dict = if result.bind(py).hasattr("model_dump")? {
result.call_method0(py, "model_dump")?
} else {
result
};
let response_json: String = json_module
.call_method1("dumps", (result_dict,))?
.extract()?;
Ok::<_, PyErr>(response_json)
})
})
.await
.map_err(|e| {
lance_core::Error::io(
format!("Task join error for {}: {}", method_name, e),
Default::default(),
)
})?
.map_err(|e: PyErr| {
lance_core::Error::io(
format!("Python error in {}: {}", method_name, e),
Default::default(),
)
})?;
serde_json::from_str(&response_json).map_err(|e| {
lance_core::Error::io(
format!("Failed to deserialize response from {}: {}", method_name, e),
Default::default(),
)
})
}
/// Helper for methods that return () on success
async fn call_py_method_unit<Req>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<()>
where
Req: serde::Serialize + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(
format!("Failed to serialize request for {}: {}", method_name, e),
Default::default(),
)
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
py_namespace.call_method1(py, method_name, (request_arg,))?;
Ok::<_, PyErr>(())
})
})
.await
.map_err(|e| {
lance_core::Error::io(
format!("Task join error for {}: {}", method_name, e),
Default::default(),
)
})?
.map_err(|e: PyErr| {
lance_core::Error::io(
format!("Python error in {}: {}", method_name, e),
Default::default(),
)
})
}
/// Helper for methods that return a primitive type
async fn call_py_method_primitive<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: for<'py> pyo3::FromPyObject<'py> + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(
format!("Failed to serialize request for {}: {}", method_name, e),
Default::default(),
)
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
let value: Resp = result.extract(py)?;
Ok::<_, PyErr>(value)
})
})
.await
.map_err(|e| {
lance_core::Error::io(
format!("Task join error for {}: {}", method_name, e),
Default::default(),
)
})?
.map_err(|e: PyErr| {
lance_core::Error::io(
format!("Python error in {}: {}", method_name, e),
Default::default(),
)
})
}
/// Helper for methods that return Bytes
async fn call_py_method_bytes<Req>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
) -> lance_core::Result<Bytes>
where
Req: serde::Serialize + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(
format!("Failed to serialize request for {}: {}", method_name, e),
Default::default(),
)
})?;
tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Call the Python method
let result = py_namespace.call_method1(py, method_name, (request_arg,))?;
let bytes_data: Vec<u8> = result.extract(py)?;
Ok::<_, PyErr>(Bytes::from(bytes_data))
})
})
.await
.map_err(|e| {
lance_core::Error::io(
format!("Task join error for {}: {}", method_name, e),
Default::default(),
)
})?
.map_err(|e: PyErr| {
lance_core::Error::io(
format!("Python error in {}: {}", method_name, e),
Default::default(),
)
})
}
/// Helper for methods that take request + data and return a response
async fn call_py_method_with_data<Req, Resp>(
py_namespace: Arc<Py<PyAny>>,
method_name: &'static str,
request: Req,
data: Bytes,
) -> lance_core::Result<Resp>
where
Req: serde::Serialize + Send + 'static,
Resp: serde::de::DeserializeOwned + Send + 'static,
{
let request_json = serde_json::to_string(&request).map_err(|e| {
lance_core::Error::io(
format!("Failed to serialize request for {}: {}", method_name, e),
Default::default(),
)
})?;
let response_json = tokio::task::spawn_blocking(move || {
Python::attach(|py| {
let json_module = py.import("json")?;
let request_dict = json_module.call_method1("loads", (&request_json,))?;
// Wrap dict in DictWithModelDump
let dict_class = get_dict_with_model_dump_class(py)?;
let request_arg = dict_class.call1((request_dict,))?;
// Pass request and bytes to Python method
let py_bytes = pyo3::types::PyBytes::new(py, &data);
let result = py_namespace.call_method1(py, method_name, (request_arg, py_bytes))?;
// Convert response dict to JSON
let response_json: String = json_module.call_method1("dumps", (result,))?.extract()?;
Ok::<_, PyErr>(response_json)
})
})
.await
.map_err(|e| {
lance_core::Error::io(
format!("Task join error for {}: {}", method_name, e),
Default::default(),
)
})?
.map_err(|e: PyErr| {
lance_core::Error::io(
format!("Python error in {}: {}", method_name, e),
Default::default(),
)
})?;
serde_json::from_str(&response_json).map_err(|e| {
lance_core::Error::io(
format!("Failed to deserialize response from {}: {}", method_name, e),
Default::default(),
)
})
}
#[async_trait]
impl LanceNamespaceTrait for PyLanceNamespace {
fn namespace_id(&self) -> String {
self.namespace_id.clone()
}
async fn list_namespaces(
&self,
request: ListNamespacesRequest,
) -> lance_core::Result<ListNamespacesResponse> {
call_py_method(self.py_namespace.clone(), "list_namespaces", request).await
}
async fn describe_namespace(
&self,
request: DescribeNamespaceRequest,
) -> lance_core::Result<DescribeNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "describe_namespace", request).await
}
async fn create_namespace(
&self,
request: CreateNamespaceRequest,
) -> lance_core::Result<CreateNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "create_namespace", request).await
}
async fn drop_namespace(
&self,
request: DropNamespaceRequest,
) -> lance_core::Result<DropNamespaceResponse> {
call_py_method(self.py_namespace.clone(), "drop_namespace", request).await
}
async fn namespace_exists(&self, request: NamespaceExistsRequest) -> lance_core::Result<()> {
call_py_method_unit(self.py_namespace.clone(), "namespace_exists", request).await
}
async fn list_tables(
&self,
request: ListTablesRequest,
) -> lance_core::Result<ListTablesResponse> {
call_py_method(self.py_namespace.clone(), "list_tables", request).await
}
async fn describe_table(
&self,
request: DescribeTableRequest,
) -> lance_core::Result<DescribeTableResponse> {
call_py_method(self.py_namespace.clone(), "describe_table", request).await
}
async fn register_table(
&self,
request: RegisterTableRequest,
) -> lance_core::Result<RegisterTableResponse> {
call_py_method(self.py_namespace.clone(), "register_table", request).await
}
async fn table_exists(&self, request: TableExistsRequest) -> lance_core::Result<()> {
call_py_method_unit(self.py_namespace.clone(), "table_exists", request).await
}
async fn drop_table(&self, request: DropTableRequest) -> lance_core::Result<DropTableResponse> {
call_py_method(self.py_namespace.clone(), "drop_table", request).await
}
async fn deregister_table(
&self,
request: DeregisterTableRequest,
) -> lance_core::Result<DeregisterTableResponse> {
call_py_method(self.py_namespace.clone(), "deregister_table", request).await
}
async fn count_table_rows(&self, request: CountTableRowsRequest) -> lance_core::Result<i64> {
call_py_method_primitive(self.py_namespace.clone(), "count_table_rows", request).await
}
async fn create_table(
&self,
request: CreateTableRequest,
request_data: Bytes,
) -> lance_core::Result<CreateTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"create_table",
request,
request_data,
)
.await
}
async fn declare_table(
&self,
request: DeclareTableRequest,
) -> lance_core::Result<DeclareTableResponse> {
call_py_method(self.py_namespace.clone(), "declare_table", request).await
}
async fn insert_into_table(
&self,
request: InsertIntoTableRequest,
request_data: Bytes,
) -> lance_core::Result<InsertIntoTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"insert_into_table",
request,
request_data,
)
.await
}
async fn merge_insert_into_table(
&self,
request: MergeInsertIntoTableRequest,
request_data: Bytes,
) -> lance_core::Result<MergeInsertIntoTableResponse> {
call_py_method_with_data(
self.py_namespace.clone(),
"merge_insert_into_table",
request,
request_data,
)
.await
}
async fn update_table(
&self,
request: UpdateTableRequest,
) -> lance_core::Result<UpdateTableResponse> {
call_py_method(self.py_namespace.clone(), "update_table", request).await
}
async fn delete_from_table(
&self,
request: DeleteFromTableRequest,
) -> lance_core::Result<DeleteFromTableResponse> {
call_py_method(self.py_namespace.clone(), "delete_from_table", request).await
}
async fn query_table(&self, request: QueryTableRequest) -> lance_core::Result<Bytes> {
call_py_method_bytes(self.py_namespace.clone(), "query_table", request).await
}
async fn create_table_index(
&self,
request: CreateTableIndexRequest,
) -> lance_core::Result<CreateTableIndexResponse> {
call_py_method(self.py_namespace.clone(), "create_table_index", request).await
}
async fn list_table_indices(
&self,
request: ListTableIndicesRequest,
) -> lance_core::Result<ListTableIndicesResponse> {
call_py_method(self.py_namespace.clone(), "list_table_indices", request).await
}
async fn describe_table_index_stats(
&self,
request: DescribeTableIndexStatsRequest,
) -> lance_core::Result<DescribeTableIndexStatsResponse> {
call_py_method(
self.py_namespace.clone(),
"describe_table_index_stats",
request,
)
.await
}
async fn describe_transaction(
&self,
request: DescribeTransactionRequest,
) -> lance_core::Result<DescribeTransactionResponse> {
call_py_method(self.py_namespace.clone(), "describe_transaction", request).await
}
async fn alter_transaction(
&self,
request: AlterTransactionRequest,
) -> lance_core::Result<AlterTransactionResponse> {
call_py_method(self.py_namespace.clone(), "alter_transaction", request).await
}
async fn create_table_scalar_index(
&self,
request: CreateTableIndexRequest,
) -> lance_core::Result<CreateTableScalarIndexResponse> {
call_py_method(
self.py_namespace.clone(),
"create_table_scalar_index",
request,
)
.await
}
async fn drop_table_index(
&self,
request: DropTableIndexRequest,
) -> lance_core::Result<DropTableIndexResponse> {
call_py_method(self.py_namespace.clone(), "drop_table_index", request).await
}
async fn list_all_tables(
&self,
request: ListTablesRequest,
) -> lance_core::Result<ListTablesResponse> {
call_py_method(self.py_namespace.clone(), "list_all_tables", request).await
}
async fn restore_table(
&self,
request: RestoreTableRequest,
) -> lance_core::Result<RestoreTableResponse> {
call_py_method(self.py_namespace.clone(), "restore_table", request).await
}
async fn rename_table(
&self,
request: RenameTableRequest,
) -> lance_core::Result<RenameTableResponse> {
call_py_method(self.py_namespace.clone(), "rename_table", request).await
}
async fn list_table_versions(
&self,
request: ListTableVersionsRequest,
) -> lance_core::Result<ListTableVersionsResponse> {
call_py_method(self.py_namespace.clone(), "list_table_versions", request).await
}
async fn create_table_version(
&self,
request: CreateTableVersionRequest,
) -> lance_core::Result<CreateTableVersionResponse> {
call_py_method(self.py_namespace.clone(), "create_table_version", request).await
}
async fn describe_table_version(
&self,
request: DescribeTableVersionRequest,
) -> lance_core::Result<DescribeTableVersionResponse> {
call_py_method(self.py_namespace.clone(), "describe_table_version", request).await
}
async fn batch_delete_table_versions(
&self,
request: BatchDeleteTableVersionsRequest,
) -> lance_core::Result<BatchDeleteTableVersionsResponse> {
call_py_method(
self.py_namespace.clone(),
"batch_delete_table_versions",
request,
)
.await
}
async fn update_table_schema_metadata(
&self,
request: UpdateTableSchemaMetadataRequest,
) -> lance_core::Result<UpdateTableSchemaMetadataResponse> {
call_py_method(
self.py_namespace.clone(),
"update_table_schema_metadata",
request,
)
.await
}
async fn get_table_stats(
&self,
request: GetTableStatsRequest,
) -> lance_core::Result<GetTableStatsResponse> {
call_py_method(self.py_namespace.clone(), "get_table_stats", request).await
}
async fn explain_table_query_plan(
&self,
request: ExplainTableQueryPlanRequest,
) -> lance_core::Result<String> {
call_py_method_primitive(
self.py_namespace.clone(),
"explain_table_query_plan",
request,
)
.await
}
async fn analyze_table_query_plan(
&self,
request: AnalyzeTableQueryPlanRequest,
) -> lance_core::Result<String> {
call_py_method_primitive(
self.py_namespace.clone(),
"analyze_table_query_plan",
request,
)
.await
}
async fn alter_table_add_columns(
&self,
request: AlterTableAddColumnsRequest,
) -> lance_core::Result<AlterTableAddColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_add_columns",
request,
)
.await
}
async fn alter_table_alter_columns(
&self,
request: AlterTableAlterColumnsRequest,
) -> lance_core::Result<AlterTableAlterColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_alter_columns",
request,
)
.await
}
async fn alter_table_drop_columns(
&self,
request: AlterTableDropColumnsRequest,
) -> lance_core::Result<AlterTableDropColumnsResponse> {
call_py_method(
self.py_namespace.clone(),
"alter_table_drop_columns",
request,
)
.await
}
async fn list_table_tags(
&self,
request: ListTableTagsRequest,
) -> lance_core::Result<ListTableTagsResponse> {
call_py_method(self.py_namespace.clone(), "list_table_tags", request).await
}
async fn create_table_tag(
&self,
request: CreateTableTagRequest,
) -> lance_core::Result<CreateTableTagResponse> {
call_py_method(self.py_namespace.clone(), "create_table_tag", request).await
}
async fn delete_table_tag(
&self,
request: DeleteTableTagRequest,
) -> lance_core::Result<DeleteTableTagResponse> {
call_py_method(self.py_namespace.clone(), "delete_table_tag", request).await
}
async fn update_table_tag(
&self,
request: UpdateTableTagRequest,
) -> lance_core::Result<UpdateTableTagResponse> {
call_py_method(self.py_namespace.clone(), "update_table_tag", request).await
}
async fn get_table_tag_version(
&self,
request: GetTableTagVersionRequest,
) -> lance_core::Result<GetTableTagVersionResponse> {
call_py_method(self.py_namespace.clone(), "get_table_tag_version", request).await
}
}
/// Convert Python dict to HashMap<String, String>
#[allow(dead_code)]
fn dict_to_hashmap(dict: &Bound<'_, PyDict>) -> PyResult<HashMap<String, String>> {
let mut map = HashMap::new();
for (key, value) in dict.iter() {
let key_str: String = key.extract()?;
let value_str: String = value.extract()?;
map.insert(key_str, value_str);
}
Ok(map)
}
/// Extract an Arc<dyn LanceNamespace> from a Python namespace object.
///
/// This function wraps any Python namespace object with PyLanceNamespace.
/// The PyLanceNamespace wrapper uses DictWithModelDump to pass requests,
/// which works with both:
/// - Native namespaces (DirectoryNamespace, RestNamespace) that use depythonize (expects dict)
/// - Custom Python implementations that call .model_dump() on the request
pub fn extract_namespace_arc(
py: Python<'_>,
ns: Py<PyAny>,
) -> PyResult<Arc<dyn LanceNamespaceTrait>> {
let ns_ref = ns.bind(py);
PyLanceNamespace::create_arc(py, ns_ref)
}

View File

@@ -6,7 +6,7 @@ use std::sync::{Arc, Mutex};
use crate::{
arrow::RecordBatchStream, connection::Connection, error::PythonErrorExt, table::Table,
};
use arrow::pyarrow::ToPyArrow;
use arrow::pyarrow::{PyArrowType, ToPyArrow};
use lancedb::{
dataloader::permutation::{
builder::{PermutationBuilder as LancePermutationBuilder, ShuffleStrategy},
@@ -16,17 +16,32 @@ use lancedb::{
query::Select,
};
use pyo3::{
Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
exceptions::PyRuntimeError,
pyclass, pymethods,
types::{PyAnyMethods, PyDict, PyDictMethods, PyType},
Bound, PyAny, PyRef, PyRefMut, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
fn table_from_py<'a>(table: Bound<'a, PyAny>) -> PyResult<Bound<'a, Table>> {
if table.hasattr("_inner")? {
Ok(table.getattr("_inner")?.downcast_into::<Table>()?)
} else if table.hasattr("_table")? {
Ok(table
.getattr("_table")?
.getattr("_inner")?
.downcast_into::<Table>()?)
} else {
Err(PyRuntimeError::new_err(
"Provided table does not appear to be a Table or RemoteTable instance",
))
}
}
/// Create a permutation builder for the given table
#[pyo3::pyfunction]
pub fn async_permutation_builder(table: Bound<'_, PyAny>) -> PyResult<PyAsyncPermutationBuilder> {
let table = table.getattr("_inner")?.downcast_into::<Table>()?;
let table = table_from_py(table)?;
let inner_table = table.borrow().inner_ref()?.clone();
let inner_builder = LancePermutationBuilder::new(inner_table);
@@ -250,10 +265,8 @@ impl PyPermutationReader {
permutation_table: Option<Bound<'py, PyAny>>,
split: u64,
) -> PyResult<Bound<'py, PyAny>> {
let base_table = base_table.getattr("_inner")?.downcast_into::<Table>()?;
let permutation_table = permutation_table
.map(|p| PyResult::Ok(p.getattr("_inner")?.downcast_into::<Table>()?))
.transpose()?;
let base_table = table_from_py(base_table)?;
let permutation_table = permutation_table.map(table_from_py).transpose()?;
let base_table = base_table.borrow().inner_ref()?.base_table().clone();
let permutation_table = permutation_table
@@ -328,4 +341,21 @@ impl PyPermutationReader {
Ok(RecordBatchStream::new(stream))
})
}
#[pyo3(signature = (indices, *, selection=None))]
pub fn take_offsets<'py>(
slf: PyRef<'py, Self>,
indices: Vec<u64>,
selection: Option<Bound<'py, PyAny>>,
) -> PyResult<Bound<'py, PyAny>> {
let selection = Self::parse_selection(selection)?;
let reader = slf.reader.clone();
future_into_py(slf.py(), async move {
let batch = reader
.take_offsets(&indices, selection)
.await
.infer_error()?;
Ok(PyArrowType(batch))
})
}
}

View File

@@ -4,9 +4,9 @@
use std::sync::Arc;
use std::time::Duration;
use arrow::array::make_array;
use arrow::array::Array;
use arrow::array::ArrayData;
use arrow::array::make_array;
use arrow::pyarrow::FromPyArrow;
use arrow::pyarrow::IntoPyArrow;
use arrow::pyarrow::ToPyArrow;
@@ -22,23 +22,23 @@ use lancedb::query::{
VectorQuery as LanceDbVectorQuery,
};
use lancedb::table::AnyQuery;
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pyfunction;
use pyo3::pymethods;
use pyo3::types::PyList;
use pyo3::types::{PyDict, PyString};
use pyo3::Bound;
use pyo3::IntoPyObject;
use pyo3::PyAny;
use pyo3::PyRef;
use pyo3::PyResult;
use pyo3::Python;
use pyo3::{exceptions::PyRuntimeError, FromPyObject};
use pyo3::prelude::{PyAnyMethods, PyDictMethods};
use pyo3::pyfunction;
use pyo3::pymethods;
use pyo3::types::PyList;
use pyo3::types::{PyDict, PyString};
use pyo3::{FromPyObject, exceptions::PyRuntimeError};
use pyo3::{PyErr, pyclass};
use pyo3::{
exceptions::{PyNotImplementedError, PyValueError},
intern,
};
use pyo3::{pyclass, PyErr};
use pyo3_async_runtimes::tokio::future_into_py;
use crate::util::parse_distance_type;

View File

@@ -4,7 +4,7 @@
use std::sync::Arc;
use lancedb::{ObjectStoreRegistry, Session as LanceSession};
use pyo3::{pyclass, pymethods, PyResult};
use pyo3::{PyResult, pyclass, pymethods};
/// A session for managing caches and object stores across LanceDB operations.
///

View File

@@ -5,8 +5,9 @@ use std::{collections::HashMap, sync::Arc};
use crate::{
connection::Connection,
error::PythonErrorExt,
index::{extract_index_params, IndexConfig},
index::{IndexConfig, extract_index_params},
query::{Query, TakeQuery},
table::scannable::PyScannable,
};
use arrow::{
datatypes::{DataType, Schema},
@@ -18,13 +19,15 @@ use lancedb::table::{
Table as LanceDbTable,
};
use pyo3::{
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
exceptions::{PyKeyError, PyRuntimeError, PyValueError},
pyclass, pymethods,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
mod scannable;
/// Statistics about a compaction operation.
#[pyclass(get_all)]
#[derive(Clone, Debug)]
@@ -109,19 +112,24 @@ impl From<lancedb::table::AddResult> for AddResult {
#[pyclass(get_all)]
#[derive(Clone, Debug)]
pub struct DeleteResult {
pub num_deleted_rows: u64,
pub version: u64,
}
#[pymethods]
impl DeleteResult {
pub fn __repr__(&self) -> String {
format!("DeleteResult(version={})", self.version)
format!(
"DeleteResult(num_deleted_rows={}, version={})",
self.num_deleted_rows, self.version
)
}
}
impl From<lancedb::table::DeleteResult> for DeleteResult {
fn from(result: lancedb::table::DeleteResult) -> Self {
Self {
num_deleted_rows: result.num_deleted_rows,
version: result.version,
}
}
@@ -293,11 +301,10 @@ impl Table {
pub fn add<'a>(
self_: PyRef<'a, Self>,
data: Bound<'_, PyAny>,
data: PyScannable,
mode: String,
) -> PyResult<Bound<'a, PyAny>> {
let batches = ArrowArrayStreamReader::from_pyarrow_bound(&data)?;
let mut op = self_.inner_ref()?.add(batches);
let mut op = self_.inner_ref()?.add(data);
if mode == "append" {
op = op.mode(AddDataMode::Append);
} else if mode == "overwrite" {
@@ -535,7 +542,7 @@ impl Table {
let inner = self_.inner_ref()?.clone();
future_into_py(self_.py(), async move {
let versions = inner.list_versions().await.infer_error()?;
let versions_as_dict = Python::attach(|py| {
Python::attach(|py| {
versions
.iter()
.map(|v| {
@@ -552,9 +559,7 @@ impl Table {
Ok(dict.unbind())
})
.collect::<PyResult<Vec<_>>>()
});
versions_as_dict
})
})
}

View File

@@ -0,0 +1,145 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::sync::Arc;
use arrow::{
datatypes::{Schema, SchemaRef},
ffi_stream::ArrowArrayStreamReader,
pyarrow::{FromPyArrow, PyArrowType},
};
use futures::StreamExt;
use lancedb::{
Error,
arrow::{SendableRecordBatchStream, SimpleRecordBatchStream},
data::scannable::Scannable,
};
use pyo3::{FromPyObject, Py, PyAny, Python, types::PyAnyMethods};
/// Adapter that implements Scannable for a Python reader factory callable.
///
/// This holds a Python callable that returns a RecordBatchReader when called.
/// For rescannable sources, the callable can be invoked multiple times to
/// get fresh readers.
pub struct PyScannable {
/// Python callable that returns a RecordBatchReader
reader_factory: Py<PyAny>,
schema: SchemaRef,
num_rows: Option<usize>,
rescannable: bool,
}
impl std::fmt::Debug for PyScannable {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("PyScannable")
.field("schema", &self.schema)
.field("num_rows", &self.num_rows)
.field("rescannable", &self.rescannable)
.finish()
}
}
impl Scannable for PyScannable {
fn schema(&self) -> SchemaRef {
self.schema.clone()
}
fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
let reader: Result<ArrowArrayStreamReader, Error> = {
Python::attach(|py| {
let result =
self.reader_factory
.call0(py)
.map_err(|e| lancedb::Error::Runtime {
message: format!("Python reader factory failed: {}", e),
})?;
ArrowArrayStreamReader::from_pyarrow_bound(result.bind(py)).map_err(|e| {
lancedb::Error::Runtime {
message: format!("Failed to create Arrow reader from Python: {}", e),
}
})
})
};
// Reader is blocking but stream is non-blocking, so we need to spawn a task to pull.
let (tx, rx) = tokio::sync::mpsc::channel(1);
let join_handle = tokio::task::spawn_blocking(move || {
let reader = match reader {
Ok(reader) => reader,
Err(e) => {
let _ = tx.blocking_send(Err(e));
return;
}
};
for batch in reader {
match batch {
Ok(batch) => {
if tx.blocking_send(Ok(batch)).is_err() {
// Receiver dropped, stop processing
break;
}
}
Err(source) => {
let _ = tx.blocking_send(Err(Error::Arrow { source }));
break;
}
}
}
});
let schema = self.schema.clone();
let stream = futures::stream::unfold(
(rx, Some(join_handle)),
|(mut rx, join_handle)| async move {
match rx.recv().await {
Some(Ok(batch)) => Some((Ok(batch), (rx, join_handle))),
Some(Err(e)) => Some((Err(e), (rx, join_handle))),
None => {
// Channel closed. Check if the task panicked — a panic
// drops the sender without sending an error, so without
// this check we'd silently return a truncated stream.
if let Some(handle) = join_handle
&& let Err(join_err) = handle.await
{
return Some((
Err(Error::Runtime {
message: format!("Reader task panicked: {}", join_err),
}),
(rx, None),
));
}
None
}
}
},
);
Box::pin(SimpleRecordBatchStream::new(stream.fuse(), schema))
}
fn num_rows(&self) -> Option<usize> {
self.num_rows
}
fn rescannable(&self) -> bool {
self.rescannable
}
}
impl<'py> FromPyObject<'py> for PyScannable {
fn extract_bound(ob: &pyo3::Bound<'py, PyAny>) -> pyo3::PyResult<Self> {
// Convert from Scannable dataclass.
let schema: PyArrowType<Schema> = ob.getattr("schema")?.extract()?;
let schema = Arc::new(schema.0);
let num_rows: Option<usize> = ob.getattr("num_rows")?.extract()?;
let rescannable: bool = ob.getattr("rescannable")?.extract()?;
let reader_factory: Py<PyAny> = ob.getattr("reader")?.unbind();
Ok(Self {
schema,
reader_factory,
num_rows,
rescannable,
})
}
}

View File

@@ -5,8 +5,9 @@ use std::sync::Mutex;
use lancedb::DistanceType;
use pyo3::{
PyResult,
exceptions::{PyRuntimeError, PyValueError},
pyfunction, PyResult,
pyfunction,
};
/// A wrapper around a rust builder

5349
python/uv.lock generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -1,2 +1,2 @@
[toolchain]
channel = "1.90.0"
channel = "1.91.0"

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.25.0-beta.0"
version = "0.27.0-beta.3"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true
@@ -25,7 +25,9 @@ datafusion-catalog.workspace = true
datafusion-common.workspace = true
datafusion-execution.workspace = true
datafusion-expr.workspace = true
datafusion-functions.workspace = true
datafusion-physical-expr.workspace = true
datafusion-sql.workspace = true
datafusion-physical-plan.workspace = true
datafusion.workspace = true
object_store = { workspace = true }

View File

@@ -3,17 +3,15 @@
use std::{iter::once, sync::Arc};
use arrow_array::{Float64Array, Int32Array, RecordBatch, RecordBatchIterator, StringArray};
use arrow_array::{Float64Array, Int32Array, RecordBatch, StringArray};
use arrow_schema::{DataType, Field, Schema};
use aws_config::Region;
use aws_sdk_bedrockruntime::Client;
use futures::StreamExt;
use lancedb::{
arrow::IntoArrow,
connect,
embeddings::{bedrock::BedrockEmbeddingFunction, EmbeddingDefinition, EmbeddingFunction},
Result, connect,
embeddings::{EmbeddingDefinition, EmbeddingFunction, bedrock::BedrockEmbeddingFunction},
query::{ExecutableQuery, QueryBase},
Result,
};
#[tokio::main]
@@ -67,7 +65,7 @@ async fn main() -> Result<()> {
Ok(())
}
fn make_data() -> impl IntoArrow {
fn make_data() -> RecordBatch {
let schema = Schema::new(vec![
Field::new("id", DataType::Int32, true),
Field::new("text", DataType::Utf8, false),
@@ -83,10 +81,9 @@ fn make_data() -> impl IntoArrow {
]);
let price = Float64Array::from(vec![10.0, 50.0, 100.0, 30.0]);
let schema = Arc::new(schema);
let rb = RecordBatch::try_new(
RecordBatch::try_new(
schema.clone(),
vec![Arc::new(id), Arc::new(text), Arc::new(price)],
)
.unwrap();
Box::new(RecordBatchIterator::new(vec![Ok(rb)], schema))
.unwrap()
}

View File

@@ -3,16 +3,17 @@
use std::sync::Arc;
use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator, RecordBatchReader, StringArray};
use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator, StringArray};
use arrow_schema::{DataType, Field, Schema};
use futures::TryStreamExt;
use lance_index::scalar::FullTextSearchQuery;
use lancedb::connection::Connection;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::index::Index;
use lancedb::index::scalar::FtsIndexBuilder;
use lancedb::query::{ExecutableQuery, QueryBase};
use lancedb::{connect, Result, Table};
use lancedb::{Result, Table, connect};
use rand::random;
#[tokio::main]
@@ -29,7 +30,7 @@ async fn main() -> Result<()> {
Ok(())
}
fn create_some_records() -> Result<Box<dyn RecordBatchReader + Send>> {
fn create_some_records() -> Result<Box<dyn arrow_array::RecordBatchReader + Send>> {
const TOTAL: usize = 1000;
let schema = Arc::new(Schema::new(vec![
@@ -45,19 +46,21 @@ fn create_some_records() -> Result<Box<dyn RecordBatchReader + Send>> {
.collect::<Vec<_>>();
let n_terms = 3;
let batches = RecordBatchIterator::new(
vec![RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
(0..n_terms)
.map(|_| words[random::<u32>() as usize % words.len()])
.collect::<Vec<_>>()
.join(" ")
}))),
],
)
.unwrap()]
vec![
RecordBatch::try_new(
schema.clone(),
vec![
Arc::new(Int32Array::from_iter_values(0..TOTAL as i32)),
Arc::new(StringArray::from_iter_values((0..TOTAL).map(|_| {
(0..n_terms)
.map(|_| words[random::<u32>() as usize % words.len()])
.collect::<Vec<_>>()
.join(" ")
}))),
],
)
.unwrap(),
]
.into_iter()
.map(Ok),
schema.clone(),
@@ -66,7 +69,7 @@ fn create_some_records() -> Result<Box<dyn RecordBatchReader + Send>> {
}
async fn create_table(db: &Connection) -> Result<Table> {
let initial_data: Box<dyn RecordBatchReader + Send> = create_some_records()?;
let initial_data = create_some_records()?;
let tbl = db.create_table("my_table", initial_data).execute().await?;
Ok(tbl)
}

Some files were not shown because too many files have changed in this diff Show More