Compare commits

..

35 Commits

Author SHA1 Message Date
Lance Release
745c34a6a9 Bump version: 0.22.1-beta.6 → 0.22.1 2025-05-22 05:57:20 +00:00
Lance Release
db8fa2454d Bump version: 0.22.1-beta.5 → 0.22.1-beta.6 2025-05-22 05:57:20 +00:00
Lei Xu
a67a7b4b42 chore: use stable lance (#2398)

## Summary by CodeRabbit

- **Chores**
- Updated workspace dependencies to use a stable release version for
improved consistency and reliability. No changes to application features
or functionality.

2025-05-21 22:34:29 -07:00
Lei Xu
496846e532 chore: bump lance version (#2397)
- Bump the Lance version and prepare a new release.
- Bump the Rust toolchain to 1.86, because GHA Ubuntu no longer provides
`cargo-fmt` for 1.83
2025-05-21 14:15:55 -07:00
Ayush Chaurasia
dadcfebf8e docs: add logos in genkit docs page (#2391)

## Summary by CodeRabbit

- **Documentation**
- Added an integration banner image to the beginning of the
Genkitx-LanceDB documentation.

2025-05-20 01:40:12 +05:30
Lance Release
67033dbd7f Updating package-lock.json 2025-05-16 00:25:41 +00:00
Lance Release
05a85cfc2a Updating package-lock.json 2025-05-15 23:44:27 +00:00
Lance Release
40c5d3d72b Updating package-lock.json 2025-05-15 23:44:10 +00:00
Lance Release
198f0f80c6 Bump version: 0.19.1-beta.4 → 0.19.1-beta.5 2025-05-15 23:43:32 +00:00
Lance Release
e3f2fd3892 Bump version: 0.22.1-beta.4 → 0.22.1-beta.5 2025-05-15 23:42:46 +00:00
Wyatt Alt
f401ccc599 chore: update lance to 0.27.1-beta.1 (#2388)
This is for fe14671f1

## Summary by CodeRabbit

- **Chores**
- Updated internal dependencies to newer versions for improved stability
and performance. No changes to features or functionality.
2025-05-15 16:09:01 -07:00
Ayush Chaurasia
81b59139f8 docs: add genkit integration docs (#2383)

## Summary by CodeRabbit

- **Documentation**
- Added a comprehensive guide for integrating LanceDB with Genkit,
including installation instructions, setup, indexing, retrieval, and
building a custom RAG pipeline with example code and screenshots.
- Updated the documentation navigation to include the new Genkit
integration, making it accessible from the site menu.

2025-05-12 18:18:07 +05:30
ayush chaurasia
1026781ab6 Revert "update"
This reverts commit 9c699b8cd9.
2025-05-11 21:04:59 +05:30
ayush chaurasia
9c699b8cd9 update 2025-05-11 21:01:53 +05:30
Lance Release
34bec59bc3 Updating package-lock.json 2025-05-08 21:34:37 +00:00
Lance Release
a5fbbf0d66 Updating package-lock.json 2025-05-08 20:20:18 +00:00
Lance Release
b42721167b Updating package-lock.json 2025-05-08 20:20:00 +00:00
Lance Release
543dec9ff0 Bump version: 0.19.1-beta.3 → 0.19.1-beta.4 2025-05-08 20:19:17 +00:00
Lance Release
04f962f6b0 Bump version: 0.22.1-beta.3 → 0.22.1-beta.4 2025-05-08 20:18:40 +00:00
LuQQiu
19e896ff69 chore: add default for result structs (#2377)
Add defaults for the result structs; when values are not provided, the
default values are used.


## Summary by CodeRabbit

- **Chores**
- Improved internal handling of table operation results to support
default values. No changes to user-facing features or functionality.

2025-05-08 13:09:11 -07:00
Will Jones
272e4103b2 feat: provide timeout parameter for merge_insert (#2378)
Provides the ability to set a timeout for merge insert. By default, the first
attempt may take as long as it needs; if there are multiple attempts, an
overall 30-second timeout applies. This has two use cases:

1. Make the timeout shorter, when you want to fail if it takes too long.
2. Allow taking more time to do retries.
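A minimal Python sketch of the new `timeout` parameter (the database path, table, and data here are illustrative):

```python
from datetime import timedelta

import lancedb

db = lancedb.connect("/tmp/example-db")   # illustrative path
table = db.create_table("items", [{"a": 1, "b": "a"}])

new_data = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]

# Enforce a 10-second limit on every attempt, including the first.
result = (
    table.merge_insert("a")
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute(new_data, timeout=timedelta(seconds=10))
)
print(result.version)  # new table version after the merge insert
```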

## Summary by CodeRabbit

- **New Features**
- Added support for specifying a timeout when performing merge insert
operations in Python, Node.js, and Rust APIs.
- Introduced a new option to control the maximum allowed execution time
for merge inserts, including retry timeout handling.

- **Documentation**
- Updated and added documentation to describe the new timeout option and
its usage in APIs.

- **Tests**
- Added and updated tests to verify correct timeout behavior during
merge insert operations.
2025-05-08 13:07:05 -07:00
Wyatt Alt
75c257ebb6 fix: return IndexNotExist on remote drop index 404 (#2380)
Prior to this commit, attempting to drop an index that did not exist
would return a TableNotFound error stating that the target table does
not exist -- even when it did exist. Instead, we now return an
IndexNotFound error.


## Summary by CodeRabbit

- **Bug Fixes**
- Improved error handling when attempting to drop a non-existent index,
providing a more accurate error message.
- **Tests**
- Added a test to verify correct error reporting when dropping an index
that does not exist.

2025-05-07 17:24:05 -07:00
Wyatt Alt
9ee152eb42 fix: support __len__ on remote table (#2379)
This moves the __len__ method from LanceTable and RemoteTable to Table
so that child classes don't need to implement their own. In the process,
it fixes the implementation of RemoteTable's length method, which was
previously missing a return statement.
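A small Python sketch of the effect (table contents are illustrative); `len()` now works uniformly because `__len__` lives on the `Table` base class and delegates to `count_rows()`:

```python
import lancedb

db = lancedb.connect("/tmp/example-db")   # illustrative path
table = db.create_table("items", [{"id": 1}, {"id": 2}])

# Works for LanceTable and RemoteTable alike; equivalent to count_rows().
assert len(table) == table.count_rows() == 2
```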


## Summary by CodeRabbit

- **Refactor**
- Centralized the table length functionality in the base table class,
simplifying subclass behavior.
- Removed redundant or non-functional length methods from specific table
classes.

- **Tests**
- Added a new test to verify correct table length reporting for remote
tables.

2025-05-07 17:23:39 -07:00
LuQQiu
c9ae1b1737 fix: add restore with tag in python and nodejs API (#2374)
Add a restore-with-tag API to the Python and Node.js APIs, and add tests to
guard it.
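A brief Python sketch of restoring by tag (names and data are illustrative):

```python
import lancedb

db = lancedb.connect("/tmp/example-db")   # illustrative path
table = db.create_table("my_table", [{"vector": [1.1, 0.9], "type": "vector"}])

table.tags.create("v1", 1)                             # tag version 1
table.add([{"vector": [0.5, 0.2], "type": "vector"}])  # creates version 2

table.restore("v1")            # restore by tag instead of a numeric version
assert table.count_rows() == 1

# Restoring with an unknown tag raises a ValueError:
# table.restore("unknown_tag")
```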

## Summary by CodeRabbit

- **New Features**
- The restore functionality now supports using version tags in addition
to numeric version identifiers, allowing you to revert tables to a state
marked by a tag.
- **Bug Fixes**
  - Restoring with an unknown tag now properly raises an error.
- **Documentation**
- Updated documentation and examples to clarify that restore accepts
both version numbers and tags.
- **Tests**
- Added new tests to verify restore behavior with version tags and error
handling for unknown tags.
  - Added tests for checkout and restore operations involving tags.
2025-05-06 16:12:58 -07:00
Lance Release
89dc80c42a Updating package-lock.json 2025-05-06 03:53:49 +00:00
Wyatt Alt
7b020ac799 chore: run cargo update (#2376) 2025-05-05 20:26:42 -07:00
Lance Release
529e774bbb Updating package-lock.json 2025-05-06 02:45:45 +00:00
Lance Release
7c12239305 Updating package-lock.json 2025-05-06 02:45:29 +00:00
Lance Release
d83424d6b4 Bump version: 0.19.1-beta.2 → 0.19.1-beta.3 2025-05-06 02:45:06 +00:00
Lance Release
8bf89f887c Bump version: 0.22.1-beta.2 → 0.22.1-beta.3 2025-05-06 02:44:39 +00:00
LuQQiu
b2160b2304 fix: fix backward compatibility with the add API (#2375)
The add API originally returned a struct with a request_id; add backward
compatibility for that.

## Summary by CodeRabbit

- **Bug Fixes**
- Improved handling of empty server responses for various data
operations to ensure consistent behavior across server versions.
- Added default values to version and numeric fields to prevent errors
when response data is incomplete.

- **Tests**
- Expanded tests to cover multiple server response scenarios, validating
correct version handling in data operations.
2025-05-05 19:26:27 -07:00
Lance Release
1bb82597be Updating package-lock.json 2025-05-06 01:21:13 +00:00
Lance Release
e4eee38b3c Updating package-lock.json 2025-05-06 00:09:39 +00:00
Lance Release
64fc2be503 Updating package-lock.json 2025-05-06 00:09:19 +00:00
Lance Release
dc8054e90d Bump version: 0.19.1-beta.1 → 0.19.1-beta.2 2025-05-06 00:08:55 +00:00
43 changed files with 686 additions and 235 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.19.1-beta.1"
current_version = "0.19.1-beta.5"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -40,6 +40,9 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust

Cargo.lock (generated, 193 changed lines)
View File

@@ -223,7 +223,7 @@ dependencies = [
"chrono",
"chrono-tz 0.10.3",
"half",
"hashbrown 0.15.2",
"hashbrown 0.15.3",
"num",
]
@@ -602,9 +602,9 @@ dependencies = [
[[package]]
name = "aws-sdk-bedrockruntime"
version = "1.85.0"
version = "1.86.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6f6c003cd82739447a18d7616468b047341c125efff11fdafc77a5e777a861c9"
checksum = "db14a0566037a6c686ef075c406dec4b067537af3d76950522e9e89848ce7a5a"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -628,9 +628,9 @@ dependencies = [
[[package]]
name = "aws-sdk-dynamodb"
version = "1.72.1"
version = "1.73.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b14d5b5d6849d1caa7b404ea57cbe25ed8ba25c3c7d47f45bcbd5b51e098ceac"
checksum = "8d954f3581bd7254f42bbaa3a21dfd99d40a14d82a324d2012b8f3ea0d15f12b"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -651,9 +651,9 @@ dependencies = [
[[package]]
name = "aws-sdk-kms"
version = "1.66.0"
version = "1.67.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "655097cd83ab1f15575890943135192560f77097413c6dd1733fdbdc453e81ac"
checksum = "2b650cf9e1e153ab13acd3aa1f73b271dac14e019353ec0b0c176f24a21bad03"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -674,9 +674,9 @@ dependencies = [
[[package]]
name = "aws-sdk-s3"
version = "1.83.0"
version = "1.84.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51384750334005f40e1a334b0d54eca822a77eacdcf3c50fdf38f583c5eee7a2"
checksum = "2111975ef21dc06542918479df0df861b273eb8d99e6bb987da469b546dce32c"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -709,9 +709,9 @@ dependencies = [
[[package]]
name = "aws-sdk-sso"
version = "1.65.0"
version = "1.66.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8efec445fb78df585327094fcef4cad895b154b58711e504db7a93c41aa27151"
checksum = "858007b14d0f1ade2e0124473c2126b24d334dc9486ad12eb7c0ed14757be464"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -732,9 +732,9 @@ dependencies = [
[[package]]
name = "aws-sdk-ssooidc"
version = "1.66.0"
version = "1.67.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5e49cca619c10e7b002dc8e66928ceed66ab7f56c1a3be86c5437bf2d8d89bba"
checksum = "b83abf3ae8bd10a014933cc2383964a12ca5a3ebbe1948ad26b1b808e7d0d1f2"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -755,9 +755,9 @@ dependencies = [
[[package]]
name = "aws-sdk-sts"
version = "1.66.0"
version = "1.67.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7420479eac0a53f776cc8f0d493841ffe58ad9d9783f3947be7265784471b47a"
checksum = "74e8e9ac4a837859c8f1d747054172e1e55933f02ed34728b0b34dea0591ec84"
dependencies = [
"aws-credential-types",
"aws-runtime",
@@ -879,7 +879,7 @@ dependencies = [
"aws-smithy-async",
"aws-smithy-runtime-api",
"aws-smithy-types",
"h2 0.4.9",
"h2 0.4.10",
"http 0.2.12",
"http 1.3.1",
"http-body 0.4.6",
@@ -890,7 +890,7 @@ dependencies = [
"hyper-util",
"pin-project-lite",
"rustls 0.21.12",
"rustls 0.23.26",
"rustls 0.23.27",
"rustls-native-certs 0.8.1",
"rustls-pki-types",
"tokio",
@@ -1326,9 +1326,9 @@ dependencies = [
[[package]]
name = "cc"
version = "1.2.20"
version = "1.2.21"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "04da6a0d40b948dfc4fa8f5bbf402b0fc1a64a28dbf7d12ffd683550f2c1b63a"
checksum = "8691782945451c1c383942c4874dbe63814f61cb57ef773cda2972682b7bb3c0"
dependencies = [
"jobserver",
"libc",
@@ -2737,8 +2737,9 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1b054c0eaa0f92df393e53cb42e3cc01e6f73bc601252f683eb63ddcc552f3ff"
dependencies = [
"rand 0.8.5",
]
@@ -3065,9 +3066,9 @@ dependencies = [
[[package]]
name = "h2"
version = "0.4.9"
version = "0.4.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "75249d144030531f8dee69fe9cea04d3edf809a017ae445e2abdff6629e86633"
checksum = "a9421a676d1b147b16b82c9225157dc629087ef8ec4d5e2960f9437a90dac0a5"
dependencies = [
"atomic-waker",
"bytes",
@@ -3115,9 +3116,9 @@ dependencies = [
[[package]]
name = "hashbrown"
version = "0.15.2"
version = "0.15.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf151400ff0baff5465007dd2f3e717f3fe502074ca563069ce3a6629d07b289"
checksum = "84b26c544d002229e640969970a2e74021aadf6e2f96372b9c58eff97de08eb3"
dependencies = [
"allocator-api2",
"equivalent",
@@ -3302,7 +3303,7 @@ dependencies = [
"bytes",
"futures-channel",
"futures-util",
"h2 0.4.9",
"h2 0.4.10",
"http 1.3.1",
"http-body 1.0.1",
"httparse",
@@ -3339,7 +3340,7 @@ dependencies = [
"http 1.3.1",
"hyper 1.6.0",
"hyper-util",
"rustls 0.23.26",
"rustls 0.23.27",
"rustls-native-certs 0.8.1",
"rustls-pki-types",
"tokio",
@@ -3564,7 +3565,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cea70ddb795996207ad57735b50c5982d8844f38ba9ee5f1aedcfb708a2aa11e"
dependencies = [
"equivalent",
"hashbrown 0.15.2",
"hashbrown 0.15.3",
"serde",
]
@@ -3661,9 +3662,9 @@ checksum = "9028f49264629065d057f340a86acb84867925865f73bbf8d47b4d149a7e88b8"
[[package]]
name = "jiff"
version = "0.2.10"
version = "0.2.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5a064218214dc6a10fbae5ec5fa888d80c45d611aba169222fc272072bf7aef6"
checksum = "f02000660d30638906021176af16b17498bd0d12813dbfe7b276d8bc7f3c0806"
dependencies = [
"jiff-static",
"log",
@@ -3674,9 +3675,9 @@ dependencies = [
[[package]]
name = "jiff-static"
version = "0.2.10"
version = "0.2.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "199b7932d97e325aff3a7030e141eafe7f2c6268e1d1b24859b753a627f45254"
checksum = "f3c30758ddd7188629c6713fc45d1188af4f44c90582311d0c8d8c9907f60c48"
dependencies = [
"proc-macro2",
"quote",
@@ -3727,8 +3728,9 @@ dependencies = [
[[package]]
name = "lance"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4df828dc75fdfc665846a9bb91b882801f2092ac9a5c54cdc99c155b86b97ed"
dependencies = [
"arrow",
"arrow-arith",
@@ -3752,6 +3754,7 @@ dependencies = [
"datafusion-expr",
"datafusion-functions",
"datafusion-physical-expr",
"datafusion-physical-plan",
"deepsize",
"either",
"futures",
@@ -3790,8 +3793,9 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "270c34ececc4e2603e50dab07ac3ba21a81fe390dcf00ee62b31a844b6cabe25"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -3808,8 +3812,9 @@ dependencies = [
[[package]]
name = "lance-core"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8860c76dc32d649cd0460fbc23e612390263de814f5918210166ee6ce26886e9"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -3845,8 +3850,9 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "494e614227a31a01a2a8ca0f151fd53db7f041a856d15514696af63d075867f6"
dependencies = [
"arrow",
"arrow-array",
@@ -3875,8 +3881,9 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "34faee15ed02126597522f36cdc9b5134d1411f512f31ab7ca65e5ab5e111b37"
dependencies = [
"arrow",
"arrow-array",
@@ -3891,8 +3898,9 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fb705919e46ea1784c048d798d2e408b2ae703dfdc67e128177e2ee9bb405b31"
dependencies = [
"arrayref",
"arrow",
@@ -3931,8 +3939,9 @@ dependencies = [
[[package]]
name = "lance-file"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "aefcde20c6a27f767072f9239af70da5e744187ec7f3c7bebcb33705b4c01834"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -3966,8 +3975,9 @@ dependencies = [
[[package]]
name = "lance-index"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "82678b7035c9041010c74f789a18a63b192c518699217c69e4a83512b67bcbd5"
dependencies = [
"arrow",
"arrow-array",
@@ -4020,8 +4030,9 @@ dependencies = [
[[package]]
name = "lance-io"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e2abb629dab01c7e639d9da2f83b36fb9b8ff7e971312b7363cd49a4d7c67276"
dependencies = [
"arrow",
"arrow-arith",
@@ -4050,6 +4061,7 @@ dependencies = [
"pin-project",
"prost",
"rand 0.8.5",
"serde",
"shellexpand",
"snafu",
"tokio",
@@ -4059,8 +4071,9 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "91f6172d9f7c6105afcee8edd92165d0bfdff68dd6c622a58985eea445f309cb"
dependencies = [
"arrow-array",
"arrow-ord",
@@ -4083,8 +4096,9 @@ dependencies = [
[[package]]
name = "lance-table"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f7831c0d784e2c876dbaf39a041c9174bc888206e2d5ef515bc3917dd78a27ec"
dependencies = [
"arrow",
"arrow-array",
@@ -4123,8 +4137,9 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
version = "0.27.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2f37690b1e8dbabedda366803b6481d0b442dd70234406bd746eb0a9aaf25dfb"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4135,7 +4150,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
dependencies = [
"arrow",
"arrow-array",
@@ -4222,7 +4237,7 @@ dependencies = [
[[package]]
name = "lancedb-node"
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4247,7 +4262,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4266,7 +4281,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.22.1-beta.1"
version = "0.22.1-beta.5"
dependencies = [
"arrow",
"env_logger",
@@ -4389,9 +4404,9 @@ dependencies = [
[[package]]
name = "libm"
version = "0.2.13"
version = "0.2.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c9627da5196e5d8ed0b0495e61e518847578da83483c37288316d9b2e03a7f72"
checksum = "a25169bd5913a4b437588a7e3d127cd6e90127b60e0ffbd834a38f1599e016b8"
[[package]]
name = "libredox"
@@ -4456,7 +4471,7 @@ version = "0.12.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "234cf4f4a04dc1f57e24b96cc0cd600cf2af460d4161ac5ecdd0af8e1f3b2a38"
dependencies = [
"hashbrown 0.15.2",
"hashbrown 0.15.3",
]
[[package]]
@@ -5848,7 +5863,7 @@ dependencies = [
"quinn-proto",
"quinn-udp",
"rustc-hash 2.1.1",
"rustls 0.23.26",
"rustls 0.23.27",
"socket2",
"thiserror 2.0.12",
"tokio",
@@ -5867,7 +5882,7 @@ dependencies = [
"rand 0.9.1",
"ring",
"rustc-hash 2.1.1",
"rustls 0.23.26",
"rustls 0.23.27",
"rustls-pki-types",
"slab",
"thiserror 2.0.12",
@@ -5878,9 +5893,9 @@ dependencies = [
[[package]]
name = "quinn-udp"
version = "0.5.11"
version = "0.5.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "541d0f57c6ec747a90738a52741d3221f7960e8ac2f0ff4b1a63680e033b4ab5"
checksum = "ee4e529991f949c5e25755532370b8af5d114acae52326361d68d47af64aa842"
dependencies = [
"cfg_aliases",
"libc",
@@ -6086,9 +6101,9 @@ dependencies = [
[[package]]
name = "redox_syscall"
version = "0.5.11"
version = "0.5.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d2f103c6d277498fbceb16e84d317e2a400f160f46904d5f5410848c829511a3"
checksum = "928fca9cf2aa042393a8325b9ead81d2f0df4cb12e1e24cef072922ccd99c5af"
dependencies = [
"bitflags 2.9.0",
]
@@ -6183,7 +6198,7 @@ dependencies = [
"encoding_rs",
"futures-core",
"futures-util",
"h2 0.4.9",
"h2 0.4.10",
"http 1.3.1",
"http-body 1.0.1",
"http-body-util",
@@ -6199,7 +6214,7 @@ dependencies = [
"percent-encoding",
"pin-project-lite",
"quinn",
"rustls 0.23.26",
"rustls 0.23.27",
"rustls-native-certs 0.8.1",
"rustls-pemfile 2.2.0",
"rustls-pki-types",
@@ -6355,9 +6370,9 @@ dependencies = [
[[package]]
name = "rustix"
version = "1.0.5"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d97817398dd4bb2e6da002002db259209759911da105da92bec29ccb12cf58bf"
checksum = "c71e83d6afe7ff64890ec6b71d6a69bb8a610ab78ce364b3352876bb4c801266"
dependencies = [
"bitflags 2.9.0",
"errno",
@@ -6380,16 +6395,16 @@ dependencies = [
[[package]]
name = "rustls"
version = "0.23.26"
version = "0.23.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "df51b5869f3a441595eac5e8ff14d486ff285f7b8c0df8770e49c3b56351f0f0"
checksum = "730944ca083c1c233a75c09f199e973ca499344a2b7ba9e755c457e86fb4a321"
dependencies = [
"aws-lc-rs",
"log",
"once_cell",
"ring",
"rustls-pki-types",
"rustls-webpki 0.103.1",
"rustls-webpki 0.103.2",
"subtle",
"zeroize",
]
@@ -6457,9 +6472,9 @@ dependencies = [
[[package]]
name = "rustls-webpki"
version = "0.103.1"
version = "0.103.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fef8b8769aaccf73098557a87cd1816b4f9c7c16811c9c77142aa695c16f2c03"
checksum = "7149975849f1abb3832b246010ef62ccc80d3a76169517ada7188252b9cfb437"
dependencies = [
"aws-lc-rs",
"ring",
@@ -6712,9 +6727,9 @@ dependencies = [
[[package]]
name = "sha2"
version = "0.10.8"
version = "0.10.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "793db75ad2bcafc3ffa7c68b215fee268f537982cd901d132f89c6343f3a3dc8"
checksum = "a7507d819769d01a365ab707794a4084392c824f54a7a6a7862f8c3d0892b283"
dependencies = [
"cfg-if",
"cpufeatures",
@@ -7033,9 +7048,9 @@ dependencies = [
[[package]]
name = "synstructure"
version = "0.13.1"
version = "0.13.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8af7666ab7b6390ab78131fb5b0fce11d6b7a6951602017c35fa82800708971"
checksum = "728a70f3dbaf5bab7f0c4b1ac8d7ae5ea60a4b5549c8a5914361c99147a709d2"
dependencies = [
"proc-macro2",
"quote",
@@ -7265,7 +7280,7 @@ dependencies = [
"fastrand",
"getrandom 0.3.2",
"once_cell",
"rustix 1.0.5",
"rustix 1.0.7",
"windows-sys 0.59.0",
]
@@ -7460,7 +7475,7 @@ version = "0.26.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8e727b36a1a0e8b74c376ac2211e40c2c8af09fb4013c60d910495810f008e9b"
dependencies = [
"rustls 0.23.26",
"rustls 0.23.27",
"tokio",
]
@@ -7685,7 +7700,7 @@ dependencies = [
"flate2",
"log",
"once_cell",
"rustls 0.23.26",
"rustls 0.23.27",
"rustls-pki-types",
"serde",
"serde_json",
@@ -7905,9 +7920,9 @@ dependencies = [
[[package]]
name = "webpki-roots"
version = "0.26.9"
version = "0.26.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "29aad86cec885cafd03e8305fd727c418e970a521322c91688414d5b8efba16b"
checksum = "37493cadf42a2a939ed404698ded7fb378bf301b5011f973361779a3a74f8c93"
dependencies = [
"rustls-pki-types",
]
@@ -8397,9 +8412,9 @@ checksum = "271414315aff87387382ec3d271b52d7ae78726f5d44ac98b4f4030c91880486"
[[package]]
name = "winnow"
version = "0.7.7"
version = "0.7.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6cb8234a863ea0e8cd7284fcdd4f145233eb00fee02bbdd9861aec44e6477bc5"
checksum = "d9fb597c990f03753e08d3c29efbfcf2019a003b4bf4ba19225c158e1549f0f3"
dependencies = [
"memchr",
]

View File

@@ -21,14 +21,14 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.27.0", "features" = ["dynamodb"], tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance = { "version" = "=0.27.2", "features" = ["dynamodb"] }
lance-io = { version = "=0.27.2" }
lance-index = { version = "=0.27.2" }
lance-linalg = { version = "=0.27.2" }
lance-table = { version = "=0.27.2" }
lance-testing = { version = "=0.27.2" }
lance-datafusion = { version = "=0.27.2" }
lance-encoding = { version = "=0.27.2" }
# Note that this one does not include pyarrow
arrow = { version = "54.1", optional = false }
arrow-array = "54.1"
@@ -61,15 +61,12 @@ rand = "0.8"
regex = "1.10"
lazy_static = "1"
semver = "1.0.25"
# Temporary pins to work around downstream issues
# https://github.com/apache/arrow-rs/commit/2fddf85afcd20110ce783ed5b4cdeb82293da30b
chrono = "=0.4.39"
# https://github.com/RustCrypto/formats/issues/1684
base64ct = "=1.6.0"
# Workaround for: https://github.com/eira-fransham/crunchy/issues/13
crunchy = "=0.2.2"
# Workaround for: https://github.com/Lokathor/bytemuck/issues/306
bytemuck_derive = ">=1.8.1, <1.9.0"

View File

@@ -205,6 +205,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- 🎯 Examples:
- Overview: examples/index.md
- 🐍 Python:
@@ -331,6 +332,7 @@ nav:
- PromptTools: integrations/prompttools.md
- dlt: integrations/dlt.md
- phidata: integrations/phidata.md
- Genkit: integrations/genkit.md
- Examples:
- examples/index.md
- 🐍 Python:

View File

@@ -0,0 +1,183 @@
### genkitx-lancedb
This is a LanceDB plugin for the Genkit framework. It allows you to use LanceDB for ingesting and retrieving data with Genkit.
![integration-banner-genkit](https://github.com/user-attachments/assets/a6cc28af-98e9-4425-b87c-7ab139bd7893)
### Installation
```bash
pnpm install genkitx-lancedb
```
### Usage
Add the LanceDB plugin to your Genkit instance:
```ts
import { lancedbIndexerRef, lancedb, lancedbRetrieverRef, WriteMode } from 'genkitx-lancedb';
import { textEmbedding004, vertexAI } from '@genkit-ai/vertexai';
import { gemini } from '@genkit-ai/vertexai';
import { z, genkit } from 'genkit';
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse/lib/pdf-parse';
const ai = genkit({
plugins: [
// vertexAI provides the textEmbedding004 embedder
vertexAI(),
// the local vector store requires an embedder to translate from text to vector
lancedb([
{
dbUri: '.db', // optional LanceDB URI, defaults to '.db'
tableName: 'table', // optional table name, defaults to 'table'
embedder: textEmbedding004,
},
]),
],
});
```
You can run this app with the following command:
```bash
genkit start -- tsx --watch src/index.ts
```
This adds LanceDB as a retriever and indexer to the Genkit instance. You can see it in the GUI view:
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/e752f7f4-785b-4797-a11e-72ab06a531b7" />
**Testing retrieval on a sample table**
Let's see the raw retrieval results
<img width="1710" alt="Screenshot 2025-05-11 at 7 21 05PM" src="https://github.com/user-attachments/assets/b8d356ed-8421-4790-8fc0-d6af563b9657" />
On running this query, you'll see 5 results fetched from the LanceDB table, where each result looks something like this:
<img width="1417" alt="Screenshot 2025-05-11 at 7 21 18PM" src="https://github.com/user-attachments/assets/77429525-36e2-4da6-a694-e58c1cf9eb83" />
## Creating a custom RAG flow
Now that we've seen how you can use LanceDB in a Genkit pipeline, let's refine the flow and create a RAG. A RAG flow consists of an indexer and a retriever, with the retriever's outputs postprocessed and fed into an LLM for the final response.
### Creating custom indexer flows
You can also create custom indexer flows, utilizing more options and features provided by LanceDB.
```ts
export const menuPdfIndexer = lancedbIndexerRef({
// Using all defaults for dbUri, tableName, embedder, etc.
});
const chunkingConfig = {
minLength: 1000,
maxLength: 2000,
splitter: 'sentence',
overlap: 100,
delimiters: '',
} as any;
async function extractTextFromPdf(filePath: string) {
const pdfFile = path.resolve(filePath);
const dataBuffer = await readFile(pdfFile);
const data = await pdf(dataBuffer);
return data.text;
}
export const indexMenu = ai.defineFlow(
{
name: 'indexMenu',
inputSchema: z.string().describe('PDF file path'),
outputSchema: z.void(),
},
async (filePath: string) => {
filePath = path.resolve(filePath);
// Read the pdf.
const pdfTxt = await ai.run('extract-text', () =>
extractTextFromPdf(filePath)
);
// Divide the pdf text into segments.
const chunks = await ai.run('chunk-it', async () =>
chunk(pdfTxt, chunkingConfig)
);
// Convert chunks of text into documents to store in the index.
const documents = chunks.map((text) => {
return Document.fromText(text, { filePath });
});
// Add documents to the index.
await ai.index({
indexer: menuPdfIndexer,
documents,
options: {
writeMode: WriteMode.Overwrite,
} as any
});
}
);
```
<img width="1316" alt="Screenshot 2025-05-11 at 8 35 56PM" src="https://github.com/user-attachments/assets/e2a20ce4-d1d0-4fa2-9a84-f2cc26e3a29f" />
In your console, you can see the logs
<img width="511" alt="Screenshot 2025-05-11 at 7 19 14PM" src="https://github.com/user-attachments/assets/243f26c5-ed38-40b6-b661-002f40f0423a" />
### Creating custom retriever flows
You can also create custom retriever flows, utilizing more options and features provided by LanceDB.
```ts
export const menuRetriever = lancedbRetrieverRef({
tableName: "table", // Use the same table name as the indexer.
displayName: "Menu", // Use a custom display name.
export const menuQAFlow = ai.defineFlow(
{ name: "Menu", inputSchema: z.string(), outputSchema: z.string() },
async (input: string) => {
// retrieve relevant documents
const docs = await ai.retrieve({
retriever: menuRetriever,
query: input,
options: {
k: 3,
},
});
const extractedContent = docs.map(doc => {
if (doc.content && Array.isArray(doc.content) && doc.content.length > 0) {
if (doc.content[0].media && doc.content[0].media.url) {
return doc.content[0].media.url;
}
}
return "No content found";
});
console.log("Extracted content:", extractedContent);
const { text } = await ai.generate({
model: gemini('gemini-2.0-flash'),
prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.
Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.
Context:
${extractedContent.join('\n\n')}
Question: ${input}`,
docs,
});
return text;
}
);
```
Now, using our retrieval flow, we can ask questions about the ingested PDF:
<img width="1306" alt="Screenshot 2025-05-11 at 7 18 45PM" src="https://github.com/user-attachments/assets/86c66b13-7c12-4d5f-9d81-ae36bfb1c346" />

View File

@@ -33,7 +33,7 @@ Construct a MergeInsertBuilder. __Internal use only.__
### execute()
```ts
execute(data): Promise<MergeResult>
execute(data, execOptions?): Promise<MergeResult>
```
Executes the merge insert operation
@@ -42,6 +42,8 @@ Executes the merge insert operation
* **data**: [`Data`](../type-aliases/Data.md)
* **execOptions?**: `Partial`&lt;[`WriteExecutionOptions`](../interfaces/WriteExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;[`MergeResult`](../interfaces/MergeResult.md)&gt;

View File

@@ -72,6 +72,7 @@
- [UpdateOptions](interfaces/UpdateOptions.md)
- [UpdateResult](interfaces/UpdateResult.md)
- [Version](interfaces/Version.md)
- [WriteExecutionOptions](interfaces/WriteExecutionOptions.md)
## Type Aliases

View File

@@ -0,0 +1,26 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / WriteExecutionOptions
# Interface: WriteExecutionOptions
## Properties
### timeoutMs?
```ts
optional timeoutMs: number;
```
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including the first.

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.1</version>
<version>0.19.1-beta.5</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.1</version>
<version>0.19.1-beta.5</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

node/package-lock.json (generated, 44 changed lines)
View File

@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"cpu": [
"x64",
"arm64"
@@ -52,11 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.1",
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.1"
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.5",
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.5",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.5",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.5",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.5"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -327,9 +327,9 @@
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
"version": "0.19.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.19.1-beta.1.tgz",
"integrity": "sha512-Epvel0pF5TM6MtIWQ2KhqezqSSHTL3Wr7a2rGAwz6X/XY23i6DbMPpPs0HyeIDzDrhxNfE3cz3S+SiCA6xpR0g==",
"version": "0.19.1-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.19.1-beta.5.tgz",
"integrity": "sha512-9WcTw67We5HYGayDt5jFquGoyAVzFSt/I65ag8+q7H9q4ZYKxeDhgNyQZJ8BmXEvbJtnYtYBSAtTEdFKYMce6w==",
"cpu": [
"arm64"
],
@@ -340,9 +340,9 @@
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
"version": "0.19.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.19.1-beta.1.tgz",
"integrity": "sha512-hOiUSlIoISbiXytp46hToi/r6sF5pImAsfbzCsIq8ExDV4TPa8fjbhcIT80vxxOwc2mpSSK4HsVJYod95RSbEQ==",
"version": "0.19.1-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.19.1-beta.5.tgz",
"integrity": "sha512-6Pe3PxEMi0VKGsu5R7IhOxTijUM3b5olRAqhxfcu5ti34gXIPNtu7g+T9lS78LKe+0D0v2BjZEY/JQakIFBNRw==",
"cpu": [
"x64"
],
@@ -353,9 +353,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.19.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.19.1-beta.1.tgz",
"integrity": "sha512-/1JhGVDEngwrlM8o2TNW8G6nJ9U/VgHKAORmj/cTA7O30helJIoo9jfvUAUy+vZ4VoEwRXQbMI+gaYTg0l3MTg==",
"version": "0.19.1-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.19.1-beta.5.tgz",
"integrity": "sha512-VJbBd+Y+6L2SREaOO1OzuUfTPHXyHE4AcsZuM6VMyoeX8k7lPnaA+vNk96o0w4V2KFEAI6o4QPgrRAXmMAzmbg==",
"cpu": [
"arm64"
],
@@ -366,9 +366,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.19.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.19.1-beta.1.tgz",
"integrity": "sha512-zNRGSSUt8nTJMmll4NdxhQjwxR8Rezq3T4dsRoiDts5ienMam5HFjYiZ3FkDZQo16rgq2BcbFuH1G8u1chywlg==",
"version": "0.19.1-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.19.1-beta.5.tgz",
"integrity": "sha512-3wS8Zn5NmHoszXfrY4JzMimHoh5LAmVi3pTX4gD+C9kVGoUJcDBP7/CrAbjnAz7VzzAIPmz8kvBuPz8l9X4hjw==",
"cpu": [
"x64"
],
@@ -379,9 +379,9 @@
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.19.1-beta.1",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.19.1-beta.1.tgz",
"integrity": "sha512-yV550AJGlsIFdm1KoHQPJ1TZx121ZXCIdebBtBZj3wOObIhyB/i0kZAtGvwjkmr7EYyfzt1EHZzbjSGVdehIAA==",
"version": "0.19.1-beta.5",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.19.1-beta.5.tgz",
"integrity": "sha512-TemM9cvrPa2jFCjvYmKnrL0DTHegi/+LOQ3No9nPDHie2ka2fM9O2q60fAbYsYz+Mo9aV7MvL49ATbNCyl9MLA==",
"cpu": [
"x64"
],

View File

@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -89,10 +89,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.1",
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.1",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.1",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.1",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.1"
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.5",
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.5",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.5",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.5",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.5"
}
}

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -349,7 +349,7 @@ describe("merge insert", () => {
.mergeInsert("a")
.whenMatchedUpdateAll()
.whenNotMatchedInsertAll()
.execute(newData);
.execute(newData, { timeoutMs: 10_000 });
expect(mergeInsertRes).toHaveProperty("version");
expect(mergeInsertRes.version).toBe(2);
expect(mergeInsertRes.numInsertedRows).toBe(1);
@@ -463,6 +463,20 @@ describe("merge insert", () => {
res = res.sort((a, b) => a.a - b.a);
expect(res).toEqual(expected);
});
test("timeout", async () => {
const newData = [
{ a: 2, b: "x" },
{ a: 4, b: "z" },
];
await expect(
table
.mergeInsert("a")
.whenMatchedUpdateAll()
.whenNotMatchedInsertAll()
.execute(newData, { timeoutMs: 0 }),
).rejects.toThrow("merge insert timed out");
});
});
describe("When creating an index", () => {
@@ -1287,6 +1301,32 @@ describe("when dealing with tags", () => {
await table.checkoutLatest();
expect(await table.version()).toBe(4);
});
it("can checkout and restore tags", async () => {
const conn = await connect(tmpDir.name, {
readConsistencyInterval: 0,
});
const table = await conn.createTable("my_table", [
{ id: 1n, vector: [0.1, 0.2] },
]);
expect(await table.version()).toBe(1);
expect(await table.countRows()).toBe(1);
const tagsManager = await table.tags();
const tag1 = "tag1";
await tagsManager.create(tag1, 1);
await table.add([{ id: 2n, vector: [0.3, 0.4] }]);
const tag2 = "tag2";
await tagsManager.create(tag2, 2);
expect(await table.version()).toBe(2);
await table.checkout(tag1);
expect(await table.version()).toBe(1);
await table.restore();
expect(await table.version()).toBe(3);
expect(await table.countRows()).toBe(1);
await table.add([{ id: 3n, vector: [0.5, 0.6] }]);
expect(await table.countRows()).toBe(2);
});
});
describe("when optimizing a dataset", () => {

View File

@@ -86,7 +86,7 @@ export {
ColumnAlteration,
} from "./table";
export { MergeInsertBuilder } from "./merge";
export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
export * as embedding from "./embedding";
export * as rerankers from "./rerankers";

View File

@@ -75,7 +75,10 @@ export class MergeInsertBuilder {
*
* @returns {Promise<MergeResult>} the merge result
*/
async execute(data: Data): Promise<MergeResult> {
async execute(
data: Data,
execOptions?: Partial<WriteExecutionOptions>,
): Promise<MergeResult> {
let schema: Schema;
if (this.#schema instanceof Promise) {
schema = await this.#schema;
@@ -83,7 +86,28 @@ export class MergeInsertBuilder {
} else {
schema = this.#schema;
}
if (execOptions?.timeoutMs !== undefined) {
this.#native.setTimeout(execOptions.timeoutMs);
}
const buffer = await fromDataToBuffer(data, undefined, schema);
return await this.#native.execute(buffer);
}
}
export interface WriteExecutionOptions {
/**
* Maximum time to run the operation before cancelling it.
*
* By default, there is a 30-second timeout that is only enforced after the
* first attempt. This is to prevent spending too long retrying to resolve
* conflicts. For example, if a write attempt takes 20 seconds and fails,
* the second attempt will be cancelled after 10 seconds, hitting the
* 30-second timeout. However, a write that takes one hour and succeeds on the
* first attempt will not be cancelled.
*
* When this is set, the timeout is enforced on all attempts, including the first.
*/
timeoutMs?: number;
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.19.1-beta.1",
"version": "0.19.1-beta.5",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -1,6 +1,8 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::time::Duration;
use lancedb::{arrow::IntoArrow, ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
use napi::bindgen_prelude::*;
use napi_derive::napi;
@@ -36,6 +38,11 @@ impl NativeMergeInsertBuilder {
this
}
#[napi]
pub fn set_timeout(&mut self, timeout: u32) {
self.inner.timeout(Duration::from_millis(timeout as u64));
}
#[napi(catch_unwind)]
pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
let data = ipc_file_to_batches(buf.to_vec())

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.22.1-beta.2"
current_version = "0.22.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.22.1-beta.2"
version = "0.22.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -51,7 +51,7 @@ class Table:
async def version(self) -> int: ...
async def checkout(self, version: Union[int, str]): ...
async def checkout_latest(self): ...
async def restore(self, version: Optional[int] = None): ...
async def restore(self, version: Optional[Union[int, str]] = None): ...
async def list_indices(self) -> list[IndexConfig]: ...
async def delete(self, filter: str) -> DeleteResult: ...
async def add_columns(self, columns: list[tuple[str, str]]) -> AddColumnsResult: ...

View File

@@ -4,6 +4,7 @@
from __future__ import annotations
from datetime import timedelta
from typing import TYPE_CHECKING, List, Optional
if TYPE_CHECKING:
@@ -31,6 +32,7 @@ class LanceMergeInsertBuilder(object):
self._when_not_matched_insert_all = False
self._when_not_matched_by_source_delete = False
self._when_not_matched_by_source_condition = None
self._timeout = None
def when_matched_update_all(
self, *, where: Optional[str] = None
@@ -81,6 +83,7 @@ class LanceMergeInsertBuilder(object):
new_data: DATA,
on_bad_vectors: str = "error",
fill_value: float = 0.0,
timeout: Optional[timedelta] = None,
) -> MergeInsertResult:
"""
Executes the merge insert operation
@@ -98,10 +101,24 @@ class LanceMergeInsertBuilder(object):
One of "error", "drop", "fill".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
timeout: Optional[timedelta], default None
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including
the first.
Returns
-------
MergeInsertResult
version: the new version number of the table after doing merge insert.
"""
if timeout is not None:
self._timeout = timeout
return self._table._do_merge(self, new_data, on_bad_vectors, fill_value)

View File

@@ -47,9 +47,6 @@ class RemoteTable(Table):
def __repr__(self) -> str:
return f"RemoteTable({self.db_name}.{self.name})"
def __len__(self) -> int:
self.count_rows(None)
@property
def schema(self) -> pa.Schema:
"""The [Arrow Schema](https://arrow.apache.org/docs/python/api/datatypes.html#)
@@ -100,7 +97,7 @@ class RemoteTable(Table):
def checkout_latest(self):
return LOOP.run(self._table.checkout_latest())
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
return LOOP.run(self._table.restore(version))
def list_indices(self) -> Iterable[IndexConfig]:

View File

@@ -620,6 +620,10 @@ class Table(ABC):
"""
raise NotImplementedError
def __len__(self) -> int:
"""The number of rows in this Table"""
return self.count_rows(None)
@property
@abstractmethod
def embedding_functions(self) -> Dict[str, EmbeddingFunctionConfig]:
@@ -1470,7 +1474,7 @@ class Table(ABC):
"""
@abstractmethod
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
"""Restore a version of the table. This is an in-place operation.
This creates a new version where the data is equivalent to the
@@ -1478,9 +1482,10 @@ class Table(ABC):
Parameters
----------
version : int, default None
The version to restore. If unspecified then restores the currently
checked out version. If the currently checked out version is the
version : int or str, default None
The version number or version tag to restore.
If unspecified then restores the currently checked out version.
If the currently checked out version is the
latest version then this is a no-op.
"""
@@ -1710,7 +1715,7 @@ class LanceTable(Table):
"""
LOOP.run(self._table.checkout_latest())
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
"""Restore a version of the table. This is an in-place operation.
This creates a new version where the data is equivalent to the
@@ -1718,9 +1723,10 @@ class LanceTable(Table):
Parameters
----------
version : int, default None
The version to restore. If unspecified then restores the currently
checked out version. If the currently checked out version is the
version : int or str, default None
The version number or version tag to restore.
If unspecified then restores the currently checked out version.
If the currently checked out version is the
latest version then this is a no-op.
Examples
@@ -1738,12 +1744,20 @@ class LanceTable(Table):
AddResult(version=2)
>>> table.version
2
>>> table.tags.create("v2", 2)
>>> table.restore(1)
>>> table.to_pandas()
vector type
0 [1.1, 0.9] vector
>>> len(table.list_versions())
3
>>> table.restore("v2")
>>> table.to_pandas()
vector type
0 [1.1, 0.9] vector
1 [0.5, 0.2] vector
>>> len(table.list_versions())
4
"""
if version is not None:
LOOP.run(self._table.checkout(version))
@@ -1752,9 +1766,6 @@ class LanceTable(Table):
def count_rows(self, filter: Optional[str] = None) -> int:
return LOOP.run(self._table.count_rows(filter))
def __len__(self) -> int:
return self.count_rows()
def __repr__(self) -> str:
val = f"{self.__class__.__name__}(name={self.name!r}, version={self.version}"
if self._conn.read_consistency_interval is not None:
@@ -3705,6 +3716,7 @@ class AsyncTable:
when_not_matched_insert_all=merge._when_not_matched_insert_all,
when_not_matched_by_source_delete=merge._when_not_matched_by_source_delete,
when_not_matched_by_source_condition=merge._when_not_matched_by_source_condition,
timeout=merge._timeout,
),
)
@@ -3962,7 +3974,7 @@ class AsyncTable:
"""
await self._inner.checkout_latest()
async def restore(self, version: Optional[int] = None):
async def restore(self, version: Optional[int | str] = None):
"""
Restore the table to the currently checked out version

View File

@@ -149,6 +149,24 @@ async def test_async_checkout():
assert await table.count_rows() == 300
def test_table_len_sync():
def handler(request):
if request.path == "/v1/table/test/create/?mode=create":
request.send_response(200)
request.send_header("Content-Type", "application/json")
request.end_headers()
request.wfile.write(b"{}")
request.send_response(200)
request.send_header("Content-Type", "application/json")
request.end_headers()
request.wfile.write(json.dumps(1).encode())
with mock_lancedb_connection(handler) as db:
table = db.create_table("test", [{"id": 1}])
assert len(table) == 1
@pytest.mark.asyncio
async def test_http_error():
request_id_holder = {"request_id": None}

View File

@@ -769,6 +769,29 @@ def test_restore(mem_db: DBConnection):
table.restore(0)
def test_restore_with_tags(mem_db: DBConnection):
table = mem_db.create_table(
"my_table",
data=[{"vector": [1.1, 0.9], "type": "vector"}],
)
tag = "tag1"
table.tags.create(tag, 1)
table.add([{"vector": [0.5, 0.2], "type": "vector"}])
table.restore(tag)
assert len(table.list_versions()) == 3
assert len(table) == 1
expected = table.to_arrow()
table.add([{"vector": [0.3, 0.3], "type": "vector"}])
table.checkout("tag1")
table.restore()
assert len(table.list_versions()) == 5
assert table.to_arrow() == expected
with pytest.raises(ValueError):
table.restore("tag_unknown")
def test_merge(tmp_db: DBConnection, tmp_path):
pytest.importorskip("lance")
import lance
@@ -914,7 +937,7 @@ def test_merge_insert(mem_db: DBConnection):
table.merge_insert("a")
.when_matched_update_all()
.when_not_matched_insert_all()
.execute(new_data)
.execute(new_data, timeout=timedelta(seconds=10))
)
assert merge_insert_res.version == 2
assert merge_insert_res.num_inserted_rows == 1
@@ -990,6 +1013,12 @@ def test_merge_insert(mem_db: DBConnection):
expected = pa.table({"a": [2, 4], "b": ["x", "z"]})
assert table.to_arrow().sort_by("a") == expected
# timeout
with pytest.raises(Exception, match="merge insert timed out"):
table.merge_insert("a").when_matched_update_all().execute(
new_data, timeout=timedelta(0)
)
# We vary the data format because there are slight differences in how
# subschemas are handled in different formats

View File

@@ -17,10 +17,10 @@ use lancedb::table::{
Table as LanceDbTable,
};
use pyo3::{
exceptions::{PyIOError, PyKeyError, PyRuntimeError, PyValueError},
exceptions::{PyKeyError, PyRuntimeError, PyValueError},
pyclass, pymethods,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods, PyInt, PyString},
Bound, FromPyObject, PyAny, PyObject, PyRef, PyResult, Python,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
@@ -520,25 +520,15 @@ impl Table {
})
}
pub fn checkout(self_: PyRef<'_, Self>, version: PyObject) -> PyResult<Bound<'_, PyAny>> {
pub fn checkout(self_: PyRef<'_, Self>, version: LanceVersion) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
let py = self_.py();
let (is_int, int_value, string_value) = if let Ok(i) = version.downcast_bound::<PyInt>(py) {
let num: u64 = i.extract()?;
(true, num, String::new())
} else if let Ok(s) = version.downcast_bound::<PyString>(py) {
let str_value = s.to_string();
(false, 0, str_value)
} else {
return Err(PyIOError::new_err(
"version must be an integer or a string.",
));
};
future_into_py(py, async move {
if is_int {
inner.checkout(int_value).await.infer_error()
} else {
inner.checkout_tag(&string_value).await.infer_error()
match version {
LanceVersion::Version(version_num) => {
inner.checkout(version_num).await.infer_error()
}
LanceVersion::Tag(tag) => inner.checkout_tag(&tag).await.infer_error(),
}
})
}
@@ -551,12 +541,19 @@ impl Table {
}
#[pyo3(signature = (version=None))]
pub fn restore(self_: PyRef<'_, Self>, version: Option<u64>) -> PyResult<Bound<'_, PyAny>> {
pub fn restore(
self_: PyRef<'_, Self>,
version: Option<LanceVersion>,
) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
let py = self_.py();
future_into_py(self_.py(), async move {
future_into_py(py, async move {
if let Some(version) = version {
inner.checkout(version).await.infer_error()?;
match version {
LanceVersion::Version(num) => inner.checkout(num).await.infer_error()?,
LanceVersion::Tag(tag) => inner.checkout_tag(&tag).await.infer_error()?,
}
}
inner.restore().await.infer_error()
})
@@ -652,6 +649,9 @@ impl Table {
builder
.when_not_matched_by_source_delete(parameters.when_not_matched_by_source_condition);
}
if let Some(timeout) = parameters.timeout {
builder.timeout(timeout);
}
future_into_py(self_.py(), async move {
let res = builder.execute(Box::new(batches)).await.infer_error()?;
@@ -795,6 +795,12 @@ impl Table {
}
}
#[derive(FromPyObject)]
pub enum LanceVersion {
Version(u64),
Tag(String),
}
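The new `LanceVersion` enum is what lets `checkout` and `restore` accept either an integer version or a string tag straight from Python. As a minimal, self-contained sketch of how a pyo3 `FromPyObject` enum behaves (the `VersionArg` and `describe` names below are hypothetical and not part of this change): the derive tries each variant in declaration order, so a Python `int` extracts as the numeric variant and a `str` as the tag variant. It assumes pyo3 as a dependency.

use pyo3::prelude::*;

// Hypothetical sketch, not lancedb code: a derived FromPyObject enum is
// tried variant by variant, in declaration order.
#[derive(FromPyObject)]
enum VersionArg {
    Version(u64),
    Tag(String),
}

// A #[pyfunction] taking the enum accepts either describe(3) or
// describe("v1-release") when called from Python.
#[pyfunction]
fn describe(version: VersionArg) -> String {
    match version {
        VersionArg::Version(n) => format!("numeric version {n}"),
        VersionArg::Tag(t) => format!("tag {t}"),
    }
}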
#[derive(FromPyObject)]
#[pyo3(from_item_all)]
pub struct MergeInsertParams {
@@ -804,6 +810,7 @@ pub struct MergeInsertParams {
when_not_matched_insert_all: bool,
when_not_matched_by_source_delete: bool,
when_not_matched_by_source_condition: Option<String>,
timeout: Option<std::time::Duration>,
}
#[pyclass]

View File

@@ -1,2 +1,2 @@
[toolchain]
channel = "1.83.0"
channel = "1.86.0"

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.19.1-beta.1"
version = "0.19.1-beta.5"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -758,8 +758,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let (request_id, response) = self.send_streaming(request, data, true).await?;
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(AddResult { version: 0 });
}
@@ -922,7 +921,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(UpdateResult {
rows_updated: 0,
@@ -950,12 +949,10 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let (request_id, response) = self.send(request, true).await?;
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(DeleteResult { version: 0 });
}
let delete_response: DeleteResult =
serde_json::from_str(&body).map_err(|e| Error::Http {
source: format!("Failed to parse delete response: {}", e).into(),
@@ -1071,19 +1068,28 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
) -> Result<MergeResult> {
self.check_mutable().await?;
let timeout = params.timeout;
let query = MergeInsertRequest::try_from(params)?;
let request = self
let mut request = self
.client
.post(&format!("/v1/table/{}/merge_insert/", self.name))
.query(&query)
.header(CONTENT_TYPE, ARROW_STREAM_CONTENT_TYPE);
if let Some(timeout) = timeout {
// (If it doesn't fit into u64, it's not worth sending anyway.)
if let Ok(timeout_ms) = u64::try_from(timeout.as_millis()) {
request = request.header(REQUEST_TIMEOUT_HEADER, timeout_ms);
}
}
let (request_id, response) = self.send_streaming(request, new_data, true).await?;
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(MergeResult {
version: 0,
@@ -1145,7 +1151,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(AddColumnsResult { version: 0 });
}
@@ -1198,7 +1204,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(AlterColumnsResult { version: 0 });
}
@@ -1223,7 +1229,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
let response = self.check_table_response(&request_id, response).await?;
let body = response.text().await.err_to_http(request_id.clone())?;
if body.trim().is_empty() || body == "{}" {
if body.trim().is_empty() {
// Backward compatible with old servers
return Ok(DropColumnsResult { version: 0 });
}
@@ -1328,7 +1334,12 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
self.name, index_name
));
let (request_id, response) = self.send(request, true).await?;
self.check_table_response(&request_id, response).await?;
if response.status() == StatusCode::NOT_FOUND {
return Err(Error::IndexNotFound {
name: index_name.to_string(),
});
};
self.client.check_response(&request_id, response).await?;
Ok(())
}
@@ -1603,16 +1614,21 @@ mod tests {
}
#[rstest]
#[case(true)]
#[case(false)]
#[case("", 0)]
#[case("{}", 0)]
#[case(r#"{"request_id": "test-request-id"}"#, 0)]
#[case(r#"{"version": 43}"#, 43)]
#[tokio::test]
async fn test_add_append(#[case] old_server: bool) {
async fn test_add_append(#[case] response_body: &str, #[case] expected_version: u64) {
let data = RecordBatch::try_new(
Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)])),
vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
)
.unwrap();
// Clone response_body to give it 'static lifetime for the closure
let response_body = response_body.to_string();
let (sender, receiver) = std::sync::mpsc::channel();
let table = Table::new_with_handler("my_table", move |mut request| {
if request.url().path() == "/v1/table/my_table/insert/" {
@@ -1622,36 +1638,29 @@ mod tests {
.query_pairs()
.filter(|(k, _)| k == "mode")
.all(|(_, v)| v == "append"));
assert_eq!(
request.headers().get("Content-Type").unwrap(),
ARROW_STREAM_CONTENT_TYPE
);
let mut body_out = reqwest::Body::from(Vec::new());
std::mem::swap(request.body_mut().as_mut().unwrap(), &mut body_out);
sender.send(body_out).unwrap();
if old_server {
http::Response::builder().status(200).body("").unwrap()
} else {
http::Response::builder()
.status(200)
.body(r#"{"version": 43}"#)
.body(response_body.clone())
.unwrap()
}
} else {
panic!("Unexpected request path: {}", request.url().path());
}
});
let result = table
.add(RecordBatchIterator::new([Ok(data.clone())], data.schema()))
.execute()
.await
.unwrap();
assert_eq!(result.version, if old_server { 0 } else { 43 });
// Check version matches expected value
assert_eq!(result.version, expected_version);
let body = receiver.recv().unwrap();
let body = collect_body(body).await;
@@ -2884,6 +2893,22 @@ mod tests {
table.drop_index("my_index").await.unwrap();
}
#[tokio::test]
async fn test_drop_index_not_exists() {
let table = Table::new_with_handler("my_table", |request| {
assert_eq!(request.method(), "POST");
assert_eq!(
request.url().path(),
"/v1/table/my_table/index/my_index/drop/"
);
http::Response::builder().status(404).body("{}").unwrap()
});
// Assert that the error is IndexNotFound
let e = table.drop_index("my_index").await.unwrap_err();
assert!(matches!(e, Error::IndexNotFound { .. }));
}
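This test relies on the `drop_index` change earlier in this file: a 404 from the drop-index route is no longer funneled through the generic table-response check but mapped to `Error::IndexNotFound`. A minimal sketch of that status-to-error mapping, using a simplified stand-in enum rather than the real lancedb `Error` type:

// Simplified stand-in; the real lancedb error enum has many more variants.
#[derive(Debug, PartialEq)]
enum Error {
    IndexNotFound { name: String },
    Http { status: u16 },
}

fn check_drop_index_status(status: u16, index_name: &str) -> Result<(), Error> {
    // A 404 on the drop-index route means the index itself is missing.
    if status == 404 {
        return Err(Error::IndexNotFound {
            name: index_name.to_string(),
        });
    }
    // Any other non-success status is reported as a plain HTTP error.
    if !(200..300).contains(&status) {
        return Err(Error::Http { status });
    }
    Ok(())
}

fn main() {
    assert!(matches!(
        check_drop_index_status(404, "my_index"),
        Err(Error::IndexNotFound { .. })
    ));
    assert!(check_drop_index_status(200, "my_index").is_ok());
}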
#[tokio::test]
async fn test_wait_for_index() {
let table = _make_table_with_indices(0);

View File

@@ -14,7 +14,7 @@ use datafusion_physical_plan::projection::ProjectionExec;
use datafusion_physical_plan::repartition::RepartitionExec;
use datafusion_physical_plan::union::UnionExec;
use datafusion_physical_plan::ExecutionPlan;
use futures::{StreamExt, TryStreamExt};
use futures::{FutureExt, StreamExt, TryFutureExt, TryStreamExt};
use lance::dataset::builder::DatasetBuilder;
use lance::dataset::cleanup::RemovalStats;
use lance::dataset::optimize::{compact_files, CompactionMetrics, IndexRemapperOptions};
@@ -80,7 +80,7 @@ pub mod merge;
use crate::index::waiter::wait_for_index;
pub use chrono::Duration;
use futures::future::join_all;
use futures::future::{join_all, Either};
pub use lance::dataset::optimize::CompactionOptions;
pub use lance::dataset::refs::{TagContents, Tags as LanceTags};
pub use lance::dataset::scanner::DatasetRecordBatchStream;
@@ -423,68 +423,79 @@ pub trait Tags: Send + Sync {
async fn update(&mut self, tag: &str, version: u64) -> Result<()>;
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct UpdateResult {
#[serde(default)]
pub rows_updated: u64,
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AddResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct DeleteResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct MergeResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
/// Number of inserted rows (for user statistics)
#[serde(default)]
pub num_inserted_rows: u64,
/// Number of updated rows (for user statistics)
#[serde(default)]
pub num_updated_rows: u64,
/// Number of deleted rows (for user statistics)
/// Note: This is different from internal references to 'deleted_rows', since we technically "delete" updated rows during processing.
/// However, those rows are not shared with the user.
#[serde(default)]
pub num_deleted_rows: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AddColumnsResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AlterColumnsResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct DropColumnsResult {
/// The commit version associated with the operation.
/// A version of `0` indicates compatibility with legacy servers that do not return
/// a commit version.
#[serde(default)]
pub version: u64,
}
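The `Default` derives and `#[serde(default)]` attributes added to these result structs are what allow the remote table code above to stop special-casing `"{}"` response bodies: an empty JSON object, or one that only carries unrelated fields, now deserializes to all-zero results on its own. A minimal sketch of that behavior with a stand-in struct (not the real lancedb type), assuming `serde` and `serde_json` as dependencies:

use serde::Deserialize;

// Stand-in for the result structs above: every field falls back to its
// Default value when the server response omits it.
#[derive(Debug, Default, Deserialize, PartialEq, Eq)]
struct MergeResultSketch {
    #[serde(default)]
    version: u64,
    #[serde(default)]
    num_inserted_rows: u64,
}

fn main() {
    // A legacy server may return "{}" or a body with unrelated fields;
    // unknown fields are ignored and missing ones default to 0.
    let legacy: MergeResultSketch = serde_json::from_str("{}").unwrap();
    assert_eq!(legacy, MergeResultSketch::default());

    let partial: MergeResultSketch =
        serde_json::from_str(r#"{"request_id": "abc"}"#).unwrap();
    assert_eq!(partial.version, 0);

    // A newer server reports the commit version explicitly.
    let current: MergeResultSketch =
        serde_json::from_str(r#"{"version": 43, "num_inserted_rows": 1}"#).unwrap();
    assert_eq!(current.version, 43);
}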
@@ -2003,7 +2014,7 @@ impl NativeTable {
/// more information.
pub async fn uses_v2_manifest_paths(&self) -> Result<bool> {
let dataset = self.dataset.get().await?;
Ok(dataset.manifest_naming_scheme == ManifestNamingScheme::V2)
Ok(dataset.manifest_location().naming_scheme == ManifestNamingScheme::V2)
}
/// Migrate the table to use the new manifest path scheme.
@@ -2464,8 +2475,26 @@ impl BaseTable for NativeTable {
} else {
builder.when_not_matched_by_source(WhenNotMatchedBySource::Keep);
}
let future = if let Some(timeout) = params.timeout {
// The default retry timeout is 30s, so we pass the full timeout down
// as well in case it is longer than that.
let future = builder
.retry_timeout(timeout)
.try_build()?
.execute_reader(new_data);
Either::Left(tokio::time::timeout(timeout, future).map(|res| match res {
Ok(Ok((new_dataset, stats))) => Ok((new_dataset, stats)),
Ok(Err(e)) => Err(e.into()),
Err(_) => Err(Error::Runtime {
message: "merge insert timed out".to_string(),
}),
}))
} else {
let job = builder.try_build()?;
let (new_dataset, stats) = job.execute_reader(new_data).await?;
Either::Right(job.execute_reader(new_data).map_err(|e| e.into()))
};
let (new_dataset, stats) = future.await?;
let version = new_dataset.manifest().version;
self.dataset.set_latest(new_dataset.as_ref().clone()).await;
Ok(MergeResult {

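The native-table hunk above has to unify two futures with different concrete types: the plain merge-insert job when no timeout is given, and the same job wrapped in `tokio::time::timeout` when one is. `futures::future::Either` is what makes the single `future.await` possible. Below is a minimal, self-contained sketch of that pattern; `run_job` and its stand-in job are hypothetical, not lancedb code, and it assumes `tokio` (with the `time`, `rt-multi-thread`, and `macros` features) and `futures` as dependencies.

use std::time::Duration;

use futures::future::Either;

// Stand-in for the merge-insert job: the two branches produce futures of
// different types, and Either lets one `await` handle both.
async fn run_job(timeout: Option<Duration>) -> Result<u64, String> {
    let job = async { Ok::<u64, String>(42) };

    let future = if let Some(timeout) = timeout {
        Either::Left(async move {
            match tokio::time::timeout(timeout, job).await {
                Ok(result) => result,
                // The timer firing first maps to the same error the hunk
                // above reports as "merge insert timed out".
                Err(_) => Err("merge insert timed out".to_string()),
            }
        })
    } else {
        Either::Right(job)
    };
    future.await
}

#[tokio::main]
async fn main() {
    assert_eq!(run_job(None).await, Ok(42));
    assert_eq!(run_job(Some(Duration::from_secs(1))).await, Ok(42));
}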
View File

@@ -1,7 +1,7 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::sync::Arc;
use std::{sync::Arc, time::Duration};
use arrow_array::RecordBatchReader;
@@ -21,6 +21,7 @@ pub struct MergeInsertBuilder {
pub(crate) when_not_matched_insert_all: bool,
pub(crate) when_not_matched_by_source_delete: bool,
pub(crate) when_not_matched_by_source_delete_filt: Option<String>,
pub(crate) timeout: Option<Duration>,
}
impl MergeInsertBuilder {
@@ -33,6 +34,7 @@ impl MergeInsertBuilder {
when_not_matched_insert_all: false,
when_not_matched_by_source_delete: false,
when_not_matched_by_source_delete_filt: None,
timeout: None,
}
}
@@ -84,6 +86,21 @@ impl MergeInsertBuilder {
self
}
/// Maximum time to run the operation before cancelling it.
///
/// By default, there is a 30-second timeout that is only enforced after the
/// first attempt. This is to prevent spending too long retrying to resolve
/// conflicts. For example, if a write attempt takes 20 seconds and fails,
/// the second attempt will be cancelled after 10 seconds, hitting the
/// 30-second timeout. However, a write that takes one hour and succeeds on the
/// first attempt will not be cancelled.
///
/// When this is set, the timeout is enforced on all attempts, including the first.
pub fn timeout(&mut self, timeout: Duration) -> &mut Self {
self.timeout = Some(timeout);
self
}
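To make the doc comment's example concrete: by default the 30-second budget is not enforced on the first attempt, and a retry only gets whatever budget the first attempt left over. The arithmetic below is a sketch only; `remaining_budget` is a hypothetical helper, not part of the builder.

use std::time::Duration;

// Sketch of the default retry budget described above (not lancedb code):
// the budget does not limit the first attempt, and a retry is cancelled
// once the total elapsed time exceeds it.
fn remaining_budget(first_attempt: Duration, budget: Duration) -> Option<Duration> {
    budget.checked_sub(first_attempt) // None once the budget is already spent
}

fn main() {
    let budget = Duration::from_secs(30);

    // A failed 20-second first attempt leaves 10 seconds for the retry.
    assert_eq!(
        remaining_budget(Duration::from_secs(20), budget),
        Some(Duration::from_secs(10))
    );

    // A 35-second first attempt exhausts the budget, so a retry would be
    // cancelled immediately. Setting `timeout` instead enforces the limit
    // on every attempt, including the first.
    assert_eq!(remaining_budget(Duration::from_secs(35), budget), None);
}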
/// Executes the merge insert operation
///
/// Returns version and statistics about the merge operation including the number of rows