Compare commits

...

11 Commits

Author SHA1 Message Date
Lance Release
04f962f6b0 Bump version: 0.22.1-beta.3 → 0.22.1-beta.4 2025-05-08 20:18:40 +00:00
LuQQiu
19e896ff69 chore: add default for result structs (#2377)
Add defaults for result structs: when values are not provided, the default values are used.


## Summary by CodeRabbit

- **Chores**
  - Improved internal handling of table operation results to support default values. No changes to user-facing features or functionality.

2025-05-08 13:09:11 -07:00
Will Jones
272e4103b2 feat: provide timeout parameter for merge_insert (#2378)
Provides the ability to set a timeout for merge insert. By default, the underlying timeout is however long the first attempt takes or, if there are multiple attempts, 30 seconds. This has two use cases:

1. Make the timeout shorter, when you want to fail if it takes too long.
2. Allow taking more time to do retries.

## Summary by CodeRabbit

- **New Features**
  - Added support for specifying a timeout when performing merge insert operations in Python, Node.js, and Rust APIs.
  - Introduced a new option to control the maximum allowed execution time for merge inserts, including retry timeout handling.

- **Documentation**
  - Updated and added documentation to describe the new timeout option and its usage in APIs.

- **Tests**
  - Added and updated tests to verify correct timeout behavior during merge insert operations.
2025-05-08 13:07:05 -07:00
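The default timeout behavior described in this commit can be modeled with a small sketch. This is plain Python for illustration only, not the actual implementation (which lives in the Rust core); the function name and signature here are hypothetical:

```python
from typing import Optional

DEFAULT_TIMEOUT_MS = 30_000

def attempt_budget_ms(attempt: int, elapsed_ms: int,
                      timeout_ms: Optional[int] = None) -> Optional[int]:
    """Time budget for the given (1-based) attempt of a merge insert.

    Returns None when the attempt is unbounded. With no explicit timeout,
    the first attempt may run indefinitely; retries share a 30-second
    budget measured from the start of the first attempt. An explicit
    timeout is enforced on every attempt, including the first.
    """
    if timeout_ms is not None:
        # User-provided timeout applies to all attempts.
        return max(0, timeout_ms - elapsed_ms)
    if attempt == 1:
        return None  # first attempt is never cancelled by default
    return max(0, DEFAULT_TIMEOUT_MS - elapsed_ms)

# Example from the commit message: a 20-second failed first attempt
# leaves a 10-second budget for the retry.
print(attempt_budget_ms(2, 20_000))  # 10000
```

The test diffs below show the actual API surface: `execute(newData, { timeoutMs: 10_000 })` in TypeScript and `execute(new_data, timeout=timedelta(seconds=10))` in Python.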
Wyatt Alt
75c257ebb6 fix: return IndexNotExist on remote drop index 404 (#2380)
Prior to this commit, attempting to drop an index that did not exist
would return a TableNotFound error stating that the target table does
not exist -- even when it did exist. Instead, we now return an
IndexNotFound error.


## Summary by CodeRabbit

- **Bug Fixes**
  - Improved error handling when attempting to drop a non-existent index, providing a more accurate error message.
- **Tests**
  - Added a test to verify correct error reporting when dropping an index that does not exist.

2025-05-07 17:24:05 -07:00
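The fix amounts to mapping a 404 from the remote drop-index endpoint to an index-level error rather than a table-level one. A minimal illustrative sketch (the real client is implemented in Rust; the error classes and function here are hypothetical stand-ins):

```python
class TableNotFoundError(Exception):
    """Raised when the target table does not exist."""

class IndexNotFoundError(Exception):
    """Raised when the target index does not exist."""

def raise_for_drop_index(status: int, table: str, index: str) -> None:
    """Translate an HTTP status from the remote drop-index call.

    Before the fix, any 404 here was surfaced as a table-not-found
    error, even when the table existed and only the index was missing.
    After the fix, a 404 from this endpoint means the index is missing.
    """
    if status == 404:
        raise IndexNotFoundError(
            f"index {index!r} does not exist on table {table!r}"
        )
    if status >= 400:
        raise RuntimeError(f"drop index failed with HTTP {status}")
```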
Wyatt Alt
9ee152eb42 fix: support __len__ on remote table (#2379)
This moves the __len__ method from LanceTable and RemoteTable to Table
so that child classes don't need to implement their own. In the process,
it fixes the implementation of RemoteTable's length method, which was
previously missing a return statement.


## Summary by CodeRabbit

- **Refactor**
  - Centralized the table length functionality in the base table class, simplifying subclass behavior.
  - Removed redundant or non-functional length methods from specific table classes.

- **Tests**
  - Added a new test to verify correct table length reporting for remote tables.

2025-05-07 17:23:39 -07:00
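The refactor is the classic "hoist shared behavior into the base class" pattern: `__len__` now lives once on `Table` and delegates to the abstract `count_rows`, so subclasses get it for free. A minimal sketch of the pattern (simplified class shapes, not the full LanceDB classes):

```python
from abc import ABC, abstractmethod
from typing import Optional

class Table(ABC):
    @abstractmethod
    def count_rows(self, filter: Optional[str] = None) -> int: ...

    def __len__(self) -> int:
        # Shared by all subclasses. Previously RemoteTable defined its
        # own __len__ that was missing a return statement, so it always
        # returned None and len() raised a TypeError.
        return self.count_rows(None)

class RemoteTable(Table):
    def __init__(self, n: int) -> None:
        self._n = n

    def count_rows(self, filter: Optional[str] = None) -> int:
        return self._n

print(len(RemoteTable(3)))  # 3
```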
LuQQiu
c9ae1b1737 fix: add restore with tag in python and nodejs API (#2374)
Add a restore-with-tag API to the Python and Node.js APIs, and add tests to guard it.

## Summary by CodeRabbit

- **New Features**
  - The restore functionality now supports using version tags in addition to numeric version identifiers, allowing you to revert tables to a state marked by a tag.
- **Bug Fixes**
  - Restoring with an unknown tag now properly raises an error.
- **Documentation**
  - Updated documentation and examples to clarify that restore accepts both version numbers and tags.
- **Tests**
  - Added new tests to verify restore behavior with version tags and error handling for unknown tags.
  - Added tests for checkout and restore operations involving tags.
2025-05-06 16:12:58 -07:00
Lance Release
89dc80c42a Updating package-lock.json 2025-05-06 03:53:49 +00:00
Wyatt Alt
7b020ac799 chore: run cargo update (#2376) 2025-05-05 20:26:42 -07:00
Lance Release
529e774bbb Updating package-lock.json 2025-05-06 02:45:45 +00:00
Lance Release
7c12239305 Updating package-lock.json 2025-05-06 02:45:29 +00:00
Lance Release
d83424d6b4 Bump version: 0.19.1-beta.2 → 0.19.1-beta.3 2025-05-06 02:45:06 +00:00
39 changed files with 390 additions and 132 deletions


@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.19.1-beta.2"
current_version = "0.19.1-beta.3"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

Cargo.lock (generated, 42 changed lines)

@@ -2738,7 +2738,7 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"rand 0.8.5",
]
@@ -3661,9 +3661,9 @@ checksum = "9028f49264629065d057f340a86acb84867925865f73bbf8d47b4d149a7e88b8"
[[package]]
name = "jiff"
version = "0.2.12"
version = "0.2.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d07d8d955d798e7a4d6f9c58cd1f1916e790b42b092758a9ef6e16fef9f1b3fd"
checksum = "f02000660d30638906021176af16b17498bd0d12813dbfe7b276d8bc7f3c0806"
dependencies = [
"jiff-static",
"log",
@@ -3674,9 +3674,9 @@ dependencies = [
[[package]]
name = "jiff-static"
version = "0.2.12"
version = "0.2.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f244cfe006d98d26f859c7abd1318d85327e1882dc9cef80f62daeeb0adcf300"
checksum = "f3c30758ddd7188629c6713fc45d1188af4f44c90582311d0c8d8c9907f60c48"
dependencies = [
"proc-macro2",
"quote",
@@ -3728,7 +3728,7 @@ dependencies = [
[[package]]
name = "lance"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-arith",
@@ -3791,7 +3791,7 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -3809,7 +3809,7 @@ dependencies = [
[[package]]
name = "lance-core"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -3846,7 +3846,7 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-array",
@@ -3876,7 +3876,7 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-array",
@@ -3892,7 +3892,7 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrayref",
"arrow",
@@ -3932,7 +3932,7 @@ dependencies = [
[[package]]
name = "lance-file"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -3967,7 +3967,7 @@ dependencies = [
[[package]]
name = "lance-index"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-array",
@@ -4021,7 +4021,7 @@ dependencies = [
[[package]]
name = "lance-io"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-arith",
@@ -4060,7 +4060,7 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow-array",
"arrow-ord",
@@ -4084,7 +4084,7 @@ dependencies = [
[[package]]
name = "lance-table"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow",
"arrow-array",
@@ -4124,7 +4124,7 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "0.27.0"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.2#cf903b470be1aaff2998830bd0358226f27f4185"
source = "git+https://github.com/lancedb/lance.git?tag=v0.27.0-beta.5#80a3f8796aee814c60cbdc94179b4e6231fa54e4"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4135,7 +4135,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
dependencies = [
"arrow",
"arrow-array",
@@ -4222,7 +4222,7 @@ dependencies = [
[[package]]
name = "lancedb-node"
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4247,7 +4247,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -4266,7 +4266,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.22.1-beta.2"
version = "0.22.1-beta.3"
dependencies = [
"arrow",
"env_logger",


@@ -21,14 +21,14 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=0.27.0", "features" = ["dynamodb"], tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.27.0", tag = "v0.27.0-beta.2", git="https://github.com/lancedb/lance.git" }
lance = { "version" = "=0.27.0", "features" = ["dynamodb"], tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-io = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-index = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-linalg = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-table = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-testing = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-datafusion = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
lance-encoding = { version = "=0.27.0", tag = "v0.27.0-beta.5", git="https://github.com/lancedb/lance.git" }
# Note that this one does not include pyarrow
arrow = { version = "54.1", optional = false }
arrow-array = "54.1"


@@ -33,7 +33,7 @@ Construct a MergeInsertBuilder. __Internal use only.__
### execute()
```ts
execute(data): Promise<MergeResult>
execute(data, execOptions?): Promise<MergeResult>
```
Executes the merge insert operation
@@ -42,6 +42,8 @@ Executes the merge insert operation
* **data**: [`Data`](../type-aliases/Data.md)
* **execOptions?**: `Partial`&lt;[`WriteExecutionOptions`](../interfaces/WriteExecutionOptions.md)&gt;
#### Returns
`Promise`&lt;[`MergeResult`](../interfaces/MergeResult.md)&gt;


@@ -72,6 +72,7 @@
- [UpdateOptions](interfaces/UpdateOptions.md)
- [UpdateResult](interfaces/UpdateResult.md)
- [Version](interfaces/Version.md)
- [WriteExecutionOptions](interfaces/WriteExecutionOptions.md)
## Type Aliases


@@ -0,0 +1,26 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / WriteExecutionOptions
# Interface: WriteExecutionOptions
## Properties
### timeoutMs?
```ts
optional timeoutMs: number;
```
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including the first.


@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.2</version>
<version>0.19.1-beta.3</version>
<relativePath>../pom.xml</relativePath>
</parent>


@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.19.1-beta.2</version>
<version>0.19.1-beta.3</version>
<packaging>pom</packaging>
<name>LanceDB Parent</name>

node/package-lock.json (generated, 51 changed lines)

@@ -1,12 +1,12 @@
{
"name": "vectordb",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "vectordb",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"cpu": [
"x64",
"arm64"
@@ -52,11 +52,11 @@
"uuid": "^9.0.0"
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.2",
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.2",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.2",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.2",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.2"
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.3",
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.3",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.3",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.3",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.3"
},
"peerDependencies": {
"@apache-arrow/ts": "^14.0.2",
@@ -327,9 +327,9 @@
}
},
"node_modules/@lancedb/vectordb-darwin-arm64": {
"version": "0.19.1-beta.2",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.19.1-beta.2.tgz",
"integrity": "sha512-mG0ZXL4y70GUynzGHAVfFfKLzjrro6iYRY09RWXGdapHHliZIIsLZIo+hdX4sJHjjq7MRoMbJEWtR5Wwc9t3+Q==",
"version": "0.19.1-beta.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-arm64/-/vectordb-darwin-arm64-0.19.1-beta.3.tgz",
"integrity": "sha512-TglTNkvgxxHHhh8YbEwj5t9XuInNVUNeFN34Zyk+7ab/rDdMASiKv6ZvDkwacVm7aXeBbLw39/6+IegStJfFCg==",
"cpu": [
"arm64"
],
@@ -340,9 +340,9 @@
]
},
"node_modules/@lancedb/vectordb-darwin-x64": {
"version": "0.19.1-beta.2",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.19.1-beta.2.tgz",
"integrity": "sha512-dvhUtOG4DzFotF9pJkLfxjbj4IXTkFja+jMBZ77Udh+IvbFXuORAYfIOopP65yxKXdzXU3Tk20owt+LgQZbJjQ==",
"version": "0.19.1-beta.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-darwin-x64/-/vectordb-darwin-x64-0.19.1-beta.3.tgz",
"integrity": "sha512-mwBbOVgeUT3xyegzga0gTBJ+DXI3dP1zPKcOQRQDRJk+GkfHk1CblGXT3h/YL18NWfR1FGMe9s59PNJR6r6l8A==",
"cpu": [
"x64"
],
@@ -353,9 +353,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-arm64-gnu": {
"version": "0.19.1-beta.2",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.19.1-beta.2.tgz",
"integrity": "sha512-Onmbqk0LutVIF65ljKfdRqyG/W6nXO9NTlxB6BO71f6X9Fqh2Sv7WOZjj3Ku3KK/5mcOguMCQde4qgLVmUbJdw==",
"version": "0.19.1-beta.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-arm64-gnu/-/vectordb-linux-arm64-gnu-0.19.1-beta.3.tgz",
"integrity": "sha512-amihspQ5ThSKRJsPpeAte/edWDGAN5ZjdqhtX8AUuuOmoJ5EekfsgXZc+fyFNwl6RzGT7PKqpL7SQzOdLKMijQ==",
"cpu": [
"arm64"
],
@@ -366,9 +366,9 @@
]
},
"node_modules/@lancedb/vectordb-linux-x64-gnu": {
"version": "0.19.1-beta.2",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.19.1-beta.2.tgz",
"integrity": "sha512-QeZEgPQiollqgtbXXIPP/58M94f5cEk6md4k3ICl79N6hs5V+E0BrTPGYlSPZCE32B6AIGzjYCgiIDea/jvshw==",
"version": "0.19.1-beta.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-linux-x64-gnu/-/vectordb-linux-x64-gnu-0.19.1-beta.3.tgz",
"integrity": "sha512-mZzOETBii+UUu7D2TOohhukXNjjOfldbNADRB20FF2a3hYzrVteiFudCQRYtbVunpHE0qvNRTkyuRqM7DwOygw==",
"cpu": [
"x64"
],
@@ -378,6 +378,19 @@
"linux"
]
},
"node_modules/@lancedb/vectordb-win32-x64-msvc": {
"version": "0.19.1-beta.3",
"resolved": "https://registry.npmjs.org/@lancedb/vectordb-win32-x64-msvc/-/vectordb-win32-x64-msvc-0.19.1-beta.3.tgz",
"integrity": "sha512-LHsKFtJZRRZ4MVa6uSeWqPJ9vfw0atmp6bvVDByxgouVN4CwdlnAxOu69YJtwDPxnfg8Pn+eQ5txIFvhFtCAnA==",
"cpu": [
"x64"
],
"license": "Apache-2.0",
"optional": true,
"os": [
"win32"
]
},
"node_modules/@neon-rs/cli": {
"version": "0.0.160",
"resolved": "https://registry.npmjs.org/@neon-rs/cli/-/cli-0.0.160.tgz",


@@ -1,6 +1,6 @@
{
"name": "vectordb",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"description": " Serverless, low-latency vector database for AI applications",
"private": false,
"main": "dist/index.js",
@@ -89,10 +89,10 @@
}
},
"optionalDependencies": {
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.2",
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.2",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.2",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.2",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.2"
"@lancedb/vectordb-darwin-x64": "0.19.1-beta.3",
"@lancedb/vectordb-darwin-arm64": "0.19.1-beta.3",
"@lancedb/vectordb-linux-x64-gnu": "0.19.1-beta.3",
"@lancedb/vectordb-linux-arm64-gnu": "0.19.1-beta.3",
"@lancedb/vectordb-win32-x64-msvc": "0.19.1-beta.3"
}
}


@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
license.workspace = true
description.workspace = true
repository.workspace = true


@@ -349,7 +349,7 @@ describe("merge insert", () => {
.mergeInsert("a")
.whenMatchedUpdateAll()
.whenNotMatchedInsertAll()
.execute(newData);
.execute(newData, { timeoutMs: 10_000 });
expect(mergeInsertRes).toHaveProperty("version");
expect(mergeInsertRes.version).toBe(2);
expect(mergeInsertRes.numInsertedRows).toBe(1);
@@ -463,6 +463,20 @@ describe("merge insert", () => {
res = res.sort((a, b) => a.a - b.a);
expect(res).toEqual(expected);
});
test("timeout", async () => {
const newData = [
{ a: 2, b: "x" },
{ a: 4, b: "z" },
];
await expect(
table
.mergeInsert("a")
.whenMatchedUpdateAll()
.whenNotMatchedInsertAll()
.execute(newData, { timeoutMs: 0 }),
).rejects.toThrow("merge insert timed out");
});
});
describe("When creating an index", () => {
@@ -1287,6 +1301,32 @@ describe("when dealing with tags", () => {
await table.checkoutLatest();
expect(await table.version()).toBe(4);
});
it("can checkout and restore tags", async () => {
const conn = await connect(tmpDir.name, {
readConsistencyInterval: 0,
});
const table = await conn.createTable("my_table", [
{ id: 1n, vector: [0.1, 0.2] },
]);
expect(await table.version()).toBe(1);
expect(await table.countRows()).toBe(1);
const tagsManager = await table.tags();
const tag1 = "tag1";
await tagsManager.create(tag1, 1);
await table.add([{ id: 2n, vector: [0.3, 0.4] }]);
const tag2 = "tag2";
await tagsManager.create(tag2, 2);
expect(await table.version()).toBe(2);
await table.checkout(tag1);
expect(await table.version()).toBe(1);
await table.restore();
expect(await table.version()).toBe(3);
expect(await table.countRows()).toBe(1);
await table.add([{ id: 3n, vector: [0.5, 0.6] }]);
expect(await table.countRows()).toBe(2);
});
});
describe("when optimizing a dataset", () => {


@@ -86,7 +86,7 @@ export {
ColumnAlteration,
} from "./table";
export { MergeInsertBuilder } from "./merge";
export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
export * as embedding from "./embedding";
export * as rerankers from "./rerankers";


@@ -75,7 +75,10 @@ export class MergeInsertBuilder {
*
* @returns {Promise<MergeResult>} the merge result
*/
async execute(data: Data): Promise<MergeResult> {
async execute(
data: Data,
execOptions?: Partial<WriteExecutionOptions>,
): Promise<MergeResult> {
let schema: Schema;
if (this.#schema instanceof Promise) {
schema = await this.#schema;
@@ -83,7 +86,28 @@ export class MergeInsertBuilder {
} else {
schema = this.#schema;
}
if (execOptions?.timeoutMs !== undefined) {
this.#native.setTimeout(execOptions.timeoutMs);
}
const buffer = await fromDataToBuffer(data, undefined, schema);
return await this.#native.execute(buffer);
}
}
export interface WriteExecutionOptions {
/**
* Maximum time to run the operation before cancelling it.
*
* By default, there is a 30-second timeout that is only enforced after the
* first attempt. This is to prevent spending too long retrying to resolve
* conflicts. For example, if a write attempt takes 20 seconds and fails,
* the second attempt will be cancelled after 10 seconds, hitting the
* 30-second timeout. However, a write that takes one hour and succeeds on the
* first attempt will not be cancelled.
*
* When this is set, the timeout is enforced on all attempts, including the first.
*/
timeoutMs?: number;
}


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": [
"win32"
],


@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",


@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"cpu": [
"x64",
"arm64"


@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.19.1-beta.2",
"version": "0.19.1-beta.3",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",


@@ -1,6 +1,8 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::time::Duration;
use lancedb::{arrow::IntoArrow, ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
use napi::bindgen_prelude::*;
use napi_derive::napi;
@@ -36,6 +38,11 @@ impl NativeMergeInsertBuilder {
this
}
#[napi]
pub fn set_timeout(&mut self, timeout: u32) {
self.inner.timeout(Duration::from_millis(timeout as u64));
}
#[napi(catch_unwind)]
pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
let data = ipc_file_to_batches(buf.to_vec())


@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.22.1-beta.3"
current_version = "0.22.1-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.


@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.22.1-beta.3"
version = "0.22.1-beta.4"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true


@@ -51,7 +51,7 @@ class Table:
async def version(self) -> int: ...
async def checkout(self, version: Union[int, str]): ...
async def checkout_latest(self): ...
async def restore(self, version: Optional[int] = None): ...
async def restore(self, version: Optional[Union[int, str]] = None): ...
async def list_indices(self) -> list[IndexConfig]: ...
async def delete(self, filter: str) -> DeleteResult: ...
async def add_columns(self, columns: list[tuple[str, str]]) -> AddColumnsResult: ...


@@ -4,6 +4,7 @@
from __future__ import annotations
from datetime import timedelta
from typing import TYPE_CHECKING, List, Optional
if TYPE_CHECKING:
@@ -31,6 +32,7 @@ class LanceMergeInsertBuilder(object):
self._when_not_matched_insert_all = False
self._when_not_matched_by_source_delete = False
self._when_not_matched_by_source_condition = None
self._timeout = None
def when_matched_update_all(
self, *, where: Optional[str] = None
@@ -81,6 +83,7 @@ class LanceMergeInsertBuilder(object):
new_data: DATA,
on_bad_vectors: str = "error",
fill_value: float = 0.0,
timeout: Optional[timedelta] = None,
) -> MergeInsertResult:
"""
Executes the merge insert operation
@@ -98,10 +101,24 @@ class LanceMergeInsertBuilder(object):
One of "error", "drop", "fill".
fill_value: float, default 0.
The value to use when filling vectors. Only used if on_bad_vectors="fill".
timeout: Optional[timedelta], default None
Maximum time to run the operation before cancelling it.
By default, there is a 30-second timeout that is only enforced after the
first attempt. This is to prevent spending too long retrying to resolve
conflicts. For example, if a write attempt takes 20 seconds and fails,
the second attempt will be cancelled after 10 seconds, hitting the
30-second timeout. However, a write that takes one hour and succeeds on the
first attempt will not be cancelled.
When this is set, the timeout is enforced on all attempts, including
the first.
Returns
-------
MergeInsertResult
version: the new version number of the table after doing merge insert.
"""
if timeout is not None:
self._timeout = timeout
return self._table._do_merge(self, new_data, on_bad_vectors, fill_value)


@@ -47,9 +47,6 @@ class RemoteTable(Table):
def __repr__(self) -> str:
return f"RemoteTable({self.db_name}.{self.name})"
def __len__(self) -> int:
self.count_rows(None)
@property
def schema(self) -> pa.Schema:
"""The [Arrow Schema](https://arrow.apache.org/docs/python/api/datatypes.html#)
@@ -100,7 +97,7 @@ class RemoteTable(Table):
def checkout_latest(self):
return LOOP.run(self._table.checkout_latest())
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
return LOOP.run(self._table.restore(version))
def list_indices(self) -> Iterable[IndexConfig]:


@@ -620,6 +620,10 @@ class Table(ABC):
"""
raise NotImplementedError
def __len__(self) -> int:
"""The number of rows in this Table"""
return self.count_rows(None)
@property
@abstractmethod
def embedding_functions(self) -> Dict[str, EmbeddingFunctionConfig]:
@@ -1470,7 +1474,7 @@ class Table(ABC):
"""
@abstractmethod
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
"""Restore a version of the table. This is an in-place operation.
This creates a new version where the data is equivalent to the
@@ -1478,9 +1482,10 @@ class Table(ABC):
Parameters
----------
version : int, default None
The version to restore. If unspecified then restores the currently
checked out version. If the currently checked out version is the
version : int or str, default None
The version number or version tag to restore.
If unspecified then restores the currently checked out version.
If the currently checked out version is the
latest version then this is a no-op.
"""
@@ -1710,7 +1715,7 @@ class LanceTable(Table):
"""
LOOP.run(self._table.checkout_latest())
def restore(self, version: Optional[int] = None):
def restore(self, version: Optional[Union[int, str]] = None):
"""Restore a version of the table. This is an in-place operation.
This creates a new version where the data is equivalent to the
@@ -1718,9 +1723,10 @@ class LanceTable(Table):
Parameters
----------
version : int, default None
The version to restore. If unspecified then restores the currently
checked out version. If the currently checked out version is the
version : int or str, default None
The version number or version tag to restore.
If unspecified then restores the currently checked out version.
If the currently checked out version is the
latest version then this is a no-op.
Examples
@@ -1738,12 +1744,20 @@ class LanceTable(Table):
AddResult(version=2)
>>> table.version
2
>>> table.tags.create("v2", 2)
>>> table.restore(1)
>>> table.to_pandas()
vector type
0 [1.1, 0.9] vector
>>> len(table.list_versions())
3
>>> table.restore("v2")
>>> table.to_pandas()
vector type
0 [1.1, 0.9] vector
1 [0.5, 0.2] vector
>>> len(table.list_versions())
4
"""
if version is not None:
LOOP.run(self._table.checkout(version))
@@ -1752,9 +1766,6 @@ class LanceTable(Table):
def count_rows(self, filter: Optional[str] = None) -> int:
return LOOP.run(self._table.count_rows(filter))
def __len__(self) -> int:
return self.count_rows()
def __repr__(self) -> str:
val = f"{self.__class__.__name__}(name={self.name!r}, version={self.version}"
if self._conn.read_consistency_interval is not None:
@@ -3705,6 +3716,7 @@ class AsyncTable:
when_not_matched_insert_all=merge._when_not_matched_insert_all,
when_not_matched_by_source_delete=merge._when_not_matched_by_source_delete,
when_not_matched_by_source_condition=merge._when_not_matched_by_source_condition,
timeout=merge._timeout,
),
)
@@ -3962,7 +3974,7 @@ class AsyncTable:
"""
await self._inner.checkout_latest()
async def restore(self, version: Optional[int] = None):
async def restore(self, version: Optional[int | str] = None):
"""
Restore the table to the currently checked out version

View File

@@ -149,6 +149,24 @@ async def test_async_checkout():
assert await table.count_rows() == 300
def test_table_len_sync():
def handler(request):
if request.path == "/v1/table/test/create/?mode=create":
request.send_response(200)
request.send_header("Content-Type", "application/json")
request.end_headers()
request.wfile.write(b"{}")
request.send_response(200)
request.send_header("Content-Type", "application/json")
request.end_headers()
request.wfile.write(json.dumps(1).encode())
with mock_lancedb_connection(handler) as db:
table = db.create_table("test", [{"id": 1}])
assert len(table) == 1
@pytest.mark.asyncio
async def test_http_error():
request_id_holder = {"request_id": None}

View File

@@ -769,6 +769,29 @@ def test_restore(mem_db: DBConnection):
table.restore(0)
def test_restore_with_tags(mem_db: DBConnection):
table = mem_db.create_table(
"my_table",
data=[{"vector": [1.1, 0.9], "type": "vector"}],
)
tag = "tag1"
table.tags.create(tag, 1)
table.add([{"vector": [0.5, 0.2], "type": "vector"}])
table.restore(tag)
assert len(table.list_versions()) == 3
assert len(table) == 1
expected = table.to_arrow()
table.add([{"vector": [0.3, 0.3], "type": "vector"}])
table.checkout("tag1")
table.restore()
assert len(table.list_versions()) == 5
assert table.to_arrow() == expected
with pytest.raises(ValueError):
table.restore("tag_unknown")
def test_merge(tmp_db: DBConnection, tmp_path):
pytest.importorskip("lance")
import lance
@@ -914,7 +937,7 @@ def test_merge_insert(mem_db: DBConnection):
table.merge_insert("a")
.when_matched_update_all()
.when_not_matched_insert_all()
.execute(new_data)
.execute(new_data, timeout=timedelta(seconds=10))
)
assert merge_insert_res.version == 2
assert merge_insert_res.num_inserted_rows == 1
@@ -990,6 +1013,12 @@ def test_merge_insert(mem_db: DBConnection):
expected = pa.table({"a": [2, 4], "b": ["x", "z"]})
assert table.to_arrow().sort_by("a") == expected
# timeout
with pytest.raises(Exception, match="merge insert timed out"):
table.merge_insert("a").when_matched_update_all().execute(
new_data, timeout=timedelta(0)
)
# We vary the data format because there are slight differences in how
# subschemas are handled in different formats

View File

@@ -17,10 +17,10 @@ use lancedb::table::{
Table as LanceDbTable,
};
use pyo3::{
exceptions::{PyIOError, PyKeyError, PyRuntimeError, PyValueError},
exceptions::{PyKeyError, PyRuntimeError, PyValueError},
pyclass, pymethods,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods, PyInt, PyString},
Bound, FromPyObject, PyAny, PyObject, PyRef, PyResult, Python,
types::{IntoPyDict, PyAnyMethods, PyDict, PyDictMethods},
Bound, FromPyObject, PyAny, PyRef, PyResult, Python,
};
use pyo3_async_runtimes::tokio::future_into_py;
@@ -520,25 +520,15 @@ impl Table {
})
}
pub fn checkout(self_: PyRef<'_, Self>, version: PyObject) -> PyResult<Bound<'_, PyAny>> {
pub fn checkout(self_: PyRef<'_, Self>, version: LanceVersion) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
let py = self_.py();
let (is_int, int_value, string_value) = if let Ok(i) = version.downcast_bound::<PyInt>(py) {
let num: u64 = i.extract()?;
(true, num, String::new())
} else if let Ok(s) = version.downcast_bound::<PyString>(py) {
let str_value = s.to_string();
(false, 0, str_value)
} else {
return Err(PyIOError::new_err(
"version must be an integer or a string.",
));
};
future_into_py(py, async move {
if is_int {
inner.checkout(int_value).await.infer_error()
} else {
inner.checkout_tag(&string_value).await.infer_error()
match version {
LanceVersion::Version(version_num) => {
inner.checkout(version_num).await.infer_error()
}
LanceVersion::Tag(tag) => inner.checkout_tag(&tag).await.infer_error(),
}
})
}
@@ -551,12 +541,19 @@ impl Table {
}
#[pyo3(signature = (version=None))]
pub fn restore(self_: PyRef<'_, Self>, version: Option<u64>) -> PyResult<Bound<'_, PyAny>> {
pub fn restore(
self_: PyRef<'_, Self>,
version: Option<LanceVersion>,
) -> PyResult<Bound<'_, PyAny>> {
let inner = self_.inner_ref()?.clone();
let py = self_.py();
future_into_py(self_.py(), async move {
future_into_py(py, async move {
if let Some(version) = version {
inner.checkout(version).await.infer_error()?;
match version {
LanceVersion::Version(num) => inner.checkout(num).await.infer_error()?,
LanceVersion::Tag(tag) => inner.checkout_tag(&tag).await.infer_error()?,
}
}
inner.restore().await.infer_error()
})
@@ -652,6 +649,9 @@ impl Table {
builder
.when_not_matched_by_source_delete(parameters.when_not_matched_by_source_condition);
}
if let Some(timeout) = parameters.timeout {
builder.timeout(timeout);
}
future_into_py(self_.py(), async move {
let res = builder.execute(Box::new(batches)).await.infer_error()?;
@@ -795,6 +795,12 @@ impl Table {
}
}
#[derive(FromPyObject)]
pub enum LanceVersion {
Version(u64),
Tag(String),
}
#[derive(FromPyObject)]
#[pyo3(from_item_all)]
pub struct MergeInsertParams {
@@ -804,6 +810,7 @@ pub struct MergeInsertParams {
when_not_matched_insert_all: bool,
when_not_matched_by_source_delete: bool,
when_not_matched_by_source_condition: Option<String>,
timeout: Option<std::time::Duration>,
}
#[pyclass]

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-node"
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
description = "Serverless, low-latency vector database for AI applications"
license.workspace = true
edition.workspace = true

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.19.1-beta.2"
version = "0.19.1-beta.3"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -1068,13 +1068,22 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
) -> Result<MergeResult> {
self.check_mutable().await?;
let timeout = params.timeout;
let query = MergeInsertRequest::try_from(params)?;
let request = self
let mut request = self
.client
.post(&format!("/v1/table/{}/merge_insert/", self.name))
.query(&query)
.header(CONTENT_TYPE, ARROW_STREAM_CONTENT_TYPE);
if let Some(timeout) = timeout {
// (If it doesn't fit into u64, it's not worth sending anyway.)
if let Ok(timeout_ms) = u64::try_from(timeout.as_millis()) {
request = request.header(REQUEST_TIMEOUT_HEADER, timeout_ms);
}
}
let (request_id, response) = self.send_streaming(request, new_data, true).await?;
let response = self.check_table_response(&request_id, response).await?;
@@ -1325,7 +1334,12 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
self.name, index_name
));
let (request_id, response) = self.send(request, true).await?;
self.check_table_response(&request_id, response).await?;
if response.status() == StatusCode::NOT_FOUND {
return Err(Error::IndexNotFound {
name: index_name.to_string(),
});
};
self.client.check_response(&request_id, response).await?;
Ok(())
}
@@ -2879,6 +2893,22 @@ mod tests {
table.drop_index("my_index").await.unwrap();
}
#[tokio::test]
async fn test_drop_index_not_exists() {
let table = Table::new_with_handler("my_table", |request| {
assert_eq!(request.method(), "POST");
assert_eq!(
request.url().path(),
"/v1/table/my_table/index/my_index/drop/"
);
http::Response::builder().status(404).body("{}").unwrap()
});
// Assert that the error is IndexNotFound
let e = table.drop_index("my_index").await.unwrap_err();
assert!(matches!(e, Error::IndexNotFound { .. }));
}
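The fix being tested here maps a 404 from the drop-index endpoint to an index-specific error instead of a misleading `TableNotFound`. A minimal Python sketch of that mapping follows; `IndexNotFoundError` and `check_drop_index_response` are illustrative names, not the actual binding API.

```python
class IndexNotFoundError(Exception):
    """Raised when dropping an index that does not exist (illustrative)."""

def check_drop_index_response(status: int, index_name: str) -> None:
    # Mirror the fix above: a 404 from /index/{name}/drop/ now becomes an
    # index-specific error rather than a table-level one.
    if status == 404:
        raise IndexNotFoundError(f"index {index_name!r} not found")
    if status >= 400:
        raise RuntimeError(f"request failed with status {status}")
```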
#[tokio::test]
async fn test_wait_for_index() {
let table = _make_table_with_indices(0);

View File

@@ -14,7 +14,7 @@ use datafusion_physical_plan::projection::ProjectionExec;
use datafusion_physical_plan::repartition::RepartitionExec;
use datafusion_physical_plan::union::UnionExec;
use datafusion_physical_plan::ExecutionPlan;
use futures::{StreamExt, TryStreamExt};
use futures::{FutureExt, StreamExt, TryFutureExt, TryStreamExt};
use lance::dataset::builder::DatasetBuilder;
use lance::dataset::cleanup::RemovalStats;
use lance::dataset::optimize::{compact_files, CompactionMetrics, IndexRemapperOptions};
@@ -80,7 +80,7 @@ pub mod merge;
use crate::index::waiter::wait_for_index;
pub use chrono::Duration;
use futures::future::join_all;
use futures::future::{join_all, Either};
pub use lance::dataset::optimize::CompactionOptions;
pub use lance::dataset::refs::{TagContents, Tags as LanceTags};
pub use lance::dataset::scanner::DatasetRecordBatchStream;
@@ -423,7 +423,7 @@ pub trait Tags: Send + Sync {
async fn update(&mut self, tag: &str, version: u64) -> Result<()>;
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct UpdateResult {
#[serde(default)]
pub rows_updated: u64,
@@ -434,7 +434,7 @@ pub struct UpdateResult {
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AddResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
@@ -443,7 +443,7 @@ pub struct AddResult {
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct DeleteResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
@@ -452,7 +452,7 @@ pub struct DeleteResult {
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct MergeResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
@@ -472,7 +472,7 @@ pub struct MergeResult {
pub num_deleted_rows: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AddColumnsResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
@@ -481,7 +481,7 @@ pub struct AddColumnsResult {
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct AlterColumnsResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
@@ -490,7 +490,7 @@ pub struct AlterColumnsResult {
pub version: u64,
}
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default)]
pub struct DropColumnsResult {
// The commit version associated with the operation.
// A version of `0` indicates compatibility with legacy servers that do not return
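The `Default` derives above pair with the `#[serde(default)]` field attributes so that responses from legacy servers, which omit these fields, deserialize to zeroed结构 values. A Python analogue of that pattern, using dataclass defaults (field names follow `MergeResult` from this diff; the parser helper is an assumption for illustration):

```python
import json
from dataclasses import dataclass

@dataclass
class MergeResult:
    # A version of 0 signals a legacy server that returned no version.
    version: int = 0
    num_inserted_rows: int = 0
    num_updated_rows: int = 0
    num_deleted_rows: int = 0

def parse_merge_result(body: str) -> MergeResult:
    # Missing fields fall back to the dataclass defaults, matching the
    # behavior serde(default) gives the Rust result structs.
    data = json.loads(body)
    known = {k: v for k, v in data.items()
             if k in MergeResult.__dataclass_fields__}
    return MergeResult(**known)
```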
@@ -2014,7 +2014,7 @@ impl NativeTable {
/// more information.
pub async fn uses_v2_manifest_paths(&self) -> Result<bool> {
let dataset = self.dataset.get().await?;
Ok(dataset.manifest_naming_scheme == ManifestNamingScheme::V2)
Ok(dataset.manifest_location().naming_scheme == ManifestNamingScheme::V2)
}
/// Migrate the table to use the new manifest path scheme.
@@ -2475,8 +2475,26 @@ impl BaseTable for NativeTable {
} else {
builder.when_not_matched_by_source(WhenNotMatchedBySource::Keep);
}
let job = builder.try_build()?;
let (new_dataset, stats) = job.execute_reader(new_data).await?;
let future = if let Some(timeout) = params.timeout {
// The default retry timeout is 30s, so we pass the full timeout down
// as well in case it is longer than that.
let future = builder
.retry_timeout(timeout)
.try_build()?
.execute_reader(new_data);
Either::Left(tokio::time::timeout(timeout, future).map(|res| match res {
Ok(Ok((new_dataset, stats))) => Ok((new_dataset, stats)),
Ok(Err(e)) => Err(e.into()),
Err(_) => Err(Error::Runtime {
message: "merge insert timed out".to_string(),
}),
}))
} else {
let job = builder.try_build()?;
Either::Right(job.execute_reader(new_data).map_err(|e| e.into()))
};
let (new_dataset, stats) = future.await?;
let version = new_dataset.manifest().version;
self.dataset.set_latest(new_dataset.as_ref().clone()).await;
Ok(MergeResult {

View File

@@ -1,7 +1,7 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::sync::Arc;
use std::{sync::Arc, time::Duration};
use arrow_array::RecordBatchReader;
@@ -21,6 +21,7 @@ pub struct MergeInsertBuilder {
pub(crate) when_not_matched_insert_all: bool,
pub(crate) when_not_matched_by_source_delete: bool,
pub(crate) when_not_matched_by_source_delete_filt: Option<String>,
pub(crate) timeout: Option<Duration>,
}
impl MergeInsertBuilder {
@@ -33,6 +34,7 @@ impl MergeInsertBuilder {
when_not_matched_insert_all: false,
when_not_matched_by_source_delete: false,
when_not_matched_by_source_delete_filt: None,
timeout: None,
}
}
@@ -84,6 +86,21 @@ impl MergeInsertBuilder {
self
}
/// Maximum time to run the operation before cancelling it.
///
/// By default, there is a 30-second timeout that is only enforced after the
/// first attempt. This is to prevent spending too long retrying to resolve
/// conflicts. For example, if a write attempt takes 20 seconds and fails,
/// the second attempt will be cancelled after 10 seconds, hitting the
/// 30-second timeout. However, a write that takes one hour and succeeds on the
/// first attempt will not be cancelled.
///
/// When this is set, the timeout is enforced on all attempts, including the first.
pub fn timeout(&mut self, timeout: Duration) -> &mut Self {
self.timeout = Some(timeout);
self
}
/// Executes the merge insert operation
///
/// Returns version and statistics about the merge operation including the number of rows