Compare commits

...

22 Commits

Author SHA1 Message Date
Jack Ye
d940b9ade7 commit 2026-04-12 12:58:14 -07:00
Jack Ye
d059447feb refactor: rename deserialize to deserialize_conn, consistent pushdown naming
- Rename deserialize() -> deserialize_conn() for clarity
- Rename internal _pushdown_operations -> _namespace_client_pushdown_operations
  for consistency with the parameter name
- Rename serialized key "pushdown_operations" ->
  "namespace_client_pushdown_operations"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
6427024bcb fix: route list_namespaces through _namespace_conn for consistency
list_namespaces with empty path was going through Rust (ListingDatabase)
which doesn't see namespaces created via the directory namespace client.
Always delegate to _namespace_conn() so create/list are consistent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
be19a880a9 fix: format, lint, and update tests for namespace delegation
- Run ruff format on all changed files
- Fix F821 forward reference in _namespace_conn return type
- Update test_local_namespace_operations to verify operations succeed
  instead of expecting NotImplementedError (namespace ops now work on
  LanceDBConnection via directory namespace delegation)
- Remove test_local_create_namespace_not_supported and
  test_local_drop_namespace_not_supported (no longer applicable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
e7ed3d5dab fix: merge user storage_options in server-side create_table
_create_table_server_side was only passing self.storage_options
(connection-level) to CreateTableRequest, ignoring the user-provided
storage_options parameter. This caused per-table options like
new_table_data_storage_version to be silently dropped.

Fix both sync and async paths to merge user options on top of
connection options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
7ed4b059a6 refactor: rename serialize_to_json/from_serialized_json to serialize/deserialize
Simpler names, docs no longer reference JSON as the serialization format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
f36583d0c3 fix: keep original Rust path for root namespace operations
Only delegate to _namespace_conn() when namespace_path is non-empty.
Root namespace operations (list_namespaces, list_tables with empty
path) still go through the original Rust connection to avoid regression.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
fea2ef6a0a refactor: generalize worker property overrides with _lancedb_worker_ prefix
Replace the special-cased worker_uri key with a generic mechanism:
any namespace_client_properties key starting with _lancedb_worker_
has the prefix stripped and overrides the corresponding property
when for_worker=True.

e.g. _lancedb_worker_uri overrides uri in worker context.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
fc2a9726b2 refactor: delegate child namespace ops to LanceNamespaceDBConnection
Instead of reimplementing namespace logic (describe_table, merge
storage_options, etc.) in LanceDBConnection, delegate child namespace
operations to a LanceNamespaceDBConnection via _namespace_conn().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
e3893dacf8 feat: cache namespace_client and auto-delegate child namespace operations
LanceDBConnection now:
- Caches namespace_client() result to avoid repeated DirectoryNamespace builds
- Auto-delegates open_table/create_table with non-empty namespace_path
  through the directory namespace client
- Routes create_namespace/drop_namespace/describe_namespace/list_namespaces
  through the namespace client
- Routes list_tables/drop_table for child namespaces through namespace client

This enables local storage connections to transparently handle child
namespaces like ["__system"] without requiring a separate
LanceNamespaceDBConnection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Jack Ye
dd2a0ec48f feat: add serialize_to_json/from_serialized_json for DBConnection
Add serialization support to DBConnection classes so connections
can be reconstructed in remote workers without tracking namespace
params separately.

- DBConnection.serialize_to_json() base method
- LanceDBConnection: serializes uri, storage_options, read_consistency_interval
- LanceNamespaceDBConnection: stores namespace_client_impl/properties,
  serializes all connection params including pushdown_operations
- from_serialized_json() factory with for_worker flag for worker_uri swap
- connect_namespace() now passes impl/properties to connection for serialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 12:58:14 -07:00
Lance Release
c6ae0de3ee Bump version: 0.28.0-beta.3 → 0.28.0-beta.4 2026-04-12 03:57:58 +00:00
Lance Release
231f0655ce Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 2026-04-12 03:57:35 +00:00
LanceDB Robot
8c52977c59 chore: update lance dependency to v5.1.0-beta.3 (#3266)
## Summary
- Bump Rust Lance dependencies to `v5.1.0-beta.3` using
`ci/set_lance_version.py`.
- Update Java `lance-core.version` to `5.1.0-beta.3` in `java/pom.xml`.
- Refresh `Cargo.lock` metadata to the `v5.1.0-beta.3` Lance git tag.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
- `cargo fmt --all`

## Upstream Tag
- https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.3
2026-04-11 20:56:49 -07:00
Lance Release
359710a0bf Bump version: 0.28.0-beta.2 → 0.28.0-beta.3 2026-04-11 22:44:52 +00:00
Lance Release
1f1726369d Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 2026-04-11 22:44:25 +00:00
Lance Release
df354abae4 Bump version: 0.28.0-beta.1 → 0.28.0-beta.2 2026-04-11 07:06:00 +00:00
Lance Release
11bc674548 Bump version: 0.31.0-beta.1 → 0.31.0-beta.2 2026-04-11 07:05:36 +00:00
LanceDB Robot
5593460823 chore: update lance dependency to v5.1.0-beta.2 (#3263)
## Summary
- Bump Lance Rust workspace dependencies from `5.0.0-beta.5` to
`5.1.0-beta.2` using `ci/set_lance_version.py`.
- Update Java `lance-core.version` in `java/pom.xml` to `5.1.0-beta.2`.
- Refresh `Cargo.lock` to match the new Lance tag.

## Verification
- `cargo clippy --workspace --tests --all-features -- -D warnings`
(passes)
- `cargo fmt --all` (passes)

## Triggering Tag
- https://github.com/lance-format/lance/releases/tag/v5.1.0-beta.2
2026-04-11 00:04:43 -07:00
Will Jones
2807ad6854 chore: bump Rust toolchain from 1.91.0 to 1.94.0 (#3257)
Bumps the Rust toolchain to 1.94.0 (latest installed) to unblock CI
failures caused by the AWS SDK's MSRV requirement. No lint fixes were
needed.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 07:57:47 -07:00
Dhruv Garg
4761fa9bcb fix(python): migrate gemini-text provider to google-genai sdk (#3250)
## Summary
- migrate gemini-text embedding provider from deprecated
google.generativeai to google.genai
- update Python embedding extra dependency to google-genai
- update default model name to gemini-embedding-001
- adapt embed calls to Client().models.embed_content(...)
- apply lint fixes from CI

## Related
- Closes #3191
2026-04-09 15:28:34 -07:00
lennylxx
4c2939d66e fix(python): guard against None before .decode() on split_names metadata key (#3229)
`.get(b"split_names", None).decode()` was called unconditionally in both
Permutations.__init__ and Permutation.from_tables(), crashing with
AttributeError when schema metadata existed but lacked the split_names
key. Guard the decode behind a None check and add regression tests.
2026-04-08 16:04:13 -07:00
33 changed files with 422 additions and 206 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.28.0-beta.1"
current_version = "0.28.0-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -8,6 +8,7 @@ on:
paths:
- Cargo.toml
- Cargo.lock
- rust-toolchain.toml
- nodejs/**
- rust/**
- docs/src/js/**

View File

@@ -8,6 +8,7 @@ on:
paths:
- Cargo.toml
- Cargo.lock
- rust-toolchain.toml
- python/**
- rust/**
- .github/workflows/python.yml

View File

@@ -8,6 +8,7 @@ on:
paths:
- Cargo.toml
- Cargo.lock
- rust-toolchain.toml
- rust/**
- .github/workflows/rust.yml

73
Cargo.lock generated
View File

@@ -3072,8 +3072,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-array",
"rand 0.9.2",
@@ -4134,13 +4134,14 @@ dependencies = [
[[package]]
name = "lance"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-arith",
"arrow-array",
"arrow-buffer",
"arrow-cast",
"arrow-ipc",
"arrow-ord",
"arrow-row",
@@ -4201,13 +4202,14 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-array",
"arrow-buffer",
"arrow-cast",
"arrow-data",
"arrow-ipc",
"arrow-ord",
"arrow-schema",
"arrow-select",
@@ -4222,8 +4224,8 @@ dependencies = [
[[package]]
name = "lance-bitpacking"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrayref",
"paste",
@@ -4232,8 +4234,8 @@ dependencies = [
[[package]]
name = "lance-core"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4270,12 +4272,13 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-array",
"arrow-buffer",
"arrow-cast",
"arrow-ord",
"arrow-schema",
"arrow-select",
@@ -4301,8 +4304,8 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-array",
@@ -4320,8 +4323,8 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4358,8 +4361,8 @@ dependencies = [
[[package]]
name = "lance-file"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4391,8 +4394,8 @@ dependencies = [
[[package]]
name = "lance-index"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-arith",
@@ -4456,8 +4459,8 @@ dependencies = [
[[package]]
name = "lance-io"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-arith",
@@ -4501,8 +4504,8 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4518,8 +4521,8 @@ dependencies = [
[[package]]
name = "lance-namespace"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"async-trait",
@@ -4532,8 +4535,8 @@ dependencies = [
[[package]]
name = "lance-namespace-impls"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-ipc",
@@ -4578,8 +4581,8 @@ dependencies = [
[[package]]
name = "lance-table"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow",
"arrow-array",
@@ -4618,8 +4621,8 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "5.0.0-beta.5"
source = "git+https://github.com/lance-format/lance.git?tag=v5.0.0-beta.5#d630106da5a238b3adfb8c5dea3b3921f3519945"
version = "5.1.0-beta.3"
source = "git+https://github.com/lance-format/lance.git?tag=v5.1.0-beta.3#0a0d53b5ef4b0f65bb775ce63bdcabc79d1d14b8"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4630,7 +4633,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.28.0-beta.1"
version = "0.28.0-beta.4"
dependencies = [
"ahash",
"anyhow",
@@ -4712,7 +4715,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.28.0-beta.1"
version = "0.28.0-beta.4"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4734,7 +4737,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.31.0-beta.1"
version = "0.31.0-beta.4"
dependencies = [
"arrow",
"async-trait",

View File

@@ -15,20 +15,20 @@ categories = ["database-implementations"]
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=5.0.0-beta.5", default-features = false, "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=5.0.0-beta.5", default-features = false, "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=5.0.0-beta.5", default-features = false, "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=5.0.0-beta.5", "tag" = "v5.0.0-beta.5", "git" = "https://github.com/lance-format/lance.git" }
lance = { "version" = "=5.1.0-beta.3", default-features = false, "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=5.1.0-beta.3", default-features = false, "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=5.1.0-beta.3", default-features = false, "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=5.1.0-beta.3", "tag" = "v5.1.0-beta.3", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "57.2", optional = false }

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.28.0-beta.1</version>
<version>0.28.0-beta.4</version>
</dependency>
```

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.28.0-beta.1</version>
<version>0.28.0-beta.4</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.28.0-beta.1</version>
<version>0.28.0-beta.4</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>5.0.0-beta.5</lance-core.version>
<lance-core.version>5.1.0-beta.3</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.28.0-beta.1"
version = "0.28.0-beta.4"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.28.0-beta.1",
"version": "0.28.0-beta.4",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.31.0-beta.1"
current_version = "0.31.0-beta.4"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.31.0-beta.1"
version = "0.31.0-beta.4"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -83,7 +83,7 @@ embeddings = [
"colpali-engine>=0.3.10",
"huggingface_hub>=0.19.0",
"InstructorEmbedding>=1.0.1",
"google.generativeai>=0.3.0",
"google-genai>=1.0.0",
"boto3>=1.28.57",
"awscli>=1.44.38",
"botocore>=1.31.57",

View File

@@ -215,6 +215,85 @@ def connect(
)
WORKER_PROPERTY_PREFIX = "_lancedb_worker_"
def _apply_worker_overrides(props: dict[str, str]) -> dict[str, str]:
"""Apply worker property overrides.
Any key starting with ``_lancedb_worker_`` is extracted, the prefix
is stripped, and the resulting key-value pair is put back into the
map (overriding the existing value if present). The original
prefixed key is removed.
"""
worker_keys = [k for k in props if k.startswith(WORKER_PROPERTY_PREFIX)]
if not worker_keys:
return props
result = dict(props)
for key in worker_keys:
value = result.pop(key)
real_key = key[len(WORKER_PROPERTY_PREFIX) :]
result[real_key] = value
return result
def deserialize_conn(
data: str,
*,
for_worker: bool = False,
) -> DBConnection:
"""Reconstruct a DBConnection from a serialized string.
The string must have been produced by
:meth:`DBConnection.serialize`.
Parameters
----------
data : str
String produced by ``serialize()``.
for_worker : bool, default False
When ``True``, any namespace client property whose key starts
with ``_lancedb_worker_`` has that prefix stripped and the
value overrides the corresponding property. For example,
``_lancedb_worker_uri`` replaces ``uri``.
Returns
-------
DBConnection
A new connection matching the serialized state.
"""
import json
parsed = json.loads(data)
connection_type = parsed.get("connection_type")
rci_secs = parsed.get("read_consistency_interval_seconds")
rci = timedelta(seconds=rci_secs) if rci_secs is not None else None
storage_options = parsed.get("storage_options")
if connection_type == "namespace":
props = dict(parsed.get("namespace_client_properties") or {})
if for_worker:
props = _apply_worker_overrides(props)
return connect_namespace(
namespace_client_impl=parsed["namespace_client_impl"],
namespace_client_properties=props,
read_consistency_interval=rci,
storage_options=storage_options,
namespace_client_pushdown_operations=parsed.get(
"namespace_client_pushdown_operations"
),
)
elif connection_type == "local":
return LanceDBConnection(
parsed["uri"],
read_consistency_interval=rci,
storage_options=storage_options,
)
else:
raise ValueError(f"Unknown connection_type: {connection_type}")
async def connect_async(
uri: URI,
*,

View File

@@ -529,6 +529,19 @@ class DBConnection(EnforceOverrides):
"namespace_client is not supported for this connection type"
)
def serialize(self) -> str:
"""Serialize this connection for reconstruction.
The returned string can be passed to :func:`lancedb.deserialize_conn`
to recreate an equivalent connection, e.g. in a remote worker.
Returns
-------
str
Serialized representation of this connection.
"""
raise NotImplementedError("serialize is not supported for this connection type")
class LanceDBConnection(DBConnection):
"""
@@ -581,6 +594,7 @@ class LanceDBConnection(DBConnection):
):
if _inner is not None:
self._conn = _inner
self._cached_namespace_client = None
return
if not isinstance(uri, Path):
@@ -628,6 +642,7 @@ class LanceDBConnection(DBConnection):
# beyond _conn.
self.storage_options = storage_options
self._conn = AsyncConnection(LOOP.run(do_connect()))
self._cached_namespace_client: Optional[LanceNamespace] = None
@property
def read_consistency_interval(self) -> Optional[timedelta]:
@@ -652,6 +667,22 @@ class LanceDBConnection(DBConnection):
val += ")"
return val
@override
def serialize(self) -> str:
import json
rci = self.read_consistency_interval
return json.dumps(
{
"connection_type": "local",
"uri": self.uri,
"storage_options": self.storage_options,
"read_consistency_interval_seconds": (
rci.total_seconds() if rci else None
),
}
)
async def _async_get_table_names(self, start_after: Optional[str], limit: int):
conn = AsyncConnection(await lancedb_connect(self.uri))
return await conn.table_names(start_after=start_after, limit=limit)
@@ -687,10 +718,10 @@ class LanceDBConnection(DBConnection):
"""
if namespace_path is None:
namespace_path = []
return LOOP.run(
self._conn.list_namespaces(
namespace_path=namespace_path, page_token=page_token, limit=limit
)
return self._namespace_conn().list_namespaces(
namespace_path=namespace_path,
page_token=page_token,
limit=limit,
)
@override
@@ -700,27 +731,10 @@ class LanceDBConnection(DBConnection):
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""Create a new namespace.
Parameters
----------
namespace_path: List[str]
The namespace identifier to create.
mode: str, optional
Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
or "overwrite" (replace if exists). Case insensitive.
properties: Dict[str, str], optional
Properties to set on the namespace.
Returns
-------
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
return LOOP.run(
self._conn.create_namespace(
namespace_path=namespace_path, mode=mode, properties=properties
)
return self._namespace_conn().create_namespace(
namespace_path=namespace_path,
mode=mode,
properties=properties,
)
@override
@@ -730,46 +744,19 @@ class LanceDBConnection(DBConnection):
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""Drop a namespace.
Parameters
----------
namespace_path: List[str]
The namespace identifier to drop.
mode: str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior: str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
return LOOP.run(
self._conn.drop_namespace(
namespace_path=namespace_path, mode=mode, behavior=behavior
)
return self._namespace_conn().drop_namespace(
namespace_path=namespace_path,
mode=mode,
behavior=behavior,
)
@override
def describe_namespace(
self, namespace_path: List[str]
) -> DescribeNamespaceResponse:
"""Describe a namespace.
Parameters
----------
namespace_path: List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
return LOOP.run(self._conn.describe_namespace(namespace_path=namespace_path))
return self._namespace_conn().describe_namespace(
namespace_path=namespace_path,
)
@override
def list_tables(
@@ -798,6 +785,12 @@ class LanceDBConnection(DBConnection):
"""
if namespace_path is None:
namespace_path = []
if namespace_path:
return self._namespace_conn().list_tables(
namespace_path=namespace_path,
page_token=page_token,
limit=limit,
)
return LOOP.run(
self._conn.list_tables(
namespace_path=namespace_path, page_token=page_token, limit=limit
@@ -886,6 +879,22 @@ class LanceDBConnection(DBConnection):
raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name)
if namespace_path:
return self._namespace_conn().create_table(
name,
data=data,
schema=schema,
mode=mode,
exist_ok=exist_ok,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
embedding_functions=embedding_functions,
namespace_path=namespace_path,
storage_options=storage_options,
data_storage_version=data_storage_version,
enable_v2_manifest_paths=enable_v2_manifest_paths,
)
tbl = LanceTable.create(
self,
name,
@@ -901,6 +910,19 @@ class LanceDBConnection(DBConnection):
)
return tbl
def _namespace_conn(self) -> DBConnection:
"""Return a LanceNamespaceDBConnection backed by this connection's
directory namespace. Used to delegate child-namespace operations."""
from lancedb.namespace import LanceNamespaceDBConnection
return LanceNamespaceDBConnection(
self.namespace_client(),
read_consistency_interval=self.read_consistency_interval,
storage_options=self.storage_options,
namespace_client_impl=None,
namespace_client_properties=None,
)
@override
def open_table(
self,
@@ -917,7 +939,8 @@ class LanceDBConnection(DBConnection):
name: str
The name of the table.
namespace_path: List[str], optional
The namespace to open the table from.
The namespace to open the table from. When non-empty, the
table is resolved through the directory namespace client.
Returns
-------
@@ -936,6 +959,14 @@ class LanceDBConnection(DBConnection):
stacklevel=2,
)
if namespace_path:
return self._namespace_conn().open_table(
name,
namespace_path=namespace_path,
storage_options=storage_options,
index_cache_size=index_cache_size,
)
return LanceTable.open(
self,
name,
@@ -1020,6 +1051,9 @@ class LanceDBConnection(DBConnection):
"""
if namespace_path is None:
namespace_path = []
if namespace_path:
self._namespace_conn().drop_table(name, namespace_path=namespace_path)
return
LOOP.run(
self._conn.drop_table(
name, namespace_path=namespace_path, ignore_missing=ignore_missing
@@ -1071,14 +1105,17 @@ class LanceDBConnection(DBConnection):
"""Get the equivalent namespace client for this connection.
Returns a DirectoryNamespace pointing to the same root with the
same storage options.
same storage options. The result is cached for the lifetime of
this connection.
Returns
-------
LanceNamespace
The namespace client for this connection.
"""
return LOOP.run(self._conn.namespace_client())
if self._cached_namespace_client is None:
self._cached_namespace_client = LOOP.run(self._conn.namespace_client())
return self._cached_namespace_client
@deprecation.deprecated(
deprecated_in="0.15.1",

View File

@@ -19,10 +19,10 @@ from .utils import TEXT, api_key_not_found_help
@register("gemini-text")
class GeminiText(TextEmbeddingFunction):
"""
An embedding function that uses the Google's Gemini API. Requires GOOGLE_API_KEY to
An embedding function that uses Google's Gemini API. Requires GOOGLE_API_KEY to
be set.
https://ai.google.dev/docs/embeddings_guide
https://ai.google.dev/gemini-api/docs/embeddings
Supports various tasks types:
| Task Type | Description |
@@ -46,9 +46,12 @@ class GeminiText(TextEmbeddingFunction):
Parameters
----------
name: str, default "models/embedding-001"
The name of the model to use. See the Gemini documentation for a list of
available models.
name: str, default "gemini-embedding-001"
The name of the model to use. Supported models include:
- "gemini-embedding-001" (768 dimensions)
Note: The legacy "models/embedding-001" format is also supported but
"gemini-embedding-001" is recommended.
query_task_type: str, default "retrieval_query"
Sets the task type for the queries.
@@ -77,7 +80,7 @@ class GeminiText(TextEmbeddingFunction):
"""
name: str = "models/embedding-001"
name: str = "gemini-embedding-001"
query_task_type: str = "retrieval_query"
source_task_type: str = "retrieval_document"
@@ -114,23 +117,48 @@ class GeminiText(TextEmbeddingFunction):
texts: list[str] or np.ndarray (of str)
The texts to embed
"""
if (
kwargs.get("task_type") == "retrieval_document"
): # Provide a title to use existing API design
title = "Embedding of a document"
kwargs["title"] = title
from google.genai import types
return [
self.client.embed_content(model=self.name, content=text, **kwargs)[
"embedding"
]
for text in texts
]
task_type = kwargs.get("task_type")
# Build content objects for embed_content
contents = []
for text in texts:
if task_type == "retrieval_document":
# Provide a title for retrieval_document task
contents.append(
{"parts": [{"text": "Embedding of a document"}, {"text": text}]}
)
else:
contents.append({"parts": [{"text": text}]})
# Build config
config_kwargs = {}
if task_type:
config_kwargs["task_type"] = task_type.upper() # API expects uppercase
# Call embed_content for each content
embeddings = []
for content in contents:
config = (
types.EmbedContentConfig(**config_kwargs) if config_kwargs else None
)
response = self.client.models.embed_content(
model=self.name,
contents=content,
config=config,
)
embeddings.append(response.embeddings[0].values)
return embeddings
@cached_property
def client(self):
genai = attempt_import_or_raise("google.generativeai", "google.generativeai")
attempt_import_or_raise("google.genai", "google-genai")
if not os.environ.get("GOOGLE_API_KEY"):
api_key_not_found_help("google")
return genai
from google import genai as genai_module
return genai_module.Client(api_key=os.environ.get("GOOGLE_API_KEY"))

View File

@@ -381,6 +381,8 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None,
):
"""
Initialize a namespace-based LanceDB connection.
@@ -406,12 +408,43 @@ class LanceNamespaceDBConnection(DBConnection):
namespace.create_table() instead of using declare_table + local write.
Default is None (no pushdown, all operations run locally).
namespace_client_impl : Optional[str]
The namespace implementation name used to create this connection.
Stored for serialization purposes.
namespace_client_properties : Optional[Dict[str, str]]
The namespace properties used to create this connection.
Stored for serialization purposes.
"""
self._namespace_client = namespace_client
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
self._pushdown_operations = set(namespace_client_pushdown_operations or [])
self._namespace_client_pushdown_operations = set(
namespace_client_pushdown_operations or []
)
self._namespace_client_impl = namespace_client_impl
self._namespace_client_properties = namespace_client_properties
@override
def serialize(self) -> str:
import json
return json.dumps(
{
"connection_type": "namespace",
"namespace_client_impl": self._namespace_client_impl,
"namespace_client_properties": self._namespace_client_properties,
"namespace_client_pushdown_operations": sorted(
self._namespace_client_pushdown_operations
),
"storage_options": self.storage_options or None,
"read_consistency_interval_seconds": (
self.read_consistency_interval.total_seconds()
if self.read_consistency_interval
else None
),
}
)
@override
def table_names(
@@ -467,7 +500,7 @@ class LanceNamespaceDBConnection(DBConnection):
table_id = namespace_path + [name]
if "CreateTable" in self._pushdown_operations:
if "CreateTable" in self._namespace_client_pushdown_operations:
return self._create_table_server_side(
name=name,
data=data,
@@ -549,7 +582,7 @@ class LanceNamespaceDBConnection(DBConnection):
storage_options=merged_storage_options,
location=location,
namespace_client=self._namespace_client,
pushdown_operations=self._pushdown_operations,
pushdown_operations=self._namespace_client_pushdown_operations,
)
return tbl
@@ -580,10 +613,13 @@ class LanceNamespaceDBConnection(DBConnection):
fill_value=fill_value,
)
merged = dict(self.storage_options or {})
if storage_options:
merged.update(storage_options)
request = CreateTableRequest(
id=table_id,
mode=_normalize_create_table_mode(mode),
properties=self.storage_options if self.storage_options else None,
properties=merged or None,
)
try:
@@ -887,7 +923,7 @@ class LanceNamespaceDBConnection(DBConnection):
location=table_uri,
namespace_client=namespace_client,
managed_versioning=managed_versioning,
pushdown_operations=self._pushdown_operations,
pushdown_operations=self._namespace_client_pushdown_operations,
)
@override
@@ -951,7 +987,9 @@ class AsyncLanceNamespaceDBConnection:
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
self._pushdown_operations = set(namespace_client_pushdown_operations or [])
self._namespace_client_pushdown_operations = set(
namespace_client_pushdown_operations or []
)
async def table_names(
self,
@@ -1006,7 +1044,7 @@ class AsyncLanceNamespaceDBConnection:
table_id = namespace_path + [name]
if "CreateTable" in self._pushdown_operations:
if "CreateTable" in self._namespace_client_pushdown_operations:
return await self._create_table_server_side(
name=name,
data=data,
@@ -1086,7 +1124,7 @@ class AsyncLanceNamespaceDBConnection:
storage_options=merged_storage_options,
location=location,
namespace_client=self._namespace_client,
pushdown_operations=self._pushdown_operations,
pushdown_operations=self._namespace_client_pushdown_operations,
)
lance_table = await asyncio.to_thread(_create_table)
@@ -1120,10 +1158,13 @@ class AsyncLanceNamespaceDBConnection:
fill_value=fill_value,
)
merged = dict(self.storage_options or {})
if storage_options:
merged.update(storage_options)
request = CreateTableRequest(
id=table_id,
mode=_normalize_create_table_mode(mode),
properties=self.storage_options if self.storage_options else None,
properties=merged or None,
)
self._namespace_client.create_table(request, arrow_ipc_bytes)
@@ -1190,7 +1231,7 @@ class AsyncLanceNamespaceDBConnection:
location=response.location,
namespace_client=self._namespace_client,
managed_versioning=managed_versioning,
pushdown_operations=self._pushdown_operations,
pushdown_operations=self._namespace_client_pushdown_operations,
)
lance_table = await asyncio.to_thread(_open_table)
@@ -1472,6 +1513,8 @@ def connect_namespace(
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)

View File

@@ -284,9 +284,8 @@ class Permutations:
self.permutation_table = permutation_table
if permutation_table.schema.metadata is not None:
split_names = permutation_table.schema.metadata.get(
b"split_names", None
).decode("utf-8")
raw = permutation_table.schema.metadata.get(b"split_names")
split_names = raw.decode("utf-8") if raw is not None else None
if split_names is not None:
self.split_names = json.loads(split_names)
self.split_dict = {
@@ -460,9 +459,8 @@ class Permutation:
f"Cannot create a permutation on split `{split}`"
" because no split names are defined in the permutation table"
)
split_names = permutation_table.schema.metadata.get(
b"split_names", None
).decode("utf-8")
raw = permutation_table.schema.metadata.get(b"split_names")
split_names = raw.decode("utf-8") if raw is not None else None
if split_names is None:
raise ValueError(
f"Cannot create a permutation on split `{split}`"

View File

@@ -897,42 +897,22 @@ def test_bypass_vector_index_sync(tmp_db: lancedb.DBConnection):
def test_local_namespace_operations(tmp_path):
"""Test that local mode namespace operations behave as expected."""
# Create a local database connection
"""Test that local mode namespace operations work via directory namespace."""
db = lancedb.connect(tmp_path)
# Test list_namespaces returns empty list for root namespace
namespaces = db.list_namespaces().namespaces
assert namespaces == []
# Root namespace starts empty
assert db.list_namespaces().namespaces == []
# Test list_namespaces with non-empty namespace raises NotImplementedError
with pytest.raises(
NotImplementedError,
match="Namespace operations are not supported for listing database",
):
db.list_namespaces(namespace_path=["test"])
# Create and list child namespace
db.create_namespace(["child"])
assert "child" in db.list_namespaces().namespaces
# List namespaces under child
assert db.list_namespaces(namespace_path=["child"]).namespaces == []
def test_local_create_namespace_not_supported(tmp_path):
"""Test that create_namespace is not supported in local mode."""
db = lancedb.connect(tmp_path)
with pytest.raises(
NotImplementedError,
match="Namespace operations are not supported for listing database",
):
db.create_namespace(["test_namespace"])
def test_local_drop_namespace_not_supported(tmp_path):
"""Test that drop_namespace is not supported in local mode."""
db = lancedb.connect(tmp_path)
with pytest.raises(
NotImplementedError,
match="Namespace operations are not supported for listing database",
):
db.drop_namespace(["test_namespace"])
# Drop namespace
db.drop_namespace(["child"])
assert db.list_namespaces().namespaces == []
def test_clone_table_latest_version(tmp_path):

View File

@@ -681,7 +681,7 @@ class TestPushdownOperations:
{"root": self.temp_dir},
namespace_client_pushdown_operations=["QueryTable"],
)
assert "QueryTable" in db._pushdown_operations
assert "QueryTable" in db._namespace_client_pushdown_operations
def test_create_table_pushdown_stored(self):
"""Test that CreateTable pushdown is stored on sync connection."""
@@ -690,7 +690,7 @@ class TestPushdownOperations:
{"root": self.temp_dir},
namespace_client_pushdown_operations=["CreateTable"],
)
assert "CreateTable" in db._pushdown_operations
assert "CreateTable" in db._namespace_client_pushdown_operations
def test_both_pushdowns_stored(self):
"""Test that both pushdown operations can be set together."""
@@ -699,13 +699,13 @@ class TestPushdownOperations:
{"root": self.temp_dir},
namespace_client_pushdown_operations=["QueryTable", "CreateTable"],
)
assert "QueryTable" in db._pushdown_operations
assert "CreateTable" in db._pushdown_operations
assert "QueryTable" in db._namespace_client_pushdown_operations
assert "CreateTable" in db._namespace_client_pushdown_operations
def test_pushdown_defaults_to_empty(self):
"""Test that pushdown operations default to empty."""
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
assert len(db._pushdown_operations) == 0
assert len(db._namespace_client_pushdown_operations) == 0
@pytest.mark.asyncio
@@ -727,7 +727,7 @@ class TestAsyncPushdownOperations:
{"root": self.temp_dir},
namespace_client_pushdown_operations=["QueryTable"],
)
assert "QueryTable" in db._pushdown_operations
assert "QueryTable" in db._namespace_client_pushdown_operations
async def test_async_create_table_pushdown_stored(self):
"""Test that CreateTable pushdown is stored on async connection."""
@@ -736,9 +736,9 @@ class TestAsyncPushdownOperations:
{"root": self.temp_dir},
namespace_client_pushdown_operations=["CreateTable"],
)
assert "CreateTable" in db._pushdown_operations
assert "CreateTable" in db._namespace_client_pushdown_operations
async def test_async_pushdown_defaults_to_empty(self):
"""Test that pushdown operations default to empty on async connection."""
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
assert len(db._pushdown_operations) == 0
assert len(db._namespace_client_pushdown_operations) == 0

View File

@@ -522,6 +522,50 @@ def test_no_split_names(some_table: Table):
assert permutations[1].num_rows == 500
def test_permutations_metadata_without_split_names_key(mem_db: DBConnection):
"""Regression: schema metadata present but missing split_names key must not crash.
Previously, `.get(b"split_names", None).decode()` was called unconditionally,
so any permutation table whose metadata dict had other keys but no split_names
raised AttributeError: 'NoneType' has no attribute 'decode'.
"""
base = mem_db.create_table("base_nosplit", pa.table({"x": range(10)}))
# Build a permutation-like table that carries some metadata but NOT split_names.
raw = pa.table(
{
"row_id": pa.array(range(10), type=pa.uint64()),
"split_id": pa.array([0] * 10, type=pa.uint32()),
}
).replace_schema_metadata({b"other_key": b"other_value"})
perm_tbl = mem_db.create_table("perm_nosplit", raw)
permutations = Permutations(base, perm_tbl)
assert permutations.split_names == []
assert permutations.split_dict == {}
def test_from_tables_string_split_missing_names_key(mem_db: DBConnection):
"""Regression: from_tables() with a string split must raise ValueError, not
AttributeError.
Previously, `.get(b"split_names", None).decode()` crashed with AttributeError
when the metadata dict existed but had no split_names key.
"""
base = mem_db.create_table("base_strsplit", pa.table({"x": range(10)}))
raw = pa.table(
{
"row_id": pa.array(range(10), type=pa.uint64()),
"split_id": pa.array([0] * 10, type=pa.uint32()),
}
).replace_schema_metadata({b"other_key": b"other_value"})
perm_tbl = mem_db.create_table("perm_strsplit", raw)
with pytest.raises(ValueError, match="no split names are defined"):
Permutation.from_tables(base, perm_tbl, split="train")
@pytest.fixture
def some_perm_table(some_table: Table) -> Table:
return (

View File

@@ -1,2 +1,2 @@
[toolchain]
channel = "1.91.0"
channel = "1.94.0"

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.28.0-beta.1"
version = "0.28.0-beta.4"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true

View File

@@ -177,6 +177,7 @@ impl BedrockEmbeddingFunction {
))
.send()
.await
.map_err(Box::new)
})
})
.unwrap();