Compare commits

...

27 Commits

Author SHA1 Message Date
LuQQiu
d67a8743ba feat: support remote ivf rq (#2863) 2026-01-02 15:35:33 -08:00
Chenghao Lyu
46fcbbc1e3 fix(python): require explicit region for S3 buckets with dots (#2892)
When region is not specific in the s3 path, `resolve_s3_region` from
"lance-format" project (see [here][1]) will resolve the region by
calling `resolve_bucket_region`, which is a function from the
"arrow-rs-object-store" project expecting [virtual-hosted-style
URLs][1]. When there are dot (".") in the virtual-hosted-style URLs, it
breaks automatic region detection. See more details in the issue
description:
https://github.com/lancedb/lancedb/issues/1898#issuecomment-3690142427

This PR add early validation in connect() and connect_async() to raise a
clear error with instructions when the region is not specified for such
buckets.


[1]:
https://github.com/lance-format/lance/blob/v2.0.0-beta.4/rust/lance-io/src/object_store/providers/aws.rs#L197
[2]:
eedbf3d7d8/src/aws/resolve.rs (L52C5-L52C65)
[3]:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/VirtualHosting.html#virtual-hosted-style-access

Fixes #1898
2026-01-02 15:35:22 -08:00
Prashanth Rao
ff53b76ac0 docs: address styling and aesthetics issues with banner and links (#2878)
Aesthetic and styling fixes to the SDK reference docs:
- [x] Improve readability of LanceDB  in the header
- [x] Make header more compact, and consistent in gradient color with
the main website/docs
- [x] Updated favicon to match with the docs page
- [x] Enable permalink display to allow users to get anchor links to
each function/method
- [x] Point readers to the main docs at
[docs.lancedb.com](https://docs.lancedb.com)
2026-01-02 15:15:35 -08:00
fzowl
2adb10e6a8 feat: voyage-multimodal-3.5 (#2887)
voyage-multimodal-3.5 support (text, image and video embeddings)
2026-01-02 15:14:52 -08:00
Colin Patrick McCabe
ac164c352b test: convert test_table_names to test both remote and local (#2888)
Convert test_table_names to test both remote and local connections.

This PR also includes some miscellaneous improvements in
src/test_utils/connection.rs. It starts a thread to drain stdout from
the server process. It adds the
PRINT_LANCEDB_TEST_CONNECTION_SCRIPT_OUTPUT environment variable, which
optionally displays server stdout.

Fix a bash conditional in run_with_test_connection.sh.
2026-01-02 15:08:44 -08:00
Lance Release
8bcac7e372 Bump version: 0.23.1-beta.2 → 0.23.1 2026-01-02 17:39:19 +00:00
Lance Release
e496184ab2 Bump version: 0.23.1-beta.1 → 0.23.1-beta.2 2026-01-02 17:38:54 +00:00
Lance Release
d85d338a8e Bump version: 0.26.1-beta.2 → 0.26.1 2026-01-02 17:38:22 +00:00
Lance Release
f0320725b6 Bump version: 0.26.1-beta.1 → 0.26.1-beta.2 2026-01-02 17:38:20 +00:00
Will Jones
dfbd3552bf feat: update lance dependency to v1.0.1 (#2881)
Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2026-01-02 09:23:46 -08:00
Jonathan Hsieh
1cf7b4b678 docs: remove incorrect "LanceDb Cloud only" from table_names params (#2893)
The page_token and limit parameters for table_names() are supported by
both local storage and LanceDB Cloud, not just Cloud as the docstring
incorrectly stated.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 09:08:04 -08:00
Prashanth Rao
8ae4f42fbe fix: add to_lance() and to_polars() stub methods for type-checkers (#2876)
Adds `Table.to_lance()` and `Table.to_polars()` methods (non-abstract
methods, defaulting to `NotImplementedError`) so type checkers like
mypy, pyright and ty don’t flag them as unknown attributes on `Table`.
Not making these abstract methods should keep existing remote/other
`Table` implementations instantiable.

This is non-breaking change to existing functionality and is purely for
the purpose of pleasing static type-checkers like mypy, ty and pyright.

<img width="626" height="134" alt="image"
src="https://github.com/user-attachments/assets/f4619bca-a882-432b-bd23-ae8f189ff9e3"
/>
2025-12-18 12:55:07 -05:00
Lance Release
0667fa38d4 Bump version: 0.23.1-beta.0 → 0.23.1-beta.1 2025-12-17 06:59:29 +00:00
Lance Release
30108c0b1f Bump version: 0.26.1-beta.0 → 0.26.1-beta.1 2025-12-17 06:58:52 +00:00
Jack Ye
1628f7e3f3 fix: pass namespace storage options provider into native table (#2873)
Previously the native table is created with static credentials and could
not auto-refresh credentials when expired.
2025-12-16 22:58:04 -08:00
Lance Release
2fd712312f Bump version: 0.23.0 → 0.23.1-beta.0 2025-12-17 03:30:51 +00:00
Lance Release
ba94e69d5d Bump version: 0.26.0 → 0.26.1-beta.0 2025-12-17 03:30:18 +00:00
Jack Ye
9e60fda0ec fix: use post for describe_namespace and allow access to underlying client (#2871)
Issues found during integration tests:
1. describe_namespace should use POST
2. service needs to access the underlying namespace to be able to do
operations like create_empty_table directly, or get credentials in
isolated paths like a remote take
2025-12-16 19:29:27 -08:00
LanceDB Robot
3e0d451e9b chore: update lance dependency to v1.0.1-beta.1 (#2872)
bump Lance crates to v1.0.1-beta.1

Triggering tag:
https://github.com/lance-format/lance/releases/tag/v1.0.1-beta.1
2025-12-16 17:44:32 -08:00
Lance Release
94bdffe13c Bump version: 0.23.0-beta.2 → 0.23.0 2025-12-16 16:58:35 +00:00
Lance Release
b93ea3a388 Bump version: 0.23.0-beta.1 → 0.23.0-beta.2 2025-12-16 16:57:55 +00:00
Lance Release
ff20d12f20 Bump version: 0.26.0-beta.2 → 0.26.0 2025-12-16 16:57:09 +00:00
Lance Release
5f3e133470 Bump version: 0.26.0-beta.1 → 0.26.0-beta.2 2025-12-16 16:57:07 +00:00
Jack Ye
332e722a64 feat: upgrade lance-namespace python to 0.3.2 (#2868)
Includes fix https://github.com/lance-format/lance-namespace/pull/281
2025-12-16 08:56:04 -08:00
LanceDB Robot
3f63c4f8d9 chore: update lance dependency to v1.0.0 (#2867)
## Summary
- update all lance crates to v1.0.0 using the helper script (fallbacks
to the v1.0.0 tag)
- refresh Cargo.lock to pull the new release
- add script fallback to retry with the git tag when a crates.io release
is unavailable

## Testing
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all

Tag: https://github.com/lance-format/lance/releases/tag/v1.0.0

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-15 20:36:19 -08:00
BubbleCal
39a18baf59 feat: infer vector type to float32 if integers are out of uint8 range (#2856)
## Summary
- infer integer vector columns as float32 when any value exceeds uint8
range or is negative
- keep uint8 for integer vectors within range and nulls only
- add sync/async tests covering large integer vector inference

## Testing
- ./.venv/bin/pytest python/python/tests/test_table.py -k
"large_int_vectors"
2025-12-08 17:10:25 +08:00
Lance Release
0960e19559 Bump version: 0.23.0-beta.0 → 0.23.0-beta.1 2025-12-05 00:36:39 +00:00
47 changed files with 1534 additions and 346 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.23.0-beta.0"
current_version = "0.23.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -88,7 +88,7 @@ jobs:
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
- name: Setup localstack
@@ -146,7 +146,7 @@ jobs:
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
- name: Test

View File

@@ -49,8 +49,8 @@ jobs:
type-check:
name: "Type Check"
timeout-minutes: 30
runs-on: "ubuntu-22.04"
timeout-minutes: 60
runs-on: ubuntu-2404-8x-x64
defaults:
run:
shell: bash
@@ -78,7 +78,7 @@ jobs:
doctest:
name: "Doctest"
timeout-minutes: 30
timeout-minutes: 60
runs-on: ubuntu-2404-8x-x64
defaults:
run:

199
Cargo.lock generated
View File

@@ -1041,6 +1041,61 @@ dependencies = [
"tracing",
]
[[package]]
name = "axum"
version = "0.7.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f"
dependencies = [
"async-trait",
"axum-core",
"bytes",
"futures-util",
"http 1.3.1",
"http-body 1.0.1",
"http-body-util",
"hyper 1.7.0",
"hyper-util",
"itoa",
"matchit",
"memchr",
"mime",
"percent-encoding",
"pin-project-lite",
"rustversion",
"serde",
"serde_json",
"serde_path_to_error",
"serde_urlencoded",
"sync_wrapper",
"tokio",
"tower",
"tower-layer",
"tower-service",
"tracing",
]
[[package]]
name = "axum-core"
version = "0.4.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199"
dependencies = [
"async-trait",
"bytes",
"futures-util",
"http 1.3.1",
"http-body 1.0.1",
"http-body-util",
"mime",
"pin-project-lite",
"rustversion",
"sync_wrapper",
"tower-layer",
"tower-service",
"tracing",
]
[[package]]
name = "backoff"
version = "0.4.0"
@@ -3086,8 +3141,9 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ffdff7a2d68d22afc0657eddde3e946371ce7cfe730a3f78a5ed44ea5b1cb2e"
dependencies = [
"arrow-array",
"rand 0.9.2",
@@ -3930,6 +3986,7 @@ dependencies = [
"http 1.3.1",
"http-body 1.0.1",
"httparse",
"httpdate",
"itoa",
"pin-project-lite",
"pin-utils",
@@ -4205,7 +4262,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4b0f83760fb341a774ed326568e19f5a863af4a952def8c39f9ab92fd95b88e5"
dependencies = [
"equivalent",
"hashbrown 0.15.5",
"hashbrown 0.16.0",
"serde",
"serde_core",
]
@@ -4422,8 +4479,9 @@ dependencies = [
[[package]]
name = "lance"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8c439decbc304e180748e34bb6d3df729069a222e83e74e2185c38f107136e9"
dependencies = [
"arrow",
"arrow-arith",
@@ -4488,8 +4546,9 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4ee5508b225456d3d56998eaeef0d8fbce5ea93856df47b12a94d2e74153210"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4507,8 +4566,9 @@ dependencies = [
[[package]]
name = "lance-bitpacking"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1c065fb3bd4a8cc4f78428443e990d4921aa08f707b676753db740e0b402a21"
dependencies = [
"arrayref",
"paste",
@@ -4517,8 +4577,9 @@ dependencies = [
[[package]]
name = "lance-core"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e8856abad92e624b75cd57a04703f6441948a239463bdf973f2ac1924b0bcdbe"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4554,8 +4615,9 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4c8835308044cef5467d7751be87fcbefc2db01c22370726a8704bd62991693f"
dependencies = [
"arrow",
"arrow-array",
@@ -4585,8 +4647,9 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "612de1e888bb36f6bf51196a6eb9574587fdf256b1759a4c50e643e00d5f96d0"
dependencies = [
"arrow",
"arrow-array",
@@ -4603,8 +4666,9 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2b456b29b135d3c7192602e516ccade38b5483986e121895fa43cf1fdb38bf60"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4641,8 +4705,9 @@ dependencies = [
[[package]]
name = "lance-file"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ab1538d14d5bb3735b4222b3f5aff83cfa59cc6ef7cdd3dd9139e4c77193c80b"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4674,8 +4739,9 @@ dependencies = [
[[package]]
name = "lance-geo"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a5a69a2f3b55703d9c240ad7c5ffa2c755db69e9cf8aa05efe274a212910472d"
dependencies = [
"datafusion",
"geo-types",
@@ -4686,8 +4752,9 @@ dependencies = [
[[package]]
name = "lance-index"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ea84613df6fa6b9168a1f056ba4f9cb73b90a1b452814c6fd4b3529bcdbfc78"
dependencies = [
"arrow",
"arrow-arith",
@@ -4748,8 +4815,9 @@ dependencies = [
[[package]]
name = "lance-io"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6b3fc4c1d941fceef40a0edbd664dbef108acfc5d559bb9e7f588d0c733cbc35"
dependencies = [
"arrow",
"arrow-arith",
@@ -4789,8 +4857,9 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b62ffbc5ce367fbf700a69de3fe0612ee1a11191a64a632888610b6bacfa0f63"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4806,8 +4875,9 @@ dependencies = [
[[package]]
name = "lance-namespace"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "791bbcd868ee758123a34e07d320a1fb99379432b5ecc0e78d6b4686e999b629"
dependencies = [
"arrow",
"async-trait",
@@ -4819,13 +4889,15 @@ dependencies = [
[[package]]
name = "lance-namespace-impls"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ee713505576f6b1988a491f77c7ca8b0cf7090a393598e63c85079fa70a53ebf"
dependencies = [
"arrow",
"arrow-ipc",
"arrow-schema",
"async-trait",
"axum",
"bytes",
"futures",
"lance",
@@ -4837,9 +4909,12 @@ dependencies = [
"object_store",
"rand 0.9.2",
"reqwest",
"serde",
"serde_json",
"snafu",
"tokio",
"tower",
"tower-http 0.5.2",
"url",
]
@@ -4858,8 +4933,9 @@ dependencies = [
[[package]]
name = "lance-table"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6fdb2d56bfa4d1511c765fa0cc00fdaa37e5d2d1cd2f57b3c6355d9072177052"
dependencies = [
"arrow",
"arrow-array",
@@ -4898,8 +4974,9 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "1.1.0-beta.1"
source = "git+https://github.com/lance-format/lance.git?tag=v1.1.0-beta.1#ddea38f049e64df8b893e1c8ecca7878ea373d1e"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d8ccb1a4a9284435c6a8c02c8c06e7e041bece0d7f722152159353cf55dc51e3"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4910,7 +4987,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.23.0-beta.0"
version = "0.23.1"
dependencies = [
"ahash",
"anyhow",
@@ -4989,7 +5066,7 @@ dependencies = [
[[package]]
name = "lancedb-nodejs"
version = "0.23.0-beta.0"
version = "0.23.1"
dependencies = [
"arrow-array",
"arrow-ipc",
@@ -5009,7 +5086,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.26.0-beta.0"
version = "0.26.1"
dependencies = [
"arrow",
"async-trait",
@@ -5277,6 +5354,12 @@ dependencies = [
"regex-automata",
]
[[package]]
name = "matchit"
version = "0.7.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94"
[[package]]
name = "matrixmultiply"
version = "0.3.10"
@@ -6659,8 +6742,8 @@ version = "0.13.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "be769465445e8c1474e9c5dac2018218498557af32d9ed057325ec9a41ae81bf"
dependencies = [
"heck 0.4.1",
"itertools 0.12.1",
"heck 0.5.0",
"itertools 0.14.0",
"log",
"multimap",
"once_cell",
@@ -6680,7 +6763,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a56d757972c98b346a9b766e3f02746cde6dd1cd1d1d563472929fdd74bec4d"
dependencies = [
"anyhow",
"itertools 0.12.1",
"itertools 0.14.0",
"proc-macro2",
"quote",
"syn 2.0.106",
@@ -7265,7 +7348,7 @@ dependencies = [
"tokio-rustls 0.26.4",
"tokio-util",
"tower",
"tower-http",
"tower-http 0.6.6",
"tower-service",
"url",
"wasm-bindgen",
@@ -7784,6 +7867,17 @@ dependencies = [
"serde_core",
]
[[package]]
name = "serde_path_to_error"
version = "0.1.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "10a9ff822e371bb5403e391ecd83e182e0e77ba7f6fe0160b795797109d1b457"
dependencies = [
"itoa",
"serde",
"serde_core",
]
[[package]]
name = "serde_plain"
version = "1.0.2"
@@ -7999,7 +8093,7 @@ version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c1c97747dbf44bb1ca44a561ece23508e99cb592e862f22222dcf42f51d1e451"
dependencies = [
"heck 0.4.1",
"heck 0.5.0",
"proc-macro2",
"quote",
"syn 2.0.106",
@@ -8819,6 +8913,24 @@ dependencies = [
"tokio",
"tower-layer",
"tower-service",
"tracing",
]
[[package]]
name = "tower-http"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e9cd434a998747dd2c4276bc96ee2e0c7a2eadf3cae88e52be55a05fa9053f5"
dependencies = [
"bitflags 2.9.4",
"bytes",
"http 1.3.1",
"http-body 1.0.1",
"http-body-util",
"pin-project-lite",
"tower-layer",
"tower-service",
"tracing",
]
[[package]]
@@ -8857,6 +8969,7 @@ version = "0.1.41"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "784e0ac535deb450455cbfa28a6f0df145ea1bb7ae51b821cf5e7927fdcfbdd0"
dependencies = [
"log",
"pin-project-lite",
"tracing-attributes",
"tracing-core",

View File

@@ -15,20 +15,20 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=1.1.0-beta.1", default-features = false, "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=1.1.0-beta.1", default-features = false, "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=1.1.0-beta.1", default-features = false, "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=1.1.0-beta.1", "tag" = "v1.1.0-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance = { "version" = "=1.0.1", default-features = false }
lance-core = "=1.0.1"
lance-datagen = "=1.0.1"
lance-file = "=1.0.1"
lance-io = { "version" = "=1.0.1", default-features = false }
lance-index = "=1.0.1"
lance-linalg = "=1.0.1"
lance-namespace = "=1.0.1"
lance-namespace-impls = { "version" = "=1.0.1", default-features = false }
lance-table = "=1.0.1"
lance-testing = "=1.0.1"
lance-datafusion = "=1.0.1"
lance-encoding = "=1.0.1"
lance-arrow = "=1.0.1"
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "56.2", optional = false }

View File

@@ -16,7 +16,7 @@ check_command_exists() {
}
if [[ ! -e ./lancedb ]]; then
if [[ -v SOPHON_READ_TOKEN ]]; then
if [[ x${SOPHON_READ_TOKEN} != "x" ]]; then
INPUT="lancedb-linux-x64"
gh release \
--repo lancedb/lancedb \

View File

@@ -229,6 +229,29 @@ def set_local_version():
update_cargo_toml(line_updater)
def update_lockfiles(version: str, fallback_to_git: bool = False):
"""
Update Cargo metadata and optionally fall back to using the git tag if the
requested crates.io version is unavailable.
"""
try:
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
except Exception as e:
if fallback_to_git and "failed to select a version" in str(e):
print(
f" failed for crates.io v{version}, retrying with git tag...",
file=sys.stderr,
)
set_preview_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
else:
raise
parser = argparse.ArgumentParser(description="Set the version of the Lance package.")
parser.add_argument(
"version",
@@ -244,6 +267,7 @@ if args.version == "stable":
file=sys.stderr,
)
set_stable_version(latest_stable_version)
update_lockfiles(latest_stable_version)
elif args.version == "preview":
latest_preview_version = get_latest_preview_version()
print(
@@ -251,8 +275,10 @@ elif args.version == "preview":
file=sys.stderr,
)
set_preview_version(latest_preview_version)
update_lockfiles(latest_preview_version)
elif args.version == "local":
set_local_version()
update_lockfiles("local")
else:
# Parse the version number.
version = args.version
@@ -262,9 +288,7 @@ else:
if "beta" in version:
set_preview_version(version)
update_lockfiles(version)
else:
set_stable_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
update_lockfiles(version, fallback_to_git=True)

View File

@@ -11,7 +11,7 @@ watch:
theme:
name: "material"
logo: assets/logo.png
favicon: assets/logo.png
favicon: assets/favicon.ico
palette:
# Palette toggle for light mode
- scheme: lancedb
@@ -32,8 +32,6 @@ theme:
- content.tooltips
- toc.follow
- navigation.top
- navigation.tabs
- navigation.tabs.sticky
- navigation.footer
- navigation.tracking
- navigation.instant
@@ -115,12 +113,13 @@ markdown_extensions:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- markdown.extensions.toc:
baselevel: 1
permalink: ""
toc_depth: 3
permalink: true
permalink_title: Anchor link to this section
nav:
- API reference:
- Overview: index.md
- Documentation:
- SDK Reference: index.md
- Python: python/python.md
- Javascript/TypeScript: js/globals.md
- Java: java/java.md

BIN
docs/src/assets/favicon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB

View File

@@ -0,0 +1,111 @@
# VoyageAI Embeddings : Multimodal
VoyageAI embeddings can also be used to embed both text and image data, only some of the models support image data and you can check the list
under [https://docs.voyageai.com/docs/multimodal-embeddings](https://docs.voyageai.com/docs/multimodal-embeddings)
Supported multimodal models:
- `voyage-multimodal-3` - 1024 dimensions (text + images)
- `voyage-multimodal-3.5` - Flexible dimensions (256, 512, 1024 default, 2048). Supports text, images, and video.
### Video Support (voyage-multimodal-3.5)
The `voyage-multimodal-3.5` model supports video input through:
- Video URLs (`.mp4`, `.webm`, `.mov`, `.avi`, `.mkv`, `.m4v`, `.gif`)
- Video file paths
Constraints: Max 20MB video size.
Supported parameters (to be passed in `create` method) are:
| Parameter | Type | Default Value | Description |
|---|---|-------------------------|-------------------------------------------|
| `name` | `str` | `"voyage-multimodal-3"` | The model ID of the VoyageAI model to use |
| `output_dimension` | `int` | `None` | Output dimension for voyage-multimodal-3.5. Valid: 256, 512, 1024, 2048 |
Usage Example:
```python
import base64
import os
from io import BytesIO
import requests
import lancedb
from lancedb.pydantic import LanceModel, Vector
from lancedb.embeddings import get_registry
import pandas as pd
os.environ['VOYAGE_API_KEY'] = 'YOUR_VOYAGE_API_KEY'
db = lancedb.connect(".lancedb")
func = get_registry().get("voyageai").create(name="voyage-multimodal-3")
def image_to_base64(image_bytes: bytes):
buffered = BytesIO(image_bytes)
img_str = base64.b64encode(buffered.getvalue())
return img_str.decode("utf-8")
class Images(LanceModel):
label: str
image_uri: str = func.SourceField() # image uri as the source
image_bytes: str = func.SourceField() # image bytes base64 encoded as the source
vector: Vector(func.ndims()) = func.VectorField() # vector column
vec_from_bytes: Vector(func.ndims()) = func.VectorField() # Another vector column
if "images" in db.table_names():
db.drop_table("images")
table = db.create_table("images", schema=Images)
labels = ["cat", "cat", "dog", "dog", "horse", "horse"]
uris = [
"http://farm1.staticflickr.com/53/167798175_7c7845bbbd_z.jpg",
"http://farm1.staticflickr.com/134/332220238_da527d8140_z.jpg",
"http://farm9.staticflickr.com/8387/8602747737_2e5c2a45d4_z.jpg",
"http://farm5.staticflickr.com/4092/5017326486_1f46057f5f_z.jpg",
"http://farm9.staticflickr.com/8216/8434969557_d37882c42d_z.jpg",
"http://farm6.staticflickr.com/5142/5835678453_4f3a4edb45_z.jpg",
]
# get each uri as bytes
images_bytes = [image_to_base64(requests.get(uri).content) for uri in uris]
table.add(
pd.DataFrame({"label": labels, "image_uri": uris, "image_bytes": images_bytes})
)
```
Now we can search using text from both the default vector column and the custom vector column
```python
# text search
actual = table.search("man's best friend", "vec_from_bytes").limit(1).to_pydantic(Images)[0]
print(actual.label) # prints "dog"
frombytes = (
table.search("man's best friend", vector_column_name="vec_from_bytes")
.limit(1)
.to_pydantic(Images)[0]
)
print(frombytes.label)
```
Because we're using a multi-modal embedding function, we can also search using images
```python
# image search
query_image_uri = "http://farm1.staticflickr.com/200/467715466_ed4a31801f_z.jpg"
image_bytes = requests.get(query_image_uri).content
query_image = Image.open(BytesIO(image_bytes))
actual = table.search(query_image, "vec_from_bytes").limit(1).to_pydantic(Images)[0]
print(actual.label == "dog")
# image search using a custom vector column
other = (
table.search(query_image, vector_column_name="vec_from_bytes")
.limit(1)
.to_pydantic(Images)[0]
)
print(actual.label)
```

View File

@@ -1,8 +1,12 @@
# API Reference
# SDK Reference
This page contains the API reference for the SDKs supported by the LanceDB team.
This site contains the API reference for the client SDKs supported by [LanceDB](https://lancedb.com).
- [Python](python/python.md)
- [JavaScript/TypeScript](js/globals.md)
- [Java](java/java.md)
- [Rust](https://docs.rs/lancedb/latest/lancedb/index.html)
- [Rust](https://docs.rs/lancedb/latest/lancedb/index.html)
!!! info "LanceDB Documentation"
If you're looking for the full documentation of LanceDB, visit [docs.lancedb.com](https://docs.lancedb.com).

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.23.0-beta.0</version>
<version>0.23.1</version>
</dependency>
```

View File

@@ -85,17 +85,26 @@
/* Header gradient (only header area) */
.md-header {
background: linear-gradient(90deg, #3B2E58 0%, #F0B7C1 45%, #E55A2B 100%);
background: linear-gradient(90deg, #e4d8f8 0%, #F0B7C1 45%, #E55A2B 100%);
box-shadow: inset 0 1px 0 rgba(255,255,255,0.08), 0 1px 0 rgba(0,0,0,0.08);
}
/* Improve brand title contrast on the lavender side */
.md-header__title,
.md-header__topic,
.md-header__title .md-ellipsis,
.md-header__topic .md-ellipsis {
color: #2b1b3a;
text-shadow: 0 1px 0 rgba(255, 255, 255, 0.25);
}
/* Same colors as header for tabs (that hold the text) */
.md-tabs {
background: linear-gradient(90deg, #3B2E58 0%, #F0B7C1 45%, #E55A2B 100%);
background: linear-gradient(90deg, #e4d8f8 0%, #F0B7C1 45%, #E55A2B 100%);
}
/* Dark scheme variant */
[data-md-color-scheme="slate"] .md-header,
[data-md-color-scheme="slate"] .md-tabs {
background: linear-gradient(90deg, #3B2E58 0%, #F0B7C1 45%, #E55A2B 100%);
background: linear-gradient(90deg, #e4d8f8 0%, #F0B7C1 45%, #E55A2B 100%);
}

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.23.0-beta.0</version>
<version>0.23.1-final.0</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.23.0-beta.0</version>
<version>0.23.1-final.0</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.23.0-beta.0"
version = "0.23.1"
license.workspace = true
description.workspace = true
repository.workspace = true

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.23.0-beta.0",
"version": "0.23.1",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.23.0-beta.0",
"version": "0.23.1",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.26.0-beta.1"
current_version = "0.26.1"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.26.0-beta.1"
version = "0.26.1"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true

View File

@@ -10,7 +10,7 @@ dependencies = [
"pyarrow>=16",
"pydantic>=1.10",
"tqdm>=4.27.0",
"lance-namespace>=0.2.1"
"lance-namespace>=0.3.2"
]
description = "lancedb"
authors = [{ name = "LanceDB Devs", email = "dev@lancedb.com" }]

View File

@@ -13,6 +13,7 @@ __version__ = importlib.metadata.version("lancedb")
from ._lancedb import connect as lancedb_connect
from .common import URI, sanitize_uri
from urllib.parse import urlparse
from .db import AsyncConnection, DBConnection, LanceDBConnection
from .io import StorageOptionsProvider
from .remote import ClientConfig
@@ -28,6 +29,39 @@ from .namespace import (
)
def _check_s3_bucket_with_dots(
uri: str, storage_options: Optional[Dict[str, str]]
) -> None:
"""
Check if an S3 URI has a bucket name containing dots and warn if no region
is specified. S3 buckets with dots cannot use virtual-hosted-style URLs,
which breaks automatic region detection.
See: https://github.com/lancedb/lancedb/issues/1898
"""
if not isinstance(uri, str) or not uri.startswith("s3://"):
return
parsed = urlparse(uri)
bucket = parsed.netloc
if "." not in bucket:
return
# Check if region is provided in storage_options
region_keys = {"region", "aws_region"}
has_region = storage_options and any(k in storage_options for k in region_keys)
if not has_region:
raise ValueError(
f"S3 bucket name '{bucket}' contains dots, which prevents automatic "
f"region detection. Please specify the region explicitly via "
f"storage_options={{'region': '<your-region>'}} or "
f"storage_options={{'aws_region': '<your-region>'}}. "
f"See https://github.com/lancedb/lancedb/issues/1898 for details."
)
def connect(
uri: URI,
*,
@@ -121,9 +155,11 @@ def connect(
storage_options=storage_options,
**kwargs,
)
_check_s3_bucket_with_dots(str(uri), storage_options)
if kwargs:
raise ValueError(f"Unknown keyword arguments: {kwargs}")
return LanceDBConnection(
uri,
read_consistency_interval=read_consistency_interval,
@@ -211,6 +247,8 @@ async def connect_async(
if isinstance(client_config, dict):
client_config = ClientConfig(**client_config)
_check_s3_bucket_with_dots(str(uri), storage_options)
return AsyncConnection(
await lancedb_connect(
sanitize_uri(uri),

View File

@@ -210,10 +210,8 @@ class DBConnection(EnforceOverrides):
page_token: str, optional
The token to use for pagination. If not present, start from the beginning.
Typically, this token is last table name from the previous page.
Only supported by LanceDb Cloud.
limit: int, default 10
The size of the page to return.
Only supported by LanceDb Cloud.
Returns
-------

View File

@@ -2,7 +2,7 @@
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
import base64
import os
from typing import ClassVar, TYPE_CHECKING, List, Union, Any, Generator
from typing import ClassVar, TYPE_CHECKING, List, Union, Any, Generator, Optional
from pathlib import Path
from urllib.parse import urlparse
@@ -45,11 +45,29 @@ def is_valid_url(text):
return False
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mov", ".avi", ".mkv", ".m4v", ".gif"}
def is_video_url(url: str) -> bool:
"""Check if URL points to a video file based on extension."""
parsed = urlparse(url)
path = parsed.path.lower()
return any(path.endswith(ext) for ext in VIDEO_EXTENSIONS)
def is_video_path(path: Path) -> bool:
"""Check if file path is a video file based on extension."""
return path.suffix.lower() in VIDEO_EXTENSIONS
def transform_input(input_data: Union[str, bytes, Path]):
PIL = attempt_import_or_raise("PIL", "pillow")
if isinstance(input_data, str):
if is_valid_url(input_data):
content = {"type": "image_url", "image_url": input_data}
if is_video_url(input_data):
content = {"type": "video_url", "video_url": input_data}
else:
content = {"type": "image_url", "image_url": input_data}
else:
content = {"type": "text", "text": input_data}
elif isinstance(input_data, PIL.Image.Image):
@@ -70,14 +88,24 @@ def transform_input(input_data: Union[str, bytes, Path]):
"image_base64": "data:image/jpeg;base64," + img_str,
}
elif isinstance(input_data, Path):
img = PIL.Image.open(input_data)
buffered = BytesIO()
img.save(buffered, format="JPEG")
img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
content = {
"type": "image_base64",
"image_base64": "data:image/jpeg;base64," + img_str,
}
if is_video_path(input_data):
# Read video file and encode as base64
with open(input_data, "rb") as f:
video_bytes = f.read()
video_str = base64.b64encode(video_bytes).decode("utf-8")
content = {
"type": "video_base64",
"video_base64": video_str,
}
else:
img = PIL.Image.open(input_data)
buffered = BytesIO()
img.save(buffered, format="JPEG")
img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
content = {
"type": "image_base64",
"image_base64": "data:image/jpeg;base64," + img_str,
}
else:
raise ValueError("Each input should be either str, bytes, Path or Image.")
@@ -91,6 +119,8 @@ def sanitize_multimodal_input(inputs: Union[TEXT, IMAGES]) -> List[Any]:
PIL = attempt_import_or_raise("PIL", "pillow")
if isinstance(inputs, (str, bytes, Path, PIL.Image.Image)):
inputs = [inputs]
elif isinstance(inputs, list):
pass # Already a list, use as-is
elif isinstance(inputs, pa.Array):
inputs = inputs.to_pylist()
elif isinstance(inputs, pa.ChunkedArray):
@@ -143,11 +173,16 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
* voyage-3
* voyage-3-lite
* voyage-multimodal-3
* voyage-multimodal-3.5
* voyage-finance-2
* voyage-multilingual-2
* voyage-law-2
* voyage-code-2
output_dimension: int, optional
The output dimension for models that support flexible dimensions.
Currently only voyage-multimodal-3.5 supports this feature.
Valid options: 256, 512, 1024 (default), 2048.
Examples
--------
@@ -175,7 +210,10 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
"""
name: str
output_dimension: Optional[int] = None
client: ClassVar = None
_FLEXIBLE_DIM_MODELS: ClassVar[list] = ["voyage-multimodal-3.5"]
_VALID_DIMENSIONS: ClassVar[list] = [256, 512, 1024, 2048]
text_embedding_models: list = [
"voyage-3.5",
"voyage-3.5-lite",
@@ -186,7 +224,7 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
"voyage-law-2",
"voyage-code-2",
]
multimodal_embedding_models: list = ["voyage-multimodal-3"]
multimodal_embedding_models: list = ["voyage-multimodal-3", "voyage-multimodal-3.5"]
contextual_embedding_models: list = ["voyage-context-3"]
def _is_multimodal_model(self, model_name: str):
@@ -198,6 +236,17 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
return model_name in self.contextual_embedding_models or "context" in model_name
def ndims(self):
# Handle flexible dimension models
if self.name in self._FLEXIBLE_DIM_MODELS:
if self.output_dimension is not None:
if self.output_dimension not in self._VALID_DIMENSIONS:
raise ValueError(
f"Invalid output_dimension {self.output_dimension} "
f"for {self.name}. Valid options: {self._VALID_DIMENSIONS}"
)
return self.output_dimension
return 1024 # default dimension
if self.name == "voyage-3-lite":
return 512
elif self.name == "voyage-code-2":
@@ -211,12 +260,17 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
"voyage-finance-2",
"voyage-multilingual-2",
"voyage-law-2",
"voyage-multimodal-3",
]:
return 1024
else:
raise ValueError(f"Model {self.name} not supported")
def _get_multimodal_kwargs(self, **kwargs):
"""Get kwargs for multimodal embed call, including output_dimension if set."""
if self.name in self._FLEXIBLE_DIM_MODELS and self.output_dimension is not None:
kwargs["output_dimension"] = self.output_dimension
return kwargs
def compute_query_embeddings(
self, query: Union[str, "PIL.Image.Image"], *args, **kwargs
) -> List[np.ndarray]:
@@ -234,6 +288,7 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
"""
client = VoyageAIEmbeddingFunction._get_client()
if self._is_multimodal_model(self.name):
kwargs = self._get_multimodal_kwargs(**kwargs)
result = client.multimodal_embed(
inputs=[[query]], model=self.name, input_type="query", **kwargs
)
@@ -275,6 +330,7 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
)
if has_images:
# Use non-batched API for images
kwargs = self._get_multimodal_kwargs(**kwargs)
result = client.multimodal_embed(
inputs=sanitized, model=self.name, input_type="document", **kwargs
)
@@ -357,6 +413,7 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
callable: A function that takes a batch of texts and returns embeddings.
"""
if self._is_multimodal_model(self.name):
multimodal_kwargs = self._get_multimodal_kwargs(**kwargs)
def embed_batch(batch: List[str]) -> List[np.array]:
batch_inputs = sanitize_multimodal_input(batch)
@@ -364,7 +421,7 @@ class VoyageAIEmbeddingFunction(EmbeddingFunction):
inputs=batch_inputs,
model=self.name,
input_type=input_type,
**kwargs,
**multimodal_kwargs,
)
return result.embeddings

View File

@@ -18,7 +18,17 @@ from lancedb._lancedb import (
UpdateResult,
)
from lancedb.embeddings.base import EmbeddingFunctionConfig
from lancedb.index import FTS, BTree, Bitmap, HnswSq, IvfFlat, IvfPq, IvfSq, LabelList
from lancedb.index import (
FTS,
BTree,
Bitmap,
HnswSq,
IvfFlat,
IvfPq,
IvfRq,
IvfSq,
LabelList,
)
from lancedb.remote.db import LOOP
import pyarrow as pa
@@ -265,6 +275,12 @@ class RemoteTable(Table):
num_sub_vectors=num_sub_vectors,
num_bits=num_bits,
)
elif index_type == "IVF_RQ":
config = IvfRq(
distance_type=metric,
num_partitions=num_partitions,
num_bits=num_bits,
)
elif index_type == "IVF_SQ":
config = IvfSq(distance_type=metric, num_partitions=num_partitions)
elif index_type == "IVF_HNSW_PQ":
@@ -279,7 +295,8 @@ class RemoteTable(Table):
else:
raise ValueError(
f"Unknown vector index type: {index_type}. Valid options are"
" 'IVF_FLAT', 'IVF_SQ', 'IVF_PQ', 'IVF_HNSW_PQ', 'IVF_HNSW_SQ'"
" 'IVF_FLAT', 'IVF_PQ', 'IVF_RQ', 'IVF_SQ',"
" 'IVF_HNSW_PQ', 'IVF_HNSW_SQ'"
)
LOOP.run(

View File

@@ -684,6 +684,24 @@ class Table(ABC):
"""
raise NotImplementedError
def to_lance(self, **kwargs) -> lance.LanceDataset:
"""Return the table as a lance.LanceDataset.
Returns
-------
lance.LanceDataset
"""
raise NotImplementedError
def to_polars(self, **kwargs) -> "pl.DataFrame":
"""Return the table as a polars.DataFrame.
Returns
-------
polars.DataFrame
"""
raise NotImplementedError
def create_index(
self,
metric="l2",
@@ -3208,7 +3226,27 @@ def _infer_target_schema(
if pa.types.is_floating(field.type.value_type):
target_type = pa.list_(pa.float32(), dim)
elif pa.types.is_integer(field.type.value_type):
target_type = pa.list_(pa.uint8(), dim)
values = peeked.column(i)
if isinstance(values, pa.ChunkedArray):
values = values.combine_chunks()
flattened = values.flatten()
valid_count = pc.count(flattened, mode="only_valid").as_py()
if valid_count == 0:
target_type = pa.list_(pa.uint8(), dim)
else:
min_max = pc.min_max(flattened)
min_value = min_max["min"].as_py()
max_value = min_max["max"].as_py()
if (min_value is not None and min_value < 0) or (
max_value is not None and max_value > 255
):
target_type = pa.list_(pa.float32(), dim)
else:
target_type = pa.list_(pa.uint8(), dim)
else:
continue # Skip non-numeric types

View File

@@ -613,6 +613,133 @@ def test_voyageai_multimodal_embedding_text_function():
assert len(tbl.to_pandas()["vector"][0]) == voyageai.ndims()
@pytest.mark.slow
@pytest.mark.skipif(
os.environ.get("VOYAGE_API_KEY") is None, reason="VOYAGE_API_KEY not set"
)
def test_voyageai_multimodal_35_embedding_function():
"""Test voyage-multimodal-3.5 model with text input."""
voyageai = (
get_registry()
.get("voyageai")
.create(name="voyage-multimodal-3.5", max_retries=0)
)
class TextModel(LanceModel):
text: str = voyageai.SourceField()
vector: Vector(voyageai.ndims()) = voyageai.VectorField()
df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect("~/lancedb")
tbl = db.create_table("test_multimodal_35", schema=TextModel, mode="overwrite")
tbl.add(df)
assert len(tbl.to_pandas()["vector"][0]) == voyageai.ndims()
assert voyageai.ndims() == 1024
@pytest.mark.slow
@pytest.mark.skipif(
os.environ.get("VOYAGE_API_KEY") is None, reason="VOYAGE_API_KEY not set"
)
def test_voyageai_multimodal_35_flexible_dimensions():
"""Test voyage-multimodal-3.5 model with custom output dimension."""
voyageai = (
get_registry()
.get("voyageai")
.create(name="voyage-multimodal-3.5", output_dimension=512, max_retries=0)
)
class TextModel(LanceModel):
text: str = voyageai.SourceField()
vector: Vector(voyageai.ndims()) = voyageai.VectorField()
assert voyageai.ndims() == 512
df = pd.DataFrame({"text": ["hello world", "goodbye world"]})
db = lancedb.connect("~/lancedb")
tbl = db.create_table("test_multimodal_35_dim", schema=TextModel, mode="overwrite")
tbl.add(df)
assert len(tbl.to_pandas()["vector"][0]) == 512
@pytest.mark.slow
@pytest.mark.skipif(
os.environ.get("VOYAGE_API_KEY") is None, reason="VOYAGE_API_KEY not set"
)
def test_voyageai_multimodal_35_image_embedding():
"""Test voyage-multimodal-3.5 model with image input."""
voyageai = (
get_registry()
.get("voyageai")
.create(name="voyage-multimodal-3.5", max_retries=0)
)
class Images(LanceModel):
label: str
image_uri: str = voyageai.SourceField()
vector: Vector(voyageai.ndims()) = voyageai.VectorField()
db = lancedb.connect("~/lancedb")
table = db.create_table(
"test_multimodal_35_images", schema=Images, mode="overwrite"
)
labels = ["cat", "dog"]
uris = [
"http://farm1.staticflickr.com/53/167798175_7c7845bbbd_z.jpg",
"http://farm9.staticflickr.com/8387/8602747737_2e5c2a45d4_z.jpg",
]
table.add(pd.DataFrame({"label": labels, "image_uri": uris}))
assert len(table.to_pandas()["vector"][0]) == voyageai.ndims()
assert voyageai.ndims() == 1024
@pytest.mark.slow
@pytest.mark.skipif(
os.environ.get("VOYAGE_API_KEY") is None, reason="VOYAGE_API_KEY not set"
)
@pytest.mark.parametrize("dimension", [256, 512, 1024, 2048])
def test_voyageai_multimodal_35_all_dimensions(dimension):
"""Test voyage-multimodal-3.5 model with all valid output dimensions."""
voyageai = (
get_registry()
.get("voyageai")
.create(name="voyage-multimodal-3.5", output_dimension=dimension, max_retries=0)
)
assert voyageai.ndims() == dimension
class TextModel(LanceModel):
text: str = voyageai.SourceField()
vector: Vector(voyageai.ndims()) = voyageai.VectorField()
df = pd.DataFrame({"text": ["hello world"]})
db = lancedb.connect("~/lancedb")
tbl = db.create_table(
f"test_multimodal_35_dim_{dimension}", schema=TextModel, mode="overwrite"
)
tbl.add(df)
assert len(tbl.to_pandas()["vector"][0]) == dimension
@pytest.mark.slow
@pytest.mark.skipif(
os.environ.get("VOYAGE_API_KEY") is None, reason="VOYAGE_API_KEY not set"
)
def test_voyageai_multimodal_35_invalid_dimension():
"""Test voyage-multimodal-3.5 model raises error for invalid output dimension."""
with pytest.raises(ValueError, match="Invalid output_dimension"):
voyageai = (
get_registry()
.get("voyageai")
.create(name="voyage-multimodal-3.5", output_dimension=999, max_retries=0)
)
# ndims() is where the validation happens
voyageai.ndims()
@pytest.mark.slow
@pytest.mark.skipif(
importlib.util.find_spec("colpali_engine") is None,

View File

@@ -0,0 +1,68 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
"""
Tests for S3 bucket names containing dots.
Related issue: https://github.com/lancedb/lancedb/issues/1898
These tests validate the early error checking for S3 bucket names with dots.
No actual S3 connection is made - validation happens before connection.
"""
import pytest
import lancedb
# Test URIs
BUCKET_WITH_DOTS = "s3://my.bucket.name/path"
BUCKET_WITH_DOTS_AND_REGION = ("s3://my.bucket.name", {"region": "us-east-1"})
BUCKET_WITH_DOTS_AND_AWS_REGION = ("s3://my.bucket.name", {"aws_region": "us-east-1"})
BUCKET_WITHOUT_DOTS = "s3://my-bucket/path"
class TestS3BucketWithDotsSync:
"""Tests for connect()."""
def test_bucket_with_dots_requires_region(self):
with pytest.raises(ValueError, match="contains dots"):
lancedb.connect(BUCKET_WITH_DOTS)
def test_bucket_with_dots_and_region_passes(self):
uri, opts = BUCKET_WITH_DOTS_AND_REGION
db = lancedb.connect(uri, storage_options=opts)
assert db is not None
def test_bucket_with_dots_and_aws_region_passes(self):
uri, opts = BUCKET_WITH_DOTS_AND_AWS_REGION
db = lancedb.connect(uri, storage_options=opts)
assert db is not None
def test_bucket_without_dots_passes(self):
db = lancedb.connect(BUCKET_WITHOUT_DOTS)
assert db is not None
class TestS3BucketWithDotsAsync:
"""Tests for connect_async()."""
@pytest.mark.asyncio
async def test_bucket_with_dots_requires_region(self):
with pytest.raises(ValueError, match="contains dots"):
await lancedb.connect_async(BUCKET_WITH_DOTS)
@pytest.mark.asyncio
async def test_bucket_with_dots_and_region_passes(self):
uri, opts = BUCKET_WITH_DOTS_AND_REGION
db = await lancedb.connect_async(uri, storage_options=opts)
assert db is not None
@pytest.mark.asyncio
async def test_bucket_with_dots_and_aws_region_passes(self):
uri, opts = BUCKET_WITH_DOTS_AND_AWS_REGION
db = await lancedb.connect_async(uri, storage_options=opts)
assert db is not None
@pytest.mark.asyncio
async def test_bucket_without_dots_passes(self):
db = await lancedb.connect_async(BUCKET_WITHOUT_DOTS)
assert db is not None

View File

@@ -46,6 +46,39 @@ def test_basic(mem_db: DBConnection):
assert table.to_arrow() == expected_data
def test_create_table_infers_large_int_vectors(mem_db: DBConnection):
data = [{"vector": [0, 300]}]
table = mem_db.create_table(
"int_vector_overflow", data=data, mode="overwrite", exist_ok=True
)
vector_field = table.schema.field("vector")
assert vector_field.type == pa.list_(pa.float32(), 2)
vector_column = table.to_arrow().column("vector")
assert vector_column.type == pa.list_(pa.float32(), 2)
assert vector_column.to_pylist() == [[0.0, 300.0]]
@pytest.mark.asyncio
async def test_create_table_async_infers_large_int_vectors(
mem_db_async: AsyncConnection,
):
data = [{"vector": [256, 257]}]
table = await mem_db_async.create_table(
"int_vector_overflow_async", data=data, mode="overwrite", exist_ok=True
)
schema = await table.schema()
assert schema.field("vector").type == pa.list_(pa.float32(), 2)
vector_column = (await table.to_arrow()).column("vector")
assert vector_column.type == pa.list_(pa.float32(), 2)
assert vector_column.to_pylist() == [[256.0, 257.0]]
def test_input_data_type(mem_db: DBConnection, tmp_path):
schema = pa.schema(
{

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.23.0-beta.0"
version = "0.23.1"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true
@@ -110,7 +110,7 @@ oss = ["lance/oss", "lance-io/oss", "lance-namespace-impls/dir-oss"]
gcs = ["lance/gcp", "lance-io/gcp", "lance-namespace-impls/dir-gcp"]
azure = ["lance/azure", "lance-io/azure", "lance-namespace-impls/dir-azure"]
dynamodb = ["lance/dynamodb", "aws"]
remote = ["dep:reqwest", "dep:http", "lance-namespace-impls/rest"]
remote = ["dep:reqwest", "dep:http", "lance-namespace-impls/rest", "lance-namespace-impls/rest-adapter"]
fp16kernels = ["lance-linalg/fp16kernels"]
s3-test = []
bedrock = ["dep:aws-sdk-bedrockruntime"]

View File

@@ -804,6 +804,14 @@ impl Connection {
self.internal.describe_namespace(request).await
}
/// Get the equivalent namespace client in the database of this connection.
/// For LanceNamespaceDatabase, it is the underlying LanceNamespace.
/// For ListingDatabase, it is the equivalent DirectoryNamespace.
/// For RemoteDatabase, it is the equivalent RestNamespace.
pub async fn namespace_client(&self) -> Result<Arc<dyn lance_namespace::LanceNamespace>> {
self.internal.namespace_client().await
}
/// List tables with pagination support
pub async fn list_tables(&self, request: ListTablesRequest) -> Result<ListTablesResponse> {
self.internal.list_tables(request).await
@@ -1317,25 +1325,27 @@ mod tests {
#[tokio::test]
async fn test_table_names() {
let tmp_dir = tempdir().unwrap();
let tc = new_test_connection().await.unwrap();
let db = tc.connection;
let schema = Arc::new(Schema::new(vec![Field::new("x", DataType::Int32, false)]));
let mut names = Vec::with_capacity(100);
for _ in 0..100 {
let mut name = uuid::Uuid::new_v4().to_string();
let name = uuid::Uuid::new_v4().to_string();
names.push(name.clone());
name.push_str(".lance");
create_dir_all(tmp_dir.path().join(&name)).unwrap();
db.create_empty_table(name, schema.clone())
.execute()
.await
.unwrap();
}
names.sort();
let uri = tmp_dir.path().to_str().unwrap();
let db = connect(uri).execute().await.unwrap();
let tables = db.table_names().execute().await.unwrap();
let tables = db.table_names().limit(100).execute().await.unwrap();
assert_eq!(tables, names);
let tables = db
.table_names()
.start_after(&names[30])
.limit(100)
.execute()
.await
.unwrap();

View File

@@ -296,4 +296,10 @@ pub trait Database:
/// Drop all tables in the database
async fn drop_all_tables(&self, namespace: &[String]) -> Result<()>;
fn as_any(&self) -> &dyn std::any::Any;
/// Get the equivalent namespace client of this database
/// For LanceNamespaceDatabase, it is the underlying LanceNamespace.
/// For ListingDatabase, it is the equivalent DirectoryNamespace.
/// For RemoteDatabase, it is the equivalent RestNamespace.
async fn namespace_client(&self) -> Result<Arc<dyn LanceNamespace>>;
}

View File

@@ -1043,6 +1043,24 @@ impl Database for ListingDatabase {
fn as_any(&self) -> &dyn std::any::Any {
self
}
async fn namespace_client(&self) -> Result<Arc<dyn lance_namespace::LanceNamespace>> {
// Create a DirectoryNamespace pointing to the same root with the same storage options
let mut builder = lance_namespace_impls::DirectoryNamespaceBuilder::new(&self.uri);
// Add storage options
if !self.storage_options.is_empty() {
builder = builder.storage_options(self.storage_options.clone());
}
// Use the same session
builder = builder.session(self.session.clone());
let namespace = builder.build().await.map_err(|e| Error::Runtime {
message: format!("Failed to create namespace client: {}", e),
})?;
Ok(Arc::new(namespace) as Arc<dyn lance_namespace::LanceNamespace>)
}
}
#[cfg(test)]
@@ -2027,4 +2045,63 @@ mod tests {
let db_options = ListingDatabaseOptions::parse_from_map(&options).unwrap();
assert_eq!(db_options.new_table_config.enable_stable_row_ids, None);
}
#[tokio::test]
async fn test_namespace_client() {
let (_tempdir, db) = setup_database().await;
// Create some tables first
let schema = Arc::new(Schema::new(vec![
Field::new("id", DataType::Int32, false),
Field::new("name", DataType::Utf8, false),
]));
db.create_table(CreateTableRequest {
name: "table1".to_string(),
namespace: vec![],
data: CreateTableData::Empty(TableDefinition::new_from_schema(schema.clone())),
mode: CreateTableMode::Create,
write_options: Default::default(),
location: None,
namespace_client: None,
})
.await
.unwrap();
db.create_table(CreateTableRequest {
name: "table2".to_string(),
namespace: vec![],
data: CreateTableData::Empty(TableDefinition::new_from_schema(schema)),
mode: CreateTableMode::Create,
write_options: Default::default(),
location: None,
namespace_client: None,
})
.await
.unwrap();
// Get the namespace client
let namespace_client = db.namespace_client().await;
assert!(namespace_client.is_ok());
let namespace_client = namespace_client.unwrap();
// Verify the namespace client can list the tables we created
// Use empty vec for root namespace
let list_result = namespace_client
.list_tables(lance_namespace::models::ListTablesRequest {
id: Some(vec![]),
..Default::default()
})
.await;
assert!(
list_result.is_ok(),
"list_tables failed: {:?}",
list_result.err()
);
let tables = list_result.unwrap().tables;
assert_eq!(tables.len(), 2);
assert!(tables.contains(&"table1".to_string()));
assert!(tables.contains(&"table2".to_string()));
}
}

View File

@@ -7,7 +7,6 @@ use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use lance_io::object_store::{LanceNamespaceStorageOptionsProvider, StorageOptionsProvider};
use lance_namespace::{
models::{
CreateEmptyTableRequest, CreateNamespaceRequest, CreateNamespaceResponse,
@@ -19,13 +18,13 @@ use lance_namespace::{
};
use lance_namespace_impls::ConnectBuilder;
use crate::connection::ConnectRequest;
use crate::database::ReadConsistency;
use crate::error::{Error, Result};
use crate::table::NativeTable;
use super::{
listing::ListingDatabase, BaseTable, CloneTableRequest, CreateTableMode,
CreateTableRequest as DbCreateTableRequest, Database, OpenTableRequest, TableNamesRequest,
BaseTable, CloneTableRequest, CreateTableMode, CreateTableRequest as DbCreateTableRequest,
Database, OpenTableRequest, TableNamesRequest,
};
/// A database implementation that uses lance-namespace for table management
@@ -90,51 +89,6 @@ impl std::fmt::Display for LanceNamespaceDatabase {
}
}
impl LanceNamespaceDatabase {
/// Create a temporary listing database for the given location
///
/// Merges storage options with priority: connection < user < namespace
async fn create_listing_database(
&self,
location: &str,
table_id: Vec<String>,
user_storage_options: Option<&HashMap<String, String>>,
response_storage_options: Option<&HashMap<String, String>>,
) -> Result<ListingDatabase> {
// Merge storage options: connection < user < namespace
let mut merged_storage_options = self.storage_options.clone();
if let Some(opts) = user_storage_options {
merged_storage_options.extend(opts.clone());
}
if let Some(opts) = response_storage_options {
merged_storage_options.extend(opts.clone());
}
let request = ConnectRequest {
uri: location.to_string(),
#[cfg(feature = "remote")]
client_config: Default::default(),
options: merged_storage_options,
read_consistency_interval: self.read_consistency_interval,
session: self.session.clone(),
};
let mut listing_db = ListingDatabase::connect_with_options(&request).await?;
// Create storage options provider only if namespace returned storage options
// (not just user-provided options)
if response_storage_options.is_some() {
let provider = Arc::new(LanceNamespaceStorageOptionsProvider::new(
self.namespace.clone(),
table_id,
)) as Arc<dyn StorageOptionsProvider>;
listing_db.storage_options_provider = Some(provider);
}
Ok(listing_db)
}
}
#[async_trait]
impl Database for LanceNamespaceDatabase {
fn uri(&self) -> &str {
@@ -195,14 +149,6 @@ impl Database for LanceNamespaceDatabase {
}
async fn create_table(&self, request: DbCreateTableRequest) -> Result<Arc<dyn BaseTable>> {
// Extract user-provided storage options from request
let user_storage_options = request
.write_options
.lance_write_params
.as_ref()
.and_then(|lwp| lwp.store_params.as_ref())
.and_then(|sp| sp.storage_options.as_ref());
let mut table_id = request.namespace.clone();
table_id.push(request.name.clone());
let describe_request = DescribeTableRequest {
@@ -235,34 +181,20 @@ impl Database for LanceNamespaceDatabase {
}
}
CreateTableMode::ExistOk(_) => {
if let Ok(response) = describe_result {
let location = response.location.ok_or_else(|| Error::Runtime {
message: "Table location is missing from namespace response".to_string(),
})?;
if describe_result.is_ok() {
let native_table = NativeTable::open_from_namespace(
self.namespace.clone(),
&request.name,
request.namespace.clone(),
None,
None,
self.read_consistency_interval,
self.server_side_query_enabled,
self.session.clone(),
)
.await?;
let listing_db = self
.create_listing_database(
&location,
table_id.clone(),
user_storage_options,
response.storage_options.as_ref(),
)
.await?;
let namespace_client = self
.server_side_query_enabled
.then(|| self.namespace.clone());
return listing_db
.open_table(OpenTableRequest {
name: request.name.clone(),
namespace: request.namespace.clone(),
index_cache_size: None,
lance_read_params: None,
location: Some(location),
namespace_client,
})
.await;
return Ok(Arc::new(native_table));
}
}
}
@@ -294,82 +226,37 @@ impl Database for LanceNamespaceDatabase {
message: "Table location is missing from create_empty_table response".to_string(),
})?;
let listing_db = self
.create_listing_database(
&location,
table_id.clone(),
user_storage_options,
create_empty_response.storage_options.as_ref(),
)
.await?;
let native_table = NativeTable::create_from_namespace(
self.namespace.clone(),
&location,
&request.name,
request.namespace.clone(),
request.data,
None, // write_store_wrapper not used for namespace connections
request.write_options.lance_write_params,
self.read_consistency_interval,
self.server_side_query_enabled,
self.session.clone(),
)
.await?;
let namespace_client = self
.server_side_query_enabled
.then(|| self.namespace.clone());
let create_request = DbCreateTableRequest {
name: request.name,
namespace: request.namespace,
data: request.data,
mode: request.mode,
write_options: request.write_options,
location: Some(location),
namespace_client,
};
listing_db.create_table(create_request).await
Ok(Arc::new(native_table))
}
async fn open_table(&self, request: OpenTableRequest) -> Result<Arc<dyn BaseTable>> {
// Extract user-provided storage options from request
let user_storage_options = request
.lance_read_params
.as_ref()
.and_then(|lrp| lrp.store_options.as_ref())
.and_then(|so| so.storage_options.as_ref());
let native_table = NativeTable::open_from_namespace(
self.namespace.clone(),
&request.name,
request.namespace.clone(),
None, // write_store_wrapper not used for namespace connections
request.lance_read_params,
self.read_consistency_interval,
self.server_side_query_enabled,
self.session.clone(),
)
.await?;
let mut table_id = request.namespace.clone();
table_id.push(request.name.clone());
let describe_request = DescribeTableRequest {
id: Some(table_id.clone()),
version: None,
};
let response = self
.namespace
.describe_table(describe_request)
.await
.map_err(|e| Error::Runtime {
message: format!("Failed to describe table: {}", e),
})?;
let location = response.location.ok_or_else(|| Error::Runtime {
message: "Table location is missing from namespace response".to_string(),
})?;
let listing_db = self
.create_listing_database(
&location,
table_id.clone(),
user_storage_options,
response.storage_options.as_ref(),
)
.await?;
let namespace_client = self
.server_side_query_enabled
.then(|| self.namespace.clone());
let open_request = OpenTableRequest {
name: request.name.clone(),
namespace: request.namespace.clone(),
index_cache_size: request.index_cache_size,
lance_read_params: request.lance_read_params,
location: Some(location),
namespace_client,
};
listing_db.open_table(open_request).await
Ok(Arc::new(native_table))
}
async fn clone_table(&self, _request: CloneTableRequest) -> Result<Arc<dyn BaseTable>> {
@@ -425,6 +312,10 @@ impl Database for LanceNamespaceDatabase {
fn as_any(&self) -> &dyn std::any::Any {
self
}
async fn namespace_client(&self) -> Result<Arc<dyn LanceNamespace>> {
Ok(self.namespace.clone())
}
}
#[cfg(test)]

View File

@@ -232,6 +232,38 @@ impl HttpSend for Sender {
}
}
/// Parsed components from a database URL (db://...)
pub struct ParsedDbUrl {
pub db_name: String,
pub db_prefix: Option<String>,
}
/// Parse a database URL and extract the database name and optional prefix.
///
/// Expected format: `db://db_name` or `db://db_name/prefix`
pub fn parse_db_url(db_url: &str) -> Result<ParsedDbUrl> {
let parsed_url = url::Url::parse(db_url).map_err(|err| Error::InvalidInput {
message: format!("db_url is not a valid URL. '{db_url}'. Error: {err}"),
})?;
debug_assert_eq!(parsed_url.scheme(), "db");
if !parsed_url.has_host() {
return Err(Error::InvalidInput {
message: format!("Invalid database URL (missing host) '{}'", db_url),
});
}
let db_name = parsed_url.host_str().unwrap().to_string();
let db_prefix = {
let prefix = parsed_url.path().trim_start_matches('/');
if prefix.is_empty() {
None
} else {
Some(prefix.to_string())
}
};
Ok(ParsedDbUrl { db_name, db_prefix })
}
impl RestfulLanceDbClient<Sender> {
fn get_timeout(passed: Option<Duration>, env_var: &str) -> Result<Option<Duration>> {
if let Some(passed) = passed {
@@ -250,32 +282,12 @@ impl RestfulLanceDbClient<Sender> {
}
pub fn try_new(
db_url: &str,
api_key: &str,
parsed_url: &ParsedDbUrl,
region: &str,
host_override: Option<String>,
default_headers: HeaderMap,
client_config: ClientConfig,
options: &RemoteOptions,
) -> Result<Self> {
let parsed_url = url::Url::parse(db_url).map_err(|err| Error::InvalidInput {
message: format!("db_url is not a valid URL. '{db_url}'. Error: {err}"),
})?;
debug_assert_eq!(parsed_url.scheme(), "db");
if !parsed_url.has_host() {
return Err(Error::InvalidInput {
message: format!("Invalid database URL (missing host) '{}'", db_url),
});
}
let db_name = parsed_url.host_str().unwrap();
let db_prefix = {
let prefix = parsed_url.path().trim_start_matches('/');
if prefix.is_empty() {
None
} else {
Some(prefix)
}
};
// Get the timeouts
let timeout =
Self::get_timeout(client_config.timeout_config.timeout, "LANCE_CLIENT_TIMEOUT")?;
@@ -348,15 +360,7 @@ impl RestfulLanceDbClient<Sender> {
}
let client = client_builder
.default_headers(Self::default_headers(
api_key,
region,
db_name,
host_override.is_some(),
options,
db_prefix,
&client_config,
)?)
.default_headers(default_headers)
.user_agent(client_config.user_agent)
.build()
.map_err(|err| Error::Other {
@@ -366,7 +370,7 @@ impl RestfulLanceDbClient<Sender> {
let host = match host_override {
Some(host_override) => host_override,
None => format!("https://{}.{}.api.lancedb.com", db_name, region),
None => format!("https://{}.{}.api.lancedb.com", parsed_url.db_name, region),
};
debug!("Created client for host: {}", host);
let retry_config = client_config.retry_config.clone().try_into()?;
@@ -389,7 +393,7 @@ impl<S: HttpSend> RestfulLanceDbClient<S> {
&self.host
}
fn default_headers(
pub fn default_headers(
api_key: &str,
region: &str,
db_name: &str,

View File

@@ -189,6 +189,10 @@ pub struct RemoteDatabase<S: HttpSend = Sender> {
client: RestfulLanceDbClient<S>,
table_cache: Cache<String, Arc<RemoteTable<S>>>,
uri: String,
/// Headers to pass to the namespace client for authentication
namespace_headers: HashMap<String, String>,
/// TLS configuration for mTLS support
tls_config: Option<super::client::TlsConfig>,
}
impl RemoteDatabase {
@@ -200,13 +204,32 @@ impl RemoteDatabase {
client_config: ClientConfig,
options: RemoteOptions,
) -> Result<Self> {
let client = RestfulLanceDbClient::try_new(
uri,
let parsed = super::client::parse_db_url(uri)?;
let header_map = RestfulLanceDbClient::<Sender>::default_headers(
api_key,
region,
host_override,
client_config,
&parsed.db_name,
host_override.is_some(),
&options,
parsed.db_prefix.as_deref(),
&client_config,
)?;
let namespace_headers: HashMap<String, String> = header_map
.iter()
.filter_map(|(k, v)| {
v.to_str()
.ok()
.map(|val| (k.as_str().to_string(), val.to_string()))
})
.collect();
let client = RestfulLanceDbClient::try_new(
&parsed,
region,
host_override,
header_map,
client_config.clone(),
)?;
let table_cache = Cache::builder()
@@ -218,6 +241,8 @@ impl RemoteDatabase {
client,
table_cache,
uri: uri.to_owned(),
namespace_headers,
tls_config: client_config.tls_config,
})
}
}
@@ -240,6 +265,8 @@ mod test_utils {
client,
table_cache: Cache::new(0),
uri: "http://localhost".to_string(),
namespace_headers: HashMap::new(),
tls_config: None,
}
}
@@ -248,11 +275,13 @@ mod test_utils {
F: Fn(reqwest::Request) -> http::Response<T> + Send + Sync + 'static,
T: Into<reqwest::Body>,
{
let client = client_with_handler_and_config(handler, config);
let client = client_with_handler_and_config(handler, config.clone());
Self {
client,
table_cache: Cache::new(0),
uri: "http://localhost".to_string(),
namespace_headers: config.extra_headers.clone(),
tls_config: config.tls_config.clone(),
}
}
}
@@ -716,7 +745,8 @@ impl<S: HttpSend> Database for RemoteDatabase<S> {
let namespace_id = build_namespace_identifier(namespace_parts, &self.client.id_delimiter);
let req = self
.client
.get(&format!("/v1/namespace/{}/describe", namespace_id));
.post(&format!("/v1/namespace/{}/describe", namespace_id))
.json(&DescribeNamespaceRequest::default());
let (request_id, resp) = self.client.send(req).await?;
let resp = self.client.check_response(&request_id, resp).await?;
@@ -727,6 +757,31 @@ impl<S: HttpSend> Database for RemoteDatabase<S> {
fn as_any(&self) -> &dyn std::any::Any {
self
}
async fn namespace_client(&self) -> Result<Arc<dyn lance_namespace::LanceNamespace>> {
// Create a RestNamespace pointing to the same remote host with the same authentication headers
let mut builder = lance_namespace_impls::RestNamespaceBuilder::new(self.client.host())
.delimiter(&self.client.id_delimiter)
// TODO: support header provider
.headers(self.namespace_headers.clone());
// Apply mTLS configuration if present
if let Some(tls_config) = &self.tls_config {
if let Some(cert_file) = &tls_config.cert_file {
builder = builder.cert_file(cert_file);
}
if let Some(key_file) = &tls_config.key_file {
builder = builder.key_file(key_file);
}
if let Some(ssl_ca_cert) = &tls_config.ssl_ca_cert {
builder = builder.ssl_ca_cert(ssl_ca_cert);
}
builder = builder.assert_hostname(tls_config.assert_hostname);
}
let namespace = builder.build();
Ok(Arc::new(namespace) as Arc<dyn lance_namespace::LanceNamespace>)
}
}
/// RemoteOptions contains a subset of StorageOptions that are compatible with Remote LanceDB connections
@@ -1518,4 +1573,265 @@ mod tests {
panic!("Expected HTTP error");
}
}
#[tokio::test]
async fn test_namespace_client() {
let conn = Connection::new_with_handler(|_| {
http::Response::builder()
.status(200)
.body(r#"{"tables": []}"#)
.unwrap()
});
// Get the namespace client from the connection's internal database
let namespace_client = conn.namespace_client().await;
assert!(namespace_client.is_ok());
}
#[tokio::test]
async fn test_namespace_client_with_tls_config() {
use crate::remote::client::TlsConfig;
let tls_config = TlsConfig {
cert_file: Some("/path/to/cert.pem".to_string()),
key_file: Some("/path/to/key.pem".to_string()),
ssl_ca_cert: Some("/path/to/ca.pem".to_string()),
assert_hostname: true,
};
let client_config = ClientConfig {
tls_config: Some(tls_config),
..Default::default()
};
let conn = Connection::new_with_handler_and_config(
|_| {
http::Response::builder()
.status(200)
.body(r#"{"tables": []}"#)
.unwrap()
},
client_config,
);
// Get the namespace client - it should be created with the TLS config
let namespace_client = conn.namespace_client().await;
assert!(namespace_client.is_ok());
}
#[tokio::test]
async fn test_namespace_client_with_headers() {
let mut extra_headers = HashMap::new();
extra_headers.insert("X-Custom-Header".to_string(), "custom-value".to_string());
let client_config = ClientConfig {
extra_headers,
..Default::default()
};
let conn = Connection::new_with_handler_and_config(
|_| {
http::Response::builder()
.status(200)
.body(r#"{"tables": []}"#)
.unwrap()
},
client_config,
);
// Get the namespace client - it should be created with the extra headers
let namespace_client = conn.namespace_client().await;
assert!(namespace_client.is_ok());
}
/// Integration tests using RestAdapter to run RemoteDatabase against a real namespace server
mod rest_adapter_integration {
use super::*;
use lance_namespace::models::ListTablesRequest;
use lance_namespace_impls::{DirectoryNamespaceBuilder, RestAdapter, RestAdapterConfig};
use std::sync::Arc;
use tempfile::TempDir;
/// Test fixture that manages a REST server backed by DirectoryNamespace
struct RestServerFixture {
_temp_dir: TempDir,
server_handle: lance_namespace_impls::RestAdapterHandle,
server_url: String,
}
impl RestServerFixture {
async fn new() -> Self {
let temp_dir = TempDir::new().unwrap();
let temp_path = temp_dir.path().to_str().unwrap().to_string();
// Create DirectoryNamespace backend
let backend = DirectoryNamespaceBuilder::new(&temp_path)
.build()
.await
.unwrap();
let backend = Arc::new(backend);
// Start REST server with port 0 (OS assigns available port)
let config = RestAdapterConfig {
port: 0,
..Default::default()
};
let server = RestAdapter::new(backend, config);
let server_handle = server.start().await.unwrap();
// Get the actual port assigned by OS
let actual_port = server_handle.port();
let server_url = format!("http://127.0.0.1:{}", actual_port);
Self {
_temp_dir: temp_dir,
server_handle,
server_url,
}
}
}
impl Drop for RestServerFixture {
fn drop(&mut self) {
self.server_handle.shutdown();
}
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn test_remote_database_with_rest_adapter() {
use lance_namespace::models::CreateNamespaceRequest;
let fixture = RestServerFixture::new().await;
// Connect to the REST server using lancedb Connection
// Use db://dummy as URI and set actual server URL via host_override
let conn = ConnectBuilder::new("db://dummy")
.api_key("test-api-key")
.region("us-east-1")
.host_override(&fixture.server_url)
.execute()
.await
.unwrap();
// Create a child namespace first
let namespace = vec!["test_ns".to_string()];
conn.create_namespace(CreateNamespaceRequest {
id: Some(namespace.clone()),
mode: None,
properties: None,
})
.await
.expect("Failed to create namespace");
// Create a table in the child namespace
let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int32, false)]));
let data = RecordBatch::try_new(
schema.clone(),
vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
)
.unwrap();
let reader = RecordBatchIterator::new([Ok(data.clone())], schema.clone());
let table = conn
.create_table("test_table", reader)
.namespace(namespace.clone())
.execute()
.await;
assert!(table.is_ok(), "Failed to create table: {:?}", table.err());
// List tables in the child namespace
let list_response = conn
.list_tables(ListTablesRequest {
id: Some(namespace.clone()),
page_token: None,
limit: None,
})
.await
.expect("Failed to list tables");
assert_eq!(list_response.tables, vec!["test_table"]);
// Get namespace client and verify it can also list tables
let namespace_client = conn.namespace_client().await.unwrap();
let list_response = namespace_client
.list_tables(ListTablesRequest {
id: Some(namespace.clone()),
page_token: None,
limit: None,
})
.await
.unwrap();
assert_eq!(list_response.tables, vec!["test_table"]);
// Open the table from the child namespace
let opened_table = conn
.open_table("test_table")
.namespace(namespace.clone())
.execute()
.await;
assert!(
opened_table.is_ok(),
"Failed to open table: {:?}",
opened_table.err()
);
assert_eq!(opened_table.unwrap().name(), "test_table");
}
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn test_remote_database_with_multiple_tables() {
use lance_namespace::models::CreateNamespaceRequest;
let fixture = RestServerFixture::new().await;
// Connect to the REST server
// Use db://dummy as URI and set actual server URL via host_override
let conn = ConnectBuilder::new("db://dummy")
.api_key("test-api-key")
.region("us-east-1")
.host_override(&fixture.server_url)
.execute()
.await
.unwrap();
// Create a child namespace first
let namespace = vec!["multi_table_ns".to_string()];
conn.create_namespace(CreateNamespaceRequest {
id: Some(namespace.clone()),
mode: None,
properties: None,
})
.await
.expect("Failed to create namespace");
// Create multiple tables in the child namespace
let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
for i in 1..=3 {
let data =
RecordBatch::try_new(schema.clone(), vec![Arc::new(Int32Array::from(vec![i]))])
.unwrap();
let reader = RecordBatchIterator::new([Ok(data.clone())], schema.clone());
conn.create_table(format!("table{}", i), reader)
.namespace(namespace.clone())
.execute()
.await
.unwrap_or_else(|e| panic!("Failed to create table{}: {:?}", i, e));
}
// List tables in the child namespace
let list_response = conn
.list_tables(ListTablesRequest {
id: Some(namespace.clone()),
page_token: None,
limit: None,
})
.await
.unwrap();
assert_eq!(list_response.tables.len(), 3);
assert!(list_response.tables.contains(&"table1".to_string()));
assert!(list_response.tables.contains(&"table2".to_string()));
assert!(list_response.tables.contains(&"table3".to_string()));
}
}
}

View File

@@ -1088,6 +1088,17 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
body["num_partitions"] = serde_json::Value::Number(num_partitions.into());
}
}
Index::IvfRq(index) => {
body[INDEX_TYPE_KEY] = serde_json::Value::String("IVF_RQ".to_string());
body[METRIC_TYPE_KEY] =
serde_json::Value::String(index.distance_type.to_string().to_lowercase());
if let Some(num_partitions) = index.num_partitions {
body["num_partitions"] = serde_json::Value::Number(num_partitions.into());
}
if let Some(num_bits) = index.num_bits {
body["num_bits"] = serde_json::Value::Number(num_bits.into());
}
}
Index::BTree(_) => {
body[INDEX_TYPE_KEY] = serde_json::Value::String("BTREE".to_string());
}

View File

@@ -29,7 +29,7 @@ use lance::dataset::{
use lance::dataset::{MergeInsertBuilder as LanceMergeInsertBuilder, WhenNotMatchedBySource};
use lance::index::vector::utils::infer_vector_dim;
use lance::index::vector::VectorIndexParams;
use lance::io::WrappingObjectStore;
use lance::io::{ObjectStoreParams, WrappingObjectStore};
use lance_datafusion::exec::{analyze_plan as lance_analyze_plan, execute_plan};
use lance_datafusion::utils::StreamingWriteSource;
use lance_index::scalar::{BuiltinIndexType, ScalarIndexParams};
@@ -40,6 +40,7 @@ use lance_index::vector::pq::PQBuildParams;
use lance_index::vector::sq::builder::SQBuildParams;
use lance_index::DatasetIndexExt;
use lance_index::IndexType;
use lance_io::object_store::LanceNamespaceStorageOptionsProvider;
use lance_namespace::models::{
QueryTableRequest as NsQueryTableRequest, QueryTableRequestFullTextQuery,
QueryTableRequestVector, StringFtsQuery,
@@ -1611,6 +1612,105 @@ impl NativeTable {
self
}
/// Opens an existing Table using a namespace client.
///
/// This method uses `DatasetBuilder::from_namespace` to open the table, which
/// automatically fetches the table location and storage options from the namespace.
/// This eliminates the need to pre-fetch and merge storage options before opening.
///
/// # Arguments
///
/// * `namespace_client` - The namespace client to use for fetching table metadata
/// * `name` - The table name
/// * `namespace` - The namespace path (e.g., vec!["parent", "child"])
/// * `write_store_wrapper` - Optional wrapper for the object store on write path
/// * `params` - Optional read parameters
/// * `read_consistency_interval` - Optional interval for read consistency
/// * `server_side_query_enabled` - Whether to enable server-side query execution.
/// When true, the namespace_client will be stored and queries will be executed
/// on the namespace server. When false, the namespace is only used for opening
/// the table, and queries are executed locally.
/// * `session` - Optional session for object stores and caching
///
/// # Returns
///
/// * A [NativeTable] object.
#[allow(clippy::too_many_arguments)]
pub async fn open_from_namespace(
namespace_client: Arc<dyn LanceNamespace>,
name: &str,
namespace: Vec<String>,
write_store_wrapper: Option<Arc<dyn WrappingObjectStore>>,
params: Option<ReadParams>,
read_consistency_interval: Option<std::time::Duration>,
server_side_query_enabled: bool,
session: Option<Arc<lance::session::Session>>,
) -> Result<Self> {
let mut params = params.unwrap_or_default();
// Set the session in read params
if let Some(sess) = session {
params.session(sess);
}
// patch the params if we have a write store wrapper
let params = match write_store_wrapper.clone() {
Some(wrapper) => params.patch_with_store_wrapper(wrapper)?,
None => params,
};
// Build table_id from namespace + name
let mut table_id = namespace.clone();
table_id.push(name.to_string());
// Use DatasetBuilder::from_namespace which automatically fetches location
// and storage options from the namespace
let builder = DatasetBuilder::from_namespace(
namespace_client.clone(),
table_id,
false, // Don't ignore namespace storage options
)
.await
.map_err(|e| match e {
lance::Error::Namespace { source, .. } => Error::Runtime {
message: format!("Failed to get table info from namespace: {:?}", source),
},
source => Error::Lance { source },
})?;
let dataset = builder
.with_read_params(params)
.load()
.await
.map_err(|e| match e {
lance::Error::DatasetNotFound { .. } => Error::TableNotFound {
name: name.to_string(),
source: Box::new(e),
},
source => Error::Lance { source },
})?;
let uri = dataset.uri().to_string();
let dataset = DatasetConsistencyWrapper::new_latest(dataset, read_consistency_interval);
let id = Self::build_id(&namespace, name);
let stored_namespace_client = if server_side_query_enabled {
Some(namespace_client)
} else {
None
};
Ok(Self {
name: name.to_string(),
namespace,
id,
uri,
dataset,
read_consistency_interval,
namespace_client: stored_namespace_client,
})
}
fn get_table_name(uri: &str) -> Result<String> {
let path = Path::new(uri);
let name = path
@@ -1722,6 +1822,102 @@ impl NativeTable {
.await
}
/// Creates a new Table using a namespace client for storage options.
///
/// This method sets up a `StorageOptionsProvider` from the namespace client,
/// enabling automatic credential refresh for cloud storage. The namespace
/// is used for:
/// 1. Setting up storage options provider for credential vending
/// 2. Optionally enabling server-side query execution
///
/// # Arguments
///
/// * `namespace_client` - The namespace client to use for storage options
/// * `uri` - The URI to the table (obtained from create_empty_table response)
/// * `name` - The table name
/// * `namespace` - The namespace path (e.g., vec!["parent", "child"])
/// * `batches` - RecordBatch to be saved in the database
/// * `write_store_wrapper` - Optional wrapper for the object store on write path
/// * `params` - Optional write parameters
/// * `read_consistency_interval` - Optional interval for read consistency
/// * `server_side_query_enabled` - Whether to enable server-side query execution
///
/// # Returns
///
/// * A [NativeTable] object.
#[allow(clippy::too_many_arguments)]
pub async fn create_from_namespace(
namespace_client: Arc<dyn LanceNamespace>,
uri: &str,
name: &str,
namespace: Vec<String>,
batches: impl StreamingWriteSource,
write_store_wrapper: Option<Arc<dyn WrappingObjectStore>>,
params: Option<WriteParams>,
read_consistency_interval: Option<std::time::Duration>,
server_side_query_enabled: bool,
session: Option<Arc<lance::session::Session>>,
) -> Result<Self> {
// Build table_id from namespace + name for the storage options provider
let mut table_id = namespace.clone();
table_id.push(name.to_string());
// Set up storage options provider from namespace
let storage_options_provider = Arc::new(LanceNamespaceStorageOptionsProvider::new(
namespace_client.clone(),
table_id,
));
// Start with provided params or defaults
let mut params = params.unwrap_or_default();
// Set the session in write params
if let Some(sess) = session {
params.session = Some(sess);
}
// Ensure store_params exists and set the storage options provider
let store_params = params
.store_params
.get_or_insert_with(ObjectStoreParams::default);
store_params.storage_options_provider = Some(storage_options_provider);
// Patch the params if we have a write store wrapper
let params = match write_store_wrapper.clone() {
Some(wrapper) => params.patch_with_store_wrapper(wrapper)?,
None => params,
};
let insert_builder = InsertBuilder::new(uri).with_params(&params);
let dataset = insert_builder
.execute_stream(batches)
.await
.map_err(|e| match e {
lance::Error::DatasetAlreadyExists { .. } => Error::TableAlreadyExists {
name: name.to_string(),
},
source => Error::Lance { source },
})?;
let id = Self::build_id(&namespace, name);
let stored_namespace_client = if server_side_query_enabled {
Some(namespace_client)
} else {
None
};
Ok(Self {
name: name.to_string(),
namespace,
id,
uri: uri.to_string(),
dataset: DatasetConsistencyWrapper::new_latest(dataset, read_consistency_interval),
read_consistency_interval,
namespace_client: stored_namespace_client,
})
}
async fn optimize_indices(&self, options: &OptimizeOptions) -> Result<()> {
info!("LanceDB: optimizing indices: {:?}", options);
self.dataset

View File

@@ -5,16 +5,19 @@
use regex::Regex;
use std::env;
use std::io::{BufRead, BufReader};
use std::process::{Child, ChildStdout, Command, Stdio};
use std::process::Stdio;
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::{Child, ChildStdout, Command};
use tokio::sync::mpsc;
use crate::{connect, Connection};
use anyhow::{bail, Result};
use anyhow::{anyhow, bail, Result};
use tempfile::{tempdir, TempDir};
pub struct TestConnection {
pub uri: String,
pub connection: Connection,
pub is_remote: bool,
_temp_dir: Option<TempDir>,
_process: Option<TestProcess>,
}
@@ -37,6 +40,56 @@ pub async fn new_test_connection() -> Result<TestConnection> {
}
}
async fn spawn_stdout_reader(
mut stdout: BufReader<ChildStdout>,
port_sender: mpsc::Sender<anyhow::Result<String>>,
) -> tokio::task::JoinHandle<()> {
let print_stdout = env::var("PRINT_LANCEDB_TEST_CONNECTION_SCRIPT_OUTPUT").is_ok();
tokio::spawn(async move {
let mut line = String::new();
let re = Regex::new(r"Query node now listening on 0.0.0.0:(.*)").unwrap();
loop {
line.clear();
let result = stdout.read_line(&mut line).await;
if let Err(err) = result {
port_sender
.send(Err(anyhow!(
"error while reading from process output: {}",
err
)))
.await
.unwrap();
return;
} else if result.unwrap() == 0 {
port_sender
.send(Err(anyhow!(
" hit EOF before reading port from process output."
)))
.await
.unwrap();
return;
}
if re.is_match(&line) {
let caps = re.captures(&line).unwrap();
port_sender.send(Ok(caps[1].to_string())).await.unwrap();
break;
}
}
loop {
line.clear();
match stdout.read_line(&mut line).await {
Err(_) => return,
Ok(0) => return,
Ok(_size) => {
if print_stdout {
print!("{}", line);
}
}
}
}
})
}
async fn new_remote_connection(script_path: &str) -> Result<TestConnection> {
let temp_dir = tempdir()?;
let data_path = temp_dir.path().to_str().unwrap().to_string();
@@ -57,38 +110,25 @@ async fn new_remote_connection(script_path: &str) -> Result<TestConnection> {
child: child_result.unwrap(),
};
let stdout = BufReader::new(process.child.stdout.take().unwrap());
let port = read_process_port(stdout)?;
let (port_sender, mut port_receiver) = mpsc::channel(5);
let _reader = spawn_stdout_reader(stdout, port_sender).await;
let port = match port_receiver.recv().await {
None => bail!("Unable to determine the port number used by the phalanx process we spawned, because the reader thread was closed too soon."),
Some(Err(err)) => bail!("Unable to determine the port number used by the phalanx process we spawned, because of an error, {}", err),
Some(Ok(port)) => port,
};
let uri = "db://test";
let host_override = format!("http://localhost:{}", port);
let connection = create_new_connection(uri, &host_override).await?;
Ok(TestConnection {
uri: uri.to_string(),
connection,
is_remote: true,
_temp_dir: Some(temp_dir),
_process: Some(process),
})
}
fn read_process_port(mut stdout: BufReader<ChildStdout>) -> Result<String> {
let mut line = String::new();
let re = Regex::new(r"Query node now listening on 0.0.0.0:(.*)").unwrap();
loop {
let result = stdout.read_line(&mut line);
if let Err(err) = result {
bail!(format!(
"read_process_port: error while reading from process output: {}",
err
));
} else if result.unwrap() == 0 {
bail!("read_process_port: hit EOF before reading port from process output.");
}
if re.is_match(&line) {
let caps = re.captures(&line).unwrap();
return Ok(caps[1].to_string());
}
}
}
#[cfg(feature = "remote")]
async fn create_new_connection(uri: &str, host_override: &str) -> crate::error::Result<Connection> {
connect(uri)
@@ -114,6 +154,7 @@ async fn new_local_connection() -> Result<TestConnection> {
Ok(TestConnection {
uri: uri.to_string(),
connection,
is_remote: false,
_temp_dir: Some(temp_dir),
_process: None,
})