Compare commits


56 Commits

Author SHA1 Message Date
Lance Release
ba94e69d5d Bump version: 0.26.0 → 0.26.1-beta.0 2025-12-17 03:30:18 +00:00
Jack Ye
9e60fda0ec fix: use post for describe_namespace and allow access to underlying client (#2871)
Issues found during integration tests:
1. describe_namespace should use POST
2. service needs to access the underlying namespace to be able to do
operations like create_empty_table directly, or get credentials in
isolated paths like a remote take
2025-12-16 19:29:27 -08:00
LanceDB Robot
3e0d451e9b chore: update lance dependency to v1.0.1-beta.1 (#2872)
bump Lance crates to v1.0.1-beta.1

Triggering tag:
https://github.com/lance-format/lance/releases/tag/v1.0.1-beta.1
2025-12-16 17:44:32 -08:00
Lance Release
94bdffe13c Bump version: 0.23.0-beta.2 → 0.23.0 2025-12-16 16:58:35 +00:00
Lance Release
b93ea3a388 Bump version: 0.23.0-beta.1 → 0.23.0-beta.2 2025-12-16 16:57:55 +00:00
Lance Release
ff20d12f20 Bump version: 0.26.0-beta.2 → 0.26.0 2025-12-16 16:57:09 +00:00
Lance Release
5f3e133470 Bump version: 0.26.0-beta.1 → 0.26.0-beta.2 2025-12-16 16:57:07 +00:00
Jack Ye
332e722a64 feat: upgrade lance-namespace python to 0.3.2 (#2868)
Includes fix https://github.com/lance-format/lance-namespace/pull/281
2025-12-16 08:56:04 -08:00
LanceDB Robot
3f63c4f8d9 chore: update lance dependency to v1.0.0 (#2867)
## Summary
- update all lance crates to v1.0.0 using the helper script (fallbacks
to the v1.0.0 tag)
- refresh Cargo.lock to pull the new release
- add script fallback to retry with the git tag when a crates.io release
is unavailable
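The fallback in the last bullet comes down to a simple decision. The sketch below assumes the caller supplies an availability check and represents the Cargo dependency entry as a dict. `lance_dependency_spec`, the `=` version pin, and the `.git` URL form are illustrative assumptions, not what `ci/set_lance_version.py` actually writes.

```python
from typing import Callable, Dict

# Assumed URL form for the lance repository (hypothetical for this sketch).
LANCE_GIT_URL = "https://github.com/lance-format/lance.git"


def lance_dependency_spec(
    version: str, on_crates_io: Callable[[str], bool]
) -> Dict[str, str]:
    """Prefer the crates.io release; otherwise pin to the matching git tag."""
    if on_crates_io(version):
        # Release is published: depend on the exact crates.io version.
        return {"version": f"={version}"}
    # Release not (yet) on crates.io: fall back to the vX.Y.Z git tag.
    return {"git": LANCE_GIT_URL, "tag": f"v{version}"}
```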

## Testing
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all

Tag: https://github.com/lance-format/lance/releases/tag/v1.0.0

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-15 20:36:19 -08:00
BubbleCal
39a18baf59 feat: infer vector type to float32 if integers are out of uint8 range (#2856)
## Summary
- infer integer vector columns as float32 when any value exceeds uint8
range or is negative
- keep uint8 for integer vectors within range and nulls only
- add sync/async tests covering large integer vector inference
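The inference rule in the summary can be sketched in plain Python. This is a minimal illustration, not LanceDB's implementation. `infer_int_vector_type` is a hypothetical helper that assumes vectors arrive as nested lists of Python ints, with `None` for nulls, and it returns a type name as a string rather than an Arrow type.

```python
from typing import Iterable, Optional, Sequence

UINT8_MIN, UINT8_MAX = 0, 255


def infer_int_vector_type(
    vectors: Iterable[Optional[Sequence[Optional[int]]]],
) -> str:
    """Hypothetical helper: pick "uint8" or "float32" for integer vectors."""
    for vec in vectors:
        if vec is None:
            continue  # a null vector says nothing about the value range
        for value in vec:
            if value is None:
                continue  # null elements are ignored as well
            if value < UINT8_MIN or value > UINT8_MAX:
                # One negative or too-large value forces the wider type.
                return "float32"
    # Every non-null value fits in uint8, or there were only nulls.
    return "uint8"
```

The scan stops at the first out-of-range value.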

## Testing
- ./.venv/bin/pytest python/python/tests/test_table.py -k
"large_int_vectors"
2025-12-08 17:10:25 +08:00
Lance Release
0960e19559 Bump version: 0.23.0-beta.0 → 0.23.0-beta.1 2025-12-05 00:36:39 +00:00
Lance Release
e5321ba311 Bump version: 0.26.0-beta.0 → 0.26.0-beta.1 2025-12-05 00:35:17 +00:00
Jack Ye
f523191d21 feat: make java client builder generic (#2851)
In #2845 we ported the lancedb integration in lance-namespace to
lancedb. But that is too specific to RestNamespace. We can improve the
user entry point so that we can put local mode and future version of the
Flight SQL-based LanceDB server all behind this single
`LanceDbNamespaceClientBuilder` API.

Also I renamed `namespace` to `namespaceClient` to avoid confusion with
the namespace path.
2025-12-04 16:34:32 -08:00
Jack Ye
4c3790cde4 chore: remove java-jni from cargo workspace (#2849)
Fixes
https://github.com/lancedb/lancedb/actions/runs/19945349063/job/57193307680
2025-12-04 16:31:37 -08:00
Jack Ye
ff75f2467b feat: use rest namespace for lancedb java sdk (#2845)
After the refactoring on both client and server side, we should have the
ability to fully use lance REST namespace to call into LanceDB cloud and
enterprise. We can avoid having a JNI implementation (which today does
not really do anything except for vending a connection object), and just
use lance-core's RestNamespace.

For now, this adds a `LanceDbRestNamespaceBuilder` so that users can
more easily build the RestNamespace that talks to a LanceDB Cloud or
Enterprise endpoint.

In the future, we could extend this further to also support the local
mode through DirectoryNamespace. That will be a separate PR.
2025-12-04 13:53:47 -08:00
Lance Release
6f79770248 Bump version: 0.22.4-beta.3 → 0.23.0-beta.0 2025-12-04 19:33:37 +00:00
Lance Release
a497db66f9 Bump version: 0.25.4-beta.3 → 0.26.0-beta.0 2025-12-04 19:32:04 +00:00
LanceDB Robot
f5076269fe chore: update lance dependency to v1.1.0-beta.1 (#2844)
## Summary
- bump Lance workspace dependencies to v1.1.0-beta.1 via
ci/set_lance_version.py
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all` after the bump

Link: refs/tags/v1.1.0-beta.1
2025-12-04 00:44:50 -08:00
BubbleCal
a61461331c feat: add IVF SQ index support and HNSW aliases (#2832)
Adds IVF_SQ index config through Rust core and Python bindings, plus
alias names IvfHnswSq/Pq for backward compatibility. Updates
remote/table helpers and types to accept the new index type. Includes
tests covering IVF SQ creation and alias usage.
2025-12-04 00:25:44 +08:00
Jack Ye
b0170ea86a fix: table_names error at root namespace (#2842)
Root namespace should be passed in as an empty vector, not None.
2025-12-02 23:53:29 -08:00
Jack Ye
d1efc6ad8a refactor!: use namespace models directly for namespace operations (#2806)
1. Use generated models in lance-namespace for request response models
to avoid multiple layers of conversions
2. Make sure the API is consistent with the namespace spec
3. Deprecate the table_names API in favor of the list_tables API in
namespace that allows full pagination support without the need to have
sorted table names
4. Add the describe_namespace API, which was missing from the original
implementation
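The token-based pagination in point 3 can be sketched as a loop that keeps requesting pages until the server stops returning a token. `ListTablesPage`, `tables`, `page_token`, and `fetch_page` are hypothetical names chosen for illustration. The generated lance-namespace models may use different field names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ListTablesPage:
    """Hypothetical page shape: some table names plus an opaque continuation token."""

    tables: List[str] = field(default_factory=list)
    page_token: Optional[str] = None


def list_all_tables(
    fetch_page: Callable[[Optional[str]], ListTablesPage],
) -> List[str]:
    """Collect every table name by following page tokens until none is returned."""
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        names.extend(page.tables)
        if not page.page_token:
            return names
        # The token is opaque to the client, so pagination does not depend
        # on the server returning table names in sorted order.
        token = page.page_token
```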
2025-12-02 22:41:04 -08:00
Jack Ye
9d638cb3c7 feat: support namespace server side query (#2811)
Currently a table in a namespace is still backed by a `NativeTable`,
which means that after getting the table location and any storage
option overrides from `namespace.describe_table`, everything works like
a normal local table. However, namespace also supports `query_table`,
which is exactly the same API as a remote table. This PR adds a
`server_side_query` capability; when it is enabled, queries are run by
calling `namespace.query_table`. For a namespace that implements this
operation (e.g. the REST namespace), the query can hit a backend server
that may execute it faster (e.g. using a distributed engine).
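The dispatch described above can be sketched as a flag check. `NamespaceBackedTable`, `local_query`, and `namespace_query` are hypothetical names. The two callables stand in for the `NativeTable` path and for `namespace.query_table`, whose real signatures are not shown here.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
QueryFn = Callable[[Dict[str, Any]], List[Row]]


class NamespaceBackedTable:
    """Hypothetical stand-in for a namespace table with an opt-in server-side path."""

    def __init__(
        self,
        local_query: QueryFn,
        namespace_query: QueryFn,
        server_side_query: bool = False,
    ) -> None:
        self._local_query = local_query  # e.g. a NativeTable at the described location
        self._namespace_query = namespace_query  # e.g. namespace.query_table
        self.server_side_query = server_side_query

    def query(self, request: Dict[str, Any]) -> List[Row]:
        if self.server_side_query:
            # Delegate the whole query to the namespace backend.
            return self._namespace_query(request)
        # Default: execute locally against the underlying table.
        return self._local_query(request)
```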
2025-12-02 21:04:12 -08:00
Jack Ye
3cd73f9f5a refactor!: deprecate mac x86 support (#2836)
We have very low download stats for mac x86, and also latest github
runners for mac are all arm, so it makes sense at this point to
deprecate x86 support in general.
2025-12-02 14:12:51 -08:00
Lance Release
b2d06a3a73 Bump version: 0.22.4-beta.2 → 0.22.4-beta.3 2025-12-02 22:01:59 +00:00
Lance Release
9d129c7e86 Bump version: 0.25.4-beta.2 → 0.25.4-beta.3 2025-12-02 22:00:35 +00:00
Jonathan Hsieh
44878dd9a5 feat: support stable row IDs via storage_options (#2831)
Add support for enabling stable row IDs when creating tables via the
`new_table_enable_stable_row_ids` storage option.

Stable row IDs ensure that row identifiers remain constant after
compaction, update, delete, and merge operations. This is useful for
materialized views and other use cases that need to track source rows
across these operations.

The option can be set at two levels:
- Connection level: applies to all tables created with that connection
- Table level: per-table override via create_table storage_options
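The two-level precedence can be modeled as a dictionary merge in which table-level options win. This sketch assumes the option value is the string "true" or "false", and that a per-table value fully overrides the connection value. The helper functions are hypothetical. Only the option name comes from this commit.

```python
from typing import Dict, Optional

# Option name taken from the commit message above.
STABLE_ROW_IDS_KEY = "new_table_enable_stable_row_ids"


def effective_storage_options(
    connection_opts: Dict[str, str],
    table_opts: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Merge options so that table-level values override connection-level ones."""
    merged = dict(connection_opts)  # copy: never mutate the connection's dict
    merged.update(table_opts or {})
    return merged


def stable_row_ids_enabled(opts: Dict[str, str]) -> bool:
    # Assumes a string boolean such as "true"/"false"; absent means disabled.
    return opts.get(STABLE_ROW_IDS_KEY, "false").strip().lower() == "true"
```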

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 13:57:00 -08:00
LanceDB Robot
4b5bb2d76c chore: update lance dependency to v1.0.0-beta.16 (#2835)
## Summary
- bump all Lance crates to v1.0.0-beta.16 via ci/set_lance_version.py
- refresh Cargo.lock (reqwest/opendal/etc.) to satisfy the new release

## Verification
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all

Triggered by
[refs/tags/v1.0.0-beta.16](https://github.com/lance-format/lance/releases/tag/v1.0.0-beta.16)

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-01 23:07:03 -08:00
LanceDB Robot
434f4124fc chore: update lance dependency to v1.0.0-beta.14 (#2826)
## Summary
- bump all Lance crates to 1.0.0-beta.14 via ci/set_lance_version.py
- refresh Cargo.lock to capture new transitive requirements
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all`

Triggered by refs/tags/v1.0.0-beta.14

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-01 14:43:03 -08:00
Rudi Floren
03a1a99270 feat: remove remote default features on lance-namespace-impls (#2828)
This tries to fix #2771. It is not a complete fix because
`lance-namespace-impls` uses `lance` which has its default features
enabled. Thus, to close #2771, the lance repo also needs an update.

The `dir-*` features are enabled by the respective remote feature
(`aws`, `gcp`, `azure`, `oss`).
The `rest` feature is enabled via `remote`.
2025-12-01 10:53:22 -08:00
Xuanwo
0110e3b6f8 chore: clippy::string_to_string has been replaced by implicit_clone (#2817)
clippy::string_to_string has been replaced by implicit_clone, so lancedb
will raise a build error in Rust 1.91. This PR suppresses it.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

Signed-off-by: Xuanwo <github@xuanwo.io>
2025-11-26 16:30:35 +08:00
Xuanwo
f1f85b0a84 ci: migrate macos ci runners (#2818)
This PR migrates the macOS CI runners.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

Signed-off-by: Xuanwo <github@xuanwo.io>
2025-11-26 01:22:35 +08:00
LanceDB Robot
d6daa08b54 chore: update lance dependency to v1.0.0-beta.8 (#2813)
## Summary
- bump all Lance crates to v1.0.0-beta.8 via ci/set_lance_version.py
- verified 
- ran 

Trigger: refs/tags/v1.0.0-beta.8
2025-11-24 14:58:42 -08:00
Wyatt Alt
17b71de22e feat: update codex url key (#2812)
This previously threw an unknown-key error for `htmlUrl` and indicated that
"url" is a valid field.
2025-11-24 13:13:18 -08:00
Prashanth Rao
a250d8e7df docs: improve docstring for RabitQ in Python (#2808)
This PR improves the docstring for `IVF_RQ` (RabitQ) in Python. The
earlier version referred to it as "residual quantization", which is
confusing to future readers of the code.

In contrast, the TypeScript and Rust codebases defined `IVF_RQ` as
RabitQ. So now the three languages use comments that are consistent with
one another.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-24 13:35:19 +08:00
LanceDB Robot
5a2b33581e chore: update lance dependency to v1.0.0-beta.7 (#2807)
## Summary
- bump Lance dependencies to v1.0.0-beta.7 via `ci/set_lance_version.py`
- verified workspace with `cargo clippy --workspace --tests
--all-features -- -D warnings`
- formatted with `cargo fmt --all`

Trigger: refs/tags/v1.0.0-beta.7
2025-11-21 21:42:09 -08:00
Jack Ye
3d254f61b0 ci: trigger downstream verification after version bump (#2809) 2025-11-21 09:50:23 -08:00
Jack Ye
d15e380be1 ci: add support for lance-format fury index for downloading pylance (#2804)
Set the lance-format fury repo for most places that are downloading. For
uploading, it is kept unchanged since lancedb is published to lancedb
fury.
2025-11-20 23:29:36 -08:00
Jack Ye
0baf807be0 ci: use larger runner for doctest and fix failing tests (#2801)
Previously, the tests would fail once dependency installation reached
around pytorch.
2025-11-20 19:44:31 -08:00
Prashanth Rao
76bcc78910 docs: nodejs failing CI is fixed (#2802)
Fixes the breaking CI for nodejs, related to the documentation of the
new Permutation API in typescript.

- Expanded the generated typings in `nodejs/lancedb/native.d.ts` to
include `SplitCalculatedOptions`, `splitNames` fields, and the
persist/options-based `splitCalculated` methods so the permutation
exports match the native API.
- The previous comment block had an inconsistency.
`splitCalculated` takes an options object (`SplitCalculatedOptions`) in
our bindings, not a bare string. The previous example showed
`builder.splitCalculated("user_id % 3");`, which doesn’t match the
actual signature and would fail TS typecheck. I updated the comment to
`builder.splitCalculated({ calculation: "user_id % 3" });` so the
example is now correct.
- Updated the `splitCalculated` example in
`nodejs/lancedb/permutation.ts` to use the options object.
- Ran `npm run docs` to ensure docs build correctly.

> [!NOTE]
> **Disclaimer**: I used GPT-5.1-Codex-Max to make these updates, but I
have read the code and run `npm run docs` to verify that they work and
are correct to the best of my knowledge.
2025-11-20 16:16:38 -08:00
Prashanth Rao
135dfdc7ec docs: 404 and outdated URLs should now work (#2800)
Scanned all URLs that used to point to the old mkdocs pages and updated
them to link to the appropriate pages on lancedb.com/docs or the lance.org
docs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20 11:14:20 -08:00
Will Jones
6f39108857 docs: add some missing classes (#2450)

## Summary by CodeRabbit

- **Documentation**
- Expanded Python API reference with new entries for table metadata,
tagging, remote client configuration, and index statistics.
- Added documentation for new classes and modules in both synchronous
and asynchronous sections, including `FragmentStatistics`,
`FragmentSummaryStats`, `Tags`, `AsyncTags`, `IndexStatistics`, and
remote configuration options.

2025-11-20 11:04:16 -08:00
Jackson Hew
bb6b0bea0c fix: .phrase_query() not working (#2781)
The quoted string was assigned only to the local copy `query`, so
`self._query` was never updated with the surrounding quotation marks.

The test for phrase queries has been updated to test the
`.phrase_query()` method as well, which will catch this bug.
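The bug pattern is easy to reproduce in isolation. `FtsQueryBuilder` below is a hypothetical, simplified stand-in for the real query class. It shows why wrapping only the local copy `query` left `self._query` unchanged, and it shows the one-line fix.

```python
class FtsQueryBuilder:
    """Hypothetical, simplified stand-in for a full-text query builder."""

    def __init__(self, query: str) -> None:
        self._query = query

    @staticmethod
    def _quoted(query: str) -> str:
        # Wrap in double quotes unless the query is already quoted.
        if query.startswith('"') and query.endswith('"'):
            return query
        return f'"{query}"'

    def phrase_query_buggy(self) -> "FtsQueryBuilder":
        query = self._query
        query = self._quoted(query)  # only the local copy is rebound...
        return self  # ...so self._query is left unchanged

    def phrase_query(self) -> "FtsQueryBuilder":
        query = self._quoted(self._query)
        self._query = query  # the fix: write the quoted string back
        return self
```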

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-11-20 10:32:37 -08:00
Jack Ye
0084eb238b fix: use None default for namespace (#2797)
Using `[]` as a default argument value is an anti-pattern in Python:
https://docs.python-guide.org/writing/gotchas/
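The pitfall is easiest to see side by side. The functions below are hypothetical, not LanceDB's API. A mutable `[]` default is created once, when the function is defined, and is shared across calls. A `None` default lets each call build a fresh list, which here also serves as the empty-list root namespace described in #2842.

```python
from typing import List, Optional


def add_level_bad(level: str, namespace: List[str] = []) -> List[str]:
    # The default list is created once at definition time and shared by every call.
    namespace.append(level)
    return namespace


def add_level_good(level: str, namespace: Optional[List[str]] = None) -> List[str]:
    # None is the sentinel; a fresh list (the root namespace) is built per call,
    # and a caller-supplied list is copied rather than mutated.
    namespace = [] if namespace is None else list(namespace)
    namespace.append(level)
    return namespace
```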
2025-11-20 10:23:41 -08:00
LanceDB Robot
28ab29a3f0 chore: update lance dependency to v1.0.0-beta.5 (#2798)
## Summary
- bump all Lance workspace dependencies to v1.0.0-beta.5
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings`
- ran `cargo fmt --all`

Triggered by refs/tags/v1.0.0-beta.5
2025-11-20 17:43:24 +08:00
Colin Patrick McCabe
7d3f5348a7 feat: implement head() for remote tables (#2793)
Implement the head() function for RemoteTable.
2025-11-19 12:49:34 -08:00
Lance Release
3531393523 Bump version: 0.22.4-beta.1 → 0.22.4-beta.2 2025-11-19 20:25:41 +00:00
Lance Release
93b8ac8e3e Bump version: 0.25.4-beta.1 → 0.25.4-beta.2 2025-11-19 20:24:46 +00:00
Jack Ye
1b78ccedaf feat: support async namespace connection (#2788)
Also fix 2 bugs:
1. make storage options provider serializable in ray
2. fix table.to_table() uri is wrong for namespace-backed tables
2025-11-19 12:23:50 -08:00
Mykola Skrynnyk
ca8d118f78 feat(python): support to_pydantic in async (#2438)
This pull request improves the `pydantic` integration by adding a
`to_pydantic` method to asynchronous queries and handling models that
use `alias` in field definitions. Fixes #2436 and closes #2437.

## Summary by CodeRabbit

- **New Features**
- Added support for converting asynchronous query results to Pydantic
models.
- **Bug Fixes**
- Simplified conversion of query results to Pydantic models for improved
reliability.
- Improved handling of field aliases and computed fields when mapping
query results to Pydantic models.
- **Tests**
- Added tests to verify correct mapping of aliased and computed fields
in both synchronous and asynchronous scenarios.
2025-11-19 11:20:14 -08:00
Wyatt Alt
386fc9e466 feat: add num_attempts to merge insert result (#2795)
This pipes the num_attempts field from lance's merge insert result
through lancedb. This allows callers of merge_insert to get a better
idea of whether transaction conflicts are occurring.
2025-11-19 09:32:57 -08:00
Lance Release
ce1bafec1a Bump version: 0.22.4-beta.0 → 0.22.4-beta.1 2025-11-19 12:58:59 +00:00
Lance Release
8f18a7feed Bump version: 0.25.4-beta.0 → 0.25.4-beta.1 2025-11-19 12:58:16 +00:00
LanceDB Robot
e2be699544 chore: update lance dependency to v1.0.0-beta.3 (#2794)
## Summary
- update Lance dependencies to v1.0.0-beta.3 using
ci/set_lance_version.py
- verified with `cargo clippy --workspace --tests --all-features -- -D
warnings`
- formatted with `cargo fmt --all`

Refs: refs/tags/v1.0.0-beta.3
2025-11-19 09:25:45 -03:30
Xuanwo
f77b0ef37d ci: add timely lance release check (#2790)
This PR adds a scheduled Lance release check for LanceDB that automatically
bumps Lance when new tags are released.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
2025-11-19 16:43:41 +08:00
Wyatt Alt
c41401f20f chore: repath lance dependencies to lance-format (#2787) 2025-11-18 12:25:58 -08:00
Will Jones
1cf3917a87 ci: make rust ci faster, get ci green (#2782)
* Add `ci` profile for smaller build caches. This had a meaningful
impact in Lance, and I expect a similar impact here.
https://github.com/lancedb/lance/pull/5236
* Get caching working in Rust. Previously was not working due to
`workspaces: rust`.
* Get caching working in NodeJs lint job. Previously wasn't working
because we installed the toolchain **after** we called `- uses:
Swatinem/rust-cache@v2`, which invalidates the cache locally.
* Fix broken pytest from async io transition
(`pytest.PytestRemovedIn9Warning`)
* Altered `get_num_sub_vectors` to handle a bug in the case of 4-bit PQ. This
was the cause of `rust future panicked: unknown error`. Raised an issue
upstream to change the panic to an error:
https://github.com/lancedb/lance/issues/5257
* Call `npm run docs` to fix doc issue.
* Disable flaky Windows test for consistency. It's just an OS-specific
timer issue, not our fault.
* Fix Windows absolute path handling in namespaces. Was causing CI
failure `OSError: [WinError 123] The filename, directory name, or volume
label syntax is incorrect: `
2025-11-18 09:04:56 -08:00
129 changed files with 5978 additions and 2273 deletions

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.22.4-beta.0"
current_version = "0.23.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.
@@ -72,3 +72,9 @@ search = "\nversion = \"{current_version}\""
filename = "nodejs/Cargo.toml"
replace = "\nversion = \"{new_version}\""
search = "\nversion = \"{current_version}\""
# Java documentation
[[tool.bumpversion.files]]
filename = "docs/src/java/java.md"
replace = "<version>{new_version}</version>"
search = "<version>{current_version}</version>"

@@ -19,7 +19,7 @@ rustflags = [
"-Wclippy::string_add_assign",
"-Wclippy::string_add",
"-Wclippy::string_lit_as_bytes",
"-Wclippy::string_to_string",
"-Wclippy::implicit_clone",
"-Wclippy::use_self",
"-Dclippy::cargo",
"-Dclippy::dbg_macro",

@@ -18,6 +18,6 @@ body:
label: Link
description: >
Provide a link to the existing documentation, if applicable.
placeholder: ex. https://lancedb.github.io/lancedb/guides/tables/...
placeholder: ex. https://lancedb.com/docs/tables/...
validations:
required: false

@@ -31,7 +31,7 @@ runs:
with:
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/"
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
target: x86_64-unknown-linux-gnu
manylinux: ${{ inputs.manylinux }}
args: ${{ inputs.args }}
@@ -46,7 +46,7 @@ runs:
with:
command: build
working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/"
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
target: aarch64-unknown-linux-gnu
manylinux: ${{ inputs.manylinux }}
args: ${{ inputs.args }}

@@ -22,5 +22,5 @@ runs:
command: build
# TODO: pass through interpreter
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/"
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python

@@ -26,7 +26,7 @@ runs:
with:
command: build
args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/"
docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python
- uses: actions/upload-artifact@v4
with:

@@ -98,3 +98,30 @@ jobs:
printenv OPENAI_API_KEY | codex login --with-api-key
codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"
- name: Trigger sophon dependency update
env:
TAG: ${{ inputs.tag }}
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
VERSION="${TAG#refs/tags/}"
VERSION="${VERSION#v}"
LANCEDB_BRANCH="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
echo "Triggering sophon workflow with:"
echo " lance_ref: ${TAG#refs/tags/}"
echo " lancedb_ref: ${LANCEDB_BRANCH}"
gh workflow run codex-bump-lancedb-lance.yml \
--repo lancedb/sophon \
-f lance_ref="${TAG#refs/tags/}" \
-f lancedb_ref="${LANCEDB_BRANCH}"
- name: Show latest sophon workflow run
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
echo "Latest sophon workflow run:"
gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
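For readers less familiar with bash parameter expansion, the ref and branch names computed in the "Trigger sophon dependency update" step can be transliterated into Python. This is an illustrative sketch only. `strip_prefix` and `sophon_refs` are hypothetical helpers and are not part of the repository.

```python
import re
from typing import Tuple


def strip_prefix(text: str, prefix: str) -> str:
    # Equivalent of bash ${text#prefix} for a literal prefix.
    return text[len(prefix):] if text.startswith(prefix) else text


def sophon_refs(tag: str) -> Tuple[str, str]:
    """Return (lance_ref, lancedb_branch) the way the workflow step derives them."""
    lance_ref = strip_prefix(tag, "refs/tags/")  # ${TAG#refs/tags/}
    version = strip_prefix(lance_ref, "v")  # ${VERSION#v}
    # ${VERSION//[^a-zA-Z0-9]/-}: replace every non-alphanumeric character with "-".
    branch = "codex/update-lance-" + re.sub(r"[^a-zA-Z0-9]", "-", version)
    return lance_ref, branch
```

For example, `refs/tags/v1.0.0-beta.16` yields the lance ref `v1.0.0-beta.16` and the branch `codex/update-lance-1-0-0-beta-16`.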

@@ -24,7 +24,7 @@ env:
# according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html
# CI builds are faster with incremental disabled.
CARGO_INCREMENTAL: "0"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
jobs:
# Single deploy job since we're just deploying
@@ -50,8 +50,8 @@ jobs:
- name: Build Python
working-directory: python
run: |
python -m pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .
python -m pip install --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
- name: Set up node
uses: actions/setup-node@v3
with:

@@ -1,76 +1,35 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Build and publish Java packages
on:
release:
types: [released]
push:
tags:
- "v*"
pull_request:
paths:
- .github/workflows/java-publish.yml
jobs:
macos-arm64:
name: Build on MacOS Arm64
runs-on: macos-14
timeout-minutes: 45
defaults:
run:
working-directory: ./java/core/lancedb-jni
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
brew install protobuf
- name: Build release
run: |
cargo build --release
- uses: actions/upload-artifact@v4
with:
name: liblancedb_jni_darwin_aarch64.zip
path: target/release/liblancedb_jni.dylib
retention-days: 1
if-no-files-found: error
linux-arm64:
name: Build on Linux Arm64
runs-on: warp-ubuntu-2204-arm64-8x
timeout-minutes: 45
defaults:
run:
working-directory: ./java/core/lancedb-jni
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
cache-workspaces: "./java/core/lancedb-jni"
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
rustflags: "-C debuginfo=1"
- name: Install dependencies
run: |
sudo apt -y -qq update
sudo apt install -y protobuf-compiler libssl-dev pkg-config
- name: Build release
run: |
cargo build --release
- uses: actions/upload-artifact@v4
with:
name: liblancedb_jni_linux_aarch64.zip
path: target/release/liblancedb_jni.so
retention-days: 1
if-no-files-found: error
linux-x86:
runs-on: warp-ubuntu-2204-x64-8x
publish:
name: Build and Publish
runs-on: ubuntu-24.04
timeout-minutes: 30
needs: [macos-arm64, linux-arm64]
defaults:
run:
working-directory: ./java
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
- name: Set up Java 8
uses: actions/setup-java@v4
with:
@@ -82,40 +41,30 @@ jobs:
server-password: SONATYPE_TOKEN
gpg-private-key: ${{ secrets.GPG_PRIVATE_KEY }}
gpg-passphrase: ${{ secrets.GPG_PASSPHRASE }}
- name: Install dependencies
- name: Set git config
run: |
sudo apt -y -qq update
sudo apt install -y protobuf-compiler libssl-dev pkg-config
- name: Download artifact
uses: actions/download-artifact@v4
- name: Copy native libs
run: |
mkdir -p ./core/target/classes/nativelib/darwin-aarch64 ./core/target/classes/nativelib/linux-aarch64
cp ../liblancedb_jni_darwin_aarch64.zip/liblancedb_jni.dylib ./core/target/classes/nativelib/darwin-aarch64/liblancedb_jni.dylib
cp ../liblancedb_jni_linux_aarch64.zip/liblancedb_jni.so ./core/target/classes/nativelib/linux-aarch64/liblancedb_jni.so
git config --global user.email "dev+gha@lancedb.com"
git config --global user.name "LanceDB Github Runner"
- name: Dry run
if: github.event_name == 'pull_request'
run: |
mvn --batch-mode -DskipTests -Drust.release.build=true package
- name: Set github
run: |
git config --global user.email "LanceDB Github Runner"
git config --global user.name "dev+gha@lancedb.com"
- name: Publish with Java 8
if: github.event_name == 'release'
./mvnw --batch-mode -DskipTests package -pl lancedb-core -am
- name: Publish
if: startsWith(github.ref, 'refs/tags/v')
run: |
echo "use-agent" >> ~/.gnupg/gpg.conf
echo "pinentry-mode loopback" >> ~/.gnupg/gpg.conf
export GPG_TTY=$(tty)
mvn --batch-mode -DskipTests -Drust.release.build=true -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -P deploy-to-ossrh
./mvnw --batch-mode -DskipTests -DpushChanges=false -Dgpg.passphrase=${{ secrets.GPG_PASSPHRASE }} deploy -pl lancedb-core -am -P deploy-to-ossrh
env:
SONATYPE_USER: ${{ secrets.SONATYPE_USER }}
SONATYPE_TOKEN: ${{ secrets.SONATYPE_TOKEN }}
report-failure:
name: Report Workflow Failure
runs-on: ubuntu-latest
needs: [linux-arm64, linux-x86, macos-arm64]
if: always() && (github.event_name == 'release' || github.event_name == 'workflow_dispatch')
needs: [publish]
if: always() && failure() && startsWith(github.ref, 'refs/tags/v')
permissions:
contents: read
issues: write

@@ -1,118 +1,46 @@
name: Build and Run Java JNI Tests
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Build Java LanceDB Core
on:
push:
branches:
- main
paths:
- java/**
- .github/workflows/java.yml
pull_request:
paths:
- java/**
- rust/**
- .github/workflows/java.yml
env:
# This env var is used by Swatinem/rust-cache@v2 for the cache
# key, so we set it to make sure it is always consistent.
CARGO_TERM_COLOR: always
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
RUST_BACKTRACE: "1"
# according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html
# CI builds are faster with incremental disabled.
CARGO_INCREMENTAL: "0"
CARGO_BUILD_JOBS: "1"
jobs:
linux-build-java-11:
runs-on: ubuntu-22.04
name: ubuntu-22.04 + Java 11
build-java:
runs-on: ubuntu-24.04
name: Build
defaults:
run:
working-directory: ./java
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Install Java 11
uses: actions/setup-java@v4
with:
distribution: temurin
java-version: 11
cache: "maven"
- name: Java Style Check
run: mvn checkstyle:check
# Disable because of issues in lancedb rust core code
# - name: Rust Clippy
# working-directory: java/core/lancedb-jni
# run: cargo clippy --all-targets -- -D warnings
- name: Running tests with Java 11
run: mvn clean test
linux-build-java-17:
runs-on: ubuntu-22.04
name: ubuntu-22.04 + Java 17
defaults:
run:
working-directory: ./java
steps:
- name: Checkout repository
uses: actions/checkout@v4
- uses: Swatinem/rust-cache@v2
with:
workspaces: java/core/lancedb-jni
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt
- name: Run cargo fmt
run: cargo fmt --check
working-directory: ./java/core/lancedb-jni
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- name: Install Java 17
- name: Set up Java 17
uses: actions/setup-java@v4
with:
distribution: temurin
java-version: 17
cache: "maven"
- run: echo "JAVA_17=$JAVA_HOME" >> $GITHUB_ENV
- name: Java Style Check
run: mvn checkstyle:check
# Disable because of issues in lancedb rust core code
# - name: Rust Clippy
# working-directory: java/core/lancedb-jni
# run: cargo clippy --all-targets -- -D warnings
- name: Running tests with Java 17
run: |
export JAVA_TOOL_OPTIONS="$JAVA_TOOL_OPTIONS \
-XX:+IgnoreUnrecognizedVMOptions \
--add-opens=java.base/java.lang=ALL-UNNAMED \
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED \
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED \
--add-opens=java.base/java.io=ALL-UNNAMED \
--add-opens=java.base/java.net=ALL-UNNAMED \
--add-opens=java.base/java.nio=ALL-UNNAMED \
--add-opens=java.base/java.util=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED \
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED \
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED \
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED \
--add-opens=java.base/sun.security.action=ALL-UNNAMED \
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED \
--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED \
-Djdk.reflect.useDirectMethodHandle=false \
-Dio.netty.tryReflectionSetAccessible=true"
JAVA_HOME=$JAVA_17 mvn clean test
run: ./mvnw checkstyle:check
- name: Build and install
run: ./mvnw clean install

@@ -0,0 +1,62 @@
name: Lance Release Timer
on:
schedule:
- cron: "*/10 * * * *"
workflow_dispatch:
permissions:
contents: read
actions: write
concurrency:
group: lance-release-timer
cancel-in-progress: false
jobs:
trigger-update:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Check for new Lance tag
id: check
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
python3 ci/check_lance_release.py --github-output "$GITHUB_OUTPUT"
- name: Look for existing PR
if: steps.check.outputs.needs_update == 'true'
id: pr
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
TITLE="chore: update lance dependency to v${{ steps.check.outputs.latest_version }}"
COUNT=$(gh pr list --search "\"$TITLE\" in:title" --state open --limit 1 --json number --jq 'length')
if [ "$COUNT" -gt 0 ]; then
echo "Open PR already exists for $TITLE"
echo "pr_exists=true" >> "$GITHUB_OUTPUT"
else
echo "No existing PR for $TITLE"
echo "pr_exists=false" >> "$GITHUB_OUTPUT"
fi
- name: Trigger codex update workflow
if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
TAG=${{ steps.check.outputs.latest_tag }}
gh workflow run codex-update-lance-dependency.yml -f tag=refs/tags/$TAG
- name: Show latest codex workflow run
if: steps.check.outputs.needs_update == 'true' && steps.pr.outputs.pr_exists != 'true'
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,url,displayTitle
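The gating in the timer workflow can be summarized as a small pure function. This is a minimal Python sketch. It assumes the check step emits the same keys that `ci/check_lance_release.py` writes (`needs_update`, `latest_version`, `latest_tag`). `dispatch_ref` is a hypothetical helper, and its substring test only approximates the `gh pr list --search "... in:title"` query, which goes through GitHub search rather than exact string matching.

```python
from typing import Dict, Iterable, Optional


def dispatch_ref(check: Dict[str, str], open_pr_titles: Iterable[str]) -> Optional[str]:
    """Return the ref passed as `-f tag=...`, or None when nothing should be dispatched.

    Hypothetical helper mirroring the `if:` conditions of the workflow above.
    """
    # "Check for new Lance tag": do nothing unless a newer tag exists.
    if check.get("needs_update") != "true":
        return None
    # "Look for existing PR": same title format as the workflow's TITLE variable.
    title = f"chore: update lance dependency to v{check['latest_version']}"
    # Approximation of the GitHub title search: any open PR containing the title counts.
    if any(title in existing for existing in open_pr_titles):
        return None
    # "Trigger codex update workflow": dispatch with the full tag ref.
    return f"refs/tags/{check['latest_tag']}"


check = {"needs_update": "true", "latest_version": "1.0.1-beta.1", "latest_tag": "v1.0.1-beta.1"}
print(dispatch_ref(check, []))  # refs/tags/v1.0.1-beta.1
print(dispatch_ref(check, ["chore: update lance dependency to v1.0.1-beta.1"]))  # None
```

Because the title embeds the version, a PR that is still open for an older version does not block a dispatch for a newer tag.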

@@ -16,9 +16,6 @@ concurrency:
cancel-in-progress: true
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
RUST_BACKTRACE: "1"
jobs:
@@ -43,18 +40,20 @@ jobs:
node-version: 20
cache: 'npm'
cache-dependency-path: nodejs/package-lock.json
- uses: Swatinem/rust-cache@v2
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- name: Install dependencies
run: |
sudo apt update
sudo apt install -y protobuf-compiler libssl-dev
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
- name: Lint
- uses: Swatinem/rust-cache@v2
- name: Format Rust
run: cargo fmt --all -- --check
- name: Lint Rust
run: cargo clippy --profile ci --all --all-features -- -D warnings
- name: Lint Typescript
run: |
cargo fmt --all -- --check
cargo clippy --all --all-features -- -D warnings
npm ci
npm run lint-ci
- name: Lint examples
@@ -89,8 +88,9 @@ jobs:
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci
npm run build
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
- name: Setup localstack
working-directory: .
run: docker compose up --detach --wait
@@ -146,8 +146,9 @@ jobs:
npm install -g @napi-rs/cli
- name: Build
run: |
npm ci
npm run build
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
- name: Test
run: |
npm run test

@@ -97,12 +97,6 @@ jobs:
fail-fast: false
matrix:
settings:
- target: x86_64-apple-darwin
host: macos-latest
features: ","
pre_build: |-
brew install protobuf
rustup target add x86_64-apple-darwin
- target: aarch64-apple-darwin
host: macos-latest
features: fp16kernels

@@ -11,7 +11,7 @@ on:
- Cargo.toml # Change in dependency frequently breaks builds
env:
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
jobs:
linux:
@@ -64,8 +64,6 @@ jobs:
strategy:
matrix:
config:
- target: x86_64-apple-darwin
runner: macos-13
- target: aarch64-apple-darwin
runner: warp-macos-14-arm64-6x
env:

@@ -18,7 +18,8 @@ env:
# Color output for pytest is off by default.
PYTEST_ADDOPTS: "--color=yes"
FORCE_COLOR: "1"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
RUST_BACKTRACE: "1"
jobs:
lint:
@@ -48,8 +49,8 @@ jobs:
type-check:
name: "Type Check"
timeout-minutes: 30
runs-on: "ubuntu-22.04"
timeout-minutes: 60
runs-on: ubuntu-2404-8x-x64
defaults:
run:
shell: bash
@@ -77,8 +78,8 @@ jobs:
doctest:
name: "Doctest"
timeout-minutes: 30
runs-on: "ubuntu-24.04"
timeout-minutes: 60
runs-on: ubuntu-2404-8x-x64
defaults:
run:
shell: bash
@@ -97,12 +98,9 @@ jobs:
run: |
sudo apt update
sudo apt install -y protobuf-compiler
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
- name: Install
run: |
pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests,dev,embeddings]
pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests,dev,embeddings]
pip install tantivy
pip install mlx
- name: Doctest
@@ -131,10 +129,9 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: 3.${{ matrix.python-minor-version }}
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
- uses: ./.github/workflows/build_linux_wheel
with:
args: --profile ci
- uses: ./.github/workflows/run_tests
with:
integration: true
@@ -146,16 +143,9 @@ jobs:
- name: Delete wheels
run: rm -rf target/wheels
platform:
name: "Mac: ${{ matrix.config.name }}"
name: "Mac"
timeout-minutes: 30
strategy:
matrix:
config:
- name: x86
runner: macos-13
- name: Arm
runner: macos-14
runs-on: "${{ matrix.config.runner }}"
runs-on: macos-14
defaults:
run:
shell: bash
@@ -169,10 +159,9 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: "3.12"
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
- uses: ./.github/workflows/build_mac_wheel
with:
args: --profile ci
- uses: ./.github/workflows/run_tests
# Make sure wheels are not included in the Rust cache
- name: Delete wheels
@@ -199,10 +188,9 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: "3.12"
- uses: Swatinem/rust-cache@v2
with:
workspaces: python
- uses: ./.github/workflows/build_windows_wheel
with:
args: --profile ci
- uses: ./.github/workflows/run_tests
# Make sure wheels are not included in the Rust cache
- name: Delete wheels
@@ -231,7 +219,7 @@ jobs:
run: |
pip install "pydantic<2"
pip install pyarrow==16
pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests]
pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests]
pip install tantivy
- name: Run tests
run: pytest -m "not slow and not s3_test" -x -v --durations=30 python/tests

@@ -15,7 +15,7 @@ runs:
- name: Install lancedb
shell: bash
run: |
pip3 install --extra-index-url https://pypi.fury.io/lancedb/ $(ls target/wheels/lancedb-*.whl)[tests,dev]
pip3 install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ $(ls target/wheels/lancedb-*.whl)[tests,dev]
- name: Setup localstack for integration tests
if: ${{ inputs.integration == 'true' }}
shell: bash

@@ -18,11 +18,7 @@ env:
# This env var is used by Swatinem/rust-cache@v2 for the cache
# key, so we set it to make sure it is always consistent.
CARGO_TERM_COLOR: always
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
RUST_BACKTRACE: "1"
CARGO_INCREMENTAL: 0
jobs:
lint:
@@ -44,8 +40,6 @@ jobs:
with:
components: rustfmt, clippy
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install dependencies
run: |
sudo apt update
@@ -53,7 +47,7 @@ jobs:
- name: Run format
run: cargo fmt --all -- --check
- name: Run clippy
run: cargo clippy --workspace --tests --all-features -- -D warnings
run: cargo clippy --profile ci --workspace --tests --all-features -- -D warnings
build-no-lock:
runs-on: ubuntu-24.04
@@ -80,7 +74,7 @@ jobs:
sudo apt install -y protobuf-compiler libssl-dev
- name: Build all
run: |
cargo build --benches --all-features --tests
cargo build --profile ci --benches --all-features --tests
linux:
timeout-minutes: 30
@@ -103,14 +97,8 @@ jobs:
fetch-depth: 0
lfs: true
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install dependencies
run: |
# This shaves 2 minutes off this step in CI. This doesn't seem to be
# necessary in standard runners, but it is in the 4x runners.
sudo rm /var/lib/man-db/auto-update
sudo apt install -y protobuf-compiler libssl-dev
run: sudo apt install -y protobuf-compiler libssl-dev
- uses: rui314/setup-mold@v1
- name: Make Swap
run: |
@@ -119,22 +107,22 @@ jobs:
sudo mkswap /swapfile
sudo swapon /swapfile
- name: Build
run: cargo build --all-features --tests --locked --examples
run: cargo build --profile ci --all-features --tests --locked --examples
- name: Run feature tests
run: make -C ./lancedb feature-tests
run: CARGO_ARGS="--profile ci" make -C ./lancedb feature-tests
- name: Run examples
run: cargo run --example simple --locked
run: cargo run --profile ci --example simple --locked
- name: Run remote tests
# Running this requires access to secrets, so skip if this is
# a PR from a fork.
if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
run: make -C ./lancedb remote-tests
run: CARGO_ARGS="--profile ci" make -C ./lancedb remote-tests
macos:
timeout-minutes: 30
strategy:
matrix:
mac-runner: ["macos-13", "macos-14"]
mac-runner: ["macos-14", "macos-15"]
runs-on: "${{ matrix.mac-runner }}"
defaults:
run:
@@ -148,8 +136,6 @@ jobs:
- name: CPU features
run: sysctl -a | grep cpu
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install dependencies
run: brew install protobuf
- name: Run tests
@@ -159,7 +145,7 @@ jobs:
ALL_FEATURES=`cargo metadata --format-version=1 --no-deps \
| jq -r '.packages[] | .features | keys | .[]' \
| grep -v s3-test | sort | uniq | paste -s -d "," -`
cargo test --features $ALL_FEATURES --locked
cargo test --profile ci --features $ALL_FEATURES --locked
windows:
runs-on: windows-2022
@@ -173,22 +159,21 @@ jobs:
working-directory: rust/lancedb
steps:
- uses: actions/checkout@v4
- name: Set target
run: rustup target add ${{ matrix.target }}
- uses: Swatinem/rust-cache@v2
with:
workspaces: rust
- name: Install Protoc v21.12
run: choco install --no-progress protoc
- name: Build
run: |
rustup target add ${{ matrix.target }}
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo build --features remote --tests --locked --target ${{ matrix.target }}
cargo build --profile ci --features remote --tests --locked --target ${{ matrix.target }}
- name: Run tests
# Can only run tests when target matches host
if: ${{ matrix.target == 'x86_64-pc-windows-msvc' }}
run: |
$env:VCPKG_ROOT = $env:VCPKG_INSTALLATION_ROOT
cargo test --features remote --locked
cargo test --profile ci --features remote --locked
msrv:
# Check the minimum supported Rust version
@@ -213,6 +198,7 @@ jobs:
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ matrix.msrv }}
- uses: Swatinem/rust-cache@v2
- name: Downgrade dependencies
# These packages have newer requirements for MSRV
run: |
@@ -226,4 +212,4 @@ jobs:
cargo update -p aws-sdk-sts --precise 1.51.0
cargo update -p home --precise 0.5.9
- name: cargo +${{ matrix.msrv }} check
run: cargo check --workspace --tests --benches --all-features
run: cargo check --profile ci --workspace --tests --benches --all-features
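The macOS job assembles its `--features` list with a `cargo metadata | jq | grep | sort | uniq | paste` pipeline. The Python sketch below reproduces that logic. It assumes the documented `cargo metadata --format-version=1` layout, in which each entry of `packages` has a `features` object keyed by feature name. `all_features` is a hypothetical helper and the sample data is illustrative. Python's `sorted` compares code points, so the ordering may differ from `sort` under a non-C locale.

```python
import json


def all_features(metadata_json: str, exclude: str = "s3-test") -> str:
    """Comma-separated, de-duplicated feature names, skipping any containing `exclude`."""
    metadata = json.loads(metadata_json)
    names = set()  # the set plays the role of `sort | uniq`
    for package in metadata["packages"]:
        # jq: .packages[] | .features | keys | .[]
        for feature in package.get("features", {}):
            # grep -v s3-test drops every line containing the substring
            if exclude not in feature:
                names.add(feature)
    # paste -s -d "," joins the sorted lines with commas
    return ",".join(sorted(names))


sample = json.dumps({
    "packages": [
        {"name": "lancedb", "features": {"default": [], "remote": [], "s3-test": []}},
        {"name": "other", "features": {"default": [], "fp16kernels": []}},
    ]
})
print(all_features(sample))  # default,fp16kernels,remote
```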

992
Cargo.lock generated

File diff suppressed because it is too large

@@ -1,5 +1,5 @@
[workspace]
members = ["rust/lancedb", "nodejs", "python", "java/core/lancedb-jni"]
members = ["rust/lancedb", "nodejs", "python"]
# Python package needs to be built by maturin.
exclude = ["python"]
resolver = "2"
@@ -15,20 +15,20 @@ categories = ["database-implementations"]
rust-version = "1.78.0"
[workspace.dependencies]
lance = { "version" = "=1.0.0-beta.2", default-features = false, "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-core = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-datagen = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-file = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-io = { "version" = "=1.0.0-beta.2", default-features = false, "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-index = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-linalg = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-namespace = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-namespace-impls = { "version" = "=1.0.0-beta.2", "features" = ["dir-aws", "dir-gcp", "dir-azure", "dir-oss", "rest"], "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-table = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-testing = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-datafusion = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-encoding = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-arrow = { "version" = "=1.0.0-beta.2", "tag" = "v1.0.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance = { "version" = "=1.0.1-beta.1", default-features = false, "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=1.0.1-beta.1", default-features = false, "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=1.0.1-beta.1", default-features = false, "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=1.0.1-beta.1", "tag" = "v1.0.1-beta.1", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "56.2", optional = false }
@@ -63,3 +63,17 @@ regex = "1.10"
lazy_static = "1"
semver = "1.0.25"
chrono = "0.4"
[profile.ci]
debug = "line-tables-only"
inherits = "dev"
incremental = false
# This rule applies to every package except workspace members (dependencies
# such as `arrow` and `tokio`). It disables debug info and related features on
# dependencies so their binaries stay smaller, improving cache reuse.
[profile.ci.package."*"]
debug = false
debug-assertions = false
strip = "debuginfo"
incremental = false
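The new `ci` profile stacks on `dev`, and the `[profile.ci.package."*"]` table then overrides settings for dependencies only. Below is a toy Python model of that layering, for illustration only. It is not Cargo's resolver and covers only the keys touched here. The `dev` values are taken from the defaults listed in the Cargo book's profiles chapter. `effective_settings` and the package names are hypothetical.

```python
from typing import Dict, Set, Union

Setting = Union[bool, int, str]

# Subset of the documented [profile.dev] defaults (the `inherits = "dev"` base).
DEV_DEFAULTS: Dict[str, Setting] = {
    "opt-level": 0,
    "debug": True,
    "debug-assertions": True,
    "incremental": True,
    "strip": "none",
}
# [profile.ci] from the workspace Cargo.toml above.
CI: Dict[str, Setting] = {"debug": "line-tables-only", "incremental": False}
# [profile.ci.package."*"]: applies to dependencies, not workspace members.
CI_DEPENDENCIES: Dict[str, Setting] = {
    "debug": False,
    "debug-assertions": False,
    "strip": "debuginfo",
    "incremental": False,
}


def effective_settings(package: str, workspace_members: Set[str]) -> Dict[str, Setting]:
    settings = dict(DEV_DEFAULTS)  # start from the inherited profile
    settings.update(CI)            # profile-level overrides
    if package not in workspace_members:
        settings.update(CI_DEPENDENCIES)  # the "*" override skips workspace members
    return settings


members = {"lancedb"}  # illustrative member set
print(effective_settings("lancedb", members)["debug"])  # line-tables-only
print(effective_settings("arrow", members)["debug"])    # False
```

Both kinds of package keep `opt-level = 0` from `dev`; only debug info and assertions differ, which is what keeps dependency artifacts small for caching.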

@@ -15,7 +15,7 @@
# **The Multimodal AI Lakehouse**
[**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.github.io/lancedb/) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors)
[**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.com/docs) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors)
**The ultimate multimodal data platform for AI/ML applications.**

208
ci/check_lance_release.py Executable file

@@ -0,0 +1,208 @@
#!/usr/bin/env python3
"""Determine whether a newer Lance tag exists and expose results for CI."""
from __future__ import annotations
import argparse
import json
import os
import re
import subprocess
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple, Union
try: # Python >=3.11
import tomllib # type: ignore
except ModuleNotFoundError: # pragma: no cover - fallback for older Python
import tomli as tomllib # type: ignore
LANCE_REPO = "lance-format/lance"
SEMVER_RE = re.compile(
r"^\s*(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?"
r"(?:\+[0-9A-Za-z.-]+)?\s*$"
)
@dataclass(frozen=True)
class SemVer:
major: int
minor: int
patch: int
prerelease: Tuple[Union[int, str], ...]
def __lt__(self, other: "SemVer") -> bool: # pragma: no cover - simple comparison
if (self.major, self.minor, self.patch) != (other.major, other.minor, other.patch):
return (self.major, self.minor, self.patch) < (other.major, other.minor, other.patch)
if self.prerelease == other.prerelease:
return False
if not self.prerelease:
return False # release > anything else
if not other.prerelease:
return True
for left, right in zip(self.prerelease, other.prerelease):
if left == right:
continue
if isinstance(left, int) and isinstance(right, int):
return left < right
if isinstance(left, int):
return True
if isinstance(right, int):
return False
return str(left) < str(right)
return len(self.prerelease) < len(other.prerelease)
def __eq__(self, other: object) -> bool: # pragma: no cover - trivial
if not isinstance(other, SemVer):
return NotImplemented
return (
self.major == other.major
and self.minor == other.minor
and self.patch == other.patch
and self.prerelease == other.prerelease
)
def parse_semver(raw: str) -> SemVer:
match = SEMVER_RE.match(raw)
if not match:
raise ValueError(f"Unsupported version format: {raw}")
prerelease = match.group("prerelease")
parts: Tuple[Union[int, str], ...] = ()
if prerelease:
parsed: List[Union[int, str]] = []
for piece in prerelease.split("."):
if piece.isdigit():
parsed.append(int(piece))
else:
parsed.append(piece)
parts = tuple(parsed)
return SemVer(
major=int(match.group("major")),
minor=int(match.group("minor")),
patch=int(match.group("patch")),
prerelease=parts,
)
@dataclass
class TagInfo:
tag: str # e.g. v1.0.0-beta.2
version: str # e.g. 1.0.0-beta.2
semver: SemVer
def run_command(cmd: Sequence[str]) -> str:
result = subprocess.run(cmd, capture_output=True, text=True, check=False)
if result.returncode != 0:
raise RuntimeError(
f"Command {' '.join(cmd)} failed with {result.returncode}: {result.stderr.strip()}"
)
return result.stdout.strip()
def fetch_remote_tags() -> List[TagInfo]:
output = run_command(
[
"gh",
"api",
"-X",
"GET",
f"repos/{LANCE_REPO}/git/refs/tags",
"--paginate",
"--jq",
".[].ref",
]
)
tags: List[TagInfo] = []
for line in output.splitlines():
ref = line.strip()
if not ref.startswith("refs/tags/v"):
continue
tag = ref.split("refs/tags/")[-1]
version = tag.lstrip("v")
try:
tags.append(TagInfo(tag=tag, version=version, semver=parse_semver(version)))
except ValueError:
continue
if not tags:
raise RuntimeError("No Lance tags could be parsed from GitHub API output")
return tags
def read_current_version(repo_root: Path) -> str:
cargo_path = repo_root / "Cargo.toml"
with cargo_path.open("rb") as fh:
data = tomllib.load(fh)
try:
deps = data["workspace"]["dependencies"]
entry = deps["lance"]
except KeyError as exc: # pragma: no cover - configuration guard
raise RuntimeError("Failed to locate workspace.dependencies.lance in Cargo.toml") from exc
if isinstance(entry, str):
raw_version = entry
elif isinstance(entry, dict):
raw_version = entry.get("version", "")
else: # pragma: no cover - defensive
raise RuntimeError("Unexpected lance dependency format")
raw_version = raw_version.strip()
if not raw_version:
raise RuntimeError("lance dependency does not declare a version")
return raw_version.lstrip("=")
def determine_latest_tag(tags: Iterable[TagInfo]) -> TagInfo:
return max(tags, key=lambda tag: tag.semver)
def write_outputs(args: argparse.Namespace, payload: dict) -> None:
target = getattr(args, "github_output", None)
if not target:
return
with open(target, "a", encoding="utf-8") as handle:
for key, value in payload.items():
handle.write(f"{key}={value}\n")
def main(argv: Sequence[str] | None = None) -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--repo-root",
default=Path(__file__).resolve().parents[1],
type=Path,
help="Path to the lancedb repository root",
)
parser.add_argument(
"--github-output",
default=os.environ.get("GITHUB_OUTPUT"),
help="Optional file path for writing GitHub Action outputs",
)
args = parser.parse_args(argv)
repo_root = Path(args.repo_root)
current_version = read_current_version(repo_root)
current_semver = parse_semver(current_version)
tags = fetch_remote_tags()
latest = determine_latest_tag(tags)
needs_update = latest.semver > current_semver
payload = {
"current_version": current_version,
"current_tag": f"v{current_version}",
"latest_version": latest.version,
"latest_tag": latest.tag,
"needs_update": "true" if needs_update else "false",
}
print(json.dumps(payload))
write_outputs(args, payload)
return 0
if __name__ == "__main__":
sys.exit(main())
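`determine_latest_tag` relies on `SemVer.__lt__`, which follows SemVer 2.0.0 precedence. A prerelease sorts before its release, numeric identifiers compare numerically and sort before alphanumeric ones, and when the shared identifiers are equal the longer list wins. The same ordering can be expressed as a tuple sort key. This is an alternative sketch for illustration, not code from the script, and `semver_key` is a hypothetical name; it reuses the script's `SEMVER_RE` pattern.

```python
import re
from typing import Tuple

# Same pattern as SEMVER_RE in ci/check_lance_release.py
SEMVER_RE = re.compile(
    r"^\s*(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>[0-9A-Za-z.-]+))?"
    r"(?:\+[0-9A-Za-z.-]+)?\s*$"
)


def semver_key(raw: str) -> Tuple[object, ...]:
    match = SEMVER_RE.match(raw)
    if not match:
        raise ValueError(f"Unsupported version format: {raw}")
    core = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    prerelease = match["prerelease"]
    if not prerelease:
        # Flag 1 puts a release after every prerelease of the same core version.
        return core + (1, ())
    # Numeric identifiers get tag 0 so they sort before alphanumeric ones (tag 1).
    ids = tuple(
        (0, int(piece), "") if piece.isdigit() else (1, 0, piece)
        for piece in prerelease.split(".")
    )
    # Tuple comparison also makes a shorter identifier list sort first.
    return core + (0, ids)


versions = ["1.0.0", "1.0.0-beta.10", "0.29.0", "1.0.1-beta.1", "1.0.0-beta.2"]
print(sorted(versions, key=semver_key))
# ['0.29.0', '1.0.0-beta.2', '1.0.0-beta.10', '1.0.0', '1.0.1-beta.1']
```

Note that `beta.10` sorts after `beta.2` because numeric identifiers are compared as integers; a plain string sort of the tags would get this wrong.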

@@ -3,6 +3,8 @@ import re
import sys
import json
LANCE_GIT_URL = "https://github.com/lance-format/lance.git"
def run_command(command: str) -> str:
"""
@@ -29,7 +31,7 @@ def get_latest_stable_version() -> str:
def get_latest_preview_version() -> str:
lance_tags = run_command(
"git ls-remote --tags https://github.com/lancedb/lance.git | grep 'refs/tags/v[0-9beta.-]\\+$'"
f"git ls-remote --tags {LANCE_GIT_URL} | grep 'refs/tags/v[0-9beta.-]\\+$'"
).splitlines()
lance_tags = (
tag.split("refs/tags/")[1]
@@ -176,8 +178,8 @@ def set_stable_version(version: str):
def set_preview_version(version: str):
"""
Sets lines to
lance = { "version" = "=0.29.0", default-features = false, "features" = ["dynamodb"], "tag" = "v0.29.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance-io = { "version" = "=0.29.0", default-features = false, "tag" = "v0.29.0-beta.2", "git" = "https://github.com/lancedb/lance.git" }
lance = { "version" = "=0.29.0", default-features = false, "features" = ["dynamodb"], "tag" = "v0.29.0-beta.2", "git" = LANCE_GIT_URL }
lance-io = { "version" = "=0.29.0", default-features = false, "tag" = "v0.29.0-beta.2", "git" = LANCE_GIT_URL }
...
"""
@@ -194,7 +196,7 @@ def set_preview_version(version: str):
config["features"] = features
config["tag"] = f"v{version}"
config["git"] = "https://github.com/lancedb/lance.git"
config["git"] = LANCE_GIT_URL
return dict_to_toml_line(package_name, config)
@@ -227,6 +229,29 @@ def set_local_version():
update_cargo_toml(line_updater)
def update_lockfiles(version: str, fallback_to_git: bool = False):
"""
Update Cargo metadata and optionally fall back to using the git tag if the
requested crates.io version is unavailable.
"""
try:
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
except Exception as e:
if fallback_to_git and "failed to select a version" in str(e):
print(
f" failed for crates.io v{version}, retrying with git tag...",
file=sys.stderr,
)
set_preview_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
else:
raise
parser = argparse.ArgumentParser(description="Set the version of the Lance package.")
parser.add_argument(
"version",
@@ -242,6 +267,7 @@ if args.version == "stable":
file=sys.stderr,
)
set_stable_version(latest_stable_version)
update_lockfiles(latest_stable_version)
elif args.version == "preview":
latest_preview_version = get_latest_preview_version()
print(
@@ -249,8 +275,10 @@ elif args.version == "preview":
file=sys.stderr,
)
set_preview_version(latest_preview_version)
update_lockfiles(latest_preview_version)
elif args.version == "local":
set_local_version()
update_lockfiles("local")
else:
# Parse the version number.
version = args.version
@@ -260,9 +288,7 @@ else:
if "beta" in version:
set_preview_version(version)
update_lockfiles(version)
else:
set_stable_version(version)
print("Updating lockfiles...", file=sys.stderr, end="")
run_command("cargo metadata > /dev/null")
print(" done.", file=sys.stderr)
update_lockfiles(version, fallback_to_git=True)
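`update_lockfiles` retries once with the git tag when Cargo cannot resolve the requested crates.io version. Below is a self-contained Python sketch of that control flow. The command runner and the Cargo.toml rewrite are injected as callables so it runs without `cargo`. `update_with_fallback`, `fake_run`, and the simulated error text are hypothetical; the matched substring `failed to select a version` is the one the script checks.

```python
from typing import Callable, List


def update_with_fallback(
    version: str,
    run: Callable[[str], str],
    use_git_tag: Callable[[str], None],
    fallback_to_git: bool = False,
) -> None:
    try:
        run("cargo metadata > /dev/null")
    except Exception as exc:
        # Only version-resolution failures trigger the retry; anything else propagates.
        if fallback_to_git and "failed to select a version" in str(exc):
            use_git_tag(version)  # stands in for set_preview_version(version)
            run("cargo metadata > /dev/null")
        else:
            raise


calls: List[str] = []
pinned_to_git: List[str] = []


def fake_run(command: str) -> str:
    calls.append(command)
    # Simulate crates.io not having the release until the git tag is pinned.
    if not pinned_to_git:
        raise RuntimeError('failed to select a version for the requirement `lance = "=1.0.0"`')
    return ""


update_with_fallback("1.0.0", fake_run, pinned_to_git.append, fallback_to_git=True)
print(len(calls), pinned_to_git)  # 2 ['1.0.0']
```

Only the explicit-version stable path in the script passes `fallback_to_git=True`, so preview and local updates still fail loudly on resolution errors.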

@@ -1,8 +1,8 @@
# LanceDB Documentation
LanceDB docs are deployed to https://lancedb.github.io/lancedb/.
LanceDB docs are available at [lancedb.com/docs](https://lancedb.com/docs).
Docs is built and deployed automatically by [Github Actions](../.github/workflows/docs.yml)
The SDK docs are built and deployed automatically by [Github Actions](../.github/workflows/docs.yml)
whenever a commit is pushed to the `main` branch. So it is possible for the docs to show
unreleased features.

@@ -123,6 +123,7 @@ nav:
- Overview: index.md
- Python: python/python.md
- Javascript/TypeScript: js/globals.md
- Java: java/java.md
- Rust: https://docs.rs/lancedb/latest/lancedb/index.html
extra_css:

@@ -4,4 +4,5 @@ This page contains the API reference for the SDKs supported by the LanceDB team.
- [Python](python/python.md)
- [JavaScript/TypeScript](js/globals.md)
- [Java](java/java.md)
- [Rust](https://docs.rs/lancedb/latest/lancedb/index.html)

499
docs/src/java/java.md Normal file

@@ -0,0 +1,499 @@
# Java SDK
The LanceDB Java SDK provides a convenient way to interact with LanceDB Cloud and Enterprise deployments using the Lance REST Namespace API.
!!! note
The Java SDK currently works only with remote LanceDB databases on LanceDB Cloud and Enterprise.
Local database support is a work in progress. Check [LANCEDB-2848](https://github.com/lancedb/lancedb/issues/2848) for the latest progress.
## Installation
Add the following dependency to your `pom.xml`:
```xml
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.23.0</version>
</dependency>
```
## Quick Start
### Connecting to LanceDB Cloud
```java
import com.lancedb.LanceDbNamespaceClientBuilder;
import org.lance.namespace.LanceNamespace;
// If your DB url is db://example-db, then your database here is example-db
LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
.apiKey("your_lancedb_cloud_api_key")
.database("your_database_name")
.build();
```
### Connecting to LanceDB Enterprise
For LanceDB Enterprise deployments with a custom endpoint:
```java
LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
.apiKey("your_lancedb_enterprise_api_key")
.database("your_database_name")
.endpoint("<your_enterprise_endpoint>")
.build();
```
### Configuration Options
| Method | Description | Required |
|--------|-------------|----------|
| `apiKey(String)` | LanceDB API key | Yes |
| `database(String)` | Database name | Yes |
| `endpoint(String)` | Custom endpoint URL for Enterprise deployments | No |
| `region(String)` | AWS region (default: "us-east-1") | No |
| `config(String, String)` | Additional configuration parameters | No |
## Metadata Operations
### Creating a Namespace
Namespaces organize tables hierarchically. Create a namespace before creating tables within it:
```java
import org.lance.namespace.model.CreateNamespaceRequest;
import org.lance.namespace.model.CreateNamespaceResponse;
// Create a child namespace
CreateNamespaceRequest request = new CreateNamespaceRequest();
request.setId(Arrays.asList("my_namespace"));
CreateNamespaceResponse response = namespaceClient.createNamespace(request);
```
You can also create nested namespaces:
```java
// Create a nested namespace: parent/child
CreateNamespaceRequest request = new CreateNamespaceRequest();
request.setId(Arrays.asList("parent_namespace", "child_namespace"));
CreateNamespaceResponse response = namespaceClient.createNamespace(request);
```
### Describing a Namespace
```java
import org.lance.namespace.model.DescribeNamespaceRequest;
import org.lance.namespace.model.DescribeNamespaceResponse;
DescribeNamespaceRequest request = new DescribeNamespaceRequest();
request.setId(Arrays.asList("my_namespace"));
DescribeNamespaceResponse response = namespaceClient.describeNamespace(request);
System.out.println("Namespace properties: " + response.getProperties());
```
### Listing Namespaces
```java
import org.lance.namespace.model.ListNamespacesRequest;
import org.lance.namespace.model.ListNamespacesResponse;
// List all namespaces at root level
ListNamespacesRequest request = new ListNamespacesRequest();
request.setId(Arrays.asList()); // Empty for root
ListNamespacesResponse response = namespaceClient.listNamespaces(request);
for (String ns : response.getNamespaces()) {
System.out.println("Namespace: " + ns);
}
// List child namespaces under a parent
ListNamespacesRequest childRequest = new ListNamespacesRequest();
childRequest.setId(Arrays.asList("parent_namespace"));
ListNamespacesResponse childResponse = namespaceClient.listNamespaces(childRequest);
```
### Listing Tables
```java
import org.lance.namespace.model.ListTablesRequest;
import org.lance.namespace.model.ListTablesResponse;
// List tables in a namespace
ListTablesRequest request = new ListTablesRequest();
request.setId(Arrays.asList("my_namespace"));
ListTablesResponse response = namespaceClient.listTables(request);
for (String table : response.getTables()) {
System.out.println("Table: " + table);
}
```
### Dropping a Namespace
```java
import org.lance.namespace.model.DropNamespaceRequest;
import org.lance.namespace.model.DropNamespaceResponse;
DropNamespaceRequest request = new DropNamespaceRequest();
request.setId(Arrays.asList("my_namespace"));
DropNamespaceResponse response = namespaceClient.dropNamespace(request);
```
### Describing a Table
```java
import org.lance.namespace.model.DescribeTableRequest;
import org.lance.namespace.model.DescribeTableResponse;
DescribeTableRequest request = new DescribeTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
DescribeTableResponse response = namespaceClient.describeTable(request);
System.out.println("Table version: " + response.getVersion());
System.out.println("Schema fields: " + response.getSchema().getFields());
```
### Dropping a Table
```java
import org.lance.namespace.model.DropTableRequest;
import org.lance.namespace.model.DropTableResponse;
DropTableRequest request = new DropTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
DropTableResponse response = namespaceClient.dropTable(request);
```
## Writing Data
### Creating a Table
Tables are created within a namespace by providing data in Apache Arrow IPC format:
```java
import org.lance.namespace.LanceNamespace;
import org.lance.namespace.model.CreateTableRequest;
import org.lance.namespace.model.CreateTableResponse;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.complex.FixedSizeListVector;
import org.apache.arrow.vector.Float4Vector;
import org.apache.arrow.vector.ipc.ArrowStreamWriter;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
// Create schema with id, name, and embedding fields
Schema schema = new Schema(Arrays.asList(
    new Field("id", FieldType.nullable(new ArrowType.Int(32, true)), null),
    new Field("name", FieldType.nullable(new ArrowType.Utf8()), null),
    new Field("embedding",
        FieldType.nullable(new ArrowType.FixedSizeList(128)),
        Arrays.asList(new Field("item",
            FieldType.nullable(new ArrowType.FloatingPoint(FloatingPointPrecision.SINGLE)),
            null)))
));
try (BufferAllocator allocator = new RootAllocator();
     VectorSchemaRoot root = VectorSchemaRoot.create(schema, allocator)) {
    // Populate data
    root.setRowCount(3);
    IntVector idVector = (IntVector) root.getVector("id");
    VarCharVector nameVector = (VarCharVector) root.getVector("name");
    FixedSizeListVector embeddingVector = (FixedSizeListVector) root.getVector("embedding");
    Float4Vector embeddingData = (Float4Vector) embeddingVector.getDataVector();
    for (int i = 0; i < 3; i++) {
        idVector.setSafe(i, i + 1);
        // Utf8 columns hold UTF-8 bytes, so encode explicitly
        nameVector.setSafe(i, ("item_" + i).getBytes(StandardCharsets.UTF_8));
        embeddingVector.setNotNull(i);
        // Row i's 128 floats occupy child offsets [i * 128, (i + 1) * 128)
        for (int j = 0; j < 128; j++) {
            embeddingData.setSafe(i * 128 + j, (float) i);
        }
    }
    idVector.setValueCount(3);
    nameVector.setValueCount(3);
    embeddingData.setValueCount(3 * 128);
    embeddingVector.setValueCount(3);
    // Serialize to Arrow IPC stream format
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (ArrowStreamWriter writer = new ArrowStreamWriter(root, null, Channels.newChannel(out))) {
        writer.start();
        writer.writeBatch();
        writer.end();
    }
    byte[] tableData = out.toByteArray();
    // Create table in a namespace
    CreateTableRequest request = new CreateTableRequest();
    request.setId(Arrays.asList("my_namespace", "my_table"));
    CreateTableResponse response = namespaceClient.createTable(request, tableData);
}
```
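The create-table example writes each row's 128 floats into the child `Float4Vector` starting at offset `i * 128`, because a FixedSizeList column stores every row's values back to back in one child vector. The plain-Java sketch below shows that index arithmetic without Arrow; `EmbeddingLayout`, `flatten` and `rowOf` are hypothetical names for illustration only, not part of the Arrow or Lance APIs.

```java
import java.util.Arrays;

// Illustrative only (hypothetical helper): mirrors how a FixedSizeList<float, dim>
// column lays out its values in one contiguous child vector, where row i
// occupies child offsets [i * dim, (i + 1) * dim).
final class EmbeddingLayout {
    private EmbeddingLayout() {}

    // Flatten per-row embeddings into the child-vector order used by FixedSizeList.
    static float[] flatten(float[][] rows, int dim) {
        float[] flat = new float[rows.length * dim];
        for (int i = 0; i < rows.length; i++) {
            if (rows[i].length != dim) {
                throw new IllegalArgumentException(
                    "row " + i + " has length " + rows[i].length + ", expected " + dim);
            }
            // Same offset the create-table example uses: i * dim + j
            System.arraycopy(rows[i], 0, flat, i * dim, dim);
        }
        return flat;
    }

    // Recover row i from the flattened child values.
    static float[] rowOf(float[] flat, int dim, int i) {
        return Arrays.copyOfRange(flat, i * dim, (i + 1) * dim);
    }
}
```

For three rows at `dim = 128` the child holds `3 * 128` values, which is why the example sets `embeddingData.setValueCount(3 * 128)` but `embeddingVector.setValueCount(3)`.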
### Insert
```java
import org.lance.namespace.model.InsertIntoTableRequest;
import org.lance.namespace.model.InsertIntoTableResponse;
// Prepare data in Arrow IPC stream format, as in the create table example.
// prepareArrowData() is a placeholder for your own serialization code.
byte[] insertData = prepareArrowData();
InsertIntoTableRequest request = new InsertIntoTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
request.setMode(InsertIntoTableRequest.ModeEnum.APPEND);
InsertIntoTableResponse response = namespaceClient.insertIntoTable(request, insertData);
System.out.println("New version: " + response.getVersion());
```
### Update
Update rows matching a predicate condition:
```java
import org.lance.namespace.model.UpdateTableRequest;
import org.lance.namespace.model.UpdateTableResponse;
UpdateTableRequest request = new UpdateTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
// Predicate to select rows to update
request.setPredicate("id = 1");
// Set new values using SQL expressions as [column_name, expression] pairs
request.setUpdates(Arrays.asList(
Arrays.asList("name", "'updated_name'")
));
UpdateTableResponse response = namespaceClient.updateTable(request);
System.out.println("Updated rows: " + response.getUpdatedRows());
```
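Each update expression is parsed as SQL, so a string value must be written as a quoted literal (`'updated_name'`); a bare word would be read as a column reference. A small helper such as the hypothetical `UpdatePairs` below can build the `[column_name, expression]` pairs from plain Java values. It assumes the server accepts standard SQL quoting, where an embedded single quote is escaped by doubling it; confirm this against your deployment before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the Lance namespace API: turns plain Java
// values into [column_name, SQL expression] pairs for UpdateTableRequest.setUpdates.
final class UpdatePairs {
    private UpdatePairs() {}

    // Render a Java value as a SQL literal (assumes standard SQL quoting rules).
    static String toSqlLiteral(Object value) {
        if (value == null) {
            return "NULL";
        }
        if (value instanceof String) {
            // Escape embedded single quotes by doubling them
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        if (value instanceof Integer || value instanceof Long || value instanceof Boolean) {
            return value.toString();
        }
        throw new IllegalArgumentException("Unsupported value type: " + value.getClass().getName());
    }

    // Build one [column, literal] pair per map entry, in the map's iteration order.
    static List<List<String>> fromValues(Map<String, Object> values) {
        List<List<String>> pairs = new ArrayList<>();
        for (Map.Entry<String, Object> e : values.entrySet()) {
            pairs.add(Arrays.asList(e.getKey(), toSqlLiteral(e.getValue())));
        }
        return pairs;
    }
}
```

With a `LinkedHashMap` mapping `"name"` to `"O'Brien"`, `request.setUpdates(UpdatePairs.fromValues(values))` would send the pair `["name", "'O''Brien'"]`.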
### Delete
Delete rows matching a predicate condition:
```java
import org.lance.namespace.model.DeleteFromTableRequest;
import org.lance.namespace.model.DeleteFromTableResponse;
DeleteFromTableRequest request = new DeleteFromTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
// Predicate to select rows to delete
request.setPredicate("id > 100");
DeleteFromTableResponse response = namespaceClient.deleteFromTable(request);
System.out.println("New version: " + response.getVersion());
```
### Merge Insert (Upsert)
Merge insert allows you to update existing rows and insert new rows in a single operation based on a key column:
```java
import org.lance.namespace.model.MergeInsertIntoTableRequest;
import org.lance.namespace.model.MergeInsertIntoTableResponse;
// Prepare data with rows to update (id=2,3) and a new row (id=4).
// prepareArrowData() is a placeholder for your own serialization code.
byte[] mergeData = prepareArrowData();
MergeInsertIntoTableRequest request = new MergeInsertIntoTableRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
// Match on the "id" column
request.setOn("id");
// Update all columns when a matching row is found
request.setWhenMatchedUpdateAll(true);
// Insert new rows when no match is found
request.setWhenNotMatchedInsertAll(true);
MergeInsertIntoTableResponse response = namespaceClient.mergeInsertIntoTable(request, mergeData);
System.out.println("Updated rows: " + response.getNumUpdatedRows());
System.out.println("Inserted rows: " + response.getNumInsertedRows());
```
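Conceptually, `setWhenMatchedUpdateAll(true)` plus `setWhenNotMatchedInsertAll(true)` on key `id` behaves like the in-memory model below: incoming rows whose key already exists replace the stored row, and the rest are appended. This is a plain-Java illustration of the semantics only, not how Lance executes the operation; `UpsertModel` and its counters are hypothetical, and the sketch assumes keys are unique in both the table and the incoming data.

```java
import java.util.List;
import java.util.Map;

// Illustrative in-memory model of merge-insert (upsert) keyed on an integer id.
// Not Lance's implementation; assumes ids are unique in both target and source.
final class UpsertModel {
    int updated;
    int inserted;

    // Applies each source row to the target map and counts updates vs. inserts.
    void mergeInsert(Map<Integer, String> target, List<Map.Entry<Integer, String>> source) {
        for (Map.Entry<Integer, String> row : source) {
            boolean matched = target.containsKey(row.getKey());
            // "when matched, update all" replaces the existing row;
            // "when not matched, insert all" adds a new one. Both are a put here.
            target.put(row.getKey(), row.getValue());
            if (matched) {
                updated++;
            } else {
                inserted++;
            }
        }
    }
}
```

Starting from ids 1, 2 and 3 and merging ids 2, 3 and 4 leaves four rows, with two counted as updated and one as inserted, which is what `getNumUpdatedRows()` and `getNumInsertedRows()` would be expected to report for the example above.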
## Querying Data
### Counting Rows
```java
import org.lance.namespace.model.CountTableRowsRequest;
CountTableRowsRequest request = new CountTableRowsRequest();
request.setId(Arrays.asList("my_namespace", "my_table"));
Long rowCount = namespaceClient.countTableRows(request);
System.out.println("Row count: " + rowCount);
```
### Vector Search
```java
import org.lance.namespace.model.QueryTableRequest;
import org.lance.namespace.model.QueryTableRequestVector;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
QueryTableRequest query = new QueryTableRequest();
query.setId(Arrays.asList("my_namespace", "my_table"));
query.setK(10); // Return top 10 results
// Set the query vector (128 dimensions, matching the embedding column)
List<Float> queryVector = new ArrayList<>();
for (int i = 0; i < 128; i++) {
    queryVector.add(1.0f);
}
QueryTableRequestVector vector = new QueryTableRequestVector();
vector.setSingleVector(queryVector);
query.setVector(vector);
// Specify columns to return
query.setColumns(Arrays.asList("id", "name", "embedding"));
// Execute query - returns Arrow IPC format
byte[] result = namespaceClient.queryTable(query);
```
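`setK(10)` asks for the 10 rows whose embeddings are nearest to the query vector. The exhaustive search below illustrates what "nearest" means, assuming squared Euclidean (L2) distance as the metric; the metric actually used depends on how the table or index is configured, and an approximate index may return approximate rather than exact neighbors. `TopK` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Exact k-nearest-neighbour search over in-memory vectors, ranked by
// squared L2 distance (an assumed metric). Returns row indices, closest first.
final class TopK {
    private TopK() {}

    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    static List<Integer> nearest(float[][] rows, float[] query, int k) {
        // Max-heap on distance keeps the k best candidates seen so far.
        PriorityQueue<double[]> heap =
            new PriorityQueue<>(Comparator.comparingDouble((double[] e) -> e[0]).reversed());
        for (int i = 0; i < rows.length; i++) {
            heap.add(new double[] {squaredL2(rows[i], query), i});
            if (heap.size() > k) {
                heap.poll(); // drop the current farthest candidate
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(0, (int) heap.poll()[1]); // polled farthest first, so prepend
        }
        return result;
    }
}
```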
### Full Text Search
```java
import org.lance.namespace.model.QueryTableRequest;
import org.lance.namespace.model.QueryTableRequestFullTextQuery;
import org.lance.namespace.model.StringFtsQuery;
QueryTableRequest query = new QueryTableRequest();
query.setId(Arrays.asList("my_namespace", "my_table"));
query.setK(10);
// Set full text search query
StringFtsQuery stringQuery = new StringFtsQuery();
stringQuery.setQuery("search terms");
stringQuery.setColumns(Arrays.asList("text_column"));
QueryTableRequestFullTextQuery fts = new QueryTableRequestFullTextQuery();
fts.setStringQuery(stringQuery);
query.setFullTextQuery(fts);
// Specify columns to return
query.setColumns(Arrays.asList("id", "text_column"));
byte[] result = namespaceClient.queryTable(query);
```
### Query with Filter
```java
QueryTableRequest query = new QueryTableRequest();
query.setId(Arrays.asList("my_namespace", "my_table"));
query.setK(10);
query.setFilter("id > 50");
query.setColumns(Arrays.asList("id", "name"));
byte[] result = namespaceClient.queryTable(query);
```
### Query with Prefilter
```java
QueryTableRequest query = new QueryTableRequest();
query.setId(Arrays.asList("my_namespace", "my_table"));
query.setK(5);
query.setPrefilter(true); // Apply filter before vector search
query.setFilter("category = 'electronics'");
// Reuse queryVector from the Vector Search example
QueryTableRequestVector vector = new QueryTableRequestVector();
vector.setSingleVector(queryVector);
query.setVector(vector);
byte[] result = namespaceClient.queryTable(query);
```
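Whether the filter runs before or after the vector search matters when few rows match: filtering after the top-k search can leave fewer than `k` results, while prefiltering returns up to `k` matching rows. The exact-search model below illustrates this with one-dimensional "embeddings"; it is a simplification (a real search may use an approximate index), and `FilterOrder` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Simplified model of filter ordering in an exact search over 1-D values.
final class FilterOrder {
    private FilterOrder() {}

    // Indices of rows passing `keep`, sorted by distance to the query, closest first.
    private static List<Integer> byDistance(float[] values, float query, IntPredicate keep) {
        return IntStream.range(0, values.length)
            .filter(keep)
            .boxed()
            .sorted(Comparator.comparingDouble(i -> Math.abs(values[i] - query)))
            .collect(Collectors.toList());
    }

    // Prefilter: restrict to matching rows first, then take the k nearest.
    static List<Integer> prefilter(float[] values, float query, int k, IntPredicate match) {
        List<Integer> ranked = byDistance(values, query, match);
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    // Post-filter: take the k nearest overall, then drop non-matching rows.
    static List<Integer> postfilter(float[] values, float query, int k, IntPredicate match) {
        List<Integer> ranked = byDistance(values, query, i -> true);
        return ranked.subList(0, Math.min(k, ranked.size())).stream()
            .filter(match::test)
            .collect(Collectors.toList());
    }
}
```

With `k = 2` and a filter that matches only the two farthest rows, prefiltering returns both of them, while post-filtering returns nothing because neither row made the unfiltered top 2.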
### Reading Query Results
Query results are returned in Apache Arrow IPC file format. Here's how to read them:
```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowFileReader;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
// Helper class to read Arrow data from a byte array
class ByteArraySeekableByteChannel implements SeekableByteChannel {
    private final byte[] data;
    private long position = 0;
    private boolean isOpen = true;
    public ByteArraySeekableByteChannel(byte[] data) {
        this.data = data;
    }
    @Override
    public int read(ByteBuffer dst) {
        int remaining = dst.remaining();
        int available = (int) (data.length - position);
        if (available <= 0) return -1;
        int toRead = Math.min(remaining, available);
        dst.put(data, (int) position, toRead);
        position += toRead;
        return toRead;
    }
    @Override public long position() { return position; }
    @Override public SeekableByteChannel position(long newPosition) {
        // SeekableByteChannel requires rejecting negative positions
        if (newPosition < 0) throw new IllegalArgumentException("negative position");
        position = newPosition;
        return this;
    }
    @Override public long size() { return data.length; }
    @Override public boolean isOpen() { return isOpen; }
    @Override public void close() { isOpen = false; }
    @Override public int write(ByteBuffer src) { throw new UnsupportedOperationException(); }
    @Override public SeekableByteChannel truncate(long size) { throw new UnsupportedOperationException(); }
}
// Read query results
byte[] queryResult = namespaceClient.queryTable(query);
try (BufferAllocator allocator = new RootAllocator();
     ArrowFileReader reader = new ArrowFileReader(
         new ByteArraySeekableByteChannel(queryResult), allocator)) {
    for (int i = 0; i < reader.getRecordBlocks().size(); i++) {
        reader.loadRecordBatch(reader.getRecordBlocks().get(i));
        VectorSchemaRoot root = reader.getVectorSchemaRoot();
        // Access data
        IntVector idVector = (IntVector) root.getVector("id");
        VarCharVector nameVector = (VarCharVector) root.getVector("name");
        for (int row = 0; row < root.getRowCount(); row++) {
            int id = idVector.get(row);
            // VarCharVector.get returns the raw UTF-8 bytes of the value
            String name = new String(nameVector.get(row), StandardCharsets.UTF_8);
            System.out.println("Row " + row + ": id=" + id + ", name=" + name);
        }
    }
}
```
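The create-table example writes the Arrow IPC stream format, while query results come back in the IPC file format, so it can help to check which one a byte array holds before choosing `ArrowStreamReader` or `ArrowFileReader`. The Arrow columnar format specification says a file starts with the 6-byte magic `ARROW1` (padded to 8 bytes) and also ends with `ARROW1`. The hypothetical `ArrowIpcFormat` helper below checks for both markers only; it does not validate anything else in the payload.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: tells the Arrow IPC file format apart from the stream
// format by its leading and trailing "ARROW1" magic. Does not validate the payload.
final class ArrowIpcFormat {
    private static final byte[] MAGIC = "ARROW1".getBytes(StandardCharsets.US_ASCII);

    private ArrowIpcFormat() {}

    static boolean isFileFormat(byte[] data) {
        // Leading magic is padded to 8 bytes; the trailing magic is 6 bytes.
        if (data == null || data.length < 8 + MAGIC.length) {
            return false;
        }
        boolean leading = Arrays.equals(
            Arrays.copyOfRange(data, 0, MAGIC.length), MAGIC);
        boolean trailing = Arrays.equals(
            Arrays.copyOfRange(data, data.length - MAGIC.length, data.length), MAGIC);
        return leading && trailing;
    }
}
```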

@@ -34,7 +34,7 @@ const results = await table.vectorSearch([0.1, 0.3]).limit(20).toArray();
console.log(results);
```
The [quickstart](https://lancedb.github.io/lancedb/basic/) contains a more complete example.
The [quickstart](https://lancedb.com/docs/quickstart/basic-usage/) contains more complete examples.
## Development

@@ -1,7 +1,7 @@
# Contributing to LanceDB Typescript
This document outlines the process for contributing to LanceDB Typescript.
For general contribution guidelines, see [CONTRIBUTING.md](../../../../CONTRIBUTING.md).
For general contribution guidelines, see [CONTRIBUTING.md](../CONTRIBUTING.md).
## Project layout

@@ -147,7 +147,7 @@ A new PermutationBuilder instance
#### Example
```ts
builder.splitCalculated("user_id % 3");
builder.splitCalculated({ calculation: "user_id % 3" });
```
***

@@ -89,4 +89,4 @@ optional storageOptions: Record<string, string>;
(For LanceDB OSS only): configuration for object storage.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/
The available options are described at https://lancedb.com/docs/storage/

@@ -97,4 +97,4 @@ Configuration for object storage.
Options already set on the connection will be inherited by the table,
but can be overridden here.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/
The available options are described at https://lancedb.com/docs/storage/

@@ -8,6 +8,14 @@
## Properties
### numAttempts
```ts
numAttempts: number;
```
***
### numDeletedRows
```ts

@@ -42,4 +42,4 @@ Configuration for object storage.
Options already set on the connection will be inherited by the table,
but can be overridden here.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/
The available options are described at https://lancedb.com/docs/storage/

@@ -30,6 +30,12 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.table.Table
::: lancedb.table.FragmentStatistics
::: lancedb.table.FragmentSummaryStats
::: lancedb.table.Tags
## Querying (Synchronous)
::: lancedb.query.Query
@@ -58,6 +64,14 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.embeddings.open_clip.OpenClipEmbeddings
## Remote configuration
::: lancedb.remote.ClientConfig
::: lancedb.remote.TimeoutConfig
::: lancedb.remote.RetryConfig
## Context
::: lancedb.context.contextualize
@@ -115,6 +129,8 @@ Table hold your actual data as a collection of records / rows.
::: lancedb.table.AsyncTable
::: lancedb.table.AsyncTags
## Indices (Asynchronous)
Indices can be created on a table to speed up queries. This section
@@ -136,6 +152,8 @@ lists the indices that LanceDb supports.
::: lancedb.index.IvfFlat
::: lancedb.table.IndexStatistics
## Querying (Asynchronous)
Queries allow you to return data from your database. Basic queries can be

java/Makefile
@@ -0,0 +1,28 @@
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
.PHONY: build-lancedb
build-lancedb:
./mvnw spotless:apply -pl lancedb-core -am
./mvnw install -pl lancedb-core -am
.PHONY: test-lancedb
test-lancedb:
# Requires LANCEDB_DB and LANCEDB_API_KEY environment variables
./mvnw test -pl lancedb-core -P integration-tests
.PHONY: clean
clean:
./mvnw clean
.PHONY: build
build: build-lancedb

@@ -7,10 +7,11 @@
For LanceDB Cloud, use the simplified builder API:
```java
import com.lancedb.lance.namespace.LanceRestNamespace;
import com.lancedb.LanceDbNamespaceClientBuilder;
import org.lance.namespace.LanceNamespace;
// If your DB url is db://example-db, then your database here is example-db
LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
.apiKey("your_lancedb_cloud_api_key")
.database("your_database_name")
.build();
@@ -18,13 +19,13 @@ LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
### LanceDB Enterprise
For Enterprise deployments, use your VPC endpoint:
For Enterprise deployments, use your custom endpoint:
```java
LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
.apiKey("your_lancedb_enterprise_api_key")
.database("your-top-dir") // Your top level folder under your cloud bucket, e.g. s3://your-bucket/your-top-dir/
.hostOverride("http://<vpc_endpoint_dns_name>:80")
.database("your_database_name")
.endpoint("<your_enterprise_endpoint>")
.build();
```
@@ -33,5 +34,11 @@ LanceRestNamespace namespace = LanceDBRestNamespaces.builder()
Build:
```shell
./mvnw install
./mvnw install -pl lancedb-core -am
```
Run tests:
```shell
./mvnw test -pl lancedb-core
```

@@ -1,30 +0,0 @@
[package]
name = "lancedb-jni"
description = "JNI bindings for LanceDB"
# TODO modify lancedb/Cargo.toml for version and dependencies
version = "0.10.0"
edition.workspace = true
repository.workspace = true
readme.workspace = true
license.workspace = true
keywords.workspace = true
categories.workspace = true
publish = false
[lib]
crate-type = ["cdylib"]
[dependencies]
lancedb = { path = "../../../rust/lancedb", default-features = false }
lance = { workspace = true }
arrow = { workspace = true, features = ["ffi"] }
arrow-schema.workspace = true
tokio = "1.46"
jni = "0.21.1"
snafu.workspace = true
lazy_static.workspace = true
serde = { version = "^1" }
serde_json = { version = "1" }
[features]
default = ["lancedb/default"]

@@ -1,133 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use crate::ffi::JNIEnvExt;
use crate::traits::IntoJava;
use crate::{Error, RT};
use jni::objects::{JObject, JString, JValue};
use jni::JNIEnv;
pub const NATIVE_CONNECTION: &str = "nativeConnectionHandle";
use crate::Result;
use lancedb::connection::{connect, Connection};
#[derive(Clone)]
pub struct BlockingConnection {
pub(crate) inner: Connection,
}
impl BlockingConnection {
pub fn create(dataset_uri: &str) -> Result<Self> {
let inner = RT.block_on(connect(dataset_uri).execute())?;
Ok(Self { inner })
}
pub fn table_names(
&self,
start_after: Option<String>,
limit: Option<i32>,
) -> Result<Vec<String>> {
let mut op = self.inner.table_names();
if let Some(start_after) = start_after {
op = op.start_after(start_after);
}
if let Some(limit) = limit {
op = op.limit(limit as u32);
}
Ok(RT.block_on(op.execute())?)
}
}
impl IntoJava for BlockingConnection {
fn into_java<'a>(self, env: &mut JNIEnv<'a>) -> JObject<'a> {
attach_native_connection(env, self)
}
}
fn attach_native_connection<'local>(
env: &mut JNIEnv<'local>,
connection: BlockingConnection,
) -> JObject<'local> {
let j_connection = create_java_connection_object(env);
// This block sets a native Rust object (Connection) as a field in the Java object (j_Connection).
// Caution: This creates a potential for memory leaks. The Rust object (Connection) is not
// automatically garbage-collected by Java, and its memory will not be freed unless
// explicitly handled.
//
// To prevent memory leaks, ensure the following:
// 1. The Java object (`j_Connection`) should implement the `java.io.Closeable` interface.
// 2. Users of this Java object should be instructed to always use it within a try-with-resources
// statement (or manually call the `close()` method) to ensure that `self.close()` is invoked.
match unsafe { env.set_rust_field(&j_connection, NATIVE_CONNECTION, connection) } {
Ok(_) => j_connection,
Err(err) => {
env.throw_new(
"java/lang/RuntimeException",
format!("Failed to set native handle for Connection: {}", err),
)
.expect("Error throwing exception");
JObject::null()
}
}
}
fn create_java_connection_object<'a>(env: &mut JNIEnv<'a>) -> JObject<'a> {
env.new_object("com/lancedb/lancedb/Connection", "()V", &[])
.expect("Failed to create Java Lance Connection instance")
}
#[no_mangle]
pub extern "system" fn Java_com_lancedb_lancedb_Connection_releaseNativeConnection(
mut env: JNIEnv,
j_connection: JObject,
) {
let _: BlockingConnection = unsafe {
env.take_rust_field(j_connection, NATIVE_CONNECTION)
.expect("Failed to take native Connection handle")
};
}
#[no_mangle]
pub extern "system" fn Java_com_lancedb_lancedb_Connection_connect<'local>(
mut env: JNIEnv<'local>,
_obj: JObject,
dataset_uri_object: JString,
) -> JObject<'local> {
let dataset_uri: String = ok_or_throw!(env, env.get_string(&dataset_uri_object)).into();
let blocking_connection = ok_or_throw!(env, BlockingConnection::create(&dataset_uri));
blocking_connection.into_java(&mut env)
}
#[no_mangle]
pub extern "system" fn Java_com_lancedb_lancedb_Connection_tableNames<'local>(
mut env: JNIEnv<'local>,
j_connection: JObject,
start_after_obj: JObject, // Optional<String>
limit_obj: JObject, // Optional<Integer>
) -> JObject<'local> {
ok_or_throw!(
env,
inner_table_names(&mut env, j_connection, start_after_obj, limit_obj)
)
}
fn inner_table_names<'local>(
env: &mut JNIEnv<'local>,
j_connection: JObject,
start_after_obj: JObject, // Optional<String>
limit_obj: JObject, // Optional<Integer>
) -> Result<JObject<'local>> {
let start_after = env.get_string_opt(&start_after_obj)?;
let limit = env.get_int_opt(&limit_obj)?;
let conn =
unsafe { env.get_rust_field::<_, _, BlockingConnection>(j_connection, NATIVE_CONNECTION) }?;
let table_names = conn.table_names(start_after, limit)?;
drop(conn);
let j_names = env.new_object("java/util/ArrayList", "()V", &[])?;
for item in table_names {
let jstr_item = env.new_string(item)?;
let item_jobj = JObject::from(jstr_item);
let item_gen = JValue::Object(&item_jobj);
env.call_method(&j_names, "add", "(Ljava/lang/Object;)Z", &[item_gen])?;
}
Ok(j_names)
}

@@ -1,217 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::str::Utf8Error;
use arrow_schema::ArrowError;
use jni::errors::Error as JniError;
use serde_json::Error as JsonError;
use snafu::{Location, Snafu};
type BoxedError = Box<dyn std::error::Error + Send + Sync + 'static>;
/// Java Exception types
pub enum JavaException {
IllegalArgumentException,
IOException,
RuntimeException,
}
impl JavaException {
pub fn as_str(&self) -> &str {
match self {
Self::IllegalArgumentException => "java/lang/IllegalArgumentException",
Self::IOException => "java/io/IOException",
Self::RuntimeException => "java/lang/RuntimeException",
}
}
}
/// TODO(lu) change to lancedb-jni
#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum Error {
#[snafu(display("JNI error: {message}, {location}"))]
Jni { message: String, location: Location },
#[snafu(display("Invalid argument: {message}, {location}"))]
InvalidArgument { message: String, location: Location },
#[snafu(display("IO error: {source}, {location}"))]
IO {
source: BoxedError,
location: Location,
},
#[snafu(display("Arrow error: {message}, {location}"))]
Arrow { message: String, location: Location },
#[snafu(display("Index error: {message}, {location}"))]
Index { message: String, location: Location },
#[snafu(display("JSON error: {message}, {location}"))]
JSON { message: String, location: Location },
#[snafu(display("Dataset at path {path} was not found, {location}"))]
DatasetNotFound { path: String, location: Location },
#[snafu(display("Dataset already exists: {uri}, {location}"))]
DatasetAlreadyExists { uri: String, location: Location },
#[snafu(display("Table '{name}' already exists"))]
TableAlreadyExists { name: String },
#[snafu(display("Table '{name}' was not found: {source}"))]
TableNotFound {
name: String,
source: Box<dyn std::error::Error + Send + Sync>,
},
#[snafu(display("Invalid table name '{name}': {reason}"))]
InvalidTableName { name: String, reason: String },
#[snafu(display("Embedding function '{name}' was not found: {reason}, {location}"))]
EmbeddingFunctionNotFound {
name: String,
reason: String,
location: Location,
},
#[snafu(display("Other Lance error: {message}, {location}"))]
OtherLance { message: String, location: Location },
#[snafu(display("Other LanceDB error: {message}, {location}"))]
OtherLanceDB { message: String, location: Location },
}
impl Error {
/// Throw as Java Exception
pub fn throw(&self, env: &mut jni::JNIEnv) {
match self {
Self::InvalidArgument { .. }
| Self::DatasetNotFound { .. }
| Self::DatasetAlreadyExists { .. }
| Self::TableAlreadyExists { .. }
| Self::TableNotFound { .. }
| Self::InvalidTableName { .. }
| Self::EmbeddingFunctionNotFound { .. } => {
self.throw_as(env, JavaException::IllegalArgumentException)
}
Self::IO { .. } | Self::Index { .. } => self.throw_as(env, JavaException::IOException),
Self::Arrow { .. }
| Self::JSON { .. }
| Self::OtherLance { .. }
| Self::OtherLanceDB { .. }
| Self::Jni { .. } => self.throw_as(env, JavaException::RuntimeException),
}
}
/// Throw as a concrete Java Exception
pub fn throw_as(&self, env: &mut jni::JNIEnv, exception: JavaException) {
let message = &format!(
"Error when throwing Java exception: {}:{}",
exception.as_str(),
self
);
env.throw_new(exception.as_str(), self.to_string())
.expect(message);
}
}
pub type Result<T> = std::result::Result<T, Error>;
trait ToSnafuLocation {
fn to_snafu_location(&'static self) -> snafu::Location;
}
impl ToSnafuLocation for std::panic::Location<'static> {
fn to_snafu_location(&'static self) -> snafu::Location {
snafu::Location::new(self.file(), self.line(), self.column())
}
}
impl From<JniError> for Error {
#[track_caller]
fn from(source: JniError) -> Self {
Self::Jni {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
}
}
}
impl From<Utf8Error> for Error {
#[track_caller]
fn from(source: Utf8Error) -> Self {
Self::InvalidArgument {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
}
}
}
impl From<ArrowError> for Error {
#[track_caller]
fn from(source: ArrowError) -> Self {
Self::Arrow {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
}
}
}
impl From<JsonError> for Error {
#[track_caller]
fn from(source: JsonError) -> Self {
Self::JSON {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
}
}
}
impl From<lance::Error> for Error {
#[track_caller]
fn from(source: lance::Error) -> Self {
match source {
lance::Error::DatasetNotFound {
path,
source: _,
location,
} => Self::DatasetNotFound { path, location },
lance::Error::DatasetAlreadyExists { uri, location } => {
Self::DatasetAlreadyExists { uri, location }
}
lance::Error::IO { source, location } => Self::IO { source, location },
lance::Error::Arrow { message, location } => Self::Arrow { message, location },
lance::Error::Index { message, location } => Self::Index { message, location },
lance::Error::InvalidInput { source, location } => Self::InvalidArgument {
message: source.to_string(),
location,
},
_ => Self::OtherLance {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
},
}
}
}
impl From<lancedb::Error> for Error {
#[track_caller]
fn from(source: lancedb::Error) -> Self {
match source {
lancedb::Error::InvalidTableName { name, reason } => {
Self::InvalidTableName { name, reason }
}
lancedb::Error::InvalidInput { message } => Self::InvalidArgument {
message,
location: std::panic::Location::caller().to_snafu_location(),
},
lancedb::Error::TableNotFound { name, source } => Self::TableNotFound { name, source },
lancedb::Error::TableAlreadyExists { name } => Self::TableAlreadyExists { name },
lancedb::Error::EmbeddingFunctionNotFound { name, reason } => {
Self::EmbeddingFunctionNotFound {
name,
reason,
location: std::panic::Location::caller().to_snafu_location(),
}
}
lancedb::Error::Arrow { source } => Self::Arrow {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
},
lancedb::Error::Lance { source } => Self::from(source),
_ => Self::OtherLanceDB {
message: source.to_string(),
location: std::panic::Location::caller().to_snafu_location(),
},
}
}
}

@@ -1,194 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use core::slice;
use jni::objects::{JByteBuffer, JObjectArray, JString};
use jni::sys::jobjectArray;
use jni::{objects::JObject, JNIEnv};
use crate::error::{Error, Result};
/// TODO(lu) import from lance-jni without duplicate
/// Extend JNIEnv with helper functions.
pub trait JNIEnvExt {
/// Get integers from Java List<Integer> object.
fn get_integers(&mut self, obj: &JObject) -> Result<Vec<i32>>;
/// Get strings from Java List<String> object.
#[allow(dead_code)]
fn get_strings(&mut self, obj: &JObject) -> Result<Vec<String>>;
/// Get strings from Java String[] object.
/// Note that get Option<Vec<String>> from Java Optional<String[]> just doesn't work.
#[allow(unused)]
fn get_strings_array(&mut self, obj: jobjectArray) -> Result<Vec<String>>;
/// Get Option<String> from Java Optional<String>.
fn get_string_opt(&mut self, obj: &JObject) -> Result<Option<String>>;
/// Get Option<Vec<String>> from Java Optional<List<String>>.
#[allow(unused)]
fn get_strings_opt(&mut self, obj: &JObject) -> Result<Option<Vec<String>>>;
/// Get Option<i32> from Java Optional<Integer>.
fn get_int_opt(&mut self, obj: &JObject) -> Result<Option<i32>>;
/// Get Option<Vec<i32>> from Java Optional<List<Integer>>.
fn get_ints_opt(&mut self, obj: &JObject) -> Result<Option<Vec<i32>>>;
/// Get Option<i64> from Java Optional<Long>.
#[allow(unused)]
fn get_long_opt(&mut self, obj: &JObject) -> Result<Option<i64>>;
/// Get Option<u64> from Java Optional<Long>.
#[allow(unused)]
fn get_u64_opt(&mut self, obj: &JObject) -> Result<Option<u64>>;
/// Get Option<&[u8]> from Java Optional<ByteBuffer>.
#[allow(unused)]
fn get_bytes_opt(&mut self, obj: &JObject) -> Result<Option<&[u8]>>;
fn get_optional<T, F>(&mut self, obj: &JObject, f: F) -> Result<Option<T>>
where
F: FnOnce(&mut JNIEnv, &JObject) -> Result<T>;
}
impl JNIEnvExt for JNIEnv<'_> {
fn get_integers(&mut self, obj: &JObject) -> Result<Vec<i32>> {
let list = self.get_list(obj)?;
let mut iter = list.iter(self)?;
let mut results = Vec::with_capacity(list.size(self)? as usize);
while let Some(elem) = iter.next(self)? {
let int_obj = self.call_method(elem, "intValue", "()I", &[])?;
let int_value = int_obj.i()?;
results.push(int_value);
}
Ok(results)
}
fn get_strings(&mut self, obj: &JObject) -> Result<Vec<String>> {
let list = self.get_list(obj)?;
let mut iter = list.iter(self)?;
let mut results = Vec::with_capacity(list.size(self)? as usize);
while let Some(elem) = iter.next(self)? {
let jstr = JString::from(elem);
let val = self.get_string(&jstr)?;
results.push(val.to_str()?.to_string())
}
Ok(results)
}
fn get_strings_array(&mut self, obj: jobjectArray) -> Result<Vec<String>> {
let jobject_array = unsafe { JObjectArray::from_raw(obj) };
let array_len = self.get_array_length(&jobject_array)?;
let mut res: Vec<String> = Vec::new();
for i in 0..array_len {
let item: JString = self.get_object_array_element(&jobject_array, i)?.into();
res.push(self.get_string(&item)?.into());
}
Ok(res)
}
fn get_string_opt(&mut self, obj: &JObject) -> Result<Option<String>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_string_obj = java_obj_gen.l()?;
let jstr = JString::from(java_string_obj);
let val = env.get_string(&jstr)?;
Ok(val.to_str()?.to_string())
})
}
fn get_strings_opt(&mut self, obj: &JObject) -> Result<Option<Vec<String>>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_list_obj = java_obj_gen.l()?;
env.get_strings(&java_list_obj)
})
}
fn get_int_opt(&mut self, obj: &JObject) -> Result<Option<i32>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_int_obj = java_obj_gen.l()?;
let int_obj = env.call_method(java_int_obj, "intValue", "()I", &[])?;
let int_value = int_obj.i()?;
Ok(int_value)
})
}
fn get_ints_opt(&mut self, obj: &JObject) -> Result<Option<Vec<i32>>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_list_obj = java_obj_gen.l()?;
env.get_integers(&java_list_obj)
})
}
fn get_long_opt(&mut self, obj: &JObject) -> Result<Option<i64>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_long_obj = java_obj_gen.l()?;
let long_obj = env.call_method(java_long_obj, "longValue", "()J", &[])?;
let long_value = long_obj.j()?;
Ok(long_value)
})
}
fn get_u64_opt(&mut self, obj: &JObject) -> Result<Option<u64>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_long_obj = java_obj_gen.l()?;
let long_obj = env.call_method(java_long_obj, "longValue", "()J", &[])?;
let long_value = long_obj.j()?;
Ok(long_value as u64)
})
}
fn get_bytes_opt(&mut self, obj: &JObject) -> Result<Option<&[u8]>> {
self.get_optional(obj, |env, inner_obj| {
let java_obj_gen = env.call_method(inner_obj, "get", "()Ljava/lang/Object;", &[])?;
let java_byte_buffer_obj = java_obj_gen.l()?;
let j_byte_buffer = JByteBuffer::from(java_byte_buffer_obj);
let raw_data = env.get_direct_buffer_address(&j_byte_buffer)?;
let capacity = env.get_direct_buffer_capacity(&j_byte_buffer)?;
let data = unsafe { slice::from_raw_parts(raw_data, capacity) };
Ok(data)
})
}
fn get_optional<T, F>(&mut self, obj: &JObject, f: F) -> Result<Option<T>>
where
F: FnOnce(&mut JNIEnv, &JObject) -> Result<T>,
{
if obj.is_null() {
return Ok(None);
}
let is_present = self.call_method(obj, "isPresent", "()Z", &[])?;
if !is_present.z()? {
// TODO(lu): put get java object into here cuz can only get java Object
Ok(None)
} else {
f(self, obj).map(Some)
}
}
}
#[no_mangle]
pub extern "system" fn Java_com_lancedb_lance_test_JniTestHelper_parseInts(
mut env: JNIEnv,
_obj: JObject,
list_obj: JObject, // List<Integer>
) {
ok_or_throw_without_return!(env, env.get_integers(&list_obj));
}
#[no_mangle]
pub extern "system" fn Java_com_lancedb_lance_test_JniTestHelper_parseIntsOpt(
mut env: JNIEnv,
_obj: JObject,
list_obj: JObject, // Optional<List<Integer>>
) {
ok_or_throw_without_return!(env, env.get_ints_opt(&list_obj));
}

@@ -1,57 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use lazy_static::lazy_static;
// TODO import from lance-jni without duplicate
#[macro_export]
macro_rules! ok_or_throw {
($env:expr, $result:expr) => {
match $result {
Ok(value) => value,
Err(err) => {
Error::from(err).throw(&mut $env);
return JObject::null();
}
}
};
}
macro_rules! ok_or_throw_without_return {
($env:expr, $result:expr) => {
match $result {
Ok(value) => value,
Err(err) => {
Error::from(err).throw(&mut $env);
return;
}
}
};
}
#[macro_export]
macro_rules! ok_or_throw_with_return {
($env:expr, $result:expr, $ret:expr) => {
match $result {
Ok(value) => value,
Err(err) => {
Error::from(err).throw(&mut $env);
return $ret;
}
}
};
}
mod connection;
pub mod error;
mod ffi;
mod traits;
pub use error::{Error, Result};
lazy_static! {
static ref RT: tokio::runtime::Runtime = tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.expect("Failed to create tokio runtime");
}

@@ -1,114 +0,0 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use jni::objects::{JMap, JObject, JString, JValue};
use jni::JNIEnv;
use crate::Result;
#[allow(dead_code)]
pub trait FromJObject<T> {
fn extract(&self) -> Result<T>;
}
/// Convert a Rust type into a Java Object.
pub trait IntoJava {
fn into_java<'a>(self, env: &mut JNIEnv<'a>) -> JObject<'a>;
}
impl FromJObject<i32> for JObject<'_> {
fn extract(&self) -> Result<i32> {
Ok(JValue::from(self).i()?)
}
}
impl FromJObject<i64> for JObject<'_> {
fn extract(&self) -> Result<i64> {
Ok(JValue::from(self).j()?)
}
}
impl FromJObject<f32> for JObject<'_> {
fn extract(&self) -> Result<f32> {
Ok(JValue::from(self).f()?)
}
}
impl FromJObject<f64> for JObject<'_> {
fn extract(&self) -> Result<f64> {
Ok(JValue::from(self).d()?)
}
}
#[allow(dead_code)]
pub trait FromJString {
fn extract(&self, env: &mut JNIEnv) -> Result<String>;
}
impl FromJString for JString<'_> {
fn extract(&self, env: &mut JNIEnv) -> Result<String> {
Ok(env.get_string(self)?.into())
}
}
pub trait JMapExt {
#[allow(dead_code)]
fn get_string(&self, env: &mut JNIEnv, key: &str) -> Result<Option<String>>;
#[allow(dead_code)]
fn get_i32(&self, env: &mut JNIEnv, key: &str) -> Result<Option<i32>>;
#[allow(dead_code)]
fn get_i64(&self, env: &mut JNIEnv, key: &str) -> Result<Option<i64>>;
#[allow(dead_code)]
fn get_f32(&self, env: &mut JNIEnv, key: &str) -> Result<Option<f32>>;
#[allow(dead_code)]
fn get_f64(&self, env: &mut JNIEnv, key: &str) -> Result<Option<f64>>;
}
#[allow(dead_code)]
fn get_map_value<T>(env: &mut JNIEnv, map: &JMap, key: &str) -> Result<Option<T>>
where
for<'a> JObject<'a>: FromJObject<T>,
{
let key_obj: JObject = env.new_string(key)?.into();
if let Some(value) = map.get(env, &key_obj)? {
if value.is_null() {
Ok(None)
} else {
Ok(Some(value.extract()?))
}
} else {
Ok(None)
}
}
impl JMapExt for JMap<'_, '_, '_> {
fn get_string(&self, env: &mut JNIEnv, key: &str) -> Result<Option<String>> {
let key_obj: JObject = env.new_string(key)?.into();
if let Some(value) = self.get(env, &key_obj)? {
let value_str: JString = value.into();
Ok(Some(value_str.extract(env)?))
} else {
Ok(None)
}
}
fn get_i32(&self, env: &mut JNIEnv, key: &str) -> Result<Option<i32>> {
get_map_value(env, self, key)
}
fn get_i64(&self, env: &mut JNIEnv, key: &str) -> Result<Option<i64>> {
get_map_value(env, self, key)
}
fn get_f32(&self, env: &mut JNIEnv, key: &str) -> Result<Option<f32>> {
get_map_value(env, self, key)
}
fn get_f64(&self, env: &mut JNIEnv, key: &str) -> Result<Option<f64>> {
get_map_value(env, self, key)
}
}

@@ -1,103 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>${project.artifactId}</name>
<description>LanceDB Core</description>
<packaging>jar</packaging>
<properties>
<rust.release.build>false</rust.release.build>
</properties>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
<version>0.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-memory-netty</artifactId>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-c-data</artifactId>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-dataset</artifactId>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
</dependency>
<dependency>
<groupId>org.questdb</groupId>
<artifactId>jar-jni</artifactId>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<profiles>
<profile>
<id>build-jni</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<build>
<plugins>
<plugin>
<groupId>org.questdb</groupId>
<artifactId>rust-maven-plugin</artifactId>
<version>1.1.1</version>
<executions>
<execution>
<id>lancedb-jni</id>
<goals>
<goal>build</goal>
</goals>
<configuration>
<path>lancedb-jni</path>
<release>${rust.release.build}</release>
<!-- Copy native libraries to target/classes for runtime access -->
<copyTo>${project.build.directory}/classes/nativelib</copyTo>
<copyWithPlatformDir>true</copyWithPlatformDir>
</configuration>
</execution>
<execution>
<id>lancedb-jni-test</id>
<goals>
<goal>test</goal>
</goals>
<configuration>
<path>lancedb-jni</path>
<release>false</release>
<verbosity>-v</verbosity>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>

@@ -1,108 +0,0 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import io.questdb.jar.jni.JarJniLoader;
import java.io.Closeable;
import java.util.List;
import java.util.Optional;
/** Represents LanceDB database. */
public class Connection implements Closeable {
static {
JarJniLoader.loadLib(Connection.class, "/nativelib", "lancedb_jni");
}
private long nativeConnectionHandle;
/** Connect to a LanceDB instance. */
public static native Connection connect(String uri);
/**
* Get the names of all tables in the database. The names are sorted in ascending order.
*
* @return the table names
*/
public List<String> tableNames() {
return tableNames(Optional.empty(), Optional.empty());
}
/**
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param limit The number of results to return.
* @return the table names
*/
public List<String> tableNames(int limit) {
return tableNames(Optional.empty(), Optional.of(limit));
}
/**
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @return the table names
*/
public List<String> tableNames(String startAfter) {
return tableNames(Optional.of(startAfter), Optional.empty());
}
/**
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
public List<String> tableNames(String startAfter, int limit) {
return tableNames(Optional.of(startAfter), Optional.of(limit));
}
/**
* Get the names of filtered tables in the database. The names are sorted in ascending order.
*
* @param startAfter If present, only return names that come lexicographically after the supplied
* value. This can be combined with limit to implement pagination by setting this to the last
* table name from the previous page.
* @param limit The number of results to return.
* @return the table names
*/
public native List<String> tableNames(Optional<String> startAfter, Optional<Integer> limit);
/**
* Closes this connection and releases any system resources associated with it. If the connection
* is already closed, then invoking this method has no effect.
*/
@Override
public void close() {
if (nativeConnectionHandle != 0) {
releaseNativeConnection(nativeConnectionHandle);
nativeConnectionHandle = 0;
}
}
/**
* Native method to release the Lance connection resources associated with the given handle.
*
* @param handle The native handle to the connection resource.
*/
private native void releaseNativeConnection(long handle);
private Connection() {}
}
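The `tableNames` Javadoc above describes paginating by passing the last name of one page as `startAfter` for the next request. The following is a minimal sketch of that loop, assuming the semantics the tests in `ConnectionTest` exercise: ascending lexicographic order, only names strictly after `startAfter` (even when `startAfter` is not itself a table), and at most `limit` results. `InMemoryCatalog` and `listAll` are hypothetical stand-ins so the example runs without the native library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Optional;
import java.util.TreeSet;

public class TableNamePaging {
  /** Hypothetical stand-in for Connection; mimics the documented tableNames semantics. */
  static final class InMemoryCatalog {
    private final TreeSet<String> names;

    InMemoryCatalog(List<String> names) {
      this.names = new TreeSet<>(names); // TreeSet keeps names in ascending order
    }

    List<String> tableNames(Optional<String> startAfter, Optional<Integer> limit) {
      // Only names strictly after startAfter (exclusive), whether or not startAfter exists.
      NavigableSet<String> tail =
          startAfter.isPresent() ? names.tailSet(startAfter.get(), false) : names;
      int max = limit.orElse(Integer.MAX_VALUE);
      List<String> page = new ArrayList<>();
      for (String name : tail) {
        if (page.size() >= max) {
          break;
        }
        page.add(name);
      }
      return page;
    }
  }

  /** Collects every name by feeding the last name of each page back in as startAfter. */
  static List<String> listAll(InMemoryCatalog catalog, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("pageSize must be positive");
    }
    List<String> all = new ArrayList<>();
    Optional<String> cursor = Optional.empty();
    while (true) {
      List<String> page = catalog.tableNames(cursor, Optional.of(pageSize));
      all.addAll(page);
      if (page.size() < pageSize) {
        return all; // a short or empty page means nothing is left
      }
      cursor = Optional.of(page.get(page.size() - 1));
    }
  }

  public static void main(String[] args) {
    InMemoryCatalog catalog =
        new InMemoryCatalog(
            List.of("write_stream", "test", "dataset_version", "new_empty_dataset"));
    // prints [dataset_version, new_empty_dataset, test, write_stream]
    System.out.println(listAll(catalog, 3));
  }
}
```

Stopping on a short page saves one round trip compared with looping until an empty page, but it assumes a full page is returned whenever more names remain. With a real `Connection`, the loop body would call the public `conn.tableNames(Optional.of(last), Optional.of(pageSize))` overload instead.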

@@ -1,135 +0,0 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import java.net.URL;
import java.nio.file.Path;
import java.util.List;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;
public class ConnectionTest {
private static final String[] TABLE_NAMES = {
"dataset_version", "new_empty_dataset", "test", "write_stream"
};
@TempDir static Path tempDir; // Temporary directory for the tests
private static URL lanceDbURL;
@BeforeAll
static void setUp() {
ClassLoader classLoader = ConnectionTest.class.getClassLoader();
lanceDbURL = classLoader.getResource("example_db");
}
@Test
void emptyDB() {
String databaseUri = tempDir.resolve("emptyDB").toString();
try (Connection conn = Connection.connect(databaseUri)) {
List<String> tableNames = conn.tableNames();
assertTrue(tableNames.isEmpty());
}
}
@Test
void tableNames() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
List<String> tableNames = conn.tableNames();
assertEquals(4, tableNames.size());
for (int i = 0; i < TABLE_NAMES.length; i++) {
assertEquals(TABLE_NAMES[i], tableNames.get(i));
}
}
}
@Test
void tableNamesStartAfter() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
assertTableNamesStartAfter(
conn, TABLE_NAMES[0], 3, TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[1], 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[2], 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, TABLE_NAMES[3], 0);
assertTableNamesStartAfter(
conn, "a_dataset", 4, TABLE_NAMES[0], TABLE_NAMES[1], TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "o_dataset", 2, TABLE_NAMES[2], TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "v_dataset", 1, TABLE_NAMES[3]);
assertTableNamesStartAfter(conn, "z_dataset", 0);
}
}
private void assertTableNamesStartAfter(
Connection conn, String startAfter, int expectedSize, String... expectedNames) {
List<String> tableNames = conn.tableNames(startAfter);
assertEquals(expectedSize, tableNames.size());
for (int i = 0; i < expectedNames.length; i++) {
assertEquals(expectedNames[i], tableNames.get(i));
}
}
@Test
void tableNamesLimit() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
for (int i = 0; i <= TABLE_NAMES.length; i++) {
List<String> tableNames = conn.tableNames(i);
assertEquals(i, tableNames.size());
for (int j = 0; j < i; j++) {
assertEquals(TABLE_NAMES[j], tableNames.get(j));
}
}
}
}
@Test
void tableNamesStartAfterLimit() {
try (Connection conn = Connection.connect(lanceDbURL.toString())) {
List<String> tableNames = conn.tableNames(TABLE_NAMES[0], 2);
assertEquals(2, tableNames.size());
assertEquals(TABLE_NAMES[1], tableNames.get(0));
assertEquals(TABLE_NAMES[2], tableNames.get(1));
tableNames = conn.tableNames(TABLE_NAMES[1], 1);
assertEquals(1, tableNames.size());
assertEquals(TABLE_NAMES[2], tableNames.get(0));
tableNames = conn.tableNames(TABLE_NAMES[2], 2);
assertEquals(1, tableNames.size());
assertEquals(TABLE_NAMES[3], tableNames.get(0));
tableNames = conn.tableNames(TABLE_NAMES[3], 2);
assertEquals(0, tableNames.size());
tableNames = conn.tableNames(TABLE_NAMES[0], 0);
assertEquals(0, tableNames.size());
// Limit larger than the number of remaining tables
tableNames = conn.tableNames(TABLE_NAMES[0], 10);
assertEquals(3, tableNames.size());
assertEquals(TABLE_NAMES[1], tableNames.get(0));
assertEquals(TABLE_NAMES[2], tableNames.get(1));
assertEquals(TABLE_NAMES[3], tableNames.get(2));
// Start after a value not in the list
tableNames = conn.tableNames("non_existent_table", 2);
assertEquals(2, tableNames.size());
assertEquals(TABLE_NAMES[2], tableNames.get(0));
assertEquals(TABLE_NAMES[3], tableNames.get(1));
// Start after the last table with a limit
tableNames = conn.tableNames(TABLE_NAMES[3], 1);
assertEquals(0, tableNames.size());
}
}
}

@@ -1 +0,0 @@
[binary Lance metadata d51afd07-e3cd-4c76-9b9b-787e13fd55b0; schema fields: id (int32), name (string)]

@@ -1 +0,0 @@
[binary Lance metadata 15648e72-076f-4ef1-8b90-10d305b95b3b; schema fields: id (int32), name (string)]

@@ -1 +0,0 @@
[binary Lance metadata a3689caf-4f6b-4afc-a3c7-97af75661843; schema fields: item (string), price (double), vector (fixed_size_list:float:2)]

@@ -1,26 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.0</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-lance-namespace</artifactId>
<name>${project.artifactId}</name>
<description>LanceDB Java Integration with Lance Namespace</description>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
</dependency>
</dependencies>
</project>

java/lancedb-core/pom.xml

@@ -0,0 +1,99 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.23.0-final.0</version>
<relativePath>../pom.xml</relativePath>
</parent>
<artifactId>lancedb-core</artifactId>
<name>${project.artifactId}</name>
<description>Utilities to work with LanceDB Cloud and Enterprise via Lance REST Namespace</description>
<packaging>jar</packaging>
<dependencies>
<dependency>
<groupId>org.lance</groupId>
<artifactId>lance-core</artifactId>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-vector</artifactId>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
<artifactId>arrow-memory-netty</artifactId>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-junit-jupiter</artifactId>
<version>5.18.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.16</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j2-impl</artifactId>
<version>2.24.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-core</artifactId>
<version>2.24.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.24.3</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>3.3.0</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>

@@ -11,35 +11,58 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb.lancedb;
package com.lancedb;
import com.lancedb.lance.namespace.LanceRestNamespace;
import com.lancedb.lance.namespace.client.apache.ApiClient;
import org.lance.namespace.LanceNamespace;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
/** Util class to help construct a {@link LanceRestNamespace} for LanceDB. */
public class LanceDbRestNamespaces {
/**
* Util class to help construct a {@link LanceNamespace} for LanceDB.
*
* <p>For LanceDB Cloud, use the simplified builder API:
*
* <pre>{@code
* import org.lance.namespace.LanceNamespace;
*
* // If your DB url is db://example-db, then your database here is example-db
* LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
* .apiKey("your_lancedb_cloud_api_key")
* .database("your_database_name")
* .build();
* }</pre>
*
* <p>For LanceDB Enterprise deployments, use your custom endpoint:
*
* <pre>{@code
* LanceNamespace namespaceClient = LanceDbNamespaceClientBuilder.newBuilder()
* .apiKey("your_lancedb_enterprise_api_key")
* .database("your_database_name")
* .endpoint("<your_enterprise_endpoint>")
* .build();
* }</pre>
*/
public class LanceDbNamespaceClientBuilder {
private static final String DEFAULT_REGION = "us-east-1";
private static final String CLOUD_URL_PATTERN = "https://%s.%s.api.lancedb.com";
private String apiKey;
private String database;
private Optional<String> hostOverride = Optional.empty();
private Optional<String> endpoint = Optional.empty();
private Optional<String> region = Optional.empty();
private Map<String, String> additionalConfig = new HashMap<>();
private LanceDbRestNamespaces() {}
private LanceDbNamespaceClientBuilder() {}
/**
* Create a new builder instance.
*
* @return A new LanceRestNamespaceBuilder
* @return A new LanceDbNamespaceClientBuilder
*/
public static LanceDbRestNamespaces builder() {
return new LanceDbRestNamespaces();
public static LanceDbNamespaceClientBuilder newBuilder() {
return new LanceDbNamespaceClientBuilder();
}
/**
@@ -48,7 +71,7 @@ public class LanceDbRestNamespaces {
* @param apiKey The LanceDB API key
* @return This builder
*/
public LanceDbRestNamespaces apiKey(String apiKey) {
public LanceDbNamespaceClientBuilder apiKey(String apiKey) {
if (apiKey == null || apiKey.trim().isEmpty()) {
throw new IllegalArgumentException("API key cannot be null or empty");
}
@@ -62,7 +85,7 @@ public class LanceDbRestNamespaces {
* @param database The database name
* @return This builder
*/
public LanceDbRestNamespaces database(String database) {
public LanceDbNamespaceClientBuilder database(String database) {
if (database == null || database.trim().isEmpty()) {
throw new IllegalArgumentException("Database cannot be null or empty");
}
@@ -71,25 +94,25 @@ public class LanceDbRestNamespaces {
}
/**
* Set a custom host override (optional). When set, this overrides the default LanceDB Cloud URL
* Set a custom endpoint URL (optional). When set, this overrides the default LanceDB Cloud URL
* construction. Use this for LanceDB Enterprise deployments.
*
* @param hostOverride The complete base URL (e.g., "http://your-vpc-endpoint:80")
* @param endpoint The complete base URL for your LanceDB Enterprise deployment
* @return This builder
*/
public LanceDbRestNamespaces hostOverride(String hostOverride) {
this.hostOverride = Optional.ofNullable(hostOverride);
public LanceDbNamespaceClientBuilder endpoint(String endpoint) {
this.endpoint = Optional.ofNullable(endpoint);
return this;
}
/**
* Set the region for LanceDB Cloud (optional). Defaults to "us-east-1" if not specified. This is
* ignored when hostOverride is set.
* ignored when endpoint is set.
*
* @param region The AWS region (e.g., "us-east-1", "eu-west-1")
* @return This builder
*/
public LanceDbRestNamespaces region(String region) {
public LanceDbNamespaceClientBuilder region(String region) {
this.region = Optional.ofNullable(region);
return this;
}
@@ -101,18 +124,18 @@ public class LanceDbRestNamespaces {
* @param value The configuration value
* @return This builder
*/
public LanceDbRestNamespaces config(String key, String value) {
public LanceDbNamespaceClientBuilder config(String key, String value) {
this.additionalConfig.put(key, value);
return this;
}
/**
* Build the LanceRestNamespace instance.
* Build the LanceNamespace instance.
*
* @return A configured LanceRestNamespace
* @return A configured LanceNamespace
* @throws IllegalStateException if required parameters are missing
*/
public LanceRestNamespace build() {
public LanceNamespace build() {
// Validate required fields
if (apiKey == null) {
throw new IllegalStateException("API key is required");
@@ -123,24 +146,19 @@ public class LanceDbRestNamespaces {
// Build configuration map
Map<String, String> config = new HashMap<>(additionalConfig);
config.put("headers.x-lancedb-database", database);
config.put("headers.x-api-key", apiKey);
config.put("header.x-lancedb-database", database);
config.put("header.x-api-key", apiKey);
// Determine base URL
String baseUrl;
if (hostOverride.isPresent()) {
baseUrl = hostOverride.get();
config.put("host_override", hostOverride.get());
String uri;
if (endpoint.isPresent()) {
uri = endpoint.get();
} else {
String effectiveRegion = region.orElse(DEFAULT_REGION);
baseUrl = String.format(CLOUD_URL_PATTERN, database, effectiveRegion);
config.put("region", effectiveRegion);
uri = String.format(CLOUD_URL_PATTERN, database, effectiveRegion);
}
config.put("uri", uri);
// Create and configure ApiClient
ApiClient apiClient = new ApiClient();
apiClient.setBasePath(baseUrl);
return new LanceRestNamespace(apiClient, config);
return LanceNamespace.connect("rest", config, null);
}
}
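The rewritten `build()` no longer constructs an `ApiClient`; it hands a flat config map to `LanceNamespace.connect("rest", config, null)`. The sketch below reproduces only the map assembly so the endpoint-versus-region precedence can be checked in isolation. `resolveConfig` is a hypothetical helper, not part of the builder's API; the key names (`header.x-lancedb-database`, `header.x-api-key`, `uri`), the default region, and the URL pattern are copied from the new side of the diff.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NamespaceConfigSketch {
  private static final String DEFAULT_REGION = "us-east-1";
  private static final String CLOUD_URL_PATTERN = "https://%s.%s.api.lancedb.com";

  /** Hypothetical helper mirroring the config assembly in LanceDbNamespaceClientBuilder.build(). */
  static Map<String, String> resolveConfig(
      String apiKey,
      String database,
      Map<String, String> additionalConfig,
      Optional<String> endpoint,
      Optional<String> region) {
    // Start from user-supplied entries; the auth headers below overwrite any clashes.
    Map<String, String> config = new HashMap<>(additionalConfig);
    config.put("header.x-lancedb-database", database);
    config.put("header.x-api-key", apiKey);
    // An explicit endpoint (LanceDB Enterprise) wins; region only matters for the Cloud URL.
    String uri =
        endpoint.orElseGet(
            () -> String.format(CLOUD_URL_PATTERN, database, region.orElse(DEFAULT_REGION)));
    config.put("uri", uri);
    return config;
  }

  public static void main(String[] args) {
    Map<String, String> cloud =
        resolveConfig("my-key", "example-db", Map.of(), Optional.empty(), Optional.empty());
    // prints https://example-db.us-east-1.api.lancedb.com
    System.out.println(cloud.get("uri"));
  }
}
```

Because `additionalConfig` is copied first, an entry such as `config("header.x-api-key", ...)` is replaced by the value passed to `apiKey(...)`; this ordering follows `build()` above.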

@@ -0,0 +1,96 @@
/*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.lancedb;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
/** Unit tests for LanceDbNamespaceClientBuilder. */
public class LanceDbNamespaceClientBuilderTest {
@Test
public void testBuilderRequiresApiKey() {
LanceDbNamespaceClientBuilder builder =
LanceDbNamespaceClientBuilder.newBuilder().database("test-db");
IllegalStateException exception = assertThrows(IllegalStateException.class, builder::build);
assertEquals("API key is required", exception.getMessage());
}
@Test
public void testBuilderRequiresDatabase() {
LanceDbNamespaceClientBuilder builder =
LanceDbNamespaceClientBuilder.newBuilder().apiKey("test-api-key");
IllegalStateException exception = assertThrows(IllegalStateException.class, builder::build);
assertEquals("Database is required", exception.getMessage());
}
@Test
public void testApiKeyCannotBeNull() {
IllegalArgumentException exception =
assertThrows(
IllegalArgumentException.class,
() -> LanceDbNamespaceClientBuilder.newBuilder().apiKey(null));
assertEquals("API key cannot be null or empty", exception.getMessage());
}
@Test
public void testApiKeyCannotBeEmpty() {
IllegalArgumentException exception =
assertThrows(
IllegalArgumentException.class,
() -> LanceDbNamespaceClientBuilder.newBuilder().apiKey(" "));
assertEquals("API key cannot be null or empty", exception.getMessage());
}
@Test
public void testDatabaseCannotBeNull() {
IllegalArgumentException exception =
assertThrows(
IllegalArgumentException.class,
() -> LanceDbNamespaceClientBuilder.newBuilder().database(null));
assertEquals("Database cannot be null or empty", exception.getMessage());
}
@Test
public void testDatabaseCannotBeEmpty() {
IllegalArgumentException exception =
assertThrows(
IllegalArgumentException.class,
() -> LanceDbNamespaceClientBuilder.newBuilder().database(" "));
assertEquals("Database cannot be null or empty", exception.getMessage());
}
@Test
public void testBuilderFluentApi() {
// Verify the builder returns itself for chaining
LanceDbNamespaceClientBuilder builder = LanceDbNamespaceClientBuilder.newBuilder();
assertSame(builder, builder.apiKey("test-key"));
assertSame(builder, builder.database("test-db"));
assertSame(builder, builder.endpoint("http://localhost:8080"));
assertSame(builder, builder.region("eu-west-1"));
assertSame(builder, builder.config("custom-key", "custom-value"));
}
@Test
public void testNewBuilderCreatesNewInstance() {
LanceDbNamespaceClientBuilder builder1 = LanceDbNamespaceClientBuilder.newBuilder();
LanceDbNamespaceClientBuilder builder2 = LanceDbNamespaceClientBuilder.newBuilder();
assertNotSame(builder1, builder2);
}
}

@@ -0,0 +1,32 @@
<?xml version='1.0' encoding='UTF-8'?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<configuration monitorInterval="30">
<appenders>
<Console name='Console' target='SYSTEM_ERR'>
<PatternLayout pattern='%d{HH:mm:ss.SSS} %p [%t] %C{1}.%M: %m%n'/>
</Console>
</appenders>
<loggers>
<logger name='com.lancedb' level='DEBUG' additivity='false'>
<appender-ref ref='Console'/>
</logger>
<root level='INFO'>
<appender-ref ref='Console'/>
</root>
</loggers>
</configuration>

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.0</version>
<version>0.23.0-final.0</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-namespace.verison>0.0.1</lance-namespace.verison>
<lance-core.version>1.0.0-rc.2</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
@@ -51,8 +51,7 @@
</properties>
<modules>
<module>core</module>
<module>lance-namespace</module>
<module>lancedb-core</module>
</modules>
<scm>
@@ -64,9 +63,9 @@
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lance-namespace-core</artifactId>
<version>${lance-namespace.verison}</version>
<groupId>org.lance</groupId>
<artifactId>lance-core</artifactId>
<version>${lance-core.version}</version>
</dependency>
<dependency>
<groupId>org.apache.arrow</groupId>
@@ -88,21 +87,11 @@
<artifactId>arrow-dataset</artifactId>
<version>${arrow.version}</version>
</dependency>
<dependency>
<groupId>org.questdb</groupId>
<artifactId>jar-jni</artifactId>
<version>1.1.1</version>
</dependency>
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>5.10.1</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>20210307</version>
</dependency>
</dependencies>
</dependencyManagement>

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.22.4-beta.0"
version = "0.23.0"
license.workspace = true
description.workspace = true
repository.workspace = true

@@ -30,7 +30,7 @@ const results = await table.vectorSearch([0.1, 0.3]).limit(20).toArray();
console.log(results);
```
The [quickstart](https://lancedb.github.io/lancedb/basic/) contains a more complete example.
The [quickstart](https://lancedb.com/docs/quickstart/basic-usage/) contains more complete examples.
## Development

@@ -42,7 +42,7 @@ export interface CreateTableOptions {
* Options already set on the connection will be inherited by the table,
* but can be overridden here.
*
* The available options are described at https://lancedb.github.io/lancedb/guides/storage/
* The available options are described at https://lancedb.com/docs/storage/
*/
storageOptions?: Record<string, string>;
@@ -78,7 +78,7 @@ export interface OpenTableOptions {
* Options already set on the connection will be inherited by the table,
* but can be overridden here.
*
* The available options are described at https://lancedb.github.io/lancedb/guides/storage/
* The available options are described at https://lancedb.com/docs/storage/
*/
storageOptions?: Record<string, string>;
/**

@@ -118,7 +118,7 @@ export class PermutationBuilder {
* @returns A new PermutationBuilder instance
* @example
* ```ts
* builder.splitCalculated("user_id % 3");
* builder.splitCalculated({ calculation: "user_id % 3" });
* ```
*/
splitCalculated(options: SplitCalculatedOptions): PermutationBuilder {

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-x64",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["darwin"],
"cpu": ["x64"],
"main": "lancedb.darwin-x64.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": [
"win32"
],

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.22.4-beta.0",
"version": "0.23.0",
"cpu": [
"x64",
"arm64"

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.22.4-beta.0",
"version": "0.23.0",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
@@ -73,8 +73,10 @@
"scripts": {
"artifacts": "napi artifacts",
"build:debug": "napi build --platform --no-const-enum --dts ../lancedb/native.d.ts --js ../lancedb/native.js lancedb",
"postbuild:debug": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
"build:release": "napi build --platform --no-const-enum --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js dist/",
"build": "npm run build:debug && npm run tsc && shx cp lancedb/*.node dist/",
"postbuild:release": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
"build": "npm run build:debug && npm run tsc",
"build-release": "npm run build:release && npm run tsc",
"tsc": "tsc -b",
"posttsc": "shx cp lancedb/native.d.ts dist/native.d.ts",

@@ -35,7 +35,7 @@ pub struct ConnectionOptions {
pub read_consistency_interval: Option<f64>,
/// (For LanceDB OSS only): configuration for object storage.
///
/// The available options are described at https://lancedb.github.io/lancedb/guides/storage/
/// The available options are described at https://lancedb.com/docs/storage/
pub storage_options: Option<HashMap<String, String>>,
/// (For LanceDB OSS only): the session to use for this connection. Holds
/// shared caches and other session-specific state.

@@ -740,6 +740,7 @@ pub struct MergeResult {
pub num_inserted_rows: i64,
pub num_updated_rows: i64,
pub num_deleted_rows: i64,
pub num_attempts: i64,
}
impl From<lancedb::table::MergeResult> for MergeResult {
@@ -749,6 +750,7 @@ impl From<lancedb::table::MergeResult> for MergeResult {
num_inserted_rows: value.num_inserted_rows as i64,
num_updated_rows: value.num_updated_rows as i64,
num_deleted_rows: value.num_deleted_rows as i64,
num_attempts: value.num_attempts as i64,
}
}
}

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.25.4-beta.0"
current_version = "0.26.1-beta.0"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.25.4-beta.0"
version = "0.26.1-beta.0"
edition.workspace = true
description = "Python bindings for LanceDB"
license.workspace = true
@@ -18,6 +18,7 @@ arrow = { version = "56.2", features = ["pyarrow"] }
async-trait = "0.1"
lancedb = { path = "../rust/lancedb", default-features = false }
lance-core.workspace = true
lance-namespace.workspace = true
lance-io.workspace = true
env_logger.workspace = true
pyo3 = { version = "0.25", features = ["extension-module", "abi3-py39"] }

@@ -1,11 +1,11 @@
PIP_EXTRA_INDEX_URL ?= https://pypi.fury.io/lancedb/
PIP_EXTRA_INDEX_URL ?= https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/
help: ## Show this help.
@sed -ne '/@sed/!s/## //p' $(MAKEFILE_LIST)
.PHONY: develop
develop: ## Install the package in development mode.
PIP_EXTRA_INDEX_URL=$(PIP_EXTRA_INDEX_URL) maturin develop --extras tests,dev,embeddings
PIP_EXTRA_INDEX_URL="$(PIP_EXTRA_INDEX_URL)" maturin develop --extras tests,dev,embeddings
.PHONY: format
format: ## Format the code.
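The Makefile now quotes `$(PIP_EXTRA_INDEX_URL)` because its default holds two space-separated index URLs. Make substitutes the variable textually before the shell runs the recipe line, so without quotes the shell would assign only the first URL and treat the second one as the command to run. The sketch below uses Python's `shlex.split`, which follows POSIX word-splitting rules, to show the difference; `maturin develop` stands in for the full recipe line.

```python
import shlex

urls = "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"

# The recipe line as the shell sees it after make expands the variable.
unquoted = f"PIP_EXTRA_INDEX_URL={urls} maturin develop"
quoted = f'PIP_EXTRA_INDEX_URL="{urls}" maturin develop'

unquoted_words = shlex.split(unquoted)
quoted_words = shlex.split(quoted)

# Unquoted: the assignment captures only the first URL, and the second
# URL becomes the command word.
print(unquoted_words)
# Quoted: a single assignment word holding both URLs, then the real command.
print(quoted_words)
```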

@@ -10,7 +10,7 @@ dependencies = [
"pyarrow>=16",
"pydantic>=1.10",
"tqdm>=4.27.0",
"lance-namespace>=0.0.21"
"lance-namespace>=0.3.2"
]
description = "lancedb"
authors = [{ name = "LanceDB Devs", email = "dev@lancedb.com" }]
@@ -45,7 +45,7 @@ repository = "https://github.com/lancedb/lancedb"
[project.optional-dependencies]
pylance = [
"pylance>=0.25",
"pylance>=1.0.0b14",
]
tests = [
"aiohttp",
@@ -59,7 +59,7 @@ tests = [
"polars>=0.19, <=1.3.0",
"tantivy",
"pyarrow-stubs",
"pylance>=1.0.0b2",
"pylance>=1.0.0b14",
"requests",
"datafusion",
]

@@ -20,7 +20,12 @@ from .remote.db import RemoteDBConnection
from .schema import vector
from .table import AsyncTable, Table
from ._lancedb import Session
from .namespace import connect_namespace, LanceNamespaceDBConnection
from .namespace import (
connect_namespace,
connect_namespace_async,
LanceNamespaceDBConnection,
AsyncLanceNamespaceDBConnection,
)
def connect(
@@ -36,7 +41,7 @@ def connect(
session: Optional[Session] = None,
**kwargs: Any,
) -> DBConnection:
"""Connect to a LanceDB database. YAY!
"""Connect to a LanceDB database.
Parameters
----------
@@ -67,7 +72,7 @@ def connect(
default configuration is used.
storage_options: dict, optional
Additional options for the storage backend. See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
session: Session, optional
(For LanceDB OSS only)
A session to use for this connection. Sessions allow you to configure
@@ -169,7 +174,7 @@ async def connect_async(
default configuration is used.
storage_options: dict, optional
Additional options for the storage backend. See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
session: Session, optional
(For LanceDB OSS only)
A session to use for this connection. Sessions allow you to configure
@@ -224,7 +229,9 @@ __all__ = [
"connect",
"connect_async",
"connect_namespace",
"connect_namespace_async",
"AsyncConnection",
"AsyncLanceNamespaceDBConnection",
"AsyncTable",
"URI",
"sanitize_uri",

@@ -3,10 +3,30 @@ from typing import Dict, List, Optional, Tuple, Any, TypedDict, Union, Literal
import pyarrow as pa
from .index import BTree, IvfFlat, IvfPq, Bitmap, LabelList, HnswPq, HnswSq, FTS
from .index import (
BTree,
IvfFlat,
IvfPq,
IvfSq,
Bitmap,
LabelList,
HnswPq,
HnswSq,
FTS,
)
from .io import StorageOptionsProvider
from lance_namespace import (
ListNamespacesResponse,
CreateNamespaceResponse,
DropNamespaceResponse,
DescribeNamespaceResponse,
ListTablesResponse,
)
from .remote import ClientConfig
IvfHnswPq: type[HnswPq] = HnswPq
IvfHnswSq: type[HnswSq] = HnswSq
class Session:
def __init__(
self,
@@ -26,24 +46,44 @@ class Connection(object):
async def close(self): ...
async def list_namespaces(
self,
namespace: List[str],
page_token: Optional[str],
limit: Optional[int],
) -> List[str]: ...
async def create_namespace(self, namespace: List[str]) -> None: ...
async def drop_namespace(self, namespace: List[str]) -> None: ...
async def table_names(
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListNamespacesResponse: ...
async def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse: ...
async def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse: ...
async def describe_namespace(
self,
namespace: List[str],
) -> DescribeNamespaceResponse: ...
async def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse: ...
async def table_names(
self,
namespace: Optional[List[str]],
start_after: Optional[str],
limit: Optional[int],
) -> list[str]: ...
) -> list[str]: ... # Deprecated: Use list_tables instead
async def create_table(
self,
name: str,
mode: str,
data: pa.RecordBatchReader,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
location: Optional[str] = None,
@@ -53,7 +93,7 @@ class Connection(object):
name: str,
mode: str,
schema: pa.Schema,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
location: Optional[str] = None,
@@ -61,7 +101,7 @@ class Connection(object):
async def open_table(
self,
name: str,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
@@ -71,7 +111,7 @@ class Connection(object):
self,
target_table_name: str,
source_uri: str,
target_namespace: List[str] = [],
target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None,
source_tag: Optional[str] = None,
is_shallow: bool = True,
@@ -80,11 +120,13 @@ class Connection(object):
self,
cur_name: str,
new_name: str,
cur_namespace: List[str] = [],
new_namespace: List[str] = [],
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
) -> None: ...
async def drop_table(self, name: str, namespace: List[str] = []) -> None: ...
async def drop_all_tables(self, namespace: List[str] = []) -> None: ...
async def drop_table(
self, name: str, namespace: Optional[List[str]] = None
) -> None: ...
async def drop_all_tables(self, namespace: Optional[List[str]] = None) -> None: ...
class Table:
def name(self) -> str: ...
@@ -102,7 +144,17 @@ class Table:
async def create_index(
self,
column: str,
index: Union[IvfFlat, IvfPq, HnswPq, HnswSq, BTree, Bitmap, LabelList, FTS],
index: Union[
IvfFlat,
IvfSq,
IvfPq,
HnswPq,
HnswSq,
BTree,
Bitmap,
LabelList,
FTS,
],
replace: Optional[bool],
wait_timeout: Optional[object],
*,
@@ -306,6 +358,7 @@ class MergeResult:
num_updated_rows: int
num_inserted_rows: int
num_deleted_rows: int
num_attempts: int
class AddColumnsResult:
version: int
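Across the stubs and `db.py`, defaults such as `namespace: List[str] = []` become `Optional[List[str]] = None`, and the implementations convert `None` to `[]` inside the function body. This is the standard guard against Python's mutable-default pitfall: a default value is evaluated once, when `def` runs, so every call that omits the argument shares the same list object. The two functions below are hypothetical illustrations, not LanceDB code.

```python
from typing import List, Optional


def append_shared(item: str, namespace: List[str] = []) -> List[str]:
    # The [] default is created once at definition time and reused,
    # so a mutation in one call leaks into every later call.
    namespace.append(item)
    return namespace


def append_fresh(item: str, namespace: Optional[List[str]] = None) -> List[str]:
    # None is an immutable sentinel; a new list is built on each call,
    # mirroring the `if namespace is None: namespace = []` lines in the diff.
    if namespace is None:
        namespace = []
    namespace.append(item)
    return namespace


first = append_shared("a")
second = append_shared("b")
print(second)  # ['a', 'b']: the second call sees the first call's item
print(first is second)  # True: both calls returned the same list object

print(append_fresh("a"))  # ['a']
print(append_fresh("b"))  # ['b']
```

The LanceDB methods shown here do not appear to mutate `namespace`, so the change reads as defensive hygiene rather than a fix for an observed bug; that is an inference from the diff, not something it states.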

@@ -96,7 +96,7 @@ def data_to_reader(
f"Unknown data type {type(data)}. "
"Supported types: list of dicts, pandas DataFrame, polars DataFrame, "
"pyarrow Table/RecordBatch, or Pydantic models. "
"See https://lancedb.github.io/lancedb/guides/tables/ for examples."
"See https://lancedb.com/docs/tables/ for examples."
)

@@ -22,6 +22,13 @@ from lancedb.embeddings.registry import EmbeddingFunctionRegistry
from lancedb.common import data_to_reader, sanitize_uri, validate_schema
from lancedb.background_loop import LOOP
from lance_namespace import (
ListNamespacesResponse,
CreateNamespaceResponse,
DropNamespaceResponse,
DescribeNamespaceResponse,
ListTablesResponse,
)
from . import __version__
from ._lancedb import connect as lancedb_connect # type: ignore
@@ -48,16 +55,22 @@ if TYPE_CHECKING:
from .io import StorageOptionsProvider
from ._lancedb import Session
from .namespace_utils import (
_normalize_create_namespace_mode,
_normalize_drop_namespace_mode,
_normalize_drop_namespace_behavior,
)
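The new `create_namespace` and `drop_namespace` docstrings describe `mode` and `behavior` as case insensitive, and the async connection routes them through `_normalize_create_namespace_mode`, `_normalize_drop_namespace_mode` and `_normalize_drop_namespace_behavior` from `namespace_utils`. Those helpers' bodies are not part of this diff, so the following is only a guess at their general shape: validate against the documented choices, compare case-insensitively, and pass `None` through. The hypothetical `normalize_choice` function, its lowercase canonical form, and the `ValueError` are assumptions of the sketch.

```python
from typing import FrozenSet, Optional

# Documented choices from the create/drop namespace docstrings.
CREATE_MODES = frozenset({"create", "exist_ok", "overwrite"})
DROP_MODES = frozenset({"skip", "fail"})
DROP_BEHAVIORS = frozenset({"restrict", "cascade"})


def normalize_choice(value: Optional[str], allowed: FrozenSet[str]) -> Optional[str]:
    """Validate value case-insensitively; None means "use the default"."""
    if value is None:
        return None
    lowered = value.lower()
    if lowered not in allowed:
        raise ValueError(f"{value!r} is not one of {sorted(allowed)}")
    return lowered


print(normalize_choice("EXIST_OK", CREATE_MODES))  # exist_ok
print(normalize_choice("Cascade", DROP_BEHAVIORS))  # cascade
print(normalize_choice(None, DROP_MODES))  # None
```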
class DBConnection(EnforceOverrides):
"""An active LanceDB connection interface."""
def list_namespaces(
self,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: int = 10,
) -> Iterable[str]:
limit: Optional[int] = None,
) -> ListNamespacesResponse:
"""List immediate child namespace names in the given namespace.
Parameters
@@ -66,48 +79,126 @@ class DBConnection(EnforceOverrides):
The parent namespace to list namespaces in.
Empty list represents root namespace.
page_token: str, optional
The token to use for pagination. If not present, start from the beginning.
limit: int, default 10
The size of the page to return.
Token for pagination. Use the token from a previous response
to get the next page of results.
limit: int, optional
The maximum number of results to return.
Returns
-------
Iterable of str
List of immediate child namespace names
ListNamespacesResponse
Response containing namespace names and optional page_token for pagination.
"""
return []
if namespace is None:
namespace = []
return ListNamespacesResponse(namespaces=[], page_token=None)
def create_namespace(self, namespace: List[str]) -> None:
def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""Create a new namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to create.
mode: str, optional
Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
or "overwrite" (replace if exists). Case insensitive.
properties: Dict[str, str], optional
Properties to set on the namespace.
Returns
-------
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
raise NotImplementedError(
"Namespace operations are not supported for this connection type"
)
def drop_namespace(self, namespace: List[str]) -> None:
def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""Drop a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to drop.
mode: str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior: str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
raise NotImplementedError(
"Namespace operations are not supported for this connection type"
)
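The `drop_namespace` docstring defines two independent switches: `mode` decides what happens when the namespace is missing (`"SKIP"` or `"FAIL"`), and `behavior` decides what happens when it still holds tables or child namespaces (`"RESTRICT"` or `"CASCADE"`). The in-memory model below is a hypothetical sketch of those documented semantics only; the storage layout and the `FAIL`/`RESTRICT` defaults are assumptions, not taken from LanceDB.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical store: namespace path -> names of the tables it holds.
# Child namespaces are the keys that share the parent's path as a prefix.
Store = Dict[Tuple[str, ...], Set[str]]


def drop_namespace(
    store: Store,
    namespace: List[str],
    mode: Optional[str] = None,
    behavior: Optional[str] = None,
) -> None:
    key = tuple(namespace)
    mode = (mode or "FAIL").upper()  # default assumed for this sketch
    behavior = (behavior or "RESTRICT").upper()  # default assumed too
    if mode not in {"SKIP", "FAIL"} or behavior not in {"RESTRICT", "CASCADE"}:
        raise ValueError(f"bad mode/behavior: {mode!r}, {behavior!r}")

    if key not in store:
        if mode == "SKIP":
            return  # a missing namespace is silently ignored
        raise KeyError(f"namespace {list(key)} does not exist")

    subtree = [k for k in store if k[: len(key)] == key]  # includes key itself
    not_empty = bool(store[key]) or len(subtree) > 1
    if not_empty and behavior == "RESTRICT":
        raise ValueError(f"namespace {list(key)} is not empty")

    # CASCADE (or RESTRICT on an empty namespace) removes the whole subtree.
    for k in subtree:
        del store[k]


store: Store = {("prod",): {"users"}, ("prod", "eu"): set(), ("dev",): set()}
drop_namespace(store, ["missing"], mode="skip")  # no error
drop_namespace(store, ["prod"], behavior="cascade")
print(store)  # {('dev',): set()}
```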
def describe_namespace(self, namespace: List[str]) -> DescribeNamespaceResponse:
"""Describe a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
raise NotImplementedError(
"Namespace operations are not supported for this connection type"
)
def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse:
"""List all tables in this database with pagination support.
Parameters
----------
namespace: List[str], optional
The namespace to list tables in.
None or empty list represents root namespace.
page_token: str, optional
Token for pagination. Use the token from a previous response
to get the next page of results.
limit: int, optional
The maximum number of results to return.
Returns
-------
ListTablesResponse
Response containing table names and optional page_token for pagination.
"""
raise NotImplementedError(
"list_tables is not supported for this connection type"
)
@abstractmethod
def table_names(
self,
page_token: Optional[str] = None,
limit: int = 10,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
) -> Iterable[str]:
"""List all tables in this database, in sorted order
@@ -142,7 +233,7 @@ class DBConnection(EnforceOverrides):
fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
data_storage_version: Optional[str] = None,
@@ -191,7 +282,11 @@ class DBConnection(EnforceOverrides):
Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here.
See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
To enable stable row IDs (row IDs remain stable after compaction,
update, delete, and merges), set `new_table_enable_stable_row_ids`
to `"true"` in storage_options when connecting to the database.
data_storage_version: optional, str, default "stable"
Deprecated. Set `storage_options` when connecting to the database and set
`new_table_data_storage_version` in the options.
@@ -308,7 +403,7 @@ class DBConnection(EnforceOverrides):
self,
name: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
@@ -339,7 +434,7 @@ class DBConnection(EnforceOverrides):
Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here.
See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
Returns
-------
@@ -347,7 +442,7 @@ class DBConnection(EnforceOverrides):
"""
raise NotImplementedError
def drop_table(self, name: str, namespace: List[str] = []):
def drop_table(self, name: str, namespace: Optional[List[str]] = None):
"""Drop a table from the database.
Parameters
@@ -358,14 +453,16 @@ class DBConnection(EnforceOverrides):
The namespace to drop the table from.
Empty list represents root namespace.
"""
if namespace is None:
namespace = []
raise NotImplementedError
def rename_table(
self,
cur_name: str,
new_name: str,
cur_namespace: List[str] = [],
new_namespace: List[str] = [],
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
):
"""Rename a table in the database.
@@ -382,6 +479,10 @@ class DBConnection(EnforceOverrides):
The namespace to move the table to.
If not specified, defaults to the same as cur_namespace.
"""
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError
def drop_database(self):
@@ -391,7 +492,7 @@ class DBConnection(EnforceOverrides):
"""
raise NotImplementedError
def drop_all_tables(self, namespace: List[str] = []):
def drop_all_tables(self, namespace: Optional[List[str]] = None):
"""
Drop all tables from the database
@@ -401,6 +502,8 @@ class DBConnection(EnforceOverrides):
The namespace to drop all tables from.
None or empty list represents root namespace.
"""
if namespace is None:
namespace = []
raise NotImplementedError
@property
@@ -472,6 +575,12 @@ class LanceDBConnection(DBConnection):
uri = uri[7:] # Remove "file://"
elif uri.startswith("file:/"):
uri = uri[5:] # Remove "file:"
if sys.platform == "win32":
# On Windows, a path like /C:/path should become C:/path
if len(uri) >= 3 and uri[0] == "/" and uri[2] == ":":
uri = uri[1:]
uri = Path(uri)
uri = uri.expanduser().absolute()
Path(uri).mkdir(parents=True, exist_ok=True)
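On Windows, a URI such as `file:///C:/data/db` still starts with `/` after the scheme is removed (`/C:/data/db`), which is not a usable local path, so the new branch drops that slash when the third character is a drive-letter colon. The hypothetical helper below restates the scheme stripping and the Windows check from this hunk as a pure function. It takes `platform` as a parameter instead of reading `sys.platform` so both cases can be exercised on any OS, and it applies the Windows check to any URI; how the branches nest in the real method is not visible here.

```python
def strip_file_scheme(uri: str, platform: str) -> str:
    """Hypothetical restatement of the URI handling in this hunk."""
    if uri.startswith("file://"):
        uri = uri[7:]  # "file:///C:/db" -> "/C:/db"
    elif uri.startswith("file:/"):
        uri = uri[5:]  # "file:/C:/db" -> "/C:/db"
    if platform == "win32":
        # "/C:/db" is not a valid Windows path: drop the slash that precedes
        # a drive letter, detected by a ":" in the third position.
        if len(uri) >= 3 and uri[0] == "/" and uri[2] == ":":
            uri = uri[1:]
    return uri


print(strip_file_scheme("file:///C:/data/db", "win32"))  # C:/data/db
print(strip_file_scheme("file:///home/me/db", "linux"))  # /home/me/db
```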
@@ -535,10 +644,10 @@ class LanceDBConnection(DBConnection):
@override
def list_namespaces(
self,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: int = 10,
) -> Iterable[str]:
limit: Optional[int] = None,
) -> ListNamespacesResponse:
"""List immediate child namespace names in the given namespace.
Parameters
@@ -547,15 +656,18 @@ class LanceDBConnection(DBConnection):
The parent namespace to list namespaces in.
None or empty list represents root namespace.
page_token: str, optional
The token to use for pagination. If not present, start from the beginning.
limit: int, default 10
The size of the page to return.
Token for pagination. Use the token from a previous response
to get the next page of results.
limit: int, optional
The maximum number of results to return.
Returns
-------
Iterable of str
List of immediate child namespace names
ListNamespacesResponse
Response containing namespace names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
return LOOP.run(
self._conn.list_namespaces(
namespace=namespace, page_token=page_token, limit=limit
@@ -563,26 +675,111 @@ class LanceDBConnection(DBConnection):
)
@override
def create_namespace(self, namespace: List[str]) -> None:
def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""Create a new namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to create.
mode: str, optional
Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
or "overwrite" (replace if exists). Case insensitive.
properties: Dict[str, str], optional
Properties to set on the namespace.
Returns
-------
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
LOOP.run(self._conn.create_namespace(namespace=namespace))
return LOOP.run(
self._conn.create_namespace(
namespace=namespace, mode=mode, properties=properties
)
)
@override
def drop_namespace(self, namespace: List[str]) -> None:
def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""Drop a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to drop.
mode: str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior: str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
return LOOP.run(self._conn.drop_namespace(namespace=namespace))
return LOOP.run(
self._conn.drop_namespace(namespace=namespace, mode=mode, behavior=behavior)
)
@override
def describe_namespace(self, namespace: List[str]) -> DescribeNamespaceResponse:
"""Describe a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
return LOOP.run(self._conn.describe_namespace(namespace=namespace))
@override
def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse:
"""List all tables in this database with pagination support.
Parameters
----------
namespace: List[str], optional
The namespace to list tables in.
None or empty list represents root namespace.
page_token: str, optional
Token for pagination. Use the token from a previous response
to get the next page of results.
limit: int, optional
The maximum number of results to return.
Returns
-------
ListTablesResponse
Response containing table names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
return LOOP.run(
self._conn.list_tables(
namespace=namespace, page_token=page_token, limit=limit
)
)
@override
def table_names(
@@ -590,10 +787,13 @@ class LanceDBConnection(DBConnection):
page_token: Optional[str] = None,
limit: int = 10,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
) -> Iterable[str]:
"""Get the names of all tables in the database. The names are sorted.
.. deprecated::
Use :meth:`list_tables` instead, which provides proper pagination support.
Parameters
----------
namespace: List[str], optional
@@ -608,6 +808,15 @@ class LanceDBConnection(DBConnection):
Iterator of str.
A list of table names.
"""
import warnings
warnings.warn(
"table_names() is deprecated, use list_tables() instead",
DeprecationWarning,
stacklevel=2,
)
if namespace is None:
namespace = []
return LOOP.run(
self._conn.table_names(
namespace=namespace, start_after=page_token, limit=limit
@@ -632,7 +841,7 @@ class LanceDBConnection(DBConnection):
fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
data_storage_version: Optional[str] = None,
@@ -649,6 +858,8 @@ class LanceDBConnection(DBConnection):
---
DBConnection.create_table
"""
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name)
@@ -674,7 +885,7 @@ class LanceDBConnection(DBConnection):
self,
name: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
@@ -692,6 +903,8 @@ class LanceDBConnection(DBConnection):
-------
A LanceTable object representing the table.
"""
if namespace is None:
namespace = []
if index_cache_size is not None:
import warnings
@@ -717,7 +930,7 @@ class LanceDBConnection(DBConnection):
target_table_name: str,
source_uri: str,
*,
target_namespace: List[str] = [],
target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None,
source_tag: Optional[str] = None,
is_shallow: bool = True,
@@ -750,6 +963,8 @@ class LanceDBConnection(DBConnection):
-------
A LanceTable object representing the cloned table.
"""
if target_namespace is None:
target_namespace = []
LOOP.run(
self._conn.clone_table(
target_table_name,
@@ -770,7 +985,7 @@ class LanceDBConnection(DBConnection):
def drop_table(
self,
name: str,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
ignore_missing: bool = False,
):
"""Drop a table from the database.
@@ -784,6 +999,8 @@ class LanceDBConnection(DBConnection):
ignore_missing: bool, default False
If True, ignore if the table does not exist.
"""
if namespace is None:
namespace = []
LOOP.run(
self._conn.drop_table(
name, namespace=namespace, ignore_missing=ignore_missing
@@ -791,7 +1008,9 @@ class LanceDBConnection(DBConnection):
)
@override
def drop_all_tables(self, namespace: List[str] = []):
def drop_all_tables(self, namespace: Optional[List[str]] = None):
if namespace is None:
namespace = []
LOOP.run(self._conn.drop_all_tables(namespace=namespace))
@override
@@ -799,8 +1018,8 @@ class LanceDBConnection(DBConnection):
self,
cur_name: str,
new_name: str,
cur_namespace: List[str] = [],
new_namespace: List[str] = [],
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
):
"""Rename a table in the database.
@@ -815,6 +1034,10 @@ class LanceDBConnection(DBConnection):
new_namespace: List[str], optional
The namespace to move the table to.
"""
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
LOOP.run(
self._conn.rename_table(
cur_name,
@@ -904,10 +1127,10 @@ class AsyncConnection(object):
async def list_namespaces(
self,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: int = 10,
) -> Iterable[str]:
limit: Optional[int] = None,
) -> ListNamespacesResponse:
"""List immediate child namespace names in the given namespace.
Parameters
@@ -917,47 +1140,141 @@ class AsyncConnection(object):
None or empty list represents root namespace.
page_token: str, optional
The token to use for pagination. If not present, start from the beginning.
limit: int, default 10
The size of the page to return.
limit: int, optional
The maximum number of results to return.
Returns
-------
Iterable of str
List of immediate child namespace names (not full paths)
ListNamespacesResponse
Response containing namespace names and optional pagination token
"""
return await self._inner.list_namespaces(
if namespace is None:
namespace = []
result = await self._inner.list_namespaces(
namespace=namespace, page_token=page_token, limit=limit
)
return ListNamespacesResponse(**result)
async def create_namespace(self, namespace: List[str]) -> None:
async def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""Create a new namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to create.
"""
await self._inner.create_namespace(namespace)
mode: str, optional
Creation mode - "create", "exist_ok", or "overwrite". Case insensitive.
properties: Dict[str, str], optional
Properties to associate with the namespace
async def drop_namespace(self, namespace: List[str]) -> None:
Returns
-------
CreateNamespaceResponse
Response containing namespace properties
"""
result = await self._inner.create_namespace(
namespace,
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
return CreateNamespaceResponse(**result)
async def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""Drop a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to drop.
mode: str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior: str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
await self._inner.drop_namespace(namespace)
result = await self._inner.drop_namespace(
namespace,
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
return DropNamespaceResponse(**result)
async def describe_namespace(
self, namespace: List[str]
) -> DescribeNamespaceResponse:
"""Describe a namespace.
Parameters
----------
namespace: List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
result = await self._inner.describe_namespace(namespace)
return DescribeNamespaceResponse(**result)
async def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse:
"""List all tables in this database with pagination support.
Parameters
----------
namespace: List[str], optional
The namespace to list tables in.
None or empty list represents root namespace.
page_token: str, optional
Token for pagination. Use the token from a previous response
to get the next page of results.
limit: int, optional
The maximum number of results to return.
Returns
-------
ListTablesResponse
Response containing table names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
result = await self._inner.list_tables(
namespace=namespace, page_token=page_token, limit=limit
)
return ListTablesResponse(**result)
async def table_names(
self,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
start_after: Optional[str] = None,
limit: Optional[int] = None,
) -> Iterable[str]:
"""List all tables in this database, in sorted order
.. deprecated::
Use :meth:`list_tables` instead, which provides proper pagination support.
Parameters
----------
namespace: List[str], optional
@@ -976,6 +1293,15 @@ class AsyncConnection(object):
-------
Iterable of str
"""
import warnings
warnings.warn(
"table_names() is deprecated, use list_tables() instead",
DeprecationWarning,
stacklevel=2,
)
if namespace is None:
namespace = []
return await self._inner.table_names(
namespace=namespace, start_after=start_after, limit=limit
)
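`table_names()` now emits a `DeprecationWarning` pointing callers to `list_tables()`, whose response carries a `page_token` for the next page. A caller that wants every table keeps following the token until none is returned. The sketch below shows that loop against a hypothetical in-memory pager; the `Page` type and its `tables` field stand in for `ListTablesResponse` and are assumptions, as is treating the token as the last name returned (which mirrors how the deprecated `table_names` maps `page_token` to `start_after`).

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Page:
    """Stand-in for ListTablesResponse (field names assumed)."""

    tables: List[str]
    page_token: Optional[str] = None


ALL_TABLES = sorted(["users", "events", "vectors", "images", "logs"])


def fake_list_tables(
    page_token: Optional[str] = None, limit: Optional[int] = None
) -> Page:
    """Hypothetical pager: the token is the last name of the previous page."""
    remaining = [t for t in ALL_TABLES if page_token is None or t > page_token]
    page = remaining if limit is None else remaining[:limit]
    has_more = len(page) < len(remaining)
    return Page(tables=page, page_token=page[-1] if has_more and page else None)


def iter_all_tables(list_tables: Callable[..., Page], limit: int = 2) -> Iterator[str]:
    """Yield every name, following page_token until it comes back as None."""
    token: Optional[str] = None
    while True:
        page = list_tables(page_token=token, limit=limit)
        yield from page.tables
        token = page.page_token
        if token is None:
            return


print(list(iter_all_tables(fake_list_tables)))
# ['events', 'images', 'logs', 'users', 'vectors']
```

Against a real connection the same loop would call `db.list_tables(page_token=..., limit=...)` from this diff; check the attribute names on `ListTablesResponse` before relying on them.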
@@ -992,7 +1318,7 @@ class AsyncConnection(object):
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
location: Optional[str] = None,
) -> AsyncTable:
@@ -1039,7 +1365,11 @@ class AsyncConnection(object):
Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here.
See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
To enable stable row IDs (row IDs remain stable after compaction,
update, delete, and merges), set `new_table_enable_stable_row_ids`
to `"true"` in storage_options when connecting to the database.
Returns
-------
@@ -1149,6 +1479,8 @@ class AsyncConnection(object):
... await db.create_table("table4", make_batches(), schema=schema)
>>> asyncio.run(iterable_example())
"""
if namespace is None:
namespace = []
metadata = None
if embedding_functions is not None:
@@ -1206,7 +1538,7 @@ class AsyncConnection(object):
self,
name: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None,
@@ -1225,7 +1557,7 @@ class AsyncConnection(object):
Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here.
See available options at
<https://lancedb.github.io/lancedb/guides/storage/>
<https://lancedb.com/docs/storage/>
index_cache_size: int, default 256
**Deprecated**: Use session-level cache configuration instead.
Create a Session with custom cache sizes and pass it to lancedb.connect().
@@ -1248,6 +1580,8 @@ class AsyncConnection(object):
-------
A LanceTable object representing the table.
"""
if namespace is None:
namespace = []
table = await self._inner.open_table(
name,
namespace=namespace,
@@ -1263,7 +1597,7 @@ class AsyncConnection(object):
target_table_name: str,
source_uri: str,
*,
target_namespace: List[str] = [],
target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None,
source_tag: Optional[str] = None,
is_shallow: bool = True,
@@ -1296,6 +1630,8 @@ class AsyncConnection(object):
-------
An AsyncTable object representing the cloned table.
"""
if target_namespace is None:
target_namespace = []
table = await self._inner.clone_table(
target_table_name,
source_uri,
@@ -1310,8 +1646,8 @@ class AsyncConnection(object):
self,
cur_name: str,
new_name: str,
cur_namespace: List[str] = [],
new_namespace: List[str] = [],
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
):
"""Rename a table in the database.
@@ -1328,6 +1664,10 @@ class AsyncConnection(object):
The namespace to move the table to.
If not specified, defaults to the same as cur_namespace.
"""
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
await self._inner.rename_table(
cur_name, new_name, cur_namespace=cur_namespace, new_namespace=new_namespace
)
@@ -1336,7 +1676,7 @@ class AsyncConnection(object):
self,
name: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
ignore_missing: bool = False,
):
"""Drop a table from the database.
@@ -1351,6 +1691,8 @@ class AsyncConnection(object):
ignore_missing: bool, default False
If True, ignore if the table does not exist.
"""
if namespace is None:
namespace = []
try:
await self._inner.drop_table(name, namespace=namespace)
except ValueError as e:
@@ -1359,7 +1701,7 @@ class AsyncConnection(object):
if f"Table '{name}' was not found" not in str(e):
raise e
async def drop_all_tables(self, namespace: List[str] = []):
async def drop_all_tables(self, namespace: Optional[List[str]] = None):
"""Drop all tables from the database.
Parameters
@@ -1368,6 +1710,8 @@ class AsyncConnection(object):
The namespace to drop all tables from.
None or empty list represents root namespace.
"""
if namespace is None:
namespace = []
await self._inner.drop_all_tables(namespace=namespace)
@deprecation.deprecated(

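Most of the `AsyncConnection` hunks above replace `namespace: List[str] = []` with `namespace: Optional[List[str]] = None` plus an `if namespace is None: namespace = []` guard. Python evaluates default argument values once, when the function is defined, so a mutable default is shared by every call, and any in-place mutation leaks into later calls. The sketch below uses two hypothetical helper functions (not part of LanceDB) to show the pitfall next to the sentinel pattern the diff adopts.

```python
from typing import List, Optional


def table_id_shared_default(name: str, namespace: List[str] = []) -> List[str]:
    # Anti-pattern: this default list is created once and reused by every call,
    # so appending to it in place carries state over between calls.
    namespace.append(name)
    return namespace


def table_id_sentinel(name: str, namespace: Optional[List[str]] = None) -> List[str]:
    # Sentinel pattern used throughout the diff: None means "root namespace",
    # and a fresh list is built on every call.
    if namespace is None:
        namespace = []
    return namespace + [name]


first = table_id_shared_default("a")
second = table_id_shared_default("b")
print(second)  # ['a', 'b']: the shared default still holds "a" from the first call

print(table_id_sentinel("a"))  # ['a']
print(table_id_sentinel("b"))  # ['b']
print(table_id_sentinel("t", ["ns"]))  # ['ns', 't']
```

The code in the diff builds table IDs with `namespace + [name]`, which does not mutate the argument, so the guard is mainly defensive: it keeps the signatures safe if a later change mutates `namespace` in place.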

@@ -376,6 +376,11 @@ class HnswSq:
target_partition_size: Optional[int] = None
# Backwards-compatible aliases
IvfHnswPq = HnswPq
IvfHnswSq = HnswSq
@dataclass
class IvfFlat:
"""Describes an IVF Flat Index
@@ -475,6 +480,36 @@ class IvfFlat:
target_partition_size: Optional[int] = None
@dataclass
class IvfSq:
"""Describes an IVF Scalar Quantization (SQ) index.
This index applies scalar quantization to compress vectors and organizes the
quantized vectors into IVF partitions. It offers a balance between search
speed and storage efficiency while keeping good recall.
Attributes
----------
distance_type: str, default "l2"
The distance metric used to train and search the index. Supported values
are "l2", "cosine", and "dot".
num_partitions: int, default sqrt(num_rows)
Number of IVF partitions to create.
max_iterations: int, default 50
Maximum iterations for kmeans during partition training.
sample_rate: int, default 256
Controls the number of training vectors: sample_rate * num_partitions.
target_partition_size: int, optional
Target size for each partition; adjusts the balance between speed and accuracy.
"""
distance_type: Literal["l2", "cosine", "dot"] = "l2"
num_partitions: Optional[int] = None
max_iterations: int = 50
sample_rate: int = 256
target_partition_size: Optional[int] = None
@dataclass
class IvfPq:
"""Describes an IVF PQ Index
@@ -609,9 +644,19 @@ class IvfPq:
class IvfRq:
"""Describes an IVF RQ Index
IVF-RQ (Residual Quantization) stores a compressed copy of each vector using
residual quantization and organizes them into IVF partitions. Parameters
largely mirror IVF-PQ for consistency.
IVF-RQ compresses vectors using RabitQ quantization and organizes them
into IVF partitions.
With RabitQ quantization, each dimension is
quantized into a small number of bits. The parameters `num_bits` and
`num_partitions` control this process, providing a tradeoff between
index size (and thus search speed) and index accuracy.
The partitioning process is called IVF and the `num_partitions` parameter
controls how many groups to create.
Note that training an IVF RQ index on a large dataset is slow and is
currently also memory-intensive.
Attributes
----------
@@ -628,7 +673,7 @@ class IvfRq:
Number of IVF partitions to create.
num_bits: int, default 1
Number of bits to encode each dimension.
Number of bits to encode each dimension in the RabitQ codebook.
max_iterations: int, default 50
Max iterations to train kmeans when computing IVF partitions.
@@ -651,6 +696,9 @@ class IvfRq:
__all__ = [
"BTree",
"IvfPq",
"IvfHnswPq",
"IvfHnswSq",
"IvfSq",
"IvfRq",
"IvfFlat",
"HnswPq",

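The new `IvfSq` index applies scalar quantization (SQ) to vectors before organizing them into IVF partitions. A common form of SQ maps each float dimension linearly onto an 8-bit integer using per-dimension minimum and maximum values learned from training vectors. The pure-Python sketch below illustrates that general technique under that assumption; it is not Lance's actual SQ encoding, whose bit width, bounds, and storage layout may differ, and all function names are hypothetical.

```python
from typing import List, Tuple


def train_bounds(vectors: List[List[float]]) -> Tuple[List[float], List[float]]:
    # Per-dimension minimum and maximum over the training vectors.
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    return lows, highs


def quantize(
    vec: List[float], lows: List[float], highs: List[float], levels: int = 255
) -> List[int]:
    # Map each dimension linearly onto the integer range [0, levels].
    codes = []
    for x, lo, hi in zip(vec, lows, highs):
        span = hi - lo
        if span == 0:
            # Constant dimension: every value decodes back to `lo`.
            codes.append(0)
            continue
        clipped = min(max(x, lo), hi)
        codes.append(round((clipped - lo) / span * levels))
    return codes


def dequantize(
    codes: List[int], lows: List[float], highs: List[float], levels: int = 255
) -> List[float]:
    # Approximate reconstruction of the original floats.
    return [lo + c / levels * (hi - lo) for c, lo, hi in zip(codes, lows, highs)]


training = [[0.0, -1.0, 2.0], [1.0, 1.0, 2.0], [0.5, 0.0, 2.0]]
lows, highs = train_bounds(training)
codes = quantize([0.5, 0.0, 2.0], lows, highs)
print(codes)  # [128, 128, 0]

dim = 768
print(dim * 4, "bytes as float32 vs", dim * 1, "bytes as 8-bit codes")
# 3072 bytes as float32 vs 768 bytes as 8-bit codes
```

The reconstruction error per dimension is bounded by one quantization step, `(hi - lo) / 255`, which is the storage-versus-recall tradeoff the docstring refers to.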

@@ -10,6 +10,7 @@ through a namespace abstraction.
from __future__ import annotations
import asyncio
import sys
from typing import Dict, Iterable, List, Optional, Union
@@ -22,27 +23,39 @@ from datetime import timedelta
import pyarrow as pa
from lancedb.db import DBConnection, LanceDBConnection
from lancedb.namespace_utils import (
_normalize_create_namespace_mode,
_normalize_drop_namespace_mode,
_normalize_drop_namespace_behavior,
)
from lancedb.io import StorageOptionsProvider
from lancedb.table import LanceTable, Table
from lance_namespace import (
LanceNamespace,
connect as namespace_connect,
CreateNamespaceResponse,
DescribeNamespaceResponse,
DropNamespaceResponse,
ListNamespacesResponse,
ListTablesResponse,
ListTablesRequest,
DescribeTableRequest,
DescribeNamespaceRequest,
DropTableRequest,
ListNamespacesRequest,
CreateNamespaceRequest,
DropNamespaceRequest,
CreateEmptyTableRequest,
)
from lancedb.table import AsyncTable, LanceTable, Table
from lancedb.util import validate_table_name
from lancedb.common import DATA
from lancedb.pydantic import LanceModel
from lancedb.embeddings import EmbeddingFunctionConfig
from ._lancedb import Session
from lance_namespace import LanceNamespace, connect as namespace_connect
from lance_namespace_urllib3_client.models import (
ListTablesRequest,
DescribeTableRequest,
DropTableRequest,
ListNamespacesRequest,
CreateNamespaceRequest,
DropNamespaceRequest,
CreateEmptyTableRequest,
JsonArrowSchema,
JsonArrowField,
JsonArrowDataType,
)
from lance_namespace_urllib3_client.models.json_arrow_schema import JsonArrowSchema
from lance_namespace_urllib3_client.models.json_arrow_field import JsonArrowField
from lance_namespace_urllib3_client.models.json_arrow_data_type import JsonArrowDataType
def _convert_pyarrow_type_to_json(arrow_type: pa.DataType) -> JsonArrowDataType:
@@ -126,13 +139,17 @@ class LanceNamespaceStorageOptionsProvider(StorageOptionsProvider):
Examples
--------
>>> from lance_namespace import connect as namespace_connect
>>> namespace = namespace_connect("rest", {"url": "https://..."})
>>> provider = LanceNamespaceStorageOptionsProvider(
... namespace=namespace,
... table_id=["my_namespace", "my_table"]
... )
>>> options = provider.fetch_storage_options()
Create a provider and fetch storage options::
from lance_namespace import connect as namespace_connect
# Connect to namespace (requires a running namespace server)
namespace = namespace_connect("rest", {"uri": "https://..."})
provider = LanceNamespaceStorageOptionsProvider(
namespace=namespace,
table_id=["my_namespace", "my_table"]
)
options = provider.fetch_storage_options()
"""
def __init__(self, namespace: LanceNamespace, table_id: List[str]):
@@ -234,8 +251,23 @@ class LanceNamespaceDBConnection(DBConnection):
page_token: Optional[str] = None,
limit: int = 10,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
) -> Iterable[str]:
"""
List table names in the database.
.. deprecated::
Use :meth:`list_tables` instead, which provides proper pagination support.
"""
import warnings
warnings.warn(
"table_names() is deprecated, use list_tables() instead",
DeprecationWarning,
stacklevel=2,
)
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
return response.tables if response.tables else []
@@ -252,12 +284,14 @@ class LanceNamespaceDBConnection(DBConnection):
fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None,
) -> Table:
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name)
@@ -346,11 +380,13 @@ class LanceNamespaceDBConnection(DBConnection):
self,
name: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
) -> Table:
if namespace is None:
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
@@ -380,8 +416,10 @@ class LanceNamespaceDBConnection(DBConnection):
)
@override
def drop_table(self, name: str, namespace: List[str] = []):
def drop_table(self, name: str, namespace: Optional[List[str]] = None):
# Use namespace drop_table directly
if namespace is None:
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
@@ -391,9 +429,13 @@ class LanceNamespaceDBConnection(DBConnection):
self,
cur_name: str,
new_name: str,
cur_namespace: List[str] = [],
new_namespace: List[str] = [],
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
):
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError(
"rename_table is not supported for namespace connections"
)
@@ -405,17 +447,19 @@ class LanceNamespaceDBConnection(DBConnection):
)
@override
def drop_all_tables(self, namespace: List[str] = []):
def drop_all_tables(self, namespace: Optional[List[str]] = None):
if namespace is None:
namespace = []
for table_name in self.table_names(namespace=namespace):
self.drop_table(table_name, namespace=namespace)
@override
def list_namespaces(
self,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: int = 10,
) -> Iterable[str]:
limit: Optional[int] = None,
) -> ListNamespacesResponse:
"""
List child namespaces under the given namespace.
@@ -425,23 +469,34 @@ class LanceNamespaceDBConnection(DBConnection):
The parent namespace to list children from.
If None, lists root-level namespaces.
page_token : Optional[str]
Pagination token for listing results.
limit : int
Token for pagination. Use the token from a previous response
to get the next page of results.
limit : int, optional
Maximum number of namespaces to return.
Returns
-------
Iterable[str]
Names of child namespaces.
ListNamespacesResponse
Response containing namespace names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
return response.namespaces if response.namespaces else []
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
)
@override
def create_namespace(self, namespace: List[str]) -> None:
def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""
Create a new namespace.
@@ -449,12 +504,34 @@ class LanceNamespaceDBConnection(DBConnection):
----------
namespace : List[str]
The namespace path to create.
mode : str, optional
Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
or "overwrite" (replace if exists). Case insensitive.
properties : Dict[str, str], optional
Properties to set on the namespace.
Returns
-------
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
request = CreateNamespaceRequest(id=namespace)
self._ns.create_namespace(request)
request = CreateNamespaceRequest(
id=namespace,
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@override
def drop_namespace(self, namespace: List[str]) -> None:
def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""
Drop a namespace.
@@ -462,22 +539,102 @@ class LanceNamespaceDBConnection(DBConnection):
----------
namespace : List[str]
The namespace path to drop.
mode : str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior : str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
request = DropNamespaceRequest(id=namespace)
self._ns.drop_namespace(request)
request = DropNamespaceRequest(
id=namespace,
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
),
transaction_id=(
response.transaction_id if hasattr(response, "transaction_id") else None
),
)
@override
def describe_namespace(self, namespace: List[str]) -> DescribeNamespaceResponse:
"""
Describe a namespace.
Parameters
----------
namespace : List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@override
def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse:
"""
List all tables in this database with pagination support.
Parameters
----------
namespace : List[str], optional
The namespace to list tables in.
None or empty list represents root namespace.
page_token : str, optional
Token for pagination. Use the token from a previous response
to get the next page of results.
limit : int, optional
The maximum number of results to return.
Returns
-------
ListTablesResponse
Response containing table names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
)
def _lance_table_from_uri(
self,
name: str,
table_uri: str,
*,
namespace: List[str] = [],
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
) -> LanceTable:
# Open a table directly from a URI using the location parameter
# Note: storage_options should already be merged by the caller
if namespace is None:
namespace = []
temp_conn = LanceDBConnection(
table_uri, # Use the table location as the connection URI
read_consistency_interval=self.read_consistency_interval,
@@ -497,6 +654,431 @@ class LanceNamespaceDBConnection(DBConnection):
)
class AsyncLanceNamespaceDBConnection:
"""
An async LanceDB connection that uses a namespace for table management.
This connection delegates table URI resolution to a lance_namespace instance,
while providing async methods for all operations.
"""
def __init__(
self,
namespace: LanceNamespace,
*,
read_consistency_interval: Optional[timedelta] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
):
"""
Initialize an async namespace-based LanceDB connection.
Parameters
----------
namespace : LanceNamespace
The namespace instance to use for table management
read_consistency_interval : Optional[timedelta]
The interval at which to check for updates to the table from other
processes. If None, then consistency is not checked.
storage_options : Optional[Dict[str, str]]
Additional options for the storage backend
session : Optional[Session]
A session to use for this connection
"""
self._ns = namespace
self.read_consistency_interval = read_consistency_interval
self.storage_options = storage_options or {}
self.session = session
async def table_names(
self,
page_token: Optional[str] = None,
limit: int = 10,
*,
namespace: Optional[List[str]] = None,
) -> Iterable[str]:
"""
List table names in the namespace.
.. deprecated::
Use :meth:`list_tables` instead, which provides proper pagination support.
"""
import warnings
warnings.warn(
"table_names() is deprecated, use list_tables() instead",
DeprecationWarning,
stacklevel=2,
)
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
return response.tables if response.tables else []
async def create_table(
self,
name: str,
data: Optional[DATA] = None,
schema: Optional[Union[pa.Schema, LanceModel]] = None,
mode: str = "create",
exist_ok: bool = False,
on_bad_vectors: str = "error",
fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*,
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None,
) -> AsyncTable:
"""Create a new table in the namespace."""
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name)
# Get location from namespace
table_id = namespace + [name]
# Step 1: Get the table location and storage options from namespace
location = None
namespace_storage_options = None
if mode.lower() == "overwrite":
# Try to describe the table first to see if it exists
try:
describe_request = DescribeTableRequest(id=table_id)
describe_response = self._ns.describe_table(describe_request)
location = describe_response.location
namespace_storage_options = describe_response.storage_options
except Exception:
# Table doesn't exist, will create a new one below
pass
if location is None:
# Table doesn't exist or mode is "create", reserve a new location
create_empty_request = CreateEmptyTableRequest(
id=table_id,
location=None,
properties=self.storage_options if self.storage_options else None,
)
create_empty_response = self._ns.create_empty_table(create_empty_request)
if not create_empty_response.location:
raise ValueError(
"Table location is missing from create_empty_table response"
)
location = create_empty_response.location
namespace_storage_options = create_empty_response.storage_options
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
if storage_options:
merged_storage_options.update(storage_options)
if namespace_storage_options:
merged_storage_options.update(namespace_storage_options)
# Step 2: Create table using LanceTable.create with the location
# Run the sync operation in a thread
def _create_table():
temp_conn = LanceDBConnection(
location,
read_consistency_interval=self.read_consistency_interval,
storage_options=merged_storage_options,
session=self.session,
)
# Create a storage options provider if not provided by user
if (
storage_options_provider is None
and namespace_storage_options is not None
):
provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
table_id=table_id,
)
else:
provider = storage_options_provider
return LanceTable.create(
temp_conn,
name,
data,
schema,
mode=mode,
exist_ok=exist_ok,
on_bad_vectors=on_bad_vectors,
fill_value=fill_value,
embedding_functions=embedding_functions,
namespace=namespace,
storage_options=merged_storage_options,
storage_options_provider=provider,
location=location,
)
lance_table = await asyncio.to_thread(_create_table)
# Get the underlying async table from LanceTable
return lance_table._table
async def open_table(
self,
name: str,
*,
namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None,
) -> AsyncTable:
"""Open an existing table from the namespace."""
if namespace is None:
namespace = []
table_id = namespace + [name]
request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request)
# Merge storage options: self.storage_options < user options < namespace options
merged_storage_options = dict(self.storage_options)
if storage_options:
merged_storage_options.update(storage_options)
if response.storage_options:
merged_storage_options.update(response.storage_options)
# Create a storage options provider if not provided by user
if storage_options_provider is None and response.storage_options is not None:
storage_options_provider = LanceNamespaceStorageOptionsProvider(
namespace=self._ns,
table_id=table_id,
)
# Open table in a thread
def _open_table():
temp_conn = LanceDBConnection(
response.location,
read_consistency_interval=self.read_consistency_interval,
storage_options=merged_storage_options,
session=self.session,
)
return LanceTable.open(
temp_conn,
name,
namespace=namespace,
storage_options=merged_storage_options,
storage_options_provider=storage_options_provider,
index_cache_size=index_cache_size,
location=response.location,
)
lance_table = await asyncio.to_thread(_open_table)
return lance_table._table
async def drop_table(self, name: str, namespace: Optional[List[str]] = None):
"""Drop a table from the namespace."""
if namespace is None:
namespace = []
table_id = namespace + [name]
request = DropTableRequest(id=table_id)
self._ns.drop_table(request)
async def rename_table(
self,
cur_name: str,
new_name: str,
cur_namespace: Optional[List[str]] = None,
new_namespace: Optional[List[str]] = None,
):
"""Rename is not supported for namespace connections."""
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError(
"rename_table is not supported for namespace connections"
)
async def drop_database(self):
"""Deprecated method."""
raise NotImplementedError(
"drop_database is deprecated, use drop_all_tables instead"
)
async def drop_all_tables(self, namespace: Optional[List[str]] = None):
"""Drop all tables in the namespace."""
if namespace is None:
namespace = []
table_names = await self.table_names(namespace=namespace)
for table_name in table_names:
await self.drop_table(table_name, namespace=namespace)
async def list_namespaces(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListNamespacesResponse:
"""
List child namespaces under the given namespace.
Parameters
----------
namespace : Optional[List[str]]
The parent namespace to list children from.
If None, lists root-level namespaces.
page_token : Optional[str]
Token for pagination. Use the token from a previous response
to get the next page of results.
limit : int, optional
Maximum number of namespaces to return.
Returns
-------
ListNamespacesResponse
Response containing namespace names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit
)
response = self._ns.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
)
async def create_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
properties: Optional[Dict[str, str]] = None,
) -> CreateNamespaceResponse:
"""
Create a new namespace.
Parameters
----------
namespace : List[str]
The namespace path to create.
mode : str, optional
Creation mode - "create" (fail if exists), "exist_ok" (skip if exists),
or "overwrite" (replace if exists). Case insensitive.
properties : Dict[str, str], optional
Properties to set on the namespace.
Returns
-------
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
request = CreateNamespaceRequest(
id=namespace,
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._ns.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
async def drop_namespace(
self,
namespace: List[str],
mode: Optional[str] = None,
behavior: Optional[str] = None,
) -> DropNamespaceResponse:
"""
Drop a namespace.
Parameters
----------
namespace : List[str]
The namespace path to drop.
mode : str, optional
Whether to skip if not exists ("SKIP") or fail ("FAIL"). Case insensitive.
behavior : str, optional
Whether to restrict drop if not empty ("RESTRICT") or cascade ("CASCADE").
Case insensitive.
Returns
-------
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
request = DropNamespaceRequest(
id=namespace,
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._ns.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
),
transaction_id=(
response.transaction_id if hasattr(response, "transaction_id") else None
),
)
async def describe_namespace(
self, namespace: List[str]
) -> DescribeNamespaceResponse:
"""
Describe a namespace.
Parameters
----------
namespace : List[str]
The namespace identifier to describe.
Returns
-------
DescribeNamespaceResponse
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace)
response = self._ns.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
async def list_tables(
self,
namespace: Optional[List[str]] = None,
page_token: Optional[str] = None,
limit: Optional[int] = None,
) -> ListTablesResponse:
"""
List all tables in this database with pagination support.
Parameters
----------
namespace : List[str], optional
The namespace to list tables in.
None or empty list represents root namespace.
page_token : str, optional
Token for pagination. Use the token from a previous response
to get the next page of results.
limit : int, optional
The maximum number of results to return.
Returns
-------
ListTablesResponse
Response containing table names and optional page_token for pagination.
"""
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
)
def connect_namespace(
impl: str,
properties: Dict[str, str],
@@ -541,3 +1123,62 @@ def connect_namespace(
storage_options=storage_options,
session=session,
)
def connect_namespace_async(
impl: str,
properties: Dict[str, str],
*,
read_consistency_interval: Optional[timedelta] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
) -> AsyncLanceNamespaceDBConnection:
"""
Connect to a LanceDB database through a namespace (returns async connection).
This function is synchronous but returns an AsyncLanceNamespaceDBConnection
that provides async methods for all database operations.
Parameters
----------
impl : str
The namespace implementation to use. For example:
- "dir" for DirectoryNamespace
- "rest" for REST-based namespace
- Full module path for custom implementations
properties : Dict[str, str]
Configuration properties for the namespace implementation.
Each namespace implementation accepts different config properties.
For example, use DirectoryNamespace with {"root": "/path/to/directory"}
read_consistency_interval : Optional[timedelta]
The interval at which to check for updates to the table from other
processes. If None, then consistency is not checked.
storage_options : Optional[Dict[str, str]]
Additional options for the storage backend
session : Optional[Session]
A session to use for this connection
Returns
-------
AsyncLanceNamespaceDBConnection
An async namespace-based connection to LanceDB
Examples
--------
>>> import lancedb
>>> import pyarrow as pa
>>> # This function is sync, but returns an async connection
>>> db = lancedb.connect_namespace_async("dir", {"root": "/path/to/db"})
>>> # Use async methods on the connection
>>> async def use_db():
...     tables = (await db.list_tables()).tables
...     schema = pa.schema([pa.field("id", pa.int64())])
...     table = await db.create_table("my_table", schema=schema)
"""
namespace = namespace_connect(impl, properties)
# Return the async namespace-based connection
return AsyncLanceNamespaceDBConnection(
namespace,
read_consistency_interval=read_consistency_interval,
storage_options=storage_options,
session=session,
)
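`list_tables` and `list_namespaces` return a `page_token` that callers pass back to fetch the next page. The `drop_all_tables` methods in this file iterate over `table_names(namespace=namespace)`, which uses the default `limit=10`, does not follow `page_token`, and also triggers the deprecation warning, so they may only see the first page of tables. The sketch below drains every page from a hypothetical in-memory lister with the same request/response shape. `FakeNamespace`, `Page`, and `list_all_tables` are illustrative names, not LanceDB APIs, and the sketch assumes a missing token marks the last page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    # Mirrors the shape of ListTablesResponse: names plus an optional token.
    tables: List[str]
    page_token: Optional[str] = None


class FakeNamespace:
    """Hypothetical in-memory stand-in for a namespace that paginates."""

    def __init__(self, names: List[str]):
        self._names = sorted(names)

    def list_tables(self, page_token: Optional[str] = None, limit: int = 10) -> Page:
        # Here the token is just the offset of the next page, as a string.
        start = int(page_token) if page_token else 0
        end = start + limit
        next_token = str(end) if end < len(self._names) else None
        return Page(tables=self._names[start:end], page_token=next_token)


def list_all_tables(ns: FakeNamespace, limit: int = 10) -> List[str]:
    # Keep requesting pages until the server stops returning a token.
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page = ns.list_tables(page_token=token, limit=limit)
        names.extend(page.tables)
        token = page.page_token
        if not token:
            return names


ns = FakeNamespace([f"t{i:02d}" for i in range(25)])
print(len(ns.list_tables().tables))  # 10: a single call sees only the first page
print(len(list_all_tables(ns)))  # 25
```

When the goal is to drop everything, collecting all names first and dropping afterwards avoids paging over a listing that shrinks while you delete from it, which could otherwise skip entries with offset-style tokens.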


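Both `create_table` and `open_table` in `AsyncLanceNamespaceDBConnection` merge storage options in a fixed order, as their inline comment `self.storage_options < user options < namespace options` states: each later source overrides earlier ones key by key. A minimal sketch of that merge with plain dicts follows; the function name `merge_storage_options` and the option keys are hypothetical and only illustrate the precedence.

```python
from typing import Dict, Optional


def merge_storage_options(
    connection_opts: Dict[str, str],
    user_opts: Optional[Dict[str, str]],
    namespace_opts: Optional[Dict[str, str]],
) -> Dict[str, str]:
    # Start from a copy so the connection-level dict is never mutated.
    merged = dict(connection_opts)
    # User-supplied options override connection defaults...
    if user_opts:
        merged.update(user_opts)
    # ...and options returned by the namespace (e.g. credentials) win last.
    if namespace_opts:
        merged.update(namespace_opts)
    return merged


conn = {"region": "us-east-1", "timeout": "30s"}
user = {"timeout": "60s"}
vended = {"access_key_id": "example-key", "region": "us-west-2"}
print(merge_storage_options(conn, user, vended))
# {'region': 'us-west-2', 'timeout': '60s', 'access_key_id': 'example-key'}
```

One consequence of this ordering is that a caller cannot override a key the namespace also returns: namespace-provided values always take precedence.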
@@ -0,0 +1,27 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
"""Utility functions for namespace operations."""
from typing import Optional
def _normalize_create_namespace_mode(mode: Optional[str]) -> Optional[str]:
"""Normalize create namespace mode to lowercase (API expects lowercase)."""
if mode is None:
return None
return mode.lower()
def _normalize_drop_namespace_mode(mode: Optional[str]) -> Optional[str]:
"""Normalize drop namespace mode to uppercase (API expects uppercase)."""
if mode is None:
return None
return mode.upper()
def _normalize_drop_namespace_behavior(behavior: Optional[str]) -> Optional[str]:
"""Normalize drop namespace behavior to uppercase (API expects uppercase)."""
if behavior is None:
return None
return behavior.upper()
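In `AsyncLanceNamespaceDBConnection`, `create_table` and `open_table` run the synchronous `LanceTable` work through `asyncio.to_thread`, while methods such as `list_tables`, `drop_table`, and `describe_namespace` call `self._ns` directly inside `async def`. Assuming the namespace client is synchronous, as those non-awaited calls suggest, each such call blocks the event loop until the request returns. The sketch below uses a hypothetical `SlowClient` to show how `asyncio.to_thread` (Python 3.9+) moves blocking calls onto the default thread pool so several can overlap.

```python
import asyncio
import time
from typing import List


class SlowClient:
    """Hypothetical blocking client standing in for a synchronous namespace."""

    def list_tables(self, namespace: List[str]) -> List[str]:
        time.sleep(0.2)  # simulate a blocking network round trip
        return [f"{'.'.join(namespace) or 'root'}_table"]


async def list_many(client: SlowClient, namespaces: List[List[str]]) -> List[List[str]]:
    # Each blocking call runs in the default thread pool, so the event loop
    # stays free and the calls overlap instead of running back to back.
    return await asyncio.gather(
        *(asyncio.to_thread(client.list_tables, ns) for ns in namespaces)
    )


start = time.perf_counter()
results = asyncio.run(list_many(SlowClient(), [[], ["a"], ["a", "b"]]))
elapsed = time.perf_counter() - start
print(results)  # [['root_table'], ['a_table'], ['a.b_table']]
```

Run sequentially, the three calls would take about 0.6 s; with `to_thread` they overlap and finish in roughly the time of one call, and other coroutines can run in the meantime.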

Some files were not shown because too many files have changed in this diff.