Compare commits

26 Commits

Author SHA1 Message Date
Jack Ye
26c105205c trigger ci 2025-12-02 14:48:03 -08:00
Jack Ye
8192648abc ci: fix bot sophon access 2025-12-02 14:45:42 -08:00
Jack Ye
3cd73f9f5a refactor!: deprecate mac x86 support (#2836)
Download stats for mac x86 are very low, and the latest GitHub
runners for mac are all Arm, so it makes sense at this point to
deprecate x86 support in general.
2025-12-02 14:12:51 -08:00
Lance Release
b2d06a3a73 Bump version: 0.22.4-beta.2 → 0.22.4-beta.3 2025-12-02 22:01:59 +00:00
Lance Release
9d129c7e86 Bump version: 0.25.4-beta.2 → 0.25.4-beta.3 2025-12-02 22:00:35 +00:00
Jonathan Hsieh
44878dd9a5 feat: support stable row IDs via storage_options (#2831)
Add support for enabling stable row IDs when creating tables via the
`new_table_enable_stable_row_ids` storage option.

Stable row IDs ensure that row identifiers remain constant after
compaction, update, delete, and merge operations. This is useful for
materialized views and other use cases that need to track source rows
across these operations.

The option can be set at two levels:
- Connection level: applies to all tables created with that connection
- Table level: per-table override via create_table storage_options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-12-02 13:57:00 -08:00
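
The two levels described in this commit can be pictured as a dictionary merge in which table-level options win over connection-level ones. The sketch below is a minimal illustration in plain Python, not LanceDB's implementation: `resolve_storage_options` is a hypothetical helper, and the string values `"true"` and `"false"` are an assumption about how the option is spelled. Only the key `new_table_enable_stable_row_ids` comes from the commit message.

```python
from typing import Dict, Optional

# Option key named in the commit message.
STABLE_ROW_IDS_KEY = "new_table_enable_stable_row_ids"


def resolve_storage_options(
    connection_options: Optional[Dict[str, str]] = None,
    table_options: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical helper: table-level values override connection-level ones."""
    merged = dict(connection_options or {})  # copy so the caller's dict is untouched
    merged.update(table_options or {})       # per-table override wins
    return merged


# Assumed value spelling ("true"/"false"); check the storage docs before relying on it.
connection_options = {STABLE_ROW_IDS_KEY: "true"}

inherited = resolve_storage_options(connection_options)
overridden = resolve_storage_options(connection_options, {STABLE_ROW_IDS_KEY: "false"})

print(inherited[STABLE_ROW_IDS_KEY])
print(overridden[STABLE_ROW_IDS_KEY])
```

A table created without its own option inherits the connection setting, while a table that passes its own value replaces it for that table only.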
LanceDB Robot
4b5bb2d76c chore: update lance dependency to v1.0.0-beta.16 (#2835)
## Summary
- bump all Lance crates to v1.0.0-beta.16 via ci/set_lance_version.py
- refresh Cargo.lock (reqwest/opendal/etc.) to satisfy the new release

## Verification
- cargo clippy --workspace --tests --all-features -- -D warnings
- cargo fmt --all

Triggered by
[refs/tags/v1.0.0-beta.16](https://github.com/lance-format/lance/releases/tag/v1.0.0-beta.16)

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-01 23:07:03 -08:00
LanceDB Robot
434f4124fc chore: update lance dependency to v1.0.0-beta.14 (#2826)
## Summary
- bump all Lance crates to 1.0.0-beta.14 via ci/set_lance_version.py
- refresh Cargo.lock to capture new transitive requirements
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings` and `cargo fmt --all`

Triggered by refs/tags/v1.0.0-beta.14

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
2025-12-01 14:43:03 -08:00
Rudi Floren
03a1a99270 feat: remove remote default features on lance-namespace-impls (#2828)
This tries to fix #2771. It is not a complete fix because
`lance-namespace-impls` uses `lance` which has its default features
enabled. Thus, to close #2771, the lance repo also needs an update.

The `dir-*` features are enabled by the respective remote feature
(`aws`, `gcp`, `azure`, `oss`).
The `rest` feature is enabled via `remote`.
2025-12-01 10:53:22 -08:00
Xuanwo
0110e3b6f8 chore: clippy::string_to_string has been replaced by implicit_clone (#2817)
clippy::string_to_string has been replaced by implicit_clone, so lancedb
will raise a build error in Rust 1.91. This PR suppresses it.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

Signed-off-by: Xuanwo <github@xuanwo.io>
2025-11-26 16:30:35 +08:00
Xuanwo
f1f85b0a84 ci: migrate macos ci runners (#2818)
This PR will migrate macos CI runners.

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

Signed-off-by: Xuanwo <github@xuanwo.io>
2025-11-26 01:22:35 +08:00
LanceDB Robot
d6daa08b54 chore: update lance dependency to v1.0.0-beta.8 (#2813)
## Summary
- bump all Lance crates to v1.0.0-beta.8 via ci/set_lance_version.py
- verified 
- ran 

Trigger: refs/tags/v1.0.0-beta.8
2025-11-24 14:58:42 -08:00
Wyatt Alt
17b71de22e feat: update codex url key (#2812)
This previously threw an unknown-key error for `htmlUrl` and indicated
that `url` is a valid field.
2025-11-24 13:13:18 -08:00
Prashanth Rao
a250d8e7df docs: improve docstring for RabitQ in Python (#2808)
This PR improves the docstring for `IVF_RQ` (RabitQ) in Python. The
earlier version referred to it as "residual quantization", which is
confusing to future readers of the code.

In contrast, the TypeScript and Rust codebases defined `IVF_RQ` as
RabitQ. So now the three languages use comments that are consistent with
one another.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-24 13:35:19 +08:00
LanceDB Robot
5a2b33581e chore: update lance dependency to v1.0.0-beta.7 (#2807)
## Summary
- bump Lance dependencies to v1.0.0-beta.7 via `ci/set_lance_version.py`
- verified workspace with `cargo clippy --workspace --tests
--all-features -- -D warnings`
- formatted with `cargo fmt --all`

Trigger: refs/tags/v1.0.0-beta.7
2025-11-21 21:42:09 -08:00
Jack Ye
3d254f61b0 ci: trigger downstream verification after version bump (#2809) 2025-11-21 09:50:23 -08:00
Jack Ye
d15e380be1 ci: add support for lance-format fury index for downloading pylance (#2804)
Set the lance-format fury repo for most places that are downloading. For
uploading, it is kept unchanged since lancedb is published to lancedb
fury.
2025-11-20 23:29:36 -08:00
Jack Ye
0baf807be0 ci: use larger runner for doctest and fix failing tests (#2801)
Currently the test fails during installation, at around the point where PyTorch is installed.
2025-11-20 19:44:31 -08:00
Prashanth Rao
76bcc78910 docs: nodejs failing CI is fixed (#2802)
Fixes the breaking CI for nodejs, related to the documentation of the
new Permutation API in typescript.

- Expanded the generated typings in `nodejs/lancedb/native.d.ts` to
include `SplitCalculatedOptions`, `splitNames` fields, and the
persist/options-based `splitCalculated` methods so the permutation
exports match the native API.
- The previous block comment had an inconsistency.
`splitCalculated` takes an options object (`SplitCalculatedOptions`) in
our bindings, not a bare string. The previous example showed
`builder.splitCalculated("user_id % 3");`, which doesn’t match the
actual signature and would fail TS typecheck. I updated the comment to
`builder.splitCalculated({ calculation: "user_id % 3" });` so the
example is now correct.
- Updated the `splitCalculated` example in
`nodejs/lancedb/permutation.ts` to use the options object.
- Ran `npm run docs` to ensure docs build correctly.

> [!NOTE]
> **Disclaimer**: I used GPT-5.1-Codex-Max to make these updates, but I
have read the code and run `npm run docs` to verify that they work and
are correct to the best of my knowledge.
2025-11-20 16:16:38 -08:00
Prashanth Rao
135dfdc7ec docs: 404 and outdated URLs should now work (#2800)
Did a full scan of all URLs that used to point to the old mkdocs pages;
they now link to the appropriate pages on lancedb.com/docs or lance.org
docs.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20 11:14:20 -08:00
Will Jones
6f39108857 docs: add some missing classes (#2450)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Documentation**
- Expanded Python API reference with new entries for table metadata,
tagging, remote client configuration, and index statistics.
- Added documentation for new classes and modules in both synchronous
and asynchronous sections, including `FragmentStatistics`,
`FragmentSummaryStats`, `Tags`, `AsyncTags`, `IndexStatistics`, and
remote configuration options.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2025-11-20 11:04:16 -08:00
Jackson Hew
bb6b0bea0c fix: .phrase_query() not working (#2781)
The `self._query` value was not set when wrapping its copy `query` with
quotation marks.

The test for phrase queries has been updated to test the
`.phrase_query()` method as well, which will catch this bug.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2025-11-20 10:32:37 -08:00
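
The class of bug described in this commit, where a local copy is modified but never written back to the instance, can be sketched as follows. This is a hypothetical reduction, not LanceDB's query builder: the class names `BuggyQuery` and `FixedQuery` are invented for illustration, and only the attribute name `_query` and the method name `phrase_query` come from the commit message.

```python
class BuggyQuery:
    """Hypothetical builder that reproduces the described mistake."""

    def __init__(self, query: str) -> None:
        self._query = query

    def phrase_query(self, enabled: bool = True) -> "BuggyQuery":
        query = self._query
        if enabled:
            # The local copy is wrapped in quotes, but never stored back.
            query = f'"{query}"'
        return self


class FixedQuery(BuggyQuery):
    """Same builder, with the wrapped value assigned to the instance."""

    def phrase_query(self, enabled: bool = True) -> "FixedQuery":
        if enabled:
            self._query = f'"{self._query}"'
        return self


buggy = BuggyQuery("puppy loves").phrase_query()
fixed = FixedQuery("puppy loves").phrase_query()

print(buggy._query)
print(fixed._query)
```

A test that calls the method and then inspects the stored query, as the updated test does for `.phrase_query()`, is enough to catch the difference.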
Jack Ye
0084eb238b fix: use None default for namespace (#2797)
Realized that using `[]` as a default argument is an anti-pattern in Python:
https://docs.python-guide.org/writing/gotchas/
2025-11-20 10:23:41 -08:00
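
The gotcha behind this fix is that a default value is evaluated once, when the function is defined, so a mutable default such as `[]` is shared across calls. The sketch below shows the difference; `add_table_shared` and `add_table_safe` are hypothetical functions made up for this illustration and are not taken from the LanceDB code base.

```python
from typing import List, Optional


def add_table_shared(name: str, namespace: List[str] = []) -> List[str]:
    """Anti-pattern: the same list object is reused on every call."""
    namespace.append(name)
    return namespace


def add_table_safe(name: str, namespace: Optional[List[str]] = None) -> List[str]:
    """Preferred form: None is the default and a fresh list is created per call."""
    if namespace is None:
        namespace = []
    namespace.append(name)
    return namespace


shared_first = add_table_shared("a")
shared_second = add_table_shared("b")  # mutates the list returned by the first call

safe_first = add_table_safe("a")
safe_second = add_table_safe("b")

print(shared_first is shared_second, shared_second)
print(safe_first, safe_second)
```

With `None` as the default, each call that omits `namespace` starts from its own empty list, so state cannot leak between callers.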
LanceDB Robot
28ab29a3f0 chore: update lance dependency to v1.0.0-beta.5 (#2798)
## Summary
- bump all Lance workspace dependencies to v1.0.0-beta.5
- verified `cargo clippy --workspace --tests --all-features -- -D
warnings`
- ran `cargo fmt --all`

Triggered by refs/tags/v1.0.0-beta.5
2025-11-20 17:43:24 +08:00
Colin Patrick McCabe
7d3f5348a7 feat: implement head() for remote tables (#2793)
Implement the head() function for RemoteTable.
2025-11-19 12:49:34 -08:00
Lance Release
3531393523 Bump version: 0.22.4-beta.1 → 0.22.4-beta.2 2025-11-19 20:25:41 +00:00
66 changed files with 1460 additions and 405 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion] [tool.bumpversion]
current_version = "0.22.4-beta.1" current_version = "0.22.4-beta.3"
parse = """(?x) parse = """(?x)
(?P<major>0|[1-9]\\d*)\\. (?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\. (?P<minor>0|[1-9]\\d*)\\.

View File

@@ -19,7 +19,7 @@ rustflags = [
"-Wclippy::string_add_assign", "-Wclippy::string_add_assign",
"-Wclippy::string_add", "-Wclippy::string_add",
"-Wclippy::string_lit_as_bytes", "-Wclippy::string_lit_as_bytes",
"-Wclippy::string_to_string", "-Wclippy::implicit_clone",
"-Wclippy::use_self", "-Wclippy::use_self",
"-Dclippy::cargo", "-Dclippy::cargo",
"-Dclippy::dbg_macro", "-Dclippy::dbg_macro",

View File

@@ -18,6 +18,6 @@ body:
label: Link label: Link
description: > description: >
Provide a link to the existing documentation, if applicable. Provide a link to the existing documentation, if applicable.
placeholder: ex. https://lancedb.github.io/lancedb/guides/tables/... placeholder: ex. https://lancedb.com/docs/tables/...
validations: validations:
required: false required: false

View File

@@ -31,7 +31,7 @@ runs:
with: with:
command: build command: build
working-directory: python working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/" docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
target: x86_64-unknown-linux-gnu target: x86_64-unknown-linux-gnu
manylinux: ${{ inputs.manylinux }} manylinux: ${{ inputs.manylinux }}
args: ${{ inputs.args }} args: ${{ inputs.args }}
@@ -46,7 +46,7 @@ runs:
with: with:
command: build command: build
working-directory: python working-directory: python
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/" docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
target: aarch64-unknown-linux-gnu target: aarch64-unknown-linux-gnu
manylinux: ${{ inputs.manylinux }} manylinux: ${{ inputs.manylinux }}
args: ${{ inputs.args }} args: ${{ inputs.args }}

View File

@@ -22,5 +22,5 @@ runs:
command: build command: build
# TODO: pass through interpreter # TODO: pass through interpreter
args: ${{ inputs.args }} args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/" docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python working-directory: python

View File

@@ -26,7 +26,7 @@ runs:
with: with:
command: build command: build
args: ${{ inputs.args }} args: ${{ inputs.args }}
docker-options: "-e PIP_EXTRA_INDEX_URL=https://pypi.fury.io/lancedb/" docker-options: "-e PIP_EXTRA_INDEX_URL='https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/'"
working-directory: python working-directory: python
- uses: actions/upload-artifact@v4 - uses: actions/upload-artifact@v4
with: with:

View File

@@ -1,6 +1,9 @@
name: Codex Update Lance Dependency name: Codex Update Lance Dependency
on: on:
pull_request:
paths:
- '.github/workflows/codex-update-lance-dependency.yml'
workflow_call: workflow_call:
inputs: inputs:
tag: tag:
@@ -11,7 +14,8 @@ on:
inputs: inputs:
tag: tag:
description: "Tag name from Lance" description: "Tag name from Lance"
required: true required: false
default: "v1.0.0-rc.1"
type: string type: string
permissions: permissions:
@@ -25,7 +29,7 @@ jobs:
steps: steps:
- name: Show inputs - name: Show inputs
run: | run: |
echo "tag = ${{ inputs.tag }}" echo "tag = ${{ inputs.tag || 'v1.0.0-rc.1' }}"
- name: Checkout Repo LanceDB - name: Checkout Repo LanceDB
uses: actions/checkout@v4 uses: actions/checkout@v4
@@ -65,7 +69,7 @@ jobs:
- name: Run Codex to update Lance dependency - name: Run Codex to update Lance dependency
env: env:
TAG: ${{ inputs.tag }} TAG: ${{ inputs.tag || 'v1.0.0-rc.1' }}
GITHUB_TOKEN: ${{ secrets.ROBOT_TOKEN }} GITHUB_TOKEN: ${{ secrets.ROBOT_TOKEN }}
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }} GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }} OPENAI_API_KEY: ${{ secrets.CODEX_TOKEN }}
@@ -98,3 +102,50 @@ jobs:
printenv OPENAI_API_KEY | codex login --with-api-key printenv OPENAI_API_KEY | codex login --with-api-key
codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)" codex --config shell_environment_policy.ignore_default_excludes=true exec --dangerously-bypass-approvals-and-sandbox "$(cat /tmp/codex-prompt.txt)"
- name: Debug token access
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
echo "=== Checking authenticated user ==="
gh api user --jq '.login' || echo "Failed to get user info"
echo ""
echo "=== Listing repos in lancedb org that token can access ==="
gh repo list lancedb --limit 50 --json name,visibility --jq '.[] | "\(.name) (\(.visibility))"' || echo "Failed to list repos"
echo ""
echo "=== Checking if sophon repo exists and is accessible ==="
gh repo view lancedb/sophon --json name,visibility 2>&1 || echo "Cannot access lancedb/sophon"
echo ""
echo "=== Checking token scopes ==="
gh api -i user 2>&1 | grep -i "x-oauth-scopes" || echo "Could not determine token scopes"
- name: Trigger sophon dependency update
env:
TAG: ${{ inputs.tag || 'v1.0.0-rc.1' }}
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
VERSION="${TAG#refs/tags/}"
VERSION="${VERSION#v}"
LANCEDB_BRANCH="codex/update-lance-${VERSION//[^a-zA-Z0-9]/-}"
echo "Triggering sophon workflow with:"
echo " lance_ref: ${TAG#refs/tags/}"
echo " lancedb_ref: ${LANCEDB_BRANCH}"
gh workflow run codex-bump-lancedb-lance.yml \
--repo lancedb/sophon \
-f lance_ref="${TAG#refs/tags/}" \
-f lancedb_ref="${LANCEDB_BRANCH}"
- name: Show latest sophon workflow run
env:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: |
set -euo pipefail
echo "Latest sophon workflow run:"
gh run list --repo lancedb/sophon --workflow codex-bump-lancedb-lance.yml --limit 1 --json databaseId,url,displayTitle
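
The shell parameter expansions in the "Trigger sophon dependency update" step above can be mirrored in Python to check what a given tag turns into. This is a rough equivalent for illustration only: `sophon_refs` is a hypothetical name, and `str.removeprefix` assumes Python 3.9 or newer.

```python
import re
from typing import Tuple


def sophon_refs(tag: str) -> Tuple[str, str]:
    """Return (lance_ref, lancedb_ref) the way the workflow step derives them."""
    # ${TAG#refs/tags/}: drop the ref prefix if it is present.
    lance_ref = tag.removeprefix("refs/tags/")
    # ${VERSION#v}: drop a single leading "v".
    version = lance_ref.removeprefix("v")
    # ${VERSION//[^a-zA-Z0-9]/-}: replace every non-alphanumeric character with "-".
    branch = "codex/update-lance-" + re.sub(r"[^a-zA-Z0-9]", "-", version)
    return lance_ref, branch


print(sophon_refs("refs/tags/v1.0.0-beta.16"))
print(sophon_refs("v1.0.0-rc.1"))
```

Both a full ref such as `refs/tags/v1.0.0-beta.16` and a bare tag such as the `v1.0.0-rc.1` fallback go through the same normalization, so the branch name contains only letters, digits, and hyphens after the `codex/update-lance-` prefix.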

View File

@@ -24,7 +24,7 @@ env:
# according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html # according to: https://matklad.github.io/2021/09/04/fast-rust-builds.html
# CI builds are faster with incremental disabled. # CI builds are faster with incremental disabled.
CARGO_INCREMENTAL: "0" CARGO_INCREMENTAL: "0"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/" PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
jobs: jobs:
# Single deploy job since we're just deploying # Single deploy job since we're just deploying
@@ -50,8 +50,8 @@ jobs:
- name: Build Python - name: Build Python
working-directory: python working-directory: python
run: | run: |
python -m pip install --extra-index-url https://pypi.fury.io/lancedb/ -e . python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .
python -m pip install --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
- name: Set up node - name: Set up node
uses: actions/setup-node@v3 uses: actions/setup-node@v3
with: with:

View File

@@ -59,4 +59,4 @@ jobs:
GH_TOKEN: ${{ secrets.ROBOT_TOKEN }} GH_TOKEN: ${{ secrets.ROBOT_TOKEN }}
run: | run: |
set -euo pipefail set -euo pipefail
gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,htmlUrl,displayTitle gh run list --workflow codex-update-lance-dependency.yml --limit 1 --json databaseId,url,displayTitle

View File

@@ -97,12 +97,6 @@ jobs:
fail-fast: false fail-fast: false
matrix: matrix:
settings: settings:
- target: x86_64-apple-darwin
host: macos-latest
features: ","
pre_build: |-
brew install protobuf
rustup target add x86_64-apple-darwin
- target: aarch64-apple-darwin - target: aarch64-apple-darwin
host: macos-latest host: macos-latest
features: fp16kernels features: fp16kernels

View File

@@ -11,7 +11,7 @@ on:
- Cargo.toml # Change in dependency frequently breaks builds - Cargo.toml # Change in dependency frequently breaks builds
env: env:
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/" PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
jobs: jobs:
linux: linux:
@@ -64,8 +64,6 @@ jobs:
strategy: strategy:
matrix: matrix:
config: config:
- target: x86_64-apple-darwin
runner: macos-13
- target: aarch64-apple-darwin - target: aarch64-apple-darwin
runner: warp-macos-14-arm64-6x runner: warp-macos-14-arm64-6x
env: env:

View File

@@ -18,7 +18,7 @@ env:
# Color output for pytest is off by default. # Color output for pytest is off by default.
PYTEST_ADDOPTS: "--color=yes" PYTEST_ADDOPTS: "--color=yes"
FORCE_COLOR: "1" FORCE_COLOR: "1"
PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lancedb/" PIP_EXTRA_INDEX_URL: "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"
RUST_BACKTRACE: "1" RUST_BACKTRACE: "1"
jobs: jobs:
@@ -79,7 +79,7 @@ jobs:
doctest: doctest:
name: "Doctest" name: "Doctest"
timeout-minutes: 30 timeout-minutes: 30
runs-on: "ubuntu-24.04" runs-on: ubuntu-2404-8x-x64
defaults: defaults:
run: run:
shell: bash shell: bash
@@ -100,7 +100,7 @@ jobs:
sudo apt install -y protobuf-compiler sudo apt install -y protobuf-compiler
- name: Install - name: Install
run: | run: |
pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests,dev,embeddings] pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests,dev,embeddings]
pip install tantivy pip install tantivy
pip install mlx pip install mlx
- name: Doctest - name: Doctest
@@ -143,16 +143,9 @@ jobs:
- name: Delete wheels - name: Delete wheels
run: rm -rf target/wheels run: rm -rf target/wheels
platform: platform:
name: "Mac: ${{ matrix.config.name }}" name: "Mac"
timeout-minutes: 30 timeout-minutes: 30
strategy: runs-on: macos-14
matrix:
config:
- name: x86
runner: macos-13
- name: Arm
runner: macos-14
runs-on: "${{ matrix.config.runner }}"
defaults: defaults:
run: run:
shell: bash shell: bash
@@ -226,7 +219,7 @@ jobs:
run: | run: |
pip install "pydantic<2" pip install "pydantic<2"
pip install pyarrow==16 pip install pyarrow==16
pip install --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests] pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .[tests]
pip install tantivy pip install tantivy
- name: Run tests - name: Run tests
run: pytest -m "not slow and not s3_test" -x -v --durations=30 python/tests run: pytest -m "not slow and not s3_test" -x -v --durations=30 python/tests

View File

@@ -15,7 +15,7 @@ runs:
- name: Install lancedb - name: Install lancedb
shell: bash shell: bash
run: | run: |
pip3 install --extra-index-url https://pypi.fury.io/lancedb/ $(ls target/wheels/lancedb-*.whl)[tests,dev] pip3 install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ $(ls target/wheels/lancedb-*.whl)[tests,dev]
- name: Setup localstack for integration tests - name: Setup localstack for integration tests
if: ${{ inputs.integration == 'true' }} if: ${{ inputs.integration == 'true' }}
shell: bash shell: bash

View File

@@ -122,7 +122,7 @@ jobs:
timeout-minutes: 30 timeout-minutes: 30
strategy: strategy:
matrix: matrix:
mac-runner: ["macos-13", "macos-14"] mac-runner: ["macos-14", "macos-15"]
runs-on: "${{ matrix.mac-runner }}" runs-on: "${{ matrix.mac-runner }}"
defaults: defaults:
run: run:

773
Cargo.lock generated

File diff suppressed because it is too large

View File

@@ -15,20 +15,20 @@ categories = ["database-implementations"]
rust-version = "1.78.0" rust-version = "1.78.0"
[workspace.dependencies] [workspace.dependencies]
lance = { "version" = "=1.0.0-beta.3", default-features = false, "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance = { "version" = "=1.0.0-beta.16", default-features = false, "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-core = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-datagen = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-file = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=1.0.0-beta.3", default-features = false, "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-io = { "version" = "=1.0.0-beta.16", default-features = false, "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-index = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-linalg = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-namespace = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=1.0.0-beta.3", "features" = ["dir-aws", "dir-gcp", "dir-azure", "dir-oss", "rest"], "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-namespace-impls = { "version" = "=1.0.0-beta.16", default-features = false, "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-table = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-testing = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-datafusion = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-encoding = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=1.0.0-beta.3", "tag" = "v1.0.0-beta.3", "git" = "https://github.com/lance-format/lance.git" } lance-arrow = { "version" = "=1.0.0-beta.16", "tag" = "v1.0.0-beta.16", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8" ahash = "0.8"
# Note that this one does not include pyarrow # Note that this one does not include pyarrow
arrow = { version = "56.2", optional = false } arrow = { version = "56.2", optional = false }

View File

@@ -15,7 +15,7 @@
# **The Multimodal AI Lakehouse** # **The Multimodal AI Lakehouse**
[**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.github.io/lancedb/) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors) [**How to Install** ](#how-to-install) ✦ [**Detailed Documentation**](https://lancedb.com/docs) ✦ [**Tutorials and Recipes**](https://github.com/lancedb/vectordb-recipes/tree/main) ✦ [**Contributors**](#contributors)
**The ultimate multimodal data platform for AI/ML applications.** **The ultimate multimodal data platform for AI/ML applications.**

View File

@@ -1,8 +1,8 @@
# LanceDB Documentation # LanceDB Documentation
LanceDB docs are deployed to https://lancedb.github.io/lancedb/. LanceDB docs are available at [lancedb.com/docs](https://lancedb.com/docs).
Docs is built and deployed automatically by [Github Actions](../.github/workflows/docs.yml) The SDK docs are built and deployed automatically by [Github Actions](../.github/workflows/docs.yml)
whenever a commit is pushed to the `main` branch. So it is possible for the docs to show whenever a commit is pushed to the `main` branch. So it is possible for the docs to show
unreleased features. unreleased features.

View File

@@ -34,7 +34,7 @@ const results = await table.vectorSearch([0.1, 0.3]).limit(20).toArray();
console.log(results); console.log(results);
``` ```
The [quickstart](https://lancedb.github.io/lancedb/basic/) contains a more complete example. The [quickstart](https://lancedb.com/docs/quickstart/basic-usage/) contains more complete examples.
## Development ## Development

View File

@@ -147,7 +147,7 @@ A new PermutationBuilder instance
#### Example #### Example
```ts ```ts
builder.splitCalculated("user_id % 3"); builder.splitCalculated({ calculation: "user_id % 3" });
``` ```
*** ***

View File

@@ -89,4 +89,4 @@ optional storageOptions: Record<string, string>;
(For LanceDB OSS only): configuration for object storage. (For LanceDB OSS only): configuration for object storage.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/ The available options are described at https://lancedb.com/docs/storage/

View File

@@ -97,4 +97,4 @@ Configuration for object storage.
Options already set on the connection will be inherited by the table, Options already set on the connection will be inherited by the table,
but can be overridden here. but can be overridden here.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/ The available options are described at https://lancedb.com/docs/storage/

View File

@@ -42,4 +42,4 @@ Configuration for object storage.
Options already set on the connection will be inherited by the table, Options already set on the connection will be inherited by the table,
but can be overridden here. but can be overridden here.
The available options are described at https://lancedb.github.io/lancedb/guides/storage/ The available options are described at https://lancedb.com/docs/storage/

View File

@@ -30,6 +30,12 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.table.Table ::: lancedb.table.Table
::: lancedb.table.FragmentStatistics
::: lancedb.table.FragmentSummaryStats
::: lancedb.table.Tags
## Querying (Synchronous) ## Querying (Synchronous)
::: lancedb.query.Query ::: lancedb.query.Query
@@ -58,6 +64,14 @@ is also an [asynchronous API client](#connections-asynchronous).
::: lancedb.embeddings.open_clip.OpenClipEmbeddings ::: lancedb.embeddings.open_clip.OpenClipEmbeddings
## Remote configuration
::: lancedb.remote.ClientConfig
::: lancedb.remote.TimeoutConfig
::: lancedb.remote.RetryConfig
## Context ## Context
::: lancedb.context.contextualize ::: lancedb.context.contextualize
@@ -115,6 +129,8 @@ Table hold your actual data as a collection of records / rows.
::: lancedb.table.AsyncTable ::: lancedb.table.AsyncTable
::: lancedb.table.AsyncTags
## Indices (Asynchronous) ## Indices (Asynchronous)
Indices can be created on a table to speed up queries. This section Indices can be created on a table to speed up queries. This section
@@ -136,6 +152,8 @@ lists the indices that LanceDb supports.
::: lancedb.index.IvfFlat ::: lancedb.index.IvfFlat
::: lancedb.table.IndexStatistics
## Querying (Asynchronous) ## Querying (Asynchronous)
Queries allow you to return data from your database. Basic queries can be Queries allow you to return data from your database. Basic queries can be

View File

@@ -8,7 +8,7 @@
<parent> <parent>
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId> <artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.1</version> <version>0.22.4-beta.3</version>
<relativePath>../pom.xml</relativePath> <relativePath>../pom.xml</relativePath>
</parent> </parent>

View File

@@ -8,7 +8,7 @@
<parent> <parent>
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId> <artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.1</version> <version>0.22.4-beta.3</version>
<relativePath>../pom.xml</relativePath> <relativePath>../pom.xml</relativePath>
</parent> </parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId> <artifactId>lancedb-parent</artifactId>
<version>0.22.4-beta.1</version> <version>0.22.4-beta.3</version>
<packaging>pom</packaging> <packaging>pom</packaging>
<name>${project.artifactId}</name> <name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description> <description>LanceDB Java SDK Parent POM</description>

View File

@@ -1,7 +1,7 @@
[package] [package]
name = "lancedb-nodejs" name = "lancedb-nodejs"
edition.workspace = true edition.workspace = true
version = "0.22.4-beta.1" version = "0.22.4-beta.3"
license.workspace = true license.workspace = true
description.workspace = true description.workspace = true
repository.workspace = true repository.workspace = true

View File

@@ -30,7 +30,7 @@ const results = await table.vectorSearch([0.1, 0.3]).limit(20).toArray();
console.log(results); console.log(results);
``` ```
The [quickstart](https://lancedb.github.io/lancedb/basic/) contains a more complete example. The [quickstart](https://lancedb.com/docs/quickstart/basic-usage/) contains more complete examples.
## Development ## Development

View File

@@ -42,7 +42,7 @@ export interface CreateTableOptions {
* Options already set on the connection will be inherited by the table, * Options already set on the connection will be inherited by the table,
* but can be overridden here. * but can be overridden here.
* *
* The available options are described at https://lancedb.github.io/lancedb/guides/storage/ * The available options are described at https://lancedb.com/docs/storage/
*/ */
storageOptions?: Record<string, string>; storageOptions?: Record<string, string>;
@@ -78,7 +78,7 @@ export interface OpenTableOptions {
* Options already set on the connection will be inherited by the table, * Options already set on the connection will be inherited by the table,
* but can be overridden here. * but can be overridden here.
* *
* The available options are described at https://lancedb.github.io/lancedb/guides/storage/ * The available options are described at https://lancedb.com/docs/storage/
*/ */
storageOptions?: Record<string, string>; storageOptions?: Record<string, string>;
/** /**

@@ -118,7 +118,7 @@ export class PermutationBuilder {
* @returns A new PermutationBuilder instance * @returns A new PermutationBuilder instance
* @example * @example
* ```ts * ```ts
* builder.splitCalculated("user_id % 3"); * builder.splitCalculated({ calculation: "user_id % 3" });
* ``` * ```
*/ */
splitCalculated(options: SplitCalculatedOptions): PermutationBuilder { splitCalculated(options: SplitCalculatedOptions): PermutationBuilder {

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-darwin-arm64", "name": "@lancedb/lancedb-darwin-arm64",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["darwin"], "os": ["darwin"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node", "main": "lancedb.darwin-arm64.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-darwin-x64", "name": "@lancedb/lancedb-darwin-x64",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["darwin"], "os": ["darwin"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.darwin-x64.node", "main": "lancedb.darwin-x64.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-arm64-gnu", "name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["linux"], "os": ["linux"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node", "main": "lancedb.linux-arm64-gnu.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-arm64-musl", "name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["linux"], "os": ["linux"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node", "main": "lancedb.linux-arm64-musl.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-x64-gnu", "name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["linux"], "os": ["linux"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node", "main": "lancedb.linux-x64-gnu.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-x64-musl", "name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["linux"], "os": ["linux"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node", "main": "lancedb.linux-x64-musl.node",

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-win32-arm64-msvc", "name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": [ "os": [
"win32" "win32"
], ],

@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-win32-x64-msvc", "name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"os": ["win32"], "os": ["win32"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node", "main": "lancedb.win32-x64-msvc.node",

@@ -1,12 +1,12 @@
{ {
"name": "@lancedb/lancedb", "name": "@lancedb/lancedb",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"lockfileVersion": 3, "lockfileVersion": 3,
"requires": true, "requires": true,
"packages": { "packages": {
"": { "": {
"name": "@lancedb/lancedb", "name": "@lancedb/lancedb",
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"cpu": [ "cpu": [
"x64", "x64",
"arm64" "arm64"

@@ -11,7 +11,7 @@
"ann" "ann"
], ],
"private": false, "private": false,
"version": "0.22.4-beta.1", "version": "0.22.4-beta.3",
"main": "dist/index.js", "main": "dist/index.js",
"exports": { "exports": {
".": "./dist/index.js", ".": "./dist/index.js",

@@ -35,7 +35,7 @@ pub struct ConnectionOptions {
pub read_consistency_interval: Option<f64>, pub read_consistency_interval: Option<f64>,
/// (For LanceDB OSS only): configuration for object storage. /// (For LanceDB OSS only): configuration for object storage.
/// ///
/// The available options are described at https://lancedb.github.io/lancedb/guides/storage/ /// The available options are described at https://lancedb.com/docs/storage/
pub storage_options: Option<HashMap<String, String>>, pub storage_options: Option<HashMap<String, String>>,
/// (For LanceDB OSS only): the session to use for this connection. Holds /// (For LanceDB OSS only): the session to use for this connection. Holds
/// shared caches and other session-specific state. /// shared caches and other session-specific state.

@@ -1,5 +1,5 @@
[tool.bumpversion] [tool.bumpversion]
current_version = "0.25.4-beta.2" current_version = "0.25.4-beta.3"
parse = """(?x) parse = """(?x)
(?P<major>0|[1-9]\\d*)\\. (?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\. (?P<minor>0|[1-9]\\d*)\\.

@@ -1,6 +1,6 @@
[package] [package]
name = "lancedb-python" name = "lancedb-python"
version = "0.25.4-beta.2" version = "0.25.4-beta.3"
edition.workspace = true edition.workspace = true
description = "Python bindings for LanceDB" description = "Python bindings for LanceDB"
license.workspace = true license.workspace = true

@@ -1,11 +1,11 @@
PIP_EXTRA_INDEX_URL ?= https://pypi.fury.io/lancedb/ PIP_EXTRA_INDEX_URL ?= https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/
help: ## Show this help. help: ## Show this help.
@sed -ne '/@sed/!s/## //p' $(MAKEFILE_LIST) @sed -ne '/@sed/!s/## //p' $(MAKEFILE_LIST)
.PHONY: develop .PHONY: develop
develop: ## Install the package in development mode. develop: ## Install the package in development mode.
PIP_EXTRA_INDEX_URL=$(PIP_EXTRA_INDEX_URL) maturin develop --extras tests,dev,embeddings PIP_EXTRA_INDEX_URL="$(PIP_EXTRA_INDEX_URL)" maturin develop --extras tests,dev,embeddings
.PHONY: format .PHONY: format
format: ## Format the code. format: ## Format the code.
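
The quoting added to the `develop` recipe above matters because `PIP_EXTRA_INDEX_URL` now holds two space-separated URLs. A minimal sketch, using Python's `shlex` as a stand-in for POSIX shell word splitting (an approximation for illustration, not how make itself invokes the shell), shows how the unquoted form would split:

```python
import shlex

# The two index URLs from the updated Makefile variable.
urls = "https://pypi.fury.io/lance-format/ https://pypi.fury.io/lancedb/"

# Unquoted: the space splits the value, so the second URL becomes a
# separate word (the command to run) instead of part of the variable.
unquoted = shlex.split(f"PIP_EXTRA_INDEX_URL={urls} maturin develop")

# Quoted: both URLs stay inside a single environment assignment.
quoted = shlex.split(f'PIP_EXTRA_INDEX_URL="{urls}" maturin develop')

print(unquoted)
print(quoted)
```

In the unquoted case the variable would carry only the first URL, and the second URL would be treated as the command name.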

@@ -10,7 +10,7 @@ dependencies = [
"pyarrow>=16", "pyarrow>=16",
"pydantic>=1.10", "pydantic>=1.10",
"tqdm>=4.27.0", "tqdm>=4.27.0",
"lance-namespace>=0.0.21" "lance-namespace>=0.2.1"
] ]
description = "lancedb" description = "lancedb"
authors = [{ name = "LanceDB Devs", email = "dev@lancedb.com" }] authors = [{ name = "LanceDB Devs", email = "dev@lancedb.com" }]
@@ -45,7 +45,7 @@ repository = "https://github.com/lancedb/lancedb"
[project.optional-dependencies] [project.optional-dependencies]
pylance = [ pylance = [
"pylance>=0.25", "pylance>=1.0.0b14",
] ]
tests = [ tests = [
"aiohttp", "aiohttp",
@@ -59,7 +59,7 @@ tests = [
"polars>=0.19, <=1.3.0", "polars>=0.19, <=1.3.0",
"tantivy", "tantivy",
"pyarrow-stubs", "pyarrow-stubs",
"pylance>=1.0.0b4", "pylance>=1.0.0b14",
"requests", "requests",
"datafusion", "datafusion",
] ]

@@ -72,7 +72,7 @@ def connect(
default configuration is used. default configuration is used.
storage_options: dict, optional storage_options: dict, optional
Additional options for the storage backend. See available options at Additional options for the storage backend. See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
session: Session, optional session: Session, optional
(For LanceDB OSS only) (For LanceDB OSS only)
A session to use for this connection. Sessions allow you to configure A session to use for this connection. Sessions allow you to configure
@@ -174,7 +174,7 @@ async def connect_async(
default configuration is used. default configuration is used.
storage_options: dict, optional storage_options: dict, optional
Additional options for the storage backend. See available options at Additional options for the storage backend. See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
session: Session, optional session: Session, optional
(For LanceDB OSS only) (For LanceDB OSS only)
A session to use for this connection. Sessions allow you to configure A session to use for this connection. Sessions allow you to configure

@@ -26,7 +26,7 @@ class Connection(object):
async def close(self): ... async def close(self): ...
async def list_namespaces( async def list_namespaces(
self, self,
namespace: List[str], namespace: Optional[List[str]],
page_token: Optional[str], page_token: Optional[str],
limit: Optional[int], limit: Optional[int],
) -> List[str]: ... ) -> List[str]: ...
@@ -34,7 +34,7 @@ class Connection(object):
async def drop_namespace(self, namespace: List[str]) -> None: ... async def drop_namespace(self, namespace: List[str]) -> None: ...
async def table_names( async def table_names(
self, self,
namespace: List[str], namespace: Optional[List[str]],
start_after: Optional[str], start_after: Optional[str],
limit: Optional[int], limit: Optional[int],
) -> list[str]: ... ) -> list[str]: ...
@@ -43,7 +43,7 @@ class Connection(object):
name: str, name: str,
mode: str, mode: str,
data: pa.RecordBatchReader, data: pa.RecordBatchReader,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
location: Optional[str] = None, location: Optional[str] = None,
@@ -53,7 +53,7 @@ class Connection(object):
name: str, name: str,
mode: str, mode: str,
schema: pa.Schema, schema: pa.Schema,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
location: Optional[str] = None, location: Optional[str] = None,
@@ -61,7 +61,7 @@ class Connection(object):
async def open_table( async def open_table(
self, self,
name: str, name: str,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
@@ -71,7 +71,7 @@ class Connection(object):
self, self,
target_table_name: str, target_table_name: str,
source_uri: str, source_uri: str,
target_namespace: List[str] = [], target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None, source_version: Optional[int] = None,
source_tag: Optional[str] = None, source_tag: Optional[str] = None,
is_shallow: bool = True, is_shallow: bool = True,
@@ -80,11 +80,13 @@ class Connection(object):
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
) -> None: ... ) -> None: ...
- async def drop_table(self, name: str, namespace: List[str] = []) -> None: ...
- async def drop_all_tables(self, namespace: List[str] = []) -> None: ...
+ async def drop_table(
+ self, name: str, namespace: Optional[List[str]] = None
+ ) -> None: ...
+ async def drop_all_tables(self, namespace: Optional[List[str]] = None) -> None: ...
class Table: class Table:
def name(self) -> str: ... def name(self) -> str: ...
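
The signature changes above replace `List[str] = []` defaults with `Optional[List[str]] = None`. The diff does not state the motivation; one well-known reason for this pattern is that Python evaluates a default list once and shares it across calls. A small sketch with hypothetical helper functions (not LanceDB code) shows the difference:

```python
from typing import List, Optional


def add_shared(item: str, namespace: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, and reused.
    namespace.append(item)
    return namespace


def add_fresh(item: str, namespace: Optional[List[str]] = None) -> List[str]:
    # The same `is None` normalization that the connection classes in
    # this diff use: None is turned into a new empty list on every call.
    if namespace is None:
        namespace = []
    namespace.append(item)
    return namespace


first_shared = list(add_shared("a"))
second_shared = list(add_shared("b"))  # state leaked from the first call
first_fresh = add_fresh("a")
second_fresh = add_fresh("b")

print(second_shared, second_fresh)
```

The methods in this diff only read `namespace`, so the change may be a defensive or lint-driven cleanup rather than a fix for an observed bug.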

@@ -96,7 +96,7 @@ def data_to_reader(
f"Unknown data type {type(data)}. " f"Unknown data type {type(data)}. "
"Supported types: list of dicts, pandas DataFrame, polars DataFrame, " "Supported types: list of dicts, pandas DataFrame, polars DataFrame, "
"pyarrow Table/RecordBatch, or Pydantic models. " "pyarrow Table/RecordBatch, or Pydantic models. "
"See https://lancedb.github.io/lancedb/guides/tables/ for examples." "See https://lancedb.com/docs/tables/ for examples."
) )

@@ -54,7 +54,7 @@ class DBConnection(EnforceOverrides):
def list_namespaces( def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -75,6 +75,8 @@ class DBConnection(EnforceOverrides):
Iterable of str Iterable of str
List of immediate child namespace names List of immediate child namespace names
""" """
if namespace is None:
namespace = []
return [] return []
def create_namespace(self, namespace: List[str]) -> None: def create_namespace(self, namespace: List[str]) -> None:
@@ -107,7 +109,7 @@ class DBConnection(EnforceOverrides):
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Iterable[str]: ) -> Iterable[str]:
"""List all tables in this database, in sorted order """List all tables in this database, in sorted order
@@ -142,7 +144,7 @@ class DBConnection(EnforceOverrides):
fill_value: float = 0.0, fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
data_storage_version: Optional[str] = None, data_storage_version: Optional[str] = None,
@@ -191,7 +193,11 @@ class DBConnection(EnforceOverrides):
Additional options for the storage backend. Options already set on the Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here. connection will be inherited by the table, but can be overridden here.
See available options at See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
To enable stable row IDs (row IDs remain stable after compaction,
update, delete, and merges), set `new_table_enable_stable_row_ids`
to `"true"` in storage_options when connecting to the database.
data_storage_version: optional, str, default "stable" data_storage_version: optional, str, default "stable"
Deprecated. Set `storage_options` when connecting to the database and set Deprecated. Set `storage_options` when connecting to the database and set
`new_table_data_storage_version` in the options. `new_table_data_storage_version` in the options.
@@ -308,7 +314,7 @@ class DBConnection(EnforceOverrides):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
@@ -339,7 +345,7 @@ class DBConnection(EnforceOverrides):
Additional options for the storage backend. Options already set on the Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here. connection will be inherited by the table, but can be overridden here.
See available options at See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
Returns Returns
------- -------
@@ -347,7 +353,7 @@ class DBConnection(EnforceOverrides):
""" """
raise NotImplementedError raise NotImplementedError
def drop_table(self, name: str, namespace: List[str] = []): def drop_table(self, name: str, namespace: Optional[List[str]] = None):
"""Drop a table from the database. """Drop a table from the database.
Parameters Parameters
@@ -358,14 +364,16 @@ class DBConnection(EnforceOverrides):
The namespace to drop the table from. The namespace to drop the table from.
Empty list represents root namespace. Empty list represents root namespace.
""" """
if namespace is None:
namespace = []
raise NotImplementedError raise NotImplementedError
def rename_table( def rename_table(
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
"""Rename a table in the database. """Rename a table in the database.
@@ -382,6 +390,10 @@ class DBConnection(EnforceOverrides):
The namespace to move the table to. The namespace to move the table to.
If not specified, defaults to the same as cur_namespace. If not specified, defaults to the same as cur_namespace.
""" """
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError raise NotImplementedError
def drop_database(self): def drop_database(self):
@@ -391,7 +403,7 @@ class DBConnection(EnforceOverrides):
""" """
raise NotImplementedError raise NotImplementedError
def drop_all_tables(self, namespace: List[str] = []): def drop_all_tables(self, namespace: Optional[List[str]] = None):
""" """
Drop all tables from the database Drop all tables from the database
@@ -401,6 +413,8 @@ class DBConnection(EnforceOverrides):
The namespace to drop all tables from. The namespace to drop all tables from.
None or empty list represents root namespace. None or empty list represents root namespace.
""" """
if namespace is None:
namespace = []
raise NotImplementedError raise NotImplementedError
@property @property
@@ -541,7 +555,7 @@ class LanceDBConnection(DBConnection):
@override @override
def list_namespaces( def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -562,6 +576,8 @@ class LanceDBConnection(DBConnection):
Iterable of str Iterable of str
List of immediate child namespace names List of immediate child namespace names
""" """
if namespace is None:
namespace = []
return LOOP.run( return LOOP.run(
self._conn.list_namespaces( self._conn.list_namespaces(
namespace=namespace, page_token=page_token, limit=limit namespace=namespace, page_token=page_token, limit=limit
@@ -596,7 +612,7 @@ class LanceDBConnection(DBConnection):
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Iterable[str]: ) -> Iterable[str]:
"""Get the names of all tables in the database. The names are sorted. """Get the names of all tables in the database. The names are sorted.
@@ -614,6 +630,8 @@ class LanceDBConnection(DBConnection):
Iterator of str. Iterator of str.
A list of table names. A list of table names.
""" """
if namespace is None:
namespace = []
return LOOP.run( return LOOP.run(
self._conn.table_names( self._conn.table_names(
namespace=namespace, start_after=page_token, limit=limit namespace=namespace, start_after=page_token, limit=limit
@@ -638,7 +656,7 @@ class LanceDBConnection(DBConnection):
fill_value: float = 0.0, fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
data_storage_version: Optional[str] = None, data_storage_version: Optional[str] = None,
@@ -655,6 +673,8 @@ class LanceDBConnection(DBConnection):
--- ---
DBConnection.create_table DBConnection.create_table
""" """
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]: if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'") raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name) validate_table_name(name)
@@ -680,7 +700,7 @@ class LanceDBConnection(DBConnection):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
@@ -698,6 +718,8 @@ class LanceDBConnection(DBConnection):
------- -------
A LanceTable object representing the table. A LanceTable object representing the table.
""" """
if namespace is None:
namespace = []
if index_cache_size is not None: if index_cache_size is not None:
import warnings import warnings
@@ -723,7 +745,7 @@ class LanceDBConnection(DBConnection):
target_table_name: str, target_table_name: str,
source_uri: str, source_uri: str,
*, *,
target_namespace: List[str] = [], target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None, source_version: Optional[int] = None,
source_tag: Optional[str] = None, source_tag: Optional[str] = None,
is_shallow: bool = True, is_shallow: bool = True,
@@ -756,6 +778,8 @@ class LanceDBConnection(DBConnection):
------- -------
A LanceTable object representing the cloned table. A LanceTable object representing the cloned table.
""" """
if target_namespace is None:
target_namespace = []
LOOP.run( LOOP.run(
self._conn.clone_table( self._conn.clone_table(
target_table_name, target_table_name,
@@ -776,7 +800,7 @@ class LanceDBConnection(DBConnection):
def drop_table( def drop_table(
self, self,
name: str, name: str,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
ignore_missing: bool = False, ignore_missing: bool = False,
): ):
"""Drop a table from the database. """Drop a table from the database.
@@ -790,6 +814,8 @@ class LanceDBConnection(DBConnection):
ignore_missing: bool, default False ignore_missing: bool, default False
If True, ignore if the table does not exist. If True, ignore if the table does not exist.
""" """
if namespace is None:
namespace = []
LOOP.run( LOOP.run(
self._conn.drop_table( self._conn.drop_table(
name, namespace=namespace, ignore_missing=ignore_missing name, namespace=namespace, ignore_missing=ignore_missing
@@ -797,7 +823,9 @@ class LanceDBConnection(DBConnection):
) )
@override @override
def drop_all_tables(self, namespace: List[str] = []): def drop_all_tables(self, namespace: Optional[List[str]] = None):
if namespace is None:
namespace = []
LOOP.run(self._conn.drop_all_tables(namespace=namespace)) LOOP.run(self._conn.drop_all_tables(namespace=namespace))
@override @override
@@ -805,8 +833,8 @@ class LanceDBConnection(DBConnection):
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
"""Rename a table in the database. """Rename a table in the database.
@@ -821,6 +849,10 @@ class LanceDBConnection(DBConnection):
new_namespace: List[str], optional new_namespace: List[str], optional
The namespace to move the table to. The namespace to move the table to.
""" """
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
LOOP.run( LOOP.run(
self._conn.rename_table( self._conn.rename_table(
cur_name, cur_name,
@@ -910,7 +942,7 @@ class AsyncConnection(object):
async def list_namespaces( async def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -931,6 +963,8 @@ class AsyncConnection(object):
Iterable of str Iterable of str
List of immediate child namespace names (not full paths) List of immediate child namespace names (not full paths)
""" """
if namespace is None:
namespace = []
return await self._inner.list_namespaces( return await self._inner.list_namespaces(
namespace=namespace, page_token=page_token, limit=limit namespace=namespace, page_token=page_token, limit=limit
) )
@@ -958,7 +992,7 @@ class AsyncConnection(object):
async def table_names( async def table_names(
self, self,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
start_after: Optional[str] = None, start_after: Optional[str] = None,
limit: Optional[int] = None, limit: Optional[int] = None,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -982,6 +1016,8 @@ class AsyncConnection(object):
------- -------
Iterable of str Iterable of str
""" """
if namespace is None:
namespace = []
return await self._inner.table_names( return await self._inner.table_names(
namespace=namespace, start_after=start_after, limit=limit namespace=namespace, start_after=start_after, limit=limit
) )
@@ -998,7 +1034,7 @@ class AsyncConnection(object):
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
location: Optional[str] = None, location: Optional[str] = None,
) -> AsyncTable: ) -> AsyncTable:
@@ -1045,7 +1081,11 @@ class AsyncConnection(object):
Additional options for the storage backend. Options already set on the Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here. connection will be inherited by the table, but can be overridden here.
See available options at See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
To enable stable row IDs (row IDs remain stable after compaction,
update, delete, and merges), set `new_table_enable_stable_row_ids`
to `"true"` in storage_options when connecting to the database.
Returns Returns
------- -------
@@ -1155,6 +1195,8 @@ class AsyncConnection(object):
... await db.create_table("table4", make_batches(), schema=schema) ... await db.create_table("table4", make_batches(), schema=schema)
>>> asyncio.run(iterable_example()) >>> asyncio.run(iterable_example())
""" """
if namespace is None:
namespace = []
metadata = None metadata = None
if embedding_functions is not None: if embedding_functions is not None:
@@ -1212,7 +1254,7 @@ class AsyncConnection(object):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
@@ -1231,7 +1273,7 @@ class AsyncConnection(object):
Additional options for the storage backend. Options already set on the Additional options for the storage backend. Options already set on the
connection will be inherited by the table, but can be overridden here. connection will be inherited by the table, but can be overridden here.
See available options at See available options at
<https://lancedb.github.io/lancedb/guides/storage/> <https://lancedb.com/docs/storage/>
index_cache_size: int, default 256 index_cache_size: int, default 256
**Deprecated**: Use session-level cache configuration instead. **Deprecated**: Use session-level cache configuration instead.
Create a Session with custom cache sizes and pass it to lancedb.connect(). Create a Session with custom cache sizes and pass it to lancedb.connect().
@@ -1254,6 +1296,8 @@ class AsyncConnection(object):
------- -------
A LanceTable object representing the table. A LanceTable object representing the table.
""" """
if namespace is None:
namespace = []
table = await self._inner.open_table( table = await self._inner.open_table(
name, name,
namespace=namespace, namespace=namespace,
@@ -1269,7 +1313,7 @@ class AsyncConnection(object):
target_table_name: str, target_table_name: str,
source_uri: str, source_uri: str,
*, *,
target_namespace: List[str] = [], target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None, source_version: Optional[int] = None,
source_tag: Optional[str] = None, source_tag: Optional[str] = None,
is_shallow: bool = True, is_shallow: bool = True,
@@ -1302,6 +1346,8 @@ class AsyncConnection(object):
------- -------
An AsyncTable object representing the cloned table. An AsyncTable object representing the cloned table.
""" """
if target_namespace is None:
target_namespace = []
table = await self._inner.clone_table( table = await self._inner.clone_table(
target_table_name, target_table_name,
source_uri, source_uri,
@@ -1316,8 +1362,8 @@ class AsyncConnection(object):
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
"""Rename a table in the database. """Rename a table in the database.
@@ -1334,6 +1380,10 @@ class AsyncConnection(object):
The namespace to move the table to. The namespace to move the table to.
If not specified, defaults to the same as cur_namespace. If not specified, defaults to the same as cur_namespace.
""" """
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
await self._inner.rename_table( await self._inner.rename_table(
cur_name, new_name, cur_namespace=cur_namespace, new_namespace=new_namespace cur_name, new_name, cur_namespace=cur_namespace, new_namespace=new_namespace
) )
@@ -1342,7 +1392,7 @@ class AsyncConnection(object):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
ignore_missing: bool = False, ignore_missing: bool = False,
): ):
"""Drop a table from the database. """Drop a table from the database.
@@ -1357,6 +1407,8 @@ class AsyncConnection(object):
ignore_missing: bool, default False ignore_missing: bool, default False
If True, ignore if the table does not exist. If True, ignore if the table does not exist.
""" """
if namespace is None:
namespace = []
try: try:
await self._inner.drop_table(name, namespace=namespace) await self._inner.drop_table(name, namespace=namespace)
except ValueError as e: except ValueError as e:
@@ -1365,7 +1417,7 @@ class AsyncConnection(object):
if f"Table '{name}' was not found" not in str(e): if f"Table '{name}' was not found" not in str(e):
raise e raise e
async def drop_all_tables(self, namespace: List[str] = []): async def drop_all_tables(self, namespace: Optional[List[str]] = None):
"""Drop all tables from the database. """Drop all tables from the database.
Parameters Parameters
@@ -1374,6 +1426,8 @@ class AsyncConnection(object):
The namespace to drop all tables from. The namespace to drop all tables from.
None or empty list represents root namespace. None or empty list represents root namespace.
""" """
if namespace is None:
namespace = []
await self._inner.drop_all_tables(namespace=namespace) await self._inner.drop_all_tables(namespace=namespace)
@deprecation.deprecated( @deprecation.deprecated(
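
The docstrings in this file state that options set on the connection are inherited by the table and can be overridden per table, and the commit message for #2831 says `new_table_enable_stable_row_ids` can be set at either level. A minimal sketch of that inheritance as a plain dictionary merge follows; `effective_storage_options` is a hypothetical helper for illustration, not a LanceDB function, and the real merge happens inside the library.

```python
from typing import Dict, Optional


def effective_storage_options(
    connection_options: Optional[Dict[str, str]],
    table_options: Optional[Dict[str, str]],
) -> Dict[str, str]:
    """Hypothetical helper: table-level keys win over connection-level keys."""
    merged = dict(connection_options or {})
    merged.update(table_options or {})
    return merged


# Connection level: applies to every table created with this connection.
connection_options = {"new_table_enable_stable_row_ids": "true"}

# Table level: a per-table override passed when creating the table.
table_override = {"new_table_enable_stable_row_ids": "false"}

inherited = effective_storage_options(connection_options, None)
overridden = effective_storage_options(connection_options, table_override)

print(inherited, overridden)
```

With the library itself, dictionaries like these would be passed as the `storage_options` argument of `lancedb.connect(...)` and `create_table(...)` respectively. Note that the values are strings (`"true"`), matching the `Dict[str, str]` type in the signatures.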


@@ -609,9 +609,19 @@ class IvfPq:
class IvfRq: class IvfRq:
"""Describes an IVF RQ Index """Describes an IVF RQ Index
-IVF-RQ (Residual Quantization) stores a compressed copy of each vector using
-residual quantization and organizes them into IVF partitions. Parameters
-largely mirror IVF-PQ for consistency.
+IVF-RQ (RabitQ Quantization) compresses vectors using RabitQ quantization
+and organizes them into IVF partitions.
+The compression scheme is called RabitQ quantization. Each dimension is
+quantized into a small number of bits. The parameters `num_bits` and
+`num_partitions` control this process, providing a tradeoff between
+index size (and thus search speed) and index accuracy.
+The partitioning process is called IVF and the `num_partitions` parameter
+controls how many groups to create.
+Note that training an IVF RQ index on a large dataset is a slow operation
+and currently is also a memory intensive operation.
Attributes Attributes
---------- ----------
@@ -628,7 +638,7 @@ class IvfRq:
Number of IVF partitions to create. Number of IVF partitions to create.
num_bits: int, default 1 num_bits: int, default 1
Number of bits to encode each dimension. Number of bits to encode each dimension in the RabitQ codebook.
max_iterations: int, default 50 max_iterations: int, default 50
Max iterations to train kmeans when computing IVF partitions. Max iterations to train kmeans when computing IVF partitions.


@@ -127,13 +127,17 @@ class LanceNamespaceStorageOptionsProvider(StorageOptionsProvider):
Examples Examples
-------- --------
->>> from lance_namespace import connect as namespace_connect
->>> namespace = namespace_connect("rest", {"url": "https://..."})
->>> provider = LanceNamespaceStorageOptionsProvider(
-... namespace=namespace,
-... table_id=["my_namespace", "my_table"]
-... )
->>> options = provider.fetch_storage_options()
+Create a provider and fetch storage options::
+
+from lance_namespace import connect as namespace_connect
+
+# Connect to namespace (requires a running namespace server)
+namespace = namespace_connect("rest", {"uri": "https://..."})
+provider = LanceNamespaceStorageOptionsProvider(
+namespace=namespace,
+table_id=["my_namespace", "my_table"]
+)
+options = provider.fetch_storage_options()
""" """
def __init__(self, namespace: LanceNamespace, table_id: List[str]): def __init__(self, namespace: LanceNamespace, table_id: List[str]):
@@ -235,8 +239,10 @@ class LanceNamespaceDBConnection(DBConnection):
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Iterable[str]: ) -> Iterable[str]:
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit) request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request) response = self._ns.list_tables(request)
return response.tables if response.tables else [] return response.tables if response.tables else []
@@ -253,12 +259,14 @@ class LanceNamespaceDBConnection(DBConnection):
fill_value: float = 0.0, fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
data_storage_version: Optional[str] = None, data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None, enable_v2_manifest_paths: Optional[bool] = None,
) -> Table: ) -> Table:
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]: if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'") raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name) validate_table_name(name)
@@ -347,11 +355,13 @@ class LanceNamespaceDBConnection(DBConnection):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
) -> Table: ) -> Table:
if namespace is None:
namespace = []
table_id = namespace + [name] table_id = namespace + [name]
request = DescribeTableRequest(id=table_id) request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request) response = self._ns.describe_table(request)
@@ -381,8 +391,10 @@ class LanceNamespaceDBConnection(DBConnection):
) )
@override @override
def drop_table(self, name: str, namespace: List[str] = []): def drop_table(self, name: str, namespace: Optional[List[str]] = None):
# Use namespace drop_table directly # Use namespace drop_table directly
if namespace is None:
namespace = []
table_id = namespace + [name] table_id = namespace + [name]
request = DropTableRequest(id=table_id) request = DropTableRequest(id=table_id)
self._ns.drop_table(request) self._ns.drop_table(request)
@@ -392,9 +404,13 @@ class LanceNamespaceDBConnection(DBConnection):
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError( raise NotImplementedError(
"rename_table is not supported for namespace connections" "rename_table is not supported for namespace connections"
) )
@@ -406,14 +422,16 @@ class LanceNamespaceDBConnection(DBConnection):
) )
@override @override
def drop_all_tables(self, namespace: List[str] = []): def drop_all_tables(self, namespace: Optional[List[str]] = None):
if namespace is None:
namespace = []
for table_name in self.table_names(namespace=namespace): for table_name in self.table_names(namespace=namespace):
self.drop_table(table_name, namespace=namespace) self.drop_table(table_name, namespace=namespace)
@override @override
def list_namespaces( def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -435,6 +453,8 @@ class LanceNamespaceDBConnection(DBConnection):
Iterable[str] Iterable[str]
Names of child namespaces. Names of child namespaces.
""" """
if namespace is None:
namespace = []
request = ListNamespacesRequest( request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit id=namespace, page_token=page_token, limit=limit
) )
@@ -472,13 +492,15 @@ class LanceNamespaceDBConnection(DBConnection):
name: str, name: str,
table_uri: str, table_uri: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
) -> LanceTable: ) -> LanceTable:
# Open a table directly from a URI using the location parameter # Open a table directly from a URI using the location parameter
# Note: storage_options should already be merged by the caller # Note: storage_options should already be merged by the caller
if namespace is None:
namespace = []
temp_conn = LanceDBConnection( temp_conn = LanceDBConnection(
table_uri, # Use the table location as the connection URI table_uri, # Use the table location as the connection URI
read_consistency_interval=self.read_consistency_interval, read_consistency_interval=self.read_consistency_interval,
@@ -539,9 +561,11 @@ class AsyncLanceNamespaceDBConnection:
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Iterable[str]: ) -> Iterable[str]:
"""List table names in the namespace.""" """List table names in the namespace."""
if namespace is None:
namespace = []
request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit) request = ListTablesRequest(id=namespace, page_token=page_token, limit=limit)
response = self._ns.list_tables(request) response = self._ns.list_tables(request)
return response.tables if response.tables else [] return response.tables if response.tables else []
@@ -557,13 +581,15 @@ class AsyncLanceNamespaceDBConnection:
fill_value: float = 0.0, fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
data_storage_version: Optional[str] = None, data_storage_version: Optional[str] = None,
enable_v2_manifest_paths: Optional[bool] = None, enable_v2_manifest_paths: Optional[bool] = None,
) -> AsyncTable: ) -> AsyncTable:
"""Create a new table in the namespace.""" """Create a new table in the namespace."""
if namespace is None:
namespace = []
if mode.lower() not in ["create", "overwrite"]: if mode.lower() not in ["create", "overwrite"]:
raise ValueError("mode must be either 'create' or 'overwrite'") raise ValueError("mode must be either 'create' or 'overwrite'")
validate_table_name(name) validate_table_name(name)
@@ -655,12 +681,14 @@ class AsyncLanceNamespaceDBConnection:
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional[StorageOptionsProvider] = None, storage_options_provider: Optional[StorageOptionsProvider] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
) -> AsyncTable: ) -> AsyncTable:
"""Open an existing table from the namespace.""" """Open an existing table from the namespace."""
if namespace is None:
namespace = []
table_id = namespace + [name] table_id = namespace + [name]
request = DescribeTableRequest(id=table_id) request = DescribeTableRequest(id=table_id)
response = self._ns.describe_table(request) response = self._ns.describe_table(request)
@@ -701,8 +729,10 @@ class AsyncLanceNamespaceDBConnection:
lance_table = await asyncio.to_thread(_open_table) lance_table = await asyncio.to_thread(_open_table)
return lance_table._table return lance_table._table
async def drop_table(self, name: str, namespace: List[str] = []): async def drop_table(self, name: str, namespace: Optional[List[str]] = None):
"""Drop a table from the namespace.""" """Drop a table from the namespace."""
if namespace is None:
namespace = []
table_id = namespace + [name] table_id = namespace + [name]
request = DropTableRequest(id=table_id) request = DropTableRequest(id=table_id)
self._ns.drop_table(request) self._ns.drop_table(request)
@@ -711,10 +741,14 @@ class AsyncLanceNamespaceDBConnection:
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
"""Rename is not supported for namespace connections.""" """Rename is not supported for namespace connections."""
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
raise NotImplementedError( raise NotImplementedError(
"rename_table is not supported for namespace connections" "rename_table is not supported for namespace connections"
) )
@@ -725,15 +759,17 @@ class AsyncLanceNamespaceDBConnection:
"drop_database is deprecated, use drop_all_tables instead" "drop_database is deprecated, use drop_all_tables instead"
) )
async def drop_all_tables(self, namespace: List[str] = []): async def drop_all_tables(self, namespace: Optional[List[str]] = None):
"""Drop all tables in the namespace.""" """Drop all tables in the namespace."""
if namespace is None:
namespace = []
table_names = await self.table_names(namespace=namespace) table_names = await self.table_names(namespace=namespace)
for table_name in table_names: for table_name in table_names:
await self.drop_table(table_name, namespace=namespace) await self.drop_table(table_name, namespace=namespace)
async def list_namespaces( async def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -755,6 +791,8 @@ class AsyncLanceNamespaceDBConnection:
Iterable[str] Iterable[str]
Names of child namespaces. Names of child namespaces.
""" """
if namespace is None:
namespace = []
request = ListNamespacesRequest( request = ListNamespacesRequest(
id=namespace, page_token=page_token, limit=limit id=namespace, page_token=page_token, limit=limit
) )


@@ -883,7 +883,7 @@ class LanceQueryBuilder(ABC):
---------- ----------
where: str where: str
The where clause which is a valid SQL where clause. See The where clause which is a valid SQL where clause. See
`Lance filter pushdown <https://lancedb.github.io/lance/read_and_write.html#filter-push-down>`_ `Lance filter pushdown <https://lance.org/guide/read_and_write#filter-push-down>`_
for valid SQL expressions. for valid SQL expressions.
prefilter: bool, default True prefilter: bool, default True
If True, apply the filter before vector search, otherwise the If True, apply the filter before vector search, otherwise the
@@ -1356,7 +1356,7 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
---------- ----------
where: str where: str
The where clause which is a valid SQL where clause. See The where clause which is a valid SQL where clause. See
`Lance filter pushdown <https://lancedb.github.io/lance/read_and_write.html#filter-push-down>`_ `Lance filter pushdown <https://lance.org/guide/read_and_write#filter-push-down>`_
for valid SQL expressions. for valid SQL expressions.
prefilter: bool, default True prefilter: bool, default True
If True, apply the filter before vector search, otherwise the If True, apply the filter before vector search, otherwise the
@@ -1495,7 +1495,7 @@ class LanceFtsQueryBuilder(LanceQueryBuilder):
if self._phrase_query: if self._phrase_query:
if isinstance(query, str): if isinstance(query, str):
if not query.startswith('"') or not query.endswith('"'): if not query.startswith('"') or not query.endswith('"'):
query = f'"{query}"' self._query = f'"{query}"'
elif isinstance(query, FullTextQuery) and not isinstance( elif isinstance(query, FullTextQuery) and not isinstance(
query, PhraseQuery query, PhraseQuery
): ):
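
Note on the phrase-query fix above: the old line rebinds the local name `query`, while the new line assigns to `self._query`. The sketch below shows why that matters, assuming the rest of the method reads `self._query` later (the hunk does not show that part). `PhraseQueryBuilder` is a hypothetical stand-in, not the LanceDB class.

```python
class PhraseQueryBuilder:
    """Hypothetical stand-in for a query builder; not the LanceDB class."""

    def __init__(self, query: str) -> None:
        self._query = query

    def quote_local_only(self) -> str:
        # Old shape: the quoted string is bound to a local name and then dropped.
        query = self._query
        if not query.startswith('"') or not query.endswith('"'):
            query = f'"{query}"'
        return self._query

    def quote_on_instance(self) -> str:
        # New shape: the quoted string is stored on the instance.
        query = self._query
        if not query.startswith('"') or not query.endswith('"'):
            self._query = f'"{query}"'
        return self._query


old_result = PhraseQueryBuilder("hello world").quote_local_only()
new_result = PhraseQueryBuilder("hello world").quote_on_instance()
already_quoted = PhraseQueryBuilder('"hello"').quote_on_instance()
```

In this sketch the old shape leaves the stored query unquoted, the new shape stores the quoted form, and an already quoted string is left unchanged.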
@@ -2429,9 +2429,8 @@ class AsyncQueryBase(object):
>>> from lancedb import connect_async >>> from lancedb import connect_async
>>> async def doctest_example(): >>> async def doctest_example():
... conn = await connect_async("./.lancedb") ... conn = await connect_async("./.lancedb")
-... table = await conn.create_table("my_table", [{"vector": [99, 99]}])
-... query = [100, 100]
-... plan = await table.query().nearest_to([1, 2]).explain_plan(True)
+... table = await conn.create_table("my_table", [{"vector": [99.0, 99.0]}])
+... plan = await table.query().nearest_to([1.0, 2.0]).explain_plan(True)
... print(plan) ... print(plan)
>>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE >>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
ProjectionExec: expr=[vector@0 as vector, _distance@2 as _distance] ProjectionExec: expr=[vector@0 as vector, _distance@2 as _distance]
@@ -2440,6 +2439,7 @@ class AsyncQueryBase(object):
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false] SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2 KNNVectorDistance: metric=l2
LanceRead: uri=..., projection=[vector], ... LanceRead: uri=..., projection=[vector], ...
<BLANKLINE>
Parameters Parameters
---------- ----------
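
Note on the `<BLANKLINE>` marker added above: in a doctest, an empty line ends the expected output, so an empty line that the code really prints has to be written as `<BLANKLINE>`. The sketch below shows that plain behaviour with the standard `doctest` module under default options. `show_plan` is a hypothetical stand-in for printing a string that ends in a newline. The doctests in this diff also enable `+NORMALIZE_WHITESPACE`, which relaxes whitespace comparison, and the commit does not say why the marker was added, so this illustrates the marker only.

```python
import doctest


def show_plan() -> None:
    # Hypothetical stand-in: the printed string ends in a newline,
    # so print() emits the text followed by one empty line.
    print("Scan: table\n")


def run(doc: str) -> doctest.TestResults:
    # Parse a docstring-style snippet and run it without printing failure reports.
    test = doctest.DocTestParser().get_doctest(
        doc, {"show_plan": show_plan}, "demo", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None)


without_marker = run(">>> show_plan()\nScan: table\n")
with_marker = run(">>> show_plan()\nScan: table\n<BLANKLINE>\n")
```

Each result carries `failed` and `attempted` counts; under default options only the version with the marker matches the printed output.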
@@ -3141,10 +3141,9 @@ class AsyncHybridQuery(AsyncStandardQuery, AsyncVectorQueryBase):
>>> from lancedb.index import FTS >>> from lancedb.index import FTS
>>> async def doctest_example(): >>> async def doctest_example():
... conn = await connect_async("./.lancedb") ... conn = await connect_async("./.lancedb")
-... table = await conn.create_table("my_table", [{"vector": [99, 99], "text": "hello world"}])
+... table = await conn.create_table("my_table", [{"vector": [99.0, 99.0], "text": "hello world"}])
 ... await table.create_index("text", config=FTS(with_position=False))
-... query = [100, 100]
-... plan = await table.query().nearest_to([1, 2]).nearest_to_text("hello").explain_plan(True)
+... plan = await table.query().nearest_to([1.0, 2.0]).nearest_to_text("hello").explain_plan(True)
... print(plan) ... print(plan)
>>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE >>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
Vector Search Plan: Vector Search Plan:
@@ -3418,9 +3417,8 @@ class BaseQueryBuilder(object):
>>> from lancedb import connect_async >>> from lancedb import connect_async
>>> async def doctest_example(): >>> async def doctest_example():
... conn = await connect_async("./.lancedb") ... conn = await connect_async("./.lancedb")
-... table = await conn.create_table("my_table", [{"vector": [99, 99]}])
-... query = [100, 100]
-... plan = await table.query().nearest_to([1, 2]).explain_plan(True)
+... table = await conn.create_table("my_table", [{"vector": [99.0, 99.0]}])
+... plan = await table.query().nearest_to([1.0, 2.0]).explain_plan(True)
... print(plan) ... print(plan)
>>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE >>> asyncio.run(doctest_example()) # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
ProjectionExec: expr=[vector@0 as vector, _distance@2 as _distance] ProjectionExec: expr=[vector@0 as vector, _distance@2 as _distance]
@@ -3429,6 +3427,7 @@ class BaseQueryBuilder(object):
SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false] SortExec: TopK(fetch=10), expr=[_distance@2 ASC NULLS LAST, _rowid@1 ASC NULLS LAST], preserve_partitioning=[false]
KNNVectorDistance: metric=l2 KNNVectorDistance: metric=l2
LanceRead: uri=..., projection=[vector], ... LanceRead: uri=..., projection=[vector], ...
<BLANKLINE>
Parameters Parameters
---------- ----------


@@ -104,7 +104,7 @@ class RemoteDBConnection(DBConnection):
@override @override
def list_namespaces( def list_namespaces(
self, self,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
) -> Iterable[str]: ) -> Iterable[str]:
@@ -125,6 +125,8 @@ class RemoteDBConnection(DBConnection):
Iterable of str Iterable of str
List of immediate child namespace names List of immediate child namespace names
""" """
if namespace is None:
namespace = []
return LOOP.run( return LOOP.run(
self._conn.list_namespaces( self._conn.list_namespaces(
namespace=namespace, page_token=page_token, limit=limit namespace=namespace, page_token=page_token, limit=limit
@@ -159,7 +161,7 @@ class RemoteDBConnection(DBConnection):
page_token: Optional[str] = None, page_token: Optional[str] = None,
limit: int = 10, limit: int = 10,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Iterable[str]: ) -> Iterable[str]:
"""List the names of all tables in the database. """List the names of all tables in the database.
@@ -177,6 +179,8 @@ class RemoteDBConnection(DBConnection):
------- -------
An iterator of table names. An iterator of table names.
""" """
if namespace is None:
namespace = []
return LOOP.run( return LOOP.run(
self._conn.table_names( self._conn.table_names(
namespace=namespace, start_after=page_token, limit=limit namespace=namespace, start_after=page_token, limit=limit
@@ -188,7 +192,7 @@ class RemoteDBConnection(DBConnection):
self, self,
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
) -> Table: ) -> Table:
@@ -208,6 +212,8 @@ class RemoteDBConnection(DBConnection):
""" """
from .table import RemoteTable from .table import RemoteTable
if namespace is None:
namespace = []
if index_cache_size is not None: if index_cache_size is not None:
logging.info( logging.info(
"index_cache_size is ignored in LanceDb Cloud" "index_cache_size is ignored in LanceDb Cloud"
@@ -222,7 +228,7 @@ class RemoteDBConnection(DBConnection):
target_table_name: str, target_table_name: str,
source_uri: str, source_uri: str,
*, *,
target_namespace: List[str] = [], target_namespace: Optional[List[str]] = None,
source_version: Optional[int] = None, source_version: Optional[int] = None,
source_tag: Optional[str] = None, source_tag: Optional[str] = None,
is_shallow: bool = True, is_shallow: bool = True,
@@ -252,6 +258,8 @@ class RemoteDBConnection(DBConnection):
""" """
from .table import RemoteTable from .table import RemoteTable
if target_namespace is None:
target_namespace = []
table = LOOP.run( table = LOOP.run(
self._conn.clone_table( self._conn.clone_table(
target_table_name, target_table_name,
@@ -275,7 +283,7 @@ class RemoteDBConnection(DBConnection):
mode: Optional[str] = None, mode: Optional[str] = None,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
) -> Table: ) -> Table:
"""Create a [Table][lancedb.table.Table] in the database. """Create a [Table][lancedb.table.Table] in the database.
@@ -372,6 +380,8 @@ class RemoteDBConnection(DBConnection):
LanceTable(table4) LanceTable(table4)
""" """
if namespace is None:
namespace = []
validate_table_name(name) validate_table_name(name)
if embedding_functions is not None: if embedding_functions is not None:
logging.warning( logging.warning(
@@ -396,7 +406,7 @@ class RemoteDBConnection(DBConnection):
return RemoteTable(table, self.db_name) return RemoteTable(table, self.db_name)
@override @override
def drop_table(self, name: str, namespace: List[str] = []): def drop_table(self, name: str, namespace: Optional[List[str]] = None):
"""Drop a table from the database. """Drop a table from the database.
Parameters Parameters
@@ -407,6 +417,8 @@ class RemoteDBConnection(DBConnection):
The namespace to drop the table from. The namespace to drop the table from.
None or empty list represents root namespace. None or empty list represents root namespace.
""" """
if namespace is None:
namespace = []
LOOP.run(self._conn.drop_table(name, namespace=namespace)) LOOP.run(self._conn.drop_table(name, namespace=namespace))
@override @override
@@ -414,8 +426,8 @@ class RemoteDBConnection(DBConnection):
self, self,
cur_name: str, cur_name: str,
new_name: str, new_name: str,
cur_namespace: List[str] = [], cur_namespace: Optional[List[str]] = None,
new_namespace: List[str] = [], new_namespace: Optional[List[str]] = None,
): ):
"""Rename a table in the database. """Rename a table in the database.
@@ -426,6 +438,10 @@ class RemoteDBConnection(DBConnection):
new_name: str new_name: str
The new name of the table. The new name of the table.
""" """
if cur_namespace is None:
cur_namespace = []
if new_namespace is None:
new_namespace = []
LOOP.run( LOOP.run(
self._conn.rename_table( self._conn.rename_table(
cur_name, cur_name,


@@ -652,6 +652,17 @@ class RemoteTable(Table):
"migrate_v2_manifest_paths() is not supported on the LanceDB Cloud" "migrate_v2_manifest_paths() is not supported on the LanceDB Cloud"
) )
def head(self, n=5) -> pa.Table:
"""
Return the first `n` rows of the table.
Parameters
----------
n: int, default 5
The number of rows to return.
"""
return LOOP.run(self._table.query().limit(n).to_arrow())
def add_index(tbl: pa.Table, i: int) -> pa.Table: def add_index(tbl: pa.Table, i: int) -> pa.Table:
return tbl.add_column( return tbl.add_column(


@@ -178,7 +178,7 @@ def _into_pyarrow_reader(
f"Unknown data type {type(data)}. " f"Unknown data type {type(data)}. "
"Supported types: list of dicts, pandas DataFrame, polars DataFrame, " "Supported types: list of dicts, pandas DataFrame, polars DataFrame, "
"pyarrow Table/RecordBatch, or Pydantic models. " "pyarrow Table/RecordBatch, or Pydantic models. "
"See https://lancedb.github.io/lancedb/guides/tables/ for examples." "See https://lancedb.com/docs/tables/ for examples."
) )
@@ -1018,7 +1018,7 @@ class Table(ABC):
... .when_not_matched_insert_all() \\ ... .when_not_matched_insert_all() \\
... .execute(new_data) ... .execute(new_data)
>>> res >>> res
MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0) MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1)
>>> # The order of new rows is non-deterministic since we use >>> # The order of new rows is non-deterministic since we use
>>> # a hash-join as part of this operation and so we sort here >>> # a hash-join as part of this operation and so we sort here
>>> table.to_arrow().sort_by("a").to_pandas() >>> table.to_arrow().sort_by("a").to_pandas()
@@ -1708,13 +1708,15 @@ class LanceTable(Table):
connection: "LanceDBConnection", connection: "LanceDBConnection",
name: str, name: str,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
location: Optional[str] = None, location: Optional[str] = None,
_async: AsyncTable = None, _async: AsyncTable = None,
): ):
if namespace is None:
namespace = []
self._conn = connection self._conn = connection
self._namespace = namespace self._namespace = namespace
self._location = location # Store location for use in _dataset_path self._location = location # Store location for use in _dataset_path
@@ -1766,12 +1768,14 @@ class LanceTable(Table):
db, db,
name, name,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
index_cache_size: Optional[int] = None, index_cache_size: Optional[int] = None,
location: Optional[str] = None, location: Optional[str] = None,
): ):
if namespace is None:
namespace = []
tbl = cls( tbl = cls(
db, db,
name, name,
@@ -2623,7 +2627,7 @@ class LanceTable(Table):
fill_value: float = 0.0, fill_value: float = 0.0,
embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None, embedding_functions: Optional[List[EmbeddingFunctionConfig]] = None,
*, *,
namespace: List[str] = [], namespace: Optional[List[str]] = None,
storage_options: Optional[Dict[str, str | bool]] = None, storage_options: Optional[Dict[str, str | bool]] = None,
storage_options_provider: Optional["StorageOptionsProvider"] = None, storage_options_provider: Optional["StorageOptionsProvider"] = None,
data_storage_version: Optional[str] = None, data_storage_version: Optional[str] = None,
@@ -2683,6 +2687,8 @@ class LanceTable(Table):
Deprecated. Set `storage_options` when connecting to the database and set Deprecated. Set `storage_options` when connecting to the database and set
`new_table_enable_v2_manifest_paths` in the options. `new_table_enable_v2_manifest_paths` in the options.
""" """
if namespace is None:
namespace = []
self = cls.__new__(cls) self = cls.__new__(cls)
self._conn = db self._conn = db
self._namespace = namespace self._namespace = namespace
@@ -3628,7 +3634,7 @@ class AsyncTable:
... .when_not_matched_insert_all() \\ ... .when_not_matched_insert_all() \\
... .execute(new_data) ... .execute(new_data)
>>> res >>> res
MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0) MergeResult(version=2, num_updated_rows=2, num_inserted_rows=1, num_deleted_rows=0, num_attempts=1)
>>> # The order of new rows is non-deterministic since we use >>> # The order of new rows is non-deterministic since we use
>>> # a hash-join as part of this operation and so we sort here >>> # a hash-join as part of this operation and so we sort here
>>> table.to_arrow().sort_by("a").to_pandas() >>> table.to_arrow().sort_by("a").to_pandas()


@@ -441,6 +441,150 @@ async def test_create_table_v2_manifest_paths_async(tmp_path):
assert re.match(r"\d{20}\.manifest", manifest) assert re.match(r"\d{20}\.manifest", manifest)
@pytest.mark.asyncio
async def test_create_table_stable_row_ids_via_storage_options(tmp_path):
"""Test stable_row_ids via storage_options at connect time."""
import lance
# Connect with stable row IDs enabled as default for new tables
db_with = await lancedb.connect_async(
tmp_path, storage_options={"new_table_enable_stable_row_ids": "true"}
)
# Connect without stable row IDs (default)
db_without = await lancedb.connect_async(
tmp_path, storage_options={"new_table_enable_stable_row_ids": "false"}
)
# Create table using connection with stable row IDs enabled
await db_with.create_table(
"with_stable_via_opts",
data=[{"id": i} for i in range(10)],
)
lance_ds_with = lance.dataset(tmp_path / "with_stable_via_opts.lance")
fragments_with = lance_ds_with.get_fragments()
assert len(fragments_with) > 0
assert fragments_with[0].metadata.row_id_meta is not None
# Create table using connection without stable row IDs
await db_without.create_table(
"without_stable_via_opts",
data=[{"id": i} for i in range(10)],
)
lance_ds_without = lance.dataset(tmp_path / "without_stable_via_opts.lance")
fragments_without = lance_ds_without.get_fragments()
assert len(fragments_without) > 0
assert fragments_without[0].metadata.row_id_meta is None
+
+
+def test_create_table_stable_row_ids_via_storage_options_sync(tmp_path):
+    """Test that enable_stable_row_ids can be set via storage_options (sync API)."""
+    # Connect with stable row IDs enabled as default for new tables
+    db_with = lancedb.connect(
+        tmp_path, storage_options={"new_table_enable_stable_row_ids": "true"}
+    )
+    # Connect with stable row IDs explicitly disabled (same as the default)
+    db_without = lancedb.connect(
+        tmp_path, storage_options={"new_table_enable_stable_row_ids": "false"}
+    )
+
+    # Create table using connection with stable row IDs enabled
+    tbl_with = db_with.create_table(
+        "with_stable_sync",
+        data=[{"id": i} for i in range(10)],
+    )
+    lance_ds_with = tbl_with.to_lance()
+    fragments_with = lance_ds_with.get_fragments()
+    assert len(fragments_with) > 0
+    assert fragments_with[0].metadata.row_id_meta is not None
+
+    # Create table using connection without stable row IDs
+    tbl_without = db_without.create_table(
+        "without_stable_sync",
+        data=[{"id": i} for i in range(10)],
+    )
+    lance_ds_without = tbl_without.to_lance()
+    fragments_without = lance_ds_without.get_fragments()
+    assert len(fragments_without) > 0
+    assert fragments_without[0].metadata.row_id_meta is None
+
+
+@pytest.mark.asyncio
+async def test_create_table_stable_row_ids_table_level_override(tmp_path):
+    """Test that stable_row_ids can be enabled/disabled at create_table level."""
+    import lance
+
+    # Connect without any stable row ID setting
+    db_default = await lancedb.connect_async(tmp_path)
+    # Connect with stable row IDs enabled at connection level
+    db_with_stable = await lancedb.connect_async(
+        tmp_path, storage_options={"new_table_enable_stable_row_ids": "true"}
+    )
+
+    # Case 1: No connection setting, enable at table level
+    await db_default.create_table(
+        "table_level_enabled",
+        data=[{"id": i} for i in range(10)],
+        storage_options={"new_table_enable_stable_row_ids": "true"},
+    )
+    lance_ds = lance.dataset(tmp_path / "table_level_enabled.lance")
+    fragments = lance_ds.get_fragments()
+    assert len(fragments) > 0
+    assert fragments[0].metadata.row_id_meta is not None, (
+        "Table should have stable row IDs when enabled at table level"
+    )
+
+    # Case 2: Connection has stable row IDs, override with false at table level
+    await db_with_stable.create_table(
+        "table_level_disabled",
+        data=[{"id": i} for i in range(10)],
+        storage_options={"new_table_enable_stable_row_ids": "false"},
+    )
+    lance_ds = lance.dataset(tmp_path / "table_level_disabled.lance")
+    fragments = lance_ds.get_fragments()
+    assert len(fragments) > 0
+    assert fragments[0].metadata.row_id_meta is None, (
+        "Table should NOT have stable row IDs when disabled at table level"
+    )
+
+
+def test_create_table_stable_row_ids_table_level_override_sync(tmp_path):
+    """Test that stable_row_ids can be enabled/disabled at create_table level (sync)."""
+    # Connect without any stable row ID setting
+    db_default = lancedb.connect(tmp_path)
+    # Connect with stable row IDs enabled at connection level
+    db_with_stable = lancedb.connect(
+        tmp_path, storage_options={"new_table_enable_stable_row_ids": "true"}
+    )
+
+    # Case 1: No connection setting, enable at table level
+    tbl = db_default.create_table(
+        "table_level_enabled_sync",
+        data=[{"id": i} for i in range(10)],
+        storage_options={"new_table_enable_stable_row_ids": "true"},
+    )
+    lance_ds = tbl.to_lance()
+    fragments = lance_ds.get_fragments()
+    assert len(fragments) > 0
+    assert fragments[0].metadata.row_id_meta is not None, (
+        "Table should have stable row IDs when enabled at table level"
+    )
+
+    # Case 2: Connection has stable row IDs, override with false at table level
+    tbl = db_with_stable.create_table(
+        "table_level_disabled_sync",
+        data=[{"id": i} for i in range(10)],
+        storage_options={"new_table_enable_stable_row_ids": "false"},
+    )
+    lance_ds = tbl.to_lance()
+    fragments = lance_ds.get_fragments()
+    assert len(fragments) > 0
+    assert fragments[0].metadata.row_id_meta is None, (
+        "Table should NOT have stable row IDs when disabled at table level"
+    )
 def test_open_table_sync(tmp_db: lancedb.DBConnection):
     tmp_db.create_table("test", data=[{"id": 0}])
     assert tmp_db.open_table("test").count_rows() == 1
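
Note on the tests above: they set `new_table_enable_stable_row_ids` at two levels and expect the `create_table` value to win over the connection value. A minimal Python sketch of that precedence rule follows. It is an illustration only, not LanceDB code: `parse_bool_option` and `resolve_stable_row_ids` are hypothetical helpers, and the strict "true"/"false" parsing is an assumption meant to mirror Rust's `str::parse::<bool>()`.

```python
from typing import Mapping, Optional

# Option key as used in the tests above.
OPT_STABLE_ROW_IDS = "new_table_enable_stable_row_ids"


def parse_bool_option(value: str) -> bool:
    """Hypothetical helper: accept only the exact strings "true" and "false"."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"enable_stable_row_ids must be a boolean, received {value}")


def resolve_stable_row_ids(
    connection_options: Mapping[str, str],
    table_options: Optional[Mapping[str, str]] = None,
) -> Optional[bool]:
    """Hypothetical helper: table-level value first, then connection-level.

    Returns None when neither level sets the option, meaning the writer's
    own default is left untouched.
    """
    for options in (table_options or {}, connection_options):
        if OPT_STABLE_ROW_IDS in options:
            return parse_bool_option(options[OPT_STABLE_ROW_IDS])
    return None
```

With this model, a connection set to "true" and a table set to "false" resolves to `False`, which corresponds to Case 2 in the override tests.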


@@ -325,11 +325,18 @@ def test_search_fts_phrase_query(table):
         pass
     table.create_fts_index("text", use_tantivy=False, with_position=True, replace=True)
     results = table.search("puppy").limit(100).to_list()
+    # Test with quotation marks
     phrase_results = table.search('"puppy runs"').limit(100).to_list()
     assert len(results) > len(phrase_results)
     assert len(phrase_results) > 0
-    # Test with a query
+    # Test with .phrase_query()
+    phrase_results = table.search("puppy runs").phrase_query().limit(100).to_list()
+    assert len(results) > len(phrase_results)
+    assert len(phrase_results) > 0
+    # Test with PhraseQuery()
     phrase_results = (
         table.search(PhraseQuery("puppy runs", "text")).limit(100).to_list()
     )


@@ -546,6 +546,22 @@ def query_test_table(query_handler, *, server_version=Version("0.1.0")):
         yield table
+
+
+def test_head():
+    def handler(body):
+        assert body == {
+            "k": 5,
+            "prefilter": True,
+            "vector": [],
+            "version": None,
+        }
+
+        return pa.table({"id": [1, 2, 3]})
+
+    with query_test_table(handler) as table:
+        data = table.head(5)
+        assert data == pa.table({"id": [1, 2, 3]})
 def test_query_sync_minimal():
     def handler(body):
         assert body == {


@@ -1487,7 +1487,7 @@ def setup_hybrid_search_table(db: DBConnection, embedding_func):
     table.add([{"text": p} for p in phrases])
     # Create a fts index
-    table.create_fts_index("text")
+    table.create_fts_index("text", with_position=True)
     return table, MyTable, emb


@@ -690,7 +690,7 @@ impl FTSQuery {
     }
     pub fn get_query(&self) -> String {
-        self.fts_query.query.query().to_owned()
+        self.fts_query.query.query().clone()
     }
     pub fn to_query_request(&self) -> PyQueryRequest {


@@ -1,6 +1,6 @@
 [package]
 name = "lancedb"
-version = "0.22.4-beta.1"
+version = "0.22.4-beta.3"
 edition.workspace = true
 description = "LanceDB: A serverless, low-latency vector database for AI applications"
 license.workspace = true
@@ -105,12 +105,12 @@ test-log = "0.2"
 [features]
 default = ["aws", "gcs", "azure", "dynamodb", "oss"]
-aws = ["lance/aws", "lance-io/aws"]
-oss = ["lance/oss", "lance-io/oss"]
-gcs = ["lance/gcp", "lance-io/gcp"]
-azure = ["lance/azure", "lance-io/azure"]
+aws = ["lance/aws", "lance-io/aws", "lance-namespace-impls/dir-aws"]
+oss = ["lance/oss", "lance-io/oss", "lance-namespace-impls/dir-oss"]
+gcs = ["lance/gcp", "lance-io/gcp", "lance-namespace-impls/dir-gcp"]
+azure = ["lance/azure", "lance-io/azure", "lance-namespace-impls/dir-azure"]
 dynamodb = ["lance/dynamodb", "aws"]
-remote = ["dep:reqwest", "dep:http"]
+remote = ["dep:reqwest", "dep:http", "lance-namespace-impls/rest"]
 fp16kernels = ["lance-linalg/fp16kernels"]
 s3-test = []
 bedrock = ["dep:aws-sdk-bedrockruntime"]


@@ -239,7 +239,7 @@ impl<const HAS_DATA: bool> CreateTableBuilder<HAS_DATA> {
     /// Options already set on the connection will be inherited by the table,
     /// but can be overridden here.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
         let store_options = self
             .request
@@ -259,7 +259,7 @@ impl<const HAS_DATA: bool> CreateTableBuilder<HAS_DATA> {
     /// Options already set on the connection will be inherited by the table,
     /// but can be overridden here.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_options(
         mut self,
         pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,
@@ -442,7 +442,7 @@ impl OpenTableBuilder {
     /// Options already set on the connection will be inherited by the table,
     /// but can be overridden here.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
         let storage_options = self
             .request
@@ -461,7 +461,7 @@ impl OpenTableBuilder {
     /// Options already set on the connection will be inherited by the table,
     /// but can be overridden here.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_options(
         mut self,
         pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,
@@ -959,7 +959,7 @@ impl ConnectBuilder {
     /// Set an option for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
         self.request.options.insert(key.into(), value.into());
         self
@@ -967,7 +967,7 @@ impl ConnectBuilder {
     /// Set multiple options for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_options(
         mut self,
         pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,
@@ -1102,7 +1102,7 @@ impl ConnectNamespaceBuilder {
     /// Set an option for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
         self.storage_options.insert(key.into(), value.into());
         self
@@ -1110,7 +1110,7 @@ impl ConnectNamespaceBuilder {
     /// Set multiple options for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_options(
         mut self,
         pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,


@@ -35,6 +35,7 @@ pub const LANCE_FILE_EXTENSION: &str = "lance";
 pub const OPT_NEW_TABLE_STORAGE_VERSION: &str = "new_table_data_storage_version";
 pub const OPT_NEW_TABLE_V2_MANIFEST_PATHS: &str = "new_table_enable_v2_manifest_paths";
+pub const OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS: &str = "new_table_enable_stable_row_ids";
 /// Controls how new tables should be created
 #[derive(Clone, Debug, Default)]
@@ -48,6 +49,12 @@ pub struct NewTableConfig {
     /// V2 manifest paths are more efficient than V1 manifest paths but are not
     /// supported by old clients.
     pub enable_v2_manifest_paths: Option<bool>,
+    /// Whether to enable stable row IDs for new tables
+    ///
+    /// When enabled, row IDs remain stable after compaction, update, delete,
+    /// and merges. This is useful for materialized views and other use cases
+    /// that need to track source rows across these operations.
+    pub enable_stable_row_ids: Option<bool>,
 }
 /// Options specific to the listing database
@@ -60,7 +67,7 @@ pub struct ListingDatabaseOptions {
     /// These are used to create/list tables and they are inherited by all tables
     /// opened by this database.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub storage_options: HashMap<String, String>,
 }
@@ -87,6 +94,14 @@ impl ListingDatabaseOptions {
                     })
                 })
                 .transpose()?,
+            enable_stable_row_ids: map
+                .get(OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS)
+                .map(|s| {
+                    s.parse::<bool>().map_err(|_| Error::InvalidInput {
+                        message: format!("enable_stable_row_ids must be a boolean, received {}", s),
+                    })
+                })
+                .transpose()?,
         };
         // We just assume that any options that are not new table config options are storage options
         let storage_options = map
@@ -94,6 +109,7 @@ impl ListingDatabaseOptions {
             .filter(|(key, _)| {
                 key.as_str() != OPT_NEW_TABLE_STORAGE_VERSION
                     && key.as_str() != OPT_NEW_TABLE_V2_MANIFEST_PATHS
+                    && key.as_str() != OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS
             })
             .map(|(key, value)| (key.clone(), value.clone()))
             .collect();
@@ -118,6 +134,12 @@ impl DatabaseOptions for ListingDatabaseOptions {
                 enable_v2_manifest_paths.to_string(),
             );
         }
+        if let Some(enable_stable_row_ids) = self.new_table_config.enable_stable_row_ids {
+            map.insert(
+                OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+                enable_stable_row_ids.to_string(),
+            );
+        }
     }
 }
@@ -157,7 +179,7 @@ impl ListingDatabaseOptionsBuilder {
     /// Set an option for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_option(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
         self.options
             .storage_options
@@ -167,7 +189,7 @@ impl ListingDatabaseOptionsBuilder {
     /// Set multiple options for the storage layer.
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     pub fn storage_options(
         mut self,
         pairs: impl IntoIterator<Item = (impl Into<String>, impl Into<String>)>,
@@ -475,7 +497,7 @@ impl ListingDatabase {
                 // this error is not lance::Error::DatasetNotFound, as the method
                 // `remove_dir_all` may be used to remove something that is not a dataset
                 lance::Error::NotFound { .. } => Error::TableNotFound {
-                    name: name.to_owned(),
+                    name: name.clone(),
                     source: Box::new(err),
                 },
                 _ => Error::from(err),
@@ -497,7 +519,7 @@ impl ListingDatabase {
     fn extract_storage_overrides(
         &self,
         request: &CreateTableRequest,
-    ) -> Result<(Option<LanceFileVersion>, Option<bool>)> {
+    ) -> Result<(Option<LanceFileVersion>, Option<bool>, Option<bool>)> {
         let storage_options = request
             .write_options
             .lance_write_params
@@ -518,7 +540,19 @@ impl ListingDatabase {
                 message: "enable_v2_manifest_paths must be a boolean".to_string(),
             })?;
-        Ok((storage_version_override, v2_manifest_override))
+        let stable_row_ids_override = storage_options
+            .and_then(|opts| opts.get(OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS))
+            .map(|s| s.parse::<bool>())
+            .transpose()
+            .map_err(|_| Error::InvalidInput {
+                message: "enable_stable_row_ids must be a boolean".to_string(),
+            })?;
+        Ok((
+            storage_version_override,
+            v2_manifest_override,
+            stable_row_ids_override,
+        ))
     }
     /// Prepare write parameters for table creation
@@ -527,6 +561,7 @@ impl ListingDatabase {
         request: &CreateTableRequest,
         storage_version_override: Option<LanceFileVersion>,
         v2_manifest_override: Option<bool>,
+        stable_row_ids_override: Option<bool>,
     ) -> lance::dataset::WriteParams {
         let mut write_params = request
             .write_options
@@ -571,6 +606,13 @@ impl ListingDatabase {
             write_params.enable_v2_manifest_paths = enable_v2_manifest_paths;
         }
+        // Apply enable_stable_row_ids: table-level override takes precedence over connection config
+        if let Some(enable_stable_row_ids) =
+            stable_row_ids_override.or(self.new_table_config.enable_stable_row_ids)
+        {
+            write_params.enable_stable_row_ids = enable_stable_row_ids;
+        }
         if matches!(&request.mode, CreateTableMode::Overwrite) {
             write_params.mode = WriteMode::Overwrite;
         }
@@ -706,11 +748,15 @@ impl Database for ListingDatabase {
             .clone()
             .unwrap_or_else(|| self.table_uri(&request.name).unwrap());
-        let (storage_version_override, v2_manifest_override) =
+        let (storage_version_override, v2_manifest_override, stable_row_ids_override) =
             self.extract_storage_overrides(&request)?;
-        let write_params =
-            self.prepare_write_params(&request, storage_version_override, v2_manifest_override);
+        let write_params = self.prepare_write_params(
+            &request,
+            storage_version_override,
+            v2_manifest_override,
+            stable_row_ids_override,
+        );
         let data_schema = request.data.arrow_schema();
@@ -921,7 +967,7 @@ impl Database for ListingDatabase {
 mod tests {
     use super::*;
     use crate::connection::ConnectRequest;
-    use crate::database::{CreateTableData, CreateTableMode, CreateTableRequest};
+    use crate::database::{CreateTableData, CreateTableMode, CreateTableRequest, WriteOptions};
     use crate::table::{Table, TableDefinition};
     use arrow_array::{Int32Array, RecordBatch, StringArray};
     use arrow_schema::{DataType, Field, Schema};
@@ -1621,4 +1667,267 @@ mod tests {
         // Cloned table should have all 8 rows from the latest version
         assert_eq!(cloned_table.count_rows(None).await.unwrap(), 8);
     }
+
+    #[tokio::test]
+    async fn test_create_table_with_stable_row_ids_connection_level() {
+        let tempdir = tempdir().unwrap();
+        let uri = tempdir.path().to_str().unwrap();
+        // Create database with stable row IDs enabled at connection level
+        let mut options = HashMap::new();
+        options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "true".to_string(),
+        );
+        let request = ConnectRequest {
+            uri: uri.to_string(),
+            #[cfg(feature = "remote")]
+            client_config: Default::default(),
+            options,
+            read_consistency_interval: None,
+            session: None,
+        };
+        let db = ListingDatabase::connect_with_options(&request)
+            .await
+            .unwrap();
+        // Verify the config was parsed correctly
+        assert_eq!(db.new_table_config.enable_stable_row_ids, Some(true));
+        // Create a table - it should inherit the stable row IDs setting
+        let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
+        let batch = RecordBatch::try_new(
+            schema.clone(),
+            vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
+        )
+        .unwrap();
+        let reader = Box::new(arrow_array::RecordBatchIterator::new(
+            vec![Ok(batch)],
+            schema.clone(),
+        ));
+        let table = db
+            .create_table(CreateTableRequest {
+                name: "test_stable".to_string(),
+                namespace: vec![],
+                data: CreateTableData::Data(reader),
+                mode: CreateTableMode::Create,
+                write_options: Default::default(),
+                location: None,
+            })
+            .await
+            .unwrap();
+        // Verify table was created successfully
+        assert_eq!(table.count_rows(None).await.unwrap(), 3);
+    }
+
+    #[tokio::test]
+    async fn test_create_table_with_stable_row_ids_table_level() {
+        let (_tempdir, db) = setup_database().await;
+        // Verify connection has no stable row IDs config
+        assert_eq!(db.new_table_config.enable_stable_row_ids, None);
+        // Create a table with stable row IDs enabled at table level via storage_options
+        let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
+        let batch = RecordBatch::try_new(
+            schema.clone(),
+            vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
+        )
+        .unwrap();
+        let reader = Box::new(arrow_array::RecordBatchIterator::new(
+            vec![Ok(batch)],
+            schema.clone(),
+        ));
+        let mut storage_options = HashMap::new();
+        storage_options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "true".to_string(),
+        );
+        let write_options = WriteOptions {
+            lance_write_params: Some(lance::dataset::WriteParams {
+                store_params: Some(lance::io::ObjectStoreParams {
+                    storage_options: Some(storage_options),
+                    ..Default::default()
+                }),
+                ..Default::default()
+            }),
+        };
+        let table = db
+            .create_table(CreateTableRequest {
+                name: "test_stable_table_level".to_string(),
+                namespace: vec![],
+                data: CreateTableData::Data(reader),
+                mode: CreateTableMode::Create,
+                write_options,
+                location: None,
+            })
+            .await
+            .unwrap();
+        // Verify table was created successfully
+        assert_eq!(table.count_rows(None).await.unwrap(), 3);
+    }
+
+    #[tokio::test]
+    async fn test_create_table_stable_row_ids_table_overrides_connection() {
+        let tempdir = tempdir().unwrap();
+        let uri = tempdir.path().to_str().unwrap();
+        // Create database with stable row IDs enabled at connection level
+        let mut options = HashMap::new();
+        options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "true".to_string(),
+        );
+        let request = ConnectRequest {
+            uri: uri.to_string(),
+            #[cfg(feature = "remote")]
+            client_config: Default::default(),
+            options,
+            read_consistency_interval: None,
+            session: None,
+        };
+        let db = ListingDatabase::connect_with_options(&request)
+            .await
+            .unwrap();
+        assert_eq!(db.new_table_config.enable_stable_row_ids, Some(true));
+        // Create table with stable row IDs disabled at table level (overrides connection)
+        let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
+        let batch = RecordBatch::try_new(
+            schema.clone(),
+            vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
+        )
+        .unwrap();
+        let reader = Box::new(arrow_array::RecordBatchIterator::new(
+            vec![Ok(batch)],
+            schema.clone(),
+        ));
+        let mut storage_options = HashMap::new();
+        storage_options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "false".to_string(),
+        );
+        let write_options = WriteOptions {
+            lance_write_params: Some(lance::dataset::WriteParams {
+                store_params: Some(lance::io::ObjectStoreParams {
+                    storage_options: Some(storage_options),
+                    ..Default::default()
+                }),
+                ..Default::default()
+            }),
+        };
+        let table = db
+            .create_table(CreateTableRequest {
+                name: "test_override".to_string(),
+                namespace: vec![],
+                data: CreateTableData::Data(reader),
+                mode: CreateTableMode::Create,
+                write_options,
+                location: None,
+            })
+            .await
+            .unwrap();
+        // Verify table was created successfully
+        assert_eq!(table.count_rows(None).await.unwrap(), 3);
+    }
+
+    #[tokio::test]
+    async fn test_stable_row_ids_invalid_value() {
+        let tempdir = tempdir().unwrap();
+        let uri = tempdir.path().to_str().unwrap();
+        // Try to create database with invalid stable row IDs value
+        let mut options = HashMap::new();
+        options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "not_a_boolean".to_string(),
+        );
+        let request = ConnectRequest {
+            uri: uri.to_string(),
+            #[cfg(feature = "remote")]
+            client_config: Default::default(),
+            options,
+            read_consistency_interval: None,
+            session: None,
+        };
+        let result = ListingDatabase::connect_with_options(&request).await;
+        assert!(result.is_err());
+        assert!(matches!(
+            result.unwrap_err(),
+            Error::InvalidInput { message } if message.contains("enable_stable_row_ids must be a boolean")
+        ));
+    }
+
+    #[test]
+    fn test_stable_row_ids_config_serialization() {
+        // Test that ListingDatabaseOptions correctly serializes stable_row_ids
+        let mut options = HashMap::new();
+        options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "true".to_string(),
+        );
+        // Parse the options
+        let db_options = ListingDatabaseOptions::parse_from_map(&options).unwrap();
+        assert_eq!(
+            db_options.new_table_config.enable_stable_row_ids,
+            Some(true)
+        );
+        // Serialize back to map
+        let mut serialized = HashMap::new();
+        db_options.serialize_into_map(&mut serialized);
+        assert_eq!(
+            serialized.get(OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS),
+            Some(&"true".to_string())
+        );
+    }
+
+    #[test]
+    fn test_stable_row_ids_config_parse_false() {
+        let mut options = HashMap::new();
+        options.insert(
+            OPT_NEW_TABLE_ENABLE_STABLE_ROW_IDS.to_string(),
+            "false".to_string(),
+        );
+        let db_options = ListingDatabaseOptions::parse_from_map(&options).unwrap();
+        assert_eq!(
+            db_options.new_table_config.enable_stable_row_ids,
+            Some(false)
+        );
+    }
+
+    #[test]
+    fn test_stable_row_ids_config_not_set() {
+        let options = HashMap::new();
+        let db_options = ListingDatabaseOptions::parse_from_map(&options).unwrap();
+        assert_eq!(db_options.new_table_config.enable_stable_row_ids, None);
+    }
 }
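
Note on the round-trip tests above: they depend on the connection options being split into new-table settings and plain storage options, then written back out as strings. The sketch below models that split in Python under stated assumptions. `ParsedOptions` and the Python functions `parse_from_map` and `serialize_into_map` are hypothetical stand-ins named after the Rust methods, only the stable-row-ID flag is modelled, and the `region` key in the usage note is just an example of an unrelated storage option.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional

# Keys treated as new-table configuration rather than storage options
# (the three constants shown in the diff).
NEW_TABLE_KEYS = (
    "new_table_data_storage_version",
    "new_table_enable_v2_manifest_paths",
    "new_table_enable_stable_row_ids",
)
OPT_STABLE_ROW_IDS = "new_table_enable_stable_row_ids"


@dataclass
class ParsedOptions:
    """Hypothetical container; only the stable-row-ID flag is modelled."""

    enable_stable_row_ids: Optional[bool] = None
    storage_options: Dict[str, str] = field(default_factory=dict)


def parse_from_map(options: Mapping[str, str]) -> ParsedOptions:
    """Split a flat option map into the flag and the remaining storage options."""
    flag: Optional[bool] = None
    raw = options.get(OPT_STABLE_ROW_IDS)
    if raw is not None:
        if raw not in ("true", "false"):
            raise ValueError(
                f"enable_stable_row_ids must be a boolean, received {raw}"
            )
        flag = raw == "true"
    # Anything that is not a new-table key is assumed to be a storage option.
    storage = {k: v for k, v in options.items() if k not in NEW_TABLE_KEYS}
    return ParsedOptions(enable_stable_row_ids=flag, storage_options=storage)


def serialize_into_map(parsed: ParsedOptions, out: Dict[str, str]) -> None:
    """Write the flag back as "true"/"false"; an unset flag writes nothing."""
    if parsed.enable_stable_row_ids is not None:
        out[OPT_STABLE_ROW_IDS] = "true" if parsed.enable_stable_row_ids else "false"
```

For example, parsing `{"new_table_enable_stable_row_ids": "true", "region": "us-east-1"}` leaves only `region` in `storage_options`, and serializing the result writes the flag back as the string `"true"`.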


@@ -71,7 +71,7 @@
 //! It treats [`FixedSizeList<Float16/Float32>`](https://docs.rs/arrow/latest/arrow/array/struct.FixedSizeListArray.html)
 //! columns as vector columns.
 //!
-//! For more details, please refer to [LanceDB documentation](https://lancedb.github.io/lancedb/).
+//! For more details, please refer to the [LanceDB documentation](https://lancedb.com/docs).
 //!
 //! #### Create a table
 //!


@@ -90,7 +90,7 @@ pub struct RemoteDatabaseOptions {
     pub host_override: Option<String>,
     /// Storage options configure the storage layer (e.g. S3, GCS, Azure, etc.)
     ///
-    /// See available options at <https://lancedb.github.io/lancedb/guides/storage/>
+    /// See available options at <https://lancedb.com/docs/storage/>
     ///
     /// These options are only used for LanceDB Enterprise and only a subset of options
     /// are supported.