mirror of
https://github.com/lancedb/lancedb.git
synced 2026-07-02 18:40:40 +00:00
Compare commits
26 Commits
v0.31.0-be
...
v0.31.0
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
3f8d76817e | ||
|
|
7f32a5fd02 | ||
|
|
1545bd4b4f | ||
|
|
589331d86b | ||
|
|
56f856569c | ||
|
|
37466a0390 | ||
|
|
bfce8a510d | ||
|
|
a1261e6299 | ||
|
|
17c499177f | ||
|
|
d889321b5e | ||
|
|
8a37f2ad77 | ||
|
|
f94673ae5e | ||
|
|
3b70fc4c9d | ||
|
|
3a7b02119b | ||
|
|
bcbc0da090 | ||
|
|
9bead9f53d | ||
|
|
0351b77984 | ||
|
|
f6c9d31f98 | ||
|
|
a8f1c5a69f | ||
|
|
10fecdf051 | ||
|
|
c9ae93a7fa | ||
|
|
05756f0bbf | ||
|
|
2a0945443e | ||
|
|
39e819b6a7 | ||
|
|
70126943ff | ||
|
|
e01777070d |
137
.agents/skills/lancedb-branch-ops/SKILL.md
Normal file
137
.agents/skills/lancedb-branch-ops/SKILL.md
Normal file
@@ -0,0 +1,137 @@
|
||||
---
|
||||
name: lancedb-branch-ops
|
||||
description: Branch management for LanceDB tables via the REST API. Use this skill whenever someone wants to create, delete, list, or switch branches on a LanceDB table — or needs to make sure a write (metadata update, index build, etc.) lands on a specific branch instead of main. Invoke it even without the word "branch" if context makes clear they want an experimental copy of a table, want to isolate changes, or want to confirm a mutation didn't touch main. Covers: branches/list, branches/create, branches/delete, and passing "branch" in describe/update_field_metadata/create_index to target a non-main version.
|
||||
---
|
||||
|
||||
## Goal
|
||||
|
||||
Manage branches on a LanceDB table: list what exists, create new ones, delete stale ones, and direct read/write operations at a specific branch without touching main.
|
||||
|
||||
## Step 0: Establish the connection
|
||||
|
||||
Use the `lancedb-connect` skill to resolve the base URL and auth headers (`x-api-key`, `x-lancedb-database`). Skip this only if the connection is already known from the current conversation.
|
||||
|
||||
All examples below use `{base_url}` — substitute the resolved endpoint and include the auth headers on every request.
|
||||
|
||||
## The branch model (important)
|
||||
|
||||
LanceDB branches are named snapshots that diverge from the table's current state at creation time. There is **no checkout command** — you never switch the whole table to a branch. Instead, you **pass `"branch": "<name>"` in the request body** of any operation to target that branch. Omitting the key (or sending an empty body) always targets main.
|
||||
|
||||
`branches/list` returns only non-main branches. Main always exists and is not listed.
|
||||
|
||||
## List branches
|
||||
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/branches/list
|
||||
Content-Type: application/json
|
||||
|
||||
{}
|
||||
```
|
||||
|
||||
Response:
|
||||
```json
|
||||
{
|
||||
"branches": {
|
||||
"experiment-reindex": {"parentVersion": 1, "createAt": 1782506085, "manifestSize": 1029}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
If `branches` is `{}`, the table has no branches besides main.
|
||||
|
||||
## Create a branch
|
||||
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/branches/create
|
||||
Content-Type: application/json
|
||||
|
||||
{"name": "experiment-reindex"}
|
||||
```
|
||||
|
||||
HTTP 200 with `{}` body = success. The branch is created off the table's current state on main.
|
||||
|
||||
Verify by calling `branches/list` and confirming the new name appears.
|
||||
|
||||
## Delete a branch
|
||||
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/branches/delete
|
||||
Content-Type: application/json
|
||||
|
||||
{"name": "stale-2024"}
|
||||
```
|
||||
|
||||
HTTP 200 with `{}` body = success. Only the branch pointer is removed — main and all row data remain intact.
|
||||
|
||||
Verify by calling `branches/list` (name gone) and `describe` with no branch param (main still responds).
|
||||
|
||||
## Operate on a specific branch
|
||||
|
||||
Pass `"branch": "<name>"` in the body of any operation to scope it to that branch:
|
||||
|
||||
**Read schema on a branch:**
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/describe
|
||||
Content-Type: application/json
|
||||
|
||||
{"branch": "wip-branch"}
|
||||
```
|
||||
|
||||
**Write metadata to a branch (not main):**
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/update_field_metadata
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"branch": "wip-branch",
|
||||
"updates": [
|
||||
{
|
||||
"path": "category",
|
||||
"metadata": {"lancedb:description": "Product category label."},
|
||||
"replace": false
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Build an index on a branch:**
|
||||
```http
|
||||
POST {base_url}/v1/table/{table_id}/create_index
|
||||
Content-Type: application/json
|
||||
|
||||
{
|
||||
"branch": "wip-branch",
|
||||
"column": "category",
|
||||
"index_type": "BTREE"
|
||||
}
|
||||
```
|
||||
|
||||
## Verifying isolation
|
||||
|
||||
After writing to a branch, always confirm the change did NOT land on main:
|
||||
|
||||
```bash
|
||||
# Should show the new metadata
|
||||
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
|
||||
-H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
|
||||
-H "content-type: application/json" \
|
||||
-d '{"branch": "wip-branch"}'
|
||||
|
||||
# Should NOT show the new metadata
|
||||
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
|
||||
-H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
|
||||
-H "content-type: application/json" \
|
||||
-d '{}'
|
||||
```
|
||||
|
||||
## Quick reference
|
||||
|
||||
| Goal | Endpoint | Body |
|
||||
|------|----------|------|
|
||||
| List all branches | `branches/list` | `{}` |
|
||||
| Create a branch | `branches/create` | `{"name": "..."}` |
|
||||
| Delete a branch | `branches/delete` | `{"name": "..."}` |
|
||||
| Read schema on branch | `describe` | `{"branch": "..."}` |
|
||||
| Write metadata on branch | `update_field_metadata` | `{"branch": "...", "updates": [...]}` |
|
||||
| Build index on branch | `create_index` | `{"branch": "...", "column": ..., "index_type": ...}` |
|
||||
| Target main (default) | any endpoint | omit `"branch"` key |
|
||||
@@ -1,5 +1,5 @@
|
||||
[tool.bumpversion]
|
||||
current_version = "0.31.0-beta.4"
|
||||
current_version = "0.31.0"
|
||||
parse = """(?x)
|
||||
(?P<major>0|[1-9]\\d*)\\.
|
||||
(?P<minor>0|[1-9]\\d*)\\.
|
||||
|
||||
4
.github/workflows/cargo-publish.yml
vendored
4
.github/workflows/cargo-publish.yml
vendored
@@ -25,7 +25,7 @@ jobs:
|
||||
# Only runs on tags that matches the make-release action
|
||||
if: startsWith(github.ref, 'refs/tags/v')
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- uses: Swatinem/rust-cache@v2
|
||||
with:
|
||||
workspaces: rust
|
||||
@@ -47,7 +47,7 @@ jobs:
|
||||
contents: read
|
||||
issues: write
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- uses: ./.github/actions/create-failure-issue
|
||||
with:
|
||||
job-results: ${{ toJSON(needs) }}
|
||||
|
||||
6
.github/workflows/codex-fix-ci.yml
vendored
6
.github/workflows/codex-fix-ci.yml
vendored
@@ -36,14 +36,14 @@ jobs:
|
||||
echo "guidelines = ${{ inputs.guidelines }}"
|
||||
|
||||
- name: Checkout Repo
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
ref: ${{ inputs.branch }}
|
||||
fetch-depth: 0
|
||||
persist-credentials: true
|
||||
|
||||
- name: Set up Node.js
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
# pnpm 11 (used by the nodejs install step below) requires
|
||||
# Node >= 22.13; use 24 since 22 hits EOL in October.
|
||||
@@ -82,7 +82,7 @@ jobs:
|
||||
cache: maven
|
||||
|
||||
- name: Setup pnpm
|
||||
uses: pnpm/action-setup@v4
|
||||
uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- name: Install Node.js dependencies for TypeScript bindings
|
||||
|
||||
@@ -30,13 +30,13 @@ jobs:
|
||||
echo "tag = ${{ inputs.tag || 'latest' }}"
|
||||
|
||||
- name: Checkout Repo LanceDB
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
persist-credentials: true
|
||||
|
||||
- name: Set up Node.js
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: 20
|
||||
|
||||
|
||||
2
.github/workflows/dev.yml
vendored
2
.github/workflows/dev.yml
vendored
@@ -27,7 +27,7 @@ jobs:
|
||||
name: Verify PR title / description conforms to semantic-release
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/setup-node@v4
|
||||
- uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: "18"
|
||||
# These rules are disabled because Github will always ensure there
|
||||
|
||||
4
.github/workflows/docs.yml
vendored
4
.github/workflows/docs.yml
vendored
@@ -35,7 +35,7 @@ jobs:
|
||||
runs-on: ubuntu-24.04
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
- name: Install dependencies needed for ubuntu
|
||||
run: |
|
||||
sudo apt install -y protobuf-compiler libssl-dev
|
||||
@@ -53,7 +53,7 @@ jobs:
|
||||
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .
|
||||
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
|
||||
- name: Set up node
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: 20
|
||||
cache: 'npm'
|
||||
|
||||
4
.github/workflows/java-publish.yml
vendored
4
.github/workflows/java-publish.yml
vendored
@@ -32,7 +32,7 @@ jobs:
|
||||
working-directory: ./java
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
- name: Set up Java 8
|
||||
uses: actions/setup-java@v4
|
||||
with:
|
||||
@@ -73,7 +73,7 @@ jobs:
|
||||
contents: read
|
||||
issues: write
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- uses: ./.github/actions/create-failure-issue
|
||||
with:
|
||||
job-results: ${{ toJSON(needs) }}
|
||||
|
||||
2
.github/workflows/java.yml
vendored
2
.github/workflows/java.yml
vendored
@@ -36,7 +36,7 @@ jobs:
|
||||
working-directory: ./java
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
- name: Set up Java 17
|
||||
uses: actions/setup-java@v4
|
||||
with:
|
||||
|
||||
2
.github/workflows/license-header-check.yml
vendored
2
.github/workflows/license-header-check.yml
vendored
@@ -19,7 +19,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Check out code
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
- name: Install license-header-checker
|
||||
working-directory: /tmp
|
||||
run: |
|
||||
|
||||
2
.github/workflows/make-release-commit.yml
vendored
2
.github/workflows/make-release-commit.yml
vendored
@@ -49,7 +49,7 @@ jobs:
|
||||
steps:
|
||||
- name: Output Inputs
|
||||
run: echo "${{ toJSON(github.event.inputs) }}"
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
|
||||
20
.github/workflows/nodejs.yml
vendored
20
.github/workflows/nodejs.yml
vendored
@@ -38,14 +38,14 @@ jobs:
|
||||
CC: gcc-12
|
||||
CXX: g++-12
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
- uses: pnpm/action-setup@v4
|
||||
- uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- uses: actions/setup-node@v4
|
||||
- uses: actions/setup-node@v6
|
||||
with:
|
||||
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
|
||||
# in October. The library itself still supports Node >= 18
|
||||
@@ -86,14 +86,14 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: nodejs
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
- uses: pnpm/action-setup@v4
|
||||
- uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- uses: actions/setup-node@v4
|
||||
- uses: actions/setup-node@v6
|
||||
name: Setup Node.js 24 for build
|
||||
with:
|
||||
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
|
||||
@@ -130,7 +130,7 @@ jobs:
|
||||
echo "Run 'pnpm run docs', fix any warnings, and commit the changes."
|
||||
exit 1
|
||||
fi
|
||||
- uses: actions/setup-node@v4
|
||||
- uses: actions/setup-node@v6
|
||||
name: Setup Node.js ${{ matrix.node-version }} for test
|
||||
with:
|
||||
node-version: ${{ matrix.node-version }}
|
||||
@@ -166,14 +166,14 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: nodejs
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
- uses: pnpm/action-setup@v4
|
||||
- uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- uses: actions/setup-node@v4
|
||||
- uses: actions/setup-node@v6
|
||||
with:
|
||||
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
|
||||
# in October.
|
||||
|
||||
38
.github/workflows/npm-publish.yml
vendored
38
.github/workflows/npm-publish.yml
vendored
@@ -32,7 +32,7 @@ jobs:
|
||||
permissions:
|
||||
contents: write
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -170,13 +170,13 @@ jobs:
|
||||
run:
|
||||
working-directory: nodejs
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- name: Setup pnpm
|
||||
uses: pnpm/action-setup@v4
|
||||
uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- name: Setup node
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
|
||||
# in October.
|
||||
@@ -190,7 +190,7 @@ jobs:
|
||||
toolchain: stable
|
||||
targets: ${{ matrix.settings.target }}
|
||||
- name: Cache cargo
|
||||
uses: actions/cache@v4
|
||||
uses: actions/cache@v5
|
||||
with:
|
||||
path: |
|
||||
~/.cargo/registry/index/
|
||||
@@ -244,7 +244,7 @@ jobs:
|
||||
if: ${{ !matrix.settings.docker }}
|
||||
shell: bash
|
||||
- name: Upload artifact
|
||||
uses: actions/upload-artifact@v4
|
||||
uses: actions/upload-artifact@v7
|
||||
with:
|
||||
name: lancedb-${{ matrix.settings.target }}
|
||||
path: nodejs/dist/*.node
|
||||
@@ -256,7 +256,7 @@ jobs:
|
||||
run: pnpm tsc
|
||||
- name: Upload Generic Artifacts
|
||||
if: ${{ matrix.settings.target == 'aarch64-apple-darwin' }}
|
||||
uses: actions/upload-artifact@v4
|
||||
uses: actions/upload-artifact@v7
|
||||
with:
|
||||
name: nodejs-dist
|
||||
path: |
|
||||
@@ -287,13 +287,13 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: nodejs
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- name: Setup pnpm
|
||||
uses: pnpm/action-setup@v4
|
||||
uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- name: Setup Node.js 24 for install
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
|
||||
# in October.
|
||||
@@ -303,18 +303,18 @@ jobs:
|
||||
- name: Install dependencies
|
||||
run: pnpm install --frozen-lockfile
|
||||
- name: Setup Node.js ${{ matrix.node }} for test
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: ${{ matrix.node }}
|
||||
- name: Download artifacts
|
||||
uses: actions/download-artifact@v4
|
||||
uses: actions/download-artifact@v8
|
||||
with:
|
||||
name: lancedb-${{ matrix.settings.target }}
|
||||
path: nodejs/dist/
|
||||
# For testing purposes:
|
||||
# run-id: 13982782871
|
||||
# github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v8
|
||||
with:
|
||||
name: nodejs-dist
|
||||
path: nodejs/dist
|
||||
@@ -339,13 +339,13 @@ jobs:
|
||||
needs:
|
||||
- test-lancedb
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- name: Setup pnpm
|
||||
uses: pnpm/action-setup@v4
|
||||
uses: pnpm/action-setup@v6
|
||||
with:
|
||||
version: 11.1.1
|
||||
- name: Setup node
|
||||
uses: actions/setup-node@v4
|
||||
uses: actions/setup-node@v6
|
||||
with:
|
||||
node-version: 24
|
||||
cache: pnpm
|
||||
@@ -353,14 +353,14 @@ jobs:
|
||||
registry-url: "https://registry.npmjs.org"
|
||||
- name: Install dependencies
|
||||
run: pnpm install --frozen-lockfile
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v8
|
||||
with:
|
||||
name: nodejs-dist
|
||||
path: nodejs/dist
|
||||
# For testing purposes:
|
||||
# run-id: 13982782871
|
||||
# github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo
|
||||
- uses: actions/download-artifact@v4
|
||||
- uses: actions/download-artifact@v8
|
||||
name: Download arch-specific binaries
|
||||
with:
|
||||
pattern: lancedb-*
|
||||
@@ -398,7 +398,7 @@ jobs:
|
||||
contents: read
|
||||
issues: write
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- uses: ./.github/actions/create-failure-issue
|
||||
with:
|
||||
job-results: ${{ toJSON(needs) }}
|
||||
|
||||
14
.github/workflows/python.yml
vendored
14
.github/workflows/python.yml
vendored
@@ -41,7 +41,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -66,7 +66,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -95,7 +95,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -126,7 +126,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -160,7 +160,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -189,7 +189,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -212,7 +212,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: python
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
|
||||
14
.github/workflows/rust.yml
vendored
14
.github/workflows/rust.yml
vendored
@@ -40,7 +40,7 @@ jobs:
|
||||
CC: clang-18
|
||||
CXX: clang++-18
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -65,7 +65,7 @@ jobs:
|
||||
timeout-minutes: 10
|
||||
runs-on: ubuntu-24.04
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- uses: EmbarkStudios/cargo-deny-action@v2
|
||||
with:
|
||||
command: check advisories bans licenses sources
|
||||
@@ -78,7 +78,7 @@ jobs:
|
||||
CC: clang
|
||||
CXX: clang++
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
# Building without a lock file often requires the latest Rust version since downstream
|
||||
# dependencies may have updated their minimum Rust version.
|
||||
- uses: actions-rust-lang/setup-rust-toolchain@v1
|
||||
@@ -113,7 +113,7 @@ jobs:
|
||||
CXX: clang++-18
|
||||
GH_TOKEN: ${{ secrets.SOPHON_READ_TOKEN }}
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -152,7 +152,7 @@ jobs:
|
||||
shell: bash
|
||||
working-directory: rust
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
fetch-depth: 0
|
||||
lfs: true
|
||||
@@ -181,7 +181,7 @@ jobs:
|
||||
run:
|
||||
working-directory: rust/lancedb
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
- name: Set target
|
||||
run: rustup target add ${{ matrix.target }}
|
||||
- uses: Swatinem/rust-cache@v2
|
||||
@@ -210,7 +210,7 @@ jobs:
|
||||
CC: clang-18
|
||||
CXX: clang++-18
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
- uses: actions/checkout@v6
|
||||
with:
|
||||
submodules: true
|
||||
- name: Install dependencies
|
||||
|
||||
@@ -11,7 +11,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
ref: main
|
||||
persist-credentials: false
|
||||
|
||||
@@ -11,7 +11,7 @@ jobs:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- name: Checkout
|
||||
uses: actions/checkout@v4
|
||||
uses: actions/checkout@v6
|
||||
with:
|
||||
ref: main
|
||||
persist-credentials: false
|
||||
|
||||
152
Cargo.lock
generated
152
Cargo.lock
generated
@@ -157,9 +157,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "anyhow"
|
||||
version = "1.0.102"
|
||||
version = "1.0.103"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
|
||||
checksum = "2a4385e2e34eb35d6b3efe798b9eb88096925d87726c0798709bf56d9ed84af3"
|
||||
|
||||
[[package]]
|
||||
name = "approx"
|
||||
@@ -3186,9 +3186,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "env_filter"
|
||||
version = "1.0.1"
|
||||
version = "2.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "32e90c2accc4b07a8456ea0debdc2e7587bdd890680d71173a15d4ae604f6eef"
|
||||
checksum = "900d271a03799a1ee8d1ca9b19893b48ca674a9284fefcfb85f05e74ed314217"
|
||||
dependencies = [
|
||||
"log",
|
||||
"regex",
|
||||
@@ -3196,9 +3196,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "env_logger"
|
||||
version = "0.11.10"
|
||||
version = "0.11.11"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "0621c04f2196ac3f488dd583365b9c09be011a4ab8b9f37248ffcc8f6198b56a"
|
||||
checksum = "de671bd27a75a797dc9ae289ba1e77276e75e2026408aab65185384e2d5cd3f6"
|
||||
dependencies = [
|
||||
"anstream",
|
||||
"anstyle",
|
||||
@@ -3432,8 +3432,9 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
|
||||
|
||||
[[package]]
|
||||
name = "fsst"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7af6e24ed12cf382082d5a7f365df2b8e3d6b1518f615d54c85165e9b9718250"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"rand 0.9.4",
|
||||
@@ -4735,8 +4736,9 @@ checksum = "e037a2e1d8d5fdbd49b16a4ea09d5d6401c1f29eca5ff29d03d3824dba16256a"
|
||||
|
||||
[[package]]
|
||||
name = "lance"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "e2122e9f5f5f4b38bb9f0c4991c8d5171696402b3cc6187885c8385875f6df2e"
|
||||
dependencies = [
|
||||
"arc-swap",
|
||||
"arrow",
|
||||
@@ -4771,7 +4773,7 @@ dependencies = [
|
||||
"futures",
|
||||
"half",
|
||||
"humantime",
|
||||
"itertools 0.14.0",
|
||||
"itertools 0.13.0",
|
||||
"lance-arrow",
|
||||
"lance-core",
|
||||
"lance-datafusion",
|
||||
@@ -4810,8 +4812,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-arrow"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "95f45fb7e0822cc0233d686222ab451f6ec58879620029e0b7bae8192c2f7c7e"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
@@ -4832,7 +4835,8 @@ dependencies = [
|
||||
[[package]]
|
||||
name = "lance-arrow-scalar"
|
||||
version = "58.0.0"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "771f68b04b47f3addf781116f65061808de94b05e1e9411c23c18f32d14ebe79"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
@@ -4846,17 +4850,20 @@ dependencies = [
|
||||
[[package]]
|
||||
name = "lance-arrow-stats"
|
||||
version = "58.0.0"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "dd47ec33c90bf29f688fd02118e37d3a5ad5c339caa3163f89e417dc0867001f"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-schema",
|
||||
"half",
|
||||
"lance-arrow-scalar",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "lance-bitpacking"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "c56c21a39860ca7bae712b4e24b9fd066029c570a72428a579faf697de5b8822"
|
||||
dependencies = [
|
||||
"arrayref",
|
||||
"paste",
|
||||
@@ -4865,8 +4872,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-core"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "4ccc7b0b61c11bdb74f929111214553b36b1ee43c88445edda17bc9a2bda75e9"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
@@ -4878,7 +4886,7 @@ dependencies = [
|
||||
"datafusion-common",
|
||||
"datafusion-sql",
|
||||
"futures",
|
||||
"itertools 0.14.0",
|
||||
"itertools 0.13.0",
|
||||
"lance-arrow",
|
||||
"lance-derive",
|
||||
"libc",
|
||||
@@ -4904,8 +4912,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-datafusion"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "f0932140306faa6c3c4e17f0478c031e2be659ea2721311a530fb7c664e1b9a0"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"arrow-array",
|
||||
@@ -4935,8 +4944,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-datagen"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "075c8154e6b27ff6859f90886efd8cbac348cca255c8b855da630994b4fed714"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"arrow-array",
|
||||
@@ -4953,8 +4963,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-derive"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "7056da2e742fd466f6bdfaa876c91d3e0e99bb4f756be058e374ceae2630ec2f"
|
||||
dependencies = [
|
||||
"proc-macro2",
|
||||
"quote",
|
||||
@@ -4963,8 +4974,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-encoding"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5e165c668ae61fce336d5f6f9a999ee99b963bb395a3fb818737c533fc9528d1"
|
||||
dependencies = [
|
||||
"arrow-arith",
|
||||
"arrow-array",
|
||||
@@ -4980,7 +4992,7 @@ dependencies = [
|
||||
"futures",
|
||||
"hex",
|
||||
"hyperloglogplus",
|
||||
"itertools 0.14.0",
|
||||
"itertools 0.13.0",
|
||||
"lance-arrow",
|
||||
"lance-bitpacking",
|
||||
"lance-core",
|
||||
@@ -4999,8 +5011,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-file"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1691bd5c37bc5e07bde00e32950b5526c2e5653bd2493a3773712574366a37a1"
|
||||
dependencies = [
|
||||
"arrow-arith",
|
||||
"arrow-array",
|
||||
@@ -5030,8 +5043,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-index"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "9b4792b56f0e047eed706d05a1eedc5f549090913fb4527585bb3a331b83416b"
|
||||
dependencies = [
|
||||
"arc-swap",
|
||||
"arrow",
|
||||
@@ -5056,7 +5070,7 @@ dependencies = [
|
||||
"fst",
|
||||
"futures",
|
||||
"half",
|
||||
"itertools 0.14.0",
|
||||
"itertools 0.13.0",
|
||||
"jieba-rs",
|
||||
"jsonb",
|
||||
"lance-arrow",
|
||||
@@ -5096,8 +5110,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-io"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "6c81dda99d5b8c8741f413d65650a28ff203d94eda3cbbdda6fd3af3ea9b2a69"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"arrow-arith",
|
||||
@@ -5138,8 +5153,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-linalg"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "5c0a8d2a2bc7e6b6229abe320ec6d9e2049a0e65e084684cba01cf032fc71896"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
@@ -5150,13 +5166,13 @@ dependencies = [
|
||||
"lance-core",
|
||||
"num-traits",
|
||||
"rand 0.9.4",
|
||||
"rayon",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "lance-namespace"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "3ee2d75929caec41747e58ce090b1a061b650a16cb10d09998128d7912203b1d"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"async-trait",
|
||||
@@ -5168,8 +5184,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-namespace-impls"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "c92dd5dcb98562a91650da0b5fa63726d435fad8b03327625cc3de445b7e191a"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"arrow-ipc",
|
||||
@@ -5223,15 +5240,16 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-select"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "cf3a2162385a4392376ed7a732e070afd1432bb6f86b0ebcb7c4304c7196953b"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
"arrow-schema",
|
||||
"byteorder",
|
||||
"bytes",
|
||||
"itertools 0.14.0",
|
||||
"itertools 0.13.0",
|
||||
"lance-core",
|
||||
"roaring",
|
||||
"tracing",
|
||||
@@ -5239,8 +5257,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-table"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "73da3932a85489e80d5458d4bbc1d340e15c72b89bab529723f575d4d8fda5eb"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"arrow-array",
|
||||
@@ -5279,8 +5298,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-testing"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "f0a5a7b5dbda917170b705301d0c6a8753a8c02f501b20d8b7067b5463ba071d"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-schema",
|
||||
@@ -5293,8 +5313,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lance-tokenizer"
|
||||
version = "9.0.0-beta.8"
|
||||
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc"
|
||||
version = "8.0.0"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d16326f6b6d3b736aac40a0ffa78d8aa17d3a53a3766cd0af6d23db87472bd5e"
|
||||
dependencies = [
|
||||
"icu_segmenter",
|
||||
"jieba-rs",
|
||||
@@ -5307,12 +5328,13 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lancedb"
|
||||
version = "0.31.0-beta.3"
|
||||
version = "0.31.0-beta.6"
|
||||
dependencies = [
|
||||
"ahash",
|
||||
"anyhow",
|
||||
"arrow",
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
"arrow-cast",
|
||||
"arrow-data",
|
||||
"arrow-ipc",
|
||||
@@ -5391,7 +5413,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lancedb-nodejs"
|
||||
version = "0.31.0-beta.3"
|
||||
version = "0.31.0-beta.6"
|
||||
dependencies = [
|
||||
"arrow-array",
|
||||
"arrow-buffer",
|
||||
@@ -5416,7 +5438,7 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "lancedb-python"
|
||||
version = "0.34.0-beta.3"
|
||||
version = "0.34.0-beta.6"
|
||||
dependencies = [
|
||||
"arrow",
|
||||
"async-trait",
|
||||
@@ -5649,9 +5671,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "log"
|
||||
version = "0.4.32"
|
||||
version = "0.4.33"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a"
|
||||
checksum = "0ceec5bc11778974d1bcb055b18002eba7f4b3518b6a0081b3af5f21666da9ad"
|
||||
|
||||
[[package]]
|
||||
name = "loom"
|
||||
@@ -5959,9 +5981,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "napi"
|
||||
version = "3.9.3"
|
||||
version = "3.9.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "fbd9f9295f3ff5921e78a71222c3361a8216f7760b1a99a6ad4e8441de18bbb9"
|
||||
checksum = "b41bda2ac390efb5e8d22025d925ccc3f3807d8c1bea6d19b36127247c4b8f83"
|
||||
dependencies = [
|
||||
"bitflags 2.11.1",
|
||||
"chrono",
|
||||
@@ -5984,9 +6006,9 @@ checksum = "c9c366d2c8c60b86fa632df75f745509b52f9128f91a6bad4c796e44abb505e1"
|
||||
|
||||
[[package]]
|
||||
name = "napi-derive"
|
||||
version = "3.5.6"
|
||||
version = "3.5.7"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "89b3f766e04667e6da0e181e2da4f85475d5a6513b7cf6a80bea184e224a5b42"
|
||||
checksum = "61d66f70256ad5aef58659966064471d0ad90e2897bc36a5a5e0389c85aabc1e"
|
||||
dependencies = [
|
||||
"convert_case",
|
||||
"ctor 1.0.5",
|
||||
@@ -5998,9 +6020,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "napi-derive-backend"
|
||||
version = "5.0.4"
|
||||
version = "5.0.5"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "0d5af30503edf933ce7377cf6d4c877a62b0f1107ea05585f1b5e430e88d5baf"
|
||||
checksum = "81b4b08f15eed7a2a20c3f4c6314013fc3ac890a3afa9892b594485299ebdb2d"
|
||||
dependencies = [
|
||||
"convert_case",
|
||||
"proc-macro2",
|
||||
@@ -10128,9 +10150,9 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"
|
||||
|
||||
[[package]]
|
||||
name = "uuid"
|
||||
version = "1.23.3"
|
||||
version = "1.23.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "144d6b123cef80b301b8f72a9e2ca4370ddec21950d0a103dd22c437006d2db7"
|
||||
checksum = "bf80a72845275afea99e7f2b434723d3bc7e38470fcd1c7ed39a599c73319a53"
|
||||
dependencies = [
|
||||
"getrandom 0.4.2",
|
||||
"js-sys",
|
||||
|
||||
29
Cargo.toml
29
Cargo.toml
@@ -13,24 +13,25 @@ categories = ["database-implementations"]
|
||||
rust-version = "1.91.0"
|
||||
|
||||
[workspace.dependencies]
|
||||
lance = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-core = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-datagen = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-file = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-io = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-index = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-linalg = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-namespace = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-namespace-impls = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-table = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-testing = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-datafusion = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-encoding = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance-arrow = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" }
|
||||
lance = { "version" = "=8.0.0", default-features = false }
|
||||
lance-core = "=8.0.0"
|
||||
lance-datagen = "=8.0.0"
|
||||
lance-file = "=8.0.0"
|
||||
lance-io = { "version" = "=8.0.0", default-features = false }
|
||||
lance-index = "=8.0.0"
|
||||
lance-linalg = "=8.0.0"
|
||||
lance-namespace = "=8.0.0"
|
||||
lance-namespace-impls = { "version" = "=8.0.0", default-features = false }
|
||||
lance-table = "=8.0.0"
|
||||
lance-testing = "=8.0.0"
|
||||
lance-datafusion = "=8.0.0"
|
||||
lance-encoding = "=8.0.0"
|
||||
lance-arrow = "=8.0.0"
|
||||
ahash = "0.8"
|
||||
# Note that this one does not include pyarrow
|
||||
arrow = { version = "58.0.0", optional = false }
|
||||
arrow-array = "58.0.0"
|
||||
arrow-buffer = "58.0.0"
|
||||
arrow-data = "58.0.0"
|
||||
arrow-ipc = "58.0.0"
|
||||
arrow-ord = "58.0.0"
|
||||
|
||||
@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
|
||||
<dependency>
|
||||
<groupId>com.lancedb</groupId>
|
||||
<artifactId>lancedb-core</artifactId>
|
||||
<version>0.31.0-beta.4</version>
|
||||
<version>0.31.0</version>
|
||||
</dependency>
|
||||
```
|
||||
|
||||
|
||||
@@ -518,6 +518,9 @@ x > 5 OR y = 'test'
|
||||
|
||||
Filtering performance can often be improved by creating a scalar index
|
||||
on the filter column(s).
|
||||
|
||||
Calling this multiple times combines the filters with a logical AND rather
|
||||
than replacing the previous filter.
|
||||
```
|
||||
|
||||
#### Inherited from
|
||||
|
||||
@@ -767,6 +767,9 @@ x > 5 OR y = 'test'
|
||||
|
||||
Filtering performance can often be improved by creating a scalar index
|
||||
on the filter column(s).
|
||||
|
||||
Calling this multiple times combines the filters with a logical AND rather
|
||||
than replacing the previous filter.
|
||||
```
|
||||
|
||||
#### Inherited from
|
||||
|
||||
29
docs/src/js/enumerations/OAuthFlowType.md
Normal file
29
docs/src/js/enumerations/OAuthFlowType.md
Normal file
@@ -0,0 +1,29 @@
|
||||
[**@lancedb/lancedb**](../README.md) • **Docs**
|
||||
|
||||
***
|
||||
|
||||
[@lancedb/lancedb](../globals.md) / OAuthFlowType
|
||||
|
||||
# Enumeration: OAuthFlowType
|
||||
|
||||
OAuth authentication flow types.
|
||||
|
||||
## Enumeration Members
|
||||
|
||||
### AzureManagedIdentity
|
||||
|
||||
```ts
|
||||
AzureManagedIdentity: "azure_managed_identity";
|
||||
```
|
||||
|
||||
Azure Managed Identity via IMDS.
|
||||
|
||||
***
|
||||
|
||||
### ClientCredentials
|
||||
|
||||
```ts
|
||||
ClientCredentials: "client_credentials";
|
||||
```
|
||||
|
||||
Client Credentials grant (service-to-service / M2M).
|
||||
@@ -12,6 +12,7 @@
|
||||
## Enumerations
|
||||
|
||||
- [FullTextQueryType](enumerations/FullTextQueryType.md)
|
||||
- [OAuthFlowType](enumerations/OAuthFlowType.md)
|
||||
- [Occur](enumerations/Occur.md)
|
||||
- [Operator](enumerations/Operator.md)
|
||||
|
||||
@@ -85,6 +86,8 @@
|
||||
- [ListNamespacesResponse](interfaces/ListNamespacesResponse.md)
|
||||
- [LsmWriteSpec](interfaces/LsmWriteSpec.md)
|
||||
- [MergeResult](interfaces/MergeResult.md)
|
||||
- [NativeOAuthConfig](interfaces/NativeOAuthConfig.md)
|
||||
- [OAuthConfig](interfaces/OAuthConfig.md)
|
||||
- [OpenTableOptions](interfaces/OpenTableOptions.md)
|
||||
- [OptimizeOptions](interfaces/OptimizeOptions.md)
|
||||
- [OptimizeStats](interfaces/OptimizeStats.md)
|
||||
|
||||
@@ -64,6 +64,19 @@ client used by manifest-enabled native connections.
|
||||
|
||||
***
|
||||
|
||||
### oauthConfig?
|
||||
|
||||
```ts
|
||||
optional oauthConfig: NativeOAuthConfig;
|
||||
```
|
||||
|
||||
(For LanceDB cloud only): OAuth configuration for IdP-based
|
||||
authentication (e.g., Azure Entra ID). When set, token acquisition
|
||||
and refresh are handled entirely in Rust. TypeScript users should pass
|
||||
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
|
||||
|
||||
***
|
||||
|
||||
### readConsistencyInterval?
|
||||
|
||||
```ts
|
||||
|
||||
88
docs/src/js/interfaces/NativeOAuthConfig.md
Normal file
88
docs/src/js/interfaces/NativeOAuthConfig.md
Normal file
@@ -0,0 +1,88 @@
|
||||
[**@lancedb/lancedb**](../README.md) • **Docs**
|
||||
|
||||
***
|
||||
|
||||
[@lancedb/lancedb](../globals.md) / NativeOAuthConfig
|
||||
|
||||
# Interface: NativeOAuthConfig
|
||||
|
||||
OAuth configuration for LanceDB authentication.
|
||||
|
||||
This is the generated napi-rs binding shape. TypeScript users should prefer
|
||||
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
|
||||
|
||||
All token acquisition and refresh is handled in the Rust layer.
|
||||
|
||||
## Properties
|
||||
|
||||
### clientId
|
||||
|
||||
```ts
|
||||
clientId: string;
|
||||
```
|
||||
|
||||
Application / Client ID.
|
||||
|
||||
***
|
||||
|
||||
### clientSecret?
|
||||
|
||||
```ts
|
||||
optional clientSecret: string;
|
||||
```
|
||||
|
||||
Client secret (required for client_credentials).
|
||||
|
||||
***
|
||||
|
||||
### flow?
|
||||
|
||||
```ts
|
||||
optional flow: string;
|
||||
```
|
||||
|
||||
Authentication flow: "client_credentials" or "azure_managed_identity"
|
||||
|
||||
***
|
||||
|
||||
### issuerUrl
|
||||
|
||||
```ts
|
||||
issuerUrl: string;
|
||||
```
|
||||
|
||||
OIDC issuer URL or OAuth authority URL.
|
||||
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
|
||||
|
||||
***
|
||||
|
||||
### managedIdentityClientId?
|
||||
|
||||
```ts
|
||||
optional managedIdentityClientId: string;
|
||||
```
|
||||
|
||||
Client ID for user-assigned managed identity (azure_managed_identity).
|
||||
|
||||
***
|
||||
|
||||
### refreshBufferSecs?
|
||||
|
||||
```ts
|
||||
optional refreshBufferSecs: number;
|
||||
```
|
||||
|
||||
Seconds before expiry to trigger proactive refresh (default: 300).
|
||||
Keep this well below the token TTL; if it is greater than or equal to
|
||||
the TTL, each request refreshes the token.
|
||||
|
||||
***
|
||||
|
||||
### scopes
|
||||
|
||||
```ts
|
||||
scopes: string[];
|
||||
```
|
||||
|
||||
OAuth scopes to request. For Azure managed identity, exactly one scope
|
||||
or resource is required. For example: `["api://{app_id}/.default"]`
|
||||
111
docs/src/js/interfaces/OAuthConfig.md
Normal file
111
docs/src/js/interfaces/OAuthConfig.md
Normal file
@@ -0,0 +1,111 @@
|
||||
[**@lancedb/lancedb**](../README.md) • **Docs**
|
||||
|
||||
***
|
||||
|
||||
[@lancedb/lancedb](../globals.md) / OAuthConfig
|
||||
|
||||
# Interface: OAuthConfig
|
||||
|
||||
OAuth configuration for LanceDB authentication.
|
||||
|
||||
This is the public TypeScript OAuth configuration type. The generated
|
||||
`NativeOAuthConfig` type has the same runtime shape but is an implementation
|
||||
detail of the napi-rs binding.
|
||||
|
||||
All token acquisition and refresh is handled in the Rust layer.
|
||||
This config is passed through to Rust via napi-rs.
|
||||
|
||||
## Examples
|
||||
|
||||
```typescript
|
||||
const config: OAuthConfig = {
|
||||
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
clientId: "app-id",
|
||||
clientSecret: "secret",
|
||||
scopes: ["api://lancedb-api/.default"],
|
||||
};
|
||||
```
|
||||
|
||||
```typescript
|
||||
const config: OAuthConfig = {
|
||||
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
clientId: "app-id",
|
||||
scopes: ["api://lancedb-api/.default"],
|
||||
flow: OAuthFlowType.AzureManagedIdentity,
|
||||
};
|
||||
```
|
||||
|
||||
## Properties
|
||||
|
||||
### clientId
|
||||
|
||||
```ts
|
||||
clientId: string;
|
||||
```
|
||||
|
||||
Application / Client ID.
|
||||
|
||||
***
|
||||
|
||||
### clientSecret?
|
||||
|
||||
```ts
|
||||
optional clientSecret: string;
|
||||
```
|
||||
|
||||
Client secret (required for ClientCredentials).
|
||||
|
||||
***
|
||||
|
||||
### flow?
|
||||
|
||||
```ts
|
||||
optional flow: OAuthFlowType;
|
||||
```
|
||||
|
||||
Authentication flow (default: ClientCredentials).
|
||||
|
||||
***
|
||||
|
||||
### issuerUrl
|
||||
|
||||
```ts
|
||||
issuerUrl: string;
|
||||
```
|
||||
|
||||
OIDC issuer URL or OAuth authority URL.
|
||||
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
|
||||
|
||||
***
|
||||
|
||||
### managedIdentityClientId?
|
||||
|
||||
```ts
|
||||
optional managedIdentityClientId: string;
|
||||
```
|
||||
|
||||
Client ID for user-assigned managed identity (AzureManagedIdentity).
|
||||
|
||||
***
|
||||
|
||||
### refreshBufferSecs?
|
||||
|
||||
```ts
|
||||
optional refreshBufferSecs: number;
|
||||
```
|
||||
|
||||
Seconds before expiry to trigger proactive refresh (default: 300).
|
||||
Keep this well below the token TTL; if it is greater than or equal to
|
||||
the TTL, each request refreshes the token.
|
||||
|
||||
***
|
||||
|
||||
### scopes
|
||||
|
||||
```ts
|
||||
scopes: string[];
|
||||
```
|
||||
|
||||
OAuth scopes to request.
|
||||
For Azure managed identity, exactly one scope or resource is required.
|
||||
For example: `["api://{app_id}/.default"]`
|
||||
@@ -8,7 +8,7 @@
|
||||
<parent>
|
||||
<groupId>com.lancedb</groupId>
|
||||
<artifactId>lancedb-parent</artifactId>
|
||||
<version>0.31.0-beta.4</version>
|
||||
<version>0.31.0-final.0</version>
|
||||
<relativePath>../pom.xml</relativePath>
|
||||
</parent>
|
||||
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
|
||||
<groupId>com.lancedb</groupId>
|
||||
<artifactId>lancedb-parent</artifactId>
|
||||
<version>0.31.0-beta.4</version>
|
||||
<version>0.31.0-final.0</version>
|
||||
<packaging>pom</packaging>
|
||||
<name>${project.artifactId}</name>
|
||||
<description>LanceDB Java SDK Parent POM</description>
|
||||
@@ -28,7 +28,7 @@
|
||||
<properties>
|
||||
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
|
||||
<arrow.version>15.0.0</arrow.version>
|
||||
<lance-core.version>9.0.0-beta.8</lance-core.version>
|
||||
<lance-core.version>8.0.0</lance-core.version>
|
||||
<spotless.skip>false</spotless.skip>
|
||||
<spotless.version>2.30.0</spotless.version>
|
||||
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>
|
||||
|
||||
@@ -1,7 +1,7 @@
|
||||
[package]
|
||||
name = "lancedb-nodejs"
|
||||
edition.workspace = true
|
||||
version = "0.31.0-beta.4"
|
||||
version = "0.31.0"
|
||||
publish = false
|
||||
license.workspace = true
|
||||
description.workspace = true
|
||||
|
||||
@@ -215,6 +215,20 @@ describe("Query orderBy", () => {
|
||||
expect(results[2].score).toBeCloseTo(4.1, 0.001);
|
||||
});
|
||||
|
||||
it("should combine repeated where clauses with AND", async () => {
|
||||
const results = await table
|
||||
.query()
|
||||
.where("score > 1.0")
|
||||
.where("score < 3.0")
|
||||
.orderBy({ columnName: "score" })
|
||||
.toArray();
|
||||
// Only rows matching both predicates should be returned, rather than the
|
||||
// second where() silently replacing the first.
|
||||
expect(results.length).toBe(2);
|
||||
expect(results[0].score).toBeCloseTo(1.2, 0.001);
|
||||
expect(results[1].score).toBeCloseTo(2.8, 0.001);
|
||||
});
|
||||
|
||||
it("should support method chaining with limit", async () => {
|
||||
const results = await table
|
||||
.query()
|
||||
|
||||
@@ -52,6 +52,7 @@ export {
|
||||
SplitHashOptions,
|
||||
SplitSequentialOptions,
|
||||
ShuffleOptions,
|
||||
OAuthConfig as NativeOAuthConfig,
|
||||
} from "./native.js";
|
||||
|
||||
export {
|
||||
@@ -130,6 +131,8 @@ export {
|
||||
TokenResponse,
|
||||
} from "./header";
|
||||
|
||||
export { OAuthConfig, OAuthFlowType } from "./oauth";
|
||||
|
||||
export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
|
||||
|
||||
export * as embedding from "./embedding";
|
||||
|
||||
76
nodejs/lancedb/oauth.ts
Normal file
76
nodejs/lancedb/oauth.ts
Normal file
@@ -0,0 +1,76 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
/**
|
||||
* OAuth authentication flow types.
|
||||
*/
|
||||
export enum OAuthFlowType {
|
||||
/** Client Credentials grant (service-to-service / M2M). */
|
||||
ClientCredentials = "client_credentials",
|
||||
/** Azure Managed Identity via IMDS. */
|
||||
AzureManagedIdentity = "azure_managed_identity",
|
||||
}
|
||||
|
||||
/**
|
||||
* OAuth configuration for LanceDB authentication.
|
||||
*
|
||||
* This is the public TypeScript OAuth configuration type. The generated
|
||||
* `NativeOAuthConfig` type has the same runtime shape but is an implementation
|
||||
* detail of the napi-rs binding.
|
||||
*
|
||||
* All token acquisition and refresh is handled in the Rust layer.
|
||||
* This config is passed through to Rust via napi-rs.
|
||||
*
|
||||
* @example Client Credentials (service-to-service):
|
||||
* ```typescript
|
||||
* const config: OAuthConfig = {
|
||||
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
* clientId: "app-id",
|
||||
* clientSecret: "secret",
|
||||
* scopes: ["api://lancedb-api/.default"],
|
||||
* };
|
||||
* ```
|
||||
*
|
||||
* @example Azure Managed Identity:
|
||||
* ```typescript
|
||||
* const config: OAuthConfig = {
|
||||
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
* clientId: "app-id",
|
||||
* scopes: ["api://lancedb-api/.default"],
|
||||
* flow: OAuthFlowType.AzureManagedIdentity,
|
||||
* };
|
||||
* ```
|
||||
*/
|
||||
export interface OAuthConfig {
|
||||
/**
|
||||
* OIDC issuer URL or OAuth authority URL.
|
||||
* For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
|
||||
*/
|
||||
issuerUrl: string;
|
||||
|
||||
/** Application / Client ID. */
|
||||
clientId: string;
|
||||
|
||||
/**
|
||||
* OAuth scopes to request.
|
||||
* For Azure managed identity, exactly one scope or resource is required.
|
||||
* For example: `["api://{app_id}/.default"]`
|
||||
*/
|
||||
scopes: string[];
|
||||
|
||||
/** Authentication flow (default: ClientCredentials). */
|
||||
flow?: OAuthFlowType;
|
||||
|
||||
/** Client secret (required for ClientCredentials). */
|
||||
clientSecret?: string;
|
||||
|
||||
/** Client ID for user-assigned managed identity (AzureManagedIdentity). */
|
||||
managedIdentityClientId?: string;
|
||||
|
||||
/**
|
||||
* Seconds before expiry to trigger proactive refresh (default: 300).
|
||||
* Keep this well below the token TTL; if it is greater than or equal to
|
||||
* the TTL, each request refreshes the token.
|
||||
*/
|
||||
refreshBufferSecs?: number;
|
||||
}
|
||||
@@ -362,6 +362,9 @@ export class StandardQueryBase<
|
||||
*
|
||||
* Filtering performance can often be improved by creating a scalar index
|
||||
* on the filter column(s).
|
||||
*
|
||||
* Calling this multiple times combines the filters with a logical AND rather
|
||||
* than replacing the previous filter.
|
||||
*/
|
||||
where(predicate: string): this {
|
||||
this.doCall((inner: NativeQueryType) => inner.onlyIf(predicate));
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-darwin-arm64",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["darwin"],
|
||||
"cpu": ["arm64"],
|
||||
"main": "lancedb.darwin-arm64.node",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-linux-arm64-gnu",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["linux"],
|
||||
"cpu": ["arm64"],
|
||||
"main": "lancedb.linux-arm64-gnu.node",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-linux-arm64-musl",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["linux"],
|
||||
"cpu": ["arm64"],
|
||||
"main": "lancedb.linux-arm64-musl.node",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-linux-x64-gnu",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["linux"],
|
||||
"cpu": ["x64"],
|
||||
"main": "lancedb.linux-x64-gnu.node",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-linux-x64-musl",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["linux"],
|
||||
"cpu": ["x64"],
|
||||
"main": "lancedb.linux-x64-musl.node",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-win32-arm64-msvc",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": [
|
||||
"win32"
|
||||
],
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb-win32-x64-msvc",
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"os": ["win32"],
|
||||
"cpu": ["x64"],
|
||||
"main": "lancedb.win32-x64-msvc.node",
|
||||
|
||||
4
nodejs/package-lock.json
generated
4
nodejs/package-lock.json
generated
@@ -1,12 +1,12 @@
|
||||
{
|
||||
"name": "@lancedb/lancedb",
|
||||
"version": "0.31.0-beta.3",
|
||||
"version": "0.31.0-beta.6",
|
||||
"lockfileVersion": 3,
|
||||
"requires": true,
|
||||
"packages": {
|
||||
"": {
|
||||
"name": "@lancedb/lancedb",
|
||||
"version": "0.31.0-beta.3",
|
||||
"version": "0.31.0-beta.6",
|
||||
"cpu": [
|
||||
"x64",
|
||||
"arm64"
|
||||
|
||||
@@ -11,7 +11,7 @@
|
||||
"ann"
|
||||
],
|
||||
"private": false,
|
||||
"version": "0.31.0-beta.4",
|
||||
"version": "0.31.0",
|
||||
"main": "dist/index.js",
|
||||
"exports": {
|
||||
".": "./dist/index.js",
|
||||
|
||||
@@ -112,6 +112,12 @@ impl Connection {
|
||||
|
||||
builder = builder.client_config(rust_config);
|
||||
|
||||
if let Some(oauth_config) = options.oauth_config {
|
||||
let config: lancedb::remote::oauth::OAuthConfig =
|
||||
oauth_config.try_into().default_error()?;
|
||||
builder = builder.oauth_config(config);
|
||||
}
|
||||
|
||||
if let Some(api_key) = options.api_key {
|
||||
builder = builder.api_key(&api_key);
|
||||
}
|
||||
|
||||
@@ -65,6 +65,11 @@ pub struct ConnectionOptions {
|
||||
/// (For LanceDB cloud only): the host to use for LanceDB cloud. Used
|
||||
/// for testing purposes.
|
||||
pub host_override: Option<String>,
|
||||
/// (For LanceDB cloud only): OAuth configuration for IdP-based
|
||||
/// authentication (e.g., Azure Entra ID). When set, token acquisition
|
||||
/// and refresh are handled entirely in Rust. TypeScript users should pass
|
||||
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
|
||||
pub oauth_config: Option<remote::OAuthConfig>,
|
||||
}
|
||||
|
||||
#[napi(object)]
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
|
||||
use std::time::Duration;
|
||||
|
||||
use lancedb::{arrow::IntoArrow, ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
|
||||
use lancedb::{ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
|
||||
use napi::bindgen_prelude::*;
|
||||
use napi_derive::napi;
|
||||
|
||||
@@ -66,11 +66,9 @@ impl NativeMergeInsertBuilder {
|
||||
|
||||
#[napi(catch_unwind)]
|
||||
pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
|
||||
let data = ipc_file_to_batches(buf.to_vec())
|
||||
.and_then(IntoArrow::into_arrow)
|
||||
.map_err(|e| {
|
||||
napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
|
||||
})?;
|
||||
let data = ipc_file_to_batches(buf.to_vec()).map_err(|e| {
|
||||
napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
|
||||
})?;
|
||||
|
||||
let this = self.clone();
|
||||
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
|
||||
use std::collections::HashMap;
|
||||
|
||||
use lancedb::error::Error;
|
||||
use napi_derive::*;
|
||||
|
||||
/// Timeout configuration for remote HTTP client.
|
||||
@@ -140,6 +141,84 @@ impl From<TlsConfig> for lancedb::remote::TlsConfig {
|
||||
}
|
||||
}
|
||||
|
||||
/// OAuth configuration for LanceDB authentication.
|
||||
///
|
||||
/// This is the generated napi-rs binding shape. TypeScript users should prefer
|
||||
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
|
||||
///
|
||||
/// All token acquisition and refresh is handled in the Rust layer.
|
||||
#[napi(object)]
|
||||
#[derive(Clone)]
|
||||
pub struct OAuthConfig {
|
||||
/// OIDC issuer URL or OAuth authority URL.
|
||||
/// For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
|
||||
pub issuer_url: String,
|
||||
/// Application / Client ID.
|
||||
pub client_id: String,
|
||||
/// OAuth scopes to request. For Azure managed identity, exactly one scope
|
||||
/// or resource is required. For example: `["api://{app_id}/.default"]`
|
||||
pub scopes: Vec<String>,
|
||||
/// Authentication flow: "client_credentials" or "azure_managed_identity"
|
||||
pub flow: Option<String>,
|
||||
/// Client secret (required for client_credentials).
|
||||
pub client_secret: Option<String>,
|
||||
/// Client ID for user-assigned managed identity (azure_managed_identity).
|
||||
pub managed_identity_client_id: Option<String>,
|
||||
/// Seconds before expiry to trigger proactive refresh (default: 300).
|
||||
/// Keep this well below the token TTL; if it is greater than or equal to
|
||||
/// the TTL, each request refreshes the token.
|
||||
pub refresh_buffer_secs: Option<u32>,
|
||||
}
|
||||
|
||||
impl std::fmt::Debug for OAuthConfig {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
f.debug_struct("OAuthConfig")
|
||||
.field("issuer_url", &self.issuer_url)
|
||||
.field("client_id", &self.client_id)
|
||||
.field("scopes", &self.scopes)
|
||||
.field("flow", &self.flow)
|
||||
.field(
|
||||
"client_secret",
|
||||
&self.client_secret.as_deref().map(|_| "<redacted>"),
|
||||
)
|
||||
.field(
|
||||
"managed_identity_client_id",
|
||||
&self.managed_identity_client_id,
|
||||
)
|
||||
.field("refresh_buffer_secs", &self.refresh_buffer_secs)
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
|
||||
impl TryFrom<OAuthConfig> for lancedb::remote::oauth::OAuthConfig {
|
||||
type Error = Error;
|
||||
|
||||
fn try_from(config: OAuthConfig) -> Result<Self, Self::Error> {
|
||||
use lancedb::remote::oauth::OAuthFlow;
|
||||
|
||||
let flow = match config.flow.as_deref().unwrap_or("client_credentials") {
|
||||
"client_credentials" => OAuthFlow::ClientCredentials,
|
||||
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
|
||||
client_id: config.managed_identity_client_id,
|
||||
},
|
||||
other => {
|
||||
return Err(Error::InvalidInput {
|
||||
message: format!("Unknown OAuth flow type: {other}"),
|
||||
});
|
||||
}
|
||||
};
|
||||
|
||||
Ok(Self {
|
||||
issuer_url: config.issuer_url,
|
||||
client_id: config.client_id,
|
||||
client_secret: config.client_secret,
|
||||
scopes: config.scopes,
|
||||
flow,
|
||||
refresh_buffer_secs: config.refresh_buffer_secs.map(|v| v as u64),
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
impl From<ClientConfig> for lancedb::remote::ClientConfig {
|
||||
fn from(config: ClientConfig) -> Self {
|
||||
Self {
|
||||
@@ -156,3 +235,45 @@ impl From<ClientConfig> for lancedb::remote::ClientConfig {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_unknown_oauth_flow_returns_invalid_input() {
|
||||
let config = OAuthConfig {
|
||||
issuer_url: "https://issuer.example.com".to_string(),
|
||||
client_id: "client-id".to_string(),
|
||||
scopes: vec!["scope".to_string()],
|
||||
flow: Some("typo".to_string()),
|
||||
client_secret: None,
|
||||
managed_identity_client_id: None,
|
||||
refresh_buffer_secs: None,
|
||||
};
|
||||
|
||||
let err = lancedb::remote::oauth::OAuthConfig::try_from(config).unwrap_err();
|
||||
assert!(matches!(
|
||||
err,
|
||||
Error::InvalidInput { message }
|
||||
if message == "Unknown OAuth flow type: typo"
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_oauth_config_debug_redacts_client_secret() {
|
||||
let config = OAuthConfig {
|
||||
issuer_url: "https://issuer.example.com".to_string(),
|
||||
client_id: "client-id".to_string(),
|
||||
scopes: vec!["scope".to_string()],
|
||||
flow: Some("client_credentials".to_string()),
|
||||
client_secret: Some("super-secret".to_string()),
|
||||
managed_identity_client_id: None,
|
||||
refresh_buffer_secs: None,
|
||||
};
|
||||
|
||||
let debug = format!("{config:?}");
|
||||
assert!(!debug.contains("super-secret"));
|
||||
assert!(debug.contains("client_secret: Some(\"<redacted>\")"));
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
[tool.bumpversion]
|
||||
current_version = "0.34.0-beta.4"
|
||||
current_version = "0.34.0"
|
||||
parse = """(?x)
|
||||
(?P<major>0|[1-9]\\d*)\\.
|
||||
(?P<minor>0|[1-9]\\d*)\\.
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "lancedb-python"
|
||||
version = "0.34.0-beta.4"
|
||||
version = "0.34.0"
|
||||
publish = false
|
||||
edition.workspace = true
|
||||
description = "Python bindings for LanceDB"
|
||||
|
||||
@@ -89,6 +89,8 @@ def connect(
|
||||
If presented, connect to LanceDB cloud.
|
||||
Otherwise, connect to a database on file system or cloud storage.
|
||||
Can be set via environment variable `LANCEDB_API_KEY`.
|
||||
OAuth configuration is currently supported only by ``connect_async``;
|
||||
synchronous LanceDB Cloud connections require an API key.
|
||||
region: str, default "us-east-1"
|
||||
The region to use for LanceDB Cloud.
|
||||
host_override: str, optional
|
||||
@@ -340,6 +342,7 @@ async def connect_async(
|
||||
session: Optional[Session] = None,
|
||||
manifest_enabled: bool = False,
|
||||
namespace_client_properties: Optional[Dict[str, str]] = None,
|
||||
oauth_config=None,
|
||||
) -> AsyncConnection:
|
||||
"""Connect to a LanceDB database.
|
||||
|
||||
@@ -389,6 +392,10 @@ async def connect_async(
|
||||
namespace_client_properties : dict, optional
|
||||
Additional directory namespace client properties to use with
|
||||
``manifest_enabled=True``.
|
||||
oauth_config : OAuthConfig, optional
|
||||
OAuth configuration for LanceDB Cloud/Enterprise. This is supported by
|
||||
``connect_async`` only; synchronous ``connect`` uses API key
|
||||
authentication for ``db://`` URIs.
|
||||
|
||||
Examples
|
||||
--------
|
||||
@@ -435,6 +442,7 @@ async def connect_async(
|
||||
session,
|
||||
manifest_enabled,
|
||||
namespace_client_properties,
|
||||
oauth_config,
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
@@ -280,6 +280,24 @@ async def connect(
|
||||
session: Optional[Session],
|
||||
manifest_enabled: bool = False,
|
||||
namespace_client_properties: Optional[Dict[str, str]] = None,
|
||||
oauth_config: Optional[Any] = None,
|
||||
) -> Connection: ...
|
||||
def connect_namespace(
|
||||
namespace_client_impl: str,
|
||||
namespace_client_properties: Dict[str, str],
|
||||
read_consistency_interval: Optional[float] = None,
|
||||
storage_options: Optional[Dict[str, str]] = None,
|
||||
session: Optional[Session] = None,
|
||||
namespace_client_pushdown_operations: Optional[List[str]] = None,
|
||||
) -> Connection: ...
|
||||
def connect_namespace_client(
|
||||
namespace_client: Any,
|
||||
read_consistency_interval: Optional[float] = None,
|
||||
storage_options: Optional[Dict[str, str]] = None,
|
||||
session: Optional[Session] = None,
|
||||
namespace_client_pushdown_operations: Optional[List[str]] = None,
|
||||
namespace_client_impl: Optional[str] = None,
|
||||
namespace_client_properties: Optional[Dict[str, str]] = None,
|
||||
) -> Connection: ...
|
||||
|
||||
class RecordBatchStream:
|
||||
|
||||
@@ -51,6 +51,15 @@ class LanceMergeInsertBuilder(object):
|
||||
If there are multiple matches then the behavior is undefined.
|
||||
Currently this causes multiple copies of the row to be created
|
||||
but that behavior is subject to change.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
where: Optional[str], default None
|
||||
An optional filter to limit which rows are updated. Column
|
||||
references in this expression must be prefixed with "target."
|
||||
to refer to the existing table data. For example, to only
|
||||
update rows where the existing color is red, use:
|
||||
``where="target.color = 'red'"``
|
||||
"""
|
||||
self._when_matched_update_all = True
|
||||
self._when_matched_update_all_condition = where
|
||||
|
||||
@@ -38,15 +38,13 @@ from lance_namespace_urllib3_client.models.query_table_request_vector import (
|
||||
QueryTableRequestVector,
|
||||
)
|
||||
from lance_namespace_urllib3_client.models.string_fts_query import StringFtsQuery
|
||||
from lance_namespace.errors import TableNotFoundError
|
||||
from lancedb._lancedb import connect_namespace_client as _connect_namespace_client
|
||||
from lance_namespace.errors import NamespaceNotEmptyError, TableNotFoundError
|
||||
from lancedb._lancedb import (
|
||||
connect_namespace as _connect_namespace,
|
||||
connect_namespace_client as _connect_namespace_client,
|
||||
)
|
||||
from lancedb.background_loop import LOOP
|
||||
from lancedb.db import AsyncConnection, DBConnection
|
||||
from lancedb.namespace_utils import (
|
||||
_normalize_create_namespace_mode,
|
||||
_normalize_drop_namespace_mode,
|
||||
_normalize_drop_namespace_behavior,
|
||||
)
|
||||
from lance_namespace import (
|
||||
LanceNamespace,
|
||||
connect as namespace_connect,
|
||||
@@ -55,13 +53,6 @@ from lance_namespace import (
|
||||
DropNamespaceResponse,
|
||||
ListNamespacesResponse,
|
||||
ListTablesResponse,
|
||||
ListTablesRequest,
|
||||
DescribeNamespaceRequest,
|
||||
DropTableRequest,
|
||||
RenameTableRequest,
|
||||
ListNamespacesRequest,
|
||||
CreateNamespaceRequest,
|
||||
DropNamespaceRequest,
|
||||
)
|
||||
from lancedb.table import AsyncTable, LanceTable, Table
|
||||
from lancedb.util import validate_table_name
|
||||
@@ -386,6 +377,10 @@ def _builds_namespace_natively(
|
||||
return namespace_client_impl == "rest" and bool(namespace_client_properties)
|
||||
|
||||
|
||||
def _supports_native_namespace(namespace_client_impl: str) -> bool:
|
||||
return namespace_client_impl in {"dir", "rest"}
|
||||
|
||||
|
||||
class LanceNamespaceDBConnection(DBConnection):
|
||||
"""
|
||||
A LanceDB connection that uses a namespace for table management.
|
||||
@@ -396,7 +391,7 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
namespace_client: LanceNamespace,
|
||||
namespace_client: Optional[LanceNamespace] = None,
|
||||
*,
|
||||
read_consistency_interval: Optional[timedelta] = None,
|
||||
storage_options: Optional[Dict[str, str]] = None,
|
||||
@@ -404,6 +399,7 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
namespace_client_pushdown_operations: Optional[List[str]] = None,
|
||||
namespace_client_impl: Optional[str] = None,
|
||||
namespace_client_properties: Optional[Dict[str, str]] = None,
|
||||
_inner: Optional[AsyncConnection] = None,
|
||||
):
|
||||
"""
|
||||
Initialize a namespace-based LanceDB connection.
|
||||
@@ -445,30 +441,36 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
)
|
||||
self._namespace_client_impl = namespace_client_impl
|
||||
self._namespace_client_properties = namespace_client_properties
|
||||
# When the namespace client is built natively (see Rust
|
||||
# ``build_namespace_natively``), the underlying Rust table performs
|
||||
# QueryTable pushdown through the read-freshness context provider, which
|
||||
# the pure-Python ``query_table`` path bypasses.
|
||||
self._route_pushdown_to_rust = _builds_namespace_natively(
|
||||
# When the namespace connection or client is built natively in Rust, the
|
||||
# underlying Rust table performs QueryTable pushdown through the
|
||||
# read-freshness context provider, which the pure-Python ``query_table``
|
||||
# path bypasses.
|
||||
self._route_pushdown_to_rust = _inner is not None or _builds_namespace_natively(
|
||||
namespace_client_impl, namespace_client_properties
|
||||
)
|
||||
self._inner = AsyncConnection(
|
||||
_connect_namespace_client(
|
||||
namespace_client,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=self.storage_options or None,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=(
|
||||
list(self._namespace_client_pushdown_operations)
|
||||
),
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
if _inner is not None:
|
||||
self._inner = _inner
|
||||
else:
|
||||
if namespace_client is None:
|
||||
raise ValueError("namespace_client is required without a native _inner")
|
||||
self._inner = AsyncConnection(
|
||||
_connect_namespace_client(
|
||||
namespace_client,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=self.storage_options or None,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=(
|
||||
list(self._namespace_client_pushdown_operations)
|
||||
),
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
)
|
||||
)
|
||||
)
|
||||
self._uri = self._inner.uri
|
||||
|
||||
@override
|
||||
def serialize(self) -> str:
|
||||
@@ -514,11 +516,11 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
)
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListTablesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
return LOOP.run(
|
||||
self._inner.table_names(
|
||||
namespace_path=namespace_path, start_after=page_token, limit=limit
|
||||
)
|
||||
)
|
||||
response = self._namespace_client.list_tables(request)
|
||||
return response.tables if response.tables else []
|
||||
|
||||
@override
|
||||
def create_table(
|
||||
@@ -589,8 +591,8 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
index_cache_size=index_cache_size,
|
||||
)
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "Table not found" in str(e):
|
||||
except (RuntimeError, ValueError) as e:
|
||||
if "Table not found" in str(e) or "was not found" in str(e):
|
||||
table_id = namespace_path + [name]
|
||||
raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}")
|
||||
raise
|
||||
@@ -612,12 +614,9 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
|
||||
@override
|
||||
def drop_table(self, name: str, namespace_path: Optional[List[str]] = None):
|
||||
# Use namespace drop_table directly
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
table_id = namespace_path + [name]
|
||||
request = DropTableRequest(id=table_id)
|
||||
self._namespace_client.drop_table(request)
|
||||
LOOP.run(self._inner.drop_table(name, namespace_path=namespace_path))
|
||||
|
||||
@override
|
||||
def rename_table(
|
||||
@@ -631,14 +630,19 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
cur_namespace_path = []
|
||||
if new_namespace_path is None:
|
||||
new_namespace_path = []
|
||||
cur_table_id = cur_namespace_path + [cur_name]
|
||||
new_namespace_id = new_namespace_path if new_namespace_path else None
|
||||
request = RenameTableRequest(
|
||||
id=cur_table_id,
|
||||
new_table_name=new_name,
|
||||
new_namespace_id=new_namespace_id,
|
||||
)
|
||||
self._namespace_client.rename_table(request)
|
||||
try:
|
||||
LOOP.run(
|
||||
self._inner.rename_table(
|
||||
cur_name,
|
||||
new_name,
|
||||
cur_namespace_path=cur_namespace_path,
|
||||
new_namespace_path=new_namespace_path,
|
||||
)
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "rename_table not implemented" in str(e):
|
||||
raise NotImplementedError("rename_table not implemented") from e
|
||||
raise
|
||||
|
||||
@override
|
||||
def drop_database(self):
|
||||
@@ -650,8 +654,7 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
def drop_all_tables(self, namespace_path: Optional[List[str]] = None):
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
for table_name in self.table_names(namespace_path=namespace_path):
|
||||
self.drop_table(table_name, namespace_path=namespace_path)
|
||||
LOOP.run(self._inner.drop_all_tables(namespace_path=namespace_path))
|
||||
|
||||
@override
|
||||
def list_namespaces(
|
||||
@@ -681,13 +684,10 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
"""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListNamespacesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
response = self._namespace_client.list_namespaces(request)
|
||||
return ListNamespacesResponse(
|
||||
namespaces=response.namespaces if response.namespaces else [],
|
||||
page_token=response.page_token,
|
||||
return LOOP.run(
|
||||
self._inner.list_namespaces(
|
||||
namespace_path=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
)
|
||||
|
||||
@override
|
||||
@@ -715,14 +715,12 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
CreateNamespaceResponse
|
||||
Response containing the properties of the created namespace.
|
||||
"""
|
||||
request = CreateNamespaceRequest(
|
||||
id=namespace_path,
|
||||
mode=_normalize_create_namespace_mode(mode),
|
||||
properties=properties,
|
||||
)
|
||||
response = self._namespace_client.create_namespace(request)
|
||||
return CreateNamespaceResponse(
|
||||
properties=response.properties if hasattr(response, "properties") else None
|
||||
return LOOP.run(
|
||||
self._inner.create_namespace(
|
||||
namespace_path=namespace_path,
|
||||
mode=mode,
|
||||
properties=properties,
|
||||
)
|
||||
)
|
||||
|
||||
@override
|
||||
@@ -750,20 +748,18 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
DropNamespaceResponse
|
||||
Response containing properties and transaction_id if applicable.
|
||||
"""
|
||||
request = DropNamespaceRequest(
|
||||
id=namespace_path,
|
||||
mode=_normalize_drop_namespace_mode(mode),
|
||||
behavior=_normalize_drop_namespace_behavior(behavior),
|
||||
)
|
||||
response = self._namespace_client.drop_namespace(request)
|
||||
return DropNamespaceResponse(
|
||||
properties=(
|
||||
response.properties if hasattr(response, "properties") else None
|
||||
),
|
||||
transaction_id=(
|
||||
response.transaction_id if hasattr(response, "transaction_id") else None
|
||||
),
|
||||
)
|
||||
try:
|
||||
return LOOP.run(
|
||||
self._inner.drop_namespace(
|
||||
namespace_path=namespace_path,
|
||||
mode=mode,
|
||||
behavior=behavior,
|
||||
)
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "Namespace not empty" in str(e):
|
||||
raise NamespaceNotEmptyError(str(e)) from e
|
||||
raise
|
||||
|
||||
@override
|
||||
def describe_namespace(
|
||||
@@ -782,11 +778,7 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
DescribeNamespaceResponse
|
||||
Response containing the namespace properties.
|
||||
"""
|
||||
request = DescribeNamespaceRequest(id=namespace_path)
|
||||
response = self._namespace_client.describe_namespace(request)
|
||||
return DescribeNamespaceResponse(
|
||||
properties=response.properties if hasattr(response, "properties") else None
|
||||
)
|
||||
return LOOP.run(self._inner.describe_namespace(namespace_path))
|
||||
|
||||
@override
|
||||
def list_tables(
|
||||
@@ -816,13 +808,10 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
"""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListTablesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
response = self._namespace_client.list_tables(request)
|
||||
return ListTablesResponse(
|
||||
tables=response.tables if response.tables else [],
|
||||
page_token=response.page_token,
|
||||
return LOOP.run(
|
||||
self._inner.list_tables(
|
||||
namespace_path=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
)
|
||||
|
||||
def _lance_table_from_uri(
|
||||
@@ -878,6 +867,18 @@ class LanceNamespaceDBConnection(DBConnection):
|
||||
LanceNamespace
|
||||
The namespace client for this connection.
|
||||
"""
|
||||
if self._namespace_client is None:
|
||||
if (
|
||||
self._namespace_client_impl is None
|
||||
or self._namespace_client_properties is None
|
||||
):
|
||||
raise ValueError(
|
||||
"Cannot construct a Python namespace client without "
|
||||
"namespace implementation properties"
|
||||
)
|
||||
self._namespace_client = namespace_connect(
|
||||
self._namespace_client_impl, self._namespace_client_properties
|
||||
)
|
||||
return self._namespace_client
|
||||
|
||||
|
||||
@@ -891,7 +892,7 @@ class AsyncLanceNamespaceDBConnection:
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
namespace_client: LanceNamespace,
|
||||
namespace_client: Optional[LanceNamespace] = None,
|
||||
*,
|
||||
read_consistency_interval: Optional[timedelta] = None,
|
||||
storage_options: Optional[Dict[str, str]] = None,
|
||||
@@ -899,6 +900,7 @@ class AsyncLanceNamespaceDBConnection:
|
||||
namespace_client_pushdown_operations: Optional[List[str]] = None,
|
||||
namespace_client_impl: Optional[str] = None,
|
||||
namespace_client_properties: Optional[Dict[str, str]] = None,
|
||||
_inner: Optional[AsyncConnection] = None,
|
||||
):
|
||||
"""
|
||||
Initialize an async namespace-based LanceDB connection.
|
||||
@@ -940,29 +942,35 @@ class AsyncLanceNamespaceDBConnection:
|
||||
)
|
||||
self._namespace_client_impl = namespace_client_impl
|
||||
self._namespace_client_properties = namespace_client_properties
|
||||
# See LanceNamespaceDBConnection: when built natively the Rust table runs
|
||||
# QueryTable pushdown through the read-freshness provider, so defer to it
|
||||
# rather than the urllib3 client (which omits x-lancedb-min-timestamp).
|
||||
self._route_pushdown_to_rust = _builds_namespace_natively(
|
||||
# See LanceNamespaceDBConnection: when Rust owns the namespace
|
||||
# connection/client, its table performs QueryTable pushdown through the
|
||||
# read-freshness provider, so defer to it rather than the urllib3 client
|
||||
# path (which omits x-lancedb-min-timestamp).
|
||||
self._route_pushdown_to_rust = _inner is not None or _builds_namespace_natively(
|
||||
namespace_client_impl, namespace_client_properties
|
||||
)
|
||||
self._inner = AsyncConnection(
|
||||
_connect_namespace_client(
|
||||
namespace_client,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=self.storage_options or None,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=(
|
||||
list(self._namespace_client_pushdown_operations)
|
||||
),
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
if _inner is not None:
|
||||
self._inner = _inner
|
||||
else:
|
||||
if namespace_client is None:
|
||||
raise ValueError("namespace_client is required without a native _inner")
|
||||
self._inner = AsyncConnection(
|
||||
_connect_namespace_client(
|
||||
namespace_client,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=self.storage_options or None,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=(
|
||||
list(self._namespace_client_pushdown_operations)
|
||||
),
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
)
|
||||
)
|
||||
)
|
||||
|
||||
async def table_names(
|
||||
self,
|
||||
@@ -986,11 +994,9 @@ class AsyncLanceNamespaceDBConnection:
|
||||
)
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListTablesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
return await self._inner.table_names(
|
||||
namespace_path=namespace_path, start_after=page_token, limit=limit
|
||||
)
|
||||
response = self._namespace_client.list_tables(request)
|
||||
return response.tables if response.tables else []
|
||||
|
||||
async def create_table(
|
||||
self,
|
||||
@@ -1053,8 +1059,8 @@ class AsyncLanceNamespaceDBConnection:
|
||||
storage_options=storage_options,
|
||||
index_cache_size=index_cache_size,
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "Table not found" in str(e):
|
||||
except (RuntimeError, ValueError) as e:
|
||||
if "Table not found" in str(e) or "was not found" in str(e):
|
||||
table_id = namespace_path + [name]
|
||||
raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}")
|
||||
raise
|
||||
@@ -1075,9 +1081,7 @@ class AsyncLanceNamespaceDBConnection:
|
||||
"""Drop a table from the namespace."""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
table_id = namespace_path + [name]
|
||||
request = DropTableRequest(id=table_id)
|
||||
self._namespace_client.drop_table(request)
|
||||
await self._inner.drop_table(name, namespace_path=namespace_path)
|
||||
|
||||
async def rename_table(
|
||||
self,
|
||||
@@ -1091,14 +1095,17 @@ class AsyncLanceNamespaceDBConnection:
|
||||
cur_namespace_path = []
|
||||
if new_namespace_path is None:
|
||||
new_namespace_path = []
|
||||
cur_table_id = cur_namespace_path + [cur_name]
|
||||
new_namespace_id = new_namespace_path if new_namespace_path else None
|
||||
request = RenameTableRequest(
|
||||
id=cur_table_id,
|
||||
new_table_name=new_name,
|
||||
new_namespace_id=new_namespace_id,
|
||||
)
|
||||
self._namespace_client.rename_table(request)
|
||||
try:
|
||||
await self._inner.rename_table(
|
||||
cur_name,
|
||||
new_name,
|
||||
cur_namespace_path=cur_namespace_path,
|
||||
new_namespace_path=new_namespace_path,
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "rename_table not implemented" in str(e):
|
||||
raise NotImplementedError("rename_table not implemented") from e
|
||||
raise
|
||||
|
||||
async def drop_database(self):
|
||||
"""Deprecated method."""
|
||||
@@ -1110,9 +1117,7 @@ class AsyncLanceNamespaceDBConnection:
|
||||
"""Drop all tables in the namespace."""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
table_names = await self.table_names(namespace_path=namespace_path)
|
||||
for table_name in table_names:
|
||||
await self.drop_table(table_name, namespace_path=namespace_path)
|
||||
await self._inner.drop_all_tables(namespace_path=namespace_path)
|
||||
|
||||
async def list_namespaces(
|
||||
self,
|
||||
@@ -1141,13 +1146,8 @@ class AsyncLanceNamespaceDBConnection:
|
||||
"""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListNamespacesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
response = self._namespace_client.list_namespaces(request)
|
||||
return ListNamespacesResponse(
|
||||
namespaces=response.namespaces if response.namespaces else [],
|
||||
page_token=response.page_token,
|
||||
return await self._inner.list_namespaces(
|
||||
namespace_path=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
|
||||
async def create_namespace(
|
||||
@@ -1174,15 +1174,11 @@ class AsyncLanceNamespaceDBConnection:
|
||||
CreateNamespaceResponse
|
||||
Response containing the properties of the created namespace.
|
||||
"""
|
||||
request = CreateNamespaceRequest(
|
||||
id=namespace_path,
|
||||
mode=_normalize_create_namespace_mode(mode),
|
||||
return await self._inner.create_namespace(
|
||||
namespace_path=namespace_path,
|
||||
mode=mode,
|
||||
properties=properties,
|
||||
)
|
||||
response = self._namespace_client.create_namespace(request)
|
||||
return CreateNamespaceResponse(
|
||||
properties=response.properties if hasattr(response, "properties") else None
|
||||
)
|
||||
|
||||
async def drop_namespace(
|
||||
self,
|
||||
@@ -1208,20 +1204,16 @@ class AsyncLanceNamespaceDBConnection:
|
||||
DropNamespaceResponse
|
||||
Response containing properties and transaction_id if applicable.
|
||||
"""
|
||||
request = DropNamespaceRequest(
|
||||
id=namespace_path,
|
||||
mode=_normalize_drop_namespace_mode(mode),
|
||||
behavior=_normalize_drop_namespace_behavior(behavior),
|
||||
)
|
||||
response = self._namespace_client.drop_namespace(request)
|
||||
return DropNamespaceResponse(
|
||||
properties=(
|
||||
response.properties if hasattr(response, "properties") else None
|
||||
),
|
||||
transaction_id=(
|
||||
response.transaction_id if hasattr(response, "transaction_id") else None
|
||||
),
|
||||
)
|
||||
try:
|
||||
return await self._inner.drop_namespace(
|
||||
namespace_path=namespace_path,
|
||||
mode=mode,
|
||||
behavior=behavior,
|
||||
)
|
||||
except RuntimeError as e:
|
||||
if "Namespace not empty" in str(e):
|
||||
raise NamespaceNotEmptyError(str(e)) from e
|
||||
raise
|
||||
|
||||
async def describe_namespace(
|
||||
self, namespace_path: List[str]
|
||||
@@ -1239,11 +1231,7 @@ class AsyncLanceNamespaceDBConnection:
|
||||
DescribeNamespaceResponse
|
||||
Response containing the namespace properties.
|
||||
"""
|
||||
request = DescribeNamespaceRequest(id=namespace_path)
|
||||
response = self._namespace_client.describe_namespace(request)
|
||||
return DescribeNamespaceResponse(
|
||||
properties=response.properties if hasattr(response, "properties") else None
|
||||
)
|
||||
return await self._inner.describe_namespace(namespace_path)
|
||||
|
||||
async def list_tables(
|
||||
self,
|
||||
@@ -1272,13 +1260,8 @@ class AsyncLanceNamespaceDBConnection:
|
||||
"""
|
||||
if namespace_path is None:
|
||||
namespace_path = []
|
||||
request = ListTablesRequest(
|
||||
id=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
response = self._namespace_client.list_tables(request)
|
||||
return ListTablesResponse(
|
||||
tables=response.tables if response.tables else [],
|
||||
page_token=response.page_token,
|
||||
return await self._inner.list_tables(
|
||||
namespace_path=namespace_path, page_token=page_token, limit=limit
|
||||
)
|
||||
|
||||
async def namespace_client(self) -> LanceNamespace:
|
||||
@@ -1292,6 +1275,18 @@ class AsyncLanceNamespaceDBConnection:
|
||||
LanceNamespace
|
||||
The namespace client for this connection.
|
||||
"""
|
||||
if self._namespace_client is None:
|
||||
if (
|
||||
self._namespace_client_impl is None
|
||||
or self._namespace_client_properties is None
|
||||
):
|
||||
raise ValueError(
|
||||
"Cannot construct a Python namespace client without "
|
||||
"namespace implementation properties"
|
||||
)
|
||||
self._namespace_client = namespace_connect(
|
||||
self._namespace_client_impl, self._namespace_client_properties
|
||||
)
|
||||
return self._namespace_client
|
||||
|
||||
|
||||
@@ -1342,6 +1337,32 @@ def connect_namespace(
|
||||
LanceNamespaceDBConnection
|
||||
A namespace-based connection to LanceDB
|
||||
"""
|
||||
if _supports_native_namespace(namespace_client_impl):
|
||||
inner = AsyncConnection(
|
||||
_connect_namespace(
|
||||
namespace_client_impl,
|
||||
namespace_client_properties,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=storage_options,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
|
||||
)
|
||||
)
|
||||
return LanceNamespaceDBConnection(
|
||||
namespace_client=None,
|
||||
read_consistency_interval=read_consistency_interval,
|
||||
storage_options=storage_options,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
_inner=inner,
|
||||
)
|
||||
|
||||
namespace_client = namespace_connect(
|
||||
namespace_client_impl, namespace_client_properties
|
||||
)
|
||||
@@ -1417,6 +1438,32 @@ def connect_namespace_async(
|
||||
... tables = await db.table_names()
|
||||
... table = await db.create_table("my_table", schema=schema)
|
||||
"""
|
||||
if _supports_native_namespace(namespace_client_impl):
|
||||
inner = AsyncConnection(
|
||||
_connect_namespace(
|
||||
namespace_client_impl,
|
||||
namespace_client_properties,
|
||||
read_consistency_interval=(
|
||||
read_consistency_interval.total_seconds()
|
||||
if read_consistency_interval is not None
|
||||
else None
|
||||
),
|
||||
storage_options=storage_options,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
|
||||
)
|
||||
)
|
||||
return AsyncLanceNamespaceDBConnection(
|
||||
namespace_client=None,
|
||||
read_consistency_interval=read_consistency_interval,
|
||||
storage_options=storage_options,
|
||||
session=session,
|
||||
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
|
||||
namespace_client_impl=namespace_client_impl,
|
||||
namespace_client_properties=namespace_client_properties,
|
||||
_inner=inner,
|
||||
)
|
||||
|
||||
namespace_client = namespace_connect(
|
||||
namespace_client_impl, namespace_client_properties
|
||||
)
|
||||
|
||||
@@ -48,6 +48,14 @@ class PermutationBuilder:
|
||||
By default, the permutation builder will create a single split that contains all
|
||||
rows in the same order as the base table.
|
||||
"""
|
||||
if not hasattr(table, "_inner"):
|
||||
raise TypeError(
|
||||
f"PermutationBuilder requires a local LanceTable, "
|
||||
f"got {type(table).__name__}. "
|
||||
"The permutation API is not supported on remote tables. "
|
||||
"Remote tables connect to LanceDB Cloud or Enterprise and do not have "
|
||||
"direct access to the underlying Lance dataset needed for permutations."
|
||||
)
|
||||
self._async = async_permutation_builder(table)
|
||||
|
||||
def split_random(
|
||||
|
||||
@@ -119,6 +119,27 @@ def _filter_to_sql(filter: Optional[Union[str, Expr]]) -> Optional[str]:
|
||||
return filter
|
||||
|
||||
|
||||
def _combine_where(
|
||||
existing: Optional[Union[str, Expr]], new: Union[str, Expr]
|
||||
) -> Union[str, Expr]:
|
||||
"""Combine a new filter with an existing one using a logical AND.
|
||||
|
||||
Calling ``where`` more than once composes the filters with AND instead of
|
||||
replacing the previous filter. Two :class:`~lancedb.expr.Expr` filters are
|
||||
combined as an expression; otherwise both filters are lowered to SQL strings
|
||||
and combined as SQL.
|
||||
"""
|
||||
if existing is None:
|
||||
return new
|
||||
existing_is_expr = isinstance(existing, Expr)
|
||||
new_is_expr = isinstance(new, Expr)
|
||||
if existing_is_expr and new_is_expr:
|
||||
return existing & new
|
||||
existing_sql = existing.to_sql() if existing_is_expr else existing
|
||||
new_sql = new.to_sql() if new_is_expr else new
|
||||
return f"({existing_sql}) AND ({new_sql})"
|
||||
|
||||
|
||||
def _projection_to_scanner_kwargs(
|
||||
columns: Optional[
|
||||
Union[
|
||||
@@ -1148,8 +1169,13 @@ class LanceQueryBuilder(ABC):
|
||||
-------
|
||||
LanceQueryBuilder
|
||||
The LanceQueryBuilder object.
|
||||
|
||||
Notes
|
||||
-----
|
||||
Calling this multiple times combines the filters with a logical AND
|
||||
rather than replacing the previous filter.
|
||||
"""
|
||||
self._where = where
|
||||
self._where = _combine_where(self._where, where)
|
||||
self._postfilter = not prefilter
|
||||
return self
|
||||
|
||||
@@ -1693,8 +1719,13 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
|
||||
-------
|
||||
LanceQueryBuilder
|
||||
The LanceQueryBuilder object.
|
||||
|
||||
Notes
|
||||
-----
|
||||
Calling this multiple times combines the filters with a logical AND
|
||||
rather than replacing the previous filter.
|
||||
"""
|
||||
self._where = where
|
||||
self._where = _combine_where(self._where, where)
|
||||
if prefilter is not None:
|
||||
self._postfilter = not prefilter
|
||||
return self
|
||||
@@ -2894,6 +2925,9 @@ class AsyncStandardQuery(AsyncQueryBase):
|
||||
|
||||
Filtering performance can often be improved by creating a scalar index
|
||||
on the filter column(s).
|
||||
|
||||
Calling this multiple times combines the filters with a logical AND
|
||||
rather than replacing the previous filter.
|
||||
"""
|
||||
if isinstance(predicate, Expr):
|
||||
self._inner.where_expr(predicate._inner)
|
||||
|
||||
@@ -9,6 +9,7 @@ from typing import List, Optional
|
||||
from lancedb import __version__
|
||||
|
||||
from .header import HeaderProvider
|
||||
from .oauth import OAuthConfig, OAuthFlowType
|
||||
|
||||
__all__ = [
|
||||
"TimeoutConfig",
|
||||
@@ -16,6 +17,8 @@ __all__ = [
|
||||
"TlsConfig",
|
||||
"ClientConfig",
|
||||
"HeaderProvider",
|
||||
"OAuthConfig",
|
||||
"OAuthFlowType",
|
||||
]
|
||||
|
||||
|
||||
|
||||
75
python/python/lancedb/remote/oauth.py
Normal file
75
python/python/lancedb/remote/oauth.py
Normal file
@@ -0,0 +1,75 @@
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from typing import List, Optional
|
||||
|
||||
|
||||
class OAuthFlowType(str, Enum):
|
||||
"""OAuth authentication flow types."""
|
||||
|
||||
CLIENT_CREDENTIALS = "client_credentials"
|
||||
"""Client Credentials grant (service-to-service / M2M)."""
|
||||
|
||||
AZURE_MANAGED_IDENTITY = "azure_managed_identity"
|
||||
"""Azure Managed Identity via IMDS."""
|
||||
|
||||
|
||||
@dataclass
|
||||
class OAuthConfig:
|
||||
"""OAuth configuration for LanceDB authentication.
|
||||
|
||||
All token acquisition and refresh is handled in the Rust layer.
|
||||
This config is passed through to Rust via PyO3.
|
||||
|
||||
Parameters
|
||||
----------
|
||||
issuer_url : str
|
||||
OIDC issuer URL or OAuth authority URL.
|
||||
For Azure: ``https://login.microsoftonline.com/{tenant_id}/v2.0``
|
||||
client_id : str
|
||||
Application / Client ID.
|
||||
scopes : List[str]
|
||||
OAuth scopes to request.
|
||||
For Azure managed identity, exactly one scope or resource is required.
|
||||
For example: ``["api://{app_id}/.default"]``
|
||||
flow : OAuthFlowType
|
||||
Authentication flow to use. Default: CLIENT_CREDENTIALS.
|
||||
client_secret : Optional[str]
|
||||
Client secret (required for CLIENT_CREDENTIALS).
|
||||
managed_identity_client_id : Optional[str]
|
||||
Client ID for user-assigned managed identity (AZURE_MANAGED_IDENTITY).
|
||||
refresh_buffer_secs : Optional[int]
|
||||
Seconds before expiry to trigger proactive refresh (default: 300).
|
||||
Keep this well below the token TTL; if it is greater than or equal to
|
||||
the TTL, each request refreshes the token.
|
||||
|
||||
Examples
|
||||
--------
|
||||
Client Credentials (service-to-service):
|
||||
|
||||
>>> config = OAuthConfig(
|
||||
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
... client_id="app-id",
|
||||
... client_secret="secret",
|
||||
... scopes=["api://lancedb-api/.default"],
|
||||
... )
|
||||
|
||||
Azure Managed Identity:
|
||||
|
||||
>>> config = OAuthConfig(
|
||||
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
|
||||
... client_id="app-id",
|
||||
... scopes=["api://lancedb-api/.default"],
|
||||
... flow=OAuthFlowType.AZURE_MANAGED_IDENTITY,
|
||||
... )
|
||||
"""
|
||||
|
||||
issuer_url: str
|
||||
client_id: str
|
||||
scopes: List[str]
|
||||
flow: OAuthFlowType = OAuthFlowType.CLIENT_CREDENTIALS
|
||||
client_secret: Optional[str] = field(default=None, repr=False)
|
||||
managed_identity_client_id: Optional[str] = None
|
||||
refresh_buffer_secs: Optional[int] = None
|
||||
@@ -156,9 +156,16 @@ class MRRReranker(Reranker):
|
||||
reciprocal_rank = 1.0 / rank
|
||||
mrr_score_map[result_id].append(reciprocal_rank)
|
||||
|
||||
# MRR averages the reciprocal rank across *all* ranking systems, treating
|
||||
# a system in which a document does not appear as a reciprocal rank of 0.
|
||||
# We therefore divide by the total number of systems, not by the number of
|
||||
# systems the document happens to appear in -- otherwise a document found
|
||||
# by a single ranking would outrank one ranked highly by every system,
|
||||
# defeating the purpose of fusing the rankings.
|
||||
num_systems = len(vector_results)
|
||||
final_mrr_scores = {}
|
||||
for result_id, reciprocal_ranks in mrr_score_map.items():
|
||||
mean_rr = np.mean(reciprocal_ranks)
|
||||
mean_rr = float(np.sum(reciprocal_ranks)) / num_systems
|
||||
final_mrr_scores[result_id] = mean_rr
|
||||
|
||||
combined = pa.concat_tables(vector_results, **self._concat_tables_args)
|
||||
|
||||
@@ -2142,12 +2142,19 @@ class LanceTable(Table):
|
||||
|
||||
branch = self.current_branch()
|
||||
version = None if branch is not None else self.version
|
||||
if self._namespace_client is not None:
|
||||
namespace_client = self._namespace_client
|
||||
if namespace_client is None:
|
||||
conn_uri = getattr(self._conn, "uri", "")
|
||||
if get_uri_scheme(conn_uri) == "namespace":
|
||||
namespace_client = self._conn.namespace_client()
|
||||
self._namespace_client = namespace_client
|
||||
|
||||
if namespace_client is not None:
|
||||
table_id = self._namespace_path + [self.name]
|
||||
ds = lance.dataset(
|
||||
version=version,
|
||||
storage_options=self._conn.storage_options,
|
||||
namespace_client=self._namespace_client,
|
||||
namespace_client=namespace_client,
|
||||
table_id=table_id,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
@@ -5,6 +5,7 @@
|
||||
|
||||
import tempfile
|
||||
import shutil
|
||||
import importlib
|
||||
import pytest
|
||||
import pyarrow as pa
|
||||
import lancedb
|
||||
@@ -103,6 +104,40 @@ class TestNamespaceConnection:
|
||||
assert isinstance(db, lancedb.LanceNamespaceDBConnection)
|
||||
assert len(list(db.table_names())) == 0
|
||||
|
||||
def test_sync_builtin_namespace_uses_rust_without_python_client(self, monkeypatch):
|
||||
"""Built-in sync namespace connections should not construct or call the
|
||||
Python namespace client for normal namespace/table management."""
|
||||
namespace_module = importlib.import_module("lancedb.namespace")
|
||||
|
||||
def fail_namespace_connect(*args, **kwargs):
|
||||
raise AssertionError("Python namespace client should not be constructed")
|
||||
|
||||
monkeypatch.setattr(
|
||||
namespace_module, "namespace_connect", fail_namespace_connect
|
||||
)
|
||||
|
||||
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
|
||||
assert isinstance(db, lancedb.LanceNamespaceDBConnection)
|
||||
assert db._namespace_client is None
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
db.create_namespace(["test_ns"])
|
||||
assert "test_ns" in db.list_namespaces().namespaces
|
||||
|
||||
schema = pa.schema([pa.field("id", pa.int64())])
|
||||
table = db.create_table("test_table", schema=schema, namespace_path=["test_ns"])
|
||||
assert table.namespace == ["test_ns"]
|
||||
assert "test_table" in db.table_names(namespace_path=["test_ns"])
|
||||
assert "test_table" in db.list_tables(namespace_path=["test_ns"]).tables
|
||||
|
||||
opened = db.open_table("test_table", namespace_path=["test_ns"])
|
||||
assert opened.namespace == ["test_ns"]
|
||||
|
||||
db.drop_table("test_table", namespace_path=["test_ns"])
|
||||
assert db.list_tables(namespace_path=["test_ns"]).tables == []
|
||||
db.drop_namespace(["test_ns"])
|
||||
assert "test_ns" not in db.list_namespaces().namespaces
|
||||
|
||||
def test_create_table_through_namespace(self):
|
||||
"""Test creating a table through namespace."""
|
||||
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
|
||||
@@ -564,6 +599,61 @@ class TestAsyncNamespaceConnection:
|
||||
table_names = await db.table_names()
|
||||
assert len(list(table_names)) == 0
|
||||
|
||||
async def test_async_builtin_namespace_uses_rust_without_python_client(
|
||||
self, monkeypatch
|
||||
):
|
||||
"""Built-in async namespace connections should not construct or call the
|
||||
Python namespace client for normal namespace/table management."""
|
||||
namespace_module = importlib.import_module("lancedb.namespace")
|
||||
|
||||
def fail_namespace_connect(*args, **kwargs):
|
||||
raise AssertionError("Python namespace client should not be constructed")
|
||||
|
||||
monkeypatch.setattr(
|
||||
namespace_module, "namespace_connect", fail_namespace_connect
|
||||
)
|
||||
|
||||
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
|
||||
assert isinstance(db, lancedb.AsyncLanceNamespaceDBConnection)
|
||||
assert db._namespace_client is None
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
await db.create_namespace(["test_ns"])
|
||||
assert "test_ns" in (await db.list_namespaces()).namespaces
|
||||
|
||||
schema = pa.schema([pa.field("id", pa.int64())])
|
||||
table = await db.create_table(
|
||||
"test_table", schema=schema, namespace_path=["test_ns"]
|
||||
)
|
||||
assert table._namespace_path == ["test_ns"]
|
||||
assert table._namespace_client is None
|
||||
assert table._route_pushdown_to_rust is True
|
||||
assert "test_table" in await db.table_names(namespace_path=["test_ns"])
|
||||
assert "test_table" in (await db.list_tables(namespace_path=["test_ns"])).tables
|
||||
|
||||
opened = await db.open_table("test_table", namespace_path=["test_ns"])
|
||||
assert opened._namespace_path == ["test_ns"]
|
||||
|
||||
await db.drop_table("test_table", namespace_path=["test_ns"])
|
||||
assert (await db.list_tables(namespace_path=["test_ns"])).tables == []
|
||||
await db.drop_namespace(["test_ns"])
|
||||
assert "test_ns" not in (await db.list_namespaces()).namespaces
|
||||
|
||||
async def test_async_namespace_client_is_lazy(self):
|
||||
"""namespace_client() should still return the backing client on demand."""
|
||||
pytest.importorskip("lance")
|
||||
from lance.namespace import DirectoryNamespace
|
||||
|
||||
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
|
||||
assert db._namespace_client is None
|
||||
|
||||
ns_client = await db.namespace_client()
|
||||
|
||||
assert isinstance(ns_client, DirectoryNamespace)
|
||||
namespace_id = ns_client.namespace_id().replace("\\\\", "\\")
|
||||
assert str(self.temp_dir) in namespace_id
|
||||
assert db._namespace_client is ns_client
|
||||
|
||||
# Async connect via namespace helper is not enabled yet.
|
||||
|
||||
async def test_create_table_async(self):
|
||||
@@ -818,10 +908,11 @@ class TestPushdownOperations:
|
||||
)
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
def test_route_pushdown_to_rust_false_for_dir(self):
|
||||
"""A non-native (dir) connection keeps the Python pushdown path."""
|
||||
def test_route_pushdown_to_rust_for_native_dir(self):
|
||||
"""The sync dir connection is natively built and defers QueryTable
|
||||
pushdown to Rust."""
|
||||
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
|
||||
assert db._route_pushdown_to_rust is False
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
def test_async_route_pushdown_to_rust_for_native_rest(self):
|
||||
"""The async connection must not silently bypass the read-freshness fix:
|
||||
@@ -834,10 +925,11 @@ class TestPushdownOperations:
|
||||
)
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
def test_async_route_pushdown_to_rust_false_for_dir(self):
|
||||
"""The async non-native (dir) connection keeps the Python pushdown path."""
|
||||
def test_async_route_pushdown_to_rust_for_native_dir(self):
|
||||
"""The async dir connection is natively built and defers QueryTable
|
||||
pushdown to Rust."""
|
||||
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
|
||||
assert db._route_pushdown_to_rust is False
|
||||
assert db._route_pushdown_to_rust is True
|
||||
|
||||
def test_lance_table_to_arrow_uses_query_pushdown(self):
|
||||
namespace_client = _NamespaceClient()
|
||||
|
||||
@@ -502,6 +502,61 @@ def test_with_row_id(table: lancedb.table.Table):
|
||||
assert rs["_rowid"].to_pylist() == [0, 1]
|
||||
|
||||
|
||||
def test_where_repeated_combines_with_and(table: lancedb.table.Table):
|
||||
# Calling where() more than once should AND the filters together instead of
|
||||
# silently replacing the previous one (regression test for #2649).
|
||||
builder = table.search().where("id >= 1").where("id < 2")
|
||||
assert builder._where == "(id >= 1) AND (id < 2)"
|
||||
|
||||
ids = [row["id"] for row in builder.limit(10).to_list()]
|
||||
assert ids == [1]
|
||||
|
||||
|
||||
def test_where_repeated_combines_expr(table: lancedb.table.Table):
|
||||
from lancedb.expr import col, lit
|
||||
|
||||
builder = table.search().where(col("id") >= lit(1)).where(col("id") < lit(2))
|
||||
ids = [row["id"] for row in builder.limit(10).to_list()]
|
||||
assert ids == [1]
|
||||
|
||||
|
||||
def test_where_mixed_filter_kinds_combines(table: lancedb.table.Table):
|
||||
# Mixing a SQL string filter with an expression filter lowers the
|
||||
# expression to SQL and combines them as SQL strings.
|
||||
from lancedb.expr import col, lit
|
||||
|
||||
builder = table.search().where("id >= 1").where(col("id") < lit(2))
|
||||
ids = [row["id"] for row in builder.limit(10).to_list()]
|
||||
assert ids == [1]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_where_repeated_combines_with_and_async(table_async: AsyncTable):
|
||||
ids = [
|
||||
row["id"]
|
||||
for row in (
|
||||
await table_async.query().where("id >= 1").where("id < 2").to_list()
|
||||
)
|
||||
]
|
||||
assert ids == [1]
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_where_mixed_filter_kinds_combines_async(table_async: AsyncTable):
|
||||
from lancedb.expr import col, lit
|
||||
|
||||
ids = [
|
||||
row["id"]
|
||||
for row in (
|
||||
await table_async.query()
|
||||
.where("id >= 1")
|
||||
.where(col("id") < lit(2))
|
||||
.to_list()
|
||||
)
|
||||
]
|
||||
assert ids == [1]
|
||||
|
||||
|
||||
def test_distance_range(table: lancedb.table.Table):
|
||||
q = [0, 0]
|
||||
rs = table.search(q).to_arrow()
|
||||
|
||||
@@ -350,6 +350,38 @@ def test_mrr_reranker_empty_input():
|
||||
reranker.rerank_multivector([])
|
||||
|
||||
|
||||
def test_mrr_multivector_rewards_consensus():
|
||||
# Reciprocal ranks must be averaged across *all* ranking systems, treating a
|
||||
# missing system as 0. A document ranked first by every system must outrank a
|
||||
# document ranked first by only one of them.
|
||||
reranker = MRRReranker()
|
||||
|
||||
def ranking(row_ids):
|
||||
return pa.table({"_rowid": pa.array(row_ids, type=pa.int64())})
|
||||
|
||||
# Doc 1 is rank 1 in only the first system; doc 2 is rank 1 in two systems
|
||||
# and rank 2 in the third (strong cross-system consensus).
|
||||
rs1 = ranking([1, 2, 3])
|
||||
rs2 = ranking([2, 3, 4])
|
||||
rs3 = ranking([2, 5, 6])
|
||||
|
||||
result = reranker.rerank_multivector([rs1, rs2, rs3])
|
||||
scores = {
|
||||
row_id: score
|
||||
for row_id, score in zip(
|
||||
result["_rowid"].to_pylist(),
|
||||
result["_relevance_score"].to_pylist(),
|
||||
)
|
||||
}
|
||||
|
||||
# sum of reciprocal ranks / number of systems
|
||||
assert scores[1] == pytest.approx(1.0 / 3)
|
||||
assert scores[2] == pytest.approx((0.5 + 1.0 + 1.0) / 3)
|
||||
assert scores[2] > scores[1]
|
||||
# The consensus document ranks first overall.
|
||||
assert result["_rowid"].to_pylist()[0] == 2
|
||||
|
||||
|
||||
def test_rrf_reranker_distance():
|
||||
data = pa.table(
|
||||
{
|
||||
|
||||
@@ -1137,6 +1137,16 @@ def test_namespace_open_table_with_branch_version(tmp_path):
|
||||
assert db.open_table("t", namespace_path=["ns1"], branch="exp").count_rows() == 3
|
||||
|
||||
|
||||
def test_namespace_root_table_to_lance_uses_namespace_client(tmp_path):
|
||||
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace
|
||||
db = lancedb.connect_namespace("dir", {"root": str(tmp_path)})
|
||||
table = db.create_table("t", [{"i": 0}])
|
||||
|
||||
assert table._namespace_client is None
|
||||
assert table.to_lance().count_rows() == 1
|
||||
assert table._namespace_client is not None
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_async_namespace_open_table_with_branch_version(tmp_path):
|
||||
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace
|
||||
|
||||
@@ -539,7 +539,7 @@ impl Connection {
|
||||
}
|
||||
|
||||
#[pyfunction]
|
||||
#[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None))]
|
||||
#[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None, oauth_config=None))]
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub fn connect(
|
||||
py: Python<'_>,
|
||||
@@ -553,6 +553,7 @@ pub fn connect(
|
||||
session: Option<crate::session::Session>,
|
||||
manifest_enabled: bool,
|
||||
namespace_client_properties: Option<HashMap<String, String>>,
|
||||
oauth_config: Option<crate::oauth::PyOAuthConfig>,
|
||||
) -> PyResult<Bound<'_, PyAny>> {
|
||||
future_into_py(py, async move {
|
||||
let mut builder = lancedb::connect(&uri);
|
||||
@@ -582,6 +583,11 @@ pub fn connect(
|
||||
if let Some(client_config) = client_config {
|
||||
builder = builder.client_config(client_config.into());
|
||||
}
|
||||
if let Some(oauth_config) = oauth_config {
|
||||
let config: lancedb::remote::oauth::OAuthConfig =
|
||||
oauth_config.try_into().infer_error()?;
|
||||
builder = builder.oauth_config(config);
|
||||
}
|
||||
if let Some(session) = session {
|
||||
builder = builder.session(session.inner.clone());
|
||||
}
|
||||
@@ -649,6 +655,46 @@ pub fn connect_namespace_client(
|
||||
)))
|
||||
}
|
||||
|
||||
#[pyfunction]
|
||||
#[pyo3(signature = (
|
||||
namespace_client_impl,
|
||||
namespace_client_properties,
|
||||
read_consistency_interval=None,
|
||||
storage_options=None,
|
||||
session=None,
|
||||
namespace_client_pushdown_operations=None,
|
||||
))]
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
pub fn connect_namespace(
|
||||
namespace_client_impl: String,
|
||||
namespace_client_properties: HashMap<String, String>,
|
||||
read_consistency_interval: Option<f64>,
|
||||
storage_options: Option<HashMap<String, String>>,
|
||||
session: Option<crate::session::Session>,
|
||||
namespace_client_pushdown_operations: Option<Vec<String>>,
|
||||
) -> PyResult<Connection> {
|
||||
let read_consistency_interval = read_consistency_interval.map(Duration::from_secs_f64);
|
||||
let namespace_client_pushdown_operations =
|
||||
parse_namespace_client_pushdown_operations(namespace_client_pushdown_operations)?;
|
||||
|
||||
let mut builder =
|
||||
lancedb::connect_namespace(&namespace_client_impl, namespace_client_properties)
|
||||
.pushdown_operations(namespace_client_pushdown_operations);
|
||||
if let Some(storage_options) = storage_options {
|
||||
builder = builder.storage_options(storage_options);
|
||||
}
|
||||
if let Some(read_consistency_interval) = read_consistency_interval {
|
||||
builder = builder.read_consistency_interval(read_consistency_interval);
|
||||
}
|
||||
if let Some(session) = session {
|
||||
builder = builder.session(session.inner.clone());
|
||||
}
|
||||
|
||||
Ok(Connection::new(
|
||||
crate::runtime::block_on(builder.execute()).infer_error()?,
|
||||
))
|
||||
}
|
||||
|
||||
/// Whether to build the namespace natively (from impl + properties) instead of
|
||||
/// wrapping a pre-built client. Native construction is required for the
|
||||
/// read-freshness provider to be installed
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
use arrow::RecordBatchStream;
|
||||
use connection::{Connection, connect, connect_namespace_client};
|
||||
use connection::{Connection, connect, connect_namespace, connect_namespace_client};
|
||||
use env_logger::Env;
|
||||
use expr::{PyExpr, expr_col, expr_func, expr_lit};
|
||||
use index::IndexConfig;
|
||||
@@ -26,6 +26,7 @@ pub mod expr;
|
||||
pub mod header;
|
||||
pub mod index;
|
||||
pub mod namespace;
|
||||
pub mod oauth;
|
||||
pub mod permutation;
|
||||
pub mod query;
|
||||
pub mod runtime;
|
||||
@@ -61,6 +62,7 @@ pub fn _lancedb(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
|
||||
m.add_class::<PyPermutationReader>()?;
|
||||
m.add_class::<PyExpr>()?;
|
||||
m.add_function(wrap_pyfunction!(connect, m)?)?;
|
||||
m.add_function(wrap_pyfunction!(connect_namespace, m)?)?;
|
||||
m.add_function(wrap_pyfunction!(connect_namespace_client, m)?)?;
|
||||
m.add_function(wrap_pyfunction!(permutation::async_permutation_builder, m)?)?;
|
||||
m.add_function(wrap_pyfunction!(util::validate_table_name, m)?)?;
|
||||
|
||||
72
python/src/oauth.rs
Normal file
72
python/src/oauth.rs
Normal file
@@ -0,0 +1,72 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
use pyo3::FromPyObject;
|
||||
|
||||
use lancedb::error::Error;
|
||||
use lancedb::remote::oauth::{OAuthConfig, OAuthFlow};
|
||||
|
||||
/// Python-side OAuth configuration, extracted via FromPyObject.
|
||||
/// Maps to `lancedb.remote.oauth.OAuthConfig` Python dataclass.
|
||||
#[derive(FromPyObject)]
|
||||
pub struct PyOAuthConfig {
|
||||
pub issuer_url: String,
|
||||
pub client_id: String,
|
||||
pub scopes: Vec<String>,
|
||||
pub flow: String,
|
||||
pub client_secret: Option<String>,
|
||||
pub managed_identity_client_id: Option<String>,
|
||||
pub refresh_buffer_secs: Option<u64>,
|
||||
}
|
||||
|
||||
impl TryFrom<PyOAuthConfig> for OAuthConfig {
|
||||
type Error = Error;
|
||||
|
||||
fn try_from(py: PyOAuthConfig) -> Result<Self, Self::Error> {
|
||||
let flow = match py.flow.as_str() {
|
||||
"client_credentials" => OAuthFlow::ClientCredentials,
|
||||
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
|
||||
client_id: py.managed_identity_client_id,
|
||||
},
|
||||
other => {
|
||||
return Err(Error::InvalidInput {
|
||||
message: format!("Unknown OAuth flow type: {other}"),
|
||||
});
|
||||
}
|
||||
};
|
||||
|
||||
Ok(Self {
|
||||
issuer_url: py.issuer_url,
|
||||
client_id: py.client_id,
|
||||
client_secret: py.client_secret,
|
||||
scopes: py.scopes,
|
||||
flow,
|
||||
refresh_buffer_secs: py.refresh_buffer_secs,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_unknown_oauth_flow_returns_invalid_input() {
|
||||
let config = PyOAuthConfig {
|
||||
issuer_url: "https://issuer.example.com".to_string(),
|
||||
client_id: "client-id".to_string(),
|
||||
scopes: vec!["scope".to_string()],
|
||||
flow: "typo".to_string(),
|
||||
client_secret: None,
|
||||
managed_identity_client_id: None,
|
||||
refresh_buffer_secs: None,
|
||||
};
|
||||
|
||||
let err = OAuthConfig::try_from(config).unwrap_err();
|
||||
assert!(matches!(
|
||||
err,
|
||||
Error::InvalidInput { message }
|
||||
if message == "Unknown OAuth flow type: typo"
|
||||
));
|
||||
}
|
||||
}
|
||||
33
python/tests/test_oauth.py
Normal file
33
python/tests/test_oauth.py
Normal file
@@ -0,0 +1,33 @@
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
import importlib.util
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
def _load_oauth_module():
|
||||
oauth_path = (
|
||||
Path(__file__).parents[1] / "python" / "lancedb" / "remote" / "oauth.py"
|
||||
)
|
||||
spec = importlib.util.spec_from_file_location("lancedb_remote_oauth", oauth_path)
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
assert spec.loader is not None
|
||||
sys.modules[spec.name] = module
|
||||
spec.loader.exec_module(module)
|
||||
return module
|
||||
|
||||
|
||||
def test_oauth_config_repr_redacts_client_secret():
|
||||
oauth = _load_oauth_module()
|
||||
|
||||
config = oauth.OAuthConfig(
|
||||
issuer_url="https://issuer.example.com",
|
||||
client_id="client-id",
|
||||
scopes=["scope"],
|
||||
client_secret="super-secret",
|
||||
)
|
||||
|
||||
rendered = repr(config)
|
||||
assert "super-secret" not in rendered
|
||||
assert "client_secret" not in rendered
|
||||
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "lancedb"
|
||||
version = "0.31.0-beta.4"
|
||||
version = "0.31.0"
|
||||
edition.workspace = true
|
||||
description = "LanceDB: A serverless, low-latency vector database for AI applications"
|
||||
license.workspace = true
|
||||
@@ -14,6 +14,7 @@ rust-version.workspace = true
|
||||
ahash = { workspace = true }
|
||||
arrow = { workspace = true }
|
||||
arrow-array = { workspace = true }
|
||||
arrow-buffer = { workspace = true }
|
||||
arrow-data = { workspace = true }
|
||||
arrow-schema = { workspace = true }
|
||||
arrow-select = { workspace = true }
|
||||
@@ -166,6 +167,10 @@ required-features = ["bedrock"]
|
||||
[[example]]
|
||||
name = "simple"
|
||||
|
||||
[[example]]
|
||||
name = "polars"
|
||||
required-features = ["polars"]
|
||||
|
||||
[[example]]
|
||||
name = "full_text_search"
|
||||
|
||||
|
||||
47
rust/lancedb/examples/polars.rs
Normal file
47
rust/lancedb/examples/polars.rs
Normal file
@@ -0,0 +1,47 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
|
||||
|
||||
//! This example demonstrates ingesting a Polars DataFrame into LanceDB and
|
||||
//! reading it back out as a Polars DataFrame.
|
||||
|
||||
use lancedb::arrow::IntoPolars;
|
||||
use lancedb::query::ExecutableQuery;
|
||||
use lancedb::{Result, connect};
|
||||
use polars::prelude::{DataFrame, NamedFrom, Series};
|
||||
|
||||
fn make_dataframe() -> DataFrame {
|
||||
let ids = Series::new("id", &[1i32, 2, 3, 4, 5]);
|
||||
let names = Series::new("name", &["Alice", "Bob", "Carol", "Dave", "Eve"]);
|
||||
let scores = Series::new("score", &[9.5f64, 8.1, 7.3, 9.0, 6.5]);
|
||||
DataFrame::new(vec![ids, names, scores]).unwrap()
|
||||
}
|
||||
|
||||
#[tokio::main]
|
||||
async fn main() -> Result<()> {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let db = connect(tmp.path().to_str().unwrap()).execute().await?;
|
||||
|
||||
// Ingest a Polars DataFrame directly — DataFrame now implements Scannable.
|
||||
let df = make_dataframe();
|
||||
println!("Input DataFrame:\n{df}");
|
||||
|
||||
let table = db.create_table("people", df).execute().await?;
|
||||
|
||||
// Append more rows.
|
||||
let more = DataFrame::new(vec![
|
||||
Series::new("id", &[6i32, 7]),
|
||||
Series::new("name", &["Frank", "Grace"]),
|
||||
Series::new("score", &[7.8f64, 8.9]),
|
||||
])
|
||||
.unwrap();
|
||||
table.add(more).execute().await?;
|
||||
|
||||
// Read back as a Polars DataFrame.
|
||||
let result_df = table.query().execute().await?.into_polars().await?;
|
||||
|
||||
println!(
|
||||
"\nRound-tripped DataFrame ({} rows):\n{result_df}",
|
||||
result_df.height()
|
||||
);
|
||||
Ok(())
|
||||
}
|
||||
@@ -3,7 +3,19 @@
|
||||
|
||||
use std::{pin::Pin, sync::Arc};
|
||||
|
||||
// Re-export the arrow crates we depend on so downstream consumers can build
|
||||
// `RecordBatch`/arrays/builders against the exact same arrow line lancedb was
|
||||
// compiled against, instead of declaring their own (potentially mismatched)
|
||||
// direct arrow dependencies. See https://github.com/lancedb/lancedb/issues/3575.
|
||||
pub use arrow;
|
||||
pub use arrow_array;
|
||||
pub use arrow_buffer;
|
||||
pub use arrow_cast;
|
||||
pub use arrow_data;
|
||||
pub use arrow_ipc;
|
||||
pub use arrow_ord;
|
||||
pub use arrow_schema;
|
||||
pub use arrow_select;
|
||||
use datafusion_common::DataFusionError;
|
||||
use datafusion_physical_plan::stream::RecordBatchStreamAdapter;
|
||||
use futures::{Stream, StreamExt, TryStreamExt};
|
||||
@@ -112,54 +124,14 @@ impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> RecordBatchStream
|
||||
|
||||
/// A trait for converting incoming data to Arrow
|
||||
///
|
||||
/// Integrations should implement this trait to allow data to be
|
||||
/// imported directly from the integration. For example, implementing
|
||||
/// this trait for `Vec<Vec<...>>` would allow the `Vec` to be directly
|
||||
/// used in methods like [`crate::connection::Connection::create_table`]
|
||||
/// or [`crate::table::Table::add`]
|
||||
pub trait IntoArrow {
|
||||
/// Convert the data into an iterator of Arrow batches
|
||||
fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>>;
|
||||
}
|
||||
|
||||
pub type BoxedRecordBatchReader = Box<dyn arrow_array::RecordBatchReader + Send>;
|
||||
|
||||
impl<T: arrow_array::RecordBatchReader + Send + 'static> IntoArrow for T {
|
||||
fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>> {
|
||||
Ok(Box::new(self))
|
||||
}
|
||||
}
|
||||
|
||||
/// A trait for converting incoming data to Arrow asynchronously
|
||||
///
|
||||
/// Serves the same purpose as [`IntoArrow`], but for asynchronous data.
|
||||
///
|
||||
/// Note: Arrow has no async equivalent to RecordBatchReader and so
|
||||
pub trait IntoArrowStream {
|
||||
/// Convert the data into a stream of Arrow batches
|
||||
fn into_arrow(self) -> Result<SendableRecordBatchStream>;
|
||||
}
|
||||
|
||||
impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> SimpleRecordBatchStream<S> {
|
||||
pub fn new(stream: S, schema: Arc<arrow_schema::Schema>) -> Self {
|
||||
Self { schema, stream }
|
||||
}
|
||||
}
|
||||
|
||||
impl IntoArrowStream for SendableRecordBatchStream {
|
||||
fn into_arrow(self) -> Result<SendableRecordBatchStream> {
|
||||
Ok(self)
|
||||
}
|
||||
}
|
||||
|
||||
impl IntoArrowStream for datafusion_physical_plan::SendableRecordBatchStream {
|
||||
fn into_arrow(self) -> Result<SendableRecordBatchStream> {
|
||||
let schema = self.schema();
|
||||
let stream = self.map_err(|df_err| df_err.into());
|
||||
Ok(Box::pin(SimpleRecordBatchStream::new(stream, schema)))
|
||||
}
|
||||
}
|
||||
|
||||
pub trait LanceDbDatagenExt {
|
||||
fn into_ldb_stream(
|
||||
self,
|
||||
@@ -264,9 +236,7 @@ impl IntoPolars for SendableRecordBatchStream {
|
||||
#[cfg(all(test, feature = "polars"))]
|
||||
mod tests {
|
||||
use super::SendableRecordBatchStream;
|
||||
use crate::arrow::{
|
||||
IntoArrow, IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream,
|
||||
};
|
||||
use crate::arrow::{IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream};
|
||||
use polars::prelude::{DataFrame, NamedFrom, Series};
|
||||
|
||||
fn get_record_batch_reader_from_polars() -> Box<dyn arrow_array::RecordBatchReader + Send> {
|
||||
@@ -280,10 +250,7 @@ mod tests {
|
||||
float_series = Series::new("float", &[2.0]);
|
||||
let df2 = DataFrame::new(vec![string_series, int_series, float_series]).unwrap();
|
||||
|
||||
PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap())
|
||||
.unwrap()
|
||||
.into_arrow()
|
||||
.unwrap()
|
||||
Box::new(PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap()).unwrap())
|
||||
}
|
||||
|
||||
#[test]
|
||||
|
||||
@@ -185,6 +185,43 @@ impl Scannable for SendableRecordBatchStream {
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(feature = "polars")]
|
||||
impl Scannable for polars::frame::DataFrame {
|
||||
fn schema(&self) -> SchemaRef {
|
||||
crate::polars_arrow_convertors::convert_polars_df_schema_to_arrow_rb_schema(
|
||||
self.schema().clone(),
|
||||
)
|
||||
.expect("failed to convert Polars DataFrame schema to Arrow schema")
|
||||
}
|
||||
|
||||
fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
|
||||
let schema = Scannable::schema(self);
|
||||
let batches: crate::Result<Vec<RecordBatch>> =
|
||||
match crate::arrow::PolarsDataFrameRecordBatchReader::new(self.clone()) {
|
||||
Err(e) => Err(e),
|
||||
Ok(reader) => reader.map(|b| b.map_err(Into::into)).collect(),
|
||||
};
|
||||
match batches {
|
||||
Err(e) => Box::pin(SimpleRecordBatchStream {
|
||||
schema,
|
||||
stream: once(async move { Err(e) }),
|
||||
}),
|
||||
Ok(batches) => {
|
||||
let stream = futures::stream::iter(batches.into_iter().map(Ok));
|
||||
Box::pin(SimpleRecordBatchStream { schema, stream })
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
fn num_rows(&self) -> Option<usize> {
|
||||
Some(self.height())
|
||||
}
|
||||
|
||||
fn rescannable(&self) -> bool {
|
||||
true
|
||||
}
|
||||
}
|
||||
|
||||
#[async_trait]
|
||||
impl StreamingWriteSource for Box<dyn Scannable> {
|
||||
fn arrow_schema(&self) -> SchemaRef {
|
||||
@@ -1089,4 +1126,60 @@ mod tests {
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(feature = "polars")]
|
||||
mod polars_tests {
|
||||
use super::*;
|
||||
use crate::arrow::IntoPolars;
|
||||
use crate::query::ExecutableQuery;
|
||||
use polars::prelude::{DataFrame, NamedFrom, Series};
|
||||
|
||||
fn make_df() -> DataFrame {
|
||||
DataFrame::new(vec![
|
||||
Series::new("id", &[1i32, 2, 3]),
|
||||
Series::new("val", &[1.1f64, 2.2, 3.3]),
|
||||
])
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_dataframe_scannable_round_trip() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let db = crate::connect(tmp.path().to_str().unwrap())
|
||||
.execute()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let df = make_df();
|
||||
let table = db.create_table("t", df.clone()).execute().await.unwrap();
|
||||
|
||||
// Append the same rows again.
|
||||
table.add(df.clone()).execute().await.unwrap();
|
||||
|
||||
let result = table
|
||||
.query()
|
||||
.execute()
|
||||
.await
|
||||
.unwrap()
|
||||
.into_polars()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.height(), df.height() * 2);
|
||||
assert_eq!(result.schema(), df.schema());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_dataframe_scannable_rescannable() {
|
||||
let mut df = make_df();
|
||||
assert!(df.rescannable());
|
||||
|
||||
let batches1: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
|
||||
assert_eq!(batches1.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
|
||||
|
||||
// Can be scanned again.
|
||||
let batches2: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
|
||||
assert_eq!(batches2.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -342,3 +342,9 @@ pub use connection::connect_namespace;
|
||||
/// Re-export Lance Session and ObjectStoreRegistry for custom session creation
|
||||
pub use lance::session::Session;
|
||||
pub use lance_io::object_store::ObjectStoreRegistry;
|
||||
|
||||
/// Re-export DataFusion so consumers can build the `Expr` values that public
|
||||
/// query/merge APIs (e.g. [`query::QueryBase::only_if_expr`]) accept without
|
||||
/// declaring their own (potentially mismatched) direct `datafusion` dependency.
|
||||
/// See <https://github.com/lancedb/lancedb/issues/3575>.
|
||||
pub use datafusion;
|
||||
|
||||
@@ -401,6 +401,9 @@ pub trait QueryBase {
|
||||
///
|
||||
/// Filtering performance can often be improved by creating a scalar index
|
||||
/// on the filter column(s).
|
||||
///
|
||||
/// Calling this multiple times combines the filters with a logical AND
|
||||
/// (i.e. `(previous) AND (new)`) rather than replacing the previous filter.
|
||||
fn only_if(self, filter: impl AsRef<str>) -> Self;
|
||||
|
||||
/// Only return rows which match the filter, using an expression builder.
|
||||
@@ -423,6 +426,9 @@ pub trait QueryBase {
|
||||
///
|
||||
/// Note: Expression filters are not supported for remote/server-side queries.
|
||||
/// Use [`QueryBase::only_if`] with SQL strings for remote tables.
|
||||
///
|
||||
/// Calling this multiple times combines the expressions with a logical AND
|
||||
/// rather than replacing the previous filter.
|
||||
fn only_if_expr(self, filter: datafusion_expr::Expr) -> Self;
|
||||
|
||||
/// Perform a full text search on the table.
|
||||
@@ -535,12 +541,13 @@ impl<T: HasQuery> QueryBase for T {
|
||||
}
|
||||
|
||||
fn only_if(mut self, filter: impl AsRef<str>) -> Self {
|
||||
self.mut_query().filter = Some(QueryFilter::Sql(filter.as_ref().to_string()));
|
||||
self.mut_query()
|
||||
.add_filter(QueryFilter::Sql(filter.as_ref().to_string()));
|
||||
self
|
||||
}
|
||||
|
||||
fn only_if_expr(mut self, filter: datafusion_expr::Expr) -> Self {
|
||||
self.mut_query().filter = Some(QueryFilter::Datafusion(filter));
|
||||
self.mut_query().add_filter(QueryFilter::Datafusion(filter));
|
||||
self
|
||||
}
|
||||
|
||||
@@ -716,6 +723,39 @@ pub enum QueryFilter {
|
||||
Datafusion(Expr),
|
||||
}
|
||||
|
||||
/// Combine two filters with a logical AND.
|
||||
///
|
||||
/// This is used when a query receives more than one filter (for example when
|
||||
/// `where`/`only_if` is called multiple times) so the filters are composed
|
||||
/// with AND rather than the later filter silently replacing the earlier one.
|
||||
///
|
||||
/// SQL string and expression filters are combined within their own
|
||||
/// representation. When the two representations are mixed, the expression is
|
||||
/// lowered to SQL (via [`crate::expr::expr_to_sql_string`]) and the filters are
|
||||
/// combined as SQL strings. Substrait filters cannot be combined and return an
|
||||
/// error.
|
||||
fn and_filters(existing: QueryFilter, new: QueryFilter) -> Result<QueryFilter> {
|
||||
match (existing, new) {
|
||||
(QueryFilter::Sql(lhs), QueryFilter::Sql(rhs)) => {
|
||||
Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
|
||||
}
|
||||
(QueryFilter::Datafusion(lhs), QueryFilter::Datafusion(rhs)) => {
|
||||
Ok(QueryFilter::Datafusion(lhs.and(rhs)))
|
||||
}
|
||||
(QueryFilter::Sql(lhs), QueryFilter::Datafusion(rhs)) => {
|
||||
let rhs = crate::expr::expr_to_sql_string(&rhs)?;
|
||||
Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
|
||||
}
|
||||
(QueryFilter::Datafusion(lhs), QueryFilter::Sql(rhs)) => {
|
||||
let lhs = crate::expr::expr_to_sql_string(&lhs)?;
|
||||
Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
|
||||
}
|
||||
_ => Err(Error::InvalidInput {
|
||||
message: "cannot combine a Substrait filter with another filter".to_string(),
|
||||
}),
|
||||
}
|
||||
}
|
||||
|
||||
/// A basic query into a table without any kind of search
|
||||
///
|
||||
/// This will result in a (potentially filtered) scan if executed
|
||||
@@ -730,6 +770,13 @@ pub struct QueryRequest {
|
||||
/// Apply filter to the returned rows.
|
||||
pub filter: Option<QueryFilter>,
|
||||
|
||||
/// An error recorded while combining repeated filters that could not be
|
||||
/// composed (see [`QueryRequest::add_filter`]). It is surfaced when the
|
||||
/// query is executed via [`QueryRequest::check_filter`]. We defer the error
|
||||
/// because the builder methods that set filters return `Self` rather than a
|
||||
/// `Result`.
|
||||
pub(crate) filter_error: Option<String>,
|
||||
|
||||
/// Perform a full text search on the table.
|
||||
pub full_text_search: Option<FullTextSearchQuery>,
|
||||
|
||||
@@ -775,6 +822,7 @@ impl Default for QueryRequest {
|
||||
limit: None,
|
||||
offset: None,
|
||||
filter: None,
|
||||
filter_error: None,
|
||||
full_text_search: None,
|
||||
select: Select::All,
|
||||
fast_search: false,
|
||||
@@ -788,6 +836,41 @@ impl Default for QueryRequest {
|
||||
}
|
||||
}
|
||||
|
||||
impl QueryRequest {
|
||||
/// Add a filter, combining it with any existing filter using a logical AND.
|
||||
///
|
||||
/// If the new filter cannot be combined with the existing one (because they
|
||||
/// use different representations) the error is recorded and surfaced later
|
||||
/// by [`Self::check_filter`].
|
||||
pub(crate) fn add_filter(&mut self, new: QueryFilter) {
|
||||
self.filter = Some(match self.filter.take() {
|
||||
None => new,
|
||||
Some(existing) => match and_filters(existing, new) {
|
||||
Ok(combined) => combined,
|
||||
Err(err) => {
|
||||
// The filters were consumed while attempting to combine
|
||||
// them; the recorded error is surfaced by `check_filter`
|
||||
// before the query executes.
|
||||
self.filter_error = Some(err.to_string());
|
||||
return;
|
||||
}
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
/// Return an error if combining filters failed (see [`Self::add_filter`]).
|
||||
///
|
||||
/// This must be called by every backend before executing a query.
|
||||
pub(crate) fn check_filter(&self) -> Result<()> {
|
||||
if let Some(message) = &self.filter_error {
|
||||
return Err(Error::InvalidInput {
|
||||
message: message.clone(),
|
||||
});
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
/// A builder for LanceDB queries.
|
||||
///
|
||||
/// See [`crate::Table::query`] for more details on queries
|
||||
@@ -1682,6 +1765,70 @@ mod tests {
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_repeated_only_if_combines_with_and() {
|
||||
use crate::expr::{col, lit};
|
||||
|
||||
let tmp_dir = tempdir().unwrap();
|
||||
let dataset_path = tmp_dir.path().join("test.lance");
|
||||
let uri = dataset_path.to_str().unwrap();
|
||||
|
||||
let conn = connect(uri).execute().await.unwrap();
|
||||
let table = conn
|
||||
.create_table("my_table", make_non_empty_batches())
|
||||
.execute()
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
let query = table.query().only_if("id > 0").only_if("id < 100");
|
||||
match &query.request.filter {
|
||||
Some(QueryFilter::Sql(sql)) => assert_eq!(sql, "(id > 0) AND (id < 100)"),
|
||||
other => panic!("expected combined SQL filter, got {other:?}"),
|
||||
}
|
||||
|
||||
// A single filter is left untouched.
|
||||
let query = table.query().only_if("id > 0");
|
||||
match &query.request.filter {
|
||||
Some(QueryFilter::Sql(sql)) => assert_eq!(sql, "id > 0"),
|
||||
other => panic!("expected single SQL filter, got {other:?}"),
|
||||
}
|
||||
|
||||
// Expression filters are combined with a logical AND as well.
|
||||
let query = table
|
||||
.query()
|
||||
.only_if_expr(col("id").gt(lit(0i32)))
|
||||
.only_if_expr(col("id").lt(lit(100i32)));
|
||||
match &query.request.filter {
|
||||
Some(QueryFilter::Datafusion(expr)) => {
|
||||
assert_eq!(
|
||||
expr,
|
||||
&col("id").gt(lit(0i32)).and(col("id").lt(lit(100i32)))
|
||||
);
|
||||
}
|
||||
other => panic!("expected combined Datafusion filter, got {other:?}"),
|
||||
}
|
||||
|
||||
// Mixing an SQL string filter with an expression filter lowers the
|
||||
// expression to SQL and combines them as SQL strings.
|
||||
let query = table
|
||||
.query()
|
||||
.only_if("id > 0")
|
||||
.only_if_expr(col("id").lt(lit(100i32)));
|
||||
match &query.request.filter {
|
||||
Some(QueryFilter::Sql(sql)) => {
|
||||
let expected = format!(
|
||||
"(id > 0) AND ({})",
|
||||
crate::expr::expr_to_sql_string(&col("id").lt(lit(100i32))).unwrap()
|
||||
);
|
||||
assert_eq!(sql, &expected);
|
||||
}
|
||||
other => panic!("expected combined SQL filter, got {other:?}"),
|
||||
}
|
||||
assert!(query.request.check_filter().is_ok());
|
||||
// The combined filter executes without error.
|
||||
query.execute().await.unwrap();
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_select_with_transform() {
|
||||
// TODO: Switch back to memory://foo after https://github.com/lancedb/lancedb/issues/1051
|
||||
|
||||
@@ -70,18 +70,29 @@ use tokio::sync::RwLock;
|
||||
const REQUEST_TIMEOUT_HEADER: HeaderName = HeaderName::from_static("x-request-timeout-ms");
|
||||
const MIN_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-version");
|
||||
const MIN_TIMESTAMP_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-timestamp");
|
||||
const MIN_READ_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-read-version");
|
||||
const VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-version");
|
||||
const METRIC_TYPE_KEY: &str = "metric_type";
|
||||
const INDEX_TYPE_KEY: &str = "index_type";
|
||||
const SCHEMA_CACHE_TTL: Duration = Duration::from_secs(30);
|
||||
const SCHEMA_CACHE_REFRESH_WINDOW: Duration = Duration::from_secs(5);
|
||||
|
||||
/// Per-table state driving the freshness headers (`x-lancedb-min-version` and
|
||||
/// `x-lancedb-min-timestamp`) sent on read requests.
|
||||
/// Per-table state driving the freshness headers (`x-lancedb-min-version`,
|
||||
/// `x-lancedb-min-timestamp`, and `x-lancedb-min-read-version`) sent on read
|
||||
/// requests.
|
||||
#[derive(Debug, Default, Clone, Copy)]
|
||||
struct FreshnessState {
|
||||
/// Provides read-your-write within a single handle: writes that return a
|
||||
/// version update this, and reads send it as `x-lancedb-min-version`.
|
||||
min_version: Option<u64>,
|
||||
/// Highest dataset version observed in a *read* response on this handle.
|
||||
/// Reads send it as `x-lancedb-min-read-version` so a load-balanced query
|
||||
/// node whose cache is behind this version must refresh before serving,
|
||||
/// giving monotonic reads across nodes regardless of which one the load
|
||||
/// balancer routes to. Sourced only from reads (always committed dataset
|
||||
/// versions), never from writes (which may return WAL entry ids), so it is
|
||||
/// unaffected by the WAL/version mismatch that retired `min_version`.
|
||||
min_read_version: Option<u64>,
|
||||
/// Wall-clock time captured at the last [`BaseTable::checkout_latest`]
|
||||
/// call. Subsequent reads send
|
||||
/// `max(baseline, now - read_consistency_interval)` as
|
||||
@@ -102,6 +113,7 @@ struct FreshnessState {
|
||||
struct FreshnessHeaders {
|
||||
min_version: Option<u64>,
|
||||
min_timestamp: Option<SystemTime>,
|
||||
min_read_version: Option<u64>,
|
||||
}
|
||||
|
||||
impl FreshnessHeaders {
|
||||
@@ -113,6 +125,9 @@ impl FreshnessHeaders {
|
||||
let dt: chrono::DateTime<chrono::Utc> = ts.into();
|
||||
request = request.header(MIN_TIMESTAMP_HEADER, dt.to_rfc3339());
|
||||
}
|
||||
if let Some(v) = self.min_read_version {
|
||||
request = request.header(MIN_READ_VERSION_HEADER, v.to_string());
|
||||
}
|
||||
request
|
||||
}
|
||||
}
|
||||
@@ -597,6 +612,7 @@ impl<S: HttpSend> RemoteTable<S> {
|
||||
body: &mut serde_json::Value,
|
||||
params: &QueryRequest,
|
||||
) -> Result<()> {
|
||||
params.check_filter()?;
|
||||
body["prefilter"] = params.prefilter.into();
|
||||
if let Some(offset) = params.offset {
|
||||
body["offset"] = serde_json::Value::Number(serde_json::Number::from(offset));
|
||||
@@ -884,6 +900,7 @@ impl<S: HttpSend> RemoteTable<S> {
|
||||
self.client.read_consistency_interval,
|
||||
SystemTime::now(),
|
||||
),
|
||||
min_read_version: state.min_read_version,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -905,6 +922,30 @@ impl<S: HttpSend> RemoteTable<S> {
|
||||
state.min_version = Some(state.min_version.map_or(version, |v| v.max(version)));
|
||||
}
|
||||
|
||||
/// Record a dataset version observed in a *read* response so subsequent
|
||||
/// reads request at least this version via `x-lancedb-min-read-version`,
|
||||
/// giving monotonic reads across load-balanced query nodes. A returned `0`
|
||||
/// (or absent header from an old server) is ignored.
|
||||
fn track_read_version(&self, version: u64) {
|
||||
if version == 0 {
|
||||
return;
|
||||
}
|
||||
let mut state = self.freshness.lock().unwrap();
|
||||
state.min_read_version = Some(state.min_read_version.map_or(version, |v| v.max(version)));
|
||||
}
|
||||
|
||||
/// Parse the `x-lancedb-version` response header (the dataset version a read
|
||||
/// reflects) and fold it into the read-version watermark.
|
||||
fn track_read_version_from_headers(&self, headers: &reqwest::header::HeaderMap) {
|
||||
if let Some(version) = headers
|
||||
.get(&VERSION_HEADER)
|
||||
.and_then(|value| value.to_str().ok())
|
||||
.and_then(|value| value.parse::<u64>().ok())
|
||||
{
|
||||
self.track_read_version(version);
|
||||
}
|
||||
}
|
||||
|
||||
async fn execute_query(
|
||||
&self,
|
||||
query: &AnyQuery,
|
||||
@@ -928,6 +969,7 @@ impl<S: HttpSend> RemoteTable<S> {
|
||||
|
||||
let futures = requests.into_iter().map(|req| async move {
|
||||
let (request_id, response) = self.send(req, true).await?;
|
||||
self.track_read_version_from_headers(response.headers());
|
||||
self.read_arrow_stream(&request_id, response).await
|
||||
});
|
||||
let streams = futures::future::try_join_all(futures);
|
||||
@@ -1545,11 +1587,12 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
|
||||
*write_guard = None;
|
||||
drop(write_guard);
|
||||
|
||||
// Drop any per-handle write tracking; subsequent reads use the
|
||||
// Drop any per-handle read/write tracking; subsequent reads use the
|
||||
// baseline timestamp captured now to guarantee freshness.
|
||||
*self.freshness.lock().unwrap() = FreshnessState {
|
||||
min_version: None,
|
||||
checkout_baseline: Some(SystemTime::now()),
|
||||
min_read_version: None,
|
||||
};
|
||||
|
||||
// Invalidate schema cache since we're switching versions
|
||||
@@ -1805,6 +1848,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
|
||||
}
|
||||
};
|
||||
|
||||
self.track_read_version_from_headers(response.headers());
|
||||
let body = response.text().await.err_to_http(request_id.clone())?;
|
||||
|
||||
serde_json::from_str(&body).map_err(|e| Error::Http {
|
||||
@@ -7124,6 +7168,7 @@ mod tests {
|
||||
let state = FreshnessState {
|
||||
min_version: None,
|
||||
checkout_baseline: Some(baseline),
|
||||
min_read_version: None,
|
||||
};
|
||||
assert_eq!(compute_min_timestamp(&state, None, now), Some(baseline));
|
||||
|
||||
@@ -7148,6 +7193,7 @@ mod tests {
|
||||
let state = FreshnessState {
|
||||
min_version: None,
|
||||
checkout_baseline: Some(baseline),
|
||||
min_read_version: None,
|
||||
};
|
||||
assert_eq!(
|
||||
compute_min_timestamp(&state, Some(Duration::from_secs(10)), now),
|
||||
@@ -7159,6 +7205,7 @@ mod tests {
|
||||
let state = FreshnessState {
|
||||
min_version: None,
|
||||
checkout_baseline: Some(recent_baseline),
|
||||
min_read_version: None,
|
||||
};
|
||||
assert_eq!(
|
||||
compute_min_timestamp(&state, Some(Duration::from_secs(60)), now),
|
||||
@@ -7303,6 +7350,106 @@ mod tests {
|
||||
);
|
||||
}
|
||||
|
||||
/// A handler that records every request's headers and answers each read with
|
||||
/// an `x-lancedb-version` response header taken from `versions` (by call
|
||||
/// index, saturating at the last entry). An empty string means "no header".
|
||||
fn read_version_handler(
|
||||
versions: &'static [&'static str],
|
||||
) -> (
|
||||
impl Fn(reqwest::Request) -> http::Response<String> + Clone + Send + Sync + 'static,
|
||||
Arc<std::sync::Mutex<Vec<http::HeaderMap>>>,
|
||||
) {
|
||||
let requests = Arc::new(std::sync::Mutex::new(Vec::new()));
|
||||
let requests_c = requests.clone();
|
||||
let call = Arc::new(AtomicUsize::new(0));
|
||||
let handler = move |request: reqwest::Request| {
|
||||
requests_c.lock().unwrap().push(request.headers().clone());
|
||||
let i = call.fetch_add(1, Ordering::SeqCst).min(versions.len() - 1);
|
||||
let mut builder = http::Response::builder().status(200);
|
||||
if !versions[i].is_empty() {
|
||||
builder = builder.header("x-lancedb-version", versions[i]);
|
||||
}
|
||||
builder.body("42".to_string()).unwrap()
|
||||
};
|
||||
(handler, requests)
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_read_version_watermark_tracked_and_sent() {
|
||||
let (handler, requests) = read_version_handler(&["100", "100"]);
|
||||
let table = Table::new_with_handler("my_table", handler);
|
||||
|
||||
// First read has no watermark yet; the response advertises version 100,
|
||||
// so the second read must floor the server at 100.
|
||||
table.count_rows(None).await.unwrap();
|
||||
table.count_rows(None).await.unwrap();
|
||||
|
||||
let reqs = requests.lock().unwrap();
|
||||
assert!(!reqs[0].contains_key("x-lancedb-min-read-version"));
|
||||
assert_eq!(
|
||||
reqs[1]
|
||||
.get("x-lancedb-min-read-version")
|
||||
.unwrap()
|
||||
.to_str()
|
||||
.unwrap(),
|
||||
"100"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_read_version_watermark_keeps_max() {
|
||||
// Server reports 100 then a stale 50; the watermark must not regress.
|
||||
let (handler, requests) = read_version_handler(&["100", "50", "50"]);
|
||||
let table = Table::new_with_handler("my_table", handler);
|
||||
|
||||
table.count_rows(None).await.unwrap();
|
||||
table.count_rows(None).await.unwrap();
|
||||
table.count_rows(None).await.unwrap();
|
||||
|
||||
let reqs = requests.lock().unwrap();
|
||||
assert_eq!(
|
||||
reqs[2]
|
||||
.get("x-lancedb-min-read-version")
|
||||
.unwrap()
|
||||
.to_str()
|
||||
.unwrap(),
|
||||
"100"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_read_version_absent_header_no_watermark() {
|
||||
// An old server that doesn't return the version header leaves the
|
||||
// watermark unset, preserving backward compatibility.
|
||||
let (handler, requests) = read_version_handler(&[""]);
|
||||
let table = Table::new_with_handler("my_table", handler);
|
||||
|
||||
table.count_rows(None).await.unwrap();
|
||||
table.count_rows(None).await.unwrap();
|
||||
|
||||
let reqs = requests.lock().unwrap();
|
||||
assert!(!reqs[1].contains_key("x-lancedb-min-read-version"));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_read_version_watermark_reset_on_checkout_latest() {
|
||||
let (handler, requests) = read_version_handler(&["100", "100"]);
|
||||
let table = Table::new_with_handler("my_table", handler);
|
||||
|
||||
table.count_rows(None).await.unwrap();
|
||||
table.checkout_latest().await.unwrap();
|
||||
table.count_rows(None).await.unwrap();
|
||||
|
||||
// The read after checkout_latest starts from a clean slate.
|
||||
let reqs = requests.lock().unwrap();
|
||||
assert!(
|
||||
!reqs
|
||||
.last()
|
||||
.unwrap()
|
||||
.contains_key("x-lancedb-min-read-version")
|
||||
);
|
||||
}
|
||||
|
||||
/// Like `capturing_handler`, but keeps a per-path snapshot of the headers
|
||||
/// from every request so tests can assert on a specific endpoint.
|
||||
#[allow(clippy::type_complexity)]
|
||||
|
||||
@@ -35,6 +35,15 @@ pub enum AnyQuery {
|
||||
VectorQuery(VectorQueryRequest),
|
||||
}
|
||||
|
||||
impl AnyQuery {
|
||||
pub(crate) fn base(&self) -> &QueryRequest {
|
||||
match self {
|
||||
Self::Query(query) => query,
|
||||
Self::VectorQuery(query) => &query.base,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//Decide between namespace or local
|
||||
pub async fn execute_query(
|
||||
table: &NativeTable,
|
||||
@@ -108,6 +117,7 @@ pub async fn create_plan(
|
||||
AnyQuery::VectorQuery(query) => query.clone(),
|
||||
AnyQuery::Query(query) => VectorQueryRequest::from_plain_query(query.clone()),
|
||||
};
|
||||
query.base.check_filter()?;
|
||||
|
||||
let ds_ref = table.dataset.get().await?;
|
||||
let schema = ds_ref.schema();
|
||||
@@ -357,6 +367,7 @@ async fn execute_namespace_query(
|
||||
|
||||
/// Convert an AnyQuery to the namespace QueryTableRequest format.
|
||||
fn convert_to_namespace_query(query: &AnyQuery) -> Result<NsQueryTableRequest> {
|
||||
query.base().check_filter()?;
|
||||
match query {
|
||||
AnyQuery::VectorQuery(vq) => {
|
||||
// Extract the query vector(s)
|
||||
|
||||
Reference in New Issue
Block a user