Compare commits

..

19 Commits

Author SHA1 Message Date
Lance Release
d2c7e8117f Bump version: 0.31.0-beta.4 → 0.31.0-beta.5 2026-06-30 22:24:24 +00:00
Lance Release
bcbc0da090 Bump version: 0.34.0-beta.4 → 0.34.0-beta.5 2026-06-30 22:23:43 +00:00
Jack Ye
9bead9f53d fix(python): route sync namespace connections through rust (#3598)
Summary:
- Route built-in sync namespace connections through the Rust namespace
connector.
- Keep custom namespace clients on the existing Python fallback.
- Preserve namespace-backed to_lance compatibility with lazy Python
client construction and add regressions.
2026-06-30 14:46:23 -07:00
Jack Ye
0351b77984 feat(remote): monotonic reads via x-lancedb-min-read-version watermark (#3597)
## Summary

Adds per-session monotonic reads for remote (LanceDB Cloud/Enterprise)
tables, preventing successive reads on a handle from moving *backward*
in dataset version when a load balancer routes them to query nodes with
differently-cached views.

Each `RemoteTable` handle tracks the highest dataset version it has
observed in a read response — surfaced by the server via a new
`x-lancedb-version` response header — and sends it back as
`x-lancedb-min-read-version` on subsequent reads (`count_rows`,
`query`). A query node whose cache is behind that version refreshes
before serving; a node already at/beyond it serves from cache at no
extra cost.

The watermark is sourced only from reads (always committed dataset
versions), so unlike the retired `x-lancedb-min-version` it is
unaffected by WAL writes returning WAL entry ids. It is reset on
`checkout_latest()`. Both headers are optional and ignored by older
peers.

Server-side enforcement lives in LanceDB Enterprise. Targets the
`codex/update-lance-9-0-0-beta-8` integration branch to match the
Enterprise submodule pin.
2026-06-30 11:22:00 -07:00
Weston Pace
f6c9d31f98 feat: add polars dataframe integration (#3584)
This PR is part cleanup, part feature, part example.

It removes `IntoArrow` and `IntoArrowStream`. There was only one
redundant call site between the two. Once we moved everything to
`Scannable` these traits no longer serve any purpose.

It adds a `Scannable` impl for a polars DataFrame. We used to have this
at one point for `IntoArrow` so this is more like a regression fix than
anything.

It adds an example (and unit test) which ensures we can ingest from a
Polars DataFrame and export to one. LazyFrame support would be a
follow-up (though a pretty straightforward one) but we've never had
proper LazyFrame support before.
2026-06-30 08:28:41 -07:00
Dan Tasse
a8f1c5a69f feat: add skill to work with branches better (#3596)
Agents seemed to have trouble finding the right calls to work with
branches (create, list, delete) and passing the right params to get it
to work. We probably don't need a big skill to get it on the right track
but a little nudge seems helpful. Doing a couple simple tasks, it saved
about half the time and tokens, so feels worthwhile. Created with the
Claude skills creator, hence the "skill.md in a bare folder"
organization - happy to move it if that's not the standard anymore.

```
Benchmark results (3 evals, with-skill vs baseline):

┌────────────────┬────────────┬────────────────────┐
│     Metric     │ With skill │   Without skill    │
├────────────────┼────────────┼────────────────────┤
│ Pass rate      │ 3/3 (100%) │ 3/3 (100%)         │
├────────────────┼────────────┼────────────────────┤
│ Avg time       │ 51s        │ 142s (2.8× slower) │
├────────────────┼────────────┼────────────────────┤
│ Avg tokens     │ 19,305     │ 36,513 (47% more)  │
├────────────────┼────────────┼────────────────────┤
│ Avg tool calls │ 5.7        │ 26 (4.5× more)     │
└────────────────┴────────────┴────────────────────┘
```
2026-06-30 09:27:21 -04:00
Jack Ye
10fecdf051 feat(node): expose OAuth connection config (#3587)
Expose the merged Rust OAuth header provider through the Node/TypeScript
connection path.

Includes:
- Native OAuthConfig conversion for napi-rs
- ConnectionOptions.oauthConfig plumbing
- Public TypeScript OAuthConfig and OAuthFlowType exports
- Generated TypeScript API docs for the new config surface
- input-validation and debug-redaction coverage in the Rust binding
layer

Local validation: cargo fmt --all; git diff --check.
2026-06-29 16:55:45 -07:00
Raphael Malikian
c9ae93a7fa fix: add missing stacklevel=2 to warnings.warn() calls (Fixes #3589) (#3590)
Fixes #3589

## Problem
Multiple `warnings.warn()` calls across the Python client are missing
the `stacklevel=2` parameter. This causes warning messages to point to
lancedb internal code instead of the user's code that triggered the
warning, making debugging difficult.

## Solution
Add `stacklevel=2` to 7 `warnings.warn()` calls across 4 files:

| File | Warnings Fixed |
|------|---------------|
| `remote/db.py` | `request_thread_pool`, `connection_timeout`,
`read_timeout` deprecation warnings |
| `remote/table.py` | `cleanup_old_versions`, `compact_files`,
`optimize` no-op warnings |
| `table.py` | `data_storage_version`, `enable_v2_manifest_paths`,
`retrain` deprecation warnings |
| `embeddings/colpali.py` | `use_token_pooling` deprecation warning |

## Verification
- All 4 modified files pass `ast.parse()` syntax check
- Only `stacklevel=2` added — no other changes

## Changelog

| Date | Change | Author |
|------|--------|--------|
| 2026-06-27 | Add missing stacklevel=2 to warnings.warn() calls |
rtmalikian |

### Files Changed
- `python/python/lancedb/remote/db.py` — Add stacklevel=2 to 3
deprecation warnings
- `python/python/lancedb/remote/table.py` — Add stacklevel=2 to 3 no-op
warnings
- `python/python/lancedb/table.py` — Add stacklevel=2 to 3 deprecation
warnings
- `python/python/lancedb/embeddings/colpali.py` — Add stacklevel=2 to 1
deprecation warning

### Verification
- Syntax check passed on all modified files

---

**About the Author:** Raphael Malikian — Clinical AI Solutions
Architect. I specialise in building and fixing AI/ML systems for
healthcare, including vector databases, RAG pipelines, and clinical NLP.
If you need help with your project or think I can add value to your
organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn:
http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a

---

**Disclosure:** This code was developed with assistance from
DeepSeek-V4-Pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed, tested against the actual codebase, and verified for
correctness.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>
2026-06-29 16:36:44 -07:00
Raphael Malikian
05756f0bbf fix(python): raise clear error when permutation API is used on remote tables (Fixes #2934) (#3591)
Fixes #2934

## Problem
Passing a `RemoteTable` to `permutation_builder()` raises a cryptic
`AttributeError`:
```
AttributeError: 'RemoteTable' object has no attribute '_inner'
```
This leaves users confused about what went wrong and why.

## Root Cause
`PermutationBuilder.__init__()` calls `async_permutation_builder(table)`
which accesses `table._inner` — the underlying Rust Lance table object.
`RemoteTable` connects to LanceDB Cloud/Enterprise and does not have a
local `_inner` attribute, making permutations fundamentally unsupported
on remote tables.

## Solution
Added an early check in `PermutationBuilder.__init__()` that verifies
the table has `_inner` before calling the Rust function, raising a clear
`TypeError` with an explanation of why permutations don't work on remote
tables.

## Verification
- Syntax validated with `ast.parse()`
- Structural verification: single call site (`permutation_builder()`),
guard placed before Rust FFI call
- Error message tested with mock: `MockRemoteTable()` correctly triggers
`TypeError`

## Changelog

| Date | Change | Author |
|------|--------|--------|
| 2026-06-28 | Added remote table guard in PermutationBuilder.__init__ |
rtmalikian |

### Files Changed
- python/python/lancedb/permutation.py — Added `hasattr(table,
"_inner")` check with clear error

---

**About the Author:** Raphael Malikian — Clinical AI Solutions
Architect. I specialise in building and fixing AI/ML systems for
healthcare, including vector databases, RAG pipelines, and clinical NLP.
If you need help with your project or think I can add value to your
organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn:
http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a

---

**Disclosure:** This code was developed with assistance from
deepseek-v4-pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed, tested against the actual codebase, and verified for
correctness.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>
2026-06-29 16:36:01 -07:00
LanceDB Robot
2a0945443e chore: update lance dependency to v9.0.0-beta.10 (#3594)
Updates Lance Rust workspace dependencies and Java lance-core to
v9.0.0-beta.10.

No compatibility code changes were required; clippy and rustfmt passed
after installing the missing runner components.

Lance tag:
https://github.com/lance-format/lance/releases/tag/v9.0.0-beta.10
2026-06-29 15:28:47 -05:00
Jack Ye
39e819b6a7 feat(python): expose OAuth connection config (#3586)
Expose the merged Rust OAuth header provider through the Python async
connection path.

Includes:
- Python OAuthConfig and OAuthFlowType public config objects
- PyO3 conversion into the Rust OAuthConfig
- connect_async(oauth_config=...) plumbing
- repr redaction coverage for client_secret

Local validation: cargo fmt --all; ruff format/check on touched Python
files.
2026-06-29 12:36:35 -07:00
dependabot[bot]
70126943ff chore(deps): bump the rust-minor-patch group across 1 directory with 6 updates (#3588)
Bumps the rust-minor-patch group with 6 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [env_logger](https://github.com/rust-cli/env_logger) | `0.11.10` |
`0.11.11` |
| [log](https://github.com/rust-lang/log) | `0.4.32` | `0.4.33` |
| [uuid](https://github.com/uuid-rs/uuid) | `1.23.3` | `1.23.4` |
| [anyhow](https://github.com/dtolnay/anyhow) | `1.0.102` | `1.0.103` |
| [napi](https://github.com/napi-rs/napi-rs) | `3.9.3` | `3.9.4` |
| [napi-derive](https://github.com/napi-rs/napi-rs) | `3.5.6` | `3.5.7`
|


Updates `env_logger` from 0.11.10 to 0.11.11
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/rust-cli/env_logger/releases">env_logger's
releases</a>.</em></p>
<blockquote>
<h2>v0.11.11</h2>
<h2>[0.11.11] - 2026-06-25</h2>
<h3>Internal</h3>
<ul>
<li>Updated <code>env_filter</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md">env_logger's
changelog</a>.</em></p>
<blockquote>
<h2>[0.11.11] - 2026-06-25</h2>
<h3>Internal</h3>
<ul>
<li>Updated <code>env_filter</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4d3f2b8dd"><code>b4d3f2b</code></a>
chore: Release</li>
<li><a
href="cc2b2efcd7"><code>cc2b2ef</code></a>
chore: Release</li>
<li><a
href="69e27d1e82"><code>69e27d1</code></a>
docs: Update changelog</li>
<li><a
href="166880db07"><code>166880d</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/411">#411</a>
from epage/parse</li>
<li><a
href="0a580d06e7"><code>0a580d0</code></a>
fix(filter): Remove 'parse' on no_std</li>
<li><a
href="78d8ef116e"><code>78d8ef1</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/404">#404</a>
from cagatay-y/feature/filter-no_std</li>
<li><a
href="132fe86c8c"><code>132fe86</code></a>
feat(filter): Add support for no_std environments</li>
<li><a
href="4feafa4c3c"><code>4feafa4</code></a>
refactor(env_filter): Fix unreachable pub warning</li>
<li><a
href="92f8d8d083"><code>92f8d8d</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/410">#410</a>
from rust-cli/renovate/crate-ci-typos-1.x</li>
<li><a
href="4e57784e0a"><code>4e57784</code></a>
chore(deps): Update pre-commit hook crate-ci/typos to v1.47.0</li>
<li>Additional commits viewable in <a
href="https://github.com/rust-cli/env_logger/compare/v0.11.10...v0.11.11">compare
view</a></li>
</ul>
</details>
<br />

Updates `log` from 0.4.32 to 0.4.33
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-lang/log/blob/master/CHANGELOG.md">log's
changelog</a>.</em></p>
<blockquote>
<h2>[0.4.33] - 2026-06-20</h2>
<h2>What's Changed</h2>
<ul>
<li>Fixed key comparison by <a
href="https://github.com/matteo-zeggiotti-ok"><code>@​matteo-zeggiotti-ok</code></a>
in <a
href="https://redirect.github.com/rust-lang/log/pull/732">rust-lang/log#732</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/matteo-zeggiotti-ok"><code>@​matteo-zeggiotti-ok</code></a>
made their first contribution in <a
href="https://redirect.github.com/rust-lang/log/pull/732">rust-lang/log#732</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/rust-lang/log/compare/0.4.32...0.4.33">https://github.com/rust-lang/log/compare/0.4.32...0.4.33</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f405739f3a"><code>f405739</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-lang/log/issues/734">#734</a>
from rust-lang/cargo/0.4.33</li>
<li><a
href="6a24abf083"><code>6a24abf</code></a>
prepare for 0.4.33 release</li>
<li><a
href="87e062162e"><code>87e0621</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-lang/log/issues/732">#732</a>
from matteo-zeggiotti-ok/fix-key-comparison</li>
<li><a
href="a9b57119a6"><code>a9b5711</code></a>
Review: fallback to the &amp;str hash</li>
<li><a
href="cc89cc6e41"><code>cc89cc6</code></a>
Review: fixed other comparisons</li>
<li><a
href="920e7dc281"><code>920e7dc</code></a>
Review: fixed comparison on <code>MaybeStaticStr</code></li>
<li><a
href="0d71d3c685"><code>0d71d3c</code></a>
Fixed key comparison</li>
<li>See full diff in <a
href="https://github.com/rust-lang/log/compare/0.4.32...0.4.33">compare
view</a></li>
</ul>
</details>
<br />

Updates `uuid` from 1.23.3 to 1.23.4
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>v1.23.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix up name of fuzz script in readme by <a
href="https://github.com/KodrAus"><code>@​KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/888">uuid-rs/uuid#888</a></li>
<li>document fixes by <a
href="https://github.com/frostyplanet"><code>@​frostyplanet</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/889">uuid-rs/uuid#889</a></li>
<li>Prepare for 1.23.4 release by <a
href="https://github.com/KodrAus"><code>@​KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/890">uuid-rs/uuid#890</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/frostyplanet"><code>@​frostyplanet</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/889">uuid-rs/uuid#889</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4">https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3296d64a19"><code>3296d64</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/890">#890</a> from
uuid-rs/cargo/v1.23.4</li>
<li><a
href="cba53d0da2"><code>cba53d0</code></a>
prepare for 1.23.4 release</li>
<li><a
href="e347af48aa"><code>e347af4</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/889">#889</a> from
frostyplanet/main</li>
<li><a
href="e9bf55c222"><code>e9bf55c</code></a>
doc: Fix broken link warnings</li>
<li><a
href="5351af40a0"><code>5351af4</code></a>
doc: Enable feature flag label for docs.rs</li>
<li><a
href="1e6a9669e3"><code>1e6a966</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/888">#888</a> from
uuid-rs/KodrAus-patch-1</li>
<li><a
href="c9619f639c"><code>c9619f6</code></a>
fix up name of fuzz script in readme</li>
<li>See full diff in <a
href="https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4">compare
view</a></li>
</ul>
</details>
<br />

Updates `anyhow` from 1.0.102 to 1.0.103
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/anyhow/releases">anyhow's
releases</a>.</em></p>
<blockquote>
<h2>1.0.103</h2>
<ul>
<li>Fix Stacked Borrows violation (UB) in
<code>Error::downcast_mut</code> (<a
href="https://redirect.github.com/dtolnay/anyhow/issues/451">#451</a>,
<a
href="https://redirect.github.com/dtolnay/anyhow/issues/452">#452</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5bdb0e24db"><code>5bdb0e2</code></a>
Release 1.0.103</li>
<li><a
href="e621bd35dd"><code>e621bd3</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/anyhow/issues/452">#452</a>
from dtolnay/downcast</li>
<li><a
href="6e8c000690"><code>6e8c000</code></a>
Eliminate pointer-&gt;reference-&gt;pointer during downcast</li>
<li><a
href="67c4abd771"><code>67c4abd</code></a>
Add regression test for issue 451</li>
<li><a
href="917a169320"><code>917a169</code></a>
Update actions/upload-artifact@v6 -&gt; v7</li>
<li><a
href="d9dc3faf78"><code>d9dc3fa</code></a>
Update actions/checkout@v6 -&gt; v7</li>
<li><a
href="841522b2aa"><code>841522b</code></a>
Raise minimum tested compiler to rust 1.85</li>
<li>See full diff in <a
href="https://github.com/dtolnay/anyhow/compare/1.0.102...1.0.103">compare
view</a></li>
</ul>
</details>
<br />

Updates `napi` from 3.9.3 to 3.9.4
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/napi-rs/napi-rs/releases">napi's
releases</a>.</em></p>
<blockquote>
<h2>napi-v3.9.4</h2>
<h3>Other</h3>
<ul>
<li><em>(napi-derive)</em> outline #[napi(object)] field-error
decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/pull/3338">#3338</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9cc199fa34"><code>9cc199f</code></a>
chore: release (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3345">#3345</a>)</li>
<li><a
href="b77119e711"><code>b77119e</code></a>
chore(release): publish</li>
<li><a
href="71ce9f6015"><code>71ce9f6</code></a>
chore(deps): update actions/cache action to v6 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3349">#3349</a>)</li>
<li><a
href="8c87f474c8"><code>8c87f47</code></a>
chore(deps): update <code>@​tybys/wasm-util</code> to 0.10.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3348">#3348</a>)</li>
<li><a
href="04e2a7655d"><code>04e2a76</code></a>
chore(deps): update cross-platform-actions/action action to v1.3.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3346">#3346</a>)</li>
<li><a
href="54ecbe4915"><code>54ecbe4</code></a>
chore(deps): update actions/checkout action to v7 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3340">#3340</a>)</li>
<li><a
href="3dd0c309da"><code>3dd0c30</code></a>
perf(napi-derive): outline #[napi(object)] field-error decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3338">#3338</a>)</li>
<li><a
href="81ac3d98c3"><code>81ac3d9</code></a>
build(deps): bump undici from 6.26.0 to 6.27.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3342">#3342</a>)</li>
<li>See full diff in <a
href="https://github.com/napi-rs/napi-rs/compare/napi-v3.9.3...napi-v3.9.4">compare
view</a></li>
</ul>
</details>
<br />

Updates `napi-derive` from 3.5.6 to 3.5.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/napi-rs/napi-rs/releases">napi-derive's
releases</a>.</em></p>
<blockquote>
<h2>napi-derive-v3.5.7</h2>
<h3>Other</h3>
<ul>
<li>updated the following local packages: napi-derive-backend</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9cc199fa34"><code>9cc199f</code></a>
chore: release (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3345">#3345</a>)</li>
<li><a
href="b77119e711"><code>b77119e</code></a>
chore(release): publish</li>
<li><a
href="71ce9f6015"><code>71ce9f6</code></a>
chore(deps): update actions/cache action to v6 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3349">#3349</a>)</li>
<li><a
href="8c87f474c8"><code>8c87f47</code></a>
chore(deps): update <code>@​tybys/wasm-util</code> to 0.10.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3348">#3348</a>)</li>
<li><a
href="04e2a7655d"><code>04e2a76</code></a>
chore(deps): update cross-platform-actions/action action to v1.3.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3346">#3346</a>)</li>
<li><a
href="54ecbe4915"><code>54ecbe4</code></a>
chore(deps): update actions/checkout action to v7 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3340">#3340</a>)</li>
<li><a
href="3dd0c309da"><code>3dd0c30</code></a>
perf(napi-derive): outline #[napi(object)] field-error decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3338">#3338</a>)</li>
<li><a
href="81ac3d98c3"><code>81ac3d9</code></a>
build(deps): bump undici from 6.26.0 to 6.27.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3342">#3342</a>)</li>
<li><a
href="ee58383da4"><code>ee58383</code></a>
chore(napi): release v3.9.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3335">#3335</a>)</li>
<li><a
href="c78727667b"><code>c787276</code></a>
fix(napi): sync referred flag when creating a weak ThreadsafeFunction
(<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3337">#3337</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/napi-rs/napi-rs/compare/napi-derive-v3.5.6...napi-derive-v3.5.7">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-29 07:44:46 -07:00
Lance Release
e01777070d Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 2026-06-29 11:12:18 +00:00
Lance Release
3878adc6dc Bump version: 0.34.0-beta.3 → 0.34.0-beta.4 2026-06-29 11:11:05 +00:00
Jack Ye
3df3043563 feat(rust): add OAuth header provider (#3579)
## Summary

Add the Rust OAuth header provider for remote LanceDB connections.

This supports client credentials and Azure managed identity flows,
handles token caching and refresh, redacts secrets in Debug output, and
wires `ConnectBuilder::oauth_config()` into the remote client while
rejecting ambiguous API-key/header-provider combinations.
2026-06-26 23:57:16 -07:00
Ryan Green
8a5cd74e48 fix: ensure read freshness provider is built into namespace client (#3571)
By default the read freshness provider was not included in the namespace
client, preventing the read freshness headers from being included in the
request. This prevents checkout_latest() from working as expected when
using the namespace client.

This fix ensures the provided is built into the client when the
namespace impl and properties are provided.
2026-06-25 21:47:55 -07:00
Lance Release
448d5ec20f Bump version: 0.31.0-beta.2 → 0.31.0-beta.3 2026-06-25 01:55:06 +00:00
Lance Release
8718345229 Bump version: 0.34.0-beta.2 → 0.34.0-beta.3 2026-06-25 01:53:51 +00:00
LanceDB Robot
026fedc286 chore: update lance dependency to v9.0.0-beta.8 (#3580)
Updates Lance dependencies from v9.0.0-beta.4 to v9.0.0-beta.8.\n\nThis
refreshes the Rust workspace lockfile and the Java lance-core version.
Triggering Lance tag:
https://github.com/lance-format/lance/releases/tag/v9.0.0-beta.8
2026-06-24 18:52:59 -07:00
55 changed files with 2882 additions and 458 deletions

View File

@@ -0,0 +1,137 @@
---
name: lancedb-branch-ops
description: Branch management for LanceDB tables via the REST API. Use this skill whenever someone wants to create, delete, list, or switch branches on a LanceDB table — or needs to make sure a write (metadata update, index build, etc.) lands on a specific branch instead of main. Invoke it even without the word "branch" if context makes clear they want an experimental copy of a table, want to isolate changes, or want to confirm a mutation didn't touch main. Covers: branches/list, branches/create, branches/delete, and passing "branch" in describe/update_field_metadata/create_index to target a non-main version.
---
## Goal
Manage branches on a LanceDB table: list what exists, create new ones, delete stale ones, and direct read/write operations at a specific branch without touching main.
## Step 0: Establish the connection
Use the `lancedb-connect` skill to resolve the base URL and auth headers (`x-api-key`, `x-lancedb-database`). Skip this only if the connection is already known from the current conversation.
All examples below use `{base_url}` — substitute the resolved endpoint and include the auth headers on every request.
## The branch model (important)
LanceDB branches are named snapshots that diverge from the table's current state at creation time. There is **no checkout command** — you never switch the whole table to a branch. Instead, you **pass `"branch": "<name>"` in the request body** of any operation to target that branch. Omitting the key (or sending an empty body) always targets main.
`branches/list` returns only non-main branches. Main always exists and is not listed.
## List branches
```http
POST {base_url}/v1/table/{table_id}/branches/list
Content-Type: application/json
{}
```
Response:
```json
{
"branches": {
"experiment-reindex": {"parentVersion": 1, "createAt": 1782506085, "manifestSize": 1029}
}
}
```
If `branches` is `{}`, the table has no branches besides main.
## Create a branch
```http
POST {base_url}/v1/table/{table_id}/branches/create
Content-Type: application/json
{"name": "experiment-reindex"}
```
HTTP 200 with `{}` body = success. The branch is created off the table's current state on main.
Verify by calling `branches/list` and confirming the new name appears.
## Delete a branch
```http
POST {base_url}/v1/table/{table_id}/branches/delete
Content-Type: application/json
{"name": "stale-2024"}
```
HTTP 200 with `{}` body = success. Only the branch pointer is removed — main and all row data remain intact.
Verify by calling `branches/list` (name gone) and `describe` with no branch param (main still responds).
## Operate on a specific branch
Pass `"branch": "<name>"` in the body of any operation to scope it to that branch:
**Read schema on a branch:**
```http
POST {base_url}/v1/table/{table_id}/describe
Content-Type: application/json
{"branch": "wip-branch"}
```
**Write metadata to a branch (not main):**
```http
POST {base_url}/v1/table/{table_id}/update_field_metadata
Content-Type: application/json
{
"branch": "wip-branch",
"updates": [
{
"path": "category",
"metadata": {"lancedb:description": "Product category label."},
"replace": false
}
]
}
```
**Build an index on a branch:**
```http
POST {base_url}/v1/table/{table_id}/create_index
Content-Type: application/json
{
"branch": "wip-branch",
"column": "category",
"index_type": "BTREE"
}
```
## Verifying isolation
After writing to a branch, always confirm the change did NOT land on main:
```bash
# Should show the new metadata
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
-H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
-H "content-type: application/json" \
-d '{"branch": "wip-branch"}'
# Should NOT show the new metadata
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
-H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
-H "content-type: application/json" \
-d '{}'
```
## Quick reference
| Goal | Endpoint | Body |
|------|----------|------|
| List all branches | `branches/list` | `{}` |
| Create a branch | `branches/create` | `{"name": "..."}` |
| Delete a branch | `branches/delete` | `{"name": "..."}` |
| Read schema on branch | `describe` | `{"branch": "..."}` |
| Write metadata on branch | `update_field_metadata` | `{"branch": "...", "updates": [...]}` |
| Build index on branch | `create_index` | `{"branch": "...", "column": ..., "index_type": ...}` |
| Target main (default) | any endpoint | omit `"branch"` key |

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.31.0-beta.2"
current_version = "0.31.0-beta.5"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

133
Cargo.lock generated
View File

@@ -157,9 +157,9 @@ dependencies = [
[[package]]
name = "anyhow"
version = "1.0.102"
version = "1.0.103"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
checksum = "2a4385e2e34eb35d6b3efe798b9eb88096925d87726c0798709bf56d9ed84af3"
[[package]]
name = "approx"
@@ -1297,15 +1297,6 @@ version = "2.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3"
[[package]]
name = "bitpacking"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "96a7139abd3d9cebf8cd6f920a389cf3dc9576172e32f4563f188cae3c3eb019"
dependencies = [
"crunchy",
]
[[package]]
name = "bitvec"
version = "1.0.1"
@@ -3186,9 +3177,9 @@ dependencies = [
[[package]]
name = "env_filter"
version = "1.0.1"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32e90c2accc4b07a8456ea0debdc2e7587bdd890680d71173a15d4ae604f6eef"
checksum = "900d271a03799a1ee8d1ca9b19893b48ca674a9284fefcfb85f05e74ed314217"
dependencies = [
"log",
"regex",
@@ -3196,9 +3187,9 @@ dependencies = [
[[package]]
name = "env_logger"
version = "0.11.10"
version = "0.11.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0621c04f2196ac3f488dd583365b9c09be011a4ab8b9f37248ffcc8f6198b56a"
checksum = "de671bd27a75a797dc9ae289ba1e77276e75e2026408aab65185384e2d5cd3f6"
dependencies = [
"anstream",
"anstyle",
@@ -3432,8 +3423,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsst"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"rand 0.9.4",
@@ -4735,8 +4726,8 @@ checksum = "e037a2e1d8d5fdbd49b16a4ea09d5d6401c1f29eca5ff29d03d3824dba16256a"
[[package]]
name = "lance"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arc-swap",
"arrow",
@@ -4754,7 +4745,6 @@ dependencies = [
"async_cell",
"aws-credential-types",
"aws-sdk-dynamodb",
"bitpacking",
"byteorder",
"bytes",
"chrono",
@@ -4773,6 +4763,7 @@ dependencies = [
"humantime",
"itertools 0.14.0",
"lance-arrow",
"lance-bitpacking",
"lance-core",
"lance-datafusion",
"lance-encoding",
@@ -4810,8 +4801,8 @@ dependencies = [
[[package]]
name = "lance-arrow"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4832,7 +4823,7 @@ dependencies = [
[[package]]
name = "lance-arrow-scalar"
version = "58.0.0"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4846,7 +4837,7 @@ dependencies = [
[[package]]
name = "lance-arrow-stats"
version = "58.0.0"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -4855,18 +4846,19 @@ dependencies = [
[[package]]
name = "lance-bitpacking"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrayref",
"crunchy",
"paste",
"seq-macro",
]
[[package]]
name = "lance-core"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -4904,8 +4896,8 @@ dependencies = [
[[package]]
name = "lance-datafusion"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"arrow-array",
@@ -4935,8 +4927,8 @@ dependencies = [
[[package]]
name = "lance-datagen"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"arrow-array",
@@ -4953,8 +4945,8 @@ dependencies = [
[[package]]
name = "lance-derive"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"proc-macro2",
"quote",
@@ -4963,8 +4955,8 @@ dependencies = [
[[package]]
name = "lance-encoding"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -4999,8 +4991,8 @@ dependencies = [
[[package]]
name = "lance-file"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-arith",
"arrow-array",
@@ -5030,8 +5022,8 @@ dependencies = [
[[package]]
name = "lance-index"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arc-swap",
"arrow",
@@ -5043,7 +5035,6 @@ dependencies = [
"async-channel",
"async-recursion",
"async-trait",
"bitpacking",
"bitvec",
"bytes",
"chrono",
@@ -5061,6 +5052,7 @@ dependencies = [
"jsonb",
"lance-arrow",
"lance-arrow-stats",
"lance-bitpacking",
"lance-core",
"lance-datafusion",
"lance-datagen",
@@ -5096,8 +5088,8 @@ dependencies = [
[[package]]
name = "lance-io"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"arrow-arith",
@@ -5138,8 +5130,8 @@ dependencies = [
[[package]]
name = "lance-linalg"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -5155,8 +5147,8 @@ dependencies = [
[[package]]
name = "lance-namespace"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"async-trait",
@@ -5168,8 +5160,8 @@ dependencies = [
[[package]]
name = "lance-namespace-impls"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"arrow-ipc",
@@ -5223,8 +5215,8 @@ dependencies = [
[[package]]
name = "lance-select"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -5239,8 +5231,8 @@ dependencies = [
[[package]]
name = "lance-table"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow",
"arrow-array",
@@ -5279,8 +5271,8 @@ dependencies = [
[[package]]
name = "lance-testing"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"arrow-array",
"arrow-schema",
@@ -5293,8 +5285,8 @@ dependencies = [
[[package]]
name = "lance-tokenizer"
version = "9.0.0-beta.8"
source = "git+https://github.com/lance-format/lance.git?rev=4acefffd5d38f88003fce681ae1d0871077ce5e7#4acefffd5d38f88003fce681ae1d0871077ce5e7"
version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [
"icu_segmenter",
"jieba-rs",
@@ -5307,7 +5299,7 @@ dependencies = [
[[package]]
name = "lancedb"
version = "0.31.0-beta.2"
version = "0.31.0-beta.4"
dependencies = [
"ahash",
"anyhow",
@@ -5384,13 +5376,14 @@ dependencies = [
"tokenizers",
"tokio",
"url",
"urlencoding",
"uuid",
"walkdir",
]
[[package]]
name = "lancedb-nodejs"
version = "0.31.0-beta.2"
version = "0.31.0-beta.4"
dependencies = [
"arrow-array",
"arrow-buffer",
@@ -5415,7 +5408,7 @@ dependencies = [
[[package]]
name = "lancedb-python"
version = "0.34.0-beta.2"
version = "0.34.0-beta.4"
dependencies = [
"arrow",
"async-trait",
@@ -5648,9 +5641,9 @@ dependencies = [
[[package]]
name = "log"
version = "0.4.32"
version = "0.4.33"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a"
checksum = "0ceec5bc11778974d1bcb055b18002eba7f4b3518b6a0081b3af5f21666da9ad"
[[package]]
name = "loom"
@@ -5958,9 +5951,9 @@ dependencies = [
[[package]]
name = "napi"
version = "3.9.3"
version = "3.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fbd9f9295f3ff5921e78a71222c3361a8216f7760b1a99a6ad4e8441de18bbb9"
checksum = "b41bda2ac390efb5e8d22025d925ccc3f3807d8c1bea6d19b36127247c4b8f83"
dependencies = [
"bitflags 2.11.1",
"chrono",
@@ -5983,9 +5976,9 @@ checksum = "c9c366d2c8c60b86fa632df75f745509b52f9128f91a6bad4c796e44abb505e1"
[[package]]
name = "napi-derive"
version = "3.5.6"
version = "3.5.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "89b3f766e04667e6da0e181e2da4f85475d5a6513b7cf6a80bea184e224a5b42"
checksum = "61d66f70256ad5aef58659966064471d0ad90e2897bc36a5a5e0389c85aabc1e"
dependencies = [
"convert_case",
"ctor 1.0.5",
@@ -5997,9 +5990,9 @@ dependencies = [
[[package]]
name = "napi-derive-backend"
version = "5.0.4"
version = "5.0.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0d5af30503edf933ce7377cf6d4c877a62b0f1107ea05585f1b5e430e88d5baf"
checksum = "81b4b08f15eed7a2a20c3f4c6314013fc3ac890a3afa9892b594485299ebdb2d"
dependencies = [
"convert_case",
"proc-macro2",
@@ -10127,9 +10120,9 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"
[[package]]
name = "uuid"
version = "1.23.3"
version = "1.23.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "144d6b123cef80b301b8f72a9e2ca4370ddec21950d0a103dd22c437006d2db7"
checksum = "bf80a72845275afea99e7f2b434723d3bc7e38470fcd1c7ed39a599c73319a53"
dependencies = [
"getrandom 0.4.2",
"js-sys",

View File

@@ -13,20 +13,20 @@ categories = ["database-implementations"]
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=9.0.0-beta.8", default-features = false, "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=9.0.0-beta.8", default-features = false, "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=9.0.0-beta.8", default-features = false, "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=9.0.0-beta.8", "rev" = "4acefffd5d38f88003fce681ae1d0871077ce5e7", "git" = "https://github.com/lance-format/lance.git" }
lance = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "58.0.0", optional = false }

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.31.0-beta.2</version>
<version>0.31.0-beta.5</version>
</dependency>
```

View File

@@ -0,0 +1,29 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / OAuthFlowType
# Enumeration: OAuthFlowType
OAuth authentication flow types.
## Enumeration Members
### AzureManagedIdentity
```ts
AzureManagedIdentity: "azure_managed_identity";
```
Azure Managed Identity via IMDS.
***
### ClientCredentials
```ts
ClientCredentials: "client_credentials";
```
Client Credentials grant (service-to-service / M2M).

View File

@@ -12,6 +12,7 @@
## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md)
- [OAuthFlowType](enumerations/OAuthFlowType.md)
- [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md)
@@ -85,6 +86,8 @@
- [ListNamespacesResponse](interfaces/ListNamespacesResponse.md)
- [LsmWriteSpec](interfaces/LsmWriteSpec.md)
- [MergeResult](interfaces/MergeResult.md)
- [NativeOAuthConfig](interfaces/NativeOAuthConfig.md)
- [OAuthConfig](interfaces/OAuthConfig.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md)
- [OptimizeStats](interfaces/OptimizeStats.md)

View File

@@ -64,6 +64,19 @@ client used by manifest-enabled native connections.
***
### oauthConfig?
```ts
optional oauthConfig: NativeOAuthConfig;
```
(For LanceDB cloud only): OAuth configuration for IdP-based
authentication (e.g., Azure Entra ID). When set, token acquisition
and refresh are handled entirely in Rust. TypeScript users should pass
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
***
### readConsistencyInterval?
```ts

View File

@@ -0,0 +1,88 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / NativeOAuthConfig
# Interface: NativeOAuthConfig
OAuth configuration for LanceDB authentication.
This is the generated napi-rs binding shape. TypeScript users should prefer
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
All token acquisition and refresh is handled in the Rust layer.
## Properties
### clientId
```ts
clientId: string;
```
Application / Client ID.
***
### clientSecret?
```ts
optional clientSecret: string;
```
Client secret (required for client_credentials).
***
### flow?
```ts
optional flow: string;
```
Authentication flow: "client_credentials" or "azure_managed_identity"
***
### issuerUrl
```ts
issuerUrl: string;
```
OIDC issuer URL or OAuth authority URL.
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
***
### managedIdentityClientId?
```ts
optional managedIdentityClientId: string;
```
Client ID for user-assigned managed identity (azure_managed_identity).
***
### refreshBufferSecs?
```ts
optional refreshBufferSecs: number;
```
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
***
### scopes
```ts
scopes: string[];
```
OAuth scopes to request. For Azure managed identity, exactly one scope
or resource is required. For example: `["api://{app_id}/.default"]`

View File

@@ -0,0 +1,111 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / OAuthConfig
# Interface: OAuthConfig
OAuth configuration for LanceDB authentication.
This is the public TypeScript OAuth configuration type. The generated
`NativeOAuthConfig` type has the same runtime shape but is an implementation
detail of the napi-rs binding.
All token acquisition and refresh is handled in the Rust layer.
This config is passed through to Rust via napi-rs.
## Examples
```typescript
const config: OAuthConfig = {
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
clientId: "app-id",
clientSecret: "secret",
scopes: ["api://lancedb-api/.default"],
};
```
```typescript
const config: OAuthConfig = {
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
clientId: "app-id",
scopes: ["api://lancedb-api/.default"],
flow: OAuthFlowType.AzureManagedIdentity,
};
```
## Properties
### clientId
```ts
clientId: string;
```
Application / Client ID.
***
### clientSecret?
```ts
optional clientSecret: string;
```
Client secret (required for ClientCredentials).
***
### flow?
```ts
optional flow: OAuthFlowType;
```
Authentication flow (default: ClientCredentials).
***
### issuerUrl
```ts
issuerUrl: string;
```
OIDC issuer URL or OAuth authority URL.
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
***
### managedIdentityClientId?
```ts
optional managedIdentityClientId: string;
```
Client ID for user-assigned managed identity (AzureManagedIdentity).
***
### refreshBufferSecs?
```ts
optional refreshBufferSecs: number;
```
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
***
### scopes
```ts
scopes: string[];
```
OAuth scopes to request.
For Azure managed identity, exactly one scope or resource is required.
For example: `["api://{app_id}/.default"]`

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.31.0-beta.2</version>
<version>0.31.0-beta.5</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.31.0-beta.2</version>
<version>0.31.0-beta.5</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>9.0.0-beta.8</lance-core.version>
<lance-core.version>9.0.0-beta.10</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.31.0-beta.2"
version = "0.31.0-beta.5"
publish = false
license.workspace = true
description.workspace = true

View File

@@ -52,6 +52,7 @@ export {
SplitHashOptions,
SplitSequentialOptions,
ShuffleOptions,
OAuthConfig as NativeOAuthConfig,
} from "./native.js";
export {
@@ -130,6 +131,8 @@ export {
TokenResponse,
} from "./header";
export { OAuthConfig, OAuthFlowType } from "./oauth";
export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
export * as embedding from "./embedding";

76
nodejs/lancedb/oauth.ts Normal file
View File

@@ -0,0 +1,76 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/**
* OAuth authentication flow types.
*/
export enum OAuthFlowType {
/** Client Credentials grant (service-to-service / M2M). */
ClientCredentials = "client_credentials",
/** Azure Managed Identity via IMDS. */
AzureManagedIdentity = "azure_managed_identity",
}
/**
* OAuth configuration for LanceDB authentication.
*
* This is the public TypeScript OAuth configuration type. The generated
* `NativeOAuthConfig` type has the same runtime shape but is an implementation
* detail of the napi-rs binding.
*
* All token acquisition and refresh is handled in the Rust layer.
* This config is passed through to Rust via napi-rs.
*
* @example Client Credentials (service-to-service):
* ```typescript
* const config: OAuthConfig = {
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
* clientId: "app-id",
* clientSecret: "secret",
* scopes: ["api://lancedb-api/.default"],
* };
* ```
*
* @example Azure Managed Identity:
* ```typescript
* const config: OAuthConfig = {
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
* clientId: "app-id",
* scopes: ["api://lancedb-api/.default"],
* flow: OAuthFlowType.AzureManagedIdentity,
* };
* ```
*/
export interface OAuthConfig {
/**
* OIDC issuer URL or OAuth authority URL.
* For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
*/
issuerUrl: string;
/** Application / Client ID. */
clientId: string;
/**
* OAuth scopes to request.
* For Azure managed identity, exactly one scope or resource is required.
* For example: `["api://{app_id}/.default"]`
*/
scopes: string[];
/** Authentication flow (default: ClientCredentials). */
flow?: OAuthFlowType;
/** Client secret (required for ClientCredentials). */
clientSecret?: string;
/** Client ID for user-assigned managed identity (AzureManagedIdentity). */
managedIdentityClientId?: string;
/**
* Seconds before expiry to trigger proactive refresh (default: 300).
* Keep this well below the token TTL; if it is greater than or equal to
* the TTL, each request refreshes the token.
*/
refreshBufferSecs?: number;
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

View File

@@ -1,12 +1,12 @@
{
"name": "@lancedb/lancedb",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.4",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "@lancedb/lancedb",
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.4",
"cpu": [
"x64",
"arm64"

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.31.0-beta.2",
"version": "0.31.0-beta.5",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",

View File

@@ -112,6 +112,12 @@ impl Connection {
builder = builder.client_config(rust_config);
if let Some(oauth_config) = options.oauth_config {
let config: lancedb::remote::oauth::OAuthConfig =
oauth_config.try_into().default_error()?;
builder = builder.oauth_config(config);
}
if let Some(api_key) = options.api_key {
builder = builder.api_key(&api_key);
}

View File

@@ -65,6 +65,11 @@ pub struct ConnectionOptions {
/// (For LanceDB cloud only): the host to use for LanceDB cloud. Used
/// for testing purposes.
pub host_override: Option<String>,
/// (For LanceDB cloud only): OAuth configuration for IdP-based
/// authentication (e.g., Azure Entra ID). When set, token acquisition
/// and refresh are handled entirely in Rust. TypeScript users should pass
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
pub oauth_config: Option<remote::OAuthConfig>,
}
#[napi(object)]

View File

@@ -3,7 +3,7 @@
use std::time::Duration;
use lancedb::{arrow::IntoArrow, ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
use lancedb::{ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
use napi::bindgen_prelude::*;
use napi_derive::napi;
@@ -66,11 +66,9 @@ impl NativeMergeInsertBuilder {
#[napi(catch_unwind)]
pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
let data = ipc_file_to_batches(buf.to_vec())
.and_then(IntoArrow::into_arrow)
.map_err(|e| {
napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
})?;
let data = ipc_file_to_batches(buf.to_vec()).map_err(|e| {
napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
})?;
let this = self.clone();

View File

@@ -3,6 +3,7 @@
use std::collections::HashMap;
use lancedb::error::Error;
use napi_derive::*;
/// Timeout configuration for remote HTTP client.
@@ -140,6 +141,84 @@ impl From<TlsConfig> for lancedb::remote::TlsConfig {
}
}
/// OAuth configuration for LanceDB authentication.
///
/// This is the generated napi-rs binding shape. TypeScript users should prefer
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
///
/// All token acquisition and refresh is handled in the Rust layer.
#[napi(object)]
#[derive(Clone)]
pub struct OAuthConfig {
/// OIDC issuer URL or OAuth authority URL.
/// For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
pub issuer_url: String,
/// Application / Client ID.
pub client_id: String,
/// OAuth scopes to request. For Azure managed identity, exactly one scope
/// or resource is required. For example: `["api://{app_id}/.default"]`
pub scopes: Vec<String>,
/// Authentication flow: "client_credentials" or "azure_managed_identity"
pub flow: Option<String>,
/// Client secret (required for client_credentials).
pub client_secret: Option<String>,
/// Client ID for user-assigned managed identity (azure_managed_identity).
pub managed_identity_client_id: Option<String>,
/// Seconds before expiry to trigger proactive refresh (default: 300).
/// Keep this well below the token TTL; if it is greater than or equal to
/// the TTL, each request refreshes the token.
pub refresh_buffer_secs: Option<u32>,
}
impl std::fmt::Debug for OAuthConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("OAuthConfig")
.field("issuer_url", &self.issuer_url)
.field("client_id", &self.client_id)
.field("scopes", &self.scopes)
.field("flow", &self.flow)
.field(
"client_secret",
&self.client_secret.as_deref().map(|_| "<redacted>"),
)
.field(
"managed_identity_client_id",
&self.managed_identity_client_id,
)
.field("refresh_buffer_secs", &self.refresh_buffer_secs)
.finish()
}
}
impl TryFrom<OAuthConfig> for lancedb::remote::oauth::OAuthConfig {
type Error = Error;
fn try_from(config: OAuthConfig) -> Result<Self, Self::Error> {
use lancedb::remote::oauth::OAuthFlow;
let flow = match config.flow.as_deref().unwrap_or("client_credentials") {
"client_credentials" => OAuthFlow::ClientCredentials,
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
client_id: config.managed_identity_client_id,
},
other => {
return Err(Error::InvalidInput {
message: format!("Unknown OAuth flow type: {other}"),
});
}
};
Ok(Self {
issuer_url: config.issuer_url,
client_id: config.client_id,
client_secret: config.client_secret,
scopes: config.scopes,
flow,
refresh_buffer_secs: config.refresh_buffer_secs.map(|v| v as u64),
})
}
}
impl From<ClientConfig> for lancedb::remote::ClientConfig {
fn from(config: ClientConfig) -> Self {
Self {
@@ -156,3 +235,45 @@ impl From<ClientConfig> for lancedb::remote::ClientConfig {
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_unknown_oauth_flow_returns_invalid_input() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: Some("typo".to_string()),
client_secret: None,
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let err = lancedb::remote::oauth::OAuthConfig::try_from(config).unwrap_err();
assert!(matches!(
err,
Error::InvalidInput { message }
if message == "Unknown OAuth flow type: typo"
));
}
#[test]
fn test_oauth_config_debug_redacts_client_secret() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: Some("client_credentials".to_string()),
client_secret: Some("super-secret".to_string()),
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let debug = format!("{config:?}");
assert!(!debug.contains("super-secret"));
assert!(debug.contains("client_secret: Some(\"<redacted>\")"));
}
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.34.0-beta.2"
current_version = "0.34.0-beta.5"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.34.0-beta.2"
version = "0.34.0-beta.5"
publish = false
edition.workspace = true
description = "Python bindings for LanceDB"

View File

@@ -89,6 +89,8 @@ def connect(
If presented, connect to LanceDB cloud.
Otherwise, connect to a database on file system or cloud storage.
Can be set via environment variable `LANCEDB_API_KEY`.
OAuth configuration is currently supported only by ``connect_async``;
synchronous LanceDB Cloud connections require an API key.
region: str, default "us-east-1"
The region to use for LanceDB Cloud.
host_override: str, optional
@@ -340,6 +342,7 @@ async def connect_async(
session: Optional[Session] = None,
manifest_enabled: bool = False,
namespace_client_properties: Optional[Dict[str, str]] = None,
oauth_config=None,
) -> AsyncConnection:
"""Connect to a LanceDB database.
@@ -389,6 +392,10 @@ async def connect_async(
namespace_client_properties : dict, optional
Additional directory namespace client properties to use with
``manifest_enabled=True``.
oauth_config : OAuthConfig, optional
OAuth configuration for LanceDB Cloud/Enterprise. This is supported by
``connect_async`` only; synchronous ``connect`` uses API key
authentication for ``db://`` URIs.
Examples
--------
@@ -435,6 +442,7 @@ async def connect_async(
session,
manifest_enabled,
namespace_client_properties,
oauth_config,
)
)

View File

@@ -280,6 +280,24 @@ async def connect(
session: Optional[Session],
manifest_enabled: bool = False,
namespace_client_properties: Optional[Dict[str, str]] = None,
oauth_config: Optional[Any] = None,
) -> Connection: ...
def connect_namespace(
namespace_client_impl: str,
namespace_client_properties: Dict[str, str],
read_consistency_interval: Optional[float] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
) -> Connection: ...
def connect_namespace_client(
namespace_client: Any,
read_consistency_interval: Optional[float] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None,
) -> Connection: ...
class RecordBatchStream:

View File

@@ -38,8 +38,11 @@ from lance_namespace_urllib3_client.models.query_table_request_vector import (
QueryTableRequestVector,
)
from lance_namespace_urllib3_client.models.string_fts_query import StringFtsQuery
from lance_namespace.errors import TableNotFoundError
from lancedb._lancedb import connect_namespace_client as _connect_namespace_client
from lance_namespace.errors import NamespaceNotEmptyError, TableNotFoundError
from lancedb._lancedb import (
connect_namespace as _connect_namespace,
connect_namespace_client as _connect_namespace_client,
)
from lancedb.background_loop import LOOP
from lancedb.db import AsyncConnection, DBConnection
from lancedb.namespace_utils import (
@@ -373,6 +376,23 @@ def _convert_pyarrow_schema_to_json(schema: pa.Schema) -> JsonArrowSchema:
return JsonArrowSchema(fields=fields, metadata=meta)
def _builds_namespace_natively(
namespace_client_impl: Optional[str],
namespace_client_properties: Optional[Dict[str, str]],
) -> bool:
"""Whether ``connect_namespace_client`` builds the namespace client natively
in Rust (installing the read-freshness context provider) rather than wrapping
the pre-built Python client.
Must mirror Rust ``build_namespace_natively`` in ``python/src/connection.rs``.
"""
return namespace_client_impl == "rest" and bool(namespace_client_properties)
def _supports_native_sync_namespace(namespace_client_impl: str) -> bool:
return namespace_client_impl in {"dir", "rest"}
class LanceNamespaceDBConnection(DBConnection):
"""
A LanceDB connection that uses a namespace for table management.
@@ -383,7 +403,7 @@ class LanceNamespaceDBConnection(DBConnection):
def __init__(
self,
namespace_client: LanceNamespace,
namespace_client: Optional[LanceNamespace] = None,
*,
read_consistency_interval: Optional[timedelta] = None,
storage_options: Optional[Dict[str, str]] = None,
@@ -391,6 +411,8 @@ class LanceNamespaceDBConnection(DBConnection):
namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None,
_inner: Optional[AsyncConnection] = None,
_route_pushdown_to_rust: Optional[bool] = None,
):
"""
Initialize a namespace-based LanceDB connection.
@@ -432,23 +454,40 @@ class LanceNamespaceDBConnection(DBConnection):
)
self._namespace_client_impl = namespace_client_impl
self._namespace_client_properties = namespace_client_properties
self._inner = AsyncConnection(
_connect_namespace_client(
namespace_client,
read_consistency_interval=(
read_consistency_interval.total_seconds()
if read_consistency_interval is not None
else None
),
storage_options=self.storage_options or None,
session=session,
namespace_client_pushdown_operations=(
list(self._namespace_client_pushdown_operations)
),
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
# When the namespace client is built natively (see Rust
# ``build_namespace_natively``), the underlying Rust table performs
# QueryTable pushdown through the read-freshness context provider, which
# the pure-Python ``query_table`` path bypasses.
self._route_pushdown_to_rust = (
_route_pushdown_to_rust
if _route_pushdown_to_rust is not None
else _builds_namespace_natively(
namespace_client_impl, namespace_client_properties
)
)
if _inner is not None:
self._inner = _inner
else:
if namespace_client is None:
raise ValueError("namespace_client is required without a native _inner")
self._inner = AsyncConnection(
_connect_namespace_client(
namespace_client,
read_consistency_interval=(
read_consistency_interval.total_seconds()
if read_consistency_interval is not None
else None
),
storage_options=self.storage_options or None,
session=session,
namespace_client_pushdown_operations=(
list(self._namespace_client_pushdown_operations)
),
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)
)
self._uri = self._inner.uri
@override
def serialize(self) -> str:
@@ -494,11 +533,11 @@ class LanceNamespaceDBConnection(DBConnection):
)
if namespace_path is None:
namespace_path = []
request = ListTablesRequest(
id=namespace_path, page_token=page_token, limit=limit
return LOOP.run(
self._inner.table_names(
namespace_path=namespace_path, start_after=page_token, limit=limit
)
)
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
@override
def create_table(
@@ -543,6 +582,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace_path=namespace_path,
namespace_client=self._namespace_client,
pushdown_operations=self._namespace_client_pushdown_operations,
route_pushdown_to_rust=self._route_pushdown_to_rust,
_async=async_table,
)
@@ -568,8 +608,8 @@ class LanceNamespaceDBConnection(DBConnection):
index_cache_size=index_cache_size,
)
)
except RuntimeError as e:
if "Table not found" in str(e):
except (RuntimeError, ValueError) as e:
if "Table not found" in str(e) or "was not found" in str(e):
table_id = namespace_path + [name]
raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}")
raise
@@ -580,6 +620,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace_path=namespace_path,
namespace_client=self._namespace_client,
pushdown_operations=self._namespace_client_pushdown_operations,
route_pushdown_to_rust=self._route_pushdown_to_rust,
_async=async_table,
)
if branch is not None:
@@ -590,12 +631,9 @@ class LanceNamespaceDBConnection(DBConnection):
@override
def drop_table(self, name: str, namespace_path: Optional[List[str]] = None):
# Use namespace drop_table directly
if namespace_path is None:
namespace_path = []
table_id = namespace_path + [name]
request = DropTableRequest(id=table_id)
self._namespace_client.drop_table(request)
LOOP.run(self._inner.drop_table(name, namespace_path=namespace_path))
@override
def rename_table(
@@ -609,14 +647,19 @@ class LanceNamespaceDBConnection(DBConnection):
cur_namespace_path = []
if new_namespace_path is None:
new_namespace_path = []
cur_table_id = cur_namespace_path + [cur_name]
new_namespace_id = new_namespace_path if new_namespace_path else None
request = RenameTableRequest(
id=cur_table_id,
new_table_name=new_name,
new_namespace_id=new_namespace_id,
)
self._namespace_client.rename_table(request)
try:
LOOP.run(
self._inner.rename_table(
cur_name,
new_name,
cur_namespace_path=cur_namespace_path,
new_namespace_path=new_namespace_path,
)
)
except RuntimeError as e:
if "rename_table not implemented" in str(e):
raise NotImplementedError("rename_table not implemented") from e
raise
@override
def drop_database(self):
@@ -628,8 +671,7 @@ class LanceNamespaceDBConnection(DBConnection):
def drop_all_tables(self, namespace_path: Optional[List[str]] = None):
if namespace_path is None:
namespace_path = []
for table_name in self.table_names(namespace_path=namespace_path):
self.drop_table(table_name, namespace_path=namespace_path)
LOOP.run(self._inner.drop_all_tables(namespace_path=namespace_path))
@override
def list_namespaces(
@@ -659,13 +701,10 @@ class LanceNamespaceDBConnection(DBConnection):
"""
if namespace_path is None:
namespace_path = []
request = ListNamespacesRequest(
id=namespace_path, page_token=page_token, limit=limit
)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
return LOOP.run(
self._inner.list_namespaces(
namespace_path=namespace_path, page_token=page_token, limit=limit
)
)
@override
@@ -693,14 +732,12 @@ class LanceNamespaceDBConnection(DBConnection):
CreateNamespaceResponse
Response containing the properties of the created namespace.
"""
request = CreateNamespaceRequest(
id=namespace_path,
mode=_normalize_create_namespace_mode(mode),
properties=properties,
)
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
return LOOP.run(
self._inner.create_namespace(
namespace_path=namespace_path,
mode=mode,
properties=properties,
)
)
@override
@@ -728,20 +765,18 @@ class LanceNamespaceDBConnection(DBConnection):
DropNamespaceResponse
Response containing properties and transaction_id if applicable.
"""
request = DropNamespaceRequest(
id=namespace_path,
mode=_normalize_drop_namespace_mode(mode),
behavior=_normalize_drop_namespace_behavior(behavior),
)
response = self._namespace_client.drop_namespace(request)
return DropNamespaceResponse(
properties=(
response.properties if hasattr(response, "properties") else None
),
transaction_id=(
response.transaction_id if hasattr(response, "transaction_id") else None
),
)
try:
return LOOP.run(
self._inner.drop_namespace(
namespace_path=namespace_path,
mode=mode,
behavior=behavior,
)
)
except RuntimeError as e:
if "Namespace not empty" in str(e):
raise NamespaceNotEmptyError(str(e)) from e
raise
@override
def describe_namespace(
@@ -760,11 +795,7 @@ class LanceNamespaceDBConnection(DBConnection):
DescribeNamespaceResponse
Response containing the namespace properties.
"""
request = DescribeNamespaceRequest(id=namespace_path)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
return LOOP.run(self._inner.describe_namespace(namespace_path))
@override
def list_tables(
@@ -794,13 +825,10 @@ class LanceNamespaceDBConnection(DBConnection):
"""
if namespace_path is None:
namespace_path = []
request = ListTablesRequest(
id=namespace_path, page_token=page_token, limit=limit
)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
return LOOP.run(
self._inner.list_tables(
namespace_path=namespace_path, page_token=page_token, limit=limit
)
)
def _lance_table_from_uri(
@@ -856,6 +884,18 @@ class LanceNamespaceDBConnection(DBConnection):
LanceNamespace
The namespace client for this connection.
"""
if self._namespace_client is None:
if (
self._namespace_client_impl is None
or self._namespace_client_properties is None
):
raise ValueError(
"Cannot construct a Python namespace client without "
"namespace implementation properties"
)
self._namespace_client = namespace_connect(
self._namespace_client_impl, self._namespace_client_properties
)
return self._namespace_client
@@ -875,6 +915,8 @@ class AsyncLanceNamespaceDBConnection:
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None,
):
"""
Initialize an async namespace-based LanceDB connection.
@@ -900,6 +942,12 @@ class AsyncLanceNamespaceDBConnection:
namespace.create_table() instead of using declare_table + local write.
Default is None (no pushdown, all operations run locally).
namespace_client_impl : Optional[str]
The namespace implementation name used to create this connection.
Required (with ``namespace_client_properties``) for the Rust client to
be built natively and install the read-freshness provider.
namespace_client_properties : Optional[Dict[str, str]]
The namespace properties used to create this connection.
"""
self._namespace_client = namespace_client
self.read_consistency_interval = read_consistency_interval
@@ -908,6 +956,14 @@ class AsyncLanceNamespaceDBConnection:
self._namespace_client_pushdown_operations = set(
namespace_client_pushdown_operations or []
)
self._namespace_client_impl = namespace_client_impl
self._namespace_client_properties = namespace_client_properties
# See LanceNamespaceDBConnection: when built natively the Rust table runs
# QueryTable pushdown through the read-freshness provider, so defer to it
# rather than the urllib3 client (which omits x-lancedb-min-timestamp).
self._route_pushdown_to_rust = _builds_namespace_natively(
namespace_client_impl, namespace_client_properties
)
self._inner = AsyncConnection(
_connect_namespace_client(
namespace_client,
@@ -921,8 +977,8 @@ class AsyncLanceNamespaceDBConnection:
namespace_client_pushdown_operations=(
list(self._namespace_client_pushdown_operations)
),
namespace_client_impl=None,
namespace_client_properties=None,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)
)
@@ -992,6 +1048,7 @@ class AsyncLanceNamespaceDBConnection:
namespace_path=namespace_path,
namespace_client=self._namespace_client,
pushdown_operations=self._namespace_client_pushdown_operations,
route_pushdown_to_rust=self._route_pushdown_to_rust,
)
async def open_table(
@@ -1029,6 +1086,7 @@ class AsyncLanceNamespaceDBConnection:
namespace_path=namespace_path,
namespace_client=self._namespace_client,
pushdown_operations=self._namespace_client_pushdown_operations,
route_pushdown_to_rust=self._route_pushdown_to_rust,
)
async def drop_table(self, name: str, namespace_path: Optional[List[str]] = None):
@@ -1302,6 +1360,33 @@ def connect_namespace(
LanceNamespaceDBConnection
A namespace-based connection to LanceDB
"""
if _supports_native_sync_namespace(namespace_client_impl):
inner = AsyncConnection(
_connect_namespace(
namespace_client_impl,
namespace_client_properties,
read_consistency_interval=(
read_consistency_interval.total_seconds()
if read_consistency_interval is not None
else None
),
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
)
)
return LanceNamespaceDBConnection(
namespace_client=None,
read_consistency_interval=read_consistency_interval,
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
_inner=inner,
_route_pushdown_to_rust=True,
)
namespace_client = namespace_connect(
namespace_client_impl, namespace_client_properties
)
@@ -1387,4 +1472,6 @@ def connect_namespace_async(
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)

View File

@@ -48,6 +48,14 @@ class PermutationBuilder:
By default, the permutation builder will create a single split that contains all
rows in the same order as the base table.
"""
if not hasattr(table, "_inner"):
raise TypeError(
f"PermutationBuilder requires a local LanceTable, "
f"got {type(table).__name__}. "
"The permutation API is not supported on remote tables. "
"Remote tables connect to LanceDB Cloud or Enterprise and do not have "
"direct access to the underlying Lance dataset needed for permutations."
)
self._async = async_permutation_builder(table)
def split_random(

View File

@@ -9,6 +9,7 @@ from typing import List, Optional
from lancedb import __version__
from .header import HeaderProvider
from .oauth import OAuthConfig, OAuthFlowType
__all__ = [
"TimeoutConfig",
@@ -16,6 +17,8 @@ __all__ = [
"TlsConfig",
"ClientConfig",
"HeaderProvider",
"OAuthConfig",
"OAuthFlowType",
]

View File

@@ -0,0 +1,75 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional
class OAuthFlowType(str, Enum):
"""OAuth authentication flow types."""
CLIENT_CREDENTIALS = "client_credentials"
"""Client Credentials grant (service-to-service / M2M)."""
AZURE_MANAGED_IDENTITY = "azure_managed_identity"
"""Azure Managed Identity via IMDS."""
@dataclass
class OAuthConfig:
"""OAuth configuration for LanceDB authentication.
All token acquisition and refresh is handled in the Rust layer.
This config is passed through to Rust via PyO3.
Parameters
----------
issuer_url : str
OIDC issuer URL or OAuth authority URL.
For Azure: ``https://login.microsoftonline.com/{tenant_id}/v2.0``
client_id : str
Application / Client ID.
scopes : List[str]
OAuth scopes to request.
For Azure managed identity, exactly one scope or resource is required.
For example: ``["api://{app_id}/.default"]``
flow : OAuthFlowType
Authentication flow to use. Default: CLIENT_CREDENTIALS.
client_secret : Optional[str]
Client secret (required for CLIENT_CREDENTIALS).
managed_identity_client_id : Optional[str]
Client ID for user-assigned managed identity (AZURE_MANAGED_IDENTITY).
refresh_buffer_secs : Optional[int]
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
Examples
--------
Client Credentials (service-to-service):
>>> config = OAuthConfig(
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
... client_id="app-id",
... client_secret="secret",
... scopes=["api://lancedb-api/.default"],
... )
Azure Managed Identity:
>>> config = OAuthConfig(
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
... client_id="app-id",
... scopes=["api://lancedb-api/.default"],
... flow=OAuthFlowType.AZURE_MANAGED_IDENTITY,
... )
"""
issuer_url: str
client_id: str
scopes: List[str]
flow: OAuthFlowType = OAuthFlowType.CLIENT_CREDENTIALS
client_secret: Optional[str] = field(default=None, repr=False)
managed_identity_client_id: Optional[str] = None
refresh_buffer_secs: Optional[int] = None

View File

@@ -2022,6 +2022,7 @@ class LanceTable(Table):
namespace_client: Optional[Any] = None,
managed_versioning: Optional[bool] = None,
pushdown_operations: Optional[set] = None,
route_pushdown_to_rust: bool = False,
_async: AsyncTable = None,
):
if namespace_path is None:
@@ -2031,6 +2032,14 @@ class LanceTable(Table):
self._location = location # Store location for use in _dataset_path
self._namespace_client = namespace_client
self._pushdown_operations = pushdown_operations or set()
# When the connection built the namespace client natively (e.g. an
# enterprise "rest" connection), the underlying Rust table already
# executes QueryTable pushdown itself -- and, unlike this Python urllib3
# path, it routes through the read-freshness context provider that emits
# the ``x-lancedb-min-timestamp`` header. So we must defer pushdown to
# Rust instead of calling the Python ``namespace_client.query_table``
# directly, or reads silently bypass read-freshness (stale results).
self._route_pushdown_to_rust = route_pushdown_to_rust
if _async is not None:
self._table = _async
else:
@@ -2133,12 +2142,19 @@ class LanceTable(Table):
branch = self.current_branch()
version = None if branch is not None else self.version
if self._namespace_client is not None:
namespace_client = self._namespace_client
if namespace_client is None:
conn_uri = getattr(self._conn, "uri", "")
if get_uri_scheme(conn_uri) == "namespace":
namespace_client = self._conn.namespace_client()
self._namespace_client = namespace_client
if namespace_client is not None:
table_id = self._namespace_path + [self.name]
ds = lance.dataset(
version=version,
storage_options=self._conn.storage_options,
namespace_client=self._namespace_client,
namespace_client=namespace_client,
table_id=table_id,
**kwargs,
)
@@ -2241,6 +2257,7 @@ class LanceTable(Table):
namespace_path=self._namespace_path,
namespace_client=self._namespace_client,
pushdown_operations=self._pushdown_operations,
route_pushdown_to_rust=self._route_pushdown_to_rust,
location=self._location,
_async=async_table,
)
@@ -2391,8 +2408,11 @@ class LanceTable(Table):
Returns
-------
pa.Table"""
if _should_push_down_query_table(
self._namespace_client, self._pushdown_operations
if (
_should_push_down_query_table(
self._namespace_client, self._pushdown_operations
)
and not self._route_pushdown_to_rust
):
return self._execute_query(Query()).read_all()
@@ -3344,6 +3364,7 @@ class LanceTable(Table):
location: Optional[str] = None,
namespace_client: Optional[Any] = None,
pushdown_operations: Optional[set] = None,
route_pushdown_to_rust: bool = False,
):
"""
Create a new table.
@@ -3406,6 +3427,7 @@ class LanceTable(Table):
self._location = location
self._namespace_client = namespace_client
self._pushdown_operations = pushdown_operations or set()
self._route_pushdown_to_rust = route_pushdown_to_rust
if data_storage_version is not None:
warnings.warn(
@@ -3519,6 +3541,7 @@ class LanceTable(Table):
_should_push_down_query_table(
self._namespace_client, self._pushdown_operations
)
and not self._route_pushdown_to_rust
and self.current_branch() is None
):
from lancedb.namespace import _execute_server_side_query
@@ -4260,6 +4283,7 @@ class AsyncTable:
namespace_path: Optional[List[str]] = None,
namespace_client: Optional[Any] = None,
pushdown_operations: Optional[set] = None,
route_pushdown_to_rust: bool = False,
):
"""Create a new AsyncTable object.
@@ -4272,6 +4296,9 @@ class AsyncTable:
self._namespace_path = namespace_path or []
self._namespace_client = namespace_client
self._pushdown_operations = pushdown_operations or set()
# See LanceTable.__init__: defer QueryTable pushdown to Rust (which emits
# the read-freshness header) for natively-built namespace clients.
self._route_pushdown_to_rust = route_pushdown_to_rust
def _set_namespace_context(
self,
@@ -4279,10 +4306,12 @@ class AsyncTable:
namespace_path: Optional[List[str]] = None,
namespace_client: Optional[Any] = None,
pushdown_operations: Optional[set] = None,
route_pushdown_to_rust: bool = False,
) -> "AsyncTable":
self._namespace_path = namespace_path or []
self._namespace_client = namespace_client
self._pushdown_operations = pushdown_operations or set()
self._route_pushdown_to_rust = route_pushdown_to_rust
return self
def __repr__(self):
@@ -4492,8 +4521,11 @@ class AsyncTable:
-------
pa.Table
"""
if _should_push_down_query_table(
self._namespace_client, self._pushdown_operations
if (
_should_push_down_query_table(
self._namespace_client, self._pushdown_operations
)
and not self._route_pushdown_to_rust
):
return (await self._execute_query(Query())).read_all()
@@ -5177,8 +5209,11 @@ class AsyncTable:
batch_size: Optional[int] = None,
timeout: Optional[timedelta] = None,
) -> pa.RecordBatchReader:
if _should_push_down_query_table(
self._namespace_client, self._pushdown_operations
if (
_should_push_down_query_table(
self._namespace_client, self._pushdown_operations
)
and not self._route_pushdown_to_rust
):
from lancedb.namespace import _execute_server_side_query

View File

@@ -5,6 +5,7 @@
import tempfile
import shutil
import importlib
import pytest
import pyarrow as pa
import lancedb
@@ -65,6 +66,9 @@ def _namespace_lance_table(namespace_client: _NamespaceClient) -> LanceTable:
table._namespace_path = ["geneva"]
table._namespace_client = namespace_client
table._pushdown_operations = {"QueryTable"}
# This test exercises the Python-side pushdown path (non-native client), so
# pushdown is not routed to Rust.
table._route_pushdown_to_rust = False
return table
@@ -100,6 +104,40 @@ class TestNamespaceConnection:
assert isinstance(db, lancedb.LanceNamespaceDBConnection)
assert len(list(db.table_names())) == 0
def test_sync_builtin_namespace_uses_rust_without_python_client(self, monkeypatch):
"""Built-in sync namespace connections should not construct or call the
Python namespace client for normal namespace/table management."""
namespace_module = importlib.import_module("lancedb.namespace")
def fail_namespace_connect(*args, **kwargs):
raise AssertionError("Python namespace client should not be constructed")
monkeypatch.setattr(
namespace_module, "namespace_connect", fail_namespace_connect
)
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
assert isinstance(db, lancedb.LanceNamespaceDBConnection)
assert db._namespace_client is None
assert db._route_pushdown_to_rust is True
db.create_namespace(["test_ns"])
assert "test_ns" in db.list_namespaces().namespaces
schema = pa.schema([pa.field("id", pa.int64())])
table = db.create_table("test_table", schema=schema, namespace_path=["test_ns"])
assert table.namespace == ["test_ns"]
assert "test_table" in db.table_names(namespace_path=["test_ns"])
assert "test_table" in db.list_tables(namespace_path=["test_ns"]).tables
opened = db.open_table("test_table", namespace_path=["test_ns"])
assert opened.namespace == ["test_ns"]
db.drop_table("test_table", namespace_path=["test_ns"])
assert db.list_tables(namespace_path=["test_ns"]).tables == []
db.drop_namespace(["test_ns"])
assert "test_ns" not in db.list_namespaces().namespaces
def test_create_table_through_namespace(self):
"""Test creating a table through namespace."""
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
@@ -805,6 +843,38 @@ class TestPushdownOperations:
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
assert len(db._namespace_client_pushdown_operations) == 0
def test_route_pushdown_to_rust_for_native_rest(self):
"""A natively-built rest connection must defer QueryTable pushdown to
Rust so reads carry the x-lancedb-min-timestamp read-freshness header."""
db = lancedb.connect_namespace(
"rest",
{"uri": "http://localhost:12345"},
namespace_client_pushdown_operations=["QueryTable"],
)
assert db._route_pushdown_to_rust is True
def test_route_pushdown_to_rust_for_native_dir(self):
"""The sync dir connection is natively built and defers QueryTable
pushdown to Rust."""
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
assert db._route_pushdown_to_rust is True
def test_async_route_pushdown_to_rust_for_native_rest(self):
"""The async connection must not silently bypass the read-freshness fix:
a natively-built rest connection defers pushdown to Rust (regression test
for the async path omitting the freshness header)."""
db = lancedb.connect_namespace_async(
"rest",
{"uri": "http://localhost:12345"},
namespace_client_pushdown_operations=["QueryTable"],
)
assert db._route_pushdown_to_rust is True
def test_async_route_pushdown_to_rust_false_for_dir(self):
"""The async non-native (dir) connection keeps the Python pushdown path."""
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
assert db._route_pushdown_to_rust is False
def test_lance_table_to_arrow_uses_query_pushdown(self):
namespace_client = _NamespaceClient()
table = _namespace_lance_table(namespace_client)

View File

@@ -1137,6 +1137,16 @@ def test_namespace_open_table_with_branch_version(tmp_path):
assert db.open_table("t", namespace_path=["ns1"], branch="exp").count_rows() == 3
def test_namespace_root_table_to_lance_uses_namespace_client(tmp_path):
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace
db = lancedb.connect_namespace("dir", {"root": str(tmp_path)})
table = db.create_table("t", [{"i": 0}])
assert table._namespace_client is None
assert table.to_lance().count_rows() == 1
assert table._namespace_client is not None
@pytest.mark.asyncio
async def test_async_namespace_open_table_with_branch_version(tmp_path):
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace

View File

@@ -539,7 +539,7 @@ impl Connection {
}
#[pyfunction]
#[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None))]
#[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None, oauth_config=None))]
#[allow(clippy::too_many_arguments)]
pub fn connect(
py: Python<'_>,
@@ -553,6 +553,7 @@ pub fn connect(
session: Option<crate::session::Session>,
manifest_enabled: bool,
namespace_client_properties: Option<HashMap<String, String>>,
oauth_config: Option<crate::oauth::PyOAuthConfig>,
) -> PyResult<Bound<'_, PyAny>> {
future_into_py(py, async move {
let mut builder = lancedb::connect(&uri);
@@ -582,6 +583,11 @@ pub fn connect(
if let Some(client_config) = client_config {
builder = builder.client_config(client_config.into());
}
if let Some(oauth_config) = oauth_config {
let config: lancedb::remote::oauth::OAuthConfig =
oauth_config.try_into().infer_error()?;
builder = builder.oauth_config(config);
}
if let Some(session) = session {
builder = builder.session(session.inner.clone());
}
@@ -610,24 +616,38 @@ pub fn connect_namespace_client(
namespace_client_impl: Option<String>,
namespace_client_properties: Option<HashMap<String, String>>,
) -> PyResult<Connection> {
let namespace_client = extract_namespace_arc(py, namespace_client)?;
let read_consistency_interval = read_consistency_interval.map(Duration::from_secs_f64);
let namespace_client_pushdown_operations =
parse_namespace_client_pushdown_operations(namespace_client_pushdown_operations)?;
let ns_impl = namespace_client_impl.unwrap_or_else(|| "python".to_string());
let ns_properties = namespace_client_properties.unwrap_or_default();
let storage_options = storage_options.unwrap_or_default();
let session = session.map(|s| s.inner.clone());
let database = LanceNamespaceDatabase::from_namespace_client(
namespace_client,
ns_impl,
ns_properties,
storage_options,
read_consistency_interval,
session,
namespace_client_pushdown_operations,
);
// Prefer building the namespace natively from (impl, properties) so the
// read-freshness provider installed
let database = if build_namespace_natively(namespace_client_impl.as_deref(), &ns_properties) {
let ns_impl = namespace_client_impl.expect("impl present per build_namespace_natively");
crate::runtime::block_on(LanceNamespaceDatabase::connect(
&ns_impl,
ns_properties,
storage_options,
read_consistency_interval,
session,
namespace_client_pushdown_operations,
))
.infer_error()?
} else {
let namespace_client = extract_namespace_arc(py, namespace_client)?;
LanceNamespaceDatabase::from_namespace_client(
namespace_client,
namespace_client_impl.unwrap_or_else(|| "python".to_string()),
ns_properties,
storage_options,
read_consistency_interval,
session,
namespace_client_pushdown_operations,
)
};
Ok(Connection::new(LanceConnection::new(
Arc::new(database),
@@ -635,6 +655,56 @@ pub fn connect_namespace_client(
)))
}
#[pyfunction]
#[pyo3(signature = (
namespace_client_impl,
namespace_client_properties,
read_consistency_interval=None,
storage_options=None,
session=None,
namespace_client_pushdown_operations=None,
))]
#[allow(clippy::too_many_arguments)]
pub fn connect_namespace(
namespace_client_impl: String,
namespace_client_properties: HashMap<String, String>,
read_consistency_interval: Option<f64>,
storage_options: Option<HashMap<String, String>>,
session: Option<crate::session::Session>,
namespace_client_pushdown_operations: Option<Vec<String>>,
) -> PyResult<Connection> {
let read_consistency_interval = read_consistency_interval.map(Duration::from_secs_f64);
let namespace_client_pushdown_operations =
parse_namespace_client_pushdown_operations(namespace_client_pushdown_operations)?;
let mut builder =
lancedb::connect_namespace(&namespace_client_impl, namespace_client_properties)
.pushdown_operations(namespace_client_pushdown_operations);
if let Some(storage_options) = storage_options {
builder = builder.storage_options(storage_options);
}
if let Some(read_consistency_interval) = read_consistency_interval {
builder = builder.read_consistency_interval(read_consistency_interval);
}
if let Some(session) = session {
builder = builder.session(session.inner.clone());
}
Ok(Connection::new(
crate::runtime::block_on(builder.execute()).infer_error()?,
))
}
/// Whether to build the namespace natively (from impl + properties) instead of
/// wrapping a pre-built client. Native construction is required for the
/// read-freshness provider to be installed
fn build_namespace_natively(
namespace_client_impl: Option<&str>,
namespace_client_properties: &HashMap<String, String>,
) -> bool {
matches!(namespace_client_impl, Some("rest")) && !namespace_client_properties.is_empty()
}
#[derive(FromPyObject)]
pub struct PyClientConfig {
user_agent: String,
@@ -733,3 +803,36 @@ impl From<PyClientConfig> for lancedb::remote::ClientConfig {
}
}
}
#[cfg(test)]
mod tests {
use super::*;
fn props(pairs: &[(&str, &str)]) -> HashMap<String, String> {
pairs
.iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect()
}
#[test]
fn native_build_only_for_rest_with_properties() {
let rest = props(&[("uri", "http://localhost:10024")]);
// rest + non-empty properties -> build natively (installs the
// read-freshness provider so checkout_latest() busts the server cache).
assert!(build_namespace_natively(Some("rest"), &rest));
// dir is local (no server cache) -> wrap the pre-built client unchanged.
assert!(!build_namespace_natively(
Some("dir"),
&props(&[("root", "/tmp")])
));
// No impl: only a pre-built client was handed in -> wrap it as-is.
assert!(!build_namespace_natively(None, &rest));
// rest but no properties: nothing to build a connection from -> wrap.
assert!(!build_namespace_natively(Some("rest"), &HashMap::new()));
}
}

View File

@@ -2,7 +2,7 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use arrow::RecordBatchStream;
use connection::{Connection, connect, connect_namespace_client};
use connection::{Connection, connect, connect_namespace, connect_namespace_client};
use env_logger::Env;
use expr::{PyExpr, expr_col, expr_func, expr_lit};
use index::IndexConfig;
@@ -26,6 +26,7 @@ pub mod expr;
pub mod header;
pub mod index;
pub mod namespace;
pub mod oauth;
pub mod permutation;
pub mod query;
pub mod runtime;
@@ -61,6 +62,7 @@ pub fn _lancedb(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PyPermutationReader>()?;
m.add_class::<PyExpr>()?;
m.add_function(wrap_pyfunction!(connect, m)?)?;
m.add_function(wrap_pyfunction!(connect_namespace, m)?)?;
m.add_function(wrap_pyfunction!(connect_namespace_client, m)?)?;
m.add_function(wrap_pyfunction!(permutation::async_permutation_builder, m)?)?;
m.add_function(wrap_pyfunction!(util::validate_table_name, m)?)?;

72
python/src/oauth.rs Normal file
View File

@@ -0,0 +1,72 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use pyo3::FromPyObject;
use lancedb::error::Error;
use lancedb::remote::oauth::{OAuthConfig, OAuthFlow};
/// Python-side OAuth configuration, extracted via FromPyObject.
/// Maps to `lancedb.remote.oauth.OAuthConfig` Python dataclass.
#[derive(FromPyObject)]
pub struct PyOAuthConfig {
pub issuer_url: String,
pub client_id: String,
pub scopes: Vec<String>,
pub flow: String,
pub client_secret: Option<String>,
pub managed_identity_client_id: Option<String>,
pub refresh_buffer_secs: Option<u64>,
}
impl TryFrom<PyOAuthConfig> for OAuthConfig {
type Error = Error;
fn try_from(py: PyOAuthConfig) -> Result<Self, Self::Error> {
let flow = match py.flow.as_str() {
"client_credentials" => OAuthFlow::ClientCredentials,
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
client_id: py.managed_identity_client_id,
},
other => {
return Err(Error::InvalidInput {
message: format!("Unknown OAuth flow type: {other}"),
});
}
};
Ok(Self {
issuer_url: py.issuer_url,
client_id: py.client_id,
client_secret: py.client_secret,
scopes: py.scopes,
flow,
refresh_buffer_secs: py.refresh_buffer_secs,
})
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_unknown_oauth_flow_returns_invalid_input() {
let config = PyOAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: "typo".to_string(),
client_secret: None,
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let err = OAuthConfig::try_from(config).unwrap_err();
assert!(matches!(
err,
Error::InvalidInput { message }
if message == "Unknown OAuth flow type: typo"
));
}
}

View File

@@ -56,6 +56,15 @@ fn get_runtime() -> &'static runtime::Runtime {
unsafe { &*new_ptr }
}
/// Block the current thread on a future using the shared runtime.
///
/// For sync `#[pyfunction]`s that need to drive an async operation (e.g.
/// building a namespace client). Must not be called from within the runtime's
/// own worker threads.
pub fn block_on<F: std::future::Future>(fut: F) -> F::Output {
get_runtime().block_on(fut)
}
/// Runs in async-signal context after `fork()` in the child. We can only
/// touch atomics here; we deliberately leak the previous runtime because
/// dropping a tokio `Runtime` would try to join its (now-dead) worker

View File

@@ -0,0 +1,33 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
import importlib.util
import sys
from pathlib import Path
def _load_oauth_module():
oauth_path = (
Path(__file__).parents[1] / "python" / "lancedb" / "remote" / "oauth.py"
)
spec = importlib.util.spec_from_file_location("lancedb_remote_oauth", oauth_path)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
def test_oauth_config_repr_redacts_client_secret():
oauth = _load_oauth_module()
config = oauth.OAuthConfig(
issuer_url="https://issuer.example.com",
client_id="client-id",
scopes=["scope"],
client_secret="super-secret",
)
rendered = repr(config)
assert "super-secret" not in rendered
assert "client_secret" not in rendered

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.31.0-beta.2"
version = "0.31.0-beta.5"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true
@@ -50,7 +50,7 @@ lance-namespace = { workspace = true }
lance-namespace-impls = { workspace = true }
moka = { workspace = true }
pin-project = { workspace = true }
tokio = { version = "1.23", features = ["rt-multi-thread"] }
tokio = { version = "1.23", features = ["rt-multi-thread", "sync"] }
log.workspace = true
async-trait = "0"
bytes = "1"
@@ -75,6 +75,7 @@ reqwest = { version = "0.12.0", default-features = false, features = [
"stream",
], optional = true }
http = { version = "1", optional = true } # Matching what is in reqwest
urlencoding = { version = "2", optional = true }
uuid = { version = "1.7.0", features = ["v4", "v5"] }
polars-arrow = { version = ">=0.37,<0.40.0", optional = true }
polars = { version = ">=0.37,<0.40.0", optional = true }
@@ -93,6 +94,7 @@ semver = { workspace = true }
anyhow = "1"
tempfile = "3.5.0"
random_word = { version = "0.4.3", features = ["en"] }
tokio = { version = "1.23", features = ["io-util", "macros", "net", "rt-multi-thread", "sync"] }
uuid = { version = "1.7.0", features = ["v4"] }
walkdir = "2"
aws-sdk-dynamodb = { version = "1.55.0" }
@@ -129,7 +131,13 @@ huggingface = [
"lance-namespace-impls/dir-huggingface",
]
dynamodb = ["lance/dynamodb", "aws"]
remote = ["dep:reqwest", "dep:http", "lance-namespace-impls/rest", "lance-namespace-impls/rest-adapter"]
remote = [
"dep:reqwest",
"dep:http",
"dep:urlencoding",
"lance-namespace-impls/rest",
"lance-namespace-impls/rest-adapter",
]
fp16kernels = ["lance-linalg/fp16kernels"]
s3-test = []
bedrock = ["dep:aws-sdk-bedrockruntime"]
@@ -158,6 +166,10 @@ required-features = ["bedrock"]
[[example]]
name = "simple"
[[example]]
name = "polars"
required-features = ["polars"]
[[example]]
name = "full_text_search"

View File

@@ -0,0 +1,47 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
//! This example demonstrates ingesting a Polars DataFrame into LanceDB and
//! reading it back out as a Polars DataFrame.
use lancedb::arrow::IntoPolars;
use lancedb::query::ExecutableQuery;
use lancedb::{Result, connect};
use polars::prelude::{DataFrame, NamedFrom, Series};
fn make_dataframe() -> DataFrame {
let ids = Series::new("id", &[1i32, 2, 3, 4, 5]);
let names = Series::new("name", &["Alice", "Bob", "Carol", "Dave", "Eve"]);
let scores = Series::new("score", &[9.5f64, 8.1, 7.3, 9.0, 6.5]);
DataFrame::new(vec![ids, names, scores]).unwrap()
}
#[tokio::main]
async fn main() -> Result<()> {
let tmp = tempfile::tempdir().unwrap();
let db = connect(tmp.path().to_str().unwrap()).execute().await?;
// Ingest a Polars DataFrame directly — DataFrame now implements Scannable.
let df = make_dataframe();
println!("Input DataFrame:\n{df}");
let table = db.create_table("people", df).execute().await?;
// Append more rows.
let more = DataFrame::new(vec![
Series::new("id", &[6i32, 7]),
Series::new("name", &["Frank", "Grace"]),
Series::new("score", &[7.8f64, 8.9]),
])
.unwrap();
table.add(more).execute().await?;
// Read back as a Polars DataFrame.
let result_df = table.query().execute().await?.into_polars().await?;
println!(
"\nRound-tripped DataFrame ({} rows):\n{result_df}",
result_df.height()
);
Ok(())
}

View File

@@ -112,54 +112,14 @@ impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> RecordBatchStream
/// A trait for converting incoming data to Arrow
///
/// Integrations should implement this trait to allow data to be
/// imported directly from the integration. For example, implementing
/// this trait for `Vec<Vec<...>>` would allow the `Vec` to be directly
/// used in methods like [`crate::connection::Connection::create_table`]
/// or [`crate::table::Table::add`]
pub trait IntoArrow {
/// Convert the data into an iterator of Arrow batches
fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>>;
}
pub type BoxedRecordBatchReader = Box<dyn arrow_array::RecordBatchReader + Send>;
impl<T: arrow_array::RecordBatchReader + Send + 'static> IntoArrow for T {
fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>> {
Ok(Box::new(self))
}
}
/// A trait for converting incoming data to Arrow asynchronously
///
/// Serves the same purpose as [`IntoArrow`], but for asynchronous data.
///
/// Note: Arrow has no async equivalent to RecordBatchReader and so
pub trait IntoArrowStream {
/// Convert the data into a stream of Arrow batches
fn into_arrow(self) -> Result<SendableRecordBatchStream>;
}
impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> SimpleRecordBatchStream<S> {
pub fn new(stream: S, schema: Arc<arrow_schema::Schema>) -> Self {
Self { schema, stream }
}
}
impl IntoArrowStream for SendableRecordBatchStream {
fn into_arrow(self) -> Result<SendableRecordBatchStream> {
Ok(self)
}
}
impl IntoArrowStream for datafusion_physical_plan::SendableRecordBatchStream {
fn into_arrow(self) -> Result<SendableRecordBatchStream> {
let schema = self.schema();
let stream = self.map_err(|df_err| df_err.into());
Ok(Box::pin(SimpleRecordBatchStream::new(stream, schema)))
}
}
pub trait LanceDbDatagenExt {
fn into_ldb_stream(
self,
@@ -264,9 +224,7 @@ impl IntoPolars for SendableRecordBatchStream {
#[cfg(all(test, feature = "polars"))]
mod tests {
use super::SendableRecordBatchStream;
use crate::arrow::{
IntoArrow, IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream,
};
use crate::arrow::{IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream};
use polars::prelude::{DataFrame, NamedFrom, Series};
fn get_record_batch_reader_from_polars() -> Box<dyn arrow_array::RecordBatchReader + Send> {
@@ -280,10 +238,7 @@ mod tests {
float_series = Series::new("float", &[2.0]);
let df2 = DataFrame::new(vec![string_series, int_series, float_series]).unwrap();
PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap())
.unwrap()
.into_arrow()
.unwrap()
Box::new(PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap()).unwrap())
}
#[test]

View File

@@ -667,6 +667,8 @@ pub struct ConnectRequest {
pub struct ConnectBuilder {
request: ConnectRequest,
embedding_registry: Option<Arc<dyn EmbeddingRegistry>>,
#[cfg(feature = "remote")]
oauth_config: Option<crate::remote::OAuthConfig>,
}
#[cfg(feature = "remote")]
@@ -688,6 +690,8 @@ impl ConnectBuilder {
session: None,
},
embedding_registry: None,
#[cfg(feature = "remote")]
oauth_config: None,
}
}
@@ -776,6 +780,19 @@ impl ConnectBuilder {
self
}
/// Configure OAuth authentication for LanceDB Cloud/Enterprise.
///
/// This creates an [`OAuthHeaderProvider`](crate::remote::OAuthHeaderProvider)
/// from the given config and sets it as the header provider. OAuth cannot
/// be combined with an API key or another header provider.
///
/// Token acquisition and refresh are handled in Rust.
#[cfg(feature = "remote")]
pub fn oauth_config(mut self, config: crate::remote::OAuthConfig) -> Self {
self.oauth_config = Some(config);
self
}
/// Provide a custom [`EmbeddingRegistry`] to use for this connection.
pub fn embedding_registry(mut self, registry: Arc<dyn EmbeddingRegistry>) -> Self {
self.embedding_registry = Some(registry);
@@ -921,9 +938,40 @@ impl ConnectBuilder {
let region = options.region.ok_or_else(|| Error::InvalidInput {
message: "A region is required when connecting to LanceDb Cloud".to_string(),
})?;
let api_key = options.api_key.ok_or_else(|| Error::InvalidInput {
message: "An api_key is required when connecting to LanceDb Cloud".to_string(),
})?;
let api_key = match (&self.oauth_config, &options.api_key) {
(Some(_), None) => String::new(),
(Some(_), Some(_)) => {
return Err(Error::InvalidInput {
message:
"api_key and oauth_config cannot both be set when connecting to LanceDb Cloud"
.to_string(),
});
}
(None, Some(key)) => key.clone(),
(None, None) => {
return Err(Error::InvalidInput {
message:
"An api_key or oauth_config is required when connecting to LanceDb Cloud"
.to_string(),
});
}
};
if self.oauth_config.is_some() && self.request.client_config.header_provider.is_some() {
return Err(Error::InvalidInput {
message:
"oauth_config and client_config.header_provider cannot both be set when connecting to LanceDb Cloud"
.to_string(),
});
}
let mut client_config = self.request.client_config;
if let Some(oauth_config) = self.oauth_config {
let provider = crate::remote::OAuthHeaderProvider::new(oauth_config)?;
client_config.header_provider =
Some(Arc::new(provider) as Arc<dyn crate::remote::HeaderProvider>);
}
let storage_options = StorageOptions(options.storage_options.clone());
let internal = Arc::new(crate::remote::db::RemoteDatabase::try_new(
@@ -931,7 +979,7 @@ impl ConnectBuilder {
&api_key,
&region,
options.host_override,
self.request.client_config,
client_config,
storage_options.into(),
self.request.read_consistency_interval,
)?);
@@ -1240,6 +1288,83 @@ mod tests {
assert_eq!(Some(&"EXPLICIT-VALUE".to_string()), options.get(opts_key));
}
#[cfg(feature = "remote")]
#[tokio::test]
async fn test_connect_rejects_api_key_with_oauth_config() {
let oauth_config = crate::remote::OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
client_secret: Some("secret".to_string()),
scopes: vec!["scope".to_string()],
flow: crate::remote::OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
let result = ConnectBuilder::new("db://my-container/my-prefix")
.region("us-east-1")
.api_key("my-api-key")
.oauth_config(oauth_config)
.execute()
.await;
match result {
Err(Error::InvalidInput { message })
if message
== "api_key and oauth_config cannot both be set when connecting to LanceDb Cloud" =>
{}
Err(err) => panic!("expected InvalidInput, got {err:?}"),
Ok(_) => panic!("expected api_key and oauth_config to be rejected"),
}
}
#[cfg(feature = "remote")]
#[tokio::test]
async fn test_connect_rejects_header_provider_with_oauth_config() {
#[derive(Debug)]
struct TestHeaderProvider;
#[async_trait::async_trait]
impl crate::remote::HeaderProvider for TestHeaderProvider {
async fn get_headers(&self) -> Result<HashMap<String, String>> {
Ok(HashMap::from([(
"authorization".to_string(),
"Bearer token".to_string(),
)]))
}
}
let oauth_config = crate::remote::OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
client_secret: Some("secret".to_string()),
scopes: vec!["scope".to_string()],
flow: crate::remote::OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
let client_config = crate::remote::ClientConfig {
header_provider: Some(
Arc::new(TestHeaderProvider) as Arc<dyn crate::remote::HeaderProvider>
),
..Default::default()
};
let result = ConnectBuilder::new("db://my-container/my-prefix")
.region("us-east-1")
.client_config(client_config)
.oauth_config(oauth_config)
.execute()
.await;
match result {
Err(Error::InvalidInput { message })
if message
== "oauth_config and client_config.header_provider cannot both be set when connecting to LanceDb Cloud" =>
{}
Err(err) => panic!("expected InvalidInput, got {err:?}"),
Ok(_) => panic!("expected header_provider and oauth_config to be rejected"),
}
}
#[cfg(not(windows))]
#[tokio::test]
async fn test_connect_relative() {

View File

@@ -185,6 +185,43 @@ impl Scannable for SendableRecordBatchStream {
}
}
#[cfg(feature = "polars")]
impl Scannable for polars::frame::DataFrame {
fn schema(&self) -> SchemaRef {
crate::polars_arrow_convertors::convert_polars_df_schema_to_arrow_rb_schema(
self.schema().clone(),
)
.expect("failed to convert Polars DataFrame schema to Arrow schema")
}
fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
let schema = Scannable::schema(self);
let batches: crate::Result<Vec<RecordBatch>> =
match crate::arrow::PolarsDataFrameRecordBatchReader::new(self.clone()) {
Err(e) => Err(e),
Ok(reader) => reader.map(|b| b.map_err(Into::into)).collect(),
};
match batches {
Err(e) => Box::pin(SimpleRecordBatchStream {
schema,
stream: once(async move { Err(e) }),
}),
Ok(batches) => {
let stream = futures::stream::iter(batches.into_iter().map(Ok));
Box::pin(SimpleRecordBatchStream { schema, stream })
}
}
}
fn num_rows(&self) -> Option<usize> {
Some(self.height())
}
fn rescannable(&self) -> bool {
true
}
}
#[async_trait]
impl StreamingWriteSource for Box<dyn Scannable> {
fn arrow_schema(&self) -> SchemaRef {
@@ -1089,4 +1126,60 @@ mod tests {
);
}
}
#[cfg(feature = "polars")]
mod polars_tests {
use super::*;
use crate::arrow::IntoPolars;
use crate::query::ExecutableQuery;
use polars::prelude::{DataFrame, NamedFrom, Series};
fn make_df() -> DataFrame {
DataFrame::new(vec![
Series::new("id", &[1i32, 2, 3]),
Series::new("val", &[1.1f64, 2.2, 3.3]),
])
.unwrap()
}
#[tokio::test]
async fn test_dataframe_scannable_round_trip() {
let tmp = tempfile::tempdir().unwrap();
let db = crate::connect(tmp.path().to_str().unwrap())
.execute()
.await
.unwrap();
let df = make_df();
let table = db.create_table("t", df.clone()).execute().await.unwrap();
// Append the same rows again.
table.add(df.clone()).execute().await.unwrap();
let result = table
.query()
.execute()
.await
.unwrap()
.into_polars()
.await
.unwrap();
assert_eq!(result.height(), df.height() * 2);
assert_eq!(result.schema(), df.schema());
}
#[tokio::test]
async fn test_dataframe_scannable_rescannable() {
let mut df = make_df();
assert!(df.rescannable());
let batches1: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
assert_eq!(batches1.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
// Can be scanned again.
let batches2: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
assert_eq!(batches2.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
}
}
}

View File

@@ -14,6 +14,7 @@ use lance::io::{ObjectStore, ObjectStoreParams, WrappingObjectStore};
use lance_datafusion::utils::StreamingWriteSource;
use lance_encoding::version::LanceFileVersion;
use lance_io::object_store::{StorageOptionsAccessor, StorageOptionsProvider};
use lance_table::io::commit::commit_handler_from_url;
use object_store::local::LocalFileSystem;
use snafu::ResultExt;
@@ -234,9 +235,11 @@ impl ListingDatabaseOptionsBuilder {
/// We will have two tables named `table1` and `table2`.
#[derive(Debug)]
pub struct ListingDatabase {
object_store: Arc<ObjectStore>,
query_string: Option<String>,
pub(crate) uri: String,
pub(crate) base_path: object_store::path::Path,
// the object store wrapper to use on write path
pub(crate) store_wrapper: Option<Arc<dyn WrappingObjectStore>>,
@@ -255,13 +258,8 @@ pub struct ListingDatabase {
// Session for object stores and caching
session: Arc<lance::session::Session>,
// Namespace-backed database for child namespace operations (manifest mode).
// Namespace-backed database for child namespace operations
namespace_database: Arc<LanceNamespaceDatabase>,
// V1 (manifest-disabled) directory namespace for root table lifecycle, so root
// drops are soft-deletes and purge/table_status are available. Shares the same root
// as `namespace_database` but in directory mode.
root_namespace_database: Arc<LanceNamespaceDatabase>,
}
impl std::fmt::Display for ListingDatabase {
@@ -282,6 +280,7 @@ impl std::fmt::Display for ListingDatabase {
}
}
const LANCE_EXTENSION: &str = "lance";
const ENGINE: &str = "engine";
const MIRRORED_STORE: &str = "mirroredStore";
@@ -343,39 +342,6 @@ impl ListingDatabase {
))
}
/// Build the V1 (manifest-disabled) directory namespace used for *root* table
/// lifecycle ops.
///
/// Root tables in a listing database are flat `<name>.lance` directories; soft-delete
/// (drop/purge/TTL) is a V1-only mechanism, so root ops go through this namespace.
/// Child namespaces are manifest-backed and handled by the separate
/// (manifest-enabled) `namespace_database`.
async fn connect_root_namespace_database(
uri: &str,
storage_options: HashMap<String, String>,
namespace_client_properties: HashMap<String, String>,
read_consistency_interval: Option<std::time::Duration>,
session: Arc<lance::session::Session>,
) -> Result<Arc<LanceNamespaceDatabase>> {
let mut ns_properties = Self::build_namespace_client_properties(
uri,
&storage_options,
namespace_client_properties,
);
ns_properties.insert("manifest_enabled".to_string(), "false".to_string());
Ok(Arc::new(
LanceNamespaceDatabase::connect(
"dir",
ns_properties,
storage_options,
read_consistency_interval,
Some(session),
HashSet::new(),
)
.await?,
))
}
async fn prepare_namespace_root(
uri: &str,
storage_options: &HashMap<String, String>,
@@ -582,7 +548,7 @@ impl ListingDatabase {
},
..Default::default()
};
let (object_store, _base_path) = ObjectStore::from_uri_and_params(
let (object_store, base_path) = ObjectStore::from_uri_and_params(
session.store_registry(),
&plain_uri,
&os_params,
@@ -611,18 +577,12 @@ impl ListingDatabase {
session.clone(),
)
.await?;
let root_namespace_database = Self::connect_root_namespace_database(
&table_base_uri,
options.storage_options.clone(),
request.namespace_client_properties.clone(),
request.read_consistency_interval,
session.clone(),
)
.await?;
Ok(Self {
uri: table_base_uri,
query_string,
base_path,
object_store,
store_wrapper: write_store_wrapper,
read_consistency_interval: request.read_consistency_interval,
storage_options: options.storage_options,
@@ -630,7 +590,6 @@ impl ListingDatabase {
new_table_config: options.new_table_config,
session,
namespace_database,
root_namespace_database,
})
}
Err(_) => {
@@ -654,7 +613,7 @@ impl ListingDatabase {
session: Option<Arc<lance::session::Session>>,
) -> Result<Self> {
let session = session.unwrap_or_else(|| Arc::new(lance::session::Session::default()));
let (object_store, _base_path) = ObjectStore::from_uri_and_params(
let (object_store, base_path) = ObjectStore::from_uri_and_params(
session.store_registry(),
path,
&ObjectStoreParams::default(),
@@ -665,14 +624,6 @@ impl ListingDatabase {
}
let namespace_database = Self::connect_namespace_database(
path,
HashMap::new(),
namespace_client_properties.clone(),
read_consistency_interval,
session.clone(),
)
.await?;
let root_namespace_database = Self::connect_root_namespace_database(
path,
HashMap::new(),
namespace_client_properties,
@@ -684,6 +635,8 @@ impl ListingDatabase {
Ok(Self {
uri: path.to_string(),
query_string: None,
base_path,
object_store,
store_wrapper: None,
read_consistency_interval,
storage_options: HashMap::new(),
@@ -691,7 +644,6 @@ impl ListingDatabase {
new_table_config,
session,
namespace_database,
root_namespace_database,
})
}
@@ -753,10 +705,42 @@ impl ListingDatabase {
self.namespace_database.clone()
}
/// The V1 directory namespace used for root table lifecycle (soft-delete drop, purge,
/// table_status, O(1) listing).
fn root_namespace_database(&self) -> Arc<LanceNamespaceDatabase> {
self.root_namespace_database.clone()
async fn drop_tables(&self, names: Vec<String>) -> Result<()> {
let object_store_params = ObjectStoreParams {
storage_options_accessor: if self.storage_options.is_empty() {
None
} else {
Some(Arc::new(StorageOptionsAccessor::with_static_options(
self.storage_options.clone(),
)))
},
..Default::default()
};
let mut uri = self.uri.clone();
if let Some(query_string) = &self.query_string {
uri.push_str(&format!("?{}", query_string));
}
let commit_handler = commit_handler_from_url(&uri, &Some(object_store_params)).await?;
for name in names {
let dir_name = format!("{}.{}", name, LANCE_EXTENSION);
let full_path = self.base_path.clone().join(dir_name.clone());
commit_handler.delete(&full_path).await?;
self.object_store
.remove_dir_all(full_path.clone())
.await
.map_err(|err| match err {
// this error is not lance::Error::DatasetNotFound, as the method
// `remove_dir_all` may be used to remove something not be a dataset
lance::Error::NotFound { .. } => Error::TableNotFound {
name: name.clone(),
source: Box::new(err),
},
_ => Error::from(err),
})?;
}
Ok(())
}
/// Inherit storage options from the connection into the target map
@@ -962,43 +946,88 @@ impl Database for ListingDatabase {
if !request.namespace_path.is_empty() {
return self.namespace_database().table_names(request).await;
}
// Root tables: the V1 namespace lists them in a single read_dir (O(1) requests)
// and excludes soft-deleted tables, instead of a per-table probe here.
self.root_namespace_database().table_names(request).await
let mut f = self
.object_store
.read_dir(self.base_path.clone())
.await?
.iter()
.map(Path::new)
.filter(|path| {
let is_lance = path
.extension()
.and_then(|e| e.to_str())
.map(|e| e == LANCE_EXTENSION);
is_lance.unwrap_or(false)
})
.filter_map(|p| p.file_stem().and_then(|s| s.to_str().map(String::from)))
.collect::<Vec<String>>();
f.sort();
if let Some(start_after) = request.start_after {
let index = f
.iter()
.position(|name| name.as_str() > start_after.as_str())
.unwrap_or(f.len());
f.drain(0..index);
}
if let Some(limit) = request.limit {
f.truncate(limit as usize);
}
Ok(f)
}
async fn list_tables(&self, request: ListTablesRequest) -> Result<ListTablesResponse> {
if request.id.as_ref().map(|v| !v.is_empty()).unwrap_or(false) {
return self.namespace_database().list_tables(request).await;
}
self.root_namespace_database().list_tables(request).await
let mut f = self
.object_store
.read_dir(self.base_path.clone())
.await?
.iter()
.map(Path::new)
.filter(|path| {
let is_lance = path
.extension()
.and_then(|e| e.to_str())
.map(|e| e == LANCE_EXTENSION);
is_lance.unwrap_or(false)
})
.filter_map(|p| p.file_stem().and_then(|s| s.to_str().map(String::from)))
.collect::<Vec<String>>();
f.sort();
// Handle pagination with page_token
if let Some(ref page_token) = request.page_token {
let index = f
.iter()
.position(|name| name.as_str() > page_token.as_str())
.unwrap_or(f.len());
f.drain(0..index);
}
// Determine if there's a next page
let next_page_token = if let Some(limit) = request.limit {
if f.len() > limit as usize {
let token = f[limit as usize].clone();
f.truncate(limit as usize);
Some(token)
} else {
None
}
} else {
None
};
Ok(ListTablesResponse {
tables: f,
page_token: next_page_token,
})
}
async fn create_table(&self, request: CreateTableRequest) -> Result<Arc<dyn BaseTable>> {
if !request.namespace_path.is_empty() {
return self.namespace_database().create_table(request).await;
}
let mut request = request;
// Re-creating a soft-deleted table is a revive: clear the delete marker (via the
// V1 root namespace, under its lifecycle lock so a concurrent purge can't race),
// making the table live again, then overwrite its data through the native create
// path below (preserving lineage as a new version). A plain native create would
// leave the marker in place, keeping the table hidden.
if matches!(
self.root_namespace_database()
.namespace_client()
.await?
.table_status(Some(vec![request.name.clone()]))
.await?,
lance_namespace::TableLifecycle::SoftDeleted { .. }
) {
self.root_namespace_database()
.namespace_client()
.await?
.undelete_table(Some(vec![request.name.clone()]))
.await?;
request.mode = CreateTableMode::Overwrite;
}
// Use provided location if available, otherwise derive from table name
let table_uri = request
.location
@@ -1117,19 +1146,6 @@ impl Database for ListingDatabase {
if !request.namespace_path.is_empty() {
return self.namespace_database().open_table(request).await;
}
// A soft-deleted (dropped-but-not-purged) table must read as absent even though
// its data still exists on disk. Consult the V1 root namespace (which owns the
// marker); if soft-deleted, route to it so the open surfaces TableNotFound.
if matches!(
self.root_namespace_database()
.namespace_client()
.await?
.table_status(Some(vec![request.name.clone()]))
.await?,
lance_namespace::TableLifecycle::SoftDeleted { .. }
) {
return self.root_namespace_database().open_table(request).await;
}
// Use provided location if available, otherwise derive from table name
let table_uri = request
.location
@@ -1229,23 +1245,20 @@ impl Database for ListingDatabase {
.drop_table(name, namespace_path)
.await;
}
// Root table: route through the V1 namespace so the drop is a soft-delete (writes
// a marker, leaves data for later purge) rather than an immediate remove_dir_all.
self.root_namespace_database()
.drop_table(name, namespace_path)
.await
self.drop_tables(vec![name.to_string()]).await
}
#[allow(deprecated)]
async fn drop_all_tables(&self, namespace_path: &[String]) -> Result<()> {
// Check if namespace parameter is provided
if !namespace_path.is_empty() {
return self
.namespace_database()
.drop_all_tables(namespace_path)
.await;
}
self.root_namespace_database()
.drop_all_tables(namespace_path)
.await
let tables = self.table_names(TableNamesRequest::default()).await?;
self.drop_tables(tables).await
}
fn as_any(&self) -> &dyn std::any::Any {
@@ -1253,9 +1266,6 @@ impl Database for ListingDatabase {
}
async fn namespace_client(&self) -> Result<Arc<dyn lance_namespace::LanceNamespace>> {
// Returns the manifest-backed namespace so callers can operate on child
// namespaces (multi-level table ids) through the client. Root-table soft-delete
// lifecycle (table_status/purge) is reached via the V1 root namespace internally.
self.namespace_database.namespace_client().await
}
@@ -2605,67 +2615,4 @@ mod tests {
.unwrap();
assert!(post_drop.tables.is_empty());
}
/// Root-table drop is a soft-delete routed through the V1 namespace: the table is
/// hidden from listing/open but its data survives until purged, and re-creating it
/// revives it. Verifies the consolidation end-to-end at the ListingDatabase level.
#[tokio::test]
async fn test_root_table_soft_delete_lifecycle() {
let (_tempdir, db) = setup_database().await;
let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
let create = |name: &str| CreateTableRequest {
name: name.to_string(),
namespace_path: vec![],
data: Box::new(RecordBatch::new_empty(schema.clone())) as Box<dyn Scannable>,
mode: CreateTableMode::Create,
write_options: Default::default(),
location: None,
namespace_client: None,
};
let open = |name: &str| OpenTableRequest {
name: name.to_string(),
namespace_path: vec![],
index_cache_size: None,
lance_read_params: None,
location: None,
namespace_client: None,
managed_versioning: None,
};
db.create_table(create("t")).await.unwrap();
db.drop_table("t", &[]).await.unwrap();
// Hidden from listing and not openable...
#[allow(deprecated)]
let names = db.table_names(TableNamesRequest::default()).await.unwrap();
assert!(!names.contains(&"t".to_string()));
assert!(matches!(
db.open_table(open("t")).await,
Err(Error::TableNotFound { .. })
));
// ...but data survives: it shows up as purgable via the V1 root namespace.
let root_ns = db
.root_namespace_database()
.namespace_client()
.await
.unwrap();
let purgable = root_ns.list_purgable_tables(None).await.unwrap();
assert_eq!(purgable.len(), 1);
assert_eq!(purgable[0].id, vec!["t".to_string()]);
// Re-creating revives it.
db.create_table(create("t")).await.unwrap();
db.open_table(open("t")).await.unwrap();
#[allow(deprecated)]
let names = db.table_names(TableNamesRequest::default()).await.unwrap();
assert!(names.contains(&"t".to_string()));
assert!(root_ns.list_purgable_tables(None).await.unwrap().is_empty());
// Drop then purge reclaims it for good.
db.drop_table("t", &[]).await.unwrap();
let purged = root_ns.purge_tables(None).await.unwrap();
assert_eq!(purged, vec![vec!["t".to_string()]]);
assert!(root_ns.list_purgable_tables(None).await.unwrap().is_empty());
}
}

View File

@@ -583,9 +583,9 @@ impl Database for LanceNamespaceDatabase {
self.namespace
.drop_table(drop_request)
.await
// Preserve TableNotFound (e.g. dropping a non-existent table) rather than
// flattening every failure to a generic Runtime error.
.map_err(|e| map_namespace_lance_error(e, name))?;
.map_err(|e| Error::Runtime {
message: format!("Failed to drop table: {}", e),
})?;
Ok(())
}

View File

@@ -8,6 +8,7 @@
pub(crate) mod client;
pub(crate) mod db;
pub mod oauth;
mod retry;
pub(crate) mod table;
pub(crate) mod util;
@@ -20,3 +21,4 @@ const JSON_CONTENT_TYPE: &str = "application/json";
pub use client::{ClientConfig, HeaderProvider, RetryConfig, TimeoutConfig, TlsConfig};
pub use db::{RemoteDatabaseOptions, RemoteDatabaseOptionsBuilder};
pub use oauth::{OAuthConfig, OAuthFlow, OAuthHeaderProvider};

View File

@@ -0,0 +1,907 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Arc;
use std::time::{Duration, Instant};
use async_trait::async_trait;
use log::debug;
use reqwest::Client;
use serde::Deserialize;
use tokio::sync::RwLock;
use crate::error::{Error, Result};
use crate::remote::client::HeaderProvider;
const DEFAULT_REFRESH_BUFFER_SECS: u64 = 300;
const DEFAULT_TOKEN_TTL_SECS: u64 = 3600;
const AZURE_IMDS_ENDPOINT: &str = "http://169.254.169.254/metadata/identity/oauth2/token";
const AZURE_IMDS_API_VERSION: &str = "2018-02-01";
/// OAuth authentication flow configuration.
#[derive(Debug, Clone)]
pub enum OAuthFlow {
/// Client Credentials grant (service-to-service / M2M).
/// Requires `client_secret` in [`OAuthConfig`].
ClientCredentials,
/// Azure Managed Identity via IMDS.
/// Works on Azure VMs, AKS, App Service, and Azure Functions.
/// IMDS requests bypass proxy settings because the endpoint is link-local.
AzureManagedIdentity {
/// Client ID for user-assigned managed identity.
/// Omit for system-assigned managed identity.
client_id: Option<String>,
},
}
/// OAuth configuration for LanceDB authentication.
///
/// All token acquisition and refresh is handled in the Rust layer.
/// Python and TypeScript bindings expose this as a plain config object.
#[derive(Clone)]
pub struct OAuthConfig {
/// OIDC issuer URL or OAuth authority URL.
/// For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
pub issuer_url: String,
/// Application / Client ID.
pub client_id: String,
/// Client secret (required for `ClientCredentials`, optional for others).
pub client_secret: Option<String>,
/// OAuth scopes to request.
/// For Azure managed identity, exactly one scope or resource is required.
/// For example: `["api://{app_id}/.default"]`
pub scopes: Vec<String>,
/// Authentication flow to use.
pub flow: OAuthFlow,
/// Seconds before token expiry to trigger proactive refresh (default: 300).
/// Keep this well below the token TTL; if it is greater than or equal to
/// the TTL, each request refreshes the token.
pub refresh_buffer_secs: Option<u64>,
}
impl std::fmt::Debug for OAuthConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("OAuthConfig")
.field("issuer_url", &self.issuer_url)
.field("client_id", &self.client_id)
.field(
"client_secret",
&self.client_secret.as_deref().map(|_| "<redacted>"),
)
.field("scopes", &self.scopes)
.field("flow", &self.flow)
.field("refresh_buffer_secs", &self.refresh_buffer_secs)
.finish()
}
}
// -- OIDC Discovery --
#[derive(Clone, Debug, Deserialize)]
struct OidcDiscovery {
token_endpoint: String,
}
// -- Token Response --
#[derive(Deserialize)]
struct TokenResponse {
access_token: String,
/// Token lifetime in seconds.
/// Some providers (Azure IMDS) return this as a string, so we accept both.
#[serde(default, deserialize_with = "deserialize_optional_u64_or_string")]
expires_in: Option<u64>,
#[serde(default)]
#[allow(dead_code)]
token_type: Option<String>,
}
impl std::fmt::Debug for TokenResponse {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("TokenResponse")
.field("access_token", &"<redacted>")
.field("expires_in", &self.expires_in)
.field("token_type", &self.token_type)
.finish()
}
}
fn deserialize_optional_u64_or_string<'de, D>(
deserializer: D,
) -> std::result::Result<Option<u64>, D::Error>
where
D: serde::Deserializer<'de>,
{
use serde::de;
struct U64OrString;
impl<'de> de::Visitor<'de> for U64OrString {
type Value = Option<u64>;
fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
formatter.write_str("an integer, an integer-valued float, a numeric string, or null")
}
fn visit_u64<E: de::Error>(self, v: u64) -> std::result::Result<Self::Value, E> {
Ok(Some(v))
}
fn visit_i64<E: de::Error>(self, v: i64) -> std::result::Result<Self::Value, E> {
if v < 0 {
return Err(E::custom(format!("invalid expires_in value: {v}")));
}
Ok(Some(v as u64))
}
fn visit_f64<E: de::Error>(self, v: f64) -> std::result::Result<Self::Value, E> {
if !v.is_finite() || v < 0.0 || v.fract() != 0.0 || v > u64::MAX as f64 {
return Err(E::custom(format!("invalid expires_in value: {v}")));
}
Ok(Some(v as u64))
}
fn visit_str<E: de::Error>(self, v: &str) -> std::result::Result<Self::Value, E> {
v.parse::<u64>().map(Some).map_err(de::Error::custom)
}
fn visit_none<E: de::Error>(self) -> std::result::Result<Self::Value, E> {
Ok(None)
}
fn visit_unit<E: de::Error>(self) -> std::result::Result<Self::Value, E> {
Ok(None)
}
}
deserializer.deserialize_any(U64OrString)
}
// -- Internal Token State --
struct TokenState {
access_token: Option<String>,
expires_at: Option<Instant>,
}
impl TokenState {
fn new() -> Self {
Self {
access_token: None,
expires_at: None,
}
}
fn is_expired(&self, buffer: Duration) -> bool {
match (self.access_token.as_ref(), self.expires_at) {
(Some(_), Some(expires_at)) => Instant::now() + buffer >= expires_at,
(None, _) => true,
(Some(_), None) => true,
}
}
fn update(&mut self, resp: &TokenResponse) {
self.access_token = Some(resp.access_token.clone());
let expires_in = resp.expires_in.unwrap_or(DEFAULT_TOKEN_TTL_SECS);
self.expires_at = Some(Instant::now() + Duration::from_secs(expires_in));
}
}
#[async_trait]
trait TokenSource: Send + Sync + std::fmt::Debug {
async fn fetch_token(&self) -> Result<TokenResponse>;
}
struct ClientCredentialsSource {
issuer_url: String,
client_id: String,
client_secret: String,
scopes: Vec<String>,
http_client: Client,
discovery: RwLock<Option<OidcDiscovery>>,
}
impl std::fmt::Debug for ClientCredentialsSource {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("ClientCredentialsSource")
.field("issuer_url", &self.issuer_url)
.field("client_id", &self.client_id)
.field("client_secret", &"<redacted>")
.field("scopes", &self.scopes)
.finish()
}
}
impl ClientCredentialsSource {
fn new(
issuer_url: String,
client_id: String,
client_secret: Option<String>,
scopes: Vec<String>,
) -> Result<Self> {
let client_secret = client_secret.ok_or(Error::InvalidInput {
message: "client_secret is required for ClientCredentials flow".to_string(),
})?;
Self::validate_issuer_transport(&issuer_url)?;
let http_client = Client::builder()
.timeout(Duration::from_secs(30))
.build()
.map_err(|e| Error::Runtime {
message: format!("Failed to create HTTP client for OAuth: {e}"),
})?;
Ok(Self {
issuer_url,
client_id,
client_secret,
scopes,
http_client,
discovery: RwLock::new(None),
})
}
fn validate_issuer_transport(issuer_url: &str) -> Result<()> {
let issuer = url::Url::parse(issuer_url).map_err(|e| Error::InvalidInput {
message: format!("Invalid OAuth issuer_url: {e}"),
})?;
match issuer.scheme() {
"https" => Ok(()),
"http" if Self::is_loopback_issuer(&issuer) => Ok(()),
_ => Err(Error::InvalidInput {
message:
"ClientCredentials OAuth issuer_url must use https, except for loopback hosts"
.to_string(),
}),
}
}
fn is_loopback_issuer(issuer: &url::Url) -> bool {
let Some(host) = issuer.host_str() else {
return false;
};
host.eq_ignore_ascii_case("localhost")
|| host
.parse::<IpAddr>()
.map(|addr| addr.is_loopback())
.unwrap_or(false)
}
async fn get_discovery(&self) -> Result<OidcDiscovery> {
{
let cached = self.discovery.read().await;
if let Some(ref disc) = *cached {
return Ok(disc.clone());
}
}
let mut cache = self.discovery.write().await;
// Double-check
if let Some(ref disc) = *cache {
return Ok(disc.clone());
}
let discovery_url = format!(
"{}/.well-known/openid-configuration",
self.issuer_url.trim_end_matches('/')
);
debug!("Fetching OIDC discovery from {}", discovery_url);
let resp = self
.http_client
.get(&discovery_url)
.send()
.await
.map_err(|e| Error::Runtime {
message: format!("Failed to fetch OIDC discovery document: {e}"),
})?;
if !resp.status().is_success() {
return Err(Error::Runtime {
message: format!(
"OIDC discovery failed with status {}: {}",
resp.status(),
resp.text().await.unwrap_or_default()
),
});
}
let disc: OidcDiscovery = resp.json().await.map_err(|e| Error::Runtime {
message: format!("Failed to parse OIDC discovery document: {e}"),
})?;
let result = disc.clone();
*cache = Some(disc);
Ok(result)
}
async fn get_token_endpoint(&self) -> Result<String> {
self.get_discovery().await.map(|disc| disc.token_endpoint)
}
fn scopes_string(&self) -> String {
self.scopes.join(" ")
}
async fn post_token_request(
&self,
endpoint: &str,
params: &[(&str, &str)],
) -> Result<TokenResponse> {
let resp = self
.http_client
.post(endpoint)
.form(params)
.send()
.await
.map_err(|e| Error::Runtime {
message: format!("Token request to {endpoint} failed: {e}"),
})?;
if !resp.status().is_success() {
return Err(Error::Runtime {
message: format!(
"Token request failed with status {}: {}",
resp.status(),
resp.text().await.unwrap_or_default()
),
});
}
resp.json().await.map_err(|e| Error::Runtime {
message: format!("Failed to parse token response: {e}"),
})
}
}
#[async_trait]
impl TokenSource for ClientCredentialsSource {
async fn fetch_token(&self) -> Result<TokenResponse> {
let token_endpoint = self.get_token_endpoint().await?;
let scope = self.scopes_string();
let params = [
("grant_type", "client_credentials"),
("client_id", self.client_id.as_str()),
("client_secret", self.client_secret.as_str()),
("scope", scope.as_str()),
];
self.post_token_request(&token_endpoint, &params).await
}
}
struct AzureImdsSource {
client_id: Option<String>,
resource: String,
http_client: Client,
}
impl std::fmt::Debug for AzureImdsSource {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("AzureImdsSource")
.field("client_id", &self.client_id)
.field("resource", &self.resource)
.finish()
}
}
impl AzureImdsSource {
fn new(scopes: Vec<String>, client_id: Option<String>) -> Result<Self> {
let resource = Self::resource_from_scopes(&scopes)?;
let http_client = Client::builder()
.timeout(Duration::from_secs(30))
.no_proxy()
.build()
.map_err(|e| Error::Runtime {
message: format!("Failed to create HTTP client for Azure IMDS OAuth: {e}"),
})?;
Ok(Self {
client_id,
resource,
http_client,
})
}
fn resource_from_scopes(scopes: &[String]) -> Result<String> {
let [scope] = scopes else {
return Err(Error::InvalidInput {
message: "AzureManagedIdentity flow requires exactly one OAuth scope or resource"
.to_string(),
});
};
Ok(scope.strip_suffix("/.default").unwrap_or(scope).to_string())
}
}
#[async_trait]
impl TokenSource for AzureImdsSource {
async fn fetch_token(&self) -> Result<TokenResponse> {
let mut url = format!(
"{AZURE_IMDS_ENDPOINT}?api-version={AZURE_IMDS_API_VERSION}&resource={}",
urlencoding::encode(&self.resource),
);
if let Some(cid) = self.client_id.as_deref() {
url.push_str(&format!("&client_id={}", urlencoding::encode(cid)));
}
let resp = self
.http_client
.get(&url)
.header("Metadata", "true")
.send()
.await
.map_err(|e| Error::Runtime {
message: format!("Azure IMDS request failed: {e}"),
})?;
if !resp.status().is_success() {
return Err(Error::Runtime {
message: format!(
"Azure IMDS returned status {}: {}",
resp.status(),
resp.text().await.unwrap_or_default()
),
});
}
resp.json().await.map_err(|e| Error::Runtime {
message: format!("Failed to parse IMDS token response: {e}"),
})
}
}
/// OAuth header provider that manages the full token lifecycle.
///
/// Implements [`HeaderProvider`] to inject `Authorization: Bearer <token>`
/// headers into every LanceDB request, with automatic token refresh.
pub struct OAuthHeaderProvider {
token_source: Box<dyn TokenSource>,
token_state: Arc<RwLock<TokenState>>,
refresh_buffer: Duration,
}
impl std::fmt::Debug for OAuthHeaderProvider {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("OAuthHeaderProvider")
.field("token_source", &self.token_source)
.finish()
}
}
impl OAuthHeaderProvider {
/// Create a new OAuth header provider from configuration.
pub fn new(config: OAuthConfig) -> Result<Self> {
let OAuthConfig {
issuer_url,
client_id,
client_secret,
scopes,
flow,
refresh_buffer_secs,
} = config;
if scopes.is_empty() {
return Err(Error::InvalidInput {
message: "At least one OAuth scope is required".to_string(),
});
}
let refresh_buffer =
Duration::from_secs(refresh_buffer_secs.unwrap_or(DEFAULT_REFRESH_BUFFER_SECS));
let token_source: Box<dyn TokenSource> = match flow {
OAuthFlow::ClientCredentials => Box::new(ClientCredentialsSource::new(
issuer_url,
client_id,
client_secret,
scopes,
)?),
OAuthFlow::AzureManagedIdentity { client_id } => {
Box::new(AzureImdsSource::new(scopes, client_id)?)
}
};
Ok(Self {
token_source,
token_state: Arc::new(RwLock::new(TokenState::new())),
refresh_buffer,
})
}
/// Get a valid access token, refreshing if necessary.
async fn get_valid_token(&self) -> Result<String> {
// Fast path: check if current token is still valid
{
let state = self.token_state.read().await;
if !state.is_expired(self.refresh_buffer)
&& let Some(ref token) = state.access_token
{
return Ok(token.clone());
}
}
// Slow path: acquire or refresh token
let mut state = self.token_state.write().await;
// Double-check after acquiring write lock
if !state.is_expired(self.refresh_buffer)
&& let Some(ref token) = state.access_token
{
return Ok(token.clone());
}
debug!("Acquiring new OAuth token via {:?}", self.token_source);
let resp = self.token_source.fetch_token().await?;
state.update(&resp);
Ok(resp.access_token)
}
}
#[async_trait]
impl HeaderProvider for OAuthHeaderProvider {
async fn get_headers(&self) -> Result<HashMap<String, String>> {
let token = self.get_valid_token().await?;
Ok(HashMap::from([(
"authorization".to_string(),
format!("Bearer {token}"),
)]))
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};
use tokio::task::JoinHandle;
#[test]
fn test_token_state_expiry() {
let mut state = TokenState::new();
assert!(state.is_expired(Duration::from_secs(0)));
state.access_token = Some("tok".to_string());
state.expires_at = Some(Instant::now() + Duration::from_secs(600));
assert!(!state.is_expired(Duration::from_secs(300)));
assert!(state.is_expired(Duration::from_secs(601)));
state.expires_at = None;
assert!(state.is_expired(Duration::from_secs(0)));
}
#[test]
fn test_token_state_uses_default_expiry() {
let mut state = TokenState::new();
let response = TokenResponse {
access_token: "tok".to_string(),
expires_in: None,
token_type: None,
};
state.update(&response);
assert!(!state.is_expired(Duration::from_secs(DEFAULT_TOKEN_TTL_SECS - 1)));
assert!(state.is_expired(Duration::from_secs(DEFAULT_TOKEN_TTL_SECS + 1)));
}
#[test]
fn test_token_response_accepts_float_expires_in() {
let response: TokenResponse =
serde_json::from_str(r#"{"access_token":"tok","expires_in":3600.0}"#).unwrap();
assert_eq!(response.expires_in, Some(3600));
}
#[test]
fn test_token_response_rejects_negative_expires_in() {
let err =
serde_json::from_str::<TokenResponse>(r#"{"access_token":"tok","expires_in":-1}"#)
.unwrap_err();
assert!(err.to_string().contains("invalid expires_in value: -1"));
}
#[test]
fn test_token_response_debug_redacts_access_token() {
let response = TokenResponse {
access_token: "secret-token".to_string(),
expires_in: Some(3600),
token_type: Some("Bearer".to_string()),
};
let debug = format!("{response:?}");
assert!(!debug.contains("secret-token"));
assert!(debug.contains("access_token: \"<redacted>\""));
}
#[test]
fn test_scopes_string() {
let source = ClientCredentialsSource::new(
"https://login.microsoftonline.com/tenant/v2.0".to_string(),
"app-id".to_string(),
Some("secret".to_string()),
vec!["scope1".to_string(), "scope2".to_string()],
)
.unwrap();
assert_eq!(source.scopes_string(), "scope1 scope2");
}
#[test]
fn test_oauth_config_debug_redacts_client_secret() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
client_secret: Some("super-secret".to_string()),
scopes: vec!["scope".to_string()],
flow: OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
let debug = format!("{config:?}");
assert!(!debug.contains("super-secret"));
assert!(debug.contains("client_secret: Some(\"<redacted>\")"));
}
#[test]
fn test_oauth_header_provider_debug_redacts_client_secret() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
client_secret: Some("super-secret".to_string()),
scopes: vec!["scope".to_string()],
flow: OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
let provider = OAuthHeaderProvider::new(config).unwrap();
let debug = format!("{provider:?}");
assert!(!debug.contains("super-secret"));
assert!(debug.contains("client_secret: \"<redacted>\""));
}
#[test]
fn test_managed_identity_resource_from_default_scope() {
assert_eq!(
AzureImdsSource::resource_from_scopes(&["api://test/.default".to_string()]).unwrap(),
"api://test"
);
}
#[test]
fn test_managed_identity_resource_without_default_suffix() {
assert_eq!(
AzureImdsSource::resource_from_scopes(&["api://test".to_string()]).unwrap(),
"api://test"
);
}
#[test]
fn test_managed_identity_rejects_multiple_scopes() {
let config = OAuthConfig {
issuer_url: "https://login.microsoftonline.com/tenant/v2.0".to_string(),
client_id: "app-id".to_string(),
client_secret: None,
scopes: vec![
"api://test-a/.default".to_string(),
"api://test-b/.default".to_string(),
],
flow: OAuthFlow::AzureManagedIdentity { client_id: None },
refresh_buffer_secs: None,
};
assert!(OAuthHeaderProvider::new(config).is_err());
}
#[tokio::test]
async fn test_token_endpoint_requires_discovery_success() {
let (issuer_url, server) = spawn_discovery_error_server().await;
let source = ClientCredentialsSource::new(
issuer_url,
"client-id".to_string(),
Some("secret".to_string()),
vec!["scope".to_string()],
)
.unwrap();
let err = source.get_token_endpoint().await.unwrap_err();
assert!(matches!(
err,
Error::Runtime { message }
if message.contains("OIDC discovery failed with status 503")
));
server.await.unwrap();
}
#[test]
fn test_client_credentials_requires_secret() {
let config = OAuthConfig {
issuer_url: "https://login.microsoftonline.com/tenant/v2.0".to_string(),
client_id: "app-id".to_string(),
client_secret: None,
scopes: vec!["scope".to_string()],
flow: OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
assert!(OAuthHeaderProvider::new(config).is_err());
}
#[test]
fn test_client_credentials_rejects_insecure_non_loopback_issuer() {
let config = OAuthConfig {
issuer_url: "http://issuer.example.com".to_string(),
client_id: "app-id".to_string(),
client_secret: Some("secret".to_string()),
scopes: vec!["scope".to_string()],
flow: OAuthFlow::ClientCredentials,
refresh_buffer_secs: None,
};
let err = OAuthHeaderProvider::new(config).unwrap_err();
assert!(matches!(
err,
Error::InvalidInput { message }
if message == "ClientCredentials OAuth issuer_url must use https, except for loopback hosts"
));
}
#[test]
fn test_empty_scopes_rejected() {
let config = OAuthConfig {
issuer_url: "https://login.microsoftonline.com/tenant/v2.0".to_string(),
client_id: "app-id".to_string(),
client_secret: None,
scopes: vec![],
flow: OAuthFlow::AzureManagedIdentity { client_id: None },
refresh_buffer_secs: None,
};
assert!(OAuthHeaderProvider::new(config).is_err());
}
#[tokio::test]
async fn test_client_credentials_token_lifecycle() {
let (issuer_url, token_requests, server) = spawn_oauth_server().await;
let config = OAuthConfig {
issuer_url,
client_id: "client-id".to_string(),
client_secret: Some("secret".to_string()),
scopes: vec!["scope".to_string()],
flow: OAuthFlow::ClientCredentials,
refresh_buffer_secs: Some(0),
};
let provider = OAuthHeaderProvider::new(config).unwrap();
let headers = provider.get_headers().await.unwrap();
assert_eq!(headers.get("authorization").unwrap(), "Bearer token-1");
assert_eq!(token_requests.load(Ordering::SeqCst), 1);
let headers = provider.get_headers().await.unwrap();
assert_eq!(headers.get("authorization").unwrap(), "Bearer token-1");
assert_eq!(token_requests.load(Ordering::SeqCst), 1);
provider.token_state.write().await.expires_at =
Some(Instant::now() - Duration::from_secs(1));
let headers = provider.get_headers().await.unwrap();
assert_eq!(headers.get("authorization").unwrap(), "Bearer token-2");
assert_eq!(token_requests.load(Ordering::SeqCst), 2);
server.await.unwrap();
}
async fn spawn_oauth_server() -> (String, Arc<AtomicUsize>, JoinHandle<()>) {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let issuer_url = format!("http://{addr}");
let token_requests = Arc::new(AtomicUsize::new(0));
let server_token_requests = Arc::clone(&token_requests);
let server = tokio::spawn(async move {
for _ in 0..3 {
let (mut stream, _) = listener.accept().await.unwrap();
let (request_line, body) = read_http_request(&mut stream).await;
if request_line.starts_with("GET /.well-known/openid-configuration ") {
let discovery = format!(r#"{{"token_endpoint":"http://{addr}/token"}}"#);
write_json_response(&mut stream, "200 OK", &discovery).await;
} else if request_line.starts_with("POST /token ") {
assert!(body.contains("grant_type=client_credentials"));
assert!(body.contains("client_id=client-id"));
assert!(body.contains("client_secret=secret"));
assert!(body.contains("scope=scope"));
let token_num = server_token_requests.fetch_add(1, Ordering::SeqCst) + 1;
let token = format!(
r#"{{"access_token":"token-{token_num}","expires_in":3600,"token_type":"Bearer"}}"#
);
write_json_response(&mut stream, "200 OK", &token).await;
} else {
write_json_response(&mut stream, "404 Not Found", "{}").await;
}
}
});
(issuer_url, token_requests, server)
}
async fn spawn_discovery_error_server() -> (String, JoinHandle<()>) {
let listener = TcpListener::bind("127.0.0.1:0").await.unwrap();
let addr = listener.local_addr().unwrap();
let issuer_url = format!("http://{addr}");
let server = tokio::spawn(async move {
let (mut stream, _) = listener.accept().await.unwrap();
let (request_line, _) = read_http_request(&mut stream).await;
assert!(request_line.starts_with("GET /.well-known/openid-configuration "));
write_json_response(&mut stream, "503 Service Unavailable", "{}").await;
});
(issuer_url, server)
}
async fn read_http_request(stream: &mut TcpStream) -> (String, String) {
let mut buffer = Vec::new();
let mut header_end = None;
while header_end.is_none() {
let mut chunk = [0; 1024];
let read = stream.read(&mut chunk).await.unwrap();
assert_ne!(read, 0, "connection closed before request headers");
buffer.extend_from_slice(&chunk[..read]);
header_end = find_subsequence(&buffer, b"\r\n\r\n").map(|pos| pos + 4);
}
let header_end = header_end.unwrap();
let headers = String::from_utf8_lossy(&buffer[..header_end]).to_string();
let request_line = headers.lines().next().unwrap_or_default().to_string();
let content_length = headers
.lines()
.find_map(|line| {
let (name, value) = line.split_once(':')?;
name.eq_ignore_ascii_case("content-length")
.then(|| value.trim().parse::<usize>().ok())
.flatten()
})
.unwrap_or(0);
while buffer.len() < header_end + content_length {
let mut chunk = [0; 1024];
let read = stream.read(&mut chunk).await.unwrap();
assert_ne!(read, 0, "connection closed before request body");
buffer.extend_from_slice(&chunk[..read]);
}
let body =
String::from_utf8_lossy(&buffer[header_end..header_end + content_length]).to_string();
(request_line, body)
}
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
haystack
.windows(needle.len())
.position(|window| window == needle)
}
async fn write_json_response(stream: &mut TcpStream, status: &str, body: &str) {
let response = format!(
"HTTP/1.1 {status}\r\ncontent-type: application/json\r\ncontent-length: {}\r\nconnection: close\r\n\r\n{body}",
body.len()
);
stream.write_all(response.as_bytes()).await.unwrap();
}
}

View File

@@ -70,18 +70,29 @@ use tokio::sync::RwLock;
const REQUEST_TIMEOUT_HEADER: HeaderName = HeaderName::from_static("x-request-timeout-ms");
const MIN_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-version");
const MIN_TIMESTAMP_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-timestamp");
const MIN_READ_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-read-version");
const VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-version");
const METRIC_TYPE_KEY: &str = "metric_type";
const INDEX_TYPE_KEY: &str = "index_type";
const SCHEMA_CACHE_TTL: Duration = Duration::from_secs(30);
const SCHEMA_CACHE_REFRESH_WINDOW: Duration = Duration::from_secs(5);
/// Per-table state driving the freshness headers (`x-lancedb-min-version` and
/// `x-lancedb-min-timestamp`) sent on read requests.
/// Per-table state driving the freshness headers (`x-lancedb-min-version`,
/// `x-lancedb-min-timestamp`, and `x-lancedb-min-read-version`) sent on read
/// requests.
#[derive(Debug, Default, Clone, Copy)]
struct FreshnessState {
/// Provides read-your-write within a single handle: writes that return a
/// version update this, and reads send it as `x-lancedb-min-version`.
min_version: Option<u64>,
/// Highest dataset version observed in a *read* response on this handle.
/// Reads send it as `x-lancedb-min-read-version` so a load-balanced query
/// node whose cache is behind this version must refresh before serving,
/// giving monotonic reads across nodes regardless of which one the load
/// balancer routes to. Sourced only from reads (always committed dataset
/// versions), never from writes (which may return WAL entry ids), so it is
/// unaffected by the WAL/version mismatch that retired `min_version`.
min_read_version: Option<u64>,
/// Wall-clock time captured at the last [`BaseTable::checkout_latest`]
/// call. Subsequent reads send
/// `max(baseline, now - read_consistency_interval)` as
@@ -102,6 +113,7 @@ struct FreshnessState {
struct FreshnessHeaders {
min_version: Option<u64>,
min_timestamp: Option<SystemTime>,
min_read_version: Option<u64>,
}
impl FreshnessHeaders {
@@ -113,6 +125,9 @@ impl FreshnessHeaders {
let dt: chrono::DateTime<chrono::Utc> = ts.into();
request = request.header(MIN_TIMESTAMP_HEADER, dt.to_rfc3339());
}
if let Some(v) = self.min_read_version {
request = request.header(MIN_READ_VERSION_HEADER, v.to_string());
}
request
}
}
@@ -884,6 +899,7 @@ impl<S: HttpSend> RemoteTable<S> {
self.client.read_consistency_interval,
SystemTime::now(),
),
min_read_version: state.min_read_version,
}
}
@@ -905,6 +921,30 @@ impl<S: HttpSend> RemoteTable<S> {
state.min_version = Some(state.min_version.map_or(version, |v| v.max(version)));
}
/// Record a dataset version observed in a *read* response so subsequent
/// reads request at least this version via `x-lancedb-min-read-version`,
/// giving monotonic reads across load-balanced query nodes. A returned `0`
/// (or absent header from an old server) is ignored.
fn track_read_version(&self, version: u64) {
if version == 0 {
return;
}
let mut state = self.freshness.lock().unwrap();
state.min_read_version = Some(state.min_read_version.map_or(version, |v| v.max(version)));
}
/// Parse the `x-lancedb-version` response header (the dataset version a read
/// reflects) and fold it into the read-version watermark.
fn track_read_version_from_headers(&self, headers: &reqwest::header::HeaderMap) {
if let Some(version) = headers
.get(&VERSION_HEADER)
.and_then(|value| value.to_str().ok())
.and_then(|value| value.parse::<u64>().ok())
{
self.track_read_version(version);
}
}
async fn execute_query(
&self,
query: &AnyQuery,
@@ -928,6 +968,7 @@ impl<S: HttpSend> RemoteTable<S> {
let futures = requests.into_iter().map(|req| async move {
let (request_id, response) = self.send(req, true).await?;
self.track_read_version_from_headers(response.headers());
self.read_arrow_stream(&request_id, response).await
});
let streams = futures::future::try_join_all(futures);
@@ -1545,11 +1586,12 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
*write_guard = None;
drop(write_guard);
// Drop any per-handle write tracking; subsequent reads use the
// Drop any per-handle read/write tracking; subsequent reads use the
// baseline timestamp captured now to guarantee freshness.
*self.freshness.lock().unwrap() = FreshnessState {
min_version: None,
checkout_baseline: Some(SystemTime::now()),
min_read_version: None,
};
// Invalidate schema cache since we're switching versions
@@ -1805,6 +1847,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
}
};
self.track_read_version_from_headers(response.headers());
let body = response.text().await.err_to_http(request_id.clone())?;
serde_json::from_str(&body).map_err(|e| Error::Http {
@@ -7124,6 +7167,7 @@ mod tests {
let state = FreshnessState {
min_version: None,
checkout_baseline: Some(baseline),
min_read_version: None,
};
assert_eq!(compute_min_timestamp(&state, None, now), Some(baseline));
@@ -7148,6 +7192,7 @@ mod tests {
let state = FreshnessState {
min_version: None,
checkout_baseline: Some(baseline),
min_read_version: None,
};
assert_eq!(
compute_min_timestamp(&state, Some(Duration::from_secs(10)), now),
@@ -7159,6 +7204,7 @@ mod tests {
let state = FreshnessState {
min_version: None,
checkout_baseline: Some(recent_baseline),
min_read_version: None,
};
assert_eq!(
compute_min_timestamp(&state, Some(Duration::from_secs(60)), now),
@@ -7303,6 +7349,106 @@ mod tests {
);
}
/// A handler that records every request's headers and answers each read with
/// an `x-lancedb-version` response header taken from `versions` (by call
/// index, saturating at the last entry). An empty string means "no header".
fn read_version_handler(
versions: &'static [&'static str],
) -> (
impl Fn(reqwest::Request) -> http::Response<String> + Clone + Send + Sync + 'static,
Arc<std::sync::Mutex<Vec<http::HeaderMap>>>,
) {
let requests = Arc::new(std::sync::Mutex::new(Vec::new()));
let requests_c = requests.clone();
let call = Arc::new(AtomicUsize::new(0));
let handler = move |request: reqwest::Request| {
requests_c.lock().unwrap().push(request.headers().clone());
let i = call.fetch_add(1, Ordering::SeqCst).min(versions.len() - 1);
let mut builder = http::Response::builder().status(200);
if !versions[i].is_empty() {
builder = builder.header("x-lancedb-version", versions[i]);
}
builder.body("42".to_string()).unwrap()
};
(handler, requests)
}
#[tokio::test]
async fn test_read_version_watermark_tracked_and_sent() {
let (handler, requests) = read_version_handler(&["100", "100"]);
let table = Table::new_with_handler("my_table", handler);
// First read has no watermark yet; the response advertises version 100,
// so the second read must floor the server at 100.
table.count_rows(None).await.unwrap();
table.count_rows(None).await.unwrap();
let reqs = requests.lock().unwrap();
assert!(!reqs[0].contains_key("x-lancedb-min-read-version"));
assert_eq!(
reqs[1]
.get("x-lancedb-min-read-version")
.unwrap()
.to_str()
.unwrap(),
"100"
);
}
#[tokio::test]
async fn test_read_version_watermark_keeps_max() {
// Server reports 100 then a stale 50; the watermark must not regress.
let (handler, requests) = read_version_handler(&["100", "50", "50"]);
let table = Table::new_with_handler("my_table", handler);
table.count_rows(None).await.unwrap();
table.count_rows(None).await.unwrap();
table.count_rows(None).await.unwrap();
let reqs = requests.lock().unwrap();
assert_eq!(
reqs[2]
.get("x-lancedb-min-read-version")
.unwrap()
.to_str()
.unwrap(),
"100"
);
}
#[tokio::test]
async fn test_read_version_absent_header_no_watermark() {
// An old server that doesn't return the version header leaves the
// watermark unset, preserving backward compatibility.
let (handler, requests) = read_version_handler(&[""]);
let table = Table::new_with_handler("my_table", handler);
table.count_rows(None).await.unwrap();
table.count_rows(None).await.unwrap();
let reqs = requests.lock().unwrap();
assert!(!reqs[1].contains_key("x-lancedb-min-read-version"));
}
#[tokio::test]
async fn test_read_version_watermark_reset_on_checkout_latest() {
let (handler, requests) = read_version_handler(&["100", "100"]);
let table = Table::new_with_handler("my_table", handler);
table.count_rows(None).await.unwrap();
table.checkout_latest().await.unwrap();
table.count_rows(None).await.unwrap();
// The read after checkout_latest starts from a clean slate.
let reqs = requests.lock().unwrap();
assert!(
!reqs
.last()
.unwrap()
.contains_key("x-lancedb-min-read-version")
);
}
/// Like `capturing_handler`, but keeps a per-path snapshot of the headers
/// from every request so tests can assert on a specific endpoint.
#[allow(clippy::type_complexity)]

View File

@@ -579,24 +579,45 @@ fn array_to_f32_vec(arr: &Arc<dyn arrow_array::Array>) -> Result<Vec<f32>> {
})
}
/// Magic bytes that prefix (and suffix) the Arrow IPC *file* format.
const ARROW_IPC_FILE_MAGIC: &[u8] = b"ARROW1";
/// Parse Arrow IPC response from the namespace server.
///
/// The server may return either the Arrow IPC *file* format or the *stream*
/// format. REST/phalanx returns the file format (it begins with the `ARROW1`
/// magic); reading that with a `StreamReader` fails with "failed to fill whole
/// buffer". Detect the magic and pick the matching reader so both are handled.
async fn parse_arrow_ipc_response(bytes: bytes::Bytes) -> Result<DatasetRecordBatchStream> {
use arrow_ipc::reader::StreamReader;
use arrow_ipc::reader::{FileReader, StreamReader};
use std::io::Cursor;
let cursor = Cursor::new(bytes);
let reader = StreamReader::try_new(cursor, None).map_err(|e| Error::Runtime {
message: format!("Failed to parse Arrow IPC response: {}", e),
})?;
// Collect all record batches
let schema = reader.schema();
let batches: Vec<_> = reader
.into_iter()
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(|e| Error::Runtime {
message: format!("Failed to read Arrow IPC batches: {}", e),
let (schema, batches) = if bytes.starts_with(ARROW_IPC_FILE_MAGIC) {
let reader = FileReader::try_new(Cursor::new(bytes), None).map_err(|e| Error::Runtime {
message: format!("Failed to parse Arrow IPC file response: {}", e),
})?;
let schema = reader.schema();
let batches = reader
.into_iter()
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(|e| Error::Runtime {
message: format!("Failed to read Arrow IPC file batches: {}", e),
})?;
(schema, batches)
} else {
let reader =
StreamReader::try_new(Cursor::new(bytes), None).map_err(|e| Error::Runtime {
message: format!("Failed to parse Arrow IPC response: {}", e),
})?;
let schema = reader.schema();
let batches = reader
.into_iter()
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(|e| Error::Runtime {
message: format!("Failed to read Arrow IPC batches: {}", e),
})?;
(schema, batches)
};
// Create a stream from the batches
let stream = futures::stream::iter(batches.into_iter().map(Ok));
@@ -624,6 +645,59 @@ mod tests {
FixedSizeListArray::try_new_from_values(Float32Array::from(values), dimension).unwrap()
}
#[tokio::test]
async fn test_parse_arrow_ipc_response_handles_file_and_stream() {
use arrow_array::{Int32Array, RecordBatch};
use arrow_ipc::writer::{FileWriter, StreamWriter};
use arrow_schema::{DataType, Field, Schema};
let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
let batch = RecordBatch::try_new(
schema.clone(),
vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
)
.unwrap();
// Arrow IPC *file* format -- what REST/phalanx returns. Previously this
// failed with "failed to fill whole buffer" because we used a StreamReader.
let mut file_buf = Vec::new();
{
let mut writer = FileWriter::try_new(&mut file_buf, &schema).unwrap();
writer.write(&batch).unwrap();
writer.finish().unwrap();
}
assert!(file_buf.starts_with(ARROW_IPC_FILE_MAGIC));
let rows: usize = parse_arrow_ipc_response(bytes::Bytes::from(file_buf))
.await
.unwrap()
.try_collect::<Vec<_>>()
.await
.unwrap()
.iter()
.map(|b| b.num_rows())
.sum();
assert_eq!(rows, 3);
// Arrow IPC *stream* format must still parse.
let mut stream_buf = Vec::new();
{
let mut writer = StreamWriter::try_new(&mut stream_buf, &schema).unwrap();
writer.write(&batch).unwrap();
writer.finish().unwrap();
}
assert!(!stream_buf.starts_with(ARROW_IPC_FILE_MAGIC));
let rows: usize = parse_arrow_ipc_response(bytes::Bytes::from(stream_buf))
.await
.unwrap()
.try_collect::<Vec<_>>()
.await
.unwrap()
.iter()
.map(|b| b.num_rows())
.sum();
assert_eq!(rows, 3);
}
#[test]
fn test_convert_to_namespace_query_vector() {
let query_vector = Arc::new(Float32Array::from(vec![1.0, 2.0, 3.0, 4.0]));