Compare commits

18 Commits

Author SHA1 Message Date
Neo-X7
17c499177f docs(python): add missing parameter documentation for when_matched_update_all (#3536)
Fixes #2493

Added target. prefix requirement to where parameter docstring.
2026-07-01 10:28:58 -07:00
Will Jones
d889321b5e fix!: combine repeated where filters with AND instead of replacing (#3585)
BREAKING CHANGE: When passing multiple where clauses to a query, they
now stack instead of replacing the previous filter.

Previously, calling `where`/`only_if` more than once on a query silently
replaced the previous filter, so only the last filter was applied. This was
surprising and could return rows that an earlier filter should have
excluded.

This implements the alternative suggested in
https://github.com/lancedb/lancedb/pull/3514#issuecomment-4664901580:
instead of rejecting a second filter, repeated filters are combined with a
logical AND (`(previous) AND (new)`).

The combination happens in the Rust core (`QueryBase::only_if` and
`only_if_expr`), so it applies to all SDKs at once (Rust, Python async, and
TypeScript). The Python sync query builder keeps its own filter state, so it
combines filters in the binding layer as well.

SQL string and expression filters are combined within their own
representation. When the two representations are mixed, the expression is
lowered to SQL (via `expr_to_sql_string`) and the filters are combined as
SQL strings, so chaining `where` works regardless of which form each filter
takes.
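
As a rough illustration of the stacking rule described above, the sketch below combines SQL-string filters in the `(previous) AND (new)` form. The helper name `combine_filters` is hypothetical and is not part of the LanceDB API; the real combination happens in the Rust core, and this only mirrors the string shape the message describes.

```python
from typing import Optional


def combine_filters(previous: Optional[str], new: str) -> str:
    """Hypothetical helper: stack a new SQL filter on top of an earlier one."""
    if previous is None:
        # First filter on the query: nothing to combine with yet.
        return new
    # Parenthesize both sides so operator precedence inside either
    # filter cannot change the meaning of the combined predicate.
    return f"({previous}) AND ({new})"


# Simulate calling `where` twice on the same query.
state: Optional[str] = None
for clause in ["price > 10", "category = 'a' OR category = 'b'"]:
    state = combine_filters(state, clause)
```

The parentheses matter for the second clause here: without them, the `OR` would bind more loosely than the inserted `AND` and let through rows the first filter should exclude.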

Fixes #2649

## Tests
- Rust: `cargo test --features remote -p lancedb --lib query`
- Python: `uv run --extra tests pytest python/tests/test_query.py`
- TypeScript: `pnpm test __test__/query.test.ts`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 10:11:58 -07:00
Will Jones
8a37f2ad77 feat(rust): re-export arrow and datafusion crates from lancedb (#3576)
lancedb's public API forces downstream crates to construct foreign types
— `RecordBatch`/arrays/builders for `Table::add(...)` (arrow), and
`datafusion_expr::Expr` for `only_if_expr`/`expr_projection`/merge
filters. The required version must exactly match lancedb's internal
arrow/datafusion line, but nothing on the API surface makes that
visible. Drift surfaces only as confusing trait/type errors:

```text
error[E0277]: the trait bound `RecordBatch: Scannable` is not satisfied
  = note: there are multiple different versions of crate `arrow_array` in the dependency graph
```

This re-exports the crates lancedb already pins, so consumers can rely
on a single, guaranteed-matching line via a discoverable import path
instead of declaring their own (potentially mismatched) direct
dependency.

- `lancedb::arrow::{arrow, arrow_array, arrow_buffer, arrow_cast,
arrow_data, arrow_ipc, arrow_ord, arrow_schema, arrow_select}` —
previously only `arrow_schema` was re-exported. `arrow-buffer` is
promoted from a transitive to a direct dependency.
- `lancedb::datafusion` — `Expr` is a first-class part of the query and
merge APIs (`only_if_expr`, `expr_projection`,
`QueryFilter::Datafusion`, `when_matched_update_all_expr`), and
`ExecutionPlan` is returned from `create_plan`.

This follows DataFusion's own precedent of re-exporting `arrow`. The
coupling already exists via the trait/impl bounds — this surfaces it
rather than hiding it behind an `E0277`.

Closes #3575

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 10:10:55 -07:00
Raphael Malikian
f94673ae5e ci: update deprecated GitHub Actions to latest versions (Fixes #3577) (#3608)
Fixes #3577

## Problem
GitHub Actions is deprecating Node.js 20 on its runners. Multiple
workflows in lancedb use action versions that target Node.js 20
(`actions/checkout@v4`, `actions/setup-node@v4`, `actions/cache@v4`,
`actions/upload-artifact@v4`, `actions/download-artifact@v4`,
`pnpm/action-setup@v4`). These are being force-run on Node.js 24,
generating deprecation warnings.

## Solution
Updated all deprecated actions to their latest major versions that
support Node.js 24:

| Action | Old Version | New Version |
|--------|------------|-------------|
| `actions/checkout` | @v4 | @v6 |
| `actions/setup-node` | @v4 | @v6 |
| `actions/cache` | @v4 | @v5 |
| `actions/upload-artifact` | @v4 | @v7 |
| `actions/download-artifact` | @v4 | @v8 |
| `pnpm/action-setup` | @v4 | @v6 |

Note: `actions/checkout@v6` and `actions/upload-artifact@v7` are already
used in `pypi-publish.yml` — this PR extends the same versions to all
remaining workflows.

### Files Changed
- `.github/workflows/npm-publish.yml` — Updated checkout, setup-node,
cache, upload-artifact, download-artifact, pnpm
- `.github/workflows/nodejs.yml` — Updated checkout, setup-node, pnpm
- `.github/workflows/python.yml` — Updated checkout
- `.github/workflows/rust.yml` — Updated checkout
- `.github/workflows/java.yml` — Updated checkout
- `.github/workflows/java-publish.yml` — Updated checkout
- `.github/workflows/cargo-publish.yml` — Updated checkout
- `.github/workflows/docs.yml` — Updated checkout, setup-node
- `.github/workflows/dev.yml` — Updated setup-node
- `.github/workflows/codex-fix-ci.yml` — Updated checkout, setup-node,
pnpm
- `.github/workflows/codex-update-lance-dependency.yml` — Updated
checkout, setup-node
- `.github/workflows/license-header-check.yml` — Updated checkout
- `.github/workflows/make-release-commit.yml` — Updated checkout
- `.github/workflows/update_package_lock_run.yml` — Updated checkout
- `.github/workflows/update_package_lock_run_nodejs.yml` — Updated
checkout

## Verification
- All 20 YAML files validated with `yaml.safe_load()` — no syntax errors
- GitHub Actions CI will validate the actual action versions at runtime

## Changelog

| Date | Change | Author |
|------|--------|--------|
| 2026-07-01 | Updated all deprecated Node 20 actions to latest versions across 15 workflow files | rtmalikian |

---

**Disclosure:** This code was developed with assistance from
DeepSeek-v4-pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed and verified for correctness.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>
2026-07-01 09:38:26 -07:00
Jack Ye
3b70fc4c9d fix(python): route async namespace connections through rust (#3603)
Summary:
- Route built-in async namespace-backed connections through the Rust
namespace connector.
- Delegate async namespace/table management methods to the inner
AsyncConnection while keeping the custom implementation Python-client
fallback.
- Add regressions for the native async dir path and lazy
namespace_client() construction.

Validated locally with targeted namespace/db/table pytest, full
test_namespace.py, ruff, cargo fmt/check/clippy, and cargo test -p
lancedb-python.
2026-06-30 17:03:23 -07:00
Lance Release
3a7b02119b Bump version: 0.31.0-beta.4 → 0.31.0-beta.5 2026-06-30 22:24:56 +00:00
Lance Release
bcbc0da090 Bump version: 0.34.0-beta.4 → 0.34.0-beta.5 2026-06-30 22:23:43 +00:00
Jack Ye
9bead9f53d fix(python): route sync namespace connections through rust (#3598)
Summary:
- Route built-in sync namespace connections through the Rust namespace
connector.
- Keep custom namespace clients on the existing Python fallback.
- Preserve namespace-backed to_lance compatibility with lazy Python
client construction and add regressions.
2026-06-30 14:46:23 -07:00
Jack Ye
0351b77984 feat(remote): monotonic reads via x-lancedb-min-read-version watermark (#3597)
## Summary

Adds per-session monotonic reads for remote (LanceDB Cloud/Enterprise)
tables, preventing successive reads on a handle from moving *backward*
in dataset version when a load balancer routes them to query nodes with
differently-cached views.

Each `RemoteTable` handle tracks the highest dataset version it has
observed in a read response — surfaced by the server via a new
`x-lancedb-version` response header — and sends it back as
`x-lancedb-min-read-version` on subsequent reads (`count_rows`,
`query`). A query node whose cache is behind that version refreshes
before serving; a node already at/beyond it serves from cache at no
extra cost.

The watermark is sourced only from reads (always committed dataset
versions), so unlike the retired `x-lancedb-min-version` it is
unaffected by WAL writes returning WAL entry ids. It is reset on
`checkout_latest()`. Both headers are optional and ignored by older
peers.
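
The watermark behaviour described above can be sketched as a small client-side tracker. This is a minimal sketch, not the actual `RemoteTable` code: the class name `ReadWatermark` and its method names are assumptions for illustration, and only the two header names come from this commit message.

```python
from typing import Dict, Optional


class ReadWatermark:
    """Hypothetical per-handle tracker of the highest version seen in reads."""

    def __init__(self) -> None:
        self._min_read_version: Optional[int] = None

    def request_headers(self) -> Dict[str, str]:
        # Nothing observed yet: send no watermark at all.
        if self._min_read_version is None:
            return {}
        return {"x-lancedb-min-read-version": str(self._min_read_version)}

    def observe_response(self, headers: Dict[str, str]) -> None:
        raw = headers.get("x-lancedb-version")
        if raw is None:
            # The header is optional, so an older server may omit it.
            return
        version = int(raw)
        # Only ever move forward; a stale node must not lower the watermark.
        if self._min_read_version is None or version > self._min_read_version:
            self._min_read_version = version

    def reset(self) -> None:
        # Mirrors the reset on `checkout_latest()` described above.
        self._min_read_version = None
```

Because the tracker only takes the maximum, a response from a node with an older cached view leaves the watermark unchanged, which is what keeps successive reads from moving backward.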

Server-side enforcement lives in LanceDB Enterprise. Targets the
`codex/update-lance-9-0-0-beta-8` integration branch to match the
Enterprise submodule pin.
2026-06-30 11:22:00 -07:00
Weston Pace
f6c9d31f98 feat: add polars dataframe integration (#3584)
This PR is part cleanup, part feature, part example.

It removes `IntoArrow` and `IntoArrowStream`. There was only one
redundant call site between the two. Once we moved everything to
`Scannable` these traits no longer serve any purpose.

It adds a `Scannable` impl for a polars DataFrame. We used to have this
at one point for `IntoArrow` so this is more like a regression fix than
anything.

It adds an example (and unit test) which ensures we can ingest from a
Polars DataFrame and export to one. LazyFrame support would be a
follow-up (though a pretty straightforward one) but we've never had
proper LazyFrame support before.
2026-06-30 08:28:41 -07:00
Dan Tasse
a8f1c5a69f feat: add skill to work with branches better (#3596)
Agents seemed to have trouble finding the right calls to work with
branches (create, list, delete) and passing the right params to get it
to work. We probably don't need a big skill to get it on the right track
but a little nudge seems helpful. Doing a couple simple tasks, it saved
about half the time and tokens, so feels worthwhile. Created with the
Claude skills creator, hence the "skill.md in a bare folder"
organization - happy to move it if that's not the standard anymore.

```
Benchmark results (3 evals, with-skill vs baseline):

┌────────────────┬────────────┬────────────────────┐
│     Metric     │ With skill │   Without skill    │
├────────────────┼────────────┼────────────────────┤
│ Pass rate      │ 3/3 (100%) │ 3/3 (100%)         │
├────────────────┼────────────┼────────────────────┤
│ Avg time       │ 51s        │ 142s (2.8× slower) │
├────────────────┼────────────┼────────────────────┤
│ Avg tokens     │ 19,305     │ 36,513 (47% more)  │
├────────────────┼────────────┼────────────────────┤
│ Avg tool calls │ 5.7        │ 26 (4.5× more)     │
└────────────────┴────────────┴────────────────────┘
```
2026-06-30 09:27:21 -04:00
Jack Ye
10fecdf051 feat(node): expose OAuth connection config (#3587)
Expose the merged Rust OAuth header provider through the Node/TypeScript
connection path.

Includes:
- Native OAuthConfig conversion for napi-rs
- ConnectionOptions.oauthConfig plumbing
- Public TypeScript OAuthConfig and OAuthFlowType exports
- Generated TypeScript API docs for the new config surface
- input-validation and debug-redaction coverage in the Rust binding
layer

Local validation: cargo fmt --all; git diff --check.
2026-06-29 16:55:45 -07:00
Raphael Malikian
c9ae93a7fa fix: add missing stacklevel=2 to warnings.warn() calls (Fixes #3589) (#3590)
Fixes #3589

## Problem
Multiple `warnings.warn()` calls across the Python client are missing
the `stacklevel=2` parameter. This causes warning messages to point to
lancedb internal code instead of the user's code that triggered the
warning, making debugging difficult.
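
To make the effect concrete, here is a small standalone sketch using only the standard `warnings` module. The function names `deprecated_option` and `user_code` are made up for illustration and do not appear in lancedb.

```python
import warnings


def deprecated_option() -> None:
    # stacklevel=2 attributes the warning to the caller of this function
    # instead of to this line inside the library.
    warnings.warn("option is deprecated", DeprecationWarning, stacklevel=2)


def user_code() -> None:
    deprecated_option()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    user_code()

# Line number the warning was attributed to.
reported_line = caught[0].lineno
```

With the default `stacklevel=1`, `reported_line` would instead be the line of the `warnings.warn(...)` call itself, which is the behaviour this change removes.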

## Solution
Add `stacklevel=2` to 10 `warnings.warn()` calls across 4 files:

| File | Warnings Fixed |
|------|---------------|
| `remote/db.py` | `request_thread_pool`, `connection_timeout`, `read_timeout` deprecation warnings |
| `remote/table.py` | `cleanup_old_versions`, `compact_files`, `optimize` no-op warnings |
| `table.py` | `data_storage_version`, `enable_v2_manifest_paths`, `retrain` deprecation warnings |
| `embeddings/colpali.py` | `use_token_pooling` deprecation warning |

## Verification
- All 4 modified files pass `ast.parse()` syntax check
- Only `stacklevel=2` added — no other changes

## Changelog

| Date | Change | Author |
|------|--------|--------|
| 2026-06-27 | Add missing stacklevel=2 to warnings.warn() calls | rtmalikian |

### Files Changed
- `python/python/lancedb/remote/db.py` — Add stacklevel=2 to 3
deprecation warnings
- `python/python/lancedb/remote/table.py` — Add stacklevel=2 to 3 no-op
warnings
- `python/python/lancedb/table.py` — Add stacklevel=2 to 3 deprecation
warnings
- `python/python/lancedb/embeddings/colpali.py` — Add stacklevel=2 to 1
deprecation warning

### Verification
- Syntax check passed on all modified files

---

**About the Author:** Raphael Malikian — Clinical AI Solutions
Architect. I specialise in building and fixing AI/ML systems for
healthcare, including vector databases, RAG pipelines, and clinical NLP.
If you need help with your project or think I can add value to your
organisation, feel free to reach out — I'd love to connect.

📧 rtmalikian@gmail.com
🔗 GitHub: https://github.com/rtmalikian
🔗 LinkedIn:
http://www.linkedin.com/in/raphael-t-malikian-mbbs-bsc-hons-71075436a

---

**Disclosure:** This code was developed with assistance from
DeepSeek-V4-Pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed, tested against the actual codebase, and verified for
correctness.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>
2026-06-29 16:36:44 -07:00
Raphael Malikian
05756f0bbf fix(python): raise clear error when permutation API is used on remote tables (Fixes #2934) (#3591)
Fixes #2934

## Problem
Passing a `RemoteTable` to `permutation_builder()` raises a cryptic
`AttributeError`:
```
AttributeError: 'RemoteTable' object has no attribute '_inner'
```
This leaves users confused about what went wrong and why.

## Root Cause
`PermutationBuilder.__init__()` calls `async_permutation_builder(table)`
which accesses `table._inner` — the underlying Rust Lance table object.
`RemoteTable` connects to LanceDB Cloud/Enterprise and does not have a
local `_inner` attribute, making permutations fundamentally unsupported
on remote tables.

## Solution
Added an early check in `PermutationBuilder.__init__()` that verifies
the table has `_inner` before calling the Rust function, raising a clear
`TypeError` with an explanation of why permutations don't work on remote
tables.
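
A minimal sketch of such a guard follows, using stand-in classes so it runs without lancedb installed. The names `LocalTableStub`, `RemoteTableStub`, and `check_supports_permutations`, as well as the error text, are hypothetical; the actual check lives in `PermutationBuilder.__init__()` and its message may differ.

```python
class LocalTableStub:
    """Stand-in for a local table that wraps a native handle."""

    def __init__(self) -> None:
        self._inner = object()


class RemoteTableStub:
    """Stand-in for a remote table: no local `_inner` handle."""


def check_supports_permutations(table: object) -> None:
    # Fail early with an explanation instead of letting a later
    # attribute access raise a cryptic AttributeError.
    if not hasattr(table, "_inner"):
        raise TypeError(
            f"{type(table).__name__} does not support permutations: "
            "they require a local table with a native handle."
        )


check_supports_permutations(LocalTableStub())  # passes silently

try:
    check_supports_permutations(RemoteTableStub())
    remote_error = None
except TypeError as exc:
    remote_error = str(exc)
```

Checking for the attribute rather than the concrete class keeps the guard independent of import order, at the cost of relying on a private attribute name.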

## Verification
- Syntax validated with `ast.parse()`
- Structural verification: single call site (`permutation_builder()`),
guard placed before Rust FFI call
- Error message tested with mock: `MockRemoteTable()` correctly triggers
`TypeError`

## Changelog

| Date | Change | Author |
|------|--------|--------|
| 2026-06-28 | Added remote table guard in PermutationBuilder.__init__ | rtmalikian |

### Files Changed
- python/python/lancedb/permutation.py — Added `hasattr(table, "_inner")` check with clear error

---

**Disclosure:** This code was developed with assistance from
deepseek-v4-pro (DeepSeek) via Hermes Agent (Nous Research). All changes
were reviewed, tested against the actual codebase, and verified for
correctness.

Signed-off-by: rtmalikian <rtmalikian@gmail.com>
2026-06-29 16:36:01 -07:00
LanceDB Robot
2a0945443e chore: update lance dependency to v9.0.0-beta.10 (#3594)
Updates Lance Rust workspace dependencies and Java lance-core to
v9.0.0-beta.10.

No compatibility code changes were required; clippy and rustfmt passed
after installing the missing runner components.

Lance tag:
https://github.com/lance-format/lance/releases/tag/v9.0.0-beta.10
2026-06-29 15:28:47 -05:00
Jack Ye
39e819b6a7 feat(python): expose OAuth connection config (#3586)
Expose the merged Rust OAuth header provider through the Python async
connection path.

Includes:
- Python OAuthConfig and OAuthFlowType public config objects
- PyO3 conversion into the Rust OAuthConfig
- connect_async(oauth_config=...) plumbing
- repr redaction coverage for client_secret

Local validation: cargo fmt --all; ruff format/check on touched Python
files.
2026-06-29 12:36:35 -07:00
dependabot[bot]
70126943ff chore(deps): bump the rust-minor-patch group across 1 directory with 6 updates (#3588)
Bumps the rust-minor-patch group with 6 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [env_logger](https://github.com/rust-cli/env_logger) | `0.11.10` | `0.11.11` |
| [log](https://github.com/rust-lang/log) | `0.4.32` | `0.4.33` |
| [uuid](https://github.com/uuid-rs/uuid) | `1.23.3` | `1.23.4` |
| [anyhow](https://github.com/dtolnay/anyhow) | `1.0.102` | `1.0.103` |
| [napi](https://github.com/napi-rs/napi-rs) | `3.9.3` | `3.9.4` |
| [napi-derive](https://github.com/napi-rs/napi-rs) | `3.5.6` | `3.5.7` |


Updates `env_logger` from 0.11.10 to 0.11.11
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/rust-cli/env_logger/releases">env_logger's
releases</a>.</em></p>
<blockquote>
<h2>v0.11.11</h2>
<h2>[0.11.11] - 2026-06-25</h2>
<h3>Internal</h3>
<ul>
<li>Updated <code>env_filter</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-cli/env_logger/blob/main/CHANGELOG.md">env_logger's
changelog</a>.</em></p>
<blockquote>
<h2>[0.11.11] - 2026-06-25</h2>
<h3>Internal</h3>
<ul>
<li>Updated <code>env_filter</code></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="b4d3f2b8dd"><code>b4d3f2b</code></a>
chore: Release</li>
<li><a
href="cc2b2efcd7"><code>cc2b2ef</code></a>
chore: Release</li>
<li><a
href="69e27d1e82"><code>69e27d1</code></a>
docs: Update changelog</li>
<li><a
href="166880db07"><code>166880d</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/411">#411</a>
from epage/parse</li>
<li><a
href="0a580d06e7"><code>0a580d0</code></a>
fix(filter): Remove 'parse' on no_std</li>
<li><a
href="78d8ef116e"><code>78d8ef1</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/404">#404</a>
from cagatay-y/feature/filter-no_std</li>
<li><a
href="132fe86c8c"><code>132fe86</code></a>
feat(filter): Add support for no_std environments</li>
<li><a
href="4feafa4c3c"><code>4feafa4</code></a>
refactor(env_filter): Fix unreachable pub warning</li>
<li><a
href="92f8d8d083"><code>92f8d8d</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-cli/env_logger/issues/410">#410</a>
from rust-cli/renovate/crate-ci-typos-1.x</li>
<li><a
href="4e57784e0a"><code>4e57784</code></a>
chore(deps): Update pre-commit hook crate-ci/typos to v1.47.0</li>
<li>Additional commits viewable in <a
href="https://github.com/rust-cli/env_logger/compare/v0.11.10...v0.11.11">compare
view</a></li>
</ul>
</details>
<br />

Updates `log` from 0.4.32 to 0.4.33
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/rust-lang/log/blob/master/CHANGELOG.md">log's
changelog</a>.</em></p>
<blockquote>
<h2>[0.4.33] - 2026-06-20</h2>
<h2>What's Changed</h2>
<ul>
<li>Fixed key comparison by <a
href="https://github.com/matteo-zeggiotti-ok"><code>@​matteo-zeggiotti-ok</code></a>
in <a
href="https://redirect.github.com/rust-lang/log/pull/732">rust-lang/log#732</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/matteo-zeggiotti-ok"><code>@​matteo-zeggiotti-ok</code></a>
made their first contribution in <a
href="https://redirect.github.com/rust-lang/log/pull/732">rust-lang/log#732</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/rust-lang/log/compare/0.4.32...0.4.33">https://github.com/rust-lang/log/compare/0.4.32...0.4.33</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f405739f3a"><code>f405739</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-lang/log/issues/734">#734</a>
from rust-lang/cargo/0.4.33</li>
<li><a
href="6a24abf083"><code>6a24abf</code></a>
prepare for 0.4.33 release</li>
<li><a
href="87e062162e"><code>87e0621</code></a>
Merge pull request <a
href="https://redirect.github.com/rust-lang/log/issues/732">#732</a>
from matteo-zeggiotti-ok/fix-key-comparison</li>
<li><a
href="a9b57119a6"><code>a9b5711</code></a>
Review: fallback to the &amp;str hash</li>
<li><a
href="cc89cc6e41"><code>cc89cc6</code></a>
Review: fixed other comparisons</li>
<li><a
href="920e7dc281"><code>920e7dc</code></a>
Review: fixed comparison on <code>MaybeStaticStr</code></li>
<li><a
href="0d71d3c685"><code>0d71d3c</code></a>
Fixed key comparison</li>
<li>See full diff in <a
href="https://github.com/rust-lang/log/compare/0.4.32...0.4.33">compare
view</a></li>
</ul>
</details>
<br />

Updates `uuid` from 1.23.3 to 1.23.4
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/uuid-rs/uuid/releases">uuid's
releases</a>.</em></p>
<blockquote>
<h2>v1.23.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Fix up name of fuzz script in readme by <a
href="https://github.com/KodrAus"><code>@​KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/888">uuid-rs/uuid#888</a></li>
<li>document fixes by <a
href="https://github.com/frostyplanet"><code>@​frostyplanet</code></a>
in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/889">uuid-rs/uuid#889</a></li>
<li>Prepare for 1.23.4 release by <a
href="https://github.com/KodrAus"><code>@​KodrAus</code></a> in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/890">uuid-rs/uuid#890</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/frostyplanet"><code>@​frostyplanet</code></a>
made their first contribution in <a
href="https://redirect.github.com/uuid-rs/uuid/pull/889">uuid-rs/uuid#889</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4">https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="3296d64a19"><code>3296d64</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/890">#890</a> from
uuid-rs/cargo/v1.23.4</li>
<li><a
href="cba53d0da2"><code>cba53d0</code></a>
prepare for 1.23.4 release</li>
<li><a
href="e347af48aa"><code>e347af4</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/889">#889</a> from
frostyplanet/main</li>
<li><a
href="e9bf55c222"><code>e9bf55c</code></a>
doc: Fix broken link warnings</li>
<li><a
href="5351af40a0"><code>5351af4</code></a>
doc: Enable feature flag label for docs.rs</li>
<li><a
href="1e6a9669e3"><code>1e6a966</code></a>
Merge pull request <a
href="https://redirect.github.com/uuid-rs/uuid/issues/888">#888</a> from
uuid-rs/KodrAus-patch-1</li>
<li><a
href="c9619f639c"><code>c9619f6</code></a>
fix up name of fuzz script in readme</li>
<li>See full diff in <a
href="https://github.com/uuid-rs/uuid/compare/v1.23.3...v1.23.4">compare
view</a></li>
</ul>
</details>
<br />

Updates `anyhow` from 1.0.102 to 1.0.103
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/dtolnay/anyhow/releases">anyhow's
releases</a>.</em></p>
<blockquote>
<h2>1.0.103</h2>
<ul>
<li>Fix Stacked Borrows violation (UB) in
<code>Error::downcast_mut</code> (<a
href="https://redirect.github.com/dtolnay/anyhow/issues/451">#451</a>,
<a
href="https://redirect.github.com/dtolnay/anyhow/issues/452">#452</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5bdb0e24db"><code>5bdb0e2</code></a>
Release 1.0.103</li>
<li><a
href="e621bd35dd"><code>e621bd3</code></a>
Merge pull request <a
href="https://redirect.github.com/dtolnay/anyhow/issues/452">#452</a>
from dtolnay/downcast</li>
<li><a
href="6e8c000690"><code>6e8c000</code></a>
Eliminate pointer-&gt;reference-&gt;pointer during downcast</li>
<li><a
href="67c4abd771"><code>67c4abd</code></a>
Add regression test for issue 451</li>
<li><a
href="917a169320"><code>917a169</code></a>
Update actions/upload-artifact@v6 -&gt; v7</li>
<li><a
href="d9dc3faf78"><code>d9dc3fa</code></a>
Update actions/checkout@v6 -&gt; v7</li>
<li><a
href="841522b2aa"><code>841522b</code></a>
Raise minimum tested compiler to rust 1.85</li>
<li>See full diff in <a
href="https://github.com/dtolnay/anyhow/compare/1.0.102...1.0.103">compare
view</a></li>
</ul>
</details>
<br />

Updates `napi` from 3.9.3 to 3.9.4
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/napi-rs/napi-rs/releases">napi's
releases</a>.</em></p>
<blockquote>
<h2>napi-v3.9.4</h2>
<h3>Other</h3>
<ul>
<li><em>(napi-derive)</em> outline #[napi(object)] field-error
decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/pull/3338">#3338</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9cc199fa34"><code>9cc199f</code></a>
chore: release (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3345">#3345</a>)</li>
<li><a
href="b77119e711"><code>b77119e</code></a>
chore(release): publish</li>
<li><a
href="71ce9f6015"><code>71ce9f6</code></a>
chore(deps): update actions/cache action to v6 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3349">#3349</a>)</li>
<li><a
href="8c87f474c8"><code>8c87f47</code></a>
chore(deps): update <code>@​tybys/wasm-util</code> to 0.10.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3348">#3348</a>)</li>
<li><a
href="04e2a7655d"><code>04e2a76</code></a>
chore(deps): update cross-platform-actions/action action to v1.3.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3346">#3346</a>)</li>
<li><a
href="54ecbe4915"><code>54ecbe4</code></a>
chore(deps): update actions/checkout action to v7 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3340">#3340</a>)</li>
<li><a
href="3dd0c309da"><code>3dd0c30</code></a>
perf(napi-derive): outline #[napi(object)] field-error decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3338">#3338</a>)</li>
<li><a
href="81ac3d98c3"><code>81ac3d9</code></a>
build(deps): bump undici from 6.26.0 to 6.27.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3342">#3342</a>)</li>
<li>See full diff in <a
href="https://github.com/napi-rs/napi-rs/compare/napi-v3.9.3...napi-v3.9.4">compare
view</a></li>
</ul>
</details>
<br />

Updates `napi-derive` from 3.5.6 to 3.5.7
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/napi-rs/napi-rs/releases">napi-derive's
releases</a>.</em></p>
<blockquote>
<h2>napi-derive-v3.5.7</h2>
<h3>Other</h3>
<ul>
<li>updated the following local packages: napi-derive-backend</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9cc199fa34"><code>9cc199f</code></a>
chore: release (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3345">#3345</a>)</li>
<li><a
href="b77119e711"><code>b77119e</code></a>
chore(release): publish</li>
<li><a
href="71ce9f6015"><code>71ce9f6</code></a>
chore(deps): update actions/cache action to v6 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3349">#3349</a>)</li>
<li><a
href="8c87f474c8"><code>8c87f47</code></a>
chore(deps): update <code>@​tybys/wasm-util</code> to 0.10.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3348">#3348</a>)</li>
<li><a
href="04e2a7655d"><code>04e2a76</code></a>
chore(deps): update cross-platform-actions/action action to v1.3.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3346">#3346</a>)</li>
<li><a
href="54ecbe4915"><code>54ecbe4</code></a>
chore(deps): update actions/checkout action to v7 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3340">#3340</a>)</li>
<li><a
href="3dd0c309da"><code>3dd0c30</code></a>
perf(napi-derive): outline #[napi(object)] field-error decoration (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3338">#3338</a>)</li>
<li><a
href="81ac3d98c3"><code>81ac3d9</code></a>
build(deps): bump undici from 6.26.0 to 6.27.0 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3342">#3342</a>)</li>
<li><a
href="ee58383da4"><code>ee58383</code></a>
chore(napi): release v3.9.3 (<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3335">#3335</a>)</li>
<li><a
href="c78727667b"><code>c787276</code></a>
fix(napi): sync referred flag when creating a weak ThreadsafeFunction
(<a
href="https://redirect.github.com/napi-rs/napi-rs/issues/3337">#3337</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/napi-rs/napi-rs/compare/napi-derive-v3.5.6...napi-derive-v3.5.7">compare
view</a></li>
</ul>
</details>
<br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-29 07:44:46 -07:00
Lance Release
e01777070d Bump version: 0.31.0-beta.3 → 0.31.0-beta.4 2026-06-29 11:12:18 +00:00
73 changed files with 1960 additions and 411 deletions

View File

@@ -0,0 +1,137 @@
---
name: lancedb-branch-ops
description: Branch management for LanceDB tables via the REST API. Use this skill whenever someone wants to create, delete, list, or switch branches on a LanceDB table — or needs to make sure a write (metadata update, index build, etc.) lands on a specific branch instead of main. Invoke it even without the word "branch" if context makes clear they want an experimental copy of a table, want to isolate changes, or want to confirm a mutation didn't touch main. Covers: branches/list, branches/create, branches/delete, and passing "branch" in describe/update_field_metadata/create_index to target a non-main version.
---
## Goal
Manage branches on a LanceDB table: list what exists, create new ones, delete stale ones, and direct read/write operations at a specific branch without touching main.
## Step 0: Establish the connection
Use the `lancedb-connect` skill to resolve the base URL and auth headers (`x-api-key`, `x-lancedb-database`). Skip this only if the connection is already known from the current conversation.
All examples below use `{base_url}` — substitute the resolved endpoint and include the auth headers on every request.
## The branch model (important)
LanceDB branches are named snapshots that diverge from the table's current state at creation time. There is **no checkout command** — you never switch the whole table to a branch. Instead, you **pass `"branch": "<name>"` in the request body** of any operation to target that branch. Omitting the key (or sending an empty body) always targets main.
`branches/list` returns only non-main branches. Main always exists and is not listed.
## List branches
```http
POST {base_url}/v1/table/{table_id}/branches/list
Content-Type: application/json
{}
```
Response:
```json
{
  "branches": {
    "experiment-reindex": {"parentVersion": 1, "createAt": 1782506085, "manifestSize": 1029}
  }
}
If `branches` is `{}`, the table has no branches besides main.
## Create a branch
```http
POST {base_url}/v1/table/{table_id}/branches/create
Content-Type: application/json
{"name": "experiment-reindex"}
```
HTTP 200 with `{}` body = success. The branch is created off the table's current state on main.
Verify by calling `branches/list` and confirming the new name appears.
## Delete a branch
```http
POST {base_url}/v1/table/{table_id}/branches/delete
Content-Type: application/json
{"name": "stale-2024"}
```
HTTP 200 with `{}` body = success. Only the branch pointer is removed — main and all row data remain intact.
Verify by calling `branches/list` (name gone) and `describe` with no branch param (main still responds).
## Operate on a specific branch
Pass `"branch": "<name>"` in the body of any operation to scope it to that branch:
**Read schema on a branch:**
```http
POST {base_url}/v1/table/{table_id}/describe
Content-Type: application/json
{"branch": "wip-branch"}
```
**Write metadata to a branch (not main):**
```http
POST {base_url}/v1/table/{table_id}/update_field_metadata
Content-Type: application/json
{
  "branch": "wip-branch",
  "updates": [
    {
      "path": "category",
      "metadata": {"lancedb:description": "Product category label."},
      "replace": false
    }
  ]
}
```
**Build an index on a branch:**
```http
POST {base_url}/v1/table/{table_id}/create_index
Content-Type: application/json
{
  "branch": "wip-branch",
  "column": "category",
  "index_type": "BTREE"
}
```
## Verifying isolation
After writing to a branch, always confirm the change did NOT land on main:
```bash
# Should show the new metadata
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
  -H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
  -H "content-type: application/json" \
  -d '{"branch": "wip-branch"}'
# Should NOT show the new metadata
curl -s -X POST {base_url}/v1/table/{table_id}/describe \
  -H "x-api-key: <key>" -H "x-lancedb-database: <db>" \
  -H "content-type: application/json" \
  -d '{}'
```
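For scripted checks, the same pair of requests can be built with Python's standard library. The sketch below mirrors the two `curl` calls: `describe_request` is a hypothetical helper name, the host and credentials are placeholders, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def describe_request(base_url: str, table_id: str, api_key: str,
                     database: str, branch: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a describe request for main or for a branch."""
    body = {"branch": branch} if branch else {}
    return urllib.request.Request(
        url=f"{base_url}/v1/table/{table_id}/describe",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "x-lancedb-database": database,
            "content-type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute the resolved connection details.
on_branch = describe_request("https://example.invalid", "my_table", "<key>", "<db>", "wip-branch")
on_main = describe_request("https://example.invalid", "my_table", "<key>", "<db>")
print(on_branch.data)  # b'{"branch": "wip-branch"}'
print(on_main.data)    # b'{}'
```

Comparing the two responses is left to the caller, since the exact shape of the `describe` response is not shown here.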
## Quick reference
| Goal | Endpoint | Body |
|------|----------|------|
| List non-main branches | `branches/list` | `{}` |
| Create a branch | `branches/create` | `{"name": "..."}` |
| Delete a branch | `branches/delete` | `{"name": "..."}` |
| Read schema on branch | `describe` | `{"branch": "..."}` |
| Write metadata on branch | `update_field_metadata` | `{"branch": "...", "updates": [...]}` |
| Build index on branch | `create_index` | `{"branch": "...", "column": "...", "index_type": "..."}` |
| Target main (default) | any endpoint | omit `"branch"` key |

@@ -1,5 +1,5 @@
[tool.bumpversion] [tool.bumpversion]
current_version = "0.31.0-beta.3" current_version = "0.31.0-beta.5"
parse = """(?x) parse = """(?x)
(?P<major>0|[1-9]\\d*)\\. (?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\. (?P<minor>0|[1-9]\\d*)\\.

@@ -25,7 +25,7 @@ jobs:
# Only runs on tags that matches the make-release action # Only runs on tags that matches the make-release action
if: startsWith(github.ref, 'refs/tags/v') if: startsWith(github.ref, 'refs/tags/v')
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- uses: Swatinem/rust-cache@v2 - uses: Swatinem/rust-cache@v2
with: with:
workspaces: rust workspaces: rust
@@ -47,7 +47,7 @@ jobs:
contents: read contents: read
issues: write issues: write
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- uses: ./.github/actions/create-failure-issue - uses: ./.github/actions/create-failure-issue
with: with:
job-results: ${{ toJSON(needs) }} job-results: ${{ toJSON(needs) }}

@@ -36,14 +36,14 @@ jobs:
echo "guidelines = ${{ inputs.guidelines }}" echo "guidelines = ${{ inputs.guidelines }}"
- name: Checkout Repo - name: Checkout Repo
uses: actions/checkout@v4 uses: actions/checkout@v6
with: with:
ref: ${{ inputs.branch }} ref: ${{ inputs.branch }}
fetch-depth: 0 fetch-depth: 0
persist-credentials: true persist-credentials: true
- name: Set up Node.js - name: Set up Node.js
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
# pnpm 11 (used by the nodejs install step below) requires # pnpm 11 (used by the nodejs install step below) requires
# Node >= 22.13; use 24 since 22 hits EOL in October. # Node >= 22.13; use 24 since 22 hits EOL in October.
@@ -82,7 +82,7 @@ jobs:
cache: maven cache: maven
- name: Setup pnpm - name: Setup pnpm
uses: pnpm/action-setup@v4 uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- name: Install Node.js dependencies for TypeScript bindings - name: Install Node.js dependencies for TypeScript bindings

@@ -30,13 +30,13 @@ jobs:
echo "tag = ${{ inputs.tag || 'latest' }}" echo "tag = ${{ inputs.tag || 'latest' }}"
- name: Checkout Repo LanceDB - name: Checkout Repo LanceDB
uses: actions/checkout@v4 uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
persist-credentials: true persist-credentials: true
- name: Set up Node.js - name: Set up Node.js
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
node-version: 20 node-version: 20

@@ -27,7 +27,7 @@ jobs:
name: Verify PR title / description conforms to semantic-release name: Verify PR title / description conforms to semantic-release
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- uses: actions/setup-node@v4 - uses: actions/setup-node@v6
with: with:
node-version: "18" node-version: "18"
# These rules are disabled because Github will always ensure there # These rules are disabled because Github will always ensure there

@@ -35,7 +35,7 @@ jobs:
runs-on: ubuntu-24.04 runs-on: ubuntu-24.04
steps: steps:
- name: Checkout - name: Checkout
uses: actions/checkout@v4 uses: actions/checkout@v6
- name: Install dependencies needed for ubuntu - name: Install dependencies needed for ubuntu
run: | run: |
sudo apt install -y protobuf-compiler libssl-dev sudo apt install -y protobuf-compiler libssl-dev
@@ -53,7 +53,7 @@ jobs:
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e . python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -e .
python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt python -m pip install --extra-index-url https://pypi.fury.io/lance-format/ --extra-index-url https://pypi.fury.io/lancedb/ -r ../docs/requirements.txt
- name: Set up node - name: Set up node
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
node-version: 20 node-version: 20
cache: 'npm' cache: 'npm'

@@ -32,7 +32,7 @@ jobs:
working-directory: ./java working-directory: ./java
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@v4 uses: actions/checkout@v6
- name: Set up Java 8 - name: Set up Java 8
uses: actions/setup-java@v4 uses: actions/setup-java@v4
with: with:
@@ -73,7 +73,7 @@ jobs:
contents: read contents: read
issues: write issues: write
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- uses: ./.github/actions/create-failure-issue - uses: ./.github/actions/create-failure-issue
with: with:
job-results: ${{ toJSON(needs) }} job-results: ${{ toJSON(needs) }}

@@ -36,7 +36,7 @@ jobs:
working-directory: ./java working-directory: ./java
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@v4 uses: actions/checkout@v6
- name: Set up Java 17 - name: Set up Java 17
uses: actions/setup-java@v4 uses: actions/setup-java@v4
with: with:

@@ -19,7 +19,7 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Check out code - name: Check out code
uses: actions/checkout@v4 uses: actions/checkout@v6
- name: Install license-header-checker - name: Install license-header-checker
working-directory: /tmp working-directory: /tmp
run: | run: |

@@ -49,7 +49,7 @@ jobs:
steps: steps:
- name: Output Inputs - name: Output Inputs
run: echo "${{ toJSON(github.event.inputs) }}" run: echo "${{ toJSON(github.event.inputs) }}"
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true

@@ -38,14 +38,14 @@ jobs:
CC: gcc-12 CC: gcc-12
CXX: g++-12 CXX: g++-12
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
- uses: pnpm/action-setup@v4 - uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- uses: actions/setup-node@v4 - uses: actions/setup-node@v6
with: with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL # pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. The library itself still supports Node >= 18 # in October. The library itself still supports Node >= 18
@@ -86,14 +86,14 @@ jobs:
shell: bash shell: bash
working-directory: nodejs working-directory: nodejs
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
- uses: pnpm/action-setup@v4 - uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- uses: actions/setup-node@v4 - uses: actions/setup-node@v6
name: Setup Node.js 24 for build name: Setup Node.js 24 for build
with: with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL # pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
@@ -130,7 +130,7 @@ jobs:
echo "Run 'pnpm run docs', fix any warnings, and commit the changes." echo "Run 'pnpm run docs', fix any warnings, and commit the changes."
exit 1 exit 1
fi fi
- uses: actions/setup-node@v4 - uses: actions/setup-node@v6
name: Setup Node.js ${{ matrix.node-version }} for test name: Setup Node.js ${{ matrix.node-version }} for test
with: with:
node-version: ${{ matrix.node-version }} node-version: ${{ matrix.node-version }}
@@ -166,14 +166,14 @@ jobs:
shell: bash shell: bash
working-directory: nodejs working-directory: nodejs
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
- uses: pnpm/action-setup@v4 - uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- uses: actions/setup-node@v4 - uses: actions/setup-node@v6
with: with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL # pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. # in October.

@@ -32,7 +32,7 @@ jobs:
permissions: permissions:
contents: write contents: write
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -170,13 +170,13 @@ jobs:
run: run:
working-directory: nodejs working-directory: nodejs
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Setup pnpm - name: Setup pnpm
uses: pnpm/action-setup@v4 uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- name: Setup node - name: Setup node
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL # pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. # in October.
@@ -190,7 +190,7 @@ jobs:
toolchain: stable toolchain: stable
targets: ${{ matrix.settings.target }} targets: ${{ matrix.settings.target }}
- name: Cache cargo - name: Cache cargo
uses: actions/cache@v4 uses: actions/cache@v5
with: with:
path: | path: |
~/.cargo/registry/index/ ~/.cargo/registry/index/
@@ -244,7 +244,7 @@ jobs:
if: ${{ !matrix.settings.docker }} if: ${{ !matrix.settings.docker }}
shell: bash shell: bash
- name: Upload artifact - name: Upload artifact
uses: actions/upload-artifact@v4 uses: actions/upload-artifact@v7
with: with:
name: lancedb-${{ matrix.settings.target }} name: lancedb-${{ matrix.settings.target }}
path: nodejs/dist/*.node path: nodejs/dist/*.node
@@ -256,7 +256,7 @@ jobs:
run: pnpm tsc run: pnpm tsc
- name: Upload Generic Artifacts - name: Upload Generic Artifacts
if: ${{ matrix.settings.target == 'aarch64-apple-darwin' }} if: ${{ matrix.settings.target == 'aarch64-apple-darwin' }}
uses: actions/upload-artifact@v4 uses: actions/upload-artifact@v7
with: with:
name: nodejs-dist name: nodejs-dist
path: | path: |
@@ -287,13 +287,13 @@ jobs:
shell: bash shell: bash
working-directory: nodejs working-directory: nodejs
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Setup pnpm - name: Setup pnpm
uses: pnpm/action-setup@v4 uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- name: Setup Node.js 24 for install - name: Setup Node.js 24 for install
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL # pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. # in October.
@@ -303,18 +303,18 @@ jobs:
- name: Install dependencies - name: Install dependencies
run: pnpm install --frozen-lockfile run: pnpm install --frozen-lockfile
- name: Setup Node.js ${{ matrix.node }} for test - name: Setup Node.js ${{ matrix.node }} for test
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
node-version: ${{ matrix.node }} node-version: ${{ matrix.node }}
- name: Download artifacts - name: Download artifacts
uses: actions/download-artifact@v4 uses: actions/download-artifact@v8
with: with:
name: lancedb-${{ matrix.settings.target }} name: lancedb-${{ matrix.settings.target }}
path: nodejs/dist/ path: nodejs/dist/
# For testing purposes: # For testing purposes:
# run-id: 13982782871 # run-id: 13982782871
# github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo # github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo
- uses: actions/download-artifact@v4 - uses: actions/download-artifact@v8
with: with:
name: nodejs-dist name: nodejs-dist
path: nodejs/dist path: nodejs/dist
@@ -339,13 +339,13 @@ jobs:
needs: needs:
- test-lancedb - test-lancedb
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Setup pnpm - name: Setup pnpm
uses: pnpm/action-setup@v4 uses: pnpm/action-setup@v6
with: with:
version: 11.1.1 version: 11.1.1
- name: Setup node - name: Setup node
uses: actions/setup-node@v4 uses: actions/setup-node@v6
with: with:
node-version: 24 node-version: 24
cache: pnpm cache: pnpm
@@ -353,14 +353,14 @@ jobs:
registry-url: "https://registry.npmjs.org" registry-url: "https://registry.npmjs.org"
- name: Install dependencies - name: Install dependencies
run: pnpm install --frozen-lockfile run: pnpm install --frozen-lockfile
- uses: actions/download-artifact@v4 - uses: actions/download-artifact@v8
with: with:
name: nodejs-dist name: nodejs-dist
path: nodejs/dist path: nodejs/dist
# For testing purposes: # For testing purposes:
# run-id: 13982782871 # run-id: 13982782871
# github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo # github-token: ${{ secrets.GITHUB_TOKEN }} # token with actions:read permissions on target repo
- uses: actions/download-artifact@v4 - uses: actions/download-artifact@v8
name: Download arch-specific binaries name: Download arch-specific binaries
with: with:
pattern: lancedb-* pattern: lancedb-*
@@ -398,7 +398,7 @@ jobs:
contents: read contents: read
issues: write issues: write
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- uses: ./.github/actions/create-failure-issue - uses: ./.github/actions/create-failure-issue
with: with:
job-results: ${{ toJSON(needs) }} job-results: ${{ toJSON(needs) }}

@@ -41,7 +41,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -66,7 +66,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -95,7 +95,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -126,7 +126,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -160,7 +160,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -189,7 +189,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -212,7 +212,7 @@ jobs:
shell: bash shell: bash
working-directory: python working-directory: python
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true

@@ -40,7 +40,7 @@ jobs:
CC: clang-18 CC: clang-18
CXX: clang++-18 CXX: clang++-18
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -65,7 +65,7 @@ jobs:
timeout-minutes: 10 timeout-minutes: 10
runs-on: ubuntu-24.04 runs-on: ubuntu-24.04
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- uses: EmbarkStudios/cargo-deny-action@v2 - uses: EmbarkStudios/cargo-deny-action@v2
with: with:
command: check advisories bans licenses sources command: check advisories bans licenses sources
@@ -78,7 +78,7 @@ jobs:
CC: clang CC: clang
CXX: clang++ CXX: clang++
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
# Building without a lock file often requires the latest Rust version since downstream # Building without a lock file often requires the latest Rust version since downstream
# dependencies may have updated their minimum Rust version. # dependencies may have updated their minimum Rust version.
- uses: actions-rust-lang/setup-rust-toolchain@v1 - uses: actions-rust-lang/setup-rust-toolchain@v1
@@ -113,7 +113,7 @@ jobs:
CXX: clang++-18 CXX: clang++-18
GH_TOKEN: ${{ secrets.SOPHON_READ_TOKEN }} GH_TOKEN: ${{ secrets.SOPHON_READ_TOKEN }}
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -152,7 +152,7 @@ jobs:
shell: bash shell: bash
working-directory: rust working-directory: rust
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
fetch-depth: 0 fetch-depth: 0
lfs: true lfs: true
@@ -181,7 +181,7 @@ jobs:
run: run:
working-directory: rust/lancedb working-directory: rust/lancedb
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
- name: Set target - name: Set target
run: rustup target add ${{ matrix.target }} run: rustup target add ${{ matrix.target }}
- uses: Swatinem/rust-cache@v2 - uses: Swatinem/rust-cache@v2
@@ -210,7 +210,7 @@ jobs:
CC: clang-18 CC: clang-18
CXX: clang++-18 CXX: clang++-18
steps: steps:
- uses: actions/checkout@v4 - uses: actions/checkout@v6
with: with:
submodules: true submodules: true
- name: Install dependencies - name: Install dependencies

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Checkout - name: Checkout
uses: actions/checkout@v4 uses: actions/checkout@v6
with: with:
ref: main ref: main
persist-credentials: false persist-credentials: false

@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest runs-on: ubuntu-latest
steps: steps:
- name: Checkout - name: Checkout
uses: actions/checkout@v4 uses: actions/checkout@v6
with: with:
ref: main ref: main
persist-credentials: false persist-credentials: false

Cargo.lock generated
@@ -157,9 +157,9 @@ dependencies = [
[[package]] [[package]]
name = "anyhow" name = "anyhow"
version = "1.0.102" version = "1.0.103"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c" checksum = "2a4385e2e34eb35d6b3efe798b9eb88096925d87726c0798709bf56d9ed84af3"
[[package]] [[package]]
name = "approx" name = "approx"
@@ -1297,15 +1297,6 @@ version = "2.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3" checksum = "c4512299f36f043ab09a583e57bceb5a5aab7a73db1805848e8fef3c9e8c78b3"
[[package]]
name = "bitpacking"
version = "0.9.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "96a7139abd3d9cebf8cd6f920a389cf3dc9576172e32f4563f188cae3c3eb019"
dependencies = [
"crunchy",
]
[[package]] [[package]]
name = "bitvec" name = "bitvec"
version = "1.0.1" version = "1.0.1"
@@ -3186,9 +3177,9 @@ dependencies = [
[[package]] [[package]]
name = "env_filter" name = "env_filter"
version = "1.0.1" version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "32e90c2accc4b07a8456ea0debdc2e7587bdd890680d71173a15d4ae604f6eef" checksum = "900d271a03799a1ee8d1ca9b19893b48ca674a9284fefcfb85f05e74ed314217"
dependencies = [ dependencies = [
"log", "log",
"regex", "regex",
@@ -3196,9 +3187,9 @@ dependencies = [
[[package]] [[package]]
name = "env_logger" name = "env_logger"
version = "0.11.10" version = "0.11.11"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0621c04f2196ac3f488dd583365b9c09be011a4ab8b9f37248ffcc8f6198b56a" checksum = "de671bd27a75a797dc9ae289ba1e77276e75e2026408aab65185384e2d5cd3f6"
dependencies = [ dependencies = [
"anstream", "anstream",
"anstyle", "anstyle",
@@ -3432,8 +3423,8 @@ checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]] [[package]]
name = "fsst" name = "fsst"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"rand 0.9.4", "rand 0.9.4",
@@ -4735,8 +4726,8 @@ checksum = "e037a2e1d8d5fdbd49b16a4ea09d5d6401c1f29eca5ff29d03d3824dba16256a"
[[package]] [[package]]
name = "lance" name = "lance"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arc-swap", "arc-swap",
"arrow", "arrow",
@@ -4754,7 +4745,6 @@ dependencies = [
"async_cell", "async_cell",
"aws-credential-types", "aws-credential-types",
"aws-sdk-dynamodb", "aws-sdk-dynamodb",
"bitpacking",
"byteorder", "byteorder",
"bytes", "bytes",
"chrono", "chrono",
@@ -4773,6 +4763,7 @@ dependencies = [
"humantime", "humantime",
"itertools 0.14.0", "itertools 0.14.0",
"lance-arrow", "lance-arrow",
"lance-bitpacking",
"lance-core", "lance-core",
"lance-datafusion", "lance-datafusion",
"lance-encoding", "lance-encoding",
@@ -4810,8 +4801,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-arrow" name = "lance-arrow"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -4832,7 +4823,7 @@ dependencies = [
[[package]] [[package]]
name = "lance-arrow-scalar" name = "lance-arrow-scalar"
version = "58.0.0" version = "58.0.0"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -4846,7 +4837,7 @@ dependencies = [
[[package]] [[package]]
name = "lance-arrow-stats" name = "lance-arrow-stats"
version = "58.0.0" version = "58.0.0"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-schema", "arrow-schema",
@@ -4855,18 +4846,19 @@ dependencies = [
[[package]] [[package]]
name = "lance-bitpacking" name = "lance-bitpacking"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrayref", "arrayref",
"crunchy",
"paste", "paste",
"seq-macro", "seq-macro",
] ]
[[package]] [[package]]
name = "lance-core" name = "lance-core"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -4904,8 +4896,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-datafusion" name = "lance-datafusion"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"arrow-array", "arrow-array",
@@ -4935,8 +4927,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-datagen" name = "lance-datagen"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"arrow-array", "arrow-array",
@@ -4953,8 +4945,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-derive" name = "lance-derive"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"proc-macro2", "proc-macro2",
"quote", "quote",
@@ -4963,8 +4955,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-encoding" name = "lance-encoding"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-arith", "arrow-arith",
"arrow-array", "arrow-array",
@@ -4999,8 +4991,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-file" name = "lance-file"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-arith", "arrow-arith",
"arrow-array", "arrow-array",
@@ -5030,8 +5022,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-index" name = "lance-index"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arc-swap", "arc-swap",
"arrow", "arrow",
@@ -5043,7 +5035,6 @@ dependencies = [
"async-channel", "async-channel",
"async-recursion", "async-recursion",
"async-trait", "async-trait",
"bitpacking",
"bitvec", "bitvec",
"bytes", "bytes",
"chrono", "chrono",
@@ -5061,6 +5052,7 @@ dependencies = [
"jsonb", "jsonb",
"lance-arrow", "lance-arrow",
"lance-arrow-stats", "lance-arrow-stats",
"lance-bitpacking",
"lance-core", "lance-core",
"lance-datafusion", "lance-datafusion",
"lance-datagen", "lance-datagen",
@@ -5096,8 +5088,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-io" name = "lance-io"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"arrow-arith", "arrow-arith",
@@ -5138,8 +5130,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-linalg" name = "lance-linalg"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -5155,8 +5147,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-namespace" name = "lance-namespace"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"async-trait", "async-trait",
@@ -5168,8 +5160,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-namespace-impls" name = "lance-namespace-impls"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"arrow-ipc", "arrow-ipc",
@@ -5223,8 +5215,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-select" name = "lance-select"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -5239,8 +5231,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-table" name = "lance-table"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow", "arrow",
"arrow-array", "arrow-array",
@@ -5279,8 +5271,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-testing" name = "lance-testing"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-schema", "arrow-schema",
@@ -5293,8 +5285,8 @@ dependencies = [
[[package]] [[package]]
name = "lance-tokenizer" name = "lance-tokenizer"
version = "9.0.0-beta.8" version = "9.0.0-beta.10"
source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.8#71c4aa2174971e98acb7e256fde1e1589024f5bc" source = "git+https://github.com/lance-format/lance.git?tag=v9.0.0-beta.10#e25b71e74b89d10c57b412d111bde087117383f3"
dependencies = [ dependencies = [
"icu_segmenter", "icu_segmenter",
"jieba-rs", "jieba-rs",
@@ -5307,12 +5299,13 @@ dependencies = [
[[package]] [[package]]
name = "lancedb" name = "lancedb"
version = "0.31.0-beta.3" version = "0.31.0-beta.5"
dependencies = [ dependencies = [
"ahash", "ahash",
"anyhow", "anyhow",
"arrow", "arrow",
"arrow-array", "arrow-array",
"arrow-buffer",
"arrow-cast", "arrow-cast",
"arrow-data", "arrow-data",
"arrow-ipc", "arrow-ipc",
@@ -5391,7 +5384,7 @@ dependencies = [
[[package]] [[package]]
name = "lancedb-nodejs" name = "lancedb-nodejs"
version = "0.31.0-beta.3" version = "0.31.0-beta.5"
dependencies = [ dependencies = [
"arrow-array", "arrow-array",
"arrow-buffer", "arrow-buffer",
@@ -5416,7 +5409,7 @@ dependencies = [
[[package]] [[package]]
name = "lancedb-python" name = "lancedb-python"
version = "0.34.0-beta.3" version = "0.34.0-beta.5"
dependencies = [ dependencies = [
"arrow", "arrow",
"async-trait", "async-trait",
@@ -5649,9 +5642,9 @@ dependencies = [
[[package]] [[package]]
name = "log" name = "log"
version = "0.4.32" version = "0.4.33"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "953f07c43838f8e6f9758cab68bf5bed85465e7587ebe0b823f1bcd81978ad3a" checksum = "0ceec5bc11778974d1bcb055b18002eba7f4b3518b6a0081b3af5f21666da9ad"
[[package]] [[package]]
name = "loom" name = "loom"
@@ -5959,9 +5952,9 @@ dependencies = [
[[package]] [[package]]
name = "napi" name = "napi"
version = "3.9.3" version = "3.9.4"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fbd9f9295f3ff5921e78a71222c3361a8216f7760b1a99a6ad4e8441de18bbb9" checksum = "b41bda2ac390efb5e8d22025d925ccc3f3807d8c1bea6d19b36127247c4b8f83"
dependencies = [ dependencies = [
"bitflags 2.11.1", "bitflags 2.11.1",
"chrono", "chrono",
@@ -5984,9 +5977,9 @@ checksum = "c9c366d2c8c60b86fa632df75f745509b52f9128f91a6bad4c796e44abb505e1"
[[package]] [[package]]
name = "napi-derive" name = "napi-derive"
version = "3.5.6" version = "3.5.7"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "89b3f766e04667e6da0e181e2da4f85475d5a6513b7cf6a80bea184e224a5b42" checksum = "61d66f70256ad5aef58659966064471d0ad90e2897bc36a5a5e0389c85aabc1e"
dependencies = [ dependencies = [
"convert_case", "convert_case",
"ctor 1.0.5", "ctor 1.0.5",
@@ -5998,9 +5991,9 @@ dependencies = [
[[package]] [[package]]
name = "napi-derive-backend" name = "napi-derive-backend"
version = "5.0.4" version = "5.0.5"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0d5af30503edf933ce7377cf6d4c877a62b0f1107ea05585f1b5e430e88d5baf" checksum = "81b4b08f15eed7a2a20c3f4c6314013fc3ac890a3afa9892b594485299ebdb2d"
dependencies = [ dependencies = [
"convert_case", "convert_case",
"proc-macro2", "proc-macro2",
@@ -10128,9 +10121,9 @@ checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821"
[[package]] [[package]]
name = "uuid" name = "uuid"
version = "1.23.3" version = "1.23.4"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "144d6b123cef80b301b8f72a9e2ca4370ddec21950d0a103dd22c437006d2db7" checksum = "bf80a72845275afea99e7f2b434723d3bc7e38470fcd1c7ed39a599c73319a53"
dependencies = [ dependencies = [
"getrandom 0.4.2", "getrandom 0.4.2",
"js-sys", "js-sys",


@@ -13,24 +13,25 @@ categories = ["database-implementations"]
rust-version = "1.91.0" rust-version = "1.91.0"
[workspace.dependencies] [workspace.dependencies]
lance = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-core = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-datagen = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-file = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-io = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-index = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-linalg = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-namespace = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=9.0.0-beta.8", default-features = false, "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-namespace-impls = { "version" = "=9.0.0-beta.10", default-features = false, "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-table = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-testing = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-datafusion = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-encoding = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=9.0.0-beta.8", "tag" = "v9.0.0-beta.8", "git" = "https://github.com/lance-format/lance.git" } lance-arrow = { "version" = "=9.0.0-beta.10", "tag" = "v9.0.0-beta.10", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8" ahash = "0.8"
# Note that this one does not include pyarrow # Note that this one does not include pyarrow
arrow = { version = "58.0.0", optional = false } arrow = { version = "58.0.0", optional = false }
arrow-array = "58.0.0" arrow-array = "58.0.0"
arrow-buffer = "58.0.0"
arrow-data = "58.0.0" arrow-data = "58.0.0"
arrow-ipc = "58.0.0" arrow-ipc = "58.0.0"
arrow-ord = "58.0.0" arrow-ord = "58.0.0"


@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency> <dependency>
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId> <artifactId>lancedb-core</artifactId>
<version>0.31.0-beta.3</version> <version>0.31.0-beta.5</version>
</dependency> </dependency>
``` ```


@@ -518,6 +518,9 @@ x > 5 OR y = 'test'
Filtering performance can often be improved by creating a scalar index Filtering performance can often be improved by creating a scalar index
on the filter column(s). on the filter column(s).
Calling this multiple times combines the filters with a logical AND rather
than replacing the previous filter.
``` ```
#### Inherited from #### Inherited from


@@ -767,6 +767,9 @@ x > 5 OR y = 'test'
Filtering performance can often be improved by creating a scalar index Filtering performance can often be improved by creating a scalar index
on the filter column(s). on the filter column(s).
Calling this multiple times combines the filters with a logical AND rather
than replacing the previous filter.
``` ```
#### Inherited from #### Inherited from


@@ -0,0 +1,29 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / OAuthFlowType
# Enumeration: OAuthFlowType
OAuth authentication flow types.
## Enumeration Members
### AzureManagedIdentity
```ts
AzureManagedIdentity: "azure_managed_identity";
```
Azure Managed Identity via IMDS.
***
### ClientCredentials
```ts
ClientCredentials: "client_credentials";
```
Client Credentials grant (service-to-service / M2M).


@@ -12,6 +12,7 @@
## Enumerations ## Enumerations
- [FullTextQueryType](enumerations/FullTextQueryType.md) - [FullTextQueryType](enumerations/FullTextQueryType.md)
- [OAuthFlowType](enumerations/OAuthFlowType.md)
- [Occur](enumerations/Occur.md) - [Occur](enumerations/Occur.md)
- [Operator](enumerations/Operator.md) - [Operator](enumerations/Operator.md)
@@ -85,6 +86,8 @@
- [ListNamespacesResponse](interfaces/ListNamespacesResponse.md) - [ListNamespacesResponse](interfaces/ListNamespacesResponse.md)
- [LsmWriteSpec](interfaces/LsmWriteSpec.md) - [LsmWriteSpec](interfaces/LsmWriteSpec.md)
- [MergeResult](interfaces/MergeResult.md) - [MergeResult](interfaces/MergeResult.md)
- [NativeOAuthConfig](interfaces/NativeOAuthConfig.md)
- [OAuthConfig](interfaces/OAuthConfig.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md) - [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md) - [OptimizeOptions](interfaces/OptimizeOptions.md)
- [OptimizeStats](interfaces/OptimizeStats.md) - [OptimizeStats](interfaces/OptimizeStats.md)


@@ -64,6 +64,19 @@ client used by manifest-enabled native connections.
*** ***
### oauthConfig?
```ts
optional oauthConfig: NativeOAuthConfig;
```
(For LanceDB cloud only): OAuth configuration for IdP-based
authentication (e.g., Azure Entra ID). When set, token acquisition
and refresh are handled entirely in Rust. TypeScript users should pass
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
***
### readConsistencyInterval? ### readConsistencyInterval?
```ts ```ts


@@ -0,0 +1,88 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / NativeOAuthConfig
# Interface: NativeOAuthConfig
OAuth configuration for LanceDB authentication.
This is the generated napi-rs binding shape. TypeScript users should prefer
the public `OAuthConfig` type exported from `@lancedb/lancedb`.
All token acquisition and refresh is handled in the Rust layer.
## Properties
### clientId
```ts
clientId: string;
```
Application / Client ID.
***
### clientSecret?
```ts
optional clientSecret: string;
```
Client secret (required for client_credentials).
***
### flow?
```ts
optional flow: string;
```
Authentication flow: "client_credentials" or "azure_managed_identity"
***
### issuerUrl
```ts
issuerUrl: string;
```
OIDC issuer URL or OAuth authority URL.
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
***
### managedIdentityClientId?
```ts
optional managedIdentityClientId: string;
```
Client ID for user-assigned managed identity (azure_managed_identity).
***
### refreshBufferSecs?
```ts
optional refreshBufferSecs: number;
```
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
***
### scopes
```ts
scopes: string[];
```
OAuth scopes to request. For Azure managed identity, exactly one scope
or resource is required. For example: `["api://{app_id}/.default"]`


@@ -0,0 +1,111 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / OAuthConfig
# Interface: OAuthConfig
OAuth configuration for LanceDB authentication.
This is the public TypeScript OAuth configuration type. The generated
`NativeOAuthConfig` type has the same runtime shape but is an implementation
detail of the napi-rs binding.
All token acquisition and refresh is handled in the Rust layer.
This config is passed through to Rust via napi-rs.
## Examples
```typescript
const config: OAuthConfig = {
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
clientId: "app-id",
clientSecret: "secret",
scopes: ["api://lancedb-api/.default"],
};
```
```typescript
const config: OAuthConfig = {
issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
clientId: "app-id",
scopes: ["api://lancedb-api/.default"],
flow: OAuthFlowType.AzureManagedIdentity,
};
```
## Properties
### clientId
```ts
clientId: string;
```
Application / Client ID.
***
### clientSecret?
```ts
optional clientSecret: string;
```
Client secret (required for ClientCredentials).
***
### flow?
```ts
optional flow: OAuthFlowType;
```
Authentication flow (default: ClientCredentials).
***
### issuerUrl
```ts
issuerUrl: string;
```
OIDC issuer URL or OAuth authority URL.
For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
***
### managedIdentityClientId?
```ts
optional managedIdentityClientId: string;
```
Client ID for user-assigned managed identity (AzureManagedIdentity).
***
### refreshBufferSecs?
```ts
optional refreshBufferSecs: number;
```
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
***
### scopes
```ts
scopes: string[];
```
OAuth scopes to request.
For Azure managed identity, exactly one scope or resource is required.
For example: `["api://{app_id}/.default"]`
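
The property notes above imply two constraints: a client secret is required for `ClientCredentials`, and Azure managed identity takes exactly one scope. A minimal sketch of a pre-flight check follows. The local `OAuthFlowType` and `OAuthConfig` declarations are stand-ins that copy the documented shape so the snippet runs without the package, and `checkOAuthConfig` is a hypothetical helper, not something exported by `@lancedb/lancedb`. Whether the library itself enforces these rules, and at what point, is not shown in this diff.

```typescript
// Local stand-ins for the documented types (assumption: same shape as the docs above).
enum OAuthFlowType {
  ClientCredentials = "client_credentials",
  AzureManagedIdentity = "azure_managed_identity",
}

interface OAuthConfig {
  issuerUrl: string;
  clientId: string;
  scopes: string[];
  flow?: OAuthFlowType;
  clientSecret?: string;
  managedIdentityClientId?: string;
  refreshBufferSecs?: number;
}

// Hypothetical pre-flight check: returns a list of problems, empty when none are found.
function checkOAuthConfig(config: OAuthConfig): string[] {
  const problems: string[] = [];
  // The documented default flow is ClientCredentials.
  const flow = config.flow ?? OAuthFlowType.ClientCredentials;
  if (flow === OAuthFlowType.ClientCredentials && !config.clientSecret) {
    problems.push("clientSecret is required for ClientCredentials");
  }
  if (flow === OAuthFlowType.AzureManagedIdentity && config.scopes.length !== 1) {
    problems.push("AzureManagedIdentity requires exactly one scope");
  }
  return problems;
}

// Managed identity config modelled on the second documented example.
const managedIdentity: OAuthConfig = {
  issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
  clientId: "app-id",
  scopes: ["api://lancedb-api/.default"],
  flow: OAuthFlowType.AzureManagedIdentity,
};

// Client credentials config with the secret left out by mistake.
const missingSecret: OAuthConfig = {
  issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
  clientId: "app-id",
  scopes: ["api://lancedb-api/.default"],
};

console.log(checkOAuthConfig(managedIdentity));
console.log(checkOAuthConfig(missingSecret));
```

Returning a list rather than throwing on the first problem lets a caller report every misconfiguration at once.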


@@ -8,7 +8,7 @@
<parent> <parent>
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId> <artifactId>lancedb-parent</artifactId>
<version>0.31.0-beta.3</version> <version>0.31.0-beta.5</version>
<relativePath>../pom.xml</relativePath> <relativePath>../pom.xml</relativePath>
</parent> </parent>


@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId> <groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId> <artifactId>lancedb-parent</artifactId>
<version>0.31.0-beta.3</version> <version>0.31.0-beta.5</version>
<packaging>pom</packaging> <packaging>pom</packaging>
<name>${project.artifactId}</name> <name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description> <description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties> <properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version> <arrow.version>15.0.0</arrow.version>
<lance-core.version>9.0.0-beta.8</lance-core.version> <lance-core.version>9.0.0-beta.10</lance-core.version>
<spotless.skip>false</spotless.skip> <spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version> <spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version> <spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>


@@ -1,7 +1,7 @@
[package] [package]
name = "lancedb-nodejs" name = "lancedb-nodejs"
edition.workspace = true edition.workspace = true
version = "0.31.0-beta.3" version = "0.31.0-beta.5"
publish = false publish = false
license.workspace = true license.workspace = true
description.workspace = true description.workspace = true


@@ -215,6 +215,20 @@ describe("Query orderBy", () => {
expect(results[2].score).toBeCloseTo(4.1, 0.001); expect(results[2].score).toBeCloseTo(4.1, 0.001);
}); });
it("should combine repeated where clauses with AND", async () => {
const results = await table
.query()
.where("score > 1.0")
.where("score < 3.0")
.orderBy({ columnName: "score" })
.toArray();
// Only rows matching both predicates should be returned, rather than the
// second where() silently replacing the first.
expect(results.length).toBe(2);
expect(results[0].score).toBeCloseTo(1.2, 0.001);
expect(results[1].score).toBeCloseTo(2.8, 0.001);
});
it("should support method chaining with limit", async () => { it("should support method chaining with limit", async () => {
const results = await table const results = await table
.query() .query()


@@ -52,6 +52,7 @@ export {
SplitHashOptions, SplitHashOptions,
SplitSequentialOptions, SplitSequentialOptions,
ShuffleOptions, ShuffleOptions,
OAuthConfig as NativeOAuthConfig,
} from "./native.js"; } from "./native.js";
export { export {
@@ -130,6 +131,8 @@ export {
TokenResponse, TokenResponse,
} from "./header"; } from "./header";
export { OAuthConfig, OAuthFlowType } from "./oauth";
export { MergeInsertBuilder, WriteExecutionOptions } from "./merge"; export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
export * as embedding from "./embedding"; export * as embedding from "./embedding";

nodejs/lancedb/oauth.ts

@@ -0,0 +1,76 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
/**
* OAuth authentication flow types.
*/
export enum OAuthFlowType {
/** Client Credentials grant (service-to-service / M2M). */
ClientCredentials = "client_credentials",
/** Azure Managed Identity via IMDS. */
AzureManagedIdentity = "azure_managed_identity",
}
/**
* OAuth configuration for LanceDB authentication.
*
* This is the public TypeScript OAuth configuration type. The generated
* `NativeOAuthConfig` type has the same runtime shape but is an implementation
* detail of the napi-rs binding.
*
* All token acquisition and refresh is handled in the Rust layer.
* This config is passed through to Rust via napi-rs.
*
* @example Client Credentials (service-to-service):
* ```typescript
* const config: OAuthConfig = {
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
* clientId: "app-id",
* clientSecret: "secret",
* scopes: ["api://lancedb-api/.default"],
* };
* ```
*
* @example Azure Managed Identity:
* ```typescript
* const config: OAuthConfig = {
* issuerUrl: "https://login.microsoftonline.com/{tenant}/v2.0",
* clientId: "app-id",
* scopes: ["api://lancedb-api/.default"],
* flow: OAuthFlowType.AzureManagedIdentity,
* };
* ```
*/
export interface OAuthConfig {
/**
* OIDC issuer URL or OAuth authority URL.
* For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
*/
issuerUrl: string;
/** Application / Client ID. */
clientId: string;
/**
* OAuth scopes to request.
* For Azure managed identity, exactly one scope or resource is required.
* For example: `["api://{app_id}/.default"]`
*/
scopes: string[];
/** Authentication flow (default: ClientCredentials). */
flow?: OAuthFlowType;
/** Client secret (required for ClientCredentials). */
clientSecret?: string;
/** Client ID for user-assigned managed identity (AzureManagedIdentity). */
managedIdentityClientId?: string;
/**
* Seconds before expiry to trigger proactive refresh (default: 300).
* Keep this well below the token TTL; if it is greater than or equal to
* the TTL, each request refreshes the token.
*/
refreshBufferSecs?: number;
}
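
The `refreshBufferSecs` comment says that a buffer greater than or equal to the token TTL makes each request refresh the token. The arithmetic behind that can be sketched as below. `needsRefresh` is a hypothetical function written for illustration; the actual refresh logic lives in the Rust layer and is not part of this file, so the exact comparison it uses is an assumption.

```typescript
// Hypothetical sketch of a proactive-refresh check (not the Rust implementation).
const DEFAULT_REFRESH_BUFFER_SECS = 300;

function needsRefresh(
  expiresAtSecs: number,
  nowSecs: number,
  refreshBufferSecs: number = DEFAULT_REFRESH_BUFFER_SECS,
): boolean {
  // Refresh once the current time enters the buffer window before expiry.
  return nowSecs >= expiresAtSecs - refreshBufferSecs;
}

// A token issued at t = 0 with a one-hour TTL (made-up numbers).
const issuedAtSecs = 0;
const ttlSecs = 3600;
const expiresAtSecs = issuedAtSecs + ttlSecs;

// Well outside the 300 s window: no refresh.
console.log(needsRefresh(expiresAtSecs, 60)); // false
// Inside the last 300 s before expiry: refresh.
console.log(needsRefresh(expiresAtSecs, 3400)); // true
// Buffer equal to the TTL: a token looks stale the moment it is issued.
console.log(needsRefresh(expiresAtSecs, issuedAtSecs, ttlSecs)); // true
```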


@@ -362,6 +362,9 @@ export class StandardQueryBase<
* *
* Filtering performance can often be improved by creating a scalar index * Filtering performance can often be improved by creating a scalar index
* on the filter column(s). * on the filter column(s).
*
* Calling this multiple times combines the filters with a logical AND rather
* than replacing the previous filter.
*/ */
where(predicate: string): this { where(predicate: string): this {
this.doCall((inner: NativeQueryType) => inner.onlyIf(predicate)); this.doCall((inner: NativeQueryType) => inner.onlyIf(predicate));
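
The commit message describes the new behavior as `(previous) AND (new)`, applied in the Rust core. A minimal sketch of that string-level rule follows. `combineFilters` is a hypothetical helper written for illustration and is not a LanceDB API; the real combination happens behind `inner.onlyIf(...)`.

```typescript
// Illustrative only: mirrors the `(previous) AND (new)` rule from the commit message.
function combineFilters(previous: string | undefined, next: string): string {
  // The first filter is used as-is; there is nothing to combine it with yet.
  if (previous === undefined) {
    return next;
  }
  // Parenthesize both sides so an OR inside either filter keeps its meaning.
  return `(${previous}) AND (${next})`;
}

// Chaining where() calls then corresponds to folding over the predicates.
const predicates = ["score > 1.0", "score < 3.0 OR score IS NULL"];
const combined = predicates.reduce<string | undefined>(
  (acc, predicate) => combineFilters(acc, predicate),
  undefined,
);

console.log(combined); // (score > 1.0) AND (score < 3.0 OR score IS NULL)
```

The parentheses matter: without them, `score > 1.0 AND score < 3.0 OR score IS NULL` would let null scores through regardless of the first filter, because AND binds more tightly than OR in SQL.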


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-darwin-arm64", "name": "@lancedb/lancedb-darwin-arm64",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["darwin"], "os": ["darwin"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node", "main": "lancedb.darwin-arm64.node",


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-arm64-gnu", "name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["linux"], "os": ["linux"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node", "main": "lancedb.linux-arm64-gnu.node",


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-arm64-musl", "name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["linux"], "os": ["linux"],
"cpu": ["arm64"], "cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node", "main": "lancedb.linux-arm64-musl.node",


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-x64-gnu", "name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["linux"], "os": ["linux"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node", "main": "lancedb.linux-x64-gnu.node",


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-linux-x64-musl", "name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["linux"], "os": ["linux"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node", "main": "lancedb.linux-x64-musl.node",


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-win32-arm64-msvc", "name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": [ "os": [
"win32" "win32"
], ],


@@ -1,6 +1,6 @@
{ {
"name": "@lancedb/lancedb-win32-x64-msvc", "name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"os": ["win32"], "os": ["win32"],
"cpu": ["x64"], "cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node", "main": "lancedb.win32-x64-msvc.node",


@@ -1,12 +1,12 @@
{ {
"name": "@lancedb/lancedb", "name": "@lancedb/lancedb",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"lockfileVersion": 3, "lockfileVersion": 3,
"requires": true, "requires": true,
"packages": { "packages": {
"": { "": {
"name": "@lancedb/lancedb", "name": "@lancedb/lancedb",
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"cpu": [ "cpu": [
"x64", "x64",
"arm64" "arm64"


@@ -11,7 +11,7 @@
"ann" "ann"
], ],
"private": false, "private": false,
"version": "0.31.0-beta.3", "version": "0.31.0-beta.5",
"main": "dist/index.js", "main": "dist/index.js",
"exports": { "exports": {
".": "./dist/index.js", ".": "./dist/index.js",


@@ -112,6 +112,12 @@ impl Connection {
builder = builder.client_config(rust_config); builder = builder.client_config(rust_config);
if let Some(oauth_config) = options.oauth_config {
let config: lancedb::remote::oauth::OAuthConfig =
oauth_config.try_into().default_error()?;
builder = builder.oauth_config(config);
}
if let Some(api_key) = options.api_key { if let Some(api_key) = options.api_key {
builder = builder.api_key(&api_key); builder = builder.api_key(&api_key);
} }

View File

@@ -65,6 +65,11 @@ pub struct ConnectionOptions {
/// (For LanceDB cloud only): the host to use for LanceDB cloud. Used /// (For LanceDB cloud only): the host to use for LanceDB cloud. Used
/// for testing purposes. /// for testing purposes.
pub host_override: Option<String>, pub host_override: Option<String>,
/// (For LanceDB cloud only): OAuth configuration for IdP-based
/// authentication (e.g., Azure Entra ID). When set, token acquisition
/// and refresh are handled entirely in Rust. TypeScript users should pass
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
pub oauth_config: Option<remote::OAuthConfig>,
} }
#[napi(object)] #[napi(object)]

View File

@@ -3,7 +3,7 @@
use std::time::Duration; use std::time::Duration;
use lancedb::{arrow::IntoArrow, ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder}; use lancedb::{ipc::ipc_file_to_batches, table::merge::MergeInsertBuilder};
use napi::bindgen_prelude::*; use napi::bindgen_prelude::*;
use napi_derive::napi; use napi_derive::napi;
@@ -66,11 +66,9 @@ impl NativeMergeInsertBuilder {
#[napi(catch_unwind)] #[napi(catch_unwind)]
pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> { pub async fn execute(&self, buf: Buffer) -> napi::Result<MergeResult> {
let data = ipc_file_to_batches(buf.to_vec()) let data = ipc_file_to_batches(buf.to_vec()).map_err(|e| {
.and_then(IntoArrow::into_arrow) napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
.map_err(|e| { })?;
napi::Error::from_reason(format!("Failed to read IPC file: {}", convert_error(&e)))
})?;
let this = self.clone(); let this = self.clone();


@@ -3,6 +3,7 @@
use std::collections::HashMap; use std::collections::HashMap;
use lancedb::error::Error;
use napi_derive::*; use napi_derive::*;
/// Timeout configuration for remote HTTP client. /// Timeout configuration for remote HTTP client.
@@ -140,6 +141,84 @@ impl From<TlsConfig> for lancedb::remote::TlsConfig {
} }
} }
/// OAuth configuration for LanceDB authentication.
///
/// This is the generated napi-rs binding shape. TypeScript users should prefer
/// the public `OAuthConfig` type exported from `@lancedb/lancedb`.
///
/// All token acquisition and refresh is handled in the Rust layer.
#[napi(object)]
#[derive(Clone)]
pub struct OAuthConfig {
/// OIDC issuer URL or OAuth authority URL.
/// For Azure: `https://login.microsoftonline.com/{tenant_id}/v2.0`
pub issuer_url: String,
/// Application / Client ID.
pub client_id: String,
/// OAuth scopes to request. For Azure managed identity, exactly one scope
/// or resource is required. For example: `["api://{app_id}/.default"]`
pub scopes: Vec<String>,
/// Authentication flow: "client_credentials" or "azure_managed_identity"
pub flow: Option<String>,
/// Client secret (required for client_credentials).
pub client_secret: Option<String>,
/// Client ID for user-assigned managed identity (azure_managed_identity).
pub managed_identity_client_id: Option<String>,
/// Seconds before expiry to trigger proactive refresh (default: 300).
/// Keep this well below the token TTL; if it is greater than or equal to
/// the TTL, each request refreshes the token.
pub refresh_buffer_secs: Option<u32>,
}
impl std::fmt::Debug for OAuthConfig {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("OAuthConfig")
.field("issuer_url", &self.issuer_url)
.field("client_id", &self.client_id)
.field("scopes", &self.scopes)
.field("flow", &self.flow)
.field(
"client_secret",
&self.client_secret.as_deref().map(|_| "<redacted>"),
)
.field(
"managed_identity_client_id",
&self.managed_identity_client_id,
)
.field("refresh_buffer_secs", &self.refresh_buffer_secs)
.finish()
}
}
impl TryFrom<OAuthConfig> for lancedb::remote::oauth::OAuthConfig {
type Error = Error;
fn try_from(config: OAuthConfig) -> Result<Self, Self::Error> {
use lancedb::remote::oauth::OAuthFlow;
let flow = match config.flow.as_deref().unwrap_or("client_credentials") {
"client_credentials" => OAuthFlow::ClientCredentials,
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
client_id: config.managed_identity_client_id,
},
other => {
return Err(Error::InvalidInput {
message: format!("Unknown OAuth flow type: {other}"),
});
}
};
Ok(Self {
issuer_url: config.issuer_url,
client_id: config.client_id,
client_secret: config.client_secret,
scopes: config.scopes,
flow,
refresh_buffer_secs: config.refresh_buffer_secs.map(|v| v as u64),
})
}
}
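
For readers on the TypeScript side, the flow-string mapping in the `TryFrom` impl above can be rendered roughly as follows. `NativeFlow` and `parseFlow` are hypothetical names used only for this sketch; the binding does not export them, and the discriminated union is an assumed TypeScript analogue of the Rust `OAuthFlow` enum.

```typescript
// TypeScript rendering of the flow-string mapping, for illustration only.
type NativeFlow =
  | { kind: "client_credentials" }
  | { kind: "azure_managed_identity"; clientId: string | undefined };

function parseFlow(flow?: string, managedIdentityClientId?: string): NativeFlow {
  // A missing flow falls back to client credentials, as in the Rust code.
  switch (flow ?? "client_credentials") {
    case "client_credentials":
      return { kind: "client_credentials" };
    case "azure_managed_identity":
      // The managed identity client ID is optional and passed through unchanged.
      return { kind: "azure_managed_identity", clientId: managedIdentityClientId };
    default:
      // Reject typos instead of silently picking a default.
      throw new Error(`Unknown OAuth flow type: ${flow}`);
  }
}

console.log(parseFlow().kind);
console.log(parseFlow("azure_managed_identity", "mi-client").kind);
```

Because the native `flow` field is a plain string, a misspelled value only surfaces at conversion time; the public `OAuthFlowType` enum avoids that at compile time.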
impl From<ClientConfig> for lancedb::remote::ClientConfig { impl From<ClientConfig> for lancedb::remote::ClientConfig {
fn from(config: ClientConfig) -> Self { fn from(config: ClientConfig) -> Self {
Self { Self {
@@ -156,3 +235,45 @@ impl From<ClientConfig> for lancedb::remote::ClientConfig {
} }
} }
} }
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_unknown_oauth_flow_returns_invalid_input() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: Some("typo".to_string()),
client_secret: None,
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let err = lancedb::remote::oauth::OAuthConfig::try_from(config).unwrap_err();
assert!(matches!(
err,
Error::InvalidInput { message }
if message == "Unknown OAuth flow type: typo"
));
}
#[test]
fn test_oauth_config_debug_redacts_client_secret() {
let config = OAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: Some("client_credentials".to_string()),
client_secret: Some("super-secret".to_string()),
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let debug = format!("{config:?}");
assert!(!debug.contains("super-secret"));
assert!(debug.contains("client_secret: Some(\"<redacted>\")"));
}
}
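
The hand-written `Debug` impl and its test exist so that the client secret never appears in logs. The same precaution is worth taking in application code that logs its own config objects. A minimal sketch, assuming a hypothetical `redactForLogging` helper and a cut-down `LoggableOAuthConfig` interface (neither is part of `@lancedb/lancedb`):

```typescript
// Cut-down config shape for the sketch (assumption: field names follow OAuthConfig).
interface LoggableOAuthConfig {
  issuerUrl: string;
  clientId: string;
  scopes: string[];
  clientSecret?: string;
}

// Hypothetical helper: returns a loggable copy with the secret masked.
function redactForLogging(config: LoggableOAuthConfig): LoggableOAuthConfig {
  // Copy first so the caller's object keeps the real secret.
  const copy: LoggableOAuthConfig = { ...config, scopes: [...config.scopes] };
  if (copy.clientSecret !== undefined) {
    copy.clientSecret = "<redacted>";
  }
  return copy;
}

const original: LoggableOAuthConfig = {
  issuerUrl: "https://issuer.example.com",
  clientId: "client-id",
  scopes: ["scope"],
  clientSecret: "super-secret",
};

// Serialize only the redacted copy.
const printed = JSON.stringify(redactForLogging(original));
console.log(printed);
```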


@@ -1,5 +1,5 @@
[tool.bumpversion] [tool.bumpversion]
current_version = "0.34.0-beta.4" current_version = "0.34.0-beta.5"
parse = """(?x) parse = """(?x)
(?P<major>0|[1-9]\\d*)\\. (?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\. (?P<minor>0|[1-9]\\d*)\\.


@@ -1,6 +1,6 @@
[package] [package]
name = "lancedb-python" name = "lancedb-python"
version = "0.34.0-beta.4" version = "0.34.0-beta.5"
publish = false publish = false
edition.workspace = true edition.workspace = true
description = "Python bindings for LanceDB" description = "Python bindings for LanceDB"


@@ -89,6 +89,8 @@ def connect(
If presented, connect to LanceDB cloud. If presented, connect to LanceDB cloud.
Otherwise, connect to a database on file system or cloud storage. Otherwise, connect to a database on file system or cloud storage.
Can be set via environment variable `LANCEDB_API_KEY`. Can be set via environment variable `LANCEDB_API_KEY`.
OAuth configuration is currently supported only by ``connect_async``;
synchronous LanceDB Cloud connections require an API key.
region: str, default "us-east-1" region: str, default "us-east-1"
The region to use for LanceDB Cloud. The region to use for LanceDB Cloud.
host_override: str, optional host_override: str, optional
@@ -340,6 +342,7 @@ async def connect_async(
session: Optional[Session] = None, session: Optional[Session] = None,
manifest_enabled: bool = False, manifest_enabled: bool = False,
namespace_client_properties: Optional[Dict[str, str]] = None, namespace_client_properties: Optional[Dict[str, str]] = None,
oauth_config=None,
) -> AsyncConnection: ) -> AsyncConnection:
"""Connect to a LanceDB database. """Connect to a LanceDB database.
@@ -389,6 +392,10 @@ async def connect_async(
namespace_client_properties : dict, optional namespace_client_properties : dict, optional
Additional directory namespace client properties to use with Additional directory namespace client properties to use with
``manifest_enabled=True``. ``manifest_enabled=True``.
oauth_config : OAuthConfig, optional
OAuth configuration for LanceDB Cloud/Enterprise. This is supported by
``connect_async`` only; synchronous ``connect`` uses API key
authentication for ``db://`` URIs.
Examples Examples
-------- --------
@@ -435,6 +442,7 @@ async def connect_async(
session, session,
manifest_enabled, manifest_enabled,
namespace_client_properties, namespace_client_properties,
oauth_config,
) )
) )


@@ -280,6 +280,24 @@ async def connect(
session: Optional[Session], session: Optional[Session],
manifest_enabled: bool = False, manifest_enabled: bool = False,
namespace_client_properties: Optional[Dict[str, str]] = None, namespace_client_properties: Optional[Dict[str, str]] = None,
oauth_config: Optional[Any] = None,
) -> Connection: ...
def connect_namespace(
namespace_client_impl: str,
namespace_client_properties: Dict[str, str],
read_consistency_interval: Optional[float] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
) -> Connection: ...
def connect_namespace_client(
namespace_client: Any,
read_consistency_interval: Optional[float] = None,
storage_options: Optional[Dict[str, str]] = None,
session: Optional[Session] = None,
namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None,
) -> Connection: ... ) -> Connection: ...
class RecordBatchStream: class RecordBatchStream:


@@ -51,6 +51,15 @@ class LanceMergeInsertBuilder(object):
If there are multiple matches then the behavior is undefined. If there are multiple matches then the behavior is undefined.
Currently this causes multiple copies of the row to be created Currently this causes multiple copies of the row to be created
but that behavior is subject to change. but that behavior is subject to change.
Parameters
----------
where: Optional[str], default None
An optional filter to limit which rows are updated. Column
references in this expression must be prefixed with "target."
to refer to the existing table data. For example, to only
update rows where the existing color is red, use:
``where="target.color = 'red'"``
""" """
self._when_matched_update_all = True self._when_matched_update_all = True
self._when_matched_update_all_condition = where self._when_matched_update_all_condition = where


@@ -38,15 +38,13 @@ from lance_namespace_urllib3_client.models.query_table_request_vector import (
QueryTableRequestVector, QueryTableRequestVector,
) )
from lance_namespace_urllib3_client.models.string_fts_query import StringFtsQuery from lance_namespace_urllib3_client.models.string_fts_query import StringFtsQuery
from lance_namespace.errors import TableNotFoundError from lance_namespace.errors import NamespaceNotEmptyError, TableNotFoundError
from lancedb._lancedb import connect_namespace_client as _connect_namespace_client from lancedb._lancedb import (
connect_namespace as _connect_namespace,
connect_namespace_client as _connect_namespace_client,
)
from lancedb.background_loop import LOOP from lancedb.background_loop import LOOP
from lancedb.db import AsyncConnection, DBConnection from lancedb.db import AsyncConnection, DBConnection
from lancedb.namespace_utils import (
_normalize_create_namespace_mode,
_normalize_drop_namespace_mode,
_normalize_drop_namespace_behavior,
)
from lance_namespace import ( from lance_namespace import (
LanceNamespace, LanceNamespace,
connect as namespace_connect, connect as namespace_connect,
@@ -55,13 +53,6 @@ from lance_namespace import (
DropNamespaceResponse, DropNamespaceResponse,
ListNamespacesResponse, ListNamespacesResponse,
ListTablesResponse, ListTablesResponse,
ListTablesRequest,
DescribeNamespaceRequest,
DropTableRequest,
RenameTableRequest,
ListNamespacesRequest,
CreateNamespaceRequest,
DropNamespaceRequest,
) )
from lancedb.table import AsyncTable, LanceTable, Table from lancedb.table import AsyncTable, LanceTable, Table
from lancedb.util import validate_table_name from lancedb.util import validate_table_name
@@ -386,6 +377,10 @@ def _builds_namespace_natively(
return namespace_client_impl == "rest" and bool(namespace_client_properties) return namespace_client_impl == "rest" and bool(namespace_client_properties)
def _supports_native_namespace(namespace_client_impl: str) -> bool:
return namespace_client_impl in {"dir", "rest"}
class LanceNamespaceDBConnection(DBConnection): class LanceNamespaceDBConnection(DBConnection):
""" """
A LanceDB connection that uses a namespace for table management. A LanceDB connection that uses a namespace for table management.
@@ -396,7 +391,7 @@ class LanceNamespaceDBConnection(DBConnection):
def __init__( def __init__(
self, self,
namespace_client: LanceNamespace, namespace_client: Optional[LanceNamespace] = None,
*, *,
read_consistency_interval: Optional[timedelta] = None, read_consistency_interval: Optional[timedelta] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
@@ -404,6 +399,7 @@ class LanceNamespaceDBConnection(DBConnection):
namespace_client_pushdown_operations: Optional[List[str]] = None, namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None, namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None, namespace_client_properties: Optional[Dict[str, str]] = None,
_inner: Optional[AsyncConnection] = None,
): ):
""" """
Initialize a namespace-based LanceDB connection. Initialize a namespace-based LanceDB connection.
@@ -445,30 +441,36 @@ class LanceNamespaceDBConnection(DBConnection):
) )
self._namespace_client_impl = namespace_client_impl self._namespace_client_impl = namespace_client_impl
self._namespace_client_properties = namespace_client_properties self._namespace_client_properties = namespace_client_properties
# When the namespace client is built natively (see Rust # When the namespace connection or client is built natively in Rust, the
# ``build_namespace_natively``), the underlying Rust table performs # underlying Rust table performs QueryTable pushdown through the
# QueryTable pushdown through the read-freshness context provider, which # read-freshness context provider, which the pure-Python ``query_table``
# the pure-Python ``query_table`` path bypasses. # path bypasses.
self._route_pushdown_to_rust = _builds_namespace_natively( self._route_pushdown_to_rust = _inner is not None or _builds_namespace_natively(
namespace_client_impl, namespace_client_properties namespace_client_impl, namespace_client_properties
) )
self._inner = AsyncConnection( if _inner is not None:
_connect_namespace_client( self._inner = _inner
namespace_client, else:
read_consistency_interval=( if namespace_client is None:
read_consistency_interval.total_seconds() raise ValueError("namespace_client is required without a native _inner")
if read_consistency_interval is not None self._inner = AsyncConnection(
else None _connect_namespace_client(
), namespace_client,
storage_options=self.storage_options or None, read_consistency_interval=(
session=session, read_consistency_interval.total_seconds()
namespace_client_pushdown_operations=( if read_consistency_interval is not None
list(self._namespace_client_pushdown_operations) else None
), ),
namespace_client_impl=namespace_client_impl, storage_options=self.storage_options or None,
namespace_client_properties=namespace_client_properties, session=session,
namespace_client_pushdown_operations=(
list(self._namespace_client_pushdown_operations)
),
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)
) )
) self._uri = self._inner.uri
@override @override
def serialize(self) -> str: def serialize(self) -> str:
@@ -514,11 +516,11 @@ class LanceNamespaceDBConnection(DBConnection):
) )
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListTablesRequest( return LOOP.run(
id=namespace_path, page_token=page_token, limit=limit self._inner.table_names(
namespace_path=namespace_path, start_after=page_token, limit=limit
)
) )
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
@override @override
def create_table( def create_table(
@@ -589,8 +591,8 @@ class LanceNamespaceDBConnection(DBConnection):
index_cache_size=index_cache_size, index_cache_size=index_cache_size,
) )
) )
except RuntimeError as e: except (RuntimeError, ValueError) as e:
if "Table not found" in str(e): if "Table not found" in str(e) or "was not found" in str(e):
table_id = namespace_path + [name] table_id = namespace_path + [name]
raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}") raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}")
raise raise
@@ -612,12 +614,9 @@ class LanceNamespaceDBConnection(DBConnection):
@override @override
def drop_table(self, name: str, namespace_path: Optional[List[str]] = None): def drop_table(self, name: str, namespace_path: Optional[List[str]] = None):
# Use namespace drop_table directly
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
table_id = namespace_path + [name] LOOP.run(self._inner.drop_table(name, namespace_path=namespace_path))
request = DropTableRequest(id=table_id)
self._namespace_client.drop_table(request)
@override @override
def rename_table( def rename_table(
@@ -631,14 +630,19 @@ class LanceNamespaceDBConnection(DBConnection):
cur_namespace_path = [] cur_namespace_path = []
if new_namespace_path is None: if new_namespace_path is None:
new_namespace_path = [] new_namespace_path = []
cur_table_id = cur_namespace_path + [cur_name] try:
new_namespace_id = new_namespace_path if new_namespace_path else None LOOP.run(
request = RenameTableRequest( self._inner.rename_table(
id=cur_table_id, cur_name,
new_table_name=new_name, new_name,
new_namespace_id=new_namespace_id, cur_namespace_path=cur_namespace_path,
) new_namespace_path=new_namespace_path,
self._namespace_client.rename_table(request) )
)
except RuntimeError as e:
if "rename_table not implemented" in str(e):
raise NotImplementedError("rename_table not implemented") from e
raise
@override @override
def drop_database(self): def drop_database(self):
@@ -650,8 +654,7 @@ class LanceNamespaceDBConnection(DBConnection):
def drop_all_tables(self, namespace_path: Optional[List[str]] = None): def drop_all_tables(self, namespace_path: Optional[List[str]] = None):
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
for table_name in self.table_names(namespace_path=namespace_path): LOOP.run(self._inner.drop_all_tables(namespace_path=namespace_path))
self.drop_table(table_name, namespace_path=namespace_path)
@override @override
def list_namespaces( def list_namespaces(
@@ -681,13 +684,10 @@ class LanceNamespaceDBConnection(DBConnection):
""" """
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListNamespacesRequest( return LOOP.run(
id=namespace_path, page_token=page_token, limit=limit self._inner.list_namespaces(
) namespace_path=namespace_path, page_token=page_token, limit=limit
response = self._namespace_client.list_namespaces(request) )
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
) )
@override @override
@@ -715,14 +715,12 @@ class LanceNamespaceDBConnection(DBConnection):
CreateNamespaceResponse CreateNamespaceResponse
Response containing the properties of the created namespace. Response containing the properties of the created namespace.
""" """
request = CreateNamespaceRequest( return LOOP.run(
id=namespace_path, self._inner.create_namespace(
mode=_normalize_create_namespace_mode(mode), namespace_path=namespace_path,
properties=properties, mode=mode,
) properties=properties,
response = self._namespace_client.create_namespace(request) )
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
) )
@override @override
@@ -750,20 +748,18 @@ class LanceNamespaceDBConnection(DBConnection):
DropNamespaceResponse DropNamespaceResponse
Response containing properties and transaction_id if applicable. Response containing properties and transaction_id if applicable.
""" """
request = DropNamespaceRequest( try:
id=namespace_path, return LOOP.run(
mode=_normalize_drop_namespace_mode(mode), self._inner.drop_namespace(
behavior=_normalize_drop_namespace_behavior(behavior), namespace_path=namespace_path,
) mode=mode,
response = self._namespace_client.drop_namespace(request) behavior=behavior,
return DropNamespaceResponse( )
properties=( )
response.properties if hasattr(response, "properties") else None except RuntimeError as e:
), if "Namespace not empty" in str(e):
transaction_id=( raise NamespaceNotEmptyError(str(e)) from e
response.transaction_id if hasattr(response, "transaction_id") else None raise
),
)
@override @override
def describe_namespace( def describe_namespace(
@@ -782,11 +778,7 @@ class LanceNamespaceDBConnection(DBConnection):
DescribeNamespaceResponse DescribeNamespaceResponse
Response containing the namespace properties. Response containing the namespace properties.
""" """
request = DescribeNamespaceRequest(id=namespace_path) return LOOP.run(self._inner.describe_namespace(namespace_path))
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
@override @override
def list_tables( def list_tables(
@@ -816,13 +808,10 @@ class LanceNamespaceDBConnection(DBConnection):
""" """
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListTablesRequest( return LOOP.run(
id=namespace_path, page_token=page_token, limit=limit self._inner.list_tables(
) namespace_path=namespace_path, page_token=page_token, limit=limit
response = self._namespace_client.list_tables(request) )
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
) )
def _lance_table_from_uri( def _lance_table_from_uri(
@@ -878,6 +867,18 @@ class LanceNamespaceDBConnection(DBConnection):
LanceNamespace LanceNamespace
The namespace client for this connection. The namespace client for this connection.
""" """
if self._namespace_client is None:
if (
self._namespace_client_impl is None
or self._namespace_client_properties is None
):
raise ValueError(
"Cannot construct a Python namespace client without "
"namespace implementation properties"
)
self._namespace_client = namespace_connect(
self._namespace_client_impl, self._namespace_client_properties
)
return self._namespace_client return self._namespace_client
@@ -891,7 +892,7 @@ class AsyncLanceNamespaceDBConnection:
def __init__( def __init__(
self, self,
namespace_client: LanceNamespace, namespace_client: Optional[LanceNamespace] = None,
*, *,
read_consistency_interval: Optional[timedelta] = None, read_consistency_interval: Optional[timedelta] = None,
storage_options: Optional[Dict[str, str]] = None, storage_options: Optional[Dict[str, str]] = None,
@@ -899,6 +900,7 @@ class AsyncLanceNamespaceDBConnection:
namespace_client_pushdown_operations: Optional[List[str]] = None, namespace_client_pushdown_operations: Optional[List[str]] = None,
namespace_client_impl: Optional[str] = None, namespace_client_impl: Optional[str] = None,
namespace_client_properties: Optional[Dict[str, str]] = None, namespace_client_properties: Optional[Dict[str, str]] = None,
_inner: Optional[AsyncConnection] = None,
): ):
""" """
Initialize an async namespace-based LanceDB connection. Initialize an async namespace-based LanceDB connection.
@@ -940,29 +942,35 @@ class AsyncLanceNamespaceDBConnection:
) )
self._namespace_client_impl = namespace_client_impl self._namespace_client_impl = namespace_client_impl
self._namespace_client_properties = namespace_client_properties self._namespace_client_properties = namespace_client_properties
# See LanceNamespaceDBConnection: when built natively the Rust table runs # See LanceNamespaceDBConnection: when Rust owns the namespace
# QueryTable pushdown through the read-freshness provider, so defer to it # connection/client, its table performs QueryTable pushdown through the
# rather than the urllib3 client (which omits x-lancedb-min-timestamp). # read-freshness provider, so defer to it rather than the urllib3 client
self._route_pushdown_to_rust = _builds_namespace_natively( # path (which omits x-lancedb-min-timestamp).
self._route_pushdown_to_rust = _inner is not None or _builds_namespace_natively(
namespace_client_impl, namespace_client_properties namespace_client_impl, namespace_client_properties
) )
self._inner = AsyncConnection( if _inner is not None:
_connect_namespace_client( self._inner = _inner
namespace_client, else:
read_consistency_interval=( if namespace_client is None:
read_consistency_interval.total_seconds() raise ValueError("namespace_client is required without a native _inner")
if read_consistency_interval is not None self._inner = AsyncConnection(
else None _connect_namespace_client(
), namespace_client,
storage_options=self.storage_options or None, read_consistency_interval=(
session=session, read_consistency_interval.total_seconds()
namespace_client_pushdown_operations=( if read_consistency_interval is not None
list(self._namespace_client_pushdown_operations) else None
), ),
namespace_client_impl=namespace_client_impl, storage_options=self.storage_options or None,
namespace_client_properties=namespace_client_properties, session=session,
namespace_client_pushdown_operations=(
list(self._namespace_client_pushdown_operations)
),
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
)
) )
)
async def table_names( async def table_names(
self, self,
@@ -986,11 +994,9 @@ class AsyncLanceNamespaceDBConnection:
) )
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListTablesRequest( return await self._inner.table_names(
id=namespace_path, page_token=page_token, limit=limit namespace_path=namespace_path, start_after=page_token, limit=limit
) )
response = self._namespace_client.list_tables(request)
return response.tables if response.tables else []
async def create_table( async def create_table(
self, self,
@@ -1053,8 +1059,8 @@ class AsyncLanceNamespaceDBConnection:
storage_options=storage_options, storage_options=storage_options,
index_cache_size=index_cache_size, index_cache_size=index_cache_size,
) )
except RuntimeError as e: except (RuntimeError, ValueError) as e:
if "Table not found" in str(e): if "Table not found" in str(e) or "was not found" in str(e):
table_id = namespace_path + [name] table_id = namespace_path + [name]
raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}") raise TableNotFoundError(f"Table not found: {'$'.join(table_id)}")
raise raise
@@ -1075,9 +1081,7 @@ class AsyncLanceNamespaceDBConnection:
"""Drop a table from the namespace.""" """Drop a table from the namespace."""
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
table_id = namespace_path + [name] await self._inner.drop_table(name, namespace_path=namespace_path)
request = DropTableRequest(id=table_id)
self._namespace_client.drop_table(request)
async def rename_table( async def rename_table(
self, self,
@@ -1091,14 +1095,17 @@ class AsyncLanceNamespaceDBConnection:
cur_namespace_path = [] cur_namespace_path = []
if new_namespace_path is None: if new_namespace_path is None:
new_namespace_path = [] new_namespace_path = []
cur_table_id = cur_namespace_path + [cur_name] try:
new_namespace_id = new_namespace_path if new_namespace_path else None await self._inner.rename_table(
request = RenameTableRequest( cur_name,
id=cur_table_id, new_name,
new_table_name=new_name, cur_namespace_path=cur_namespace_path,
new_namespace_id=new_namespace_id, new_namespace_path=new_namespace_path,
) )
self._namespace_client.rename_table(request) except RuntimeError as e:
if "rename_table not implemented" in str(e):
raise NotImplementedError("rename_table not implemented") from e
raise
async def drop_database(self): async def drop_database(self):
"""Deprecated method.""" """Deprecated method."""
@@ -1110,9 +1117,7 @@ class AsyncLanceNamespaceDBConnection:
"""Drop all tables in the namespace.""" """Drop all tables in the namespace."""
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
table_names = await self.table_names(namespace_path=namespace_path) await self._inner.drop_all_tables(namespace_path=namespace_path)
for table_name in table_names:
await self.drop_table(table_name, namespace_path=namespace_path)
async def list_namespaces( async def list_namespaces(
self, self,
@@ -1141,13 +1146,8 @@ class AsyncLanceNamespaceDBConnection:
""" """
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListNamespacesRequest( return await self._inner.list_namespaces(
id=namespace_path, page_token=page_token, limit=limit namespace_path=namespace_path, page_token=page_token, limit=limit
)
response = self._namespace_client.list_namespaces(request)
return ListNamespacesResponse(
namespaces=response.namespaces if response.namespaces else [],
page_token=response.page_token,
) )
async def create_namespace( async def create_namespace(
@@ -1174,15 +1174,11 @@ class AsyncLanceNamespaceDBConnection:
CreateNamespaceResponse CreateNamespaceResponse
Response containing the properties of the created namespace. Response containing the properties of the created namespace.
""" """
request = CreateNamespaceRequest( return await self._inner.create_namespace(
id=namespace_path, namespace_path=namespace_path,
mode=_normalize_create_namespace_mode(mode), mode=mode,
properties=properties, properties=properties,
) )
response = self._namespace_client.create_namespace(request)
return CreateNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
async def drop_namespace( async def drop_namespace(
self, self,
@@ -1208,20 +1204,16 @@ class AsyncLanceNamespaceDBConnection:
DropNamespaceResponse DropNamespaceResponse
Response containing properties and transaction_id if applicable. Response containing properties and transaction_id if applicable.
""" """
request = DropNamespaceRequest( try:
id=namespace_path, return await self._inner.drop_namespace(
mode=_normalize_drop_namespace_mode(mode), namespace_path=namespace_path,
behavior=_normalize_drop_namespace_behavior(behavior), mode=mode,
) behavior=behavior,
response = self._namespace_client.drop_namespace(request) )
return DropNamespaceResponse( except RuntimeError as e:
properties=( if "Namespace not empty" in str(e):
response.properties if hasattr(response, "properties") else None raise NamespaceNotEmptyError(str(e)) from e
), raise
transaction_id=(
response.transaction_id if hasattr(response, "transaction_id") else None
),
)
async def describe_namespace( async def describe_namespace(
self, namespace_path: List[str] self, namespace_path: List[str]
@@ -1239,11 +1231,7 @@ class AsyncLanceNamespaceDBConnection:
DescribeNamespaceResponse DescribeNamespaceResponse
Response containing the namespace properties. Response containing the namespace properties.
""" """
request = DescribeNamespaceRequest(id=namespace_path) return await self._inner.describe_namespace(namespace_path)
response = self._namespace_client.describe_namespace(request)
return DescribeNamespaceResponse(
properties=response.properties if hasattr(response, "properties") else None
)
async def list_tables( async def list_tables(
self, self,
@@ -1272,13 +1260,8 @@ class AsyncLanceNamespaceDBConnection:
""" """
if namespace_path is None: if namespace_path is None:
namespace_path = [] namespace_path = []
request = ListTablesRequest( return await self._inner.list_tables(
id=namespace_path, page_token=page_token, limit=limit namespace_path=namespace_path, page_token=page_token, limit=limit
)
response = self._namespace_client.list_tables(request)
return ListTablesResponse(
tables=response.tables if response.tables else [],
page_token=response.page_token,
) )
async def namespace_client(self) -> LanceNamespace: async def namespace_client(self) -> LanceNamespace:
@@ -1292,6 +1275,18 @@ class AsyncLanceNamespaceDBConnection:
LanceNamespace LanceNamespace
The namespace client for this connection. The namespace client for this connection.
""" """
if self._namespace_client is None:
if (
self._namespace_client_impl is None
or self._namespace_client_properties is None
):
raise ValueError(
"Cannot construct a Python namespace client without "
"namespace implementation properties"
)
self._namespace_client = namespace_connect(
self._namespace_client_impl, self._namespace_client_properties
)
return self._namespace_client return self._namespace_client
@@ -1342,6 +1337,32 @@ def connect_namespace(
LanceNamespaceDBConnection LanceNamespaceDBConnection
A namespace-based connection to LanceDB A namespace-based connection to LanceDB
""" """
if _supports_native_namespace(namespace_client_impl):
inner = AsyncConnection(
_connect_namespace(
namespace_client_impl,
namespace_client_properties,
read_consistency_interval=(
read_consistency_interval.total_seconds()
if read_consistency_interval is not None
else None
),
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
)
)
return LanceNamespaceDBConnection(
namespace_client=None,
read_consistency_interval=read_consistency_interval,
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
_inner=inner,
)
namespace_client = namespace_connect( namespace_client = namespace_connect(
namespace_client_impl, namespace_client_properties namespace_client_impl, namespace_client_properties
) )
@@ -1417,6 +1438,32 @@ def connect_namespace_async(
... tables = await db.table_names() ... tables = await db.table_names()
... table = await db.create_table("my_table", schema=schema) ... table = await db.create_table("my_table", schema=schema)
""" """
if _supports_native_namespace(namespace_client_impl):
inner = AsyncConnection(
_connect_namespace(
namespace_client_impl,
namespace_client_properties,
read_consistency_interval=(
read_consistency_interval.total_seconds()
if read_consistency_interval is not None
else None
),
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
)
)
return AsyncLanceNamespaceDBConnection(
namespace_client=None,
read_consistency_interval=read_consistency_interval,
storage_options=storage_options,
session=session,
namespace_client_pushdown_operations=namespace_client_pushdown_operations,
namespace_client_impl=namespace_client_impl,
namespace_client_properties=namespace_client_properties,
_inner=inner,
)
namespace_client = namespace_connect( namespace_client = namespace_connect(
namespace_client_impl, namespace_client_properties namespace_client_impl, namespace_client_properties
) )
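
The `drop_namespace` changes in this file catch a generic `RuntimeError` coming from the native connection and re-raise it as a typed error when the message matches. A minimal sketch of that pattern follows. It assumes a stand-in exception class and a hypothetical helper `translate_drop_namespace_error`; neither name comes from the diff, and the real code performs the `try`/`except` inline.

```python
class NamespaceNotEmptyError(Exception):
    """Hypothetical stand-in for lance_namespace.errors.NamespaceNotEmptyError."""


def translate_drop_namespace_error(call):
    """Run `call` and map a matching RuntimeError to a typed error.

    Mirrors the message check used in the diff; the helper itself is
    an illustration, not part of LanceDB.
    """
    try:
        return call()
    except RuntimeError as e:
        if "Namespace not empty" in str(e):
            # Chain the original error so the native message stays visible.
            raise NamespaceNotEmptyError(str(e)) from e
        raise


def failing_drop():
    # Simulates the error text assumed to come from the native layer.
    raise RuntimeError("Namespace not empty: team")


def unrelated_failure():
    raise RuntimeError("connection reset")


try:
    translate_drop_namespace_error(failing_drop)
    caught = None
except NamespaceNotEmptyError as e:
    caught = e

try:
    translate_drop_namespace_error(unrelated_failure)
    passthrough = None
except RuntimeError as e:
    passthrough = e
```

Matching on the message string is fragile if the native wording changes, which is presumably why the diff keeps the bare `raise` as a fallback for anything it does not recognize.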


@@ -48,6 +48,14 @@ class PermutationBuilder:
By default, the permutation builder will create a single split that contains all By default, the permutation builder will create a single split that contains all
rows in the same order as the base table. rows in the same order as the base table.
""" """
if not hasattr(table, "_inner"):
raise TypeError(
f"PermutationBuilder requires a local LanceTable, "
f"got {type(table).__name__}. "
"The permutation API is not supported on remote tables. "
"Remote tables connect to LanceDB Cloud or Enterprise and do not have "
"direct access to the underlying Lance dataset needed for permutations."
)
self._async = async_permutation_builder(table) self._async = async_permutation_builder(table)
def split_random( def split_random(
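
The guard added to `PermutationBuilder.__init__` above is a duck-typing check: a table without an `_inner` attribute is treated as remote and rejected early. The sketch below illustrates the idea with two hypothetical stand-in classes (`LocalTable`, `RemoteTable`) and a hypothetical helper `require_local`; none of these names exist in LanceDB, and the error text is shortened.

```python
class LocalTable:
    """Hypothetical local table: exposes the `_inner` handle the guard looks for."""

    def __init__(self):
        self._inner = object()


class RemoteTable:
    """Hypothetical remote table: has no `_inner` attribute."""


def require_local(table):
    """Reject tables that lack `_inner`, as the guard in the diff does."""
    if not hasattr(table, "_inner"):
        raise TypeError(
            "PermutationBuilder requires a local LanceTable, "
            f"got {type(table).__name__}."
        )
    return table


accepted = require_local(LocalTable())

try:
    require_local(RemoteTable())
    message = None
except TypeError as e:
    message = str(e)
```

Failing in the constructor gives the caller a clear `TypeError` that names the offending type, rather than an `AttributeError` from deeper in the builder.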


@@ -119,6 +119,27 @@ def _filter_to_sql(filter: Optional[Union[str, Expr]]) -> Optional[str]:
return filter return filter
def _combine_where(
existing: Optional[Union[str, Expr]], new: Union[str, Expr]
) -> Union[str, Expr]:
"""Combine a new filter with an existing one using a logical AND.
Calling ``where`` more than once composes the filters with AND instead of
replacing the previous filter. Two :class:`~lancedb.expr.Expr` filters are
combined as an expression; otherwise both filters are lowered to SQL strings
and combined as SQL.
"""
if existing is None:
return new
existing_is_expr = isinstance(existing, Expr)
new_is_expr = isinstance(new, Expr)
if existing_is_expr and new_is_expr:
return existing & new
existing_sql = existing.to_sql() if existing_is_expr else existing
new_sql = new.to_sql() if new_is_expr else new
return f"({existing_sql}) AND ({new_sql})"
def _projection_to_scanner_kwargs( def _projection_to_scanner_kwargs(
columns: Optional[ columns: Optional[
Union[ Union[
@@ -1148,8 +1169,13 @@ class LanceQueryBuilder(ABC):
------- -------
LanceQueryBuilder LanceQueryBuilder
The LanceQueryBuilder object. The LanceQueryBuilder object.
Notes
-----
Calling this multiple times combines the filters with a logical AND
rather than replacing the previous filter.
""" """
self._where = where self._where = _combine_where(self._where, where)
self._postfilter = not prefilter self._postfilter = not prefilter
return self return self
@@ -1693,8 +1719,13 @@ class LanceVectorQueryBuilder(LanceQueryBuilder):
------- -------
LanceQueryBuilder LanceQueryBuilder
The LanceQueryBuilder object. The LanceQueryBuilder object.
Notes
-----
Calling this multiple times combines the filters with a logical AND
rather than replacing the previous filter.
""" """
self._where = where self._where = _combine_where(self._where, where)
if prefilter is not None: if prefilter is not None:
self._postfilter = not prefilter self._postfilter = not prefilter
return self return self
@@ -2894,6 +2925,9 @@ class AsyncStandardQuery(AsyncQueryBase):
Filtering performance can often be improved by creating a scalar index Filtering performance can often be improved by creating a scalar index
on the filter column(s). on the filter column(s).
Calling this multiple times combines the filters with a logical AND
rather than replacing the previous filter.
""" """
if isinstance(predicate, Expr): if isinstance(predicate, Expr):
self._inner.where_expr(predicate._inner) self._inner.where_expr(predicate._inner)
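
The `_combine_where` helper introduced in this file can be exercised in isolation. The following is a minimal sketch that copies its logic and substitutes a hypothetical `Expr` stand-in for `lancedb.expr.Expr`. The stand-in only models `to_sql()` and `&`, and the way it builds the combined SQL is an assumption, not the library's behavior.

```python
from typing import Optional, Union


class Expr:
    """Hypothetical stand-in for lancedb.expr.Expr, for illustration only."""

    def __init__(self, sql: str):
        self._sql = sql

    def to_sql(self) -> str:
        return self._sql

    def __and__(self, other: "Expr") -> "Expr":
        # Assumed behavior: AND of two expressions stays an expression.
        return Expr(f"({self._sql}) AND ({other._sql})")


def combine_where(
    existing: Optional[Union[str, Expr]], new: Union[str, Expr]
) -> Union[str, Expr]:
    """Combine filters with AND, following the logic shown in the diff."""
    if existing is None:
        return new
    if isinstance(existing, Expr) and isinstance(new, Expr):
        return existing & new
    # Mixed or string-only filters are lowered to SQL and joined as strings.
    existing_sql = existing.to_sql() if isinstance(existing, Expr) else existing
    new_sql = new.to_sql() if isinstance(new, Expr) else new
    return f"({existing_sql}) AND ({new_sql})"


first = combine_where(None, "price > 10")
both = combine_where(first, "color = 'red'")
mixed = combine_where(Expr("price > 10"), "color = 'red'")
exprs = combine_where(Expr("a = 1"), Expr("b = 2"))
```

Wrapping each side in parentheses keeps operator precedence intact when an earlier filter contains `OR`, so a chained `where` narrows the result instead of widening it.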


@@ -9,6 +9,7 @@ from typing import List, Optional
from lancedb import __version__ from lancedb import __version__
from .header import HeaderProvider from .header import HeaderProvider
from .oauth import OAuthConfig, OAuthFlowType
__all__ = [ __all__ = [
"TimeoutConfig", "TimeoutConfig",
@@ -16,6 +17,8 @@ __all__ = [
"TlsConfig", "TlsConfig",
"ClientConfig", "ClientConfig",
"HeaderProvider", "HeaderProvider",
"OAuthConfig",
"OAuthFlowType",
] ]


@@ -0,0 +1,75 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional
class OAuthFlowType(str, Enum):
"""OAuth authentication flow types."""
CLIENT_CREDENTIALS = "client_credentials"
"""Client Credentials grant (service-to-service / M2M)."""
AZURE_MANAGED_IDENTITY = "azure_managed_identity"
"""Azure Managed Identity via IMDS."""
@dataclass
class OAuthConfig:
"""OAuth configuration for LanceDB authentication.
All token acquisition and refresh is handled in the Rust layer.
This config is passed through to Rust via PyO3.
Parameters
----------
issuer_url : str
OIDC issuer URL or OAuth authority URL.
For Azure: ``https://login.microsoftonline.com/{tenant_id}/v2.0``
client_id : str
Application / Client ID.
scopes : List[str]
OAuth scopes to request.
For Azure managed identity, exactly one scope or resource is required.
For example: ``["api://{app_id}/.default"]``
flow : OAuthFlowType
Authentication flow to use. Default: CLIENT_CREDENTIALS.
client_secret : Optional[str]
Client secret (required for CLIENT_CREDENTIALS).
managed_identity_client_id : Optional[str]
Client ID for user-assigned managed identity (AZURE_MANAGED_IDENTITY).
refresh_buffer_secs : Optional[int]
Seconds before expiry to trigger proactive refresh (default: 300).
Keep this well below the token TTL; if it is greater than or equal to
the TTL, each request refreshes the token.
Examples
--------
Client Credentials (service-to-service):
>>> config = OAuthConfig(
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
... client_id="app-id",
... client_secret="secret",
... scopes=["api://lancedb-api/.default"],
... )
Azure Managed Identity:
>>> config = OAuthConfig(
... issuer_url="https://login.microsoftonline.com/{tenant}/v2.0",
... client_id="app-id",
... scopes=["api://lancedb-api/.default"],
... flow=OAuthFlowType.AZURE_MANAGED_IDENTITY,
... )
"""
issuer_url: str
client_id: str
scopes: List[str]
flow: OAuthFlowType = OAuthFlowType.CLIENT_CREDENTIALS
client_secret: Optional[str] = field(default=None, repr=False)
managed_identity_client_id: Optional[str] = None
refresh_buffer_secs: Optional[int] = None
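
The note on `refresh_buffer_secs` can be illustrated with a small sketch. The real refresh logic lives in the Rust layer and is not part of this diff, so the function below (hypothetical name `needs_refresh`) only assumes that a token is refreshed once the current time falls inside the buffer window before expiry.

```python
def needs_refresh(now: float, expires_at: float, refresh_buffer_secs: int = 300) -> bool:
    """Assumed rule: refresh once `now` is within the buffer before expiry."""
    return now >= expires_at - refresh_buffer_secs


# Token with a 3600 s TTL issued at t=0: fresh at first, stale in the last 300 s.
fresh_at_issue = needs_refresh(now=0, expires_at=3600)
stale_near_expiry = needs_refresh(now=3300, expires_at=3600)

# Token with a 300 s TTL and a 300 s buffer: already "stale" the moment it is issued.
always_stale = needs_refresh(now=0, expires_at=300, refresh_buffer_secs=300)
```

Under that assumption, a buffer greater than or equal to the TTL makes every check report that a refresh is due, which is the behaviour the docstring warns about.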

View File

@@ -2142,12 +2142,19 @@ class LanceTable(Table):
branch = self.current_branch() branch = self.current_branch()
version = None if branch is not None else self.version version = None if branch is not None else self.version
- if self._namespace_client is not None:
+ namespace_client = self._namespace_client
+ if namespace_client is None:
+ conn_uri = getattr(self._conn, "uri", "")
+ if get_uri_scheme(conn_uri) == "namespace":
+ namespace_client = self._conn.namespace_client()
+ self._namespace_client = namespace_client
+ if namespace_client is not None:
table_id = self._namespace_path + [self.name] table_id = self._namespace_path + [self.name]
ds = lance.dataset( ds = lance.dataset(
version=version, version=version,
storage_options=self._conn.storage_options, storage_options=self._conn.storage_options,
- namespace_client=self._namespace_client,
+ namespace_client=namespace_client,
table_id=table_id, table_id=table_id,
**kwargs, **kwargs,
) )
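
The hunk above resolves the namespace client only when it is first needed and then stores it on the table. A stripped-down sketch of that resolve-on-first-use pattern follows. The class and method names are hypothetical, and the URI scheme check from the real code is left out.

```python
from typing import Callable, Optional


class LazyClientHolder:
    """Hypothetical illustration of resolve-on-first-use caching."""

    def __init__(self, factory: Callable[[], object]):
        self._factory = factory
        self._client: Optional[object] = None

    def client(self) -> object:
        if self._client is None:
            # Build the client on first access and remember it for later calls.
            self._client = self._factory()
        return self._client
```

This mirrors what `test_namespace_root_table_to_lance_uses_namespace_client` checks: the cached attribute is `None` before the first call and populated afterwards.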

View File

@@ -5,6 +5,7 @@
import tempfile import tempfile
import shutil import shutil
import importlib
import pytest import pytest
import pyarrow as pa import pyarrow as pa
import lancedb import lancedb
@@ -103,6 +104,40 @@ class TestNamespaceConnection:
assert isinstance(db, lancedb.LanceNamespaceDBConnection) assert isinstance(db, lancedb.LanceNamespaceDBConnection)
assert len(list(db.table_names())) == 0 assert len(list(db.table_names())) == 0
def test_sync_builtin_namespace_uses_rust_without_python_client(self, monkeypatch):
"""Built-in sync namespace connections should not construct or call the
Python namespace client for normal namespace/table management."""
namespace_module = importlib.import_module("lancedb.namespace")
def fail_namespace_connect(*args, **kwargs):
raise AssertionError("Python namespace client should not be constructed")
monkeypatch.setattr(
namespace_module, "namespace_connect", fail_namespace_connect
)
db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
assert isinstance(db, lancedb.LanceNamespaceDBConnection)
assert db._namespace_client is None
assert db._route_pushdown_to_rust is True
db.create_namespace(["test_ns"])
assert "test_ns" in db.list_namespaces().namespaces
schema = pa.schema([pa.field("id", pa.int64())])
table = db.create_table("test_table", schema=schema, namespace_path=["test_ns"])
assert table.namespace == ["test_ns"]
assert "test_table" in db.table_names(namespace_path=["test_ns"])
assert "test_table" in db.list_tables(namespace_path=["test_ns"]).tables
opened = db.open_table("test_table", namespace_path=["test_ns"])
assert opened.namespace == ["test_ns"]
db.drop_table("test_table", namespace_path=["test_ns"])
assert db.list_tables(namespace_path=["test_ns"]).tables == []
db.drop_namespace(["test_ns"])
assert "test_ns" not in db.list_namespaces().namespaces
def test_create_table_through_namespace(self): def test_create_table_through_namespace(self):
"""Test creating a table through namespace.""" """Test creating a table through namespace."""
db = lancedb.connect_namespace("dir", {"root": self.temp_dir}) db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
@@ -564,6 +599,61 @@ class TestAsyncNamespaceConnection:
table_names = await db.table_names() table_names = await db.table_names()
assert len(list(table_names)) == 0 assert len(list(table_names)) == 0
async def test_async_builtin_namespace_uses_rust_without_python_client(
self, monkeypatch
):
"""Built-in async namespace connections should not construct or call the
Python namespace client for normal namespace/table management."""
namespace_module = importlib.import_module("lancedb.namespace")
def fail_namespace_connect(*args, **kwargs):
raise AssertionError("Python namespace client should not be constructed")
monkeypatch.setattr(
namespace_module, "namespace_connect", fail_namespace_connect
)
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
assert isinstance(db, lancedb.AsyncLanceNamespaceDBConnection)
assert db._namespace_client is None
assert db._route_pushdown_to_rust is True
await db.create_namespace(["test_ns"])
assert "test_ns" in (await db.list_namespaces()).namespaces
schema = pa.schema([pa.field("id", pa.int64())])
table = await db.create_table(
"test_table", schema=schema, namespace_path=["test_ns"]
)
assert table._namespace_path == ["test_ns"]
assert table._namespace_client is None
assert table._route_pushdown_to_rust is True
assert "test_table" in await db.table_names(namespace_path=["test_ns"])
assert "test_table" in (await db.list_tables(namespace_path=["test_ns"])).tables
opened = await db.open_table("test_table", namespace_path=["test_ns"])
assert opened._namespace_path == ["test_ns"]
await db.drop_table("test_table", namespace_path=["test_ns"])
assert (await db.list_tables(namespace_path=["test_ns"])).tables == []
await db.drop_namespace(["test_ns"])
assert "test_ns" not in (await db.list_namespaces()).namespaces
async def test_async_namespace_client_is_lazy(self):
"""namespace_client() should still return the backing client on demand."""
pytest.importorskip("lance")
from lance.namespace import DirectoryNamespace
db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
assert db._namespace_client is None
ns_client = await db.namespace_client()
assert isinstance(ns_client, DirectoryNamespace)
namespace_id = ns_client.namespace_id().replace("\\\\", "\\")
assert str(self.temp_dir) in namespace_id
assert db._namespace_client is ns_client
# Async connect via namespace helper is not enabled yet. # Async connect via namespace helper is not enabled yet.
async def test_create_table_async(self): async def test_create_table_async(self):
@@ -818,10 +908,11 @@ class TestPushdownOperations:
) )
assert db._route_pushdown_to_rust is True assert db._route_pushdown_to_rust is True
- def test_route_pushdown_to_rust_false_for_dir(self):
- """A non-native (dir) connection keeps the Python pushdown path."""
+ def test_route_pushdown_to_rust_for_native_dir(self):
+ """The sync dir connection is natively built and defers QueryTable
+ pushdown to Rust."""
  db = lancedb.connect_namespace("dir", {"root": self.temp_dir})
- assert db._route_pushdown_to_rust is False
+ assert db._route_pushdown_to_rust is True
def test_async_route_pushdown_to_rust_for_native_rest(self): def test_async_route_pushdown_to_rust_for_native_rest(self):
"""The async connection must not silently bypass the read-freshness fix: """The async connection must not silently bypass the read-freshness fix:
@@ -834,10 +925,11 @@ class TestPushdownOperations:
) )
assert db._route_pushdown_to_rust is True assert db._route_pushdown_to_rust is True
- def test_async_route_pushdown_to_rust_false_for_dir(self):
- """The async non-native (dir) connection keeps the Python pushdown path."""
+ def test_async_route_pushdown_to_rust_for_native_dir(self):
+ """The async dir connection is natively built and defers QueryTable
+ pushdown to Rust."""
  db = lancedb.connect_namespace_async("dir", {"root": self.temp_dir})
- assert db._route_pushdown_to_rust is False
+ assert db._route_pushdown_to_rust is True
def test_lance_table_to_arrow_uses_query_pushdown(self): def test_lance_table_to_arrow_uses_query_pushdown(self):
namespace_client = _NamespaceClient() namespace_client = _NamespaceClient()

View File

@@ -502,6 +502,61 @@ def test_with_row_id(table: lancedb.table.Table):
assert rs["_rowid"].to_pylist() == [0, 1] assert rs["_rowid"].to_pylist() == [0, 1]
def test_where_repeated_combines_with_and(table: lancedb.table.Table):
# Calling where() more than once should AND the filters together instead of
# silently replacing the previous one (regression test for #2649).
builder = table.search().where("id >= 1").where("id < 2")
assert builder._where == "(id >= 1) AND (id < 2)"
ids = [row["id"] for row in builder.limit(10).to_list()]
assert ids == [1]
def test_where_repeated_combines_expr(table: lancedb.table.Table):
from lancedb.expr import col, lit
builder = table.search().where(col("id") >= lit(1)).where(col("id") < lit(2))
ids = [row["id"] for row in builder.limit(10).to_list()]
assert ids == [1]
def test_where_mixed_filter_kinds_combines(table: lancedb.table.Table):
# Mixing a SQL string filter with an expression filter lowers the
# expression to SQL and combines them as SQL strings.
from lancedb.expr import col, lit
builder = table.search().where("id >= 1").where(col("id") < lit(2))
ids = [row["id"] for row in builder.limit(10).to_list()]
assert ids == [1]
@pytest.mark.asyncio
async def test_where_repeated_combines_with_and_async(table_async: AsyncTable):
ids = [
row["id"]
for row in (
await table_async.query().where("id >= 1").where("id < 2").to_list()
)
]
assert ids == [1]
@pytest.mark.asyncio
async def test_where_mixed_filter_kinds_combines_async(table_async: AsyncTable):
from lancedb.expr import col, lit
ids = [
row["id"]
for row in (
await table_async.query()
.where("id >= 1")
.where(col("id") < lit(2))
.to_list()
)
]
assert ids == [1]
def test_distance_range(table: lancedb.table.Table): def test_distance_range(table: lancedb.table.Table):
q = [0, 0] q = [0, 0]
rs = table.search(q).to_arrow() rs = table.search(q).to_arrow()

View File

@@ -1137,6 +1137,16 @@ def test_namespace_open_table_with_branch_version(tmp_path):
assert db.open_table("t", namespace_path=["ns1"], branch="exp").count_rows() == 3 assert db.open_table("t", namespace_path=["ns1"], branch="exp").count_rows() == 3
def test_namespace_root_table_to_lance_uses_namespace_client(tmp_path):
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace
db = lancedb.connect_namespace("dir", {"root": str(tmp_path)})
table = db.create_table("t", [{"i": 0}])
assert table._namespace_client is None
assert table.to_lance().count_rows() == 1
assert table._namespace_client is not None
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_async_namespace_open_table_with_branch_version(tmp_path): async def test_async_namespace_open_table_with_branch_version(tmp_path):
pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace pytest.importorskip("lance") # "dir" impl is lance.namespace.DirectoryNamespace

View File

@@ -539,7 +539,7 @@ impl Connection {
} }
#[pyfunction] #[pyfunction]
- #[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None))]
+ #[pyo3(signature = (uri, api_key=None, region=None, host_override=None, read_consistency_interval=None, client_config=None, storage_options=None, session=None, manifest_enabled=false, namespace_client_properties=None, oauth_config=None))]
#[allow(clippy::too_many_arguments)] #[allow(clippy::too_many_arguments)]
pub fn connect( pub fn connect(
py: Python<'_>, py: Python<'_>,
@@ -553,6 +553,7 @@ pub fn connect(
session: Option<crate::session::Session>, session: Option<crate::session::Session>,
manifest_enabled: bool, manifest_enabled: bool,
namespace_client_properties: Option<HashMap<String, String>>, namespace_client_properties: Option<HashMap<String, String>>,
oauth_config: Option<crate::oauth::PyOAuthConfig>,
) -> PyResult<Bound<'_, PyAny>> { ) -> PyResult<Bound<'_, PyAny>> {
future_into_py(py, async move { future_into_py(py, async move {
let mut builder = lancedb::connect(&uri); let mut builder = lancedb::connect(&uri);
@@ -582,6 +583,11 @@ pub fn connect(
if let Some(client_config) = client_config { if let Some(client_config) = client_config {
builder = builder.client_config(client_config.into()); builder = builder.client_config(client_config.into());
} }
if let Some(oauth_config) = oauth_config {
let config: lancedb::remote::oauth::OAuthConfig =
oauth_config.try_into().infer_error()?;
builder = builder.oauth_config(config);
}
if let Some(session) = session { if let Some(session) = session {
builder = builder.session(session.inner.clone()); builder = builder.session(session.inner.clone());
} }
@@ -649,6 +655,46 @@ pub fn connect_namespace_client(
))) )))
} }
#[pyfunction]
#[pyo3(signature = (
namespace_client_impl,
namespace_client_properties,
read_consistency_interval=None,
storage_options=None,
session=None,
namespace_client_pushdown_operations=None,
))]
#[allow(clippy::too_many_arguments)]
pub fn connect_namespace(
namespace_client_impl: String,
namespace_client_properties: HashMap<String, String>,
read_consistency_interval: Option<f64>,
storage_options: Option<HashMap<String, String>>,
session: Option<crate::session::Session>,
namespace_client_pushdown_operations: Option<Vec<String>>,
) -> PyResult<Connection> {
let read_consistency_interval = read_consistency_interval.map(Duration::from_secs_f64);
let namespace_client_pushdown_operations =
parse_namespace_client_pushdown_operations(namespace_client_pushdown_operations)?;
let mut builder =
lancedb::connect_namespace(&namespace_client_impl, namespace_client_properties)
.pushdown_operations(namespace_client_pushdown_operations);
if let Some(storage_options) = storage_options {
builder = builder.storage_options(storage_options);
}
if let Some(read_consistency_interval) = read_consistency_interval {
builder = builder.read_consistency_interval(read_consistency_interval);
}
if let Some(session) = session {
builder = builder.session(session.inner.clone());
}
Ok(Connection::new(
crate::runtime::block_on(builder.execute()).infer_error()?,
))
}
/// Whether to build the namespace natively (from impl + properties) instead of /// Whether to build the namespace natively (from impl + properties) instead of
/// wrapping a pre-built client. Native construction is required for the /// wrapping a pre-built client. Native construction is required for the
/// read-freshness provider to be installed /// read-freshness provider to be installed

View File

@@ -2,7 +2,7 @@
// SPDX-FileCopyrightText: Copyright The LanceDB Authors // SPDX-FileCopyrightText: Copyright The LanceDB Authors
use arrow::RecordBatchStream; use arrow::RecordBatchStream;
- use connection::{Connection, connect, connect_namespace_client};
+ use connection::{Connection, connect, connect_namespace, connect_namespace_client};
use env_logger::Env; use env_logger::Env;
use expr::{PyExpr, expr_col, expr_func, expr_lit}; use expr::{PyExpr, expr_col, expr_func, expr_lit};
use index::IndexConfig; use index::IndexConfig;
@@ -26,6 +26,7 @@ pub mod expr;
pub mod header; pub mod header;
pub mod index; pub mod index;
pub mod namespace; pub mod namespace;
pub mod oauth;
pub mod permutation; pub mod permutation;
pub mod query; pub mod query;
pub mod runtime; pub mod runtime;
@@ -61,6 +62,7 @@ pub fn _lancedb(_py: Python, m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PyPermutationReader>()?; m.add_class::<PyPermutationReader>()?;
m.add_class::<PyExpr>()?; m.add_class::<PyExpr>()?;
m.add_function(wrap_pyfunction!(connect, m)?)?; m.add_function(wrap_pyfunction!(connect, m)?)?;
m.add_function(wrap_pyfunction!(connect_namespace, m)?)?;
m.add_function(wrap_pyfunction!(connect_namespace_client, m)?)?; m.add_function(wrap_pyfunction!(connect_namespace_client, m)?)?;
m.add_function(wrap_pyfunction!(permutation::async_permutation_builder, m)?)?; m.add_function(wrap_pyfunction!(permutation::async_permutation_builder, m)?)?;
m.add_function(wrap_pyfunction!(util::validate_table_name, m)?)?; m.add_function(wrap_pyfunction!(util::validate_table_name, m)?)?;

72
python/src/oauth.rs Normal file
View File

@@ -0,0 +1,72 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use pyo3::FromPyObject;
use lancedb::error::Error;
use lancedb::remote::oauth::{OAuthConfig, OAuthFlow};
/// Python-side OAuth configuration, extracted via FromPyObject.
/// Maps to `lancedb.remote.oauth.OAuthConfig` Python dataclass.
#[derive(FromPyObject)]
pub struct PyOAuthConfig {
pub issuer_url: String,
pub client_id: String,
pub scopes: Vec<String>,
pub flow: String,
pub client_secret: Option<String>,
pub managed_identity_client_id: Option<String>,
pub refresh_buffer_secs: Option<u64>,
}
impl TryFrom<PyOAuthConfig> for OAuthConfig {
type Error = Error;
fn try_from(py: PyOAuthConfig) -> Result<Self, Self::Error> {
let flow = match py.flow.as_str() {
"client_credentials" => OAuthFlow::ClientCredentials,
"azure_managed_identity" => OAuthFlow::AzureManagedIdentity {
client_id: py.managed_identity_client_id,
},
other => {
return Err(Error::InvalidInput {
message: format!("Unknown OAuth flow type: {other}"),
});
}
};
Ok(Self {
issuer_url: py.issuer_url,
client_id: py.client_id,
client_secret: py.client_secret,
scopes: py.scopes,
flow,
refresh_buffer_secs: py.refresh_buffer_secs,
})
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_unknown_oauth_flow_returns_invalid_input() {
let config = PyOAuthConfig {
issuer_url: "https://issuer.example.com".to_string(),
client_id: "client-id".to_string(),
scopes: vec!["scope".to_string()],
flow: "typo".to_string(),
client_secret: None,
managed_identity_client_id: None,
refresh_buffer_secs: None,
};
let err = OAuthConfig::try_from(config).unwrap_err();
assert!(matches!(
err,
Error::InvalidInput { message }
if message == "Unknown OAuth flow type: typo"
));
}
}

View File

@@ -0,0 +1,33 @@
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright The LanceDB Authors
import importlib.util
import sys
from pathlib import Path
def _load_oauth_module():
oauth_path = (
Path(__file__).parents[1] / "python" / "lancedb" / "remote" / "oauth.py"
)
spec = importlib.util.spec_from_file_location("lancedb_remote_oauth", oauth_path)
module = importlib.util.module_from_spec(spec)
assert spec.loader is not None
sys.modules[spec.name] = module
spec.loader.exec_module(module)
return module
def test_oauth_config_repr_redacts_client_secret():
oauth = _load_oauth_module()
config = oauth.OAuthConfig(
issuer_url="https://issuer.example.com",
client_id="client-id",
scopes=["scope"],
client_secret="super-secret",
)
rendered = repr(config)
assert "super-secret" not in rendered
assert "client_secret" not in rendered
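
The redaction this test checks comes from `field(default=None, repr=False)` on `client_secret`. A self-contained sketch of the same mechanism, using a hypothetical `Credentials` dataclass rather than the real `OAuthConfig`:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Credentials:
    """Hypothetical dataclass; only the repr=False mechanism matters here."""

    client_id: str
    # repr=False drops the field from the generated __repr__ entirely,
    # so neither the name nor the value shows up in logs or tracebacks.
    client_secret: Optional[str] = field(default=None, repr=False)


creds = Credentials(client_id="client-id", client_secret="super-secret")
rendered = repr(creds)
```

The value is still stored on the instance and readable as `creds.client_secret`. Only the generated `__repr__` omits it.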

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "lancedb" name = "lancedb"
- version = "0.31.0-beta.3"
+ version = "0.31.0-beta.5"
edition.workspace = true edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications" description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true license.workspace = true
@@ -14,6 +14,7 @@ rust-version.workspace = true
ahash = { workspace = true } ahash = { workspace = true }
arrow = { workspace = true } arrow = { workspace = true }
arrow-array = { workspace = true } arrow-array = { workspace = true }
arrow-buffer = { workspace = true }
arrow-data = { workspace = true } arrow-data = { workspace = true }
arrow-schema = { workspace = true } arrow-schema = { workspace = true }
arrow-select = { workspace = true } arrow-select = { workspace = true }
@@ -166,6 +167,10 @@ required-features = ["bedrock"]
[[example]] [[example]]
name = "simple" name = "simple"
[[example]]
name = "polars"
required-features = ["polars"]
[[example]] [[example]]
name = "full_text_search" name = "full_text_search"

View File

@@ -0,0 +1,47 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
//! This example demonstrates ingesting a Polars DataFrame into LanceDB and
//! reading it back out as a Polars DataFrame.
use lancedb::arrow::IntoPolars;
use lancedb::query::ExecutableQuery;
use lancedb::{Result, connect};
use polars::prelude::{DataFrame, NamedFrom, Series};
fn make_dataframe() -> DataFrame {
let ids = Series::new("id", &[1i32, 2, 3, 4, 5]);
let names = Series::new("name", &["Alice", "Bob", "Carol", "Dave", "Eve"]);
let scores = Series::new("score", &[9.5f64, 8.1, 7.3, 9.0, 6.5]);
DataFrame::new(vec![ids, names, scores]).unwrap()
}
#[tokio::main]
async fn main() -> Result<()> {
let tmp = tempfile::tempdir().unwrap();
let db = connect(tmp.path().to_str().unwrap()).execute().await?;
// Ingest a Polars DataFrame directly — DataFrame now implements Scannable.
let df = make_dataframe();
println!("Input DataFrame:\n{df}");
let table = db.create_table("people", df).execute().await?;
// Append more rows.
let more = DataFrame::new(vec![
Series::new("id", &[6i32, 7]),
Series::new("name", &["Frank", "Grace"]),
Series::new("score", &[7.8f64, 8.9]),
])
.unwrap();
table.add(more).execute().await?;
// Read back as a Polars DataFrame.
let result_df = table.query().execute().await?.into_polars().await?;
println!(
"\nRound-tripped DataFrame ({} rows):\n{result_df}",
result_df.height()
);
Ok(())
}

View File

@@ -3,7 +3,19 @@
use std::{pin::Pin, sync::Arc}; use std::{pin::Pin, sync::Arc};
// Re-export the arrow crates we depend on so downstream consumers can build
// `RecordBatch`/arrays/builders against the exact same arrow line lancedb was
// compiled against, instead of declaring their own (potentially mismatched)
// direct arrow dependencies. See https://github.com/lancedb/lancedb/issues/3575.
pub use arrow;
pub use arrow_array;
pub use arrow_buffer;
pub use arrow_cast;
pub use arrow_data;
pub use arrow_ipc;
pub use arrow_ord;
pub use arrow_schema; pub use arrow_schema;
pub use arrow_select;
use datafusion_common::DataFusionError; use datafusion_common::DataFusionError;
use datafusion_physical_plan::stream::RecordBatchStreamAdapter; use datafusion_physical_plan::stream::RecordBatchStreamAdapter;
use futures::{Stream, StreamExt, TryStreamExt}; use futures::{Stream, StreamExt, TryStreamExt};
@@ -112,54 +124,14 @@ impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> RecordBatchStream
/// A trait for converting incoming data to Arrow /// A trait for converting incoming data to Arrow
/// ///
- /// Integrations should implement this trait to allow data to be
- /// imported directly from the integration. For example, implementing
- /// this trait for `Vec<Vec<...>>` would allow the `Vec` to be directly
- /// used in methods like [`crate::connection::Connection::create_table`]
- /// or [`crate::table::Table::add`]
- pub trait IntoArrow {
- /// Convert the data into an iterator of Arrow batches
- fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>>;
- }
pub type BoxedRecordBatchReader = Box<dyn arrow_array::RecordBatchReader + Send>; pub type BoxedRecordBatchReader = Box<dyn arrow_array::RecordBatchReader + Send>;
- impl<T: arrow_array::RecordBatchReader + Send + 'static> IntoArrow for T {
- fn into_arrow(self) -> Result<Box<dyn arrow_array::RecordBatchReader + Send>> {
- Ok(Box::new(self))
- }
- }
- /// A trait for converting incoming data to Arrow asynchronously
- ///
- /// Serves the same purpose as [`IntoArrow`], but for asynchronous data.
- ///
- /// Note: Arrow has no async equivalent to RecordBatchReader and so
- pub trait IntoArrowStream {
- /// Convert the data into a stream of Arrow batches
- fn into_arrow(self) -> Result<SendableRecordBatchStream>;
- }
impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> SimpleRecordBatchStream<S> { impl<S: Stream<Item = Result<arrow_array::RecordBatch>>> SimpleRecordBatchStream<S> {
pub fn new(stream: S, schema: Arc<arrow_schema::Schema>) -> Self { pub fn new(stream: S, schema: Arc<arrow_schema::Schema>) -> Self {
Self { schema, stream } Self { schema, stream }
} }
} }
- impl IntoArrowStream for SendableRecordBatchStream {
- fn into_arrow(self) -> Result<SendableRecordBatchStream> {
- Ok(self)
- }
- }
- impl IntoArrowStream for datafusion_physical_plan::SendableRecordBatchStream {
- fn into_arrow(self) -> Result<SendableRecordBatchStream> {
- let schema = self.schema();
- let stream = self.map_err(|df_err| df_err.into());
- Ok(Box::pin(SimpleRecordBatchStream::new(stream, schema)))
- }
- }
pub trait LanceDbDatagenExt { pub trait LanceDbDatagenExt {
fn into_ldb_stream( fn into_ldb_stream(
self, self,
@@ -264,9 +236,7 @@ impl IntoPolars for SendableRecordBatchStream {
#[cfg(all(test, feature = "polars"))] #[cfg(all(test, feature = "polars"))]
mod tests { mod tests {
use super::SendableRecordBatchStream; use super::SendableRecordBatchStream;
- use crate::arrow::{
- IntoArrow, IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream,
- };
+ use crate::arrow::{IntoPolars, PolarsDataFrameRecordBatchReader, SimpleRecordBatchStream};
use polars::prelude::{DataFrame, NamedFrom, Series}; use polars::prelude::{DataFrame, NamedFrom, Series};
fn get_record_batch_reader_from_polars() -> Box<dyn arrow_array::RecordBatchReader + Send> { fn get_record_batch_reader_from_polars() -> Box<dyn arrow_array::RecordBatchReader + Send> {
@@ -280,10 +250,7 @@ mod tests {
float_series = Series::new("float", &[2.0]); float_series = Series::new("float", &[2.0]);
let df2 = DataFrame::new(vec![string_series, int_series, float_series]).unwrap(); let df2 = DataFrame::new(vec![string_series, int_series, float_series]).unwrap();
- PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap())
- .unwrap()
- .into_arrow()
- .unwrap()
+ Box::new(PolarsDataFrameRecordBatchReader::new(df1.vstack(&df2).unwrap()).unwrap())
} }
#[test] #[test]

View File

@@ -185,6 +185,43 @@ impl Scannable for SendableRecordBatchStream {
} }
} }
#[cfg(feature = "polars")]
impl Scannable for polars::frame::DataFrame {
fn schema(&self) -> SchemaRef {
crate::polars_arrow_convertors::convert_polars_df_schema_to_arrow_rb_schema(
self.schema().clone(),
)
.expect("failed to convert Polars DataFrame schema to Arrow schema")
}
fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
let schema = Scannable::schema(self);
let batches: crate::Result<Vec<RecordBatch>> =
match crate::arrow::PolarsDataFrameRecordBatchReader::new(self.clone()) {
Err(e) => Err(e),
Ok(reader) => reader.map(|b| b.map_err(Into::into)).collect(),
};
match batches {
Err(e) => Box::pin(SimpleRecordBatchStream {
schema,
stream: once(async move { Err(e) }),
}),
Ok(batches) => {
let stream = futures::stream::iter(batches.into_iter().map(Ok));
Box::pin(SimpleRecordBatchStream { schema, stream })
}
}
}
fn num_rows(&self) -> Option<usize> {
Some(self.height())
}
fn rescannable(&self) -> bool {
true
}
}
#[async_trait] #[async_trait]
impl StreamingWriteSource for Box<dyn Scannable> { impl StreamingWriteSource for Box<dyn Scannable> {
fn arrow_schema(&self) -> SchemaRef { fn arrow_schema(&self) -> SchemaRef {
@@ -1089,4 +1126,60 @@ mod tests {
); );
} }
} }
#[cfg(feature = "polars")]
mod polars_tests {
use super::*;
use crate::arrow::IntoPolars;
use crate::query::ExecutableQuery;
use polars::prelude::{DataFrame, NamedFrom, Series};
fn make_df() -> DataFrame {
DataFrame::new(vec![
Series::new("id", &[1i32, 2, 3]),
Series::new("val", &[1.1f64, 2.2, 3.3]),
])
.unwrap()
}
#[tokio::test]
async fn test_dataframe_scannable_round_trip() {
let tmp = tempfile::tempdir().unwrap();
let db = crate::connect(tmp.path().to_str().unwrap())
.execute()
.await
.unwrap();
let df = make_df();
let table = db.create_table("t", df.clone()).execute().await.unwrap();
// Append the same rows again.
table.add(df.clone()).execute().await.unwrap();
let result = table
.query()
.execute()
.await
.unwrap()
.into_polars()
.await
.unwrap();
assert_eq!(result.height(), df.height() * 2);
assert_eq!(result.schema(), df.schema());
}
#[tokio::test]
async fn test_dataframe_scannable_rescannable() {
let mut df = make_df();
assert!(df.rescannable());
let batches1: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
assert_eq!(batches1.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
// Can be scanned again.
let batches2: Vec<RecordBatch> = df.scan_as_stream().try_collect().await.unwrap();
assert_eq!(batches2.iter().map(|b| b.num_rows()).sum::<usize>(), 3);
}
}
} }

View File

@@ -342,3 +342,9 @@ pub use connection::connect_namespace;
/// Re-export Lance Session and ObjectStoreRegistry for custom session creation /// Re-export Lance Session and ObjectStoreRegistry for custom session creation
pub use lance::session::Session; pub use lance::session::Session;
pub use lance_io::object_store::ObjectStoreRegistry; pub use lance_io::object_store::ObjectStoreRegistry;
/// Re-export DataFusion so consumers can build the `Expr` values that public
/// query/merge APIs (e.g. [`query::QueryBase::only_if_expr`]) accept without
/// declaring their own (potentially mismatched) direct `datafusion` dependency.
/// See <https://github.com/lancedb/lancedb/issues/3575>.
pub use datafusion;

View File

@@ -401,6 +401,9 @@ pub trait QueryBase {
/// ///
/// Filtering performance can often be improved by creating a scalar index /// Filtering performance can often be improved by creating a scalar index
/// on the filter column(s). /// on the filter column(s).
///
/// Calling this multiple times combines the filters with a logical AND
/// (i.e. `(previous) AND (new)`) rather than replacing the previous filter.
fn only_if(self, filter: impl AsRef<str>) -> Self; fn only_if(self, filter: impl AsRef<str>) -> Self;
/// Only return rows which match the filter, using an expression builder. /// Only return rows which match the filter, using an expression builder.
@@ -423,6 +426,9 @@ pub trait QueryBase {
/// ///
/// Note: Expression filters are not supported for remote/server-side queries. /// Note: Expression filters are not supported for remote/server-side queries.
/// Use [`QueryBase::only_if`] with SQL strings for remote tables. /// Use [`QueryBase::only_if`] with SQL strings for remote tables.
///
/// Calling this multiple times combines the expressions with a logical AND
/// rather than replacing the previous filter.
fn only_if_expr(self, filter: datafusion_expr::Expr) -> Self; fn only_if_expr(self, filter: datafusion_expr::Expr) -> Self;
/// Perform a full text search on the table. /// Perform a full text search on the table.
@@ -535,12 +541,13 @@ impl<T: HasQuery> QueryBase for T {
} }
fn only_if(mut self, filter: impl AsRef<str>) -> Self { fn only_if(mut self, filter: impl AsRef<str>) -> Self {
- self.mut_query().filter = Some(QueryFilter::Sql(filter.as_ref().to_string()));
+ self.mut_query()
+ .add_filter(QueryFilter::Sql(filter.as_ref().to_string()));
self self
} }
fn only_if_expr(mut self, filter: datafusion_expr::Expr) -> Self { fn only_if_expr(mut self, filter: datafusion_expr::Expr) -> Self {
- self.mut_query().filter = Some(QueryFilter::Datafusion(filter));
+ self.mut_query().add_filter(QueryFilter::Datafusion(filter));
self self
} }
@@ -716,6 +723,39 @@ pub enum QueryFilter {
     Datafusion(Expr),
 }
+/// Combine two filters with a logical AND.
+///
+/// This is used when a query receives more than one filter (for example when
+/// `where`/`only_if` is called multiple times) so the filters are composed
+/// with AND rather than the later filter silently replacing the earlier one.
+///
+/// SQL string and expression filters are combined within their own
+/// representation. When the two representations are mixed, the expression is
+/// lowered to SQL (via [`crate::expr::expr_to_sql_string`]) and the filters are
+/// combined as SQL strings. Substrait filters cannot be combined and return an
+/// error.
+fn and_filters(existing: QueryFilter, new: QueryFilter) -> Result<QueryFilter> {
+    match (existing, new) {
+        (QueryFilter::Sql(lhs), QueryFilter::Sql(rhs)) => {
+            Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
+        }
+        (QueryFilter::Datafusion(lhs), QueryFilter::Datafusion(rhs)) => {
+            Ok(QueryFilter::Datafusion(lhs.and(rhs)))
+        }
+        (QueryFilter::Sql(lhs), QueryFilter::Datafusion(rhs)) => {
+            let rhs = crate::expr::expr_to_sql_string(&rhs)?;
+            Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
+        }
+        (QueryFilter::Datafusion(lhs), QueryFilter::Sql(rhs)) => {
+            let lhs = crate::expr::expr_to_sql_string(&lhs)?;
+            Ok(QueryFilter::Sql(format!("({lhs}) AND ({rhs})")))
+        }
+        _ => Err(Error::InvalidInput {
+            message: "cannot combine a Substrait filter with another filter".to_string(),
+        }),
+    }
+}
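
The combination rule in `and_filters` can be tried in isolation. The following is a minimal sketch, not the crate's code: `Filter`, its `Expr(String)` and `Substrait(Vec<u8>)` payloads, and `lower_to_sql` are hypothetical stand-ins (the real `QueryFilter::Datafusion` holds a DataFusion `Expr`, and lowering goes through `expr_to_sql_string`). It mainly illustrates why each side is wrapped in parentheses before being joined.

```rust
/// Simplified stand-in for the filter representations in the diff.
/// Assumption: expressions are modelled as pre-rendered strings.
#[derive(Debug, Clone, PartialEq)]
enum Filter {
    Sql(String),
    Expr(String),
    Substrait(Vec<u8>),
}

/// Hypothetical lowering step standing in for `expr_to_sql_string`.
/// It fails on an empty expression so the error path can be exercised.
fn lower_to_sql(expr: &str) -> Result<String, String> {
    if expr.is_empty() {
        Err("cannot lower an empty expression".to_string())
    } else {
        Ok(expr.to_string())
    }
}

/// Combine two filters with a logical AND, parenthesizing both operands.
fn and_filters(existing: Filter, new: Filter) -> Result<Filter, String> {
    match (existing, new) {
        // Same representation: combine in place.
        (Filter::Sql(lhs), Filter::Sql(rhs)) => {
            Ok(Filter::Sql(format!("({lhs}) AND ({rhs})")))
        }
        (Filter::Expr(lhs), Filter::Expr(rhs)) => {
            Ok(Filter::Expr(format!("({lhs}) AND ({rhs})")))
        }
        // Mixed representations: lower the expression, then combine as SQL.
        (Filter::Sql(lhs), Filter::Expr(rhs)) => {
            let rhs = lower_to_sql(&rhs)?;
            Ok(Filter::Sql(format!("({lhs}) AND ({rhs})")))
        }
        (Filter::Expr(lhs), Filter::Sql(rhs)) => {
            let lhs = lower_to_sql(&lhs)?;
            Ok(Filter::Sql(format!("({lhs}) AND ({rhs})")))
        }
        // Anything involving Substrait is rejected.
        _ => Err("cannot combine a Substrait filter with another filter".to_string()),
    }
}
```

Under these assumptions, combining `a = 1 OR b = 2` with `c = 3` gives `(a = 1 OR b = 2) AND (c = 3)`. Without the parentheses, SQL precedence would evaluate `b = 2 AND c = 3` first and change which rows match.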
 /// A basic query into a table without any kind of search
 ///
 /// This will result in a (potentially filtered) scan if executed
@@ -730,6 +770,13 @@ pub struct QueryRequest {
     /// Apply filter to the returned rows.
     pub filter: Option<QueryFilter>,
+    /// An error recorded while combining repeated filters that could not be
+    /// composed (see [`QueryRequest::add_filter`]). It is surfaced when the
+    /// query is executed via [`QueryRequest::check_filter`]. We defer the error
+    /// because the builder methods that set filters return `Self` rather than a
+    /// `Result`.
+    pub(crate) filter_error: Option<String>,
     /// Perform a full text search on the table.
     pub full_text_search: Option<FullTextSearchQuery>,
@@ -775,6 +822,7 @@ impl Default for QueryRequest {
             limit: None,
             offset: None,
             filter: None,
+            filter_error: None,
             full_text_search: None,
             select: Select::All,
             fast_search: false,
@@ -788,6 +836,41 @@ impl Default for QueryRequest {
     }
 }
+impl QueryRequest {
+    /// Add a filter, combining it with any existing filter using a logical AND.
+    ///
+    /// If the new filter cannot be combined with the existing one (for example
+    /// because one of them is a Substrait filter, or because an expression
+    /// cannot be lowered to SQL), the error is recorded and surfaced later by
+    /// [`Self::check_filter`].
+    pub(crate) fn add_filter(&mut self, new: QueryFilter) {
+        self.filter = Some(match self.filter.take() {
+            None => new,
+            Some(existing) => match and_filters(existing, new) {
+                Ok(combined) => combined,
+                Err(err) => {
+                    // The filters were consumed while attempting to combine
+                    // them; the recorded error is surfaced by `check_filter`
+                    // before the query executes.
+                    self.filter_error = Some(err.to_string());
+                    return;
+                }
+            },
+        });
+    }
+    /// Return an error if combining filters failed (see [`Self::add_filter`]).
+    ///
+    /// This must be called by every backend before executing a query.
+    pub(crate) fn check_filter(&self) -> Result<()> {
+        if let Some(message) = &self.filter_error {
+            return Err(Error::InvalidInput {
+                message: message.clone(),
+            });
+        }
+        Ok(())
+    }
+}
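
The deferred-error pattern used by `add_filter` and `check_filter` can be sketched on its own. This is an illustrative toy under stated assumptions, not LanceDB's implementation: `Request`, `Query`, the string-only filter, and the "blank filter is invalid" rule are all hypothetical. Unlike the real `add_filter`, this toy also keeps the existing filter when a bad one arrives. The point is only that a builder method returning `Self` can record a failure and report it when the query runs.

```rust
/// Toy request that stores a combined filter plus a deferred error.
#[derive(Debug, Default)]
struct Request {
    filter: Option<String>,
    filter_error: Option<String>,
}

impl Request {
    /// Record the filter, AND-ing it with any existing one.
    /// Assumption for illustration: a blank filter counts as invalid.
    fn add_filter(&mut self, new: &str) {
        if new.trim().is_empty() {
            // Cannot return a Result from the builder chain, so stash the error.
            self.filter_error = Some("filter must not be empty".to_string());
            return;
        }
        self.filter = Some(match self.filter.take() {
            None => new.to_string(),
            Some(existing) => format!("({existing}) AND ({new})"),
        });
    }

    /// Surface any error recorded while building the request.
    fn check_filter(&self) -> Result<(), String> {
        match &self.filter_error {
            Some(message) => Err(message.clone()),
            None => Ok(()),
        }
    }
}

/// Toy builder whose chained methods return `Self`, not `Result`.
struct Query {
    request: Request,
}

impl Query {
    fn new() -> Self {
        Query { request: Request::default() }
    }

    fn only_if(mut self, filter: &str) -> Self {
        self.request.add_filter(filter);
        self
    }

    /// "Execute" by validating first, then returning the filter that would run.
    fn execute(self) -> Result<String, String> {
        self.request.check_filter()?;
        Ok(self.request.filter.unwrap_or_else(|| "true".to_string()))
    }
}
```

In this sketch, `Query::new().only_if("id > 0").only_if("id < 100").execute()` returns the combined filter string, while a chain that includes a blank filter still builds but fails at `execute`. The recorded error persists even if later filters are valid.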
 /// A builder for LanceDB queries.
 ///
 /// See [`crate::Table::query`] for more details on queries
@@ -1682,6 +1765,70 @@ mod tests {
         }
     }
+    #[tokio::test]
+    async fn test_repeated_only_if_combines_with_and() {
+        use crate::expr::{col, lit};
+        let tmp_dir = tempdir().unwrap();
+        let dataset_path = tmp_dir.path().join("test.lance");
+        let uri = dataset_path.to_str().unwrap();
+        let conn = connect(uri).execute().await.unwrap();
+        let table = conn
+            .create_table("my_table", make_non_empty_batches())
+            .execute()
+            .await
+            .unwrap();
+        let query = table.query().only_if("id > 0").only_if("id < 100");
+        match &query.request.filter {
+            Some(QueryFilter::Sql(sql)) => assert_eq!(sql, "(id > 0) AND (id < 100)"),
+            other => panic!("expected combined SQL filter, got {other:?}"),
+        }
+        // A single filter is left untouched.
+        let query = table.query().only_if("id > 0");
+        match &query.request.filter {
+            Some(QueryFilter::Sql(sql)) => assert_eq!(sql, "id > 0"),
+            other => panic!("expected single SQL filter, got {other:?}"),
+        }
+        // Expression filters are combined with a logical AND as well.
+        let query = table
+            .query()
+            .only_if_expr(col("id").gt(lit(0i32)))
+            .only_if_expr(col("id").lt(lit(100i32)));
+        match &query.request.filter {
+            Some(QueryFilter::Datafusion(expr)) => {
+                assert_eq!(
+                    expr,
+                    &col("id").gt(lit(0i32)).and(col("id").lt(lit(100i32)))
+                );
+            }
+            other => panic!("expected combined Datafusion filter, got {other:?}"),
+        }
+        // Mixing an SQL string filter with an expression filter lowers the
+        // expression to SQL and combines them as SQL strings.
+        let query = table
+            .query()
+            .only_if("id > 0")
+            .only_if_expr(col("id").lt(lit(100i32)));
+        match &query.request.filter {
+            Some(QueryFilter::Sql(sql)) => {
+                let expected = format!(
+                    "(id > 0) AND ({})",
+                    crate::expr::expr_to_sql_string(&col("id").lt(lit(100i32))).unwrap()
+                );
+                assert_eq!(sql, &expected);
+            }
+            other => panic!("expected combined SQL filter, got {other:?}"),
+        }
+        assert!(query.request.check_filter().is_ok());
+        // The combined filter executes without error.
+        query.execute().await.unwrap();
+    }
     #[tokio::test]
     async fn test_select_with_transform() {
         // TODO: Switch back to memory://foo after https://github.com/lancedb/lancedb/issues/1051

@@ -70,18 +70,29 @@ use tokio::sync::RwLock;
 const REQUEST_TIMEOUT_HEADER: HeaderName = HeaderName::from_static("x-request-timeout-ms");
 const MIN_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-version");
 const MIN_TIMESTAMP_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-timestamp");
+const MIN_READ_VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-min-read-version");
+const VERSION_HEADER: HeaderName = HeaderName::from_static("x-lancedb-version");
 const METRIC_TYPE_KEY: &str = "metric_type";
 const INDEX_TYPE_KEY: &str = "index_type";
 const SCHEMA_CACHE_TTL: Duration = Duration::from_secs(30);
 const SCHEMA_CACHE_REFRESH_WINDOW: Duration = Duration::from_secs(5);
-/// Per-table state driving the freshness headers (`x-lancedb-min-version` and
-/// `x-lancedb-min-timestamp`) sent on read requests.
+/// Per-table state driving the freshness headers (`x-lancedb-min-version`,
+/// `x-lancedb-min-timestamp`, and `x-lancedb-min-read-version`) sent on read
+/// requests.
 #[derive(Debug, Default, Clone, Copy)]
 struct FreshnessState {
     /// Provides read-your-write within a single handle: writes that return a
     /// version update this, and reads send it as `x-lancedb-min-version`.
     min_version: Option<u64>,
+    /// Highest dataset version observed in a *read* response on this handle.
+    /// Reads send it as `x-lancedb-min-read-version` so a load-balanced query
+    /// node whose cache is behind this version must refresh before serving,
+    /// giving monotonic reads across nodes regardless of which one the load
+    /// balancer routes to. Sourced only from reads (always committed dataset
+    /// versions), never from writes (which may return WAL entry ids), so it is
+    /// unaffected by the WAL/version mismatch that retired `min_version`.
+    min_read_version: Option<u64>,
     /// Wall-clock time captured at the last [`BaseTable::checkout_latest`]
     /// call. Subsequent reads send
     /// `max(baseline, now - read_consistency_interval)` as
@@ -102,6 +113,7 @@ struct FreshnessState {
 struct FreshnessHeaders {
     min_version: Option<u64>,
     min_timestamp: Option<SystemTime>,
+    min_read_version: Option<u64>,
 }
 impl FreshnessHeaders {
@@ -113,6 +125,9 @@ impl FreshnessHeaders {
             let dt: chrono::DateTime<chrono::Utc> = ts.into();
             request = request.header(MIN_TIMESTAMP_HEADER, dt.to_rfc3339());
         }
+        if let Some(v) = self.min_read_version {
+            request = request.header(MIN_READ_VERSION_HEADER, v.to_string());
+        }
         request
     }
 }
@@ -597,6 +612,7 @@ impl<S: HttpSend> RemoteTable<S> {
         body: &mut serde_json::Value,
         params: &QueryRequest,
     ) -> Result<()> {
+        params.check_filter()?;
         body["prefilter"] = params.prefilter.into();
         if let Some(offset) = params.offset {
             body["offset"] = serde_json::Value::Number(serde_json::Number::from(offset));
@@ -884,6 +900,7 @@ impl<S: HttpSend> RemoteTable<S> {
                 self.client.read_consistency_interval,
                 SystemTime::now(),
             ),
+            min_read_version: state.min_read_version,
         }
     }
@@ -905,6 +922,30 @@ impl<S: HttpSend> RemoteTable<S> {
         state.min_version = Some(state.min_version.map_or(version, |v| v.max(version)));
     }
+    /// Record a dataset version observed in a *read* response so subsequent
+    /// reads request at least this version via `x-lancedb-min-read-version`,
+    /// giving monotonic reads across load-balanced query nodes. A returned `0`
+    /// (or absent header from an old server) is ignored.
+    fn track_read_version(&self, version: u64) {
+        if version == 0 {
+            return;
+        }
+        let mut state = self.freshness.lock().unwrap();
+        state.min_read_version = Some(state.min_read_version.map_or(version, |v| v.max(version)));
+    }
+    /// Parse the `x-lancedb-version` response header (the dataset version a read
+    /// reflects) and fold it into the read-version watermark.
+    fn track_read_version_from_headers(&self, headers: &reqwest::header::HeaderMap) {
+        if let Some(version) = headers
+            .get(&VERSION_HEADER)
+            .and_then(|value| value.to_str().ok())
+            .and_then(|value| value.parse::<u64>().ok())
+        {
+            self.track_read_version(version);
+        }
+    }
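
The read-version watermark described above amounts to "keep the maximum version seen, ignore zero and unparseable values". A minimal standalone sketch, assuming a hypothetical `Watermark` type and a plain `Option<&str>` in place of the real `HeaderMap` lookup:

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the `min_read_version` part of `FreshnessState`.
#[derive(Debug, Default)]
struct Watermark {
    min_read_version: Mutex<Option<u64>>,
}

impl Watermark {
    /// Fold an observed version into the watermark; it never moves backwards.
    fn track(&self, version: u64) {
        if version == 0 {
            // Zero is treated as "no version reported".
            return;
        }
        let mut guard = self.min_read_version.lock().unwrap();
        let next = match *guard {
            Some(current) => current.max(version),
            None => version,
        };
        *guard = Some(next);
    }

    /// Parse an optional header value; absent or malformed values are ignored.
    fn track_header(&self, value: Option<&str>) {
        if let Some(version) = value.and_then(|v| v.parse::<u64>().ok()) {
            self.track(version);
        }
    }

    /// Current floor to send on the next read, if any.
    fn current(&self) -> Option<u64> {
        *self.min_read_version.lock().unwrap()
    }

    /// Clear the watermark, as `checkout_latest` does in the diff.
    fn reset(&self) {
        *self.min_read_version.lock().unwrap() = None;
    }
}
```

Feeding this sketch the sequence `"100"`, `"50"`, a missing header, `"abc"`, and `"0"` leaves the watermark at 100, which mirrors the behavior the new tests in this file assert against the real table handle.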
     async fn execute_query(
         &self,
         query: &AnyQuery,
@@ -928,6 +969,7 @@ impl<S: HttpSend> RemoteTable<S> {
         let futures = requests.into_iter().map(|req| async move {
             let (request_id, response) = self.send(req, true).await?;
+            self.track_read_version_from_headers(response.headers());
             self.read_arrow_stream(&request_id, response).await
         });
         let streams = futures::future::try_join_all(futures);
@@ -1545,11 +1587,12 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
         *write_guard = None;
         drop(write_guard);
-        // Drop any per-handle write tracking; subsequent reads use the
+        // Drop any per-handle read/write tracking; subsequent reads use the
         // baseline timestamp captured now to guarantee freshness.
         *self.freshness.lock().unwrap() = FreshnessState {
             min_version: None,
             checkout_baseline: Some(SystemTime::now()),
+            min_read_version: None,
         };
         // Invalidate schema cache since we're switching versions
@@ -1805,6 +1848,7 @@ impl<S: HttpSend> BaseTable for RemoteTable<S> {
             }
         };
+        self.track_read_version_from_headers(response.headers());
         let body = response.text().await.err_to_http(request_id.clone())?;
         serde_json::from_str(&body).map_err(|e| Error::Http {
@@ -7124,6 +7168,7 @@ mod tests {
         let state = FreshnessState {
             min_version: None,
             checkout_baseline: Some(baseline),
+            min_read_version: None,
         };
         assert_eq!(compute_min_timestamp(&state, None, now), Some(baseline));
@@ -7148,6 +7193,7 @@ mod tests {
         let state = FreshnessState {
             min_version: None,
             checkout_baseline: Some(baseline),
+            min_read_version: None,
         };
         assert_eq!(
             compute_min_timestamp(&state, Some(Duration::from_secs(10)), now),
@@ -7159,6 +7205,7 @@ mod tests {
         let state = FreshnessState {
             min_version: None,
             checkout_baseline: Some(recent_baseline),
+            min_read_version: None,
         };
         assert_eq!(
             compute_min_timestamp(&state, Some(Duration::from_secs(60)), now),
@@ -7303,6 +7350,106 @@ mod tests {
         );
     }
+    /// A handler that records every request's headers and answers each read with
+    /// an `x-lancedb-version` response header taken from `versions` (by call
+    /// index, saturating at the last entry). An empty string means "no header".
+    fn read_version_handler(
+        versions: &'static [&'static str],
+    ) -> (
+        impl Fn(reqwest::Request) -> http::Response<String> + Clone + Send + Sync + 'static,
+        Arc<std::sync::Mutex<Vec<http::HeaderMap>>>,
+    ) {
+        let requests = Arc::new(std::sync::Mutex::new(Vec::new()));
+        let requests_c = requests.clone();
+        let call = Arc::new(AtomicUsize::new(0));
+        let handler = move |request: reqwest::Request| {
+            requests_c.lock().unwrap().push(request.headers().clone());
+            let i = call.fetch_add(1, Ordering::SeqCst).min(versions.len() - 1);
+            let mut builder = http::Response::builder().status(200);
+            if !versions[i].is_empty() {
+                builder = builder.header("x-lancedb-version", versions[i]);
+            }
+            builder.body("42".to_string()).unwrap()
+        };
+        (handler, requests)
+    }
+    #[tokio::test]
+    async fn test_read_version_watermark_tracked_and_sent() {
+        let (handler, requests) = read_version_handler(&["100", "100"]);
+        let table = Table::new_with_handler("my_table", handler);
+        // First read has no watermark yet; the response advertises version 100,
+        // so the second read must floor the server at 100.
+        table.count_rows(None).await.unwrap();
+        table.count_rows(None).await.unwrap();
+        let reqs = requests.lock().unwrap();
+        assert!(!reqs[0].contains_key("x-lancedb-min-read-version"));
+        assert_eq!(
+            reqs[1]
+                .get("x-lancedb-min-read-version")
+                .unwrap()
+                .to_str()
+                .unwrap(),
+            "100"
+        );
+    }
+    #[tokio::test]
+    async fn test_read_version_watermark_keeps_max() {
+        // Server reports 100 then a stale 50; the watermark must not regress.
+        let (handler, requests) = read_version_handler(&["100", "50", "50"]);
+        let table = Table::new_with_handler("my_table", handler);
+        table.count_rows(None).await.unwrap();
+        table.count_rows(None).await.unwrap();
+        table.count_rows(None).await.unwrap();
+        let reqs = requests.lock().unwrap();
+        assert_eq!(
+            reqs[2]
+                .get("x-lancedb-min-read-version")
+                .unwrap()
+                .to_str()
+                .unwrap(),
+            "100"
+        );
+    }
+    #[tokio::test]
+    async fn test_read_version_absent_header_no_watermark() {
+        // An old server that doesn't return the version header leaves the
+        // watermark unset, preserving backward compatibility.
+        let (handler, requests) = read_version_handler(&[""]);
+        let table = Table::new_with_handler("my_table", handler);
+        table.count_rows(None).await.unwrap();
+        table.count_rows(None).await.unwrap();
+        let reqs = requests.lock().unwrap();
+        assert!(!reqs[1].contains_key("x-lancedb-min-read-version"));
+    }
+    #[tokio::test]
+    async fn test_read_version_watermark_reset_on_checkout_latest() {
+        let (handler, requests) = read_version_handler(&["100", "100"]);
+        let table = Table::new_with_handler("my_table", handler);
+        table.count_rows(None).await.unwrap();
+        table.checkout_latest().await.unwrap();
+        table.count_rows(None).await.unwrap();
+        // The read after checkout_latest starts from a clean slate.
+        let reqs = requests.lock().unwrap();
+        assert!(
+            !reqs
+                .last()
+                .unwrap()
+                .contains_key("x-lancedb-min-read-version")
+        );
+    }
     /// Like `capturing_handler`, but keeps a per-path snapshot of the headers
     /// from every request so tests can assert on a specific endpoint.
     #[allow(clippy::type_complexity)]

@@ -35,6 +35,15 @@ pub enum AnyQuery {
     VectorQuery(VectorQueryRequest),
 }
+impl AnyQuery {
+    pub(crate) fn base(&self) -> &QueryRequest {
+        match self {
+            Self::Query(query) => query,
+            Self::VectorQuery(query) => &query.base,
+        }
+    }
+}
 //Decide between namespace or local
 pub async fn execute_query(
     table: &NativeTable,
@@ -108,6 +117,7 @@ pub async fn create_plan(
         AnyQuery::VectorQuery(query) => query.clone(),
         AnyQuery::Query(query) => VectorQueryRequest::from_plain_query(query.clone()),
     };
+    query.base.check_filter()?;
     let ds_ref = table.dataset.get().await?;
     let schema = ds_ref.schema();
@@ -357,6 +367,7 @@ async fn execute_namespace_query(
 /// Convert an AnyQuery to the namespace QueryTableRequest format.
 fn convert_to_namespace_query(query: &AnyQuery) -> Result<NsQueryTableRequest> {
+    query.base().check_filter()?;
     match query {
         AnyQuery::VectorQuery(vq) => {
             // Extract the query vector(s)