Compare commits

..

10 Commits

Author SHA1 Message Date
Daniel Rammer
1700d618e5 test(python): regenerate lindera ipadic fixtures for lindera 3.x
lance v7.0.0-beta.9 bumps lindera 0.44 -> 3.0.7, which changed the
tokenizer config schema (dictionary is now a string path, not a
{ path: ... } map) and the dictionary binary format (now requires
metadata.json). The old fixtures broke test_fts_lindera_tokenizer on
all platforms.

Lift the regenerated config.yml and main.zip from the lance
v7.0.0-beta.9 tag (lance-format/lance#6719) and update the
lindera_ipadic fixture's config writer to the 3.x schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 10:46:44 -05:00
lancedb automation
3726491b27 chore: update lance dependency to v7.0.0-beta.9 2026-05-15 14:46:00 +00:00
Neha Prasad
13c6dae9a3 feat(nodejs): add Connection.renameTable with namespace support (#3365)
### Summary

- Expose Connection.renameTable in the Node.js bindings and align it
with existing namespace-aware connection APIs.

### Changes

- Add napi-rs rename_table on Connection, delegating to Rust
Connection::rename_table.
- Add renameTable(oldName, newName, namespacePath?) on abstract
Connection and implement on LocalConnection.
- Add a connection test that renames a table and checks names / open
behavior.

#### Testing

- cd nodejs && npm run build
- cd nodejs && npm test __test__/connection.test.ts

fix : #3364

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2026-05-14 15:30:31 -07:00
Shengan Zhang
64aeee84a8 feat(python): support bytes in lit() expressions (#3387)
Closes #3261.

## Summary

Adds `bytes` to the accepted types of `lancedb.expr.lit()` so that
binary scalars can be used in filter / projection expressions. The
previous attempt in #3235 had to be reverted because DataFusion's SQL
unparser does not support `Binary` / `LargeBinary` scalars, so any
expression containing such a literal would fail in both `to_sql()` and
`__repr__`.

## How

`expr_to_sql_string` now has two paths:

- **Fast path** (no binary literals): delegate to DataFusion's unparser
unchanged.
- **Slow path**: rewrite each `Binary(Some(bytes))` literal in the tree
to a unique string-literal placeholder, run the unparser, then
substitute `'<placeholder>'` with `X'<HEX>'` in the resulting SQL.
`Binary(None)` / `LargeBinary(None)` are rewritten to
`ScalarValue::Null` so the unparser emits plain `NULL`.

This keeps DataFusion as the single source of truth for operator and
function serialization, so binary literals work in every expression node
type the unparser already supports — including nested cases like
`contains(col("data"), lit(b"\xff"))`, `NOT (col == lit(b"..."))`, and
`col.cast(...) == lit(b"...")`.

## Changes

- `rust/lancedb/src/expr/sql.rs`: placeholder-substitution
implementation.
- `rust/lancedb/src/expr.rs`: 4 new unit tests covering binary literals
in equality, compound predicates, scalar function calls, negation, and
`NULL` binary literals.
- `python/src/expr.rs`: `expr_lit` accepts `PyBytes` and produces
`ScalarValue::Binary`.
- `python/Cargo.toml` + `Cargo.lock`: pull in `datafusion-common` for
`ScalarValue`.
- `python/python/lancedb/expr.py`: extend `ExprLike` and `lit()` type
annotations / docstrings with `bytes`.
- `python/python/lancedb/_lancedb.pyi`: update `expr_lit` stub.
- `python/tests/test_expr.py`: unit tests for `to_sql` / `repr` of
binary literals and an integration test against a real `pa.binary()`
column for equality / inequality / compound filters.

## Example

```python
from lancedb.expr import col, lit, func

# Equality against a binary column
col("payload") == lit(b"\xca\xfe")
# Expr((payload = X'CAFE'))

# Nested inside a function call (previously failed)
func("contains", col("data"), lit(b"\xff"))
# Expr(contains(data, X'FF'))

# repr() no longer crashes
repr(lit(b"\xde\xad\xbe\xef"))
# "Expr(X'DEADBEEF')"
```

## Verification

- [x] `cargo test -p lancedb --lib expr::` — 12/12 pass (was 9; +3 new
tests)
- [x] `cargo check --features remote --tests --examples` — clean
- [x] `cargo clippy --features remote --tests --examples` — no warnings
- [x] `cargo fmt --all -- --check` — clean
- [x] `pytest python/tests/test_expr.py` — 76/76 pass (was 74; +2 new
tests)
- [x] `ruff check python` / `ruff format --check python` — clean

## Follow-ups (not in this PR)

Issue #3261 also raises the possibility of a *truncated* `__repr__` for
very large binary literals. This PR keeps `__repr__` exact (it forwards
to `to_sql()`), since truncating display output would diverge from the
SQL that actually gets executed. A display-only truncation could be
added in a follow-up by giving `__repr__` its own renderer.

Made with [Cursor](https://cursor.com)

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-14 15:24:52 -07:00
Justin Miller
5b45e44ce3 fix(rust): map lance-namespace errors to TableNotFound / TableAlreadyExists (#3385)
## Summary

`LanceNamespaceDatabase::open_table` and `create_table` were squashing
`NamespaceError::TableNotFound` and `TableAlreadyExists` into generic
`Error::Runtime`, so callers couldn't distinguish a missing-table or
duplicate-table error from any other internal failure. Downstream this
surfaced to geneva-style code as HTTP 500 / "internal server error" on
operations that should have been 400/404 — see
[ENT-1235](https://linear.app/lancedb/issue/ENT-1235/fix-ns-errors-for-create-tableopen-table).

This PR walks the boxed-error chain from `lance::Error::Namespace` down
to the inner `NamespaceError` and maps its `ErrorCode` onto the proper
`lancedb::Error` variant:

- `NamespaceError::TableNotFound` → `Error::TableNotFound { name, source
}`
- `NamespaceError::TableAlreadyExists` → `Error::TableAlreadyExists {
name }`
- everything else → `Error::Runtime` (unchanged behavior for the long
tail)

It also replaces the existing `e.to_string().contains("already exists")`
string match in `LanceNamespaceDatabase::create_table` with a downcast
on the `NamespaceError` code. That string-match happened to work for the
`dir` backend but isn't guaranteed to match the REST namespace backend's
error format; the downcast works for both.

The chain-walk is needed because `DatasetBuilder::from_namespace`
re-wraps the inner namespace error in a fresh `lance::Error::Namespace`,
so a single top-level downcast misses it.

## How this helps geneva

Geneva's workaround (linked in the parent issue) currently has to use
`except Exception:` with a `# todo: this is too broad` comment, plus
`str(e).lower().contains("already exists")` string matching, because the
namespace-impl path raised a generic `RuntimeError`. After this PR:

- `db.open_table("missing")` raises `ValueError("Table 'missing' was not
found")` (via the existing Python binding mapping of `TableNotFound` →
`PyValueError`) — geneva can catch `ValueError` cleanly.
- `db.create_table("dup")` raises `ValueError("Table 'dup' already
exists")` reliably across both `dir` and REST backends, so the existing
string match becomes deterministic.

In phalanx (the sophon REST server), `LanceDBError::TableNotFound` and
`LanceDBError::TableAlreadyExists` already map directly to HTTP 404 and
HTTP 400 respectively — see
[phalanx/src/error.rs:77-94](https://github.com/lancedb/sophon/blob/main/src/rust/phalanx/src/error.rs#L77).
No phalanx code change is needed for the bug fix; the previous 500 came
from phalanx's string-match fallback not finding `"namespace"` AND `"not
found"` in the `Runtime` error's debug-formatted message.

## Follow-up


[ENT-1246](https://linear.app/lancedb/issue/ENT-1246/remove-dead-namespace-error-string-matching-in-phalanx)
— after this lands and phalanx picks up the new lancedb, the
string-matching fallback for table errors in
`src/rust/phalanx/src/error.rs` (lines 99-168, 236-256, 502-514) and
`src/rust/phalanx/src/rest/table/create_table.rs` (lines 224-241)
becomes dead code and can be removed. The `// TODO: Refactor for better
namespace error handling` comment at phalanx/src/error.rs:96-98 is
exactly what this PR addresses on the lancedb side; ENT-1246 finishes
the cleanup on the sophon side.

## Test plan

- [x] `cargo test --quiet --features remote -p lancedb --lib` — all 495
lib tests pass, including 4 new tests in `database::namespace::tests`:
- `test_namespace_table_not_found` — extended to assert
`Error::TableNotFound` (was just `is_err()`)
- `test_namespace_open_table_not_found_at_root` — covers the
root-namespace path
- `test_namespace_create_table_already_exists` — covers child namespace
- `test_namespace_create_table_already_exists_at_root` — covers root
namespace
- [x] `cargo clippy --quiet --features remote --tests` — clean
- [x] `cargo fmt --all` — clean
- [x] Manually confirmed (via test failures before the fix) that the two
`open_table` tests were returning `Error::Runtime { message: "Failed to
get table info from namespace: Namespace { source: TableNotFound { ... }
}" }` prior to this change.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 15:19:23 -07:00
Brendan Clement
f893589356 fix(python): invalid namespace mode/behavior was silently ignored, now raises ValueError (#3388)
Follow-up to #3371 , which added runtime validation for namespace `mode`
and `behavior` parameters in the NodeJS SDK. Bringing the same fix to
Python for cross-SDK consistency.

**Before:** unrecognized values were silently dropped to `None`, so
`db.create_namespace(["x"], mode="foobar")` would quietly fall through
to the server's default mode and hide caller typos.

**After:** raises `ValueError` listing the valid values.
2026-05-14 15:17:44 -07:00
Tanay
df4ad9f851 feat(nodejs): add Scannable primitive for streaming ingestion (#3271)
## **Summary**

This PR adds a **Scannable primitive** to the Node.js bindings, bringing
parity with Python's `PyScannable`.

A `Scannable` wraps a schema, an optional row count hint, a rescannable
flag, and a batch producing callback. On the Rust side it implements
`lancedb::data::scannable::Scannable`. The goal is to give consumers
such as `Table.add`, `createTable`, and `mergeInsert` a way to stream
data without materializing the full dataset in JS memory.

This PR introduces only the primitive. Migrating existing consumers to
use it will come in follow up work.

---

## **Design**

### **Transport**

The transport uses the **Arrow IPC Stream format, one batch at a time**.

The JS side encodes each `RecordBatch` into a self contained IPC Stream
message containing schema, batch, and end of stream. The message is
returned as a `Buffer` through a napi `ThreadsafeFunction`. The Rust
side decodes it using `arrow_ipc::reader::StreamReader`.

Only one batch is active at a time, so JS memory stays bounded by the
batch size. The Node `Buffer` size limit of about 4 GiB therefore does
not constrain the stream as a whole.

I initially evaluated the Arrow C Data Interface, which is the approach
used in Python. I dropped that path after confirming that the
`apache-arrow` npm package does not expose a C Data Interface export in
any supported version from 15 to 18. JavaScript is not listed in Arrow's
C Data Interface implementation table, and the upstream tracking issue
remains open with no scheduled work.

Third party FFI shims would introduce additional dependency risk without
solving the core maintenance problem. Using IPC adds one encode and
decode step per batch, but the cost is predictable and typically
dominated by Lance's write path.

---

### **API**

```ts
class Scannable {
  readonly schema: Schema
  readonly numRows: number | null
  readonly rescannable: boolean

  static fromFactory(schema, factory, opts?)
  static fromTable(table, opts?)
  static fromIterable(schema, iter, opts?)
  static fromRecordBatchReader(reader, opts?)
}
```

The FFI boundary consists of a single callback:

`getNextBatch(isStart: boolean): Promise<Buffer | null>`

`isStart` is `true` on the first call of each new scan and `false` for
every call after it. The JS side uses it to drop any cached iterator and
re-invoke the factory at scan boundaries. This is what makes a
rescannable source restart at batch 0 on every `scan_as_stream` call,
even when a previous scan ended mid stream, for example a retried write
after a network error. Without this signal a retry would resume a stale
iterator and silently skip already emitted batches.

In addition, a schema only IPC buffer is transferred once during
construction.

---

## **Changes**

* `nodejs/src/scannable.rs`
Adds `NapiScannable` and the `LanceScannable` implementation. Implements
`schema()`, `num_rows()`, `rescannable()`, and `scan_as_stream()`.
Includes per batch schema validation against the declared schema, one
shot enforcement for non rescannable sources, and a scan boundary reset
signal (`isStart`) so rescannable sources restart from batch 0 on every
`scan_as_stream` call rather than resuming a stale iterator.

* `nodejs/src/lib.rs`
  Module registration.

* `nodejs/lancedb/scannable.ts`
Defines the `Scannable` class and the four constructors listed above.
Each constructor rejects option combinations it cannot honor, for
example a `rescannable: true` request on a one shot iterable or reader,
and a `numRows` that disagrees with an in memory table's row count.

* `nodejs/lancedb/index.ts`
  Exports the new primitive.

* `nodejs/__test__/scannable.test.ts`
  Test suite for the primitive.

---

## **Validation**

Before implementing the bridge, I ran an end to end harness with a JS
producer feeding a standalone Rust consumer built against the same
`arrow-ipc` version used in the bridge.

The harness covered the following scenarios:

* happy path
* empty stream
* 1,000 small batches
* 10 large batches
* mixed primitive types with nullables
* nested `List<Struct<>>`
* truncated stream error handling
* declared schema mismatch validation
* a 6 GB stress test through the pipe

All scenarios completed with bounded memory usage. The goal of this
harness was to confirm that the IPC Stream transport works correctly end
to end and that Node's `Buffer` size limit does not constrain the
overall stream.

Separately, the rescannable restart contract was verified with a focused
harness. A rescannable source is consumed partially and the scan is
dropped mid stream, then re-scanned. The re-scan replays from batch 0
rather than resuming the stale iterator. The same harness was run with
the `isStart` reset path disabled and the mid stream restart case failed
as expected, confirming the test exercises the real regression.

These harnesses are not meant to replace the full test suite, which is
described below.

---

## **Tests**

`__test__/scannable.test.ts` covers construction, metadata reflection,
per constructor defaults and overrides, construction time validation,
the native handle surface, and schema variety across empty tables,
nested types, `FixedSizeList`, and wide schemas.

Runtime scan behavior including `scan_as_stream`, one shot enforcement
on non rescannable sources, schema mismatch detection, IPC decode
failures, and rescannable restart semantics is not exercised here. There
is no in tree JS consumer of `NapiScannable` yet. This mirrors Python's
`PyScannable`, which has no dedicated test file and is covered
transitively through the consumers that accept a Scannable.

Runtime coverage will follow in the consumer migration work.

---

## **Status**

Ready for review.

Closes #3223

---
2026-05-14 15:07:41 -07:00
Brendan Clement
9330a9b851 feat(nodejs): expose connectNamespace for namespace-backed connections (#3383)
### Summary 

Adds a `connectNamespace(implName, properties, options?)` to the NodeJS
SDK`. Closes #3380.

### Testing
- pnpm test
- Ran smoke test

```
import { connectNamespace } from "lancedb"
import { tmpdir } from "os";
import { mkdtempSync } from "fs";
import { join } from "path";

const dir = mkdtempSync(join(tmpdir(), "lancedb-connect-namespace-smoke-"));
console.log(`Using temp dir: ${dir}\n`);

// 1. Happy path: connect via the "dir" namespace impl, create + list a table.
console.log('Connecting via connectNamespace("dir", { root })...');
const db = await connectNamespace("dir", { root: dir });
console.log("  ✓ connected:", db.display());

console.log("Creating a table and listing it...");
await db.createTable("users", [
  { id: 1, name: "alice" },
  { id: 2, name: "bob" },
]);
console.log("  ✓ tableNames ->", await db.tableNames());

const table = await db.openTable("users");
console.log("  ✓ users.countRows ->", await table.countRows());

// 2. Storage options pass-through.
console.log("\nReconnecting with storageOptions (plumbing check)...");
const dbWithOpts = await connectNamespace(
  "dir",
  { root: dir },
  { storageOptions: { newTableDataStorageVersion: "stable" } },
);
console.log("  ✓ connected with storageOptions:", dbWithOpts.display());
await dbWithOpts.close();

// 3. Empty implName -> clear error.
console.log("\nCalling connectNamespace('', {}) (expect error)...");
try {
  await connectNamespace("", {});
  console.error("  UNEXPECTED: empty implName did not throw");
} catch (err) {
  console.log(`  ✓ Got expected error: ${err.message.split("\n")[0]}`);
}

// 4. Unknown impl -> error.
console.log("\nCalling connectNamespace('not-a-real-impl', {}) (expect error)...");
try {
  await connectNamespace("not-a-real-impl", {});
  console.error("  UNEXPECTED: unknown impl did not throw");
} catch (err) {
  console.log(`  ✓ Got expected error: ${err.message.split("\n")[0]}`);
}

// 5. Create a table inside a child namespace, then reconnect with a fresh
//    connectNamespace call and confirm the table is reachable via that
//    namespace path. (The dir+manifest impl keeps the namespace hierarchy in
//    a root manifest, so "scoping" happens via namespacePath args, not by
//    pointing root at a subdir.)
console.log("\nCreating a table inside a child namespace...");
const dir2 = mkdtempSync(join(tmpdir(), "lancedb-connect-namespace-smoke-"));
const writer = await connectNamespace("dir", {
  root: dir2,
  manifest_enabled: "true",
});
await writer.createNamespace(["analytics"]);
await writer.createTable(
  "orders",
  [
    { id: 1, total: 10 },
    { id: 2, total: 20 },
  ],
  ["analytics"],
);
console.log(
  "  ✓ writer sees tables under [analytics] ->",
  await writer.tableNames(["analytics"]),
);
await writer.close();

console.log("Reconnecting and reading the table via its namespace path...");
const reader = await connectNamespace("dir", {
  root: dir2,
  manifest_enabled: "true",
});
console.log(
  "  ✓ reader tableNames(['analytics']) ->",
  await reader.tableNames(["analytics"]),
);
const orders = await reader.openTable("orders", ["analytics"]);
console.log("  ✓ orders.countRows via reader ->", await orders.countRows());
await reader.close();

await db.close();
console.log("\nAll checks passed.");
```

```
Using temp dir: /var/folders/bj/hn6jv9c50y301d1nx0y8xmn00000gn/T/lancedb-connect-namespace-smoke-WByF1P

Connecting via connectNamespace("dir", { root })...
  ✓ connected: LanceNamespaceDatabase
Creating a table and listing it...
  ✓ tableNames -> [ 'users' ]
  ✓ users.countRows -> 2

Reconnecting with storageOptions (plumbing check)...
  ✓ connected with storageOptions: LanceNamespaceDatabase

Calling connectNamespace('', {}) (expect error)...
  ✓ Got expected error: implName must be a non-empty string

Calling connectNamespace('not-a-real-impl', {}) (expect error)...
  ✓ Got expected error: Invalid input, Failed to connect to namespace: Namespace { source: Unsupported { message: "Implementation 'not-a-real-impl' is not available. Supported: dir, rest" }, location: Location { file: "/Users/brendan/.cargo/git/checkouts/lance-8ddea23c38163eda/f693245/rust/lance-namespace-impls/src/connect.rs", line: 216, column: 14 } }

Creating a table inside a child namespace...
  ✓ writer sees tables under [analytics] -> [ 'orders' ]
Reconnecting and reading the table via its namespace path...
  ✓ reader tableNames(['analytics']) -> [ 'orders' ]
  ✓ orders.countRows via reader -> 2

All checks passed.
```

### Docs
- regenerated docs
2026-05-13 16:16:56 -07:00
Brendan Clement
02de07576e feat(nodejs): add namespace management methods on Connection (#3371)
### Summary

Closes #3363 

Adds the four namespace management methods to the NodeJS `Connection`,
bringing parity with the Rust core and Python bindings:

- `listNamespaces(parent?, options?)`
- `createNamespace(namespacePath, options?)`
- `dropNamespace(namespacePath, options?)`
- `describeNamespace(namespacePath)`

### Test plan
- npm test
- Ran a smoke test script

```typescript
import { connect } from '<lancePath>'
import { tmpdir } from "os";
import { mkdtempSync } from "fs";
import { join } from "path";

const dir = mkdtempSync(join(tmpdir(), "lancedb-smoke-"));
console.log(`Using temp dir: ${dir}\n`);

const db = await connect(dir, {
  namespaceClientProperties: { manifest_enabled: "true" },
});

console.log("Creating namespaces...");
await db.createNamespace(["analytics"]);
await db.createNamespace(["analytics", "sales"], {
  properties: { owner: "brendan", purpose: "smoke-test" },
});
await db.createNamespace(["marketing"]);

const root = await db.listNamespaces();
console.log("Root namespaces:", root.namespaces);

const children = await db.listNamespaces(["analytics"]);
console.log("Children of 'analytics':", children.namespaces);

const descWithProps = await db.describeNamespace(["analytics", "sales"]);
console.log("Describe analytics/sales (with properties):", descWithProps);

const descNoProps = await db.describeNamespace(["analytics"]);
console.log("Describe analytics (no properties):", descNoProps);

console.log("Describing a non-existent namespace (expect error)...");
try {
  await db.describeNamespace(["does-not-exist"]);
  console.error("  UNEXPECTED: describe succeeded for non-existent namespace");
} catch (err) {
  console.log(`  ✓ Got expected error: ${err.message.split("\n")[0]}`);
}

await db.dropNamespace(["marketing"]);
const afterDrop = await db.listNamespaces();
console.log("Root after dropping marketing:", afterDrop.namespaces);

await db.close();
console.log("\nAll operations completed successfully.");
```

```
Using temp dir: /var/folders/bj/hn6jv9c50y301d1nx0y8xmn00000gn/T/lancedb-smoke-MUC5NI

Creating namespaces...
Root namespaces: [ 'analytics', 'marketing' ]
Children of 'analytics': [ 'sales' ]
Describe analytics/sales (with properties): { properties: { purpose: 'smoke-test', owner: 'brendan' } }
Describe analytics (no properties): {}
Describing a non-existent namespace (expect error)...
  ✓ Got expected error: lance error: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14  Caused by: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14    Caused by: Namespace not found: does-not-exist
Root after dropping marketing: [ 'analytics' ]

All operations completed successfully.
```

### Documentation
- regenerated docs
2026-05-13 11:49:27 -07:00
Will Jones
81617fd3d9 ci(nodejs): switch from npm to pnpm 11 (#3373)
## Summary

Switch the nodejs bindings and examples package from npm to pnpm 11 to
pick up its stronger supply-chain defaults:

- `minimumReleaseAge` defaults to 1 day, so newly-published (potentially
compromised) versions aren't resolved into installs for at least 24h.
- Install lifecycle scripts (`preinstall`/`install`/`postinstall`) are
no longer run for arbitrary transitive deps; only an explicit allowlist
may run them, and unapproved scripts cause install to fail
(`strictDepBuilds: true`).
- Audit uses GHSA IDs and `--fix=update` to add patched versions to
`minimumReleaseAgeExclude`.

This is the same class of protection that would have blunted the recent
TanStack/`@uipath`/etc. compromise discussed in the [Aikido
write-up](https://www.aikido.dev/blog/mini-shai-hulud-is-back-tanstack-compromised).

## Changes

- Replace `nodejs/package-lock.json` and
`nodejs/examples/package-lock.json` with `pnpm-lock.yaml`.
- Pin pnpm via `packageManager: pnpm@11.1.1` in both `package.json`s.
- Add `pnpm-workspace.yaml` with the four build-script packages we
actually need: `@biomejs/biome`, `onnxruntime-node`, `protobufjs`,
`sharp`. Everything else is blocked from running install scripts.
- Update package.json scripts (`npm run X` → `pnpm X`).
- Update workflows: `.github/workflows/nodejs.yml`,
`.github/workflows/npm-publish.yml`, and
`.github/workflows/codex-fix-ci.yml` — install pnpm via
`pnpm/action-setup@v4` and switch `setup-node` caches to
`pnpm-lock.yaml`.
- Refresh `nodejs/AGENTS.md`, `nodejs/CLAUDE.md`, and
`nodejs/CONTRIBUTING.md`.

`docs/package-lock.json` is **not** touched — out of scope for this PR.

## Test plan

- [ ] `Lint` job (lint Rust/TS + examples lint) passes on CI.
- [ ] `Linux (NodeJS 18/20)` build+test passes, including the examples
test step.
- [ ] `macos` build+test passes.
- [ ] `NPM Publish` workflow's PR dry-run completes (build matrix + test
matrix + dry `npm publish`).
- [ ] No new install-script approvals are required at install time.

## Follow-ups

- `update_package_lock_run_nodejs.yml` references a composite action
path that doesn't exist
(`./.github/workflows/update_package_lock_nodejs`); it was already
broken pre-PR. We may want to either delete this workflow or rewrite it
for pnpm in a follow-up.
- Consider migrating `docs/` to pnpm in a separate PR.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:27:38 -07:00
77 changed files with 15483 additions and 16025 deletions

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.29.0"
current_version = "0.28.0-beta.11"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -45,7 +45,9 @@ jobs:
- name: Set up Node.js
uses: actions/setup-node@v4
with:
node-version: 20
# pnpm 11 (used by the nodejs install step below) requires
# Node >= 22.13; use 24 since 22 hits EOL in October.
node-version: 24
- name: Install Codex CLI
run: npm install -g @openai/codex
@@ -79,10 +81,14 @@ jobs:
java-version: '11'
cache: maven
- name: Setup pnpm
uses: pnpm/action-setup@v4
with:
version: 11.1.1
- name: Install Node.js dependencies for TypeScript bindings
run: |
cd nodejs
npm ci
pnpm install --frozen-lockfile
- name: Configure git user
run: |
@@ -137,7 +143,7 @@ jobs:
- For Rust test failures: Run the specific test with "cargo test -p <crate> <test_name>"
- For Python test failures: Build with "cd python && maturin develop" then run "pytest <specific_test_file>::<test_name>"
- For Java test failures: Run "cd java && mvn test -Dtest=<TestClass>#<testMethod>"
- For TypeScript test failures: Run "cd nodejs && npm run build && npm test -- --testNamePattern='<test_name>'"
- For TypeScript test failures: Run "cd nodejs && pnpm build && pnpm test -- --testNamePattern='<test_name>'"
- Do NOT run the full test suite - only run the tests that were failing
7. If the additional guidelines are provided, follow them as well.

View File

@@ -16,7 +16,6 @@ on:
push:
branches:
- main
- release/**
paths:
- java/**
- .github/workflows/java.yml

View File

@@ -3,7 +3,6 @@ on:
push:
branches:
- main
- release/**
pull_request:
paths:
- rust/**

View File

@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- release/**
pull_request:
paths:
- Cargo.toml
@@ -43,11 +42,17 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: pnpm/action-setup@v4
with:
version: 11.1.1
- uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'
cache-dependency-path: nodejs/package-lock.json
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. The library itself still supports Node >= 18
# (see test matrix below).
node-version: 24
cache: 'pnpm'
cache-dependency-path: nodejs/pnpm-lock.yaml
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
components: rustfmt, clippy
@@ -62,11 +67,13 @@ jobs:
run: cargo clippy --profile ci --all --all-features -- -D warnings
- name: Lint Typescript
run: |
npm ci
npm run lint-ci
pnpm install --frozen-lockfile
pnpm lint-ci
- name: Lint examples
working-directory: nodejs/examples
run: npm ci && npm run lint-ci
# The `@lancedb/lancedb` dep points at file:../dist; pnpm errors if
# that dir is missing, so create an empty one for lint-only runs.
run: mkdir -p ../dist && pnpm install --frozen-lockfile && pnpm lint-ci
linux:
name: Linux (NodeJS ${{ matrix.node-version }})
timeout-minutes: 30
@@ -83,14 +90,18 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: actions/setup-node@v4
name: Setup Node.js 20 for build
- uses: pnpm/action-setup@v4
with:
# @napi-rs/cli v3 requires Node >= 20.12 (via @inquirer/prompts@8).
# Build always on Node 20; tests run on the matrix version below.
node-version: 20
cache: 'npm'
cache-dependency-path: nodejs/package-lock.json
version: 11.1.1
- uses: actions/setup-node@v4
name: Setup Node.js 24 for build
with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October. Build/install runs on Node 24; tests run on the
# matrix version below using direct jest invocation.
node-version: 24
cache: 'pnpm'
cache-dependency-path: nodejs/pnpm-lock.yaml
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
@@ -98,45 +109,52 @@ jobs:
sudo apt install -y protobuf-compiler libssl-dev
- name: Build
run: |
npm ci --include=optional
npm run build:debug -- --profile ci
pnpm install --frozen-lockfile
# No `--` separator: pnpm forwards it literally, which would
# make napi-rs treat `--profile ci` as a cargo passthrough arg.
pnpm build:debug --profile ci
pnpm tsc
- name: Setup examples
working-directory: nodejs/examples
run: pnpm install --frozen-lockfile
- name: Check docs
run: |
# We run this as part of the job because the binary needs to be built
# first to export the types of the native code.
set -e
# `pnpm docs` would invoke pnpm's built-in `docs` command, not
# the script — use `pnpm run docs`.
pnpm run docs
if ! git diff --exit-code -- ../ ':(exclude)Cargo.lock'; then
echo "Docs need to be updated"
echo "Run 'pnpm run docs', fix any warnings, and commit the changes."
exit 1
fi
- uses: actions/setup-node@v4
name: Setup Node.js ${{ matrix.node-version }} for test
with:
node-version: ${{ matrix.node-version }}
- name: Compile TypeScript
run: npm run tsc
- name: Setup localstack
working-directory: .
run: docker compose up --detach --wait
- name: Test
env:
S3_TEST: "1"
run: npm run test
- name: Setup examples
working-directory: nodejs/examples
run: npm ci
# Newer @smithy/core uses dynamic ESM imports.
NODE_OPTIONS: "--experimental-vm-modules"
# Invoke jest directly because pnpm 11 itself requires Node 22+
# while the matrix tests on older Node versions.
run: npx jest --verbose
- name: Test examples
working-directory: ./
env:
OPENAI_API_KEY: test
OPENAI_BASE_URL: http://0.0.0.0:8000
NODE_OPTIONS: "--experimental-vm-modules"
run: |
python ci/mock_openai.py &
cd nodejs/examples
npm test
- name: Check docs
run: |
# We run this as part of the job because the binary needs to be built
# first to export the types of the native code.
set -e
npm ci
npm run docs
if ! git diff --exit-code -- ../ ':(exclude)Cargo.lock'; then
echo "Docs need to be updated"
echo "Run 'npm run docs', fix any warnings, and commit the changes."
exit 1
fi
npx jest --testEnvironment jest-environment-node-single-context --verbose
macos:
timeout-minutes: 30
runs-on: "macos-14"
@@ -149,20 +167,28 @@ jobs:
with:
fetch-depth: 0
lfs: true
- uses: pnpm/action-setup@v4
with:
version: 11.1.1
- uses: actions/setup-node@v4
with:
node-version: 20
cache: 'npm'
cache-dependency-path: nodejs/package-lock.json
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October.
node-version: 24
cache: 'pnpm'
cache-dependency-path: nodejs/pnpm-lock.yaml
- uses: dtolnay/rust-toolchain@stable
- uses: Swatinem/rust-cache@v2
- name: Install dependencies
run: |
brew install protobuf
- name: Build
run: |
npm ci --include=optional
npm run build:debug -- --profile ci
npm run tsc
pnpm install --frozen-lockfile
# No `--` separator: pnpm forwards it literally, which would
# make napi-rs treat `--profile ci` as a cargo passthrough arg.
pnpm build:debug --profile ci
pnpm tsc
- name: Test
run: |
npm run test
pnpm test

View File

@@ -171,13 +171,18 @@ jobs:
working-directory: nodejs
steps:
- uses: actions/checkout@v4
- name: Setup pnpm
uses: pnpm/action-setup@v4
with:
version: 11.1.1
- name: Setup node
uses: actions/setup-node@v4
if: ${{ !matrix.settings.docker }}
with:
node-version: 20
cache: npm
cache-dependency-path: nodejs/package-lock.json
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October.
node-version: 24
cache: pnpm
cache-dependency-path: nodejs/pnpm-lock.yaml
- name: Install
uses: dtolnay/rust-toolchain@stable
if: ${{ !matrix.settings.docker }}
@@ -195,7 +200,7 @@ jobs:
target/
key: nodejs-${{ matrix.settings.target }}-cargo-${{ matrix.settings.host }}
- name: Install dependencies
run: npm ci
run: pnpm install --frozen-lockfile
- name: Install Zig
uses: mlugg/setup-zig@v2
if: ${{ contains(matrix.settings.target, 'musl') }}
@@ -248,7 +253,7 @@ jobs:
# one to do the upload.
- name: Make generic artifacts
if: ${{ matrix.settings.target == 'aarch64-apple-darwin' }}
run: npm run tsc
run: pnpm tsc
- name: Upload Generic Artifacts
if: ${{ matrix.settings.target == 'aarch64-apple-darwin' }}
uses: actions/upload-artifact@v4
@@ -283,14 +288,24 @@ jobs:
working-directory: nodejs
steps:
- uses: actions/checkout@v4
- name: Setup node
- name: Setup pnpm
uses: pnpm/action-setup@v4
with:
version: 11.1.1
- name: Setup Node.js 24 for install
uses: actions/setup-node@v4
with:
# pnpm 11 requires Node >= 22.13; use 24 since 22 hits EOL
# in October.
node-version: 24
cache: pnpm
cache-dependency-path: nodejs/pnpm-lock.yaml
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Setup Node.js ${{ matrix.node }} for test
uses: actions/setup-node@v4
with:
node-version: ${{ matrix.node }}
cache: npm
cache-dependency-path: nodejs/package-lock.json
- name: Install dependencies
run: npm ci
- name: Download artifacts
uses: actions/download-artifact@v4
with:
@@ -311,7 +326,9 @@ jobs:
- name: Move built files
run: cp dist/native.d.ts dist/native.js dist/*.node lancedb/
- name: Test bindings
run: npm test
# Invoke jest directly because pnpm 11 itself requires Node 22+
# while the matrix tests on older Node versions.
run: npx jest --verbose
publish:
name: Publish
runs-on: ubuntu-latest
@@ -323,15 +340,19 @@ jobs:
- test-lancedb
steps:
- uses: actions/checkout@v4
- name: Setup pnpm
uses: pnpm/action-setup@v4
with:
version: 11.1.1
- name: Setup node
uses: actions/setup-node@v4
with:
node-version: 24
cache: npm
cache-dependency-path: nodejs/package-lock.json
cache: pnpm
cache-dependency-path: nodejs/pnpm-lock.yaml
registry-url: "https://registry.npmjs.org"
- name: Install dependencies
run: npm ci
run: pnpm install --frozen-lockfile
- uses: actions/download-artifact@v4
with:
name: nodejs-dist
@@ -351,7 +372,7 @@ jobs:
- name: Display structure of downloaded files
run: find dist && find nodejs-artifacts
- name: Move artifacts
run: npx napi artifacts -d nodejs-artifacts
run: pnpm exec napi artifacts -d nodejs-artifacts
- name: List packages
run: find npm
- name: Publish

View File

@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- release/**
pull_request:
paths:
- Cargo.toml

View File

@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- release/**
pull_request:
paths:
- Cargo.toml

1828
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -13,20 +13,20 @@ categories = ["database-implementations"]
rust-version = "1.91.0"
[workspace.dependencies]
lance = { "version" = "=6.0.0", default-features = false }
lance-core = "=6.0.0"
lance-datagen = "=6.0.0"
lance-file = "=6.0.0"
lance-io = { "version" = "=6.0.0", default-features = false }
lance-index = "=6.0.0"
lance-linalg = "=6.0.0"
lance-namespace = "=6.0.0"
lance-namespace-impls = { "version" = "=6.0.0", default-features = false }
lance-table = "=6.0.0"
lance-testing = "=6.0.0"
lance-datafusion = "=6.0.0"
lance-encoding = "=6.0.0"
lance-arrow = "=6.0.0"
lance = { "version" = "=7.0.0-beta.9", default-features = false, "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-core = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-datagen = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-file = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-io = { "version" = "=7.0.0-beta.9", default-features = false, "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-index = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-linalg = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-namespace-impls = { "version" = "=7.0.0-beta.9", default-features = false, "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-table = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-testing = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-datafusion = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-encoding = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
lance-arrow = { "version" = "=7.0.0-beta.9", "tag" = "v7.0.0-beta.9", "git" = "https://github.com/lance-format/lance.git" }
ahash = "0.8"
# Note that this one does not include pyarrow
arrow = { version = "58.0.0", optional = false }
@@ -54,7 +54,7 @@ half = { "version" = "2.7.1", default-features = false, features = [
futures = "0"
log = "0.4"
moka = { version = "0.12", features = ["future"] }
object_store = "0.12.0"
object_store = "0.13.2"
pin-project = "1.0.7"
rand = "0.9"
snafu = "0.8"

View File

@@ -14,7 +14,7 @@ Add the following dependency to your `pom.xml`:
<dependency>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-core</artifactId>
<version>0.29.0</version>
<version>0.28.0-beta.11</version>
</dependency>
```

View File

@@ -12,20 +12,22 @@ Typescript.
* `src/`: Rust bindings source code
* `lancedb/`: Typescript package source code
* `__test__/`: Unit tests
* `examples/`: An npm package with the examples shown in the documentation
* `examples/`: A pnpm package with the examples shown in the documentation
## Development environment
To set up your development environment, you will need to install the following:
1. Node.js 14 or later
2. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
3. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
1. Node.js 22 or later (required by pnpm 11)
2. [pnpm](https://pnpm.io/installation) 11 or later (or run via `corepack enable`,
which uses the `packageManager` field in `package.json`)
3. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
4. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
Initial setup:
```shell
npm install
pnpm install
```
### Commit Hooks
@@ -39,38 +41,38 @@ pre-commit install
## Development
Most common development commands can be run using the npm scripts.
Most common development commands can be run using the pnpm scripts.
Build the package
```shell
npm install
npm run build
pnpm install
pnpm build
```
Lint:
```shell
npm run lint
pnpm lint
```
Format and fix lints:
```shell
npm run lint-fix
pnpm lint-fix
```
Run tests:
```shell
npm test
pnpm test
```
To run a single test:
```shell
# Single file: table.test.ts
npm test -- table.test.ts
pnpm test -- table.test.ts
# Single test: 'merge insert' in table.test.ts
npm test -- table.test.ts --testNamePattern=merge\ insert
pnpm test -- table.test.ts --testNamePattern=merge\ insert
```

View File

@@ -148,6 +148,33 @@ Creates a new empty Table
***
### createNamespace()
```ts
abstract createNamespace(namespacePath, options?): Promise<CreateNamespaceResponse>
```
Create a new namespace at the given path.
#### Parameters
* **namespacePath**: `string`[]
The namespace path to create.
* **options?**: `Partial`&lt;[`CreateNamespaceOptions`](../interfaces/CreateNamespaceOptions.md)&gt;
Creation `mode`
("create" | "exist_ok" | "overwrite") and optional `properties`
to attach to the namespace.
#### Returns
`Promise`&lt;[`CreateNamespaceResponse`](../interfaces/CreateNamespaceResponse.md)&gt;
The properties of the
created namespace and an optional transaction id.
***
### createTable()
#### createTable(options, namespacePath)
@@ -230,6 +257,29 @@ Creates a new Table and initialize it with new data.
***
### describeNamespace()
```ts
abstract describeNamespace(namespacePath): Promise<DescribeNamespaceResponse>
```
Describe a namespace, returning its properties.
#### Parameters
* **namespacePath**: `string`[]
The namespace path to describe, in
parent → child order, e.g. `["analytics", "sales"]`.
#### Returns
`Promise`&lt;[`DescribeNamespaceResponse`](../interfaces/DescribeNamespaceResponse.md)&gt;
The namespace's properties
(may be undefined if the namespace has none).
***
### display()
```ts
@@ -263,6 +313,36 @@ Drop all tables in the database.
***
### dropNamespace()
```ts
abstract dropNamespace(namespacePath, options?): Promise<DropNamespaceResponse>
```
Drop a namespace.
Use `behavior: "cascade"` to also drop everything contained in the
namespace (sub-namespaces and tables). The default `"restrict"`
behavior refuses to drop a non-empty namespace.
#### Parameters
* **namespacePath**: `string`[]
The namespace path to drop.
* **options?**: `Partial`&lt;[`DropNamespaceOptions`](../interfaces/DropNamespaceOptions.md)&gt;
`mode` ("skip" | "fail"
for missing-namespace handling) and `behavior` ("restrict" | "cascade").
#### Returns
`Promise`&lt;[`DropNamespaceResponse`](../interfaces/DropNamespaceResponse.md)&gt;
Any properties returned by
the server and an optional transaction id.
***
### dropTable()
```ts
@@ -299,6 +379,36 @@ Return true if the connection has not been closed
***
### listNamespaces()
```ts
abstract listNamespaces(namespacePath?, options?): Promise<ListNamespacesResponse>
```
List the immediate child namespaces under the given parent.
Results may be paginated. To retrieve subsequent pages, pass the
`pageToken` returned by a previous call.
#### Parameters
* **namespacePath?**: `string`[]
The parent namespace path. Defaults
to the root namespace if omitted.
* **options?**: `Partial`&lt;[`ListNamespacesOptions`](../interfaces/ListNamespacesOptions.md)&gt;
Pagination options
(`pageToken`, `limit`).
#### Returns
`Promise`&lt;[`ListNamespacesResponse`](../interfaces/ListNamespacesResponse.md)&gt;
Child namespace names and
an optional token for fetching the next page.
***
### openTable()
```ts
@@ -327,6 +437,29 @@ Open a table in the database.
***
### renameTable()
```ts
abstract renameTable(
oldName,
newName,
namespacePath?): Promise<void>
```
#### Parameters
* **oldName**: `string`
* **newName**: `string`
* **namespacePath?**: `string`[]
#### Returns
`Promise`&lt;`void`&gt;
***
### tableNames()
#### tableNames(options)

View File

@@ -0,0 +1,173 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / Scannable
# Class: Scannable
A data source that can be scanned as a stream of Arrow `RecordBatch`es.
`Scannable` wraps the schema + optional row count + rescannable flag and
a callback that yields batches one at a time. It is passed to consumers
(e.g. `Table.add`, `createTable`, `mergeInsert` — follow-up work) that
need to pull data without materializing the full dataset in JS memory.
Batches cross the JS↔Rust boundary as Arrow IPC Stream messages; a fresh
writer serializes each batch, and the Rust side decodes it with
`arrow_ipc::reader::StreamReader`. One batch is in flight at a time.
## Properties
### numRows
```ts
readonly numRows: null | number;
```
***
### rescannable
```ts
readonly rescannable: boolean;
```
***
### schema
```ts
readonly schema: Schema<any>;
```
## Methods
### fromFactory()
```ts
static fromFactory(
schema,
factory,
opts): Promise<Scannable>
```
Build a Scannable from an explicit schema and a factory that returns a
fresh batch iterator on each call.
The factory is invoked once per scan. Each iterator yields
`RecordBatch`es matching the declared schema. Use this when you need
direct control over the pull loop — for example, to wrap a streaming
source whose batches are produced lazily.
#### Parameters
* **schema**: `Schema`&lt;`any`&gt;
The Arrow schema of the produced batches.
* **factory**
Called at the start of each scan to produce a batch
iterator. Must be idempotent when `rescannable` is true.
* **opts**: [`ScannableOptions`](../interfaces/ScannableOptions.md) = `{}`
Optional hints. `rescannable` defaults to `true`; set to
`false` if calling `factory()` twice would not reproduce the same data.
#### Returns
`Promise`&lt;[`Scannable`](Scannable.md)&gt;
***
### fromIterable()
```ts
static fromIterable(
schema,
iter,
opts): Promise<Scannable>
```
Build a Scannable from an iterable of `RecordBatch`es. `rescannable`
defaults to `false`. Pass an explicit schema so the consumer can
validate before any batch is pulled.
`opts.rescannable: true` is honest for replayable iterables (Arrays,
Sets, or custom iterables whose `[Symbol.iterator]()` returns a fresh
iterator each call). It is rejected for one-shot iterables (generators,
async generators, or already-an-iterator inputs) because their
`[Symbol.iterator]()` returns the same exhausted object on the second
scan. For replayable sources outside this shape, use
`fromFactory(schema, () => createIter(), { rescannable: true })`.
Note: when `opts.rescannable` is `true`, the constructor calls
`[Symbol.iterator]()` once on the input to perform the structural check.
#### Parameters
* **schema**: `Schema`&lt;`any`&gt;
* **iter**: `Iterable`&lt;`RecordBatch`&lt;`any`&gt;&gt; \| `AsyncIterable`&lt;`RecordBatch`&lt;`any`&gt;&gt;
* **opts**: [`ScannableOptions`](../interfaces/ScannableOptions.md) = `{}`
#### Returns
`Promise`&lt;[`Scannable`](Scannable.md)&gt;
***
### fromRecordBatchReader()
```ts
static fromRecordBatchReader(reader, opts): Promise<Scannable>
```
Build a Scannable from an Arrow `RecordBatchReader`. A reader can only
be consumed once; `rescannable` defaults to `false`.
The reader must already be opened (via `.open()`) so its `.schema` is
populated. `RecordBatchReader.from(...)` returns an unopened reader.
`opts.rescannable: true` is rejected because `RecordBatchReader` is a
self-iterator (its `[Symbol.iterator]()` returns itself), and this
constructor does not call `reader.reset()` between scans, so a second
scan would always see an exhausted reader. For genuinely replayable
sources, use
`fromFactory(schema, () => openReader(), { rescannable: true })`,
which mints a fresh reader on each scan.
#### Parameters
* **reader**: `RecordBatchReader`&lt;`any`&gt;
* **opts**: [`ScannableOptions`](../interfaces/ScannableOptions.md) = `{}`
#### Returns
`Promise`&lt;[`Scannable`](Scannable.md)&gt;
***
### fromTable()
```ts
static fromTable(table, opts): Promise<Scannable>
```
Build a Scannable from an in-memory Arrow `Table`. Always rescannable;
the table's batches are replayed on each scan.
The table's row count is authoritative: `opts.numRows` must either be
omitted or equal to `table.numRows`. `opts.rescannable` of `false` is
rejected because in-memory Tables are always rescannable.
#### Parameters
* **table**: `Table`&lt;`any`&gt;
* **opts**: [`ScannableOptions`](../interfaces/ScannableOptions.md) = `{}`
#### Returns
`Promise`&lt;[`Scannable`](Scannable.md)&gt;

View File

@@ -0,0 +1,131 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / connectNamespace
# Function: connectNamespace()
## connectNamespace(implName, config, options)
```ts
function connectNamespace(
implName,
config,
options?): Promise<Connection>
```
Connect to a LanceDB database through a namespace.
Unlike [connect](connect.md), which routes by URI scheme (local path vs.
`db://` cloud), `connectNamespace` always returns a namespace-backed
connection. The `implName` selects the namespace implementation:
- `"dir"` — directory namespace, configured with [DirNamespaceConfig](../interfaces/DirNamespaceConfig.md).
- `"rest"` — remote REST catalog, configured with [RestNamespaceConfig](../interfaces/RestNamespaceConfig.md).
- Any other string — full module path for a custom implementation,
configured with a free-form string-keyed `properties` map.
### Parameters
* **implName**: `"dir"`
* **config**: [`DirNamespaceConfig`](../interfaces/DirNamespaceConfig.md)
* **options?**: `Partial`&lt;[`ConnectNamespaceOptions`](../interfaces/ConnectNamespaceOptions.md)&gt;
### Returns
`Promise`&lt;[`Connection`](../classes/Connection.md)&gt;
### Examples
```ts
const db = await connectNamespace("dir", { root: "/path/to/db" });
await db.createTable("users", [{ id: 1 }]);
```
```ts
const db = await connectNamespace("rest", {
uri: "https://catalog.example.com",
headers: { "x-api-key": process.env.CATALOG_KEY ?? "" },
});
```
```ts
const db = await connectNamespace("my.custom.Namespace", {
endpoint: "...",
});
```
## connectNamespace(implName, config, options)
```ts
function connectNamespace(
implName,
config,
options?): Promise<Connection>
```
Connect through the built-in REST namespace.
Configured with [RestNamespaceConfig](../interfaces/RestNamespaceConfig.md). See the function-level
documentation above for the full surface, examples, and how this
relates to [connect](connect.md).
### Parameters
* **implName**: `"rest"`
* **config**: [`RestNamespaceConfig`](../interfaces/RestNamespaceConfig.md)
* **options?**: `Partial`&lt;[`ConnectNamespaceOptions`](../interfaces/ConnectNamespaceOptions.md)&gt;
### Returns
`Promise`&lt;[`Connection`](../classes/Connection.md)&gt;
### Example
```ts
const db = await connectNamespace("rest", {
uri: "https://catalog.example.com",
headers: { "x-api-key": process.env.CATALOG_KEY ?? "" },
});
```
## connectNamespace(implName, properties, options)
```ts
function connectNamespace(
implName,
properties,
options?): Promise<Connection>
```
Connect through a custom namespace implementation by full module path,
configured with a free-form string-keyed `properties` map. Use the
typed overloads above for the built-in `"dir"` and `"rest"` impls.
See the function-level documentation above for examples and how this
relates to [connect](connect.md).
### Parameters
* **implName**: `string`
* **properties**: `Record`&lt;`string`, `string`&gt;
* **options?**: `Partial`&lt;[`ConnectNamespaceOptions`](../interfaces/ConnectNamespaceOptions.md)&gt;
### Returns
`Promise`&lt;[`Connection`](../classes/Connection.md)&gt;
### Example
```ts
const db = await connectNamespace("my.custom.Namespace", {
endpoint: "...",
});
```

View File

@@ -32,6 +32,7 @@
- [PhraseQuery](classes/PhraseQuery.md)
- [Query](classes/Query.md)
- [QueryBase](classes/QueryBase.md)
- [Scannable](classes/Scannable.md)
- [Session](classes/Session.md)
- [StaticHeaderProvider](classes/StaticHeaderProvider.md)
- [Table](classes/Table.md)
@@ -51,10 +52,17 @@
- [ClientConfig](interfaces/ClientConfig.md)
- [ColumnAlteration](interfaces/ColumnAlteration.md)
- [CompactionStats](interfaces/CompactionStats.md)
- [ConnectNamespaceOptions](interfaces/ConnectNamespaceOptions.md)
- [ConnectionOptions](interfaces/ConnectionOptions.md)
- [CreateNamespaceOptions](interfaces/CreateNamespaceOptions.md)
- [CreateNamespaceResponse](interfaces/CreateNamespaceResponse.md)
- [CreateTableOptions](interfaces/CreateTableOptions.md)
- [DeleteResult](interfaces/DeleteResult.md)
- [DescribeNamespaceResponse](interfaces/DescribeNamespaceResponse.md)
- [DirNamespaceConfig](interfaces/DirNamespaceConfig.md)
- [DropColumnsResult](interfaces/DropColumnsResult.md)
- [DropNamespaceOptions](interfaces/DropNamespaceOptions.md)
- [DropNamespaceResponse](interfaces/DropNamespaceResponse.md)
- [ExecutableQuery](interfaces/ExecutableQuery.md)
- [FragmentStatistics](interfaces/FragmentStatistics.md)
- [FragmentSummaryStats](interfaces/FragmentSummaryStats.md)
@@ -69,13 +77,17 @@
- [IvfFlatOptions](interfaces/IvfFlatOptions.md)
- [IvfPqOptions](interfaces/IvfPqOptions.md)
- [IvfRqOptions](interfaces/IvfRqOptions.md)
- [ListNamespacesOptions](interfaces/ListNamespacesOptions.md)
- [ListNamespacesResponse](interfaces/ListNamespacesResponse.md)
- [MergeResult](interfaces/MergeResult.md)
- [OpenTableOptions](interfaces/OpenTableOptions.md)
- [OptimizeOptions](interfaces/OptimizeOptions.md)
- [OptimizeStats](interfaces/OptimizeStats.md)
- [QueryExecutionOptions](interfaces/QueryExecutionOptions.md)
- [RemovalStats](interfaces/RemovalStats.md)
- [RestNamespaceConfig](interfaces/RestNamespaceConfig.md)
- [RetryConfig](interfaces/RetryConfig.md)
- [ScannableOptions](interfaces/ScannableOptions.md)
- [ShuffleOptions](interfaces/ShuffleOptions.md)
- [SplitCalculatedOptions](interfaces/SplitCalculatedOptions.md)
- [SplitHashOptions](interfaces/SplitHashOptions.md)
@@ -107,6 +119,7 @@
- [RecordBatchIterator](functions/RecordBatchIterator.md)
- [connect](functions/connect.md)
- [connectNamespace](functions/connectNamespace.md)
- [makeArrowTable](functions/makeArrowTable.md)
- [packBits](functions/packBits.md)
- [permutationBuilder](functions/permutationBuilder.md)

View File

@@ -0,0 +1,54 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / ConnectNamespaceOptions
# Interface: ConnectNamespaceOptions
## Properties
### namespaceClientProperties?
```ts
optional namespaceClientProperties: Record<string, string>;
```
Extra properties for the backing namespace client.
***
### readConsistencyInterval?
```ts
optional readConsistencyInterval: number;
```
The interval, in seconds, at which to check for updates to the table
from other processes. If None, then consistency is not checked. For
performance reasons, this is the default. For strong consistency, set
this to zero seconds. Then every read will check for updates from other
processes. As a compromise, you can set this to a non-zero value for
eventual consistency.
***
### session?
```ts
optional session: Session;
```
The session to use for this connection. Holds shared caches and other
session-specific state.
***
### storageOptions?
```ts
optional storageOptions: Record<string, string>;
```
Configuration for object storage. The available options are described
at https://docs.lancedb.com/storage/

View File

@@ -0,0 +1,27 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / CreateNamespaceOptions
# Interface: CreateNamespaceOptions
## Properties
### mode?
```ts
optional mode: "overwrite" | "create" | "exist_ok";
```
Creation mode.
***
### properties?
```ts
optional properties: Record<string, string>;
```
Properties to set on the new namespace.

View File

@@ -0,0 +1,23 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / CreateNamespaceResponse
# Interface: CreateNamespaceResponse
## Properties
### properties?
```ts
optional properties: Record<string, string>;
```
***
### transactionId?
```ts
optional transactionId: string;
```

View File

@@ -0,0 +1,15 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DescribeNamespaceResponse
# Interface: DescribeNamespaceResponse
## Properties
### properties?
```ts
optional properties: Record<string, string>;
```

View File

@@ -0,0 +1,47 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DirNamespaceConfig
# Interface: DirNamespaceConfig
Configuration for the built-in directory namespace (`"dir"`).
The directory namespace stores tables under a single root path (local
filesystem or object storage URI). See
[https://docs.lancedb.com/namespaces](https://docs.lancedb.com/namespaces) for the documented surface;
less-common knobs live under [DirNamespaceConfig.extraProperties](DirNamespaceConfig.md#extraproperties).
## Properties
### extraProperties?
```ts
optional extraProperties: Record<string, string>;
```
Additional raw properties passed verbatim to the namespace
implementation (e.g. `storage.*`, `credential_vendor.*`). Typed
fields above take precedence on key collision.
***
### manifestEnabled?
```ts
optional manifestEnabled: boolean;
```
Whether to maintain a namespace manifest at the root. Required for
child namespaces. Defaults to true on the impl side.
***
### root
```ts
root: string;
```
Root path or URI containing the LanceDB tables.

View File

@@ -0,0 +1,27 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DropNamespaceOptions
# Interface: DropNamespaceOptions
## Properties
### behavior?
```ts
optional behavior: "restrict" | "cascade";
```
Refuse to drop if non-empty (restrict) or drop recursively (cascade).
***
### mode?
```ts
optional mode: "fail" | "skip";
```
Whether to skip if the namespace doesn't exist, or fail.

View File

@@ -0,0 +1,23 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / DropNamespaceResponse
# Interface: DropNamespaceResponse
## Properties
### properties?
```ts
optional properties: Record<string, string>;
```
***
### transactionId?
```ts
optional transactionId: string[];
```

View File

@@ -0,0 +1,27 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / ListNamespacesOptions
# Interface: ListNamespacesOptions
## Properties
### limit?
```ts
optional limit: number;
```
An optional limit to the number of results to return.
***
### pageToken?
```ts
optional pageToken: string;
```
Token from a previous response for pagination.

View File

@@ -0,0 +1,23 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / ListNamespacesResponse
# Interface: ListNamespacesResponse
## Properties
### namespaces
```ts
namespaces: string[];
```
***
### pageToken?
```ts
optional pageToken: string;
```

View File

@@ -0,0 +1,47 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / RestNamespaceConfig
# Interface: RestNamespaceConfig
Configuration for the built-in REST namespace (`"rest"`).
The REST namespace talks to a remote catalog server over HTTP. See
[https://docs.lancedb.com/namespaces](https://docs.lancedb.com/namespaces) for the documented surface;
less-common knobs (TLS, metrics) live under
[RestNamespaceConfig.extraProperties](RestNamespaceConfig.md#extraproperties).
## Properties
### extraProperties?
```ts
optional extraProperties: Record<string, string>;
```
Additional raw properties passed verbatim to the namespace
implementation (e.g. `tls.*`, `ops_metrics_enabled`, `delimiter`).
Typed fields above take precedence on key collision.
***
### headers?
```ts
optional headers: Record<string, string>;
```
HTTP headers forwarded with each request. Keys are passed through
as-is (e.g. `"x-api-key"`, `"Authorization"`).
***
### uri
```ts
uri: string;
```
Catalog endpoint URL.

View File

@@ -0,0 +1,29 @@
[**@lancedb/lancedb**](../README.md) • **Docs**
***
[@lancedb/lancedb](../globals.md) / ScannableOptions
# Interface: ScannableOptions
## Properties
### numRows?
```ts
optional numRows: number;
```
Hint about the number of rows. Not validated against the stream.
***
### rescannable?
```ts
optional rescannable: boolean;
```
Whether the source can be scanned more than once. Defaults to `true` for
`fromTable` / `fromFactory` and `false` for `fromIterable` /
`fromRecordBatchReader`.

View File

@@ -8,7 +8,7 @@
<parent>
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.29.0-final.0</version>
<version>0.28.0-beta.11</version>
<relativePath>../pom.xml</relativePath>
</parent>

View File

@@ -6,7 +6,7 @@
<groupId>com.lancedb</groupId>
<artifactId>lancedb-parent</artifactId>
<version>0.29.0-final.0</version>
<version>0.28.0-beta.11</version>
<packaging>pom</packaging>
<name>${project.artifactId}</name>
<description>LanceDB Java SDK Parent POM</description>
@@ -28,7 +28,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<arrow.version>15.0.0</arrow.version>
<lance-core.version>6.0.0</lance-core.version>
<lance-core.version>7.0.0-beta.9</lance-core.version>
<spotless.skip>false</spotless.skip>
<spotless.version>2.30.0</spotless.version>
<spotless.java.googlejavaformat.version>1.7</spotless.java.googlejavaformat.version>

View File

@@ -3,11 +3,11 @@ The core Rust library is in the `../rust/lancedb` directory, the rust binding
code is in the `src/` directory and the typescript bindings are in
the `lancedb/` directory.
Whenever you change the Rust code, you will need to recompile: `npm run build`.
Whenever you change the Rust code, you will need to recompile: `pnpm build`.
Common commands:
* Build: `npm run build`
* Lint: `npm run lint`
* Fix lints: `npm run lint-fix`
* Test: `npm test`
* Run single test file: `npm test __test__/arrow.test.ts`
* Build: `pnpm build`
* Lint: `pnpm lint`
* Fix lints: `pnpm lint-fix`
* Test: `pnpm test`
* Run single test file: `pnpm test __test__/arrow.test.ts`

View File

@@ -12,20 +12,22 @@ Typescript.
* `src/`: Rust bindings source code
* `lancedb/`: Typescript package source code
* `__test__/`: Unit tests
* `examples/`: An npm package with the examples shown in the documentation
* `examples/`: A pnpm package with the examples shown in the documentation
## Development environment
To set up your development environment, you will need to install the following:
1. Node.js 14 or later
2. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
3. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
1. Node.js 22 or later (required by pnpm 11)
2. [pnpm](https://pnpm.io/installation) 11 or later (or run via `corepack enable`,
which uses the `packageManager` field in `package.json`)
3. Rust's package manager, Cargo. Use [rustup](https://rustup.rs/) to install.
4. [protoc](https://grpc.io/docs/protoc-installation/) (Protocol Buffers compiler)
Initial setup:
```shell
npm install
pnpm install
```
### Commit Hooks
@@ -39,38 +41,38 @@ pre-commit install
## Development
Most common development commands can be run using the npm scripts.
Most common development commands can be run using the pnpm scripts.
Build the package
```shell
npm install
npm run build
pnpm install
pnpm build
```
Lint:
```shell
npm run lint
pnpm lint
```
Format and fix lints:
```shell
npm run lint-fix
pnpm lint-fix
```
Run tests:
```shell
npm test
pnpm test
```
To run a single test:
```shell
# Single file: table.test.ts
npm test -- table.test.ts
pnpm test -- table.test.ts
# Single test: 'merge insert' in table.test.ts
npm test -- table.test.ts --testNamePattern=merge\ insert
pnpm test -- table.test.ts --testNamePattern=merge\ insert
```

View File

@@ -1,7 +1,7 @@
[package]
name = "lancedb-nodejs"
edition.workspace = true
version = "0.29.0"
version = "0.28.0-beta.11"
publish = false
license.workspace = true
description.workspace = true
@@ -22,6 +22,7 @@ arrow-schema.workspace = true
env_logger.workspace = true
futures.workspace = true
lancedb = { path = "../rust/lancedb", default-features = false }
lance-namespace.workspace = true
napi = { version = "3.8.3", default-features = false, features = [
"napi9",
"async"

View File

@@ -4,7 +4,7 @@
import { readdirSync } from "fs";
import { Field, Float64, Schema } from "apache-arrow";
import * as tmp from "tmp";
import { Connection, Table, connect } from "../lancedb";
import { Connection, Table, connect, connectNamespace } from "../lancedb";
import { LocalTable } from "../lancedb/table";
describe("when connecting", () => {
@@ -81,6 +81,16 @@ describe("given a connection", () => {
await db.createTable("test4", [{ id: 1 }, { id: 2 }]);
});
it("should expose renameTable and reject on OSS listing DB", async () => {
await db.createTable("old_name", [{ id: 1 }]);
await expect(db.renameTable("old_name", "new_name")).rejects.toThrow(
"rename_table is not supported in LanceDB OSS",
);
await expect(db.tableNames()).resolves.toEqual(["old_name"]);
});
it("should fail if creating table twice, unless overwrite is true", async () => {
let tbl = await db.createTable("test", [{ id: 1 }, { id: 2 }]);
await expect(tbl.countRows()).resolves.toBe(2);
@@ -306,3 +316,186 @@ describe("clone table functionality", () => {
).rejects.toThrow("Deep clone is not yet implemented");
});
});
describe("namespaces", () => {
let tmpDir: tmp.DirResult;
let db: Connection;
beforeEach(async () => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
// The local DirectoryNamespace backend only supports child namespaces
// when manifest mode is enabled (see lance-namespace-impls/src/dir.rs).
db = await connect(tmpDir.name, {
// biome-ignore lint/style/useNamingConvention: opaque backend property key, must match Rust
namespaceClientProperties: { manifest_enabled: "true" },
});
});
afterEach(() => tmpDir.removeCallback());
it("should create and describe a namespace", async () => {
await db.createNamespace(["myns"]);
const desc = await db.describeNamespace(["myns"]);
expect(desc).toBeDefined();
});
it("should list namespaces created at the root", async () => {
await db.createNamespace(["alpha"]);
await db.createNamespace(["beta"]);
const list = await db.listNamespaces();
expect(list.namespaces).toEqual(expect.arrayContaining(["alpha", "beta"]));
});
it("should list child namespaces under a parent", async () => {
await db.createNamespace(["parent"]);
await db.createNamespace(["parent", "child"]);
const list = await db.listNamespaces(["parent"]);
expect(list.namespaces).toContain("child");
});
it("should drop a namespace", async () => {
await db.createNamespace(["ephemeral"]);
await db.dropNamespace(["ephemeral"]);
const list = await db.listNamespaces();
expect(list.namespaces).not.toContain("ephemeral");
});
it("should raise an error on any namespace op after close", async () => {
await db.close();
await expect(db.describeNamespace(["foo"])).rejects.toThrow(
"Connection is closed",
);
await expect(db.listNamespaces()).rejects.toThrow("Connection is closed");
await expect(db.createNamespace(["foo"])).rejects.toThrow(
"Connection is closed",
);
await expect(db.dropNamespace(["foo"])).rejects.toThrow(
"Connection is closed",
);
});
it("should raise an understandable error when describing a non-existent namespace", async () => {
await expect(db.describeNamespace(["does-not-exist"])).rejects.toThrow(
/not found/i,
);
});
it("should raise an error when creating a namespace that already exists", async () => {
await db.createNamespace(["dup"]);
await expect(db.createNamespace(["dup"])).rejects.toThrow();
});
it("should reject an unrecognized createNamespace mode with a clear error", async () => {
await expect(
// biome-ignore lint/suspicious/noExplicitAny: deliberately bypass TS to test runtime validation
db.createNamespace(["x"], { mode: "frobnicate" as any }),
).rejects.toThrow(/Invalid mode 'frobnicate'/);
});
it("should reject an unrecognized dropNamespace mode with a clear error", async () => {
await db.createNamespace(["x"]);
await expect(
// biome-ignore lint/suspicious/noExplicitAny: deliberately bypass TS to test runtime validation
db.dropNamespace(["x"], { mode: "frobnicate" as any }),
).rejects.toThrow(/Invalid mode 'frobnicate'/);
});
it("should reject an unrecognized dropNamespace behavior with a clear error", async () => {
await db.createNamespace(["x"]);
await expect(
// biome-ignore lint/suspicious/noExplicitAny: deliberately bypass TS to test runtime validation
db.dropNamespace(["x"], { behavior: "frobnicate" as any }),
).rejects.toThrow(/Invalid behavior 'frobnicate'/);
});
});
describe("connectNamespace", () => {
let tmpDir: tmp.DirResult;
beforeEach(() => {
tmpDir = tmp.dirSync({ unsafeCleanup: true });
});
afterEach(() => tmpDir.removeCallback());
it("connects via the dir implementation and supports table ops", async () => {
const db = await connectNamespace("dir", { root: tmpDir.name });
await db.createTable("users", [{ id: 1 }, { id: 2 }]);
await expect(db.tableNames()).resolves.toContain("users");
});
it("throws a clear error when implName is empty", async () => {
await expect(connectNamespace("", {})).rejects.toThrow(
"implName must be a non-empty string",
);
});
it("throws when the namespace implementation is unknown", async () => {
await expect(connectNamespace("not-a-real-impl", {})).rejects.toThrow();
});
it("passes storage options through to the namespace", async () => {
const db = await connectNamespace(
"dir",
{ root: tmpDir.name },
{ storageOptions: { newTableDataStorageVersion: "stable" } },
);
await db.createTable("plumbing", [{ id: 1 }]);
await expect(db.tableNames()).resolves.toContain("plumbing");
});
it("supports child namespaces when manifestEnabled is true on the dir config", async () => {
const writer = await connectNamespace("dir", {
root: tmpDir.name,
manifestEnabled: true,
});
await writer.createNamespace(["analytics"]);
await writer.createTable("orders", [{ id: 1 }, { id: 2 }], ["analytics"]);
await writer.close();
const reader = await connectNamespace("dir", {
root: tmpDir.name,
manifestEnabled: true,
});
await expect(reader.tableNames(["analytics"])).resolves.toContain("orders");
const orders = await reader.openTable("orders", ["analytics"]);
await expect(orders.countRows()).resolves.toBe(2);
});
it("merges extraProperties into the dir config and is overridden by typed fields", async () => {
// Two observable assertions:
// - Typed `root` overrides extraProperties.root: createTable would fail
// under the bogus path if the override didn't happen.
// - extraProperties.manifest_enabled="false" is honored end-to-end. Child
// namespaces require manifest mode (default true), so explicitly
// disabling it via extraProperties must make createNamespace reject. If
// extraProperties pass-through were silently broken, the default would
// let createNamespace succeed.
const db = await connectNamespace("dir", {
root: tmpDir.name,
extraProperties: {
root: "/should/be/overridden",
// biome-ignore lint/style/useNamingConvention: backend property key
manifest_enabled: "false",
},
});
await db.createTable("base", [{ id: 1 }]);
await expect(db.tableNames()).resolves.toContain("base");
await expect(db.createNamespace(["analytics"])).rejects.toThrow();
});
it("flows unknown top-level keys through when implName is dynamic (no silent drop)", async () => {
// Routes via the third overload because `impl` is `string`, not the
// literal `"dir"`. The dispatcher still notices the runtime value is
// "dir", but unknown keys like `manifest_enabled` must not be silently
// dropped during the conversion.
//
// Asserting a *negative* outcome (manifest disabled -> createNamespace
// rejects) is required for observability, since the backend default for
// `manifest_enabled` is true.
const impl: string = "dir";
const db = await connectNamespace(impl, {
root: tmpDir.name,
// biome-ignore lint/style/useNamingConvention: backend property key
manifest_enabled: "false",
});
await expect(db.createNamespace(["mixed"])).rejects.toThrow();
});
});

View File

@@ -0,0 +1,438 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
import {
Field,
Float16,
Int32,
type RecordBatch,
RecordBatchReader,
Schema,
tableToIPC,
} from "apache-arrow";
import { makeArrowTable, makeEmptyTable } from "../lancedb/arrow";
import { Scannable } from "../lancedb/scannable";
function makeTable() {
return makeArrowTable(
[
{ id: 1, name: "a" },
{ id: 2, name: "b" },
{ id: 3, name: "c" },
],
{ vectorColumns: {} },
);
}
async function makeReader(): Promise<RecordBatchReader> {
// `RecordBatchReader.from()` returns an unopened reader; `.schema` is only
// populated after `.open()`. Opening sync readers is synchronous.
const reader = RecordBatchReader.from(tableToIPC(makeTable()));
return reader.open() as RecordBatchReader;
}
describe("Scannable", () => {
describe("fromTable", () => {
test("reflects schema, numRows, and defaults rescannable=true", async () => {
const table = makeTable();
const scannable = await Scannable.fromTable(table);
expect(scannable.schema).toBe(table.schema);
expect(scannable.numRows).toBe(table.numRows);
expect(scannable.rescannable).toBe(true);
});
test("throws when opts.numRows does not match table.numRows", async () => {
await expect(
Scannable.fromTable(makeTable(), { numRows: 42 }),
).rejects.toThrow(/does not match table\.numRows/);
});
test("throws when opts.rescannable is false", async () => {
await expect(
Scannable.fromTable(makeTable(), { rescannable: false }),
).rejects.toThrow(/always rescannable/);
});
});
describe("fromRecordBatchReader", () => {
test("reflects schema and defaults numRows=null, rescannable=false", async () => {
const reader = await makeReader();
const scannable = await Scannable.fromRecordBatchReader(reader);
expect(scannable.schema).toBe(reader.schema);
expect(scannable.numRows).toBeNull();
expect(scannable.rescannable).toBe(false);
});
test("honors numRows override", async () => {
const scannable = await Scannable.fromRecordBatchReader(
await makeReader(),
{ numRows: 3 },
);
expect(scannable.numRows).toBe(3);
expect(scannable.rescannable).toBe(false);
});
test("rescannable: false explicit does not throw", async () => {
const reader = await makeReader();
const scannable = await Scannable.fromRecordBatchReader(reader, {
rescannable: false,
});
expect(scannable.rescannable).toBe(false);
});
test("throws when opts.rescannable is true", async () => {
const reader = await makeReader();
await expect(
Scannable.fromRecordBatchReader(reader, { rescannable: true }),
).rejects.toThrow(/does not accept rescannable/);
});
test("throws when opts.rescannable is true even alongside numRows", async () => {
const reader = await makeReader();
await expect(
Scannable.fromRecordBatchReader(reader, {
numRows: 3,
rescannable: true,
}),
).rejects.toThrow(/does not accept rescannable/);
});
});
describe("fromIterable", () => {
test("accepts a sync iterable of batches", async () => {
const table = makeTable();
const scannable = await Scannable.fromIterable(
table.schema,
table.batches,
);
expect(scannable.schema).toBe(table.schema);
expect(scannable.numRows).toBeNull();
expect(scannable.rescannable).toBe(false);
});
test("accepts an async iterable of batches", async () => {
const table = makeTable();
async function* generator(): AsyncGenerator<RecordBatch> {
for (const batch of table.batches) {
yield batch;
}
}
const scannable = await Scannable.fromIterable(table.schema, generator());
expect(scannable.schema).toBe(table.schema);
expect(scannable.rescannable).toBe(false);
});
describe("rescannable: true detection", () => {
// Replayable inputs: [Symbol.iterator]() / [Symbol.asyncIterator]()
// returns a fresh iterator each call. Must NOT throw.
test("Array passes (fresh ArrayIterator each call)", async () => {
const table = makeTable();
const scannable = await Scannable.fromIterable(
table.schema,
table.batches,
{ rescannable: true },
);
expect(scannable.rescannable).toBe(true);
});
test("Set passes (fresh SetIterator each call)", async () => {
const table = makeTable();
const set = new Set<RecordBatch>(table.batches);
const scannable = await Scannable.fromIterable(table.schema, set, {
rescannable: true,
});
expect(scannable.rescannable).toBe(true);
});
test("custom Iterable returning a fresh iterator passes", async () => {
const table = makeTable();
const replayable: Iterable<RecordBatch> = {
[Symbol.iterator]() {
return table.batches[Symbol.iterator]();
},
};
const scannable = await Scannable.fromIterable(
table.schema,
replayable,
{ rescannable: true },
);
expect(scannable.rescannable).toBe(true);
});
test("object with generator method passes (fresh generator each call)", async () => {
const table = makeTable();
const replayable: Iterable<RecordBatch> = {
*[Symbol.iterator]() {
for (const batch of table.batches) yield batch;
},
};
const scannable = await Scannable.fromIterable(
table.schema,
replayable,
{ rescannable: true },
);
expect(scannable.rescannable).toBe(true);
});
test("empty Array passes (replayable degenerate case)", async () => {
const schema = makeTable().schema;
const scannable = await Scannable.fromIterable(
schema,
[] as RecordBatch[],
{ rescannable: true },
);
expect(scannable.rescannable).toBe(true);
});
// One-shot inputs: [Symbol.iterator]() / [Symbol.asyncIterator]()
// returns the same object, or the input is already-an-iterator.
// Must throw with a /one-shot/ message.
test("sync generator throws", async () => {
const table = makeTable();
function* generator(): Generator<RecordBatch> {
for (const batch of table.batches) yield batch;
}
await expect(
Scannable.fromIterable(table.schema, generator(), {
rescannable: true,
}),
).rejects.toThrow(/one-shot/);
});
test("async generator throws", async () => {
const table = makeTable();
async function* generator(): AsyncGenerator<RecordBatch> {
for (const batch of table.batches) yield batch;
}
await expect(
Scannable.fromIterable(table.schema, generator(), {
rescannable: true,
}),
).rejects.toThrow(/one-shot/);
});
test("empty generator throws (one-shot degenerate case)", async () => {
const schema = makeTable().schema;
function* generator(): Generator<RecordBatch> {
// intentionally empty; yields nothing.
}
await expect(
Scannable.fromIterable(schema, generator(), { rescannable: true }),
).rejects.toThrow(/one-shot/);
});
test("custom self-iterator throws", async () => {
const table = makeTable();
const batches = table.batches;
let i = 0;
const oneShot: Iterable<RecordBatch> & Iterator<RecordBatch> = {
[Symbol.iterator]() {
return this;
},
next() {
if (i >= batches.length) {
return { done: true, value: undefined };
}
return { done: false, value: batches[i++] };
},
};
await expect(
Scannable.fromIterable(table.schema, oneShot, { rescannable: true }),
).rejects.toThrow(/one-shot/);
});
test("Array.values() (IterableIterator) throws", async () => {
const table = makeTable();
const iter = table.batches.values();
await expect(
Scannable.fromIterable(table.schema, iter, { rescannable: true }),
).rejects.toThrow(/one-shot/);
});
test("raw iterator (only `.next`) throws", async () => {
const table = makeTable();
const batches = table.batches;
let i = 0;
const rawIter = {
next(): IteratorResult<RecordBatch> {
if (i >= batches.length) {
return { done: true, value: undefined };
}
return { done: false, value: batches[i++] };
},
};
await expect(
Scannable.fromIterable(
table.schema,
rawIter as unknown as Iterable<RecordBatch>,
{ rescannable: true },
),
).rejects.toThrow(/one-shot/);
});
// Edge: null/undefined must not crash the detection helper. The
// null check belongs to `normalizeIterator` and only fires when a
// scan starts.
test("null input does not crash detection at construction", async () => {
const schema = makeTable().schema;
await expect(
Scannable.fromIterable(
schema,
null as unknown as Iterable<RecordBatch>,
{
rescannable: true,
},
),
).resolves.toBeDefined();
});
test("undefined input does not crash detection at construction", async () => {
const schema = makeTable().schema;
await expect(
Scannable.fromIterable(
schema,
undefined as unknown as Iterable<RecordBatch>,
{ rescannable: true },
),
).resolves.toBeDefined();
});
// Default (rescannable omitted) skips the check entirely, so even
// pathological inputs construct without throwing here.
test("rescannable omitted skips detection entirely (generator passes)", async () => {
const table = makeTable();
function* generator(): Generator<RecordBatch> {
for (const batch of table.batches) yield batch;
}
const scannable = await Scannable.fromIterable(
table.schema,
generator(),
);
expect(scannable.rescannable).toBe(false);
});
test("rescannable: false explicit skips detection entirely (generator passes)", async () => {
const table = makeTable();
function* generator(): Generator<RecordBatch> {
for (const batch of table.batches) yield batch;
}
const scannable = await Scannable.fromIterable(
table.schema,
generator(),
{ rescannable: false },
);
expect(scannable.rescannable).toBe(false);
});
});
});
describe("fromFactory", () => {
test("defaults rescannable=true and does not invoke the factory eagerly", async () => {
const table = makeTable();
const factory = jest.fn(() => table.batches);
const scannable = await Scannable.fromFactory(table.schema, factory);
expect(scannable.schema).toBe(table.schema);
expect(scannable.rescannable).toBe(true);
expect(factory).not.toHaveBeenCalled();
});
test("honors rescannable and numRows overrides", async () => {
const table = makeTable();
const scannable = await Scannable.fromFactory(
table.schema,
() => table.batches,
{ numRows: 7, rescannable: false },
);
expect(scannable.numRows).toBe(7);
expect(scannable.rescannable).toBe(false);
});
});
describe("validation", () => {
test("throws when numRows is negative", async () => {
await expect(
Scannable.fromFactory(makeTable().schema, () => [], { numRows: -1 }),
).rejects.toThrow(/non-negative/);
});
test("throws when numRows is not an integer", async () => {
await expect(
Scannable.fromFactory(makeTable().schema, () => [], { numRows: 3.5 }),
).rejects.toThrow(/integer/);
});
});
describe("native handle", () => {
test("exposes a native handle via inner", async () => {
const scannable = await Scannable.fromTable(makeTable());
expect(scannable.inner).toBeDefined();
expect(typeof scannable.inner).toBe("object");
expect(scannable.inner).not.toBeNull();
});
});
// Schema-variety construction tests. Each asserts that construction
// succeeds against a richer Arrow schema, which transitively exercises
// schema serialization and the Rust-side `ipc_file_to_schema` for types
// beyond flat primitives.
describe("schema variety", () => {
test("accepts an empty table", async () => {
const schema = new Schema([new Field("id", new Int32(), true)]);
const table = makeEmptyTable(schema);
const scannable = await Scannable.fromTable(table);
expect(scannable.numRows).toBe(0);
expect(scannable.schema).toBe(table.schema);
});
test("accepts nested struct and list columns", async () => {
const table = makeArrowTable(
[
{ id: 1, point: { x: 0, y: 0 }, tags: ["a", "b"] },
{ id: 2, point: { x: 1, y: 2 }, tags: ["c"] },
],
{ vectorColumns: {} },
);
const scannable = await Scannable.fromTable(table);
expect(scannable.schema).toBe(table.schema);
expect(scannable.numRows).toBe(2);
});
test("accepts a FixedSizeList (vector) column", async () => {
const table = makeArrowTable(
[
{ id: 1, vec: [1, 2, 3] },
{ id: 2, vec: [4, 5, 6] },
],
{ vectorColumns: { vec: { type: new Float16() } } },
);
const scannable = await Scannable.fromTable(table);
expect(scannable.schema).toBe(table.schema);
expect(scannable.numRows).toBe(2);
});
test("accepts a table with many columns", async () => {
const row: Record<string, number> = {};
for (let i = 0; i < 50; i++) row[`c${i}`] = i;
const table = makeArrowTable([row, row], { vectorColumns: {} });
const scannable = await Scannable.fromTable(table);
expect(scannable.schema.fields.length).toBe(50);
expect(scannable.numRows).toBe(2);
});
});
});

File diff suppressed because it is too large Load Diff

View File

@@ -11,16 +11,17 @@
"test": "node --experimental-vm-modules node_modules/.bin/jest --testEnvironment jest-environment-node-single-context --verbose",
"lint": "biome check *.ts && biome format *.ts",
"lint-ci": "biome ci .",
"lint-fix": "biome check --write *.ts && npm run format",
"lint-fix": "biome check --write *.ts && pnpm format",
"format": "biome format --write *.ts"
},
"author": "Lance Devs",
"license": "Apache-2.0",
"packageManager": "pnpm@11.1.1",
"dependencies": {
"@huggingface/transformers": "^3.0.2",
"@huggingface/transformers": "3.0.2",
"@lancedb/lancedb": "file:../dist",
"openai": "^4.29.2",
"sharp": "^0.33.5"
"openai": "4.29.2",
"sharp": "0.33.5"
},
"devDependencies": {
"@biomejs/biome": "^1.7.3",

3466
nodejs/examples/pnpm-lock.yaml generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,13 @@
# Block resolution of versions less than 24h old (Shai-Hulud window).
# This is the pnpm 11 default but pinned here so it's visible to
# reviewers and survives a future pnpm major flipping the default.
minimumReleaseAge: 1440
# Fail install if a transitive dep tries to run an unapproved script.
strictDepBuilds: true
allowBuilds:
'@biomejs/biome': true
onnxruntime-node: true
protobufjs: true
sharp: true

View File

@@ -1291,6 +1291,18 @@ export async function fromRecordBatchToBuffer(
return Buffer.from(await writer.toUint8Array());
}
/**
* Create a buffer containing a single record batch using the Arrow IPC Stream
* serialization. Each call produces a self-contained Stream message (schema +
* batch + EOS) suitable for incremental decode by `arrow_ipc::reader::StreamReader`.
*/
export async function fromRecordBatchToStreamBuffer(
batch: RecordBatch,
): Promise<Buffer> {
const writer = RecordBatchStreamWriter.writeAll([batch]);
return Buffer.from(await writer.toUint8Array());
}
/**
* Serialize an Arrow Table into a buffer using the Arrow IPC Stream serialization
*

View File

@@ -16,6 +16,18 @@ import {
} from "./arrow";
import { EmbeddingFunctionConfig, getRegistry } from "./embedding/registry";
import { Connection as LanceDbConnection } from "./native";
import type {
CreateNamespaceResponse,
DescribeNamespaceResponse,
DropNamespaceResponse,
ListNamespacesResponse,
} from "./native";
export type {
CreateNamespaceResponse,
DescribeNamespaceResponse,
DropNamespaceResponse,
ListNamespacesResponse,
};
import { sanitizeTable } from "./sanitize";
import { LocalTable, Table } from "./table";
@@ -110,6 +122,28 @@ export interface TableNamesOptions {
/** An optional limit to the number of results to return. */
limit?: number;
}
export interface ListNamespacesOptions {
/** Token from a previous response for pagination. */
pageToken?: string;
/** An optional limit to the number of results to return. */
limit?: number;
}
export interface CreateNamespaceOptions {
/** Creation mode. */
mode?: "create" | "exist_ok" | "overwrite";
/** Properties to set on the new namespace. */
properties?: Record<string, string>;
}
export interface DropNamespaceOptions {
/** Whether to skip if the namespace doesn't exist, or fail. */
mode?: "skip" | "fail";
/** Refuse to drop if non-empty (restrict) or drop recursively (cascade). */
behavior?: "restrict" | "cascade";
}
/**
* A LanceDB Connection that allows you to open tables and create new ones.
*
@@ -262,12 +296,81 @@ export abstract class Connection {
*/
abstract dropTable(name: string, namespacePath?: string[]): Promise<void>;
abstract renameTable(
oldName: string,
newName: string,
namespacePath?: string[],
): Promise<void>;
/**
* Drop all tables in the database.
* @param {string[]} namespacePath The namespace path to drop tables from (defaults to root namespace).
*/
abstract dropAllTables(namespacePath?: string[]): Promise<void>;
/**
* Describe a namespace, returning its properties.
*
* @param {string[]} namespacePath - The namespace path to describe, in
* parent → child order, e.g. `["analytics", "sales"]`.
* @returns {Promise<DescribeNamespaceResponse>} The namespace's properties
* (may be undefined if the namespace has none).
*/
abstract describeNamespace(
namespacePath: string[],
): Promise<DescribeNamespaceResponse>;
/**
* List the immediate child namespaces under the given parent.
*
* Results may be paginated. To retrieve subsequent pages, pass the
* `pageToken` returned by a previous call.
*
* @param {string[]} namespacePath - The parent namespace path. Defaults
* to the root namespace if omitted.
* @param {Partial<ListNamespacesOptions>} options - Pagination options
* (`pageToken`, `limit`).
* @returns {Promise<ListNamespacesResponse>} Child namespace names and
* an optional token for fetching the next page.
*/
abstract listNamespaces(
namespacePath?: string[],
options?: Partial<ListNamespacesOptions>,
): Promise<ListNamespacesResponse>;
/**
* Create a new namespace at the given path.
*
* @param {string[]} namespacePath - The namespace path to create.
* @param {Partial<CreateNamespaceOptions>} options - Creation `mode`
* ("create" | "exist_ok" | "overwrite") and optional `properties`
* to attach to the namespace.
* @returns {Promise<CreateNamespaceResponse>} The properties of the
* created namespace and an optional transaction id.
*/
abstract createNamespace(
namespacePath: string[],
options?: Partial<CreateNamespaceOptions>,
): Promise<CreateNamespaceResponse>;
/**
* Drop a namespace.
*
* Use `behavior: "cascade"` to also drop everything contained in the
* namespace (sub-namespaces and tables). The default `"restrict"`
* behavior refuses to drop a non-empty namespace.
*
* @param {string[]} namespacePath - The namespace path to drop.
* @param {Partial<DropNamespaceOptions>} options - `mode` ("skip" | "fail"
* for missing-namespace handling) and `behavior` ("restrict" | "cascade").
* @returns {Promise<DropNamespaceResponse>} Any properties returned by
* the server and an optional transaction id.
*/
abstract dropNamespace(
namespacePath: string[],
options?: Partial<DropNamespaceOptions>,
): Promise<DropNamespaceResponse>;
/**
* Clone a table from a source table.
*
@@ -512,9 +615,56 @@ export class LocalConnection extends Connection {
return this.inner.dropTable(name, namespacePath ?? []);
}
async renameTable(
oldName: string,
newName: string,
namespacePath?: string[],
): Promise<void> {
return this.inner.renameTable(oldName, newName, namespacePath ?? []);
}
async dropAllTables(namespacePath?: string[]): Promise<void> {
return this.inner.dropAllTables(namespacePath ?? []);
}
describeNamespace(
namespacePath: string[],
): Promise<DescribeNamespaceResponse> {
return this.inner.describeNamespace(namespacePath);
}
listNamespaces(
namespacePath?: string[],
options?: Partial<ListNamespacesOptions>,
): Promise<ListNamespacesResponse> {
return this.inner.listNamespaces(
namespacePath ?? [],
options?.pageToken,
options?.limit,
);
}
createNamespace(
namespacePath: string[],
options?: Partial<CreateNamespaceOptions>,
): Promise<CreateNamespaceResponse> {
return this.inner.createNamespace(
namespacePath,
options?.mode,
options?.properties,
);
}
dropNamespace(
namespacePath: string[],
options?: Partial<DropNamespaceOptions>,
): Promise<DropNamespaceResponse> {
return this.inner.dropNamespace(
namespacePath,
options?.mode,
options?.behavior,
);
}
}
/**

View File

@@ -8,6 +8,7 @@ import {
} from "./connection";
import {
ConnectNamespaceOptions,
ConnectionOptions,
Connection as LanceDbConnection,
JsHeaderProvider as NativeJsHeaderProvider,
@@ -22,6 +23,7 @@ export { JsHeaderProvider as NativeJsHeaderProvider } from "./native.js";
export {
AddColumnsSql,
ConnectionOptions,
ConnectNamespaceOptions,
IndexStatistics,
IndexConfig,
ClientConfig,
@@ -62,6 +64,13 @@ export {
CreateTableOptions,
TableNamesOptions,
OpenTableOptions,
ListNamespacesOptions,
CreateNamespaceOptions,
DropNamespaceOptions,
ListNamespacesResponse,
CreateNamespaceResponse,
DropNamespaceResponse,
DescribeNamespaceResponse,
} from "./connection";
export { Session } from "./native.js";
@@ -117,6 +126,7 @@ export { MergeInsertBuilder, WriteExecutionOptions } from "./merge";
export * as embedding from "./embedding";
export { permutationBuilder, PermutationBuilder } from "./permutation";
export { Scannable, ScannableOptions } from "./scannable";
export * as rerankers from "./rerankers";
export {
SchemaLike,
@@ -293,3 +303,197 @@ export async function connect(
);
return new LocalConnection(nativeConn);
}
/**
* Configuration for the built-in directory namespace (`"dir"`).
*
* The directory namespace stores tables under a single root path (local
* filesystem or object storage URI). See
* {@link https://docs.lancedb.com/namespaces} for the documented surface;
* less-common knobs live under {@link DirNamespaceConfig.extraProperties}.
*/
export interface DirNamespaceConfig {
/** Root path or URI containing the LanceDB tables. */
root: string;
/**
* Whether to maintain a namespace manifest at the root. Required for
* child namespaces. Defaults to true on the impl side.
*/
manifestEnabled?: boolean;
/**
* Additional raw properties passed verbatim to the namespace
* implementation (e.g. `storage.*`, `credential_vendor.*`). Typed
* fields above take precedence on key collision.
*/
extraProperties?: Record<string, string>;
}
/**
* Configuration for the built-in REST namespace (`"rest"`).
*
* The REST namespace talks to a remote catalog server over HTTP. See
* {@link https://docs.lancedb.com/namespaces} for the documented surface;
* less-common knobs (TLS, metrics) live under
* {@link RestNamespaceConfig.extraProperties}.
*/
export interface RestNamespaceConfig {
/** Catalog endpoint URL. */
uri: string;
/**
* HTTP headers forwarded with each request. Keys are passed through
* as-is (e.g. `"x-api-key"`, `"Authorization"`).
*/
headers?: Record<string, string>;
/**
* Additional raw properties passed verbatim to the namespace
* implementation (e.g. `tls.*`, `ops_metrics_enabled`, `delimiter`).
* Typed fields above take precedence on key collision.
*/
extraProperties?: Record<string, string>;
}
function dirConfigToProperties(
config: DirNamespaceConfig,
): Record<string, string> {
// Spread the whole input so that unknown keys (e.g. a raw `manifest_enabled`
// passed via the dynamic-impl path) flow through instead of being dropped.
// Typed transformations layer on top.
const { manifestEnabled, extraProperties, ...rest } = config;
const properties: Record<string, string> = {
...(extraProperties ?? {}),
...(rest as Record<string, string>),
};
if (manifestEnabled !== undefined) {
properties.manifest_enabled = String(manifestEnabled);
}
return properties;
}
function restConfigToProperties(
config: RestNamespaceConfig,
): Record<string, string> {
const { headers, extraProperties, ...rest } = config;
const properties: Record<string, string> = {
...(extraProperties ?? {}),
...(rest as Record<string, string>),
};
if (headers) {
for (const [name, value] of Object.entries(headers)) {
properties[`headers.${name}`] = value;
}
}
return properties;
}
/**
* Connect to a LanceDB database through a namespace.
*
* Unlike {@link connect}, which routes by URI scheme (local path vs.
* `db://` cloud), `connectNamespace` always returns a namespace-backed
* connection. The `implName` selects the namespace implementation:
*
* - `"dir"` — directory namespace, configured with {@link DirNamespaceConfig}.
* - `"rest"` — remote REST catalog, configured with {@link RestNamespaceConfig}.
* - Any other string — full module path for a custom implementation,
* configured with a free-form string-keyed `properties` map.
*
* @example Typed dir namespace
* ```ts
* const db = await connectNamespace("dir", { root: "/path/to/db" });
* await db.createTable("users", [{ id: 1 }]);
* ```
*
* @example Typed REST namespace with auth headers
* ```ts
* const db = await connectNamespace("rest", {
* uri: "https://catalog.example.com",
* headers: { "x-api-key": process.env.CATALOG_KEY ?? "" },
* });
* ```
*
* @example Custom implementation with raw properties
* ```ts
* const db = await connectNamespace("my.custom.Namespace", {
* endpoint: "...",
* });
* ```
*/
export function connectNamespace(
implName: "dir",
config: DirNamespaceConfig,
options?: Partial<ConnectNamespaceOptions>,
): Promise<Connection>;
/**
* Connect through the built-in REST namespace.
*
* Configured with {@link RestNamespaceConfig}. See the function-level
* documentation above for the full surface, examples, and how this
* relates to {@link connect}.
*
* @example
* ```ts
* const db = await connectNamespace("rest", {
* uri: "https://catalog.example.com",
* headers: { "x-api-key": process.env.CATALOG_KEY ?? "" },
* });
* ```
*/
export function connectNamespace(
implName: "rest",
config: RestNamespaceConfig,
options?: Partial<ConnectNamespaceOptions>,
): Promise<Connection>;
/**
* Connect through a custom namespace implementation by full module path,
* configured with a free-form string-keyed `properties` map. Use the
* typed overloads above for the built-in `"dir"` and `"rest"` impls.
*
* See the function-level documentation above for examples and how this
* relates to {@link connect}.
*
* @example
* ```ts
* const db = await connectNamespace("my.custom.Namespace", {
* endpoint: "...",
* });
* ```
*/
export function connectNamespace(
implName: string,
properties: Record<string, string>,
options?: Partial<ConnectNamespaceOptions>,
): Promise<Connection>;
export async function connectNamespace(
implName: string,
configOrProperties:
| DirNamespaceConfig
| RestNamespaceConfig
| Record<string, string>,
options?: Partial<ConnectNamespaceOptions>,
): Promise<Connection> {
let properties: Record<string, string>;
if (implName === "dir") {
properties = dirConfigToProperties(
configOrProperties as DirNamespaceConfig,
);
} else if (implName === "rest") {
properties = restConfigToProperties(
configOrProperties as RestNamespaceConfig,
);
} else {
properties = configOrProperties as Record<string, string>;
}
const finalOptions: ConnectNamespaceOptions = (options ??
{}) as ConnectNamespaceOptions;
finalOptions.storageOptions = cleanseStorageOptions(
finalOptions.storageOptions,
);
const nativeConn = await LanceDbConnection.newWithNamespace(
implName,
properties,
finalOptions,
);
return new LocalConnection(nativeConn);
}

274
nodejs/lancedb/scannable.ts Normal file
View File

@@ -0,0 +1,274 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
import {
Table as ArrowTable,
RecordBatch,
RecordBatchReader,
Schema,
} from "apache-arrow";
import {
fromRecordBatchToStreamBuffer,
fromTableToBuffer,
makeEmptyTable,
} from "./arrow";
import { NapiScannable } from "./native.js";
export interface ScannableOptions {
/** Hint about the number of rows. Not validated against the stream. */
numRows?: number;
/**
* Whether the source can be scanned more than once. Defaults to `true` for
* `fromTable` / `fromFactory` and `false` for `fromIterable` /
* `fromRecordBatchReader`.
*/
rescannable?: boolean;
}
/**
* A data source that can be scanned as a stream of Arrow `RecordBatch`es.
*
* `Scannable` wraps the schema + optional row count + rescannable flag and
* a callback that yields batches one at a time. It is passed to consumers
* (e.g. `Table.add`, `createTable`, `mergeInsert` — follow-up work) that
* need to pull data without materializing the full dataset in JS memory.
*
* Batches cross the JS↔Rust boundary as Arrow IPC Stream messages; a fresh
* writer serializes each batch, and the Rust side decodes it with
* `arrow_ipc::reader::StreamReader`. One batch is in flight at a time.
*/
export class Scannable {
readonly schema: Schema;
readonly numRows: number | null;
readonly rescannable: boolean;
/** @hidden */
private readonly native: NapiScannable;
private constructor(
native: NapiScannable,
schema: Schema,
numRows: number | null,
rescannable: boolean,
) {
this.native = native;
this.schema = schema;
this.numRows = numRows;
this.rescannable = rescannable;
}
/** @hidden Access the native handle for passing through to Rust consumers. */
get inner(): NapiScannable {
return this.native;
}
/**
* Build a Scannable from an explicit schema and a factory that returns a
* fresh batch iterator on each call.
*
* The factory is invoked once per scan. Each iterator yields
* `RecordBatch`es matching the declared schema. Use this when you need
* direct control over the pull loop — for example, to wrap a streaming
* source whose batches are produced lazily.
*
* @param schema - The Arrow schema of the produced batches.
* @param factory - Called at the start of each scan to produce a batch
* iterator. Must be idempotent when `rescannable` is true.
* @param opts - Optional hints. `rescannable` defaults to `true`; set to
* `false` if calling `factory()` twice would not reproduce the same data.
*/
static async fromFactory(
schema: Schema,
factory: () =>
| AsyncIterable<RecordBatch>
| Iterable<RecordBatch>
| AsyncIterator<RecordBatch>
| Iterator<RecordBatch>,
opts: ScannableOptions = {},
): Promise<Scannable> {
const numRows = opts.numRows ?? null;
if (numRows != null && !Number.isInteger(numRows)) {
throw new TypeError("numRows must be an integer");
}
const rescannable = opts.rescannable ?? true;
let iter: AsyncIterator<RecordBatch> | Iterator<RecordBatch> | null = null;
const getNextBatch = async (isStart: boolean): Promise<Buffer | null> => {
// `isStart` is true on the first pull of every new scan_as_stream.
// Drop any cached iterator so factory() is re-invoked for the next scan
if (isStart) {
iter = null;
}
if (iter === null) {
iter = normalizeIterator(factory());
}
const result = await iter.next();
if (result.done) {
iter = null;
return null;
}
return fromRecordBatchToStreamBuffer(result.value);
};
const schemaBuf = await fromTableToBuffer(makeEmptyTable(schema));
const native = new NapiScannable(
schemaBuf,
numRows,
rescannable,
getNextBatch,
);
return new Scannable(native, schema, numRows, rescannable);
}
/**
* Build a Scannable from an in-memory Arrow `Table`. Always rescannable;
* the table's batches are replayed on each scan.
*
* The table's row count is authoritative: `opts.numRows` must either be
* omitted or equal to `table.numRows`. `opts.rescannable` of `false` is
* rejected because in-memory Tables are always rescannable.
*/
static async fromTable(
table: ArrowTable,
opts: ScannableOptions = {},
): Promise<Scannable> {
if (opts.numRows != null && opts.numRows !== table.numRows) {
throw new TypeError(
`opts.numRows (${opts.numRows}) does not match table.numRows (${table.numRows}). ` +
`The table's row count is authoritative; omit numRows or pass the matching value.`,
);
}
if (opts.rescannable === false) {
throw new TypeError(
`fromTable does not accept rescannable: false. ` +
`In-memory Arrow Tables are always rescannable; omit the option or pass true.`,
);
}
return Scannable.fromFactory(table.schema, () => table.batches, {
numRows: table.numRows,
rescannable: true,
});
}
/**
* Build a Scannable from an iterable of `RecordBatch`es. `rescannable`
* defaults to `false`. Pass an explicit schema so the consumer can
* validate before any batch is pulled.
*
* `opts.rescannable: true` is honest for replayable iterables (Arrays,
* Sets, or custom iterables whose `[Symbol.iterator]()` returns a fresh
* iterator each call). It is rejected for one-shot iterables (generators,
* async generators, or already-an-iterator inputs) because their
* `[Symbol.iterator]()` returns the same exhausted object on the second
* scan. For replayable sources outside this shape, use
* `fromFactory(schema, () => createIter(), { rescannable: true })`.
*
* Note: when `opts.rescannable` is `true`, the constructor calls
* `[Symbol.iterator]()` once on the input to perform the structural check.
*/
static async fromIterable(
schema: Schema,
iter: AsyncIterable<RecordBatch> | Iterable<RecordBatch>,
opts: ScannableOptions = {},
): Promise<Scannable> {
if (opts.rescannable === true && isOneShotIterable(iter)) {
throw new TypeError(
`fromIterable: rescannable: true is not honest for one-shot iterables ` +
`(generators, async generators, or iterators where [Symbol.iterator]() ` +
`returns the same object). The source would be exhausted after the first scan. ` +
`Use fromFactory(schema, () => createIter(), { rescannable: true }) for sources ` +
`where each call mints a fresh iterator.`,
);
}
return Scannable.fromFactory(schema, () => iter, {
numRows: opts.numRows,
rescannable: opts.rescannable ?? false,
});
}
/**
* Build a Scannable from an Arrow `RecordBatchReader`. A reader can only
* be consumed once; `rescannable` defaults to `false`.
*
* The reader must already be opened (via `.open()`) so its `.schema` is
* populated. `RecordBatchReader.from(...)` returns an unopened reader.
*
* `opts.rescannable: true` is rejected because `RecordBatchReader` is a
* self-iterator (its `[Symbol.iterator]()` returns itself), and this
* constructor does not call `reader.reset()` between scans, so a second
* scan would always see an exhausted reader. For genuinely replayable
* sources, use
* `fromFactory(schema, () => openReader(), { rescannable: true })`,
* which mints a fresh reader on each scan.
*/
static async fromRecordBatchReader(
reader: RecordBatchReader,
opts: ScannableOptions = {},
): Promise<Scannable> {
if (opts.rescannable === true) {
throw new TypeError(
`fromRecordBatchReader does not accept rescannable: true. ` +
`RecordBatchReader is a self-iterator (its [Symbol.iterator]() ` +
`returns itself) and would be exhausted after the first scan. ` +
`Use fromFactory(schema, () => openReader(), { rescannable: true }) ` +
`for sources where each call mints a fresh reader.`,
);
}
return Scannable.fromFactory(reader.schema, () => reader, {
numRows: opts.numRows,
rescannable: false,
});
}
}
function normalizeIterator<T>(
source: AsyncIterable<T> | Iterable<T> | AsyncIterator<T> | Iterator<T>,
): AsyncIterator<T> | Iterator<T> {
if (source == null) {
throw new TypeError("Scannable factory returned null/undefined");
}
if (
typeof (source as AsyncIterable<T>)[Symbol.asyncIterator] === "function"
) {
return (source as AsyncIterable<T>)[Symbol.asyncIterator]();
}
if (typeof (source as Iterable<T>)[Symbol.iterator] === "function") {
return (source as Iterable<T>)[Symbol.iterator]();
}
// Already an iterator (has `.next`).
if (typeof (source as Iterator<T>).next === "function") {
return source as Iterator<T>;
}
throw new TypeError("Scannable factory returned a non-iterable value");
}
// A "self-iterator" returns the same object from `[Symbol.iterator]()` /
// `[Symbol.asyncIterator]()`. Generators behave this way, so they exhaust
// after one pass. Replayable iterables (Array, Set, custom) return a fresh
// iterator each call. Detection mirrors `normalizeIterator`'s ordering so
// classification matches scan-time behavior.
function isOneShotIterable(
source: AsyncIterable<unknown> | Iterable<unknown>,
): boolean {
// null/undefined are not one-shot in any meaningful sense; let
// `normalizeIterator` raise the actual error at scan time.
if (source == null) return false;
const ref = source as unknown;
if (
typeof (source as AsyncIterable<unknown>)[Symbol.asyncIterator] ===
"function"
) {
const it = (source as AsyncIterable<unknown>)[
Symbol.asyncIterator
]() as unknown;
return it === ref;
}
if (typeof (source as Iterable<unknown>)[Symbol.iterator] === "function") {
const it = (source as Iterable<unknown>)[Symbol.iterator]() as unknown;
return it === ref;
}
// Already-an-iterator (has `.next` but no `Symbol.iterator`) is by
// definition one-shot.
if (typeof (source as { next?: unknown }).next === "function") return true;
return false;
}

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-darwin-arm64",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["darwin"],
"cpu": ["arm64"],
"main": "lancedb.darwin-arm64.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-gnu",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-arm64-musl",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["linux"],
"cpu": ["arm64"],
"main": "lancedb.linux-arm64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-gnu",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-gnu.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-linux-x64-musl",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["linux"],
"cpu": ["x64"],
"main": "lancedb.linux-x64-musl.node",

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-arm64-msvc",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": [
"win32"
],

View File

@@ -1,6 +1,6 @@
{
"name": "@lancedb/lancedb-win32-x64-msvc",
"version": "0.29.0",
"version": "0.28.0-beta.11",
"os": ["win32"],
"cpu": ["x64"],
"main": "lancedb.win32-x64-msvc.node",

10452
nodejs/package-lock.json generated

File diff suppressed because it is too large Load Diff

View File

@@ -11,7 +11,7 @@
"ann"
],
"private": false,
"version": "0.29.0",
"version": "0.28.0-beta.11",
"main": "dist/index.js",
"exports": {
".": "./dist/index.js",
@@ -38,15 +38,15 @@
"url": "https://github.com/lancedb/lancedb"
},
"devDependencies": {
"@aws-sdk/client-dynamodb": "^3.33.0",
"@aws-sdk/client-kms": "^3.33.0",
"@aws-sdk/client-s3": "^3.33.0",
"@aws-sdk/client-dynamodb": "3.1003.0",
"@aws-sdk/client-kms": "3.1003.0",
"@aws-sdk/client-s3": "3.1003.0",
"@biomejs/biome": "^1.7.3",
"@jest/globals": "^29.7.0",
"@napi-rs/cli": "^3.5.1",
"@napi-rs/cli": "3.5.1",
"@types/axios": "^0.14.0",
"@types/jest": "^29.1.2",
"@types/node": "^22.7.4",
"@types/node": "22.7.4",
"@types/tmp": "^0.2.6",
"apache-arrow-15": "npm:apache-arrow@15.0.0",
"apache-arrow-16": "npm:apache-arrow@16.0.0",
@@ -57,9 +57,9 @@
"shx": "^0.3.4",
"tmp": "^0.2.3",
"ts-jest": "^29.1.2",
"typedoc": "^0.26.4",
"typedoc-plugin-markdown": "^4.2.1",
"typescript": "^5.5.4",
"typedoc": "0.26.4",
"typedoc-plugin-markdown": "4.2.1",
"typescript": "5.5.4",
"typescript-eslint": "^7.1.0"
},
"ava": {
@@ -68,15 +68,16 @@
"engines": {
"node": ">= 18"
},
"packageManager": "pnpm@11.1.1",
"cpu": ["x64", "arm64"],
"os": ["darwin", "linux", "win32"],
"scripts": {
"artifacts": "napi artifacts",
"build:debug": "napi build --platform --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir lancedb",
"postbuild:debug": "shx mkdir -p dist && shx cp lancedb/*.node dist/",
"postbuild:debug": "shx mkdir -p dist && shx cp lancedb/*.node dist/ && node -e \"require('fs').writeFileSync('dist/package.json', JSON.stringify({name:'@lancedb/lancedb',type:'commonjs'}))\"",
"build:release": "napi build --platform --release --dts ../lancedb/native.d.ts --js ../lancedb/native.js --output-dir dist",
"build": "npm run build:debug && npm run tsc",
"build-release": "npm run build:release && npm run tsc",
"build": "pnpm build:debug && pnpm tsc",
"build-release": "pnpm build:release && pnpm tsc",
"tsc": "tsc -b",
"posttsc": "shx cp lancedb/native.d.ts dist/native.d.ts",
"lint-ci": "biome ci .",
@@ -86,7 +87,7 @@
"lint-fix": "biome check --write . && biome format --write .",
"prepublishOnly": "napi prepublish -t npm",
"test": "jest --verbose",
"integration": "S3_TEST=1 npm run test",
"integration": "S3_TEST=1 pnpm test",
"universal": "napi universalize",
"version": "napi version"
},
@@ -94,8 +95,8 @@
"reflect-metadata": "^0.2.2"
},
"optionalDependencies": {
"@huggingface/transformers": "^3.0.2",
"openai": "^4.29.2"
"@huggingface/transformers": "3.0.2",
"openai": "4.29.2"
},
"peerDependencies": {
"apache-arrow": ">=15.0.0 <=18.1.0"

7317
nodejs/pnpm-lock.yaml generated Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,18 @@
# Flat node_modules layout. The @napi-rs/cli build step fails to locate
# the cdylib artifact under pnpm's isolated layout; the hoisted linker
# mirrors npm's structure and unblocks the native build.
nodeLinker: hoisted
# Block resolution of versions less than 24h old (Shai-Hulud window).
# This is the pnpm 11 default but pinned here so it's visible to
# reviewers and survives a future pnpm major flipping the default.
minimumReleaseAge: 1440
# Fail install if a transitive dep tries to run an unapproved script.
strictDepBuilds: true
allowBuilds:
'@biomejs/biome': true
onnxruntime-node: true
protobufjs: true
sharp: true

View File

@@ -8,12 +8,16 @@ use lancedb::database::{CreateTableMode, Database};
use napi::bindgen_prelude::*;
use napi_derive::*;
use crate::ConnectNamespaceOptions;
use crate::ConnectionOptions;
use crate::error::NapiErrorExt;
use crate::header::JsHeaderProvider;
use crate::table::Table;
use lancedb::connection::{ConnectBuilder, Connection as LanceDBConnection};
use lancedb::connection::{ConnectBuilder, Connection as LanceDBConnection, connect_namespace};
use lance_namespace::models::{
CreateNamespaceRequest, DescribeNamespaceRequest, DropNamespaceRequest, ListNamespacesRequest,
};
use lancedb::ipc::{ipc_file_to_batches, ipc_file_to_schema};
#[napi]
@@ -21,6 +25,29 @@ pub struct Connection {
inner: Option<LanceDBConnection>,
}
#[napi(object)]
pub struct DescribeNamespaceResponse {
pub properties: Option<HashMap<String, String>>,
}
#[napi(object)]
pub struct ListNamespacesResponse {
pub namespaces: Vec<String>,
pub page_token: Option<String>,
}
#[napi(object)]
pub struct CreateNamespaceResponse {
pub properties: Option<HashMap<String, String>>,
pub transaction_id: Option<String>,
}
#[napi(object)]
pub struct DropNamespaceResponse {
pub properties: Option<HashMap<String, String>>,
pub transaction_id: Option<Vec<String>>,
}
impl Connection {
pub(crate) fn inner_new(inner: LanceDBConnection) -> Self {
Self { inner: Some(inner) }
@@ -106,6 +133,39 @@ impl Connection {
Ok(Self::inner_new(builder.execute().await.default_error()?))
}
/// Create a new Connection instance backed by a namespace implementation.
#[napi(factory)]
pub async fn new_with_namespace(
impl_name: String,
properties: HashMap<String, String>,
options: ConnectNamespaceOptions,
) -> napi::Result<Self> {
if impl_name.is_empty() {
return Err(napi::Error::from_reason(
"implName must be a non-empty string",
));
}
let mut builder = connect_namespace(&impl_name, properties);
if let Some(interval) = options.read_consistency_interval {
builder =
builder.read_consistency_interval(std::time::Duration::from_secs_f64(interval));
}
if let Some(storage_options) = options.storage_options {
for (key, value) in storage_options {
builder = builder.storage_option(key, value);
}
}
if let Some(namespace_client_properties) = options.namespace_client_properties {
builder = builder.namespace_client_properties(namespace_client_properties);
}
if let Some(session) = options.session {
builder = builder.session(session.inner.clone());
}
Ok(Self::inner_new(builder.execute().await.default_error()?))
}
#[napi]
pub fn display(&self) -> napi::Result<String> {
Ok(self.get_inner()?.to_string())
@@ -268,9 +328,149 @@ impl Connection {
.default_error()
}
#[napi(catch_unwind)]
pub async fn rename_table(
&self,
old_name: String,
new_name: String,
namespace_path: Option<Vec<String>>,
) -> napi::Result<()> {
let ns = namespace_path.unwrap_or_default();
self.get_inner()?
.rename_table(&old_name, &new_name, &ns, &ns)
.await
.default_error()
}
#[napi(catch_unwind)]
pub async fn drop_all_tables(&self, namespace_path: Option<Vec<String>>) -> napi::Result<()> {
let ns = namespace_path.unwrap_or_default();
self.get_inner()?.drop_all_tables(&ns).await.default_error()
}
#[napi(catch_unwind)]
/// Describe a namespace and return its properties.
pub async fn describe_namespace(
&self,
namespace_path: Vec<String>,
) -> napi::Result<DescribeNamespaceResponse> {
let req = DescribeNamespaceRequest {
id: Some(namespace_path),
..Default::default()
};
let resp = self
.get_inner()?
.describe_namespace(req)
.await
.default_error()?;
Ok(DescribeNamespaceResponse {
properties: resp.properties,
})
}
#[napi(catch_unwind)]
/// List child namespaces under the given namespace path
pub async fn list_namespaces(
&self,
namespace_path: Option<Vec<String>>,
page_token: Option<String>,
limit: Option<u32>,
) -> napi::Result<ListNamespacesResponse> {
let req = ListNamespacesRequest {
id: namespace_path,
page_token,
limit: limit.map(|l| l as i32),
..Default::default()
};
let resp = self
.get_inner()?
.list_namespaces(req)
.await
.default_error()?;
Ok(ListNamespacesResponse {
namespaces: resp.namespaces,
page_token: resp.page_token,
})
}
#[napi(catch_unwind)]
/// Create a new namespace with optional properties.
pub async fn create_namespace(
&self,
namespace_path: Vec<String>,
mode: Option<String>,
properties: Option<HashMap<String, String>>,
) -> napi::Result<CreateNamespaceResponse> {
let mode_str = mode
.map(|m| match m.to_lowercase().as_str() {
"create" => Ok("Create".to_string()),
"exist_ok" => Ok("ExistOk".to_string()),
"overwrite" => Ok("Overwrite".to_string()),
_ => Err(napi::Error::from_reason(format!(
"Invalid mode '{}': expected one of 'create', 'exist_ok', 'overwrite'",
m
))),
})
.transpose()?;
let req = CreateNamespaceRequest {
id: Some(namespace_path),
mode: mode_str,
properties,
..Default::default()
};
let resp = self
.get_inner()?
.create_namespace(req)
.await
.default_error()?;
Ok(CreateNamespaceResponse {
properties: resp.properties,
transaction_id: resp.transaction_id,
})
}
#[napi(catch_unwind)]
/// Drop a namespace.
pub async fn drop_namespace(
&self,
namespace_path: Vec<String>,
mode: Option<String>,
behavior: Option<String>,
) -> napi::Result<DropNamespaceResponse> {
let mode_str = mode
.map(|m| match m.to_lowercase().as_str() {
"skip" => Ok("Skip".to_string()),
"fail" => Ok("Fail".to_string()),
_ => Err(napi::Error::from_reason(format!(
"Invalid mode '{}': expected one of 'skip', 'fail'",
m
))),
})
.transpose()?;
let behavior_str = behavior
.map(|b| match b.to_lowercase().as_str() {
"restrict" => Ok("Restrict".to_string()),
"cascade" => Ok("Cascade".to_string()),
_ => Err(napi::Error::from_reason(format!(
"Invalid behavior '{}': expected one of 'restrict', 'cascade'",
b
))),
})
.transpose()?;
let req = DropNamespaceRequest {
id: Some(namespace_path),
mode: mode_str,
behavior: behavior_str,
..Default::default()
};
let resp = self
.get_inner()?
.drop_namespace(req)
.await
.default_error()?;
Ok(DropNamespaceResponse {
properties: resp.properties,
transaction_id: resp.transaction_id,
})
}
}

View File

@@ -16,6 +16,7 @@ pub mod permutation;
mod query;
pub mod remote;
mod rerankers;
mod scannable;
mod session;
mod table;
mod util;
@@ -67,6 +68,26 @@ pub struct OpenTableOptions {
pub storage_options: Option<HashMap<String, String>>,
}
#[napi(object)]
#[derive(Debug)]
pub struct ConnectNamespaceOptions {
/// The interval, in seconds, at which to check for updates to the table
/// from other processes. If None, then consistency is not checked. For
/// performance reasons, this is the default. For strong consistency, set
/// this to zero seconds. Then every read will check for updates from other
/// processes. As a compromise, you can set this to a non-zero value for
/// eventual consistency.
pub read_consistency_interval: Option<f64>,
/// Configuration for object storage. The available options are described
/// at https://docs.lancedb.com/storage/
pub storage_options: Option<HashMap<String, String>>,
/// Extra properties for the backing namespace client.
pub namespace_client_properties: Option<HashMap<String, String>>,
/// The session to use for this connection. Holds shared caches and other
/// session-specific state.
pub session: Option<session::Session>,
}
#[napi_derive::module_init]
fn init() {
let env = Env::new()

253
nodejs/src/scannable.rs Normal file
View File

@@ -0,0 +1,253 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
//! NodeJS binding for the [`lancedb::data::scannable::Scannable`] trait.
//!
//! The JS side supplies a `getNextBatch(isStart)` callback that returns the
//! next Arrow `RecordBatch` encoded as a self-contained Arrow IPC Stream
//! message (schema message + record batch message + EOS marker) wrapped in a
//! `Buffer`, or `null` when the stream is exhausted. The Rust side parses
//! each buffer with `arrow_ipc::reader::StreamReader`, validates every
//! standalone batch stream against the declared schema, and yields decoded
//! `RecordBatch`es as a [`SendableRecordBatchStream`].
//!
//! `isStart` is `true` on the first `getNextBatch` call of each new
//! `scan_as_stream` and `false` thereafter. JS uses it to drop any cached
//! iterator and re-invoke its factory at scan boundaries, so retries
//! triggered by mid-stream failures restart at batch 0.
use std::io::Cursor;
use std::sync::Arc;
use arrow_array::RecordBatch;
use arrow_ipc::reader::StreamReader;
use arrow_schema::SchemaRef;
use futures::stream::once;
use lancedb::arrow::{SendableRecordBatchStream, SimpleRecordBatchStream};
use lancedb::data::scannable::Scannable as LanceScannable;
use lancedb::ipc::ipc_file_to_schema;
use lancedb::{Error, Result as LanceResult};
use napi::bindgen_prelude::*;
use napi::threadsafe_function::ThreadsafeFunction;
use napi_derive::napi;
/// Threadsafe handle to the JS `getNextBatch` callback. The callback takes a
/// single boolean `isStart` (`true` on the first call of each new scan) and
/// returns a Promise that resolves to a `Buffer` containing one IPC Stream
/// message, or `null` at end-of-stream.
type GetNextBatchFn = ThreadsafeFunction<bool, Promise<Option<Buffer>>, bool, Status, false>;
/// A Rust-side view of a JS-constructed `Scannable`.
///
/// Held in JS as the return value of the `Scannable` class constructor. When
/// passed to a consumer that accepts `impl lancedb::data::scannable::Scannable`,
/// the consumer invokes `scan_as_stream()` to pull batches through the JS
/// callback.
#[napi]
pub struct NapiScannable {
schema: SchemaRef,
num_rows: Option<usize>,
rescannable: bool,
// `ThreadsafeFunction` is not `Clone`; wrap in `Arc` so the stream
// returned by `scan_as_stream` can own a handle independent of `self`.
get_next_batch: Arc<GetNextBatchFn>,
// Tracks whether a scan has already started; used to enforce one-shot
// semantics on non-rescannable sources.
scanned: bool,
}
#[napi]
impl NapiScannable {
/// Construct a new `NapiScannable`.
///
/// - `schema_buf` — Arrow IPC File buffer carrying only the schema (no batches).
/// - `num_rows` — optional row count hint; not validated against the stream.
/// - `rescannable` — whether `get_next_batch` may be re-driven after the
/// scan completes.
/// - `get_next_batch` -- JS callback that yields the next batch as an Arrow
/// IPC Stream message wrapped in a `Buffer`, or `null` at EOF. The
/// `isStart` argument is `true` on the first call of each new scan;
/// JS uses it to discard any cached iterator before pulling.
#[napi(constructor)]
pub fn new(
schema_buf: Buffer,
num_rows: Option<i64>,
rescannable: bool,
get_next_batch: Function<bool, Promise<Option<Buffer>>>,
) -> napi::Result<Self> {
let schema = ipc_file_to_schema(schema_buf.to_vec())
.map_err(|e| napi::Error::from_reason(format!("Invalid schema buffer: {}", e)))?;
let num_rows = num_rows
.map(|n| {
usize::try_from(n)
.map_err(|_| napi::Error::from_reason("num_rows must be non-negative"))
})
.transpose()?;
let get_next_batch = Arc::new(get_next_batch.build_threadsafe_function().build()?);
Ok(Self {
schema,
num_rows,
rescannable,
get_next_batch,
scanned: false,
})
}
}
impl std::fmt::Debug for NapiScannable {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("NapiScannable")
.field("schema", &self.schema)
.field("num_rows", &self.num_rows)
.field("rescannable", &self.rescannable)
.finish()
}
}
impl LanceScannable for NapiScannable {
fn schema(&self) -> SchemaRef {
self.schema.clone()
}
fn scan_as_stream(&mut self) -> SendableRecordBatchStream {
let schema = self.schema.clone();
// One-shot enforcement for non-rescannable sources: return a stream
// whose first item is an error.
if self.scanned && !self.rescannable {
let err_stream = once(async {
Err(Error::InvalidInput {
message: "Scannable has already been consumed (non-rescannable source)"
.to_string(),
})
});
return Box::pin(SimpleRecordBatchStream::new(err_stream, schema));
}
self.scanned = true;
let tsfn = Arc::clone(&self.get_next_batch);
let declared_schema = schema.clone();
// State threaded through the unfold. `is_first_pull` starts true so
// the first call into JS signals a new-scan boundary; JS uses it to
// reset any cached iterator before factory()-ing a fresh one.
let initial = State {
tsfn,
batch_index: 0,
declared_schema,
errored: false,
is_first_pull: true,
};
let stream = futures::stream::unfold(initial, |mut state| async move {
if state.errored {
return None;
}
// Pull the next IPC Stream buffer from JS. `is_first_pull` is
// consumed here and cleared so subsequent pulls continue the
// same scan rather than restarting it.
let is_start = state.is_first_pull;
state.is_first_pull = false;
let buf = match pull_next(&state.tsfn, is_start).await {
Ok(Some(buf)) => buf,
Ok(None) => return None,
Err(e) => {
state.errored = true;
return Some((Err(e), state));
}
};
match decode_one_batch(buf.as_ref(), &state.declared_schema) {
Ok(batch) => {
state.batch_index += 1;
Some((Ok(batch), state))
}
Err(e) => {
let tagged = Error::Runtime {
message: format!(
"[scannable/rust-bridge] failure at batch index {}: {}",
state.batch_index, e
),
};
state.errored = true;
Some((Err(tagged), state))
}
}
});
Box::pin(SimpleRecordBatchStream::new(stream, schema))
}
fn num_rows(&self) -> Option<usize> {
self.num_rows
}
fn rescannable(&self) -> bool {
self.rescannable
}
}
struct State {
tsfn: Arc<GetNextBatchFn>,
batch_index: usize,
declared_schema: SchemaRef,
errored: bool,
/// True for the very first pull of a new scan. Forwarded to JS so the
/// callback can drop any cached iterator and call its factory fresh,
/// which makes rescannable sources restart at batch 0 even when the
/// previous scan ended mid-stream.
is_first_pull: bool,
}
/// Invoke the JS callback and await its Promise. `is_start` is forwarded to
/// the JS side as the `isStart` argument so it can reset its iterator at the
/// scan boundary. Errors on the JS side surface here as rejected promises
/// and are tunneled back as `lancedb::Error::Runtime`.
async fn pull_next(tsfn: &GetNextBatchFn, is_start: bool) -> LanceResult<Option<Buffer>> {
let promise = tsfn
.call_async(is_start)
.await
.map_err(|e| Error::Runtime {
message: format!(
"[scannable/js-factory] napi error status={}, reason={}",
e.status, e.reason
),
})?;
promise.await.map_err(|e| Error::Runtime {
message: format!(
"[scannable/js-iterator] napi error status={}, reason={}",
e.status, e.reason
),
})
}
/// Decode one IPC Stream buffer (schema + batch + EOS) into a `RecordBatch`.
/// Each buffer is a standalone IPC stream, so every decoded stream schema must
/// match the one declared at construction.
fn decode_one_batch(buf: &[u8], declared: &SchemaRef) -> LanceResult<RecordBatch> {
let reader = StreamReader::try_new(Cursor::new(buf), None).map_err(|e| Error::Runtime {
message: format!("failed to open IPC stream reader: {}", e),
})?;
let actual = reader.schema();
if actual.as_ref() != declared.as_ref() {
return Err(Error::InvalidInput {
message: format!(
"declared schema does not match stream schema: declared={:?} actual={:?}",
declared, actual
),
});
}
let mut iter = reader;
let batch = iter
.next()
.ok_or_else(|| Error::Runtime {
message: "IPC stream contained schema but no record batch".to_string(),
})?
.map_err(|e| Error::Runtime {
message: format!("failed to decode record batch: {}", e),
})?;
Ok(batch)
}

View File

@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "0.32.0"
current_version = "0.31.0-beta.11"
parse = """(?x)
(?P<major>0|[1-9]\\d*)\\.
(?P<minor>0|[1-9]\\d*)\\.

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb-python"
version = "0.32.0"
version = "0.31.0-beta.11"
publish = false
edition.workspace = true
description = "Python bindings for LanceDB"
@@ -19,6 +19,7 @@ arrow = { version = "58.0.0", features = ["pyarrow"] }
async-trait = "0.1"
bytes = "1"
lancedb = { path = "../rust/lancedb", default-features = false }
datafusion-common.workspace = true
lance-core.workspace = true
lance-namespace.workspace = true
lance-namespace-impls.workspace = true

View File

@@ -45,7 +45,7 @@ repository = "https://github.com/lancedb/lancedb"
[project.optional-dependencies]
pylance = [
"pylance>=6.0.0",
"pylance>=5.0.0b5",
]
tests = [
"aiohttp>=3.9.0",
@@ -58,7 +58,7 @@ tests = [
"pytz>=2023.3",
"polars>=0.19, <=1.3.0",
"pyarrow-stubs>=16.0",
"pylance>=6.0.0",
"pylance>=5.0.0b5",
"requests>=2.31.0",
"datafusion>=52,<53",
]

View File

@@ -51,7 +51,7 @@ class PyExpr:
def to_sql(self) -> str: ...
def expr_col(name: str) -> PyExpr: ...
def expr_lit(value: Union[bool, int, float, str]) -> PyExpr: ...
def expr_lit(value: Union[bool, int, float, str, bytes]) -> PyExpr: ...
def expr_func(name: str, args: List[PyExpr]) -> PyExpr: ...
class Session:

View File

@@ -63,7 +63,7 @@ def _coerce(value: "ExprLike") -> "Expr":
# Type alias used in annotations.
ExprLike = Union["Expr", bool, int, float, str]
ExprLike = Union["Expr", bool, int, float, str, bytes]
class Expr:
@@ -261,13 +261,13 @@ def col(name: str) -> Expr:
return Expr(expr_col(name))
def lit(value: Union[bool, int, float, str]) -> Expr:
def lit(value: Union[bool, int, float, str, bytes]) -> Expr:
"""Create a literal (constant) value expression.
Parameters
----------
value:
A Python ``bool``, ``int``, ``float``, or ``str``.
A Python ``bool``, ``int``, ``float``, ``str``, or ``bytes``.
Examples
--------

View File

@@ -6,22 +6,44 @@
from typing import Optional
_CREATE_NAMESPACE_MODES = frozenset({"create", "exist_ok", "overwrite"})
_DROP_NAMESPACE_MODES = frozenset({"SKIP", "FAIL"})
_DROP_NAMESPACE_BEHAVIORS = frozenset({"RESTRICT", "CASCADE"})
def _normalize_create_namespace_mode(mode: Optional[str]) -> Optional[str]:
"""Normalize create namespace mode to lowercase (API expects lowercase)."""
if mode is None:
return None
return mode.lower()
normalized = mode.lower()
if normalized not in _CREATE_NAMESPACE_MODES:
raise ValueError(
f"Invalid create namespace mode {mode!r}: "
f"expected one of 'create', 'exist_ok', 'overwrite'"
)
return normalized
def _normalize_drop_namespace_mode(mode: Optional[str]) -> Optional[str]:
"""Normalize drop namespace mode to uppercase (API expects uppercase)."""
if mode is None:
return None
return mode.upper()
normalized = mode.upper()
if normalized not in _DROP_NAMESPACE_MODES:
raise ValueError(
f"Invalid drop namespace mode {mode!r}: expected one of 'skip', 'fail'"
)
return normalized
def _normalize_drop_namespace_behavior(behavior: Optional[str]) -> Optional[str]:
"""Normalize drop namespace behavior to uppercase (API expects uppercase)."""
if behavior is None:
return None
return behavior.upper()
normalized = behavior.upper()
if normalized not in _DROP_NAMESPACE_BEHAVIORS:
raise ValueError(
f"Invalid drop namespace behavior {behavior!r}: "
f"expected one of 'restrict', 'cascade'"
)
return normalized

View File

@@ -1,4 +1,3 @@
segmenter:
mode: "normal"
dictionary:
path: "./python/tests/models/lindera/ipadic/main"
dictionary: "./python/tests/models/lindera/ipadic/main"

View File

@@ -914,6 +914,29 @@ def test_local_namespace_operations(tmp_path):
assert db.list_namespaces().namespaces == []
def test_create_namespace_invalid_mode_raises(tmp_path):
"""Unrecognized create namespace modes raise a clear error."""
db = lancedb.connect(tmp_path)
with pytest.raises(ValueError, match="Invalid create namespace mode"):
db.create_namespace(["child"], mode="frobnicate")
def test_drop_namespace_invalid_mode_raises(tmp_path):
"""Unrecognized drop namespace modes raise a clear error."""
db = lancedb.connect(tmp_path)
db.create_namespace(["child"])
with pytest.raises(ValueError, match="Invalid drop namespace mode"):
db.drop_namespace(["child"], mode="frobnicate")
def test_drop_namespace_invalid_behavior_raises(tmp_path):
"""Unrecognized drop namespace behaviors raise a clear error."""
db = lancedb.connect(tmp_path)
db.create_namespace(["child"])
with pytest.raises(ValueError, match="Invalid drop namespace behavior"):
db.drop_namespace(["child"], behavior="frobnicate")
def test_clone_table_latest_version(tmp_path):
"""Test cloning a table with the latest version (default behavior)"""
import os

View File

@@ -116,8 +116,7 @@ def lindera_ipadic(language_model_home):
config_path.write_text(
"segmenter:\n"
' mode: "normal"\n'
" dictionary:\n"
f' path: "{extracted_model.resolve().as_posix()}"\n',
f' dictionary: "{extracted_model.resolve().as_posix()}"\n',
encoding="utf-8",
)

View File

@@ -395,12 +395,17 @@ impl Connection {
future_into_py(py, async move {
use lance_namespace::models::CreateNamespaceRequest;
// Mode is now a string field
let mode_str = mode.and_then(|m| match m.to_lowercase().as_str() {
"create" => Some("Create".to_string()),
"exist_ok" => Some("ExistOk".to_string()),
"overwrite" => Some("Overwrite".to_string()),
_ => None,
});
let mode_str = mode
.map(|m| match m.to_lowercase().as_str() {
"create" => Ok("Create".to_string()),
"exist_ok" => Ok("ExistOk".to_string()),
"overwrite" => Ok("Overwrite".to_string()),
_ => Err(PyValueError::new_err(format!(
"Invalid mode {:?}: expected one of 'create', 'exist_ok', 'overwrite'",
m
))),
})
.transpose()?;
let request = CreateNamespaceRequest {
id: Some(namespace_path),
mode: mode_str,
@@ -428,16 +433,26 @@ impl Connection {
future_into_py(py, async move {
use lance_namespace::models::DropNamespaceRequest;
// Mode and Behavior are now string fields
let mode_str = mode.and_then(|m| match m.to_uppercase().as_str() {
"SKIP" => Some("Skip".to_string()),
"FAIL" => Some("Fail".to_string()),
_ => None,
});
let behavior_str = behavior.and_then(|b| match b.to_uppercase().as_str() {
"RESTRICT" => Some("Restrict".to_string()),
"CASCADE" => Some("Cascade".to_string()),
_ => None,
});
let mode_str = mode
.map(|m| match m.to_uppercase().as_str() {
"SKIP" => Ok("Skip".to_string()),
"FAIL" => Ok("Fail".to_string()),
_ => Err(PyValueError::new_err(format!(
"Invalid mode {:?}: expected one of 'skip', 'fail'",
m
))),
})
.transpose()?;
let behavior_str = behavior
.map(|b| match b.to_uppercase().as_str() {
"RESTRICT" => Ok("Restrict".to_string()),
"CASCADE" => Ok("Cascade".to_string()),
_ => Err(PyValueError::new_err(format!(
"Invalid behavior {:?}: expected one of 'restrict', 'cascade'",
b
))),
})
.transpose()?;
let request = DropNamespaceRequest {
id: Some(namespace_path),
mode: mode_str,

View File

@@ -8,7 +8,9 @@
//! DataFusion [`Expr`] nodes, bypassing SQL string parsing.
use arrow::{datatypes::DataType, pyarrow::PyArrowType};
use datafusion_common::ScalarValue;
use lancedb::expr::{DfExpr, col as ldb_col, contains, expr_cast, lit as df_lit, lower, upper};
use pyo3::types::PyBytes;
use pyo3::{Bound, PyAny, PyResult, exceptions::PyValueError, prelude::*, pyfunction};
/// A type-safe DataFusion expression.
@@ -141,7 +143,7 @@ pub fn expr_col(name: &str) -> PyExpr {
/// Create a literal value expression.
///
/// Supported Python types: `bool`, `int`, `float`, `str`.
/// Supported Python types: `bool`, `int`, `float`, `str`, `bytes`.
#[pyfunction]
pub fn expr_lit(value: Bound<'_, PyAny>) -> PyResult<PyExpr> {
// bool must be checked before int because bool is a subclass of int in Python
@@ -157,8 +159,12 @@ pub fn expr_lit(value: Bound<'_, PyAny>) -> PyResult<PyExpr> {
if let Ok(s) = value.extract::<String>() {
return Ok(PyExpr(df_lit(s)));
}
if value.is_instance_of::<PyBytes>() {
let bytes = value.extract::<Vec<u8>>()?;
return Ok(PyExpr(df_lit(ScalarValue::Binary(Some(bytes)))));
}
Err(PyValueError::new_err(format!(
"unsupported literal type: {}. Supported: bool, int, float, str",
"unsupported literal type: {}. Supported: bool, int, float, str, bytes",
value.get_type().name()?
)))
}

View File

@@ -33,6 +33,14 @@ class TestExprConstruction:
e = lit(True)
assert isinstance(e, Expr)
def test_lit_bytes(self):
e = lit(b"\xde\xad\xbe\xef")
assert isinstance(e, Expr)
def test_lit_bytes_empty(self):
e = lit(b"")
assert isinstance(e, Expr)
def test_lit_unsupported_type_raises(self):
with pytest.raises(Exception):
lit([1, 2, 3])
@@ -135,6 +143,43 @@ class TestExprOperators:
assert e.to_sql() == "(name = 'alice')"
class TestExprBytesLiteral:
def test_bytes_to_sql(self):
e = lit(b"\xde\xad\xbe\xef")
assert e.to_sql() == "X'DEADBEEF'"
def test_empty_bytes_to_sql(self):
e = lit(b"")
assert e.to_sql() == "X''"
def test_bytes_repr(self):
e = lit(b"\x01\x02")
assert repr(e) == "Expr(X'0102')"
def test_bytes_equality_expr_sql(self):
e = col("data") == lit(b"\xca\xfe")
assert e.to_sql() == "(data = X'CAFE')"
def test_bytes_ne_expr_sql(self):
e = col("data") != lit(b"\xff")
assert e.to_sql() == "(data <> X'FF')"
def test_bytes_compound_expr_sql(self):
e = (col("data") == lit(b"\x01")) & (col("id") > lit(5))
assert e.to_sql() == "((data = X'01') AND (id > 5))"
def test_bytes_in_function_call(self):
# Regression test: binary literals inside scalar function calls
# used to fail because DataFusion's unparser does not support Binary
# scalars. Now handled via a placeholder-substitution rewrite.
e = func("contains", col("data"), lit(b"\xff"))
assert e.to_sql() == "contains(data, X'FF')"
def test_bytes_in_not(self):
e = ~(col("data") == lit(b"\xff"))
assert e.to_sql() == "NOT (data = X'FF')"
class TestExprStringMethods:
def test_lower(self):
e = col("name").lower()
@@ -385,3 +430,44 @@ class TestColNamingIntegration:
)
assert "upper_name" in result.schema.names
assert sorted(result["upper_name"].to_pylist()) == ["ALICE", "BOB", "CHARLIE"]
# ── bytes / binary column integration tests ───────────────────────────────────
@pytest.fixture
def binary_table(tmp_path):
db = lancedb.connect(str(tmp_path))
data = pa.table(
{
"id": [1, 2, 3],
"payload": pa.array(
[b"\x01\x02", b"\xca\xfe", b"\xff\x00"],
type=pa.binary(),
),
}
)
return db.create_table("binary_test", data)
class TestExprBytesIntegration:
def test_binary_equality_filter(self, binary_table):
result = (
binary_table.search().where(col("payload") == lit(b"\xca\xfe")).to_arrow()
)
assert result.num_rows == 1
assert result["id"][0].as_py() == 2
def test_binary_ne_filter(self, binary_table):
result = (
binary_table.search().where(col("payload") != lit(b"\x01\x02")).to_arrow()
)
assert result.num_rows == 2
def test_binary_compound_filter(self, binary_table):
result = (
binary_table.search()
.where((col("payload") == lit(b"\x01\x02")) | (col("id") == lit(3)))
.to_arrow()
)
assert result.num_rows == 2

View File

@@ -1,6 +1,6 @@
[package]
name = "lancedb"
version = "0.29.0"
version = "0.28.0-beta.11"
edition.workspace = true
description = "LanceDB: A serverless, low-latency vector database for AI applications"
license.workspace = true
@@ -108,7 +108,12 @@ test-log = "0.2"
[features]
default = []
aws = ["lance/aws", "lance-io/aws", "lance-namespace-impls/dir-aws"]
aws = [
"lance/aws",
"lance-io/aws",
"lance-namespace-impls/dir-aws",
"object_store/aws",
]
oss = ["lance/oss", "lance-io/oss", "lance-namespace-impls/dir-oss"]
gcs = ["lance/gcp", "lance-io/gcp", "lance-namespace-impls/dir-gcp"]
azure = [

View File

@@ -722,7 +722,7 @@ impl ListingDatabase {
let commit_handler = commit_handler_from_url(&uri, &Some(object_store_params)).await?;
for name in names {
let dir_name = format!("{}.{}", name, LANCE_EXTENSION);
let full_path = self.base_path.child(dir_name.clone());
let full_path = self.base_path.clone().join(dir_name.clone());
commit_handler.delete(&full_path).await?;

View File

@@ -11,6 +11,7 @@ use lance::io::commit::namespace_manifest::LanceNamespaceExternalManifestStore;
use lance_io::object_store::{ObjectStoreParams, StorageOptionsAccessor};
use lance_namespace::{
LanceNamespace,
error::{ErrorCode, NamespaceError},
models::{
CreateNamespaceRequest, CreateNamespaceResponse, DeclareTableRequest,
DescribeNamespaceRequest, DescribeNamespaceResponse, DescribeTableRequest,
@@ -29,7 +30,7 @@ use crate::database::listing::{
OPT_NEW_TABLE_V2_MANIFEST_PATHS,
};
use crate::error::{Error, Result};
use crate::table::NativeTable;
use crate::table::{NativeTable, map_namespace_lance_error};
use lance::dataset::WriteMode;
use super::{
@@ -37,6 +38,19 @@ use super::{
Database, OpenTableRequest, TableNamesRequest,
};
/// Returns true if the given `lance::Error` (anywhere in its source chain) is a
/// `NamespaceError::TableAlreadyExists`.
fn is_table_already_exists_namespace_error(err: &lance::Error) -> bool {
let mut current: Option<&(dyn std::error::Error + 'static)> = Some(err);
while let Some(e) = current {
if let Some(ns_err) = e.downcast_ref::<NamespaceError>() {
return ns_err.code() == ErrorCode::TableAlreadyExists;
}
current = e.source();
}
false
}
/// A database implementation that uses lance-namespace for table management
pub struct LanceNamespaceDatabase {
namespace: Arc<dyn LanceNamespace>,
@@ -356,13 +370,15 @@ impl Database for LanceNamespaceDatabase {
(loc, opts, response.managed_versioning)
}
Err(e)
if matches!(request.mode, CreateTableMode::Create) && {
let err_str = e.to_string();
err_str.contains("already exists")
|| err_str.contains("TableAlreadyExists")
|| err_str.contains("table already exists")
} =>
if matches!(request.mode, CreateTableMode::Create)
&& is_table_already_exists_namespace_error(&e) =>
{
// A declare conflict can either mean (a) the table was previously
// *declared* but never written (in which case we should proceed and
// create it), or (b) the table is fully realized (in which case the
// user is creating something that already exists and we should
// surface TableAlreadyExists). Disambiguate by describing the table
// and checking whether it has both a version and a schema.
let response = self
.namespace
.describe_table(DescribeTableRequest {
@@ -370,11 +386,8 @@ impl Database for LanceNamespaceDatabase {
..Default::default()
})
.await
.map_err(|describe_err| Error::Runtime {
message: format!(
"Failed to describe existing declared table after declare conflict: {}",
describe_err
),
.map_err(|describe_err| {
map_namespace_lance_error(describe_err, &request.name)
})?;
if response.version.is_some() && response.schema.is_some() {
@@ -394,9 +407,7 @@ impl Database for LanceNamespaceDatabase {
(loc, opts, response.managed_versioning)
}
Err(e) => {
return Err(Error::Runtime {
message: format!("Failed to declare table: {}", e),
});
return Err(map_namespace_lance_error(e, &request.name));
}
}
}
@@ -1086,8 +1097,120 @@ mod tests {
.execute()
.await;
// Verify: Should return an error
assert!(result.is_err());
// Verify: Should return TableNotFound — not a generic Runtime/internal error
// (regression test for ENT-1235: open_table on missing table previously surfaced as
// a generic 500/Runtime error rather than TableNotFound).
match result {
Err(Error::TableNotFound { name, .. }) => {
assert_eq!(name, "non_existent_table");
}
Err(other) => panic!("Expected TableNotFound, got: {:?}", other),
Ok(_) => panic!("Expected open_table to fail, but it succeeded"),
}
}
#[tokio::test]
async fn test_namespace_open_table_not_found_at_root() {
// Same as above, but at the root namespace (no parent namespace creation).
// Covers the common code path used by `db.open_table("foo")` without a namespace.
let tmp_dir = tempdir().unwrap();
let root_path = tmp_dir.path().to_str().unwrap().to_string();
let mut properties = HashMap::new();
properties.insert("root".to_string(), root_path);
let conn = connect_namespace("dir", properties)
.execute()
.await
.expect("Failed to connect to namespace");
let result = conn.open_table("missing_at_root").execute().await;
match result {
Err(Error::TableNotFound { name, .. }) => {
assert_eq!(name, "missing_at_root");
}
Err(other) => panic!("Expected TableNotFound, got: {:?}", other),
Ok(_) => panic!("Expected open_table to fail, but it succeeded"),
}
}
#[tokio::test]
async fn test_namespace_create_table_already_exists() {
// Regression test for ENT-1235: create_table on an existing table (in default
// Create mode) should return TableAlreadyExists, not a generic Runtime/500 error.
let tmp_dir = tempdir().unwrap();
let root_path = tmp_dir.path().to_str().unwrap().to_string();
let mut properties = HashMap::new();
properties.insert("root".to_string(), root_path);
let conn = connect_namespace("dir", properties)
.execute()
.await
.expect("Failed to connect to namespace");
conn.create_namespace(CreateNamespaceRequest {
id: Some(vec!["test_ns".into()]),
..Default::default()
})
.await
.expect("Failed to create namespace");
// Create the table once.
conn.create_table("dup_table", create_test_data())
.namespace(vec!["test_ns".into()])
.execute()
.await
.expect("Failed to create table the first time");
// Try to create it again with the default Create mode.
let result = conn
.create_table("dup_table", create_test_data())
.namespace(vec!["test_ns".into()])
.execute()
.await;
match result {
Err(Error::TableAlreadyExists { name }) => {
assert_eq!(name, "dup_table");
}
Err(other) => panic!("Expected TableAlreadyExists, got: {:?}", other),
Ok(_) => panic!("Expected create_table to fail, but it succeeded"),
}
}
#[tokio::test]
async fn test_namespace_create_table_already_exists_at_root() {
// Same as above, but at the root namespace.
let tmp_dir = tempdir().unwrap();
let root_path = tmp_dir.path().to_str().unwrap().to_string();
let mut properties = HashMap::new();
properties.insert("root".to_string(), root_path);
let conn = connect_namespace("dir", properties)
.execute()
.await
.expect("Failed to connect to namespace");
conn.create_table("dup_root", create_test_data())
.execute()
.await
.expect("Failed to create table the first time");
let result = conn
.create_table("dup_root", create_test_data())
.execute()
.await;
match result {
Err(Error::TableAlreadyExists { name }) => {
assert_eq!(name, "dup_root");
}
Err(other) => panic!("Expected TableAlreadyExists, got: {:?}", other),
Ok(_) => panic!("Expected create_table to fail, but it succeeded"),
}
}
#[tokio::test]

View File

@@ -138,4 +138,69 @@ mod tests {
let sql = expr_to_sql_string(&expr).unwrap();
assert!(sql.contains("price"));
}
#[test]
fn test_binary_literal() {
use datafusion_common::ScalarValue;
let expr = lit(ScalarValue::Binary(Some(vec![0xde, 0xad, 0xbe, 0xef])));
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "X'DEADBEEF'");
}
#[test]
fn test_binary_literal_in_filter() {
use datafusion_common::ScalarValue;
let expr = col("data").eq(lit(ScalarValue::Binary(Some(vec![0xca, 0xfe]))));
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "(data = X'CAFE')");
}
#[test]
fn test_binary_literal_compound() {
use datafusion_common::ScalarValue;
let bin_expr = col("data").eq(lit(ScalarValue::Binary(Some(vec![0x01]))));
let int_expr = col("id").gt(lit(5i64));
let combined = bin_expr.and(int_expr);
let sql = expr_to_sql_string(&combined).unwrap();
assert_eq!(sql, "((data = X'01') AND (id > 5))");
}
#[test]
fn test_null_binary_literal() {
use datafusion_common::ScalarValue;
let expr = lit(ScalarValue::Binary(None));
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "NULL");
}
#[test]
fn test_binary_literal_in_function_call() {
use datafusion_common::ScalarValue;
// Binary literals inside scalar function arguments must also be
// serialized correctly (regression test for placeholder rewrite path).
let expr = contains(col("data"), lit(ScalarValue::Binary(Some(vec![0xff]))));
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "contains(data, X'FF')");
}
#[test]
fn test_binary_literal_in_negation() {
use datafusion_common::ScalarValue;
use std::ops::Not;
let expr = col("data")
.eq(lit(ScalarValue::Binary(Some(vec![0xab, 0xcd]))))
.not();
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "NOT (data = X'ABCD')");
}
#[test]
fn test_multiple_binary_literals() {
use datafusion_common::ScalarValue;
let lhs = col("a").eq(lit(ScalarValue::Binary(Some(vec![0x01]))));
let rhs = col("b").eq(lit(ScalarValue::Binary(Some(vec![0x02, 0x03]))));
let expr = lhs.and(rhs);
let sql = expr_to_sql_string(&expr).unwrap();
assert_eq!(sql, "((a = X'01') AND (b = X'0203'))");
}
}

View File

@@ -1,6 +1,8 @@
// SPDX-License-Identifier: Apache-2.0
// SPDX-FileCopyrightText: Copyright The LanceDB Authors
use datafusion_common::ScalarValue;
use datafusion_common::tree_node::{Transformed, TreeNode, TreeNodeRecursion};
use datafusion_expr::Expr;
use datafusion_sql::unparser::{self, dialect::Dialect};
@@ -28,7 +30,36 @@ impl Dialect for LanceSqlDialect {
}
}
pub fn expr_to_sql_string(expr: &Expr) -> crate::Result<String> {
/// Prefix for placeholder strings inserted in place of binary literals. Chosen
/// to be extremely unlikely to occur in user data.
const BINARY_PLACEHOLDER_PREFIX: &str = "__lancedb_binary_placeholder_";
fn bytes_to_hex_sql(bytes: &[u8]) -> String {
let hex: String = bytes.iter().map(|b| format!("{b:02X}")).collect();
format!("X'{hex}'")
}
/// Returns true if *expr* contains a `Binary` or `LargeBinary` scalar literal
/// anywhere in its subtree. DataFusion's SQL unparser cannot serialize those
/// variants, so we route such expressions through a placeholder-substitution
/// path that emits SQL `X'...'` byte-string literals.
fn has_binary_literal(expr: &Expr) -> bool {
let mut found = false;
let _ = expr.apply(&mut |e: &Expr| {
if matches!(
e,
Expr::Literal(ScalarValue::Binary(_) | ScalarValue::LargeBinary(_), _)
) {
found = true;
Ok(TreeNodeRecursion::Stop)
} else {
Ok(TreeNodeRecursion::Continue)
}
});
found
}
fn run_unparser(expr: &Expr) -> crate::Result<String> {
let ast = unparser::Unparser::new(&LanceSqlDialect)
.expr_to_sql(expr)
.map_err(|e| crate::Error::InvalidInput {
@@ -36,3 +67,49 @@ pub fn expr_to_sql_string(expr: &Expr) -> crate::Result<String> {
})?;
Ok(ast.to_string())
}
pub fn expr_to_sql_string(expr: &Expr) -> crate::Result<String> {
// Fast path: no binary literals — DataFusion's unparser handles everything.
if !has_binary_literal(expr) {
return run_unparser(expr);
}
// Slow path: DataFusion's unparser cannot serialize `Binary`/`LargeBinary`
// scalars, so we rewrite each one to a unique string-literal placeholder,
// let the unparser do the rest of the work, then substitute the SQL
// `X'...'` byte-string literal back in. This keeps the operator/function
// serialization logic centralized in DataFusion and works for every
// expression node type the unparser supports.
let mut bindings: Vec<Vec<u8>> = Vec::new();
let rewritten = expr
.clone()
.transform(|e: Expr| match e {
Expr::Literal(ScalarValue::Binary(Some(bytes)), m)
| Expr::Literal(ScalarValue::LargeBinary(Some(bytes)), m) => {
let placeholder = format!("{}{}__", BINARY_PLACEHOLDER_PREFIX, bindings.len());
bindings.push(bytes);
Ok(Transformed::yes(Expr::Literal(
ScalarValue::Utf8(Some(placeholder)),
m,
)))
}
Expr::Literal(ScalarValue::Binary(None), m)
| Expr::Literal(ScalarValue::LargeBinary(None), m) => {
Ok(Transformed::yes(Expr::Literal(ScalarValue::Null, m)))
}
other => Ok(Transformed::no(other)),
})
.map_err(|e| crate::Error::InvalidInput {
message: format!("failed to rewrite expression: {}", e),
})?
.data;
let mut sql = run_unparser(&rewritten)?;
for (i, bytes) in bindings.iter().enumerate() {
// The unparser quotes string literals with single quotes, so the
// placeholder appears as `'__lancedb_binary_placeholder_<i>__'`.
let quoted = format!("'{}{}__'", BINARY_PLACEHOLDER_PREFIX, i);
sql = sql.replace(&quoted, &bytes_to_hex_sql(bytes));
}
Ok(sql)
}

View File

@@ -5,11 +5,12 @@
use std::{fmt::Formatter, sync::Arc};
use futures::{TryFutureExt, stream::BoxStream};
use futures::{StreamExt, TryFutureExt, stream::BoxStream};
use lance::io::WrappingObjectStore;
use object_store::{
Error, GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta, ObjectStore,
PutMultipartOptions, PutOptions, PutPayload, PutResult, Result, UploadPart, path::Path,
CopyOptions, Error, GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta,
ObjectStore, ObjectStoreExt, PutMultipartOptions, PutOptions, PutPayload, PutResult, Result,
UploadPart, path::Path,
};
use async_trait::async_trait;
@@ -93,20 +94,6 @@ impl ObjectStore for MirroringObjectStore {
self.primary.get_opts(location, options).await
}
async fn head(&self, location: &Path) -> Result<ObjectMeta> {
self.primary.head(location).await
}
async fn delete(&self, location: &Path) -> Result<()> {
if !location.primary_only() {
match self.secondary.delete(location).await {
Err(Error::NotFound { .. }) | Ok(_) => {}
Err(e) => return Err(e),
}
}
self.primary.delete(location).await
}
fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, Result<ObjectMeta>> {
self.primary.list(prefix)
}
@@ -115,21 +102,40 @@ impl ObjectStore for MirroringObjectStore {
self.primary.list_with_delimiter(prefix).await
}
async fn copy(&self, from: &Path, to: &Path) -> Result<()> {
if to.primary_only() {
self.primary.copy(from, to).await
} else {
self.secondary.copy(from, to).await?;
self.primary.copy(from, to).await?;
Ok(())
}
fn delete_stream(
&self,
locations: BoxStream<'static, Result<Path>>,
) -> BoxStream<'static, Result<Path>> {
let primary = self.primary.clone();
let secondary = self.secondary.clone();
locations
.map(move |location| {
let primary = primary.clone();
let secondary = secondary.clone();
async move {
let location = location?;
if !location.primary_only() {
match secondary.delete(&location).await {
Err(Error::NotFound { .. }) | Ok(_) => {}
Err(e) => return Err(e),
}
}
primary.delete(&location).await?;
Ok(location)
}
})
.buffered(10)
.boxed()
}
async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
if !to.primary_only() {
self.secondary.copy(from, to).await?;
async fn copy_opts(&self, from: &Path, to: &Path, options: CopyOptions) -> Result<()> {
if to.primary_only() {
self.primary.copy_opts(from, to, options).await
} else {
self.secondary.copy_opts(from, to, options.clone()).await?;
self.primary.copy_opts(from, to, options).await?;
Ok(())
}
self.primary.copy_if_not_exists(from, to).await
}
}

View File

@@ -10,9 +10,9 @@ use bytes::Bytes;
use futures::stream::BoxStream;
use lance::io::WrappingObjectStore;
use object_store::{
GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta, ObjectStore,
PutMultipartOptions, PutOptions, PutPayload, PutResult, Result as OSResult, UploadPart,
path::Path,
CopyOptions, GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta, ObjectStore,
PutMultipartOptions, PutOptions, PutPayload, PutResult, RenameOptions, Result as OSResult,
UploadPart, path::Path,
};
#[derive(Debug, Default)]
@@ -81,11 +81,6 @@ impl IoTrackingStore {
#[async_trait::async_trait]
#[deny(clippy::missing_trait_methods)]
impl ObjectStore for IoTrackingStore {
async fn put(&self, location: &Path, bytes: PutPayload) -> OSResult<PutResult> {
self.record_write(bytes.content_length() as u64);
self.target.put(location, bytes).await
}
async fn put_opts(
&self,
location: &Path,
@@ -96,14 +91,6 @@ impl ObjectStore for IoTrackingStore {
self.target.put_opts(location, bytes, opts).await
}
async fn put_multipart(&self, location: &Path) -> OSResult<Box<dyn MultipartUpload>> {
let target = self.target.put_multipart(location).await?;
Ok(Box::new(IoTrackingMultipartUpload {
target,
stats: self.stats.clone(),
}))
}
async fn put_multipart_opts(
&self,
location: &Path,
@@ -116,15 +103,6 @@ impl ObjectStore for IoTrackingStore {
}))
}
async fn get(&self, location: &Path) -> OSResult<GetResult> {
let result = self.target.get(location).await;
if let Ok(result) = &result {
let num_bytes = result.range.end - result.range.start;
self.record_read(num_bytes);
}
result
}
async fn get_opts(&self, location: &Path, options: GetOptions) -> OSResult<GetResult> {
let result = self.target.get_opts(location, options).await;
if let Ok(result) = &result {
@@ -134,14 +112,6 @@ impl ObjectStore for IoTrackingStore {
result
}
async fn get_range(&self, location: &Path, range: std::ops::Range<u64>) -> OSResult<Bytes> {
let result = self.target.get_range(location, range).await;
if let Ok(result) = &result {
self.record_read(result.len() as u64);
}
result
}
async fn get_ranges(
&self,
location: &Path,
@@ -154,20 +124,11 @@ impl ObjectStore for IoTrackingStore {
result
}
async fn head(&self, location: &Path) -> OSResult<ObjectMeta> {
self.record_read(0);
self.target.head(location).await
}
async fn delete(&self, location: &Path) -> OSResult<()> {
fn delete_stream(
&self,
locations: BoxStream<'static, OSResult<Path>>,
) -> BoxStream<'static, OSResult<Path>> {
self.record_write(0);
self.target.delete(location).await
}
fn delete_stream<'a>(
&'a self,
locations: BoxStream<'a, OSResult<Path>>,
) -> BoxStream<'a, OSResult<Path>> {
self.target.delete_stream(locations)
}
@@ -190,24 +151,14 @@ impl ObjectStore for IoTrackingStore {
self.target.list_with_delimiter(prefix).await
}
async fn copy(&self, from: &Path, to: &Path) -> OSResult<()> {
async fn copy_opts(&self, from: &Path, to: &Path, options: CopyOptions) -> OSResult<()> {
self.record_write(0);
self.target.copy(from, to).await
self.target.copy_opts(from, to, options).await
}
async fn rename(&self, from: &Path, to: &Path) -> OSResult<()> {
async fn rename_opts(&self, from: &Path, to: &Path, options: RenameOptions) -> OSResult<()> {
self.record_write(0);
self.target.rename(from, to).await
}
async fn rename_if_not_exists(&self, from: &Path, to: &Path) -> OSResult<()> {
self.record_write(0);
self.target.rename_if_not_exists(from, to).await
}
async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> OSResult<()> {
self.record_write(0);
self.target.copy_if_not_exists(from, to).await
self.target.rename_opts(from, to, options).await
}
}

View File

@@ -36,6 +36,7 @@ pub use query::AnyQuery;
use lance::io::commit::namespace_manifest::LanceNamespaceExternalManifestStore;
use lance_namespace::LanceNamespace;
use lance_namespace::error::NamespaceError;
use lance_namespace::models::DescribeTableRequest;
use lance_table::format::Manifest;
use lance_table::io::commit::CommitHandler;
@@ -94,6 +95,53 @@ pub use schema_evolution::{AddColumnsResult, AlterColumnsResult, DropColumnsResu
use serde_with::skip_serializing_none;
pub use update::{UpdateBuilder, UpdateResult};
/// Walk a boxed error chain to find the innermost `NamespaceError`.
///
/// Callers like `DatasetBuilder::from_namespace` re-wrap their inner namespace error
/// inside a fresh `lance::Error::Namespace`, so a single downcast at the top level
/// won't find it. This walks `.source()` to unwrap arbitrarily nested layers.
fn find_namespace_error<'a>(
err: &'a (dyn std::error::Error + 'static),
) -> Option<&'a NamespaceError> {
let mut current: Option<&(dyn std::error::Error + 'static)> = Some(err);
while let Some(e) = current {
if let Some(ns_err) = e.downcast_ref::<NamespaceError>() {
return Some(ns_err);
}
current = e.source();
}
None
}
/// Map a `lance::Error` coming from a `lance-namespace` call into a `lancedb::Error`,
/// preserving the fine-grained namespace error code (e.g. `TableNotFound`,
/// `TableAlreadyExists`). Errors that aren't recognized namespace error variants fall
/// through to a generic runtime error rather than `TableNotFound`/`TableAlreadyExists`.
pub(crate) fn map_namespace_lance_error(err: lance::Error, table_name: &str) -> Error {
if let Some(code) = find_namespace_error(&err).map(NamespaceError::code) {
match code {
lance_namespace::error::ErrorCode::TableNotFound => {
return Error::TableNotFound {
name: table_name.to_string(),
source: Box::new(err),
};
}
lance_namespace::error::ErrorCode::TableAlreadyExists => {
return Error::TableAlreadyExists {
name: table_name.to_string(),
};
}
_ => {}
}
}
match err {
lance::Error::Namespace { source, .. } => Error::Runtime {
message: format!("Namespace error: {}", source),
},
other => other.into(),
}
}
/// Defines the type of column
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum ColumnKind {
@@ -1494,12 +1542,7 @@ impl NativeTable {
// and storage options from the namespace
let builder = DatasetBuilder::from_namespace(namespace_client.clone(), table_id)
.await
.map_err(|e| match e {
lance::Error::Namespace { source, .. } => Error::Runtime {
message: format!("Failed to get table info from namespace: {:?}", source),
},
e => e.into(),
})?;
.map_err(|e| map_namespace_lance_error(e, name))?;
let dataset = builder
.with_read_params(params)