lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 19:10:41 +00:00

Author	SHA1	Message	Date
Will Jones	d889321b5e	fix!: combine repeated where filters with AND instead of replacing (#3585 ) BREAKING CHANGE: When passing multiple where clauses to a query, they now stack instead of replacing the previous filter. Previously, calling `where`/`only_if` more than once on a query silently replaced the previous filter, so only the last filter was applied. This was surprising and could return rows that an earlier filter should have excluded. This implements the alternative suggested in https://github.com/lancedb/lancedb/pull/3514#issuecomment-4664901580: instead of rejecting a second filter, repeated filters are combined with a logical AND (`(previous) AND (new)`). The combination happens in the Rust core (`QueryBase::only_if` and `only_if_expr`), so it applies to all SDKs at once (Rust, Python async, and TypeScript). The Python sync query builder keeps its own filter state, so it combines filters in the binding layer as well. SQL string and expression filters are combined within their own representation. When the two representations are mixed, the expression is lowered to SQL (via `expr_to_sql_string`) and the filters are combined as SQL strings, so chaining `where` works regardless of which form each filter takes. Fixes #2649 ## Tests - Rust: `cargo test --features remote -p lancedb --lib query` - Python: `uv run --extra tests pytest python/tests/test_query.py` - TypeScript: `pnpm test __test__/query.test.ts` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-07-01 10:11:58 -07:00
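The stacking rule in the commit above can be illustrated with a small standalone sketch. This is not LanceDB's implementation (the commit places the real logic in the Rust core's `QueryBase::only_if`); `combine_filters` is a hypothetical helper that only shows the `(previous) AND (new)` shape for SQL-string filters.

```python
from typing import Optional


def combine_filters(previous: Optional[str], new: str) -> str:
    """Combine a new SQL filter with an optional earlier one using AND.

    Hypothetical helper for illustration; not part of the LanceDB API.
    """
    if previous is None:
        # First filter on the query: nothing to combine with.
        return new
    # Parenthesize both sides so operator precedence inside either
    # filter cannot change the meaning of the combined expression.
    return f"({previous}) AND ({new})"


# Simulate calling `where` twice on the same query.
current = None
for clause in ["price > 10", "category = 'book'"]:
    current = combine_filters(current, clause)

print(current)
```

Under the old behavior described in the commit, the second call would have left only `category = 'book'` in effect.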
Brendan Clement	f76b075d13	feat: add table branch support to remote tables and Python/TS bindings (#3540 ) ### Description Adding branch support for RemoteTable by threading a branch selector onto every operation the data plane accepts it on. Exposes the currentBranch to nodejs and python through the bindings. Matching the server handlers, the branch rides as: - a `?branch=` query parameter for Arrow-body and query-only ops (insert, merge_insert, multipart_*, version/list, drop_index) - a `branch` field in the JSON body for everything else (count_rows, query, update, delete, create_index, column ops, index list/stats, stats, restore, describe, tags create/update) A main-branch handle (`branch == None`) produces byte-identical requests to before: no `branch` field and no `?branch=` - Handle-per-branch: `create_branch` / `checkout_branch` return a new handle with fresh caches and reset version/freshness state, mirroring `NativeTable`. - `create_branch` maps 409 to already-exists, 400 to invalid, and 404 to not-found with source context, and sends without retry so the 409 stays observable. - `Ref` translation covers version, version-number (relative to the handle's branch), and tag (resolved via the tags endpoint); `"main"` and empty normalize to the main branch. - Python branch handles persist their branch (and pinned version) across pickle/fork, so a forked or pickled handle reopens on its branch rather than silently reverting to main. ### Tests - Rust mock tests per op category (query-param and body mechanisms, branch CRUD, error paths, backward-compat). - Python sync branch CRUD, `open_table(branch=)`, and a pickle round-trip regression test.	2026-06-15 18:07:40 -04:00
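The routing rule described in the commit above (query parameter for some operations, JSON body field for the rest, nothing at all for main) can be sketched as follows. This is an illustration only: `attach_branch`, `normalize_branch`, and the operation-name strings are hypothetical and do not reflect the actual `RemoteTable` code.

```python
from typing import Any, Dict, Optional, Tuple

# Operations the commit lists as carrying the branch in the query string.
# `multipart_*` is handled by prefix below.
QUERY_PARAM_OPS = {"insert", "merge_insert", "version/list", "drop_index"}


def normalize_branch(branch: Optional[str]) -> Optional[str]:
    """Treat "main" and the empty string as the main branch (None)."""
    if branch is None or branch == "" or branch == "main":
        return None
    return branch


def attach_branch(
    op: str,
    branch: Optional[str],
    params: Dict[str, Any],
    body: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return copies of (params, body) with the branch selector attached."""
    branch = normalize_branch(branch)
    params, body = dict(params), dict(body)
    if branch is None:
        # Main-branch handle: the request stays identical to before.
        return params, body
    if op in QUERY_PARAM_OPS or op.startswith("multipart_"):
        params["branch"] = branch
    else:
        body["branch"] = branch
    return params, body


print(attach_branch("insert", "exp", {}, {}))
print(attach_branch("count_rows", "exp", {}, {"filter": "id > 1"}))
print(attach_branch("query", "main", {}, {"k": 10}))
```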
Will Jones	f8caef3aca	feat(bindings): expose new IndexConfig fields in Python and Node.js (#3534 ) ## Summary Surfaces the rich per-index metadata added in #3497 to the Python and Node.js language bindings. Closes #3495. New optional fields exposed on `IndexConfig` in both bindings: - `index_uuid` / `indexUuid` — UUID of the first index segment - `type_url` / `typeUrl` — protobuf type URL for the index - `created_at` / `createdAt` — creation timestamp (milliseconds since Unix epoch) - `num_indexed_rows` / `numIndexedRows` — rows covered by the index - `num_unindexed_rows` / `numUnindexedRows` — rows not yet indexed - `size_bytes` / `sizeBytes` — total index file size in bytes - `num_segments` / `numSegments` — number of index segments - `index_version` / `indexVersion` — on-disk format version - `index_details` / `indexDetails` — type-specific JSON details string All fields are `None`/`undefined` for remote tables (which don't yet surface this metadata through the server response). ## Changes - `python/src/index.rs`: extend `IndexConfig` pyclass; update `From` impl; update `__getitem__` - `python/python/lancedb/_lancedb.pyi`: add type hints for new fields - `python/python/tests/test_table.py`: new `test_index_config_fields` test - `nodejs/src/table.rs`: extend `IndexConfig` napi struct; update `From` impl - `nodejs/__test__/table.test.ts`: new test; update existing `toEqual` assertions to `expect.objectContaining` to accommodate new fields ## Test plan - [x] Python: `uv run --extra tests pytest python/tests/test_table.py::test_index_config_fields` - [x] Node.js: `pnpm test __test__/table.test.ts` 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-06-11 13:37:39 -07:00
Jack Ye	8373318e89	feat: support FM-Index scalar index for substring search (#3532 ) Adds an FM-Index — a scalar index over string and binary columns that accelerates substring search (`contains(col, 'needle')`), distinct from the tokenized `FTS` index — across the Rust core and the Python and TypeScript bindings. ## Rust - `Index::Fm(FmIndexBuilder)` and `IndexType::Fm`. - `make_index_params` maps `Index::Fm` to Lance's `ScalarIndexParams::for_builtin(BuiltinIndexType::Fm)`. - `supported_fm_data_type` validates `Utf8`/`LargeUtf8`/`Binary`/`LargeBinary` columns. - `list_indices` round-trips the type (`"Fm"` → `IndexType::Fm`); the remote wire type is `"FM"`. ## Python Adds `lancedb.index.Fm`, accepted by `create_index`: ```python from lancedb.index import Fm await tbl.create_index("text", config=Fm()) ``` ## TypeScript Adds the `Index.fm()` factory: ```ts await tbl.createIndex("text", { config: Index.fm() }); ```	2026-06-10 12:28:20 -07:00
Xuanwo	566b67a634	fix: support LargeList label list indexes (#3529 ) ## Summary This PR extends nested-field regression coverage across Rust local/remote, Python sync/async, and Node so canonical escaped paths stay consistent across scalar, vector, and FTS index lifecycle behavior. It also aligns LanceDB's LabelList type gate with Lance by accepting `LargeList<primitive>` columns while keeping `List<Struct<...>>` unsupported until Lance defines stable membership semantics for struct labels. Part of #3406.	2026-06-10 23:53:56 +08:00
Brendan Clement	d9018067b3	feat: support checking out a version on a branch (#3504 ) ### Description Stacked on #3490. Adds an optional version to branch checkout across the Rust core and the Python and TypeScript SDKs, so you can open a specific version on a branch ("version V of branch B"), not just the branch's latest version Rust ```rust // Open version 3 of branch "exp" (a read-only view): check out from an // existing table, or open it directly from the connection. let exp_v3 = table.checkout_branch("exp", Some(3)).await?; let exp_v3 = db.open_table("items").branch("exp").version(3).execute().await?; // checkout_latest re-attaches to the branch's writable HEAD. exp_v3.checkout_latest().await?; // With no branch, a version opens main at that version. let main_v3 = db.open_table("items").version(3).execute().await?; ``` Python ```python # Open version 3 of branch "exp" (a read-only view): check out from an # existing table, or open it directly from the connection. branch_v3 = await table.branches.checkout("exp", version=3) branch_v3 = await db.open_table("items", branch="exp", version=3) # checkout_latest re-attaches to the branch's writable HEAD. await branch_v3.checkout_latest() # With no branch, a version opens main at that version. main_v3 = await db.open_table("items", version=3) ``` TypeScript ```typescript // Open version 3 of branch "exp" (a read-only view): check out from an // existing table, or open it directly from the connection. const branchV3 = await (await table.branches()).checkout("exp", 3); const opened = await db.openTable("items", undefined, { branch: "exp", version: 3 }); // checkoutLatest re-attaches to the branch's writable HEAD. await branchV3.checkoutLatest(); // With no branch, a version opens main at that version. 
const mainV3 = await db.openTable("items", undefined, { version: 3 }); ``` ### Testing - Added unit tests (Rust, Python sync + async, TypeScript): branch-scoped resolution at a version number shared with `main` and with another branch, read-only enforcement on a pinned handle, `checkout_latest` recovery to the branch's HEAD, fork-point reads, and the nonexistent-version/branch error paths. - Ran smoke tests against the Python and TypeScript SDKs on local machine.	2026-06-08 17:36:38 -07:00
Brendan Clement	53517b3aaa	feat: add table branch support (#3490 ) ### Description Adds first-class support for table branches across the Rust core and the Python and TypeScript SDKs. Rust ```rust use lance::dataset::refs::Ref; // Create a branch from main and write to it — main is untouched. let exp = table.create_branch("exp", Ref::Version(None, None)).await?; exp.add(batches).await?; // Reopen the branch later: check out from a table, or open it directly. let exp = table.checkout_branch("exp").await?; let exp = db.open_table("items").branch("exp").execute().await?; let branches = table.list_branches().await?; table.delete_branch("exp").await?; ``` Python ```python # Create a branch from main and write to it branch = await table.branches.create("exp", from_ref="main") await branch.add(data) # Reopen the branch later: check out from a table, or open it directly. branch = await table.branches.checkout("exp") branch = await db.open_table("items", branch="exp") await table.branches.list() await table.branches.delete("exp") ``` TypeScript ```typescript const branches = await table.branches(); // Create a branch from main and write to it const branch = await branches.create("exp"); await branch.add(data); // Reopen the branch later: check out from a table, or open it directly. const checkedOut = await branches.checkout("exp"); const opened = await db.openTable("items", undefined, { branch: "exp" }); await branches.list(); await branches.delete("exp"); ``` ### Testing - Added unit tests - ran smoke tests against python and typescript sdks on local machine ### Next steps - Add RemoteTable support - Add Branch Comparison support - Merge Branching support	2026-06-08 16:26:46 -07:00
Will Jones	09b1bbc12a	refactor!: drop unused loss field from IndexStatistics (#3496 ) BREAKING CHANGE: direct Rust users lose the `IndexStatistics::loss` field. Python and Node.js consumers are unaffected in practice for remote tables (the value was always `None`/absent), but the attribute is gone for local tables too. `IndexStatistics::loss` was local-only — LanceDB Cloud never returned it, so `RemoteTable::index_stats` always set `loss: None`. It's vestigial; this removes it. - Remove `loss` from `IndexStatistics` and the internal `IndexMetadata` in `rust/lancedb/src/index.rs`, plus the summing logic in `NativeTable::index_stats`. - Drop `loss` from the Python and Node.js bindings (and their tests/docs). Fixes #3493 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-05 07:52:40 -07:00
Brendan Clement	d065be0474	feat: add update_field_metadata to edit per-field metadata (#3482 ) ### Summary Adds update_field_metadata to the client SDK (Rust core, Python, and TypeScript) so clients can edit per-field (column) Arrow metadata (schema.fields[].metadata). ### Testing - added unit tests - ran E2E against a local server on both local and remote tables (set → merge → delete), across Python sync/async and TypeScript ### Next steps - deprecate replace_field_metadata in the Python lancedb package in favor of this (TypeScript never had a replace_field_metadata method). This matches Lance's API direction (Lance already deprecated replace_field_metadata in favor of update_field_metadata).	2026-06-02 07:00:00 -07:00
Heng Ge	048f52c2aa	feat(table): route merge_insert through the MemWAL LSM write path (#3354 ) ## Summary When an `LsmWriteSpec` is installed on a table (#3396), `merge_insert` upsert calls are dispatched through Lance's MemWAL `ShardWriter` (LSM-style append) instead of the standard merge path. - `use_lsm_write` — a `merge_insert` builder option, default `true`; set it `false` to use the standard path for a call even when a spec is set. - `assume_pre_sharded` — a `merge_insert` builder option, default `false`; skips the per-row shard check and routes by the first row only. - `close_lsm_writers` — drains and closes the table's cached MemWAL shard writers. - The `merge_insert` `on` columns default to, and are validated against, the table's unenforced primary key. - Shard writers are cached alongside the dataset (in `DatasetConsistencyWrapper`) and reused for the session. - `MergeResult` gains `num_rows` — on the LSM path the insert/update breakdown is unknown until compaction, so only the total is reported. Routing covers all three sharding strategies — bucket (murmur3, Iceberg-compatible), identity, and unsharded. Each `merge_insert` call targets a single shard; the whole input is collected and validated before a single atomic `ShardWriter::put`, so a validation failure leaves the MemWAL untouched. Bindings: Python (`merge_insert(...).use_lsm_write(...)` / `.assume_pre_sharded(...)`, `Table.close_lsm_writers`) and TypeScript (`mergeInsert(...).useLsmWrite(...)` / `.assumePreSharded(...)`, `Table.closeLsmWriters`). ## Context Reconstructed from the original #3354 branch onto current `main`: the branch predated the #3394 (unenforced primary key) / #3396 (`LsmWriteSpec`) split and has been rebuilt on that merged foundation. Depends on Lance `v7.0.0-beta.13`. The MemWAL read path (reading un-flushed shard data back into queries) and remote (LanceDB Cloud) LSM support are follow-ups. --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com>	2026-05-29 08:48:11 -07:00
Jack Ye	a7d9f2e99d	fix: remove primary key constraint from MemWAL bucket sharding (#3435 ) ## Summary - Bump lance dependency from `v7.0.0-beta.13` to `v7.0.0-rc.1` - Remove PK constraint from `LsmWriteSpec::Bucket` docs and `Table::set_lsm_write_spec` docs - Remove test assertions that expected rejection when no PK is set or when bucket column != PK Closes https://github.com/lance-format/lance/issues/6917	2026-05-26 17:35:28 -07:00
Xuanwo	d5dc4c0f06	fix: discover nested vector columns by default (#3423 ) LanceDB default vector column discovery only considered top-level fields, so tables with a single nested vector leaf still required users to pass an explicit field path. This updates Rust and Python discovery to recurse into struct fields, return canonical field paths, and preserve actionable errors when no default or multiple defaults exist. The explicit nested path flow for index creation and search remains supported across Rust, Python, and Node, with regression coverage for single nested vector leaves, multiple candidate leaves, and schemas without vector leaves. Closes #3405.	2026-05-21 19:02:41 +08:00
Brendan Clement	4cb9147bbf	feat(nodejs): add renameTable on Connection (#3386 ) Adds `Connection.renameTable` to the Node SDK. Closes #3381.	2026-05-20 09:05:48 -07:00
Brendan Clement	049b0c8f09	feat(nodejs): add progress to Table.add (#3398 ) ### Summary - Add an optional `progress` callback to `Table.add(data, { progress })`. Callback fires once per batch written and once more with `done: true` when the write completes. - Errors thrown from the user's callback are logged with `console.warn` and swallowed ### Testing - npm test - ran smoke test script to verify functionality	2026-05-19 18:35:07 -07:00
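The callback contract described in the commit above (one call per batch, a final call with `done: true`, callback errors logged and swallowed) can be sketched in a language-neutral way. The sketch below uses Python rather than the Node SDK; `write_with_progress` and the `rows_written` field are hypothetical, and only the `done` flag comes from the commit message.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List


def write_with_progress(
    batches: Iterable[List[Any]],
    progress: Callable[[Dict[str, Any]], None],
) -> int:
    """Pretend to write batches, reporting progress after each one."""

    def notify(event: Dict[str, Any]) -> None:
        try:
            progress(event)
        except Exception as err:
            # A failing callback must not abort the write: log and continue.
            logging.warning("progress callback failed: %s", err)

    written = 0
    for batch in batches:
        written += len(batch)  # stand-in for the real write
        notify({"rows_written": written, "done": False})
    # One extra notification once the whole write has completed.
    notify({"rows_written": written, "done": True})
    return written


events: List[Dict[str, Any]] = []
total = write_with_progress([[1, 2], [3]], events.append)
print(total, events)
```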
Heng Ge	0d30b31998	feat: support setting LSM write spec for a table (#3396 ) ## Summary Split out from #3354 Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec` to install and clear the spec that selects Lance's MemWAL LSM-style write path for `merge_insert`. `LsmWriteSpec` offers three sharding strategies, all built on Lance's `InitializeMemWalBuilder`: - `LsmWriteSpec::bucket(column, num_buckets)` — hash-bucket sharding by the single-column unenforced primary key. - `LsmWriteSpec::identity(column)` — identity sharding by the raw value of a scalar column. - `LsmWriteSpec::unsharded()` — a single MemWAL shard. Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL keeps up to date as rows are appended) and `with_writer_config_defaults(...)` (default `ShardWriter` configuration recorded in the MemWAL index, so every writer starts from the same defaults). All variants require the table to have an unenforced primary key. - `set_lsm_write_spec` installs the spec by initializing the MemWAL index; `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to the standard `merge_insert` path. `unset` is idempotent. - Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`, `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` / `"unsharded"`). `RemoteTable` returns `NotSupported`. The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a follow-up — this PR only installs and clears the spec.	2026-05-18 00:11:33 -07:00
Heng Ge	6a431ff0a0	feat: support setting unenforced primary key (#3394 ) ## Summary Adds `Table::set_unenforced_primary_key` — records a single column as the table's unenforced primary key in Lance schema field metadata. "Unenforced" means LanceDB does not check uniqueness on write; the key is metadata that `merge_insert` consumes. - Single-column only; the column must exist and have a supported dtype (Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary). The API accepts an iterable for binding ergonomics but requires exactly one column — compound keys are rejected. - The primary key is immutable: calling this on a table that already has an unenforced primary key is rejected. Concurrent writers racing to set the key fail at commit time rather than silently overriding it. - `RemoteTable` returns `NotSupported`. - Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and TypeScript (`Table.setUnenforcedPrimaryKey`). ## Context Split out from #3354 per review feedback, so the unenforced primary key and the `merge_insert` sharding spec land as separate reviewable PRs. No Lance dependency bump — `main` is already on v7.0.0-beta.10, which includes the field-metadata round-trip fix the API relies on. Enforcing primary-key immutability at the Lance commit layer (so the cross-column concurrent race is also rejected) is a companion Lance change: lance-format/lance#6810.	2026-05-16 23:12:55 -07:00
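The validation rules listed in the commit above can be sketched as a standalone check. This is a simplified illustration, not the actual `Table::set_unenforced_primary_key` code: `validate_primary_key` is hypothetical, the schema is modeled as a plain name-to-dtype mapping, and the order of the checks is an assumption.

```python
from typing import Dict, Iterable, Optional

# Dtypes the commit message lists as supported for the key column.
SUPPORTED_PK_DTYPES = {
    "Int32", "Int64", "Utf8", "LargeUtf8",
    "Binary", "LargeBinary", "FixedSizeBinary",
}


def validate_primary_key(
    schema: Dict[str, str],
    existing_key: Optional[str],
    columns: Iterable[str],
) -> str:
    """Return the single key column, or raise ValueError if a rule fails."""
    cols = list(columns)
    if existing_key is not None:
        # The key is immutable once set.
        raise ValueError(f"primary key already set to {existing_key!r}")
    if len(cols) != 1:
        # An iterable is accepted for ergonomics, but compound keys are not.
        raise ValueError("exactly one primary key column is required")
    name = cols[0]
    if name not in schema:
        raise ValueError(f"column {name!r} does not exist")
    if schema[name] not in SUPPORTED_PK_DTYPES:
        raise ValueError(f"unsupported dtype {schema[name]!r} for {name!r}")
    return name


schema = {"id": "Int64", "score": "Float32", "name": "Utf8"}
print(validate_primary_key(schema, None, ["id"]))
```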
Xin Sun	ab2c5adf5e	feat(nodejs): add order_by method to Query (#3123 )	2026-05-16 22:49:08 -07:00
Neha Prasad	13c6dae9a3	feat(nodejs): add Connection.renameTable with namespace support (#3365 ) ### Summary - Expose Connection.renameTable in the Node.js bindings and align it with existing namespace-aware connection APIs. ### Changes - Add napi-rs rename_table on Connection, delegating to Rust Connection::rename_table. - Add renameTable(oldName, newName, namespacePath?) on abstract Connection and implement it on LocalConnection. - Add a connection test that renames a table and checks names / open behavior. ### Testing - cd nodejs && npm run build - cd nodejs && npm test __test__/connection.test.ts Fixes #3364 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-05-14 15:30:31 -07:00
Tanay	df4ad9f851	feat(nodejs): add Scannable primitive for streaming ingestion (#3271 ) ## Summary This PR adds a Scannable primitive to the Node.js bindings, bringing parity with Python's `PyScannable`. A `Scannable` wraps a schema, an optional row count hint, a rescannable flag, and a batch producing callback. On the Rust side it implements `lancedb::data::scannable::Scannable`. The goal is to give consumers such as `Table.add`, `createTable`, and `mergeInsert` a way to stream data without materializing the full dataset in JS memory. This PR introduces only the primitive. Migrating existing consumers to use it will come in follow up work. --- ## Design ### Transport The transport uses the Arrow IPC Stream format, one batch at a time. The JS side encodes each `RecordBatch` into a self contained IPC Stream message containing schema, batch, and end of stream. The message is returned as a `Buffer` through a napi `ThreadsafeFunction`. The Rust side decodes it using `arrow_ipc::reader::StreamReader`. Only one batch is active at a time, so JS memory stays bounded by the batch size. The Node `Buffer` size limit of about 4 GiB therefore does not constrain the stream as a whole. I initially evaluated the Arrow C Data Interface, which is the approach used in Python. I dropped that path after confirming that the `apache-arrow` npm package does not expose a C Data Interface export in any supported version from 15 to 18. JavaScript is not listed in Arrow's C Data Interface implementation table, and the upstream tracking issue remains open with no scheduled work. Third party FFI shims would introduce additional dependency risk without solving the core maintenance problem. Using IPC adds one encode and decode step per batch, but the cost is predictable and typically dominated by Lance's write path. --- ### API ```ts class Scannable { readonly schema: Schema readonly numRows: number \| null readonly rescannable: boolean static fromFactory(schema, factory, opts?) 
static fromTable(table, opts?) static fromIterable(schema, iter, opts?) static fromRecordBatchReader(reader, opts?) } ``` The FFI boundary consists of a single callback: `getNextBatch(isStart: boolean): Promise<Buffer \| null>` `isStart` is `true` on the first call of each new scan and `false` for every call after it. The JS side uses it to drop any cached iterator and re-invoke the factory at scan boundaries. This is what makes a rescannable source restart at batch 0 on every `scan_as_stream` call, even when a previous scan ended mid stream, for example a retried write after a network error. Without this signal a retry would resume a stale iterator and silently skip already emitted batches. In addition, a schema only IPC buffer is transferred once during construction. --- ## Changes * `nodejs/src/scannable.rs` Adds `NapiScannable` and the `LanceScannable` implementation. Implements `schema()`, `num_rows()`, `rescannable()`, and `scan_as_stream()`. Includes per batch schema validation against the declared schema, one shot enforcement for non rescannable sources, and a scan boundary reset signal (`isStart`) so rescannable sources restart from batch 0 on every `scan_as_stream` call rather than resuming a stale iterator. * `nodejs/src/lib.rs` Module registration. * `nodejs/lancedb/scannable.ts` Defines the `Scannable` class and the four constructors listed above. Each constructor rejects option combinations it cannot honor, for example a `rescannable: true` request on a one shot iterable or reader, and a `numRows` that disagrees with an in memory table's row count. * `nodejs/lancedb/index.ts` Exports the new primitive. * `nodejs/__test__/scannable.test.ts` Test suite for the primitive. --- ## Validation Before implementing the bridge, I ran an end to end harness with a JS producer feeding a standalone Rust consumer built against the same `arrow-ipc` version used in the bridge. 
The harness covered the following scenarios: * happy path * empty stream * 1,000 small batches * 10 large batches * mixed primitive types with nullables * nested `List<Struct<>>` * truncated stream error handling * declared schema mismatch validation * a 6 GB stress test through the pipe All scenarios completed with bounded memory usage. The goal of this harness was to confirm that the IPC Stream transport works correctly end to end and that Node's `Buffer` size limit does not constrain the overall stream. Separately, the rescannable restart contract was verified with a focused harness. A rescannable source is consumed partially and the scan is dropped mid stream, then re-scanned. The re-scan replays from batch 0 rather than resuming the stale iterator. The same harness was run with the `isStart` reset path disabled and the mid stream restart case failed as expected, confirming the test exercises the real regression. These harnesses are not meant to replace the full test suite, which is described below. --- ## Tests `__test__/scannable.test.ts` covers construction, metadata reflection, per constructor defaults and overrides, construction time validation, the native handle surface, and schema variety across empty tables, nested types, `FixedSizeList`, and wide schemas. Runtime scan behavior including `scan_as_stream`, one shot enforcement on non rescannable sources, schema mismatch detection, IPC decode failures, and rescannable restart semantics is not exercised here. There is no in tree JS consumer of `NapiScannable` yet. This mirrors Python's `PyScannable`, which has no dedicated test file and is covered transitively through the consumers that accept a Scannable. Runtime coverage will follow in the consumer migration work. --- ## Status Ready for review. Closes #3223 ---	2026-05-14 15:07:41 -07:00
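The `isStart` restart contract described in the Scannable commit above can be modeled in a few lines. This is a conceptual sketch in Python, not the Node.js or Rust code: `RescannableSource` and `scan` are hypothetical names, and raw `bytes` objects stand in for IPC-encoded batches.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class RescannableSource:
    """Serves batches one at a time, restarting whenever a new scan begins."""

    def __init__(self, factory: Callable[[], Iterable[bytes]]) -> None:
        self._factory = factory
        self._iter: Optional[Iterator[bytes]] = None

    def get_next_batch(self, is_start: bool) -> Optional[bytes]:
        if is_start or self._iter is None:
            # Scan boundary: drop any cached iterator and re-invoke the
            # factory so the scan begins again at batch 0.
            self._iter = iter(self._factory())
        # None signals end of stream.
        return next(self._iter, None)


def scan(source: RescannableSource, limit: Optional[int] = None) -> List[bytes]:
    """Consume a scan, optionally abandoning it after `limit` batches."""
    out: List[bytes] = []
    is_start = True
    while limit is None or len(out) < limit:
        batch = source.get_next_batch(is_start)
        is_start = False
        if batch is None:
            break
        out.append(batch)
    return out


src = RescannableSource(lambda: [b"b0", b"b1", b"b2"])
partial = scan(src, limit=1)  # first scan abandoned mid stream
full = scan(src)              # retry replays from batch 0
print(partial, full)
```

Without the `is_start` check, the second scan would resume the cached iterator and return only the last two batches, which is the silent-skip failure the commit message describes.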
Brendan Clement	9330a9b851	feat(nodejs): expose connectNamespace for namespace-backed connections (#3383 ) ### Summary Adds a `connectNamespace(implName, properties, options?)` to the NodeJS SDK`. Closes #3380. ### Testing - pnpm test - Ran smoke test ``` import { connectNamespace } from "lancedb" import { tmpdir } from "os"; import { mkdtempSync } from "fs"; import { join } from "path"; const dir = mkdtempSync(join(tmpdir(), "lancedb-connect-namespace-smoke-")); console.log(`Using temp dir: ${dir}\n`); // 1. Happy path: connect via the "dir" namespace impl, create + list a table. console.log('Connecting via connectNamespace("dir", { root })...'); const db = await connectNamespace("dir", { root: dir }); console.log(" ✓ connected:", db.display()); console.log("Creating a table and listing it..."); await db.createTable("users", [ { id: 1, name: "alice" }, { id: 2, name: "bob" }, ]); console.log(" ✓ tableNames ->", await db.tableNames()); const table = await db.openTable("users"); console.log(" ✓ users.countRows ->", await table.countRows()); // 2. Storage options pass-through. console.log("\nReconnecting with storageOptions (plumbing check)..."); const dbWithOpts = await connectNamespace( "dir", { root: dir }, { storageOptions: { newTableDataStorageVersion: "stable" } }, ); console.log(" ✓ connected with storageOptions:", dbWithOpts.display()); await dbWithOpts.close(); // 3. Empty implName -> clear error. console.log("\nCalling connectNamespace('', {}) (expect error)..."); try { await connectNamespace("", {}); console.error(" UNEXPECTED: empty implName did not throw"); } catch (err) { console.log(` ✓ Got expected error: ${err.message.split("\n")[0]}`); } // 4. Unknown impl -> error. console.log("\nCalling connectNamespace('not-a-real-impl', {}) (expect error)..."); try { await connectNamespace("not-a-real-impl", {}); console.error(" UNEXPECTED: unknown impl did not throw"); } catch (err) { console.log(` ✓ Got expected error: ${err.message.split("\n")[0]}`); } // 5. 
Create a table inside a child namespace, then reconnect with a fresh // connectNamespace call and confirm the table is reachable via that // namespace path. (The dir+manifest impl keeps the namespace hierarchy in // a root manifest, so "scoping" happens via namespacePath args, not by // pointing root at a subdir.) console.log("\nCreating a table inside a child namespace..."); const dir2 = mkdtempSync(join(tmpdir(), "lancedb-connect-namespace-smoke-")); const writer = await connectNamespace("dir", { root: dir2, manifest_enabled: "true", }); await writer.createNamespace(["analytics"]); await writer.createTable( "orders", [ { id: 1, total: 10 }, { id: 2, total: 20 }, ], ["analytics"], ); console.log( " ✓ writer sees tables under [analytics] ->", await writer.tableNames(["analytics"]), ); await writer.close(); console.log("Reconnecting and reading the table via its namespace path..."); const reader = await connectNamespace("dir", { root: dir2, manifest_enabled: "true", }); console.log( " ✓ reader tableNames(['analytics']) ->", await reader.tableNames(["analytics"]), ); const orders = await reader.openTable("orders", ["analytics"]); console.log(" ✓ orders.countRows via reader ->", await orders.countRows()); await reader.close(); await db.close(); console.log("\nAll checks passed."); ``` ``` Using temp dir: /var/folders/bj/hn6jv9c50y301d1nx0y8xmn00000gn/T/lancedb-connect-namespace-smoke-WByF1P Connecting via connectNamespace("dir", { root })... ✓ connected: LanceNamespaceDatabase Creating a table and listing it... ✓ tableNames -> [ 'users' ] ✓ users.countRows -> 2 Reconnecting with storageOptions (plumbing check)... ✓ connected with storageOptions: LanceNamespaceDatabase Calling connectNamespace('', {}) (expect error)... ✓ Got expected error: implName must be a non-empty string Calling connectNamespace('not-a-real-impl', {}) (expect error)... 
✓ Got expected error: Invalid input, Failed to connect to namespace: Namespace { source: Unsupported { message: "Implementation 'not-a-real-impl' is not available. Supported: dir, rest" }, location: Location { file: "/Users/brendan/.cargo/git/checkouts/lance-8ddea23c38163eda/f693245/rust/lance-namespace-impls/src/connect.rs", line: 216, column: 14 } } Creating a table inside a child namespace... ✓ writer sees tables under [analytics] -> [ 'orders' ] Reconnecting and reading the table via its namespace path... ✓ reader tableNames(['analytics']) -> [ 'orders' ] ✓ orders.countRows via reader -> 2 All checks passed. ``` ### Docs - regenerated docs	2026-05-13 16:16:56 -07:00
Brendan Clement	02de07576e	feat(nodejs): add namespace management methods on Connection (#3371 ) ### Summary Closes #3363 Adds the four namespace management methods to the NodeJS `Connection`, bringing parity with the Rust core and Python bindings: - `listNamespaces(parent?, options?)` - `createNamespace(namespacePath, options?)` - `dropNamespace(namespacePath, options?)` - `describeNamespace(namespacePath)` ### Test plan - npm test - Ran a smoke test script ```typescript import { connect } from '<lancePath>' import { tmpdir } from "os"; import { mkdtempSync } from "fs"; import { join } from "path"; const dir = mkdtempSync(join(tmpdir(), "lancedb-smoke-")); console.log(`Using temp dir: ${dir}\n`); const db = await connect(dir, { namespaceClientProperties: { manifest_enabled: "true" }, }); console.log("Creating namespaces..."); await db.createNamespace(["analytics"]); await db.createNamespace(["analytics", "sales"], { properties: { owner: "brendan", purpose: "smoke-test" }, }); await db.createNamespace(["marketing"]); const root = await db.listNamespaces(); console.log("Root namespaces:", root.namespaces); const children = await db.listNamespaces(["analytics"]); console.log("Children of 'analytics':", children.namespaces); const descWithProps = await db.describeNamespace(["analytics", "sales"]); console.log("Describe analytics/sales (with properties):", descWithProps); const descNoProps = await db.describeNamespace(["analytics"]); console.log("Describe analytics (no properties):", descNoProps); console.log("Describing a non-existent namespace (expect error)..."); try { await db.describeNamespace(["does-not-exist"]); console.error(" UNEXPECTED: describe succeeded for non-existent namespace"); } catch (err) { console.log(` ✓ Got expected error: ${err.message.split("\n")[0]}`); } await db.dropNamespace(["marketing"]); const afterDrop = await db.listNamespaces(); console.log("Root after dropping marketing:", afterDrop.namespaces); await db.close(); console.log("\nAll 
operations completed successfully."); ``` ``` Using temp dir: /var/folders/bj/hn6jv9c50y301d1nx0y8xmn00000gn/T/lancedb-smoke-MUC5NI Creating namespaces... Root namespaces: [ 'analytics', 'marketing' ] Children of 'analytics': [ 'sales' ] Describe analytics/sales (with properties): { properties: { purpose: 'smoke-test', owner: 'brendan' } } Describe analytics (no properties): {} Describing a non-existent namespace (expect error)... ✓ Got expected error: lance error: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14 Caused by: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14 Caused by: Namespace not found: does-not-exist Root after dropping marketing: [ 'analytics' ] All operations completed successfully. ``` ### Documentation - regenerated docs	2026-05-13 11:49:27 -07:00
Brendan Clement	011fdd5c94	feat(nodejs): add prewarmData method on Table (#3374 ) ### Summary - Closes #3362 - Adds `prewarmData(columns?: string[])` to the Node bindings, mirroring the Rust and Python implementations ### Testing - [x] `npm run build` (regenerates the napi `.node` module + TS declarations) - [x] `npm run lint` - [x] `npm test` - [ ] live test against remote table - just waiting for my dev stack to get created ### Documentation - updated docs	2026-05-12 15:29:48 -07:00
C Kaustubh	fd98b845ea	fix(node): prevent reranker from keeping process alive (#3270 ) Fixes #3269. ## What I observed Using a reranker in a hybrid query could keep the Node.js process alive even after `table.close()` and `db.close()`. ## Root cause The reranker callback bridge used a `ThreadsafeFunction` in referenced mode, which can keep the event loop alive longer than intended. ## Minimal fix - In `nodejs/src/rerankers.rs`, create the reranker callback TSFN in weak mode (`.weak::<true>()`). - Add a regression test in `nodejs/__test__/rerankers.test.ts` that spawns a child process, runs a rerank query, and asserts the process exits naturally. ## Validation - Built Node bindings successfully. - Ran targeted tests: `rerankers.test.ts` passes (including new regression test). - Pre-commit checks for changed files were run and clean.	2026-04-19 14:02:23 +08:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Vedant Madane	1ba19d728e	feat(node): support Float16, Float64, and Uint8 vector queries (#3193 ) Fixes #2716 ## Summary Add support for querying with Float16Array, Float64Array, and Uint8Array vectors in the Node.js SDK, eliminating precision loss from the previous `Float32Array.from()` conversion. ## Implementation Follows @wjones127's [5-step plan](https://github.com/lancedb/lancedb/issues/2716#issuecomment-3447750543): ### Rust (`nodejs/src/query.rs`) 1. `bytes_to_arrow_array(data: Uint8Array, dtype: String)` helper that: - Creates an Arrow `Buffer` from the raw bytes - Wraps it in a typed `ScalarBuffer<T>` based on the dtype enum - Constructs a `PrimitiveArray` and returns `Arc<dyn Array>` 2. `nearest_to_raw(data, dtype)` and `add_query_vector_raw(data, dtype)` NAPI methods that pass the type-erased array to the core `nearest_to`/`add_query_vector` which already accept `impl IntoQueryVector` for `Arc<dyn Array>` ### TypeScript (`nodejs/lancedb/query.ts`, `arrow.ts`) 3. Extended `IntoVector` type to include `Uint8Array` (and `Float16Array` via runtime check for Node 22+) 4. `extractVectorBuffer()` helper detects non-Float32 typed arrays and extracts their underlying byte buffer + dtype string 5. `nearestTo()` and `addQueryVector()` route through the raw NAPI path when the input is Float16/Float64/Uint8 ### Backward compatibility Existing `Float32Array` and `number[]` inputs are unchanged -- they still use the original `nearest_to(Float32Array)` NAPI method. The new raw path is only used when a non-Float32 typed array is detected. 
## Usage ```typescript // Float16Array (Node 22+) -- no precision loss const f16vec = new Float16Array([0.1, 0.2, 0.3]); const results = await table.query().nearestTo(f16vec).limit(10).toArray(); // Float64Array -- no precision loss const f64vec = new Float64Array([0.1, 0.2, 0.3]); const results = await table.query().nearestTo(f64vec).limit(10).toArray(); // Uint8Array (binary embeddings) const u8vec = new Uint8Array([1, 0, 1, 1, 0]); const results = await table.query().nearestTo(u8vec).limit(10).toArray(); // Existing usage unchanged const results = await table.query().nearestTo([0.1, 0.2, 0.3]).limit(10).toArray(); ``` ## Note on dependencies The Rust side uses `arrow_array`, `arrow_buffer`, and `half` crates. These should already be in the dependency tree via `lancedb` core, but `Cargo.toml` may need explicit entries for `half` and the arrow sub-crates in the nodejs workspace. --------- Signed-off-by: Vedant Madane <6527493+VedantMadane@users.noreply.github.com> Co-authored-by: Will Jones <willjones127@gmail.com>	2026-03-30 11:15:35 -07:00
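The detection in step 4 of the commit above could look roughly like the sketch below. It assumes a helper that returns `null` for inputs that stay on the original Float32 path. The function body, the `RawVector` type, and the dtype strings are illustrative assumptions, not the code in `nodejs/lancedb/query.ts`.

```typescript
// Hypothetical sketch of the helper described in the commit message.
// Names and dtype strings are assumptions made for illustration.
type RawVector = { data: Uint8Array; dtype: string };

function extractVectorBuffer(vector: unknown): RawVector | null {
  // number[] and Float32Array keep using the original (non-raw) path.
  if (Array.isArray(vector) || vector instanceof Float32Array) return null;
  // Anything that is not a typed array is not handled here.
  if (!ArrayBuffer.isView(vector) || vector instanceof DataView) return null;

  let dtype: string;
  if (vector instanceof Float64Array) dtype = "float64";
  else if (vector instanceof Uint8Array) dtype = "uint8";
  // Float16Array only exists on newer runtimes, so check by constructor name.
  else if (vector.constructor.name === "Float16Array") dtype = "float16";
  else return null;

  // View the same bytes without copying; respect the view's offset and length.
  const data = new Uint8Array(vector.buffer, vector.byteOffset, vector.byteLength);
  return { data, dtype };
}
```

Returning the raw bytes together with a dtype tag lets a single binding entry point accept several element types, which matches the type-erased design the commit describes.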
Pratik Dey	d1d720d08a	feat(nodejs): support field/data type input in add_columns() method (#3114 ) Add support for passing field/data type information into add_columns() method, bringing parity with Python bindings. The method now accepts: - AddColumnsSql[] - SQL expressions (existing functionality) - Field - single Arrow field with explicit data type - Field[] - array of Arrow fields with explicit data types - Schema - Arrow schema with explicit data types New columns added via Field/Schema are initialized with null values. All field-based columns must be nullable due to null initialization. Resolves #3107 --------- Signed-off-by: Pratik <pratikrocks.dey11@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-13 12:57:14 -07:00
Weston Pace	4be85444f0	feat: infer js native arrays (#3119 ) When we create tables without using Arrow by parsing JS records we always infer to float64. Many times embeddings are not float64 and it would be nice to be able to use the native type without requiring users to pull in Arrow. We can utilize JS's builtin Float32Array to do this. This PR also adds support for UInt8/16/32 and Int8/16/32 arrays as well. Closes #3115	2026-03-09 10:13:59 -07:00
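The inference rule described in the commit above can be sketched as a lookup from typed-array class to element type name. This is an illustrative assumption rather than the SDK's implementation; the function name `inferElementType` and the returned type-name strings are hypothetical.

```typescript
// Hypothetical helper: map a JS array value to an element type name.
// Only the typed-array classes named in the commit message are handled.
function inferElementType(values: unknown): string {
  if (values instanceof Float32Array) return "float32";
  if (values instanceof Uint8Array) return "uint8";
  if (values instanceof Uint16Array) return "uint16";
  if (values instanceof Uint32Array) return "uint32";
  if (values instanceof Int8Array) return "int8";
  if (values instanceof Int16Array) return "int16";
  if (values instanceof Int32Array) return "int32";
  // A plain number[] carries no width information, so fall back to float64,
  // the default the commit message describes.
  if (Array.isArray(values)) return "float64";
  throw new TypeError("expected a number[] or a supported typed array");
}
```

With a rule like this, a record such as `{ vector: new Float32Array([0.1, 0.2]) }` would be read as a float32 vector without the caller importing Arrow.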
Sean Mackrory	e71a00998c	ci: add regression test for fastSearch in FTS queries in TypeScript (#3090 ) We recently added support for this for the Python bindings, and wanted to confirm this already worked as expected in the TS bindings.	2026-03-03 07:09:09 -08:00
Jack Ye	bd2c6d0763	chore: update lance dependency to v2.0.0-rc.4 (#2972 )	2026-02-03 14:38:39 -08:00
Vedant Madane	d3e15f3e17	fix(node): allow bigint[] for takeRowIds (#2916 ) ## Summary This PR changes takeRowIds to accept bigint[] instead of number[], matching the type of _rowid returned by withRowId(). ## Problem When retrieving row IDs using `withRowId()` and querying them back with takeRowIds(), users get an error because: 1. _rowid values are returned as JavaScript bigint 2. takeRowIds() expected number[] 3. NAPI failed to convert: Error: Failed to convert napi value BigInt into rust type i64 ## Reproduction ```js import lancedb from '@lancedb/lancedb'; const db = await lancedb.connect('memory://'); const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0] }]); const results = await table.query().withRowId().toArray(); const rowIds = results.map(row => row._rowid); console.log('types:', rowIds.map(id => typeof id)); // ['bigint'] await table.takeRowIds(rowIds).toArray(); // ❌ Error before fix ``` ## Solution - Updated TypeScript signature from takeRowIds(rowIds: number[]) to takeRowIds(rowIds: bigint[]) - Updated Rust NAPI binding to accept Vec<BigInt> and convert using get_u64() Fixes #2722 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-03 10:09:51 -08:00
Jack Ye	e4552e577a	chore(revert): revert update lance dependency to v2.0.0-rc.1 (#2936 ) (#2941 ) This reverts commit `bd84bba14d`, so that we can bump version to 1.0.4-rc.1	2026-01-26 11:13:59 -08:00
LanceDB Robot	bd84bba14d	chore: update lance dependency to v2.0.0-rc.1 (#2936 ) ## Summary - bump Lance dependencies to v2.0.0-rc.1 (git tag) - align Arrow/DataFusion/PyO3 versions for the new Lance release - update Python bindings for PyO3 0.26 (attach API + Py<PyAny>) ## Verification - `cargo clippy --workspace --tests --all-features -- -D warnings` - `cargo fmt --all` ## Reference - https://github.com/lance-format/lance/releases/tag/v2.0.0-rc.1 --------- Co-authored-by: Jack Ye <yezhaoqin@gmail.com> Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: BubbleCal <bubble_cal@outlook.com>	2026-01-22 13:14:38 -08:00
Weston Pace	aeac9c7644	feat: add python Permutation class to mimic hugging face dataset and provide pytorch dataloader (#2725 )	2025-11-06 16:15:33 -08:00
Weston Pace	4cfcd95320	feat: add a permutation reader that can read a permutation view (#2712 ) This adds a rust permutation builder. In the next PR I will have python bindings and integration with pytorch.	2025-10-17 05:00:23 -07:00
Weston Pace	8f8e06a2da	feat: add output_schema method to queries (#2717 ) This is a helper utility I need for some of my data loader work. It makes it easy to see the output schema even when a `select` has been applied.	2025-10-14 05:13:28 -07:00
Weston Pace	5a19cf15a6	feat: a utility for creating "permutation views" (#2552 ) I'm working on a lancedb version of pytorch data loading (and hopefully addressing https://github.com/lancedb/lance/issues/3727). However, rather than rely on pytorch for everything I'm moving some of the things that pytorch does into rust. This gives us more control over data loading (e.g. using shards or a hash-based split) and it allows permutations to be persistent. In particular I hope to be able to: * Create a persistent permutation * This permutation can handle splits, filtering, shuffling, and sharding * Create a rust data loader that can read a permutation (one or more splits), or a subset of a permutation (for DDP) * Create a python data loader that delegates to the rust data loader Eventually create integrations for other data loading libraries, including rust & node	2025-10-09 18:07:31 -07:00
BubbleCal	b59d1007d3	feat(index): add IVF_RQ index type (#2687 ) This exposes the IVF_RQ (RabitQ quantization) index type in lancedb. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-10-09 15:46:18 +08:00
Tom LaMarre	917aabd077	fix(node): support specifying arrow field types by name (#2704 ) The [`FieldLike` type in arrow.ts](`5ec12c9971/nodejs/lancedb/arrow.ts (L71-L78)`) can have a `type: string` property, but before this change, actually trying to create a table that has a schema that specifies field types by name results in an error: ``` Error: Expected a Type but object was null/undefined ``` This change adds support for mapping some type name strings to arrow `DataType`s, so that passing `FieldLike`s with a `type: string` property to `sanitizeField` does not throw an error. The type names that can be passed are upper/lowercase variations of the keys of the `constructorsByTypeName` object. This does not support mapping types that need parameters, such as timestamps which need timezones. With this, it is possible to create empty tables from `SchemaLike` objects without instantiating arrow types, e.g.: ``` import { SchemaLike } from "../lancedb/arrow" // ... const schemaLike = { fields: [ { name: "id", type: "int64", nullable: true, }, { name: "vector", type: "float64", nullable: true, }, ], // ... } satisfies SchemaLike; const table = await con.createEmptyTable("test", schemaLike); ``` This change also makes `FieldLike.nullable` required since the `sanitizeField` function throws if it is undefined.	2025-10-08 04:40:06 -07:00
Neha Prasad	9e2a68541e	fix(node): allow undefined/omitted values for nullable vector fields (#2656 ) Problem: When a vector field is marked as nullable, users should be able to omit it or pass `undefined`, but this was throwing an error: "Table has embeddings: 'vector', but no embedding function was provided" fixes: #2646 Solution: Modified `validateSchemaEmbeddings` to check `field.nullable` before treating `undefined` values as missing embedding fields. Changes: - Fixed validation logic in `nodejs/lancedb/arrow.ts` - Enabled previously skipped test for nullable fields - Added reproduction test case Behavior: - ✅ `{ vector: undefined }` now works for nullable fields - ✅ `{}` (omitted field) now works for nullable fields - ✅ `{ vector: null }` still works (unchanged) - ✅ Non-nullable fields still properly throw errors (unchanged) --------- Co-authored-by: Will Jones <willjones127@gmail.com> Co-authored-by: neha <neha@posthog.com>	2025-10-02 10:53:05 -07:00
Will Jones	d617cdef4a	feat: add use_index parameter to merge insert operations (#2674 ) ## Summary Exposes `use_index` Merge Insert parameter, which was created upstream in https://github.com/lancedb/lance/pull/4688. ## API Examples ### Python ```python # Force table scan table.merge_insert(["id"]) \ .when_not_matched_insert_all() \ .use_index(False) \ .execute(data) ``` ### Node.js/TypeScript ```typescript // Force table scan await table.mergeInsert("id") .whenNotMatchedInsertAll() .useIndex(false) .execute(data); ``` ### Rust ```rust // Force table scan let mut builder = table.merge_insert(&["id"]); builder.when_not_matched_insert_all() .use_index(false); builder.execute(data).await?; ``` 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Claude <noreply@anthropic.com>	2025-09-24 12:50:21 -07:00
Will Jones	356d7046fd	ci: fix test failure on main (#2677 ) Test was in wrong position.	2025-09-24 09:46:04 -07:00
Will Jones	48e5caabda	ci(nodejs): lint for unused imports (#2673 )	2025-09-23 18:49:42 -07:00
Neha Prasad	b0800b4b71	fix: undefined values should become null in nullable fields (#2658 ) ### Bug Fix: Undefined Values in Nullable Fields Issue: When inserting data with `undefined` values into nullable fields, LanceDB was incorrectly coercing them to default values (`false` for booleans, `NaN` for numbers, `""` for strings) instead of `null`. Fix: Modified the `makeVector()` function in `arrow.ts` to properly convert `undefined` values to `null` for nullable fields before passing data to Apache Arrow. fixes: #2645 Result: Now `{ text: undefined, number: undefined, bool: undefined }` correctly becomes `{ text: null, number: null, bool: null }` when fields are marked as nullable in the schema. Files Changed: - `nodejs/lancedb/arrow.ts` (core fix) - `nodejs/__test__/arrow.test.ts` (test coverage) - This ensures proper null handling for nullable fields as expected by users. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:29:52 -07:00
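A minimal sketch of the normalization this fix describes: `undefined` entries in a nullable column are turned into `null` before the values are handed on. The helper name `undefinedToNull` and its signature are hypothetical; this is not the `makeVector()` code in `nodejs/lancedb/arrow.ts`.

```typescript
// Hypothetical helper illustrating the undefined -> null rule for nullable fields.
function undefinedToNull<T>(
  values: Array<T | null | undefined>,
  nullable: boolean,
): Array<T | null | undefined> {
  // Non-nullable columns are passed through untouched in this sketch.
  if (!nullable) return values;
  // For nullable columns, replace only `undefined`; real values and `null` stay as-is.
  return values.map((v) => (v === undefined ? null : v));
}
```

Doing the conversion before the values reach the array builder avoids the default-value coercion (`false`, `NaN`, `""`) that the commit message reports.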
Neha Prasad	1befebf614	fix(node): handle null values in nullable boolean fields (#2657 ) ### Solution Added special handling in `makeVector` function for boolean arrays where all values are null. The fix creates a proper null bitmap using `makeData` and `arrowMakeVector` instead of relying on Apache Arrow's `vectorFromArray` which doesn't handle this edge case correctly. fixes: #2644 ### Changes - Added null value detection for boolean types in `makeVector` function - Creates proper Arrow data structure with null bitmap when all boolean values are null - Preserves existing behavior for non-null boolean values and other data types - Fixes the boolean null value bug while maintaining backward compatibility. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-09-23 14:07:00 -07:00
Jack Ye	ff71d7e552	feat: support shallow clone (#2653 ) Support shallow cloning a dataset at a specific location to create a new dataset, using the shallow_clone feature in Lance. Also introduce remote `clone` API for remote tables for this functionality.	2025-09-21 21:28:40 -07:00
Neha Prasad	2261eb95a0	fix(node): handle undefined vector fields with embedding functions (#2655 ) - Fixes issue where passing `{ vector: undefined }` with an embedding function threw "Found field not in schema" error instead of calling the embedding function like `null` or omitted fields. Changes: - Modified `rowPathsAndValues` to skip undefined values during schema inference - Added test case verifying undefined, null, and omitted vector fields all work correctly Before: `{ vector: undefined }` → Error After: `{ vector: undefined }` → Calls embedding function Closes #2647	2025-09-19 09:17:28 -07:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631 ) ## Summary This PR introduces a `HeaderProvider` which is called for all remote HTTP calls to get the latest headers to inject. This is useful for features like adding the latest auth tokens where the header provider can auto-refresh tokens internally and each request always set the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
Jack Ye	9391ad1450	feat: support mTLS for remote database (#2638 ) This PR adds mTLS (mutual TLS) configuration support for the LanceDB remote HTTP client, allowing users to authenticate with client certificates and configure custom CA certificates for server verification. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-09 21:04:46 -07:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586 ) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index will be immediately created. The upgrade of Lance means there are also additional behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it will be kept around even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They will default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
Will Jones	dcf53c4506	fix: limit and offset support paginating through FTS and vector search results (#2592 ) Adds tests to ensure that users can paginate through simple scan, FTS, and vector search results using `limit` and `offset`. Tests upstream work: https://github.com/lancedb/lance/pull/4318 Closes #2459	2025-08-15 08:55:12 -07:00
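Pagination with `limit` and `offset`, as described in the commit above, amounts to walking a sequence of (limit, offset) windows. The sketch below only computes those windows; the `PageWindow` type and `pageWindows` function are hypothetical names, and it assumes the total row count is known up front.

```typescript
// Hypothetical helper: compute the (limit, offset) pairs needed to cover totalRows.
interface PageWindow {
  limit: number;
  offset: number;
}

function pageWindows(totalRows: number, pageSize: number): PageWindow[] {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const windows: PageWindow[] = [];
  for (let offset = 0; offset < totalRows; offset += pageSize) {
    // The last window may be shorter than pageSize.
    windows.push({ limit: Math.min(pageSize, totalRows - offset), offset });
  }
  return windows;
}
```

Each window would then be applied to a scan, FTS, or vector search query through its `limit` and `offset` settings. When the total is not known in advance, a caller can instead keep requesting pages until one comes back shorter than the page size.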


160 Commits