lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-23 23:10:40 +00:00

Author	SHA1	Message	Date
Brendan Clement	4cb9147bbf	feat(nodejs): add renameTable on Connection (#3386 ) Adds `Connection.renameTable` to the Node SDK. Closes #3381.	2026-05-20 09:05:48 -07:00
Heng Ge	0d30b31998	feat: support setting LSM write spec for a table (#3396 ) ## Summary Split out from #3354 Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec` to install and clear the spec that selects Lance's MemWAL LSM-style write path for `merge_insert`. `LsmWriteSpec` offers three sharding strategies, all built on Lance's `InitializeMemWalBuilder`: - `LsmWriteSpec::bucket(column, num_buckets)` — hash-bucket sharding by the single-column unenforced primary key. - `LsmWriteSpec::identity(column)` — identity sharding by the raw value of a scalar column. - `LsmWriteSpec::unsharded()` — a single MemWAL shard. Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL keeps up to date as rows are appended) and `with_writer_config_defaults(...)` (default `ShardWriter` configuration recorded in the MemWAL index, so every writer starts from the same defaults). All variants require the table to have an unenforced primary key. - `set_lsm_write_spec` installs the spec by initializing the MemWAL index; `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to the standard `merge_insert` path. `unset` is idempotent. - Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`, `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` / `"unsharded"`). `RemoteTable` returns `NotSupported`. The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a follow-up — this PR only installs and clears the spec.	2026-05-18 00:11:33 -07:00
Heng Ge	6a431ff0a0	feat: support setting unenforced primary key (#3394 ) ## Summary Adds `Table::set_unenforced_primary_key` — records a single column as the table's unenforced primary key in Lance schema field metadata. "Unenforced" means LanceDB does not check uniqueness on write; the key is metadata that `merge_insert` consumes. - Single-column only; the column must exist and have a supported dtype (Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary). The API accepts an iterable for binding ergonomics but requires exactly one column — compound keys are rejected. - The primary key is immutable: calling this on a table that already has an unenforced primary key is rejected. Concurrent writers racing to set the key fail at commit time rather than silently overriding it. - `RemoteTable` returns `NotSupported`. - Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and TypeScript (`Table.setUnenforcedPrimaryKey`). ## Context Split out from #3354 per review feedback, so the unenforced primary key and the `merge_insert` sharding spec land as separate reviewable PRs. No Lance dependency bump — `main` is already on v7.0.0-beta.10, which includes the field-metadata round-trip fix the API relies on. Enforcing primary-key immutability at the Lance commit layer (so the cross-column concurrent race is also rejected) is a companion Lance change: lance-format/lance#6810.	2026-05-16 23:12:55 -07:00
Xin Sun	ab2c5adf5e	feat(nodejs): add order_by method to Query (#3123 )	2026-05-16 22:49:08 -07:00
Neha Prasad	13c6dae9a3	feat(nodejs): add Connection.renameTable with namespace support (#3365 ) ### Summary - Expose Connection.renameTable in the Node.js bindings and align it with existing namespace-aware connection APIs. ### Changes - Add napi-rs rename_table on Connection, delegating to Rust Connection::rename_table. - Add renameTable(oldName, newName, namespacePath?) on abstract Connection and implement on LocalConnection. - Add a connection test that renames a table and checks names / open behavior. #### Testing - cd nodejs && npm run build - cd nodejs && npm test __test__/connection.test.ts fix : #3364 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-05-14 15:30:31 -07:00
Tanay	df4ad9f851	feat(nodejs): add Scannable primitive for streaming ingestion (#3271 ) ## Summary This PR adds a Scannable primitive to the Node.js bindings, bringing parity with Python's `PyScannable`. A `Scannable` wraps a schema, an optional row count hint, a rescannable flag, and a batch producing callback. On the Rust side it implements `lancedb::data::scannable::Scannable`. The goal is to give consumers such as `Table.add`, `createTable`, and `mergeInsert` a way to stream data without materializing the full dataset in JS memory. This PR introduces only the primitive. Migrating existing consumers to use it will come in follow up work. --- ## Design ### Transport The transport uses the Arrow IPC Stream format, one batch at a time. The JS side encodes each `RecordBatch` into a self contained IPC Stream message containing schema, batch, and end of stream. The message is returned as a `Buffer` through a napi `ThreadsafeFunction`. The Rust side decodes it using `arrow_ipc::reader::StreamReader`. Only one batch is active at a time, so JS memory stays bounded by the batch size. The Node `Buffer` size limit of about 4 GiB therefore does not constrain the stream as a whole. I initially evaluated the Arrow C Data Interface, which is the approach used in Python. I dropped that path after confirming that the `apache-arrow` npm package does not expose a C Data Interface export in any supported version from 15 to 18. JavaScript is not listed in Arrow's C Data Interface implementation table, and the upstream tracking issue remains open with no scheduled work. Third party FFI shims would introduce additional dependency risk without solving the core maintenance problem. Using IPC adds one encode and decode step per batch, but the cost is predictable and typically dominated by Lance's write path. --- ### API ```ts class Scannable { readonly schema: Schema readonly numRows: number \| null readonly rescannable: boolean static fromFactory(schema, factory, opts?) static fromTable(table, opts?) static fromIterable(schema, iter, opts?) static fromRecordBatchReader(reader, opts?) } ``` The FFI boundary consists of a single callback: `getNextBatch(isStart: boolean): Promise<Buffer \| null>` `isStart` is `true` on the first call of each new scan and `false` for every call after it. The JS side uses it to drop any cached iterator and re-invoke the factory at scan boundaries. This is what makes a rescannable source restart at batch 0 on every `scan_as_stream` call, even when a previous scan ended mid stream, for example a retried write after a network error. Without this signal a retry would resume a stale iterator and silently skip already emitted batches. In addition, a schema only IPC buffer is transferred once during construction. --- ## Changes * `nodejs/src/scannable.rs` Adds `NapiScannable` and the `LanceScannable` implementation. Implements `schema()`, `num_rows()`, `rescannable()`, and `scan_as_stream()`. Includes per batch schema validation against the declared schema, one shot enforcement for non rescannable sources, and a scan boundary reset signal (`isStart`) so rescannable sources restart from batch 0 on every `scan_as_stream` call rather than resuming a stale iterator. * `nodejs/src/lib.rs` Module registration. * `nodejs/lancedb/scannable.ts` Defines the `Scannable` class and the four constructors listed above. Each constructor rejects option combinations it cannot honor, for example a `rescannable: true` request on a one shot iterable or reader, and a `numRows` that disagrees with an in memory table's row count. * `nodejs/lancedb/index.ts` Exports the new primitive. * `nodejs/__test__/scannable.test.ts` Test suite for the primitive. --- ## Validation Before implementing the bridge, I ran an end to end harness with a JS producer feeding a standalone Rust consumer built against the same `arrow-ipc` version used in the bridge. The harness covered the following scenarios: * happy path * empty stream * 1,000 small batches * 10 large batches * mixed primitive types with nullables * nested `List<Struct<>>` * truncated stream error handling * declared schema mismatch validation * a 6 GB stress test through the pipe All scenarios completed with bounded memory usage. The goal of this harness was to confirm that the IPC Stream transport works correctly end to end and that Node's `Buffer` size limit does not constrain the overall stream. Separately, the rescannable restart contract was verified with a focused harness. A rescannable source is consumed partially and the scan is dropped mid stream, then re-scanned. The re-scan replays from batch 0 rather than resuming the stale iterator. The same harness was run with the `isStart` reset path disabled and the mid stream restart case failed as expected, confirming the test exercises the real regression. These harnesses are not meant to replace the full test suite, which is described below. --- ## Tests `__test__/scannable.test.ts` covers construction, metadata reflection, per constructor defaults and overrides, construction time validation, the native handle surface, and schema variety across empty tables, nested types, `FixedSizeList`, and wide schemas. Runtime scan behavior including `scan_as_stream`, one shot enforcement on non rescannable sources, schema mismatch detection, IPC decode failures, and rescannable restart semantics is not exercised here. There is no in tree JS consumer of `NapiScannable` yet. This mirrors Python's `PyScannable`, which has no dedicated test file and is covered transitively through the consumers that accept a Scannable. Runtime coverage will follow in the consumer migration work. --- ## Status Ready for review. Closes #3223 ---	2026-05-14 15:07:41 -07:00
Brendan Clement	02de07576e	feat(nodejs): add namespace management methods on Connection (#3371 ) ### Summary Closes #3363 Adds the four namespace management methods to the NodeJS `Connection`, bringing parity with the Rust core and Python bindings: - `listNamespaces(parent?, options?)` - `createNamespace(namespacePath, options?)` - `dropNamespace(namespacePath, options?)` - `describeNamespace(namespacePath)` ### Test plan - npm test - Ran a smoke test script ```typescript import { connect } from '<lancePath>' import { tmpdir } from "os"; import { mkdtempSync } from "fs"; import { join } from "path"; const dir = mkdtempSync(join(tmpdir(), "lancedb-smoke-")); console.log(`Using temp dir: ${dir}\n`); const db = await connect(dir, { namespaceClientProperties: { manifest_enabled: "true" }, }); console.log("Creating namespaces..."); await db.createNamespace(["analytics"]); await db.createNamespace(["analytics", "sales"], { properties: { owner: "brendan", purpose: "smoke-test" }, }); await db.createNamespace(["marketing"]); const root = await db.listNamespaces(); console.log("Root namespaces:", root.namespaces); const children = await db.listNamespaces(["analytics"]); console.log("Children of 'analytics':", children.namespaces); const descWithProps = await db.describeNamespace(["analytics", "sales"]); console.log("Describe analytics/sales (with properties):", descWithProps); const descNoProps = await db.describeNamespace(["analytics"]); console.log("Describe analytics (no properties):", descNoProps); console.log("Describing a non-existent namespace (expect error)..."); try { await db.describeNamespace(["does-not-exist"]); console.error(" UNEXPECTED: describe succeeded for non-existent namespace"); } catch (err) { console.log(` ✓ Got expected error: ${err.message.split("\n")[0]}`); } await db.dropNamespace(["marketing"]); const afterDrop = await db.listNamespaces(); console.log("Root after dropping marketing:", afterDrop.namespaces); await db.close(); console.log("\nAll operations completed successfully."); ``` ``` Using temp dir: /var/folders/bj/hn6jv9c50y301d1nx0y8xmn00000gn/T/lancedb-smoke-MUC5NI Creating namespaces... Root namespaces: [ 'analytics', 'marketing' ] Children of 'analytics': [ 'sales' ] Describe analytics/sales (with properties): { properties: { purpose: 'smoke-test', owner: 'brendan' } } Describe analytics (no properties): {} Describing a non-existent namespace (expect error)... ✓ Got expected error: lance error: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14 Caused by: Namespace error: Namespace not found: does-not-exist, rust/lance-namespace-impls/src/dir/manifest.rs:2495:14 Caused by: Namespace not found: does-not-exist Root after dropping marketing: [ 'analytics' ] All operations completed successfully. ``` ### Documentation - regenerated docs	2026-05-13 11:49:27 -07:00
Brendan Clement	011fdd5c94	feat(nodejs): add prewarmData method on Table (#3374 ) ### Summary - Closes #3362 - Adds `prewarmData(columns?: string[])` to the Node bindings, mirroring the Rust and Python implementations ### Testing - [x] `npm run build` (regenerates the napi `.node` module + TS declarations) - [x] `npm run lint` - [x] `npm test - [ ] live test against remote table - just waiting for my dev stack to get created ### Documentation - updated docs	2026-05-12 15:29:48 -07:00
Jack Ye	e26b22bcca	refactor!: consolidate namespace related naming and enterprise integration (#3205 ) 1. Refactored every client (Rust core, Python, Node/TypeScript) so “namespace” usage is explicit: code now keeps namespace paths (namespace_path) separate from namespace clients (namespace_client). Connections propagate the client, table creation routes through it, and managed versioning defaults are resolved from namespace metadata. Python gained LanceNamespaceDBConnection/async counterparts, and the namespace-focused tests were rewritten to match the clarified API surface. 2. Synchronized the workspace with Lance 5.0.0-beta.3 (see https://github.com/lance-format/lance/pull/6186 for the upstream namespace refactor), updating Cargo/uv lockfiles and ensuring all bindings align with the new namespace semantics. 3. Added a namespace-backed code path to lancedb.connect() via new keyword arguments (namespace_client_impl, namespace_client_properties, plus the existing pushdown-ops flag). When those kwargs are supplied, connect() delegates to connect_namespace, so users can opt into namespace clients without changing APIs. (The async helper will gain parity in a later change)	2026-04-03 00:09:03 -07:00
Will Jones	9a5b0398ec	chore: fix ci (#3139 ) * Move away from buildjet, which is shutting down runners for GHA [^1] * Add `Cargo.lock` to build jobs, so when we upgrade locked dependencies we check the builds actually pass. CI started failing because dependencies were changed in #3116 without running all build jobs. * Add fixes for aws-lc-rs build in NodeJS. [^1]: https://buildjet.com/for-github-actions/blog/we-are-shutting-down --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 06:25:40 -07:00
Pratik Dey	d1d720d08a	feat(nodejs): support field/data type input in add_columns() method (#3114 ) Add support for passing field/data type information into add_columns() method, bringing parity with Python bindings. The method now accepts: - AddColumnsSql[] - SQL expressions (existing functionality) - Field - single Arrow field with explicit data type - Field[] - array of Arrow fields with explicit data types - Schema - Arrow schema with explicit data types New columns added via Field/Schema are initialized with null values. All field-based columns must be nullable due to null initialization. Resolves #3107 --------- Signed-off-by: Pratik <pratikrocks.dey11@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-13 12:57:14 -07:00
Jack Ye	6329b57604	docs: update nodejs docs for storage options APIs (#2978 ) Regenerate TypeScript docs to include the new initialStorageOptions() and latestStorageOptions() methods added in #2966. Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 16:07:58 -08:00
Vedant Madane	d3e15f3e17	fix(node): allow bigint[] for takeRowIds (#2916 ) ## Summary This PR changes takeRowIds to accept bigint[] instead of number[], matching the type of _rowid returned by withRowId(). ## Problem When retrieving row IDs using \withRowId()\ and querying them back with takeRowIds(), users get an error because: 1. _rowid values are returned as JavaScript bigint 2. takeRowIds() expected number[] 3. NAPI failed to convert: Error: Failed to convert napi value BigInt into rust type i64 ## Reproduction \\\js import lancedb from '@lancedb/lancedb'; const db = await lancedb.connect('memory://'); const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0] }]); const results = await table.query().withRowId().toArray(); const rowIds = results.map(row => row._rowid); console.log('types:', rowIds.map(id => typeof id)); // ['bigint'] await table.takeRowIds(rowIds).toArray(); // âŒ Error before fix \\\ ## Solution - Updated TypeScript signature from takeRowIds(rowIds: number[]) to takeRowIds(rowIds: bigint[]) - Updated Rust NAPI binding to accept Vec<BigInt> and convert using get_u64() Fixes #2722 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-03 10:09:51 -08:00
Prashanth Rao	76bcc78910	docs: nodejs failing CI is fixed (#2802 ) Fixes the breaking CI for nodejs, related to the documentation of the new Permutation API in typescript. - Expanded the generated typings in `nodejs/lancedb/native.d.ts` to include `SplitCalculatedOptions`, `splitNames` fields, and the persist/options-based `splitCalculated` methods so the permutation exports match the native API. - The previous block comment block had an inconsistency. `splitCalculated` takes an options object (`SplitCalculatedOptions`) in our bindings, not a bare string. The previous example showed `builder.splitCalculated("user_id % 3");`, which doesn’t match the actual signature and would fail TS typecheck. I updated the comment to `builder.splitCalculated({ calculation: "user_id % 3" });` so the example is now correct. - Updated the `splitCalculated` example in `nodejs/lancedb/permutation.ts` to use the options object. - Ran `npm docs` to ensure docs build correctly. > [!NOTE] > Disclaimer: I used GPT-5.1-Codex-Max to make these updates, but I have read the code and run `npm run docs` to verify that they work and are correct to the best of my knowledge.	2025-11-20 16:16:38 -08:00
Weston Pace	aeac9c7644	feat: add python Permutation class to mimic hugging face dataset and provide pytorch dataloader (#2725 )	2025-11-06 16:15:33 -08:00
S.A.N	20bec61ecb	refactor(node): async generator for RecordBatchIterator (#2744 ) JS native Async Generator, more efficient asynchronous iteration, fewer synthetic promises, and the ability to handle `catch` or `break` of parent loop in `finally` block	2025-10-30 14:36:24 -07:00
Weston Pace	8f8e06a2da	feat: add output_schema method to queries (#2717 ) This is a helper utility I need for some of my data loader work. It makes it easy to see the output schema even when a `select` has been applied.	2025-10-14 05:13:28 -07:00
Weston Pace	5a19cf15a6	feat: a utility for creating "permutation views" (#2552 ) I'm working on a lancedb version of pytorch data loading (and hopefully addressing https://github.com/lancedb/lance/issues/3727). However, rather than rely on pytorch for everything I'm moving some of the things that pytorch does into rust. This gives us more control over data loading (e.g. using shards or a hash-based split) and it allows permutations to be persistent. In particular I hope to be able to: * Create a persistent permutation * This permutation can handle splits, filtering, shuffling, and sharding * Create a rust data loader that can read a permutation (one or more splits), or a subset of a permutation (for DDP) * Create a python data loader that delegates to the rust data loader Eventually create integrations for other data loading libraries, including rust & node	2025-10-09 18:07:31 -07:00
BubbleCal	b59d1007d3	feat(index): add IVF_RQ index type (#2687 ) this expose IVF_RQ (RabitQ quantization) index type to lancedb --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-10-09 15:46:18 +08:00
Will Jones	1ac745eb18	ci: fix Python and Node CI on main (#2700 ) Example failure: https://github.com/lancedb/lancedb/actions/runs/18237024283/job/51932651993	2025-10-06 09:40:08 -07:00
Will Jones	48e5caabda	ci(nodejs): lint for unused imports (#2673 )	2025-09-23 18:49:42 -07:00
Jack Ye	8da74dcb37	feat: support per-request header override (#2631 ) ## Summary This PR introduces a `HeaderProvider` which is called for all remote HTTP calls to get the latest headers to inject. This is useful for features like adding the latest auth tokens where the header provider can auto-refresh tokens internally and each request always set the refreshed token. --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-09-10 13:44:00 -07:00
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584 ) These operations have existed in lance for a long while and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order)	2025-08-15 06:48:24 -07:00
Will Jones	3d1f102087	feat: allow Python and Typescript users to create `Session`s (#2530 ) ## Summary - Exposes `Session` in Python and Typescript so users can set the `index_cache_size_bytes` and `metadata_cache_size_bytes` * The `Session` is attached to the `Connection`, and thus shared across all tables in that connection. - Adds deprecation warnings for table-level cache configuration 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-07-24 12:06:29 -07:00
BubbleCal	96c66fd087	feat: support multivector for JS SDK (#2527 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-22 21:19:34 +08:00
BubbleCal	fec8d58f06	feat: support a bunch or FTS features in JS SDK (#2431 ) - operator for match query - slop for phrase query - boolean query <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced support for boolean full-text search queries with AND/OR logic and occurrence conditions. - Added operator options for match and multi-match queries to control term combination logic. - Enabled phrase queries to specify proximity (slop) for flexible phrase matching. - Added new enumerations (`Operator`, `Occur`) and the `BooleanQuery` class for enhanced query expressiveness. - Bug Fixes - Improved validation and error handling for invalid operator and occurrence inputs in full-text queries. - Tests - Expanded test coverage with new cases for boolean queries and operator-based full-text searches. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-06-12 17:04:19 +08:00
Will Jones	272e4103b2	feat: provide timeout parameter for merge_insert (#2378 ) Provides the ability to set a timeout for merge insert. The default underlying timeout is however long the first attempt takes, or if there are multiple attempts, 30 seconds. This has two use cases: 1. Make the timeout shorter, when you want to fail if it takes too long. 2. Allow taking more time to do retries. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added support for specifying a timeout when performing merge insert operations in Python, Node.js, and Rust APIs. - Introduced a new option to control the maximum allowed execution time for merge inserts, including retry timeout handling. - Documentation - Updated and added documentation to describe the new timeout option and its usage in APIs. - Tests - Added and updated tests to verify correct timeout behavior during merge insert operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-08 13:07:05 -07:00
LuQQiu	ed594b0f76	feat: return version for all write operations (#2368 ) return version info for all write operations (add, update, merge_insert and column modification operations) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Table modification operations (add, update, delete, merge, add/alter/drop columns) now return detailed result objects including version numbers and operation statistics. - Result objects provide clearer feedback such as rows affected and new table version after each operation. - Documentation - Updated documentation to describe new result objects and their fields for all relevant table operations. - Added documentation for new result interfaces and updated method return types in Node.js and Python APIs. - Tests - Enhanced test coverage to assert correctness of returned versioning and operation metadata after table modifications. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-05 14:25:34 -07:00
Alex Pilon	f315f9665a	feat: implement bindings to return merge stats (#2367 ) Based on this comment: https://github.com/lancedb/lancedb/issues/2228#issuecomment-2730463075 and https://github.com/lancedb/lance/pull/2357 Here is my attempt at implementing bindings for returning merge stats from a `merge_insert.execute` call for lancedb. Note: I have almost no idea what I am doing in Rust but tried to follow existing code patterns and pay attention to compiler hints. - The change in nodejs binding appeared to be necessary to get compilation to work, presumably this could actual work properly by returning some kind of NAPI JS object of the stats data? - I am unsure of what to do with the remote/table.rs changes - necessarily for compilation to work; I assume this is related to LanceDB cloud, but unsure the best way to handle that at this point. Proof of function: ```python import pandas as pd import lancedb db = lancedb.connect("/tmp/test.db") test_data = pd.DataFrame( { "title": ["Hello", "Test Document", "Example", "Data Sample", "Last One"], "id": [1, 2, 3, 4, 5], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", ], } ) table = db.create_table("documents", data=test_data, exist_ok=True, mode="overwrite") update_data = pd.DataFrame( { "title": [ "Hello, World", "Test Document, it's good", "Example", "Data Sample", "Last One", "New One", ], "id": [1, 2, 3, 4, 5, 6], "content": [ "World", "This is a test", "Another example", "More test data", "Final entry", "New content", ], } ) stats = ( table.merge_insert(on="id") .when_matched_update_all() .when_not_matched_insert_all() .execute(update_data) ) print(stats) ``` returns ``` {'num_inserted_rows': 1, 'num_updated_rows': 5, 'num_deleted_rows': 0} ``` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit - New Features - Merge-insert operations now return detailed statistics, including counts of inserted, updated, and deleted rows. - Bug Fixes - Tests updated to validate returned merge-insert statistics for accuracy. - Documentation - Method documentation improved to reflect new return values and clarify merge operation results. - Added documentation for the new `MergeStats` interface detailing operation statistics. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-05-01 10:00:20 -07:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363 ) * Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-29 15:19:08 -02:30
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353 ) add the tag related API to list existing tags, attach tag to a version, update the tag version, delete tag, get the version of the tag, and checkout the version that the tag bounded to. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-28 10:04:46 -07:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338 ) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:41:21 -02:30
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
BubbleCal	2248aa9508	fix: bugs for new FTS APIs (#2314 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities with support for phrase queries, fuzzy matching, boosting, and multi-column matching. - Search methods now accept full-text query objects directly, improving query flexibility and precision. - Python and JavaScript SDKs updated to handle full-text queries seamlessly, including async search support. - Tests - Added comprehensive tests covering fuzzy search, phrase search, and boosted queries to ensure robust full-text search functionality. - Documentation - Updated query class documentation to reflect new constructor options and removal of deprecated methods for clarity and simplicity. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-15 11:51:35 +08:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280 ) add analyze plan api to allow executing the queries and see runtime metrics. Which help identify the query IO overhead and help identify query slowness	2025-03-28 14:28:52 -07:00
BubbleCal	bdb6c09c3b	feat: support binary vector and IVF_FLAT in TypeScript (#2221 ) resolve #2218 --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:57:08 -07:00
Weston Pace	1a449fa49e	refactor: rename drop_db / drop_database to drop_all_tables, expose database from connection (#2098 ) If we start supporting external catalogs then "drop database" may be misleading (and not possible). We should be more clear that this is a utility method to drop all tables. This is also a nice chance for some consistency cleanup as it was `drop_db` in rust, `drop_database` in python, and non-existent in typescript. This PR also adds a public accessor to get the database trait from a connection. BREAKING CHANGE: the `drop_database` / `drop_db` methods are now deprecated.	2025-02-06 13:22:28 -08:00
Will Jones	e05c0cd87e	ci(node): check docs in CI (#2084 ) * Make `npm run docs` fail if there are any warnings. This will catch items missing from the API reference. * Add a check in our CI to make sure `npm run dos` runs without warnings and doesn't generate any new files (indicating it might be out-of-date. * Hide constructors that aren't user facing. * Remove unused enum `WriteMode`. Closes #2068	2025-01-30 16:06:06 -08:00
Will Jones	f059372137	feat: add `drop_index()` method (#2039 ) Closes #1665	2025-01-20 10:08:51 -08:00
Will Jones	db125013fc	docs: better formatting for Node API docs (#1892 ) * Sets `"useCodeBlocks": true` * Adds a post-processing script `nodejs/typedoc_post_process.js` that puts the parameter description on the same line as the parameter name, like it is in our Python docs. This makes the text hierarchy clearer in those sections and also makes the sections shorter.	2024-12-09 17:04:09 -08:00
BubbleCal	bf7d2d6fb0	docs: update FTS docs for JS SDK (#1634 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-09-13 05:48:29 -07:00
Cory Grinstead	69295548cc	docs: minor updates for js migration guides (#1451 ) Co-authored-by: Will Jones <willjones127@gmail.com>	2024-07-22 10:26:49 -07:00
Weston Pace	e21b56293c	docs: add a reference to @lancedb/lance in the docs (#1166 ) We aren't yet ready to switch over the examples since almost all JS examples rely on embeddings and we haven't yet ported those over. However, this makes it possible for those that are interested to start using `@lancedb/lancedb`	2024-04-05 16:34:39 -07:00

44 Commits