lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-19 13:00:40 +00:00

Author	SHA1	Message	Date
Brendan Clement	1df20aed81	docs(nodejs): clarify Table.add progress dispatch semantics	2026-05-18 09:53:20 -07:00
Brendan Clement	cb811428f9	fix(nodejs): swallow errors thrown from add progress callback	2026-05-18 09:14:25 -07:00
Brendan Clement	aed71c3bc0	feat(nodejs): add Table.add progress callback	2026-05-18 09:08:58 -07:00
Heng Ge	0d30b31998	feat: support setting LSM write spec for a table (#3396 ) ## Summary Split out from #3354 Adds `LsmWriteSpec` and `Table::set_lsm_write_spec` / `unset_lsm_write_spec` to install and clear the spec that selects Lance's MemWAL LSM-style write path for `merge_insert`. `LsmWriteSpec` offers three sharding strategies, all built on Lance's `InitializeMemWalBuilder`: - `LsmWriteSpec::bucket(column, num_buckets)` — hash-bucket sharding by the single-column unenforced primary key. - `LsmWriteSpec::identity(column)` — identity sharding by the raw value of a scalar column. - `LsmWriteSpec::unsharded()` — a single MemWAL shard. Each can be refined with `with_maintained_indexes(...)` (indexes the MemWAL keeps up to date as rows are appended) and `with_writer_config_defaults(...)` (default `ShardWriter` configuration recorded in the MemWAL index, so every writer starts from the same defaults). All variants require the table to have an unenforced primary key. - `set_lsm_write_spec` installs the spec by initializing the MemWAL index; `unset_lsm_write_spec` removes it (dropping the MemWAL index), reverting to the standard `merge_insert` path. `unset` is idempotent. - Bindings: Python (`LsmWriteSpec.bucket` / `.identity` / `.unsharded`, `set_lsm_write_spec` / `unset_lsm_write_spec`) and TypeScript (`setLsmWriteSpec` with `specType` `"bucket"` / `"identity"` / `"unsharded"`). `RemoteTable` returns `NotSupported`. The actual `merge_insert` LSM dispatch and `ShardWriter` write path are a follow-up — this PR only installs and clears the spec.	2026-05-18 00:11:33 -07:00
Heng Ge	6a431ff0a0	feat: support setting unenforced primary key (#3394 ) ## Summary Adds `Table::set_unenforced_primary_key` — records a single column as the table's unenforced primary key in Lance schema field metadata. "Unenforced" means LanceDB does not check uniqueness on write; the key is metadata that `merge_insert` consumes. - Single-column only; the column must exist and have a supported dtype (Int32, Int64, Utf8, LargeUtf8, Binary, LargeBinary, FixedSizeBinary). The API accepts an iterable for binding ergonomics but requires exactly one column — compound keys are rejected. - The primary key is immutable: calling this on a table that already has an unenforced primary key is rejected. Concurrent writers racing to set the key fail at commit time rather than silently overriding it. - `RemoteTable` returns `NotSupported`. - Bindings: Python (`AsyncTable`, `LanceTable`, `RemoteTable`) and TypeScript (`Table.setUnenforcedPrimaryKey`). ## Context Split out from #3354 per review feedback, so the unenforced primary key and the `merge_insert` sharding spec land as separate reviewable PRs. No Lance dependency bump — `main` is already on v7.0.0-beta.10, which includes the field-metadata round-trip fix the API relies on. Enforcing primary-key immutability at the Lance commit layer (so the cross-column concurrent race is also rejected) is a companion Lance change: lance-format/lance#6810.	2026-05-16 23:12:55 -07:00
Brendan Clement	011fdd5c94	feat(nodejs): add prewarmData method on Table (#3374 ) ### Summary - Closes #3362 - Adds `prewarmData(columns?: string[])` to the Node bindings, mirroring the Rust and Python implementations ### Testing - [x] `npm run build` (regenerates the napi `.node` module + TS declarations) - [x] `npm run lint` - [x] `npm test - [ ] live test against remote table - just waiting for my dev stack to get created ### Documentation - updated docs	2026-05-12 15:29:48 -07:00
Pratik Dey	d1d720d08a	feat(nodejs): support field/data type input in add_columns() method (#3114 ) Add support for passing field/data type information into add_columns() method, bringing parity with Python bindings. The method now accepts: - AddColumnsSql[] - SQL expressions (existing functionality) - Field - single Arrow field with explicit data type - Field[] - array of Arrow fields with explicit data types - Schema - Arrow schema with explicit data types New columns added via Field/Schema are initialized with null values. All field-based columns must be nullable due to null initialization. Resolves #3107 --------- Signed-off-by: Pratik <pratikrocks.dey11@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-03-13 12:57:14 -07:00
Weston Pace	216c1b5f77	docs: remove experimental label from optimize and warn about delete_unverified (#3128 ) ## Summary - Removes the "Experimental API" section from `optimize` method documentation across Rust, Python, and TypeScript - Adds a warning to `delete_unverified` documentation in all bindings: this should only be set to true if you can guarantee no other process is working on the dataset, otherwise it could be corrupted - Fixes a typo ("shoudl" → "should") Closes #3125 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-12 14:37:42 +08:00
Jack Ye	0859312b83	feat: add initial and latest storage options apis (#2966 ) Expose `initial_storage_options()` and `latest_storage_options()` in lance Dataset, in lancedb rust, python and typescript SDKs. --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-02-04 10:31:39 -08:00
Vedant Madane	d3e15f3e17	fix(node): allow bigint[] for takeRowIds (#2916 ) ## Summary This PR changes takeRowIds to accept bigint[] instead of number[], matching the type of _rowid returned by withRowId(). ## Problem When retrieving row IDs using \withRowId()\ and querying them back with takeRowIds(), users get an error because: 1. _rowid values are returned as JavaScript bigint 2. takeRowIds() expected number[] 3. NAPI failed to convert: Error: Failed to convert napi value BigInt into rust type i64 ## Reproduction \\\js import lancedb from '@lancedb/lancedb'; const db = await lancedb.connect('memory://'); const table = await db.createTable('test', [{ id: 1, vector: [1.0, 2.0] }]); const results = await table.query().withRowId().toArray(); const rowIds = results.map(row => row._rowid); console.log('types:', rowIds.map(id => typeof id)); // ['bigint'] await table.takeRowIds(rowIds).toArray(); // âŒ Error before fix \\\ ## Solution - Updated TypeScript signature from takeRowIds(rowIds: number[]) to takeRowIds(rowIds: bigint[]) - Updated Rust NAPI binding to accept Vec<BigInt> and convert using get_u64() Fixes #2722 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2026-02-03 10:09:51 -08:00
Will Jones	ad09234d59	feat: allow setting `train=False` and `name` on indices (#2586 ) Enables two new parameters when building indices: * `name`: Allows explicitly setting a name on the index. Default is `{col_name}_idx`. * `train` (default `True`): When set to `False`, an empty index will be immediately created. The upgrade of Lance means there are also additional behaviors from `cd76a993b8`: * When a scalar index is created on a Table, it will be kept around even if all rows are deleted or updated. * Scalar indices can be created on empty tables. They will default to `train=False` if the table is empty. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-08-15 14:00:26 -07:00
Weston Pace	ed640a76d9	feat: add take_offsets and take_row_ids (#2584 ) These operations have existed in lance for a long while and many users need to drop down to lance for this capability. This PR adds the API and implements it using filters (e.g. `_rowid IN (...)`) so that in doesn't currently add any load to `BaseTable`. I'm not sure that is sustainable as base table implementations may want to specialize how they handle this method. However, I figure it is a good starting point. In addition, unlike Lance, this API does not currently guarantee anything about the order of the take results. This is necessary for the fallback filter approach to work (SQL filters cannot guarantee result order)	2025-08-15 06:48:24 -07:00
BubbleCal	96c66fd087	feat: support multivector for JS SDK (#2527 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-07-22 21:19:34 +08:00
Benjamin Schmidt	4c999fb651	chore: fix cleanupOlderThan docs (#2504 ) Thanks for all your work. The docstring for `OptimizeOptions ` seems to reference a non-existent method on `Table`. I believe this is the correct example for `cleanupOlderThan`. This also appears in the generated docs, but I assume they live downstream from this code?	2025-07-15 16:23:10 +08:00
LuQQiu	ed594b0f76	feat: return version for all write operations (#2368 ) return version info for all write operations (add, update, merge_insert and column modification operations) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Table modification operations (add, update, delete, merge, add/alter/drop columns) now return detailed result objects including version numbers and operation statistics. - Result objects provide clearer feedback such as rows affected and new table version after each operation. - Documentation - Updated documentation to describe new result objects and their fields for all relevant table operations. - Added documentation for new result interfaces and updated method return types in Node.js and Python APIs. - Tests - Enhanced test coverage to assert correctness of returned versioning and operation metadata after table modifications. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-05-05 14:25:34 -07:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363 ) * Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - Tests - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-29 15:19:08 -02:30
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353 ) add the tag related API to list existing tags, attach tag to a version, update the tag version, delete tag, get the version of the tag, and checkout the version that the tag bounded to. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. - Enabled checking out a table by either version number or tag name. - Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. - Bug Fixes - None. - Documentation - Updated documentation to describe the new tagging features, including usage examples. - Tests - Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-28 10:04:46 -07:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338 ) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2025-04-21 08:41:21 -02:30
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
BubbleCal	2248aa9508	fix: bugs for new FTS APIs (#2314 ) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - New Features - Enhanced full-text search capabilities with support for phrase queries, fuzzy matching, boosting, and multi-column matching. - Search methods now accept full-text query objects directly, improving query flexibility and precision. - Python and JavaScript SDKs updated to handle full-text queries seamlessly, including async search support. - Tests - Added comprehensive tests covering fuzzy search, phrase search, and boosted queries to ensure robust full-text search functionality. - Documentation - Updated query class documentation to reflect new constructor options and removal of deprecated methods for clarity and simplicity. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-15 11:51:35 +08:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208 ) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([ path: "price", dataType: "float" ]); ``` But this has some problems: 1. It wasn't clear what were valid types 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
Will Jones	e05c0cd87e	ci(node): check docs in CI (#2084 ) * Make `npm run docs` fail if there are any warnings. This will catch items missing from the API reference. * Add a check in our CI to make sure `npm run dos` runs without warnings and doesn't generate any new files (indicating it might be out-of-date. * Hide constructors that aren't user facing. * Remove unused enum `WriteMode`. Closes #2068	2025-01-30 16:06:06 -08:00
Will Jones	15f8f4d627	ci: check license headers (#2076 ) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
Will Jones	0a9e1eab75	fix(node): `createTable()` should save embeddings, and `mergeInsert` should use them (#2065 ) * `createTable()` now saves embeddings in the schema metadata. Previously, it would drop them. (`createEmptyTable()` was already tested and worked.) * `mergeInsert()` now uses embeddings. Fixes #2066	2025-01-28 12:38:50 -08:00
Will Jones	f059372137	feat: add `drop_index()` method (#2039 ) Closes #1665	2025-01-20 10:08:51 -08:00
Bert	cb9a00a28d	feat: add list_versions to typescript, rust and remote python sdks (#1850 ) Will require update to lance dependency to bring in this change which makes the version serializable https://github.com/lancedb/lance/pull/3143	2024-11-21 13:35:14 -05:00
Will Jones	2a6586d6fb	feat: add flag to enable faster manifest paths (#1612 ) The new V2 manifest path scheme makes discovering the latest version of a table constant time on object stores, regardless of the number of versions in the table. See benchmarks in the PR here: https://github.com/lancedb/lance/pull/2798 Closes #1583	2024-09-09 11:34:36 -07:00
Gagan Bhullar	d2caa5e202	feat(nodejs): add delete unverified (#1530 ) PR fixes part of #1527	2024-08-14 08:53:53 -07:00
BubbleCal	f9d5fa88a1	feat!: migrate FTS from tantivy to lance-index (#1483 ) Lance now supports FTS, so add it into lancedb Python, TypeScript and Rust SDKs. For Python, we still use tantivy based FTS by default because the lance FTS index now misses some features of tantivy. For Python: - Support to create lance based FTS index - Support to specify columns for full text search (only available for lance based FTS index) For TypeScript: - Change the search method so that it can accept both string and vector - Support full text search For Rust - Support full text search The others: - Update the FTS doc BREAKING CHANGE: - for Python, this renames the attached score column of FTS from "score" to "_score", this could be a breaking change for users that rely the scores --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-08-08 15:33:15 +08:00
Cory Grinstead	fbfe2444a8	feat(nodejs): huggingface compatible transformers (#1462 )	2024-07-26 12:54:15 -07:00
Cory Grinstead	492d0328fe	chore: update readme to point to lancedb package (#1470 )	2024-07-23 13:46:32 -07:00
Cory Grinstead	fdc949bafb	feat(nodejs): `update({values \| valuesSql})` (#1439 )	2024-07-10 14:09:39 -05:00
Cory Grinstead	b8ccea9f71	feat(nodejs): make tbl.search chainable (#1421 ) so this was annoying me when writing the docs. for a `search` query, one needed to chain `async` calls. ```ts const res = await (await tbl.search("greetings")).toArray() ``` now the promise will be deferred until the query is collected, leading to a more functional API ```ts const res = await tbl.search("greetings").toArray() ```	2024-07-02 14:31:57 -05:00
Cory Grinstead	79a1667753	feat(nodejs): feature parity [6/N] - make public interface work with multiple arrow versions (#1392 ) previously we didnt have great compatibility with other versions of apache arrow. This should bridge that gap a bit. depends on https://github.com/lancedb/lancedb/pull/1391 see actual diff here https://github.com/universalmind303/lancedb/compare/query-filter...universalmind303:arrow-compatibility	2024-06-25 11:10:08 -05:00
Cory Grinstead	55f88346d0	feat(nodejs): table.indexStats (#1361 ) closes https://github.com/lancedb/lancedb/issues/1359	2024-06-21 17:06:52 -05:00
Cory Grinstead	b3e5ac6d2a	feat(nodejs): feature parity [2/N] - add `table.name` and `lancedb.connect({args})` (#1380 ) depends on https://github.com/lancedb/lancedb/pull/1378 see proper diff here https://github.com/universalmind303/lancedb/compare/remote-table-node...universalmind303:lancedb:table-name	2024-06-21 11:38:26 -05:00
Cory Grinstead	e7022b990e	feat(nodejs): feature parity [1/N] - remote table (#1378 ) closes https://github.com/lancedb/lancedb/issues/1362	2024-06-17 15:23:27 -05:00
Cory Grinstead	bc19a75f65	feat(nodejs): merge insert (#1351 ) closes https://github.com/lancedb/lancedb/issues/1349	2024-06-11 15:05:15 -05:00
Cory Grinstead	70f92f19a6	feat(nodejs): table.search functionality (#1341 ) closes https://github.com/lancedb/lancedb/issues/1256	2024-06-04 14:04:03 -05:00
Cory Grinstead	d9fb6457e1	fix(nodejs): better support for f16 and f64 (#1343 ) closes https://github.com/lancedb/lancedb/issues/1292 closes https://github.com/lancedb/lancedb/issues/1293	2024-06-04 13:41:21 -05:00
paul n walsh	7c133ec416	feat(nodejs): table.toArrow function (#1282 ) Addresses https://github.com/lancedb/lancedb/issues/1254. --------- Co-authored-by: universalmind303 <cory.grinstead@gmail.com>	2024-05-31 13:24:21 -05:00
Cory Grinstead	bc139000bd	feat(nodejs): add compatibility across arrow versions (#1337 ) while adding some more docs & examples for the new js sdk, i ran across a few compatibility issues when using different arrow versions. This should fix those issues.	2024-05-29 17:36:34 -05:00
Cory Grinstead	dbea3a7544	feat: js embedding registry (#1308 ) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-05-29 13:12:19 -05:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257 ) The optimize function is pretty crucial for getting good performance when building a large scale dataset but it was only exposed in rust (many sync python users are probably doing this via to_lance today) This PR adds the optimize function to nodejs and to python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter since this one is very commonly used and the rest have sensible defaults and we don't really know why we would recommend different values for these defaults anyways.	2024-05-20 07:09:31 -07:00
Cory Grinstead	055efdcdb6	refactor(nodejs): use biomejs instead of eslint & prettier (#1304 ) I've been noticing a lot of friction with the current toolchain for '/nodejs'. Particularly with the usage of eslint and prettier. [Biome](https://biomejs.dev/) is an all in one formatter & linter that replaces the need for two different ones that can potentially clash with one another. I've been using it in the [nodejs-polars](https://github.com/pola-rs/nodejs-polars) repo for quite some time & have found it much more pleasant to work with. --- One other small change included in this PR: use [ts-jest](https://www.npmjs.com/package/ts-jest) so we can run our tests without having to rebuild typescript code first	2024-05-14 11:11:18 -05:00
Cory Grinstead	71323a064a	chore(nodejs): update docs on "table.ts" (#1255 ) closes https://github.com/lancedb/lancedb/issues/1007	2024-05-01 23:00:22 -05:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113 ) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074 )	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093 )	2024-04-05 16:32:15 -07:00
Weston Pace	47daf9b7b0	feat: add time travel operations to the async API (#1070 )	2024-04-05 16:32:15 -07:00

1 2

56 Commits