lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-27 15:12:53 +00:00

Author	SHA1	Message	Date
LuQQiu	ed594b0f76	feat: return version for all write operations (#2368) Return version info for all write operations (add, update, merge_insert and column modification operations). Summary by CodeRabbit: New Features: Table modification operations (add, update, delete, merge, add/alter/drop columns) now return detailed result objects including version numbers and operation statistics. Result objects provide clearer feedback such as rows affected and new table version after each operation. Documentation: Updated documentation to describe new result objects and their fields for all relevant table operations. Added documentation for new result interfaces and updated method return types in Node.js and Python APIs. Tests: Enhanced test coverage to assert correctness of returned versioning and operation metadata after table modifications.	2025-05-05 14:25:34 -07:00
Ryan Green	af54e0ce06	feat: add table stats API (#2363) Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations. Questions: This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? Notes: `bytes_on_disk` seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. Summary by CodeRabbit: New Features: Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. Enabled fetching of table statistics from remote sources through asynchronous requests. Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. Tests: Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables.	2025-04-29 15:19:08 -02:30
LuQQiu	a9311c4dc0	feat: add list/create/delete/update/checkout tag API (#2353) Add the tag-related API to list existing tags, attach a tag to a version, update the tag version, delete a tag, get the version of a tag, and check out the version that the tag is bound to. Summary by CodeRabbit: New Features: Introduced table version tagging, allowing users to create, update, delete, and list human-readable tags for specific table versions. Enabled checking out a table by either version number or tag name. Added new interfaces for tag management in both Python and Node.js APIs, supporting synchronous and asynchronous workflows. Bug Fixes: None. Documentation: Updated documentation to describe the new tagging features, including usage examples. Tests: Added comprehensive tests for tag creation, updating, deletion, listing, and version checkout by tag in both Python and Node.js environments.	2025-04-28 10:04:46 -07:00
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces. Summary by CodeRabbit: New Features: Added optional waiting for index creation completion with configurable timeout. Introduced methods to poll and wait for indices to be fully built across sync and async tables. Extended index creation APIs to accept a wait timeout parameter. Bug Fixes: Added a new timeout error variant for improved error reporting on index operations. Tests: Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases.	2025-04-21 08:41:21 -02:30
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342) Summary by CodeRabbit: New Features: Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. Bug Fixes: Ensured prewarming an index does not interfere with subsequent search operations. Tests: Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. Chores: Updated dependencies for improved compatibility and performance. --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Will Jones	15f8f4d627	ci: check license headers (#2076) Based on the same workflow in Lance.	2025-01-29 08:27:07 -08:00
Will Jones	f059372137	feat: add `drop_index()` method (#2039) Closes #1665	2025-01-20 10:08:51 -08:00
Will Jones	79eaa52184	feat: schema evolution APIs in all SDKs (#1851) * Support `add_columns`, `alter_columns`, `drop_columns` in Remote SDK and async Python * Add `data_type` parameter to node * Docs updates	2024-12-04 14:47:50 -08:00
Bert	cb9a00a28d	feat: add list_versions to typescript, rust and remote python sdks (#1850) Will require update to lance dependency to bring in this change which makes the version serializable https://github.com/lancedb/lance/pull/3143	2024-11-21 13:35:14 -05:00
Will Jones	a324f4ad7a	feat(node): enable logging and show full errors (#1775) This exposes the `LANCEDB_LOG` environment variable in node, so that users can now turn on logging. In addition, fixes a bug where only the top-level error from Rust was being shown. This PR makes sure the full error chain is included in the error message. In the future, this will be improved so that the error chain is set on the [cause](https://nodejs.org/api/errors.html#errorcause) property of JS errors (https://github.com/lancedb/lancedb/issues/1779). Fixes #1774	2024-10-29 15:13:34 -07:00
Will Jones	f958f4d2e8	feat: remote index stats (#1702) BREAKING CHANGE: the return value of `index_stats` method has changed and all `index_stats` APIs now take index name instead of UUID. Also several deprecated index statistics methods were removed. * Removes deprecated methods for individual index statistics * Aligns public `IndexStatistics` struct with API response from LanceDB Cloud. * Implements `index_stats` for remote Rust SDK and Python async API.	2024-09-27 12:10:00 -07:00
LuQQiu	abeaae3d80	feat!: upgrade Lance to 0.18.0 (#1657) BREAKING CHANGE: default file format changed to Lance v2.0. Upgrade Lance to 0.18.0 Change notes: https://github.com/lancedb/lance/releases/tag/v0.18.0	2024-09-19 10:50:26 -07:00
Will Jones	2a6586d6fb	feat: add flag to enable faster manifest paths (#1612) The new V2 manifest path scheme makes discovering the latest version of a table constant time on object stores, regardless of the number of versions in the table. See benchmarks in the PR here: https://github.com/lancedb/lance/pull/2798 Closes #1583	2024-09-09 11:34:36 -07:00
Gagan Bhullar	d2caa5e202	feat(nodejs): add delete unverified (#1530) PR fixes part of #1527	2024-08-14 08:53:53 -07:00
Lei Xu	2bdf0a02f9	feat!: upgrade lance to 0.16 (#1519)	2024-08-07 13:15:22 -07:00
Cory Grinstead	b8a1719174	feat(nodejs): catch unwinds in node bindings (#1414) This bumps the napi version to 2.16, which contains a few bug fixes. Additionally, it adds `catch_unwind` to any method that may unintentionally panic. `catch_unwind` will unwind the panics and return a regular JS error instead of panicking.	2024-07-01 09:28:10 -05:00
Cory Grinstead	55f88346d0	feat(nodejs): table.indexStats (#1361) closes https://github.com/lancedb/lancedb/issues/1359	2024-06-21 17:06:52 -05:00
Cory Grinstead	3cd84c9375	feat(nodejs): feature parity [4/N] - add 'name' to 'IndexConfig' for 'listIndices' (#1390) depends on https://github.com/lancedb/lancedb/pull/1386 see actual diff here https://github.com/universalmind303/lancedb/compare/create-table-args...universalmind303:list-indices-name	2024-06-21 15:45:02 -05:00
Cory Grinstead	b3e5ac6d2a	feat(nodejs): feature parity [2/N] - add `table.name` and `lancedb.connect({args})` (#1380) depends on https://github.com/lancedb/lancedb/pull/1378 see proper diff here https://github.com/universalmind303/lancedb/compare/remote-table-node...universalmind303:lancedb:table-name	2024-06-21 11:38:26 -05:00
Cory Grinstead	bc19a75f65	feat(nodejs): merge insert (#1351) closes https://github.com/lancedb/lancedb/issues/1349	2024-06-11 15:05:15 -05:00
Rob Meng	2e197ef387	feat: upgrade lance to 0.11.0 (#1317) upgrade lance and make fixes for the upgrade	2024-05-21 18:53:19 -04:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257) The optimize function is pretty crucial for getting good performance when building a large scale dataset, but it was only exposed in Rust (many sync Python users are probably doing this via to_lance today). This PR adds the optimize function to nodejs and to Python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter, since this one is very commonly used, the rest have sensible defaults, and we don't really know why we would recommend different values for these defaults anyway.	2024-05-20 07:09:31 -07:00
Weston Pace	968c62cb8f	feat: introduce ArrowNative wrapper struct for adding data that is already a RecordBatchReader (#1139) In `2de226220b` I added a new `IntoArrow` trait for adding data into a table. Unfortunately, it seems my approach for implementing the trait for "things that are already record batch readers" was flawed. This PR corrects that flaw and, conveniently, removes the need to box readers at all (though it is ok if you do).	2024-04-05 16:33:37 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074)	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093)	2024-04-05 16:32:15 -07:00
Weston Pace	47daf9b7b0	feat: add time travel operations to the async API (#1070)	2024-04-05 16:32:15 -07:00
Weston Pace	f822255683	feat: add create_index to the async python API (#1052) This also refactors the rust lancedb index builder API (and, correspondingly, the nodejs API)	2024-04-05 16:32:14 -07:00
Weston Pace	8033a44d68	feat: add support for add to async python API (#1037) In order to add support for `add` we needed to migrate the rust `Table` trait to a `Table` struct and `TableInternal` trait (similar to the way the connection is designed). While doing this we also cleaned up some inconsistencies between the SDKs: * Python and Node are garbage collected languages and it can be difficult to trigger something to be freed. The convention for these languages is to have some kind of close method. I added a close method to both the table and connection which will drop the underlying rust object. * We made significant improvements to table creation in `cc5f2136a6` for the `node` SDK. I copied these changes to the `nodejs` SDK. * The nodejs tables were using fs to create tmp directories and these were not getting cleaned up. This is mostly harmless but annoying and so I changed it up a bit to ensure we clean up tmp directories. * ~~countRows in the node SDK was returning `bigint`. I changed it to return `number`~~ (this actually happened in a previous PR) * Tables and connections now implement `std::fmt::Display` which is hooked into python's `__repr__`. Node has no concept of a regular "to string" function and so I added a `display` method. * Python method signatures are changing so that optional parameters are always `Optional[foo] = None` instead of something like `foo = False`. This is because we want those defaults to be in rust whenever possible (though we still need to mention the default in documentation). * I changed the python `AsyncConnection/AsyncTable` classes from abstract classes with a single implementation to just classes because we no longer have the remote implementation in python. Note: this does NOT add the `add` function to the remote table. This PR was already large enough, and the remote implementation is unique enough, that I am going to do all the remote stuff at a later date (we should have the structure in place and correct so there shouldn't be any refactor concerns) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:31:36 -07:00
Weston Pace	4299f719ec	feat: port create_table to the async python API and the remote rust API (#1031) I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking changes from sync to async python.	2024-04-05 16:31:36 -07:00
Rob Meng	f3de3d990d	chore: upgrade to lance 0.10.1 (#1034) upgrade to lance 0.10.1 and update doc string to reflect dynamic projection options	2024-04-05 16:31:36 -07:00
Will Jones	464a36ad38	feat: `{add|alter|drop}_columns` APIs (#1015) Initial work for #959. This exposes the basic functionality for each in all of the APIs. Will add user guide documentation in a later PR.	2024-04-05 16:30:47 -07:00
Weston Pace	2163502b31	refactor: rename the rust crate from vectordb to lancedb (#1012) This also renames the new experimental node package to lancedb. The classic node package remains named vectordb. The goal here is to avoid introducing piecemeal breaking changes to the vectordb crate. Instead, once the new API is stabilized, we will officially release the lancedb crate and deprecate the vectordb crate. The same pattern will eventually happen with the npm package vectordb.	2024-04-05 16:30:40 -07:00
Will Jones	c5b0934bfb	feat(node): add `read_consistency_interval` to Node and Rust (#1002) This PR adds the same consistency semantics as was added in #828. It does not add the same lazy-loading of tables, since that breaks some existing tests. This closes #998. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:30:40 -07:00
Weston Pace	cbc0c439ef	refactor: rust vectordb API stabilization of the Connection trait (#993) This is the start of a more comprehensive refactor and stabilization of the Rust API. The `Connection` trait is cleaned up to not require `lance` and to match the `Connection` trait in other APIs. In addition, the concrete implementation `Database` is hidden. BREAKING CHANGE: The struct `crate::connection::Database` is now gone. Several examples opened a connection using `Database::connect` or `Database::connect_with_params`. Users should now use `vectordb::connect`. BREAKING CHANGE: The `connect`, `create_table`, and `open_table` methods now all return a builder object. This means that a call like `conn.open_table(..., opt1, opt2)` will now become `conn.open_table(...).opt1(opt1).opt2(opt2).execute()`. In addition, the structure of options has changed slightly. However, no options capability has been removed. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:30:40 -07:00
Weston Pace	138fc3f66b	feat: add a filterable count_rows to all the lancedb APIs (#913) A `count_rows` method that takes a filter was recently added to `LanceTable`. This PR adds it everywhere else except `RemoteTable` (that will come soon).	2024-04-05 16:29:58 -07:00
Lei Xu	db4a979278	feat(napi): Provide a new createIndex API in the napi SDK. (#857)	2024-04-05 16:27:51 -07:00
Lei Xu	dfabbe9081	feat(rust): create index API improvement (#853) * Extract a minimal Table interface in Rust SDK * Make create_index composable in Rust. * Fix compiling issues from ffi	2024-04-05 16:27:51 -07:00
Lei Xu	efcaa433fe	feat: rework NodeJS SDK using napi (#847) Use Napi to write a Node.js SDK that follows Polars for better maintainability, while keeping most of the logic in Rust.	2024-04-05 16:27:51 -07:00

40 Commits
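
Note on 3ae90dde80 (#2338): the commit message describes a wait operation that polls until indices are fully built, with an optional timeout and a dedicated timeout error. The general pattern can be sketched as below. This is only an illustration under stated assumptions: `wait_until_ready`, `IndexWaitTimeout`, and `fake_index_ready` are hypothetical names invented for this sketch, not LanceDB's API, and the real implementation may differ.

```python
import time
from typing import Callable


class IndexWaitTimeout(Exception):
    """Hypothetical stand-in for the timeout error variant mentioned in #2338."""


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 0.01,
) -> int:
    """Poll is_ready() until it returns True or timeout_s elapses.

    Returns the number of polls performed. Illustrative only, not LanceDB code.
    """
    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if is_ready():
            return polls
        if time.monotonic() >= deadline:
            raise IndexWaitTimeout(f"not ready after {timeout_s}s ({polls} polls)")
        time.sleep(poll_interval_s)


# Simulated index status check that reports ready on the third call.
_checks = {"n": 0}


def fake_index_ready() -> bool:
    _checks["n"] += 1
    return _checks["n"] >= 3


polls_needed = wait_until_ready(fake_index_ready, timeout_s=1.0)
```

The status check always runs at least once before the deadline is consulted, so an index that is already ready succeeds even with a zero timeout.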