lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-27 23:12:58 +00:00

Author	SHA1	Message	Date
Ryan Green	96d534d4bc	feat: add retries to remote client for requests with stream bodies (#2349) Closes https://github.com/lancedb/lancedb/issues/2307 * Adds retries to remote operations with stream bodies (add, merge_insert) * Change default retryable status codes to 409, 429, 500, 502, 503, 504 * Don't retry add or merge_insert operations on 5xx responses Notes: * Supporting retries on stream bodies means we have to buffer the body into memory so it can be cloned on retry. This will impact memory use patterns for the remote client. This buffering can be disabled by disabling retries (i.e. setting retries to 0 in RetryConfig) * It does not seem that retry config can be specified by env vars as the documentation suggests. I added a follow-up issue [here](https://github.com/lancedb/lancedb/issues/2350) ## Summary by CodeRabbit - New Features - Enhanced retry support for remote requests with configurable limits and exponential backoff with jitter. - Added robust retry logic for streaming data uploads, enabling retries with buffered data to ensure reliability. - Bug Fixes - Improved error handling and retry behavior for HTTP status codes 409 and 504. - Refactor - Centralized and modularized HTTP request sending and retry logic across remote database and table operations. - Streamlined request ID management for improved traceability. - Simplified error message construction in index waiting functionality. - Tests - Added a test verifying merge-insert retries on HTTP 409 responses.	2025-04-22 15:40:44 -02:30
Ryan Green	3ae90dde80	feat: add new table API to wait for async indexing (#2338) * Add new wait_for_index() table operation that polls until indices are created/fully indexed * Add an optional wait timeout parameter to all create_index operations * Python and NodeJS interfaces ## Summary by CodeRabbit - New Features - Added optional waiting for index creation completion with configurable timeout. - Introduced methods to poll and wait for indices to be fully built across sync and async tables. - Extended index creation APIs to accept a wait timeout parameter. - Bug Fixes - Added a new timeout error variant for improved error reporting on index operations. - Tests - Added tests covering successful index readiness waiting, timeout scenarios, and missing index cases.	2025-04-21 08:41:21 -02:30
Weston Pace	26080ee4c1	feat: add prewarm_index function (#2342) ## Summary by CodeRabbit - New Features - Added the ability to prewarm (load into memory) table indexes via new methods in Python, Node.js, and Rust APIs, potentially reducing cold-start query latency. - Bug Fixes - Ensured prewarming an index does not interfere with subsequent search operations. - Tests - Introduced new test cases to verify full-text search index creation, prewarming, and search functionalities in both Python and Node.js. - Chores - Updated dependencies for improved compatibility and performance. --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-04-17 15:14:36 -07:00
Lei Xu	4708b60bb1	chore: cargo update on main (#2331 ) Fix test failures on main	2025-04-12 09:00:47 -05:00
Lei Xu	080ea2f9a4	chore: fix 1.86 warnings (#2312 ) Fix rust 1.86 warnings	2025-04-12 08:29:10 -05:00
BubbleCal	ec8271931f	feat: support to create FTS index on list of strings (#2317) ## Summary by CodeRabbit - Chores - Updated internal library dependencies to the latest beta version for improved system stability. - Tests - Added automated tests to validate full-text search functionality on list-based text fields. - Refactor - Enhanced the search processing logic to provide robust support for list-type text data, ensuring more reliable results. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-08 14:12:35 +08:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288) Closes #2287 ## Summary by CodeRabbit - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling.	2025-04-04 12:34:41 -07:00
Weston Pace	a6d4125cbf	feat: upgrade lance to 0.25.3b2 (#2304) ## Summary by CodeRabbit - Chores - Updated core dependency versions to v0.25.3-beta.2. - Enabled additional functionality with a new "dynamodb" feature.	2025-04-02 14:22:30 -07:00
Will Jones	f091f57594	ci: fix lancedb musl builds (#2296) Fixes #2255 ## Summary by CodeRabbit - Chores - Enhanced the build process to improve performance and reliability across Linux platforms. - Updated environment settings for more accurate compiler integration. - Activated previously inactive build configurations to support advanced feature support. - Added support for the x86_64 architecture on Linux systems utilizing the musl C library.	2025-04-01 14:44:27 -07:00
Weston Pace	1ee63984f5	feat: allow FSB to be used for btree indices (#2297) We recently allowed this for lance but there was a check in lancedb as well that was preventing it ## Summary by CodeRabbit - New Features - Added support for indexing fixed-size binary data using B-tree structures for efficient data storage and retrieval. - Tests - Implemented automated tests to ensure the new binary indexing works correctly and meets the expected configuration.	2025-04-01 10:27:22 -07:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294) ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280) Add an analyze plan API that executes a query and reports its runtime metrics. These metrics help identify query IO overhead and the causes of slow queries.	2025-03-28 14:28:52 -07:00
LuQQiu	698f329598	feat: add explain plan remote api (#2263 ) Add explain plan remote api	2025-03-26 11:22:40 -07:00
BubbleCal	79fa745130	feat: upgrade lance to v0.25.1-beta.3 (#2276 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-26 23:14:27 +08:00
Will Jones	2bfdef2624	ci: refactor node releases (#2223 ) This PR fixes build issues associated with `aws-lc-rs`, while simplifying the build process. Previously, we used custom scripts for the musl and Windows ARM builds. These were complicated and prone to breaking. This PR switches to a setup that mirrors https://github.com/napi-rs/package-template/blob/main/.github/workflows/CI.yml. * linux glibc and musl builds now use the Docker images provided by the napi project * Windows ARM build now just cross compiles from Windows x64, which turns out to work quite well.	2025-03-21 10:56:29 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248 ) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Will Jones	440a466a13	ci: remove OpenSSL as dependency in favor of rustls (#2242 ) `object_store` already hard codes `rustls` as the TLS implementation, so we have been shipping a mix of `rustls` and `openssl`. For simplicity of builds, we should consolidate to one, and that has to be `rustls`.	2025-03-20 08:06:45 -07:00
BubbleCal	6c321c694a	feat: upgrade lance to 0.25.0-beta2 (#2220 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-13 14:12:54 -07:00
Bob Liu	5c00b2904c	feat: add get dataset method on NativeTable (#2021) I want to make the dataset method on the native table public, so that I can use more lance methods, such as order_by, which are not exposed in the lancedb crate.	2025-03-13 11:15:28 -07:00
Gagan Bhullar	14677d7c18	fix: metric type inconsistency (#2122 ) PR fixes #2113 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-12 10:28:37 -07:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([ { path: "price", dataType: "float" }, ]); ``` But this has some problems: 1. It wasn't clear which types were valid 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
Weston Pace	4a47150ae7	feat: upgrade to lance 0.24.1 (#2199 )	2025-03-10 15:18:37 -07:00
Weston Pace	bc49c4db82	feat: respect datafusion's batch size when running as a table provider (#2187 ) Datafusion makes the batch size available as part of the `SessionState`. We should use that to set the `max_batch_length` property in the `QueryExecutionOptions`.	2025-03-07 05:53:36 -08:00
BubbleCal	35e5b84ba9	chore: upgrade lance to 0.24.0-beta.1 (#2171 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-03 12:32:12 +08:00
Weston Pace	fa1b9ad5bd	fix: don't use with_schema to remove schema metadata (#2162 ) It seems that `RecordBatch::with_schema` is unable to remove schema metadata from a batch. It fails with the error `target schema is not superset of current schema`. I'm not sure how the `test_metadata_erased` test is passing. Strangely, the metadata was not present by the time the batch arrived at the metadata eraser. I think maybe the schema metadata is only present in the batch if there is a filter. I've created a new unit test that makes sure the metadata is erased if we have a filter also	2025-02-27 10:24:00 -08:00
BubbleCal	8877eb020d	feat: record the server version for remote table (#2147 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-27 15:55:59 +08:00
Weston Pace	4ba5326880	feat: reapply upgrade lance to v0.23.3-beta.1 (#2157 ) This reverts commit `2f0c5baea2`. --------- Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-02-26 11:44:11 -08:00
Weston Pace	2f0c5baea2	Revert "chore: upgrade lance to v0.23.3-beta.1 (#2153 )" This reverts commit `a63dd66d41`.	2025-02-26 10:14:29 -08:00
BubbleCal	a63dd66d41	chore: upgrade lance to v0.23.3-beta.1 (#2153 ) this fixes a bug in SQ, see https://github.com/lancedb/lance/pull/3476 for more details --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: Lu Qiu <luqiujob@gmail.com>	2025-02-26 09:52:28 -08:00
Weston Pace	d6b3ccb37b	feat: upgrade lance to 0.23.2 (#2152 ) This also changes the pylance pin from `==0.23.2` to `~=0.23.2` which should allow the pylance dependency to float a little. The pylance dependency is actually not used for much anymore and so it should be tolerant of patch changes.	2025-02-26 09:02:51 -08:00
Weston Pace	c4f99e82e5	feat: push filters down into DF table provider (#2128 )	2025-02-25 14:46:28 -08:00
BubbleCal	a99a450f2b	fix: flat FTS panic with prefilter and update lance (#2144 ) this is fixed in lance so upgrade lance to 0.23.2-beta1	2025-02-24 14:34:00 +08:00
BubbleCal	784f00ef6d	chore: update Cargo.lock (#2137 )	2025-02-21 12:27:10 +08:00
BubbleCal	14c9ff46d1	feat: support multivector on remote table (#2045 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-20 11:34:51 +08:00
BubbleCal	00514999ff	feat: upgrade lance to 0.23.1-beta.4 (#2121 ) this also upgrades object_store to 0.11.0, snafu to 0.8 Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-16 14:53:26 +08:00
BubbleCal	3b19e96ae7	fix: panic when field id doesn't equal to field index (#2116 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-13 12:38:35 +08:00
BubbleCal	3490f3456f	chore: upgrade lance to 0.23.1-beta.2 (#2109 )	2025-02-11 23:57:56 +08:00
Wyatt Alt	3e3118f85c	feat: update lance dependency to 0.23.1-beta.1 (#2102 )	2025-02-07 10:56:01 -08:00
Will Jones	e7574698eb	feat: upgrade Lance to 0.23.0 (#2101 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0	2025-02-07 07:58:07 -08:00
Weston Pace	4e5fbe6c99	fix: ensure metadata erased from schema call in table provider (#2099 ) This also adds a basic unit test for the table provider	2025-02-06 15:30:20 -08:00
Weston Pace	6bf742c759	feat: expose table trait (#2097 ) Similar to `c269524b2f` this PR reworks and exposes an internal trait (this time `TableInternal`) to be a public trait. These two PRs together should make it possible for others to integrate LanceDB on top of other catalogs. This PR also adds a basic `TableProvider` implementation for tables, although some work still needs to be done here (pushdown not yet enabled).	2025-02-05 18:13:51 -08:00
Weston Pace	c269524b2f	feat!: refactor ConnectionInternal into a Database trait (#2067) This opens the door for more custom database implementations than the two we have today. The biggest change should be invisible: `ConnectionInternal` has been renamed to `Database`, made public, and refactored. However, there are a few breaking changes. `data_storage_version` and `enable_v2_manifest_paths` have been moved from options on `create_table` to options for the database which are now set via `storage_options`. Before: ``` db = connect(uri) tbl = db.create_table("my_table", data, data_storage_version="legacy", enable_v2_manifest_paths=True) ``` After: ``` db = connect(uri, storage_options={ "new_table_enable_v2_manifest_paths": "true", "new_table_data_storage_version": "legacy" }) tbl = db.create_table("my_table", data) ``` BREAKING CHANGE: the data_storage_version, enable_v2_manifest_paths options have moved from options to create_table to storage_options. BREAKING CHANGE: the use_legacy_format option has been removed, data_storage_version has replaced it for some time now	2025-02-04 14:35:14 -08:00
Rob Meng	32716adaa3	chore: bump lance version (#2092 )	2025-02-04 12:25:05 -05:00
Will Jones	2f39274a66	feat: upgrade lance to 0.23.0-beta.4 (#2089 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0-beta.4	2025-01-31 17:20:15 -08:00
Will Jones	e05c0cd87e	ci(node): check docs in CI (#2084) * Make `npm run docs` fail if there are any warnings. This will catch items missing from the API reference. * Add a check in our CI to make sure `npm run docs` runs without warnings and doesn't generate any new files (indicating it might be out-of-date). * Hide constructors that aren't user facing. * Remove unused enum `WriteMode`. Closes #2068	2025-01-30 16:06:06 -08:00
Will Jones	a677a4b651	ci: fix arm64 windows cross compile build (#2081) * Adds a CI job to check the cross compiled Windows ARM build. * Didn't replace the test build because we need a native build to run tests. But for some reason (I forget why) we need a cross compiled build for nodejs. * Pinned crunchy to work around https://github.com/eira-fransham/crunchy/issues/13 This is needed to fix the failure from https://github.com/lancedb/lancedb/actions/runs/13020773184/job/36320719331	2025-01-30 09:24:20 -08:00
Will Jones	6526d6c3b1	ci(rust): caching improvements (up to 2.8x faster builds) (#2075 ) Some Rust jobs (such as [Rust/linux](https://github.com/lancedb/lancedb/actions/runs/13019232960/job/36315830779)) take almost minutes. This can be a bit of a bottleneck. * Two fixes to make caches more effective * Check in `Cargo.lock` so that dependencies don't change much between runs * Added a new CI job to validate we can build without a lockfile * Altered build commands so they don't have contradictory features and therefore don't trigger multiple builds Sadly, I don't think there's much to be done for windows-arm64, as much of the compile time is because the base image is so bare we need to install the build tools ourselves.	2025-01-29 08:26:45 -08:00
Lei Xu	148ed82607	Bump Lance version to 0.5.3 (#250 )	2023-07-04 08:34:41 -07:00
Rob Meng	a6bdffd75b	bump lance to 0.5.2, make object store construction hook public (#237 ) * bump to 0.5.2 to pick up S3 auth fixes * make `open_table_params` a public attribute * add `open_table_with_params` on `Database`	2023-06-29 18:50:02 -04:00
Rob Meng	0f58bd7af2	allow passing ReadParams to dataset when opening a table (#234 ) Plumb thru object store construction hook from [lance/pull/1014](https://github.com/lancedb/lance/pull/1014)	2023-06-28 11:20:09 -04:00
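
The retry behavior described in commit 96d534d4bc (retryable status codes 409, 429, 500, 502, 503, 504; no retries for add or merge_insert on 5xx responses; stream bodies buffered in memory so they can be resent; exponential backoff with jitter) can be illustrated with a minimal Python sketch. This is not the lancedb implementation. Every name below (`send_with_retries`, `should_retry`, `backoff_delay`) and the base and cap delay values are hypothetical, chosen only for illustration, and the "full jitter" variant shown is one common choice that the commit message does not specify.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Tuple

# Default retryable status codes as listed in the commit message.
RETRYABLE_STATUSES = frozenset({409, 429, 500, 502, 503, 504})


def backoff_delay(attempt: int, base: float = 0.25, cap: float = 10.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter (base and cap are assumed values)."""
    return rng() * min(cap, base * 2 ** attempt)


def should_retry(status: int, is_write: bool) -> bool:
    """Decide whether a response status warrants another attempt."""
    if status not in RETRYABLE_STATUSES:
        return False
    # Writes with stream bodies (add, merge_insert) are not retried on 5xx.
    if is_write and 500 <= status < 600:
        return False
    return True


def send_with_retries(send: Callable[[Iterator[bytes]], int],
                      body_chunks: Iterable[bytes],
                      retries: int = 3,
                      is_write: bool = True,
                      sleep: Callable[[float], None] = time.sleep
                      ) -> Tuple[int, int]:
    """Send a streamed body, retrying on retryable statuses.

    Returns (final_status, number_of_retries_performed).
    """
    if retries > 0:
        # Buffer the whole stream in memory so it can be replayed on retry.
        buffered = b"".join(body_chunks)
        make_body = lambda: iter([buffered])
    else:
        # Retries disabled: pass the stream through without buffering.
        make_body = lambda: iter(body_chunks)

    attempt = 0
    while True:
        status = send(make_body())
        if attempt >= retries or not should_retry(status, is_write):
            return status, attempt
        sleep(backoff_delay(attempt))
        attempt += 1
```

The sketch makes the memory trade-off noted in the commit visible: the body is joined into a single bytes object only when `retries > 0`, so setting retries to 0 avoids the buffering at the cost of never resending.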

62 Commits