lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-05-23 15:00:39 +00:00

Author	SHA1	Message	Date
Will Jones	b3a4efd587	fix: revert change default read_consistency_interval=5s (#2327) This reverts commit `a547c523c2` (#2281). The current implementation can cause panics and performance degradation. I will bring this back with more testing in https://github.com/lancedb/lancedb/pull/2311 Summary by CodeRabbit: - Documentation - Enhanced clarity on read consistency settings with updated descriptions and default behavior. - Removed outdated warnings about eventual consistency from the troubleshooting guide. - Refactor - Streamlined the handling of the read consistency interval across integrations, now defaulting to "None" for improved performance. - Simplified internal logic to offer a more consistent experience. - Tests - Updated test expectations to reflect the new default representation for the read consistency interval. - Removed redundant tests related to "no consistency" settings for streamlined testing. --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-04-14 08:48:15 -07:00
Lei Xu	080ea2f9a4	chore: fix 1.86 warnings (#2312) Fix Rust 1.86 warnings.	2025-04-12 08:29:10 -05:00
Lance Release	56aa133ee6	Bump version: 0.19.0-beta.5 → 0.19.0-beta.6	2025-04-08 06:16:30 +00:00
BubbleCal	ec8271931f	feat: support creating an FTS index on a list of strings (#2317) Summary by CodeRabbit: - Chores - Updated internal library dependencies to the latest beta version for improved system stability. - Tests - Added automated tests to validate full-text search functionality on list-based text fields. - Refactor - Enhanced the search processing logic to provide robust support for list-type text data, ensuring more reliable results. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-08 14:12:35 +08:00
Lance Release	c298482ee1	Bump version: 0.19.0-beta.4 → 0.19.0-beta.5	2025-04-04 21:49:53 +00:00
Will Jones	657843d9e9	perf: remove redundant checkout latest (#2310) This bug was introduced in https://github.com/lancedb/lancedb/pull/2281, likely during a rebase when fixing merge conflicts. Summary by CodeRabbit: - Refactor - Updated the refresh process so that reloading now uses the existing dataset version instead of automatically updating to the latest version. This change may affect workflows that rely on immediate data updates during refresh. - New Features - Introduced a new module for tracking I/O statistics in object store operations, enhancing monitoring capabilities. - Added a new test module to validate the functionality of the dataset operations. - Bug Fixes - Reintroduced the `write_options` method in the `CreateTableBuilder`, ensuring consistent functionality across different builder variants.	2025-04-04 12:56:02 -07:00
Will Jones	1cd76b8498	feat: add timeout to query execution options (#2288) Closes #2287. Summary by CodeRabbit: - New Features - Added configurable timeout support for query executions. Users can now specify maximum wait times for queries, enhancing control over long-running operations across various integrations. - Tests - Expanded test coverage to validate timeout behavior in both synchronous and asynchronous query flows, ensuring timely error responses when query execution exceeds the specified limit. - Introduced a new test suite to verify query operations when a timeout is reached, checking for appropriate error handling.	2025-04-04 12:34:41 -07:00
Lance Release	d4ea50fba1	Bump version: 0.19.0-beta.3 → 0.19.0-beta.4	2025-04-02 21:23:19 +00:00
Lance Release	bd62c2384f	Bump version: 0.19.0-beta.2 → 0.19.0-beta.3	2025-04-02 09:28:14 +00:00
BubbleCal	e52ac79c69	fix: can't do structured FTS in python (#2300) Support for it was missing from the `search()` API, and there were some pydantic errors. Summary by CodeRabbit: - New Features - Enhanced full-text search capabilities by incorporating additional parameters, enabling more flexible query definitions. - Extended table search functionality to support full-text queries alongside existing search types. - Tests - Introduced new tests that validate both structured and conditional full-text search behaviors. - Expanded test coverage for various query types, including MatchQuery, BoostQuery, MultiMatchQuery, and PhraseQuery. - Bug Fixes - Fixed a logic issue in query processing to ensure correct handling of full-text search queries. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-04-02 17:27:15 +08:00
Lance Release	a505bc3965	Bump version: 0.19.0-beta.1 → 0.19.0-beta.2	2025-04-01 17:28:21 +00:00
Weston Pace	1ee63984f5	feat: allow FSB to be used for btree indices (#2297) We recently allowed this in Lance, but a check in LanceDB was still preventing it. Summary by CodeRabbit: - New Features - Added support for indexing fixed-size binary data using B-tree structures for efficient data storage and retrieval. - Tests - Implemented automated tests to ensure the new binary indexing works correctly and meets the expected configuration.	2025-04-01 10:27:22 -07:00
Lance Release	e4485a630e	Bump version: 0.19.0-beta.0 → 0.19.0-beta.1	2025-04-01 14:26:47 +00:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294) Summary by CodeRabbit: - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
Lance Release	e67cd0baf9	Bump version: 0.18.3-beta.0 → 0.19.0-beta.0	2025-03-30 18:04:32 +00:00
LuQQiu	b9bdb8d937	fix: fix remote restore api to always checkout latest version (#2291) Fix restore to always check out the latest version, following the local restore API implementation (`a1d1833a40/rust/lancedb/src/table.rs (L1910)`). Otherwise: table.create_table -> version 1; table.add_table -> version 2; table.checkout(1), table.restore() -> the version remains at 1 (restore should call checkout_latest internally to move to the latest version and allow write operations); table.checkout_latest() -> version 3, and write operations are possible.	2025-03-29 22:46:57 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280) Add an analyze plan API that executes a query and reports its runtime metrics, which helps identify query I/O overhead and the causes of query slowness.	2025-03-28 14:28:52 -07:00
Will Jones	a547c523c2	feat!: change default read_consistency_interval=5s (#2281) Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency. Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we will never check for a new version by default. I think that default is contrary to most users' expectations.	2025-03-28 11:04:31 -07:00
Lance Release	346cbf8bf7	Bump version: 0.18.2-beta.0 → 0.18.3-beta.0	2025-03-28 16:03:31 +00:00
LuQQiu	cba14a5743	feat: add restore remote api (#2282)	2025-03-27 16:33:52 -07:00
LuQQiu	698f329598	feat: add explain plan remote api (#2263) Add explain plan remote API.	2025-03-26 11:22:40 -07:00
Lance Release	f97e751b3c	Bump version: 0.18.1 → 0.18.2-beta.0	2025-03-21 20:02:59 +00:00
Weston Pace	9403254442	feat: add to_query_object method (#2239) This PR adds a `to_query_object` method to the various query builders (hybrid queries are not supported yet). This makes it possible to inspect the query that is built. In addition, this PR does some normalization between the sync and async query paths. A few custom defaults were removed in favor of None (with the default getting set once, in Rust). Also, the synchronous to_batches method will now actually stream results. Also, the remote API now defaults to prefiltering.	2025-03-21 13:01:51 -07:00
Samuel Colvin	7982d5c082	fix: correct rust install docs (#2253) I'm pretty sure you mean `cargo add lancedb` here; `cargo install lancedb` fails right now.	2025-03-21 10:12:53 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Will Jones	440a466a13	ci: remove OpenSSL as dependency in favor of rustls (#2242) `object_store` already hard codes `rustls` as the TLS implementation, so we have been shipping a mix of `rustls` and `openssl`. For simplicity of builds, we should consolidate to one, and that has to be `rustls`.	2025-03-20 08:06:45 -07:00
Weston Pace	4e03ee82bc	refactor: rework catalog/database options (#2213) The `ConnectRequest` has a set of properties that only make sense for listing databases / catalogs and a set of properties that only make sense for remote databases. This PR reduces all options to a single `HashMap<String, String>`. This makes it easier to add new database / catalog implementations and makes it clearer to users which options are applicable in which situations. I don't believe there are any breaking changes here. The closest thing is that I placed the `ConnectBuilder` methods `api_key`, `region`, and `host_override` behind a `remote` feature gate. This is not strictly needed and I could remove the feature gate but it seemed appropriate. Since using these methods without the remote feature would have been meaningless I don't feel this counts as a breaking change. We could look at removing these methods entirely from the `ConnectBuilder` (and encouraging users to use `RemoteDatabaseOptions` instead) but I'm not sure how I feel about that. Another approach we could take is to move these methods into a `RemoteConnectBuilderExt` trait (and there could be a similar `ListingConnectBuilderExt` trait to add methods for the listing database / catalog). For now though my main goal is to simplify `ConnectRequest` as much as possible (I see this being part of the key public API for database / catalog integrations, similar to the `BaseTable`, `Catalog`, and `Database` traits and I'd like it to be simple).	2025-03-18 10:13:59 -07:00
Weston Pace	46a6846d07	refactor: remove dataset reference from base table (#2226)	2025-03-17 06:27:33 -07:00
Bob Liu	5c00b2904c	feat: add get dataset method on NativeTable (#2021) I want to make the dataset method on NativeTable public so that I can use more Lance methods, such as order_by, that are not exposed in the lancedb crate.	2025-03-13 11:15:28 -07:00
Gagan Bhullar	14677d7c18	fix: metric type inconsistency (#2122) PR fixes #2113 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-12 10:28:37 -07:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([{ path: "price", dataType: "float" }]); ``` But this has some problems: 1. It wasn't clear which types were valid. 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
vinoyang	3750639b5f	feat(rust): add connect_catalog method to support connecting to a catalog via URL (#2177)	2025-03-12 05:19:03 -07:00
Lance Release	de6739e7ec	Bump version: 0.18.1-beta.0 → 0.18.1	2025-03-11 13:14:49 +00:00
Lance Release	495216efdb	Bump version: 0.18.0 → 0.18.1-beta.0	2025-03-11 13:14:44 +00:00
Lance Release	e80a405dee	Bump version: 0.18.0-beta.1 → 0.18.0	2025-03-10 23:13:18 +00:00
Lance Release	a53e19e386	Bump version: 0.18.0-beta.0 → 0.18.0-beta.1	2025-03-10 23:13:13 +00:00
Wyatt Alt	f86b20a564	fix: delete tables from DDB on drop_all_tables (#2194) Prior to this commit, issuing drop_all_tables on a listing database with an external manifest store would delete physical tables but leave references behind in the manifest store. The table drop would succeed, but subsequent creation of a table with the same name would fail with a conflict. With this patch, the external manifest store is updated to account for the dropped tables so that dropped table names can be reused.	2025-03-10 15:00:53 -07:00
Weston Pace	bc49c4db82	feat: respect datafusion's batch size when running as a table provider (#2187) DataFusion makes the batch size available as part of the `SessionState`. We should use that to set the `max_batch_length` property in the `QueryExecutionOptions`.	2025-03-07 05:53:36 -08:00
Weston Pace	d2eec46f17	feat: add support for streaming input to create_table (#2175) This PR makes it possible to create a table using an asynchronous stream of input data. Previously, only a synchronous iterator was supported. There are a number of follow-ups not yet tackled: * Support for embedding functions (the embedding functions wrapper needs to be rewritten to be async, which should be an easy lift) * Support for async input into the remote table (make_ipc_batch needs to change to accept async input; leaving this undone for now because I think we want to support actual streaming uploads into the remote table soon) * Support for async input into the add function (pretty essential, but it is a fairly distinct code path, so saving it for a different PR)	2025-03-06 11:55:00 -08:00
vinoyang	374fe0ad95	feat(rust): introduce Catalog trait and implement ListingCatalog (#2148) Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-03-03 20:22:24 -08:00
Weston Pace	fa1b9ad5bd	fix: don't use with_schema to remove schema metadata (#2162) It seems that `RecordBatch::with_schema` is unable to remove schema metadata from a batch. It fails with the error `target schema is not superset of current schema`. I'm not sure how the `test_metadata_erased` test is passing. Strangely, the metadata was not present by the time the batch arrived at the metadata eraser. I think the schema metadata may only be present in the batch if there is a filter. I've created a new unit test that makes sure the metadata is also erased when there is a filter.	2025-02-27 10:24:00 -08:00
BubbleCal	8877eb020d	feat: record the server version for remote table (#2147) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-27 15:55:59 +08:00
Lance Release	84b110e0ef	Bump version: 0.17.0 → 0.18.0-beta.0	2025-02-26 20:11:07 +00:00
Will Jones	5b12a47119	feat!: revert query limit to be unbounded for scans (#2151) In earlier PRs (#1886, #1191) we made the default limit 10 regardless of the query type. This was confusing for users and in many cases a breaking change. Users would have queries that used to return all results, but instead only returned the first 10, causing silent bugs. Part of the cause was consistency: the Python sync API seems to have always had a limit of 10, while newer APIs (Python async and Nodejs) didn't. This PR sets the default limit only for searches (vector search, FTS), while letting scans (even with filters) be unbounded. It does this consistently for all SDKs. Fixes #1983 Fixes #1852 Fixes #2141	2025-02-26 10:32:14 -08:00
Lance Release	22bd8329f3	Bump version: 0.17.0-beta.0 → 0.17.0	2025-02-26 18:16:07 +00:00
Lance Release	a736fad149	Bump version: 0.16.1-beta.3 → 0.17.0-beta.0	2025-02-26 18:16:01 +00:00
Weston Pace	c4f99e82e5	feat: push filters down into DF table provider (#2128)	2025-02-25 14:46:28 -08:00
BubbleCal	f391ed828a	fix: remote table doesn't apply the prefilter flag for FTS (#2145)	2025-02-24 21:37:43 +08:00
Lance Release	0f102f02c3	Bump version: 0.16.1-beta.2 → 0.16.1-beta.3	2025-02-20 03:38:01 +00:00
BubbleCal	14c9ff46d1	feat: support multivector on remote table (#2045) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-20 11:34:51 +08:00


430 Commits