lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-03 19:10:41 +00:00

Author	SHA1	Message	Date
Lance Release	a505bc3965	Bump version: 0.19.0-beta.1 → 0.19.0-beta.2	2025-04-01 17:28:21 +00:00
Weston Pace	1ee63984f5	feat: allow FSB to be used for btree indices (#2297 ) We recently allowed this for lance but there was a check in lancedb as well that was preventing it. ## Summary by CodeRabbit - New Features - Added support for indexing fixed-size binary data using B-tree structures for efficient data storage and retrieval. - Tests - Implemented automated tests to ensure the new binary indexing works correctly and meets the expected configuration.	2025-04-01 10:27:22 -07:00
Lance Release	e4485a630e	Bump version: 0.19.0-beta.0 → 0.19.0-beta.1	2025-04-01 14:26:47 +00:00
Weston Pace	625bab3f21	feat: update to lance 0.25.3b1 (#2294 ) ## Summary by CodeRabbit - Chores - Updated dependency versions for improved performance and compatibility. - New Features - Added support for structured full-text search with expanded query types (e.g., match, phrase, boost, multi-match) and flexible input formats. - Introduced a new method to check server support for structural full-text search features. - Enhanced the query system with new classes and interfaces for handling various full-text queries. - Expanded the functionality of existing methods to accept more complex query structures, including updates to method signatures. - Bug Fixes - Improved error handling and reporting for full-text search queries. - Refactor - Enhanced query processing with streamlined input handling and improved error reporting, ensuring more robust and consistent search results across platforms. --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com> Co-authored-by: BubbleCal <bubble-cal@outlook.com>	2025-04-01 06:36:42 -07:00
Lance Release	e67cd0baf9	Bump version: 0.18.3-beta.0 → 0.19.0-beta.0	2025-03-30 18:04:32 +00:00
LuQQiu	b9bdb8d937	fix: fix remote restore api to always checkout latest version (#2291 ) Fix restore to always check out the latest version, following the local restore API implementation `a1d1833a40/rust/lancedb/src/table.rs (L1910)`. Otherwise: table.create_table -> version 1; table.add_table -> version 2; table.checkout(1), table.restore() -> the version remains at 1 (restore should call checkout_latest internally to update to the latest version and allow write operations); table.checkout_latest() -> version is 3 and write operations are allowed.	2025-03-29 22:46:57 -07:00
LuQQiu	a1d1833a40	feat: add analyze_plan api (#2280 ) Add an analyze plan API that executes a query and reports runtime metrics. This helps identify query IO overhead and the causes of slow queries.	2025-03-28 14:28:52 -07:00
Will Jones	a547c523c2	feat!: change default read_consistency_interval=5s (#2281 ) Previously, when we loaded the next version of the table, we would block all reads with a write lock. Now, we only do that if `read_consistency_interval=0`. Otherwise, we load the next version asynchronously in the background. This should mean that `read_consistency_interval > 0` won't have a meaningful impact on latency. Along with this change, I felt it was safe to change the default consistency interval to 5 seconds. The current default is `None`, which means we will never check for a new version by default. I think that default is contrary to most users' expectations.	2025-03-28 11:04:31 -07:00
Lance Release	346cbf8bf7	Bump version: 0.18.2-beta.0 → 0.18.3-beta.0	2025-03-28 16:03:31 +00:00
LuQQiu	cba14a5743	feat: add restore remote api (#2282 )	2025-03-27 16:33:52 -07:00
LuQQiu	698f329598	feat: add explain plan remote api (#2263 ) Add explain plan remote api	2025-03-26 11:22:40 -07:00
Lance Release	f97e751b3c	Bump version: 0.18.1 → 0.18.2-beta.0	2025-03-21 20:02:59 +00:00
Weston Pace	9403254442	feat: add to_query_object method (#2239 ) This PR adds a `to_query_object` method to the various query builders (hybrid queries are not yet supported). This makes it possible to inspect the query that is built. In addition, this PR does some normalization between the sync and async query paths. A few custom defaults were removed in favor of None (with the default getting set once, in rust). Also, the synchronous to_batches method will now actually stream results, and the remote API now defaults to prefiltering.	2025-03-21 13:01:51 -07:00
Samuel Colvin	7982d5c082	fix: correct rust install docs (#2253 ) I'm pretty sure you mean `cargo add lancedb` here; `cargo install lancedb` fails right now.	2025-03-21 10:12:53 -07:00
BubbleCal	7ff6ec7fe3	feat: upgrade to lance v0.25.0-beta.5 (#2248 ) - adds `loss` into the index stats for vector index - now `optimize` can retrain the vector index --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-03-21 10:12:23 -07:00
Will Jones	440a466a13	ci: remove OpenSSL as dependency in favor of rustls (#2242 ) `object_store` already hard codes `rustls` as the TLS implementation, so we have been shipping a mix of `rustls` and `openssl`. For simplicity of builds, we should consolidate to one, and that has to be `rustls`.	2025-03-20 08:06:45 -07:00
Weston Pace	4e03ee82bc	refactor: rework catalog/database options (#2213 ) The `ConnectRequest` has a set of properties that only make sense for listing databases / catalogs and a set of properties that only make sense for remote databases. This PR reduces all options to a single `HashMap<String, String>`. This makes it easier to add new database / catalog implementations and makes it clearer to users which options are applicable in which situations. I don't believe there are any breaking changes here. The closest thing is that I placed the `ConnectBuilder` methods `api_key`, `region`, and `host_override` behind a `remote` feature gate. This is not strictly needed and I could remove the feature gate but it seemed appropriate. Since using these methods without the remote feature would have been meaningless I don't feel this counts as a breaking change. We could look at removing these methods entirely from the `ConnectBuilder` (and encouraging users to use `RemoteDatabaseOptions` instead) but I'm not sure how I feel about that. Another approach we could take is to move these methods into a `RemoteConnectBuilderExt` trait (and there could be a similar `ListingConnectBuilderExt` trait to add methods for the listing database / catalog). For now though my main goal is to simplify `ConnectRequest` as much as possible (I see this being part of the key public API for database / catalog integrations, similar to the `BaseTable`, `Catalog`, and `Database` traits and I'd like it to be simple).	2025-03-18 10:13:59 -07:00
Weston Pace	46a6846d07	refactor: remove dataset reference from base table (#2226 )	2025-03-17 06:27:33 -07:00
Bob Liu	5c00b2904c	feat: add get dataset method on NativeTable (#2021 ) I want to make the dataset method on the native table public so that I can use more lance methods, such as order_by, that are not exposed in the lancedb crate.	2025-03-13 11:15:28 -07:00
Gagan Bhullar	14677d7c18	fix: metric type inconsistency (#2122 ) PR fixes #2113 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2025-03-12 10:28:37 -07:00
Will Jones	7747c9bcbf	feat(node): parse arrow types in `alterColumns()` (#2208 ) Previously, users could only specify new data types in `alterColumns` as strings: ```ts await tbl.alterColumns([ { path: "price", dataType: "float" } ]); ``` But this has some problems: 1. It wasn't clear which types were valid. 2. It was impossible to specify nested types, like lists and vector columns. This PR changes it to take an Arrow data type, similar to how the Python API works. This allows casting vector types: ```ts await tbl.alterColumns([ { path: "vector", dataType: new arrow.FixedSizeList( 2, new arrow.Field("item", new arrow.Float16(), false), ), }, ]); ``` Closes #2185	2025-03-12 09:57:36 -07:00
vinoyang	3750639b5f	feat(rust): add connect_catalog method to support connect catalog via url (#2177 )	2025-03-12 05:19:03 -07:00
Lance Release	de6739e7ec	Bump version: 0.18.1-beta.0 → 0.18.1	2025-03-11 13:14:49 +00:00
Lance Release	495216efdb	Bump version: 0.18.0 → 0.18.1-beta.0	2025-03-11 13:14:44 +00:00
Lance Release	e80a405dee	Bump version: 0.18.0-beta.1 → 0.18.0	2025-03-10 23:13:18 +00:00
Lance Release	a53e19e386	Bump version: 0.18.0-beta.0 → 0.18.0-beta.1	2025-03-10 23:13:13 +00:00
Wyatt Alt	f86b20a564	fix: delete tables from DDB on drop_all_tables (#2194 ) Prior to this commit, issuing drop_all_tables on a listing database with an external manifest store would delete physical tables but leave references behind in the manifest store. The table drop would succeed, but subsequent creation of a table with the same name would fail with a conflict. With this patch, the external manifest store is updated to account for the dropped tables so that dropped table names can be reused.	2025-03-10 15:00:53 -07:00
Weston Pace	bc49c4db82	feat: respect datafusion's batch size when running as a table provider (#2187 ) Datafusion makes the batch size available as part of the `SessionState`. We should use that to set the `max_batch_length` property in the `QueryExecutionOptions`.	2025-03-07 05:53:36 -08:00
Weston Pace	d2eec46f17	feat: add support for streaming input to create_table (#2175 ) This PR makes it possible to create a table using an asynchronous stream of input data. Currently only a synchronous iterator is supported. There are a number of follow-ups not yet tackled: * Support for embedding functions (the embedding functions wrapper needs to be re-written to be async, should be an easy lift) * Support for async input into the remote table (the make_ipc_batch needs to change to accept async input, leaving undone for now because I think we want to support actual streaming uploads into the remote table soon) * Support for async input into the add function (pretty essential, but it is a fairly distinct code path, so saving for a different PR)	2025-03-06 11:55:00 -08:00
vinoyang	374fe0ad95	feat(rust): introduce Catalog trait and implement ListingCatalog (#2148 ) Co-authored-by: Weston Pace <weston.pace@gmail.com>	2025-03-03 20:22:24 -08:00
Weston Pace	fa1b9ad5bd	fix: don't use with_schema to remove schema metadata (#2162 ) It seems that `RecordBatch::with_schema` is unable to remove schema metadata from a batch. It fails with the error `target schema is not superset of current schema`. I'm not sure how the `test_metadata_erased` test is passing. Strangely, the metadata was not present by the time the batch arrived at the metadata eraser. I think maybe the schema metadata is only present in the batch if there is a filter. I've created a new unit test that makes sure the metadata is erased if we have a filter also	2025-02-27 10:24:00 -08:00
BubbleCal	8877eb020d	feat: record the server version for remote table (#2147 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-27 15:55:59 +08:00
Lance Release	84b110e0ef	Bump version: 0.17.0 → 0.18.0-beta.0	2025-02-26 20:11:07 +00:00
Will Jones	5b12a47119	feat!: revert query limit to be unbounded for scans (#2151 ) In earlier PRs (#1886, #1191) we made the default limit 10 regardless of the query type. This was confusing for users and in many cases a breaking change. Users would have queries that used to return all results, but instead only returned the first 10, causing silent bugs. Part of the cause was consistency: the Python sync API seems to have always had a limit of 10, while newer APIs (Python async and Nodejs) didn't. This PR sets the default limit only for searches (vector search, FTS), while letting scans (even with filters) be unbounded. It does this consistently for all SDKs. Fixes #1983 Fixes #1852 Fixes #2141	2025-02-26 10:32:14 -08:00
Lance Release	22bd8329f3	Bump version: 0.17.0-beta.0 → 0.17.0	2025-02-26 18:16:07 +00:00
Lance Release	a736fad149	Bump version: 0.16.1-beta.3 → 0.17.0-beta.0	2025-02-26 18:16:01 +00:00
Weston Pace	c4f99e82e5	feat: push filters down into DF table provider (#2128 )	2025-02-25 14:46:28 -08:00
BubbleCal	f391ed828a	fix: remote table doesn't apply the prefilter flag for FTS (#2145 )	2025-02-24 21:37:43 +08:00
Lance Release	0f102f02c3	Bump version: 0.16.1-beta.2 → 0.16.1-beta.3	2025-02-20 03:38:01 +00:00
BubbleCal	14c9ff46d1	feat: support multivector on remote table (#2045 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-20 11:34:51 +08:00
Lance Release	09e110525f	Bump version: 0.16.1-beta.1 → 0.16.1-beta.2	2025-02-13 04:39:38 +00:00
BubbleCal	3b19e96ae7	fix: panic when field id doesn't equal to field index (#2116 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2025-02-13 12:38:35 +08:00
Lance Release	83273ad997	Bump version: 0.16.1-beta.0 → 0.16.1-beta.1	2025-02-11 20:55:43 +00:00
LuQQiu	c3e865e8d0	fix: fix index out of bound in load indices (#2108 ) panicked at 'index out of bounds: the len is 24 but the index is 25': Lancedb/rust/lancedb/src/index/vector.rs:26. Calling load_indices() on the old manifest while using the newer manifest to get column names could result in an index out of bounds if some columns were removed in the new version. This change reduces the likelihood of an out-of-bounds access but does not fully eliminate it. It would be better for lance to provide column name info directly so that no extra calls are needed to get column names, but that requires modifying the public APIs.	2025-02-11 12:54:11 -08:00
Lance Release	3626f2f5e1	Bump version: 0.16.0 → 0.16.1-beta.0	2025-02-07 19:27:26 +00:00
Lance Release	7e259d8b0f	Bump version: 0.16.0-beta.0 → 0.16.0	2025-02-07 17:33:13 +00:00
Lance Release	e84f747464	Bump version: 0.15.1-beta.3 → 0.16.0-beta.0	2025-02-07 17:33:08 +00:00
Will Jones	e7574698eb	feat: upgrade Lance to 0.23.0 (#2101 ) Upstream changelog: https://github.com/lancedb/lance/releases/tag/v0.23.0	2025-02-07 07:58:07 -08:00
Weston Pace	4e5fbe6c99	fix: ensure metadata erased from schema call in table provider (#2099 ) This also adds a basic unit test for the table provider	2025-02-06 15:30:20 -08:00
Weston Pace	1a449fa49e	refactor: rename drop_db / drop_database to drop_all_tables, expose database from connection (#2098 ) If we start supporting external catalogs then "drop database" may be misleading (and not possible). We should be more clear that this is a utility method to drop all tables. This is also a nice chance for some consistency cleanup as it was `drop_db` in rust, `drop_database` in python, and non-existent in typescript. This PR also adds a public accessor to get the database trait from a connection. BREAKING CHANGE: the `drop_database` / `drop_db` methods are now deprecated.	2025-02-06 13:22:28 -08:00

420 Commits