lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-04 03:20:40 +00:00

Author	SHA1	Message	Date
Weston Pace	d5586c9c32	feat: make it possible to opt in to using the v2 format (#1352) This also exposed the max_batch_length configuration option in python/node (it was needed to verify if we are actually in v2 mode or not)	2024-06-04 21:52:14 -07:00
Rob Meng	2e197ef387	feat: upgrade lance to 0.11.0 (#1317) upgrade lance and make fixes for the upgrade	2024-05-21 18:53:19 -04:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257) The optimize function is pretty crucial for getting good performance when building a large scale dataset but it was only exposed in rust (many sync python users are probably doing this via to_lance today) This PR adds the optimize function to nodejs and to python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter since this one is very commonly used and the rest have sensible defaults and we don't really know why we would recommend different values for these defaults anyways.	2024-05-20 07:09:31 -07:00
Weston Pace	3d7c48feca	feat: allow the index_cache_size to be configured when opening a table (#1245) This was already configurable in the rust API but it wasn't actually being passed down to the underlying dataset. I added this option to both the async python API and the new nodejs API. I also added this option to the synchronous python API. I did not add the option to vectordb.	2024-04-26 13:42:02 -07:00
Will Jones	1d23af213b	feat: expose storage options in LanceDB (#1204) Exposes `storage_options` in LanceDB. This is provided for Python async, Node `lancedb`, and Node `vectordb` (and Rust of course). Python synchronous is omitted because it's not compatible with the PyArrow filesystems we use there currently. In the future, we will move the sync API to wrap the async one, and then it will get support for `storage_options`. 1. Fixes #1168 2. Closes #1165 3. Closes #1082 4. Closes #439 5. Closes #897 6. Closes #642 7. Closes #281 8. Closes #114 9. Closes #990 10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged to use `storageOptions` instead.	2024-04-10 10:12:04 -07:00
Weston Pace	968c62cb8f	feat: introduce ArrowNative wrapper struct for adding data that is already a RecordBatchReader (#1139) In `2de226220b` I added a new `IntoArrow` trait for adding data into a table. Unfortunately, it seems my approach for implementing the trait for "things that are already record batch readers" was flawed. This PR corrects that flaw and, conveniently, removes the need to box readers at all (though it is ok if you do).	2024-04-05 16:33:37 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074)	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093)	2024-04-05 16:32:15 -07:00
Weston Pace	47daf9b7b0	feat: add time travel operations to the async API (#1070)	2024-04-05 16:32:15 -07:00
Weston Pace	f822255683	feat: add create_index to the async python API (#1052) This also refactors the rust lancedb index builder API (and, correspondingly, the nodejs API)	2024-04-05 16:32:14 -07:00
Weston Pace	73c69a6b9a	feat: page_token / limit to native table_names function. Use async table_names function from sync table_names function (#1059) The synchronous table_names function in python lancedb relies on arrow's filesystem which behaves slightly differently than object_store. As a result, the function would not work properly in GCS. However, the async table_names function uses object_store directly and thus is accurate. In most cases we can fallback to using the async table_names function and so this PR does so. The one case we cannot is if the user is already in an async context (we can't start a new async event loop). Soon, we can just redirect those users to use the async API instead of the sync API and so that case will eventually go away. For now, we fallback to the old behavior.	2024-04-05 16:31:45 -07:00
Weston Pace	8033a44d68	feat: add support for add to async python API (#1037) In order to add support for `add` we needed to migrate the rust `Table` trait to a `Table` struct and `TableInternal` trait (similar to the way the connection is designed). While doing this we also cleaned up some inconsistencies between the SDKs: * Python and Node are garbage collected languages and it can be difficult to trigger something to be freed. The convention for these languages is to have some kind of close method. I added a close method to both the table and connection which will drop the underlying rust object. * We made significant improvements to table creation in `cc5f2136a6` for the `node` SDK. I copied these changes to the `nodejs` SDK. * The nodejs tables were using fs to create tmp directories and these were not getting cleaned up. This is mostly harmless but annoying and so I changed it up a bit to ensure we cleanup tmp directories. * ~~countRows in the node SDK was returning `bigint`. I changed it to return `number`~~ (this actually happened in a previous PR) * Tables and connections now implement `std::fmt::Display` which is hooked into python's `__repr__`. Node has no concept of a regular "to string" function and so I added a `display` method. * Python method signatures are changing so that optional parameters are always `Optional[foo] = None` instead of something like `foo = False`. This is because we want those defaults to be in rust whenever possible (though we still need to mention the default in documentation). * I changed the python `AsyncConnection/AsyncTable` classes from abstract classes with a single implementation to just classes because we no longer have the remote implementation in python. Note: this does NOT add the `add` function to the remote table. This PR was already large enough, and the remote implementation is unique enough, that I am going to do all the remote stuff at a later date (we should have the structure in place and correct so there shouldn't be any refactor concerns) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:31:36 -07:00
Weston Pace	4299f719ec	feat: port create_table to the async python API and the remote rust API (#1031) I've also started `ASYNC_MIGRATION.MD` to keep track of the breaking changes from sync to async python.	2024-04-05 16:31:36 -07:00
Rob Meng	f3de3d990d	chore: upgrade to lance 0.10.1 (#1034) upgrade to lance 0.10.1 and update doc string to reflect dynamic projection options	2024-04-05 16:31:36 -07:00
Will Jones	464a36ad38	feat: `{add|alter|drop}_columns` APIs (#1015) Initial work for #959. This exposes the basic functionality for each in all of the APIs. Will add user guide documentation in a later PR.	2024-04-05 16:30:47 -07:00
Weston Pace	2163502b31	refactor: rename the rust crate from vectordb to lancedb (#1012) This also renames the new experimental node package to lancedb. The classic node package remains named vectordb. The goal here is to avoid introducing piecemeal breaking changes to the vectordb crate. Instead, once the new API is stabilized, we will officially release the lancedb crate and deprecate the vectordb crate. The same pattern will eventually happen with the npm package vectordb.	2024-04-05 16:30:40 -07:00
Will Jones	c5b0934bfb	feat(node): add `read_consistency_interval` to Node and Rust (#1002) This PR adds the same consistency semantics as was added in #828. It does not add the same lazy-loading of tables, since that breaks some existing tests. This closes #998. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:30:40 -07:00
Weston Pace	cbc0c439ef	refactor: rust vectordb API stabilization of the Connection trait (#993) This is the start of a more comprehensive refactor and stabilization of the Rust API. The `Connection` trait is cleaned up to not require `lance` and to match the `Connection` trait in other APIs. In addition, the concrete implementation `Database` is hidden. BREAKING CHANGE: The struct `crate::connection::Database` is now gone. Several examples opened a connection using `Database::connect` or `Database::connect_with_params`. Users should now use `vectordb::connect`. BREAKING CHANGE: The `connect`, `create_table`, and `open_table` methods now all return a builder object. This means that a call like `conn.open_table(..., opt1, opt2)` will now become `conn.open_table(...).opt1(opt1).opt2(opt2).execute()` In addition, the structure of options has changed slightly. However, no options capability has been removed. --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:30:40 -07:00
Weston Pace	138fc3f66b	feat: add a filterable count_rows to all the lancedb APIs (#913) A `count_rows` method that takes a filter was recently added to `LanceTable`. This PR adds it everywhere else except `RemoteTable` (that will come soon).	2024-04-05 16:29:58 -07:00
Lei Xu	cef0293985	feat(napi): Issue queries as node SDK (#868) * Query as a fluent API and `AsyncIterator<RecordBatch>` * Much more docs * Add tests for auto infer vector search columns with different dimensions.	2024-04-05 16:28:18 -07:00
Lei Xu	8b04d8fef6	feat: improve the rust table query API and documents (#860) * Easy to type * Handle `String, &str, [String] and [&str]` well without manual conversion * Fix function name to be verb * Improve docstring of Rust. * Promote `query` and `search()` to public `Table` trait	2024-04-05 16:27:51 -07:00
Lei Xu	db4a979278	feat(napi): Provide a new createIndex API in the napi SDK. (#857)	2024-04-05 16:27:51 -07:00
Lei Xu	dfabbe9081	feat(rust): create index API improvement (#853) * Extract a minimal Table interface in Rust SDK * Make create_index composable in Rust. * Fix compiling issues from ffi	2024-04-05 16:27:51 -07:00
Lei Xu	efcaa433fe	feat: rework NodeJS SDK using napi (#847) Use Napi to write a Node.js SDK that follows Polars for better maintainability, while keeping most of the logic in Rust.	2024-04-05 16:27:51 -07:00
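
Commit 73c69a6b9a describes a sync function that delegates to its async counterpart unless the caller is already inside an event loop, in which case it keeps the old behavior. The sketch below illustrates that general pattern only; it is not the LanceDB implementation. The names `table_names_async`, `table_names_legacy`, and `table_names_sync`, and the returned data, are hypothetical stand-ins.

```python
import asyncio


async def table_names_async():
    # Hypothetical stand-in for the async listing (the accurate path
    # in the commit message). Returns a tag so callers can see which
    # path was taken.
    return ("async", ["images", "docs"])


def table_names_legacy():
    # Hypothetical stand-in for the older sync-only listing.
    return ("legacy", ["images", "docs"])


def table_names_sync():
    try:
        # Raises RuntimeError when no event loop is running in this thread.
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop: it is safe to start one and use the async path.
        return asyncio.run(table_names_async())
    # Already inside an event loop: a new loop cannot be started here,
    # so fall back to the old behavior.
    return table_names_legacy()
```

Called from ordinary synchronous code, `table_names_sync()` takes the async path; called from inside a coroutine, it takes the legacy path.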


75 Commits