lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-07 13:00:40 +00:00

Author	SHA1	Message	Date
BubbleCal	fd1a5ce788	feat: support IVF_HNSW_PQ (#1314 ) this also simplifies the code of creating index with macro --------- Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-05-24 18:32:00 +08:00
QianZhu	def087fc85	fix: parse index_stats for scalar index (#1319 ) parse the index stats for scalar index - it is different from the index stats for vector index	2024-05-23 13:10:46 -07:00
Lance Release	1d9f76bdda	Bump version: 0.5.0-beta.0 → 0.5.0	2024-05-23 16:30:27 +00:00
Lance Release	affdfc4d48	Bump version: 0.4.20 → 0.5.0-beta.0	2024-05-23 16:30:26 +00:00
Will Jones	657aba3c05	ci: pin aws sdk versions (#1318 )	2024-05-22 08:26:09 -07:00
Rob Meng	2e197ef387	feat: upgrade lance to 0.11.0 (#1317 ) upgrade lance and make fixes for the upgrade	2024-05-21 18:53:19 -04:00
Weston Pace	4f512af024	feat: add the optimize function to nodejs and async python (#1257 ) The optimize function is pretty crucial for getting good performance when building a large scale dataset but it was only exposed in rust (many sync python users are probably doing this via to_lance today) This PR adds the optimize function to nodejs and to python. I left the function marked experimental because I think there will likely be changes to optimization (e.g. if we add features like "optimize on write"). I also only exposed the `cleanup_older_than` configuration parameter since this one is very commonly used and the rest have sensible defaults and we don't really know why we would recommend different values for these defaults anyways.	2024-05-20 07:09:31 -07:00
BubbleCal	5e01810438	feat: support IVF_HNSW_SQ (#1284 ) Signed-off-by: BubbleCal <bubble-cal@outlook.com>	2024-05-16 14:28:06 +08:00
Lance Release	5f6eb4651e	Bump version: 0.4.19 → 0.4.20	2024-05-09 21:14:30 +00:00
Bert	805c78bb20	chore: bump lance to v0.10.18 (#1287 ) https://github.com/lancedb/lance/releases/tag/v0.10.18	2024-05-09 17:06:26 -03:00
Lance Release	e2e8b6aee4	Bump version: 0.4.18 → 0.4.19	2024-05-07 19:04:31 +00:00
Will Jones	a6babfa651	fix(node/vectordb): parse value not key (#1276 )	2024-05-07 10:16:05 -07:00
Cory Grinstead	9d2fb7d602	feat: rust embedding registry (#1259 ) Todo: - [x] add proper documentation - [x] add unit tests - [x] better handling of the registry [1] - [x] allow user defined registry [2] [1] The python implementation just uses a global registry so it makes things a bit easier. I attached it to the db/connection to prevent future conflicts if running multiple connections/databases. I mostly modeled the registry & pattern off of datafusion's [FunctionRegistry](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html). [2] Ideally, the user should be able to provide its own registry entirely, but currently it just uses an in-memory registry by default (_which isn't configurable_) `rust/lancedb/examples/embedding_registry.rs` provides a thorough example of expected usage. --- Some additional notes: This does not provide any of the out of box functionality that the python registry does. _i.e. there are no built-in embedding functions._ You can think of this as the groundwork for adding those built-in functions. So while this is part of https://github.com/lancedb/lancedb/issues/994, it does not yet offer feature parity.	2024-05-06 18:39:07 -05:00
Rohit Rastogi	a7c0d80b9e	Implement convertors to and from Polars DataFrames in Rust SDK using convertors based on C FFI #1099 (#1260 ) https://github.com/lancedb/lancedb/issues/1099 Took the same general approach from: https://github.com/lancedb/lancedb/pull/1235. Instead of using high-level convertors implemented in polars-arrow (with the arrow-rs feature flag, which adds a dependency on arrow-rs), I used convertors based on the C FFI to avoid dependency conflicts. --------- Co-authored-by: Rohit Rastogi <rohitrastogi@Rohits-MacBook-Pro.local> Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-05-03 16:15:14 -07:00
Lance Release	975da09b02	Bump version: 0.4.17 → 0.4.18	2024-04-30 19:21:37 +00:00
Ryan Green	0528abdf97	fix: fix path on remote create_table and check for error response (#1244 )	2024-04-28 11:33:05 -02:30
Weston Pace	3d7c48feca	feat: allow the index_cache_size to be configured when opening a table (#1245 ) This was already configurable in the rust API but it wasn't actually being passed down to the underlying dataset. I added this option to both the async python API and the new nodejs API. I also added this option to the synchronous python API. I did not add the option to vectordb.	2024-04-26 13:42:02 -07:00
Lei Xu	b272408b05	chore: fix main branch test failure (#1240 )	2024-04-24 13:49:37 -07:00
Weston Pace	46ffa87cd4	chore: disable the remote feature by default (#1239 ) The rust implementation of the remote client is not yet ready. This is understandably confusing for users since it is enabled by default. This PR disables it by default. We can re-enable it when we are ready (even then it is not clear this is something that should be a default feature). --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-24 09:28:24 -07:00
QianZhu	cd9fc37b95	add rename_table fn and more data for index_stats to return (#1234 ) 1. added rename_table fn to enable dashboard to rename a table 2. added index_type and distance_type (for vector index) to index_stats so that more detailed data can be shown on the table page.	2024-04-23 16:42:26 -07:00
Lance Release	e4945abb1a	Bump version: 0.4.16 → 0.4.17	2024-04-10 17:39:52 +00:00
Will Jones	1d23af213b	feat: expose storage options in LanceDB (#1204 ) Exposes `storage_options` in LanceDB. This is provided for Python async, Node `lancedb`, and Node `vectordb` (and Rust of course). Python synchronous is omitted because it's not compatible with the PyArrow filesystems we use there currently. In the future, we will move the sync API to wrap the async one, and then it will get support for `storage_options`. 1. Fixes #1168 2. Closes #1165 3. Closes #1082 4. Closes #439 5. Closes #897 6. Closes #642 7. Closes #281 8. Closes #114 9. Closes #990 10. Deprecating `awsCredentials` and `awsRegion`. Users are encouraged to use `storageOptions` instead.	2024-04-10 10:12:04 -07:00
Lance Release	6c452f29e9	Bump version: 0.4.15 → 0.4.16	2024-04-05 16:34:50 -07:00
Will Jones	47cff963c5	feat: ship fp16kernels in Python wheels (#1148 ) Same deal as https://github.com/lancedb/lance/pull/2098	2024-04-05 16:34:50 -07:00
Lei Xu	e6ff3d848b	chore: bump to 0.10.8 (#1187 )	2024-04-05 16:34:50 -07:00
Lei Xu	3f14938392	chore: pass str instead of String to build table names (#1178 )	2024-04-05 16:34:50 -07:00
Lance Release	e0f50013ea	Bump version: 0.4.14 → 0.4.15	2024-04-05 16:34:39 -07:00
Lance Release	ccf13f15d4	Bump version: 0.4.13 → 0.4.14	2024-04-05 16:33:37 -07:00
QianZhu	db2631c2ad	remove warnings (#1147 )	2024-04-05 16:33:37 -07:00
Lei Xu	473ef7e426	chore: validate table name (#1146 ) Closes #1129	2024-04-05 16:33:37 -07:00
QianZhu	63db51c90d	better error msg for query vector with wrong dim (#1140 )	2024-04-05 16:33:37 -07:00
Weston Pace	968c62cb8f	feat: introduce ArrowNative wrapper struct for adding data that is already a RecordBatchReader (#1139 ) In `2de226220b` I added a new `IntoArrow` trait for adding data into a table. Unfortunately, it seems my approach for implementing the trait for "things that are already record batch readers" was flawed. This PR corrects that flaw and, conveniently, removes the need to box readers at all (though it is ok if you do).	2024-04-05 16:33:37 -07:00
Weston Pace	b36c750cc7	fix: fix compile error in example caused by merge conflict (#1135 )	2024-04-05 16:33:06 -07:00
Weston Pace	a23b856410	feat: change DistanceType to be independent thing instead of reusing lance_linalg (#1133 ) This PR originated from a request to add `Serialize` / `Deserialize` to `lance_linalg::distance::DistanceType`. However, that is a strange request for `lance_linalg` which shouldn't really have to worry about `Serialize` / `Deserialize`. The problem is that `lancedb` is re-using `DistanceType` and things in `lancedb` do need to worry about `Serialize`/`Deserialize` (because `lancedb` needs to support remote client). On the bright side, separating the two types allows us to independently document distance type and allows `lance_linalg` to make changes to `DistanceType` in the future without having to worry about backwards compatibility concerns.	2024-04-05 16:33:06 -07:00
Weston Pace	0fe0976a0e	docs: add links to rust SDK docs, remove references to rust SDK being unstable / experimental (#1131 )	2024-04-05 16:33:05 -07:00
Weston Pace	abde77eafb	feat(rust): add trait for incoming data (#1128 ) This will make it easier for 3rd party integrations. They simply need to implement `IntoArrow` for their types in order for those types to be used in ingestion.	2024-04-05 16:32:47 -07:00
Weston Pace	4180b44472	feat: refactor the query API and add query support to the python async API (#1113 ) In addition, there are also a number of changes in nodejs to the docstrings of existing methods because this PR adds a jsdoc linter.	2024-04-05 16:32:47 -07:00
Lance Release	1f816d597a	Bump version: 0.4.12 → 0.4.13	2024-04-05 16:32:31 -07:00
Weston Pace	b6a522d483	feat: add list_indices to the async api (#1074 )	2024-04-05 16:32:15 -07:00
Weston Pace	9031ec6878	feat: add update to the async API (#1093 )	2024-04-05 16:32:15 -07:00
Weston Pace	47daf9b7b0	feat: add time travel operations to the async API (#1070 )	2024-04-05 16:32:15 -07:00
Weston Pace	f822255683	feat: add create_index to the async python API (#1052 ) This also refactors the rust lancedb index builder API (and, correspondingly, the nodejs API)	2024-04-05 16:32:14 -07:00
Will Jones	90af5cf028	fix: propagate filter validation errors (#1092 ) In Rust and Node, we have been swallowing filter validation errors. If there was an error in parsing the filter, then the filter was silently ignored, returning unfiltered results. Fixes #1081	2024-04-05 16:31:53 -07:00
Weston Pace	d4502add44	Remove remote integration workflow (#1076 )	2024-04-05 16:31:53 -07:00
Will Jones	334857a8cb	fix: Allow converting from NativeTable to Table (#1069 )	2024-04-05 16:31:53 -07:00
Lance Release	386d5da22f	Bump version: 0.4.11 → 0.4.12	2024-04-05 16:31:45 -07:00
Will Jones	5120bf262b	fix: make checkout_latest force a reload (#1064 ) #1002 accidentally changed `checkout_latest` to do nothing if the table was already in latest mode. This PR makes sure it forces a reload of the table (if there is a newer version).	2024-04-05 16:31:45 -07:00
Weston Pace	73c69a6b9a	feat: page_token / limit to native table_names function. Use async table_names function from sync table_names function (#1059 ) The synchronous table_names function in python lancedb relies on arrow's filesystem which behaves slightly differently than object_store. As a result, the function would not work properly in GCS. However, the async table_names function uses object_store directly and thus is accurate. In most cases we can fall back to using the async table_names function and so this PR does so. The one case we cannot is if the user is already in an async context (we can't start a new async event loop). Soon, we can just redirect those users to use the async API instead of the sync API and so that case will eventually go away. For now, we fall back to the old behavior.	2024-04-05 16:31:45 -07:00
Will Jones	05f9a77baf	feat: more accessible errors (#1025 ) The fact that we convert errors to strings makes them really hard to work with. For example, in SaaS we want to know whether the underlying `lance::Error` was the `InvalidInput` variant, so we can return a 400 instead of a 500.	2024-04-05 16:31:45 -07:00
Weston Pace	8033a44d68	feat: add support for add to async python API (#1037 ) In order to add support for `add` we needed to migrate the rust `Table` trait to a `Table` struct and `TableInternal` trait (similar to the way the connection is designed). While doing this we also cleaned up some inconsistencies between the SDKs: * Python and Node are garbage collected languages and it can be difficult to trigger something to be freed. The convention for these languages is to have some kind of close method. I added a close method to both the table and connection which will drop the underlying rust object. * We made significant improvements to table creation in `cc5f2136a6` for the `node` SDK. I copied these changes to the `nodejs` SDK. * The nodejs tables were using fs to create tmp directories and these were not getting cleaned up. This is mostly harmless but annoying and so I changed it up a bit to ensure we clean up tmp directories. * ~~countRows in the node SDK was returning `bigint`. I changed it to return `number`~~ (this actually happened in a previous PR) * Tables and connections now implement `std::fmt::Display` which is hooked into python's `__repr__`. Node has no concept of a regular "to string" function and so I added a `display` method. * Python method signatures are changing so that optional parameters are always `Optional[foo] = None` instead of something like `foo = False`. This is because we want those defaults to be in rust whenever possible (though we still need to mention the default in documentation). * I changed the python `AsyncConnection/AsyncTable` classes from abstract classes with a single implementation to just classes because we no longer have the remote implementation in python. Note: this does NOT add the `add` function to the remote table. This PR was already large enough, and the remote implementation is unique enough, that I am going to do all the remote stuff at a later date (we should have the structure in place and correct so there shouldn't be any refactor concerns) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:31:36 -07:00


205 Commits