The synchronous table_names function in python lancedb relies on arrow's
filesystem, which behaves slightly differently from object_store. As a
result, the function did not work properly on GCS.
However, the async table_names function uses object_store directly and
is therefore accurate. In most cases we can fall back to the async
table_names function, and this PR does so. The one case where we cannot is
when the user is already in an async context (we can't start a nested
event loop). Soon we can just redirect those users to the async API
instead of the sync API, so that case will eventually go away. For
now, we fall back to the old behavior.
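A minimal sketch of the fallback described above; the helper names (`table_names_async`, `table_names_legacy`) are illustrative, not the actual lancedb internals.

```python
import asyncio

def table_names(conn):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running, so it is safe to drive the accurate
        # async implementation from the sync entry point.
        return asyncio.run(conn.table_names_async())
    # Already inside an async context: we cannot start a nested event loop,
    # so keep the old arrow-filesystem behavior for now.
    return conn.table_names_legacy()
```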
The fact that we convert errors to strings makes them really hard to
work with. For example, in SaaS we want to know whether the underlying
`lance::Error` was the `InvalidInput` variant, so we can return a 400
instead of a 500.
1. Filtering with fts mutated the schema, which caused schema mismatch
problems with hybrid search, since hybrid search combines the fts and vector search tables.
2. fts with a filter failed when `with_row_id` was requested. This was because the row ids
were calculated before filtering, which caused a size mismatch when attaching
them afterwards (the query shape is sketched below).
3. The fix for 1 meant that the row id is now attached before filtering, but
passing a filter to `to_lance` on a dataset that already contains
`_rowid` raises a panic from lance. So, temporarily, when fts is
used with a filter AND `with_row_id`, we force the user onto the
duckdb pathway.
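A hedged sketch of the query shape involved, using the sync Python API; the table name, text query, and filter are illustrative.

```python
import lancedb

db = lancedb.connect("./data/sample-lancedb")
table = db.open_table("docs")

results = (
    table.search("puppy", query_type="fts")  # full-text search
    .where("category = 'pets'")              # filter on the fts results
    .with_row_id(True)                       # also return the _rowid column
    .limit(10)
    .to_pandas()
)
print(results)
```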
---------
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).
While doing this we also cleaned up some inconsistencies between the
SDKs:
* Python and Node are garbage-collected languages, and it can be
difficult to trigger something to be freed deterministically. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object (see the sketch after this list).
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying, so
I changed it a bit to ensure we clean up tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.
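A minimal sketch of the close convention from the first bullet, assuming the async Python surface; the exact entry-point names (`connect_async`, `open_table`) are assumptions for illustration.

```python
import asyncio
import lancedb

async def main():
    db = await lancedb.connect_async("./data/sample-lancedb")
    table = await db.open_table("my_table")
    try:
        print(table)  # uses the new Display-backed __repr__
    finally:
        table.close()  # drop the underlying rust table now, not at GC time
        db.close()     # drop the underlying rust connection

asyncio.run(main())
```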
Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns)
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.
The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.
Although this is not technically a breaking change (no APIs are
changing), it is still a considerable change in the way the wheels are
built, because they now include the native shared library.
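A minimal sketch of the async surface introduced here (only connect and `Connection.table_names`); the `connect_async` entry-point name is an assumption.

```python
import asyncio
import lancedb

async def main():
    db = await lancedb.connect_async("./data/sample-lancedb")
    print(await db.table_names())

asyncio.run(main())
```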
Got some user feedback that the `implicit` / `explicit` distinction is
confusing.
Instead, I was thinking we would just deprecate the `with_embeddings` API
and organize working with embeddings into 3 buckets:
1. manually generate embeddings (sketched below)
2. use a provided embedding function
3. define your own custom embedding function
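A hedged sketch of bucket 1, manually generated embeddings: the caller computes the vectors and stores them as an ordinary column. The table name, dimensionality, and data are illustrative.

```python
import lancedb

db = lancedb.connect("./data/sample-lancedb")

# Vectors computed by the caller, outside of lancedb.
data = [
    {"text": "hello world", "vector": [0.1, 0.2, 0.3, 0.4]},
    {"text": "goodbye world", "vector": [0.2, 0.1, 0.4, 0.3]},
]
table = db.create_table("greetings", data=data)

# Vector search against the manually supplied embeddings.
print(table.search([0.1, 0.2, 0.3, 0.4]).limit(1).to_pandas())
```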
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
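A quick sketch of the filtered count on the sync API; the table name and filter expression are illustrative.

```python
import lancedb

db = lancedb.connect("./data/sample-lancedb")
table = db.open_table("my_table")

print(table.count_rows())              # total number of rows
print(table.count_rows("price > 10"))  # rows matching the SQL filter
```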
This PR refactors how we handle read consistency: whether the `LanceTable`
class always picks up modifications to the table made by other instances
or processes. Users have three options they can set at the connection
level (sketched after the list):
1. (Default) `read_consistency_interval=None` means it will not check at
all. Users can call `table.checkout_latest()` to manually check for
updates.
2. `read_consistency_interval=timedelta(0)` means **always** check for
updates, giving strong read consistency.
3. `read_consistency_interval=timedelta(seconds=20)` means check for
updates every 20 seconds. This is eventual consistency, a compromise
between the two options above.
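A minimal sketch of the three settings; the URI is illustrative.

```python
from datetime import timedelta
import lancedb

# 1. Default: never check automatically; call table.checkout_latest() manually.
db = lancedb.connect("./data/sample-lancedb", read_consistency_interval=None)

# 2. Strong consistency: check for updates on every read.
db = lancedb.connect("./data/sample-lancedb", read_consistency_interval=timedelta(0))

# 3. Eventual consistency: check at most every 20 seconds.
db = lancedb.connect("./data/sample-lancedb", read_consistency_interval=timedelta(seconds=20))
```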
## Table reference state
There is now an explicit difference between a `LanceTable` that tracks
the current version and one that is fixed at a historical version. We
now enforce that users cannot write if they have checked out an old
version. They are instructed to call `checkout_latest()` before calling
the write methods.
Since `conn.open_table()` doesn't have a parameter for version, users
will only get fixed references if they call `table.checkout()`.
The difference between these two can be seen in the repr: tables that are
fixed at a particular version will have a `version` displayed in the
repr. Otherwise, the version will not be shown.
```python
>>> table
LanceTable(connection=..., name="my_table")
>>> table.checkout(1)
>>> table
LanceTable(connection=..., name="my_table", version=1)
```
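And a hedged sketch of the write guard described above; the data passed to `add` is illustrative.

```python
table.checkout(1)        # fixed at version 1: write methods now raise
table.checkout_latest()  # back to tracking the latest version
table.add([{"vector": [0.1, 0.2], "item": "foo"}])  # writes allowed again
```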
I decided to not create different classes for these states, because I
think we already have enough complexity with the Cloud vs OSS table
references.
Based on #812
Adds capability to the remote python SDK to retry requests (fixes #911)
This can be configured through environment variables:
- `LANCE_CLIENT_MAX_RETRIES` = total number of retries. Set to 0 to
disable retries. default = 3
- `LANCE_CLIENT_CONNECT_RETRIES` = number of times to retry a request in
case of TCP connect failure. default = 3
- `LANCE_CLIENT_READ_RETRIES` = number of times to retry a request in case
of HTTP request failure. default = 3
- `LANCE_CLIENT_RETRY_STATUSES` = HTTP statuses for which the request
will be retried. Passed as a comma-separated list of ints. default = `500,
502, 503`
- `LANCE_CLIENT_RETRY_BACKOFF_FACTOR` = controls the time between retry
requests. See
[here](23f2287eb5/src/urllib3/util/retry.py (L141-L146)).
default = 0.25
Only read requests will be retried:
- list table names
- query
- describe table
- list table indices
This does not add retry capabilities for writes, as that could possibly
cause issues when the retried write isn't idempotent. For example, if
the LB times out the request but the server completes it anyway, we
might not want to blindly retry an insert request.
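A hedged sketch of configuring these through the environment before creating the remote connection; the database name, API key, and region are placeholders.

```python
import os

# Values shown are the documented defaults.
os.environ["LANCE_CLIENT_MAX_RETRIES"] = "3"
os.environ["LANCE_CLIENT_RETRY_STATUSES"] = "500,502,503"
os.environ["LANCE_CLIENT_RETRY_BACKOFF_FACTOR"] = "0.25"

import lancedb

db = lancedb.connect("db://my-database", api_key="sk_...", region="us-east-1")
print(db.table_names())  # read requests like this one are retried
```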