I know there's a larger effort to have the python client based on the
core rust implementation, but in the meantime there have been several
issues (#1072 and #485) with some of the azure blob storage calls due to
pyarrow not natively supporting an azure backend. To this end, I've
added an optional import of the fsspec implementation of azure blob
storage [`adlfs`](https://pypi.org/project/adlfs/) and passed it to
`pyarrow.fs`. I've modified the existing test and manually verified it
with some real credentials to make sure it behaves as expected.
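Under the hood the wiring amounts to roughly the following (a minimal sketch; the credentials are placeholders and the exact plumbing in the client may differ):
```python
import adlfs
import pyarrow.fs as pa_fs

# Build an fsspec filesystem for Azure Blob Storage via adlfs, then wrap
# it with FSSpecHandler so pyarrow can treat it like any other
# pyarrow.fs filesystem.
azure_fs = adlfs.AzureBlobFileSystem(
    account_name="my_account",  # placeholder credentials
    account_key="my_key",
)
arrow_fs = pa_fs.PyFileSystem(pa_fs.FSSpecHandler(azure_fs))
```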
It should now be as simple as:
```python
import lancedb
db = lancedb.connect("az://blob_name/path")
table = db.open_table("test")
table.search(...)
```
Thank you for this cool project and we're excited to start using this
for real shortly! 🎉 And thanks to @dwhitena for bringing it to my
attention with his prediction guard posts.
Co-authored-by: christiandilorenzo <christian.dilorenzo@infiniaml.com>
The LanceDB embeddings registry allows users to annotate the pydantic
model used as table schema with the desired embedding function, e.g.:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField()
    text: str = openai.SourceField()
```
Tables created like this do not require the user to calculate embeddings
explicitly, e.g. this works:
```python
table.add([{"id": "foo", "text": "rust all the things"}])
```
However, trying to construct pydantic model instances without a vector
doesn't work because it's a required field.
Instead, you need to add a default value:
```python
class Schema(LanceModel):
    id: str
    vector: Vector(openai.ndims()) = openai.VectorField(default=None)
    text: str = openai.SourceField()
```
then this completes without errors:
```python
table.add([Schema(id="foo", text="rust all the things")])
```
However, all of the vectors are then filled with zeros. Instead, in
`add_vector_col` we have to add an additional check so that embedding
generation is called.
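A minimal sketch of the kind of check intended here, with illustrative names and pyarrow plumbing (not the actual implementation):
```python
import pyarrow as pa


def add_vector_col(data: pa.Table, embed_func, source_col: str, vector_col: str) -> pa.Table:
    # Recompute embeddings when the vector column is missing or holds no
    # real values (e.g. the pydantic default of None), rather than keeping
    # placeholder vectors.
    missing = vector_col not in data.column_names
    all_null = not missing and data[vector_col].null_count == len(data)
    if missing or all_null:
        vectors = embed_func(data[source_col].to_pylist())
        if not missing:
            data = data.drop([vector_col])
        data = data.append_column(vector_col, pa.array(vectors))
    return data
```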
The synchronous table_names function in python lancedb relies on arrow's
filesystem, which behaves slightly differently from object_store. As a
result, the function would not work properly in GCS.
However, the async table_names function uses object_store directly and
is therefore accurate. In most cases we can fall back to the async
table_names function, and this PR does so. The one case we cannot handle
is when the user is already in an async context (we can't start a new
event loop). Soon we can redirect those users to the async API instead
of the sync API, so that case will eventually go away. For now, we fall
back to the old behavior.
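The fallback is essentially the standard "run a coroutine from sync code unless a loop is already running" pattern; a sketch with illustrative attribute names:
```python
import asyncio


def table_names(self):
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running: we can drive the accurate async
        # implementation (object_store-based) to completion from sync code.
        return asyncio.run(self._async_connection.table_names())
    # Already inside an event loop: we can't start a nested one, so fall
    # back to the old arrow-filesystem-based listing for now.
    return self._legacy_table_names()
```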
The fact that we convert errors to strings makes them really hard to
work with. For example, in SaaS we want to know whether the underlying
`lance::Error` was the `InvalidInput` variant, so we can return a 400
instead of a 500.
1. filtering with fts mutated the schema, which caused schema mismatch
problems with hybrid search as it combines fts and vector search tables.
2. fts with filter failed with `with_row_id`. This was because row_id
was calculated before filtering which caused size mismatch on attaching
it after.
3. The fix for 1 meant that now row_id is attached before filtering but
passing a filter to `to_lance` on a dataset that already contains
`_rowid` raises a panic from lance. So temporarily, in the case where
fts is used with a filter AND `with_row_id`, we just force the user to
use the duckdb pathway.
---------
Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
In order to add support for `add` we needed to migrate the rust `Table`
trait to a `Table` struct and `TableInternal` trait (similar to the way
the connection is designed).
While doing this we also cleaned up some inconsistencies between the
SDKs:
* Python and Node are garbage collected languages and it can be
difficult to trigger something to be freed. The convention for these
languages is to have some kind of close method. I added a close method
to both the table and connection which will drop the underlying rust
object (a usage sketch follows this list).
* We made significant improvements to table creation in
cc5f2136a6
for the `node` SDK. I copied these changes to the `nodejs` SDK.
* The nodejs tables were using fs to create tmp directories and these
were not getting cleaned up. This is mostly harmless but annoying, so I
changed it up a bit to ensure we clean up tmp directories.
* ~~countRows in the node SDK was returning `bigint`. I changed it to
return `number`~~ (this actually happened in a previous PR)
* Tables and connections now implement `std::fmt::Display` which is
hooked into python's `__repr__`. Node has no concept of a regular "to
string" function and so I added a `display` method.
* Python method signatures are changing so that optional parameters are
always `Optional[foo] = None` instead of something like `foo = False`.
This is because we want those defaults to be in rust whenever possible
(though we still need to mention the default in documentation).
* I changed the python `AsyncConnection/AsyncTable` classes from
abstract classes with a single implementation to just classes because we
no longer have the remote implementation in python.
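For the close convention, usage looks roughly like this (a sketch; the async entry-point names are assumed and only the close() calls are the point):
```python
import asyncio

import lancedb


async def main():
    db = await lancedb.connect_async("data/sample-lancedb")  # assumed entry point
    tbl = await db.open_table("my_table")
    # ... use the table ...
    # In a garbage-collected language there is no guarantee when the wrapper
    # is collected, so close() drops the underlying rust object deterministically.
    tbl.close()
    db.close()


asyncio.run(main())
```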
Note: this does NOT add the `add` function to the remote table. This PR
was already large enough, and the remote implementation is unique
enough, that I am going to do all the remote stuff at a later date (we
should have the structure in place and correct so there shouldn't be any
refactor concerns).
---------
Co-authored-by: Will Jones <willjones127@gmail.com>
This changes `lancedb` from a "pure python" setuptools project to a
maturin project and adds a rust lancedb dependency.
The async python client is extremely minimal (only `connect` and
`Connection.table_names` are supported). The purpose of this PR is to
get the infrastructure in place for building out the rest of the async
client.
Although this is not technically a breaking change (no APIs are
changing) it is still a considerable change in the way the wheels are
built because they now include the native shared library.
Got some user feedback that the `implicit` / `explicit` distinction is
confusing.
Instead I was thinking we would just deprecate the `with_embeddings` API
and then organize working with embeddings into 3 buckets:
1. manually generate embeddings
2. use a provided embedding function
3. define your own custom embedding function
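For bucket 3, a custom embedding function would look roughly like this (a sketch; the register decorator and TextEmbeddingFunction base class are assumed to be what the registry exposes):
```python
from lancedb.embeddings import TextEmbeddingFunction, register  # assumed imports


@register("my-embedder")
class MyEmbedder(TextEmbeddingFunction):
    def ndims(self):
        return 384

    def generate_embeddings(self, texts):
        # Plug in your own model here; dummy vectors for illustration.
        return [[0.0] * self.ndims() for _ in texts]
```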
- Rename safe_import -> attempt_import_or_raise (closes
https://github.com/lancedb/lancedb/pull/923)
- Update docs
- Add Notebook example (@changhiskhan you can use it for the talk. Comes
with "open in colab" button)
- Latency benchmark & results comparison, sanity check on real-world
data
- Updates the default openai model to gpt-4
A `count_rows` method that takes a filter was recently added to
`LanceTable`. This PR adds it everywhere else except `RemoteTable` (that
will come soon).
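Usage is the same everywhere (a small sketch; the filter is a SQL-style predicate string):
```python
total = table.count_rows()                 # no filter counts everything
matching = table.count_rows("price > 10")  # count only rows matching the predicate
```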