lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2025-12-25 14:29:56 +00:00

Author	SHA1	Message	Date
Chang She	98af0ceec6	feat(python): first cut batch queries for remote api (#753 ) issue separate requests under the hood and concatenate results	2024-04-05 16:24:47 -07:00
Lance Release	7778031b26	[python] Bump version: 0.4.1 → 0.4.2	2024-04-05 16:24:47 -07:00
Chang She	c97ae6b787	chore(python): update embedding API to use openai 1.6.1 (#751 ) API has changed significantly, namely `openai.Embedding.create` no longer exists. https://github.com/openai/openai-python/discussions/742 Update the OpenAI embedding function and put a minimum on the openai sdk version.	2024-04-05 16:24:47 -07:00
Chang She	7bac1131fb	feat: add timezone handling for datetime in pydantic (#578) If you add timezone information in the Field annotation for a datetime, it will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones; right now, it silently coerces to the timezone given in the column regardless of whether the input had a matching timezone. This is probably not the right behavior, though we could just require the user to do the validation in the pydantic model instead of at the pyarrow conversion layer.	2024-04-05 16:24:47 -07:00
Chang She	a0afa84786	feat(python): add post filtering for full text search (#739) Closes #721 FTS returns results as a pyarrow table. Pyarrow tables have a `filter` method, but it does not take SQL filter strings (only pyarrow compute expressions). Instead, we do one of two things to support `tbl.search("keywords").where("foo=5").limit(10).to_arrow()`: Default path: if duckdb is available, use duckdb to execute the SQL filter string on the pyarrow table. Backup path: otherwise, write the pyarrow table to a lance dataset and then do `to_table(filter=<filter>)`. Neither is ideal. The default path has two issues: 1. it requires installing an extra library (duckdb); 2. duckdb mangles some fields (like fixed size list => list). The backup path incurs a latency penalty (~20ms on SSD) to write the result set to disk. In the short term, once #676 is addressed, we can write the dataset to "memory://" instead of disk, which makes the post filter evaluate much quicker (ETA next week). In the longer term, we'd like to be able to evaluate the filter string on the pyarrow Table directly, one possibility being that we use Substrait to generate pyarrow compute expressions from the SQL string. Or, if there's enough progress on pyarrow, it could support Substrait expressions directly (no ETA). --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:47 -07:00
Chang She	46bf5a1ed1	feat(python): support list of list fields from pydantic schema (#747 ) For object detection, each row may correspond to an image and each image can have multiple bounding boxes of x-y coordinates. This means that a `bbox` field is potentially "list of list of float". This adds support in our pydantic-pyarrow conversion for nested lists.	2024-04-05 16:24:47 -07:00
Lance Release	d1f24ba1dd	[python] Bump version: 0.4.0 → 0.4.1	2024-04-05 16:24:47 -07:00
Will Jones	43662705ad	docs: enhance Update user guide (#735 ) Closes #705	2024-04-05 16:24:47 -07:00
Weston Pace	94e81ff84b	feat: add the ability to create scalar indices (#679 ) This is a pretty direct binding to the underlying lance capability	2024-04-05 16:24:47 -07:00
Chang She	009297e900	bug(python): fix path handling in windows (#724 ) Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:45 -07:00
Will Jones	a975cc0a94	fix: prevent duplicate data in FTS index (#728) This forces the user to replace the whole FTS directory when re-creating the index, preventing duplicate data from being created. Previously, the whole dataset was re-added to the existing index, duplicating existing rows in the index. This (in combination with lancedb/lance#1707) caused #726, since the duplicate data emitted duplicate indices for `take()` and an upstream issue caused those queries to fail. This solution isn't ideal, since it makes the FTS index temporarily unavailable while the index is built. In the future, we should have multiple FTS index directories, which would allow atomic commits of new indexes (as well as multiple indexes for different columns). Fixes #498. Fixes #726. --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>	2024-04-05 16:24:30 -07:00
Will Jones	48a12e780c	upgrade lance to v0.9.1 (#727 ) This brings in some important bugfixes related to take and aarch64 Linux. See changes at: https://github.com/lancedb/lance/releases/tag/v0.9.1	2024-04-05 16:24:30 -07:00
Chang She	b60a2177ae	feat(python): support nested reference for fts (#723 ) https://github.com/lancedb/lance/issues/1739 Support nested field reference in full text search --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:30 -07:00
Chang She	cc9d74e7a7	feat(python): add option to flatten output in to_pandas (#722 ) Closes https://github.com/lancedb/lance/issues/1738 We add a `flatten` parameter to the signature of `to_pandas`. By default this is None and does nothing. If set to True or -1, then LanceDB will flatten structs before converting to a pandas dataframe. All nested structs are also flattened. If set to any positive integer, then LanceDB will flatten structs up to the specified level of nesting. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:24:30 -07:00
Lance Release	acbcbe6496	[python] Bump version: 0.3.6 → 0.4.0	2024-04-05 16:24:30 -07:00
Lei Xu	1d79e9168e	chore: bump lance version to 0.9 (#715 )	2024-04-05 16:24:30 -07:00
Lance Release	811e604077	[python] Bump version: 0.3.5 → 0.3.6	2024-04-05 16:24:30 -07:00
Bert	eb62ddfb0c	implement update for remote clients (#706 )	2024-04-05 16:24:30 -07:00
Rob Meng	32515ace74	feat: pass vector column name to remote backend (#710) Pass the vector column name to the remote backend as well. `vector_column` is already part of `Query`; this just declares it as part of `remote.VectorQuery` too.	2024-04-05 16:24:30 -07:00
Chang She	374a6f7e78	feat: support nested pydantic schema (#707 )	2024-04-05 16:24:30 -07:00
Lance Release	fc32f98c34	[python] Bump version: 0.3.4 → 0.3.5	2024-04-05 16:24:30 -07:00
Will Jones	9356c3b86a	feat(python): add update query support for Python (#654 ) Closes #69 Will not pass until https://github.com/lancedb/lance/pull/1585 is released	2024-04-05 16:24:29 -07:00
Ayush Chaurasia	3413e79b0f	chore(python): Reduce posthog event count (#661) - Register open_table as an event - Because we're currently dropping the 'search' event, changed the name to 'search_table' and introduced throttling - Throttled events are counted once per time batch, so that the user is registered but the event count doesn't go up by a lot	2024-04-05 16:24:14 -07:00
QianZhu	480a630e19	Qian/minor fix doc (#695 )	2024-04-05 16:23:49 -07:00
QianZhu	bda0135cfc	saas python sdk doc (#692)	2024-04-05 16:23:49 -07:00
Bert	bbf34ae7f4	fix: python remote correct open_table error message (#659 )	2024-04-05 16:23:49 -07:00
Lance Release	8f82e4897c	[python] Bump version: 0.3.3 → 0.3.4	2024-04-05 16:23:49 -07:00
Will Jones	6d76fe80b8	chore: upgrade lance to v0.8.17 (#656 ) Readying for the next Lance release.	2024-04-05 16:23:49 -07:00
Rok Mihevc	78ab9068a8	feat(python): expose index cache size (#655 ) This is to enable https://github.com/lancedb/lancedb/issues/641. Should be merged after https://github.com/lancedb/lance/pull/1587 is released.	2024-04-05 16:23:49 -07:00
Aidan	775bee576c	SaaS create_index API (#649 )	2024-04-05 16:23:49 -07:00
Rok Mihevc	32cb1b9ea4	feat: add RemoteTable.version in Python (#644 ) Please note: this is not tested as we don't have a server here and testing against a mock object wouldn't be that interesting.	2024-04-05 16:23:49 -07:00
Ayush Chaurasia	d59dbf8230	fix: Pydantic 1.x compat for weak_lru caching in embeddings API (#643 ) Colab has pydantic 1.x by default and pydantic 1.x BaseModel objects don't support weakref creation by default that we use to cache embedding models https://github.com/lancedb/lancedb/blob/main/python/lancedb/embeddings/utils.py#L206 . It needs to be added to slot.	2024-04-05 16:23:49 -07:00
Ayush Chaurasia	c0a49a9a5b	Multi-task instructor model with quantization support & weak_lru cache for embedding function models (#612 ) resolves #608	2024-04-05 16:23:49 -07:00
QianZhu	2f2964a645	fix saas open_table and table_names issues (#640 ) - added check whether a table exists in SaaS open_table - remove prefilter not supported warning in SaaS search - fixed issues for SaaS table_names	2024-04-05 16:23:49 -07:00
Lei Xu	86efd36689	chore: improve create_table API consistency between local and remote SDK (#627 )	2024-04-05 16:23:47 -07:00
Ayush Chaurasia	159ecbac5a	Exponential standoff retry support for handling rate limited embedding functions (#614) Users ingesting data through rate-limited APIs no longer need to manually make the process sleep to stay within rate limits. Resolves #579	2024-04-05 16:23:14 -07:00
Lance Release	c1b037f0a5	[python] Bump version: 0.3.2 → 0.3.3	2024-04-05 16:23:14 -07:00
Lei Xu	3855bdf986	chore: bump lance to 8.10 (#622 )	2024-04-05 16:23:14 -07:00
Bert	cd7a4dd251	fix!: sort table names (#619 ) https://github.com/lancedb/lance/issues/1385	2024-04-05 16:22:59 -07:00
QianZhu	3c139c2ee5	Qian/query option doc (#615) - API documentation improvement for queries (table.search) - a small bug fix for the remote API on create_table	2024-04-05 16:22:59 -07:00
Will Jones	166b281d66	increment pylance (#618 )	2024-04-05 16:22:59 -07:00
Bert	c9fee0faed	added api docs for prefilter flag (#617) Added the prefilter flag argument to `LanceQueryBuilder.where`. This should make it display here: https://lancedb.github.io/lancedb/python/python/#lancedb.query.LanceQueryBuilder.select and also in IntelliSense. Also adds some improved documentation about the `where` argument to this method. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:22:59 -07:00
Weston Pace	301e08f30e	feat: allow prefiltering with index (#610 ) Support for prefiltering with an index was added in lance version 0.8.7. We can remove the lancedb check that prevents this. Closes #261	2024-04-05 16:22:59 -07:00
Lance Release	a3c955070e	[python] Bump version: 0.3.1 → 0.3.2	2024-04-05 16:22:59 -07:00
Bert	edeecd3d9f	update lance to 0.8.7 (#598 )	2024-04-05 16:22:59 -07:00
Chang She	2861f33982	fix(python): fix multiple embedding functions bug (#597) Closes #594 The embedding functions are pydantic models, so multiple instances with the same parameters are considered ==, which means that if you have multiple embedding columns it's possible for the embeddings to get overwritten. We now use `is` rather than == to avoid this problem. Testing: modified unit test to include this case	2024-04-05 16:22:59 -07:00
Will Jones	e37a0566e0	Revert "[python] Bump version: 0.3.2 → 0.3.3" This reverts commit `c30faf6083`.	2024-04-05 16:22:59 -07:00
Will Jones	48999ffc27	[python] Bump version: 0.3.2 → 0.3.3	2024-04-05 16:22:59 -07:00
Chang She	31334b05df	chore: bump lance version in python/rust lancedb (#584 ) To include latest v0.8.6 Co-authored-by: Chang She <chang@lancedb.com>	2024-04-05 16:22:59 -07:00
Ayush Chaurasia	507f6087c2	[Python]Embeddings API refactor (#580 ) Sets things up for this -> https://github.com/lancedb/lancedb/issues/579 - Just separates out the registry/ingestion code from the function implementation code - adds a `get_registry` util - package name "open-clip" -> "open-clip-torch"	2024-04-05 16:22:59 -07:00
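
The retry-with-growing-delay behavior described in #614 (commonly called exponential backoff) can be sketched roughly as follows. This is a minimal illustration and not LanceDB's implementation: `retry_with_backoff`, `RateLimitError`, `flaky_embed`, and all parameter names are hypothetical, and the injected `sleep` callable exists only so the example runs without actually waiting.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error standing in for a provider's rate-limit response."""


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    initial_delay: float = 1.0,
    factor: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (RateLimitError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, sleeping initial_delay, then initial_delay * factor, ... between failures."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay)
            delay *= factor  # grow the wait exponentially
    raise RuntimeError("unreachable")


# Hypothetical embedding call that is rate limited on its first two attempts.
calls = {"n": 0}


def flaky_embed() -> List[float]:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("rate limited")
    return [0.1, 0.2]


delays: List[float] = []  # record the waits instead of really sleeping
result = retry_with_backoff(flaky_embed, sleep=delays.append)
```

Real clients usually also add random jitter to each delay and cap the maximum wait; both are omitted here to keep the sketch deterministic.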


854 Commits