lancedb

mirror of https://github.com/lancedb/lancedb.git synced 2026-07-07 21:10:41 +00:00

Author	SHA1	Message	Date
Sebastian Law	4aa7f58a07	use requests instead of aiohttp for underlying http client (#803 ) instead of starting and stopping the current thread's event loop on every http call, just make an http call.	2024-04-05 16:25:01 -07:00
Chang She	7581cbb38f	chore(python): add docstring for limit behavior (#800 ) Closes #796	2024-04-05 16:25:01 -07:00
Chang She	881dfa022b	feat(python): add phrase query option for fts (#798 ) addresses #797 Problem: tantivy does not expose option to explicitly Proposed solution here: 1. Add a `.phrase_query()` option 2. Under the hood, LanceDB takes care of wrapping the input in quotes and replace nested double quotes with single quotes I've also filed an upstream issue, if they support phrase queries natively then we can get rid of our manual custom processing here.	2024-04-05 16:25:01 -07:00
Chang She	f17d16f935	feat(python): add count_rows with filter option (#801 ) Closes #795	2024-04-05 16:25:01 -07:00
Chang She	f3a905af63	fix(rust): not sure why clippy is suddenly unhappy (#794 ) should fix the error on top of main https://github.com/lancedb/lancedb/actions/runs/7457190471/job/20288985725	2024-04-05 16:25:01 -07:00
Chang She	a07c6c465a	feat(python): support new style optional syntax (#793 )	2024-04-05 16:25:01 -07:00
Chang She	1dd663fc8a	chore(python): document phrase queries in fts (#788 ) closes #769 Add unit test and documentation on using quotes to perform a phrase query	2024-04-05 16:25:01 -07:00
Chang She	175ad9223b	feat(node): support table.schema for LocalTable (#789 ) Close #773 we pass an empty table over IPC so we don't need to manually deal with serde. Then we just return the schema attribute from the empty table. --------- Co-authored-by: albertlockett <albert.lockett@gmail.com>	2024-04-05 16:25:01 -07:00
Lei Xu	4c8690549a	chore: bump lance to 0.9.5 (#790 )	2024-04-05 16:25:01 -07:00
Chang She	3100f0d861	feat(python): Set heap size to get faster fts indexing performance (#762 ) By default tantivy-py uses 128MB heapsize. We change the default to 1GB and we allow the user to customize this locally this makes `test_fts.py` run 10x faster	2024-04-05 16:25:00 -07:00
lucasiscovici	328aa2247b	raise exception if fts index does not exist (#776 ) raise exception if fts index does not exist --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>	2024-04-05 16:24:47 -07:00
sudhir	8a48b32689	Make examples work with current version of Openai api's (#779 ) These examples don't work because of changes in openai api from version 1+	2024-04-05 16:24:47 -07:00
Chris	6698376f02	Minor Fixes to Ingest Embedding Functions Docs (#777 ) Addressed minor typos and grammatical issues to improve readability --------- Co-authored-by: Christopher Correa <chris.correa@gmail.com>	2024-04-05 16:24:47 -07:00
Vladimir Varankin	2fd829296e	Minor corrections for docs of embedding_functions (#780 ) In addition to #777, this pull request fixes more typos in the documentation for "Ingest Embedding Functions".	2024-04-05 16:24:47 -07:00
QianZhu	a25d10279c	small bug fix for example code in SaaS JS doc (#770 )	2024-04-05 16:24:47 -07:00
Chang She	e929491187	chore(python): handle NaN input in fts ingestion (#763 ) If the input text is None, Tantivy raises an error complaining it cannot add a NoneType. We handle this upstream so None's are not added to the document. If all of the indexed fields are None then we skip this document.	2024-04-05 16:24:47 -07:00
Bengsoon Chuah	e3ba5b2402	Add relevant imports for each step (#764 ) I found that it was quite incoherent to have to read through the documentation and having to search which submodule that each class should be imported from. For example, it is cumbersome to have to navigate to another documentation page to find out that `EmbeddingFunctionRegistry` is from `lancedb.embeddings`	2024-04-05 16:24:47 -07:00
QianZhu	25d1c62c3f	SaaS JS API sdk doc (#740 ) Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>	2024-04-05 16:24:47 -07:00
Chang She	cd791a366b	feat(js): support list of string input (#755 ) Add support for adding lists of string input (e.g., list of categorical labels) Follow-up items: #757 #758	2024-04-05 16:24:47 -07:00
Lance Release	24afea8c56	Updating package-lock.json	2024-04-05 16:24:47 -07:00
Lance Release	0d2dbf7d09	Updating package-lock.json	2024-04-05 16:24:47 -07:00
Lance Release	c629080d60	Bump version: 0.4.1 → 0.4.2	2024-04-05 16:24:47 -07:00
Lance Release	918a2a4405	[python] Bump version: 0.4.2 → 0.4.3	2024-04-05 16:24:47 -07:00
Lei Xu	56db257ea9	chore: bump pylance to 0.9.2 (#754 )	2024-04-05 16:24:47 -07:00
Xin Hao	a63262cfda	docs: fix link (#752 )	2024-04-05 16:24:47 -07:00
Chang She	98af0ceec6	feat(python): first cut batch queries for remote api (#753 ) issue separate requests under the hood and concatenate results	2024-04-05 16:24:47 -07:00
Lance Release	7778031b26	[python] Bump version: 0.4.1 → 0.4.2	2024-04-05 16:24:47 -07:00
Chang She	c97ae6b787	chore(python): update embedding API to use openai 1.6.1 (#751 ) API has changed significantly, namely `openai.Embedding.create` no longer exists. https://github.com/openai/openai-python/discussions/742 Update the OpenAI embedding function and put a minimum on the openai sdk version.	2024-04-05 16:24:47 -07:00
Chang She	7bac1131fb	feat: add timezone handling for datetime in pydantic (#578 ) If you add timezone information in the Field annotation for a datetime then that will now be passed to the pyarrow data type. I'm not sure how pyarrow enforces timezones, right now, it silently coerces to the timezone given in the column regardless of whether the input had the matching timezone or not. This is probably not the right behavior. Though we could just make it so the user has to make the pydantic model do the validation instead of doing that at the pyarrow conversion layer.	2024-04-05 16:24:47 -07:00
Chang She	a0afa84786	feat(python): add post filtering for full text search (#739 ) Closes #721 fts will return results as a pyarrow table. Pyarrow tables has a `filter` method but it does not take sql filter strings (only pyarrow compute expressions). Instead, we do one of two things to support `tbl.search("keywords").where("foo=5").limit(10).to_arrow()`: Default path: If duckdb is available then use duckdb to execute the sql filter string on the pyarrow table. Backup path: Otherwise, write the pyarrow table to a lance dataset and then do `to_table(filter=<filter>)` Neither is ideal. Default path has two issues: 1. requires installing an extra library (duckdb) 2. duckdb mangles some fields (like fixed size list => list) Backup path incurs a latency penalty (~20ms on ssd) to write the resultset to disk. In the short term, once #676 is addressed, we can write the dataset to "memory://" instead of disk, this makes the post filter evaluate much quicker (ETA next week). In the longer term, we'd like to be able to evaluate the filter string on the pyarrow Table directly, one possibility being that we use Substrait to generate pyarrow compute expressions from sql string. Or if there's enough progress on pyarrow, it could support Substrait expressions directly (no ETA) --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:47 -07:00
Aidan	e74c203e6f	fix: createIndex index cache size (#741 )	2024-04-05 16:24:47 -07:00
Chang She	46bf5a1ed1	feat(python): support list of list fields from pydantic schema (#747 ) For object detection, each row may correspond to an image and each image can have multiple bounding boxes of x-y coordinates. This means that a `bbox` field is potentially "list of list of float". This adds support in our pydantic-pyarrow conversion for nested lists.	2024-04-05 16:24:47 -07:00
Lance Release	4891a7ae14	Updating package-lock.json	2024-04-05 16:24:47 -07:00
Lance Release	d1f24ba1dd	[python] Bump version: 0.4.0 → 0.4.1	2024-04-05 16:24:47 -07:00
Lance Release	b56c54c990	Bump version: 0.4.0 → 0.4.1	2024-04-05 16:24:47 -07:00
elliottRobinson	3ab4b335c3	Update default_embedding_functions.md (#744 ) Modify some grammar, punctuation, and spelling errors.	2024-04-05 16:24:47 -07:00
Will Jones	c34aa09166	docs: update node API reference (#734 ) This command hasn't been run for a while...	2024-04-05 16:24:47 -07:00
Will Jones	43662705ad	docs: enhance Update user guide (#735 ) Closes #705	2024-04-05 16:24:47 -07:00
Bert	5bb128a24d	docs: fix JS api docs for update method (#738 )	2024-04-05 16:24:47 -07:00
Weston Pace	94e81ff84b	feat: add the ability to create scalar indices (#679 ) This is a pretty direct binding to the underlying lance capability	2024-04-05 16:24:47 -07:00
Aidan	b4ae3f3097	feat: node list tables pagination (#733 )	2024-04-05 16:24:47 -07:00
Chang She	5376970e87	doc(javascript): minor improvement on docs for working with tables (#736 ) Closes #639 Closes #638	2024-04-05 16:24:47 -07:00
Chang She	009297e900	bug(python): fix path handling in windows (#724 ) Use pathlib for local paths so that pathlib can handle the correct separator on windows. Closes #703 --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:45 -07:00
Will Jones	3f3acb48c6	chore: add issue templates (#732 ) This PR adds issue templates, which help two recurring issues: * Users forget to tell us whether they are using the Node or Python SDK * Issues don't get appropriate tags This doesn't force the use of the templates. Because we set `blank_issues_enabled: true`, users can still create a custom issue.	2024-04-05 16:24:30 -07:00
Will Jones	c3cda2c5d0	ci: check formatting and clippy (#730 )	2024-04-05 16:24:30 -07:00
Will Jones	a975cc0a94	fix: prevent duplicate data in FTS index (#728 ) This forces the user to replace the whole FTS directory when re-creating the index, prevent duplicate data from being created. Previously, the whole dataset was re-added to the existing index, duplicating existing rows in the index. This (in combination with lancedb/lance#1707) caused #726, since the duplicate data emitted duplicate indices for `take()` and an upstream issue caused those queries to fail. This solution isn't ideal, since it makes the FTS index temporarily unavailable while the index is built. In the future, we should have multiple FTS index directories, which would allow atomic commits of new indexes (as well as multiple indexes for different columns). Fixes #498. Fixes #726. --------- Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>	2024-04-05 16:24:30 -07:00
Will Jones	48a12e780c	upgrade lance to v0.9.1 (#727 ) This brings in some important bugfixes related to take and aarch64 Linux. See changes at: https://github.com/lancedb/lance/releases/tag/v0.9.1	2024-04-05 16:24:30 -07:00
Chang She	b60a2177ae	feat(python): support nested reference for fts (#723 ) https://github.com/lancedb/lance/issues/1739 Support nested field reference in full text search --------- Co-authored-by: Will Jones <willjones127@gmail.com>	2024-04-05 16:24:30 -07:00
Chang She	cc9d74e7a7	feat(python): add option to flatten output in to_pandas (#722 ) Closes https://github.com/lancedb/lance/issues/1738 We add a `flatten` parameter to the signature of `to_pandas`. By default this is None and does nothing. If set to True or -1, then LanceDB will flatten structs before converting to a pandas dataframe. All nested structs are also flattened. If set to any positive integer, then LanceDB will flatten structs up to the specified level of nesting. --------- Co-authored-by: Weston Pace <weston.pace@gmail.com>	2024-04-05 16:24:30 -07:00
Aidan	3232b55218	feat: Node create index API (#720 )	2024-04-05 16:24:30 -07:00

626 Commits