637 Commits

Author SHA1 Message Date
Ayush Chaurasia
340fd98b42 chore(python): get rid of Pydantic deprecation warning in embedding fcn (#816)
```
UserWarning: Valid config keys have changed in V2:
* 'keep_untouched' has been renamed to 'ignored_types'
  warnings.warn(message, UserWarning)
```
2024-04-05 16:26:20 -07:00
Anton Shevtsov
dc0b11a86a Add openai api key not found help (#815)
This pull request adds a check for the presence of the `OPENAI_API_KEY`
environment variable and removes an unused parameter from the
`retry_with_exponential_backoff` function.
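
A check of this kind could look roughly like the sketch below. The helper name `require_openai_api_key`, the optional `env` argument, and the wording of the error message are assumptions made for illustration; they are not taken from the pull request.

```python
import os
from typing import Mapping, Optional


def require_openai_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key, or raise a helpful error if it is missing.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    check can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        # Fail early with a hint instead of letting the HTTP call fail later.
        raise ValueError(
            "OPENAI_API_KEY not found. Set it in your environment "
            "before using the OpenAI embedding function."
        )
    return key
```

Failing before any request is made keeps the error message close to the actual cause.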
2024-04-05 16:26:20 -07:00
Chang She
17dcb70076 feat(python): basic polars integration (#811)
We should now be able to directly ingest polars dataframes and return
results as polars dataframes

![image](https://github.com/lancedb/lancedb/assets/759245/828b1260-c791-45f1-a047-aa649575e798)
2024-04-05 16:26:19 -07:00
Ayush Chaurasia
2f72d5138e feat(python): Add gemini text embedding function (#806)
Named it Gemini-text for now. Not sure how complicated it will be to
support both text and multimodal embeddings under the same class
"gemini". But it's not something to worry about for now, I guess.
2024-04-05 16:25:52 -07:00
Lance Release
f0a654036e Updating package-lock.json 2024-04-05 16:25:02 -07:00
Lance Release
162f8536d1 Updating package-lock.json 2024-04-05 16:25:02 -07:00
Lance Release
55cc3ed5a2 Bump version: 0.4.2 → 0.4.3 2024-04-05 16:25:02 -07:00
Lance Release
1387dc6e48 [python] Bump version: 0.4.3 → 0.4.4 2024-04-05 16:25:02 -07:00
Will Jones
63e273606e upgrade lance (#809) 2024-04-05 16:25:02 -07:00
Lei Xu
45b006d68c chore: remove black as dependency (#808)
We use `ruff` in CI and dev workflow now.
2024-04-05 16:25:02 -07:00
Chang She
4b243c5ff8 feat(node): align incoming data to table schema (#802) 2024-04-05 16:25:01 -07:00
Sebastian Law
4aa7f58a07 use requests instead of aiohttp for underlying http client (#803)
Instead of starting and stopping the current thread's event loop on
every HTTP call, just make an HTTP call.
2024-04-05 16:25:01 -07:00
Chang She
7581cbb38f chore(python): add docstring for limit behavior (#800)
Closes #796
2024-04-05 16:25:01 -07:00
Chang She
881dfa022b feat(python): add phrase query option for fts (#798)
addresses #797 

Problem: tantivy does not expose option to explicitly

Proposed solution here: 

1. Add a `.phrase_query()` option.
2. Under the hood, LanceDB wraps the input in quotes
and replaces nested double quotes with single quotes.

I've also filed an upstream issue; if they support phrase queries
natively, we can get rid of our manual custom processing here.
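
The quoting step described above can be sketched as follows. The function name `to_phrase_query` is hypothetical and the snippet only illustrates the string handling; it is not the code from the pull request.

```python
def to_phrase_query(text: str) -> str:
    """Wrap `text` in double quotes so it is searched as a single phrase."""
    # Nested double quotes would terminate the phrase early,
    # so replace them with single quotes first.
    sanitized = text.replace('"', "'")
    return f'"{sanitized}"'


print(to_phrase_query('the "quick" brown fox'))
```

The result is the original text with its inner double quotes turned into single quotes and one pair of double quotes around the whole string.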
2024-04-05 16:25:01 -07:00
Chang She
f17d16f935 feat(python): add count_rows with filter option (#801)
Closes #795
2024-04-05 16:25:01 -07:00
Chang She
f3a905af63 fix(rust): not sure why clippy is suddenly unhappy (#794)
should fix the error on top of main


https://github.com/lancedb/lancedb/actions/runs/7457190471/job/20288985725
2024-04-05 16:25:01 -07:00
Chang She
a07c6c465a feat(python): support new style optional syntax (#793) 2024-04-05 16:25:01 -07:00
Chang She
1dd663fc8a chore(python): document phrase queries in fts (#788)
closes #769 

Add unit test and documentation on using quotes to perform a phrase
query
2024-04-05 16:25:01 -07:00
Chang She
175ad9223b feat(node): support table.schema for LocalTable (#789)
Close #773 

We pass an empty table over IPC so we don't need to manually deal with
serde. Then we just return the schema attribute from the empty table.

---------

Co-authored-by: albertlockett <albert.lockett@gmail.com>
2024-04-05 16:25:01 -07:00
Lei Xu
4c8690549a chore: bump lance to 0.9.5 (#790) 2024-04-05 16:25:01 -07:00
Chang She
3100f0d861 feat(python): Set heap size to get faster fts indexing performance (#762)
By default, tantivy-py uses a 128MB heap size. We change the default to 1GB
and allow the user to customize it.

Locally, this makes `test_fts.py` run 10x faster.
2024-04-05 16:25:00 -07:00
lucasiscovici
328aa2247b raise exception if fts index does not exist (#776)
raise exception if fts index does not exist

---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-04-05 16:24:47 -07:00
sudhir
8a48b32689 Make examples work with current version of Openai api's (#779)
These examples don't work because of changes to the OpenAI API in version
1+.
2024-04-05 16:24:47 -07:00
Chris
6698376f02 Minor Fixes to Ingest Embedding Functions Docs (#777)
Addressed minor typos and grammatical issues to improve readability

---------

Co-authored-by: Christopher Correa <chris.correa@gmail.com>
2024-04-05 16:24:47 -07:00
Vladimir Varankin
2fd829296e Minor corrections for docs of embedding_functions (#780)
In addition to #777, this pull request fixes more typos in the
documentation for "Ingest Embedding Functions".
2024-04-05 16:24:47 -07:00
QianZhu
a25d10279c small bug fix for example code in SaaS JS doc (#770) 2024-04-05 16:24:47 -07:00
Chang She
e929491187 chore(python): handle NaN input in fts ingestion (#763)
If the input text is None, Tantivy raises an error
complaining it cannot add a NoneType. We handle this
upstream so that None values are not added to the document.
If all of the indexed fields are None, we skip
the document.
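
A minimal sketch of that filtering logic, assuming rows arrive as plain dictionaries. The helper name `build_fts_document` is hypothetical, and only `None` is handled here, as in the description above; this is not the actual ingestion code.

```python
from typing import Any, Dict, Iterable, Optional


def build_fts_document(
    row: Dict[str, Any], indexed_fields: Iterable[str]
) -> Optional[Dict[str, Any]]:
    """Collect the non-None indexed fields of `row`.

    Returns None when every indexed field is None (or absent), which
    signals that the row should be skipped entirely.
    """
    doc = {
        field: row[field]
        for field in indexed_fields
        if row.get(field) is not None
    }
    # An empty dict means there is nothing to index for this row.
    return doc or None


rows = [
    {"title": "hello", "body": None},
    {"title": None, "body": None},
]
docs = [d for d in (build_fts_document(r, ["title", "body"]) for r in rows) if d]
print(docs)
```

Only the first row yields a document, and that document carries just the `title` field.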
2024-04-05 16:24:47 -07:00
Bengsoon Chuah
e3ba5b2402 Add relevant imports for each step (#764)
I found it quite confusing to read through the documentation and
then have to search for which submodule each class
should be imported from.

For example, it is cumbersome to have to navigate to another
documentation page to find out that `EmbeddingFunctionRegistry` is from
`lancedb.embeddings`
2024-04-05 16:24:47 -07:00
QianZhu
25d1c62c3f SaaS JS API sdk doc (#740)
Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>
2024-04-05 16:24:47 -07:00
Chang She
cd791a366b feat(js): support list of string input (#755)
Add support for adding lists of string input (e.g., list of categorical
labels)

Follow-up items: #757 #758
2024-04-05 16:24:47 -07:00
Lance Release
24afea8c56 Updating package-lock.json 2024-04-05 16:24:47 -07:00
Lance Release
0d2dbf7d09 Updating package-lock.json 2024-04-05 16:24:47 -07:00
Lance Release
c629080d60 Bump version: 0.4.1 → 0.4.2 2024-04-05 16:24:47 -07:00
Lance Release
918a2a4405 [python] Bump version: 0.4.2 → 0.4.3 2024-04-05 16:24:47 -07:00
Lei Xu
56db257ea9 chore: bump pylance to 0.9.2 (#754) 2024-04-05 16:24:47 -07:00
Xin Hao
a63262cfda docs: fix link (#752) 2024-04-05 16:24:47 -07:00
Chang She
98af0ceec6 feat(python): first cut batch queries for remote api (#753)
Issue separate requests under the hood and concatenate the results.
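
The approach can be sketched as a loop over the queries. Everything here is an assumption for illustration: `run_query` stands in for whatever sends a single request, and the `query_index` tag on each row is not confirmed by the commit.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def batch_query(
    queries: Sequence[Any], run_query: Callable[[Any], List[Row]]
) -> List[Row]:
    """Run each query as its own request and concatenate the results."""
    results: List[Row] = []
    for index, query in enumerate(queries):
        # One request per query; no server-side batching is assumed.
        for row in run_query(query):
            # Tag each row so callers can tell which query produced it.
            results.append({"query_index": index, **row})
    return results
```

Tagging rows with the query position is one way to keep the concatenated result unambiguous.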
2024-04-05 16:24:47 -07:00
Lance Release
7778031b26 [python] Bump version: 0.4.1 → 0.4.2 2024-04-05 16:24:47 -07:00
Chang She
c97ae6b787 chore(python): update embedding API to use openai 1.6.1 (#751)
The API has changed significantly; notably, `openai.Embedding.create` no
longer exists.
https://github.com/openai/openai-python/discussions/742

Update the OpenAI embedding function and put a minimum on the openai sdk
version.
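
For context, the 1.x SDK exposes embeddings through a client object (`client.embeddings.create`). The sketch below assumes an already constructed `openai.OpenAI()` client is passed in; the helper name `embed_texts` and the default model are illustrative choices, not the code from this commit.

```python
from typing import Any, List, Sequence


def embed_texts(
    client: Any,
    texts: Sequence[str],
    model: str = "text-embedding-ada-002",
) -> List[List[float]]:
    """Embed `texts` with an openai>=1.0 style client.

    `client` is expected to behave like `openai.OpenAI()`; it is passed in
    so the function has no import-time dependency on the SDK.
    """
    response = client.embeddings.create(input=list(texts), model=model)
    # Each item in response.data carries one embedding vector.
    return [item.embedding for item in response.data]
```

With the real SDK, usage would be along the lines of `embed_texts(openai.OpenAI(), ["hello world"])`, with `OPENAI_API_KEY` set in the environment.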
2024-04-05 16:24:47 -07:00
Chang She
7bac1131fb feat: add timezone handling for datetime in pydantic (#578)
If you add timezone information in the Field annotation for a datetime,
it is now passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones. Right now, it silently
coerces to the timezone given in the column regardless of whether the
input had a matching timezone. This is probably not the right
behavior, though we could just require the user to have the
pydantic model do the validation instead of doing it at the pyarrow
conversion layer.
2024-04-05 16:24:47 -07:00
Chang She
a0afa84786 feat(python): add post filtering for full text search (#739)
Closes #721 

fts will return results as a pyarrow table. Pyarrow tables have a
`filter` method, but it does not take SQL filter strings (only pyarrow
compute expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

Default path: If duckdb is available then use duckdb to execute the sql
filter string on the pyarrow table.
Backup path: Otherwise, write the pyarrow table to a lance dataset and
then do `to_table(filter=<filter>)`

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on SSD) to write the
result set to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk, which makes the post filter evaluate much
more quickly (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly, one possibility being that we use
Substrait to generate pyarrow compute expressions from the SQL string. Or, if
there's enough progress on pyarrow, it could support Substrait
expressions directly (no ETA).

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2024-04-05 16:24:47 -07:00
Aidan
e74c203e6f fix: createIndex index cache size (#741) 2024-04-05 16:24:47 -07:00
Chang She
46bf5a1ed1 feat(python): support list of list fields from pydantic schema (#747)
For object detection, each row may correspond to an image and each image
can have multiple bounding boxes of x-y coordinates. This means that a
`bbox` field is potentially "list of list of float". This adds support
in our pydantic-pyarrow conversion for nested lists.
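
The recursive idea behind nested-list support can be illustrated without pyarrow or pydantic. The sketch below maps Python type hints to Arrow-style type names as strings; the function `describe_arrow_type` and its primitive table are hypothetical and are not the conversion code from this commit.

```python
from typing import Any, List, get_args, get_origin

# Assumed mapping from Python primitives to Arrow-style type names.
_PRIMITIVES = {int: "int64", float: "double", str: "string", bool: "bool"}


def describe_arrow_type(tp: Any) -> str:
    """Describe the Arrow-style type for a (possibly nested) list hint."""
    if tp in _PRIMITIVES:
        return _PRIMITIVES[tp]
    if get_origin(tp) is list:
        args = get_args(tp)
        if len(args) != 1:
            raise TypeError(f"list type needs exactly one item type: {tp!r}")
        # Recurse on the item type, so List[List[float]] just works.
        return f"list<{describe_arrow_type(args[0])}>"
    raise TypeError(f"unsupported type: {tp!r}")


print(describe_arrow_type(List[List[float]]))
```

A `bbox` field annotated as `List[List[float]]` resolves by recursing once per nesting level.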
2024-04-05 16:24:47 -07:00
Lance Release
4891a7ae14 Updating package-lock.json 2024-04-05 16:24:47 -07:00
Lance Release
d1f24ba1dd [python] Bump version: 0.4.0 → 0.4.1 2024-04-05 16:24:47 -07:00
Lance Release
b56c54c990 Bump version: 0.4.0 → 0.4.1 2024-04-05 16:24:47 -07:00
elliottRobinson
3ab4b335c3 Update default_embedding_functions.md (#744)
Fix some grammar, punctuation, and spelling errors.
2024-04-05 16:24:47 -07:00
Will Jones
c34aa09166 docs: update node API reference (#734)
This command hasn't been run for a while...
2024-04-05 16:24:47 -07:00
Will Jones
43662705ad docs: enhance Update user guide (#735)
Closes #705
2024-04-05 16:24:47 -07:00
Bert
5bb128a24d docs: fix JS api docs for update method (#738) 2024-04-05 16:24:47 -07:00