Commit Graph

642 Commits

Author SHA1 Message Date
Lance Release
8bcdc81fd3 [python] Bump version: 0.4.4 → 0.5.0 python-v0.5.0 2024-01-18 01:53:15 +00:00
Chang She
39e14c70c5 chore(python): turn off lazy frame ingestion (#821) 2024-01-16 19:11:16 -08:00
Chang She
af8263af94 feat(python): allow the entire table to be converted to a polars dataframe (#814) 2024-01-15 15:49:16 -08:00
Chang She
be4ab9eef3 feat(python): add exist_ok option to create table (#813)
This mimics CREATE TABLE IF NOT EXISTS behavior.
We add `db.create_table(..., exist_ok=True)` parameter.
By default it is set to False, so trying to create
a table with the same name will raise an exception.
If set to True, then it only opens the table if it
already exists. If you pass in a schema, it will
be checked against the existing table to make sure
you get what you want. If you pass in data, it will
NOT be added to the existing table.
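
The semantics above can be sketched with a toy in-memory registry. This is only an illustration, not LanceDB's implementation: `ToyDB`, its `create_table` signature, and the dict-based schema are hypothetical stand-ins.

```python
class ToyDB:
    """Hypothetical in-memory stand-in used only to illustrate exist_ok."""

    def __init__(self):
        self.tables = {}  # name -> {"schema": dict, "rows": list}

    def create_table(self, name, data=None, schema=None, exist_ok=False):
        if name in self.tables:
            if not exist_ok:
                # Default behavior: a duplicate name is an error.
                raise ValueError(f"Table {name!r} already exists")
            existing = self.tables[name]
            # A supplied schema is checked against the existing table.
            if schema is not None and schema != existing["schema"]:
                raise ValueError("Schema does not match the existing table")
            # Data passed in is NOT added to the existing table.
            return existing
        table = {"schema": schema, "rows": list(data or [])}
        self.tables[name] = table
        return table


db = ToyDB()
schema = {"id": "int64"}
first = db.create_table("t", data=[{"id": 1}], schema=schema)
# Second call only opens the table; the new row is ignored.
again = db.create_table("t", data=[{"id": 2}], schema=schema, exist_ok=True)
```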
2024-01-15 11:09:18 -08:00
Ayush Chaurasia
184d2bc969 chore(python): get rid of Pydantic deprecation warning in embedding fcn (#816)
```
UserWarning: Valid config keys have changed in V2:
* 'keep_untouched' has been renamed to 'ignored_types' warnings.warn(message, UserWarning)
```
2024-01-15 12:19:51 +05:30
Anton Shevtsov
ff6f005336 Add openai api key not found help (#815)
This pull request adds a check for the presence of the environment
variable `OPENAI_API_KEY` and removes an unused parameter in the
`retry_with_exponential_backoff` function.
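
A minimal sketch of such a check, assuming a plain lookup in the environment. The helper name `require_openai_api_key`, the exception type, and the error text are hypothetical and not taken from the PR.

```python
import os


def require_openai_api_key(env=None):
    """Return the key, or raise with a hint if it is missing (hypothetical helper)."""
    # Allow a mapping to be injected so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise ValueError(
            "OPENAI_API_KEY environment variable is not set. "
            "Set it before using the OpenAI embedding function."
        )
    return key
```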
2024-01-15 02:44:09 +05:30
Chang She
49333e522c feat(python): basic polars integration (#811)
We should now be able to directly ingest polars dataframes and return
results as polars dataframes.


![image](https://github.com/lancedb/lancedb/assets/759245/828b1260-c791-45f1-a047-aa649575e798)
2024-01-13 16:38:16 -08:00
Ayush Chaurasia
4568df422d feat(python): Add gemini text embedding function (#806)
Named it Gemini-text for now. Not sure how complicated it will be to
support both text and multimodal embeddings under the same class
"gemini", but it's not something to worry about for now.
2024-01-12 22:38:55 -08:00
Lance Release
986891db98 Updating package-lock.json 2024-01-11 22:21:42 +00:00
Lance Release
036bf02901 Updating package-lock.json 2024-01-11 21:34:04 +00:00
Lance Release
4e31f0cc7a Bump version: 0.4.2 → 0.4.3 v0.4.3 2024-01-11 21:33:55 +00:00
Lance Release
0a16e29b93 [python] Bump version: 0.4.3 → 0.4.4 python-v0.4.4 2024-01-11 21:29:00 +00:00
Will Jones
cf7d7a19f5 upgrade lance (#809) 2024-01-11 13:28:10 -08:00
Lei Xu
fe2fb91a8b chore: remove black as dependency (#808)
We use `ruff` in CI and dev workflow now.
2024-01-11 10:58:49 -08:00
Chang She
81af350d85 feat(node): align incoming data to table schema (#802) 2024-01-10 16:44:00 -08:00
Sebastian Law
99adfe065a use requests instead of aiohttp for underlying http client (#803)
instead of starting and stopping the current thread's event loop on
every http call, just make an http call.
2024-01-10 00:07:50 -05:00
Chang She
277406509e chore(python): add docstring for limit behavior (#800)
Closes #796
2024-01-09 20:20:13 -08:00
Chang She
63411b4d8b feat(python): add phrase query option for fts (#798)
addresses #797 

Problem: tantivy does not expose option to explicitly

Proposed solution here: 

1. Add a `.phrase_query()` option
2. Under the hood, LanceDB takes care of wrapping the input in quotes
and replacing nested double quotes with single quotes

I've also filed an upstream issue. If they support phrase queries
natively, we can get rid of our manual processing here.
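
The quoting step in item 2 can be sketched as a pure string transform. The function name `to_phrase_query` is hypothetical; only the wrap-and-replace behavior comes from the description above.

```python
def to_phrase_query(text: str) -> str:
    """Wrap text in double quotes so it is treated as a single phrase."""
    # Nested double quotes would end the phrase early, so they are
    # replaced with single quotes before wrapping.
    return '"' + text.replace('"', "'") + '"'


query = to_phrase_query('the "old man" and the sea')
```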
2024-01-09 19:41:31 -08:00
Chang She
d998f80b04 feat(python): add count_rows with filter option (#801)
Closes #795
2024-01-09 19:33:03 -08:00
Chang She
629379a532 fix(rust): not sure why clippy is suddenly unhappy (#794)
should fix the error on top of main


https://github.com/lancedb/lancedb/actions/runs/7457190471/job/20288985725
2024-01-09 19:27:38 -08:00
Chang She
99ba5331f0 feat(python): support new style optional syntax (#793) 2024-01-09 07:03:29 -08:00
Chang She
121687231c chore(python): document phrase queries in fts (#788)
closes #769 

Add unit test and documentation on using quotes to perform a phrase
query
2024-01-08 21:49:31 -08:00
Chang She
ac40d4b235 feat(node): support table.schema for LocalTable (#789)
Close #773 

we pass an empty table over IPC so we don't need to manually deal with
serde. Then we just return the schema attribute from the empty table.

---------

Co-authored-by: albertlockett <albert.lockett@gmail.com>
2024-01-08 21:12:48 -08:00
Lei Xu
c5a52565ac chore: bump lance to 0.9.5 (#790) 2024-01-07 19:27:47 -08:00
Chang She
b0a88a7286 feat(python): Set heap size to get faster fts indexing performance (#762)
By default tantivy-py uses a 128MB heap size. We change the default to
1GB and allow the user to customize it.

Locally this makes `test_fts.py` run 10x faster.
2024-01-07 15:15:13 -08:00
lucasiscovici
d41d849e0e raise exception if fts index does not exist (#776)
---------

Co-authored-by: Chang She <759245+changhiskhan@users.noreply.github.com>
2024-01-07 14:34:04 -08:00
sudhir
bf5202f196 Make examples work with the current version of the OpenAI API (#779)
These examples don't work because of changes in the OpenAI API from
version 1 onward.
2024-01-07 14:27:56 -08:00
Chris
8be2861061 Minor Fixes to Ingest Embedding Functions Docs (#777)
Addressed minor typos and grammatical issues to improve readability

---------

Co-authored-by: Christopher Correa <chris.correa@gmail.com>
2024-01-07 14:27:40 -08:00
Vladimir Varankin
0560e3a0e5 Minor corrections for docs of embedding_functions (#780)
In addition to #777, this pull request fixes more typos in the
documentation for "Ingest Embedding Functions".
2024-01-07 14:26:35 -08:00
QianZhu
b83fbfc344 small bug fix for example code in SaaS JS doc (#770) 2024-01-04 14:30:34 -08:00
Chang She
60b22d84bf chore(python): handle NaN input in fts ingestion (#763)
If the input text is None, Tantivy raises an error
complaining it cannot add a NoneType. We handle this
upstream so `None` values are not added to the document.
If all of the indexed fields are None then we skip
this document.
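
A rough sketch of that filtering, assuming rows are plain dicts. `build_documents` and the field names are hypothetical and do not use the tantivy API.

```python
def build_documents(rows, indexed_fields):
    """Drop None-valued fields; skip rows whose indexed fields are all None."""
    docs = []
    for row in rows:
        # Keep only the indexed fields that actually carry a value.
        doc = {f: row[f] for f in indexed_fields if row.get(f) is not None}
        if doc:
            docs.append(doc)
    return docs


rows = [
    {"title": "a", "body": None},
    {"title": None, "body": None},  # skipped entirely
    {"title": "b", "body": "c"},
]
docs = build_documents(rows, ["title", "body"])
```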
2024-01-04 11:45:12 -08:00
Bengsoon Chuah
7d55a94efd Add relevant imports for each step (#764)
I found it cumbersome to read through the documentation and then have
to search for the submodule each class should be imported from.

For example, you have to navigate to another documentation page to find
out that `EmbeddingFunctionRegistry` is from `lancedb.embeddings`.
2024-01-04 11:15:42 -08:00
QianZhu
4d8e401d34 SaaS JS API sdk doc (#740)
Co-authored-by: Aidan <64613310+aidangomar@users.noreply.github.com>
2024-01-03 16:24:21 -08:00
Chang She
684eb8b087 feat(js): support list of string input (#755)
Add support for adding lists of string input (e.g., list of categorical
labels)

Follow-up items: #757 #758
2024-01-02 20:55:33 -08:00
Lance Release
4e3b82feaa Updating package-lock.json 2023-12-30 03:16:41 +00:00
Lance Release
8e248a9d67 Updating package-lock.json 2023-12-30 00:53:51 +00:00
Lance Release
065ffde443 Bump version: 0.4.1 → 0.4.2 v0.4.2 2023-12-30 00:53:30 +00:00
Lance Release
c3059dc689 [python] Bump version: 0.4.2 → 0.4.3 python-v0.4.3 2023-12-30 00:52:54 +00:00
Lei Xu
a9caa5f2d4 chore: bump pylance to 0.9.2 (#754) 2023-12-29 16:39:45 -08:00
Xin Hao
8411c36b96 docs: fix link (#752) 2023-12-29 15:33:24 -08:00
Chang She
7773bda7ee feat(python): first cut batch queries for remote api (#753)
issue separate requests under the hood and concatenate results
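
Roughly, the approach looks like the sketch below. `send_request` stands in for the remote call and is hypothetical, as is the `query_index` tag added here so concatenated rows can be traced back to their query.

```python
def batch_query(send_request, query_vectors):
    """Issue one request per query vector and concatenate the results."""
    results = []
    for i, vector in enumerate(query_vectors):
        # Each vector is sent as its own request.
        for row in send_request(vector):
            # Tag every row with the index of the query that produced it
            # (an assumption of this sketch, not stated in the commit).
            results.append({"query_index": i, **row})
    return results


def fake_request(vector):
    # Stand-in for the remote API: echoes the first component back.
    return [{"id": int(vector[0])}]


rows = batch_query(fake_request, [[1.0, 2.0], [3.0, 4.0]])
```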
2023-12-29 15:33:03 -08:00
Lance Release
392777952f [python] Bump version: 0.4.1 → 0.4.2 python-v0.4.2 2023-12-29 00:19:21 +00:00
Chang She
7e75e50d3a chore(python): update embedding API to use openai 1.6.1 (#751)
API has changed significantly, namely `openai.Embedding.create` no
longer exists.
https://github.com/openai/openai-python/discussions/742

Update the OpenAI embedding function and put a minimum on the openai sdk
version.
2023-12-28 15:05:57 -08:00
Chang She
4b8af261a3 feat: add timezone handling for datetime in pydantic (#578)
If you add timezone information in the Field annotation for a datetime
then that will now be passed to the pyarrow data type.

I'm not sure how pyarrow enforces timezones. Right now, it silently
coerces to the timezone given in the column regardless of whether the
input had a matching timezone. This is probably not the right
behavior, though we could require the user to have the pydantic model
do the validation instead of doing it at the pyarrow conversion layer.
2023-12-28 11:02:56 -08:00
Chang She
c8728d4ca1 feat(python): add post filtering for full text search (#739)
Closes #721 

fts will return results as a pyarrow table. Pyarrow tables have a
`filter` method but it does not take sql filter strings (only pyarrow
compute expressions). Instead, we do one of two things to support
`tbl.search("keywords").where("foo=5").limit(10).to_arrow()`:

Default path: If duckdb is available then use duckdb to execute the sql
filter string on the pyarrow table.
Backup path: Otherwise, write the pyarrow table to a lance dataset and
then do `to_table(filter=<filter>)`

Neither is ideal. 
Default path has two issues:
1. requires installing an extra library (duckdb)
2. duckdb mangles some fields (like fixed size list => list)

Backup path incurs a latency penalty (~20ms on ssd) to write the
resultset to disk.

In the short term, once #676 is addressed, we can write the dataset to
"memory://" instead of disk, which makes the post filter evaluate much
faster (ETA next week).

In the longer term, we'd like to be able to evaluate the filter string
on the pyarrow Table directly, one possibility being that we use
Substrait to generate pyarrow compute expressions from sql string. Or if
there's enough progress on pyarrow, it could support Substrait
expressions directly (no ETA)
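
The default/backup dispatch described above can be sketched as an optional-dependency fallback. The two filter callables and the `has_duckdb` override are hypothetical placeholders; the real paths run duckdb or write a lance dataset.

```python
import importlib.util


def post_filter(table, where, duckdb_filter, lance_filter, has_duckdb=None):
    """Pick the default (duckdb) path when available, else the backup path."""
    if has_duckdb is None:
        # Detect the optional dependency without importing it.
        has_duckdb = importlib.util.find_spec("duckdb") is not None
    if has_duckdb:
        return duckdb_filter(table, where)
    return lance_filter(table, where)


result = post_filter(
    "tbl",
    "foo=5",
    duckdb_filter=lambda t, w: ("duckdb", t, w),
    lance_filter=lambda t, w: ("lance", t, w),
    has_duckdb=False,  # force the backup path for the demonstration
)
```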

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
2023-12-27 09:31:04 -08:00
Aidan
446f837335 fix: createIndex index cache size (#741) 2023-12-27 09:25:13 -08:00
Chang She
8f9ad978f5 feat(python): support list of list fields from pydantic schema (#747)
For object detection, each row may correspond to an image and each image
can have multiple bounding boxes of x-y coordinates. This means that a
`bbox` field is potentially "list of list of float". This adds support
in our pydantic-pyarrow conversion for nested lists.
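
A simplified sketch of the recursive idea, using only the standard library. `describe_arrow_type` is hypothetical and returns illustrative string labels rather than real pyarrow types; the scalar mapping is an assumption.

```python
from typing import List, get_args, get_origin


def describe_arrow_type(tp) -> str:
    """Recursively describe a (possibly nested) list annotation."""
    if get_origin(tp) is list:
        # Unwrap one list level and recurse on the element type.
        (item,) = get_args(tp)
        return f"list<{describe_arrow_type(item)}>"
    # Assumed scalar mapping, for illustration only.
    scalars = {float: "float64", int: "int64", str: "string", bool: "bool"}
    return scalars[tp]


bbox_type = describe_arrow_type(List[List[float]])
```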
2023-12-27 09:10:09 -08:00
Lance Release
0df38341d5 Updating package-lock.json 2023-12-26 17:21:51 +00:00
Lance Release
60260018cf [python] Bump version: 0.4.0 → 0.4.1 python-v0.4.1 2023-12-26 16:51:16 +00:00
Lance Release
bb100c5c19 Bump version: 0.4.0 → 0.4.1 v0.4.1 2023-12-26 16:51:09 +00:00