Commit Graph

43 Commits

Author SHA1 Message Date
Lei Xu
f09db4a6d6 [Python] Do not return Table count for every add operation (#328)
`Table::count()` gets linearly slower as more fragments are ingested.
2023-07-18 17:11:17 -07:00
Lei Xu
088e745e1d [Python] Create table with Iterator[RecordBatch] and add docs (#316) 2023-07-16 21:45:55 -07:00
Lei Xu
028a6e433d [Python] Get table schema (#313) 2023-07-15 17:39:37 -07:00
Chang She
2fdcb307eb [python] Fix a few minor bugs (#304) 2023-07-15 03:47:42 +08:00
Lei Xu
08944bf4fd [Python] Convert Pydantic Model to Arrow Schema (#291)
Provide a utility to automatically convert a Pydantic model to an Arrow Schema

Closes #256
2023-07-13 11:16:37 -07:00
Lei Xu
0e7ae5dfbf [Python] Fix list type conversion to JSON and temporal types (#274) 2023-07-11 11:05:51 -07:00
Lei Xu
9f603f73a9 [Python] Schema to JSON (#272) 2023-07-10 18:11:24 -07:00
Lei Xu
e6c6da6104 [Python] Initial support of cloud API (#260)
Support connecting to a remote database, and implement the Search API
2023-07-07 15:41:15 -07:00
Chang She
e2325c634b Allow creation of an empty table (#254)
It's inconvenient to always require data at table creation time.
Here we enable you to create an empty table and add data and set schema
later.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-06 20:44:58 -07:00
Chang She
507eeae9c8 Set default to error instead of drop (#259)
When encountering bad input data, we can default to the principle of least
surprise and raise an exception.

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-05 22:44:18 -07:00
Chang She
3c46d7f268 Handle NaN input data (#241)
Sometimes LangChain would insert a single `[np.nan]` as a placeholder if
the embedding function failed. This causes a problem for Lance format
because then the array can't be stored as a FixedSizeListArray.

Instead:
1. By default we remove rows with embedding lengths less than the
maximum length in the batch
2. If the `strict=True` kwarg is set, then a `ValueError` is raised
if the embeddings aren't all the same length
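
The two behaviours above can be sketched in plain Python. This is only an illustration of the rule as described in this message, not the code from the commit; the helper name `sanitize_embeddings` and its signature are hypothetical.

```python
import math
from typing import List, Sequence


def sanitize_embeddings(
    rows: Sequence[Sequence[float]], strict: bool = False
) -> List[List[float]]:
    """Hypothetical helper: keep embeddings as long as the longest in the batch."""
    if not rows:
        return []
    max_len = max(len(r) for r in rows)
    if strict and any(len(r) != max_len for r in rows):
        # strict mode: refuse ragged batches instead of silently dropping rows
        raise ValueError("embeddings are not all the same length")
    # default mode: drop rows shorter than the maximum (e.g. a lone [nan])
    return [list(r) for r in rows if len(r) == max_len]


# A batch where one embedding failed and was replaced by a [nan] placeholder.
batch = [[0.1, 0.2, 0.3], [math.nan], [0.4, 0.5, 0.6]]
cleaned = sanitize_embeddings(batch)
```

With this batch, the default mode keeps the two three-element rows, while `strict=True` raises a `ValueError`.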

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-07-04 20:00:46 -07:00
Leon Yee
eb5bcda337 Error implementations (#232)
Solves #216 by adding a check on table open for the existence of the
`.lance` file. The check is skipped for remote connections.
2023-06-27 16:48:31 -07:00
Lei Xu
4bc676e26a [Python] Support replace during create_index (#233)
Closes #214
2023-06-27 16:02:07 -07:00
Philip Kung
313e66c4c5 Specify and Index Column for Vector Search (#217) 2023-06-26 16:11:08 -07:00
Rob Meng
d1e8a97a2a isort entire repo (#200) 2023-06-15 20:12:10 -04:00
Rob Meng
cbb56e25ab port remote connection client into lancedb (#194)
* `to_df()` is now async; added `to_df_blocking` for convenience
* add remote lancedb client to public lancedb
* make lancedb connection class understand url scheme
`lancedb+<connection_type>://<host>:<port>`.
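
As a rough illustration of the URL scheme above, the connection type can be split out with the standard library. This is a sketch under assumptions, not the client code from this commit: `parse_lancedb_url`, `ParsedUri`, and the example `http` connection type and port are all hypothetical.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ParsedUri(NamedTuple):
    connection_type: str
    host: Optional[str]
    port: Optional[int]


def parse_lancedb_url(url: str) -> ParsedUri:
    """Hypothetical parser for lancedb+<connection_type>://<host>:<port>."""
    parsed = urlparse(url)
    # The scheme looks like "lancedb+http"; split it on the first "+".
    prefix, sep, connection_type = parsed.scheme.partition("+")
    if prefix != "lancedb" or not sep or not connection_type:
        raise ValueError(f"not a lancedb+<connection_type> URL: {url!r}")
    return ParsedUri(connection_type, parsed.hostname, parsed.port)


# "http" and the port number are made-up values for illustration.
uri = parse_lancedb_url("lancedb+http://localhost:10024")
```

Any URL whose scheme does not start with `lancedb+` is rejected with a `ValueError`, which lets a caller fall back to treating the string as a local path or object-store URI.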
2023-06-15 18:57:52 -04:00
Utkarsh Gautam
6b5c046c3b [Python] Updated to_df implementation in Contextualizer class (#174)
Changes include:
- Contexts smaller than the window param are now included as well
- Added an optional threshold parameter to `to_df` in Contextualizer

This should close #165.

If maintainers are satisfied with the implementation, I will add more
examples and test cases and update the documentation as well.

---------

Co-authored-by: Nithin PS <47279496+Nithinps021@users.noreply.github.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
2023-06-14 09:22:32 -07:00
Tevin Wang
9b83ce3d2a add black to python CI (#178)
Closes #48
2023-06-12 11:22:34 -07:00
Will Jones
fed33a51d5 wip: make the python API reference a bit nicer (#162)
Changes:

* Make `mkdocstrings` aware that we are using numpy-style docstrings
* Fix broken link on `index.md` to Python API docs (and add a link to
the node ones)
* Add examples to various classes.
* Add doctest to verify the examples work.
2023-06-08 16:07:06 -07:00
Chang She
50cdb16b45 Better handle empty results from tantivy (#155)
Closes #154

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-06-05 18:18:14 -07:00
gsilvestrin
f765a453cf Use fsspec to implement table_names with cloud storage support (#117)
Co-authored-by: Will Jones <willjones127@gmail.com>
2023-06-01 16:56:26 -07:00
Lei Xu
9965b4564d [Python] Support drop table (#123)
Closes #86
2023-06-01 15:58:45 -07:00
Chang She
04d97347d7 move tantivy-py installation to be separate from wheel (#97)
PyPI does not allow uploading packages that have a direct reference.

For now we'll just ask the user to install tantivy separately.

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-05-25 17:57:26 -06:00
Chang She
f485378ea4 Basic full text search capabilities (#62)
This is v1 of integrating full text search index into LanceDB.

# API
The query API is roughly the same as before, except that if the input is
text instead of a vector, we assume it's a full-text search (FTS).

## Example
If `table` is a LanceDB LanceTable, then:

Build index: `table.create_fts_index("text")`

Query: `df = table.search("puppy").limit(10).select(["text"]).to_df()`

# Implementation
Here we use the tantivy-py package to build the index. We use the row
ids as the full-text-search index's doc ids, then do a Take operation
to fetch the rows.
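
The idea of using row ids as doc ids and then doing a Take can be shown with a toy example. The sketch below swaps in a plain in-memory inverted index for tantivy and a Python list for the Lance table, so every name in it is hypothetical and it says nothing about the actual tantivy-py calls.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Stand-in for the table: the list position plays the role of the row id.
rows = [
    {"text": "a puppy plays fetch"},
    {"text": "the cat sleeps"},
    {"text": "puppy training tips"},
]

# Build step: map each token to the row ids (doc ids) that contain it.
index: Dict[str, Set[int]] = defaultdict(set)
for row_id, row in enumerate(rows):
    for token in row["text"].lower().split():
        index[token].add(row_id)


def search(query: str, limit: int = 10) -> List[dict]:
    """Look up matching doc ids, then 'take' the rows with those ids."""
    row_ids = sorted(index.get(query.lower(), set()))[:limit]
    return [rows[i] for i in row_ids]


hits = search("puppy")
```

Because the index only stores row ids, the rows themselves are fetched from the table at query time, which is also why rows appended after the index is built do not show up until it is rebuilt.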

# Limitations

1. Incremental row appends are not supported yet; new data won't show up
in search
2. Local filesystem only
3. Requires building tantivy explicitly

---------

Co-authored-by: Chang She <chang@lancedb.com>
2023-05-24 22:25:31 -06:00
Chang She
59014a01e0 bump version for v0.1.2 2023-05-05 11:27:09 -07:00
Chang She
b6739f3f66 windows paths 2023-05-04 11:41:05 -07:00
Chang She
3a2df0ce45 Add method to get the URI scheme to support cloud storage 2023-05-04 09:47:03 -07:00
Chang She
a8db7f56d2 tolerance 2023-04-25 20:08:18 -07:00
Chang She
89e6232aeb Make distance metric configurable during search 2023-04-24 22:40:40 -07:00
Chang She
159b175316 Merge pull request #34 from lancedb/changhiskhan/overwrite-table
Add mode to overwrite table if already exists
2023-04-19 21:11:56 -07:00
Chang She
99310e099e expose methods to work with versioning in tables 2023-04-19 16:48:06 -07:00
Chang She
d7c5793803 Add mode to overwrite table if already exists 2023-04-19 16:22:11 -07:00
Lei Xu
ec197b1855 Merge pull request #31 from lancedb/lei/doc
[Doc] Pandas, PyArrow, DuckDB integration
2023-04-19 14:55:42 -07:00
Lei Xu
c38d80cab2 remove print 2023-04-19 14:26:07 -07:00
Lei Xu
b3fdabdf45 use python and arrow 2023-04-19 14:15:18 -07:00
Chang She
f0ea1d898b invalidate cached dataset after create_index and add 2023-04-18 16:51:26 -07:00
gsilvestrin
6865d66d37 renaming test case 2023-04-14 16:32:31 -07:00
gsilvestrin
aeecd809cc bugfix for LanceTable.add to convert python lists into arrow fixed size lists
- Fixed `add` unit test to create the correct expected result
- Added a unit test for LanceTable.add
- Need to discuss if len(LanceTable) is handled correctly
2023-04-14 14:13:01 -07:00
Chang She
eba533da4f fix 3.11 2023-03-24 19:45:46 -07:00
Chang She
5d7832c8a5 update for release 2023-03-24 18:16:29 -07:00
Chang She
5ef5141812 black 2023-03-22 18:29:07 -07:00
Chang She
690141d357 add unit tests 2023-03-21 22:29:19 -07:00
Chang She
b10301f5d6 initial python impl 2023-03-18 10:43:26 -07:00