It's inconvenient to always require data at table creation time.
Here we enable you to create an empty table and add data and set schema
later.
---------
Co-authored-by: Chang She <chang@lancedb.com>
Sometimes LangChain would insert a single `[np.nan]` as a placeholder if
the embedding function failed. This causes a problem for Lance format
because then the array can't be stored as a FixedSizedListArray.
Instead:
1. By default we remove rows with embedding lengths less than the
maximum length in the batch
2. If `strict=True` kwargs is set to True, then a `ValueError` is raised
if the embeddings aren't all the same length
---------
Co-authored-by: Chang She <chang@lancedb.com>
* to_df() is now async, added `to_df_blocking` to convenience
* add remote lancedb client to public lancedb
* make lancedb connection class understand url scheme
`lancedb+<connection_type>://<host>:<port>`.
pypi does not allow packages to be uploaded that has a direct reference
for now we'll just ask the user to install tantivy separately
---------
Co-authored-by: Chang She <chang@lancedb.com>
- Fixed `add` unit test to create the correct expected result
- Added a unit test for LanceTable.add
- Need to discuss if len(LanceTable) is handled correctly