Handle NaN input data (#241)

LangChain sometimes inserts a single `[np.nan]` as a placeholder when the
embedding function fails. This is a problem for the Lance format because
the vectors then differ in length and can't be stored as a FixedSizeListArray.

Instead:
1. By default, rows whose embedding is shorter than the longest embedding
in the batch are removed
2. If the `strict=True` kwarg is passed, a `ValueError` is raised
if the embeddings aren't all the same length
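
The two behaviors above can be sketched roughly as follows. This is a minimal illustration only, not the code from this commit: the helper name `drop_short_embeddings` and its signature are hypothetical, and it works on plain Python lists rather than Arrow data.

```python
import math


def drop_short_embeddings(vectors, strict=False):
    """Hypothetical helper illustrating the behavior described above.

    vectors: a list of embeddings, each a list of floats.
    strict:  if True, raise instead of silently dropping rows.
    """
    lengths = [len(v) for v in vectors]
    if not lengths:
        return []
    max_len = max(lengths)
    if strict and any(n != max_len for n in lengths):
        raise ValueError(
            f"Embeddings have mixed lengths {sorted(set(lengths))}; "
            f"expected all to be {max_len}"
        )
    # Default mode: keep only rows as long as the longest embedding,
    # which discards placeholders such as [nan].
    return [v for v, n in zip(vectors, lengths) if n == max_len]


# The middle row mimics a failed embedding call returning a [nan] placeholder.
batch = [[0.1, 0.2, 0.3], [math.nan], [0.4, 0.5, 0.6]]
cleaned = drop_short_embeddings(batch)
```

With `strict=False` the placeholder row is dropped and the remaining rows share one length, so they could be stored as a fixed-size list; with `strict=True` the same batch raises a `ValueError`.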

---------

Co-authored-by: Chang She <chang@lancedb.com>
Chang She
2023-07-04 20:00:46 -07:00
committed by GitHub
parent 9600a38ff0
commit 3c46d7f268
7 changed files with 273 additions and 24 deletions


@@ -30,7 +30,7 @@ class MockLanceDBServer:
table_name = request.match_info["table_name"]
assert table_name == "test_table"
-request_json = await request.json()
+await request.json()
# TODO: do some matching
vecs = pd.Series([np.random.rand(128) for x in range(10)], name="vector")