Handle NaN input data (#241)

LangChain sometimes inserts a single `[np.nan]` as a placeholder when the
embedding function fails. This is a problem for the Lance format because
the resulting ragged column can't be stored as a `FixedSizeListArray`.

Instead:
1. By default, rows whose embedding is shorter than the longest embedding
in the batch are removed
2. If the `strict=True` kwarg is set, a `ValueError` is raised when the
embeddings aren't all the same length
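
The two behaviours above can be sketched roughly as follows. This is an
illustration only, not the code added in this commit: the helper name
`drop_short_embeddings` and its signature are hypothetical, and plain
Python lists stand in for the real table data.

```python
import math


def drop_short_embeddings(vectors, strict=False):
    """Hypothetical helper illustrating the behaviour described above."""
    lengths = [len(v) for v in vectors]
    if not lengths:
        return []
    max_len = max(lengths)
    if strict and any(n != max_len for n in lengths):
        # strict mode: refuse ragged input instead of silently dropping rows
        raise ValueError("embeddings are not all the same length")
    # default mode: keep only rows whose embedding has the maximum length
    return [v for v, n in zip(vectors, lengths) if n == max_len]


# A failed embedding shows up as a one-element [nan] placeholder.
vectors = [[0.1, 0.2, 0.3], [math.nan], [0.4, 0.5, 0.6]]
kept = drop_short_embeddings(vectors)
```

With the default `strict=False`, the placeholder row is dropped and the
two full-length rows remain; passing `strict=True` on the same input
raises `ValueError` instead.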

---------

Co-authored-by: Chang She <chang@lancedb.com>
Chang She
2023-07-04 20:00:46 -07:00
committed by GitHub
parent 9600a38ff0
commit 3c46d7f268
7 changed files with 273 additions and 24 deletions


@@ -15,7 +15,6 @@ import unittest.mock as mock
import lance
import numpy as np
import pandas as pd
import pandas.testing as tm
import pyarrow as pa
import pytest