lancedb/python
Eric B e6661a7285 fix: handle empty/wrong-length vectors returned by embedding functions (#3192)
## Summary

- When an embedding function returns an empty list (e.g. `[]`) for an
input row — as can happen when a model produces no output for a blank
string — `_append_vector_columns` crashed with `ArrowInvalid: Length of
item not correct: expected N but got array of size 0` because PyArrow
cannot fit a zero-length value into a fixed-size list element.
- The fix adds a validation step in `gen()`, inside
`_append_vector_columns`, that replaces any vector whose length does not
match the expected `ndims` (including empty lists and `None`) with
`None` before `pa.array()` is called.
- `None` is a valid null in a PyArrow fixed-size list array, so the bad
entry flows into `_handle_bad_vectors` and is handled according to the
caller-supplied `on_bad_vectors` policy (`error` / `drop` / `fill` /
`null`) instead of causing an unconditional crash.
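
The validation step described above can be illustrated in plain Python. This is only a minimal sketch, not the code from the commit: `normalize_vectors` is a hypothetical helper name, and the real check lives inside `gen()` in `_append_vector_columns`. It assumes each vector is either `None` or a sized sequence of floats.

```python
from typing import List, Optional, Sequence


def normalize_vectors(
    vectors: Sequence[Optional[Sequence[float]]], ndims: int
) -> List[Optional[List[float]]]:
    """Replace any vector whose length is not `ndims` with None.

    Hypothetical helper for illustration only; it mirrors the idea of the
    fix (bad vectors become nulls) but is not LanceDB's implementation.
    """
    cleaned: List[Optional[List[float]]] = []
    for vec in vectors:
        if vec is None or len(vec) != ndims:
            # Empty, missing, or wrong-length output: emit a null so a
            # fixed-size list array can still be built from the result.
            cleaned.append(None)
        else:
            cleaned.append(list(vec))
    return cleaned


raw = [[1.0, 2.0, 3.0, 4.0], [], None, [1.0, 2.0]]
print(normalize_vectors(raw, ndims=4))
# → [[1.0, 2.0, 3.0, 4.0], None, None, None]
```

With nulls in place of malformed entries, the decision about what to do with them (raise, drop, fill, or keep as null) can be left to a later policy step rather than failing during array construction.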

## Test plan

- [ ] Added `test_embedding_with_empty_output_vectors` in
`python/python/tests/test_embeddings.py` that uses an embedding function
returning `[]` for empty-string inputs, calls `table.add(...,
on_bad_vectors="drop")`, and asserts no crash and that bad rows are
correctly dropped.
- [ ] Existing `test_embedding_with_bad_results` continues to pass (NaN
vectors still handled correctly).
- [ ] Verified manually that `pa.array([[1.,2.,3.,4.], []],
type=pa.list_(pa.float32(), 4))` raises `ArrowInvalid` without the fix,
and succeeds with `None` in place of `[]`.

Fixes #1672

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-07-02 13:31:16 -07:00

LanceDB Python SDK

A Python library for LanceDB.

Installation

pip install lancedb

Preview Releases

Stable releases are published roughly every two weeks. To get the latest features and bug fixes sooner, you can install a preview release. Preview releases receive the same level of testing as stable releases, but they are not guaranteed to remain available for more than 6 months after release. Once your application is stable, we recommend switching to stable releases.

pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb

Usage

Basic Example

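import lancedb

# Connect to the database at the given path or URI (replace the placeholder)
db = lancedb.connect('<PATH_TO_LANCEDB_DATASET>')
# Open an existing table by name
table = db.open_table('my_table')
# Search for the 20 nearest neighbors of the query vector
results = table.search([0.1, 0.3]).limit(20).to_list()
print(results)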

Development

See CONTRIBUTING.md for information on how to contribute to LanceDB.