fix: handle empty list with schema in table creation (#2548)

## Summary
Fixes an `IndexError` raised when creating a table from an empty list
with a provided schema. Previously, `_into_pyarrow_reader()` tried to
access `data[0]` even when the list was empty. It now uses the provided
schema when the list is empty.
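
The guard described above can be sketched roughly as follows. This is a simplified, dependency-free illustration and not the actual `_into_pyarrow_reader()` code: the `resolve_schema` helper and the dict-based `Schema` alias are hypothetical stand-ins for the real function and for `pa.Schema`.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for pa.Schema: maps a column name to a Python type.
Schema = Dict[str, type]


def resolve_schema(
    data: List[Dict[str, Any]], schema: Optional[Schema] = None
) -> Schema:
    """Pick the schema for `data` without ever indexing into an empty list."""
    if not data:
        # Empty input: there is no first row to inspect, so rely on the
        # caller-provided schema instead of touching data[0].
        if schema is None:
            raise ValueError("cannot infer a schema from an empty list")
        return schema
    if schema is not None:
        # A provided schema takes precedence over inference.
        return schema
    # Non-empty input without a schema: infer column types from the first row.
    return {name: type(value) for name, value in data[0].items()}


# The empty list + schema case that previously failed is now handled.
print(resolve_schema([], {"vector": list, "id": int}))
```

The point of the sketch is the ordering of the checks: the emptiness test comes before any access to `data[0]`, so the provided schema is the only source of column information when there are no rows.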

Also adds regression tests for GitHub issues #1968 and #303, both of
which involve empty tables.

## Changes
- Fix `IndexError` in `_into_pyarrow_reader()` for the empty list + schema case
- Add an `Optional[pa.Schema]` parameter to handle empty data gracefully
- Add `test_create_table_empty_list_with_schema` for the IndexError fix
- Add `test_create_empty_then_add_data` for issue #1968
- Add `test_search_empty_table` for issue #303

## Test plan
- [x] All new regression tests pass
- [x] Existing tests continue to pass
- [x] Code formatted with `make format`
Tristan Zajonc
2025-07-24 19:23:43 -07:00
committed by GitHub
parent 050f0086b8
commit 055bf91d3e
3 changed files with 69 additions and 2 deletions

@@ -1339,3 +1339,20 @@ async def test_query_timeout_async(tmp_path):
            .nearest_to([0.0, 0.0])
            .to_list(timeout=timedelta(0))
        )
def test_search_empty_table(mem_db):
    """Test searching on empty table should not crash
    Regression test for issue #303:
    https://github.com/lancedb/lancedb/issues/303
    Searching on empty table produces scary error message
    """
    schema = pa.schema(
        [pa.field("vector", pa.list_(pa.float32(), 2)), pa.field("id", pa.int64())]
    )
    table = mem_db.create_table("test_empty_search", schema=schema)
    # Search on empty table should return empty results, not crash
    results = table.search([1.0, 2.0]).limit(5).to_list()
    assert results == []