fix: include _rowid in hash and calculated split projections (#2965)

## Summary

- PR #2957 changed the permutation builder to only select `_rowid` from the base table, but `Splitter::project()` for hash and calculated splits replaced the selection entirely, dropping `_rowid`.
- Include `_rowid` in the column selections for hash and calculated split projections.
- Fix a Python test that queried the permutation table for base table columns no longer materialized.

Fixes the `test_split_hash`, `test_split_hash_with_discard`, `test_split_calculated`, `test_shuffle_combined_with_splits`, and `test_filter_with_splits` failures in `test_permutation.py`.

## Test plan

- [x] `cargo test -p lancedb -- permutation` (22 passed)
- [x] `pytest python/tests/test_permutation.py` (46 passed)
- [x] `npm test __test__/permutation.test.ts` (20 passed)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
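
The failure mode described in the summary can be illustrated with a small standalone sketch. This is not the Rust `Splitter::project()` code; the function names and the `category` column are hypothetical and exist only to show how a projection that replaces the selection loses `_rowid`, while one that always carries `_rowid` along does not.

```python
ROW_ID = "_rowid"


def project_replacing(selection, split_columns):
    """Buggy pattern (hypothetical): the split projection overwrites the
    existing selection, so a previously selected _rowid is lost."""
    return list(split_columns)


def project_keeping_rowid(selection, split_columns):
    """Fixed pattern (hypothetical): the split projection always includes
    _rowid alongside the columns the split itself needs."""
    columns = [ROW_ID]
    columns += [c for c in split_columns if c != ROW_ID]
    return columns


# The builder selects only _rowid from the base table.
base_selection = [ROW_ID]
# Assumed example: a hash split computed over a "category" column.
hash_split_columns = ["category"]

print(project_replacing(base_selection, hash_split_columns))
print(project_keeping_rowid(base_selection, hash_split_columns))
```

With the first function the output no longer contains `_rowid`, which matches the dropped-column behaviour the summary describes; the second keeps it regardless of which columns the split needs.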
2026-05-24 07:20:40 +00:00 · 2026-02-02 16:27:58 -08:00
parent 3c7ddf4d0c
commit 131024839f
2 changed files with 17 additions and 7 deletions
--- a/python/python/tests/test_permutation.py
+++ b/python/python/tests/test_permutation.py
@@ -438,11 +438,15 @@ def test_filter_with_splits(mem_db):
    row_count = permutation_tbl.count_rows()
    assert row_count == 67

-    data = permutation_tbl.search(None).to_arrow().to_pydict()
+    # Verify the permutation table only contains row_id and split_id
+    assert set(permutation_tbl.schema.names) == {"row_id", "split_id"}
+
+    row_ids = permutation_tbl.search(None).to_arrow().to_pydict()["row_id"]
+    data = tbl.take_row_ids(row_ids).to_arrow().to_pydict()
    categories = data["category"]

    # All categories should be A or B
-    assert all(cat in ["A", "B"] for cat in categories)
+    assert all(cat in ("A", "B") for cat in categories)


 def test_filter_with_shuffle(mem_db):