mirror of
https://github.com/lancedb/lancedb.git
synced 2025-12-27 07:09:57 +00:00
feat: add table stats API (#2363)
* Add a new "table stats" API to expose basic table and fragment statistics with local and remote table implementations ### Questions * This is using `calculate_data_stats` to determine total bytes in the table. This seems like a potentially expensive operation - are there any concerns about performance for large datasets? ### Notes * bytes_on_disk seems to be stored at the column level but there does not seem to be a way to easily calculate total bytes per fragment. This may need to be added in lance before we can support fragment size (bytes) statistics. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Added a method to retrieve comprehensive table statistics, including total rows, index counts, storage size, and detailed fragment size metrics such as minimum, maximum, mean, and percentiles. - Enabled fetching of table statistics from remote sources through asynchronous requests. - Extended table interfaces across Python, Rust, and Node.js to support synchronous and asynchronous retrieval of table statistics. - **Tests** - Introduced tests to verify the accuracy of the new table statistics feature for both populated and empty tables. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This commit is contained in:
@@ -1695,3 +1695,31 @@ def test_replace_field_metadata(tmp_path):
|
||||
schema = table.schema
|
||||
field = schema[0].metadata
|
||||
assert field == {b"foo": b"bar"}
|
||||
|
||||
|
||||
def test_stats(mem_db: DBConnection):
|
||||
table = mem_db.create_table(
|
||||
"my_table",
|
||||
data=[{"text": "foo", "id": 0}, {"text": "bar", "id": 1}],
|
||||
)
|
||||
assert len(table) == 2
|
||||
stats = table.stats()
|
||||
print(f"{stats=}")
|
||||
assert stats == {
|
||||
"total_bytes": 38,
|
||||
"num_rows": 2,
|
||||
"num_indices": 0,
|
||||
"fragment_stats": {
|
||||
"num_fragments": 1,
|
||||
"num_small_fragments": 1,
|
||||
"lengths": {
|
||||
"min": 2,
|
||||
"max": 2,
|
||||
"mean": 2,
|
||||
"p25": 2,
|
||||
"p50": 2,
|
||||
"p75": 2,
|
||||
"p99": 2,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user