feat: add PruneReader for optimized row filtering (#4370)

* Add PruneReader for optimized row filtering and error handling

 - Introduced `PruneReader` to replace `RowGroupReader` for optimized row filtering.

* Make ReaderMetrics fields public for external access

* Add row selection support to SeqScan and FileRange readers

 - Updated `SeqScan::build_part_sources` to accept an optional `TimeSeriesRowSelector`.

* Refactor `scan_region.rs` to remove unnecessary cloning of `series_row_selector`. Enhance `file_range.rs` by adding a `select_all` method that checks whether all rows in a row group are selected, and update the `reader` method to use `LastRowReader` only when all rows are selected and no DELETE operations are present.

* Enhance PruneReader and ParquetReader with reset functionality and metrics handling

 - Made `Source` enum public in `prune.rs`.

* chore: Update src/mito2/src/sst/parquet/reader.rs
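The reader-selection rule described in the message (use `LastRowReader` only when all rows of a row group are selected and no DELETE operations are present) can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `ReaderKind`, `choose_reader`, and the standalone `select_all` signature are hypothetical names, and the real `select_all` is a method in `file_range.rs` whose signature is not shown here.

```rust
/// Hypothetical stand-in for the two reader kinds named in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum ReaderKind {
    /// Fast path: keeps only the last row of each time-series.
    LastRow,
    /// General path: filters rows one by one.
    Prune,
}

/// Assumed shape of the "all rows selected" check.
/// `selected_rows` is `None` when no row selection was applied to the row group.
pub fn select_all(selected_rows: Option<usize>, total_rows: usize) -> bool {
    match selected_rows {
        None => true,
        Some(n) => n == total_rows,
    }
}

/// Picks a reader for one row group, following the rule stated in the message.
pub fn choose_reader(last_row_requested: bool, all_selected: bool, has_delete: bool) -> ReaderKind {
    if last_row_requested && all_selected && !has_delete {
        ReaderKind::LastRow
    } else {
        // Any partial selection or possible DELETE falls back to the general reader.
        ReaderKind::Prune
    }
}
```

The fallback matters because a last-row shortcut could return a row that a later filter or a DELETE marker would have removed.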

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
Lei, HUANG
2024-07-15 22:23:34 +08:00
committed by GitHub
parent 2e7b12c344
commit 9fbc4ba649
9 changed files with 282 additions and 84 deletions


@@ -17,7 +17,7 @@ use datafusion_expr::expr::Expr;
 use strum::Display;
 /// A hint on how to select rows from a time-series.
-#[derive(Debug, Clone, PartialEq, Eq, Hash, Display)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Display)]
 pub enum TimeSeriesRowSelector {
     /// Only keep the last row of each time-series.
     LastRow,
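Adding `Copy` to the derive list is what allows callers to drop explicit `.clone()` calls on the selector. A small sketch of the effect, assuming a simplified copy of the enum (the `strum::Display` derive is omitted to avoid an external crate) and a hypothetical `describe` function that takes the selector by value:

```rust
/// Simplified stand-in for the enum in the hunk above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum TimeSeriesRowSelector {
    /// Only keep the last row of each time-series.
    LastRow,
}

/// Hypothetical consumer that takes the optional selector by value.
pub fn describe(selector: Option<TimeSeriesRowSelector>) -> &'static str {
    match selector {
        Some(TimeSeriesRowSelector::LastRow) => "last row per series",
        None => "all rows",
    }
}

/// Passes the same selector by value twice. This compiles without `.clone()`
/// because `Option<T>` is `Copy` whenever `T` is `Copy`.
pub fn describe_twice(selector: Option<TimeSeriesRowSelector>) -> (&'static str, &'static str) {
    (describe(selector), describe(selector))
}
```

With only `Clone` derived, the same by-value reuse would still compile for this fieldless enum only if each call site cloned explicitly, which is the kind of boilerplate the change removes.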