* Add PruneReader for optimized row filtering and error handling
- Introduced `PruneReader` to replace `RowGroupReader` for optimized row filtering.
* Commit Message:
Make ReaderMetrics fields public for external access
* Add row selection support to SeqScan and FileRange readers
- Updated `SeqScan::build_part_sources` to accept an optional `TimeSeriesRowSelector`.
* Refactor `scan_region.rs` to remove unnecessary cloning of `series_row_selector`. Enhance `file_range.rs` by adding `select_all` method to check if all rows in a row group are selected, and update the logic in `reader` method to use `LastRowReader` only when all rows are
selected and no DELETE operations are present.
* Commit Message:
Enhance PruneReader and ParquetReader with reset functionality and metrics handling
Summary:
• Made Source enum public in prune.rs.
* chore: Update src/mito2/src/sst/parquet/reader.rs
---------
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat: support text/plain format of log input
* refactor: pipeline query and delete using dataframe api
* chore: minor refactor
* refactor: skip jsonify when processing plan/text
* refactor: support array(string) as pipeline engine input
* feat/copy-to-parquet-parameter: Commit Message:
Enhance Parquet Writer with Column-wise Configuration
Summary:
• Introduced column_wise_config function to customize per-column properties in Parquet writer.
* feat/copy-to-parquet-parameter: Commit Message:
Enhance Parquet File Format Handling for Specific Data Types
Summary:
• Added ConcreteDataType import to support specific data type handling.
* feat/copy-to-parquet-parameter: Commit Message:
Refactor Parquet file format configuration
* feat/copy-to-parquet-parameter:
Enhance Parquet file format handling for timestamp columns
- Added logic to disable dictionary encoding and set DELTA_BINARY_PACKED encoding for timestamp columns in the Parquet file format configuration.
* feat/copy-to-parquet-parameter:
Disable dictionary encoding for timestamp columns in Parquet writer and update default max_active_window_runs in TwcsOptions
- Modified Parquet writer to disable dictionary encoding for timestamp columns to optimize for increasing timestamp data.
* feat/copy-to-parquet-parameter:
Update compaction settings in tests
- Modified `test_compaction_region` to include new compaction options: `compaction.type`,
`compaction.twcs.max_active_window_runs`, and `compaction.twcs.max_inactive_window_runs`.
- Updated `test_merge_mode_compaction` to use `compaction.twcs.max_active_window_runs` and
`compaction.twcs.max_inactive_window_runs` instead of `max_active_window_files` and
`max_inactive_window_files`.
* feat: use `Inserter` as Frontend
* fix: enable procedure in flownode
* docs: remove `frontend_addr` opts
* chore: rm fe addr in test runner
* refactor: int test also use inserter invoker
* feat: flow shutdown&refactor: remove `Frontendinvoker`
* refactor: rename `RemoteFrontendInvoker` to `FrontendInvoker`
* refactor: per review
* refactor: remove a layer of box
* fix: standalone use `node_manager`
* fix: remove a `Arc` cycle
* feat/inverted-index-cache:
Update dependencies and add caching for inverted index reader
- Updated `atomic` to 0.6.0 and `uuid` to 1.9.1 in `Cargo.lock`.
- Added `moka` and `uuid` dependencies in `Cargo.toml`.
- Introduced `seek_read` method in `InvertedIndexBlobReader` for common seek and read operations.
- Added `cache.rs` module to implement caching for inverted index reader using `moka`.
- Updated `async-compression` to 0.4.11 in `puffin/Cargo.toml`.
* feat/inverted-index-cache:
Refactor InvertedIndexReader and Add Index Cache Support
- Refactored `InvertedIndexReader` to include `seek_read` method and default implementations for `fst` and `bitmap`.
- Implemented `seek_read` in `InvertedIndexBlobReader` and `CachedInvertedIndexBlobReader`.
- Introduced `InvertedIndexCache` in `CacheManager` and `SstIndexApplier`.
- Updated `SstIndexApplierBuilder` to accept and utilize `InvertedIndexCache`.
- Added `From<FileId> for Uuid` implementation.
* feat/inverted-index-cache:
Update Cargo.toml and refactor SstIndexApplier
- Moved `uuid.workspace` entry in Cargo.toml for better organization.
* feat/inverted-index-cache:
Refactor InvertedIndexCache to use type alias for Arc
- Replaced `Arc<InvertedIndexCache>` with `InvertedIndexCacheRef` type alias.
* feat/inverted-index-cache:
Add Prometheus metrics and caching improvements for inverted index
- Introduced `prometheus` and `puffin` dependencies for metrics.
* feat/inverted-index-cache:
Refactor InvertedIndexReader and Cache handling
- Simplified `InvertedIndexReader` trait by removing seek-related comments.
* feat/inverted-index-cache:
Add configurable cache sizes for inverted index metadata and content
- Introduced `index_metadata_size` and `index_content_size` in `CacheManagerBuilder`.
* feat/inverted-index-cache:
Refactor and optimize inverted index caching
- Removed `metrics.rs` and integrated cache metrics into `index.rs`.
* feat/inverted-index-cache:
Remove unused dependencies from Cargo.lock and Cargo.toml
- Removed `moka`, `prometheus`, and `puffin` dependencies from both Cargo.lock and Cargo.toml.
* feat/inverted-index-cache:
Replace Uuid with FileId in CachedInvertedIndexBlobReader
- Updated `file_id` type from `Uuid` to `FileId` in `CachedInvertedIndexBlobReader` and related methods.
* feat/inverted-index-cache:
Refactor cache configuration for inverted index
- Moved `inverted_index_metadata_cache_size` and `inverted_index_cache_size` from `MitoConfig` to `InvertedIndexConfig`.
* feat/inverted-index-cache:
Remove unnecessary conversion of `file_id` in `SstIndexApplier`
- Simplified the initialization of `CachedInvertedIndexBlobReader` by removing the redundant `into()` conversion for `file_id`.