mirror of
https://github.com/GreptimeTeam/greptimedb.git
synced 2025-12-26 08:00:01 +00:00
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name. - **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`. - **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints. **Refactor Parquet Writer Initialization and File Handling** - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management. - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability. - Enhanced error handling and logging with `debug` statements for better traceability during file operations. - **Removed Output Size Enforcement in `twcs.rs`:** - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling. - **Added Max File Size Option in `parquet.rs`:** - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files. - **Refactored Indexer Management in `parquet/writer.rs`:** - Changed `current_indexer` from an `Option` to a direct `Indexer` type. - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`. - Simplified indexer initialization and management logic. - **Refactored SST File Handling**: - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths. - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management. - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths. - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`. - **Enhanced Indexer Management**: - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation. - Updated `ParquetWriter` to handle multiple indexers and file IDs. - Files affected: `index.rs`, `parquet.rs`, `writer.rs`. - **Removed Redundant File ID Handling**: - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`. - Updated related logic to dynamically generate file IDs where necessary. - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`. - **Test Adjustments**: - Updated tests to align with new path and indexer management. - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes. - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`. * chore: rebase main * feat/multiple-compaction-output: ### Add Benchmarking and Refactor Compaction Logic - **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions. - **Compaction Module Enhancements**: - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public. - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions. - Introduced `find_overlapping_items` for identifying overlapping items. - **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Add Benchmarks - **Compaction Logic Improvements**: - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty. - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency. - **Benchmark Enhancements**: - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function. - Extended existing benchmarks to include larger data sizes. - **Testing Enhancements**: - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`. - **Logging and Debugging**: - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions. * feat/multiple-compaction-output: ### Refactor and Enhance Compaction Logic - **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`. - **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`. - **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`. - **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files. - **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`. - **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Testing - **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated benchmarks in `run_bench.rs` to include `bench_merge_seq_files`. - **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap. - **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies. - **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed. * feat/multiple-compaction-output: ### Refactor TWCS Compaction Options - **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`. - **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`. - **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`. - **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files. This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase. * chore: add trailing space * fix license header * feat/revise-compaction-picker: **Limit File Processing and Optimize Merge Logic in `run.rs`** - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity. - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files. - Updated scoring calculations to use the limited file set, ensuring efficient file merging. * feat/revise-compaction-picker: ### Add Compaction Metrics and Remove Debug Logging - **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process. - **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`. * feat/revise-compaction-picker: ## Enhance Compaction Logic and Metrics - **Compaction Logic Improvements**: - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations. - Updated `Compactor` implementation to use these methods for metrics tracking. - Modified `Ranged` trait logic in `run.rs` to improve range comparison. - Enhanced test cases in `run.rs` to reflect changes in compaction logic. - **Metrics Enhancements**: - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking. - **Debugging and Logging**: - Added detailed logging for compaction pick results in `twcs.rs`. - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output. - **Testing Enhancements**: - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files. - Updated `merge_mode_test.rs` to reflect changes in file handling during scans. * feat/revise-compaction-picker: ### Update `FileHandle` Debug Implementation - **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`. - **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`. * Trigger CI * feat/revise-compaction-picker: **Update compaction logic and default options** - **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability. - **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance. * feat/revise-compaction-picker: Refactor `find_overlapping_items` to use an external result vector - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency. - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature. - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach. * feat/revise-compaction-picker: Improve file merging logic in `run.rs` - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups. - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection. - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences. * feat/revise-compaction-picker: Enhance `find_overlapping_items` with `SortedRun` and Update Tests - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity. - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations. - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`. - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors. * feat/revise-compaction-picker: **Enhancements in `compaction/run.rs`:** - Added `ReadableSize` import to handle size calculations. - Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided. * feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
58 lines
1.9 KiB
SQL
58 lines
1.9 KiB
SQL
CREATE TABLE integers(i BIGINT, ts TIMESTAMP TIME INDEX);
|
|
|
|
-- SQLNESS REPLACE (-+) -
|
|
-- SQLNESS REPLACE (\s\s+) _
|
|
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
|
|
-- SQLNESS REPLACE (Hash.*) REDACTED
|
|
-- SQLNESS REPLACE (peers.*) REDACTED
|
|
EXPLAIN SELECT DISTINCT i%2 FROM integers ORDER BY 1;
|
|
|
|
DROP TABLE integers;
|
|
|
|
|
|
CREATE TABLE test (a INTEGER, b INTEGER, t TIMESTAMP TIME INDEX);
|
|
|
|
-- SQLNESS REPLACE (-+) -
|
|
-- SQLNESS REPLACE (\s\s+) _
|
|
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
|
|
-- SQLNESS REPLACE (Hash.*) REDACTED
|
|
-- SQLNESS REPLACE (peers.*) REDACTED
|
|
EXPLAIN SELECT a, b FROM test ORDER BY a, b;
|
|
|
|
-- SQLNESS REPLACE (-+) -
|
|
-- SQLNESS REPLACE (\s\s+) _
|
|
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
|
|
-- SQLNESS REPLACE (Hash.*) REDACTED
|
|
-- SQLNESS REPLACE (peers.*) REDACTED
|
|
EXPLAIN SELECT DISTINCT a, b FROM test ORDER BY a, b;
|
|
|
|
DROP TABLE test;
|
|
|
|
CREATE TABLE test_pk(pk INTEGER PRIMARY KEY, i INTEGER, t TIMESTAMP TIME INDEX) WITH('compaction.type'='twcs', 'compaction.twcs.trigger_file_num'='4');
|
|
|
|
INSERT INTO test_pk VALUES (1, 1, 1), (2, NULL, 2), (3, 1, 3), (4, 2, 4), (5, 2, 5), (6, NULL, 6);
|
|
|
|
-- Test aliasing.
|
|
SELECT i, t AS alias_ts FROM test_pk ORDER BY t DESC LIMIT 5;
|
|
|
|
-- Test aliasing.
|
|
SELECT i, t AS alias_ts FROM test_pk ORDER BY alias_ts DESC LIMIT 5;
|
|
|
|
-- SQLNESS REPLACE (-+) -
|
|
-- SQLNESS REPLACE (\s\s+) _
|
|
-- SQLNESS REPLACE (peers.*) REDACTED
|
|
-- SQLNESS REPLACE (metrics.*) REDACTED
|
|
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
|
|
-- SQLNESS REPLACE num_ranges=\d+ num_ranges=REDACTED
|
|
EXPLAIN ANALYZE SELECT i, t AS alias_ts FROM test_pk ORDER BY t DESC LIMIT 5;
|
|
|
|
-- SQLNESS REPLACE (-+) -
|
|
-- SQLNESS REPLACE (\s\s+) _
|
|
-- SQLNESS REPLACE (peers.*) REDACTED
|
|
-- SQLNESS REPLACE (metrics.*) REDACTED
|
|
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
|
|
-- SQLNESS REPLACE num_ranges=\d+ num_ranges=REDACTED
|
|
EXPLAIN ANALYZE SELECT i, t AS alias_ts FROM test_pk ORDER BY alias_ts DESC LIMIT 5;
|
|
|
|
DROP TABLE test_pk;
|