mirror of
https://github.com/GreptimeTeam/greptimedb.git
synced 2026-01-14 09:12:57 +00:00
2dfcf35fee21b1b3bf96d937d5bcff3e50325708
7 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
4b71e493f7 |
feat!: revise compaction picker (#6121)
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name. - **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`. - **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints. **Refactor Parquet Writer Initialization and File Handling** - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management. - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability. - Enhanced error handling and logging with `debug` statements for better traceability during file operations. - **Removed Output Size Enforcement in `twcs.rs`:** - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling. - **Added Max File Size Option in `parquet.rs`:** - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files. - **Refactored Indexer Management in `parquet/writer.rs`:** - Changed `current_indexer` from an `Option` to a direct `Indexer` type. - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`. - Simplified indexer initialization and management logic. - **Refactored SST File Handling**: - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths. - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management. 
- Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths. - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`. - **Enhanced Indexer Management**: - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation. - Updated `ParquetWriter` to handle multiple indexers and file IDs. - Files affected: `index.rs`, `parquet.rs`, `writer.rs`. - **Removed Redundant File ID Handling**: - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`. - Updated related logic to dynamically generate file IDs where necessary. - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`. - **Test Adjustments**: - Updated tests to align with new path and indexer management. - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes. - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`. * chore: rebase main * feat/multiple-compaction-output: ### Add Benchmarking and Refactor Compaction Logic - **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions. - **Compaction Module Enhancements**: - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public. - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions. - Introduced `find_overlapping_items` for identifying overlapping items. - **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Add Benchmarks - **Compaction Logic Improvements**: - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty. 
- Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency. - **Benchmark Enhancements**: - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function. - Extended existing benchmarks to include larger data sizes. - **Testing Enhancements**: - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`. - **Logging and Debugging**: - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions. * feat/multiple-compaction-output: ### Refactor and Enhance Compaction Logic - **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`. - **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`. - **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`. - **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files. - **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`. - **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Testing - **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated benchmarks in `run_bench.rs` to include `bench_merge_seq_files`. - **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap. 
- **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies. - **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed. * feat/multiple-compaction-output: ### Refactor TWCS Compaction Options - **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`. - **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`. - **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`. - **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files. This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase. * chore: add trailing space * fix license header * feat/revise-compaction-picker: **Limit File Processing and Optimize Merge Logic in `run.rs`** - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity. - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files. - Updated scoring calculations to use the limited file set, ensuring efficient file merging. * feat/revise-compaction-picker: ### Add Compaction Metrics and Remove Debug Logging - **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. 
Updated `compactor.rs` to observe these metrics during the compaction process. - **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`. * feat/revise-compaction-picker: ## Enhance Compaction Logic and Metrics - **Compaction Logic Improvements**: - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations. - Updated `Compactor` implementation to use these methods for metrics tracking. - Modified `Ranged` trait logic in `run.rs` to improve range comparison. - Enhanced test cases in `run.rs` to reflect changes in compaction logic. - **Metrics Enhancements**: - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking. - **Debugging and Logging**: - Added detailed logging for compaction pick results in `twcs.rs`. - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output. - **Testing Enhancements**: - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files. - Updated `merge_mode_test.rs` to reflect changes in file handling during scans. * feat/revise-compaction-picker: ### Update `FileHandle` Debug Implementation - **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`. - **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`. * Trigger CI * feat/revise-compaction-picker: **Update compaction logic and default options** - **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability. - **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance. 
* feat/revise-compaction-picker: Refactor `find_overlapping_items` to use an external result vector - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency. - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature. - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach. * feat/revise-compaction-picker: Improve file merging logic in `run.rs` - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups. - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection. - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences. * feat/revise-compaction-picker: Enhance `find_overlapping_items` with `SortedRun` and Update Tests - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity. - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations. - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`. - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors. * feat/revise-compaction-picker: **Enhancements in `compaction/run.rs`:** - Added `ReadableSize` import to handle size calculations. - Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided. * feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. 
Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction. |
||
|
|
35b635f639 |
feat!: Bump datafusion, prost, hyper, tonic, tower, axum (#5417)
* change dep Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * feat: adapt to arrow's interval array * chore: fix compile errors in datatypes crate * chore: fix api crate compiler errors * chore: fix compiler errors in common-grpc * chore: fix common-datasource errors * chore: fix deprecated code in common-datasource * fix promql and physical plan related Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * wip: upgrading network deps Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * block on updating `sqlparser` * upgrade sqlparser Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * adapt new df's trait requirements Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * chore: fix compiler errors in mito2 * chore: fix common-function crate errors * chore: fix catalog errors * change import path Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * chore: fix some errors in query crate * chore: fix some errors in query crate * aggr expr and some other tiny fixes Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * chore: fix expr related errors in query crate * chore: fix query serializer and admin command * chore: fix grpc services * feat: axum serve * chore: fix http server * remove handle_error handler * refactor timeout layer * serve axum * chore: fix flow aggr functions * chore: fix flow * feat: fix errors in meta-srv * boxed() * use TokioIo * feat!: Remove script crate and python feature (#5321) * feat: exclude script crate * chore: simplify feature * feat: remove the script crate * chore: remove python feature and some comments * chore: fix warning * chore: fix servers tests compiler errors * feat: fix tests-integration errors * chore: fix unused * test: fix catalog test * chore: fix compiler errors for crates using common-meta testing feature is enabled when check with --workspace * test: use display for logical plan test * test: implement rewrite for ScanHintRule * fix: http server build panic * test: fix mito test * fix: sql parser type alias error * test: 
fix TestClient not listen * test: some flow tests * test(flow): more fix * fix: test_otlp_logs * test: fix promql test that using deprecated method fun() * fix: sql type replace supports Int8 ~ Int64, UInt8 ~ UInt64 * test: fix infer schema test case * test: fix tests related to plan display * chore: fix last flow test * test: fix function format related assertion * test: use larger port range for tests * fix: test_otlp_traces * fix: test_otlp_metrics * fix range query and dist plan Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * fix: flow handle distinct use deprecated field * fix: can't pass Join plan expressions to LogicalPlan::with_new_exprs * test: fix deserialize test * test: reduce split key case num * tests: lower case aggr func name * test: fix some sqlness tests * tests: more sqlness fix * tests: fixed sqlness test * commit non-bug changes Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * fix: make our udf correct * fix: implement empty methods of ContextProvider for DfContextProviderAdapter * test: update sqlness test result * chore: remove unused * fix: provide alias name for AggregateExprBuilder in range plan * test: update range query result * fix: implement missing ContextProvider methods for DfContextProviderAdapter * test: update timestamps, cte result * fix: supports empty projection in mito * test: update comment for cte test * fix: support projection for numbers * test: update test cases after projection fix * fix: fix range select first_value/last_value * fix: handle CAST and time index conflict * fix: handle order by correctly in range first_value/last_value * test: update sqlness result * test: update view test result * test: update decimal test wait for https://github.com/apache/datafusion/pull/14126 to fix this * feat: remove redundant physical optimization todo(ruihang): Check if we can remove this. 
* test: update sqlness test result * chore: range select default sort use nulls_first = false * test: update filter push down test result * test: comment decimal test to avoid different panic message * test: update some distributed test result * test: update test for distributed count and filter push down * test: update subqueries test * fix: SessionState may overwrite our UDFs * chore: fix compiler errors after merging main * fix: fix elasticsearch and dashboard router panic * chore: fix common-functions tests * chore: update sqlness result * test: fix id keyword and update sqlness result * test: fix flow_null test * fix: enlarge thread size in debug mode to avoid overflow * chore: fix warnings in common-function * chore: fix warning in flow * chore: fix warnings in query crate * chore: remove unused warnings * chore: fix deprecated warnings for parquet * chore: fix deprecated warning in servers crate * style: fix clippy * test: enlarge mito cache ttl test ttl time * chore: fix typo * style: fmt toml * refactor: reimplement PartialOrd for RangeSelect * chore: remove script crate files introduced by merge * fix: return error if sql option is not kv * chore: do not use ..default::default() * chore: per review * chore: update error message in BuildAdminFunctionArgsSnafu Co-authored-by: jeremyhi <jiachun_feng@proton.me> * refactor: typed precision * update sqlness view case Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * chore: flow per review * chore: add example in comment * chore: warn if parquet stats of timestamp is not INT64 * style: add a newline before derive to make the comment more clear * test: update sqlness result * fix: flow from substrait * chore: change update_range_context log to debug level * chore: move axum-extra axum-macros to workspace --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: luofucong <luofc@foxmail.com> Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: shuiyisong <xixing.sys@gmail.com> Co-authored-by: jeremyhi <jiachun_feng@proton.me> |
||
|
|
ef935a1de6 |
feat!: reduce sorted runs during compaction (#3702)
* feat: add functions to find and merge sorted runs * chore: refactor code * chore: remove some duplicates * chore: remove one clone * refactor: change max_active_window_files to max_active_window_runs * feat: integrate with sorted runs * fix: unit tests * feat: limit num of sorted runs during compaction * fix: some test * fix: some cr comments * feat: use smallvec * chore: rebase main * feat/reduce-sorted-runs: Refactor compaction logic and update test configurations - Refactored `merge_all_runs` function to use `sort_ranged_items` for sorting. - Improved item merging logic by iterating with `into_iter` and handling overlaps. - Updated test configurations to use `max_active_window_runs` instead of `max_active_window_files` for consistency. --------- Co-authored-by: tison <wander4096@gmail.com> |
||
|
|
bba3108e0d |
refactor!: unify sql options into OptionMap (#3792)
* unify sql options into OptionMap Signed-off-by: tison <wander4096@gmail.com> * fixup Signed-off-by: tison <wander4096@gmail.com> * Update src/sql/src/util.rs * drop legacy regions option Signed-off-by: tison <wander4096@gmail.com> * fixup Signed-off-by: tison <wander4096@gmail.com> * fixup Signed-off-by: tison <wander4096@gmail.com> --------- Signed-off-by: tison <wander4096@gmail.com> |
||
|
|
39b69f1e3b |
refactor!: Renames the new memtable to PartitionTreeMemtable (#3547)
* refactor: rename mod merge_tree to partition_tree * refactor: rename merge_tree * refactor: change merge tree comment * refactor: rename merge tree struct * refactor: memtable options |
||
|
|
641592644d |
feat: support per table memtable options (#3524)
* feat: add memtable builder to region * refactor: rename memtable_builder in worker to default_memtable_builder * fix: return error instead of using default compaction options Support deserializing memtable and compaction options from the option map * feat: optional memtable options * feat: add MemtableBuilderProvider to create builders * feat: change default memtable and skip deserializing dedup * chore: update test and comment * chore: test invalid type * feat: metric engine use new memtable manually * feat: expose more memtable configs * feat: add memtable options to valid option list * test: add test * test: sqlness test * chore: serde workspace * chore: remove comments |
||
|
|
9aa8f756ab |
fix: allow passing extra table options (#3484)
* fix: do not check options in parser * test: fix tests * test: fix sqlness * test: add sqlness test * chore: log options * chore: must specify compaction type * feat: validate option key * feat: add option key validation back |
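
Several of the commits above (#3702, #6121) refer to "sorted runs": groups of SST files whose time ranges do not overlap, so each group can be read in order without merging. The sketch below illustrates the idea only. It is not the code from `run.rs`; the `FileRange` type and this `find_sorted_runs` signature are assumptions made for illustration, and the greedy placement is a generic interval-partitioning approach rather than the project's actual algorithm.

```rust
/// A closed time range `[start, end]` covered by one file.
/// Hypothetical type for illustration; not a GreptimeDB struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FileRange {
    start: i64,
    end: i64,
}

/// Greedily partitions `files` into sorted runs. Within each run the files
/// are ordered by start time and no two files overlap.
fn find_sorted_runs(mut files: Vec<FileRange>) -> Vec<Vec<FileRange>> {
    // Visit files in order of start time (ties broken by end time).
    files.sort_by_key(|f| (f.start, f.end));

    let mut runs: Vec<Vec<FileRange>> = Vec::new();
    for file in files {
        // Pick the first run whose last file ends before this file starts.
        let slot = runs
            .iter_mut()
            .find(|run| run.last().map_or(true, |last| last.end < file.start));
        match slot {
            Some(run) => run.push(file),
            // Every existing run overlaps this file, so open a new run.
            None => runs.push(vec![file]),
        }
    }
    runs
}
```

For the ranges `[0, 10]`, `[5, 15]`, `[11, 20]`, and `[16, 30]`, this sketch yields two runs: `[0, 10]` followed by `[11, 20]`, and `[5, 15]` followed by `[16, 30]`. Fewer runs generally means fewer overlapping files to merge at read time, which is the motivation the commit titles give for reducing sorted runs during compaction.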