Mirror of https://github.com/GreptimeTeam/greptimedb.git (synced 2026-01-03 20:02:54 +00:00)
223 lines
7.6 KiB
Rust
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::fmt::Display;

use common_query::AddColumnLocation;
use datatypes::data_type::ConcreteDataType;
use sql::statements::concrete_data_type_to_sql_data_type;

use crate::error::{Error, Result};
use crate::ir::alter_expr::AlterTableOperation;
use crate::ir::create_expr::ColumnOption;
use crate::ir::{AlterTableExpr, Column};
use crate::translator::common::CommonAlterTableTranslator;
use crate::translator::DslTranslator;

pub struct AlterTableExprTranslator;

impl DslTranslator<AlterTableExpr, String> for AlterTableExprTranslator {
    type Error = Error;

    fn translate(&self, input: &AlterTableExpr) -> Result<String> {
        Ok(match &input.alter_kinds {
            AlterTableOperation::AddColumn { column, location } => {
                Self::format_add_column(&input.table_name, column, location)
            }
            AlterTableOperation::RenameTable { new_table_name } => {
                Self::format_rename(&input.table_name, new_table_name)
            }
            AlterTableOperation::ModifyDataType { column } => {
                Self::format_modify_data_type(&input.table_name, column)
            }
            _ => CommonAlterTableTranslator.translate(input)?,
        })
    }
}

impl AlterTableExprTranslator {
    fn format_rename(name: impl Display, new_name: impl Display) -> String {
        format!("ALTER TABLE {name} RENAME {new_name};")
    }

    fn format_add_column(
        name: impl Display,
        column: &Column,
        location: &Option<AddColumnLocation>,
    ) -> String {
        format!(
            "{};",
            vec![
                format!(
                    "ALTER TABLE {name} ADD COLUMN {}",
                    Self::format_column(column)
                ),
                Self::format_location(location).unwrap_or_default(),
            ]
            .into_iter()
            .filter(|s| !s.is_empty())
            .collect::<Vec<_>>()
            .join(" ")
        )
    }

    fn format_modify_data_type(name: impl Display, column: &Column) -> String {
        format!(
            "ALTER TABLE {name} MODIFY COLUMN {};",
            Self::format_column(column)
        )
    }

    fn format_location(location: &Option<AddColumnLocation>) -> Option<String> {
        location.as_ref().map(|location| match location {
            AddColumnLocation::First => "FIRST".to_string(),
            AddColumnLocation::After { column_name } => format!("AFTER {column_name}"),
        })
    }

    fn format_column(column: &Column) -> String {
        vec![
            column.name.to_string(),
            Self::format_column_type(&column.column_type),
            Self::format_column_options(&column.options),
        ]
        .into_iter()
        .filter(|s| !s.is_empty())
        .collect::<Vec<_>>()
        .join(" ")
    }

    fn format_column_type(column_type: &ConcreteDataType) -> String {
        // Safety: We don't use the `Dictionary` type
        concrete_data_type_to_sql_data_type(column_type)
            .unwrap()
            .to_string()
    }

    fn format_column_options(options: &[ColumnOption]) -> String {
        options
            .iter()
            .map(|option| option.to_string())
            .collect::<Vec<_>>()
            .join(" ")
    }
}

#[cfg(test)]
mod tests {
    use std::str::FromStr;

    use common_base::readable_size::ReadableSize;
    use common_query::AddColumnLocation;
    use common_time::Duration;
    use datatypes::data_type::ConcreteDataType;

    use super::AlterTableExprTranslator;
    use crate::ir::alter_expr::{AlterTableOperation, AlterTableOption, Ttl};
    use crate::ir::create_expr::ColumnOption;
    use crate::ir::{AlterTableExpr, Column};
    use crate::translator::DslTranslator;

    #[test]
    fn test_alter_table_expr() {
        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::AddColumn {
                column: Column {
                    name: "host".into(),
                    column_type: ConcreteDataType::string_datatype(),
                    options: vec![ColumnOption::PrimaryKey],
                },
                location: Some(AddColumnLocation::First),
            },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        assert_eq!(
            "ALTER TABLE test ADD COLUMN host STRING PRIMARY KEY FIRST;",
            output
        );

        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::RenameTable {
                new_table_name: "foo".into(),
            },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        assert_eq!("ALTER TABLE test RENAME foo;", output);

        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::DropColumn { name: "foo".into() },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        assert_eq!("ALTER TABLE test DROP COLUMN foo;", output);

        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::ModifyDataType {
                column: Column {
                    name: "host".into(),
                    column_type: ConcreteDataType::string_datatype(),
                    options: vec![],
                },
            },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        assert_eq!("ALTER TABLE test MODIFY COLUMN host STRING;", output);
    }

    #[test]
    fn test_alter_table_expr_set_table_options() {
        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::SetTableOptions {
                options: vec![
                    AlterTableOption::Ttl(Ttl::Duration(Duration::new_second(60))),
                    AlterTableOption::TwcsTimeWindow(Duration::new_second(60)),
                    AlterTableOption::TwcsMaxOutputFileSize(ReadableSize::from_str("1GB").unwrap()),
                    AlterTableOption::TwcsTriggerFileNum(5),
                ],
            },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        let expected = concat!(
            "ALTER TABLE test SET 'ttl' = '60s', ",
            "'compaction.twcs.time_window' = '60s', ",
            "'compaction.twcs.max_output_file_size' = '1.0GiB', ",
            "'compaction.twcs.trigger_file_num' = '5';"
        );
        assert_eq!(expected, output);
    }

    #[test]
    fn test_alter_table_expr_unset_table_options() {
        let alter_expr = AlterTableExpr {
            table_name: "test".into(),
            alter_kinds: AlterTableOperation::UnsetTableOptions {
                keys: vec!["ttl".into(), "compaction.twcs.time_window".into()],
            },
        };

        let output = AlterTableExprTranslator.translate(&alter_expr).unwrap();
        let expected = "ALTER TABLE test UNSET 'ttl', 'compaction.twcs.time_window';";
        assert_eq!(expected, output);
    }
}
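`format_add_column` and `format_column` both build the statement from fragments, drop the empty ones, and join the rest with single spaces, so a missing `location` or an empty option list leaves no stray whitespace before the `;`. The standalone sketch below models only that pattern. `Column`, `Location` and `join_non_empty` are hypothetical stand-ins (the real code uses `Column`, `ColumnOption` and `AddColumnLocation` from GreptimeDB crates), and the SQL type and options are assumed to be pre-rendered strings rather than converted through `concrete_data_type_to_sql_data_type`.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for `AddColumnLocation`.
enum Location {
    First,
    After { column_name: String },
}

/// Hypothetical stand-in for `Column`: the SQL type and column options are
/// already rendered to strings instead of `ConcreteDataType` / `ColumnOption`.
struct Column {
    name: String,
    sql_type: String,
    options: Vec<String>,
}

/// Drops empty fragments and joins the rest with single spaces, mirroring the
/// `filter(|s| !s.is_empty())` + `join(" ")` chains in the translator.
fn join_non_empty(parts: &[String]) -> String {
    parts
        .iter()
        .filter(|s| !s.is_empty())
        .map(String::as_str)
        .collect::<Vec<_>>()
        .join(" ")
}

/// `<name> <type> <options...>`; an empty option list contributes nothing.
fn format_column(column: &Column) -> String {
    join_non_empty(&[
        column.name.clone(),
        column.sql_type.clone(),
        column.options.join(" "),
    ])
}

fn format_location(location: &Option<Location>) -> Option<String> {
    location.as_ref().map(|location| match location {
        Location::First => "FIRST".to_string(),
        Location::After { column_name } => format!("AFTER {column_name}"),
    })
}

fn format_add_column(name: impl Display, column: &Column, location: &Option<Location>) -> String {
    let statement = join_non_empty(&[
        format!("ALTER TABLE {name} ADD COLUMN {}", format_column(column)),
        // `None` becomes an empty string, which `join_non_empty` filters out.
        format_location(location).unwrap_or_default(),
    ]);
    format!("{statement};")
}

fn main() {
    let host = Column {
        name: "host".to_string(),
        sql_type: "STRING".to_string(),
        options: vec!["PRIMARY KEY".to_string()],
    };
    // ALTER TABLE test ADD COLUMN host STRING PRIMARY KEY FIRST;
    println!("{}", format_add_column("test", &host, &Some(Location::First)));
    // ALTER TABLE test ADD COLUMN host STRING PRIMARY KEY AFTER ts;
    let after_ts = Some(Location::After { column_name: "ts".to_string() });
    println!("{}", format_add_column("test", &host, &after_ts));
    // ALTER TABLE test ADD COLUMN host STRING PRIMARY KEY;
    println!("{}", format_add_column("test", &host, &None));
}
```

The same filtering is why `format_modify_data_type` yields `MODIFY COLUMN host STRING;` with no trailing space when the column carries no options.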
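`DropColumn`, `SetTableOptions` and `UnsetTableOptions` fall through the `_` arm to `CommonAlterTableTranslator`, whose implementation is not part of this file. Judging only from the expected strings in the two table-option tests, that output quotes every key and value and separates entries with `, `. The sketch below reproduces just that string shape under this assumption; `format_set_options` and `format_unset_options` are hypothetical helpers that take pre-rendered strings, whereas the real translator renders typed `AlterTableOption` values (for example a `ReadableSize` parsed from `"1GB"` is expected as `1.0GiB`).

```rust
/// Hypothetical helper: renders already-stringified `(key, value)` pairs as
/// `'key' = 'value'`, separated by ", ".
fn format_set_options(table: &str, options: &[(&str, &str)]) -> String {
    let pairs = options
        .iter()
        .map(|(key, value)| format!("'{key}' = '{value}'"))
        .collect::<Vec<_>>()
        .join(", ");
    format!("ALTER TABLE {table} SET {pairs};")
}

/// Hypothetical helper: renders option keys as a comma-separated list of quoted names.
fn format_unset_options(table: &str, keys: &[&str]) -> String {
    let keys = keys
        .iter()
        .map(|key| format!("'{key}'"))
        .collect::<Vec<_>>()
        .join(", ");
    format!("ALTER TABLE {table} UNSET {keys};")
}

fn main() {
    // ALTER TABLE test SET 'ttl' = '60s', 'compaction.twcs.trigger_file_num' = '5';
    println!(
        "{}",
        format_set_options("test", &[("ttl", "60s"), ("compaction.twcs.trigger_file_num", "5")])
    );
    // ALTER TABLE test UNSET 'ttl', 'compaction.twcs.time_window';
    println!("{}", format_unset_options("test", &["ttl", "compaction.twcs.time_window"]));
}
```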