* fix unit tests
* fix: sqlness
* fix/default-time-window:
## Add Helper Functions and Enhance Compaction Tests
- **Refactor Compaction Logic**: Introduced helper functions `flush` and `compact` in `compaction_test.rs` to streamline compaction operations.
- **Enhance Compaction Tests**: Added a new test `test_infer_compaction_time_window` in `compaction_test.rs` to verify compaction time window inference.
- **Testing Improvements**: Added `#[cfg(test)]` attribute to `new_multi_partitions` in `time_partition.rs` to ensure it's only included in test builds.
* fix/default-time-window:
- **Refactor `TimePartition` Struct**: Removed unnecessary comments regarding `time_range` in `time_partition.rs`.
- **Enhance `TimePartitions` Functionality**: Added a method `part_duration_or_default` to provide a default partition duration in `time_partition.rs`.
- **Update SQL Test Cases**: Modified SQL operations and expected results in `scan_big_varchar.result` and `scan_big_varchar.sql` to reflect changes in data manipulation logic.
* fix/default-time-window:
### Update Time Partition Default Duration
- **Refactor Default Duration**: Introduced `INITIAL_TIME_WINDOW` constant to define the default time window duration as `Duration::from_days(1)`. This change replaces multiple instances of the hardcoded default duration across the `time_partition.rs` file.
- **Files Affected**: `time_partition.rs`
* fix/default-time-window:
## Update Partition Duration Handling
- **`time_partition.rs`**: Refactored `part_duration` to be non-optional, removing `Option` wrapper. Updated logic to use `unwrap_or` with `INITIAL_TIME_WINDOW` where necessary. Adjusted related methods and tests to accommodate this change.
- **`version.rs` (memtable and region)**: Updated handling of `part_duration` to align with changes in `time_partition.rs`, ensuring consistent use of non-optional `Duration`.
* fix/default-time-window:
### Improve Error Context in `time_partition.rs`
- Enhanced error context message in `time_partition.rs` to provide clearer information on partition time range issues, including bucket size details.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
---------
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* fix: alter table update table column default
* fix: fuzz test also cast default value
* chore: more testcase
* test: non-zero value
* refactor: per review
* tests: unexpected alter result(WIP on fix)
* ub
* ub more
* test: update sqlness
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name.
- **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`.
- **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints.
**Refactor Parquet Writer Initialization and File Handling**
- Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management.
- Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability.
- Enhanced error handling and logging with `debug` statements for better traceability during file operations.
- **Removed Output Size Enforcement in `twcs.rs`:**
- Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling.
- **Added Max File Size Option in `parquet.rs`:**
- Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files.
- **Refactored Indexer Management in `parquet/writer.rs`:**
- Changed `current_indexer` from an `Option` to a direct `Indexer` type.
- Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`.
- Simplified indexer initialization and management logic.
- **Refactored SST File Handling**:
- Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
- Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
- Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
- Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
- Updated `ParquetWriter` to handle multiple indexers and file IDs.
- Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
- Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
- Updated related logic to dynamically generate file IDs where necessary.
- Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
- Updated tests to align with new path and indexer management.
- Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
- Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
* chore: rebase main
* feat/multiple-compaction-output:
### Add Benchmarking and Refactor Compaction Logic
- **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions.
- **Compaction Module Enhancements**:
- Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public.
- Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions.
- Introduced `find_overlapping_items` for identifying overlapping items.
- **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`.
* feat/multiple-compaction-output:
### Enhance Compaction Logic and Add Benchmarks
- **Compaction Logic Improvements**:
- Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty.
- Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency.
- **Benchmark Enhancements**:
- Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function.
- Extended existing benchmarks to include larger data sizes.
- **Testing Enhancements**:
- Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`.
- **Logging and Debugging**:
- Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions.
* feat/multiple-compaction-output:
### Refactor and Enhance Compaction Logic
- **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`.
- **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`.
- **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`.
- **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files.
- **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`.
- **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`.
* feat/multiple-compaction-output:
### Enhance Compaction Logic and Testing
- **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated
benchmarks in `run_bench.rs` to include `bench_merge_seq_files`.
- **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap.
- **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies.
- **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed.
* feat/multiple-compaction-output:
### Refactor TWCS Compaction Options
- **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`.
- **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`.
- **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`.
- **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files.
This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase.
* chore: add trailing space
* fix license header
* feat/revise-compaction-picker:
**Limit File Processing and Optimize Merge Logic in `run.rs`**
- Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity.
- Adjusted logic to calculate `target_size` and iterate over files using the limited set of files.
- Updated scoring calculations to use the limited file set, ensuring efficient file merging.
* feat/revise-compaction-picker:
### Add Compaction Metrics and Remove Debug Logging
- **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process.
- **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`.
* feat/revise-compaction-picker:
## Enhance Compaction Logic and Metrics
- **Compaction Logic Improvements**:
- Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations.
- Updated `Compactor` implementation to use these methods for metrics tracking.
- Modified `Ranged` trait logic in `run.rs` to improve range comparison.
- Enhanced test cases in `run.rs` to reflect changes in compaction logic.
- **Metrics Enhancements**:
- Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking.
- **Debugging and Logging**:
- Added detailed logging for compaction pick results in `twcs.rs`.
- Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output.
- **Testing Enhancements**:
- Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files.
- Updated `merge_mode_test.rs` to reflect changes in file handling during scans.
* feat/revise-compaction-picker:
### Update `FileHandle` Debug Implementation
- **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`.
- **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`.
* Trigger CI
* feat/revise-compaction-picker:
**Update compaction logic and default options**
- **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability.
- **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance.
* feat/revise-compaction-picker:
Refactor `find_overlapping_items` to use an external result vector
- Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency.
- Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature.
- Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach.
* feat/revise-compaction-picker:
Improve file merging logic in `run.rs`
- Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups.
- Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection.
- Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences.
* feat/revise-compaction-picker:
Enhance `find_overlapping_items` with `SortedRun` and Update Tests
- Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity.
- Introduce a `sorted` flag in `SortedRun` to optimize sorting operations.
- Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`.
- Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors.
* feat/revise-compaction-picker:
**Enhancements in `compaction/run.rs`:**
- Added `ReadableSize` import to handle size calculations.
- Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided.
* feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction
Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
* fix: select after alter
* fix: insert a proper row&catch a bug
* fix: alter table modify type modify default value type too
* refactor: per review
* chore: per review
* refactor: per review
* refactor: per review
* chore: insert support string to numeric auto cast
* test: add sqlness test
* chore: remove log
* test: fix sql test
* style: fix clippy
* test: test invalid number
* feat: do not convert to default if unable to parse
* chore: update comment
* test: update sqlness test
* test: update prepare test
* chore: only retry when retry-able
* chore: revert dbg change
* refactor: per review
* fix: check for available frontend first
* docs: more explain&longer timeout&feat: more retry at every level&try send select 1
* fix: use `sql` method for "SELECT 1"
* fix: also put recover flows in spawned task and a dead loop
* test: update transient error in flow rebuild test
* chore: sleep after sqlness sleep
* chore: add a warning
* chore: wait even more time after reboot
* test: incorrect test result when filtering pk with multiple columns
* fix: prune non first tag correctly
Distinguish no column and no stats and only use default value when no
column
* test: update test result
* refactor: rename test file
* test: add test for null filter
* fix: use StatValues for null counts
* test: drop table
* test: fix unstable flow test
* feat: use flow batching engine
broken: try using logical plan
fix: use dummy catalog for logical plan
fix: insert plan exec&sqlness grpc addr
feat: use frontend instance in flownode in standalone
feat: flow type in metasrv&fix: flush flow out of sync& column name alias
tests: sqlness update
tests: sqlness flow rebuild udpate
chore: per review
refactor: keep chnl mgr
refactor: use catalog mgr for get table
tests: use valid sql
fix: add more check
refactor: put flow type determine to frontend
* chore: update proto
* chore: update proto to main branch
* fix: add locks for create/drop flow&docs: update docs
* feat: flush_flow flush all ranges now
* test: add align time window test
* docs: explain `nodeid` use in check task
* refactor: AddAutoColumnRewriter check for Projection
* refactor: per review
* fix: query without time window also clean dirty time window
* chore: better logging
* chore: add comments per review
* refactor: per review
* chore: per review
* chore: per review rename args
* refactor: per review partially
* chore: update docs
* chore: use better error variant
* chore: better error variant
* refactor: rename FlowWorkerManager to FlowStreamingEngine
* rename again
* refactor: per review
* chore: rebase after #5963 merged
* refactor: rename all flow_worker_manager occurs
* docs: rm resolved TODO
* fix: store flow schema on creation
* chore: update sqlness
* refactor: save the entire query context to flow info
* chore: sqlness update
* chore: rm pub
* fix: keep old version compatibility
* test: sqlness test case
* feat: use correct default while pruning row groups
* fix: consider default in SimpleFilterContext
* test: update sqlness test
* test: add order by
* feat: enable submitting wal prune procedure periodically
* chore: fix and add options
* test: add unit test
* test: fix unit test
* test: enable active_wal_pruning in test
* test: update default config
* chore: update config name
* refactor: use semaphore to control the number of prune process
* refactor: use split client for wal prune manager and topic creator
* chore: add configs
* chore: apply review comments
* fix: use tracker properly
* fix: use guard to track semaphore
* test: update unit tests
* chore: update config name
* chore: use prunable_entry_id
* refactor: semaphore to only limit the process of submitting
* chore: remove legacy sort
* chore: better configs
* fix: update config.md
* chore: respect fmt
* test: update unit tests
* chore: use interval_at
* fix: fix unit test
* test: fix unit test
* test: fix unit test
* chore: apply review comments
* docs: update config docs
* fix/pg-timestamp-diff:
### Add Support for `Duration` Type in PostgreSQL Encoding
- **Enhanced `encode_value` Functionality**: Updated `src/servers/src/postgres/types.rs` to support encoding of `Value::Duration` using `PgInterval`.
- **Implemented `Duration` Conversion**: Added conversion logic from `Duration` to `PgInterval` in `src/servers/src/postgres/types/interval.rs`.
- **Added Unit Tests**: Introduced tests for `Duration` to `PgInterval` conversion in `src/servers/src/postgres/types/interval.rs`.
- **Updated SQL Test Cases**: Modified `tests/cases/standalone/common/types/timestamp/timestamp.sql` and `timestamp.result` to include tests for timestamp subtraction using PostgreSQL protocol.
* fix: overflow
* fix/pg-timestamp-diff:
Update `timestamp.sql` to ensure newline consistency
- Modified `timestamp.sql` to add a newline at the end of the file for consistency.
* fix/pg-timestamp-diff:
### Add Documentation for Month Approximation in Interval Calculation
- **File Modified**: `src/servers/src/postgres/types/interval.rs`
- **Key Change**: Added a comment explaining the approximation of one month as 30.44 days in the interval calculations.