* feat: gc ctx&procedure
Signed-off-by: discord9 <discord9@163.com>
* fix: handle region not found case
Signed-off-by: discord9 <discord9@163.com>
* docs: add more explanations and TODOs
Signed-off-by: discord9 <discord9@163.com>
* per review
Signed-off-by: discord9 <discord9@163.com>
* chore: add time for region gc
Signed-off-by: discord9 <discord9@163.com>
* fix: explain why loader for gc region should fail
Signed-off-by: discord9 <discord9@163.com>
---------
Signed-off-by: discord9 <discord9@163.com>
* feat: split batches by rule in build_flat_sources()
It checks the num_series and splits batches when the series cardinality
is low
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: panic when no num_series available
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: don't subtract file index if checking mem range
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: update comments and control flow
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: divide parquet and puffin index
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: download index files when we open the region
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: use different label for parquet/puffin
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: control parallelism and cache size by env
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: change gauge to counter
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: correct file type labels in file cache
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: move env to config and change cache ratio to percent
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: checks capacity before download and refine metrics
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: change open to return MitoRegionRef
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: extract download to FileCache
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: run load cache task in write cache
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: check region state before downloading files
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: update config docs and test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: use file id from index_file_id to compute puffin key
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: skip loading cache in some states
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix/region-expire-state:
Refactor region state handling in compaction task and manifest updates
- Introduce a variable to hold the current region state for clarity in compaction task updates.
- Add an expected_region_state field to RegionEditResult to manage region state expectations during manifest handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/region-expire-state:
Refactor region state handling in compaction task
- Replace direct assignment of `RegionLeaderState::Writable` with dynamic state retrieval and conditional check for leader state.
- Modify `RegionEditResult` to include a flag `update_region_state` instead of `expected_region_state` to indicate if the region state should be updated to writable.
- Adjust handling of `RegionEditResult` in `handle_manifest` to conditionally update region state based on the new flag.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
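A minimal sketch of the `update_region_state` flag described above, assuming a simplified state enum. `RegionState` and `next_state` are hypothetical stand-ins and only mirror the idea in `handle_manifest`; they are not the actual mito2 types.

```rust
/// Hypothetical, simplified region state (the real code uses `RegionLeaderState`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum RegionState {
    Writable,
    Editing,
    Dropping,
}

/// Only the flag named in the commit message is modeled here.
struct RegionEditResult {
    update_region_state: bool,
}

/// Switches the region back to writable only when the edit result asks for it;
/// otherwise the current state is left untouched.
fn next_state(current: RegionState, result: &RegionEditResult) -> RegionState {
    if result.update_region_state {
        RegionState::Writable
    } else {
        current
    }
}
```

With the flag unset, a region that is already in another state (for example one being dropped) is not forced back to writable by a late edit result.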
* fix/pick-continue:
### Add Tests for TWCS Compaction Logic
- **`twcs.rs`**:
- Modified the logic in `TwcsPicker` to handle cases with zero runs by using `continue` instead of `return`.
- Added two new test cases: `test_build_output_multiple_windows_with_zero_runs` and `test_build_output_single_window_zero_runs` to verify the behavior of the compaction logic when there are zero runs in the windows.
- **`memtable_util.rs`**:
- Removed unused import `PredicateGroup`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
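The difference between `continue` and `return` for windows with zero runs can be sketched as follows. `Window` and `build_output` are hypothetical stand-ins for the picker internals, and the "more than one run" rule is an assumption made only for illustration.

```rust
/// Hypothetical stand-in for a time window; `runs` is the number of sorted runs in it.
struct Window {
    id: u32,
    runs: usize,
}

/// Returns the ids of the windows picked for compaction.
/// A window with zero runs is skipped with `continue`; an early `return`
/// at that point would drop every window that comes after it.
fn build_output(windows: &[Window]) -> Vec<u32> {
    let mut picked = Vec::new();
    for window in windows {
        if window.runs == 0 {
            continue;
        }
        // Assumption: a window needs compaction when it has more than one run.
        if window.runs > 1 {
            picked.push(window.id);
        }
    }
    picked
}
```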
* fix: clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/pick-continue:
### Enhance Compaction Process with Expired SST Handling and Testing
- **`compactor.rs`**:
- Introduced handling for expired SSTs by updating the manifest immediately upon task completion.
- Added new test cases to verify the handling of expired SSTs and manifest updates.
- **`task.rs`**:
- Implemented `remove_expired` function to handle expired SSTs by updating the manifest and notifying the region worker loop.
- Refactored `handle_compaction` to `handle_expiration_and_compaction` to integrate expired SST removal before merging inputs.
- Added logging and error handling for expired SST removal process.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/progressive-compaction:
**Enhance Compaction Task Error Handling**
- Updated `task.rs` to remove expired SST files only when `expired_ssts` is non-empty, so the removal process is not started when there is nothing to remove.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/progressive-compaction:
### Refactor `DefaultCompactor` to Extract `merge_single_output` Method
- **File**: `src/mito2/src/compaction/compactor.rs`
- Extracted the logic for merging a single compaction output into SST files into a new method `merge_single_output` within the `DefaultCompactor` struct.
- Simplified the `merge_ssts` method by utilizing the new `merge_single_output` method, reducing code duplication and improving maintainability.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/progressive-compaction:
### Add Max Background Compaction Tasks Configuration
- **`compaction.rs`**: Added `max_background_compactions` to the compaction scheduler to limit background tasks.
- **`compaction/compactor.rs`**: Removed immediate manifest update logic after task completion.
- **`compaction/picker.rs`**: Introduced `max_background_tasks` parameter in `new_picker` to control task limits.
- **`compaction/twcs.rs`**: Updated `TwcsPicker` to include `max_background_tasks` and truncate inputs exceeding this limit. Added related test cases to ensure functionality.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/pick-continue:
### Improve Error Handling and Task Management in Compaction
- **`task.rs`**: Enhanced error handling in `remove_expired` function by logging errors without halting the compaction process. Removed the return of `Result` type and added detailed logging for various failure scenarios.
- **`twcs.rs`**: Adjusted task management logic by removing input truncation based on `max_background_tasks` and instead discarding remaining tasks if the output size exceeds the limit. This ensures better control over task execution and resource management.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
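A minimal sketch of discarding outputs beyond `max_background_tasks`, assuming the limit is optional. `limit_outputs` is a hypothetical helper and not the actual `TwcsPicker` code.

```rust
/// Keeps at most `max_background_tasks` outputs and returns how many were discarded.
/// Assumption: `None` means there is no limit.
fn limit_outputs<T>(outputs: &mut Vec<T>, max_background_tasks: Option<usize>) -> usize {
    match max_background_tasks {
        Some(limit) if outputs.len() > limit => {
            let discarded = outputs.len() - limit;
            // Drop the remaining outputs instead of truncating the inputs up front.
            outputs.truncate(limit);
            discarded
        }
        _ => 0,
    }
}
```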
* fix/pick-continue:
### Add Unit Tests for Compaction Task and TWCS Picker
- **`task.rs`**: Added unit tests to verify the behavior of `PickerOutput` with and without expired SSTs.
- **`twcs.rs`**: Introduced tests for `TwcsPicker` to ensure correct handling of `max_background_tasks` during compaction, including scenarios with and without task truncation.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/pick-continue:
**Improve Error Handling and Notification in Compaction Task**
- **File:** `task.rs`
- Changed log level from `warn` to `error` for manifest update failures to enhance error visibility.
- Refactored the notification mechanism for expired file removal by using `BackgroundNotify::RegionEdit` with `RegionEditResult` to streamline the process.
- Simplified error handling by consolidating match cases into a single `if let Err` block for better readability and maintainability.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
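The consolidated `if let Err` handling might look roughly like the sketch below. `update_manifest` is a hypothetical stand-in that fails on demand, and the `log` vector stands in for the `error!` macro so the example stays self-contained.

```rust
/// Hypothetical stand-in for the manifest update; fails when `fail` is true.
fn update_manifest(fail: bool) -> Result<(), String> {
    if fail {
        Err("manifest is unavailable".to_string())
    } else {
        Ok(())
    }
}

/// Removes expired files and reports whether the removal succeeded.
/// A failure is recorded but not returned as an error, so the caller
/// can still continue with the merge step.
fn remove_expired(fail: bool, log: &mut Vec<String>) -> bool {
    if let Err(e) = update_manifest(fail) {
        log.push(format!("failed to update manifest: {e}"));
        return false;
    }
    true
}
```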
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* test: add tests for scanning append mode before flush
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: extract a function maybe_dedup_one
Signed-off-by: evenyag <realevenyag@gmail.com>
* ci: add flat format to docs.yml so we can make it required later
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* mito2: add unit test for flat single-range append_mode dedup behavior
Verify memtable_flat_sources skips dedup when append_mode is true and
performs dedup otherwise for single-range flat memtables, preventing
regressions in the new append_mode path.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
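The behavior under test can be sketched as below. `maybe_dedup_rows` is a hypothetical helper that works on sorted `(key, value)` rows and keeps the last row per key; the real memtable sources operate on record batches, so this only illustrates the `append_mode` branch.

```rust
/// Deduplicates sorted rows by key unless `append_mode` is set.
/// Assumption: rows are sorted by key and the last row for a key wins.
fn maybe_dedup_rows(rows: Vec<(u32, i64)>, append_mode: bool) -> Vec<(u32, i64)> {
    if append_mode {
        // Append mode keeps every row, so dedup is skipped entirely.
        return rows;
    }
    let mut out: Vec<(u32, i64)> = Vec::with_capacity(rows.len());
    for row in rows {
        if let Some(last) = out.last_mut() {
            if last.0 == row.0 {
                // Same key as the previous row: overwrite it.
                *last = row;
                continue;
            }
        }
        out.push(row);
    }
    out
}
```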
* fix/flat-source-merge:
### Improve Column Metadata Extraction Logic
- **File**: `src/common/meta/src/ddl/utils.rs`
- Modified the `extract_column_metadatas` function to use `swap_remove` to extract the first schema and to compare decoded column metadata instead of raw bytes. This ensures that the extension map is considered during verification, making the metadata consistency check across datanodes more robust.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
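A minimal sketch of comparing decoded metadata rather than raw bytes. The `name;key=value` encoding, `ColumnMeta`, `decode`, and `extract_consistent` are all hypothetical; the sketch only shows why two encodings that differ in map order can still be equal once decoded, which is one plausible reason a raw comparison is fragile.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded column metadata with an extension map.
#[derive(Debug, PartialEq)]
struct ColumnMeta {
    name: String,
    extension: BTreeMap<String, String>,
}

/// Hypothetical decoder for a toy "name;k=v;k=v" encoding.
fn decode(raw: &str) -> ColumnMeta {
    let mut parts = raw.split(';');
    let name = parts.next().unwrap_or_default().to_string();
    let extension = parts
        .filter_map(|kv| kv.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();
    ColumnMeta { name, extension }
}

/// Takes the first schema out with `swap_remove` and checks that every
/// other schema decodes to the same metadata.
fn extract_consistent(mut raw_schemas: Vec<String>) -> Option<ColumnMeta> {
    if raw_schemas.is_empty() {
        return None;
    }
    let first = decode(&raw_schemas.swap_remove(0));
    raw_schemas
        .iter()
        .all(|raw| decode(raw) == first)
        .then_some(first)
}
```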
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: potential failure in the test_index_build_type_compact test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: relax timestamp checking in test_timestamp_default_now
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat/objbench-subcmd:
### Add Object Storage Benchmark Tool and Update Dependencies
- **`Cargo.lock` & `Cargo.toml`**: Added dependencies for `colored`, `parquet`, and `pprof` to support new features.
- **`datanode.rs`**: Introduced `ObjbenchCommand` for benchmarking object storage, including command-line options for configuration and execution. Added `StorageConfig` and `StorageConfigWrapper` for storage engine configuration.
- **`datanode.rs`**: Implemented a stub for `build_object_store` function to initialize object storage.
These changes introduce a new subcommand for object storage benchmarking and update dependencies to support additional functionality.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* init
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: code style and clippy
* feat/objbench-subcmd:
Improve error handling in `objbench.rs`
- Enhanced error handling in `parse_config` and `parse_file_dir_components` functions by replacing `unwrap` with `OptionExt` and `context` for better error messages.
- Updated `build_access_layer_simple` and `build_cache_manager` functions to use `map_err` for more descriptive error handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
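The idea of replacing `unwrap` with a contextual error can be sketched with the standard library alone. This uses `Option::ok_or_else` as a std-only analog of `OptionExt::context`; `ParseError` and `split_dir_and_file` are hypothetical and not the functions in `objbench.rs`.

```rust
/// Hypothetical error type that carries the offending path as context.
#[derive(Debug, PartialEq)]
enum ParseError {
    MissingSeparator { path: String },
    EmptyFileName { path: String },
}

/// Splits a path into its directory and file name.
/// Instead of calling `unwrap` on the `Option`, the missing case is turned
/// into an error that says which path could not be parsed.
fn split_dir_and_file(path: &str) -> Result<(&str, &str), ParseError> {
    let (dir, file) = path
        .rsplit_once('/')
        .ok_or_else(|| ParseError::MissingSeparator { path: path.to_string() })?;
    if file.is_empty() {
        return Err(ParseError::EmptyFileName { path: path.to_string() });
    }
    Ok((dir, file))
}
```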
* chore: rebase main
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
again
false by default
test: config api
refactor: per code review
less info!
even less info!!
docs: gc regions instr
refactor: group by region id
per code review
per review
error handling?
test: fix
todos
fix after rebase
after refactor
Signed-off-by: discord9 <discord9@163.com>
* feat: add format and regex_extract functions and more type tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: forgot functions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: forgot null type
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: forgot date type
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: remove format function
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: update results after upgrading datafusion
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* perf: only decode primary keys in the batch
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: don't push none to creator
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: implement method to filter __table_id for sparse encoding
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: filter table id for sparse encoding separately
The __table_id isn't present in the projection, so we have to filter it
manually
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: decode tags for sparse encoding when building bloom filter
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support inverted index for tags under sparse encoding
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: skip tag columns in fulltext index
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix warnings
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix list index metadata test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: decode primary key columns to filter
When primary key columns are not in the projection but appear in filters,
we need to decode them in compute_filter_mask_flat
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: reuse filter method
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: only use dictionary for string type in compat
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: safe to get column by creator's column id
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: support flat in basic_test
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: support flat in alter_test
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: support flat for append_mode_test
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update bump_committed_sequence_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update close_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update compaction_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update create_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update edit_region_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update merge_mode_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update parallel_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update projection_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update prune_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update row_selector_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update scan_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update drop_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update filter_deleted_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update sync_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update set_role_state_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update staging_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update truncate_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update catchup_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update flush_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update open_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: update batch_open_test to test both formats
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix all flat format tests
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>