refactor/remove-compactor-compact:
### Remove Unused Compaction Functionality
- **Removed `compact` Method**: Eliminated the `compact` method from the `Compactor` trait and its default implementation, which was primarily used for local compaction in testing. This change affects `compactor.rs`.
- **Code Cleanup**: Removed associated code and comments related to the `compact` method, streamlining the `Compactor` trait interface.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: fix git cliff errors in latest version
- Fix errors in v2.12.0
- Do not generate logs for beta/rc tags between the compared commits
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: preserve blank line before release date in changelog
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor(mito2): improve compaction error handling and file removal
Refactor compaction task execution to enhance error handling and robustness.
- Implemented parallel execution of compaction tasks with proper error capture and logging for individual task failures.
- Stopped propagating task errors directly through JoinSnafu; errors are now handled inside the task processing loop.
- Adjusted file removal logic to correctly include expired SSTs after compaction merges.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(mito2): extract SstMerger trait for testability in compaction
Extract SstMerger trait and DefaultSstMerger implementation to improve the testability of DefaultCompactor.
The DefaultCompactor is now generic over SstMerger, allowing mock implementations to be injected for unit testing without relying on the full object storage access layer. This refactoring separates the concerns of SST file merging from the overall compaction orchestration logic.
Additionally:
- Updated CompactionScheduler to use DefaultCompactor::default().
- Added unit tests for DefaultCompactor using a MockMerger.
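
The shape of this refactoring can be sketched as follows. This is a simplified, synchronous illustration, not the actual mito2 code: the `Compactor` struct, the `merge` signature, the `String` file names and error type, and the `fail_on` field are all assumptions made for the example.

```rust
/// Hypothetical, simplified merger interface; the real trait works on SST
/// metadata and object storage rather than plain file names.
pub trait SstMerger {
    /// Merges the input files and returns the names of the output files.
    fn merge(&self, inputs: &[String]) -> Result<Vec<String>, String>;
}

/// Orchestrates compaction and delegates the actual merge to `M`.
pub struct Compactor<M: SstMerger> {
    merger: M,
}

impl<M: SstMerger> Compactor<M> {
    pub fn new(merger: M) -> Self {
        Self { merger }
    }

    /// Runs one merge per group and returns all outputs, or the first error.
    pub fn compact(&self, groups: &[Vec<String>]) -> Result<Vec<String>, String> {
        let mut outputs = Vec::new();
        for group in groups {
            outputs.extend(self.merger.merge(group)?);
        }
        Ok(outputs)
    }
}

/// Test double that never touches object storage.
pub struct MockMerger {
    /// When set, any merge whose inputs contain this file name fails.
    pub fail_on: Option<String>,
}

impl SstMerger for MockMerger {
    fn merge(&self, inputs: &[String]) -> Result<Vec<String>, String> {
        if let Some(bad) = &self.fail_on {
            if inputs.contains(bad) {
                return Err(format!("failed to merge {bad}"));
            }
        }
        Ok(vec![format!("merged-{}", inputs.len())])
    }
}
```

Because the orchestration only sees the trait, a unit test can construct `Compactor::new(MockMerger { fail_on: None })` and exercise the success and failure paths without any storage backend.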
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix(compaction): propagate join error during sst flush
Correctly propagates the error when joining SST flush handles during compaction. Previously, the error was logged but not returned, leading to potential silent failures.
Also reorders some imports for consistency.
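
A minimal sketch of the difference between logging and propagating a join failure, using `std::thread` in place of the async runtime used by the real code. The function name, the `u64` byte count, and the `String` error type are assumptions for illustration.

```rust
use std::thread::JoinHandle;

/// Joins every flush handle and returns the first failure to the caller
/// instead of only logging it.
pub fn join_flush_handles(
    handles: Vec<JoinHandle<Result<u64, String>>>,
) -> Result<u64, String> {
    let mut total_bytes = 0;
    for handle in handles {
        // The outer `?` propagates a join failure (the thread panicked);
        // the inner `?` propagates an error returned by the flush itself.
        let written = handle
            .join()
            .map_err(|_| "flush task panicked".to_string())??;
        total_bytes += written;
    }
    Ok(total_bytes)
}
```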
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(compaction): pre-allocate capacity for compacted_inputs
Pre-allocates capacity for the compacted_inputs vector based on the estimated total size of inputs and expired SSTs. This optimization aims to reduce vector reallocations during the compaction process.
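
The idea, sketched with file ids as plain `u64` values (the function name and the data representation are assumptions, not the mito2 types):

```rust
/// Collects every compacted input plus the expired SSTs into one vector,
/// sizing the vector once up front.
pub fn collect_compacted_inputs(inputs: &[Vec<u64>], expired: &[u64]) -> Vec<u64> {
    // Total number of entries is known before the first push.
    let total = inputs.iter().map(Vec::len).sum::<usize>() + expired.len();
    let mut compacted = Vec::with_capacity(total);
    for group in inputs {
        compacted.extend_from_slice(group);
    }
    compacted.extend_from_slice(expired);
    compacted
}
```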
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-partial-compaction:
Enhance `DefaultCompactor` and `MockMerger` for Improved Flexibility
- **`compactor.rs`**:
  - Added `Clone` trait to `DefaultSstMerger` and `MockMerger` to allow cloning.
  - Removed `Arc` wrapping from `DefaultCompactor`'s `merger` field for direct usage.
  - Updated `merge_ssts` method to require `Clone` trait for `SstMerger`.
  - Modified `MockMerger` to use `Arc<Mutex>` for `results` and `call_idx` to ensure thread safety.
  - Adjusted error handling to use `error::InvalidMetaSnafu` directly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
fix/flaky-test:
### Add Dynamic Port Selection for Standalone Tests
- **`cli.rs`**: Implemented functions `random_standalone_addrs` and `choose_random_unused_port_offset` to dynamically select unused ports for standalone tests, avoiding the port conflicts that made the tests flaky.
- Updated `test_export_create_table_with_quoted_names` to use dynamically assigned ports for HTTP, RPC, MySQL, and PostgreSQL addresses.
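
One common way to obtain an unused port in tests is to bind to port 0 and let the OS choose. The sketch below shows that technique; it is not the port-offset approach implemented by `choose_random_unused_port_offset`, and `pick_unused_addr` is a hypothetical helper name.

```rust
use std::net::{SocketAddr, TcpListener};

/// Asks the OS for a free TCP port on localhost by binding to port 0.
/// The listener is dropped on return, so the port is free again for the
/// caller; another process could still grab it in between.
pub fn pick_unused_addr() -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    listener.local_addr()
}
```

A test would call this once per service (HTTP, RPC, MySQL, PostgreSQL) and pass the resulting addresses to the standalone instance.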
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix(index): intersect bitmaps before early exit in predicates applier
The loop skipped intersecting when the next bitmap was empty, which left
the accumulator unchanged instead of zeroing it. Intersect first, then
break when the result is empty.
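
The bug and the fix can be illustrated with `u64` bitmasks standing in for the real bitmap type. Both function names are hypothetical.

```rust
/// Buggy variant: breaks as soon as the next bitmap is empty, so the
/// accumulator keeps stale bits instead of becoming empty.
pub fn intersect_all_buggy(first: u64, rest: &[u64]) -> u64 {
    let mut acc = first;
    for &next in rest {
        if next == 0 {
            break;
        }
        acc &= next;
    }
    acc
}

/// Fixed variant: intersect first, then stop once nothing can match.
pub fn intersect_all(first: u64, rest: &[u64]) -> u64 {
    let mut acc = first;
    for &next in rest {
        acc &= next;
        if acc == 0 {
            break;
        }
    }
    acc
}
```

With inputs `0b1011`, `0b0011`, `0`, `0b1111`, the buggy version returns `0b0011` while the fixed version returns `0`.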
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
* per gemini
* style(index): format predicates applier loop
* fix(index): remove unused mut in predicates applier
---------
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: discord9 <discord9@163.com>
* feat: add support for decimal parameter type, remove string replacement fallback
* chore: format
* fix: add support for using unsigned bigint in postgres
* chore: format toml
* refactor: cleanup duplicated code
* fix: rescale decimal
perf: move Tantivy fulltext search to blocking thread pool
Wrap the synchronous Tantivy search (query parsing, posting list
traversal, stored field reads) in spawn_blocking_global to avoid
starving the tokio async runtime with CPU-bound work.
Signed-off-by: lyang24 <lanqingy93@gmail.com>
* fix: add overflow check before interleave()
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: pass batches and column index to check_interleave_bytes_overflow
Refactor check_interleave_bytes_overflow to accept batches and a column
index directly, avoiding the intermediate Vec collection of arrays.
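
A rough sketch of such a check, assuming the overflow in question is the combined byte length of a variable-length column exceeding what `i32` offsets can address. The function name and the representation (`batches[b][c]` is the byte length of column `c` in batch `b`) are hypothetical and do not reflect the real signature.

```rust
/// Returns an error if the combined byte length of `column` across all
/// batches would not fit in an `i32` offset.
pub fn check_column_bytes_fit_i32(batches: &[Vec<u64>], column: usize) -> Result<(), String> {
    let mut total: u64 = 0;
    for batch in batches {
        // Index the column directly on each batch; no intermediate Vec of arrays.
        let len = *batch
            .get(column)
            .ok_or_else(|| format!("column {column} out of range"))?;
        total = total
            .checked_add(len)
            .ok_or_else(|| "byte length overflowed u64".to_string())?;
    }
    if total > i32::MAX as u64 {
        return Err(format!("column {column} needs {total} bytes, exceeding i32 offsets"));
    }
    Ok(())
}
```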
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support alter from primary_key to flat
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: alter flat to primary_key
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: change default_experimental_flat_format to true
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: compute channel size from split batch size
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add tests for split and channel size
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: always set sst_format from manifest on region open
sanitize_region_options did not set options.sst_format when the
default (PrimaryKey) matched the manifest value, leaving it as None
after reopen. This caused the alter format change to appear lost.
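
A minimal model of the bug and the fix. The types and both function bodies are simplified assumptions; only the names `sst_format` and `PrimaryKey` come from the description above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum SstFormat {
    #[default]
    PrimaryKey,
    Flat,
}

#[derive(Debug, Default)]
pub struct RegionOptions {
    pub sst_format: Option<SstFormat>,
}

/// Buggy variant: records the format only when it differs from the default,
/// so a manifest value of `PrimaryKey` leaves the option as `None`.
pub fn sanitize_buggy(options: &mut RegionOptions, manifest_format: SstFormat) {
    if manifest_format != SstFormat::default() {
        options.sst_format = Some(manifest_format);
    }
}

/// Fixed variant: always copy the format recorded in the manifest.
pub fn sanitize(options: &mut RegionOptions, manifest_format: SstFormat) {
    options.sst_format = Some(manifest_format);
}
```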
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix tests
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: show create table after alteration
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor!: rename default_experimental_flat_format to default_flat_format
The flat format is no longer experimental. Remove "experimental" from
the config field name, doc comments, and all references.
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* perf/schema-align:
**Refactor and Enhance Error Handling in `pending_rows_batcher.rs`**
- **Refactored `record_failure` Macro**: Moved the `record_failure` macro outside of the `flush_batch_physical` function to improve code reuse and maintainability.
- **Enhanced Batch Transformation**: Introduced `transform_logical_batches_to_physical` function to handle the transformation of logical table batches into physical format.
- **Batch Concatenation**: Added `concat_modified_batches` function to concatenate modified batches into a single batch.
- **Region Write Splitting**: Implemented `split_and_encode_region_writes` function to split combined batches into region-specific writes based on partition rules.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align:
Add tests for `transform_logical_batches_to_physical` in `pending_rows_batcher.rs`
- Implemented `mock_tag_batch` function to create mock `RecordBatch` instances for testing.
- Added multiple test cases for `transform_logical_batches_to_physical`:
  - `test_transform_logical_batches_to_physical_success`: Verifies successful transformation of logical to physical batches.
  - `test_transform_logical_batches_to_physical_taxonomy_failure`: Tests failure scenario when column IDs are missing.
  - `test_transform_logical_batches_to_physical_multiple_batches`: Checks handling of multiple batches.
  - `test_transform_logical_batches_to_physical_mixed_success_failure`: Tests mixed success and failure scenarios.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: refactor `flush_batch_physical` for better testability
Introduced several traits to abstract dependencies on CatalogManager, PartitionRuleManager,
and NodeManager, enabling easier unit testing with mock implementations.
- Added `PhysicalFlushCatalogProvider`, `PhysicalFlushPartitionProvider`, and `PhysicalFlushNodeRequester` traits.
- Implemented adapters for existing managers to satisfy the new traits.
- Refactored `flush_batch_physical` to use these traits instead of concrete manager references.
- Modularized region write planning, resolution, and encoding into standalone functions.
- Added comprehensive unit tests for the refactored logic, including edge cases for table lookup and region routing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align:
### Enhance Error Handling and Simplify Code in `error.rs` and `pending_rows_batcher.rs`
- **Error Handling Improvements**:
  - Added new error variants `Partition` and `MetricEngine` in `error.rs` to handle specific error cases.
  - Updated error propagation using `ResultExt` and `context` for better error messages and handling in `pending_rows_batcher.rs`.
- **Code Simplification**:
  - Removed `FlushWriteResult` enum and refactored `flush_region_writes_concurrently` to return `Result<()>`.
  - Simplified error handling in `flush_batch_physical` and related functions by removing `first_error` and using `Result` for error propagation.
- **Test Adjustments**:
  - Updated tests to align with the new error handling approach, ensuring they check for specific error messages and conditions.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: refactor `PendingBatch` to use `Option` for cleaner state management
Refactored `PendingBatch` in `pending_rows_batcher.rs` to use `Option<PendingBatch>`
within the worker loop. This change simplifies initialization and cleanup logic
by leveraging `Option::get_or_insert_with` and `Option::take`.
- Updated `PendingBatch` fields `created_at` and `ctx` to be non-optional.
- Modified `drain_batch` to take `&mut Option<PendingBatch>` and return the
drained batch, removing the need for `flush_with_error`.
- Simplified the worker loop logic for batch creation and flushing.
- Added a unit test `test_drain_batch_takes_initialized_pending_batch_from_option`
to verify the new draining logic.
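
The pattern can be sketched as below. The struct is reduced to two fields and `push_row` is a hypothetical helper; only `drain_batch`, `created_at`, and the use of `Option::get_or_insert_with` and `Option::take` come from the description above.

```rust
use std::time::Instant;

pub struct PendingBatch {
    /// Non-optional: a batch only exists once it has been created.
    pub created_at: Instant,
    pub rows: Vec<u64>,
}

/// Appends a row, creating the batch lazily on first use.
pub fn push_row(pending: &mut Option<PendingBatch>, row: u64) {
    let batch = pending.get_or_insert_with(|| PendingBatch {
        created_at: Instant::now(),
        rows: Vec::new(),
    });
    batch.rows.push(row);
}

/// Takes the batch out, leaving `None` so the next row starts a fresh batch.
pub fn drain_batch(pending: &mut Option<PendingBatch>) -> Option<PendingBatch> {
    pending.take()
}
```

Keeping "no batch yet" in the outer `Option` is what lets the inner fields drop their own `Option` wrappers.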
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: share errors across waiters using `Arc<Error>`
Enhanced error reporting in `PendingRowsBatcher` by using `Arc<Error>` in
`FlushWaiter` and `WorkerCommand`. This allows the same error instance to be
shared among all waiters of a batch, avoiding redundant error string conversions
and providing more structured error information.
- Added `SubmitBatch` variant to `Error` in `error.rs`.
- Updated `FlushWaiter` and `WorkerCommand` to use `std::result::Result<(), Arc<Error>>`.
- Refactored `notify_waiters` to distribute the shared `Arc<Error>`.
- Added `SubmitBatchSnafu` context when receiving results from the worker.
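
A small sketch of sharing one error among several waiters, using `std::sync::mpsc` channels in place of the async channels the batcher presumably uses. `FlushError` and `FlushResult` are stand-in types for this example.

```rust
use std::sync::mpsc::Sender;
use std::sync::Arc;

#[derive(Debug)]
pub struct FlushError(pub String);

pub type FlushResult = Result<(), Arc<FlushError>>;

/// Sends the same result to every waiter. On failure each waiter receives a
/// clone of the `Arc`, so the error itself is neither copied nor stringified.
pub fn notify_waiters(waiters: Vec<Sender<FlushResult>>, result: FlushResult) {
    for waiter in waiters {
        // A waiter that has already gone away is ignored.
        let _ = waiter.send(result.clone());
    }
}
```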
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: export types for benchmarking
Exported several internal types and traits from `pending_rows_batcher.rs` to enable
external benchmarking of the physical batch flushing logic.
- Made `PhysicalTableMetadata`, `PhysicalFlushCatalogProvider`,
`PhysicalFlushPartitionProvider`, `PhysicalFlushNodeRequester`,
`TableBatch`, and `flush_batch_physical` public.
- Added a new criterion benchmark `flush_batch_physical.rs` to measure the
performance of physical batch flushing with varying numbers of logical
tables and rows per table.
- Registered the new benchmark in `src/servers/Cargo.toml`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: typo
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): improve error handling and documentation in batcher
Refactored error handling in `pending_rows_batcher.rs` by using `ArrowSnafu`
for RecordBatch projection errors and simplified partition rule fetching.
Added comprehensive documentation for `flush_batch_physical` and updated
error display for `SubmitBatch`.
- Added `Location` to `Arrow` error variant for better traceability.
- Updated `SubmitBatch` display to include source error.
- Replaced manual error mapping with `context(error::ArrowSnafu)` in
`strip_partition_columns_from_batch`.
- Added doc comments to `flush_batch_physical` outlining the pipeline steps.
- Optimized capacity allocation in `transform_logical_batches_to_physical`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): clarify physical table metadata and simplify planned batch
Renamed `name_to_ids` to `col_name_to_ids` in `PhysicalTableMetadata` to
better reflect its purpose. Refactored `PlannedRegionBatch` to use a
`num_rows()` method instead of storing a redundant `row_count` field.
- Updated `PhysicalTableMetadata` and its usages in `pending_rows_batcher.rs`
and benchmarks.
- Removed `row_count` field from `PlannedRegionBatch` and added a `num_rows()`
helper.
- Cleaned up manual `with_context` closures for table lookups.
- Fixed a minor formatting issue in worker command processing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): simplify flush write structs and centralize metrics
Removed redundant `row_count` fields from `FlushRegionWrite` and
`PlannedRegionBatch` (made the helper method test-only). Centralized the
incrementing of `FLUSH_TOTAL` and `FLUSH_ROWS` metrics into `flush_batch`
to avoid duplication and ensure consistency.
- Removed `row_count` from `FlushRegionWrite` and `PlannedRegionBatch`.
- Marked `PlannedRegionBatch::num_rows()` as `#[cfg(test)]`.
- Updated `flush_batch` to handle `FLUSH_TOTAL` and `FLUSH_ROWS` metrics.
- Simplified concurrent and sequential flush logic by removing local metric
updates.
- Cleaned up related tests to match the structural changes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>