Commit Graph

599 Commits

Author SHA1 Message Date
Zhenchi
599f289f59 feat: add granularity and false_positive_rate options for indexes (#6416)
* feat: add `granularity` and `false_positive_rate` options for indexes

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* upgrade proto

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-07-02 07:33:39 +00:00
LFC
385f12a62e refactor: extract the common method for errors into tonic status (#6437)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-07-02 02:57:30 +00:00
LFC
a203909de3 feat: extension range definition (#6386)
* feat: defined extension range

Signed-off-by: luofucong <luofc@foxmail.com>

* remove feature parameters

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-06-30 02:42:40 +00:00
Zhenchi
ff559b2688 fix: complete partial index search results in cache (#6403)
* fix: complete partial index search results in cache

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* add initial tests

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* cover issue case

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* TestEnv new -> async

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-06-27 07:40:14 +00:00
Lei, HUANG
1d07864b29 refactor(object-store): move backends building functions back to object-store (#6400)
refactor/building-backend-in-object-store:
 ### Refactor Object Store Configuration

 - **Centralize Object Store Configurations**: Moved object store configurations (`FileConfig`, `S3Config`, `OssConfig`, `AzblobConfig`, `GcsConfig`) to `object-store/src/config.rs`.
 - **Error Handling Enhancements**: Introduced `object-store/src/error.rs` for improved error handling related to object store operations.
 - **Factory Pattern for Object Store**: Implemented `object-store/src/factory.rs` to create object store instances, consolidating logic from `datanode/src/store.rs`.
 - **Remove Redundant Store Implementations**: Deleted individual store files (`azblob.rs`, `fs.rs`, `gcs.rs`, `oss.rs`, `s3.rs`) from `datanode/src/store/`.
 - **Update Usage of Object Store Config**: Updated references to `ObjectStoreConfig` in `datanode.rs`, `standalone.rs`, `config.rs`, and `error.rs` to use the new centralized configuration.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-06-25 13:49:55 +00:00
Ruihang Xia
7a9444c85b refactor: remove staled manifest structures (#6382)
* refactor: remove staled manifest structures

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/store-api/src/lib.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-24 09:23:10 +00:00
LFC
bb12be3310 refactor: scan Batches directly (#6369)
* refactor: scan `Batch`es directly

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-06-24 07:55:49 +00:00
LFC
e072726ea8 refactor: make scanner creation async (#6349)
* refactor: make scanner creation async

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-06-20 06:44:49 +00:00
Lei, HUANG
6ece560f8c fix: reordered write cause incorrect kv (#6345)
* fix/reordered-write-cause-incorrect-kv:
 - **Enhance Testing in `partition_tree.rs`**: Added comprehensive test functions such as `kv_region_metadata`, `key_values`, and `collect_kvs` to improve the robustness of key-value operations and ensure correct behavior of the `PartitionTreeMemtable`.
 - **Improve Key Handling in `dict.rs`**: Modified `KeyDictBuilder` to handle both full and sparse keys, ensuring correct mapping and insertion. Added a new test `test_builder_finish_with_sparse_key` to validate the handling of sparse keys.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/reordered-write-cause-incorrect-kv:
 ### Refactor `partition_tree.rs` for Improved Key Handling

 - **Refactored Key Handling**: Simplified the `key_values` function to accept an iterator of keys, removing hardcoded key-value pairs. This change enhances flexibility and reduces redundancy in key management.
 - **Updated Test Cases**: Modified test cases to use the new `key_values` function signature, ensuring they iterate over keys dynamically rather than relying on predefined lists.

 Files affected:
 - `src/mito2/src/memtable/partition_tree.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/reordered-write-cause-incorrect-kv:
 Enhance Testing in `partition_tree.rs`

 - Added assertions to verify key-value collection after `memtable` and `forked` operations.
 - Refactored key-value writing logic for clarity in `forked` operations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-06-19 06:32:40 +00:00
Lei, HUANG
086ae9cdcd chore: print series count after wal replay (#6344)
* chore/print-series-count-after-wal-replay:
 ### Add Series Count Functionality and Logging Enhancements

 - **`time_partition.rs`**: Introduced `series_count` method to calculate the total timeseries count across all time partitions.
 - **`opener.rs`**: Enhanced logging to include the total timeseries replayed during WAL replay.
 - **`version.rs`**: Added `series_count` method to `VersionControlData` for approximating timeseries count in the current version.
 - **`handler.rs`**: Added entry and exit logging for the `sql` function to trace execution flow.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/print-series-count-after-wal-replay:
 ### Remove Unused Import

 - **File Modified**: `src/servers/src/http/handler.rs`
 - **Change Summary**: Removed the unused `info` import from `common_telemetry`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-06-18 12:04:39 +00:00
Lei, HUANG
a59b6c36d2 chore: add metrics for active series and field builders (#6332)
* chore/series-metrics:
 ### Add Metrics for Active Series and Values in Memtable

 - **`simple_bulk_memtable.rs`**: Implemented `Drop` trait for `SimpleBulkMemtable` to decrement `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` upon dropping.
 - **`time_series.rs`**:
   - Introduced `SeriesMap` with `Drop` implementation to manage active series and values count.
   - Updated `SeriesSet` and `Iter` to use `SeriesMap`.
   - Added `num_values` method in `Series` to calculate the number of values.
 - **`metrics.rs`**: Added `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` metrics to track active series and values in `TimeSeriesMemtable`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/series-metrics:
- Add metrics for active series and field builders
- Update dashboard

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/series-metrics:
 **Add Series Count Tracking in Memtables**

 - **`flush.rs`**: Updated `RegionFlushTask` to track and log the series count during memtable flush operations.
 - **`memtable.rs`**: Introduced `series_count` in `MemtableStats` and added a method to retrieve it.
 - **`partition_tree.rs`, `partition.rs`, `tree.rs`**: Implemented series count calculation in `PartitionTreeMemtable` and its components.
 - **`simple_bulk_memtable.rs`, `time_series.rs`**: Integrated series count tracking in `SimpleBulkMemtable` and `TimeSeriesMemtable` implementations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* Update src/mito2/src/memtable.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-06-18 09:16:45 +00:00
Lei, HUANG
0d0236ddab fix: revert string builder initial capacity in TimeSeriesMemtable (#6330)
fix/revert-string-builder-initial-capacity:
 ### Update `time_series.rs` Memory Allocation

 - **Reduced StringBuilder Capacity**: Adjusted the initial capacity of `StringBuilder` in `ValueBuilder` from `(256, 4096)` to `(4, 8)` to optimize memory usage in `time_series.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2025-06-17 13:24:52 +00:00
Weny Xu
10bf9b11f6 fix: handle corner case in catchup where compacted entry id exceeds region last entry id (#6312)
* fix(mito2): handle corner case in catchup where compacted entry id exceeds region last entry id

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-06-16 06:36:31 +00:00
Yingwen
eaf1e1198f refactor: Extract mito codec part into a new crate (#6307)
* chore: add a new crate mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: port necessary mods for primary key codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use codec utils in mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused mods

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove Partition::is_partition_column()

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove duplicated test utils

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fix is_partition_column check

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-06-13 07:14:29 +00:00
Ruihang Xia
7468a8ab2a feat: organize EXPLAIN ANALYZE VERBOSE's output in JSON format (#6308)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-06-12 09:55:53 +00:00
Lei, HUANG
5bb0466ff2 feat: introduce file group in compaction (#6261)
* fix/file-group-in-compaction:
 ### Enhance Compaction Logic with File Grouping

 - **`run.rs`**: Introduced `FileGroup` struct to manage groups of `FileHandle` objects, allowing for more efficient compaction operations. Updated `Ranged` and `Item` trait implementations to work with `FileGroup`.
 - **`test_util.rs`**: Added `new_file_handle_with_sequence` function to support file handles with sequence numbers, enhancing test utilities.
 - **`twcs.rs`**: Modified `TwcsPicker` to utilize `FileGroup` for managing files within windows, improving compaction logic. Updated `Window` struct to use `HashMap` for storing `FileGroup` objects.
 - **`version_util.rs`**: Updated version control utilities to handle sequence numbers in file metadata, aligning with new compaction logic.

Signed-off-by: Lei, HUANG <lhuang@greptime.com>

* fix/file-group-in-compaction:
 ### Add Test for File Group Assignment in TWCS

 - **Enhancements in `twcs.rs`:**
   - Added a new test `test_assign_file_groups_to_windows` to verify the correct assignment of file groups to windows.
   - Enhanced `test_assign_compacting_to_windows` with a new case to ensure files with overlapping time ranges and the same sequence are treated as one `FileGroup`.

Signed-off-by: Lei, HUANG <lhuang@greptime.com>

* fix/file-group-in-compaction:
 **Enhance Compaction Task Documentation and Initialization**

 - **`run.rs`**: Added documentation for `FileGroup` to clarify its role in representing a group of files created by the same compaction task.
 - **`twcs.rs`**: Introduced comments in the `Window` struct to explain the mapping of file sequences to file groups, indicating files created from the same compaction task. Simplified the initialization of the `files` hashmap using `HashMap::from`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <lhuang@greptime.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-06-12 09:33:40 +00:00
Zhenchi
c26138963e refactor: unify function registry (Part 1) (#6262)
* refactor: unify function registry (Part 1)

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: simplify via register_scalar

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-06-10 10:11:06 +00:00
Lei, HUANG
69870e2762 fix(mito): use 1day as default time partition duration (#6202)
* fix unit tests

* fix: sqlness

* fix/default-time-window:
 ## Add Helper Functions and Enhance Compaction Tests

 - **Refactor Compaction Logic**: Introduced helper functions `flush` and `compact` in `compaction_test.rs` to streamline compaction operations.
 - **Enhance Compaction Tests**: Added a new test `test_infer_compaction_time_window` in `compaction_test.rs` to verify compaction time window inference.
 - **Testing Improvements**: Added `#[cfg(test)]` attribute to `new_multi_partitions` in `time_partition.rs` to ensure it's only included in test builds.

* fix/default-time-window:
 - **Refactor `TimePartition` Struct**: Removed unnecessary comments regarding `time_range` in `time_partition.rs`.
 - **Enhance `TimePartitions` Functionality**: Added a method `part_duration_or_default` to provide a default partition duration in `time_partition.rs`.
 - **Update SQL Test Cases**: Modified SQL operations and expected results in `scan_big_varchar.result` and `scan_big_varchar.sql` to reflect changes in data manipulation logic.

* fix/default-time-window:
 ### Update Time Partition Default Duration

 - **Refactor Default Duration**: Introduced `INITIAL_TIME_WINDOW` constant to define the default time window duration as `Duration::from_days(1)`. This change replaces multiple instances of the hardcoded default duration across the `time_partition.rs` file.
 - **Files Affected**: `time_partition.rs`

* fix/default-time-window:
 ## Update Partition Duration Handling

 - **`time_partition.rs`**: Refactored `part_duration` to be non-optional, removing `Option` wrapper. Updated logic to use `unwrap_or` with `INITIAL_TIME_WINDOW` where necessary. Adjusted related methods and tests to accommodate this change.
 - **`version.rs` (memtable and region)**: Updated handling of `part_duration` to align with changes in `time_partition.rs`, ensuring consistent use of non-optional `Duration`.

* fix/default-time-window:
 ### Improve Error Context in `time_partition.rs`

 - Enhanced error context message in `time_partition.rs` to provide clearer information on partition time range issues, including bucket size details.

Signed-off-by: Lei, HUANG <lhuang@greptime.com>

---------

Signed-off-by: Lei, HUANG <lhuang@greptime.com>
2025-06-08 16:20:26 +00:00
Weny Xu
80c5af0ecf fix: ignore incomplete WAL entries during read (#6251)
* fix: ignore incomplete entry

* fix: fix unit tests
2025-06-04 11:16:42 +00:00
Lei, HUANG
fdd164c0fa fix(mito): revert initial builder capacity for TimeSeriesMemtable (#6231)
* fix/initial-builder-cap:
 ### Enhance Series Initialization and Capacity Management

 - **`simple_bulk_memtable.rs`**: Updated the `Series` initialization to use `with_capacity` with a specified capacity of 8192, improving memory management.
 - **`time_series.rs`**: Introduced `with_capacity` method in `Series` to allow custom initial capacity for `ValueBuilder`. Adjusted `INITIAL_BUILDER_CAPACITY` to 16 for more efficient memory usage. Added a new `new` method to maintain backward compatibility.

* fix/initial-builder-cap:
 ### Adjust Memory Allocation in Memtable

 - **`simple_bulk_memtable.rs`**: Reduced the initial capacity of `Series` from 8192 to 1024 to optimize memory usage.
 - **`time_series.rs`**: Decreased `INITIAL_BUILDER_CAPACITY` from 16 to 4 to improve efficiency in vector building.
2025-06-03 08:25:02 +00:00
Zhenchi
078afb2bd6 feat: bloom filter index applier support or eq chain (#6227)
* feat: bloom filter index applier support or eq chain

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-06-03 08:08:19 +00:00
Lei, HUANG
4e615e8906 feat(wal): support bulk wal entries (#6178)
* feat/bulk-wal:
 ### Refactor: Simplify Data Handling in LogStore Implementations

 - **`kafka/log_store.rs`, `raft_engine/log_store.rs`, `wal.rs`, `raw_entry_reader.rs`, `logstore.rs`:**
   - Refactored `entry` and `build_entry` functions to accept `Vec<u8>` directly instead of `&mut Vec<u8>`.
   - Removed usage of `std::mem::take` for data handling, simplifying the code and improving readability.
   - Updated test cases to align with the new function signatures.

* feat/bulk-wal:
 ### Add Support for Bulk WAL Entries and Flight Data Encoding

 - **Add `raw_data` field to `BulkPart` and related structs**: Updated `BulkPart` and related structures in `src/mito2/src/memtable/bulk/part.rs`, `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/region_write_ctx.rs`,
 `src/mito2/src/worker/handle_bulk_insert.rs`, and `src/store-api/src/region_request.rs` to include a new `raw_data` field for handling Arrow IPC data.
 - **Implement Flight Data Encoding**: Added a new module `flight` in `src/common/test-util/src/flight.rs` to encode record batches to Flight data format.
 - **Update `greptime-proto` dependency**: Changed the revision of the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml`.
 - **Enhance WAL Writer and Tests**: Modified `src/mito2/src/wal.rs` and related test files to support bulk WAL entries and added tests for encoding and handling bulk data.

* feat/bulk-wal:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Add `common-grpc` Dependency**: Added `common-grpc` as a dependency in `Cargo.lock` and `src/mito2/Cargo.toml`.
 - **Refactor `BulkPart` Structure**: Removed `num_rows` field and added `num_rows()` method in `src/mito2/src/memtable/bulk/part.rs`. Updated related usages in `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/memtable/time_series.rs`,
 `src/mito2/src/region_write_ctx.rs`, and `src/mito2/src/worker/handle_bulk_insert.rs`.
 - **Implement `TryFrom` and `From` for `BulkWalEntry`**: Added implementations for converting between `BulkPart` and `BulkWalEntry` in `src/mito2/src/memtable/bulk/part.rs`.
 - **Handle Bulk Entries in Region Opener**: Added logic to process bulk entries in `src/mito2/src/region/opener.rs`.
 - **Fix `BulkInsertRequest` Handling**: Corrected `region_id` handling in `src/operator/src/bulk_insert.rs` and `src/store-api/src/region_request.rs`.
 - **Add Error Variant for `ConvertBulkWalEntry`**: Added a new error variant in `src/mito2/src/error.rs` for handling bulk WAL entry conversion errors.

* fix: ci

* feat/bulk-wal:
 Add bulk write operation in `opener.rs`

 - Enhanced the region write context by adding a call to `write_bulk()` after `write_memtable()` in `opener.rs`.
 - This change aims to improve the efficiency of writing operations by enabling bulk writes.

* feat/bulk-wal:
 Enhance error handling and metrics in `bulk_insert.rs`

 - Updated `Inserter` to improve error handling by capturing the result of `datanode.handle(request)` and incrementing the `DIST_INGEST_ROW_COUNT` metric with the number of affected rows.

* feat/bulk-wal:
 ### Remove Encode Error Handling for WAL Entries

 - **`error.rs`**: Removed the `EncodeWal` error variant and its associated handling.
 - **`wal.rs`**: Eliminated the `entry_encode_buf` buffer and its usage for encoding WAL entries. Replaced with direct encoding to a vector using `encode_to_vec()`.
2025-05-29 09:10:30 +00:00
Lei, HUANG
4b71e493f7 feat!: revise compaction picker (#6121)
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name.
 - **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`.
 - **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints.

 **Refactor Parquet Writer Initialization and File Handling**
 - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management.
 - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability.
 - Enhanced error handling and logging with `debug` statements for better traceability during file operations.

 - **Removed Output Size Enforcement in `twcs.rs`:**
   - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling.

 - **Added Max File Size Option in `parquet.rs`:**
   - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files.

 - **Refactored Indexer Management in `parquet/writer.rs`:**
   - Changed `current_indexer` from an `Option` to a direct `Indexer` type.
   - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`.
   - Simplified indexer initialization and management logic.

 - **Refactored SST File Handling**:
   - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
   - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
   - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
   - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.

 - **Enhanced Indexer Management**:
   - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
   - Updated `ParquetWriter` to handle multiple indexers and file IDs.
   - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.

 - **Removed Redundant File ID Handling**:
   - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
   - Updated related logic to dynamically generate file IDs where necessary.
   - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.

 - **Test Adjustments**:
   - Updated tests to align with new path and indexer management.
   - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
   - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.

* chore: rebase main

* feat/multiple-compaction-output:
 ### Add Benchmarking and Refactor Compaction Logic

 - **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions.
 - **Compaction Module Enhancements**:
   - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public.
   - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions.
   - Introduced `find_overlapping_items` for identifying overlapping items.
 - **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Add Benchmarks

 - **Compaction Logic Improvements**:
   - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty.
   - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency.

 - **Benchmark Enhancements**:
   - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function.
   - Extended existing benchmarks to include larger data sizes.

 - **Testing Enhancements**:
   - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`.

 - **Logging and Debugging**:
   - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions.

* feat/multiple-compaction-output:
 ### Refactor and Enhance Compaction Logic

 - **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`.
 - **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`.
 - **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`.
 - **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files.
 - **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`.
 - **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Testing

 - **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated
 benchmarks in `run_bench.rs` to include `bench_merge_seq_files`.
 - **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap.
 - **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies.
 - **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed.

* feat/multiple-compaction-output:
 ### Refactor TWCS Compaction Options

 - **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`.
 - **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`.
 - **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`.
 - **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files.

 This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase.

* chore: add trailing space

* fix license header

* feat/revise-compaction-picker:
 **Limit File Processing and Optimize Merge Logic in `run.rs`**

 - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity.
 - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files.
 - Updated scoring calculations to use the limited file set, ensuring efficient file merging.

* feat/revise-compaction-picker:
 ### Add Compaction Metrics and Remove Debug Logging

 - **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process.
 - **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`.

* feat/revise-compaction-picker:
 ## Enhance Compaction Logic and Metrics

 - **Compaction Logic Improvements**:
   - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations.
   - Updated `Compactor` implementation to use these methods for metrics tracking.
   - Modified `Ranged` trait logic in `run.rs` to improve range comparison.
   - Enhanced test cases in `run.rs` to reflect changes in compaction logic.

 - **Metrics Enhancements**:
   - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking.

 - **Debugging and Logging**:
   - Added detailed logging for compaction pick results in `twcs.rs`.
   - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output.

 - **Testing Enhancements**:
   - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files.
   - Updated `merge_mode_test.rs` to reflect changes in file handling during scans.

* feat/revise-compaction-picker:
 ### Update `FileHandle` Debug Implementation

 - **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`.
 - **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`.

* Trigger CI

* feat/revise-compaction-picker:
 **Update compaction logic and default options**

 - **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability.
 - **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance.

* feat/revise-compaction-picker:
 Refactor `find_overlapping_items` to use an external result vector

 - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency.
 - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature.
 - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach.

* feat/revise-compaction-picker:
 Improve file merging logic in `run.rs`

 - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups.
 - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection.
 - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences.

* feat/revise-compaction-picker:
 Enhance `find_overlapping_items` with `SortedRun` and Update Tests

 - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity.
 - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations.
 - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`.
 - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors.

* feat/revise-compaction-picker:
 **Enhancements in `compaction/run.rs`:**

 - Added `ReadableSize` import to handle size calculations.
 - Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided.

* feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction

Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
2025-05-23 03:29:08 +00:00
Lei, HUANG
5a0da5b6bb fix: region worker stall metrics (#6149)
fix/stall-metrics:
 Improve stalled request handling in `handle_write.rs`

 - Updated logic to account for both `write_requests` and `bulk_requests` when adjusting `stalled_count`.
 - Modified `reject_region_stalled_requests` and `handle_region_stalled_requests` to correctly subtract the combined length of `requests` and `bulk` from `stalled_count`.
2025-05-21 13:21:50 +00:00
Lei, HUANG
eaf7b4b9dd chore: update flush failure metric name and update grafana dashboard (#6138)
* 1. rename `greptime_mito_flush_errors_total` metric to `greptime_mito_flush_errors_total` for consistency
2. update grafana dashboard to add following panel:
  - compaction input/output bytes
  - bulk insert handle elasped time in frontend and region worker
2025-05-20 12:05:54 +00:00
Zhenchi
400229c384 feat: introduce index result cache (#6110)
* feat: introduce index result cache

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Update src/mito2/src/sst/index/inverted_index/applier/builder.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* optimize selector_len

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-20 01:45:42 +00:00
Weny Xu
864cc117b3 fix: append noop entry when auto topic creation is disabled (#6092)
* feat: improve topic management and add stale records cleanup

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-05-16 11:26:47 +00:00
Yingwen
0ea9ab385d fix: clean files under the atomic write dir on failure (#6112)
* fix: remove files under atomic dir on failure

* fix: clean atomic dir on download failure

* chore: update comment

* fix: clean if failed to write without write cache

* feat: add a TempFileCleaner to clean files on failure

* chore: after merge fix

* chore: more fix

---------

Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: discord9 <discord9@163.com>
2025-05-16 11:18:11 +00:00
Yingwen
c7e9485534 feat: New scanner SeriesScan to scan by series for querying metrics (#5968)
* chore: basic methods for SeriesScan

* chore: add to scanner enum

* feat: implement scan logic of each partition

* feat: use series scan when distribution is PerSeries

* refactor: remove per series scan from SeqScan

* fix: use series scan in PerSeries distribution

* feat: keep parallelize_scan unchanged

* fix: address compiler errors

* fix: include build merge reader cost to scan cost

* feat: use smallvec

* chore: update comment

* Revert "feat: keep parallelize_scan unchanged"

This reverts commit 96ba00d175.

* assign partition_ranges

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: try send before send

reduce the send timeout to 10ms

* chore: add comments

* fix: add metrics to partition metrics list

* fix: correct scan cost metrics

* chore: reset instant

* fix: scanner metrics init

* chore: display more info in explain

* feat: metrics for send series timeout

* style: fix clippy

* refactor: use ChainedRecordBatchStream to simplify codes

* chore: fix typos

* feat: separate distributor metrics

* feat: remove parallelize hack

* chore: fix warning

* test: add test for series scan

* test: update sqlness test

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-16 08:53:24 +00:00
Ruihang Xia
57b53211d9 feat: don't hide atomic write dir (#6109)
* feat: don't hidden atomic write dir

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* compatible code

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/mito2/src/access_layer.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-05-16 06:21:13 +00:00
Lei, HUANG
5a9023d6b3 feat(bulk): write to multiple time partitions (#6086)
* add benchmark for splitting according to time partition

* feat/write-to-multiple-time-partitions:
 **Enhancements to Bulk Processing and Time Partitioning**

 - **`part.rs`**: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns.
 - **`simple_bulk_memtable.rs`**: Updated tests to include `timestamp_index` initialization.
 - **`time_partition.rs`**: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`.
 - **`handle_bulk_insert.rs`**: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning Logic

 - **`time_partition.rs`**:
   - Introduced `HashSet` for efficient partition management.
   - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions.
   - Updated `get_or_create_time_partition` to manage partition creation.
   - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges.

 - **Tests**:
   - Added `test_find_partitions_by_time_range` to validate new partitioning logic.
   - Updated `test_split_record_batch` to ensure correct record batch splitting behavior.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning and Testing in `time_partition.rs`

 - **Time Partitioning Enhancements**:
   - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`.
   - Improved filtering logic for timestamp arrays to support various time units.

 - **Testing Enhancements**:
   - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`.
   - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning.

 - **Imports and Dependencies**:
   - Added necessary imports for new timestamp array types and testing utilities.

* feat/write-to-multiple-time-partitions:
 ### Refactor and Enhance Time Partition Filtering

 - **Refactor Filtering Logic**: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability.
 - **Enhance `BulkPart` Struct**: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`.
 - **Rename Function**: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`.
 - **Add Feature Flag**: Introduced `int_roundings` feature in `lib.rs` to support new functionality.

* refactor tests

* feat/write-to-multiple-time-partitions:
 Improve timestamp handling in `time_partition.rs`

 - Enhanced safety comments for timestamp conversion to ensure clarity.
 - Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations.
 - Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`.

* feat/write-to-multiple-time-partitions:
 **Refactor timestamp handling and add utility function**

 - **Refactor `time_partition.rs`:** Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion.
 - **Enhance `metadata.rs`:** Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code.

* feat/write-to-multiple-time-partitions:
 Refactor time partition variable names in `time_partition.rs`

 - Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`.
 - Updated related logic to use new variable names for improved readability and maintainability.

* feat/write-to-multiple-time-partitions:
 **Refactor variable names in `time_partition.rs`**

 - Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency.
 - Modified function calls and loop iterations to align with the new variable names.
 - Affected file: `src/mito2/src/memtable/time_partition.rs`

* feat/write-to-multiple-time-partitions:
 ### Refactor variable names in `time_partition.rs`

 - Updated variable names for clarity in `time_partition.rs`:
   - Renamed `matchings` to `matching_parts`
   - Renamed `missings` to `missing_parts`
 - Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partition Handling

 - **`time_partition.rs`**:
   - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks.
   - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`).
   - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling.

---------

Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>
2025-05-14 05:09:59 +00:00
Ruihang Xia
bbb6f8685e feat: implement commutativity rule for prom-related plans (#5875)
* feat: implement commutativity rule for prom-related plans

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix range manipulate deserializer

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* blocklist in commutativity rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change dictionary type

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* handle partition and ordering

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add rate, increase and delta

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* regexp_replace uses empty string instead of null value

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result again

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-13 09:06:25 +00:00
Yingwen
ca1641d1c4 feat: implement PlainBatch struct (#6079)
* feat: implement PlainBatch struct

* chore: typo

* style: fix clippy

* feat: assert num columns
2025-05-13 05:56:12 +00:00
Zhenchi
36d9346ffc refactor: introduce row group selection (#6075)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-05-12 07:15:17 +00:00
discord9
6ab0f0cc5c fix: alter table modify type should also modify default value (#6049)
* fix: select after alter

* fix: insert a proper row&catch a bug

* fix: alter table modify type modify default value type too

* refactor: per review

* chore: per review

* refactor: per review

* refactor: per review
2025-05-09 03:40:59 +00:00
Lei, HUANG
8685ceb232 feat: impl bulk memtable and bridge bulk inserts (#6054)
* feat/bridge-bulk-insert:
 ## Implement Bulk Insert and Update Dependencies

 - **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`.
 - **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
 - **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
 - **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
 - **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
 - **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity.
 - **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.

* fix: clippy

* fix schema serialization

* feat/bridge-bulk-insert:
 Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`

 - Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`.
 - Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
 - Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.

* fix: test

* refactor: rename

* allow empty app_metadata in FlightData

* feat/bridge-bulk-insert:
 - **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
 - **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
 - **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.

* fix: standalone test

* feat/bridge-bulk-insert:
 ### Enhance Bulk Insert Handling and Metadata Management

 - **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
 - **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
 - **`handle_bulk_insert.rs`**:
   - Added `handle_record_batch` function to streamline processing of bulk insert payloads.
   - Improved error handling and task management for bulk insert operations.
   - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access.

* feat/bridge-bulk-insert:
 - **Refactor `handle_bulk_insert.rs`:**
   - Replaced `handle_record_batch` with `handle_payload` for handling payloads.
   - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.

 - **Optimize `multi_dim.rs`:**
   - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`.

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
 - **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
 - **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling.
 - **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
 - **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.

* fix: clippy warnings

* feat/bridge-bulk-insert:
 **Add Metrics and Improve Error Handling**

 - **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to
 monitor performance.
 - **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
 - **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
 - **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.

* chore: rebase main

* implement simple bulk memtable

* impl write_bulk

* implement simple bulk memtable

* feat/simple-bulk-memtable:
 ### Enhance Time-Series Memtable and Bulk Insert Handling

 - **Visibility Modifications**: Made `mutable_array` in `PrimitiveVectorBuilder` and `StringVectorBuilder` public in `primitive.rs` and `string.rs`.
 - **New Module**: Added `builder.rs` to `memtable` for time-series builders, including `FieldBuilder` and `StringBuilder` implementations.
 - **Bulk Insert Enhancements**:
   - Added `sequence` field to `BulkPart` in `part.rs` and updated its handling in `simple_bulk_memtable.rs` and `region_write_ctx.rs`.
   - Introduced metrics for bulk insert operations in `metrics.rs` and `bulk_insert.rs`.
 - **Performance Metrics**: Added timing metrics for write operations in `metrics.rs`, `region_write_ctx.rs`, and `handle_write.rs`.
 - **Region Request Handling**: Updated `make_region_bulk_inserts` in `region_request.rs` to include performance metrics.

* feat/simple-bulk-memtable:
 **Improve Memtable Stats Calculation and Add Metrics Timer**

 - **`simple_bulk_memtable.rs`**: Refactored `stats` method to use `num_rows` for checking if rows have been written, improving accuracy in memory table statistics.
 - **`handle_bulk_insert.rs`**: Introduced a metrics timer to measure the elapsed time for processing bulk requests, enhancing performance monitoring.

* feat/simple-bulk-memtable:
 ### Commit Message

 **Enhancements and Bug Fixes**

 - **Dependency Update**: Updated `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Feature Addition**: Implemented `to_mutation` method in `BulkPart` to convert `BulkPart` to `Mutation` for fallback `write_bulk` implementation in `src/mito2/src/memtable/bulk/part.rs`.
 - **Functionality Improvement**: Modified `write_bulk` method in `TimeSeriesMemtable` to support default implementation fallback to row iteration in `src/mito2/src/memtable/time_series.rs`.
 - **Performance Optimization**: Enhanced `bulk_insert` handling by optimizing region request processing and data partitioning in `src/operator/src/bulk_insert.rs`.
 - **Error Handling**: Added `ComputeArrow` error variant for better error management in `src/operator/src/error.rs`.
 - **Code Refactoring**: Simplified region bulk insert request processing in `src/store-api/src/region_request.rs`.

* fix: some clippy warnings

* feat/simple-bulk-memtable:
 ### Commit Summary

 - **Refactor Return Types to `Result`:**
   Updated the return type of the `ranges` method in `memtable.rs`, `bulk.rs`, `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `memtable_util.rs` to return `Result<MemtableRanges>` for better error handling.

 - **Enhance Metrics Tracking:**
   Improved metrics tracking by adding `num_rows` and `max_sequence` to `WriteMetrics` in `stats.rs`. Updated related methods in `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `scan_region.rs` to utilize these metrics.

 - **Remove Unused Imports:**
   Cleaned up unused imports in `time_series.rs` to streamline the codebase.

* merge main

* remove useless error vairant

* use newer version of proto

* feat/simple-bulk-memtable:
                                                                                                                                 Commit Message

                                                                                                                                     Summary

Enhance FieldBuilder and StringBuilder functionality, add tests, and improve error handling.

                                                                                                                                   Key Changes

 • builder.rs:
    • Added documentation for FieldBuilder methods.
    • Renamed append_string_vector to append_vector in StringBuilder.
 • simple_bulk_memtable.rs:
    • Added new test cases for write_one, write_bulk, is_empty, stats, fork, and sequence_filter.
 • time_series.rs:
    • Improved error handling in ValueBuilder for type mismatches.
 • memtable_util.rs:
    • Removed unused imports and streamlined code.

These changes enhance the robustness and test coverage of the memtable components.

* feat/simple-bulk-memtable:
 Improve Time Partition Matching Logic in `time_partition.rs`

 - Enhanced the `write_bulk` method in `time_partition.rs` to improve the logic for matching partitions based on time ranges.
 - Introduced a new mechanism to filter and select partitions that overlap with the record batch's timestamp range before writing.

* feat/simple-bulk-memtable:
 Improve Metrics Handling in `bulk_insert.rs`

 - Removed the `group_request_timer` and its associated metric observation to streamline the timing logic.
 - Moved the `BULK_REQUEST_ROWS` metric observation to occur after filtering, ensuring accurate row count metrics.

* feat/simple-bulk-memtable:
 **Enhance Stalled Requests Calculation and Update Metrics**

 - **`worker.rs`**: Updated the `stalled_count` method to include both `reqs` and `bulk_reqs` in the calculation of stalled requests.
 - **`bulk_insert.rs`**: Removed duplicate observation of `BULK_REQUEST_MESSAGE_SIZE` metric.
 - **`metrics.rs`**: Changed the bucket strategy for `BULK_REQUEST_ROWS` from linear to exponential, improving the granularity of metrics collection.

* feat/simple-bulk-memtable:
 **Refactor `StringVector` Usage and Update Method Signatures**

 - **`src/datatypes/src/vectors/string.rs`**: Changed `StringVector`'s `array` field from public to private.
 - **`src/mito2/src/memtable/builder.rs`**: Refactored `append_vector` method to `append_array`, updating its usage to work directly with `StringArray` instead of `StringVector`.
 - **`src/mito2/src/memtable/time_series.rs`**: Updated `ValueBuilder` to handle `StringArray` directly, replacing `StringVector` usage with `StringArray` in the `FieldBuilder::String` case.

* feat/simple-bulk-memtable:
 - **Refactor `PrimitiveVectorBuilder`**: Made `mutable_array` private in `src/datatypes/src/vectors/primitive.rs`.
 - **Optimize `ValueBuilder`**: Replaced `UInt64VectorBuilder` and `UInt8VectorBuilder` with `Vec<u64>` and `Vec<u8>` for `sequence` and `op_type` in `src/mito2/src/memtable/time_series.rs`.
 - **Improve Metrics Initialization**: Updated histogram bucket initialization to use `exponential_buckets` in `src/mito2/src/metrics.rs`.

* feat/simple-bulk-memtable:
 Improve error handling in `simple_bulk_memtable.rs` and `time_series.rs`

 - Enhanced error handling by using `OptionExt` for more concise error context management in `simple_bulk_memtable.rs` and `time_series.rs`.
 - Replaced `ok_or` with `with_context` to streamline error context creation in both files.

* feat/simple-bulk-memtable:
 **Enhance Time Partition Handling in `time_partition.rs`**

 - Introduced `create_time_partition` function to streamline the creation of new time partitions, ensuring thread safety by acquiring a lock.
 - Modified logic to handle cases where no matching time partitions exist, creating new partitions as needed.
 - Updated `write_record_batch` and `write_one` methods to utilize the new partition creation logic, improving partition management and data writing efficiency.

* replace proto

* feat/simple-bulk-memtable:
 Update `metrics.rs` to adjust the range of exponential buckets for bulk insert message rows from `10 ~ 1_000_000` to `10 ~ 100_000`.
2025-05-09 02:56:09 +00:00
LFC
e787007eb5 feat: scan with sst minimal sequence (#6051)
* feat: scan with sst minimal sequence

* Update src/store-api/src/storage/requests.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* update proto

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-08 01:34:51 +00:00
Weny Xu
df31f0b9ec fix: improve region migration error handling and optimize leader downgrade with lease check (#6026)
* fix(meta): improve region migration error handling and lease management

* chore: refine comments

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* feat: consume opening_region_guard
2025-05-07 00:54:35 +00:00
Lei, HUANG
f298a110f9 feat: bridge bulk insert (#5927)
* feat/bridge-bulk-insert:
 ## Implement Bulk Insert and Update Dependencies

 - **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`.
 - **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
 - **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
 - **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
 - **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
 - **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity.
 - **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.

* fix: clippy

* fix schema serialization

* feat/bridge-bulk-insert:
 Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`

 - Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`.
 - Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
 - Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.

* fix: test

* refactor: rename

* allow empty app_metadata in FlightData

* feat/bridge-bulk-insert:
 - **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
 - **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
 - **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.

* fix: standalone test

* feat/bridge-bulk-insert:
 ### Enhance Bulk Insert Handling and Metadata Management

 - **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
 - **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
 - **`handle_bulk_insert.rs`**:
   - Added `handle_record_batch` function to streamline processing of bulk insert payloads.
   - Improved error handling and task management for bulk insert operations.
   - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access.

* feat/bridge-bulk-insert:
 - **Refactor `handle_bulk_insert.rs`:**
   - Replaced `handle_record_batch` with `handle_payload` for handling payloads.
   - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.

 - **Optimize `multi_dim.rs`:**
   - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`.

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
 - **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
 - **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling.
 - **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
 - **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.

* fix: clippy warnings

* feat/bridge-bulk-insert:
 **Add Metrics and Improve Error Handling**

 - **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to
 monitor performance.
 - **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
 - **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
 - **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.

* chore: rebase main

* chore: merge main
2025-05-06 09:53:25 +00:00
Yingwen
86aae6733d fix: prune primary key with multiple columns may use default value as statistics (#5996)
* test: incorrect test result when filtering pk with multiple columns

* fix: prune non first tag correctly

Distinguish no column and no stats and only use default value when no
column

* test: update test result

* refactor: rename test file

* test: add test for null filter

* fix: use StatValues for null counts

* test: drop table

* test: fix unstable flow test
2025-04-28 04:53:30 +00:00
Lei, HUANG
1a517ec8ac fix: check if memtable is empty by stats (#5989)
fix/checking-memtable-empty-and-stats:
 - **Refactor timestamp updates**: Simplified timestamp range updates in `PartitionTreeMemtable` and `TimeSeriesMemtable` by replacing `update_timestamp_range` with `fetch_max` and `fetch_min` methods for `max_timestamp` and `min_timestamp`.
   - Affected files: `partition_tree.rs`, `time_series.rs`

 - **Remove unused code**: Deleted the `update_timestamp_range` method from `WriteMetrics` and removed unnecessary imports.
   - Affected file: `stats.rs`

 - **Optimize memtable filtering**: Streamlined the check for empty memtables in `ScanRegion` by directly using `time_range`.
   - Affected file: `scan_region.rs`
2025-04-28 01:57:17 +00:00
shuiyisong
3c943be189 chore: update rust toolchain (#5818)
* chore: update nightly version

* chore: sort lint lines

* chore: minor fix

* chore: update nix

* chore: update toolchain to 2024-04-14

* chore: update toolchain to 2024-04-15

* chore: remove unnecessory test

* chore: do not assert oid in sqlness test

* chore: fix margin issue

* chore: fix cr issues

* chore: fix cr issues

---------

Co-authored-by: Ning Sun <sunning@greptime.com>
2025-04-27 09:02:36 +00:00
Weny Xu
55cadcd2c0 feat: introduce flush metadata region task for metric engine (#5951)
* feat: introduce flush metadata region task for metric engine

* docs: generate config.md

* chore: add header

* test: fix unit test

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: remove docs

* fix: fix unit tests
2025-04-23 04:51:22 +00:00
Yingwen
56f319a707 fix: filter doesn't consider default values after schema change (#5912)
* test: sqlness test case

* feat: use correct default while pruning row groups

* fix: consider default in SimpleFilterContext

* test: update sqlness test

* test: add order by
2025-04-21 06:32:26 +00:00
Yuhan Wang
41814bb49f feat: introduce high_watermark for remote wal logstore (#5877)
* feat: introduce high_watermark_since_flush

* test: add unit test for high watermark

* refactor: submit a request instead

* fix: send reply before submit request

* fix: no need to update twice

* feat: update high watermark in background periodically

* test: update unit tests

* fix: update high watermark periodically

* test: update unit tests

* chore: apply review comments

* chore: rename

* chore: apply review comments

* chore: clean up

* chore: apply review comments
2025-04-18 12:10:47 +00:00
Weny Xu
b8c6f1c8ed feat: sync region followers after altering regions (#5901)
* feat: close follower regions after dropping leader regions

* chore: upgrade greptime-proto

* feat: sync region followers after alter region operations

* test: add tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-04-18 10:21:35 +00:00
Lei, HUANG
799c7cbfa9 feat(mito): bulk insert request handling on datanode (#5831)
* wip: implement basic request handling

* feat/bulk-insert:
 ### Add Error Handling and Enhance Bulk Insert Functionality

 - **Error Handling**: Introduced a new error variant `ConvertDataType` in `error.rs` to handle conversion failures from `ConcreteDataType` to `ColumnDataType`.
 - **Bulk Insert Enhancements**:
   - Updated `WorkerRequest::BulkInserts` in `request.rs` to include metadata and sender.
   - Implemented `handle_bulk_inserts` in `worker.rs` to process bulk insert requests with region metadata.
   - Added functions `region_metadata_to_column_schema` and `record_batch_to_rows` in `handle_bulk_insert.rs` for schema conversion and row processing.
 - **API Changes**: Modified `RegionBulkInsertsRequest` in `region_request.rs` to include `region_id`.

 Files affected: `error.rs`, `request.rs`, `worker.rs`, `handle_bulk_insert.rs`, `region_request.rs`.

* feat/bulk-insert:
 **Enhance Error Handling and Add Unit Tests**

 - Improved error handling in `record_batch_to_rows` function within `handle_bulk_insert.rs` by returning `Result` and handling errors with `context`.
 - Added unit tests for `region_metadata_to_column_schema` and `record_batch_to_rows` functions in `handle_bulk_insert.rs` to ensure correct functionality and error handling.

* chore: update proto version

* feat/bulk-insert:
 - **Refactor Error Handling**: Updated error handling in `error.rs` by modifying the `ConvertDataType` error handling.
 - **Improve Logging and Error Reporting**: Enhanced logging and error reporting in `worker.rs` by adding error messages for missing region metadata.
 - **Add New Error Type**: Introduced `DecodeArrowIpc` error in `metadata.rs` to handle Arrow IPC decoding failures.
 - **Handle Arrow IPC Decoding**: Updated `region_request.rs` to handle Arrow IPC decoding errors using the new `DecodeArrowIpc` error type.

* chore: update proto version

* feat/bulk-insert:
 Refactor `handle_bulk_insert.rs` to simplify row construction

 - Removed the mutable `current_row` vector and refactored `row_at` function to return a new vector directly.
 - Updated `record_batch_to_rows` to utilize the refactored `row_at` function for constructing rows.

* feat/bulk-insert:
 ### Commit Summary

 **Enhancements in Region Server Request Handling**

 - Updated `region_server.rs` to include `RegionRequest::BulkInserts(_)` in the `RegionChange::Ingest` category, improving the handling of bulk insert operations.
 - Refined the categorization of region requests to ensure accurate mapping to `RegionChange` actions.
2025-04-15 14:11:50 +00:00
Lei, HUANG
6a50d71920 fix: memtable panic (#5894)
* fix: memtable panic

* fix: ci
2025-04-14 13:15:56 +00:00
Weny Xu
c522893552 fix: ensure logical regions are synced during region sync (#5878)
* fix: ensure logical regions are synced during region sync

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-04-14 12:37:31 +00:00
Zhenchi
e3675494b4 feat: apply terms with fulltext bloom backend (#5884)
* feat: apply terms with fulltext bloom backend

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* perf: preload jieba

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* polish doc

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-04-14 07:08:59 +00:00