greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-05-20 23:10:37 +00:00

Author	SHA1	Message	Date
Zhenchi	599f289f59	feat: add `granularity` and `false_positive_rate` options for indexes (#6416 ) * feat: add `granularity` and `false_positive_rate` options for indexes Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * upgrade proto Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-07-02 07:33:39 +00:00
LFC	385f12a62e	refactor: extract the common method for errors into tonic status (#6437 ) Signed-off-by: luofucong <luofc@foxmail.com>	2025-07-02 02:57:30 +00:00
LFC	a203909de3	feat: extension range definition (#6386 ) * feat: defined extension range Signed-off-by: luofucong <luofc@foxmail.com> * remove feature parameters Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> --------- Signed-off-by: luofucong <luofc@foxmail.com>	2025-06-30 02:42:40 +00:00
Zhenchi	ff559b2688	fix: complete partial index search results in cache (#6403 ) * fix: complete partial index search results in cache Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * polish Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * add initial tests Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * cover issue case Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * TestEnv new -> async Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-06-27 07:40:14 +00:00
Lei, HUANG	1d07864b29	refactor(object-store): move backends building functions back to object-store (#6400 ) refactor/building-backend-in-object-store: ### Refactor Object Store Configuration - Centralize Object Store Configurations: Moved object store configurations (`FileConfig`, `S3Config`, `OssConfig`, `AzblobConfig`, `GcsConfig`) to `object-store/src/config.rs`. - Error Handling Enhancements: Introduced `object-store/src/error.rs` for improved error handling related to object store operations. - Factory Pattern for Object Store: Implemented `object-store/src/factory.rs` to create object store instances, consolidating logic from `datanode/src/store.rs`. - Remove Redundant Store Implementations: Deleted individual store files (`azblob.rs`, `fs.rs`, `gcs.rs`, `oss.rs`, `s3.rs`) from `datanode/src/store/`. - Update Usage of Object Store Config: Updated references to `ObjectStoreConfig` in `datanode.rs`, `standalone.rs`, `config.rs`, and `error.rs` to use the new centralized configuration. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-06-25 13:49:55 +00:00
Ruihang Xia	7a9444c85b	refactor: remove staled manifest structures (#6382 ) * refactor: remove staled manifest structures Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * Update src/store-api/src/lib.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-24 09:23:10 +00:00
LFC	bb12be3310	refactor: scan `Batch`es directly (#6369 ) * refactor: scan `Batch`es directly Signed-off-by: luofucong <luofc@foxmail.com> * fix ci Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> --------- Signed-off-by: luofucong <luofc@foxmail.com>	2025-06-24 07:55:49 +00:00
LFC	e072726ea8	refactor: make scanner creation async (#6349 ) * refactor: make scanner creation async Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> --------- Signed-off-by: luofucong <luofc@foxmail.com>	2025-06-20 06:44:49 +00:00
Lei, HUANG	6ece560f8c	fix: reordered write cause incorrect kv (#6345 ) * fix/reordered-write-cause-incorrect-kv: - Enhance Testing in `partition_tree.rs`: Added comprehensive test functions such as `kv_region_metadata`, `key_values`, and `collect_kvs` to improve the robustness of key-value operations and ensure correct behavior of the `PartitionTreeMemtable`. - Improve Key Handling in `dict.rs`: Modified `KeyDictBuilder` to handle both full and sparse keys, ensuring correct mapping and insertion. Added a new test `test_builder_finish_with_sparse_key` to validate the handling of sparse keys. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix/reordered-write-cause-incorrect-kv: ### Refactor `partition_tree.rs` for Improved Key Handling - Refactored Key Handling: Simplified the `key_values` function to accept an iterator of keys, removing hardcoded key-value pairs. This change enhances flexibility and reduces redundancy in key management. - Updated Test Cases: Modified test cases to use the new `key_values` function signature, ensuring they iterate over keys dynamically rather than relying on predefined lists. Files affected: - `src/mito2/src/memtable/partition_tree.rs` Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix/reordered-write-cause-incorrect-kv: Enhance Testing in `partition_tree.rs` - Added assertions to verify key-value collection after `memtable` and `forked` operations. - Refactored key-value writing logic for clarity in `forked` operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-06-19 06:32:40 +00:00
Lei, HUANG	086ae9cdcd	chore: print series count after wal replay (#6344 ) * chore/print-series-count-after-wal-replay: ### Add Series Count Functionality and Logging Enhancements - `time_partition.rs`: Introduced `series_count` method to calculate the total timeseries count across all time partitions. - `opener.rs`: Enhanced logging to include the total timeseries replayed during WAL replay. - `version.rs`: Added `series_count` method to `VersionControlData` for approximating timeseries count in the current version. - `handler.rs`: Added entry and exit logging for the `sql` function to trace execution flow. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/print-series-count-after-wal-replay: ### Remove Unused Import - File Modified: `src/servers/src/http/handler.rs` - Change Summary: Removed the unused `info` import from `common_telemetry`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-06-18 12:04:39 +00:00
Lei, HUANG	a59b6c36d2	chore: add metrics for active series and field builders (#6332 ) * chore/series-metrics: ### Add Metrics for Active Series and Values in Memtable - `simple_bulk_memtable.rs`: Implemented `Drop` trait for `SimpleBulkMemtable` to decrement `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` upon dropping. - `time_series.rs`: - Introduced `SeriesMap` with `Drop` implementation to manage active series and values count. - Updated `SeriesSet` and `Iter` to use `SeriesMap`. - Added `num_values` method in `Series` to calculate the number of values. - `metrics.rs`: Added `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` metrics to track active series and values in `TimeSeriesMemtable`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/series-metrics: - Add metrics for active series and field builders - Update dashboard Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/series-metrics: Add Series Count Tracking in Memtables - `flush.rs`: Updated `RegionFlushTask` to track and log the series count during memtable flush operations. - `memtable.rs`: Introduced `series_count` in `MemtableStats` and added a method to retrieve it. - `partition_tree.rs`, `partition.rs`, `tree.rs`: Implemented series count calculation in `PartitionTreeMemtable` and its components. - `simple_bulk_memtable.rs`, `time_series.rs`: Integrated series count tracking in `SimpleBulkMemtable` and `TimeSeriesMemtable` implementations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * Update src/mito2/src/memtable.rs Co-authored-by: Yingwen <realevenyag@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> Co-authored-by: Yingwen <realevenyag@gmail.com>	2025-06-18 09:16:45 +00:00
Lei, HUANG	0d0236ddab	fix: revert string builder initial capacity in `TimeSeriesMemtable` (#6330 ) fix/revert-string-builder-initial-capacity: ### Update `time_series.rs` Memory Allocation - Reduced StringBuilder Capacity: Adjusted the initial capacity of `StringBuilder` in `ValueBuilder` from `(256, 4096)` to `(4, 8)` to optimize memory usage in `time_series.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> Co-authored-by: Ruihang Xia <waynestxia@gmail.com>	2025-06-17 13:24:52 +00:00
Weny Xu	10bf9b11f6	fix: handle corner case in catchup where compacted entry id exceeds region last entry id (#6312 ) * fix(mito2): handle corner case in catchup where compacted entry id exceeds region last entry id Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-06-16 06:36:31 +00:00
Yingwen	eaf1e1198f	refactor: Extract mito codec part into a new crate (#6307 ) * chore: add a new crate mito-codec Signed-off-by: evenyag <realevenyag@gmail.com> * feat: port necessary mods for primary key codec Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: use codec utils in mito-codec Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove unused mods Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove Partition::is_partition_column() Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove duplicated test utils Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove unused comment Signed-off-by: evenyag <realevenyag@gmail.com> * fix: fix is_partition_column check Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-06-13 07:14:29 +00:00
Ruihang Xia	7468a8ab2a	feat: organize EXPLAIN ANALYZE VERBOSE's output in JSON format (#6308 ) Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-06-12 09:55:53 +00:00
Lei, HUANG	5bb0466ff2	feat: introduce file group in compaction (#6261 ) * fix/file-group-in-compaction: ### Enhance Compaction Logic with File Grouping - `run.rs`: Introduced `FileGroup` struct to manage groups of `FileHandle` objects, allowing for more efficient compaction operations. Updated `Ranged` and `Item` trait implementations to work with `FileGroup`. - `test_util.rs`: Added `new_file_handle_with_sequence` function to support file handles with sequence numbers, enhancing test utilities. - `twcs.rs`: Modified `TwcsPicker` to utilize `FileGroup` for managing files within windows, improving compaction logic. Updated `Window` struct to use `HashMap` for storing `FileGroup` objects. - `version_util.rs`: Updated version control utilities to handle sequence numbers in file metadata, aligning with new compaction logic. Signed-off-by: Lei, HUANG <lhuang@greptime.com> * fix/file-group-in-compaction: ### Add Test for File Group Assignment in TWCS - Enhancements in `twcs.rs`: - Added a new test `test_assign_file_groups_to_windows` to verify the correct assignment of file groups to windows. - Enhanced `test_assign_compacting_to_windows` with a new case to ensure files with overlapping time ranges and the same sequence are treated as one `FileGroup`. Signed-off-by: Lei, HUANG <lhuang@greptime.com> * fix/file-group-in-compaction: Enhance Compaction Task Documentation and Initialization - `run.rs`: Added documentation for `FileGroup` to clarify its role in representing a group of files created by the same compaction task. - `twcs.rs`: Introduced comments in the `Window` struct to explain the mapping of file sequences to file groups, indicating files created from the same compaction task. Simplified the initialization of the `files` hashmap using `HashMap::from`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <lhuang@greptime.com> Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-06-12 09:33:40 +00:00
Zhenchi	c26138963e	refactor: unify function registry (Part 1) (#6262 ) * refactor: unify function registry (Part 1) Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * refactor: simplify via register_scalar Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-06-10 10:11:06 +00:00
Lei, HUANG	69870e2762	fix(mito): use 1day as default time partition duration (#6202 ) * fix unit tests * fix: sqlness * fix/default-time-window: ## Add Helper Functions and Enhance Compaction Tests - Refactor Compaction Logic: Introduced helper functions `flush` and `compact` in `compaction_test.rs` to streamline compaction operations. - Enhance Compaction Tests: Added a new test `test_infer_compaction_time_window` in `compaction_test.rs` to verify compaction time window inference. - Testing Improvements: Added `#[cfg(test)]` attribute to `new_multi_partitions` in `time_partition.rs` to ensure it's only included in test builds. * fix/default-time-window: - Refactor `TimePartition` Struct: Removed unnecessary comments regarding `time_range` in `time_partition.rs`. - Enhance `TimePartitions` Functionality: Added a method `part_duration_or_default` to provide a default partition duration in `time_partition.rs`. - Update SQL Test Cases: Modified SQL operations and expected results in `scan_big_varchar.result` and `scan_big_varchar.sql` to reflect changes in data manipulation logic. * fix/default-time-window: ### Update Time Partition Default Duration - Refactor Default Duration: Introduced `INITIAL_TIME_WINDOW` constant to define the default time window duration as `Duration::from_days(1)`. This change replaces multiple instances of the hardcoded default duration across the `time_partition.rs` file. - Files Affected: `time_partition.rs` * fix/default-time-window: ## Update Partition Duration Handling - `time_partition.rs`: Refactored `part_duration` to be non-optional, removing `Option` wrapper. Updated logic to use `unwrap_or` with `INITIAL_TIME_WINDOW` where necessary. Adjusted related methods and tests to accommodate this change. - `version.rs` (memtable and region): Updated handling of `part_duration` to align with changes in `time_partition.rs`, ensuring consistent use of non-optional `Duration`. 
* fix/default-time-window: ### Improve Error Context in `time_partition.rs` - Enhanced error context message in `time_partition.rs` to provide clearer information on partition time range issues, including bucket size details. Signed-off-by: Lei, HUANG <lhuang@greptime.com> --------- Signed-off-by: Lei, HUANG <lhuang@greptime.com>	2025-06-08 16:20:26 +00:00
Weny Xu	80c5af0ecf	fix: ignore incomplete WAL entries during read (#6251 ) * fix: ignore incomplete entry * fix: fix unit tests	2025-06-04 11:16:42 +00:00
Lei, HUANG	fdd164c0fa	fix(mito): revert initial builder capacity for TimeSeriesMemtable (#6231 ) * fix/initial-builder-cap: ### Enhance Series Initialization and Capacity Management - `simple_bulk_memtable.rs`: Updated the `Series` initialization to use `with_capacity` with a specified capacity of 8192, improving memory management. - `time_series.rs`: Introduced `with_capacity` method in `Series` to allow custom initial capacity for `ValueBuilder`. Adjusted `INITIAL_BUILDER_CAPACITY` to 16 for more efficient memory usage. Added a new `new` method to maintain backward compatibility. * fix/initial-builder-cap: ### Adjust Memory Allocation in Memtable - `simple_bulk_memtable.rs`: Reduced the initial capacity of `Series` from 8192 to 1024 to optimize memory usage. - `time_series.rs`: Decreased `INITIAL_BUILDER_CAPACITY` from 16 to 4 to improve efficiency in vector building.	2025-06-03 08:25:02 +00:00
Zhenchi	078afb2bd6	feat: bloom filter index applier support or eq chain (#6227 ) * feat: bloom filter index applier support or eq chain Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-06-03 08:08:19 +00:00
Lei, HUANG	4e615e8906	feat(wal): support bulk wal entries (#6178 ) * feat/bulk-wal: ### Refactor: Simplify Data Handling in LogStore Implementations - `kafka/log_store.rs`, `raft_engine/log_store.rs`, `wal.rs`, `raw_entry_reader.rs`, `logstore.rs`: - Refactored `entry` and `build_entry` functions to accept `Vec<u8>` directly instead of `&mut Vec<u8>`. - Removed usage of `std::mem::take` for data handling, simplifying the code and improving readability. - Updated test cases to align with the new function signatures. * feat/bulk-wal: ### Add Support for Bulk WAL Entries and Flight Data Encoding - Add `raw_data` field to `BulkPart` and related structs: Updated `BulkPart` and related structures in `src/mito2/src/memtable/bulk/part.rs`, `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/region_write_ctx.rs`, `src/mito2/src/worker/handle_bulk_insert.rs`, and `src/store-api/src/region_request.rs` to include a new `raw_data` field for handling Arrow IPC data. - Implement Flight Data Encoding: Added a new module `flight` in `src/common/test-util/src/flight.rs` to encode record batches to Flight data format. - Update `greptime-proto` dependency: Changed the revision of the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml`. - Enhance WAL Writer and Tests: Modified `src/mito2/src/wal.rs` and related test files to support bulk WAL entries and added tests for encoding and handling bulk data. * feat/bulk-wal: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`. - Add `common-grpc` Dependency: Added `common-grpc` as a dependency in `Cargo.lock` and `src/mito2/Cargo.toml`. - Refactor `BulkPart` Structure: Removed `num_rows` field and added `num_rows()` method in `src/mito2/src/memtable/bulk/part.rs`. 
Updated related usages in `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/memtable/time_series.rs`, `src/mito2/src/region_write_ctx.rs`, and `src/mito2/src/worker/handle_bulk_insert.rs`. - Implement `TryFrom` and `From` for `BulkWalEntry`: Added implementations for converting between `BulkPart` and `BulkWalEntry` in `src/mito2/src/memtable/bulk/part.rs`. - Handle Bulk Entries in Region Opener: Added logic to process bulk entries in `src/mito2/src/region/opener.rs`. - Fix `BulkInsertRequest` Handling: Corrected `region_id` handling in `src/operator/src/bulk_insert.rs` and `src/store-api/src/region_request.rs`. - Add Error Variant for `ConvertBulkWalEntry`: Added a new error variant in `src/mito2/src/error.rs` for handling bulk WAL entry conversion errors. * fix: ci * feat/bulk-wal: Add bulk write operation in `opener.rs` - Enhanced the region write context by adding a call to `write_bulk()` after `write_memtable()` in `opener.rs`. - This change aims to improve the efficiency of writing operations by enabling bulk writes. * feat/bulk-wal: Enhance error handling and metrics in `bulk_insert.rs` - Updated `Inserter` to improve error handling by capturing the result of `datanode.handle(request)` and incrementing the `DIST_INGEST_ROW_COUNT` metric with the number of affected rows. * feat/bulk-wal: ### Remove Encode Error Handling for WAL Entries - `error.rs`: Removed the `EncodeWal` error variant and its associated handling. - `wal.rs`: Eliminated the `entry_encode_buf` buffer and its usage for encoding WAL entries. Replaced with direct encoding to a vector using `encode_to_vec()`.	2025-05-29 09:10:30 +00:00
Lei, HUANG	4b71e493f7	feat!: revise compaction picker (#6121 ) * - Refactor `RegionFilePathFactory` to `RegionFilePathProvider`: Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name. - Add `max_file_size` support in compaction: Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`. - Enhance Parquet writing logic: Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints. Refactor Parquet Writer Initialization and File Handling - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management. - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability. - Enhanced error handling and logging with `debug` statements for better traceability during file operations. - Removed Output Size Enforcement in `twcs.rs`: - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling. - Added Max File Size Option in `parquet.rs`: - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files. - Refactored Indexer Management in `parquet/writer.rs`: - Changed `current_indexer` from an `Option` to a direct `Indexer` type. - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`. - Simplified indexer initialization and management logic. - Refactored SST File Handling: - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths. 
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management. - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths. - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`. - Enhanced Indexer Management: - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation. - Updated `ParquetWriter` to handle multiple indexers and file IDs. - Files affected: `index.rs`, `parquet.rs`, `writer.rs`. - Removed Redundant File ID Handling: - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`. - Updated related logic to dynamically generate file IDs where necessary. - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`. - Test Adjustments: - Updated tests to align with new path and indexer management. - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes. - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`. * chore: rebase main * feat/multiple-compaction-output: ### Add Benchmarking and Refactor Compaction Logic - Benchmarking: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions. - Compaction Module Enhancements: - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public. - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions. - Introduced `find_overlapping_items` for identifying overlapping items. - Code Cleanup: Removed redundant code and tests related to `MergeItems` in `run.rs`. 
* feat/multiple-compaction-output: ### Enhance Compaction Logic and Add Benchmarks - Compaction Logic Improvements: - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty. - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency. - Benchmark Enhancements: - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function. - Extended existing benchmarks to include larger data sizes. - Testing Enhancements: - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`. - Logging and Debugging: - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions. * feat/multiple-compaction-output: ### Refactor and Enhance Compaction Logic - Refactor `find_overlapping_items` Function: Changed the function signature to accept slices instead of mutable vectors in `run.rs`. - Rename and Update Struct Fields: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`. - Enhance `reduce_runs` Function: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`. - Add `merge_seq_files` Function: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files. - Modify `TwcsPicker` Logic: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`. - Remove `enforce_file_num` Function: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Testing - Add `merge_seq_files` Functionality: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. 
Updated benchmarks in `run_bench.rs` to include `bench_merge_seq_files`. - Improve Compaction Strategy in `twcs.rs`: Modified the compaction logic to handle file merging more effectively, considering file size and overlap. - Update Tests: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies. - Remove Unused Function: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed. * feat/multiple-compaction-output: ### Refactor TWCS Compaction Options - Refactor Compaction Logic: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`. - Update Tests: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`. - Modify Engine Options: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`. - Fuzz Testing: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files. This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase. * chore: add trailing space * fix license header * feat/revise-compaction-picker: Limit File Processing and Optimize Merge Logic in `run.rs` - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity. - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files. - Updated scoring calculations to use the limited file set, ensuring efficient file merging. 
* feat/revise-compaction-picker: ### Add Compaction Metrics and Remove Debug Logging - Compaction Metrics: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process. - Logging Cleanup: Removed debug logging of file ranges during the merge process in `twcs.rs`. * feat/revise-compaction-picker: ## Enhance Compaction Logic and Metrics - Compaction Logic Improvements: - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations. - Updated `Compactor` implementation to use these methods for metrics tracking. - Modified `Ranged` trait logic in `run.rs` to improve range comparison. - Enhanced test cases in `run.rs` to reflect changes in compaction logic. - Metrics Enhancements: - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking. - Debugging and Logging: - Added detailed logging for compaction pick results in `twcs.rs`. - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output. - Testing Enhancements: - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files. - Updated `merge_mode_test.rs` to reflect changes in file handling during scans. * feat/revise-compaction-picker: ### Update `FileHandle` Debug Implementation - Refactor Debug Output: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`. - Atomic Operations: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`. 
* Trigger CI * feat/revise-compaction-picker: Update compaction logic and default options - `twcs.rs`: Enhanced logging for compaction pick results by improving the formatting for better readability. - `options.rs`: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance. * feat/revise-compaction-picker: Refactor `find_overlapping_items` to use an external result vector - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency. - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature. - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach. * feat/revise-compaction-picker: Improve file merging logic in `run.rs` - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups. - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection. - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences. * feat/revise-compaction-picker: Enhance `find_overlapping_items` with `SortedRun` and Update Tests - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity. - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations. - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`. - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors. * feat/revise-compaction-picker: Enhancements in `compaction/run.rs`: - Added `ReadableSize` import to handle size calculations. 
- Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided. * feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.	2025-05-23 03:29:08 +00:00
Lei, HUANG	5a0da5b6bb	fix: region worker stall metrics (#6149 ) fix/stall-metrics: Improve stalled request handling in `handle_write.rs` - Updated logic to account for both `write_requests` and `bulk_requests` when adjusting `stalled_count`. - Modified `reject_region_stalled_requests` and `handle_region_stalled_requests` to correctly subtract the combined length of `requests` and `bulk` from `stalled_count`.	2025-05-21 13:21:50 +00:00
Lei, HUANG	eaf7b4b9dd	chore: update flush failure metric name and update grafana dashboard (#6138 ) * 1. rename `greptime_mito_flush_errors_total` metric to `greptime_mito_flush_errors_total` for consistency 2. update grafana dashboard to add following panel: - compaction input/output bytes - bulk insert handle elapsed time in frontend and region worker	2025-05-20 12:05:54 +00:00
Zhenchi	400229c384	feat: introduce index result cache (#6110 ) * feat: introduce index result cache Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * Update src/mito2/src/sst/index/inverted_index/applier/builder.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * optimize selector_len Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * address comments Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-20 01:45:42 +00:00
Weny Xu	864cc117b3	fix: append noop entry when auto topic creation is disabled (#6092 ) * feat: improve topic management and add stale records cleanup * fix: fix unit tests * chore: apply suggestions from CR * chore: apply suggestions from CR	2025-05-16 11:26:47 +00:00
Yingwen	0ea9ab385d	fix: clean files under the atomic write dir on failure (#6112 ) * fix: remove files under atomic dir on failure * fix: clean atomic dir on download failure * chore: update comment * fix: clean if failed to write without write cache * feat: add a TempFileCleaner to clean files on failure * chore: after merge fix * chore: more fix --------- Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com> Co-authored-by: discord9 <discord9@163.com>	2025-05-16 11:18:11 +00:00
Yingwen	c7e9485534	feat: New scanner `SeriesScan` to scan by series for querying metrics (#5968 ) * chore: basic methods for SeriesScan * chore: add to scanner enum * feat: implement scan logic of each partition * feat: use series scan when distribution is PerSeries * refactor: remove per series scan from SeqScan * fix: use series scan in PerSeries distribution * feat: keep parallelize_scan unchanged * fix: address compiler errors * fix: include build merge reader cost to scan cost * feat: use smallvec * chore: update comment * Revert "feat: keep parallelize_scan unchanged" This reverts commit `96ba00d175`. * assign partition_ranges Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * feat: try send before send reduce the send timeout to 10ms * chore: add comments * fix: add metrics to partition metrics list * fix: correct scan cost metrics * chore: reset instant * fix: scanner metrics init * chore: display more info in explain * feat: metrics for send series timeout * style: fix clippy * refactor: use ChainedRecordBatchStream to simplify codes * chore: fix typos * feat: separate distributor metrics * feat: remove parallelize hack * chore: fix warning * test: add test for series scan * test: update sqlness test --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: Ruihang Xia <waynestxia@gmail.com>	2025-05-16 08:53:24 +00:00
Ruihang Xia	57b53211d9	feat: don't hide atomic write dir (#6109 ) * feat: don't hide atomic write dir Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * compatible code Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * Update src/mito2/src/access_layer.rs Co-authored-by: Yingwen <realevenyag@gmail.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com> Co-authored-by: Yingwen <realevenyag@gmail.com>	2025-05-16 06:21:13 +00:00
Lei, HUANG	5a9023d6b3	feat(bulk): write to multiple time partitions (#6086 ) * add benchmark for splitting according to time partition * feat/write-to-multiple-time-partitions: Enhancements to Bulk Processing and Time Partitioning - `part.rs`: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns. - `simple_bulk_memtable.rs`: Updated tests to include `timestamp_index` initialization. - `time_partition.rs`: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`. - `handle_bulk_insert.rs`: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`. * feat/write-to-multiple-time-partitions: ### Enhance Time Partitioning Logic - `time_partition.rs`: - Introduced `HashSet` for efficient partition management. - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions. - Updated `get_or_create_time_partition` to manage partition creation. - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges. - Tests: - Added `test_find_partitions_by_time_range` to validate new partitioning logic. - Updated `test_split_record_batch` to ensure correct record batch splitting behavior. * feat/write-to-multiple-time-partitions: ### Enhance Time Partitioning and Testing in `time_partition.rs` - Time Partitioning Enhancements: - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`. - Improved filtering logic for timestamp arrays to support various time units. 
- Testing Enhancements: - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`. - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning. - Imports and Dependencies: - Added necessary imports for new timestamp array types and testing utilities. * feat/write-to-multiple-time-partitions: ### Refactor and Enhance Time Partition Filtering - Refactor Filtering Logic: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability. - Enhance `BulkPart` Struct: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`. - Rename Function: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`. - Add Feature Flag: Introduced `int_roundings` feature in `lib.rs` to support new functionality. * refactor tests * feat/write-to-multiple-time-partitions: Improve timestamp handling in `time_partition.rs` - Enhanced safety comments for timestamp conversion to ensure clarity. - Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations. - Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`. * feat/write-to-multiple-time-partitions: Refactor timestamp handling and add utility function - Refactor `time_partition.rs`: Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion. - Enhance `metadata.rs`: Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code. 
* feat/write-to-multiple-time-partitions: Refactor time partition variable names in `time_partition.rs` - Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`. - Updated related logic to use new variable names for improved readability and maintainability. * feat/write-to-multiple-time-partitions: Refactor variable names in `time_partition.rs` - Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency. - Modified function calls and loop iterations to align with the new variable names. - Affected file: `src/mito2/src/memtable/time_partition.rs` * feat/write-to-multiple-time-partitions: ### Refactor variable names in `time_partition.rs` - Updated variable names for clarity in `time_partition.rs`: - Renamed `matchings` to `matching_parts` - Renamed `missings` to `missing_parts` - Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`. * feat/write-to-multiple-time-partitions: ### Enhance Time Partition Handling - `time_partition.rs`: - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks. - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`). - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling. --------- Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>	2025-05-14 05:09:59 +00:00
Ruihang Xia	bbb6f8685e	feat: implement commutativity rule for prom-related plans (#5875 ) * feat: implement commutativity rule for prom-related plans Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * fix range manipulate deserializer Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * blocklist in commutativity rule Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * change dictionary type Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * handle partition and ordering Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * fix clippy Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update tests Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * add rate, increase and delta Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update sqlness result Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * regexp_replace uses empty string instead of null value Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update sqlness result Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update sqlness result Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update sqlness result again Signed-off-by: Ruihang Xia <waynestxia@gmail.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-05-13 09:06:25 +00:00
Yingwen	ca1641d1c4	feat: implement PlainBatch struct (#6079 ) * feat: implement PlainBatch struct * chore: typo * style: fix clippy * feat: assert num columns	2025-05-13 05:56:12 +00:00
Zhenchi	36d9346ffc	refactor: introduce row group selection (#6075 ) Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-05-12 07:15:17 +00:00
discord9	6ab0f0cc5c	fix: alter table modify type should also modify default value (#6049 ) * fix: select after alter * fix: insert a proper row&catch a bug * fix: alter table modify type modify default value type too * refactor: per review * chore: per review * refactor: per review * refactor: per review	2025-05-09 03:40:59 +00:00
Lei, HUANG	8685ceb232	feat: impl bulk memtable and bridge bulk inserts (#6054 ) * feat/bridge-bulk-insert: ## Implement Bulk Insert and Update Dependencies - Bulk Insert Implementation: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`. - Dependency Updates: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`. - gRPC Enhancements: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`. - Error Handling: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data. - Miscellaneous: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields. * feat/bridge-bulk-insert: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`. - Refactor gRPC Query Handling: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling. - Enhance Bulk Insert Logic: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity. - Add `common-grpc` Dependency: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities. * fix: clippy * fix schema serialization * feat/bridge-bulk-insert: Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs` - Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`. 
- Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`. - Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations. * fix: test * refactor: rename * allow empty app_metadata in FlightData * feat/bridge-bulk-insert: - Remove Logging: Removed unnecessary logging of affected rows in `region_server.rs`. - Error Handling Enhancement: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path. - Error Enum Cleanup: Removed unused `Arrow` error variant from `error.rs`. * fix: standalone test * feat/bridge-bulk-insert: ### Enhance Bulk Insert Handling and Metadata Management - `lib.rs`: Enabled the `result_flattening` feature for improved error handling. - `request.rs`: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility. - `handle_bulk_insert.rs`: - Added `handle_record_batch` function to streamline processing of bulk insert payloads. - Improved error handling and task management for bulk insert operations. - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access. * feat/bridge-bulk-insert: - Refactor `handle_bulk_insert.rs`: - Replaced `handle_record_batch` with `handle_payload` for handling payloads. - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution. - Optimize `multi_dim.rs`: - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`. * feat/bridge-bulk-insert: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`. - Optimize Memory Allocation: Increased initial and builder capacities in `time_series.rs` to improve performance. 
- Enhance Data Handling: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling. - Improve Bulk Insert Logic: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering. - String Handling Improvement: Updated string conversion in `helper.rs` for better performance. * fix: clippy warnings * feat/bridge-bulk-insert: Add Metrics and Improve Error Handling - Metrics Enhancements: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to monitor performance. - Error Handling Improvements: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns. - Dependency Updates: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support. - Code Refactoring: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability. * chore: rebase main * implement simple bulk memtable * impl write_bulk * implement simple bulk memtable * feat/simple-bulk-memtable: ### Enhance Time-Series Memtable and Bulk Insert Handling - Visibility Modifications: Made `mutable_array` in `PrimitiveVectorBuilder` and `StringVectorBuilder` public in `primitive.rs` and `string.rs`. - New Module: Added `builder.rs` to `memtable` for time-series builders, including `FieldBuilder` and `StringBuilder` implementations. - Bulk Insert Enhancements: - Added `sequence` field to `BulkPart` in `part.rs` and updated its handling in `simple_bulk_memtable.rs` and `region_write_ctx.rs`. - Introduced metrics for bulk insert operations in `metrics.rs` and `bulk_insert.rs`. - Performance Metrics: Added timing metrics for write operations in `metrics.rs`, `region_write_ctx.rs`, and `handle_write.rs`. 
- Region Request Handling: Updated `make_region_bulk_inserts` in `region_request.rs` to include performance metrics. * feat/simple-bulk-memtable: Improve Memtable Stats Calculation and Add Metrics Timer - `simple_bulk_memtable.rs`: Refactored `stats` method to use `num_rows` for checking if rows have been written, improving accuracy in memory table statistics. - `handle_bulk_insert.rs`: Introduced a metrics timer to measure the elapsed time for processing bulk requests, enhancing performance monitoring. * feat/simple-bulk-memtable: ### Commit Message Enhancements and Bug Fixes - Dependency Update: Updated `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`. - Feature Addition: Implemented `to_mutation` method in `BulkPart` to convert `BulkPart` to `Mutation` for fallback `write_bulk` implementation in `src/mito2/src/memtable/bulk/part.rs`. - Functionality Improvement: Modified `write_bulk` method in `TimeSeriesMemtable` to support default implementation fallback to row iteration in `src/mito2/src/memtable/time_series.rs`. - Performance Optimization: Enhanced `bulk_insert` handling by optimizing region request processing and data partitioning in `src/operator/src/bulk_insert.rs`. - Error Handling: Added `ComputeArrow` error variant for better error management in `src/operator/src/error.rs`. - Code Refactoring: Simplified region bulk insert request processing in `src/store-api/src/region_request.rs`. * fix: some clippy warnings * feat/simple-bulk-memtable: ### Commit Summary - Refactor Return Types to `Result`: Updated the return type of the `ranges` method in `memtable.rs`, `bulk.rs`, `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `memtable_util.rs` to return `Result<MemtableRanges>` for better error handling. - Enhance Metrics Tracking: Improved metrics tracking by adding `num_rows` and `max_sequence` to `WriteMetrics` in `stats.rs`. 
Updated related methods in `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `scan_region.rs` to utilize these metrics. - Remove Unused Imports: Cleaned up unused imports in `time_series.rs` to streamline the codebase. * merge main * remove useless error variant * use newer version of proto * feat/simple-bulk-memtable: Commit Message Summary Enhance FieldBuilder and StringBuilder functionality, add tests, and improve error handling. Key Changes • builder.rs: • Added documentation for FieldBuilder methods. • Renamed append_string_vector to append_vector in StringBuilder. • simple_bulk_memtable.rs: • Added new test cases for write_one, write_bulk, is_empty, stats, fork, and sequence_filter. • time_series.rs: • Improved error handling in ValueBuilder for type mismatches. • memtable_util.rs: • Removed unused imports and streamlined code. These changes enhance the robustness and test coverage of the memtable components. * feat/simple-bulk-memtable: Improve Time Partition Matching Logic in `time_partition.rs` - Enhanced the `write_bulk` method in `time_partition.rs` to improve the logic for matching partitions based on time ranges. - Introduced a new mechanism to filter and select partitions that overlap with the record batch's timestamp range before writing. * feat/simple-bulk-memtable: Improve Metrics Handling in `bulk_insert.rs` - Removed the `group_request_timer` and its associated metric observation to streamline the timing logic. - Moved the `BULK_REQUEST_ROWS` metric observation to occur after filtering, ensuring accurate row count metrics. * feat/simple-bulk-memtable: Enhance Stalled Requests Calculation and Update Metrics - `worker.rs`: Updated the `stalled_count` method to include both `reqs` and `bulk_reqs` in the calculation of stalled requests. - `bulk_insert.rs`: Removed duplicate observation of `BULK_REQUEST_MESSAGE_SIZE` metric. 
- `metrics.rs`: Changed the bucket strategy for `BULK_REQUEST_ROWS` from linear to exponential, improving the granularity of metrics collection. * feat/simple-bulk-memtable: Refactor `StringVector` Usage and Update Method Signatures - `src/datatypes/src/vectors/string.rs`: Changed `StringVector`'s `array` field from public to private. - `src/mito2/src/memtable/builder.rs`: Refactored `append_vector` method to `append_array`, updating its usage to work directly with `StringArray` instead of `StringVector`. - `src/mito2/src/memtable/time_series.rs`: Updated `ValueBuilder` to handle `StringArray` directly, replacing `StringVector` usage with `StringArray` in the `FieldBuilder::String` case. * feat/simple-bulk-memtable: - Refactor `PrimitiveVectorBuilder`: Made `mutable_array` private in `src/datatypes/src/vectors/primitive.rs`. - Optimize `ValueBuilder`: Replaced `UInt64VectorBuilder` and `UInt8VectorBuilder` with `Vec<u64>` and `Vec<u8>` for `sequence` and `op_type` in `src/mito2/src/memtable/time_series.rs`. - Improve Metrics Initialization: Updated histogram bucket initialization to use `exponential_buckets` in `src/mito2/src/metrics.rs`. * feat/simple-bulk-memtable: Improve error handling in `simple_bulk_memtable.rs` and `time_series.rs` - Enhanced error handling by using `OptionExt` for more concise error context management in `simple_bulk_memtable.rs` and `time_series.rs`. - Replaced `ok_or` with `with_context` to streamline error context creation in both files. * feat/simple-bulk-memtable: Enhance Time Partition Handling in `time_partition.rs` - Introduced `create_time_partition` function to streamline the creation of new time partitions, ensuring thread safety by acquiring a lock. - Modified logic to handle cases where no matching time partitions exist, creating new partitions as needed. - Updated `write_record_batch` and `write_one` methods to utilize the new partition creation logic, improving partition management and data writing efficiency. 
* replace proto * feat/simple-bulk-memtable: Update `metrics.rs` to adjust the range of exponential buckets for bulk insert message rows from `10 ~ 1_000_000` to `10 ~ 100_000`.	2025-05-09 02:56:09 +00:00
LFC	e787007eb5	feat: scan with sst minimal sequence (#6051 ) * feat: scan with sst minimal sequence * Update src/store-api/src/storage/requests.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * update proto --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-08 01:34:51 +00:00
Weny Xu	df31f0b9ec	fix: improve region migration error handling and optimize leader downgrade with lease check (#6026 ) * fix(meta): improve region migration error handling and lease management * chore: refine comments * chore: apply suggestions from CR * chore: apply suggestions from CR * feat: consume opening_region_guard	2025-05-07 00:54:35 +00:00
Lei, HUANG	f298a110f9	feat: bridge bulk insert (#5927 ) * feat/bridge-bulk-insert: ## Implement Bulk Insert and Update Dependencies - Bulk Insert Implementation: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`. - Dependency Updates: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`. - gRPC Enhancements: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`. - Error Handling: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data. - Miscellaneous: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields. * feat/bridge-bulk-insert: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`. - Refactor gRPC Query Handling: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling. - Enhance Bulk Insert Logic: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity. - Add `common-grpc` Dependency: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities. * fix: clippy * fix schema serialization * feat/bridge-bulk-insert: Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs` - Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`. - Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`. 
- Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations. * fix: test * refactor: rename * allow empty app_metadata in FlightData * feat/bridge-bulk-insert: - Remove Logging: Removed unnecessary logging of affected rows in `region_server.rs`. - Error Handling Enhancement: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path. - Error Enum Cleanup: Removed unused `Arrow` error variant from `error.rs`. * fix: standalone test * feat/bridge-bulk-insert: ### Enhance Bulk Insert Handling and Metadata Management - `lib.rs`: Enabled the `result_flattening` feature for improved error handling. - `request.rs`: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility. - `handle_bulk_insert.rs`: - Added `handle_record_batch` function to streamline processing of bulk insert payloads. - Improved error handling and task management for bulk insert operations. - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access. * feat/bridge-bulk-insert: - Refactor `handle_bulk_insert.rs`: - Replaced `handle_record_batch` with `handle_payload` for handling payloads. - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution. - Optimize `multi_dim.rs`: - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`. * feat/bridge-bulk-insert: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`. - Optimize Memory Allocation: Increased initial and builder capacities in `time_series.rs` to improve performance. - Enhance Data Handling: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling. 
- Improve Bulk Insert Logic: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering. - String Handling Improvement: Updated string conversion in `helper.rs` for better performance. * fix: clippy warnings * feat/bridge-bulk-insert: Add Metrics and Improve Error Handling - Metrics Enhancements: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to monitor performance. - Error Handling Improvements: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns. - Dependency Updates: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support. - Code Refactoring: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability. * chore: rebase main * chore: merge main	2025-05-06 09:53:25 +00:00
Yingwen	86aae6733d	fix: prune primary key with multiple columns may use default value as statistics (#5996 ) * test: incorrect test result when filtering pk with multiple columns * fix: prune non first tag correctly Distinguish no column and no stats and only use default value when no column * test: update test result * refactor: rename test file * test: add test for null filter * fix: use StatValues for null counts * test: drop table * test: fix unstable flow test	2025-04-28 04:53:30 +00:00
Lei, HUANG	1a517ec8ac	fix: check if memtable is empty by stats (#5989 ) fix/checking-memtable-empty-and-stats: - Refactor timestamp updates: Simplified timestamp range updates in `PartitionTreeMemtable` and `TimeSeriesMemtable` by replacing `update_timestamp_range` with `fetch_max` and `fetch_min` methods for `max_timestamp` and `min_timestamp`. - Affected files: `partition_tree.rs`, `time_series.rs` - Remove unused code: Deleted the `update_timestamp_range` method from `WriteMetrics` and removed unnecessary imports. - Affected file: `stats.rs` - Optimize memtable filtering: Streamlined the check for empty memtables in `ScanRegion` by directly using `time_range`. - Affected file: `scan_region.rs`	2025-04-28 01:57:17 +00:00
shuiyisong	3c943be189	chore: update rust toolchain (#5818 ) * chore: update nightly version * chore: sort lint lines * chore: minor fix * chore: update nix * chore: update toolchain to 2024-04-14 * chore: update toolchain to 2024-04-15 * chore: remove unnecessary test * chore: do not assert oid in sqlness test * chore: fix margin issue * chore: fix cr issues * chore: fix cr issues --------- Co-authored-by: Ning Sun <sunning@greptime.com>	2025-04-27 09:02:36 +00:00
Weny Xu	55cadcd2c0	feat: introduce flush metadata region task for metric engine (#5951 ) * feat: introduce flush metadata region task for metric engine * docs: generate config.md * chore: add header * test: fix unit test * fix: fix unit tests * chore: apply suggestions from CR * chore: remove docs * fix: fix unit tests	2025-04-23 04:51:22 +00:00
Yingwen	56f319a707	fix: filter doesn't consider default values after schema change (#5912 ) * test: sqlness test case * feat: use correct default while pruning row groups * fix: consider default in SimpleFilterContext * test: update sqlness test * test: add order by	2025-04-21 06:32:26 +00:00
Yuhan Wang	41814bb49f	feat: introduce `high_watermark` for remote wal logstore (#5877 ) * feat: introduce high_watermark_since_flush * test: add unit test for high watermark * refactor: submit a request instead * fix: send reply before submit request * fix: no need to update twice * feat: update high watermark in background periodically * test: update unit tests * fix: update high watermark periodically * test: update unit tests * chore: apply review comments * chore: rename * chore: apply review comments * chore: clean up * chore: apply review comments	2025-04-18 12:10:47 +00:00
Weny Xu	b8c6f1c8ed	feat: sync region followers after altering regions (#5901 ) * feat: close follower regions after dropping leader regions * chore: upgrade greptime-proto * feat: sync region followers after alter region operations * test: add tests * chore: apply suggestions from CR * chore: apply suggestions from CR	2025-04-18 10:21:35 +00:00
Lei, HUANG	799c7cbfa9	feat(mito): bulk insert request handling on datanode (#5831 ) * wip: implement basic request handling * feat/bulk-insert: ### Add Error Handling and Enhance Bulk Insert Functionality - Error Handling: Introduced a new error variant `ConvertDataType` in `error.rs` to handle conversion failures from `ConcreteDataType` to `ColumnDataType`. - Bulk Insert Enhancements: - Updated `WorkerRequest::BulkInserts` in `request.rs` to include metadata and sender. - Implemented `handle_bulk_inserts` in `worker.rs` to process bulk insert requests with region metadata. - Added functions `region_metadata_to_column_schema` and `record_batch_to_rows` in `handle_bulk_insert.rs` for schema conversion and row processing. - API Changes: Modified `RegionBulkInsertsRequest` in `region_request.rs` to include `region_id`. Files affected: `error.rs`, `request.rs`, `worker.rs`, `handle_bulk_insert.rs`, `region_request.rs`. * feat/bulk-insert: Enhance Error Handling and Add Unit Tests - Improved error handling in `record_batch_to_rows` function within `handle_bulk_insert.rs` by returning `Result` and handling errors with `context`. - Added unit tests for `region_metadata_to_column_schema` and `record_batch_to_rows` functions in `handle_bulk_insert.rs` to ensure correct functionality and error handling. * chore: update proto version * feat/bulk-insert: - Refactor Error Handling: Updated error handling in `error.rs` by modifying the `ConvertDataType` error handling. - Improve Logging and Error Reporting: Enhanced logging and error reporting in `worker.rs` by adding error messages for missing region metadata. - Add New Error Type: Introduced `DecodeArrowIpc` error in `metadata.rs` to handle Arrow IPC decoding failures. - Handle Arrow IPC Decoding: Updated `region_request.rs` to handle Arrow IPC decoding errors using the new `DecodeArrowIpc` error type. 
* chore: update proto version * feat/bulk-insert: Refactor `handle_bulk_insert.rs` to simplify row construction - Removed the mutable `current_row` vector and refactored `row_at` function to return a new vector directly. - Updated `record_batch_to_rows` to utilize the refactored `row_at` function for constructing rows. * feat/bulk-insert: ### Commit Summary Enhancements in Region Server Request Handling - Updated `region_server.rs` to include `RegionRequest::BulkInserts(_)` in the `RegionChange::Ingest` category, improving the handling of bulk insert operations. - Refined the categorization of region requests to ensure accurate mapping to `RegionChange` actions.	2025-04-15 14:11:50 +00:00
Lei, HUANG	6a50d71920	fix: memtable panic (#5894 ) * fix: memtable panic * fix: ci	2025-04-14 13:15:56 +00:00
Weny Xu	c522893552	fix: ensure logical regions are synced during region sync (#5878 ) * fix: ensure logical regions are synced during region sync * chore: apply suggestions from CR * chore: apply suggestions from CR	2025-04-14 12:37:31 +00:00
Zhenchi	e3675494b4	feat: apply terms with fulltext bloom backend (#5884 ) * feat: apply terms with fulltext bloom backend Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * perf: preload jieba Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> * polish doc Signed-off-by: Zhenchi <zhongzc_arch@outlook.com> --------- Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-04-14 07:08:59 +00:00

599 Commits