Commit Graph

4140 Commits

Author SHA1 Message Date
Lei, HUANG
493440a802 refactor: replace FlightMessage with arrow RecordBatch and Schema (#6175)
* refactor/flight-codec:
 ### Refactor and Enhance Schema and RecordBatch Handling

 - **Add `datatypes` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include the `datatypes` dependency.
 - **Schema Conversion and Error Handling**:
   - Updated `src/client/src/database.rs` and `src/client/src/region.rs` to handle schema conversion using `Arc` and added error handling for schema conversion.
   - Enhanced error handling in `src/client/src/error.rs` and `src/common/grpc/src/error.rs` by adding `ConvertSchema` error and removing unused errors.
 - **FlightMessage and RecordBatch Refactoring**:
   - Refactored `FlightMessage` enum in `src/common/grpc/src/flight.rs` to use `RecordBatch` instead of `Recordbatch`.
   - Updated related functions and tests in `src/common/grpc/benches/bench_flight_decoder.rs`, `src/operator/src/bulk_insert.rs`, `src/servers/src/grpc/flight/stream.rs`, and `tests-integration/src/grpc/flight.rs` to align with the new `FlightMessage` structure.

* refactor/flight-codec:
 Remove `ConvertArrowSchema` Error Variant

 - Removed the `ConvertArrowSchema` error variant from `error.rs`.
 - Updated the `ErrorExt` implementation to exclude `ConvertArrowSchema`.
 - Affected file: `src/common/query/src/error.rs`.

* fix: cr
2025-05-26 10:06:50 +00:00
localhost
77e2fee755 fix: add simple test for rds kv backend (#6167)
* chore: add simple test for rds kv backend

* chore: add test for etcd and mem

* chore: remove etcd simple range test

* chore: add more test case
2025-05-26 06:32:36 +00:00
dennis zhuang
b85429c0f1 fix: set column index can't work in physical table (#6179) 2025-05-26 04:44:05 +00:00
Lei, HUANG
3d942f6763 fix: bulk insert case sensitive (#6165)
* fix/bulk-insert-case-sensitive:
 Add error inspection for gRPC bulk insert in `greptime_handler.rs`

 - Enhanced error handling by adding `inspect_err` to log errors during the `put_record_batch` operation in `greptime_handler.rs`.

* fix: silent error while bulk ingest with uppercase columns
v0.15.0-nightly-20250526
2025-05-24 07:02:42 +00:00
discord9
3901863432 chore: metasrv starting not blocking (#6158)
* chore: metasrv starting not blocking

* chore: fmt

* chore: expose actual bind_addr
2025-05-23 09:53:42 +00:00
Lei, HUANG
27e339f628 perf: optimize bulk encode decode (#6161)
* main:
 **Enhancements to Flight Data Handling and Error Management**

 - **Flight Data Handling:**
   - Added `bytes` dependency in `Cargo.lock` and `Cargo.toml`.
   - Introduced `try_from_schema_bytes` and `try_decode_record_batch` methods in `FlightDecoder` to handle schema and record batch decoding more efficiently in `src/common/grpc/src/flight.rs`.
   - Updated `Inserter` in `src/operator/src/bulk_insert.rs` to utilize schema bytes directly, improving bulk insert operations.

 - **Error Management:**
   - Added `ArrowError` handling in `src/common/grpc/src/error.rs` to manage errors related to Arrow operations.

 - **Region Request Processing:**
   - Modified `make_region_bulk_inserts` in `src/store-api/src/region_request.rs` to use the new `FlightDecoder` methods for decoding Arrow IPC data.

* perf/optimize-bulk-encode-decode:
 Update `greptime-proto` dependency and refactor error handling

 - **Dependency Update**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Error Handling Refactor**: Removed the `Prost` error variant from `MetadataError` in `src/store-api/src/metadata.rs`.
 - **Error Handling Improvement**: Replaced `unwrap` with `context(FlightCodecSnafu)` for error handling in `make_region_bulk_inserts` function in `src/store-api/src/region_request.rs`.

* fix: clippy

* fix: toml

* perf/optimize-bulk-encode-decode:
 ### Update `Cargo.toml` Dependencies

 - Updated the `bytes` dependency to use the workspace version in `Cargo.toml`.

* perf/optimize-bulk-encode-decode:
 **Fix payload assignment in `bulk_insert.rs`**

 - Corrected the assignment of the `payload` field in the `ArrowIpc` struct within the `Inserter` implementation in `bulk_insert.rs`.

* use main branch proto
2025-05-23 07:22:10 +00:00
discord9
cf2712e6f4 chore: invalid table flow mapping cache (#6135)
* chore: invalidate table flow mapping

* chore: exists

* fix: invalidate all related keys in kv cache when dropping flow & refactor: per review

* fix: flow not found status code

* chore: rm unused error code

* chore: stuff

* chore: unused
2025-05-23 03:40:10 +00:00
Lei, HUANG
4b71e493f7 feat!: revise compaction picker (#6121)
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name.
 - **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`.
 - **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints.

 **Refactor Parquet Writer Initialization and File Handling**
 - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management.
 - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability.
 - Enhanced error handling and logging with `debug` statements for better traceability during file operations.

 - **Removed Output Size Enforcement in `twcs.rs`:**
   - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling.

 - **Added Max File Size Option in `parquet.rs`:**
   - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files.

 - **Refactored Indexer Management in `parquet/writer.rs`:**
   - Changed `current_indexer` from an `Option` to a direct `Indexer` type.
   - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`.
   - Simplified indexer initialization and management logic.

 - **Refactored SST File Handling**:
   - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
   - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
   - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
   - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.

 - **Enhanced Indexer Management**:
   - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
   - Updated `ParquetWriter` to handle multiple indexers and file IDs.
   - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.

 - **Removed Redundant File ID Handling**:
   - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
   - Updated related logic to dynamically generate file IDs where necessary.
   - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.

 - **Test Adjustments**:
   - Updated tests to align with new path and indexer management.
   - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
   - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.

* chore: rebase main

* feat/multiple-compaction-output:
 ### Add Benchmarking and Refactor Compaction Logic

 - **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions.
 - **Compaction Module Enhancements**:
   - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public.
   - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions.
   - Introduced `find_overlapping_items` for identifying overlapping items.
 - **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Add Benchmarks

 - **Compaction Logic Improvements**:
   - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty.
   - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency.

 - **Benchmark Enhancements**:
   - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function.
   - Extended existing benchmarks to include larger data sizes.

 - **Testing Enhancements**:
   - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`.

 - **Logging and Debugging**:
   - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions.

* feat/multiple-compaction-output:
 ### Refactor and Enhance Compaction Logic

 - **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`.
 - **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`.
 - **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`.
 - **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files.
 - **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`.
 - **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Testing

 - **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated benchmarks in `run_bench.rs` to include `bench_merge_seq_files`.
 - **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap.
 - **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies.
 - **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed.

* feat/multiple-compaction-output:
 ### Refactor TWCS Compaction Options

 - **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`.
 - **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`.
 - **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`.
 - **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files.

 This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase.

* chore: add trailing space

* fix license header

* feat/revise-compaction-picker:
 **Limit File Processing and Optimize Merge Logic in `run.rs`**

 - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity.
 - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files.
 - Updated scoring calculations to use the limited file set, ensuring efficient file merging.

* feat/revise-compaction-picker:
 ### Add Compaction Metrics and Remove Debug Logging

 - **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process.
 - **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`.

* feat/revise-compaction-picker:
 ## Enhance Compaction Logic and Metrics

 - **Compaction Logic Improvements**:
   - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations.
   - Updated `Compactor` implementation to use these methods for metrics tracking.
   - Modified `Ranged` trait logic in `run.rs` to improve range comparison.
   - Enhanced test cases in `run.rs` to reflect changes in compaction logic.

 - **Metrics Enhancements**:
   - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking.

 - **Debugging and Logging**:
   - Added detailed logging for compaction pick results in `twcs.rs`.
   - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output.

 - **Testing Enhancements**:
   - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files.
   - Updated `merge_mode_test.rs` to reflect changes in file handling during scans.

* feat/revise-compaction-picker:
 ### Update `FileHandle` Debug Implementation

 - **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`.
 - **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`.

* Trigger CI

* feat/revise-compaction-picker:
 **Update compaction logic and default options**

 - **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability.
 - **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance.

* feat/revise-compaction-picker:
 Refactor `find_overlapping_items` to use an external result vector

 - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency.
 - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature.
 - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach.

* feat/revise-compaction-picker:
 Improve file merging logic in `run.rs`

 - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups.
 - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection.
 - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences.

* feat/revise-compaction-picker:
 Enhance `find_overlapping_items` with `SortedRun` and Update Tests

 - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity.
 - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations.
 - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`.
 - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors.

* feat/revise-compaction-picker:
 **Enhancements in `compaction/run.rs`:**

 - Added `ReadableSize` import to handle size calculations.
 - Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided.

* feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction

Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
2025-05-23 03:29:08 +00:00
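The `find_overlapping_items` change described in the commit above (#6121), which passes in a caller-owned result vector instead of returning a new one, can be sketched roughly as follows. This is an illustration only: the `Ranged` trait shape, the `FileRange` type, and the quadratic scan are assumptions made for this example, and the actual code in `src/mito2/src/compaction/run.rs` may differ in both signature and semantics.

```rust
/// Hypothetical stand-in for the `Ranged` trait named in the commit:
/// anything that exposes an inclusive (start, end) range.
pub trait Ranged {
    fn range(&self) -> (i64, i64);

    /// Two inclusive ranges overlap when neither ends before the other starts.
    fn overlaps(&self, other: &Self) -> bool {
        let (a_start, a_end) = self.range();
        let (b_start, b_end) = other.range();
        a_start <= b_end && b_start <= a_end
    }
}

/// Hypothetical file descriptor used only for this example.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRange {
    pub start: i64,
    pub end: i64,
    pub size: u64,
}

impl Ranged for FileRange {
    fn range(&self) -> (i64, i64) {
        (self.start, self.end)
    }
}

/// Appends every item of `right` that overlaps at least one item of `left`
/// to `result`. The caller owns `result`, so the same buffer can be reused
/// across calls instead of allocating a new vector each time.
/// This sketch uses a simple O(n * m) scan for clarity.
pub fn find_overlapping_items<T: Ranged + Clone>(left: &[T], right: &[T], result: &mut Vec<T>) {
    for r in right {
        if left.iter().any(|l| l.overlaps(r)) {
            result.push(r.clone());
        }
    }
}
```

A caller would typically allocate `result` once, call `clear()` between invocations, and pass it by mutable reference each time.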
Ruihang Xia
bf496e05cc ci: turn off fail fast strategy (#6157)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-23 02:38:25 +00:00
zyy17
513ca951ee chore: add the missing v prefix for NEXT_RELEASE_VERSION variable (#6160)
chore: add 'v' prefix for NEXT_RELEASE_VERSION variable
2025-05-22 10:33:14 +00:00
Ruihang Xia
791f530a78 fix: require input ordering in series divide plan (#6148)
* require input ordering in series divide plan

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* finalise

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-22 07:04:25 +00:00
Ning Sun
1de6d8c619 fix: ident value in set search_path (#6153)
* fix: ident value in set search_path

* refactor: remove unneeded clone
2025-05-22 03:58:18 +00:00
discord9
a4d0420727 fix(flow): flow task run interval (#6100)
* fix: always check for shutdown signal in flow
chore: correct log msg for flows that shouldn't exist
feat: use time window size/2 as sleep interval

* chore: better slower query refresh time

* chore

* refactor: per review
2025-05-22 03:27:26 +00:00
discord9
fc6300a2ba feat(flow): support prom ql(in tql) in flow (#6063)
* feat: support parse prom ql in create flow

* refactor

* fix: just run tql unmodified

* refactor: determine type faster

* fix: pass original query

* tests: sqlness

* test: fix format&chore

* fix: get raw query

* test: fix sqlness randomness

* chore: what's the box for?

* test: location_to_index

* test: make sqlness more deterministic

* fix: tmp add sleep 1s after flush_flow

* undo test sleep 1s&rm done todo

* chore: more tests
2025-05-22 03:06:09 +00:00
liyang
f55af5838c ci: add issues write permission (#6145)
fixed to: https://github.com/GreptimeTeam/greptimedb/actions/runs/15155518237/job/42610589439
2025-05-21 15:53:01 +00:00
Lei, HUANG
5a0da5b6bb fix: region worker stall metrics (#6149)
fix/stall-metrics:
 Improve stalled request handling in `handle_write.rs`

 - Updated logic to account for both `write_requests` and `bulk_requests` when adjusting `stalled_count`.
 - Modified `reject_region_stalled_requests` and `handle_region_stalled_requests` to correctly subtract the combined length of `requests` and `bulk` from `stalled_count`.
2025-05-21 13:21:50 +00:00
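The accounting fix described in the commit above (#6149) amounts to subtracting both kinds of parked requests from the stalled counter. A minimal sketch, assuming a hypothetical `StalledRequests` container and a plain integer in place of the real metric gauge (neither is taken from `handle_write.rs`):

```rust
/// Hypothetical container for requests parked while a region is stalled.
/// `String` stands in for the real request types.
#[derive(Default)]
pub struct StalledRequests {
    pub requests: Vec<String>,
    pub bulk: Vec<String>,
}

/// Takes all stalled requests out of `stalled` and decreases `stalled_count`
/// by the combined number of write requests and bulk requests.
/// Counting only `requests` would leave the counter too high whenever
/// bulk requests were stalled as well.
pub fn drain_stalled(
    stalled_count: &mut i64,
    stalled: &mut StalledRequests,
) -> (Vec<String>, Vec<String>) {
    let requests = std::mem::take(&mut stalled.requests);
    let bulk = std::mem::take(&mut stalled.bulk);
    *stalled_count -= (requests.len() + bulk.len()) as i64;
    (requests, bulk)
}
```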
Lei, HUANG
d5f0006864 fix: flaky prom gateway test (#6146)
fix/flaky-prom-gateway-test:
 **Refactor gRPC Test Assertions in `grpc.rs`**

 - Updated test assertions for `test_prom_gateway_query` to improve clarity and maintainability.
 - Replaced direct comparison with expected `PrometheusJsonResponse` objects with individual field assertions.
 - Added sorting for `vector` and `matrix` results to ensure consistent test outcomes.
2025-05-21 09:31:58 +00:00
liyang
ede82331b2 docs: change docker run mount directory (#6142) 2025-05-21 07:05:21 +00:00
Ruihang Xia
56e696bd55 chore: remove stale wal config entries (#6134)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-20 19:42:09 +00:00
ZonaHe
bc0cdf62ba feat: update dashboard to v0.9.2 (#6140)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2025-05-20 19:41:29 +00:00
Lei, HUANG
eaf7b4b9dd chore: update flush failure metric name and update grafana dashboard (#6138)
* 1. rename `greptime_mito_flush_errors_total` metric to `greptime_mito_flush_errors_total` for consistency
2. update grafana dashboard to add the following panels:
  - compaction input/output bytes
  - bulk insert handling elapsed time in frontend and region worker
2025-05-20 12:05:54 +00:00
Ruihang Xia
7ae0e150e5 feat: support altering multiple logical table in one remote write request (#6137)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-20 11:22:38 +00:00
ZonaHe
43c30b55ae feat: update dashboard to v0.9.1 (#6132)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-05-20 09:58:44 +00:00
liyang
153e80450a fix: update dev-build image tag (#6136) 2025-05-20 09:08:28 +00:00
jeremyhi
1624dc41c5 chore: reduce unnecessary txns in alter operations (#6133) 2025-05-20 08:29:49 +00:00
Ruihang Xia
300262562b feat: accommodate default column name with pre-created table schema (#6126)
* refactor: prepare_mocked_backend

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* modify request in place

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* apply to influx line protocol

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* return on empty alter expr list

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* expose to other write paths

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-20 07:22:13 +00:00
shuiyisong
b2377d4b87 chore: update toolchain to 2025-05-19 (#6124)
* chore: update toolchain to 2025-05-19

* chore: update nix sha

* chore: rebase main and fix
2025-05-20 04:29:40 +00:00
yinheli
8d36ffb4e1 chore: enable github folder typo check and fix typos (#6128) 2025-05-20 04:20:07 +00:00
Yingwen
955ad644f7 ci: add pull requests permissions to semantic check job (#6130)
* ci: add pull requests permissions

* ci: reduce permissions
2025-05-20 03:33:33 +00:00
localhost
c2e3c3d398 chore: Add more data format support to the pipeline dryrun api. (#6115)
* chore: supporting more data type for pipeline dryrun API

* chore: add docs for parse_dryrun_data

* chore: fix by pr comment

* chore: add user-friendly error message

* chore: change EventPayloadResolver content_type field type from owner to ref

* Apply suggestions from code review

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2025-05-20 03:29:28 +00:00
Zhenchi
400229c384 feat: introduce index result cache (#6110)
* feat: introduce index result cache

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Update src/mito2/src/sst/index/inverted_index/applier/builder.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* optimize selector_len

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-20 01:45:42 +00:00
Ruihang Xia
cd9b6990bf feat: implement clamp_min and clamp_max (#6116)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-19 21:32:03 +00:00
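For readers unfamiliar with the two functions named in the commit above (#6116), the usual meaning of `clamp_min` and `clamp_max` is a one-sided clamp. The sketch below shows that meaning on plain `f64` values; it is not the GreptimeDB implementation, which operates on columnar data and may treat nulls and NaN differently.

```rust
/// Raises `value` to at least `min` (a lower bound only).
/// Note: `f64::max` returns the non-NaN operand when one side is NaN.
pub fn clamp_min(value: f64, min: f64) -> f64 {
    value.max(min)
}

/// Lowers `value` to at most `max` (an upper bound only).
pub fn clamp_max(value: f64, max: f64) -> f64 {
    value.min(max)
}

/// Applies `clamp_min` to every element of a slice.
pub fn clamp_min_all(values: &[f64], min: f64) -> Vec<f64> {
    values.iter().map(|v| clamp_min(*v, min)).collect()
}
```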
Ruihang Xia
a56e6e04c2 chore: remove etcd from acknowledgement as not recommended (#6127)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-19 12:42:30 +00:00
Ning Sun
d324439014 ci: fix release job dependencies (#6125) 2025-05-19 11:48:57 +00:00
discord9
038acda7cd fix: flow update use proper update (#6108)
* fix: flow update use proper update

* refactor: per review

* fix: flow cache

* chore: per copilot review

* refactor: rm flow node id

* refactor: per review

* chore: per review

* refactor: per review

* chore: per review
2025-05-19 11:30:10 +00:00
shuiyisong
a0d89c9ed1 feat: Prometheus remote write with pipeline (#5981)
* chore: update nightly version

* chore: sort lint lines

* chore: minor fix

* chore: update nix

* chore: update toolchain to 2024-04-14

* chore: update toolchain to 2024-04-15

* chore: remove unnecessary test

* chore: do not assert oid in sqlness test

* chore: fix margin issue

* chore: fix cr issues

* chore: fix cr issues

* chore: add pipeline handler to prom state

* chore: add prom series processor to merge function

* chore: add run pipeline in decode

* chore: add channel to pipeline ctx

* chore: add pipeline info to remote write handler

* chore: minor update

* chore: minor update

* chore: add test

* chore: add comment

* refactor: simplify identity pipeline params

* fix: test

* refactor: remove is_prometheus

---------

Co-authored-by: Ning Sun <sunning@greptime.com>
2025-05-19 08:00:59 +00:00
discord9
3a5534722c feat: export to s3 add more options (#6091)
* feat: export to s3 add more options

* chore: rm output dir override logic

* fix: s3 root export data

* feat: use output_dir and s3 at same time

* refactor: per review

* fix: keep same behavior
0.15.0-nightly-20250519
2025-05-16 20:58:14 +00:00
Ruihang Xia
1010a0c2ad fix: update promql-parser for regex anchor fix (#6117)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-16 20:33:35 +00:00
Lei, HUANG
f46cdbd66b fix: fast path for single region bulk insert (#6104)
* fix/fast-path-for-single-region-bulk-insert:
 ### Commit Summary

 - **Refactor `try_decode` Method**: Updated the `try_decode` method in `FlightDecoder` to accept a reference to `FlightData` instead of consuming it. This change affects multiple files including `database.rs`, `region.rs`, `flight.rs`, `bulk_insert.rs`, `stream.rs`, and `region_request.rs`.
 - **Optimize Bulk Insert Handling**: Added a fast path for handling bulk inserts when only one region is involved in `bulk_insert.rs`.

* fix/fast-path-for-single-region-bulk-insert:
 Improve `FlightDecoder` usage in tests

 - Updated `try_decode` method calls in `flight.rs` to remove unnecessary references for `d1`, `d2`, and `d3`.
 - Ensured consistency in handling `FlightMessage` variants within test cases.

* fix/fast-path-for-single-region-bulk-insert:
 **Enhancement: Skip Empty Regions in Bulk Insert**

 - Updated `bulk_insert.rs` to improve efficiency by skipping regions without data during the bulk insert process. This change ensures that regions with a `true_count` of zero are not processed, optimizing resource usage and performance.

* fix/fast-path-for-single-region-bulk-insert:
 ### Commit Summary

 - **Refactor `RegionMask` Handling**:
   - Introduced `RegionMask` struct to encapsulate boolean array and selected rows count.
   - Updated methods to use `RegionMask` instead of `BooleanArray` for region selection.
   - Affected files: `bulk_insert.rs`, `multi_dim.rs`, `partition.rs`, `splitter.rs`.

 - **Optimize Region Selection**:
   - Removed unnecessary checks for empty regions in `bulk_insert.rs`.
   - Improved logic for handling default regions in `multi_dim.rs`.

 - **Update Tests**:
   - Modified test cases to accommodate `RegionMask` changes.
   - Affected files: `multi_dim.rs`, `splitter.rs`.

* fix/fast-path-for-single-region-bulk-insert:
 **Enhancements to MultiDimPartitionRule Logic and Tests**

 - **`multi_dim.rs`**: Improved the logic for selecting rows in `MultiDimPartitionRule` by optimizing the selection process when only one region is present.
 - **Tests**: Added new test cases to verify the behavior of default regions with unselected rows, existing default regions, and scenarios where all rows are selected. These tests ensure robust handling of partition rules and validate the correct assignment of rows to regions.
2025-05-16 20:26:56 +00:00
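The `RegionMask` idea from the commit above (#6104), a boolean selection array bundled with a precomputed count of selected rows, can be sketched as below. This is a simplified illustration: it uses `Vec<bool>` in place of an Arrow `BooleanArray`, and the method names are assumptions rather than the actual API.

```rust
/// Simplified stand-in for `RegionMask`: a per-row selection mask plus
/// the number of rows it selects, computed once at construction.
#[derive(Debug, Clone, PartialEq)]
pub struct RegionMask {
    array: Vec<bool>,
    selected_rows: usize,
}

impl RegionMask {
    /// Builds a mask and counts the selected rows up front.
    pub fn new(array: Vec<bool>) -> Self {
        let selected_rows = array.iter().filter(|selected| **selected).count();
        Self { array, selected_rows }
    }

    /// Number of rows routed to this region.
    pub fn selected_rows(&self) -> usize {
        self.selected_rows
    }

    /// True when no row belongs to this region, so the region can be skipped.
    pub fn select_none(&self) -> bool {
        self.selected_rows == 0
    }

    /// True when every row belongs to this region, so the batch can be
    /// forwarded as-is without filtering (the single-region fast path).
    pub fn select_all(&self) -> bool {
        self.selected_rows == self.array.len()
    }

    /// Read-only access to the underlying mask.
    pub fn array(&self) -> &[bool] {
        &self.array
    }
}
```

Caching the count means callers can decide whether to skip or forward a batch without rescanning the mask.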
Weny Xu
864cc117b3 fix: append noop entry when auto topic creation is disabled (#6092)
* feat: improve topic management and add stale records cleanup

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-05-16 11:26:47 +00:00
Yingwen
0ea9ab385d fix: clean files under the atomic write dir on failure (#6112)
* fix: remove files under atomic dir on failure

* fix: clean atomic dir on download failure

* chore: update comment

* fix: clean if failed to write without write cache

* feat: add a TempFileCleaner to clean files on failure

* chore: after merge fix

* chore: more fix

---------

Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: discord9 <discord9@163.com>
2025-05-16 11:18:11 +00:00
Yingwen
c7e9485534 feat: New scanner SeriesScan to scan by series for querying metrics (#5968)
* chore: basic methods for SeriesScan

* chore: add to scanner enum

* feat: implement scan logic of each partition

* feat: use series scan when distribution is PerSeries

* refactor: remove per series scan from SeqScan

* fix: use series scan in PerSeries distribution

* feat: keep parallelize_scan unchanged

* fix: address compiler errors

* fix: include build merge reader cost to scan cost

* feat: use smallvec

* chore: update comment

* Revert "feat: keep parallelize_scan unchanged"

This reverts commit 96ba00d175.

* assign partition_ranges

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: try send before send

reduce the send timeout to 10ms

* chore: add comments

* fix: add metrics to partition metrics list

* fix: correct scan cost metrics

* chore: reset instant

* fix: scanner metrics init

* chore: display more info in explain

* feat: metrics for send series timeout

* style: fix clippy

* refactor: use ChainedRecordBatchStream to simplify codes

* chore: fix typos

* feat: separate distributor metrics

* feat: remove parallelize hack

* chore: fix warning

* test: add test for series scan

* test: update sqlness test

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-16 08:53:24 +00:00
Ruihang Xia
57b53211d9 feat: don't hide atomic write dir (#6109)
* feat: don't hide atomic write dir

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* compatible code

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/mito2/src/access_layer.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-05-16 06:21:13 +00:00
zyy17
01076069a3 chore: modify default slow_query.threshold from 5s to 30s (#6107)
chore: modify slow_query.threshold from 5s to 30s
2025-05-15 20:16:13 +00:00
Ning Sun
73b4b710cd ci: update nix build linker (#6103)
* ci: update nix build linker

* ci: use mold for nix ci
2025-05-15 19:02:58 +00:00
zyy17
14b655ea57 refactor: add SlowQueryRecorder to record slow query in system table and refactor slow query options (#6008)
* refactor: add common-slow-query crate

* refactor: refine the naming

* chore: fix clippy

* chore: fix typo

* chore: separate SlowQueryOptions From Logging

* chore: fix clippy

* chore: fix ci

* chore: refine the code

* chore: update config example

* refactor: use drop() to end the slow query timer

* refactor: move common-slow-query to frontend crate

* chore: polish some code

* refactor: code review

* refactor: add promql_range/promql_step/promql_start/promql_end fields in slow_queries

* refactor: add build_slow_query_logger()

* refactor: turn on slow query on frontend by default
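
The "use drop() to end the slow query timer" item above describes an RAII pattern. A minimal sketch of that pattern follows, assuming a hypothetical `QueryTimer` type and `on_slow` callback that are not taken from the actual frontend crate: the timer starts on construction and reports only if the elapsed time reaches the threshold when the value is dropped.

```rust
use std::time::{Duration, Instant};

/// Hypothetical timer (illustration only, not the real SlowQueryRecorder API).
/// It measures from construction until the value is dropped.
struct QueryTimer<F: FnMut(Duration)> {
    start: Instant,
    threshold: Duration,
    on_slow: F,
}

impl<F: FnMut(Duration)> QueryTimer<F> {
    fn new(threshold: Duration, on_slow: F) -> Self {
        Self {
            start: Instant::now(),
            threshold,
            on_slow,
        }
    }
}

impl<F: FnMut(Duration)> Drop for QueryTimer<F> {
    fn drop(&mut self) {
        // The timer ends here, whether the scope exits normally or early.
        let elapsed = self.start.elapsed();
        if elapsed >= self.threshold {
            (self.on_slow)(elapsed);
        }
    }
}
```

Calling `drop(timer)` explicitly ends the measurement at a chosen point; letting the value fall out of scope has the same effect.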
2025-05-15 04:18:48 +00:00
Ruihang Xia
c780746171 perf: avoid some atomic operation on array slice (#6101)
* perf: avoid some atomic operation on array slice

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* finalise

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-15 02:29:07 +00:00
Weny Xu
1f62c3b545 fix: table metadata collection (#6102)
fix: fix collect metadata
2025-05-14 12:19:54 +00:00
Lei, HUANG
5a9023d6b3 feat(bulk): write to multiple time partitions (#6086)
* add benchmark for splitting according to time partition

* feat/write-to-multiple-time-partitions:
 **Enhancements to Bulk Processing and Time Partitioning**

 - **`part.rs`**: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns.
 - **`simple_bulk_memtable.rs`**: Updated tests to include `timestamp_index` initialization.
 - **`time_partition.rs`**: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`.
 - **`handle_bulk_insert.rs`**: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning Logic

 - **`time_partition.rs`**:
   - Introduced `HashSet` for efficient partition management.
   - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions.
   - Updated `get_or_create_time_partition` to manage partition creation.
   - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges.

 - **Tests**:
   - Added `test_find_partitions_by_time_range` to validate new partitioning logic.
   - Updated `test_split_record_batch` to ensure correct record batch splitting behavior.
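
 The split into existing and missing partitions can be sketched as below. This is an illustration under assumptions, not the code in `time_partition.rs`: partitions are represented only by their start time in seconds, and the function name and signature are hypothetical.

 ```rust
 use std::collections::HashSet;

 /// Hypothetical sketch: returns (matching, missing) partition start times,
 /// in seconds, for every fixed-width partition that overlaps the inclusive
 /// range [min_ts_sec, max_ts_sec]. `part_duration_sec` must be positive.
 fn find_partitions_by_range(
     existing_starts: &HashSet<i64>,
     min_ts_sec: i64,
     max_ts_sec: i64,
     part_duration_sec: i64,
 ) -> (Vec<i64>, Vec<i64>) {
     let start_bucket = min_ts_sec.div_euclid(part_duration_sec);
     let end_bucket = max_ts_sec.div_euclid(part_duration_sec);

     let mut matching = Vec::new();
     let mut missing = Vec::new();
     for bucket in start_bucket..=end_bucket {
         // Skip buckets whose start time does not fit in an i64.
         let Some(start) = bucket.checked_mul(part_duration_sec) else {
             continue;
         };
         if existing_starts.contains(&start) {
             matching.push(start);
         } else {
             missing.push(start);
         }
     }
     (matching, missing)
 }
 ```

 A caller would write to the matching partitions directly and create the missing ones first.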

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning and Testing in `time_partition.rs`

 - **Time Partitioning Enhancements**:
   - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`.
   - Improved filtering logic for timestamp arrays to support various time units.

 - **Testing Enhancements**:
   - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`.
   - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning.

 - **Imports and Dependencies**:
   - Added necessary imports for new timestamp array types and testing utilities.
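
 Filtering by time range across several timestamp units can be sketched as below. To stay self-contained the sketch uses a plain `Vec<i64>` and a local `TimeUnit` enum as stand-ins for Arrow timestamp arrays and `DataType`; all names here are assumptions for illustration, not the actual implementation.

 ```rust
 /// Local stand-in for the timestamp unit (hypothetical, for illustration).
 #[derive(Clone, Copy)]
 enum TimeUnit {
     Second,
     Millisecond,
     Microsecond,
     Nanosecond,
 }

 impl TimeUnit {
     /// Number of ticks of this unit in one second.
     fn ticks_per_second(self) -> i64 {
         match self {
             TimeUnit::Second => 1,
             TimeUnit::Millisecond => 1_000,
             TimeUnit::Microsecond => 1_000_000,
             TimeUnit::Nanosecond => 1_000_000_000,
         }
     }
 }

 /// Keeps the raw timestamp values that fall in [start_sec, end_sec),
 /// where the bounds are given in seconds and the values in `unit`.
 fn filter_by_range(values: &[i64], unit: TimeUnit, start_sec: i64, end_sec: i64) -> Vec<i64> {
     let factor = unit.ticks_per_second();
     // Saturate instead of overflowing when the bounds are very large.
     let lo = start_sec.saturating_mul(factor);
     let hi = end_sec.saturating_mul(factor);
     values
         .iter()
         .copied()
         .filter(|v| *v >= lo && *v < hi)
         .collect()
 }
 ```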

* feat/write-to-multiple-time-partitions:
 ### Refactor and Enhance Time Partition Filtering

 - **Refactor Filtering Logic**: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability.
 - **Enhance `BulkPart` Struct**: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`.
 - **Rename Function**: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`.
 - **Add Feature Flag**: Introduced `int_roundings` feature in `lib.rs` to support new functionality.

* refactor tests

* feat/write-to-multiple-time-partitions:
 Improve timestamp handling in `time_partition.rs`

 - Enhanced safety comments for timestamp conversion to ensure clarity.
 - Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations.
 - Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`.
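
 Independent of the overflow concern noted above, `div_euclid` also rounds toward negative infinity for a positive divisor, so timestamps before the epoch land in the bucket that actually contains them. A small sketch, assuming a hypothetical `bucket_start` helper that is not part of the commit:

 ```rust
 /// Hypothetical helper: start time (in seconds) of the fixed-width bucket
 /// containing `ts_sec`, or `None` if that start time does not fit in an i64.
 /// `part_duration_sec` must be positive.
 fn bucket_start(ts_sec: i64, part_duration_sec: i64) -> Option<i64> {
     ts_sec
         .div_euclid(part_duration_sec)
         .checked_mul(part_duration_sec)
 }
 ```

 With a 10-second duration, `bucket_start(-1, 10)` yields `Some(-10)`, whereas plain `/` truncates toward zero and `(-1 / 10) * 10` gives `0`, a bucket that does not contain `-1`.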

* feat/write-to-multiple-time-partitions:
 **Refactor timestamp handling and add utility function**

 - **Refactor `time_partition.rs`:** Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion.
 - **Enhance `metadata.rs`:** Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code.

* feat/write-to-multiple-time-partitions:
 Refactor time partition variable names in `time_partition.rs`

 - Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`.
 - Updated related logic to use new variable names for improved readability and maintainability.

* feat/write-to-multiple-time-partitions:
 **Refactor variable names in `time_partition.rs`**

 - Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency.
 - Modified function calls and loop iterations to align with the new variable names.
 - Affected file: `src/mito2/src/memtable/time_partition.rs`

* feat/write-to-multiple-time-partitions:
 ### Refactor variable names in `time_partition.rs`

 - Updated variable names for clarity in `time_partition.rs`:
   - Renamed `matchings` to `matching_parts`
   - Renamed `missings` to `missing_parts`
 - Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partition Handling

 - **`time_partition.rs`**:
   - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks.
   - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`).
   - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling.

---------

Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>
2025-05-14 05:09:59 +00:00
Ruihang Xia
209f8371f2 fix: promql regex escape behavior (#6094)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-13 18:19:17 +00:00