Compare commits


45 Commits

Author SHA1 Message Date
shuiyisong
63cc51395f chore: build only centos binary
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-10 14:08:51 +08:00
shuiyisong
17a5703850 chore: change to dev-mode false packaging
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-10 11:05:31 +08:00
shuiyisong
4da1993ed3 chore: ignore unknown set in mysql
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-10 10:34:27 +08:00
Ning Sun
62d109c1f4 fix: allow case-insensitive timezone settings (#7207) 2025-11-08 15:56:27 +00:00
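A minimal, std-only sketch of case-insensitive timezone resolution, assuming a small hypothetical `KNOWN_ZONES` table in place of the full IANA database; `resolve_timezone` is an illustrative name, not GreptimeDB's API. The `uncased` entries added to `chrono-tz`, `chrono-tz-build`, and `phf` in the Cargo.lock diff of this range appear consistent with chrono-tz's `case-insensitive` feature (which provides `Tz::from_str_insensitive`), but the sketch does not depend on that crate.

```rust
/// Hypothetical subset of canonical IANA zone names; a real server would
/// consult the full tz database instead.
const KNOWN_ZONES: &[&str] = &["UTC", "Asia/Shanghai", "America/New_York", "Europe/Berlin"];

/// Resolve a user-supplied timezone name ignoring ASCII case (IANA names
/// are ASCII), returning the canonical spelling if the zone is known.
fn resolve_timezone(input: &str) -> Option<&'static str> {
    let wanted = input.trim();
    KNOWN_ZONES
        .iter()
        .copied()
        .find(|zone| zone.eq_ignore_ascii_case(wanted))
}

fn main() {
    for name in ["asia/shanghai", "UTC", "utc", "Mars/Olympus"] {
        println!("{} -> {:?}", name, resolve_timezone(name));
    }
}
```

Returning the canonical spelling keeps later lookups and displayed settings consistent no matter how the user typed the name.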
Alan Tang
910a383420 feat(expr): support avg functions on vector (#7146)
* feat(expr): support vec_elem_avg function

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* feat: support vec_avg function

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* test: add more query test for avg aggregator

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* fix: fix the merge batch mode

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* refactor: use sum and count as state for avg function

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* refactor: refactor merge batch mode for avg function

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

* feat: add additional vector restrictions for validation

Signed-off-by: Alan Tang <jmtangcs@gmail.com>

---------

Signed-off-by: Alan Tang <jmtangcs@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-11-07 13:42:14 +00:00
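The refactors above keep `(sum, count)` as the avg state instead of a running average. A minimal std-only sketch of why, assuming an aggregate that averages equal-length `f32` vectors element-wise across rows (our reading of `vec_avg`; GreptimeDB's real accumulator and UDAF interfaces are not shown, and `VecAvgState` is a hypothetical name):

```rust
/// Hypothetical accumulator: averages equal-length vectors element-wise,
/// keeping (sum, count) as its state so partial states can be merged.
#[derive(Debug, Clone, Default)]
struct VecAvgState {
    sum: Vec<f64>,
    count: u64,
}

impl VecAvgState {
    /// Fold one input vector into the state; rejects dimension mismatches.
    fn update(&mut self, v: &[f32]) -> Result<(), String> {
        if self.count == 0 {
            self.sum = vec![0.0; v.len()];
        } else if self.sum.len() != v.len() {
            return Err(format!("dimension mismatch: {} vs {}", self.sum.len(), v.len()));
        }
        for (acc, x) in self.sum.iter_mut().zip(v) {
            *acc += f64::from(*x);
        }
        self.count += 1;
        Ok(())
    }

    /// Merge a partial state (e.g. from another batch) by adding sums and counts.
    fn merge(&mut self, other: &VecAvgState) -> Result<(), String> {
        if other.count == 0 {
            return Ok(());
        }
        if self.count == 0 {
            *self = other.clone();
            return Ok(());
        }
        if self.sum.len() != other.sum.len() {
            return Err(format!(
                "dimension mismatch: {} vs {}",
                self.sum.len(),
                other.sum.len()
            ));
        }
        for (acc, x) in self.sum.iter_mut().zip(&other.sum) {
            *acc += *x;
        }
        self.count += other.count;
        Ok(())
    }

    /// Final element-wise average, or None if no rows were seen.
    fn evaluate(&self) -> Option<Vec<f64>> {
        if self.count == 0 {
            return None;
        }
        let n = self.count as f64;
        Some(self.sum.iter().map(|s| s / n).collect())
    }
}

fn main() {
    let mut a = VecAvgState::default();
    a.update(&[1.0, 2.0]).unwrap();
    a.update(&[3.0, 4.0]).unwrap();
    let mut b = VecAvgState::default();
    b.update(&[5.0, 6.0]).unwrap();
    a.merge(&b).unwrap();
    println!("{:?}", a.evaluate()); // Some([3.0, 4.0])
}
```

Merging partial averages directly would weight batches equally regardless of size: averaging `[2.0, 3.0]` (two rows) with `[5.0, 6.0]` (one row) gives `[3.5, 4.5]`, not the correct `[3.0, 4.0]`.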
Weny Xu
af6bbacc8c fix: add serde defaults for MetasrvNodeInfo (#7204)
* fix: add serde defaults for `MetasrvNodeInfo`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-11-07 09:50:09 +00:00
Yingwen
7616ffcb35 test: only set ttl to forever in fuzz alter test (#7202)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-07 07:32:53 +00:00
shuiyisong
a3dbd029c5 chore: remove ttl option if present in trace meta table (#7197)
* chore: remove ttl option if present in trace meta table

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-06 11:51:45 +00:00
Yingwen
9caeae391e chore: print root cause in opendal logging interceptor (#7183)
* chore: print root cause in opendal

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: extract a function root_source() to get the cause

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-06 08:48:59 +00:00
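#7183 logs the root cause in the opendal logging interceptor through an extracted `root_source()` helper. A std-only sketch of what such a helper can look like, walking `std::error::Error::source()` to the innermost error; it does not use opendal's error type, and `LayeredError` is a hypothetical type for the demo:

```rust
use std::error::Error;
use std::fmt;

/// Follow `Error::source()` until the innermost cause is reached.
fn root_source<'a>(err: &'a (dyn Error + 'static)) -> &'a (dyn Error + 'static) {
    let mut current = err;
    while let Some(next) = current.source() {
        current = next;
    }
    current
}

/// Illustrative error type that optionally wraps a cause.
#[derive(Debug)]
struct LayeredError {
    msg: &'static str,
    cause: Option<Box<dyn Error + 'static>>,
}

impl fmt::Display for LayeredError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.msg)
    }
}

impl Error for LayeredError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.cause.as_deref()
    }
}

fn main() {
    let io = LayeredError { msg: "connection reset by peer", cause: None };
    let read = LayeredError { msg: "read object failed", cause: Some(Box::new(io)) };
    // A logging interceptor would print the top-level error and its root cause.
    // Prints: error: read object failed, root cause: connection reset by peer
    println!("error: {}, root cause: {}", read, root_source(&read));
}
```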
fys
35951afff9 chore: remove unnecessary code related to triggers (#7192)
* chore: remove unused triggers memory tables

* fix: cargo clippy

* fix: sqlness
2025-11-06 08:09:14 +00:00
Ruihang Xia
a049b68c26 feat: import backup data from local files (#7180)
* feat: import backup data from local files

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add unit tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-11-06 07:33:33 +00:00
Lei, HUANG
c2ff563ac6 fix(mito): avoid shortcut in picking multi window files (#7174)
* fix/pick-continue:
 ### Add Tests for TWCS Compaction Logic

 - **`twcs.rs`**:
   - Modified the logic in `TwcsPicker` to handle cases with zero runs by using `continue` instead of `return`.
   - Added two new test cases: `test_build_output_multiple_windows_with_zero_runs` and `test_build_output_single_window_zero_runs` to verify the behavior of the compaction logic when there are zero runs in the windows.

 - **`memtable_util.rs`**:
   - Removed unused import `PredicateGroup`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: clippy

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/pick-continue:
 ### Enhance Compaction Process with Expired SST Handling and Testing

 - **`compactor.rs`**:
   - Introduced handling for expired SSTs by updating the manifest immediately upon task completion.
   - Added new test cases to verify the handling of expired SSTs and manifest updates.

 - **`task.rs`**:
   - Implemented `remove_expired` function to handle expired SSTs by updating the manifest and notifying the region worker loop.
   - Refactored `handle_compaction` to `handle_expiration_and_compaction` to integrate expired SST removal before merging inputs.
   - Added logging and error handling for expired SST removal process.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/progressive-compaction:
 **Enhance Compaction Task Error Handling**

 - Updated `task.rs` to conditionally execute the removal of expired SST files only when they exist, improving error handling and performance.
 - Added a check for non-empty `expired_ssts` before initiating the removal process, ensuring unnecessary operations are avoided.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/progressive-compaction:
 ### Refactor `DefaultCompactor` to Extract `merge_single_output` Method

 - **File**: `src/mito2/src/compaction/compactor.rs`
   - Extracted the logic for merging a single compaction output into SST files into a new method `merge_single_output` within the `DefaultCompactor` struct.
   - Simplified the `merge_ssts` method by utilizing the new `merge_single_output` method, reducing code duplication and improving maintainability.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/progressive-compaction:
 ### Add Max Background Compaction Tasks Configuration

 - **`compaction.rs`**: Added `max_background_compactions` to the compaction scheduler to limit background tasks.
 - **`compaction/compactor.rs`**: Removed immediate manifest update logic after task completion.
 - **`compaction/picker.rs`**: Introduced `max_background_tasks` parameter in `new_picker` to control task limits.
 - **`compaction/twcs.rs`**: Updated `TwcsPicker` to include `max_background_tasks` and truncate inputs exceeding this limit. Added related test cases to ensure functionality.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/pick-continue:
 ### Improve Error Handling and Task Management in Compaction

 - **`task.rs`**: Enhanced error handling in `remove_expired` function by logging errors without halting the compaction process. Removed the return of `Result` type and added detailed logging for various failure scenarios.
 - **`twcs.rs`**: Adjusted task management logic by removing input truncation based on `max_background_tasks` and instead discarding remaining tasks if the output size exceeds the limit. This ensures better control over task execution and resource management.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/pick-continue:
 ### Add Unit Tests for Compaction Task and TWCS Picker

 - **`task.rs`**: Added unit tests to verify the behavior of `PickerOutput` with and without expired SSTs.
 - **`twcs.rs`**: Introduced tests for `TwcsPicker` to ensure correct handling of `max_background_tasks` during compaction, including scenarios with and without task truncation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/pick-continue:
 **Improve Error Handling and Notification in Compaction Task**

 - **File:** `task.rs`
   - Changed log level from `warn` to `error` for manifest update failures to enhance error visibility.
   - Refactored the notification mechanism for expired file removal by using `BackgroundNotify::RegionEdit` with `RegionEditResult` to streamline the process.
   - Simplified error handling by consolidating match cases into a single `if let Err` block for better readability and maintainability.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-11-06 06:27:17 +00:00
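The first change in #7174 replaces an early `return` with `continue` when a window has zero runs, and a later one stops picking once `max_background_tasks` outputs exist. A simplified, hypothetical model of both rules (not `TwcsPicker` itself): every non-empty window becomes one candidate output, and `max_tasks` stands in for `max_background_tasks`.

```rust
/// Hypothetical, simplified view of one TWCS time window.
struct Window {
    id: i64,
    runs: usize, // number of sorted runs in the window
}

/// Collect ids of windows to compact, capped at `max_tasks`.
/// A window with zero runs is skipped with `continue`; a `return` there
/// would end the scan and silently drop every later window.
fn pick_windows(windows: &[Window], max_tasks: usize) -> Vec<i64> {
    let mut picked = Vec::new();
    for w in windows {
        if picked.len() >= max_tasks {
            break; // discard remaining candidates once the limit is reached
        }
        if w.runs == 0 {
            continue; // nothing to compact in this window
        }
        picked.push(w.id);
    }
    picked
}

fn main() {
    let windows = vec![
        Window { id: 1, runs: 0 },
        Window { id: 2, runs: 3 },
        Window { id: 3, runs: 2 },
        Window { id: 4, runs: 5 },
    ];
    println!("{:?}", pick_windows(&windows, 10)); // [2, 3, 4]
    println!("{:?}", pick_windows(&windows, 2)); // [2, 3]
}
```

With `return` in place of `continue`, the first call above would yield an empty list because window 1 is empty.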
Yingwen
82812ff19e test: add a unit test to scan data from memtable in append mode (#7193)
* test: add tests for scanning append mode before flush

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: extract a function maybe_dedup_one

Signed-off-by: evenyag <realevenyag@gmail.com>

* ci: add flat format to docs.yml so we can make it required later

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-06 06:11:58 +00:00
Ning Sun
4a77167138 chore: update readme (#7187) 2025-11-06 03:21:01 +00:00
Lei, HUANG
934df46f53 fix(mito): append mode in flat format not working (#7186)
* mito2: add unit test for flat single-range append_mode dedup behavior

Verify memtable_flat_sources skips dedup when append_mode is true and
performs dedup otherwise for single-range flat memtables, preventing
regressions in the new append_mode path.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/flat-source-merge:
 ### Improve Column Metadata Extraction Logic

 - **File**: `src/common/meta/src/ddl/utils.rs`
   - Modified the `extract_column_metadatas` function to use `swap_remove` for extracting the first schema and decode column metadata for comparison instead of raw bytes. This ensures that the extension map is considered during verification, enhancing the robustness of metadata consistency checks across datanodes.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-11-06 03:19:39 +00:00
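#7186 adds a test that `memtable_flat_sources` skips dedup when `append_mode` is true and dedups otherwise. The sketch below shows that rule in isolation, assuming rows already sorted by `(key, ts)` with duplicates in write order and a last-write-wins policy; `Row` and `dedup_unless_append` are hypothetical and not mito2 types.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: &'static str,
    ts: i64,
    value: f64,
}

/// Hypothetical helper. Rows are assumed sorted by (key, ts), duplicates in write order.
/// In append mode every row is kept; otherwise the last write for each (key, ts) wins.
fn dedup_unless_append(rows: Vec<Row>, append_mode: bool) -> Vec<Row> {
    if append_mode {
        return rows;
    }
    let mut out: Vec<Row> = Vec::with_capacity(rows.len());
    for row in rows {
        let is_dup = out
            .last()
            .map_or(false, |last| last.key == row.key && last.ts == row.ts);
        if is_dup {
            // Replace the earlier version with the newer write.
            *out.last_mut().unwrap() = row;
        } else {
            out.push(row);
        }
    }
    out
}

fn main() {
    let rows = vec![
        Row { key: "a", ts: 1, value: 1.0 },
        Row { key: "a", ts: 1, value: 2.0 },
        Row { key: "b", ts: 1, value: 3.0 },
    ];
    let deduped = dedup_unless_append(rows.clone(), false);
    let appended = dedup_unless_append(rows, true);
    // Prints: dedup: 2 rows, append: 3 rows
    println!("dedup: {} rows, append: {} rows", deduped.len(), appended.len());
}
```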
Ning Sun
fb92e4d0b2 feat: add greptime's arrow json extension type (#7168)
* feat: add arrow json extension type

* feat: add json structure settings to extension type

* refactor: store json structure settings as extension metadata

* chore: make binary an acceptable type for extension
2025-11-05 18:34:57 +00:00
Yingwen
0939dc1d32 test: run sqlness for flat format (#7178)
* test: support flat format in sqlness

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: replace region stats test result with NUM

Signed-off-by: evenyag <realevenyag@gmail.com>

* ci: add flat format to sqlness ci

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-05 11:23:12 +00:00
shuiyisong
50c9600ef8 fix: stabilize test results (#7182)
* fix: stabilize test results

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-05 09:19:23 +00:00
Lei, HUANG
abcfbd7f41 chore(metrics): add region server requests failures count metrics (#7173)
* chore/add-region-insert-failure-metric: Add metric for failed insert requests to region server in datanode module

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/add-region-insert-failure-metric:
 Add metric for tracking failed region server requests

 - Introduce a new metric `REGION_SERVER_REQUEST_FAILURE_COUNT` to count failed region server requests.
 - Update `REGION_SERVER_INSERT_FAIL_COUNT` metric description for consistency.
 - Implement error handling in `RegionServerHandler` to increment the new failure metric on request errors.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-11-05 07:23:40 +00:00
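#7173 increments a failure counter whenever `RegionServerHandler` sees a request error. A std-only sketch of that wrapping pattern, using an `AtomicU64` as a stand-in for the Prometheus counter (the `prometheus` crate appears in this range's Cargo.lock, but its API is not used here); `handle_with_failure_metric` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for a failure counter like REGION_SERVER_REQUEST_FAILURE_COUNT.
static REQUEST_FAILURE_COUNT: AtomicU64 = AtomicU64::new(0);

/// Run a request handler and bump the failure counter if it returns `Err`.
/// The result is passed through unchanged, so callers still see the error.
fn handle_with_failure_metric<T, E>(handler: impl FnOnce() -> Result<T, E>) -> Result<T, E> {
    let result = handler();
    if result.is_err() {
        REQUEST_FAILURE_COUNT.fetch_add(1, Ordering::Relaxed);
    }
    result
}

fn main() {
    let ok: Result<u32, String> = handle_with_failure_metric(|| Ok(1));
    let err: Result<u32, String> =
        handle_with_failure_metric(|| Err("region not found".to_string()));
    println!(
        "ok={:?} err={:?} failures={}",
        ok,
        err,
        REQUEST_FAILURE_COUNT.load(Ordering::Relaxed)
    );
}
```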
Ruihang Xia
aac3ede261 feat: allow creating logical table with the same partition rule as the physical table (#7177)
* feat: allow creating logical table with the same partition rule as the physical table

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-11-05 06:37:17 +00:00
Yingwen
3001c2d719 feat: BulkMemtable stores small fragments in another buffer (#7164)
* feat: buffer small parts in bulk memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use assert_eq instead of assert

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: collect bulk memtable scan metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: report metrics early

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-05 06:35:32 +00:00
shuiyisong
6caff50d01 chore: improve search traces and jaeger resp (#7166)
* chore: add jaeger field in trace query

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update search v1 with tags

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update col matching using col names

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: minify code with macro

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fix test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: change macro to inline function

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fix filter with tags & add test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-11-04 05:49:08 +00:00
ZonaHe
421f4eec05 feat: update dashboard to v0.11.7 (#7170)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
2025-11-04 02:52:26 +00:00
Yingwen
d944e5c6b8 test: add sqlness for delete and filter (#7171)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-04 02:13:47 +00:00
fys
013d61acbb chore(deps): remove sqlx pg feature in greptimedb build (#7172)
* chore(deps): remove sqlx pg feature in greptimedb build

* fix: ci
2025-11-03 18:49:00 +00:00
LFC
b7e834ab92 refactor: convert to influxdb values directly from arrow (#7163)
* refactor: convert to influxdb values directly from arrow

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-11-03 07:52:37 +00:00
LFC
5eab9a1be3 feat: json vector builder (#7151)
* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

Update src/datatypes/src/vectors/json/builder.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

feat: json vector builder

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-11-03 06:06:54 +00:00
Weny Xu
9de680f456 refactor: add support for batch region upgrade operations part2 (#7160)
* add tests for metric engines

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: catchup in background

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: replace sequential catchup with batch processing

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* remove single catchup

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: remove unused error

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine catchup tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-11-03 06:01:38 +00:00
Ning Sun
5deaaa59ec chore: fix typo (#7169) 2025-11-03 02:22:34 +00:00
dennis zhuang
61724386ef fix: potential failure in tests (#7167)
* fix: potential failure in the test_index_build_type_compact test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: relax timestamp checking in test_timestamp_default_now

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-10-31 22:08:59 +00:00
Weny Xu
6960a0183a refactor: add support for batch region upgrade operations part1 (#7155)
* refactor: convert UpgradeRegion instruction to batch operation

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: introduce `handle_batch_catchup_requests` fn for mito engine

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: introduce `handle_batch_catchup_requests` fn for metric engine

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: suggestion and add ser/de tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-10-31 03:08:38 +00:00
Sicong Hu
30894d7599 feat(mito): Optimize async index building with priority-based batching (#7034)
* feat: add priority-based batching to IndexBuildScheduler

Signed-off-by: SNC123 <sinhco@outlook.com>

* fix: clean old puffin-related cache

Signed-off-by: SNC123 <sinhco@outlook.com>

* test: add test for IndexBuildScheduler

Signed-off-by: SNC123 <sinhco@outlook.com>

* feat: different index file id for read and async write

Signed-off-by: SNC123 <sinhco@outlook.com>

* feat: different index file id for delete

Signed-off-by: SNC123 <sinhco@outlook.com>

* chore: clippy

Signed-off-by: SNC123 <sinhco@outlook.com>

* fix: apply suggestions

Signed-off-by: SNC123 <sinhco@outlook.com>

* fix: apply comments

Signed-off-by: SNC123 <sinhco@outlook.com>

* combine files and index files

Signed-off-by: SNC123 <sinhco@outlook.com>

* feat: add index_file_id into ManifestSstEntry

Signed-off-by: SNC123 <sinhco@outlook.com>

* Update src/mito2/src/gc.rs

Signed-off-by: SNC123 <sinhco@outlook.com>

* resolve conflicts

Signed-off-by: SNC123 <sinhco@outlook.com>

* fix: sqlness

Signed-off-by: SNC123 <sinhco@outlook.com>

* chore: fmt

Signed-off-by: SNC123 <sinhco@outlook.com>

---------

Signed-off-by: SNC123 <sinhco@outlook.com>
2025-10-31 02:13:17 +00:00
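#7034 adds priority-based batching to `IndexBuildScheduler`. The log does not say which priorities exist or how batches are sized, so the following is a generic sketch: a `BinaryHeap` max-heap ordered by a hypothetical `Priority` enum with FIFO tie-breaking, drained in batches of at most `max_batch` tasks. `IndexBuildQueue`, `IndexBuildTask`, and `next_batch` are illustrative names, not mito2's API.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical priority levels; the derived `Ord` gives Low < Medium < High.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    Low,
    Medium,
    High,
}

#[derive(Debug, PartialEq, Eq)]
struct IndexBuildTask {
    priority: Priority,
    seq: u64,     // submission order, used as a FIFO tie-breaker
    file_id: u64, // illustrative payload
}

impl Ord for IndexBuildTask {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap: higher priority wins, and for equal
        // priority the smaller (older) seq compares as greater.
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for IndexBuildTask {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

#[derive(Default)]
struct IndexBuildQueue {
    heap: BinaryHeap<IndexBuildTask>,
    next_seq: u64,
}

impl IndexBuildQueue {
    fn submit(&mut self, priority: Priority, file_id: u64) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(IndexBuildTask { priority, seq, file_id });
    }

    /// Pop up to `max_batch` tasks, highest priority first.
    fn next_batch(&mut self, max_batch: usize) -> Vec<IndexBuildTask> {
        let mut batch = Vec::new();
        while batch.len() < max_batch {
            match self.heap.pop() {
                Some(task) => batch.push(task),
                None => break,
            }
        }
        batch
    }
}

fn main() {
    let mut q = IndexBuildQueue::default();
    q.submit(Priority::Low, 1);
    q.submit(Priority::High, 2);
    q.submit(Priority::Medium, 3);
    q.submit(Priority::High, 4);
    let first: Vec<u64> = q.next_batch(3).iter().map(|t| t.file_id).collect();
    println!("{:?}", first); // [2, 4, 3]
}
```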
Yingwen
acf38a7091 fix: avoid filtering rows with delete op by fields under merge mode (#7154)
* chore: clear allow dead_code for flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: pass exprs to build appliers

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: split field filters and index appliers

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support skip filtering fields in RowGroupPruningStats

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add PreFilterMode to config whether to skip filtering fields

Adds the PreFilterMode to the RangeBase and sets it in
ParquetReaderBuilder

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support skipping fields in prune reader

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support pre filter mode in bulk memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: pass PreFilterMode to memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test mito filter delete

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove commented code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move predicate and sequence to RangesOptions

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* ci: skip cargo gc

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix cargo build warning

Signed-off-by: evenyag <realevenyag@gmail.com>

* Revert "ci: skip cargo gc"

This reverts commit 1ec9594a6d.

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-10-30 12:14:45 +00:00
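#7154 stops field filters from dropping rows that carry a delete op under merge mode. A toy model of the hazard, assuming delete markers carry no field value and that merge keeps the highest sequence per key; `PreFilterMode` here only mirrors the name introduced in the commit, and its variants, `Row`, `pre_filter`, and `merge` are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum OpType {
    Put,
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: u32,
    seq: u64,
    op: OpType,
    field: Option<i64>, // delete markers carry no field value in this sketch
}

/// Hypothetical mirror of the idea behind `PreFilterMode`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PreFilterMode {
    All,        // apply field predicates before merging
    SkipFields, // defer field predicates until after merging
}

fn pre_filter(rows: Vec<Row>, mode: PreFilterMode, pred: impl Fn(Option<i64>) -> bool) -> Vec<Row> {
    match mode {
        PreFilterMode::All => rows.into_iter().filter(|r| pred(r.field)).collect(),
        PreFilterMode::SkipFields => rows,
    }
}

/// Merge mode: for each key keep the row with the highest sequence, then drop deletes.
fn merge(rows: Vec<Row>) -> Vec<Row> {
    let mut latest: BTreeMap<u32, Row> = BTreeMap::new();
    for r in rows {
        let keep_existing = latest.get(&r.key).map_or(false, |cur| cur.seq >= r.seq);
        if !keep_existing {
            latest.insert(r.key, r);
        }
    }
    latest.into_values().filter(|r| r.op == OpType::Put).collect()
}

fn main() {
    let rows = vec![
        Row { key: 1, seq: 1, op: OpType::Put, field: Some(10) },
        Row { key: 1, seq: 2, op: OpType::Delete, field: None },
    ];
    let pred = |f: Option<i64>| f.map_or(false, |v| v > 5);

    // Filtering fields first drops the delete marker, so the old Put resurfaces.
    let wrong = merge(pre_filter(rows.clone(), PreFilterMode::All, pred));
    // Deferring field filters until after the merge keeps the delete effective.
    let right: Vec<Row> = merge(pre_filter(rows, PreFilterMode::SkipFields, pred))
        .into_iter()
        .filter(|r| pred(r.field))
        .collect();
    // Prints: pre-filtered: 1 row(s), deferred: 0 row(s)
    println!("pre-filtered: {} row(s), deferred: {} row(s)", wrong.len(), right.len());
}
```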
LFC
109b70750a refactor: convert to prometheus values directly from arrow (#7153)
* refactor: convert to prometheus values directly from arrow

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-10-30 10:24:12 +00:00
shuiyisong
ee5b7ff3c8 chore: unify initialization of channel manager (#7159)
* chore: unify initialization of channel manager and extract loading tls

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fix cr issue

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-10-30 04:26:02 +00:00
liyang
5d0ef376de fix: initializer container not working (#7152)
* fix: initializer not working

Signed-off-by: liyang <daviderli614@gmail.com>

* use a single version of the operator

Signed-off-by: liyang <daviderli614@gmail.com>

---------

Signed-off-by: liyang <daviderli614@gmail.com>
2025-10-29 18:11:55 +00:00
shuiyisong
11c0381fc1 chore: set default catalog using build env (#7156)
* chore: update reference to const

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: use option_env to set default catalog

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: use const_format

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update reference in cli

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: introduce a build.rs to set default catalog

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove unused feature gate

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-10-29 18:10:58 +00:00
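#7156 makes the default catalog configurable at build time with `option_env!` and a `build.rs`. A minimal sketch of the `option_env!` part, assuming a hypothetical variable name `EXAMPLE_DEFAULT_CATALOG` and `greptime` as the fallback (GreptimeDB's usual default catalog name); the real variable name is not given in this log.

```rust
/// Resolved at compile time: if the (hypothetical) EXAMPLE_DEFAULT_CATALOG
/// variable is set when building, its value is baked in; otherwise fall back.
const DEFAULT_CATALOG_NAME: &str = match option_env!("EXAMPLE_DEFAULT_CATALOG") {
    Some(name) => name,
    None => "greptime",
};

fn main() {
    println!("default catalog: {}", DEFAULT_CATALOG_NAME);
}
```

A build script can set such a variable for the crate being compiled by printing `cargo:rustc-env=NAME=VALUE`. Because `concat!` accepts only literals, deriving further constants from `DEFAULT_CATALOG_NAME` needs something like `const_format::concatcp!`, which is plausibly why the commit pulls in `const_format`.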
LFC
e8b7b0ad16 fix: memtable value push result was ignored (#7136)
* fix: memtable value push result was ignored

Signed-off-by: luofucong <luofc@foxmail.com>

* chore: apply suggestion

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-10-29 13:44:36 +00:00
Weny Xu
6efffa427d fix: missing flamegraph feature in pprof dependency (#7158)
fix: fix pprof deps

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-10-29 11:41:21 +00:00
Ruihang Xia
6576e3555d fix: cache estimate methods (#7157)
* fix: cache estimate methods

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* revert page value change

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Apply suggestion from @evenyag

Co-authored-by: Yingwen <realevenyag@gmail.com>

* update test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-10-29 09:57:28 +00:00
Lei, HUANG
f0afd675e3 feat: objbench sub command for datanode (#7114)
* feat/objbench-subcmd:
 ### Add Object Storage Benchmark Tool and Update Dependencies

 - **`Cargo.lock` & `Cargo.toml`**: Added dependencies for `colored`, `parquet`, and `pprof` to support new features.
 - **`datanode.rs`**: Introduced `ObjbenchCommand` for benchmarking object storage, including command-line options for configuration and execution. Added `StorageConfig` and `StorageConfigWrapper` for storage engine configuration.
 - **`datanode.rs`**: Implemented a stub for `build_object_store` function to initialize object storage.

 These changes introduce a new subcommand for object storage benchmarking and update dependencies to support additional functionality.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* init

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: code style and clippy

* feat/objbench-subcmd:
 Improve error handling in `objbench.rs`

 - Enhanced error handling in `parse_config` and `parse_file_dir_components` functions by replacing `unwrap` with `OptionExt` and `context` for better error messages.
 - Updated `build_access_layer_simple` and `build_cache_manager` functions to use `map_err` for more descriptive error handling.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: rebase main

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-10-29 05:26:29 +00:00
discord9
37bc2e6b07 feat: gc worker heartbeat instruction (#7118)
again
false by default
test: config api
refactor: per code review
less info!
even less info!!
docs: gc regions instr
refactor: grp by region id
per code review
per review
error handling?
test: fix
todos
after rebase fix
after refactor

Signed-off-by: discord9 <discord9@163.com>
2025-10-29 02:59:36 +00:00
Ning Sun
a9d1d33138 feat: update datafusion-pg-catalog for better dbeaver support (#7143)
* chore: update datafusion-pg-catalog to 0.12.1

* feat: import more udfs
2025-10-28 18:42:03 +00:00
discord9
22d9eb6930 feat: part sort provide dyn filter (#7140)
* feat: part sort provide dyn filter

Signed-off-by: discord9 <discord9@163.com>

* fix: reset_state reset dynamic filter

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-10-28 02:44:29 +00:00
shuiyisong
da976e534d refactor: add test feature gate to numbers table (#7148)
* refactor: add test feature gate to numbers table

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add debug_assertions

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* refactor: extract numbers table provider

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issues

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-10-27 10:16:00 +00:00
203 changed files with 9768 additions and 1990 deletions


@@ -48,22 +48,22 @@ runs:
path: /tmp/greptime-*.log
retention-days: 3
- - name: Build greptime # Builds standard greptime binary
- uses: ./.github/actions/build-greptime-binary
- with:
- base-image: ubuntu
- features: servers/dashboard
- cargo-profile: ${{ inputs.cargo-profile }}
- artifacts-dir: greptime-linux-${{ inputs.arch }}-${{ inputs.version }}
- version: ${{ inputs.version }}
- working-dir: ${{ inputs.working-dir }}
- image-registry: ${{ inputs.image-registry }}
- image-namespace: ${{ inputs.image-namespace }}
+ # - name: Build greptime # Builds standard greptime binary
+ # uses: ./.github/actions/build-greptime-binary
+ # with:
+ # base-image: ubuntu
+ # features: servers/dashboard
+ # cargo-profile: ${{ inputs.cargo-profile }}
+ # artifacts-dir: greptime-linux-${{ inputs.arch }}-${{ inputs.version }}
+ # version: ${{ inputs.version }}
+ # working-dir: ${{ inputs.working-dir }}
+ # image-registry: ${{ inputs.image-registry }}
+ # image-namespace: ${{ inputs.image-namespace }}
- - name: Clean up the target directory # Clean up the target directory for the centos7 base image, or it will still use the objects of last build.
- shell: bash
- run: |
- rm -rf ./target/
+ # - name: Clean up the target directory # Clean up the target directory for the centos7 base image, or it will still use the objects of last build.
+ # shell: bash
+ # run: |
+ # rm -rf ./target/
- name: Build greptime on centos base image
uses: ./.github/actions/build-greptime-binary
@@ -78,14 +78,14 @@ runs:
image-registry: ${{ inputs.image-registry }}
image-namespace: ${{ inputs.image-namespace }}
- - name: Build greptime on android base image
- uses: ./.github/actions/build-greptime-binary
- if: ${{ inputs.arch == 'amd64' && inputs.dev-mode == 'false' }} # Builds arm64 greptime binary for android if the host machine amd64.
- with:
- base-image: android
- artifacts-dir: greptime-android-arm64-${{ inputs.version }}
- version: ${{ inputs.version }}
- working-dir: ${{ inputs.working-dir }}
- build-android-artifacts: true
- image-registry: ${{ inputs.image-registry }}
- image-namespace: ${{ inputs.image-namespace }}
+ # - name: Build greptime on android base image
+ # uses: ./.github/actions/build-greptime-binary
+ # if: ${{ inputs.arch == 'amd64' && inputs.dev-mode == 'false' }} # Builds arm64 greptime binary for android if the host machine amd64.
+ # with:
+ # base-image: android
+ # artifacts-dir: greptime-android-arm64-${{ inputs.version }}
+ # version: ${{ inputs.version }}
+ # working-dir: ${{ inputs.working-dir }}
+ # build-android-artifacts: true
+ # image-registry: ${{ inputs.image-registry }}
+ # image-namespace: ${{ inputs.image-namespace }}


@@ -7,6 +7,8 @@ KUBERNETES_VERSION="${KUBERNETES_VERSION:-v1.32.0}"
ENABLE_STANDALONE_MODE="${ENABLE_STANDALONE_MODE:-true}"
DEFAULT_INSTALL_NAMESPACE=${DEFAULT_INSTALL_NAMESPACE:-default}
GREPTIMEDB_IMAGE_TAG=${GREPTIMEDB_IMAGE_TAG:-latest}
+ GREPTIMEDB_OPERATOR_IMAGE_TAG=${GREPTIMEDB_OPERATOR_IMAGE_TAG:-v0.5.1}
+ GREPTIMEDB_INITIALIZER_IMAGE_TAG="${GREPTIMEDB_OPERATOR_IMAGE_TAG}"
GREPTIME_CHART="https://greptimeteam.github.io/helm-charts/"
ETCD_CHART="oci://registry-1.docker.io/bitnamicharts/etcd"
ETCD_CHART_VERSION="${ETCD_CHART_VERSION:-12.0.8}"
@@ -58,7 +60,7 @@ function deploy_greptimedb_operator() {
# Use the latest chart and image.
helm upgrade --install greptimedb-operator greptime/greptimedb-operator \
--create-namespace \
- --set image.tag=latest \
+ --set image.tag="$GREPTIMEDB_OPERATOR_IMAGE_TAG" \
-n "$DEFAULT_INSTALL_NAMESPACE"
# Wait for greptimedb-operator to be ready.
@@ -78,6 +80,7 @@ function deploy_greptimedb_cluster() {
helm upgrade --install "$cluster_name" greptime/greptimedb-cluster \
--create-namespace \
--set image.tag="$GREPTIMEDB_IMAGE_TAG" \
+ --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
--set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
--set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
-n "$install_namespace"
@@ -115,6 +118,7 @@ function deploy_greptimedb_cluster_with_s3_storage() {
helm upgrade --install "$cluster_name" greptime/greptimedb-cluster -n "$install_namespace" \
--create-namespace \
--set image.tag="$GREPTIMEDB_IMAGE_TAG" \
+ --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
--set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
--set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
--set objectStorage.s3.bucket="$AWS_CI_TEST_BUCKET" \


@@ -177,7 +177,7 @@ jobs:
cargo-profile: ${{ env.CARGO_PROFILE }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
- dev-mode: true # Only build the standard greptime binary.
+ dev-mode: false # Only build the standard greptime binary.
working-dir: ${{ env.CHECKOUT_GREPTIMEDB_PATH }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}


@@ -613,6 +613,9 @@ jobs:
- name: "MySQL Kvbackend"
opts: "--setup-mysql"
kafka: false
+ - name: "Flat format"
+ opts: "--enable-flat-format"
+ kafka: false
timeout-minutes: 60
steps:
- uses: actions/checkout@v4
@@ -808,7 +811,7 @@ jobs:
- name: Setup external services
working-directory: tests-integration/fixtures
run: ../../.github/scripts/pull-test-deps-images.sh && docker compose up -d --wait
- name: Run nextest cases
run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F dashboard -F pg_kvbackend -F mysql_kvbackend
env:


@@ -92,5 +92,6 @@ jobs:
mode:
- name: "Basic"
- name: "Remote WAL"
+ - name: "Flat format"
steps:
- run: 'echo "No action required"'

69
Cargo.lock generated

@@ -214,6 +214,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
name = "api"
version = "0.18.0"
dependencies = [
+ "arrow-schema",
"common-base",
"common-decimal",
"common-error",
@@ -1629,6 +1630,7 @@ dependencies = [
"chrono",
"chrono-tz-build",
"phf 0.11.3",
+ "uncased",
]
[[package]]
@@ -1639,6 +1641,8 @@ checksum = "8f10f8c9340e31fc120ff885fcdb54a0b48e474bbd77cab557f0c30a3e569402"
dependencies = [
"parse-zoneinfo",
"phf_codegen 0.11.3",
+ "phf_shared 0.11.3",
+ "uncased",
]
[[package]]
@@ -1896,6 +1900,7 @@ dependencies = [
"clap 4.5.40",
"cli",
"client",
+ "colored",
"common-base",
"common-catalog",
"common-config",
@@ -1917,6 +1922,7 @@ dependencies = [
"common-wal",
"datanode",
"datatypes",
+ "either",
"etcd-client",
"file-engine",
"flow",
@@ -1932,7 +1938,9 @@ dependencies = [
"moka",
"nu-ansi-term",
"object-store",
+ "parquet",
"plugins",
+ "pprof",
"prometheus",
"prost 0.13.5",
"query",
@@ -1975,6 +1983,16 @@ version = "1.0.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b05b61dc5112cbb17e4b6cd61790d9845d13888356391624cbe7e41efeac1e75"
+ [[package]]
+ name = "colored"
+ version = "2.2.0"
+ source = "registry+https://github.com/rust-lang/crates.io-index"
+ checksum = "117725a109d387c937a1533ce01b450cbde6b88abceea8473c4d7a85853cda3c"
+ dependencies = [
+ "lazy_static",
+ "windows-sys 0.59.0",
+ ]
[[package]]
name = "comfy-table"
version = "7.1.2"
@@ -2019,6 +2037,9 @@ dependencies = [
[[package]]
name = "common-catalog"
version = "0.18.0"
dependencies = [
"const_format",
]
[[package]]
name = "common-config"
@@ -3717,9 +3738,9 @@ dependencies = [
[[package]]
name = "datafusion-pg-catalog"
version = "0.11.0"
version = "0.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f258caedd1593e7dca3bf53912249de6685fa224bcce897ede1fbb7b040ac6f6"
checksum = "15824c98ff2009c23b0398d441499b147f7c5ac0e5ee993e7a473d79040e3626"
dependencies = [
"async-trait",
"datafusion",
@@ -6307,17 +6328,6 @@ dependencies = [
"derive_utils",
]
[[package]]
name = "io-uring"
version = "0.7.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d93587f37623a1a17d94ef2bc9ada592f5465fe7732084ab7beefabe5c77c0c4"
dependencies = [
"bitflags 2.9.1",
"cfg-if",
"libc",
]
[[package]]
name = "ipnet"
version = "2.11.0"
@@ -7498,6 +7508,7 @@ dependencies = [
"common-telemetry",
"common-test-util",
"common-time",
"common-wal",
"datafusion",
"datatypes",
"futures-util",
@@ -9265,6 +9276,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "67eabc2ef2a60eb7faa00097bd1ffdb5bd28e62bf39990626a582201b7a754e5"
dependencies = [
"siphasher",
"uncased",
]
[[package]]
@@ -13256,23 +13268,20 @@ checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
[[package]]
name = "tokio"
version = "1.47.1"
version = "1.48.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "89e49afdadebb872d3145a5638b59eb0691ea23e46ca484037cfab3b76b95038"
checksum = "ff360e02eab121e0bc37a2d3b4d4dc622e6eda3a8e5253d5435ecf5bd4c68408"
dependencies = [
"backtrace",
"bytes",
"io-uring",
"libc",
"mio",
"parking_lot 0.12.4",
"pin-project-lite",
"signal-hook-registry",
"slab",
"socket2 0.6.0",
"tokio-macros",
"tracing",
"windows-sys 0.59.0",
"windows-sys 0.61.2",
]
[[package]]
@@ -13287,9 +13296,9 @@ dependencies = [
[[package]]
name = "tokio-macros"
version = "2.5.0"
version = "2.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6e06d43f1345a3bcd39f6a56dbb7dcab2ba47e68e8ac134855e7e2bdbaf8cab8"
checksum = "af407857209536a95c8e56f8231ef2c2e2aff839b22e07a1ffcbc617e9db9fa5"
dependencies = [
"proc-macro2",
"quote",
@@ -13967,6 +13976,15 @@ dependencies = [
"serde",
]
[[package]]
name = "uncased"
version = "0.9.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e1b88fcfe09e89d3866a5c11019378088af2d24c3fbd4f0543f96b479ec90697"
dependencies = [
"version_check",
]
[[package]]
name = "unescaper"
version = "0.1.6"
@@ -14711,6 +14729,15 @@ dependencies = [
"windows-targets 0.52.6",
]
[[package]]
name = "windows-sys"
version = "0.61.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc"
dependencies = [
"windows-link 0.2.1",
]
[[package]]
name = "windows-targets"
version = "0.48.5"

@@ -118,9 +118,10 @@ bitflags = "2.4.1"
bytemuck = "1.12"
bytes = { version = "1.7", features = ["serde"] }
chrono = { version = "0.4", features = ["serde"] }
chrono-tz = "0.10.1"
chrono-tz = { version = "0.10.1", features = ["case-insensitive"] }
clap = { version = "4.4", features = ["derive"] }
config = "0.13.0"
const_format = "0.2"
crossbeam-utils = "0.8"
dashmap = "6.1"
datafusion = "50"
@@ -130,7 +131,7 @@ datafusion-functions = "50"
datafusion-functions-aggregate-common = "50"
datafusion-optimizer = "50"
datafusion-orc = "0.5"
datafusion-pg-catalog = "0.11"
datafusion-pg-catalog = "0.12.1"
datafusion-physical-expr = "50"
datafusion-physical-plan = "50"
datafusion-sql = "50"
@@ -218,12 +219,7 @@ similar-asserts = "1.6.0"
smallvec = { version = "1", features = ["serde"] }
snafu = "0.8"
sqlparser = { version = "0.58.0", default-features = false, features = ["std", "visitor", "serde"] }
sqlx = { version = "0.8", features = [
"runtime-tokio-rustls",
"mysql",
"postgres",
"chrono",
] }
sqlx = { version = "0.8", default-features = false, features = ["any", "macros", "json", "runtime-tokio-rustls"] }
strum = { version = "0.27", features = ["derive"] }
sysinfo = "0.33"
tempfile = "3"
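
The workspace manifest now enables chrono-tz's `case-insensitive` feature. Judging from the Cargo.lock entries above, this pulls in the `uncased` crate for chrono-tz, chrono-tz-build and phf. The std-only sketch below only illustrates the intended behaviour. It assumes a hypothetical `find_zone` helper and a hand-written zone list, and it does not reflect how chrono-tz implements the lookup.

```rust
/// Hypothetical subset of IANA zone names, for illustration only.
const ZONES: &[&str] = &["UTC", "Asia/Shanghai", "America/New_York", "Europe/Berlin"];

/// Returns the canonical zone name whose spelling matches `input`
/// ignoring ASCII case, or `None` if no zone matches.
fn find_zone(input: &str) -> Option<&'static str> {
    ZONES
        .iter()
        .copied()
        .find(|zone| zone.eq_ignore_ascii_case(input))
}

fn main() {
    // Mixed-case user input still resolves to the canonical spelling.
    assert_eq!(find_zone("asia/shanghai"), Some("Asia/Shanghai"));
    assert_eq!(find_zone("AMERICA/NEW_YORK"), Some("America/New_York"));
    // Unknown zones are still rejected.
    assert_eq!(find_zone("Mars/Olympus"), None);
    println!("{:?}", find_zone("europe/berlin"));
}
```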

@@ -12,7 +12,6 @@
<div align="center">
<h3 align="center">
<a href="https://greptime.com/product/cloud">GreptimeCloud</a> |
<a href="https://docs.greptime.com/">User Guide</a> |
<a href="https://greptimedb.rs/">API Docs</a> |
<a href="https://github.com/GreptimeTeam/greptimedb/issues/5446">Roadmap 2025</a>
@@ -105,16 +104,6 @@ Read [more benchmark reports](https://docs.greptime.com/user-guide/concepts/feat
## Try GreptimeDB
### 1. [Live Demo](https://greptime.com/playground)
Experience GreptimeDB directly in your browser.
### 2. [GreptimeCloud](https://console.greptime.cloud/)
Start instantly with a free cluster.
### 3. Docker (Local Quickstart)
```shell
docker pull greptime/greptimedb
```

@@ -8,6 +8,7 @@ license.workspace = true
workspace = true
[dependencies]
arrow-schema.workspace = true
common-base.workspace = true
common-decimal.workspace = true
common-error.workspace = true

@@ -14,10 +14,11 @@
use std::collections::HashMap;
use arrow_schema::extension::{EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY};
use datatypes::schema::{
COMMENT_KEY, ColumnDefaultConstraint, ColumnSchema, FULLTEXT_KEY, FulltextAnalyzer,
FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, JSON_STRUCTURE_SETTINGS_KEY,
SKIPPING_INDEX_KEY, SkippingIndexOptions, SkippingIndexType,
FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, SKIPPING_INDEX_KEY, SkippingIndexOptions,
SkippingIndexType,
};
use greptime_proto::v1::{
Analyzer, FulltextBackend as PbFulltextBackend, SkippingIndexType as PbSkippingIndexType,
@@ -68,8 +69,14 @@ pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
if let Some(skipping_index) = options.options.get(SKIPPING_INDEX_GRPC_KEY) {
metadata.insert(SKIPPING_INDEX_KEY.to_string(), skipping_index.to_owned());
}
if let Some(settings) = options.options.get(JSON_STRUCTURE_SETTINGS_KEY) {
metadata.insert(JSON_STRUCTURE_SETTINGS_KEY.to_string(), settings.clone());
if let Some(extension_name) = options.options.get(EXTENSION_TYPE_NAME_KEY) {
metadata.insert(EXTENSION_TYPE_NAME_KEY.to_string(), extension_name.clone());
}
if let Some(extension_metadata) = options.options.get(EXTENSION_TYPE_METADATA_KEY) {
metadata.insert(
EXTENSION_TYPE_METADATA_KEY.to_string(),
extension_metadata.clone(),
);
}
}
@@ -142,10 +149,16 @@ pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<Column
.options
.insert(SKIPPING_INDEX_GRPC_KEY.to_string(), skipping_index.clone());
}
if let Some(settings) = column_schema.metadata().get(JSON_STRUCTURE_SETTINGS_KEY) {
if let Some(extension_name) = column_schema.metadata().get(EXTENSION_TYPE_NAME_KEY) {
options
.options
.insert(JSON_STRUCTURE_SETTINGS_KEY.to_string(), settings.clone());
.insert(EXTENSION_TYPE_NAME_KEY.to_string(), extension_name.clone());
}
if let Some(extension_metadata) = column_schema.metadata().get(EXTENSION_TYPE_METADATA_KEY) {
options.options.insert(
EXTENSION_TYPE_METADATA_KEY.to_string(),
extension_metadata.clone(),
);
}
(!options.options.is_empty()).then_some(options)
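
With this change, the gRPC column options and the column schema metadata carry Arrow's extension-type keys instead of `JSON_STRUCTURE_SETTINGS_KEY`. Each key is copied only when it is present. The dependency-free sketch below performs that copy in both directions. The key strings are the values of the `arrow_schema::extension` constants defined by the Arrow extension-type convention. `copy_extension_keys` and the `greptime.json` extension name are hypothetical.

```rust
use std::collections::HashMap;

// Values of arrow_schema::extension::{EXTENSION_TYPE_NAME_KEY, EXTENSION_TYPE_METADATA_KEY},
// redefined here so the sketch has no dependencies.
const EXTENSION_TYPE_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_TYPE_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Copies the extension-type entries that exist in `src` into `dst`,
/// mirroring the two `if let Some(..) = get(..)` blocks in the diff.
fn copy_extension_keys(src: &HashMap<String, String>, dst: &mut HashMap<String, String>) {
    for key in [EXTENSION_TYPE_NAME_KEY, EXTENSION_TYPE_METADATA_KEY] {
        if let Some(value) = src.get(key) {
            dst.insert(key.to_string(), value.clone());
        }
    }
}

fn main() {
    // gRPC column options as they might arrive (values are illustrative).
    let mut grpc_options = HashMap::new();
    grpc_options.insert(EXTENSION_TYPE_NAME_KEY.to_string(), "greptime.json".to_string());
    grpc_options.insert("unrelated".to_string(), "ignored".to_string());

    // gRPC options -> column schema metadata: only the extension key is copied.
    let mut metadata = HashMap::new();
    copy_extension_keys(&grpc_options, &mut metadata);
    assert_eq!(metadata.len(), 1);

    // Column schema metadata -> gRPC options: the round trip preserves the entry.
    let mut back = HashMap::new();
    copy_extension_keys(&metadata, &mut back);
    assert_eq!(
        back.get(EXTENSION_TYPE_NAME_KEY).map(String::as_str),
        Some("greptime.json")
    );
}
```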

@@ -29,6 +29,7 @@ use crate::information_schema::{InformationExtensionRef, InformationSchemaProvid
use crate::kvbackend::KvBackendCatalogManager;
use crate::kvbackend::manager::{CATALOG_CACHE_MAX_CAPACITY, SystemCatalog};
use crate::process_manager::ProcessManagerRef;
use crate::system_schema::numbers_table_provider::NumbersTableProvider;
use crate::system_schema::pg_catalog::PGCatalogProvider;
pub struct KvBackendCatalogManagerBuilder {
@@ -119,6 +120,7 @@ impl KvBackendCatalogManagerBuilder {
DEFAULT_CATALOG_NAME.to_string(),
me.clone(),
)),
numbers_table_provider: NumbersTableProvider,
backend,
process_manager,
#[cfg(feature = "enterprise")]

@@ -18,8 +18,7 @@ use std::sync::{Arc, Weak};
use async_stream::try_stream;
use common_catalog::consts::{
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, NUMBERS_TABLE_ID,
PG_CATALOG_NAME,
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, PG_CATALOG_NAME,
};
use common_error::ext::BoxedError;
use common_meta::cache::{
@@ -45,7 +44,6 @@ use table::TableRef;
use table::dist_table::DistTable;
use table::metadata::{TableId, TableInfoRef};
use table::table::PartitionRules;
use table::table::numbers::{NUMBERS_TABLE_NAME, NumbersTable};
use table::table_name::TableName;
use tokio::sync::Semaphore;
use tokio_stream::wrappers::ReceiverStream;
@@ -61,6 +59,7 @@ use crate::information_schema::{InformationExtensionRef, InformationSchemaProvid
use crate::kvbackend::TableCacheRef;
use crate::process_manager::ProcessManagerRef;
use crate::system_schema::SystemSchemaProvider;
use crate::system_schema::numbers_table_provider::NumbersTableProvider;
use crate::system_schema::pg_catalog::PGCatalogProvider;
/// Access all existing catalog, schema and tables.
@@ -555,6 +554,7 @@ pub(super) struct SystemCatalog {
// system_schema_provider for default catalog
pub(super) information_schema_provider: Arc<InformationSchemaProvider>,
pub(super) pg_catalog_provider: Arc<PGCatalogProvider>,
pub(super) numbers_table_provider: NumbersTableProvider,
pub(super) backend: KvBackendRef,
pub(super) process_manager: Option<ProcessManagerRef>,
#[cfg(feature = "enterprise")]
@@ -584,9 +584,7 @@ impl SystemCatalog {
PG_CATALOG_NAME if channel == Channel::Postgres => {
self.pg_catalog_provider.table_names()
}
DEFAULT_SCHEMA_NAME => {
vec![NUMBERS_TABLE_NAME.to_string()]
}
DEFAULT_SCHEMA_NAME => self.numbers_table_provider.table_names(),
_ => vec![],
}
}
@@ -604,7 +602,7 @@ impl SystemCatalog {
if schema == INFORMATION_SCHEMA_NAME {
self.information_schema_provider.table(table).is_some()
} else if schema == DEFAULT_SCHEMA_NAME {
table == NUMBERS_TABLE_NAME
self.numbers_table_provider.table_exists(table)
} else if schema == PG_CATALOG_NAME && channel == Channel::Postgres {
self.pg_catalog_provider.table(table).is_some()
} else {
@@ -649,8 +647,8 @@ impl SystemCatalog {
});
pg_catalog_provider.table(table_name)
}
} else if schema == DEFAULT_SCHEMA_NAME && table_name == NUMBERS_TABLE_NAME {
Some(NumbersTable::table(NUMBERS_TABLE_ID))
} else if schema == DEFAULT_SCHEMA_NAME {
self.numbers_table_provider.table(table_name)
} else {
None
}

@@ -14,6 +14,7 @@
pub mod information_schema;
mod memory_table;
pub mod numbers_table_provider;
pub mod pg_catalog;
pub mod predicate;
mod utils;

@@ -97,7 +97,6 @@ lazy_static! {
ROUTINES,
SCHEMA_PRIVILEGES,
TABLE_PRIVILEGES,
TRIGGERS,
GLOBAL_STATUS,
SESSION_STATUS,
PARTITIONS,
@@ -207,7 +206,6 @@ impl SystemSchemaProviderInner for InformationSchemaProvider {
ROUTINES => setup_memory_table!(ROUTINES),
SCHEMA_PRIVILEGES => setup_memory_table!(SCHEMA_PRIVILEGES),
TABLE_PRIVILEGES => setup_memory_table!(TABLE_PRIVILEGES),
TRIGGERS => setup_memory_table!(TRIGGERS),
GLOBAL_STATUS => setup_memory_table!(GLOBAL_STATUS),
SESSION_STATUS => setup_memory_table!(SESSION_STATUS),
KEY_COLUMN_USAGE => Some(Arc::new(InformationSchemaKeyColumnUsage::new(

@@ -15,8 +15,7 @@
use std::sync::Arc;
use common_catalog::consts::{METRIC_ENGINE, MITO_ENGINE};
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::schema::{Schema, SchemaRef};
use datatypes::vectors::{Int64Vector, StringVector, VectorRef};
use crate::system_schema::information_schema::table_names::*;
@@ -366,16 +365,6 @@ pub(super) fn get_schema_columns(table_name: &str) -> (SchemaRef, Vec<VectorRef>
vec![],
),
TRIGGERS => (
vec![
string_column("TRIGGER_NAME"),
ColumnSchema::new("trigger_id", ConcreteDataType::uint64_datatype(), false),
string_column("TRIGGER_DEFINITION"),
ColumnSchema::new("flownode_id", ConcreteDataType::uint64_datatype(), true),
],
vec![],
),
// TODO: Considering store internal metrics in `global_status` and
// `session_status` tables.
GLOBAL_STATUS => (

@@ -0,0 +1,59 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#[cfg(any(test, feature = "testing", debug_assertions))]
use common_catalog::consts::NUMBERS_TABLE_ID;
use table::TableRef;
#[cfg(any(test, feature = "testing", debug_assertions))]
use table::table::numbers::NUMBERS_TABLE_NAME;
#[cfg(any(test, feature = "testing", debug_assertions))]
use table::table::numbers::NumbersTable;
// NumbersTableProvider is a dedicated provider for feature-gating the numbers table.
#[derive(Clone)]
pub struct NumbersTableProvider;
#[cfg(any(test, feature = "testing", debug_assertions))]
impl NumbersTableProvider {
pub(crate) fn table_exists(&self, name: &str) -> bool {
name == NUMBERS_TABLE_NAME
}
pub(crate) fn table_names(&self) -> Vec<String> {
vec![NUMBERS_TABLE_NAME.to_string()]
}
pub(crate) fn table(&self, name: &str) -> Option<TableRef> {
if name == NUMBERS_TABLE_NAME {
Some(NumbersTable::table(NUMBERS_TABLE_ID))
} else {
None
}
}
}
#[cfg(not(any(test, feature = "testing", debug_assertions)))]
impl NumbersTableProvider {
pub(crate) fn table_exists(&self, _name: &str) -> bool {
false
}
pub(crate) fn table_names(&self) -> Vec<String> {
vec![]
}
pub(crate) fn table(&self, _name: &str) -> Option<TableRef> {
None
}
}
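
`NumbersTableProvider` exposes the `numbers` table only in test, `testing`-feature and debug builds. It does this by compiling one of two `impl` blocks, which lets `SystemCatalog` delegate to it without any `cfg` attributes of its own. The sketch below reproduces that pattern with simplifications. `ToyProvider` is hypothetical and returns only names rather than `TableRef`s. It gates on `test` and `debug_assertions` alone so that it builds without Cargo features.

```rust
// Sketch of the cfg-gated provider pattern; `ToyProvider` is hypothetical.
#[cfg(any(test, debug_assertions))]
const TABLE_NAME: &str = "numbers";

/// Zero-sized provider: callers always hold one, but what it exposes
/// depends on which `impl` block below was compiled.
#[derive(Clone, Copy)]
struct ToyProvider;

// Debug/test builds: the table exists.
#[cfg(any(test, debug_assertions))]
impl ToyProvider {
    fn table_names(&self) -> Vec<String> {
        vec![TABLE_NAME.to_string()]
    }

    fn table_exists(&self, name: &str) -> bool {
        name == TABLE_NAME
    }
}

// Release builds: same method signatures, but nothing is exposed.
#[cfg(not(any(test, debug_assertions)))]
impl ToyProvider {
    fn table_names(&self) -> Vec<String> {
        vec![]
    }

    fn table_exists(&self, _name: &str) -> bool {
        false
    }
}

/// Caller in the style of `SystemCatalog::table_names`: no `cfg` needed here.
fn default_schema_tables(provider: &ToyProvider) -> Vec<String> {
    provider.table_names()
}

fn main() {
    let provider = ToyProvider;
    let names = default_schema_tables(&provider);
    // Whichever impl was compiled, the two methods agree with each other.
    assert_eq!(names.iter().any(|n| provider.table_exists(n)), !names.is_empty());
    println!("debug_assertions={} tables={:?}", cfg!(debug_assertions), names);
}
```

Keeping both `impl` blocks signature-identical is what allows the catalog code to drop its direct `NumbersTable` and `NUMBERS_TABLE_ID` imports.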

@@ -16,12 +16,15 @@ mod export;
mod import;
use clap::Subcommand;
use client::DEFAULT_CATALOG_NAME;
use common_error::ext::BoxedError;
use crate::Tool;
use crate::data::export::ExportCommand;
use crate::data::import::ImportCommand;
pub(crate) const COPY_PATH_PLACEHOLDER: &str = "<PATH/TO/FILES>";
/// Command for data operations including exporting data from and importing data into GreptimeDB.
#[derive(Subcommand)]
pub enum DataCommand {
@@ -37,3 +40,7 @@ impl DataCommand {
}
}
}
pub(crate) fn default_database() -> String {
format!("{DEFAULT_CATALOG_NAME}-*")
}

@@ -30,6 +30,7 @@ use snafu::{OptionExt, ResultExt};
use tokio::sync::Semaphore;
use tokio::time::Instant;
use crate::data::{COPY_PATH_PLACEHOLDER, default_database};
use crate::database::{DatabaseClient, parse_proxy_opts};
use crate::error::{
EmptyResultSnafu, Error, OpenDalSnafu, OutputDirNotSetSnafu, Result, S3ConfigNotSetSnafu,
@@ -63,7 +64,7 @@ pub struct ExportCommand {
output_dir: Option<String>,
/// The name of the catalog to export.
#[clap(long, default_value = "greptime-*")]
#[clap(long, default_value_t = default_database())]
database: String,
/// Parallelism of the export.
@@ -667,10 +668,26 @@ impl Export {
);
// Create copy_from.sql file
let copy_database_from_sql = format!(
r#"COPY DATABASE "{}"."{}" FROM '{}' WITH ({}){};"#,
export_self.catalog, schema, path, with_options_clone, connection_part
);
let copy_database_from_sql = {
let command_without_connection = format!(
r#"COPY DATABASE "{}"."{}" FROM '{}' WITH ({});"#,
export_self.catalog, schema, COPY_PATH_PLACEHOLDER, with_options_clone
);
if connection_part.is_empty() {
command_without_connection
} else {
let command_with_connection = format!(
r#"COPY DATABASE "{}"."{}" FROM '{}' WITH ({}){};"#,
export_self.catalog, schema, path, with_options_clone, connection_part
);
format!(
"-- {}\n{}",
command_with_connection, command_without_connection
)
}
};
let copy_from_path = export_self.get_file_path(&schema, "copy_from.sql");
export_self
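
After this change, the generated `copy_from.sql` always contains a runnable statement that uses `COPY_PATH_PLACEHOLDER` as its path. When the export had a CONNECTION clause (for example an S3 target), the original statement is kept above it as a `--` comment. Below is a std-only sketch of that string assembly. The `build_copy_from_sql` function and the catalog, schema and option values are hypothetical.

```rust
const COPY_PATH_PLACEHOLDER: &str = "<PATH/TO/FILES>";

/// Assembles copy_from.sql: the executable statement always uses the
/// placeholder; a CONNECTION-bearing original is preserved as a comment.
fn build_copy_from_sql(
    catalog: &str,
    schema: &str,
    path: &str,
    with_options: &str,
    connection_part: &str,
) -> String {
    let without_connection = format!(
        r#"COPY DATABASE "{}"."{}" FROM '{}' WITH ({});"#,
        catalog, schema, COPY_PATH_PLACEHOLDER, with_options
    );
    if connection_part.is_empty() {
        without_connection
    } else {
        let with_connection = format!(
            r#"COPY DATABASE "{}"."{}" FROM '{}' WITH ({}){};"#,
            catalog, schema, path, with_options, connection_part
        );
        format!("-- {}\n{}", with_connection, without_connection)
    }
}

fn main() {
    // Local export: no CONNECTION clause, so only the placeholder statement is emitted.
    let local = build_copy_from_sql("greptime", "public", "/data/export/", "format = 'parquet'", "");
    let expected = format!(
        r#"COPY DATABASE "greptime"."public" FROM '{}' WITH (format = 'parquet');"#,
        COPY_PATH_PLACEHOLDER
    );
    assert_eq!(local, expected);

    // Remote export (illustrative values): the original statement survives as a comment.
    let remote = build_copy_from_sql(
        "greptime",
        "public",
        "s3://bucket/demo/",
        "format = 'parquet'",
        " CONNECTION (region = 'us-west-2')",
    );
    let mut lines = remote.lines();
    assert!(lines.next().unwrap().starts_with("-- COPY DATABASE"));
    assert!(lines.next().unwrap().contains(COPY_PATH_PLACEHOLDER));
}
```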

@@ -21,12 +21,13 @@ use clap::{Parser, ValueEnum};
use common_catalog::consts::DEFAULT_SCHEMA_NAME;
use common_error::ext::BoxedError;
use common_telemetry::{error, info, warn};
use snafu::{OptionExt, ResultExt};
use snafu::{OptionExt, ResultExt, ensure};
use tokio::sync::Semaphore;
use tokio::time::Instant;
use crate::data::{COPY_PATH_PLACEHOLDER, default_database};
use crate::database::{DatabaseClient, parse_proxy_opts};
use crate::error::{Error, FileIoSnafu, Result, SchemaNotFoundSnafu};
use crate::error::{Error, FileIoSnafu, InvalidArgumentsSnafu, Result, SchemaNotFoundSnafu};
use crate::{Tool, database};
#[derive(Debug, Default, Clone, ValueEnum)]
@@ -52,7 +53,7 @@ pub struct ImportCommand {
input_dir: String,
/// The name of the catalog to import.
#[clap(long, default_value = "greptime-*")]
#[clap(long, default_value_t = default_database())]
database: String,
/// Parallelism of the import.
@@ -147,12 +148,15 @@ impl Import {
let _permit = semaphore_moved.acquire().await.unwrap();
let database_input_dir = self.catalog_path().join(&schema);
let sql_file = database_input_dir.join(filename);
let sql = tokio::fs::read_to_string(sql_file)
let mut sql = tokio::fs::read_to_string(sql_file)
.await
.context(FileIoSnafu)?;
if sql.is_empty() {
if sql.trim().is_empty() {
info!("Empty `{filename}` {database_input_dir:?}");
} else {
if filename == "copy_from.sql" {
sql = self.rewrite_copy_database_sql(&schema, &sql)?;
}
let db = exec_db.unwrap_or(&schema);
self.database_client.sql(&sql, db).await?;
info!("Imported `{filename}` for database {schema}");
@@ -225,6 +229,57 @@ impl Import {
}
Ok(db_names)
}
fn rewrite_copy_database_sql(&self, schema: &str, sql: &str) -> Result<String> {
let target_location = self.build_copy_database_location(schema);
let escaped_location = target_location.replace('\'', "''");
let mut first_stmt_checked = false;
for line in sql.lines() {
let trimmed = line.trim_start();
if trimmed.is_empty() || trimmed.starts_with("--") {
continue;
}
ensure!(
trimmed.starts_with("COPY DATABASE"),
InvalidArgumentsSnafu {
msg: "Expected COPY DATABASE statement at start of copy_from.sql"
}
);
first_stmt_checked = true;
break;
}
ensure!(
first_stmt_checked,
InvalidArgumentsSnafu {
msg: "COPY DATABASE statement not found in copy_from.sql"
}
);
ensure!(
sql.contains(COPY_PATH_PLACEHOLDER),
InvalidArgumentsSnafu {
msg: format!(
"Placeholder `{}` not found in COPY DATABASE statement",
COPY_PATH_PLACEHOLDER
)
}
);
Ok(sql.replacen(COPY_PATH_PLACEHOLDER, &escaped_location, 1))
}
fn build_copy_database_location(&self, schema: &str) -> String {
let mut path = self.catalog_path();
path.push(schema);
let mut path_str = path.to_string_lossy().into_owned();
if !path_str.ends_with('/') {
path_str.push('/');
}
path_str
}
}
#[async_trait]
@@ -240,3 +295,52 @@ impl Tool for Import {
}
}
}
#[cfg(test)]
mod tests {
use std::time::Duration;
use super::*;
fn build_import(input_dir: &str) -> Import {
Import {
catalog: "catalog".to_string(),
schema: None,
database_client: DatabaseClient::new(
"127.0.0.1:4000".to_string(),
"catalog".to_string(),
None,
Duration::from_secs(0),
None,
),
input_dir: input_dir.to_string(),
parallelism: 1,
target: ImportTarget::Data,
}
}
#[test]
fn rewrite_copy_database_sql_replaces_placeholder() {
let import = build_import("/tmp/export-path");
let comment = "-- COPY DATABASE \"catalog\".\"schema\" FROM 's3://bucket/demo/' WITH (format = 'parquet') CONNECTION (region = 'us-west-2')";
let sql = format!(
"{comment}\nCOPY DATABASE \"catalog\".\"schema\" FROM '{}' WITH (format = 'parquet');",
COPY_PATH_PLACEHOLDER
);
let rewritten = import.rewrite_copy_database_sql("schema", &sql).unwrap();
let expected_location = import.build_copy_database_location("schema");
let escaped = expected_location.replace('\'', "''");
assert!(rewritten.starts_with(comment));
assert!(rewritten.contains(&format!("FROM '{escaped}'")));
assert!(!rewritten.contains(COPY_PATH_PLACEHOLDER));
}
#[test]
fn rewrite_copy_database_sql_requires_placeholder() {
let import = build_import("/tmp/export-path");
let sql = "COPY DATABASE \"catalog\".\"schema\" FROM '/tmp/export-path/catalog/schema/' WITH (format = 'parquet');";
assert!(import.rewrite_copy_database_sql("schema", sql).is_err());
}
}
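
On import, `rewrite_copy_database_sql` first checks that the first non-comment line is a COPY DATABASE statement and that the placeholder is present. It then escapes single quotes in the local path and replaces only the first occurrence of the placeholder. The dependency-free sketch below follows the same steps. `rewrite_copy_sql` and its `String` error type are hypothetical stand-ins for the snafu-based errors.

```rust
const COPY_PATH_PLACEHOLDER: &str = "<PATH/TO/FILES>";

/// Validates the first statement, then substitutes the placeholder once
/// with a SQL-escaped location.
fn rewrite_copy_sql(sql: &str, location: &str) -> Result<String, String> {
    // First line that is neither blank nor a `--` comment.
    let first_stmt = sql
        .lines()
        .map(str::trim_start)
        .find(|l| !l.is_empty() && !l.starts_with("--"))
        .ok_or("COPY DATABASE statement not found")?;
    if !first_stmt.starts_with("COPY DATABASE") {
        return Err("expected COPY DATABASE statement".to_string());
    }
    if !sql.contains(COPY_PATH_PLACEHOLDER) {
        return Err(format!("placeholder `{COPY_PATH_PLACEHOLDER}` not found"));
    }
    // Double single quotes so the path is safe inside a '...' SQL literal.
    let escaped = location.replace('\'', "''");
    Ok(sql.replacen(COPY_PATH_PLACEHOLDER, &escaped, 1))
}

fn main() {
    let sql = format!(
        "-- COPY DATABASE \"c\".\"s\" FROM 's3://bucket/demo/' WITH (format = 'parquet') CONNECTION (region = 'us-west-2');\nCOPY DATABASE \"c\".\"s\" FROM '{COPY_PATH_PLACEHOLDER}' WITH (format = 'parquet');"
    );
    let out = rewrite_copy_sql(&sql, "/tmp/it's/c/s/").unwrap();
    assert!(out.contains("FROM '/tmp/it''s/c/s/'"));
    assert!(!out.contains(COPY_PATH_PLACEHOLDER));
    // A file with a literal path instead of the placeholder is rejected.
    assert!(rewrite_copy_sql("COPY DATABASE \"c\".\"s\" FROM '/x/' WITH (format = 'parquet');", "/y/").is_err());
}
```

`replacen` substitutes only the first match, so the comment line written by the exporter must not contain the placeholder itself. In the export code above, that line holds the original remote path instead.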

@@ -20,7 +20,9 @@ use api::v1::health_check_client::HealthCheckClient;
use api::v1::prometheus_gateway_client::PrometheusGatewayClient;
use api::v1::region::region_client::RegionClient as PbRegionClient;
use arrow_flight::flight_service_client::FlightServiceClient;
use common_grpc::channel_manager::{ChannelConfig, ChannelManager, ClientTlsOption};
use common_grpc::channel_manager::{
ChannelConfig, ChannelManager, ClientTlsOption, load_tls_config,
};
use parking_lot::RwLock;
use snafu::{OptionExt, ResultExt};
use tonic::codec::CompressionEncoding;
@@ -94,8 +96,9 @@ impl Client {
A: AsRef<[U]>,
{
let channel_config = ChannelConfig::default().client_tls_config(client_tls);
let channel_manager = ChannelManager::with_tls_config(channel_config)
let tls_config = load_tls_config(channel_config.client_tls.as_ref())
.context(error::CreateTlsChannelSnafu)?;
let channel_manager = ChannelManager::with_config(channel_config, tls_config);
Ok(Self::with_manager_and_urls(channel_manager, urls))
}

@@ -74,7 +74,7 @@ impl FlownodeManager for NodeClients {
impl NodeClients {
pub fn new(config: ChannelConfig) -> Self {
Self {
channel_manager: ChannelManager::with_config(config),
channel_manager: ChannelManager::with_config(config, None),
clients: CacheBuilder::new(1024)
.time_to_live(Duration::from_secs(30 * 60))
.time_to_idle(Duration::from_secs(5 * 60))

@@ -29,9 +29,11 @@ base64.workspace = true
cache.workspace = true
catalog.workspace = true
chrono.workspace = true
either = "1.15"
clap.workspace = true
cli.workspace = true
client.workspace = true
colored = "2.1.0"
common-base.workspace = true
common-catalog.workspace = true
common-config.workspace = true
@@ -63,9 +65,11 @@ lazy_static.workspace = true
meta-client.workspace = true
meta-srv.workspace = true
metric-engine.workspace = true
mito2.workspace = true
moka.workspace = true
nu-ansi-term = "0.46"
object-store.workspace = true
parquet = { workspace = true, features = ["object_store"] }
plugins.workspace = true
prometheus.workspace = true
prost.workspace = true
@@ -88,6 +92,11 @@ toml.workspace = true
tonic.workspace = true
tracing-appender.workspace = true
[target.'cfg(unix)'.dependencies]
pprof = { version = "0.14", features = [
"flamegraph",
] }
[target.'cfg(not(windows))'.dependencies]
tikv-jemallocator = "0.6"

@@ -103,12 +103,15 @@ async fn main_body() -> Result<()> {
async fn start(cli: Command) -> Result<()> {
match cli.subcmd {
SubCommand::Datanode(cmd) => {
let opts = cmd.load_options(&cli.global_options)?;
let plugins = Plugins::new();
let builder = InstanceBuilder::try_new_with_init(opts, plugins).await?;
cmd.build_with(builder).await?.run().await
}
SubCommand::Datanode(cmd) => match cmd.subcmd {
datanode::SubCommand::Start(ref start) => {
let opts = start.load_options(&cli.global_options)?;
let plugins = Plugins::new();
let builder = InstanceBuilder::try_new_with_init(opts, plugins).await?;
cmd.build_with(builder).await?.run().await
}
datanode::SubCommand::Objbench(ref bench) => bench.run().await,
},
SubCommand::Flownode(cmd) => {
cmd.build(cmd.load_options(&cli.global_options)?)
.await?

@@ -13,6 +13,8 @@
// limitations under the License.
pub mod builder;
#[allow(clippy::print_stdout)]
mod objbench;
use std::path::Path;
use std::time::Duration;
@@ -23,13 +25,16 @@ use common_config::Configurable;
use common_telemetry::logging::{DEFAULT_LOGGING_DIR, TracingOptions};
use common_telemetry::{info, warn};
use common_wal::config::DatanodeWalConfig;
use datanode::config::RegionEngineConfig;
use datanode::datanode::Datanode;
use meta_client::MetaClientOptions;
use serde::{Deserialize, Serialize};
use snafu::{ResultExt, ensure};
use tracing_appender::non_blocking::WorkerGuard;
use crate::App;
use crate::datanode::builder::InstanceBuilder;
use crate::datanode::objbench::ObjbenchCommand;
use crate::error::{
LoadLayeredConfigSnafu, MissingConfigSnafu, Result, ShutdownDatanodeSnafu, StartDatanodeSnafu,
};
@@ -89,7 +94,7 @@ impl App for Instance {
#[derive(Parser)]
pub struct Command {
#[clap(subcommand)]
subcmd: SubCommand,
pub subcmd: SubCommand,
}
impl Command {
@@ -100,13 +105,26 @@ impl Command {
pub fn load_options(&self, global_options: &GlobalOptions) -> Result<DatanodeOptions> {
match &self.subcmd {
SubCommand::Start(cmd) => cmd.load_options(global_options),
SubCommand::Objbench(_) => {
// For objbench command, we don't need to load DatanodeOptions
// It's a standalone utility command
let mut opts = datanode::config::DatanodeOptions::default();
opts.sanitize();
Ok(DatanodeOptions {
runtime: Default::default(),
plugins: Default::default(),
component: opts,
})
}
}
}
}
#[derive(Parser)]
enum SubCommand {
pub enum SubCommand {
Start(StartCommand),
/// Object storage benchmark tool
Objbench(ObjbenchCommand),
}
impl SubCommand {
@@ -116,12 +134,33 @@ impl SubCommand {
info!("Building datanode with {:#?}", cmd);
builder.build().await
}
SubCommand::Objbench(cmd) => {
cmd.run().await?;
std::process::exit(0);
}
}
}
}
/// Storage engine config
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Default)]
#[serde(default)]
pub struct StorageConfig {
/// The working directory of database
pub data_home: String,
#[serde(flatten)]
pub store: object_store::config::ObjectStoreConfig,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Default)]
#[serde(default)]
struct StorageConfigWrapper {
storage: StorageConfig,
region_engine: Vec<RegionEngineConfig>,
}
#[derive(Debug, Parser, Default)]
struct StartCommand {
pub struct StartCommand {
#[clap(long)]
node_id: Option<u64>,
/// The address to bind the gRPC server.
@@ -149,7 +188,7 @@ struct StartCommand {
}
impl StartCommand {
fn load_options(&self, global_options: &GlobalOptions) -> Result<DatanodeOptions> {
pub fn load_options(&self, global_options: &GlobalOptions) -> Result<DatanodeOptions> {
let mut opts = DatanodeOptions::load_layered_options(
self.config_file.as_deref(),
self.env_prefix.as_ref(),

@@ -0,0 +1,677 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;
use clap::Parser;
use colored::Colorize;
use datanode::config::RegionEngineConfig;
use datanode::store;
use either::Either;
use mito2::access_layer::{
AccessLayer, AccessLayerRef, Metrics, OperationType, SstWriteRequest, WriteType,
};
use mito2::cache::{CacheManager, CacheManagerRef};
use mito2::config::{FulltextIndexConfig, MitoConfig, Mode};
use mito2::read::Source;
use mito2::sst::file::{FileHandle, FileMeta};
use mito2::sst::file_purger::{FilePurger, FilePurgerRef};
use mito2::sst::index::intermediate::IntermediateManager;
use mito2::sst::index::puffin_manager::PuffinManagerFactory;
use mito2::sst::parquet::reader::ParquetReaderBuilder;
use mito2::sst::parquet::{PARQUET_METADATA_KEY, WriteOptions};
use mito2::worker::write_cache_from_config;
use object_store::ObjectStore;
use regex::Regex;
use snafu::OptionExt;
use store_api::metadata::{RegionMetadata, RegionMetadataRef};
use store_api::path_utils::region_name;
use store_api::region_request::PathType;
use store_api::storage::FileId;
use crate::datanode::{StorageConfig, StorageConfigWrapper};
use crate::error;
/// Object storage benchmark command
#[derive(Debug, Parser)]
pub struct ObjbenchCommand {
/// Path to the object-store config file (TOML). Must deserialize into object_store::config::ObjectStoreConfig.
#[clap(long, value_name = "FILE")]
pub config: PathBuf,
/// Source SST file path in object-store (e.g. "region_dir/<uuid>.parquet").
#[clap(long, value_name = "PATH")]
pub source: String,
/// Verbose output
#[clap(short, long, default_value_t = false)]
pub verbose: bool,
/// Output file path for pprof flamegraph (enables profiling)
#[clap(long, value_name = "FILE")]
pub pprof_file: Option<PathBuf>,
}
fn parse_config(config_path: &PathBuf) -> error::Result<(StorageConfig, MitoConfig)> {
let cfg_str = std::fs::read_to_string(config_path).map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("failed to read config {}: {e}", config_path.display()),
}
.build()
})?;
let store_cfg: StorageConfigWrapper = toml::from_str(&cfg_str).map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("failed to parse config {}: {e}", config_path.display()),
}
.build()
})?;
let storage_config = store_cfg.storage;
let mito_engine_config = store_cfg
.region_engine
.into_iter()
.filter_map(|c| {
if let RegionEngineConfig::Mito(mito) = c {
Some(mito)
} else {
None
}
})
.next()
.with_context(|| error::IllegalConfigSnafu {
msg: format!("Engine config not found in {:?}", config_path),
})?;
Ok((storage_config, mito_engine_config))
}
impl ObjbenchCommand {
pub async fn run(&self) -> error::Result<()> {
if self.verbose {
common_telemetry::init_default_ut_logging();
}
println!("{}", "Starting objbench with config:".cyan().bold());
// Build object store from config
let (store_cfg, mut mito_engine_config) = parse_config(&self.config)?;
let object_store = build_object_store(&store_cfg).await?;
println!("{} Object store initialized", "".green());
// Prepare source identifiers
let components = parse_file_dir_components(&self.source)?;
println!(
"{} Source path parsed: {}, components: {:?}",
"".green(),
self.source,
components
);
// Load parquet metadata to extract RegionMetadata and file stats
println!("{}", "Loading parquet metadata...".yellow());
let file_size = object_store
.stat(&self.source)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("stat failed: {e}"),
}
.build()
})?
.content_length();
let parquet_meta = load_parquet_metadata(object_store.clone(), &self.source, file_size)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("read parquet metadata failed: {e}"),
}
.build()
})?;
let region_meta = extract_region_metadata(&self.source, &parquet_meta)?;
let num_rows = parquet_meta.file_metadata().num_rows() as u64;
let num_row_groups = parquet_meta.num_row_groups() as u64;
println!(
"{} Metadata loaded - rows: {}, size: {} bytes",
"".green(),
num_rows,
file_size
);
// Build a FileHandle for the source file
let file_meta = FileMeta {
region_id: region_meta.region_id,
file_id: components.file_id,
time_range: Default::default(),
level: 0,
file_size,
available_indexes: Default::default(),
index_file_size: 0,
index_file_id: None,
num_rows,
num_row_groups,
sequence: None,
partition_expr: None,
num_series: 0,
};
let src_handle = FileHandle::new(file_meta, new_noop_file_purger());
// Build the reader for a single file via ParquetReaderBuilder
let table_dir = components.table_dir();
let (src_access_layer, cache_manager) = build_access_layer_simple(
&components,
object_store.clone(),
&mut mito_engine_config,
&store_cfg.data_home,
)
.await?;
let reader_build_start = Instant::now();
let reader = ParquetReaderBuilder::new(
table_dir,
components.path_type,
src_handle.clone(),
object_store.clone(),
)
.expected_metadata(Some(region_meta.clone()))
.build()
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("build reader failed: {e:?}"),
}
.build()
})?;
let reader_build_elapsed = reader_build_start.elapsed();
let total_rows = reader.parquet_metadata().file_metadata().num_rows();
println!("{} Reader built in {:?}", "".green(), reader_build_elapsed);
// Build write request
let fulltext_index_config = FulltextIndexConfig {
create_on_compaction: Mode::Disable,
..Default::default()
};
let write_req = SstWriteRequest {
op_type: OperationType::Flush,
metadata: region_meta,
source: Either::Left(Source::Reader(Box::new(reader))),
cache_manager,
storage: None,
max_sequence: None,
index_options: Default::default(),
index_config: mito_engine_config.index.clone(),
inverted_index_config: MitoConfig::default().inverted_index,
fulltext_index_config,
bloom_filter_index_config: MitoConfig::default().bloom_filter_index,
};
// Write SST
println!("{}", "Writing SST...".yellow());
// Start profiling if pprof_file is specified
#[cfg(unix)]
let profiler_guard = if self.pprof_file.is_some() {
println!("{} Starting profiling...", "".yellow());
Some(
pprof::ProfilerGuardBuilder::default()
.frequency(99)
.blocklist(&["libc", "libgcc", "pthread", "vdso"])
.build()
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to start profiler: {e}"),
}
.build()
})?,
)
} else {
None
};
#[cfg(not(unix))]
if self.pprof_file.is_some() {
eprintln!(
"{}: Profiling is not supported on this platform",
"Warning".yellow()
);
}
let write_start = Instant::now();
let mut metrics = Metrics::new(WriteType::Flush);
let infos = src_access_layer
.write_sst(write_req, &WriteOptions::default(), &mut metrics)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("write_sst failed: {e:?}"),
}
.build()
})?;
let write_elapsed = write_start.elapsed();
// Stop profiling and generate flamegraph if enabled
#[cfg(unix)]
if let (Some(guard), Some(pprof_file)) = (profiler_guard, &self.pprof_file) {
println!("{} Generating flamegraph...", "🔥".yellow());
match guard.report().build() {
Ok(report) => {
let mut flamegraph_data = Vec::new();
if let Err(e) = report.flamegraph(&mut flamegraph_data) {
println!("{}: Failed to generate flamegraph: {}", "Error".red(), e);
} else if let Err(e) = std::fs::write(pprof_file, flamegraph_data) {
println!(
"{}: Failed to write flamegraph to {}: {}",
"Error".red(),
pprof_file.display(),
e
);
} else {
println!(
"{} Flamegraph saved to {}",
"".green(),
pprof_file.display().to_string().cyan()
);
}
}
Err(e) => {
println!("{}: Failed to generate pprof report: {}", "Error".red(), e);
}
}
}
assert_eq!(infos.len(), 1);
let dst_file_id = infos[0].file_id;
let dst_file_path = format!("{}/{}.parquet", components.region_dir(), dst_file_id);
let mut dst_index_path = None;
if infos[0].index_metadata.file_size > 0 {
dst_index_path = Some(format!(
"{}/index/{}.puffin",
components.region_dir(),
dst_file_id
));
}
// Report results with ANSI colors
println!("\n{} {}", "Write complete!".green().bold(), "".green());
println!(" {}: {}", "Destination file".bold(), dst_file_path.cyan());
println!(" {}: {}", "Rows".bold(), total_rows.to_string().cyan());
println!(
" {}: {}",
"File size".bold(),
format!("{} bytes", file_size).cyan()
);
println!(
" {}: {:?}",
"Reader build time".bold(),
reader_build_elapsed
);
println!(" {}: {:?}", "Total time".bold(), write_elapsed);
// Print metrics
println!(" {}: {:?}", "Metrics".bold(), metrics);
// Print index metadata of the written SST
println!(" {}: {:?}", "Index".bold(), infos[0].index_metadata);
// Cleanup
println!("\n{}", "Cleaning up...".yellow());
object_store.delete(&dst_file_path).await.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to delete dest file {}: {}", dst_file_path, e),
}
.build()
})?;
println!("{} Temporary file {} deleted", "".green(), dst_file_path);
if let Some(index_path) = dst_index_path {
object_store.delete(&index_path).await.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to delete dest index file {}: {}", index_path, e),
}
.build()
})?;
println!(
"{} Temporary index file {} deleted",
"".green(),
index_path
);
}
println!("\n{}", "Benchmark completed successfully!".green().bold());
Ok(())
}
}
#[derive(Debug)]
struct FileDirComponents {
catalog: String,
schema: String,
table_id: u32,
region_sequence: u32,
path_type: PathType,
file_id: FileId,
}
impl FileDirComponents {
fn table_dir(&self) -> String {
format!("data/{}/{}/{}", self.catalog, self.schema, self.table_id)
}
fn region_dir(&self) -> String {
let region_name = region_name(self.table_id, self.region_sequence);
match self.path_type {
PathType::Bare => {
format!(
"data/{}/{}/{}/{}",
self.catalog, self.schema, self.table_id, region_name
)
}
PathType::Data => {
format!(
"data/{}/{}/{}/{}/data",
self.catalog, self.schema, self.table_id, region_name
)
}
PathType::Metadata => {
format!(
"data/{}/{}/{}/{}/metadata",
self.catalog, self.schema, self.table_id, region_name
)
}
}
}
}
fn parse_file_dir_components(path: &str) -> error::Result<FileDirComponents> {
// Define the regex pattern to match all three path styles
let pattern =
r"^data/([^/]+)/([^/]+)/([^/]+)/([^/]+)_([^/]+)(?:/data|/metadata)?/(.+)\.parquet$";
// Compile the regex
let re = Regex::new(pattern).expect("Invalid regex pattern");
// Determine the path type
let path_type = if path.contains("/data/") {
PathType::Data
} else if path.contains("/metadata/") {
PathType::Metadata
} else {
PathType::Bare
};
// Try to match the path
let components = (|| {
let captures = re.captures(path)?;
if captures.len() != 7 {
return None;
}
let mut components = FileDirComponents {
catalog: "".to_string(),
schema: "".to_string(),
table_id: 0,
region_sequence: 0,
path_type,
file_id: FileId::default(),
};
// Extract the components
components.catalog = captures.get(1)?.as_str().to_string();
components.schema = captures.get(2)?.as_str().to_string();
components.table_id = captures[3].parse().ok()?;
components.region_sequence = captures[5].parse().ok()?;
let file_id_str = &captures[6];
components.file_id = FileId::parse_str(file_id_str).ok()?;
Some(components)
})();
components.context(error::IllegalConfigSnafu {
msg: format!("Expect valid source file path, got: {}", path),
})
}
fn extract_region_metadata(
file_path: &str,
meta: &parquet::file::metadata::ParquetMetaData,
) -> error::Result<RegionMetadataRef> {
use parquet::format::KeyValue;
let kvs: Option<&Vec<KeyValue>> = meta.file_metadata().key_value_metadata();
let Some(kvs) = kvs else {
return Err(error::IllegalConfigSnafu {
msg: format!("{file_path}: missing parquet key_value metadata"),
}
.build());
};
let json = kvs
.iter()
.find(|kv| kv.key == PARQUET_METADATA_KEY)
.and_then(|kv| kv.value.as_ref())
.ok_or_else(|| {
error::IllegalConfigSnafu {
msg: format!("{file_path}: key {PARQUET_METADATA_KEY} not found or empty"),
}
.build()
})?;
let region: RegionMetadata = RegionMetadata::from_json(json).map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("invalid region metadata json: {e}"),
}
.build()
})?;
Ok(Arc::new(region))
}
async fn build_object_store(sc: &StorageConfig) -> error::Result<ObjectStore> {
store::new_object_store(sc.store.clone(), &sc.data_home)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to build object store: {e:?}"),
}
.build()
})
}
async fn build_access_layer_simple(
components: &FileDirComponents,
object_store: ObjectStore,
config: &mut MitoConfig,
data_home: &str,
) -> error::Result<(AccessLayerRef, CacheManagerRef)> {
let _ = config.index.sanitize(data_home, &config.inverted_index);
let puffin_manager = PuffinManagerFactory::new(
&config.index.aux_path,
config.index.staging_size.as_bytes(),
Some(config.index.write_buffer_size.as_bytes() as _),
config.index.staging_ttl,
)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to build access layer: {e:?}"),
}
.build()
})?;
let intermediate_manager = IntermediateManager::init_fs(&config.index.aux_path)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to build IntermediateManager: {e:?}"),
}
.build()
})?
.with_buffer_size(Some(config.index.write_buffer_size.as_bytes() as _));
let cache_manager =
build_cache_manager(config, puffin_manager.clone(), intermediate_manager.clone()).await?;
let layer = AccessLayer::new(
components.table_dir(),
components.path_type,
object_store,
puffin_manager,
intermediate_manager,
);
Ok((Arc::new(layer), cache_manager))
}
async fn build_cache_manager(
config: &MitoConfig,
puffin_manager: PuffinManagerFactory,
intermediate_manager: IntermediateManager,
) -> error::Result<CacheManagerRef> {
let write_cache = write_cache_from_config(config, puffin_manager, intermediate_manager)
.await
.map_err(|e| {
error::IllegalConfigSnafu {
msg: format!("Failed to build write cache: {e:?}"),
}
.build()
})?;
let cache_manager = Arc::new(
CacheManager::builder()
.sst_meta_cache_size(config.sst_meta_cache_size.as_bytes())
.vector_cache_size(config.vector_cache_size.as_bytes())
.page_cache_size(config.page_cache_size.as_bytes())
.selector_result_cache_size(config.selector_result_cache_size.as_bytes())
.index_metadata_size(config.index.metadata_cache_size.as_bytes())
.index_content_size(config.index.content_cache_size.as_bytes())
.index_content_page_size(config.index.content_cache_page_size.as_bytes())
.index_result_cache_size(config.index.result_cache_size.as_bytes())
.puffin_metadata_size(config.index.metadata_cache_size.as_bytes())
.write_cache(write_cache)
.build(),
);
Ok(cache_manager)
}
fn new_noop_file_purger() -> FilePurgerRef {
#[derive(Debug)]
struct Noop;
impl FilePurger for Noop {
fn remove_file(&self, _file_meta: FileMeta, _is_delete: bool) {}
}
Arc::new(Noop)
}
async fn load_parquet_metadata(
object_store: ObjectStore,
path: &str,
file_size: u64,
) -> Result<parquet::file::metadata::ParquetMetaData, Box<dyn std::error::Error + Send + Sync>> {
use parquet::file::FOOTER_SIZE;
use parquet::file::metadata::ParquetMetaDataReader;
let actual_size = if file_size == 0 {
object_store.stat(path).await?.content_length()
} else {
file_size
};
if actual_size < FOOTER_SIZE as u64 {
return Err("file too small".into());
}
let prefetch: u64 = 64 * 1024;
let start = actual_size.saturating_sub(prefetch);
let buffer = object_store
.read_with(path)
.range(start..actual_size)
.await?
.to_vec();
let buffer_len = buffer.len();
let mut footer = [0; 8];
footer.copy_from_slice(&buffer[buffer_len - FOOTER_SIZE..]);
let footer = ParquetMetaDataReader::decode_footer_tail(&footer)?;
let metadata_len = footer.metadata_length() as u64;
if actual_size - (FOOTER_SIZE as u64) < metadata_len {
return Err("invalid footer/metadata length".into());
}
if (metadata_len as usize) <= buffer_len - FOOTER_SIZE {
let metadata_start = buffer_len - metadata_len as usize - FOOTER_SIZE;
let meta = ParquetMetaDataReader::decode_metadata(
&buffer[metadata_start..buffer_len - FOOTER_SIZE],
)?;
Ok(meta)
} else {
let metadata_start = actual_size - metadata_len - FOOTER_SIZE as u64;
let data = object_store
.read_with(path)
.range(metadata_start..(actual_size - FOOTER_SIZE as u64))
.await?
.to_vec();
let meta = ParquetMetaDataReader::decode_metadata(&data)?;
Ok(meta)
}
}
#[cfg(test)]
mod tests {
use std::path::PathBuf;
use std::str::FromStr;
use common_base::readable_size::ReadableSize;
use store_api::region_request::PathType;
use crate::datanode::objbench::{parse_config, parse_file_dir_components};
#[test]
fn test_parse_dir() {
let meta_path = "data/greptime/public/1024/1024_0000000000/metadata/00020380-009c-426d-953e-b4e34c15af34.parquet";
let c = parse_file_dir_components(meta_path).unwrap();
assert_eq!(
c.file_id.to_string(),
"00020380-009c-426d-953e-b4e34c15af34"
);
assert_eq!(c.catalog, "greptime");
assert_eq!(c.schema, "public");
assert_eq!(c.table_id, 1024);
assert_eq!(c.region_sequence, 0);
assert_eq!(c.path_type, PathType::Metadata);
let c = parse_file_dir_components(
"data/greptime/public/1024/1024_0000000000/data/00020380-009c-426d-953e-b4e34c15af34.parquet",
).unwrap();
assert_eq!(
c.file_id.to_string(),
"00020380-009c-426d-953e-b4e34c15af34"
);
assert_eq!(c.catalog, "greptime");
assert_eq!(c.schema, "public");
assert_eq!(c.table_id, 1024);
assert_eq!(c.region_sequence, 0);
assert_eq!(c.path_type, PathType::Data);
let c = parse_file_dir_components(
"data/greptime/public/1024/1024_0000000000/00020380-009c-426d-953e-b4e34c15af34.parquet",
).unwrap();
assert_eq!(
c.file_id.to_string(),
"00020380-009c-426d-953e-b4e34c15af34"
);
assert_eq!(c.catalog, "greptime");
assert_eq!(c.schema, "public");
assert_eq!(c.table_id, 1024);
assert_eq!(c.region_sequence, 0);
assert_eq!(c.path_type, PathType::Bare);
}
#[test]
fn test_parse_config() {
let path = "../../config/datanode.example.toml";
let (storage, engine) = parse_config(&PathBuf::from_str(path).unwrap()).unwrap();
assert_eq!(storage.data_home, "./greptimedb_data");
assert_eq!(engine.index.staging_size, ReadableSize::gb(2));
}
}
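The footer handling in `load_parquet_metadata` relies on the Parquet file layout: the file ends with the Thrift-encoded file metadata, then a 4-byte little-endian metadata length, then the 4-byte magic `PAR1` (8 bytes in total, `FOOTER_SIZE`). The function prefetches the last 64 KiB and only issues a second ranged read when the metadata does not fit in that tail. The sketch below isolates just that offset arithmetic using the standard library. `locate_metadata` and `MetadataLocation` are hypothetical names, and unlike `ParquetMetaDataReader` the sketch neither decodes the Thrift metadata nor handles encrypted footers; it only checks the plaintext `PAR1` magic.

```rust
use std::ops::Range;

/// Parquet footer: 4-byte LE metadata length followed by the 4-byte magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: &[u8; 4] = b"PAR1";

/// Where the file metadata lives (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum MetadataLocation {
    /// Fully inside the prefetched tail buffer, at this range of the buffer.
    InBuffer(Range<usize>),
    /// Starts before the tail buffer; read this absolute range of the file.
    NeedsRead(Range<u64>),
}

/// Locate the metadata given the file size and a buffer holding the last
/// `tail.len()` bytes of the file.
fn locate_metadata(file_size: u64, tail: &[u8]) -> Result<MetadataLocation, String> {
    if file_size < FOOTER_SIZE as u64 || tail.len() < FOOTER_SIZE {
        return Err("file too small".to_string());
    }
    let footer = &tail[tail.len() - FOOTER_SIZE..];
    if footer[4..] != PARQUET_MAGIC[..] {
        return Err("missing PAR1 magic".to_string());
    }
    let metadata_len = u32::from_le_bytes(footer[..4].try_into().unwrap()) as u64;
    if file_size - (FOOTER_SIZE as u64) < metadata_len {
        return Err("invalid footer/metadata length".to_string());
    }
    // Bytes of the tail buffer that precede the footer.
    let payload_len = tail.len() - FOOTER_SIZE;
    if metadata_len <= payload_len as u64 {
        let start = payload_len - metadata_len as usize;
        Ok(MetadataLocation::InBuffer(start..payload_len))
    } else {
        let end = file_size - FOOTER_SIZE as u64;
        Ok(MetadataLocation::NeedsRead(end - metadata_len..end))
    }
}
```

With a large enough prefetch the common case costs a single ranged read; the fallback branch costs a second request, mirroring the two `read_with(...).range(...)` calls above.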

@@ -8,5 +8,6 @@ license.workspace = true
workspace = true
[dependencies]
const_format.workspace = true
[dev-dependencies]

@@ -0,0 +1,27 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
fn main() {
// Set DEFAULT_CATALOG_NAME from environment variable or use default value
let default_catalog_name =
std::env::var("DEFAULT_CATALOG_NAME").unwrap_or_else(|_| "greptime".to_string());
println!(
"cargo:rustc-env=DEFAULT_CATALOG_NAME={}",
default_catalog_name
);
// Rerun build script if the environment variable changes
println!("cargo:rerun-if-env-changed=DEFAULT_CATALOG_NAME");
}

@@ -12,13 +12,15 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use const_format::concatcp;
pub const SYSTEM_CATALOG_NAME: &str = "system";
pub const INFORMATION_SCHEMA_NAME: &str = "information_schema";
pub const PG_CATALOG_NAME: &str = "pg_catalog";
pub const SYSTEM_CATALOG_TABLE_NAME: &str = "system_catalog";
pub const DEFAULT_CATALOG_NAME: &str = "greptime";
pub const DEFAULT_CATALOG_NAME: &str = env!("DEFAULT_CATALOG_NAME");
pub const DEFAULT_SCHEMA_NAME: &str = "public";
pub const DEFAULT_PRIVATE_SCHEMA_NAME: &str = "greptime_private";
pub const DEFAULT_PRIVATE_SCHEMA_NAME: &str = concatcp!(DEFAULT_CATALOG_NAME, "_private");
/// Reserves [0,MIN_USER_FLOW_ID) for internal usage.
/// User defined table id starts from this value.

@@ -45,3 +45,19 @@ pub fn from_err_code_msg_to_header(code: u32, msg: &str) -> HeaderMap {
header.insert(GREPTIME_DB_HEADER_ERROR_MSG, msg);
header
}
/// Returns the external root cause of the source error (excluding the current error).
pub fn root_source(err: &dyn std::error::Error) -> Option<&dyn std::error::Error> {
// There is some divergence over the behavior of the `sources()` API
// (see https://github.com/rust-lang/rust/issues/58520),
// so this function iterates over the sources manually.
let mut root = err.source();
while let Some(r) = root {
if let Some(s) = r.source() {
root = Some(s);
} else {
break;
}
}
root
}
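To illustrate what "external root cause" means: for an error chain `request -> parse -> io`, `root_source` returns `io`, and for an error with no source it returns `None`; the error passed in is never itself returned. The sketch below copies the traversal and exercises it with a hypothetical `Layer` error type that is not part of GreptimeDB.

```rust
use std::error::Error;
use std::fmt;

/// Same traversal as `root_source` above: follow `source()` to the last link,
/// never returning `err` itself.
fn root_source(err: &dyn Error) -> Option<&dyn Error> {
    let mut root = err.source();
    while let Some(r) = root {
        if let Some(s) = r.source() {
            root = Some(s);
        } else {
            break;
        }
    }
    root
}

/// Hypothetical error type that optionally wraps another error.
#[derive(Debug)]
struct Layer {
    name: &'static str,
    inner: Option<Box<dyn Error>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.name)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.inner.as_deref()
    }
}
```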

@@ -104,7 +104,7 @@ impl MetaClientSelector {
let cfg = ChannelConfig::new()
.connect_timeout(Duration::from_secs(30))
.timeout(Duration::from_secs(30));
let channel_manager = ChannelManager::with_config(cfg);
let channel_manager = ChannelManager::with_config(cfg, None);
Self {
meta_client,
channel_manager,

@@ -12,10 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use crate::aggrs::vector::avg::VectorAvg;
use crate::aggrs::vector::product::VectorProduct;
use crate::aggrs::vector::sum::VectorSum;
use crate::function_registry::FunctionRegistry;
mod avg;
mod product;
mod sum;
@@ -25,5 +27,6 @@ impl VectorFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_aggr(VectorSum::uadf_impl());
registry.register_aggr(VectorProduct::uadf_impl());
registry.register_aggr(VectorAvg::uadf_impl());
}
}

@@ -0,0 +1,270 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::borrow::Cow;
use std::sync::Arc;
use arrow::array::{Array, ArrayRef, AsArray, BinaryArray, LargeStringArray, StringArray};
use arrow::compute::sum;
use arrow::datatypes::UInt64Type;
use arrow_schema::{DataType, Field};
use datafusion_common::{Result, ScalarValue};
use datafusion_expr::{
Accumulator, AggregateUDF, Signature, SimpleAggregateUDF, TypeSignature, Volatility,
};
use datafusion_functions_aggregate_common::accumulator::AccumulatorArgs;
use nalgebra::{Const, DVector, DVectorView, Dyn, OVector};
use crate::scalars::vector::impl_conv::{
binlit_as_veclit, parse_veclit_from_strlit, veclit_to_binlit,
};
/// The accumulator for the `vec_avg` aggregate function.
#[derive(Debug, Default)]
pub struct VectorAvg {
sum: Option<OVector<f32, Dyn>>,
count: u64,
}
impl VectorAvg {
/// Create a new `AggregateUDF` for the `vec_avg` aggregate function.
pub fn uadf_impl() -> AggregateUDF {
let signature = Signature::one_of(
vec![
TypeSignature::Exact(vec![DataType::Utf8]),
TypeSignature::Exact(vec![DataType::LargeUtf8]),
TypeSignature::Exact(vec![DataType::Binary]),
],
Volatility::Immutable,
);
let udaf = SimpleAggregateUDF::new_with_signature(
"vec_avg",
signature,
DataType::Binary,
Arc::new(Self::accumulator),
vec![
Arc::new(Field::new("sum", DataType::Binary, true)),
Arc::new(Field::new("count", DataType::UInt64, true)),
],
);
AggregateUDF::from(udaf)
}
fn accumulator(args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> {
if args.schema.fields().len() != 1 {
return Err(datafusion_common::DataFusionError::Internal(format!(
"expect creating `VEC_AVG` with only one input field, actual {}",
args.schema.fields().len()
)));
}
let t = args.schema.field(0).data_type();
if !matches!(t, DataType::Utf8 | DataType::LargeUtf8 | DataType::Binary) {
return Err(datafusion_common::DataFusionError::Internal(format!(
"unexpected input datatype {t} when creating `VEC_AVG`"
)));
}
Ok(Box::new(VectorAvg::default()))
}
fn inner(&mut self, len: usize) -> &mut OVector<f32, Dyn> {
self.sum
.get_or_insert_with(|| OVector::zeros_generic(Dyn(len), Const::<1>))
}
fn update(&mut self, values: &[ArrayRef], is_update: bool) -> Result<()> {
if values.is_empty() {
return Ok(());
};
let vectors = match values[0].data_type() {
DataType::Utf8 => {
let arr: &StringArray = values[0].as_string();
arr.iter()
.filter_map(|x| x.map(|s| parse_veclit_from_strlit(s).map_err(Into::into)))
.map(|x| x.map(Cow::Owned))
.collect::<Result<Vec<_>>>()?
}
DataType::LargeUtf8 => {
let arr: &LargeStringArray = values[0].as_string();
arr.iter()
.filter_map(|x| x.map(|s| parse_veclit_from_strlit(s).map_err(Into::into)))
.map(|x: Result<Vec<f32>>| x.map(Cow::Owned))
.collect::<Result<Vec<_>>>()?
}
DataType::Binary => {
let arr: &BinaryArray = values[0].as_binary();
arr.iter()
.filter_map(|x| x.map(|b| binlit_as_veclit(b).map_err(Into::into)))
.collect::<Result<Vec<_>>>()?
}
_ => {
return Err(datafusion_common::DataFusionError::NotImplemented(format!(
"unsupported data type {} for `VEC_AVG`",
values[0].data_type()
)));
}
};
if vectors.is_empty() {
return Ok(());
}
let len = if is_update {
vectors.len() as u64
} else {
sum(values[1].as_primitive::<UInt64Type>()).unwrap_or_default()
};
let dims = vectors[0].len();
let mut sum = DVector::zeros(dims);
for v in vectors {
if v.len() != dims {
return Err(datafusion_common::DataFusionError::Execution(
"vectors length not match: VEC_AVG".to_string(),
));
}
let v_view = DVectorView::from_slice(&v, dims);
sum += &v_view;
}
*self.inner(dims) += sum;
self.count += len;
Ok(())
}
}
impl Accumulator for VectorAvg {
fn state(&mut self) -> Result<Vec<ScalarValue>> {
let vector = match &self.sum {
None => ScalarValue::Binary(None),
Some(sum) => ScalarValue::Binary(Some(veclit_to_binlit(sum.as_slice()))),
};
Ok(vec![vector, ScalarValue::from(self.count)])
}
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
self.update(values, true)
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
self.update(states, false)
}
fn evaluate(&mut self) -> Result<ScalarValue> {
match &self.sum {
None => Ok(ScalarValue::Binary(None)),
Some(sum) => Ok(ScalarValue::Binary(Some(veclit_to_binlit(
(sum / self.count as f32).as_slice(),
)))),
}
}
fn size(&self) -> usize {
size_of_val(self)
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use arrow::array::StringArray;
use datatypes::scalars::ScalarVector;
use datatypes::vectors::{ConstantVector, StringVector, Vector};
use super::*;
#[test]
fn test_update_batch() {
// test update empty batch, expect not updating anything
let mut vec_avg = VectorAvg::default();
vec_avg.update_batch(&[]).unwrap();
assert!(vec_avg.sum.is_none());
assert_eq!(ScalarValue::Binary(None), vec_avg.evaluate().unwrap());
// test update one not-null value
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
Some("[4.0,5.0,6.0]".to_string()),
]))];
vec_avg.update_batch(&v).unwrap();
assert_eq!(
ScalarValue::Binary(Some(veclit_to_binlit(&[2.5, 3.5, 4.5]))),
vec_avg.evaluate().unwrap()
);
// test update one null value
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![Option::<String>::None]))];
vec_avg.update_batch(&v).unwrap();
assert_eq!(ScalarValue::Binary(None), vec_avg.evaluate().unwrap());
// test update no null-value batch
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
Some("[4.0,5.0,6.0]".to_string()),
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_avg.update_batch(&v).unwrap();
assert_eq!(
ScalarValue::Binary(Some(veclit_to_binlit(&[4.0, 5.0, 6.0]))),
vec_avg.evaluate().unwrap()
);
// test update null-value batch
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
None,
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_avg.update_batch(&v).unwrap();
assert_eq!(
ScalarValue::Binary(Some(veclit_to_binlit(&[4.0, 5.0, 6.0]))),
vec_avg.evaluate().unwrap()
);
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
None,
Some("[4.0,5.0,6.0]".to_string()),
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_avg.update_batch(&v).unwrap();
assert_eq!(
ScalarValue::Binary(Some(veclit_to_binlit(&[5.5, 6.5, 7.5]))),
vec_avg.evaluate().unwrap()
);
// test update with constant vector
let mut vec_avg = VectorAvg::default();
let v: Vec<ArrayRef> = vec![
Arc::new(ConstantVector::new(
Arc::new(StringVector::from_vec(vec!["[1.0,2.0,3.0]".to_string()])),
4,
))
.to_arrow_array(),
];
vec_avg.update_batch(&v).unwrap();
assert_eq!(
ScalarValue::Binary(Some(veclit_to_binlit(&[1.0, 2.0, 3.0]))),
vec_avg.evaluate().unwrap()
);
}
}
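`VectorAvg` keeps an element-wise `sum` plus a row `count` as its intermediate state instead of a running mean. Partial states from different partitions can only be combined exactly if each carries its count; averaging partial means would weight every partition equally regardless of how many rows it saw. The sketch below mimics that design with plain `Vec<f32>` instead of nalgebra and the binary vector encoding; `AvgState` and its methods are hypothetical and only approximate `update_batch`, `merge_batch` and `evaluate`.

```rust
/// Hypothetical partial state mirroring `VectorAvg`: element-wise sum + row count.
#[derive(Debug, Default, Clone)]
struct AvgState {
    sum: Option<Vec<f32>>,
    count: u64,
}

impl AvgState {
    /// Add one non-null row (roughly `update_batch` with a single vector).
    fn update(&mut self, v: &[f32]) -> Result<(), String> {
        let sum = self.sum.get_or_insert_with(|| vec![0.0; v.len()]);
        if sum.len() != v.len() {
            return Err("vectors length not match".to_string());
        }
        for (s, x) in sum.iter_mut().zip(v) {
            *s += *x;
        }
        self.count += 1;
        Ok(())
    }

    /// Combine another partial state (roughly `merge_batch`): add sums and counts.
    fn merge(&mut self, other: &AvgState) -> Result<(), String> {
        if let Some(other_sum) = &other.sum {
            let sum = self.sum.get_or_insert_with(|| vec![0.0; other_sum.len()]);
            if sum.len() != other_sum.len() {
                return Err("vectors length not match".to_string());
            }
            for (s, x) in sum.iter_mut().zip(other_sum) {
                *s += *x;
            }
        }
        self.count += other.count;
        Ok(())
    }

    /// Final result: element-wise sum divided by the total row count.
    fn evaluate(&self) -> Option<Vec<f32>> {
        self.sum
            .as_ref()
            .map(|s| s.iter().map(|x| x / self.count as f32).collect())
    }
}
```

For example, if one partition sees `[1,2,3]` and `[4,5,6]` and another sees `[7,8,9]`, merging the states yields `[4,5,6]`, whereas averaging the two partial means would give `[4.75,5.75,6.75]`.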

@@ -14,6 +14,7 @@
mod convert;
mod distance;
mod elem_avg;
mod elem_product;
mod elem_sum;
pub mod impl_conv;
@@ -64,6 +65,7 @@ impl VectorFunction {
registry.register_scalar(vector_subvector::VectorSubvectorFunction::default());
registry.register_scalar(elem_sum::ElemSumFunction::default());
registry.register_scalar(elem_product::ElemProductFunction::default());
registry.register_scalar(elem_avg::ElemAvgFunction::default());
}
}

@@ -0,0 +1,128 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::Display;
use datafusion::arrow::datatypes::DataType;
use datafusion::logical_expr::ColumnarValue;
use datafusion_common::ScalarValue;
use datafusion_expr::type_coercion::aggregates::{BINARYS, STRINGS};
use datafusion_expr::{ScalarFunctionArgs, Signature, TypeSignature, Volatility};
use nalgebra::DVectorView;
use crate::function::Function;
use crate::scalars::vector::{VectorCalculator, impl_conv};
const NAME: &str = "vec_elem_avg";
#[derive(Debug, Clone)]
pub(crate) struct ElemAvgFunction {
signature: Signature,
}
impl Default for ElemAvgFunction {
fn default() -> Self {
Self {
signature: Signature::one_of(
vec![
TypeSignature::Uniform(1, STRINGS.to_vec()),
TypeSignature::Uniform(1, BINARYS.to_vec()),
TypeSignature::Uniform(1, vec![DataType::BinaryView]),
],
Volatility::Immutable,
),
}
}
}
impl Function for ElemAvgFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::Float32)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
let body = |v0: &ScalarValue| -> datafusion_common::Result<ScalarValue> {
let v0 =
impl_conv::as_veclit(v0)?.map(|v0| DVectorView::from_slice(&v0, v0.len()).mean());
Ok(ScalarValue::Float32(v0))
};
let calculator = VectorCalculator {
name: self.name(),
func: body,
};
calculator.invoke_with_single_argument(args)
}
}
impl Display for ElemAvgFunction {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use arrow::array::StringViewArray;
use arrow_schema::Field;
use datafusion::arrow::array::{Array, AsArray};
use datafusion::arrow::datatypes::Float32Type;
use datafusion_common::config::ConfigOptions;
use super::*;
#[test]
fn test_elem_avg() {
let func = ElemAvgFunction::default();
let input = Arc::new(StringViewArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
Some("[4.0,5.0,6.0]".to_string()),
Some("[7.0,8.0,9.0]".to_string()),
None,
]));
let result = func
.invoke_with_args(ScalarFunctionArgs {
args: vec![ColumnarValue::Array(input.clone())],
arg_fields: vec![],
number_rows: input.len(),
return_field: Arc::new(Field::new("x", DataType::Float32, true)),
config_options: Arc::new(ConfigOptions::new()),
})
.and_then(|v| ColumnarValue::values_to_arrays(&[v]))
.map(|mut a| a.remove(0))
.unwrap();
let result = result.as_primitive::<Float32Type>();
assert_eq!(result.len(), 4);
assert_eq!(result.value(0), 2.0);
assert_eq!(result.value(1), 5.0);
assert_eq!(result.value(2), 8.0);
assert!(result.is_null(3));
}
}
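`vec_elem_avg` is a scalar function: it reduces each row's vector to the mean of its elements and returns `Float32`, with null rows staying null. This differs from the `vec_avg` aggregate, which averages element by element across rows. The per-row computation amounts to the std-only sketch below; `parse_vector_literal` is a hypothetical, simplified stand-in for `parse_veclit_from_strlit` and does not handle the binary vector encoding.

```rust
/// Hypothetical stand-in for `parse_veclit_from_strlit`: parses "[1.0,2.0,3.0]".
/// An empty literal "[]" is rejected because "" does not parse as f32.
fn parse_vector_literal(s: &str) -> Result<Vec<f32>, String> {
    let inner = s
        .trim()
        .strip_prefix('[')
        .and_then(|t| t.strip_suffix(']'))
        .ok_or_else(|| format!("not a vector literal: {s}"))?;
    inner
        .split(',')
        .map(|x| x.trim().parse::<f32>().map_err(|e| format!("{x}: {e}")))
        .collect()
}

/// Per-row mean of the vector's elements; `None` for a null row.
fn elem_avg(row: Option<&str>) -> Result<Option<f32>, String> {
    match row {
        None => Ok(None),
        Some(s) => {
            let v = parse_vector_literal(s)?;
            Ok(Some(v.iter().sum::<f32>() / v.len() as f32))
        }
    }
}
```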

@@ -16,6 +16,9 @@ mod version;
use std::sync::Arc;
use common_catalog::consts::{
DEFAULT_PRIVATE_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, PG_CATALOG_NAME,
};
use datafusion::arrow::array::{ArrayRef, StringArray, as_boolean_array};
use datafusion::catalog::TableFunction;
use datafusion::common::ScalarValue;
@@ -143,9 +146,9 @@ impl Function for CurrentSchemasFunction {
let mut values = vec!["public"];
// include implicit schemas
if input.value(0) {
values.push("information_schema");
values.push("pg_catalog");
values.push("greptime_private");
values.push(INFORMATION_SCHEMA_NAME);
values.push(PG_CATALOG_NAME);
values.push(DEFAULT_PRIVATE_SCHEMA_NAME);
}
let list_array = SingleRowListArrayBuilder::new(Arc::new(StringArray::from(values)));
@@ -191,7 +194,10 @@ impl PGCatalogFunction {
registry.register(pg_catalog::create_pg_get_userbyid_udf());
registry.register(pg_catalog::create_pg_table_is_visible());
registry.register(pg_catalog::pg_get_expr_udf::create_pg_get_expr_udf());
// TODO(sunng87): upgrade datafusion to add
//registry.register(pg_catalog::create_pg_encoding_to_char_udf());
registry.register(pg_catalog::create_pg_encoding_to_char_udf());
registry.register(pg_catalog::create_pg_relation_size_udf());
registry.register(pg_catalog::create_pg_total_relation_size_udf());
registry.register(pg_catalog::create_pg_stat_get_numscans());
registry.register(pg_catalog::create_pg_get_constraintdef());
}
}

@@ -22,14 +22,14 @@ use dashmap::DashMap;
use dashmap::mapref::entry::Entry;
use lazy_static::lazy_static;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use snafu::ResultExt;
use tokio_util::sync::CancellationToken;
use tonic::transport::{
Certificate, Channel as InnerChannel, ClientTlsConfig, Endpoint, Identity, Uri,
};
use tower::Service;
use crate::error::{CreateChannelSnafu, InvalidConfigFilePathSnafu, InvalidTlsConfigSnafu, Result};
use crate::error::{CreateChannelSnafu, InvalidConfigFilePathSnafu, Result};
const RECYCLE_CHANNEL_INTERVAL_SECS: u64 = 60;
pub const DEFAULT_GRPC_REQUEST_TIMEOUT_SECS: u64 = 10;
@@ -91,57 +91,18 @@ impl ChannelManager {
Default::default()
}
pub fn with_config(config: ChannelConfig) -> Self {
let inner = Inner::with_config(config);
/// Creates a `ChannelManager` from `config`, optionally with a client TLS config.
/// Use [`load_tls_config`] to load the TLS config from the file system.
pub fn with_config(config: ChannelConfig, tls_config: Option<ClientTlsConfig>) -> Self {
let mut inner = Inner::with_config(config.clone());
if let Some(tls_config) = tls_config {
inner.client_tls_config = Some(tls_config);
}
Self {
inner: Arc::new(inner),
}
}
/// Read tls cert and key files and create a ChannelManager with TLS config.
pub fn with_tls_config(config: ChannelConfig) -> Result<Self> {
let mut inner = Inner::with_config(config.clone());
// setup tls
let path_config = config.client_tls.context(InvalidTlsConfigSnafu {
msg: "no config input",
})?;
if !path_config.enabled {
// if TLS not enabled, just ignore other tls config
// and not set `client_tls_config` hence not use TLS
return Ok(Self {
inner: Arc::new(inner),
});
}
let mut tls_config = ClientTlsConfig::new();
if let Some(server_ca) = path_config.server_ca_cert_path {
let server_root_ca_cert =
std::fs::read_to_string(server_ca).context(InvalidConfigFilePathSnafu)?;
let server_root_ca_cert = Certificate::from_pem(server_root_ca_cert);
tls_config = tls_config.ca_certificate(server_root_ca_cert);
}
if let (Some(client_cert_path), Some(client_key_path)) =
(&path_config.client_cert_path, &path_config.client_key_path)
{
let client_cert =
std::fs::read_to_string(client_cert_path).context(InvalidConfigFilePathSnafu)?;
let client_key =
std::fs::read_to_string(client_key_path).context(InvalidConfigFilePathSnafu)?;
let client_identity = Identity::from_pem(client_cert, client_key);
tls_config = tls_config.identity(client_identity);
}
inner.client_tls_config = Some(tls_config);
Ok(Self {
inner: Arc::new(inner),
})
}
pub fn config(&self) -> &ChannelConfig {
&self.inner.config
}
@@ -287,6 +248,34 @@ impl ChannelManager {
}
}
pub fn load_tls_config(tls_option: Option<&ClientTlsOption>) -> Result<Option<ClientTlsConfig>> {
let path_config = match tls_option {
Some(path_config) if path_config.enabled => path_config,
_ => return Ok(None),
};
let mut tls_config = ClientTlsConfig::new();
if let Some(server_ca) = &path_config.server_ca_cert_path {
let server_root_ca_cert =
std::fs::read_to_string(server_ca).context(InvalidConfigFilePathSnafu)?;
let server_root_ca_cert = Certificate::from_pem(server_root_ca_cert);
tls_config = tls_config.ca_certificate(server_root_ca_cert);
}
if let (Some(client_cert_path), Some(client_key_path)) =
(&path_config.client_cert_path, &path_config.client_key_path)
{
let client_cert =
std::fs::read_to_string(client_cert_path).context(InvalidConfigFilePathSnafu)?;
let client_key =
std::fs::read_to_string(client_key_path).context(InvalidConfigFilePathSnafu)?;
let client_identity = Identity::from_pem(client_cert, client_key);
tls_config = tls_config.identity(client_identity);
}
Ok(Some(tls_config))
}
#[derive(Clone, Debug, PartialEq, Eq, Serialize, Deserialize)]
pub struct ClientTlsOption {
/// Whether to enable TLS for client.
@@ -659,7 +648,7 @@ mod tests {
.http2_adaptive_window(true)
.tcp_keepalive(Duration::from_secs(2))
.tcp_nodelay(true);
let mgr = ChannelManager::with_config(config);
let mgr = ChannelManager::with_config(config, None);
let res = mgr.build_endpoint("test_addr");

@@ -12,14 +12,17 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_grpc::channel_manager::{ChannelConfig, ChannelManager, ClientTlsOption};
use common_grpc::channel_manager::{
ChannelConfig, ChannelManager, ClientTlsOption, load_tls_config,
};
#[tokio::test]
async fn test_mtls_config() {
// test no config
let config = ChannelConfig::new();
let re = ChannelManager::with_tls_config(config);
assert!(re.is_err());
let re = load_tls_config(config.client_tls.as_ref());
assert!(re.is_ok());
assert!(re.unwrap().is_none());
// test wrong file
let config = ChannelConfig::new().client_tls_config(ClientTlsOption {
@@ -29,7 +32,7 @@ async fn test_mtls_config() {
client_key_path: Some("tests/tls/wrong_client.key".to_string()),
});
let re = ChannelManager::with_tls_config(config);
let re = load_tls_config(config.client_tls.as_ref());
assert!(re.is_err());
// test corrupted file content
@@ -40,7 +43,9 @@ async fn test_mtls_config() {
client_key_path: Some("tests/tls/corrupted".to_string()),
});
let re = ChannelManager::with_tls_config(config).unwrap();
let tls_config = load_tls_config(config.client_tls.as_ref()).unwrap();
let re = ChannelManager::with_config(config, tls_config);
let re = re.get("127.0.0.1:0");
assert!(re.is_err());
@@ -52,7 +57,8 @@ async fn test_mtls_config() {
client_key_path: Some("tests/tls/client.key".to_string()),
});
let re = ChannelManager::with_tls_config(config).unwrap();
let tls_config = load_tls_config(config.client_tls.as_ref()).unwrap();
let re = ChannelManager::with_config(config, tls_config);
let re = re.get("127.0.0.1:0");
let _ = re.unwrap();
}
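The refactor moves TLS setup out of `ChannelManager::with_tls_config` into the free function `load_tls_config`. That function returns `Ok(None)` when TLS is absent or disabled, returns an error when a configured file cannot be read, and otherwise returns a config that `ChannelManager::with_config(config, tls_config)` accepts. The tests above exercise those outcomes; corrupted PEM content only fails later, at `get`. The std-only sketch below mirrors that contract. `TlsOption`, `TlsMaterial` and `load_tls_material` are hypothetical stand-ins for `ClientTlsOption`, tonic's `ClientTlsConfig` and `load_tls_config`. The sketch only reads the PEM files and does not parse them.

```rust
use std::fs;
use std::io;

/// Hypothetical stand-in for `ClientTlsOption`.
pub struct TlsOption {
    pub enabled: bool,
    pub server_ca_cert_path: Option<String>,
    pub client_cert_path: Option<String>,
    pub client_key_path: Option<String>,
}

/// Hypothetical stand-in for the built TLS config: raw PEM text only.
#[derive(Debug, Default, PartialEq)]
pub struct TlsMaterial {
    pub server_ca_pem: Option<String>,
    /// (certificate, key); set only when both paths are configured.
    pub identity_pem: Option<(String, String)>,
}

/// `Ok(None)` when TLS is absent or disabled, `Err` when a configured
/// file cannot be read, `Ok(Some(..))` otherwise.
pub fn load_tls_material(opt: Option<&TlsOption>) -> io::Result<Option<TlsMaterial>> {
    let opt = match opt {
        Some(o) if o.enabled => o,
        _ => return Ok(None),
    };
    let mut material = TlsMaterial::default();
    if let Some(ca) = &opt.server_ca_cert_path {
        material.server_ca_pem = Some(fs::read_to_string(ca)?);
    }
    if let (Some(cert), Some(key)) = (&opt.client_cert_path, &opt.client_key_path) {
        // Both halves of the client identity must be present to be loaded.
        material.identity_pem = Some((fs::read_to_string(cert)?, fs::read_to_string(key)?));
    }
    Ok(Some(material))
}
```

Returning `Option` instead of a fully built `ChannelManager` keeps file I/O errors separate from channel construction. This is why the updated tests call `load_tls_config(...)` first and only then build the manager.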

@@ -77,7 +77,10 @@ serde_json.workspace = true
serde_with.workspace = true
session.workspace = true
snafu.workspace = true
sqlx = { workspace = true, optional = true }
sqlx = { workspace = true, features = [
"mysql",
"chrono",
], optional = true }
store-api.workspace = true
strum.workspace = true
table = { workspace = true, features = ["testing"] }

@@ -25,8 +25,7 @@ use store_api::region_engine::{RegionRole, RegionStatistic};
use store_api::storage::RegionId;
use table::metadata::TableId;
use crate::error;
use crate::error::Result;
use crate::error::{self, DeserializeFromJsonSnafu, Result};
use crate::heartbeat::utils::get_datanode_workloads;
const DATANODE_STAT_PREFIX: &str = "__meta_datanode_stat";
@@ -66,10 +65,12 @@ pub struct Stat {
pub node_epoch: u64,
/// The datanode workloads.
pub datanode_workloads: DatanodeWorkloads,
/// The GC statistics of the datanode.
pub gc_stat: Option<GcStat>,
}
/// The statistics of a region.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct RegionStat {
/// The region_id.
pub id: RegionId,
@@ -126,7 +127,7 @@ pub trait TopicStatsReporter: Send + Sync {
fn reportable_topics(&mut self) -> Vec<TopicStat>;
}
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq, Hash)]
pub enum RegionManifestInfo {
Mito {
manifest_version: u64,
@@ -222,11 +223,12 @@ impl TryFrom<&HeartbeatRequest> for Stat {
node_epoch,
node_workloads,
topic_stats,
extensions,
..
} = value;
match (header, peer) {
(Some(_header), Some(peer)) => {
(Some(header), Some(peer)) => {
let region_stats = region_stats
.iter()
.map(RegionStat::from)
@@ -234,6 +236,14 @@ impl TryFrom<&HeartbeatRequest> for Stat {
let topic_stats = topic_stats.iter().map(TopicStat::from).collect::<Vec<_>>();
let datanode_workloads = get_datanode_workloads(node_workloads.as_ref());
let gc_stat = GcStat::from_extensions(extensions).map_err(|err| {
common_telemetry::error!(
"Failed to deserialize GcStat from extensions: {}",
err
);
header.clone()
})?;
Ok(Self {
timestamp_millis: time_util::current_time_millis(),
// datanode id
@@ -247,6 +257,7 @@ impl TryFrom<&HeartbeatRequest> for Stat {
topic_stats,
node_epoch: *node_epoch,
datanode_workloads,
gc_stat,
})
}
(header, _) => Err(header.clone()),
@@ -319,6 +330,43 @@ impl From<&api::v1::meta::TopicStat> for TopicStat {
}
}
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct GcStat {
/// Number of GC tasks currently running on the datanode.
pub running_gc_tasks: u32,
/// The maximum number of concurrent GC tasks the datanode can handle.
pub gc_concurrency: u32,
}
impl GcStat {
pub const GC_STAT_KEY: &str = "__gc_stat";
pub fn new(running_gc_tasks: u32, gc_concurrency: u32) -> Self {
Self {
running_gc_tasks,
gc_concurrency,
}
}
pub fn into_extensions(&self, extensions: &mut std::collections::HashMap<String, Vec<u8>>) {
let bytes = serde_json::to_vec(self).unwrap_or_default();
extensions.insert(Self::GC_STAT_KEY.to_string(), bytes);
}
pub fn from_extensions(
extensions: &std::collections::HashMap<String, Vec<u8>>,
) -> Result<Option<Self>> {
extensions
.get(Self::GC_STAT_KEY)
.map(|bytes| {
serde_json::from_slice(bytes).with_context(|_| DeserializeFromJsonSnafu {
input: String::from_utf8_lossy(bytes).to_string(),
})
})
.transpose()
}
}
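`GcStat` travels in the heartbeat's generic `extensions` map. The datanode writes it under the reserved key `__gc_stat`, and `from_extensions` reads it back. There, `Option::transpose` turns an absent key into `Ok(None)` and a malformed payload into an error. The std-only sketch below reproduces that round trip. It swaps `serde_json` for a hypothetical `running,concurrency` text encoding, so its byte format is illustrative only and does not match the real payload.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct GcStat {
    pub running_gc_tasks: u32,
    pub gc_concurrency: u32,
}

impl GcStat {
    pub const GC_STAT_KEY: &'static str = "__gc_stat";

    /// Writes the stat into the extension map (hypothetical "a,b" encoding).
    pub fn into_extensions(&self, extensions: &mut HashMap<String, Vec<u8>>) {
        let bytes = format!("{},{}", self.running_gc_tasks, self.gc_concurrency).into_bytes();
        extensions.insert(Self::GC_STAT_KEY.to_string(), bytes);
    }

    /// `Ok(None)` if the key is absent, `Err` if the payload is malformed.
    pub fn from_extensions(extensions: &HashMap<String, Vec<u8>>) -> Result<Option<Self>, String> {
        extensions
            .get(Self::GC_STAT_KEY)
            .map(|bytes| Self::decode(bytes))
            // Option<Result<T, E>> -> Result<Option<T>, E>
            .transpose()
    }

    fn decode(bytes: &[u8]) -> Result<Self, String> {
        let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
        let (running, concurrency) = text
            .split_once(',')
            .ok_or_else(|| format!("malformed gc stat: {text}"))?;
        Ok(Self {
            running_gc_tasks: running.parse().map_err(|e| format!("{e}"))?,
            gc_concurrency: concurrency.parse().map_err(|e| format!("{e}"))?,
        })
    }
}
```

Using the existing `extensions` map means the heartbeat protobuf does not need a new field. Older metasrv versions that never look up `__gc_stat` should simply ignore it.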
/// The key of the datanode stat in the memory store.
///
/// The format is `__meta_datanode_stat-0-{node_id}`.

@@ -442,7 +442,7 @@ pub fn extract_column_metadatas(
results: &mut [RegionResponse],
key: &str,
) -> Result<Option<Vec<ColumnMetadata>>> {
let schemas = results
let mut schemas = results
.iter_mut()
.map(|r| r.extensions.remove(key))
.collect::<Vec<_>>();
@@ -454,20 +454,24 @@ pub fn extract_column_metadatas(
// Verify all the physical schemas are the same
// Safety: previous check ensures this vec is not empty
let first = schemas.first().unwrap();
ensure!(
schemas.iter().all(|x| x == first),
MetadataCorruptionSnafu {
err_msg: "The table column metadata schemas from datanodes are not the same."
}
);
let first_column_metadatas = schemas
.swap_remove(0)
.map(|first_bytes| ColumnMetadata::decode_list(&first_bytes).context(DecodeJsonSnafu))
.transpose()?;
if let Some(first) = first {
let column_metadatas = ColumnMetadata::decode_list(first).context(DecodeJsonSnafu)?;
Ok(Some(column_metadatas))
} else {
Ok(None)
for s in schemas {
// Compare the decoded column metadata instead of the raw bytes, because the metadata contains an extension map.
let column_metadata = s
.map(|bytes| ColumnMetadata::decode_list(&bytes).context(DecodeJsonSnafu))
.transpose()?;
ensure!(
column_metadata == first_column_metadatas,
MetadataCorruptionSnafu {
err_msg: "The table column metadata schemas from datanodes are not the same."
}
);
}
Ok(first_column_metadatas)
}
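The rewritten `extract_column_metadatas` no longer compares raw bytes. It decodes each datanode's payload and compares the result against the first one. The inline comment attributes this to the extension map inside the metadata. One plausible reason byte equality is too strict is that two encodings of the same map can list entries in different orders, though the diff does not say this. The std-only sketch below shows that effect using a hypothetical `k=v;k=v` encoding decoded into a `BTreeMap`. Unlike the original, which relies on an earlier non-empty check, it returns `Ok(None)` for an empty input.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoder: "k=v;k=v" -> ordered map. Errs on a malformed pair.
fn decode(bytes: &[u8]) -> Result<BTreeMap<String, String>, String> {
    let text = std::str::from_utf8(bytes).map_err(|e| e.to_string())?;
    text.split(';')
        .filter(|pair| !pair.is_empty())
        .map(|pair| -> Result<(String, String), String> {
            let (k, v) = pair
                .split_once('=')
                .ok_or_else(|| format!("bad pair: {pair}"))?;
            Ok((k.to_string(), v.to_string()))
        })
        .collect()
}

/// Same shape as the new check: take the first payload, decode every
/// payload, and require all decoded values to be equal.
fn extract_consistent(
    mut payloads: Vec<Option<Vec<u8>>>,
) -> Result<Option<BTreeMap<String, String>>, String> {
    if payloads.is_empty() {
        return Ok(None);
    }
    let first = payloads
        .swap_remove(0)
        .map(|bytes| decode(&bytes))
        .transpose()?;
    for payload in payloads {
        let decoded = payload.map(|bytes| decode(&bytes)).transpose()?;
        if decoded != first {
            return Err("column metadata from datanodes differ".to_string());
        }
    }
    Ok(first)
}
```

With this approach, `x=1;y=2` and `y=2;x=1` are accepted as the same metadata even though their bytes differ. Any real difference in values is still rejected.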
#[cfg(test)]

@@ -17,7 +17,7 @@ use std::fmt::{Display, Formatter};
use std::time::Duration;
use serde::{Deserialize, Deserializer, Serialize};
use store_api::storage::{RegionId, RegionNumber};
use store_api::storage::{FileRefsManifest, GcReport, RegionId, RegionNumber};
use strum::Display;
use table::metadata::TableId;
use table::table_name::TableName;
@@ -250,7 +250,7 @@ pub struct UpgradeRegion {
/// `None` stands for no wait,
/// it's helpful to verify whether the leader region is ready.
#[serde(with = "humantime_serde")]
pub replay_timeout: Option<Duration>,
pub replay_timeout: Duration,
/// The hint for replaying memtable.
#[serde(default)]
pub location_id: Option<u64>,
@@ -417,6 +417,88 @@ where
})
}
/// Instruction to get file references for specified regions.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct GetFileRefs {
/// List of region IDs to get file references for.
pub region_ids: Vec<RegionId>,
}
impl Display for GetFileRefs {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(f, "GetFileRefs(region_ids={:?})", self.region_ids)
}
}
/// Instruction to trigger garbage collection for regions.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct GcRegions {
/// The region IDs to perform GC on.
pub regions: Vec<RegionId>,
/// The file references manifest containing temporary file references.
pub file_refs_manifest: FileRefsManifest,
/// Whether to perform a full file listing to find orphan files.
pub full_file_listing: bool,
}
impl Display for GcRegions {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"GcRegion(regions={:?}, file_refs_count={}, full_file_listing={})",
self.regions,
self.file_refs_manifest.file_refs.len(),
self.full_file_listing
)
}
}
/// Reply for GetFileRefs instruction.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct GetFileRefsReply {
/// The file references manifest.
pub file_refs_manifest: FileRefsManifest,
/// Whether the operation was successful.
pub success: bool,
/// Error message if any.
pub error: Option<String>,
}
impl Display for GetFileRefsReply {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"GetFileRefsReply(success={}, file_refs_count={}, error={:?})",
self.success,
self.file_refs_manifest.file_refs.len(),
self.error
)
}
}
/// Reply for GC instruction.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct GcRegionsReply {
pub result: Result<GcReport, String>,
}
impl Display for GcRegionsReply {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"GcReply(result={})",
match &self.result {
Ok(report) => format!(
"GcReport(deleted_files_count={}, need_retry_regions_count={})",
report.deleted_files.len(),
report.need_retry_regions.len()
),
Err(err) => format!("Err({})", err),
}
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, Display, PartialEq)]
pub enum Instruction {
/// Opens regions.
@@ -425,18 +507,23 @@ pub enum Instruction {
/// Closes regions.
#[serde(deserialize_with = "single_or_multiple_from", alias = "CloseRegion")]
CloseRegions(Vec<RegionIdent>),
/// Upgrades a region.
UpgradeRegion(UpgradeRegion),
/// Upgrades regions.
#[serde(deserialize_with = "single_or_multiple_from", alias = "UpgradeRegion")]
UpgradeRegions(Vec<UpgradeRegion>),
#[serde(
deserialize_with = "single_or_multiple_from",
alias = "DowngradeRegion"
)]
/// Downgrades a region.
/// Downgrades regions.
DowngradeRegions(Vec<DowngradeRegion>),
/// Invalidates batch cache.
InvalidateCaches(Vec<CacheIdent>),
/// Flushes regions.
FlushRegions(FlushRegions),
/// Gets file references for regions.
GetFileRefs(GetFileRefs),
/// Triggers garbage collection for regions.
GcRegions(GcRegions),
}
impl Instruction {
@@ -473,9 +560,23 @@ impl Instruction {
}
/// Converts the instruction into a list of [UpgradeRegion]s.
pub fn into_upgrade_regions(self) -> Option<UpgradeRegion> {
pub fn into_upgrade_regions(self) -> Option<Vec<UpgradeRegion>> {
match self {
Self::UpgradeRegion(upgrade_region) => Some(upgrade_region),
Self::UpgradeRegions(upgrade_region) => Some(upgrade_region),
_ => None,
}
}
pub fn into_get_file_refs(self) -> Option<GetFileRefs> {
match self {
Self::GetFileRefs(get_file_refs) => Some(get_file_refs),
_ => None,
}
}
pub fn into_gc_regions(self) -> Option<GcRegions> {
match self {
Self::GcRegions(gc_regions) => Some(gc_regions),
_ => None,
}
}
@@ -484,6 +585,10 @@ impl Instruction {
/// The reply of [UpgradeRegion].
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct UpgradeRegionReply {
/// The [RegionId].
/// For compatibility, it defaults to [RegionId::new(0, 0)] when absent.
#[serde(default)]
pub region_id: RegionId,
/// Returns true if `last_entry_id` has been replayed to the latest.
pub ready: bool,
/// Indicates whether the region exists.
@@ -535,6 +640,39 @@ where
})
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct UpgradeRegionsReply {
pub replies: Vec<UpgradeRegionReply>,
}
impl UpgradeRegionsReply {
pub fn new(replies: Vec<UpgradeRegionReply>) -> Self {
Self { replies }
}
pub fn single(reply: UpgradeRegionReply) -> Self {
Self::new(vec![reply])
}
}
#[derive(Deserialize)]
#[serde(untagged)]
enum UpgradeRegionsCompat {
Single(UpgradeRegionReply),
Multiple(UpgradeRegionsReply),
}
fn upgrade_regions_compat_from<'de, D>(deserializer: D) -> Result<UpgradeRegionsReply, D::Error>
where
D: Deserializer<'de>,
{
let helper = UpgradeRegionsCompat::deserialize(deserializer)?;
Ok(match helper {
UpgradeRegionsCompat::Single(x) => UpgradeRegionsReply::new(vec![x]),
UpgradeRegionsCompat::Multiple(reply) => reply,
})
}
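`UpgradeRegionsCompat` is an `#[serde(untagged)]` helper. Serde first tries to read a legacy single `UpgradeRegionReply`, then the new `UpgradeRegionsReply` with its `replies` list. Either way, `upgrade_regions_compat_from` normalizes the result to the list form. This is why the legacy `"type":"upgrade_region"` payload in `test_deserialize_instruction_reply` still parses into `InstructionReply::UpgradeRegions`. Without serde, the normalization step reduces to the std-only sketch below. `OneOrMany` and `into_vec` are hypothetical names, not part of the codebase.

```rust
/// Hypothetical mirror of `UpgradeRegionsCompat`: a payload that is
/// either a single legacy item or a list of items.
#[derive(Debug)]
pub enum OneOrMany<T> {
    One(T),
    Many(Vec<T>),
}

impl<T> OneOrMany<T> {
    /// Normalizes both shapes to a list, like `upgrade_regions_compat_from`.
    pub fn into_vec(self) -> Vec<T> {
        match self {
            OneOrMany::One(item) => vec![item],
            OneOrMany::Many(items) => items,
        }
    }
}
```

With `untagged`, variant order matters because serde returns the first variant that deserializes successfully. Listing the legacy `Single` shape first keeps old payloads from being misread.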
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum InstructionReply {
@@ -542,13 +680,19 @@ pub enum InstructionReply {
OpenRegions(SimpleReply),
#[serde(alias = "close_region")]
CloseRegions(SimpleReply),
UpgradeRegion(UpgradeRegionReply),
#[serde(
deserialize_with = "upgrade_regions_compat_from",
alias = "upgrade_region"
)]
UpgradeRegions(UpgradeRegionsReply),
#[serde(
alias = "downgrade_region",
deserialize_with = "downgrade_regions_compat_from"
)]
DowngradeRegions(DowngradeRegionsReply),
FlushRegions(FlushRegionReply),
GetFileRefs(GetFileRefsReply),
GcRegions(GcRegionsReply),
}
impl Display for InstructionReply {
@@ -556,11 +700,15 @@ impl Display for InstructionReply {
match self {
Self::OpenRegions(reply) => write!(f, "InstructionReply::OpenRegions({})", reply),
Self::CloseRegions(reply) => write!(f, "InstructionReply::CloseRegions({})", reply),
Self::UpgradeRegion(reply) => write!(f, "InstructionReply::UpgradeRegion({})", reply),
Self::UpgradeRegions(reply) => {
write!(f, "InstructionReply::UpgradeRegions({:?})", reply.replies)
}
Self::DowngradeRegions(reply) => {
write!(f, "InstructionReply::DowngradeRegions({:?})", reply)
write!(f, "InstructionReply::DowngradeRegions({:?})", reply.replies)
}
Self::FlushRegions(reply) => write!(f, "InstructionReply::FlushRegions({})", reply),
Self::GetFileRefs(reply) => write!(f, "InstructionReply::GetFileRefs({})", reply),
Self::GcRegions(reply) => write!(f, "InstructionReply::GcRegion({})", reply),
}
}
}
@@ -581,9 +729,9 @@ impl InstructionReply {
}
}
pub fn expect_upgrade_region_reply(self) -> UpgradeRegionReply {
pub fn expect_upgrade_regions_reply(self) -> Vec<UpgradeRegionReply> {
match self {
Self::UpgradeRegion(reply) => reply,
Self::UpgradeRegions(reply) => reply.replies,
_ => panic!("Expected UpgradeRegions reply"),
}
}
@@ -605,6 +753,10 @@ impl InstructionReply {
#[cfg(test)]
mod tests {
use std::collections::HashSet;
use store_api::storage::FileId;
use super::*;
#[test]
@@ -641,25 +793,58 @@ mod tests {
serialized
);
let downgrade_region = InstructionReply::DowngradeRegions(DowngradeRegionsReply::single(
DowngradeRegionReply {
let upgrade_region = Instruction::UpgradeRegions(vec![UpgradeRegion {
region_id: RegionId::new(1024, 1),
last_entry_id: None,
metadata_last_entry_id: None,
replay_timeout: Duration::from_millis(1000),
location_id: None,
replay_entry_id: None,
metadata_replay_entry_id: None,
}]);
let serialized = serde_json::to_string(&upgrade_region).unwrap();
assert_eq!(
r#"{"UpgradeRegions":[{"region_id":4398046511105,"last_entry_id":null,"metadata_last_entry_id":null,"replay_timeout":"1s","location_id":null}]}"#,
serialized
);
}
#[test]
fn test_serialize_instruction_reply() {
let downgrade_region_reply = InstructionReply::DowngradeRegions(
DowngradeRegionsReply::single(DowngradeRegionReply {
region_id: RegionId::new(1024, 1),
last_entry_id: None,
metadata_last_entry_id: None,
exists: true,
error: None,
},
));
}),
);
let serialized = serde_json::to_string(&downgrade_region).unwrap();
let serialized = serde_json::to_string(&downgrade_region_reply).unwrap();
assert_eq!(
r#"{"type":"downgrade_regions","replies":[{"region_id":4398046511105,"last_entry_id":null,"metadata_last_entry_id":null,"exists":true,"error":null}]}"#,
serialized
)
);
let upgrade_region_reply =
InstructionReply::UpgradeRegions(UpgradeRegionsReply::single(UpgradeRegionReply {
region_id: RegionId::new(1024, 1),
ready: true,
exists: true,
error: None,
}));
let serialized = serde_json::to_string(&upgrade_region_reply).unwrap();
assert_eq!(
r#"{"type":"upgrade_regions","replies":[{"region_id":4398046511105,"ready":true,"exists":true,"error":null}]}"#,
serialized
);
}
#[test]
fn test_deserialize_instruction() {
// legacy open region instruction
let open_region_instruction = r#"{"OpenRegion":{"region_ident":{"datanode_id":2,"table_id":1024,"region_number":1,"engine":"mito2"},"region_storage_path":"test/foo","region_options":{},"region_wal_options":{},"skip_wal_replay":false}}"#;
let open_region_instruction: Instruction =
serde_json::from_str(open_region_instruction).unwrap();
@@ -677,6 +862,7 @@ mod tests {
)]);
assert_eq!(open_region_instruction, open_region);
// legacy close region instruction
let close_region_instruction = r#"{"CloseRegion":{"datanode_id":2,"table_id":1024,"region_number":1,"engine":"mito2"}}"#;
let close_region_instruction: Instruction =
serde_json::from_str(close_region_instruction).unwrap();
@@ -688,6 +874,7 @@ mod tests {
}]);
assert_eq!(close_region_instruction, close_region);
// legacy downgrade region instruction
let downgrade_region_instruction = r#"{"DowngradeRegions":{"region_id":4398046511105,"flush_timeout":{"secs":1,"nanos":0}}}"#;
let downgrade_region_instruction: Instruction =
serde_json::from_str(downgrade_region_instruction).unwrap();
@@ -697,6 +884,25 @@ mod tests {
}]);
assert_eq!(downgrade_region_instruction, downgrade_region);
// legacy upgrade region instruction
let upgrade_region_instruction = r#"{"UpgradeRegion":{"region_id":4398046511105,"last_entry_id":null,"metadata_last_entry_id":null,"replay_timeout":"1s","location_id":null,"replay_entry_id":null,"metadata_replay_entry_id":null}}"#;
let upgrade_region_instruction: Instruction =
serde_json::from_str(upgrade_region_instruction).unwrap();
let upgrade_region = Instruction::UpgradeRegions(vec![UpgradeRegion {
region_id: RegionId::new(1024, 1),
last_entry_id: None,
metadata_last_entry_id: None,
replay_timeout: Duration::from_millis(1000),
location_id: None,
replay_entry_id: None,
metadata_replay_entry_id: None,
}]);
assert_eq!(upgrade_region_instruction, upgrade_region);
}
#[test]
fn test_deserialize_instruction_reply() {
// legacy close region reply
let close_region_instruction_reply =
r#"{"result":true,"error":null,"type":"close_region"}"#;
let close_region_instruction_reply: InstructionReply =
@@ -707,6 +913,7 @@ mod tests {
});
assert_eq!(close_region_instruction_reply, close_region_reply);
// legacy open region reply
let open_region_instruction_reply = r#"{"result":true,"error":null,"type":"open_region"}"#;
let open_region_instruction_reply: InstructionReply =
serde_json::from_str(open_region_instruction_reply).unwrap();
@@ -716,6 +923,7 @@ mod tests {
});
assert_eq!(open_region_instruction_reply, open_region_reply);
// legacy downgrade region reply
let downgrade_region_instruction_reply = r#"{"region_id":4398046511105,"last_entry_id":null,"metadata_last_entry_id":null,"exists":true,"error":null,"type":"downgrade_region"}"#;
let downgrade_region_instruction_reply: InstructionReply =
serde_json::from_str(downgrade_region_instruction_reply).unwrap();
@@ -729,6 +937,19 @@ mod tests {
}),
);
assert_eq!(downgrade_region_instruction_reply, downgrade_region_reply);
// legacy upgrade region reply
let upgrade_region_instruction_reply = r#"{"region_id":4398046511105,"ready":true,"exists":true,"error":null,"type":"upgrade_region"}"#;
let upgrade_region_instruction_reply: InstructionReply =
serde_json::from_str(upgrade_region_instruction_reply).unwrap();
let upgrade_region_reply =
InstructionReply::UpgradeRegions(UpgradeRegionsReply::single(UpgradeRegionReply {
region_id: RegionId::new(1024, 1),
ready: true,
exists: true,
error: None,
}));
assert_eq!(upgrade_region_instruction_reply, upgrade_region_reply);
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -903,4 +1124,30 @@ mod tests {
_ => panic!("Expected FlushRegions instruction"),
}
}
#[test]
fn test_serialize_get_file_refs_instruction_reply() {
let mut manifest = FileRefsManifest::default();
let r0 = RegionId::new(1024, 1);
let r1 = RegionId::new(1024, 2);
manifest
.file_refs
.insert(r0, HashSet::from([FileId::random()]));
manifest
.file_refs
.insert(r1, HashSet::from([FileId::random()]));
manifest.manifest_version.insert(r0, 10);
manifest.manifest_version.insert(r1, 20);
let instruction_reply = InstructionReply::GetFileRefs(GetFileRefsReply {
file_refs_manifest: manifest,
success: true,
error: None,
});
let serialized = serde_json::to_string(&instruction_reply).unwrap();
let deserialized = serde_json::from_str(&serialized).unwrap();
assert_eq!(instruction_reply, deserialized);
}
}

@@ -26,7 +26,6 @@ use datatypes::arrow::datatypes::{
Int32Type, TimestampMicrosecondType, TimestampMillisecondType, TimestampNanosecondType,
TimestampSecondType,
};
use datatypes::schema::SchemaRef;
fn prepare_record_batch(rows: usize) -> RecordBatch {
let schema = Schema::new(vec![
@@ -56,14 +55,6 @@ fn prepare_record_batch(rows: usize) -> RecordBatch {
RecordBatch::try_new(Arc::new(schema), columns).unwrap()
}
fn iter_by_greptimedb_values(schema: SchemaRef, record_batch: RecordBatch) {
let record_batch =
common_recordbatch::RecordBatch::try_from_df_record_batch(schema, record_batch).unwrap();
for row in record_batch.rows() {
black_box(row);
}
}
fn iter_by_loop_rows_and_columns(record_batch: RecordBatch) {
for i in 0..record_batch.num_rows() {
for column in record_batch.columns() {
@@ -125,19 +116,6 @@ pub fn criterion_benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("iter_record_batch");
for rows in [1usize, 10, 100, 1_000, 10_000] {
group.bench_with_input(
BenchmarkId::new("by_greptimedb_values", rows),
&rows,
|b, rows| {
let record_batch = prepare_record_batch(*rows);
let schema =
Arc::new(datatypes::schema::Schema::try_from(record_batch.schema()).unwrap());
b.iter(|| {
iter_by_greptimedb_values(schema.clone(), record_batch.clone());
})
},
);
group.bench_with_input(
BenchmarkId::new("by_loop_rows_and_columns", rows),
&rows,

@@ -23,7 +23,6 @@ use datafusion_common::arrow::datatypes::{DataType as ArrowDataType, SchemaRef a
use datatypes::arrow::array::RecordBatchOptions;
use datatypes::prelude::DataType;
use datatypes::schema::SchemaRef;
use datatypes::value::Value;
use datatypes::vectors::{Helper, VectorRef};
use serde::ser::{Error, SerializeStruct};
use serde::{Serialize, Serializer};
@@ -194,11 +193,6 @@ impl RecordBatch {
self.df_record_batch.num_rows()
}
/// Create an iterator to traverse the data by row
pub fn rows(&self) -> RecordBatchRowIterator<'_> {
RecordBatchRowIterator::new(self)
}
pub fn column_vectors(
&self,
table_name: &str,
@@ -277,44 +271,6 @@ impl Serialize for RecordBatch {
}
}
pub struct RecordBatchRowIterator<'a> {
record_batch: &'a RecordBatch,
rows: usize,
columns: usize,
row_cursor: usize,
}
impl<'a> RecordBatchRowIterator<'a> {
fn new(record_batch: &'a RecordBatch) -> RecordBatchRowIterator<'a> {
RecordBatchRowIterator {
record_batch,
rows: record_batch.df_record_batch.num_rows(),
columns: record_batch.df_record_batch.num_columns(),
row_cursor: 0,
}
}
}
impl Iterator for RecordBatchRowIterator<'_> {
type Item = Vec<Value>;
fn next(&mut self) -> Option<Self::Item> {
if self.row_cursor == self.rows {
None
} else {
let mut row = Vec::with_capacity(self.columns);
for col in 0..self.columns {
let column = self.record_batch.column(col);
row.push(column.get(self.row_cursor));
}
self.row_cursor += 1;
Some(row)
}
}
}
/// Merges multiple record batches into a single one.
pub fn merge_record_batches(schema: SchemaRef, batches: &[RecordBatch]) -> Result<RecordBatch> {
let batches_len = batches.len();
@@ -349,7 +305,9 @@ pub fn merge_record_batches(schema: SchemaRef, batches: &[RecordBatch]) -> Resul
mod tests {
use std::sync::Arc;
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use datatypes::arrow::array::{AsArray, UInt32Array};
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema, UInt32Type};
use datatypes::arrow_array::StringArray;
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema};
use datatypes::vectors::{StringVector, UInt32Vector};
@@ -407,64 +365,6 @@ mod tests {
);
}
#[test]
fn test_record_batch_visitor() {
let column_schemas = vec![
ColumnSchema::new("numbers", ConcreteDataType::uint32_datatype(), false),
ColumnSchema::new("strings", ConcreteDataType::string_datatype(), true),
];
let schema = Arc::new(Schema::new(column_schemas));
let columns: Vec<VectorRef> = vec![
Arc::new(UInt32Vector::from_slice(vec![1, 2, 3, 4])),
Arc::new(StringVector::from(vec![
None,
Some("hello"),
Some("greptime"),
None,
])),
];
let recordbatch = RecordBatch::new(schema, columns).unwrap();
let mut record_batch_iter = recordbatch.rows();
assert_eq!(
vec![Value::UInt32(1), Value::Null],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
assert_eq!(
vec![Value::UInt32(2), Value::String("hello".into())],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
assert_eq!(
vec![Value::UInt32(3), Value::String("greptime".into())],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
assert_eq!(
vec![Value::UInt32(4), Value::Null],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
assert!(record_batch_iter.next().is_none());
}
#[test]
fn test_record_batch_slice() {
let column_schemas = vec![
@@ -483,26 +383,16 @@ mod tests {
];
let recordbatch = RecordBatch::new(schema, columns).unwrap();
let recordbatch = recordbatch.slice(1, 2).expect("recordbatch slice");
let mut record_batch_iter = recordbatch.rows();
assert_eq!(
vec![Value::UInt32(2), Value::String("hello".into())],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
assert_eq!(
vec![Value::UInt32(3), Value::String("greptime".into())],
record_batch_iter
.next()
.unwrap()
.into_iter()
.collect::<Vec<Value>>()
);
let expected = &UInt32Array::from_iter_values([2u32, 3]);
let array = recordbatch.column(0).to_arrow_array();
let actual = array.as_primitive::<UInt32Type>();
assert_eq!(expected, actual);
assert!(record_batch_iter.next().is_none());
let expected = &StringArray::from(vec!["hello", "greptime"]);
let array = recordbatch.column(1).to_arrow_array();
let actual = array.as_string::<i32>();
assert_eq!(expected, actual);
assert!(recordbatch.slice(1, 5).is_err());
}

@@ -13,7 +13,6 @@
// limitations under the License.
use std::fmt::Display;
use std::str::FromStr;
use chrono::{FixedOffset, TimeZone};
use chrono_tz::{OffsetComponents, Tz};
@@ -102,7 +101,7 @@ impl Timezone {
.parse::<u32>()
.context(ParseOffsetStrSnafu { raw: tz_string })?;
Self::hours_mins_opt(hrs, mins)
} else if let Ok(tz) = Tz::from_str(tz_string) {
} else if let Ok(tz) = Tz::from_str_insensitive(tz_string) {
Ok(Self::Named(tz))
} else {
ParseTimezoneNameSnafu { raw: tz_string }.fail()
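The fix replaces `Tz::from_str` with chrono-tz's `Tz::from_str_insensitive`, so `Asia/ShangHai` now resolves to the canonical `Asia/Shanghai`, as the new test asserts. The std-only sketch below illustrates the same idea with `str::eq_ignore_ascii_case` against a small hypothetical name table. It is not chrono-tz's implementation, which covers the full IANA database.

```rust
/// Hypothetical subset of canonical IANA zone names.
const KNOWN_ZONES: &[&str] = &["UTC", "Asia/Shanghai", "America/New_York", "Europe/Berlin"];

/// Returns the canonical spelling of `name`, matching ASCII case-insensitively.
pub fn canonical_zone(name: &str) -> Option<&'static str> {
    KNOWN_ZONES
        .iter()
        .copied()
        .find(|zone| zone.eq_ignore_ascii_case(name))
}
```

Returning the canonical spelling rather than the user's input keeps the stored `Timezone::Named` value stable, whatever casing the client sends.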
@@ -203,6 +202,10 @@ mod tests {
Timezone::Named(Tz::Asia__Shanghai),
Timezone::from_tz_string("Asia/Shanghai").unwrap()
);
assert_eq!(
Timezone::Named(Tz::Asia__Shanghai),
Timezone::from_tz_string("Asia/ShangHai").unwrap()
);
assert_eq!(
Timezone::Named(Tz::UTC),
Timezone::from_tz_string("UTC").unwrap()

@@ -11,7 +11,7 @@ workspace = true
codec = ["dep:serde"]
[dependencies]
const_format = "0.2"
const_format.workspace = true
serde = { workspace = true, optional = true }
shadow-rs = { version = "1.2.1", default-features = false }

@@ -322,6 +322,21 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to run gc for region {}", region_id))]
GcMitoEngine {
region_id: RegionId,
source: mito2::error::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid arguments for GC: {}", msg))]
InvalidGcArgs {
msg: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to list SST entries from storage"))]
ListStorageSsts {
#[snafu(implicit)]
@@ -446,9 +461,11 @@ impl ErrorExt for Error {
AsyncTaskExecute { source, .. } => source.status_code(),
CreateDir { .. } | RemoveDir { .. } | ShutdownInstance { .. } | DataFusion { .. } => {
StatusCode::Internal
}
CreateDir { .. }
| RemoveDir { .. }
| ShutdownInstance { .. }
| DataFusion { .. }
| InvalidGcArgs { .. } => StatusCode::Internal,
RegionNotFound { .. } => StatusCode::RegionNotFound,
RegionNotReady { .. } => StatusCode::RegionNotReady,
@@ -466,7 +483,7 @@ impl ErrorExt for Error {
StopRegionEngine { source, .. } => source.status_code(),
FindLogicalRegions { source, .. } => source.status_code(),
BuildMitoEngine { source, .. } => source.status_code(),
BuildMitoEngine { source, .. } | GcMitoEngine { source, .. } => source.status_code(),
BuildMetricEngine { source, .. } => source.status_code(),
ListStorageSsts { source, .. } => source.status_code(),
ConcurrentQueryLimiterClosed { .. } | ConcurrentQueryLimiterTimeout { .. } => {

@@ -36,14 +36,14 @@ use common_workload::DatanodeWorkloadType;
use meta_client::MetaClientRef;
use meta_client::client::{HeartbeatSender, MetaClient};
use servers::addrs;
use snafu::ResultExt;
use snafu::{OptionExt as _, ResultExt};
use tokio::sync::{Notify, mpsc};
use tokio::time::Instant;
use self::handler::RegionHeartbeatResponseHandler;
use crate::alive_keeper::{CountdownTaskHandlerExtRef, RegionAliveKeeper};
use crate::config::DatanodeOptions;
use crate::error::{self, MetaClientInitSnafu, Result};
use crate::error::{self, MetaClientInitSnafu, RegionEngineNotFoundSnafu, Result};
use crate::event_listener::RegionServerEventReceiver;
use crate::metrics::{self, HEARTBEAT_RECV_COUNT, HEARTBEAT_SENT_COUNT};
use crate::region_server::RegionServer;
@@ -242,12 +242,18 @@ impl HeartbeatTask {
let total_cpu_millicores = self.resource_stat.get_total_cpu_millicores();
let total_memory_bytes = self.resource_stat.get_total_memory_bytes();
let resource_stat = self.resource_stat.clone();
let gc_limiter = self
.region_server
.mito_engine()
.context(RegionEngineNotFoundSnafu { name: "mito" })?
.gc_limiter();
common_runtime::spawn_hb(async move {
let sleep = tokio::time::sleep(Duration::from_millis(0));
tokio::pin!(sleep);
let build_info = common_version::build_info();
let heartbeat_request = HeartbeatRequest {
peer: self_peer,
node_epoch,
@@ -283,8 +289,13 @@ impl HeartbeatTask {
if let Some(message) = message {
match outgoing_message_to_mailbox_message(message) {
Ok(message) => {
let mut extensions = heartbeat_request.extensions.clone();
let gc_stat = gc_limiter.gc_stat();
gc_stat.into_extensions(&mut extensions);
let req = HeartbeatRequest {
mailbox_message: Some(message),
extensions,
..heartbeat_request.clone()
};
HEARTBEAT_RECV_COUNT.with_label_values(&["success"]).inc();
@@ -305,10 +316,16 @@ impl HeartbeatTask {
let topic_stats = region_server_clone.topic_stats();
let now = Instant::now();
let duration_since_epoch = (now - epoch).as_millis() as u64;
let mut extensions = heartbeat_request.extensions.clone();
let gc_stat = gc_limiter.gc_stat();
gc_stat.into_extensions(&mut extensions);
let mut req = HeartbeatRequest {
region_stats,
topic_stats,
duration_since_epoch,
extensions,
..heartbeat_request.clone()
};
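On every heartbeat, including heartbeats that carry a mailbox reply, the datanode now snapshots `gc_limiter.gc_stat()` and writes it into the request's `extensions`. This lets the metasrv see how many GC tasks are running relative to `gc_concurrency`. The limiter's internals are not part of this diff. The sketch below is a hypothetical std-only limiter: `GcLimiter`, `GcPermit` and `GcStatSnapshot` are illustrative names. It tracks running tasks with an `AtomicU32` and frees a slot when its permit is dropped.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical snapshot with the same two fields as `GcStat`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct GcStatSnapshot {
    pub running_gc_tasks: u32,
    pub gc_concurrency: u32,
}

/// Hypothetical limiter allowing at most `concurrency` GC tasks at once.
pub struct GcLimiter {
    running: AtomicU32,
    concurrency: u32,
}

/// Holds one slot; releases it when dropped.
pub struct GcPermit {
    limiter: Arc<GcLimiter>,
}

impl Drop for GcPermit {
    fn drop(&mut self) {
        self.limiter.running.fetch_sub(1, Ordering::AcqRel);
    }
}

impl GcLimiter {
    pub fn new(concurrency: u32) -> Arc<Self> {
        Arc::new(Self { running: AtomicU32::new(0), concurrency })
    }

    /// Takes a slot if one is free, without blocking.
    pub fn try_acquire(limiter: &Arc<Self>) -> Option<GcPermit> {
        limiter
            .running
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < limiter.concurrency).then(|| n + 1)
            })
            .ok()
            .map(|_| GcPermit { limiter: Arc::clone(limiter) })
    }

    /// Point-in-time view, suitable for attaching to a heartbeat.
    pub fn gc_stat(&self) -> GcStatSnapshot {
        GcStatSnapshot {
            running_gc_tasks: self.running.load(Ordering::Acquire),
            gc_concurrency: self.concurrency,
        }
    }
}
```

The snapshot is taken per request, so it can be stale by one heartbeat interval. For a scheduling hint that is usually acceptable.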

@@ -20,16 +20,21 @@ use common_meta::heartbeat::handler::{
use common_meta::instruction::{Instruction, InstructionReply};
use common_telemetry::error;
use snafu::OptionExt;
use store_api::storage::GcReport;
mod close_region;
mod downgrade_region;
mod file_ref;
mod flush_region;
mod gc_worker;
mod open_region;
mod upgrade_region;
use crate::heartbeat::handler::close_region::CloseRegionsHandler;
use crate::heartbeat::handler::downgrade_region::DowngradeRegionsHandler;
use crate::heartbeat::handler::file_ref::GetFileRefsHandler;
use crate::heartbeat::handler::flush_region::FlushRegionsHandler;
use crate::heartbeat::handler::gc_worker::GcRegionsHandler;
use crate::heartbeat::handler::open_region::OpenRegionsHandler;
use crate::heartbeat::handler::upgrade_region::UpgradeRegionsHandler;
use crate::heartbeat::task_tracker::TaskTracker;
@@ -39,10 +44,10 @@ use crate::region_server::RegionServer;
#[derive(Clone)]
pub struct RegionHeartbeatResponseHandler {
region_server: RegionServer,
catchup_tasks: TaskTracker<()>,
downgrade_tasks: TaskTracker<()>,
flush_tasks: TaskTracker<()>,
open_region_parallelism: usize,
gc_tasks: TaskTracker<GcReport>,
}
#[async_trait::async_trait]
@@ -58,9 +63,9 @@ pub trait InstructionHandler: Send + Sync {
#[derive(Clone)]
pub struct HandlerContext {
region_server: RegionServer,
catchup_tasks: TaskTracker<()>,
downgrade_tasks: TaskTracker<()>,
flush_tasks: TaskTracker<()>,
gc_tasks: TaskTracker<GcReport>,
}
impl HandlerContext {
@@ -68,9 +73,9 @@ impl HandlerContext {
pub fn new_for_test(region_server: RegionServer) -> Self {
Self {
region_server,
catchup_tasks: TaskTracker::new(),
downgrade_tasks: TaskTracker::new(),
flush_tasks: TaskTracker::new(),
gc_tasks: TaskTracker::new(),
}
}
}
@@ -80,11 +85,11 @@ impl RegionHeartbeatResponseHandler {
pub fn new(region_server: RegionServer) -> Self {
Self {
region_server,
catchup_tasks: TaskTracker::new(),
downgrade_tasks: TaskTracker::new(),
flush_tasks: TaskTracker::new(),
// Default to half of the number of CPUs.
open_region_parallelism: (num_cpus::get() / 2).max(1),
gc_tasks: TaskTracker::new(),
}
}
@@ -105,7 +110,14 @@ impl RegionHeartbeatResponseHandler {
)),
Instruction::FlushRegions(_) => Ok(Box::new(FlushRegionsHandler.into())),
Instruction::DowngradeRegions(_) => Ok(Box::new(DowngradeRegionsHandler.into())),
Instruction::UpgradeRegion(_) => Ok(Box::new(UpgradeRegionsHandler.into())),
Instruction::UpgradeRegions(_) => Ok(Box::new(
UpgradeRegionsHandler {
upgrade_region_parallelism: self.open_region_parallelism,
}
.into(),
)),
Instruction::GetFileRefs(_) => Ok(Box::new(GetFileRefsHandler.into())),
Instruction::GcRegions(_) => Ok(Box::new(GcRegionsHandler.into())),
Instruction::InvalidateCaches(_) => InvalidHeartbeatResponseSnafu.fail(),
}
}
@@ -118,6 +130,8 @@ pub enum InstructionHandlers {
FlushRegions(FlushRegionsHandler),
DowngradeRegions(DowngradeRegionsHandler),
UpgradeRegions(UpgradeRegionsHandler),
GetFileRefs(GetFileRefsHandler),
GcRegions(GcRegionsHandler),
}
macro_rules! impl_from_handler {
@@ -137,7 +151,9 @@ impl_from_handler!(
OpenRegionsHandler => OpenRegions,
FlushRegionsHandler => FlushRegions,
DowngradeRegionsHandler => DowngradeRegions,
UpgradeRegionsHandler => UpgradeRegions
UpgradeRegionsHandler => UpgradeRegions,
GetFileRefsHandler => GetFileRefs,
GcRegionsHandler => GcRegions
);
macro_rules! dispatch_instr {
@@ -179,7 +195,9 @@ dispatch_instr!(
OpenRegions => OpenRegions,
FlushRegions => FlushRegions,
DowngradeRegions => DowngradeRegions,
UpgradeRegion => UpgradeRegions,
UpgradeRegions => UpgradeRegions,
GetFileRefs => GetFileRefs,
GcRegions => GcRegions,
);
#[async_trait]
@@ -199,18 +217,18 @@ impl HeartbeatResponseHandler for RegionHeartbeatResponseHandler {
let mailbox = ctx.mailbox.clone();
let region_server = self.region_server.clone();
let catchup_tasks = self.catchup_tasks.clone();
let downgrade_tasks = self.downgrade_tasks.clone();
let flush_tasks = self.flush_tasks.clone();
let gc_tasks = self.gc_tasks.clone();
let handler = self.build_handler(&instruction)?;
let _handle = common_runtime::spawn_global(async move {
let reply = handler
.handle(
&HandlerContext {
region_server,
catchup_tasks,
downgrade_tasks,
flush_tasks,
gc_tasks,
},
instruction,
)
@@ -315,10 +333,10 @@ mod tests {
);
// Upgrade region
let instruction = Instruction::UpgradeRegion(UpgradeRegion {
let instruction = Instruction::UpgradeRegions(vec![UpgradeRegion {
region_id,
..Default::default()
});
}]);
assert!(
heartbeat_handler.is_acceptable(&heartbeat_env.create_handler_ctx((meta, instruction)))
);
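The diff shows only the `impl_from_handler!` invocation, not the macro body. The sketch below assumes the macro does nothing more than generate one `From` impl per `Handler => Variant` pair. `InstructionHandlers` here is a cut-down stand-in with two variants, not the real datanode enum. The optional trailing separator `$(,)?` lets the list end with or without a comma.

```rust
// Stand-in handler types (hypothetical; the real ones live in the datanode crate).
#[derive(Debug, PartialEq)]
pub struct FlushRegionsHandler;
#[derive(Debug, PartialEq)]
pub struct GcRegionsHandler;

// Cut-down version of the dispatch enum, one variant per handler.
#[derive(Debug, PartialEq)]
pub enum InstructionHandlers {
    FlushRegions(FlushRegionsHandler),
    GcRegions(GcRegionsHandler),
}

/// Generates `impl From<Handler> for InstructionHandlers` for each
/// `Handler => Variant` pair; a trailing comma is accepted but optional.
macro_rules! impl_from_handler {
    ($($handler:ident => $variant:ident),* $(,)?) => {
        $(
            impl From<$handler> for InstructionHandlers {
                fn from(handler: $handler) -> Self {
                    InstructionHandlers::$variant(handler)
                }
            }
        )*
    };
}

impl_from_handler!(
    FlushRegionsHandler => FlushRegions,
    GcRegionsHandler => GcRegions,
);
```

With this shape, registering a new handler means adding one enum variant and one `Handler => Variant` line.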

@@ -0,0 +1,62 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use common_error::ext::ErrorExt;
use common_meta::instruction::{GetFileRefs, GetFileRefsReply, InstructionReply};
use store_api::storage::FileRefsManifest;
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
pub struct GetFileRefsHandler;
#[async_trait::async_trait]
impl InstructionHandler for GetFileRefsHandler {
type Instruction = GetFileRefs;
async fn handle(
&self,
ctx: &HandlerContext,
get_file_refs: Self::Instruction,
) -> Option<InstructionReply> {
let region_server = &ctx.region_server;
// Get the MitoEngine
let Some(mito_engine) = region_server.mito_engine() else {
return Some(InstructionReply::GetFileRefs(GetFileRefsReply {
file_refs_manifest: FileRefsManifest::default(),
success: false,
error: Some("MitoEngine not found".to_string()),
}));
};
match mito_engine
.get_snapshot_of_unmanifested_refs(get_file_refs.region_ids)
.await
{
Ok(all_file_refs) => {
// Return the file references
Some(InstructionReply::GetFileRefs(GetFileRefsReply {
file_refs_manifest: all_file_refs,
success: true,
error: None,
}))
}
Err(e) => Some(InstructionReply::GetFileRefs(GetFileRefsReply {
file_refs_manifest: FileRefsManifest::default(),
success: false,
error: Some(format!("Failed to get file refs: {}", e.output_msg())),
})),
}
}
}

@@ -0,0 +1,156 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use common_meta::instruction::{GcRegions, GcRegionsReply, InstructionReply};
use common_telemetry::{debug, warn};
use mito2::gc::LocalGcWorker;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{FileRefsManifest, RegionId};
use crate::error::{GcMitoEngineSnafu, InvalidGcArgsSnafu, Result, UnexpectedSnafu};
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
pub struct GcRegionsHandler;
#[async_trait::async_trait]
impl InstructionHandler for GcRegionsHandler {
type Instruction = GcRegions;
async fn handle(
&self,
ctx: &HandlerContext,
gc_regions: Self::Instruction,
) -> Option<InstructionReply> {
let region_ids = gc_regions.regions.clone();
debug!("Received gc regions instruction: {:?}", region_ids);
let is_same_table = region_ids.windows(2).all(|w| {
let t1 = w[0].table_id();
let t2 = w[1].table_id();
t1 == t2
});
if !is_same_table {
return Some(InstructionReply::GcRegions(GcRegionsReply {
result: Err(format!(
"Regions to GC should belong to the same table, found: {:?}",
region_ids
)),
}));
}
let (region_id, gc_worker) = match self
.create_gc_worker(
ctx,
region_ids,
&gc_regions.file_refs_manifest,
gc_regions.full_file_listing,
)
.await
{
Ok(worker) => worker,
Err(e) => {
return Some(InstructionReply::GcRegions(GcRegionsReply {
result: Err(format!("Failed to create GC worker: {}", e)),
}));
}
};
let register_result = ctx
.gc_tasks
.try_register(
region_id,
Box::pin(async move {
debug!("Starting gc worker for region {}", region_id);
let report = gc_worker
.run()
.await
.context(GcMitoEngineSnafu { region_id })?;
debug!("Gc worker for region {} finished", region_id);
Ok(report)
}),
)
.await;
if register_result.is_busy() {
warn!("Another gc task is running for the region: {region_id}");
return Some(InstructionReply::GcRegions(GcRegionsReply {
result: Err(format!(
"Another gc task is running for the region: {region_id}"
)),
}));
}
let mut watcher = register_result.into_watcher();
let result = ctx.gc_tasks.wait_until_finish(&mut watcher).await;
match result {
Ok(report) => Some(InstructionReply::GcRegions(GcRegionsReply {
result: Ok(report),
})),
Err(err) => Some(InstructionReply::GcRegions(GcRegionsReply {
result: Err(format!("{err:?}")),
})),
}
}
}
impl GcRegionsHandler {
async fn create_gc_worker(
&self,
ctx: &HandlerContext,
mut region_ids: Vec<RegionId>,
file_ref_manifest: &FileRefsManifest,
full_file_listing: bool,
) -> Result<(RegionId, LocalGcWorker)> {
// Always use the region with the smallest region number as the target region id.
region_ids.sort_by_key(|r| r.region_number());
let mito_engine = ctx
.region_server
.mito_engine()
.with_context(|| UnexpectedSnafu {
violated: "MitoEngine not found".to_string(),
})?;
let region_id = *region_ids.first().with_context(|| UnexpectedSnafu {
violated: "No region ids provided".to_string(),
})?;
let mito_config = mito_engine.mito_config();
// Find the access layer from one of the regions that exists on this datanode
let access_layer = region_ids
.iter()
.find_map(|rid| mito_engine.find_region(*rid))
.with_context(|| InvalidGcArgsSnafu {
msg: format!(
"None of the regions is on current datanode:{:?}",
region_ids
),
})?
.access_layer();
let cache_manager = mito_engine.cache_manager();
let gc_worker = LocalGcWorker::try_new(
access_layer.clone(),
Some(cache_manager),
region_ids.into_iter().collect(),
Default::default(),
mito_config.clone().into(),
file_ref_manifest.clone(),
&mito_engine.gc_limiter(),
full_file_listing,
)
.await
.context(GcMitoEngineSnafu { region_id })?;
Ok((region_id, gc_worker))
}
}
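`GcRegionsHandler` rejects batches that span more than one table. It then uses the region with the smallest region number as the target of the GC worker. The sketch below shows both checks. It assumes a simplified `RegionId` that packs the table id into the high 32 bits and the region number into the low 32 bits. That layout is an assumption modeled on `RegionId::new(table_id, region_number)` as used in the tests, not a copy of the real type.

```rust
/// Simplified stand-in for `store_api::storage::RegionId` (assumed layout:
/// table id in the high 32 bits, region number in the low 32 bits).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RegionId(u64);

impl RegionId {
    pub fn new(table_id: u32, region_number: u32) -> Self {
        RegionId(((table_id as u64) << 32) | region_number as u64)
    }

    pub fn table_id(&self) -> u32 {
        (self.0 >> 32) as u32
    }

    pub fn region_number(&self) -> u32 {
        self.0 as u32
    }
}

/// True when every adjacent pair shares a table id, i.e. all regions belong
/// to one table. Empty and single-element slices trivially pass.
pub fn all_same_table(region_ids: &[RegionId]) -> bool {
    region_ids
        .windows(2)
        .all(|w| w[0].table_id() == w[1].table_id())
}

/// Returns the region with the smallest region number, or `None` if empty.
pub fn pick_target_region(region_ids: &[RegionId]) -> Option<RegionId> {
    region_ids.iter().copied().min_by_key(|r| r.region_number())
}
```

`min_by_key` returns the first minimum it sees. That matches taking `first()` after the stable `sort_by_key` in `create_gc_worker`.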

@@ -12,125 +12,209 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_meta::instruction::{InstructionReply, UpgradeRegion, UpgradeRegionReply};
use common_telemetry::{info, warn};
use store_api::region_request::{RegionCatchupRequest, RegionRequest, ReplayCheckpoint};
use common_error::ext::{BoxedError, ErrorExt};
use common_error::status_code::StatusCode;
use common_meta::instruction::{
InstructionReply, UpgradeRegion, UpgradeRegionReply, UpgradeRegionsReply,
};
use common_telemetry::{debug, info, warn};
use store_api::region_request::{RegionCatchupRequest, ReplayCheckpoint};
use store_api::storage::RegionId;
use crate::error::Result;
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
use crate::heartbeat::task_tracker::WaitResult;
#[derive(Debug, Clone, Copy, Default)]
pub struct UpgradeRegionsHandler;
pub struct UpgradeRegionsHandler {
pub upgrade_region_parallelism: usize,
}
#[cfg(test)]
impl UpgradeRegionsHandler {
fn new_test() -> UpgradeRegionsHandler {
UpgradeRegionsHandler {
upgrade_region_parallelism: 8,
}
}
}
impl UpgradeRegionsHandler {
fn convert_responses_to_replies(
responses: Result<Vec<(RegionId, std::result::Result<(), BoxedError>)>>,
catchup_regions: &[RegionId],
) -> Vec<UpgradeRegionReply> {
match responses {
Ok(responses) => responses
.into_iter()
.map(|(region_id, result)| match result {
Ok(()) => UpgradeRegionReply {
region_id,
ready: true,
exists: true,
error: None,
},
Err(err) => {
if err.status_code() == StatusCode::RegionNotFound {
UpgradeRegionReply {
region_id,
ready: false,
exists: false,
error: Some(format!("{err:?}")),
}
} else {
UpgradeRegionReply {
region_id,
ready: false,
exists: true,
error: Some(format!("{err:?}")),
}
}
}
})
.collect::<Vec<_>>(),
Err(err) => catchup_regions
.iter()
.map(|region_id| UpgradeRegionReply {
region_id: *region_id,
ready: false,
exists: true,
error: Some(format!("{err:?}")),
})
.collect::<Vec<_>>(),
}
}
}
impl UpgradeRegionsHandler {
// Handles upgrade regions instruction.
//
// Returns a batch of upgrade region replies; the order of the replies is not guaranteed.
async fn handle_upgrade_regions(
&self,
ctx: &HandlerContext,
upgrade_regions: Vec<UpgradeRegion>,
) -> Vec<UpgradeRegionReply> {
let num_upgrade_regions = upgrade_regions.len();
let mut replies = Vec::with_capacity(num_upgrade_regions);
let mut catchup_requests = Vec::with_capacity(num_upgrade_regions);
let mut catchup_regions = Vec::with_capacity(num_upgrade_regions);
let mut timeout = None;
for upgrade_region in upgrade_regions {
let Some(writable) = ctx.region_server.is_region_leader(upgrade_region.region_id)
else {
// Region is not found.
debug!("Region {} is not found", upgrade_region.region_id);
replies.push(UpgradeRegionReply {
region_id: upgrade_region.region_id,
ready: false,
exists: false,
error: None,
});
continue;
};
// Ignores the catchup requests for writable regions.
if writable {
warn!(
"Region {} is writable, ignores the catchup request",
upgrade_region.region_id
);
replies.push(UpgradeRegionReply {
region_id: upgrade_region.region_id,
ready: true,
exists: true,
error: None,
});
} else {
let UpgradeRegion {
last_entry_id,
metadata_last_entry_id,
location_id,
replay_entry_id,
metadata_replay_entry_id,
replay_timeout,
..
} = upgrade_region;
match timeout {
Some(timeout) => {
debug_assert_eq!(timeout, replay_timeout);
}
None => {
// TODO(weny): require the replay_timeout.
timeout = Some(replay_timeout);
}
}
let checkpoint = match (replay_entry_id, metadata_replay_entry_id) {
(Some(entry_id), metadata_entry_id) => Some(ReplayCheckpoint {
entry_id,
metadata_entry_id,
}),
_ => None,
};
catchup_regions.push(upgrade_region.region_id);
catchup_requests.push((
upgrade_region.region_id,
RegionCatchupRequest {
set_writable: true,
entry_id: last_entry_id,
metadata_entry_id: metadata_last_entry_id,
location_id,
checkpoint,
},
));
}
}
let Some(timeout) = timeout else {
// No replay timeout, so we don't need to catchup the regions.
info!("All regions are writable, no need to catchup");
debug_assert_eq!(replies.len(), num_upgrade_regions);
return replies;
};
match tokio::time::timeout(
timeout,
ctx.region_server
.handle_batch_catchup_requests(self.upgrade_region_parallelism, catchup_requests),
)
.await
{
Ok(responses) => {
replies.extend(
Self::convert_responses_to_replies(responses, &catchup_regions).into_iter(),
);
}
Err(_) => {
replies.extend(catchup_regions.iter().map(|region_id| UpgradeRegionReply {
region_id: *region_id,
ready: false,
exists: true,
error: None,
}));
}
}
replies
}
}
#[async_trait::async_trait]
impl InstructionHandler for UpgradeRegionsHandler {
type Instruction = UpgradeRegion;
type Instruction = Vec<UpgradeRegion>;
async fn handle(
&self,
ctx: &HandlerContext,
UpgradeRegion {
region_id,
last_entry_id,
metadata_last_entry_id,
replay_timeout,
location_id,
replay_entry_id,
metadata_replay_entry_id,
}: UpgradeRegion,
upgrade_regions: Self::Instruction,
) -> Option<InstructionReply> {
let Some(writable) = ctx.region_server.is_region_leader(region_id) else {
return Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: false,
exists: false,
error: None,
}));
};
let replies = self.handle_upgrade_regions(ctx, upgrade_regions).await;
if writable {
return Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: true,
exists: true,
error: None,
}));
}
let region_server_moved = ctx.region_server.clone();
let checkpoint = match (replay_entry_id, metadata_replay_entry_id) {
(Some(entry_id), metadata_entry_id) => Some(ReplayCheckpoint {
entry_id,
metadata_entry_id,
}),
_ => None,
};
// The catchup task is almost zero cost if the inside region is writable.
// Therefore, it always registers a new catchup task.
let register_result = ctx
.catchup_tasks
.try_register(
region_id,
Box::pin(async move {
info!(
"Executing region: {region_id} catchup to: last entry id {last_entry_id:?}"
);
region_server_moved
.handle_request(
region_id,
RegionRequest::Catchup(RegionCatchupRequest {
set_writable: true,
entry_id: last_entry_id,
metadata_entry_id: metadata_last_entry_id,
location_id,
checkpoint,
}),
)
.await?;
Ok(())
}),
)
.await;
if register_result.is_busy() {
warn!("Another catchup task is running for the region: {region_id}");
}
// Returns immediately
let Some(replay_timeout) = replay_timeout else {
return Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: false,
exists: true,
error: None,
}));
};
// We don't care that it returns a newly registered or running task.
let mut watcher = register_result.into_watcher();
let result = ctx.catchup_tasks.wait(&mut watcher, replay_timeout).await;
match result {
WaitResult::Timeout => Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: false,
exists: true,
error: None,
})),
WaitResult::Finish(Ok(_)) => {
Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: true,
exists: true,
error: None,
}))
}
WaitResult::Finish(Err(err)) => {
Some(InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: false,
exists: true,
error: Some(format!("{err:?}")),
}))
}
}
Some(InstructionReply::UpgradeRegions(UpgradeRegionsReply::new(
replies,
)))
}
}
@@ -142,7 +226,6 @@ mod tests {
use mito2::engine::MITO_ENGINE_NAME;
use store_api::region_engine::RegionRole;
use store_api::storage::RegionId;
use tokio::time::Instant;
use crate::error;
use crate::heartbeat::handler::upgrade_region::UpgradeRegionsHandler;
@@ -158,21 +241,30 @@ mod tests {
let handler_context = HandlerContext::new_for_test(mock_region_server);
let region_id = RegionId::new(1024, 1);
let waits = vec![None, Some(Duration::from_millis(100u64))];
for replay_timeout in waits {
let reply = UpgradeRegionsHandler
.handle(
&handler_context,
let region_id2 = RegionId::new(1024, 2);
let replay_timeout = Duration::from_millis(100u64);
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
vec![
UpgradeRegion {
region_id,
replay_timeout,
..Default::default()
},
)
.await;
UpgradeRegion {
region_id: region_id2,
replay_timeout,
..Default::default()
},
],
)
.await;
let reply = reply.unwrap().expect_upgrade_region_reply();
let replies = &reply.unwrap().expect_upgrade_regions_reply();
assert_eq!(replies[0].region_id, region_id);
assert_eq!(replies[1].region_id, region_id2);
for reply in replies {
assert!(!reply.exists);
assert!(reply.error.is_none());
}
@@ -182,6 +274,7 @@ mod tests {
async fn test_region_writable() {
let mock_region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
let region_id2 = RegionId::new(1024, 2);
let (mock_engine, _) =
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
@@ -191,25 +284,32 @@ mod tests {
unreachable!();
}));
});
mock_region_server.register_test_region(region_id, mock_engine);
mock_region_server.register_test_region(region_id, mock_engine.clone());
mock_region_server.register_test_region(region_id2, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let waits = vec![None, Some(Duration::from_millis(100u64))];
for replay_timeout in waits {
let reply = UpgradeRegionsHandler
.handle(
&handler_context,
let replay_timeout = Duration::from_millis(100u64);
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
vec![
UpgradeRegion {
region_id,
replay_timeout,
..Default::default()
},
)
.await;
UpgradeRegion {
region_id: region_id2,
replay_timeout,
..Default::default()
},
],
)
.await;
let reply = reply.unwrap().expect_upgrade_region_reply();
let replies = &reply.unwrap().expect_upgrade_regions_reply();
assert_eq!(replies[0].region_id, region_id);
assert_eq!(replies[1].region_id, region_id2);
for reply in replies {
assert!(reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none());
@@ -232,30 +332,27 @@ mod tests {
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let replay_timeout = Duration::from_millis(100u64);
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
vec![UpgradeRegion {
region_id,
replay_timeout,
..Default::default()
}],
)
.await;
let waits = vec![None, Some(Duration::from_millis(100u64))];
for replay_timeout in waits {
let reply = UpgradeRegionsHandler
.handle(
&handler_context,
UpgradeRegion {
region_id,
replay_timeout,
..Default::default()
},
)
.await;
let reply = reply.unwrap().expect_upgrade_region_reply();
assert!(!reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none());
}
let reply = &reply.unwrap().expect_upgrade_regions_reply()[0];
assert!(!reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none(), "error: {:?}", reply.error);
}
#[tokio::test]
async fn test_region_not_ready_with_retry() {
common_telemetry::init_default_ut_logging();
let mock_region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
@@ -264,58 +361,48 @@ mod tests {
// Region is not ready.
region_engine.mock_role = Some(Some(RegionRole::Follower));
region_engine.handle_request_mock_fn = Some(Box::new(|_, _| Ok(0)));
// Note: Don't change.
region_engine.handle_request_delay = Some(Duration::from_millis(300));
});
mock_region_server.register_test_region(region_id, mock_engine);
let waits = vec![
Some(Duration::from_millis(100u64)),
Some(Duration::from_millis(100u64)),
];
let waits = vec![Duration::from_millis(100u64), Duration::from_millis(100u64)];
let handler_context = HandlerContext::new_for_test(mock_region_server);
for replay_timeout in waits {
let reply = UpgradeRegionsHandler
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
UpgradeRegion {
vec![UpgradeRegion {
region_id,
replay_timeout,
..Default::default()
},
}],
)
.await;
let reply = reply.unwrap().expect_upgrade_region_reply();
let reply = &reply.unwrap().expect_upgrade_regions_reply()[0];
assert!(!reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none());
assert!(reply.error.is_none(), "error: {:?}", reply.error);
}
let timer = Instant::now();
let reply = UpgradeRegionsHandler
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
UpgradeRegion {
vec![UpgradeRegion {
region_id,
replay_timeout: Some(Duration::from_millis(500)),
replay_timeout: Duration::from_millis(500),
..Default::default()
},
}],
)
.await;
// Must be less than 300 ms.
assert!(timer.elapsed().as_millis() < 300);
let reply = reply.unwrap().expect_upgrade_region_reply();
let reply = &reply.unwrap().expect_upgrade_regions_reply()[0];
assert!(reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none());
assert!(reply.error.is_none(), "error: {:?}", reply.error);
}
#[tokio::test]
async fn test_region_error() {
common_telemetry::init_default_ut_logging();
let mock_region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
@@ -335,38 +422,37 @@ mod tests {
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let reply = UpgradeRegionsHandler
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
UpgradeRegion {
vec![UpgradeRegion {
region_id,
..Default::default()
},
}],
)
.await;
// It didn't wait for the catchup to finish, so the reply carries no error.
let reply = reply.unwrap().expect_upgrade_region_reply();
let reply = &reply.unwrap().expect_upgrade_regions_reply()[0];
assert!(!reply.ready);
assert!(reply.exists);
assert!(reply.error.is_none());
let reply = UpgradeRegionsHandler
let reply = UpgradeRegionsHandler::new_test()
.handle(
&handler_context,
UpgradeRegion {
vec![UpgradeRegion {
region_id,
replay_timeout: Some(Duration::from_millis(200)),
replay_timeout: Duration::from_millis(200),
..Default::default()
},
}],
)
.await;
let reply = reply.unwrap().expect_upgrade_region_reply();
let reply = &reply.unwrap().expect_upgrade_regions_reply()[0];
assert!(!reply.ready);
assert!(reply.exists);
assert!(reply.error.is_some());
assert!(reply.error.unwrap().contains("mock_error"));
assert!(reply.error.as_ref().unwrap().contains("mock_error"));
}
}
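The batched `UpgradeRegionsHandler` answers some regions immediately. Missing regions get `exists: false`, and regions that are already writable get `ready: true`. Only the remaining regions receive catchup requests. If the batch exceeds `replay_timeout`, every pending region is reported as existing but not ready.

The sketch below shows that classification. It assumes plain `u64` region ids and a map from region id to "is writable" standing in for the `Option<bool>` returned by `is_region_leader`. `Reply` is a simplified, hypothetical version of `UpgradeRegionReply` without the `error` field.

```rust
use std::collections::HashMap;

/// Simplified reply (the real `UpgradeRegionReply` also carries `error`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Reply {
    pub region_id: u64,
    pub ready: bool,
    pub exists: bool,
}

/// Answers missing and writable regions right away and returns the ids that
/// still need a catchup request. A missing key means "region not found".
pub fn classify(regions: &[u64], leaders: &HashMap<u64, bool>) -> (Vec<Reply>, Vec<u64>) {
    let mut replies = Vec::new();
    let mut catchup = Vec::new();
    for &region_id in regions {
        match leaders.get(&region_id).copied() {
            // Not found on this datanode: nothing to catch up.
            None => replies.push(Reply { region_id, ready: false, exists: false }),
            // Already writable: report ready without a catchup request.
            Some(true) => replies.push(Reply { region_id, ready: true, exists: true }),
            // Follower: must catch up before becoming writable.
            Some(false) => catchup.push(region_id),
        }
    }
    (replies, catchup)
}

/// On a batch timeout, every pending region exists but is not ready yet.
pub fn timed_out_replies(catchup: &[u64]) -> Vec<Reply> {
    catchup
        .iter()
        .map(|&region_id| Reply { region_id, ready: false, exists: true })
        .collect()
}
```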

@@ -75,4 +75,20 @@ lazy_static! {
&[RESULT_TYPE]
)
.unwrap();
/// Total count of failed region server requests.
pub static ref REGION_SERVER_REQUEST_FAILURE_COUNT: IntCounterVec = register_int_counter_vec!(
"greptime_datanode_region_request_fail_count",
"failed region server requests count",
&[REGION_REQUEST_TYPE]
)
.unwrap();
/// Total count of failed insert requests to region server.
pub static ref REGION_SERVER_INSERT_FAIL_COUNT: IntCounterVec = register_int_counter_vec!(
"greptime_datanode_region_failed_insert_count",
"failed region server insert requests count",
&[REGION_REQUEST_TYPE]
)
.unwrap();
}
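Both new counters are bumped only on the error path. `RegionServer::handle` wraps its result with `.inspect_err(|_| failed_requests_cnt.inc())`, so the `Result` itself passes through unchanged.

The sketch below is std-only and shows that pattern. An `AtomicU64` stands in for the labeled Prometheus `IntCounterVec`, and `FailureCounter` and `handle_request` are hypothetical names. `Result::inspect_err` requires Rust 1.76 or later.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for one label of a Prometheus counter vector.
pub struct FailureCounter(AtomicU64);

impl FailureCounter {
    pub const fn new() -> Self {
        FailureCounter(AtomicU64::new(0))
    }

    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Runs a request and increments the counter only when it fails; the
/// result (success or error) is returned to the caller unchanged.
pub fn handle_request<T, E>(
    counter: &FailureCounter,
    request: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    request().inspect_err(|_| counter.inc())
}
```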

@@ -66,7 +66,8 @@ use store_api::region_engine::{
SettableRegionRoleState,
};
use store_api::region_request::{
AffectedRows, BatchRegionDdlRequest, RegionCloseRequest, RegionOpenRequest, RegionRequest,
AffectedRows, BatchRegionDdlRequest, RegionCatchupRequest, RegionCloseRequest,
RegionOpenRequest, RegionRequest,
};
use store_api::storage::RegionId;
use tokio::sync::{Semaphore, SemaphorePermit};
@@ -158,6 +159,27 @@ impl RegionServer {
}
}
/// Gets the MitoEngine if it's registered.
pub fn mito_engine(&self) -> Option<MitoEngine> {
if let Some(mito) = self.inner.mito_engine.read().unwrap().clone() {
Some(mito)
} else {
self.inner
.engines
.read()
.unwrap()
.get(MITO_ENGINE_NAME)
.cloned()
.and_then(|e| {
let mito = e.as_any().downcast_ref::<MitoEngine>().cloned();
if mito.is_none() {
warn!("Mito engine not found in region server engines");
}
mito
})
}
}
#[tracing::instrument(skip_all)]
pub async fn handle_batch_open_requests(
&self,
@@ -170,6 +192,17 @@ impl RegionServer {
.await
}
#[tracing::instrument(skip_all)]
pub async fn handle_batch_catchup_requests(
&self,
parallelism: usize,
requests: Vec<(RegionId, RegionCatchupRequest)>,
) -> Result<Vec<(RegionId, std::result::Result<(), BoxedError>)>> {
self.inner
.handle_batch_catchup_requests(parallelism, requests)
.await
}
#[tracing::instrument(skip_all, fields(request_type = request.request_type()))]
pub async fn handle_request(
&self,
@@ -378,6 +411,14 @@ impl RegionServer {
#[cfg(test)]
/// Registers a region for testing purposes.
pub(crate) fn register_test_region(&self, region_id: RegionId, engine: RegionEngineRef) {
{
let mut engines = self.inner.engines.write().unwrap();
if !engines.contains_key(engine.name()) {
debug!("Registering test engine: {}", engine.name());
engines.insert(engine.name().to_string(), engine.clone());
}
}
self.inner
.region_map
.insert(region_id, RegionEngineWithStatus::Ready(engine));
@@ -559,6 +600,8 @@ impl RegionServer {
#[async_trait]
impl RegionServerHandler for RegionServer {
async fn handle(&self, request: region_request::Body) -> ServerResult<RegionResponseV1> {
let failed_requests_cnt = crate::metrics::REGION_SERVER_REQUEST_FAILURE_COUNT
.with_label_values(&[request.as_ref()]);
let response = match &request {
region_request::Body::Creates(_)
| region_request::Body::Drops(_)
@@ -576,6 +619,9 @@ impl RegionServerHandler for RegionServer {
_ => self.handle_requests_in_serial(request).await,
}
.map_err(BoxedError::new)
.inspect_err(|_| {
failed_requests_cnt.inc();
})
.context(ExecuteGrpcRequestSnafu)?;
Ok(RegionResponseV1 {
@@ -676,14 +722,14 @@ struct RegionServerInner {
runtime: Runtime,
event_listener: RegionServerEventListenerRef,
table_provider_factory: TableProviderFactoryRef,
// The number of queries allowed to be executed at the same time.
// Act as last line of defense on datanode to prevent query overloading.
/// The number of queries allowed to be executed at the same time.
/// Act as last line of defense on datanode to prevent query overloading.
parallelism: Option<RegionServerParallelism>,
// The topic stats reporter.
/// The topic stats reporter.
topic_stats_reporter: RwLock<Option<Box<dyn TopicStatsReporter>>>,
// HACK(zhongzc): Direct MitoEngine handle for diagnostics. This couples the
// server with a concrete engine; acceptable for now to fetch Mito-specific
// info (e.g., list SSTs). Consider a diagnostics trait later.
/// HACK(zhongzc): Direct MitoEngine handle for diagnostics. This couples the
/// server with a concrete engine; acceptable for now to fetch Mito-specific
/// info (e.g., list SSTs). Consider a diagnostics trait later.
mito_engine: RwLock<Option<MitoEngine>>,
}
@@ -951,6 +997,116 @@ impl RegionServerInner {
.collect::<Vec<_>>())
}
pub async fn handle_batch_catchup_requests_inner(
&self,
engine: RegionEngineRef,
parallelism: usize,
requests: Vec<(RegionId, RegionCatchupRequest)>,
) -> Result<Vec<(RegionId, std::result::Result<(), BoxedError>)>> {
for (region_id, _) in &requests {
self.set_region_status_not_ready(*region_id, &engine, &RegionChange::Catchup);
}
let region_ids = requests
.iter()
.map(|(region_id, _)| *region_id)
.collect::<Vec<_>>();
let mut responses = Vec::with_capacity(requests.len());
match engine
.handle_batch_catchup_requests(parallelism, requests)
.await
{
Ok(results) => {
for (region_id, result) in results {
match result {
Ok(_) => {
if let Err(e) = self
.set_region_status_ready(
region_id,
engine.clone(),
RegionChange::Catchup,
)
.await
{
error!(e; "Failed to set region to ready: {}", region_id);
responses.push((region_id, Err(BoxedError::new(e))));
} else {
responses.push((region_id, Ok(())));
}
}
Err(e) => {
self.unset_region_status(region_id, &engine, RegionChange::Catchup);
error!(e; "Failed to catchup region: {}", region_id);
responses.push((region_id, Err(e)));
}
}
}
}
Err(e) => {
for region_id in region_ids {
self.unset_region_status(region_id, &engine, RegionChange::Catchup);
}
error!(e; "Failed to catchup batch regions");
return error::UnexpectedSnafu {
violated: format!("Failed to catchup batch regions: {:?}", e),
}
.fail();
}
}
Ok(responses)
}
pub async fn handle_batch_catchup_requests(
&self,
parallelism: usize,
requests: Vec<(RegionId, RegionCatchupRequest)>,
) -> Result<Vec<(RegionId, std::result::Result<(), BoxedError>)>> {
let mut engine_grouped_requests: HashMap<String, Vec<_>> = HashMap::new();
let mut responses = Vec::with_capacity(requests.len());
for (region_id, request) in requests {
if let Ok(engine) = self.get_engine(region_id, &RegionChange::Catchup) {
match engine {
CurrentEngine::Engine(engine) => {
engine_grouped_requests
.entry(engine.name().to_string())
.or_default()
.push((region_id, request));
}
CurrentEngine::EarlyReturn(_) => {
return error::UnexpectedSnafu {
violated: format!("Unexpected engine type for region {}", region_id),
}
.fail();
}
}
} else {
responses.push((
region_id,
Err(BoxedError::new(
error::RegionNotFoundSnafu { region_id }.build(),
)),
));
}
}
for (engine, requests) in engine_grouped_requests {
let engine = self
.engines
.read()
.unwrap()
.get(&engine)
.with_context(|| RegionEngineNotFoundSnafu { name: &engine })?
.clone();
responses.extend(
self.handle_batch_catchup_requests_inner(engine, parallelism, requests)
.await?,
);
}
Ok(responses)
}
// Handle requests in batch.
//
// limitation: all create requests must be in the same engine.
@@ -1079,6 +1235,11 @@ impl RegionServerInner {
})
}
Err(err) => {
if matches!(region_change, RegionChange::Ingest) {
crate::metrics::REGION_SERVER_INSERT_FAIL_COUNT
.with_label_values(&[request_type])
.inc();
}
// Removes the region status if the operation fails.
self.unset_region_status(region_id, &engine, region_change);
Err(err)
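`RegionServerInner::handle_batch_catchup_requests` splits a batch by owning engine, so each engine receives one batched catchup call. Regions that cannot be resolved are answered individually with `RegionNotFound` rather than failing the whole batch.

The sketch below shows the grouping step only. It assumes a plain `HashMap` from region id to engine name in place of `get_engine`. The real code also aborts on `CurrentEngine::EarlyReturn`, which this sketch omits.

```rust
use std::collections::HashMap;

/// Groups `(region_id, request)` pairs by engine name. Region ids with no
/// known engine are returned separately so the caller can answer them with
/// a "region not found" error.
pub fn group_by_engine<R>(
    requests: Vec<(u64, R)>,
    engine_of: &HashMap<u64, String>,
) -> (HashMap<String, Vec<(u64, R)>>, Vec<u64>) {
    let mut grouped: HashMap<String, Vec<(u64, R)>> = HashMap::new();
    let mut not_found = Vec::new();
    for (region_id, request) in requests {
        match engine_of.get(&region_id) {
            // Known engine: append to that engine's batch.
            Some(engine) => grouped
                .entry(engine.clone())
                .or_default()
                .push((region_id, request)),
            // Unknown region: reported individually, batch continues.
            None => not_found.push(region_id),
        }
    }
    (grouped, not_found)
}
```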

@@ -47,10 +47,7 @@ pub(crate) async fn new_object_store_without_cache(
Ok(object_store)
}
pub(crate) async fn new_object_store(
store: ObjectStoreConfig,
data_home: &str,
) -> Result<ObjectStore> {
pub async fn new_object_store(store: ObjectStoreConfig, data_home: &str) -> Result<ObjectStore> {
let object_store = new_raw_object_store(&store, data_home)
.await
.context(error::ObjectStoreSnafu)?;
@@ -59,7 +56,7 @@ pub(crate) async fn new_object_store(
let object_store = {
// It's safe to unwrap here because we already checked above.
let cache_config = store.cache_config().unwrap();
if let Some(cache_layer) = build_cache_layer(cache_config).await? {
if let Some(cache_layer) = build_cache_layer(cache_config, data_home).await? {
// Adds cache layer
object_store.layer(cache_layer)
} else {
@@ -79,17 +76,22 @@ pub(crate) async fn new_object_store(
async fn build_cache_layer(
cache_config: &ObjectStorageCacheConfig,
data_home: &str,
) -> Result<Option<LruCacheLayer<impl Access>>> {
// No need to build cache layer if read cache is disabled.
if !cache_config.enable_read_cache {
return Ok(None);
}
let atomic_temp_dir = join_dir(&cache_config.cache_path, ATOMIC_WRITE_DIR);
let cache_base_dir = if cache_config.cache_path.is_empty() {
data_home
} else {
&cache_config.cache_path
};
let atomic_temp_dir = join_dir(cache_base_dir, ATOMIC_WRITE_DIR);
clean_temp_dir(&atomic_temp_dir).context(error::ObjectStoreSnafu)?;
let cache_store = Fs::default()
.root(&cache_config.cache_path)
.root(cache_base_dir)
.atomic_write_dir(&atomic_temp_dir)
.build()
.context(error::BuildCacheStoreSnafu)?;
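With this change, the read-cache root falls back to `data_home` when `cache_path` is empty, and the atomic-write temp directory is derived from that same base, so the `Fs` root and its atomic-write dir always agree. A minimal sketch of the fallback, using `std::path` instead of the crate's string-based `join_dir`; the `ATOMIC_WRITE_DIR` value here is a hypothetical placeholder, not the real constant.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical placeholder for the atomic-write subdirectory name.
const ATOMIC_WRITE_DIR: &str = "atomic_tmp";

/// Picks the cache root: the configured `cache_path`, or `data_home` when it is empty.
fn cache_base_dir<'a>(cache_path: &'a str, data_home: &'a str) -> &'a str {
    if cache_path.is_empty() {
        data_home
    } else {
        cache_path
    }
}

/// Places the atomic-write directory under whichever cache root was chosen.
fn atomic_temp_dir(cache_path: &str, data_home: &str) -> PathBuf {
    Path::new(cache_base_dir(cache_path, data_home)).join(ATOMIC_WRITE_DIR)
}
```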

@@ -277,6 +277,10 @@ impl ConcreteDataType {
matches!(self, ConcreteDataType::Null(NullType))
}
pub(crate) fn is_struct(&self) -> bool {
matches!(self, ConcreteDataType::Struct(_))
}
/// Try to cast the type as a [`ListType`].
pub fn as_list(&self) -> Option<&ListType> {
match self {

@@ -266,6 +266,14 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to parse or serialize arrow metadata"))]
ArrowMetadata {
#[snafu(source)]
error: arrow::error::ArrowError,
#[snafu(implicit)]
location: Location,
},
}
impl ErrorExt for Error {
@@ -307,7 +315,8 @@ impl ErrorExt for Error {
| ConvertArrowArrayToScalars { .. }
| ConvertScalarToArrowArray { .. }
| ParseExtendedType { .. }
| InconsistentStructFieldsAndItems { .. } => StatusCode::Internal,
| InconsistentStructFieldsAndItems { .. }
| ArrowMetadata { .. } => StatusCode::Internal,
}
}

@@ -0,0 +1,15 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
pub mod json;

@@ -0,0 +1,104 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use arrow_schema::extension::ExtensionType;
use arrow_schema::{ArrowError, DataType};
use serde::{Deserialize, Serialize};
use crate::json::JsonStructureSettings;
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct JsonMetadata {
/// Indicates how JSON is stored in the underlying data type.
///
/// This field can be `None` when the data is converted into a fully structured in-memory form.
pub json_structure_settings: Option<JsonStructureSettings>,
}
#[derive(Debug, Clone)]
pub struct JsonExtensionType(Arc<JsonMetadata>);
impl JsonExtensionType {
pub fn new(metadata: Arc<JsonMetadata>) -> Self {
JsonExtensionType(metadata)
}
}
impl ExtensionType for JsonExtensionType {
const NAME: &'static str = "greptime.json";
type Metadata = Arc<JsonMetadata>;
fn metadata(&self) -> &Self::Metadata {
&self.0
}
fn serialize_metadata(&self) -> Option<String> {
serde_json::to_string(self.metadata()).ok()
}
fn deserialize_metadata(metadata: Option<&str>) -> Result<Self::Metadata, ArrowError> {
if let Some(metadata) = metadata {
let metadata = serde_json::from_str(metadata).map_err(|e| {
ArrowError::ParseError(format!("Failed to deserialize JSON metadata: {}", e))
})?;
Ok(Arc::new(metadata))
} else {
Ok(Arc::new(JsonMetadata::default()))
}
}
fn supports_data_type(&self, data_type: &DataType) -> Result<(), ArrowError> {
match data_type {
// object
DataType::Struct(_)
// array
| DataType::List(_)
| DataType::ListView(_)
| DataType::LargeList(_)
| DataType::LargeListView(_)
// string
| DataType::Utf8
| DataType::Utf8View
| DataType::LargeUtf8
// number
| DataType::Int8
| DataType::Int16
| DataType::Int32
| DataType::Int64
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64
| DataType::Float32
| DataType::Float64
// boolean
| DataType::Boolean
// null
| DataType::Null
// legacy json type
| DataType::Binary => Ok(()),
dt => Err(ArrowError::SchemaError(format!(
"Unexpected data type {dt}"
))),
}
}
fn try_new(data_type: &DataType, metadata: Self::Metadata) -> Result<Self, ArrowError> {
let json = Self(metadata);
json.supports_data_type(data_type)?;
Ok(json)
}
}
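`JsonExtensionType` validates its storage type inside `try_new`, so an extension value can only exist on top of a supported Arrow type, and missing metadata deserializes to `JsonMetadata::default()` instead of failing. The std-only sketch below mirrors that pattern without depending on `arrow_schema`; the `DataType` enum is a hypothetical, heavily reduced subset, `supports_data_type` is an associated function rather than a method, and `metadata_or_default` is an illustrative helper, not part of the real trait.

```rust
/// Simplified, hypothetical subset of Arrow's `DataType`, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Struct,
    List,
    Utf8,
    Int64,
    Float64,
    Boolean,
    Null,
    Binary,
    Date32,
}

/// Hypothetical metadata; `None` settings stand for the fully structured form.
#[derive(Debug, Clone, Default, PartialEq)]
struct JsonMetadata {
    structure_settings: Option<String>,
}

/// A std-only mirror of the "validate in `try_new`" pattern.
#[derive(Debug)]
struct JsonExtension(JsonMetadata);

impl JsonExtension {
    /// Accept only the storage types a JSON value can be encoded into.
    fn supports_data_type(data_type: &DataType) -> Result<(), String> {
        match data_type {
            DataType::Struct
            | DataType::List
            | DataType::Utf8
            | DataType::Int64
            | DataType::Float64
            | DataType::Boolean
            | DataType::Null
            // legacy (binary-encoded) json type
            | DataType::Binary => Ok(()),
            dt => Err(format!("Unexpected data type {dt:?}")),
        }
    }

    /// Validate once at construction, so every `JsonExtension` wraps a supported type.
    fn try_new(data_type: &DataType, metadata: JsonMetadata) -> Result<Self, String> {
        Self::supports_data_type(data_type)?;
        Ok(Self(metadata))
    }

    /// Absent metadata falls back to the default instead of being an error.
    fn metadata_or_default(metadata: Option<JsonMetadata>) -> JsonMetadata {
        metadata.unwrap_or_default()
    }

    fn metadata(&self) -> &JsonMetadata {
        &self.0
    }
}
```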

@@ -13,11 +13,13 @@
// limitations under the License.
#![feature(assert_matches)]
#![feature(box_patterns)]
pub mod arrow_array;
pub mod data_type;
pub mod duration;
pub mod error;
pub mod extension;
pub mod interval;
pub mod json;
pub mod macros;

@@ -32,9 +32,8 @@ pub use crate::schema::column_schema::{
COLUMN_FULLTEXT_OPT_KEY_FALSE_POSITIVE_RATE, COLUMN_FULLTEXT_OPT_KEY_GRANULARITY,
COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE, COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY,
COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE, COMMENT_KEY, ColumnExtType, ColumnSchema, FULLTEXT_KEY,
FulltextAnalyzer, FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY,
JSON_STRUCTURE_SETTINGS_KEY, Metadata, SKIPPING_INDEX_KEY, SkippingIndexOptions,
SkippingIndexType, TIME_INDEX_KEY,
FulltextAnalyzer, FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, Metadata,
SKIPPING_INDEX_KEY, SkippingIndexOptions, SkippingIndexType, TIME_INDEX_KEY,
};
pub use crate::schema::constraint::ColumnDefaultConstraint;
pub use crate::schema::raw::RawSchema;

@@ -17,13 +17,17 @@ use std::fmt;
use std::str::FromStr;
use arrow::datatypes::Field;
use arrow_schema::extension::{
EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY, ExtensionType,
};
use serde::{Deserialize, Serialize};
use snafu::{ResultExt, ensure};
use sqlparser_derive::{Visit, VisitMut};
use crate::data_type::{ConcreteDataType, DataType};
use crate::error::{self, Error, InvalidFulltextOptionSnafu, ParseExtendedTypeSnafu, Result};
use crate::json::JsonStructureSettings;
use crate::error::{
self, ArrowMetadataSnafu, Error, InvalidFulltextOptionSnafu, ParseExtendedTypeSnafu, Result,
};
use crate::schema::TYPE_KEY;
use crate::schema::constraint::ColumnDefaultConstraint;
use crate::value::Value;
@@ -42,7 +46,6 @@ pub const FULLTEXT_KEY: &str = "greptime:fulltext";
pub const INVERTED_INDEX_KEY: &str = "greptime:inverted_index";
/// Key used to store skip options in arrow field's metadata.
pub const SKIPPING_INDEX_KEY: &str = "greptime:skipping_index";
pub const JSON_STRUCTURE_SETTINGS_KEY: &str = "greptime:json:structure_settings";
/// Keys used in fulltext options
pub const COLUMN_FULLTEXT_CHANGE_OPT_KEY_ENABLE: &str = "enable";
@@ -394,18 +397,38 @@ impl ColumnSchema {
Ok(())
}
pub fn json_structure_settings(&self) -> Result<Option<JsonStructureSettings>> {
self.metadata
.get(JSON_STRUCTURE_SETTINGS_KEY)
.map(|json| serde_json::from_str(json).context(error::DeserializeSnafu { json }))
.transpose()
pub fn extension_type<E>(&self) -> Result<Option<E>>
where
E: ExtensionType,
{
let extension_type_name = self.metadata.get(EXTENSION_TYPE_NAME_KEY);
if extension_type_name.map(|s| s.as_str()) == Some(E::NAME) {
let extension_metadata = self.metadata.get(EXTENSION_TYPE_METADATA_KEY);
let extension_metadata =
E::deserialize_metadata(extension_metadata.map(|s| s.as_str()))
.context(ArrowMetadataSnafu)?;
let extension = E::try_new(&self.data_type.as_arrow_type(), extension_metadata)
.context(ArrowMetadataSnafu)?;
Ok(Some(extension))
} else {
Ok(None)
}
}
pub fn with_json_structure_settings(&mut self, settings: &JsonStructureSettings) -> Result<()> {
self.metadata.insert(
JSON_STRUCTURE_SETTINGS_KEY.to_string(),
serde_json::to_string(settings).context(error::SerializeSnafu)?,
);
pub fn with_extension_type<E>(&mut self, extension_type: &E) -> Result<()>
where
E: ExtensionType,
{
self.metadata
.insert(EXTENSION_TYPE_NAME_KEY.to_string(), E::NAME.to_string());
if let Some(extension_metadata) = extension_type.serialize_metadata() {
self.metadata
.insert(EXTENSION_TYPE_METADATA_KEY.to_string(), extension_metadata);
}
Ok(())
}
}
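`with_extension_type` and `extension_type` replace the custom `greptime:json:structure_settings` key with Arrow's standard extension keys: the type name is always written, the serialized metadata only when there is some, and reading back succeeds only when the stored name matches the requested type. The round trip over a plain `HashMap` can be sketched as follows; the two key strings are Arrow's well-known field-metadata keys, while the helper functions and the `Option<Option<&str>>` return shape are illustrative assumptions.

```rust
use std::collections::HashMap;

// Arrow's well-known field-metadata keys for extension types.
const EXTENSION_TYPE_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_TYPE_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Records an extension type on a column's metadata map:
/// the name always, the metadata only if present.
fn with_extension_type(
    metadata: &mut HashMap<String, String>,
    name: &str,
    ext_metadata: Option<String>,
) {
    metadata.insert(EXTENSION_TYPE_NAME_KEY.to_string(), name.to_string());
    if let Some(m) = ext_metadata {
        metadata.insert(EXTENSION_TYPE_METADATA_KEY.to_string(), m);
    }
}

/// Returns the raw extension metadata only when the stored name matches `expected_name`.
/// `Some(None)` means the name matched but no metadata was stored.
fn extension_metadata<'a>(
    metadata: &'a HashMap<String, String>,
    expected_name: &str,
) -> Option<Option<&'a str>> {
    if metadata.get(EXTENSION_TYPE_NAME_KEY).map(String::as_str) == Some(expected_name) {
        Some(metadata.get(EXTENSION_TYPE_METADATA_KEY).map(String::as_str))
    } else {
        None
    }
}
```

In the real method, the `Some(None)` case is what lets `E::deserialize_metadata(None)` fall back to default metadata.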

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::BTreeMap;
use std::collections::{BTreeMap, HashMap};
use std::str::FromStr;
use std::sync::Arc;
@@ -31,9 +31,12 @@ use crate::scalars::ScalarVectorBuilder;
use crate::type_id::LogicalTypeId;
use crate::types::{ListType, StructField, StructType};
use crate::value::Value;
use crate::vectors::json::builder::JsonVectorBuilder;
use crate::vectors::{BinaryVectorBuilder, MutableVector};
pub const JSON_TYPE_NAME: &str = "Json";
const JSON_PLAIN_FIELD_NAME: &str = "__plain__";
const JSON_PLAIN_FIELD_METADATA_KEY: &str = "is_plain_json";
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Serialize, Deserialize, Default)]
pub enum JsonFormat {
@@ -54,28 +57,46 @@ impl JsonType {
Self { format }
}
// TODO(LFC): remove "allow unused"
#[allow(unused)]
pub(crate) fn empty() -> Self {
Self {
format: JsonFormat::Native(Box::new(ConcreteDataType::null_datatype())),
}
}
/// Make json type a struct type, by:
/// - if the json is an object, its entries are mapped to struct fields, obviously;
/// - if not, the json is one of bool, number, string or array, make it a special field called
/// "__plain" in a struct with only that field.
/// [JSON_PLAIN_FIELD_NAME] with metadata [JSON_PLAIN_FIELD_METADATA_KEY] = `"true"` in a
/// struct with only that field.
pub(crate) fn as_struct_type(&self) -> StructType {
match &self.format {
JsonFormat::Jsonb => StructType::default(),
JsonFormat::Native(inner) => match inner.as_ref() {
ConcreteDataType::Struct(t) => t.clone(),
x => StructType::new(Arc::new(vec![StructField::new(
"__plain".to_string(),
x.clone(),
true,
)])),
x => {
let mut field =
StructField::new(JSON_PLAIN_FIELD_NAME.to_string(), x.clone(), true);
field.insert_metadata(JSON_PLAIN_FIELD_METADATA_KEY, true);
StructType::new(Arc::new(vec![field]))
}
},
}
}
// TODO(LFC): remove "allow unused"
#[allow(unused)]
/// Check if this json type is the special "plain" one.
/// See [JsonType::as_struct_type].
pub(crate) fn is_plain_json(&self) -> bool {
let JsonFormat::Native(box ConcreteDataType::Struct(t)) = &self.format else {
return true;
};
let fields = t.fields();
let Some((single, [])) = fields.split_first() else {
return false;
};
single.name() == JSON_PLAIN_FIELD_NAME
&& single.metadata(JSON_PLAIN_FIELD_METADATA_KEY) == Some("true")
}
/// Try to merge this json type with others, error on datatype conflict.
pub(crate) fn merge(&mut self, other: &JsonType) -> Result<()> {
match (&self.format, &other.format) {
@@ -91,6 +112,47 @@ impl JsonType {
.fail(),
}
}
pub(crate) fn is_mergeable(&self, other: &JsonType) -> bool {
match (&self.format, &other.format) {
(JsonFormat::Jsonb, JsonFormat::Jsonb) => true,
(JsonFormat::Native(this), JsonFormat::Native(that)) => {
is_mergeable(this.as_ref(), that.as_ref())
}
_ => false,
}
}
}
fn is_mergeable(this: &ConcreteDataType, that: &ConcreteDataType) -> bool {
fn is_mergeable_struct(this: &StructType, that: &StructType) -> bool {
let this_fields = this.fields();
let this_fields = this_fields
.iter()
.map(|x| (x.name(), x))
.collect::<HashMap<_, _>>();
for that_field in that.fields().iter() {
if let Some(this_field) = this_fields.get(that_field.name())
&& !is_mergeable(this_field.data_type(), that_field.data_type())
{
return false;
}
}
true
}
match (this, that) {
(this, that) if this == that => true,
(ConcreteDataType::List(this), ConcreteDataType::List(that)) => {
is_mergeable(this.item_type(), that.item_type())
}
(ConcreteDataType::Struct(this), ConcreteDataType::Struct(that)) => {
is_mergeable_struct(this, that)
}
(ConcreteDataType::Null(_), _) | (_, ConcreteDataType::Null(_)) => true,
_ => false,
}
}
fn merge(this: &ConcreteDataType, that: &ConcreteDataType) -> Result<ConcreteDataType> {
@@ -166,7 +228,10 @@ impl DataType for JsonType {
}
fn create_mutable_vector(&self, capacity: usize) -> Box<dyn MutableVector> {
Box::new(BinaryVectorBuilder::with_capacity(capacity))
match self.format {
JsonFormat::Jsonb => Box::new(BinaryVectorBuilder::with_capacity(capacity)),
JsonFormat::Native(_) => Box::new(JsonVectorBuilder::with_capacity(capacity)),
}
}
fn try_cast(&self, from: Value) -> Option<Value> {
@@ -226,10 +291,12 @@ mod tests {
let result = json_type.merge(other);
match (result, expected) {
(Ok(()), Ok(expected)) => {
assert_eq!(json_type.name(), expected)
assert_eq!(json_type.name(), expected);
assert!(json_type.is_mergeable(other));
}
(Err(err), Err(expected)) => {
assert_eq!(err.to_string(), expected)
assert_eq!(err.to_string(), expected);
assert!(!json_type.is_mergeable(other));
}
_ => unreachable!(),
}
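`is_mergeable` answers, without mutating anything, whether `merge` would succeed: equal types always merge, `Null` merges with anything, lists merge when their item types do, and structs merge when every field name they share has mergeable types (fields present on only one side are fine). The sketch below reproduces that recursion over a hypothetical, reduced `Ty` enum standing in for `ConcreteDataType`.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for `ConcreteDataType`.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int64,
    String,
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

/// Recursively checks whether two json types can be merged without a datatype conflict.
fn is_mergeable(this: &Ty, that: &Ty) -> bool {
    match (this, that) {
        (a, b) if a == b => true,
        // Null merges with anything.
        (Ty::Null, _) | (_, Ty::Null) => true,
        (Ty::List(a), Ty::List(b)) => is_mergeable(a, b),
        (Ty::Struct(a), Ty::Struct(b)) => {
            let a: HashMap<&str, &Ty> = a.iter().map(|(n, t)| (n.as_str(), t)).collect();
            // Only fields present on both sides can conflict.
            b.iter()
                .all(|(name, t)| a.get(name.as_str()).map_or(true, |at| is_mergeable(at, t)))
        }
        _ => false,
    }
}
```

This matches the expectations in the builder tests, where `Int64` followed by `String` is rejected as a conflict.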

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::BTreeMap;
use std::sync::Arc;
use arrow::datatypes::{DataType as ArrowDataType, Field};
@@ -46,6 +47,15 @@ impl TryFrom<&Fields> for StructType {
}
}
impl<const N: usize> From<[StructField; N]> for StructType {
fn from(value: [StructField; N]) -> Self {
let value: Box<[StructField]> = Box::new(value);
Self {
fields: Arc::new(value.into_vec()),
}
}
}
impl DataType for StructType {
fn name(&self) -> String {
format!(
@@ -108,6 +118,7 @@ pub struct StructField {
name: String,
data_type: ConcreteDataType,
nullable: bool,
metadata: BTreeMap<String, String>,
}
impl StructField {
@@ -116,6 +127,7 @@ impl StructField {
name,
data_type,
nullable,
metadata: BTreeMap::new(),
}
}
@@ -135,11 +147,25 @@ impl StructField {
self.nullable
}
pub(crate) fn insert_metadata(&mut self, key: impl ToString, value: impl ToString) {
self.metadata.insert(key.to_string(), value.to_string());
}
pub(crate) fn metadata(&self, key: &str) -> Option<&str> {
self.metadata.get(key).map(String::as_str)
}
pub fn to_df_field(&self) -> Field {
let metadata = self
.metadata
.iter()
.map(|(k, v)| (k.clone(), v.clone()))
.collect();
Field::new(
self.name.clone(),
self.data_type.as_arrow_type(),
self.nullable,
)
.with_metadata(metadata)
}
}
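The new `impl<const N: usize> From<[StructField; N]> for StructType` uses const generics so a struct type can be built from an array literal of any length. A minimal sketch with hypothetical `Field`/`StructType` stand-ins; it uses `Vec::from(value)`, which should be equivalent to the diff's `Box::new(value).into_vec()` (both move the elements into a `Vec` without cloning).

```rust
use std::sync::Arc;

/// Hypothetical simplified field type.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
}

/// Hypothetical simplified struct type holding shared fields.
#[derive(Debug, Clone, PartialEq)]
struct StructType {
    fields: Arc<Vec<Field>>,
}

/// Build a struct type from a fixed-size array of fields, for any length `N`.
impl<const N: usize> From<[Field; N]> for StructType {
    fn from(value: [Field; N]) -> Self {
        // Moves the array's elements into a Vec.
        Self {
            fields: Arc::new(Vec::from(value)),
        }
    }
}
```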

View File

@@ -873,6 +873,12 @@ impl From<&[u8]> for Value {
}
}
impl From<()> for Value {
fn from(_: ()) -> Self {
Value::Null
}
}
impl TryFrom<Value> for serde_json::Value {
type Error = serde_json::Error;

@@ -35,6 +35,7 @@ mod duration;
mod eq;
mod helper;
mod interval;
pub(crate) mod json;
mod list;
mod null;
pub(crate) mod operations;

@@ -464,6 +464,14 @@ impl Helper {
}
}
#[cfg(test)]
pub(crate) fn pretty_print(vector: VectorRef) -> String {
let array = vector.to_arrow_array();
arrow::util::pretty::pretty_format_columns(&vector.vector_type_name(), &[array])
.map(|x| x.to_string())
.unwrap_or_else(|e| e.to_string())
}
#[cfg(test)]
mod tests {
use arrow::array::{

@@ -0,0 +1,15 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
pub(crate) mod builder;

@@ -0,0 +1,485 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use std::collections::HashMap;
use snafu::OptionExt;
use crate::data_type::ConcreteDataType;
use crate::error::{Result, TryFromValueSnafu, UnsupportedOperationSnafu};
use crate::prelude::{ValueRef, Vector, VectorRef};
use crate::types::JsonType;
use crate::value::StructValueRef;
use crate::vectors::{MutableVector, StructVectorBuilder};
struct JsonStructsBuilder {
json_type: JsonType,
inner: StructVectorBuilder,
}
impl JsonStructsBuilder {
fn new(json_type: JsonType, capacity: usize) -> Self {
let struct_type = json_type.as_struct_type();
let inner = StructVectorBuilder::with_type_and_capacity(struct_type, capacity);
Self { json_type, inner }
}
fn len(&self) -> usize {
self.inner.len()
}
fn push(&mut self, value: &ValueRef) -> Result<()> {
if self.json_type.is_plain_json() {
let value = ValueRef::Struct(StructValueRef::RefList {
val: vec![value.clone()],
fields: self.json_type.as_struct_type(),
});
self.inner.try_push_value_ref(&value)
} else {
self.inner.try_push_value_ref(value)
}
}
/// Try to merge (and consume the data of) other json vector builder into this one.
/// Note that the other builder's json type must be mergeable with this one's (this one's json
/// type has all the fields of the other one's, with no datatype conflicts).
/// Normally this is guaranteed, as long as json values are pushed through [JsonVectorBuilder].
fn try_merge(&mut self, other: &mut JsonStructsBuilder) -> Result<()> {
debug_assert!(self.json_type.is_mergeable(&other.json_type));
fn helper(this: &mut StructVectorBuilder, that: &mut StructVectorBuilder) -> Result<()> {
let that_len = that.len();
if let Some(x) = that.mut_null_buffer().finish() {
this.mut_null_buffer().append_buffer(&x)
} else {
this.mut_null_buffer().append_n_non_nulls(that_len);
}
let that_fields = that.struct_type().fields();
let mut that_builders = that_fields
.iter()
.zip(that.mut_value_builders().iter_mut())
.map(|(field, builder)| (field.name(), builder))
.collect::<HashMap<_, _>>();
for (field, this_builder) in this
.struct_type()
.fields()
.iter()
.zip(this.mut_value_builders().iter_mut())
{
if let Some(that_builder) = that_builders.get_mut(field.name()) {
if field.data_type().is_struct() {
let this = this_builder
.as_mut_any()
.downcast_mut::<StructVectorBuilder>()
// Safety: a struct datatype field must correspond to a struct vector builder.
.unwrap();
let that = that_builder
.as_mut_any()
.downcast_mut::<StructVectorBuilder>()
// Safety: other builder with same field name must have same datatype,
// ensured because the two json types are mergeable.
.unwrap();
helper(this, that)?;
} else {
let vector = that_builder.to_vector();
this_builder.extend_slice_of(vector.as_ref(), 0, vector.len())?;
}
} else {
this_builder.push_nulls(that_len);
}
}
Ok(())
}
helper(&mut self.inner, &mut other.inner)
}
/// Same as [JsonStructsBuilder::try_merge], but does not consume the other builder's data.
fn try_merge_cloned(&mut self, other: &JsonStructsBuilder) -> Result<()> {
debug_assert!(self.json_type.is_mergeable(&other.json_type));
fn helper(this: &mut StructVectorBuilder, that: &StructVectorBuilder) -> Result<()> {
let that_len = that.len();
if let Some(x) = that.null_buffer().finish_cloned() {
this.mut_null_buffer().append_buffer(&x)
} else {
this.mut_null_buffer().append_n_non_nulls(that_len);
}
let that_fields = that.struct_type().fields();
let that_builders = that_fields
.iter()
.zip(that.value_builders().iter())
.map(|(field, builder)| (field.name(), builder))
.collect::<HashMap<_, _>>();
for (field, this_builder) in this
.struct_type()
.fields()
.iter()
.zip(this.mut_value_builders().iter_mut())
{
if let Some(that_builder) = that_builders.get(field.name()) {
if field.data_type().is_struct() {
let this = this_builder
.as_mut_any()
.downcast_mut::<StructVectorBuilder>()
// Safety: a struct datatype field must correspond to a struct vector builder.
.unwrap();
let that = that_builder
.as_any()
.downcast_ref::<StructVectorBuilder>()
// Safety: other builder with same field name must have same datatype,
// ensured because the two json types are mergeable.
.unwrap();
helper(this, that)?;
} else {
let vector = that_builder.to_vector_cloned();
this_builder.extend_slice_of(vector.as_ref(), 0, vector.len())?;
}
} else {
this_builder.push_nulls(that_len);
}
}
Ok(())
}
helper(&mut self.inner, &other.inner)
}
}
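`try_merge` appends every row of the other builder into this one: for each field of the merged type it either copies the other builder's column or, if the other builder lacks that field, pushes `that_len` nulls. The sketch below shows that null-filling merge on a flattened, hypothetical `Columns` builder (one `Vec<Option<i64>>` per field, no nested structs and no struct-level null buffer), so it is a simplification of the real recursive `helper`.

```rust
/// Hypothetical, flattened stand-in for a struct vector builder:
/// one column per field, all columns the same length.
#[derive(Debug)]
struct Columns {
    len: usize,
    fields: Vec<(String, Vec<Option<i64>>)>,
}

impl Columns {
    /// Creates an empty builder with the given field names (e.g. the merged schema).
    fn with_fields(names: &[&str]) -> Self {
        Self {
            len: 0,
            fields: names.iter().map(|n| (n.to_string(), Vec::new())).collect(),
        }
    }

    /// Appends one row; fields missing from `row` become nulls.
    fn push(&mut self, row: &[(&str, i64)]) {
        for (name, column) in self.fields.iter_mut() {
            let value = row.iter().find(|(n, _)| *n == name.as_str()).map(|&(_, v)| v);
            column.push(value);
        }
        self.len += 1;
    }

    /// Appends all rows of `other`, filling nulls for fields `other` does not have.
    /// Like `try_merge`, this assumes `self` already has every field of `other`.
    fn merge_from(&mut self, other: &Columns) {
        for (name, column) in self.fields.iter_mut() {
            match other.fields.iter().find(|(n, _)| n.as_str() == name.as_str()) {
                Some((_, values)) => column.extend_from_slice(values),
                None => column.extend(std::iter::repeat(None).take(other.len)),
            }
        }
        self.len += other.len;
    }
}
```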
/// The vector builder for json type values.
///
/// Json types are dynamic, to some degree (as long as they can be merged into each other), and so
/// are json values. Json values are physically stored in struct vectors, which require the types of
/// struct values to be fixed inside a given struct vector. To resolve this "dynamic" vs "fixed"
/// datatype problem, this builder gives each run of consecutive values sharing a json type its own
/// struct vector builder: whenever a pushed value's json type differs from that of the most recent
/// "child" builder, a new child builder is created for it.
///
/// Given the "mixed" nature of the values stored in this builder, a "merge" operation is performed
/// to produce the json vector: it iterates over all the "child" builders and fills nulls for
/// missing json fields. The final vector's json type is the "merge" of all pushed json types.
pub(crate) struct JsonVectorBuilder {
merged_type: JsonType,
capacity: usize,
builders: Vec<JsonStructsBuilder>,
}
impl JsonVectorBuilder {
pub(crate) fn with_capacity(capacity: usize) -> Self {
Self {
merged_type: JsonType::empty(),
capacity,
builders: vec![],
}
}
fn try_create_new_builder(&mut self, json_type: &JsonType) -> Result<&mut JsonStructsBuilder> {
self.merged_type.merge(json_type)?;
let builder = JsonStructsBuilder::new(json_type.clone(), self.capacity);
self.builders.push(builder);
let len = self.builders.len();
Ok(&mut self.builders[len - 1])
}
}
impl MutableVector for JsonVectorBuilder {
fn data_type(&self) -> ConcreteDataType {
ConcreteDataType::Json(self.merged_type.clone())
}
fn len(&self) -> usize {
self.builders.iter().map(|x| x.len()).sum()
}
fn as_any(&self) -> &dyn Any {
self
}
fn as_mut_any(&mut self) -> &mut dyn Any {
self
}
fn to_vector(&mut self) -> VectorRef {
// Fast path:
if self.builders.len() == 1 {
return self.builders[0].inner.to_vector();
}
let mut unified_jsons = JsonStructsBuilder::new(self.merged_type.clone(), self.capacity);
for builder in self.builders.iter_mut() {
unified_jsons
.try_merge(builder)
// Safety: the "unified_jsons" has the merged json type from all the builders,
// so it should merge them without errors.
.unwrap_or_else(|e| panic!("failed to merge json builders, error: {e}"));
}
unified_jsons.inner.to_vector()
}
fn to_vector_cloned(&self) -> VectorRef {
// Fast path:
if self.builders.len() == 1 {
return self.builders[0].inner.to_vector_cloned();
}
let mut unified_jsons = JsonStructsBuilder::new(self.merged_type.clone(), self.capacity);
for builder in self.builders.iter() {
unified_jsons
.try_merge_cloned(builder)
// Safety: the "unified_jsons" has the merged json type from all the builders,
// so it should merge them without errors.
.unwrap_or_else(|e| panic!("failed to merge json builders, error: {e}"));
}
unified_jsons.inner.to_vector_cloned()
}
fn try_push_value_ref(&mut self, value: &ValueRef) -> Result<()> {
let data_type = value.data_type();
let json_type = data_type.as_json().with_context(|| TryFromValueSnafu {
reason: format!("expected json value, got {value:?}"),
})?;
let builder = match self.builders.last_mut() {
Some(last) => {
if &last.json_type != json_type {
self.try_create_new_builder(json_type)?
} else {
last
}
}
None => self.try_create_new_builder(json_type)?,
};
let ValueRef::Json(value) = value else {
// Safety: a value of json datatype must be a `ValueRef::Json`.
unreachable!()
};
builder.push(value)
}
fn push_null(&mut self) {
let null_json_value = ValueRef::Json(Box::new(ValueRef::Null));
self.try_push_value_ref(&null_json_value)
// Safety: as "try_push_value_ref" shows, a null json value can always be pushed into
// any json vector.
.unwrap_or_else(|e| {
panic!("failed to push null json value: {null_json_value:?}, error: {e}")
});
}
fn extend_slice_of(&mut self, _: &dyn Vector, _: usize, _: usize) -> Result<()> {
UnsupportedOperationSnafu {
op: "extend_slice_of",
vector_type: "JsonVector",
}
.fail()
}
}
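`try_push_value_ref` only compares the incoming json type with the last child builder's type: a match appends to that child, anything else opens a new child. Consecutive same-typed values therefore share a child, which is consistent with `test_push_json_objects` expecting 6 child builders for 10 values pushed in same-typed pairs. The generic sketch below isolates that "run" logic; `RunBuilder` and its type tags are hypothetical and stand in for `JsonVectorBuilder` and `JsonType`.

```rust
/// Hypothetical: groups a stream of values into consecutive runs sharing a type tag.
#[derive(Debug)]
struct RunBuilder<T, V> {
    runs: Vec<(T, Vec<V>)>,
}

impl<T: PartialEq + Clone, V> RunBuilder<T, V> {
    fn new() -> Self {
        Self { runs: Vec::new() }
    }

    fn push(&mut self, ty: &T, value: V) {
        // Same type as the most recent child: keep appending to it.
        if let Some((last_ty, values)) = self.runs.last_mut() {
            if *last_ty == *ty {
                values.push(value);
                return;
            }
        }
        // Different (or first) type: open a new child.
        self.runs.push((ty.clone(), vec![value]));
    }

    /// Total number of values across all children, like `JsonVectorBuilder::len`.
    fn len(&self) -> usize {
        self.runs.iter().map(|(_, values)| values.len()).sum()
    }
}
```

Checking only the last child keeps each push cheap, at the cost of possibly creating several children for the same type when types alternate; the final merge in `to_vector` absorbs those duplicates.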
#[cfg(test)]
mod tests {
use super::*;
use crate::data_type::DataType;
use crate::json::JsonStructureSettings;
use crate::vectors::helper::pretty_print;
fn push(json: &str, builder: &mut JsonVectorBuilder, expected: std::result::Result<(), &str>) {
let settings = JsonStructureSettings::Structured(None);
let json: serde_json::Value = serde_json::from_str(json).unwrap();
let value = settings.encode(json).unwrap();
let value = value.as_value_ref();
let result = builder.try_push_value_ref(&value);
match (result, expected) {
(Ok(()), Ok(())) => (),
(Err(e), Err(expected)) => assert_eq!(e.to_string(), expected),
_ => unreachable!(),
}
}
#[test]
fn test_push_plain_jsons() -> Result<()> {
let jsons = vec!["1", "2", r#""s""#, "[true]"];
let results = vec![
Ok(()),
Ok(()),
Err(
"Failed to merge JSON datatype: datatypes have conflict, this: Int64, that: String",
),
Err(
"Failed to merge JSON datatype: datatypes have conflict, this: Int64, that: List<Boolean>",
),
];
let mut builder = JsonVectorBuilder::with_capacity(1);
for (json, result) in jsons.into_iter().zip(results.into_iter()) {
push(json, &mut builder, result);
}
let vector = builder.to_vector();
let expected = r#"
+----------------+
| StructVector |
+----------------+
| {__plain__: 1} |
| {__plain__: 2} |
+----------------+"#;
assert_eq!(pretty_print(vector), expected.trim());
Ok(())
}
#[test]
fn test_push_json_objects() -> Result<()> {
let jsons = vec![
r#"{
"s": "a",
"list": [1, 2, 3]
}"#,
r#"{
"list": [4],
"s": "b"
}"#,
r#"{
"s": "c",
"float": 0.9
}"#,
r#"{
"float": 0.8,
"s": "d"
}"#,
r#"{
"float": 0.7,
"int": -1
}"#,
r#"{
"int": 0,
"float": 0.6
}"#,
r#"{
"int": 1,
"object": {"hello": "world", "timestamp": 1761523200000}
}"#,
r#"{
"object": {"hello": "greptime", "timestamp": 1761523201000},
"int": 2
}"#,
r#"{
"object": {"timestamp": 1761523202000},
"nested": {"a": {"b": {"b": {"a": "abba"}}}}
}"#,
r#"{
"nested": {"a": {"b": {"a": {"b": "abab"}}}},
"object": {"timestamp": 1761523203000}
}"#,
];
let mut builder = JsonVectorBuilder::with_capacity(1);
for json in jsons {
push(json, &mut builder, Ok(()));
}
assert_eq!(builder.len(), 10);
// test children builders:
assert_eq!(builder.builders.len(), 6);
let expect_types = [
r#"Json<Struct<"list": List<Int64>, "s": String>>"#,
r#"Json<Struct<"float": Float64, "s": String>>"#,
r#"Json<Struct<"float": Float64, "int": Int64>>"#,
r#"Json<Struct<"int": Int64, "object": Struct<"hello": String, "timestamp": Int64>>>"#,
r#"Json<Struct<"nested": Struct<"a": Struct<"b": Struct<"b": Struct<"a": String>>>>, "object": Struct<"timestamp": Int64>>>"#,
r#"Json<Struct<"nested": Struct<"a": Struct<"b": Struct<"a": Struct<"b": String>>>>, "object": Struct<"timestamp": Int64>>>"#,
];
let expect_vectors = [
r#"
+-------------------------+
| StructVector |
+-------------------------+
| {list: [1, 2, 3], s: a} |
| {list: [4], s: b} |
+-------------------------+"#,
r#"
+--------------------+
| StructVector |
+--------------------+
| {float: 0.9, s: c} |
| {float: 0.8, s: d} |
+--------------------+"#,
r#"
+-----------------------+
| StructVector |
+-----------------------+
| {float: 0.7, int: -1} |
| {float: 0.6, int: 0} |
+-----------------------+"#,
r#"
+---------------------------------------------------------------+
| StructVector |
+---------------------------------------------------------------+
| {int: 1, object: {hello: world, timestamp: 1761523200000}} |
| {int: 2, object: {hello: greptime, timestamp: 1761523201000}} |
+---------------------------------------------------------------+"#,
r#"
+------------------------------------------------------------------------+
| StructVector |
+------------------------------------------------------------------------+
| {nested: {a: {b: {b: {a: abba}}}}, object: {timestamp: 1761523202000}} |
+------------------------------------------------------------------------+"#,
r#"
+------------------------------------------------------------------------+
| StructVector |
+------------------------------------------------------------------------+
| {nested: {a: {b: {a: {b: abab}}}}, object: {timestamp: 1761523203000}} |
+------------------------------------------------------------------------+"#,
];
for (builder, (expect_type, expect_vector)) in builder
.builders
.iter()
.zip(expect_types.into_iter().zip(expect_vectors.into_iter()))
{
assert_eq!(builder.json_type.name(), expect_type);
let vector = builder.inner.to_vector_cloned();
assert_eq!(pretty_print(vector), expect_vector.trim());
}
// test final merged json type:
let expected = r#"Json<Struct<"float": Float64, "int": Int64, "list": List<Int64>, "nested": Struct<"a": Struct<"b": Struct<"a": Struct<"b": String>, "b": Struct<"a": String>>>>, "object": Struct<"hello": String, "timestamp": Int64>, "s": String>>"#;
assert_eq!(builder.data_type().to_string(), expected);
// test final produced vector:
let expected = r#"
+-------------------------------------------------------------------------------------------------------------------+
| StructVector |
+-------------------------------------------------------------------------------------------------------------------+
| {float: , int: , list: [1, 2, 3], nested: , object: , s: a} |
| {float: , int: , list: [4], nested: , object: , s: b} |
| {float: 0.9, int: , list: , nested: , object: , s: c} |
| {float: 0.8, int: , list: , nested: , object: , s: d} |
| {float: 0.7, int: -1, list: , nested: , object: , s: } |
| {float: 0.6, int: 0, list: , nested: , object: , s: } |
| {float: , int: 1, list: , nested: , object: {hello: world, timestamp: 1761523200000}, s: } |
| {float: , int: 2, list: , nested: , object: {hello: greptime, timestamp: 1761523201000}, s: } |
| {float: , int: , list: , nested: {a: {b: {a: , b: {a: abba}}}}, object: {hello: , timestamp: 1761523202000}, s: } |
| {float: , int: , list: , nested: {a: {b: {a: {b: abab}, b: }}}, object: {hello: , timestamp: 1761523203000}, s: } |
+-------------------------------------------------------------------------------------------------------------------+"#;
let vector = builder.to_vector_cloned();
assert_eq!(pretty_print(vector), expected.trim());
let vector = builder.to_vector();
assert_eq!(pretty_print(vector), expected.trim());
Ok(())
}
}

@@ -323,6 +323,26 @@ impl StructVectorBuilder {
}
self.null_buffer.append_null();
}
pub(crate) fn struct_type(&self) -> &StructType {
&self.fields
}
pub(crate) fn value_builders(&self) -> &[Box<dyn MutableVector>] {
&self.value_builders
}
pub(crate) fn mut_value_builders(&mut self) -> &mut [Box<dyn MutableVector>] {
&mut self.value_builders
}
pub(crate) fn null_buffer(&self) -> &NullBufferBuilder {
&self.null_buffer
}
pub(crate) fn mut_null_buffer(&mut self) -> &mut NullBufferBuilder {
&mut self.null_buffer
}
}
impl MutableVector for StructVectorBuilder {

@@ -18,6 +18,7 @@ use std::collections::BTreeSet;
use std::sync::Arc;
use catalog::CatalogManagerRef;
use client::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_error::ext::BoxedError;
use common_meta::key::flow::FlowMetadataManagerRef;
use common_recordbatch::{RecordBatch, RecordBatches, SendableRecordBatchStream};
@@ -396,8 +397,8 @@ impl RefillTask {
// we don't need information from query context in this query so a default query context is enough
let query_ctx = Arc::new(
QueryContextBuilder::default()
.current_catalog("greptime".to_string())
.current_schema("public".to_string())
.current_catalog(DEFAULT_CATALOG_NAME.to_string())
.current_schema(DEFAULT_SCHEMA_NAME.to_string())
.build(),
);

@@ -23,7 +23,7 @@ use api::v1::query_request::Query;
use api::v1::{CreateTableExpr, QueryRequest};
use client::{Client, Database};
use common_error::ext::{BoxedError, ErrorExt};
use common_grpc::channel_manager::{ChannelConfig, ChannelManager};
use common_grpc::channel_manager::{ChannelConfig, ChannelManager, load_tls_config};
use common_meta::cluster::{NodeInfo, NodeInfoKey, Role};
use common_meta::peer::Peer;
use common_meta::rpc::store::RangeRequest;
@@ -123,12 +123,10 @@ impl FrontendClient {
let cfg = ChannelConfig::new()
.connect_timeout(batch_opts.grpc_conn_timeout)
.timeout(batch_opts.query_timeout);
if let Some(tls) = &batch_opts.frontend_tls {
let cfg = cfg.client_tls_config(tls.clone());
ChannelManager::with_tls_config(cfg).context(InvalidClientConfigSnafu)?
} else {
ChannelManager::with_config(cfg)
}
let tls_config = load_tls_config(batch_opts.frontend_tls.as_ref())
.context(InvalidClientConfigSnafu)?;
ChannelManager::with_config(cfg, tls_config)
},
auth,
query,

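The frontend client change above replaces the `with_tls_config` / `with_config` branch with a single `ChannelManager::with_config(cfg, tls_config)` call, where `load_tls_config` turns an optional TLS setting into an optional loaded config. Call sites without TLS, such as the meta client builders further down, pass `None`. A minimal std-only sketch of that `Option` to `Result<Option<_>>` shape follows. `TlsOption`, `LoadedTls`, `ChannelManagerSketch`, the `timeout_secs` field and the empty-path check are hypothetical stand-ins, not the real `common_grpc` types or validation.

```rust
/// Hypothetical stand-in for user-facing TLS options (not the real common_grpc type).
#[derive(Debug, Clone, PartialEq)]
pub struct TlsOption {
    pub ca_cert_path: String,
}

/// Hypothetical "loaded" TLS configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedTls {
    pub ca_cert_path: String,
}

/// Hypothetical channel manager: TLS is just an optional field, so a single
/// constructor covers both the plaintext and the TLS case.
#[derive(Debug)]
pub struct ChannelManagerSketch {
    pub timeout_secs: u64,
    pub tls: Option<LoadedTls>,
}

impl ChannelManagerSketch {
    pub fn with_config(timeout_secs: u64, tls: Option<LoadedTls>) -> Self {
        ChannelManagerSketch { timeout_secs, tls }
    }
}

/// `None` means plaintext; `Some` is validated and may fail.
/// `Option<Result<T, E>>::transpose` yields `Result<Option<T>, E>`.
pub fn load_tls_config(opt: Option<&TlsOption>) -> Result<Option<LoadedTls>, String> {
    opt.map(|o| {
        if o.ca_cert_path.is_empty() {
            // Hypothetical validation rule, for illustration only.
            Err("empty CA certificate path".to_string())
        } else {
            Ok(LoadedTls {
                ca_cert_path: o.ca_cert_path.clone(),
            })
        }
    })
    .transpose()
}
```

Folding the TLS choice into an `Option` argument removes the duplicated constructor branch at every call site.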
@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashMap;
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use async_trait::async_trait;
@@ -28,6 +28,7 @@ use common_function::scalars::udf::create_udf;
use common_query::{Output, OutputData};
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::util;
use common_telemetry::warn;
use datafusion::dataframe::DataFrame;
use datafusion::execution::SessionStateBuilder;
use datafusion::execution::context::SessionContext;
@@ -42,8 +43,9 @@ use servers::error::{
};
use servers::http::jaeger::{JAEGER_QUERY_TABLE_NAME_KEY, QueryTraceParams};
use servers::otlp::trace::{
DURATION_NANO_COLUMN, SERVICE_NAME_COLUMN, SPAN_ATTRIBUTES_COLUMN, SPAN_KIND_COLUMN,
SPAN_KIND_PREFIX, SPAN_NAME_COLUMN, TIMESTAMP_COLUMN, TRACE_ID_COLUMN,
DURATION_NANO_COLUMN, KEY_OTEL_STATUS_ERROR_KEY, SERVICE_NAME_COLUMN, SPAN_ATTRIBUTES_COLUMN,
SPAN_KIND_COLUMN, SPAN_KIND_PREFIX, SPAN_NAME_COLUMN, SPAN_STATUS_CODE, SPAN_STATUS_ERROR,
TIMESTAMP_COLUMN, TRACE_ID_COLUMN,
};
use servers::query_handler::JaegerQueryHandler;
use session::context::QueryContextRef;
@@ -263,7 +265,7 @@ impl JaegerQueryHandler for Instance {
self.query_engine(),
vec![wildcard()],
filters,
vec![],
vec![col(TIMESTAMP_COLUMN).sort(false, false)], // Sort by timestamp in descending order.
None,
None,
vec![],
@@ -322,6 +324,7 @@ async fn query_trace_table(
})?;
let is_data_model_v1 = table
.clone()
.table_info()
.meta
.options
@@ -330,6 +333,14 @@ async fn query_trace_table(
.map(|s| s.as_str())
== Some(TABLE_DATA_MODEL_TRACE_V1);
// collect to set
let col_names = table
.table_info()
.meta
.field_column_names()
.map(|s| format!("\"{}\"", s))
.collect::<HashSet<String>>();
let df_context = create_df_context(query_engine)?;
let dataframe = df_context
@@ -342,7 +353,7 @@ async fn query_trace_table(
let dataframe = filters
.into_iter()
.chain(tags.map_or(Ok(vec![]), |t| {
tags_filters(&dataframe, t, is_data_model_v1)
tags_filters(&dataframe, t, is_data_model_v1, &col_names)
})?)
.try_fold(dataframe, |df, expr| {
df.filter(expr).context(DataFusionSnafu)
@@ -472,23 +483,73 @@ fn json_tag_filters(
Ok(filters)
}
fn flatten_tag_filters(tags: HashMap<String, JsonValue>) -> ServerResult<Vec<Expr>> {
/// Helper function to check if span_key or resource_key exists in col_names and create an expression.
/// If neither exists, logs a warning and returns None.
#[inline]
fn check_col_and_build_expr<F>(
span_key: String,
resource_key: String,
key: &str,
col_names: &HashSet<String>,
expr_builder: F,
) -> Option<Expr>
where
F: FnOnce(String) -> Expr,
{
if col_names.contains(&span_key) {
return Some(expr_builder(span_key));
}
if col_names.contains(&resource_key) {
return Some(expr_builder(resource_key));
}
warn!("tag key {} not found in table columns", key);
None
}
fn flatten_tag_filters(
tags: HashMap<String, JsonValue>,
col_names: &HashSet<String>,
) -> ServerResult<Vec<Expr>> {
let filters = tags
.into_iter()
.filter_map(|(key, value)| {
let key = format!("\"span_attributes.{}\"", key);
if key == KEY_OTEL_STATUS_ERROR_KEY && value == JsonValue::Bool(true) {
return Some(col(SPAN_STATUS_CODE).eq(lit(SPAN_STATUS_ERROR)));
}
// TODO(shuiyisong): add more precise mapping from key to col name
let span_key = format!("\"span_attributes.{}\"", key);
let resource_key = format!("\"resource_attributes.{}\"", key);
match value {
JsonValue::String(value) => Some(col(key).eq(lit(value))),
JsonValue::String(value) => {
check_col_and_build_expr(span_key, resource_key, &key, col_names, |k| {
col(k).eq(lit(value))
})
}
JsonValue::Number(value) => {
if value.is_f64() {
// safe to unwrap as checked previously
Some(col(key).eq(lit(value.as_f64().unwrap())))
let value = value.as_f64().unwrap();
check_col_and_build_expr(span_key, resource_key, &key, col_names, |k| {
col(k).eq(lit(value))
})
} else {
Some(col(key).eq(lit(value.as_i64().unwrap())))
let value = value.as_i64().unwrap();
check_col_and_build_expr(span_key, resource_key, &key, col_names, |k| {
col(k).eq(lit(value))
})
}
}
JsonValue::Bool(value) => Some(col(key).eq(lit(value))),
JsonValue::Null => Some(col(key).is_null()),
JsonValue::Bool(value) => {
check_col_and_build_expr(span_key, resource_key, &key, col_names, |k| {
col(k).eq(lit(value))
})
}
JsonValue::Null => {
check_col_and_build_expr(span_key, resource_key, &key, col_names, |k| {
col(k).is_null()
})
}
// not supported at the moment
JsonValue::Array(_value) => None,
JsonValue::Object(_value) => None,
@@ -502,9 +563,10 @@ fn tags_filters(
dataframe: &DataFrame,
tags: HashMap<String, JsonValue>,
is_data_model_v1: bool,
col_names: &HashSet<String>,
) -> ServerResult<Vec<Expr>> {
if is_data_model_v1 {
flatten_tag_filters(tags)
flatten_tag_filters(tags, col_names)
} else {
json_tag_filters(dataframe, tags)
}

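In the Jaeger query change, `flatten_tag_filters` no longer assumes every tag lives under `span_attributes`. It checks the table's quoted field column names, prefers `"span_attributes.<key>"`, falls back to `"resource_attributes.<key>"`, and drops the filter with a warning when neither column exists. The sketch below reproduces only that lookup order with std types. Rendering the predicate as a `String` instead of a DataFusion `Expr` is a simplification, and `resolve_tag_column` and `eq_filter` are hypothetical helpers.

```rust
use std::collections::HashSet;

/// Lookup order from the diff: span attribute column first, then resource
/// attribute column, otherwise no filter. Column names carry literal double
/// quotes, as produced by `format!("\"{}\"", s)` when building `col_names`.
fn resolve_tag_column(key: &str, col_names: &HashSet<String>) -> Option<String> {
    let span_key = format!("\"span_attributes.{}\"", key);
    let resource_key = format!("\"resource_attributes.{}\"", key);
    if col_names.contains(&span_key) {
        Some(span_key)
    } else if col_names.contains(&resource_key) {
        Some(resource_key)
    } else {
        // The real code logs a warning here and skips this tag.
        None
    }
}

/// Hypothetical: renders an equality predicate as text instead of an `Expr`.
fn eq_filter(key: &str, value: &str, col_names: &HashSet<String>) -> Option<String> {
    resolve_tag_column(key, col_names).map(|col| format!("{} = '{}'", col, value))
}
```

Because unknown keys yield `None` and are dropped by `filter_map`, a tag that matches no column widens the query rather than failing it.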
@@ -36,7 +36,7 @@ async fn run() {
.timeout(Duration::from_secs(3))
.connect_timeout(Duration::from_secs(5))
.tcp_nodelay(true);
let channel_manager = ChannelManager::with_config(config);
let channel_manager = ChannelManager::with_config(config, None);
let mut meta_client = MetaClientBuilder::datanode_default_options(id)
.channel_manager(channel_manager)
.build();

@@ -101,7 +101,7 @@ pub async fn create_meta_client(
if let MetaClientType::Frontend = client_type {
let ddl_config = base_config.clone().timeout(meta_client_options.ddl_timeout);
builder = builder.ddl_channel_manager(ChannelManager::with_config(ddl_config));
builder = builder.ddl_channel_manager(ChannelManager::with_config(ddl_config, None));
if let Some(plugins) = plugins {
let region_follower = plugins.get::<RegionFollowerClientRef>();
if let Some(region_follower) = region_follower {
@@ -112,8 +112,8 @@ pub async fn create_meta_client(
}
builder = builder
.channel_manager(ChannelManager::with_config(base_config))
.heartbeat_channel_manager(ChannelManager::with_config(heartbeat_config));
.channel_manager(ChannelManager::with_config(base_config, None))
.heartbeat_channel_manager(ChannelManager::with_config(heartbeat_config, None));
let mut meta_client = builder.build();

@@ -72,7 +72,10 @@ serde.workspace = true
serde_json.workspace = true
servers.workspace = true
snafu.workspace = true
sqlx = { workspace = true, optional = true }
sqlx = { workspace = true, features = [
"mysql",
"chrono",
], optional = true }
store-api.workspace = true
strum.workspace = true
table.workspace = true

@@ -375,12 +375,14 @@ pub struct MetasrvNodeInfo {
// The node total cpu millicores
#[serde(default)]
pub total_cpu_millicores: i64,
#[serde(default)]
// The node total memory bytes
#[serde(default)]
pub total_memory_bytes: i64,
/// The node build cpu usage millicores
#[serde(default)]
pub cpu_usage_millicores: i64,
/// The node build memory usage bytes
#[serde(default)]
pub memory_usage_bytes: i64,
// The node hostname
#[serde(default)]
@@ -858,3 +860,18 @@ impl Metasrv {
}
}
}
#[cfg(test)]
mod tests {
use crate::metasrv::MetasrvNodeInfo;
#[test]
fn test_deserialize_metasrv_node_info() {
let str = r#"{"addr":"127.0.0.1:4002","version":"0.1.0","git_commit":"1234567890","start_time_ms":1715145600}"#;
let node_info: MetasrvNodeInfo = serde_json::from_str(str).unwrap();
assert_eq!(node_info.addr, "127.0.0.1:4002");
assert_eq!(node_info.version, "0.1.0");
assert_eq!(node_info.git_commit, "1234567890");
assert_eq!(node_info.start_time_ms, 1715145600);
}
}

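`#[serde(default)]` makes a missing field deserialize as `Default::default()` (0 for `i64`) instead of failing, which is what lets the test above parse a `MetasrvNodeInfo` payload that omits the resource fields. The std-only sketch below applies the same rule to a plain key/value map. `NodeResources` and `from_fields` are hypothetical, and no serde is involved.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down node info holding only the resource fields.
#[derive(Debug, Default, PartialEq)]
pub struct NodeResources {
    pub total_cpu_millicores: i64,
    pub total_memory_bytes: i64,
    pub cpu_usage_millicores: i64,
    pub memory_usage_bytes: i64,
}

/// Same rule as `#[serde(default)]`: a missing key yields `i64::default()` (0)
/// rather than an error.
pub fn from_fields(fields: &HashMap<&str, i64>) -> NodeResources {
    let get = |k: &str| fields.get(k).copied().unwrap_or_default();
    NodeResources {
        total_cpu_millicores: get("total_cpu_millicores"),
        total_memory_bytes: get("total_memory_bytes"),
        cpu_usage_millicores: get("cpu_usage_millicores"),
        memory_usage_bytes: get("memory_usage_bytes"),
    }
}
```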
@@ -134,7 +134,7 @@ pub async fn mock(
.timeout(Duration::from_secs(10))
.connect_timeout(Duration::from_secs(10))
.tcp_nodelay(true);
let channel_manager = ChannelManager::with_config(config);
let channel_manager = ChannelManager::with_config(config, None);
// Move client to an option so we can _move_ the inner value
// on the first attempt to connect. All other attempts will fail.

@@ -17,7 +17,9 @@ use std::time::Duration;
use api::v1::meta::MailboxMessage;
use common_meta::ddl::utils::parse_region_wal_options;
use common_meta::instruction::{Instruction, InstructionReply, UpgradeRegion, UpgradeRegionReply};
use common_meta::instruction::{
Instruction, InstructionReply, UpgradeRegion, UpgradeRegionReply, UpgradeRegionsReply,
};
use common_meta::lock_key::RemoteWalLock;
use common_meta::wal_options_allocator::extract_topic_from_wal_options;
use common_procedure::{Context as ProcedureContext, Status};
@@ -131,19 +133,19 @@ impl UpgradeCandidateRegion {
None
};
let upgrade_instruction = Instruction::UpgradeRegion(
let upgrade_instruction = Instruction::UpgradeRegions(vec![
UpgradeRegion {
region_id,
last_entry_id,
metadata_last_entry_id,
replay_timeout: Some(replay_timeout),
replay_timeout,
location_id: Some(ctx.persistent_ctx.from_peer.id),
replay_entry_id: None,
metadata_replay_entry_id: None,
}
.with_replay_entry_id(checkpoint.map(|c| c.entry_id))
.with_metadata_replay_entry_id(checkpoint.and_then(|c| c.metadata_entry_id)),
);
]);
Ok(upgrade_instruction)
}
@@ -193,11 +195,7 @@ impl UpgradeCandidateRegion {
match receiver.await {
Ok(msg) => {
let reply = HeartbeatMailbox::json_reply(&msg)?;
let InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready,
exists,
error,
}) = reply
let InstructionReply::UpgradeRegions(UpgradeRegionsReply { replies }) = reply
else {
return error::UnexpectedInstructionReplySnafu {
mailbox_message: msg.to_string(),
@@ -205,6 +203,13 @@ impl UpgradeCandidateRegion {
}
.fail();
};
// TODO(weny): handle multiple replies.
let UpgradeRegionReply {
ready,
exists,
error,
..
} = &replies[0];
// Notes: The order of handling is important.
if error.is_some() {

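The region-migration code now sends `Instruction::UpgradeRegions(vec![...])` and expects `InstructionReply::UpgradeRegions(UpgradeRegionsReply { replies })`, reading `&replies[0]` under a TODO about multiple replies, while the test helper wraps one reply with `UpgradeRegionsReply::single`. The following sketches one possible way to read a batched reply by region id without panicking on an empty `replies` vector. The trimmed types with `u64` ids and the `reply_for` method are assumptions, not the `common_meta` definitions.

```rust
/// Hypothetical, trimmed-down versions of the reply types in the diff.
#[derive(Debug, Clone, PartialEq)]
pub struct UpgradeRegionReply {
    pub region_id: u64,
    pub ready: bool,
    pub exists: bool,
    pub error: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct UpgradeRegionsReply {
    pub replies: Vec<UpgradeRegionReply>,
}

impl UpgradeRegionsReply {
    /// Wraps one reply so single-region callers can use the batched type.
    pub fn single(reply: UpgradeRegionReply) -> Self {
        UpgradeRegionsReply {
            replies: vec![reply],
        }
    }

    /// Hypothetical lookup: an empty or mismatched batch yields an error
    /// instead of the panic that indexing with `replies[0]` would cause.
    pub fn reply_for(&self, region_id: u64) -> Result<&UpgradeRegionReply, String> {
        self.replies
            .iter()
            .find(|r| r.region_id == region_id)
            .ok_or_else(|| format!("no upgrade reply for region {}", region_id))
    }
}
```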
@@ -18,7 +18,7 @@ use api::v1::meta::mailbox_message::Payload;
use api::v1::meta::{HeartbeatResponse, MailboxMessage};
use common_meta::instruction::{
DowngradeRegionReply, DowngradeRegionsReply, FlushRegionReply, InstructionReply, SimpleReply,
UpgradeRegionReply,
UpgradeRegionReply, UpgradeRegionsReply,
};
use common_meta::key::TableMetadataManagerRef;
use common_meta::key::table_route::TableRouteValue;
@@ -212,11 +212,14 @@ pub fn new_upgrade_region_reply(
to: "meta".to_string(),
timestamp_millis: current_time_millis(),
payload: Some(Payload::Json(
serde_json::to_string(&InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready,
exists,
error,
}))
serde_json::to_string(&InstructionReply::UpgradeRegions(
UpgradeRegionsReply::single(UpgradeRegionReply {
region_id: RegionId::new(0, 0),
ready,
exists,
error,
}),
))
.unwrap(),
)),
}

@@ -254,7 +254,7 @@ mod tests {
assert_eq!(status, http::StatusCode::OK);
assert_eq!(
body,
"[[{\"timestamp_millis\":3,\"id\":0,\"addr\":\"127.0.0.1:3001\",\"rcus\":0,\"wcus\":0,\"region_num\":0,\"region_stats\":[],\"topic_stats\":[],\"node_epoch\":0,\"datanode_workloads\":{\"types\":[]}}]]"
"[[{\"timestamp_millis\":3,\"id\":0,\"addr\":\"127.0.0.1:3001\",\"rcus\":0,\"wcus\":0,\"region_num\":0,\"region_stats\":[],\"topic_stats\":[],\"node_epoch\":0,\"datanode_workloads\":{\"types\":[]},\"gc_stat\":null}]]"
);
}
}

@@ -46,6 +46,7 @@ tracing.workspace = true
common-meta = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
mito2 = { workspace = true, features = ["test"] }
common-wal = { workspace = true }
[package.metadata.cargo-udeps.ignore]
normal = ["aquamarine"]

@@ -37,7 +37,7 @@ use common_error::status_code::StatusCode;
use common_runtime::RepeatedTask;
use mito2::engine::MitoEngine;
pub(crate) use options::IndexOptions;
use snafu::ResultExt;
use snafu::{OptionExt, ResultExt};
pub(crate) use state::MetricEngineState;
use store_api::metadata::RegionMetadataRef;
use store_api::metric_engine_consts::METRIC_ENGINE_NAME;
@@ -46,7 +46,9 @@ use store_api::region_engine::{
RegionStatistic, SetRegionRoleStateResponse, SetRegionRoleStateSuccess,
SettableRegionRoleState, SyncManifestResponse,
};
use store_api::region_request::{BatchRegionDdlRequest, RegionOpenRequest, RegionRequest};
use store_api::region_request::{
BatchRegionDdlRequest, RegionCatchupRequest, RegionOpenRequest, RegionRequest,
};
use store_api::storage::{RegionId, ScanRequest, SequenceNumber};
use crate::config::EngineConfig;
@@ -142,6 +144,17 @@ impl RegionEngine for MetricEngine {
.map_err(BoxedError::new)
}
async fn handle_batch_catchup_requests(
&self,
parallelism: usize,
requests: Vec<(RegionId, RegionCatchupRequest)>,
) -> Result<BatchResponses, BoxedError> {
self.inner
.handle_batch_catchup_requests(parallelism, requests)
.await
.map_err(BoxedError::new)
}
async fn handle_batch_ddl_requests(
&self,
batch_request: BatchRegionDdlRequest,
@@ -247,7 +260,25 @@ impl RegionEngine for MetricEngine {
UnsupportedRegionRequestSnafu { request }.fail()
}
}
RegionRequest::Catchup(req) => self.inner.catchup_region(region_id, req).await,
RegionRequest::Catchup(_) => {
let mut response = self
.inner
.handle_batch_catchup_requests(
1,
vec![(region_id, RegionCatchupRequest::default())],
)
.await
.map_err(BoxedError::new)?;
debug_assert_eq!(response.len(), 1);
let (resp_region_id, response) = response
.pop()
.context(error::UnexpectedRequestSnafu {
reason: "expected 1 response, but got zero responses",
})
.map_err(BoxedError::new)?;
debug_assert_eq!(region_id, resp_region_id);
return response;
}
RegionRequest::BulkInserts(_) => {
// todo(hl): find a way to support bulk inserts in metric engine.
UnsupportedRegionRequestSnafu { request }.fail()
@@ -496,13 +527,17 @@ mod test {
use std::collections::HashMap;
use common_telemetry::info;
use common_wal::options::{KafkaWalOptions, WalOptions};
use mito2::sst::location::region_dir_from_table_dir;
use mito2::test_util::{kafka_log_store_factory, prepare_test_for_kafka_log_store};
use store_api::metric_engine_consts::PHYSICAL_TABLE_METADATA_KEY;
use store_api::mito_engine_options::WAL_OPTIONS_KEY;
use store_api::region_request::{
PathType, RegionCloseRequest, RegionFlushRequest, RegionOpenRequest, RegionRequest,
};
use super::*;
use crate::maybe_skip_kafka_log_store_integration_test;
use crate::test_util::TestEnv;
#[tokio::test]
@@ -683,4 +718,128 @@ mod test {
.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
}
#[tokio::test]
async fn test_catchup_regions() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_log_store_integration_test!();
let kafka_log_store_factory = kafka_log_store_factory().unwrap();
let mito_env = mito2::test_util::TestEnv::new()
.await
.with_log_store_factory(kafka_log_store_factory.clone());
let env = TestEnv::with_mito_env(mito_env).await;
let table_dir = |region_id| format!("table/{region_id}");
let mut physical_region_ids = vec![];
let mut logical_region_ids = vec![];
let num_topics = 3;
let num_physical_regions = 8;
let num_logical_regions = 16;
let parallelism = 2;
let mut topics = Vec::with_capacity(num_topics);
for _ in 0..num_topics {
let topic = prepare_test_for_kafka_log_store(&kafka_log_store_factory)
.await
.unwrap();
topics.push(topic);
}
let topic_idx = |id| (id as usize) % num_topics;
// Creates physical regions
for i in 0..num_physical_regions {
let physical_region_id = RegionId::new(1, i);
physical_region_ids.push(physical_region_id);
let wal_options = WalOptions::Kafka(KafkaWalOptions {
topic: topics[topic_idx(i)].clone(),
});
env.create_physical_region(
physical_region_id,
&table_dir(physical_region_id),
vec![(
WAL_OPTIONS_KEY.to_string(),
serde_json::to_string(&wal_options).unwrap(),
)],
)
.await;
// Creates logical regions for each physical region
for j in 0..num_logical_regions {
let logical_region_id = RegionId::new(1024 + i, j);
logical_region_ids.push(logical_region_id);
env.create_logical_region(physical_region_id, logical_region_id)
.await;
}
}
let metric_engine = env.metric();
// Closes all regions
for region_id in logical_region_ids.iter().chain(physical_region_ids.iter()) {
metric_engine
.handle_request(*region_id, RegionRequest::Close(RegionCloseRequest {}))
.await
.unwrap();
}
// Opens all regions and skip the wal
let requests = physical_region_ids
.iter()
.enumerate()
.map(|(idx, region_id)| {
let mut options = HashMap::new();
let wal_options = WalOptions::Kafka(KafkaWalOptions {
topic: topics[topic_idx(idx as u32)].clone(),
});
options.insert(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new());
options.insert(
WAL_OPTIONS_KEY.to_string(),
serde_json::to_string(&wal_options).unwrap(),
);
(
*region_id,
RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
table_dir: table_dir(*region_id),
path_type: PathType::Bare,
options: options.clone(),
skip_wal_replay: true,
checkpoint: None,
},
)
})
.collect::<Vec<_>>();
info!("Open batch regions with parallelism: {parallelism}");
metric_engine
.handle_batch_open_requests(parallelism, requests)
.await
.unwrap();
{
let state = metric_engine.inner.state.read().unwrap();
for logical_region in &logical_region_ids {
assert!(!state.logical_regions().contains_key(logical_region));
}
}
let catch_requests = physical_region_ids
.iter()
.map(|region_id| {
(
*region_id,
RegionCatchupRequest {
set_writable: true,
..Default::default()
},
)
})
.collect::<Vec<_>>();
metric_engine
.handle_batch_catchup_requests(parallelism, catch_requests)
.await
.unwrap();
{
let state = metric_engine.inner.state.read().unwrap();
for logical_region in &logical_region_ids {
assert!(state.logical_regions().contains_key(logical_region));
}
}
}
}

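For a single `RegionRequest::Catchup`, the metric engine now goes through `handle_batch_catchup_requests` with parallelism 1 and pops the one `(RegionId, response)` pair it expects back. The sketch below shows that single-through-batch pattern with hypothetical `u64` ids, `u32` requests and a toy `handle_batch`. Unlike the arm in the diff, which passes `RegionCatchupRequest::default()` and ignores the incoming request, this sketch forwards the caller's request unchanged.

```rust
/// Hypothetical batch API: returns one (id, result) pair per input.
/// `parallelism` is accepted but this sketch runs sequentially.
fn handle_batch(_parallelism: usize, requests: Vec<(u64, u32)>) -> Vec<(u64, Result<u32, String>)> {
    requests
        .into_iter()
        .map(|(id, req)| {
            // Toy rule: a zero request fails, anything else is doubled.
            let result = if req == 0 {
                Err(format!("region {} rejected", id))
            } else {
                Ok(req * 2)
            };
            (id, result)
        })
        .collect()
}

/// Routes one request through the batch path with parallelism 1 and unwraps
/// the sole response, mirroring the shape of the `Catchup` arm.
fn handle_single(id: u64, req: u32) -> Result<u32, String> {
    let mut responses = handle_batch(1, vec![(id, req)]);
    debug_assert_eq!(responses.len(), 1);
    let (resp_id, response) = responses
        .pop()
        .ok_or_else(|| "expected 1 response, but got zero responses".to_string())?;
    debug_assert_eq!(resp_id, id);
    response
}
```

Keeping one code path for single and batched catchup means the state recovery logic only has to be correct in one place.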
@@ -324,9 +324,9 @@ mod test {
let physical_region_id2 = RegionId::new(1024, 1);
let logical_region_id1 = RegionId::new(1025, 0);
let logical_region_id2 = RegionId::new(1025, 1);
env.create_physical_region(physical_region_id1, "/test_dir1")
env.create_physical_region(physical_region_id1, "/test_dir1", vec![])
.await;
env.create_physical_region(physical_region_id2, "/test_dir2")
env.create_physical_region(physical_region_id2, "/test_dir2", vec![])
.await;
let region_create_request1 = crate::test_util::create_logical_region_request(

@@ -12,51 +12,45 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_telemetry::debug;
use std::collections::HashMap;
use common_error::ext::BoxedError;
use snafu::{OptionExt, ResultExt};
use store_api::region_engine::RegionEngine;
use store_api::region_request::{
AffectedRows, RegionCatchupRequest, RegionRequest, ReplayCheckpoint,
};
use store_api::region_engine::{BatchResponses, RegionEngine};
use store_api::region_request::{RegionCatchupRequest, ReplayCheckpoint};
use store_api::storage::RegionId;
use crate::engine::MetricEngineInner;
use crate::error::{
MitoCatchupOperationSnafu, PhysicalRegionNotFoundSnafu, Result, UnsupportedRegionRequestSnafu,
};
use crate::error::{BatchCatchupMitoRegionSnafu, PhysicalRegionNotFoundSnafu, Result};
use crate::utils;
impl MetricEngineInner {
pub async fn catchup_region(
pub async fn handle_batch_catchup_requests(
&self,
region_id: RegionId,
req: RegionCatchupRequest,
) -> Result<AffectedRows> {
if !self.is_physical_region(region_id) {
return UnsupportedRegionRequestSnafu {
request: RegionRequest::Catchup(req),
}
.fail();
}
let data_region_id = utils::to_data_region_id(region_id);
let physical_region_options = *self
.state
.read()
.unwrap()
.physical_region_states()
.get(&data_region_id)
.context(PhysicalRegionNotFoundSnafu {
region_id: data_region_id,
})?
.options();
parallelism: usize,
requests: Vec<(RegionId, RegionCatchupRequest)>,
) -> Result<BatchResponses> {
let mut all_requests = Vec::with_capacity(requests.len() * 2);
let mut physical_region_options_list = Vec::with_capacity(requests.len());
let metadata_region_id = utils::to_metadata_region_id(region_id);
// TODO(weny): improve the catchup, we can read the wal entries only once.
debug!("Catchup metadata region {metadata_region_id}");
self.mito
.handle_request(
for (region_id, req) in requests {
let metadata_region_id = utils::to_metadata_region_id(region_id);
let data_region_id = utils::to_data_region_id(region_id);
let physical_region_options = *self
.state
.read()
.unwrap()
.physical_region_states()
.get(&data_region_id)
.context(PhysicalRegionNotFoundSnafu {
region_id: data_region_id,
})?
.options();
physical_region_options_list.push((data_region_id, physical_region_options));
all_requests.push((
metadata_region_id,
RegionRequest::Catchup(RegionCatchupRequest {
RegionCatchupRequest {
set_writable: req.set_writable,
entry_id: req.metadata_entry_id,
metadata_entry_id: None,
@@ -65,16 +59,11 @@ impl MetricEngineInner {
entry_id: c.metadata_entry_id.unwrap_or_default(),
metadata_entry_id: None,
}),
}),
)
.await
.context(MitoCatchupOperationSnafu)?;
debug!("Catchup data region {data_region_id}");
self.mito
.handle_request(
},
));
all_requests.push((
data_region_id,
RegionRequest::Catchup(RegionCatchupRequest {
RegionCatchupRequest {
set_writable: req.set_writable,
entry_id: req.entry_id,
metadata_entry_id: None,
@@ -83,14 +72,45 @@ impl MetricEngineInner {
entry_id: c.entry_id,
metadata_entry_id: None,
}),
}),
)
.await
.context(MitoCatchupOperationSnafu)
.map(|response| response.affected_rows)?;
},
));
}
self.recover_states(region_id, physical_region_options)
.await?;
Ok(0)
let mut results = self
.mito
.handle_batch_catchup_requests(parallelism, all_requests)
.await
.context(BatchCatchupMitoRegionSnafu {})?
.into_iter()
.collect::<HashMap<_, _>>();
let mut responses = Vec::with_capacity(physical_region_options_list.len());
for (physical_region_id, physical_region_options) in physical_region_options_list {
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
let data_region_id = utils::to_data_region_id(physical_region_id);
let metadata_region_result = results.remove(&metadata_region_id);
let data_region_result = results.remove(&data_region_id);
// Pass the optional `metadata_region_result` and `data_region_result` to
// `recover_physical_region_with_results`. This function handles errors for each
// catchup physical region request, allowing the process to continue with the
// remaining regions even if some requests fail.
let response = self
.recover_physical_region_with_results(
metadata_region_result,
data_region_result,
physical_region_id,
physical_region_options,
// Note: We intentionally don't close the region if recovery fails.
// Closing it here might confuse the region server since it links RegionIds to Engines.
// If recovery didn't succeed, the region should stay open.
false,
)
.await
.map_err(BoxedError::new);
responses.push((physical_region_id, response));
}
Ok(responses)
}
}

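`handle_batch_catchup_requests` fans each physical region out into a metadata-region and a data-region catchup request, submits them to mito in one batch, collects the results into a `HashMap`, and then matches both results back to their physical region so a failure affects only that region. The sketch below mirrors that flow with std types. The id helpers are assumptions inferred from the `ManifestSstEntry` output in this diff (for example `47261417473(11, 16777217)` with `region_group: 1`): id = `table_id << 32 | group << 24 | sequence`, with group 0 for data regions and group 1 for metadata regions. `batch_catchup` and its `run` callback are hypothetical.

```rust
use std::collections::HashMap;

/// Assumed layout (inferred from the manifest output): table id in the high
/// 32 bits, then an 8-bit region group and a 24-bit region sequence.
fn region_id(table_id: u32, group: u8, sequence: u32) -> u64 {
    (u64::from(table_id) << 32) | (u64::from(group) << 24) | (u64::from(sequence) & 0x00FF_FFFF)
}

/// Replaces the 8-bit group field of a region id.
fn with_group(id: u64, group: u8) -> u64 {
    (id & !(0xFFu64 << 24)) | (u64::from(group) << 24)
}

// Assumption: metadata regions use group 1, data regions group 0.
fn to_metadata_region_id(id: u64) -> u64 {
    with_group(id, 1)
}

fn to_data_region_id(id: u64) -> u64 {
    with_group(id, 0)
}

/// Fans each physical region out into (metadata, data) sub-requests, runs them
/// through `run` as one batch, then regroups the results per physical region.
fn batch_catchup<F>(physical_ids: &[u64], run: F) -> Vec<(u64, Result<(), String>)>
where
    F: FnOnce(Vec<u64>) -> Vec<(u64, Result<(), String>)>,
{
    let mut all = Vec::with_capacity(physical_ids.len() * 2);
    for &id in physical_ids {
        all.push(to_metadata_region_id(id));
        all.push(to_data_region_id(id));
    }
    let mut results: HashMap<u64, Result<(), String>> = run(all).into_iter().collect();
    physical_ids
        .iter()
        .map(|&id| {
            let meta = results.remove(&to_metadata_region_id(id));
            let data = results.remove(&to_data_region_id(id));
            // In this sketch a failed or missing sub-result fails only this region.
            let outcome = match (meta, data) {
                (Some(Ok(())), Some(Ok(()))) => Ok(()),
                (Some(Err(e)), _) | (_, Some(Err(e))) => Err(e),
                _ => Err(format!("missing catchup result for region {}", id)),
            };
            (id, outcome)
        })
        .collect()
}
```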
@@ -828,9 +828,9 @@ mod test {
let physical_region_id2 = RegionId::new(1024, 1);
let logical_region_id1 = RegionId::new(1025, 0);
let logical_region_id2 = RegionId::new(1025, 1);
env.create_physical_region(physical_region_id1, "/test_dir1")
env.create_physical_region(physical_region_id1, "/test_dir1", vec![])
.await;
env.create_physical_region(physical_region_id2, "/test_dir2")
env.create_physical_region(physical_region_id2, "/test_dir2", vec![])
.await;
let region_create_request1 =

@@ -76,7 +76,7 @@ mod tests {
];
for (phy_region_id, logi_region_ids) in &phy_to_logi {
env.create_physical_region(*phy_region_id, &TestEnv::default_table_dir())
env.create_physical_region(*phy_region_id, &TestEnv::default_table_dir(), vec![])
.await;
for logi_region_id in logi_region_ids {
env.create_logical_region(*phy_region_id, *logi_region_id)
@@ -119,6 +119,7 @@ mod tests {
.index_file_path
.map(|path| path.replace(&e.file_id, "<file_id>"));
e.file_id = "<file_id>".to_string();
e.index_file_id = e.index_file_id.map(|_| "<index_file_id>".to_string());
format!("\n{:?}", e)
})
.sorted()
@@ -127,12 +128,12 @@ mod tests {
assert_eq!(
debug_format,
r#"
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", level: 0, file_path: "test_metric_region/11_0000000001/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/11_0000000001/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(20), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", level: 0, file_path: "test_metric_region/11_0000000002/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/11_0000000002/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417473(11, 16777217), table_id: 11, region_number: 16777217, region_group: 1, region_sequence: 1, file_id: "<file_id>", level: 0, file_path: "test_metric_region/11_0000000001/metadata/<file_id>.parquet", file_size: 3505, index_file_path: None, index_file_size: None, num_rows: 8, num_row_groups: 1, num_series: Some(8), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(8), origin_region_id: 47261417473(11, 16777217), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417474(11, 16777218), table_id: 11, region_number: 16777218, region_group: 1, region_sequence: 2, file_id: "<file_id>", level: 0, file_path: "test_metric_region/11_0000000002/metadata/<file_id>.parquet", file_size: 3489, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 47261417474(11, 16777218), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", level: 0, file_path: "test_metric_region/22_0000000042/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/22_0000000042/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94506057770(22, 16777258), table_id: 22, region_number: 16777258, region_group: 1, region_sequence: 42, file_id: "<file_id>", level: 0, file_path: "test_metric_region/22_0000000042/metadata/<file_id>.parquet", file_size: 3489, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 94506057770(22, 16777258), node_id: None, visible: true }"#
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/11_0000000001/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/11_0000000001/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(20), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/11_0000000002/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/11_0000000002/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417473(11, 16777217), table_id: 11, region_number: 16777217, region_group: 1, region_sequence: 1, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/11_0000000001/metadata/<file_id>.parquet", file_size: 3505, index_file_path: None, index_file_size: None, num_rows: 8, num_row_groups: 1, num_series: Some(8), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(8), origin_region_id: 47261417473(11, 16777217), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417474(11, 16777218), table_id: 11, region_number: 16777218, region_group: 1, region_sequence: 2, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/11_0000000002/metadata/<file_id>.parquet", file_size: 3489, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 47261417474(11, 16777218), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/22_0000000042/data/<file_id>.parquet", file_size: 3173, index_file_path: Some("test_metric_region/22_0000000042/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94506057770(22, 16777258), table_id: 22, region_number: 16777258, region_group: 1, region_sequence: 42, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/22_0000000042/metadata/<file_id>.parquet", file_size: 3489, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 94506057770(22, 16777258), node_id: None, visible: true }"#
);
// list from storage
let storage_entries = mito

@@ -47,6 +47,7 @@ impl MetricEngineInner {
for (region_id, request) in requests {
if !request.is_physical_table() {
warn!("Skipping non-physical table open request: {region_id}");
continue;
}
let physical_region_options = PhysicalRegionOptions::try_from(&request.options)?;
@@ -72,17 +73,19 @@ impl MetricEngineInner {
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
let data_region_id = utils::to_data_region_id(physical_region_id);
let metadata_region_result = results.remove(&metadata_region_id);
let data_region_result = results.remove(&data_region_id);
let data_region_result: Option<std::result::Result<RegionResponse, BoxedError>> =
results.remove(&data_region_id);
// Pass the optional `metadata_region_result` and `data_region_result` to
// `open_physical_region_with_results`. This function handles errors for each
// `recover_physical_region_with_results`. This function handles errors for each
// open physical region request, allowing the process to continue with the
// remaining regions even if some requests fail.
let response = self
.open_physical_region_with_results(
.recover_physical_region_with_results(
metadata_region_result,
data_region_result,
physical_region_id,
physical_region_options,
true,
)
.await
.map_err(BoxedError::new);
@@ -107,12 +110,13 @@ impl MetricEngineInner {
}
}
async fn open_physical_region_with_results(
pub(crate) async fn recover_physical_region_with_results(
&self,
metadata_region_result: Option<std::result::Result<RegionResponse, BoxedError>>,
data_region_result: Option<std::result::Result<RegionResponse, BoxedError>>,
physical_region_id: RegionId,
physical_region_options: PhysicalRegionOptions,
close_region_on_failure: bool,
) -> Result<RegionResponse> {
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
let data_region_id = utils::to_data_region_id(physical_region_id);
@@ -136,8 +140,10 @@ impl MetricEngineInner {
.recover_states(physical_region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(physical_region_id)
.await;
if close_region_on_failure {
self.close_physical_region_on_recovery_failure(physical_region_id)
.await;
}
return Err(err);
}
Ok(data_region_response)

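The open path above now calls `recover_physical_region_with_results` with `close_region_on_failure` set to `true`, so its behaviour is unchanged. Making the close conditional lets other callers keep the physical region open after a failed state recovery. The sketch below models only that policy. `Engine`, its `recover_states` stand-in and the `u64` region ids are hypothetical and are not the metric engine's API.

```rust
use std::collections::HashSet;

/// Hypothetical engine that only tracks which physical regions are open.
struct Engine {
    open_regions: HashSet<u64>,
}

impl Engine {
    /// Stand-in for `recover_states`; `fail` forces an error.
    fn recover_states(&self, _region_id: u64, fail: bool) -> Result<(), String> {
        if fail {
            Err("failed to recover states".to_string())
        } else {
            Ok(())
        }
    }

    /// Same shape as `recover_physical_region_with_results`: the region is
    /// treated as opened, and a recovery error closes it only on request.
    fn recover_physical_region(
        &mut self,
        region_id: u64,
        fail: bool,
        close_region_on_failure: bool,
    ) -> Result<(), String> {
        self.open_regions.insert(region_id);
        if let Err(err) = self.recover_states(region_id, fail) {
            if close_region_on_failure {
                self.open_regions.remove(&region_id);
            }
            return Err(err);
        }
        Ok(())
    }
}
```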

@@ -50,6 +50,13 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to batch catchup mito region"))]
BatchCatchupMitoRegion {
source: BoxedError,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("No open region result for region {}", region_id))]
NoOpenRegionResult {
region_id: RegionId,
@@ -149,13 +156,6 @@ pub enum Error {
location: Location,
},
#[snafu(display("Mito catchup operation fails"))]
MitoCatchupOperation {
source: BoxedError,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Mito sync operation fails"))]
MitoSyncOperation {
source: BoxedError,
@@ -357,11 +357,11 @@ impl ErrorExt for Error {
| CloseMitoRegion { source, .. }
| MitoReadOperation { source, .. }
| MitoWriteOperation { source, .. }
| MitoCatchupOperation { source, .. }
| MitoFlushOperation { source, .. }
| MitoDeleteOperation { source, .. }
| MitoSyncOperation { source, .. }
| BatchOpenMitoRegion { source, .. } => source.status_code(),
| BatchOpenMitoRegion { source, .. }
| BatchCatchupMitoRegion { source, .. } => source.status_code(),
EncodePrimaryKey { source, .. } => source.status_code(),

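`BatchCatchupMitoRegion` joins the or-pattern arm in `status_code`, so it reports whatever status its boxed source carries, while the old `MitoCatchupOperation` variant is removed. A minimal sketch of that delegation pattern without snafu follows; `StatusCode`, `BoxedError` and the status chosen for `NoOpenRegionResult` are hypothetical simplifications.

```rust
/// Hypothetical status codes; the real crate's set differs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatusCode {
    Internal,
    StorageUnavailable,
}

/// Hypothetical boxed error that knows its own status code.
#[derive(Debug)]
struct BoxedError {
    code: StatusCode,
}

impl BoxedError {
    fn status_code(&self) -> StatusCode {
        self.code
    }
}

#[allow(dead_code)]
#[derive(Debug)]
enum Error {
    BatchOpenMitoRegion { source: BoxedError },
    BatchCatchupMitoRegion { source: BoxedError },
    NoOpenRegionResult { region_id: u64 },
}

impl Error {
    fn status_code(&self) -> StatusCode {
        use Error::*;
        match self {
            // One arm serves every variant that wraps a source error.
            BatchOpenMitoRegion { source } | BatchCatchupMitoRegion { source } => {
                source.status_code()
            }
            // Status chosen for illustration only.
            NoOpenRegionResult { .. } => StatusCode::Internal,
        }
    }
}
```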

@@ -76,6 +76,17 @@ impl TestEnv {
}
}
/// Returns a new env with the given `mito_env` for test.
pub async fn with_mito_env(mut mito_env: MitoTestEnv) -> Self {
let mito = mito_env.create_engine(MitoConfig::default()).await;
let metric = MetricEngine::try_new(mito.clone(), EngineConfig::default()).unwrap();
Self {
mito_env,
mito,
metric,
}
}
pub fn data_home(&self) -> String {
let env_root = self.mito_env.data_home().to_string_lossy().to_string();
join_dir(&env_root, "data")
@@ -125,7 +136,12 @@ impl TestEnv {
}
/// Create regions in [MetricEngine] with specific `physical_region_id`.
pub async fn create_physical_region(&self, physical_region_id: RegionId, table_dir: &str) {
pub async fn create_physical_region(
&self,
physical_region_id: RegionId,
table_dir: &str,
options: Vec<(String, String)>,
) {
let region_create_request = RegionCreateRequest {
engine: METRIC_ENGINE_NAME.to_string(),
column_metadatas: vec![
@@ -151,6 +167,7 @@ impl TestEnv {
primary_key: vec![],
options: [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.chain(options.into_iter())
.collect(),
table_dir: table_dir.to_string(),
path_type: PathType::Bare, // Use Bare path type for engine regions
@@ -231,7 +248,7 @@ impl TestEnv {
/// under [`default_logical_region_id`].
pub async fn init_metric_region(&self) {
let physical_region_id = self.default_physical_region_id();
self.create_physical_region(physical_region_id, &Self::default_table_dir())
self.create_physical_region(physical_region_id, &Self::default_table_dir(), vec![])
.await;
let logical_region_id = self.default_logical_region_id();
self.create_logical_region(physical_region_id, logical_region_id)
@@ -424,6 +441,22 @@ pub fn build_rows(num_tags: usize, num_rows: usize) -> Vec<Row> {
rows
}
#[macro_export]
/// Skip the test if the environment variable `GT_KAFKA_ENDPOINTS` is not set.
///
/// The format of the environment variable is:
/// ```text
/// GT_KAFKA_ENDPOINTS=localhost:9092,localhost:9093
/// ```
macro_rules! maybe_skip_kafka_log_store_integration_test {
() => {
if std::env::var("GT_KAFKA_ENDPOINTS").is_err() {
common_telemetry::warn!("The Kafka endpoints are not set, skipping the test");
return;
}
};
}
#[cfg(test)]
mod test {
use object_store::ObjectStore;

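`maybe_skip_kafka_log_store_integration_test!` expands to an early `return` in the calling test when `GT_KAFKA_ENDPOINTS` is unset. The sketch below reproduces that shape under a hypothetical macro name and pairs it with a hypothetical `parse_endpoints` helper for the comma-separated format documented on the macro. How the real tests split the variable is not shown in this diff.

```rust
/// Hypothetical analogue of `maybe_skip_kafka_log_store_integration_test!`:
/// returns early from the enclosing `()`-returning function when `$var` is unset.
macro_rules! maybe_skip_if_env_unset {
    ($var:expr) => {
        if std::env::var($var).is_err() {
            eprintln!("{} is not set, skipping the test", $var);
            return;
        }
    };
}

/// Hypothetical helper: splits `localhost:9092,localhost:9093` into endpoints,
/// trimming whitespace and ignoring empty entries.
fn parse_endpoints(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Example test body guarded by the macro; `ran` records whether it got past the guard.
fn guarded_test(ran: &mut bool) {
    maybe_skip_if_env_unset!("GT_SKETCH_SURELY_UNSET_VARIABLE");
    *ran = true;
}
```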

@@ -477,6 +477,8 @@ fn flat_merge_iterator_bench(c: &mut Criterion) {
bulk_part.batch.clone(),
context.clone(),
None, // No sequence filter
1024, // 1024 hosts per part
None, // No mem_scan_metrics
);
iters.push(Box::new(iter) as _);
}
@@ -534,8 +536,13 @@ fn bulk_part_record_batch_iter_filter(c: &mut Criterion) {
);
// Create and iterate over BulkPartRecordBatchIter with filter
let iter =
BulkPartRecordBatchIter::new(record_batch_with_filter.clone(), context, None);
let iter = BulkPartRecordBatchIter::new(
record_batch_with_filter.clone(),
context,
None, // No sequence filter
4096, // 4096 hosts
None, // No mem_scan_metrics
);
// Consume all batches
for batch_result in iter {
@@ -559,7 +566,13 @@ fn bulk_part_record_batch_iter_filter(c: &mut Criterion) {
);
// Create and iterate over BulkPartRecordBatchIter
let iter = BulkPartRecordBatchIter::new(record_batch_no_filter.clone(), context, None);
let iter = BulkPartRecordBatchIter::new(
record_batch_no_filter.clone(),
context,
None, // No sequence filter
4096, // 4096 hosts
None, // No mem_scan_metrics
);
// Consume all batches
for batch_result in iter {

@@ -20,12 +20,11 @@ use criterion::{Criterion, black_box, criterion_group, criterion_main};
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use mito2::memtable::simple_bulk_memtable::SimpleBulkMemtable;
use mito2::memtable::{KeyValues, Memtable, MemtableRanges};
use mito2::memtable::{KeyValues, Memtable, MemtableRanges, RangesOptions};
use mito2::read;
use mito2::read::Source;
use mito2::read::dedup::DedupReader;
use mito2::read::merge::MergeReaderBuilder;
use mito2::read::scan_region::PredicateGroup;
use mito2::region::options::MergeMode;
use mito2::test_util::column_metadata_to_column_schema;
use store_api::metadata::{ColumnMetadata, RegionMetadataBuilder};
@@ -126,9 +125,7 @@ fn create_memtable_with_rows(num_batches: usize) -> SimpleBulkMemtable {
}
async fn flush(mem: &SimpleBulkMemtable) {
let MemtableRanges { ranges, .. } = mem
.ranges(None, PredicateGroup::default(), None, true)
.unwrap();
let MemtableRanges { ranges, .. } = mem.ranges(None, RangesOptions::for_flush()).unwrap();
let mut source = if ranges.len() == 1 {
let only_range = ranges.into_values().next().unwrap();

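The flush helper now calls `ranges(None, RangesOptions::for_flush())` instead of passing a default `PredicateGroup`, a `None` and a bare `true` positionally. The sketch below illustrates that named-constructor pattern with a hypothetical `RangesOptionsSketch`; its fields are illustrative and are not taken from mito2's `RangesOptions`.

```rust
/// Hypothetical options bundle standing in for `RangesOptions`; field names are illustrative.
#[derive(Debug, Clone, Default, PartialEq)]
struct RangesOptionsSketch {
    /// Optional filter expression (stand-in for a predicate group).
    predicate: Option<String>,
    /// Optional upper bound on visible sequences.
    sequence: Option<u64>,
    /// Whether the ranges are requested by a flush.
    for_flush: bool,
}

impl RangesOptionsSketch {
    /// Named constructor: the call site reads `for_flush()` instead of a bare `true`.
    fn for_flush() -> Self {
        Self {
            for_flush: true,
            ..Default::default()
        }
    }

    /// Builder-style setter for the optional predicate.
    fn with_predicate(mut self, predicate: impl Into<String>) -> Self {
        self.predicate = Some(predicate.into());
        self
    }
}
```

Adding a field later only touches the struct and its constructors, not every positional call site.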

@@ -72,7 +72,7 @@ pub struct Metrics {
}
impl Metrics {
pub(crate) fn new(write_type: WriteType) -> Self {
pub fn new(write_type: WriteType) -> Self {
Self {
write_type,
iter_source: Default::default(),
@@ -213,7 +213,11 @@ impl AccessLayer {
}
/// Deletes an SST file (and its index file if it has one) with the given SST and index file ids.
pub(crate) async fn delete_sst(&self, region_file_id: &RegionFileId) -> Result<()> {
pub(crate) async fn delete_sst(
&self,
region_file_id: &RegionFileId,
index_file_id: &RegionFileId,
) -> Result<()> {
let path = location::sst_file_path(&self.table_dir, *region_file_id, self.path_type);
self.object_store
.delete(&path)
@@ -222,7 +226,7 @@ impl AccessLayer {
file_id: region_file_id.file_id(),
})?;
let path = location::index_file_path(&self.table_dir, *region_file_id, self.path_type);
let path = location::index_file_path(&self.table_dir, *index_file_id, self.path_type);
self.object_store
.delete(&path)
.await
@@ -255,12 +259,12 @@ impl AccessLayer {
&self,
request: SstWriteRequest,
write_opts: &WriteOptions,
write_type: WriteType,
) -> Result<(SstInfoArray, Metrics)> {
metrics: &mut Metrics,
) -> Result<SstInfoArray> {
let region_id = request.metadata.region_id;
let cache_manager = request.cache_manager.clone();
let (sst_info, metrics) = if let Some(write_cache) = cache_manager.write_cache() {
let sst_info = if let Some(write_cache) = cache_manager.write_cache() {
// Write to the write cache.
write_cache
.write_and_upload_sst(
@@ -273,7 +277,7 @@ impl AccessLayer {
remote_store: self.object_store.clone(),
},
write_opts,
write_type,
metrics,
)
.await?
} else {
@@ -303,11 +307,11 @@ impl AccessLayer {
request.index_config,
indexer_builder,
path_provider,
Metrics::new(write_type),
metrics,
)
.await
.with_file_cleaner(cleaner);
let ssts = match request.source {
match request.source {
Either::Left(source) => {
writer
.write_all(source, request.max_sequence, write_opts)
@@ -316,9 +320,7 @@ impl AccessLayer {
Either::Right(flat_source) => {
writer.write_all_flat(flat_source, write_opts).await?
}
};
let metrics = writer.into_metrics();
(ssts, metrics)
}
};
// Put parquet metadata to cache manager.
@@ -333,7 +335,7 @@ impl AccessLayer {
}
}
Ok((sst_info, metrics))
Ok(sst_info)
}
/// Puts encoded SST bytes to the write cache (if enabled) and uploads it to the object store.

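`delete_sst` now takes the index file id separately instead of deriving the index path from the SST's own id, which lines up with the new `index_file_id` field on `FileMeta`. A minimal sketch follows, assuming the path layout visible in the `ManifestSstEntry` listing at the start of this diff (`<region dir>/<file_id>.parquet` and `<region dir>/index/<id>.puffin`), and assuming a fallback to the SST id when no separate index id is recorded. `FileMetaSketch` and both path helpers are hypothetical.

```rust
/// Hypothetical, trimmed-down file metadata.
struct FileMetaSketch {
    file_id: String,
    /// Set when the index was written under its own id.
    index_file_id: Option<String>,
}

impl FileMetaSketch {
    /// Id used for the index file; falls back to the SST id (an assumption).
    fn index_id(&self) -> &str {
        self.index_file_id.as_deref().unwrap_or(self.file_id.as_str())
    }
}

fn sst_file_path(region_dir: &str, file_id: &str) -> String {
    format!("{region_dir}/{file_id}.parquet")
}

fn index_file_path(region_dir: &str, index_id: &str) -> String {
    format!("{region_dir}/index/{index_id}.puffin")
}

/// Returns the (SST, index) paths a delete would target.
fn paths_to_delete(region_dir: &str, meta: &FileMetaSketch) -> (String, String) {
    (
        sst_file_path(region_dir, &meta.file_id),
        index_file_path(region_dir, meta.index_id()),
    )
}
```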

@@ -15,7 +15,7 @@
use std::ops::Range;
use std::sync::Arc;
use api::v1::index::BloomFilterMeta;
use api::v1::index::{BloomFilterLoc, BloomFilterMeta};
use async_trait::async_trait;
use bytes::Bytes;
use index::bloom_filter::error::Result;
@@ -60,11 +60,17 @@ impl BloomFilterIndexCache {
/// Calculates weight for bloom filter index metadata.
fn bloom_filter_index_metadata_weight(
k: &(FileId, ColumnId, Tag),
_: &Arc<BloomFilterMeta>,
meta: &Arc<BloomFilterMeta>,
) -> u32 {
(k.0.as_bytes().len()
let base = k.0.as_bytes().len()
+ std::mem::size_of::<ColumnId>()
+ std::mem::size_of::<BloomFilterMeta>()) as u32
+ std::mem::size_of::<Tag>()
+ std::mem::size_of::<BloomFilterMeta>();
let vec_estimated = meta.segment_loc_indices.len() * std::mem::size_of::<u64>()
+ meta.bloom_filter_locs.len() * std::mem::size_of::<BloomFilterLoc>();
(base + vec_estimated) as u32
}
/// Calculates weight for bloom filter index content.
@@ -171,6 +177,45 @@ mod test {
const FUZZ_REPEAT_TIMES: usize = 100;
#[test]
fn bloom_filter_metadata_weight_counts_vec_contents() {
let file_id = FileId::parse_str("00000000-0000-0000-0000-000000000001").unwrap();
let column_id: ColumnId = 42;
let tag = Tag::Skipping;
let meta = BloomFilterMeta {
rows_per_segment: 128,
segment_count: 2,
row_count: 256,
bloom_filter_size: 1024,
segment_loc_indices: vec![0, 64, 128, 192],
bloom_filter_locs: vec![
BloomFilterLoc {
offset: 0,
size: 512,
element_count: 1000,
},
BloomFilterLoc {
offset: 512,
size: 512,
element_count: 1000,
},
],
};
let weight =
bloom_filter_index_metadata_weight(&(file_id, column_id, tag), &Arc::new(meta.clone()));
let base = file_id.as_bytes().len()
+ std::mem::size_of::<ColumnId>()
+ std::mem::size_of::<Tag>()
+ std::mem::size_of::<BloomFilterMeta>();
let expected_dynamic = meta.segment_loc_indices.len() * std::mem::size_of::<u64>()
+ meta.bloom_filter_locs.len() * std::mem::size_of::<BloomFilterLoc>();
assert_eq!(weight as usize, base + expected_dynamic);
}
#[test]
fn fuzz_index_calculation() {
let mut rng = rand::rng();

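The weight change matters because `size_of::<BloomFilterMeta>()` only measures the struct itself, where each `Vec` field contributes a fixed-size header (pointer, capacity, length) rather than its heap buffer. The new code also adds `size_of::<Tag>()` and, for each vector, `len()` times the element size, so metadata with many segments is no longer under-weighted in the cache. The self-contained sketch below uses hypothetical mirror types and an abstract key size. Like the diff, it counts `len()`, so spare `Vec` capacity is still not included.

```rust
use std::mem::size_of;

/// Hypothetical mirror of `BloomFilterLoc`; only its size matters here.
#[allow(dead_code)]
struct BloomFilterLoc {
    offset: u64,
    size: u64,
    element_count: u64,
}

/// Hypothetical mirror of `BloomFilterMeta` with its two heap-backed fields.
#[allow(dead_code)]
struct BloomFilterMeta {
    rows_per_segment: u64,
    segment_loc_indices: Vec<u64>,
    bloom_filter_locs: Vec<BloomFilterLoc>,
}

/// Weight = key bytes + inline struct size + heap contents of both vectors.
fn metadata_weight(key_size: usize, meta: &BloomFilterMeta) -> u32 {
    let base = key_size + size_of::<BloomFilterMeta>();
    let heap = meta.segment_loc_indices.len() * size_of::<u64>()
        + meta.bloom_filter_locs.len() * size_of::<BloomFilterLoc>();
    (base + heap) as u32
}
```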

@@ -169,8 +169,8 @@ impl WriteCache {
write_request: SstWriteRequest,
upload_request: SstUploadRequest,
write_opts: &WriteOptions,
write_type: WriteType,
) -> Result<(SstInfoArray, Metrics)> {
metrics: &mut Metrics,
) -> Result<SstInfoArray> {
let region_id = write_request.metadata.region_id;
let store = self.file_cache.local_store();
@@ -197,7 +197,7 @@ impl WriteCache {
write_request.index_config,
indexer,
path_provider.clone(),
Metrics::new(write_type),
metrics,
)
.await
.with_file_cleaner(cleaner);
@@ -210,11 +210,10 @@ impl WriteCache {
}
either::Right(flat_source) => writer.write_all_flat(flat_source, write_opts).await?,
};
let mut metrics = writer.into_metrics();
// Upload sst file to remote object store.
if sst_info.is_empty() {
return Ok((sst_info, metrics));
return Ok(sst_info);
}
let mut upload_tracker = UploadTracker::new(region_id);
@@ -256,7 +255,7 @@ impl WriteCache {
return Err(err);
}
Ok((sst_info, metrics))
Ok(sst_info)
}
/// Removes a file from the cache by `index_key`.
@@ -559,8 +558,9 @@ mod tests {
};
// Write to cache and upload sst to mock remote store
let (mut sst_infos, _) = write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts, WriteType::Flush)
let mut metrics = Metrics::new(WriteType::Flush);
let mut sst_infos = write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts, &mut metrics)
.await
.unwrap();
let sst_info = sst_infos.remove(0);
@@ -655,8 +655,9 @@ mod tests {
remote_store: mock_store.clone(),
};
let (mut sst_infos, _) = write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts, WriteType::Flush)
let mut metrics = Metrics::new(WriteType::Flush);
let mut sst_infos = write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts, &mut metrics)
.await
.unwrap();
let sst_info = sst_infos.remove(0);
@@ -735,8 +736,9 @@ mod tests {
remote_store: mock_store.clone(),
};
let mut metrics = Metrics::new(WriteType::Flush);
write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts, WriteType::Flush)
.write_and_upload_sst(write_request, upload_request, &write_opts, &mut metrics)
.await
.unwrap_err();
let atomic_write_dir = write_cache_dir.path().join(ATOMIC_WRITE_DIR);

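Across `AccessLayer::write_sst` and `WriteCache::write_and_upload_sst`, `Metrics` is no longer built from a `WriteType` inside the callee and returned in a tuple. Instead the caller creates it (`Metrics::new(WriteType::Flush)` in the tests above) and lends it as `&mut Metrics`. The sketch below shows that shape with a hypothetical `Metrics` and `write_sst`. One visible consequence is that counters recorded before an error remain available to the caller, whereas a `Result<(T, Metrics)>` return discards them on the error path.

```rust
/// Hypothetical write metrics owned by the caller.
#[derive(Debug, Default)]
struct Metrics {
    bytes_written: usize,
    files_written: usize,
}

/// Writes each chunk as one "file" and records progress into `metrics`.
/// Fails on an empty chunk to show that earlier counts survive the error.
fn write_sst(chunks: &[&[u8]], metrics: &mut Metrics) -> Result<Vec<usize>, String> {
    let mut sizes = Vec::with_capacity(chunks.len());
    for chunk in chunks {
        if chunk.is_empty() {
            return Err("refusing to write an empty file".to_string());
        }
        metrics.bytes_written += chunk.len();
        metrics.files_written += 1;
        sizes.push(chunk.len());
    }
    Ok(sizes)
}
```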

@@ -305,6 +305,7 @@ impl CompactionScheduler {
&options,
&request.current_version.options.compaction,
request.current_version.options.append_mode,
Some(self.engine_config.max_background_compactions),
);
let region_id = request.region_id();
let CompactionRequest {

@@ -30,10 +30,12 @@ use store_api::metadata::RegionMetadataRef;
use store_api::region_request::PathType;
use store_api::storage::RegionId;
use crate::access_layer::{AccessLayer, AccessLayerRef, OperationType, SstWriteRequest, WriteType};
use crate::access_layer::{
AccessLayer, AccessLayerRef, Metrics, OperationType, SstWriteRequest, WriteType,
};
use crate::cache::{CacheManager, CacheManagerRef};
use crate::compaction::picker::{PickerOutput, new_picker};
use crate::compaction::{CompactionSstReaderBuilder, find_ttl};
use crate::compaction::{CompactionOutput, CompactionSstReaderBuilder, find_ttl};
use crate::config::MitoConfig;
use crate::error::{
EmptyRegionDirSnafu, InvalidPartitionExprSnafu, JoinSnafu, ObjectStoreNotFoundSnafu, Result,
@@ -311,6 +313,126 @@ pub trait Compactor: Send + Sync + 'static {
/// DefaultCompactor is the default implementation of Compactor.
pub struct DefaultCompactor;
impl DefaultCompactor {
/// Merge a single compaction output into SST files.
async fn merge_single_output(
compaction_region: CompactionRegion,
output: CompactionOutput,
write_opts: WriteOptions,
) -> Result<Vec<FileMeta>> {
let region_id = compaction_region.region_id;
let storage = compaction_region.region_options.storage.clone();
let index_options = compaction_region
.current_version
.options
.index_options
.clone();
let append_mode = compaction_region.current_version.options.append_mode;
let merge_mode = compaction_region.current_version.options.merge_mode();
let flat_format = compaction_region
.region_options
.sst_format
.map(|format| format == FormatType::Flat)
.unwrap_or(
compaction_region
.engine_config
.default_experimental_flat_format,
);
let index_config = compaction_region.engine_config.index.clone();
let inverted_index_config = compaction_region.engine_config.inverted_index.clone();
let fulltext_index_config = compaction_region.engine_config.fulltext_index.clone();
let bloom_filter_index_config = compaction_region.engine_config.bloom_filter_index.clone();
let input_file_names = output
.inputs
.iter()
.map(|f| f.file_id().to_string())
.join(",");
let max_sequence = output
.inputs
.iter()
.map(|f| f.meta_ref().sequence)
.max()
.flatten();
let builder = CompactionSstReaderBuilder {
metadata: compaction_region.region_metadata.clone(),
sst_layer: compaction_region.access_layer.clone(),
cache: compaction_region.cache_manager.clone(),
inputs: &output.inputs,
append_mode,
filter_deleted: output.filter_deleted,
time_range: output.output_time_range,
merge_mode,
};
let source = if flat_format {
let reader = builder.build_flat_sst_reader().await?;
Either::Right(FlatSource::Stream(reader))
} else {
let reader = builder.build_sst_reader().await?;
Either::Left(Source::Reader(reader))
};
let mut metrics = Metrics::new(WriteType::Compaction);
let region_metadata = compaction_region.region_metadata.clone();
let sst_infos = compaction_region
.access_layer
.write_sst(
SstWriteRequest {
op_type: OperationType::Compact,
metadata: region_metadata.clone(),
source,
cache_manager: compaction_region.cache_manager.clone(),
storage,
max_sequence: max_sequence.map(NonZero::get),
index_options,
index_config,
inverted_index_config,
fulltext_index_config,
bloom_filter_index_config,
},
&write_opts,
&mut metrics,
)
.await?;
// Convert partition expression once outside the map
let partition_expr = match &region_metadata.partition_expr {
None => None,
Some(json_str) if json_str.is_empty() => None,
Some(json_str) => PartitionExpr::from_json_str(json_str).with_context(|_| {
InvalidPartitionExprSnafu {
expr: json_str.clone(),
}
})?,
};
let output_files = sst_infos
.into_iter()
.map(|sst_info| FileMeta {
region_id,
file_id: sst_info.file_id,
time_range: sst_info.time_range,
level: output.output_level,
file_size: sst_info.file_size,
available_indexes: sst_info.index_metadata.build_available_indexes(),
index_file_size: sst_info.index_metadata.file_size,
index_file_id: None,
num_rows: sst_info.num_rows as u64,
num_row_groups: sst_info.num_row_groups,
sequence: max_sequence,
partition_expr: partition_expr.clone(),
num_series: sst_info.num_series,
})
.collect::<Vec<_>>();
let output_file_names = output_files.iter().map(|f| f.file_id.to_string()).join(",");
info!(
"Region {} compaction inputs: [{}], outputs: [{}], flat_format: {}, metrics: {:?}",
region_id, input_file_names, output_file_names, flat_format, metrics
);
metrics.observe();
Ok(output_files)
}
}
#[async_trait::async_trait]
impl Compactor for DefaultCompactor {
async fn merge_ssts(
@@ -322,129 +444,22 @@ impl Compactor for DefaultCompactor {
let mut compacted_inputs =
Vec::with_capacity(picker_output.outputs.iter().map(|o| o.inputs.len()).sum());
let internal_parallelism = compaction_region.max_parallelism.max(1);
let compaction_time_window = picker_output.time_window_size;
for output in picker_output.outputs.drain(..) {
compacted_inputs.extend(output.inputs.iter().map(|f| f.meta_ref().clone()));
let inputs_to_remove: Vec<_> =
output.inputs.iter().map(|f| f.meta_ref().clone()).collect();
compacted_inputs.extend(inputs_to_remove.iter().cloned());
let write_opts = WriteOptions {
write_buffer_size: compaction_region.engine_config.sst_write_buffer_size,
max_file_size: picker_output.max_file_size,
..Default::default()
};
let region_metadata = compaction_region.region_metadata.clone();
let sst_layer = compaction_region.access_layer.clone();
let region_id = compaction_region.region_id;
let cache_manager = compaction_region.cache_manager.clone();
let storage = compaction_region.region_options.storage.clone();
let index_options = compaction_region
.current_version
.options
.index_options
.clone();
let append_mode = compaction_region.current_version.options.append_mode;
let merge_mode = compaction_region.current_version.options.merge_mode();
let flat_format = compaction_region
.region_options
.sst_format
.map(|format| format == FormatType::Flat)
.unwrap_or(
compaction_region
.engine_config
.default_experimental_flat_format,
);
let index_config = compaction_region.engine_config.index.clone();
let inverted_index_config = compaction_region.engine_config.inverted_index.clone();
let fulltext_index_config = compaction_region.engine_config.fulltext_index.clone();
let bloom_filter_index_config =
compaction_region.engine_config.bloom_filter_index.clone();
let max_sequence = output
.inputs
.iter()
.map(|f| f.meta_ref().sequence)
.max()
.flatten();
let region_metadata_for_filemeta = region_metadata.clone();
futs.push(async move {
let input_file_names = output
.inputs
.iter()
.map(|f| f.file_id().to_string())
.join(",");
let builder = CompactionSstReaderBuilder {
metadata: region_metadata.clone(),
sst_layer: sst_layer.clone(),
cache: cache_manager.clone(),
inputs: &output.inputs,
append_mode,
filter_deleted: output.filter_deleted,
time_range: output.output_time_range,
merge_mode,
};
let source = if flat_format {
let reader = builder.build_flat_sst_reader().await?;
either::Right(FlatSource::Stream(reader))
} else {
let reader = builder.build_sst_reader().await?;
either::Left(Source::Reader(reader))
};
let (sst_infos, metrics) = sst_layer
.write_sst(
SstWriteRequest {
op_type: OperationType::Compact,
metadata: region_metadata,
source,
cache_manager,
storage,
max_sequence: max_sequence.map(NonZero::get),
index_options,
index_config,
inverted_index_config,
fulltext_index_config,
bloom_filter_index_config,
},
&write_opts,
WriteType::Compaction,
)
.await?;
// Convert partition expression once outside the map
let partition_expr = match &region_metadata_for_filemeta.partition_expr {
None => None,
Some(json_str) if json_str.is_empty() => None,
Some(json_str) => {
PartitionExpr::from_json_str(json_str).with_context(|_| {
InvalidPartitionExprSnafu {
expr: json_str.clone(),
}
})?
}
};
let output_files = sst_infos
.into_iter()
.map(|sst_info| FileMeta {
region_id,
file_id: sst_info.file_id,
time_range: sst_info.time_range,
level: output.output_level,
file_size: sst_info.file_size,
available_indexes: sst_info.index_metadata.build_available_indexes(),
index_file_size: sst_info.index_metadata.file_size,
num_rows: sst_info.num_rows as u64,
num_row_groups: sst_info.num_row_groups,
sequence: max_sequence,
partition_expr: partition_expr.clone(),
num_series: sst_info.num_series,
})
.collect::<Vec<_>>();
let output_file_names =
output_files.iter().map(|f| f.file_id.to_string()).join(",");
info!(
"Region {} compaction inputs: [{}], outputs: [{}], flat_format: {}, metrics: {:?}",
region_id, input_file_names, output_file_names, flat_format, metrics
);
metrics.observe();
Ok(output_files)
});
futs.push(Self::merge_single_output(
compaction_region.clone(),
output,
write_opts,
));
}
let mut output_files = Vec::with_capacity(futs.len());
while !futs.is_empty() {
@@ -462,6 +477,8 @@ impl Compactor for DefaultCompactor {
output_files.extend(metas.into_iter().flatten());
}
// In case of remote compaction, we still allow the region edit after merge to
// clean expired ssts.
let mut inputs: Vec<_> = compacted_inputs.into_iter().collect();
inputs.extend(
picker_output
@@ -473,7 +490,7 @@ impl Compactor for DefaultCompactor {
Ok(MergeOutput {
files_to_add: output_files,
files_to_remove: inputs,
compaction_time_window: Some(picker_output.time_window_size),
compaction_time_window: Some(compaction_time_window),
})
}
@@ -518,6 +535,7 @@ impl Compactor for DefaultCompactor {
&compact_request_options,
&compaction_region.region_options.compaction,
compaction_region.region_options.append_mode,
None,
)
.pick(compaction_region);

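The compactor refactor moves the per-output async block into `DefaultCompactor::merge_single_output`, which takes an owned `CompactionRegion` clone, the `CompactionOutput` and `WriteOptions`. Each job therefore carries everything it needs and owns its own `Metrics`. How `merge_ssts` drains `futs` is elided from this diff, so the thread-based sketch below simply assumes jobs run in batches of at most `parallelism` and are joined in order; that policy is an assumption, not the real scheduling loop. `Output`, `merge_single_output` and `merge_all` here are hypothetical simplifications.

```rust
use std::thread;

/// Hypothetical compaction output: just the sizes of its input files.
#[derive(Clone)]
struct Output {
    inputs: Vec<u64>,
}

/// Analogue of `merge_single_output`: takes its argument by value so the job
/// can move to another thread; "merging" here just sums the input sizes.
fn merge_single_output(output: Output) -> Result<u64, String> {
    if output.inputs.is_empty() {
        return Err("output has no inputs".to_string());
    }
    Ok(output.inputs.iter().sum())
}

/// Runs merge jobs in batches of at most `parallelism` threads and collects
/// results in input order (an assumed policy).
fn merge_all(outputs: Vec<Output>, parallelism: usize) -> Result<Vec<u64>, String> {
    let parallelism = parallelism.max(1);
    let mut results = Vec::with_capacity(outputs.len());
    let mut pending = outputs.into_iter().peekable();
    while pending.peek().is_some() {
        // Spawn one whole batch before joining any of it.
        let handles: Vec<_> = pending
            .by_ref()
            .take(parallelism)
            .map(|output| thread::spawn(move || merge_single_output(output)))
            .collect();
        for handle in handles {
            let merged = handle
                .join()
                .map_err(|_| "merge thread panicked".to_string())??;
            results.push(merged);
        }
    }
    Ok(results)
}
```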

@@ -125,6 +125,7 @@ pub fn new_picker(
compact_request_options: &compact_request::Options,
compaction_options: &CompactionOptions,
append_mode: bool,
max_background_tasks: Option<usize>,
) -> Arc<dyn Picker> {
if let compact_request::Options::StrictWindow(window) = compact_request_options {
let window = if window.window_seconds == 0 {
@@ -140,6 +141,7 @@ pub fn new_picker(
time_window_seconds: twcs_opts.time_window_seconds(),
max_output_file_size: twcs_opts.max_output_file_size.map(|r| r.as_bytes()),
append_mode,
max_background_tasks,
}) as Arc<_>,
}
}

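`new_picker` now forwards `max_background_tasks` into `TwcsPicker`: the local `CompactionScheduler` passes `Some(max_background_compactions)`, while the call inside `DefaultCompactor` passes `None`. In `TwcsPicker::build_output`, picking stops once that many outputs exist, and a window with zero runs now `continue`s instead of returning, since the early `return output` dropped every later window. The sketch reduces a window to a hypothetical `Window { start, found_runs }` and ignores run selection and file sizing; it uses a nested `if let` where the real code uses a let-chain.

```rust
/// Hypothetical, reduced time window: its start and how many sorted runs it holds.
struct Window {
    start: i64,
    found_runs: usize,
}

/// Loop shape of the patched `build_output`: windows with no runs are skipped,
/// and picking stops once `max_background_tasks` outputs have been produced.
fn build_output(windows: &[Window], max_background_tasks: Option<usize>) -> Vec<i64> {
    let mut output = Vec::new();
    for window in windows {
        if window.found_runs == 0 {
            // Previously `return output`, which abandoned all later windows.
            continue;
        }
        output.push(window.start);
        if let Some(max) = max_background_tasks {
            if output.len() >= max {
                break;
            }
        }
    }
    output
}
```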

@@ -16,19 +16,22 @@ use std::fmt::{Debug, Formatter};
use std::sync::Arc;
use std::time::Instant;
use common_telemetry::{error, info};
use common_telemetry::{error, info, warn};
use itertools::Itertools;
use snafu::ResultExt;
use tokio::sync::mpsc;
use crate::compaction::compactor::{CompactionRegion, Compactor};
use crate::compaction::picker::{CompactionTask, PickerOutput};
use crate::error::CompactRegionSnafu;
use crate::manifest::action::RegionEdit;
use crate::manifest::action::{RegionEdit, RegionMetaAction, RegionMetaActionList};
use crate::metrics::{COMPACTION_FAILURE_COUNT, COMPACTION_STAGE_ELAPSED};
use crate::region::RegionLeaderState;
use crate::request::{
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, WorkerRequest,
WorkerRequestWithTime,
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, RegionEditResult,
WorkerRequest, WorkerRequestWithTime,
};
use crate::sst::file::FileMeta;
use crate::worker::WorkerListener;
use crate::{error, metrics};
@@ -78,9 +81,93 @@ impl CompactionTaskImpl {
.for_each(|o| o.inputs.iter().for_each(|f| f.set_compacting(compacting)));
}
async fn handle_compaction(&mut self) -> error::Result<RegionEdit> {
/// Removes expired SST files, updates the manifest immediately
/// and applies the edit to the region version.
///
/// This function logs errors but does not stop the compaction process if removal fails.
async fn remove_expired(
&self,
compaction_region: &CompactionRegion,
expired_files: Vec<FileMeta>,
) {
let region_id = compaction_region.region_id;
let expired_files_str = expired_files.iter().map(|f| f.file_id).join(",");
let (expire_delete_sender, expire_delete_listener) = tokio::sync::oneshot::channel();
// Update manifest to remove expired SSTs
let edit = RegionEdit {
files_to_add: Vec::new(),
files_to_remove: expired_files,
timestamp_ms: Some(chrono::Utc::now().timestamp_millis()),
compaction_time_window: None,
flushed_entry_id: None,
flushed_sequence: None,
committed_sequence: None,
};
// 1. Update manifest
let action_list = RegionMetaActionList::with_action(RegionMetaAction::Edit(edit.clone()));
if let Err(e) = compaction_region
.manifest_ctx
.update_manifest(RegionLeaderState::Writable, action_list)
.await
{
error!(
e;
"Failed to update manifest for expired files removal, region: {region_id}, files: [{expired_files_str}]. Compaction will continue."
);
return;
}
// 2. Notify region worker loop to remove expired files from region version.
self.send_to_worker(WorkerRequest::Background {
region_id,
notify: BackgroundNotify::RegionEdit(RegionEditResult {
region_id,
sender: expire_delete_sender,
edit,
result: Ok(()),
}),
})
.await;
if let Err(e) = expire_delete_listener
.await
.context(error::RecvSnafu)
.flatten()
{
warn!(
e;
"Failed to remove expired files from region version, region: {region_id}, files: [{expired_files_str}]. Compaction will continue."
);
return;
}
info!(
"Successfully removed expired files, region: {region_id}, files: [{expired_files_str}]"
);
}
async fn handle_expiration_and_compaction(&mut self) -> error::Result<RegionEdit> {
self.mark_files_compacting(true);
// 1. In case of local compaction, we can delete expired ssts in advance.
if !self.picker_output.expired_ssts.is_empty() {
let remove_timer = COMPACTION_STAGE_ELAPSED
.with_label_values(&["remove_expired"])
.start_timer();
let expired_ssts = self
.picker_output
.expired_ssts
.drain(..)
.map(|f| f.meta_ref().clone())
.collect();
// remove_expired logs errors but doesn't stop compaction
self.remove_expired(&self.compaction_region, expired_ssts)
.await;
remove_timer.observe_duration();
}
// 2. Merge inputs
let merge_timer = COMPACTION_STAGE_ELAPSED
.with_label_values(&["merge"])
.start_timer();
@@ -152,7 +239,7 @@ impl CompactionTaskImpl {
#[async_trait::async_trait]
impl CompactionTask for CompactionTaskImpl {
async fn run(&mut self) {
let notify = match self.handle_compaction().await {
let notify = match self.handle_expiration_and_compaction().await {
Ok(edit) => BackgroundNotify::CompactionFinished(CompactionFinished {
region_id: self.compaction_region.region_id,
senders: std::mem::take(&mut self.waiters),
@@ -178,3 +265,66 @@ impl CompactionTask for CompactionTaskImpl {
.await;
}
}
#[cfg(test)]
mod tests {
use store_api::storage::FileId;
use crate::compaction::picker::PickerOutput;
use crate::compaction::test_util::new_file_handle;
#[test]
fn test_picker_output_with_expired_ssts() {
// Test that PickerOutput correctly includes expired_ssts
// This verifies that expired SSTs are properly identified and included
// in the picker output, which is then handled by handle_expiration_and_compaction
let file_ids = (0..3).map(|_| FileId::random()).collect::<Vec<_>>();
let expired_ssts = vec![
new_file_handle(file_ids[0], 0, 999, 0),
new_file_handle(file_ids[1], 1000, 1999, 0),
];
let picker_output = PickerOutput {
outputs: vec![],
expired_ssts: expired_ssts.clone(),
time_window_size: 3600,
max_file_size: None,
};
// Verify expired_ssts are included
assert_eq!(picker_output.expired_ssts.len(), 2);
assert_eq!(
picker_output.expired_ssts[0].file_id(),
expired_ssts[0].file_id()
);
assert_eq!(
picker_output.expired_ssts[1].file_id(),
expired_ssts[1].file_id()
);
}
#[test]
fn test_picker_output_without_expired_ssts() {
// Test that PickerOutput works correctly when there are no expired SSTs
let picker_output = PickerOutput {
outputs: vec![],
expired_ssts: vec![],
time_window_size: 3600,
max_file_size: None,
};
// Verify empty expired_ssts
assert!(picker_output.expired_ssts.is_empty());
}
// Note: Testing remove_expired() directly requires extensive mocking of:
// - manifest_ctx (ManifestContext)
// - request_sender (mpsc::Sender<WorkerRequestWithTime>)
// - WorkerRequest handling
//
// The behavior is tested indirectly through integration tests:
// - remove_expired() logs errors but doesn't stop compaction
// - handle_expiration_and_compaction() continues even if remove_expired() encounters errors
// - The function is designed to be non-blocking for compaction
}

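`handle_expiration_and_compaction` now removes expired SSTs before merging, through a best-effort `remove_expired`. That function first persists a `RegionEdit` to the manifest, then sends the edit to the region worker and waits on a oneshot reply; any failure is logged and compaction continues. The sketch below reproduces that control flow with `std::sync::mpsc` standing in for tokio's oneshot channel and booleans standing in for the manifest and worker outcomes; `Edit`, `remove_expired` and `expire_and_compact` here are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical edit listing the ids of expired files.
struct Edit {
    files_to_remove: Vec<u32>,
}

/// Best-effort removal: every failure is logged and swallowed.
fn remove_expired(expired: Vec<u32>, manifest_ok: bool, worker_ok: bool, log: &mut Vec<String>) {
    // 1. Persist the edit (stand-in for `update_manifest`).
    if !manifest_ok {
        log.push(format!("manifest update failed for {expired:?}, continuing"));
        return;
    }
    let edit = Edit { files_to_remove: expired };
    // 2. Hand the edit to a worker and wait for its reply.
    let (reply_tx, reply_rx) = mpsc::channel::<Result<usize, String>>();
    thread::spawn(move || {
        let result = if worker_ok {
            Ok(edit.files_to_remove.len())
        } else {
            Err("apply failed".to_string())
        };
        // The receiver may already be gone; ignore send errors.
        let _ = reply_tx.send(result);
    });
    match reply_rx.recv() {
        Ok(Ok(n)) => log.push(format!("removed {n} expired files")),
        Ok(Err(e)) => log.push(format!("worker failed: {e}, continuing")),
        Err(_) => log.push("worker dropped the reply channel, continuing".to_string()),
    }
}

/// Expiration first, then the merge, which runs regardless of step 1.
fn expire_and_compact(
    expired: Vec<u32>,
    manifest_ok: bool,
    worker_ok: bool,
    log: &mut Vec<String>,
) -> &'static str {
    if !expired.is_empty() {
        remove_expired(expired, manifest_ok, worker_ok, log);
    }
    "merged"
}
```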

@@ -76,6 +76,7 @@ pub fn new_file_handle_with_size_and_sequence(
file_size,
available_indexes: Default::default(),
index_file_size: 0,
index_file_id: None,
num_rows: 0,
num_row_groups: 0,
num_series: 0,

@@ -18,7 +18,7 @@ use std::fmt::Debug;
use std::num::NonZeroU64;
use common_base::readable_size::ReadableSize;
use common_telemetry::info;
use common_telemetry::{debug, info};
use common_time::Timestamp;
use common_time::timestamp::TimeUnit;
use common_time::timestamp_millis::BucketAligned;
@@ -48,6 +48,8 @@ pub struct TwcsPicker {
pub max_output_file_size: Option<u64>,
/// Whether the target region is in append mode.
pub append_mode: bool,
/// Max background compaction tasks.
pub max_background_tasks: Option<usize>,
}
impl TwcsPicker {
@@ -88,7 +90,7 @@ impl TwcsPicker {
// because after compaction there will be no overlapping files.
let filter_deleted = !files.overlapping && found_runs <= 2 && !self.append_mode;
if found_runs == 0 {
return output;
continue;
}
let inputs = if found_runs > 1 {
@@ -119,6 +121,16 @@ impl TwcsPicker {
filter_deleted,
output_time_range: None, // we do not enforce output time range in twcs compactions.
});
if let Some(max_background_tasks) = self.max_background_tasks
&& output.len() >= max_background_tasks
{
debug!(
"Region ({:?}) compaction task size larger than max background tasks({}), remaining tasks discarded",
region_id, max_background_tasks
);
break;
}
}
}
output
@@ -680,6 +692,7 @@ mod tests {
time_window_seconds: None,
max_output_file_size: None,
append_mode: false,
max_background_tasks: None,
}
.build_output(RegionId::from_u64(0), &mut windows, active_window);
@@ -831,5 +844,185 @@ mod tests {
}
}
#[test]
fn test_build_output_multiple_windows_with_zero_runs() {
let file_ids = (0..6).map(|_| FileId::random()).collect::<Vec<_>>();
let files = [
// Window 0: Contains 3 files but not forming any runs (not enough files in sequence to reach trigger_file_num)
new_file_handle_with_sequence(file_ids[0], 0, 999, 0, 1),
new_file_handle_with_sequence(file_ids[1], 0, 999, 0, 2),
new_file_handle_with_sequence(file_ids[2], 0, 999, 0, 3),
// Window 3: Contains files that will form 2 runs
new_file_handle_with_sequence(file_ids[3], 3000, 3999, 0, 4),
new_file_handle_with_sequence(file_ids[4], 3000, 3999, 0, 5),
new_file_handle_with_sequence(file_ids[5], 3000, 3999, 0, 6),
];
let mut windows = assign_to_windows(files.iter(), 3);
// Create picker with trigger_file_num of 4 so single files won't form runs in first window
let picker = TwcsPicker {
trigger_file_num: 4, // High enough to prevent runs in first window
time_window_seconds: Some(3),
max_output_file_size: None,
append_mode: false,
max_background_tasks: None,
};
let active_window = find_latest_window_in_seconds(files.iter(), 3);
let output = picker.build_output(RegionId::from_u64(123), &mut windows, active_window);
assert!(
!output.is_empty(),
"Should have output from windows with runs, even when one window has 0 runs"
);
let all_output_files: Vec<_> = output
.iter()
.flat_map(|o| o.inputs.iter())
.map(|f| f.file_id().file_id())
.collect();
assert!(
all_output_files.contains(&file_ids[3])
|| all_output_files.contains(&file_ids[4])
|| all_output_files.contains(&file_ids[5]),
"Output should contain files from the window with runs"
);
}
#[test]
fn test_build_output_single_window_zero_runs() {
let file_ids = (0..2).map(|_| FileId::random()).collect::<Vec<_>>();
let large_file_1 = new_file_handle_with_size_and_sequence(file_ids[0], 0, 999, 0, 1, 2000); // 2000 bytes
let large_file_2 = new_file_handle_with_size_and_sequence(file_ids[1], 0, 999, 0, 2, 2500); // 2500 bytes
let files = [large_file_1, large_file_2];
let mut windows = assign_to_windows(files.iter(), 3);
let picker = TwcsPicker {
trigger_file_num: 2,
time_window_seconds: Some(3),
max_output_file_size: Some(1000),
append_mode: true,
max_background_tasks: None,
};
let active_window = find_latest_window_in_seconds(files.iter(), 3);
let output = picker.build_output(RegionId::from_u64(456), &mut windows, active_window);
// Should return empty output (no compaction needed)
assert!(
output.is_empty(),
"Should return empty output when no runs are found after filtering"
);
}
#[test]
fn test_max_background_tasks_truncation() {
let file_ids = (0..10).map(|_| FileId::random()).collect::<Vec<_>>();
let max_background_tasks = 3;
// Create files across multiple windows that will generate multiple compaction outputs
let files = [
// Window 0: 4 files that will form a run
new_file_handle_with_sequence(file_ids[0], 0, 999, 0, 1),
new_file_handle_with_sequence(file_ids[1], 0, 999, 0, 2),
new_file_handle_with_sequence(file_ids[2], 0, 999, 0, 3),
new_file_handle_with_sequence(file_ids[3], 0, 999, 0, 4),
// Window 3: 4 files that will form another run
new_file_handle_with_sequence(file_ids[4], 3000, 3999, 0, 5),
new_file_handle_with_sequence(file_ids[5], 3000, 3999, 0, 6),
new_file_handle_with_sequence(file_ids[6], 3000, 3999, 0, 7),
new_file_handle_with_sequence(file_ids[7], 3000, 3999, 0, 8),
// Window 6: only 2 files (file_ids[8] and file_ids[9])
new_file_handle_with_sequence(file_ids[8], 6000, 6999, 0, 9),
new_file_handle_with_sequence(file_ids[9], 6000, 6999, 0, 10),
];
let mut windows = assign_to_windows(files.iter(), 3);
let picker = TwcsPicker {
trigger_file_num: 4,
time_window_seconds: Some(3),
max_output_file_size: None,
append_mode: false,
max_background_tasks: Some(max_background_tasks),
};
let active_window = find_latest_window_in_seconds(files.iter(), 3);
let output = picker.build_output(RegionId::from_u64(123), &mut windows, active_window);
// Should have at most max_background_tasks outputs
assert!(
output.len() <= max_background_tasks,
"Output should be truncated to max_background_tasks: expected <= {}, got {}",
max_background_tasks,
output.len()
);
// Without max_background_tasks, should have more outputs
let picker_no_limit = TwcsPicker {
trigger_file_num: 4,
time_window_seconds: Some(3),
max_output_file_size: None,
append_mode: false,
max_background_tasks: None,
};
let mut windows_no_limit = assign_to_windows(files.iter(), 3);
let output_no_limit = picker_no_limit.build_output(
RegionId::from_u64(123),
&mut windows_no_limit,
active_window,
);
// Without limit, should have more outputs (if there are enough windows)
if output_no_limit.len() > max_background_tasks {
assert!(
output_no_limit.len() > output.len(),
"Without limit should have more outputs than with limit"
);
}
}
#[test]
fn test_max_background_tasks_no_truncation_when_under_limit() {
let file_ids = (0..4).map(|_| FileId::random()).collect::<Vec<_>>();
let max_background_tasks = 10; // Larger than the expected number of outputs
// Create files in one window that will generate one compaction output
let files = [
new_file_handle_with_sequence(file_ids[0], 0, 999, 0, 1),
new_file_handle_with_sequence(file_ids[1], 0, 999, 0, 2),
new_file_handle_with_sequence(file_ids[2], 0, 999, 0, 3),
new_file_handle_with_sequence(file_ids[3], 0, 999, 0, 4),
];
let mut windows = assign_to_windows(files.iter(), 3);
let picker = TwcsPicker {
trigger_file_num: 4,
time_window_seconds: Some(3),
max_output_file_size: None,
append_mode: false,
max_background_tasks: Some(max_background_tasks),
};
let active_window = find_latest_window_in_seconds(files.iter(), 3);
let output = picker.build_output(RegionId::from_u64(123), &mut windows, active_window);
// Should have all outputs since we're under the limit
assert!(
output.len() <= max_background_tasks,
"Output should be within limit"
);
// Should have at least one output
assert!(!output.is_empty(), "Should have at least one output");
}
// TODO(hl): TTL tester that checks if get_expired_ssts function works as expected.
}
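This diff makes two changes to `TwcsPicker::build_output`. First, a window with no sorted runs now hits `continue` instead of `return output`. Second, the loop stops once `max_background_tasks` outputs have been collected. The simplified sketch below illustrates both changes. `Window`, `Output`, and this `build_output` are hypothetical stand-ins that track only run counts. They are not the real picker types, and the sketch omits run selection, `filter_deleted`, and output file-size limits.

```rust
/// Hypothetical, simplified time window: only its key and how many sorted runs it has.
struct Window {
    key: i64,
    found_runs: usize,
}

/// Hypothetical, simplified compaction output: records which window it came from.
#[derive(Debug, PartialEq)]
struct Output {
    window: i64,
}

/// Builds one output per window that has runs, skipping empty windows and
/// stopping once `max_background_tasks` outputs have been collected.
fn build_output(windows: &[Window], max_background_tasks: Option<usize>) -> Vec<Output> {
    let mut output = Vec::new();
    for window in windows {
        if window.found_runs == 0 {
            // Skip this window instead of returning early, so later windows
            // are still considered.
            continue;
        }
        output.push(Output { window: window.key });
        if let Some(max) = max_background_tasks {
            if output.len() >= max {
                // Remaining windows are discarded for this pick.
                break;
            }
        }
    }
    output
}

fn main() {
    let windows = vec![
        Window { key: 0, found_runs: 3 },
        Window { key: 3, found_runs: 0 },
        Window { key: 6, found_runs: 2 },
    ];
    // The empty window 3 no longer hides window 6.
    assert_eq!(build_output(&windows, None), vec![Output { window: 0 }, Output { window: 6 }]);
    // With a cap of 1, only the first output is kept.
    assert_eq!(build_output(&windows, Some(1)), vec![Output { window: 0 }]);
}
```

In this sketch, the old `return output;` would make `build_output(&windows, None)` stop at window 3 and return only window 0's output.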

Some files were not shown because too many files have changed in this diff.