Compare commits

...

26 Commits

Author SHA1 Message Date
yihong
09f3d72d2d fix: close issue #6555 return empty result (#6569)
* fix: close issue #6555 return empty result

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: only start one instance in regex sqlness test (#6570)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* refactor: refactor partition mod to use PartitionExpr instead of PartitionDef (#6554)

* refactor: refactor partition mod to use PartitionExpr instead of PartitionDef

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix snafu

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Puts expression into PbPartition

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix compile

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* update proto

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* add serde test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* add serde test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: address comments

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-24 15:00:32 +08:00
Yingwen
ca0c1282ed chore: bump version to 0.15.3 (#6580)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-24 11:24:07 +08:00
Yingwen
b719c020ba chore: cherry pick #6540, #6550, #6551, #6556, #6563, #6534 to v0.15 branch (#6577)
* feat: add metrics for request wait time and adjust stall metrics (#6540)

* feat: add metric greptime_mito_request_wait_time to observe wait time

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add worker to wait time metric

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename stall gauge to greptime_mito_write_stalling_count

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: change greptime_mito_write_stall_total to total stalled requests

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: merge lazy static blocks

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: estimate mem size for bulk ingester (#6550)

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: flow mirror cache (#6551)

* fix: invalid cache when flownode change address

Signed-off-by: discord9 <discord9@163.com>

* update comments

Signed-off-by: discord9 <discord9@163.com>

* fix

Signed-off-by: discord9 <discord9@163.com>

* refactor: add log&rename

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: impl timestamp function for promql (#6556)

* feat: impl timestamp function for promql

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: style and typo

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* docs: update comments

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: comment

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: MergeScan print input (#6563)

* feat: MergeScan print input

Signed-off-by: discord9 <discord9@163.com>

* test: fix ut

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: aggr group by all partition cols use partial commutative (#6534)

* fix: aggr group by all partition cols use partial commutative

Signed-off-by: discord9 <discord9@163.com>

* test: bugged case

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness fix

Signed-off-by: discord9 <discord9@163.com>

* test: more redacted

Signed-off-by: discord9 <discord9@163.com>

* more cases

Signed-off-by: discord9 <discord9@163.com>

* even more test cases

Signed-off-by: discord9 <discord9@163.com>

* join testcase

Signed-off-by: discord9 <discord9@163.com>

* fix: column requirement added in correct location

Signed-off-by: discord9 <discord9@163.com>

* fix test

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* track col reqs per stack

Signed-off-by: discord9 <discord9@163.com>

* fix: continue

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* refactor: test mod

Signed-off-by: discord9 <discord9@163.com>

* test utils

Signed-off-by: discord9 <discord9@163.com>

* test: better test

Signed-off-by: discord9 <discord9@163.com>

* more testcases

Signed-off-by: discord9 <discord9@163.com>

* test limit push down

Signed-off-by: discord9 <discord9@163.com>

* more testcases

Signed-off-by: discord9 <discord9@163.com>

* more testcase

Signed-off-by: discord9 <discord9@163.com>

* more test

Signed-off-by: discord9 <discord9@163.com>

* chore: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: update comments

Signed-off-by: discord9 <discord9@163.com>

* fix: check col reqs from bottom to upper

Signed-off-by: discord9 <discord9@163.com>

* chore: more comment

Signed-off-by: discord9 <discord9@163.com>

* docs: more todo

Signed-off-by: discord9 <discord9@163.com>

* chore: comments

Signed-off-by: discord9 <discord9@163.com>

* test: a new failing test that should be fixed

Signed-off-by: discord9 <discord9@163.com>

* fix: part col alias tracking

Signed-off-by: discord9 <discord9@163.com>

* chore: unused

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* docs: comment

Signed-off-by: discord9 <discord9@163.com>

* more testcase

Signed-off-by: discord9 <discord9@163.com>

* more testcase for step/part aggr combine

Signed-off-by: discord9 <discord9@163.com>

* FIXME: a new bug

Signed-off-by: discord9 <discord9@163.com>

* literally unfixable

Signed-off-by: discord9 <discord9@163.com>

* chore: remove some debug print

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: fys <40801205+fengys1996@users.noreply.github.com>
Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2025-07-23 22:29:14 +08:00
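The first cherry-picked change above (#6540) introduces greptime_mito_request_wait_time and renames the stall gauge to greptime_mito_write_stalling_count. A minimal sketch of how such metrics could be declared with the lazy_static and prometheus crates (which the repository uses elsewhere); the worker label and help strings are assumptions, not the actual definitions:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, register_int_gauge, HistogramVec, IntGauge};

lazy_static! {
    /// Time a write request spends waiting before a worker picks it up.
    /// The `worker` label is an assumption; the commit only says the metric
    /// is labeled per worker.
    pub static ref REQUEST_WAIT_TIME: HistogramVec = register_histogram_vec!(
        "greptime_mito_request_wait_time",
        "mito write request wait time",
        &["worker"]
    )
    .unwrap();

    /// Gauge renamed to `greptime_mito_write_stalling_count` in the commit;
    /// it tracks how many write requests are currently stalled.
    pub static ref WRITE_STALLING_COUNT: IntGauge = register_int_gauge!(
        "greptime_mito_write_stalling_count",
        "stalled write requests in mito"
    )
    .unwrap();
}
```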
Ruihang Xia
717c1d1807 feat: update partial execution metrics (#6499)
* feat: update partial execution metrics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* send data with metrics in distributed mode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* only send partial metrics under VERBOSE flag

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* loop to while

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Zhenchi
291f3c89fe fix: row selection intersection removes trailing rows (#6539)
* fix: row selection intersection removes trailing rows

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix typos

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
discord9
602cc38056 fix: breaking loop when not retryable (#6538)
fix: breaking when not retryable

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Lei, HUANG
46b3593021 fix(grpc): check grpc client unavailable (#6488)
* fix/check-grpc-client-unavailable:
 Improve async handling in `greptime_handler.rs`

 - Updated the `DoPut` response handling to use `await` with `result_sender.send` for better asynchronous operation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/check-grpc-client-unavailable:
 ### Improve Error Handling in `greptime_handler.rs`

 - Enhanced error handling for the `DoPut` operation by switching from `send` to `try_send` for the `result_sender`.
 - Added specific logging for unreachable clients, including `request_id` in the warning message.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
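The #6488 change above switches the DoPut result path from send to try_send and logs unreachable clients with their request_id. A hedged sketch of that pattern on a bounded tokio mpsc channel; the function name and payload type are illustrative, not the handler's real signature, and the real code uses the crate's own logging rather than eprintln!:

```rust
use tokio::sync::mpsc;

/// Illustrative only: the real handler in `greptime_handler.rs` differs in types.
fn reply_do_put(
    result_sender: &mpsc::Sender<Result<u64, String>>,
    request_id: u64,
    result: Result<u64, String>,
) {
    // `try_send` never waits for channel capacity: if the client has gone
    // away or stopped reading, log the unreachable client together with the
    // request id and drop the response instead of blocking the handler.
    if let Err(e) = result_sender.try_send(result) {
        eprintln!(
            "client unreachable, dropping DoPut response, request_id: {request_id}, error: {e}"
        );
    }
}
```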
Yan Tingwang
ff402fd6f6 test: add sqlness test for max execution time (#6517)
* add sqlness test for max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add Pre-line comments SQLNESS PROTOCOL MYSQL

Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix(mysql): support max_execution_time variable

Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix: test::test_check & sqlness test mysql

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add sqlness test for max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add Pre-line comments SQLNESS PROTOCOL MYSQL

Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix(mysql): support max_execution_time variable

Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix: test::test_check & sqlness test mysql

Signed-off-by: codephage. <tingwangyan2020@163.com>

* chore: Unify the sql style

Signed-off-by: codephage. <tingwangyan2020@163.com>

---------

Signed-off-by: codephage. <tingwangyan2020@163.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Yan Tingwang
b83e6e2b18 fix: add system variable max_execution_time (#6511)
add system variable: max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
discord9
cb74337dbe refactor(flow): faster time window expr (#6495)
* refactor: faster window expr

Signed-off-by: discord9 <discord9@163.com>

* docs: explain fast path

Signed-off-by: discord9 <discord9@163.com>

* chore: rm unwrap

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
shuiyisong
32bffbb668 feat: add filter processor to v0.15 (#6516)
feat: add filter processor

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-07-14 17:43:49 +08:00
evenyag
941906dc74 chore: bump version to v0.15.2
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-11 00:24:21 +08:00
Ruihang Xia
cbf251d0f0 fix: expand on conditional commutative as well (#6484)
* fix: expand on conditional commutative as well

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* add logging to figure test failure

Signed-off-by: discord9 <discord9@163.com>

* revert

Signed-off-by: discord9 <discord9@163.com>

* feat: stream drop record metrics

Signed-off-by: discord9 <discord9@163.com>

* Revert "feat: stream drop record metrics"

This reverts commit 6a16946a5b8ea37557bbb1b600847d24274d6500.

Signed-off-by: discord9 <discord9@163.com>

* feat: stream drop record metrics

Signed-off-by: discord9 <discord9@163.com>

refactor: move logging to drop too

Signed-off-by: discord9 <discord9@163.com>

fix: drop input stream before collect metrics

Signed-off-by: discord9 <discord9@163.com>

* fix: expand differently

Signed-off-by: discord9 <discord9@163.com>

* test: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: more dbg

Signed-off-by: discord9 <discord9@163.com>

* Revert "feat: stream drop record metrics"

This reverts commit 3eda4a2257928d95cf9c1328ae44fae84cfbb017.

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness redacted

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-11 00:24:21 +08:00
shuiyisong
1519379262 chore: skip calc ts in doc 2 with transform (#6509)
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
localhost
4bfe02ec7f chore: remove region id to reduce time series (#6506)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Weny Xu
ecacf1333e fix: correctly update partition key indices during alter table operations (#6494)
* fix: correctly update partition key indices in alter table operations

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add sqlness tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Yingwen
92fa33c250 fix: range query returns range selector error when table not found (#6481)
* test: add sqlness test for range vector with non-existence metric

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle empty metric for matrix selector

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update sqlness result

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add newline

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
shuiyisong
8b2d1a3753 fix: skip nan in prom remote write pipeline (#6489)
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Ning Sun
13401c94e0 feat: allow alternative version string (#6472)
* feat: allow alternative version string

* refactor: rename original version function to verbose_version

Signed-off-by: Ning Sun <sunning@greptime.com>

---------

Signed-off-by: Ning Sun <sunning@greptime.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
shuiyisong
fd637dae47 chore: sort range query return values (#6474)
* chore: sort range query return values

* chore: add comments

* chore: add is_sorted check

* fix: test

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
dennis zhuang
69fac19770 fix: empty statements hang (#6480)
* fix: empty statements hang

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* tests: add cases

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
discord9
6435b97314 fix: stricter win sort condition (#6477)
test: sqlness

test: fix sqlness redacted

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Weny Xu
726e3909fe fix(metric-engine): handle stale metadata region recovery failures (#6395)
* fix(metric-engine): handle stale metadata region recovery failures

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
evenyag
00d759e828 chore: bump version to v0.15.1
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
Lei, HUANG
0042ea6462 fix: filter empty batch in bulk insert api (#6459)
* fix/filter-empty-batch-in-bulk-insert-api:
 **Add Early Return for Empty Record Batches in `bulk_insert.rs`**

 - Implemented an early return in the `Inserter` implementation to handle cases where `record_batch.num_rows()` is zero, improving efficiency by avoiding unnecessary processing.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 **Improve Bulk Insert Handling**

 - **`handle_bulk_insert.rs`**: Added a check to handle cases where the batch has zero rows, immediately returning and sending a success response with zero rows processed.
 - **`bulk_insert.rs`**: Enhanced logic to skip processing for masks that select none, optimizing the bulk insert operation by avoiding unnecessary iterations.

 These changes improve the efficiency and robustness of the bulk insert process by handling edge cases more effectively.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 ### Refactor and Error Handling Enhancements

 - **Refactored Timestamp Handling**: Introduced `timestamp_array_to_primitive` function in `timestamp.rs` to streamline conversion of timestamp arrays to primitive arrays, reducing redundancy in `handle_bulk_insert.rs` and `bulk_insert.rs`.
 - **Error Handling**: Added `InconsistentTimestampLength` error in `error.rs` to handle mismatched timestamp column lengths in bulk insert operations.
 - **Bulk Insert Logic**: Updated `handle_bulk_insert.rs` to utilize the new timestamp conversion function and added checks for timestamp length consistency.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 **Refactor `bulk_insert.rs` to streamline imports**

 - Simplified import statements by removing unused timestamp-related arrays and data types from the `arrow` crate in `bulk_insert.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
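The #6459 commit above adds an early return for empty record batches and a timestamp-length consistency check. A rough sketch of that handler shape, assuming arrow_array::RecordBatch; the function name, parameters, and error type are placeholders rather than the crate's real API:

```rust
use arrow_array::RecordBatch;

/// Illustrative handler shape only; the real logic lives in
/// `handle_bulk_insert.rs` / `bulk_insert.rs` with different types and errors.
fn handle_bulk_insert(batch: &RecordBatch, timestamp_len: usize) -> Result<u64, String> {
    // An empty batch has nothing to write: answer with zero affected rows
    // right away instead of walking the whole insert pipeline.
    if batch.num_rows() == 0 {
        return Ok(0);
    }
    // The commit also checks that the timestamp column length matches the
    // batch row count (the `InconsistentTimestampLength` error case).
    if timestamp_len != batch.num_rows() {
        return Err(format!(
            "inconsistent timestamp length: {timestamp_len} != {}",
            batch.num_rows()
        ));
    }
    // ... convert timestamps via `timestamp_array_to_primitive` and insert ...
    Ok(batch.num_rows() as u64)
}
```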
Zhenchi
d06450715f fix: add backward compatibility for SkippingIndexOptions deserialization (#6458)
* fix: add backward compatibility for `SkippingIndexOptions` deserialization

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
113 changed files with 6249 additions and 691 deletions

View File

@@ -12,3 +12,6 @@ fetch = true
checkout = true
list_files = true
internal_use_git2 = false
[env]
CARGO_WORKSPACE_DIR = { value = "", relative = true }

Cargo.lock generated
View File

@@ -211,7 +211,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
[[package]]
name = "api"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-decimal",
@@ -944,7 +944,7 @@ dependencies = [
[[package]]
name = "auth"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -1586,7 +1586,7 @@ dependencies = [
[[package]]
name = "cache"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"catalog",
"common-error",
@@ -1602,6 +1602,17 @@ version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "acbc26382d871df4b7442e3df10a9402bf3cf5e55cbd66f12be38861425f0564"
[[package]]
name = "cargo-manifest"
version = "0.19.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1d8af896b707212cd0e99c112a78c9497dd32994192a463ed2f7419d29bd8c6"
dependencies = [
"serde",
"thiserror 2.0.12",
"toml 0.8.19",
]
[[package]]
name = "cast"
version = "0.3.0"
@@ -1610,7 +1621,7 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "catalog"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -1948,7 +1959,7 @@ checksum = "1462739cb27611015575c0c11df5df7601141071f07518d56fcc1be504cbec97"
[[package]]
name = "cli"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -1993,7 +2004,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tempfile",
"tokio",
@@ -2002,7 +2013,7 @@ dependencies = [
[[package]]
name = "client"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -2032,7 +2043,7 @@ dependencies = [
"rand 0.9.0",
"serde_json",
"snafu 0.8.5",
"substrait 0.15.0",
"substrait 0.15.3",
"substrait 0.37.3",
"tokio",
"tokio-stream",
@@ -2073,7 +2084,7 @@ dependencies = [
[[package]]
name = "cmd"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"auth",
@@ -2134,7 +2145,7 @@ dependencies = [
"snafu 0.8.5",
"stat",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"temp-env",
"tempfile",
@@ -2181,7 +2192,7 @@ checksum = "55b672471b4e9f9e95499ea597ff64941a309b2cdbffcc46f2cc5e2d971fd335"
[[package]]
name = "common-base"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anymap2",
"async-trait",
@@ -2203,11 +2214,11 @@ dependencies = [
[[package]]
name = "common-catalog"
version = "0.15.0"
version = "0.15.3"
[[package]]
name = "common-config"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-error",
@@ -2232,7 +2243,7 @@ dependencies = [
[[package]]
name = "common-datasource"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"arrow-schema 54.3.1",
@@ -2269,7 +2280,7 @@ dependencies = [
[[package]]
name = "common-decimal"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"bigdecimal 0.4.8",
"common-error",
@@ -2282,7 +2293,7 @@ dependencies = [
[[package]]
name = "common-error"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-macro",
"http 1.1.0",
@@ -2293,7 +2304,7 @@ dependencies = [
[[package]]
name = "common-frontend"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-error",
@@ -2309,7 +2320,7 @@ dependencies = [
[[package]]
name = "common-function"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -2362,7 +2373,7 @@ dependencies = [
[[package]]
name = "common-greptimedb-telemetry"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-runtime",
@@ -2379,7 +2390,7 @@ dependencies = [
[[package]]
name = "common-grpc"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -2411,7 +2422,7 @@ dependencies = [
[[package]]
name = "common-grpc-expr"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"common-base",
@@ -2430,7 +2441,7 @@ dependencies = [
[[package]]
name = "common-macro"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arc-swap",
"common-query",
@@ -2444,7 +2455,7 @@ dependencies = [
[[package]]
name = "common-mem-prof"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anyhow",
"common-error",
@@ -2460,7 +2471,7 @@ dependencies = [
[[package]]
name = "common-meta"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anymap2",
"api",
@@ -2525,7 +2536,7 @@ dependencies = [
[[package]]
name = "common-options"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-grpc",
"humantime-serde",
@@ -2534,11 +2545,11 @@ dependencies = [
[[package]]
name = "common-plugins"
version = "0.15.0"
version = "0.15.3"
[[package]]
name = "common-pprof"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-error",
"common-macro",
@@ -2550,7 +2561,7 @@ dependencies = [
[[package]]
name = "common-procedure"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -2577,7 +2588,7 @@ dependencies = [
[[package]]
name = "common-procedure-test"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-procedure",
@@ -2586,7 +2597,7 @@ dependencies = [
[[package]]
name = "common-query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -2612,7 +2623,7 @@ dependencies = [
[[package]]
name = "common-recordbatch"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arc-swap",
"common-error",
@@ -2632,7 +2643,7 @@ dependencies = [
[[package]]
name = "common-runtime"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -2662,17 +2673,18 @@ dependencies = [
[[package]]
name = "common-session"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"strum 0.27.1",
]
[[package]]
name = "common-telemetry"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"backtrace",
"common-error",
"common-version",
"console-subscriber",
"greptime-proto",
"humantime-serde",
@@ -2696,7 +2708,7 @@ dependencies = [
[[package]]
name = "common-test-util"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"client",
"common-grpc",
@@ -2709,7 +2721,7 @@ dependencies = [
[[package]]
name = "common-time"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"chrono",
@@ -2727,9 +2739,10 @@ dependencies = [
[[package]]
name = "common-version"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"build-data",
"cargo-manifest",
"const_format",
"serde",
"shadow-rs",
@@ -2737,7 +2750,7 @@ dependencies = [
[[package]]
name = "common-wal"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-error",
@@ -2760,7 +2773,7 @@ dependencies = [
[[package]]
name = "common-workload"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"common-telemetry",
@@ -3716,7 +3729,7 @@ dependencies = [
[[package]]
name = "datanode"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -3769,7 +3782,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"toml 0.8.19",
@@ -3778,7 +3791,7 @@ dependencies = [
[[package]]
name = "datatypes"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -4438,7 +4451,7 @@ checksum = "e8c02a5121d4ea3eb16a80748c74f5549a5665e4c21333c6098f283870fbdea6"
[[package]]
name = "file-engine"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -4575,7 +4588,7 @@ checksum = "8bf7cc16383c4b8d58b9905a8509f02926ce3058053c056376248d958c9df1e8"
[[package]]
name = "flow"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -4640,7 +4653,7 @@ dependencies = [
"sql",
"store-api",
"strum 0.27.1",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tonic 0.12.3",
@@ -4695,7 +4708,7 @@ checksum = "6c2141d6d6c8512188a7891b4b01590a45f6dac67afb4f255c4124dbb86d4eaa"
[[package]]
name = "frontend"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -4755,7 +4768,7 @@ dependencies = [
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"strfmt",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-util",
@@ -5916,7 +5929,7 @@ dependencies = [
[[package]]
name = "index"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"asynchronous-codec",
@@ -6801,7 +6814,7 @@ checksum = "a7a70ba024b9dc04c27ea2f0c0548feb474ec5c54bba33a7f72f873a39d07b24"
[[package]]
name = "log-query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"chrono",
"common-error",
@@ -6813,7 +6826,7 @@ dependencies = [
[[package]]
name = "log-store"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -7111,7 +7124,7 @@ dependencies = [
[[package]]
name = "meta-client"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -7139,7 +7152,7 @@ dependencies = [
[[package]]
name = "meta-srv"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -7230,7 +7243,7 @@ dependencies = [
[[package]]
name = "metric-engine"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -7320,7 +7333,7 @@ dependencies = [
[[package]]
name = "mito-codec"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"bytes",
@@ -7343,7 +7356,7 @@ dependencies = [
[[package]]
name = "mito2"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -8093,7 +8106,7 @@ dependencies = [
[[package]]
name = "object-store"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anyhow",
"bytes",
@@ -8407,7 +8420,7 @@ dependencies = [
[[package]]
name = "operator"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -8462,7 +8475,7 @@ dependencies = [
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-util",
@@ -8729,7 +8742,7 @@ dependencies = [
[[package]]
name = "partition"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -9017,7 +9030,7 @@ checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]]
name = "pipeline"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -9160,7 +9173,7 @@ dependencies = [
[[package]]
name = "plugins"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"auth",
"clap 4.5.19",
@@ -9473,7 +9486,7 @@ dependencies = [
[[package]]
name = "promql"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"async-trait",
@@ -9755,7 +9768,7 @@ dependencies = [
[[package]]
name = "puffin"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-compression 0.4.13",
"async-trait",
@@ -9797,7 +9810,7 @@ dependencies = [
[[package]]
name = "query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -9863,7 +9876,7 @@ dependencies = [
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"statrs",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-stream",
@@ -11149,7 +11162,7 @@ dependencies = [
[[package]]
name = "servers"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -11270,7 +11283,7 @@ dependencies = [
[[package]]
name = "session"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -11609,7 +11622,7 @@ dependencies = [
[[package]]
name = "sql"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"chrono",
@@ -11664,7 +11677,7 @@ dependencies = [
[[package]]
name = "sqlness-runner"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -11964,7 +11977,7 @@ dependencies = [
[[package]]
name = "stat"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"nix 0.30.1",
]
@@ -11990,7 +12003,7 @@ dependencies = [
[[package]]
name = "store-api"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -12151,7 +12164,7 @@ dependencies = [
[[package]]
name = "substrait"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"bytes",
@@ -12331,7 +12344,7 @@ dependencies = [
[[package]]
name = "table"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -12592,7 +12605,7 @@ checksum = "3369f5ac52d5eb6ab48c6b4ffdc8efbcad6b89c765749064ba298f2c68a16a76"
[[package]]
name = "tests-fuzz"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arbitrary",
"async-trait",
@@ -12636,7 +12649,7 @@ dependencies = [
[[package]]
name = "tests-integration"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -12703,7 +12716,7 @@ dependencies = [
"sql",
"sqlx",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tempfile",
"time",
@@ -13073,6 +13086,7 @@ version = "0.8.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1ed1f98e3fdc28d6d910e6737ae6ab1a93bf1985935a1193e68f93eeb68d24e"
dependencies = [
"indexmap 2.9.0",
"serde",
"serde_spanned",
"toml_datetime",

View File

@@ -71,7 +71,7 @@ members = [
resolver = "2"
[workspace.package]
version = "0.15.0"
version = "0.15.3"
edition = "2021"
license = "Apache-2.0"

View File

@@ -211,12 +211,18 @@ impl Database {
retries += 1;
warn!("Retrying {} times with error = {:?}", retries, err);
continue;
} else {
error!(
err; "Failed to send request to grpc handle, retries = {}, not retryable error, aborting",
retries
);
return Err(err.into());
}
}
(Err(err), false) => {
error!(
"Failed to send request to grpc handle after {} retries, error = {:?}",
retries, err
err; "Failed to send request to grpc handle after {} retries",
retries,
);
return Err(err.into());
}
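For context, a rough sketch of the retry loop this hunk slots into, assuming a generic is_retryable predicate and a MAX_RETRIES placeholder; neither is the client crate's actual API:

```rust
/// Illustrative retry shape: log and retry on retryable errors, abort with
/// the error otherwise, mirroring the `else` branch added in the hunk above.
async fn send_with_retry<T, E, Fut>(
    mut send: impl FnMut() -> Fut,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    const MAX_RETRIES: usize = 3;
    let mut retries = 0;
    loop {
        match send().await {
            Ok(v) => return Ok(v),
            // Retryable error with budget left: log and go around again.
            Err(err) if is_retryable(&err) && retries < MAX_RETRIES => {
                retries += 1;
                eprintln!("Retrying {} times with error = {:?}", retries, err);
            }
            // Non-retryable error, or retries exhausted: log and abort.
            Err(err) => {
                eprintln!("Aborting after {} retries, error = {:?}", retries, err);
                return Err(err);
            }
        }
    }
}
```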

View File

@@ -163,19 +163,70 @@ impl RegionRequester {
let _span = tracing_context.attach(common_telemetry::tracing::info_span!(
"poll_flight_data_stream"
));
while let Some(flight_message) = flight_message_stream.next().await {
let flight_message = flight_message
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
let mut buffered_message: Option<FlightMessage> = None;
let mut stream_ended = false;
while !stream_ended {
// get the next message from the buffered message or read from the flight message stream
let flight_message_item = if let Some(msg) = buffered_message.take() {
Some(Ok(msg))
} else {
flight_message_stream.next().await
};
let flight_message = match flight_message_item {
Some(Ok(message)) => message,
Some(Err(e)) => {
yield Err(BoxedError::new(e)).context(ExternalSnafu);
break;
}
None => break,
};
match flight_message {
FlightMessage::RecordBatch(record_batch) => {
yield RecordBatch::try_from_df_record_batch(
let result_to_yield = RecordBatch::try_from_df_record_batch(
schema_cloned.clone(),
record_batch,
)
);
// get the next message from the stream. normally it should be a metrics message.
if let Some(next_flight_message_result) = flight_message_stream.next().await
{
match next_flight_message_result {
Ok(FlightMessage::Metrics(s)) => {
let m = serde_json::from_str(&s).ok().map(Arc::new);
metrics_ref.swap(m);
}
Ok(FlightMessage::RecordBatch(rb)) => {
// for some reason it's not a metrics message, so we need to buffer this record batch
// and yield it in the next iteration.
buffered_message = Some(FlightMessage::RecordBatch(rb));
}
Ok(_) => {
yield IllegalFlightMessagesSnafu {
reason: "A RecordBatch message can only be succeeded by a Metrics message or another RecordBatch message"
}
.fail()
.map_err(BoxedError::new)
.context(ExternalSnafu);
break;
}
Err(e) => {
yield Err(BoxedError::new(e)).context(ExternalSnafu);
break;
}
}
} else {
// the stream has ended
stream_ended = true;
}
yield result_to_yield;
}
FlightMessage::Metrics(s) => {
// just a branch in case of some metrics message comes after other things.
let m = serde_json::from_str(&s).ok().map(Arc::new);
metrics_ref.swap(m);
break;
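The hunk above peeks at the message following each RecordBatch and buffers it when it is not the expected Metrics message. A minimal, self-contained sketch of that buffering pattern over a plain iterator, with made-up Msg variants standing in for the flight messages:

```rust
/// Stand-ins for the flight messages; illustrative only.
enum Msg {
    Batch(u32),
    Metrics(String),
}

fn drain(mut stream: impl Iterator<Item = Msg>) {
    let mut buffered: Option<Msg> = None;
    loop {
        // Take the stashed message first, otherwise pull from the stream.
        let msg = match buffered.take().or_else(|| stream.next()) {
            Some(m) => m,
            None => break,
        };
        match msg {
            Msg::Batch(b) => {
                // Peek at the follower: metrics belong to this batch, while
                // another batch is stashed and handled on the next turn.
                match stream.next() {
                    Some(Msg::Metrics(m)) => println!("batch {b} with metrics {m}"),
                    Some(other) => {
                        buffered = Some(other);
                        println!("batch {b} without metrics");
                    }
                    None => println!("batch {b}, stream ended"),
                }
            }
            Msg::Metrics(m) => println!("trailing metrics {m}"),
        }
    }
}

fn main() {
    let msgs = vec![
        Msg::Batch(1),
        Msg::Metrics("m1".to_string()),
        Msg::Batch(2),
        Msg::Batch(3),
    ];
    drain(msgs.into_iter());
}
```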

View File

@@ -20,11 +20,11 @@ use cmd::error::{InitTlsProviderSnafu, Result};
use cmd::options::GlobalOptions;
use cmd::{cli, datanode, flownode, frontend, metasrv, standalone, App};
use common_base::Plugins;
use common_version::version;
use common_version::{verbose_version, version};
use servers::install_ring_crypto_provider;
#[derive(Parser)]
#[command(name = "greptime", author, version, long_version = version(), about)]
#[command(name = "greptime", author, version, long_version = verbose_version(), about)]
#[command(propagate_version = true)]
pub(crate) struct Command {
#[clap(subcommand)]
@@ -143,10 +143,8 @@ async fn start(cli: Command) -> Result<()> {
}
fn setup_human_panic() {
human_panic::setup_panic!(
human_panic::Metadata::new("GreptimeDB", env!("CARGO_PKG_VERSION"))
.homepage("https://github.com/GreptimeTeam/greptimedb/discussions")
);
human_panic::setup_panic!(human_panic::Metadata::new("GreptimeDB", version())
.homepage("https://github.com/GreptimeTeam/greptimedb/discussions"));
common_telemetry::set_panic_hook();
}

View File

@@ -19,7 +19,7 @@ use catalog::kvbackend::MetaKvBackend;
use common_base::Plugins;
use common_meta::cache::LayeredCacheRegistryBuilder;
use common_telemetry::info;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use datanode::datanode::DatanodeBuilder;
use datanode::service::DatanodeServiceBuilder;
use meta_client::MetaClientType;
@@ -67,7 +67,7 @@ impl InstanceBuilder {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
plugins::setup_datanode_plugins(plugins, &opts.plugins, dn_opts)

View File

@@ -32,7 +32,7 @@ use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::TableMetadataManager;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use flow::{
get_flow_auth_options, FlownodeBuilder, FlownodeInstance, FlownodeServiceBuilder,
FrontendClient, FrontendInvoker,
@@ -279,7 +279,7 @@ impl StartCommand {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Flownode start command: {:#?}", self);

View File

@@ -33,7 +33,7 @@ use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_time::timezone::set_default_timezone;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use frontend::frontend::Frontend;
use frontend::heartbeat::HeartbeatTask;
use frontend::instance::builder::FrontendBuilder;
@@ -282,7 +282,7 @@ impl StartCommand {
opts.component.slow_query.as_ref(),
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Frontend start command: {:#?}", self);

View File

@@ -112,7 +112,7 @@ pub trait App: Send {
pub fn log_versions(version: &str, short_version: &str, app: &str) {
// Report app version as gauge.
APP_VERSION
.with_label_values(&[env!("CARGO_PKG_VERSION"), short_version, app])
.with_label_values(&[common_version::version(), short_version, app])
.inc();
// Log version and argument flags.

View File

@@ -22,7 +22,7 @@ use common_base::Plugins;
use common_config::Configurable;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use meta_srv::bootstrap::MetasrvInstance;
use meta_srv::metasrv::BackendImpl;
use snafu::ResultExt;
@@ -320,7 +320,7 @@ impl StartCommand {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Metasrv start command: {:#?}", self);

View File

@@ -51,7 +51,7 @@ use common_telemetry::logging::{
LoggingOptions, SlowQueryOptions, TracingOptions, DEFAULT_LOGGING_DIR,
};
use common_time::timezone::set_default_timezone;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use common_wal::config::DatanodeWalConfig;
use datanode::config::{DatanodeOptions, ProcedureConfig, RegionEngineConfig, StorageConfig};
use datanode::datanode::{Datanode, DatanodeBuilder};
@@ -466,7 +466,7 @@ impl StartCommand {
opts.component.slow_query.as_ref(),
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Standalone start command: {:#?}", self);

View File

@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::sync::Arc;
use std::{env, fmt};
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
@@ -47,7 +47,7 @@ impl Function for PGVersionFunction {
fn eval(&self, _func_ctx: &FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> {
let result = StringVector::from(vec![format!(
"PostgreSQL 16.3 GreptimeDB {}",
env!("CARGO_PKG_VERSION")
common_version::version()
)]);
Ok(Arc::new(result))
}

View File

@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::sync::Arc;
use std::{env, fmt};
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
@@ -52,13 +52,13 @@ impl Function for VersionFunction {
"{}-greptimedb-{}",
std::env::var("GREPTIMEDB_MYSQL_SERVER_VERSION")
.unwrap_or_else(|_| "8.4.2".to_string()),
env!("CARGO_PKG_VERSION")
common_version::version()
)
}
Channel::Postgres => {
format!("16.3-greptimedb-{}", env!("CARGO_PKG_VERSION"))
format!("16.3-greptimedb-{}", common_version::version())
}
_ => env!("CARGO_PKG_VERSION").to_string(),
_ => common_version::version().to_string(),
};
let result = StringVector::from(vec![version]);
Ok(Arc::new(result))

View File

@@ -15,6 +15,7 @@
use std::collections::HashMap;
use std::sync::Arc;
use common_telemetry::info;
use futures::future::BoxFuture;
use moka::future::Cache;
use moka::ops::compute::Op;
@@ -89,6 +90,12 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
// we have a corresponding cache invalidation mechanism to invalidate `(Key, EmptyHashSet)`.
.map(Arc::new)
.map(Some)
.inspect(|set| {
info!(
"Initialized table_flownode cache for table_id: {}, set: {:?}",
table_id, set
);
})
})
})
}
@@ -167,6 +174,13 @@ fn invalidator<'a>(
match ident {
CacheIdent::CreateFlow(create_flow) => handle_create_flow(cache, create_flow).await,
CacheIdent::DropFlow(drop_flow) => handle_drop_flow(cache, drop_flow).await,
CacheIdent::FlowNodeAddressChange(node_id) => {
info!(
"Invalidate flow node cache for node_id in table_flownode: {}",
node_id
);
cache.invalidate_all();
}
_ => {}
}
Ok(())
@@ -174,7 +188,10 @@ fn invalidator<'a>(
}
fn filter(ident: &CacheIdent) -> bool {
matches!(ident, CacheIdent::CreateFlow(_) | CacheIdent::DropFlow(_))
matches!(
ident,
CacheIdent::CreateFlow(_) | CacheIdent::DropFlow(_) | CacheIdent::FlowNodeAddressChange(_)
)
}
#[cfg(test)]

View File

@@ -22,6 +22,7 @@ use crate::key::flow::flow_name::FlowNameKey;
use crate::key::flow::flow_route::FlowRouteKey;
use crate::key::flow::flownode_flow::FlownodeFlowKey;
use crate::key::flow::table_flow::TableFlowKey;
use crate::key::node_address::NodeAddressKey;
use crate::key::schema_name::SchemaNameKey;
use crate::key::table_info::TableInfoKey;
use crate::key::table_name::TableNameKey;
@@ -53,6 +54,10 @@ pub struct Context {
#[async_trait::async_trait]
pub trait CacheInvalidator: Send + Sync {
async fn invalidate(&self, ctx: &Context, caches: &[CacheIdent]) -> Result<()>;
fn name(&self) -> &'static str {
std::any::type_name::<Self>()
}
}
pub type CacheInvalidatorRef = Arc<dyn CacheInvalidator>;
@@ -137,6 +142,13 @@ where
let key = FlowInfoKey::new(*flow_id);
self.invalidate_key(&key.to_bytes()).await;
}
CacheIdent::FlowNodeAddressChange(node_id) => {
// other caches doesn't need to be invalidated
// since this is only for flownode address change not id change
common_telemetry::info!("Invalidate flow node cache for node_id: {}", node_id);
let key = NodeAddressKey::with_flownode(*node_id);
self.invalidate_key(&key.to_bytes()).await;
}
}
}
Ok(())

View File

@@ -174,6 +174,8 @@ pub struct UpgradeRegion {
/// The identifier of cache.
pub enum CacheIdent {
FlowId(FlowId),
/// Indicate change of address of flownode.
FlowNodeAddressChange(u64),
FlowName(FlowName),
TableId(TableId),
TableName(TableName),

View File

@@ -222,6 +222,7 @@ pub struct RecordBatchStreamAdapter {
enum Metrics {
Unavailable,
Unresolved(Arc<dyn ExecutionPlan>),
PartialResolved(Arc<dyn ExecutionPlan>, RecordBatchMetrics),
Resolved(RecordBatchMetrics),
}
@@ -275,7 +276,9 @@ impl RecordBatchStream for RecordBatchStreamAdapter {
fn metrics(&self) -> Option<RecordBatchMetrics> {
match &self.metrics_2 {
Metrics::Resolved(metrics) => Some(metrics.clone()),
Metrics::Resolved(metrics) | Metrics::PartialResolved(_, metrics) => {
Some(metrics.clone())
}
Metrics::Unavailable | Metrics::Unresolved(_) => None,
}
}
@@ -299,13 +302,25 @@ impl Stream for RecordBatchStreamAdapter {
Poll::Pending => Poll::Pending,
Poll::Ready(Some(df_record_batch)) => {
let df_record_batch = df_record_batch?;
if let Metrics::Unresolved(df_plan) | Metrics::PartialResolved(df_plan, _) =
&self.metrics_2
{
let mut metric_collector = MetricCollector::new(self.explain_verbose);
accept(df_plan.as_ref(), &mut metric_collector).unwrap();
self.metrics_2 = Metrics::PartialResolved(
df_plan.clone(),
metric_collector.record_batch_metrics,
);
}
Poll::Ready(Some(RecordBatch::try_from_df_record_batch(
self.schema(),
df_record_batch,
)))
}
Poll::Ready(None) => {
if let Metrics::Unresolved(df_plan) = &self.metrics_2 {
if let Metrics::Unresolved(df_plan) | Metrics::PartialResolved(df_plan, _) =
&self.metrics_2
{
let mut metric_collector = MetricCollector::new(self.explain_verbose);
accept(df_plan.as_ref(), &mut metric_collector).unwrap();
self.metrics_2 = Metrics::Resolved(metric_collector.record_batch_metrics);

View File

@@ -14,6 +14,7 @@ workspace = true
[dependencies]
backtrace = "0.3"
common-error.workspace = true
common-version.workspace = true
console-subscriber = { version = "0.1", optional = true }
greptime-proto.workspace = true
humantime-serde.workspace = true

View File

@@ -384,7 +384,7 @@ pub fn init_global_logging(
resource::SERVICE_INSTANCE_ID,
node_id.unwrap_or("none".to_string()),
),
KeyValue::new(resource::SERVICE_VERSION, env!("CARGO_PKG_VERSION")),
KeyValue::new(resource::SERVICE_VERSION, common_version::version()),
KeyValue::new(resource::PROCESS_PID, std::process::id().to_string()),
]));

View File

@@ -17,4 +17,5 @@ shadow-rs.workspace = true
[build-dependencies]
build-data = "0.2"
cargo-manifest = "0.19"
shadow-rs.workspace = true

View File

@@ -14,8 +14,10 @@
use std::collections::BTreeSet;
use std::env;
use std::path::PathBuf;
use build_data::{format_timestamp, get_source_time};
use cargo_manifest::Manifest;
use shadow_rs::{BuildPattern, ShadowBuilder, CARGO_METADATA, CARGO_TREE};
fn main() -> shadow_rs::SdResult<()> {
@@ -33,6 +35,24 @@ fn main() -> shadow_rs::SdResult<()> {
// solve the problem where the "CARGO_MANIFEST_DIR" is not what we want when this repo is
// made as a submodule in another repo.
let src_path = env::var("CARGO_WORKSPACE_DIR").or_else(|_| env::var("CARGO_MANIFEST_DIR"))?;
let manifest = Manifest::from_path(PathBuf::from(&src_path).join("Cargo.toml"))
.expect("Failed to parse Cargo.toml");
if let Some(product_version) = manifest.workspace.as_ref().and_then(|w| {
w.metadata.as_ref().and_then(|m| {
m.get("greptime")
.and_then(|g| g.get("product_version").and_then(|v| v.as_str()))
})
}) {
println!(
"cargo:rustc-env=GREPTIME_PRODUCT_VERSION={}",
product_version
);
} else {
let version = env::var("CARGO_PKG_VERSION").unwrap();
println!("cargo:rustc-env=GREPTIME_PRODUCT_VERSION={}", version,);
}
let out_path = env::var("OUT_DIR")?;
let _ = ShadowBuilder::builder()

View File

@@ -105,13 +105,17 @@ pub const fn build_info() -> BuildInfo {
build_time: env!("BUILD_TIMESTAMP"),
rustc: build::RUST_VERSION,
target: build::BUILD_TARGET,
version: build::PKG_VERSION,
version: env!("GREPTIME_PRODUCT_VERSION"),
}
}
const BUILD_INFO: BuildInfo = build_info();
pub const fn version() -> &'static str {
BUILD_INFO.version
}
pub const fn verbose_version() -> &'static str {
const_format::formatcp!(
"\nbranch: {}\ncommit: {}\nclean: {}\nversion: {}",
BUILD_INFO.branch,

View File

@@ -27,14 +27,14 @@ lazy_static! {
pub static ref HANDLE_REGION_REQUEST_ELAPSED: HistogramVec = register_histogram_vec!(
"greptime_datanode_handle_region_request_elapsed",
"datanode handle region request elapsed",
&[REGION_ID, REGION_REQUEST_TYPE]
&[REGION_REQUEST_TYPE]
)
.unwrap();
/// The number of rows in region request received by region server, labeled with request type.
pub static ref REGION_CHANGED_ROW_COUNT: IntCounterVec = register_int_counter_vec!(
"greptime_datanode_region_changed_row_count",
"datanode region changed row count",
&[REGION_ID, REGION_REQUEST_TYPE]
&[REGION_REQUEST_TYPE]
)
.unwrap();
/// The elapsed time since the last received heartbeat.

View File

@@ -51,7 +51,7 @@ use servers::error::{self as servers_error, ExecuteGrpcRequestSnafu, Result as S
use servers::grpc::flight::{FlightCraft, FlightRecordBatchStream, TonicStream};
use servers::grpc::region_server::RegionServerHandler;
use servers::grpc::FlightCompression;
use session::context::{QueryContextBuilder, QueryContextRef};
use session::context::{QueryContext, QueryContextBuilder, QueryContextRef};
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metric_engine_consts::{
FILE_ENGINE_NAME, LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME,
@@ -194,6 +194,7 @@ impl RegionServer {
pub async fn handle_remote_read(
&self,
request: api::v1::region::QueryRequest,
query_ctx: QueryContextRef,
) -> Result<SendableRecordBatchStream> {
let _permit = if let Some(p) = &self.inner.parallelism {
Some(p.acquire().await?)
@@ -201,12 +202,6 @@ impl RegionServer {
None
};
let query_ctx: QueryContextRef = request
.header
.as_ref()
.map(|h| Arc::new(h.into()))
.unwrap_or_else(|| Arc::new(QueryContextBuilder::default().build()));
let region_id = RegionId::from_u64(request.region_id);
let provider = self.table_provider(region_id, Some(&query_ctx)).await?;
let catalog_list = Arc::new(DummyCatalogList::with_table_provider(provider));
@@ -214,7 +209,7 @@ impl RegionServer {
let decoder = self
.inner
.query_engine
.engine_context(query_ctx)
.engine_context(query_ctx.clone())
.new_plan_decoder()
.context(NewPlanDecoderSnafu)?;
@@ -224,11 +219,14 @@ impl RegionServer {
.context(DecodeLogicalPlanSnafu)?;
self.inner
.handle_read(QueryRequest {
header: request.header,
region_id,
plan,
})
.handle_read(
QueryRequest {
header: request.header,
region_id,
plan,
},
query_ctx,
)
.await
}
@@ -243,6 +241,7 @@ impl RegionServer {
let ctx: Option<session::context::QueryContext> = request.header.as_ref().map(|h| h.into());
let provider = self.table_provider(request.region_id, ctx.as_ref()).await?;
let query_ctx = Arc::new(ctx.unwrap_or_else(|| QueryContextBuilder::default().build()));
struct RegionDataSourceInjector {
source: Arc<dyn TableSource>,
@@ -271,7 +270,7 @@ impl RegionServer {
.data;
self.inner
.handle_read(QueryRequest { plan, ..request })
.handle_read(QueryRequest { plan, ..request }, query_ctx)
.await
}
@@ -536,9 +535,14 @@ impl FlightCraft for RegionServer {
.as_ref()
.map(|h| TracingContext::from_w3c(&h.tracing_context))
.unwrap_or_default();
let query_ctx = request
.header
.as_ref()
.map(|h| Arc::new(QueryContext::from(h)))
.unwrap_or(QueryContext::arc());
let result = self
.handle_remote_read(request)
.handle_remote_read(request, query_ctx.clone())
.trace(tracing_context.attach(info_span!("RegionServer::handle_read")))
.await?;
@@ -546,6 +550,7 @@ impl FlightCraft for RegionServer {
result,
tracing_context,
self.flight_compression,
query_ctx,
));
Ok(Response::new(stream))
}
@@ -915,9 +920,8 @@ impl RegionServerInner {
request: RegionRequest,
) -> Result<RegionResponse> {
let request_type = request.request_type();
let region_id_str = region_id.to_string();
let _timer = crate::metrics::HANDLE_REGION_REQUEST_ELAPSED
.with_label_values(&[&region_id_str, request_type])
.with_label_values(&[request_type])
.start_timer();
let region_change = match &request {
@@ -957,7 +961,7 @@ impl RegionServerInner {
// Update metrics
if matches!(region_change, RegionChange::Ingest) {
crate::metrics::REGION_CHANGED_ROW_COUNT
.with_label_values(&[&region_id_str, request_type])
.with_label_values(&[request_type])
.inc_by(result.affected_rows as u64);
}
// Sets corresponding region status to ready.
@@ -1124,16 +1128,13 @@ impl RegionServerInner {
Ok(())
}
pub async fn handle_read(&self, request: QueryRequest) -> Result<SendableRecordBatchStream> {
pub async fn handle_read(
&self,
request: QueryRequest,
query_ctx: QueryContextRef,
) -> Result<SendableRecordBatchStream> {
// TODO(ruihang): add metrics and set trace id
// Build query context from gRPC header
let query_ctx: QueryContextRef = request
.header
.as_ref()
.map(|h| Arc::new(h.into()))
.unwrap_or_else(|| QueryContextBuilder::default().build().into());
let result = self
.query_engine
.execute(request.plan, query_ctx)

View File

@@ -527,7 +527,7 @@ pub struct FulltextOptions {
#[serde(default = "fulltext_options_default_granularity")]
pub granularity: u32,
/// The false positive rate of the fulltext index (for bloom backend only)
#[serde(default = "fulltext_options_default_false_positive_rate_in_10000")]
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
}
@@ -535,7 +535,7 @@ fn fulltext_options_default_granularity() -> u32 {
DEFAULT_GRANULARITY
}
fn fulltext_options_default_false_positive_rate_in_10000() -> u32 {
fn index_options_default_false_positive_rate_in_10000() -> u32 {
(DEFAULT_FALSE_POSITIVE_RATE * 10000.0) as u32
}
@@ -773,6 +773,7 @@ pub struct SkippingIndexOptions {
/// The granularity of the skip index.
pub granularity: u32,
/// The false positive rate of the skip index (in ten-thousandths, e.g., 100 = 1%).
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
/// The type of the skip index.
#[serde(default)]
@@ -1179,4 +1180,59 @@ mod tests {
assert!(column_schema.default_constraint.is_none());
assert!(column_schema.metadata.is_empty());
}
#[test]
fn test_skipping_index_options_deserialization() {
let original_options = "{\"granularity\":1024,\"false-positive-rate-in-10000\":10,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(original_options).unwrap();
assert_eq!(1024, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_skipping_index_options_deserialization_v0_14_to_v0_15() {
let options = "{\"granularity\":10240,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(options).unwrap();
assert_eq!(10240, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"granularity\":10240,\"false-positive-rate-in-10000\":100,\"index-type\":\"BloomFilter\"}");
}
#[test]
fn test_fulltext_options_deserialization() {
let original_options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":1024,\"false-positive-rate-in-10000\":10}";
let options = serde_json::from_str::<FulltextOptions>(original_options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(1024, options.granularity);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_fulltext_options_deserialization_v0_14_to_v0_15() {
// 0.14 to 0.15
let options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\"}";
let options = serde_json::from_str::<FulltextOptions>(options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(DEFAULT_GRANULARITY, options.granularity);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":10240,\"false-positive-rate-in-10000\":100}");
}
}
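The tests above rely on the usual serde backward-compatibility trick: the field added in 0.15 carries a #[serde(default = ...)] so options serialized by 0.14 still deserialize. A minimal standalone illustration; the struct and default value here are simplified stand-ins, not the real SkippingIndexOptions:

```rust
use serde::{Deserialize, Serialize};

fn default_false_positive_rate_in_10000() -> u32 {
    100 // i.e. 1%
}

/// Simplified stand-in for the options struct; not the real definition.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Options {
    granularity: u32,
    // Field introduced in the newer version: old JSON without it still
    // deserializes because serde fills in the default.
    #[serde(
        default = "default_false_positive_rate_in_10000",
        rename = "false-positive-rate-in-10000"
    )]
    false_positive_rate_in_10000: u32,
}

fn main() {
    // Payload written by the older version, missing the new field.
    let old = r#"{"granularity":10240}"#;
    let opts: Options = serde_json::from_str(old).unwrap();
    assert_eq!(opts.false_positive_rate_in_10000, 100);
}
```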

View File

@@ -12,6 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use arrow_array::{
ArrayRef, PrimitiveArray, TimestampMicrosecondArray, TimestampMillisecondArray,
TimestampNanosecondArray, TimestampSecondArray,
};
use arrow_schema::DataType;
use common_time::timestamp::TimeUnit;
use common_time::Timestamp;
use paste::paste;
@@ -138,6 +143,41 @@ define_timestamp_with_unit!(Millisecond);
define_timestamp_with_unit!(Microsecond);
define_timestamp_with_unit!(Nanosecond);
pub fn timestamp_array_to_primitive(
ts_array: &ArrayRef,
) -> Option<(
PrimitiveArray<arrow_array::types::Int64Type>,
arrow::datatypes::TimeUnit,
)> {
let DataType::Timestamp(unit, _) = ts_array.data_type() else {
return None;
};
let ts_primitive = match unit {
arrow_schema::TimeUnit::Second => ts_array
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Millisecond => ts_array
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Microsecond => ts_array
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Nanosecond => ts_array
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
};
Some((ts_primitive, *unit))
}
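
A hypothetical caller of the helper above, assuming the function is in scope and the `arrow_array`, `arrow_schema`, and `arrow` crates are available; the sample values are made up:

```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, TimestampMillisecondArray};

fn main() {
    let ts: ArrayRef = Arc::new(TimestampMillisecondArray::from(vec![1_000_i64, 2_000, 3_000]));
    // The helper reinterprets the timestamp values as plain i64 without copying
    // and also reports which TimeUnit they carry.
    let (primitive, unit) = timestamp_array_to_primitive(&ts).expect("a timestamp array");
    assert_eq!(unit, arrow_schema::TimeUnit::Millisecond);
    assert_eq!(arrow::compute::min(&primitive), Some(1_000));
}
```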
#[cfg(test)]
mod tests {
use common_time::timezone::set_default_timezone;

View File

@@ -14,7 +14,7 @@
//! Batching mode engine
use std::collections::{BTreeMap, HashMap};
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::Arc;
use api::v1::flow::{DirtyWindowRequests, FlowResponse};
@@ -142,7 +142,7 @@ impl BatchingEngine {
let handle: JoinHandle<Result<(), Error>> = tokio::spawn(async move {
let src_table_names = &task.config.source_table_names;
let mut all_dirty_windows = vec![];
let mut all_dirty_windows = HashSet::new();
for src_table_name in src_table_names {
if let Some((timestamps, unit)) = group_by_table_name.get(src_table_name) {
let Some(expr) = &task.config.time_window_expr else {
@@ -155,7 +155,7 @@ impl BatchingEngine {
.context(UnexpectedSnafu {
reason: "Failed to eval start value",
})?;
all_dirty_windows.push(align_start);
all_dirty_windows.insert(align_start);
}
}
}

View File

@@ -50,7 +50,8 @@ use snafu::{ensure, OptionExt, ResultExt};
use crate::adapter::util::from_proto_to_data_type;
use crate::error::{
ArrowSnafu, DatafusionSnafu, DatatypesSnafu, ExternalSnafu, PlanSnafu, UnexpectedSnafu,
ArrowSnafu, DatafusionSnafu, DatatypesSnafu, ExternalSnafu, PlanSnafu, TimeSnafu,
UnexpectedSnafu,
};
use crate::expr::error::DataTypeSnafu;
use crate::Error;
@@ -74,6 +75,7 @@ pub struct TimeWindowExpr {
logical_expr: Expr,
df_schema: DFSchema,
eval_time_window_size: Option<std::time::Duration>,
eval_time_original: Option<Timestamp>,
}
impl std::fmt::Display for TimeWindowExpr {
@@ -106,10 +108,11 @@ impl TimeWindowExpr {
logical_expr: expr.clone(),
df_schema: df_schema.clone(),
eval_time_window_size: None,
eval_time_original: None,
};
let test_ts = DEFAULT_TEST_TIMESTAMP;
let (l, u) = zelf.eval(test_ts)?;
let time_window_size = match (l, u) {
let (lower, upper) = zelf.eval(test_ts)?;
let time_window_size = match (lower, upper) {
(Some(l), Some(u)) => u.sub(&l).map(|r| r.to_std()).transpose().map_err(|_| {
UnexpectedSnafu {
reason: format!(
@@ -121,13 +124,59 @@ impl TimeWindowExpr {
_ => None,
};
zelf.eval_time_window_size = time_window_size;
zelf.eval_time_original = lower;
Ok(zelf)
}
/// TODO(discord9): add `eval_batch` too
pub fn eval(
&self,
current: Timestamp,
) -> Result<(Option<Timestamp>, Option<Timestamp>), Error> {
fn compute_distance(time_diff_ns: i64, stride_ns: i64) -> i64 {
if stride_ns == 0 {
return time_diff_ns;
}
// a - (a % n) rounds toward zero to a multiple of the stride (the branch below turns it into a floor)
let time_delta = time_diff_ns - (time_diff_ns % stride_ns);
if time_diff_ns < 0 && time_delta != time_diff_ns {
// The origin is later than the source timestamp, round down to the previous bin
time_delta - stride_ns
} else {
time_delta
}
}
// FAST PATH: if we have eval_time_original and eval_time_window_size,
// we can compute the bounds directly
if let (Some(original), Some(window_size)) =
(self.eval_time_original, self.eval_time_window_size)
{
// date_bin align current to lower bound
let time_diff_ns = current.sub(&original).and_then(|s| s.num_nanoseconds()).with_context(|| UnexpectedSnafu {
reason: format!(
"Failed to compute time difference between current {current:?} and original {original:?}"
),
})?;
let window_size_ns = window_size.as_nanos() as i64;
let distance_ns = compute_distance(time_diff_ns, window_size_ns);
let lower_bound = if distance_ns >= 0 {
original.add_duration(std::time::Duration::from_nanos(distance_ns as u64))
} else {
original.sub_duration(std::time::Duration::from_nanos((-distance_ns) as u64))
}
.context(TimeSnafu)?;
let upper_bound = lower_bound.add_duration(window_size).context(TimeSnafu)?;
return Ok((Some(lower_bound), Some(upper_bound)));
}
let lower_bound =
calc_expr_time_window_lower_bound(&self.phy_expr, &self.df_schema, current)?;
let upper_bound =

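The fast path above is essentially a `date_bin`-style alignment of `current` against the remembered origin. A minimal standalone sketch of the arithmetic, assuming both timestamps and the window size are plain `i64` nanosecond values (the names here are illustrative, not part of the patch):

```rust
// Align `current` to the window that contains it, given an origin and a window size.
fn compute_distance(time_diff_ns: i64, stride_ns: i64) -> i64 {
    if stride_ns == 0 {
        return time_diff_ns;
    }
    // Truncate toward zero to a multiple of the stride...
    let time_delta = time_diff_ns - (time_diff_ns % stride_ns);
    if time_diff_ns < 0 && time_delta != time_diff_ns {
        // ...and step back one stride for negative diffs, so the result is always a floor.
        time_delta - stride_ns
    } else {
        time_delta
    }
}

fn window_bounds(current_ns: i64, origin_ns: i64, window_ns: i64) -> (i64, i64) {
    let lower = origin_ns + compute_distance(current_ns - origin_ns, window_ns);
    (lower, lower + window_ns)
}

fn main() {
    // With a 10s window anchored at origin 0, t = 12s falls into [10s, 20s).
    assert_eq!(
        window_bounds(12_000_000_000, 0, 10_000_000_000),
        (10_000_000_000, 20_000_000_000)
    );
    // Timestamps before the origin floor to the previous bin: t = -3s -> [-10s, 0s).
    assert_eq!(
        window_bounds(-3_000_000_000, 0, 10_000_000_000),
        (-10_000_000_000, 0)
    );
}
```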
View File

@@ -380,6 +380,13 @@ impl SqlQueryHandler for Instance {
.and_then(|stmts| query_interceptor.post_parsing(stmts, query_ctx.clone()))
{
Ok(stmts) => {
if stmts.is_empty() {
return vec![InvalidSqlSnafu {
err_msg: "empty statements",
}
.fail()];
}
let mut results = Vec::with_capacity(stmts.len());
for stmt in stmts {
if let Err(e) = checker

View File

@@ -13,6 +13,7 @@
// limitations under the License.
use api::v1::meta::{HeartbeatRequest, Peer, Role};
use common_meta::instruction::CacheIdent;
use common_meta::key::node_address::{NodeAddressKey, NodeAddressValue};
use common_meta::key::{MetadataKey, MetadataValue};
use common_meta::rpc::store::PutRequest;
@@ -80,7 +81,19 @@ async fn rewrite_node_address(ctx: &mut Context, peer: &Peer) {
match ctx.leader_cached_kv_backend.put(put).await {
Ok(_) => {
info!("Successfully updated flow `NodeAddressValue`: {:?}", peer);
// TODO(discord): broadcast invalidating cache to all frontends
// broadcast invalidating cache to all frontends
let cache_idents = vec![CacheIdent::FlowNodeAddressChange(peer.id)];
info!(
"Invalidate flow node cache for new address with cache idents: {:?}",
cache_idents
);
if let Err(e) = ctx
.cache_invalidator
.invalidate(&Default::default(), &cache_idents)
.await
{
error!(e; "Failed to invalidate {} `NodeAddressKey` cache, peer: {:?}", cache_idents.len(), peer);
}
}
Err(e) => {
error!(e; "Failed to update flow `NodeAddressValue`: {:?}", peer);

View File

@@ -473,8 +473,9 @@ struct MetricEngineInner {
mod test {
use std::collections::HashMap;
use common_telemetry::info;
use store_api::metric_engine_consts::PHYSICAL_TABLE_METADATA_KEY;
use store_api::region_request::{RegionCloseRequest, RegionOpenRequest};
use store_api::region_request::{RegionCloseRequest, RegionFlushRequest, RegionOpenRequest};
use super::*;
use crate::test_util::TestEnv;
@@ -559,4 +560,90 @@ mod test {
assert!(env.metric().region_statistic(logical_region_id).is_none());
assert!(env.metric().region_statistic(physical_region_id).is_some());
}
#[tokio::test]
async fn test_open_region_failure() {
let env = TestEnv::new().await;
env.init_metric_region().await;
let physical_region_id = env.default_physical_region_id();
let metric_engine = env.metric();
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Flush(RegionFlushRequest {
row_group_size: None,
}),
)
.await
.unwrap();
let path = format!("{}/metadata/", env.default_region_dir());
let object_store = env.get_object_store().unwrap();
let list = object_store.list(&path).await.unwrap();
// Delete parquet files in metadata region
for entry in list {
if entry.metadata().is_dir() {
continue;
}
if entry.name().ends_with("parquet") {
info!("deleting {}", entry.path());
object_store.delete(entry.path()).await.unwrap();
}
}
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
// Opening an already opened region should succeed.
// Since the region is already open, no metadata recovery operations will be performed.
metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap();
// Close the region
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Close(RegionCloseRequest {}),
)
.await
.unwrap();
// Try to reopen region.
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
let err = metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap_err();
// Failed to open region because of missing parquet files.
assert_eq!(err.status_code(), StatusCode::StorageUnavailable);
let mito_engine = metric_engine.mito();
let data_region_id = utils::to_data_region_id(physical_region_id);
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
// The metadata/data region should be closed.
let err = mito_engine.get_metadata(data_region_id).await.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
let err = mito_engine
.get_metadata(metadata_region_id)
.await
.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
}
}

View File

@@ -59,7 +59,7 @@ impl MetricEngineInner {
}
}
async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
pub(crate) async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
let data_region_id = utils::to_data_region_id(region_id);
let metadata_region_id = utils::to_metadata_region_id(region_id);

View File

@@ -17,7 +17,7 @@
use api::region::RegionResponse;
use api::v1::SemanticType;
use common_error::ext::BoxedError;
use common_telemetry::info;
use common_telemetry::{error, info, warn};
use datafusion::common::HashMap;
use mito2::engine::MITO_ENGINE_NAME;
use object_store::util::join_dir;
@@ -94,6 +94,21 @@ impl MetricEngineInner {
Ok(responses)
}
// If the metadata region is opened with a stale manifest,
// the metric engine may fail to recover logical tables from the metadata region,
// as the manifest could reference files that have already been deleted
// due to compaction operations performed by the region leader.
async fn close_physical_region_on_recovery_failure(&self, physical_region_id: RegionId) {
info!(
"Closing metadata region {} and data region {} on metadata recovery failure",
utils::to_metadata_region_id(physical_region_id),
utils::to_data_region_id(physical_region_id)
);
if let Err(err) = self.close_physical_region(physical_region_id).await {
error!(err; "Failed to close physical region {}", physical_region_id);
}
}
async fn open_physical_region_with_results(
&self,
metadata_region_result: Option<std::result::Result<RegionResponse, BoxedError>>,
@@ -119,8 +134,14 @@ impl MetricEngineInner {
region_type: "data",
})?;
self.recover_states(physical_region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(physical_region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(physical_region_id)
.await;
return Err(err);
}
Ok(data_region_response)
}
@@ -139,11 +160,31 @@ impl MetricEngineInner {
request: RegionOpenRequest,
) -> Result<AffectedRows> {
if request.is_physical_table() {
if self
.state
.read()
.unwrap()
.physical_region_states()
.get(&region_id)
.is_some()
{
warn!(
"The physical region {} is already open, ignore the open request",
region_id
);
return Ok(0);
}
// open physical region and recover states
let physical_region_options = PhysicalRegionOptions::try_from(&request.options)?;
self.open_physical_region(region_id, request).await?;
self.recover_states(region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(region_id)
.await;
return Err(err);
}
Ok(0)
} else {

View File

@@ -23,6 +23,7 @@ use mito2::config::MitoConfig;
use mito2::engine::MitoEngine;
use mito2::test_util::TestEnv as MitoTestEnv;
use object_store::util::join_dir;
use object_store::ObjectStore;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{
LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME, PHYSICAL_TABLE_METADATA_KEY,
@@ -74,6 +75,10 @@ impl TestEnv {
join_dir(&env_root, "data")
}
pub fn get_object_store(&self) -> Option<ObjectStore> {
self.mito_env.get_object_store()
}
/// Returns a reference to the engine.
pub fn mito(&self) -> MitoEngine {
self.mito.clone()

View File

@@ -62,7 +62,7 @@ use crate::read::BoxedBatchReader;
use crate::region::options::MergeMode;
use crate::region::version::VersionControlRef;
use crate::region::ManifestContextRef;
use crate::request::{OptionOutputTx, OutputTx, WorkerRequest};
use crate::request::{OptionOutputTx, OutputTx, WorkerRequestWithTime};
use crate::schedule::remote_job_scheduler::{
CompactionJob, DefaultNotifier, RemoteJob, RemoteJobSchedulerRef,
};
@@ -77,7 +77,7 @@ pub struct CompactionRequest {
pub(crate) current_version: CompactionVersion,
pub(crate) access_layer: AccessLayerRef,
/// Sender to send notification to the region worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
/// Waiters of the compaction request.
pub(crate) waiters: Vec<OutputTx>,
/// Start time of compaction task.
@@ -101,7 +101,7 @@ pub(crate) struct CompactionScheduler {
/// Compacting regions.
region_status: HashMap<RegionId, CompactionStatus>,
/// Request sender of the worker that this scheduler belongs to.
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
cache_manager: CacheManagerRef,
engine_config: Arc<MitoConfig>,
listener: WorkerListener,
@@ -112,7 +112,7 @@ pub(crate) struct CompactionScheduler {
impl CompactionScheduler {
pub(crate) fn new(
scheduler: SchedulerRef,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
cache_manager: CacheManagerRef,
engine_config: Arc<MitoConfig>,
listener: WorkerListener,
@@ -559,7 +559,7 @@ impl CompactionStatus {
#[allow(clippy::too_many_arguments)]
fn new_compaction_request(
&mut self,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
mut waiter: OptionOutputTx,
engine_config: Arc<MitoConfig>,
cache_manager: CacheManagerRef,

View File

@@ -27,6 +27,7 @@ use crate::manifest::action::RegionEdit;
use crate::metrics::{COMPACTION_FAILURE_COUNT, COMPACTION_STAGE_ELAPSED};
use crate::request::{
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, WorkerRequest,
WorkerRequestWithTime,
};
use crate::worker::WorkerListener;
use crate::{error, metrics};
@@ -37,7 +38,7 @@ pub const MAX_PARALLEL_COMPACTION: usize = 1;
pub(crate) struct CompactionTaskImpl {
pub compaction_region: CompactionRegion,
/// Request sender to notify the worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
/// Senders that are used to notify waiters waiting for pending compaction tasks.
pub waiters: Vec<OutputTx>,
/// Start time of compaction task
@@ -135,7 +136,11 @@ impl CompactionTaskImpl {
/// Notifies region worker to handle post-compaction tasks.
async fn send_to_worker(&self, request: WorkerRequest) {
if let Err(e) = self.request_sender.send(request).await {
if let Err(e) = self
.request_sender
.send(WorkerRequestWithTime::new(request))
.await
{
error!(
"Failed to notify compaction job status for region {}, request: {:?}",
self.compaction_region.region_id, e.0

View File

@@ -1020,6 +1020,18 @@ pub enum Error {
location: Location,
source: mito_codec::error::Error,
},
#[snafu(display(
"Inconsistent timestamp column length, expect: {}, actual: {}",
expected,
actual
))]
InconsistentTimestampLength {
expected: usize,
actual: usize,
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T, E = Error> = std::result::Result<T, E>;
@@ -1175,6 +1187,8 @@ impl ErrorExt for Error {
ConvertBulkWalEntry { source, .. } => source.status_code(),
Encode { source, .. } | Decode { source, .. } => source.status_code(),
InconsistentTimestampLength { .. } => StatusCode::InvalidArguments,
}
}

View File

@@ -42,7 +42,7 @@ use crate::region::version::{VersionControlData, VersionControlRef};
use crate::region::{ManifestContextRef, RegionLeaderState};
use crate::request::{
BackgroundNotify, FlushFailed, FlushFinished, OptionOutputTx, OutputTx, SenderBulkRequest,
SenderDdlRequest, SenderWriteRequest, WorkerRequest,
SenderDdlRequest, SenderWriteRequest, WorkerRequest, WorkerRequestWithTime,
};
use crate::schedule::scheduler::{Job, SchedulerRef};
use crate::sst::file::FileMeta;
@@ -223,7 +223,7 @@ pub(crate) struct RegionFlushTask {
/// Flush result senders.
pub(crate) senders: Vec<OutputTx>,
/// Request sender to notify the worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
pub(crate) access_layer: AccessLayerRef,
pub(crate) listener: WorkerListener,
@@ -441,7 +441,11 @@ impl RegionFlushTask {
/// Notify flush job status.
async fn send_worker_request(&self, request: WorkerRequest) {
if let Err(e) = self.request_sender.send(request).await {
if let Err(e) = self
.request_sender
.send(WorkerRequestWithTime::new(request))
.await
{
error!(
"Failed to notify flush job status for region {}, request: {:?}",
self.region_id, e.0

View File

@@ -126,7 +126,12 @@ impl From<&BulkPart> for BulkWalEntry {
impl BulkPart {
pub(crate) fn estimated_size(&self) -> usize {
self.batch.get_array_memory_size()
self.batch
.columns()
.iter()
// If the slice memory size cannot be determined, assume 0 here.
.map(|c| c.to_data().get_slice_memory_size().unwrap_or(0))
.sum()
}
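
The switch above matters because `get_array_memory_size` counts the capacity of the underlying buffers, which a sliced batch still shares with its parent, while `get_slice_memory_size` only accounts for the rows the slice actually references. A rough illustration, assuming the `arrow` crate is in scope; exact byte counts depend on the arrow version:

```rust
use arrow::array::{Array, Int64Array};

fn main() {
    let full = Int64Array::from_iter_values(0..100_000);
    // A 10-element slice still shares the parent's roughly 800 KB buffer.
    let slice = full.slice(0, 10);
    let whole_buffer = slice.get_array_memory_size();
    let slice_only = slice.to_data().get_slice_memory_size().unwrap_or(0);
    // `whole_buffer` reflects the shared backing buffer, `slice_only` reflects
    // just the 10 referenced values, which is the estimate the bulk ingester wants.
    assert!(slice_only < whole_buffer);
    println!("whole buffer: {whole_buffer} B, slice: {slice_only} B");
}
```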
/// Converts [BulkPart] to [Mutation] for fallback `write_bulk` implementation.

View File

@@ -94,12 +94,7 @@ lazy_static! {
// ------ Write related metrics
/// Number of stalled write requests in each worker.
pub static ref WRITE_STALL_TOTAL: IntGaugeVec = register_int_gauge_vec!(
"greptime_mito_write_stall_total",
"mito stalled write request in each worker",
&[WORKER_LABEL]
).unwrap();
//
/// Counter of rejected write requests.
pub static ref WRITE_REJECT_TOTAL: IntCounter =
register_int_counter!("greptime_mito_write_reject_total", "mito write reject total").unwrap();
@@ -402,6 +397,7 @@ lazy_static! {
}
// Use another block to avoid reaching the recursion limit.
lazy_static! {
/// Counter for compaction input file size.
pub static ref COMPACTION_INPUT_BYTES: Counter = register_counter!(
@@ -426,6 +422,27 @@ lazy_static! {
"greptime_mito_memtable_field_builder_count",
"active field builder count in TimeSeriesMemtable",
).unwrap();
/// Number of stalling write requests in each worker.
pub static ref WRITE_STALLING: IntGaugeVec = register_int_gauge_vec!(
"greptime_mito_write_stalling_count",
"mito stalled write request in each worker",
&[WORKER_LABEL]
).unwrap();
/// Total number of stalled write requests.
pub static ref WRITE_STALL_TOTAL: IntCounter = register_int_counter!(
"greptime_mito_write_stall_total",
"Total number of stalled write requests"
).unwrap();
/// Time waiting for requests to be handled by the region worker.
pub static ref REQUEST_WAIT_TIME: HistogramVec = register_histogram_vec!(
"greptime_mito_request_wait_time",
"mito request wait time before being handled by region worker",
&[WORKER_LABEL],
// 0.001 ~ 10000
exponential_buckets(0.001, 10.0, 8).unwrap(),
)
.unwrap();
}
/// Stager notifier to collect metrics.

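For reference, the bucket layout hinted at by the `// 0.001 ~ 10000` comment can be checked directly, assuming the `prometheus` crate is available:

```rust
fn main() {
    // Eight buckets spanning 0.001s to 10000s in powers of ten.
    let buckets = prometheus::exponential_buckets(0.001, 10.0, 8).unwrap();
    assert_eq!(buckets.len(), 8);
    assert!((buckets[0] - 0.001).abs() < 1e-9);
    assert!((buckets[7] - 10_000.0).abs() < 1e-3);
}
```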
View File

@@ -542,6 +542,22 @@ pub(crate) struct SenderBulkRequest {
pub(crate) region_metadata: RegionMetadataRef,
}
/// Request sent to a worker, together with its creation timestamp.
#[derive(Debug)]
pub(crate) struct WorkerRequestWithTime {
pub(crate) request: WorkerRequest,
pub(crate) created_at: Instant,
}
impl WorkerRequestWithTime {
pub(crate) fn new(request: WorkerRequest) -> Self {
Self {
request,
created_at: Instant::now(),
}
}
}
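
The wrapper above implements a common queue-wait measurement pattern: stamp the request when it is enqueued and observe `created_at.elapsed()` when the worker dequeues it. A minimal std-only sketch of the idea (illustrative types; the real code feeds the value into the per-worker `REQUEST_WAIT_TIME` histogram):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

struct TimedRequest<T> {
    request: T,
    created_at: Instant,
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(TimedRequest {
        request: "write",
        created_at: Instant::now(),
    })
    .unwrap();
    // Simulate the worker being busy before it drains its channel.
    thread::sleep(Duration::from_millis(5));
    let timed = rx.recv().unwrap();
    // This is the duration the request spent waiting in the queue.
    let wait = timed.created_at.elapsed();
    println!("{} waited {:?} in the queue", timed.request, wait);
}
```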
/// Request sent to a worker
#[derive(Debug)]
pub(crate) enum WorkerRequest {

View File

@@ -30,6 +30,7 @@ use crate::manifest::action::RegionEdit;
use crate::metrics::{COMPACTION_FAILURE_COUNT, INFLIGHT_COMPACTION_COUNT};
use crate::request::{
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, WorkerRequest,
WorkerRequestWithTime,
};
pub type RemoteJobSchedulerRef = Arc<dyn RemoteJobScheduler>;
@@ -130,7 +131,7 @@ pub struct CompactionJobResult {
/// DefaultNotifier is a default implementation of Notifier that sends WorkerRequest to the mito engine.
pub(crate) struct DefaultNotifier {
/// The sender to send WorkerRequest to the mito engine. This is used to notify the mito engine when a remote job is completed.
pub(crate) request_sender: Sender<WorkerRequest>,
pub(crate) request_sender: Sender<WorkerRequestWithTime>,
}
impl DefaultNotifier {
@@ -173,10 +174,10 @@ impl Notifier for DefaultNotifier {
if let Err(e) = self
.request_sender
.send(WorkerRequest::Background {
.send(WorkerRequestWithTime::new(WorkerRequest::Background {
region_id: result.region_id,
notify,
})
}))
.await
{
error!(

View File

@@ -294,7 +294,7 @@ impl RowGroupSelection {
let Some(y) = self.selection_in_rg.get(rg_id) else {
continue;
};
let selection = x.selection.intersection(&y.selection);
let selection = intersect_row_selections(&x.selection, &y.selection);
let row_count = selection.row_count();
let selector_len = selector_len(&selection);
if row_count > 0 {
@@ -423,6 +423,68 @@ impl RowGroupSelection {
}
}
/// Ported from `parquet`, but rows past the end of the shorter selection are removed.
///
/// Combines two `RowSelection`s and returns their intersection.
/// For example:
/// self: NNYYYYNNYYNYN
/// other: NYNNNNNNY
///
/// returned: NNNNNNNNY (modified)
/// NNNNNNNNYYNYN (original)
fn intersect_row_selections(left: &RowSelection, right: &RowSelection) -> RowSelection {
let mut l_iter = left.iter().copied().peekable();
let mut r_iter = right.iter().copied().peekable();
let iter = std::iter::from_fn(move || {
loop {
let l = l_iter.peek_mut();
let r = r_iter.peek_mut();
match (l, r) {
(Some(a), _) if a.row_count == 0 => {
l_iter.next().unwrap();
}
(_, Some(b)) if b.row_count == 0 => {
r_iter.next().unwrap();
}
(Some(l), Some(r)) => {
return match (l.skip, r.skip) {
// Keep both ranges
(false, false) => {
if l.row_count < r.row_count {
r.row_count -= l.row_count;
l_iter.next()
} else {
l.row_count -= r.row_count;
r_iter.next()
}
}
// skip at least one
_ => {
if l.row_count < r.row_count {
let skip = l.row_count;
r.row_count -= l.row_count;
l_iter.next();
Some(RowSelector::skip(skip))
} else {
let skip = r.row_count;
l.row_count -= skip;
r_iter.next();
Some(RowSelector::skip(skip))
}
}
};
}
(None, _) => return None,
(_, None) => return None,
}
}
});
iter.collect()
}
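
A hypothetical check of the helper above, assuming parquet's `RowSelection`/`RowSelector` types and the `intersect_row_selections` function are in scope:

```rust
use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

fn main() {
    // left : NNYYYYNN (skip 2, select 4, skip 2)
    // right: NYNNYY   (skip 1, select 1, skip 2, select 2)
    let left = RowSelection::from(vec![
        RowSelector::skip(2),
        RowSelector::select(4),
        RowSelector::skip(2),
    ]);
    let right = RowSelection::from(vec![
        RowSelector::skip(1),
        RowSelector::select(1),
        RowSelector::skip(2),
        RowSelector::select(2),
    ]);
    // Only rows 4 and 5 are selected on both sides; rows past the shorter
    // selection are dropped, and adjacent selectors of the same kind are merged.
    let expected = RowSelection::from(vec![RowSelector::skip(4), RowSelector::select(2)]);
    assert_eq!(intersect_row_selections(&left, &right), expected);
}
```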
/// Converts an iterator of row ranges into a `RowSelection` by creating a sequence of `RowSelector`s.
///
/// This function processes each range in the input and either creates a new selector or merges
@@ -448,10 +510,6 @@ pub(crate) fn row_selection_from_row_ranges(
last_processed_end = end;
}
if last_processed_end < total_row_count {
add_or_merge_selector(&mut selectors, total_row_count - last_processed_end, true);
}
RowSelection::from(selectors)
}
@@ -546,7 +604,6 @@ mod tests {
RowSelector::select(2),
RowSelector::skip(2),
RowSelector::select(3),
RowSelector::skip(2),
]);
assert_eq!(selection, expected);
}
@@ -555,7 +612,7 @@ mod tests {
fn test_empty_range() {
let ranges = [];
let selection = row_selection_from_row_ranges(ranges.iter().cloned(), 10);
let expected = RowSelection::from(vec![RowSelector::skip(10)]);
let expected = RowSelection::from(vec![]);
assert_eq!(selection, expected);
}
@@ -563,11 +620,7 @@ mod tests {
fn test_adjacent_ranges() {
let ranges = [1..2, 2..3];
let selection = row_selection_from_row_ranges(ranges.iter().cloned(), 10);
let expected = RowSelection::from(vec![
RowSelector::skip(1),
RowSelector::select(2),
RowSelector::skip(7),
]);
let expected = RowSelection::from(vec![RowSelector::skip(1), RowSelector::select(2)]);
assert_eq!(selection, expected);
}
@@ -580,7 +633,6 @@ mod tests {
RowSelector::select(1),
RowSelector::skip(98),
RowSelector::select(1),
RowSelector::skip(10139),
]);
assert_eq!(selection, expected);
}

View File

@@ -32,7 +32,7 @@ use crate::error::Result;
use crate::flush::FlushScheduler;
use crate::manifest::manager::{RegionManifestManager, RegionManifestOptions};
use crate::region::{ManifestContext, ManifestContextRef, RegionLeaderState, RegionRoleState};
use crate::request::WorkerRequest;
use crate::request::{WorkerRequest, WorkerRequestWithTime};
use crate::schedule::scheduler::{Job, LocalScheduler, Scheduler, SchedulerRef};
use crate::sst::index::intermediate::IntermediateManager;
use crate::sst::index::puffin_manager::PuffinManagerFactory;
@@ -85,7 +85,7 @@ impl SchedulerEnv {
/// Creates a new compaction scheduler.
pub(crate) fn mock_compaction_scheduler(
&self,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
) -> CompactionScheduler {
let scheduler = self.get_scheduler();

View File

@@ -39,7 +39,7 @@ use common_runtime::JoinHandle;
use common_telemetry::{error, info, warn};
use futures::future::try_join_all;
use object_store::manager::ObjectStoreManagerRef;
use prometheus::IntGauge;
use prometheus::{Histogram, IntGauge};
use rand::{rng, Rng};
use snafu::{ensure, ResultExt};
use store_api::logstore::LogStore;
@@ -58,11 +58,11 @@ use crate::error;
use crate::error::{CreateDirSnafu, JoinSnafu, Result, WorkerStoppedSnafu};
use crate::flush::{FlushScheduler, WriteBufferManagerImpl, WriteBufferManagerRef};
use crate::memtable::MemtableBuilderProvider;
use crate::metrics::{REGION_COUNT, WRITE_STALL_TOTAL};
use crate::metrics::{REGION_COUNT, REQUEST_WAIT_TIME, WRITE_STALLING};
use crate::region::{MitoRegionRef, OpeningRegions, OpeningRegionsRef, RegionMap, RegionMapRef};
use crate::request::{
BackgroundNotify, DdlRequest, SenderBulkRequest, SenderDdlRequest, SenderWriteRequest,
WorkerRequest,
WorkerRequest, WorkerRequestWithTime,
};
use crate::schedule::scheduler::{LocalScheduler, SchedulerRef};
use crate::sst::file::FileId;
@@ -469,8 +469,9 @@ impl<S: LogStore> WorkerStarter<S> {
last_periodical_check_millis: now,
flush_sender: self.flush_sender,
flush_receiver: self.flush_receiver,
stalled_count: WRITE_STALL_TOTAL.with_label_values(&[&id_string]),
stalling_count: WRITE_STALLING.with_label_values(&[&id_string]),
region_count: REGION_COUNT.with_label_values(&[&id_string]),
request_wait_time: REQUEST_WAIT_TIME.with_label_values(&[&id_string]),
region_edit_queues: RegionEditQueues::default(),
schema_metadata_manager: self.schema_metadata_manager,
};
@@ -498,7 +499,7 @@ pub(crate) struct RegionWorker {
/// The opening regions.
opening_regions: OpeningRegionsRef,
/// Request sender.
sender: Sender<WorkerRequest>,
sender: Sender<WorkerRequestWithTime>,
/// Handle to the worker thread.
handle: Mutex<Option<JoinHandle<()>>>,
/// Whether to run the worker thread.
@@ -509,7 +510,8 @@ impl RegionWorker {
/// Submits request to background worker thread.
async fn submit_request(&self, request: WorkerRequest) -> Result<()> {
ensure!(self.is_running(), WorkerStoppedSnafu { id: self.id });
if self.sender.send(request).await.is_err() {
let request_with_time = WorkerRequestWithTime::new(request);
if self.sender.send(request_with_time).await.is_err() {
warn!(
"Worker {} is already exited but the running flag is still true",
self.id
@@ -531,7 +533,12 @@ impl RegionWorker {
info!("Stop region worker {}", self.id);
self.set_running(false);
if self.sender.send(WorkerRequest::Stop).await.is_err() {
if self
.sender
.send(WorkerRequestWithTime::new(WorkerRequest::Stop))
.await
.is_err()
{
warn!("Worker {} is already exited before stop", self.id);
}
@@ -669,9 +676,9 @@ struct RegionWorkerLoop<S> {
/// Regions that are opening.
opening_regions: OpeningRegionsRef,
/// Request sender.
sender: Sender<WorkerRequest>,
sender: Sender<WorkerRequestWithTime>,
/// Request receiver.
receiver: Receiver<WorkerRequest>,
receiver: Receiver<WorkerRequestWithTime>,
/// WAL of the engine.
wal: Wal<S>,
/// Manages object stores for manifest and SSTs.
@@ -706,10 +713,12 @@ struct RegionWorkerLoop<S> {
flush_sender: watch::Sender<()>,
/// Watch channel receiver to wait for background flush job.
flush_receiver: watch::Receiver<()>,
/// Gauge of stalled request count.
stalled_count: IntGauge,
/// Gauge of stalling request count.
stalling_count: IntGauge,
/// Gauge of regions in the worker.
region_count: IntGauge,
/// Histogram of request wait time for this worker.
request_wait_time: Histogram,
/// Queues for region edit requests.
region_edit_queues: RegionEditQueues,
/// Database level metadata manager.
@@ -749,10 +758,16 @@ impl<S: LogStore> RegionWorkerLoop<S> {
tokio::select! {
request_opt = self.receiver.recv() => {
match request_opt {
Some(request) => match request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
_ => general_req_buffer.push(request),
Some(request_with_time) => {
// Observe the wait time
let wait_time = request_with_time.created_at.elapsed();
self.request_wait_time.observe(wait_time.as_secs_f64());
match request_with_time.request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
req => general_req_buffer.push(req),
}
},
// The channel is disconnected.
None => break,
@@ -791,11 +806,17 @@ impl<S: LogStore> RegionWorkerLoop<S> {
for _ in 1..self.config.worker_request_batch_size {
// We have received one request so we start from 1.
match self.receiver.try_recv() {
Ok(req) => match req {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
_ => general_req_buffer.push(req),
},
Ok(request_with_time) => {
// Observe the wait time
let wait_time = request_with_time.created_at.elapsed();
self.request_wait_time.observe(wait_time.as_secs_f64());
match request_with_time.request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
req => general_req_buffer.push(req),
}
}
// We still need to handle remaining requests.
Err(_) => break,
}

View File

@@ -15,15 +15,11 @@
//! Handles bulk insert requests.
use datatypes::arrow;
use datatypes::arrow::array::{
TimestampMicrosecondArray, TimestampMillisecondArray, TimestampNanosecondArray,
TimestampSecondArray,
};
use datatypes::arrow::datatypes::{DataType, TimeUnit};
use store_api::logstore::LogStore;
use store_api::metadata::RegionMetadataRef;
use store_api::region_request::RegionBulkInsertsRequest;
use crate::error::InconsistentTimestampLengthSnafu;
use crate::memtable::bulk::part::BulkPart;
use crate::request::{OptionOutputTx, SenderBulkRequest};
use crate::worker::RegionWorkerLoop;
@@ -41,6 +37,10 @@ impl<S: LogStore> RegionWorkerLoop<S> {
.with_label_values(&["process_bulk_req"])
.start_timer();
let batch = request.payload;
if batch.num_rows() == 0 {
sender.send(Ok(0));
return;
}
let Some((ts_index, ts)) = batch
.schema()
@@ -60,55 +60,23 @@ impl<S: LogStore> RegionWorkerLoop<S> {
return;
};
let DataType::Timestamp(unit, _) = ts.data_type() else {
// safety: ts data type must be a timestamp type.
unreachable!()
};
if batch.num_rows() != ts.len() {
sender.send(
InconsistentTimestampLengthSnafu {
expected: batch.num_rows(),
actual: ts.len(),
}
.fail(),
);
return;
}
let (min_ts, max_ts) = match unit {
TimeUnit::Second => {
let ts = ts.as_any().downcast_ref::<TimestampSecondArray>().unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
// safety: ts data type must be a timestamp type.
let (ts_primitive, _) = datatypes::timestamp::timestamp_array_to_primitive(ts).unwrap();
TimeUnit::Millisecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
TimeUnit::Microsecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
TimeUnit::Nanosecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
};
// safety: we've checked ts.len() == batch.num_rows() and batch is not empty
let min_ts = arrow::compute::min(&ts_primitive).unwrap();
let max_ts = arrow::compute::max(&ts_primitive).unwrap();
let part = BulkPart {
batch,

View File

@@ -34,7 +34,7 @@ use crate::region::version::VersionBuilder;
use crate::region::{MitoRegionRef, RegionLeaderState, RegionRoleState};
use crate::request::{
BackgroundNotify, OptionOutputTx, RegionChangeResult, RegionEditRequest, RegionEditResult,
RegionSyncRequest, TruncateResult, WorkerRequest,
RegionSyncRequest, TruncateResult, WorkerRequest, WorkerRequestWithTime,
};
use crate::sst::location;
use crate::worker::{RegionWorkerLoop, WorkerListener};
@@ -230,7 +230,10 @@ impl<S> RegionWorkerLoop<S> {
}),
};
// We don't set the state back as the worker loop has already exited.
if let Err(res) = request_sender.send(notify).await {
if let Err(res) = request_sender
.send(WorkerRequestWithTime::new(notify))
.await
{
warn!(
"Failed to send region edit result back to the worker, region_id: {}, res: {:?}",
region_id, res
@@ -318,10 +321,10 @@ impl<S> RegionWorkerLoop<S> {
truncated_sequence: truncate.truncated_sequence,
};
let _ = request_sender
.send(WorkerRequest::Background {
.send(WorkerRequestWithTime::new(WorkerRequest::Background {
region_id: truncate.region_id,
notify: BackgroundNotify::Truncate(truncate_result),
})
}))
.await
.inspect_err(|_| warn!("failed to send truncate result"));
});
@@ -364,7 +367,10 @@ impl<S> RegionWorkerLoop<S> {
.on_notify_region_change_result_begin(region.region_id)
.await;
if let Err(res) = request_sender.send(notify).await {
if let Err(res) = request_sender
.send(WorkerRequestWithTime::new(notify))
.await
{
warn!(
"Failed to send region change result back to the worker, region_id: {}, res: {:?}",
region.region_id, res

View File

@@ -27,7 +27,9 @@ use store_api::storage::RegionId;
use crate::error::{InvalidRequestSnafu, RegionStateSnafu, RejectWriteSnafu, Result};
use crate::metrics;
use crate::metrics::{WRITE_REJECT_TOTAL, WRITE_ROWS_TOTAL, WRITE_STAGE_ELAPSED};
use crate::metrics::{
WRITE_REJECT_TOTAL, WRITE_ROWS_TOTAL, WRITE_STAGE_ELAPSED, WRITE_STALL_TOTAL,
};
use crate::region::{RegionLeaderState, RegionRoleState};
use crate::region_write_ctx::RegionWriteCtx;
use crate::request::{SenderBulkRequest, SenderWriteRequest, WriteRequest};
@@ -57,8 +59,9 @@ impl<S: LogStore> RegionWorkerLoop<S> {
}
if self.write_buffer_manager.should_stall() && allow_stall {
self.stalled_count
.add((write_requests.len() + bulk_requests.len()) as i64);
let stalled_count = (write_requests.len() + bulk_requests.len()) as i64;
self.stalling_count.add(stalled_count);
WRITE_STALL_TOTAL.inc_by(stalled_count as u64);
self.stalled_requests.append(write_requests, bulk_requests);
self.listener.on_write_stall();
return;
@@ -161,7 +164,7 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) async fn handle_stalled_requests(&mut self) {
// Handle stalled requests.
let stalled = std::mem::take(&mut self.stalled_requests);
self.stalled_count.sub(stalled.stalled_count() as i64);
self.stalling_count.sub(stalled.stalled_count() as i64);
// We already stalled these requests, don't stall them again.
for (_, (_, mut requests, mut bulk)) in stalled.requests {
self.handle_write_requests(&mut requests, &mut bulk, false)
@@ -172,7 +175,7 @@ impl<S: LogStore> RegionWorkerLoop<S> {
/// Rejects all stalled requests.
pub(crate) fn reject_stalled_requests(&mut self) {
let stalled = std::mem::take(&mut self.stalled_requests);
self.stalled_count.sub(stalled.stalled_count() as i64);
self.stalling_count.sub(stalled.stalled_count() as i64);
for (_, (_, mut requests, mut bulk)) in stalled.requests {
reject_write_requests(&mut requests, &mut bulk);
}
@@ -182,7 +185,8 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) fn reject_region_stalled_requests(&mut self, region_id: &RegionId) {
debug!("Rejects stalled requests for region {}", region_id);
let (mut requests, mut bulk) = self.stalled_requests.remove(region_id);
self.stalled_count.sub((requests.len() + bulk.len()) as i64);
self.stalling_count
.sub((requests.len() + bulk.len()) as i64);
reject_write_requests(&mut requests, &mut bulk);
}
@@ -190,7 +194,8 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) async fn handle_region_stalled_requests(&mut self, region_id: &RegionId) {
debug!("Handles stalled requests for region {}", region_id);
let (mut requests, mut bulk) = self.stalled_requests.remove(region_id);
self.stalled_count.sub((requests.len() + bulk.len()) as i64);
self.stalling_count
.sub((requests.len() + bulk.len()) as i64);
self.handle_write_requests(&mut requests, &mut bulk, true)
.await;
}
@@ -251,7 +256,8 @@ impl<S> RegionWorkerLoop<S> {
"Region {} is altering, add request to pending writes",
region.region_id
);
self.stalled_count.add(1);
self.stalling_count.add(1);
WRITE_STALL_TOTAL.inc();
self.stalled_requests.push(sender_req);
continue;
}
@@ -353,7 +359,8 @@ impl<S> RegionWorkerLoop<S> {
"Region {} is altering, add request to pending writes",
region.region_id
);
self.stalled_count.add(1);
self.stalling_count.add(1);
WRITE_STALL_TOTAL.inc();
self.stalled_requests.push_bulk(bulk_req);
continue;
}

View File

@@ -20,11 +20,7 @@ use api::v1::region::{
bulk_insert_request, region_request, BulkInsertRequest, RegionRequest, RegionRequestHeader,
};
use api::v1::ArrowIpc;
use arrow::array::{
Array, TimestampMicrosecondArray, TimestampMillisecondArray, TimestampNanosecondArray,
TimestampSecondArray,
};
use arrow::datatypes::{DataType, Int64Type, TimeUnit};
use arrow::array::Array;
use arrow::record_batch::RecordBatch;
use common_base::AffectedRows;
use common_grpc::flight::{FlightDecoder, FlightEncoder, FlightMessage};
@@ -62,6 +58,10 @@ impl Inserter {
};
decode_timer.observe_duration();
if record_batch.num_rows() == 0 {
return Ok(0);
}
// notify flownode to update dirty timestamps if flow is configured.
self.maybe_update_flow_dirty_window(table_info, record_batch.clone());
@@ -155,6 +155,9 @@ impl Inserter {
let mut raw_data_bytes = None;
for (peer, masks) in mask_per_datanode {
for (region_id, mask) in masks {
if mask.select_none() {
continue;
}
let rb = record_batch.clone();
let schema_bytes = schema_bytes.clone();
let node_manager = self.node_manager.clone();
@@ -304,32 +307,11 @@ fn extract_timestamps(rb: &RecordBatch, timestamp_index_name: &str) -> error::Re
if rb.num_rows() == 0 {
return Ok(vec![]);
}
let primitive = match ts_col.data_type() {
DataType::Timestamp(unit, _) => match unit {
TimeUnit::Second => ts_col
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Millisecond => ts_col
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Microsecond => ts_col
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Nanosecond => ts_col
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
},
t => {
return error::InvalidTimeIndexTypeSnafu { ty: t.clone() }.fail();
}
};
let (primitive, _) =
datatypes::timestamp::timestamp_array_to_primitive(ts_col).with_context(|| {
error::InvalidTimeIndexTypeSnafu {
ty: ts_col.data_type().clone(),
}
})?;
Ok(primitive.iter().flatten().collect())
}

View File

@@ -229,6 +229,7 @@ impl DispatchedTo {
pub enum PipelineExecOutput {
Transformed(TransformedOutput),
DispatchedTo(DispatchedTo, Value),
Filtered,
}
#[derive(Debug)]
@@ -309,6 +310,10 @@ impl Pipeline {
// process
for processor in self.processors.iter() {
val = processor.exec_mut(val)?;
if val.is_null() {
// line is filtered
return Ok(PipelineExecOutput::Filtered);
}
}
// dispatch, fast return if matched
@@ -333,9 +338,9 @@ impl Pipeline {
table_suffix,
}));
}
// continue v2 process, check ts column and set the rest fields with auto-transform
// continue the v2 process and set the rest of the fields with auto-transform
// if a transformer is present, then ts has already been set
values_to_row(schema_info, val, pipeline_ctx, Some(values))?
values_to_row(schema_info, val, pipeline_ctx, Some(values), false)?
}
TransformerMode::AutoTransform(ts_name, time_unit) => {
// infer ts from the context
@@ -347,7 +352,7 @@ impl Pipeline {
));
let n_ctx =
PipelineContext::new(&def, pipeline_ctx.pipeline_param, pipeline_ctx.channel);
values_to_row(schema_info, val, &n_ctx, None)?
values_to_row(schema_info, val, &n_ctx, None, true)?
}
};
@@ -525,9 +530,6 @@ transform:
.into_transformed()
.unwrap();
// println!("[DEBUG]schema_info: {:?}", schema_info.schema);
// println!("[DEBUG]re: {:?}", result.0.values);
assert_eq!(schema_info.schema.len(), result.0.values.len());
let test = vec![
(

View File

@@ -19,6 +19,7 @@ pub mod decolorize;
pub mod digest;
pub mod dissect;
pub mod epoch;
pub mod filter;
pub mod gsub;
pub mod join;
pub mod json_parse;
@@ -54,6 +55,7 @@ use crate::error::{
Result, UnsupportedProcessorSnafu,
};
use crate::etl::field::{Field, Fields};
use crate::etl::processor::filter::FilterProcessor;
use crate::etl::processor::json_parse::JsonParseProcessor;
use crate::etl::processor::select::SelectProcessor;
use crate::etl::processor::simple_extract::SimpleExtractProcessor;
@@ -146,6 +148,7 @@ pub enum ProcessorKind {
Digest(DigestProcessor),
Select(SelectProcessor),
Vrl(VrlProcessor),
Filter(FilterProcessor),
}
#[derive(Debug, Default)]
@@ -226,6 +229,7 @@ fn parse_processor(doc: &yaml_rust::Yaml) -> Result<ProcessorKind> {
}
vrl::PROCESSOR_VRL => ProcessorKind::Vrl(VrlProcessor::try_from(value)?),
select::PROCESSOR_SELECT => ProcessorKind::Select(SelectProcessor::try_from(value)?),
filter::PROCESSOR_FILTER => ProcessorKind::Filter(FilterProcessor::try_from(value)?),
_ => return UnsupportedProcessorSnafu { processor: str_key }.fail(),
};

View File

@@ -0,0 +1,242 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use ahash::{HashSet, HashSetExt};
use snafu::OptionExt;
use crate::error::{
Error, KeyMustBeStringSnafu, ProcessorExpectStringSnafu, ProcessorMissingFieldSnafu, Result,
ValueMustBeMapSnafu,
};
use crate::etl::field::Fields;
use crate::etl::processor::{
yaml_bool, yaml_new_field, yaml_new_fields, yaml_string, yaml_strings, FIELDS_NAME, FIELD_NAME,
};
use crate::{Processor, Value};
pub(crate) const PROCESSOR_FILTER: &str = "filter";
const MATCH_MODE_NAME: &str = "mode";
const MATCH_OP_NAME: &str = "match_op";
const CASE_INSENSITIVE_NAME: &str = "case_insensitive";
const TARGETS_NAME: &str = "targets";
#[derive(Debug)]
enum MatchMode {
SimpleMatch(MatchOp),
}
impl Default for MatchMode {
fn default() -> Self {
Self::SimpleMatch(MatchOp::default())
}
}
#[derive(Debug, Default)]
enum MatchOp {
#[default]
In,
NotIn,
}
/// Filters out the whole line if the field matches.
/// Ultimately this is a condition check; VRL could be used later for more complex checks.
/// Only simple string matching is implemented for now; it can be extended later.
#[derive(Debug, Default)]
pub struct FilterProcessor {
fields: Fields,
mode: MatchMode,
case_insensitive: bool,
targets: HashSet<String>,
}
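
For illustration, a pipeline snippet exercising this processor might look like the following; the key names (`field`, `mode`, `match_op`, `case_insensitive`, `targets`) are assumed to match the `FIELD_NAME`/`MATCH_*` constants above and are not taken verbatim from the patch:

```rust
// Only checks that the assumed configuration shape is well-formed YAML;
// wiring it into a real Pipeline requires the pipeline crate itself.
fn main() {
    let pipeline_yaml = r#"
processors:
  - filter:
      field: name
      mode: simple
      match_op: in
      case_insensitive: true
      targets:
        - john
        - wick
"#;
    let docs = yaml_rust::YamlLoader::load_from_str(pipeline_yaml).unwrap();
    assert!(docs[0]["processors"][0]["filter"]["targets"].as_vec().is_some());
}
```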
impl TryFrom<&yaml_rust::yaml::Hash> for FilterProcessor {
type Error = Error;
// match mode can be extended in the future
#[allow(clippy::single_match)]
fn try_from(value: &yaml_rust::yaml::Hash) -> std::result::Result<Self, Self::Error> {
let mut fields = Fields::default();
let mut mode = MatchMode::default();
let mut op = MatchOp::default();
let mut case_insensitive = true;
let mut targets = HashSet::new();
for (k, v) in value.iter() {
let key = k
.as_str()
.with_context(|| KeyMustBeStringSnafu { k: k.clone() })?;
match key {
FIELD_NAME => fields = Fields::one(yaml_new_field(v, FIELD_NAME)?),
FIELDS_NAME => fields = yaml_new_fields(v, FIELDS_NAME)?,
MATCH_MODE_NAME => match yaml_string(v, MATCH_MODE_NAME)?.as_str() {
"simple" => mode = MatchMode::SimpleMatch(MatchOp::In),
_ => {}
},
MATCH_OP_NAME => match yaml_string(v, MATCH_OP_NAME)?.as_str() {
"in" => op = MatchOp::In,
"not_in" => op = MatchOp::NotIn,
_ => {}
},
CASE_INSENSITIVE_NAME => case_insensitive = yaml_bool(v, CASE_INSENSITIVE_NAME)?,
TARGETS_NAME => {
yaml_strings(v, TARGETS_NAME)?
.into_iter()
.filter(|s| !s.is_empty())
.for_each(|s| {
targets.insert(s);
});
}
_ => {}
}
}
if matches!(mode, MatchMode::SimpleMatch(_)) {
mode = MatchMode::SimpleMatch(op);
}
if targets.is_empty() {
return ProcessorMissingFieldSnafu {
processor: PROCESSOR_FILTER,
field: TARGETS_NAME.to_string(),
}
.fail();
}
if case_insensitive {
targets = targets.into_iter().map(|s| s.to_lowercase()).collect();
}
Ok(FilterProcessor {
fields,
mode,
case_insensitive,
targets,
})
}
}
impl FilterProcessor {
fn match_target(&self, input: String) -> bool {
let input = if self.case_insensitive {
input.to_lowercase()
} else {
input
};
match &self.mode {
MatchMode::SimpleMatch(op) => match op {
MatchOp::In => self.targets.contains(&input),
MatchOp::NotIn => !self.targets.contains(&input),
},
}
}
}
impl Processor for FilterProcessor {
fn kind(&self) -> &str {
PROCESSOR_FILTER
}
fn ignore_missing(&self) -> bool {
true
}
fn exec_mut(&self, mut val: Value) -> Result<Value> {
let v_map = val.as_map_mut().context(ValueMustBeMapSnafu)?;
for field in self.fields.iter() {
let index = field.input_field();
match v_map.get(index) {
Some(Value::String(s)) => {
if self.match_target(s.clone()) {
return Ok(Value::Null);
}
}
Some(v) => {
return ProcessorExpectStringSnafu {
processor: self.kind(),
v: v.clone(),
}
.fail();
}
None => {}
}
}
Ok(val)
}
}
#[cfg(test)]
mod test {
use ahash::HashSet;
use crate::etl::field::{Field, Fields};
use crate::etl::processor::filter::{FilterProcessor, MatchMode, MatchOp};
use crate::{Map, Processor, Value};
#[test]
fn test_eq() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::In),
case_insensitive: false,
targets: HashSet::from_iter(vec!["John".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("John".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
let val = Value::Map(Map::one("name", Value::String("Wick".to_string())));
let expect = val.clone();
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, expect);
}
#[test]
fn test_ne() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::NotIn),
case_insensitive: false,
targets: HashSet::from_iter(vec!["John".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("John".to_string())));
let expect = val.clone();
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, expect);
let val = Value::Map(Map::one("name", Value::String("Wick".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
}
#[test]
fn test_case() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::In),
case_insensitive: true,
targets: HashSet::from_iter(vec!["john".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("JoHN".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
}
}

View File

@@ -420,15 +420,17 @@ pub(crate) fn values_to_row(
values: Value,
pipeline_ctx: &PipelineContext<'_>,
row: Option<Vec<GreptimeValue>>,
need_calc_ts: bool,
) -> Result<Row> {
let mut row: Vec<GreptimeValue> =
row.unwrap_or_else(|| Vec::with_capacity(schema_info.schema.len()));
let custom_ts = pipeline_ctx.pipeline_definition.get_custom_ts();
// calculate timestamp value based on the channel
let ts = calc_ts(pipeline_ctx, &values)?;
row.push(GreptimeValue { value_data: ts });
if need_calc_ts {
// calculate timestamp value based on the channel
let ts = calc_ts(pipeline_ctx, &values)?;
row.push(GreptimeValue { value_data: ts });
}
row.resize(schema_info.schema.len(), GreptimeValue { value_data: None });
@@ -608,7 +610,7 @@ fn identity_pipeline_inner(
skip_error
);
let row = unwrap_or_continue_if_err!(
values_to_row(&mut schema_info, pipeline_map, pipeline_ctx, None),
values_to_row(&mut schema_info, pipeline_map, pipeline_ctx, None, true),
skip_error
);

View File

@@ -340,7 +340,14 @@ impl ExecutionPlan for RangeManipulateExec {
}
fn required_input_distribution(&self) -> Vec<Distribution> {
self.input.required_input_distribution()
let input_requirement = self.input.required_input_distribution();
if input_requirement.is_empty() {
// if the input is EmptyMetric, its required_input_distribution() is empty so we can't
// use its input distribution.
vec![Distribution::UnspecifiedDistribution]
} else {
input_requirement
}
}
fn with_new_children(

View File

@@ -237,7 +237,8 @@ fn create_output_batch(
for (node, metric) in sub_stage_metrics.into_iter().enumerate() {
builder.append_metric(1, node as _, metrics_to_string(metric, format)?);
}
return Ok(TreeNodeRecursion::Stop);
// might have multiple merge scans, so continue
return Ok(TreeNodeRecursion::Continue);
}
Ok(TreeNodeRecursion::Continue)
})?;

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashSet;
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use common_telemetry::debug;
@@ -38,6 +38,13 @@ use crate::dist_plan::merge_scan::MergeScanLogicalPlan;
use crate::plan::ExtractExpr;
use crate::query_engine::DefaultSerializer;
#[cfg(test)]
mod test;
mod utils;
pub(crate) use utils::{AliasMapping, AliasTracker};
#[derive(Debug)]
pub struct DistPlannerAnalyzer;
@@ -154,8 +161,50 @@ struct PlanRewriter {
status: RewriterStatus,
/// Partition columns of the table in current pass
partition_cols: Option<Vec<String>>,
column_requirements: HashSet<Column>,
alias_tracker: Option<AliasTracker>,
/// Use the stack depth as a scope to determine whether a column requirement applies,
/// i.e. for a logical plan like:
/// ```ignore
/// 1: Projection: t.number
/// 2: Sort: t.pk1+t.pk2
/// 3: Projection: t.number, t.pk1, t.pk2
/// ```
/// `Sort` records a column requirement for `t.pk1` at level 2,
/// which means the `Projection` at level 1 needs to add a reference to `t.pk1` as well,
/// so that the expanded plan becomes
/// ```ignore
/// Projection: t.number
/// MergeSort: t.pk1
/// MergeScan: remote_input=
/// Projection: t.number, "t.pk1+t.pk2" <--- the original `Projection` at level 1 gains `t.pk1+t.pk2`
/// Sort: t.pk1+t.pk2
/// Projection: t.number, t.pk1, t.pk2
/// ```
/// and `MergeSort` can take `t.pk1` as input.
/// Meanwhile the `Projection` at level 3 doesn't need any new column, because 3 > 2
/// and the column requirements recorded at level 2 do not apply to level 3.
///
/// See the tests `expand_proj_step_aggr` and `expand_proj_sort_proj` for more details,
/// and the sketch after this struct for the level-scoping rule.
///
/// TODO(discord9): a simpler solution to track column requirements for merge scan
column_requirements: Vec<(HashSet<Column>, usize)>,
/// Whether to expand on the next call.
/// This is used to handle the case where a plan is transformed but needs to be expanded from its
/// parent node. For example, an Aggregate plan is split into two parts between frontend and datanode
/// and needs to be expanded from the parent node of the Aggregate plan.
expand_on_next_call: bool,
/// Expand on the next partial/conditional/transformed commutative plan.
/// This is used to handle the case where a plan is transformed but we still
/// need to push down as many nodes as possible before the next partial/conditional/transformed
/// commutative plan, e.g.
/// ```ignore
/// Limit:
/// Sort:
/// ```
/// where `Limit` is partial commutative and `Sort` is conditional commutative.
/// In this case, we need to expand the `Limit` plan
/// so that we can push down the `Sort` plan as much as possible.
expand_on_next_part_cond_trans_commutative: bool,
new_child_plan: Option<LogicalPlan>,
}
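
A tiny sketch of the level-scoping rule described in the comment above (illustrative only; the real code stores `HashSet<Column>` entries tagged with the level they were recorded at):

```rust
// A requirement recorded at `requirement_level` only applies to nodes closer to the
// root of the plan, i.e. nodes whose level is <= the recorded level.
fn applies(requirement_level: usize, node_level: usize) -> bool {
    node_level <= requirement_level
}

fn main() {
    // Requirement for `t.pk1` recorded by `Sort` at level 2:
    let requirement_level = 2;
    assert!(applies(requirement_level, 1)); // the Projection at level 1 must add t.pk1
    assert!(!applies(requirement_level, 3)); // the Projection at level 3 is untouched
}
```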
@@ -171,21 +220,57 @@ impl PlanRewriter {
/// Return true if should stop and expand. The input plan is the parent node of current node
fn should_expand(&mut self, plan: &LogicalPlan) -> bool {
debug!(
"Check should_expand at level: {} with Stack:\n{}, ",
self.level,
self.stack
.iter()
.map(|(p, l)| format!("{l}:{}{}", " ".repeat(l - 1), p.display()))
.collect::<Vec<String>>()
.join("\n"),
);
if DFLogicalSubstraitConvertor
.encode(plan, DefaultSerializer)
.is_err()
{
return true;
}
if self.expand_on_next_call {
self.expand_on_next_call = false;
return true;
}
match Categorizer::check_plan(plan, self.partition_cols.clone()) {
if self.expand_on_next_part_cond_trans_commutative {
let comm = Categorizer::check_plan(plan, self.get_aliased_partition_columns());
match comm {
Commutativity::PartialCommutative => {
// A small difference: for a partial commutative plan we still push it down
// (so `Limit` can be pushed down), but note that `Limit` also needs to be
// expanded to keep the query correct, i.e. `Limit fetch=10` needs to reach the leaf node.
self.expand_on_next_part_cond_trans_commutative = false;
self.expand_on_next_call = true;
}
Commutativity::ConditionalCommutative(_)
| Commutativity::TransformedCommutative { .. } => {
// again a new node that can be pushed down; just push it down now
// and avoid further expansion
self.expand_on_next_part_cond_trans_commutative = false;
return true;
}
_ => (),
}
}
match Categorizer::check_plan(plan, self.get_aliased_partition_columns()) {
Commutativity::Commutative => {}
Commutativity::PartialCommutative => {
if let Some(plan) = partial_commutative_transformer(plan) {
self.update_column_requirements(&plan);
// Note: this plan is the parent of the current node, so use `self.level - 1` when updating column requirements
self.update_column_requirements(&plan, self.level - 1);
self.expand_on_next_part_cond_trans_commutative = true;
self.stage.push(plan)
}
}
@@ -193,7 +278,9 @@ impl PlanRewriter {
if let Some(transformer) = transformer
&& let Some(plan) = transformer(plan)
{
self.update_column_requirements(&plan);
// Note: this plan is the parent of the current node, so use `self.level - 1` when updating column requirements
self.update_column_requirements(&plan, self.level - 1);
self.expand_on_next_part_cond_trans_commutative = true;
self.stage.push(plan)
}
}
@@ -202,12 +289,22 @@ impl PlanRewriter {
&& let Some(transformer_actions) = transformer(plan)
{
debug!(
"PlanRewriter: transformed plan: {:#?}\n from {plan}",
transformer_actions.extra_parent_plans
"PlanRewriter: transformed plan: {}\n from {plan}",
transformer_actions
.extra_parent_plans
.iter()
.enumerate()
.map(|(i, p)| format!(
"Extra {i}-th parent plan from parent to child = {}",
p.display()
))
.collect::<Vec<_>>()
.join("\n")
);
if let Some(last_stage) = transformer_actions.extra_parent_plans.last() {
// update the column requirements from the last stage
self.update_column_requirements(last_stage);
// Note: the current plan's parent is where we need to apply the column requirements
self.update_column_requirements(last_stage, self.level - 1);
}
self.stage
.extend(transformer_actions.extra_parent_plans.into_iter().rev());
@@ -225,16 +322,25 @@ impl PlanRewriter {
false
}
fn update_column_requirements(&mut self, plan: &LogicalPlan) {
/// Updates the column requirements for the current plan. `plan_level` is the level of the plan
/// in the stack and is used to determine whether the column requirements are applicable
/// to other plans in the stack.
fn update_column_requirements(&mut self, plan: &LogicalPlan, plan_level: usize) {
debug!(
"PlanRewriter: update column requirements for plan: {plan}\n with old column_requirements: {:?}",
self.column_requirements
);
let mut container = HashSet::new();
for expr in plan.expressions() {
// this method won't fail
let _ = expr_to_columns(&expr, &mut container);
}
for col in container {
self.column_requirements.insert(col);
}
self.column_requirements.push((container, plan_level));
debug!(
"PlanRewriter: updated column requirements: {:?}",
self.column_requirements
);
}
fn is_expanded(&self) -> bool {
@@ -249,6 +355,45 @@ impl PlanRewriter {
self.status = RewriterStatus::Unexpanded;
}
/// Maybe update alias for original table columns in the plan
fn maybe_update_alias(&mut self, node: &LogicalPlan) {
if let Some(alias_tracker) = &mut self.alias_tracker {
alias_tracker.update_alias(node);
debug!(
"Current partition columns are: {:?}",
self.get_aliased_partition_columns()
);
} else if let LogicalPlan::TableScan(table_scan) = node {
self.alias_tracker = AliasTracker::new(table_scan);
debug!(
"Initialize partition columns: {:?} with table={}",
self.get_aliased_partition_columns(),
table_scan.table_name
);
}
}
fn get_aliased_partition_columns(&self) -> Option<AliasMapping> {
if let Some(part_cols) = self.partition_cols.as_ref() {
let Some(alias_tracker) = &self.alias_tracker else {
// no alias tracker means no table scan has been encountered yet
return None;
};
let mut aliased = HashMap::new();
for part_col in part_cols {
let all_alias = alias_tracker
.get_all_alias_for_col(part_col)
.cloned()
.unwrap_or_default();
aliased.insert(part_col.clone(), all_alias);
}
Some(aliased)
} else {
None
}
}
fn maybe_set_partitions(&mut self, plan: &LogicalPlan) {
if self.partition_cols.is_some() {
// only need to set once
@@ -294,10 +439,15 @@ impl PlanRewriter {
}
// store schema before expand
let schema = on_node.schema().clone();
let mut rewriter = EnforceDistRequirementRewriter {
column_requirements: std::mem::take(&mut self.column_requirements),
};
let mut rewriter = EnforceDistRequirementRewriter::new(
std::mem::take(&mut self.column_requirements),
self.level,
);
debug!("PlanRewriter: enforce column requirements for node: {on_node} with rewriter: {rewriter:?}");
on_node = on_node.rewrite(&mut rewriter)?.data;
debug!(
"PlanRewriter: after enforced column requirements for node: {on_node} with rewriter: {rewriter:?}"
);
// add merge scan as the new root
let mut node = MergeScanLogicalPlan::new(
@@ -316,7 +466,8 @@ impl PlanRewriter {
}
self.set_expanded();
// recover the schema
// recover the schema; this makes sure that after expansion the schema is the same as the old node's,
// because after expansion the raw top node might have extra columns, i.e. sorting columns for a `Sort` node
let node = LogicalPlanBuilder::from(node)
.project(schema.iter().map(|(qualifier, field)| {
Expr::Column(Column::new(qualifier.cloned(), field.name()))
@@ -333,42 +484,96 @@ impl PlanRewriter {
/// Requirements enforced by this rewriter:
/// - Enforce column requirements for `LogicalPlan::Projection` nodes. Makes sure the
/// required columns are available in the sub plan.
///
#[derive(Debug)]
struct EnforceDistRequirementRewriter {
column_requirements: HashSet<Column>,
/// only enforce column requirements after the expanding node in question,
/// meaning only nodes with `cur_level` <= `level` will consider adding those column requirements
/// TODO(discord9): a simpler solution to track column requirements for merge scan
column_requirements: Vec<(HashSet<Column>, usize)>,
/// only apply column requirements recorded at level >= `cur_level`
/// this is used to avoid applying column requirements that are not needed
/// for the current node, i.e. when the node is not in the scope of those column requirements.
/// e.g., for this plan:
/// ```ignore
/// Aggregate: min(t.number)
///   Projection: t.number
/// ```
/// when on the `Projection` node, we don't need to apply the column requirements of the `Aggregate` node
/// because the `Projection` node is not in the scope of the `Aggregate` node
cur_level: usize,
}
impl EnforceDistRequirementRewriter {
fn new(column_requirements: Vec<(HashSet<Column>, usize)>, cur_level: usize) -> Self {
Self {
column_requirements,
cur_level,
}
}
}
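For intuition, a minimal standalone sketch (hypothetical helper, plain `String`s instead of datafusion `Column`s) of the level filter applied in `f_up` below: requirements recorded at some `level` only reach nodes whose `cur_level` is not deeper than that level.
use std::collections::HashSet;
fn applicable(requirements: &[(HashSet<String>, usize)], cur_level: usize) -> HashSet<String> {
    requirements
        .iter()
        .filter(|(_, level)| *level >= cur_level)
        .flat_map(|(cols, _)| cols.iter().cloned())
        .collect()
}
fn main() {
    // a requirement from an Aggregate recorded at level 1 and one from a Sort at level 0
    let reqs = vec![
        (HashSet::from(["t.number".to_string()]), 1),
        (HashSet::from(["t.ts".to_string()]), 0),
    ];
    assert!(applicable(&reqs, 2).is_empty()); // deeper child: nothing applies
    assert_eq!(applicable(&reqs, 1), HashSet::from(["t.number".to_string()]));
    assert_eq!(applicable(&reqs, 0).len(), 2); // at the top both apply
}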
impl TreeNodeRewriter for EnforceDistRequirementRewriter {
type Node = LogicalPlan;
fn f_down(&mut self, node: Self::Node) -> DfResult<Transformed<Self::Node>> {
if let LogicalPlan::Projection(ref projection) = node {
let mut column_requirements = std::mem::take(&mut self.column_requirements);
if column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
for expr in &projection.expr {
let (qualifier, name) = expr.qualified_name();
let column = Column::new(qualifier, name);
column_requirements.remove(&column);
}
if column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
let mut new_exprs = projection.expr.clone();
for col in &column_requirements {
new_exprs.push(Expr::Column(col.clone()));
}
let new_node =
node.with_new_exprs(new_exprs, node.inputs().into_iter().cloned().collect())?;
return Ok(Transformed::yes(new_node));
// check that the node doesn't have multiple children, e.g. join/subquery
if node.inputs().len() > 1 {
return Err(datafusion_common::DataFusionError::Internal(
"EnforceDistRequirementRewriter: node with multiple inputs is not supported"
.to_string(),
));
}
self.cur_level += 1;
Ok(Transformed::no(node))
}
fn f_up(&mut self, node: Self::Node) -> DfResult<Transformed<Self::Node>> {
self.cur_level -= 1;
// first get all applicable column requirements
let mut applicable_column_requirements = self
.column_requirements
.iter()
.filter(|(_, level)| *level >= self.cur_level)
.map(|(cols, _)| cols.clone())
.reduce(|mut acc, cols| {
acc.extend(cols);
acc
})
.unwrap_or_default();
debug!(
"EnforceDistRequirementRewriter: applicable column requirements at level {} = {:?} for node {}",
self.cur_level,
applicable_column_requirements,
node.display()
);
// make sure every projection in the applicable scope has the required columns
if let LogicalPlan::Projection(ref projection) = node {
for expr in &projection.expr {
let (qualifier, name) = expr.qualified_name();
let column = Column::new(qualifier, name);
applicable_column_requirements.remove(&column);
}
if applicable_column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
let mut new_exprs = projection.expr.clone();
for col in &applicable_column_requirements {
new_exprs.push(Expr::Column(col.clone()));
}
let new_node =
node.with_new_exprs(new_exprs, node.inputs().into_iter().cloned().collect())?;
debug!(
"EnforceDistRequirementRewriter: added missing columns {:?} to projection node from old node: \n{node}\n Making new node: \n{new_node}",
applicable_column_requirements
);
// still need to continue for next projection if applicable
return Ok(Transformed::yes(new_node));
}
Ok(Transformed::no(node))
}
}
@@ -384,6 +589,7 @@ impl TreeNodeRewriter for PlanRewriter {
self.stage.clear();
self.set_unexpanded();
self.partition_cols = None;
self.alias_tracker = None;
Ok(Transformed::no(node))
}
@@ -406,8 +612,19 @@ impl TreeNodeRewriter for PlanRewriter {
self.maybe_set_partitions(&node);
self.maybe_update_alias(&node);
let Some(parent) = self.get_parent() else {
let node = self.expand(node)?;
debug!("Plan Rewriter: expand now for no parent found for node: {node}");
let node = self.expand(node);
debug!(
"PlanRewriter: expanded plan: {}",
match &node {
Ok(n) => n.to_string(),
Err(e) => format!("Error expanding plan: {e}"),
}
);
let node = node?;
self.pop_stack();
return Ok(Transformed::yes(node));
};
@@ -435,160 +652,3 @@ impl TreeNodeRewriter for PlanRewriter {
Ok(Transformed::no(node))
}
}
#[cfg(test)]
mod test {
use std::sync::Arc;
use datafusion::datasource::DefaultTableSource;
use datafusion::functions_aggregate::expr_fn::avg;
use datafusion_common::JoinType;
use datafusion_expr::{col, lit, Expr, LogicalPlanBuilder};
use table::table::adapter::DfTableProviderAdapter;
use table::table::numbers::NumbersTable;
use super::*;
#[ignore = "Projection is disabled for https://github.com/apache/arrow-datafusion/issues/6489"]
#[test]
fn transform_simple_projection_filter() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.filter(col("number").lt(lit(10)))
.unwrap()
.project(vec![col("number")])
.unwrap()
.distinct()
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = [
"Distinct:",
" MergeScan [is_placeholder=false]",
" Distinct:",
" Projection: t.number",
" Filter: t.number < Int32(10)",
" TableScan: t",
]
.join("\n");
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_aggregator() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.aggregate(Vec::<Expr>::new(), vec![avg(col("number"))])
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = "Projection: avg(t.number)\
\n MergeScan [is_placeholder=false]";
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_distinct_order() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.distinct()
.unwrap()
.sort(vec![col("number").sort(true, false)])
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = ["Projection: t.number", " MergeScan [is_placeholder=false]"].join("\n");
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_single_limit() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.limit(0, Some(1))
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = "Projection: t.number\
\n MergeScan [is_placeholder=false]";
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_unalighed_join_with_alias() {
let left = NumbersTable::table(0);
let right = NumbersTable::table(1);
let left_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(left),
)));
let right_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(right),
)));
let right_plan = LogicalPlanBuilder::scan_with_filters("t", right_source, None, vec![])
.unwrap()
.alias("right")
.unwrap()
.build()
.unwrap();
let plan = LogicalPlanBuilder::scan_with_filters("t", left_source, None, vec![])
.unwrap()
.join_on(
right_plan,
JoinType::LeftSemi,
vec![col("t.number").eq(col("right.number"))],
)
.unwrap()
.limit(0, Some(1))
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = [
"Limit: skip=0, fetch=1",
" LeftSemi Join: Filter: t.number = right.number",
" Projection: t.number",
" MergeScan [is_placeholder=false]",
" SubqueryAlias: right",
" Projection: t.number",
" MergeScan [is_placeholder=false]",
]
.join("\n");
assert_eq!(expected, result.to_string());
}
}

File diff suppressed because it is too large


@@ -0,0 +1,318 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::{HashMap, HashSet};
use datafusion::datasource::DefaultTableSource;
use datafusion_common::Column;
use datafusion_expr::{Expr, LogicalPlan, TableScan};
use table::metadata::TableType;
use table::table::adapter::DfTableProviderAdapter;
/// Mapping from an original table column to all of its aliases at the current node
pub type AliasMapping = HashMap<String, HashSet<Column>>;
/// Tracks aliases for the source table columns in the plan
#[derive(Debug, Clone)]
pub struct AliasTracker {
/// mapping from the original column name to the aliases used in the plan;
/// notice that one column might have multiple aliases in the plan
pub mapping: AliasMapping,
}
impl AliasTracker {
pub fn new(table_scan: &TableScan) -> Option<Self> {
if let Some(source) = table_scan
.source
.as_any()
.downcast_ref::<DefaultTableSource>()
{
if let Some(provider) = source
.table_provider
.as_any()
.downcast_ref::<DfTableProviderAdapter>()
{
if provider.table().table_type() == TableType::Base {
let info = provider.table().table_info();
let schema = info.meta.schema.clone();
let col_schema = schema.column_schemas();
let mapping = col_schema
.iter()
.map(|col| {
(
col.name.clone(),
HashSet::from_iter(std::iter::once(Column::new_unqualified(
col.name.clone(),
))),
)
})
.collect();
return Some(Self { mapping });
}
}
}
None
}
/// update aliases for the original columns
///
/// only handles `Alias` expressions wrapping a column in a `Projection` node
pub fn update_alias(&mut self, node: &LogicalPlan) {
if let LogicalPlan::Projection(projection) = node {
// first collect all the alias mappings, i.e. `col_a AS b AS c AS d` becomes `a -> d`
// notice one column might have multiple aliases
let mut alias_mapping: AliasMapping = HashMap::new();
for expr in &projection.expr {
if let Expr::Alias(alias) = expr {
let outer_alias = alias.clone();
let mut cur_alias = alias.clone();
while let Expr::Alias(alias) = *cur_alias.expr {
cur_alias = alias;
}
if let Expr::Column(column) = *cur_alias.expr {
alias_mapping
.entry(column.name.clone())
.or_default()
.insert(Column::new(outer_alias.relation, outer_alias.name));
}
} else if let Expr::Column(column) = expr {
// identity mapping
alias_mapping
.entry(column.name.clone())
.or_default()
.insert(column.clone());
}
}
// update mapping using `alias_mapping`
let mut new_mapping = HashMap::new();
for (table_col_name, cur_columns) in std::mem::take(&mut self.mapping) {
let new_aliases = {
let mut new_aliases = HashSet::new();
for cur_column in &cur_columns {
let new_alias_for_cur_column = alias_mapping
.get(cur_column.name())
.cloned()
.unwrap_or_default();
for new_alias in new_alias_for_cur_column {
let is_table_ref_eq = match (&new_alias.relation, &cur_column.relation)
{
(Some(o), Some(c)) => o.resolved_eq(c),
_ => true,
};
// it is the same column if both the name and the table ref are equal
if is_table_ref_eq {
new_aliases.insert(new_alias.clone());
}
}
}
new_aliases
};
new_mapping.insert(table_col_name, new_aliases);
}
self.mapping = new_mapping;
common_telemetry::debug!(
"Updating alias tracker to {:?} using node: \n{node}",
self.mapping
);
}
}
pub fn get_all_alias_for_col(&self, col_name: &str) -> Option<&HashSet<Column>> {
self.mapping.get(col_name)
}
#[allow(unused)]
pub fn is_alias_for(&self, original_col: &str, cur_col: &Column) -> bool {
self.mapping
.get(original_col)
.map(|cols| cols.contains(cur_col))
.unwrap_or(false)
}
}
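For intuition, a minimal standalone sketch (toy `Expr` enum, not the datafusion one) of the alias-chain unwrapping `update_alias` performs: `col_a AS b AS c AS d` should be recorded as `col_a -> d`, keeping only the outermost alias.
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}
// walk through nested aliases until the underlying column is reached
fn innermost_column(expr: &Expr) -> Option<&str> {
    let mut cur = expr;
    while let Expr::Alias(inner, _) = cur {
        cur = inner.as_ref();
    }
    match cur {
        Expr::Column(name) => Some(name.as_str()),
        _ => None,
    }
}
fn main() {
    // col_a AS b AS c AS d
    let e = Expr::Alias(
        Box::new(Expr::Alias(
            Box::new(Expr::Alias(Box::new(Expr::Column("col_a".into())), "b".into())),
            "c".into(),
        )),
        "d".into(),
    );
    // the tracker maps the innermost column to the outermost alias name, i.e. col_a -> d
    assert_eq!(innermost_column(&e), Some("col_a"));
}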
#[cfg(test)]
mod tests {
use std::sync::Arc;
use common_telemetry::init_default_ut_logging;
use datafusion::error::Result as DfResult;
use datafusion_common::tree_node::{TreeNode, TreeNodeRecursion, TreeNodeVisitor};
use datafusion_expr::{col, LogicalPlanBuilder};
use super::*;
use crate::dist_plan::analyzer::test::TestTable;
#[derive(Debug)]
struct TrackerTester {
alias_tracker: Option<AliasTracker>,
mapping_at_each_level: Vec<AliasMapping>,
}
impl TreeNodeVisitor<'_> for TrackerTester {
type Node = LogicalPlan;
fn f_up(&mut self, node: &LogicalPlan) -> DfResult<TreeNodeRecursion> {
if let Some(alias_tracker) = &mut self.alias_tracker {
alias_tracker.update_alias(node);
self.mapping_at_each_level.push(
self.alias_tracker
.as_ref()
.map(|a| a.mapping.clone())
.unwrap_or_default()
.clone(),
);
} else if let LogicalPlan::TableScan(table_scan) = node {
self.alias_tracker = AliasTracker::new(table_scan);
self.mapping_at_each_level.push(
self.alias_tracker
.as_ref()
.map(|a| a.mapping.clone())
.unwrap_or_default()
.clone(),
);
}
Ok(TreeNodeRecursion::Continue)
}
}
#[test]
fn proj_alias_tracker() {
// use logging for better debugging
init_default_ut_logging();
let test_table = TestTable::table_with_name(0, "numbers".to_string());
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(test_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.project(vec![
col("number"),
col("pk3").alias("pk1"),
col("pk2").alias("pk3"),
])
.unwrap()
.project(vec![
col("number"),
col("pk1").alias("pk2"),
col("pk3").alias("pk1"),
])
.unwrap()
.build()
.unwrap();
let mut tracker_tester = TrackerTester {
alias_tracker: None,
mapping_at_each_level: Vec::new(),
};
plan.visit(&mut tracker_tester).unwrap();
assert_eq!(
tracker_tester.mapping_at_each_level,
vec![
HashMap::from([
("number".to_string(), HashSet::from(["number".into()])),
("pk1".to_string(), HashSet::from(["pk1".into()])),
("pk2".to_string(), HashSet::from(["pk2".into()])),
("pk3".to_string(), HashSet::from(["pk3".into()])),
("ts".to_string(), HashSet::from(["ts".into()]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from(["pk3".into()])),
("pk3".to_string(), HashSet::from(["pk1".into()])),
("ts".to_string(), HashSet::from([]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from(["pk1".into()])),
("pk3".to_string(), HashSet::from(["pk2".into()])),
("ts".to_string(), HashSet::from([]))
])
]
);
}
#[test]
fn proj_multi_alias_tracker() {
// use logging for better debugging
init_default_ut_logging();
let test_table = TestTable::table_with_name(0, "numbers".to_string());
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(test_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.project(vec![
col("number"),
col("pk3").alias("pk1"),
col("pk3").alias("pk2"),
])
.unwrap()
.project(vec![
col("number"),
col("pk2").alias("pk4"),
col("pk1").alias("pk5"),
])
.unwrap()
.build()
.unwrap();
let mut tracker_tester = TrackerTester {
alias_tracker: None,
mapping_at_each_level: Vec::new(),
};
plan.visit(&mut tracker_tester).unwrap();
assert_eq!(
tracker_tester.mapping_at_each_level,
vec![
HashMap::from([
("number".to_string(), HashSet::from(["number".into()])),
("pk1".to_string(), HashSet::from(["pk1".into()])),
("pk2".to_string(), HashSet::from(["pk2".into()])),
("pk3".to_string(), HashSet::from(["pk3".into()])),
("ts".to_string(), HashSet::from(["ts".into()]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from([])),
(
"pk3".to_string(),
HashSet::from(["pk1".into(), "pk2".into()])
),
("ts".to_string(), HashSet::from([]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from([])),
(
"pk3".to_string(),
HashSet::from(["pk4".into(), "pk5".into()])
),
("ts".to_string(), HashSet::from([]))
])
]
);
}
}


@@ -27,6 +27,7 @@ use promql::extension_plan::{
EmptyMetric, InstantManipulate, RangeManipulate, SeriesDivide, SeriesNormalize,
};
use crate::dist_plan::analyzer::AliasMapping;
use crate::dist_plan::merge_sort::{merge_sort_transformer, MergeSortLogicalPlan};
use crate::dist_plan::MergeScanLogicalPlan;
@@ -139,9 +140,7 @@ pub fn step_aggr_to_upper_aggr(
new_projection_exprs.push(aliased_output_aggr_expr);
}
let upper_aggr_plan = LogicalPlan::Aggregate(new_aggr);
debug!("Before recompute schema: {upper_aggr_plan:?}");
let upper_aggr_plan = upper_aggr_plan.recompute_schema()?;
debug!("After recompute schema: {upper_aggr_plan:?}");
// create a projection on top of the new aggregate plan
let new_projection =
Projection::try_new(new_projection_exprs, Arc::new(upper_aggr_plan.clone()))?;
@@ -222,7 +221,7 @@ pub enum Commutativity {
pub struct Categorizer {}
impl Categorizer {
pub fn check_plan(plan: &LogicalPlan, partition_cols: Option<Vec<String>>) -> Commutativity {
pub fn check_plan(plan: &LogicalPlan, partition_cols: Option<AliasMapping>) -> Commutativity {
let partition_cols = partition_cols.unwrap_or_default();
match plan {
@@ -247,7 +246,6 @@ impl Categorizer {
transformer: Some(Arc::new(|plan: &LogicalPlan| {
debug!("Before Step optimize: {plan}");
let ret = step_aggr_to_upper_aggr(plan);
debug!("After Step Optimize: {ret:?}");
ret.ok().map(|s| TransformerAction {
extra_parent_plans: s.to_vec(),
new_child_plan: None,
@@ -264,7 +262,11 @@ impl Categorizer {
return commutativity;
}
}
Commutativity::Commutative
// when all group-by expressions are partition columns we can push down, unless
// another push down (including `Limit` or `Sort`) is already in progress (which will then prevent the next conditionally commutative node from being pushed down).
// TODO(discord9): This is a temporary solution (that works); a better description of
// commutativity is needed for this situation.
Commutativity::ConditionalCommutative(None)
}
LogicalPlan::Sort(_) => {
if partition_cols.is_empty() {
@@ -322,17 +324,20 @@ impl Categorizer {
pub fn check_extension_plan(
plan: &dyn UserDefinedLogicalNode,
partition_cols: &[String],
partition_cols: &AliasMapping,
) -> Commutativity {
match plan.name() {
name if name == SeriesDivide::name() => {
let series_divide = plan.as_any().downcast_ref::<SeriesDivide>().unwrap();
let tags = series_divide.tags().iter().collect::<HashSet<_>>();
for partition_col in partition_cols {
if !tags.contains(partition_col) {
for all_alias in partition_cols.values() {
let all_alias = all_alias.iter().map(|c| &c.name).collect::<HashSet<_>>();
if tags.intersection(&all_alias).count() == 0 {
return Commutativity::NonCommutative;
}
}
Commutativity::Commutative
}
name if name == SeriesNormalize::name()
@@ -396,7 +401,7 @@ impl Categorizer {
/// Return true if the given exprs and partition cols satisfy the rule.
/// In this case the plan can be treated as fully commutative.
fn check_partition(exprs: &[Expr], partition_cols: &[String]) -> bool {
fn check_partition(exprs: &[Expr], partition_cols: &AliasMapping) -> bool {
let mut ref_cols = HashSet::new();
for expr in exprs {
expr.add_column_refs(&mut ref_cols);
@@ -405,8 +410,14 @@ impl Categorizer {
.into_iter()
.map(|c| c.name.clone())
.collect::<HashSet<_>>();
for col in partition_cols {
if !ref_cols.contains(col) {
for all_alias in partition_cols.values() {
let all_alias = all_alias
.iter()
.map(|c| c.name.clone())
.collect::<HashSet<_>>();
// check whether the intersection of the referenced columns with all aliases of the partition column
// is empty; if it is empty, not all partition columns show up in `exprs`
if ref_cols.intersection(&all_alias).count() == 0 {
return false;
}
}
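For intuition, a minimal standalone sketch (hypothetical column names) of the alias-aware intersection test above; the same idea is used for the `SeriesDivide` tag check earlier.
use std::collections::HashSet;
fn main() {
    // columns referenced by the exprs being checked
    let ref_cols: HashSet<&str> = HashSet::from(["pk1", "number"]);
    // all known aliases of one partition column (e.g. pk3 aliased to pk1)
    let part_col_aliases: HashSet<&str> = HashSet::from(["pk3", "pk1"]);
    // the partition column counts as referenced if any of its aliases shows up
    assert!(ref_cols.intersection(&part_col_aliases).count() > 0);
}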
@@ -424,7 +435,7 @@ pub type StageTransformer = Arc<dyn Fn(&LogicalPlan) -> Option<TransformerAction
pub struct TransformerAction {
/// list of plans that need to be applied to parent plans, in the order of parent to child.
/// i.e. if this returns `[Projection, Aggregate]`, then the parent plan should be transformed to
/// ```
/// ```ignore
/// Original Parent Plan:
/// Projection:
/// Aggregate:
@@ -453,7 +464,7 @@ mod test {
fetch: None,
});
assert!(matches!(
Categorizer::check_plan(&plan, Some(vec![])),
Categorizer::check_plan(&plan, Some(Default::default())),
Commutativity::Commutative
));
}


@@ -16,7 +16,7 @@ use std::any::Any;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use ahash::HashSet;
use ahash::{HashMap, HashSet};
use arrow_schema::{Schema as ArrowSchema, SchemaRef as ArrowSchemaRef, SortOptions};
use async_stream::stream;
use common_catalog::parse_catalog_and_schema_from_db_string;
@@ -88,7 +88,11 @@ impl UserDefinedLogicalNodeCore for MergeScanLogicalPlan {
}
fn fmt_for_explain(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "MergeScan [is_placeholder={}]", self.is_placeholder)
write!(
f,
"MergeScan [is_placeholder={}, remote_input=[\n{}\n]]",
self.is_placeholder, self.input
)
}
fn with_exprs_and_inputs(
@@ -143,7 +147,7 @@ pub struct MergeScanExec {
metric: ExecutionPlanMetricsSet,
properties: PlanProperties,
/// Metrics from sub stages
sub_stage_metrics: Arc<Mutex<Vec<RecordBatchMetrics>>>,
sub_stage_metrics: Arc<Mutex<HashMap<RegionId, RecordBatchMetrics>>>,
query_ctx: QueryContextRef,
target_partition: usize,
partition_cols: Vec<String>,
@@ -155,6 +159,7 @@ impl std::fmt::Debug for MergeScanExec {
.field("table", &self.table)
.field("regions", &self.regions)
.field("schema", &self.schema)
.field("plan", &self.plan)
.finish()
}
}
@@ -317,6 +322,12 @@ impl MergeScanExec {
if let Some(mut first_consume_timer) = first_consume_timer.take() {
first_consume_timer.stop();
}
if let Some(metrics) = stream.metrics() {
let mut sub_stage_metrics = sub_stage_metrics_moved.lock().unwrap();
sub_stage_metrics.insert(region_id, metrics);
}
yield Ok(batch);
// reset poll timer
poll_timer = Instant::now();
@@ -341,7 +352,8 @@ impl MergeScanExec {
metric.record_greptime_exec_cost(value as usize);
// record metrics from sub stages
sub_stage_metrics_moved.lock().unwrap().push(metrics);
let mut sub_stage_metrics = sub_stage_metrics_moved.lock().unwrap();
sub_stage_metrics.insert(region_id, metrics);
}
MERGE_SCAN_POLL_ELAPSED.observe(poll_duration.as_secs_f64());
@@ -409,7 +421,12 @@ impl MergeScanExec {
}
pub fn sub_stage_metrics(&self) -> Vec<RecordBatchMetrics> {
self.sub_stage_metrics.lock().unwrap().clone()
self.sub_stage_metrics
.lock()
.unwrap()
.values()
.cloned()
.collect()
}
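Presumably the switch from a `Vec` to a `HashMap` keyed by region id is so that per-batch partial metrics and the final metrics for the same region overwrite each other instead of piling up as duplicates; a minimal sketch of that behavior:
use std::collections::HashMap;
fn main() {
    let mut sub_stage_metrics: HashMap<u64, &str> = HashMap::new();
    sub_stage_metrics.insert(42, "partial metrics from a streamed batch");
    sub_stage_metrics.insert(42, "final metrics for the same region");
    // one entry per region, the latest report wins
    assert_eq!(sub_stage_metrics.len(), 1);
    assert_eq!(sub_stage_metrics[&42], "final metrics for the same region");
}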
pub fn partition_count(&self) -> usize {


@@ -181,6 +181,15 @@ fn fetch_partition_range(input: Arc<dyn ExecutionPlan>) -> DataFusionResult<Opti
is_batch_coalesced = true;
}
// only a very limited set of plans can exist between region scan and sort exec
// other plans might make this optimization wrong, so be safe here by limiting it
if !(plan.as_any().is::<ProjectionExec>()
|| plan.as_any().is::<FilterExec>()
|| plan.as_any().is::<CoalesceBatchesExec>())
{
partition_ranges = None;
}
// TODO(discord9): do this in the logical plan instead, as it's less buggy there
// Collects alias of the time index column.
if let Some(projection) = plan.as_any().downcast_ref::<ProjectionExec>() {
@@ -194,6 +203,14 @@ fn fetch_partition_range(input: Arc<dyn ExecutionPlan>) -> DataFusionResult<Opti
}
if let Some(region_scan_exec) = plan.as_any().downcast_ref::<RegionScanExec>() {
// `PerSeries` distribution is not supported in windowed sort.
if region_scan_exec.distribution()
== Some(store_api::storage::TimeSeriesDistribution::PerSeries)
{
partition_ranges = None;
return Ok(Transformed::no(plan));
}
partition_ranges = Some(region_scan_exec.get_uncollapsed_partition_ranges());
// Reset time index column.
time_index = HashSet::from([region_scan_exec.time_index()]);


@@ -96,9 +96,10 @@ impl PartSortExec {
if partition >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
partition,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
@@ -322,9 +323,10 @@ impl PartSortStream {
) -> datafusion_common::Result<()> {
if self.cur_part_idx >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
self.cur_part_idx,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
let cur_range = self.partition_ranges[self.cur_part_idx];
@@ -355,9 +357,10 @@ impl PartSortStream {
// check if the current partition index is out of range
if self.cur_part_idx >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
self.cur_part_idx,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
let cur_range = self.partition_ranges[self.cur_part_idx];


@@ -191,18 +191,38 @@ impl PromPlanner {
planner.prom_expr_to_plan(&stmt.expr, session_state).await
}
#[async_recursion]
pub async fn prom_expr_to_plan(
&mut self,
prom_expr: &PromExpr,
session_state: &SessionState,
) -> Result<LogicalPlan> {
self.prom_expr_to_plan_inner(prom_expr, false, session_state)
.await
}
/**
Converts a PromQL expression to a logical plan.
NOTE:
The `timestamp_fn` indicates whether the PromQL `timestamp()` function is being evaluated in the current context.
If `true`, the planner generates a logical plan that projects the timestamp (time index) column
as the value column for each input row, implementing the PromQL `timestamp()` function semantics.
If `false`, the planner generates the standard logical plan for the given PromQL expression.
*/
#[async_recursion]
async fn prom_expr_to_plan_inner(
&mut self,
prom_expr: &PromExpr,
timestamp_fn: bool,
session_state: &SessionState,
) -> Result<LogicalPlan> {
let res = match prom_expr {
PromExpr::Aggregate(expr) => self.prom_aggr_expr_to_plan(session_state, expr).await?,
PromExpr::Unary(expr) => self.prom_unary_expr_to_plan(session_state, expr).await?,
PromExpr::Binary(expr) => self.prom_binary_expr_to_plan(session_state, expr).await?,
PromExpr::Paren(ParenExpr { expr }) => {
self.prom_expr_to_plan(expr, session_state).await?
self.prom_expr_to_plan_inner(expr, timestamp_fn, session_state)
.await?
}
PromExpr::Subquery(expr) => {
self.prom_subquery_expr_to_plan(session_state, expr).await?
@@ -210,7 +230,8 @@ impl PromPlanner {
PromExpr::NumberLiteral(lit) => self.prom_number_lit_to_plan(lit)?,
PromExpr::StringLiteral(lit) => self.prom_string_lit_to_plan(lit)?,
PromExpr::VectorSelector(selector) => {
self.prom_vector_selector_to_plan(selector).await?
self.prom_vector_selector_to_plan(selector, timestamp_fn)
.await?
}
PromExpr::MatrixSelector(selector) => {
self.prom_matrix_selector_to_plan(selector).await?
@@ -673,6 +694,7 @@ impl PromPlanner {
async fn prom_vector_selector_to_plan(
&mut self,
vector_selector: &VectorSelector,
timestamp_fn: bool,
) -> Result<LogicalPlan> {
let VectorSelector {
name,
@@ -687,6 +709,15 @@ impl PromPlanner {
let normalize = self
.selector_to_series_normalize_plan(offset, matchers, false)
.await?;
let normalize = if timestamp_fn {
// If evaluating the PromQL `timestamp()` function, project the time index column as the value column
// before wrapping with [`InstantManipulate`], so the output matches PromQL's `timestamp()` semantics.
self.create_timestamp_func_plan(normalize)?
} else {
normalize
};
let manipulate = InstantManipulate::new(
self.ctx.start,
self.ctx.end,
@@ -704,6 +735,43 @@ impl PromPlanner {
}))
}
/// Builds a projection plan for the PromQL `timestamp()` function.
/// Projects the time index column as the value column for each row.
///
/// # Arguments
/// * `normalize` - Input [`LogicalPlan`] for the normalized series.
///
/// # Returns
/// Returns a [`Result<LogicalPlan>`] where the resulting logical plan projects the timestamp
/// column as the value column, along with the original tag and time index columns.
///
/// # Timestamp vs. Time Function
///
/// - **Timestamp Function (`timestamp()`)**: In PromQL, the `timestamp()` function returns the
/// timestamp (time index) of each sample as the value column.
///
/// - **Time Function (`time()`)**: The `time()` function returns the evaluation time of the query
/// as a scalar value.
///
/// # Side Effects
/// Updates the planner context's field columns to the timestamp column name.
///
fn create_timestamp_func_plan(&mut self, normalize: LogicalPlan) -> Result<LogicalPlan> {
let time_expr = build_special_time_expr(self.ctx.time_index_column.as_ref().unwrap())
.alias(DEFAULT_FIELD_COLUMN);
self.ctx.field_columns = vec![time_expr.schema_name().to_string()];
let mut project_exprs = Vec::with_capacity(self.ctx.tag_columns.len() + 2);
project_exprs.push(self.create_time_index_column_expr()?);
project_exprs.push(time_expr);
project_exprs.extend(self.create_tag_column_exprs()?);
LogicalPlanBuilder::from(normalize)
.project(project_exprs)
.context(DataFusionPlanningSnafu)?
.build()
.context(DataFusionPlanningSnafu)
}
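As a reminder of the semantics this projection implements (illustrative numbers, not GreptimeDB code): PromQL's `timestamp()` turns each sample's own timestamp, in seconds, into that sample's value, whereas `time()` is a scalar equal to the evaluation time.
fn main() {
    // samples as (timestamp in ms, value)
    let samples_ms: [(i64, f64); 2] = [(1_000, 5.0), (2_000, 7.0)];
    // timestamp(): the value becomes the sample's own timestamp in seconds
    let out: Vec<(i64, f64)> = samples_ms
        .iter()
        .map(|(ts_ms, _value)| (*ts_ms, *ts_ms as f64 / 1000.0))
        .collect();
    assert_eq!(out, vec![(1_000, 1.0), (2_000, 2.0)]);
}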
async fn prom_matrix_selector_to_plan(
&mut self,
matrix_selector: &MatrixSelector,
@@ -716,17 +784,19 @@ impl PromPlanner {
..
} = vs;
let matchers = self.preprocess_label_matchers(matchers, name)?;
if let Some(empty_plan) = self.setup_context().await? {
return Ok(empty_plan);
}
ensure!(!range.is_zero(), ZeroRangeSelectorSnafu);
let range_ms = range.as_millis() as _;
self.ctx.range = Some(range_ms);
let normalize = self
.selector_to_series_normalize_plan(offset, matchers, true)
.await?;
// Some functions like rate may require special fields in the RangeManipulate plan
// so we can't skip RangeManipulate.
let normalize = match self.setup_context().await? {
Some(empty_plan) => empty_plan,
None => {
self.selector_to_series_normalize_plan(offset, matchers, true)
.await?
}
};
let manipulate = RangeManipulate::new(
self.ctx.start,
self.ctx.end,
@@ -766,7 +836,8 @@ impl PromPlanner {
// transform function arguments
let args = self.create_function_args(&args.args)?;
let input = if let Some(prom_expr) = &args.input {
self.prom_expr_to_plan(prom_expr, session_state).await?
self.prom_expr_to_plan_inner(prom_expr, func.name == "timestamp", session_state)
.await?
} else {
self.ctx.time_index_column = Some(SPECIAL_TIME_FUNCTION.to_string());
self.ctx.reset_table_name_and_schema();
@@ -1652,7 +1723,7 @@ impl PromPlanner {
ScalarFunc::GeneratedExpr
}
"sort" | "sort_desc" | "sort_by_label" | "sort_by_label_desc" => {
"sort" | "sort_desc" | "sort_by_label" | "sort_by_label_desc" | "timestamp" => {
// These functions are not expressions but a part of the plan;
// they are processed by `prom_call_expr_to_plan`.
for value in &self.ctx.field_columns {
@@ -2263,10 +2334,14 @@ impl PromPlanner {
let input_plan = self.prom_expr_to_plan(&input, session_state).await?;
if !self.ctx.has_le_tag() {
return ColumnNotFoundSnafu {
col: LE_COLUMN_NAME.to_string(),
}
.fail();
// Return empty result instead of error when 'le' column is not found
// This handles the case when histogram metrics don't exist
return Ok(LogicalPlan::EmptyRelation(
datafusion::logical_expr::EmptyRelation {
produce_one_row: false,
schema: Arc::new(DFSchema::empty()),
},
));
}
let time_index_column =
self.ctx
@@ -4657,4 +4732,53 @@ Filter: up.field_0 IS NOT NULL [timestamp:Timestamp(Millisecond, None), field_0:
assert_eq!(plan.display_indent_schema().to_string(), expected);
}
#[tokio::test]
async fn test_histogram_quantile_missing_le_column() {
let mut eval_stmt = EvalStmt {
expr: PromExpr::NumberLiteral(NumberLiteral { val: 1.0 }),
start: UNIX_EPOCH,
end: UNIX_EPOCH
.checked_add(Duration::from_secs(100_000))
.unwrap(),
interval: Duration::from_secs(5),
lookback_delta: Duration::from_secs(1),
};
// Test case: histogram_quantile with a table that doesn't have 'le' column
let case = r#"histogram_quantile(0.99, sum by(pod,instance,le) (rate(non_existent_histogram_bucket{instance=~"xxx"}[1m])))"#;
let prom_expr = parser::parse(case).unwrap();
eval_stmt.expr = prom_expr;
// Create a table provider with a table that doesn't have 'le' column
let table_provider = build_test_table_provider_with_fields(
&[(
DEFAULT_SCHEMA_NAME.to_string(),
"non_existent_histogram_bucket".to_string(),
)],
&["pod", "instance"], // Note: no 'le' column
)
.await;
// Should return empty result instead of error
let result =
PromPlanner::stmt_to_plan(table_provider, &eval_stmt, &build_session_state()).await;
// This should succeed now (returning empty result) instead of failing with "Cannot find column le"
assert!(
result.is_ok(),
"Expected successful plan creation with empty result, but got error: {:?}",
result.err()
);
// Verify that the result is an EmptyRelation
let plan = result.unwrap();
match plan {
LogicalPlan::EmptyRelation(_) => {
// This is what we expect
}
_ => panic!("Expected EmptyRelation, but got: {:?}", plan),
}
}
}


@@ -36,6 +36,7 @@ use common_telemetry::tracing_context::{FutureExt, TracingContext};
use futures::{future, ready, Stream};
use futures_util::{StreamExt, TryStreamExt};
use prost::Message;
use session::context::{QueryContext, QueryContextRef};
use snafu::{ensure, ResultExt};
use table::table_name::TableName;
use tokio::sync::mpsc;
@@ -188,6 +189,7 @@ impl FlightCraft for GreptimeRequestHandler {
let ticket = request.into_inner().ticket;
let request =
GreptimeRequest::decode(ticket.as_ref()).context(error::InvalidFlightTicketSnafu)?;
let query_ctx = QueryContext::arc();
// The gRPC protocol passes queries via Flight. It needs to be wrapped in a span in order to record the stream
let span = info_span!(
@@ -202,6 +204,7 @@ impl FlightCraft for GreptimeRequestHandler {
output,
TracingContext::from_current_span(),
flight_compression,
query_ctx,
);
Ok(Response::new(stream))
}
@@ -371,15 +374,25 @@ fn to_flight_data_stream(
output: Output,
tracing_context: TracingContext,
flight_compression: FlightCompression,
query_ctx: QueryContextRef,
) -> TonicStream<FlightData> {
match output.data {
OutputData::Stream(stream) => {
let stream = FlightRecordBatchStream::new(stream, tracing_context, flight_compression);
let stream = FlightRecordBatchStream::new(
stream,
tracing_context,
flight_compression,
query_ctx,
);
Box::pin(stream) as _
}
OutputData::RecordBatches(x) => {
let stream =
FlightRecordBatchStream::new(x.as_stream(), tracing_context, flight_compression);
let stream = FlightRecordBatchStream::new(
x.as_stream(),
tracing_context,
flight_compression,
query_ctx,
);
Box::pin(stream) as _
}
OutputData::AffectedRows(rows) => {


@@ -25,6 +25,7 @@ use futures::channel::mpsc;
use futures::channel::mpsc::Sender;
use futures::{SinkExt, Stream, StreamExt};
use pin_project::{pin_project, pinned_drop};
use session::context::QueryContextRef;
use snafu::ResultExt;
use tokio::task::JoinHandle;
@@ -46,10 +47,12 @@ impl FlightRecordBatchStream {
recordbatches: SendableRecordBatchStream,
tracing_context: TracingContext,
compression: FlightCompression,
query_ctx: QueryContextRef,
) -> Self {
let should_send_partial_metrics = query_ctx.explain_verbose();
let (tx, rx) = mpsc::channel::<TonicResult<FlightMessage>>(1);
let join_handle = common_runtime::spawn_global(async move {
Self::flight_data_stream(recordbatches, tx)
Self::flight_data_stream(recordbatches, tx, should_send_partial_metrics)
.trace(tracing_context.attach(info_span!("flight_data_stream")))
.await
});
@@ -69,6 +72,7 @@ impl FlightRecordBatchStream {
async fn flight_data_stream(
mut recordbatches: SendableRecordBatchStream,
mut tx: Sender<TonicResult<FlightMessage>>,
should_send_partial_metrics: bool,
) {
let schema = recordbatches.schema().arrow_schema().clone();
if let Err(e) = tx.send(Ok(FlightMessage::Schema(schema))).await {
@@ -88,6 +92,17 @@ impl FlightRecordBatchStream {
warn!(e; "stop sending Flight data");
return;
}
if should_send_partial_metrics {
if let Some(metrics) = recordbatches
.metrics()
.and_then(|m| serde_json::to_string(&m).ok())
{
if let Err(e) = tx.send(Ok(FlightMessage::Metrics(metrics))).await {
warn!(e; "stop sending Flight data");
return;
}
}
}
}
Err(e) => {
let e = Err(e).context(error::CollectRecordbatchSnafu);
@@ -154,6 +169,7 @@ mod test {
use datatypes::schema::{ColumnSchema, Schema};
use datatypes::vectors::Int32Vector;
use futures::StreamExt;
use session::context::QueryContext;
use super::*;
@@ -175,6 +191,7 @@ mod test {
recordbatches,
TracingContext::default(),
FlightCompression::default(),
QueryContext::arc(),
);
let mut raw_data = Vec::with_capacity(2);


@@ -42,6 +42,7 @@ use session::hints::READ_PREFERENCE_HINT;
use snafu::{OptionExt, ResultExt};
use table::TableRef;
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;
use crate::error::Error::UnsupportedAuthScheme;
use crate::error::{
@@ -176,8 +177,9 @@ impl GreptimeRequestHandler {
let result = result
.map(|x| DoPutResponse::new(request_id, x))
.map_err(Into::into);
if result_sender.try_send(result).is_err() {
warn!(r#""DoPut" client maybe unreachable, abort handling its message"#);
if let Err(e) = result_sender.try_send(result)
&& let TrySendError::Closed(_) = e {
warn!(r#""DoPut" client with request_id {} maybe unreachable, abort handling its message"#, request_id);
break;
}
}
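For context on why only `TrySendError::Closed` aborts the loop: with tokio's bounded channel, `Full` is transient back-pressure while `Closed` means the receiving side is gone for good. A minimal standalone sketch:
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;
fn main() {
    let (tx, rx) = mpsc::channel::<i32>(1);
    tx.try_send(1).unwrap();
    // channel is full: a transient condition, keep handling messages
    assert!(matches!(tx.try_send(2), Err(TrySendError::Full(_))));
    // receiver dropped: the client is gone, so the handler should break out
    drop(rx);
    assert!(matches!(tx.try_send(3), Err(TrySendError::Closed(_))));
}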


@@ -13,7 +13,7 @@
// limitations under the License.
//! prom supplies the Prometheus HTTP API server compliance
use std::collections::{HashMap, HashSet};
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::Arc;
use axum::extract::{Path, Query, State};
@@ -62,7 +62,7 @@ use crate::prometheus_handler::PrometheusHandlerRef;
/// For [ValueType::Vector] result type
#[derive(Debug, Default, Serialize, Deserialize, PartialEq)]
pub struct PromSeriesVector {
pub metric: HashMap<String, String>,
pub metric: BTreeMap<String, String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub value: Option<(f64, String)>,
}
@@ -70,7 +70,7 @@ pub struct PromSeriesVector {
/// For [ValueType::Matrix] result type
#[derive(Debug, Default, Serialize, Deserialize, PartialEq)]
pub struct PromSeriesMatrix {
pub metric: HashMap<String, String>,
pub metric: BTreeMap<String, String>,
pub values: Vec<(f64, String)>,
}


@@ -13,7 +13,8 @@
// limitations under the License.
//! prom supplies the Prometheus HTTP API server compliance
use std::collections::HashMap;
use std::cmp::Ordering;
use std::collections::{BTreeMap, HashMap};
use axum::http::HeaderValue;
use axum::response::{IntoResponse, Response};
@@ -311,7 +312,7 @@ impl PrometheusJsonResponse {
let metric = tags
.into_iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect::<HashMap<_, _>>();
.collect::<BTreeMap<_, _>>();
match result {
PromQueryResult::Vector(ref mut v) => {
v.push(PromSeriesVector {
@@ -320,6 +321,11 @@ impl PrometheusJsonResponse {
});
}
PromQueryResult::Matrix(ref mut v) => {
// sort values by timestamp
if !values.is_sorted_by(|a, b| a.0 <= b.0) {
values.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));
}
v.push(PromSeriesMatrix { metric, values });
}
PromQueryResult::Scalar(ref mut v) => {
@@ -331,6 +337,12 @@ impl PrometheusJsonResponse {
}
});
// sort matrix by metric
// see: https://prometheus.io/docs/prometheus/3.5/querying/api/#range-vectors
if let PromQueryResult::Matrix(ref mut v) = result {
v.sort_by(|a, b| a.metric.cmp(&b.metric));
}
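Why `BTreeMap` instead of `HashMap` for the label sets: iteration order becomes deterministic (sorted by key), so the serialized `metric` objects and the matrix sort above are stable across runs. A minimal sketch:
use std::collections::BTreeMap;
fn main() {
    let metric: BTreeMap<&str, &str> = BTreeMap::from([("job", "node"), ("instance", "host-1")]);
    let keys: Vec<_> = metric.keys().copied().collect();
    // always sorted by key, unlike HashMap's arbitrary iteration order
    assert_eq!(keys, vec!["instance", "job"]);
}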
let result_type_string = result_type.to_string();
let data = PrometheusResponse::PromData(PromData {
result_type: result_type_string,


@@ -170,7 +170,7 @@ fn select_variable(query: &str, query_context: QueryContextRef) -> Option<Output
// skip the first "select"
for var in vars.iter().skip(1) {
let var = var.trim_matches(|c| c == ' ' || c == ',');
let var = var.trim_matches(|c| c == ' ' || c == ',' || c == ';');
let var_as: Vec<&str> = var
.split(" as ")
.map(|x| {
@@ -185,6 +185,9 @@ fn select_variable(query: &str, query_context: QueryContextRef) -> Option<Output
let value = match var_as[0] {
"session.time_zone" | "time_zone" => query_context.timezone().to_string(),
"system_time_zone" => system_timezone_name(),
"max_execution_time" | "session.max_execution_time" => {
query_context.query_timeout_as_millis().to_string()
}
_ => VAR_VALUES
.get(var_as[0])
.map(|v| v.to_string())
@@ -352,11 +355,11 @@ mod test {
// complex variables
let query = "/* mysql-connector-java-8.0.17 (Revision: 16a712ddb3f826a1933ab42b0039f7fb9eebc6ec) */SELECT @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@collation_server AS collation_server, @@collation_connection AS collation_connection, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_write_timeout AS net_write_timeout, @@performance_schema AS performance_schema, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@transaction_isolation AS transaction_isolation, @@wait_timeout AS wait_timeout;";
let expected = "\
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+
| auto_increment_increment | character_set_client | character_set_connection | character_set_results | character_set_server | collation_server | collation_connection | init_connect | interactive_timeout | license | lower_case_table_names | max_allowed_packet | net_write_timeout | performance_schema | sql_mode | system_time_zone | time_zone | transaction_isolation | wait_timeout; |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31536000 | 0 | 0 | 134217728 | 31536000 | 0 | 0 | Asia/Shanghai | Asia/Shanghai | REPEATABLE-READ | 31536000 |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+";
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+
| auto_increment_increment | character_set_client | character_set_connection | character_set_results | character_set_server | collation_server | collation_connection | init_connect | interactive_timeout | license | lower_case_table_names | max_allowed_packet | net_write_timeout | performance_schema | sql_mode | system_time_zone | time_zone | transaction_isolation | wait_timeout |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31536000 | 0 | 0 | 134217728 | 31536000 | 0 | 0 | Asia/Shanghai | Asia/Shanghai | REPEATABLE-READ | 31536000 |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+";
test(query, expected);
let query = "show variables";


@@ -167,6 +167,9 @@ async fn run_custom_pipeline(
PipelineExecOutput::DispatchedTo(dispatched_to, val) => {
push_to_map!(dispatched, dispatched_to, val, arr_len);
}
PipelineExecOutput::Filtered => {
continue;
}
}
}


@@ -49,7 +49,7 @@ pub(crate) struct GreptimeDBStartupParameters {
impl GreptimeDBStartupParameters {
fn new() -> GreptimeDBStartupParameters {
GreptimeDBStartupParameters {
version: format!("16.3-greptimedb-{}", env!("CARGO_PKG_VERSION")),
version: format!("16.3-greptimedb-{}", common_version::version()),
}
}
}


@@ -412,6 +412,10 @@ impl PromSeriesProcessor {
let one_sample = series.samples.len() == 1;
for s in series.samples.iter() {
// skip NaN value
if s.value.is_nan() {
continue;
}
let timestamp = s.timestamp;
pipeline_map.insert(GREPTIME_TIMESTAMP.to_string(), Value::Int64(timestamp));
pipeline_map.insert(GREPTIME_VALUE.to_string(), Value::Float64(s.value));


@@ -95,6 +95,18 @@ pub enum Error {
location: Location,
},
#[snafu(display(
"Not allowed to remove partition column {} from table {}",
column_name,
table_name
))]
RemovePartitionColumn {
column_name: String,
table_name: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display(
"Failed to build column descriptor for table: {}, column: {}",
table_name,
@@ -193,6 +205,7 @@ impl ErrorExt for Error {
StatusCode::EngineExecuteQuery
}
Error::RemoveColumnInIndex { .. }
| Error::RemovePartitionColumn { .. }
| Error::BuildColumnDescriptor { .. }
| Error::InvalidAlterRequest { .. } => StatusCode::InvalidArguments,
Error::CastDefaultValue { source, .. } => source.status_code(),


@@ -645,10 +645,19 @@ impl TableMeta {
msg: format!("Table {table_name} cannot add new columns {column_names:?}"),
})?;
let partition_key_indices = self
.partition_key_indices
.iter()
.map(|idx| table_schema.column_name_by_index(*idx))
// This unwrap is safe since we only add new columns.
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
// value_indices would be generated automatically.
let _ = meta_builder
.schema(Arc::new(new_schema))
.primary_key_indices(primary_key_indices);
.primary_key_indices(primary_key_indices)
.partition_key_indices(partition_key_indices);
Ok(meta_builder)
}
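The remapping above goes old index -> column name -> new index, so partition columns keep pointing at the right schema slots after columns are added or removed. A minimal standalone sketch (hypothetical helper, column names borrowed from the alter test further down):
fn remap_indices(old_names: &[&str], old_indices: &[usize], new_names: &[&str]) -> Vec<usize> {
    old_indices
        .iter()
        .map(|&i| old_names[i])
        .map(|name| {
            new_names
                .iter()
                .position(|n| *n == name)
                .expect("partition column must still exist")
        })
        .collect()
}
fn main() {
    // old schema: [col1, ts, col2]; partition columns are col1 (0) and col2 (2)
    let old = ["col1", "ts", "col2"];
    // new schema after adding columns, as in the alter test below
    let new = [
        "my_tag_first",
        "col1",
        "ts",
        "yet_another_field_after_ts",
        "my_field_after_ts",
        "col2",
    ];
    assert_eq!(remap_indices(&old, &[0, 2], &new), vec![1, 5]);
}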
@@ -676,6 +685,14 @@ impl TableMeta {
}
);
ensure!(
!self.partition_key_indices.contains(&index),
error::RemovePartitionColumnSnafu {
column_name: *column_name,
table_name,
}
);
if let Some(ts_index) = timestamp_index {
// Not allowed to remove column in timestamp index.
ensure!(
@@ -725,9 +742,18 @@ impl TableMeta {
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
let partition_key_indices = self
.partition_key_indices
.iter()
.map(|idx| table_schema.column_name_by_index(*idx))
// This unwrap is safe since we don't allow removing a partition key column.
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
let _ = meta_builder
.schema(Arc::new(new_schema))
.primary_key_indices(primary_key_indices);
.primary_key_indices(primary_key_indices)
.partition_key_indices(partition_key_indices);
Ok(meta_builder)
}
@@ -1300,6 +1326,8 @@ fn unset_column_skipping_index_options(
#[cfg(test)]
mod tests {
use std::assert_matches::assert_matches;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use datatypes::data_type::ConcreteDataType;
@@ -1308,6 +1336,7 @@ mod tests {
};
use super::*;
use crate::Error;
/// Create a test schema with 3 columns: `[col1 int32, ts timestamp(ms), col2 int32]`.
fn new_test_schema() -> Schema {
@@ -1385,6 +1414,11 @@ mod tests {
ConcreteDataType::string_datatype(),
true,
);
let yet_another_field = ColumnSchema::new(
"yet_another_field_after_ts",
ConcreteDataType::int64_datatype(),
true,
);
let alter_kind = AlterKind::AddColumns {
columns: vec![
AddColumnRequest {
@@ -1401,6 +1435,14 @@ mod tests {
}),
add_if_not_exists: false,
},
AddColumnRequest {
column_schema: yet_another_field,
is_key: true,
location: Some(AddColumnLocation::After {
column_name: "ts".to_string(),
}),
add_if_not_exists: false,
},
],
};
@@ -1756,6 +1798,29 @@ mod tests {
assert_eq!(StatusCode::InvalidArguments, err.status_code());
}
#[test]
fn test_remove_partition_column() {
let schema = Arc::new(new_test_schema());
let meta = TableMetaBuilder::empty()
.schema(schema)
.primary_key_indices(vec![])
.partition_key_indices(vec![0])
.engine("engine")
.next_column_id(3)
.build()
.unwrap();
// Remove a partition key column.
let alter_kind = AlterKind::DropColumns {
names: vec![String::from("col1")],
};
let err = meta
.builder_with_alter_kind("my_table", &alter_kind)
.err()
.unwrap();
assert_matches!(err, Error::RemovePartitionColumn { .. });
}
#[test]
fn test_change_key_column_data_type() {
let schema = Arc::new(new_test_schema());
@@ -1821,6 +1886,8 @@ mod tests {
let meta = TableMetaBuilder::empty()
.schema(schema)
.primary_key_indices(vec![0])
// partition col: col1, col2
.partition_key_indices(vec![0, 2])
.engine("engine")
.next_column_id(3)
.build()
@@ -1836,11 +1903,19 @@ mod tests {
.map(|column_schema| column_schema.name.clone())
.collect();
assert_eq!(
&["my_tag_first", "col1", "ts", "my_field_after_ts", "col2"],
&[
"my_tag_first", // primary key column
"col1", // partition column
"ts", // timestamp column
"yet_another_field_after_ts", // primary key column
"my_field_after_ts", // value column
"col2", // partition column
],
&names[..]
);
assert_eq!(&[0, 1], &new_meta.primary_key_indices[..]);
assert_eq!(&[2, 3, 4], &new_meta.value_indices[..]);
assert_eq!(&[0, 1, 3], &new_meta.primary_key_indices[..]);
assert_eq!(&[2, 4, 5], &new_meta.value_indices[..]);
assert_eq!(&[1, 5], &new_meta.partition_key_indices[..]);
}
#[test]


@@ -882,11 +882,14 @@ CREATE TABLE {table_name} (
let region_id = RegionId::new(table_id, *region);
let stream = region_server
.handle_remote_read(RegionQueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
})
.handle_remote_read(
RegionQueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
},
QueryContext::arc(),
)
.await
.unwrap();


@@ -249,11 +249,14 @@ mod tests {
let region_id = RegionId::new(table_id, *region);
let stream = region_server
.handle_remote_read(QueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
})
.handle_remote_read(
QueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
},
QueryContext::arc(),
)
.await
.unwrap();


@@ -112,6 +112,7 @@ macro_rules! http_tests {
test_pipeline_with_hint_vrl,
test_pipeline_2,
test_pipeline_skip_error,
test_pipeline_filter,
test_otlp_metrics,
test_otlp_traces_v0,
@@ -1945,6 +1946,78 @@ transform:
guard.remove_all().await;
}
pub async fn test_pipeline_filter(store_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) =
setup_test_http_app_with_frontend(store_type, "test_pipeline_filter").await;
// handshake
let client = TestClient::new(app).await;
let pipeline_body = r#"
processors:
- date:
field: time
formats:
- "%Y-%m-%d %H:%M:%S%.3f"
- filter:
field: name
targets:
- John
transform:
- field: name
type: string
- field: time
type: time
index: timestamp
"#;
// 1. create pipeline
let res = client
.post("/v1/events/pipelines/test")
.header("Content-Type", "application/x-yaml")
.body(pipeline_body)
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
// 2. write data
let data_body = r#"
[
{
"time": "2024-05-25 20:16:37.217",
"name": "John"
},
{
"time": "2024-05-25 20:16:37.218",
"name": "JoHN"
},
{
"time": "2024-05-25 20:16:37.328",
"name": "Jane"
}
]
"#;
let res = client
.post("/v1/events/logs?db=public&table=logs1&pipeline_name=test")
.header("Content-Type", "application/json")
.body(data_body)
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
validate_data(
"pipeline_filter",
&client,
"select * from logs1",
"[[\"Jane\",1716668197328000000]]",
)
.await;
guard.remove_all().await;
}
pub async fn test_pipeline_dispatcher(storage_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) =
@@ -2405,14 +2478,19 @@ processors:
ignore_missing: true
- vrl:
source: |
.log_id = .id
del(.id)
.from_source = "channel_2"
cond, err = .id1 > .id2
if (cond) {
.from_source = "channel_1"
}
del(.id1)
del(.id2)
.
transform:
- fields:
- log_id
type: int32
- from_source
type: string
- field: time
type: time
index: timestamp
@@ -2432,7 +2510,8 @@ transform:
let data_body = r#"
[
{
"id": "2436",
"id1": 2436,
"id2": 123,
"time": "2024-05-25 20:16:37.217"
}
]
@@ -2449,7 +2528,7 @@ transform:
"test_pipeline_with_vrl",
&client,
"select * from d_table",
"[[2436,1716668197217000000]]",
"[[\"channel_1\",1716668197217000000]]",
)
.await;


@@ -152,6 +152,16 @@ pub async fn test_mysql_stmts(store_type: StorageType) {
conn.execute("SET TRANSACTION READ ONLY").await.unwrap();
// empty statements
let err = conn.execute(" ------- ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute("----------\n;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute(" ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute(" \n ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let _ = fe_mysql_server.shutdown().await;
guard.remove_all().await;
}
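
Each of the new statements contains only whitespace and `--` line comments, so the server should reject all of them with the same "empty statements" error. A rough, self-contained sketch of that emptiness check, as an illustration of the semantics being tested rather than the server's actual parser:

// Illustrative check: a statement made of nothing but whitespace, `--` comments,
// and a bare trailing `;` carries no executable SQL.
fn is_empty_statement(sql: &str) -> bool {
    sql.lines()
        .map(|line| line.split("--").next().unwrap_or("").trim())
        .all(|rest| rest.is_empty() || rest == ";")
}

fn main() {
    assert!(is_empty_statement(" ------- ;"));
    assert!(is_empty_statement("----------\n;"));
    assert!(is_empty_statement(" ;"));
    assert!(is_empty_statement(" \n ;"));
    assert!(!is_empty_statement("SELECT 1;"));
}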

@@ -84,17 +84,37 @@ limit 1;
|_|_Inner Join: t_2.ts = t_3.ts, t_2.vin = t_3.vin_|
|_|_Inner Join: t_1.ts = t_2.ts, t_1.vin = t_2.vin_|
|_|_Filter: t_1.vin IS NOT NULL_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_1_|
|_| ]]_|
|_|_Filter: t_2.vin IS NOT NULL_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_2_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_3_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_4_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_5_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_6_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_7_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_8_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_9_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_10_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 DESC], fetch=1_|
|_|_SortExec: TopK(fetch=1), expr=[ts@0 DESC], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|

@@ -26,7 +26,12 @@ explain SELECT * FROM demo WHERE ts > cast(1000000000 as timestamp) ORDER BY hos
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeSort: demo.host ASC NULLS LAST_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: demo.host ASC NULLS LAST_|
|_|_Projection: demo.host, demo.ts, demo.cpu, demo.memory, demo.disk_util_|
|_|_Filter: demo.ts > arrow_cast(Int64(1000000000), Utf8("Timestamp(Millisecond, None)"))_|
|_|_TableScan: demo_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [host@0 ASC NULLS LAST]_|
|_|_MergeScanExec: REDACTED
|_|_|

@@ -12,7 +12,12 @@ EXPLAIN SELECT DISTINCT i%2 FROM integers ORDER BY 1;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[ |
|_| Sort: integers.i % Int64(2) ASC NULLS LAST_|
|_|_Distinct:_|
|_|_Projection: integers.i % Int64(2)_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -35,7 +40,11 @@ EXPLAIN SELECT a, b FROM test ORDER BY a, b;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: test.a ASC NULLS LAST, test.b ASC NULLS LAST |
|_|_Projection: test.a, test.b_|
|_|_TableScan: test_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -50,7 +59,12 @@ EXPLAIN SELECT DISTINCT a, b FROM test ORDER BY a, b;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: test.a ASC NULLS LAST, test.b ASC NULLS LAST |
|_|_Distinct:_|
|_|_Projection: test.a, test.b_|
|_|_TableScan: test_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
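
The pattern across these snapshot updates is the same: every `MergeScan [is_placeholder=false]` line now also prints `remote_input=[...]`, i.e. the logical sub-plan that MergeScan ships to the datanodes, indented beneath the operator. A purely hypothetical Rust sketch of that rendering shape (the real formatting lives in the MergeScan display code, not here):

// Hypothetical sketch of the nested rendering seen in the plans above:
// the remote (datanode-side) plan is printed indented inside the MergeScan entry.
fn render_merge_scan(is_placeholder: bool, remote_input: &str) -> String {
    let indented: String = remote_input
        .lines()
        .map(|line| format!("  {line}\n"))
        .collect();
    format!("MergeScan [is_placeholder={is_placeholder}, remote_input=[\n{indented}]]")
}

fn main() {
    let plan = "Sort: demo.host ASC NULLS LAST\n  TableScan: demo";
    println!("{}", render_merge_scan(false, plan));
}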

@@ -12,7 +12,11 @@ EXPLAIN SELECT COUNT(*) FROM single_partition;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: count(*)_|
|_|_Aggregate: groupBy=[[]], aggr=[[count(single_partition.j) AS count(*)]] |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -27,7 +31,11 @@ EXPLAIN SELECT SUM(i) FROM single_partition;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: sum(single_partition.i)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(single_partition.i)]] |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -42,7 +50,11 @@ EXPLAIN SELECT * FROM single_partition ORDER BY i DESC;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: single_partition.i DESC NULLS FIRST_|
|_|_Projection: single_partition.i, single_partition.j, single_partition.k |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+

@@ -55,7 +55,10 @@ FROM
+-+-+
| logical_plan_| Projection: sum(count(integers.i)) AS count(integers.i), sum(sum(integers.i)) AS sum(integers.i), uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))) AS uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))) AS hll_count(hll(integers.i))_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(integers.i), sum(integers.i), uddsketch_state(Int64(128), Float64(0.01), CAST(integers.i AS Float64)), hll(CAST(integers.i AS Utf8))]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[sum(count(integers.i))@0 as count(integers.i), sum(sum(integers.i))@1 as sum(integers.i), uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))@2) as uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))@3) as hll_count(hll(integers.i))] |
|_|_AggregateExec: mode=Final, gby=[], aggr=[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]_|
|_|_CoalescePartitionsExec_|
@@ -156,7 +159,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts ASC NULLS LAST_|
|_|_Projection: integers.ts, sum(count(integers.i)) AS count(integers.i), sum(sum(integers.i)) AS sum(integers.i), uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))) AS uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))) AS hll_count(hll(integers.i))_|
|_|_Aggregate: groupBy=[[integers.ts]], aggr=[[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[integers.ts]], aggr=[[count(integers.i), sum(integers.i), uddsketch_state(Int64(128), Float64(0.01), CAST(integers.i AS Float64)), hll(CAST(integers.i AS Utf8))]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[ts@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[ts@0 as ts, sum(count(integers.i))@1 as count(integers.i), sum(sum(integers.i))@2 as sum(integers.i), uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))@3) as uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))@4) as hll_count(hll(integers.i))] |

@@ -0,0 +1,974 @@
CREATE TABLE IF NOT EXISTS aggr_optimize_not (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
Affected Rows: 0
-- Case 0: group by columns are the same as partition columns.
-- This query shouldn't push down aggregation even if the group by columns
-- are partition columns, because sort is already pushed down.
-- If it did, it would cause a wrong result.
-- explain at 0s, 5s and 10s. No point at 0s.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.greptime_timestamp]], aggr=[[max(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 1: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.greptime_timestamp]], aggr=[[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, greptime_timestamp@2 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, greptime_timestamp@2 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 2: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.greptime_timestamp]], aggr=[[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, greptime_timestamp@1 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) |
| | SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, greptime_timestamp@0 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, greptime_timestamp@1 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, greptime_timestamp@0 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 3: group by columns are a superset of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.d ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d, aggr_optimize_not.greptime_timestamp]], aggr=[[count(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d, greptime_timestamp@4 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, d@5 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d, greptime_timestamp@4 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, d@5 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 4: group by columns are not a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.d ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d, aggr_optimize_not.greptime_timestamp]], aggr=[[min(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] |
| | SortExec: expr=[b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, c@1 as c, d@2 as d, greptime_timestamp@3 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[b@2 as b, c@3 as c, d@4 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), b@3 as b, c@4 as c, d@5 as d] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, c@1 as c, d@2 as d, greptime_timestamp@3 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[b@2 as b, c@3 as c, d@4 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 5: a simple sum
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain sum(aggr_optimize_not);
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.greptime_timestamp]], aggr=[[sum(aggr_optimize_not.greptime_value)]] |
| | Projection: aggr_optimize_not.greptime_timestamp, aggr_optimize_not.greptime_value |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | PromInstantManipulate: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-300000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [greptime_timestamp@0 ASC NULLS LAST] |
| | SortExec: expr=[greptime_timestamp@0 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] |
| | ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, greptime_value@5 as greptime_value] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze sum(aggr_optimize_not);
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [greptime_timestamp@0 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[greptime_timestamp@0 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, greptime_value@5 as greptime_value] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_PromInstantManipulateExec: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_PromInstantManipulateExec: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- TODO(discord9): more cases for aggr push down interacting with partitioning & TQL
CREATE TABLE IF NOT EXISTS aggr_optimize_not_count (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
Affected Rows: 0
-- Case 6: Test average rate (sum/count like)
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.greptime_timestamp, aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) AS aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) |
| | Inner Join: aggr_optimize_not.a = aggr_optimize_not_count.a, aggr_optimize_not.b = aggr_optimize_not_count.b, aggr_optimize_not.c = aggr_optimize_not_count.c, aggr_optimize_not.greptime_timestamp = aggr_optimize_not_count.greptime_timestamp |
| | SubqueryAlias: aggr_optimize_not |
| | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.greptime_timestamp]], aggr=[[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_rate(greptime_timestamp_range, greptime_value, aggr_optimize_not.greptime_timestamp, Int64(120000)) AS prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| | SubqueryAlias: aggr_optimize_not_count |
| | Sort: aggr_optimize_not_count.a ASC NULLS LAST, aggr_optimize_not_count.b ASC NULLS LAST, aggr_optimize_not_count.c ASC NULLS LAST, aggr_optimize_not_count.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.greptime_timestamp]], aggr=[[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))]] |
| | Projection: aggr_optimize_not_count.greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c |
| | MergeSort: aggr_optimize_not_count.a ASC NULLS FIRST, aggr_optimize_not_count.b ASC NULLS FIRST, aggr_optimize_not_count.c ASC NULLS FIRST, aggr_optimize_not_count.d ASC NULLS FIRST, aggr_optimize_not_count.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)) IS NOT NULL |
| | Projection: aggr_optimize_not_count.greptime_timestamp, prom_rate(greptime_timestamp_range, greptime_value, aggr_optimize_not_count.greptime_timestamp, Int64(120000)) AS prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not_count.a ASC NULLS FIRST, aggr_optimize_not_count.b ASC NULLS FIRST, aggr_optimize_not_count.c ASC NULLS FIRST, aggr_optimize_not_count.d ASC NULLS FIRST, aggr_optimize_not_count.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not_count.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not_count.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not_count |
| | ]] |
| physical_plan | ProjectionExec: expr=[a@1 as a, b@2 as b, c@3 as c, greptime_timestamp@4 as greptime_timestamp, sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@0 / sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@5 as aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | REDACTED
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | CoalescePartitionsExec |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[a@1 as a, b@2 as b, c@3 as c, greptime_timestamp@4 as greptime_timestamp, sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@0 / sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@5 as aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_CoalescePartitionsExec REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
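For this sum/count-style division (Case 6), the plans above show that only the per-series rate computation reaches the datanodes: each stage-1 fragment is a chain of projections, a NOT NULL filter and the Prom* operators over a SeriesScan, with no AggregateExec, while both sum by (a, b, c) aggregates, the join on (a, b, c, greptime_timestamp) and the final division run on the frontend above the two MergeScanExec nodes. A rough annotation of the query mapping sub-expressions to the stages printed above (line breaks added for readability; the test writes it on one line):

-- stage 1 (datanodes): the two rate(... [2m]) range computations
-- stage 0 (frontend):  both sum by (a, b, c) aggregates, the join on
--                      (a, b, c, greptime_timestamp) and the division
tql analyze (1752591864, 1752592164, '30s')
    sum by (a, b, c) (rate(aggr_optimize_not [2m]))
  / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));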
-- Case 7: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | MergeScanExec: REDACTED
| | |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
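Because the group-by keys (a, b, c) cover every declared partition column, two rows in the same group can never land in different regions, so the whole aggregate moves inside the MergeScan and the frontend only concatenates the per-region results. Per the remote_input above, each region effectively evaluates something like the following sketch against its local slice of the table (illustrative only):

-- evaluated independently on each region; no re-aggregation on the frontend
SELECT min(greptime_value)
FROM aggr_optimize_not
GROUP BY a, b, c;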
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@3 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@3 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 8: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns plus an extra column.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | MergeScanExec: REDACTED
| | |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@4 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@4 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 9: aggregate without sort should be pushed down. This one is handled by step aggregation push down.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+---------------+------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: min(min(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(min(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 as min(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+------------------------------------------------------------------------------------------------------------------------+
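Here the group-by keys (a, b) no longer list every declared partition column, so a group could in principle span regions; the plan above therefore splits the aggregation into two steps: each region computes a partial min per (a, b) inside the MergeScan, and the frontend combines the partials with min(min(...)). Case 10 below applies the same per-aggregate rewrite to min and max before adding them in the final projection. A hand-written equivalent of the two-step shape (illustrative only; the alias names are made up):

SELECT min(partial_min) AS min_value     -- frontend: combine per-region partials
FROM (
    SELECT a, b, min(greptime_value) AS partial_min   -- datanodes: partial aggregate
    FROM aggr_optimize_not
    GROUP BY a, b
)
GROUP BY a, b;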
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 10: aggregate without sort should be pushed down. This one is handled by step aggregation push down with a complex aggregate expression.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: min(min(aggr_optimize_not.greptime_value)) + max(max(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 + max(max(aggr_optimize_not.greptime_value))@3 as min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 + max(max(aggr_optimize_not.greptime_value))@3 as min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 11: aggregate with subquery
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
+---------------+------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: aggr_optimize_not.a, min(min(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a]], aggr=[[min(min(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | Projection: aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.greptime_value |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[a@0 as a, min(min(aggr_optimize_not.greptime_value))@1 as min(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a], aggr=[min(min(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+------------------------------------------------------------------------------------------------------------------------+
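The subquery plan above shows two effects: the inner ORDER BY a, b is dropped (ordering a derived table does not affect the outer aggregate), and the partial aggregation on a is still pushed into the MergeScan, with the frontend finishing it as min(min(...)) grouped by a. In effect the query is planned like the step-aggregation shape of Case 9; an illustrative rewrite of what remains after the subquery is flattened:

-- what the optimizer effectively plans, per the logical plan above
SELECT a, min(greptime_value)
FROM aggr_optimize_not
GROUP BY a;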
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[a@0 as a, min(min(aggr_optimize_not.greptime_value))@1 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a], aggr=[min(min(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
drop table aggr_optimize_not_count;
Affected Rows: 0
drop table aggr_optimize_not;
Affected Rows: 0

View File

@@ -0,0 +1,307 @@
CREATE TABLE IF NOT EXISTS aggr_optimize_not (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
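The two partition rules above only constrain column a, even though (a, b, c) is declared as the partition key: rows with a < 'b' go to one region and rows with a >= 'b' to the other. A minimal sketch of how sample rows would be routed (not part of the test; the values and region numbering are illustrative):

INSERT INTO aggr_optimize_not (a, b, c, d, greptime_timestamp, greptime_value) VALUES
    ('apple',  'x', 'y', 'z', '2025-07-15 00:00:00', 1.0),  -- a < 'b'  -> first region
    ('banana', 'x', 'y', 'z', '2025-07-15 00:00:00', 2.0);  -- a >= 'b' -> second region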
-- Case 0: group by columns are the same as partition columns.
-- This query shouldn't push down the aggregation even though the group by columns are partition columns,
-- because the sort is already pushed down.
-- Pushing the aggregation down as well would produce a wrong result.
-- Explain at 0s, 5s and 10s. No point at 0s.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
-- Case 1: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
-- Case 2: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
-- Case 3: group by columns are a superset of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- Case 4: group by columns are not a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- Case 5: a simple sum
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain sum(aggr_optimize_not);
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze sum(aggr_optimize_not);
-- TODO(discord9): more cases for aggr push down interacting with partitioning&tql
CREATE TABLE IF NOT EXISTS aggr_optimize_not_count (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
-- Case 6: Test average rate (sum/count like)
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
-- Case 7: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
-- Case 8: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns plus an extra column.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
-- Case 9: aggregate without sort should be pushed down. This one is handled by step aggregation push down.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- Case 10: aggregate without sort should be pushed down. This one is handled by step aggregation push down with a complex aggregate expression.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- Case 11: aggregate with subquery
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
drop table aggr_optimize_not_count;
drop table aggr_optimize_not;

View File

@@ -50,7 +50,10 @@ FROM
+-+-+
| logical_plan_| Projection: sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[sum(count(integers.i))@0 as count(integers.i)]_|
|_|_AggregateExec: mode=Final, gby=[], aggr=[sum(count(integers.i))]_|
|_|_CoalescePartitionsExec_|
@@ -144,7 +147,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts ASC NULLS LAST, count(integers.i) ASC NULLS LAST_|
|_|_Projection: integers.ts, sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[integers.ts]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[integers.ts]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[ts@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[ts@0 as ts, sum(count(integers.i))@1 as count(integers.i)]_|
@@ -253,7 +259,10 @@ ORDER BY
| logical_plan_| Sort: time_window ASC NULLS LAST, count(integers.i) ASC NULLS LAST_|
|_|_Projection: date_bin(Utf8("1 hour"),integers.ts) AS time_window, sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[date_bin(Utf8("1 hour"),integers.ts)]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[date_bin(CAST(Utf8("1 hour") AS Interval(MonthDayNano)), integers.ts)]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [time_window@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[time_window@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[date_bin(Utf8("1 hour"),integers.ts)@0 as time_window, sum(count(integers.i))@1 as count(integers.i)]_|
@@ -369,7 +378,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts + Int64(1) ASC NULLS LAST, integers.i / Int64(2) ASC NULLS LAST_|
|_|_Projection: integers.ts + Int64(1), integers.i / Int64(2), sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[integers.ts + Int64(1), integers.i / Int64(2)]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[CAST(integers.ts AS Int64) + Int64(1), integers.i / Int64(2)]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [integers.ts + Int64(1)@0 ASC NULLS LAST, integers.i / Int64(2)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[integers.ts + Int64(1)@0 ASC NULLS LAST, integers.i / Int64(2)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[integers.ts + Int64(1)@0 as integers.ts + Int64(1), integers.i / Int64(2)@1 as integers.i / Int64(2), sum(count(integers.i))@2 as count(integers.i)]_|
@@ -497,7 +509,10 @@ FROM
+-+-+
| logical_plan_| Projection: uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state))) AS udd_result, hll_count(hll_merge(hll_merge(sink_table.hll_state))) AS hll_result_|
|_|_Aggregate: groupBy=[[]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state)), hll_merge(hll_merge(sink_table.hll_state))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), sink_table.udd_state), hll_merge(sink_table.hll_state)]]_|
|_|_TableScan: sink_table_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state))@0) as udd_result, hll_count(hll_merge(hll_merge(sink_table.hll_state))@1) as hll_result] |
|_|_AggregateExec: mode=Final, gby=[], aggr=[uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state)), hll_merge(hll_merge(sink_table.hll_state))]_|
|_|_CoalescePartitionsExec_|

View File

@@ -247,7 +247,11 @@ GROUP BY
+-+-+
| logical_plan_| Projection: base_table.env, base_table.service_name, base_table.city, base_table.page, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS lcp_state, max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS max_lcp, min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS min_lcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS fmp_state, max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS max_fmp, min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS min_fmp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS fcp_state, max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS max_fcp, min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS min_fcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS fp_state, max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS max_fp, min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS min_fp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS tti_state, max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS max_tti, min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS min_tti, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS fid_state, max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS max_fid, min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS min_fid, max(max(base_table.shard_key)) AS shard_key, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))_|
|_|_Aggregate: groupBy=[[base_table.env, base_table.service_name, base_table.city, base_table.page, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(base_table.shard_key))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[base_table.env, base_table.service_name, base_table.city, base_table.page, arrow_cast(date_bin(CAST(Utf8("60 seconds") AS Interval(MonthDayNano)), base_table.time), Utf8("Timestamp(Second, None)"))]], aggr=[[uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END), max(base_table.shard_key)]]_|
|_|_Filter: (base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) OR base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) OR base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) OR base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) OR base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) OR base_table.fid > Int64(0) AND base_table.fid < Int64(3000000)) AND CAST(base_table.time AS Timestamp(Millisecond, Some("+00:00"))) >= CAST(now() AS Timestamp(Millisecond, Some("+00:00")))_|
|_|_TableScan: base_table_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[env@0 as env, service_name@1 as service_name, city@2 as city, page@3 as page, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@5 as lcp_state, max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@6 as max_lcp, min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@7 as min_lcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@8 as fmp_state, max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@9 as max_fmp, min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@10 as min_fmp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@11 as fcp_state, max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@12 as max_fcp, min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@13 as min_fcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@14 as fp_state, max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@15 as max_fp, min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@16 as min_fp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@17 as tti_state, max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@18 as max_tti, min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@19 as min_tti, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@20 as fid_state, max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@21 as max_fid, min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@22 as min_fid, max(max(base_table.shard_key))@23 as shard_key, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))@4 as arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))] |
|_|_AggregateExec: mode=FinalPartitioned, gby=[env@0 as env, service_name@1 as service_name, city@2 as city, page@3 as page, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))@4 as arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))], aggr=[uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(base_table.shard_key))]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -624,7 +628,11 @@ where
+-+-+
| logical_plan_| Projection: count(*) AS count(*)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(*)) AS count(*)]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(base_table.time) AS count(*)]]_|
|_|_Filter: CAST(base_table.time AS Timestamp(Millisecond, Some("+00:00"))) >= CAST(now() AS Timestamp(Millisecond, Some("+00:00")))_|
|_|_TableScan: base_table_|
|_| ]]_|
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[count(*)]_|
|_|_CoalescePartitionsExec_|
|_|_AggregateExec: mode=Partial, gby=[], aggr=[count(*)]_|

@@ -14,9 +14,14 @@ EXPLAIN SELECT * FROM integers WHERE i IN ((SELECT i FROM integers)) ORDER BY i;
+-+-+
| logical_plan_| Sort: integers.i ASC NULLS LAST_|
|_|_LeftSemi Join: integers.i = __correlated_sq_1.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: __correlated_sq_1_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: integers.i_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[i@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -43,10 +48,14 @@ EXPLAIN SELECT * FROM integers i1 WHERE EXISTS(SELECT i FROM integers WHERE i=i1
| logical_plan_| Sort: i1.i ASC NULLS LAST_|
|_|_LeftSemi Join: i1.i = __correlated_sq_1.i_|
|_|_SubqueryAlias: i1_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: __correlated_sq_1_|
|_|_Projection: integers.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[i@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -85,9 +94,13 @@ order by t.i desc;
|_|_Cross Join:_|
|_|_Filter: integers.i IS NOT NULL_|
|_|_Projection: integers.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: other_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 DESC]_|
|_|_SortExec: expr=[i@0 DESC], preserve_partitioning=[true]_|
|_|_CrossJoinExec_|
@@ -116,9 +129,15 @@ EXPLAIN INSERT INTO other SELECT i, 2 FROM integers WHERE i=(SELECT MAX(i) FROM
| | Projection: integers.i AS i, TimestampMillisecond(2, None) AS j |
| | Inner Join: integers.i = __scalar_sq_1.max(integers.i) |
| | Projection: integers.i |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | TableScan: integers |
| | ]] |
| | SubqueryAlias: __scalar_sq_1 |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: max(integers.i) |
| | Aggregate: groupBy=[[]], aggr=[[max(integers.i)]] |
| | TableScan: integers |
| | ]] |
| physical_plan_error | Error during planning: failed to resolve catalog: datafusion |
+---------------------+-------------------------------------------------------------------+

@@ -252,10 +252,14 @@ EXPLAIN SELECT * FROM (SELECT 0=1 AS cond FROM integers i1, integers i2) a1 WHER
|_|_Cross Join:_|
|_|_SubqueryAlias: i1_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[ |
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: i2_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[ |
|_| TableScan: integers_|
|_| ]]_|
| physical_plan | CoalescePartitionsExec_|
|_|_ProjectionExec: expr=[false as cond]_|
|_|_CrossJoinExec_|

@@ -4,7 +4,10 @@ explain select * from numbers;
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -15,7 +18,11 @@ explain select * from numbers order by number desc;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Sort: numbers.number DESC NULLS FIRST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: expr=[number@0 DESC], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -27,7 +34,11 @@ explain select * from numbers order by number asc;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Sort: numbers.number ASC NULLS LAST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: expr=[number@0 ASC NULLS LAST], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -39,7 +50,12 @@ explain select * from numbers order by number desc limit 10;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Limit: skip=0, fetch=10 |
| | Sort: numbers.number DESC NULLS FIRST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: TopK(fetch=10), expr=[number@0 DESC], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -51,7 +67,12 @@ explain select * from numbers order by number asc limit 10;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Limit: skip=0, fetch=10 |
| | Sort: numbers.number ASC NULLS LAST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: TopK(fetch=10), expr=[number@0 ASC NULLS LAST], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |

@@ -174,3 +174,80 @@ DROP TABLE t;
Affected Rows: 0
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
Affected Rows: 0
INSERT INTO my_table VALUES
(100, 'a', 1),
(200, 'b', 2),
(1100, 'c', 3),
(1200, 'd', 4),
(2000, 'e', 5),
(2100, 'f', 6),
(2200, 'g', 7),
(2400, 'h', 8);
Affected Rows: 8
SELECT * FROM my_table WHERE a > 100 order by a;
+------+---+-------------------------+
| a | b | ts |
+------+---+-------------------------+
| 200 | b | 1970-01-01T00:00:00.002 |
| 1100 | c | 1970-01-01T00:00:00.003 |
| 1200 | d | 1970-01-01T00:00:00.004 |
| 2000 | e | 1970-01-01T00:00:00.005 |
| 2100 | f | 1970-01-01T00:00:00.006 |
| 2200 | g | 1970-01-01T00:00:00.007 |
| 2400 | h | 1970-01-01T00:00:00.008 |
+------+---+-------------------------+
SELECT count(*) FROM my_table WHERE a > 100;
+----------+
| count(*) |
+----------+
| 7 |
+----------+
ALTER TABLE my_table ADD COLUMN c STRING FIRST;
Affected Rows: 0
SELECT * FROM my_table WHERE a > 100 order by a;
+---+------+---+-------------------------+
| c | a | b | ts |
+---+------+---+-------------------------+
| | 200 | b | 1970-01-01T00:00:00.002 |
| | 1100 | c | 1970-01-01T00:00:00.003 |
| | 1200 | d | 1970-01-01T00:00:00.004 |
| | 2000 | e | 1970-01-01T00:00:00.005 |
| | 2100 | f | 1970-01-01T00:00:00.006 |
| | 2200 | g | 1970-01-01T00:00:00.007 |
| | 2400 | h | 1970-01-01T00:00:00.008 |
+---+------+---+-------------------------+
SELECT count(*) FROM my_table WHERE a > 100;
+----------+
| count(*) |
+----------+
| 7 |
+----------+
DROP TABLE my_table;
Affected Rows: 0

@@ -47,3 +47,36 @@ SELECT * FROM t;
ALTER TABLE t ADD COLUMN x int xxx;
DROP TABLE t;
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
INSERT INTO my_table VALUES
(100, 'a', 1),
(200, 'b', 2),
(1100, 'c', 3),
(1200, 'd', 4),
(2000, 'e', 5),
(2100, 'f', 6),
(2200, 'g', 7),
(2400, 'h', 8);
SELECT * FROM my_table WHERE a > 100 order by a;
SELECT count(*) FROM my_table WHERE a > 100;
ALTER TABLE my_table ADD COLUMN c STRING FIRST;
SELECT * FROM my_table WHERE a > 100 order by a;
SELECT count(*) FROM my_table WHERE a > 100;
DROP TABLE my_table;

@@ -31,3 +31,24 @@ DROP TABLE test;
Affected Rows: 0
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
Affected Rows: 0
ALTER TABLE my_table DROP COLUMN a;
Error: 1004(InvalidArguments), Not allowed to remove index column a from table my_table
DROP TABLE my_table;
Affected Rows: 0

@@ -11,3 +11,18 @@ SELECT * FROM test;
ALTER TABLE test DROP COLUMN j;
DROP TABLE test;
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
ALTER TABLE my_table DROP COLUMN a;
DROP TABLE my_table;

@@ -2,19 +2,19 @@
SELECt @@tx_isolation;
+-----------------+
| @@tx_isolation; |
| @@tx_isolation |
+-----------------+
| 0 |
| REPEATABLE-READ |
+-----------------+
-- SQLNESS PROTOCOL MYSQL
SELECT @@version_comment;
+--------------------+
| @@version_comment; |
+--------------------+
| 0 |
+--------------------+
+-------------------+
| @@version_comment |
+-------------------+
| Greptime |
+-------------------+
-- SQLNESS PROTOCOL MYSQL
SHOW DATABASES;

@@ -70,8 +70,14 @@ EXPLAIN SELECT a % 2, b FROM test UNION SELECT a % 2 AS k, b FROM test ORDER BY
| logical_plan | Sort: Int64(-1) ASC NULLS LAST |
| | Aggregate: groupBy=[[test.a % Int64(2), test.b]], aggr=[[]] |
| | Union |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: CAST(test.a AS Int64) % Int64(2) AS test.a % Int64(2), test.b |
| | TableScan: test |
| | ]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: CAST(test.a AS Int64) % Int64(2) AS test.a % Int64(2), test.b |
| | TableScan: test |
| | ]] |
| physical_plan | CoalescePartitionsExec |
| | AggregateExec: mode=SinglePartitioned, gby=[test.a % Int64(2)@0 as test.a % Int64(2), b@1 as b], aggr=[] |
| | InterleaveExec |

@@ -332,3 +332,34 @@ drop table histogram4_bucket;
Affected Rows: 0
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fff) (rate(greptime_servers_postgres_query_elapsed_bucket{instance=~"xxx"}[1m])));
++
++
-- test case where table exists but doesn't have 'le' column should raise error
CREATE TABLE greptime_servers_postgres_query_elapsed_no_le (
pod STRING,
instance STRING,
t TIMESTAMP TIME INDEX,
v DOUBLE,
PRIMARY KEY (pod, instance)
);
Affected Rows: 0
-- should return empty result instead of error when 'le' column is missing
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, le) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
++
++
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fbf) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
++
++
drop table greptime_servers_postgres_query_elapsed_no_le;
Affected Rows: 0

@@ -187,3 +187,20 @@ insert into histogram4_bucket values
tql eval (2900, 3000, '100s') histogram_quantile(0.9, histogram4_bucket);
drop table histogram4_bucket;
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fff) (rate(greptime_servers_postgres_query_elapsed_bucket{instance=~"xxx"}[1m])));
-- test case where table exists but doesn't have 'le' column should raise error
CREATE TABLE greptime_servers_postgres_query_elapsed_no_le (
pod STRING,
instance STRING,
t TIMESTAMP TIME INDEX,
v DOUBLE,
PRIMARY KEY (pod, instance)
);
-- should return empty result instead of error when 'le' column is missing
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, le) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fbf) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
drop table greptime_servers_postgres_query_elapsed_no_le;

@@ -0,0 +1,160 @@
-- Test `timestamp()` function
-- timestamp() returns the timestamp of each sample as seconds since Unix epoch
create table timestamp_test (ts timestamp time index, val double);
Affected Rows: 0
insert into timestamp_test values
(0, 1.0),
(1000, 2.0),
(60000, 3.0),
(3600000, 4.0),
-- 2021-01-01 00:00:00
(1609459200000, 5.0),
-- 2021-01-01 00:01:00
(1609459260000, 6.0);
Affected Rows: 6
-- Test timestamp() with time series
tql eval (0, 3600, '30s') timestamp(timestamp_test);
+---------------------+--------+
| ts | value |
+---------------------+--------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 |
| 1970-01-01T00:01:30 | 60.0 |
| 1970-01-01T00:02:00 | 60.0 |
| 1970-01-01T00:02:30 | 60.0 |
| 1970-01-01T00:03:00 | 60.0 |
| 1970-01-01T00:03:30 | 60.0 |
| 1970-01-01T00:04:00 | 60.0 |
| 1970-01-01T00:04:30 | 60.0 |
| 1970-01-01T00:05:00 | 60.0 |
| 1970-01-01T00:05:30 | 60.0 |
| 1970-01-01T00:06:00 | 60.0 |
| 1970-01-01T01:00:00 | 3600.0 |
+---------------------+--------+
-- Test timestamp() with specific time range
tql eval (0, 60, '30s') timestamp(timestamp_test);
+---------------------+-------+
| ts | value |
+---------------------+-------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+
tql eval (0, 60, '30s') -timestamp(timestamp_test);
+---------------------+-----------+
| ts | (- value) |
+---------------------+-----------+
| 1970-01-01T00:00:00 | -0.0 |
| 1970-01-01T00:00:30 | -1.0 |
| 1970-01-01T00:01:00 | -60.0 |
+---------------------+-----------+
-- Test timestamp() with 2021 data
tql eval (1609459200, 1609459260, '30s') timestamp(timestamp_test);
+---------------------+--------------+
| ts | value |
+---------------------+--------------+
| 2021-01-01T00:00:00 | 1609459200.0 |
| 2021-01-01T00:00:30 | 1609459200.0 |
| 2021-01-01T00:01:00 | 1609459260.0 |
+---------------------+--------------+
-- Test timestamp() with arithmetic operations
tql eval (0, 60, '30s') timestamp(timestamp_test) + 1;
+---------------------+--------------------+
| ts | value + Float64(1) |
+---------------------+--------------------+
| 1970-01-01T00:00:00 | 1.0 |
| 1970-01-01T00:00:30 | 2.0 |
| 1970-01-01T00:01:00 | 61.0 |
+---------------------+--------------------+
-- Test timestamp() with boolean operations
tql eval (0, 60, '30s') timestamp(timestamp_test) > bool 30;
+---------------------+---------------------+
| ts | value > Float64(30) |
+---------------------+---------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 0.0 |
| 1970-01-01T00:01:00 | 1.0 |
+---------------------+---------------------+
-- Test timestamp() with time functions
tql eval (0, 60, '30s') timestamp(timestamp_test) - time();
+---------------------+----------------------------+
| ts | value - ts / Float64(1000) |
+---------------------+----------------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | -29.0 |
| 1970-01-01T00:01:00 | 0.0 |
+---------------------+----------------------------+
-- Test timestamp() with other functions
tql eval (0, 60, '30s') abs(timestamp(timestamp_test) - avg(timestamp(timestamp_test))) > 20;
Error: 1004(InvalidArguments), Invalid function argument for unknown
tql eval (0, 60, '30s') timestamp(timestamp_test) == 60;
+---------------------+-------+
| ts | value |
+---------------------+-------+
| 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+
-- Test timestamp() with multiple metrics
create table timestamp_test2 (ts timestamp time index, val double);
Affected Rows: 0
insert into timestamp_test2 values
(0, 10.0),
(1000, 20.0),
(60000, 30.0);
Affected Rows: 3
-- SQLNESS SORT_RESULT 3 1
tql eval (0, 60, '30s') timestamp(timestamp_test) + timestamp(timestamp_test2);
+---------------------+----------------------------------------------+
| ts | timestamp_test.value + timestamp_test2.value |
+---------------------+----------------------------------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 2.0 |
| 1970-01-01T00:01:00 | 120.0 |
+---------------------+----------------------------------------------+
-- SQLNESS SORT_RESULT 3 1
tql eval (0, 60, '30s') timestamp(timestamp_test) == timestamp(timestamp_test2);
+---------------------+-------+---------------------+-------+
| ts | value | ts | value |
+---------------------+-------+---------------------+-------+
| 1970-01-01T00:00:00 | 0.0 | 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 | 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 | 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+---------------------+-------+
drop table timestamp_test;
Affected Rows: 0
drop table timestamp_test2;
Affected Rows: 0

Some files were not shown because too many files have changed in this diff.