Compare commits

...

26 Commits

Author SHA1 Message Date
yihong
09f3d72d2d fix: close issue #6555 return empty result (#6569)
* fix: close issue #6555 return empty result

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: only start one instance in regex sqlness test (#6570)

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* refactor: refactor partition mod to use PartitionExpr instead of PartitionDef (#6554)

* refactor: refactor partition mod to use PartitionExpr instead of PartitionDef

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix snafu

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Puts expression into PbPartition

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix compile

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* update proto

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* add serde test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* add serde test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: address comments

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-24 15:00:32 +08:00
Yingwen
ca0c1282ed chore: bump version to 0.15.3 (#6580)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-24 11:24:07 +08:00
Yingwen
b719c020ba chore: cherry pick #6540, #6550, #6551, #6556, #6563, #6534 to v0.15 branch (#6577)
* feat: add metrics for request wait time and adjust stall metrics (#6540)

* feat: add metric greptime_mito_request_wait_time to observe wait time

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add worker to wait time metric

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename stall gauge to greptime_mito_write_stalling_count

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: change greptime_mito_write_stall_total to total stalled requests

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: merge lazy static blocks

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: estimate mem size for bulk ingester (#6550)

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: flow mirror cache (#6551)

* fix: invalid cache when flownode change address

Signed-off-by: discord9 <discord9@163.com>

* update comments

Signed-off-by: discord9 <discord9@163.com>

* fix

Signed-off-by: discord9 <discord9@163.com>

* refactor: add log&rename

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: impl timestamp function for promql (#6556)

* feat: impl timestamp function for promql

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: style and typo

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* docs: update comments

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: comment

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: MergeScan print input (#6563)

* feat: MergeScan print input

Signed-off-by: discord9 <discord9@163.com>

* test: fix ut

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: aggr group by all partition cols use partial commutative (#6534)

* fix: aggr group by all partition cols use partial commutative

Signed-off-by: discord9 <discord9@163.com>

* test: bugged case

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness fix

Signed-off-by: discord9 <discord9@163.com>

* test: more redacted

Signed-off-by: discord9 <discord9@163.com>

* more cases

Signed-off-by: discord9 <discord9@163.com>

* even more test cases

Signed-off-by: discord9 <discord9@163.com>

* join testcase

Signed-off-by: discord9 <discord9@163.com>

* fix: column requirement added in correct location

Signed-off-by: discord9 <discord9@163.com>

* fix test

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* track col reqs per stack

Signed-off-by: discord9 <discord9@163.com>

* fix: continue

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* refactor: test mod

Signed-off-by: discord9 <discord9@163.com>

* test utils

Signed-off-by: discord9 <discord9@163.com>

* test: better test

Signed-off-by: discord9 <discord9@163.com>

* more testcases

Signed-off-by: discord9 <discord9@163.com>

* test limit push down

Signed-off-by: discord9 <discord9@163.com>

* more testcases

Signed-off-by: discord9 <discord9@163.com>

* more testcase

Signed-off-by: discord9 <discord9@163.com>

* more test

Signed-off-by: discord9 <discord9@163.com>

* chore: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: update comments

Signed-off-by: discord9 <discord9@163.com>

* fix: check col reqs from bottom to upper

Signed-off-by: discord9 <discord9@163.com>

* chore: more comment

Signed-off-by: discord9 <discord9@163.com>

* docs: more todo

Signed-off-by: discord9 <discord9@163.com>

* chore: comments

Signed-off-by: discord9 <discord9@163.com>

* test: a new failing test that should be fixed

Signed-off-by: discord9 <discord9@163.com>

* fix: part col alias tracking

Signed-off-by: discord9 <discord9@163.com>

* chore: unused

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* docs: comment

Signed-off-by: discord9 <discord9@163.com>

* more testcase

Signed-off-by: discord9 <discord9@163.com>

* more testcase for step/part aggr combine

Signed-off-by: discord9 <discord9@163.com>

* FIXME: a new bug

Signed-off-by: discord9 <discord9@163.com>

* literally unfixable

Signed-off-by: discord9 <discord9@163.com>

* chore: remove some debug print

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: fys <40801205+fengys1996@users.noreply.github.com>
Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2025-07-23 22:29:14 +08:00
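The first cherry-picked change above (#6540) introduces greptime_mito_request_wait_time and renames the stall gauge to greptime_mito_write_stalling_count. A minimal sketch of how such metrics could be declared with the lazy_static and prometheus crates (which the repository uses elsewhere); the worker label and help strings are assumptions, not the actual definitions:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram_vec, register_int_gauge, HistogramVec, IntGauge};

lazy_static! {
    /// Time a write request spends waiting before a worker picks it up.
    /// The `worker` label is an assumption; the commit only says the metric
    /// is labeled per worker.
    pub static ref REQUEST_WAIT_TIME: HistogramVec = register_histogram_vec!(
        "greptime_mito_request_wait_time",
        "mito write request wait time",
        &["worker"]
    )
    .unwrap();

    /// Gauge renamed to `greptime_mito_write_stalling_count` in the commit;
    /// it tracks how many write requests are currently stalled.
    pub static ref WRITE_STALLING_COUNT: IntGauge = register_int_gauge!(
        "greptime_mito_write_stalling_count",
        "stalled write requests in mito"
    )
    .unwrap();
}
```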
Ruihang Xia
717c1d1807 feat: update partial execution metrics (#6499)
* feat: update partial execution metrics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* send data with metrics in distributed mode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* only send partial metrics under VERBOSE flag

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* loop to while

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Zhenchi
291f3c89fe fix: row selection intersection removes trailing rows (#6539)
* fix: row selection intersection removes trailing rows

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix typos

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
discord9
602cc38056 fix: breaking loop when not retryable (#6538)
fix: breaking when not retryable

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Lei, HUANG
46b3593021 fix(grpc): check grpc client unavailable (#6488)
* fix/check-grpc-client-unavailable:
 Improve async handling in `greptime_handler.rs`

 - Updated the `DoPut` response handling to use `await` with `result_sender.send` for better asynchronous operation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/check-grpc-client-unavailable:
 ### Improve Error Handling in `greptime_handler.rs`

 - Enhanced error handling for the `DoPut` operation by switching from `send` to `try_send` for the `result_sender`.
 - Added specific logging for unreachable clients, including `request_id` in the warning message.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
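The #6488 change above switches the DoPut result path from send to try_send and logs unreachable clients with their request_id. A hedged sketch of that pattern on a bounded tokio mpsc channel; the function name and payload type are illustrative, not the handler's real signature, and the real code uses the crate's own logging rather than eprintln!:

```rust
use tokio::sync::mpsc;

/// Illustrative only: the real handler in `greptime_handler.rs` differs in types.
fn reply_do_put(
    result_sender: &mpsc::Sender<Result<u64, String>>,
    request_id: u64,
    result: Result<u64, String>,
) {
    // `try_send` never waits for channel capacity: if the client has gone
    // away or stopped reading, log the unreachable client together with the
    // request id and drop the response instead of blocking the handler.
    if let Err(e) = result_sender.try_send(result) {
        eprintln!(
            "client unreachable, dropping DoPut response, request_id: {request_id}, error: {e}"
        );
    }
}
```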
Yan Tingwang
ff402fd6f6 test: add sqlness test for max execution time (#6517)
* add sqlness test for max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add Pre-line comments SQLNESS PROTOCOL MYSQL

Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix(mysql): support max_execution_time variable

Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix: test::test_check & sqlness test mysql

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add sqlness test for max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>

* add Pre-line comments SQLNESS PROTOCOL MYSQL

Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix(mysql): support max_execution_time variable

Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: codephage. <tingwangyan2020@163.com>

* fix: test::test_check & sqlness test mysql

Signed-off-by: codephage. <tingwangyan2020@163.com>

* chore: Unify the sql style

Signed-off-by: codephage. <tingwangyan2020@163.com>

---------

Signed-off-by: codephage. <tingwangyan2020@163.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
Yan Tingwang
b83e6e2b18 fix: add system variable max_execution_time (#6511)
add system variable: max_execution_time

Signed-off-by: codephage. <tingwangyan2020@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
discord9
cb74337dbe refactor(flow): faster time window expr (#6495)
* refactor: faster window expr

Signed-off-by: discord9 <discord9@163.com>

* docs: explain fast path

Signed-off-by: discord9 <discord9@163.com>

* chore: rm unwrap

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-23 20:54:33 +08:00
shuiyisong
32bffbb668 feat: add filter processor to v0.15 (#6516)
feat: add filter processor

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-07-14 17:43:49 +08:00
evenyag
941906dc74 chore: bump version to v0.15.2
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-11 00:24:21 +08:00
Ruihang Xia
cbf251d0f0 fix: expand on conditional commutative as well (#6484)
* fix: expand on conditional commutative as well

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* add logging to figure test failure

Signed-off-by: discord9 <discord9@163.com>

* revert

Signed-off-by: discord9 <discord9@163.com>

* feat: stream drop record metrics

Signed-off-by: discord9 <discord9@163.com>

* Revert "feat: stream drop record metrics"

This reverts commit 6a16946a5b8ea37557bbb1b600847d24274d6500.

Signed-off-by: discord9 <discord9@163.com>

* feat: stream drop record metrics

Signed-off-by: discord9 <discord9@163.com>

refactor: move logging to drop too

Signed-off-by: discord9 <discord9@163.com>

fix: drop input stream before collect metrics

Signed-off-by: discord9 <discord9@163.com>

* fix: expand differently

Signed-off-by: discord9 <discord9@163.com>

* test: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: more dbg

Signed-off-by: discord9 <discord9@163.com>

* Revert "feat: stream drop record metrics"

This reverts commit 3eda4a2257928d95cf9c1328ae44fae84cfbb017.

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness redacted

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-11 00:24:21 +08:00
shuiyisong
1519379262 chore: skip calc ts in doc 2 with transform (#6509)
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
localhost
4bfe02ec7f chore: remove region id to reduce time series (#6506)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Weny Xu
ecacf1333e fix: correctly update partition key indices during alter table operations (#6494)
* fix: correctly update partition key indices in alter table operations

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add sqlness tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Yingwen
92fa33c250 fix: range query returns range selector error when table not found (#6481)
* test: add sqlness test for range vector with non-existence metric

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle empty metric for matrix selector

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update sqlness result

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add newline

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
shuiyisong
8b2d1a3753 fix: skip nan in prom remote write pipeline (#6489)
Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Ning Sun
13401c94e0 feat: allow alternative version string (#6472)
* feat: allow alternative version string

* refactor: rename original version function to verbose_version

Signed-off-by: Ning Sun <sunning@greptime.com>

---------

Signed-off-by: Ning Sun <sunning@greptime.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
shuiyisong
fd637dae47 chore: sort range query return values (#6474)
* chore: sort range query return values

* chore: add comments

* chore: add is_sorted check

* fix: test

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
dennis zhuang
69fac19770 fix: empty statements hang (#6480)
* fix: empty statements hang

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* tests: add cases

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
discord9
6435b97314 fix: stricter win sort condition (#6477)
test: sqlness

test: fix sqlness redacted

Signed-off-by: discord9 <discord9@163.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
Weny Xu
726e3909fe fix(metric-engine): handle stale metadata region recovery failures (#6395)
* fix(metric-engine): handle stale metadata region recovery failures

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-10 22:40:07 +08:00
evenyag
00d759e828 chore: bump version to v0.15.1
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
Lei, HUANG
0042ea6462 fix: filter empty batch in bulk insert api (#6459)
* fix/filter-empty-batch-in-bulk-insert-api:
 **Add Early Return for Empty Record Batches in `bulk_insert.rs`**

 - Implemented an early return in the `Inserter` implementation to handle cases where `record_batch.num_rows()` is zero, improving efficiency by avoiding unnecessary processing.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 **Improve Bulk Insert Handling**

 - **`handle_bulk_insert.rs`**: Added a check to handle cases where the batch has zero rows, immediately returning and sending a success response with zero rows processed.
 - **`bulk_insert.rs`**: Enhanced logic to skip processing for masks that select none, optimizing the bulk insert operation by avoiding unnecessary iterations.

 These changes improve the efficiency and robustness of the bulk insert process by handling edge cases more effectively.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 ### Refactor and Error Handling Enhancements

 - **Refactored Timestamp Handling**: Introduced `timestamp_array_to_primitive` function in `timestamp.rs` to streamline conversion of timestamp arrays to primitive arrays, reducing redundancy in `handle_bulk_insert.rs` and `bulk_insert.rs`.
 - **Error Handling**: Added `InconsistentTimestampLength` error in `error.rs` to handle mismatched timestamp column lengths in bulk insert operations.
 - **Bulk Insert Logic**: Updated `handle_bulk_insert.rs` to utilize the new timestamp conversion function and added checks for timestamp length consistency.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix/filter-empty-batch-in-bulk-insert-api:
 **Refactor `bulk_insert.rs` to streamline imports**

 - Simplified import statements by removing unused timestamp-related arrays and data types from the `arrow` crate in `bulk_insert.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
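The #6459 commit above adds an early return for empty record batches and a timestamp-length consistency check. A rough sketch of that handler shape, assuming arrow_array::RecordBatch; the function name, parameters, and error type are placeholders rather than the crate's real API:

```rust
use arrow_array::RecordBatch;

/// Illustrative handler shape only; the real logic lives in
/// `handle_bulk_insert.rs` / `bulk_insert.rs` with different types and errors.
fn handle_bulk_insert(batch: &RecordBatch, timestamp_len: usize) -> Result<u64, String> {
    // An empty batch has nothing to write: answer with zero affected rows
    // right away instead of walking the whole insert pipeline.
    if batch.num_rows() == 0 {
        return Ok(0);
    }
    // The commit also checks that the timestamp column length matches the
    // batch row count (the `InconsistentTimestampLength` error case).
    if timestamp_len != batch.num_rows() {
        return Err(format!(
            "inconsistent timestamp length: {timestamp_len} != {}",
            batch.num_rows()
        ));
    }
    // ... convert timestamps via `timestamp_array_to_primitive` and insert ...
    Ok(batch.num_rows() as u64)
}
```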
Zhenchi
d06450715f fix: add backward compatibility for SkippingIndexOptions deserialization (#6458)
* fix: add backward compatibility for `SkippingIndexOptions` deserialization

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-04 22:53:46 +08:00
113 changed files with 6249 additions and 691 deletions

View File

@@ -12,3 +12,6 @@ fetch = true
checkout = true
list_files = true
internal_use_git2 = false
[env]
CARGO_WORKSPACE_DIR = { value = "", relative = true }

Cargo.lock generated
View File

@@ -211,7 +211,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
[[package]]
name = "api"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-decimal",
@@ -944,7 +944,7 @@ dependencies = [
[[package]]
name = "auth"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -1586,7 +1586,7 @@ dependencies = [
[[package]]
name = "cache"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"catalog",
"common-error",
@@ -1602,6 +1602,17 @@ version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "acbc26382d871df4b7442e3df10a9402bf3cf5e55cbd66f12be38861425f0564"
[[package]]
name = "cargo-manifest"
version = "0.19.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1d8af896b707212cd0e99c112a78c9497dd32994192a463ed2f7419d29bd8c6"
dependencies = [
"serde",
"thiserror 2.0.12",
"toml 0.8.19",
]
[[package]]
name = "cast"
version = "0.3.0"
@@ -1610,7 +1621,7 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "catalog"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -1948,7 +1959,7 @@ checksum = "1462739cb27611015575c0c11df5df7601141071f07518d56fcc1be504cbec97"
[[package]]
name = "cli"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -1993,7 +2004,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tempfile",
"tokio",
@@ -2002,7 +2013,7 @@ dependencies = [
[[package]]
name = "client"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -2032,7 +2043,7 @@ dependencies = [
"rand 0.9.0",
"serde_json",
"snafu 0.8.5",
"substrait 0.15.0",
"substrait 0.15.3",
"substrait 0.37.3",
"tokio",
"tokio-stream",
@@ -2073,7 +2084,7 @@ dependencies = [
[[package]]
name = "cmd"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"auth",
@@ -2134,7 +2145,7 @@ dependencies = [
"snafu 0.8.5",
"stat",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"temp-env",
"tempfile",
@@ -2181,7 +2192,7 @@ checksum = "55b672471b4e9f9e95499ea597ff64941a309b2cdbffcc46f2cc5e2d971fd335"
[[package]]
name = "common-base"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anymap2",
"async-trait",
@@ -2203,11 +2214,11 @@ dependencies = [
[[package]]
name = "common-catalog"
version = "0.15.0"
version = "0.15.3"
[[package]]
name = "common-config"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-error",
@@ -2232,7 +2243,7 @@ dependencies = [
[[package]]
name = "common-datasource"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"arrow-schema 54.3.1",
@@ -2269,7 +2280,7 @@ dependencies = [
[[package]]
name = "common-decimal"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"bigdecimal 0.4.8",
"common-error",
@@ -2282,7 +2293,7 @@ dependencies = [
[[package]]
name = "common-error"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-macro",
"http 1.1.0",
@@ -2293,7 +2304,7 @@ dependencies = [
[[package]]
name = "common-frontend"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-error",
@@ -2309,7 +2320,7 @@ dependencies = [
[[package]]
name = "common-function"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -2362,7 +2373,7 @@ dependencies = [
[[package]]
name = "common-greptimedb-telemetry"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-runtime",
@@ -2379,7 +2390,7 @@ dependencies = [
[[package]]
name = "common-grpc"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -2411,7 +2422,7 @@ dependencies = [
[[package]]
name = "common-grpc-expr"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"common-base",
@@ -2430,7 +2441,7 @@ dependencies = [
[[package]]
name = "common-macro"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arc-swap",
"common-query",
@@ -2444,7 +2455,7 @@ dependencies = [
[[package]]
name = "common-mem-prof"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anyhow",
"common-error",
@@ -2460,7 +2471,7 @@ dependencies = [
[[package]]
name = "common-meta"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anymap2",
"api",
@@ -2525,7 +2536,7 @@ dependencies = [
[[package]]
name = "common-options"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-grpc",
"humantime-serde",
@@ -2534,11 +2545,11 @@ dependencies = [
[[package]]
name = "common-plugins"
version = "0.15.0"
version = "0.15.3"
[[package]]
name = "common-pprof"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-error",
"common-macro",
@@ -2550,7 +2561,7 @@ dependencies = [
[[package]]
name = "common-procedure"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -2577,7 +2588,7 @@ dependencies = [
[[package]]
name = "common-procedure-test"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"common-procedure",
@@ -2586,7 +2597,7 @@ dependencies = [
[[package]]
name = "common-query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -2612,7 +2623,7 @@ dependencies = [
[[package]]
name = "common-recordbatch"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arc-swap",
"common-error",
@@ -2632,7 +2643,7 @@ dependencies = [
[[package]]
name = "common-runtime"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -2662,17 +2673,18 @@ dependencies = [
[[package]]
name = "common-session"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"strum 0.27.1",
]
[[package]]
name = "common-telemetry"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"backtrace",
"common-error",
"common-version",
"console-subscriber",
"greptime-proto",
"humantime-serde",
@@ -2696,7 +2708,7 @@ dependencies = [
[[package]]
name = "common-test-util"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"client",
"common-grpc",
@@ -2709,7 +2721,7 @@ dependencies = [
[[package]]
name = "common-time"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"chrono",
@@ -2727,9 +2739,10 @@ dependencies = [
[[package]]
name = "common-version"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"build-data",
"cargo-manifest",
"const_format",
"serde",
"shadow-rs",
@@ -2737,7 +2750,7 @@ dependencies = [
[[package]]
name = "common-wal"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"common-base",
"common-error",
@@ -2760,7 +2773,7 @@ dependencies = [
[[package]]
name = "common-workload"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"common-telemetry",
@@ -3716,7 +3729,7 @@ dependencies = [
[[package]]
name = "datanode"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -3769,7 +3782,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"toml 0.8.19",
@@ -3778,7 +3791,7 @@ dependencies = [
[[package]]
name = "datatypes"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -4438,7 +4451,7 @@ checksum = "e8c02a5121d4ea3eb16a80748c74f5549a5665e4c21333c6098f283870fbdea6"
[[package]]
name = "file-engine"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -4575,7 +4588,7 @@ checksum = "8bf7cc16383c4b8d58b9905a8509f02926ce3058053c056376248d958c9df1e8"
[[package]]
name = "flow"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -4640,7 +4653,7 @@ dependencies = [
"sql",
"store-api",
"strum 0.27.1",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tonic 0.12.3",
@@ -4695,7 +4708,7 @@ checksum = "6c2141d6d6c8512188a7891b4b01590a45f6dac67afb4f255c4124dbb86d4eaa"
[[package]]
name = "frontend"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -4755,7 +4768,7 @@ dependencies = [
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"strfmt",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-util",
@@ -5916,7 +5929,7 @@ dependencies = [
[[package]]
name = "index"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"asynchronous-codec",
@@ -6801,7 +6814,7 @@ checksum = "a7a70ba024b9dc04c27ea2f0c0548feb474ec5c54bba33a7f72f873a39d07b24"
[[package]]
name = "log-query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"chrono",
"common-error",
@@ -6813,7 +6826,7 @@ dependencies = [
[[package]]
name = "log-store"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-stream",
"async-trait",
@@ -7111,7 +7124,7 @@ dependencies = [
[[package]]
name = "meta-client"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -7139,7 +7152,7 @@ dependencies = [
[[package]]
name = "meta-srv"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -7230,7 +7243,7 @@ dependencies = [
[[package]]
name = "metric-engine"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -7320,7 +7333,7 @@ dependencies = [
[[package]]
name = "mito-codec"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"bytes",
@@ -7343,7 +7356,7 @@ dependencies = [
[[package]]
name = "mito2"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -8093,7 +8106,7 @@ dependencies = [
[[package]]
name = "object-store"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"anyhow",
"bytes",
@@ -8407,7 +8420,7 @@ dependencies = [
[[package]]
name = "operator"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -8462,7 +8475,7 @@ dependencies = [
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-util",
@@ -8729,7 +8742,7 @@ dependencies = [
[[package]]
name = "partition"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -9017,7 +9030,7 @@ checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]]
name = "pipeline"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -9160,7 +9173,7 @@ dependencies = [
[[package]]
name = "plugins"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"auth",
"clap 4.5.19",
@@ -9473,7 +9486,7 @@ dependencies = [
[[package]]
name = "promql"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"async-trait",
@@ -9755,7 +9768,7 @@ dependencies = [
[[package]]
name = "puffin"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-compression 0.4.13",
"async-trait",
@@ -9797,7 +9810,7 @@ dependencies = [
[[package]]
name = "query"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -9863,7 +9876,7 @@ dependencies = [
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"statrs",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tokio",
"tokio-stream",
@@ -11149,7 +11162,7 @@ dependencies = [
[[package]]
name = "servers"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -11270,7 +11283,7 @@ dependencies = [
[[package]]
name = "session"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arc-swap",
@@ -11609,7 +11622,7 @@ dependencies = [
[[package]]
name = "sql"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"chrono",
@@ -11664,7 +11677,7 @@ dependencies = [
[[package]]
name = "sqlness-runner"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -11964,7 +11977,7 @@ dependencies = [
[[package]]
name = "stat"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"nix 0.30.1",
]
@@ -11990,7 +12003,7 @@ dependencies = [
[[package]]
name = "store-api"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"aquamarine",
@@ -12151,7 +12164,7 @@ dependencies = [
[[package]]
name = "substrait"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"async-trait",
"bytes",
@@ -12331,7 +12344,7 @@ dependencies = [
[[package]]
name = "table"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"async-trait",
@@ -12592,7 +12605,7 @@ checksum = "3369f5ac52d5eb6ab48c6b4ffdc8efbcad6b89c765749064ba298f2c68a16a76"
[[package]]
name = "tests-fuzz"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"arbitrary",
"async-trait",
@@ -12636,7 +12649,7 @@ dependencies = [
[[package]]
name = "tests-integration"
version = "0.15.0"
version = "0.15.3"
dependencies = [
"api",
"arrow-flight",
@@ -12703,7 +12716,7 @@ dependencies = [
"sql",
"sqlx",
"store-api",
"substrait 0.15.0",
"substrait 0.15.3",
"table",
"tempfile",
"time",
@@ -13073,6 +13086,7 @@ version = "0.8.19"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1ed1f98e3fdc28d6d910e6737ae6ab1a93bf1985935a1193e68f93eeb68d24e"
dependencies = [
"indexmap 2.9.0",
"serde",
"serde_spanned",
"toml_datetime",

View File

@@ -71,7 +71,7 @@ members = [
resolver = "2"
[workspace.package]
version = "0.15.0"
version = "0.15.3"
edition = "2021"
license = "Apache-2.0"

View File

@@ -211,12 +211,18 @@ impl Database {
retries += 1;
warn!("Retrying {} times with error = {:?}", retries, err);
continue;
} else {
error!(
err; "Failed to send request to grpc handle, retries = {}, not retryable error, aborting",
retries
);
return Err(err.into());
}
}
(Err(err), false) => {
error!(
"Failed to send request to grpc handle after {} retries, error = {:?}",
retries, err
err; "Failed to send request to grpc handle after {} retries",
retries,
);
return Err(err.into());
}
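For context, a rough sketch of the retry loop this hunk slots into, assuming a generic is_retryable predicate and a MAX_RETRIES placeholder; neither is the client crate's actual API:

```rust
/// Illustrative retry shape: log and retry on retryable errors, abort with
/// the error otherwise, mirroring the `else` branch added in the hunk above.
async fn send_with_retry<T, E, Fut>(
    mut send: impl FnMut() -> Fut,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E>
where
    Fut: std::future::Future<Output = Result<T, E>>,
    E: std::fmt::Debug,
{
    const MAX_RETRIES: usize = 3;
    let mut retries = 0;
    loop {
        match send().await {
            Ok(v) => return Ok(v),
            // Retryable error with budget left: log and go around again.
            Err(err) if is_retryable(&err) && retries < MAX_RETRIES => {
                retries += 1;
                eprintln!("Retrying {} times with error = {:?}", retries, err);
            }
            // Non-retryable error, or retries exhausted: log and abort.
            Err(err) => {
                eprintln!("Aborting after {} retries, error = {:?}", retries, err);
                return Err(err);
            }
        }
    }
}
```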

View File

@@ -163,19 +163,70 @@ impl RegionRequester {
let _span = tracing_context.attach(common_telemetry::tracing::info_span!(
"poll_flight_data_stream"
));
while let Some(flight_message) = flight_message_stream.next().await {
let flight_message = flight_message
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
let mut buffered_message: Option<FlightMessage> = None;
let mut stream_ended = false;
while !stream_ended {
// get the next message from the buffered message or read from the flight message stream
let flight_message_item = if let Some(msg) = buffered_message.take() {
Some(Ok(msg))
} else {
flight_message_stream.next().await
};
let flight_message = match flight_message_item {
Some(Ok(message)) => message,
Some(Err(e)) => {
yield Err(BoxedError::new(e)).context(ExternalSnafu);
break;
}
None => break,
};
match flight_message {
FlightMessage::RecordBatch(record_batch) => {
yield RecordBatch::try_from_df_record_batch(
let result_to_yield = RecordBatch::try_from_df_record_batch(
schema_cloned.clone(),
record_batch,
)
);
// get the next message from the stream. normally it should be a metrics message.
if let Some(next_flight_message_result) = flight_message_stream.next().await
{
match next_flight_message_result {
Ok(FlightMessage::Metrics(s)) => {
let m = serde_json::from_str(&s).ok().map(Arc::new);
metrics_ref.swap(m);
}
Ok(FlightMessage::RecordBatch(rb)) => {
// for some reason it's not a metrics message, so we need to buffer this record batch
// and yield it in the next iteration.
buffered_message = Some(FlightMessage::RecordBatch(rb));
}
Ok(_) => {
yield IllegalFlightMessagesSnafu {
reason: "A RecordBatch message can only be succeeded by a Metrics message or another RecordBatch message"
}
.fail()
.map_err(BoxedError::new)
.context(ExternalSnafu);
break;
}
Err(e) => {
yield Err(BoxedError::new(e)).context(ExternalSnafu);
break;
}
}
} else {
// the stream has ended
stream_ended = true;
}
yield result_to_yield;
}
FlightMessage::Metrics(s) => {
// just a branch in case of some metrics message comes after other things.
let m = serde_json::from_str(&s).ok().map(Arc::new);
metrics_ref.swap(m);
break;
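The hunk above peeks at the message following each RecordBatch and buffers it when it is not the expected Metrics message. A minimal, self-contained sketch of that buffering pattern over a plain iterator, with made-up Msg variants standing in for the flight messages:

```rust
/// Stand-ins for the flight messages; illustrative only.
enum Msg {
    Batch(u32),
    Metrics(String),
}

fn drain(mut stream: impl Iterator<Item = Msg>) {
    let mut buffered: Option<Msg> = None;
    loop {
        // Take the stashed message first, otherwise pull from the stream.
        let msg = match buffered.take().or_else(|| stream.next()) {
            Some(m) => m,
            None => break,
        };
        match msg {
            Msg::Batch(b) => {
                // Peek at the follower: metrics belong to this batch, while
                // another batch is stashed and handled on the next turn.
                match stream.next() {
                    Some(Msg::Metrics(m)) => println!("batch {b} with metrics {m}"),
                    Some(other) => {
                        buffered = Some(other);
                        println!("batch {b} without metrics");
                    }
                    None => println!("batch {b}, stream ended"),
                }
            }
            Msg::Metrics(m) => println!("trailing metrics {m}"),
        }
    }
}

fn main() {
    let msgs = vec![
        Msg::Batch(1),
        Msg::Metrics("m1".to_string()),
        Msg::Batch(2),
        Msg::Batch(3),
    ];
    drain(msgs.into_iter());
}
```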

View File

@@ -20,11 +20,11 @@ use cmd::error::{InitTlsProviderSnafu, Result};
use cmd::options::GlobalOptions;
use cmd::{cli, datanode, flownode, frontend, metasrv, standalone, App};
use common_base::Plugins;
use common_version::version;
use common_version::{verbose_version, version};
use servers::install_ring_crypto_provider;
#[derive(Parser)]
#[command(name = "greptime", author, version, long_version = version(), about)]
#[command(name = "greptime", author, version, long_version = verbose_version(), about)]
#[command(propagate_version = true)]
pub(crate) struct Command {
#[clap(subcommand)]
@@ -143,10 +143,8 @@ async fn start(cli: Command) -> Result<()> {
}
fn setup_human_panic() {
human_panic::setup_panic!(
human_panic::Metadata::new("GreptimeDB", env!("CARGO_PKG_VERSION"))
.homepage("https://github.com/GreptimeTeam/greptimedb/discussions")
);
human_panic::setup_panic!(human_panic::Metadata::new("GreptimeDB", version())
.homepage("https://github.com/GreptimeTeam/greptimedb/discussions"));
common_telemetry::set_panic_hook();
}

View File

@@ -19,7 +19,7 @@ use catalog::kvbackend::MetaKvBackend;
use common_base::Plugins;
use common_meta::cache::LayeredCacheRegistryBuilder;
use common_telemetry::info;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use datanode::datanode::DatanodeBuilder;
use datanode::service::DatanodeServiceBuilder;
use meta_client::MetaClientType;
@@ -67,7 +67,7 @@ impl InstanceBuilder {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
plugins::setup_datanode_plugins(plugins, &opts.plugins, dn_opts)

View File

@@ -32,7 +32,7 @@ use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::TableMetadataManager;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use flow::{
get_flow_auth_options, FlownodeBuilder, FlownodeInstance, FlownodeServiceBuilder,
FrontendClient, FrontendInvoker,
@@ -279,7 +279,7 @@ impl StartCommand {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Flownode start command: {:#?}", self);

View File

@@ -33,7 +33,7 @@ use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_time::timezone::set_default_timezone;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use frontend::frontend::Frontend;
use frontend::heartbeat::HeartbeatTask;
use frontend::instance::builder::FrontendBuilder;
@@ -282,7 +282,7 @@ impl StartCommand {
opts.component.slow_query.as_ref(),
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Frontend start command: {:#?}", self);

View File

@@ -112,7 +112,7 @@ pub trait App: Send {
pub fn log_versions(version: &str, short_version: &str, app: &str) {
// Report app version as gauge.
APP_VERSION
.with_label_values(&[env!("CARGO_PKG_VERSION"), short_version, app])
.with_label_values(&[common_version::version(), short_version, app])
.inc();
// Log version and argument flags.

View File

@@ -22,7 +22,7 @@ use common_base::Plugins;
use common_config::Configurable;
use common_telemetry::info;
use common_telemetry::logging::{TracingOptions, DEFAULT_LOGGING_DIR};
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use meta_srv::bootstrap::MetasrvInstance;
use meta_srv::metasrv::BackendImpl;
use snafu::ResultExt;
@@ -320,7 +320,7 @@ impl StartCommand {
None,
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Metasrv start command: {:#?}", self);

View File

@@ -51,7 +51,7 @@ use common_telemetry::logging::{
LoggingOptions, SlowQueryOptions, TracingOptions, DEFAULT_LOGGING_DIR,
};
use common_time::timezone::set_default_timezone;
use common_version::{short_version, version};
use common_version::{short_version, verbose_version};
use common_wal::config::DatanodeWalConfig;
use datanode::config::{DatanodeOptions, ProcedureConfig, RegionEngineConfig, StorageConfig};
use datanode::datanode::{Datanode, DatanodeBuilder};
@@ -466,7 +466,7 @@ impl StartCommand {
opts.component.slow_query.as_ref(),
);
log_versions(version(), short_version(), APP_NAME);
log_versions(verbose_version(), short_version(), APP_NAME);
create_resource_limit_metrics(APP_NAME);
info!("Standalone start command: {:#?}", self);

View File

@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::sync::Arc;
use std::{env, fmt};
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
@@ -47,7 +47,7 @@ impl Function for PGVersionFunction {
fn eval(&self, _func_ctx: &FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> {
let result = StringVector::from(vec![format!(
"PostgreSQL 16.3 GreptimeDB {}",
env!("CARGO_PKG_VERSION")
common_version::version()
)]);
Ok(Arc::new(result))
}

View File

@@ -12,8 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::sync::Arc;
use std::{env, fmt};
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
@@ -52,13 +52,13 @@ impl Function for VersionFunction {
"{}-greptimedb-{}",
std::env::var("GREPTIMEDB_MYSQL_SERVER_VERSION")
.unwrap_or_else(|_| "8.4.2".to_string()),
env!("CARGO_PKG_VERSION")
common_version::version()
)
}
Channel::Postgres => {
format!("16.3-greptimedb-{}", env!("CARGO_PKG_VERSION"))
format!("16.3-greptimedb-{}", common_version::version())
}
_ => env!("CARGO_PKG_VERSION").to_string(),
_ => common_version::version().to_string(),
};
let result = StringVector::from(vec![version]);
Ok(Arc::new(result))

View File

@@ -15,6 +15,7 @@
use std::collections::HashMap;
use std::sync::Arc;
use common_telemetry::info;
use futures::future::BoxFuture;
use moka::future::Cache;
use moka::ops::compute::Op;
@@ -89,6 +90,12 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
// we have a corresponding cache invalidation mechanism to invalidate `(Key, EmptyHashSet)`.
.map(Arc::new)
.map(Some)
.inspect(|set| {
info!(
"Initialized table_flownode cache for table_id: {}, set: {:?}",
table_id, set
);
})
})
})
}
@@ -167,6 +174,13 @@ fn invalidator<'a>(
match ident {
CacheIdent::CreateFlow(create_flow) => handle_create_flow(cache, create_flow).await,
CacheIdent::DropFlow(drop_flow) => handle_drop_flow(cache, drop_flow).await,
CacheIdent::FlowNodeAddressChange(node_id) => {
info!(
"Invalidate flow node cache for node_id in table_flownode: {}",
node_id
);
cache.invalidate_all();
}
_ => {}
}
Ok(())
@@ -174,7 +188,10 @@ fn invalidator<'a>(
}
fn filter(ident: &CacheIdent) -> bool {
matches!(ident, CacheIdent::CreateFlow(_) | CacheIdent::DropFlow(_))
matches!(
ident,
CacheIdent::CreateFlow(_) | CacheIdent::DropFlow(_) | CacheIdent::FlowNodeAddressChange(_)
)
}
#[cfg(test)]

View File

@@ -22,6 +22,7 @@ use crate::key::flow::flow_name::FlowNameKey;
use crate::key::flow::flow_route::FlowRouteKey;
use crate::key::flow::flownode_flow::FlownodeFlowKey;
use crate::key::flow::table_flow::TableFlowKey;
use crate::key::node_address::NodeAddressKey;
use crate::key::schema_name::SchemaNameKey;
use crate::key::table_info::TableInfoKey;
use crate::key::table_name::TableNameKey;
@@ -53,6 +54,10 @@ pub struct Context {
#[async_trait::async_trait]
pub trait CacheInvalidator: Send + Sync {
async fn invalidate(&self, ctx: &Context, caches: &[CacheIdent]) -> Result<()>;
fn name(&self) -> &'static str {
std::any::type_name::<Self>()
}
}
pub type CacheInvalidatorRef = Arc<dyn CacheInvalidator>;
@@ -137,6 +142,13 @@ where
let key = FlowInfoKey::new(*flow_id);
self.invalidate_key(&key.to_bytes()).await;
}
CacheIdent::FlowNodeAddressChange(node_id) => {
// other caches doesn't need to be invalidated
// since this is only for flownode address change not id change
common_telemetry::info!("Invalidate flow node cache for node_id: {}", node_id);
let key = NodeAddressKey::with_flownode(*node_id);
self.invalidate_key(&key.to_bytes()).await;
}
}
}
Ok(())

View File

@@ -174,6 +174,8 @@ pub struct UpgradeRegion {
/// The identifier of cache.
pub enum CacheIdent {
FlowId(FlowId),
/// Indicate change of address of flownode.
FlowNodeAddressChange(u64),
FlowName(FlowName),
TableId(TableId),
TableName(TableName),

View File

@@ -222,6 +222,7 @@ pub struct RecordBatchStreamAdapter {
enum Metrics {
Unavailable,
Unresolved(Arc<dyn ExecutionPlan>),
PartialResolved(Arc<dyn ExecutionPlan>, RecordBatchMetrics),
Resolved(RecordBatchMetrics),
}
@@ -275,7 +276,9 @@ impl RecordBatchStream for RecordBatchStreamAdapter {
fn metrics(&self) -> Option<RecordBatchMetrics> {
match &self.metrics_2 {
Metrics::Resolved(metrics) => Some(metrics.clone()),
Metrics::Resolved(metrics) | Metrics::PartialResolved(_, metrics) => {
Some(metrics.clone())
}
Metrics::Unavailable | Metrics::Unresolved(_) => None,
}
}
@@ -299,13 +302,25 @@ impl Stream for RecordBatchStreamAdapter {
Poll::Pending => Poll::Pending,
Poll::Ready(Some(df_record_batch)) => {
let df_record_batch = df_record_batch?;
if let Metrics::Unresolved(df_plan) | Metrics::PartialResolved(df_plan, _) =
&self.metrics_2
{
let mut metric_collector = MetricCollector::new(self.explain_verbose);
accept(df_plan.as_ref(), &mut metric_collector).unwrap();
self.metrics_2 = Metrics::PartialResolved(
df_plan.clone(),
metric_collector.record_batch_metrics,
);
}
Poll::Ready(Some(RecordBatch::try_from_df_record_batch(
self.schema(),
df_record_batch,
)))
}
Poll::Ready(None) => {
if let Metrics::Unresolved(df_plan) = &self.metrics_2 {
if let Metrics::Unresolved(df_plan) | Metrics::PartialResolved(df_plan, _) =
&self.metrics_2
{
let mut metric_collector = MetricCollector::new(self.explain_verbose);
accept(df_plan.as_ref(), &mut metric_collector).unwrap();
self.metrics_2 = Metrics::Resolved(metric_collector.record_batch_metrics);

View File

@@ -14,6 +14,7 @@ workspace = true
[dependencies]
backtrace = "0.3"
common-error.workspace = true
common-version.workspace = true
console-subscriber = { version = "0.1", optional = true }
greptime-proto.workspace = true
humantime-serde.workspace = true

View File

@@ -384,7 +384,7 @@ pub fn init_global_logging(
resource::SERVICE_INSTANCE_ID,
node_id.unwrap_or("none".to_string()),
),
KeyValue::new(resource::SERVICE_VERSION, env!("CARGO_PKG_VERSION")),
KeyValue::new(resource::SERVICE_VERSION, common_version::version()),
KeyValue::new(resource::PROCESS_PID, std::process::id().to_string()),
]));

View File

@@ -17,4 +17,5 @@ shadow-rs.workspace = true
[build-dependencies]
build-data = "0.2"
cargo-manifest = "0.19"
shadow-rs.workspace = true

View File

@@ -14,8 +14,10 @@
use std::collections::BTreeSet;
use std::env;
use std::path::PathBuf;
use build_data::{format_timestamp, get_source_time};
use cargo_manifest::Manifest;
use shadow_rs::{BuildPattern, ShadowBuilder, CARGO_METADATA, CARGO_TREE};
fn main() -> shadow_rs::SdResult<()> {
@@ -33,6 +35,24 @@ fn main() -> shadow_rs::SdResult<()> {
// solve the problem where the "CARGO_MANIFEST_DIR" is not what we want when this repo is
// made as a submodule in another repo.
let src_path = env::var("CARGO_WORKSPACE_DIR").or_else(|_| env::var("CARGO_MANIFEST_DIR"))?;
let manifest = Manifest::from_path(PathBuf::from(&src_path).join("Cargo.toml"))
.expect("Failed to parse Cargo.toml");
if let Some(product_version) = manifest.workspace.as_ref().and_then(|w| {
w.metadata.as_ref().and_then(|m| {
m.get("greptime")
.and_then(|g| g.get("product_version").and_then(|v| v.as_str()))
})
}) {
println!(
"cargo:rustc-env=GREPTIME_PRODUCT_VERSION={}",
product_version
);
} else {
let version = env::var("CARGO_PKG_VERSION").unwrap();
println!("cargo:rustc-env=GREPTIME_PRODUCT_VERSION={}", version,);
}
let out_path = env::var("OUT_DIR")?;
let _ = ShadowBuilder::builder()

View File

@@ -105,13 +105,17 @@ pub const fn build_info() -> BuildInfo {
build_time: env!("BUILD_TIMESTAMP"),
rustc: build::RUST_VERSION,
target: build::BUILD_TARGET,
version: build::PKG_VERSION,
version: env!("GREPTIME_PRODUCT_VERSION"),
}
}
const BUILD_INFO: BuildInfo = build_info();
pub const fn version() -> &'static str {
BUILD_INFO.version
}
pub const fn verbose_version() -> &'static str {
const_format::formatcp!(
"\nbranch: {}\ncommit: {}\nclean: {}\nversion: {}",
BUILD_INFO.branch,

View File

@@ -27,14 +27,14 @@ lazy_static! {
pub static ref HANDLE_REGION_REQUEST_ELAPSED: HistogramVec = register_histogram_vec!(
"greptime_datanode_handle_region_request_elapsed",
"datanode handle region request elapsed",
&[REGION_ID, REGION_REQUEST_TYPE]
&[REGION_REQUEST_TYPE]
)
.unwrap();
/// The number of rows in region request received by region server, labeled with request type.
pub static ref REGION_CHANGED_ROW_COUNT: IntCounterVec = register_int_counter_vec!(
"greptime_datanode_region_changed_row_count",
"datanode region changed row count",
&[REGION_ID, REGION_REQUEST_TYPE]
&[REGION_REQUEST_TYPE]
)
.unwrap();
/// The elapsed time since the last received heartbeat.

View File

@@ -51,7 +51,7 @@ use servers::error::{self as servers_error, ExecuteGrpcRequestSnafu, Result as S
use servers::grpc::flight::{FlightCraft, FlightRecordBatchStream, TonicStream};
use servers::grpc::region_server::RegionServerHandler;
use servers::grpc::FlightCompression;
use session::context::{QueryContextBuilder, QueryContextRef};
use session::context::{QueryContext, QueryContextBuilder, QueryContextRef};
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metric_engine_consts::{
FILE_ENGINE_NAME, LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME,
@@ -194,6 +194,7 @@ impl RegionServer {
pub async fn handle_remote_read(
&self,
request: api::v1::region::QueryRequest,
query_ctx: QueryContextRef,
) -> Result<SendableRecordBatchStream> {
let _permit = if let Some(p) = &self.inner.parallelism {
Some(p.acquire().await?)
@@ -201,12 +202,6 @@ impl RegionServer {
None
};
let query_ctx: QueryContextRef = request
.header
.as_ref()
.map(|h| Arc::new(h.into()))
.unwrap_or_else(|| Arc::new(QueryContextBuilder::default().build()));
let region_id = RegionId::from_u64(request.region_id);
let provider = self.table_provider(region_id, Some(&query_ctx)).await?;
let catalog_list = Arc::new(DummyCatalogList::with_table_provider(provider));
@@ -214,7 +209,7 @@ impl RegionServer {
let decoder = self
.inner
.query_engine
.engine_context(query_ctx)
.engine_context(query_ctx.clone())
.new_plan_decoder()
.context(NewPlanDecoderSnafu)?;
@@ -224,11 +219,14 @@ impl RegionServer {
.context(DecodeLogicalPlanSnafu)?;
self.inner
.handle_read(QueryRequest {
header: request.header,
region_id,
plan,
})
.handle_read(
QueryRequest {
header: request.header,
region_id,
plan,
},
query_ctx,
)
.await
}
@@ -243,6 +241,7 @@ impl RegionServer {
let ctx: Option<session::context::QueryContext> = request.header.as_ref().map(|h| h.into());
let provider = self.table_provider(request.region_id, ctx.as_ref()).await?;
let query_ctx = Arc::new(ctx.unwrap_or_else(|| QueryContextBuilder::default().build()));
struct RegionDataSourceInjector {
source: Arc<dyn TableSource>,
@@ -271,7 +270,7 @@ impl RegionServer {
.data;
self.inner
.handle_read(QueryRequest { plan, ..request })
.handle_read(QueryRequest { plan, ..request }, query_ctx)
.await
}
@@ -536,9 +535,14 @@ impl FlightCraft for RegionServer {
.as_ref()
.map(|h| TracingContext::from_w3c(&h.tracing_context))
.unwrap_or_default();
let query_ctx = request
.header
.as_ref()
.map(|h| Arc::new(QueryContext::from(h)))
.unwrap_or(QueryContext::arc());
let result = self
.handle_remote_read(request)
.handle_remote_read(request, query_ctx.clone())
.trace(tracing_context.attach(info_span!("RegionServer::handle_read")))
.await?;
@@ -546,6 +550,7 @@ impl FlightCraft for RegionServer {
result,
tracing_context,
self.flight_compression,
query_ctx,
));
Ok(Response::new(stream))
}
@@ -915,9 +920,8 @@ impl RegionServerInner {
request: RegionRequest,
) -> Result<RegionResponse> {
let request_type = request.request_type();
let region_id_str = region_id.to_string();
let _timer = crate::metrics::HANDLE_REGION_REQUEST_ELAPSED
.with_label_values(&[&region_id_str, request_type])
.with_label_values(&[request_type])
.start_timer();
let region_change = match &request {
@@ -957,7 +961,7 @@ impl RegionServerInner {
// Update metrics
if matches!(region_change, RegionChange::Ingest) {
crate::metrics::REGION_CHANGED_ROW_COUNT
.with_label_values(&[&region_id_str, request_type])
.with_label_values(&[request_type])
.inc_by(result.affected_rows as u64);
}
// Sets corresponding region status to ready.
@@ -1124,16 +1128,13 @@ impl RegionServerInner {
Ok(())
}
pub async fn handle_read(&self, request: QueryRequest) -> Result<SendableRecordBatchStream> {
pub async fn handle_read(
&self,
request: QueryRequest,
query_ctx: QueryContextRef,
) -> Result<SendableRecordBatchStream> {
// TODO(ruihang): add metrics and set trace id
// Build query context from gRPC header
let query_ctx: QueryContextRef = request
.header
.as_ref()
.map(|h| Arc::new(h.into()))
.unwrap_or_else(|| QueryContextBuilder::default().build().into());
let result = self
.query_engine
.execute(request.plan, query_ctx)

View File

@@ -527,7 +527,7 @@ pub struct FulltextOptions {
#[serde(default = "fulltext_options_default_granularity")]
pub granularity: u32,
/// The false positive rate of the fulltext index (for bloom backend only)
#[serde(default = "fulltext_options_default_false_positive_rate_in_10000")]
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
}
@@ -535,7 +535,7 @@ fn fulltext_options_default_granularity() -> u32 {
DEFAULT_GRANULARITY
}
fn fulltext_options_default_false_positive_rate_in_10000() -> u32 {
fn index_options_default_false_positive_rate_in_10000() -> u32 {
(DEFAULT_FALSE_POSITIVE_RATE * 10000.0) as u32
}
@@ -773,6 +773,7 @@ pub struct SkippingIndexOptions {
/// The granularity of the skip index.
pub granularity: u32,
/// The false positive rate of the skip index (in ten-thousandths, e.g., 100 = 1%).
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
/// The type of the skip index.
#[serde(default)]
@@ -1179,4 +1180,59 @@ mod tests {
assert!(column_schema.default_constraint.is_none());
assert!(column_schema.metadata.is_empty());
}
#[test]
fn test_skipping_index_options_deserialization() {
let original_options = "{\"granularity\":1024,\"false-positive-rate-in-10000\":10,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(original_options).unwrap();
assert_eq!(1024, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_skipping_index_options_deserialization_v0_14_to_v0_15() {
let options = "{\"granularity\":10240,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(options).unwrap();
assert_eq!(10240, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"granularity\":10240,\"false-positive-rate-in-10000\":100,\"index-type\":\"BloomFilter\"}");
}
#[test]
fn test_fulltext_options_deserialization() {
let original_options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":1024,\"false-positive-rate-in-10000\":10}";
let options = serde_json::from_str::<FulltextOptions>(original_options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(1024, options.granularity);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_fulltext_options_deserialization_v0_14_to_v0_15() {
// 0.14 to 0.15
let options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\"}";
let options = serde_json::from_str::<FulltextOptions>(options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(DEFAULT_GRANULARITY, options.granularity);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":10240,\"false-positive-rate-in-10000\":100}");
}
}
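The tests above rely on the usual serde backward-compatibility trick: the field added in 0.15 carries a #[serde(default = ...)] so options serialized by 0.14 still deserialize. A minimal standalone illustration; the struct and default value here are simplified stand-ins, not the real SkippingIndexOptions:

```rust
use serde::{Deserialize, Serialize};

fn default_false_positive_rate_in_10000() -> u32 {
    100 // i.e. 1%
}

/// Simplified stand-in for the options struct; not the real definition.
#[derive(Serialize, Deserialize, Debug, PartialEq)]
struct Options {
    granularity: u32,
    // Field introduced in the newer version: old JSON without it still
    // deserializes because serde fills in the default.
    #[serde(
        default = "default_false_positive_rate_in_10000",
        rename = "false-positive-rate-in-10000"
    )]
    false_positive_rate_in_10000: u32,
}

fn main() {
    // Payload written by the older version, missing the new field.
    let old = r#"{"granularity":10240}"#;
    let opts: Options = serde_json::from_str(old).unwrap();
    assert_eq!(opts.false_positive_rate_in_10000, 100);
}
```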

View File

@@ -12,6 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use arrow_array::{
ArrayRef, PrimitiveArray, TimestampMicrosecondArray, TimestampMillisecondArray,
TimestampNanosecondArray, TimestampSecondArray,
};
use arrow_schema::DataType;
use common_time::timestamp::TimeUnit;
use common_time::Timestamp;
use paste::paste;
@@ -138,6 +143,41 @@ define_timestamp_with_unit!(Millisecond);
define_timestamp_with_unit!(Microsecond);
define_timestamp_with_unit!(Nanosecond);
pub fn timestamp_array_to_primitive(
ts_array: &ArrayRef,
) -> Option<(
PrimitiveArray<arrow_array::types::Int64Type>,
arrow::datatypes::TimeUnit,
)> {
let DataType::Timestamp(unit, _) = ts_array.data_type() else {
return None;
};
let ts_primitive = match unit {
arrow_schema::TimeUnit::Second => ts_array
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Millisecond => ts_array
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Microsecond => ts_array
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Nanosecond => ts_array
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
};
Some((ts_primitive, *unit))
}
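
A hypothetical caller of the helper above, assuming the function is in scope and the `arrow_array`, `arrow_schema`, and `arrow` crates are available; the sample values are made up:

```rust
use std::sync::Arc;

use arrow_array::{ArrayRef, TimestampMillisecondArray};

fn main() {
    let ts: ArrayRef = Arc::new(TimestampMillisecondArray::from(vec![1_000_i64, 2_000, 3_000]));
    // The helper reinterprets the timestamp values as plain i64 without copying
    // and also reports which TimeUnit they carry.
    let (primitive, unit) = timestamp_array_to_primitive(&ts).expect("a timestamp array");
    assert_eq!(unit, arrow_schema::TimeUnit::Millisecond);
    assert_eq!(arrow::compute::min(&primitive), Some(1_000));
}
```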
#[cfg(test)]
mod tests {
use common_time::timezone::set_default_timezone;

View File

@@ -14,7 +14,7 @@
//! Batching mode engine
use std::collections::{BTreeMap, HashMap};
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::Arc;
use api::v1::flow::{DirtyWindowRequests, FlowResponse};
@@ -142,7 +142,7 @@ impl BatchingEngine {
let handle: JoinHandle<Result<(), Error>> = tokio::spawn(async move {
let src_table_names = &task.config.source_table_names;
let mut all_dirty_windows = vec![];
let mut all_dirty_windows = HashSet::new();
for src_table_name in src_table_names {
if let Some((timestamps, unit)) = group_by_table_name.get(src_table_name) {
let Some(expr) = &task.config.time_window_expr else {
@@ -155,7 +155,7 @@ impl BatchingEngine {
.context(UnexpectedSnafu {
reason: "Failed to eval start value",
})?;
all_dirty_windows.push(align_start);
all_dirty_windows.insert(align_start);
}
}
}

View File

@@ -50,7 +50,8 @@ use snafu::{ensure, OptionExt, ResultExt};
use crate::adapter::util::from_proto_to_data_type;
use crate::error::{
ArrowSnafu, DatafusionSnafu, DatatypesSnafu, ExternalSnafu, PlanSnafu, UnexpectedSnafu,
ArrowSnafu, DatafusionSnafu, DatatypesSnafu, ExternalSnafu, PlanSnafu, TimeSnafu,
UnexpectedSnafu,
};
use crate::expr::error::DataTypeSnafu;
use crate::Error;
@@ -74,6 +75,7 @@ pub struct TimeWindowExpr {
logical_expr: Expr,
df_schema: DFSchema,
eval_time_window_size: Option<std::time::Duration>,
eval_time_original: Option<Timestamp>,
}
impl std::fmt::Display for TimeWindowExpr {
@@ -106,10 +108,11 @@ impl TimeWindowExpr {
logical_expr: expr.clone(),
df_schema: df_schema.clone(),
eval_time_window_size: None,
eval_time_original: None,
};
let test_ts = DEFAULT_TEST_TIMESTAMP;
let (l, u) = zelf.eval(test_ts)?;
let time_window_size = match (l, u) {
let (lower, upper) = zelf.eval(test_ts)?;
let time_window_size = match (lower, upper) {
(Some(l), Some(u)) => u.sub(&l).map(|r| r.to_std()).transpose().map_err(|_| {
UnexpectedSnafu {
reason: format!(
@@ -121,13 +124,59 @@ impl TimeWindowExpr {
_ => None,
};
zelf.eval_time_window_size = time_window_size;
zelf.eval_time_original = lower;
Ok(zelf)
}
/// TODO(discord9): add `eval_batch` too
pub fn eval(
&self,
current: Timestamp,
) -> Result<(Option<Timestamp>, Option<Timestamp>), Error> {
fn compute_distance(time_diff_ns: i64, stride_ns: i64) -> i64 {
if stride_ns == 0 {
return time_diff_ns;
}
// a - (a % n) rounds toward zero to a multiple of the stride (the branch below turns it into a floor)
let time_delta = time_diff_ns - (time_diff_ns % stride_ns);
if time_diff_ns < 0 && time_delta != time_diff_ns {
// The origin is later than the source timestamp, round down to the previous bin
time_delta - stride_ns
} else {
time_delta
}
}
// FAST PATH: if we have eval_time_original and eval_time_window_size,
// we can compute the bounds directly
if let (Some(original), Some(window_size)) =
(self.eval_time_original, self.eval_time_window_size)
{
// date_bin align current to lower bound
let time_diff_ns = current.sub(&original).and_then(|s| s.num_nanoseconds()).with_context(|| UnexpectedSnafu {
reason: format!(
"Failed to compute time difference between current {current:?} and original {original:?}"
),
})?;
let window_size_ns = window_size.as_nanos() as i64;
let distance_ns = compute_distance(time_diff_ns, window_size_ns);
let lower_bound = if distance_ns >= 0 {
original.add_duration(std::time::Duration::from_nanos(distance_ns as u64))
} else {
original.sub_duration(std::time::Duration::from_nanos((-distance_ns) as u64))
}
.context(TimeSnafu)?;
let upper_bound = lower_bound.add_duration(window_size).context(TimeSnafu)?;
return Ok((Some(lower_bound), Some(upper_bound)));
}
let lower_bound =
calc_expr_time_window_lower_bound(&self.phy_expr, &self.df_schema, current)?;
let upper_bound =

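The fast path above is essentially a `date_bin`-style alignment of `current` against the remembered origin. A minimal standalone sketch of the arithmetic, assuming both timestamps and the window size are plain `i64` nanosecond values (the names here are illustrative, not part of the patch):

```rust
// Align `current` to the window that contains it, given an origin and a window size.
fn compute_distance(time_diff_ns: i64, stride_ns: i64) -> i64 {
    if stride_ns == 0 {
        return time_diff_ns;
    }
    // Truncate toward zero to a multiple of the stride...
    let time_delta = time_diff_ns - (time_diff_ns % stride_ns);
    if time_diff_ns < 0 && time_delta != time_diff_ns {
        // ...and step back one stride for negative diffs, so the result is always a floor.
        time_delta - stride_ns
    } else {
        time_delta
    }
}

fn window_bounds(current_ns: i64, origin_ns: i64, window_ns: i64) -> (i64, i64) {
    let lower = origin_ns + compute_distance(current_ns - origin_ns, window_ns);
    (lower, lower + window_ns)
}

fn main() {
    // With a 10s window anchored at origin 0, t = 12s falls into [10s, 20s).
    assert_eq!(
        window_bounds(12_000_000_000, 0, 10_000_000_000),
        (10_000_000_000, 20_000_000_000)
    );
    // Timestamps before the origin floor to the previous bin: t = -3s -> [-10s, 0s).
    assert_eq!(
        window_bounds(-3_000_000_000, 0, 10_000_000_000),
        (-10_000_000_000, 0)
    );
}
```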
View File

@@ -380,6 +380,13 @@ impl SqlQueryHandler for Instance {
.and_then(|stmts| query_interceptor.post_parsing(stmts, query_ctx.clone()))
{
Ok(stmts) => {
if stmts.is_empty() {
return vec![InvalidSqlSnafu {
err_msg: "empty statements",
}
.fail()];
}
let mut results = Vec::with_capacity(stmts.len());
for stmt in stmts {
if let Err(e) = checker

View File

@@ -13,6 +13,7 @@
// limitations under the License.
use api::v1::meta::{HeartbeatRequest, Peer, Role};
use common_meta::instruction::CacheIdent;
use common_meta::key::node_address::{NodeAddressKey, NodeAddressValue};
use common_meta::key::{MetadataKey, MetadataValue};
use common_meta::rpc::store::PutRequest;
@@ -80,7 +81,19 @@ async fn rewrite_node_address(ctx: &mut Context, peer: &Peer) {
match ctx.leader_cached_kv_backend.put(put).await {
Ok(_) => {
info!("Successfully updated flow `NodeAddressValue`: {:?}", peer);
// TODO(discord): broadcast invalidating cache to all frontends
// broadcast invalidating cache to all frontends
let cache_idents = vec![CacheIdent::FlowNodeAddressChange(peer.id)];
info!(
"Invalidate flow node cache for new address with cache idents: {:?}",
cache_idents
);
if let Err(e) = ctx
.cache_invalidator
.invalidate(&Default::default(), &cache_idents)
.await
{
error!(e; "Failed to invalidate {} `NodeAddressKey` cache, peer: {:?}", cache_idents.len(), peer);
}
}
Err(e) => {
error!(e; "Failed to update flow `NodeAddressValue`: {:?}", peer);

View File

@@ -473,8 +473,9 @@ struct MetricEngineInner {
mod test {
use std::collections::HashMap;
use common_telemetry::info;
use store_api::metric_engine_consts::PHYSICAL_TABLE_METADATA_KEY;
use store_api::region_request::{RegionCloseRequest, RegionOpenRequest};
use store_api::region_request::{RegionCloseRequest, RegionFlushRequest, RegionOpenRequest};
use super::*;
use crate::test_util::TestEnv;
@@ -559,4 +560,90 @@ mod test {
assert!(env.metric().region_statistic(logical_region_id).is_none());
assert!(env.metric().region_statistic(physical_region_id).is_some());
}
#[tokio::test]
async fn test_open_region_failure() {
let env = TestEnv::new().await;
env.init_metric_region().await;
let physical_region_id = env.default_physical_region_id();
let metric_engine = env.metric();
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Flush(RegionFlushRequest {
row_group_size: None,
}),
)
.await
.unwrap();
let path = format!("{}/metadata/", env.default_region_dir());
let object_store = env.get_object_store().unwrap();
let list = object_store.list(&path).await.unwrap();
// Delete parquet files in metadata region
for entry in list {
if entry.metadata().is_dir() {
continue;
}
if entry.name().ends_with("parquet") {
info!("deleting {}", entry.path());
object_store.delete(entry.path()).await.unwrap();
}
}
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
// Opening an already opened region should succeed.
// Since the region is already open, no metadata recovery operations will be performed.
metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap();
// Close the region
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Close(RegionCloseRequest {}),
)
.await
.unwrap();
// Try to reopen region.
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
let err = metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap_err();
// Failed to open region because of missing parquet files.
assert_eq!(err.status_code(), StatusCode::StorageUnavailable);
let mito_engine = metric_engine.mito();
let data_region_id = utils::to_data_region_id(physical_region_id);
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
// The metadata/data region should be closed.
let err = mito_engine.get_metadata(data_region_id).await.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
let err = mito_engine
.get_metadata(metadata_region_id)
.await
.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
}
}

View File

@@ -59,7 +59,7 @@ impl MetricEngineInner {
}
}
async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
pub(crate) async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
let data_region_id = utils::to_data_region_id(region_id);
let metadata_region_id = utils::to_metadata_region_id(region_id);

View File

@@ -17,7 +17,7 @@
use api::region::RegionResponse;
use api::v1::SemanticType;
use common_error::ext::BoxedError;
use common_telemetry::info;
use common_telemetry::{error, info, warn};
use datafusion::common::HashMap;
use mito2::engine::MITO_ENGINE_NAME;
use object_store::util::join_dir;
@@ -94,6 +94,21 @@ impl MetricEngineInner {
Ok(responses)
}
// If the metadata region is opened with a stale manifest,
// the metric engine may fail to recover logical tables from the metadata region,
// as the manifest could reference files that have already been deleted
// due to compaction operations performed by the region leader.
async fn close_physical_region_on_recovery_failure(&self, physical_region_id: RegionId) {
info!(
"Closing metadata region {} and data region {} on metadata recovery failure",
utils::to_metadata_region_id(physical_region_id),
utils::to_data_region_id(physical_region_id)
);
if let Err(err) = self.close_physical_region(physical_region_id).await {
error!(err; "Failed to close physical region {}", physical_region_id);
}
}
async fn open_physical_region_with_results(
&self,
metadata_region_result: Option<std::result::Result<RegionResponse, BoxedError>>,
@@ -119,8 +134,14 @@ impl MetricEngineInner {
region_type: "data",
})?;
self.recover_states(physical_region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(physical_region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(physical_region_id)
.await;
return Err(err);
}
Ok(data_region_response)
}
@@ -139,11 +160,31 @@ impl MetricEngineInner {
request: RegionOpenRequest,
) -> Result<AffectedRows> {
if request.is_physical_table() {
if self
.state
.read()
.unwrap()
.physical_region_states()
.get(&region_id)
.is_some()
{
warn!(
"The physical region {} is already open, ignore the open request",
region_id
);
return Ok(0);
}
// open physical region and recover states
let physical_region_options = PhysicalRegionOptions::try_from(&request.options)?;
self.open_physical_region(region_id, request).await?;
self.recover_states(region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(region_id)
.await;
return Err(err);
}
Ok(0)
} else {

View File

@@ -23,6 +23,7 @@ use mito2::config::MitoConfig;
use mito2::engine::MitoEngine;
use mito2::test_util::TestEnv as MitoTestEnv;
use object_store::util::join_dir;
use object_store::ObjectStore;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{
LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME, PHYSICAL_TABLE_METADATA_KEY,
@@ -74,6 +75,10 @@ impl TestEnv {
join_dir(&env_root, "data")
}
pub fn get_object_store(&self) -> Option<ObjectStore> {
self.mito_env.get_object_store()
}
/// Returns a reference to the engine.
pub fn mito(&self) -> MitoEngine {
self.mito.clone()

View File

@@ -62,7 +62,7 @@ use crate::read::BoxedBatchReader;
use crate::region::options::MergeMode;
use crate::region::version::VersionControlRef;
use crate::region::ManifestContextRef;
use crate::request::{OptionOutputTx, OutputTx, WorkerRequest};
use crate::request::{OptionOutputTx, OutputTx, WorkerRequestWithTime};
use crate::schedule::remote_job_scheduler::{
CompactionJob, DefaultNotifier, RemoteJob, RemoteJobSchedulerRef,
};
@@ -77,7 +77,7 @@ pub struct CompactionRequest {
pub(crate) current_version: CompactionVersion,
pub(crate) access_layer: AccessLayerRef,
/// Sender to send notification to the region worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
/// Waiters of the compaction request.
pub(crate) waiters: Vec<OutputTx>,
/// Start time of compaction task.
@@ -101,7 +101,7 @@ pub(crate) struct CompactionScheduler {
/// Compacting regions.
region_status: HashMap<RegionId, CompactionStatus>,
/// Request sender of the worker that this scheduler belongs to.
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
cache_manager: CacheManagerRef,
engine_config: Arc<MitoConfig>,
listener: WorkerListener,
@@ -112,7 +112,7 @@ pub(crate) struct CompactionScheduler {
impl CompactionScheduler {
pub(crate) fn new(
scheduler: SchedulerRef,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
cache_manager: CacheManagerRef,
engine_config: Arc<MitoConfig>,
listener: WorkerListener,
@@ -559,7 +559,7 @@ impl CompactionStatus {
#[allow(clippy::too_many_arguments)]
fn new_compaction_request(
&mut self,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
mut waiter: OptionOutputTx,
engine_config: Arc<MitoConfig>,
cache_manager: CacheManagerRef,

View File

@@ -27,6 +27,7 @@ use crate::manifest::action::RegionEdit;
use crate::metrics::{COMPACTION_FAILURE_COUNT, COMPACTION_STAGE_ELAPSED};
use crate::request::{
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, WorkerRequest,
WorkerRequestWithTime,
};
use crate::worker::WorkerListener;
use crate::{error, metrics};
@@ -37,7 +38,7 @@ pub const MAX_PARALLEL_COMPACTION: usize = 1;
pub(crate) struct CompactionTaskImpl {
pub compaction_region: CompactionRegion,
/// Request sender to notify the worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
/// Senders that are used to notify waiters waiting for pending compaction tasks.
pub waiters: Vec<OutputTx>,
/// Start time of compaction task
@@ -135,7 +136,11 @@ impl CompactionTaskImpl {
/// Notifies region worker to handle post-compaction tasks.
async fn send_to_worker(&self, request: WorkerRequest) {
if let Err(e) = self.request_sender.send(request).await {
if let Err(e) = self
.request_sender
.send(WorkerRequestWithTime::new(request))
.await
{
error!(
"Failed to notify compaction job status for region {}, request: {:?}",
self.compaction_region.region_id, e.0

View File

@@ -1020,6 +1020,18 @@ pub enum Error {
location: Location,
source: mito_codec::error::Error,
},
#[snafu(display(
"Inconsistent timestamp column length, expect: {}, actual: {}",
expected,
actual
))]
InconsistentTimestampLength {
expected: usize,
actual: usize,
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T, E = Error> = std::result::Result<T, E>;
@@ -1175,6 +1187,8 @@ impl ErrorExt for Error {
ConvertBulkWalEntry { source, .. } => source.status_code(),
Encode { source, .. } | Decode { source, .. } => source.status_code(),
InconsistentTimestampLength { .. } => StatusCode::InvalidArguments,
}
}

View File

@@ -42,7 +42,7 @@ use crate::region::version::{VersionControlData, VersionControlRef};
use crate::region::{ManifestContextRef, RegionLeaderState};
use crate::request::{
BackgroundNotify, FlushFailed, FlushFinished, OptionOutputTx, OutputTx, SenderBulkRequest,
SenderDdlRequest, SenderWriteRequest, WorkerRequest,
SenderDdlRequest, SenderWriteRequest, WorkerRequest, WorkerRequestWithTime,
};
use crate::schedule::scheduler::{Job, SchedulerRef};
use crate::sst::file::FileMeta;
@@ -223,7 +223,7 @@ pub(crate) struct RegionFlushTask {
/// Flush result senders.
pub(crate) senders: Vec<OutputTx>,
/// Request sender to notify the worker.
pub(crate) request_sender: mpsc::Sender<WorkerRequest>,
pub(crate) request_sender: mpsc::Sender<WorkerRequestWithTime>,
pub(crate) access_layer: AccessLayerRef,
pub(crate) listener: WorkerListener,
@@ -441,7 +441,11 @@ impl RegionFlushTask {
/// Notify flush job status.
async fn send_worker_request(&self, request: WorkerRequest) {
if let Err(e) = self.request_sender.send(request).await {
if let Err(e) = self
.request_sender
.send(WorkerRequestWithTime::new(request))
.await
{
error!(
"Failed to notify flush job status for region {}, request: {:?}",
self.region_id, e.0

View File

@@ -126,7 +126,12 @@ impl From<&BulkPart> for BulkWalEntry {
impl BulkPart {
pub(crate) fn estimated_size(&self) -> usize {
self.batch.get_array_memory_size()
self.batch
.columns()
.iter()
// If the slice memory size cannot be determined, assume 0 here.
.map(|c| c.to_data().get_slice_memory_size().unwrap_or(0))
.sum()
}
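
The switch above matters because `get_array_memory_size` counts the capacity of the underlying buffers, which a sliced batch still shares with its parent, while `get_slice_memory_size` only accounts for the rows the slice actually references. A rough illustration, assuming the `arrow` crate is in scope; exact byte counts depend on the arrow version:

```rust
use arrow::array::{Array, Int64Array};

fn main() {
    let full = Int64Array::from_iter_values(0..100_000);
    // A 10-element slice still shares the parent's roughly 800 KB buffer.
    let slice = full.slice(0, 10);
    let whole_buffer = slice.get_array_memory_size();
    let slice_only = slice.to_data().get_slice_memory_size().unwrap_or(0);
    // `whole_buffer` reflects the shared backing buffer, `slice_only` reflects
    // just the 10 referenced values, which is the estimate the bulk ingester wants.
    assert!(slice_only < whole_buffer);
    println!("whole buffer: {whole_buffer} B, slice: {slice_only} B");
}
```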
/// Converts [BulkPart] to [Mutation] for fallback `write_bulk` implementation.

View File

@@ -94,12 +94,7 @@ lazy_static! {
// ------ Write related metrics
/// Number of stalled write requests in each worker.
pub static ref WRITE_STALL_TOTAL: IntGaugeVec = register_int_gauge_vec!(
"greptime_mito_write_stall_total",
"mito stalled write request in each worker",
&[WORKER_LABEL]
).unwrap();
//
/// Counter of rejected write requests.
pub static ref WRITE_REJECT_TOTAL: IntCounter =
register_int_counter!("greptime_mito_write_reject_total", "mito write reject total").unwrap();
@@ -402,6 +397,7 @@ lazy_static! {
}
// Use another block to avoid reaching the recursion limit.
lazy_static! {
/// Counter for compaction input file size.
pub static ref COMPACTION_INPUT_BYTES: Counter = register_counter!(
@@ -426,6 +422,27 @@ lazy_static! {
"greptime_mito_memtable_field_builder_count",
"active field builder count in TimeSeriesMemtable",
).unwrap();
/// Number of stalling write requests in each worker.
pub static ref WRITE_STALLING: IntGaugeVec = register_int_gauge_vec!(
"greptime_mito_write_stalling_count",
"mito stalled write request in each worker",
&[WORKER_LABEL]
).unwrap();
/// Total number of stalled write requests.
pub static ref WRITE_STALL_TOTAL: IntCounter = register_int_counter!(
"greptime_mito_write_stall_total",
"Total number of stalled write requests"
).unwrap();
/// Time waiting for requests to be handled by the region worker.
pub static ref REQUEST_WAIT_TIME: HistogramVec = register_histogram_vec!(
"greptime_mito_request_wait_time",
"mito request wait time before being handled by region worker",
&[WORKER_LABEL],
// 0.001 ~ 10000
exponential_buckets(0.001, 10.0, 8).unwrap(),
)
.unwrap();
}
/// Stager notifier to collect metrics.

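For reference, the bucket layout hinted at by the `// 0.001 ~ 10000` comment can be checked directly, assuming the `prometheus` crate is available:

```rust
fn main() {
    // Eight buckets spanning 0.001s to 10000s in powers of ten.
    let buckets = prometheus::exponential_buckets(0.001, 10.0, 8).unwrap();
    assert_eq!(buckets.len(), 8);
    assert!((buckets[0] - 0.001).abs() < 1e-9);
    assert!((buckets[7] - 10_000.0).abs() < 1e-3);
}
```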
View File

@@ -542,6 +542,22 @@ pub(crate) struct SenderBulkRequest {
pub(crate) region_metadata: RegionMetadataRef,
}
/// Request sent to a worker, together with its creation timestamp.
#[derive(Debug)]
pub(crate) struct WorkerRequestWithTime {
pub(crate) request: WorkerRequest,
pub(crate) created_at: Instant,
}
impl WorkerRequestWithTime {
pub(crate) fn new(request: WorkerRequest) -> Self {
Self {
request,
created_at: Instant::now(),
}
}
}
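
The wrapper above implements a common queue-wait measurement pattern: stamp the request when it is enqueued and observe `created_at.elapsed()` when the worker dequeues it. A minimal std-only sketch of the idea (illustrative types; the real code feeds the value into the per-worker `REQUEST_WAIT_TIME` histogram):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

struct TimedRequest<T> {
    request: T,
    created_at: Instant,
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(TimedRequest {
        request: "write",
        created_at: Instant::now(),
    })
    .unwrap();
    // Simulate the worker being busy before it drains its channel.
    thread::sleep(Duration::from_millis(5));
    let timed = rx.recv().unwrap();
    // This is the duration the request spent waiting in the queue.
    let wait = timed.created_at.elapsed();
    println!("{} waited {:?} in the queue", timed.request, wait);
}
```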
/// Request sent to a worker
#[derive(Debug)]
pub(crate) enum WorkerRequest {

View File

@@ -30,6 +30,7 @@ use crate::manifest::action::RegionEdit;
use crate::metrics::{COMPACTION_FAILURE_COUNT, INFLIGHT_COMPACTION_COUNT};
use crate::request::{
BackgroundNotify, CompactionFailed, CompactionFinished, OutputTx, WorkerRequest,
WorkerRequestWithTime,
};
pub type RemoteJobSchedulerRef = Arc<dyn RemoteJobScheduler>;
@@ -130,7 +131,7 @@ pub struct CompactionJobResult {
/// DefaultNotifier is a default implementation of Notifier that sends WorkerRequest to the mito engine.
pub(crate) struct DefaultNotifier {
/// The sender to send WorkerRequest to the mito engine. This is used to notify the mito engine when a remote job is completed.
pub(crate) request_sender: Sender<WorkerRequest>,
pub(crate) request_sender: Sender<WorkerRequestWithTime>,
}
impl DefaultNotifier {
@@ -173,10 +174,10 @@ impl Notifier for DefaultNotifier {
if let Err(e) = self
.request_sender
.send(WorkerRequest::Background {
.send(WorkerRequestWithTime::new(WorkerRequest::Background {
region_id: result.region_id,
notify,
})
}))
.await
{
error!(

View File

@@ -294,7 +294,7 @@ impl RowGroupSelection {
let Some(y) = self.selection_in_rg.get(rg_id) else {
continue;
};
let selection = x.selection.intersection(&y.selection);
let selection = intersect_row_selections(&x.selection, &y.selection);
let row_count = selection.row_count();
let selector_len = selector_len(&selection);
if row_count > 0 {
@@ -423,6 +423,68 @@ impl RowGroupSelection {
}
}
/// Ported from `parquet`, but rows past the end of the shorter selection are removed.
///
/// Combines two `RowSelection`s and returns their intersection.
/// For example:
/// self: NNYYYYNNYYNYN
/// other: NYNNNNNNY
///
/// returned: NNNNNNNNY (modified)
/// NNNNNNNNYYNYN (original)
fn intersect_row_selections(left: &RowSelection, right: &RowSelection) -> RowSelection {
let mut l_iter = left.iter().copied().peekable();
let mut r_iter = right.iter().copied().peekable();
let iter = std::iter::from_fn(move || {
loop {
let l = l_iter.peek_mut();
let r = r_iter.peek_mut();
match (l, r) {
(Some(a), _) if a.row_count == 0 => {
l_iter.next().unwrap();
}
(_, Some(b)) if b.row_count == 0 => {
r_iter.next().unwrap();
}
(Some(l), Some(r)) => {
return match (l.skip, r.skip) {
// Keep both ranges
(false, false) => {
if l.row_count < r.row_count {
r.row_count -= l.row_count;
l_iter.next()
} else {
l.row_count -= r.row_count;
r_iter.next()
}
}
// skip at least one
_ => {
if l.row_count < r.row_count {
let skip = l.row_count;
r.row_count -= l.row_count;
l_iter.next();
Some(RowSelector::skip(skip))
} else {
let skip = r.row_count;
l.row_count -= skip;
r_iter.next();
Some(RowSelector::skip(skip))
}
}
};
}
(None, _) => return None,
(_, None) => return None,
}
}
});
iter.collect()
}
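
A hypothetical check of the helper above, assuming parquet's `RowSelection`/`RowSelector` types and the `intersect_row_selections` function are in scope:

```rust
use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

fn main() {
    // left : NNYYYYNN (skip 2, select 4, skip 2)
    // right: NYNNYY   (skip 1, select 1, skip 2, select 2)
    let left = RowSelection::from(vec![
        RowSelector::skip(2),
        RowSelector::select(4),
        RowSelector::skip(2),
    ]);
    let right = RowSelection::from(vec![
        RowSelector::skip(1),
        RowSelector::select(1),
        RowSelector::skip(2),
        RowSelector::select(2),
    ]);
    // Only rows 4 and 5 are selected on both sides; rows past the shorter
    // selection are dropped, and adjacent selectors of the same kind are merged.
    let expected = RowSelection::from(vec![RowSelector::skip(4), RowSelector::select(2)]);
    assert_eq!(intersect_row_selections(&left, &right), expected);
}
```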
/// Converts an iterator of row ranges into a `RowSelection` by creating a sequence of `RowSelector`s.
///
/// This function processes each range in the input and either creates a new selector or merges
@@ -448,10 +510,6 @@ pub(crate) fn row_selection_from_row_ranges(
last_processed_end = end;
}
if last_processed_end < total_row_count {
add_or_merge_selector(&mut selectors, total_row_count - last_processed_end, true);
}
RowSelection::from(selectors)
}
@@ -546,7 +604,6 @@ mod tests {
RowSelector::select(2),
RowSelector::skip(2),
RowSelector::select(3),
RowSelector::skip(2),
]);
assert_eq!(selection, expected);
}
@@ -555,7 +612,7 @@ mod tests {
fn test_empty_range() {
let ranges = [];
let selection = row_selection_from_row_ranges(ranges.iter().cloned(), 10);
let expected = RowSelection::from(vec![RowSelector::skip(10)]);
let expected = RowSelection::from(vec![]);
assert_eq!(selection, expected);
}
@@ -563,11 +620,7 @@ mod tests {
fn test_adjacent_ranges() {
let ranges = [1..2, 2..3];
let selection = row_selection_from_row_ranges(ranges.iter().cloned(), 10);
let expected = RowSelection::from(vec![
RowSelector::skip(1),
RowSelector::select(2),
RowSelector::skip(7),
]);
let expected = RowSelection::from(vec![RowSelector::skip(1), RowSelector::select(2)]);
assert_eq!(selection, expected);
}
@@ -580,7 +633,6 @@ mod tests {
RowSelector::select(1),
RowSelector::skip(98),
RowSelector::select(1),
RowSelector::skip(10139),
]);
assert_eq!(selection, expected);
}

View File

@@ -32,7 +32,7 @@ use crate::error::Result;
use crate::flush::FlushScheduler;
use crate::manifest::manager::{RegionManifestManager, RegionManifestOptions};
use crate::region::{ManifestContext, ManifestContextRef, RegionLeaderState, RegionRoleState};
use crate::request::WorkerRequest;
use crate::request::{WorkerRequest, WorkerRequestWithTime};
use crate::schedule::scheduler::{Job, LocalScheduler, Scheduler, SchedulerRef};
use crate::sst::index::intermediate::IntermediateManager;
use crate::sst::index::puffin_manager::PuffinManagerFactory;
@@ -85,7 +85,7 @@ impl SchedulerEnv {
/// Creates a new compaction scheduler.
pub(crate) fn mock_compaction_scheduler(
&self,
request_sender: Sender<WorkerRequest>,
request_sender: Sender<WorkerRequestWithTime>,
) -> CompactionScheduler {
let scheduler = self.get_scheduler();

View File

@@ -39,7 +39,7 @@ use common_runtime::JoinHandle;
use common_telemetry::{error, info, warn};
use futures::future::try_join_all;
use object_store::manager::ObjectStoreManagerRef;
use prometheus::IntGauge;
use prometheus::{Histogram, IntGauge};
use rand::{rng, Rng};
use snafu::{ensure, ResultExt};
use store_api::logstore::LogStore;
@@ -58,11 +58,11 @@ use crate::error;
use crate::error::{CreateDirSnafu, JoinSnafu, Result, WorkerStoppedSnafu};
use crate::flush::{FlushScheduler, WriteBufferManagerImpl, WriteBufferManagerRef};
use crate::memtable::MemtableBuilderProvider;
use crate::metrics::{REGION_COUNT, WRITE_STALL_TOTAL};
use crate::metrics::{REGION_COUNT, REQUEST_WAIT_TIME, WRITE_STALLING};
use crate::region::{MitoRegionRef, OpeningRegions, OpeningRegionsRef, RegionMap, RegionMapRef};
use crate::request::{
BackgroundNotify, DdlRequest, SenderBulkRequest, SenderDdlRequest, SenderWriteRequest,
WorkerRequest,
WorkerRequest, WorkerRequestWithTime,
};
use crate::schedule::scheduler::{LocalScheduler, SchedulerRef};
use crate::sst::file::FileId;
@@ -469,8 +469,9 @@ impl<S: LogStore> WorkerStarter<S> {
last_periodical_check_millis: now,
flush_sender: self.flush_sender,
flush_receiver: self.flush_receiver,
stalled_count: WRITE_STALL_TOTAL.with_label_values(&[&id_string]),
stalling_count: WRITE_STALLING.with_label_values(&[&id_string]),
region_count: REGION_COUNT.with_label_values(&[&id_string]),
request_wait_time: REQUEST_WAIT_TIME.with_label_values(&[&id_string]),
region_edit_queues: RegionEditQueues::default(),
schema_metadata_manager: self.schema_metadata_manager,
};
@@ -498,7 +499,7 @@ pub(crate) struct RegionWorker {
/// The opening regions.
opening_regions: OpeningRegionsRef,
/// Request sender.
sender: Sender<WorkerRequest>,
sender: Sender<WorkerRequestWithTime>,
/// Handle to the worker thread.
handle: Mutex<Option<JoinHandle<()>>>,
/// Whether to run the worker thread.
@@ -509,7 +510,8 @@ impl RegionWorker {
/// Submits request to background worker thread.
async fn submit_request(&self, request: WorkerRequest) -> Result<()> {
ensure!(self.is_running(), WorkerStoppedSnafu { id: self.id });
if self.sender.send(request).await.is_err() {
let request_with_time = WorkerRequestWithTime::new(request);
if self.sender.send(request_with_time).await.is_err() {
warn!(
"Worker {} is already exited but the running flag is still true",
self.id
@@ -531,7 +533,12 @@ impl RegionWorker {
info!("Stop region worker {}", self.id);
self.set_running(false);
if self.sender.send(WorkerRequest::Stop).await.is_err() {
if self
.sender
.send(WorkerRequestWithTime::new(WorkerRequest::Stop))
.await
.is_err()
{
warn!("Worker {} is already exited before stop", self.id);
}
@@ -669,9 +676,9 @@ struct RegionWorkerLoop<S> {
/// Regions that are opening.
opening_regions: OpeningRegionsRef,
/// Request sender.
sender: Sender<WorkerRequest>,
sender: Sender<WorkerRequestWithTime>,
/// Request receiver.
receiver: Receiver<WorkerRequest>,
receiver: Receiver<WorkerRequestWithTime>,
/// WAL of the engine.
wal: Wal<S>,
/// Manages object stores for manifest and SSTs.
@@ -706,10 +713,12 @@ struct RegionWorkerLoop<S> {
flush_sender: watch::Sender<()>,
/// Watch channel receiver to wait for background flush job.
flush_receiver: watch::Receiver<()>,
/// Gauge of stalled request count.
stalled_count: IntGauge,
/// Gauge of stalling request count.
stalling_count: IntGauge,
/// Gauge of regions in the worker.
region_count: IntGauge,
/// Histogram of request wait time for this worker.
request_wait_time: Histogram,
/// Queues for region edit requests.
region_edit_queues: RegionEditQueues,
/// Database level metadata manager.
@@ -749,10 +758,16 @@ impl<S: LogStore> RegionWorkerLoop<S> {
tokio::select! {
request_opt = self.receiver.recv() => {
match request_opt {
Some(request) => match request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
_ => general_req_buffer.push(request),
Some(request_with_time) => {
// Observe the wait time
let wait_time = request_with_time.created_at.elapsed();
self.request_wait_time.observe(wait_time.as_secs_f64());
match request_with_time.request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
req => general_req_buffer.push(req),
}
},
// The channel is disconnected.
None => break,
@@ -791,11 +806,17 @@ impl<S: LogStore> RegionWorkerLoop<S> {
for _ in 1..self.config.worker_request_batch_size {
// We have received one request so we start from 1.
match self.receiver.try_recv() {
Ok(req) => match req {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
_ => general_req_buffer.push(req),
},
Ok(request_with_time) => {
// Observe the wait time
let wait_time = request_with_time.created_at.elapsed();
self.request_wait_time.observe(wait_time.as_secs_f64());
match request_with_time.request {
WorkerRequest::Write(sender_req) => write_req_buffer.push(sender_req),
WorkerRequest::Ddl(sender_req) => ddl_req_buffer.push(sender_req),
req => general_req_buffer.push(req),
}
}
// We still need to handle remaining requests.
Err(_) => break,
}

View File

@@ -15,15 +15,11 @@
//! Handles bulk insert requests.
use datatypes::arrow;
use datatypes::arrow::array::{
TimestampMicrosecondArray, TimestampMillisecondArray, TimestampNanosecondArray,
TimestampSecondArray,
};
use datatypes::arrow::datatypes::{DataType, TimeUnit};
use store_api::logstore::LogStore;
use store_api::metadata::RegionMetadataRef;
use store_api::region_request::RegionBulkInsertsRequest;
use crate::error::InconsistentTimestampLengthSnafu;
use crate::memtable::bulk::part::BulkPart;
use crate::request::{OptionOutputTx, SenderBulkRequest};
use crate::worker::RegionWorkerLoop;
@@ -41,6 +37,10 @@ impl<S: LogStore> RegionWorkerLoop<S> {
.with_label_values(&["process_bulk_req"])
.start_timer();
let batch = request.payload;
if batch.num_rows() == 0 {
sender.send(Ok(0));
return;
}
let Some((ts_index, ts)) = batch
.schema()
@@ -60,55 +60,23 @@ impl<S: LogStore> RegionWorkerLoop<S> {
return;
};
let DataType::Timestamp(unit, _) = ts.data_type() else {
// safety: ts data type must be a timestamp type.
unreachable!()
};
if batch.num_rows() != ts.len() {
sender.send(
InconsistentTimestampLengthSnafu {
expected: batch.num_rows(),
actual: ts.len(),
}
.fail(),
);
return;
}
let (min_ts, max_ts) = match unit {
TimeUnit::Second => {
let ts = ts.as_any().downcast_ref::<TimestampSecondArray>().unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
// safety: ts data type must be a timestamp type.
let (ts_primitive, _) = datatypes::timestamp::timestamp_array_to_primitive(ts).unwrap();
TimeUnit::Millisecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
TimeUnit::Microsecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
TimeUnit::Nanosecond => {
let ts = ts
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap();
(
//safety: ts array must contain at least one row so this won't return None.
arrow::compute::min(ts).unwrap(),
arrow::compute::max(ts).unwrap(),
)
}
};
// safety: we've checked ts.len() == batch.num_rows() and batch is not empty
let min_ts = arrow::compute::min(&ts_primitive).unwrap();
let max_ts = arrow::compute::max(&ts_primitive).unwrap();
let part = BulkPart {
batch,

View File

@@ -34,7 +34,7 @@ use crate::region::version::VersionBuilder;
use crate::region::{MitoRegionRef, RegionLeaderState, RegionRoleState};
use crate::request::{
BackgroundNotify, OptionOutputTx, RegionChangeResult, RegionEditRequest, RegionEditResult,
RegionSyncRequest, TruncateResult, WorkerRequest,
RegionSyncRequest, TruncateResult, WorkerRequest, WorkerRequestWithTime,
};
use crate::sst::location;
use crate::worker::{RegionWorkerLoop, WorkerListener};
@@ -230,7 +230,10 @@ impl<S> RegionWorkerLoop<S> {
}),
};
// We don't set the state back as the worker loop has already exited.
if let Err(res) = request_sender.send(notify).await {
if let Err(res) = request_sender
.send(WorkerRequestWithTime::new(notify))
.await
{
warn!(
"Failed to send region edit result back to the worker, region_id: {}, res: {:?}",
region_id, res
@@ -318,10 +321,10 @@ impl<S> RegionWorkerLoop<S> {
truncated_sequence: truncate.truncated_sequence,
};
let _ = request_sender
.send(WorkerRequest::Background {
.send(WorkerRequestWithTime::new(WorkerRequest::Background {
region_id: truncate.region_id,
notify: BackgroundNotify::Truncate(truncate_result),
})
}))
.await
.inspect_err(|_| warn!("failed to send truncate result"));
});
@@ -364,7 +367,10 @@ impl<S> RegionWorkerLoop<S> {
.on_notify_region_change_result_begin(region.region_id)
.await;
if let Err(res) = request_sender.send(notify).await {
if let Err(res) = request_sender
.send(WorkerRequestWithTime::new(notify))
.await
{
warn!(
"Failed to send region change result back to the worker, region_id: {}, res: {:?}",
region.region_id, res

View File

@@ -27,7 +27,9 @@ use store_api::storage::RegionId;
use crate::error::{InvalidRequestSnafu, RegionStateSnafu, RejectWriteSnafu, Result};
use crate::metrics;
use crate::metrics::{WRITE_REJECT_TOTAL, WRITE_ROWS_TOTAL, WRITE_STAGE_ELAPSED};
use crate::metrics::{
WRITE_REJECT_TOTAL, WRITE_ROWS_TOTAL, WRITE_STAGE_ELAPSED, WRITE_STALL_TOTAL,
};
use crate::region::{RegionLeaderState, RegionRoleState};
use crate::region_write_ctx::RegionWriteCtx;
use crate::request::{SenderBulkRequest, SenderWriteRequest, WriteRequest};
@@ -57,8 +59,9 @@ impl<S: LogStore> RegionWorkerLoop<S> {
}
if self.write_buffer_manager.should_stall() && allow_stall {
self.stalled_count
.add((write_requests.len() + bulk_requests.len()) as i64);
let stalled_count = (write_requests.len() + bulk_requests.len()) as i64;
self.stalling_count.add(stalled_count);
WRITE_STALL_TOTAL.inc_by(stalled_count as u64);
self.stalled_requests.append(write_requests, bulk_requests);
self.listener.on_write_stall();
return;
@@ -161,7 +164,7 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) async fn handle_stalled_requests(&mut self) {
// Handle stalled requests.
let stalled = std::mem::take(&mut self.stalled_requests);
self.stalled_count.sub(stalled.stalled_count() as i64);
self.stalling_count.sub(stalled.stalled_count() as i64);
// We already stalled these requests, don't stall them again.
for (_, (_, mut requests, mut bulk)) in stalled.requests {
self.handle_write_requests(&mut requests, &mut bulk, false)
@@ -172,7 +175,7 @@ impl<S: LogStore> RegionWorkerLoop<S> {
/// Rejects all stalled requests.
pub(crate) fn reject_stalled_requests(&mut self) {
let stalled = std::mem::take(&mut self.stalled_requests);
self.stalled_count.sub(stalled.stalled_count() as i64);
self.stalling_count.sub(stalled.stalled_count() as i64);
for (_, (_, mut requests, mut bulk)) in stalled.requests {
reject_write_requests(&mut requests, &mut bulk);
}
@@ -182,7 +185,8 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) fn reject_region_stalled_requests(&mut self, region_id: &RegionId) {
debug!("Rejects stalled requests for region {}", region_id);
let (mut requests, mut bulk) = self.stalled_requests.remove(region_id);
self.stalled_count.sub((requests.len() + bulk.len()) as i64);
self.stalling_count
.sub((requests.len() + bulk.len()) as i64);
reject_write_requests(&mut requests, &mut bulk);
}
@@ -190,7 +194,8 @@ impl<S: LogStore> RegionWorkerLoop<S> {
pub(crate) async fn handle_region_stalled_requests(&mut self, region_id: &RegionId) {
debug!("Handles stalled requests for region {}", region_id);
let (mut requests, mut bulk) = self.stalled_requests.remove(region_id);
self.stalled_count.sub((requests.len() + bulk.len()) as i64);
self.stalling_count
.sub((requests.len() + bulk.len()) as i64);
self.handle_write_requests(&mut requests, &mut bulk, true)
.await;
}
@@ -251,7 +256,8 @@ impl<S> RegionWorkerLoop<S> {
"Region {} is altering, add request to pending writes",
region.region_id
);
self.stalled_count.add(1);
self.stalling_count.add(1);
WRITE_STALL_TOTAL.inc();
self.stalled_requests.push(sender_req);
continue;
}
@@ -353,7 +359,8 @@ impl<S> RegionWorkerLoop<S> {
"Region {} is altering, add request to pending writes",
region.region_id
);
self.stalled_count.add(1);
self.stalling_count.add(1);
WRITE_STALL_TOTAL.inc();
self.stalled_requests.push_bulk(bulk_req);
continue;
}

View File

@@ -20,11 +20,7 @@ use api::v1::region::{
bulk_insert_request, region_request, BulkInsertRequest, RegionRequest, RegionRequestHeader,
};
use api::v1::ArrowIpc;
use arrow::array::{
Array, TimestampMicrosecondArray, TimestampMillisecondArray, TimestampNanosecondArray,
TimestampSecondArray,
};
use arrow::datatypes::{DataType, Int64Type, TimeUnit};
use arrow::array::Array;
use arrow::record_batch::RecordBatch;
use common_base::AffectedRows;
use common_grpc::flight::{FlightDecoder, FlightEncoder, FlightMessage};
@@ -62,6 +58,10 @@ impl Inserter {
};
decode_timer.observe_duration();
if record_batch.num_rows() == 0 {
return Ok(0);
}
// notify flownode to update dirty timestamps if flow is configured.
self.maybe_update_flow_dirty_window(table_info, record_batch.clone());
@@ -155,6 +155,9 @@ impl Inserter {
let mut raw_data_bytes = None;
for (peer, masks) in mask_per_datanode {
for (region_id, mask) in masks {
if mask.select_none() {
continue;
}
let rb = record_batch.clone();
let schema_bytes = schema_bytes.clone();
let node_manager = self.node_manager.clone();
@@ -304,32 +307,11 @@ fn extract_timestamps(rb: &RecordBatch, timestamp_index_name: &str) -> error::Re
if rb.num_rows() == 0 {
return Ok(vec![]);
}
let primitive = match ts_col.data_type() {
DataType::Timestamp(unit, _) => match unit {
TimeUnit::Second => ts_col
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Millisecond => ts_col
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Microsecond => ts_col
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
TimeUnit::Nanosecond => ts_col
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.reinterpret_cast::<Int64Type>(),
},
t => {
return error::InvalidTimeIndexTypeSnafu { ty: t.clone() }.fail();
}
};
let (primitive, _) =
datatypes::timestamp::timestamp_array_to_primitive(ts_col).with_context(|| {
error::InvalidTimeIndexTypeSnafu {
ty: ts_col.data_type().clone(),
}
})?;
Ok(primitive.iter().flatten().collect())
}

View File

@@ -229,6 +229,7 @@ impl DispatchedTo {
pub enum PipelineExecOutput {
Transformed(TransformedOutput),
DispatchedTo(DispatchedTo, Value),
Filtered,
}
#[derive(Debug)]
@@ -309,6 +310,10 @@ impl Pipeline {
// process
for processor in self.processors.iter() {
val = processor.exec_mut(val)?;
if val.is_null() {
// line is filtered
return Ok(PipelineExecOutput::Filtered);
}
}
// dispatch, fast return if matched
@@ -333,9 +338,9 @@ impl Pipeline {
table_suffix,
}));
}
// continue v2 process, check ts column and set the rest fields with auto-transform
// continue the v2 process and set the rest of the fields with auto-transform
// if a transformer is present, then ts has already been set
values_to_row(schema_info, val, pipeline_ctx, Some(values))?
values_to_row(schema_info, val, pipeline_ctx, Some(values), false)?
}
TransformerMode::AutoTransform(ts_name, time_unit) => {
// infer ts from the context
@@ -347,7 +352,7 @@ impl Pipeline {
));
let n_ctx =
PipelineContext::new(&def, pipeline_ctx.pipeline_param, pipeline_ctx.channel);
values_to_row(schema_info, val, &n_ctx, None)?
values_to_row(schema_info, val, &n_ctx, None, true)?
}
};
@@ -525,9 +530,6 @@ transform:
.into_transformed()
.unwrap();
// println!("[DEBUG]schema_info: {:?}", schema_info.schema);
// println!("[DEBUG]re: {:?}", result.0.values);
assert_eq!(schema_info.schema.len(), result.0.values.len());
let test = vec![
(

View File

@@ -19,6 +19,7 @@ pub mod decolorize;
pub mod digest;
pub mod dissect;
pub mod epoch;
pub mod filter;
pub mod gsub;
pub mod join;
pub mod json_parse;
@@ -54,6 +55,7 @@ use crate::error::{
Result, UnsupportedProcessorSnafu,
};
use crate::etl::field::{Field, Fields};
use crate::etl::processor::filter::FilterProcessor;
use crate::etl::processor::json_parse::JsonParseProcessor;
use crate::etl::processor::select::SelectProcessor;
use crate::etl::processor::simple_extract::SimpleExtractProcessor;
@@ -146,6 +148,7 @@ pub enum ProcessorKind {
Digest(DigestProcessor),
Select(SelectProcessor),
Vrl(VrlProcessor),
Filter(FilterProcessor),
}
#[derive(Debug, Default)]
@@ -226,6 +229,7 @@ fn parse_processor(doc: &yaml_rust::Yaml) -> Result<ProcessorKind> {
}
vrl::PROCESSOR_VRL => ProcessorKind::Vrl(VrlProcessor::try_from(value)?),
select::PROCESSOR_SELECT => ProcessorKind::Select(SelectProcessor::try_from(value)?),
filter::PROCESSOR_FILTER => ProcessorKind::Filter(FilterProcessor::try_from(value)?),
_ => return UnsupportedProcessorSnafu { processor: str_key }.fail(),
};

View File

@@ -0,0 +1,242 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use ahash::{HashSet, HashSetExt};
use snafu::OptionExt;
use crate::error::{
Error, KeyMustBeStringSnafu, ProcessorExpectStringSnafu, ProcessorMissingFieldSnafu, Result,
ValueMustBeMapSnafu,
};
use crate::etl::field::Fields;
use crate::etl::processor::{
yaml_bool, yaml_new_field, yaml_new_fields, yaml_string, yaml_strings, FIELDS_NAME, FIELD_NAME,
};
use crate::{Processor, Value};
pub(crate) const PROCESSOR_FILTER: &str = "filter";
const MATCH_MODE_NAME: &str = "mode";
const MATCH_OP_NAME: &str = "match_op";
const CASE_INSENSITIVE_NAME: &str = "case_insensitive";
const TARGETS_NAME: &str = "targets";
#[derive(Debug)]
enum MatchMode {
SimpleMatch(MatchOp),
}
impl Default for MatchMode {
fn default() -> Self {
Self::SimpleMatch(MatchOp::default())
}
}
#[derive(Debug, Default)]
enum MatchOp {
#[default]
In,
NotIn,
}
/// Filters out the whole line if the field matches.
/// Ultimately this is a condition check; VRL could be used later for more complex checks.
/// Only simple string matching is implemented for now; it can be extended later.
#[derive(Debug, Default)]
pub struct FilterProcessor {
fields: Fields,
mode: MatchMode,
case_insensitive: bool,
targets: HashSet<String>,
}
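
For illustration, a pipeline snippet exercising this processor might look like the following; the key names (`field`, `mode`, `match_op`, `case_insensitive`, `targets`) are assumed to match the `FIELD_NAME`/`MATCH_*` constants above and are not taken verbatim from the patch:

```rust
// Only checks that the assumed configuration shape is well-formed YAML;
// wiring it into a real Pipeline requires the pipeline crate itself.
fn main() {
    let pipeline_yaml = r#"
processors:
  - filter:
      field: name
      mode: simple
      match_op: in
      case_insensitive: true
      targets:
        - john
        - wick
"#;
    let docs = yaml_rust::YamlLoader::load_from_str(pipeline_yaml).unwrap();
    assert!(docs[0]["processors"][0]["filter"]["targets"].as_vec().is_some());
}
```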
impl TryFrom<&yaml_rust::yaml::Hash> for FilterProcessor {
type Error = Error;
// match mode can be extended in the future
#[allow(clippy::single_match)]
fn try_from(value: &yaml_rust::yaml::Hash) -> std::result::Result<Self, Self::Error> {
let mut fields = Fields::default();
let mut mode = MatchMode::default();
let mut op = MatchOp::default();
let mut case_insensitive = true;
let mut targets = HashSet::new();
for (k, v) in value.iter() {
let key = k
.as_str()
.with_context(|| KeyMustBeStringSnafu { k: k.clone() })?;
match key {
FIELD_NAME => fields = Fields::one(yaml_new_field(v, FIELD_NAME)?),
FIELDS_NAME => fields = yaml_new_fields(v, FIELDS_NAME)?,
MATCH_MODE_NAME => match yaml_string(v, MATCH_MODE_NAME)?.as_str() {
"simple" => mode = MatchMode::SimpleMatch(MatchOp::In),
_ => {}
},
MATCH_OP_NAME => match yaml_string(v, MATCH_OP_NAME)?.as_str() {
"in" => op = MatchOp::In,
"not_in" => op = MatchOp::NotIn,
_ => {}
},
CASE_INSENSITIVE_NAME => case_insensitive = yaml_bool(v, CASE_INSENSITIVE_NAME)?,
TARGETS_NAME => {
yaml_strings(v, TARGETS_NAME)?
.into_iter()
.filter(|s| !s.is_empty())
.for_each(|s| {
targets.insert(s);
});
}
_ => {}
}
}
if matches!(mode, MatchMode::SimpleMatch(_)) {
mode = MatchMode::SimpleMatch(op);
}
if targets.is_empty() {
return ProcessorMissingFieldSnafu {
processor: PROCESSOR_FILTER,
field: TARGETS_NAME.to_string(),
}
.fail();
}
if case_insensitive {
targets = targets.into_iter().map(|s| s.to_lowercase()).collect();
}
Ok(FilterProcessor {
fields,
mode,
case_insensitive,
targets,
})
}
}
impl FilterProcessor {
fn match_target(&self, input: String) -> bool {
let input = if self.case_insensitive {
input.to_lowercase()
} else {
input
};
match &self.mode {
MatchMode::SimpleMatch(op) => match op {
MatchOp::In => self.targets.contains(&input),
MatchOp::NotIn => !self.targets.contains(&input),
},
}
}
}
impl Processor for FilterProcessor {
fn kind(&self) -> &str {
PROCESSOR_FILTER
}
fn ignore_missing(&self) -> bool {
true
}
fn exec_mut(&self, mut val: Value) -> Result<Value> {
let v_map = val.as_map_mut().context(ValueMustBeMapSnafu)?;
for field in self.fields.iter() {
let index = field.input_field();
match v_map.get(index) {
Some(Value::String(s)) => {
if self.match_target(s.clone()) {
return Ok(Value::Null);
}
}
Some(v) => {
return ProcessorExpectStringSnafu {
processor: self.kind(),
v: v.clone(),
}
.fail();
}
None => {}
}
}
Ok(val)
}
}
#[cfg(test)]
mod test {
use ahash::HashSet;
use crate::etl::field::{Field, Fields};
use crate::etl::processor::filter::{FilterProcessor, MatchMode, MatchOp};
use crate::{Map, Processor, Value};
#[test]
fn test_eq() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::In),
case_insensitive: false,
targets: HashSet::from_iter(vec!["John".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("John".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
let val = Value::Map(Map::one("name", Value::String("Wick".to_string())));
let expect = val.clone();
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, expect);
}
#[test]
fn test_ne() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::NotIn),
case_insensitive: false,
targets: HashSet::from_iter(vec!["John".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("John".to_string())));
let expect = val.clone();
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, expect);
let val = Value::Map(Map::one("name", Value::String("Wick".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
}
#[test]
fn test_case() {
let processor = FilterProcessor {
fields: Fields::one(Field::new("name", None)),
mode: MatchMode::SimpleMatch(MatchOp::In),
case_insensitive: true,
targets: HashSet::from_iter(vec!["john".to_string()]),
};
let val = Value::Map(Map::one("name", Value::String("JoHN".to_string())));
let result = processor.exec_mut(val).unwrap();
assert_eq!(result, Value::Null);
}
}

View File

@@ -420,15 +420,17 @@ pub(crate) fn values_to_row(
values: Value,
pipeline_ctx: &PipelineContext<'_>,
row: Option<Vec<GreptimeValue>>,
need_calc_ts: bool,
) -> Result<Row> {
let mut row: Vec<GreptimeValue> =
row.unwrap_or_else(|| Vec::with_capacity(schema_info.schema.len()));
let custom_ts = pipeline_ctx.pipeline_definition.get_custom_ts();
// calculate timestamp value based on the channel
let ts = calc_ts(pipeline_ctx, &values)?;
row.push(GreptimeValue { value_data: ts });
if need_calc_ts {
// calculate timestamp value based on the channel
let ts = calc_ts(pipeline_ctx, &values)?;
row.push(GreptimeValue { value_data: ts });
}
row.resize(schema_info.schema.len(), GreptimeValue { value_data: None });
@@ -608,7 +610,7 @@ fn identity_pipeline_inner(
skip_error
);
let row = unwrap_or_continue_if_err!(
values_to_row(&mut schema_info, pipeline_map, pipeline_ctx, None),
values_to_row(&mut schema_info, pipeline_map, pipeline_ctx, None, true),
skip_error
);

View File

@@ -340,7 +340,14 @@ impl ExecutionPlan for RangeManipulateExec {
}
fn required_input_distribution(&self) -> Vec<Distribution> {
self.input.required_input_distribution()
let input_requirement = self.input.required_input_distribution();
if input_requirement.is_empty() {
// if the input is EmptyMetric, its required_input_distribution() is empty so we can't
// use its input distribution.
vec![Distribution::UnspecifiedDistribution]
} else {
input_requirement
}
}
fn with_new_children(

View File

@@ -237,7 +237,8 @@ fn create_output_batch(
for (node, metric) in sub_stage_metrics.into_iter().enumerate() {
builder.append_metric(1, node as _, metrics_to_string(metric, format)?);
}
return Ok(TreeNodeRecursion::Stop);
// might have multiple merge scans, so continue
return Ok(TreeNodeRecursion::Continue);
}
Ok(TreeNodeRecursion::Continue)
})?;

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashSet;
use std::collections::{HashMap, HashSet};
use std::sync::Arc;
use common_telemetry::debug;
@@ -38,6 +38,13 @@ use crate::dist_plan::merge_scan::MergeScanLogicalPlan;
use crate::plan::ExtractExpr;
use crate::query_engine::DefaultSerializer;
#[cfg(test)]
mod test;
mod utils;
pub(crate) use utils::{AliasMapping, AliasTracker};
#[derive(Debug)]
pub struct DistPlannerAnalyzer;
@@ -154,8 +161,50 @@ struct PlanRewriter {
status: RewriterStatus,
/// Partition columns of the table in current pass
partition_cols: Option<Vec<String>>,
column_requirements: HashSet<Column>,
alias_tracker: Option<AliasTracker>,
/// Use the stack depth as a scope to determine whether a column requirement applies,
/// i.e. for a logical plan like:
/// ```ignore
/// 1: Projection: t.number
/// 2: Sort: t.pk1+t.pk2
/// 3: Projection: t.number, t.pk1, t.pk2
/// ```
/// `Sort` records a column requirement for `t.pk1` at level 2,
/// which means the `Projection` at level 1 needs to add a reference to `t.pk1` as well,
/// so that the expanded plan becomes
/// ```ignore
/// Projection: t.number
/// MergeSort: t.pk1
/// MergeScan: remote_input=
/// Projection: t.number, "t.pk1+t.pk2" <--- the original `Projection` at level 1 gains `t.pk1+t.pk2`
/// Sort: t.pk1+t.pk2
/// Projection: t.number, t.pk1, t.pk2
/// ```
/// and `MergeSort` can take `t.pk1` as input.
/// Meanwhile the `Projection` at level 3 doesn't need any new column, because 3 > 2
/// and the column requirements recorded at level 2 do not apply to level 3.
///
/// See the tests `expand_proj_step_aggr` and `expand_proj_sort_proj` for more details,
/// and the sketch after this struct for the level-scoping rule.
///
/// TODO(discord9): a simpler solution to track column requirements for merge scan
column_requirements: Vec<(HashSet<Column>, usize)>,
/// Whether to expand on the next call.
/// This is used to handle the case where a plan is transformed but needs to be expanded from its
/// parent node. For example, an Aggregate plan is split into two parts between frontend and datanode
/// and needs to be expanded from the parent node of the Aggregate plan.
expand_on_next_call: bool,
/// Expand on the next partial/conditional/transformed commutative plan.
/// This is used to handle the case where a plan is transformed but we still
/// need to push down as many nodes as possible before the next partial/conditional/transformed
/// commutative plan, e.g.
/// ```ignore
/// Limit:
/// Sort:
/// ```
/// where `Limit` is partial commutative and `Sort` is conditional commutative.
/// In this case, we need to expand the `Limit` plan
/// so that we can push down the `Sort` plan as much as possible.
expand_on_next_part_cond_trans_commutative: bool,
new_child_plan: Option<LogicalPlan>,
}
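
A tiny sketch of the level-scoping rule described in the comment above (illustrative only; the real code stores `HashSet<Column>` entries tagged with the level they were recorded at):

```rust
// A requirement recorded at `requirement_level` only applies to nodes closer to the
// root of the plan, i.e. nodes whose level is <= the recorded level.
fn applies(requirement_level: usize, node_level: usize) -> bool {
    node_level <= requirement_level
}

fn main() {
    // Requirement for `t.pk1` recorded by `Sort` at level 2:
    let requirement_level = 2;
    assert!(applies(requirement_level, 1)); // the Projection at level 1 must add t.pk1
    assert!(!applies(requirement_level, 3)); // the Projection at level 3 is untouched
}
```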
@@ -171,21 +220,57 @@ impl PlanRewriter {
/// Return true if should stop and expand. The input plan is the parent node of current node
fn should_expand(&mut self, plan: &LogicalPlan) -> bool {
debug!(
"Check should_expand at level: {} with Stack:\n{}, ",
self.level,
self.stack
.iter()
.map(|(p, l)| format!("{l}:{}{}", " ".repeat(l - 1), p.display()))
.collect::<Vec<String>>()
.join("\n"),
);
if DFLogicalSubstraitConvertor
.encode(plan, DefaultSerializer)
.is_err()
{
return true;
}
if self.expand_on_next_call {
self.expand_on_next_call = false;
return true;
}
match Categorizer::check_plan(plan, self.partition_cols.clone()) {
if self.expand_on_next_part_cond_trans_commutative {
let comm = Categorizer::check_plan(plan, self.get_aliased_partition_columns());
match comm {
Commutativity::PartialCommutative => {
// A small difference: for a partial commutative plan we still push it down
// (so `Limit` can be pushed down), but note that `Limit` also needs to be
// expanded to keep the query correct, i.e. `Limit fetch=10` needs to reach the leaf node.
self.expand_on_next_part_cond_trans_commutative = false;
self.expand_on_next_call = true;
}
Commutativity::ConditionalCommutative(_)
| Commutativity::TransformedCommutative { .. } => {
// again a new node that can be pushed down; just push it down now
// and avoid further expansion
self.expand_on_next_part_cond_trans_commutative = false;
return true;
}
_ => (),
}
}
match Categorizer::check_plan(plan, self.get_aliased_partition_columns()) {
Commutativity::Commutative => {}
Commutativity::PartialCommutative => {
if let Some(plan) = partial_commutative_transformer(plan) {
self.update_column_requirements(&plan);
// Note: this plan is the parent of the current node, so use `self.level - 1` when updating column requirements
self.update_column_requirements(&plan, self.level - 1);
self.expand_on_next_part_cond_trans_commutative = true;
self.stage.push(plan)
}
}
@@ -193,7 +278,9 @@ impl PlanRewriter {
if let Some(transformer) = transformer
&& let Some(plan) = transformer(plan)
{
self.update_column_requirements(&plan);
// Note: this plan is the parent of the current node, so use `self.level - 1` when updating column requirements
self.update_column_requirements(&plan, self.level - 1);
self.expand_on_next_part_cond_trans_commutative = true;
self.stage.push(plan)
}
}
@@ -202,12 +289,22 @@ impl PlanRewriter {
&& let Some(transformer_actions) = transformer(plan)
{
debug!(
"PlanRewriter: transformed plan: {:#?}\n from {plan}",
transformer_actions.extra_parent_plans
"PlanRewriter: transformed plan: {}\n from {plan}",
transformer_actions
.extra_parent_plans
.iter()
.enumerate()
.map(|(i, p)| format!(
"Extra {i}-th parent plan from parent to child = {}",
p.display()
))
.collect::<Vec<_>>()
.join("\n")
);
if let Some(last_stage) = transformer_actions.extra_parent_plans.last() {
// update the column requirements from the last stage
self.update_column_requirements(last_stage);
// Note: the current plan's parent is where we need to apply the column requirements
self.update_column_requirements(last_stage, self.level - 1);
}
self.stage
.extend(transformer_actions.extra_parent_plans.into_iter().rev());
@@ -225,16 +322,25 @@ impl PlanRewriter {
false
}
fn update_column_requirements(&mut self, plan: &LogicalPlan) {
/// Updates the column requirements for the current plan. `plan_level` is the level of the plan
/// in the stack and is used to determine whether the column requirements are applicable
/// to other plans in the stack.
fn update_column_requirements(&mut self, plan: &LogicalPlan, plan_level: usize) {
debug!(
"PlanRewriter: update column requirements for plan: {plan}\n with old column_requirements: {:?}",
self.column_requirements
);
let mut container = HashSet::new();
for expr in plan.expressions() {
// this method won't fail
let _ = expr_to_columns(&expr, &mut container);
}
for col in container {
self.column_requirements.insert(col);
}
self.column_requirements.push((container, plan_level));
debug!(
"PlanRewriter: updated column requirements: {:?}",
self.column_requirements
);
}
fn is_expanded(&self) -> bool {
@@ -249,6 +355,45 @@ impl PlanRewriter {
self.status = RewriterStatus::Unexpanded;
}
/// Maybe update alias for original table columns in the plan
fn maybe_update_alias(&mut self, node: &LogicalPlan) {
if let Some(alias_tracker) = &mut self.alias_tracker {
alias_tracker.update_alias(node);
debug!(
"Current partition columns are: {:?}",
self.get_aliased_partition_columns()
);
} else if let LogicalPlan::TableScan(table_scan) = node {
self.alias_tracker = AliasTracker::new(table_scan);
debug!(
"Initialize partition columns: {:?} with table={}",
self.get_aliased_partition_columns(),
table_scan.table_name
);
}
}
fn get_aliased_partition_columns(&self) -> Option<AliasMapping> {
if let Some(part_cols) = self.partition_cols.as_ref() {
let Some(alias_tracker) = &self.alias_tracker else {
// no alias tracker means no table scan has been encountered yet
return None;
};
let mut aliased = HashMap::new();
for part_col in part_cols {
let all_alias = alias_tracker
.get_all_alias_for_col(part_col)
.cloned()
.unwrap_or_default();
aliased.insert(part_col.clone(), all_alias);
}
Some(aliased)
} else {
None
}
}
fn maybe_set_partitions(&mut self, plan: &LogicalPlan) {
if self.partition_cols.is_some() {
// only need to set once
@@ -294,10 +439,15 @@ impl PlanRewriter {
}
// store schema before expand
let schema = on_node.schema().clone();
let mut rewriter = EnforceDistRequirementRewriter {
column_requirements: std::mem::take(&mut self.column_requirements),
};
let mut rewriter = EnforceDistRequirementRewriter::new(
std::mem::take(&mut self.column_requirements),
self.level,
);
debug!("PlanRewriter: enforce column requirements for node: {on_node} with rewriter: {rewriter:?}");
on_node = on_node.rewrite(&mut rewriter)?.data;
debug!(
"PlanRewriter: after enforced column requirements for node: {on_node} with rewriter: {rewriter:?}"
);
// add merge scan as the new root
let mut node = MergeScanLogicalPlan::new(
@@ -316,7 +466,8 @@ impl PlanRewriter {
}
self.set_expanded();
// recover the schema
// recover the schema; this makes sure that after expansion the schema is the same as the old node's,
// because after expansion the raw top node might have extra columns, i.e. sorting columns for a `Sort` node
let node = LogicalPlanBuilder::from(node)
.project(schema.iter().map(|(qualifier, field)| {
Expr::Column(Column::new(qualifier.cloned(), field.name()))
@@ -333,42 +484,96 @@ impl PlanRewriter {
/// Requirements enforced by this rewriter:
/// - Enforce column requirements for `LogicalPlan::Projection` nodes. Makes sure the
/// required columns are available in the sub plan.
///
#[derive(Debug)]
struct EnforceDistRequirementRewriter {
column_requirements: HashSet<Column>,
/// only enforce column requirements after the expanding node in question,
/// meaning only nodes with `cur_level` <= `level` will consider adding those column requirements
/// TODO(discord9): a simpler solution to track column requirements for merge scan
column_requirements: Vec<(HashSet<Column>, usize)>,
/// only apply column requirements recorded at level >= `cur_level`
/// this is used to avoid applying column requirements that are not needed
/// for the current node, i.e. when the node is not in the scope of those column requirements.
/// e.g., for this plan:
/// ```ignore
/// Aggregate: min(t.number)
///   Projection: t.number
/// ```
/// when on the `Projection` node, we don't need to apply the column requirements of the `Aggregate` node
/// because the `Projection` node is not in the scope of the `Aggregate` node
cur_level: usize,
}
impl EnforceDistRequirementRewriter {
fn new(column_requirements: Vec<(HashSet<Column>, usize)>, cur_level: usize) -> Self {
Self {
column_requirements,
cur_level,
}
}
}
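For intuition, a minimal standalone sketch (hypothetical helper, plain `String`s instead of datafusion `Column`s) of the level filter applied in `f_up` below: requirements recorded at some `level` only reach nodes whose `cur_level` is not deeper than that level.
use std::collections::HashSet;
fn applicable(requirements: &[(HashSet<String>, usize)], cur_level: usize) -> HashSet<String> {
    requirements
        .iter()
        .filter(|(_, level)| *level >= cur_level)
        .flat_map(|(cols, _)| cols.iter().cloned())
        .collect()
}
fn main() {
    // a requirement from an Aggregate recorded at level 1 and one from a Sort at level 0
    let reqs = vec![
        (HashSet::from(["t.number".to_string()]), 1),
        (HashSet::from(["t.ts".to_string()]), 0),
    ];
    assert!(applicable(&reqs, 2).is_empty()); // deeper child: nothing applies
    assert_eq!(applicable(&reqs, 1), HashSet::from(["t.number".to_string()]));
    assert_eq!(applicable(&reqs, 0).len(), 2); // at the top both apply
}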
impl TreeNodeRewriter for EnforceDistRequirementRewriter {
type Node = LogicalPlan;
fn f_down(&mut self, node: Self::Node) -> DfResult<Transformed<Self::Node>> {
if let LogicalPlan::Projection(ref projection) = node {
let mut column_requirements = std::mem::take(&mut self.column_requirements);
if column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
for expr in &projection.expr {
let (qualifier, name) = expr.qualified_name();
let column = Column::new(qualifier, name);
column_requirements.remove(&column);
}
if column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
let mut new_exprs = projection.expr.clone();
for col in &column_requirements {
new_exprs.push(Expr::Column(col.clone()));
}
let new_node =
node.with_new_exprs(new_exprs, node.inputs().into_iter().cloned().collect())?;
return Ok(Transformed::yes(new_node));
// check that the node doesn't have multiple children, e.g. join/subquery
if node.inputs().len() > 1 {
return Err(datafusion_common::DataFusionError::Internal(
"EnforceDistRequirementRewriter: node with multiple inputs is not supported"
.to_string(),
));
}
self.cur_level += 1;
Ok(Transformed::no(node))
}
fn f_up(&mut self, node: Self::Node) -> DfResult<Transformed<Self::Node>> {
self.cur_level -= 1;
// first get all applicable column requirements
let mut applicable_column_requirements = self
.column_requirements
.iter()
.filter(|(_, level)| *level >= self.cur_level)
.map(|(cols, _)| cols.clone())
.reduce(|mut acc, cols| {
acc.extend(cols);
acc
})
.unwrap_or_default();
debug!(
"EnforceDistRequirementRewriter: applicable column requirements at level {} = {:?} for node {}",
self.cur_level,
applicable_column_requirements,
node.display()
);
// make sure every projection in the applicable scope has the required columns
if let LogicalPlan::Projection(ref projection) = node {
for expr in &projection.expr {
let (qualifier, name) = expr.qualified_name();
let column = Column::new(qualifier, name);
applicable_column_requirements.remove(&column);
}
if applicable_column_requirements.is_empty() {
return Ok(Transformed::no(node));
}
let mut new_exprs = projection.expr.clone();
for col in &applicable_column_requirements {
new_exprs.push(Expr::Column(col.clone()));
}
let new_node =
node.with_new_exprs(new_exprs, node.inputs().into_iter().cloned().collect())?;
debug!(
"EnforceDistRequirementRewriter: added missing columns {:?} to projection node from old node: \n{node}\n Making new node: \n{new_node}",
applicable_column_requirements
);
// still need to continue for next projection if applicable
return Ok(Transformed::yes(new_node));
}
Ok(Transformed::no(node))
}
}
@@ -384,6 +589,7 @@ impl TreeNodeRewriter for PlanRewriter {
self.stage.clear();
self.set_unexpanded();
self.partition_cols = None;
self.alias_tracker = None;
Ok(Transformed::no(node))
}
@@ -406,8 +612,19 @@ impl TreeNodeRewriter for PlanRewriter {
self.maybe_set_partitions(&node);
self.maybe_update_alias(&node);
let Some(parent) = self.get_parent() else {
let node = self.expand(node)?;
debug!("Plan Rewriter: expand now for no parent found for node: {node}");
let node = self.expand(node);
debug!(
"PlanRewriter: expanded plan: {}",
match &node {
Ok(n) => n.to_string(),
Err(e) => format!("Error expanding plan: {e}"),
}
);
let node = node?;
self.pop_stack();
return Ok(Transformed::yes(node));
};
@@ -435,160 +652,3 @@ impl TreeNodeRewriter for PlanRewriter {
Ok(Transformed::no(node))
}
}
#[cfg(test)]
mod test {
use std::sync::Arc;
use datafusion::datasource::DefaultTableSource;
use datafusion::functions_aggregate::expr_fn::avg;
use datafusion_common::JoinType;
use datafusion_expr::{col, lit, Expr, LogicalPlanBuilder};
use table::table::adapter::DfTableProviderAdapter;
use table::table::numbers::NumbersTable;
use super::*;
#[ignore = "Projection is disabled for https://github.com/apache/arrow-datafusion/issues/6489"]
#[test]
fn transform_simple_projection_filter() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.filter(col("number").lt(lit(10)))
.unwrap()
.project(vec![col("number")])
.unwrap()
.distinct()
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = [
"Distinct:",
" MergeScan [is_placeholder=false]",
" Distinct:",
" Projection: t.number",
" Filter: t.number < Int32(10)",
" TableScan: t",
]
.join("\n");
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_aggregator() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.aggregate(Vec::<Expr>::new(), vec![avg(col("number"))])
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = "Projection: avg(t.number)\
\n MergeScan [is_placeholder=false]";
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_distinct_order() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.distinct()
.unwrap()
.sort(vec![col("number").sort(true, false)])
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = ["Projection: t.number", " MergeScan [is_placeholder=false]"].join("\n");
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_single_limit() {
let numbers_table = NumbersTable::table(0);
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(numbers_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.limit(0, Some(1))
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = "Projection: t.number\
\n MergeScan [is_placeholder=false]";
assert_eq!(expected, result.to_string());
}
#[test]
fn transform_unalighed_join_with_alias() {
let left = NumbersTable::table(0);
let right = NumbersTable::table(1);
let left_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(left),
)));
let right_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(right),
)));
let right_plan = LogicalPlanBuilder::scan_with_filters("t", right_source, None, vec![])
.unwrap()
.alias("right")
.unwrap()
.build()
.unwrap();
let plan = LogicalPlanBuilder::scan_with_filters("t", left_source, None, vec![])
.unwrap()
.join_on(
right_plan,
JoinType::LeftSemi,
vec![col("t.number").eq(col("right.number"))],
)
.unwrap()
.limit(0, Some(1))
.unwrap()
.build()
.unwrap();
let config = ConfigOptions::default();
let result = DistPlannerAnalyzer {}.analyze(plan, &config).unwrap();
let expected = [
"Limit: skip=0, fetch=1",
" LeftSemi Join: Filter: t.number = right.number",
" Projection: t.number",
" MergeScan [is_placeholder=false]",
" SubqueryAlias: right",
" Projection: t.number",
" MergeScan [is_placeholder=false]",
]
.join("\n");
assert_eq!(expected, result.to_string());
}
}

File diff suppressed because it is too large


@@ -0,0 +1,318 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::{HashMap, HashSet};
use datafusion::datasource::DefaultTableSource;
use datafusion_common::Column;
use datafusion_expr::{Expr, LogicalPlan, TableScan};
use table::metadata::TableType;
use table::table::adapter::DfTableProviderAdapter;
/// Mapping from an original table column to all of its aliases at the current node
pub type AliasMapping = HashMap<String, HashSet<Column>>;
/// Tracks aliases for the source table columns in the plan
#[derive(Debug, Clone)]
pub struct AliasTracker {
/// mapping from the original column name to the aliases used in the plan;
/// notice that one column might have multiple aliases in the plan
pub mapping: AliasMapping,
}
impl AliasTracker {
pub fn new(table_scan: &TableScan) -> Option<Self> {
if let Some(source) = table_scan
.source
.as_any()
.downcast_ref::<DefaultTableSource>()
{
if let Some(provider) = source
.table_provider
.as_any()
.downcast_ref::<DfTableProviderAdapter>()
{
if provider.table().table_type() == TableType::Base {
let info = provider.table().table_info();
let schema = info.meta.schema.clone();
let col_schema = schema.column_schemas();
let mapping = col_schema
.iter()
.map(|col| {
(
col.name.clone(),
HashSet::from_iter(std::iter::once(Column::new_unqualified(
col.name.clone(),
))),
)
})
.collect();
return Some(Self { mapping });
}
}
}
None
}
/// update aliases for the original columns
///
/// only handles `Alias` expressions wrapping a column in a `Projection` node
pub fn update_alias(&mut self, node: &LogicalPlan) {
if let LogicalPlan::Projection(projection) = node {
// first collect all the alias mappings, i.e. `col_a AS b AS c AS d` becomes `a -> d`
// notice one column might have multiple aliases
let mut alias_mapping: AliasMapping = HashMap::new();
for expr in &projection.expr {
if let Expr::Alias(alias) = expr {
let outer_alias = alias.clone();
let mut cur_alias = alias.clone();
while let Expr::Alias(alias) = *cur_alias.expr {
cur_alias = alias;
}
if let Expr::Column(column) = *cur_alias.expr {
alias_mapping
.entry(column.name.clone())
.or_default()
.insert(Column::new(outer_alias.relation, outer_alias.name));
}
} else if let Expr::Column(column) = expr {
// identity mapping
alias_mapping
.entry(column.name.clone())
.or_default()
.insert(column.clone());
}
}
// update mapping using `alias_mapping`
let mut new_mapping = HashMap::new();
for (table_col_name, cur_columns) in std::mem::take(&mut self.mapping) {
let new_aliases = {
let mut new_aliases = HashSet::new();
for cur_column in &cur_columns {
let new_alias_for_cur_column = alias_mapping
.get(cur_column.name())
.cloned()
.unwrap_or_default();
for new_alias in new_alias_for_cur_column {
let is_table_ref_eq = match (&new_alias.relation, &cur_column.relation)
{
(Some(o), Some(c)) => o.resolved_eq(c),
_ => true,
};
// it is the same column if both the name and the table ref are equal
if is_table_ref_eq {
new_aliases.insert(new_alias.clone());
}
}
}
new_aliases
};
new_mapping.insert(table_col_name, new_aliases);
}
self.mapping = new_mapping;
common_telemetry::debug!(
"Updating alias tracker to {:?} using node: \n{node}",
self.mapping
);
}
}
pub fn get_all_alias_for_col(&self, col_name: &str) -> Option<&HashSet<Column>> {
self.mapping.get(col_name)
}
#[allow(unused)]
pub fn is_alias_for(&self, original_col: &str, cur_col: &Column) -> bool {
self.mapping
.get(original_col)
.map(|cols| cols.contains(cur_col))
.unwrap_or(false)
}
}
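For intuition, a minimal standalone sketch (toy `Expr` enum, not the datafusion one) of the alias-chain unwrapping `update_alias` performs: `col_a AS b AS c AS d` should be recorded as `col_a -> d`, keeping only the outermost alias.
enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}
// walk through nested aliases until the underlying column is reached
fn innermost_column(expr: &Expr) -> Option<&str> {
    let mut cur = expr;
    while let Expr::Alias(inner, _) = cur {
        cur = inner.as_ref();
    }
    match cur {
        Expr::Column(name) => Some(name.as_str()),
        _ => None,
    }
}
fn main() {
    // col_a AS b AS c AS d
    let e = Expr::Alias(
        Box::new(Expr::Alias(
            Box::new(Expr::Alias(Box::new(Expr::Column("col_a".into())), "b".into())),
            "c".into(),
        )),
        "d".into(),
    );
    // the tracker maps the innermost column to the outermost alias name, i.e. col_a -> d
    assert_eq!(innermost_column(&e), Some("col_a"));
}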
#[cfg(test)]
mod tests {
use std::sync::Arc;
use common_telemetry::init_default_ut_logging;
use datafusion::error::Result as DfResult;
use datafusion_common::tree_node::{TreeNode, TreeNodeRecursion, TreeNodeVisitor};
use datafusion_expr::{col, LogicalPlanBuilder};
use super::*;
use crate::dist_plan::analyzer::test::TestTable;
#[derive(Debug)]
struct TrackerTester {
alias_tracker: Option<AliasTracker>,
mapping_at_each_level: Vec<AliasMapping>,
}
impl TreeNodeVisitor<'_> for TrackerTester {
type Node = LogicalPlan;
fn f_up(&mut self, node: &LogicalPlan) -> DfResult<TreeNodeRecursion> {
if let Some(alias_tracker) = &mut self.alias_tracker {
alias_tracker.update_alias(node);
self.mapping_at_each_level.push(
self.alias_tracker
.as_ref()
.map(|a| a.mapping.clone())
.unwrap_or_default()
.clone(),
);
} else if let LogicalPlan::TableScan(table_scan) = node {
self.alias_tracker = AliasTracker::new(table_scan);
self.mapping_at_each_level.push(
self.alias_tracker
.as_ref()
.map(|a| a.mapping.clone())
.unwrap_or_default()
.clone(),
);
}
Ok(TreeNodeRecursion::Continue)
}
}
#[test]
fn proj_alias_tracker() {
// use logging for better debugging
init_default_ut_logging();
let test_table = TestTable::table_with_name(0, "numbers".to_string());
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(test_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.project(vec![
col("number"),
col("pk3").alias("pk1"),
col("pk2").alias("pk3"),
])
.unwrap()
.project(vec![
col("number"),
col("pk1").alias("pk2"),
col("pk3").alias("pk1"),
])
.unwrap()
.build()
.unwrap();
let mut tracker_tester = TrackerTester {
alias_tracker: None,
mapping_at_each_level: Vec::new(),
};
plan.visit(&mut tracker_tester).unwrap();
assert_eq!(
tracker_tester.mapping_at_each_level,
vec![
HashMap::from([
("number".to_string(), HashSet::from(["number".into()])),
("pk1".to_string(), HashSet::from(["pk1".into()])),
("pk2".to_string(), HashSet::from(["pk2".into()])),
("pk3".to_string(), HashSet::from(["pk3".into()])),
("ts".to_string(), HashSet::from(["ts".into()]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from(["pk3".into()])),
("pk3".to_string(), HashSet::from(["pk1".into()])),
("ts".to_string(), HashSet::from([]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from(["pk1".into()])),
("pk3".to_string(), HashSet::from(["pk2".into()])),
("ts".to_string(), HashSet::from([]))
])
]
);
}
#[test]
fn proj_multi_alias_tracker() {
// use logging for better debugging
init_default_ut_logging();
let test_table = TestTable::table_with_name(0, "numbers".to_string());
let table_source = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(test_table),
)));
let plan = LogicalPlanBuilder::scan_with_filters("t", table_source, None, vec![])
.unwrap()
.project(vec![
col("number"),
col("pk3").alias("pk1"),
col("pk3").alias("pk2"),
])
.unwrap()
.project(vec![
col("number"),
col("pk2").alias("pk4"),
col("pk1").alias("pk5"),
])
.unwrap()
.build()
.unwrap();
let mut tracker_tester = TrackerTester {
alias_tracker: None,
mapping_at_each_level: Vec::new(),
};
plan.visit(&mut tracker_tester).unwrap();
assert_eq!(
tracker_tester.mapping_at_each_level,
vec![
HashMap::from([
("number".to_string(), HashSet::from(["number".into()])),
("pk1".to_string(), HashSet::from(["pk1".into()])),
("pk2".to_string(), HashSet::from(["pk2".into()])),
("pk3".to_string(), HashSet::from(["pk3".into()])),
("ts".to_string(), HashSet::from(["ts".into()]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from([])),
(
"pk3".to_string(),
HashSet::from(["pk1".into(), "pk2".into()])
),
("ts".to_string(), HashSet::from([]))
]),
HashMap::from([
("number".to_string(), HashSet::from(["t.number".into()])),
("pk1".to_string(), HashSet::from([])),
("pk2".to_string(), HashSet::from([])),
(
"pk3".to_string(),
HashSet::from(["pk4".into(), "pk5".into()])
),
("ts".to_string(), HashSet::from([]))
])
]
);
}
}


@@ -27,6 +27,7 @@ use promql::extension_plan::{
EmptyMetric, InstantManipulate, RangeManipulate, SeriesDivide, SeriesNormalize,
};
use crate::dist_plan::analyzer::AliasMapping;
use crate::dist_plan::merge_sort::{merge_sort_transformer, MergeSortLogicalPlan};
use crate::dist_plan::MergeScanLogicalPlan;
@@ -139,9 +140,7 @@ pub fn step_aggr_to_upper_aggr(
new_projection_exprs.push(aliased_output_aggr_expr);
}
let upper_aggr_plan = LogicalPlan::Aggregate(new_aggr);
debug!("Before recompute schema: {upper_aggr_plan:?}");
let upper_aggr_plan = upper_aggr_plan.recompute_schema()?;
debug!("After recompute schema: {upper_aggr_plan:?}");
// create a projection on top of the new aggregate plan
let new_projection =
Projection::try_new(new_projection_exprs, Arc::new(upper_aggr_plan.clone()))?;
@@ -222,7 +221,7 @@ pub enum Commutativity {
pub struct Categorizer {}
impl Categorizer {
pub fn check_plan(plan: &LogicalPlan, partition_cols: Option<Vec<String>>) -> Commutativity {
pub fn check_plan(plan: &LogicalPlan, partition_cols: Option<AliasMapping>) -> Commutativity {
let partition_cols = partition_cols.unwrap_or_default();
match plan {
@@ -247,7 +246,6 @@ impl Categorizer {
transformer: Some(Arc::new(|plan: &LogicalPlan| {
debug!("Before Step optimize: {plan}");
let ret = step_aggr_to_upper_aggr(plan);
debug!("After Step Optimize: {ret:?}");
ret.ok().map(|s| TransformerAction {
extra_parent_plans: s.to_vec(),
new_child_plan: None,
@@ -264,7 +262,11 @@ impl Categorizer {
return commutativity;
}
}
Commutativity::Commutative
// when all group-by expressions are partition columns we can push down, unless
// another push down (including `Limit` or `Sort`) is already in progress (which will then prevent the next conditionally commutative node from being pushed down).
// TODO(discord9): This is a temporary solution (that works); a better description of
// commutativity is needed for this situation.
Commutativity::ConditionalCommutative(None)
}
LogicalPlan::Sort(_) => {
if partition_cols.is_empty() {
@@ -322,17 +324,20 @@ impl Categorizer {
pub fn check_extension_plan(
plan: &dyn UserDefinedLogicalNode,
partition_cols: &[String],
partition_cols: &AliasMapping,
) -> Commutativity {
match plan.name() {
name if name == SeriesDivide::name() => {
let series_divide = plan.as_any().downcast_ref::<SeriesDivide>().unwrap();
let tags = series_divide.tags().iter().collect::<HashSet<_>>();
for partition_col in partition_cols {
if !tags.contains(partition_col) {
for all_alias in partition_cols.values() {
let all_alias = all_alias.iter().map(|c| &c.name).collect::<HashSet<_>>();
if tags.intersection(&all_alias).count() == 0 {
return Commutativity::NonCommutative;
}
}
Commutativity::Commutative
}
name if name == SeriesNormalize::name()
@@ -396,7 +401,7 @@ impl Categorizer {
/// Return true if the given exprs and partition cols satisfy the rule.
/// In this case the plan can be treated as fully commutative.
fn check_partition(exprs: &[Expr], partition_cols: &[String]) -> bool {
fn check_partition(exprs: &[Expr], partition_cols: &AliasMapping) -> bool {
let mut ref_cols = HashSet::new();
for expr in exprs {
expr.add_column_refs(&mut ref_cols);
@@ -405,8 +410,14 @@ impl Categorizer {
.into_iter()
.map(|c| c.name.clone())
.collect::<HashSet<_>>();
for col in partition_cols {
if !ref_cols.contains(col) {
for all_alias in partition_cols.values() {
let all_alias = all_alias
.iter()
.map(|c| c.name.clone())
.collect::<HashSet<_>>();
// check whether the intersection of the referenced columns with all aliases of the partition column
// is empty; if it is empty, not all partition columns show up in `exprs`
if ref_cols.intersection(&all_alias).count() == 0 {
return false;
}
}
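For intuition, a minimal standalone sketch (hypothetical column names) of the alias-aware intersection test above; the same idea is used for the `SeriesDivide` tag check earlier.
use std::collections::HashSet;
fn main() {
    // columns referenced by the exprs being checked
    let ref_cols: HashSet<&str> = HashSet::from(["pk1", "number"]);
    // all known aliases of one partition column (e.g. pk3 aliased to pk1)
    let part_col_aliases: HashSet<&str> = HashSet::from(["pk3", "pk1"]);
    // the partition column counts as referenced if any of its aliases shows up
    assert!(ref_cols.intersection(&part_col_aliases).count() > 0);
}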
@@ -424,7 +435,7 @@ pub type StageTransformer = Arc<dyn Fn(&LogicalPlan) -> Option<TransformerAction
pub struct TransformerAction {
/// list of plans that need to be applied to parent plans, in the order of parent to child.
/// i.e. if this returns `[Projection, Aggregate]`, then the parent plan should be transformed to
/// ```
/// ```ignore
/// Original Parent Plan:
/// Projection:
/// Aggregate:
@@ -453,7 +464,7 @@ mod test {
fetch: None,
});
assert!(matches!(
Categorizer::check_plan(&plan, Some(vec![])),
Categorizer::check_plan(&plan, Some(Default::default())),
Commutativity::Commutative
));
}


@@ -16,7 +16,7 @@ use std::any::Any;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use ahash::HashSet;
use ahash::{HashMap, HashSet};
use arrow_schema::{Schema as ArrowSchema, SchemaRef as ArrowSchemaRef, SortOptions};
use async_stream::stream;
use common_catalog::parse_catalog_and_schema_from_db_string;
@@ -88,7 +88,11 @@ impl UserDefinedLogicalNodeCore for MergeScanLogicalPlan {
}
fn fmt_for_explain(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "MergeScan [is_placeholder={}]", self.is_placeholder)
write!(
f,
"MergeScan [is_placeholder={}, remote_input=[\n{}\n]]",
self.is_placeholder, self.input
)
}
fn with_exprs_and_inputs(
@@ -143,7 +147,7 @@ pub struct MergeScanExec {
metric: ExecutionPlanMetricsSet,
properties: PlanProperties,
/// Metrics from sub stages
sub_stage_metrics: Arc<Mutex<Vec<RecordBatchMetrics>>>,
sub_stage_metrics: Arc<Mutex<HashMap<RegionId, RecordBatchMetrics>>>,
query_ctx: QueryContextRef,
target_partition: usize,
partition_cols: Vec<String>,
@@ -155,6 +159,7 @@ impl std::fmt::Debug for MergeScanExec {
.field("table", &self.table)
.field("regions", &self.regions)
.field("schema", &self.schema)
.field("plan", &self.plan)
.finish()
}
}
@@ -317,6 +322,12 @@ impl MergeScanExec {
if let Some(mut first_consume_timer) = first_consume_timer.take() {
first_consume_timer.stop();
}
if let Some(metrics) = stream.metrics() {
let mut sub_stage_metrics = sub_stage_metrics_moved.lock().unwrap();
sub_stage_metrics.insert(region_id, metrics);
}
yield Ok(batch);
// reset poll timer
poll_timer = Instant::now();
@@ -341,7 +352,8 @@ impl MergeScanExec {
metric.record_greptime_exec_cost(value as usize);
// record metrics from sub stages
sub_stage_metrics_moved.lock().unwrap().push(metrics);
let mut sub_stage_metrics = sub_stage_metrics_moved.lock().unwrap();
sub_stage_metrics.insert(region_id, metrics);
}
MERGE_SCAN_POLL_ELAPSED.observe(poll_duration.as_secs_f64());
@@ -409,7 +421,12 @@ impl MergeScanExec {
}
pub fn sub_stage_metrics(&self) -> Vec<RecordBatchMetrics> {
self.sub_stage_metrics.lock().unwrap().clone()
self.sub_stage_metrics
.lock()
.unwrap()
.values()
.cloned()
.collect()
}
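Presumably the switch from a `Vec` to a `HashMap` keyed by region id is so that per-batch partial metrics and the final metrics for the same region overwrite each other instead of piling up as duplicates; a minimal sketch of that behavior:
use std::collections::HashMap;
fn main() {
    let mut sub_stage_metrics: HashMap<u64, &str> = HashMap::new();
    sub_stage_metrics.insert(42, "partial metrics from a streamed batch");
    sub_stage_metrics.insert(42, "final metrics for the same region");
    // one entry per region, the latest report wins
    assert_eq!(sub_stage_metrics.len(), 1);
    assert_eq!(sub_stage_metrics[&42], "final metrics for the same region");
}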
pub fn partition_count(&self) -> usize {


@@ -181,6 +181,15 @@ fn fetch_partition_range(input: Arc<dyn ExecutionPlan>) -> DataFusionResult<Opti
is_batch_coalesced = true;
}
// only a very limited set of plans can exist between region scan and sort exec
// other plans might make this optimization wrong, so be safe here by limiting it
if !(plan.as_any().is::<ProjectionExec>()
|| plan.as_any().is::<FilterExec>()
|| plan.as_any().is::<CoalesceBatchesExec>())
{
partition_ranges = None;
}
// TODO(discord9): do this in the logical plan instead, as it's less buggy there
// Collects alias of the time index column.
if let Some(projection) = plan.as_any().downcast_ref::<ProjectionExec>() {
@@ -194,6 +203,14 @@ fn fetch_partition_range(input: Arc<dyn ExecutionPlan>) -> DataFusionResult<Opti
}
if let Some(region_scan_exec) = plan.as_any().downcast_ref::<RegionScanExec>() {
// `PerSeries` distribution is not supported in windowed sort.
if region_scan_exec.distribution()
== Some(store_api::storage::TimeSeriesDistribution::PerSeries)
{
partition_ranges = None;
return Ok(Transformed::no(plan));
}
partition_ranges = Some(region_scan_exec.get_uncollapsed_partition_ranges());
// Reset time index column.
time_index = HashSet::from([region_scan_exec.time_index()]);


@@ -96,9 +96,10 @@ impl PartSortExec {
if partition >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
partition,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
@@ -322,9 +323,10 @@ impl PartSortStream {
) -> datafusion_common::Result<()> {
if self.cur_part_idx >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
self.cur_part_idx,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
let cur_range = self.partition_ranges[self.cur_part_idx];
@@ -355,9 +357,10 @@ impl PartSortStream {
// check if the current partition index is out of range
if self.cur_part_idx >= self.partition_ranges.len() {
internal_err!(
"Partition index out of range: {} >= {}",
"Partition index out of range: {} >= {} at {}",
self.cur_part_idx,
self.partition_ranges.len()
self.partition_ranges.len(),
snafu::location!()
)?;
}
let cur_range = self.partition_ranges[self.cur_part_idx];


@@ -191,18 +191,38 @@ impl PromPlanner {
planner.prom_expr_to_plan(&stmt.expr, session_state).await
}
#[async_recursion]
pub async fn prom_expr_to_plan(
&mut self,
prom_expr: &PromExpr,
session_state: &SessionState,
) -> Result<LogicalPlan> {
self.prom_expr_to_plan_inner(prom_expr, false, session_state)
.await
}
/**
Converts a PromQL expression to a logical plan.
NOTE:
The `timestamp_fn` indicates whether the PromQL `timestamp()` function is being evaluated in the current context.
If `true`, the planner generates a logical plan that projects the timestamp (time index) column
as the value column for each input row, implementing the PromQL `timestamp()` function semantics.
If `false`, the planner generates the standard logical plan for the given PromQL expression.
*/
#[async_recursion]
async fn prom_expr_to_plan_inner(
&mut self,
prom_expr: &PromExpr,
timestamp_fn: bool,
session_state: &SessionState,
) -> Result<LogicalPlan> {
let res = match prom_expr {
PromExpr::Aggregate(expr) => self.prom_aggr_expr_to_plan(session_state, expr).await?,
PromExpr::Unary(expr) => self.prom_unary_expr_to_plan(session_state, expr).await?,
PromExpr::Binary(expr) => self.prom_binary_expr_to_plan(session_state, expr).await?,
PromExpr::Paren(ParenExpr { expr }) => {
self.prom_expr_to_plan(expr, session_state).await?
self.prom_expr_to_plan_inner(expr, timestamp_fn, session_state)
.await?
}
PromExpr::Subquery(expr) => {
self.prom_subquery_expr_to_plan(session_state, expr).await?
@@ -210,7 +230,8 @@ impl PromPlanner {
PromExpr::NumberLiteral(lit) => self.prom_number_lit_to_plan(lit)?,
PromExpr::StringLiteral(lit) => self.prom_string_lit_to_plan(lit)?,
PromExpr::VectorSelector(selector) => {
self.prom_vector_selector_to_plan(selector).await?
self.prom_vector_selector_to_plan(selector, timestamp_fn)
.await?
}
PromExpr::MatrixSelector(selector) => {
self.prom_matrix_selector_to_plan(selector).await?
@@ -673,6 +694,7 @@ impl PromPlanner {
async fn prom_vector_selector_to_plan(
&mut self,
vector_selector: &VectorSelector,
timestamp_fn: bool,
) -> Result<LogicalPlan> {
let VectorSelector {
name,
@@ -687,6 +709,15 @@ impl PromPlanner {
let normalize = self
.selector_to_series_normalize_plan(offset, matchers, false)
.await?;
let normalize = if timestamp_fn {
// If evaluating the PromQL `timestamp()` function, project the time index column as the value column
// before wrapping with [`InstantManipulate`], so the output matches PromQL's `timestamp()` semantics.
self.create_timestamp_func_plan(normalize)?
} else {
normalize
};
let manipulate = InstantManipulate::new(
self.ctx.start,
self.ctx.end,
@@ -704,6 +735,43 @@ impl PromPlanner {
}))
}
/// Builds a projection plan for the PromQL `timestamp()` function.
/// Projects the time index column as the value column for each row.
///
/// # Arguments
/// * `normalize` - Input [`LogicalPlan`] for the normalized series.
///
/// # Returns
/// Returns a [`Result<LogicalPlan>`] where the resulting logical plan projects the timestamp
/// column as the value column, along with the original tag and time index columns.
///
/// # Timestamp vs. Time Function
///
/// - **Timestamp Function (`timestamp()`)**: In PromQL, the `timestamp()` function returns the
/// timestamp (time index) of each sample as the value column.
///
/// - **Time Function (`time()`)**: The `time()` function returns the evaluation time of the query
/// as a scalar value.
///
/// # Side Effects
/// Updates the planner context's field columns to the timestamp column name.
///
fn create_timestamp_func_plan(&mut self, normalize: LogicalPlan) -> Result<LogicalPlan> {
let time_expr = build_special_time_expr(self.ctx.time_index_column.as_ref().unwrap())
.alias(DEFAULT_FIELD_COLUMN);
self.ctx.field_columns = vec![time_expr.schema_name().to_string()];
let mut project_exprs = Vec::with_capacity(self.ctx.tag_columns.len() + 2);
project_exprs.push(self.create_time_index_column_expr()?);
project_exprs.push(time_expr);
project_exprs.extend(self.create_tag_column_exprs()?);
LogicalPlanBuilder::from(normalize)
.project(project_exprs)
.context(DataFusionPlanningSnafu)?
.build()
.context(DataFusionPlanningSnafu)
}
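As a reminder of the semantics this projection implements (illustrative numbers, not GreptimeDB code): PromQL's `timestamp()` turns each sample's own timestamp, in seconds, into that sample's value, whereas `time()` is a scalar equal to the evaluation time.
fn main() {
    // samples as (timestamp in ms, value)
    let samples_ms: [(i64, f64); 2] = [(1_000, 5.0), (2_000, 7.0)];
    // timestamp(): the value becomes the sample's own timestamp in seconds
    let out: Vec<(i64, f64)> = samples_ms
        .iter()
        .map(|(ts_ms, _value)| (*ts_ms, *ts_ms as f64 / 1000.0))
        .collect();
    assert_eq!(out, vec![(1_000, 1.0), (2_000, 2.0)]);
}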
async fn prom_matrix_selector_to_plan(
&mut self,
matrix_selector: &MatrixSelector,
@@ -716,17 +784,19 @@ impl PromPlanner {
..
} = vs;
let matchers = self.preprocess_label_matchers(matchers, name)?;
if let Some(empty_plan) = self.setup_context().await? {
return Ok(empty_plan);
}
ensure!(!range.is_zero(), ZeroRangeSelectorSnafu);
let range_ms = range.as_millis() as _;
self.ctx.range = Some(range_ms);
let normalize = self
.selector_to_series_normalize_plan(offset, matchers, true)
.await?;
// Some functions like rate may require special fields in the RangeManipulate plan
// so we can't skip RangeManipulate.
let normalize = match self.setup_context().await? {
Some(empty_plan) => empty_plan,
None => {
self.selector_to_series_normalize_plan(offset, matchers, true)
.await?
}
};
let manipulate = RangeManipulate::new(
self.ctx.start,
self.ctx.end,
@@ -766,7 +836,8 @@ impl PromPlanner {
// transform function arguments
let args = self.create_function_args(&args.args)?;
let input = if let Some(prom_expr) = &args.input {
self.prom_expr_to_plan(prom_expr, session_state).await?
self.prom_expr_to_plan_inner(prom_expr, func.name == "timestamp", session_state)
.await?
} else {
self.ctx.time_index_column = Some(SPECIAL_TIME_FUNCTION.to_string());
self.ctx.reset_table_name_and_schema();
@@ -1652,7 +1723,7 @@ impl PromPlanner {
ScalarFunc::GeneratedExpr
}
"sort" | "sort_desc" | "sort_by_label" | "sort_by_label_desc" => {
"sort" | "sort_desc" | "sort_by_label" | "sort_by_label_desc" | "timestamp" => {
// These functions are not expressions but a part of the plan;
// they are processed by `prom_call_expr_to_plan`.
for value in &self.ctx.field_columns {
@@ -2263,10 +2334,14 @@ impl PromPlanner {
let input_plan = self.prom_expr_to_plan(&input, session_state).await?;
if !self.ctx.has_le_tag() {
return ColumnNotFoundSnafu {
col: LE_COLUMN_NAME.to_string(),
}
.fail();
// Return empty result instead of error when 'le' column is not found
// This handles the case when histogram metrics don't exist
return Ok(LogicalPlan::EmptyRelation(
datafusion::logical_expr::EmptyRelation {
produce_one_row: false,
schema: Arc::new(DFSchema::empty()),
},
));
}
let time_index_column =
self.ctx
@@ -4657,4 +4732,53 @@ Filter: up.field_0 IS NOT NULL [timestamp:Timestamp(Millisecond, None), field_0:
assert_eq!(plan.display_indent_schema().to_string(), expected);
}
#[tokio::test]
async fn test_histogram_quantile_missing_le_column() {
let mut eval_stmt = EvalStmt {
expr: PromExpr::NumberLiteral(NumberLiteral { val: 1.0 }),
start: UNIX_EPOCH,
end: UNIX_EPOCH
.checked_add(Duration::from_secs(100_000))
.unwrap(),
interval: Duration::from_secs(5),
lookback_delta: Duration::from_secs(1),
};
// Test case: histogram_quantile with a table that doesn't have 'le' column
let case = r#"histogram_quantile(0.99, sum by(pod,instance,le) (rate(non_existent_histogram_bucket{instance=~"xxx"}[1m])))"#;
let prom_expr = parser::parse(case).unwrap();
eval_stmt.expr = prom_expr;
// Create a table provider with a table that doesn't have 'le' column
let table_provider = build_test_table_provider_with_fields(
&[(
DEFAULT_SCHEMA_NAME.to_string(),
"non_existent_histogram_bucket".to_string(),
)],
&["pod", "instance"], // Note: no 'le' column
)
.await;
// Should return empty result instead of error
let result =
PromPlanner::stmt_to_plan(table_provider, &eval_stmt, &build_session_state()).await;
// This should succeed now (returning empty result) instead of failing with "Cannot find column le"
assert!(
result.is_ok(),
"Expected successful plan creation with empty result, but got error: {:?}",
result.err()
);
// Verify that the result is an EmptyRelation
let plan = result.unwrap();
match plan {
LogicalPlan::EmptyRelation(_) => {
// This is what we expect
}
_ => panic!("Expected EmptyRelation, but got: {:?}", plan),
}
}
}


@@ -36,6 +36,7 @@ use common_telemetry::tracing_context::{FutureExt, TracingContext};
use futures::{future, ready, Stream};
use futures_util::{StreamExt, TryStreamExt};
use prost::Message;
use session::context::{QueryContext, QueryContextRef};
use snafu::{ensure, ResultExt};
use table::table_name::TableName;
use tokio::sync::mpsc;
@@ -188,6 +189,7 @@ impl FlightCraft for GreptimeRequestHandler {
let ticket = request.into_inner().ticket;
let request =
GreptimeRequest::decode(ticket.as_ref()).context(error::InvalidFlightTicketSnafu)?;
let query_ctx = QueryContext::arc();
// The gRPC protocol passes queries via Flight. It needs to be wrapped in a span in order to record the stream
let span = info_span!(
@@ -202,6 +204,7 @@ impl FlightCraft for GreptimeRequestHandler {
output,
TracingContext::from_current_span(),
flight_compression,
query_ctx,
);
Ok(Response::new(stream))
}
@@ -371,15 +374,25 @@ fn to_flight_data_stream(
output: Output,
tracing_context: TracingContext,
flight_compression: FlightCompression,
query_ctx: QueryContextRef,
) -> TonicStream<FlightData> {
match output.data {
OutputData::Stream(stream) => {
let stream = FlightRecordBatchStream::new(stream, tracing_context, flight_compression);
let stream = FlightRecordBatchStream::new(
stream,
tracing_context,
flight_compression,
query_ctx,
);
Box::pin(stream) as _
}
OutputData::RecordBatches(x) => {
let stream =
FlightRecordBatchStream::new(x.as_stream(), tracing_context, flight_compression);
let stream = FlightRecordBatchStream::new(
x.as_stream(),
tracing_context,
flight_compression,
query_ctx,
);
Box::pin(stream) as _
}
OutputData::AffectedRows(rows) => {


@@ -25,6 +25,7 @@ use futures::channel::mpsc;
use futures::channel::mpsc::Sender;
use futures::{SinkExt, Stream, StreamExt};
use pin_project::{pin_project, pinned_drop};
use session::context::QueryContextRef;
use snafu::ResultExt;
use tokio::task::JoinHandle;
@@ -46,10 +47,12 @@ impl FlightRecordBatchStream {
recordbatches: SendableRecordBatchStream,
tracing_context: TracingContext,
compression: FlightCompression,
query_ctx: QueryContextRef,
) -> Self {
let should_send_partial_metrics = query_ctx.explain_verbose();
let (tx, rx) = mpsc::channel::<TonicResult<FlightMessage>>(1);
let join_handle = common_runtime::spawn_global(async move {
Self::flight_data_stream(recordbatches, tx)
Self::flight_data_stream(recordbatches, tx, should_send_partial_metrics)
.trace(tracing_context.attach(info_span!("flight_data_stream")))
.await
});
@@ -69,6 +72,7 @@ impl FlightRecordBatchStream {
async fn flight_data_stream(
mut recordbatches: SendableRecordBatchStream,
mut tx: Sender<TonicResult<FlightMessage>>,
should_send_partial_metrics: bool,
) {
let schema = recordbatches.schema().arrow_schema().clone();
if let Err(e) = tx.send(Ok(FlightMessage::Schema(schema))).await {
@@ -88,6 +92,17 @@ impl FlightRecordBatchStream {
warn!(e; "stop sending Flight data");
return;
}
if should_send_partial_metrics {
if let Some(metrics) = recordbatches
.metrics()
.and_then(|m| serde_json::to_string(&m).ok())
{
if let Err(e) = tx.send(Ok(FlightMessage::Metrics(metrics))).await {
warn!(e; "stop sending Flight data");
return;
}
}
}
}
Err(e) => {
let e = Err(e).context(error::CollectRecordbatchSnafu);
@@ -154,6 +169,7 @@ mod test {
use datatypes::schema::{ColumnSchema, Schema};
use datatypes::vectors::Int32Vector;
use futures::StreamExt;
use session::context::QueryContext;
use super::*;
@@ -175,6 +191,7 @@ mod test {
recordbatches,
TracingContext::default(),
FlightCompression::default(),
QueryContext::arc(),
);
let mut raw_data = Vec::with_capacity(2);


@@ -42,6 +42,7 @@ use session::hints::READ_PREFERENCE_HINT;
use snafu::{OptionExt, ResultExt};
use table::TableRef;
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;
use crate::error::Error::UnsupportedAuthScheme;
use crate::error::{
@@ -176,8 +177,9 @@ impl GreptimeRequestHandler {
let result = result
.map(|x| DoPutResponse::new(request_id, x))
.map_err(Into::into);
if result_sender.try_send(result).is_err() {
warn!(r#""DoPut" client maybe unreachable, abort handling its message"#);
if let Err(e) = result_sender.try_send(result)
&& let TrySendError::Closed(_) = e {
warn!(r#""DoPut" client with request_id {} maybe unreachable, abort handling its message"#, request_id);
break;
}
}
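For context on why only `TrySendError::Closed` aborts the loop: with tokio's bounded channel, `Full` is transient back-pressure while `Closed` means the receiving side is gone for good. A minimal standalone sketch:
use tokio::sync::mpsc;
use tokio::sync::mpsc::error::TrySendError;
fn main() {
    let (tx, rx) = mpsc::channel::<i32>(1);
    tx.try_send(1).unwrap();
    // channel is full: a transient condition, keep handling messages
    assert!(matches!(tx.try_send(2), Err(TrySendError::Full(_))));
    // receiver dropped: the client is gone, so the handler should break out
    drop(rx);
    assert!(matches!(tx.try_send(3), Err(TrySendError::Closed(_))));
}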


@@ -13,7 +13,7 @@
// limitations under the License.
//! prom supplies the Prometheus HTTP API server compliance
use std::collections::{HashMap, HashSet};
use std::collections::{BTreeMap, HashMap, HashSet};
use std::sync::Arc;
use axum::extract::{Path, Query, State};
@@ -62,7 +62,7 @@ use crate::prometheus_handler::PrometheusHandlerRef;
/// For [ValueType::Vector] result type
#[derive(Debug, Default, Serialize, Deserialize, PartialEq)]
pub struct PromSeriesVector {
pub metric: HashMap<String, String>,
pub metric: BTreeMap<String, String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub value: Option<(f64, String)>,
}
@@ -70,7 +70,7 @@ pub struct PromSeriesVector {
/// For [ValueType::Matrix] result type
#[derive(Debug, Default, Serialize, Deserialize, PartialEq)]
pub struct PromSeriesMatrix {
pub metric: HashMap<String, String>,
pub metric: BTreeMap<String, String>,
pub values: Vec<(f64, String)>,
}


@@ -13,7 +13,8 @@
// limitations under the License.
//! prom supplies the Prometheus HTTP API server compliance
use std::collections::HashMap;
use std::cmp::Ordering;
use std::collections::{BTreeMap, HashMap};
use axum::http::HeaderValue;
use axum::response::{IntoResponse, Response};
@@ -311,7 +312,7 @@ impl PrometheusJsonResponse {
let metric = tags
.into_iter()
.map(|(k, v)| (k.to_string(), v.to_string()))
.collect::<HashMap<_, _>>();
.collect::<BTreeMap<_, _>>();
match result {
PromQueryResult::Vector(ref mut v) => {
v.push(PromSeriesVector {
@@ -320,6 +321,11 @@ impl PrometheusJsonResponse {
});
}
PromQueryResult::Matrix(ref mut v) => {
// sort values by timestamp
if !values.is_sorted_by(|a, b| a.0 <= b.0) {
values.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));
}
v.push(PromSeriesMatrix { metric, values });
}
PromQueryResult::Scalar(ref mut v) => {
@@ -331,6 +337,12 @@ impl PrometheusJsonResponse {
}
});
// sort matrix by metric
// see: https://prometheus.io/docs/prometheus/3.5/querying/api/#range-vectors
if let PromQueryResult::Matrix(ref mut v) = result {
v.sort_by(|a, b| a.metric.cmp(&b.metric));
}
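Why `BTreeMap` instead of `HashMap` for the label sets: iteration order becomes deterministic (sorted by key), so the serialized `metric` objects and the matrix sort above are stable across runs. A minimal sketch:
use std::collections::BTreeMap;
fn main() {
    let metric: BTreeMap<&str, &str> = BTreeMap::from([("job", "node"), ("instance", "host-1")]);
    let keys: Vec<_> = metric.keys().copied().collect();
    // always sorted by key, unlike HashMap's arbitrary iteration order
    assert_eq!(keys, vec!["instance", "job"]);
}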
let result_type_string = result_type.to_string();
let data = PrometheusResponse::PromData(PromData {
result_type: result_type_string,


@@ -170,7 +170,7 @@ fn select_variable(query: &str, query_context: QueryContextRef) -> Option<Output
// skip the first "select"
for var in vars.iter().skip(1) {
let var = var.trim_matches(|c| c == ' ' || c == ',');
let var = var.trim_matches(|c| c == ' ' || c == ',' || c == ';');
let var_as: Vec<&str> = var
.split(" as ")
.map(|x| {
@@ -185,6 +185,9 @@ fn select_variable(query: &str, query_context: QueryContextRef) -> Option<Output
let value = match var_as[0] {
"session.time_zone" | "time_zone" => query_context.timezone().to_string(),
"system_time_zone" => system_timezone_name(),
"max_execution_time" | "session.max_execution_time" => {
query_context.query_timeout_as_millis().to_string()
}
_ => VAR_VALUES
.get(var_as[0])
.map(|v| v.to_string())
@@ -352,11 +355,11 @@ mod test {
// complex variables
let query = "/* mysql-connector-java-8.0.17 (Revision: 16a712ddb3f826a1933ab42b0039f7fb9eebc6ec) */SELECT @@session.auto_increment_increment AS auto_increment_increment, @@character_set_client AS character_set_client, @@character_set_connection AS character_set_connection, @@character_set_results AS character_set_results, @@character_set_server AS character_set_server, @@collation_server AS collation_server, @@collation_connection AS collation_connection, @@init_connect AS init_connect, @@interactive_timeout AS interactive_timeout, @@license AS license, @@lower_case_table_names AS lower_case_table_names, @@max_allowed_packet AS max_allowed_packet, @@net_write_timeout AS net_write_timeout, @@performance_schema AS performance_schema, @@sql_mode AS sql_mode, @@system_time_zone AS system_time_zone, @@time_zone AS time_zone, @@transaction_isolation AS transaction_isolation, @@wait_timeout AS wait_timeout;";
let expected = "\
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+
| auto_increment_increment | character_set_client | character_set_connection | character_set_results | character_set_server | collation_server | collation_connection | init_connect | interactive_timeout | license | lower_case_table_names | max_allowed_packet | net_write_timeout | performance_schema | sql_mode | system_time_zone | time_zone | transaction_isolation | wait_timeout; |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31536000 | 0 | 0 | 134217728 | 31536000 | 0 | 0 | Asia/Shanghai | Asia/Shanghai | REPEATABLE-READ | 31536000 |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+---------------+";
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+
| auto_increment_increment | character_set_client | character_set_connection | character_set_results | character_set_server | collation_server | collation_connection | init_connect | interactive_timeout | license | lower_case_table_names | max_allowed_packet | net_write_timeout | performance_schema | sql_mode | system_time_zone | time_zone | transaction_isolation | wait_timeout |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 31536000 | 0 | 0 | 134217728 | 31536000 | 0 | 0 | Asia/Shanghai | Asia/Shanghai | REPEATABLE-READ | 31536000 |
+--------------------------+----------------------+--------------------------+-----------------------+----------------------+------------------+----------------------+--------------+---------------------+---------+------------------------+--------------------+-------------------+--------------------+----------+------------------+---------------+-----------------------+--------------+";
test(query, expected);
let query = "show variables";


@@ -167,6 +167,9 @@ async fn run_custom_pipeline(
PipelineExecOutput::DispatchedTo(dispatched_to, val) => {
push_to_map!(dispatched, dispatched_to, val, arr_len);
}
PipelineExecOutput::Filtered => {
continue;
}
}
}


@@ -49,7 +49,7 @@ pub(crate) struct GreptimeDBStartupParameters {
impl GreptimeDBStartupParameters {
fn new() -> GreptimeDBStartupParameters {
GreptimeDBStartupParameters {
version: format!("16.3-greptimedb-{}", env!("CARGO_PKG_VERSION")),
version: format!("16.3-greptimedb-{}", common_version::version()),
}
}
}


@@ -412,6 +412,10 @@ impl PromSeriesProcessor {
let one_sample = series.samples.len() == 1;
for s in series.samples.iter() {
// skip NaN value
if s.value.is_nan() {
continue;
}
let timestamp = s.timestamp;
pipeline_map.insert(GREPTIME_TIMESTAMP.to_string(), Value::Int64(timestamp));
pipeline_map.insert(GREPTIME_VALUE.to_string(), Value::Float64(s.value));


@@ -95,6 +95,18 @@ pub enum Error {
location: Location,
},
#[snafu(display(
"Not allowed to remove partition column {} from table {}",
column_name,
table_name
))]
RemovePartitionColumn {
column_name: String,
table_name: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display(
"Failed to build column descriptor for table: {}, column: {}",
table_name,
@@ -193,6 +205,7 @@ impl ErrorExt for Error {
StatusCode::EngineExecuteQuery
}
Error::RemoveColumnInIndex { .. }
| Error::RemovePartitionColumn { .. }
| Error::BuildColumnDescriptor { .. }
| Error::InvalidAlterRequest { .. } => StatusCode::InvalidArguments,
Error::CastDefaultValue { source, .. } => source.status_code(),


@@ -645,10 +645,19 @@ impl TableMeta {
msg: format!("Table {table_name} cannot add new columns {column_names:?}"),
})?;
let partition_key_indices = self
.partition_key_indices
.iter()
.map(|idx| table_schema.column_name_by_index(*idx))
// This unwrap is safe since we only add new columns.
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
// value_indices would be generated automatically.
let _ = meta_builder
.schema(Arc::new(new_schema))
.primary_key_indices(primary_key_indices);
.primary_key_indices(primary_key_indices)
.partition_key_indices(partition_key_indices);
Ok(meta_builder)
}
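The remapping above goes old index -> column name -> new index, so partition columns keep pointing at the right schema slots after columns are added or removed. A minimal standalone sketch (hypothetical helper, column names borrowed from the alter test further down):
fn remap_indices(old_names: &[&str], old_indices: &[usize], new_names: &[&str]) -> Vec<usize> {
    old_indices
        .iter()
        .map(|&i| old_names[i])
        .map(|name| {
            new_names
                .iter()
                .position(|n| *n == name)
                .expect("partition column must still exist")
        })
        .collect()
}
fn main() {
    // old schema: [col1, ts, col2]; partition columns are col1 (0) and col2 (2)
    let old = ["col1", "ts", "col2"];
    // new schema after adding columns, as in the alter test below
    let new = [
        "my_tag_first",
        "col1",
        "ts",
        "yet_another_field_after_ts",
        "my_field_after_ts",
        "col2",
    ];
    assert_eq!(remap_indices(&old, &[0, 2], &new), vec![1, 5]);
}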
@@ -676,6 +685,14 @@ impl TableMeta {
}
);
ensure!(
!self.partition_key_indices.contains(&index),
error::RemovePartitionColumnSnafu {
column_name: *column_name,
table_name,
}
);
if let Some(ts_index) = timestamp_index {
// Not allowed to remove column in timestamp index.
ensure!(
@@ -725,9 +742,18 @@ impl TableMeta {
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
let partition_key_indices = self
.partition_key_indices
.iter()
.map(|idx| table_schema.column_name_by_index(*idx))
// This unwrap is safe since we don't allow removing a partition key column.
.map(|name| new_schema.column_index_by_name(name).unwrap())
.collect();
let _ = meta_builder
.schema(Arc::new(new_schema))
.primary_key_indices(primary_key_indices);
.primary_key_indices(primary_key_indices)
.partition_key_indices(partition_key_indices);
Ok(meta_builder)
}
@@ -1300,6 +1326,8 @@ fn unset_column_skipping_index_options(
#[cfg(test)]
mod tests {
use std::assert_matches::assert_matches;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use datatypes::data_type::ConcreteDataType;
@@ -1308,6 +1336,7 @@ mod tests {
};
use super::*;
use crate::Error;
/// Create a test schema with 3 columns: `[col1 int32, ts timestamp(ms), col2 int32]`.
fn new_test_schema() -> Schema {
@@ -1385,6 +1414,11 @@ mod tests {
ConcreteDataType::string_datatype(),
true,
);
let yet_another_field = ColumnSchema::new(
"yet_another_field_after_ts",
ConcreteDataType::int64_datatype(),
true,
);
let alter_kind = AlterKind::AddColumns {
columns: vec![
AddColumnRequest {
@@ -1401,6 +1435,14 @@ mod tests {
}),
add_if_not_exists: false,
},
AddColumnRequest {
column_schema: yet_another_field,
is_key: true,
location: Some(AddColumnLocation::After {
column_name: "ts".to_string(),
}),
add_if_not_exists: false,
},
],
};
@@ -1756,6 +1798,29 @@ mod tests {
assert_eq!(StatusCode::InvalidArguments, err.status_code());
}
#[test]
fn test_remove_partition_column() {
let schema = Arc::new(new_test_schema());
let meta = TableMetaBuilder::empty()
.schema(schema)
.primary_key_indices(vec![])
.partition_key_indices(vec![0])
.engine("engine")
.next_column_id(3)
.build()
.unwrap();
// Remove a partition key column.
let alter_kind = AlterKind::DropColumns {
names: vec![String::from("col1")],
};
let err = meta
.builder_with_alter_kind("my_table", &alter_kind)
.err()
.unwrap();
assert_matches!(err, Error::RemovePartitionColumn { .. });
}
#[test]
fn test_change_key_column_data_type() {
let schema = Arc::new(new_test_schema());
@@ -1821,6 +1886,8 @@ mod tests {
let meta = TableMetaBuilder::empty()
.schema(schema)
.primary_key_indices(vec![0])
// partition col: col1, col2
.partition_key_indices(vec![0, 2])
.engine("engine")
.next_column_id(3)
.build()
@@ -1836,11 +1903,19 @@ mod tests {
.map(|column_schema| column_schema.name.clone())
.collect();
assert_eq!(
&["my_tag_first", "col1", "ts", "my_field_after_ts", "col2"],
&[
"my_tag_first", // primary key column
"col1", // partition column
"ts", // timestamp column
"yet_another_field_after_ts", // primary key column
"my_field_after_ts", // value column
"col2", // partition column
],
&names[..]
);
assert_eq!(&[0, 1], &new_meta.primary_key_indices[..]);
assert_eq!(&[2, 3, 4], &new_meta.value_indices[..]);
assert_eq!(&[0, 1, 3], &new_meta.primary_key_indices[..]);
assert_eq!(&[2, 4, 5], &new_meta.value_indices[..]);
assert_eq!(&[1, 5], &new_meta.partition_key_indices[..]);
}
#[test]


@@ -882,11 +882,14 @@ CREATE TABLE {table_name} (
let region_id = RegionId::new(table_id, *region);
let stream = region_server
.handle_remote_read(RegionQueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
})
.handle_remote_read(
RegionQueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
},
QueryContext::arc(),
)
.await
.unwrap();


@@ -249,11 +249,14 @@ mod tests {
let region_id = RegionId::new(table_id, *region);
let stream = region_server
.handle_remote_read(QueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
})
.handle_remote_read(
QueryRequest {
region_id: region_id.as_u64(),
plan: plan.to_vec(),
..Default::default()
},
QueryContext::arc(),
)
.await
.unwrap();


@@ -112,6 +112,7 @@ macro_rules! http_tests {
test_pipeline_with_hint_vrl,
test_pipeline_2,
test_pipeline_skip_error,
test_pipeline_filter,
test_otlp_metrics,
test_otlp_traces_v0,
@@ -1945,6 +1946,78 @@ transform:
guard.remove_all().await;
}
pub async fn test_pipeline_filter(store_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) =
setup_test_http_app_with_frontend(store_type, "test_pipeline_filter").await;
// handshake
let client = TestClient::new(app).await;
let pipeline_body = r#"
processors:
- date:
field: time
formats:
- "%Y-%m-%d %H:%M:%S%.3f"
- filter:
field: name
targets:
- John
transform:
- field: name
type: string
- field: time
type: time
index: timestamp
"#;
// 1. create pipeline
let res = client
.post("/v1/events/pipelines/test")
.header("Content-Type", "application/x-yaml")
.body(pipeline_body)
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
// 2. write data
let data_body = r#"
[
{
"time": "2024-05-25 20:16:37.217",
"name": "John"
},
{
"time": "2024-05-25 20:16:37.218",
"name": "JoHN"
},
{
"time": "2024-05-25 20:16:37.328",
"name": "Jane"
}
]
"#;
let res = client
.post("/v1/events/logs?db=public&table=logs1&pipeline_name=test")
.header("Content-Type", "application/json")
.body(data_body)
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
validate_data(
"pipeline_filter",
&client,
"select * from logs1",
"[[\"Jane\",1716668197328000000]]",
)
.await;
guard.remove_all().await;
}
pub async fn test_pipeline_dispatcher(storage_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) =
@@ -2405,14 +2478,19 @@ processors:
ignore_missing: true
- vrl:
source: |
.log_id = .id
del(.id)
.from_source = "channel_2"
cond, err = .id1 > .id2
if (cond) {
.from_source = "channel_1"
}
del(.id1)
del(.id2)
.
transform:
- fields:
- log_id
type: int32
- from_source
type: string
- field: time
type: time
index: timestamp
@@ -2432,7 +2510,8 @@ transform:
let data_body = r#"
[
{
"id": "2436",
"id1": 2436,
"id2": 123,
"time": "2024-05-25 20:16:37.217"
}
]
@@ -2449,7 +2528,7 @@ transform:
"test_pipeline_with_vrl",
&client,
"select * from d_table",
"[[2436,1716668197217000000]]",
"[[\"channel_1\",1716668197217000000]]",
)
.await;


@@ -152,6 +152,16 @@ pub async fn test_mysql_stmts(store_type: StorageType) {
conn.execute("SET TRANSACTION READ ONLY").await.unwrap();
// empty statements
let err = conn.execute(" ------- ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute("----------\n;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute(" ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let err = conn.execute(" \n ;").await.unwrap_err();
assert!(err.to_string().contains("empty statements"));
let _ = fe_mysql_server.shutdown().await;
guard.remove_all().await;
}
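
Each of the new statements contains only whitespace and `--` line comments, so the server should reject all of them with the same "empty statements" error. A rough, self-contained sketch of that emptiness check, as an illustration of the semantics being tested rather than the server's actual parser:

// Illustrative check: a statement made of nothing but whitespace, `--` comments,
// and a bare trailing `;` carries no executable SQL.
fn is_empty_statement(sql: &str) -> bool {
    sql.lines()
        .map(|line| line.split("--").next().unwrap_or("").trim())
        .all(|rest| rest.is_empty() || rest == ";")
}

fn main() {
    assert!(is_empty_statement(" ------- ;"));
    assert!(is_empty_statement("----------\n;"));
    assert!(is_empty_statement(" ;"));
    assert!(is_empty_statement(" \n ;"));
    assert!(!is_empty_statement("SELECT 1;"));
}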

@@ -84,17 +84,37 @@ limit 1;
|_|_Inner Join: t_2.ts = t_3.ts, t_2.vin = t_3.vin_|
|_|_Inner Join: t_1.ts = t_2.ts, t_1.vin = t_2.vin_|
|_|_Filter: t_1.vin IS NOT NULL_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_1_|
|_| ]]_|
|_|_Filter: t_2.vin IS NOT NULL_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_2_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_3_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_4_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_5_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_6_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_7_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_8_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_9_|
|_| ]]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: t_10_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 DESC], fetch=1_|
|_|_SortExec: TopK(fetch=1), expr=[ts@0 DESC], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|

@@ -26,7 +26,12 @@ explain SELECT * FROM demo WHERE ts > cast(1000000000 as timestamp) ORDER BY hos
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeSort: demo.host ASC NULLS LAST_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: demo.host ASC NULLS LAST_|
|_|_Projection: demo.host, demo.ts, demo.cpu, demo.memory, demo.disk_util_|
|_|_Filter: demo.ts > arrow_cast(Int64(1000000000), Utf8("Timestamp(Millisecond, None)"))_|
|_|_TableScan: demo_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [host@0 ASC NULLS LAST]_|
|_|_MergeScanExec: REDACTED
|_|_|

@@ -12,7 +12,12 @@ EXPLAIN SELECT DISTINCT i%2 FROM integers ORDER BY 1;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[ |
|_| Sort: integers.i % Int64(2) ASC NULLS LAST_|
|_|_Distinct:_|
|_|_Projection: integers.i % Int64(2)_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -35,7 +40,11 @@ EXPLAIN SELECT a, b FROM test ORDER BY a, b;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: test.a ASC NULLS LAST, test.b ASC NULLS LAST |
|_|_Projection: test.a, test.b_|
|_|_TableScan: test_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -50,7 +59,12 @@ EXPLAIN SELECT DISTINCT a, b FROM test ORDER BY a, b;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: test.a ASC NULLS LAST, test.b ASC NULLS LAST |
|_|_Distinct:_|
|_|_Projection: test.a, test.b_|
|_|_TableScan: test_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
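
The pattern across these snapshot updates is the same: every `MergeScan [is_placeholder=false]` line now also prints `remote_input=[...]`, i.e. the logical sub-plan that MergeScan ships to the datanodes, indented beneath the operator. A purely hypothetical Rust sketch of that rendering shape (the real formatting lives in the MergeScan display code, not here):

// Hypothetical sketch of the nested rendering seen in the plans above:
// the remote (datanode-side) plan is printed indented inside the MergeScan entry.
fn render_merge_scan(is_placeholder: bool, remote_input: &str) -> String {
    let indented: String = remote_input
        .lines()
        .map(|line| format!("  {line}\n"))
        .collect();
    format!("MergeScan [is_placeholder={is_placeholder}, remote_input=[\n{indented}]]")
}

fn main() {
    let plan = "Sort: demo.host ASC NULLS LAST\n  TableScan: demo";
    println!("{}", render_merge_scan(false, plan));
}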

@@ -12,7 +12,11 @@ EXPLAIN SELECT COUNT(*) FROM single_partition;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: count(*)_|
|_|_Aggregate: groupBy=[[]], aggr=[[count(single_partition.j) AS count(*)]] |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -27,7 +31,11 @@ EXPLAIN SELECT SUM(i) FROM single_partition;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: sum(single_partition.i)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(single_partition.i)]] |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+
@@ -42,7 +50,11 @@ EXPLAIN SELECT * FROM single_partition ORDER BY i DESC;
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| MergeScan [is_placeholder=false]_|
| logical_plan_| MergeScan [is_placeholder=false, remote_input=[_|
|_| Sort: single_partition.i DESC NULLS FIRST_|
|_|_Projection: single_partition.i, single_partition.j, single_partition.k |
|_|_TableScan: single_partition_|
|_| ]]_|
| physical_plan | MergeScanExec: REDACTED
|_|_|
+-+-+

@@ -55,7 +55,10 @@ FROM
+-+-+
| logical_plan_| Projection: sum(count(integers.i)) AS count(integers.i), sum(sum(integers.i)) AS sum(integers.i), uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))) AS uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))) AS hll_count(hll(integers.i))_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(integers.i), sum(integers.i), uddsketch_state(Int64(128), Float64(0.01), CAST(integers.i AS Float64)), hll(CAST(integers.i AS Utf8))]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[sum(count(integers.i))@0 as count(integers.i), sum(sum(integers.i))@1 as sum(integers.i), uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))@2) as uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))@3) as hll_count(hll(integers.i))] |
|_|_AggregateExec: mode=Final, gby=[], aggr=[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]_|
|_|_CoalescePartitionsExec_|
@@ -156,7 +159,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts ASC NULLS LAST_|
|_|_Projection: integers.ts, sum(count(integers.i)) AS count(integers.i), sum(sum(integers.i)) AS sum(integers.i), uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))) AS uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))) AS hll_count(hll(integers.i))_|
|_|_Aggregate: groupBy=[[integers.ts]], aggr=[[sum(count(integers.i)), sum(sum(integers.i)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_merge(hll(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[integers.ts]], aggr=[[count(integers.i), sum(integers.i), uddsketch_state(Int64(128), Float64(0.01), CAST(integers.i AS Float64)), hll(CAST(integers.i AS Utf8))]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[ts@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[ts@0 as ts, sum(count(integers.i))@1 as count(integers.i), sum(sum(integers.i))@2 as sum(integers.i), uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),integers.i))@3) as uddsketch_calc(Float64(0.5),uddsketch_state(Int64(128),Float64(0.01),integers.i)), hll_count(hll_merge(hll(integers.i))@4) as hll_count(hll(integers.i))] |

@@ -0,0 +1,974 @@
CREATE TABLE IF NOT EXISTS aggr_optimize_not (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
Affected Rows: 0
-- Case 0: group by columns are the same as partition columns.
-- This query shouldn't push down aggregation even if the group by columns
-- are partition columns, because sort is already pushed down.
-- If it did, it would cause a wrong result.
-- explain at 0s, 5s and 10s. No point at 0s.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.greptime_timestamp]], aggr=[[max(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[max(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 1: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.greptime_timestamp]], aggr=[[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, greptime_timestamp@2 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, greptime_timestamp@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, greptime_timestamp@2 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0, 1]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a, b@3 as b] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 2: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.greptime_timestamp]], aggr=[[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST] |
| | SortExec: expr=[a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, greptime_timestamp@1 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) |
| | SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, greptime_timestamp@0 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, greptime_timestamp@1 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, greptime_timestamp@1 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, greptime_timestamp@0 as greptime_timestamp], aggr=[avg(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=PartiallySorted([0]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), a@2 as a] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 3: group by columns are a superset of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.d ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d, aggr_optimize_not.greptime_timestamp]], aggr=[[count(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST] |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d, greptime_timestamp@4 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, d@5 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d, greptime_timestamp@4 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST, d@3 ASC NULLS LAST, greptime_timestamp@4 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, d@5 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[count(prom_max_over_time(greptime_timestamp_range,greptime_value))], ordering_mode=Sorted REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 4: group by columns are not a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.d ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d, aggr_optimize_not.greptime_timestamp]], aggr=[[min(prom_max_over_time(greptime_timestamp_range,greptime_value))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_max_over_time(greptime_timestamp_range,greptime_value) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_max_over_time(greptime_timestamp_range, greptime_value) AS prom_max_over_time(greptime_timestamp_range,greptime_value), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] |
| | SortExec: expr=[b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, c@1 as c, d@2 as d, greptime_timestamp@3 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[b@2 as b, c@3 as c, d@4 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), b@3 as b, c@4 as c, d@5 as d] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[b@0 ASC NULLS LAST, c@1 ASC NULLS LAST, d@2 ASC NULLS LAST, greptime_timestamp@3 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[b@0 as b, c@1 as c, d@2 as d, greptime_timestamp@3 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[b@2 as b, c@3 as c, d@4 as d, greptime_timestamp@0 as greptime_timestamp], aggr=[min(prom_max_over_time(greptime_timestamp_range,greptime_value))] REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range,greptime_value)@1 as prom_max_over_time(greptime_timestamp_range,greptime_value), b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_max_over_time(greptime_timestamp_range,greptime_value)@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_max_over_time(greptime_timestamp_range@6, greptime_value@5) as prom_max_over_time(greptime_timestamp_range,greptime_value), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 5: a simple sum
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain sum(aggr_optimize_not);
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Sort: aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.greptime_timestamp]], aggr=[[sum(aggr_optimize_not.greptime_value)]] |
| | Projection: aggr_optimize_not.greptime_timestamp, aggr_optimize_not.greptime_value |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | PromInstantManipulate: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-300000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | SortPreservingMergeExec: [greptime_timestamp@0 ASC NULLS LAST] |
| | SortExec: expr=[greptime_timestamp@0 ASC NULLS LAST], preserve_partitioning=[true] |
| | AggregateExec: mode=FinalPartitioned, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] |
| | ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, greptime_value@5 as greptime_value] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze sum(aggr_optimize_not);
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_SortPreservingMergeExec: [greptime_timestamp@0 ASC NULLS LAST] REDACTED
|_|_|_SortExec: expr=[greptime_timestamp@0 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[greptime_timestamp@0 as greptime_timestamp], aggr=[sum(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, greptime_value@5 as greptime_value] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_PromInstantManipulateExec: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_PromInstantManipulateExec: range=[0..0], lookback=[300000], interval=[300000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- TODO(discord9): more cases for aggr push down interacting with partitioning & TQL
CREATE TABLE IF NOT EXISTS aggr_optimize_not_count (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
Affected Rows: 0
-- Case 6: Test average rate (sum/count like)
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.greptime_timestamp, aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) AS aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) |
| | Inner Join: aggr_optimize_not.a = aggr_optimize_not_count.a, aggr_optimize_not.b = aggr_optimize_not_count.b, aggr_optimize_not.c = aggr_optimize_not_count.c, aggr_optimize_not.greptime_timestamp = aggr_optimize_not_count.greptime_timestamp |
| | SubqueryAlias: aggr_optimize_not |
| | Sort: aggr_optimize_not.a ASC NULLS LAST, aggr_optimize_not.b ASC NULLS LAST, aggr_optimize_not.c ASC NULLS LAST, aggr_optimize_not.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.greptime_timestamp]], aggr=[[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))]] |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c |
| | MergeSort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)) IS NOT NULL |
| | Projection: aggr_optimize_not.greptime_timestamp, prom_rate(greptime_timestamp_range, greptime_value, aggr_optimize_not.greptime_timestamp, Int64(120000)) AS prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not.a ASC NULLS FIRST, aggr_optimize_not.b ASC NULLS FIRST, aggr_optimize_not.c ASC NULLS FIRST, aggr_optimize_not.d ASC NULLS FIRST, aggr_optimize_not.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not |
| | ]] |
| | SubqueryAlias: aggr_optimize_not_count |
| | Sort: aggr_optimize_not_count.a ASC NULLS LAST, aggr_optimize_not_count.b ASC NULLS LAST, aggr_optimize_not_count.c ASC NULLS LAST, aggr_optimize_not_count.greptime_timestamp ASC NULLS LAST |
| | Aggregate: groupBy=[[aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.greptime_timestamp]], aggr=[[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))]] |
| | Projection: aggr_optimize_not_count.greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c |
| | MergeSort: aggr_optimize_not_count.a ASC NULLS FIRST, aggr_optimize_not_count.b ASC NULLS FIRST, aggr_optimize_not_count.c ASC NULLS FIRST, aggr_optimize_not_count.d ASC NULLS FIRST, aggr_optimize_not_count.greptime_timestamp ASC NULLS FIRST |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Filter: prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)) IS NOT NULL |
| | Projection: aggr_optimize_not_count.greptime_timestamp, prom_rate(greptime_timestamp_range, greptime_value, aggr_optimize_not_count.greptime_timestamp, Int64(120000)) AS prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), aggr_optimize_not_count.a, aggr_optimize_not_count.b, aggr_optimize_not_count.c, aggr_optimize_not_count.d |
| | PromRangeManipulate: req range=[0..0], interval=[300000], eval range=[120000], time index=[greptime_timestamp], values=["greptime_value"] |
| | PromSeriesNormalize: offset=[0], time index=[greptime_timestamp], filter NaN: [true] |
| | PromSeriesDivide: tags=["a", "b", "c", "d"] |
| | Sort: aggr_optimize_not_count.a ASC NULLS FIRST, aggr_optimize_not_count.b ASC NULLS FIRST, aggr_optimize_not_count.c ASC NULLS FIRST, aggr_optimize_not_count.d ASC NULLS FIRST, aggr_optimize_not_count.greptime_timestamp ASC NULLS FIRST |
| | Filter: aggr_optimize_not_count.greptime_timestamp >= TimestampMillisecond(-420000, None) AND aggr_optimize_not_count.greptime_timestamp <= TimestampMillisecond(300000, None) |
| | TableScan: aggr_optimize_not_count |
| | ]] |
| physical_plan | ProjectionExec: expr=[a@1 as a, b@2 as b, c@3 as c, greptime_timestamp@4 as greptime_timestamp, sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@0 / sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@5 as aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | REDACTED
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | CoalescePartitionsExec |
| | AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] |
| | CoalesceBatchesExec: target_batch_size=8192 |
| | RepartitionExec: partitioning=REDACTED
| | AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) |
| | ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] |
| | SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] |
| | MergeScanExec: REDACTED
| | |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[a@1 as a, b@2 as b, c@3 as c, greptime_timestamp@4 as greptime_timestamp, sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@0 / sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))@5 as aggr_optimize_not.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))) / aggr_optimize_not_count.sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_CoalescePartitionsExec REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, greptime_timestamp@3 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_SortExec: expr=[a@0 ASC NULLS LAST, b@1 ASC NULLS LAST, c@2 ASC NULLS LAST], preserve_partitioning=[true] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@2 as a, b@3 as b, c@4 as c, greptime_timestamp@0 as greptime_timestamp], aggr=[sum(prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)))], ordering_mode=PartiallySorted([0, 1, 2]) REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c] REDACTED
|_|_|_SortExec: expr=[a@2 ASC, b@3 ASC, c@4 ASC, d@5 ASC, greptime_timestamp@0 ASC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[greptime_timestamp@0 as greptime_timestamp, prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 as prom_rate(greptime_timestamp_range,greptime_value,greptime_timestamp,Int64(120000)), a@2 as a, b@3 as b, c@4 as c, d@5 as d] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_FilterExec: prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000))@1 IS NOT NULL REDACTED
|_|_|_ProjectionExec: expr=[greptime_timestamp@4 as greptime_timestamp, prom_rate(greptime_timestamp_range@6, greptime_value@5, greptime_timestamp@4, 120000) as prom_rate(greptime_timestamp_range,greptime_value,aggr_optimize_not_count.greptime_timestamp,Int64(120000)), a@0 as a, b@1 as b, c@2 as c, d@3 as d] REDACTED
|_|_|_PromRangeManipulateExec: req range=[1752591864000..1752592164000], interval=[30000], eval range=[120000], time index=[greptime_timestamp] REDACTED
|_|_|_PromSeriesNormalizeExec: offset=[0], time index=[greptime_timestamp], filter NaN: [true] REDACTED
|_|_|_PromSeriesDivideExec: tags=["a", "b", "c", "d"] REDACTED
|_|_|_SeriesScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0}, "distribution":"PerSeries" REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
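For this sum/count-style division (Case 6), the plans above show that only the per-series rate computation reaches the datanodes: each stage-1 fragment is a chain of projections, a NOT NULL filter and the Prom* operators over a SeriesScan, with no AggregateExec, while both sum by (a, b, c) aggregates, the join on (a, b, c, greptime_timestamp) and the final division run on the frontend above the two MergeScanExec nodes. A rough annotation of the query mapping sub-expressions to the stages printed above (line breaks added for readability; the test writes it on one line):

-- stage 1 (datanodes): the two rate(... [2m]) range computations
-- stage 0 (frontend):  both sum by (a, b, c) aggregates, the join on
--                      (a, b, c, greptime_timestamp) and the division
tql analyze (1752591864, 1752592164, '30s')
    sum by (a, b, c) (rate(aggr_optimize_not [2m]))
  / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));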
-- Case 7: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | MergeScanExec: REDACTED
| | |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------+
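Because the group-by keys (a, b, c) cover every declared partition column, two rows in the same group can never land in different regions, so the whole aggregate moves inside the MergeScan and the frontend only concatenates the per-region results. Per the remote_input above, each region effectively evaluates something like the following sketch against its local slice of the table (illustrative only):

-- evaluated independently on each region; no re-aggregation on the frontend
SELECT min(greptime_value)
FROM aggr_optimize_not
GROUP BY a, b, c;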
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@3 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@3 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 8: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns plus an extra column.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.c, aggr_optimize_not.d]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | MergeScanExec: REDACTED
| | |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@4 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_ProjectionExec: expr=[min(aggr_optimize_not.greptime_value)@4 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b, c@2 as c, d@3 as d], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 9: aggregate without sort should be pushed down. This one is handled by step aggregation push down.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+---------------+------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: min(min(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(min(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 as min(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+------------------------------------------------------------------------------------------------------------------------+
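Here the group-by keys (a, b) no longer list every declared partition column, so a group could in principle span regions; the plan above therefore splits the aggregation into two steps: each region computes a partial min per (a, b) inside the MergeScan, and the frontend combines the partials with min(min(...)). Case 10 below applies the same per-aggregate rewrite to min and max before adding them in the final projection. A hand-written equivalent of the two-step shape (illustrative only; the alias names are made up):

SELECT min(partial_min) AS min_value     -- frontend: combine per-region partials
FROM (
    SELECT a, b, min(greptime_value) AS partial_min   -- datanodes: partial aggregate
    FROM aggr_optimize_not
    GROUP BY a, b
)
GROUP BY a, b;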
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 10: aggregate without sort should be pushed down. This one is handled by step aggregation push down with a complex aggregate expression.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: min(min(aggr_optimize_not.greptime_value)) + max(max(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a, aggr_optimize_not.b]], aggr=[[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)]] |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 + max(max(aggr_optimize_not.greptime_value))@3 as min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[min(min(aggr_optimize_not.greptime_value))@2 + max(max(aggr_optimize_not.greptime_value))@3 as min(aggr_optimize_not.greptime_value) + max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(min(aggr_optimize_not.greptime_value)), max(max(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a, b@1 as b], aggr=[min(aggr_optimize_not.greptime_value), max(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
-- Case 11: aggregate with subquery
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
+---------------+------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+------------------------------------------------------------------------------------------------------------------------+
| logical_plan | Projection: aggr_optimize_not.a, min(min(aggr_optimize_not.greptime_value)) AS min(aggr_optimize_not.greptime_value) |
| | Aggregate: groupBy=[[aggr_optimize_not.a]], aggr=[[min(min(aggr_optimize_not.greptime_value))]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Aggregate: groupBy=[[aggr_optimize_not.a]], aggr=[[min(aggr_optimize_not.greptime_value)]] |
| | Projection: aggr_optimize_not.a, aggr_optimize_not.b, aggr_optimize_not.greptime_value |
| | TableScan: aggr_optimize_not |
| | ]] |
| physical_plan | ProjectionExec: expr=[a@0 as a, min(min(aggr_optimize_not.greptime_value))@1 as min(aggr_optimize_not.greptime_value)] |
| | AggregateExec: mode=SinglePartitioned, gby=[a@0 as a], aggr=[min(min(aggr_optimize_not.greptime_value))] |
| | MergeScanExec: REDACTED
| | |
+---------------+------------------------------------------------------------------------------------------------------------------------+
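The subquery plan above shows two effects: the inner ORDER BY a, b is dropped (ordering a derived table does not affect the outer aggregate), and the partial aggregation on a is still pushed into the MergeScan, with the frontend finishing it as min(min(...)) grouped by a. In effect the query is planned like the step-aggregation shape of Case 9; an illustrative rewrite of what remains after the subquery is flattened:

-- what the optimizer effectively plans, per the logical plan above
SELECT a, min(greptime_value)
FROM aggr_optimize_not
GROUP BY a;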
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
+-+-+-+
| stage | node | plan_|
+-+-+-+
| 0_| 0_|_ProjectionExec: expr=[a@0 as a, min(min(aggr_optimize_not.greptime_value))@1 as min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_AggregateExec: mode=SinglePartitioned, gby=[a@0 as a], aggr=[min(min(aggr_optimize_not.greptime_value))] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
| 1_| 1_|_AggregateExec: mode=FinalPartitioned, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_CoalesceBatchesExec: target_batch_size=8192 REDACTED
|_|_|_RepartitionExec: partitioning=REDACTED
|_|_|_AggregateExec: mode=Partial, gby=[a@0 as a], aggr=[min(aggr_optimize_not.greptime_value)] REDACTED
|_|_|_SeqScan: region=REDACTED, "partition_count":{"count":0, "mem_ranges":0, "files":0, "file_ranges":0} REDACTED
|_|_|_|
|_|_| Total rows: 0_|
+-+-+-+
drop table aggr_optimize_not_count;
Affected Rows: 0
drop table aggr_optimize_not;
Affected Rows: 0

View File

@@ -0,0 +1,307 @@
CREATE TABLE IF NOT EXISTS aggr_optimize_not (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
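The two partition rules above only constrain column a, even though (a, b, c) is declared as the partition key: rows with a < 'b' go to one region and rows with a >= 'b' to the other. A minimal sketch of how sample rows would be routed (not part of the test; the values and region numbering are illustrative):

INSERT INTO aggr_optimize_not (a, b, c, d, greptime_timestamp, greptime_value) VALUES
    ('apple',  'x', 'y', 'z', '2025-07-15 00:00:00', 1.0),  -- a < 'b'  -> first region
    ('banana', 'x', 'y', 'z', '2025-07-15 00:00:00', 2.0);  -- a >= 'b' -> second region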
-- Case 0: group by columns are the same as partition columns.
-- This query shouldn't push down the aggregation even though the group by columns are partition columns,
-- because the sort is already pushed down.
-- Pushing the aggregation down as well would produce a wrong result.
-- Explain at 0s, 5s and 10s. No point at 0s.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') max by (a, b, c) (max_over_time(aggr_optimize_not [2m]));
-- Case 1: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b) (max_over_time(aggr_optimize_not [2m]));
-- Case 2: group by columns are a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') avg by (a) (max_over_time(aggr_optimize_not [2m]));
-- Case 3: group by columns are a superset of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') count by (a, b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- Case 4: group by columns are not a prefix of partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') min by (b, c, d) (max_over_time(aggr_optimize_not [2m]));
-- Case 5: a simple sum
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain sum(aggr_optimize_not);
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze sum(aggr_optimize_not);
-- TODO(discord9): more cases for aggr push down interacting with partitioning&tql
CREATE TABLE IF NOT EXISTS aggr_optimize_not_count (
a STRING NULL,
b STRING NULL,
c STRING NULL,
d STRING NULL,
greptime_timestamp TIMESTAMP(3) NOT NULL,
greptime_value DOUBLE NULL,
TIME INDEX (greptime_timestamp),
PRIMARY KEY (a, b, c, d)
) PARTITION ON COLUMNS (a, b, c) (a < 'b', a >= 'b',);
-- Case 6: Test average rate (sum/count like)
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
tql explain (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
tql analyze (1752591864, 1752592164, '30s') sum by (a, b, c) (rate(aggr_optimize_not [2m])) / sum by (a, b, c) (rate(aggr_optimize_not_count [2m]));
-- Case 7: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c;
-- Case 8: aggregate without sort should be pushed down. This one is pushed down because the group-by covers all partition columns plus an extra column.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b,
c,
d;
-- Case 9: aggregate without sort should be pushed down. This one is handled by step aggregation push down.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- Case 10: aggregate without sort should be pushed down. This one is handled by step aggregation push down with a complex aggregate expression.
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
min(greptime_value) + max(greptime_value)
FROM
aggr_optimize_not
GROUP BY
a,
b;
-- Case 11: aggregate with subquery
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
EXPLAIN
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
-- SQLNESS REPLACE (metrics.*) REDACTED
-- SQLNESS REPLACE (RoundRobinBatch.*) REDACTED
-- SQLNESS REPLACE (Hash.*) REDACTED
-- SQLNESS REPLACE (-+) -
-- SQLNESS REPLACE (\s\s+) _
-- SQLNESS REPLACE (peers.*) REDACTED
-- SQLNESS REPLACE region=\d+\(\d+,\s+\d+\) region=REDACTED
EXPLAIN ANALYZE
SELECT
a,
min(greptime_value)
FROM
(
SELECT
a,
b,
greptime_value
FROM
aggr_optimize_not
ORDER BY
a,
b
)
GROUP BY
a;
drop table aggr_optimize_not_count;
drop table aggr_optimize_not;

View File

@@ -50,7 +50,10 @@ FROM
+-+-+
| logical_plan_| Projection: sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[sum(count(integers.i))@0 as count(integers.i)]_|
|_|_AggregateExec: mode=Final, gby=[], aggr=[sum(count(integers.i))]_|
|_|_CoalescePartitionsExec_|
@@ -144,7 +147,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts ASC NULLS LAST, count(integers.i) ASC NULLS LAST_|
|_|_Projection: integers.ts, sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[integers.ts]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[integers.ts]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [ts@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[ts@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[ts@0 as ts, sum(count(integers.i))@1 as count(integers.i)]_|
@@ -253,7 +259,10 @@ ORDER BY
| logical_plan_| Sort: time_window ASC NULLS LAST, count(integers.i) ASC NULLS LAST_|
|_|_Projection: date_bin(Utf8("1 hour"),integers.ts) AS time_window, sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[date_bin(Utf8("1 hour"),integers.ts)]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[date_bin(CAST(Utf8("1 hour") AS Interval(MonthDayNano)), integers.ts)]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [time_window@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[time_window@0 ASC NULLS LAST, count(integers.i)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[date_bin(Utf8("1 hour"),integers.ts)@0 as time_window, sum(count(integers.i))@1 as count(integers.i)]_|
@@ -369,7 +378,10 @@ ORDER BY
| logical_plan_| Sort: integers.ts + Int64(1) ASC NULLS LAST, integers.i / Int64(2) ASC NULLS LAST_|
|_|_Projection: integers.ts + Int64(1), integers.i / Int64(2), sum(count(integers.i)) AS count(integers.i)_|
|_|_Aggregate: groupBy=[[integers.ts + Int64(1), integers.i / Int64(2)]], aggr=[[sum(count(integers.i))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[CAST(integers.ts AS Int64) + Int64(1), integers.i / Int64(2)]], aggr=[[count(integers.i)]]_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [integers.ts + Int64(1)@0 ASC NULLS LAST, integers.i / Int64(2)@1 ASC NULLS LAST]_|
|_|_SortExec: expr=[integers.ts + Int64(1)@0 ASC NULLS LAST, integers.i / Int64(2)@1 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_ProjectionExec: expr=[integers.ts + Int64(1)@0 as integers.ts + Int64(1), integers.i / Int64(2)@1 as integers.i / Int64(2), sum(count(integers.i))@2 as count(integers.i)]_|
@@ -497,7 +509,10 @@ FROM
+-+-+
| logical_plan_| Projection: uddsketch_calc(Float64(0.5), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state))) AS udd_result, hll_count(hll_merge(hll_merge(sink_table.hll_state))) AS hll_result_|
|_|_Aggregate: groupBy=[[]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state)), hll_merge(hll_merge(sink_table.hll_state))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), sink_table.udd_state), hll_merge(sink_table.hll_state)]]_|
|_|_TableScan: sink_table_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[uddsketch_calc(0.5, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state))@0) as udd_result, hll_count(hll_merge(hll_merge(sink_table.hll_state))@1) as hll_result] |
|_|_AggregateExec: mode=Final, gby=[], aggr=[uddsketch_merge(Int64(128),Float64(0.01),uddsketch_merge(Int64(128),Float64(0.01),sink_table.udd_state)), hll_merge(hll_merge(sink_table.hll_state))]_|
|_|_CoalescePartitionsExec_|

View File

@@ -247,7 +247,11 @@ GROUP BY
+-+-+
| logical_plan_| Projection: base_table.env, base_table.service_name, base_table.city, base_table.page, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS lcp_state, max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS max_lcp, min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)) AS min_lcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS fmp_state, max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS max_fmp, min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)) AS min_fmp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS fcp_state, max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS max_fcp, min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)) AS min_fcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS fp_state, max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS max_fp, min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)) AS min_fp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS tti_state, max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS max_tti, min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)) AS min_tti, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS fid_state, max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS max_fid, min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)) AS min_fid, max(max(base_table.shard_key)) AS shard_key, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))_|
|_|_Aggregate: groupBy=[[base_table.env, base_table.service_name, base_table.city, base_table.page, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))]], aggr=[[uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), uddsketch_merge(Int64(128), Float64(0.01), uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(base_table.shard_key))]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[base_table.env, base_table.service_name, base_table.city, base_table.page, arrow_cast(date_bin(CAST(Utf8("60 seconds") AS Interval(MonthDayNano)), base_table.time), Utf8("Timestamp(Second, None)"))]], aggr=[[uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE CAST(NULL AS Int64) END), uddsketch_state(Int64(128), Float64(0.01), CAST(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END AS Float64)), max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END), min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE CAST(NULL AS Int64) END), max(base_table.shard_key)]]_|
|_|_Filter: (base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) OR base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) OR base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) OR base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) OR base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) OR base_table.fid > Int64(0) AND base_table.fid < Int64(3000000)) AND CAST(base_table.time AS Timestamp(Millisecond, Some("+00:00"))) >= CAST(now() AS Timestamp(Millisecond, Some("+00:00")))_|
|_|_TableScan: base_table_|
|_| ]]_|
| physical_plan | ProjectionExec: expr=[env@0 as env, service_name@1 as service_name, city@2 as city, page@3 as page, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@5 as lcp_state, max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@6 as max_lcp, min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END))@7 as min_lcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@8 as fmp_state, max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@9 as max_fmp, min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END))@10 as min_fmp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@11 as fcp_state, max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@12 as max_fcp, min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END))@13 as min_fcp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@14 as fp_state, max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@15 as max_fp, min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END))@16 as min_fp, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@17 as tti_state, max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@18 as max_tti, min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END))@19 as min_tti, uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@20 as fid_state, max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@21 as max_fid, min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END))@22 as min_fid, max(max(base_table.shard_key))@23 as shard_key, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))@4 as arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))] |
|_|_AggregateExec: mode=FinalPartitioned, gby=[env@0 as env, service_name@1 as service_name, city@2 as city, page@3 as page, arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))@4 as arrow_cast(date_bin(Utf8("60 seconds"),base_table.time),Utf8("Timestamp(Second, None)"))], aggr=[uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), max(max(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), min(min(CASE WHEN base_table.lcp > Int64(0) AND base_table.lcp < Int64(3000000) THEN base_table.lcp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), max(max(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), min(min(CASE WHEN base_table.fmp > Int64(0) AND base_table.fmp < Int64(3000000) THEN base_table.fmp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), max(max(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), min(min(CASE WHEN base_table.fcp > Int64(0) AND base_table.fcp < Int64(3000000) THEN base_table.fcp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), max(max(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), min(min(CASE WHEN base_table.fp > Int64(0) AND base_table.fp < Int64(3000000) THEN base_table.fp ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), max(max(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), min(min(CASE WHEN base_table.tti > Int64(0) AND base_table.tti < Int64(3000000) THEN base_table.tti ELSE NULL END)), uddsketch_merge(Int64(128),Float64(0.01),uddsketch_state(Int64(128),Float64(0.01),CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), min(min(CASE WHEN base_table.fid > Int64(0) AND base_table.fid < Int64(3000000) THEN base_table.fid ELSE NULL END)), max(max(base_table.shard_key))]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -624,7 +628,11 @@ where
+-+-+
| logical_plan_| Projection: count(*) AS count(*)_|
|_|_Aggregate: groupBy=[[]], aggr=[[sum(count(*)) AS count(*)]]_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Aggregate: groupBy=[[]], aggr=[[count(base_table.time) AS count(*)]]_|
|_|_Filter: CAST(base_table.time AS Timestamp(Millisecond, Some("+00:00"))) >= CAST(now() AS Timestamp(Millisecond, Some("+00:00")))_|
|_|_TableScan: base_table_|
|_| ]]_|
| physical_plan | AggregateExec: mode=Final, gby=[], aggr=[count(*)]_|
|_|_CoalescePartitionsExec_|
|_|_AggregateExec: mode=Partial, gby=[], aggr=[count(*)]_|

@@ -14,9 +14,14 @@ EXPLAIN SELECT * FROM integers WHERE i IN ((SELECT i FROM integers)) ORDER BY i;
+-+-+
| logical_plan_| Sort: integers.i ASC NULLS LAST_|
|_|_LeftSemi Join: integers.i = __correlated_sq_1.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: __correlated_sq_1_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| Projection: integers.i_|
|_|_TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[i@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -43,10 +48,14 @@ EXPLAIN SELECT * FROM integers i1 WHERE EXISTS(SELECT i FROM integers WHERE i=i1
| logical_plan_| Sort: i1.i ASC NULLS LAST_|
|_|_LeftSemi Join: i1.i = __correlated_sq_1.i_|
|_|_SubqueryAlias: i1_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: __correlated_sq_1_|
|_|_Projection: integers.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[i@0 ASC NULLS LAST], preserve_partitioning=[true]_|
|_|_CoalesceBatchesExec: target_batch_size=8192_|
@@ -85,9 +94,13 @@ order by t.i desc;
|_|_Cross Join:_|
|_|_Filter: integers.i IS NOT NULL_|
|_|_Projection: integers.i_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: integers_|
|_| ]]_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[_|
|_| TableScan: other_|
|_| ]]_|
| physical_plan | SortPreservingMergeExec: [i@0 DESC]_|
|_|_SortExec: expr=[i@0 DESC], preserve_partitioning=[true]_|
|_|_CrossJoinExec_|
@@ -116,9 +129,15 @@ EXPLAIN INSERT INTO other SELECT i, 2 FROM integers WHERE i=(SELECT MAX(i) FROM
| | Projection: integers.i AS i, TimestampMillisecond(2, None) AS j |
| | Inner Join: integers.i = __scalar_sq_1.max(integers.i) |
| | Projection: integers.i |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | TableScan: integers |
| | ]] |
| | SubqueryAlias: __scalar_sq_1 |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: max(integers.i) |
| | Aggregate: groupBy=[[]], aggr=[[max(integers.i)]] |
| | TableScan: integers |
| | ]] |
| physical_plan_error | Error during planning: failed to resolve catalog: datafusion |
+---------------------+-------------------------------------------------------------------+

@@ -252,10 +252,14 @@ EXPLAIN SELECT * FROM (SELECT 0=1 AS cond FROM integers i1, integers i2) a1 WHER
|_|_Cross Join:_|
|_|_SubqueryAlias: i1_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[ |
|_| TableScan: integers_|
|_| ]]_|
|_|_SubqueryAlias: i2_|
|_|_Projection:_|
|_|_MergeScan [is_placeholder=false]_|
|_|_MergeScan [is_placeholder=false, remote_input=[ |
|_| TableScan: integers_|
|_| ]]_|
| physical_plan | CoalescePartitionsExec_|
|_|_ProjectionExec: expr=[false as cond]_|
|_|_CrossJoinExec_|

@@ -4,7 +4,10 @@ explain select * from numbers;
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@@ -15,7 +18,11 @@ explain select * from numbers order by number desc;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Sort: numbers.number DESC NULLS FIRST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: expr=[number@0 DESC], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -27,7 +34,11 @@ explain select * from numbers order by number asc;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Sort: numbers.number ASC NULLS LAST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: expr=[number@0 ASC NULLS LAST], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -39,7 +50,12 @@ explain select * from numbers order by number desc limit 10;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Limit: skip=0, fetch=10 |
| | Sort: numbers.number DESC NULLS FIRST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: TopK(fetch=10), expr=[number@0 DESC], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |
@@ -51,7 +67,12 @@ explain select * from numbers order by number asc limit 10;
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan | MergeScan [is_placeholder=false] |
| logical_plan | MergeScan [is_placeholder=false, remote_input=[ |
| | Limit: skip=0, fetch=10 |
| | Sort: numbers.number ASC NULLS LAST |
| | Projection: numbers.number |
| | TableScan: numbers |
| | ]] |
| physical_plan | SortExec: TopK(fetch=10), expr=[number@0 ASC NULLS LAST], preserve_partitioning=[false] |
| | StreamScanAdapter: [<SendableRecordBatchStream>], schema: [Schema { fields: [Field { name: "number", data_type: UInt32, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {"greptime:version": "0"} }] |
| | |

@@ -174,3 +174,80 @@ DROP TABLE t;
Affected Rows: 0
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
Affected Rows: 0
INSERT INTO my_table VALUES
(100, 'a', 1),
(200, 'b', 2),
(1100, 'c', 3),
(1200, 'd', 4),
(2000, 'e', 5),
(2100, 'f', 6),
(2200, 'g', 7),
(2400, 'h', 8);
Affected Rows: 8
SELECT * FROM my_table WHERE a > 100 order by a;
+------+---+-------------------------+
| a | b | ts |
+------+---+-------------------------+
| 200 | b | 1970-01-01T00:00:00.002 |
| 1100 | c | 1970-01-01T00:00:00.003 |
| 1200 | d | 1970-01-01T00:00:00.004 |
| 2000 | e | 1970-01-01T00:00:00.005 |
| 2100 | f | 1970-01-01T00:00:00.006 |
| 2200 | g | 1970-01-01T00:00:00.007 |
| 2400 | h | 1970-01-01T00:00:00.008 |
+------+---+-------------------------+
SELECT count(*) FROM my_table WHERE a > 100;
+----------+
| count(*) |
+----------+
| 7 |
+----------+
ALTER TABLE my_table ADD COLUMN c STRING FIRST;
Affected Rows: 0
SELECT * FROM my_table WHERE a > 100 order by a;
+---+------+---+-------------------------+
| c | a | b | ts |
+---+------+---+-------------------------+
| | 200 | b | 1970-01-01T00:00:00.002 |
| | 1100 | c | 1970-01-01T00:00:00.003 |
| | 1200 | d | 1970-01-01T00:00:00.004 |
| | 2000 | e | 1970-01-01T00:00:00.005 |
| | 2100 | f | 1970-01-01T00:00:00.006 |
| | 2200 | g | 1970-01-01T00:00:00.007 |
| | 2400 | h | 1970-01-01T00:00:00.008 |
+---+------+---+-------------------------+
SELECT count(*) FROM my_table WHERE a > 100;
+----------+
| count(*) |
+----------+
| 7 |
+----------+
DROP TABLE my_table;
Affected Rows: 0

@@ -47,3 +47,36 @@ SELECT * FROM t;
ALTER TABLE t ADD COLUMN x int xxx;
DROP TABLE t;
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
INSERT INTO my_table VALUES
(100, 'a', 1),
(200, 'b', 2),
(1100, 'c', 3),
(1200, 'd', 4),
(2000, 'e', 5),
(2100, 'f', 6),
(2200, 'g', 7),
(2400, 'h', 8);
SELECT * FROM my_table WHERE a > 100 order by a;
SELECT count(*) FROM my_table WHERE a > 100;
ALTER TABLE my_table ADD COLUMN c STRING FIRST;
SELECT * FROM my_table WHERE a > 100 order by a;
SELECT count(*) FROM my_table WHERE a > 100;
DROP TABLE my_table;

@@ -31,3 +31,24 @@ DROP TABLE test;
Affected Rows: 0
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
Affected Rows: 0
ALTER TABLE my_table DROP COLUMN a;
Error: 1004(InvalidArguments), Not allowed to remove index column a from table my_table
DROP TABLE my_table;
Affected Rows: 0

@@ -11,3 +11,18 @@ SELECT * FROM test;
ALTER TABLE test DROP COLUMN j;
DROP TABLE test;
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION ON COLUMNS (a) (
a < 1000,
a >= 1000 AND a < 2000,
a >= 2000
);
ALTER TABLE my_table DROP COLUMN a;
DROP TABLE my_table;

@@ -2,19 +2,19 @@
SELECt @@tx_isolation;
+-----------------+
| @@tx_isolation; |
| @@tx_isolation |
+-----------------+
| 0 |
| REPEATABLE-READ |
+-----------------+
-- SQLNESS PROTOCOL MYSQL
SELECT @@version_comment;
+--------------------+
| @@version_comment; |
+--------------------+
| 0 |
+--------------------+
+-------------------+
| @@version_comment |
+-------------------+
| Greptime |
+-------------------+
-- SQLNESS PROTOCOL MYSQL
SHOW DATABASES;

@@ -70,8 +70,14 @@ EXPLAIN SELECT a % 2, b FROM test UNION SELECT a % 2 AS k, b FROM test ORDER BY
| logical_plan | Sort: Int64(-1) ASC NULLS LAST |
| | Aggregate: groupBy=[[test.a % Int64(2), test.b]], aggr=[[]] |
| | Union |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: CAST(test.a AS Int64) % Int64(2) AS test.a % Int64(2), test.b |
| | TableScan: test |
| | ]] |
| | MergeScan [is_placeholder=false, remote_input=[ |
| | Projection: CAST(test.a AS Int64) % Int64(2) AS test.a % Int64(2), test.b |
| | TableScan: test |
| | ]] |
| physical_plan | CoalescePartitionsExec |
| | AggregateExec: mode=SinglePartitioned, gby=[test.a % Int64(2)@0 as test.a % Int64(2), b@1 as b], aggr=[] |
| | InterleaveExec |

@@ -332,3 +332,34 @@ drop table histogram4_bucket;
Affected Rows: 0
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fff) (rate(greptime_servers_postgres_query_elapsed_bucket{instance=~"xxx"}[1m])));
++
++
-- test case where table exists but doesn't have 'le' column should raise error
CREATE TABLE greptime_servers_postgres_query_elapsed_no_le (
pod STRING,
instance STRING,
t TIMESTAMP TIME INDEX,
v DOUBLE,
PRIMARY KEY (pod, instance)
);
Affected Rows: 0
-- should return empty result instead of error when 'le' column is missing
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, le) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
++
++
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fbf) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
++
++
drop table greptime_servers_postgres_query_elapsed_no_le;
Affected Rows: 0

@@ -187,3 +187,20 @@ insert into histogram4_bucket values
tql eval (2900, 3000, '100s') histogram_quantile(0.9, histogram4_bucket);
drop table histogram4_bucket;
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fff) (rate(greptime_servers_postgres_query_elapsed_bucket{instance=~"xxx"}[1m])));
-- test case where table exists but doesn't have 'le' column should raise error
CREATE TABLE greptime_servers_postgres_query_elapsed_no_le (
pod STRING,
instance STRING,
t TIMESTAMP TIME INDEX,
v DOUBLE,
PRIMARY KEY (pod, instance)
);
-- should return empty result instead of error when 'le' column is missing
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, le) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
tql eval(0, 10, '10s') histogram_quantile(0.99, sum by(pod,instance, fbf) (rate(greptime_servers_postgres_query_elapsed_no_le{instance=~"xxx"}[1m])));
drop table greptime_servers_postgres_query_elapsed_no_le;

@@ -0,0 +1,160 @@
-- Test `timestamp()` function
-- timestamp() returns the timestamp of each sample as seconds since Unix epoch
create table timestamp_test (ts timestamp time index, val double);
Affected Rows: 0
insert into timestamp_test values
(0, 1.0),
(1000, 2.0),
(60000, 3.0),
(3600000, 4.0),
-- 2021-01-01 00:00:00
(1609459200000, 5.0),
-- 2021-01-01 00:01:00
(1609459260000, 6.0);
Affected Rows: 6
-- Test timestamp() with time series
tql eval (0, 3600, '30s') timestamp(timestamp_test);
+---------------------+--------+
| ts | value |
+---------------------+--------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 |
| 1970-01-01T00:01:30 | 60.0 |
| 1970-01-01T00:02:00 | 60.0 |
| 1970-01-01T00:02:30 | 60.0 |
| 1970-01-01T00:03:00 | 60.0 |
| 1970-01-01T00:03:30 | 60.0 |
| 1970-01-01T00:04:00 | 60.0 |
| 1970-01-01T00:04:30 | 60.0 |
| 1970-01-01T00:05:00 | 60.0 |
| 1970-01-01T00:05:30 | 60.0 |
| 1970-01-01T00:06:00 | 60.0 |
| 1970-01-01T01:00:00 | 3600.0 |
+---------------------+--------+
-- Test timestamp() with specific time range
tql eval (0, 60, '30s') timestamp(timestamp_test);
+---------------------+-------+
| ts | value |
+---------------------+-------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+
tql eval (0, 60, '30s') -timestamp(timestamp_test);
+---------------------+-----------+
| ts | (- value) |
+---------------------+-----------+
| 1970-01-01T00:00:00 | -0.0 |
| 1970-01-01T00:00:30 | -1.0 |
| 1970-01-01T00:01:00 | -60.0 |
+---------------------+-----------+
-- Test timestamp() with 2021 data
tql eval (1609459200, 1609459260, '30s') timestamp(timestamp_test);
+---------------------+--------------+
| ts | value |
+---------------------+--------------+
| 2021-01-01T00:00:00 | 1609459200.0 |
| 2021-01-01T00:00:30 | 1609459200.0 |
| 2021-01-01T00:01:00 | 1609459260.0 |
+---------------------+--------------+
-- Test timestamp() with arithmetic operations
tql eval (0, 60, '30s') timestamp(timestamp_test) + 1;
+---------------------+--------------------+
| ts | value + Float64(1) |
+---------------------+--------------------+
| 1970-01-01T00:00:00 | 1.0 |
| 1970-01-01T00:00:30 | 2.0 |
| 1970-01-01T00:01:00 | 61.0 |
+---------------------+--------------------+
-- Test timestamp() with boolean operations
tql eval (0, 60, '30s') timestamp(timestamp_test) > bool 30;
+---------------------+---------------------+
| ts | value > Float64(30) |
+---------------------+---------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 0.0 |
| 1970-01-01T00:01:00 | 1.0 |
+---------------------+---------------------+
-- Test timestamp() with time functions
tql eval (0, 60, '30s') timestamp(timestamp_test) - time();
+---------------------+----------------------------+
| ts | value - ts / Float64(1000) |
+---------------------+----------------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | -29.0 |
| 1970-01-01T00:01:00 | 0.0 |
+---------------------+----------------------------+
-- Test timestamp() with other functions
tql eval (0, 60, '30s') abs(timestamp(timestamp_test) - avg(timestamp(timestamp_test))) > 20;
Error: 1004(InvalidArguments), Invalid function argument for unknown
tql eval (0, 60, '30s') timestamp(timestamp_test) == 60;
+---------------------+-------+
| ts | value |
+---------------------+-------+
| 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+
-- Test timestamp() with multiple metrics
create table timestamp_test2 (ts timestamp time index, val double);
Affected Rows: 0
insert into timestamp_test2 values
(0, 10.0),
(1000, 20.0),
(60000, 30.0);
Affected Rows: 3
-- SQLNESS SORT_RESULT 3 1
tql eval (0, 60, '30s') timestamp(timestamp_test) + timestamp(timestamp_test2);
+---------------------+----------------------------------------------+
| ts | timestamp_test.value + timestamp_test2.value |
+---------------------+----------------------------------------------+
| 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 2.0 |
| 1970-01-01T00:01:00 | 120.0 |
+---------------------+----------------------------------------------+
-- SQLNESS SORT_RESULT 3 1
tql eval (0, 60, '30s') timestamp(timestamp_test) == timestamp(timestamp_test2);
+---------------------+-------+---------------------+-------+
| ts | value | ts | value |
+---------------------+-------+---------------------+-------+
| 1970-01-01T00:00:00 | 0.0 | 1970-01-01T00:00:00 | 0.0 |
| 1970-01-01T00:00:30 | 1.0 | 1970-01-01T00:00:30 | 1.0 |
| 1970-01-01T00:01:00 | 60.0 | 1970-01-01T00:01:00 | 60.0 |
+---------------------+-------+---------------------+-------+
drop table timestamp_test;
Affected Rows: 0
drop table timestamp_test2;
Affected Rows: 0

Some files were not shown because too many files have changed in this diff.