Compare commits


28 Commits

Author SHA1 Message Date
Lei, HUANG
55e5c7d2ff refactor/expose-config:
### Make SubCommand and Fields Public in `frontend.rs`

 - Made `subcmd` field in `Command` struct public.
 - Made `SubCommand` enum public.
 - Made `config_file` and `env_prefix` fields in `StartCommand` struct public.

 These changes make the command-related structures and fields accessible to external crates that embed or drive the frontend (see the sketch below).
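
A hypothetical integration sketch (the `frontend::` module path and the re-exports are assumptions, not part of this change): with `subcmd`, `config_file`, and `env_prefix` public, an embedding binary could parse the frontend CLI and inspect the chosen configuration before starting anything.

```rust
use clap::Parser; // the frontend Command derives clap's Parser

fn main() {
    // `Command` and `SubCommand` are the types made public by this change;
    // the `frontend::` path here is illustrative only.
    let cmd = frontend::Command::parse();
    match &cmd.subcmd {
        frontend::SubCommand::Start(start) => {
            println!("config file: {:?}", start.config_file);
            println!("env prefix:  {}", start.env_prefix);
        }
    }
}
```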

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-07-08 09:25:51 +00:00
LFC
c2f1447345 refactor: stores the http server builder in Metasrv instance (#6461)
* refactor: stores the http server builder in Metasrv instance

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-07-07 06:39:05 +00:00
Weny Xu
30f7955d2b feat: add column metadata to response extensions (#6451)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-07-07 03:38:13 +00:00
fys
3508fddd74 feat: support show triggers (#6465)
* feat: support show triggers

* add enterprise feature

* chore: remove unused error
2025-07-07 03:30:57 +00:00
Weny Xu
351c741c70 fix(metric-engine): handle stale metadata region recovery failures (#6395)
* fix(metric-engine): handle stale metadata region recovery failures

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-07-07 03:30:40 +00:00
liyang
bb43d604a4 ci: use ubuntu 16 core machine in release-cn-artifacts (#6464)
ci: use ubuntu 16 core machine
2025-07-04 18:06:07 +00:00
Lei, HUANG
9576bcb9ae fix: filter empty batch in bulk insert api (#6459)
* fix/filter-empty-batch-in-bulk-insert-api:
 **Add Early Return for Empty Record Batches in `bulk_insert.rs`**

 - Added an early return in the `Inserter` implementation when `record_batch.num_rows()` is zero, avoiding unnecessary processing of empty batches (sketched below).

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
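
A minimal sketch of that early return, assuming an arrow `RecordBatch` input; the function name and return type are illustrative, not the actual `Inserter` API.

```rust
use arrow::record_batch::RecordBatch;

// Illustrative only: an empty batch is acknowledged immediately with zero
// affected rows, skipping conversion and routing work entirely.
fn handle_bulk_insert(batch: &RecordBatch) -> usize {
    if batch.num_rows() == 0 {
        return 0;
    }
    // ... the normal insert path would follow here ...
    batch.num_rows()
}
```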

* fix/filter-empty-batch-in-bulk-insert-api:
 **Improve Bulk Insert Handling**

 - **`handle_bulk_insert.rs`**: Added a check to handle cases where the batch has zero rows, immediately returning and sending a success response with zero rows processed.
 - **`bulk_insert.rs`**: Enhanced logic to skip processing for masks that select none, optimizing the bulk insert operation by avoiding unnecessary iterations.

 These changes make the bulk insert path more efficient and robust by explicitly handling empty batches and selection masks that match no rows (see the sketch below).

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
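
A sketch of the "mask selects none" shortcut mentioned above, assuming the masks are arrow `BooleanArray`s; the helper name is made up for illustration.

```rust
use arrow::array::BooleanArray;

// A mask that selects no rows contributes nothing to the insert, so the
// per-mask work can be skipped outright.
fn should_process(mask: &BooleanArray) -> bool {
    mask.true_count() > 0
}
```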

* fix/filter-empty-batch-in-bulk-insert-api:
 ### Refactor and Error Handling Enhancements

 - **Refactored Timestamp Handling**: Introduced `timestamp_array_to_primitive` function in `timestamp.rs` to streamline conversion of timestamp arrays to primitive arrays, reducing redundancy in `handle_bulk_insert.rs` and `bulk_insert.rs`.
 - **Error Handling**: Added `InconsistentTimestampLength` error in `error.rs` to handle mismatched timestamp column lengths in bulk insert operations.
 - **Bulk Insert Logic**: Updated `handle_bulk_insert.rs` to utilize the new timestamp conversion function and added checks for timestamp length consistency.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
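
A rough sketch of what a `timestamp_array_to_primitive`-style conversion plus the length check behind `InconsistentTimestampLength` could look like; the concrete types and error handling here are assumptions, not the real helper.

```rust
use arrow::array::{Array, ArrayRef, TimestampMillisecondArray};

// Illustrative: downcast the timestamp column to its primitive form and
// reject it when its length disagrees with the batch row count.
fn timestamp_values(ts: &ArrayRef, expected_rows: usize) -> Result<Vec<i64>, String> {
    let ts = ts
        .as_any()
        .downcast_ref::<TimestampMillisecondArray>()
        .ok_or_else(|| "not a millisecond timestamp array".to_string())?;
    if ts.len() != expected_rows {
        // Roughly what the `InconsistentTimestampLength` error guards against.
        return Err(format!(
            "inconsistent timestamp length: expected {expected_rows}, got {}",
            ts.len()
        ));
    }
    Ok(ts.values().to_vec())
}
```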

* fix/filter-empty-batch-in-bulk-insert-api:
 **Refactor `bulk_insert.rs` to streamline imports**

 - Simplified import statements by removing unused timestamp-related arrays and data types from the `arrow` crate in `bulk_insert.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-07-04 13:32:10 +00:00
Zhenchi
dc17e6e517 fix: add backward compatibility for SkippingIndexOptions deserialization (#6458)
* fix: add backward compatibility for `SkippingIndexOptions` deserialization

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
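
A minimal sketch of the usual serde pattern for this kind of backward compatibility, assuming the newly added fields simply fall back to defaults when absent; this is not the exact `SkippingIndexOptions` definition, and the default values are illustrative.

```rust
use serde::{Deserialize, Serialize};

fn default_granularity() -> u32 {
    10240
}

fn default_false_positive_rate() -> f64 {
    0.01
}

// Illustrative struct: payloads serialized before the new fields existed
// still deserialize because the missing fields take these defaults.
#[derive(Debug, Serialize, Deserialize)]
pub struct SkippingIndexOptions {
    pub index_type: String,
    #[serde(default = "default_granularity")]
    pub granularity: u32,
    #[serde(default = "default_false_positive_rate")]
    pub false_positive_rate: f64,
}
```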

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-07-04 11:58:27 +00:00
Yiran
563d25ee04 fix: doc links (#6305)
Signed-off-by: Yiran <cuiyiran3@gmail.com>
2025-07-04 09:52:47 +00:00
Weny Xu
7d17782fd5 feat: persist column ids in table metadata (#6457)
* feat: persist column ids in table metadata

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-07-04 08:12:29 +00:00
fys
c5360601f5 feat: information table extension (#6434)
* feat: information table extension

* avoid using std HashMap behind a cfg feature
2025-07-04 04:37:36 +00:00
discord9
9b5baa965c feat: truly limit time range by split window (#6295)
* feat: actually split window to limit time range

feat: truly limit time range by split window

Update src/flow/src/batching_mode/state.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Signed-off-by: discord9 <discord9@163.com>
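
A generic sketch of the split-window idea, assuming integer timestamps and a fixed window size; none of these names come from the flow engine itself.

```rust
// Illustrative only: instead of flushing one huge time range, split it into
// bounded windows and handle at most `max_windows` per run; the surplus is
// deferred to a later run.
fn split_windows(start: i64, end: i64, window: i64, max_windows: usize) -> Vec<(i64, i64)> {
    debug_assert!(window > 0);
    let mut out = Vec::new();
    let mut cur = start;
    while cur < end && out.len() < max_windows {
        let next = (cur + window).min(end);
        out.push((cur, next));
        cur = next;
    }
    out
}
```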

* chore: added stalled time window range

Signed-off-by: discord9 <discord9@163.com>

* fix: do not flush all time ranges, as that is too expensive

Signed-off-by: discord9 <discord9@163.com>

* test: make it more robust

Signed-off-by: discord9 <discord9@163.com>

* what

Signed-off-by: discord9 <discord9@163.com>

* feat: defensively handle surplus

Signed-off-by: discord9 <discord9@163.com>

* refactor: per review, explain flush flow

Signed-off-by: discord9 <discord9@163.com>

* chore: per bugbot

Signed-off-by: discord9 <discord9@163.com>

* fix: a temp fix to make mirror inserts go first (still needs a better fix to sync with mirror inserts that happen before)

Signed-off-by: discord9 <discord9@163.com>

* chore: add todo

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-07-04 03:37:43 +00:00
Yingwen
76a5145def fix: enable max_execution time for other read only statements (#6454)
Also disables the timeout when it is set to 0
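
A sketch of the described semantics, assuming the timeout arrives as milliseconds; a value of 0 is treated as "no timeout" rather than an immediate deadline.

```rust
use std::time::Duration;

// Illustrative helper: map MAX_EXECUTION_TIME to an optional deadline.
fn execution_timeout(max_execution_time_ms: u64) -> Option<Duration> {
    if max_execution_time_ms == 0 {
        None // 0 disables the timeout
    } else {
        Some(Duration::from_millis(max_execution_time_ms))
    }
}
```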

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-03 13:46:02 +00:00
Ruihang Xia
7b2703760b feat: skip rule checker on ingestion (#6453)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-07-03 13:31:16 +00:00
Ruihang Xia
81ea172ce4 feat!: point matrix based partition rule checker (#6431)
* bare implementation

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* stateful generator

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* error report

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix remap checkpoint

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use matrix generator as iterator

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* pre-calculate suffix product

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
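
A generic sketch of the suffix-product technique the bullets refer to, decoding a flat iteration index into one point of a multi-dimensional value matrix; it is not the actual partition rule checker.

```rust
// suffix[i] = product of dims[i+1..]; pre-computing it once lets each flat
// index be decoded into a point without nested loops.
fn suffix_products(dims: &[usize]) -> Vec<usize> {
    let mut suffix = vec![1; dims.len()];
    for i in (0..dims.len().saturating_sub(1)).rev() {
        suffix[i] = suffix[i + 1] * dims[i + 1];
    }
    suffix
}

fn decode_point(index: usize, dims: &[usize], suffix: &[usize]) -> Vec<usize> {
    dims.iter()
        .zip(suffix)
        .map(|(&dim, &s)| (index / s) % dim)
        .collect()
}
```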

* update existing test cases

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix ut

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-07-03 06:50:02 +00:00
dennis zhuang
f7c363f969 fix: label_replace and label_join functions when used as sub-expressions (#6443)
* fix: label_replace and label_join functions in expressions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: remove update_fields

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: tql eval -> TQL EVAL

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: empty regex and not existing source label

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: simplify test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-07-03 05:34:22 +00:00
Ruihang Xia
5f2daae087 fix: remap column indices on overriding logical table partitions (#6446)
* fix: remap column indices on overriding logical table partitions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refactor map query

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-07-02 12:12:00 +00:00
Yingwen
b1b0d0136f fix: correct MAX_EXECUTION_TIME timeout calculation (#6444)
* feat: implement statement timeout in frontend instance

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fail fast when timeout is 0

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: update start time

Signed-off-by: evenyag <realevenyag@gmail.com>
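
A sketch of the kind of calculation this fix concerns, assuming the statement records its start `Instant`; the remaining budget is derived from that start time instead of being recomputed from scratch.

```rust
use std::time::{Duration, Instant};

// Illustrative: the remaining execution budget shrinks as time passes;
// saturating_sub avoids underflow once the deadline has been exceeded.
fn remaining_time(start: Instant, max_execution_time: Duration) -> Duration {
    max_execution_time.saturating_sub(start.elapsed())
}
```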

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-07-02 08:31:40 +00:00
Zhenchi
599f289f59 feat: add granularity and false_positive_rate options for indexes (#6416)
* feat: add `granularity` and `false_positive_rate` options for indexes

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* upgrade proto

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-07-02 07:33:39 +00:00
LFC
385f12a62e refactor: extract the common method for errors into tonic status (#6437)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-07-02 02:57:30 +00:00
fys
6b90e2b6b4 fix: allow clippy::print_stdout in cli crate (#6436)
* fix: allow clippy::print_stdout in cli crate

* add clippy lint options
2025-07-02 01:40:58 +00:00
ZonaHe
a4f3e96e96 feat: update dashboard to v0.10.2 (#6433)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-07-02 01:27:37 +00:00
Ruihang Xia
2b0f27da51 feat: don't allow creating flow with the same sink and source table (#6435)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-07-01 11:33:09 +00:00
Weny Xu
e0382eeb7c fix: fix dest_keys chunks bug in TombstoneManager (#6432)
* fix(meta): fix dest_keys_chunks bug in TombstoneManager

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fix typo

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix sqlness tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-07-01 09:20:13 +00:00
liyang
4aa6add8dc ci: add check-version script to check whether to push the latest image (#6415)
Signed-off-by: liyang <daviderli614@gmail.com>
2025-07-01 07:45:47 +00:00
zyy17
645988975e refactor: add RegionMigrationTriggerReason in RegionMigrationProcedureTask (#6413)
Signed-off-by: zyy17 <zyylsxm@gmail.com>
2025-06-30 11:16:41 +00:00
LFC
a203909de3 feat: extension range definition (#6386)
* feat: defined extension range

Signed-off-by: luofucong <luofc@foxmail.com>

* remove feature parameters

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-06-30 02:42:40 +00:00
discord9
616e76941a feat: flow query parallelism=1, faster queries with many windows, and a minimum of one time window (#6324)
* feat: flow query parallelism=1, faster queries when there are too many windows, and a minimum of one time window

Signed-off-by: discord9 <discord9@163.com>

* chore: default flow query parallelism=1

Signed-off-by: discord9 <discord9@163.com>

* refactor: use query options in flownode per review

Signed-off-by: discord9 <discord9@163.com>

* docs: update comment

Signed-off-by: discord9 <discord9@163.com>

* chore: fix test

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* chore: make config docs

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-06-30 02:17:01 +00:00
177 changed files with 5516 additions and 1807 deletions

.github/scripts/check-version.sh vendored Executable file
View File

@@ -0,0 +1,42 @@
#!/bin/bash
# Get current version
CURRENT_VERSION=$1
if [ -z "$CURRENT_VERSION" ]; then
echo "Error: Failed to get current version"
exit 1
fi
# Get the latest version from GitHub Releases
API_RESPONSE=$(curl -s "https://api.github.com/repos/GreptimeTeam/greptimedb/releases/latest")
if [ -z "$API_RESPONSE" ] || [ "$(echo "$API_RESPONSE" | jq -r '.message')" = "Not Found" ]; then
echo "Error: Failed to fetch latest version from GitHub"
exit 1
fi
# Get the latest version
LATEST_VERSION=$(echo "$API_RESPONSE" | jq -r '.tag_name')
if [ -z "$LATEST_VERSION" ] || [ "$LATEST_VERSION" = "null" ]; then
echo "Error: No valid version found in GitHub releases"
exit 1
fi
# Clean up the version number format (strip a possible 'v' prefix and -nightly suffix)
CLEAN_CURRENT=$(echo "$CURRENT_VERSION" | sed 's/^v//' | sed 's/-nightly-.*//')
CLEAN_LATEST=$(echo "$LATEST_VERSION" | sed 's/^v//' | sed 's/-nightly-.*//')
echo "Current version: $CLEAN_CURRENT"
echo "Latest release version: $CLEAN_LATEST"
# Use sort -V to compare versions
HIGHER_VERSION=$(printf "%s\n%s" "$CLEAN_CURRENT" "$CLEAN_LATEST" | sort -V | tail -n1)
if [ "$HIGHER_VERSION" = "$CLEAN_CURRENT" ]; then
echo "Current version ($CLEAN_CURRENT) is NEWER than or EQUAL to latest ($CLEAN_LATEST)"
echo "should-push-latest-tag=true" >> $GITHUB_OUTPUT
else
echo "Current version ($CLEAN_CURRENT) is OLDER than latest ($CLEAN_LATEST)"
echo "should-push-latest-tag=false" >> $GITHUB_OUTPUT
fi

View File

@@ -110,6 +110,8 @@ jobs:
# The 'version' use as the global tag name of the release workflow.
version: ${{ steps.create-version.outputs.version }}
should-push-latest-tag: ${{ steps.check-version.outputs.should-push-latest-tag }}
steps:
- name: Checkout
uses: actions/checkout@v4
@@ -135,6 +137,11 @@ jobs:
GITHUB_REF_NAME: ${{ github.ref_name }}
NIGHTLY_RELEASE_PREFIX: ${{ env.NIGHTLY_RELEASE_PREFIX }}
- name: Check version
id: check-version
run: |
./.github/scripts/check-version.sh "${{ steps.create-version.outputs.version }}"
- name: Allocate linux-amd64 runner
if: ${{ inputs.build_linux_amd64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
uses: ./.github/actions/start-runner
@@ -314,7 +321,7 @@ jobs:
image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
image-registry-password: ${{ secrets.DOCKERHUB_TOKEN }}
version: ${{ needs.allocate-runners.outputs.version }}
push-latest-tag: ${{ github.ref_type == 'tag' && !contains(github.ref_name, 'nightly') && github.event_name != 'schedule' }}
push-latest-tag: ${{ needs.allocate-runners.outputs.should-push-latest-tag == 'true' && github.ref_type == 'tag' && !contains(github.ref_name, 'nightly') && github.event_name != 'schedule' }}
- name: Set build image result
id: set-build-image-result
@@ -332,7 +339,7 @@ jobs:
build-windows-artifacts,
release-images-to-dockerhub,
]
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
# When we push to ACR, it's easy to fail due to some unknown network issues.
# However, we don't want to fail the whole workflow because of this.
# The ACR have daily sync with DockerHub, so don't worry about the image not being updated.
@@ -361,7 +368,7 @@ jobs:
dev-mode: false
upload-to-s3: true
update-version-info: true
push-latest-tag: ${{ github.ref_type == 'tag' && !contains(github.ref_name, 'nightly') && github.event_name != 'schedule' }}
push-latest-tag: ${{ needs.allocate-runners.outputs.should-push-latest-tag == 'true' && github.ref_type == 'tag' && !contains(github.ref_name, 'nightly') && github.event_name != 'schedule' }}
publish-github-release:
name: Create GitHub release and upload artifacts

Cargo.lock generated
View File

@@ -1975,7 +1975,6 @@ dependencies = [
"common-version",
"common-wal",
"datatypes",
"either",
"etcd-client",
"futures",
"humantime",
@@ -2103,7 +2102,6 @@ dependencies = [
"common-wal",
"datanode",
"datatypes",
"either",
"etcd-client",
"file-engine",
"flow",
@@ -4201,9 +4199,9 @@ dependencies = [
[[package]]
name = "either"
version = "1.13.0"
version = "1.15.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "60b1af1c220855b6ceac025d3f6ecdd2b7c4894bfe9cd9bda4fbb4bc7c0d4cf0"
checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
dependencies = [
"serde",
]
@@ -4702,6 +4700,7 @@ version = "0.16.0"
dependencies = [
"api",
"arc-swap",
"async-stream",
"async-trait",
"auth",
"bytes",
@@ -5147,7 +5146,7 @@ dependencies = [
[[package]]
name = "greptime-proto"
version = "0.1.0"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=464226cf8a4a22696503536a123d0b9e318582f4#464226cf8a4a22696503536a123d0b9e318582f4"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=ceb1af4fa9309ce65bda0367db7b384df2bb4d4f#ceb1af4fa9309ce65bda0367db7b384df2bb4d4f"
dependencies = [
"prost 0.13.5",
"serde",
@@ -6698,7 +6697,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4979f22fdb869068da03c9f7528f8297c6fd2606bc3a4affe42e6a823fdb8da4"
dependencies = [
"cfg-if",
"windows-targets 0.48.5",
"windows-targets 0.52.6",
]
[[package]]
@@ -7171,6 +7170,7 @@ dependencies = [
"deadpool",
"deadpool-postgres",
"derive_builder 0.20.1",
"either",
"etcd-client",
"futures",
"h2 0.3.26",
@@ -7242,6 +7242,7 @@ dependencies = [
"common-base",
"common-error",
"common-macro",
"common-meta",
"common-query",
"common-recordbatch",
"common-runtime",

View File

@@ -76,6 +76,8 @@ edition = "2021"
license = "Apache-2.0"
[workspace.lints]
clippy.print_stdout = "warn"
clippy.print_stderr = "warn"
clippy.dbg_macro = "warn"
clippy.implicit_clone = "warn"
clippy.result_large_err = "allow"
@@ -131,11 +133,12 @@ deadpool = "0.12"
deadpool-postgres = "0.14"
derive_builder = "0.20"
dotenv = "0.15"
either = "1.15"
etcd-client = "0.14"
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "464226cf8a4a22696503536a123d0b9e318582f4" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "ceb1af4fa9309ce65bda0367db7b384df2bb4d4f" }
hex = "0.4"
http = "1"
humantime = "2.1"

View File

@@ -75,9 +75,9 @@
| --------- | ----------- |
| [Unified Observability Data](https://docs.greptime.com/user-guide/concepts/why-greptimedb) | Store metrics, logs, and traces as timestamped, contextual wide events. Query via [SQL](https://docs.greptime.com/user-guide/query-data/sql), [PromQL](https://docs.greptime.com/user-guide/query-data/promql), and [streaming](https://docs.greptime.com/user-guide/flow-computation/overview). |
| [High Performance & Cost Effective](https://docs.greptime.com/user-guide/manage-data/data-index) | Written in Rust, with a distributed query engine, [rich indexing](https://docs.greptime.com/user-guide/manage-data/data-index), and optimized columnar storage, delivering sub-second responses at PB scale. |
| [Cloud-Native Architecture](https://docs.greptime.com/user-guide/concepts/architecture) | Designed for [Kubernetes](https://docs.greptime.com/user-guide/deployments/deploy-on-kubernetes/greptimedb-operator-management), with compute/storage separation, native object storage (AWS S3, Azure Blob, etc.) and seamless cross-cloud access. |
| [Cloud-Native Architecture](https://docs.greptime.com/user-guide/concepts/architecture) | Designed for [Kubernetes](https://docs.greptime.com/user-guide/deployments-administration/deploy-on-kubernetes/greptimedb-operator-management), with compute/storage separation, native object storage (AWS S3, Azure Blob, etc.) and seamless cross-cloud access. |
| [Developer-Friendly](https://docs.greptime.com/user-guide/protocols/overview) | Access via SQL/PromQL interfaces, REST API, MySQL/PostgreSQL protocols, and popular ingestion [protocols](https://docs.greptime.com/user-guide/protocols/overview). |
| [Flexible Deployment](https://docs.greptime.com/user-guide/deployments/overview) | Deploy anywhere: edge (including ARM/[Android](https://docs.greptime.com/user-guide/deployments/run-on-android)) or cloud, with unified APIs and efficient data sync. |
| [Flexible Deployment](https://docs.greptime.com/user-guide/deployments-administration/overview) | Deploy anywhere: edge (including ARM/[Android](https://docs.greptime.com/user-guide/deployments-administration/run-on-android)) or cloud, with unified APIs and efficient data sync. |
Learn more in [Why GreptimeDB](https://docs.greptime.com/user-guide/concepts/why-greptimedb) and [Observability 2.0 and the Database for It](https://greptime.com/blogs/2025-04-25-greptimedb-observability2-new-database).

View File

@@ -598,3 +598,5 @@
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
| `query` | -- | -- | -- |
| `query.parallelism` | Integer | `1` | Parallelism of the query engine for query sent by flownode.<br/>Default to 1, so it won't use too much cpu or memory |

View File

@@ -108,3 +108,8 @@ default_ratio = 1.0
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"
[query]
## Parallelism of the query engine for query sent by flownode.
## Default to 1, so it won't use too much cpu or memory
parallelism = 1

View File

@@ -48,4 +48,4 @@ Please refer to [SQL query](./query.sql) for GreptimeDB and Clickhouse, and [que
## Addition
- You can tune GreptimeDB's configuration to get better performance.
- You can setup GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/deployments/configuration#storage-options).
- You can setup GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/deployments-administration/configuration#storage-options).

View File

@@ -83,7 +83,7 @@ If you use the [Helm Chart](https://github.com/GreptimeTeam/helm-charts) to depl
- `monitoring.enabled=true`: Deploys a standalone GreptimeDB instance dedicated to monitoring the cluster;
- `grafana.enabled=true`: Deploys Grafana and automatically imports the monitoring dashboard;
The standalone GreptimeDB instance will collect metrics from your cluster, and the dashboard will be available in the Grafana UI. For detailed deployment instructions, please refer to our [Kubernetes deployment guide](https://docs.greptime.com/user-guide/deployments-administration/deploy-on-kubernetes/getting-started).
The standalone GreptimeDB instance will collect metrics from your cluster, and the dashboard will be available in the Grafana UI. For detailed deployment instructions, please refer to our [Kubernetes deployment guide](https://docs.greptime.com/user-guide/deployments-administration-administration/deploy-on-kubernetes/getting-started).
### Self-host Prometheus and import dashboards manually

View File

@@ -34,6 +34,7 @@ excludes = [
"src/sql/src/statements/drop/trigger.rs",
"src/sql/src/parsers/create_parser/trigger.rs",
"src/sql/src/parsers/show_parser/trigger.rs",
"src/mito2/src/extension.rs",
]
[properties]

View File

@@ -226,18 +226,20 @@ mod tests {
assert!(options.is_none());
let mut schema = ColumnSchema::new("test", ConcreteDataType::string_datatype(), true)
.with_fulltext_options(FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Bloom,
})
.with_fulltext_options(FulltextOptions::new_unchecked(
true,
FulltextAnalyzer::English,
false,
FulltextBackend::Bloom,
10240,
0.01,
))
.unwrap();
schema.set_inverted_index(true);
let options = options_from_column_schema(&schema).unwrap();
assert_eq!(
options.options.get(FULLTEXT_GRPC_KEY).unwrap(),
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\"}"
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":10240,\"false-positive-rate-in-10000\":100}"
);
assert_eq!(
options.options.get(INVERTED_INDEX_GRPC_KEY).unwrap(),
@@ -247,16 +249,18 @@ mod tests {
#[test]
fn test_options_with_fulltext() {
let fulltext = FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Bloom,
};
let fulltext = FulltextOptions::new_unchecked(
true,
FulltextAnalyzer::English,
false,
FulltextBackend::Bloom,
10240,
0.01,
);
let options = options_from_fulltext(&fulltext).unwrap().unwrap();
assert_eq!(
options.options.get(FULLTEXT_GRPC_KEY).unwrap(),
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\"}"
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":10240,\"false-positive-rate-in-10000\":100}"
);
}

View File

@@ -5,6 +5,7 @@ edition.workspace = true
license.workspace = true
[features]
enterprise = []
testing = []
[lints]

View File

@@ -14,9 +14,11 @@
pub use client::{CachedKvBackend, CachedKvBackendBuilder, MetaKvBackend};
mod builder;
mod client;
mod manager;
mod table_cache;
pub use builder::KvBackendCatalogManagerBuilder;
pub use manager::KvBackendCatalogManager;
pub use table_cache::{new_table_cache, TableCache, TableCacheRef};

View File

@@ -0,0 +1,131 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_catalog::consts::DEFAULT_CATALOG_NAME;
use common_meta::cache::LayeredCacheRegistryRef;
use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::TableMetadataManager;
use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureManagerRef;
use moka::sync::Cache;
use partition::manager::PartitionRuleManager;
#[cfg(feature = "enterprise")]
use crate::information_schema::InformationSchemaTableFactoryRef;
use crate::information_schema::{InformationExtensionRef, InformationSchemaProvider};
use crate::kvbackend::manager::{SystemCatalog, CATALOG_CACHE_MAX_CAPACITY};
use crate::kvbackend::KvBackendCatalogManager;
use crate::process_manager::ProcessManagerRef;
use crate::system_schema::pg_catalog::PGCatalogProvider;
pub struct KvBackendCatalogManagerBuilder {
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
procedure_manager: Option<ProcedureManagerRef>,
process_manager: Option<ProcessManagerRef>,
#[cfg(feature = "enterprise")]
extra_information_table_factories:
std::collections::HashMap<String, InformationSchemaTableFactoryRef>,
}
impl KvBackendCatalogManagerBuilder {
pub fn new(
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
) -> Self {
Self {
information_extension,
backend,
cache_registry,
procedure_manager: None,
process_manager: None,
#[cfg(feature = "enterprise")]
extra_information_table_factories: std::collections::HashMap::new(),
}
}
pub fn with_procedure_manager(mut self, procedure_manager: ProcedureManagerRef) -> Self {
self.procedure_manager = Some(procedure_manager);
self
}
pub fn with_process_manager(mut self, process_manager: ProcessManagerRef) -> Self {
self.process_manager = Some(process_manager);
self
}
/// Sets the extra information tables.
#[cfg(feature = "enterprise")]
pub fn with_extra_information_table_factories(
mut self,
factories: std::collections::HashMap<String, InformationSchemaTableFactoryRef>,
) -> Self {
self.extra_information_table_factories = factories;
self
}
pub fn build(self) -> Arc<KvBackendCatalogManager> {
let Self {
information_extension,
backend,
cache_registry,
procedure_manager,
process_manager,
#[cfg(feature = "enterprise")]
extra_information_table_factories,
} = self;
Arc::new_cyclic(|me| KvBackendCatalogManager {
information_extension,
partition_manager: Arc::new(PartitionRuleManager::new(
backend.clone(),
cache_registry
.get()
.expect("Failed to get table_route_cache"),
)),
table_metadata_manager: Arc::new(TableMetadataManager::new(backend.clone())),
system_catalog: SystemCatalog {
catalog_manager: me.clone(),
catalog_cache: Cache::new(CATALOG_CACHE_MAX_CAPACITY),
pg_catalog_cache: Cache::new(CATALOG_CACHE_MAX_CAPACITY),
information_schema_provider: {
let provider = InformationSchemaProvider::new(
DEFAULT_CATALOG_NAME.to_string(),
me.clone(),
Arc::new(FlowMetadataManager::new(backend.clone())),
process_manager.clone(),
backend.clone(),
);
#[cfg(feature = "enterprise")]
let provider = provider
.with_extra_table_factories(extra_information_table_factories.clone());
Arc::new(provider)
},
pg_catalog_provider: Arc::new(PGCatalogProvider::new(
DEFAULT_CATALOG_NAME.to_string(),
me.clone(),
)),
backend,
process_manager,
#[cfg(feature = "enterprise")]
extra_information_table_factories,
},
cache_registry,
procedure_manager,
})
}
}

View File

@@ -13,7 +13,7 @@
// limitations under the License.
use std::any::Any;
use std::collections::{BTreeSet, HashSet};
use std::collections::BTreeSet;
use std::sync::{Arc, Weak};
use async_stream::try_stream;
@@ -30,13 +30,13 @@ use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::schema_name::SchemaNameKey;
use common_meta::key::table_info::{TableInfoManager, TableInfoValue};
use common_meta::key::table_name::TableNameKey;
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
use common_meta::key::TableMetadataManagerRef;
use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureManagerRef;
use futures_util::stream::BoxStream;
use futures_util::{StreamExt, TryStreamExt};
use moka::sync::Cache;
use partition::manager::{PartitionRuleManager, PartitionRuleManagerRef};
use partition::manager::PartitionRuleManagerRef;
use session::context::{Channel, QueryContext};
use snafu::prelude::*;
use store_api::metric_engine_consts::METRIC_ENGINE_NAME;
@@ -52,6 +52,8 @@ use crate::error::{
CacheNotFoundSnafu, GetTableCacheSnafu, InvalidTableInfoInCatalogSnafu, ListCatalogsSnafu,
ListSchemasSnafu, ListTablesSnafu, Result, TableMetadataManagerSnafu,
};
#[cfg(feature = "enterprise")]
use crate::information_schema::InformationSchemaTableFactoryRef;
use crate::information_schema::{InformationExtensionRef, InformationSchemaProvider};
use crate::kvbackend::TableCacheRef;
use crate::process_manager::ProcessManagerRef;
@@ -67,60 +69,22 @@ use crate::CatalogManager;
#[derive(Clone)]
pub struct KvBackendCatalogManager {
/// Provides the extension methods for the `information_schema` tables
information_extension: InformationExtensionRef,
pub(super) information_extension: InformationExtensionRef,
/// Manages partition rules.
partition_manager: PartitionRuleManagerRef,
pub(super) partition_manager: PartitionRuleManagerRef,
/// Manages table metadata.
table_metadata_manager: TableMetadataManagerRef,
pub(super) table_metadata_manager: TableMetadataManagerRef,
/// A sub-CatalogManager that handles system tables
system_catalog: SystemCatalog,
pub(super) system_catalog: SystemCatalog,
/// Cache registry for all caches.
cache_registry: LayeredCacheRegistryRef,
pub(super) cache_registry: LayeredCacheRegistryRef,
/// Only available in `Standalone` mode.
procedure_manager: Option<ProcedureManagerRef>,
pub(super) procedure_manager: Option<ProcedureManagerRef>,
}
const CATALOG_CACHE_MAX_CAPACITY: u64 = 128;
pub(super) const CATALOG_CACHE_MAX_CAPACITY: u64 = 128;
impl KvBackendCatalogManager {
pub fn new(
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
procedure_manager: Option<ProcedureManagerRef>,
process_manager: Option<ProcessManagerRef>,
) -> Arc<Self> {
Arc::new_cyclic(|me| Self {
information_extension,
partition_manager: Arc::new(PartitionRuleManager::new(
backend.clone(),
cache_registry
.get()
.expect("Failed to get table_route_cache"),
)),
table_metadata_manager: Arc::new(TableMetadataManager::new(backend.clone())),
system_catalog: SystemCatalog {
catalog_manager: me.clone(),
catalog_cache: Cache::new(CATALOG_CACHE_MAX_CAPACITY),
pg_catalog_cache: Cache::new(CATALOG_CACHE_MAX_CAPACITY),
information_schema_provider: Arc::new(InformationSchemaProvider::new(
DEFAULT_CATALOG_NAME.to_string(),
me.clone(),
Arc::new(FlowMetadataManager::new(backend.clone())),
process_manager.clone(),
)),
pg_catalog_provider: Arc::new(PGCatalogProvider::new(
DEFAULT_CATALOG_NAME.to_string(),
me.clone(),
)),
backend,
process_manager,
},
cache_registry,
procedure_manager,
})
}
pub fn view_info_cache(&self) -> Result<ViewInfoCacheRef> {
self.cache_registry.get().context(CacheNotFoundSnafu {
name: "view_info_cache",
@@ -166,32 +130,29 @@ impl KvBackendCatalogManager {
.context(TableMetadataManagerSnafu)?
{
let mut new_table_info = (*table.table_info()).clone();
// Gather all column names from the logical table
let logical_column_names: HashSet<_> = new_table_info
.meta
.schema
.column_schemas()
.iter()
.map(|col| &col.name)
.collect();
// Only preserve partition key indices where the corresponding columns exist in logical table
// Remap partition key indices from physical table to logical table
new_table_info.meta.partition_key_indices = physical_table_info_value
.table_info
.meta
.partition_key_indices
.iter()
.filter(|&&index| {
.filter_map(|&physical_index| {
// Get the column name from the physical table using the physical index
physical_table_info_value
.table_info
.meta
.schema
.column_schemas
.get(index)
.map(|physical_column| logical_column_names.contains(&physical_column.name))
.unwrap_or(false)
.get(physical_index)
.and_then(|physical_column| {
// Find the corresponding index in the logical table schema
new_table_info
.meta
.schema
.column_index_by_name(physical_column.name.as_str())
})
})
.cloned()
.collect();
let new_table = DistTable::table(Arc::new(new_table_info));
@@ -510,16 +471,19 @@ fn build_table(table_info_value: TableInfoValue) -> Result<TableRef> {
/// - information_schema.{tables}
/// - pg_catalog.{tables}
#[derive(Clone)]
struct SystemCatalog {
catalog_manager: Weak<KvBackendCatalogManager>,
catalog_cache: Cache<String, Arc<InformationSchemaProvider>>,
pg_catalog_cache: Cache<String, Arc<PGCatalogProvider>>,
pub(super) struct SystemCatalog {
pub(super) catalog_manager: Weak<KvBackendCatalogManager>,
pub(super) catalog_cache: Cache<String, Arc<InformationSchemaProvider>>,
pub(super) pg_catalog_cache: Cache<String, Arc<PGCatalogProvider>>,
// system_schema_provider for default catalog
information_schema_provider: Arc<InformationSchemaProvider>,
pg_catalog_provider: Arc<PGCatalogProvider>,
backend: KvBackendRef,
process_manager: Option<ProcessManagerRef>,
pub(super) information_schema_provider: Arc<InformationSchemaProvider>,
pub(super) pg_catalog_provider: Arc<PGCatalogProvider>,
pub(super) backend: KvBackendRef,
pub(super) process_manager: Option<ProcessManagerRef>,
#[cfg(feature = "enterprise")]
pub(super) extra_information_table_factories:
std::collections::HashMap<String, InformationSchemaTableFactoryRef>,
}
impl SystemCatalog {
@@ -583,12 +547,17 @@ impl SystemCatalog {
if schema == INFORMATION_SCHEMA_NAME {
let information_schema_provider =
self.catalog_cache.get_with_by_ref(catalog, move || {
Arc::new(InformationSchemaProvider::new(
let provider = InformationSchemaProvider::new(
catalog.to_string(),
self.catalog_manager.clone(),
Arc::new(FlowMetadataManager::new(self.backend.clone())),
self.process_manager.clone(),
))
self.backend.clone(),
);
#[cfg(feature = "enterprise")]
let provider = provider
.with_extra_table_factories(self.extra_information_table_factories.clone());
Arc::new(provider)
});
information_schema_provider.table(table_name)
} else if schema == PG_CATALOG_NAME && channel == Channel::Postgres {

View File

@@ -352,11 +352,13 @@ impl MemoryCatalogManager {
}
fn create_catalog_entry(self: &Arc<Self>, catalog: String) -> SchemaEntries {
let backend = Arc::new(MemoryKvBackend::new());
let information_schema_provider = InformationSchemaProvider::new(
catalog,
Arc::downgrade(self) as Weak<dyn CatalogManager>,
Arc::new(FlowMetadataManager::new(Arc::new(MemoryKvBackend::new()))),
Arc::new(FlowMetadataManager::new(backend.clone())),
None, // we don't need ProcessManager on regions server.
backend,
);
let information_schema = information_schema_provider.tables().clone();

View File

@@ -15,7 +15,7 @@
pub mod information_schema;
mod memory_table;
pub mod pg_catalog;
mod predicate;
pub mod predicate;
mod utils;
use std::collections::HashMap;
@@ -96,7 +96,7 @@ trait SystemSchemaProviderInner {
}
}
pub(crate) trait SystemTable {
pub trait SystemTable {
fn table_id(&self) -> TableId;
fn table_name(&self) -> &'static str;
@@ -110,7 +110,7 @@ pub(crate) trait SystemTable {
}
}
pub(crate) type SystemTableRef = Arc<dyn SystemTable + Send + Sync>;
pub type SystemTableRef = Arc<dyn SystemTable + Send + Sync>;
struct SystemTableDataSource {
table: SystemTableRef,

View File

@@ -38,6 +38,7 @@ use common_meta::cluster::NodeInfo;
use common_meta::datanode::RegionStat;
use common_meta::key::flow::flow_state::FlowStat;
use common_meta::key::flow::FlowMetadataManager;
use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureInfo;
use common_recordbatch::SendableRecordBatchStream;
use datatypes::schema::SchemaRef;
@@ -112,6 +113,25 @@ macro_rules! setup_memory_table {
};
}
#[cfg(feature = "enterprise")]
pub struct MakeInformationTableRequest {
pub catalog_name: String,
pub catalog_manager: Weak<dyn CatalogManager>,
pub kv_backend: KvBackendRef,
}
/// A factory trait for making information schema tables.
///
/// This trait allows for extensibility of the information schema by providing
/// a way to dynamically create custom information schema tables.
#[cfg(feature = "enterprise")]
pub trait InformationSchemaTableFactory {
fn make_information_table(&self, req: MakeInformationTableRequest) -> SystemTableRef;
}
#[cfg(feature = "enterprise")]
pub type InformationSchemaTableFactoryRef = Arc<dyn InformationSchemaTableFactory + Send + Sync>;
/// The `information_schema` tables info provider.
pub struct InformationSchemaProvider {
catalog_name: String,
@@ -119,6 +139,10 @@ pub struct InformationSchemaProvider {
process_manager: Option<ProcessManagerRef>,
flow_metadata_manager: Arc<FlowMetadataManager>,
tables: HashMap<String, TableRef>,
#[allow(dead_code)]
kv_backend: KvBackendRef,
#[cfg(feature = "enterprise")]
extra_table_factories: HashMap<String, InformationSchemaTableFactoryRef>,
}
impl SystemSchemaProvider for InformationSchemaProvider {
@@ -128,6 +152,7 @@ impl SystemSchemaProvider for InformationSchemaProvider {
&self.tables
}
}
impl SystemSchemaProviderInner for InformationSchemaProvider {
fn catalog_name(&self) -> &str {
&self.catalog_name
@@ -215,7 +240,22 @@ impl SystemSchemaProviderInner for InformationSchemaProvider {
.process_manager
.as_ref()
.map(|p| Arc::new(InformationSchemaProcessList::new(p.clone())) as _),
_ => None,
table_name => {
#[cfg(feature = "enterprise")]
return self.extra_table_factories.get(table_name).map(|factory| {
let req = MakeInformationTableRequest {
catalog_name: self.catalog_name.clone(),
catalog_manager: self.catalog_manager.clone(),
kv_backend: self.kv_backend.clone(),
};
factory.make_information_table(req)
});
#[cfg(not(feature = "enterprise"))]
{
let _ = table_name;
None
}
}
}
}
}
@@ -226,6 +266,7 @@ impl InformationSchemaProvider {
catalog_manager: Weak<dyn CatalogManager>,
flow_metadata_manager: Arc<FlowMetadataManager>,
process_manager: Option<ProcessManagerRef>,
kv_backend: KvBackendRef,
) -> Self {
let mut provider = Self {
catalog_name,
@@ -233,6 +274,9 @@ impl InformationSchemaProvider {
flow_metadata_manager,
process_manager,
tables: HashMap::new(),
kv_backend,
#[cfg(feature = "enterprise")]
extra_table_factories: HashMap::new(),
};
provider.build_tables();
@@ -240,6 +284,16 @@ impl InformationSchemaProvider {
provider
}
#[cfg(feature = "enterprise")]
pub(crate) fn with_extra_table_factories(
mut self,
factories: HashMap<String, InformationSchemaTableFactoryRef>,
) -> Self {
self.extra_table_factories = factories;
self.build_tables();
self
}
fn build_tables(&mut self) {
let mut tables = HashMap::new();
@@ -290,16 +344,19 @@ impl InformationSchemaProvider {
if let Some(process_list) = self.build_table(PROCESS_LIST) {
tables.insert(PROCESS_LIST.to_string(), process_list);
}
#[cfg(feature = "enterprise")]
for name in self.extra_table_factories.keys() {
tables.insert(name.to_string(), self.build_table(name).expect(name));
}
// Add memory tables
for name in MEMORY_TABLES.iter() {
tables.insert((*name).to_string(), self.build_table(name).expect(name));
}
self.tables = tables;
}
}
trait InformationTable {
pub trait InformationTable {
fn table_id(&self) -> TableId;
fn table_name(&self) -> &'static str;

View File

@@ -48,3 +48,4 @@ pub const FLOWS: &str = "flows";
pub const PROCEDURE_INFO: &str = "procedure_info";
pub const REGION_STATISTICS: &str = "region_statistics";
pub const PROCESS_LIST: &str = "process_list";
pub const TRIGGER_LIST: &str = "trigger_list";

View File

@@ -207,6 +207,7 @@ mod tests {
use session::context::QueryContext;
use super::*;
use crate::kvbackend::KvBackendCatalogManagerBuilder;
use crate::memory::MemoryCatalogManager;
#[test]
@@ -323,13 +324,13 @@ mod tests {
.build(),
);
let catalog_manager = KvBackendCatalogManager::new(
let catalog_manager = KvBackendCatalogManagerBuilder::new(
Arc::new(NoopInformationExtension),
backend.clone(),
layered_cache_registry,
None,
None,
);
)
.build();
let table_metadata_manager = TableMetadataManager::new(backend);
let mut view_info = common_meta::key::test_utils::new_test_table_info(1024, vec![]);
view_info.table_type = TableType::View;

View File

@@ -43,7 +43,6 @@ common-time.workspace = true
common-version.workspace = true
common-wal.workspace = true
datatypes.workspace = true
either = "1.8"
etcd-client.workspace = true
futures.workspace = true
humantime.workspace = true

View File

@@ -160,6 +160,7 @@ fn create_table_info(table_id: TableId, table_name: TableName) -> RawTableInfo {
options: Default::default(),
region_numbers: (1..=100).collect(),
partition_key_indices: vec![],
column_ids: vec![],
};
RawTableInfo {

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#[allow(clippy::print_stdout)]
#![allow(clippy::print_stdout)]
mod bench;
mod data;
mod database;

View File

@@ -31,7 +31,7 @@ use base64::prelude::BASE64_STANDARD;
use base64::Engine;
use common_catalog::build_db_string;
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_error::ext::{BoxedError, ErrorExt};
use common_error::ext::BoxedError;
use common_grpc::flight::do_put::DoPutResponse;
use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_query::Output;
@@ -48,7 +48,7 @@ use tonic::transport::Channel;
use crate::error::{
ConvertFlightDataSnafu, Error, FlightGetSnafu, IllegalFlightMessagesSnafu,
InvalidTonicMetadataValueSnafu, ServerSnafu,
InvalidTonicMetadataValueSnafu,
};
use crate::{error, from_grpc_response, Client, Result};
@@ -196,12 +196,22 @@ impl Database {
/// Retry if connection fails, max_retries is the max number of retries, so the total wait time
/// is `max_retries * GRPC_CONN_TIMEOUT`
pub async fn handle_with_retry(&self, request: Request, max_retries: u32) -> Result<u32> {
pub async fn handle_with_retry(
&self,
request: Request,
max_retries: u32,
hints: &[(&str, &str)],
) -> Result<u32> {
let mut client = make_database_client(&self.client)?.inner;
let mut retries = 0;
let request = self.to_rpc_request(request);
loop {
let raw_response = client.handle(request.clone()).await;
let mut tonic_request = tonic::Request::new(request.clone());
let metadata = tonic_request.metadata_mut();
Self::put_hints(metadata, hints)?;
let raw_response = client.handle(tonic_request).await;
match (raw_response, retries < max_retries) {
(Ok(resp), _) => return from_grpc_response(resp.into_inner()),
(Err(err), true) => {
@@ -292,21 +302,16 @@ impl Database {
let response = client.mut_inner().do_get(request).await.or_else(|e| {
let tonic_code = e.code();
let e: Error = e.into();
let code = e.status_code();
let msg = e.to_string();
let error =
Err(BoxedError::new(ServerSnafu { code, msg }.build())).with_context(|_| {
FlightGetSnafu {
addr: client.addr().to_string(),
tonic_code,
}
});
error!(
"Failed to do Flight get, addr: {}, code: {}, source: {:?}",
client.addr(),
tonic_code,
error
e
);
let error = Err(BoxedError::new(e)).with_context(|_| FlightGetSnafu {
addr: client.addr().to_string(),
tonic_code,
});
error
})?;
@@ -436,8 +441,11 @@ mod tests {
use api::v1::auth_header::AuthScheme;
use api::v1::{AuthHeader, Basic};
use common_error::status_code::StatusCode;
use tonic::{Code, Status};
use super::*;
use crate::error::TonicSnafu;
#[test]
fn test_flight_ctx() {
@@ -460,4 +468,19 @@ mod tests {
})
)
}
#[test]
fn test_from_tonic_status() {
let expected = TonicSnafu {
code: StatusCode::Internal,
msg: "blabla".to_string(),
tonic_code: Code::Internal,
}
.build();
let status = Status::new(Code::Internal, "blabla");
let actual: Error = status.into();
assert_eq!(expected.to_string(), actual.to_string());
}
}

View File

@@ -14,13 +14,13 @@
use std::any::Any;
use common_error::define_from_tonic_status;
use common_error::ext::{BoxedError, ErrorExt};
use common_error::status_code::{convert_tonic_code_to_status_code, StatusCode};
use common_error::{GREPTIME_DB_HEADER_ERROR_CODE, GREPTIME_DB_HEADER_ERROR_MSG};
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
use snafu::{location, Location, Snafu};
use tonic::metadata::errors::InvalidMetadataValue;
use tonic::{Code, Status};
use tonic::Code;
#[derive(Snafu)]
#[snafu(visibility(pub))]
@@ -124,6 +124,15 @@ pub enum Error {
location: Location,
source: datatypes::error::Error,
},
#[snafu(display("{}", msg))]
Tonic {
code: StatusCode,
msg: String,
tonic_code: Code,
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -135,7 +144,7 @@ impl ErrorExt for Error {
| Error::MissingField { .. }
| Error::IllegalDatabaseResponse { .. } => StatusCode::Internal,
Error::Server { code, .. } => *code,
Error::Server { code, .. } | Error::Tonic { code, .. } => *code,
Error::FlightGet { source, .. }
| Error::RegionServer { source, .. }
| Error::FlowServer { source, .. } => source.status_code(),
@@ -153,34 +162,7 @@ impl ErrorExt for Error {
}
}
impl From<Status> for Error {
fn from(e: Status) -> Self {
fn get_metadata_value(e: &Status, key: &str) -> Option<String> {
e.metadata()
.get(key)
.and_then(|v| String::from_utf8(v.as_bytes().to_vec()).ok())
}
let code = get_metadata_value(&e, GREPTIME_DB_HEADER_ERROR_CODE).and_then(|s| {
if let Ok(code) = s.parse::<u32>() {
StatusCode::from_u32(code)
} else {
None
}
});
let tonic_code = e.code();
let code = code.unwrap_or_else(|| convert_tonic_code_to_status_code(tonic_code));
let msg = get_metadata_value(&e, GREPTIME_DB_HEADER_ERROR_MSG)
.unwrap_or_else(|| e.message().to_string());
Self::Server {
code,
msg,
location: location!(),
}
}
}
define_from_tonic_status!(Error, Tonic);
impl Error {
pub fn should_retry(&self) -> bool {

View File

@@ -21,7 +21,7 @@ use arc_swap::ArcSwapOption;
use arrow_flight::Ticket;
use async_stream::stream;
use async_trait::async_trait;
use common_error::ext::{BoxedError, ErrorExt};
use common_error::ext::BoxedError;
use common_error::status_code::StatusCode;
use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_meta::error::{self as meta_error, Result as MetaResult};
@@ -107,24 +107,18 @@ impl RegionRequester {
.mut_inner()
.do_get(ticket)
.await
.map_err(|e| {
.or_else(|e| {
let tonic_code = e.code();
let e: error::Error = e.into();
let code = e.status_code();
let msg = e.to_string();
let error = ServerSnafu { code, msg }
.fail::<()>()
.map_err(BoxedError::new)
.with_context(|_| FlightGetSnafu {
tonic_code,
addr: flight_client.addr().to_string(),
})
.unwrap_err();
error!(
e; "Failed to do Flight get, addr: {}, code: {}",
flight_client.addr(),
tonic_code
);
let error = Err(BoxedError::new(e)).with_context(|_| FlightGetSnafu {
addr: flight_client.addr().to_string(),
tonic_code,
});
error
})?;

View File

@@ -16,7 +16,7 @@ default = [
"meta-srv/pg_kvbackend",
"meta-srv/mysql_kvbackend",
]
enterprise = ["common-meta/enterprise", "frontend/enterprise", "meta-srv/enterprise"]
enterprise = ["common-meta/enterprise", "frontend/enterprise", "meta-srv/enterprise", "catalog/enterprise"]
tokio-console = ["common-telemetry/tokio-console"]
[lints]
@@ -52,7 +52,6 @@ common-version.workspace = true
common-wal.workspace = true
datanode.workspace = true
datatypes.workspace = true
either = "1.8"
etcd-client.workspace = true
file-engine.workspace = true
flow.workspace = true

View File

@@ -18,7 +18,7 @@ use std::time::Duration;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_extension::DistributedInformationExtension;
use catalog::kvbackend::{CachedKvBackendBuilder, KvBackendCatalogManager, MetaKvBackend};
use catalog::kvbackend::{CachedKvBackendBuilder, KvBackendCatalogManagerBuilder, MetaKvBackend};
use clap::Parser;
use client::client_manager::NodeClients;
use common_base::Plugins;
@@ -342,13 +342,12 @@ impl StartCommand {
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
let catalog_manager = KvBackendCatalogManagerBuilder::new(
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,
None,
);
)
.build();
let table_metadata_manager =
Arc::new(TableMetadataManager::new(cached_meta_backend.clone()));
@@ -371,8 +370,11 @@ impl StartCommand {
let flow_metadata_manager = Arc::new(FlowMetadataManager::new(cached_meta_backend.clone()));
let flow_auth_header = get_flow_auth_options(&opts).context(StartFlownodeSnafu)?;
let frontend_client =
FrontendClient::from_meta_client(meta_client.clone(), flow_auth_header);
let frontend_client = FrontendClient::from_meta_client(
meta_client.clone(),
flow_auth_header,
opts.query.clone(),
);
let frontend_client = Arc::new(frontend_client);
let flownode_builder = FlownodeBuilder::new(
opts.clone(),

View File

@@ -19,7 +19,7 @@ use std::time::Duration;
use async_trait::async_trait;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_extension::DistributedInformationExtension;
use catalog::kvbackend::{CachedKvBackendBuilder, KvBackendCatalogManager, MetaKvBackend};
use catalog::kvbackend::{CachedKvBackendBuilder, KvBackendCatalogManagerBuilder, MetaKvBackend};
use catalog::process_manager::ProcessManager;
use clap::Parser;
use client::client_manager::NodeClients;
@@ -102,7 +102,7 @@ impl App for Instance {
#[derive(Parser)]
pub struct Command {
#[clap(subcommand)]
subcmd: SubCommand,
pub subcmd: SubCommand,
}
impl Command {
@@ -116,7 +116,7 @@ impl Command {
}
#[derive(Parser)]
enum SubCommand {
pub enum SubCommand {
Start(StartCommand),
}
@@ -153,7 +153,7 @@ pub struct StartCommand {
#[clap(long)]
postgres_addr: Option<String>,
#[clap(short, long)]
config_file: Option<String>,
pub config_file: Option<String>,
#[clap(short, long)]
influxdb_enable: Option<bool>,
#[clap(long, value_delimiter = ',', num_args = 1..)]
@@ -169,7 +169,7 @@ pub struct StartCommand {
#[clap(long)]
disable_dashboard: Option<bool>,
#[clap(long, default_value = "GREPTIMEDB_FRONTEND")]
env_prefix: String,
pub env_prefix: String,
}
impl StartCommand {
@@ -350,13 +350,20 @@ impl StartCommand {
addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr)),
Some(meta_client.clone()),
));
let catalog_manager = KvBackendCatalogManager::new(
let builder = KvBackendCatalogManagerBuilder::new(
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,
Some(process_manager.clone()),
);
)
.with_process_manager(process_manager.clone());
#[cfg(feature = "enterprise")]
let builder = if let Some(factories) = plugins.get() {
builder.with_extra_information_table_factories(factories)
} else {
builder
};
let catalog_manager = builder.build();
let executor = HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),

View File

@@ -54,6 +54,10 @@ impl Instance {
pub fn get_inner(&self) -> &MetasrvInstance {
&self.instance
}
pub fn mut_inner(&mut self) -> &mut MetasrvInstance {
&mut self.instance
}
}
#[async_trait]

View File

@@ -20,7 +20,7 @@ use std::{fs, path};
use async_trait::async_trait;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_schema::InformationExtension;
use catalog::kvbackend::KvBackendCatalogManager;
use catalog::kvbackend::KvBackendCatalogManagerBuilder;
use catalog::process_manager::ProcessManager;
use clap::Parser;
use client::api::v1::meta::RegionRole;
@@ -544,13 +544,20 @@ impl StartCommand {
));
let process_manager = Arc::new(ProcessManager::new(opts.grpc.server_addr.clone(), None));
let catalog_manager = KvBackendCatalogManager::new(
let builder = KvBackendCatalogManagerBuilder::new(
information_extension.clone(),
kv_backend.clone(),
layered_cache_registry.clone(),
Some(procedure_manager.clone()),
Some(process_manager.clone()),
);
)
.with_procedure_manager(procedure_manager.clone())
.with_process_manager(process_manager.clone());
#[cfg(feature = "enterprise")]
let builder = if let Some(factories) = plugins.get() {
builder.with_extra_information_table_factories(factories)
} else {
builder
};
let catalog_manager = builder.build();
let table_metadata_manager =
Self::create_table_metadata_manager(kv_backend.clone()).await?;
@@ -564,7 +571,7 @@ impl StartCommand {
// for standalone not use grpc, but get a handler to frontend grpc client without
// actually make a connection
let (frontend_client, frontend_instance_handler) =
FrontendClient::from_empty_grpc_handler();
FrontendClient::from_empty_grpc_handler(opts.query.clone());
let frontend_client = Arc::new(frontend_client);
let flow_builder = FlownodeBuilder::new(
flownode_options,

View File

@@ -30,6 +30,7 @@ use meta_srv::metasrv::MetasrvOptions;
use meta_srv::selector::SelectorType;
use metric_engine::config::EngineConfig as MetricEngineConfig;
use mito2::config::MitoConfig;
use query::options::QueryOptions;
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
use servers::http::HttpOptions;
@@ -216,12 +217,15 @@ fn test_load_flownode_example_config() {
dir: format!("{}/{}", DEFAULT_DATA_HOME, DEFAULT_LOGGING_DIR),
level: Some("info".to_string()),
otlp_endpoint: Some(DEFAULT_OTLP_HTTP_ENDPOINT.to_string()),
otlp_export_protocol: Some(common_telemetry::logging::OtlpExportProtocol::Http),
tracing_sample_ratio: Some(Default::default()),
..Default::default()
},
tracing: Default::default(),
heartbeat: Default::default(),
query: Default::default(),
// flownode deliberately use a slower query parallelism
// to avoid overwhelming the frontend with too many queries
query: QueryOptions { parallelism: 1 },
meta_client: Some(MetaClientOptions {
metasrv_addrs: vec!["127.0.0.1:3002".to_string()],
timeout: Duration::from_secs(3),

View File

@@ -78,7 +78,7 @@ pub const INFORMATION_SCHEMA_ROUTINES_TABLE_ID: u32 = 21;
pub const INFORMATION_SCHEMA_SCHEMA_PRIVILEGES_TABLE_ID: u32 = 22;
/// id for information_schema.TABLE_PRIVILEGES
pub const INFORMATION_SCHEMA_TABLE_PRIVILEGES_TABLE_ID: u32 = 23;
/// id for information_schema.TRIGGERS
/// id for information_schema.TRIGGERS (for mysql)
pub const INFORMATION_SCHEMA_TRIGGERS_TABLE_ID: u32 = 24;
/// id for information_schema.GLOBAL_STATUS
pub const INFORMATION_SCHEMA_GLOBAL_STATUS_TABLE_ID: u32 = 25;
@@ -104,6 +104,8 @@ pub const INFORMATION_SCHEMA_PROCEDURE_INFO_TABLE_ID: u32 = 34;
pub const INFORMATION_SCHEMA_REGION_STATISTICS_TABLE_ID: u32 = 35;
/// id for information_schema.process_list
pub const INFORMATION_SCHEMA_PROCESS_LIST_TABLE_ID: u32 = 36;
/// id for information_schema.trigger_list (for greptimedb trigger)
pub const INFORMATION_SCHEMA_TRIGGER_TABLE_ID: u32 = 37;
// ----- End of information_schema tables -----

View File

@@ -239,6 +239,48 @@ impl fmt::Display for StatusCode {
}
}
#[macro_export]
macro_rules! define_from_tonic_status {
($Error: ty, $Variant: ident) => {
impl From<tonic::Status> for $Error {
fn from(e: tonic::Status) -> Self {
use snafu::location;
fn metadata_value(e: &tonic::Status, key: &str) -> Option<String> {
e.metadata()
.get(key)
.and_then(|v| String::from_utf8(v.as_bytes().to_vec()).ok())
}
let code = metadata_value(&e, $crate::GREPTIME_DB_HEADER_ERROR_CODE)
.and_then(|s| {
if let Ok(code) = s.parse::<u32>() {
StatusCode::from_u32(code)
} else {
None
}
})
.unwrap_or_else(|| match e.code() {
tonic::Code::Cancelled => StatusCode::Cancelled,
tonic::Code::DeadlineExceeded => StatusCode::DeadlineExceeded,
_ => StatusCode::Internal,
});
let msg = metadata_value(&e, $crate::GREPTIME_DB_HEADER_ERROR_MSG)
.unwrap_or_else(|| e.message().to_string());
// TODO(LFC): Make the error variant defined automatically.
Self::$Variant {
code,
msg,
tonic_code: e.code(),
location: location!(),
}
}
}
};
}
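
The generated `From<tonic::Status>` impl requires the target error type to expose a variant with `code`, `msg`, `tonic_code` and `location` fields. A minimal usage sketch, assuming the macro and the `GREPTIME_DB_HEADER_*` constants live in the `common_error` crate as the `$crate::` paths suggest; the `ServerError` enum and its `Tonic` variant below are hypothetical, not part of this change:

```rust
use common_error::define_from_tonic_status;
use common_error::status_code::StatusCode;
use snafu::Location;

// Hypothetical error type; only the variant's field layout matters here.
#[derive(Debug)]
pub enum ServerError {
    Tonic {
        code: StatusCode,
        msg: String,
        tonic_code: tonic::Code,
        location: Location,
    },
}

// Expands to `impl From<tonic::Status> for ServerError`, preferring the
// error code/message carried in the gRPC metadata headers and falling
// back to the tonic code when those headers are absent.
define_from_tonic_status!(ServerError, Tonic);
```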
#[macro_export]
macro_rules! define_into_tonic_status {
($Error: ty) => {
@@ -315,15 +357,6 @@ pub fn status_to_tonic_code(status_code: StatusCode) -> Code {
}
}
/// Converts tonic [Code] to [StatusCode].
pub fn convert_tonic_code_to_status_code(code: Code) -> StatusCode {
match code {
Code::Cancelled => StatusCode::Cancelled,
Code::DeadlineExceeded => StatusCode::DeadlineExceeded,
_ => StatusCode::Internal,
}
}
#[cfg(test)]
mod tests {
use strum::IntoEnumIterator;

View File

@@ -34,7 +34,7 @@ use table::requests::{
};
use crate::error::{
InvalidColumnDefSnafu, InvalidSetFulltextOptionRequestSnafu,
InvalidColumnDefSnafu, InvalidIndexOptionSnafu, InvalidSetFulltextOptionRequestSnafu,
InvalidSetSkippingIndexOptionRequestSnafu, InvalidSetTableOptionRequestSnafu,
InvalidUnsetTableOptionRequestSnafu, MissingAlterIndexOptionSnafu, MissingFieldSnafu,
MissingTimestampColumnSnafu, Result, UnknownLocationTypeSnafu,
@@ -126,18 +126,21 @@ pub fn alter_expr_to_request(table_id: TableId, expr: AlterTableExpr) -> Result<
api::v1::set_index::Options::Fulltext(f) => AlterKind::SetIndex {
options: SetIndexOptions::Fulltext {
column_name: f.column_name.clone(),
options: FulltextOptions {
enable: f.enable,
analyzer: as_fulltext_option_analyzer(
options: FulltextOptions::new(
f.enable,
as_fulltext_option_analyzer(
Analyzer::try_from(f.analyzer)
.context(InvalidSetFulltextOptionRequestSnafu)?,
),
case_sensitive: f.case_sensitive,
backend: as_fulltext_option_backend(
f.case_sensitive,
as_fulltext_option_backend(
PbFulltextBackend::try_from(f.backend)
.context(InvalidSetFulltextOptionRequestSnafu)?,
),
},
f.granularity as u32,
f.false_positive_rate,
)
.context(InvalidIndexOptionSnafu)?,
},
},
api::v1::set_index::Options::Inverted(i) => AlterKind::SetIndex {
@@ -148,13 +151,15 @@ pub fn alter_expr_to_request(table_id: TableId, expr: AlterTableExpr) -> Result<
api::v1::set_index::Options::Skipping(s) => AlterKind::SetIndex {
options: SetIndexOptions::Skipping {
column_name: s.column_name,
options: SkippingIndexOptions {
granularity: s.granularity as u32,
index_type: as_skipping_index_type(
options: SkippingIndexOptions::new(
s.granularity as u32,
s.false_positive_rate,
as_skipping_index_type(
PbSkippingIndexType::try_from(s.skipping_index_type)
.context(InvalidSetSkippingIndexOptionRequestSnafu)?,
),
},
)
.context(InvalidIndexOptionSnafu)?,
},
},
},

View File

@@ -153,6 +153,14 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid index option"))]
InvalidIndexOption {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: datatypes::error::Error,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -180,7 +188,8 @@ impl ErrorExt for Error {
| Error::InvalidUnsetTableOptionRequest { .. }
| Error::InvalidSetFulltextOptionRequest { .. }
| Error::InvalidSetSkippingIndexOptionRequest { .. }
| Error::MissingAlterIndexOption { .. } => StatusCode::InvalidArguments,
| Error::MissingAlterIndexOption { .. }
| Error::InvalidIndexOption { .. } => StatusCode::InvalidArguments,
}
}

View File

@@ -27,17 +27,18 @@ use common_telemetry::{error, info, warn};
use futures_util::future;
pub use region_request::make_alter_region_request;
use serde::{Deserialize, Serialize};
use snafu::{ensure, ResultExt};
use snafu::ResultExt;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::ALTER_PHYSICAL_EXTENSION_KEY;
use strum::AsRefStr;
use table::metadata::TableId;
use crate::ddl::utils::{
add_peer_context_if_needed, map_to_procedure_error, sync_follower_regions,
add_peer_context_if_needed, extract_column_metadatas, map_to_procedure_error,
sync_follower_regions,
};
use crate::ddl::DdlContext;
use crate::error::{DecodeJsonSnafu, MetadataCorruptionSnafu, Result};
use crate::error::Result;
use crate::instruction::CacheIdent;
use crate::key::table_info::TableInfoValue;
use crate::key::table_route::PhysicalTableRouteValue;
@@ -137,37 +138,13 @@ impl AlterLogicalTablesProcedure {
.into_iter()
.collect::<Result<Vec<_>>>()?;
// Collects responses from datanodes.
let phy_raw_schemas = results
.iter_mut()
.map(|res| res.extensions.remove(ALTER_PHYSICAL_EXTENSION_KEY))
.collect::<Vec<_>>();
if phy_raw_schemas.is_empty() {
self.submit_sync_region_requests(results, &physical_table_route.region_routes)
.await;
self.data.state = AlterTablesState::UpdateMetadata;
return Ok(Status::executing(true));
}
// Verify all the physical schemas are the same
// Safety: previous check ensures this vec is not empty
let first = phy_raw_schemas.first().unwrap();
ensure!(
phy_raw_schemas.iter().all(|x| x == first),
MetadataCorruptionSnafu {
err_msg: "The physical schemas from datanodes are not the same."
}
);
// Decodes the physical raw schemas
if let Some(phy_raw_schema) = first {
self.data.physical_columns =
ColumnMetadata::decode_list(phy_raw_schema).context(DecodeJsonSnafu)?;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, ALTER_PHYSICAL_EXTENSION_KEY)?
{
self.data.physical_columns = column_metadatas;
} else {
warn!("altering logical table result doesn't contains extension key `{ALTER_PHYSICAL_EXTENSION_KEY}`,leaving the physical table's schema unchanged");
}
self.submit_sync_region_requests(results, &physical_table_route.region_routes)
.await;
self.data.state = AlterTablesState::UpdateMetadata;
@@ -183,7 +160,7 @@ impl AlterLogicalTablesProcedure {
if let Err(err) = sync_follower_regions(
&self.context,
self.data.physical_table_id,
results,
&results,
region_routes,
table_info.meta.engine.as_str(),
)

View File

@@ -29,19 +29,22 @@ use common_procedure::{
Context as ProcedureContext, ContextProvider, Error as ProcedureError, LockKey, PoisonKey,
PoisonKeys, Procedure, ProcedureId, Status, StringKey,
};
use common_telemetry::{debug, error, info};
use common_telemetry::{debug, error, info, warn};
use futures::future::{self};
use serde::{Deserialize, Serialize};
use snafu::{ensure, ResultExt};
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::RegionId;
use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId, TableInfo};
use table::table_reference::TableReference;
use crate::cache_invalidator::Context;
use crate::ddl::physical_table_metadata::update_table_info_column_ids;
use crate::ddl::utils::{
add_peer_context_if_needed, handle_multiple_results, map_to_procedure_error,
sync_follower_regions, MultipleResults,
add_peer_context_if_needed, extract_column_metadatas, handle_multiple_results,
map_to_procedure_error, sync_follower_regions, MultipleResults,
};
use crate::ddl::DdlContext;
use crate::error::{AbortProcedureSnafu, NoLeaderSnafu, PutPoisonSnafu, Result, RetryLaterSnafu};
@@ -202,9 +205,9 @@ impl AlterTableProcedure {
})
}
MultipleResults::Ok(results) => {
self.submit_sync_region_requests(results, &physical_table_route.region_routes)
self.submit_sync_region_requests(&results, &physical_table_route.region_routes)
.await;
self.data.state = AlterTableState::UpdateMetadata;
self.handle_alter_region_response(results)?;
Ok(Status::executing_with_clean_poisons(true))
}
MultipleResults::AllNonRetryable(error) => {
@@ -220,9 +223,22 @@ impl AlterTableProcedure {
}
}
fn handle_alter_region_response(&mut self, mut results: Vec<RegionResponse>) -> Result<()> {
self.data.state = AlterTableState::UpdateMetadata;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
self.data.column_metadatas = column_metadatas;
} else {
warn!("altering table result doesn't contains extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`,leaving the table's column metadata unchanged");
}
Ok(())
}
async fn submit_sync_region_requests(
&mut self,
results: Vec<RegionResponse>,
results: &[RegionResponse],
region_routes: &[RegionRoute],
) {
// Safety: filled in `prepare` step.
@@ -268,10 +284,14 @@ impl AlterTableProcedure {
self.on_update_metadata_for_rename(new_table_name.to_string(), table_info_value)
.await?;
} else {
let mut raw_table_info = new_info.into();
if !self.data.column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, &self.data.column_metadatas);
}
// region distribution is set in submit_alter_region_requests
let region_distribution = self.data.region_distribution.as_ref().unwrap().clone();
self.on_update_metadata_for_alter(
new_info.into(),
raw_table_info,
region_distribution,
table_info_value,
)
@@ -318,6 +338,16 @@ impl AlterTableProcedure {
lock_key
}
#[cfg(test)]
pub(crate) fn data(&self) -> &AlterTableData {
&self.data
}
#[cfg(test)]
pub(crate) fn mut_data(&mut self) -> &mut AlterTableData {
&mut self.data
}
}
#[async_trait]
@@ -380,6 +410,8 @@ pub struct AlterTableData {
state: AlterTableState,
task: AlterTableTask,
table_id: TableId,
#[serde(default)]
column_metadatas: Vec<ColumnMetadata>,
/// Table info value before alteration.
table_info_value: Option<DeserializedValueWithBytes<TableInfoValue>>,
/// Region distribution for table in case we need to update region options.
@@ -392,6 +424,7 @@ impl AlterTableData {
state: AlterTableState::Prepare,
task,
table_id,
column_metadatas: vec![],
table_info_value: None,
region_distribution: None,
}
@@ -410,4 +443,14 @@ impl AlterTableData {
.as_ref()
.map(|value| &value.table_info)
}
#[cfg(test)]
pub(crate) fn column_metadatas(&self) -> &[ColumnMetadata] {
&self.column_metadatas
}
#[cfg(test)]
pub(crate) fn set_column_metadatas(&mut self, column_metadatas: Vec<ColumnMetadata>) {
self.column_metadatas = column_metadatas;
}
}

View File

@@ -167,6 +167,25 @@ impl CreateFlowProcedure {
}
self.collect_source_tables().await?;
// Validate that source and sink tables are not the same
let sink_table_name = &self.data.task.sink_table_name;
if self
.data
.task
.source_table_names
.iter()
.any(|source| source == sink_table_name)
{
return error::UnsupportedSnafu {
operation: format!(
"Creating flow with source and sink table being the same: {}",
sink_table_name
),
}
.fail();
}
if self.data.flow_id.is_none() {
self.allocate_flow_id().await?;
}

View File

@@ -27,7 +27,7 @@ use common_telemetry::{debug, error, warn};
use futures::future;
pub use region_request::create_region_request_builder;
use serde::{Deserialize, Serialize};
use snafu::{ensure, ResultExt};
use snafu::ResultExt;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::ALTER_PHYSICAL_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
@@ -35,10 +35,11 @@ use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId};
use crate::ddl::utils::{
add_peer_context_if_needed, map_to_procedure_error, sync_follower_regions,
add_peer_context_if_needed, extract_column_metadatas, map_to_procedure_error,
sync_follower_regions,
};
use crate::ddl::DdlContext;
use crate::error::{DecodeJsonSnafu, MetadataCorruptionSnafu, Result};
use crate::error::Result;
use crate::key::table_route::TableRouteValue;
use crate::lock_key::{CatalogLock, SchemaLock, TableLock, TableNameLock};
use crate::metrics;
@@ -166,47 +167,23 @@ impl CreateLogicalTablesProcedure {
.into_iter()
.collect::<Result<Vec<_>>>()?;
// Collects response from datanodes.
let phy_raw_schemas = results
.iter_mut()
.map(|res| res.extensions.remove(ALTER_PHYSICAL_EXTENSION_KEY))
.collect::<Vec<_>>();
if phy_raw_schemas.is_empty() {
self.submit_sync_region_requests(results, region_routes)
.await;
self.data.state = CreateTablesState::CreateMetadata;
return Ok(Status::executing(false));
}
// Verify all the physical schemas are the same
// Safety: previous check ensures this vec is not empty
let first = phy_raw_schemas.first().unwrap();
ensure!(
phy_raw_schemas.iter().all(|x| x == first),
MetadataCorruptionSnafu {
err_msg: "The physical schemas from datanodes are not the same."
}
);
// Decodes the physical raw schemas
if let Some(phy_raw_schemas) = first {
self.data.physical_columns =
ColumnMetadata::decode_list(phy_raw_schemas).context(DecodeJsonSnafu)?;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, ALTER_PHYSICAL_EXTENSION_KEY)?
{
self.data.physical_columns = column_metadatas;
} else {
warn!("creating logical table result doesn't contains extension key `{ALTER_PHYSICAL_EXTENSION_KEY}`,leaving the physical table's schema unchanged");
}
self.submit_sync_region_requests(results, region_routes)
self.submit_sync_region_requests(&results, region_routes)
.await;
self.data.state = CreateTablesState::CreateMetadata;
Ok(Status::executing(true))
}
async fn submit_sync_region_requests(
&self,
results: Vec<RegionResponse>,
results: &[RegionResponse],
region_routes: &[RegionRoute],
) {
if let Err(err) = sync_follower_regions(

View File

@@ -22,20 +22,23 @@ use common_procedure::error::{
ExternalSnafu, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu,
};
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, Status};
use common_telemetry::info;
use common_telemetry::tracing_context::TracingContext;
use common_telemetry::{info, warn};
use futures::future::join_all;
use serde::{Deserialize, Serialize};
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId};
use table::table_reference::TableReference;
use crate::ddl::create_table_template::{build_template, CreateRequestBuilder};
use crate::ddl::physical_table_metadata::update_table_info_column_ids;
use crate::ddl::utils::{
add_peer_context_if_needed, convert_region_routes_to_detecting_regions, map_to_procedure_error,
region_storage_path,
add_peer_context_if_needed, convert_region_routes_to_detecting_regions,
extract_column_metadatas, map_to_procedure_error, region_storage_path,
};
use crate::ddl::{DdlContext, TableMetadata};
use crate::error::{self, Result};
@@ -243,14 +246,21 @@ impl CreateTableProcedure {
}
}
join_all(create_region_tasks)
self.creator.data.state = CreateTableState::CreateMetadata;
let mut results = join_all(create_region_tasks)
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
self.creator.data.state = CreateTableState::CreateMetadata;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
self.creator.data.column_metadatas = column_metadatas;
} else {
warn!("creating table result doesn't contains extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`,leaving the table's column metadata unchanged");
}
// TODO(weny): Add more tests.
Ok(Status::executing(true))
}
@@ -262,7 +272,10 @@ impl CreateTableProcedure {
let table_id = self.table_id();
let manager = &self.context.table_metadata_manager;
let raw_table_info = self.table_info().clone();
let mut raw_table_info = self.table_info().clone();
if !self.creator.data.column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, &self.creator.data.column_metadatas);
}
// Safety: the region_wal_options must be allocated.
let region_wal_options = self.region_wal_options()?.clone();
// Safety: the table_route must be allocated.
@@ -346,6 +359,7 @@ impl TableCreator {
Self {
data: CreateTableData {
state: CreateTableState::Prepare,
column_metadatas: vec![],
task,
table_route: None,
region_wal_options: None,
@@ -407,6 +421,8 @@ pub enum CreateTableState {
pub struct CreateTableData {
pub state: CreateTableState,
pub task: CreateTableTask,
#[serde(default)]
pub column_metadatas: Vec<ColumnMetadata>,
/// None stands for not allocated yet.
table_route: Option<PhysicalTableRouteValue>,
/// None stands for not allocated yet.

View File

@@ -12,9 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashSet;
use std::collections::{HashMap, HashSet};
use api::v1::SemanticType;
use common_telemetry::debug;
use common_telemetry::tracing::warn;
use store_api::metadata::ColumnMetadata;
use table::metadata::RawTableInfo;
@@ -23,6 +25,10 @@ pub(crate) fn build_new_physical_table_info(
mut raw_table_info: RawTableInfo,
physical_columns: &[ColumnMetadata],
) -> RawTableInfo {
debug!(
"building new physical table info for table: {}, table_id: {}",
raw_table_info.name, raw_table_info.ident.table_id
);
let existing_columns = raw_table_info
.meta
.schema
@@ -36,6 +42,8 @@ pub(crate) fn build_new_physical_table_info(
let time_index = &mut raw_table_info.meta.schema.timestamp_index;
let columns = &mut raw_table_info.meta.schema.column_schemas;
columns.clear();
let column_ids = &mut raw_table_info.meta.column_ids;
column_ids.clear();
for (idx, col) in physical_columns.iter().enumerate() {
match col.semantic_type {
@@ -50,6 +58,7 @@ pub(crate) fn build_new_physical_table_info(
}
columns.push(col.column_schema.clone());
column_ids.push(col.column_id);
}
if let Some(time_index) = *time_index {
@@ -58,3 +67,54 @@ pub(crate) fn build_new_physical_table_info(
raw_table_info
}
/// Updates the column IDs in the table info based on the provided column metadata.
///
/// This function validates that the column metadata matches the existing table schema
/// before updating the column ids. If the column metadata doesn't match the table schema,
/// the table info remains unchanged.
pub(crate) fn update_table_info_column_ids(
raw_table_info: &mut RawTableInfo,
column_metadatas: &[ColumnMetadata],
) {
let mut table_column_names = raw_table_info
.meta
.schema
.column_schemas
.iter()
.map(|c| c.name.as_str())
.collect::<Vec<_>>();
table_column_names.sort_unstable();
let mut column_names = column_metadatas
.iter()
.map(|c| c.column_schema.name.as_str())
.collect::<Vec<_>>();
column_names.sort_unstable();
if table_column_names != column_names {
warn!(
"Column metadata doesn't match the table schema for table {}, table_id: {}, column in table: {:?}, column in metadata: {:?}",
raw_table_info.name,
raw_table_info.ident.table_id,
table_column_names,
column_names,
);
return;
}
let name_to_id = column_metadatas
.iter()
.map(|c| (c.column_schema.name.clone(), c.column_id))
.collect::<HashMap<_, _>>();
let schema = &raw_table_info.meta.schema.column_schemas;
let mut column_ids = Vec::with_capacity(schema.len());
for column_schema in schema {
if let Some(id) = name_to_id.get(&column_schema.name) {
column_ids.push(*id);
}
}
raw_table_info.meta.column_ids = column_ids;
}
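
The guard described in the doc comment boils down to a set-equality check on column names followed by a name-to-id remap in schema order. A standalone sketch of that core logic with plain strings instead of `RawTableInfo`/`ColumnMetadata` (all names below are illustrative only):

```rust
use std::collections::HashMap;

/// Returns the column ids in schema order, or `None` when the two name
/// sets differ (mirroring the early return with a warning above).
fn remap_column_ids(schema_columns: &[&str], metadata: &[(&str, u32)]) -> Option<Vec<u32>> {
    let mut schema_names = schema_columns.to_vec();
    schema_names.sort_unstable();
    let mut meta_names: Vec<_> = metadata.iter().map(|(name, _)| *name).collect();
    meta_names.sort_unstable();
    if schema_names != meta_names {
        return None;
    }

    let name_to_id: HashMap<&str, u32> = metadata.iter().copied().collect();
    Some(schema_columns.iter().map(|name| name_to_id[name]).collect())
}

fn main() {
    // Ids follow the schema order, not the order of the metadata list.
    let ids = remap_column_ids(&["ts", "host", "cpu"], &[("cpu", 2), ("ts", 0), ("host", 1)]);
    assert_eq!(ids, Some(vec![0, 1, 2]));
}
```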

View File

@@ -122,6 +122,7 @@ impl TableMetadataAllocator {
);
let peers = self.peer_allocator.alloc(regions).await?;
debug!("Allocated peers {:?} for table {}", peers, table_id);
let region_routes = task
.partitions
.iter()

View File

@@ -24,7 +24,14 @@ use std::collections::HashMap;
use api::v1::meta::Partition;
use api::v1::{ColumnDataType, SemanticType};
use common_procedure::Status;
use store_api::metric_engine_consts::{LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME};
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{
DATA_SCHEMA_TABLE_ID_COLUMN_NAME, DATA_SCHEMA_TSID_COLUMN_NAME, LOGICAL_TABLE_METADATA_KEY,
METRIC_ENGINE_NAME,
};
use store_api::storage::consts::ReservedColumnId;
use table::metadata::{RawTableInfo, TableId};
use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure;
@@ -146,6 +153,7 @@ pub fn test_create_logical_table_task(name: &str) -> CreateTableTask {
}
}
/// Creates a physical table task with a single region.
pub fn test_create_physical_table_task(name: &str) -> CreateTableTask {
let create_table = TestCreateTableExprBuilder::default()
.column_defs([
@@ -182,3 +190,95 @@ pub fn test_create_physical_table_task(name: &str) -> CreateTableTask {
table_info,
}
}
/// Creates a column metadata list with tag fields.
pub fn test_column_metadatas(tag_fields: &[&str]) -> Vec<ColumnMetadata> {
let mut output = Vec::with_capacity(tag_fields.len() + 4);
output.extend([
ColumnMetadata {
column_schema: ColumnSchema::new(
"ts",
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Timestamp,
column_id: 0,
},
ColumnMetadata {
column_schema: ColumnSchema::new("value", ConcreteDataType::float64_datatype(), false),
semantic_type: SemanticType::Field,
column_id: 1,
},
ColumnMetadata {
column_schema: ColumnSchema::new(
DATA_SCHEMA_TABLE_ID_COLUMN_NAME,
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: ReservedColumnId::table_id(),
},
ColumnMetadata {
column_schema: ColumnSchema::new(
DATA_SCHEMA_TSID_COLUMN_NAME,
ConcreteDataType::float64_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: ReservedColumnId::tsid(),
},
]);
for (i, name) in tag_fields.iter().enumerate() {
output.push(ColumnMetadata {
column_schema: ColumnSchema::new(
name.to_string(),
ConcreteDataType::string_datatype(),
true,
),
semantic_type: SemanticType::Tag,
column_id: (i + 2) as u32,
});
}
output
}
/// Asserts the column names.
pub fn assert_column_name(table_info: &RawTableInfo, expected_column_names: &[&str]) {
assert_eq!(
table_info
.meta
.schema
.column_schemas
.iter()
.map(|c| c.name.to_string())
.collect::<Vec<_>>(),
expected_column_names
);
}
/// Asserts the column metadatas
pub fn assert_column_name_and_id(column_metadatas: &[ColumnMetadata], expected: &[(&str, u32)]) {
assert_eq!(expected.len(), column_metadatas.len());
for (name, id) in expected {
let column_metadata = column_metadatas
.iter()
.find(|c| c.column_id == *id)
.unwrap();
assert_eq!(column_metadata.column_schema.name, *name);
}
}
/// Gets the raw table info.
pub async fn get_raw_table_info(ddl_context: &DdlContext, table_id: TableId) -> RawTableInfo {
ddl_context
.table_metadata_manager
.table_info_manager()
.get(table_id)
.await
.unwrap()
.unwrap()
.into_inner()
.table_info
}

View File

@@ -132,6 +132,7 @@ pub fn build_raw_table_info_from_expr(expr: &CreateTableExpr) -> RawTableInfo {
options: TableOptions::try_from_iter(&expr.table_options).unwrap(),
created_on: DateTime::default(),
partition_key_indices: vec![],
column_ids: vec![],
},
table_type: TableType::Base,
}

View File

@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use api::region::RegionResponse;
use api::v1::region::RegionRequest;
use common_error::ext::{BoxedError, ErrorExt, StackError};
@@ -45,10 +47,13 @@ impl MockDatanodeHandler for () {
}
}
type RegionRequestHandler =
Arc<dyn Fn(Peer, RegionRequest) -> Result<RegionResponse> + Send + Sync>;
#[derive(Clone)]
pub struct DatanodeWatcher {
sender: mpsc::Sender<(Peer, RegionRequest)>,
handler: Option<fn(Peer, RegionRequest) -> Result<RegionResponse>>,
handler: Option<RegionRequestHandler>,
}
impl DatanodeWatcher {
@@ -61,9 +66,9 @@ impl DatanodeWatcher {
pub fn with_handler(
mut self,
user_handler: fn(Peer, RegionRequest) -> Result<RegionResponse>,
user_handler: impl Fn(Peer, RegionRequest) -> Result<RegionResponse> + Send + Sync + 'static,
) -> Self {
self.handler = Some(user_handler);
self.handler = Some(Arc::new(user_handler));
self
}
}
@@ -76,7 +81,7 @@ impl MockDatanodeHandler for DatanodeWatcher {
.send((peer.clone(), request.clone()))
.await
.unwrap();
if let Some(handler) = self.handler {
if let Some(handler) = self.handler.as_ref() {
handler(peer.clone(), request)
} else {
Ok(RegionResponse::new(0))

View File

@@ -23,17 +23,20 @@ use api::v1::{ColumnDataType, SemanticType};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_procedure::{Procedure, ProcedureId, Status};
use common_procedure_test::MockContextProvider;
use store_api::metric_engine_consts::MANIFEST_INFO_EXTENSION_KEY;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{ALTER_PHYSICAL_EXTENSION_KEY, MANIFEST_INFO_EXTENSION_KEY};
use store_api::region_engine::RegionManifestInfo;
use store_api::storage::consts::ReservedColumnId;
use store_api::storage::RegionId;
use tokio::sync::mpsc;
use crate::ddl::alter_logical_tables::AlterLogicalTablesProcedure;
use crate::ddl::test_util::alter_table::TestAlterTableExprBuilder;
use crate::ddl::test_util::columns::TestColumnDefBuilder;
use crate::ddl::test_util::datanode_handler::{DatanodeWatcher, NaiveDatanodeHandler};
use crate::ddl::test_util::datanode_handler::DatanodeWatcher;
use crate::ddl::test_util::{
create_logical_table, create_physical_table, create_physical_table_metadata,
assert_column_name, create_logical_table, create_physical_table,
create_physical_table_metadata, get_raw_table_info, test_column_metadatas,
test_create_physical_table_task,
};
use crate::error::Error::{AlterLogicalTablesInvalidArguments, TableNotFound};
@@ -96,6 +99,52 @@ fn make_alter_logical_table_rename_task(
}
}
fn make_alters_request_handler(
column_metadatas: Vec<ColumnMetadata>,
) -> impl Fn(Peer, RegionRequest) -> Result<RegionResponse> {
move |_peer: Peer, request: RegionRequest| {
if let region_request::Body::Alters(_) = request.body.unwrap() {
let mut response = RegionResponse::new(0);
// Default region id for physical table.
let region_id = RegionId::new(1000, 1);
response.extensions.insert(
MANIFEST_INFO_EXTENSION_KEY.to_string(),
RegionManifestInfo::encode_list(&[(
region_id,
RegionManifestInfo::metric(1, 0, 2, 0),
)])
.unwrap(),
);
response.extensions.insert(
ALTER_PHYSICAL_EXTENSION_KEY.to_string(),
ColumnMetadata::encode_list(&column_metadatas).unwrap(),
);
return Ok(response);
}
Ok(RegionResponse::new(0))
}
}
fn assert_alters_request(
peer: Peer,
request: RegionRequest,
expected_peer_id: u64,
expected_region_ids: &[RegionId],
) {
assert_eq!(peer.id, expected_peer_id);
let Some(region_request::Body::Alters(req)) = request.body else {
unreachable!();
};
for (i, region_id) in expected_region_ids.iter().enumerate() {
assert_eq!(
req.requests[i].region_id,
*region_id,
"actual region id: {}",
RegionId::from_u64(req.requests[i].region_id)
);
}
}
#[tokio::test]
async fn test_on_prepare_check_schema() {
let node_manager = Arc::new(MockDatanodeManager::new(()));
@@ -205,15 +254,20 @@ async fn test_on_prepare() {
#[tokio::test]
async fn test_on_update_metadata() {
let node_manager = Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler));
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let test_column_metadatas = test_column_metadatas(&["new_col", "mew_col"]);
let datanode_handler =
DatanodeWatcher::new(tx).with_handler(make_alters_request_handler(test_column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(datanode_handler));
let ddl_context = new_ddl_context(node_manager);
// Creates physical table
let phy_id = create_physical_table(&ddl_context, "phy").await;
// Creates 3 logical tables
create_logical_table(ddl_context.clone(), phy_id, "table1").await;
create_logical_table(ddl_context.clone(), phy_id, "table2").await;
create_logical_table(ddl_context.clone(), phy_id, "table3").await;
let logical_table1_id = create_logical_table(ddl_context.clone(), phy_id, "table1").await;
let logical_table2_id = create_logical_table(ddl_context.clone(), phy_id, "table2").await;
let logical_table3_id = create_logical_table(ddl_context.clone(), phy_id, "table3").await;
create_logical_table(ddl_context.clone(), phy_id, "table4").await;
create_logical_table(ddl_context.clone(), phy_id, "table5").await;
@@ -223,7 +277,7 @@ async fn test_on_update_metadata() {
make_alter_logical_table_add_column_task(None, "table3", vec!["new_col".to_string()]),
];
let mut procedure = AlterLogicalTablesProcedure::new(tasks, phy_id, ddl_context);
let mut procedure = AlterLogicalTablesProcedure::new(tasks, phy_id, ddl_context.clone());
let mut status = procedure.on_prepare().await.unwrap();
assert_matches!(
status,
@@ -255,18 +309,52 @@ async fn test_on_update_metadata() {
clean_poisons: false
}
);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_alters_request(
peer,
request,
0,
&[
RegionId::new(logical_table1_id, 0),
RegionId::new(logical_table2_id, 0),
RegionId::new(logical_table3_id, 0),
],
);
let table_info = get_raw_table_info(&ddl_context, phy_id).await;
assert_column_name(
&table_info,
&["ts", "value", "__table_id", "__tsid", "new_col", "mew_col"],
);
assert_eq!(
table_info.meta.column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2,
3
]
);
}
#[tokio::test]
async fn test_on_part_duplicate_alter_request() {
let node_manager = Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler));
let ddl_context = new_ddl_context(node_manager);
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let column_metadatas = test_column_metadatas(&["col_0"]);
let handler =
DatanodeWatcher::new(tx).with_handler(make_alters_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(handler));
let mut ddl_context = new_ddl_context(node_manager);
// Creates physical table
let phy_id = create_physical_table(&ddl_context, "phy").await;
// Creates 3 logical tables
create_logical_table(ddl_context.clone(), phy_id, "table1").await;
create_logical_table(ddl_context.clone(), phy_id, "table2").await;
let logical_table1_id = create_logical_table(ddl_context.clone(), phy_id, "table1").await;
let logical_table2_id = create_logical_table(ddl_context.clone(), phy_id, "table2").await;
let tasks = vec![
make_alter_logical_table_add_column_task(None, "table1", vec!["col_0".to_string()]),
@@ -305,6 +393,40 @@ async fn test_on_part_duplicate_alter_request() {
clean_poisons: false
}
);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_alters_request(
peer,
request,
0,
&[
RegionId::new(logical_table1_id, 0),
RegionId::new(logical_table2_id, 0),
],
);
let table_info = get_raw_table_info(&ddl_context, phy_id).await;
assert_column_name(
&table_info,
&["ts", "value", "__table_id", "__tsid", "col_0"],
);
assert_eq!(
table_info.meta.column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2
]
);
let (tx, mut rx) = mpsc::channel(8);
let column_metadatas = test_column_metadatas(&["col_0", "new_col_1", "new_col_2"]);
let handler =
DatanodeWatcher::new(tx).with_handler(make_alters_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(handler));
ddl_context.node_manager = node_manager;
// re-alter
let tasks = vec![
@@ -357,6 +479,44 @@ async fn test_on_part_duplicate_alter_request() {
}
);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_alters_request(
peer,
request,
0,
&[
RegionId::new(logical_table1_id, 0),
RegionId::new(logical_table2_id, 0),
],
);
let table_info = get_raw_table_info(&ddl_context, phy_id).await;
assert_column_name(
&table_info,
&[
"ts",
"value",
"__table_id",
"__tsid",
"col_0",
"new_col_1",
"new_col_2",
],
);
assert_eq!(
table_info.meta.column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2,
3,
4,
]
);
let table_name_keys = vec![
TableNameKey::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "table1"),
TableNameKey::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "table2"),
@@ -422,27 +582,13 @@ async fn test_on_part_duplicate_alter_request() {
);
}
fn alters_request_handler(_peer: Peer, request: RegionRequest) -> Result<RegionResponse> {
if let region_request::Body::Alters(_) = request.body.unwrap() {
let mut response = RegionResponse::new(0);
// Default region id for physical table.
let region_id = RegionId::new(1000, 1);
response.extensions.insert(
MANIFEST_INFO_EXTENSION_KEY.to_string(),
RegionManifestInfo::encode_list(&[(region_id, RegionManifestInfo::metric(1, 0, 2, 0))])
.unwrap(),
);
return Ok(response);
}
Ok(RegionResponse::new(0))
}
#[tokio::test]
async fn test_on_submit_alter_region_request() {
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let handler = DatanodeWatcher::new(tx).with_handler(alters_request_handler);
let column_metadatas = test_column_metadatas(&["new_col", "mew_col"]);
let handler =
DatanodeWatcher::new(tx).with_handler(make_alters_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(handler));
let ddl_context = new_ddl_context(node_manager);

View File

@@ -30,7 +30,12 @@ use common_error::status_code::StatusCode;
use common_procedure::store::poison_store::PoisonStore;
use common_procedure::{ProcedureId, Status};
use common_procedure_test::MockContextProvider;
use store_api::metric_engine_consts::MANIFEST_INFO_EXTENSION_KEY;
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{
MANIFEST_INFO_EXTENSION_KEY, TABLE_COLUMN_METADATA_EXTENSION_KEY,
};
use store_api::region_engine::RegionManifestInfo;
use store_api::storage::RegionId;
use table::requests::TTL_KEY;
@@ -43,6 +48,7 @@ use crate::ddl::test_util::datanode_handler::{
AllFailureDatanodeHandler, DatanodeWatcher, PartialSuccessDatanodeHandler,
RequestOutdatedErrorDatanodeHandler,
};
use crate::ddl::test_util::{assert_column_name, assert_column_name_and_id};
use crate::error::{Error, Result};
use crate::key::datanode_table::DatanodeTableKey;
use crate::key::table_name::TableNameKey;
@@ -179,6 +185,30 @@ fn alter_request_handler(_peer: Peer, request: RegionRequest) -> Result<RegionRe
RegionManifestInfo::encode_list(&[(region_id, RegionManifestInfo::mito(1, 1))])
.unwrap(),
);
response.extensions.insert(
TABLE_COLUMN_METADATA_EXTENSION_KEY.to_string(),
ColumnMetadata::encode_list(&[
ColumnMetadata {
column_schema: ColumnSchema::new(
"ts",
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Timestamp,
column_id: 0,
},
ColumnMetadata {
column_schema: ColumnSchema::new(
"host",
ConcreteDataType::float64_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: 1,
},
])
.unwrap(),
);
return Ok(response);
}
@@ -187,6 +217,7 @@ fn alter_request_handler(_peer: Peer, request: RegionRequest) -> Result<RegionRe
#[tokio::test]
async fn test_on_submit_alter_request() {
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let datanode_handler = DatanodeWatcher::new(tx).with_handler(alter_request_handler);
let node_manager = Arc::new(MockDatanodeManager::new(datanode_handler));
@@ -234,6 +265,8 @@ async fn test_on_submit_alter_request() {
assert_sync_request(peer, request, 4, RegionId::new(table_id, 2), 1);
let (peer, request) = results.remove(0);
assert_sync_request(peer, request, 5, RegionId::new(table_id, 1), 1);
let column_metadatas = procedure.data().column_metadatas();
assert_column_name_and_id(column_metadatas, &[("ts", 0), ("host", 1)]);
}
#[tokio::test]
@@ -378,6 +411,7 @@ async fn test_on_update_metadata_rename() {
#[tokio::test]
async fn test_on_update_metadata_add_columns() {
common_telemetry::init_default_ut_logging();
let node_manager = Arc::new(MockDatanodeManager::new(()));
let ddl_context = new_ddl_context(node_manager);
let table_name = "foo";
@@ -431,6 +465,34 @@ async fn test_on_update_metadata_add_columns() {
.submit_alter_region_requests(procedure_id, provider.as_ref())
.await
.unwrap();
// Returned column metadatas is empty.
assert!(procedure.data().column_metadatas().is_empty());
procedure.mut_data().set_column_metadatas(vec![
ColumnMetadata {
column_schema: ColumnSchema::new(
"ts",
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Timestamp,
column_id: 0,
},
ColumnMetadata {
column_schema: ColumnSchema::new("host", ConcreteDataType::float64_datatype(), false),
semantic_type: SemanticType::Tag,
column_id: 1,
},
ColumnMetadata {
column_schema: ColumnSchema::new("cpu", ConcreteDataType::float64_datatype(), false),
semantic_type: SemanticType::Tag,
column_id: 2,
},
ColumnMetadata {
column_schema: ColumnSchema::new("my_tag3", ConcreteDataType::string_datatype(), true),
semantic_type: SemanticType::Tag,
column_id: 3,
},
]);
procedure.on_update_metadata().await.unwrap();
let table_info = ddl_context
@@ -447,6 +509,8 @@ async fn test_on_update_metadata_add_columns() {
table_info.meta.schema.column_schemas.len() as u32,
table_info.meta.next_column_id
);
assert_column_name(&table_info, &["ts", "host", "cpu", "my_tag3"]);
assert_eq!(table_info.meta.column_ids, vec![0, 1, 2, 3]);
}
#[tokio::test]

View File

@@ -141,3 +141,41 @@ async fn test_create_flow() {
let err = procedure.on_prepare().await.unwrap_err();
assert_matches!(err, error::Error::FlowAlreadyExists { .. });
}
#[tokio::test]
async fn test_create_flow_same_source_and_sink_table() {
let table_id = 1024;
let table_name = TableName::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "same_table");
// Use the same table for both source and sink
let source_table_names = vec![table_name.clone()];
let sink_table_name = table_name.clone();
let node_manager = Arc::new(MockFlownodeManager::new(NaiveFlownodeHandler));
let ddl_context = new_ddl_context(node_manager);
// Create the table first so it exists
let task = test_create_table_task("same_table", table_id);
ddl_context
.table_metadata_manager
.create_table_metadata(
task.table_info.clone(),
TableRouteValue::physical(vec![]),
HashMap::new(),
)
.await
.unwrap();
// Try to create a flow with same source and sink table - should fail
let task = test_create_flow_task("my_flow", source_table_names, sink_table_name, false);
let query_ctx = QueryContext::arc().into();
let mut procedure = CreateFlowProcedure::new(task, query_ctx, ddl_context);
let err = procedure.on_prepare().await.unwrap_err();
assert_matches!(err, error::Error::Unsupported { .. });
// Verify the error message contains information about the same table
if let error::Error::Unsupported { operation, .. } = &err {
assert!(operation.contains("source and sink table being the same"));
assert!(operation.contains("same_table"));
}
}

View File

@@ -23,15 +23,18 @@ use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use common_procedure::{Context as ProcedureContext, Procedure, ProcedureId, Status};
use common_procedure_test::MockContextProvider;
use store_api::metric_engine_consts::MANIFEST_INFO_EXTENSION_KEY;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{ALTER_PHYSICAL_EXTENSION_KEY, MANIFEST_INFO_EXTENSION_KEY};
use store_api::region_engine::RegionManifestInfo;
use store_api::storage::consts::ReservedColumnId;
use store_api::storage::RegionId;
use tokio::sync::mpsc;
use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure;
use crate::ddl::test_util::datanode_handler::{DatanodeWatcher, NaiveDatanodeHandler};
use crate::ddl::test_util::{
create_physical_table_metadata, test_create_logical_table_task, test_create_physical_table_task,
assert_column_name, create_physical_table_metadata, get_raw_table_info, test_column_metadatas,
test_create_logical_table_task, test_create_physical_table_task,
};
use crate::ddl::TableMetadata;
use crate::error::{Error, Result};
@@ -39,6 +42,54 @@ use crate::key::table_route::{PhysicalTableRouteValue, TableRouteValue};
use crate::rpc::router::{Region, RegionRoute};
use crate::test_util::{new_ddl_context, MockDatanodeManager};
fn make_creates_request_handler(
column_metadatas: Vec<ColumnMetadata>,
) -> impl Fn(Peer, RegionRequest) -> Result<RegionResponse> {
move |_peer, request| {
let _ = _peer;
if let region_request::Body::Creates(_) = request.body.unwrap() {
let mut response = RegionResponse::new(0);
// Default region id for physical table.
let region_id = RegionId::new(1024, 1);
response.extensions.insert(
MANIFEST_INFO_EXTENSION_KEY.to_string(),
RegionManifestInfo::encode_list(&[(
region_id,
RegionManifestInfo::metric(1, 0, 2, 0),
)])
.unwrap(),
);
response.extensions.insert(
ALTER_PHYSICAL_EXTENSION_KEY.to_string(),
ColumnMetadata::encode_list(&column_metadatas).unwrap(),
);
return Ok(response);
}
Ok(RegionResponse::new(0))
}
}
fn assert_creates_request(
peer: Peer,
request: RegionRequest,
expected_peer_id: u64,
expected_region_ids: &[RegionId],
) {
assert_eq!(peer.id, expected_peer_id);
let Some(region_request::Body::Creates(req)) = request.body else {
unreachable!();
};
for (i, region_id) in expected_region_ids.iter().enumerate() {
assert_eq!(
req.requests[i].region_id,
*region_id,
"actual region id: {}",
RegionId::from_u64(req.requests[i].region_id)
);
}
}
#[tokio::test]
async fn test_on_prepare_physical_table_not_found() {
let node_manager = Arc::new(MockDatanodeManager::new(()));
@@ -227,7 +278,12 @@ async fn test_on_prepare_part_logical_tables_exist() {
#[tokio::test]
async fn test_on_create_metadata() {
let node_manager = Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler));
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let column_metadatas = test_column_metadatas(&["host", "cpu"]);
let datanode_handler =
DatanodeWatcher::new(tx).with_handler(make_creates_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(datanode_handler));
let ddl_context = new_ddl_context(node_manager);
// Prepares physical table metadata.
let mut create_physical_table_task = test_create_physical_table_task("phy_table");
@@ -255,7 +311,7 @@ async fn test_on_create_metadata() {
let mut procedure = CreateLogicalTablesProcedure::new(
vec![task, yet_another_task],
physical_table_id,
ddl_context,
ddl_context.clone(),
);
let status = procedure.on_prepare().await.unwrap();
assert_matches!(
@@ -274,11 +330,42 @@ async fn test_on_create_metadata() {
let status = procedure.execute(&ctx).await.unwrap();
let table_ids = status.downcast_output_ref::<Vec<u32>>().unwrap();
assert_eq!(*table_ids, vec![1025, 1026]);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_creates_request(
peer,
request,
0,
&[RegionId::new(1025, 0), RegionId::new(1026, 0)],
);
let table_info = get_raw_table_info(&ddl_context, table_id).await;
assert_column_name(
&table_info,
&["ts", "value", "__table_id", "__tsid", "host", "cpu"],
);
assert_eq!(
table_info.meta.column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2,
3
]
);
}
#[tokio::test]
async fn test_on_create_metadata_part_logical_tables_exist() {
let node_manager = Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler));
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let column_metadatas = test_column_metadatas(&["host", "cpu"]);
let datanode_handler =
DatanodeWatcher::new(tx).with_handler(make_creates_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(datanode_handler));
let ddl_context = new_ddl_context(node_manager);
// Prepares physical table metadata.
let mut create_physical_table_task = test_create_physical_table_task("phy_table");
@@ -317,7 +404,7 @@ async fn test_on_create_metadata_part_logical_tables_exist() {
let mut procedure = CreateLogicalTablesProcedure::new(
vec![task, non_exist_task],
physical_table_id,
ddl_context,
ddl_context.clone(),
);
let status = procedure.on_prepare().await.unwrap();
assert_matches!(
@@ -336,6 +423,27 @@ async fn test_on_create_metadata_part_logical_tables_exist() {
let status = procedure.execute(&ctx).await.unwrap();
let table_ids = status.downcast_output_ref::<Vec<u32>>().unwrap();
assert_eq!(*table_ids, vec![8192, 1025]);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_creates_request(peer, request, 0, &[RegionId::new(1025, 0)]);
let table_info = get_raw_table_info(&ddl_context, table_id).await;
assert_column_name(
&table_info,
&["ts", "value", "__table_id", "__tsid", "host", "cpu"],
);
assert_eq!(
table_info.meta.column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2,
3
]
);
}
#[tokio::test]
@@ -399,27 +507,13 @@ async fn test_on_create_metadata_err() {
assert!(!error.is_retry_later());
}
fn creates_request_handler(_peer: Peer, request: RegionRequest) -> Result<RegionResponse> {
if let region_request::Body::Creates(_) = request.body.unwrap() {
let mut response = RegionResponse::new(0);
// Default region id for physical table.
let region_id = RegionId::new(1024, 1);
response.extensions.insert(
MANIFEST_INFO_EXTENSION_KEY.to_string(),
RegionManifestInfo::encode_list(&[(region_id, RegionManifestInfo::metric(1, 0, 2, 0))])
.unwrap(),
);
return Ok(response);
}
Ok(RegionResponse::new(0))
}
#[tokio::test]
async fn test_on_submit_create_request() {
common_telemetry::init_default_ut_logging();
let (tx, mut rx) = mpsc::channel(8);
let handler = DatanodeWatcher::new(tx).with_handler(creates_request_handler);
let column_metadatas = test_column_metadatas(&["host", "cpu"]);
let handler =
DatanodeWatcher::new(tx).with_handler(make_creates_request_handler(column_metadatas));
let node_manager = Arc::new(MockDatanodeManager::new(handler));
let ddl_context = new_ddl_context(node_manager);
let mut create_physical_table_task = test_create_physical_table_task("phy_table");

View File

@@ -16,7 +16,9 @@ use std::assert_matches::assert_matches;
use std::collections::HashMap;
use std::sync::Arc;
use api::v1::meta::Partition;
use api::region::RegionResponse;
use api::v1::meta::{Partition, Peer};
use api::v1::region::{region_request, RegionRequest};
use api::v1::{ColumnDataType, SemanticType};
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
@@ -24,7 +26,12 @@ use common_procedure::{Context as ProcedureContext, Procedure, ProcedureId, Stat
use common_procedure_test::{
execute_procedure_until, execute_procedure_until_done, MockContextProvider,
};
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::RegionId;
use tokio::sync::mpsc;
use crate::ddl::create_table::{CreateTableProcedure, CreateTableState};
use crate::ddl::test_util::columns::TestColumnDefBuilder;
@@ -32,14 +39,73 @@ use crate::ddl::test_util::create_table::{
build_raw_table_info_from_expr, TestCreateTableExprBuilder,
};
use crate::ddl::test_util::datanode_handler::{
NaiveDatanodeHandler, RetryErrorDatanodeHandler, UnexpectedErrorDatanodeHandler,
DatanodeWatcher, NaiveDatanodeHandler, RetryErrorDatanodeHandler,
UnexpectedErrorDatanodeHandler,
};
use crate::error::Error;
use crate::ddl::test_util::{assert_column_name, get_raw_table_info};
use crate::error::{Error, Result};
use crate::key::table_route::TableRouteValue;
use crate::kv_backend::memory::MemoryKvBackend;
use crate::rpc::ddl::CreateTableTask;
use crate::test_util::{new_ddl_context, new_ddl_context_with_kv_backend, MockDatanodeManager};
fn create_request_handler(_peer: Peer, request: RegionRequest) -> Result<RegionResponse> {
let _ = _peer;
if let region_request::Body::Create(_) = request.body.unwrap() {
let mut response = RegionResponse::new(0);
response.extensions.insert(
TABLE_COLUMN_METADATA_EXTENSION_KEY.to_string(),
ColumnMetadata::encode_list(&[
ColumnMetadata {
column_schema: ColumnSchema::new(
"ts",
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Timestamp,
column_id: 0,
},
ColumnMetadata {
column_schema: ColumnSchema::new(
"host",
ConcreteDataType::float64_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: 1,
},
ColumnMetadata {
column_schema: ColumnSchema::new(
"cpu",
ConcreteDataType::float64_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: 2,
},
])
.unwrap(),
);
return Ok(response);
}
Ok(RegionResponse::new(0))
}
fn assert_create_request(
peer: Peer,
request: RegionRequest,
expected_peer_id: u64,
expected_region_id: RegionId,
) {
assert_eq!(peer.id, expected_peer_id);
let Some(region_request::Body::Create(req)) = request.body else {
unreachable!();
};
assert_eq!(req.region_id, expected_region_id);
}
pub(crate) fn test_create_table_task(name: &str) -> CreateTableTask {
let create_table = TestCreateTableExprBuilder::default()
.column_defs([
@@ -230,11 +296,13 @@ async fn test_on_create_metadata_error() {
#[tokio::test]
async fn test_on_create_metadata() {
common_telemetry::init_default_ut_logging();
let node_manager = Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler));
let (tx, mut rx) = mpsc::channel(8);
let datanode_handler = DatanodeWatcher::new(tx).with_handler(create_request_handler);
let node_manager = Arc::new(MockDatanodeManager::new(datanode_handler));
let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone());
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),
@@ -243,8 +311,16 @@ async fn test_on_create_metadata() {
procedure.execute(&ctx).await.unwrap();
// Triggers procedure to create table metadata
let status = procedure.execute(&ctx).await.unwrap();
let table_id = status.downcast_output_ref::<u32>().unwrap();
assert_eq!(*table_id, 1024);
let table_id = *status.downcast_output_ref::<u32>().unwrap();
assert_eq!(table_id, 1024);
let (peer, request) = rx.try_recv().unwrap();
rx.try_recv().unwrap_err();
assert_create_request(peer, request, 0, RegionId::new(table_id, 0));
let table_info = get_raw_table_info(&ddl_context, table_id).await;
assert_column_name(&table_info, &["ts", "host", "cpu"]);
assert_eq!(table_info.meta.column_ids, vec![0, 1, 2]);
}
#[tokio::test]

View File

@@ -29,6 +29,7 @@ use common_telemetry::{error, info, warn};
use common_wal::options::WalOptions;
use futures::future::join_all;
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{LOGICAL_TABLE_METADATA_KEY, MANIFEST_INFO_EXTENSION_KEY};
use store_api::region_engine::RegionManifestInfo;
use store_api::storage::{RegionId, RegionNumber};
@@ -37,8 +38,8 @@ use table::table_reference::TableReference;
use crate::ddl::{DdlContext, DetectingRegion};
use crate::error::{
self, Error, OperateDatanodeSnafu, ParseWalOptionsSnafu, Result, TableNotFoundSnafu,
UnsupportedSnafu,
self, DecodeJsonSnafu, Error, MetadataCorruptionSnafu, OperateDatanodeSnafu,
ParseWalOptionsSnafu, Result, TableNotFoundSnafu, UnsupportedSnafu,
};
use crate::key::datanode_table::DatanodeTableValue;
use crate::key::table_name::TableNameKey;
@@ -314,11 +315,23 @@ pub fn parse_manifest_infos_from_extensions(
Ok(data_manifest_version)
}
/// Parses column metadatas from extensions.
pub fn parse_column_metadatas(
extensions: &HashMap<String, Vec<u8>>,
key: &str,
) -> Result<Vec<ColumnMetadata>> {
let value = extensions.get(key).context(error::UnexpectedSnafu {
err_msg: format!("column metadata extension not found: {}", key),
})?;
let column_metadatas = ColumnMetadata::decode_list(value).context(error::SerdeJsonSnafu {})?;
Ok(column_metadatas)
}
/// Sync follower regions on datanodes.
pub async fn sync_follower_regions(
context: &DdlContext,
table_id: TableId,
results: Vec<RegionResponse>,
results: &[RegionResponse],
region_routes: &[RegionRoute],
engine: &str,
) -> Result<()> {
@@ -331,7 +344,7 @@ pub async fn sync_follower_regions(
}
let results = results
.into_iter()
.iter()
.map(|response| parse_manifest_infos_from_extensions(&response.extensions))
.collect::<Result<Vec<_>>>()?
.into_iter()
@@ -418,6 +431,38 @@ pub async fn sync_follower_regions(
Ok(())
}
/// Extracts column metadatas from extensions.
pub fn extract_column_metadatas(
results: &mut [RegionResponse],
key: &str,
) -> Result<Option<Vec<ColumnMetadata>>> {
let schemas = results
.iter_mut()
.map(|r| r.extensions.remove(key))
.collect::<Vec<_>>();
if schemas.is_empty() {
return Ok(None);
}
// Verify all the physical schemas are the same
// Safety: previous check ensures this vec is not empty
let first = schemas.first().unwrap();
ensure!(
schemas.iter().all(|x| x == first),
MetadataCorruptionSnafu {
err_msg: "The table column metadata schemas from datanodes are not the same."
}
);
if let Some(first) = first {
let column_metadatas = ColumnMetadata::decode_list(first).context(DecodeJsonSnafu)?;
Ok(Some(column_metadatas))
} else {
Ok(None)
}
}
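
The helper's contract: take the extension value out of every response, require that all participants reported identical bytes, and only then decode the first one; an empty result set (or a key missing from every response) yields `None`. A minimal sketch of that invariant over raw payloads, using simplified types rather than the real `RegionResponse`:

```rust
/// Pulls one extension payload per response and enforces that every
/// participant reported the same bytes before handing it back for decoding.
fn extract_consistent_payload(
    payloads: Vec<Option<Vec<u8>>>,
) -> Result<Option<Vec<u8>>, String> {
    if payloads.is_empty() {
        return Ok(None);
    }
    let first = payloads[0].clone();
    if payloads.iter().any(|p| p != &first) {
        return Err("the payloads from datanodes are not the same".to_string());
    }
    Ok(first)
}

fn main() {
    assert_eq!(extract_consistent_payload(vec![]), Ok(None));
    assert_eq!(
        extract_consistent_payload(vec![Some(b"a".to_vec()), Some(b"a".to_vec())]),
        Ok(Some(b"a".to_vec()))
    );
    // A key present on one datanode but missing on another is treated as corruption.
    assert!(extract_consistent_payload(vec![Some(b"a".to_vec()), None]).is_err());
}
```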
#[cfg(test)]
mod tests {
use super::*;

View File

@@ -334,6 +334,7 @@ mod tests {
options: Default::default(),
region_numbers: vec![1],
partition_key_indices: vec![],
column_ids: vec![],
};
RawTableInfo {

View File

@@ -14,13 +14,14 @@
use std::collections::HashMap;
use common_telemetry::debug;
use snafu::ensure;
use crate::error::{self, Result};
use crate::key::txn_helper::TxnOpGetResponseSet;
use crate::kv_backend::txn::{Compare, CompareOp, Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
use crate::rpc::store::BatchGetRequest;
use crate::rpc::store::{BatchDeleteRequest, BatchGetRequest};
/// [TombstoneManager] provides the ability to:
/// - logically delete values
@@ -28,6 +29,9 @@ use crate::rpc::store::BatchGetRequest;
pub struct TombstoneManager {
kv_backend: KvBackendRef,
tombstone_prefix: String,
// Only used for testing.
#[cfg(test)]
max_txn_ops: Option<usize>,
}
const TOMBSTONE_PREFIX: &str = "__tombstone/";
@@ -35,10 +39,7 @@ const TOMBSTONE_PREFIX: &str = "__tombstone/";
impl TombstoneManager {
/// Returns [TombstoneManager].
pub fn new(kv_backend: KvBackendRef) -> Self {
Self {
kv_backend,
tombstone_prefix: TOMBSTONE_PREFIX.to_string(),
}
Self::new_with_prefix(kv_backend, TOMBSTONE_PREFIX)
}
/// Returns [TombstoneManager] with a custom tombstone prefix.
@@ -46,6 +47,8 @@ impl TombstoneManager {
Self {
kv_backend,
tombstone_prefix: prefix.to_string(),
#[cfg(test)]
max_txn_ops: None,
}
}
@@ -53,6 +56,11 @@ impl TombstoneManager {
[self.tombstone_prefix.as_bytes(), key].concat()
}
#[cfg(test)]
pub fn set_max_txn_ops(&mut self, max_txn_ops: usize) {
self.max_txn_ops = Some(max_txn_ops);
}
/// Moves value to `dest_key`.
///
/// Puts `value` to `dest_key` if the value of `src_key` equals `value`.
@@ -83,7 +91,11 @@ impl TombstoneManager {
ensure!(
keys.len() == dest_keys.len(),
error::UnexpectedSnafu {
err_msg: "The length of keys does not match the length of dest_keys."
err_msg: format!(
"The length of keys({}) does not match the length of dest_keys({}).",
keys.len(),
dest_keys.len()
),
}
);
// The key -> dest key mapping.
@@ -136,19 +148,45 @@ impl TombstoneManager {
.fail()
}
fn max_txn_ops(&self) -> usize {
#[cfg(test)]
if let Some(max_txn_ops) = self.max_txn_ops {
return max_txn_ops;
}
self.kv_backend.max_txn_ops()
}
/// Moves values to `dest_key`.
///
/// Returns the number of keys that were moved.
async fn move_values(&self, keys: Vec<Vec<u8>>, dest_keys: Vec<Vec<u8>>) -> Result<usize> {
let chunk_size = self.kv_backend.max_txn_ops() / 2;
if keys.len() > chunk_size {
let keys_chunks = keys.chunks(chunk_size).collect::<Vec<_>>();
let dest_keys_chunks = keys.chunks(chunk_size).collect::<Vec<_>>();
for (keys, dest_keys) in keys_chunks.into_iter().zip(dest_keys_chunks) {
self.move_values_inner(keys, dest_keys).await?;
ensure!(
keys.len() == dest_keys.len(),
error::UnexpectedSnafu {
err_msg: format!(
"The length of keys({}) does not match the length of dest_keys({}).",
keys.len(),
dest_keys.len()
),
}
Ok(keys.len())
);
if keys.is_empty() {
return Ok(0);
}
let chunk_size = self.max_txn_ops() / 2;
if keys.len() > chunk_size {
debug!(
"Moving values with multiple chunks, keys len: {}, chunk_size: {}",
keys.len(),
chunk_size
);
let mut moved_keys = 0;
let keys_chunks = keys.chunks(chunk_size).collect::<Vec<_>>();
let dest_keys_chunks = dest_keys.chunks(chunk_size).collect::<Vec<_>>();
for (keys, dest_keys) in keys_chunks.into_iter().zip(dest_keys_chunks) {
moved_keys += self.move_values_inner(keys, dest_keys).await?;
}
Ok(moved_keys)
} else {
self.move_values_inner(&keys, &dest_keys).await
}
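
Chunking uses half of `max_txn_ops` as the per-transaction key budget, presumably because every moved key contributes a pair of operations to the transaction built by `move_values_inner`. A small arithmetic sketch of the resulting batching (the helper below is illustrative, not part of the change):

```rust
fn chunk_counts(num_keys: usize, max_txn_ops: usize) -> (usize, usize) {
    // Mirrors `move_values`: a key is assumed to cost a pair of txn ops,
    // so one transaction carries at most `max_txn_ops / 2` keys.
    let chunk_size = max_txn_ops / 2;
    (chunk_size, num_keys.div_ceil(chunk_size))
}

fn main() {
    // With the test override `set_max_txn_ops(4)` and the 9 keys used in
    // `test_move_values_with_max_txn_ops`, the move is split into 2-key
    // chunks across 5 transactions.
    assert_eq!(chunk_counts(9, 4), (2, 5));
}
```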
@@ -196,15 +234,18 @@ impl TombstoneManager {
///
/// Returns the number of keys that were deleted.
pub async fn delete(&self, keys: Vec<Vec<u8>>) -> Result<usize> {
let operations = keys
let keys = keys
.iter()
.map(|key| TxnOp::Delete(self.to_tombstone(key)))
.map(|key| self.to_tombstone(key))
.collect::<Vec<_>>();
let txn = Txn::new().and_then(operations);
// Always success.
let _ = self.kv_backend.txn(txn).await?;
Ok(keys.len())
let num_keys = keys.len();
let _ = self
.kv_backend
.batch_delete(BatchDeleteRequest::new().with_keys(keys))
.await?;
Ok(num_keys)
}
}
@@ -392,16 +433,73 @@ mod tests {
.into_iter()
.map(|kv| (kv.key, kv.dest_key))
.unzip();
tombstone_manager
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
assert_eq!(kvs.len(), moved_keys);
check_moved_values(kv_backend.clone(), &move_values).await;
// Moves again
tombstone_manager
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
assert_eq!(0, moved_keys);
check_moved_values(kv_backend.clone(), &move_values).await;
}
#[tokio::test]
async fn test_move_values_with_max_txn_ops() {
common_telemetry::init_default_ut_logging();
let kv_backend = Arc::new(MemoryKvBackend::default());
let mut tombstone_manager = TombstoneManager::new(kv_backend.clone());
tombstone_manager.set_max_txn_ops(4);
let kvs = HashMap::from([
(b"bar".to_vec(), b"baz".to_vec()),
(b"foo".to_vec(), b"hi".to_vec()),
(b"baz".to_vec(), b"hello".to_vec()),
(b"qux".to_vec(), b"world".to_vec()),
(b"quux".to_vec(), b"world".to_vec()),
(b"quuux".to_vec(), b"world".to_vec()),
(b"quuuux".to_vec(), b"world".to_vec()),
(b"quuuuux".to_vec(), b"world".to_vec()),
(b"quuuuuux".to_vec(), b"world".to_vec()),
]);
for (key, value) in &kvs {
kv_backend
.put(
PutRequest::new()
.with_key(key.clone())
.with_value(value.clone()),
)
.await
.unwrap();
}
let move_values = kvs
.iter()
.map(|(key, value)| MoveValue {
key: key.clone(),
dest_key: tombstone_manager.to_tombstone(key),
value: value.clone(),
})
.collect::<Vec<_>>();
let (keys, dest_keys): (Vec<_>, Vec<_>) = move_values
.clone()
.into_iter()
.map(|kv| (kv.key, kv.dest_key))
.unzip();
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
assert_eq!(kvs.len(), moved_keys);
check_moved_values(kv_backend.clone(), &move_values).await;
// Moves again
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
assert_eq!(0, moved_keys);
check_moved_values(kv_backend.clone(), &move_values).await;
}
@@ -439,17 +537,19 @@ mod tests {
.unzip();
keys.push(b"non-exists".to_vec());
dest_keys.push(b"hi/non-exists".to_vec());
tombstone_manager
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
check_moved_values(kv_backend.clone(), &move_values).await;
assert_eq!(3, moved_keys);
// Moves again
tombstone_manager
let moved_keys = tombstone_manager
.move_values(keys.clone(), dest_keys.clone())
.await
.unwrap();
check_moved_values(kv_backend.clone(), &move_values).await;
assert_eq!(0, moved_keys);
}
#[tokio::test]
@@ -490,10 +590,11 @@ mod tests {
.into_iter()
.map(|kv| (kv.key, kv.dest_key))
.unzip();
tombstone_manager
let moved_keys = tombstone_manager
.move_values(keys, dest_keys)
.await
.unwrap();
assert_eq!(kvs.len(), moved_keys);
}
#[tokio::test]
@@ -571,4 +672,24 @@ mod tests {
.unwrap();
check_moved_values(kv_backend.clone(), &move_values).await;
}
#[tokio::test]
async fn test_move_values_with_different_lengths() {
let kv_backend = Arc::new(MemoryKvBackend::default());
let tombstone_manager = TombstoneManager::new(kv_backend.clone());
let keys = vec![b"bar".to_vec(), b"foo".to_vec()];
let dest_keys = vec![b"bar".to_vec(), b"foo".to_vec(), b"baz".to_vec()];
let err = tombstone_manager
.move_values(keys, dest_keys)
.await
.unwrap_err();
assert!(err
.to_string()
.contains("The length of keys(2) does not match the length of dest_keys(3)."),);
let moved_keys = tombstone_manager.move_values(vec![], vec![]).await.unwrap();
assert_eq!(0, moved_keys);
}
}

View File

@@ -1374,6 +1374,7 @@ mod tests {
options: Default::default(),
created_on: Default::default(),
partition_key_indices: Default::default(),
column_ids: Default::default(),
};
// construct RawTableInfo

View File

@@ -178,8 +178,6 @@ pub enum Error {
StreamTimeout {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: tokio::time::error::Elapsed,
},
#[snafu(display("RecordBatch slice index overflow: {visit_index} > {size}"))]

View File

@@ -40,7 +40,7 @@ use log_store::raft_engine::log_store::RaftEngineLogStore;
use meta_client::MetaClientRef;
use metric_engine::engine::MetricEngine;
use mito2::config::MitoConfig;
use mito2::engine::MitoEngine;
use mito2::engine::{MitoEngine, MitoEngineBuilder};
use object_store::manager::{ObjectStoreManager, ObjectStoreManagerRef};
use object_store::util::normalize_dir;
use query::dummy_catalog::TableProviderFactoryRef;
@@ -162,6 +162,8 @@ pub struct DatanodeBuilder {
meta_client: Option<MetaClientRef>,
kv_backend: KvBackendRef,
cache_registry: Option<Arc<LayeredCacheRegistry>>,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: Option<mito2::extension::BoxedExtensionRangeProviderFactory>,
}
impl DatanodeBuilder {
@@ -173,6 +175,8 @@ impl DatanodeBuilder {
meta_client: None,
kv_backend,
cache_registry: None,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: None,
}
}
@@ -199,6 +203,15 @@ impl DatanodeBuilder {
self
}
#[cfg(feature = "enterprise")]
pub fn with_extension_range_provider(
&mut self,
extension_range_provider_factory: mito2::extension::BoxedExtensionRangeProviderFactory,
) -> &mut Self {
self.extension_range_provider_factory = Some(extension_range_provider_factory);
self
}
pub async fn build(mut self) -> Result<Datanode> {
let node_id = self.opts.node_id.context(MissingNodeIdSnafu)?;
@@ -340,7 +353,7 @@ impl DatanodeBuilder {
}
async fn new_region_server(
&self,
&mut self,
schema_metadata_manager: SchemaMetadataManagerRef,
event_listener: RegionServerEventListenerRef,
) -> Result<RegionServer> {
@@ -376,13 +389,13 @@ impl DatanodeBuilder {
);
let object_store_manager = Self::build_object_store_manager(&opts.storage).await?;
let engines = Self::build_store_engines(
opts,
object_store_manager,
schema_metadata_manager,
self.plugins.clone(),
)
.await?;
let engines = self
.build_store_engines(
object_store_manager,
schema_metadata_manager,
self.plugins.clone(),
)
.await?;
for engine in engines {
region_server.register_engine(engine);
}
@@ -394,7 +407,7 @@ impl DatanodeBuilder {
/// Builds [RegionEngineRef] from `store_engine` section in `opts`
async fn build_store_engines(
opts: &DatanodeOptions,
&mut self,
object_store_manager: ObjectStoreManagerRef,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
@@ -403,7 +416,7 @@ impl DatanodeBuilder {
let mut mito_engine_config = MitoConfig::default();
let mut file_engine_config = file_engine::config::EngineConfig::default();
for engine in &opts.region_engine {
for engine in &self.opts.region_engine {
match engine {
RegionEngineConfig::Mito(config) => {
mito_engine_config = config.clone();
@@ -417,14 +430,14 @@ impl DatanodeBuilder {
}
}
let mito_engine = Self::build_mito_engine(
opts,
object_store_manager.clone(),
mito_engine_config,
schema_metadata_manager.clone(),
plugins.clone(),
)
.await?;
let mito_engine = self
.build_mito_engine(
object_store_manager.clone(),
mito_engine_config,
schema_metadata_manager.clone(),
plugins.clone(),
)
.await?;
let metric_engine = MetricEngine::try_new(mito_engine.clone(), metric_engine_config)
.context(BuildMetricEngineSnafu)?;
@@ -443,12 +456,13 @@ impl DatanodeBuilder {
/// Builds [MitoEngine] according to options.
async fn build_mito_engine(
opts: &DatanodeOptions,
&mut self,
object_store_manager: ObjectStoreManagerRef,
mut config: MitoConfig,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Result<MitoEngine> {
let opts = &self.opts;
if opts.storage.is_object_storage() {
// Enable the write cache when setting object storage
config.enable_write_cache = true;
@@ -456,17 +470,27 @@ impl DatanodeBuilder {
}
let mito_engine = match &opts.wal {
DatanodeWalConfig::RaftEngine(raft_engine_config) => MitoEngine::new(
&opts.storage.data_home,
config,
Self::build_raft_engine_log_store(&opts.storage.data_home, raft_engine_config)
.await?,
object_store_manager,
schema_metadata_manager,
plugins,
)
.await
.context(BuildMitoEngineSnafu)?,
DatanodeWalConfig::RaftEngine(raft_engine_config) => {
let log_store =
Self::build_raft_engine_log_store(&opts.storage.data_home, raft_engine_config)
.await?;
let builder = MitoEngineBuilder::new(
&opts.storage.data_home,
config,
log_store,
object_store_manager,
schema_metadata_manager,
plugins,
);
#[cfg(feature = "enterprise")]
let builder = builder.with_extension_range_provider_factory(
self.extension_range_provider_factory.take(),
);
builder.try_build().await.context(BuildMitoEngineSnafu)?
}
DatanodeWalConfig::Kafka(kafka_config) => {
if kafka_config.create_index && opts.node_id.is_none() {
warn!("The WAL index creation only available in distributed mode.")
@@ -488,16 +512,21 @@ impl DatanodeBuilder {
None
};
MitoEngine::new(
let builder = MitoEngineBuilder::new(
&opts.storage.data_home,
config,
Self::build_kafka_log_store(kafka_config, global_index_collector).await?,
object_store_manager,
schema_metadata_manager,
plugins,
)
.await
.context(BuildMitoEngineSnafu)?
);
#[cfg(feature = "enterprise")]
let builder = builder.with_extension_range_provider_factory(
self.extension_range_provider_factory.take(),
);
builder.try_build().await.context(BuildMitoEngineSnafu)?
}
};
Ok(mito_engine)

View File

@@ -31,9 +31,10 @@ pub use crate::schema::column_schema::{
ColumnSchema, FulltextAnalyzer, FulltextBackend, FulltextOptions, Metadata,
SkippingIndexOptions, SkippingIndexType, COLUMN_FULLTEXT_CHANGE_OPT_KEY_ENABLE,
COLUMN_FULLTEXT_OPT_KEY_ANALYZER, COLUMN_FULLTEXT_OPT_KEY_BACKEND,
COLUMN_FULLTEXT_OPT_KEY_CASE_SENSITIVE, COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY,
COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE, COMMENT_KEY, FULLTEXT_KEY, INVERTED_INDEX_KEY,
SKIPPING_INDEX_KEY, TIME_INDEX_KEY,
COLUMN_FULLTEXT_OPT_KEY_CASE_SENSITIVE, COLUMN_FULLTEXT_OPT_KEY_FALSE_POSITIVE_RATE,
COLUMN_FULLTEXT_OPT_KEY_GRANULARITY, COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE,
COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY, COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE, COMMENT_KEY,
FULLTEXT_KEY, INVERTED_INDEX_KEY, SKIPPING_INDEX_KEY, TIME_INDEX_KEY,
};
pub use crate::schema::constraint::ColumnDefaultConstraint;
pub use crate::schema::raw::RawSchema;

View File

@@ -47,13 +47,18 @@ pub const COLUMN_FULLTEXT_CHANGE_OPT_KEY_ENABLE: &str = "enable";
pub const COLUMN_FULLTEXT_OPT_KEY_ANALYZER: &str = "analyzer";
pub const COLUMN_FULLTEXT_OPT_KEY_CASE_SENSITIVE: &str = "case_sensitive";
pub const COLUMN_FULLTEXT_OPT_KEY_BACKEND: &str = "backend";
pub const COLUMN_FULLTEXT_OPT_KEY_GRANULARITY: &str = "granularity";
pub const COLUMN_FULLTEXT_OPT_KEY_FALSE_POSITIVE_RATE: &str = "false_positive_rate";
/// Keys used in SKIPPING index options
pub const COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY: &str = "granularity";
pub const COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE: &str = "false_positive_rate";
pub const COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE: &str = "type";
pub const DEFAULT_GRANULARITY: u32 = 10240;
pub const DEFAULT_FALSE_POSITIVE_RATE: f64 = 0.01;
/// Schema of a column, used as an immutable struct.
#[derive(Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct ColumnSchema {
@@ -504,7 +509,7 @@ impl TryFrom<&ColumnSchema> for Field {
}
/// Fulltext options for a column.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Default, Visit, VisitMut)]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Visit, VisitMut)]
#[serde(rename_all = "kebab-case")]
pub struct FulltextOptions {
/// Whether the fulltext index is enabled.
@@ -518,6 +523,92 @@ pub struct FulltextOptions {
/// The fulltext backend to use.
#[serde(default)]
pub backend: FulltextBackend,
/// The granularity of the fulltext index (for bloom backend only)
#[serde(default = "fulltext_options_default_granularity")]
pub granularity: u32,
/// The false positive rate of the fulltext index (for bloom backend only)
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
}
fn fulltext_options_default_granularity() -> u32 {
DEFAULT_GRANULARITY
}
fn index_options_default_false_positive_rate_in_10000() -> u32 {
(DEFAULT_FALSE_POSITIVE_RATE * 10000.0) as u32
}
impl FulltextOptions {
/// Creates a new fulltext options.
pub fn new(
enable: bool,
analyzer: FulltextAnalyzer,
case_sensitive: bool,
backend: FulltextBackend,
granularity: u32,
false_positive_rate: f64,
) -> Result<Self> {
ensure!(
0.0 < false_positive_rate && false_positive_rate <= 1.0,
error::InvalidFulltextOptionSnafu {
msg: format!(
"Invalid false positive rate: {false_positive_rate}, expected: 0.0 < rate <= 1.0"
),
}
);
ensure!(
granularity > 0,
error::InvalidFulltextOptionSnafu {
msg: format!("Invalid granularity: {granularity}, expected: positive integer"),
}
);
Ok(Self::new_unchecked(
enable,
analyzer,
case_sensitive,
backend,
granularity,
false_positive_rate,
))
}
/// Creates a new fulltext options without checking `false_positive_rate` and `granularity`.
pub fn new_unchecked(
enable: bool,
analyzer: FulltextAnalyzer,
case_sensitive: bool,
backend: FulltextBackend,
granularity: u32,
false_positive_rate: f64,
) -> Self {
Self {
enable,
analyzer,
case_sensitive,
backend,
granularity,
false_positive_rate_in_10000: (false_positive_rate * 10000.0) as u32,
}
}
/// Gets the false positive rate.
pub fn false_positive_rate(&self) -> f64 {
self.false_positive_rate_in_10000 as f64 / 10000.0
}
}
impl Default for FulltextOptions {
fn default() -> Self {
Self::new_unchecked(
false,
FulltextAnalyzer::default(),
false,
FulltextBackend::default(),
DEFAULT_GRANULARITY,
DEFAULT_FALSE_POSITIVE_RATE,
)
}
}
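A small self-contained illustration (not part of the diff) of the ten-thousandths encoding used above: the rate is stored as a u32, which presumably keeps the options Eq-comparable and serde-stable, and the accessor divides it back out.

// Stand-alone model of the false_positive_rate_in_10000 encoding.
struct RateIn10000(u32);

impl RateIn10000 {
    fn from_rate(rate: f64) -> Self {
        // 0.01 -> 100, 0.001 -> 10; truncation mirrors the `as u32` cast above.
        Self((rate * 10000.0) as u32)
    }
    fn rate(&self) -> f64 {
        self.0 as f64 / 10000.0
    }
}

fn demo() {
    // DEFAULT_FALSE_POSITIVE_RATE (0.01) round-trips through 100.
    let r = RateIn10000::from_rate(0.01);
    assert_eq!(r.0, 100);
    assert_eq!(r.rate(), 0.01);
}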
impl fmt::Display for FulltextOptions {
@@ -527,6 +618,10 @@ impl fmt::Display for FulltextOptions {
write!(f, ", analyzer={}", self.analyzer)?;
write!(f, ", case_sensitive={}", self.case_sensitive)?;
write!(f, ", backend={}", self.backend)?;
if self.backend == FulltextBackend::Bloom {
write!(f, ", granularity={}", self.granularity)?;
write!(f, ", false_positive_rate={}", self.false_positive_rate())?;
}
}
Ok(())
}
@@ -611,6 +706,45 @@ impl TryFrom<HashMap<String, String>> for FulltextOptions {
}
}
if fulltext_options.backend == FulltextBackend::Bloom {
// Parse granularity with default value 10240
let granularity = match options.get(COLUMN_FULLTEXT_OPT_KEY_GRANULARITY) {
Some(value) => value
.parse::<u32>()
.ok()
.filter(|&v| v > 0)
.ok_or_else(|| {
error::InvalidFulltextOptionSnafu {
msg: format!(
"Invalid granularity: {value}, expected: positive integer"
),
}
.build()
})?,
None => DEFAULT_GRANULARITY,
};
fulltext_options.granularity = granularity;
// Parse false positive rate with default value 0.01
let false_positive_rate = match options.get(COLUMN_FULLTEXT_OPT_KEY_FALSE_POSITIVE_RATE)
{
Some(value) => value
.parse::<f64>()
.ok()
.filter(|&v| v > 0.0 && v <= 1.0)
.ok_or_else(|| {
error::InvalidFulltextOptionSnafu {
msg: format!(
"Invalid false positive rate: {value}, expected: 0.0 < rate <= 1.0"
),
}
.build()
})?,
None => DEFAULT_FALSE_POSITIVE_RATE,
};
fulltext_options.false_positive_rate_in_10000 = (false_positive_rate * 10000.0) as u32;
}
Ok(fulltext_options)
}
}
@@ -638,23 +772,73 @@ impl fmt::Display for FulltextAnalyzer {
pub struct SkippingIndexOptions {
/// The granularity of the skip index.
pub granularity: u32,
/// The false positive rate of the skip index (in ten-thousandths, e.g., 100 = 1%).
#[serde(default = "index_options_default_false_positive_rate_in_10000")]
pub false_positive_rate_in_10000: u32,
/// The type of the skip index.
#[serde(default)]
pub index_type: SkippingIndexType,
}
impl SkippingIndexOptions {
/// Creates a new skipping index options without checking `false_positive_rate` and `granularity`.
pub fn new_unchecked(
granularity: u32,
false_positive_rate: f64,
index_type: SkippingIndexType,
) -> Self {
Self {
granularity,
false_positive_rate_in_10000: (false_positive_rate * 10000.0) as u32,
index_type,
}
}
/// Creates a new skipping index options.
pub fn new(
granularity: u32,
false_positive_rate: f64,
index_type: SkippingIndexType,
) -> Result<Self> {
ensure!(
0.0 < false_positive_rate && false_positive_rate <= 1.0,
error::InvalidSkippingIndexOptionSnafu {
msg: format!("Invalid false positive rate: {false_positive_rate}, expected: 0.0 < rate <= 1.0"),
}
);
ensure!(
granularity > 0,
error::InvalidSkippingIndexOptionSnafu {
msg: format!("Invalid granularity: {granularity}, expected: positive integer"),
}
);
Ok(Self::new_unchecked(
granularity,
false_positive_rate,
index_type,
))
}
/// Gets the false positive rate.
pub fn false_positive_rate(&self) -> f64 {
self.false_positive_rate_in_10000 as f64 / 10000.0
}
}
impl Default for SkippingIndexOptions {
fn default() -> Self {
Self {
granularity: DEFAULT_GRANULARITY,
index_type: SkippingIndexType::default(),
}
Self::new_unchecked(
DEFAULT_GRANULARITY,
DEFAULT_FALSE_POSITIVE_RATE,
SkippingIndexType::default(),
)
}
}
impl fmt::Display for SkippingIndexOptions {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "granularity={}", self.granularity)?;
write!(f, ", false_positive_rate={}", self.false_positive_rate())?;
write!(f, ", index_type={}", self.index_type)?;
Ok(())
}
@@ -681,15 +865,37 @@ impl TryFrom<HashMap<String, String>> for SkippingIndexOptions {
fn try_from(options: HashMap<String, String>) -> Result<Self> {
// Parse granularity, defaulting to DEFAULT_GRANULARITY when unset
let granularity = match options.get(COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY) {
Some(value) => value.parse::<u32>().map_err(|_| {
error::InvalidSkippingIndexOptionSnafu {
msg: format!("Invalid granularity: {value}, expected: positive integer"),
}
.build()
})?,
Some(value) => value
.parse::<u32>()
.ok()
.filter(|&v| v > 0)
.ok_or_else(|| {
error::InvalidSkippingIndexOptionSnafu {
msg: format!("Invalid granularity: {value}, expected: positive integer"),
}
.build()
})?,
None => DEFAULT_GRANULARITY,
};
// Parse false positive rate, defaulting to DEFAULT_FALSE_POSITIVE_RATE (0.01) when unset
let false_positive_rate =
match options.get(COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE) {
Some(value) => value
.parse::<f64>()
.ok()
.filter(|&v| v > 0.0 && v <= 1.0)
.ok_or_else(|| {
error::InvalidSkippingIndexOptionSnafu {
msg: format!(
"Invalid false positive rate: {value}, expected: 0.0 < rate <= 1.0"
),
}
.build()
})?,
None => DEFAULT_FALSE_POSITIVE_RATE,
};
// Parse index type with default value BloomFilter
let index_type = match options.get(COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE) {
Some(typ) => match typ.to_ascii_uppercase().as_str() {
@@ -704,10 +910,11 @@ impl TryFrom<HashMap<String, String>> for SkippingIndexOptions {
None => SkippingIndexType::default(),
};
Ok(SkippingIndexOptions {
Ok(SkippingIndexOptions::new_unchecked(
granularity,
false_positive_rate,
index_type,
})
))
}
}
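A hedged usage sketch of the option-map parsing above, assuming this module's constants and types are in scope; the values are arbitrary, and the unspecified index type falls back to its default.

use std::collections::HashMap;

fn demo() {
    let options = HashMap::from([
        (
            COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY.to_string(),
            "8192".to_string(),
        ),
        (
            COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE.to_string(),
            "0.05".to_string(),
        ),
    ]);
    let opts = SkippingIndexOptions::try_from(options).unwrap();
    assert_eq!(opts.granularity, 8192);
    assert_eq!(opts.false_positive_rate(), 0.05);
    // A rate outside (0.0, 1.0], e.g. "1.5", would instead produce an
    // InvalidSkippingIndexOption error, as would a zero granularity.
}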
@@ -973,4 +1180,59 @@ mod tests {
assert!(column_schema.default_constraint.is_none());
assert!(column_schema.metadata.is_empty());
}
#[test]
fn test_skipping_index_options_deserialization() {
let original_options = "{\"granularity\":1024,\"false-positive-rate-in-10000\":10,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(original_options).unwrap();
assert_eq!(1024, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_skipping_index_options_deserialization_v0_14_to_v0_15() {
let options = "{\"granularity\":10240,\"index-type\":\"BloomFilter\"}";
let options = serde_json::from_str::<SkippingIndexOptions>(options).unwrap();
assert_eq!(10240, options.granularity);
assert_eq!(SkippingIndexType::BloomFilter, options.index_type);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"granularity\":10240,\"false-positive-rate-in-10000\":100,\"index-type\":\"BloomFilter\"}");
}
#[test]
fn test_fulltext_options_deserialization() {
let original_options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":1024,\"false-positive-rate-in-10000\":10}";
let options = serde_json::from_str::<FulltextOptions>(original_options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(1024, options.granularity);
assert_eq!(0.001, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, original_options);
}
#[test]
fn test_fulltext_options_deserialization_v0_14_to_v0_15() {
// 0.14 to 0.15
let options = "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\"}";
let options = serde_json::from_str::<FulltextOptions>(options).unwrap();
assert!(!options.case_sensitive);
assert!(options.enable);
assert_eq!(FulltextBackend::Bloom, options.backend);
assert_eq!(FulltextAnalyzer::default(), options.analyzer);
assert_eq!(DEFAULT_GRANULARITY, options.granularity);
assert_eq!(DEFAULT_FALSE_POSITIVE_RATE, options.false_positive_rate());
let options_str = serde_json::to_string(&options).unwrap();
assert_eq!(options_str, "{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false,\"backend\":\"bloom\",\"granularity\":10240,\"false-positive-rate-in-10000\":100}");
}
}

View File

@@ -12,6 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use arrow_array::{
ArrayRef, PrimitiveArray, TimestampMicrosecondArray, TimestampMillisecondArray,
TimestampNanosecondArray, TimestampSecondArray,
};
use arrow_schema::DataType;
use common_time::timestamp::TimeUnit;
use common_time::Timestamp;
use paste::paste;
@@ -138,6 +143,41 @@ define_timestamp_with_unit!(Millisecond);
define_timestamp_with_unit!(Microsecond);
define_timestamp_with_unit!(Nanosecond);
pub fn timestamp_array_to_primitive(
ts_array: &ArrayRef,
) -> Option<(
PrimitiveArray<arrow_array::types::Int64Type>,
arrow::datatypes::TimeUnit,
)> {
let DataType::Timestamp(unit, _) = ts_array.data_type() else {
return None;
};
let ts_primitive = match unit {
arrow_schema::TimeUnit::Second => ts_array
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Millisecond => ts_array
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Microsecond => ts_array
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
arrow_schema::TimeUnit::Nanosecond => ts_array
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.reinterpret_cast::<arrow_array::types::Int64Type>(),
};
Some((ts_primitive, *unit))
}
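A hedged usage sketch of the new helper, using arrow types with arbitrary values: any of the four timestamp units is reinterpreted as an Int64 primitive array plus its unit, while non-timestamp arrays yield None.

use std::sync::Arc;
use arrow_array::{ArrayRef, Int64Array, TimestampMillisecondArray};

fn demo() {
    let ts: ArrayRef = Arc::new(TimestampMillisecondArray::from(vec![1_000_i64, 2_000, 3_000]));
    let (primitive, unit) = timestamp_array_to_primitive(&ts).expect("timestamp array");
    assert!(matches!(unit, arrow_schema::TimeUnit::Millisecond));
    assert_eq!(primitive.value(1), 2_000);

    // Non-timestamp arrays are rejected.
    let ints: ArrayRef = Arc::new(Int64Array::from(vec![1_i64, 2, 3]));
    assert!(timestamp_array_to_primitive(&ints).is_none());
}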
#[cfg(test)]
mod tests {
use common_time::timezone::set_default_timezone;

View File

@@ -121,7 +121,9 @@ impl Default for FlownodeOptions {
logging: LoggingOptions::default(),
tracing: TracingOptions::default(),
heartbeat: HeartbeatOptions::default(),
query: QueryOptions::default(),
// The flownode's query parallelism is set to 1 to throttle flow queries so
// that they won't use too much CPU or memory.
query: QueryOptions { parallelism: 1 },
user_provider: None,
}
}
@@ -903,7 +905,7 @@ impl StreamingEngine {
let rows_send = self.run_available(true).await?;
let row = self.send_writeback_requests().await?;
debug!(
"Done to flush flow_id={:?} with {} input rows flushed, {} rows sended and {} output rows flushed",
"Done to flush flow_id={:?} with {} input rows flushed, {} rows sent and {} output rows flushed",
flow_id, flushed_input_rows, rows_send, row
);
Ok(row)

View File

@@ -476,15 +476,37 @@ impl BatchingEngine {
Ok(())
}
/// Only flushes the dirty windows of the flow task with the given flow id, by running the query on them,
/// since flushing the whole time range is usually prohibitively expensive.
pub async fn flush_flow_inner(&self, flow_id: FlowId) -> Result<usize, Error> {
debug!("Try flush flow {flow_id}");
// Need to wait a bit to ensure the previous mirror insert has been handled.
// This only matters when the flow is flushed right after inserting data into it.
// TODO(discord9): find a better way to ensure the data is ready, maybe inform flownode from frontend?
tokio::time::sleep(std::time::Duration::from_millis(100)).await;
let task = self.tasks.read().await.get(&flow_id).cloned();
let task = task.with_context(|| FlowNotFoundSnafu { id: flow_id })?;
task.mark_all_windows_as_dirty()?;
let time_window_size = task
.config
.time_window_expr
.as_ref()
.and_then(|expr| *expr.time_window_size());
let cur_dirty_window_cnt = time_window_size.map(|time_window_size| {
task.state
.read()
.unwrap()
.dirty_time_windows
.effective_count(&time_window_size)
});
let res = task
.gen_exec_once(&self.query_engine, &self.frontend_client)
.gen_exec_once(
&self.query_engine,
&self.frontend_client,
cur_dirty_window_cnt,
)
.await?;
let affected_rows = res.map(|(r, _)| r).unwrap_or_default() as usize;

View File

@@ -14,6 +14,7 @@
//! Frontend client for running a flow as a batching task: a time-window-aware normal query triggered on every tick configured by the user.
use std::collections::HashMap;
use std::sync::{Arc, Weak};
use std::time::SystemTime;
@@ -29,6 +30,8 @@ use common_meta::rpc::store::RangeRequest;
use common_query::Output;
use common_telemetry::warn;
use meta_client::client::MetaClient;
use query::datafusion::QUERY_PARALLELISM_HINT;
use query::options::QueryOptions;
use rand::rng;
use rand::seq::SliceRandom;
use servers::query_handler::grpc::GrpcQueryHandler;
@@ -84,27 +87,34 @@ pub enum FrontendClient {
meta_client: Arc<MetaClient>,
chnl_mgr: ChannelManager,
auth: Option<FlowAuthHeader>,
query: QueryOptions,
},
Standalone {
/// For the sake of simplicity, still use gRPC even in standalone mode.
/// Note that the clients here should all be lazy, so connections are made only after the frontend is booted.
database_client: HandlerMutable,
query: QueryOptions,
},
}
impl FrontendClient {
/// Create a new empty frontend client, with a `HandlerMutable` to set the grpc handler later
pub fn from_empty_grpc_handler() -> (Self, HandlerMutable) {
pub fn from_empty_grpc_handler(query: QueryOptions) -> (Self, HandlerMutable) {
let handler = Arc::new(std::sync::Mutex::new(None));
(
Self::Standalone {
database_client: handler.clone(),
query,
},
handler,
)
}
pub fn from_meta_client(meta_client: Arc<MetaClient>, auth: Option<FlowAuthHeader>) -> Self {
pub fn from_meta_client(
meta_client: Arc<MetaClient>,
auth: Option<FlowAuthHeader>,
query: QueryOptions,
) -> Self {
common_telemetry::info!("Frontend client build with auth={:?}", auth);
Self::Distributed {
meta_client,
@@ -115,12 +125,17 @@ impl FrontendClient {
ChannelManager::with_config(cfg)
},
auth,
query,
}
}
pub fn from_grpc_handler(grpc_handler: Weak<dyn GrpcQueryHandlerWithBoxedError>) -> Self {
pub fn from_grpc_handler(
grpc_handler: Weak<dyn GrpcQueryHandlerWithBoxedError>,
query: QueryOptions,
) -> Self {
Self::Standalone {
database_client: Arc::new(std::sync::Mutex::new(Some(grpc_handler))),
query,
}
}
}
@@ -193,6 +208,7 @@ impl FrontendClient {
meta_client: _,
chnl_mgr,
auth,
query: _,
} = self
else {
return UnexpectedSnafu {
@@ -281,7 +297,9 @@ impl FrontendClient {
.map_err(BoxedError::new)
.context(ExternalSnafu)
}
FrontendClient::Standalone { database_client } => {
FrontendClient::Standalone {
database_client, ..
} => {
let ctx = QueryContextBuilder::default()
.current_catalog(catalog.to_string())
.current_schema(schema.to_string())
@@ -328,7 +346,7 @@ impl FrontendClient {
peer_desc: &mut Option<PeerDesc>,
) -> Result<u32, Error> {
match self {
FrontendClient::Distributed { .. } => {
FrontendClient::Distributed { query, .. } => {
let db = self.get_random_active_frontend(catalog, schema).await?;
*peer_desc = Some(PeerDesc::Dist {
@@ -336,16 +354,27 @@ impl FrontendClient {
});
db.database
.handle_with_retry(req.clone(), GRPC_MAX_RETRIES)
.handle_with_retry(
req.clone(),
GRPC_MAX_RETRIES,
&[(QUERY_PARALLELISM_HINT, &query.parallelism.to_string())],
)
.await
.with_context(|_| InvalidRequestSnafu {
context: format!("Failed to handle request at {:?}: {:?}", db.peer, req),
})
}
FrontendClient::Standalone { database_client } => {
FrontendClient::Standalone {
database_client,
query,
} => {
let ctx = QueryContextBuilder::default()
.current_catalog(catalog.to_string())
.current_schema(schema.to_string())
.extensions(HashMap::from([(
QUERY_PARALLELISM_HINT.to_string(),
query.parallelism.to_string(),
)]))
.build();
let ctx = Arc::new(ctx);
{

View File

@@ -22,7 +22,7 @@ use common_telemetry::tracing::warn;
use common_time::Timestamp;
use datatypes::value::Value;
use session::context::QueryContextRef;
use snafu::{OptionExt, ResultExt};
use snafu::{ensure, OptionExt, ResultExt};
use tokio::sync::oneshot;
use tokio::time::Instant;
@@ -31,7 +31,8 @@ use crate::batching_mode::time_window::TimeWindowExpr;
use crate::batching_mode::MIN_REFRESH_DURATION;
use crate::error::{DatatypesSnafu, InternalSnafu, TimeSnafu, UnexpectedSnafu};
use crate::metrics::{
METRIC_FLOW_BATCHING_ENGINE_QUERY_TIME_RANGE, METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_CNT,
METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_CNT, METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_SIZE,
METRIC_FLOW_BATCHING_ENGINE_STALLED_WINDOW_SIZE,
};
use crate::{Error, FlowId};
@@ -76,38 +77,43 @@ impl TaskState {
/// Compute the next query delay based on the time window size or the last query duration.
/// The aim is to avoid overly frequent queries while also not delaying for too long.
/// The delay is computed as follows:
/// - If `time_window_size` is set, the delay is half the time window size, constrained to be
/// at least `last_query_duration` and at most `max_timeout`.
/// - If `time_window_size` is not set, the delay defaults to `last_query_duration`, constrained
/// to be at least `MIN_REFRESH_DURATION` and at most `max_timeout`.
///
/// If there are dirty time windows, the function returns an immediate execution time to clean them.
/// TODO: Make this behavior configurable.
/// next wait time is calculated as:
/// last query duration, capped by [max(min_run_interval, time_window_size), max_timeout],
/// note at most wait for `max_timeout`.
///
/// if current the dirty time range is longer than one query can handle,
/// execute immediately to faster clean up dirty time windows.
///
pub fn get_next_start_query_time(
&self,
flow_id: FlowId,
time_window_size: &Option<Duration>,
max_timeout: Option<Duration>,
) -> Instant {
let last_duration = max_timeout
.unwrap_or(self.last_query_duration)
.min(self.last_query_duration)
.max(MIN_REFRESH_DURATION);
// = last query duration, capped by [max(min_run_interval, time_window_size), max_timeout], note at most `max_timeout`
let lower = time_window_size.unwrap_or(MIN_REFRESH_DURATION);
let next_duration = self.last_query_duration.max(lower);
let next_duration = if let Some(max_timeout) = max_timeout {
next_duration.min(max_timeout)
} else {
next_duration
};
let next_duration = time_window_size
.map(|t| {
let half = t / 2;
half.max(last_duration)
})
.unwrap_or(last_duration);
// if have dirty time window, execute immediately to clean dirty time window
if self.dirty_time_windows.windows.is_empty() {
let cur_dirty_window_size = self.dirty_time_windows.window_size();
// compute how large a time range can be handled in one query
let max_query_update_range = (*time_window_size)
.unwrap_or_default()
.mul_f64(DirtyTimeWindows::MAX_FILTER_NUM as f64);
// if the dirty time range is more than one query can handle, execute immediately
// to clean up the dirty time windows faster
if cur_dirty_window_size < max_query_update_range {
self.last_update_time + next_duration
} else {
// if the dirty time windows can't be cleaned up in one query, execute immediately
// to clean them up faster
debug!(
"Flow id = {}, still have {} dirty time window({:?}), execute immediately",
"Flow id = {}, still have too many {} dirty time window({:?}), execute immediately",
flow_id,
self.dirty_time_windows.windows.len(),
self.dirty_time_windows.windows
@@ -147,6 +153,18 @@ impl DirtyTimeWindows {
}
}
pub fn window_size(&self) -> Duration {
let mut ret = Duration::from_secs(0);
for (start, end) in &self.windows {
if let Some(end) = end {
if let Some(duration) = end.sub(start) {
ret += duration.to_std().unwrap_or_default();
}
}
}
ret
}
pub fn add_window(&mut self, start: Timestamp, end: Option<Timestamp>) {
self.windows.insert(start, end);
}
@@ -161,6 +179,33 @@ impl DirtyTimeWindows {
self.windows.len()
}
/// Gets the effective count of time windows, i.e. the number of time windows that can be
/// used for a query, computed as the total time window range divided by `window_size`.
pub fn effective_count(&self, window_size: &Duration) -> usize {
if self.windows.is_empty() {
return 0;
}
let window_size =
chrono::Duration::from_std(*window_size).unwrap_or(chrono::Duration::zero());
let total_window_time_range =
self.windows
.iter()
.fold(chrono::Duration::zero(), |acc, (start, end)| {
if let Some(end) = end {
acc + end.sub(start).unwrap_or(chrono::Duration::zero())
} else {
acc + window_size
}
});
// a zero window_size probably has no meaning, but guard against division by zero just in case
if window_size.num_seconds() == 0 {
0
} else {
(total_window_time_range.num_seconds() / window_size.num_seconds()) as usize
}
}
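A worked example of effective_count (a sketch; constructing DirtyTimeWindows via Default and add_window is assumed here for brevity): two dirty windows of 30 seconds each against a 10-second window size amount to 60 seconds of dirty range, i.e. six windows' worth of querying.

use std::time::Duration;

fn demo() {
    let mut dirty = DirtyTimeWindows::default();
    dirty.add_window(Timestamp::new_second(0), Some(Timestamp::new_second(30)));
    dirty.add_window(Timestamp::new_second(60), Some(Timestamp::new_second(90)));
    // (30 s + 30 s) / 10 s = 6 effective windows.
    assert_eq!(dirty.effective_count(&Duration::from_secs(10)), 6);
}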
/// Generate all filter expressions consuming all time windows
///
/// there are two limits:
@@ -175,6 +220,13 @@ impl DirtyTimeWindows {
flow_id: FlowId,
task_ctx: Option<&BatchingTask>,
) -> Result<Option<datafusion_expr::Expr>, Error> {
ensure!(
window_size.num_seconds() > 0,
UnexpectedSnafu {
reason: "window_size is zero, can't generate filter exprs",
}
);
debug!(
"expire_lower_bound: {:?}, window_size: {:?}",
expire_lower_bound.map(|t| t.to_iso8601_string()),
@@ -211,62 +263,94 @@ impl DirtyTimeWindows {
// get the first `window_cnt` time windows
let max_time_range = window_size * window_cnt as i32;
let nth = {
let mut cur_time_range = chrono::Duration::zero();
let mut nth_key = None;
for (idx, (start, end)) in self.windows.iter().enumerate() {
// if time range is too long, stop
if cur_time_range > max_time_range {
nth_key = Some(*start);
break;
}
// if we have enough time windows, stop
if idx >= window_cnt {
nth_key = Some(*start);
break;
}
let mut to_be_query = BTreeMap::new();
let mut new_windows = self.windows.clone();
let mut cur_time_range = chrono::Duration::zero();
for (idx, (start, end)) in self.windows.iter().enumerate() {
let first_end = start
.add_duration(window_size.to_std().unwrap())
.context(TimeSnafu)?;
let end = end.unwrap_or(first_end);
if let Some(end) = end {
if let Some(x) = end.sub(start) {
cur_time_range += x;
}
}
// if time range is too long, stop
if cur_time_range >= max_time_range {
break;
}
nth_key
};
let first_nth = {
if let Some(nth) = nth {
let mut after = self.windows.split_off(&nth);
std::mem::swap(&mut self.windows, &mut after);
// if we have enough time windows, stop
if idx >= window_cnt {
break;
}
after
let Some(x) = end.sub(start) else {
continue;
};
if cur_time_range + x <= max_time_range {
to_be_query.insert(*start, Some(end));
new_windows.remove(start);
cur_time_range += x;
} else {
std::mem::take(&mut self.windows)
// too large a window, split it
// split at window_size * times
let surplus = max_time_range - cur_time_range;
if surplus.num_seconds() <= window_size.num_seconds() {
// Skip splitting if the surplus is no larger than window_size
break;
}
let times = surplus.num_seconds() / window_size.num_seconds();
let split_offset = window_size * times as i32;
let split_at = start
.add_duration(split_offset.to_std().unwrap())
.context(TimeSnafu)?;
to_be_query.insert(*start, Some(split_at));
// remove the original window
new_windows.remove(start);
new_windows.insert(split_at, Some(end));
cur_time_range += split_offset;
break;
}
};
}
self.windows = new_windows;
METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_CNT
.with_label_values(&[flow_id.to_string().as_str()])
.observe(first_nth.len() as f64);
.observe(to_be_query.len() as f64);
let full_time_range = first_nth
let full_time_range = to_be_query
.iter()
.fold(chrono::Duration::zero(), |acc, (start, end)| {
if let Some(end) = end {
acc + end.sub(start).unwrap_or(chrono::Duration::zero())
} else {
acc
acc + window_size
}
})
.num_seconds() as f64;
METRIC_FLOW_BATCHING_ENGINE_QUERY_TIME_RANGE
METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_SIZE
.with_label_values(&[flow_id.to_string().as_str()])
.observe(full_time_range);
let stalled_time_range =
self.windows
.iter()
.fold(chrono::Duration::zero(), |acc, (start, end)| {
if let Some(end) = end {
acc + end.sub(start).unwrap_or(chrono::Duration::zero())
} else {
acc + window_size
}
});
METRIC_FLOW_BATCHING_ENGINE_STALLED_WINDOW_SIZE
.with_label_values(&[flow_id.to_string().as_str()])
.observe(stalled_time_range.num_seconds() as f64);
let mut expr_lst = vec![];
for (start, end) in first_nth.into_iter() {
for (start, end) in to_be_query.into_iter() {
// align using time window exprs
let (start, end) = if let Some(ctx) = task_ctx {
let Some(time_window_expr) = &ctx.config.time_window_expr else {
@@ -500,6 +584,64 @@ mod test {
"((ts >= CAST('1970-01-01 00:00:00' AS TIMESTAMP)) AND (ts < CAST('1970-01-01 00:00:21' AS TIMESTAMP)))",
)
),
// split range
(
Vec::from_iter((0..20).map(|i|Timestamp::new_second(i*3)).chain(std::iter::once(
Timestamp::new_second(60 + 3 * (DirtyTimeWindows::MERGE_DIST as i64 + 1)),
))),
(chrono::Duration::seconds(3), None),
BTreeMap::from([
(
Timestamp::new_second(0),
Some(Timestamp::new_second(
60
)),
),
(
Timestamp::new_second(60 + 3 * (DirtyTimeWindows::MERGE_DIST as i64 + 1)),
Some(Timestamp::new_second(
60 + 3 * (DirtyTimeWindows::MERGE_DIST as i64 + 1) + 3
)),
)]),
Some(
"((ts >= CAST('1970-01-01 00:00:00' AS TIMESTAMP)) AND (ts < CAST('1970-01-01 00:01:00' AS TIMESTAMP)))",
)
),
// split 2 min into 1 min
(
Vec::from_iter((0..40).map(|i|Timestamp::new_second(i*3))),
(chrono::Duration::seconds(3), None),
BTreeMap::from([
(
Timestamp::new_second(0),
Some(Timestamp::new_second(
40 * 3
)),
)]),
Some(
"((ts >= CAST('1970-01-01 00:00:00' AS TIMESTAMP)) AND (ts < CAST('1970-01-01 00:01:00' AS TIMESTAMP)))",
)
),
// split 3s + 1min into 3s + 57s
(
Vec::from_iter(std::iter::once(Timestamp::new_second(0)).chain((0..40).map(|i|Timestamp::new_second(20+i*3)))),
(chrono::Duration::seconds(3), None),
BTreeMap::from([
(
Timestamp::new_second(0),
Some(Timestamp::new_second(
3
)),
),(
Timestamp::new_second(20),
Some(Timestamp::new_second(
140
)),
)]),
Some(
"(((ts >= CAST('1970-01-01 00:00:00' AS TIMESTAMP)) AND (ts < CAST('1970-01-01 00:00:03' AS TIMESTAMP))) OR ((ts >= CAST('1970-01-01 00:00:20' AS TIMESTAMP)) AND (ts < CAST('1970-01-01 00:01:17' AS TIMESTAMP))))",
)
),
// expired
(
vec![
@@ -516,6 +658,8 @@ mod test {
None
),
];
// let len = testcases.len();
// let testcases = testcases[(len - 2)..(len - 1)].to_vec();
for (lower_bounds, (window_size, expire_lower_bound), expected, expected_filter_expr) in
testcases
{

View File

@@ -211,8 +211,9 @@ impl BatchingTask {
&self,
engine: &QueryEngineRef,
frontend_client: &Arc<FrontendClient>,
max_window_cnt: Option<usize>,
) -> Result<Option<(u32, Duration)>, Error> {
if let Some(new_query) = self.gen_insert_plan(engine).await? {
if let Some(new_query) = self.gen_insert_plan(engine, max_window_cnt).await? {
debug!("Generate new query: {}", new_query);
self.execute_logical_plan(frontend_client, &new_query).await
} else {
@@ -224,6 +225,7 @@ impl BatchingTask {
pub async fn gen_insert_plan(
&self,
engine: &QueryEngineRef,
max_window_cnt: Option<usize>,
) -> Result<Option<LogicalPlan>, Error> {
let (table, df_schema) = get_table_info_df_schema(
self.config.catalog_manager.clone(),
@@ -232,7 +234,7 @@ impl BatchingTask {
.await?;
let new_query = self
.gen_query_with_time_window(engine.clone(), &table.meta.schema)
.gen_query_with_time_window(engine.clone(), &table.meta.schema, max_window_cnt)
.await?;
let insert_into = if let Some((new_query, _column_cnt)) = new_query {
@@ -437,7 +439,7 @@ impl BatchingTask {
.with_label_values(&[&flow_id_str])
.inc();
let new_query = match self.gen_insert_plan(&engine).await {
let new_query = match self.gen_insert_plan(&engine, None).await {
Ok(new_query) => new_query,
Err(err) => {
common_telemetry::error!(err; "Failed to generate query for flow={}", self.config.flow_id);
@@ -521,6 +523,7 @@ impl BatchingTask {
&self,
engine: QueryEngineRef,
sink_table_schema: &Arc<Schema>,
max_window_cnt: Option<usize>,
) -> Result<Option<(LogicalPlan, usize)>, Error> {
let query_ctx = self.state.read().unwrap().query_ctx.clone();
let start = SystemTime::now();
@@ -574,8 +577,8 @@ impl BatchingTask {
};
debug!(
"Flow id = {:?}, found time window: precise_lower_bound={:?}, precise_upper_bound={:?}",
self.config.flow_id, l, u
"Flow id = {:?}, found time window: precise_lower_bound={:?}, precise_upper_bound={:?} with dirty time windows: {:?}",
self.config.flow_id, l, u, self.state.read().unwrap().dirty_time_windows
);
let window_size = u.sub(&l).with_context(|| UnexpectedSnafu {
reason: format!("Can't get window size from {u:?} - {l:?}"),
@@ -601,7 +604,7 @@ impl BatchingTask {
&col_name,
Some(l),
window_size,
DirtyTimeWindows::MAX_FILTER_NUM,
max_window_cnt.unwrap_or(DirtyTimeWindows::MAX_FILTER_NUM),
self.config.flow_id,
Some(self),
)?;

View File

@@ -50,10 +50,18 @@ lazy_static! {
vec![0.0, 5., 10., 20., 40.]
)
.unwrap();
pub static ref METRIC_FLOW_BATCHING_ENGINE_QUERY_TIME_RANGE: HistogramVec =
pub static ref METRIC_FLOW_BATCHING_ENGINE_QUERY_WINDOW_SIZE: HistogramVec =
register_histogram_vec!(
"greptime_flow_batching_engine_query_time_range_secs",
"flow batching engine query time range(seconds)",
"greptime_flow_batching_engine_query_window_size_secs",
"flow batching engine query window size(seconds)",
&["flow_id"],
vec![60., 4. * 60., 16. * 60., 64. * 60., 256. * 60.]
)
.unwrap();
pub static ref METRIC_FLOW_BATCHING_ENGINE_STALLED_WINDOW_SIZE: HistogramVec =
register_histogram_vec!(
"greptime_flow_batching_engine_stalled_window_size_secs",
"flow batching engine stalled window size(seconds)",
&["flow_id"],
vec![60., 4. * 60., 16. * 60., 64. * 60., 256. * 60.]
)

View File

@@ -14,6 +14,7 @@ workspace = true
[dependencies]
api.workspace = true
arc-swap = "1.0"
async-stream.workspace = true
async-trait.workspace = true
auth.workspace = true
bytes.workspace = true

View File

@@ -363,6 +363,12 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Canceling statement due to statement timeout"))]
StatementTimeout {
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -443,6 +449,8 @@ impl ErrorExt for Error {
Error::DataFusion { error, .. } => datafusion_status_code::<Self>(error, None),
Error::Cancelled { .. } => StatusCode::Cancelled,
Error::StatementTimeout { .. } => StatusCode::Cancelled,
}
}

View File

@@ -25,9 +25,11 @@ mod promql;
mod region_query;
pub mod standalone;
use std::pin::Pin;
use std::sync::Arc;
use std::time::SystemTime;
use std::time::{Duration, SystemTime};
use async_stream::stream;
use async_trait::async_trait;
use auth::{PermissionChecker, PermissionCheckerRef, PermissionReq};
use catalog::process_manager::ProcessManagerRef;
@@ -45,8 +47,11 @@ use common_procedure::local::{LocalManager, ManagerConfig};
use common_procedure::options::ProcedureConfig;
use common_procedure::ProcedureManagerRef;
use common_query::Output;
use common_recordbatch::error::StreamTimeoutSnafu;
use common_recordbatch::RecordBatchStreamWrapper;
use common_telemetry::{debug, error, info, tracing};
use datafusion_expr::LogicalPlan;
use futures::{Stream, StreamExt};
use log_store::raft_engine::RaftEngineBackend;
use operator::delete::DeleterRef;
use operator::insert::InserterRef;
@@ -66,20 +71,21 @@ use servers::interceptor::{
};
use servers::prometheus_handler::PrometheusHandler;
use servers::query_handler::sql::SqlQueryHandler;
use session::context::QueryContextRef;
use session::context::{Channel, QueryContextRef};
use session::table_name::table_idents_to_full_name;
use snafu::prelude::*;
use sql::dialect::Dialect;
use sql::parser::{ParseOptions, ParserContext};
use sql::statements::copy::{CopyDatabase, CopyTable};
use sql::statements::statement::Statement;
use sql::statements::tql::Tql;
use sqlparser::ast::ObjectName;
pub use standalone::StandaloneDatanodeManager;
use crate::error::{
self, Error, ExecLogicalPlanSnafu, ExecutePromqlSnafu, ExternalSnafu, InvalidSqlSnafu,
ParseSqlSnafu, PermissionSnafu, PlanStatementSnafu, Result, SqlExecInterceptedSnafu,
TableOperationSnafu,
StatementTimeoutSnafu, TableOperationSnafu,
};
use crate::limiter::LimiterRef;
use crate::slow_query_recorder::SlowQueryRecorder;
@@ -191,56 +197,7 @@ impl Instance {
Some(query_ctx.process_id()),
);
let query_fut = async {
match stmt {
Statement::Query(_) | Statement::Explain(_) | Statement::Delete(_) => {
// TODO: remove this when format is supported in datafusion
if let Statement::Explain(explain) = &stmt {
if let Some(format) = explain.format() {
query_ctx.set_explain_format(format.to_string());
}
}
let stmt = QueryStatement::Sql(stmt);
let plan = self
.statement_executor
.plan(&stmt, query_ctx.clone())
.await?;
let QueryStatement::Sql(stmt) = stmt else {
unreachable!()
};
query_interceptor.pre_execute(&stmt, Some(&plan), query_ctx.clone())?;
self.statement_executor
.exec_plan(plan, query_ctx)
.await
.context(TableOperationSnafu)
}
Statement::Tql(tql) => {
let plan = self
.statement_executor
.plan_tql(tql.clone(), &query_ctx)
.await?;
query_interceptor.pre_execute(
&Statement::Tql(tql),
Some(&plan),
query_ctx.clone(),
)?;
self.statement_executor
.exec_plan(plan, query_ctx)
.await
.context(TableOperationSnafu)
}
_ => {
query_interceptor.pre_execute(&stmt, None, query_ctx.clone())?;
self.statement_executor
.execute_sql(stmt, query_ctx)
.await
.context(TableOperationSnafu)
}
}
};
let query_fut = self.exec_statement_with_timeout(stmt, query_ctx, query_interceptor);
CancellableFuture::new(query_fut, ticket.cancellation_handle.clone())
.await
@@ -257,6 +214,153 @@ impl Instance {
Output { data, meta }
})
}
async fn exec_statement_with_timeout(
&self,
stmt: Statement,
query_ctx: QueryContextRef,
query_interceptor: Option<&SqlQueryInterceptorRef<Error>>,
) -> Result<Output> {
let timeout = derive_timeout(&stmt, &query_ctx);
match timeout {
Some(timeout) => {
let start = tokio::time::Instant::now();
let output = tokio::time::timeout(
timeout,
self.exec_statement(stmt, query_ctx, query_interceptor),
)
.await
.map_err(|_| StatementTimeoutSnafu.build())??;
// compute remaining timeout
let remaining_timeout = timeout.checked_sub(start.elapsed()).unwrap_or_default();
attach_timeout(output, remaining_timeout)
}
None => {
self.exec_statement(stmt, query_ctx, query_interceptor)
.await
}
}
}
async fn exec_statement(
&self,
stmt: Statement,
query_ctx: QueryContextRef,
query_interceptor: Option<&SqlQueryInterceptorRef<Error>>,
) -> Result<Output> {
match stmt {
Statement::Query(_) | Statement::Explain(_) | Statement::Delete(_) => {
// TODO: remove this when format is supported in datafusion
if let Statement::Explain(explain) = &stmt {
if let Some(format) = explain.format() {
query_ctx.set_explain_format(format.to_string());
}
}
self.plan_and_exec_sql(stmt, &query_ctx, query_interceptor)
.await
}
Statement::Tql(tql) => {
self.plan_and_exec_tql(&query_ctx, query_interceptor, tql)
.await
}
_ => {
query_interceptor.pre_execute(&stmt, None, query_ctx.clone())?;
self.statement_executor
.execute_sql(stmt, query_ctx)
.await
.context(TableOperationSnafu)
}
}
}
async fn plan_and_exec_sql(
&self,
stmt: Statement,
query_ctx: &QueryContextRef,
query_interceptor: Option<&SqlQueryInterceptorRef<Error>>,
) -> Result<Output> {
let stmt = QueryStatement::Sql(stmt);
let plan = self
.statement_executor
.plan(&stmt, query_ctx.clone())
.await?;
let QueryStatement::Sql(stmt) = stmt else {
unreachable!()
};
query_interceptor.pre_execute(&stmt, Some(&plan), query_ctx.clone())?;
self.statement_executor
.exec_plan(plan, query_ctx.clone())
.await
.context(TableOperationSnafu)
}
async fn plan_and_exec_tql(
&self,
query_ctx: &QueryContextRef,
query_interceptor: Option<&SqlQueryInterceptorRef<Error>>,
tql: Tql,
) -> Result<Output> {
let plan = self
.statement_executor
.plan_tql(tql.clone(), query_ctx)
.await?;
query_interceptor.pre_execute(&Statement::Tql(tql), Some(&plan), query_ctx.clone())?;
self.statement_executor
.exec_plan(plan, query_ctx.clone())
.await
.context(TableOperationSnafu)
}
}
/// If the relevant variables are set, the timeout is enforced for all PostgreSQL statements.
/// For MySQL, it applies only to read-only statements.
fn derive_timeout(stmt: &Statement, query_ctx: &QueryContextRef) -> Option<Duration> {
let query_timeout = query_ctx.query_timeout()?;
if query_timeout.is_zero() {
return None;
}
match query_ctx.channel() {
Channel::Mysql if stmt.is_readonly() => Some(query_timeout),
Channel::Postgres => Some(query_timeout),
_ => None,
}
}
fn attach_timeout(output: Output, mut timeout: Duration) -> Result<Output> {
if timeout.is_zero() {
return StatementTimeoutSnafu.fail();
}
let output = match output.data {
OutputData::AffectedRows(_) | OutputData::RecordBatches(_) => output,
OutputData::Stream(mut stream) => {
let schema = stream.schema();
let s = Box::pin(stream! {
let mut start = tokio::time::Instant::now();
while let Some(item) = tokio::time::timeout(timeout, stream.next()).await.map_err(|_| StreamTimeoutSnafu.build())? {
yield item;
let now = tokio::time::Instant::now();
timeout = timeout.checked_sub(now - start).unwrap_or(Duration::ZERO);
start = now;
// tokio::time::timeout may not return an error immediately when timeout is 0.
if timeout.is_zero() {
StreamTimeoutSnafu.fail()?;
}
}
}) as Pin<Box<dyn Stream<Item = _> + Send>>;
let stream = RecordBatchStreamWrapper {
schema,
stream: s,
output_ordering: None,
metrics: Default::default(),
};
Output::new(OutputData::Stream(Box::pin(stream)), output.meta)
}
};
Ok(output)
}
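A condensed, hedged sketch of the policy that derive_timeout and attach_timeout implement, isolated from the server types (the Channel enum and the readonly flag below are stand-ins for session::context::Channel and Statement::is_readonly): the session's query timeout applies to every statement on the PostgreSQL channel, only to read-only statements on MySQL, and the time already spent is deducted before the remaining budget is attached to a streamed result.

use std::time::Duration;

// Hypothetical stand-ins for the real session/sql types.
enum Channel { Mysql, Postgres, Other }

fn effective_timeout(
    channel: &Channel,
    readonly: bool,
    query_timeout: Option<Duration>,
) -> Option<Duration> {
    let timeout = query_timeout.filter(|t| !t.is_zero())?;
    match channel {
        // MySQL honors the timeout only for read-only statements.
        Channel::Mysql if readonly => Some(timeout),
        // PostgreSQL enforces it for every statement.
        Channel::Postgres => Some(timeout),
        _ => None,
    }
}

fn demo() {
    let t = Some(Duration::from_secs(10));
    assert_eq!(effective_timeout(&Channel::Postgres, false, t), t);
    assert_eq!(effective_timeout(&Channel::Mysql, false, t), None);
    assert_eq!(effective_timeout(&Channel::Other, true, t), None);
    // Remaining budget for the output stream after 3 s of planning/execution:
    let remaining = t.unwrap().checked_sub(Duration::from_secs(3)).unwrap_or_default();
    assert_eq!(remaining, Duration::from_secs(7));
}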
#[async_trait]

View File

@@ -218,6 +218,7 @@ mod tests {
let mut writer = Cursor::new(Vec::new());
let mut creator = BloomFilterCreator::new(
4,
0.01,
Arc::new(MockExternalTempFileProvider::new()),
Arc::new(AtomicUsize::new(0)),
None,

View File

@@ -30,9 +30,6 @@ use crate::bloom_filter::SEED;
use crate::external_provider::ExternalTempFileProvider;
use crate::Bytes;
/// The false positive rate of the Bloom filter.
pub const FALSE_POSITIVE_RATE: f64 = 0.01;
/// `BloomFilterCreator` is responsible for creating and managing bloom filters
/// for a set of elements. It divides the rows into segments and creates
/// bloom filters for each segment.
@@ -79,6 +76,7 @@ impl BloomFilterCreator {
/// `rows_per_segment` <= 0
pub fn new(
rows_per_segment: usize,
false_positive_rate: f64,
intermediate_provider: Arc<dyn ExternalTempFileProvider>,
global_memory_usage: Arc<AtomicUsize>,
global_memory_usage_threshold: Option<usize>,
@@ -95,6 +93,7 @@ impl BloomFilterCreator {
cur_seg_distinct_elems_mem_usage: 0,
global_memory_usage: global_memory_usage.clone(),
finalized_bloom_filters: FinalizedBloomFilterStorage::new(
false_positive_rate,
intermediate_provider,
global_memory_usage,
global_memory_usage_threshold,
@@ -263,6 +262,7 @@ mod tests {
let mut writer = Cursor::new(Vec::new());
let mut creator = BloomFilterCreator::new(
2,
0.01,
Arc::new(MockExternalTempFileProvider::new()),
Arc::new(AtomicUsize::new(0)),
None,
@@ -337,6 +337,7 @@ mod tests {
let mut writer = Cursor::new(Vec::new());
let mut creator: BloomFilterCreator = BloomFilterCreator::new(
2,
0.01,
Arc::new(MockExternalTempFileProvider::new()),
Arc::new(AtomicUsize::new(0)),
None,
@@ -418,6 +419,7 @@ mod tests {
let mut writer = Cursor::new(Vec::new());
let mut creator = BloomFilterCreator::new(
2,
0.01,
Arc::new(MockExternalTempFileProvider::new()),
Arc::new(AtomicUsize::new(0)),
None,

View File

@@ -23,7 +23,7 @@ use futures::{stream, AsyncWriteExt, Stream};
use snafu::ResultExt;
use crate::bloom_filter::creator::intermediate_codec::IntermediateBloomFilterCodecV1;
use crate::bloom_filter::creator::{FALSE_POSITIVE_RATE, SEED};
use crate::bloom_filter::creator::SEED;
use crate::bloom_filter::error::{IntermediateSnafu, IoSnafu, Result};
use crate::external_provider::ExternalTempFileProvider;
use crate::Bytes;
@@ -33,6 +33,9 @@ const MIN_MEMORY_USAGE_THRESHOLD: usize = 1024 * 1024; // 1MB
/// Storage for finalized Bloom filters.
pub struct FinalizedBloomFilterStorage {
/// The false positive rate of the Bloom filter.
false_positive_rate: f64,
/// Indices of the segments in the sequence of finalized Bloom filters.
segment_indices: Vec<usize>,
@@ -65,12 +68,14 @@ pub struct FinalizedBloomFilterStorage {
impl FinalizedBloomFilterStorage {
/// Creates a new `FinalizedBloomFilterStorage`.
pub fn new(
false_positive_rate: f64,
intermediate_provider: Arc<dyn ExternalTempFileProvider>,
global_memory_usage: Arc<AtomicUsize>,
global_memory_usage_threshold: Option<usize>,
) -> Self {
let external_prefix = format!("intm-bloom-filters-{}", uuid::Uuid::new_v4());
Self {
false_positive_rate,
segment_indices: Vec::new(),
in_memory: Vec::new(),
intermediate_file_id_counter: 0,
@@ -96,7 +101,7 @@ impl FinalizedBloomFilterStorage {
elems: impl IntoIterator<Item = Bytes>,
element_count: usize,
) -> Result<()> {
let mut bf = BloomFilter::with_false_pos(FALSE_POSITIVE_RATE)
let mut bf = BloomFilter::with_false_pos(self.false_positive_rate)
.seed(&SEED)
.expected_items(element_count);
for elem in elems.into_iter() {
@@ -284,6 +289,7 @@ mod tests {
let global_memory_usage_threshold = Some(1024 * 1024); // 1MB
let provider = Arc::new(mock_provider);
let mut storage = FinalizedBloomFilterStorage::new(
0.01,
provider,
global_memory_usage.clone(),
global_memory_usage_threshold,
@@ -340,6 +346,7 @@ mod tests {
let global_memory_usage_threshold = Some(1024 * 1024); // 1MB
let provider = Arc::new(mock_provider);
let mut storage = FinalizedBloomFilterStorage::new(
0.01,
provider,
global_memory_usage.clone(),
global_memory_usage_threshold,

View File

@@ -222,6 +222,7 @@ mod tests {
let mut writer = Cursor::new(vec![]);
let mut creator = BloomFilterCreator::new(
2,
0.01,
Arc::new(MockExternalTempFileProvider::new()),
Arc::new(AtomicUsize::new(0)),
None,

View File

@@ -45,6 +45,7 @@ impl BloomFilterFulltextIndexCreator {
pub fn new(
config: Config,
rows_per_segment: usize,
false_positive_rate: f64,
intermediate_provider: Arc<dyn ExternalTempFileProvider>,
global_memory_usage: Arc<AtomicUsize>,
global_memory_usage_threshold: Option<usize>,
@@ -57,6 +58,7 @@ impl BloomFilterFulltextIndexCreator {
let inner = BloomFilterCreator::new(
rows_per_segment,
false_positive_rate,
intermediate_provider,
global_memory_usage,
global_memory_usage_threshold,

View File

@@ -12,12 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_error::define_from_tonic_status;
use common_error::ext::ErrorExt;
use common_error::status_code::{convert_tonic_code_to_status_code, StatusCode};
use common_error::{GREPTIME_DB_HEADER_ERROR_CODE, GREPTIME_DB_HEADER_ERROR_MSG};
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
use snafu::{location, Location, Snafu};
use tonic::Status;
#[derive(Snafu)]
#[snafu(visibility(pub))]
@@ -161,33 +160,4 @@ impl Error {
}
}
// FIXME(dennis): partial duplicated with src/client/src/error.rs
impl From<Status> for Error {
fn from(e: Status) -> Self {
fn get_metadata_value(s: &Status, key: &str) -> Option<String> {
s.metadata()
.get(key)
.and_then(|v| String::from_utf8(v.as_bytes().to_vec()).ok())
}
let code = get_metadata_value(&e, GREPTIME_DB_HEADER_ERROR_CODE).and_then(|s| {
if let Ok(code) = s.parse::<u32>() {
StatusCode::from_u32(code)
} else {
None
}
});
let tonic_code = e.code();
let code = code.unwrap_or_else(|| convert_tonic_code_to_status_code(tonic_code));
let msg = get_metadata_value(&e, GREPTIME_DB_HEADER_ERROR_MSG)
.unwrap_or_else(|| e.message().to_string());
Self::MetaServer {
code,
msg,
tonic_code,
location: location!(),
}
}
}
define_from_tonic_status!(Error, MetaServer);

View File

@@ -45,6 +45,7 @@ datatypes.workspace = true
deadpool = { workspace = true, optional = true }
deadpool-postgres = { workspace = true, optional = true }
derive_builder.workspace = true
either.workspace = true
etcd-client.workspace = true
futures.workspace = true
h2 = "0.3"

View File

@@ -34,6 +34,7 @@ use common_meta::kv_backend::{KvBackendRef, ResettableKvBackendRef};
use common_telemetry::info;
#[cfg(feature = "pg_kvbackend")]
use deadpool_postgres::{Config, Runtime};
use either::Either;
use etcd_client::Client;
use servers::configurator::ConfiguratorRef;
use servers::export_metrics::ExportMetricsTask;
@@ -76,7 +77,7 @@ use crate::{error, Result};
pub struct MetasrvInstance {
metasrv: Arc<Metasrv>,
http_server: HttpServer,
http_server: Either<Option<HttpServerBuilder>, HttpServer>,
opts: MetasrvOptions,
@@ -99,10 +100,9 @@ impl MetasrvInstance {
plugins: Plugins,
metasrv: Metasrv,
) -> Result<MetasrvInstance> {
let http_server = HttpServerBuilder::new(opts.http.clone())
let builder = HttpServerBuilder::new(opts.http.clone())
.with_metrics_handler(MetricsHandler)
.with_greptime_config_options(opts.to_toml().context(error::TomlFormatSnafu)?)
.build();
.with_greptime_config_options(opts.to_toml().context(error::TomlFormatSnafu)?);
let metasrv = Arc::new(metasrv);
// put metasrv into plugins for later use
@@ -111,7 +111,7 @@ impl MetasrvInstance {
.context(error::InitExportMetricsTaskSnafu)?;
Ok(MetasrvInstance {
metasrv,
http_server,
http_server: Either::Left(Some(builder)),
opts,
signal_sender: None,
plugins,
@@ -122,6 +122,25 @@ impl MetasrvInstance {
}
pub async fn start(&mut self) -> Result<()> {
if let Some(builder) = self.http_server.as_mut().left()
&& let Some(builder) = builder.take()
{
let mut server = builder.build();
let addr = self.opts.http.addr.parse().context(error::ParseAddrSnafu {
addr: &self.opts.http.addr,
})?;
info!("starting http server at {}", addr);
server.start(addr).await.context(error::StartHttpSnafu)?;
self.http_server = Either::Right(server);
} else {
// If the http server builder is not present, `start` must have already been called on
// this Metasrv instance, regardless of whether that startup succeeded. Return an `Ok`
// here for simplicity.
return Ok(());
};
self.metasrv.try_start().await?;
if let Some(t) = self.export_metrics_task.as_ref() {
@@ -144,14 +163,6 @@ impl MetasrvInstance {
.await?;
self.bind_addr = Some(socket_addr);
let addr = self.opts.http.addr.parse().context(error::ParseAddrSnafu {
addr: &self.opts.http.addr,
})?;
self.http_server
.start(addr)
.await
.context(error::StartHttpSnafu)?;
*self.serve_state.lock().await = Some(serve_state_rx);
Ok(())
}
@@ -169,12 +180,15 @@ impl MetasrvInstance {
.context(error::SendShutdownSignalSnafu)?;
}
self.metasrv.shutdown().await?;
self.http_server
.shutdown()
.await
.context(error::ShutdownServerSnafu {
server: self.http_server.name(),
})?;
if let Some(http_server) = self.http_server.as_ref().right() {
http_server
.shutdown()
.await
.context(error::ShutdownServerSnafu {
server: http_server.name(),
})?;
}
Ok(())
}
@@ -188,6 +202,14 @@ impl MetasrvInstance {
pub fn bind_addr(&self) -> &Option<SocketAddr> {
&self.bind_addr
}
pub fn mut_http_server(&mut self) -> &mut Either<Option<HttpServerBuilder>, HttpServer> {
&mut self.http_server
}
pub fn http_server(&self) -> Option<&HttpServer> {
self.http_server.as_ref().right()
}
}
pub async fn bootstrap_metasrv_with_router(

View File
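The `MetasrvInstance` refactor above replaces an eagerly built `HttpServer` with `Either<Option<HttpServerBuilder>, HttpServer>`: `start()` takes the builder out of the `Left` side, builds and starts the server, and switches to `Right`; `shutdown()` only acts on the `Right` side; a second `start()` call is a no-op. Below is a minimal sketch of that state transition using the `either` crate and stand-in `Builder`/`Server` types (not the real `servers` crate), written without the nightly `let_chains` feature the actual code enables.

```rust
use either::Either;

// Stand-ins for HttpServerBuilder / HttpServer; not the real types.
struct Builder;
struct Server;

impl Builder {
    fn build(self) -> Server {
        Server
    }
}

impl Server {
    fn start(&mut self) {
        println!("server started");
    }
    fn shutdown(&self) {
        println!("server stopped");
    }
}

struct Instance {
    // Left: not started yet (builder still available); Right: running server.
    http: Either<Option<Builder>, Server>,
}

impl Instance {
    fn start(&mut self) {
        // Take the builder out of the Left side exactly once.
        if let Some(builder) = self.http.as_mut().left().and_then(Option::take) {
            let mut server = builder.build();
            server.start();
            self.http = Either::Right(server);
        }
        // Already Right, or the builder was already taken: nothing to do.
    }

    fn shutdown(&self) {
        // Only a started server needs shutting down.
        if let Some(server) = self.http.as_ref().right() {
            server.shutdown();
        }
    }
}

fn main() {
    let mut instance = Instance {
        http: Either::Left(Some(Builder)),
    };
    instance.start();
    instance.start(); // second call is a no-op
    instance.shutdown();
}
```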

@@ -15,6 +15,7 @@
#![feature(result_flattening)]
#![feature(assert_matches)]
#![feature(hash_set_entry)]
#![feature(let_chains)]
pub mod bootstrap;
pub mod cache_invalidator;

View File

@@ -49,6 +49,7 @@ use common_telemetry::{error, info};
use manager::RegionMigrationProcedureGuard;
pub use manager::{
RegionMigrationManagerRef, RegionMigrationProcedureTask, RegionMigrationProcedureTracker,
RegionMigrationTriggerReason,
};
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
@@ -86,6 +87,9 @@ pub struct PersistentContext {
/// The timeout for downgrading leader region and upgrading candidate region operations.
#[serde(with = "humantime_serde", default = "default_timeout")]
timeout: Duration,
/// The trigger reason of region migration.
#[serde(default)]
trigger_reason: RegionMigrationTriggerReason,
}
fn default_timeout() -> Duration {
@@ -617,6 +621,7 @@ impl RegionMigrationProcedure {
from_peer: persistent_ctx.from_peer.clone(),
to_peer: persistent_ctx.to_peer.clone(),
timeout: persistent_ctx.timeout,
trigger_reason: persistent_ctx.trigger_reason,
});
let context = context_factory.new_context(persistent_ctx);
@@ -793,7 +798,7 @@ mod tests {
let procedure = RegionMigrationProcedure::new(persistent_context, context, None);
let serialized = procedure.dump().unwrap();
let expected = r#"{"persistent_ctx":{"catalog":"greptime","schema":"public","from_peer":{"id":1,"addr":""},"to_peer":{"id":2,"addr":""},"region_id":4398046511105,"timeout":"10s"},"state":{"region_migration_state":"RegionMigrationStart"}}"#;
let expected = r#"{"persistent_ctx":{"catalog":"greptime","schema":"public","from_peer":{"id":1,"addr":""},"to_peer":{"id":2,"addr":""},"region_id":4398046511105,"timeout":"10s","trigger_reason":"Unknown"},"state":{"region_migration_state":"RegionMigrationStart"}}"#;
assert_eq!(expected, serialized);
}

View File

@@ -51,10 +51,11 @@ impl State for CloseDowngradedRegion {
warn!(err; "Failed to close downgraded leader region: {region_id} on datanode {:?}", downgrade_leader_datanode);
}
info!(
"Region migration is finished: region_id: {}, from_peer: {}, to_peer: {}, {}",
"Region migration is finished: region_id: {}, from_peer: {}, to_peer: {}, trigger_reason: {}, {}",
ctx.region_id(),
ctx.persistent_ctx.from_peer,
ctx.persistent_ctx.to_peer,
ctx.persistent_ctx.trigger_reason,
ctx.volatile_ctx.metrics,
);
Ok((Box::new(RegionMigrationEnd), Status::done()))

View File

@@ -352,6 +352,7 @@ mod tests {
use super::*;
use crate::error::Error;
use crate::procedure::region_migration::manager::RegionMigrationTriggerReason;
use crate::procedure::region_migration::test_util::{new_procedure_context, TestingEnv};
use crate::procedure::region_migration::{ContextFactory, PersistentContext};
use crate::procedure::test_util::{
@@ -366,6 +367,7 @@ mod tests {
to_peer: Peer::empty(2),
region_id: RegionId::new(1024, 1),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
}
}

View File

@@ -24,6 +24,7 @@ use common_meta::peer::Peer;
use common_meta::rpc::router::RegionRoute;
use common_procedure::{watcher, ProcedureId, ProcedureManagerRef, ProcedureWithId};
use common_telemetry::{error, info, warn};
use serde::{Deserialize, Serialize};
use snafu::{ensure, OptionExt, ResultExt};
use store_api::storage::RegionId;
use table::table_name::TableName;
@@ -104,15 +105,38 @@ pub struct RegionMigrationProcedureTask {
pub(crate) from_peer: Peer,
pub(crate) to_peer: Peer,
pub(crate) timeout: Duration,
pub(crate) trigger_reason: RegionMigrationTriggerReason,
}
/// The reason why the region migration procedure is triggered.
#[derive(Default, Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, strum::Display)]
#[strum(serialize_all = "PascalCase")]
pub enum RegionMigrationTriggerReason {
#[default]
/// The region migration procedure is triggered for an unknown reason.
Unknown,
/// The region migration procedure is triggered by an administrator.
Manual,
/// The region migration procedure is triggered by auto rebalance.
AutoRebalance,
/// The region migration procedure is triggered by failover.
Failover,
}
impl RegionMigrationProcedureTask {
pub fn new(region_id: RegionId, from_peer: Peer, to_peer: Peer, timeout: Duration) -> Self {
pub fn new(
region_id: RegionId,
from_peer: Peer,
to_peer: Peer,
timeout: Duration,
trigger_reason: RegionMigrationTriggerReason,
) -> Self {
Self {
region_id,
from_peer,
to_peer,
timeout,
trigger_reason,
}
}
}
@@ -121,8 +145,8 @@ impl Display for RegionMigrationProcedureTask {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"region: {}, from_peer: {}, to_peer: {}",
self.region_id, self.from_peer, self.to_peer
"region: {}, from_peer: {}, to_peer: {}, trigger_reason: {}",
self.region_id, self.from_peer, self.to_peer, self.trigger_reason
)
}
}
@@ -357,6 +381,7 @@ impl RegionMigrationManager {
from_peer,
to_peer,
timeout,
trigger_reason,
} = task.clone();
let procedure = RegionMigrationProcedure::new(
PersistentContext {
@@ -366,6 +391,7 @@ impl RegionMigrationManager {
from_peer,
to_peer,
timeout,
trigger_reason,
},
self.context_factory.clone(),
Some(guard),
@@ -424,6 +450,7 @@ mod test {
from_peer: Peer::empty(2),
to_peer: Peer::empty(1),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
// Inserts one
manager
@@ -448,6 +475,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(1),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let err = manager.submit_procedure(task).await.unwrap_err();
@@ -465,6 +493,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let err = manager.submit_procedure(task).await.unwrap_err();
@@ -482,6 +511,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
@@ -509,6 +539,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
@@ -537,6 +568,7 @@ mod test {
from_peer: Peer::empty(3),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
@@ -570,6 +602,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
@@ -597,6 +630,7 @@ mod test {
from_peer: Peer::empty(1),
to_peer: Peer::empty(2),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let err = manager

View File
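`RegionMigrationTriggerReason` derives `Serialize`/`Deserialize` plus `strum::Display` with `serialize_all = "PascalCase"`, and the `PersistentContext` field holding it is marked `#[serde(default)]`, so procedure state persisted before this change still deserializes and falls back to `Unknown`, which is exactly what the updated expected JSON in the serialization test above asserts. A minimal sketch of that pattern, assuming the `serde`, `serde_json`, and `strum` (derive feature) crates:

```rust
use serde::{Deserialize, Serialize};

#[derive(Default, Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, strum::Display)]
#[strum(serialize_all = "PascalCase")]
enum TriggerReason {
    #[default]
    Unknown,
    Manual,
    AutoRebalance,
    Failover,
}

#[derive(Debug, Serialize, Deserialize)]
struct PersistentCtx {
    timeout_secs: u64,
    // Missing in JSON written by older versions; falls back to `Unknown`.
    #[serde(default)]
    trigger_reason: TriggerReason,
}

fn main() {
    // An old payload without the new field still deserializes.
    let old: PersistentCtx = serde_json::from_str(r#"{"timeout_secs":10}"#).unwrap();
    assert_eq!(old.trigger_reason, TriggerReason::Unknown);

    // strum::Display renders PascalCase, suitable for the log lines in this PR.
    println!("trigger_reason: {}", TriggerReason::AutoRebalance);

    // Serializing writes the variant name, matching "trigger_reason":"Unknown" above.
    println!("{}", serde_json::to_string(&old).unwrap());
}
```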

@@ -44,11 +44,12 @@ impl State for RegionMigrationAbort {
_procedure_ctx: &ProcedureContext,
) -> Result<(Box<dyn State>, Status)> {
warn!(
"Region migration is aborted: {}, region_id: {}, from_peer: {}, to_peer: {}, {}",
"Region migration is aborted: {}, region_id: {}, from_peer: {}, to_peer: {}, trigger_reason: {}, {}",
self.reason,
ctx.region_id(),
ctx.persistent_ctx.from_peer,
ctx.persistent_ctx.to_peer,
ctx.persistent_ctx.trigger_reason,
ctx.volatile_ctx.metrics,
);
error::MigrationAbortSnafu {

View File

@@ -45,7 +45,9 @@ use crate::metasrv::MetasrvInfo;
use crate::procedure::region_migration::close_downgraded_region::CloseDowngradedRegion;
use crate::procedure::region_migration::downgrade_leader_region::DowngradeLeaderRegion;
use crate::procedure::region_migration::flush_leader_region::PreFlushRegion;
use crate::procedure::region_migration::manager::RegionMigrationProcedureTracker;
use crate::procedure::region_migration::manager::{
RegionMigrationProcedureTracker, RegionMigrationTriggerReason,
};
use crate::procedure::region_migration::migration_abort::RegionMigrationAbort;
use crate::procedure::region_migration::migration_end::RegionMigrationEnd;
use crate::procedure::region_migration::open_candidate_region::OpenCandidateRegion;
@@ -189,6 +191,7 @@ pub fn new_persistent_context(from: u64, to: u64, region_id: RegionId) -> Persis
to_peer: Peer::empty(to),
region_id,
timeout: Duration::from_secs(10),
trigger_reason: RegionMigrationTriggerReason::default(),
}
}

View File

@@ -235,6 +235,7 @@ mod tests {
use super::*;
use crate::error::Error;
use crate::procedure::region_migration::manager::RegionMigrationTriggerReason;
use crate::procedure::region_migration::test_util::{new_procedure_context, TestingEnv};
use crate::procedure::region_migration::{ContextFactory, PersistentContext};
use crate::procedure::test_util::{
@@ -249,6 +250,7 @@ mod tests {
to_peer: Peer::empty(2),
region_id: RegionId::new(1024, 1),
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
}
}

View File

@@ -196,6 +196,7 @@ pub mod test_data {
options: TableOptions::default(),
created_on: DateTime::default(),
partition_key_indices: vec![],
column_ids: vec![],
},
table_type: TableType::Base,
}

View File

@@ -43,7 +43,9 @@ use tokio::time::{interval, interval_at, MissedTickBehavior};
use crate::error::{self, Result};
use crate::failure_detector::PhiAccrualFailureDetectorOptions;
use crate::metasrv::{RegionStatAwareSelectorRef, SelectTarget, SelectorContext, SelectorRef};
use crate::procedure::region_migration::manager::RegionMigrationManagerRef;
use crate::procedure::region_migration::manager::{
RegionMigrationManagerRef, RegionMigrationTriggerReason,
};
use crate::procedure::region_migration::{
RegionMigrationProcedureTask, DEFAULT_REGION_MIGRATION_TIMEOUT,
};
@@ -649,6 +651,7 @@ impl RegionSupervisor {
from_peer: from_peer.clone(),
to_peer: peer,
timeout: DEFAULT_REGION_MIGRATION_TIMEOUT * count,
trigger_reason: RegionMigrationTriggerReason::Failover,
};
tasks.push((task, count));
}

View File

@@ -30,7 +30,9 @@ use tonic::{Request, Response};
use crate::error;
use crate::metasrv::Metasrv;
use crate::procedure::region_migration::manager::RegionMigrationProcedureTask;
use crate::procedure::region_migration::manager::{
RegionMigrationProcedureTask, RegionMigrationTriggerReason,
};
use crate::service::GrpcResult;
#[async_trait::async_trait]
@@ -154,6 +156,7 @@ impl procedure_service_server::ProcedureService for Metasrv {
from_peer,
to_peer,
timeout: Duration::from_secs(timeout_secs.into()),
trigger_reason: RegionMigrationTriggerReason::Manual,
})
.await?
.map(procedure::pid_to_pb_pid);

View File

@@ -40,5 +40,6 @@ store-api.workspace = true
tokio.workspace = true
[dev-dependencies]
common-meta = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
mito2 = { workspace = true, features = ["test"] }

View File

@@ -145,12 +145,19 @@ impl DataRegion {
IndexOptions::Inverted => {
c.column_schema.set_inverted_index(true);
}
IndexOptions::Skipping { granularity } => {
IndexOptions::Skipping {
granularity,
false_positive_rate,
} => {
c.column_schema
.set_skipping_options(&SkippingIndexOptions {
granularity,
index_type: SkippingIndexType::BloomFilter,
})
.set_skipping_options(
&SkippingIndexOptions::new(
granularity,
false_positive_rate,
SkippingIndexType::BloomFilter,
)
.context(SetSkippingIndexOptionSnafu)?,
)
.context(SetSkippingIndexOptionSnafu)?;
}
}

View File

@@ -477,8 +477,9 @@ struct MetricEngineInner {
mod test {
use std::collections::HashMap;
use common_telemetry::info;
use store_api::metric_engine_consts::PHYSICAL_TABLE_METADATA_KEY;
use store_api::region_request::{RegionCloseRequest, RegionOpenRequest};
use store_api::region_request::{RegionCloseRequest, RegionFlushRequest, RegionOpenRequest};
use super::*;
use crate::test_util::TestEnv;
@@ -563,4 +564,90 @@ mod test {
assert!(env.metric().region_statistic(logical_region_id).is_none());
assert!(env.metric().region_statistic(physical_region_id).is_some());
}
#[tokio::test]
async fn test_open_region_failure() {
let env = TestEnv::new().await;
env.init_metric_region().await;
let physical_region_id = env.default_physical_region_id();
let metric_engine = env.metric();
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Flush(RegionFlushRequest {
row_group_size: None,
}),
)
.await
.unwrap();
let path = format!("{}/metadata/", env.default_region_dir());
let object_store = env.get_object_store().unwrap();
let list = object_store.list(&path).await.unwrap();
// Delete parquet files in metadata region
for entry in list {
if entry.metadata().is_dir() {
continue;
}
if entry.name().ends_with("parquet") {
info!("deleting {}", entry.path());
object_store.delete(entry.path()).await.unwrap();
}
}
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
// Opening an already opened region should succeed.
// Since the region is already open, no metadata recovery operations will be performed.
metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap();
// Close the region
metric_engine
.handle_request(
physical_region_id,
RegionRequest::Close(RegionCloseRequest {}),
)
.await
.unwrap();
// Try to reopen region.
let physical_region_option = [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect();
let open_request = RegionOpenRequest {
engine: METRIC_ENGINE_NAME.to_string(),
region_dir: env.default_region_dir(),
options: physical_region_option,
skip_wal_replay: false,
};
let err = metric_engine
.handle_request(physical_region_id, RegionRequest::Open(open_request))
.await
.unwrap_err();
// Failed to open region because of missing parquet files.
assert_eq!(err.status_code(), StatusCode::StorageUnavailable);
let mito_engine = metric_engine.mito();
let data_region_id = utils::to_data_region_id(physical_region_id);
let metadata_region_id = utils::to_metadata_region_id(physical_region_id);
// The metadata/data region should be closed.
let err = mito_engine.get_metadata(data_region_id).await.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
let err = mito_engine
.get_metadata(metadata_region_id)
.await
.unwrap_err();
assert_eq!(err.status_code(), StatusCode::RegionNotFound);
}
}

View File

@@ -66,7 +66,7 @@ impl MetricEngineInner {
let mut manifest_infos = Vec::with_capacity(1);
self.alter_logical_regions(physical_region_id, requests, extension_return_value)
.await?;
append_manifest_info(&self.mito, region_id, &mut manifest_infos);
append_manifest_info(&self.mito, physical_region_id, &mut manifest_infos);
encode_manifest_info_to_extensions(&manifest_infos, extension_return_value)?;
} else {
let grouped_requests =
@@ -222,13 +222,17 @@ mod test {
use std::time::Duration;
use api::v1::SemanticType;
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use store_api::metadata::ColumnMetadata;
use store_api::region_request::{AddColumn, SetRegionOption};
use common_meta::ddl::test_util::assert_column_name_and_id;
use common_meta::ddl::utils::{parse_column_metadatas, parse_manifest_infos_from_extensions};
use store_api::metric_engine_consts::ALTER_PHYSICAL_EXTENSION_KEY;
use store_api::region_engine::RegionEngine;
use store_api::region_request::{
AlterKind, BatchRegionDdlRequest, RegionAlterRequest, SetRegionOption,
};
use store_api::storage::consts::ReservedColumnId;
use store_api::storage::RegionId;
use super::*;
use crate::test_util::TestEnv;
use crate::test_util::{alter_logical_region_request, create_logical_region_request, TestEnv};
#[tokio::test]
async fn test_alter_region() {
@@ -239,22 +243,7 @@ mod test {
// alter physical region
let physical_region_id = env.default_physical_region_id();
let request = RegionAlterRequest {
kind: AlterKind::AddColumns {
columns: vec![AddColumn {
column_metadata: ColumnMetadata {
column_id: 0,
semantic_type: SemanticType::Tag,
column_schema: ColumnSchema::new(
"tag1",
ConcreteDataType::string_datatype(),
false,
),
},
location: None,
}],
},
};
let request = alter_logical_region_request(&["tag1"]);
let result = engine_inner
.alter_physical_region(physical_region_id, request.clone())
@@ -287,14 +276,18 @@ mod test {
assert!(!is_column_exist);
let region_id = env.default_logical_region_id();
engine_inner
.alter_logical_regions(
physical_region_id,
vec![(region_id, request)],
&mut HashMap::new(),
)
let response = env
.metric()
.handle_batch_ddl_requests(BatchRegionDdlRequest::Alter(vec![(
region_id,
request.clone(),
)]))
.await
.unwrap();
let manifest_infos = parse_manifest_infos_from_extensions(&response.extensions).unwrap();
assert_eq!(manifest_infos[0].0, physical_region_id);
assert!(manifest_infos[0].1.is_metric());
let semantic_type = metadata_region
.column_semantic_type(physical_region_id, logical_region_id, "tag1")
.await
@@ -307,5 +300,77 @@ mod test {
.unwrap()
.unwrap();
assert_eq!(timestamp_index, SemanticType::Timestamp);
let column_metadatas =
parse_column_metadatas(&response.extensions, ALTER_PHYSICAL_EXTENSION_KEY).unwrap();
assert_column_name_and_id(
&column_metadatas,
&[
("greptime_timestamp", 0),
("greptime_value", 1),
("__table_id", ReservedColumnId::table_id()),
("__tsid", ReservedColumnId::tsid()),
("job", 2),
("tag1", 3),
],
);
}
#[tokio::test]
async fn test_alter_logical_regions() {
let env = TestEnv::new().await;
let engine = env.metric();
let physical_region_id1 = RegionId::new(1024, 0);
let physical_region_id2 = RegionId::new(1024, 1);
let logical_region_id1 = RegionId::new(1025, 0);
let logical_region_id2 = RegionId::new(1025, 1);
env.create_physical_region(physical_region_id1, "/test_dir1")
.await;
env.create_physical_region(physical_region_id2, "/test_dir2")
.await;
let region_create_request1 = crate::test_util::create_logical_region_request(
&["job"],
physical_region_id1,
"logical1",
);
let region_create_request2 =
create_logical_region_request(&["job"], physical_region_id2, "logical2");
engine
.handle_batch_ddl_requests(BatchRegionDdlRequest::Create(vec![
(logical_region_id1, region_create_request1),
(logical_region_id2, region_create_request2),
]))
.await
.unwrap();
let region_alter_request1 = alter_logical_region_request(&["tag1"]);
let region_alter_request2 = alter_logical_region_request(&["tag1"]);
let response = engine
.handle_batch_ddl_requests(BatchRegionDdlRequest::Alter(vec![
(logical_region_id1, region_alter_request1),
(logical_region_id2, region_alter_request2),
]))
.await
.unwrap();
let manifest_infos = parse_manifest_infos_from_extensions(&response.extensions).unwrap();
assert_eq!(manifest_infos.len(), 2);
let region_ids = manifest_infos.into_iter().map(|i| i.0).collect::<Vec<_>>();
assert!(region_ids.contains(&physical_region_id1));
assert!(region_ids.contains(&physical_region_id2));
let column_metadatas =
parse_column_metadatas(&response.extensions, ALTER_PHYSICAL_EXTENSION_KEY).unwrap();
assert_column_name_and_id(
&column_metadatas,
&[
("greptime_timestamp", 0),
("greptime_value", 1),
("__table_id", ReservedColumnId::table_id()),
("__tsid", ReservedColumnId::tsid()),
("job", 2),
("tag1", 3),
],
);
}
}

View File

@@ -59,7 +59,7 @@ impl MetricEngineInner {
}
}
async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
pub(crate) async fn close_physical_region(&self, region_id: RegionId) -> Result<AffectedRows> {
let data_region_id = utils::to_data_region_id(region_id);
let metadata_region_id = utils::to_metadata_region_id(region_id);

View File

@@ -55,6 +55,7 @@ use crate::utils::{
};
const DEFAULT_TABLE_ID_SKIPPING_INDEX_GRANULARITY: u32 = 1024;
const DEFAULT_TABLE_ID_SKIPPING_INDEX_FALSE_POSITIVE_RATE: f64 = 0.01;
impl MetricEngineInner {
pub async fn create_regions(
@@ -79,7 +80,8 @@ impl MetricEngineInner {
}
);
let (region_id, request) = requests.pop().unwrap();
self.create_physical_region(region_id, request).await?;
self.create_physical_region(region_id, request, extension_return_value)
.await?;
return Ok(0);
} else if first_request
@@ -121,6 +123,7 @@ impl MetricEngineInner {
&self,
region_id: RegionId,
request: RegionCreateRequest,
extension_return_value: &mut HashMap<String, Vec<u8>>,
) -> Result<()> {
let physical_region_options = PhysicalRegionOptions::try_from(&request.options)?;
let (data_region_id, metadata_region_id) = Self::transform_region_id(region_id);
@@ -161,7 +164,8 @@ impl MetricEngineInner {
.context(UnexpectedRequestSnafu {
reason: "No time index column found",
})?;
self.mito
let response = self
.mito
.handle_request(
data_region_id,
RegionRequest::Create(create_data_region_request),
@@ -175,6 +179,7 @@ impl MetricEngineInner {
region_id: data_region_id,
},
)?;
extension_return_value.extend(response.extensions);
info!("Created physical metric region {region_id}, primary key encoding={primary_key_encoding}, physical_region_options={physical_region_options:?}");
PHYSICAL_REGION_COUNT.inc();
@@ -542,10 +547,11 @@ impl MetricEngineInner {
ConcreteDataType::uint32_datatype(),
false,
)
.with_skipping_options(SkippingIndexOptions {
granularity: DEFAULT_TABLE_ID_SKIPPING_INDEX_GRANULARITY,
index_type: datatypes::schema::SkippingIndexType::BloomFilter,
})
.with_skipping_options(SkippingIndexOptions::new_unchecked(
DEFAULT_TABLE_ID_SKIPPING_INDEX_GRANULARITY,
DEFAULT_TABLE_ID_SKIPPING_INDEX_FALSE_POSITIVE_RATE,
datatypes::schema::SkippingIndexType::BloomFilter,
))
.unwrap(),
};
let tsid_col = ColumnMetadata {
@@ -611,12 +617,15 @@ pub(crate) fn region_options_for_metadata_region(
#[cfg(test)]
mod test {
use common_meta::ddl::test_util::assert_column_name_and_id;
use common_meta::ddl::utils::{parse_column_metadatas, parse_manifest_infos_from_extensions};
use store_api::metric_engine_consts::{METRIC_ENGINE_NAME, PHYSICAL_TABLE_METADATA_KEY};
use store_api::region_request::BatchRegionDdlRequest;
use super::*;
use crate::config::EngineConfig;
use crate::engine::MetricEngine;
use crate::test_util::TestEnv;
use crate::test_util::{create_logical_region_request, TestEnv};
#[test]
fn test_verify_region_create_request() {
@@ -805,4 +814,50 @@ mod test {
);
assert!(!metadata_region_request.options.contains_key("skip_wal"));
}
#[tokio::test]
async fn test_create_logical_regions() {
let env = TestEnv::new().await;
let engine = env.metric();
let physical_region_id1 = RegionId::new(1024, 0);
let physical_region_id2 = RegionId::new(1024, 1);
let logical_region_id1 = RegionId::new(1025, 0);
let logical_region_id2 = RegionId::new(1025, 1);
env.create_physical_region(physical_region_id1, "/test_dir1")
.await;
env.create_physical_region(physical_region_id2, "/test_dir2")
.await;
let region_create_request1 =
create_logical_region_request(&["job"], physical_region_id1, "logical1");
let region_create_request2 =
create_logical_region_request(&["job"], physical_region_id2, "logical2");
let response = engine
.handle_batch_ddl_requests(BatchRegionDdlRequest::Create(vec![
(logical_region_id1, region_create_request1),
(logical_region_id2, region_create_request2),
]))
.await
.unwrap();
let manifest_infos = parse_manifest_infos_from_extensions(&response.extensions).unwrap();
assert_eq!(manifest_infos.len(), 2);
let region_ids = manifest_infos.into_iter().map(|i| i.0).collect::<Vec<_>>();
assert!(region_ids.contains(&physical_region_id1));
assert!(region_ids.contains(&physical_region_id2));
let column_metadatas =
parse_column_metadatas(&response.extensions, ALTER_PHYSICAL_EXTENSION_KEY).unwrap();
assert_column_name_and_id(
&column_metadatas,
&[
("greptime_timestamp", 0),
("greptime_value", 1),
("__table_id", ReservedColumnId::table_id()),
("__tsid", ReservedColumnId::tsid()),
("job", 2),
],
);
}
}

View File

@@ -17,7 +17,7 @@
use api::region::RegionResponse;
use api::v1::SemanticType;
use common_error::ext::BoxedError;
use common_telemetry::info;
use common_telemetry::{error, info, warn};
use datafusion::common::HashMap;
use mito2::engine::MITO_ENGINE_NAME;
use object_store::util::join_dir;
@@ -94,6 +94,21 @@ impl MetricEngineInner {
Ok(responses)
}
// If the metadata region is opened with a stale manifest,
// the metric engine may fail to recover logical tables from the metadata region,
// as the manifest could reference files that have already been deleted
// due to compaction operations performed by the region leader.
async fn close_physical_region_on_recovery_failure(&self, physical_region_id: RegionId) {
info!(
"Closing metadata region {} and data region {} on metadata recovery failure",
utils::to_metadata_region_id(physical_region_id),
utils::to_data_region_id(physical_region_id)
);
if let Err(err) = self.close_physical_region(physical_region_id).await {
error!(err; "Failed to close physical region {}", physical_region_id);
}
}
async fn open_physical_region_with_results(
&self,
metadata_region_result: Option<std::result::Result<RegionResponse, BoxedError>>,
@@ -119,8 +134,14 @@ impl MetricEngineInner {
region_type: "data",
})?;
self.recover_states(physical_region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(physical_region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(physical_region_id)
.await;
return Err(err);
}
Ok(data_region_response)
}
@@ -139,11 +160,31 @@ impl MetricEngineInner {
request: RegionOpenRequest,
) -> Result<AffectedRows> {
if request.is_physical_table() {
if self
.state
.read()
.unwrap()
.physical_region_states()
.get(&region_id)
.is_some()
{
warn!(
"The physical region {} is already open, ignore the open request",
region_id
);
return Ok(0);
}
// open physical region and recover states
let physical_region_options = PhysicalRegionOptions::try_from(&request.options)?;
self.open_physical_region(region_id, request).await?;
self.recover_states(region_id, physical_region_options)
.await?;
if let Err(err) = self
.recover_states(region_id, physical_region_options)
.await
{
self.close_physical_region_on_recovery_failure(region_id)
.await;
return Err(err);
}
Ok(0)
} else {

View File
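The open path above now does two things: it returns early when the physical region is already registered, making re-open idempotent, and when `recover_states` fails it closes the half-opened region before propagating the error, so no region is left open in mito but unusable in the metric engine. A condensed sketch of that control flow with stand-in types (not the engine's real state structures):

```rust
use std::collections::HashSet;

struct RecoverError(&'static str);

#[derive(Default)]
struct Engine {
    open_regions: HashSet<u64>,
    recovery_should_fail: bool, // test knob for this sketch only
}

impl Engine {
    fn open_physical(&mut self, region_id: u64) {
        self.open_regions.insert(region_id);
    }

    fn close_physical(&mut self, region_id: u64) {
        self.open_regions.remove(&region_id);
    }

    fn recover_states(&self, _region_id: u64) -> Result<(), RecoverError> {
        if self.recovery_should_fail {
            Err(RecoverError("stale metadata manifest"))
        } else {
            Ok(())
        }
    }

    fn handle_open(&mut self, region_id: u64) -> Result<usize, RecoverError> {
        // 1. Re-opening an already opened region is a no-op.
        if self.open_regions.contains(&region_id) {
            return Ok(0);
        }
        // 2. Open first, then recover; on recovery failure, close the region
        //    before surfacing the error so no half-initialized region survives.
        self.open_physical(region_id);
        if let Err(err) = self.recover_states(region_id) {
            self.close_physical(region_id);
            return Err(err);
        }
        Ok(0)
    }
}

fn main() {
    let mut engine = Engine {
        recovery_should_fail: true,
        ..Default::default()
    };
    match engine.handle_open(42) {
        Err(err) => println!("open failed: {}", err.0),
        Ok(_) => unreachable!(),
    }
    assert!(!engine.open_regions.contains(&42)); // rolled back
}
```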

@@ -17,6 +17,8 @@
use std::collections::HashMap;
use store_api::metric_engine_consts::{
METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION,
METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION_DEFAULT,
METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION,
METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION_DEFAULT, METRIC_ENGINE_INDEX_TYPE_OPTION,
};
@@ -31,19 +33,20 @@ use crate::error::{Error, ParseRegionOptionsSnafu, Result};
const SEG_ROW_COUNT_FOR_DATA_REGION: u32 = 256;
/// Physical region options.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PhysicalRegionOptions {
pub index: IndexOptions,
}
/// Index options for auto created columns
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub enum IndexOptions {
#[default]
None,
Inverted,
Skipping {
granularity: u32,
false_positive_rate: f64,
},
}
@@ -54,6 +57,7 @@ pub fn set_data_region_options(
) {
options.remove(METRIC_ENGINE_INDEX_TYPE_OPTION);
options.remove(METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION);
options.remove(METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION);
options.insert(
"index.inverted_index.segment_row_count".to_string(),
SEG_ROW_COUNT_FOR_DATA_REGION.to_string(),
@@ -93,7 +97,23 @@ impl TryFrom<&HashMap<String, String>> for PhysicalRegionOptions {
})
},
)?;
Ok(IndexOptions::Skipping { granularity })
let false_positive_rate = value
.get(METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION)
.map_or(
Ok(METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION_DEFAULT),
|f| {
f.parse().ok().filter(|f| *f > 0.0 && *f <= 1.0).ok_or(
ParseRegionOptionsSnafu {
reason: format!("Invalid false positive rate: {}", f),
}
.build(),
)
},
)?;
Ok(IndexOptions::Skipping {
granularity,
false_positive_rate,
})
}
Some(index_type) => ParseRegionOptionsSnafu {
reason: format!("Invalid index type: {}", index_type),
@@ -121,11 +141,16 @@ mod tests {
METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION.to_string(),
"102400".to_string(),
);
options.insert(
METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION.to_string(),
"0.01".to_string(),
);
set_data_region_options(&mut options, false);
for key in [
METRIC_ENGINE_INDEX_TYPE_OPTION,
METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION,
METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION,
] {
assert_eq!(options.get(key), None);
}
@@ -154,11 +179,16 @@ mod tests {
METRIC_ENGINE_INDEX_SKIPPING_INDEX_GRANULARITY_OPTION.to_string(),
"102400".to_string(),
);
options.insert(
METRIC_ENGINE_INDEX_SKIPPING_INDEX_FALSE_POSITIVE_RATE_OPTION.to_string(),
"0.01".to_string(),
);
let physical_region_options = PhysicalRegionOptions::try_from(&options).unwrap();
assert_eq!(
physical_region_options.index,
IndexOptions::Skipping {
granularity: 102400
granularity: 102400,
false_positive_rate: 0.01,
}
);
}

View File
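The new `false_positive_rate` region option is read from the options map with a default and validated to lie in `(0, 1]`, mirroring the granularity handling above it. Below is a standalone sketch of that parse-or-default-or-error shape; the option key string is a hypothetical placeholder, since the diff shows only the constant's name, not its value.

```rust
use std::collections::HashMap;

// Hypothetical key string; the diff only exposes the constant's name, not its value.
const FALSE_POSITIVE_RATE_KEY: &str = "index.skipping_index.false_positive_rate";
const FALSE_POSITIVE_RATE_DEFAULT: f64 = 0.01;

/// Missing key -> default; present but unparsable or outside (0, 1] -> error.
fn parse_false_positive_rate(options: &HashMap<String, String>) -> Result<f64, String> {
    options
        .get(FALSE_POSITIVE_RATE_KEY)
        .map_or(Ok(FALSE_POSITIVE_RATE_DEFAULT), |raw| {
            raw.parse::<f64>()
                .ok()
                .filter(|rate| *rate > 0.0 && *rate <= 1.0)
                .ok_or_else(|| format!("Invalid false positive rate: {raw}"))
        })
}

fn main() {
    let mut options = HashMap::new();
    assert_eq!(parse_false_positive_rate(&options), Ok(0.01)); // default

    options.insert(FALSE_POSITIVE_RATE_KEY.to_string(), "0.05".to_string());
    assert_eq!(parse_false_positive_rate(&options), Ok(0.05));

    options.insert(FALSE_POSITIVE_RATE_KEY.to_string(), "1.5".to_string());
    assert!(parse_false_positive_rate(&options).is_err()); // out of range
}
```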

@@ -16,6 +16,7 @@
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, ColumnSchema as PbColumnSchema, Row, SemanticType, Value};
use common_meta::ddl::utils::parse_column_metadatas;
use common_telemetry::debug;
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::ColumnSchema;
@@ -23,14 +24,17 @@ use mito2::config::MitoConfig;
use mito2::engine::MitoEngine;
use mito2::test_util::TestEnv as MitoTestEnv;
use object_store::util::join_dir;
use object_store::ObjectStore;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::{
LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME, PHYSICAL_TABLE_METADATA_KEY,
ALTER_PHYSICAL_EXTENSION_KEY, LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME,
PHYSICAL_TABLE_METADATA_KEY, TABLE_COLUMN_METADATA_EXTENSION_KEY,
};
use store_api::region_engine::RegionEngine;
use store_api::region_request::{
AddColumn, AlterKind, RegionAlterRequest, RegionCreateRequest, RegionOpenRequest, RegionRequest,
};
use store_api::storage::consts::ReservedColumnId;
use store_api::storage::{ColumnId, RegionId};
use crate::config::EngineConfig;
@@ -74,6 +78,10 @@ impl TestEnv {
join_dir(&env_root, "data")
}
pub fn get_object_store(&self) -> Option<ObjectStore> {
self.mito_env.get_object_store()
}
/// Returns a reference to the engine.
pub fn mito(&self) -> MitoEngine {
self.mito.clone()
@@ -111,13 +119,8 @@ impl TestEnv {
(mito, metric)
}
/// Create regions in [MetricEngine] under [`default_region_id`]
/// and region dir `"test_metric_region"`.
///
/// This method will create one logical region with three columns `(ts, val, job)`
/// under [`default_logical_region_id`].
pub async fn init_metric_region(&self) {
let region_id = self.default_physical_region_id();
/// Creates a physical region in [MetricEngine] with the given `physical_region_id` and `region_dir`.
pub async fn create_physical_region(&self, physical_region_id: RegionId, region_dir: &str) {
let region_create_request = RegionCreateRequest {
engine: METRIC_ENGINE_NAME.to_string(),
column_metadatas: vec![
@@ -144,26 +147,88 @@ impl TestEnv {
options: [(PHYSICAL_TABLE_METADATA_KEY.to_string(), String::new())]
.into_iter()
.collect(),
region_dir: self.default_region_dir(),
region_dir: region_dir.to_string(),
};
// create physical region
self.metric()
.handle_request(region_id, RegionRequest::Create(region_create_request))
let response = self
.metric()
.handle_request(
physical_region_id,
RegionRequest::Create(region_create_request),
)
.await
.unwrap();
let column_metadatas =
parse_column_metadatas(&response.extensions, TABLE_COLUMN_METADATA_EXTENSION_KEY)
.unwrap();
assert_eq!(column_metadatas.len(), 4);
}
// create logical region
let region_id = self.default_logical_region_id();
/// Creates a logical region in [MetricEngine] with the given `physical_region_id` and `logical_region_id`.
pub async fn create_logical_region(
&self,
physical_region_id: RegionId,
logical_region_id: RegionId,
) {
let region_create_request = create_logical_region_request(
&["job"],
self.default_physical_region_id(),
physical_region_id,
"test_metric_logical_region",
);
self.metric()
.handle_request(region_id, RegionRequest::Create(region_create_request))
let response = self
.metric()
.handle_request(
logical_region_id,
RegionRequest::Create(region_create_request),
)
.await
.unwrap();
let column_metadatas =
parse_column_metadatas(&response.extensions, ALTER_PHYSICAL_EXTENSION_KEY).unwrap();
assert_eq!(column_metadatas.len(), 5);
let column_names = column_metadatas
.iter()
.map(|c| c.column_schema.name.as_str())
.collect::<Vec<_>>();
let column_ids = column_metadatas
.iter()
.map(|c| c.column_id)
.collect::<Vec<_>>();
assert_eq!(
column_names,
vec![
"greptime_timestamp",
"greptime_value",
"__table_id",
"__tsid",
"job",
]
);
assert_eq!(
column_ids,
vec![
0,
1,
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
2,
]
);
}
/// Create regions in [MetricEngine] under [`default_region_id`]
/// and region dir `"test_metric_region"`.
///
/// This method will create one logical region with three columns `(ts, val, job)`
/// under [`default_logical_region_id`].
pub async fn init_metric_region(&self) {
let physical_region_id = self.default_physical_region_id();
self.create_physical_region(physical_region_id, &self.default_region_dir())
.await;
let logical_region_id = self.default_logical_region_id();
self.create_logical_region(physical_region_id, logical_region_id)
.await;
}
pub fn metadata_region(&self) -> MetadataRegion {
@@ -269,6 +334,30 @@ pub fn create_logical_region_request(
}
}
/// Generates a [RegionAlterRequest] for a logical region.
/// Only the tag columns' names need to be specified.
pub fn alter_logical_region_request(tags: &[&str]) -> RegionAlterRequest {
RegionAlterRequest {
kind: AlterKind::AddColumns {
columns: tags
.iter()
.map(|tag| AddColumn {
column_metadata: ColumnMetadata {
column_id: 0,
semantic_type: SemanticType::Tag,
column_schema: ColumnSchema::new(
tag.to_string(),
ConcreteDataType::string_datatype(),
false,
),
},
location: None,
})
.collect::<Vec<_>>(),
},
}
}
/// Generate a row schema with given tag columns.
///
/// The result will also contains default timestamp and value column at beginning.

View File

@@ -7,6 +7,7 @@ license.workspace = true
[features]
default = []
test = ["common-test-util", "rstest", "rstest_reuse", "rskafka"]
enterprise = []
[lints]
workspace = true

View File

@@ -80,8 +80,10 @@ use snafu::{ensure, OptionExt, ResultExt};
use store_api::codec::PrimaryKeyEncoding;
use store_api::logstore::provider::Provider;
use store_api::logstore::LogStore;
use store_api::metadata::RegionMetadataRef;
use store_api::metric_engine_consts::MANIFEST_INFO_EXTENSION_KEY;
use store_api::metadata::{ColumnMetadata, RegionMetadataRef};
use store_api::metric_engine_consts::{
MANIFEST_INFO_EXTENSION_KEY, TABLE_COLUMN_METADATA_EXTENSION_KEY,
};
use store_api::region_engine::{
BatchResponses, RegionEngine, RegionManifestInfo, RegionRole, RegionScannerRef,
RegionStatistic, SetRegionRoleStateResponse, SettableRegionRoleState, SyncManifestResponse,
@@ -95,8 +97,10 @@ use crate::cache::CacheStrategy;
use crate::config::MitoConfig;
use crate::error::{
InvalidRequestSnafu, JoinSnafu, MitoManifestInfoSnafu, RecvSnafu, RegionNotFoundSnafu, Result,
SerdeJsonSnafu,
SerdeJsonSnafu, SerializeColumnMetadataSnafu,
};
#[cfg(feature = "enterprise")]
use crate::extension::BoxedExtensionRangeProviderFactory;
use crate::manifest::action::RegionEdit;
use crate::memtable::MemtableStats;
use crate::metrics::HANDLE_REQUEST_ELAPSED;
@@ -113,6 +117,81 @@ use crate::worker::WorkerGroup;
pub const MITO_ENGINE_NAME: &str = "mito";
pub struct MitoEngineBuilder<'a, S: LogStore> {
data_home: &'a str,
config: MitoConfig,
log_store: Arc<S>,
object_store_manager: ObjectStoreManagerRef,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: Option<BoxedExtensionRangeProviderFactory>,
}
impl<'a, S: LogStore> MitoEngineBuilder<'a, S> {
pub fn new(
data_home: &'a str,
config: MitoConfig,
log_store: Arc<S>,
object_store_manager: ObjectStoreManagerRef,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Self {
Self {
data_home,
config,
log_store,
object_store_manager,
schema_metadata_manager,
plugins,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: None,
}
}
#[cfg(feature = "enterprise")]
#[must_use]
pub fn with_extension_range_provider_factory(
self,
extension_range_provider_factory: Option<BoxedExtensionRangeProviderFactory>,
) -> Self {
Self {
extension_range_provider_factory,
..self
}
}
pub async fn try_build(mut self) -> Result<MitoEngine> {
self.config.sanitize(self.data_home)?;
let config = Arc::new(self.config);
let workers = WorkerGroup::start(
config.clone(),
self.log_store.clone(),
self.object_store_manager,
self.schema_metadata_manager,
self.plugins,
)
.await?;
let wal_raw_entry_reader = Arc::new(LogStoreRawEntryReader::new(self.log_store));
let inner = EngineInner {
workers,
config,
wal_raw_entry_reader,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: None,
};
#[cfg(feature = "enterprise")]
let inner =
inner.with_extension_range_provider_factory(self.extension_range_provider_factory);
Ok(MitoEngine {
inner: Arc::new(inner),
})
}
}
/// Region engine implementation for timeseries data.
#[derive(Clone)]
pub struct MitoEngine {
@@ -123,26 +202,21 @@ impl MitoEngine {
/// Returns a new [MitoEngine] with specific `config`, `log_store` and `object_store`.
pub async fn new<S: LogStore>(
data_home: &str,
mut config: MitoConfig,
config: MitoConfig,
log_store: Arc<S>,
object_store_manager: ObjectStoreManagerRef,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Result<MitoEngine> {
config.sanitize(data_home)?;
Ok(MitoEngine {
inner: Arc::new(
EngineInner::new(
config,
log_store,
object_store_manager,
schema_metadata_manager,
plugins,
)
.await?,
),
})
let builder = MitoEngineBuilder::new(
data_home,
config,
log_store,
object_store_manager,
schema_metadata_manager,
plugins,
);
builder.try_build().await
}
/// Returns true if the specific region exists.
@@ -263,6 +337,22 @@ impl MitoEngine {
Ok(())
}
fn encode_column_metadatas_to_extensions(
region_id: &RegionId,
column_metadatas: Vec<ColumnMetadata>,
extensions: &mut HashMap<String, Vec<u8>>,
) -> Result<()> {
extensions.insert(
TABLE_COLUMN_METADATA_EXTENSION_KEY.to_string(),
ColumnMetadata::encode_list(&column_metadatas).context(SerializeColumnMetadataSnafu)?,
);
info!(
"Added column metadatas: {:?} to extensions, region_id: {:?}",
column_metadatas, region_id
);
Ok(())
}
/// Find the current version's memtables and SSTs stats by region_id.
/// The stats must be collected in one place one time to ensure data consistency.
pub fn find_memtable_and_sst_stats(
@@ -316,6 +406,8 @@ struct EngineInner {
config: Arc<MitoConfig>,
/// The Wal raw entry reader.
wal_raw_entry_reader: Arc<dyn RawEntryReader>,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: Option<BoxedExtensionRangeProviderFactory>,
}
type TopicGroupedRegionOpenRequests = HashMap<String, Vec<(RegionId, RegionOpenRequest)>>;
@@ -352,28 +444,16 @@ fn prepare_batch_open_requests(
}
impl EngineInner {
/// Returns a new [EngineInner] with specific `config`, `log_store` and `object_store`.
async fn new<S: LogStore>(
config: MitoConfig,
log_store: Arc<S>,
object_store_manager: ObjectStoreManagerRef,
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Result<EngineInner> {
let config = Arc::new(config);
let wal_raw_entry_reader = Arc::new(LogStoreRawEntryReader::new(log_store.clone()));
Ok(EngineInner {
workers: WorkerGroup::start(
config.clone(),
log_store,
object_store_manager,
schema_metadata_manager,
plugins,
)
.await?,
config,
wal_raw_entry_reader,
})
#[cfg(feature = "enterprise")]
#[must_use]
fn with_extension_range_provider_factory(
self,
extension_range_provider_factory: Option<BoxedExtensionRangeProviderFactory>,
) -> Self {
Self {
extension_range_provider_factory,
..self
}
}
/// Stop the inner engine.
@@ -524,9 +604,27 @@ impl EngineInner {
.with_ignore_bloom_filter(self.config.bloom_filter_index.apply_on_query.disabled())
.with_start_time(query_start);
#[cfg(feature = "enterprise")]
let scan_region = self.maybe_fill_extension_range_provider(scan_region, region);
Ok(scan_region)
}
#[cfg(feature = "enterprise")]
fn maybe_fill_extension_range_provider(
&self,
mut scan_region: ScanRegion,
region: MitoRegionRef,
) -> ScanRegion {
if region.is_follower()
&& let Some(factory) = self.extension_range_provider_factory.as_ref()
{
scan_region
.set_extension_range_provider(factory.create_extension_range_provider(region));
}
scan_region
}
/// Converts the [`RegionRole`].
fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<()> {
let region = self.find_region(region_id)?;
@@ -615,6 +713,7 @@ impl RegionEngine for MitoEngine {
.start_timer();
let is_alter = matches!(request, RegionRequest::Alter(_));
let is_create = matches!(request, RegionRequest::Create(_));
let mut response = self
.inner
.handle_request(region_id, request)
@@ -623,14 +722,11 @@ impl RegionEngine for MitoEngine {
.map_err(BoxedError::new)?;
if is_alter {
if let Some(statistic) = self.region_statistic(region_id) {
Self::encode_manifest_info_to_extensions(
&region_id,
statistic.manifest,
&mut response.extensions,
)
self.handle_alter_response(region_id, &mut response)
.map_err(BoxedError::new)?;
} else if is_create {
self.handle_create_response(region_id, &mut response)
.map_err(BoxedError::new)?;
}
}
Ok(response)
@@ -723,6 +819,55 @@ impl RegionEngine for MitoEngine {
}
}
impl MitoEngine {
fn handle_alter_response(
&self,
region_id: RegionId,
response: &mut RegionResponse,
) -> Result<()> {
if let Some(statistic) = self.region_statistic(region_id) {
Self::encode_manifest_info_to_extensions(
&region_id,
statistic.manifest,
&mut response.extensions,
)?;
}
let column_metadatas = self
.inner
.find_region(region_id)
.ok()
.map(|r| r.metadata().column_metadatas.clone());
if let Some(column_metadatas) = column_metadatas {
Self::encode_column_metadatas_to_extensions(
&region_id,
column_metadatas,
&mut response.extensions,
)?;
}
Ok(())
}
fn handle_create_response(
&self,
region_id: RegionId,
response: &mut RegionResponse,
) -> Result<()> {
let column_metadatas = self
.inner
.find_region(region_id)
.ok()
.map(|r| r.metadata().column_metadatas.clone());
if let Some(column_metadatas) = column_metadatas {
Self::encode_column_metadatas_to_extensions(
&region_id,
column_metadatas,
&mut response.extensions,
)?;
}
Ok(())
}
}
// Tests methods.
#[cfg(any(test, feature = "test"))]
#[allow(clippy::too_many_arguments)]
@@ -756,6 +901,8 @@ impl MitoEngine {
.await?,
config,
wal_raw_entry_reader,
#[cfg(feature = "enterprise")]
extension_range_provider_factory: None,
}),
})
}

View File
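The new `MitoEngineBuilder` moves config sanitization and worker startup into `try_build`, and keeps the enterprise-only extension range provider factory behind `#[cfg(feature = "enterprise")]` with a `#[must_use]` setter. A reduced sketch of that shape with placeholder types (only the feature name is taken from the diff; everything else is a stand-in):

```rust
// Stand-in types; not the real mito2 items.
#[derive(Default)]
struct Config;

impl Config {
    // Placeholder for `MitoConfig::sanitize(&mut self, data_home)`.
    fn sanitize(&mut self) -> Result<(), String> {
        Ok(())
    }
}

struct Engine {
    _config: Config,
}

#[cfg(feature = "enterprise")]
struct ExtensionFactory;

struct EngineBuilder {
    config: Config,
    #[cfg(feature = "enterprise")]
    extension_factory: Option<ExtensionFactory>,
}

impl EngineBuilder {
    fn new(config: Config) -> Self {
        Self {
            config,
            #[cfg(feature = "enterprise")]
            extension_factory: None,
        }
    }

    /// Only compiled, and therefore only callable, with the enterprise feature on.
    #[cfg(feature = "enterprise")]
    #[must_use]
    fn with_extension_factory(self, factory: Option<ExtensionFactory>) -> Self {
        Self {
            extension_factory: factory,
            ..self
        }
    }

    /// Fallible build: sanitize the config, then assemble the engine.
    fn try_build(mut self) -> Result<Engine, String> {
        self.config.sanitize()?;
        Ok(Engine {
            _config: self.config,
        })
    }
}

fn main() {
    let engine = EngineBuilder::new(Config::default()).try_build();
    assert!(engine.is_ok());
}
```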

@@ -20,16 +20,18 @@ use std::time::Duration;
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, Row, Rows, SemanticType};
use common_error::ext::ErrorExt;
use common_meta::ddl::utils::{parse_column_metadatas, parse_manifest_infos_from_extensions};
use common_recordbatch::RecordBatches;
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::{ColumnSchema, FulltextAnalyzer, FulltextBackend, FulltextOptions};
use store_api::metadata::ColumnMetadata;
use store_api::region_engine::{RegionEngine, RegionRole};
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::region_engine::{RegionEngine, RegionManifestInfo, RegionRole};
use store_api::region_request::{
AddColumn, AddColumnLocation, AlterKind, ApiSetIndexOptions, RegionAlterRequest,
RegionOpenRequest, RegionRequest, SetRegionOption,
};
use store_api::storage::{RegionId, ScanRequest};
use store_api::storage::{ColumnId, RegionId, ScanRequest};
use crate::config::MitoConfig;
use crate::engine::listener::{AlterFlushListener, NotifyRegionChangeResultListener};
@@ -84,12 +86,14 @@ fn alter_column_fulltext_options() -> RegionAlterRequest {
kind: AlterKind::SetIndex {
options: ApiSetIndexOptions::Fulltext {
column_name: "tag_0".to_string(),
options: FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Bloom,
},
options: FulltextOptions::new_unchecked(
true,
FulltextAnalyzer::English,
false,
FulltextBackend::Bloom,
1000,
0.01,
),
},
},
}
@@ -111,6 +115,17 @@ fn check_region_version(
assert_eq!(flushed_sequence, version_data.version.flushed_sequence);
}
fn assert_column_metadatas(column_name: &[(&str, ColumnId)], column_metadatas: &[ColumnMetadata]) {
assert_eq!(column_name.len(), column_metadatas.len());
for (name, id) in column_name {
let column_metadata = column_metadatas
.iter()
.find(|c| c.column_id == *id)
.unwrap();
assert_eq!(column_metadata.column_schema.name, *name);
}
}
#[tokio::test]
async fn test_alter_region() {
common_telemetry::init_default_ut_logging();
@@ -134,10 +149,16 @@ async fn test_alter_region() {
let column_schemas = rows_schema(&request);
let region_dir = request.region_dir.clone();
engine
let response = engine
.handle_request(region_id, RegionRequest::Create(request))
.await
.unwrap();
let column_metadatas =
parse_column_metadatas(&response.extensions, TABLE_COLUMN_METADATA_EXTENSION_KEY).unwrap();
assert_column_metadatas(
&[("tag_0", 0), ("field_0", 1), ("ts", 2)],
&column_metadatas,
);
let rows = Rows {
schema: column_schemas,
@@ -146,7 +167,7 @@ async fn test_alter_region() {
put_rows(&engine, region_id, rows).await;
let request = add_tag1();
engine
let response = engine
.handle_request(region_id, RegionRequest::Alter(request))
.await
.unwrap();
@@ -162,6 +183,18 @@ async fn test_alter_region() {
scan_check_after_alter(&engine, region_id, expected).await;
check_region_version(&engine, region_id, 1, 3, 1, 3);
let mut manifests = parse_manifest_infos_from_extensions(&response.extensions).unwrap();
assert_eq!(manifests.len(), 1);
let (return_region_id, manifest) = manifests.remove(0);
assert_eq!(return_region_id, region_id);
assert_eq!(manifest, RegionManifestInfo::mito(2, 1));
let column_metadatas =
parse_column_metadatas(&response.extensions, TABLE_COLUMN_METADATA_EXTENSION_KEY).unwrap();
assert_column_metadatas(
&[("tag_0", 0), ("field_0", 1), ("ts", 2), ("tag_1", 3)],
&column_metadatas,
);
// Reopen region.
let engine = env.reopen_engine(engine, MitoConfig::default()).await;
engine
@@ -553,12 +586,14 @@ async fn test_alter_column_fulltext_options() {
// Wait for the write job.
alter_job.await.unwrap();
let expect_fulltext_options = FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Bloom,
};
let expect_fulltext_options = FulltextOptions::new_unchecked(
true,
FulltextAnalyzer::English,
false,
FulltextBackend::Bloom,
1000,
0.01,
);
let check_fulltext_options = |engine: &MitoEngine, expected: &FulltextOptions| {
let current_fulltext_options = engine
.get_region(region_id)

Some files were not shown because too many files have changed in this diff.