Compare commits

...

20 Commits

Author SHA1 Message Date
Zhenchi
f66803622d chore: bump version to 0.14.3
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2025-05-23 20:23:23 +08:00
Ruihang Xia
e7774437b8 fix: require input ordering in series divide plan (#6148)
* require input ordering in series divide plan

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* finalise

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-23 20:23:23 +08:00
Ruihang Xia
c272b25456 feat: support altering multiple logical table in one remote write request (#6137)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-23 20:23:23 +08:00
discord9
724b802018 chore: invalid table flow mapping cache (#6135)
* chore: invalid table flow mapping

* chore: exists

* fix: invalidate all related keys in kv cache when dropping flow & refactor: per review

* fix: flow not found status code

* chore: rm unused error code

* chore: stuff

* chore: unused
2025-05-23 20:23:23 +08:00
Ruihang Xia
f3ca5f5d7f feat: accommodate default column name with pre-created table schema (#6126)
* refactor: prepare_mocked_backend

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* modify request in place

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* apply to influx line protocol

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* return on empty alter expr list

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* expose to other write paths

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-23 20:23:23 +08:00
Ruihang Xia
6c672b96bf fix: update promql-parser for regex anchor fix (#6117)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-05-23 20:23:23 +08:00
discord9
83018d6670 fix: flow update use proper update (#6108)
* fix: flow update use proper update

* refactor: per review

* fix: flow cache

* chore: per copilot review

* refactor: rm flow node id

* refactor: per review

* chore: per review

* refactor: per review

* chore: per review
2025-05-23 20:23:23 +08:00
discord9
69f1cbd484 fix(flow): flow task run interval (#6100)
* fix: always check for shutdown signal in flow
chore: correct log msg for flows that shouldn't exist
feat: use time window size/2 as sleep interval

* chore: better slower query refresh time

* chore

* refactor: per review
2025-05-23 20:23:23 +08:00
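
The interval heuristic in the entry above (use half of the flow's time window as the sleep interval) can be summarized with a small Rust sketch. The clamping floor and the names below are assumptions for illustration, not the flow crate's actual code:

use std::time::Duration;

// Lower bound so that very small time windows do not turn into a busy loop.
const MIN_REFRESH_INTERVAL: Duration = Duration::from_secs(1);

// Sleep for half of the flow's time window, falling back to the floor when
// no time window is known.
fn refresh_interval(time_window: Option<Duration>) -> Duration {
    match time_window {
        Some(window) => std::cmp::max(window / 2, MIN_REFRESH_INTERVAL),
        None => MIN_REFRESH_INTERVAL,
    }
}
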
discord9
e1dad69648 fix: flownode chose fe randomly&not starve lock (#6077)
* fix: choose frontend randomly

* docs: update comment

* chore: more logs

* fix: ignore inserts until recovering flow is done

* chore: resolve TODO

* fix: rm unused code&set done in correct location

* refactor: speed up create flow
2025-05-23 20:23:23 +08:00
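
The "choose frontend randomly" fix above amounts to picking one element of a candidate list at random instead of always taking the first. A minimal sketch using the rand 0.9 API that this release pulls into the flow crate; the frontends slice and its element type are placeholders:

use rand::prelude::*;

// Returns None when no frontend is registered instead of panicking.
fn pick_frontend(frontends: &[String]) -> Option<&String> {
    frontends.choose(&mut rand::rng())
}
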
Ruihang Xia
6c976bc737 feat: don't hide atomic write dir (#6109)
* feat: don't hide atomic write dir

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* compatible code

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/mito2/src/access_layer.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-05-23 20:23:23 +08:00
jeremyhi
b20c1ac797 chore: reduce unnecessary txns in alter operations (#6133) 2025-05-23 20:23:23 +08:00
Yingwen
d7cfb741a5 fix: clean files under the atomic write dir on failure (#6112)
* fix: remove files under atomic dir on failure

* fix: clean atomic dir on download failure

* chore: update comment

* fix: clean if failed to write without write cache

* feat: add a TempFileCleaner to clean files on failure

* chore: after merge fix

* chore: more fix

---------

Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: discord9 <discord9@163.com>
2025-05-23 20:23:23 +08:00
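
The entry above adds a TempFileCleaner so that files written under the atomic write dir are removed when the surrounding operation fails. A minimal sketch of that cleanup-on-failure idea; the type and method names are illustrative, not the actual mito2 API:

use std::fs;
use std::path::PathBuf;

#[derive(Default)]
struct TempFileCleaner {
    // Paths written so far that must be deleted if the operation fails.
    pending: Vec<PathBuf>,
}

impl TempFileCleaner {
    fn track(&mut self, path: PathBuf) {
        self.pending.push(path);
    }

    // Called on the failure path; removal is best effort.
    fn clean(&mut self) {
        for path in self.pending.drain(..) {
            let _ = fs::remove_file(&path);
        }
    }
}
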
Weny Xu
1b3efef15c fix: append noop entry when auto topic creation is disabled (#6092)
* feat: improve topic management and add stale records cleanup

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-05-23 20:23:23 +08:00
Yingwen
1ca2dbd240 fix: reset tags when creating an empty metric in prom call (#6056)
* Revert "chore: remove debug logs"

This reverts commit f73f3a7373c83db974d8ed80cb47f5f87317b490.

* chore: more logs

* fix: reset tags and fields

* test: add binary time fn test

* chore: remove logs

* test: sort result
2025-05-23 20:23:23 +08:00
Ning Sun
d596dba240 fix: ident value in set search_path (#6153)
* fix: ident value in set search_path

* refactor: remove unneeded clone
2025-05-23 20:23:23 +08:00
discord9
5c9cbb5f4c chore: bump version to 0.14.2 (#6032)
* chore: only retry when retry-able in flow (#5987)

* chore: only retry when retry-able

* chore: revert dbg change

* refactor: per review

* fix: check for available frontend first

* docs: more explain&longer timeout&feat: more retry at every level&try send select 1

* fix: use `sql` method for "SELECT 1"

* fix: also put recover flows in spawned task and a dead loop

* test: update transient error in flow rebuild test

* chore: sleep after sqlness sleep

* chore: add a warning

* chore: wait even more time after reboot

* fix: sanitize_connection_string (#6012)

* fix: disable recursion limit in prost (#6010)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* ci: fix the bugs of release-dev-builder-images and add update-dev-builder-image-tag (#6009)

* fix: the dev-builder release job is not triggered by merged event

* ci: add update-dev-builder-image-tag

* fix: always create mito engine (#6018)

* fix: force streaming mode for instant source table (#6031)

* fix: force streaming mode for instant source table

* tests: sqlness test&refactor: get table

* refactor: per review

* chore: bump version to 0.14.2

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: zyy17 <zyylsxm@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-05-01 09:20:01 -07:00
Zhenchi
e2df38d0d1 chore: bump version to 0.14.1 (#6006)
* feat: remove own greatest fn (#5994)

* fix: prune primary key with multiple columns may use default value as statistics (#5996)

* test: incorrect test result when filtering pk with multiple columns

* fix: prune non first tag correctly

Distinguish no column and no stats and only use default value when no
column

* test: update test result

* refactor: rename test file

* test: add test for null filter

* fix: use StatValues for null counts

* test: drop table

* test: fix unstable flow test

* fix: check if memtable is empty by stats (#5989)

fix/checking-memtable-empty-and-stats:
 - **Refactor timestamp updates**: Simplified timestamp range updates in `PartitionTreeMemtable` and `TimeSeriesMemtable` by replacing `update_timestamp_range` with `fetch_max` and `fetch_min` methods for `max_timestamp` and `min_timestamp`.
   - Affected files: `partition_tree.rs`, `time_series.rs`

 - **Remove unused code**: Deleted the `update_timestamp_range` method from `WriteMetrics` and removed unnecessary imports.
   - Affected file: `stats.rs`

 - **Optimize memtable filtering**: Streamlined the check for empty memtables in `ScanRegion` by directly using `time_range`.
   - Affected file: `scan_region.rs`

* chore: bump version to 0.14.1

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-04-28 07:39:49 +00:00
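
The memtable change described in the entry above replaces an update_timestamp_range helper with atomic fetch_min/fetch_max bookkeeping, so that "is this memtable empty?" can be answered from write statistics alone. A minimal sketch with illustrative names, not the mito2 types:

use std::sync::atomic::{AtomicI64, Ordering};

struct WriteStats {
    min_timestamp: AtomicI64,
    max_timestamp: AtomicI64,
}

impl WriteStats {
    fn new() -> Self {
        Self {
            min_timestamp: AtomicI64::new(i64::MAX),
            max_timestamp: AtomicI64::new(i64::MIN),
        }
    }

    // Update both bounds on every written row.
    fn record(&self, ts: i64) {
        self.min_timestamp.fetch_min(ts, Ordering::Relaxed);
        self.max_timestamp.fetch_max(ts, Ordering::Relaxed);
    }

    // Nothing was written if max_timestamp never moved past its initial value.
    fn is_empty(&self) -> bool {
        self.max_timestamp.load(Ordering::Relaxed) == i64::MIN
    }
}
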
discord9
66e2242e46 fix: conn timeout&refactor: better err msg (#5974)
* fix: conn timeout&refactor: better err msg

* chore: clippy

* chore: make test work

* chore: comment

* todo: fix null cast

* fix: retry conn&udd_calc

* chore: comment

* chore: apply suggestion

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2025-04-25 19:12:30 +00:00
Ning Sun
489b16ae30 fix: security update (#5982) 2025-04-25 18:11:09 +00:00
dennis zhuang
85d564b0fb fix: upgrade sqlparse and validate align in range query (#5958)
* fix: upgrade sqlparse and validate align in range query

* update sqlparser to the merged commit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
2025-04-25 17:34:49 +00:00
113 changed files with 4278 additions and 1350 deletions

.github/scripts/update-dev-builder-version.sh vendored Executable file

@@ -0,0 +1,37 @@
#!/bin/bash

DEV_BUILDER_IMAGE_TAG=$1

update_dev_builder_version() {
  if [ -z "$DEV_BUILDER_IMAGE_TAG" ]; then
    echo "Error: Should specify the dev-builder image tag"
    exit 1
  fi

  # Configure Git configs.
  git config --global user.email greptimedb-ci@greptime.com
  git config --global user.name greptimedb-ci

  # Checkout a new branch.
  BRANCH_NAME="ci/update-dev-builder-$(date +%Y%m%d%H%M%S)"
  git checkout -b $BRANCH_NAME

  # Update the dev-builder image tag in the Makefile.
  gsed -i "s/DEV_BUILDER_IMAGE_TAG ?=.*/DEV_BUILDER_IMAGE_TAG ?= ${DEV_BUILDER_IMAGE_TAG}/g" Makefile

  # Commit the changes.
  git add Makefile
  git commit -m "ci: update dev-builder image tag"
  git push origin $BRANCH_NAME

  # Create a Pull Request.
  gh pr create \
    --title "ci: update dev-builder image tag" \
    --body "This PR updates the dev-builder image tag" \
    --base main \
    --head $BRANCH_NAME \
    --reviewer zyy17 \
    --reviewer daviderli614
}

update_dev_builder_version


@@ -24,11 +24,19 @@ on:
description: Release dev-builder-android image
required: false
default: false
update_dev_builder_image_tag:
type: boolean
description: Update the DEV_BUILDER_IMAGE_TAG in Makefile and create a PR
required: false
default: false
jobs:
release-dev-builder-images:
name: Release dev builder images
if: ${{ inputs.release_dev_builder_ubuntu_image || inputs.release_dev_builder_centos_image || inputs.release_dev_builder_android_image }} # Only manually trigger this job.
# The jobs are triggered by the following events:
# 1. Manually triggered workflow_dispatch event
# 2. Push event when the PR that modifies the `rust-toolchain.toml` or `docker/dev-builder/**` is merged to main
if: ${{ github.event_name == 'push' || inputs.release_dev_builder_ubuntu_image || inputs.release_dev_builder_centos_image || inputs.release_dev_builder_android_image }}
runs-on: ubuntu-latest
outputs:
version: ${{ steps.set-version.outputs.version }}
@@ -57,9 +65,9 @@ jobs:
version: ${{ env.VERSION }}
dockerhub-image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
dockerhub-image-registry-token: ${{ secrets.DOCKERHUB_TOKEN }}
build-dev-builder-ubuntu: ${{ inputs.release_dev_builder_ubuntu_image }}
build-dev-builder-centos: ${{ inputs.release_dev_builder_centos_image }}
build-dev-builder-android: ${{ inputs.release_dev_builder_android_image }}
build-dev-builder-ubuntu: ${{ inputs.release_dev_builder_ubuntu_image || github.event_name == 'push' }}
build-dev-builder-centos: ${{ inputs.release_dev_builder_centos_image || github.event_name == 'push' }}
build-dev-builder-android: ${{ inputs.release_dev_builder_android_image || github.event_name == 'push' }}
release-dev-builder-images-ecr:
name: Release dev builder images to AWS ECR
@@ -85,7 +93,7 @@ jobs:
- name: Push dev-builder-ubuntu image
shell: bash
if: ${{ inputs.release_dev_builder_ubuntu_image }}
if: ${{ inputs.release_dev_builder_ubuntu_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -106,7 +114,7 @@ jobs:
- name: Push dev-builder-centos image
shell: bash
if: ${{ inputs.release_dev_builder_centos_image }}
if: ${{ inputs.release_dev_builder_centos_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -127,7 +135,7 @@ jobs:
- name: Push dev-builder-android image
shell: bash
if: ${{ inputs.release_dev_builder_android_image }}
if: ${{ inputs.release_dev_builder_android_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -162,7 +170,7 @@ jobs:
- name: Push dev-builder-ubuntu image
shell: bash
if: ${{ inputs.release_dev_builder_ubuntu_image }}
if: ${{ inputs.release_dev_builder_ubuntu_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -176,7 +184,7 @@ jobs:
- name: Push dev-builder-centos image
shell: bash
if: ${{ inputs.release_dev_builder_centos_image }}
if: ${{ inputs.release_dev_builder_centos_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -190,7 +198,7 @@ jobs:
- name: Push dev-builder-android image
shell: bash
if: ${{ inputs.release_dev_builder_android_image }}
if: ${{ inputs.release_dev_builder_android_image || github.event_name == 'push' }}
env:
IMAGE_VERSION: ${{ needs.release-dev-builder-images.outputs.version }}
IMAGE_NAMESPACE: ${{ vars.IMAGE_NAMESPACE }}
@@ -201,3 +209,24 @@ jobs:
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/$IMAGE_NAMESPACE/dev-builder-android:$IMAGE_VERSION \
docker://$ACR_IMAGE_REGISTRY/$IMAGE_NAMESPACE/dev-builder-android:$IMAGE_VERSION
update-dev-builder-image-tag:
name: Update dev-builder image tag
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write
if: ${{ github.event_name == 'push' || inputs.update_dev_builder_image_tag }}
needs: [
release-dev-builder-images
]
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Update dev-builder image tag
shell: bash
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
./.github/scripts/update-dev-builder-version.sh ${{ needs.release-dev-builder-images.outputs.version }}

Cargo.lock generated

@@ -185,7 +185,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
[[package]]
name = "api"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-base",
"common-decimal",
@@ -915,7 +915,7 @@ dependencies = [
[[package]]
name = "auth"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -1537,7 +1537,7 @@ dependencies = [
[[package]]
name = "cache"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"catalog",
"common-error",
@@ -1561,7 +1561,7 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "catalog"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -1619,9 +1619,9 @@ dependencies = [
[[package]]
name = "cc"
version = "1.1.24"
version = "1.2.20"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "812acba72f0a070b003d3697490d2b55b837230ae7c6c6497f05cc2ddbb8d938"
checksum = "04da6a0d40b948dfc4fa8f5bbf402b0fc1a64a28dbf7d12ffd683550f2c1b63a"
dependencies = [
"jobserver",
"libc",
@@ -1874,7 +1874,7 @@ checksum = "1462739cb27611015575c0c11df5df7601141071f07518d56fcc1be504cbec97"
[[package]]
name = "cli"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"auth",
@@ -1917,7 +1917,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tempfile",
"tokio",
@@ -1926,7 +1926,7 @@ dependencies = [
[[package]]
name = "client"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arc-swap",
@@ -1955,7 +1955,7 @@ dependencies = [
"rand 0.9.0",
"serde_json",
"snafu 0.8.5",
"substrait 0.14.0",
"substrait 0.14.3",
"substrait 0.37.3",
"tokio",
"tokio-stream",
@@ -1996,7 +1996,7 @@ dependencies = [
[[package]]
name = "cmd"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"auth",
@@ -2056,7 +2056,7 @@ dependencies = [
"similar-asserts",
"snafu 0.8.5",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"temp-env",
"tempfile",
@@ -2102,7 +2102,7 @@ checksum = "55b672471b4e9f9e95499ea597ff64941a309b2cdbffcc46f2cc5e2d971fd335"
[[package]]
name = "common-base"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"anymap2",
"async-trait",
@@ -2124,11 +2124,11 @@ dependencies = [
[[package]]
name = "common-catalog"
version = "0.14.0"
version = "0.14.3"
[[package]]
name = "common-config"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-base",
"common-error",
@@ -2153,7 +2153,7 @@ dependencies = [
[[package]]
name = "common-datasource"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arrow 54.2.1",
"arrow-schema 54.3.1",
@@ -2190,7 +2190,7 @@ dependencies = [
[[package]]
name = "common-decimal"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"bigdecimal 0.4.8",
"common-error",
@@ -2203,7 +2203,7 @@ dependencies = [
[[package]]
name = "common-error"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-macro",
"http 1.1.0",
@@ -2214,7 +2214,7 @@ dependencies = [
[[package]]
name = "common-frontend"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"common-error",
@@ -2224,7 +2224,7 @@ dependencies = [
[[package]]
name = "common-function"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -2277,7 +2277,7 @@ dependencies = [
[[package]]
name = "common-greptimedb-telemetry"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"common-runtime",
@@ -2294,7 +2294,7 @@ dependencies = [
[[package]]
name = "common-grpc"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arrow-flight",
@@ -2325,7 +2325,7 @@ dependencies = [
[[package]]
name = "common-grpc-expr"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"common-base",
@@ -2344,7 +2344,7 @@ dependencies = [
[[package]]
name = "common-macro"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arc-swap",
"common-query",
@@ -2358,7 +2358,7 @@ dependencies = [
[[package]]
name = "common-mem-prof"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-error",
"common-macro",
@@ -2371,7 +2371,7 @@ dependencies = [
[[package]]
name = "common-meta"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"anymap2",
"api",
@@ -2432,7 +2432,7 @@ dependencies = [
[[package]]
name = "common-options"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-grpc",
"humantime-serde",
@@ -2441,11 +2441,11 @@ dependencies = [
[[package]]
name = "common-plugins"
version = "0.14.0"
version = "0.14.3"
[[package]]
name = "common-pprof"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-error",
"common-macro",
@@ -2457,7 +2457,7 @@ dependencies = [
[[package]]
name = "common-procedure"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-stream",
"async-trait",
@@ -2484,7 +2484,7 @@ dependencies = [
[[package]]
name = "common-procedure-test"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"common-procedure",
@@ -2493,7 +2493,7 @@ dependencies = [
[[package]]
name = "common-query"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -2510,7 +2510,7 @@ dependencies = [
"futures-util",
"serde",
"snafu 0.8.5",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"sqlparser_derive 0.1.1",
"statrs",
"store-api",
@@ -2519,7 +2519,7 @@ dependencies = [
[[package]]
name = "common-recordbatch"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arc-swap",
"common-error",
@@ -2539,7 +2539,7 @@ dependencies = [
[[package]]
name = "common-runtime"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -2569,14 +2569,14 @@ dependencies = [
[[package]]
name = "common-session"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"strum 0.27.1",
]
[[package]]
name = "common-telemetry"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"atty",
"backtrace",
@@ -2604,7 +2604,7 @@ dependencies = [
[[package]]
name = "common-test-util"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"client",
"common-query",
@@ -2616,7 +2616,7 @@ dependencies = [
[[package]]
name = "common-time"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arrow 54.2.1",
"chrono",
@@ -2634,7 +2634,7 @@ dependencies = [
[[package]]
name = "common-version"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"build-data",
"const_format",
@@ -2644,7 +2644,7 @@ dependencies = [
[[package]]
name = "common-wal"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"common-base",
"common-error",
@@ -2946,9 +2946,9 @@ dependencies = [
[[package]]
name = "crossbeam-channel"
version = "0.5.13"
version = "0.5.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "33480d6946193aa8033910124896ca395333cae7e2d1113d1fef6c3272217df2"
checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
dependencies = [
"crossbeam-utils",
]
@@ -3117,7 +3117,7 @@ checksum = "e8566979429cf69b49a5c740c60791108e86440e8be149bbea4fe54d2c32d6e2"
[[package]]
name = "datafusion"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -3168,7 +3168,7 @@ dependencies = [
[[package]]
name = "datafusion-catalog"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"async-trait",
@@ -3188,7 +3188,7 @@ dependencies = [
[[package]]
name = "datafusion-catalog-listing"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-schema 54.3.1",
@@ -3211,7 +3211,7 @@ dependencies = [
[[package]]
name = "datafusion-common"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3236,7 +3236,7 @@ dependencies = [
[[package]]
name = "datafusion-common-runtime"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"log",
"tokio",
@@ -3245,12 +3245,12 @@ dependencies = [
[[package]]
name = "datafusion-doc"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
[[package]]
name = "datafusion-execution"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"dashmap",
@@ -3268,7 +3268,7 @@ dependencies = [
[[package]]
name = "datafusion-expr"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"chrono",
@@ -3288,7 +3288,7 @@ dependencies = [
[[package]]
name = "datafusion-expr-common"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"datafusion-common",
@@ -3299,7 +3299,7 @@ dependencies = [
[[package]]
name = "datafusion-functions"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-buffer 54.3.1",
@@ -3328,7 +3328,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-aggregate"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3349,7 +3349,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-aggregate-common"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3361,7 +3361,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-nested"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -3383,7 +3383,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-table"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"async-trait",
@@ -3398,7 +3398,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-window"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"datafusion-common",
"datafusion-doc",
@@ -3414,7 +3414,7 @@ dependencies = [
[[package]]
name = "datafusion-functions-window-common"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"datafusion-common",
"datafusion-physical-expr-common",
@@ -3423,7 +3423,7 @@ dependencies = [
[[package]]
name = "datafusion-macros"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"datafusion-expr",
"quote",
@@ -3433,7 +3433,7 @@ dependencies = [
[[package]]
name = "datafusion-optimizer"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"chrono",
@@ -3451,7 +3451,7 @@ dependencies = [
[[package]]
name = "datafusion-physical-expr"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3474,7 +3474,7 @@ dependencies = [
[[package]]
name = "datafusion-physical-expr-common"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3487,7 +3487,7 @@ dependencies = [
[[package]]
name = "datafusion-physical-optimizer"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-schema 54.3.1",
@@ -3508,7 +3508,7 @@ dependencies = [
[[package]]
name = "datafusion-physical-plan"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"ahash 0.8.11",
"arrow 54.2.1",
@@ -3538,7 +3538,7 @@ dependencies = [
[[package]]
name = "datafusion-sql"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -3556,7 +3556,7 @@ dependencies = [
[[package]]
name = "datafusion-substrait"
version = "45.0.0"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=5bbedc6704162afb03478f56ffb629405a4e1220#5bbedc6704162afb03478f56ffb629405a4e1220"
source = "git+https://github.com/waynexia/arrow-datafusion.git?rev=e104c7cf62b11dd5fe41461b82514978234326b4#e104c7cf62b11dd5fe41461b82514978234326b4"
dependencies = [
"async-recursion",
"async-trait",
@@ -3572,7 +3572,7 @@ dependencies = [
[[package]]
name = "datanode"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arrow-flight",
@@ -3624,7 +3624,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tokio",
"toml 0.8.19",
@@ -3633,7 +3633,7 @@ dependencies = [
[[package]]
name = "datatypes"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arrow 54.2.1",
"arrow-array 54.2.1",
@@ -3656,7 +3656,7 @@ dependencies = [
"serde",
"serde_json",
"snafu 0.8.5",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"sqlparser_derive 0.1.1",
]
@@ -4259,7 +4259,7 @@ dependencies = [
[[package]]
name = "file-engine"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -4382,7 +4382,7 @@ checksum = "8bf7cc16383c4b8d58b9905a8509f02926ce3058053c056376248d958c9df1e8"
[[package]]
name = "flow"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arrow 54.2.1",
@@ -4436,6 +4436,7 @@ dependencies = [
"prometheus",
"prost 0.13.5",
"query",
"rand 0.9.0",
"serde",
"serde_json",
"servers",
@@ -4444,7 +4445,7 @@ dependencies = [
"snafu 0.8.5",
"store-api",
"strum 0.27.1",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tokio",
"tonic 0.12.3",
@@ -4499,7 +4500,7 @@ checksum = "6c2141d6d6c8512188a7891b4b01590a45f6dac67afb4f255c4124dbb86d4eaa"
[[package]]
name = "frontend"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arc-swap",
@@ -4553,10 +4554,10 @@ dependencies = [
"session",
"snafu 0.8.5",
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"strfmt",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tokio",
"toml 0.8.19",
@@ -4944,7 +4945,7 @@ dependencies = [
[[package]]
name = "greptime-proto"
version = "0.1.0"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=e82b0158cd38d4021edb4e4c0ae77f999051e62f#e82b0158cd38d4021edb4e4c0ae77f999051e62f"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=4d4136692fe7fbbd509ebc8c902f6afcc0ce61e4#4d4136692fe7fbbd509ebc8c902f6afcc0ce61e4"
dependencies = [
"prost 0.13.5",
"serde",
@@ -5795,7 +5796,7 @@ dependencies = [
[[package]]
name = "index"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"asynchronous-codec",
@@ -6509,7 +6510,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4979f22fdb869068da03c9f7528f8297c6fd2606bc3a4affe42e6a823fdb8da4"
dependencies = [
"cfg-if",
"windows-targets 0.52.6",
"windows-targets 0.48.5",
]
[[package]]
@@ -6605,7 +6606,7 @@ checksum = "a7a70ba024b9dc04c27ea2f0c0548feb474ec5c54bba33a7f72f873a39d07b24"
[[package]]
name = "log-query"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"chrono",
"common-error",
@@ -6617,7 +6618,7 @@ dependencies = [
[[package]]
name = "log-store"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-stream",
"async-trait",
@@ -6911,7 +6912,7 @@ dependencies = [
[[package]]
name = "meta-client"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -6939,7 +6940,7 @@ dependencies = [
[[package]]
name = "meta-srv"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -7029,7 +7030,7 @@ dependencies = [
[[package]]
name = "metric-engine"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"aquamarine",
@@ -7118,7 +7119,7 @@ dependencies = [
[[package]]
name = "mito2"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"aquamarine",
@@ -7824,7 +7825,7 @@ dependencies = [
[[package]]
name = "object-store"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"anyhow",
"bytes",
@@ -8119,7 +8120,7 @@ dependencies = [
[[package]]
name = "operator"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -8166,9 +8167,9 @@ dependencies = [
"session",
"snafu 0.8.5",
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tokio",
"tokio-util",
@@ -8423,7 +8424,7 @@ dependencies = [
[[package]]
name = "partition"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -8443,7 +8444,7 @@ dependencies = [
"session",
"snafu 0.8.5",
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"store-api",
"table",
]
@@ -8705,7 +8706,7 @@ checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]]
name = "pipeline"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -8847,7 +8848,7 @@ dependencies = [
[[package]]
name = "plugins"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"auth",
"clap 4.5.19",
@@ -9127,7 +9128,7 @@ dependencies = [
[[package]]
name = "promql"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"async-trait",
@@ -9152,8 +9153,7 @@ dependencies = [
[[package]]
name = "promql-parser"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "60d851f6523a8215e2fbf86b6cef4548433f8b76092e9ffb607105de52ae63fd"
source = "git+https://github.com/GreptimeTeam/promql-parser.git?rev=0410e8b459dda7cb222ce9596f8bf3971bd07bd2#0410e8b459dda7cb222ce9596f8bf3971bd07bd2"
dependencies = [
"cfgrammar",
"chrono",
@@ -9163,6 +9163,7 @@ dependencies = [
"regex",
"serde",
"serde_json",
"unescaper",
]
[[package]]
@@ -9373,7 +9374,7 @@ dependencies = [
[[package]]
name = "puffin"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-compression 0.4.13",
"async-trait",
@@ -9414,7 +9415,7 @@ dependencies = [
[[package]]
name = "query"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -9477,10 +9478,10 @@ dependencies = [
"session",
"snafu 0.8.5",
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"statrs",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tokio",
"tokio-stream",
@@ -10005,15 +10006,14 @@ dependencies = [
[[package]]
name = "ring"
version = "0.17.8"
version = "0.17.14"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c17fa4cb658e3583423e915b9f3acc01cceaee1860e33d59ebae66adc3a2dc0d"
checksum = "a4689e6c2294d81e88dc6261c768b63bc4fcdb852be6d1352498b114f61383b7"
dependencies = [
"cc",
"cfg-if",
"getrandom 0.2.15",
"libc",
"spin",
"untrusted",
"windows-sys 0.52.0",
]
@@ -10831,7 +10831,7 @@ dependencies = [
[[package]]
name = "servers"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"ahash 0.8.11",
"api",
@@ -10951,7 +10951,7 @@ dependencies = [
[[package]]
name = "session"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arc-swap",
@@ -11276,7 +11276,7 @@ dependencies = [
[[package]]
name = "sql"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"chrono",
@@ -11304,7 +11304,7 @@ dependencies = [
"serde",
"serde_json",
"snafu 0.8.5",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"sqlparser_derive 0.1.1",
"store-api",
"table",
@@ -11331,7 +11331,7 @@ dependencies = [
[[package]]
name = "sqlness-runner"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"clap 4.5.19",
@@ -11373,7 +11373,7 @@ dependencies = [
[[package]]
name = "sqlparser"
version = "0.54.0"
source = "git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089#e98e6b322426a9d397a71efef17075966223c089"
source = "git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e#0cf6c04490d59435ee965edd2078e8855bd8471e"
dependencies = [
"lazy_static",
"log",
@@ -11381,7 +11381,7 @@ dependencies = [
"regex",
"serde",
"sqlparser 0.54.0 (registry+https://github.com/rust-lang/crates.io-index)",
"sqlparser_derive 0.3.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser_derive 0.3.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
]
[[package]]
@@ -11409,7 +11409,7 @@ dependencies = [
[[package]]
name = "sqlparser_derive"
version = "0.3.0"
source = "git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089#e98e6b322426a9d397a71efef17075966223c089"
source = "git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e#0cf6c04490d59435ee965edd2078e8855bd8471e"
dependencies = [
"proc-macro2",
"quote",
@@ -11650,7 +11650,7 @@ dependencies = [
[[package]]
name = "store-api"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"aquamarine",
@@ -11658,6 +11658,7 @@ dependencies = [
"async-trait",
"common-base",
"common-error",
"common-grpc",
"common-macro",
"common-meta",
"common-recordbatch",
@@ -11799,7 +11800,7 @@ dependencies = [
[[package]]
name = "substrait"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"async-trait",
"bytes",
@@ -11979,7 +11980,7 @@ dependencies = [
[[package]]
name = "table"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"async-trait",
@@ -12230,7 +12231,7 @@ checksum = "3369f5ac52d5eb6ab48c6b4ffdc8efbcad6b89c765749064ba298f2c68a16a76"
[[package]]
name = "tests-fuzz"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"arbitrary",
"async-trait",
@@ -12264,7 +12265,7 @@ dependencies = [
"serde_yaml",
"snafu 0.8.5",
"sql",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=e98e6b322426a9d397a71efef17075966223c089)",
"sqlparser 0.54.0 (git+https://github.com/GreptimeTeam/sqlparser-rs.git?rev=0cf6c04490d59435ee965edd2078e8855bd8471e)",
"sqlx",
"store-api",
"strum 0.27.1",
@@ -12274,7 +12275,7 @@ dependencies = [
[[package]]
name = "tests-integration"
version = "0.14.0"
version = "0.14.3"
dependencies = [
"api",
"arrow-flight",
@@ -12341,7 +12342,7 @@ dependencies = [
"sql",
"sqlx",
"store-api",
"substrait 0.14.0",
"substrait 0.14.3",
"table",
"tempfile",
"time",


@@ -68,7 +68,7 @@ members = [
resolver = "2"
[workspace.package]
version = "0.14.0"
version = "0.14.3"
edition = "2021"
license = "Apache-2.0"
@@ -112,15 +112,15 @@ clap = { version = "4.4", features = ["derive"] }
config = "0.13.0"
crossbeam-utils = "0.8"
dashmap = "6.1"
datafusion = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-common = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-functions = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-optimizer = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-physical-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-physical-plan = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-sql = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion-substrait = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "5bbedc6704162afb03478f56ffb629405a4e1220" }
datafusion = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-common = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-functions = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-optimizer = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-physical-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-physical-plan = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-sql = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
datafusion-substrait = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "e104c7cf62b11dd5fe41461b82514978234326b4" }
deadpool = "0.12"
deadpool-postgres = "0.14"
derive_builder = "0.20"
@@ -129,7 +129,7 @@ etcd-client = "0.14"
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "e82b0158cd38d4021edb4e4c0ae77f999051e62f" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "4d4136692fe7fbbd509ebc8c902f6afcc0ce61e4" }
hex = "0.4"
http = "1"
humantime = "2.1"
@@ -161,8 +161,10 @@ parquet = { version = "54.2", default-features = false, features = ["arrow", "as
paste = "1.0"
pin-project = "1.0"
prometheus = { version = "0.13.3", features = ["process"] }
promql-parser = { version = "0.5.1", features = ["ser"] }
prost = "0.13"
promql-parser = { git = "https://github.com/GreptimeTeam/promql-parser.git", rev = "0410e8b459dda7cb222ce9596f8bf3971bd07bd2", features = [
"ser",
] }
prost = { version = "0.13", features = ["no-recursion-limit"] }
raft-engine = { version = "0.4.1", default-features = false }
rand = "0.9"
ratelimit = "0.10"
@@ -191,7 +193,7 @@ simd-json = "0.15"
similar-asserts = "1.6.0"
smallvec = { version = "1", features = ["serde"] }
snafu = "0.8"
sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "e98e6b322426a9d397a71efef17075966223c089", features = [
sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "0cf6c04490d59435ee965edd2078e8855bd8471e", features = [
"visitor",
"serde",
] } # branch = "v0.54.x"


@@ -36,8 +36,8 @@ use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_query::Output;
use common_recordbatch::error::ExternalSnafu;
use common_recordbatch::RecordBatchStreamWrapper;
use common_telemetry::error;
use common_telemetry::tracing_context::W3cTrace;
use common_telemetry::{error, warn};
use futures::future;
use futures_util::{Stream, StreamExt, TryStreamExt};
use prost::Message;
@@ -192,6 +192,36 @@ impl Database {
from_grpc_response(response)
}
/// Retry if connection fails, max_retries is the max number of retries, so the total wait time
/// is `max_retries * GRPC_CONN_TIMEOUT`
pub async fn handle_with_retry(&self, request: Request, max_retries: u32) -> Result<u32> {
let mut client = make_database_client(&self.client)?.inner;
let mut retries = 0;
let request = self.to_rpc_request(request);
loop {
let raw_response = client.handle(request.clone()).await;
match (raw_response, retries < max_retries) {
(Ok(resp), _) => return from_grpc_response(resp.into_inner()),
(Err(err), true) => {
// determine if the error is retryable
if is_grpc_retryable(&err) {
// retry
retries += 1;
warn!("Retrying {} times with error = {:?}", retries, err);
continue;
}
}
(Err(err), false) => {
error!(
"Failed to send request to grpc handle after {} retries, error = {:?}",
retries, err
);
return Err(err.into());
}
}
}
}
#[inline]
fn to_rpc_request(&self, request: Request) -> GreptimeRequest {
GreptimeRequest {
@@ -368,6 +398,11 @@ impl Database {
}
}
/// by grpc standard, only `Unavailable` is retryable, see: https://github.com/grpc/grpc/blob/master/doc/statuscodes.md#status-codes-and-their-use-in-grpc
pub fn is_grpc_retryable(err: &tonic::Status) -> bool {
matches!(err.code(), tonic::Code::Unavailable)
}
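
For reference, handle_with_retry above follows a generic pattern: retry only while the budget remains and the error is classified as retryable, which per the linked gRPC guidance means only Unavailable. An illustrative standalone helper, not part of the client crate:

use std::future::Future;

pub async fn retry_async<T, E, F, Fut>(
    mut op: F,
    is_retryable: impl Fn(&E) -> bool,
    max_retries: u32,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
{
    let mut retries = 0;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            // Transient errors are retried with a bounded budget.
            Err(err) if retries < max_retries && is_retryable(&err) => retries += 1,
            Err(err) => return Err(err),
        }
    }
}
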
#[derive(Default, Debug, Clone)]
struct FlightContext {
auth_header: Option<AuthHeader>,


@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::time::Duration;
use async_trait::async_trait;
@@ -131,7 +132,7 @@ impl SubCommand {
}
}
#[derive(Debug, Default, Parser)]
#[derive(Default, Parser)]
pub struct StartCommand {
/// The address to bind the gRPC server.
#[clap(long, alias = "bind-addr")]
@@ -171,6 +172,27 @@ pub struct StartCommand {
backend: Option<BackendImpl>,
}
impl fmt::Debug for StartCommand {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("StartCommand")
.field("rpc_bind_addr", &self.rpc_bind_addr)
.field("rpc_server_addr", &self.rpc_server_addr)
.field("store_addrs", &self.sanitize_store_addrs())
.field("config_file", &self.config_file)
.field("selector", &self.selector)
.field("use_memory_store", &self.use_memory_store)
.field("enable_region_failover", &self.enable_region_failover)
.field("http_addr", &self.http_addr)
.field("http_timeout", &self.http_timeout)
.field("env_prefix", &self.env_prefix)
.field("data_home", &self.data_home)
.field("store_key_prefix", &self.store_key_prefix)
.field("max_txn_ops", &self.max_txn_ops)
.field("backend", &self.backend)
.finish()
}
}
impl StartCommand {
pub fn load_options(&self, global_options: &GlobalOptions) -> Result<MetasrvOptions> {
let mut opts = MetasrvOptions::load_layered_options(
@@ -184,6 +206,15 @@ impl StartCommand {
Ok(opts)
}
fn sanitize_store_addrs(&self) -> Option<Vec<String>> {
self.store_addrs.as_ref().map(|addrs| {
addrs
.iter()
.map(|addr| common_meta::kv_backend::util::sanitize_connection_string(addr))
.collect()
})
}
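
The sanitized addresses above come from sanitize_connection_string. An illustrative sketch of what such sanitization generally means, masking credentials like user:password@ before an address is logged or shown in Debug output; the actual common_meta helper may behave differently:

// Hypothetical stand-in for common_meta::kv_backend::util::sanitize_connection_string.
fn sanitize_connection_string(addr: &str) -> String {
    match (addr.find("://"), addr.find('@')) {
        // Mask everything between "scheme://" and '@' (the user:password part).
        (Some(scheme_end), Some(at)) if at > scheme_end => {
            let credentials_start = scheme_end + 3;
            format!("{}******{}", &addr[..credentials_start], &addr[at..])
        }
        // No embedded credentials detected; return the address unchanged.
        _ => addr.to_string(),
    }
}
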
// The precedence order is: cli > config file > environment variables > default values.
fn merge_with_cli_options(
&self,


@@ -13,10 +13,8 @@
// limitations under the License.
use std::sync::Arc;
mod greatest;
mod to_unixtime;
use greatest::GreatestFunction;
use to_unixtime::ToUnixtimeFunction;
use crate::function_registry::FunctionRegistry;
@@ -26,6 +24,5 @@ pub(crate) struct TimestampFunction;
impl TimestampFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register(Arc::new(ToUnixtimeFunction));
registry.register(Arc::new(GreatestFunction));
}
}


@@ -1,328 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::{self};
use common_query::error::{
self, ArrowComputeSnafu, InvalidFuncArgsSnafu, Result, UnsupportedInputDataTypeSnafu,
};
use common_query::prelude::{Signature, Volatility};
use datafusion::arrow::compute::kernels::cmp::gt;
use datatypes::arrow::array::AsArray;
use datatypes::arrow::compute::cast;
use datatypes::arrow::compute::kernels::zip;
use datatypes::arrow::datatypes::{
DataType as ArrowDataType, Date32Type, TimeUnit, TimestampMicrosecondType,
TimestampMillisecondType, TimestampNanosecondType, TimestampSecondType,
};
use datatypes::prelude::ConcreteDataType;
use datatypes::types::TimestampType;
use datatypes::vectors::{Helper, VectorRef};
use snafu::{ensure, ResultExt};
use crate::function::{Function, FunctionContext};
#[derive(Clone, Debug, Default)]
pub struct GreatestFunction;
const NAME: &str = "greatest";
macro_rules! gt_time_types {
($ty: ident, $columns:expr) => {{
let column1 = $columns[0].to_arrow_array();
let column2 = $columns[1].to_arrow_array();
let column1 = column1.as_primitive::<$ty>();
let column2 = column2.as_primitive::<$ty>();
let boolean_array = gt(&column1, &column2).context(ArrowComputeSnafu)?;
let result = zip::zip(&boolean_array, &column1, &column2).context(ArrowComputeSnafu)?;
Helper::try_into_vector(&result).context(error::FromArrowArraySnafu)
}};
}
impl Function for GreatestFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
ensure!(
input_types.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly two, have: {}",
input_types.len()
)
}
);
match &input_types[0] {
ConcreteDataType::String(_) => Ok(ConcreteDataType::timestamp_millisecond_datatype()),
ConcreteDataType::Date(_) => Ok(ConcreteDataType::date_datatype()),
ConcreteDataType::Timestamp(ts_type) => Ok(ConcreteDataType::Timestamp(*ts_type)),
_ => UnsupportedInputDataTypeSnafu {
function: NAME,
datatypes: input_types,
}
.fail(),
}
}
fn signature(&self) -> Signature {
Signature::uniform(
2,
vec![
ConcreteDataType::string_datatype(),
ConcreteDataType::date_datatype(),
ConcreteDataType::timestamp_nanosecond_datatype(),
ConcreteDataType::timestamp_microsecond_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
ConcreteDataType::timestamp_second_datatype(),
],
Volatility::Immutable,
)
}
fn eval(&self, _func_ctx: &FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly two, have: {}",
columns.len()
),
}
);
match columns[0].data_type() {
ConcreteDataType::String(_) => {
let column1 = cast(
&columns[0].to_arrow_array(),
&ArrowDataType::Timestamp(TimeUnit::Millisecond, None),
)
.context(ArrowComputeSnafu)?;
let column1 = column1.as_primitive::<TimestampMillisecondType>();
let column2 = cast(
&columns[1].to_arrow_array(),
&ArrowDataType::Timestamp(TimeUnit::Millisecond, None),
)
.context(ArrowComputeSnafu)?;
let column2 = column2.as_primitive::<TimestampMillisecondType>();
let boolean_array = gt(&column1, &column2).context(ArrowComputeSnafu)?;
let result =
zip::zip(&boolean_array, &column1, &column2).context(ArrowComputeSnafu)?;
Ok(Helper::try_into_vector(&result).context(error::FromArrowArraySnafu)?)
}
ConcreteDataType::Date(_) => gt_time_types!(Date32Type, columns),
ConcreteDataType::Timestamp(ts_type) => match ts_type {
TimestampType::Second(_) => gt_time_types!(TimestampSecondType, columns),
TimestampType::Millisecond(_) => {
gt_time_types!(TimestampMillisecondType, columns)
}
TimestampType::Microsecond(_) => {
gt_time_types!(TimestampMicrosecondType, columns)
}
TimestampType::Nanosecond(_) => {
gt_time_types!(TimestampNanosecondType, columns)
}
},
_ => UnsupportedInputDataTypeSnafu {
function: NAME,
datatypes: columns.iter().map(|c| c.data_type()).collect::<Vec<_>>(),
}
.fail(),
}
}
}
impl fmt::Display for GreatestFunction {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "GREATEST")
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use common_time::timestamp::TimeUnit;
use common_time::{Date, Timestamp};
use datatypes::types::{
DateType, TimestampMicrosecondType, TimestampMillisecondType, TimestampNanosecondType,
TimestampSecondType,
};
use datatypes::value::Value;
use datatypes::vectors::{
DateVector, StringVector, TimestampMicrosecondVector, TimestampMillisecondVector,
TimestampNanosecondVector, TimestampSecondVector, Vector,
};
use paste::paste;
use super::*;
#[test]
fn test_greatest_takes_string_vector() {
let function = GreatestFunction;
assert_eq!(
function
.return_type(&[
ConcreteDataType::string_datatype(),
ConcreteDataType::string_datatype()
])
.unwrap(),
ConcreteDataType::timestamp_millisecond_datatype()
);
let columns = vec![
Arc::new(StringVector::from(vec![
"1970-01-01".to_string(),
"2012-12-23".to_string(),
])) as _,
Arc::new(StringVector::from(vec![
"2001-02-01".to_string(),
"1999-01-01".to_string(),
])) as _,
];
let result = function
.eval(&FunctionContext::default(), &columns)
.unwrap();
let result = result
.as_any()
.downcast_ref::<TimestampMillisecondVector>()
.unwrap();
assert_eq!(result.len(), 2);
assert_eq!(
result.get(0),
Value::Timestamp(Timestamp::from_str("2001-02-01 00:00:00", None).unwrap())
);
assert_eq!(
result.get(1),
Value::Timestamp(Timestamp::from_str("2012-12-23 00:00:00", None).unwrap())
);
}
#[test]
fn test_greatest_takes_date_vector() {
let function = GreatestFunction;
assert_eq!(
function
.return_type(&[
ConcreteDataType::date_datatype(),
ConcreteDataType::date_datatype()
])
.unwrap(),
ConcreteDataType::Date(DateType)
);
let columns = vec![
Arc::new(DateVector::from_slice(vec![-1, 2])) as _,
Arc::new(DateVector::from_slice(vec![0, 1])) as _,
];
let result = function
.eval(&FunctionContext::default(), &columns)
.unwrap();
let result = result.as_any().downcast_ref::<DateVector>().unwrap();
assert_eq!(result.len(), 2);
assert_eq!(
result.get(0),
Value::Date(Date::from_str_utc("1970-01-01").unwrap())
);
assert_eq!(
result.get(1),
Value::Date(Date::from_str_utc("1970-01-03").unwrap())
);
}
#[test]
fn test_greatest_takes_datetime_vector() {
let function = GreatestFunction;
assert_eq!(
function
.return_type(&[
ConcreteDataType::timestamp_millisecond_datatype(),
ConcreteDataType::timestamp_millisecond_datatype()
])
.unwrap(),
ConcreteDataType::timestamp_millisecond_datatype()
);
let columns = vec![
Arc::new(TimestampMillisecondVector::from_slice(vec![-1, 2])) as _,
Arc::new(TimestampMillisecondVector::from_slice(vec![0, 1])) as _,
];
let result = function
.eval(&FunctionContext::default(), &columns)
.unwrap();
let result = result
.as_any()
.downcast_ref::<TimestampMillisecondVector>()
.unwrap();
assert_eq!(result.len(), 2);
assert_eq!(
result.get(0),
Value::Timestamp(Timestamp::from_str("1970-01-01 00:00:00", None).unwrap())
);
assert_eq!(
result.get(1),
Value::Timestamp(Timestamp::from_str("1970-01-01 00:00:00.002", None).unwrap())
);
}
macro_rules! test_timestamp {
($type: expr,$unit: ident) => {
paste! {
#[test]
fn [<test_greatest_takes_ $unit:lower _vector>]() {
let function = GreatestFunction;
assert_eq!(
function.return_type(&[$type, $type]).unwrap(),
ConcreteDataType::Timestamp(TimestampType::$unit([<Timestamp $unit Type>]))
);
let columns = vec![
Arc::new([<Timestamp $unit Vector>]::from_slice(vec![-1, 2])) as _,
Arc::new([<Timestamp $unit Vector>]::from_slice(vec![0, 1])) as _,
];
let result = function.eval(&FunctionContext::default(), &columns).unwrap();
let result = result.as_any().downcast_ref::<[<Timestamp $unit Vector>]>().unwrap();
assert_eq!(result.len(), 2);
assert_eq!(
result.get(0),
Value::Timestamp(Timestamp::new(0, TimeUnit::$unit))
);
assert_eq!(
result.get(1),
Value::Timestamp(Timestamp::new(2, TimeUnit::$unit))
);
}
}
}
}
test_timestamp!(
ConcreteDataType::timestamp_nanosecond_datatype(),
Nanosecond
);
test_timestamp!(
ConcreteDataType::timestamp_microsecond_datatype(),
Microsecond
);
test_timestamp!(
ConcreteDataType::timestamp_millisecond_datatype(),
Millisecond
);
test_timestamp!(ConcreteDataType::timestamp_second_datatype(), Second);
}
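An aside, not part of the file above: the `gt_time_types!` macro that `eval` dispatches to is defined outside this view, so the expansion below is only a guess at its shape for a primitive date/timestamp type. It mirrors the string branch of `eval` minus the cast step, and every identifier it uses already appears in the file; treat it as a sketch, not the actual macro.

macro_rules! gt_time_types {
    ($arrow_type:ty, $columns:expr) => {{
        // Borrow both inputs as primitive arrays of the requested arrow type.
        let column1 = $columns[0].to_arrow_array();
        let column1 = column1.as_primitive::<$arrow_type>();
        let column2 = $columns[1].to_arrow_array();
        let column2 = column2.as_primitive::<$arrow_type>();
        // Element-wise comparison, then keep the larger value per row.
        let boolean_array = gt(&column1, &column2).context(ArrowComputeSnafu)?;
        let result = zip::zip(&boolean_array, &column1, &column2).context(ArrowComputeSnafu)?;
        Ok(Helper::try_into_vector(&result).context(error::FromArrowArraySnafu)?)
    }};
}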

View File

@@ -115,6 +115,13 @@ impl Function for UddSketchCalcFunction {
}
};
// Check if the sketch is empty; if so, return null
// This is important to avoid panics when calling estimate_quantile on an empty sketch
// In practice, this happens when the input is all null
if sketch.bucket_iter().count() == 0 {
builder.push_null();
continue;
}
// Compute the estimated quantile from the sketch
let result = sketch.estimate_quantile(perc);
builder.push(Some(result));

View File

@@ -18,4 +18,5 @@ pub mod flight;
pub mod precision;
pub mod select;
pub use arrow_flight::FlightData;
pub use error::Error;

View File

@@ -24,21 +24,39 @@ use crate::cache::{CacheContainer, Initializer};
use crate::error::Result;
use crate::instruction::{CacheIdent, CreateFlow, DropFlow};
use crate::key::flow::{TableFlowManager, TableFlowManagerRef};
use crate::key::{FlowId, FlowPartitionId};
use crate::kv_backend::KvBackendRef;
use crate::peer::Peer;
use crate::FlownodeId;
type FlownodeSet = Arc<HashMap<FlownodeId, Peer>>;
/// Flow id and flow partition id, used together as a key.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FlowIdent {
pub flow_id: FlowId,
pub partition_id: FlowPartitionId,
}
impl FlowIdent {
pub fn new(flow_id: FlowId, partition_id: FlowPartitionId) -> Self {
Self {
flow_id,
partition_id,
}
}
}
/// Cache for [TableFlowManager]; the table_id part is the key of the outer cache.
/// Maps flow_id and partition_id (as [FlowIdent]) to the flownode [Peer].
type FlownodeFlowSet = Arc<HashMap<FlowIdent, Peer>>;
pub type TableFlownodeSetCacheRef = Arc<TableFlownodeSetCache>;
/// [TableFlownodeSetCache] caches the [TableId] to [FlownodeSet] mapping.
pub type TableFlownodeSetCache = CacheContainer<TableId, FlownodeSet, CacheIdent>;
pub type TableFlownodeSetCache = CacheContainer<TableId, FlownodeFlowSet, CacheIdent>;
/// Constructs a [TableFlownodeSetCache].
pub fn new_table_flownode_set_cache(
name: String,
cache: Cache<TableId, FlownodeSet>,
cache: Cache<TableId, FlownodeFlowSet>,
kv_backend: KvBackendRef,
) -> TableFlownodeSetCache {
let table_flow_manager = Arc::new(TableFlowManager::new(kv_backend));
@@ -47,7 +65,7 @@ pub fn new_table_flownode_set_cache(
CacheContainer::new(name, cache, Box::new(invalidator), init, filter)
}
fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId, FlownodeSet> {
fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId, FlownodeFlowSet> {
Arc::new(move |&table_id| {
let table_flow_manager = table_flow_manager.clone();
Box::pin(async move {
@@ -57,7 +75,12 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
.map(|flows| {
flows
.into_iter()
.map(|(key, value)| (key.flownode_id(), value.peer))
.map(|(key, value)| {
(
FlowIdent::new(key.flow_id(), key.partition_id()),
value.peer,
)
})
.collect::<HashMap<_, _>>()
})
// We must cache the `HashSet` even if it's empty,
@@ -71,26 +94,33 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
}
async fn handle_create_flow(
cache: &Cache<TableId, FlownodeSet>,
cache: &Cache<TableId, FlownodeFlowSet>,
CreateFlow {
flow_id,
source_table_ids,
flownodes: flownode_peers,
partition_to_peer_mapping: flow_part2nodes,
}: &CreateFlow,
) {
for table_id in source_table_ids {
let entry = cache.entry(*table_id);
entry
.and_compute_with(
async |entry: Option<moka::Entry<u32, Arc<HashMap<u64, _>>>>| match entry {
async |entry: Option<moka::Entry<u32, FlownodeFlowSet>>| match entry {
Some(entry) => {
let mut map = entry.into_value().as_ref().clone();
map.extend(flownode_peers.iter().map(|peer| (peer.id, peer.clone())));
map.extend(
flow_part2nodes.iter().map(|(part, peer)| {
(FlowIdent::new(*flow_id, *part), peer.clone())
}),
);
Op::Put(Arc::new(map))
}
None => Op::Put(Arc::new(HashMap::from_iter(
flownode_peers.iter().map(|peer| (peer.id, peer.clone())),
))),
None => {
Op::Put(Arc::new(HashMap::from_iter(flow_part2nodes.iter().map(
|(part, peer)| (FlowIdent::new(*flow_id, *part), peer.clone()),
))))
}
},
)
.await;
@@ -98,21 +128,23 @@ async fn handle_create_flow(
}
async fn handle_drop_flow(
cache: &Cache<TableId, FlownodeSet>,
cache: &Cache<TableId, FlownodeFlowSet>,
DropFlow {
flow_id,
source_table_ids,
flownode_ids,
flow_part2node_id,
}: &DropFlow,
) {
for table_id in source_table_ids {
let entry = cache.entry(*table_id);
entry
.and_compute_with(
async |entry: Option<moka::Entry<u32, Arc<HashMap<u64, _>>>>| match entry {
async |entry: Option<moka::Entry<u32, FlownodeFlowSet>>| match entry {
Some(entry) => {
let mut set = entry.into_value().as_ref().clone();
for flownode_id in flownode_ids {
set.remove(flownode_id);
for (part, _node) in flow_part2node_id {
let key = FlowIdent::new(*flow_id, *part);
set.remove(&key);
}
Op::Put(Arc::new(set))
@@ -128,7 +160,7 @@ async fn handle_drop_flow(
}
fn invalidator<'a>(
cache: &'a Cache<TableId, FlownodeSet>,
cache: &'a Cache<TableId, FlownodeFlowSet>,
ident: &'a CacheIdent,
) -> BoxFuture<'a, Result<()>> {
Box::pin(async move {
@@ -154,7 +186,7 @@ mod tests {
use moka::future::CacheBuilder;
use table::table_name::TableName;
use crate::cache::flow::table_flownode::new_table_flownode_set_cache;
use crate::cache::flow::table_flownode::{new_table_flownode_set_cache, FlowIdent};
use crate::instruction::{CacheIdent, CreateFlow, DropFlow};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::flow_route::FlowRouteValue;
@@ -214,12 +246,16 @@ mod tests {
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((1..=3).map(|i| { (i, Peer::empty(i),) }))
HashMap::from_iter(
(1..=3).map(|i| { (FlowIdent::new(1024, (i - 1) as u32), Peer::empty(i),) })
)
);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((1..=3).map(|i| { (i, Peer::empty(i),) }))
HashMap::from_iter(
(1..=3).map(|i| { (FlowIdent::new(1024, (i - 1) as u32), Peer::empty(i),) })
)
);
let result = cache.get(1026).await.unwrap().unwrap();
assert_eq!(result.len(), 0);
@@ -231,8 +267,9 @@ mod tests {
let cache = CacheBuilder::new(128).build();
let cache = new_table_flownode_set_cache("test".to_string(), cache, mem_kv);
let ident = vec![CacheIdent::CreateFlow(CreateFlow {
flow_id: 2001,
source_table_ids: vec![1024, 1025],
flownodes: (1..=5).map(Peer::empty).collect(),
partition_to_peer_mapping: (1..=5).map(|i| (i as u32, Peer::empty(i + 1))).collect(),
})];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
@@ -241,6 +278,54 @@ mod tests {
assert_eq!(set.len(), 5);
}
#[tokio::test]
async fn test_replace_flow() {
let mem_kv = Arc::new(MemoryKvBackend::default());
let cache = CacheBuilder::new(128).build();
let cache = new_table_flownode_set_cache("test".to_string(), cache, mem_kv);
let ident = vec![CacheIdent::CreateFlow(CreateFlow {
flow_id: 2001,
source_table_ids: vec![1024, 1025],
partition_to_peer_mapping: (1..=5).map(|i| (i as u32, Peer::empty(i + 1))).collect(),
})];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(set.len(), 5);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(set.len(), 5);
let drop_then_create_flow = vec![
CacheIdent::DropFlow(DropFlow {
flow_id: 2001,
source_table_ids: vec![1024, 1025],
flow_part2node_id: (1..=5).map(|i| (i as u32, i + 1)).collect(),
}),
CacheIdent::CreateFlow(CreateFlow {
flow_id: 2001,
source_table_ids: vec![1026, 1027],
partition_to_peer_mapping: (11..=15)
.map(|i| (i as u32, Peer::empty(i + 1)))
.collect(),
}),
CacheIdent::FlowId(2001),
];
cache.invalidate(&drop_then_create_flow).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
assert!(set.is_empty());
let expected = HashMap::from_iter(
(11..=15).map(|i| (FlowIdent::new(2001, i as u32), Peer::empty(i + 1))),
);
let set = cache.get(1026).await.unwrap().unwrap();
assert_eq!(set.as_ref().clone(), expected);
let set = cache.get(1027).await.unwrap().unwrap();
assert_eq!(set.as_ref().clone(), expected);
}
#[tokio::test]
async fn test_drop_flow() {
let mem_kv = Arc::new(MemoryKvBackend::default());
@@ -248,34 +333,57 @@ mod tests {
let cache = new_table_flownode_set_cache("test".to_string(), cache, mem_kv);
let ident = vec![
CacheIdent::CreateFlow(CreateFlow {
flow_id: 2001,
source_table_ids: vec![1024, 1025],
flownodes: (1..=5).map(Peer::empty).collect(),
partition_to_peer_mapping: (1..=5)
.map(|i| (i as u32, Peer::empty(i + 1)))
.collect(),
}),
CacheIdent::CreateFlow(CreateFlow {
flow_id: 2002,
source_table_ids: vec![1024, 1025],
flownodes: (11..=12).map(Peer::empty).collect(),
partition_to_peer_mapping: (11..=12)
.map(|i| (i as u32, Peer::empty(i + 1)))
.collect(),
}),
// the same flownodes hold multiple flows
CacheIdent::CreateFlow(CreateFlow {
flow_id: 2003,
source_table_ids: vec![1024, 1025],
partition_to_peer_mapping: (1..=5)
.map(|i| (i as u32, Peer::empty(i + 1)))
.collect(),
}),
];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(set.len(), 7);
assert_eq!(set.len(), 12);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(set.len(), 7);
assert_eq!(set.len(), 12);
let ident = vec![CacheIdent::DropFlow(DropFlow {
flow_id: 2001,
source_table_ids: vec![1024, 1025],
flownode_ids: vec![1, 2, 3, 4, 5],
flow_part2node_id: (1..=5).map(|i| (i as u32, i + 1)).collect(),
})];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((11..=12).map(|i| { (i, Peer::empty(i),) }))
HashMap::from_iter(
(11..=12)
.map(|i| (FlowIdent::new(2002, i as u32), Peer::empty(i + 1)))
.chain((1..=5).map(|i| (FlowIdent::new(2003, i as u32), Peer::empty(i + 1))))
)
);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((11..=12).map(|i| { (i, Peer::empty(i),) }))
HashMap::from_iter(
(11..=12)
.map(|i| (FlowIdent::new(2002, i as u32), Peer::empty(i + 1)))
.chain((1..=5).map(|i| (FlowIdent::new(2003, i as u32), Peer::empty(i + 1))))
)
);
}
}
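For orientation, here is a minimal, illustrative sketch of what a single entry of the reworked cache holds (values loosely follow the tests above): the outer key is a source TableId, and the inner FlownodeFlowSet maps every (flow_id, partition_id) pair of flows reading that table to the Peer hosting it.

// Illustrative only: source table 1024 feeding two partitions of flow 2001
// hosted on flownodes 1 and 2.
let entry: FlownodeFlowSet = Arc::new(HashMap::from_iter([
    (FlowIdent::new(2001, 0), Peer::empty(1)),
    (FlowIdent::new(2001, 1), Peer::empty(2)),
]));
assert_eq!(entry.get(&FlowIdent::new(2001, 1)), Some(&Peer::empty(2)));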

View File

@@ -16,9 +16,12 @@ use std::sync::Arc;
use crate::error::Result;
use crate::flow_name::FlowName;
use crate::instruction::CacheIdent;
use crate::instruction::{CacheIdent, DropFlow};
use crate::key::flow::flow_info::FlowInfoKey;
use crate::key::flow::flow_name::FlowNameKey;
use crate::key::flow::flow_route::FlowRouteKey;
use crate::key::flow::flownode_flow::FlownodeFlowKey;
use crate::key::flow::table_flow::TableFlowKey;
use crate::key::schema_name::SchemaNameKey;
use crate::key::table_info::TableInfoKey;
use crate::key::table_name::TableNameKey;
@@ -89,9 +92,40 @@ where
let key: SchemaNameKey = schema_name.into();
self.invalidate_key(&key.to_bytes()).await;
}
CacheIdent::CreateFlow(_) | CacheIdent::DropFlow(_) => {
CacheIdent::CreateFlow(_) => {
// Do nothing
}
CacheIdent::DropFlow(DropFlow {
flow_id,
source_table_ids,
flow_part2node_id,
}) => {
// invalidate flow route/flownode flow/table flow
let mut keys = Vec::with_capacity(
source_table_ids.len() * flow_part2node_id.len()
+ flow_part2node_id.len() * 2,
);
for table_id in source_table_ids {
for (partition_id, node_id) in flow_part2node_id {
let key =
TableFlowKey::new(*table_id, *node_id, *flow_id, *partition_id)
.to_bytes();
keys.push(key);
}
}
for (partition_id, node_id) in flow_part2node_id {
let key =
FlownodeFlowKey::new(*node_id, *flow_id, *partition_id).to_bytes();
keys.push(key);
let key = FlowRouteKey::new(*flow_id, *partition_id).to_bytes();
keys.push(key);
}
for key in keys {
self.invalidate_key(&key).await;
}
}
CacheIdent::FlowName(FlowName {
catalog_name,
flow_name,

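A worked illustration of the key fan-out in the DropFlow arm above, assuming a flow with two partitions: for flow_id 2001, source_table_ids [1024, 1025] and flow_part2node_id [(0, 1), (1, 2)], the loops collect the following keys to invalidate (constructor argument orders as used above).

// Illustrative only:
//   table flow:    TableFlowKey::new(1024, 1, 2001, 0), TableFlowKey::new(1024, 2, 2001, 1),
//                  TableFlowKey::new(1025, 1, 2001, 0), TableFlowKey::new(1025, 2, 2001, 1)
//   flownode flow: FlownodeFlowKey::new(1, 2001, 0), FlownodeFlowKey::new(2, 2001, 1)
//   flow route:    FlowRouteKey::new(2001, 0), FlowRouteKey::new(2001, 1)
// Eight keys in total, matching the Vec::with_capacity(2 * 2 + 2 * 2) computation.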
View File

@@ -39,7 +39,7 @@ use crate::cache_invalidator::Context;
use crate::ddl::utils::{add_peer_context_if_needed, handle_retry_error};
use crate::ddl::DdlContext;
use crate::error::{self, Result, UnexpectedSnafu};
use crate::instruction::{CacheIdent, CreateFlow};
use crate::instruction::{CacheIdent, CreateFlow, DropFlow};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::flow_route::FlowRouteValue;
use crate::key::table_name::TableNameKey;
@@ -70,6 +70,7 @@ impl CreateFlowProcedure {
query_context,
state: CreateFlowState::Prepare,
prev_flow_info_value: None,
did_replace: false,
flow_type: None,
},
}
@@ -224,6 +225,7 @@ impl CreateFlowProcedure {
.update_flow_metadata(flow_id, prev_flow_value, &flow_info, flow_routes)
.await?;
info!("Replaced flow metadata for flow {flow_id}");
self.data.did_replace = true;
} else {
self.context
.flow_metadata_manager
@@ -240,22 +242,43 @@ impl CreateFlowProcedure {
debug_assert!(self.data.state == CreateFlowState::InvalidateFlowCache);
// Safety: The flow id must be allocated.
let flow_id = self.data.flow_id.unwrap();
let did_replace = self.data.did_replace;
let ctx = Context {
subject: Some("Invalidate flow cache by creating flow".to_string()),
};
let mut caches = vec![];
// if a replace happened, invalidate the flow cache by dropping the old flow
if did_replace {
let old_flow_info = self.data.prev_flow_info_value.as_ref().unwrap();
// only a drop-flow invalidation is needed, since the flow name hasn't changed and the flow id is invalidated below anyway
caches.extend([CacheIdent::DropFlow(DropFlow {
flow_id,
source_table_ids: old_flow_info.source_table_ids.clone(),
flow_part2node_id: old_flow_info.flownode_ids().clone().into_iter().collect(),
})]);
}
let (_flow_info, flow_routes) = (&self.data).into();
let flow_part2peers = flow_routes
.into_iter()
.map(|(part_id, route)| (part_id, route.peer))
.collect();
caches.extend([
CacheIdent::CreateFlow(CreateFlow {
flow_id,
source_table_ids: self.data.source_table_ids.clone(),
partition_to_peer_mapping: flow_part2peers,
}),
CacheIdent::FlowId(flow_id),
]);
self.context
.cache_invalidator
.invalidate(
&ctx,
&[
CacheIdent::CreateFlow(CreateFlow {
source_table_ids: self.data.source_table_ids.clone(),
flownodes: self.data.peers.clone(),
}),
CacheIdent::FlowId(flow_id),
],
)
.invalidate(&ctx, &caches)
.await?;
Ok(Status::done_with_output(flow_id))
@@ -377,6 +400,10 @@ pub struct CreateFlowData {
/// For verify if prev value is consistent when need to update flow metadata.
/// only set when `or_replace` is true.
pub(crate) prev_flow_info_value: Option<DeserializedValueWithBytes<FlowInfoValue>>,
/// Only set to true when replace actually happened.
/// This is used to determine whether to invalidate the cache.
#[serde(default)]
pub(crate) did_replace: bool,
pub(crate) flow_type: Option<FlowType>,
}
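To make the replace path concrete, a hedged sketch of the cache-invalidation list built when did_replace is true (identifiers from the hunk above; the ids mirror the test_replace_flow case shown earlier): the old flow's DropFlow comes first, then the CreateFlow for the new placement, then the FlowId.

// Illustrative only: flow 2001 previously read tables [1024, 1025] with
// partitions (0, 1) on flownodes 1 and 2, and is replaced to read [1026, 1027].
let caches = vec![
    CacheIdent::DropFlow(DropFlow {
        flow_id: 2001,
        source_table_ids: vec![1024, 1025],
        flow_part2node_id: vec![(0, 1), (1, 2)],
    }),
    CacheIdent::CreateFlow(CreateFlow {
        flow_id: 2001,
        source_table_ids: vec![1026, 1027],
        partition_to_peer_mapping: vec![(0, Peer::empty(3)), (1, Peer::empty(4))],
    }),
    CacheIdent::FlowId(2001),
];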

View File

@@ -13,6 +13,7 @@
// limitations under the License.
mod metadata;
use api::v1::flow::{flow_request, DropRequest, FlowRequest};
use async_trait::async_trait;
use common_catalog::format_full_flow_name;
@@ -153,6 +154,12 @@ impl DropFlowProcedure {
};
let flow_info_value = self.data.flow_info_value.as_ref().unwrap();
let flow_part2nodes = flow_info_value
.flownode_ids()
.clone()
.into_iter()
.collect::<Vec<_>>();
self.context
.cache_invalidator
.invalidate(
@@ -164,8 +171,9 @@ impl DropFlowProcedure {
flow_name: flow_info_value.flow_name.to_string(),
}),
CacheIdent::DropFlow(DropFlow {
flow_id,
source_table_ids: flow_info_value.source_table_ids.clone(),
flownode_ids: flow_info_value.flownode_ids.values().cloned().collect(),
flow_part2node_id: flow_part2nodes,
}),
],
)

View File

@@ -514,11 +514,25 @@ pub enum Error {
},
#[snafu(display(
"Failed to build a Kafka partition client, topic: {}, partition: {}",
"Failed to get a Kafka partition client, topic: {}, partition: {}",
topic,
partition
))]
BuildKafkaPartitionClient {
KafkaPartitionClient {
topic: String,
partition: i32,
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: rskafka::client::error::Error,
},
#[snafu(display(
"Failed to get offset from Kafka, topic: {}, partition: {}",
topic,
partition
))]
KafkaGetOffset {
topic: String,
partition: i32,
#[snafu(implicit)]
@@ -843,7 +857,7 @@ impl ErrorExt for Error {
| EncodeWalOptions { .. }
| BuildKafkaClient { .. }
| BuildKafkaCtrlClient { .. }
| BuildKafkaPartitionClient { .. }
| KafkaPartitionClient { .. }
| ResolveKafkaEndpoint { .. }
| ProduceRecord { .. }
| CreateKafkaWalTopic { .. }
@@ -852,7 +866,8 @@ impl ErrorExt for Error {
| ProcedureOutput { .. }
| FromUtf8 { .. }
| MetadataCorruption { .. }
| ParseWalOptions { .. } => StatusCode::Unexpected,
| ParseWalOptions { .. }
| KafkaGetOffset { .. } => StatusCode::Unexpected,
SendMessage { .. } | GetKvCache { .. } | CacheNotGet { .. } => StatusCode::Internal,

View File

@@ -24,7 +24,7 @@ use table::table_name::TableName;
use crate::flow_name::FlowName;
use crate::key::schema_name::SchemaName;
use crate::key::FlowId;
use crate::key::{FlowId, FlowPartitionId};
use crate::peer::Peer;
use crate::{DatanodeId, FlownodeId};
@@ -184,14 +184,19 @@ pub enum CacheIdent {
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct CreateFlow {
/// The unique identifier for the flow.
pub flow_id: FlowId,
pub source_table_ids: Vec<TableId>,
pub flownodes: Vec<Peer>,
/// Mapping of flow partition to peer information
pub partition_to_peer_mapping: Vec<(FlowPartitionId, Peer)>,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct DropFlow {
pub flow_id: FlowId,
pub source_table_ids: Vec<TableId>,
pub flownode_ids: Vec<FlownodeId>,
/// Mapping of flow partition to flownode id
pub flow_part2node_id: Vec<(FlowPartitionId, FlownodeId)>,
}
/// Flushes a batch of regions.

View File

@@ -256,6 +256,11 @@ impl DatanodeTableManager {
})?
.and_then(|r| DatanodeTableValue::try_from_raw_value(&r.value))?
.region_info;
// If the region options are the same, we don't need to update them.
if region_info.region_options == new_region_options {
return Ok(Txn::new());
}
// substitute region options only.
region_info.region_options = new_region_options;

View File

@@ -45,7 +45,7 @@ use crate::kv_backend::KvBackendRef;
use crate::rpc::store::BatchDeleteRequest;
/// The key of `__flow/` scope.
#[derive(Debug, PartialEq)]
#[derive(Debug, Clone, PartialEq)]
pub struct FlowScoped<T> {
inner: T,
}
@@ -246,27 +246,32 @@ impl FlowMetadataManager {
new_flow_info: &FlowInfoValue,
flow_routes: Vec<(FlowPartitionId, FlowRouteValue)>,
) -> Result<()> {
let (create_flow_flow_name_txn, on_create_flow_flow_name_failure) =
let (update_flow_flow_name_txn, on_create_flow_flow_name_failure) =
self.flow_name_manager.build_update_txn(
&new_flow_info.catalog_name,
&new_flow_info.flow_name,
flow_id,
)?;
let (create_flow_txn, on_create_flow_failure) =
let (update_flow_txn, on_create_flow_failure) =
self.flow_info_manager
.build_update_txn(flow_id, current_flow_info, new_flow_info)?;
let create_flow_routes_txn = self
.flow_route_manager
.build_create_txn(flow_id, flow_routes.clone())?;
let create_flownode_flow_txn = self
.flownode_flow_manager
.build_create_txn(flow_id, new_flow_info.flownode_ids().clone());
let create_table_flow_txn = self.table_flow_manager.build_create_txn(
let update_flow_routes_txn = self.flow_route_manager.build_update_txn(
flow_id,
current_flow_info,
flow_routes.clone(),
)?;
let update_flownode_flow_txn = self.flownode_flow_manager.build_update_txn(
flow_id,
current_flow_info,
new_flow_info.flownode_ids().clone(),
);
let update_table_flow_txn = self.table_flow_manager.build_update_txn(
flow_id,
current_flow_info,
flow_routes
.into_iter()
.map(|(partition_id, route)| (partition_id, TableFlowValue { peer: route.peer }))
@@ -275,11 +280,11 @@ impl FlowMetadataManager {
)?;
let txn = Txn::merge_all(vec![
create_flow_flow_name_txn,
create_flow_txn,
create_flow_routes_txn,
create_flownode_flow_txn,
create_table_flow_txn,
update_flow_flow_name_txn,
update_flow_txn,
update_flow_routes_txn,
update_flownode_flow_txn,
update_table_flow_txn,
]);
info!(
"Creating flow {}.{}({}), with {} txn operations",
@@ -783,6 +788,141 @@ mod tests {
}
}
#[tokio::test]
async fn test_update_flow_metadata_diff_flownode() {
let mem_kv = Arc::new(MemoryKvBackend::default());
let flow_metadata_manager = FlowMetadataManager::new(mem_kv.clone());
let flow_id = 10;
let flow_value = test_flow_info_value(
"flow",
[(0u32, 1u64), (1u32, 2u64)].into(),
vec![1024, 1025, 1026],
);
let flow_routes = vec![
(
0u32,
FlowRouteValue {
peer: Peer::empty(1),
},
),
(
1,
FlowRouteValue {
peer: Peer::empty(2),
},
),
];
flow_metadata_manager
.create_flow_metadata(flow_id, flow_value.clone(), flow_routes.clone())
.await
.unwrap();
let new_flow_value = {
let mut tmp = flow_value.clone();
tmp.raw_sql = "new".to_string();
// move to different flownodes
tmp.flownode_ids = [(0, 3u64), (1, 4u64)].into();
tmp
};
let new_flow_routes = vec![
(
0u32,
FlowRouteValue {
peer: Peer::empty(3),
},
),
(
1,
FlowRouteValue {
peer: Peer::empty(4),
},
),
];
// Update flow instead
flow_metadata_manager
.update_flow_metadata(
flow_id,
&DeserializedValueWithBytes::from_inner(flow_value.clone()),
&new_flow_value,
new_flow_routes.clone(),
)
.await
.unwrap();
let got = flow_metadata_manager
.flow_info_manager()
.get(flow_id)
.await
.unwrap()
.unwrap();
let routes = flow_metadata_manager
.flow_route_manager()
.routes(flow_id)
.await
.unwrap();
assert_eq!(
routes,
vec![
(
FlowRouteKey::new(flow_id, 0),
FlowRouteValue {
peer: Peer::empty(3),
},
),
(
FlowRouteKey::new(flow_id, 1),
FlowRouteValue {
peer: Peer::empty(4),
},
),
]
);
assert_eq!(got, new_flow_value);
let flows = flow_metadata_manager
.flownode_flow_manager()
.flows(1)
.try_collect::<Vec<_>>()
.await
.unwrap();
// should have moved to a different flownode
assert_eq!(flows, vec![]);
let flows = flow_metadata_manager
.flownode_flow_manager()
.flows(3)
.try_collect::<Vec<_>>()
.await
.unwrap();
assert_eq!(flows, vec![(flow_id, 0)]);
for table_id in [1024, 1025, 1026] {
let nodes = flow_metadata_manager
.table_flow_manager()
.flows(table_id)
.await
.unwrap();
assert_eq!(
nodes,
vec![
(
TableFlowKey::new(table_id, 3, flow_id, 0),
TableFlowValue {
peer: Peer::empty(3)
}
),
(
TableFlowKey::new(table_id, 4, flow_id, 1),
TableFlowValue {
peer: Peer::empty(4)
}
)
]
);
}
}
#[tokio::test]
async fn test_update_flow_metadata_flow_replace_diff_id_err() {
let mem_kv = Arc::new(MemoryKvBackend::default());

View File

@@ -153,6 +153,15 @@ impl FlowInfoValue {
&self.flownode_ids
}
/// Inserts a new flownode id for a partition.
pub fn insert_flownode_id(
&mut self,
partition: FlowPartitionId,
node: FlownodeId,
) -> Option<FlownodeId> {
self.flownode_ids.insert(partition, node)
}
/// Returns the `source_table`.
pub fn source_table_ids(&self) -> &[TableId] {
&self.source_table_ids
@@ -272,10 +281,11 @@ impl FlowInfoManager {
let raw_value = new_flow_value.try_as_raw_value()?;
let prev_value = current_flow_value.get_raw_bytes();
let txn = Txn::new()
.when(vec![
Compare::new(key.clone(), CompareOp::NotEqual, None),
Compare::new(key.clone(), CompareOp::Equal, Some(prev_value)),
])
.when(vec![Compare::new(
key.clone(),
CompareOp::Equal,
Some(prev_value),
)])
.and_then(vec![TxnOp::Put(key.clone(), raw_value)])
.or_else(vec![TxnOp::Get(key.clone())]);

View File

@@ -19,9 +19,12 @@ use serde::{Deserialize, Serialize};
use snafu::OptionExt;
use crate::error::{self, Result};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::{flownode_addr_helper, FlowScoped};
use crate::key::node_address::NodeAddressKey;
use crate::key::{BytesAdapter, FlowId, FlowPartitionId, MetadataKey, MetadataValue};
use crate::key::{
BytesAdapter, DeserializedValueWithBytes, FlowId, FlowPartitionId, MetadataKey, MetadataValue,
};
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
use crate::peer::Peer;
@@ -39,7 +42,7 @@ lazy_static! {
/// The key stores the route info of the flow.
///
/// The layout: `__flow/route/{flow_id}/{partition_id}`.
#[derive(Debug, PartialEq)]
#[derive(Debug, Clone, PartialEq)]
pub struct FlowRouteKey(FlowScoped<FlowRouteKeyInner>);
impl FlowRouteKey {
@@ -142,6 +145,12 @@ pub struct FlowRouteValue {
pub(crate) peer: Peer,
}
impl From<Peer> for FlowRouteValue {
fn from(peer: Peer) -> Self {
Self { peer }
}
}
impl FlowRouteValue {
/// Returns the `peer`.
pub fn peer(&self) -> &Peer {
@@ -204,6 +213,33 @@ impl FlowRouteManager {
Ok(Txn::new().and_then(txns))
}
/// Builds an update flow routes transaction.
///
/// Puts `__flow/route/{flow_id}/{partition_id}` keys.
/// Also removes `__flow/route/{flow_id}/{old_partition_id}` keys.
pub(crate) fn build_update_txn<I: IntoIterator<Item = (FlowPartitionId, FlowRouteValue)>>(
&self,
flow_id: FlowId,
current_flow_info: &DeserializedValueWithBytes<FlowInfoValue>,
flow_routes: I,
) -> Result<Txn> {
let del_txns = current_flow_info
.flownode_ids()
.iter()
.map(|(partition_id, _)| {
let key = FlowRouteKey::new(flow_id, *partition_id).to_bytes();
Ok(TxnOp::Delete(key))
});
let put_txns = flow_routes.into_iter().map(|(partition_id, route)| {
let key = FlowRouteKey::new(flow_id, partition_id).to_bytes();
Ok(TxnOp::Put(key, route.try_as_raw_value()?))
});
let txns = del_txns.chain(put_txns).collect::<Result<Vec<_>>>()?;
Ok(Txn::new().and_then(txns))
}
async fn remap_flow_route_addresses(
&self,
flow_routes: &mut [(FlowRouteKey, FlowRouteValue)],

View File

@@ -19,8 +19,9 @@ use regex::Regex;
use snafu::OptionExt;
use crate::error::{self, Result};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::FlowScoped;
use crate::key::{BytesAdapter, FlowId, FlowPartitionId, MetadataKey};
use crate::key::{BytesAdapter, DeserializedValueWithBytes, FlowId, FlowPartitionId, MetadataKey};
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
use crate::range_stream::{PaginationStream, DEFAULT_PAGE_SIZE};
@@ -165,6 +166,17 @@ impl FlownodeFlowManager {
Self { kv_backend }
}
/// Whether the given flow exists on this flownode.
pub async fn exists(
&self,
flownode_id: FlownodeId,
flow_id: FlowId,
partition_id: FlowPartitionId,
) -> Result<bool> {
let key = FlownodeFlowKey::new(flownode_id, flow_id, partition_id).to_bytes();
Ok(self.kv_backend.get(&key).await?.is_some())
}
/// Retrieves all [FlowId] and [FlowPartitionId]s of the specified `flownode_id`.
pub fn flows(
&self,
@@ -202,6 +214,33 @@ impl FlownodeFlowManager {
Txn::new().and_then(txns)
}
/// Builds an update flownode flow transaction.
///
/// Puts `__flownode_flow/{flownode_id}/{flow_id}/{partition_id}` keys.
/// Removes the old `__flownode_flow/{old_flownode_id}/{flow_id}/{old_partition_id}` keys.
pub(crate) fn build_update_txn<I: IntoIterator<Item = (FlowPartitionId, FlownodeId)>>(
&self,
flow_id: FlowId,
current_flow_info: &DeserializedValueWithBytes<FlowInfoValue>,
flownode_ids: I,
) -> Txn {
let del_txns =
current_flow_info
.flownode_ids()
.iter()
.map(|(partition_id, flownode_id)| {
let key = FlownodeFlowKey::new(*flownode_id, flow_id, *partition_id).to_bytes();
TxnOp::Delete(key)
});
let put_txns = flownode_ids.into_iter().map(|(partition_id, flownode_id)| {
let key = FlownodeFlowKey::new(flownode_id, flow_id, partition_id).to_bytes();
TxnOp::Put(key, vec![])
});
let txns = del_txns.chain(put_txns).collect::<Vec<_>>();
Txn::new().and_then(txns)
}
}
#[cfg(test)]

View File

@@ -22,9 +22,12 @@ use snafu::OptionExt;
use table::metadata::TableId;
use crate::error::{self, Result};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::{flownode_addr_helper, FlowScoped};
use crate::key::node_address::NodeAddressKey;
use crate::key::{BytesAdapter, FlowId, FlowPartitionId, MetadataKey, MetadataValue};
use crate::key::{
BytesAdapter, DeserializedValueWithBytes, FlowId, FlowPartitionId, MetadataKey, MetadataValue,
};
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
use crate::peer::Peer;
@@ -215,7 +218,7 @@ impl TableFlowManager {
/// Builds a create table flow transaction.
///
/// Puts `__flow/source_table/{table_id}/{node_id}/{partition_id}` keys.
/// Puts `__flow/source_table/{table_id}/{node_id}/{flow_id}/{partition_id}` keys.
pub fn build_create_txn(
&self,
flow_id: FlowId,
@@ -239,6 +242,44 @@ impl TableFlowManager {
Ok(Txn::new().and_then(txns))
}
/// Builds an update table flow transaction.
///
/// Puts `__flow/source_table/{table_id}/{node_id}/{flow_id}/{partition_id}` keys,
/// and removes the previous
/// `__flow/source_table/{table_id}/{old_node_id}/{flow_id}/{partition_id}` keys.
pub fn build_update_txn(
&self,
flow_id: FlowId,
current_flow_info: &DeserializedValueWithBytes<FlowInfoValue>,
table_flow_values: Vec<(FlowPartitionId, TableFlowValue)>,
source_table_ids: &[TableId],
) -> Result<Txn> {
let mut txns = Vec::with_capacity(2 * source_table_ids.len() * table_flow_values.len());
// first remove the old keys
for (part_id, node_id) in current_flow_info.flownode_ids() {
for source_table_id in current_flow_info.source_table_ids() {
txns.push(TxnOp::Delete(
TableFlowKey::new(*source_table_id, *node_id, flow_id, *part_id).to_bytes(),
));
}
}
for (partition_id, table_flow_value) in table_flow_values {
let flownode_id = table_flow_value.peer.id;
let value = table_flow_value.try_as_raw_value()?;
for source_table_id in source_table_ids {
txns.push(TxnOp::Put(
TableFlowKey::new(*source_table_id, flownode_id, flow_id, partition_id)
.to_bytes(),
value.clone(),
));
}
}
Ok(Txn::new().and_then(txns))
}
async fn remap_table_flow_addresses(
&self,
table_flows: &mut [(TableFlowKey, TableFlowValue)],

View File

@@ -35,7 +35,7 @@ pub mod memory;
pub mod rds;
pub mod test;
pub mod txn;
pub mod util;
pub type KvBackendRef<E = Error> = Arc<dyn KvBackend<Error = E> + Send + Sync>;
#[async_trait]

View File

@@ -0,0 +1,85 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/// Removes sensitive information like passwords from connection strings.
///
/// This function sanitizes connection strings by removing credentials:
/// - For URL format (mysql://user:password@host:port/db): Removes everything before '@'
/// - For parameter format (host=localhost password=secret): Removes the password parameter
/// - For URL format without credentials (mysql://host:port/db): Removes the protocol prefix
///
/// # Arguments
///
/// * `conn_str` - The connection string to sanitize
///
/// # Returns
///
/// A sanitized version of the connection string with sensitive information removed
pub fn sanitize_connection_string(conn_str: &str) -> String {
// Case 1: URL format with credentials (mysql://user:password@host:port/db)
// Extract everything after the '@' symbol
if let Some(at_pos) = conn_str.find('@') {
return conn_str[at_pos + 1..].to_string();
}
// Case 2: Parameter format with password (host=localhost password=secret dbname=mydb)
// Filter out any parameter that starts with "password="
if conn_str.contains("password=") {
return conn_str
.split_whitespace()
.filter(|param| !param.starts_with("password="))
.collect::<Vec<_>>()
.join(" ");
}
// Case 3: URL format without credentials (mysql://host:port/db)
// Extract everything after the protocol prefix
if let Some(host_part) = conn_str.split("://").nth(1) {
return host_part.to_string();
}
// Case 4: Already sanitized or unknown format
// Return as is
conn_str.to_string()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_sanitize_connection_string() {
// Test URL format with username/password
let conn_str = "mysql://user:password123@localhost:3306/db";
assert_eq!(sanitize_connection_string(conn_str), "localhost:3306/db");
// Test URL format without credentials
let conn_str = "mysql://localhost:3306/db";
assert_eq!(sanitize_connection_string(conn_str), "localhost:3306/db");
// Test parameter format with password
let conn_str = "host=localhost port=5432 user=postgres password=secret dbname=mydb";
assert_eq!(
sanitize_connection_string(conn_str),
"host=localhost port=5432 user=postgres dbname=mydb"
);
// Test parameter format without password
let conn_str = "host=localhost port=5432 user=postgres dbname=mydb";
assert_eq!(
sanitize_connection_string(conn_str),
"host=localhost port=5432 user=postgres dbname=mydb"
);
}
}
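A small usage sketch for the helper above (endpoint and credential values are made up): credentials are stripped from URL-style strings, and the password parameter is dropped from key-value strings.

// Illustrative only.
let url = sanitize_connection_string("postgres://admin:s3cret@10.0.0.1:5432/metasrv");
assert_eq!(url, "10.0.0.1:5432/metasrv");
let params = sanitize_connection_string("host=10.0.0.1 user=admin password=s3cret dbname=metasrv");
assert_eq!(params, "host=10.0.0.1 user=admin dbname=metasrv");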

View File

@@ -20,6 +20,8 @@ use api::v1::region::{InsertRequests, RegionRequest};
pub use common_base::AffectedRows;
use common_query::request::QueryRequest;
use common_recordbatch::SendableRecordBatchStream;
use common_wal::config::kafka::common::{KafkaConnectionConfig, KafkaTopicConfig};
use common_wal::config::kafka::MetasrvKafkaConfig;
use crate::cache_invalidator::DummyCacheInvalidator;
use crate::ddl::flow_meta::FlowMetadataAllocator;
@@ -37,7 +39,8 @@ use crate::peer::{Peer, PeerLookupService};
use crate::region_keeper::MemoryRegionKeeper;
use crate::region_registry::LeaderRegionRegistry;
use crate::sequence::SequenceBuilder;
use crate::wal_options_allocator::WalOptionsAllocator;
use crate::wal_options_allocator::topic_pool::KafkaTopicPool;
use crate::wal_options_allocator::{build_kafka_topic_creator, WalOptionsAllocator};
use crate::{DatanodeId, FlownodeId};
#[async_trait::async_trait]
@@ -199,3 +202,34 @@ impl PeerLookupService for NoopPeerLookupService {
Ok(Some(Peer::empty(id)))
}
}
/// Create a kafka topic pool for testing.
pub async fn test_kafka_topic_pool(
broker_endpoints: Vec<String>,
num_topics: usize,
auto_create_topics: bool,
topic_name_prefix: Option<&str>,
) -> KafkaTopicPool {
let mut config = MetasrvKafkaConfig {
connection: KafkaConnectionConfig {
broker_endpoints,
..Default::default()
},
kafka_topic: KafkaTopicConfig {
num_topics,
..Default::default()
},
auto_create_topics,
..Default::default()
};
if let Some(prefix) = topic_name_prefix {
config.kafka_topic.topic_name_prefix = prefix.to_string();
}
let kv_backend = Arc::new(MemoryKvBackend::new()) as KvBackendRef;
let topic_creator = build_kafka_topic_creator(&config.connection, &config.kafka_topic)
.await
.unwrap();
KafkaTopicPool::new(&config, kv_backend, topic_creator)
}

View File

@@ -112,7 +112,9 @@ pub async fn build_wal_options_allocator(
NAME_PATTERN_REGEX.is_match(prefix),
InvalidTopicNamePrefixSnafu { prefix }
);
let topic_creator = build_kafka_topic_creator(kafka_config).await?;
let topic_creator =
build_kafka_topic_creator(&kafka_config.connection, &kafka_config.kafka_topic)
.await?;
let topic_pool = KafkaTopicPool::new(kafka_config, kv_backend, topic_creator);
Ok(WalOptionsAllocator::Kafka(topic_pool))
}
@@ -151,13 +153,16 @@ pub fn prepare_wal_options(
mod tests {
use std::assert_matches::assert_matches;
use common_wal::config::kafka::common::{KafkaConnectionConfig, KafkaTopicConfig};
use common_wal::config::kafka::common::KafkaTopicConfig;
use common_wal::config::kafka::MetasrvKafkaConfig;
use common_wal::test_util::run_test_with_kafka_wal;
use common_wal::maybe_skip_kafka_integration_test;
use common_wal::test_util::get_kafka_endpoints;
use super::*;
use crate::error::Error;
use crate::kv_backend::memory::MemoryKvBackend;
use crate::test_util::test_kafka_topic_pool;
use crate::wal_options_allocator::selector::RoundRobinTopicSelector;
// Tests that the wal options allocator could successfully allocate raft-engine wal options.
#[tokio::test]
@@ -197,55 +202,42 @@ mod tests {
assert_matches!(got, Error::InvalidTopicNamePrefix { .. });
}
// Tests that the wal options allocator could successfully allocate Kafka wal options.
#[tokio::test]
async fn test_allocator_with_kafka() {
run_test_with_kafka_wal(|broker_endpoints| {
Box::pin(async {
let topics = (0..256)
.map(|i| format!("test_allocator_with_kafka_{}_{}", i, uuid::Uuid::new_v4()))
.collect::<Vec<_>>();
// Creates a topic manager.
let kafka_topic = KafkaTopicConfig {
replication_factor: broker_endpoints.len() as i16,
..Default::default()
};
let config = MetasrvKafkaConfig {
connection: KafkaConnectionConfig {
broker_endpoints,
..Default::default()
},
kafka_topic,
..Default::default()
};
let kv_backend = Arc::new(MemoryKvBackend::new()) as KvBackendRef;
let topic_creator = build_kafka_topic_creator(&config).await.unwrap();
let mut topic_pool = KafkaTopicPool::new(&config, kv_backend, topic_creator);
topic_pool.topics.clone_from(&topics);
topic_pool.selector = Arc::new(selector::RoundRobinTopicSelector::default());
// Creates an options allocator.
let allocator = WalOptionsAllocator::Kafka(topic_pool);
allocator.start().await.unwrap();
let num_regions = 32;
let regions = (0..num_regions).collect::<Vec<_>>();
let got = allocate_region_wal_options(regions.clone(), &allocator, false).unwrap();
// Check the allocated wal options contain the expected topics.
let expected = (0..num_regions)
.map(|i| {
let options = WalOptions::Kafka(KafkaWalOptions {
topic: topics[i as usize].clone(),
});
(i, serde_json::to_string(&options).unwrap())
})
.collect::<HashMap<_, _>>();
assert_eq!(got, expected);
})
})
async fn test_allocator_with_kafka_allocate_wal_options() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let num_topics = 5;
let mut topic_pool = test_kafka_topic_pool(
get_kafka_endpoints(),
num_topics,
true,
Some("test_allocator_with_kafka"),
)
.await;
topic_pool.selector = Arc::new(RoundRobinTopicSelector::default());
let topics = topic_pool.topics.clone();
// clean up the topics before test
let topic_creator = topic_pool.topic_creator();
topic_creator.delete_topics(&topics).await.unwrap();
// Creates an options allocator.
let allocator = WalOptionsAllocator::Kafka(topic_pool);
allocator.start().await.unwrap();
let num_regions = 3;
let regions = (0..num_regions).collect::<Vec<_>>();
let got = allocate_region_wal_options(regions.clone(), &allocator, false).unwrap();
// Check the allocated wal options contain the expected topics.
let expected = (0..num_regions)
.map(|i| {
let options = WalOptions::Kafka(KafkaWalOptions {
topic: topics[i as usize].clone(),
});
(i, serde_json::to_string(&options).unwrap())
})
.collect::<HashMap<_, _>>();
assert_eq!(got, expected);
}
#[tokio::test]

View File

@@ -12,20 +12,21 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_telemetry::{error, info};
use common_wal::config::kafka::common::DEFAULT_BACKOFF_CONFIG;
use common_wal::config::kafka::MetasrvKafkaConfig;
use common_telemetry::{debug, error, info};
use common_wal::config::kafka::common::{
KafkaConnectionConfig, KafkaTopicConfig, DEFAULT_BACKOFF_CONFIG,
};
use rskafka::client::error::Error as RsKafkaError;
use rskafka::client::error::ProtocolError::TopicAlreadyExists;
use rskafka::client::partition::{Compression, UnknownTopicHandling};
use rskafka::client::partition::{Compression, OffsetAt, PartitionClient, UnknownTopicHandling};
use rskafka::client::{Client, ClientBuilder};
use rskafka::record::Record;
use snafu::ResultExt;
use crate::error::{
BuildKafkaClientSnafu, BuildKafkaCtrlClientSnafu, BuildKafkaPartitionClientSnafu,
CreateKafkaWalTopicSnafu, ProduceRecordSnafu, ResolveKafkaEndpointSnafu, Result,
TlsConfigSnafu,
BuildKafkaClientSnafu, BuildKafkaCtrlClientSnafu, CreateKafkaWalTopicSnafu,
KafkaGetOffsetSnafu, KafkaPartitionClientSnafu, ProduceRecordSnafu, ResolveKafkaEndpointSnafu,
Result, TlsConfigSnafu,
};
// Each topic only has one partition for now.
@@ -70,21 +71,47 @@ impl KafkaTopicCreator {
info!("The topic {} already exists", topic);
Ok(())
} else {
error!("Failed to create a topic {}, error {:?}", topic, e);
error!(e; "Failed to create a topic {}", topic);
Err(e).context(CreateKafkaWalTopicSnafu)
}
}
}
}
async fn append_noop_record(&self, topic: &String, client: &Client) -> Result<()> {
let partition_client = client
async fn prepare_topic(&self, topic: &String) -> Result<()> {
let partition_client = self.partition_client(topic).await?;
self.append_noop_record(topic, &partition_client).await?;
Ok(())
}
/// Creates a [PartitionClient] for the given topic.
async fn partition_client(&self, topic: &str) -> Result<PartitionClient> {
self.client
.partition_client(topic, DEFAULT_PARTITION, UnknownTopicHandling::Retry)
.await
.context(BuildKafkaPartitionClientSnafu {
.context(KafkaPartitionClientSnafu {
topic,
partition: DEFAULT_PARTITION,
})
}
/// Appends a noop record to the topic.
/// It only appends a noop record if the topic is empty.
async fn append_noop_record(
&self,
topic: &String,
partition_client: &PartitionClient,
) -> Result<()> {
let end_offset = partition_client
.get_offset(OffsetAt::Latest)
.await
.context(KafkaGetOffsetSnafu {
topic: topic.to_string(),
partition: DEFAULT_PARTITION,
})?;
if end_offset > 0 {
return Ok(());
}
partition_client
.produce(
@@ -98,22 +125,28 @@ impl KafkaTopicCreator {
)
.await
.context(ProduceRecordSnafu { topic })?;
debug!("Appended a noop record to topic {}", topic);
Ok(())
}
/// Creates topics in Kafka.
pub async fn create_topics(&self, topics: &[String]) -> Result<()> {
let tasks = topics
.iter()
.map(|topic| async { self.create_topic(topic, &self.client).await })
.collect::<Vec<_>>();
futures::future::try_join_all(tasks).await.map(|_| ())
}
/// Prepares topics in Kafka.
/// 1. Creates missing topics.
/// 2. Appends a noop record to each topic.
pub async fn prepare_topics(&self, topics: &[&String]) -> Result<()> {
///
/// It appends a noop record to each topic if the topic is empty.
pub async fn prepare_topics(&self, topics: &[String]) -> Result<()> {
// Try to create missing topics.
let tasks = topics
.iter()
.map(|topic| async {
self.create_topic(topic, &self.client).await?;
self.append_noop_record(topic, &self.client).await?;
Ok(())
})
.map(|topic| async { self.prepare_topic(topic).await })
.collect::<Vec<_>>();
futures::future::try_join_all(tasks).await.map(|_| ())
}
@@ -129,34 +162,244 @@ impl KafkaTopicCreator {
}
}
#[cfg(test)]
impl KafkaTopicCreator {
pub async fn delete_topics(&self, topics: &[String]) -> Result<()> {
let tasks = topics
.iter()
.map(|topic| async { self.delete_topic(topic, &self.client).await })
.collect::<Vec<_>>();
futures::future::try_join_all(tasks).await.map(|_| ())
}
async fn delete_topic(&self, topic: &String, client: &Client) -> Result<()> {
let controller = client
.controller_client()
.context(BuildKafkaCtrlClientSnafu)?;
match controller.delete_topic(topic, 10).await {
Ok(_) => {
info!("Successfully deleted topic {}", topic);
Ok(())
}
Err(e) => {
if Self::is_unknown_topic_err(&e) {
info!("The topic {} does not exist", topic);
Ok(())
} else {
panic!("Failed to delete a topic {}, error: {}", topic, e);
}
}
}
}
fn is_unknown_topic_err(e: &RsKafkaError) -> bool {
matches!(
e,
&RsKafkaError::ServerError {
protocol_error: rskafka::client::error::ProtocolError::UnknownTopicOrPartition,
..
}
)
}
pub async fn get_partition_client(&self, topic: &str) -> PartitionClient {
self.partition_client(topic).await.unwrap()
}
}
/// Builds a kafka [Client](rskafka::client::Client).
pub async fn build_kafka_client(config: &MetasrvKafkaConfig) -> Result<Client> {
pub async fn build_kafka_client(connection: &KafkaConnectionConfig) -> Result<Client> {
// Builds an kafka controller client for creating topics.
let broker_endpoints = common_wal::resolve_to_ipv4(&config.connection.broker_endpoints)
let broker_endpoints = common_wal::resolve_to_ipv4(&connection.broker_endpoints)
.await
.context(ResolveKafkaEndpointSnafu)?;
let mut builder = ClientBuilder::new(broker_endpoints).backoff_config(DEFAULT_BACKOFF_CONFIG);
if let Some(sasl) = &config.connection.sasl {
if let Some(sasl) = &connection.sasl {
builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
};
if let Some(tls) = &config.connection.tls {
if let Some(tls) = &connection.tls {
builder = builder.tls_config(tls.to_tls_config().await.context(TlsConfigSnafu)?)
};
builder
.build()
.await
.with_context(|_| BuildKafkaClientSnafu {
broker_endpoints: config.connection.broker_endpoints.clone(),
broker_endpoints: connection.broker_endpoints.clone(),
})
}
/// Builds a [KafkaTopicCreator].
pub async fn build_kafka_topic_creator(config: &MetasrvKafkaConfig) -> Result<KafkaTopicCreator> {
let client = build_kafka_client(config).await?;
pub async fn build_kafka_topic_creator(
connection: &KafkaConnectionConfig,
kafka_topic: &KafkaTopicConfig,
) -> Result<KafkaTopicCreator> {
let client = build_kafka_client(connection).await?;
Ok(KafkaTopicCreator {
client,
num_partitions: config.kafka_topic.num_partitions,
replication_factor: config.kafka_topic.replication_factor,
create_topic_timeout: config.kafka_topic.create_topic_timeout.as_millis() as i32,
num_partitions: kafka_topic.num_partitions,
replication_factor: kafka_topic.replication_factor,
create_topic_timeout: kafka_topic.create_topic_timeout.as_millis() as i32,
})
}
#[cfg(test)]
mod tests {
use common_wal::config::kafka::common::{KafkaConnectionConfig, KafkaTopicConfig};
use common_wal::maybe_skip_kafka_integration_test;
use common_wal::test_util::get_kafka_endpoints;
use super::*;
async fn test_topic_creator(broker_endpoints: Vec<String>) -> KafkaTopicCreator {
let connection = KafkaConnectionConfig {
broker_endpoints,
..Default::default()
};
let kafka_topic = KafkaTopicConfig::default();
build_kafka_topic_creator(&connection, &kafka_topic)
.await
.unwrap()
}
async fn append_records(partition_client: &PartitionClient, num_records: usize) -> Result<()> {
for i in 0..num_records {
partition_client
.produce(
vec![Record {
key: Some(b"test".to_vec()),
value: Some(format!("test {}", i).as_bytes().to_vec()),
timestamp: chrono::Utc::now(),
headers: Default::default(),
}],
Compression::Lz4,
)
.await
.unwrap();
}
Ok(())
}
#[tokio::test]
async fn test_append_noop_record_to_empty_topic() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "append_noop_record_to_empty_topic";
let creator = test_topic_creator(get_kafka_endpoints()).await;
let topic = format!("{}{}", prefix, "0");
// Clean up the topics before test
creator.delete_topics(&[topic.to_string()]).await.unwrap();
creator.create_topics(&[topic.to_string()]).await.unwrap();
let partition_client = creator.partition_client(&topic).await.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 0);
// The topic is empty, so a noop record is appended.
creator
.append_noop_record(&topic, &partition_client)
.await
.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 1);
}
#[tokio::test]
async fn test_append_noop_record_to_non_empty_topic() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "append_noop_record_to_non_empty_topic";
let creator = test_topic_creator(get_kafka_endpoints()).await;
let topic = format!("{}{}", prefix, "0");
// Clean up the topics before test
creator.delete_topics(&[topic.to_string()]).await.unwrap();
creator.create_topics(&[topic.to_string()]).await.unwrap();
let partition_client = creator.partition_client(&topic).await.unwrap();
append_records(&partition_client, 2).await.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 2);
// The topic is not empty, so no noop record is appended.
creator
.append_noop_record(&topic, &partition_client)
.await
.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 2);
}
#[tokio::test]
async fn test_create_topic() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "create_topic";
let creator = test_topic_creator(get_kafka_endpoints()).await;
let topic = format!("{}{}", prefix, "0");
// Clean up the topics before test
creator.delete_topics(&[topic.to_string()]).await.unwrap();
creator.create_topics(&[topic.to_string()]).await.unwrap();
// Should be ok
creator.create_topics(&[topic.to_string()]).await.unwrap();
let partition_client = creator.partition_client(&topic).await.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 0);
}
#[tokio::test]
async fn test_prepare_topic() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "prepare_topic";
let creator = test_topic_creator(get_kafka_endpoints()).await;
let topic = format!("{}{}", prefix, "0");
// Clean up the topics before test
creator.delete_topics(&[topic.to_string()]).await.unwrap();
creator.create_topics(&[topic.to_string()]).await.unwrap();
creator.prepare_topic(&topic).await.unwrap();
let partition_client = creator.partition_client(&topic).await.unwrap();
let start_offset = partition_client
.get_offset(OffsetAt::Earliest)
.await
.unwrap();
assert_eq!(start_offset, 0);
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 1);
}
#[tokio::test]
async fn test_prepare_topic_with_stale_records_without_pruning() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "prepare_topic_with_stale_records_without_pruning";
let creator = test_topic_creator(get_kafka_endpoints()).await;
let topic = format!("{}{}", prefix, "0");
// Clean up the topics before test
creator.delete_topics(&[topic.to_string()]).await.unwrap();
creator.create_topics(&[topic.to_string()]).await.unwrap();
let partition_client = creator.partition_client(&topic).await.unwrap();
append_records(&partition_client, 10).await.unwrap();
creator.prepare_topic(&topic).await.unwrap();
let end_offset = partition_client.get_offset(OffsetAt::Latest).await.unwrap();
assert_eq!(end_offset, 10);
let start_offset = partition_client
.get_offset(OffsetAt::Earliest)
.await
.unwrap();
assert_eq!(start_offset, 0);
}
}
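For completeness, a hedged sketch of how the refactored creator could be wired up outside tests, using only items shown in this file (the broker endpoint and topic name are placeholders, and the snippet assumes an async context that returns Result).

// Illustrative only; mirrors test_topic_creator plus topic preparation.
let connection = KafkaConnectionConfig {
    broker_endpoints: vec!["127.0.0.1:9092".to_string()],
    ..Default::default()
};
let kafka_topic = KafkaTopicConfig::default();
let creator = build_kafka_topic_creator(&connection, &kafka_topic).await?;
let topics = vec!["greptimedb_wal_topic_0".to_string()];
creator.create_topics(&topics).await?;
// prepare_topics appends a noop record only to topics that are still empty.
creator.prepare_topics(&topics).await?;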

View File

@@ -40,24 +40,21 @@ impl KafkaTopicManager {
Ok(topics)
}
/// Restores topics from the key-value backend. and returns the topics that are not stored in kvbackend.
pub async fn get_topics_to_create<'a>(
&self,
all_topics: &'a [String],
) -> Result<Vec<&'a String>> {
/// Returns the topics that are not prepared.
pub async fn unprepare_topics(&self, all_topics: &[String]) -> Result<Vec<String>> {
let existing_topics = self.restore_topics().await?;
let existing_topic_set = existing_topics.iter().collect::<HashSet<_>>();
let mut topics_to_create = Vec::with_capacity(all_topics.len());
for topic in all_topics {
if !existing_topic_set.contains(topic) {
topics_to_create.push(topic);
topics_to_create.push(topic.to_string());
}
}
Ok(topics_to_create)
}
/// Persists topics into the key-value backend.
pub async fn persist_topics(&self, topics: &[String]) -> Result<()> {
/// Persists prepared topics into the key-value backend.
pub async fn persist_prepared_topics(&self, topics: &[String]) -> Result<()> {
self.topic_name_manager
.batch_put(
topics
@@ -70,6 +67,14 @@ impl KafkaTopicManager {
}
}
#[cfg(test)]
impl KafkaTopicManager {
/// Lists all topics in the key-value backend.
pub async fn list_topics(&self) -> Result<Vec<String>> {
self.topic_name_manager.range().await
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
@@ -90,11 +95,11 @@ mod tests {
// No legacy topics.
let mut topics_to_be_created = topic_kvbackend_manager
.get_topics_to_create(&all_topics)
.unprepare_topics(&all_topics)
.await
.unwrap();
topics_to_be_created.sort();
let mut expected = all_topics.iter().collect::<Vec<_>>();
let mut expected = all_topics.clone();
expected.sort();
assert_eq!(expected, topics_to_be_created);
@@ -109,7 +114,7 @@ mod tests {
assert!(res.prev_kv.is_none());
let topics_to_be_created = topic_kvbackend_manager
.get_topics_to_create(&all_topics)
.unprepare_topics(&all_topics)
.await
.unwrap();
assert!(topics_to_be_created.is_empty());
@@ -144,21 +149,21 @@ mod tests {
let topic_kvbackend_manager = KafkaTopicManager::new(kv_backend);
let mut topics_to_be_created = topic_kvbackend_manager
.get_topics_to_create(&all_topics)
.unprepare_topics(&all_topics)
.await
.unwrap();
topics_to_be_created.sort();
let mut expected = all_topics.iter().collect::<Vec<_>>();
let mut expected = all_topics.clone();
expected.sort();
assert_eq!(expected, topics_to_be_created);
// Persists topics to kv backend.
topic_kvbackend_manager
.persist_topics(&all_topics)
.persist_prepared_topics(&all_topics)
.await
.unwrap();
let topics_to_be_created = topic_kvbackend_manager
.get_topics_to_create(&all_topics)
.unprepare_topics(&all_topics)
.await
.unwrap();
assert!(topics_to_be_created.is_empty());

View File

@@ -15,6 +15,7 @@
use std::fmt::{self, Formatter};
use std::sync::Arc;
use common_telemetry::info;
use common_wal::config::kafka::MetasrvKafkaConfig;
use common_wal::TopicSelectorType;
use snafu::ensure;
@@ -77,27 +78,35 @@ impl KafkaTopicPool {
}
/// Tries to activate the topic manager when metasrv becomes the leader.
///
/// First tries to restore persisted topics from the kv backend.
/// If not enough topics retrieved, it will try to contact the Kafka cluster and request creating more topics.
/// If there are unprepared topics (topics that exist in the configuration but not in the kv backend),
/// it will create these topics in Kafka if `auto_create_topics` is enabled.
///
/// Then it prepares all unprepared topics by appending a noop record if the topic is empty,
/// and persists them in the kv backend for future use.
pub async fn activate(&self) -> Result<()> {
if !self.auto_create_topics {
return Ok(());
}
let num_topics = self.topics.len();
ensure!(num_topics > 0, InvalidNumTopicsSnafu { num_topics });
let topics_to_be_created = self
.topic_manager
.get_topics_to_create(&self.topics)
.await?;
let unprepared_topics = self.topic_manager.unprepare_topics(&self.topics).await?;
if !topics_to_be_created.is_empty() {
if !unprepared_topics.is_empty() {
if self.auto_create_topics {
info!("Creating {} topics", unprepared_topics.len());
self.topic_creator.create_topics(&unprepared_topics).await?;
} else {
info!("Auto create topics is disabled, skipping topic creation.");
}
self.topic_creator
.prepare_topics(&topics_to_be_created)
.prepare_topics(&unprepared_topics)
.await?;
self.topic_manager
.persist_prepared_topics(&unprepared_topics)
.await?;
self.topic_manager.persist_topics(&self.topics).await?;
}
info!("Activated topic pool with {} topics", self.topics.len());
Ok(())
}
@@ -114,77 +123,147 @@ impl KafkaTopicPool {
}
}
#[cfg(test)]
impl KafkaTopicPool {
pub(crate) fn topic_manager(&self) -> &KafkaTopicManager {
&self.topic_manager
}
pub(crate) fn topic_creator(&self) -> &KafkaTopicCreator {
&self.topic_creator
}
}
#[cfg(test)]
mod tests {
use common_wal::config::kafka::common::{KafkaConnectionConfig, KafkaTopicConfig};
use common_wal::test_util::run_test_with_kafka_wal;
use std::assert_matches::assert_matches;
use common_wal::maybe_skip_kafka_integration_test;
use common_wal::test_util::get_kafka_endpoints;
use super::*;
use crate::kv_backend::memory::MemoryKvBackend;
use crate::wal_options_allocator::topic_creator::build_kafka_topic_creator;
use crate::error::Error;
use crate::test_util::test_kafka_topic_pool;
use crate::wal_options_allocator::selector::RoundRobinTopicSelector;
#[tokio::test]
async fn test_pool_invalid_number_topics_err() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let endpoints = get_kafka_endpoints();
let pool = test_kafka_topic_pool(endpoints.clone(), 0, false, None).await;
let err = pool.activate().await.unwrap_err();
assert_matches!(err, Error::InvalidNumTopics { .. });
let pool = test_kafka_topic_pool(endpoints, 0, true, None).await;
let err = pool.activate().await.unwrap_err();
assert_matches!(err, Error::InvalidNumTopics { .. });
}
#[tokio::test]
async fn test_pool_activate_unknown_topics_err() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let pool =
test_kafka_topic_pool(get_kafka_endpoints(), 1, false, Some("unknown_topic")).await;
let err = pool.activate().await.unwrap_err();
assert_matches!(err, Error::KafkaPartitionClient { .. });
}
#[tokio::test]
async fn test_pool_activate() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let pool =
test_kafka_topic_pool(get_kafka_endpoints(), 2, true, Some("pool_activate")).await;
// clean up the topics before test
let topic_creator = pool.topic_creator();
topic_creator.delete_topics(&pool.topics).await.unwrap();
let topic_manager = pool.topic_manager();
pool.activate().await.unwrap();
let topics = topic_manager.list_topics().await.unwrap();
assert_eq!(topics.len(), 2);
}
#[tokio::test]
async fn test_pool_activate_with_existing_topics() {
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let prefix = "pool_activate_with_existing_topics";
let pool = test_kafka_topic_pool(get_kafka_endpoints(), 2, true, Some(prefix)).await;
let topic_creator = pool.topic_creator();
topic_creator.delete_topics(&pool.topics).await.unwrap();
let topic_manager = pool.topic_manager();
// Persist one topic first, so that pool.activate() only creates the topics that are not persisted yet.
topic_manager
.persist_prepared_topics(&pool.topics[0..1])
.await
.unwrap();
pool.activate().await.unwrap();
let topics = topic_manager.list_topics().await.unwrap();
assert_eq!(topics.len(), 2);
let client = pool.topic_creator().client();
let topics = client
.list_topics()
.await
.unwrap()
.into_iter()
.filter(|t| t.name.starts_with(prefix))
.collect::<Vec<_>>();
assert_eq!(topics.len(), 1);
}
/// Tests that the topic manager could allocate topics correctly.
#[tokio::test]
async fn test_alloc_topics() {
run_test_with_kafka_wal(|broker_endpoints| {
Box::pin(async {
// Constructs topics that should be created.
let topics = (0..256)
.map(|i| format!("test_alloc_topics_{}_{}", i, uuid::Uuid::new_v4()))
.collect::<Vec<_>>();
// Creates a topic manager.
let kafka_topic = KafkaTopicConfig {
replication_factor: broker_endpoints.len() as i16,
..Default::default()
};
let config = MetasrvKafkaConfig {
connection: KafkaConnectionConfig {
broker_endpoints,
..Default::default()
},
kafka_topic,
..Default::default()
};
let kv_backend = Arc::new(MemoryKvBackend::new()) as KvBackendRef;
let topic_creator = build_kafka_topic_creator(&config).await.unwrap();
let mut topic_pool = KafkaTopicPool::new(&config, kv_backend, topic_creator);
// Replaces the default topic pool with the constructed topics.
topic_pool.topics.clone_from(&topics);
// Replaces the default selector with a round-robin selector without shuffled.
topic_pool.selector = Arc::new(RoundRobinTopicSelector::default());
topic_pool.activate().await.unwrap();
// Selects exactly the number of `num_topics` topics one by one.
let got = (0..topics.len())
.map(|_| topic_pool.select().unwrap())
.cloned()
.collect::<Vec<_>>();
assert_eq!(got, topics);
// Selects exactly the number of `num_topics` topics in a batching manner.
let got = topic_pool
.select_batch(topics.len())
.unwrap()
.into_iter()
.map(ToString::to_string)
.collect::<Vec<_>>();
assert_eq!(got, topics);
// Selects more than the number of `num_topics` topics.
let got = topic_pool
.select_batch(2 * topics.len())
.unwrap()
.into_iter()
.map(ToString::to_string)
.collect::<Vec<_>>();
let expected = vec![topics.clone(); 2]
.into_iter()
.flatten()
.collect::<Vec<_>>();
assert_eq!(got, expected);
})
})
common_telemetry::init_default_ut_logging();
maybe_skip_kafka_integration_test!();
let num_topics = 5;
let mut topic_pool = test_kafka_topic_pool(
get_kafka_endpoints(),
num_topics,
true,
Some("test_allocator_with_kafka"),
)
.await;
topic_pool.selector = Arc::new(RoundRobinTopicSelector::default());
let topics = topic_pool.topics.clone();
// clean up the topics before test
let topic_creator = topic_pool.topic_creator();
topic_creator.delete_topics(&topics).await.unwrap();
// Selects exactly the number of `num_topics` topics one by one.
let got = (0..topics.len())
.map(|_| topic_pool.select().unwrap())
.cloned()
.collect::<Vec<_>>();
assert_eq!(got, topics);
// Selects exactly the number of `num_topics` topics in a batching manner.
let got = topic_pool
.select_batch(topics.len())
.unwrap()
.into_iter()
.map(ToString::to_string)
.collect::<Vec<_>>();
assert_eq!(got, topics);
// Selects more than the number of `num_topics` topics.
let got = topic_pool
.select_batch(2 * topics.len())
.unwrap()
.into_iter()
.map(ToString::to_string)
.collect::<Vec<_>>();
let expected = vec![topics.clone(); 2]
.into_iter()
.flatten()
.collect::<Vec<_>>();
assert_eq!(got, expected);
}
}


@@ -23,11 +23,16 @@ use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
/// The default backoff config for the Kafka client.
///
/// If the operation fails, the client will retry 3 times.
/// The backoff times are 100ms, 300ms, and 900ms.
pub const DEFAULT_BACKOFF_CONFIG: BackoffConfig = BackoffConfig {
init_backoff: Duration::from_millis(100),
max_backoff: Duration::from_secs(10),
base: 2.0,
deadline: Some(Duration::from_secs(120)),
max_backoff: Duration::from_secs(1),
base: 3.0,
// The deadline shouldn't be too long,
// otherwise the client will block the worker loop for a long time.
deadline: Some(Duration::from_secs(3)),
};
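For a rough sense of how these values combine, here is a standalone sketch of the exponential schedule they imply (100ms, 300ms, 900ms, then capped at the 1s maximum, with the 3s deadline bounding total retry time); the exact retry count and any jitter are up to rskafka's `BackoffConfig`:

```rust
use std::time::Duration;

fn main() {
    let init = Duration::from_millis(100);
    let max = Duration::from_secs(1);
    let base = 3.0_f64;
    // delay(i) = init * base^i, capped at `max`.
    let delays: Vec<Duration> = (0..5)
        .map(|i| init.mul_f64(base.powi(i)).min(max))
        .collect();
    println!("{:?}", delays); // [100ms, 300ms, 900ms, 1s, 1s]
}
```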
/// Default interval for auto WAL pruning.


@@ -31,3 +31,33 @@ where
test(endpoints).await
}
/// Get the kafka endpoints from the environment variable `GT_KAFKA_ENDPOINTS`.
///
/// The format of the environment variable is:
/// ```
/// GT_KAFKA_ENDPOINTS=localhost:9092,localhost:9093
/// ```
pub fn get_kafka_endpoints() -> Vec<String> {
let endpoints = std::env::var("GT_KAFKA_ENDPOINTS").unwrap();
endpoints
.split(',')
.map(|s| s.trim().to_string())
.collect::<Vec<_>>()
}
#[macro_export]
/// Skip the test if the environment variable `GT_KAFKA_ENDPOINTS` is not set.
///
/// The format of the environment variable is:
/// ```
/// GT_KAFKA_ENDPOINTS=localhost:9092,localhost:9093
/// ```
macro_rules! maybe_skip_kafka_integration_test {
() => {
if std::env::var("GT_KAFKA_ENDPOINTS").is_err() {
common_telemetry::warn!("The endpoints is empty, skipping the test");
return;
}
};
}
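A typical Kafka-gated test combines the macro with `get_kafka_endpoints` roughly as follows; this is a usage sketch mirroring the tests shown earlier, and `my_kafka_gated_test` is a hypothetical name:

```rust
use common_wal::maybe_skip_kafka_integration_test;
use common_wal::test_util::get_kafka_endpoints;

#[tokio::test]
async fn my_kafka_gated_test() {
    common_telemetry::init_default_ut_logging();
    // Returns early (skipping the test) when GT_KAFKA_ENDPOINTS is not set.
    maybe_skip_kafka_integration_test!();
    let endpoints = get_kafka_endpoints(); // e.g. ["localhost:9092", "localhost:9093"]
    // ... build clients / topic pools against `endpoints` ...
    let _ = endpoints;
}
```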


@@ -398,45 +398,46 @@ impl DatanodeBuilder {
schema_metadata_manager: SchemaMetadataManagerRef,
plugins: Plugins,
) -> Result<Vec<RegionEngineRef>> {
let mut engines = vec![];
let mut metric_engine_config = opts.region_engine.iter().find_map(|c| match c {
RegionEngineConfig::Metric(config) => Some(config.clone()),
_ => None,
});
let mut metric_engine_config = metric_engine::config::EngineConfig::default();
let mut mito_engine_config = MitoConfig::default();
let mut file_engine_config = file_engine::config::EngineConfig::default();
for engine in &opts.region_engine {
match engine {
RegionEngineConfig::Mito(config) => {
let mito_engine = Self::build_mito_engine(
opts,
object_store_manager.clone(),
config.clone(),
schema_metadata_manager.clone(),
plugins.clone(),
)
.await?;
let metric_engine = MetricEngine::try_new(
mito_engine.clone(),
metric_engine_config.take().unwrap_or_default(),
)
.context(BuildMetricEngineSnafu)?;
engines.push(Arc::new(mito_engine) as _);
engines.push(Arc::new(metric_engine) as _);
mito_engine_config = config.clone();
}
RegionEngineConfig::File(config) => {
let engine = FileRegionEngine::new(
config.clone(),
object_store_manager.default_object_store().clone(), // TODO: implement custom storage for file engine
);
engines.push(Arc::new(engine) as _);
file_engine_config = config.clone();
}
RegionEngineConfig::Metric(_) => {
// Already handled in `build_mito_engine`.
RegionEngineConfig::Metric(metric_config) => {
metric_engine_config = metric_config.clone();
}
}
}
Ok(engines)
let mito_engine = Self::build_mito_engine(
opts,
object_store_manager.clone(),
mito_engine_config,
schema_metadata_manager.clone(),
plugins.clone(),
)
.await?;
let metric_engine = MetricEngine::try_new(mito_engine.clone(), metric_engine_config)
.context(BuildMetricEngineSnafu)?;
let file_engine = FileRegionEngine::new(
file_engine_config,
object_store_manager.default_object_store().clone(), // TODO: implement custom storage for file engine
);
Ok(vec![
Arc::new(mito_engine) as _,
Arc::new(metric_engine) as _,
Arc::new(file_engine) as _,
])
}
/// Builds [MitoEngine] according to options.


@@ -25,6 +25,7 @@ use std::sync::Arc;
use std::time::Duration;
use common_telemetry::{info, warn};
use mito2::access_layer::{ATOMIC_WRITE_DIR, OLD_ATOMIC_WRITE_DIR};
use object_store::layers::{LruCacheLayer, RetryInterceptor, RetryLayer};
use object_store::services::Fs;
use object_store::util::{join_dir, normalize_dir, with_instrument_layers};
@@ -168,9 +169,13 @@ async fn build_cache_layer(
if let Some(path) = cache_path.as_ref()
&& !path.trim().is_empty()
{
let atomic_temp_dir = join_dir(path, ".tmp/");
let atomic_temp_dir = join_dir(path, ATOMIC_WRITE_DIR);
clean_temp_dir(&atomic_temp_dir)?;
// Compatible code. Remove this after a major release.
let old_atomic_temp_dir = join_dir(path, OLD_ATOMIC_WRITE_DIR);
clean_temp_dir(&old_atomic_temp_dir)?;
let cache_store = Fs::default()
.root(path)
.atomic_write_dir(&atomic_temp_dir)


@@ -15,6 +15,7 @@
use std::{fs, path};
use common_telemetry::info;
use mito2::access_layer::{ATOMIC_WRITE_DIR, OLD_ATOMIC_WRITE_DIR};
use object_store::services::Fs;
use object_store::util::join_dir;
use object_store::ObjectStore;
@@ -33,9 +34,13 @@ pub async fn new_fs_object_store(
.context(error::CreateDirSnafu { dir: data_home })?;
info!("The file storage home is: {}", data_home);
let atomic_write_dir = join_dir(data_home, ".tmp/");
let atomic_write_dir = join_dir(data_home, ATOMIC_WRITE_DIR);
store::clean_temp_dir(&atomic_write_dir)?;
// Compatible code. Remove this after a major release.
let old_atomic_temp_dir = join_dir(data_home, OLD_ATOMIC_WRITE_DIR);
store::clean_temp_dir(&old_atomic_temp_dir)?;
let builder = Fs::default()
.root(data_home)
.atomic_write_dir(&atomic_write_dir);


@@ -60,6 +60,7 @@ partition.workspace = true
prometheus.workspace = true
prost.workspace = true
query.workspace = true
rand.workspace = true
serde.workspace = true
servers.workspace = true
session.workspace = true


@@ -14,6 +14,7 @@
//! impl `FlowNode` trait for FlowNodeManager so standalone can call them
use std::collections::{HashMap, HashSet};
use std::sync::atomic::AtomicBool;
use std::sync::Arc;
use api::v1::flow::{
@@ -37,11 +38,12 @@ use tokio::sync::{Mutex, RwLock};
use crate::adapter::{CreateFlowArgs, StreamingEngine};
use crate::batching_mode::engine::BatchingEngine;
use crate::batching_mode::{FRONTEND_SCAN_TIMEOUT, MIN_REFRESH_DURATION};
use crate::engine::FlowEngine;
use crate::error::{
CreateFlowSnafu, ExternalSnafu, FlowNotFoundSnafu, IllegalCheckTaskStateSnafu,
InsertIntoFlowSnafu, InternalSnafu, JoinTaskSnafu, ListFlowsSnafu, SyncCheckTaskSnafu,
UnexpectedSnafu,
CreateFlowSnafu, ExternalSnafu, FlowNotFoundSnafu, FlowNotRecoveredSnafu,
IllegalCheckTaskStateSnafu, InsertIntoFlowSnafu, InternalSnafu, JoinTaskSnafu, ListFlowsSnafu,
NoAvailableFrontendSnafu, SyncCheckTaskSnafu, UnexpectedSnafu,
};
use crate::metrics::METRIC_FLOW_TASK_COUNT;
use crate::repr::{self, DiffRow};
@@ -62,6 +64,7 @@ pub struct FlowDualEngine {
flow_metadata_manager: Arc<FlowMetadataManager>,
catalog_manager: Arc<dyn CatalogManager>,
check_task: tokio::sync::Mutex<Option<ConsistentCheckTask>>,
done_recovering: AtomicBool,
}
impl FlowDualEngine {
@@ -78,9 +81,60 @@ impl FlowDualEngine {
flow_metadata_manager,
catalog_manager,
check_task: Mutex::new(None),
done_recovering: AtomicBool::new(false),
}
}
/// Sets `done_recovering` to true to
/// indicate that the engine is ready to handle requests
pub fn set_done_recovering(&self) {
info!("FlowDualEngine done recovering");
self.done_recovering
.store(true, std::sync::atomic::Ordering::Release);
}
/// Check if `done_recovering` is true
pub fn is_recover_done(&self) -> bool {
self.done_recovering
.load(std::sync::atomic::Ordering::Acquire)
}
/// Waits for recovery to finish; this should only happen when the flownode has just started
async fn wait_for_all_flow_recover(&self, waiting_req_cnt: usize) -> Result<(), Error> {
if self.is_recover_done() {
return Ok(());
}
warn!(
"FlowDualEngine is not done recovering, {} insert request waiting for recovery",
waiting_req_cnt
);
// wait 3 seconds, check every 1 second
// TODO(discord9): make this configurable
let mut retry = 0;
let max_retry = 3;
while retry < max_retry && !self.is_recover_done() {
warn!(
"FlowDualEngine is not done recovering, retry {} in 1s",
retry
);
tokio::time::sleep(std::time::Duration::from_secs(1)).await;
retry += 1;
}
if retry == max_retry {
return FlowNotRecoveredSnafu.fail();
} else {
info!("FlowDualEngine is done recovering");
}
// TODO(discord9): also send this to centralized logging for flow once it is implemented
Ok(())
}
/// Determine if the engine is in distributed mode
pub fn is_distributed(&self) -> bool {
self.streaming_engine.node_id.is_some()
}
pub fn streaming_engine(&self) -> Arc<StreamingEngine> {
self.streaming_engine.clone()
}
@@ -89,6 +143,39 @@ impl FlowDualEngine {
self.batching_engine.clone()
}
/// In distributed mode, scans periodically (every 1s) until an available frontend is found or the timeout elapses;
/// in standalone mode, returns immediately.
/// Note that this function returns as soon as any frontend appears in the cluster info.
async fn wait_for_available_frontend(&self, timeout: std::time::Duration) -> Result<(), Error> {
if !self.is_distributed() {
return Ok(());
}
let frontend_client = self.batching_engine().frontend_client.clone();
let sleep_duration = std::time::Duration::from_millis(1_000);
let now = std::time::Instant::now();
loop {
let frontend_list = frontend_client.scan_for_frontend().await?;
if !frontend_list.is_empty() {
let fe_list = frontend_list
.iter()
.map(|(_, info)| &info.peer.addr)
.collect::<Vec<_>>();
info!("Available frontend found: {:?}", fe_list);
return Ok(());
}
let elapsed = now.elapsed();
tokio::time::sleep(sleep_duration).await;
info!("Waiting for available frontend, elapsed={:?}", elapsed);
if elapsed >= timeout {
return NoAvailableFrontendSnafu {
timeout,
context: "No available frontend found in cluster info",
}
.fail();
}
}
}
/// Tries to sync with the check task. This is only used in drop flow & flush flow, so a flow id is required.
///
/// The sync is needed to make sure flush flow actually gets called
@@ -196,7 +283,7 @@ impl FlowDualEngine {
to_be_created
);
let mut errors = vec![];
for flow_id in to_be_created {
for flow_id in to_be_created.clone() {
let flow_id = *flow_id;
let info = self
.flow_metadata_manager
@@ -255,12 +342,16 @@ impl FlowDualEngine {
errors.push((flow_id, err));
}
}
if errors.is_empty() {
info!("Recover flows successfully, flows: {:?}", to_be_created);
}
for (flow_id, err) in errors {
warn!("Failed to recreate flow {}, err={:#?}", flow_id, err);
}
} else {
warn!(
"Flownode {:?} found flows not exist in flownode, flow_ids={:?}",
"Flows do not exist in flownode for node {:?}, flow_ids={:?}",
nodeid, to_be_created
);
}
@@ -280,7 +371,7 @@ impl FlowDualEngine {
}
} else {
warn!(
"Flownode {:?} found flows not exist in flownode, flow_ids={:?}",
"Flows do not exist in metadata for node {:?}, flow_ids={:?}",
nodeid, to_be_dropped
);
}
@@ -338,18 +429,38 @@ struct ConsistentCheckTask {
impl ConsistentCheckTask {
async fn start_check_task(engine: &Arc<FlowDualEngine>) -> Result<Self, Error> {
// first do recover flows
engine.check_flow_consistent(true, false).await?;
let inner = engine.clone();
let engine = engine.clone();
let (tx, mut rx) = tokio::sync::mpsc::channel(1);
let (trigger_tx, mut trigger_rx) =
tokio::sync::mpsc::channel::<(bool, bool, tokio::sync::oneshot::Sender<()>)>(10);
let handle = common_runtime::spawn_global(async move {
// first check if available frontend is found
if let Err(err) = engine
.wait_for_available_frontend(FRONTEND_SCAN_TIMEOUT)
.await
{
warn!("No frontend is available yet:\n {err:?}");
}
// then do recover flows, if failed, always retry
let mut recover_retry = 0;
while let Err(err) = engine.check_flow_consistent(true, false).await {
recover_retry += 1;
error!(
"Failed to recover flows:\n {err:?}, retry {} in {}s",
recover_retry,
MIN_REFRESH_DURATION.as_secs()
);
tokio::time::sleep(MIN_REFRESH_DURATION).await;
}
engine.set_done_recovering();
// then do check flows, with configurable allow_create and allow_drop
let (mut allow_create, mut allow_drop) = (false, false);
let mut ret_signal: Option<tokio::sync::oneshot::Sender<()>> = None;
loop {
if let Err(err) = inner.check_flow_consistent(allow_create, allow_drop).await {
if let Err(err) = engine.check_flow_consistent(allow_create, allow_drop).await {
error!(err; "Failed to check flow consistent");
}
if let Some(done) = ret_signal.take() {
@@ -534,7 +645,12 @@ impl FlowEngine for FlowDualEngine {
match flow_type {
Some(FlowType::Batching) => self.batching_engine.flush_flow(flow_id).await,
Some(FlowType::Streaming) => self.streaming_engine.flush_flow(flow_id).await,
None => Ok(0),
None => {
warn!(
"Currently flow={flow_id} doesn't exist in flownode, ignore flush_flow request"
);
Ok(0)
}
}
}
@@ -559,11 +675,14 @@ impl FlowEngine for FlowDualEngine {
&self,
request: api::v1::region::InsertRequests,
) -> Result<(), Error> {
self.wait_for_all_flow_recover(request.requests.len())
.await?;
// TODO(discord9): make as little clone as possible
let mut to_stream_engine = Vec::with_capacity(request.requests.len());
let mut to_batch_engine = request.requests;
{
// not locking this, or recover flows will be starved when also handling flow inserts
let src_table2flow = self.src_table2flow.read().await;
to_batch_engine.retain(|req| {
let region_id = RegionId::from(req.region_id);
@@ -699,9 +818,17 @@ fn to_meta_err(
location: snafu::Location,
) -> impl FnOnce(crate::error::Error) -> common_meta::error::Error {
move |err: crate::error::Error| -> common_meta::error::Error {
common_meta::error::Error::External {
location,
source: BoxedError::new(err),
match err {
crate::error::Error::FlowNotFound { id, .. } => {
common_meta::error::Error::FlowNotFound {
flow_name: format!("flow_id={id}"),
location,
}
}
_ => common_meta::error::Error::External {
location,
source: BoxedError::new(err),
},
}
}
}


@@ -31,4 +31,19 @@ pub const DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT: Duration = Duration::from_secs(
pub const SLOW_QUERY_THRESHOLD: Duration = Duration::from_secs(60);
/// The minimum duration between two queries execution by batching mode task
const MIN_REFRESH_DURATION: Duration = Duration::new(5, 0);
pub const MIN_REFRESH_DURATION: Duration = Duration::new(5, 0);
/// Grpc connection timeout
const GRPC_CONN_TIMEOUT: Duration = Duration::from_secs(5);
/// Grpc max retry number
const GRPC_MAX_RETRIES: u32 = 3;
/// Timeout for the flow to wait for an available frontend;
/// if no available frontend is found after FRONTEND_SCAN_TIMEOUT elapses, an error is returned,
/// which should prevent the flownode from starting
pub const FRONTEND_SCAN_TIMEOUT: Duration = Duration::from_secs(30);
/// Frontend activity timeout.
/// If a frontend has been down (not sending heartbeats) for more than FRONTEND_ACTIVITY_TIMEOUT, it is removed from the list the flownode uses to connect
pub const FRONTEND_ACTIVITY_TIMEOUT: Duration = Duration::from_secs(60);


@@ -39,7 +39,8 @@ use crate::batching_mode::time_window::{find_time_window_expr, TimeWindowExpr};
use crate::batching_mode::utils::sql_to_df_plan;
use crate::engine::FlowEngine;
use crate::error::{
ExternalSnafu, FlowAlreadyExistSnafu, TableNotFoundMetaSnafu, UnexpectedSnafu, UnsupportedSnafu,
ExternalSnafu, FlowAlreadyExistSnafu, FlowNotFoundSnafu, TableNotFoundMetaSnafu,
UnexpectedSnafu, UnsupportedSnafu,
};
use crate::{CreateFlowArgs, Error, FlowId, TableName};
@@ -49,7 +50,8 @@ use crate::{CreateFlowArgs, Error, FlowId, TableName};
pub struct BatchingEngine {
tasks: RwLock<BTreeMap<FlowId, BatchingTask>>,
shutdown_txs: RwLock<BTreeMap<FlowId, oneshot::Sender<()>>>,
frontend_client: Arc<FrontendClient>,
/// frontend client for insert request
pub(crate) frontend_client: Arc<FrontendClient>,
flow_metadata_manager: FlowMetadataManagerRef,
table_meta: TableMetadataManagerRef,
catalog_manager: CatalogManagerRef,
@@ -329,7 +331,7 @@ impl BatchingEngine {
let frontend = self.frontend_client.clone();
// check execute once first to detect any error early
task.check_execute(&engine, &frontend).await?;
task.check_or_create_sink_table(&engine, &frontend).await?;
// TODO(discord9): use a time wheel or something similar for better scheduling
let handle = common_runtime::spawn_global(async move {
@@ -348,7 +350,8 @@ impl BatchingEngine {
pub async fn remove_flow_inner(&self, flow_id: FlowId) -> Result<(), Error> {
if self.tasks.write().await.remove(&flow_id).is_none() {
warn!("Flow {flow_id} not found in tasks")
warn!("Flow {flow_id} not found in tasks");
FlowNotFoundSnafu { id: flow_id }.fail()?;
}
let Some(tx) = self.shutdown_txs.write().await.remove(&flow_id) else {
UnexpectedSnafu {
@@ -365,9 +368,7 @@ impl BatchingEngine {
pub async fn flush_flow_inner(&self, flow_id: FlowId) -> Result<usize, Error> {
debug!("Try flush flow {flow_id}");
let task = self.tasks.read().await.get(&flow_id).cloned();
let task = task.with_context(|| UnexpectedSnafu {
reason: format!("Can't found task for flow {flow_id}"),
})?;
let task = task.with_context(|| FlowNotFoundSnafu { id: flow_id })?;
task.mark_all_windows_as_dirty()?;


@@ -15,6 +15,7 @@
//! Frontend client to run flow as batching task which is time-window-aware normal query triggered every tick set by user
use std::sync::{Arc, Weak};
use std::time::SystemTime;
use api::v1::greptime_request::Request;
use api::v1::CreateTableExpr;
@@ -25,13 +26,19 @@ use common_meta::cluster::{NodeInfo, NodeInfoKey, Role};
use common_meta::peer::Peer;
use common_meta::rpc::store::RangeRequest;
use common_query::Output;
use common_telemetry::warn;
use meta_client::client::MetaClient;
use rand::rng;
use rand::seq::SliceRandom;
use servers::query_handler::grpc::GrpcQueryHandler;
use session::context::{QueryContextBuilder, QueryContextRef};
use snafu::{OptionExt, ResultExt};
use crate::batching_mode::DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT;
use crate::error::{ExternalSnafu, InvalidRequestSnafu, UnexpectedSnafu};
use crate::batching_mode::{
DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT, FRONTEND_ACTIVITY_TIMEOUT, GRPC_CONN_TIMEOUT,
GRPC_MAX_RETRIES,
};
use crate::error::{ExternalSnafu, InvalidRequestSnafu, NoAvailableFrontendSnafu, UnexpectedSnafu};
use crate::Error;
/// Just like [`GrpcQueryHandler`] but use BoxedError
@@ -99,7 +106,9 @@ impl FrontendClient {
Self::Distributed {
meta_client,
chnl_mgr: {
let cfg = ChannelConfig::new().timeout(DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT);
let cfg = ChannelConfig::new()
.connect_timeout(GRPC_CONN_TIMEOUT)
.timeout(DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT);
ChannelManager::with_config(cfg)
},
}
@@ -122,10 +131,24 @@ impl DatabaseWithPeer {
fn new(database: Database, peer: Peer) -> Self {
Self { database, peer }
}
/// Try sending a "SELECT 1" to the database
async fn try_select_one(&self) -> Result<(), Error> {
// note: use `sql` here so that `SELECT 1` returns 1 row
let _ = self
.database
.sql("SELECT 1")
.await
.with_context(|_| InvalidRequestSnafu {
context: format!("Failed to handle `SELECT 1` request at {:?}", self.peer),
})?;
Ok(())
}
}
impl FrontendClient {
async fn scan_for_frontend(&self) -> Result<Vec<(NodeInfoKey, NodeInfo)>, Error> {
/// scan for available frontend from metadata
pub(crate) async fn scan_for_frontend(&self) -> Result<Vec<(NodeInfoKey, NodeInfo)>, Error> {
let Self::Distributed { meta_client, .. } = self else {
return Ok(vec![]);
};
@@ -155,8 +178,9 @@ impl FrontendClient {
Ok(res)
}
/// Get the database with max `last_activity_ts`
async fn get_last_active_frontend(
/// Gets a frontend whose `last_activity_ts` is recent enough (less than 1 minute ago)
/// and which is able to process queries
async fn get_random_active_frontend(
&self,
catalog: &str,
schema: &str,
@@ -172,22 +196,50 @@ impl FrontendClient {
.fail();
};
let frontends = self.scan_for_frontend().await?;
let mut peer = None;
let mut interval = tokio::time::interval(GRPC_CONN_TIMEOUT);
interval.tick().await;
for retry in 0..GRPC_MAX_RETRIES {
let mut frontends = self.scan_for_frontend().await?;
let now_in_ms = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_millis() as i64;
// shuffle the frontends to avoid always picking the same one
frontends.shuffle(&mut rng());
if let Some((_, val)) = frontends.iter().max_by_key(|(_, val)| val.last_activity_ts) {
peer = Some(val.peer.clone());
// found node with maximum last_activity_ts
for (_, node_info) in frontends
.iter()
// filter out frontends that have been down for more than 1 min
.filter(|(_, node_info)| {
node_info.last_activity_ts + FRONTEND_ACTIVITY_TIMEOUT.as_millis() as i64
> now_in_ms
})
{
let addr = &node_info.peer.addr;
let client = Client::with_manager_and_urls(chnl_mgr.clone(), vec![addr.clone()]);
let database = Database::new(catalog, schema, client);
let db = DatabaseWithPeer::new(database, node_info.peer.clone());
match db.try_select_one().await {
Ok(_) => return Ok(db),
Err(e) => {
warn!(
"Failed to connect to frontend {} on retry={}: \n{e:?}",
addr, retry
);
}
}
}
// no available frontend
// sleep and retry
interval.tick().await;
}
let Some(peer) = peer else {
UnexpectedSnafu {
reason: format!("No frontend available: {:?}", frontends),
}
.fail()?
};
let client = Client::with_manager_and_urls(chnl_mgr.clone(), vec![peer.addr.clone()]);
let database = Database::new(catalog, schema, client);
Ok(DatabaseWithPeer::new(database, peer))
NoAvailableFrontendSnafu {
timeout: GRPC_CONN_TIMEOUT,
context: "No available frontend found that is able to process query",
}
.fail()
}
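The selection policy above amounts to: keep only recently active frontends, shuffle them, then probe each until one responds. A minimal standalone sketch with simplified types follows; `is_healthy` stands in for the `SELECT 1` probe, and none of these names come from the crate itself:

```rust
use rand::rng;
use rand::seq::SliceRandom;

struct Frontend {
    addr: String,
    last_activity_ts: i64, // milliseconds since the Unix epoch
}

fn pick_frontend(
    mut candidates: Vec<Frontend>,
    now_ms: i64,
    activity_timeout_ms: i64,
    is_healthy: impl Fn(&Frontend) -> bool,
) -> Option<Frontend> {
    // Drop frontends that have not reported activity recently enough.
    candidates.retain(|f| f.last_activity_ts + activity_timeout_ms > now_ms);
    // Shuffle so the same frontend is not always picked first.
    candidates.shuffle(&mut rng());
    // Probe each remaining candidate and return the first healthy one.
    candidates.into_iter().find(|f| is_healthy(f))
}
```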
pub async fn create(
@@ -217,17 +269,17 @@ impl FrontendClient {
) -> Result<u32, Error> {
match self {
FrontendClient::Distributed { .. } => {
let db = self.get_last_active_frontend(catalog, schema).await?;
let db = self.get_random_active_frontend(catalog, schema).await?;
*peer_desc = Some(PeerDesc::Dist {
peer: db.peer.clone(),
});
db.database
.handle(req.clone())
.handle_with_retry(req.clone(), GRPC_MAX_RETRIES)
.await
.with_context(|_| InvalidRequestSnafu {
context: format!("Failed to handle request: {:?}", req),
context: format!("Failed to handle request at {:?}: {:?}", db.peer, req),
})
}
FrontendClient::Standalone { database_client } => {


@@ -71,18 +71,33 @@ impl TaskState {
self.last_update_time = Instant::now();
}
/// wait for at least `last_query_duration`, at most `max_timeout` to start next query
/// Compute the next query delay based on the time window size or the last query duration.
/// This aims to avoid overly frequent queries while also keeping the delay from growing too long.
/// The delay is computed as follows:
/// - If `time_window_size` is set, the delay is half the time window size, constrained to be
/// at least `last_query_duration` and at most `max_timeout`.
/// - If `time_window_size` is not set, the delay defaults to `last_query_duration`, constrained
/// to be at least `MIN_REFRESH_DURATION` and at most `max_timeout`.
///
/// if have more dirty time window, exec next query immediately
/// If there are dirty time windows, the function returns an immediate execution time to clean them.
/// TODO: Make this behavior configurable.
pub fn get_next_start_query_time(
&self,
flow_id: FlowId,
time_window_size: &Option<Duration>,
max_timeout: Option<Duration>,
) -> Instant {
let next_duration = max_timeout
let last_duration = max_timeout
.unwrap_or(self.last_query_duration)
.min(self.last_query_duration);
let next_duration = next_duration.max(MIN_REFRESH_DURATION);
.min(self.last_query_duration)
.max(MIN_REFRESH_DURATION);
let next_duration = time_window_size
.map(|t| {
let half = t / 2;
half.max(last_duration)
})
.unwrap_or(last_duration);
// if there are dirty time windows, execute immediately to clean them
if self.dirty_time_windows.windows.is_empty() {

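Plugging some made-up durations into the rule described for `get_next_start_query_time` above gives a feel for the result; this is a standalone sketch, and the real method additionally handles dirty time windows and shutdown:

```rust
use std::time::Duration;

const MIN_REFRESH_DURATION: Duration = Duration::new(5, 0);

fn next_delay(
    last_query_duration: Duration,
    time_window_size: Option<Duration>,
    max_timeout: Option<Duration>,
) -> Duration {
    // Clamp the last query duration between MIN_REFRESH_DURATION and max_timeout.
    let last = max_timeout
        .unwrap_or(last_query_duration)
        .min(last_query_duration)
        .max(MIN_REFRESH_DURATION);
    // Prefer half the time window when it is known.
    time_window_size.map(|w| (w / 2).max(last)).unwrap_or(last)
}

fn main() {
    // A 10-minute time window with a fast last query sleeps half the window: 5 minutes.
    assert_eq!(
        next_delay(Duration::from_secs(3), Some(Duration::from_secs(600)), Some(Duration::from_secs(600))),
        Duration::from_secs(300)
    );
    // Without a known window size, the delay falls back to the clamped last query duration.
    assert_eq!(
        next_delay(Duration::from_secs(3), None, Some(Duration::from_secs(600))),
        MIN_REFRESH_DURATION
    );
}
```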

@@ -53,6 +53,7 @@ use crate::batching_mode::utils::{
use crate::batching_mode::{
DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT, MIN_REFRESH_DURATION, SLOW_QUERY_THRESHOLD,
};
use crate::df_optimizer::apply_df_optimizer;
use crate::error::{
ConvertColumnSchemaSnafu, DatafusionSnafu, ExternalSnafu, InvalidQuerySnafu,
SubstraitEncodeLogicalPlanSnafu, UnexpectedSnafu,
@@ -141,26 +142,12 @@ impl BatchingTask {
Ok(())
}
/// Test execute, for check syntax or such
pub async fn check_execute(
/// Create sink table if not exists
pub async fn check_or_create_sink_table(
&self,
engine: &QueryEngineRef,
frontend_client: &Arc<FrontendClient>,
) -> Result<Option<(u32, Duration)>, Error> {
// use current time to test get a dirty time window, which should be safe
let start = SystemTime::now();
let ts = Timestamp::new_second(
start
.duration_since(UNIX_EPOCH)
.expect("Time went backwards")
.as_secs() as _,
);
self.state
.write()
.unwrap()
.dirty_time_windows
.add_lower_bounds(vec![ts].into_iter());
if !self.is_table_exist(&self.config.sink_table_name).await? {
let create_table = self.gen_create_table_expr(engine.clone()).await?;
info!(
@@ -173,7 +160,8 @@ impl BatchingTask {
self.config.sink_table_name.join(".")
);
}
self.gen_exec_once(engine, frontend_client).await
Ok(None)
}
async fn is_table_exist(&self, table_name: &[String; 3]) -> Result<bool, Error> {
@@ -392,6 +380,23 @@ impl BatchingTask {
frontend_client: Arc<FrontendClient>,
) {
loop {
// first check if shutdown signal is received
// if so, break the loop
{
let mut state = self.state.write().unwrap();
match state.shutdown_rx.try_recv() {
Ok(()) => break,
Err(TryRecvError::Closed) => {
warn!(
"Unexpected shutdown flow {}, shutdown anyway",
self.config.flow_id
);
break;
}
Err(TryRecvError::Empty) => (),
}
}
let mut new_query = None;
let mut gen_and_exec = async || {
new_query = self.gen_insert_plan(&engine).await?;
@@ -405,20 +410,15 @@ impl BatchingTask {
// normal execute, sleep for some time before doing next query
Ok(Some(_)) => {
let sleep_until = {
let mut state = self.state.write().unwrap();
match state.shutdown_rx.try_recv() {
Ok(()) => break,
Err(TryRecvError::Closed) => {
warn!(
"Unexpected shutdown flow {}, shutdown anyway",
self.config.flow_id
);
break;
}
Err(TryRecvError::Empty) => (),
}
let state = self.state.write().unwrap();
state.get_next_start_query_time(
self.config.flow_id,
&self
.config
.time_window_expr
.as_ref()
.and_then(|t| *t.time_window_size()),
Some(DEFAULT_BATCHING_ENGINE_QUERY_TIMEOUT),
)
};
@@ -541,7 +541,10 @@ impl BatchingTask {
.clone()
.rewrite(&mut add_auto_column)
.with_context(|_| DatafusionSnafu {
context: format!("Failed to rewrite plan {:?}", self.config.plan),
context: format!(
"Failed to rewrite plan:\n {}\n",
self.config.plan
),
})?
.data;
let schema_len = plan.schema().fields().len();
@@ -573,16 +576,19 @@ impl BatchingTask {
let mut add_filter = AddFilterRewriter::new(expr);
let mut add_auto_column = AddAutoColumnRewriter::new(sink_table_schema.clone());
// make an unoptimized plan for clearer unparsing
let plan = sql_to_df_plan(query_ctx.clone(), engine.clone(), &self.config.query, false)
.await?;
plan.clone()
let rewrite = plan
.clone()
.rewrite(&mut add_filter)
.and_then(|p| p.data.rewrite(&mut add_auto_column))
.with_context(|_| DatafusionSnafu {
context: format!("Failed to rewrite plan {plan:?}"),
context: format!("Failed to rewrite plan:\n {}\n", plan),
})?
.data
.data;
// only apply optimize after complex rewrite is done
apply_df_optimizer(rewrite).await?
};
Ok(Some((new_plan, schema_len)))


@@ -55,6 +55,9 @@ use crate::error::{
use crate::expr::error::DataTypeSnafu;
use crate::Error;
/// Represents a test timestamp in seconds since the Unix epoch.
const DEFAULT_TEST_TIMESTAMP: Timestamp = Timestamp::new_second(17_0000_0000);
/// A time window expr like `date_bin(INTERVAL '1' MINUTE, ts)`; this type helps with
/// evaluating the expr using a given timestamp
///
@@ -70,6 +73,7 @@ pub struct TimeWindowExpr {
pub column_name: String,
logical_expr: Expr,
df_schema: DFSchema,
eval_time_window_size: Option<std::time::Duration>,
}
impl std::fmt::Display for TimeWindowExpr {
@@ -84,6 +88,11 @@ impl std::fmt::Display for TimeWindowExpr {
}
impl TimeWindowExpr {
/// The time window size of the expr, obtained by calling `eval` with a test timestamp
pub fn time_window_size(&self) -> &Option<std::time::Duration> {
&self.eval_time_window_size
}
pub fn from_expr(
expr: &Expr,
column_name: &str,
@@ -91,12 +100,28 @@ impl TimeWindowExpr {
session: &SessionState,
) -> Result<Self, Error> {
let phy_expr: PhysicalExprRef = to_phy_expr(expr, df_schema, session)?;
Ok(Self {
let mut zelf = Self {
phy_expr,
column_name: column_name.to_string(),
logical_expr: expr.clone(),
df_schema: df_schema.clone(),
})
eval_time_window_size: None,
};
let test_ts = DEFAULT_TEST_TIMESTAMP;
let (l, u) = zelf.eval(test_ts)?;
let time_window_size = match (l, u) {
(Some(l), Some(u)) => u.sub(&l).map(|r| r.to_std()).transpose().map_err(|_| {
UnexpectedSnafu {
reason: format!(
"Expect upper bound older than lower bound, found upper={u:?} and lower={l:?}"
),
}
.build()
})?,
_ => None,
};
zelf.eval_time_window_size = time_window_size;
Ok(zelf)
}
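As a worked example of the derivation above (plain integer arithmetic, assuming a one-minute `date_bin` and second-precision timestamps):

```rust
fn main() {
    // DEFAULT_TEST_TIMESTAMP expressed in seconds since the Unix epoch.
    let test_ts: i64 = 1_700_000_000;
    let bin: i64 = 60; // date_bin(INTERVAL '1' MINUTE, ts)
    let lower = test_ts / bin * bin; // 1_699_999_980
    let upper = lower + bin;         // 1_700_000_040
    // The derived window size is upper - lower, i.e. 60 seconds.
    assert_eq!(upper - lower, 60);
}
```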
pub fn eval(
@@ -704,6 +729,28 @@ mod test {
),
"SELECT arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)') AS time_window FROM numbers_with_ts WHERE ((ts >= CAST('2025-02-24 10:48:00' AS TIMESTAMP)) AND (ts <= CAST('2025-02-24 10:49:00' AS TIMESTAMP))) GROUP BY arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)')"
),
// complex time window index with where
(
"SELECT arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)') AS time_window FROM numbers_with_ts WHERE number in (2, 3, 4) GROUP BY time_window;",
Timestamp::new(1740394109, TimeUnit::Second),
(
"ts".to_string(),
Some(Timestamp::new(1740394080, TimeUnit::Second)),
Some(Timestamp::new(1740394140, TimeUnit::Second)),
),
"SELECT arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)') AS time_window FROM numbers_with_ts WHERE numbers_with_ts.number IN (2, 3, 4) AND ((ts >= CAST('2025-02-24 10:48:00' AS TIMESTAMP)) AND (ts <= CAST('2025-02-24 10:49:00' AS TIMESTAMP))) GROUP BY arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)')"
),
// complex time window index with between and
(
"SELECT arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)') AS time_window FROM numbers_with_ts WHERE number BETWEEN 2 AND 4 GROUP BY time_window;",
Timestamp::new(1740394109, TimeUnit::Second),
(
"ts".to_string(),
Some(Timestamp::new(1740394080, TimeUnit::Second)),
Some(Timestamp::new(1740394140, TimeUnit::Second)),
),
"SELECT arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)') AS time_window FROM numbers_with_ts WHERE (numbers_with_ts.number BETWEEN 2 AND 4) AND ((ts >= CAST('2025-02-24 10:48:00' AS TIMESTAMP)) AND (ts <= CAST('2025-02-24 10:49:00' AS TIMESTAMP))) GROUP BY arrow_cast(date_bin(INTERVAL '1 MINS', numbers_with_ts.ts), 'Timestamp(Second, None)')"
),
// no time index
(
"SELECT date_bin('5 minutes', ts) FROM numbers_with_ts;",


@@ -342,8 +342,8 @@ impl TreeNodeRewriter for AddAutoColumnRewriter {
}
} else {
return Err(DataFusionError::Plan(format!(
"Expect table have 0,1 or 2 columns more than query columns, found {} query columns {:?}, {} table columns {:?} at node {:?}",
query_col_cnt, exprs, table_col_cnt, self.schema.column_schemas(), node
"Expect table have 0,1 or 2 columns more than query columns, found {} query columns {:?}, {} table columns {:?}",
query_col_cnt, exprs, table_col_cnt, self.schema.column_schemas()
)));
}
@@ -406,7 +406,9 @@ mod test {
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema};
use pretty_assertions::assert_eq;
use query::query_engine::DefaultSerializer;
use session::context::QueryContext;
use substrait::{DFLogicalSubstraitConvertor, SubstraitPlan};
use super::*;
use crate::test_utils::create_test_query_engine;
@@ -701,4 +703,18 @@ mod test {
);
}
}
#[tokio::test]
async fn test_null_cast() {
let query_engine = create_test_query_engine();
let ctx = QueryContext::arc();
let sql = "SELECT NULL::DOUBLE FROM numbers_with_ts";
let plan = sql_to_df_plan(ctx, query_engine.clone(), sql, false)
.await
.unwrap();
let _sub_plan = DFLogicalSubstraitConvertor {}
.encode(&plan, DefaultSerializer)
.unwrap();
}
}


@@ -25,7 +25,6 @@ use datafusion::config::ConfigOptions;
use datafusion::error::DataFusionError;
use datafusion::functions_aggregate::count::count_udaf;
use datafusion::functions_aggregate::sum::sum_udaf;
use datafusion::optimizer::analyzer::count_wildcard_rule::CountWildcardRule;
use datafusion::optimizer::analyzer::type_coercion::TypeCoercion;
use datafusion::optimizer::common_subexpr_eliminate::CommonSubexprEliminate;
use datafusion::optimizer::optimize_projections::OptimizeProjections;
@@ -42,6 +41,7 @@ use datafusion_expr::{
BinaryExpr, ColumnarValue, Expr, Operator, Projection, ScalarFunctionArgs, ScalarUDFImpl,
Signature, TypeSignature, Volatility,
};
use query::optimizer::count_wildcard::CountWildcardToTimeIndexRule;
use query::parser::QueryLanguageParser;
use query::query_engine::DefaultSerializer;
use query::QueryEngine;
@@ -61,9 +61,9 @@ pub async fn apply_df_optimizer(
) -> Result<datafusion_expr::LogicalPlan, Error> {
let cfg = ConfigOptions::new();
let analyzer = Analyzer::with_rules(vec![
Arc::new(CountWildcardRule::new()),
Arc::new(AvgExpandRule::new()),
Arc::new(TumbleExpandRule::new()),
Arc::new(CountWildcardToTimeIndexRule),
Arc::new(AvgExpandRule),
Arc::new(TumbleExpandRule),
Arc::new(CheckGroupByRule::new()),
Arc::new(TypeCoercion::new()),
]);
@@ -128,13 +128,7 @@ pub async fn sql_to_flow_plan(
}
#[derive(Debug)]
struct AvgExpandRule {}
impl AvgExpandRule {
pub fn new() -> Self {
Self {}
}
}
struct AvgExpandRule;
impl AnalyzerRule for AvgExpandRule {
fn analyze(
@@ -331,13 +325,7 @@ impl TreeNodeRewriter for ExpandAvgRewriter<'_> {
/// expand tumble in aggr expr to tumble_start and tumble_end with column name like `window_start`
#[derive(Debug)]
struct TumbleExpandRule {}
impl TumbleExpandRule {
pub fn new() -> Self {
Self {}
}
}
struct TumbleExpandRule;
impl AnalyzerRule for TumbleExpandRule {
fn analyze(


@@ -46,6 +46,12 @@ pub enum Error {
location: Location,
},
#[snafu(display("Flow engine is still recovering"))]
FlowNotRecovered {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Error encountered while creating flow: {sql}"))]
CreateFlow {
sql: String,
@@ -61,6 +67,16 @@ pub enum Error {
location: Location,
},
#[snafu(display(
"No available frontend found after timeout: {timeout:?}, context: {context}"
))]
NoAvailableFrontend {
timeout: std::time::Duration,
context: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("External error"))]
External {
source: BoxedError,
@@ -296,12 +312,14 @@ impl ErrorExt for Error {
Self::Eval { .. }
| Self::JoinTask { .. }
| Self::Datafusion { .. }
| Self::InsertIntoFlow { .. } => StatusCode::Internal,
| Self::InsertIntoFlow { .. }
| Self::NoAvailableFrontend { .. }
| Self::FlowNotRecovered { .. } => StatusCode::Internal,
Self::FlowAlreadyExist { .. } => StatusCode::TableAlreadyExists,
Self::TableNotFound { .. }
| Self::TableNotFoundMeta { .. }
| Self::FlowNotFound { .. }
| Self::ListFlows { .. } => StatusCode::TableNotFound,
Self::FlowNotFound { .. } => StatusCode::FlowNotFound,
Self::Plan { .. } | Self::Datatypes { .. } => StatusCode::PlanQuery,
Self::CreateFlow { .. } | Self::Arrow { .. } | Self::Time { .. } => {
StatusCode::EngineExecuteQuery


@@ -60,7 +60,7 @@ pub enum GenericFn {
Mul,
Div,
Mod,
// varadic func
// variadic func
And,
Or,
// unmaterialized func


@@ -43,7 +43,7 @@ use servers::error::{StartGrpcSnafu, TcpBindSnafu, TcpIncomingSnafu};
use servers::http::HttpServerBuilder;
use servers::metrics_handler::MetricsHandler;
use servers::server::{ServerHandler, ServerHandlers};
use session::context::{QueryContextBuilder, QueryContextRef};
use session::context::QueryContextRef;
use snafu::{OptionExt, ResultExt};
use tokio::net::TcpListener;
use tokio::sync::{broadcast, oneshot, Mutex};
@@ -54,18 +54,18 @@ use tonic::{Request, Response, Status};
use crate::adapter::flownode_impl::{FlowDualEngine, FlowDualEngineRef};
use crate::adapter::{create_worker, FlowStreamingEngineRef};
use crate::batching_mode::engine::BatchingEngine;
use crate::engine::FlowEngine;
use crate::error::{
to_status_with_last_err, CacheRequiredSnafu, CreateFlowSnafu, ExternalSnafu, FlowNotFoundSnafu,
ListFlowsSnafu, ParseAddrSnafu, ShutdownServerSnafu, StartServerSnafu, UnexpectedSnafu,
to_status_with_last_err, CacheRequiredSnafu, ExternalSnafu, ListFlowsSnafu, ParseAddrSnafu,
ShutdownServerSnafu, StartServerSnafu, UnexpectedSnafu,
};
use crate::heartbeat::HeartbeatTask;
use crate::metrics::{METRIC_FLOW_PROCESSING_TIME, METRIC_FLOW_ROWS};
use crate::transform::register_function_to_query_engine;
use crate::utils::{SizeReportSender, StateReportHandler};
use crate::{CreateFlowArgs, Error, FlownodeOptions, FrontendClient, StreamingEngine};
use crate::{Error, FlownodeOptions, FrontendClient, StreamingEngine};
pub const FLOW_NODE_SERVER_NAME: &str = "FLOW_NODE_SERVER";
/// wrapping flow node manager to avoid orphan rule with Arc<...>
#[derive(Clone)]
pub struct FlowService {
@@ -172,6 +172,8 @@ impl FlownodeServer {
}
/// Start the background task for streaming computation.
///
/// Should be called only after the heartbeat is established, hence it can get cluster info
async fn start_workers(&self) -> Result<(), Error> {
let manager_ref = self.inner.flow_service.dual_engine.clone();
let handle = manager_ref
@@ -395,109 +397,6 @@ impl FlownodeBuilder {
Ok(instance)
}
/// recover all flow tasks in this flownode in distributed mode(nodeid is Some(<num>))
///
/// or recover all existing flow tasks if in standalone mode(nodeid is None)
///
/// TODO(discord9): persistent flow tasks with internal state
async fn recover_flows(&self, manager: &FlowDualEngine) -> Result<usize, Error> {
let nodeid = self.opts.node_id;
let to_be_recovered: Vec<_> = if let Some(nodeid) = nodeid {
let to_be_recover = self
.flow_metadata_manager
.flownode_flow_manager()
.flows(nodeid)
.try_collect::<Vec<_>>()
.await
.context(ListFlowsSnafu { id: Some(nodeid) })?;
to_be_recover.into_iter().map(|(id, _)| id).collect()
} else {
let all_catalogs = self
.catalog_manager
.catalog_names()
.await
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
let mut all_flow_ids = vec![];
for catalog in all_catalogs {
let flows = self
.flow_metadata_manager
.flow_name_manager()
.flow_names(&catalog)
.await
.try_collect::<Vec<_>>()
.await
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
all_flow_ids.extend(flows.into_iter().map(|(_, id)| id.flow_id()));
}
all_flow_ids
};
let cnt = to_be_recovered.len();
// TODO(discord9): recover in parallel
info!("Recovering {} flows: {:?}", cnt, to_be_recovered);
for flow_id in to_be_recovered {
let info = self
.flow_metadata_manager
.flow_info_manager()
.get(flow_id)
.await
.map_err(BoxedError::new)
.context(ExternalSnafu)?
.context(FlowNotFoundSnafu { id: flow_id })?;
let sink_table_name = [
info.sink_table_name().catalog_name.clone(),
info.sink_table_name().schema_name.clone(),
info.sink_table_name().table_name.clone(),
];
let args = CreateFlowArgs {
flow_id: flow_id as _,
sink_table_name,
source_table_ids: info.source_table_ids().to_vec(),
// because recover should only happen on restart the `create_if_not_exists` and `or_replace` can be arbitrary value(since flow doesn't exist)
// but for the sake of consistency and to make sure recover of flow actually happen, we set both to true
// (which is also fine since checks for not allow both to be true is on metasrv and we already pass that)
create_if_not_exists: true,
or_replace: true,
expire_after: info.expire_after(),
comment: Some(info.comment().clone()),
sql: info.raw_sql().clone(),
flow_options: info.options().clone(),
query_ctx: info
.query_context()
.clone()
.map(|ctx| {
ctx.try_into()
.map_err(BoxedError::new)
.context(ExternalSnafu)
})
.transpose()?
// or use default QueryContext with catalog_name from info
// to keep compatibility with old version
.or_else(|| {
Some(
QueryContextBuilder::default()
.current_catalog(info.catalog_name().to_string())
.build(),
)
}),
};
manager
.create_flow(args)
.await
.map_err(BoxedError::new)
.with_context(|_| CreateFlowSnafu {
sql: info.raw_sql().clone(),
})?;
}
Ok(cnt)
}
/// build [`FlowWorkerManager`], note this doesn't take ownership of `self`,
/// nor does it actually start running the worker.
async fn build_manager(
@@ -682,7 +581,7 @@ impl FrontendInvoker {
.start_timer();
self.inserter
.handle_row_inserts(requests, ctx, &self.statement_executor)
.handle_row_inserts(requests, ctx, &self.statement_executor, false)
.await
.map_err(BoxedError::new)
.context(common_frontend::error::ExternalSnafu)


@@ -72,7 +72,10 @@ impl GrpcQueryHandler for Instance {
let output = match request {
Request::Inserts(requests) => self.handle_inserts(requests, ctx.clone()).await?,
Request::RowInserts(requests) => self.handle_row_inserts(requests, ctx.clone()).await?,
Request::RowInserts(requests) => {
self.handle_row_inserts(requests, ctx.clone(), false)
.await?
}
Request::Deletes(requests) => self.handle_deletes(requests, ctx.clone()).await?,
Request::RowDeletes(requests) => self.handle_row_deletes(requests, ctx.clone()).await?,
Request::Query(query_request) => {
@@ -407,9 +410,15 @@ impl Instance {
&self,
requests: RowInsertRequests,
ctx: QueryContextRef,
accommodate_existing_schema: bool,
) -> Result<Output> {
self.inserter
.handle_row_inserts(requests, ctx, self.statement_executor.as_ref())
.handle_row_inserts(
requests,
ctx,
self.statement_executor.as_ref(),
accommodate_existing_schema,
)
.await
.context(TableOperationSnafu)
}
@@ -421,7 +430,7 @@ impl Instance {
ctx: QueryContextRef,
) -> Result<Output> {
self.inserter
.handle_last_non_null_inserts(requests, ctx, self.statement_executor.as_ref())
.handle_last_non_null_inserts(requests, ctx, self.statement_executor.as_ref(), true)
.await
.context(TableOperationSnafu)
}


@@ -53,7 +53,7 @@ impl OpentsdbProtocolHandler for Instance {
};
let output = self
.handle_row_inserts(requests, ctx)
.handle_row_inserts(requests, ctx, true)
.await
.map_err(BoxedError::new)
.context(servers::error::ExecuteGrpcQuerySnafu)?;


@@ -63,7 +63,7 @@ impl OpenTelemetryProtocolHandler for Instance {
None
};
self.handle_row_inserts(requests, ctx)
self.handle_row_inserts(requests, ctx, false)
.await
.map_err(BoxedError::new)
.context(error::ExecuteGrpcQuerySnafu)


@@ -195,7 +195,7 @@ impl PromStoreProtocolHandler for Instance {
.map_err(BoxedError::new)
.context(error::ExecuteGrpcQuerySnafu)?
} else {
self.handle_row_inserts(request, ctx.clone())
self.handle_row_inserts(request, ctx.clone(), true)
.await
.map_err(BoxedError::new)
.context(error::ExecuteGrpcQuerySnafu)?


@@ -182,6 +182,14 @@ impl ClientManager {
}
}
#[cfg(test)]
impl ClientManager {
/// Returns the controller client.
pub(crate) fn controller_client(&self) -> rskafka::client::controller::ControllerClient {
self.client.controller_client().unwrap()
}
}
#[cfg(test)]
mod tests {
use common_wal::test_util::run_test_with_kafka_wal;


@@ -552,6 +552,14 @@ mod tests {
.collect()
}
async fn prepare_topic(logstore: &KafkaLogStore, topic_name: &str) {
let controller_client = logstore.client_manager.controller_client();
controller_client
.create_topic(topic_name.to_string(), 1, 1, 5000)
.await
.unwrap();
}
#[tokio::test]
async fn test_append_batch_basic() {
common_telemetry::init_default_ut_logging();
@@ -573,7 +581,9 @@ mod tests {
};
let logstore = KafkaLogStore::try_new(&config, None).await.unwrap();
let topic_name = uuid::Uuid::new_v4().to_string();
prepare_topic(&logstore, &topic_name).await;
let provider = Provider::kafka_provider(topic_name);
let region_entries = (0..5)
.map(|i| {
let region_id = RegionId::new(1, i);
@@ -647,6 +657,7 @@ mod tests {
};
let logstore = KafkaLogStore::try_new(&config, None).await.unwrap();
let topic_name = uuid::Uuid::new_v4().to_string();
prepare_topic(&logstore, &topic_name).await;
let provider = Provider::kafka_provider(topic_name);
let region_entries = (0..5)
.map(|i| {


@@ -14,7 +14,7 @@
pub mod builder;
use std::fmt::Display;
use std::fmt::{self, Display};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
@@ -96,7 +96,7 @@ pub enum BackendImpl {
MysqlStore,
}
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
#[derive(Clone, PartialEq, Serialize, Deserialize)]
#[serde(default)]
pub struct MetasrvOptions {
/// The address the server listens on.
@@ -166,6 +166,47 @@ pub struct MetasrvOptions {
pub node_max_idle_time: Duration,
}
impl fmt::Debug for MetasrvOptions {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let mut debug_struct = f.debug_struct("MetasrvOptions");
debug_struct
.field("bind_addr", &self.bind_addr)
.field("server_addr", &self.server_addr)
.field("store_addrs", &self.sanitize_store_addrs())
.field("selector", &self.selector)
.field("use_memory_store", &self.use_memory_store)
.field("enable_region_failover", &self.enable_region_failover)
.field(
"allow_region_failover_on_local_wal",
&self.allow_region_failover_on_local_wal,
)
.field("http", &self.http)
.field("logging", &self.logging)
.field("procedure", &self.procedure)
.field("failure_detector", &self.failure_detector)
.field("datanode", &self.datanode)
.field("enable_telemetry", &self.enable_telemetry)
.field("data_home", &self.data_home)
.field("wal", &self.wal)
.field("export_metrics", &self.export_metrics)
.field("store_key_prefix", &self.store_key_prefix)
.field("max_txn_ops", &self.max_txn_ops)
.field("flush_stats_factor", &self.flush_stats_factor)
.field("tracing", &self.tracing)
.field("backend", &self.backend);
#[cfg(any(feature = "pg_kvbackend", feature = "mysql_kvbackend"))]
debug_struct.field("meta_table_name", &self.meta_table_name);
#[cfg(feature = "pg_kvbackend")]
debug_struct.field("meta_election_lock_id", &self.meta_election_lock_id);
debug_struct
.field("node_max_idle_time", &self.node_max_idle_time)
.finish()
}
}
const DEFAULT_METASRV_ADDR_PORT: &str = "3002";
impl Default for MetasrvOptions {
@@ -249,6 +290,13 @@ impl MetasrvOptions {
common_telemetry::debug!("detect local IP is not supported on Android");
}
}
fn sanitize_store_addrs(&self) -> Vec<String> {
self.store_addrs
.iter()
.map(|addr| common_meta::kv_backend::util::sanitize_connection_string(addr))
.collect()
}
}
pub struct MetasrvInfo {


@@ -365,7 +365,7 @@ impl MetasrvBuilder {
let (tx, rx) = WalPruneManager::channel();
// Safety: Must be remote WAL.
let remote_wal_options = options.wal.remote_wal_options().unwrap();
let kafka_client = build_kafka_client(remote_wal_options)
let kafka_client = build_kafka_client(&remote_wal_options.connection)
.await
.context(error::BuildKafkaClientSnafu)?;
let wal_prune_context = WalPruneContext {


@@ -52,7 +52,7 @@ use crate::Result;
pub type KafkaClientRef = Arc<Client>;
const DELETE_RECORDS_TIMEOUT: Duration = Duration::from_secs(1);
const DELETE_RECORDS_TIMEOUT: Duration = Duration::from_secs(5);
/// The state of WAL pruning.
#[derive(Debug, Serialize, Deserialize)]
@@ -558,6 +558,7 @@ mod tests {
topic_name = format!("test_procedure_execution-{}", topic_name);
let mut env = TestEnv::new();
let context = env.build_wal_prune_context(broker_endpoints).await;
TestEnv::prepare_topic(&context.client, &topic_name).await;
let mut procedure = WalPruneProcedure::new(topic_name.clone(), context, 10, None);
// Before any data in kvbackend is mocked, should return a retryable error.


@@ -78,7 +78,7 @@ impl TestEnv {
kafka_topic,
..Default::default()
};
Arc::new(build_kafka_client(&config).await.unwrap())
Arc::new(build_kafka_client(&config.connection).await.unwrap())
}
pub async fn build_wal_prune_context(&self, broker_endpoints: Vec<String>) -> WalPruneContext {
@@ -91,4 +91,12 @@ impl TestEnv {
mailbox: self.mailbox.mailbox().clone(),
}
}
pub async fn prepare_topic(client: &Arc<Client>, topic_name: &str) {
let controller_client = client.controller_client().unwrap();
controller_client
.create_topic(topic_name.to_string(), 1, 1, 5000)
.await
.unwrap();
}
}


@@ -16,7 +16,7 @@ use std::sync::Arc;
use object_store::services::Fs;
use object_store::util::{join_dir, with_instrument_layers};
use object_store::ObjectStore;
use object_store::{ErrorKind, ObjectStore};
use smallvec::SmallVec;
use snafu::ResultExt;
use store_api::metadata::RegionMetadataRef;
@@ -42,6 +42,10 @@ pub type AccessLayerRef = Arc<AccessLayer>;
/// SST write results.
pub type SstInfoArray = SmallVec<[SstInfo; 2]>;
pub const ATOMIC_WRITE_DIR: &str = "tmp/";
/// For compatibility. Remove this after a major version release.
pub const OLD_ATOMIC_WRITE_DIR: &str = ".tmp/";
/// A layer to access SST files under the same directory.
pub struct AccessLayer {
region_dir: String,
@@ -160,13 +164,18 @@ impl AccessLayer {
fulltext_index_config: request.fulltext_index_config,
bloom_filter_index_config: request.bloom_filter_index_config,
};
// We disable write cache on file system but we still use atomic write.
// TODO(yingwen): If we support other non-fs stores without the write cache, then
// we may have to find a way to check whether we need the cleaner.
let cleaner = TempFileCleaner::new(region_id, self.object_store.clone());
let mut writer = ParquetWriter::new_with_object_store(
self.object_store.clone(),
request.metadata,
indexer_builder,
path_provider,
)
.await;
.await
.with_file_cleaner(cleaner);
writer
.write_all(request.source, request.max_sequence, write_opts)
.await?
@@ -213,10 +222,85 @@ pub struct SstWriteRequest {
pub bloom_filter_index_config: BloomFilterConfig,
}
/// Cleaner to remove temp files on the atomic write dir.
pub(crate) struct TempFileCleaner {
region_id: RegionId,
object_store: ObjectStore,
}
impl TempFileCleaner {
/// Constructs the cleaner for the region and store.
pub(crate) fn new(region_id: RegionId, object_store: ObjectStore) -> Self {
Self {
region_id,
object_store,
}
}
/// Removes the SST and index file from the local atomic dir by the file id.
pub(crate) async fn clean_by_file_id(&self, file_id: FileId) {
let sst_key = IndexKey::new(self.region_id, file_id, FileType::Parquet).to_string();
let index_key = IndexKey::new(self.region_id, file_id, FileType::Puffin).to_string();
Self::clean_atomic_dir_files(&self.object_store, &[&sst_key, &index_key]).await;
}
/// Removes the files from the local atomic dir by their names.
pub(crate) async fn clean_atomic_dir_files(
local_store: &ObjectStore,
names_to_remove: &[&str],
) {
// We don't know the actual suffix of the files under the atomic dir, so we have
// to list the dir. The cost should be acceptable as there won't be too many files.
let Ok(entries) = local_store.list(ATOMIC_WRITE_DIR).await.inspect_err(|e| {
if e.kind() != ErrorKind::NotFound {
common_telemetry::error!(e; "Failed to list tmp files for {:?}", names_to_remove)
}
}) else {
return;
};
// In our case, we can ensure the file id is unique so it is safe to remove all files
// with the same file id under the atomic write dir.
let actual_files: Vec<_> = entries
.into_iter()
.filter_map(|entry| {
if entry.metadata().is_dir() {
return None;
}
// Collect entries whose names match `names_to_remove`.
let should_remove = names_to_remove
.iter()
.any(|file| entry.name().starts_with(file));
if should_remove {
Some(entry.path().to_string())
} else {
None
}
})
.collect();
common_telemetry::warn!(
"Cleaning files {:?} under the atomic write dir for {:?}",
actual_files,
names_to_remove
);
if let Err(e) = local_store.delete_iter(actual_files).await {
common_telemetry::error!(e; "Failed to delete tmp file for {:?}", names_to_remove);
}
}
}
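A minimal sketch of how the cleaner is meant to be wired in (the region id, file id, and store below are hypothetical; the real call sites are the parquet writer and the write cache later in this diff):

async fn flush_with_cleanup(local_store: ObjectStore, region_id: RegionId, file_id: FileId) {
    // Sketch only: assumes the `TempFileCleaner` added above plus `RegionId`, `FileId`
    // and `ObjectStore` are in scope.
    let cleaner = TempFileCleaner::new(region_id, local_store);
    // ... run the SST/index write here ...
    let write_failed = true; // pretend the write returned an error
    if write_failed {
        // Drops both the `.parquet` and `.puffin` leftovers staged under `tmp/`.
        cleaner.clean_by_file_id(file_id).await;
    }
}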
pub(crate) async fn new_fs_cache_store(root: &str) -> Result<ObjectStore> {
let atomic_write_dir = join_dir(root, ".tmp/");
let atomic_write_dir = join_dir(root, ATOMIC_WRITE_DIR);
clean_dir(&atomic_write_dir).await?;
// Compatible code. Remove this after a major release.
let old_atomic_temp_dir = join_dir(root, OLD_ATOMIC_WRITE_DIR);
clean_dir(&old_atomic_temp_dir).await?;
let builder = Fs::default().root(root).atomic_write_dir(&atomic_write_dir);
let store = ObjectStore::new(builder).context(OpenDalSnafu)?.finish();

View File

@@ -14,6 +14,7 @@
//! A cache for files.
use std::fmt;
use std::ops::Range;
use std::sync::Arc;
use std::time::{Duration, Instant};
@@ -339,6 +340,18 @@ impl IndexKey {
}
}
impl fmt::Display for IndexKey {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"{}.{}.{}",
self.region_id.as_u64(),
self.file_id,
self.file_type.as_str()
)
}
}
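As a quick illustration with hypothetical values, the new `Display` impl produces the `{region_id}.{file_id}.{file_type}` name directly, which is what lets `cache_file_path` below collapse into a single `join_path` call:

fn display_example() -> String {
    // Hypothetical key, only to show the format produced by `Display`.
    let file_id = FileId::random(); // assuming the usual constructor from `sst::file`
    let key = IndexKey::new(RegionId::new(1024, 1), file_id, FileType::Parquet);
    // Yields "<region_id as u64>.<file_id>.parquet"; the cache path is simply
    // the cache dir joined with this string.
    key.to_string()
}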
/// Type of the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FileType {
@@ -380,15 +393,7 @@ pub(crate) struct IndexValue {
///
/// The file name format is `{region_id}.{file_id}.{file_type}`
fn cache_file_path(cache_file_dir: &str, key: IndexKey) -> String {
join_path(
cache_file_dir,
&format!(
"{}.{}.{}",
key.region_id.as_u64(),
key.file_id,
key.file_type.as_str()
),
)
join_path(cache_file_dir, &key.to_string())
}
/// Parse index key from the file name.

View File

@@ -26,7 +26,7 @@ use store_api::storage::RegionId;
use crate::access_layer::{
new_fs_cache_store, FilePathProvider, RegionFilePathFactory, SstInfoArray, SstWriteRequest,
WriteCachePathProvider,
TempFileCleaner, WriteCachePathProvider,
};
use crate::cache::file_cache::{FileCache, FileCacheRef, FileType, IndexKey, IndexValue};
use crate::error::{self, Result};
@@ -122,7 +122,7 @@ impl WriteCache {
row_group_size: write_opts.row_group_size,
puffin_manager: self
.puffin_manager_factory
.build(store, path_provider.clone()),
.build(store.clone(), path_provider.clone()),
intermediate_manager: self.intermediate_manager.clone(),
index_options: write_request.index_options,
inverted_index_config: write_request.inverted_index_config,
@@ -130,14 +130,16 @@ impl WriteCache {
bloom_filter_index_config: write_request.bloom_filter_index_config,
};
let cleaner = TempFileCleaner::new(region_id, store.clone());
// Write to FileCache.
let mut writer = ParquetWriter::new_with_object_store(
self.file_cache.local_store(),
store.clone(),
write_request.metadata,
indexer,
path_provider,
path_provider.clone(),
)
.await;
.await
.with_file_cleaner(cleaner);
let sst_info = writer
.write_all(write_request.source, write_request.max_sequence, write_opts)
@@ -201,6 +203,26 @@ impl WriteCache {
remote_path: &str,
remote_store: &ObjectStore,
file_size: u64,
) -> Result<()> {
if let Err(e) = self
.download_without_cleaning(index_key, remote_path, remote_store, file_size)
.await
{
let filename = index_key.to_string();
TempFileCleaner::clean_atomic_dir_files(&self.file_cache.local_store(), &[&filename])
.await;
return Err(e);
}
Ok(())
}
async fn download_without_cleaning(
&self,
index_key: IndexKey,
remote_path: &str,
remote_store: &ObjectStore,
file_size: u64,
) -> Result<()> {
const DOWNLOAD_READER_CONCURRENCY: usize = 8;
const DOWNLOAD_READER_CHUNK_SIZE: ReadableSize = ReadableSize::mb(8);
@@ -410,9 +432,11 @@ mod tests {
use common_test_util::temp_dir::create_temp_dir;
use super::*;
use crate::access_layer::OperationType;
use crate::access_layer::{OperationType, ATOMIC_WRITE_DIR};
use crate::cache::test_util::new_fs_store;
use crate::cache::{CacheManager, CacheStrategy};
use crate::error::InvalidBatchSnafu;
use crate::read::Source;
use crate::region::options::IndexOptions;
use crate::sst::parquet::reader::ParquetReaderBuilder;
use crate::test_util::sst_util::{
@@ -578,4 +602,82 @@ mod tests {
// Check parquet metadata
assert_parquet_metadata_eq(write_parquet_metadata, reader.parquet_metadata());
}
#[tokio::test]
async fn test_write_cache_clean_tmp_files() {
common_telemetry::init_default_ut_logging();
let mut env = TestEnv::new();
let data_home = env.data_home().display().to_string();
let mock_store = env.init_object_store_manager();
let write_cache_dir = create_temp_dir("");
let write_cache_path = write_cache_dir.path().to_str().unwrap();
let write_cache = env
.create_write_cache_from_path(write_cache_path, ReadableSize::mb(10))
.await;
// Create a cache manager using only write cache
let cache_manager = Arc::new(
CacheManager::builder()
.write_cache(Some(write_cache.clone()))
.build(),
);
// Create source
let metadata = Arc::new(sst_region_metadata());
// Creates a source that can return an error to abort the writer.
let source = Source::Iter(Box::new(
[
Ok(new_batch_by_range(&["a", "d"], 0, 60)),
InvalidBatchSnafu {
reason: "Abort the writer",
}
.fail(),
]
.into_iter(),
));
// Write to local cache and upload sst to mock remote store
let write_request = SstWriteRequest {
op_type: OperationType::Flush,
metadata,
source,
storage: None,
max_sequence: None,
cache_manager: cache_manager.clone(),
index_options: IndexOptions::default(),
inverted_index_config: Default::default(),
fulltext_index_config: Default::default(),
bloom_filter_index_config: Default::default(),
};
let write_opts = WriteOptions {
row_group_size: 512,
..Default::default()
};
let upload_request = SstUploadRequest {
dest_path_provider: RegionFilePathFactory::new(data_home.clone()),
remote_store: mock_store.clone(),
};
write_cache
.write_and_upload_sst(write_request, upload_request, &write_opts)
.await
.unwrap_err();
let atomic_write_dir = write_cache_dir.path().join(ATOMIC_WRITE_DIR);
let mut entries = tokio::fs::read_dir(&atomic_write_dir).await.unwrap();
let mut has_files = false;
while let Some(entry) = entries.next_entry().await.unwrap() {
if entry.file_type().await.unwrap().is_dir() {
continue;
}
has_files = true;
common_telemetry::warn!(
"Found remaining temporary file in atomic dir: {}",
entry.path().display()
);
}
assert!(!has_files);
}
}

View File

@@ -302,7 +302,10 @@ impl PartitionTreeMemtable {
fn update_stats(&self, metrics: &WriteMetrics) {
// Only let the tracker tracks value bytes.
self.alloc_tracker.on_allocation(metrics.value_bytes);
metrics.update_timestamp_range(&self.max_timestamp, &self.min_timestamp);
self.max_timestamp
.fetch_max(metrics.max_ts, Ordering::SeqCst);
self.min_timestamp
.fetch_min(metrics.min_ts, Ordering::SeqCst);
}
}

View File

@@ -14,8 +14,6 @@
//! Internal metrics of the memtable.
use std::sync::atomic::{AtomicI64, Ordering};
/// Metrics of writing memtables.
pub(crate) struct WriteMetrics {
/// Size allocated by keys.
@@ -28,51 +26,6 @@ pub(crate) struct WriteMetrics {
pub(crate) max_ts: i64,
}
impl WriteMetrics {
/// Update the min/max timestamp range according to current write metric.
pub(crate) fn update_timestamp_range(&self, prev_max_ts: &AtomicI64, prev_min_ts: &AtomicI64) {
loop {
let current_min = prev_min_ts.load(Ordering::Relaxed);
if self.min_ts >= current_min {
break;
}
let Err(updated) = prev_min_ts.compare_exchange(
current_min,
self.min_ts,
Ordering::Relaxed,
Ordering::Relaxed,
) else {
break;
};
if updated == self.min_ts {
break;
}
}
loop {
let current_max = prev_max_ts.load(Ordering::Relaxed);
if self.max_ts <= current_max {
break;
}
let Err(updated) = prev_max_ts.compare_exchange(
current_max,
self.max_ts,
Ordering::Relaxed,
Ordering::Relaxed,
) else {
break;
};
if updated == self.max_ts {
break;
}
}
}
}
impl Default for WriteMetrics {
fn default() -> Self {
Self {

View File

@@ -147,7 +147,8 @@ impl TimeSeriesMemtable {
fn update_stats(&self, stats: WriteMetrics) {
self.alloc_tracker
.on_allocation(stats.key_bytes + stats.value_bytes);
stats.update_timestamp_range(&self.max_timestamp, &self.min_timestamp);
self.max_timestamp.fetch_max(stats.max_ts, Ordering::SeqCst);
self.min_timestamp.fetch_min(stats.min_ts, Ordering::SeqCst);
}
fn write_key_value(&self, kv: KeyValue, stats: &mut WriteMetrics) -> Result<()> {
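The removed helper tracked the min/max timestamps with hand-written `compare_exchange` loops; `AtomicI64::fetch_max` and `fetch_min` perform the same read-compare-store retry internally, which is why both memtables can now call them directly. A standalone sketch of the replacement pattern:

use std::sync::atomic::{AtomicI64, Ordering};

fn update_timestamp_range(max_ts: &AtomicI64, min_ts: &AtomicI64, batch_max: i64, batch_min: i64) {
    // Each call retries internally until it wins, so the atomics converge to the
    // overall max/min even with concurrent writers.
    max_ts.fetch_max(batch_max, Ordering::SeqCst);
    min_ts.fetch_min(batch_min, Ordering::SeqCst);
}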

View File

@@ -322,13 +322,10 @@ impl ScanRegion {
let memtables: Vec<_> = memtables
.into_iter()
.filter(|mem| {
if mem.is_empty() {
// check if memtable is empty by reading stats.
let Some((start, end)) = mem.stats().time_range() else {
return false;
}
let stats = mem.stats();
// Safety: the memtable is not empty.
let (start, end) = stats.time_range().unwrap();
};
// The time range of the memtable is inclusive.
let memtable_range = TimestampRange::new_inclusive(Some(start), Some(end));
memtable_range.intersects(&time_range)

View File

@@ -134,6 +134,7 @@ impl WriteFormat {
/// Helper for reading the SST format.
pub struct ReadFormat {
/// The metadata stored in the SST.
metadata: RegionMetadataRef,
/// SST file schema.
arrow_schema: SchemaRef,
@@ -305,17 +306,23 @@ impl ReadFormat {
&self,
row_groups: &[impl Borrow<RowGroupMetaData>],
column_id: ColumnId,
) -> Option<ArrayRef> {
let column = self.metadata.column_by_id(column_id)?;
) -> StatValues {
let Some(column) = self.metadata.column_by_id(column_id) else {
// No such column in the SST.
return StatValues::NoColumn;
};
match column.semantic_type {
SemanticType::Tag => self.tag_values(row_groups, column, true),
SemanticType::Field => {
let index = self.field_id_to_index.get(&column_id)?;
Self::column_values(row_groups, column, *index, true)
// Safety: the column is a field, so `field_id_to_index` must contain it.
let index = self.field_id_to_index.get(&column_id).unwrap();
let stats = Self::column_values(row_groups, column, *index, true);
StatValues::from_stats_opt(stats)
}
SemanticType::Timestamp => {
let index = self.time_index_position();
Self::column_values(row_groups, column, index, true)
let stats = Self::column_values(row_groups, column, index, true);
StatValues::from_stats_opt(stats)
}
}
}
@@ -325,17 +332,23 @@ impl ReadFormat {
&self,
row_groups: &[impl Borrow<RowGroupMetaData>],
column_id: ColumnId,
) -> Option<ArrayRef> {
let column = self.metadata.column_by_id(column_id)?;
) -> StatValues {
let Some(column) = self.metadata.column_by_id(column_id) else {
// No such column in the SST.
return StatValues::NoColumn;
};
match column.semantic_type {
SemanticType::Tag => self.tag_values(row_groups, column, false),
SemanticType::Field => {
let index = self.field_id_to_index.get(&column_id)?;
Self::column_values(row_groups, column, *index, false)
// Safety: the column is a field, so `field_id_to_index` must contain it.
let index = self.field_id_to_index.get(&column_id).unwrap();
let stats = Self::column_values(row_groups, column, *index, false);
StatValues::from_stats_opt(stats)
}
SemanticType::Timestamp => {
let index = self.time_index_position();
Self::column_values(row_groups, column, index, false)
let stats = Self::column_values(row_groups, column, index, false);
StatValues::from_stats_opt(stats)
}
}
}
@@ -345,17 +358,23 @@ impl ReadFormat {
&self,
row_groups: &[impl Borrow<RowGroupMetaData>],
column_id: ColumnId,
) -> Option<ArrayRef> {
let column = self.metadata.column_by_id(column_id)?;
) -> StatValues {
let Some(column) = self.metadata.column_by_id(column_id) else {
// No such column in the SST.
return StatValues::NoColumn;
};
match column.semantic_type {
SemanticType::Tag => None,
SemanticType::Tag => StatValues::NoStats,
SemanticType::Field => {
let index = self.field_id_to_index.get(&column_id)?;
Self::column_null_counts(row_groups, *index)
// Safety: the column is a field, so `field_id_to_index` must contain it.
let index = self.field_id_to_index.get(&column_id).unwrap();
let stats = Self::column_null_counts(row_groups, *index);
StatValues::from_stats_opt(stats)
}
SemanticType::Timestamp => {
let index = self.time_index_position();
Self::column_null_counts(row_groups, index)
let stats = Self::column_null_counts(row_groups, index);
StatValues::from_stats_opt(stats)
}
}
}
@@ -390,8 +409,7 @@ impl ReadFormat {
row_groups: &[impl Borrow<RowGroupMetaData>],
column: &ColumnMetadata,
is_min: bool,
) -> Option<ArrayRef> {
let primary_key_encoding = self.metadata.primary_key_encoding;
) -> StatValues {
let is_first_tag = self
.metadata
.primary_key
@@ -400,9 +418,28 @@ impl ReadFormat {
.unwrap_or(false);
if !is_first_tag {
// Only the min-max of the first tag is available in the primary key.
return None;
return StatValues::NoStats;
}
StatValues::from_stats_opt(self.first_tag_values(row_groups, column, is_min))
}
/// Returns min/max values of the first tag.
/// Returns None if the tag does not have statistics.
fn first_tag_values(
&self,
row_groups: &[impl Borrow<RowGroupMetaData>],
column: &ColumnMetadata,
is_min: bool,
) -> Option<ArrayRef> {
debug_assert!(self
.metadata
.primary_key
.first()
.map(|id| *id == column.column_id)
.unwrap_or(false));
let primary_key_encoding = self.metadata.primary_key_encoding;
let converter = build_primary_key_codec_with_fields(
primary_key_encoding,
[(
@@ -452,6 +489,7 @@ impl ReadFormat {
}
/// Returns min/max values of specific non-tag columns.
/// Returns None if the column does not have statistics.
fn column_values(
row_groups: &[impl Borrow<RowGroupMetaData>],
column: &ColumnMetadata,
@@ -544,6 +582,29 @@ impl ReadFormat {
}
}
/// Values of column statistics of the SST.
///
/// It also distinguishes between the case where the column is not found and
/// the case where the column exists but has no statistics.
pub enum StatValues {
/// Values of each row group.
Values(ArrayRef),
/// No such column.
NoColumn,
/// Column exists but has no statistics.
NoStats,
}
impl StatValues {
/// Creates a new `StatValues` instance from optional statistics.
pub fn from_stats_opt(stats: Option<ArrayRef>) -> Self {
match stats {
Some(stats) => StatValues::Values(stats),
None => StatValues::NoStats,
}
}
}
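A sketch, mirroring the `RowGroupPruningStats` change later in this diff, of how a caller is expected to consume the three variants (`default` here is a hypothetical fallback such as the column's default value for files written before an ALTER TABLE ADD COLUMN):

fn stat_or_default(stats: StatValues, default: Option<ArrayRef>) -> Option<ArrayRef> {
    match stats {
        StatValues::Values(values) => Some(values),
        // The SST doesn't have this column at all: use the fallback.
        StatValues::NoColumn => default,
        // The column exists but carries no statistics: don't prune on it.
        StatValues::NoStats => None,
    }
}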
#[cfg(test)]
impl ReadFormat {
/// Creates a helper with existing `metadata` and all columns.

View File

@@ -25,7 +25,7 @@ use parquet::file::metadata::RowGroupMetaData;
use store_api::metadata::RegionMetadataRef;
use store_api::storage::ColumnId;
use crate::sst::parquet::format::ReadFormat;
use crate::sst::parquet::format::{ReadFormat, StatValues};
/// Statistics for pruning row groups.
pub(crate) struct RowGroupPruningStats<'a, T> {
@@ -100,16 +100,18 @@ impl<T: Borrow<RowGroupMetaData>> PruningStatistics for RowGroupPruningStats<'_,
fn min_values(&self, column: &Column) -> Option<ArrayRef> {
let column_id = self.column_id_to_prune(&column.name)?;
match self.read_format.min_values(self.row_groups, column_id) {
Some(values) => Some(values),
None => self.compat_default_value(&column.name),
StatValues::Values(values) => Some(values),
StatValues::NoColumn => self.compat_default_value(&column.name),
StatValues::NoStats => None,
}
}
fn max_values(&self, column: &Column) -> Option<ArrayRef> {
let column_id = self.column_id_to_prune(&column.name)?;
match self.read_format.max_values(self.row_groups, column_id) {
Some(values) => Some(values),
None => self.compat_default_value(&column.name),
StatValues::Values(values) => Some(values),
StatValues::NoColumn => self.compat_default_value(&column.name),
StatValues::NoStats => None,
}
}
@@ -118,10 +120,12 @@ impl<T: Borrow<RowGroupMetaData>> PruningStatistics for RowGroupPruningStats<'_,
}
fn null_counts(&self, column: &Column) -> Option<ArrayRef> {
let Some(column_id) = self.column_id_to_prune(&column.name) else {
return self.compat_null_count(&column.name);
};
self.read_format.null_counts(self.row_groups, column_id)
let column_id = self.column_id_to_prune(&column.name)?;
match self.read_format.null_counts(self.row_groups, column_id) {
StatValues::Values(values) => Some(values),
StatValues::NoColumn => self.compat_null_count(&column.name),
StatValues::NoStats => None,
}
}
fn row_counts(&self, _column: &Column) -> Option<ArrayRef> {

View File

@@ -36,7 +36,7 @@ use store_api::storage::SequenceNumber;
use tokio::io::AsyncWrite;
use tokio_util::compat::{Compat, FuturesAsyncWriteCompatExt};
use crate::access_layer::{FilePathProvider, SstInfoArray};
use crate::access_layer::{FilePathProvider, SstInfoArray, TempFileCleaner};
use crate::error::{InvalidMetadataSnafu, OpenDalSnafu, Result, WriteParquetSnafu};
use crate::read::{Batch, Source};
use crate::sst::file::FileId;
@@ -61,6 +61,8 @@ pub struct ParquetWriter<F: WriterFactory, I: IndexerBuilder, P: FilePathProvide
/// Current active indexer.
current_indexer: Option<Indexer>,
bytes_written: Arc<AtomicUsize>,
/// Cleaner to remove temp files on failure.
file_cleaner: Option<TempFileCleaner>,
}
pub trait WriterFactory {
@@ -105,6 +107,11 @@ where
)
.await
}
pub(crate) fn with_file_cleaner(mut self, cleaner: TempFileCleaner) -> Self {
self.file_cleaner = Some(cleaner);
self
}
}
impl<F, I, P> ParquetWriter<F, I, P>
@@ -132,6 +139,7 @@ where
indexer_builder,
current_indexer: Some(indexer),
bytes_written: Arc::new(AtomicUsize::new(0)),
file_cleaner: None,
}
}
@@ -152,6 +160,25 @@ where
///
/// Returns the [SstInfo] if the SST is written.
pub async fn write_all(
&mut self,
source: Source,
override_sequence: Option<SequenceNumber>, // override the `sequence` field from `Source`
opts: &WriteOptions,
) -> Result<SstInfoArray> {
let res = self
.write_all_without_cleaning(source, override_sequence, opts)
.await;
if res.is_err() {
// Clean tmp files explicitly on failure.
let file_id = self.current_file;
if let Some(cleaner) = &self.file_cleaner {
cleaner.clean_by_file_id(file_id).await;
}
}
res
}
async fn write_all_without_cleaning(
&mut self,
mut source: Source,
override_sequence: Option<SequenceNumber>, // override the `sequence` field from `Source`

View File

@@ -145,6 +145,12 @@ pub(crate) async fn prepare_test_for_kafka_log_store(factory: &LogStoreFactory)
}
pub(crate) async fn append_noop_record(client: &Client, topic: &str) {
let controller_client = client.controller_client().unwrap();
controller_client
.create_topic(topic, 1, 1, 5000)
.await
.unwrap();
let partition_client = client
.partition_client(topic, 0, UnknownTopicHandling::Retry)
.await
@@ -659,6 +665,27 @@ impl TestEnv {
Arc::new(write_cache)
}
/// Creates a write cache from a path.
pub async fn create_write_cache_from_path(
&self,
path: &str,
capacity: ReadableSize,
) -> WriteCacheRef {
let index_aux_path = self.data_home.path().join("index_aux");
let puffin_mgr = PuffinManagerFactory::new(&index_aux_path, 4096, None, None)
.await
.unwrap();
let intm_mgr = IntermediateManager::init_fs(index_aux_path.to_str().unwrap())
.await
.unwrap();
let write_cache = WriteCache::new_fs(path, capacity, None, puffin_mgr, intm_mgr)
.await
.unwrap();
Arc::new(write_cache)
}
pub fn get_schema_metadata_manager(&self) -> SchemaMetadataManagerRef {
self.schema_metadata_manager.clone()
}

View File

@@ -63,5 +63,6 @@ tokio-util.workspace = true
tonic.workspace = true
[dev-dependencies]
common-meta = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
path-slash = "0.2"

View File

@@ -147,7 +147,7 @@ impl Inserter {
statement_executor: &StatementExecutor,
) -> Result<Output> {
let row_inserts = ColumnToRow::convert(requests)?;
self.handle_row_inserts(row_inserts, ctx, statement_executor)
self.handle_row_inserts(row_inserts, ctx, statement_executor, false)
.await
}
@@ -157,6 +157,7 @@ impl Inserter {
mut requests: RowInsertRequests,
ctx: QueryContextRef,
statement_executor: &StatementExecutor,
accommodate_existing_schema: bool,
) -> Result<Output> {
preprocess_row_insert_requests(&mut requests.inserts)?;
self.handle_row_inserts_with_create_type(
@@ -164,6 +165,7 @@ impl Inserter {
ctx,
statement_executor,
AutoCreateTableType::Physical,
accommodate_existing_schema,
)
.await
}
@@ -180,6 +182,7 @@ impl Inserter {
ctx,
statement_executor,
AutoCreateTableType::Log,
false,
)
.await
}
@@ -195,6 +198,7 @@ impl Inserter {
ctx,
statement_executor,
AutoCreateTableType::Trace,
false,
)
.await
}
@@ -205,12 +209,14 @@ impl Inserter {
requests: RowInsertRequests,
ctx: QueryContextRef,
statement_executor: &StatementExecutor,
accommodate_existing_schema: bool,
) -> Result<Output> {
self.handle_row_inserts_with_create_type(
requests,
ctx,
statement_executor,
AutoCreateTableType::LastNonNull,
accommodate_existing_schema,
)
.await
}
@@ -222,6 +228,7 @@ impl Inserter {
ctx: QueryContextRef,
statement_executor: &StatementExecutor,
create_type: AutoCreateTableType,
accommodate_existing_schema: bool,
) -> Result<Output> {
// remove empty requests
requests.inserts.retain(|req| {
@@ -236,7 +243,13 @@ impl Inserter {
instant_table_ids,
table_infos,
} = self
.create_or_alter_tables_on_demand(&requests, &ctx, create_type, statement_executor)
.create_or_alter_tables_on_demand(
&mut requests,
&ctx,
create_type,
statement_executor,
accommodate_existing_schema,
)
.await?;
let name_to_info = table_infos
@@ -281,10 +294,11 @@ impl Inserter {
table_infos,
} = self
.create_or_alter_tables_on_demand(
&requests,
&mut requests,
&ctx,
AutoCreateTableType::Logical(physical_table.to_string()),
statement_executor,
true,
)
.await?;
let name_to_info = table_infos
@@ -448,12 +462,18 @@ impl Inserter {
///
/// Returns a mapping from table name to table id for every table involved in the requests.
/// This mapping is used in the conversion of RowToRegion.
///
/// `accommodate_existing_schema` determines whether the existing table schema should override the
/// schema in the request. It only applies to TIME_INDEX and VALUE columns. This covers the case where
/// the user creates a table with a custom schema and then writes through endpoints that assume a
/// default schema, such as Prometheus remote write. In that case the `RowInsertRequests` are modified in place.
async fn create_or_alter_tables_on_demand(
&self,
requests: &RowInsertRequests,
requests: &mut RowInsertRequests,
ctx: &QueryContextRef,
auto_create_table_type: AutoCreateTableType,
statement_executor: &StatementExecutor,
accommodate_existing_schema: bool,
) -> Result<CreateAlterTableResult> {
let _timer = crate::metrics::CREATE_ALTER_ON_DEMAND
.with_label_values(&[auto_create_table_type.as_str()])
@@ -504,7 +524,7 @@ impl Inserter {
let mut alter_tables = vec![];
let mut instant_table_ids = HashSet::new();
for req in &requests.inserts {
for req in &mut requests.inserts {
match self.get_table(catalog, &schema, &req.table_name).await? {
Some(table) => {
let table_info = table.table_info();
@@ -512,9 +532,12 @@ impl Inserter {
instant_table_ids.insert(table_info.table_id());
}
table_infos.insert(table_info.table_id(), table.table_info());
if let Some(alter_expr) =
self.get_alter_table_expr_on_demand(req, &table, ctx)?
{
if let Some(alter_expr) = self.get_alter_table_expr_on_demand(
req,
&table,
ctx,
accommodate_existing_schema,
)? {
alter_tables.push(alter_expr);
}
}
@@ -784,12 +807,16 @@ impl Inserter {
}
/// Returns an alter table expression if it finds new columns in the request.
/// It always adds columns if not exist.
/// When `accommodate_existing_schema` is false, it always adds any columns that do not exist yet.
/// When `accommodate_existing_schema` is true, it may modify the input `req` in place to
/// match the existing schema. See [`create_or_alter_tables_on_demand`](Self::create_or_alter_tables_on_demand)
/// for more details.
fn get_alter_table_expr_on_demand(
&self,
req: &RowInsertRequest,
req: &mut RowInsertRequest,
table: &TableRef,
ctx: &QueryContextRef,
accommodate_existing_schema: bool,
) -> Result<Option<AlterTableExpr>> {
let catalog_name = ctx.current_catalog();
let schema_name = ctx.current_schema();
@@ -798,10 +825,64 @@ impl Inserter {
let request_schema = req.rows.as_ref().unwrap().schema.as_slice();
let column_exprs = ColumnExpr::from_column_schemas(request_schema);
let add_columns = expr_helper::extract_add_columns_expr(&table.schema(), column_exprs)?;
let Some(add_columns) = add_columns else {
let Some(mut add_columns) = add_columns else {
return Ok(None);
};
// If accommodate_existing_schema is true, update request schema for Timestamp/Field columns
if accommodate_existing_schema {
let table_schema = table.schema();
// Find timestamp column name
let ts_col_name = table_schema.timestamp_column().map(|c| c.name.clone());
// Find field column name if there is only one
let mut field_col_name = None;
let mut multiple_field_cols = false;
table.field_columns().for_each(|col| {
if field_col_name.is_none() {
field_col_name = Some(col.name.clone());
} else {
multiple_field_cols = true;
}
});
if multiple_field_cols {
field_col_name = None;
}
// Update column name in request schema for Timestamp/Field columns
if let Some(rows) = req.rows.as_mut() {
for col in &mut rows.schema {
match col.semantic_type {
x if x == SemanticType::Timestamp as i32 => {
if let Some(ref ts_name) = ts_col_name {
if col.column_name != *ts_name {
col.column_name = ts_name.clone();
}
}
}
x if x == SemanticType::Field as i32 => {
if let Some(ref field_name) = field_col_name {
if col.column_name != *field_name {
col.column_name = field_name.clone();
}
}
}
_ => {}
}
}
}
// Remove timestamp columns from add_columns, and field columns when the table has exactly one field column.
add_columns.add_columns.retain(|col| {
let def = col.column_def.as_ref().unwrap();
def.semantic_type != SemanticType::Timestamp as i32
&& (def.semantic_type != SemanticType::Field as i32 || field_col_name.is_none())
});
if add_columns.add_columns.is_empty() {
return Ok(None);
}
}
Ok(Some(AlterTableExpr {
catalog_name: catalog_name.to_string(),
schema_name: schema_name.to_string(),
@@ -946,6 +1027,7 @@ impl FlowMirrorTask {
// already know this is not source table
Some(None) => continue,
_ => {
// dedup peers
let peers = cache
.get(table_id)
.await
@@ -953,6 +1035,8 @@ impl FlowMirrorTask {
.unwrap_or_default()
.values()
.cloned()
.collect::<HashSet<_>>()
.into_iter()
.collect::<Vec<_>>();
if !peers.is_empty() {
@@ -1032,3 +1116,124 @@ impl FlowMirrorTask {
Ok(())
}
}
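The peer dedup added above is just a round-trip through a `HashSet`, which drops duplicates but does not preserve order (presumably acceptable for mirroring inserts to flownodes); a standalone illustration:

use std::collections::HashSet;

fn dedup_peers(peers: Vec<String>) -> Vec<String> {
    // Duplicates would make the mirror send the same request to a peer twice;
    // the set round-trip removes them at the cost of the original order.
    peers.into_iter().collect::<HashSet<_>>().into_iter().collect()
}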
#[cfg(test)]
mod tests {
use std::sync::Arc;
use api::v1::{ColumnSchema as GrpcColumnSchema, RowInsertRequest, Rows, SemanticType, Value};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_meta::cache::new_table_flownode_set_cache;
use common_meta::ddl::test_util::datanode_handler::NaiveDatanodeHandler;
use common_meta::test_util::MockDatanodeManager;
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use moka::future::Cache;
use session::context::QueryContext;
use table::dist_table::DummyDataSource;
use table::metadata::{TableInfoBuilder, TableMetaBuilder, TableType};
use table::TableRef;
use super::*;
use crate::tests::{create_partition_rule_manager, prepare_mocked_backend};
fn make_table_ref_with_schema(ts_name: &str, field_name: &str) -> TableRef {
let schema = datatypes::schema::SchemaBuilder::try_from_columns(vec![
ColumnSchema::new(
ts_name,
ConcreteDataType::timestamp_millisecond_datatype(),
false,
)
.with_time_index(true),
ColumnSchema::new(field_name, ConcreteDataType::float64_datatype(), true),
])
.unwrap()
.build()
.unwrap();
let meta = TableMetaBuilder::empty()
.schema(Arc::new(schema))
.primary_key_indices(vec![])
.value_indices(vec![1])
.engine("mito")
.next_column_id(0)
.options(Default::default())
.created_on(Default::default())
.region_numbers(vec![0])
.build()
.unwrap();
let info = Arc::new(
TableInfoBuilder::default()
.table_id(1)
.table_version(0)
.name("test_table")
.schema_name(DEFAULT_SCHEMA_NAME)
.catalog_name(DEFAULT_CATALOG_NAME)
.desc(None)
.table_type(TableType::Base)
.meta(meta)
.build()
.unwrap(),
);
Arc::new(table::Table::new(
info,
table::metadata::FilterPushDownType::Unsupported,
Arc::new(DummyDataSource),
))
}
#[tokio::test]
async fn test_accommodate_existing_schema_logic() {
let ts_name = "my_ts";
let field_name = "my_field";
let table = make_table_ref_with_schema(ts_name, field_name);
// The request uses different names for timestamp and field columns
let mut req = RowInsertRequest {
table_name: "test_table".to_string(),
rows: Some(Rows {
schema: vec![
GrpcColumnSchema {
column_name: "ts_wrong".to_string(),
datatype: api::v1::ColumnDataType::TimestampMillisecond as i32,
semantic_type: SemanticType::Timestamp as i32,
..Default::default()
},
GrpcColumnSchema {
column_name: "field_wrong".to_string(),
datatype: api::v1::ColumnDataType::Float64 as i32,
semantic_type: SemanticType::Field as i32,
..Default::default()
},
],
rows: vec![api::v1::Row {
values: vec![Value::default(), Value::default()],
}],
}),
};
let ctx = Arc::new(QueryContext::with(
DEFAULT_CATALOG_NAME,
DEFAULT_SCHEMA_NAME,
));
let kv_backend = prepare_mocked_backend().await;
let inserter = Inserter::new(
catalog::memory::MemoryCatalogManager::new(),
create_partition_rule_manager(kv_backend.clone()).await,
Arc::new(MockDatanodeManager::new(NaiveDatanodeHandler)),
Arc::new(new_table_flownode_set_cache(
String::new(),
Cache::new(100),
kv_backend.clone(),
)),
);
let alter_expr = inserter
.get_alter_table_expr_on_demand(&mut req, &table, &ctx, true)
.unwrap();
assert!(alter_expr.is_none());
// The request's schema should have updated names for timestamp and field columns
let req_schema = req.rows.as_ref().unwrap().schema.clone();
assert_eq!(req_schema[0].column_name, ts_name);
assert_eq!(req_schema[1].column_name, field_name);
}
}

View File

@@ -14,6 +14,7 @@
#![feature(assert_matches)]
#![feature(if_let_guard)]
#![feature(let_chains)]
pub mod delete;
pub mod error;

View File

@@ -57,33 +57,13 @@ mod tests {
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, ColumnSchema, Row, SemanticType, Value};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_meta::key::catalog_name::{CatalogManager, CatalogNameKey};
use common_meta::key::schema_name::{SchemaManager, SchemaNameKey};
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::kv_backend::KvBackendRef;
use datatypes::vectors::{Int32Vector, VectorRef};
use store_api::storage::RegionId;
use super::*;
use crate::tests::{create_partition_rule_manager, new_test_table_info};
async fn prepare_mocked_backend() -> KvBackendRef {
let backend = Arc::new(MemoryKvBackend::default());
let catalog_manager = CatalogManager::new(backend.clone());
let schema_manager = SchemaManager::new(backend.clone());
catalog_manager
.create(CatalogNameKey::default(), false)
.await
.unwrap();
schema_manager
.create(SchemaNameKey::default(), None, false)
.await
.unwrap();
backend
}
use crate::tests::{
create_partition_rule_manager, new_test_table_info, prepare_mocked_backend,
};
#[tokio::test]
async fn test_delete_request_table_to_region() {

View File

@@ -73,33 +73,13 @@ mod tests {
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, ColumnSchema, Row, SemanticType, Value};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_meta::key::catalog_name::{CatalogManager, CatalogNameKey};
use common_meta::key::schema_name::{SchemaManager, SchemaNameKey};
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::kv_backend::KvBackendRef;
use datatypes::vectors::{Int32Vector, VectorRef};
use store_api::storage::RegionId;
use super::*;
use crate::tests::{create_partition_rule_manager, new_test_table_info};
async fn prepare_mocked_backend() -> KvBackendRef {
let backend = Arc::new(MemoryKvBackend::default());
let catalog_manager = CatalogManager::new(backend.clone());
let schema_manager = SchemaManager::new(backend.clone());
catalog_manager
.create(CatalogNameKey::default(), false)
.await
.unwrap();
schema_manager
.create(SchemaNameKey::default(), None, false)
.await
.unwrap();
backend
}
use crate::tests::{
create_partition_rule_manager, new_test_table_info, prepare_mocked_backend,
};
#[tokio::test]
async fn test_insert_request_table_to_region() {

View File

@@ -37,7 +37,7 @@ use common_meta::rpc::ddl::{
};
use common_meta::rpc::router::{Partition, Partition as MetaPartition};
use common_query::Output;
use common_telemetry::{debug, info, tracing};
use common_telemetry::{debug, info, tracing, warn};
use common_time::Timezone;
use datafusion_common::tree_node::TreeNodeVisitor;
use datafusion_expr::LogicalPlan;
@@ -369,7 +369,7 @@ impl StatementExecutor {
query_context: QueryContextRef,
) -> Result<SubmitDdlTaskResponse> {
let flow_type = self
.determine_flow_type(&expr.sql, query_context.clone())
.determine_flow_type(&expr, query_context.clone())
.await?;
info!("determined flow={} type: {:#?}", expr.flow_name, flow_type);
@@ -398,9 +398,49 @@ impl StatementExecutor {
/// Determine the flow type based on the source tables and the SQL query.
///
/// A source table with `ttl=instant` forces a streaming flow. Otherwise, if the query contains
/// aggregation or DISTINCT it is a batch flow; else it is a streaming flow.
async fn determine_flow_type(&self, sql: &str, query_ctx: QueryContextRef) -> Result<FlowType> {
async fn determine_flow_type(
&self,
expr: &CreateFlowExpr,
query_ctx: QueryContextRef,
) -> Result<FlowType> {
// First check whether any source table's TTL is instant; if so, force streaming mode.
for src_table_name in &expr.source_table_names {
let table = self
.catalog_manager()
.table(
&src_table_name.catalog_name,
&src_table_name.schema_name,
&src_table_name.table_name,
Some(&query_ctx),
)
.await
.map_err(BoxedError::new)
.context(ExternalSnafu)?
.with_context(|| TableNotFoundSnafu {
table_name: format_full_table_name(
&src_table_name.catalog_name,
&src_table_name.schema_name,
&src_table_name.table_name,
),
})?;
// An instant source table can only be handled in streaming mode.
if table.table_info().meta.options.ttl == Some(common_time::TimeToLive::Instant) {
warn!(
"Source table `{}` of flow `{}` has ttl=instant, falling back to streaming mode",
format_full_table_name(
&src_table_name.catalog_name,
&src_table_name.schema_name,
&src_table_name.table_name
),
expr.flow_name
);
return Ok(FlowType::Streaming);
}
}
let engine = &self.query_engine;
let stmt = QueryLanguageParser::parse_sql(sql, &query_ctx)
let stmt = QueryLanguageParser::parse_sql(&expr.sql, &query_ctx)
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
let plan = engine
@@ -803,8 +843,46 @@ impl StatementExecutor {
}
);
self.alter_logical_tables_procedure(alter_table_exprs, query_context)
.await?;
// Group the alter exprs by physical table id.
let mut groups: HashMap<TableId, Vec<AlterTableExpr>> = HashMap::new();
for expr in alter_table_exprs {
// Get table_id from catalog_manager
let catalog = if expr.catalog_name.is_empty() {
query_context.current_catalog()
} else {
&expr.catalog_name
};
let schema = if expr.schema_name.is_empty() {
query_context.current_schema()
} else {
expr.schema_name.to_string()
};
let table_name = &expr.table_name;
let table = self
.catalog_manager
.table(catalog, &schema, table_name, Some(&query_context))
.await
.context(CatalogSnafu)?
.with_context(|| TableNotFoundSnafu {
table_name: format_full_table_name(catalog, &schema, table_name),
})?;
let table_id = table.table_info().ident.table_id;
let physical_table_id = self
.table_metadata_manager
.table_route_manager()
.get_physical_table_id(table_id)
.await
.context(TableMetadataManagerSnafu)?;
groups.entry(physical_table_id).or_default().push(expr);
}
// Submit procedure for each physical table
let mut handles = Vec::with_capacity(groups.len());
for (_physical_table_id, exprs) in groups {
let fut = self.alter_logical_tables_procedure(exprs, query_context.clone());
handles.push(fut);
}
let _results = futures::future::try_join_all(handles).await?;
Ok(Output::new_with_affected_rows(0))
}

View File

@@ -122,7 +122,11 @@ pub fn set_search_path(exprs: Vec<Expr>, ctx: QueryContextRef) -> Result<()> {
match search_expr {
Expr::Value(Value::SingleQuotedString(search_path))
| Expr::Value(Value::DoubleQuotedString(search_path)) => {
ctx.set_current_schema(&search_path.clone());
ctx.set_current_schema(search_path);
Ok(())
}
Expr::Identifier(Ident { value, .. }) => {
ctx.set_current_schema(value);
Ok(())
}
expr => NotSupportedSnafu {
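With the new arm, the bare-identifier form reaches the same code path as the quoted form; a hedged sketch of what is now accepted (assuming `Expr`, `Ident` and `Value` from the surrounding module and some `ctx: QueryContextRef`):

fn example(ctx: QueryContextRef) -> Result<()> {
    // SET search_path = 'analytics'
    set_search_path(
        vec![Expr::Value(Value::SingleQuotedString("analytics".to_string()))],
        ctx.clone(),
    )?;
    // SET search_path TO analytics  (the newly supported form)
    set_search_path(vec![Expr::Identifier(Ident::new("analytics"))], ctx)
}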

View File

@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
mod kv_backend;
mod partition_manager;
pub(crate) use kv_backend::prepare_mocked_backend;
pub(crate) use partition_manager::{create_partition_rule_manager, new_test_table_info};

View File

@@ -0,0 +1,38 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_meta::key::catalog_name::{CatalogManager, CatalogNameKey};
use common_meta::key::schema_name::{SchemaManager, SchemaNameKey};
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::kv_backend::KvBackendRef;
pub async fn prepare_mocked_backend() -> KvBackendRef {
let backend = Arc::new(MemoryKvBackend::default());
let catalog_manager = CatalogManager::new(backend.clone());
let schema_manager = SchemaManager::new(backend.clone());
catalog_manager
.create(CatalogNameKey::default(), false)
.await
.unwrap();
schema_manager
.create(SchemaNameKey::default(), None, false)
.await
.unwrap();
backend
}

View File

@@ -249,6 +249,7 @@ impl PipelineTable {
requests,
Self::query_ctx(&table_info),
&self.statement_executor,
false,
)
.await
.context(InsertPipelineSnafu)?;

View File

@@ -47,6 +47,11 @@ use crate::metrics::PROMQL_SERIES_COUNT;
#[derive(Debug, PartialEq, Eq, Hash, PartialOrd)]
pub struct SeriesDivide {
tag_columns: Vec<String>,
/// `SeriesDivide` needs the time index column's name to generate an ordering requirement
/// for its input data. The plan itself doesn't depend on the ordering of the time index
/// column; the requirement is for follow-on plans like `RangeManipulate`, since requiring
/// the ordering here avoids an unnecessary sort in those plans.
time_index_column: String,
input: LogicalPlan,
}
@@ -84,14 +89,19 @@ impl UserDefinedLogicalNodeCore for SeriesDivide {
Ok(Self {
tag_columns: self.tag_columns.clone(),
time_index_column: self.time_index_column.clone(),
input: inputs[0].clone(),
})
}
}
impl SeriesDivide {
pub fn new(tag_columns: Vec<String>, input: LogicalPlan) -> Self {
Self { tag_columns, input }
pub fn new(tag_columns: Vec<String>, time_index_column: String, input: LogicalPlan) -> Self {
Self {
tag_columns,
time_index_column,
input,
}
}
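Constructing the plan now always carries the time index column name; a minimal sketch with hypothetical column names (`input_plan` stands for any `LogicalPlan`):

fn build_divide(input_plan: LogicalPlan) -> SeriesDivide {
    SeriesDivide::new(
        vec!["host".to_string(), "path".to_string()], // tag columns
        "time_index".to_string(),                     // time index column name
        input_plan,
    )
}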
pub const fn name() -> &'static str {
@@ -101,6 +111,7 @@ impl SeriesDivide {
pub fn to_execution_plan(&self, exec_input: Arc<dyn ExecutionPlan>) -> Arc<dyn ExecutionPlan> {
Arc::new(SeriesDivideExec {
tag_columns: self.tag_columns.clone(),
time_index_column: self.time_index_column.clone(),
input: exec_input,
metric: ExecutionPlanMetricsSet::new(),
})
@@ -109,6 +120,7 @@ impl SeriesDivide {
pub fn serialize(&self) -> Vec<u8> {
pb::SeriesDivide {
tag_columns: self.tag_columns.clone(),
time_index_column: self.time_index_column.clone(),
}
.encode_to_vec()
}
@@ -121,6 +133,7 @@ impl SeriesDivide {
});
Ok(Self {
tag_columns: pb_series_divide.tag_columns,
time_index_column: pb_series_divide.time_index_column,
input: placeholder_plan,
})
}
@@ -129,6 +142,7 @@ impl SeriesDivide {
#[derive(Debug)]
pub struct SeriesDivideExec {
tag_columns: Vec<String>,
time_index_column: String,
input: Arc<dyn ExecutionPlan>,
metric: ExecutionPlanMetricsSet,
}
@@ -159,7 +173,7 @@ impl ExecutionPlan for SeriesDivideExec {
fn required_input_ordering(&self) -> Vec<Option<LexRequirement>> {
let input_schema = self.input.schema();
let exprs: Vec<PhysicalSortRequirement> = self
let mut exprs: Vec<PhysicalSortRequirement> = self
.tag_columns
.iter()
.map(|tag| PhysicalSortRequirement {
@@ -171,11 +185,17 @@ impl ExecutionPlan for SeriesDivideExec {
}),
})
.collect();
if !exprs.is_empty() {
vec![Some(LexRequirement::new(exprs))]
} else {
vec![None]
}
exprs.push(PhysicalSortRequirement {
expr: Arc::new(
ColumnExpr::new_with_schema(&self.time_index_column, &input_schema).unwrap(),
),
options: Some(SortOptions {
descending: false,
nulls_first: true,
}),
});
vec![Some(LexRequirement::new(exprs))]
}
fn maintains_input_order(&self) -> Vec<bool> {
@@ -193,6 +213,7 @@ impl ExecutionPlan for SeriesDivideExec {
assert!(!children.is_empty());
Ok(Arc::new(Self {
tag_columns: self.tag_columns.clone(),
time_index_column: self.time_index_column.clone(),
input: children[0].clone(),
metric: self.metric.clone(),
}))
@@ -468,6 +489,11 @@ mod test {
let schema = Arc::new(Schema::new(vec![
Field::new("host", DataType::Utf8, true),
Field::new("path", DataType::Utf8, true),
Field::new(
"time_index",
DataType::Timestamp(datafusion::arrow::datatypes::TimeUnit::Millisecond, None),
false,
),
]));
let path_column_1 = Arc::new(StringArray::from(vec![
@@ -476,9 +502,17 @@ mod test {
let host_column_1 = Arc::new(StringArray::from(vec![
"000", "000", "001", "002", "002", "002", "002", "002", "003", "005", "005", "005",
])) as _;
let time_index_column_1 = Arc::new(
datafusion::arrow::array::TimestampMillisecondArray::from(vec![
1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000,
]),
) as _;
let path_column_2 = Arc::new(StringArray::from(vec!["bla", "bla", "bla"])) as _;
let host_column_2 = Arc::new(StringArray::from(vec!["005", "005", "005"])) as _;
let time_index_column_2 = Arc::new(
datafusion::arrow::array::TimestampMillisecondArray::from(vec![13000, 14000, 15000]),
) as _;
let path_column_3 = Arc::new(StringArray::from(vec![
"bla", "🥺", "🥺", "🥺", "🥺", "🥺", "🫠", "🫠",
@@ -486,13 +520,26 @@ mod test {
let host_column_3 = Arc::new(StringArray::from(vec![
"005", "001", "001", "001", "001", "001", "001", "001",
])) as _;
let time_index_column_3 =
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000],
)) as _;
let data_1 =
RecordBatch::try_new(schema.clone(), vec![path_column_1, host_column_1]).unwrap();
let data_2 =
RecordBatch::try_new(schema.clone(), vec![path_column_2, host_column_2]).unwrap();
let data_3 =
RecordBatch::try_new(schema.clone(), vec![path_column_3, host_column_3]).unwrap();
let data_1 = RecordBatch::try_new(
schema.clone(),
vec![path_column_1, host_column_1, time_index_column_1],
)
.unwrap();
let data_2 = RecordBatch::try_new(
schema.clone(),
vec![path_column_2, host_column_2, time_index_column_2],
)
.unwrap();
let data_3 = RecordBatch::try_new(
schema.clone(),
vec![path_column_3, host_column_3, time_index_column_3],
)
.unwrap();
MemoryExec::try_new(&[vec![data_1, data_2, data_3]], schema, None).unwrap()
}
@@ -502,6 +549,7 @@ mod test {
let memory_exec = Arc::new(prepare_test_data());
let divide_exec = Arc::new(SeriesDivideExec {
tag_columns: vec!["host".to_string(), "path".to_string()],
time_index_column: "time_index".to_string(),
input: memory_exec,
metric: ExecutionPlanMetricsSet::new(),
});
@@ -514,33 +562,33 @@ mod test {
.to_string();
let expected = String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| foo | 000 |\
\n| foo | 000 |\
\n| foo | 001 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 003 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🫠 | 001 |\
\n| 🫠 | 001 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| foo | 000 | 1970-01-01T00:00:01 |\
\n| foo | 000 | 1970-01-01T00:00:02 |\
\n| foo | 001 | 1970-01-01T00:00:03 |\
\n| bar | 002 | 1970-01-01T00:00:04 |\
\n| bar | 002 | 1970-01-01T00:00:05 |\
\n| bar | 002 | 1970-01-01T00:00:06 |\
\n| bar | 002 | 1970-01-01T00:00:07 |\
\n| bar | 002 | 1970-01-01T00:00:08 |\
\n| bar | 003 | 1970-01-01T00:00:09 |\
\n| bla | 005 | 1970-01-01T00:00:10 |\
\n| bla | 005 | 1970-01-01T00:00:11 |\
\n| bla | 005 | 1970-01-01T00:00:12 |\
\n| bla | 005 | 1970-01-01T00:00:13 |\
\n| bla | 005 | 1970-01-01T00:00:14 |\
\n| bla | 005 | 1970-01-01T00:00:15 |\
\n| bla | 005 | 1970-01-01T00:00:16 |\
\n| 🥺 | 001 | 1970-01-01T00:00:17 |\
\n| 🥺 | 001 | 1970-01-01T00:00:18 |\
\n| 🥺 | 001 | 1970-01-01T00:00:19 |\
\n| 🥺 | 001 | 1970-01-01T00:00:20 |\
\n| 🥺 | 001 | 1970-01-01T00:00:21 |\
\n| 🫠 | 001 | 1970-01-01T00:00:22 |\
\n| 🫠 | 001 | 1970-01-01T00:00:23 |\
\n+------+------+---------------------+",
);
assert_eq!(result_literal, expected);
}
@@ -550,6 +598,7 @@ mod test {
let memory_exec = Arc::new(prepare_test_data());
let divide_exec = Arc::new(SeriesDivideExec {
tag_columns: vec!["host".to_string(), "path".to_string()],
time_index_column: "time_index".to_string(),
input: memory_exec,
metric: ExecutionPlanMetricsSet::new(),
});
@@ -559,69 +608,69 @@ mod test {
let mut expectations = vec![
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| foo | 000 |\
\n| foo | 000 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| foo | 000 | 1970-01-01T00:00:01 |\
\n| foo | 000 | 1970-01-01T00:00:02 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| foo | 001 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| foo | 001 | 1970-01-01T00:00:03 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n| bar | 002 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| bar | 002 | 1970-01-01T00:00:04 |\
\n| bar | 002 | 1970-01-01T00:00:05 |\
\n| bar | 002 | 1970-01-01T00:00:06 |\
\n| bar | 002 | 1970-01-01T00:00:07 |\
\n| bar | 002 | 1970-01-01T00:00:08 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| bar | 003 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| bar | 003 | 1970-01-01T00:00:09 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n| bla | 005 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| bla | 005 | 1970-01-01T00:00:10 |\
\n| bla | 005 | 1970-01-01T00:00:11 |\
\n| bla | 005 | 1970-01-01T00:00:12 |\
\n| bla | 005 | 1970-01-01T00:00:13 |\
\n| bla | 005 | 1970-01-01T00:00:14 |\
\n| bla | 005 | 1970-01-01T00:00:15 |\
\n| bla | 005 | 1970-01-01T00:00:16 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n| 🥺 | 001 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| 🥺 | 001 | 1970-01-01T00:00:17 |\
\n| 🥺 | 001 | 1970-01-01T00:00:18 |\
\n| 🥺 | 001 | 1970-01-01T00:00:19 |\
\n| 🥺 | 001 | 1970-01-01T00:00:20 |\
\n| 🥺 | 001 | 1970-01-01T00:00:21 |\
\n+------+------+---------------------+",
),
String::from(
"+------+------+\
\n| host | path |\
\n+------+------+\
\n| 🫠 | 001 |\
\n| 🫠 | 001 |\
\n+------+------+",
"+------+------+---------------------+\
\n| host | path | time_index |\
\n+------+------+---------------------+\
\n| 🫠 | 001 | 1970-01-01T00:00:22 |\
\n| 🫠 | 001 | 1970-01-01T00:00:23 |\
\n+------+------+---------------------+",
),
];
expectations.reverse();
@@ -642,6 +691,11 @@ mod test {
let schema = Arc::new(Schema::new(vec![
Field::new("host", DataType::Utf8, true),
Field::new("path", DataType::Utf8, true),
Field::new(
"time_index",
DataType::Timestamp(datafusion::arrow::datatypes::TimeUnit::Millisecond, None),
false,
),
]));
// Create batches with three different combinations
@@ -654,6 +708,9 @@ mod test {
vec![
Arc::new(StringArray::from(vec!["server1", "server1", "server1"])) as _,
Arc::new(StringArray::from(vec!["/var/log", "/var/log", "/var/log"])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![1000, 2000, 3000],
)) as _,
],
)
.unwrap();
@@ -663,6 +720,9 @@ mod test {
vec![
Arc::new(StringArray::from(vec!["server1", "server1"])) as _,
Arc::new(StringArray::from(vec!["/var/log", "/var/log"])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![4000, 5000],
)) as _,
],
)
.unwrap();
@@ -677,6 +737,9 @@ mod test {
"/var/data",
"/var/data",
])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![6000, 7000, 8000],
)) as _,
],
)
.unwrap();
@@ -686,6 +749,9 @@ mod test {
vec![
Arc::new(StringArray::from(vec!["server2"])) as _,
Arc::new(StringArray::from(vec!["/var/data"])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![9000],
)) as _,
],
)
.unwrap();
@@ -696,6 +762,9 @@ mod test {
vec![
Arc::new(StringArray::from(vec!["server3", "server3"])) as _,
Arc::new(StringArray::from(vec!["/opt/logs", "/opt/logs"])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![10000, 11000],
)) as _,
],
)
.unwrap();
@@ -709,6 +778,9 @@ mod test {
"/opt/logs",
"/opt/logs",
])) as _,
Arc::new(datafusion::arrow::array::TimestampMillisecondArray::from(
vec![12000, 13000, 14000],
)) as _,
],
)
.unwrap();
@@ -726,6 +798,7 @@ mod test {
// Create SeriesDivideExec
let divide_exec = Arc::new(SeriesDivideExec {
tag_columns: vec!["host".to_string(), "path".to_string()],
time_index_column: "time_index".to_string(),
input: memory_exec,
metric: ExecutionPlanMetricsSet::new(),
});
@@ -760,10 +833,16 @@ mod test {
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let time_index_array1 = result[0]
.column(2)
.as_any()
.downcast_ref::<datafusion::arrow::array::TimestampMillisecondArray>()
.unwrap();
for i in 0..5 {
assert_eq!(host_array1.value(i), "server1");
assert_eq!(path_array1.value(i), "/var/log");
assert_eq!(time_index_array1.value(i), 1000 + (i as i64) * 1000);
}
// Verify values in second batch (server2, /var/data)
@@ -777,10 +856,16 @@ mod test {
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let time_index_array2 = result[1]
.column(2)
.as_any()
.downcast_ref::<datafusion::arrow::array::TimestampMillisecondArray>()
.unwrap();
for i in 0..4 {
assert_eq!(host_array2.value(i), "server2");
assert_eq!(path_array2.value(i), "/var/data");
assert_eq!(time_index_array2.value(i), 6000 + (i as i64) * 1000);
}
// Verify values in third batch (server3, /opt/logs)
@@ -794,10 +879,16 @@ mod test {
.as_any()
.downcast_ref::<StringArray>()
.unwrap();
let time_index_array3 = result[2]
.column(2)
.as_any()
.downcast_ref::<datafusion::arrow::array::TimestampMillisecondArray>()
.unwrap();
for i in 0..5 {
assert_eq!(host_array3.value(i), "server3");
assert_eq!(path_array3.value(i), "/opt/logs");
assert_eq!(time_index_array3.value(i), 10000 + (i as i64) * 1000);
}
// Also verify streaming behavior

View File

@@ -28,7 +28,7 @@ pub mod error;
pub mod executor;
pub mod log_query;
pub mod metrics;
mod optimizer;
pub mod optimizer;
pub mod options;
pub mod parser;
mod part_sort;

View File

@@ -761,6 +761,8 @@ impl PromPlanner {
} else {
self.ctx.time_index_column = Some(SPECIAL_TIME_FUNCTION.to_string());
self.ctx.reset_table_name_and_schema();
self.ctx.tag_columns = vec![];
self.ctx.field_columns = vec![DEFAULT_FIELD_COLUMN.to_string()];
LogicalPlan::Extension(Extension {
node: Arc::new(
EmptyMetric::new(
@@ -1033,8 +1035,19 @@ impl PromPlanner {
.context(DataFusionPlanningSnafu)?;
// make divide plan
let time_index_column =
self.ctx
.time_index_column
.clone()
.with_context(|| TimeIndexNotFoundSnafu {
table: table_ref.to_string(),
})?;
let divide_plan = LogicalPlan::Extension(Extension {
node: Arc::new(SeriesDivide::new(self.ctx.tag_columns.clone(), sort_plan)),
node: Arc::new(SeriesDivide::new(
self.ctx.tag_columns.clone(),
time_index_column,
sort_plan,
)),
});
// make series_normalize plan
@@ -2828,6 +2841,7 @@ impl PromPlanner {
let project_fields = non_field_columns_iter
.chain(field_columns_iter)
.collect::<Result<Vec<_>>>()?;
LogicalPlanBuilder::from(input)
.project(project_fields)
.context(DataFusionPlanningSnafu)?
@@ -4219,7 +4233,7 @@ mod test {
\n PromInstantManipulate: range=[0..100000000], lookback=[1000], interval=[5000], time index=[timestamp] [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]\
\n PromSeriesDivide: tags=[\"tag_0\", \"tag_1\", \"tag_2\"] [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]\
\n Sort: prometheus_tsdb_head_series.tag_0 ASC NULLS FIRST, prometheus_tsdb_head_series.tag_1 ASC NULLS FIRST, prometheus_tsdb_head_series.tag_2 ASC NULLS FIRST, prometheus_tsdb_head_series.timestamp ASC NULLS FIRST [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]\
\n Filter: prometheus_tsdb_head_series.tag_1 ~ Utf8(\"^(10.0.160.237:8080|10.0.160.237:9090)$\") AND prometheus_tsdb_head_series.timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.timestamp <= TimestampMillisecond(100001000, None) [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]\
\n Filter: prometheus_tsdb_head_series.tag_1 ~ Utf8(\"^(?:(10.0.160.237:8080|10.0.160.237:9090))$\") AND prometheus_tsdb_head_series.timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.timestamp <= TimestampMillisecond(100001000, None) [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]\
\n TableScan: prometheus_tsdb_head_series [tag_0:Utf8, tag_1:Utf8, tag_2:Utf8, timestamp:Timestamp(Millisecond, None), field_0:Float64;N, field_1:Float64;N, field_2:Float64;N]";
assert_eq!(plan.display_indent_schema().to_string(), expected);
}
@@ -4266,7 +4280,7 @@ mod test {
\n PromInstantManipulate: range=[0..100000000], lookback=[1000], interval=[5000], time index=[greptime_timestamp] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n PromSeriesDivide: tags=[\"ip\"] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Sort: prometheus_tsdb_head_series.ip ASC NULLS FIRST, prometheus_tsdb_head_series.greptime_timestamp ASC NULLS FIRST [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(10.0.160.237:8080|10.0.160.237:9090)$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(?:(10.0.160.237:8080|10.0.160.237:9090))$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n TableScan: prometheus_tsdb_head_series [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]";
assert_eq!(plan.display_indent_schema().to_string(), expected);
@@ -4312,7 +4326,7 @@ mod test {
\n PromInstantManipulate: range=[0..100000000], lookback=[1000], interval=[5000], time index=[greptime_timestamp] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n PromSeriesDivide: tags=[\"ip\"] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Sort: prometheus_tsdb_head_series.ip ASC NULLS FIRST, prometheus_tsdb_head_series.greptime_timestamp ASC NULLS FIRST [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(10.0.160.237:8080|10.0.160.237:9090)$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(?:(10.0.160.237:8080|10.0.160.237:9090))$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n TableScan: prometheus_tsdb_head_series [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]";
assert_eq!(plan.display_indent_schema().to_string(), expected);
@@ -4358,7 +4372,7 @@ mod test {
\n PromInstantManipulate: range=[0..100000000], lookback=[1000], interval=[5000], time index=[greptime_timestamp] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n PromSeriesDivide: tags=[\"ip\"] [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Sort: prometheus_tsdb_head_series.ip ASC NULLS FIRST, prometheus_tsdb_head_series.greptime_timestamp ASC NULLS FIRST [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(10.0.160.237:8080|10.0.160.237:9090)$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n Filter: prometheus_tsdb_head_series.ip ~ Utf8(\"^(?:(10.0.160.237:8080|10.0.160.237:9090))$\") AND prometheus_tsdb_head_series.greptime_timestamp >= TimestampMillisecond(-1000, None) AND prometheus_tsdb_head_series.greptime_timestamp <= TimestampMillisecond(100001000, None) [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]\
\n TableScan: prometheus_tsdb_head_series [ip:Utf8, greptime_timestamp:Timestamp(Millisecond, None), greptime_value:Float64;N]";
assert_eq!(plan.display_indent_schema().to_string(), expected);

View File

@@ -475,6 +475,38 @@ pub fn mock_timeseries() -> Vec<TimeSeries> {
]
}
/// Mock timeseries carrying additional labels.
pub fn mock_timeseries_new_label() -> Vec<TimeSeries> {
let ts_demo_metrics = TimeSeries {
labels: vec![
new_label(METRIC_NAME_LABEL.to_string(), "demo_metrics".to_string()),
new_label("idc".to_string(), "idc3".to_string()),
new_label("new_label1".to_string(), "foo".to_string()),
],
samples: vec![Sample {
value: 42.0,
timestamp: 3000,
}],
..Default::default()
};
let ts_multi_labels = TimeSeries {
labels: vec![
new_label(METRIC_NAME_LABEL.to_string(), "metric1".to_string()),
new_label("idc".to_string(), "idc4".to_string()),
new_label("env".to_string(), "prod".to_string()),
new_label("host".to_string(), "host9".to_string()),
new_label("new_label2".to_string(), "bar".to_string()),
],
samples: vec![Sample {
value: 99.0,
timestamp: 4000,
}],
..Default::default()
};
vec![ts_demo_metrics, ts_multi_labels]
}
#[cfg(test)]
mod tests {
use std::sync::Arc;

View File

@@ -182,6 +182,7 @@ impl From<QueryContext> for api::v1::QueryContext {
channel: channel as u32,
snapshot_seqs: Some(api::v1::SnapshotSequences {
snapshot_seqs: snapshot_seqs.read().unwrap().clone(),
sst_min_sequences: Default::default(),
}),
explain,
}

View File

@@ -13,6 +13,7 @@ aquamarine.workspace = true
async-trait.workspace = true
common-base.workspace = true
common-error.workspace = true
common-grpc.workspace = true
common-macro.workspace = true
common-recordbatch.workspace = true
common-time.workspace = true

View File

@@ -973,6 +973,21 @@ pub enum MetadataError {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to encode/decode flight message"))]
FlightCodec {
source: common_grpc::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to decode prost message"))]
Prost {
#[snafu(source)]
error: prost::DecodeError,
#[snafu(implicit)]
location: Location,
},
}
impl ErrorExt for MetadataError {

View File

@@ -12,41 +12,42 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt::{self, Display};
use std::io::Cursor;
use api::helper::ColumnDataTypeWrapper;
use api::v1::add_column_location::LocationType;
use api::v1::column_def::{
as_fulltext_option_analyzer, as_fulltext_option_backend, as_skipping_index_type,
};
use api::v1::region::bulk_insert_request::Body;
use api::v1::region::{
alter_request, compact_request, region_request, AlterRequest, AlterRequests,
BulkInsertRequests, CloseRequest, CompactRequest, CreateRequest, CreateRequests,
DeleteRequests, DropRequest, DropRequests, FlushRequest, InsertRequests, OpenRequest,
TruncateRequest,
alter_request, compact_request, region_request, AlterRequest, AlterRequests, ArrowIpc,
BulkInsertRequest, CloseRequest, CompactRequest, CreateRequest, CreateRequests, DeleteRequests,
DropRequest, DropRequests, FlushRequest, InsertRequests, OpenRequest, TruncateRequest,
};
use api::v1::{
self, set_index, Analyzer, FulltextBackend as PbFulltextBackend, Option as PbOption, Rows,
SemanticType, SkippingIndexType as PbSkippingIndexType, WriteHint,
};
pub use common_base::AffectedRows;
use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_grpc::FlightData;
use common_recordbatch::DfRecordBatch;
use common_time::TimeToLive;
use datatypes::arrow::ipc::reader::FileReader;
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::{FulltextOptions, SkippingIndexOptions};
use prost::Message;
use serde::{Deserialize, Serialize};
use snafu::{ensure, OptionExt, ResultExt};
use strum::{AsRefStr, IntoStaticStr};
use crate::logstore::entry;
use crate::metadata::{
ColumnMetadata, DecodeArrowIpcSnafu, DecodeProtoSnafu, InvalidRawRegionRequestSnafu,
ColumnMetadata, DecodeProtoSnafu, FlightCodecSnafu, InvalidRawRegionRequestSnafu,
InvalidRegionRequestSnafu, InvalidSetRegionOptionRequestSnafu,
InvalidUnsetRegionOptionRequestSnafu, MetadataError, RegionMetadata, Result, UnexpectedSnafu,
InvalidUnsetRegionOptionRequestSnafu, MetadataError, ProstSnafu, RegionMetadata, Result,
UnexpectedSnafu,
};
use crate::metric_engine_consts::PHYSICAL_TABLE_METADATA_KEY;
use crate::mito_engine_options::{
@@ -152,7 +153,7 @@ impl RegionRequest {
region_request::Body::Creates(creates) => make_region_creates(creates),
region_request::Body::Drops(drops) => make_region_drops(drops),
region_request::Body::Alters(alters) => make_region_alters(alters),
region_request::Body::BulkInserts(bulk) => make_region_bulk_inserts(bulk),
region_request::Body::BulkInsert(bulk) => make_region_bulk_inserts(bulk),
region_request::Body::Sync(_) => UnexpectedSnafu {
reason: "Sync request should be handled separately by RegionServer",
}
@@ -326,49 +327,36 @@ fn make_region_truncate(truncate: TruncateRequest) -> Result<Vec<(RegionId, Regi
)])
}
/// Convert [BulkInsertRequests] to [RegionRequest] and group by [RegionId].
fn make_region_bulk_inserts(
requests: BulkInsertRequests,
) -> Result<Vec<(RegionId, RegionRequest)>> {
let mut region_requests: HashMap<u64, Vec<BulkInsertPayload>> =
HashMap::with_capacity(requests.requests.len());
/// Convert [BulkInsertRequest] to [RegionRequest] and group by [RegionId].
fn make_region_bulk_inserts(request: BulkInsertRequest) -> Result<Vec<(RegionId, RegionRequest)>> {
let Some(Body::ArrowIpc(request)) = request.body else {
return Ok(vec![]);
};
for req in requests.requests {
let region_id = req.region_id;
match req.payload_type() {
api::v1::region::BulkInsertType::ArrowIpc => {
// todo(hl): use StreamReader instead
let reader = FileReader::try_new(Cursor::new(req.payload), None)
.context(DecodeArrowIpcSnafu)?;
let record_batches = reader
.map(|b| b.map(BulkInsertPayload::ArrowIpc))
.try_collect::<Vec<_>>()
.context(DecodeArrowIpcSnafu)?;
match region_requests.entry(region_id) {
Entry::Occupied(mut e) => {
e.get_mut().extend(record_batches);
}
Entry::Vacant(e) => {
e.insert(record_batches);
}
}
}
}
}
let ArrowIpc {
region_id,
schema,
payload,
} = request;
let result = region_requests
.into_iter()
.map(|(region_id, payloads)| {
(
region_id.into(),
RegionRequest::BulkInserts(RegionBulkInsertsRequest {
region_id: region_id.into(),
payloads,
}),
)
})
.collect::<Vec<_>>();
Ok(result)
let schema_data = FlightData::decode(schema.clone()).context(ProstSnafu)?;
let payload_data = FlightData::decode(payload.clone()).context(ProstSnafu)?;
let mut decoder = FlightDecoder::default();
let _schema_message = decoder.try_decode(schema_data).context(FlightCodecSnafu)?;
let FlightMessage::Recordbatch(rb) =
decoder.try_decode(payload_data).context(FlightCodecSnafu)?
else {
unreachable!("Always expect record batch message after schema");
};
let region_id: RegionId = region_id.into();
Ok(vec![(
region_id,
RegionRequest::BulkInserts(RegionBulkInsertsRequest {
region_id,
payloads: vec![BulkInsertPayload::ArrowIpc(rb.df_record_batch().clone())],
}),
)])
}
/// Request to put data into a region.

View File

@@ -14,7 +14,6 @@ services:
- 9092:9092
- 9093:9093
environment:
# KRaft settings
KAFKA_CFG_NODE_ID: "1"
KAFKA_CFG_PROCESS_ROLES: broker,controller
KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:2181
@@ -27,6 +26,7 @@ services:
KAFKA_BROKER_ID: "1"
KAFKA_CLIENT_USERS: "user_kafka"
KAFKA_CLIENT_PASSWORDS: "secret"
KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE: false
depends_on:
zookeeper:
condition: service_started

View File

@@ -43,7 +43,7 @@ use servers::http::result::greptime_result_v1::GreptimedbV1Response;
use servers::http::result::influxdb_result_v1::{InfluxdbOutput, InfluxdbV1Response};
use servers::http::test_helpers::{TestClient, TestResponse};
use servers::http::GreptimeQueryOutput;
use servers::prom_store;
use servers::prom_store::{self, mock_timeseries_new_label};
use table::table_name::TableName;
use tests_integration::test_util::{
setup_test_http_app, setup_test_http_app_with_frontend,
@@ -1223,6 +1223,24 @@ pub async fn test_prometheus_remote_write(store_type: StorageType) {
.await;
assert_eq!(res.status(), StatusCode::NO_CONTENT);
// Write snappy-encoded data with new labels
let write_request = WriteRequest {
timeseries: mock_timeseries_new_label(),
..Default::default()
};
let serialized_request = write_request.encode_to_vec();
let compressed_request =
prom_store::snappy_compress(&serialized_request).expect("failed to encode snappy");
let res = client
.post("/v1/prometheus/write")
.header("Content-Encoding", "snappy")
.body(compressed_request)
.send()
.await;
assert_eq!(res.status(), StatusCode::NO_CONTENT);
guard.remove_all().await;
}

View File

@@ -8,7 +8,7 @@ CREATE TABLE distinct_basic (
Affected Rows: 0
-- should fail
-- should fall back to streaming mode
-- SQLNESS REPLACE id=\d+ id=REDACTED
CREATE FLOW test_distinct_basic SINK TO out_distinct_basic AS
SELECT
@@ -16,9 +16,151 @@ SELECT
FROM
distinct_basic;
Error: 3001(EngineExecuteQuery), Unsupported: Source table `greptime.public.distinct_basic`(id=REDACTED) has instant TTL, Instant TTL is not supported under batching mode. Consider using a TTL longer than flush interval
Affected Rows: 0
ALTER TABLE distinct_basic SET 'ttl' = '5s';
-- flow_options should have a flow_type:streaming
-- since source table's ttl=instant
SELECT flow_name, options FROM INFORMATION_SCHEMA.FLOWS;
+---------------------+---------------------------+
| flow_name | options |
+---------------------+---------------------------+
| test_distinct_basic | {"flow_type":"streaming"} |
+---------------------+---------------------------+
SHOW CREATE TABLE distinct_basic;
+----------------+-----------------------------------------------------------+
| Table | Create Table |
+----------------+-----------------------------------------------------------+
| distinct_basic | CREATE TABLE IF NOT EXISTS "distinct_basic" ( |
| | "number" INT NULL, |
| | "ts" TIMESTAMP(3) NOT NULL DEFAULT current_timestamp(), |
| | TIME INDEX ("ts"), |
| | PRIMARY KEY ("number") |
| | ) |
| | |
| | ENGINE=mito |
| | WITH( |
| | ttl = 'instant' |
| | ) |
+----------------+-----------------------------------------------------------+
SHOW CREATE TABLE out_distinct_basic;
+--------------------+---------------------------------------------------+
| Table | Create Table |
+--------------------+---------------------------------------------------+
| out_distinct_basic | CREATE TABLE IF NOT EXISTS "out_distinct_basic" ( |
| | "dis" INT NULL, |
| | "update_at" TIMESTAMP(3) NULL, |
| | "__ts_placeholder" TIMESTAMP(3) NOT NULL, |
| | TIME INDEX ("__ts_placeholder"), |
| | PRIMARY KEY ("dis") |
| | ) |
| | |
| | ENGINE=mito |
| | |
+--------------------+---------------------------------------------------+
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
(20, "2021-07-01 00:00:00.200"),
(20, "2021-07-01 00:00:00.200"),
(22, "2021-07-01 00:00:00.600");
Affected Rows: 0
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('test_distinct_basic');
+-----------------------------------------+
| ADMIN FLUSH_FLOW('test_distinct_basic') |
+-----------------------------------------+
| FLOW_FLUSHED |
+-----------------------------------------+
SELECT
dis
FROM
out_distinct_basic;
+-----+
| dis |
+-----+
| 20 |
| 22 |
+-----+
SELECT number FROM distinct_basic;
++
++
-- SQLNESS SLEEP 6s
ADMIN FLUSH_TABLE('distinct_basic');
+-------------------------------------+
| ADMIN FLUSH_TABLE('distinct_basic') |
+-------------------------------------+
| 0 |
+-------------------------------------+
INSERT INTO
distinct_basic
VALUES
(23, "2021-07-01 00:00:01.600");
Affected Rows: 0
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('test_distinct_basic');
+-----------------------------------------+
| ADMIN FLUSH_FLOW('test_distinct_basic') |
+-----------------------------------------+
| FLOW_FLUSHED |
+-----------------------------------------+
SELECT
dis
FROM
out_distinct_basic;
+-----+
| dis |
+-----+
| 20 |
| 22 |
| 23 |
+-----+
SELECT number FROM distinct_basic;
++
++
DROP FLOW test_distinct_basic;
Affected Rows: 0
DROP TABLE distinct_basic;
Affected Rows: 0
DROP TABLE out_distinct_basic;
Affected Rows: 0
-- test ttl = 5s
CREATE TABLE distinct_basic (
number INT,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(number),
TIME INDEX(ts)
)WITH ('ttl' = '5s');
Affected Rows: 0
@@ -30,7 +172,26 @@ FROM
Affected Rows: 0
-- flow_options should have a flow_type:batching
-- since source table's ttl=5s (not instant)
SELECT flow_name, options FROM INFORMATION_SCHEMA.FLOWS;
+---------------------+--------------------------+
| flow_name | options |
+---------------------+--------------------------+
| test_distinct_basic | {"flow_type":"batching"} |
+---------------------+--------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
@@ -130,41 +291,6 @@ ADMIN FLUSH_FLOW('test_distinct_basic');
| FLOW_FLUSHED |
+-----------------------------------------+
SHOW CREATE TABLE distinct_basic;
+----------------+-----------------------------------------------------------+
| Table | Create Table |
+----------------+-----------------------------------------------------------+
| distinct_basic | CREATE TABLE IF NOT EXISTS "distinct_basic" ( |
| | "number" INT NULL, |
| | "ts" TIMESTAMP(3) NOT NULL DEFAULT current_timestamp(), |
| | TIME INDEX ("ts"), |
| | PRIMARY KEY ("number") |
| | ) |
| | |
| | ENGINE=mito |
| | WITH( |
| | ttl = '5s' |
| | ) |
+----------------+-----------------------------------------------------------+
SHOW CREATE TABLE out_distinct_basic;
+--------------------+---------------------------------------------------+
| Table | Create Table |
+--------------------+---------------------------------------------------+
| out_distinct_basic | CREATE TABLE IF NOT EXISTS "out_distinct_basic" ( |
| | "dis" INT NULL, |
| | "update_at" TIMESTAMP(3) NULL, |
| | "__ts_placeholder" TIMESTAMP(3) NOT NULL, |
| | TIME INDEX ("__ts_placeholder"), |
| | PRIMARY KEY ("dis") |
| | ) |
| | |
| | ENGINE=mito |
| | |
+--------------------+---------------------------------------------------+
SELECT
dis
FROM

View File

@@ -6,7 +6,7 @@ CREATE TABLE distinct_basic (
TIME INDEX(ts)
)WITH ('ttl' = 'instant');
-- should fail
-- should fall back to streaming mode
-- SQLNESS REPLACE id=\d+ id=REDACTED
CREATE FLOW test_distinct_basic SINK TO out_distinct_basic AS
SELECT
@@ -14,7 +14,61 @@ SELECT
FROM
distinct_basic;
ALTER TABLE distinct_basic SET 'ttl' = '5s';
-- flow_options should have a flow_type:streaming
-- since source table's ttl=instant
SELECT flow_name, options FROM INFORMATION_SCHEMA.FLOWS;
SHOW CREATE TABLE distinct_basic;
SHOW CREATE TABLE out_distinct_basic;
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
(20, "2021-07-01 00:00:00.200"),
(20, "2021-07-01 00:00:00.200"),
(22, "2021-07-01 00:00:00.600");
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('test_distinct_basic');
SELECT
dis
FROM
out_distinct_basic;
SELECT number FROM distinct_basic;
-- SQLNESS SLEEP 6s
ADMIN FLUSH_TABLE('distinct_basic');
INSERT INTO
distinct_basic
VALUES
(23, "2021-07-01 00:00:01.600");
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('test_distinct_basic');
SELECT
dis
FROM
out_distinct_basic;
SELECT number FROM distinct_basic;
DROP FLOW test_distinct_basic;
DROP TABLE distinct_basic;
DROP TABLE out_distinct_basic;
-- test ttl = 5s
CREATE TABLE distinct_basic (
number INT,
ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY(number),
TIME INDEX(ts)
)WITH ('ttl' = '5s');
CREATE FLOW test_distinct_basic SINK TO out_distinct_basic AS
SELECT
@@ -22,7 +76,14 @@ SELECT
FROM
distinct_basic;
-- flow_options should have a flow_type:batching
-- since source table's ttl=5s (not instant)
SELECT flow_name, options FROM INFORMATION_SCHEMA.FLOWS;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
@@ -55,10 +116,6 @@ VALUES
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('test_distinct_basic');
SHOW CREATE TABLE distinct_basic;
SHOW CREATE TABLE out_distinct_basic;
SELECT
dis
FROM

View File

@@ -44,6 +44,15 @@ ADMIN FLUSH_FLOW('test_numbers_basic');
+----------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
SHOW CREATE TABLE out_num_cnt_basic;
+-------------------+--------------------------------------------------+
@@ -101,6 +110,16 @@ GROUP BY
Affected Rows: 0
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
SHOW CREATE TABLE out_num_cnt_basic;
+-------------------+--------------------------------------------------+
@@ -118,6 +137,15 @@ SHOW CREATE TABLE out_num_cnt_basic;
+-------------------+--------------------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
SHOW CREATE FLOW test_numbers_basic;
+--------------------+---------------------------------------------------------------------------------------+

View File

@@ -20,6 +20,9 @@ SHOW CREATE TABLE out_num_cnt_basic;
ADMIN FLUSH_FLOW('test_numbers_basic');
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
SHOW CREATE TABLE out_num_cnt_basic;
SHOW CREATE FLOW test_numbers_basic;
@@ -44,10 +47,16 @@ FROM
numbers_input_basic
GROUP BY
ts;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
SHOW CREATE TABLE out_num_cnt_basic;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
SHOW CREATE FLOW test_numbers_basic;
SHOW CREATE TABLE out_num_cnt_basic;

View File

@@ -62,6 +62,15 @@ SHOW CREATE TABLE out_num_cnt_basic;
+-------------------+--------------------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
INSERT INTO
numbers_input_basic
VALUES
@@ -206,6 +215,15 @@ SHOW CREATE TABLE out_basic;
+-----------+---------------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
INSERT INTO
input_basic
VALUES
@@ -306,6 +324,15 @@ ADMIN FLUSH_FLOW('test_distinct_basic');
+-----------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
@@ -1665,6 +1692,15 @@ ADMIN FLUSH_FLOW('test_numbers_basic');
+----------------------------------------+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
INSERT INTO
numbers_input_basic
VALUES

View File

@@ -24,6 +24,9 @@ ADMIN FLUSH_FLOW('test_numbers_basic');
SHOW CREATE TABLE out_num_cnt_basic;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
INSERT INTO
numbers_input_basic
VALUES
@@ -91,6 +94,9 @@ FROM
SHOW CREATE TABLE out_basic;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
INSERT INTO
input_basic
VALUES
@@ -130,6 +136,9 @@ SHOW CREATE TABLE out_distinct_basic;
ADMIN FLUSH_FLOW('test_distinct_basic');
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
INSERT INTO
distinct_basic
VALUES
@@ -788,6 +797,9 @@ SHOW CREATE TABLE out_num_cnt_basic;
ADMIN FLUSH_FLOW('test_numbers_basic');
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
INSERT INTO
numbers_input_basic
VALUES

View File

@@ -730,10 +730,21 @@ SELECT key FROM api_stats;
+-----+
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 5s
INSERT INTO `api_log` (`time`, `key`, `status_code`, `method`, `path`, `raw_query`, `user_agent`, `client_ip`, `duration`, `count`) VALUES (now(), '2', 0, 'GET', '/lightning/v1/query', 'key=1&since=600', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36', '1', 21, 1);
Affected Rows: 1
-- wait more time so the flownode has time to recover flows
-- SQLNESS SLEEP 5s
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('api_stats_flow');

View File

@@ -399,8 +399,13 @@ ADMIN FLUSH_FLOW('api_stats_flow');
SELECT key FROM api_stats;
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 5s
INSERT INTO `api_log` (`time`, `key`, `status_code`, `method`, `path`, `raw_query`, `user_agent`, `client_ip`, `duration`, `count`) VALUES (now(), '2', 0, 'GET', '/lightning/v1/query', 'key=1&since=600', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36', '1', 21, 1);
-- wait more time so the flownode has time to recover flows
-- SQLNESS SLEEP 5s
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('api_stats_flow');

View File

@@ -0,0 +1,269 @@
CREATE TABLE access_log (
"url" STRING,
user_id BIGINT,
ts TIMESTAMP TIME INDEX,
PRIMARY KEY ("url", user_id)
);
Affected Rows: 0
CREATE TABLE access_log_10s (
"url" STRING,
time_window timestamp time INDEX,
state BINARY,
PRIMARY KEY ("url")
);
Affected Rows: 0
CREATE FLOW calc_access_log_10s SINK TO access_log_10s
AS
SELECT
"url",
date_bin('10s'::INTERVAL, ts) AS time_window,
hll(user_id) AS state
FROM
access_log
GROUP BY
"url",
time_window;
Affected Rows: 0
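For comparison, a minimal sketch (not part of this diff) of the exact per-window unique-visitor count that the HLL flow approximates; it assumes only the standard count(DISTINCT ...) and date_bin functions and scans the raw access_log table directly.
-- Illustrative sketch, not in the diff: exact UV per URL per 10-second window.
SELECT
    "url",
    date_bin('10s'::INTERVAL, ts) AS time_window,
    count(DISTINCT user_id) AS exact_uv
FROM
    access_log
GROUP BY
    "url",
    time_window
ORDER BY
    time_window;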
-- insert 5 rows of data
INSERT INTO access_log VALUES
("/dashboard", 1, "2025-03-04 00:00:00"),
("/dashboard", 1, "2025-03-04 00:00:01"),
("/dashboard", 2, "2025-03-04 00:00:05"),
("/not_found", 3, "2025-03-04 00:00:11"),
("/dashboard", 4, "2025-03-04 00:00:15");
Affected Rows: 5
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_access_log_10s');
+-----------------------------------------+
| ADMIN FLUSH_FLOW('calc_access_log_10s') |
+-----------------------------------------+
| FLOW_FLUSHED |
+-----------------------------------------+
-- query should return 3 rows
-- SQLNESS SORT_RESULT 3 1
SELECT "url", time_window FROM access_log_10s
ORDER BY
time_window;
+------------+---------------------+
| url | time_window |
+------------+---------------------+
| /dashboard | 2025-03-04T00:00:00 |
| /dashboard | 2025-03-04T00:00:10 |
| /not_found | 2025-03-04T00:00:10 |
+------------+---------------------+
-- use hll_count to query the approximate distinct user count in access_log_10s
-- SQLNESS SORT_RESULT 3 1
SELECT "url", time_window, hll_count(state) FROM access_log_10s
ORDER BY
time_window;
+------------+---------------------+---------------------------------+
| url | time_window | hll_count(access_log_10s.state) |
+------------+---------------------+---------------------------------+
| /dashboard | 2025-03-04T00:00:00 | 2 |
| /dashboard | 2025-03-04T00:00:10 | 1 |
| /not_found | 2025-03-04T00:00:10 | 1 |
+------------+---------------------+---------------------------------+
-- further, we can aggregate the 10-second data into 1-minute windows by using hll_merge to merge the 10-second HyperLogLog states
-- SQLNESS SORT_RESULT 3 1
SELECT
"url",
date_bin('1 minute'::INTERVAL, time_window) AS time_window_1m,
hll_count(hll_merge(state)) as uv_per_min
FROM
access_log_10s
GROUP BY
"url",
time_window_1m
ORDER BY
time_window_1m;
+------------+---------------------+------------+
| url | time_window_1m | uv_per_min |
+------------+---------------------+------------+
| /dashboard | 2025-03-04T00:00:00 | 3 |
| /not_found | 2025-03-04T00:00:00 | 1 |
+------------+---------------------+------------+
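A minimal sketch (illustrative, not part of the diff): the same 10-second states can also be merged across all windows to approximate the total unique visitors per URL, reusing only the hll_merge and hll_count functions shown above.
-- Illustrative sketch, not in the diff: total approximate UV per URL over all windows.
SELECT
    "url",
    hll_count(hll_merge(state)) AS total_uv
FROM
    access_log_10s
GROUP BY
    "url";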
DROP FLOW calc_access_log_10s;
Affected Rows: 0
DROP TABLE access_log_10s;
Affected Rows: 0
DROP TABLE access_log;
Affected Rows: 0
CREATE TABLE percentile_base (
"id" INT PRIMARY KEY,
"value" DOUBLE,
ts timestamp(0) time index
);
Affected Rows: 0
CREATE TABLE percentile_5s (
"percentile_state" BINARY,
time_window timestamp(0) time index
);
Affected Rows: 0
CREATE FLOW calc_percentile_5s SINK TO percentile_5s
AS
SELECT
uddsketch_state(128, 0.01, "value") AS "value",
date_bin('5 seconds'::INTERVAL, ts) AS time_window
FROM
percentile_base
WHERE
"value" > 0 AND "value" < 70
GROUP BY
time_window;
Affected Rows: 0
INSERT INTO percentile_base ("id", "value", ts) VALUES
(1, 10.0, 1),
(2, 20.0, 2),
(3, 30.0, 3),
(4, 40.0, 4),
(5, 50.0, 5),
(6, 60.0, 6),
(7, 70.0, 7),
(8, 80.0, 8),
(9, 90.0, 9),
(10, 100.0, 10);
Affected Rows: 10
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_percentile_5s');
+----------------------------------------+
| ADMIN FLUSH_FLOW('calc_percentile_5s') |
+----------------------------------------+
| FLOW_FLUSHED |
+----------------------------------------+
SELECT
time_window,
uddsketch_calc(0.99, `percentile_state`) AS p99
FROM
percentile_5s
ORDER BY
time_window;
+---------------------+--------------------+
| time_window | p99 |
+---------------------+--------------------+
| 1970-01-01T00:00:00 | 40.04777053326359 |
| 1970-01-01T00:00:05 | 59.745049810145126 |
+---------------------+--------------------+
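A minimal sketch (illustrative, not part of the diff): because the sketch state itself is stored, other quantiles can be read from the same column without re-scanning the raw data.
-- Illustrative sketch, not in the diff: read additional quantiles from the stored state.
SELECT
    time_window,
    uddsketch_calc(0.5, `percentile_state`) AS p50,
    uddsketch_calc(0.9, `percentile_state`) AS p90
FROM
    percentile_5s
ORDER BY
    time_window;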
DROP FLOW calc_percentile_5s;
Affected Rows: 0
DROP TABLE percentile_5s;
Affected Rows: 0
DROP TABLE percentile_base;
Affected Rows: 0
CREATE TABLE percentile_base (
"id" INT PRIMARY KEY,
"value" DOUBLE,
ts timestamp(0) time index
);
Affected Rows: 0
CREATE TABLE percentile_5s (
"percentile_state" BINARY,
time_window timestamp(0) time index
);
Affected Rows: 0
CREATE FLOW calc_percentile_5s SINK TO percentile_5s
AS
SELECT
uddsketch_state(128, 0.01, CASE WHEN "value" > 0 AND "value" < 70 THEN "value" ELSE NULL END) AS "value",
date_bin('5 seconds'::INTERVAL, ts) AS time_window
FROM
percentile_base
GROUP BY
time_window;
Affected Rows: 0
INSERT INTO percentile_base ("id", "value", ts) VALUES
(1, 10.0, 1),
(2, 20.0, 2),
(3, 30.0, 3),
(4, 40.0, 4),
(5, 50.0, 5),
(6, 60.0, 6),
(7, 70.0, 7),
(8, 80.0, 8),
(9, 90.0, 9),
(10, 100.0, 10);
Affected Rows: 10
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_percentile_5s');
+----------------------------------------+
| ADMIN FLUSH_FLOW('calc_percentile_5s') |
+----------------------------------------+
| FLOW_FLUSHED |
+----------------------------------------+
SELECT
time_window,
uddsketch_calc(0.99, percentile_state) AS p99
FROM
percentile_5s
ORDER BY
time_window;
+---------------------+--------------------+
| time_window | p99 |
+---------------------+--------------------+
| 1970-01-01T00:00:00 | 40.04777053326359 |
| 1970-01-01T00:00:05 | 59.745049810145126 |
| 1970-01-01T00:00:10 | |
+---------------------+--------------------+
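A minimal sketch (illustrative, not part of the diff): windows whose values were all filtered out by the CASE expression yield a NULL percentile, so they can be dropped by filtering on the computed value.
-- Illustrative sketch, not in the diff: hide windows that received no in-range values.
SELECT
    time_window,
    uddsketch_calc(0.99, percentile_state) AS p99
FROM
    percentile_5s
WHERE
    uddsketch_calc(0.99, percentile_state) IS NOT NULL
ORDER BY
    time_window;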
DROP FLOW calc_percentile_5s;
Affected Rows: 0
DROP TABLE percentile_5s;
Affected Rows: 0
DROP TABLE percentile_base;
Affected Rows: 0

View File

@@ -0,0 +1,164 @@
CREATE TABLE access_log (
"url" STRING,
user_id BIGINT,
ts TIMESTAMP TIME INDEX,
PRIMARY KEY ("url", user_id)
);
CREATE TABLE access_log_10s (
"url" STRING,
time_window timestamp time INDEX,
state BINARY,
PRIMARY KEY ("url")
);
CREATE FLOW calc_access_log_10s SINK TO access_log_10s
AS
SELECT
"url",
date_bin('10s'::INTERVAL, ts) AS time_window,
hll(user_id) AS state
FROM
access_log
GROUP BY
"url",
time_window;
-- insert 5 rows of data
INSERT INTO access_log VALUES
("/dashboard", 1, "2025-03-04 00:00:00"),
("/dashboard", 1, "2025-03-04 00:00:01"),
("/dashboard", 2, "2025-03-04 00:00:05"),
("/not_found", 3, "2025-03-04 00:00:11"),
("/dashboard", 4, "2025-03-04 00:00:15");
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_access_log_10s');
-- query should return 3 rows
-- SQLNESS SORT_RESULT 3 1
SELECT "url", time_window FROM access_log_10s
ORDER BY
time_window;
-- use hll_count to query the approximate distinct user count in access_log_10s
-- SQLNESS SORT_RESULT 3 1
SELECT "url", time_window, hll_count(state) FROM access_log_10s
ORDER BY
time_window;
-- further, we can aggregate the 10-second data into 1-minute windows by using hll_merge to merge the 10-second HyperLogLog states
-- SQLNESS SORT_RESULT 3 1
SELECT
"url",
date_bin('1 minute'::INTERVAL, time_window) AS time_window_1m,
hll_count(hll_merge(state)) as uv_per_min
FROM
access_log_10s
GROUP BY
"url",
time_window_1m
ORDER BY
time_window_1m;
DROP FLOW calc_access_log_10s;
DROP TABLE access_log_10s;
DROP TABLE access_log;
CREATE TABLE percentile_base (
"id" INT PRIMARY KEY,
"value" DOUBLE,
ts timestamp(0) time index
);
CREATE TABLE percentile_5s (
"percentile_state" BINARY,
time_window timestamp(0) time index
);
CREATE FLOW calc_percentile_5s SINK TO percentile_5s
AS
SELECT
uddsketch_state(128, 0.01, "value") AS "value",
date_bin('5 seconds'::INTERVAL, ts) AS time_window
FROM
percentile_base
WHERE
"value" > 0 AND "value" < 70
GROUP BY
time_window;
INSERT INTO percentile_base ("id", "value", ts) VALUES
(1, 10.0, 1),
(2, 20.0, 2),
(3, 30.0, 3),
(4, 40.0, 4),
(5, 50.0, 5),
(6, 60.0, 6),
(7, 70.0, 7),
(8, 80.0, 8),
(9, 90.0, 9),
(10, 100.0, 10);
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_percentile_5s');
SELECT
time_window,
uddsketch_calc(0.99, `percentile_state`) AS p99
FROM
percentile_5s
ORDER BY
time_window;
DROP FLOW calc_percentile_5s;
DROP TABLE percentile_5s;
DROP TABLE percentile_base;
CREATE TABLE percentile_base (
"id" INT PRIMARY KEY,
"value" DOUBLE,
ts timestamp(0) time index
);
CREATE TABLE percentile_5s (
"percentile_state" BINARY,
time_window timestamp(0) time index
);
CREATE FLOW calc_percentile_5s SINK TO percentile_5s
AS
SELECT
uddsketch_state(128, 0.01, CASE WHEN "value" > 0 AND "value" < 70 THEN "value" ELSE NULL END) AS "value",
date_bin('5 seconds'::INTERVAL, ts) AS time_window
FROM
percentile_base
GROUP BY
time_window;
INSERT INTO percentile_base ("id", "value", ts) VALUES
(1, 10.0, 1),
(2, 20.0, 2),
(3, 30.0, 3),
(4, 40.0, 4),
(5, 50.0, 5),
(6, 60.0, 6),
(7, 70.0, 7),
(8, 80.0, 8),
(9, 90.0, 9),
(10, 100.0, 10);
-- SQLNESS REPLACE (ADMIN\sFLUSH_FLOW\('\w+'\)\s+\|\n\+-+\+\n\|\s+)[0-9]+\s+\| $1 FLOW_FLUSHED |
ADMIN FLUSH_FLOW('calc_percentile_5s');
SELECT
time_window,
uddsketch_calc(0.99, percentile_state) AS p99
FROM
percentile_5s
ORDER BY
time_window;
DROP FLOW calc_percentile_5s;
DROP TABLE percentile_5s;
DROP TABLE percentile_base;

View File

@@ -263,6 +263,15 @@ SELECT flow_name, table_catalog, flow_definition, source_table_names FROM INFORM
-- make sure the result is the same after recovery
-- SQLNESS ARG restart=true
SELECT 1;
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+
-- SQLNESS SLEEP 3s
SELECT flow_name, table_catalog, flow_definition, source_table_names FROM INFORMATION_SCHEMA.FLOWS WHERE flow_name='filter_numbers_show';
+---------------------+---------------+-------------------------------------------------------------+------------------------------------+

View File

@@ -108,6 +108,9 @@ SELECT flow_name, table_catalog, flow_definition, source_table_names FROM INFORM
-- make sure the result is the same after recovery
-- SQLNESS ARG restart=true
SELECT 1;
-- SQLNESS SLEEP 3s
SELECT flow_name, table_catalog, flow_definition, source_table_names FROM INFORMATION_SCHEMA.FLOWS WHERE flow_name='filter_numbers_show';

Some files were not shown because too many files have changed in this diff.