Compare commits

...

46 Commits

Author SHA1 Message Date
Weny Xu
904d560175 feat(promql-planner): introduce vector matching binary operation (#5578)
* feat(promql-planner): support vector matching for binary operation

* test: add sqlness tests
2025-02-27 07:39:19 +00:00
Lei, HUANG
765d1277ee fix(metasrv): clean expired nodes in memory (#5592)
* fix/frontend-node-state: Refactor NodeInfoKey and Context Handling in Meta Server

 • Removed unused cluster_id from NodeInfoKey struct.
 • Updated HeartbeatHandlerGroup to return Context alongside HeartbeatResponse.
 • Added current_node_info to Context for tracking node information.
 • Implemented on_node_disconnect in Context to handle node disconnection events, specifically for Frontend roles.
 • Adjusted register_pusher function to return PusherId directly.
 • Updated tests to accommodate changes in Context structure.

* fix/frontend-node-state: Refactor Heartbeat Handler Context Management

Refactored the HeartbeatHandlerGroup::handle method to use a mutable reference for Context instead of passing it by value. This change simplifies the
context management by eliminating the need to return the context with the response. Updated the Metasrv implementation to align with this new context
handling approach, improving code clarity and reducing unnecessary context cloning.

* revert: clean cluster info on disconnect

* fix/frontend-node-state: Add Frontend Expiry Listener and Update NodeInfoKey Conversion

 • Introduced FrontendExpiryListener to manage the expiration of frontend nodes, including its integration with leadership change notifications.
 • Modified NodeInfoKey conversion to use references, enhancing efficiency and consistency across the codebase.
 • Updated collect_cluster_info_handler and metasrv to incorporate the new listener and conversion changes.
 • Added frontend_expiry module to the project structure for better organization and maintainability.

* chore: add config for node expiry

* add some doc

* fix: clippy

* fix/frontend-node-state:
 ### Refactor Node Expiry Handling
 - **Configuration Update**: Removed `node_expiry_tick` from `metasrv.example.toml` and `MetasrvOptions` in `metasrv.rs`.
 - **Module Renaming**: Renamed `frontend_expiry.rs` to `node_expiry_listener.rs` and updated references in `lib.rs`.
 - **Code Refactoring**: Replaced `FrontendExpiryListener` with `NodeExpiryListener` in `node_expiry_listener.rs` and `metasrv.rs`, removing the tick interval and adjusting logic to use a fixed 60-second interval for node expiry checks.

* fix/frontend-node-state:
 Improve logging in `node_expiry_listener.rs`

 - Enhanced warning message to include peer information when an unrecognized node info key is encountered in `node_expiry_listener.rs`.

* docs: update config docs

* fix/frontend-node-state:
 **Refactor Context Handling in Heartbeat Services**

 - Updated `HeartbeatHandlerGroup` in `handler.rs` to pass `Context` by value instead of by mutable reference, allowing for more flexible context
 management.
 - Modified `Metasrv` implementation in `heartbeat.rs` to clone `Context` when passing to `handle` method, ensuring thread safety and consistency in
 asynchronous operations.
2025-02-27 06:16:36 +00:00
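A note on the mechanism: this change boils down to a background sweep in the metasrv that drops in-memory node info once a node has been idle longer than the configured `node_max_idle_time`, checked on a fixed 60-second tick. A minimal sketch of that pattern, with all names hypothetical (this is not the actual `NodeExpiryListener` API):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};

use tokio::sync::Mutex;

/// Hypothetical in-memory registry of node heartbeats.
#[derive(Default)]
struct NodeRegistry {
    /// node id -> time of the last heartbeat
    last_seen: HashMap<String, Instant>,
}

/// Periodically drop nodes that have been idle longer than `max_idle`
/// (e.g. the 24-hour `node_max_idle_time` default), checking on the
/// fixed 60-second interval described in the commit message.
async fn run_node_expiry(registry: Arc<Mutex<NodeRegistry>>, max_idle: Duration) {
    let mut tick = tokio::time::interval(Duration::from_secs(60));
    loop {
        tick.tick().await;
        let now = Instant::now();
        let mut reg = registry.lock().await;
        reg.last_seen
            .retain(|_, seen| now.duration_since(*seen) < max_idle);
    }
}
```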
discord9
ccf42a9d97 fix: flow heartbeat retry (#5600)
* fix: flow heartbeat retry

* fix?: not sure if fixed

* chore: per review
2025-02-27 03:58:21 +00:00
Weny Xu
71e2fb895f feat: introduce prom_round fn (#5604)
* feat: introduce `prom_round` fn

* test: add sqlness tests
2025-02-27 03:30:15 +00:00
Ruihang Xia
c9671fd669 feat(promql): implement subquery (#5606)
* feat: initial implement for promql subquery

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl and test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refactor

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-27 03:28:04 +00:00
Ruihang Xia
b5efc75aab feat(promql): ignore invalid input in histogram plan (#5607)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-27 03:18:20 +00:00
Weny Xu
c1d18d9980 fix(prom): preserve the order of series in PromQueryResult (#5601)
fix(prom): keep the order of tags
2025-02-26 13:40:09 +00:00
Lei, HUANG
5d9faaaf39 fix(metasrv): reject ddl when metasrv is follower (#5599)
* fix/reject-ddl-in-follower-metasrv:
 Add leader check and logging for gRPC requests in `procedure.rs`

 - Implemented leader verification for `query_procedure_state`, `ddl`, and `procedure_details` gRPC requests in `procedure.rs`.
 - Added logging with `warn` for requests reaching a non-leader node.
 - Introduced `ResponseHeader` and `Error::is_not_leader()` to handle non-leader responses.

* fix/reject-ddl-in-follower-metasrv:
 Improve leader address handling in `heartbeat.rs`

 - Refactor leader address retrieval by renaming `leader` to `leader_addr` for clarity.
 - Update `make_client` function to use a reference to `leader_addr`.
 - Enhance logging to include the leader address in the success message for creating a heartbeat stream.

* fmt

* fix/reject-ddl-in-follower-metasrv:
 **Enhance Leader Check in `procedure.rs`**

 - Updated the leader verification logic in `procedure.rs` to return a failed `MigrateRegionResponse` when the server is not the leader.
 - Added logging to warn when a migrate request is received by a non-leader server.
2025-02-26 08:10:40 +00:00
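The follower-side rejection described above is essentially an early leader check at the top of each affected gRPC handler. A rough sketch of that shape, assuming a tonic-based service; the real handlers return a response whose header carries an `is_not_leader` error (and a failed `MigrateRegionResponse` for migrations) rather than a bare status:

```rust
/// Hypothetical guard: bail out of a handler early when this metasrv
/// instance is not the current leader instead of executing the request.
fn ensure_leader(is_leader: bool, request_name: &str) -> Result<(), tonic::Status> {
    if !is_leader {
        // The real code logs a warning and builds an error response header.
        eprintln!("warn: {request_name} received by a non-leader metasrv, rejecting");
        return Err(tonic::Status::failed_precondition(
            "metasrv is not the leader",
        ));
    }
    Ok(())
}
```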
ZonaHe
538875abee feat: update dashboard to v0.7.11 (#5597)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-02-26 07:57:59 +00:00
jeremyhi
5ed09c4584 fix: all heartbeat channel need to check leader (#5593) 2025-02-25 10:45:30 +00:00
Yingwen
3f6a41eac5 fix: update show create table output for fulltext index (#5591)
* fix: update fulltext index syntax in show create table

* test: update fulltext sqlness result
2025-02-25 09:36:27 +00:00
yihong
ff0dcf12c5 perf: close issue 4974 by not deleting columns when dropping a logical region, about 100 times faster (#5561)
* perf: do not delete columns when dropping a logical region in drop database

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: make ci happy

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: address review comments

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: address some comments

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: drop stupid comments by copilot

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* chore: minor refactor

* chore: minor refactor

* chore: update greptime-proto

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
2025-02-25 09:00:49 +00:00
Yingwen
5b1fca825a fix: remove cached and uploaded files on failure (#5590) 2025-02-25 08:51:37 +00:00
Ruihang Xia
7bd108e2be feat: impl hll_state, hll_merge and hll_calc for incremental distinct counting (#5579)
* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* more tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update with more test and logs

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl merge fn

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename function names

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-24 19:07:37 +00:00
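Under the hood these aggregates wrap the `hyperloglogplus` crate added in this PR (the full accumulator appears in the diff further down): `hll` folds string values into a HyperLogLog state, the state is carried around as bincode-serialized bytes, and `hll_merge` combines partial states. A standalone sketch of that flow; the std `RandomState` hasher here is illustrative, while the actual implementation uses a fixed, serializable hasher so states can round-trip through bincode:

```rust
use std::collections::hash_map::RandomState;

use hyperloglogplus::{HyperLogLog, HyperLogLogPlus};

fn main() {
    // Two partial states, e.g. accumulated over different row batches.
    // Precision 14 matches DEFAULT_PRECISION in the accumulator below.
    let mut a: HyperLogLogPlus<String, _> =
        HyperLogLogPlus::new(14, RandomState::new()).unwrap();
    let mut b: HyperLogLogPlus<String, _> =
        HyperLogLogPlus::new(14, RandomState::new()).unwrap();

    a.insert("user-1");
    a.insert("user-2");
    b.insert("user-2");
    b.insert("user-3");

    // `hll_merge` folds partial states together; here `b` is merged into `a`.
    a.merge(&b).unwrap();

    // Approximate distinct count across both inputs (~3 here).
    println!("estimated distinct: {}", a.count());
}
```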
Weny Xu
286f225e50 fix: correct inverted_indexed_column_ids behavior (#5586)
* fix: correct `inverted_indexed_column_ids`

* fix: fix unit tests
2025-02-23 07:17:38 +00:00
Ruihang Xia
4f988b5ba9 feat: remove default inverted index for physical table (#5583)
* feat: remove default inverted index for physical table

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-22 06:48:05 +00:00
Ruihang Xia
500d0852eb fix: avoid run labeler job concurrently (#5584)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-22 05:18:26 +00:00
Zhenchi
8d05fb3503 feat: unify puffin name passed to stager (#5564)
* feat: purge a given puffin file in staging area

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* polish log

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* ttl set to 2d

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat: expose staging_ttl to index config

* feat: unify puffin name passed to stager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fallback to remote index

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2025-02-21 09:27:03 +00:00
Ruihang Xia
d7b6718be0 feat: run sqlness in parallel (#5499)
* define server mode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* bump sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* all good

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refactor: Move config generation logic from Env to ServerMode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* finalize

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change license header

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename variables

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* override parallelism

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename more variables

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-21 07:05:19 +00:00
Ruihang Xia
6f0783e17e fix: broken link in AUTHOR.md (#5581) 2025-02-21 07:01:41 +00:00
Ruihang Xia
d69e93b91a feat: support generating JSON output for explain analyze in HTTP API (#5567)
* impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* integration test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/servers/src/http/hints.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* refactor: with FORMAT option for explain format

* lift some well-known metrics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
2025-02-21 05:13:09 +00:00
Ruihang Xia
76083892cd feat: support UNNEST (#5580)
* feat: support UNNEST

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy and sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-21 04:53:56 +00:00
Ruihang Xia
7981c06989 feat: implement uddsketch function to calculate percentile (#5574)
* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* more tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update with more test and logs

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-20 18:59:20 +00:00
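The percentile support is built on the `uddsketch` crate pulled in by this PR (see the `uddsketch_state` accumulator later in the diff): values are folded into a `UDDSketch`, partial sketches are merged, and a quantile is read off the final state. A minimal sketch of that flow; `estimate_quantile` is assumed from the crate's API and does not appear in the excerpted diff:

```rust
use uddsketch::UDDSketch;

fn main() {
    // Arguments mirror `uddsketch_state(bucket_size, error_rate, value)`.
    let mut a = UDDSketch::new(128, 0.01);
    let mut b = UDDSketch::new(128, 0.01);

    for v in [1.0, 2.0, 3.0, 4.0, 5.0] {
        a.add_value(v);
    }
    for v in [6.0, 7.0, 8.0, 9.0, 10.0] {
        b.add_value(v);
    }

    // Partial states can be merged, which is what the aggregate's merge step does.
    a.merge_sketch(&b);
    assert_eq!(a.count(), 10);

    // Assumed quantile accessor (p90 here); not part of the diff below.
    println!("p90 ~= {}", a.estimate_quantile(0.9));
}
```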
beryl678
97bb1519f8 docs: revise the author list (#5575) 2025-02-20 18:04:23 +00:00
Weny Xu
1d8c9c1843 feat: enable gzip for prometheus query handlers and ignore NaN values in prometheus response (#5576)
* feat: enable gzip for prometheus query handlers and ignore nan values in prometheus response

* Apply suggestions from code review

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2025-02-20 11:34:32 +00:00
jeremyhi
71007e200c feat: remap flow route address (#5565)
* feat: remap flow peers

* refactor: not stream

* feat: remap flownode addr on FlowRoute and TableFlow

* fix: unit test

* Update src/meta-srv/src/handler/remap_flow_peer_handler.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* chore: by comment

* Update src/meta-srv/src/handler/remap_flow_peer_handler.rs

* Update src/common/meta/src/key/flow/table_flow.rs

* Update src/common/meta/src/key/flow/flow_route.rs

* chore: remove duplicate field

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-02-20 08:21:32 +00:00
jeremyhi
a0ff9e751e feat: flow type on creating procedure (#5572)
feat: flow type on creating
2025-02-20 08:12:02 +00:00
LFC
f6f617d667 feat: submit the node's CPU core count to metasrv in heartbeat (#5571)
* feat: submit the node's CPU core count to metasrv in heartbeat

* update greptime-proto dep
2025-02-20 03:55:18 +00:00
Ruihang Xia
e8788088a8 feat(log-query): implement the first part of log query expr (#5548)
* feat(log-query): implement the first part of log query expr

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-19 18:25:41 +00:00
shuiyisong
53b25c04a2 chore: support Loki's structured metadata for ingestion (#5541)
* chore: support loki's structured metadata

* test: update test

* chore: revert some code change

* chore: address CR comment
2025-02-19 16:44:26 +00:00
dennis zhuang
62a8b8b9dc feat(promql): supports sort, sort_desc etc. functions (#5542)
* feat(promql): supports sort, sort_desc etc. functions

* chore: fix toml format and tests

* chore: update deps

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: remove fixme

* fix: cargo lock

* chore: style

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2025-02-19 13:13:49 +00:00
Weny Xu
c8bdeaaa6a fix(promql-planner): update ctx field columns of OR operator (#5556)
* fix(promql-planner): update ctx field columns of OR operator

* test: add sqlness test
2025-02-19 11:18:58 +00:00
Ning Sun
81da18e5df refactor: use global type alias for pipeline input (#5568)
* refactor: use global type alias for pipeline input

* fmt: reformat
2025-02-19 10:41:33 +00:00
Weny Xu
7c65fddb30 fix(promql-planner): correct AND/UNLESS operator behavior (#5557)
* fix(promql-planner): keep field column in left input for AND operator

* test: add sqlness test

* fix: fix unless operator
2025-02-19 09:07:39 +00:00
Zhenchi
421e38c481 feat: allow purging a given puffin file in staging area (#5558)
* feat: purge a given puffin file in staging area

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* polish log

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* ttl set to 2d

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat: expose staging_ttl to index config

* fix test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* use `invalidate_entries_if` instead of maintaining map

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* run_pending_tasks after purging

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2025-02-19 08:58:30 +00:00
Weny Xu
aada5c1706 fix(promql-planner): remove le tag in ctx (#5560)
* fix(promql-planner): remove le tag in ctx

* test: add sqlness test

* chore: apply suggestions from CR
2025-02-19 03:51:27 +00:00
yihong
aa8f119bbb chore: format all toml files (#5529)
fix: format some cargo files

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-02-18 12:09:01 +00:00
ZonaHe
19a6d15849 feat: update dashboard to v0.7.10 (#5562)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2025-02-18 12:06:22 +00:00
liyang
073aaefe65 chore: improve grafana dashboard (#5559) 2025-02-18 11:36:27 +00:00
Yingwen
77223a0f3e fix: window sort support alias time index (#5543)
* fix: use alias expr to check commutativity

* chore: debug sort

* feat: consider alias in window sort optimizer

* test: sqlness test

* test: update sqlness result
2025-02-18 10:35:43 +00:00
Ruihang Xia
4ef038d098 fix: correct promql behavior on nonexistent columns (#5547)
* Revert "fix(promql): ignore filters for non-existent labels (#5519)"

This reverts commit 33a2485f54.

* reimplement

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* state safety

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-17 18:43:50 +00:00
jeremyhi
deb9520970 fix: information_schema.cluster_info be covered by the same id (#5555)
* fix: information_schema.cluster_info be covered by the same id

* chore: by comment
2025-02-17 11:51:02 +00:00
Yingwen
6bba5e0afa feat: collect stager metrics (#5553)
* feat: collect stager metrics

* Apply suggestions from code review

Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>

* Update src/mito2/src/metrics.rs

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
2025-02-17 07:09:15 +00:00
Ruihang Xia
f359eeb667 feat(log-query): support specifying exclusive/inclusive for between filter (#5546)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-02-17 04:40:47 +00:00
liyang
009dbad581 ci: don't push nightly latest image (#5551)
* ci: don't push nightly latest image

* add push release latest image
2025-02-17 04:34:49 +00:00
liyang
a2047b096c ci: use s5cmd upload artifacts (#5550) 2025-02-17 02:57:13 +00:00
215 changed files with 8842 additions and 3860 deletions

View File

@@ -34,8 +34,8 @@ inputs:
required: true
push-latest-tag:
description: Whether to push the latest tag
required: false
default: 'true'
required: true
default: 'false'
runs:
using: composite
steps:

View File

@@ -22,8 +22,8 @@ inputs:
required: true
push-latest-tag:
description: Whether to push the latest tag
required: false
default: 'true'
required: true
default: 'false'
dev-mode:
description: Enable dev mode, only build standard greptime
required: false

View File

@@ -51,8 +51,8 @@ inputs:
required: true
upload-to-s3:
description: Upload to S3
required: false
default: 'true'
required: true
default: 'false'
artifacts-dir:
description: Directory to store artifacts
required: false
@@ -77,13 +77,21 @@ runs:
with:
path: ${{ inputs.artifacts-dir }}
- name: Install s5cmd
shell: bash
run: |
wget https://github.com/peak/s5cmd/releases/download/v2.3.0/s5cmd_2.3.0_Linux-64bit.tar.gz
tar -xzf s5cmd_2.3.0_Linux-64bit.tar.gz
sudo mv s5cmd /usr/local/bin/
sudo chmod +x /usr/local/bin/s5cmd
- name: Release artifacts to cn region
uses: nick-invision/retry@v2
if: ${{ inputs.upload-to-s3 == 'true' }}
env:
AWS_ACCESS_KEY_ID: ${{ inputs.aws-cn-access-key-id }}
AWS_SECRET_ACCESS_KEY: ${{ inputs.aws-cn-secret-access-key }}
AWS_DEFAULT_REGION: ${{ inputs.aws-cn-region }}
AWS_REGION: ${{ inputs.aws-cn-region }}
UPDATE_VERSION_INFO: ${{ inputs.update-version-info }}
with:
max_attempts: ${{ inputs.upload-max-retry-times }}

View File

@@ -33,7 +33,7 @@ function upload_artifacts() {
# ├── greptime-darwin-amd64-v0.2.0.sha256sum
# └── greptime-darwin-amd64-v0.2.0.tar.gz
find "$ARTIFACTS_DIR" -type f \( -name "*.tar.gz" -o -name "*.sha256sum" \) | while IFS= read -r file; do
aws s3 cp \
s5cmd cp \
"$file" "s3://$AWS_S3_BUCKET/$RELEASE_DIRS/$VERSION/$(basename "$file")"
done
}
@@ -45,7 +45,7 @@ function update_version_info() {
if [[ "$VERSION" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo "Updating latest-version.txt"
echo "$VERSION" > latest-version.txt
aws s3 cp \
s5cmd cp \
latest-version.txt "s3://$AWS_S3_BUCKET/$RELEASE_DIRS/latest-version.txt"
fi
@@ -53,7 +53,7 @@ function update_version_info() {
if [[ "$VERSION" == *"nightly"* ]]; then
echo "Updating latest-nightly-version.txt"
echo "$VERSION" > latest-nightly-version.txt
aws s3 cp \
s5cmd cp \
latest-nightly-version.txt "s3://$AWS_S3_BUCKET/$RELEASE_DIRS/latest-nightly-version.txt"
fi
fi

View File

@@ -274,6 +274,7 @@ jobs:
aws-cn-access-key-id: ${{ secrets.AWS_CN_ACCESS_KEY_ID }}
aws-cn-secret-access-key: ${{ secrets.AWS_CN_SECRET_ACCESS_KEY }}
aws-cn-region: ${{ vars.AWS_RELEASE_BUCKET_REGION }}
upload-to-s3: false
dev-mode: true # Only build the standard images(exclude centos images).
push-latest-tag: false # Don't push the latest tag to registry.
update-version-info: false # Don't update the version info in S3.

View File

@@ -3,6 +3,10 @@ on:
pull_request_target:
types: [opened, edited]
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
docbot:
runs-on: ubuntu-20.04

View File

@@ -200,7 +200,7 @@ jobs:
image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
image-registry-password: ${{ secrets.DOCKERHUB_TOKEN }}
version: ${{ needs.allocate-runners.outputs.version }}
push-latest-tag: true
push-latest-tag: false
- name: Set nightly build result
id: set-nightly-build-result
@@ -240,9 +240,10 @@ jobs:
aws-cn-access-key-id: ${{ secrets.AWS_CN_ACCESS_KEY_ID }}
aws-cn-secret-access-key: ${{ secrets.AWS_CN_SECRET_ACCESS_KEY }}
aws-cn-region: ${{ vars.AWS_RELEASE_BUCKET_REGION }}
upload-to-s3: false
dev-mode: false
update-version-info: false # Don't update version info in S3.
push-latest-tag: true
push-latest-tag: false
stop-linux-amd64-runner: # It's always run as the last job in the workflow to make sure that the runner is released.
name: Stop linux-amd64 runner

View File

@@ -317,6 +317,7 @@ jobs:
image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
image-registry-password: ${{ secrets.DOCKERHUB_TOKEN }}
version: ${{ needs.allocate-runners.outputs.version }}
push-latest-tag: true
- name: Set build image result
id: set-build-image-result
@@ -361,6 +362,7 @@ jobs:
aws-cn-secret-access-key: ${{ secrets.AWS_CN_SECRET_ACCESS_KEY }}
aws-cn-region: ${{ vars.AWS_RELEASE_BUCKET_REGION }}
dev-mode: false
upload-to-s3: true
update-version-info: true
push-latest-tag: true

View File

@@ -7,6 +7,10 @@ on:
- reopened
- edited
concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
jobs:
check:
runs-on: ubuntu-20.04

View File

@@ -3,30 +3,28 @@
## Individual Committers (in alphabetical order)
* [CookiePieWw](https://github.com/CookiePieWw)
* [KKould](https://github.com/KKould)
* [NiwakaDev](https://github.com/NiwakaDev)
* [etolbakov](https://github.com/etolbakov)
* [irenjj](https://github.com/irenjj)
* [tisonkun](https://github.com/tisonkun)
* [KKould](https://github.com/KKould)
* [Lanqing Yang](https://github.com/lyang24)
* [NiwakaDev](https://github.com/NiwakaDev)
* [tisonkun](https://github.com/tisonkun)
## Team Members (in alphabetical order)
* [Breeze-P](https://github.com/Breeze-P)
* [GrepTime](https://github.com/GrepTime)
* [MichaelScofield](https://github.com/MichaelScofield)
* [Wenjie0329](https://github.com/Wenjie0329)
* [WenyXu](https://github.com/WenyXu)
* [ZonaHex](https://github.com/ZonaHex)
* [apdong2022](https://github.com/apdong2022)
* [beryl678](https://github.com/beryl678)
* [Breeze-P](https://github.com/Breeze-P)
* [daviderli614](https://github.com/daviderli614)
* [discord9](https://github.com/discord9)
* [evenyag](https://github.com/evenyag)
* [fengjiachun](https://github.com/fengjiachun)
* [fengys1996](https://github.com/fengys1996)
* [GrepTime](https://github.com/GrepTime)
* [holalengyu](https://github.com/holalengyu)
* [killme2008](https://github.com/killme2008)
* [MichaelScofield](https://github.com/MichaelScofield)
* [nicecui](https://github.com/nicecui)
* [paomian](https://github.com/paomian)
* [shuiyisong](https://github.com/shuiyisong)
@@ -34,11 +32,14 @@
* [sunng87](https://github.com/sunng87)
* [v0y4g3r](https://github.com/v0y4g3r)
* [waynexia](https://github.com/waynexia)
* [Wenjie0329](https://github.com/Wenjie0329)
* [WenyXu](https://github.com/WenyXu)
* [xtang](https://github.com/xtang)
* [zhaoyingnan01](https://github.com/zhaoyingnan01)
* [zhongzc](https://github.com/zhongzc)
* [ZonaHex](https://github.com/ZonaHex)
* [zyy17](https://github.com/zyy17)
## All Contributors
[![All Contributors](https://contrib.rocks/image?repo=GreptimeTeam/greptimedb)](https://github.com/GreptimeTeam/greptimedb/graphs/contributors)
To see the full list of contributors, please visit our [Contributors page](https://github.com/GreptimeTeam/greptimedb/graphs/contributors)

85
Cargo.lock generated
View File

@@ -432,7 +432,7 @@ dependencies = [
"arrow-schema",
"chrono",
"half",
"indexmap 2.6.0",
"indexmap 2.7.1",
"lexical-core",
"num",
"serde",
@@ -1475,7 +1475,7 @@ version = "0.13.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6026d8cd82ada8bbcfe337805dd1eb6afdc9e80fa4d57e977b3a36315e0c5525"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
"lazy_static",
"num-traits",
"regex",
@@ -2009,10 +2009,12 @@ dependencies = [
name = "common-function"
version = "0.12.0"
dependencies = [
"ahash 0.8.11",
"api",
"approx 0.5.1",
"arc-swap",
"async-trait",
"bincode",
"common-base",
"common-catalog",
"common-error",
@@ -2030,6 +2032,7 @@ dependencies = [
"geo-types",
"geohash",
"h3o",
"hyperloglogplus",
"jsonb",
"nalgebra 0.33.2",
"num",
@@ -2046,6 +2049,7 @@ dependencies = [
"store-api",
"table",
"tokio",
"uddsketch",
"wkt",
]
@@ -2972,7 +2976,7 @@ dependencies = [
"chrono",
"half",
"hashbrown 0.14.5",
"indexmap 2.6.0",
"indexmap 2.7.1",
"libc",
"object_store",
"parquet",
@@ -3032,7 +3036,7 @@ dependencies = [
"datafusion-functions-aggregate-common",
"datafusion-functions-window-common",
"datafusion-physical-expr-common",
"indexmap 2.6.0",
"indexmap 2.7.1",
"paste",
"recursive",
"serde_json",
@@ -3154,7 +3158,7 @@ dependencies = [
"datafusion-physical-expr-common",
"datafusion-physical-plan",
"half",
"indexmap 2.6.0",
"indexmap 2.7.1",
"log",
"parking_lot 0.12.3",
"paste",
@@ -3205,7 +3209,7 @@ dependencies = [
"datafusion-common",
"datafusion-expr",
"datafusion-physical-expr",
"indexmap 2.6.0",
"indexmap 2.7.1",
"itertools 0.13.0",
"log",
"recursive",
@@ -3230,7 +3234,7 @@ dependencies = [
"datafusion-physical-expr-common",
"half",
"hashbrown 0.14.5",
"indexmap 2.6.0",
"indexmap 2.7.1",
"itertools 0.13.0",
"log",
"paste",
@@ -3289,7 +3293,7 @@ dependencies = [
"futures",
"half",
"hashbrown 0.14.5",
"indexmap 2.6.0",
"indexmap 2.7.1",
"itertools 0.13.0",
"log",
"once_cell",
@@ -3309,7 +3313,7 @@ dependencies = [
"arrow-schema",
"datafusion-common",
"datafusion-expr",
"indexmap 2.6.0",
"indexmap 2.7.1",
"log",
"recursive",
"regex",
@@ -3376,6 +3380,7 @@ dependencies = [
"meta-client",
"metric-engine",
"mito2",
"num_cpus",
"object-store",
"prometheus",
"prost 0.13.3",
@@ -4196,6 +4201,7 @@ dependencies = [
"meta-client",
"nom",
"num-traits",
"num_cpus",
"operator",
"partition",
"pretty_assertions",
@@ -4302,6 +4308,7 @@ dependencies = [
"log-query",
"log-store",
"meta-client",
"num_cpus",
"opentelemetry-proto 0.27.0",
"operator",
"partition",
@@ -4692,7 +4699,7 @@ dependencies = [
[[package]]
name = "greptime-proto"
version = "0.1.0"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=fc09a5696608d2a0aa718cc835d5cb9c4e8e9387#fc09a5696608d2a0aa718cc835d5cb9c4e8e9387"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=072ce580502e015df1a6b03a185b60309a7c2a7a#072ce580502e015df1a6b03a185b60309a7c2a7a"
dependencies = [
"prost 0.13.3",
"serde",
@@ -4715,7 +4722,7 @@ dependencies = [
"futures-sink",
"futures-util",
"http 0.2.12",
"indexmap 2.6.0",
"indexmap 2.7.1",
"slab",
"tokio",
"tokio-util",
@@ -4734,7 +4741,7 @@ dependencies = [
"futures-core",
"futures-sink",
"http 1.1.0",
"indexmap 2.6.0",
"indexmap 2.7.1",
"slab",
"tokio",
"tokio-util",
@@ -5284,6 +5291,15 @@ dependencies = [
"tracing",
]
[[package]]
name = "hyperloglogplus"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "621debdf94dcac33e50475fdd76d34d5ea9c0362a834b9db08c3024696c1fbe3"
dependencies = [
"serde",
]
[[package]]
name = "i_float"
version = "1.3.1"
@@ -5572,9 +5588,9 @@ dependencies = [
[[package]]
name = "indexmap"
version = "2.6.0"
version = "2.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "707907fe3c25f5424cce2cb7e1cbcafee6bdbe735ca90ef77c29e84591e5b9da"
checksum = "8c9c992b02b5b4c94ea26e32fe5bccb7aa7d9f390ab5c1221ff895bc7ea8b652"
dependencies = [
"equivalent",
"hashbrown 0.15.2",
@@ -5588,7 +5604,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "232929e1d75fe899576a3d5c7416ad0d88dbfbb3c3d6aa00873a7408a50ddb88"
dependencies = [
"ahash 0.8.11",
"indexmap 2.6.0",
"indexmap 2.7.1",
"is-terminal",
"itoa",
"log",
@@ -5935,7 +5951,7 @@ version = "0.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ee7893dab2e44ae5f9d0173f26ff4aa327c10b01b06a72b52dd9405b628640d"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
]
[[package]]
@@ -6418,7 +6434,7 @@ dependencies = [
"cactus",
"cfgrammar",
"filetime",
"indexmap 2.6.0",
"indexmap 2.7.1",
"lazy_static",
"lrtable",
"num-traits",
@@ -7659,7 +7675,7 @@ checksum = "1e32339a5dc40459130b3bd269e9892439f55b33e772d2a9d402a789baaf4e8a"
dependencies = [
"futures-core",
"futures-sink",
"indexmap 2.6.0",
"indexmap 2.7.1",
"js-sys",
"once_cell",
"pin-project-lite",
@@ -8231,7 +8247,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b4c5cc86750666a3ed20bdaf5ca2a0344f9c67674cae0515bec2da16fbaa47db"
dependencies = [
"fixedbitset",
"indexmap 2.6.0",
"indexmap 2.7.1",
]
[[package]]
@@ -8756,8 +8772,7 @@ dependencies = [
[[package]]
name = "promql-parser"
version = "0.4.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7fe99e6f80a79abccf1e8fb48dd63473a36057e600cc6ea36147c8318698ae6f"
source = "git+https://github.com/GreptimeTeam/promql-parser.git?rev=27abb8e16003a50c720f00d6c85f41f5fa2a2a8e#27abb8e16003a50c720f00d6c85f41f5fa2a2a8e"
dependencies = [
"cfgrammar",
"chrono",
@@ -10323,7 +10338,7 @@ version = "1.0.137"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "930cfb6e6abf99298aaad7d29abbef7a9999a9a8806a40088f55f0dcec03146b"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
"itoa",
"memchr",
"ryu",
@@ -10394,7 +10409,7 @@ dependencies = [
"chrono",
"hex",
"indexmap 1.9.3",
"indexmap 2.6.0",
"indexmap 2.7.1",
"serde",
"serde_derive",
"serde_json",
@@ -10420,7 +10435,7 @@ version = "0.9.34+deprecated"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6a8b1a1a2ebf674015cc02edccce75287f1a0130d394307b36743c2f5d504b47"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
"itoa",
"ryu",
"serde",
@@ -10481,6 +10496,7 @@ dependencies = [
"humantime",
"humantime-serde",
"hyper 1.4.1",
"indexmap 2.7.1",
"influxdb_line_protocol",
"itertools 0.10.5",
"json5",
@@ -10891,12 +10907,12 @@ dependencies = [
[[package]]
name = "sqlness"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "308a7338f2211813d6e9da117e9b9b7aee5d072872d11a934002fd2bd4ab5276"
source = "git+https://github.com/CeresDB/sqlness.git?rev=bb91f31ff58993e07ea89845791235138283a24c#bb91f31ff58993e07ea89845791235138283a24c"
dependencies = [
"async-trait",
"derive_builder 0.11.2",
"duration-str",
"futures",
"minijinja",
"prettydiff",
"regex",
@@ -10922,6 +10938,7 @@ dependencies = [
"hex",
"local-ip-address",
"mysql",
"num_cpus",
"reqwest",
"serde",
"serde_json",
@@ -11021,7 +11038,7 @@ dependencies = [
"futures-util",
"hashbrown 0.15.2",
"hashlink",
"indexmap 2.6.0",
"indexmap 2.7.1",
"log",
"memchr",
"once_cell",
@@ -12317,7 +12334,7 @@ version = "0.19.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1b5bb770da30e5cbfde35a2d7b9b8a2c4b8ef89548a7a6aeab5c9a576e3e7421"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
"toml_datetime",
"winnow 0.5.40",
]
@@ -12328,7 +12345,7 @@ version = "0.22.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ae48d6208a266e853d946088ed816055e556cc6028c5e8e2b84d9fa5dd7c7f5"
dependencies = [
"indexmap 2.6.0",
"indexmap 2.7.1",
"serde",
"serde_spanned",
"toml_datetime",
@@ -12466,7 +12483,7 @@ dependencies = [
"futures-core",
"futures-util",
"hdrhistogram",
"indexmap 2.6.0",
"indexmap 2.7.1",
"pin-project-lite",
"slab",
"sync_wrapper 1.0.1",
@@ -12954,6 +12971,14 @@ version = "0.1.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2896d95c02a80c6d6a5d6e953d479f5ddf2dfdb6a244441010e373ac0fb88971"
[[package]]
name = "uddsketch"
version = "0.1.0"
source = "git+https://github.com/GreptimeTeam/timescaledb-toolkit.git?rev=84828fe8fb494a6a61412a3da96517fc80f7bb20#84828fe8fb494a6a61412a3da96517fc80f7bb20"
dependencies = [
"serde",
]
[[package]]
name = "unescaper"
version = "0.1.5"

View File

@@ -129,7 +129,7 @@ etcd-client = "0.14"
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "fc09a5696608d2a0aa718cc835d5cb9c4e8e9387" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "072ce580502e015df1a6b03a185b60309a7c2a7a" }
hex = "0.4"
http = "1"
humantime = "2.1"
@@ -160,7 +160,9 @@ parquet = { version = "53.0.0", default-features = false, features = ["arrow", "
paste = "1.0"
pin-project = "1.0"
prometheus = { version = "0.13.3", features = ["process"] }
promql-parser = { version = "0.4.3", features = ["ser"] }
promql-parser = { git = "https://github.com/GreptimeTeam/promql-parser.git", features = [
"ser",
], rev = "27abb8e16003a50c720f00d6c85f41f5fa2a2a8e" }
prost = "0.13"
raft-engine = { version = "0.4.1", default-features = false }
rand = "0.8"

View File

@@ -152,6 +152,7 @@
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.index.staging_ttl` | String | `7d` | The TTL of the staging directory.<br/>Defaults to 7 days.<br/>Setting it to "0s" to disable TTL. |
| `region_engine.mito.index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.index.content_cache_page_size` | String | `64KiB` | Page size for inverted index content cache. |
@@ -318,6 +319,7 @@
| `selector` | String | `round_robin` | Datanode selector type.<br/>- `round_robin` (default value)<br/>- `lease_based`<br/>- `load_based`<br/>For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". |
| `use_memory_store` | Bool | `false` | Store data in memory. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available on GreptimeDB running on cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., s3). |
| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
@@ -491,6 +493,7 @@
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.index.staging_ttl` | String | `7d` | The TTL of the staging directory.<br/>Defaults to 7 days.<br/>Setting it to "0s" to disable TTL. |
| `region_engine.mito.index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.index.content_cache_page_size` | String | `64KiB` | Page size for inverted index content cache. |

View File

@@ -497,6 +497,11 @@ aux_path = ""
## The max capacity of the staging directory.
staging_size = "2GB"
## The TTL of the staging directory.
## Defaults to 7 days.
## Setting it to "0s" to disable TTL.
staging_ttl = "7d"
## Cache size for inverted index metadata.
metadata_cache_size = "64MiB"

View File

@@ -50,6 +50,9 @@ use_memory_store = false
## - Using shared storage (e.g., s3).
enable_region_failover = false
## Max allowed idle time before removing node info from metasrv memory.
node_max_idle_time = "24hours"
## Whether to enable greptimedb telemetry. Enabled by default.
#+ enable_telemetry = true

View File

@@ -584,6 +584,11 @@ aux_path = ""
## The max capacity of the staging directory.
staging_size = "2GB"
## The TTL of the staging directory.
## Defaults to 7 days.
## Setting it to "0s" to disable TTL.
staging_ttl = "7d"
## Cache size for inverted index metadata.
metadata_cache_size = "64MiB"

File diff suppressed because it is too large

View File

@@ -384,8 +384,8 @@
"rowHeight": 0.9,
"showValue": "auto",
"tooltip": {
"mode": "none",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -483,8 +483,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"pluginVersion": "10.2.3",
@@ -578,8 +578,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"pluginVersion": "10.2.3",
@@ -601,7 +601,7 @@
"type": "timeseries"
},
{
"collapsed": true,
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
@@ -684,8 +684,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -878,8 +878,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1124,8 +1124,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1223,8 +1223,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1322,8 +1322,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1456,8 +1456,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1573,8 +1573,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1673,8 +1673,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1773,8 +1773,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -1890,8 +1890,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2002,8 +2002,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2120,8 +2120,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2233,8 +2233,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2334,8 +2334,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2435,8 +2435,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2548,8 +2548,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2661,8 +2661,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2788,8 +2788,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2889,8 +2889,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -2990,8 +2990,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3091,8 +3091,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3191,8 +3191,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3302,8 +3302,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3432,8 +3432,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3543,8 +3543,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3657,8 +3657,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3808,8 +3808,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -3909,8 +3909,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -4011,8 +4011,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [
@@ -4113,8 +4113,8 @@
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
"mode": "multi",
"sort": "desc"
}
},
"targets": [

View File

@@ -15,13 +15,10 @@ common-macro.workspace = true
common-time.workspace = true
datatypes.workspace = true
greptime-proto.workspace = true
paste = "1.0"
paste.workspace = true
prost.workspace = true
serde_json.workspace = true
snafu.workspace = true
[build-dependencies]
tonic-build = "0.11"
[dev-dependencies]
paste = "1.0"

View File

@@ -15,7 +15,7 @@ api.workspace = true
arrow.workspace = true
arrow-schema.workspace = true
async-stream.workspace = true
async-trait = "0.1"
async-trait.workspace = true
bytes.workspace = true
common-catalog.workspace = true
common-error.workspace = true
@@ -31,7 +31,7 @@ common-version.workspace = true
dashmap.workspace = true
datafusion.workspace = true
datatypes.workspace = true
futures = "0.3"
futures.workspace = true
futures-util.workspace = true
humantime.workspace = true
itertools.workspace = true
@@ -39,7 +39,7 @@ lazy_static.workspace = true
meta-client.workspace = true
moka = { workspace = true, features = ["future", "sync"] }
partition.workspace = true
paste = "1.0"
paste.workspace = true
prometheus.workspace = true
rustc-hash.workspace = true
serde_json.workspace = true
@@ -49,7 +49,7 @@ sql.workspace = true
store-api.workspace = true
table.workspace = true
tokio.workspace = true
tokio-stream = "0.1"
tokio-stream.workspace = true
[dev-dependencies]
cache.workspace = true

View File

@@ -42,7 +42,7 @@ pub struct Instance {
}
impl Instance {
fn new(instance: MetasrvInstance, guard: Vec<WorkerGuard>) -> Self {
pub fn new(instance: MetasrvInstance, guard: Vec<WorkerGuard>) -> Self {
Self {
instance,
_guard: guard,

View File

@@ -18,7 +18,7 @@ bytes.workspace = true
common-error.workspace = true
common-macro.workspace = true
futures.workspace = true
paste = "1.0"
paste.workspace = true
pin-project.workspace = true
rand.workspace = true
serde = { version = "1.0", features = ["derive"] }

View File

@@ -35,7 +35,7 @@ orc-rust = { version = "0.5", default-features = false, features = [
"async",
] }
parquet.workspace = true
paste = "1.0"
paste.workspace = true
rand.workspace = true
regex = "1.7"
serde.workspace = true

View File

@@ -12,9 +12,11 @@ default = ["geo"]
geo = ["geohash", "h3o", "s2", "wkt", "geo-types", "dep:geo"]
[dependencies]
ahash = "0.8"
api.workspace = true
arc-swap = "1.0"
async-trait.workspace = true
bincode = "1.3"
common-base.workspace = true
common-catalog.workspace = true
common-error.workspace = true
@@ -32,12 +34,13 @@ geo = { version = "0.29", optional = true }
geo-types = { version = "0.7", optional = true }
geohash = { version = "0.13", optional = true }
h3o = { version = "0.6", optional = true }
hyperloglogplus = "0.4"
jsonb.workspace = true
nalgebra.workspace = true
num = "0.4"
num-traits = "0.2"
once_cell.workspace = true
paste = "1.0"
paste.workspace = true
s2 = { version = "0.0.12", optional = true }
serde.workspace = true
serde_json.workspace = true
@@ -47,6 +50,7 @@ sql.workspace = true
statrs = "0.16"
store-api.workspace = true
table.workspace = true
uddsketch = { git = "https://github.com/GreptimeTeam/timescaledb-toolkit.git", rev = "84828fe8fb494a6a61412a3da96517fc80f7bb20" }
wkt = { version = "0.11", optional = true }
[dev-dependencies]

View File

@@ -0,0 +1,20 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
mod hll;
mod uddsketch_state;
pub(crate) use hll::HllStateType;
pub use hll::{HllState, HLL_MERGE_NAME, HLL_NAME};
pub use uddsketch_state::{UddSketchState, UDDSKETCH_STATE_NAME};

View File

@@ -0,0 +1,319 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_query::prelude::*;
use common_telemetry::trace;
use datafusion::arrow::array::ArrayRef;
use datafusion::common::cast::{as_binary_array, as_string_array};
use datafusion::common::not_impl_err;
use datafusion::error::{DataFusionError, Result as DfResult};
use datafusion::logical_expr::function::AccumulatorArgs;
use datafusion::logical_expr::{Accumulator as DfAccumulator, AggregateUDF};
use datafusion::prelude::create_udaf;
use datatypes::arrow::datatypes::DataType;
use hyperloglogplus::{HyperLogLog, HyperLogLogPlus};
use crate::utils::FixedRandomState;
pub const HLL_NAME: &str = "hll";
pub const HLL_MERGE_NAME: &str = "hll_merge";
const DEFAULT_PRECISION: u8 = 14;
pub(crate) type HllStateType = HyperLogLogPlus<String, FixedRandomState>;
pub struct HllState {
hll: HllStateType,
}
impl std::fmt::Debug for HllState {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "HllState<Opaque>")
}
}
impl Default for HllState {
fn default() -> Self {
Self::new()
}
}
impl HllState {
pub fn new() -> Self {
Self {
// Safety: the DEFAULT_PRECISION is fixed and valid
hll: HllStateType::new(DEFAULT_PRECISION, FixedRandomState::new()).unwrap(),
}
}
/// Create a UDF for the `hll` function.
///
/// `hll` accepts a string column and aggregates the
/// values into a HyperLogLog state.
pub fn state_udf_impl() -> AggregateUDF {
create_udaf(
HLL_NAME,
vec![DataType::Utf8],
Arc::new(DataType::Binary),
Volatility::Immutable,
Arc::new(Self::create_accumulator),
Arc::new(vec![DataType::Binary]),
)
}
/// Create a UDF for the `hll_merge` function.
///
/// `hll_merge` accepts a binary column of states generated by `hll`
/// and merges them into a single state.
pub fn merge_udf_impl() -> AggregateUDF {
create_udaf(
HLL_MERGE_NAME,
vec![DataType::Binary],
Arc::new(DataType::Binary),
Volatility::Immutable,
Arc::new(Self::create_merge_accumulator),
Arc::new(vec![DataType::Binary]),
)
}
fn update(&mut self, value: &str) {
self.hll.insert(value);
}
fn merge(&mut self, raw: &[u8]) {
if let Ok(serialized) = bincode::deserialize::<HllStateType>(raw) {
if let Ok(()) = self.hll.merge(&serialized) {
return;
}
}
trace!("Warning: Failed to merge HyperLogLog from {:?}", raw);
}
fn create_accumulator(acc_args: AccumulatorArgs) -> DfResult<Box<dyn DfAccumulator>> {
let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
match data_type {
DataType::Utf8 => Ok(Box::new(HllState::new())),
other => not_impl_err!("{HLL_NAME} does not support data type: {other}"),
}
}
fn create_merge_accumulator(acc_args: AccumulatorArgs) -> DfResult<Box<dyn DfAccumulator>> {
let data_type = acc_args.exprs[0].data_type(acc_args.schema)?;
match data_type {
DataType::Binary => Ok(Box::new(HllState::new())),
other => not_impl_err!("{HLL_MERGE_NAME} does not support data type: {other}"),
}
}
}
impl DfAccumulator for HllState {
fn update_batch(&mut self, values: &[ArrayRef]) -> DfResult<()> {
let array = &values[0];
match array.data_type() {
DataType::Utf8 => {
let string_array = as_string_array(array)?;
for value in string_array.iter().flatten() {
self.update(value);
}
}
DataType::Binary => {
let binary_array = as_binary_array(array)?;
for v in binary_array.iter().flatten() {
self.merge(v);
}
}
_ => {
return not_impl_err!(
"HLL functions do not support data type: {}",
array.data_type()
)
}
}
Ok(())
}
fn evaluate(&mut self) -> DfResult<ScalarValue> {
Ok(ScalarValue::Binary(Some(
bincode::serialize(&self.hll).map_err(|e| {
DataFusionError::Internal(format!("Failed to serialize HyperLogLog: {}", e))
})?,
)))
}
fn size(&self) -> usize {
std::mem::size_of_val(&self.hll)
}
fn state(&mut self) -> DfResult<Vec<ScalarValue>> {
Ok(vec![ScalarValue::Binary(Some(
bincode::serialize(&self.hll).map_err(|e| {
DataFusionError::Internal(format!("Failed to serialize HyperLogLog: {}", e))
})?,
))])
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> DfResult<()> {
let array = &states[0];
let binary_array = as_binary_array(array)?;
for v in binary_array.iter().flatten() {
self.merge(v);
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use datafusion::arrow::array::{BinaryArray, StringArray};
use super::*;
#[test]
fn test_hll_basic() {
let mut state = HllState::new();
state.update("1");
state.update("2");
state.update("3");
let result = state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let mut hll: HllStateType = bincode::deserialize(&bytes).unwrap();
assert_eq!(hll.count().trunc() as u32, 3);
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_hll_roundtrip() {
let mut state = HllState::new();
state.update("1");
state.update("2");
// Serialize
let serialized = state.evaluate().unwrap();
// Create new state and merge the serialized data
let mut new_state = HllState::new();
if let ScalarValue::Binary(Some(bytes)) = &serialized {
new_state.merge(bytes);
// Verify the merged state matches original
let result = new_state.evaluate().unwrap();
if let ScalarValue::Binary(Some(new_bytes)) = result {
let mut original: HllStateType = bincode::deserialize(bytes).unwrap();
let mut merged: HllStateType = bincode::deserialize(&new_bytes).unwrap();
assert_eq!(original.count(), merged.count());
} else {
panic!("Expected binary scalar value");
}
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_hll_batch_update() {
let mut state = HllState::new();
// Test string values
let str_values = vec!["a", "b", "c", "d", "e", "f", "g", "h", "i"];
let str_array = Arc::new(StringArray::from(str_values)) as ArrayRef;
state.update_batch(&[str_array]).unwrap();
let result = state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let mut hll: HllStateType = bincode::deserialize(&bytes).unwrap();
assert_eq!(hll.count().trunc() as u32, 9);
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_hll_merge_batch() {
let mut state1 = HllState::new();
state1.update("1");
let state1_binary = state1.evaluate().unwrap();
let mut state2 = HllState::new();
state2.update("2");
let state2_binary = state2.evaluate().unwrap();
let mut merged_state = HllState::new();
if let (ScalarValue::Binary(Some(bytes1)), ScalarValue::Binary(Some(bytes2))) =
(&state1_binary, &state2_binary)
{
let binary_array = Arc::new(BinaryArray::from(vec![
bytes1.as_slice(),
bytes2.as_slice(),
])) as ArrayRef;
merged_state.merge_batch(&[binary_array]).unwrap();
let result = merged_state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let mut hll: HllStateType = bincode::deserialize(&bytes).unwrap();
assert_eq!(hll.count().trunc() as u32, 2);
} else {
panic!("Expected binary scalar value");
}
} else {
panic!("Expected binary scalar values");
}
}
#[test]
fn test_hll_merge_function() {
// Create two HLL states with different values
let mut state1 = HllState::new();
state1.update("1");
state1.update("2");
let state1_binary = state1.evaluate().unwrap();
let mut state2 = HllState::new();
state2.update("2");
state2.update("3");
let state2_binary = state2.evaluate().unwrap();
// Create a merge state and merge both states
let mut merge_state = HllState::new();
if let (ScalarValue::Binary(Some(bytes1)), ScalarValue::Binary(Some(bytes2))) =
(&state1_binary, &state2_binary)
{
let binary_array = Arc::new(BinaryArray::from(vec![
bytes1.as_slice(),
bytes2.as_slice(),
])) as ArrayRef;
merge_state.update_batch(&[binary_array]).unwrap();
let result = merge_state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let mut hll: HllStateType = bincode::deserialize(&bytes).unwrap();
// Should have 3 unique values: "1", "2", "3"
assert_eq!(hll.count().trunc() as u32, 3);
} else {
panic!("Expected binary scalar value");
}
} else {
panic!("Expected binary scalar values");
}
}
}

View File

@@ -0,0 +1,307 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_query::prelude::*;
use common_telemetry::trace;
use datafusion::common::cast::{as_binary_array, as_primitive_array};
use datafusion::common::not_impl_err;
use datafusion::error::{DataFusionError, Result as DfResult};
use datafusion::logical_expr::function::AccumulatorArgs;
use datafusion::logical_expr::{Accumulator as DfAccumulator, AggregateUDF};
use datafusion::physical_plan::expressions::Literal;
use datafusion::prelude::create_udaf;
use datatypes::arrow::array::ArrayRef;
use datatypes::arrow::datatypes::{DataType, Float64Type};
use uddsketch::{SketchHashKey, UDDSketch};
pub const UDDSKETCH_STATE_NAME: &str = "uddsketch_state";
#[derive(Debug)]
pub struct UddSketchState {
uddsketch: UDDSketch,
}
impl UddSketchState {
pub fn new(bucket_size: u64, error_rate: f64) -> Self {
Self {
uddsketch: UDDSketch::new(bucket_size, error_rate),
}
}
pub fn udf_impl() -> AggregateUDF {
create_udaf(
UDDSKETCH_STATE_NAME,
vec![DataType::Int64, DataType::Float64, DataType::Float64],
Arc::new(DataType::Binary),
Volatility::Immutable,
Arc::new(|args| {
let (bucket_size, error_rate) = downcast_accumulator_args(args)?;
Ok(Box::new(UddSketchState::new(bucket_size, error_rate)))
}),
Arc::new(vec![DataType::Binary]),
)
}
fn update(&mut self, value: f64) {
self.uddsketch.add_value(value);
}
fn merge(&mut self, raw: &[u8]) {
if let Ok(uddsketch) = bincode::deserialize::<UDDSketch>(raw) {
if uddsketch.count() != 0 {
self.uddsketch.merge_sketch(&uddsketch);
}
} else {
trace!("Warning: Failed to deserialize UDDSketch from {:?}", raw);
}
}
}
fn downcast_accumulator_args(args: AccumulatorArgs) -> DfResult<(u64, f64)> {
let bucket_size = match args.exprs[0]
.as_any()
.downcast_ref::<Literal>()
.map(|lit| lit.value())
{
Some(ScalarValue::Int64(Some(value))) => *value as u64,
_ => {
return not_impl_err!(
"{} not supported for bucket size: {}",
UDDSKETCH_STATE_NAME,
&args.exprs[0]
)
}
};
let error_rate = match args.exprs[1]
.as_any()
.downcast_ref::<Literal>()
.map(|lit| lit.value())
{
Some(ScalarValue::Float64(Some(value))) => *value,
_ => {
return not_impl_err!(
"{} not supported for error rate: {}",
UDDSKETCH_STATE_NAME,
&args.exprs[1]
)
}
};
Ok((bucket_size, error_rate))
}
impl DfAccumulator for UddSketchState {
fn update_batch(&mut self, values: &[ArrayRef]) -> DfResult<()> {
let array = &values[2]; // the third column is data value
let f64_array = as_primitive_array::<Float64Type>(array)?;
for v in f64_array.iter().flatten() {
self.update(v);
}
Ok(())
}
fn evaluate(&mut self) -> DfResult<ScalarValue> {
Ok(ScalarValue::Binary(Some(
bincode::serialize(&self.uddsketch).map_err(|e| {
DataFusionError::Internal(format!("Failed to serialize UDDSketch: {}", e))
})?,
)))
}
fn size(&self) -> usize {
// Base size of UDDSketch struct fields
let mut total_size = std::mem::size_of::<f64>() * 3 + // alpha, gamma, values_sum
std::mem::size_of::<u32>() + // compactions
std::mem::size_of::<u64>() * 2; // max_buckets, num_values
// Size of buckets (SketchHashMap)
// Each bucket entry contains:
// - SketchHashKey (enum with i64/Zero/Invalid variants)
// - SketchHashEntry (count: u64, next: SketchHashKey)
let bucket_entry_size = std::mem::size_of::<SketchHashKey>() + // key
std::mem::size_of::<u64>() + // count
std::mem::size_of::<SketchHashKey>(); // next
total_size += self.uddsketch.current_buckets_count() * bucket_entry_size;
total_size
}
fn state(&mut self) -> DfResult<Vec<ScalarValue>> {
Ok(vec![ScalarValue::Binary(Some(
bincode::serialize(&self.uddsketch).map_err(|e| {
DataFusionError::Internal(format!("Failed to serialize UDDSketch: {}", e))
})?,
))])
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> DfResult<()> {
let array = &states[0];
let binary_array = as_binary_array(array)?;
for v in binary_array.iter().flatten() {
self.merge(v);
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use datafusion::arrow::array::{BinaryArray, Float64Array};
use super::*;
#[test]
fn test_uddsketch_state_basic() {
let mut state = UddSketchState::new(10, 0.01);
state.update(1.0);
state.update(2.0);
state.update(3.0);
let result = state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let deserialized: UDDSketch = bincode::deserialize(&bytes).unwrap();
assert_eq!(deserialized.count(), 3);
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_uddsketch_state_roundtrip() {
let mut state = UddSketchState::new(10, 0.01);
state.update(1.0);
state.update(2.0);
// Serialize
let serialized = state.evaluate().unwrap();
// Create new state and merge the serialized data
let mut new_state = UddSketchState::new(10, 0.01);
if let ScalarValue::Binary(Some(bytes)) = &serialized {
new_state.merge(bytes);
// Verify the merged state matches original by comparing deserialized values
let original_sketch: UDDSketch = bincode::deserialize(bytes).unwrap();
let new_result = new_state.evaluate().unwrap();
if let ScalarValue::Binary(Some(new_bytes)) = new_result {
let new_sketch: UDDSketch = bincode::deserialize(&new_bytes).unwrap();
assert_eq!(original_sketch.count(), new_sketch.count());
assert_eq!(original_sketch.sum(), new_sketch.sum());
assert_eq!(original_sketch.mean(), new_sketch.mean());
assert_eq!(original_sketch.max_error(), new_sketch.max_error());
// Compare a few quantiles to ensure statistical equivalence
for q in [0.1, 0.5, 0.9].iter() {
assert!(
(original_sketch.estimate_quantile(*q) - new_sketch.estimate_quantile(*q))
.abs()
< 1e-10,
"Quantile {} mismatch: original={}, new={}",
q,
original_sketch.estimate_quantile(*q),
new_sketch.estimate_quantile(*q)
);
}
} else {
panic!("Expected binary scalar value");
}
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_uddsketch_state_batch_update() {
let mut state = UddSketchState::new(10, 0.01);
let values = vec![1.0f64, 2.0, 3.0];
let array = Arc::new(Float64Array::from(values)) as ArrayRef;
state
.update_batch(&[array.clone(), array.clone(), array])
.unwrap();
let result = state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let deserialized: UDDSketch = bincode::deserialize(&bytes).unwrap();
assert_eq!(deserialized.count(), 3);
} else {
panic!("Expected binary scalar value");
}
}
#[test]
fn test_uddsketch_state_merge_batch() {
let mut state1 = UddSketchState::new(10, 0.01);
state1.update(1.0);
let state1_binary = state1.evaluate().unwrap();
let mut state2 = UddSketchState::new(10, 0.01);
state2.update(2.0);
let state2_binary = state2.evaluate().unwrap();
let mut merged_state = UddSketchState::new(10, 0.01);
if let (ScalarValue::Binary(Some(bytes1)), ScalarValue::Binary(Some(bytes2))) =
(&state1_binary, &state2_binary)
{
let binary_array = Arc::new(BinaryArray::from(vec![
bytes1.as_slice(),
bytes2.as_slice(),
])) as ArrayRef;
merged_state.merge_batch(&[binary_array]).unwrap();
let result = merged_state.evaluate().unwrap();
if let ScalarValue::Binary(Some(bytes)) = result {
let deserialized: UDDSketch = bincode::deserialize(&bytes).unwrap();
assert_eq!(deserialized.count(), 2);
} else {
panic!("Expected binary scalar value");
}
} else {
panic!("Expected binary scalar values");
}
}
#[test]
fn test_uddsketch_state_size() {
let mut state = UddSketchState::new(10, 0.01);
let initial_size = state.size();
// Add some values to create buckets
state.update(1.0);
state.update(2.0);
state.update(3.0);
let size_with_values = state.size();
assert!(
size_with_values > initial_size,
"Size should increase after adding values: initial={}, with_values={}",
initial_size,
size_with_values
);
// Verify size increases with more buckets
state.update(10.0); // This should create a new bucket
assert!(
state.size() > size_with_values,
"Size should increase after adding new bucket: prev={}, new={}",
size_with_values,
state.size()
);
}
}
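
The accumulator's whole wire format is just a bincode-encoded UDDSketch, so the evaluate/merge path above can be reproduced in isolation. A minimal standalone sketch, assuming only the `uddsketch` and `bincode` crates already used by this file:

use uddsketch::UDDSketch;

fn merged_median() -> f64 {
    let mut local = UDDSketch::new(128, 0.01);
    let mut global = UDDSketch::new(128, 0.01);
    for v in [1.0, 2.0, 3.0] {
        local.add_value(v);
    }
    global.add_value(10.0);
    // What `evaluate()` emits as the binary state.
    let bytes = bincode::serialize(&local).unwrap();
    // What `merge()` does on the receiving side: decode, then fold into the running sketch.
    let restored: UDDSketch = bincode::deserialize(&bytes).unwrap();
    global.merge_sketch(&restored);
    global.estimate_quantile(0.5)
}

Both sketches here are built with the same bucket size and error rate, which is presumably why the UDAF carries those two values as constant arguments alongside the data column.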

@@ -22,10 +22,12 @@ use crate::function::{AsyncFunctionRef, FunctionRef};
use crate::scalars::aggregate::{AggregateFunctionMetaRef, AggregateFunctions};
use crate::scalars::date::DateFunction;
use crate::scalars::expression::ExpressionFunction;
use crate::scalars::hll_count::HllCalcFunction;
use crate::scalars::json::JsonFunction;
use crate::scalars::matches::MatchesFunction;
use crate::scalars::math::MathFunction;
use crate::scalars::timestamp::TimestampFunction;
use crate::scalars::uddsketch_calc::UddSketchCalcFunction;
use crate::scalars::vector::VectorFunction;
use crate::system::SystemFunction;
use crate::table::TableFunction;
@@ -105,6 +107,8 @@ pub static FUNCTION_REGISTRY: Lazy<Arc<FunctionRegistry>> = Lazy::new(|| {
TimestampFunction::register(&function_registry);
DateFunction::register(&function_registry);
ExpressionFunction::register(&function_registry);
UddSketchCalcFunction::register(&function_registry);
HllCalcFunction::register(&function_registry);
// Aggregate functions
AggregateFunctions::register(&function_registry);

@@ -21,6 +21,7 @@ pub mod scalars;
mod system;
mod table;
pub mod aggr;
pub mod function;
pub mod function_registry;
pub mod handlers;

@@ -22,7 +22,9 @@ pub mod matches;
pub mod math;
pub mod vector;
pub(crate) mod hll_count;
#[cfg(test)]
pub(crate) mod test;
pub(crate) mod timestamp;
pub(crate) mod uddsketch_calc;
pub mod udf;

@@ -0,0 +1,175 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Implementation of the scalar function `hll_count`.
use std::fmt;
use std::fmt::Display;
use std::sync::Arc;
use common_query::error::{DowncastVectorSnafu, InvalidFuncArgsSnafu, Result};
use common_query::prelude::{Signature, Volatility};
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::Vector;
use datatypes::scalars::{ScalarVector, ScalarVectorBuilder};
use datatypes::vectors::{BinaryVector, MutableVector, UInt64VectorBuilder, VectorRef};
use hyperloglogplus::HyperLogLog;
use snafu::OptionExt;
use crate::aggr::HllStateType;
use crate::function::{Function, FunctionContext};
use crate::function_registry::FunctionRegistry;
const NAME: &str = "hll_count";
/// HllCalcFunction implements the scalar function `hll_count`.
///
/// It accepts one argument:
/// 1. The serialized HyperLogLogPlus state, as produced by the aggregator (binary).
///
/// For each row, it deserializes the sketch and returns the estimated cardinality.
#[derive(Debug, Default)]
pub struct HllCalcFunction;
impl HllCalcFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register(Arc::new(HllCalcFunction));
}
}
impl Display for HllCalcFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for HllCalcFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
// Only argument: HyperLogLogPlus state (binary)
Signature::exact(
vec![ConcreteDataType::binary_datatype()],
Volatility::Immutable,
)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
if columns.len() != 1 {
return InvalidFuncArgsSnafu {
err_msg: format!("hll_count expects 1 argument, got {}", columns.len()),
}
.fail();
}
let hll_vec = columns[0]
.as_any()
.downcast_ref::<BinaryVector>()
.with_context(|| DowncastVectorSnafu {
err_msg: format!("expect BinaryVector, got {}", columns[0].vector_type_name()),
})?;
let len = hll_vec.len();
let mut builder = UInt64VectorBuilder::with_capacity(len);
for i in 0..len {
let hll_opt = hll_vec.get_data(i);
if hll_opt.is_none() {
builder.push_null();
continue;
}
let hll_bytes = hll_opt.unwrap();
// Deserialize the HyperLogLogPlus from its bincode representation
let mut hll: HllStateType = match bincode::deserialize(hll_bytes) {
Ok(h) => h,
Err(e) => {
common_telemetry::trace!("Failed to deserialize HyperLogLogPlus: {}", e);
builder.push_null();
continue;
}
};
builder.push(Some(hll.count().round() as u64));
}
Ok(builder.to_vector())
}
}
#[cfg(test)]
mod tests {
use datatypes::vectors::BinaryVector;
use super::*;
use crate::utils::FixedRandomState;
#[test]
fn test_hll_count_function() {
let function = HllCalcFunction;
assert_eq!("hll_count", function.name());
assert_eq!(
ConcreteDataType::uint64_datatype(),
function
.return_type(&[ConcreteDataType::uint64_datatype()])
.unwrap()
);
// Create a test HLL
let mut hll = HllStateType::new(14, FixedRandomState::new()).unwrap();
for i in 1..=10 {
hll.insert(&i.to_string());
}
let serialized_bytes = bincode::serialize(&hll).unwrap();
let args: Vec<VectorRef> = vec![Arc::new(BinaryVector::from(vec![Some(serialized_bytes)]))];
let result = function.eval(FunctionContext::default(), &args).unwrap();
assert_eq!(result.len(), 1);
// Test cardinality estimate
if let datatypes::value::Value::UInt64(v) = result.get(0) {
assert_eq!(v, 10);
} else {
panic!("Expected uint64 value");
}
}
#[test]
fn test_hll_count_function_errors() {
let function = HllCalcFunction;
// Test with invalid number of arguments
let args: Vec<VectorRef> = vec![];
let result = function.eval(FunctionContext::default(), &args);
assert!(result.is_err());
assert!(result
.unwrap_err()
.to_string()
.contains("hll_count expects 1 argument"));
// Test with invalid binary data
let args: Vec<VectorRef> = vec![Arc::new(BinaryVector::from(vec![Some(vec![1, 2, 3])]))]; // Invalid binary data
let result = function.eval(FunctionContext::default(), &args).unwrap();
assert_eq!(result.len(), 1);
assert!(matches!(result.get(0), datatypes::value::Value::Null));
}
}
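
The flow mirrored by these tests is: an aggregator builds the sketch, serializes it with bincode into a binary column, and `hll_count` later decodes each row and reads the estimate. A minimal sketch of that producer/consumer split, reusing the same `HllStateType` and `FixedRandomState` the tests above import:

use hyperloglogplus::HyperLogLog;
use crate::aggr::HllStateType;
use crate::utils::FixedRandomState;

fn estimate_from_state_bytes() -> u64 {
    let mut hll = HllStateType::new(14, FixedRandomState::new()).unwrap();
    for user in ["alice", "bob", "alice"] {
        hll.insert(&user.to_string());
    }
    // The binary state a table cell would hold.
    let state = bincode::serialize(&hll).unwrap();
    // The per-row path inside `eval()`: decode, then read the cardinality estimate.
    let mut restored: HllStateType = bincode::deserialize(&state).unwrap();
    restored.count().round() as u64 // ~2 distinct users
}

The fixed-seed hasher is what makes the serialized state portable: sketches built with different hash seeds could not be meaningfully combined or re-used across processes, which is exactly what `FixedRandomState` in the utils module guards against.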

@@ -0,0 +1,211 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Implementation of the scalar function `uddsketch_calc`.
use std::fmt;
use std::fmt::Display;
use std::sync::Arc;
use common_query::error::{DowncastVectorSnafu, InvalidFuncArgsSnafu, Result};
use common_query::prelude::{Signature, Volatility};
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::Vector;
use datatypes::scalars::{ScalarVector, ScalarVectorBuilder};
use datatypes::vectors::{BinaryVector, Float64VectorBuilder, MutableVector, VectorRef};
use snafu::OptionExt;
use uddsketch::UDDSketch;
use crate::function::{Function, FunctionContext};
use crate::function_registry::FunctionRegistry;
const NAME: &str = "uddsketch_calc";
/// UddSketchCalcFunction implements the scalar function `uddsketch_calc`.
///
/// It accepts two arguments:
/// 1. A percentile (as f64) for which to compute the estimated quantile (e.g. 0.95 for p95).
/// 2. The serialized UDDSketch state, as produced by the aggregator (binary).
///
/// For each row, it deserializes the sketch and returns the computed quantile value.
#[derive(Debug, Default)]
pub struct UddSketchCalcFunction;
impl UddSketchCalcFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register(Arc::new(UddSketchCalcFunction));
}
}
impl Display for UddSketchCalcFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for UddSketchCalcFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::float64_datatype())
}
fn signature(&self) -> Signature {
// First argument: percentile (float64)
// Second argument: UDDSketch state (binary)
Signature::exact(
vec![
ConcreteDataType::float64_datatype(),
ConcreteDataType::binary_datatype(),
],
Volatility::Immutable,
)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
if columns.len() != 2 {
return InvalidFuncArgsSnafu {
err_msg: format!("uddsketch_calc expects 2 arguments, got {}", columns.len()),
}
.fail();
}
let perc_vec = &columns[0];
let sketch_vec = columns[1]
.as_any()
.downcast_ref::<BinaryVector>()
.with_context(|| DowncastVectorSnafu {
err_msg: format!("expect BinaryVector, got {}", columns[1].vector_type_name()),
})?;
let len = sketch_vec.len();
let mut builder = Float64VectorBuilder::with_capacity(len);
for i in 0..len {
let perc_opt = perc_vec.get(i).as_f64_lossy();
let sketch_opt = sketch_vec.get_data(i);
if sketch_opt.is_none() || perc_opt.is_none() {
builder.push_null();
continue;
}
let sketch_bytes = sketch_opt.unwrap();
let perc = perc_opt.unwrap();
// Deserialize the UDDSketch from its bincode representation
let sketch: UDDSketch = match bincode::deserialize(sketch_bytes) {
Ok(s) => s,
Err(e) => {
common_telemetry::trace!("Failed to deserialize UDDSketch: {}", e);
builder.push_null();
continue;
}
};
// Compute the estimated quantile from the sketch
let result = sketch.estimate_quantile(perc);
builder.push(Some(result));
}
Ok(builder.to_vector())
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datatypes::vectors::{BinaryVector, Float64Vector};
use super::*;
#[test]
fn test_uddsketch_calc_function() {
let function = UddSketchCalcFunction;
assert_eq!("uddsketch_calc", function.name());
assert_eq!(
ConcreteDataType::float64_datatype(),
function
.return_type(&[ConcreteDataType::float64_datatype()])
.unwrap()
);
// Create a test sketch
let mut sketch = UDDSketch::new(128, 0.01);
sketch.add_value(10.0);
sketch.add_value(20.0);
sketch.add_value(30.0);
sketch.add_value(40.0);
sketch.add_value(50.0);
sketch.add_value(60.0);
sketch.add_value(70.0);
sketch.add_value(80.0);
sketch.add_value(90.0);
sketch.add_value(100.0);
// Get expected values directly from the sketch
let expected_p50 = sketch.estimate_quantile(0.5);
let expected_p90 = sketch.estimate_quantile(0.9);
let expected_p95 = sketch.estimate_quantile(0.95);
let serialized = bincode::serialize(&sketch).unwrap();
let percentiles = vec![0.5, 0.9, 0.95];
let args: Vec<VectorRef> = vec![
Arc::new(Float64Vector::from_vec(percentiles.clone())),
Arc::new(BinaryVector::from(vec![Some(serialized.clone()); 3])),
];
let result = function.eval(FunctionContext::default(), &args).unwrap();
assert_eq!(result.len(), 3);
// Test median (p50)
assert!(
matches!(result.get(0), datatypes::value::Value::Float64(v) if (v - expected_p50).abs() < 1e-10)
);
// Test p90
assert!(
matches!(result.get(1), datatypes::value::Value::Float64(v) if (v - expected_p90).abs() < 1e-10)
);
// Test p95
assert!(
matches!(result.get(2), datatypes::value::Value::Float64(v) if (v - expected_p95).abs() < 1e-10)
);
}
#[test]
fn test_uddsketch_calc_function_errors() {
let function = UddSketchCalcFunction;
// Test with invalid number of arguments
let args: Vec<VectorRef> = vec![Arc::new(Float64Vector::from_vec(vec![0.95]))];
let result = function.eval(FunctionContext::default(), &args);
assert!(result.is_err());
assert!(result
.unwrap_err()
.to_string()
.contains("uddsketch_calc expects 2 arguments"));
// Test with invalid binary data
let args: Vec<VectorRef> = vec![
Arc::new(Float64Vector::from_vec(vec![0.95])),
Arc::new(BinaryVector::from(vec![Some(vec![1, 2, 3])])), // Invalid binary data
];
let result = function.eval(FunctionContext::default(), &args).unwrap();
assert_eq!(result.len(), 1);
assert!(matches!(result.get(0), datatypes::value::Value::Null));
}
}
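
Taken together with the aggregator, the per-row logic of `uddsketch_calc` is small enough to restate on its own. A sketch of it (only `uddsketch` and `bincode` assumed), including the deserialize-failure path that the error test above relies on:

use uddsketch::UDDSketch;

fn p95_or_null(state: &[u8]) -> Option<f64> {
    // Mirrors `eval()`: undecodable bytes become a null result rather than an error.
    match bincode::deserialize::<UDDSketch>(state) {
        Ok(sketch) => Some(sketch.estimate_quantile(0.95)),
        Err(_) => None,
    }
}

fn demo() {
    let mut sketch = UDDSketch::new(128, 0.01);
    for v in 1..=100 {
        sketch.add_value(v as f64);
    }
    let bytes = bincode::serialize(&sketch).unwrap();
    assert!(p95_or_null(&bytes).is_some());
    assert!(p95_or_null(&[1, 2, 3]).is_none()); // the same behaviour the error test asserts via a null row
}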

@@ -12,6 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::hash::BuildHasher;
use ahash::RandomState;
use serde::{Deserialize, Serialize};
/// Escapes special characters in the provided pattern string for `LIKE`.
///
/// Specifically, it prefixes the backslash (`\`), percent (`%`), and underscore (`_`)
@@ -32,6 +37,71 @@ pub fn escape_like_pattern(pattern: &str) -> String {
})
.collect::<String>()
}
/// A random state with fixed seeds.
///
/// This is used to ensure that the hash values are consistent across
/// different processes, and easy to serialize and deserialize.
#[derive(Debug)]
pub struct FixedRandomState {
state: RandomState,
}
impl FixedRandomState {
// Arbitrarily chosen fixed seeds; they only need to be identical everywhere, not unpredictable.
const RANDOM_SEED_0: u64 = 0x517cc1b727220a95;
const RANDOM_SEED_1: u64 = 0x428a2f98d728ae22;
const RANDOM_SEED_2: u64 = 0x7137449123ef65cd;
const RANDOM_SEED_3: u64 = 0xb5c0fbcfec4d3b2f;
pub fn new() -> Self {
Self {
state: ahash::RandomState::with_seeds(
Self::RANDOM_SEED_0,
Self::RANDOM_SEED_1,
Self::RANDOM_SEED_2,
Self::RANDOM_SEED_3,
),
}
}
}
impl Default for FixedRandomState {
fn default() -> Self {
Self::new()
}
}
impl BuildHasher for FixedRandomState {
type Hasher = ahash::AHasher;
fn build_hasher(&self) -> Self::Hasher {
self.state.build_hasher()
}
fn hash_one<T: std::hash::Hash>(&self, x: T) -> u64 {
self.state.hash_one(x)
}
}
impl Serialize for FixedRandomState {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
serializer.serialize_unit()
}
}
impl<'de> Deserialize<'de> for FixedRandomState {
fn deserialize<D>(_deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
Ok(Self::new())
}
}
#[cfg(test)]
mod tests {
use super::*;

@@ -22,4 +22,4 @@ store-api.workspace = true
table.workspace = true
[dev-dependencies]
paste = "1.0"
paste.workspace = true

@@ -16,7 +16,6 @@ use std::collections::HashMap;
use std::sync::Arc;
use futures::future::BoxFuture;
use futures::TryStreamExt;
use moka::future::Cache;
use moka::ops::compute::Op;
use table::metadata::TableId;
@@ -54,9 +53,13 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
Box::pin(async move {
table_flow_manager
.flows(table_id)
.map_ok(|(key, value)| (key.flownode_id(), value.peer))
.try_collect::<HashMap<_, _>>()
.await
.map(|flows| {
flows
.into_iter()
.map(|(key, value)| (key.flownode_id(), value.peer))
.collect::<HashMap<_, _>>()
})
// We must cache the `HashMap` even if it's empty,
// to avoid future requests to the remote storage next time;
// If the value is added to the remote storage,

@@ -12,8 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::hash::{DefaultHasher, Hash, Hasher};
use std::str::FromStr;
use api::v1::meta::HeartbeatRequest;
use common_error::ext::ErrorExt;
use lazy_static::lazy_static;
use regex::Regex;
@@ -55,12 +57,10 @@ pub trait ClusterInfo {
}
/// The key of [NodeInfo] in the storage. The format is `__meta_cluster_node_info-{cluster_id}-{role}-{node_id}`.
///
/// This key cannot be used to describe the `Metasrv` because the `Metasrv` does not have
/// a `cluster_id`; it serves multiple clusters.
#[derive(Debug, Clone, Eq, Hash, PartialEq, Serialize, Deserialize)]
#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize)]
pub struct NodeInfoKey {
/// The cluster id.
// todo(hl): remove cluster_id as it is not assigned anywhere.
pub cluster_id: ClusterId,
/// The role of the node. It can be `[Role::Datanode]` or `[Role::Frontend]`.
pub role: Role,
@@ -69,6 +69,28 @@ pub struct NodeInfoKey {
}
impl NodeInfoKey {
/// Try to create a `NodeInfoKey` from a "good" heartbeat request. "good" as in every needed
/// piece of information is provided and valid.
pub fn new(request: &HeartbeatRequest) -> Option<Self> {
let HeartbeatRequest { header, peer, .. } = request;
let header = header.as_ref()?;
let peer = peer.as_ref()?;
let role = header.role.try_into().ok()?;
let node_id = match role {
// Because the Frontend is stateless, it's too easy to neglect choosing a unique id
// for it when setting up a cluster. So we calculate its id from its address.
Role::Frontend => calculate_node_id(&peer.addr),
_ => peer.id,
};
Some(NodeInfoKey {
cluster_id: header.cluster_id,
role,
node_id,
})
}
pub fn key_prefix_with_cluster_id(cluster_id: u64) -> String {
format!("{}-{}-", CLUSTER_NODE_INFO_PREFIX, cluster_id)
}
@@ -83,6 +105,13 @@ impl NodeInfoKey {
}
}
/// Calculate (by using the DefaultHasher) the node's id from its address.
fn calculate_node_id(addr: &str) -> u64 {
let mut hasher = DefaultHasher::new();
addr.hash(&mut hasher);
hasher.finish()
}
/// The information of a node in the cluster.
#[derive(Debug, Serialize, Deserialize)]
pub struct NodeInfo {
@@ -100,7 +129,7 @@ pub struct NodeInfo {
pub start_time_ms: u64,
}
#[derive(Debug, Clone, Eq, Hash, PartialEq, Serialize, Deserialize)]
#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize)]
pub enum Role {
Datanode,
Frontend,
@@ -201,8 +230,8 @@ impl TryFrom<Vec<u8>> for NodeInfoKey {
}
}
impl From<NodeInfoKey> for Vec<u8> {
fn from(key: NodeInfoKey) -> Self {
impl From<&NodeInfoKey> for Vec<u8> {
fn from(key: &NodeInfoKey) -> Self {
format!(
"{}-{}-{}-{}",
CLUSTER_NODE_INFO_PREFIX,
@@ -271,6 +300,7 @@ impl TryFrom<i32> for Role {
mod tests {
use std::assert_matches::assert_matches;
use super::*;
use crate::cluster::Role::{Datanode, Frontend};
use crate::cluster::{DatanodeStatus, NodeInfo, NodeInfoKey, NodeStatus};
use crate::peer::Peer;
@@ -283,7 +313,7 @@ mod tests {
node_id: 2,
};
let key_bytes: Vec<u8> = key.into();
let key_bytes: Vec<u8> = (&key).into();
let new_key: NodeInfoKey = key_bytes.try_into().unwrap();
assert_eq!(1, new_key.cluster_id);
@@ -338,4 +368,26 @@ mod tests {
let prefix = NodeInfoKey::key_prefix_with_role(2, Frontend);
assert_eq!(prefix, "__meta_cluster_node_info-2-1-");
}
#[test]
fn test_calculate_node_id_from_addr() {
// Test empty string
assert_eq!(calculate_node_id(""), calculate_node_id(""));
// Test same addresses return same ids
let addr1 = "127.0.0.1:8080";
let id1 = calculate_node_id(addr1);
let id2 = calculate_node_id(addr1);
assert_eq!(id1, id2);
// Test different addresses return different ids
let addr2 = "127.0.0.1:8081";
let id3 = calculate_node_id(addr2);
assert_ne!(id1, id3);
// Test long address
let long_addr = "very.long.domain.name.example.com:9999";
let id4 = calculate_node_id(long_addr);
assert!(id4 > 0);
}
}

@@ -15,6 +15,7 @@
mod metadata;
use std::collections::BTreeMap;
use std::fmt;
use api::v1::flow::flow_request::Body as PbFlowRequest;
use api::v1::flow::{CreateRequest, FlowRequest, FlowRequestHeader};
@@ -28,7 +29,6 @@ use common_procedure::{
use common_telemetry::info;
use common_telemetry::tracing_context::TracingContext;
use futures::future::join_all;
use futures::TryStreamExt;
use itertools::Itertools;
use serde::{Deserialize, Serialize};
use snafu::{ensure, ResultExt};
@@ -77,6 +77,7 @@ impl CreateFlowProcedure {
query_context,
state: CreateFlowState::Prepare,
prev_flow_info_value: None,
flow_type: None,
},
}
}
@@ -104,7 +105,7 @@ impl CreateFlowProcedure {
if create_if_not_exists && or_replace {
// this is forbidden because it's not clear what that combination should mean exactly
return error::UnsupportedSnafu {
operation: "Create flow with both `IF NOT EXISTS` and `OR REPLACE`".to_string(),
operation: "Create flow with both `IF NOT EXISTS` and `OR REPLACE`",
}
.fail();
}
@@ -129,9 +130,10 @@ impl CreateFlowProcedure {
.flow_metadata_manager
.flow_route_manager()
.routes(flow_id)
.map_ok(|(_, value)| value.peer)
.try_collect::<Vec<_>>()
.await?;
.await?
.into_iter()
.map(|(_, value)| value.peer)
.collect::<Vec<_>>();
self.data.flow_id = Some(flow_id);
self.data.peers = peers;
info!("Replacing flow, flow_id: {}", flow_id);
@@ -175,6 +177,8 @@ impl CreateFlowProcedure {
self.allocate_flow_id().await?;
}
self.data.state = CreateFlowState::CreateFlows;
// determine flow type
self.data.flow_type = Some(determine_flow_type(&self.data.task));
Ok(Status::executing(true))
}
@@ -309,6 +313,11 @@ impl Procedure for CreateFlowProcedure {
}
}
pub fn determine_flow_type(_flow_task: &CreateFlowTask) -> FlowType {
// TODO(discord9): determine flow type
FlowType::RecordingRule
}
/// The state of [CreateFlowProcedure].
#[derive(Debug, Clone, Serialize, Deserialize, AsRefStr, PartialEq)]
pub enum CreateFlowState {
@@ -322,6 +331,35 @@ pub enum CreateFlowState {
CreateMetadata,
}
/// The type of flow.
#[derive(Debug, Clone, Copy, Serialize, Deserialize)]
pub enum FlowType {
/// The flow is a recording rule task.
RecordingRule,
/// The flow is a streaming task.
Streaming,
}
impl FlowType {
pub const RECORDING_RULE: &str = "recording_rule";
pub const STREAMING: &str = "streaming";
}
impl Default for FlowType {
fn default() -> Self {
Self::RecordingRule
}
}
impl fmt::Display for FlowType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
FlowType::RecordingRule => write!(f, "{}", FlowType::RECORDING_RULE),
FlowType::Streaming => write!(f, "{}", FlowType::STREAMING),
}
}
}
/// The serializable data.
#[derive(Debug, Serialize, Deserialize)]
pub struct CreateFlowData {
@@ -335,6 +373,7 @@ pub struct CreateFlowData {
/// For verify if prev value is consistent when need to update flow metadata.
/// only set when `or_replace` is true.
pub(crate) prev_flow_info_value: Option<DeserializedValueWithBytes<FlowInfoValue>>,
pub(crate) flow_type: Option<FlowType>,
}
impl From<&CreateFlowData> for CreateRequest {
@@ -342,7 +381,7 @@ impl From<&CreateFlowData> for CreateRequest {
let flow_id = value.flow_id.unwrap();
let source_table_ids = &value.source_table_ids;
CreateRequest {
let mut req = CreateRequest {
flow_id: Some(api::v1::FlowId { id: flow_id }),
source_table_ids: source_table_ids
.iter()
@@ -356,7 +395,11 @@ impl From<&CreateFlowData> for CreateRequest {
comment: value.task.comment.clone(),
sql: value.task.sql.clone(),
flow_options: value.task.flow_options.clone(),
}
};
let flow_type = value.flow_type.unwrap_or_default().to_string();
req.flow_options.insert("flow_type".to_string(), flow_type);
req
}
}
@@ -369,7 +412,7 @@ impl From<&CreateFlowData> for (FlowInfoValue, Vec<(FlowPartitionId, FlowRouteVa
expire_after,
comment,
sql,
flow_options: options,
flow_options: mut options,
..
} = value.task.clone();
@@ -386,19 +429,21 @@ impl From<&CreateFlowData> for (FlowInfoValue, Vec<(FlowPartitionId, FlowRouteVa
.map(|(idx, peer)| (idx as u32, FlowRouteValue { peer: peer.clone() }))
.collect::<Vec<_>>();
(
FlowInfoValue {
source_table_ids: value.source_table_ids.clone(),
sink_table_name,
flownode_ids,
catalog_name,
flow_name,
raw_sql: sql,
expire_after,
comment,
options,
},
flow_routes,
)
let flow_type = value.flow_type.unwrap_or_default().to_string();
options.insert("flow_type".to_string(), flow_type);
let flow_info = FlowInfoValue {
source_table_ids: value.source_table_ids.clone(),
sink_table_name,
flownode_ids,
catalog_name,
flow_name,
raw_sql: sql,
expire_after,
comment,
options,
};
(flow_info, flow_routes)
}
}

@@ -128,7 +128,7 @@ impl State for DropDatabaseExecutor {
.await?;
executor.invalidate_table_cache(ddl_ctx).await?;
executor
.on_drop_regions(ddl_ctx, &self.physical_region_routes)
.on_drop_regions(ddl_ctx, &self.physical_region_routes, true)
.await?;
info!("Table: {}({}) is dropped", self.table_name, self.table_id);

@@ -13,7 +13,6 @@
// limitations under the License.
use common_catalog::format_full_flow_name;
use futures::TryStreamExt;
use snafu::{ensure, OptionExt};
use crate::ddl::drop_flow::DropFlowProcedure;
@@ -39,9 +38,10 @@ impl DropFlowProcedure {
.flow_metadata_manager
.flow_route_manager()
.routes(self.data.task.flow_id)
.map_ok(|(_, value)| value)
.try_collect::<Vec<_>>()
.await?;
.await?
.into_iter()
.map(|(_, value)| value)
.collect::<Vec<_>>();
ensure!(
!flow_route_values.is_empty(),
error::FlowRouteNotFoundSnafu {

@@ -156,7 +156,7 @@ impl DropTableProcedure {
pub async fn on_datanode_drop_regions(&mut self) -> Result<Status> {
self.executor
.on_drop_regions(&self.context, &self.data.physical_region_routes)
.on_drop_regions(&self.context, &self.data.physical_region_routes, false)
.await?;
self.data.state = DropTableState::DeleteTombstone;
Ok(Status::executing(true))

@@ -214,6 +214,7 @@ impl DropTableExecutor {
&self,
ctx: &DdlContext,
region_routes: &[RegionRoute],
fast_path: bool,
) -> Result<()> {
let leaders = find_leaders(region_routes);
let mut drop_region_tasks = Vec::with_capacity(leaders.len());
@@ -236,6 +237,7 @@ impl DropTableExecutor {
}),
body: Some(region_request::Body::Drop(PbDropRegionRequest {
region_id: region_id.as_u64(),
fast_path,
})),
};
let datanode = datanode.clone();

@@ -16,9 +16,9 @@ pub mod flow_info;
pub(crate) mod flow_name;
pub(crate) mod flow_route;
pub mod flow_state;
mod flownode_addr_helper;
pub(crate) mod flownode_flow;
pub(crate) mod table_flow;
use std::ops::Deref;
use std::sync::Arc;
@@ -506,7 +506,6 @@ mod tests {
let routes = flow_metadata_manager
.flow_route_manager()
.routes(flow_id)
.try_collect::<Vec<_>>()
.await
.unwrap();
assert_eq!(
@@ -538,7 +537,6 @@ mod tests {
let nodes = flow_metadata_manager
.table_flow_manager()
.flows(table_id)
.try_collect::<Vec<_>>()
.await
.unwrap();
assert_eq!(
@@ -727,7 +725,6 @@ mod tests {
let routes = flow_metadata_manager
.flow_route_manager()
.routes(flow_id)
.try_collect::<Vec<_>>()
.await
.unwrap();
assert_eq!(
@@ -759,7 +756,6 @@ mod tests {
let nodes = flow_metadata_manager
.table_flow_manager()
.flows(table_id)
.try_collect::<Vec<_>>()
.await
.unwrap();
assert_eq!(

@@ -12,14 +12,15 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use futures::stream::BoxStream;
use futures::TryStreamExt;
use lazy_static::lazy_static;
use regex::Regex;
use serde::{Deserialize, Serialize};
use snafu::OptionExt;
use crate::error::{self, Result};
use crate::key::flow::FlowScoped;
use crate::key::flow::{flownode_addr_helper, FlowScoped};
use crate::key::node_address::NodeAddressKey;
use crate::key::{BytesAdapter, FlowId, FlowPartitionId, MetadataKey, MetadataValue};
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
@@ -167,10 +168,7 @@ impl FlowRouteManager {
}
/// Retrieves all [FlowRouteValue]s of the specified `flow_id`.
pub fn routes(
&self,
flow_id: FlowId,
) -> BoxStream<'static, Result<(FlowRouteKey, FlowRouteValue)>> {
pub async fn routes(&self, flow_id: FlowId) -> Result<Vec<(FlowRouteKey, FlowRouteValue)>> {
let start_key = FlowRouteKey::range_start_key(flow_id);
let req = RangeRequest::new().with_prefix(start_key);
let stream = PaginationStream::new(
@@ -181,7 +179,9 @@ impl FlowRouteManager {
)
.into_stream();
Box::pin(stream)
let mut res = stream.try_collect::<Vec<_>>().await?;
self.remap_flow_route_addresses(&mut res).await?;
Ok(res)
}
/// Builds a create flow routes transaction.
@@ -203,6 +203,28 @@ impl FlowRouteManager {
Ok(Txn::new().and_then(txns))
}
async fn remap_flow_route_addresses(
&self,
flow_routes: &mut [(FlowRouteKey, FlowRouteValue)],
) -> Result<()> {
let keys = flow_routes
.iter()
.map(|(_, value)| NodeAddressKey::with_flownode(value.peer.id))
.collect();
let flow_node_addrs =
flownode_addr_helper::get_flownode_addresses(&self.kv_backend, keys).await?;
for (_, flow_route_value) in flow_routes.iter_mut() {
let flownode_id = flow_route_value.peer.id;
// If an id lacks a corresponding address in the `flow_node_addrs`,
// it means the old address in `flow_route_value` is still valid,
// which is expected.
if let Some(node_addr) = flow_node_addrs.get(&flownode_id) {
flow_route_value.peer.addr = node_addr.peer.addr.clone();
}
}
Ok(())
}
}
#[cfg(test)]

@@ -0,0 +1,47 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashMap;
use crate::error::Result;
use crate::key::node_address::{NodeAddressKey, NodeAddressValue};
use crate::key::{MetadataKey, MetadataValue};
use crate::kv_backend::KvBackendRef;
use crate::rpc::store::BatchGetRequest;
/// Get the addresses of the flownodes.
/// The result is a map: node_id -> NodeAddressValue
pub(crate) async fn get_flownode_addresses(
kv_backend: &KvBackendRef,
keys: Vec<NodeAddressKey>,
) -> Result<HashMap<u64, NodeAddressValue>> {
if keys.is_empty() {
return Ok(HashMap::default());
}
let req = BatchGetRequest {
keys: keys.into_iter().map(|k| k.to_bytes()).collect(),
};
kv_backend
.batch_get(req)
.await?
.kvs
.into_iter()
.map(|kv| {
let key = NodeAddressKey::from_bytes(&kv.key)?;
let value = NodeAddressValue::try_from_raw_value(&kv.value)?;
Ok((key.node_id, value))
})
.collect()
}
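
Both `remap_flow_route_addresses` and `remap_table_flow_addresses` consume this map in the same way, so the pattern is worth spelling out once. A simplified sketch with an illustrative `Peer` type (not the real proto struct):

use std::collections::HashMap;

#[derive(Debug, Clone)]
struct Peer {
    id: u64,
    addr: String,
}

// Overwrite each peer's address when a fresher one is known for its id;
// peers absent from the map keep their stored address, which is expected.
fn remap_addresses(peers: &mut [Peer], latest: &HashMap<u64, String>) {
    for peer in peers.iter_mut() {
        if let Some(addr) = latest.get(&peer.id) {
            peer.addr = addr.clone();
        }
    }
}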

@@ -14,7 +14,7 @@
use std::sync::Arc;
use futures::stream::BoxStream;
use futures::TryStreamExt;
use lazy_static::lazy_static;
use regex::Regex;
use serde::{Deserialize, Serialize};
@@ -22,7 +22,8 @@ use snafu::OptionExt;
use table::metadata::TableId;
use crate::error::{self, Result};
use crate::key::flow::FlowScoped;
use crate::key::flow::{flownode_addr_helper, FlowScoped};
use crate::key::node_address::NodeAddressKey;
use crate::key::{BytesAdapter, FlowId, FlowPartitionId, MetadataKey, MetadataValue};
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
@@ -196,10 +197,7 @@ impl TableFlowManager {
/// Retrieves all [TableFlowKey]s of the specified `table_id`.
///
/// TODO(discord9): add cache for it since range request does not support cache.
pub fn flows(
&self,
table_id: TableId,
) -> BoxStream<'static, Result<(TableFlowKey, TableFlowValue)>> {
pub async fn flows(&self, table_id: TableId) -> Result<Vec<(TableFlowKey, TableFlowValue)>> {
let start_key = TableFlowKey::range_start_key(table_id);
let req = RangeRequest::new().with_prefix(start_key);
let stream = PaginationStream::new(
@@ -210,7 +208,9 @@ impl TableFlowManager {
)
.into_stream();
Box::pin(stream)
let mut res = stream.try_collect::<Vec<_>>().await?;
self.remap_table_flow_addresses(&mut res).await?;
Ok(res)
}
/// Builds a create table flow transaction.
@@ -238,6 +238,28 @@ impl TableFlowManager {
Ok(Txn::new().and_then(txns))
}
async fn remap_table_flow_addresses(
&self,
table_flows: &mut [(TableFlowKey, TableFlowValue)],
) -> Result<()> {
let keys = table_flows
.iter()
.map(|(_, value)| NodeAddressKey::with_flownode(value.peer.id))
.collect::<Vec<_>>();
let flownode_addrs =
flownode_addr_helper::get_flownode_addresses(&self.kv_backend, keys).await?;
for (_, table_flow_value) in table_flows.iter_mut() {
let flownode_id = table_flow_value.peer.id;
// If an id lacks a corresponding address in the `flownode_addrs`,
// it means the old address in `table_flow_value` is still valid,
// which is expected.
if let Some(flownode_addr) = flownode_addrs.get(&flownode_id) {
table_flow_value.peer.addr = flownode_addr.peer.addr.clone();
}
}
Ok(())
}
}
#[cfg(test)]

@@ -39,6 +39,10 @@ impl NodeAddressKey {
pub fn with_datanode(node_id: u64) -> Self {
Self::new(Role::Datanode, node_id)
}
pub fn with_flownode(node_id: u64) -> Self {
Self::new(Role::Flownode, node_id)
}
}
#[derive(Debug, PartialEq, Serialize, Deserialize, Clone)]

@@ -34,6 +34,7 @@ pub mod kv_backend;
pub mod leadership_notifier;
pub mod lock_key;
pub mod metrics;
pub mod node_expiry_listener;
pub mod node_manager;
pub mod peer;
pub mod range_stream;

@@ -0,0 +1,152 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Mutex;
use std::time::Duration;
use common_telemetry::{debug, error, info, warn};
use tokio::task::JoinHandle;
use tokio::time::{interval, MissedTickBehavior};
use crate::cluster::{NodeInfo, NodeInfoKey};
use crate::error;
use crate::kv_backend::ResettableKvBackendRef;
use crate::leadership_notifier::LeadershipChangeListener;
use crate::rpc::store::RangeRequest;
use crate::rpc::KeyValue;
/// [NodeExpiryListener] periodically checks all node info in memory and removes
/// expired node info to prevent memory leaks.
pub struct NodeExpiryListener {
handle: Mutex<Option<JoinHandle<()>>>,
max_idle_time: Duration,
in_memory: ResettableKvBackendRef,
}
impl Drop for NodeExpiryListener {
fn drop(&mut self) {
self.stop();
}
}
impl NodeExpiryListener {
pub fn new(max_idle_time: Duration, in_memory: ResettableKvBackendRef) -> Self {
Self {
handle: Mutex::new(None),
max_idle_time,
in_memory,
}
}
async fn start(&self) {
let mut handle = self.handle.lock().unwrap();
if handle.is_none() {
let in_memory = self.in_memory.clone();
let max_idle_time = self.max_idle_time;
let ticker_loop = tokio::spawn(async move {
// Run clean task every minute.
let mut interval = interval(Duration::from_secs(60));
interval.set_missed_tick_behavior(MissedTickBehavior::Skip);
loop {
interval.tick().await;
if let Err(e) = Self::clean_expired_nodes(&in_memory, max_idle_time).await {
error!(e; "Failed to clean expired node");
}
}
});
*handle = Some(ticker_loop);
}
}
fn stop(&self) {
if let Some(handle) = self.handle.lock().unwrap().take() {
handle.abort();
info!("Node expiry listener stopped")
}
}
/// Cleans expired nodes from memory.
async fn clean_expired_nodes(
in_memory: &ResettableKvBackendRef,
max_idle_time: Duration,
) -> error::Result<()> {
let node_keys = Self::list_expired_nodes(in_memory, max_idle_time).await?;
for key in node_keys {
let key_bytes: Vec<u8> = (&key).into();
if let Err(e) = in_memory.delete(&key_bytes, false).await {
warn!(e; "Failed to delete expired node: {:?}", key_bytes);
} else {
debug!("Deleted expired node key: {:?}", key);
}
}
Ok(())
}
/// Lists expired nodes that have been inactive for more than `max_idle_time`.
async fn list_expired_nodes(
in_memory: &ResettableKvBackendRef,
max_idle_time: Duration,
) -> error::Result<impl Iterator<Item = NodeInfoKey>> {
let prefix = NodeInfoKey::key_prefix_with_cluster_id(0);
let req = RangeRequest::new().with_prefix(prefix);
let current_time_millis = common_time::util::current_time_millis();
let resp = in_memory.range(req).await?;
Ok(resp
.kvs
.into_iter()
.filter_map(move |KeyValue { key, value }| {
let Ok(info) = NodeInfo::try_from(value).inspect_err(|e| {
warn!(e; "Unrecognized node info value");
}) else {
return None;
};
if (current_time_millis - info.last_activity_ts) > max_idle_time.as_millis() as i64
{
NodeInfoKey::try_from(key)
.inspect_err(|e| {
warn!(e; "Unrecognized node info key: {:?}", info.peer);
})
.ok()
.inspect(|node_key| {
debug!("Found expired node: {:?}", node_key);
})
} else {
None
}
}))
}
}
#[async_trait::async_trait]
impl LeadershipChangeListener for NodeExpiryListener {
fn name(&self) -> &str {
"NodeExpiryListener"
}
async fn on_leader_start(&self) -> error::Result<()> {
self.start().await;
info!(
"On leader start, node expiry listener started with max idle time: {:?}",
self.max_idle_time
);
Ok(())
}
async fn on_leader_stop(&self) -> error::Result<()> {
self.stop();
info!("On leader stop, node expiry listener stopped");
Ok(())
}
}
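
Stripped of the kv-backend details, the listener is the usual leader-gated periodic task: spawn on `on_leader_start`, abort on `on_leader_stop`. A minimal standalone sketch of that lifecycle (only `tokio` assumed; the cleanup body is a placeholder closure):

use std::sync::Mutex;
use std::time::Duration;
use tokio::task::JoinHandle;
use tokio::time::{interval, MissedTickBehavior};

#[derive(Default)]
struct PeriodicTask {
    handle: Mutex<Option<JoinHandle<()>>>,
}

impl PeriodicTask {
    fn start(&self, every: Duration, job: impl Fn() + Send + 'static) {
        let mut handle = self.handle.lock().unwrap();
        if handle.is_none() {
            *handle = Some(tokio::spawn(async move {
                let mut ticker = interval(every);
                // Skip missed ticks instead of bursting, as the listener does.
                ticker.set_missed_tick_behavior(MissedTickBehavior::Skip);
                loop {
                    ticker.tick().await;
                    job();
                }
            }));
        }
    }
    fn stop(&self) {
        if let Some(handle) = self.handle.lock().unwrap().take() {
            handle.abort();
        }
    }
}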

@@ -39,7 +39,7 @@ datafusion-common.workspace = true
datafusion-expr.workspace = true
datatypes.workspace = true
file-engine.workspace = true
futures = "0.3"
futures.workspace = true
futures-util.workspace = true
humantime-serde.workspace = true
lazy_static.workspace = true
@@ -47,6 +47,7 @@ log-store.workspace = true
meta-client.workspace = true
metric-engine.workspace = true
mito2.workspace = true
num_cpus.workspace = true
object-store.workspace = true
prometheus.workspace = true
prost.workspace = true

@@ -224,6 +224,20 @@ impl HeartbeatTask {
common_runtime::spawn_hb(async move {
let sleep = tokio::time::sleep(Duration::from_millis(0));
tokio::pin!(sleep);
let build_info = common_version::build_info();
let heartbeat_request = HeartbeatRequest {
peer: self_peer,
node_epoch,
info: Some(NodeInfo {
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
start_time_ms: node_epoch,
cpus: num_cpus::get() as u32,
}),
..Default::default()
};
loop {
if !running.load(Ordering::Relaxed) {
info!("shutdown heartbeat task");
@@ -235,9 +249,8 @@ impl HeartbeatTask {
match outgoing_message_to_mailbox_message(message) {
Ok(message) => {
let req = HeartbeatRequest {
peer: self_peer.clone(),
mailbox_message: Some(message),
..Default::default()
..heartbeat_request.clone()
};
HEARTBEAT_RECV_COUNT.with_label_values(&["success"]).inc();
Some(req)
@@ -253,22 +266,13 @@ impl HeartbeatTask {
}
}
_ = &mut sleep => {
let build_info = common_version::build_info();
let region_stats = Self::load_region_stats(&region_server_clone);
let now = Instant::now();
let duration_since_epoch = (now - epoch).as_millis() as u64;
let req = HeartbeatRequest {
peer: self_peer.clone(),
region_stats,
duration_since_epoch,
node_epoch,
info: Some(NodeInfo {
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// The start timestamp is the same as node_epoch currently.
start_time_ms: node_epoch,
}),
..Default::default()
..heartbeat_request.clone()
};
sleep.as_mut().reset(now + Duration::from_millis(interval));
Some(req)
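
The datanode change above (and the matching flownode and frontend changes further down) all switch to the same idiom: build one base HeartbeatRequest outside the loop and derive each outgoing request from it with struct update syntax. A self-contained illustration of that idiom with a stand-in struct (field names are illustrative, not the real proto message):

#[derive(Clone, Default)]
struct HeartbeatRequest {
    peer: Option<String>,
    node_epoch: u64,
    mailbox_message: Option<String>,
    duration_since_epoch: u64,
}

fn main() {
    // Built once: the fields that never change between ticks.
    let base = HeartbeatRequest {
        peer: Some("10.0.0.1:4001".to_string()),
        node_epoch: 1_700_000_000_000,
        ..Default::default()
    };
    for tick in 1..=3u64 {
        // Per tick: set only what changed; everything else comes from the template.
        let req = HeartbeatRequest {
            duration_since_epoch: tick * 1_000,
            ..base.clone()
        };
        assert_eq!(req.node_epoch, base.node_epoch);
        assert!(req.mailbox_message.is_none());
    }
}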

@@ -1218,7 +1218,10 @@ mod tests {
);
let response = mock_region_server
.handle_request(region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.unwrap();
assert_eq!(response.affected_rows, 0);
@@ -1310,7 +1313,10 @@ mod tests {
.insert(region_id, RegionEngineWithStatus::Ready(engine.clone()));
mock_region_server
.handle_request(region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.unwrap_err();

@@ -29,7 +29,7 @@ jsonb.workspace = true
num = "0.4"
num-traits = "0.2"
ordered-float = { version = "3.0", features = ["serde"] }
paste = "1.0"
paste.workspace = true
serde.workspace = true
serde_json.workspace = true
snafu.workspace = true

@@ -32,5 +32,5 @@ pub mod types;
pub mod value;
pub mod vectors;
pub use arrow;
pub use arrow::{self, compute};
pub use error::{Error, Result};

@@ -13,7 +13,7 @@ workspace = true
[dependencies]
api.workspace = true
async-trait = "0.1"
async-trait.workspace = true
common-catalog.workspace = true
common-datasource.workspace = true
common-error.workspace = true

@@ -41,7 +41,7 @@ datafusion-substrait.workspace = true
datatypes.workspace = true
enum-as-inner = "0.6.0"
enum_dispatch = "0.3"
futures = "0.3"
futures.workspace = true
get-size2 = "0.1.2"
greptime-proto.workspace = true
# This fork of hydroflow is simply for keeping our dependency in our org, and pin the version
@@ -53,6 +53,7 @@ lazy_static.workspace = true
meta-client.workspace = true
nom = "7.1.3"
num-traits = "0.2"
num_cpus.workspace = true
operator.workspace = true
partition.workspace = true
prometheus.workspace = true

@@ -60,12 +60,12 @@ async fn query_flow_state(
#[derive(Clone)]
pub struct HeartbeatTask {
node_id: u64,
node_epoch: u64,
peer_addr: String,
meta_client: Arc<MetaClient>,
report_interval: Duration,
retry_interval: Duration,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
start_time_ms: u64,
running: Arc<AtomicBool>,
query_stat_size: Option<SizeReportSender>,
}
@@ -83,12 +83,12 @@ impl HeartbeatTask {
) -> Self {
Self {
node_id: opts.node_id.unwrap_or(0),
node_epoch: common_time::util::current_time_millis() as u64,
peer_addr: addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr)),
meta_client,
report_interval: heartbeat_opts.interval,
retry_interval: heartbeat_opts.retry_interval,
resp_handler_executor,
start_time_ms: common_time::util::current_time_millis() as u64,
running: Arc::new(AtomicBool::new(false)),
query_stat_size: None,
}
@@ -103,6 +103,11 @@ impl HeartbeatTask {
warn!("Heartbeat task started multiple times");
return Ok(());
}
self.create_streams().await
}
async fn create_streams(&self) -> Result<(), Error> {
info!("Start to establish the heartbeat connection to metasrv.");
let (req_sender, resp_stream) = self
.meta_client
@@ -134,10 +139,9 @@ impl HeartbeatTask {
}
}
fn create_heartbeat_request(
fn new_heartbeat_request(
heartbeat_request: &HeartbeatRequest,
message: Option<OutgoingMessage>,
peer: Option<Peer>,
start_time_ms: u64,
latest_report: &Option<FlowStat>,
) -> Option<HeartbeatRequest> {
let mailbox_message = match message.map(outgoing_message_to_mailbox_message) {
@@ -161,10 +165,8 @@ impl HeartbeatTask {
Some(HeartbeatRequest {
mailbox_message,
peer,
info: Self::build_node_info(start_time_ms),
flow_stat,
..Default::default()
..heartbeat_request.clone()
})
}
@@ -174,6 +176,7 @@ impl HeartbeatTask {
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
start_time_ms,
cpus: num_cpus::get() as u32,
})
}
@@ -183,7 +186,7 @@ impl HeartbeatTask {
mut outgoing_rx: mpsc::Receiver<OutgoingMessage>,
) {
let report_interval = self.report_interval;
let start_time_ms = self.start_time_ms;
let node_epoch = self.node_epoch;
let self_peer = Some(Peer {
id: self.node_id,
addr: self.peer_addr.clone(),
@@ -198,18 +201,25 @@ impl HeartbeatTask {
interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
let mut latest_report = None;
let heartbeat_request = HeartbeatRequest {
peer: self_peer,
node_epoch,
info: Self::build_node_info(node_epoch),
..Default::default()
};
loop {
let req = tokio::select! {
message = outgoing_rx.recv() => {
if let Some(message) = message {
Self::create_heartbeat_request(Some(message), self_peer.clone(), start_time_ms, &latest_report)
Self::new_heartbeat_request(&heartbeat_request, Some(message), &latest_report)
} else {
// Receiving None means the Sender was dropped; break out of the current loop
break
}
}
_ = interval.tick() => {
Self::create_heartbeat_request(None, self_peer.clone(), start_time_ms, &latest_report)
Self::new_heartbeat_request(&heartbeat_request, None, &latest_report)
}
};
@@ -226,6 +236,8 @@ impl HeartbeatTask {
// set the timeout to half of the report interval so that it wouldn't delay heartbeat if something went horribly wrong
latest_report = query_flow_state(&query_stat_size, report_interval / 2).await;
}
info!("flownode heartbeat task stopped.");
});
}
@@ -269,7 +281,7 @@ impl HeartbeatTask {
info!("Try to re-establish the heartbeat connection to metasrv.");
if self.start().await.is_ok() {
if self.create_streams().await.is_ok() {
break;
}
}

@@ -13,7 +13,7 @@ workspace = true
[dependencies]
api.workspace = true
arc-swap = "1.0"
async-trait = "0.1"
async-trait.workspace = true
auth.workspace = true
cache.workspace = true
catalog.workspace = true
@@ -44,6 +44,7 @@ lazy_static.workspace = true
log-query.workspace = true
log-store.workspace = true
meta-client.workspace = true
num_cpus.workspace = true
opentelemetry-proto.workspace = true
operator.workspace = true
partition.workspace = true
@@ -70,7 +71,7 @@ catalog = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
datanode.workspace = true
datatypes.workspace = true
futures = "0.3"
futures.workspace = true
serde_json.workspace = true
strfmt = "0.2"
tower.workspace = true

@@ -118,10 +118,9 @@ impl HeartbeatTask {
});
}
fn create_heartbeat_request(
fn new_heartbeat_request(
heartbeat_request: &HeartbeatRequest,
message: Option<OutgoingMessage>,
peer: Option<Peer>,
start_time_ms: u64,
) -> Option<HeartbeatRequest> {
let mailbox_message = match message.map(outgoing_message_to_mailbox_message) {
Some(Ok(message)) => Some(message),
@@ -134,9 +133,7 @@ impl HeartbeatTask {
Some(HeartbeatRequest {
mailbox_message,
peer,
info: Self::build_node_info(start_time_ms),
..Default::default()
..heartbeat_request.clone()
})
}
@@ -147,6 +144,7 @@ impl HeartbeatTask {
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
start_time_ms,
cpus: num_cpus::get() as u32,
})
}
@@ -167,11 +165,17 @@ impl HeartbeatTask {
let sleep = tokio::time::sleep(Duration::from_millis(0));
tokio::pin!(sleep);
let heartbeat_request = HeartbeatRequest {
peer: self_peer,
info: Self::build_node_info(start_time_ms),
..Default::default()
};
loop {
let req = tokio::select! {
message = outgoing_rx.recv() => {
if let Some(message) = message {
Self::create_heartbeat_request(Some(message), self_peer.clone(), start_time_ms)
Self::new_heartbeat_request(&heartbeat_request, Some(message))
} else {
// Receiving None means the Sender was dropped; break out of the current loop
break
@@ -179,7 +183,7 @@ impl HeartbeatTask {
}
_ = &mut sleep => {
sleep.as_mut().reset(Instant::now() + Duration::from_millis(report_interval));
Self::create_heartbeat_request(None, self_peer.clone(), start_time_ms)
Self::new_heartbeat_request(&heartbeat_request, None)
}
};

@@ -237,6 +237,13 @@ impl Instance {
let output = match stmt {
Statement::Query(_) | Statement::Explain(_) | Statement::Delete(_) => {
// TODO: remove this when format is supported in datafusion
if let Statement::Explain(explain) = &stmt {
if let Some(format) = explain.format() {
query_ctx.set_explain_format(format.to_string());
}
}
let stmt = QueryStatement::Sql(stmt);
let plan = self
.statement_executor

@@ -25,12 +25,12 @@ use crate::fulltext_index::create::{FulltextIndexCreator, TantivyFulltextIndexCr
use crate::fulltext_index::search::{FulltextIndexSearcher, RowId, TantivyFulltextIndexSearcher};
use crate::fulltext_index::{Analyzer, Config};
async fn new_bounded_stager(prefix: &str) -> (TempDir, Arc<BoundedStager>) {
async fn new_bounded_stager(prefix: &str) -> (TempDir, Arc<BoundedStager<String>>) {
let staging_dir = create_temp_dir(prefix);
let path = staging_dir.path().to_path_buf();
(
staging_dir,
Arc::new(BoundedStager::new(path, 102400, None).await.unwrap()),
Arc::new(BoundedStager::new(path, 102400, None, None).await.unwrap()),
)
}
@@ -68,13 +68,13 @@ async fn test_search(
let file_accessor = Arc::new(MockFileAccessor::new(prefix));
let puffin_manager = FsPuffinManager::new(stager, file_accessor);
let file_name = "fulltext_index";
let blob_key = "fulltext_index";
let mut writer = puffin_manager.writer(file_name).await.unwrap();
create_index(prefix, &mut writer, blob_key, texts, config).await;
let file_name = "fulltext_index".to_string();
let blob_key = "fulltext_index".to_string();
let mut writer = puffin_manager.writer(&file_name).await.unwrap();
create_index(prefix, &mut writer, &blob_key, texts, config).await;
let reader = puffin_manager.reader(file_name).await.unwrap();
let index_dir = reader.dir(blob_key).await.unwrap();
let reader = puffin_manager.reader(&file_name).await.unwrap();
let index_dir = reader.dir(&blob_key).await.unwrap();
let searcher = TantivyFulltextIndexSearcher::new(index_dir.path()).unwrap();
let results = searcher.search(query).await.unwrap();

@@ -55,7 +55,7 @@ pub struct LogQuery {
}
/// Expression to calculate on log after filtering.
#[derive(Debug, Serialize, Deserialize)]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum LogExpr {
NamedIdent(String),
PositionalIdent(usize),
@@ -289,7 +289,7 @@ pub struct ColumnFilters {
pub filters: Vec<ContentFilter>,
}
#[derive(Debug, Serialize, Deserialize)]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum ContentFilter {
// Search-based filters
/// Only match the exact content.
@@ -310,14 +310,19 @@ pub enum ContentFilter {
// Value-based filters
/// Content exists, a.k.a. not null.
Exist,
Between(String, String),
Between {
start: String,
end: String,
start_inclusive: bool,
end_inclusive: bool,
},
// TODO(ruihang): arithmetic operations
// Compound filters
Compound(Vec<ContentFilter>, BinaryOperator),
}
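
The reworked `Between` variant above now carries explicit bound inclusivity. A small standalone illustration (the fields are copied into a plain struct here for clarity, this is not the crate's API) of how those flags translate into a range check:

// Illustrative stand-in for the fields of `ContentFilter::Between`.
struct BetweenBounds {
    start: String,
    end: String,
    start_inclusive: bool,
    end_inclusive: bool,
}

fn in_range(b: &BetweenBounds, value: &str) -> bool {
    let lower_ok = if b.start_inclusive { value >= b.start.as_str() } else { value > b.start.as_str() };
    let upper_ok = if b.end_inclusive { value <= b.end.as_str() } else { value < b.end.as_str() };
    lower_ok && upper_ok
}

// With start = "a" and end = "c", the flags only matter when `value` equals a bound:
// "a" passes only if `start_inclusive`, "c" only if `end_inclusive`, "b" always passes.
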
#[derive(Debug, Serialize, Deserialize)]
#[derive(Clone, Debug, Serialize, Deserialize)]
pub enum BinaryOperator {
And,
Or,

@@ -9,7 +9,7 @@ workspace = true
[dependencies]
api.workspace = true
async-trait = "0.1"
async-trait.workspace = true
common-error.workspace = true
common-grpc.workspace = true
common-macro.workspace = true
@@ -27,7 +27,7 @@ tonic.workspace = true
[dev-dependencies]
datatypes.workspace = true
futures = "0.3"
futures.workspace = true
meta-srv = { workspace = true, features = ["mock"] }
tower.workspace = true
tracing = "0.1"

@@ -198,13 +198,13 @@ impl Inner {
}
);
let leader = self
let leader_addr = self
.ask_leader
.as_ref()
.unwrap()
.get_leader()
.context(error::NoLeaderSnafu)?;
let mut leader = self.make_client(leader)?;
let mut leader = self.make_client(&leader_addr)?;
let (sender, receiver) = mpsc::channel::<HeartbeatRequest>(128);
@@ -236,7 +236,11 @@ impl Inner {
.await
.map_err(error::Error::from)?
.context(error::CreateHeartbeatStreamSnafu)?;
info!("Success to create heartbeat stream to server: {:#?}", res);
info!(
"Success to create heartbeat stream to server: {}, response: {:#?}",
leader_addr, res
);
Ok((
HeartbeatSender::new(self.id, self.role, sender),


@@ -16,7 +16,7 @@ local-ip-address.workspace = true
[dependencies]
api.workspace = true
async-trait = "0.1"
async-trait.workspace = true
bytes.workspace = true
chrono.workspace = true
clap.workspace = true


@@ -44,6 +44,7 @@ use mailbox_handler::MailboxHandler;
use on_leader_start_handler::OnLeaderStartHandler;
use publish_heartbeat_handler::PublishHeartbeatHandler;
use region_lease_handler::RegionLeaseHandler;
use remap_flow_peer_handler::RemapFlowPeerHandler;
use response_header_handler::ResponseHeaderHandler;
use snafu::{OptionExt, ResultExt};
use store_api::storage::RegionId;
@@ -71,6 +72,7 @@ pub mod mailbox_handler;
pub mod on_leader_start_handler;
pub mod publish_heartbeat_handler;
pub mod region_lease_handler;
pub mod remap_flow_peer_handler;
pub mod response_header_handler;
#[async_trait::async_trait]
@@ -573,6 +575,7 @@ impl HeartbeatHandlerGroupBuilder {
self.add_handler_last(publish_heartbeat_handler);
}
self.add_handler_last(CollectStatsHandler::new(self.flush_stats_factor));
self.add_handler_last(RemapFlowPeerHandler::default());
if let Some(flow_state_handler) = self.flow_state_handler.take() {
self.add_handler_last(flow_state_handler);
@@ -853,7 +856,7 @@ mod tests {
.unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -868,6 +871,7 @@ mod tests {
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -888,7 +892,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
assert_eq!(14, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -904,6 +908,7 @@ mod tests {
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -921,7 +926,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
assert_eq!(14, handlers.len());
let names = [
"CollectStatsHandler",
@@ -937,6 +942,7 @@ mod tests {
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -954,7 +960,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
assert_eq!(14, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -970,6 +976,7 @@ mod tests {
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -987,7 +994,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
assert_eq!(14, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -1003,6 +1010,7 @@ mod tests {
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"ResponseHeaderHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -1020,7 +1028,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -1035,6 +1043,7 @@ mod tests {
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -1052,7 +1061,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
@@ -1067,6 +1076,7 @@ mod tests {
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"ResponseHeaderHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
@@ -1084,7 +1094,7 @@ mod tests {
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
assert_eq!(13, handlers.len());
let names = [
"CollectStatsHandler",
@@ -1099,6 +1109,7 @@ mod tests {
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"RemapFlowPeerHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {


@@ -23,8 +23,8 @@ pub struct CheckLeaderHandler;
#[async_trait::async_trait]
impl HeartbeatHandler for CheckLeaderHandler {
fn is_acceptable(&self, role: Role) -> bool {
role == Role::Datanode
fn is_acceptable(&self, _role: Role) -> bool {
true
}
async fn handle(


@@ -13,7 +13,6 @@
// limitations under the License.
use api::v1::meta::{HeartbeatRequest, NodeInfo as PbNodeInfo, Role};
use common_meta::cluster;
use common_meta::cluster::{
DatanodeStatus, FlownodeStatus, FrontendStatus, NodeInfo, NodeInfoKey, NodeStatus,
};
@@ -42,7 +41,7 @@ impl HeartbeatHandler for CollectFrontendClusterInfoHandler {
ctx: &mut Context,
_acc: &mut HeartbeatAccumulator,
) -> Result<HandleControl> {
let Some((key, peer, info)) = extract_base_info(req, Role::Frontend) else {
let Some((key, peer, info)) = extract_base_info(req) else {
return Ok(HandleControl::Continue);
};
@@ -75,7 +74,7 @@ impl HeartbeatHandler for CollectFlownodeClusterInfoHandler {
ctx: &mut Context,
_acc: &mut HeartbeatAccumulator,
) -> Result<HandleControl> {
let Some((key, peer, info)) = extract_base_info(req, Role::Flownode) else {
let Some((key, peer, info)) = extract_base_info(req) else {
return Ok(HandleControl::Continue);
};
@@ -109,7 +108,7 @@ impl HeartbeatHandler for CollectDatanodeClusterInfoHandler {
ctx: &mut Context,
acc: &mut HeartbeatAccumulator,
) -> Result<HandleControl> {
let Some((key, peer, info)) = extract_base_info(req, Role::Datanode) else {
let Some((key, peer, info)) = extract_base_info(req) else {
return Ok(HandleControl::Continue);
};
@@ -144,16 +143,9 @@ impl HeartbeatHandler for CollectDatanodeClusterInfoHandler {
}
}
fn extract_base_info(
req: &HeartbeatRequest,
role: Role,
) -> Option<(NodeInfoKey, Peer, PbNodeInfo)> {
let HeartbeatRequest {
header, peer, info, ..
} = req;
let Some(header) = &header else {
return None;
};
fn extract_base_info(request: &HeartbeatRequest) -> Option<(NodeInfoKey, Peer, PbNodeInfo)> {
let HeartbeatRequest { peer, info, .. } = request;
let key = NodeInfoKey::new(request)?;
let Some(peer) = &peer else {
return None;
};
@@ -161,23 +153,11 @@ fn extract_base_info(
return None;
};
Some((
NodeInfoKey {
cluster_id: header.cluster_id,
role: match role {
Role::Datanode => cluster::Role::Datanode,
Role::Frontend => cluster::Role::Frontend,
Role::Flownode => cluster::Role::Flownode,
},
node_id: peer.id,
},
Peer::from(peer.clone()),
info.clone(),
))
Some((key, Peer::from(peer.clone()), info.clone()))
}
async fn put_into_memory_store(ctx: &mut Context, key: NodeInfoKey, value: NodeInfo) -> Result<()> {
let key = key.into();
let key = (&key).into();
let value = value.try_into().context(InvalidClusterInfoFormatSnafu)?;
let put_req = PutRequest {
key,


@@ -21,7 +21,7 @@ use common_meta::key::node_address::{NodeAddressKey, NodeAddressValue};
use common_meta::key::{MetadataKey, MetadataValue};
use common_meta::peer::Peer;
use common_meta::rpc::store::PutRequest;
use common_telemetry::{error, warn};
use common_telemetry::{error, info, warn};
use dashmap::DashMap;
use snafu::ResultExt;
@@ -185,6 +185,10 @@ async fn rewrite_node_address(ctx: &mut Context, stat: &Stat) {
match ctx.leader_cached_kv_backend.put(put).await {
Ok(_) => {
info!(
"Successfully updated datanode `NodeAddressValue`: {:?}",
peer
);
// broadcast invalidating cache
let cache_idents = stat
.table_ids()
@@ -200,11 +204,14 @@ async fn rewrite_node_address(ctx: &mut Context, stat: &Stat) {
}
}
Err(e) => {
error!(e; "Failed to update NodeAddressValue: {:?}", peer);
error!(e; "Failed to update datanode `NodeAddressValue`: {:?}", peer);
}
}
} else {
warn!("Failed to serialize NodeAddressValue: {:?}", peer);
warn!(
"Failed to serialize datanode `NodeAddressValue`: {:?}",
peer
);
}
}


@@ -0,0 +1,92 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use api::v1::meta::{HeartbeatRequest, Peer, Role};
use common_meta::key::node_address::{NodeAddressKey, NodeAddressValue};
use common_meta::key::{MetadataKey, MetadataValue};
use common_meta::rpc::store::PutRequest;
use common_telemetry::{error, info, warn};
use dashmap::DashMap;
use crate::handler::{HandleControl, HeartbeatAccumulator, HeartbeatHandler};
use crate::metasrv::Context;
use crate::Result;
#[derive(Debug, Default)]
pub struct RemapFlowPeerHandler {
/// flow_node_id -> epoch
epoch_cache: DashMap<u64, u64>,
}
#[async_trait::async_trait]
impl HeartbeatHandler for RemapFlowPeerHandler {
fn is_acceptable(&self, role: Role) -> bool {
role == Role::Flownode
}
async fn handle(
&self,
req: &HeartbeatRequest,
ctx: &mut Context,
_acc: &mut HeartbeatAccumulator,
) -> Result<HandleControl> {
let Some(peer) = req.peer.as_ref() else {
return Ok(HandleControl::Continue);
};
let current_epoch = req.node_epoch;
let flow_node_id = peer.id;
let refresh = if let Some(mut epoch) = self.epoch_cache.get_mut(&flow_node_id) {
if current_epoch > *epoch.value() {
*epoch.value_mut() = current_epoch;
true
} else {
false
}
} else {
self.epoch_cache.insert(flow_node_id, current_epoch);
true
};
if refresh {
rewrite_node_address(ctx, peer).await;
}
Ok(HandleControl::Continue)
}
}
async fn rewrite_node_address(ctx: &mut Context, peer: &Peer) {
let key = NodeAddressKey::with_flownode(peer.id).to_bytes();
if let Ok(value) = NodeAddressValue::new(peer.clone().into()).try_as_raw_value() {
let put = PutRequest {
key,
value,
prev_kv: false,
};
match ctx.leader_cached_kv_backend.put(put).await {
Ok(_) => {
info!("Successfully updated flow `NodeAddressValue`: {:?}", peer);
// TODO(discord): broadcast invalidating cache to all frontends
}
Err(e) => {
error!(e; "Failed to update flow `NodeAddressValue`: {:?}", peer);
}
}
} else {
warn!("Failed to serialize flow `NodeAddressValue`: {:?}", peer);
}
}

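The handler above only rewrites a flownode's address when it sees a strictly newer `node_epoch` for that peer. A std-only sketch of the same "refresh only on a newer epoch" decision, using a plain `HashMap` in place of `DashMap`; `EpochCache` and its method names are illustrative:

use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Tracks the latest epoch observed per flownode and reports whether the
/// address mapping should be refreshed.
struct EpochCache {
    /// flow_node_id -> last observed epoch
    epochs: HashMap<u64, u64>,
}

impl EpochCache {
    fn new() -> Self {
        Self { epochs: HashMap::new() }
    }

    /// Returns true when `current_epoch` is the first, or a strictly newer,
    /// epoch seen for `node_id`, i.e. the node address should be rewritten.
    fn should_refresh(&mut self, node_id: u64, current_epoch: u64) -> bool {
        match self.epochs.entry(node_id) {
            Entry::Occupied(mut entry) => {
                if current_epoch > *entry.get() {
                    entry.insert(current_epoch);
                    true
                } else {
                    false
                }
            }
            Entry::Vacant(entry) => {
                entry.insert(current_epoch);
                true
            }
        }
    }
}

fn main() {
    let mut cache = EpochCache::new();
    assert!(cache.should_refresh(1, 10)); // first heartbeat: refresh
    assert!(!cache.should_refresh(1, 10)); // same epoch: skip
    assert!(cache.should_refresh(1, 11)); // node restarted with a newer epoch: refresh
}

The real handler performs the same comparison through `DashMap::get_mut`, so concurrent heartbeats do not need an external lock around the cache.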

@@ -32,6 +32,7 @@ use common_meta::kv_backend::{KvBackendRef, ResettableKvBackend, ResettableKvBac
use common_meta::leadership_notifier::{
LeadershipChangeNotifier, LeadershipChangeNotifierCustomizerRef,
};
use common_meta::node_expiry_listener::NodeExpiryListener;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeperRef;
use common_meta::wal_options_allocator::WalOptionsAllocatorRef;
@@ -151,6 +152,8 @@ pub struct MetasrvOptions {
#[cfg(feature = "pg_kvbackend")]
/// Lock id for meta kv election. Only takes effect when using pg_kvbackend.
pub meta_election_lock_id: u64,
#[serde(with = "humantime_serde")]
pub node_max_idle_time: Duration,
}
const DEFAULT_METASRV_ADDR_PORT: &str = "3002";
@@ -192,6 +195,7 @@ impl Default for MetasrvOptions {
meta_table_name: DEFAULT_META_TABLE_NAME.to_string(),
#[cfg(feature = "pg_kvbackend")]
meta_election_lock_id: DEFAULT_META_ELECTION_LOCK_ID,
node_max_idle_time: Duration::from_secs(24 * 60 * 60),
}
}
}
@@ -442,6 +446,10 @@ impl Metasrv {
leadership_change_notifier.add_listener(self.wal_options_allocator.clone());
leadership_change_notifier
.add_listener(Arc::new(ProcedureManagerListenerAdapter(procedure_manager)));
leadership_change_notifier.add_listener(Arc::new(NodeExpiryListener::new(
self.options.node_max_idle_time,
self.in_memory.clone(),
)));
if let Some(region_supervisor_ticker) = &self.region_supervisor_ticker {
leadership_change_notifier.add_listener(region_supervisor_ticker.clone() as _);
}

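`node_max_idle_time` is (de)serialized with `humantime_serde`, so operators can write human-readable durations such as "24h" in the TOML config. A small sketch of that round trip, assuming the `serde` (with derive), `toml`, and `humantime_serde` crates as dependencies; the struct here is a cut-down stand-in, not the real `MetasrvOptions`:

use std::time::Duration;

use serde::Deserialize;

/// Cut-down options struct; only the field relevant to node expiry.
#[derive(Debug, Deserialize)]
struct Options {
    #[serde(with = "humantime_serde")]
    node_max_idle_time: Duration,
}

fn main() {
    let options: Options = toml::from_str(r#"node_max_idle_time = "24h""#).unwrap();
    assert_eq!(options.node_max_idle_time, Duration::from_secs(24 * 60 * 60));
}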

@@ -68,13 +68,15 @@ impl heartbeat_server::Heartbeat for Metasrv {
};
if pusher_id.is_none() {
pusher_id = register_pusher(&handler_group, header, tx.clone()).await;
pusher_id =
Some(register_pusher(&handler_group, header, tx.clone()).await);
}
if let Some(k) = &pusher_id {
METRIC_META_HEARTBEAT_RECV.with_label_values(&[&k.to_string()]);
} else {
METRIC_META_HEARTBEAT_RECV.with_label_values(&["none"]);
}
let res = handler_group
.handle(req, ctx.clone())
.await
@@ -173,13 +175,13 @@ async fn register_pusher(
handler_group: &HeartbeatHandlerGroup,
header: &RequestHeader,
sender: Sender<std::result::Result<HeartbeatResponse, tonic::Status>>,
) -> Option<PusherId> {
) -> PusherId {
let role = header.role();
let id = get_node_id(header);
let pusher_id = PusherId::new(role, id);
let pusher = Pusher::new(sender, header);
handler_group.register_pusher(pusher_id, pusher).await;
Some(pusher_id)
pusher_id
}
#[cfg(test)]


@@ -17,13 +17,15 @@ use std::time::Duration;
use api::v1::meta::{
procedure_service_server, DdlTaskRequest as PbDdlTaskRequest,
DdlTaskResponse as PbDdlTaskResponse, MigrateRegionRequest, MigrateRegionResponse,
DdlTaskResponse as PbDdlTaskResponse, Error, MigrateRegionRequest, MigrateRegionResponse,
ProcedureDetailRequest, ProcedureDetailResponse, ProcedureStateResponse, QueryProcedureRequest,
ResponseHeader,
};
use common_meta::ddl::ExecutorContext;
use common_meta::rpc::ddl::{DdlTask, SubmitDdlTaskRequest};
use common_meta::rpc::procedure;
use snafu::{ensure, OptionExt, ResultExt};
use common_telemetry::warn;
use snafu::{OptionExt, ResultExt};
use tonic::{Request, Response};
use super::GrpcResult;
@@ -37,6 +39,16 @@ impl procedure_service_server::ProcedureService for Metasrv {
&self,
request: Request<QueryProcedureRequest>,
) -> GrpcResult<ProcedureStateResponse> {
if !self.is_leader() {
let resp = ProcedureStateResponse {
header: Some(ResponseHeader::failed(0, Error::is_not_leader())),
..Default::default()
};
warn!("The current meta is not the leader, but a `query procedure state` request has reached it. Detail: {:?}.", request);
return Ok(Response::new(resp));
}
let QueryProcedureRequest { header, pid, .. } = request.into_inner();
let _header = header.context(error::MissingRequestHeaderSnafu)?;
let pid = pid.context(error::MissingRequiredParameterSnafu { param: "pid" })?;
@@ -57,6 +69,16 @@ impl procedure_service_server::ProcedureService for Metasrv {
}
async fn ddl(&self, request: Request<PbDdlTaskRequest>) -> GrpcResult<PbDdlTaskResponse> {
if !self.is_leader() {
let resp = PbDdlTaskResponse {
header: Some(ResponseHeader::failed(0, Error::is_not_leader())),
..Default::default()
};
warn!("The current meta is not the leader, but a `ddl` request has reached it. Detail: {:?}.", request);
return Ok(Response::new(resp));
}
let PbDdlTaskRequest {
header,
query_context,
@@ -99,12 +121,15 @@ impl procedure_service_server::ProcedureService for Metasrv {
&self,
request: Request<MigrateRegionRequest>,
) -> GrpcResult<MigrateRegionResponse> {
ensure!(
self.meta_peer_client().is_leader(),
error::UnexpectedSnafu {
violated: "Trying to submit a region migration procedure to non-leader meta server"
}
);
if !self.is_leader() {
let resp = MigrateRegionResponse {
header: Some(ResponseHeader::failed(0, Error::is_not_leader())),
..Default::default()
};
warn!("The current meta is not the leader, but a `migrate` request has reached it. Detail: {:?}.", request);
return Ok(Response::new(resp));
}
let MigrateRegionRequest {
header,
@@ -150,6 +175,16 @@ impl procedure_service_server::ProcedureService for Metasrv {
&self,
request: Request<ProcedureDetailRequest>,
) -> GrpcResult<ProcedureDetailResponse> {
if !self.is_leader() {
let resp = ProcedureDetailResponse {
header: Some(ResponseHeader::failed(0, Error::is_not_leader())),
..Default::default()
};
warn!("The current meta is not the leader, but a `procedure details` request has reached it. Detail: {:?}.", request);
return Ok(Response::new(resp));
}
let ProcedureDetailRequest { header } = request.into_inner();
let _header = header.context(error::MissingRequestHeaderSnafu)?;
let metas = self

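Each procedure RPC now short-circuits with a failed response header instead of erroring out when the receiving metasrv is not the leader. A generic sketch of that guard pattern; the types below are placeholders for the protobuf messages (`ResponseHeader::failed`, `Error::is_not_leader`) used by the real handlers, and everything here is illustrative:

/// Placeholder response types standing in for the protobuf messages.
#[derive(Debug, Default)]
struct ResponseHeader {
    is_not_leader: bool,
}

#[derive(Debug, Default)]
struct ProcedureStateResponse {
    header: Option<ResponseHeader>,
}

fn query_procedure_state(is_leader: bool) -> ProcedureStateResponse {
    // Guard: a follower answers immediately with a "not leader" header
    // so the client can retry against the current leader.
    if !is_leader {
        return ProcedureStateResponse {
            header: Some(ResponseHeader { is_not_leader: true }),
        };
    }
    // ... leader-only handling would go here ...
    ProcedureStateResponse::default()
}

fn main() {
    let resp = query_procedure_state(false);
    assert!(resp.header.unwrap().is_not_leader);
}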

@@ -142,6 +142,7 @@ impl DataRegion {
c.column_id = new_column_id_start + delta as u32;
c.column_schema.set_nullable();
match index_options {
IndexOptions::None => {}
IndexOptions::Inverted => {
c.column_schema.set_inverted_index(true);
}


@@ -21,7 +21,7 @@ use api::v1::SemanticType;
use common_telemetry::info;
use common_time::{Timestamp, FOREVER};
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use datatypes::schema::{ColumnSchema, SkippingIndexOptions};
use datatypes::value::Value;
use mito2::engine::MITO_ENGINE_NAME;
use object_store::util::join_dir;
@@ -55,6 +55,8 @@ use crate::error::{
use crate::metrics::PHYSICAL_REGION_COUNT;
use crate::utils::{self, to_data_region_id, to_metadata_region_id};
const DEFAULT_TABLE_ID_SKIPPING_INDEX_GRANULARITY: u32 = 1024;
impl MetricEngineInner {
pub async fn create_regions(
&self,
@@ -440,6 +442,7 @@ impl MetricEngineInner {
///
/// Return `[table_id_col, tsid_col]`
fn internal_column_metadata() -> [ColumnMetadata; 2] {
// Safety: BloomFilter is a valid skipping index type
let metric_name_col = ColumnMetadata {
column_id: ReservedColumnId::table_id(),
semantic_type: SemanticType::Tag,
@@ -448,7 +451,11 @@ impl MetricEngineInner {
ConcreteDataType::uint32_datatype(),
false,
)
.with_inverted_index(true),
.with_skipping_options(SkippingIndexOptions {
granularity: DEFAULT_TABLE_ID_SKIPPING_INDEX_GRANULARITY,
index_type: datatypes::schema::SkippingIndexType::BloomFilter,
})
.unwrap(),
};
let tsid_col = ColumnMetadata {
column_id: ReservedColumnId::tsid(),


@@ -30,9 +30,10 @@ impl MetricEngineInner {
pub async fn drop_region(
&self,
region_id: RegionId,
_req: RegionDropRequest,
req: RegionDropRequest,
) -> Result<AffectedRows> {
let data_region_id = utils::to_data_region_id(region_id);
let fast_path = req.fast_path;
// enclose the guard in a block to prevent the guard from polluting the async context
let (is_physical_region, is_physical_region_busy) = {
@@ -52,7 +53,7 @@ impl MetricEngineInner {
if is_physical_region {
// check if there is no logical region related to this physical region
if is_physical_region_busy {
if is_physical_region_busy && !fast_path {
// reject if there is any present logical region
return Err(PhysicalRegionBusySnafu {
region_id: data_region_id,
@@ -60,9 +61,21 @@ impl MetricEngineInner {
.build());
}
self.drop_physical_region(data_region_id).await
return self.drop_physical_region(data_region_id).await;
}
if fast_path {
// for fast path, we don't delete the metadata in the metadata region.
// it only removes the logical region from the engine state.
//
// The drop database procedure will ensure the metadata region and data region are dropped eventually.
self.state
.write()
.unwrap()
.remove_logical_region(region_id)?;
Ok(0)
} else {
// cannot merge these two `if` otherwise the stupid type checker will complain
let metadata_region_id = self
.state
.read()
@@ -87,13 +100,16 @@ impl MetricEngineInner {
// Since the physical regions are going to be dropped, we don't need to
// update the contents in metadata region.
self.mito
.handle_request(data_region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
data_region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.with_context(|_| CloseMitoRegionSnafu { region_id })?;
self.mito
.handle_request(
metadata_region_id,
RegionRequest::Drop(RegionDropRequest {}),
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.with_context(|_| CloseMitoRegionSnafu { region_id })?;

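The drop path now distinguishes a fast path, which only detaches the logical region and lets the drop-database procedure clean up the physical regions later, from the regular path that rejects a busy physical region. A compact sketch of that decision; the `DropAction` enum is an illustrative stand-in for the engine's real control flow:

#[derive(Debug, PartialEq)]
enum DropAction {
    /// Reject: the physical region still hosts logical regions.
    RejectBusy,
    /// Drop the physical region (data and metadata regions).
    DropPhysical,
    /// Fast path: only detach the logical region from engine state.
    DetachLogical,
    /// Regular logical drop: also clean up metadata-region entries.
    DropLogical,
}

fn decide(is_physical: bool, is_busy: bool, fast_path: bool) -> DropAction {
    if is_physical {
        if is_busy && !fast_path {
            DropAction::RejectBusy
        } else {
            DropAction::DropPhysical
        }
    } else if fast_path {
        DropAction::DetachLogical
    } else {
        DropAction::DropLogical
    }
}

fn main() {
    assert_eq!(decide(true, true, false), DropAction::RejectBusy);
    assert_eq!(decide(true, true, true), DropAction::DropPhysical);
    assert_eq!(decide(false, false, true), DropAction::DetachLogical);
}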

@@ -40,6 +40,7 @@ pub struct PhysicalRegionOptions {
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub enum IndexOptions {
#[default]
None,
Inverted,
Skipping {
granularity: u32,


@@ -16,7 +16,7 @@ api.workspace = true
aquamarine.workspace = true
async-channel = "1.9"
async-stream.workspace = true
async-trait = "0.1"
async-trait.workspace = true
bytemuck.workspace = true
bytes.workspace = true
common-base.workspace = true


@@ -146,11 +146,14 @@ impl AccessLayer {
} else {
// Write cache is disabled.
let store = self.object_store.clone();
let path_provider = RegionFilePathFactory::new(self.region_dir.clone());
let indexer_builder = IndexerBuilderImpl {
op_type: request.op_type,
metadata: request.metadata.clone(),
row_group_size: write_opts.row_group_size,
puffin_manager: self.puffin_manager_factory.build(store),
puffin_manager: self
.puffin_manager_factory
.build(store, path_provider.clone()),
intermediate_manager: self.intermediate_manager.clone(),
index_options: request.index_options,
inverted_index_config: request.inverted_index_config,
@@ -161,9 +164,7 @@ impl AccessLayer {
self.object_store.clone(),
request.metadata,
indexer_builder,
RegionFilePathFactory {
region_dir: self.region_dir.clone(),
},
path_provider,
)
.await;
writer
@@ -248,8 +249,18 @@ pub trait FilePathProvider: Send + Sync {
/// Path provider that builds paths in local write cache.
#[derive(Clone)]
pub(crate) struct WriteCachePathProvider {
pub(crate) region_id: RegionId,
pub(crate) file_cache: FileCacheRef,
region_id: RegionId,
file_cache: FileCacheRef,
}
impl WriteCachePathProvider {
/// Creates a new `WriteCachePathProvider` instance.
pub fn new(region_id: RegionId, file_cache: FileCacheRef) -> Self {
Self {
region_id,
file_cache,
}
}
}
impl FilePathProvider for WriteCachePathProvider {
@@ -267,7 +278,14 @@ impl FilePathProvider for WriteCachePathProvider {
/// Path provider that builds paths in region storage path.
#[derive(Clone, Debug)]
pub(crate) struct RegionFilePathFactory {
pub(crate) region_dir: String,
region_dir: String,
}
impl RegionFilePathFactory {
/// Creates a new `RegionFilePathFactory` instance.
pub fn new(region_dir: String) -> Self {
Self { region_dir }
}
}
impl FilePathProvider for RegionFilePathFactory {


@@ -187,9 +187,12 @@ impl FileCache {
}
/// Removes a file from the cache explicitly.
/// It always tries to remove the file from the local store because we may not have the file
/// in the memory index if the upload failed.
pub(crate) async fn remove(&self, key: IndexKey) {
let file_path = self.cache_file_path(key);
self.memory_index.remove(&key).await;
// Always delete the file from the local store.
if let Err(e) = self.local_store.delete(&file_path).await {
warn!(e; "Failed to delete a cached file {}", file_path);
}


@@ -22,6 +22,7 @@ use common_telemetry::{debug, info};
use futures::AsyncWriteExt;
use object_store::ObjectStore;
use snafu::ResultExt;
use store_api::storage::RegionId;
use crate::access_layer::{
new_fs_cache_store, FilePathProvider, RegionFilePathFactory, SstInfoArray, SstWriteRequest,
@@ -114,15 +115,14 @@ impl WriteCache {
let region_id = write_request.metadata.region_id;
let store = self.file_cache.local_store();
let path_provider = WriteCachePathProvider {
file_cache: self.file_cache.clone(),
region_id,
};
let path_provider = WriteCachePathProvider::new(region_id, self.file_cache.clone());
let indexer = IndexerBuilderImpl {
op_type: write_request.op_type,
metadata: write_request.metadata.clone(),
row_group_size: write_opts.row_group_size,
puffin_manager: self.puffin_manager_factory.build(store),
puffin_manager: self
.puffin_manager_factory
.build(store, path_provider.clone()),
intermediate_manager: self.intermediate_manager.clone(),
index_options: write_request.index_options,
inverted_index_config: write_request.inverted_index_config,
@@ -150,24 +150,41 @@ impl WriteCache {
return Ok(sst_info);
}
let mut upload_tracker = UploadTracker::new(region_id);
let mut err = None;
let remote_store = &upload_request.remote_store;
for sst in &sst_info {
let parquet_key = IndexKey::new(region_id, sst.file_id, FileType::Parquet);
let parquet_path = upload_request
.dest_path_provider
.build_sst_file_path(sst.file_id);
self.upload(parquet_key, &parquet_path, remote_store)
.await?;
if let Err(e) = self.upload(parquet_key, &parquet_path, remote_store).await {
err = Some(e);
break;
}
upload_tracker.push_uploaded_file(parquet_path);
if sst.index_metadata.file_size > 0 {
let puffin_key = IndexKey::new(region_id, sst.file_id, FileType::Puffin);
let puffin_path = &upload_request
let puffin_path = upload_request
.dest_path_provider
.build_index_file_path(sst.file_id);
self.upload(puffin_key, puffin_path, remote_store).await?;
if let Err(e) = self.upload(puffin_key, &puffin_path, remote_store).await {
err = Some(e);
break;
}
upload_tracker.push_uploaded_file(puffin_path);
}
}
if let Some(err) = err {
// Cleans files on failure.
upload_tracker
.clean(&sst_info, &self.file_cache, remote_store)
.await;
return Err(err);
}
Ok(sst_info)
}
@@ -333,6 +350,61 @@ pub struct SstUploadRequest {
pub remote_store: ObjectStore,
}
/// A struct to track uploaded files and clean them up if the upload failed.
struct UploadTracker {
/// Id of the region to track.
region_id: RegionId,
/// Paths of files uploaded successfully.
files_uploaded: Vec<String>,
}
impl UploadTracker {
/// Creates a new instance of `UploadTracker` for a given region.
fn new(region_id: RegionId) -> Self {
Self {
region_id,
files_uploaded: Vec::new(),
}
}
/// Add a file path to the list of uploaded files.
fn push_uploaded_file(&mut self, path: String) {
self.files_uploaded.push(path);
}
/// Cleans uploaded files and files in the file cache on a best-effort basis.
async fn clean(
&self,
sst_info: &SstInfoArray,
file_cache: &FileCacheRef,
remote_store: &ObjectStore,
) {
common_telemetry::info!(
"Start cleaning files on upload failure, region: {}, num_ssts: {}",
self.region_id,
sst_info.len()
);
// Cleans files in the file cache first.
for sst in sst_info {
let parquet_key = IndexKey::new(self.region_id, sst.file_id, FileType::Parquet);
file_cache.remove(parquet_key).await;
if sst.index_metadata.file_size > 0 {
let puffin_key = IndexKey::new(self.region_id, sst.file_id, FileType::Puffin);
file_cache.remove(puffin_key).await;
}
}
// Cleans uploaded files.
for file_path in &self.files_uploaded {
if let Err(e) = remote_store.delete(file_path).await {
common_telemetry::error!(e; "Failed to delete file {}", file_path);
}
}
}
}
#[cfg(test)]
mod tests {
use common_test_util::temp_dir::create_temp_dir;
@@ -355,9 +427,7 @@ mod tests {
// and now just use local file system to mock.
let mut env = TestEnv::new();
let mock_store = env.init_object_store_manager();
let path_provider = RegionFilePathFactory {
region_dir: "test".to_string(),
};
let path_provider = RegionFilePathFactory::new("test".to_string());
let local_dir = create_temp_dir("");
let local_store = new_fs_store(local_dir.path().to_str().unwrap());
@@ -488,9 +558,7 @@ mod tests {
..Default::default()
};
let upload_request = SstUploadRequest {
dest_path_provider: RegionFilePathFactory {
region_dir: data_home.clone(),
},
dest_path_provider: RegionFilePathFactory::new(data_home.clone()),
remote_store: mock_store.clone(),
};

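On an upload failure the loop above now stops, and `UploadTracker` deletes whatever was already uploaded plus the corresponding file-cache entries. A self-contained sketch of the same track-then-clean pattern against an in-memory "store"; `MockStore` and the simplified tracker are stand-ins, not the mito2 API:

use std::collections::HashSet;

/// In-memory stand-in for a remote object store.
struct MockStore {
    objects: HashSet<String>,
}

impl MockStore {
    fn put(&mut self, path: &str, fail: bool) -> Result<(), String> {
        if fail {
            return Err(format!("failed to upload {path}"));
        }
        self.objects.insert(path.to_string());
        Ok(())
    }

    fn delete(&mut self, path: &str) {
        self.objects.remove(path);
    }
}

/// Remembers successfully uploaded paths so they can be removed on failure.
struct UploadTracker {
    files_uploaded: Vec<String>,
}

impl UploadTracker {
    fn new() -> Self {
        Self { files_uploaded: Vec::new() }
    }

    fn push_uploaded_file(&mut self, path: String) {
        self.files_uploaded.push(path);
    }

    /// Best-effort cleanup of everything uploaded before the failure.
    fn clean(&self, store: &mut MockStore) {
        for path in &self.files_uploaded {
            store.delete(path);
        }
    }
}

fn main() {
    let mut store = MockStore { objects: HashSet::new() };
    let mut tracker = UploadTracker::new();

    // Upload a batch; the third file fails.
    let uploads = [("a.parquet", false), ("a.puffin", false), ("b.parquet", true)];
    let mut err = None;
    for (path, fail) in uploads {
        match store.put(path, fail) {
            Ok(()) => tracker.push_uploaded_file(path.to_string()),
            Err(e) => {
                err = Some(e);
                break;
            }
        }
    }

    if err.is_some() {
        tracker.clean(&mut store);
    }
    assert!(store.objects.is_empty()); // partially uploaded files are gone
}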

@@ -135,6 +135,7 @@ pub async fn open_compaction_region(
&mito_config.index.aux_path,
mito_config.index.staging_size.as_bytes(),
Some(mito_config.index.write_buffer_size.as_bytes() as _),
mito_config.index.staging_ttl,
)
.await?;
let intermediate_manager =


@@ -299,6 +299,11 @@ pub struct IndexConfig {
/// The max capacity of the staging directory.
pub staging_size: ReadableSize,
/// The TTL of the staging directory.
/// Defaults to 7 days.
/// Set it to "0s" to disable the TTL.
#[serde(with = "humantime_serde")]
pub staging_ttl: Option<Duration>,
/// Write buffer size for creating the index.
pub write_buffer_size: ReadableSize,
@@ -316,6 +321,7 @@ impl Default for IndexConfig {
Self {
aux_path: String::new(),
staging_size: ReadableSize::gb(2),
staging_ttl: Some(Duration::from_secs(7 * 24 * 60 * 60)),
write_buffer_size: ReadableSize::mb(8),
metadata_cache_size: ReadableSize::mb(64),
content_cache_size: ReadableSize::mb(128),
@@ -352,6 +358,10 @@ impl IndexConfig {
);
}
if self.staging_ttl.map(|ttl| ttl.is_zero()).unwrap_or(false) {
self.staging_ttl = None;
}
Ok(())
}
}

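The config validation shown above maps a configured "0s" TTL to `None`, so the rest of the code only has to check for `Some(ttl)`. A tiny sketch of that normalization; the free function is illustrative, the real logic lives in `IndexConfig`:

use std::time::Duration;

/// Treats a zero TTL as "TTL disabled" by normalizing it to `None`.
fn normalize_staging_ttl(ttl: Option<Duration>) -> Option<Duration> {
    ttl.filter(|t| !t.is_zero())
}

fn main() {
    assert_eq!(normalize_staging_ttl(Some(Duration::ZERO)), None);
    assert_eq!(
        normalize_staging_ttl(Some(Duration::from_secs(7 * 24 * 60 * 60))),
        Some(Duration::from_secs(604_800))
    );
    assert_eq!(normalize_staging_ttl(None), None);
}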

@@ -56,7 +56,10 @@ async fn test_engine_drop_region() {
// It's okay to drop a region doesn't exist.
engine
.handle_request(region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.unwrap_err();
@@ -86,7 +89,10 @@ async fn test_engine_drop_region() {
// drop the created region.
engine
.handle_request(region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.unwrap();
assert!(!engine.is_region_exists(region_id));
@@ -192,7 +198,10 @@ async fn test_engine_drop_region_for_custom_store() {
// Drop the custom region.
engine
.handle_request(custom_region_id, RegionRequest::Drop(RegionDropRequest {}))
.handle_request(
custom_region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
)
.await
.unwrap();
assert!(!engine.is_region_exists(custom_region_id));


@@ -823,6 +823,13 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to purge puffin stager"))]
PuffinPurgeStager {
source: puffin::error::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to build puffin reader"))]
PuffinBuildReader {
source: puffin::error::Error,
@@ -1062,7 +1069,8 @@ impl ErrorExt for Error {
PuffinReadBlob { source, .. }
| PuffinAddBlob { source, .. }
| PuffinInitStager { source, .. }
| PuffinBuildReader { source, .. } => source.status_code(),
| PuffinBuildReader { source, .. }
| PuffinPurgeStager { source, .. } => source.status_code(),
CleanDir { .. } => StatusCode::Unexpected,
InvalidConfig { .. } => StatusCode::InvalidArguments,
StaleLogEntry { .. } => StatusCode::Unexpected,


@@ -12,8 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::time::Duration;
use lazy_static::lazy_static;
use prometheus::*;
use puffin::puffin_manager::stager::StagerNotifier;
/// Stage label.
pub const STAGE_LABEL: &str = "stage";
@@ -28,6 +31,10 @@ pub const FILE_TYPE_LABEL: &str = "file_type";
pub const WORKER_LABEL: &str = "worker";
/// Partition label.
pub const PARTITION_LABEL: &str = "partition";
/// Staging dir type label.
pub const STAGING_TYPE: &str = "index_staging";
/// Recycle bin type label.
pub const RECYCLE_TYPE: &str = "recycle_bin";
lazy_static! {
/// Global write buffer size in bytes.
@@ -381,3 +388,68 @@ lazy_static! {
exponential_buckets(0.01, 10.0, 6).unwrap(),
).unwrap();
}
/// Stager notifier to collect metrics.
pub struct StagerMetrics {
cache_hit: IntCounter,
cache_miss: IntCounter,
staging_cache_bytes: IntGauge,
recycle_cache_bytes: IntGauge,
cache_eviction: IntCounter,
staging_miss_read: Histogram,
}
impl StagerMetrics {
/// Creates a new stager notifier.
pub fn new() -> Self {
Self {
cache_hit: CACHE_HIT.with_label_values(&[STAGING_TYPE]),
cache_miss: CACHE_MISS.with_label_values(&[STAGING_TYPE]),
staging_cache_bytes: CACHE_BYTES.with_label_values(&[STAGING_TYPE]),
recycle_cache_bytes: CACHE_BYTES.with_label_values(&[RECYCLE_TYPE]),
cache_eviction: CACHE_EVICTION.with_label_values(&[STAGING_TYPE, "size"]),
staging_miss_read: READ_STAGE_ELAPSED.with_label_values(&["staging_miss_read"]),
}
}
}
impl Default for StagerMetrics {
fn default() -> Self {
Self::new()
}
}
impl StagerNotifier for StagerMetrics {
fn on_cache_hit(&self, _size: u64) {
self.cache_hit.inc();
}
fn on_cache_miss(&self, _size: u64) {
self.cache_miss.inc();
}
fn on_cache_insert(&self, size: u64) {
self.staging_cache_bytes.add(size as i64);
}
fn on_load_dir(&self, duration: Duration) {
self.staging_miss_read.observe(duration.as_secs_f64());
}
fn on_load_blob(&self, duration: Duration) {
self.staging_miss_read.observe(duration.as_secs_f64());
}
fn on_cache_evict(&self, size: u64) {
self.cache_eviction.inc();
self.staging_cache_bytes.sub(size as i64);
}
fn on_recycle_insert(&self, size: u64) {
self.recycle_cache_bytes.add(size as i64);
}
fn on_recycle_clear(&self, size: u64) {
self.recycle_cache_bytes.sub(size as i64);
}
}

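`StagerMetrics` is a thin adapter from stager callbacks to Prometheus metrics. A std-only sketch of the same idea using atomics instead of the `prometheus` crate; the `Notifier` trait and its names are simplified stand-ins for `StagerNotifier`:

use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};

/// Simplified stand-in for the stager's notification hooks.
trait Notifier {
    fn on_cache_hit(&self, size: u64);
    fn on_cache_miss(&self, size: u64);
    fn on_cache_insert(&self, size: u64);
    fn on_cache_evict(&self, size: u64);
}

/// Collects cache counters; a real implementation would update
/// Prometheus counters and gauges instead.
#[derive(Default)]
struct CacheStats {
    hits: AtomicU64,
    misses: AtomicU64,
    cached_bytes: AtomicI64,
}

impl Notifier for CacheStats {
    fn on_cache_hit(&self, _size: u64) {
        self.hits.fetch_add(1, Ordering::Relaxed);
    }
    fn on_cache_miss(&self, _size: u64) {
        self.misses.fetch_add(1, Ordering::Relaxed);
    }
    fn on_cache_insert(&self, size: u64) {
        self.cached_bytes.fetch_add(size as i64, Ordering::Relaxed);
    }
    fn on_cache_evict(&self, size: u64) {
        self.cached_bytes.fetch_sub(size as i64, Ordering::Relaxed);
    }
}

fn main() {
    let stats = CacheStats::default();
    stats.on_cache_insert(1024);
    stats.on_cache_hit(1024);
    stats.on_cache_evict(1024);
    assert_eq!(stats.cached_bytes.load(Ordering::Relaxed), 0);
    assert_eq!(stats.hits.load(Ordering::Relaxed), 1);
}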

@@ -32,7 +32,6 @@ use tokio::sync::{mpsc, Semaphore};
use tokio_stream::wrappers::ReceiverStream;
use crate::access_layer::AccessLayerRef;
use crate::cache::file_cache::FileCacheRef;
use crate::cache::CacheStrategy;
use crate::config::DEFAULT_SCAN_CHANNEL_SIZE;
use crate::error::Result;
@@ -427,12 +426,7 @@ impl ScanRegion {
return None;
}
let file_cache = || -> Option<FileCacheRef> {
let write_cache = self.cache_strategy.write_cache()?;
let file_cache = write_cache.file_cache();
Some(file_cache)
}();
let file_cache = self.cache_strategy.write_cache().map(|w| w.file_cache());
let inverted_index_cache = self.cache_strategy.inverted_index_cache().cloned();
let puffin_metadata_cache = self.cache_strategy.puffin_metadata_cache().cloned();
@@ -467,14 +461,8 @@ impl ScanRegion {
return None;
}
let file_cache = || -> Option<FileCacheRef> {
let write_cache = self.cache_strategy.write_cache()?;
let file_cache = write_cache.file_cache();
Some(file_cache)
}();
let file_cache = self.cache_strategy.write_cache().map(|w| w.file_cache());
let bloom_filter_index_cache = self.cache_strategy.bloom_filter_index_cache().cloned();
let puffin_metadata_cache = self.cache_strategy.puffin_metadata_cache().cloned();
BloomFilterIndexApplierBuilder::new(
@@ -499,12 +487,18 @@ impl ScanRegion {
return None;
}
let file_cache = self.cache_strategy.write_cache().map(|w| w.file_cache());
let puffin_metadata_cache = self.cache_strategy.puffin_metadata_cache().cloned();
FulltextIndexApplierBuilder::new(
self.access_layer.region_dir().to_string(),
self.version.metadata.region_id,
self.access_layer.object_store().clone(),
self.access_layer.puffin_manager_factory().clone(),
self.version.metadata.as_ref(),
)
.with_file_cache(file_cache)
.with_puffin_metadata_cache(puffin_metadata_cache)
.build(&self.request.filters)
.inspect_err(|err| warn!(err; "Failed to build fulltext index applier"))
.ok()


@@ -35,8 +35,8 @@ use store_api::metadata::{ColumnMetadata, RegionMetadata, RegionMetadataRef};
use store_api::region_engine::{SetRegionRoleStateResponse, SettableRegionRoleState};
use store_api::region_request::{
AffectedRows, RegionAlterRequest, RegionCatchupRequest, RegionCloseRequest,
RegionCompactRequest, RegionCreateRequest, RegionDropRequest, RegionFlushRequest,
RegionOpenRequest, RegionRequest, RegionTruncateRequest,
RegionCompactRequest, RegionCreateRequest, RegionFlushRequest, RegionOpenRequest,
RegionRequest, RegionTruncateRequest,
};
use store_api::storage::{RegionId, SequenceNumber};
use tokio::sync::oneshot::{self, Receiver, Sender};
@@ -624,10 +624,10 @@ impl WorkerRequest {
sender: sender.into(),
request: DdlRequest::Create(v),
}),
RegionRequest::Drop(v) => WorkerRequest::Ddl(SenderDdlRequest {
RegionRequest::Drop(_) => WorkerRequest::Ddl(SenderDdlRequest {
region_id,
sender: sender.into(),
request: DdlRequest::Drop(v),
request: DdlRequest::Drop,
}),
RegionRequest::Open(v) => WorkerRequest::Ddl(SenderDdlRequest {
region_id,
@@ -690,7 +690,7 @@ impl WorkerRequest {
#[derive(Debug)]
pub(crate) enum DdlRequest {
Create(RegionCreateRequest),
Drop(RegionDropRequest),
Drop,
Open((RegionOpenRequest, Option<WalEntryReceiver>)),
Close(RegionCloseRequest),
Alter(RegionAlterRequest),


@@ -154,6 +154,10 @@ pub enum IndexType {
}
impl FileMeta {
pub fn exists_index(&self) -> bool {
!self.available_indexes.is_empty()
}
/// Returns true if the file has an inverted index
pub fn inverted_index_available(&self) -> bool {
self.available_indexes.contains(&IndexType::InvertedIndex)
@@ -170,31 +174,8 @@ impl FileMeta {
.contains(&IndexType::BloomFilterIndex)
}
/// Returns the size of the inverted index file
pub fn inverted_index_size(&self) -> Option<u64> {
if self.available_indexes.len() == 1 && self.inverted_index_available() {
Some(self.index_file_size)
} else {
None
}
}
/// Returns the size of the fulltext index file
pub fn fulltext_index_size(&self) -> Option<u64> {
if self.available_indexes.len() == 1 && self.fulltext_index_available() {
Some(self.index_file_size)
} else {
None
}
}
/// Returns the size of the bloom filter index file
pub fn bloom_filter_index_size(&self) -> Option<u64> {
if self.available_indexes.len() == 1 && self.bloom_filter_index_available() {
Some(self.index_file_size)
} else {
None
}
pub fn index_file_size(&self) -> u64 {
self.index_file_size
}
}


@@ -92,8 +92,8 @@ impl FilePurger for LocalFilePurger {
if let Some(write_cache) = cache_manager.as_ref().and_then(|cache| cache.write_cache())
{
// Removes the inverted index from the cache.
if file_meta.inverted_index_available() {
// Removes index file from the cache.
if file_meta.exists_index() {
write_cache
.remove(IndexKey::new(
file_meta.region_id,
@@ -111,6 +111,16 @@ impl FilePurger for LocalFilePurger {
))
.await;
}
// Purges index content in the stager.
if let Err(e) = sst_layer
.puffin_manager_factory()
.purge_stager(file_meta.file_id)
.await
{
error!(e; "Failed to purge stager with index file, file_id: {}, region: {}",
file_meta.file_id, file_meta.region_id);
}
})) {
error!(e; "Failed to schedule the file purge request");
}
@@ -146,7 +156,7 @@ mod tests {
let path = location::sst_file_path(sst_dir, sst_file_id);
let index_aux_path = dir.path().join("index_aux");
let puffin_mgr = PuffinManagerFactory::new(&index_aux_path, 4096, None)
let puffin_mgr = PuffinManagerFactory::new(&index_aux_path, 4096, None, None)
.await
.unwrap();
let intm_mgr = IntermediateManager::init_fs(index_aux_path.to_str().unwrap())
@@ -202,7 +212,7 @@ mod tests {
let sst_dir = "table1";
let index_aux_path = dir.path().join("index_aux");
let puffin_mgr = PuffinManagerFactory::new(&index_aux_path, 4096, None)
let puffin_mgr = PuffinManagerFactory::new(&index_aux_path, 4096, None, None)
.await
.unwrap();
let intm_mgr = IntermediateManager::init_fs(index_aux_path.to_str().unwrap())


@@ -103,7 +103,6 @@ pub type BloomFilterOutput = IndexBaseOutput;
#[derive(Default)]
pub struct Indexer {
file_id: FileId,
file_path: String,
region_id: RegionId,
puffin_manager: Option<SstPuffinManager>,
inverted_indexer: Option<InvertedIndexer>,
@@ -170,7 +169,7 @@ impl Indexer {
#[async_trait::async_trait]
pub trait IndexerBuilder {
/// Builds an indexer for the given file id.
async fn build(&self, file_id: FileId, index_file_path: String) -> Indexer;
async fn build(&self, file_id: FileId) -> Indexer;
}
pub(crate) struct IndexerBuilderImpl {
@@ -188,10 +187,9 @@ pub(crate) struct IndexerBuilderImpl {
#[async_trait::async_trait]
impl IndexerBuilder for IndexerBuilderImpl {
/// Sanity check for arguments and create a new [Indexer] if arguments are valid.
async fn build(&self, file_id: FileId, index_file_path: String) -> Indexer {
async fn build(&self, file_id: FileId) -> Indexer {
let mut indexer = Indexer {
file_id,
file_path: index_file_path,
region_id: self.metadata.region_id,
..Default::default()
};
@@ -392,30 +390,31 @@ mod tests {
use store_api::metadata::{ColumnMetadata, RegionMetadataBuilder};
use super::*;
use crate::access_layer::FilePathProvider;
use crate::config::{FulltextIndexConfig, Mode};
struct MetaConfig {
with_tag: bool,
with_inverted: bool,
with_fulltext: bool,
with_skipping_bloom: bool,
}
fn mock_region_metadata(
MetaConfig {
with_tag,
with_inverted,
with_fulltext,
with_skipping_bloom,
}: MetaConfig,
) -> RegionMetadataRef {
let mut builder = RegionMetadataBuilder::new(RegionId::new(1, 2));
let mut column_schema = ColumnSchema::new("a", ConcreteDataType::int64_datatype(), false);
if with_inverted {
column_schema = column_schema.with_inverted_index(true);
}
builder
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new("a", ConcreteDataType::int64_datatype(), false),
semantic_type: if with_tag {
SemanticType::Tag
} else {
SemanticType::Field
},
column_schema,
semantic_type: SemanticType::Field,
column_id: 1,
})
.push_column_metadata(ColumnMetadata {
@@ -433,10 +432,6 @@ mod tests {
column_id: 3,
});
if with_tag {
builder.primary_key(vec![1]);
}
if with_fulltext {
let column_schema =
ColumnSchema::new("text", ConcreteDataType::string_datatype(), true)
@@ -484,6 +479,18 @@ mod tests {
IntermediateManager::init_fs(path).await.unwrap()
}
struct NoopPathProvider;
impl FilePathProvider for NoopPathProvider {
fn build_index_file_path(&self, _file_id: FileId) -> String {
unreachable!()
}
fn build_sst_file_path(&self, _file_id: FileId) -> String {
unreachable!()
}
}
#[tokio::test]
async fn test_build_indexer_basic() {
let (dir, factory) =
@@ -491,7 +498,7 @@ mod tests {
let intm_manager = mock_intm_mgr(dir.path().to_string_lossy()).await;
let metadata = mock_region_metadata(MetaConfig {
with_tag: true,
with_inverted: true,
with_fulltext: true,
with_skipping_bloom: true,
});
@@ -499,14 +506,14 @@ mod tests {
op_type: OperationType::Flush,
metadata,
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager,
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_some());
@@ -521,7 +528,7 @@ mod tests {
let intm_manager = mock_intm_mgr(dir.path().to_string_lossy()).await;
let metadata = mock_region_metadata(MetaConfig {
with_tag: true,
with_inverted: true,
with_fulltext: true,
with_skipping_bloom: true,
});
@@ -529,7 +536,7 @@ mod tests {
op_type: OperationType::Flush,
metadata: metadata.clone(),
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager.clone(),
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig {
@@ -539,7 +546,7 @@ mod tests {
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_none());
@@ -550,7 +557,7 @@ mod tests {
op_type: OperationType::Compact,
metadata: metadata.clone(),
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager.clone(),
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
@@ -560,7 +567,7 @@ mod tests {
},
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_some());
@@ -571,7 +578,7 @@ mod tests {
op_type: OperationType::Compact,
metadata,
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager,
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
@@ -581,7 +588,7 @@ mod tests {
..Default::default()
},
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_some());
@@ -596,7 +603,7 @@ mod tests {
let intm_manager = mock_intm_mgr(dir.path().to_string_lossy()).await;
let metadata = mock_region_metadata(MetaConfig {
with_tag: false,
with_inverted: false,
with_fulltext: true,
with_skipping_bloom: true,
});
@@ -604,14 +611,14 @@ mod tests {
op_type: OperationType::Flush,
metadata: metadata.clone(),
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager.clone(),
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_none());
@@ -619,7 +626,7 @@ mod tests {
assert!(indexer.bloom_filter_indexer.is_some());
let metadata = mock_region_metadata(MetaConfig {
with_tag: true,
with_inverted: true,
with_fulltext: false,
with_skipping_bloom: true,
});
@@ -627,14 +634,14 @@ mod tests {
op_type: OperationType::Flush,
metadata: metadata.clone(),
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager.clone(),
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_some());
@@ -642,7 +649,7 @@ mod tests {
assert!(indexer.bloom_filter_indexer.is_some());
let metadata = mock_region_metadata(MetaConfig {
with_tag: true,
with_inverted: true,
with_fulltext: true,
with_skipping_bloom: false,
});
@@ -650,14 +657,14 @@ mod tests {
op_type: OperationType::Flush,
metadata: metadata.clone(),
row_group_size: 1024,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager,
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_some());
@@ -672,7 +679,7 @@ mod tests {
let intm_manager = mock_intm_mgr(dir.path().to_string_lossy()).await;
let metadata = mock_region_metadata(MetaConfig {
with_tag: true,
with_inverted: true,
with_fulltext: true,
with_skipping_bloom: true,
});
@@ -680,14 +687,14 @@ mod tests {
op_type: OperationType::Flush,
metadata,
row_group_size: 0,
puffin_manager: factory.build(mock_object_store()),
puffin_manager: factory.build(mock_object_store(), NoopPathProvider),
intermediate_manager: intm_manager,
index_options: IndexOptions::default(),
inverted_index_config: InvertedIndexConfig::default(),
fulltext_index_config: FulltextIndexConfig::default(),
bloom_filter_index_config: BloomFilterConfig::default(),
}
.build(FileId::random(), "test".to_string())
.build(FileId::random())
.await;
assert!(indexer.inverted_indexer.is_none());


@@ -28,6 +28,7 @@ use puffin::puffin_manager::{BlobGuard, PuffinManager, PuffinReader};
use snafu::ResultExt;
use store_api::storage::{ColumnId, RegionId};
use crate::access_layer::{RegionFilePathFactory, WriteCachePathProvider};
use crate::cache::file_cache::{FileCacheRef, FileType, IndexKey};
use crate::cache::index::bloom_filter_index::{
BloomFilterIndexCacheRef, CachedBloomFilterIndexBlobReader,
@@ -43,7 +44,6 @@ use crate::sst::index::bloom_filter::applier::builder::Predicate;
use crate::sst::index::bloom_filter::INDEX_BLOB_TYPE;
use crate::sst::index::puffin_manager::{BlobReader, PuffinManagerFactory};
use crate::sst::index::TYPE_BLOOM_FILTER_INDEX;
use crate::sst::location;
pub(crate) type BloomFilterIndexApplierRef = Arc<BloomFilterIndexApplier>;
@@ -247,11 +247,12 @@ impl BloomFilterIndexApplier {
return Ok(None);
};
let puffin_manager = self.puffin_manager_factory.build(file_cache.local_store());
let puffin_file_name = file_cache.cache_file_path(index_key);
let puffin_manager = self.puffin_manager_factory.build(
file_cache.local_store(),
WriteCachePathProvider::new(self.region_id, file_cache.clone()),
);
let reader = puffin_manager
.reader(&puffin_file_name)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.with_file_size_hint(file_size_hint)
@@ -278,12 +279,14 @@ impl BloomFilterIndexApplier {
) -> Result<BlobReader> {
let puffin_manager = self
.puffin_manager_factory
.build(self.object_store.clone())
.build(
self.object_store.clone(),
RegionFilePathFactory::new(self.region_dir.clone()),
)
.with_puffin_metadata_cache(self.puffin_metadata_cache.clone());
let file_path = location::index_file_path(&self.region_dir, file_id);
puffin_manager
.reader(&file_path)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.with_file_size_hint(file_size_hint)
@@ -447,7 +450,6 @@ mod tests {
let memory_usage_threshold = Some(1024);
let file_id = FileId::random();
let region_dir = "region_dir".to_string();
let path = location::index_file_path(&region_dir, file_id);
let mut indexer =
BloomFilterIndexer::new(file_id, &region_metadata, intm_mgr, memory_usage_threshold)
@@ -460,9 +462,12 @@ mod tests {
let mut batch = new_batch("tag2", 10..20);
indexer.update(&mut batch).await.unwrap();
let puffin_manager = factory.build(object_store.clone());
let puffin_manager = factory.build(
object_store.clone(),
RegionFilePathFactory::new(region_dir.clone()),
);
let mut puffin_writer = puffin_manager.writer(&path).await.unwrap();
let mut puffin_writer = puffin_manager.writer(&file_id).await.unwrap();
indexer.finish(&mut puffin_writer).await.unwrap();
puffin_writer.finish().await.unwrap();


@@ -356,6 +356,7 @@ pub(crate) mod tests {
use store_api::storage::RegionId;
use super::*;
use crate::access_layer::FilePathProvider;
use crate::read::BatchColumn;
use crate::row_converter::{DensePrimaryKeyCodec, PrimaryKeyCodecExt};
use crate::sst::index::puffin_manager::PuffinManagerFactory;
@@ -368,6 +369,18 @@ pub(crate) mod tests {
IntermediateManager::init_fs(path).await.unwrap()
}
pub struct TestPathProvider;
impl FilePathProvider for TestPathProvider {
fn build_index_file_path(&self, file_id: FileId) -> String {
file_id.to_string()
}
fn build_sst_file_path(&self, file_id: FileId) -> String {
file_id.to_string()
}
}
/// tag_str:
/// - type: string
/// - index: bloom filter
@@ -483,16 +496,16 @@ pub(crate) mod tests {
indexer.update(&mut batch).await.unwrap();
let (_d, factory) = PuffinManagerFactory::new_for_test_async(prefix).await;
let puffin_manager = factory.build(object_store);
let puffin_manager = factory.build(object_store, TestPathProvider);
let index_file_name = "index_file";
let mut puffin_writer = puffin_manager.writer(index_file_name).await.unwrap();
let file_id = FileId::random();
let mut puffin_writer = puffin_manager.writer(&file_id).await.unwrap();
let (row_count, byte_count) = indexer.finish(&mut puffin_writer).await.unwrap();
assert_eq!(row_count, 20);
assert!(byte_count > 0);
puffin_writer.finish().await.unwrap();
let puffin_reader = puffin_manager.reader(index_file_name).await.unwrap();
let puffin_reader = puffin_manager.reader(&file_id).await.unwrap();
// tag_str
{


@@ -15,19 +15,22 @@
use std::collections::BTreeSet;
use std::sync::Arc;
use common_telemetry::warn;
use index::fulltext_index::search::{FulltextIndexSearcher, RowId, TantivyFulltextIndexSearcher};
use object_store::ObjectStore;
use puffin::puffin_manager::cache::PuffinMetadataCacheRef;
use puffin::puffin_manager::{DirGuard, PuffinManager, PuffinReader};
use snafu::ResultExt;
use store_api::storage::ColumnId;
use store_api::storage::{ColumnId, RegionId};
use crate::access_layer::{RegionFilePathFactory, WriteCachePathProvider};
use crate::cache::file_cache::{FileCacheRef, FileType, IndexKey};
use crate::error::{ApplyFulltextIndexSnafu, PuffinBuildReaderSnafu, PuffinReadBlobSnafu, Result};
use crate::metrics::INDEX_APPLY_ELAPSED;
use crate::sst::file::FileId;
use crate::sst::index::fulltext_index::INDEX_BLOB_TYPE_TANTIVY;
use crate::sst::index::puffin_manager::{PuffinManagerFactory, SstPuffinDir};
use crate::sst::index::TYPE_FULLTEXT_INDEX;
use crate::sst::location;
pub mod builder;
@@ -36,6 +39,9 @@ pub struct FulltextIndexApplier {
/// The root directory of the region.
region_dir: String,
/// The region ID.
region_id: RegionId,
/// Queries to apply to the index.
queries: Vec<(ColumnId, String)>,
@@ -44,6 +50,12 @@ pub struct FulltextIndexApplier {
/// Store responsible for accessing index files.
store: ObjectStore,
/// File cache to be used by the `FulltextIndexApplier`.
file_cache: Option<FileCacheRef>,
/// The puffin metadata cache.
puffin_metadata_cache: Option<PuffinMetadataCacheRef>,
}
pub type FulltextIndexApplierRef = Arc<FulltextIndexApplier>;
@@ -52,20 +64,43 @@ impl FulltextIndexApplier {
/// Creates a new `FulltextIndexApplier`.
pub fn new(
region_dir: String,
region_id: RegionId,
store: ObjectStore,
queries: Vec<(ColumnId, String)>,
puffin_manager_factory: PuffinManagerFactory,
) -> Self {
Self {
region_dir,
region_id,
store,
queries,
puffin_manager_factory,
file_cache: None,
puffin_metadata_cache: None,
}
}
/// Sets the file cache.
pub fn with_file_cache(mut self, file_cache: Option<FileCacheRef>) -> Self {
self.file_cache = file_cache;
self
}
/// Sets the puffin metadata cache.
pub fn with_puffin_metadata_cache(
mut self,
puffin_metadata_cache: Option<PuffinMetadataCacheRef>,
) -> Self {
self.puffin_metadata_cache = puffin_metadata_cache;
self
}
/// Applies the queries to the fulltext index of the specified SST file.
pub async fn apply(&self, file_id: FileId) -> Result<BTreeSet<RowId>> {
pub async fn apply(
&self,
file_id: FileId,
file_size_hint: Option<u64>,
) -> Result<BTreeSet<RowId>> {
let _timer = INDEX_APPLY_ELAPSED
.with_label_values(&[TYPE_FULLTEXT_INDEX])
.start_timer();
@@ -74,7 +109,9 @@ impl FulltextIndexApplier {
let mut row_ids = BTreeSet::new();
for (column_id, query) in &self.queries {
let dir = self.index_dir_path(file_id, *column_id).await?;
let dir = self
.index_dir_path(file_id, *column_id, file_size_hint)
.await?;
let path = match &dir {
Some(dir) => dir.path(),
None => {
@@ -110,15 +147,74 @@ impl FulltextIndexApplier {
&self,
file_id: FileId,
column_id: ColumnId,
file_size_hint: Option<u64>,
) -> Result<Option<SstPuffinDir>> {
let puffin_manager = self.puffin_manager_factory.build(self.store.clone());
let file_path = location::index_file_path(&self.region_dir, file_id);
let blob_key = format!("{INDEX_BLOB_TYPE_TANTIVY}-{column_id}");
match puffin_manager
.reader(&file_path)
// FAST PATH: Try to read the index from the file cache.
if let Some(file_cache) = &self.file_cache {
let index_key = IndexKey::new(self.region_id, file_id, FileType::Puffin);
if file_cache.get(index_key).await.is_some() {
match self
.get_index_from_file_cache(file_cache, file_id, file_size_hint, &blob_key)
.await
{
Ok(dir) => return Ok(dir),
Err(err) => {
warn!(err; "An unexpected error occurred while reading the cached index file. Falling back to the remote index file.")
}
}
}
}
// SLOW PATH: Try to read the index from the remote file.
self.get_index_from_remote_file(file_id, file_size_hint, &blob_key)
.await
}
async fn get_index_from_file_cache(
&self,
file_cache: &FileCacheRef,
file_id: FileId,
file_size_hint: Option<u64>,
blob_key: &str,
) -> Result<Option<SstPuffinDir>> {
match self
.puffin_manager_factory
.build(
file_cache.local_store(),
WriteCachePathProvider::new(self.region_id, file_cache.clone()),
)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.dir(&format!("{INDEX_BLOB_TYPE_TANTIVY}-{column_id}"))
.with_file_size_hint(file_size_hint)
.dir(blob_key)
.await
{
Ok(dir) => Ok(Some(dir)),
Err(puffin::error::Error::BlobNotFound { .. }) => Ok(None),
Err(err) => Err(err).context(PuffinReadBlobSnafu),
}
}
async fn get_index_from_remote_file(
&self,
file_id: FileId,
file_size_hint: Option<u64>,
blob_key: &str,
) -> Result<Option<SstPuffinDir>> {
match self
.puffin_manager_factory
.build(
self.store.clone(),
RegionFilePathFactory::new(self.region_dir.clone()),
)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.with_file_size_hint(file_size_hint)
.dir(blob_key)
.await
{
Ok(dir) => Ok(Some(dir)),

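The applier tries the local file cache first and only falls back to the remote store when the cached read misses or fails. A generic sketch of that two-tier lookup, with closures standing in for the puffin readers; everything here is illustrative, not the applier's real API:

/// Reads via the fast local source first; on a miss or error, falls back
/// to the slower remote source.
fn read_with_fallback<T>(
    from_cache: impl Fn() -> Result<Option<T>, String>,
    from_remote: impl Fn() -> Result<Option<T>, String>,
) -> Result<Option<T>, String> {
    match from_cache() {
        Ok(Some(value)) => return Ok(Some(value)),
        Ok(None) => {} // not cached
        Err(err) => eprintln!("cached read failed, falling back: {err}"),
    }
    from_remote()
}

fn main() {
    // Cache miss, remote hit.
    let result = read_with_fallback(|| Ok(None), || Ok(Some("index dir".to_string())));
    assert_eq!(result.unwrap().as_deref(), Some("index dir"));

    // A cache error is tolerated; the remote store still answers.
    let result = read_with_fallback(|| Err("io error".to_string()), || Ok(Some(42)));
    assert_eq!(result.unwrap(), Some(42));
}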

@@ -15,9 +15,11 @@
use datafusion_common::ScalarValue;
use datafusion_expr::Expr;
use object_store::ObjectStore;
use puffin::puffin_manager::cache::PuffinMetadataCacheRef;
use store_api::metadata::RegionMetadata;
use store_api::storage::{ColumnId, ConcreteDataType};
use store_api::storage::{ColumnId, ConcreteDataType, RegionId};
use crate::cache::file_cache::FileCacheRef;
use crate::error::Result;
use crate::sst::index::fulltext_index::applier::FulltextIndexApplier;
use crate::sst::index::puffin_manager::PuffinManagerFactory;
@@ -25,27 +27,49 @@ use crate::sst::index::puffin_manager::PuffinManagerFactory;
/// `FulltextIndexApplierBuilder` is a builder for `FulltextIndexApplier`.
pub struct FulltextIndexApplierBuilder<'a> {
region_dir: String,
region_id: RegionId,
store: ObjectStore,
puffin_manager_factory: PuffinManagerFactory,
metadata: &'a RegionMetadata,
file_cache: Option<FileCacheRef>,
puffin_metadata_cache: Option<PuffinMetadataCacheRef>,
}
impl<'a> FulltextIndexApplierBuilder<'a> {
/// Creates a new `FulltextIndexApplierBuilder`.
pub fn new(
region_dir: String,
region_id: RegionId,
store: ObjectStore,
puffin_manager_factory: PuffinManagerFactory,
metadata: &'a RegionMetadata,
) -> Self {
Self {
region_dir,
region_id,
store,
puffin_manager_factory,
metadata,
file_cache: None,
puffin_metadata_cache: None,
}
}
/// Sets the file cache to be used by the `FulltextIndexApplier`.
pub fn with_file_cache(mut self, file_cache: Option<FileCacheRef>) -> Self {
self.file_cache = file_cache;
self
}
/// Sets the puffin metadata cache to be used by the `FulltextIndexApplier`.
pub fn with_puffin_metadata_cache(
mut self,
puffin_metadata_cache: Option<PuffinMetadataCacheRef>,
) -> Self {
self.puffin_metadata_cache = puffin_metadata_cache;
self
}
/// Builds `SstIndexApplier` from the given expressions.
pub fn build(self, exprs: &[Expr]) -> Result<Option<FulltextIndexApplier>> {
let mut queries = Vec::with_capacity(exprs.len());
@@ -58,10 +82,13 @@ impl<'a> FulltextIndexApplierBuilder<'a> {
Ok((!queries.is_empty()).then(|| {
FulltextIndexApplier::new(
self.region_dir,
self.region_id,
self.store,
queries,
self.puffin_manager_factory,
)
.with_file_cache(self.file_cache)
.with_puffin_metadata_cache(self.puffin_metadata_cache)
}))
}

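The builder above gains a `region_id` plus two optional caches that are forwarded into the applier. A small illustration of the consuming-builder shape it follows, with hypothetical stand-in types (`ApplierBuilder`, `Applier`, `Cache`); the `with_*` setters take `Option<T>` so callers can forward a possibly-absent cache without branching:

```rust
#[derive(Clone)]
struct Cache;

struct Applier {
    region_dir: String,
    file_cache: Option<Cache>,
    metadata_cache: Option<Cache>,
}

struct ApplierBuilder {
    region_dir: String,
    file_cache: Option<Cache>,
    metadata_cache: Option<Cache>,
}

impl ApplierBuilder {
    // Required fields come in through `new`.
    fn new(region_dir: String) -> Self {
        Self { region_dir, file_cache: None, metadata_cache: None }
    }

    // Optional fields accept `Option<T>` directly.
    fn with_file_cache(mut self, cache: Option<Cache>) -> Self {
        self.file_cache = cache;
        self
    }

    fn with_metadata_cache(mut self, cache: Option<Cache>) -> Self {
        self.metadata_cache = cache;
        self
    }

    // The builder is consumed, moving its fields into the final value.
    fn build(self) -> Applier {
        Applier {
            region_dir: self.region_dir,
            file_cache: self.file_cache,
            metadata_cache: self.metadata_cache,
        }
    }
}

fn main() {
    let applier = ApplierBuilder::new("region0".to_string())
        .with_file_cache(Some(Cache))
        .with_metadata_cache(None)
        .build();
    assert_eq!(applier.region_dir, "region0");
    assert!(applier.file_cache.is_some());
    assert!(applier.metadata_cache.is_none());
}
```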

@@ -350,11 +350,11 @@ mod tests {
use store_api::storage::{ConcreteDataType, RegionId};
use super::*;
use crate::access_layer::RegionFilePathFactory;
use crate::read::{Batch, BatchColumn};
use crate::sst::file::FileId;
use crate::sst::index::fulltext_index::applier::FulltextIndexApplier;
use crate::sst::index::puffin_manager::PuffinManagerFactory;
use crate::sst::location;
fn mock_object_store() -> ObjectStore {
ObjectStore::new(Memory::default()).unwrap().finish()
@@ -494,7 +494,6 @@ mod tests {
let (d, factory) = PuffinManagerFactory::new_for_test_async(prefix).await;
let region_dir = "region0".to_string();
let sst_file_id = FileId::random();
let file_path = location::index_file_path(&region_dir, sst_file_id);
let object_store = mock_object_store();
let region_metadata = mock_region_metadata();
let intm_mgr = new_intm_mgr(d.path().to_string_lossy()).await;
@@ -514,8 +513,11 @@ mod tests {
let mut batch = new_batch(rows);
indexer.update(&mut batch).await.unwrap();
let puffin_manager = factory.build(object_store.clone());
let mut writer = puffin_manager.writer(&file_path).await.unwrap();
let puffin_manager = factory.build(
object_store.clone(),
RegionFilePathFactory::new(region_dir.clone()),
);
let mut writer = puffin_manager.writer(&sst_file_id).await.unwrap();
let _ = indexer.finish(&mut writer).await.unwrap();
writer.finish().await.unwrap();
@@ -523,6 +525,7 @@ mod tests {
let _d = &d;
let applier = FulltextIndexApplier::new(
region_dir.clone(),
region_metadata.region_id,
object_store.clone(),
queries
.into_iter()
@@ -531,7 +534,7 @@ mod tests {
factory.clone(),
);
async move { applier.apply(sst_file_id).await.unwrap() }.boxed()
async move { applier.apply(sst_file_id, None).await.unwrap() }.boxed()
}
}


@@ -62,7 +62,7 @@ impl Indexer {
async fn build_puffin_writer(&mut self) -> Option<SstPuffinWriter> {
let puffin_manager = self.puffin_manager.take()?;
let err = match puffin_manager.writer(&self.file_path).await {
let err = match puffin_manager.writer(&self.file_id).await {
Ok(writer) => return Some(writer),
Err(err) => err,
};


@@ -28,6 +28,7 @@ use puffin::puffin_manager::{BlobGuard, PuffinManager, PuffinReader};
use snafu::ResultExt;
use store_api::storage::RegionId;
use crate::access_layer::{RegionFilePathFactory, WriteCachePathProvider};
use crate::cache::file_cache::{FileCacheRef, FileType, IndexKey};
use crate::cache::index::inverted_index::{CachedInvertedIndexBlobReader, InvertedIndexCacheRef};
use crate::error::{
@@ -38,7 +39,6 @@ use crate::sst::file::FileId;
use crate::sst::index::inverted_index::INDEX_BLOB_TYPE;
use crate::sst::index::puffin_manager::{BlobReader, PuffinManagerFactory};
use crate::sst::index::TYPE_INVERTED_INDEX;
use crate::sst::location;
/// `InvertedIndexApplier` is responsible for applying predicates to the provided SST files
/// and returning the relevant row group ids for further scan.
@@ -172,12 +172,14 @@ impl InvertedIndexApplier {
return Ok(None);
};
let puffin_manager = self.puffin_manager_factory.build(file_cache.local_store());
let puffin_file_name = file_cache.cache_file_path(index_key);
let puffin_manager = self.puffin_manager_factory.build(
file_cache.local_store(),
WriteCachePathProvider::new(self.region_id, file_cache.clone()),
);
// Adds file size hint to the puffin reader to avoid extra metadata read.
let reader = puffin_manager
.reader(&puffin_file_name)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.with_file_size_hint(file_size_hint)
@@ -198,12 +200,14 @@ impl InvertedIndexApplier {
) -> Result<BlobReader> {
let puffin_manager = self
.puffin_manager_factory
.build(self.store.clone())
.build(
self.store.clone(),
RegionFilePathFactory::new(self.region_dir.clone()),
)
.with_puffin_metadata_cache(self.puffin_metadata_cache.clone());
let file_path = location::index_file_path(&self.region_dir, file_id);
puffin_manager
.reader(&file_path)
.reader(&file_id)
.await
.context(PuffinBuildReaderSnafu)?
.with_file_size_hint(file_size_hint)
@@ -239,10 +243,12 @@ mod tests {
let object_store = ObjectStore::new(Memory::default()).unwrap().finish();
let file_id = FileId::random();
let region_dir = "region_dir".to_string();
let path = location::index_file_path(&region_dir, file_id);
let puffin_manager = puffin_manager_factory.build(object_store.clone());
let mut writer = puffin_manager.writer(&path).await.unwrap();
let puffin_manager = puffin_manager_factory.build(
object_store.clone(),
RegionFilePathFactory::new(region_dir.clone()),
);
let mut writer = puffin_manager.writer(&file_id).await.unwrap();
writer
.put_blob(INDEX_BLOB_TYPE, Cursor::new(vec![]), Default::default())
.await
@@ -285,10 +291,12 @@ mod tests {
let object_store = ObjectStore::new(Memory::default()).unwrap().finish();
let file_id = FileId::random();
let region_dir = "region_dir".to_string();
let path = location::index_file_path(&region_dir, file_id);
let puffin_manager = puffin_manager_factory.build(object_store.clone());
let mut writer = puffin_manager.writer(&path).await.unwrap();
let puffin_manager = puffin_manager_factory.build(
object_store.clone(),
RegionFilePathFactory::new(region_dir.clone()),
);
let mut writer = puffin_manager.writer(&file_id).await.unwrap();
writer
.put_blob("invalid_blob_type", Cursor::new(vec![]), Default::default())
.await

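The comment in the hunk above notes that passing a file size hint lets the puffin reader avoid an extra metadata read. The intuition, sketched with hypothetical types (`RemoteFile`, `read_footer`; this is not the puffin crate's API): a footer-based file format has to know the total file length before it can locate the footer, so a caller that already knows the size (for example from SST metadata) saves one round trip to object storage:

```rust
// Hypothetical illustration only; not the real puffin reader.
struct RemoteFile {
    bytes: Vec<u8>,
}

impl RemoteFile {
    // Stands in for a `stat`/metadata request against object storage.
    fn remote_len(&self) -> u64 {
        self.bytes.len() as u64
    }

    // Stands in for a ranged read.
    fn read_range(&self, offset: u64, len: u64) -> &[u8] {
        &self.bytes[offset as usize..(offset + len) as usize]
    }
}

const FOOTER_LEN: u64 = 8;

// The footer sits at `file_size - FOOTER_LEN`, so without a size hint we
// must issue one extra request just to learn the size.
fn read_footer(file: &RemoteFile, file_size_hint: Option<u64>) -> &[u8] {
    let file_size = file_size_hint.unwrap_or_else(|| file.remote_len());
    file.read_range(file_size - FOOTER_LEN, FOOTER_LEN)
}

fn main() {
    let file = RemoteFile { bytes: vec![0u8; 64] };
    // With a hint, `remote_len` (the extra metadata read) is never called.
    assert_eq!(read_footer(&file, Some(64)).len(), 8);
    // Without a hint, the size is fetched first.
    assert_eq!(read_footer(&file, None).len(), 8);
}
```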

@@ -336,13 +336,13 @@ mod tests {
use store_api::storage::RegionId;
use super::*;
use crate::access_layer::RegionFilePathFactory;
use crate::cache::index::inverted_index::InvertedIndexCache;
use crate::metrics::CACHE_BYTES;
use crate::read::BatchColumn;
use crate::row_converter::{DensePrimaryKeyCodec, PrimaryKeyCodecExt};
use crate::sst::index::inverted_index::applier::builder::InvertedIndexApplierBuilder;
use crate::sst::index::puffin_manager::PuffinManagerFactory;
use crate::sst::location;
fn mock_object_store() -> ObjectStore {
ObjectStore::new(Memory::default()).unwrap().finish()
@@ -438,7 +438,6 @@ mod tests {
let (d, factory) = PuffinManagerFactory::new_for_test_async(prefix).await;
let region_dir = "region0".to_string();
let sst_file_id = FileId::random();
let file_path = location::index_file_path(&region_dir, sst_file_id);
let object_store = mock_object_store();
let region_metadata = mock_region_metadata();
let intm_mgr = new_intm_mgr(d.path().to_string_lossy()).await;
@@ -460,8 +459,11 @@ mod tests {
creator.update(&mut batch).await.unwrap();
}
let puffin_manager = factory.build(object_store.clone());
let mut writer = puffin_manager.writer(&file_path).await.unwrap();
let puffin_manager = factory.build(
object_store.clone(),
RegionFilePathFactory::new(region_dir.clone()),
);
let mut writer = puffin_manager.writer(&sst_file_id).await.unwrap();
let (row_count, _) = creator.finish(&mut writer).await.unwrap();
assert_eq!(row_count, rows.len() * segment_row_count);
writer.finish().await.unwrap();


@@ -14,6 +14,7 @@
use std::path::Path;
use std::sync::Arc;
use std::time::Duration;
use async_trait::async_trait;
use common_error::ext::BoxedError;
@@ -21,22 +22,24 @@ use object_store::{FuturesAsyncWriter, ObjectStore};
use puffin::error::{self as puffin_error, Result as PuffinResult};
use puffin::puffin_manager::file_accessor::PuffinFileAccessor;
use puffin::puffin_manager::fs_puffin_manager::FsPuffinManager;
use puffin::puffin_manager::stager::BoundedStager;
use puffin::puffin_manager::stager::{BoundedStager, Stager};
use puffin::puffin_manager::{BlobGuard, PuffinManager, PuffinReader};
use snafu::ResultExt;
use crate::error::{PuffinInitStagerSnafu, Result};
use crate::access_layer::FilePathProvider;
use crate::error::{PuffinInitStagerSnafu, PuffinPurgeStagerSnafu, Result};
use crate::metrics::{
INDEX_PUFFIN_FLUSH_OP_TOTAL, INDEX_PUFFIN_READ_BYTES_TOTAL, INDEX_PUFFIN_READ_OP_TOTAL,
INDEX_PUFFIN_WRITE_BYTES_TOTAL, INDEX_PUFFIN_WRITE_OP_TOTAL,
StagerMetrics, INDEX_PUFFIN_FLUSH_OP_TOTAL, INDEX_PUFFIN_READ_BYTES_TOTAL,
INDEX_PUFFIN_READ_OP_TOTAL, INDEX_PUFFIN_WRITE_BYTES_TOTAL, INDEX_PUFFIN_WRITE_OP_TOTAL,
};
use crate::sst::file::FileId;
use crate::sst::index::store::{self, InstrumentedStore};
type InstrumentedRangeReader = store::InstrumentedRangeReader<'static>;
type InstrumentedAsyncWrite = store::InstrumentedAsyncWrite<'static, FuturesAsyncWriter>;
pub(crate) type SstPuffinManager =
FsPuffinManager<Arc<BoundedStager>, ObjectStorePuffinFileAccessor>;
FsPuffinManager<Arc<BoundedStager<FileId>>, ObjectStorePuffinFileAccessor>;
pub(crate) type SstPuffinReader = <SstPuffinManager as PuffinManager>::Reader;
pub(crate) type SstPuffinWriter = <SstPuffinManager as PuffinManager>::Writer;
pub(crate) type SstPuffinBlob = <SstPuffinReader as PuffinReader>::Blob;
@@ -49,7 +52,7 @@ const STAGING_DIR: &str = "staging";
#[derive(Clone)]
pub struct PuffinManagerFactory {
/// The stager used by the puffin manager.
stager: Arc<BoundedStager>,
stager: Arc<BoundedStager<FileId>>,
/// The size of the write buffer used to create object store.
write_buffer_size: Option<usize>,
@@ -61,22 +64,40 @@ impl PuffinManagerFactory {
aux_path: impl AsRef<Path>,
staging_capacity: u64,
write_buffer_size: Option<usize>,
staging_ttl: Option<Duration>,
) -> Result<Self> {
let staging_dir = aux_path.as_ref().join(STAGING_DIR);
let stager = BoundedStager::new(staging_dir, staging_capacity, None)
.await
.context(PuffinInitStagerSnafu)?;
let stager = BoundedStager::new(
staging_dir,
staging_capacity,
Some(Arc::new(StagerMetrics::default())),
staging_ttl,
)
.await
.context(PuffinInitStagerSnafu)?;
Ok(Self {
stager: Arc::new(stager),
write_buffer_size,
})
}
pub(crate) fn build(&self, store: ObjectStore) -> SstPuffinManager {
pub(crate) fn build(
&self,
store: ObjectStore,
path_provider: impl FilePathProvider + 'static,
) -> SstPuffinManager {
let store = InstrumentedStore::new(store).with_write_buffer_size(self.write_buffer_size);
let puffin_file_accessor = ObjectStorePuffinFileAccessor::new(store);
let puffin_file_accessor =
ObjectStorePuffinFileAccessor::new(store, Arc::new(path_provider));
SstPuffinManager::new(self.stager.clone(), puffin_file_accessor)
}
pub(crate) async fn purge_stager(&self, file_id: FileId) -> Result<()> {
self.stager
.purge(&file_id)
.await
.context(PuffinPurgeStagerSnafu)
}
}
#[cfg(test)]
@@ -85,7 +106,7 @@ impl PuffinManagerFactory {
prefix: &str,
) -> (common_test_util::temp_dir::TempDir, Self) {
let tempdir = common_test_util::temp_dir::create_temp_dir(prefix);
let factory = Self::new(tempdir.path().to_path_buf(), 1024, None)
let factory = Self::new(tempdir.path().to_path_buf(), 1024, None, None)
.await
.unwrap();
(tempdir, factory)
@@ -94,7 +115,7 @@ impl PuffinManagerFactory {
pub(crate) fn new_for_test_block(prefix: &str) -> (common_test_util::temp_dir::TempDir, Self) {
let tempdir = common_test_util::temp_dir::create_temp_dir(prefix);
let f = Self::new(tempdir.path().to_path_buf(), 1024, None);
let f = Self::new(tempdir.path().to_path_buf(), 1024, None, None);
let factory = common_runtime::block_on_global(f).unwrap();
(tempdir, factory)
@@ -105,11 +126,15 @@ impl PuffinManagerFactory {
#[derive(Clone)]
pub(crate) struct ObjectStorePuffinFileAccessor {
object_store: InstrumentedStore,
path_provider: Arc<dyn FilePathProvider>,
}
impl ObjectStorePuffinFileAccessor {
pub fn new(object_store: InstrumentedStore) -> Self {
Self { object_store }
pub fn new(object_store: InstrumentedStore, path_provider: Arc<dyn FilePathProvider>) -> Self {
Self {
object_store,
path_provider,
}
}
}
@@ -117,11 +142,13 @@ impl ObjectStorePuffinFileAccessor {
impl PuffinFileAccessor for ObjectStorePuffinFileAccessor {
type Reader = InstrumentedRangeReader;
type Writer = InstrumentedAsyncWrite;
type FileHandle = FileId;
async fn reader(&self, puffin_file_name: &str) -> PuffinResult<Self::Reader> {
async fn reader(&self, handle: &FileId) -> PuffinResult<Self::Reader> {
let file_path = self.path_provider.build_index_file_path(*handle);
self.object_store
.range_reader(
puffin_file_name,
&file_path,
&INDEX_PUFFIN_READ_BYTES_TOTAL,
&INDEX_PUFFIN_READ_OP_TOTAL,
)
@@ -130,10 +157,11 @@ impl PuffinFileAccessor for ObjectStorePuffinFileAccessor {
.context(puffin_error::ExternalSnafu)
}
async fn writer(&self, puffin_file_name: &str) -> PuffinResult<Self::Writer> {
async fn writer(&self, handle: &FileId) -> PuffinResult<Self::Writer> {
let file_path = self.path_provider.build_index_file_path(*handle);
self.object_store
.writer(
puffin_file_name,
&file_path,
&INDEX_PUFFIN_WRITE_BYTES_TOTAL,
&INDEX_PUFFIN_WRITE_OP_TOTAL,
&INDEX_PUFFIN_FLUSH_OP_TOTAL,
@@ -155,20 +183,32 @@ mod tests {
use super::*;
struct TestFilePathProvider;
impl FilePathProvider for TestFilePathProvider {
fn build_index_file_path(&self, file_id: FileId) -> String {
file_id.to_string()
}
fn build_sst_file_path(&self, file_id: FileId) -> String {
file_id.to_string()
}
}
#[tokio::test]
async fn test_puffin_manager_factory() {
let (_dir, factory) =
PuffinManagerFactory::new_for_test_async("test_puffin_manager_factory_").await;
let object_store = ObjectStore::new(Memory::default()).unwrap().finish();
let manager = factory.build(object_store);
let manager = factory.build(object_store, TestFilePathProvider);
let file_name = "my-puffin-file";
let file_id = FileId::random();
let blob_key = "blob-key";
let dir_key = "dir-key";
let raw_data = b"hello world!";
let mut writer = manager.writer(file_name).await.unwrap();
let mut writer = manager.writer(&file_id).await.unwrap();
writer
.put_blob(blob_key, Cursor::new(raw_data), PutOptions::default())
.await
@@ -189,7 +229,7 @@ mod tests {
.unwrap();
writer.finish().await.unwrap();
let reader = manager.reader(file_name).await.unwrap();
let reader = manager.reader(&file_id).await.unwrap();
let blob_guard = reader.blob(blob_key).await.unwrap();
let blob_reader = blob_guard.reader().await.unwrap();
let meta = blob_reader.metadata().await.unwrap();

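The changes above are the heart of this diff: `PuffinManagerFactory::build` now takes a `FilePathProvider`, and the file accessor resolves a `FileId` handle into a concrete path on every read or write instead of receiving precomputed path strings. A condensed sketch of that indirection follows; `FileId` is simplified to a `u64`, the accessor returns plain path strings rather than instrumented readers/writers, and the on-disk layout strings are invented for illustration:

```rust
use std::sync::Arc;

// Simplified stand-ins; the real FileId is a newtype and the real
// accessor builds instrumented readers/writers, not strings.
type FileId = u64;

trait FilePathProvider: Send + Sync {
    fn build_index_file_path(&self, file_id: FileId) -> String;
    fn build_sst_file_path(&self, file_id: FileId) -> String;
}

// Resolves ids against the region directory (the "remote" layout).
// The path formats here are illustrative, not the actual layout.
struct RegionFilePathFactory {
    region_dir: String,
}

impl FilePathProvider for RegionFilePathFactory {
    fn build_index_file_path(&self, file_id: FileId) -> String {
        format!("{}/index/{}.puffin", self.region_dir, file_id)
    }
    fn build_sst_file_path(&self, file_id: FileId) -> String {
        format!("{}/{}.parquet", self.region_dir, file_id)
    }
}

// The accessor keeps the provider and only materializes a path when a
// reader or writer is requested, so callers address files purely by id.
struct PuffinFileAccessor {
    path_provider: Arc<dyn FilePathProvider>,
}

impl PuffinFileAccessor {
    fn reader_path(&self, handle: &FileId) -> String {
        self.path_provider.build_index_file_path(*handle)
    }
}

fn main() {
    let accessor = PuffinFileAccessor {
        path_provider: Arc::new(RegionFilePathFactory {
            region_dir: "region0".to_string(),
        }),
    };
    assert_eq!(accessor.reader_path(&7), "region0/index/7.puffin");
}
```

Swapping the provider (region directory vs. write-cache directory) is what lets the same puffin manager serve both the remote store and the local file cache paths shown earlier in this diff.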

@@ -131,7 +131,7 @@ mod tests {
#[async_trait::async_trait]
impl IndexerBuilder for NoopIndexBuilder {
async fn build(&self, _file_id: FileId, _path: String) -> Indexer {
async fn build(&self, _file_id: FileId) -> Indexer {
Indexer::default()
}
}

Some files were not shown because too many files have changed in this diff.