Compare commits

...

134 Commits

Author SHA1 Message Date
discord9
1ddf19d886 feat: flownode use Inserter to write to database (#4323)
* feat: use `Inserter` as Frontend

* fix: enable procedure in flownode

* docs: remove `frontend_addr` opts

* chore: rm fe addr in test runner

* refactor: integration tests also use the inserter invoker

* feat: flow shutdown & refactor: remove `FrontendInvoker`

* refactor: rename `RemoteFrontendInvoker` to `FrontendInvoker`

* refactor: per review

* refactor: remove a layer of `Box`

* fix: standalone use `node_manager`

* fix: remove an `Arc` cycle
2024-07-09 10:44:22 +00:00
Ruihang Xia
185953e586 fix: support unary operator in default value, partition rule and prepare statement (#4301)
* handle unary operator

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add prepare test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add test and context

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix rebase error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix merge error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-09 08:59:06 +00:00
Ning Sun
7fe3f496ac refactor: do not print error log on PlanQuery error (#4322) 2024-07-09 06:34:30 +00:00
Weny Xu
1a9314a581 feat: enhance the retry logic by adding random noise (#4320)
feat: enhance the retry logic by adding random noise to the retry delay to avoid retry storms
2024-07-09 04:30:10 +00:00
Ruihang Xia
23bb9d92cb feat: handle parentheses with unary ops (#4290)
* feat: handle parentheses with unary ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* check tokens before converting to RPN

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add test cases to sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-09 04:08:36 +00:00
dennis zhuang
f1d17a8ba5 fix: panic while reading information_schema.KEY_COLUMN_USAGE (#4318)
* fix: table might be dropped during iteration

* fix: panic while reading information_schema.key_column_usage

* fix: key_column_usage wrong results
2024-07-09 03:30:14 +00:00
tison
d1f1fad440 build(deps): switch to upstream (#4319)
* build(deps): switch to upstream

* lock

Signed-off-by: tison <wander4096@gmail.com>

---------

Signed-off-by: tison <wander4096@gmail.com>
2024-07-09 01:56:19 +00:00
Zhenchi
00308218b3 feat(fulltext_index): allow enable full-text index in SQL and gRPC way (#4310)
* feat(fulltext_index): allow enable full-text index in SQL and gRPC way

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: typo

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: test_fulltext_intm_path

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: explicitly build column options

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: fix error msg

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-08 20:18:48 +00:00
Ning Sun
81308b9063 fix: error on show databases in non-default catalog (#4316) 2024-07-08 16:28:00 +00:00
Lei, HUANG
aa4d10eef7 feat(inverted_index): inverted index cache (#4309)
* feat/inverted-index-cache:
 Update dependencies and add caching for inverted index reader

 - Updated `atomic` to 0.6.0 and `uuid` to 1.9.1 in `Cargo.lock`.
 - Added `moka` and `uuid` dependencies in `Cargo.toml`.
 - Introduced `seek_read` method in `InvertedIndexBlobReader` for common seek and read operations.
 - Added `cache.rs` module to implement caching for inverted index reader using `moka`.
 - Updated `async-compression` to 0.4.11 in `puffin/Cargo.toml`.

* feat/inverted-index-cache:
 Refactor InvertedIndexReader and Add Index Cache Support

 - Refactored `InvertedIndexReader` to include `seek_read` method and default implementations for `fst` and `bitmap`.
 - Implemented `seek_read` in `InvertedIndexBlobReader` and `CachedInvertedIndexBlobReader`.
 - Introduced `InvertedIndexCache` in `CacheManager` and `SstIndexApplier`.
 - Updated `SstIndexApplierBuilder` to accept and utilize `InvertedIndexCache`.
 - Added `From<FileId> for Uuid` implementation.

* feat/inverted-index-cache:
 Update Cargo.toml and refactor SstIndexApplier

 - Moved `uuid.workspace` entry in Cargo.toml for better organization.

* feat/inverted-index-cache:
 Refactor InvertedIndexCache to use type alias for Arc

 - Replaced `Arc<InvertedIndexCache>` with `InvertedIndexCacheRef` type alias.

* feat/inverted-index-cache:
 Add Prometheus metrics and caching improvements for inverted index

 - Introduced `prometheus` and `puffin` dependencies for metrics.

* feat/inverted-index-cache:
 Refactor InvertedIndexReader and Cache handling

 - Simplified `InvertedIndexReader` trait by removing seek-related comments.

* feat/inverted-index-cache:
 Add configurable cache sizes for inverted index metadata and content
 - Introduced `index_metadata_size` and `index_content_size` in `CacheManagerBuilder`.

* feat/inverted-index-cache:
 Refactor and optimize inverted index caching

 - Removed `metrics.rs` and integrated cache metrics into `index.rs`.

* feat/inverted-index-cache:
 Remove unused dependencies from Cargo.lock and Cargo.toml

 - Removed `moka`, `prometheus`, and `puffin` dependencies from both Cargo.lock and Cargo.toml.

* feat/inverted-index-cache:
 Replace Uuid with FileId in CachedInvertedIndexBlobReader

 - Updated `file_id` type from `Uuid` to `FileId` in `CachedInvertedIndexBlobReader` and related methods.

* feat/inverted-index-cache:
 Refactor cache configuration for inverted index

 - Moved `inverted_index_metadata_cache_size` and `inverted_index_cache_size` from `MitoConfig` to `InvertedIndexConfig`.

* feat/inverted-index-cache:
 Remove unnecessary conversion of `file_id` in `SstIndexApplier`

 - Simplified the initialization of `CachedInvertedIndexBlobReader` by removing the redundant `into()` conversion for `file_id`.
2024-07-08 12:36:59 +00:00
Zhenchi
4811fe83f5 fix: test_fulltext_intm_path (#4314)
* fix: test_fulltext_intm_path

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-08 12:34:35 +00:00
Ruihang Xia
96861137b2 fix(ci): remove sqlness state in success (#4313)
* fix(ci): remove sqlness state in success

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix regex

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-08 12:32:36 +00:00
Yohan Wal
8e69543704 feat: support inserting into binary value through string (#4197)
feat: support inserting binary by string
2024-07-08 12:09:30 +00:00
Ruihang Xia
e5730a3745 refactor: split match arms in prom_expr_to_plan into smaller methods (#4317)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-07-08 11:59:59 +00:00
localhost
c0e9b3dbe2 chore: disable TraceLayer on_failure log (#4315) 2024-07-08 10:53:35 +00:00
Yingwen
59afa70311 feat!: Set merge mode while creating table in influx handler (#4299)
* feat: influxdb write auto set merge mode

* chore: update logs

* chore: address PR comments
2024-07-08 04:55:36 +00:00
dennis zhuang
bb32230f00 feat: impl show table status (#4303)
* feat: impl show table status

* chore: style and comment

* test: revert lz4 compression
2024-07-08 03:58:29 +00:00
tison
fe0be1583a build(deps): upgrade opendal to 0.47.3 (#4307)
Signed-off-by: tison <wander4096@gmail.com>
2024-07-08 03:33:38 +00:00
Weny Xu
08c415c729 ci: retry on error or timeout when installing operator (#4308)
chore(ci): retry on error or timeout when installing operator
2024-07-08 03:31:13 +00:00
Weny Xu
58f991b864 fix: deregister failure detector in region migration (#4293)
* fix: deregister failure detector in region migration

* chore: apply suggestions from CR
2024-07-07 06:58:12 +00:00
Zhenchi
a710676d06 feat(fulltext_index): integrate full-text indexer with sst writer (#4302)
* feat(fulltext_index): integrate full-text indexer with sst writer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: delay building puffin writer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: indexer test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: add abort on empty indexer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* config: indicates default mode

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* config: introduce "auto" and "unlimited" as mem threshold

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* doc: comment about push empty string

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-07 04:10:19 +00:00
Eugene Tolbakov
3f4928effc feat(sql): add iso-8601 format support for intervals (#4291)
* feat(sql): add iso-8601 format support for intervals

* fix(sql): address CR feedback

* chore(sql): use regex to check the start of iso 8601
2024-07-05 22:19:05 +00:00
Weny Xu
bc398cf197 feat(remote wal): set default compression to LZ4 (#4294)
* feat(remote wal): set default compression to LZ4

* fix: fix test
2024-07-05 20:40:18 +00:00
discord9
09fff24ac4 feat: make flow distributed work&tests (#4256)
feat: flownode frontend client&test

feat: Frontend Client

feat: set frontend invoker for flownode

feat: set frontend invoker for flownode

chore: test script

WIP: test flow distributed

feat: hard coded demo

docs: flownode example toml

feat: add flownode support in runner

docs: comments for node

chore: after rebase

docs: add a todo

tests: move flow tests to common

fix: flownode sqlness dist test

chore: per review

docs: make

fix: make doc
2024-07-05 14:46:44 +00:00
Weny Xu
30b65ca99e chore: bump OpenDAL to 0.47.2 (#4297)
chore: bump opendal to 0.47.2
2024-07-05 13:54:32 +00:00
Yingwen
b1219fa456 feat: refine scan metrics logging (#4296)
* fix: collect scan cost in row group reader

* feat: remove log after scan

* feat: collect prepare scan cost before fetching readers

* print first poll elapsed

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: print more first poll

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-07-05 08:39:35 +00:00
Weny Xu
4f0984c1d7 chore: remove original region failover implementation (#4237)
chore: remove original region failover implementation
2024-07-05 08:03:46 +00:00
Weny Xu
0b624dc337 ci: retry on error when installing operator (#4295)
chore(ci): retry on error when installing operator
2024-07-05 07:54:31 +00:00
Yingwen
60f599c3ef feat: expose merge_mode option (#4289)
feat: expose merge mode options
2024-07-05 07:40:01 +00:00
Zhenchi
f71b7b997d refactor(inverted_index): integrate puffin manager with sst indexer (#4285)
* refactor(puffin): adjust generic parameters

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor(inverted_index): integrate puffin manager for build

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Revert "refactor(puffin): adjust generic parameters"

This reverts commit 81ea1b6ee4.

* fix: column_ids remove ignore columns

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: remove with_ignore_column_ids

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* docs: add comments for IndexOutput

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* tiny fix

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* config: hide compress

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: index_size > 0 indicates index available

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* perf: reduce `to_string`

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: clippy

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: address comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: address comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-05 07:12:50 +00:00
Weny Xu
8a119aa0b2 feat: add naive region failover test for metric table (#4269)
* feat: add failover test for metric table

* chore: introduce helper macro

* chore: remove incorrect check
2024-07-05 06:54:23 +00:00
dennis zhuang
d2f6daf7b7 fix: prepare inserting with column defaults not working, #4244 (#4272)
* fix: prepare inserting with column defaults not working, #4244

* fix: build column_defaults every time when creating adapters

* feat: cache the column_defaults in table

* test: assert ts column

* fix: unit

* chore: style

Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: typo

* chore: style

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-07-05 06:50:12 +00:00
Ning Sun
d9efa564ee feat: add path prefix label to object storage metrics (#4277)
* feat: add path prefix label to storage metrics

* refactor: return full path when the levels are less than 3

* refactor: align path label name with upstream

* refactor: better implementation of sub path

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-07-05 06:45:47 +00:00
shuiyisong
849e0b9249 feat: delete pipeline (#4156)
* feat: add delete for pipeline

* chore: remove unused code

* refactor: delete pipeline

* chore: add pipeline management api metrics

* chore: minor cr issues

* chore: add unit test

* chore: fix cr issue

* fix: test

* chore: add `GreptimedbManageResponse`

* fix: typo

* fix: typo
2024-07-05 06:23:49 +00:00
discord9
c21e969329 fix: call df_func with literal (#4265)
* fix: call df_func with literal

* chore: rm dbg log that was forgotten
2024-07-05 06:21:22 +00:00
shuiyisong
9393a1c51e fix: align pre-commit config with make file (#4292) 2024-07-05 04:19:57 +00:00
shuiyisong
69bb7ded6a fix: allow strings with spaces in YAML values (#4286)
* fix: allow strings with spaces in YAML values

* fix: typo
2024-07-05 03:39:26 +00:00
Weny Xu
b5c6c72b02 fix: enhance ColumnOption::DefaultValue formatting for string values (#4287) 2024-07-04 13:16:51 +00:00
Ning Sun
8399dcada3 refactor: use rwlock for modifiable session data (#4232)
* chore: update sqlness results

* refactor: use rwlock for modifiable data in session and querycontext

* chore: format toml

* refactor: use mutable_inner structure for mutable fields

* refactor: remove arc wrapper
2024-07-04 12:53:25 +00:00
Zhenchi
6e2c21dd3f refactor(puffin): adjust generic parameters (#4279)
* refactor(puffin): adjust generic parameters

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: remove Box impl

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-04 12:22:04 +00:00
Weny Xu
70f7baffda feat(fuzz): enhance condition check of region migration finish (#4283) 2024-07-04 12:14:52 +00:00
Weny Xu
4ec247f34d feat: store peer info in TableFlowValue (#4280)
* feat: store peer info in `TableFlowValue`

* chore: apply suggestions from CR
2024-07-04 09:37:23 +00:00
Weny Xu
22f4d43b10 fix(fuzz): generate valid string (#4281)
* fix: generate valid string

* refactor(fuzz): wait for procedure finish at first
2024-07-04 08:22:39 +00:00
discord9
d9175213fd chore: add missing s for --metasrv-addr (#4278) 2024-07-04 07:38:00 +00:00
Ruihang Xia
03c933c006 feat: handle AND/OR and priority in matches fn (#4270)
* feat: handle AND/OR and priority in matches fn

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* transform AST

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* handle non-big-write AND & OR

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-04 06:19:03 +00:00
Zhenchi
65c9fbbd2f feat(fulltext_index): integrate puffin manager with inverted index applier (#4266)
* feat(fulltext_index): integrate puffin manager with inverted index applier

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: get rid of unexpected "not found" errors from the write cache

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: move create_dir_all to BoundedStager::new

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: update config.md

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* config: unify directories

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: silent remove

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: config docs

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: auxiliary -> aux

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-04 06:18:58 +00:00
Weny Xu
ee9a5d7611 feat: introduce FlowRouteValue (#4263)
* feat: introduce `FlowRouteKey` and `FlowRouteValue`

* feat: put `FlowRouteValue` values in flow creation

* feat: use `FlowRouteValue`

* refactor: remove `PeerLookupServiceRef` in `DdlContext`

* chore: remove unused code

* Update src/common/meta/src/key.rs

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore: apply suggestions from CR

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-07-03 19:46:16 +00:00
ZonaHe
8e306f3f51 feat: update dashboard to v0.5.3 (#4262)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-07-03 14:45:55 +00:00
Weny Xu
76fac359cd feat: implement naive fuzz test for region migration (#4252)
* fix(fuzz): adapt for new partition rules

* feat: implement naive fuzz test for region migration

* chore(ci): add ci cfg

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-07-03 13:30:41 +00:00
Lei, HUANG
705b22411b fix(puffin): fix dependency (#4267)
fix/puffin-dependency: Update async-compression to 0.4 with features and add features to moka in Cargo.toml
2024-07-03 12:13:26 +00:00
zyy17
c9177cceeb ci: push latest greptimedb nightly build image (#4260) 2024-07-03 11:14:06 +00:00
Jeremyhi
ddf2e6a3c0 feat: provide a simple way to create metaclient (#4257)
* feat: provide a simple way to create metaclient

* chore: minor refactor using metaclient

* chore: minor refactor using metaclient
2024-07-03 08:11:55 +00:00
discord9
967b2cada6 feat!: remove alias metasrv-addr (#4239) 2024-07-03 06:53:43 +00:00
Weny Xu
0f4b9e576d chore(ci): add timeout (60min) for fuzz tests (#4255) 2024-07-03 03:36:43 +00:00
Yohan Wal
c4db9e8aa7 fix!: forbid changing information_schema (#4233)
* fix: forbid changing tables in information_schema

* refactor: use unified read-only check function

* test: add more sqlness tests for information_schema

* refactor: move is_readonly_schema to common_catalog
2024-07-03 03:09:23 +00:00
Ning Sun
11cf9c827e feat: dbeaver mysql compatibility, use statement and information_schema.tables (#4218)
* feat: add more placeholder field in information_schema.tables

* feat: make schema modifiable for use statement

* chore: add todo items

* fix: resolve lint issues after data type changes

* chore: update sqlness results

* refactor: patch for select database is no longer needed

* test: align tests and data types

* Apply suggestions from code review

Co-authored-by: dennis zhuang <killme2008@gmail.com>

* fix: use canonicalize_identifier for database name

* feat: add all columns for information_schema.tables

* test: remove variables from sqlness results

* feat: add to_string impl for table options

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-03 01:31:13 +00:00
Weny Xu
be29e48a60 chore: reduce insertion size of fuzz test (#4243)
* chore: reduce size of fuzz test

* chore: get env cfg variables
2024-07-02 13:02:04 +00:00
Lei, HUANG
226136011e refactor: change InvertedIndexWriter method signature to offsets to f… (#4250)
refactor: change InvertedIndexWriter method signature to offsets to facilitate caching
2024-07-02 12:49:18 +00:00
zyy17
fd4a928521 refactor: add RemoteCompaction error (#4251)
* refactor: make location field public

* refactor: add RemoteCompaction error
2024-07-02 12:33:57 +00:00
zyy17
ef5d1a6a65 ci: update centos yum source and specify cargo-binstall version (#4248)
* ci: use 'vault.centos.org' as default yum for centos:7 image

* ci: fix cargo-binstall version to adapt rust toolchain

* ci: specify cargo-binstall version to adapt current rust toolchain
2024-07-02 11:56:21 +00:00
Zhenchi
e64379d4f7 feat(fulltext_index): introduce creator (#4249)
* feat(fulltext_index): introduce creator

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: typo

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: typo

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: return error if writer not found

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: helper function for tests

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-02 09:06:14 +00:00
zyy17
f2c08b8ddd feat: introduce the interface of RemoteJobScheduler (#4181)
* refactor: add Compactor trait

* chore: add compact() in Compactor trait and expose compaction module

* refactor: add CompactionRequest and open_compaction_region

* refactor: export the compaction api

* refactor: add DefaultCompactor::new_from_request

* refactor: no need to pass mito_config in open_compaction_region()

* refactor: CompactionRequest -> &CompactionRequest

* fix: typo

* docs: add docs for public apis

* refactor: remove 'Picker' from Compactor

* chore: add logs

* chore: change pub attribute for Picker

* refactor: remove do_merge_ssts()

* refactor: update comments

* refactor: use CompactionRegion argument in Picker

* chore: make compaction module public and remove unnecessary clone

* refactor: move build_compaction_task() in CompactionScheduler{}

* chore: use  in open_compaction_region() and add some comments for public structure

* refactor: add 'manifest_dir()' in store-api

* refactor: move the default implementation to DefaultCompactor

* refactor: remove Options from MergeOutput

* chore: minor modification

* fix: clippy errors

* fix: unit test errors

* refactor: remove 'manifest_dir()' from store-api crate(already have one in opener)

* refactor: use 'region_dir' in CompactionRequest

* refactor: refine naming

* refactor: refine naming

* refactor: remove clone()

* chore: add comments

* refactor: add PickerOutput field in CompactorRequest

* feat: introduce RemoteJobScheduler

* feat: add RemoteJobScheduler in schedule_compaction_request()

* refactor: use Option type for senders field of CompactionFinished

* refactor: modify CompactionJob

* refactor: schedule remote compaction job by options

* refactor: remove unused Options

* build: remove unused log

* refactor: fallback to local compaction if the remote compaction failed

* fix: clippy errors

* refactor: add plugins in mito2

* refactor: add from_u64() for JobId

* refactor: make schedule module public

* refactor: add error for RemoteJobScheduler

* refactor: add Notifier

* refactor: use Arc for Notifier

* refactor: add 'remote_compaction' in compaction options

* fix: clippy errors

* fix: unrecognized table option

* refactor: add 'start_time' in CompactionJob

* refactor: modify error type of RemoteJobScheduler

* chore: revert changes for request

* refactor: code refactor by review comment

* refactor: use string type for JobId

* refactor: add 'waiters' field in DefaultNotifier

* fix: build error

* refactor: take coderabbit's review comment

* refactor: use uuid::Uuid as JobId

* refactor: return waiters when schedule failed and add on_failure for DefaultNotifier

* refactor: move waiters from notifier to Job

* refactor: use ObjectStoreManagerRef in open_compaction_region()

* refactor: implement  for JobId and adds related unit tests

* fix: run unit tests failed

* refactor: add RemoteJobSchedulerError
2024-07-02 07:08:43 +00:00
Zhenchi
db5d1162f0 feat(puffin): complete dir support (#4240)
* feat(puffin): implement CachedPuffinReader

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: next PR to introduce CachedPuffinManager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: rename

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): implement MokaCacheManager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: clippy

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: +1s

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): implement CachedPuffinManager and add tests

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: corner case to get a blob

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: keep dir in use

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: add more tests

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: add doc comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: toml format

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: rename unreleased_dirs

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: refine some comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: handle more corner cases

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: refine

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: simplify

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: more explanation

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: comment compressed

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: fmt

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: address comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: Cached* -> Fs*

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: CacheManager -> Stager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: rename Puffin(A)sync* -> (A)sync*

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: fmt

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-02 05:43:06 +00:00
tison
ea081c95bf chore: add AUTHOR.md file (#4241)
Signed-off-by: tison <wander4096@gmail.com>
2024-07-02 01:07:13 +00:00
LFC
6276e006b9 refactor: add interceptor after Influxdb lines are converted to grpc row insert (#4225)
* fix: make Influxdb lines able to be inserted into last created tables

* Update src/servers/src/influxdb.rs

* add an option to control the time index alignment behavior

* fix ci

* refactor: use interceptor to handle timestamp align

* Apply suggestions from code review

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Co-authored-by: tison <wander4096@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-01 22:28:00 +00:00
tison
2665616f72 build(deps): Upgrade OpenDAL to 0.47 (#4224)
* catch up changes

Signed-off-by: tison <wander4096@gmail.com>

* fmt

Signed-off-by: tison <wander4096@gmail.com>

* Fix cache for 0471 (#7)

* Fix cache for 0471

Signed-off-by: Xuanwo <github@xuanwo.io>

* Make clippy happy

Signed-off-by: Xuanwo <github@xuanwo.io>

---------

Signed-off-by: Xuanwo <github@xuanwo.io>

* tidy

Signed-off-by: tison <wander4096@gmail.com>

* use opendal's exported type

Signed-off-by: tison <wander4096@gmail.com>

* clippy

Signed-off-by: tison <wander4096@gmail.com>

* fmt

Signed-off-by: tison <wander4096@gmail.com>

---------

Signed-off-by: tison <wander4096@gmail.com>
Signed-off-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Xuanwo <github@xuanwo.io>
2024-07-01 17:05:15 +00:00
zyy17
e5313260d0 refactor: use ObjectStoreManagerRef type in open_compaction_region() and add related unit test (#4238) 2024-07-01 13:10:50 +00:00
Zhenchi
b69b24a237 feat(puffin): implement MokaCacheManager (#4211)
* feat(puffin): implement MokaCacheManager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: clippy

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: +1s

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: corner case to get a blob

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: keep dir in use

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: add more tests

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: add doc comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: toml format

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: rename unreleased_dirs

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: refine some comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: handle more corner cases

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: refine

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: simplify

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: more explanation

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: use recycle bin

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: remove instead

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: address comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: remove unnecessary removing

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-01 13:10:13 +00:00
discord9
f035a7c79c feat: flow cli for distributed (#4226)
* feat(WIP): add FlownodeInstance for flow cli

* feat(WIP): cli

* feat: add merge opts func

* refactor: move server&error to src dir

* feat: flownode cli build

* feat: add `flownode` subcmd to cli

* refactor: per review

* refactor!: BREAKING remove alias `metasrv-addr`

* chore: after rebase

* feat: cache invalidation for flownode cache

* chore: small refactor per review

* chore: fix a typo

* feat!: revert breaking change

* chore: per review

* refactor: not accept `metasrv-addr` only for flownode
2024-07-01 09:56:15 +00:00
Ruihang Xia
a4e99f5666 feat: basic implement of matches fn (#4222)
* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* handle error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/common/function/src/scalars/matches.rs

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Update src/common/function/src/scalars/matches.rs

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* revert typo fix

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* ignore typo unqualifed

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* enhance grammar restrictions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Apply suggestions from code review

Co-authored-by: Yingwen <realevenyag@gmail.com>

* todo about tokenizer

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* reverse order

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rewrite escape_pattern

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-07-01 07:59:59 +00:00
Jeremyhi
5d396bd6d7 feat: forgot collect flownode clusterinfo handler (#4236)
* feat: forgot collect flownode clusterinfo handler

* fix: unit test

* fix: filter stale heartbeat
2024-07-01 06:42:31 +00:00
zyy17
fe2c5c3735 refactor: expose DatanodeBuilder::build_object_store_manager() and MitoConfig::sanitize() (#4212)
* refactor: expose DatanodeBuilder::build_object_store_manager()

* refactor: expose MitoConfig::sanitize()
2024-07-01 06:36:32 +00:00
Weny Xu
6a634f8e5d feat: register & deregister region failure detectors actively (#4223)
* feat: Use DATANODE_LEASE_SECS from distributed_time_constants for heartbeat pause duration

* feat: introduce `RegionFailureDetectorController` to manage region failure detectors

* feat: add `RegionFailureDetectorController` to `DdlContext`

* feat: add `region_failure_detector_controller` to `Context` in region migration

* feat: register region failure detectors during rollback region migration procedure

* feat: deregister region failure detectors during drop table procedure

* feat: register region failure detectors during create table procedure

* fix: update meta config

* chore: apply suggestions from CR

* chore: avoid cloning

* chore: rename

* chore: reduce the size of the test

* chore: apply suggestions from CR

* chore: move channel initialization into `RegionSupervisor::channel`

* chore: minor refactor

* chore: rename ident
2024-07-01 05:58:27 +00:00
Jeremyhi
214fd38f69 feat: add build info for flow heartbeat task (#4228)
* chore: refactor load region stats

* feat: add build info for flow heartbeat
2024-07-01 03:19:25 +00:00
zyy17
ddc7a80f56 fix: add serialize_ignore_column_ids() to fix deserialize region options failed from json string (#4229)
* fix: add serialize_ignore_column_ids() to fix deserialize region options failed from json string

* refactor: return empty vector if column_id is empty
2024-06-30 09:59:14 +00:00
Ruihang Xia
a7aa556763 feat: output multiple partition in MergeScanExec (#4227)
* feat: output multiple partition in MergeScanExec

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix range manipulate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-28 13:45:22 +00:00
Lei, HUANG
ef935a1de6 feat!: reduce sorted runs during compaction (#3702)
* feat: add functions to find and merge sorted runs

* chore: refactor code

* chore: remove some duplicates

* chore: remove one clone

* refactor: change max_active_window_files to max_active_window_runs

* feat: integrate with sorted runs

* fix: unit tests

* feat: limit num of sorted runs during compaction

* fix: some test

* fix: some cr comments

* feat: use smallvec

* chore: rebase main

* feat/reduce-sorted-runs:
 Refactor compaction logic and update test configurations

 - Refactored `merge_all_runs` function to use `sort_ranged_items` for sorting.
 - Improved item merging logic by iterating with `into_iter` and handling overlaps.
 - Updated test configurations to use `max_active_window_runs` instead of `max_active_window_files` for consistency.

---------

Co-authored-by: tison <wander4096@gmail.com>
2024-06-28 08:17:30 +00:00
Weny Xu
352cc9ddde test: add e2e test for region failover (#4188)
* test: add e2e test for region failover

* chore: add ci cfg

* chore: reduce parallelism to 8

* fix(ci): enable region failure

* chore: set sqlx LogLevel to Off

* refactor: move help functions to utils
2024-06-28 06:49:41 +00:00
discord9
b6585e3581 refactor(flow): make from_substrait_* async& worker handle refactor (#4210)
* refactor: use oneshot to receive result

* refactor: make from_substrait_* async

* refactor: remove serde for plan & expr
2024-06-27 17:17:46 +00:00
Yingwen
10b7a3d24d feat: Implements merge_mode region options (#4208)
* feat: add update_mode to region options

* test: add test

* feat: last not null iter

* feat: time series last not null

* feat: partition tree update mode

* feat: partition tree

* fix: last not null iter slice

* test: add test for compaction

* test: use second resolution

* style: fix clippy

* chore: merge two lines

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

* chore: address CR comments

* refactor: UpdateMode -> MergeMode

* refactor: LastNotNull -> LastNonNull

* chore: return None earlier

* feat: validate region options

make merge mode optional and use default while it is None

* test: fix tests

---------

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
2024-06-27 07:52:58 +00:00
Eugene Tolbakov
8702066967 feat(sql): add casting support for shortened intervals (#4220)
* feat(sql): add casting support for shortened intervals

* chore(sql): apply CR suggestion, minor renamings
2024-06-26 22:07:09 +00:00
Jeremyhi
df0fff2f2c feat(servers): make http timeout and body limit optional (#4217)
* feat(servers): make http timeout and body limit optional

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: make config-docs

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-26 06:14:14 +00:00
dennis zhuang
a779cb36ec fix: wrong frontend registration address (#4199)
* fix: frontend registration address is wrong, #4186

* fix: license header

* chore: adds hostname to frontend grpc

* fix: forgot to run make config-docs

* chore: warn when using bind_addr

* fix: flow node heartbeat carrying address
2024-06-26 06:13:07 +00:00
zyy17
948c8695d0 refactor: add SerializedPickerOutput and field modification of CompactorRequest (#4198)
* refactor: remove compaction_options and use RegionOptions type for region_options

* refactor: add file_purger field in CompactionRegion

* refactor: add SerializedPickerOutput

* refactor: rename CompactorRequest to OpenCompactionRegionRequest and remove PickerOutput

* refactor: use &PickerOutput instead of clone()
2024-06-25 13:04:07 +00:00
Ruihang Xia
4d4a6cd265 feat: validate partition rule on create table (#4213)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-25 12:55:01 +00:00
Zhenchi
5dde148b3d feat(puffin): implement CachedPuffinReader (#4209)
* feat(puffin): implement CachedPuffinReader

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: next PR to introduce CachedPuffinManager

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: rename

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-06-25 12:27:06 +00:00
Weny Xu
8cbe7166b0 refactor: migrate region failover implementation to region migration (#4172)
* refactor: migrate region failover implementation to region migration

* fix: use HEARTBEAT_INTERVAL_MILLIS as lease secs

* fix: return false if leader is downgraded

* fix: only remove failure detector after submitting procedure successfully

* feat: ignore dropped region

* refactor: retrieve table routes in batches

* refactor: disable region failover on local WAL implementation

* fix: move the guard into procedure

* feat: use real peer addr

* feat: use interval instead of sleep

* chore: rename `HeartbeatSender` to `HeartbeatAcceptor`

* chore: apply suggestions from CR

* chore: reduce duplicate code

* chore: apply suggestions from CR

* feat: lookup peer addr

* chore: add comments

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-06-25 11:58:17 +00:00
Yingwen
f5ac158605 docs: remove outdated docs (#4205)
* docs: remove outdated docs

* ci: align ci

* chore: Revert "ci: align ci"

This reverts commit 2c3c0eed7e.

* ci: fix docs ci
2024-06-25 09:46:30 +00:00
Lei, HUANG
120447779c feat: bulk memtable codec (#4163)
* feat: introduce bulk memtable encoder/decoder

* chore: rebase main

* chore: resolve some comments

* refactor: only carries time unit in ArraysSorter

* fix: some comments
2024-06-25 09:02:20 +00:00
discord9
82f6373574 feat: FlownodeClient (#4206)
* feat: FlownodeClient

* chore: remove wrong doc

* fix: debug impl for NodeClients

* chore: rename `FlownodeClient` to `FlowRequester`
2024-06-25 08:40:24 +00:00
Zhenchi
1e815dddf1 feat(puffin): implement CachedPuffinWriter (#4203)
* feat(puffin): support lz4 compression for footer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): introduce puffin manager trait

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): implement CachedPuffinWriter

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-06-25 08:00:48 +00:00
Weny Xu
b2f61aa1cf fix: format error correctly (#4204)
* chore: remove TODO comments

* fix: format error correctly
2024-06-25 07:56:13 +00:00
Ruihang Xia
a1e2612bbf fix: align workflows again for the troublesome GHA (#4196)
* fix: align workflows again for the troublesome GHA

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* unify

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-25 06:52:01 +00:00
Yingwen
9aaf7d79bf feat: Dedup strategy that keeps the last not null field (#4184)
* feat: dedup strategy: last not null

* fix: fix tests

* fix: fix single batch

* chore: warning

* chore: skip has_null check

* refactor: rename fields

* fix: merge last fields may not reset builder

* chore: clear before filter deleted

* chore: remove debug logs

* chore: Update comment

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-25 06:38:48 +00:00
Yingwen
4a4237115a test: wait until checkpoint finish (#4202) 2024-06-25 06:21:19 +00:00
Eugene Tolbakov
840f81e0fd fix(sql): improve compound signed number processing (#4200) 2024-06-25 04:01:46 +00:00
Eugene Tolbakov
cdd4baf183 feat(sql): improve interval expression, support shortened version (#4182)
* feat(sql): improve interval expression, support shortened version

* fix(sql): remove accidental change of sqlness assertion

* fix(sql): address CR feedback, add more tests

* chore(sql): add more tests
2024-06-24 20:29:34 +00:00
Zhenchi
4b42c7b840 feat(puffin): introduce puffin manager trait (#4195)
* feat(puffin): support lz4 compression for footer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): introduce puffin manager trait

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-06-24 16:02:52 +00:00
discord9
a44fe627ce feat: heartbeat task&peer lookup in proc (#4179)
* feat: herat beat task

* feat: use real flow peer allocator when building

* feat: add peer look up in ddl context

* fix: drop flow test

* refactor: per review(WIP)

* refactor: not check if is alive

* refactor: per review

* refactor: remove useless `reset`

* refactor: per bot advices

* refactor: alive peer

* chore: bot review
2024-06-24 15:06:33 +00:00
taobo
77904adaaf fix: region_peers returns same region_id for multiple logical tables (#4190)
* fix: `region_peers` returns same region_id for multiple logical tables

* test: add sqlness test for information_schema.region_peers

* refactor: region_peers sqlness
2024-06-24 14:12:36 +00:00
Zhenchi
07cbabab7b feat(puffin): support lz4 compression for footer (#4194)
* feat(puffin): support lz4 compression for footer

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-06-24 11:39:03 +00:00
zyy17
ea7c17089f refactor: add region_dir in CompactionRegion (#4187) 2024-06-24 08:25:52 +00:00
discord9
517917453d fix(flow): fix call df func bug&sqlness test (#4165)
* tests: flow sqlness tests

* tests: WIP df func test

* fix: use schema before expand for transform expr

* tests: some basic flow tests

* tests: unit test

* chore: dep use rev not patch

* fix: weird sqlness error?

* refactor: per review

* fix: temp sqlness bug

* fix: use fixed sqlness

* fix: impl drop as async shutdown

* refactor: per bot's review

* tests: drop worker handler both sync/async

* docs: add rationale for test

* refactor: per review

* chore: fmt
2024-06-24 07:52:45 +00:00
zyy17
0139a70549 refactor: make RegionOptions and MergeOutput serializable (#4180)
* chore: make RegionOptions serializable and add region_dir in CompactionRegion

* refactor: make `PickerOutput` and `MergeOutput` serializable and deserializable

* refactor: remove Serialize and Deserialize from PickerOutput

* chore: revert changes for file.rs

* chore: revert changes for compactor.rs and compaction.rs

---------

Co-authored-by: tison <wander4096@gmail.com>
2024-06-24 07:37:53 +00:00
tison
5566dd72f2 chore: highlight our committers in CONTRIBUTING.md (#4189)
* chore: highlight our committers in CONTRIBUTING.md

Signed-off-by: tison <wander4096@gmail.com>

* chore: apply suggestion

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* chore: apply suggestion

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* Trigger CI

Signed-off-by: tison <wander4096@gmail.com>

---------

Signed-off-by: tison <wander4096@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2024-06-22 08:23:30 +00:00
ZonaHe
dea33a7aaf feat: update dashboard to v0.5.2 (#4185)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-06-21 14:32:46 +00:00
Weny Xu
15ad9f2f6f fix: region logical regions after catching up (#4176)
* fix: region logical regions after catching up

* test: add metric table migration test

* chore: apply suggestions from CR
2024-06-21 10:30:18 +00:00
Ruihang Xia
fce65c97e3 feat: make RegionScanner aware of PartitionRange (#4170)
* define PartitionRange

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add optimizer rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* implement interfaces

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl aggr stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add fallback method

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add document and rename struct

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add more comments

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-21 09:54:22 +00:00
Ruihang Xia
ac574b66ab feat: add num_rows and num_row_groups to manifest (#4183)
* feat: add num_rows and num_row_groups to manifest

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add document

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-21 07:15:13 +00:00
Weny Xu
1e52ba325f feat: introduce chaos crds (#4173)
feat: add chaos-mesh crds
2024-06-21 06:28:51 +00:00
Yohan Wal
b739c9fd10 feat: PREPARE and EXECUTE statement from mysql client (#4125)
* feat: prepare stmt in mysql client

* feat: execute stmt in mysql client

* fix: handle parameters properly

* refactor: use existing funcs to convert expr to scalar value

* refactor: use uuid strings as stmt_key for queries from COM_PREPARE packet

* refactor: take prepare and execute parser as submodule

* test: add unit test for converting expr to scalar value

* feat: deallocate stmt in mysql client

* chore: comments and duplicates

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-06-21 02:02:57 +00:00
Ruihang Xia
21c89f3247 perf: optimize RecordBatch to HttpOutput conversion (#4178)
* add benchmark

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* save 70ms

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add profiler

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* save 50ms

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* save 160ms

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* format toml file

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix license header

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix windows build

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-20 12:33:58 +00:00
Jeremyhi
5bcd7a14bb feat: use the write runtime to handle the heartbeats (#4177) 2024-06-20 08:45:07 +00:00
dennis zhuang
4306cba866 feat: show database options (#4174)
* test: test create table with database ttl

* feat: show database options

* fix: comment

* chore: apply suggestion

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

* chore: fix CR comments and refactor

* chore: style

Co-authored-by: Weny Xu <wenymedia@gmail.com>

---------

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-06-20 04:21:58 +00:00
Jeremyhi
4c3d4af127 feat: register flow node (#4166)
* feat: rename keys.rs to key.rs

* feat: refactor datanode keys

* feat: add flownode key

* feat: keep flownode's lease info in metasrv

* feat: flow selector

* feat: impl_try_from_lease_key and impl_from_str_lease_key to simplify code
2024-06-20 03:46:19 +00:00
localhost
48a0f39b19 chore: enhance the add-pipeline HTTP API return data (#4167)
* chore: enhance the add-pipeline HTTP API return data

* chore: replacing hard-coded header value
2024-06-20 02:19:31 +00:00
ZonaHe
8abebad458 feat: update dashboard to v0.5.1 (#4171)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-06-20 01:46:54 +00:00
LFC
cc2f7efb98 chore: bump datafusion version to fix last_value regression (#4169)
* chore: bump datafusion version to fix `last_value` regression

* fix: resolve PR comments

* fix ci
2024-06-19 07:47:17 +00:00
taobo
22d12683b4 refactor!: unify FrontendOptions and DatanodeOptions by using GrpcOptions (#4088)
* refactor: move GrpcOptions to servers/grpc

* fix: optimize code

* fix: docs

* refactor: move DatanodeOptions.rpc_hostname to grpc.hostname

* fix: merge main

* refactor code impl

test: add test_depreacted_cli_options unit test

* Update src/servers/src/grpc.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-06-18 22:45:38 +00:00
Yingwen
fe74efdafe feat: Implement memtable range (#4162)
* refactor: RangeBase

* feat: memtable range

* feat: scanner use mem range

* feat: remove base from mem range context

* feat: impl ranges for memtables

* chore: fix warnings

* refactor: make predicate cheap to clone

* refactor: MemRange -> MemtableRange

* feat: pub empty memtable to fix warnings

* test: fix sqlness result
2024-06-18 22:25:19 +00:00
discord9
cd9705ccd7 feat(flow): support datafusion scalar function (#4142)
* chore: call df function types

* feat: RelationDesc to DfSchema

* refactor: use RelationDesc instead of Type

* chore: WIP get to phy expr

* feat: custom deserialize

* chore: fmt

* refactor: renaming to DfScalarFunction

* feat: eval df func(untested)

* fix: had to spawn a thread for calling async

* chore: per review advices

* tests: test df scalar function
2024-06-18 12:34:38 +00:00
Weny Xu
ea2d067cf1 feat: implement the OrderedBatchProducer (#4134)
* feat: implement the `OrderedBatchProducer`

* test: add test of cancel safety

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* refactor: simplify the `BackgroundProducerWorker`

* feat: implement the OrderedBatchProducer v2

* refactor: switch to `OrderedBatchProducer`

* chore: rename to `MAX_FLUSH_QUEUE_SIZE`

* refactor: switch to `OrderedBatchProducerV2`

* refactor: remove `OrderedBatchProducerV1`

* test: add tests

* refactor: make config configurable

* refactor: minor refactor

* chore: remove unused code

* chore: remove `benchmarks` crate

* chore: update config doc

* chore: remove unused comment

* refactor: refactor client registry

* refactor: rename `max_batch_size` to `max_batch_bytes`

* chore: use constant value

* chore: ensure serialized meta < ESTIMATED_META_SIZE

* chore: apply suggestions from CR

* chore: remove the `CHANNEL_SIZE`

* chore: apply suggestions from CR

* fix: ensure serialized meta < ESTIMATED_META_SIZE

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-06-18 07:20:01 +00:00
Ning Sun
70d113a355 feat: update default size of bgworkers, add hbworkers (#4129)
* feat: update default size of bgworkers, add hbworkers

* feat: update frontend heartbeat as well

* chore: update sample config files and default settings

* chore: update config docs

* Revert "chore: update config docs"

This reverts commit 8107f4c120.

* Revert "chore: update sample config files and default settings"

This reverts commit f5ae701c8d.

* feat: use default heartbeat runtime size

* chore: update config docs
2024-06-18 06:18:37 +00:00
shuiyisong
cb657ae51e feat(pipeline): join processor (#4158)
* feat: add join processor

* test: add join simple test

* chore: fix header

* chore: update commit

Co-authored-by: dennis zhuang <killme2008@gmail.com>

* test: add more join test

* chore: fix lint

* chore: update comment

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-06-18 05:00:34 +00:00
Jeremyhi
141d017576 feat: enable metasrv to accept flownode's heartbeats (#4160)
* feat: make metasrv accept flownode's heartbeats

* chore: proto
2024-06-18 04:07:46 +00:00
yuanbohan
0fc18b6865 feat(pipeline): gsub processor (#4121)
* chore: add log http ingester scaffold

* chore: add some example code

* chore: add log inserter

* chore: add log handler file

* chore: add pipeline lib

* chore: import log handler

* chore: add pipeline http handler

* chore: add pipeline private table

* chore: add pipeline API

* chore: improve error handling

* chore: merge main

* chore: add multi content type support for log handler

* refactor: remove servers dep on pipeline

* refactor: move define_into_tonic_status to common-error

* refactor: bring in pipeline 3eb890c551b8d7f60c4491fcfec18966e2b210a4

* chore: fix typo

* refactor: bring in pipeline a95c9767d7056ab01dd8ca5fa1214456c6ffc72c

* chore: fix typo and license header

* refactor: move http event handler to a separate file

* chore: add test for pipeline

* chore: fmt

* refactor: bring in pipeline 7d2402701877901871dd1294a65ac937605a6a93

* refactor: move `pipeline_operator` to `pipeline` crate

* chore: minor update

* refactor: bring in pipeline 1711f4d46687bada72426d88cda417899e0ae3a4

* chore: add log

* chore: add log

* chore: remove open hook

* chore: minor update

* chore: fix fmt

* chore: minor update

* chore: rename desc for pipeline table

* refactor: remove updated_at in pipelines

* chore: add more content type support for log inserter api

* chore: introduce pipeline crate

* chore: update upload pipeline api

* chore: fix by pr commit

* chore: add some doc for pub fn/struct

* chore: some minor fixes

* chore: add pipeline version support

* chore: impl log pipeline version

* gsub processor

* chore: add test

* chore: update commit

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Co-authored-by: paomian <xpaomian@gmail.com>
Co-authored-by: shuiyisong <xixing.sys@gmail.com>
Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-06-17 07:57:47 +00:00
yuanbohan
0aceebf0a3 feat(pipeline): transform support on_failure (#4123)
* chore: add log http ingester scaffold

* chore: add some example code

* chore: add log inserter

* chore: add log handler file

* chore: add pipeline lib

* chore: import log handler

* chore: add pipeline http handler

* chore: add pipeline private table

* chore: add pipeline API

* chore: improve error handling

* chore: merge main

* chore: add multi content type support for log handler

* refactor: remove servers dep on pipeline

* refactor: move define_into_tonic_status to common-error

* refactor: bring in pipeline 3eb890c551b8d7f60c4491fcfec18966e2b210a4

* chore: fix typo

* refactor: bring in pipeline a95c9767d7056ab01dd8ca5fa1214456c6ffc72c

* chore: fix typo and license header

* refactor: move http event handler to a separate file

* chore: add test for pipeline

* chore: fmt

* refactor: bring in pipeline 7d2402701877901871dd1294a65ac937605a6a93

* refactor: move `pipeline_operator` to `pipeline` crate

* chore: minor update

* refactor: bring in pipeline 1711f4d46687bada72426d88cda417899e0ae3a4

* chore: add log

* chore: add log

* chore: remove open hook

* chore: minor update

* chore: fix fmt

* chore: minor update

* chore: rename desc for pipeline table

* refactor: remove updated_at in pipelines

* chore: add more content type support for log inserter api

* chore: introduce pipeline crate

* chore: update upload pipeline api

* chore: fix by pr commit

* chore: add some doc for pub fn/struct

* chore: some minor fixes

* chore: add pipeline version support

* chore: impl log pipeline version

* transform on_failure

* chore: add test

* chore: move test to a separate file

* chore: add comment

---------

Co-authored-by: paomian <xpaomian@gmail.com>
Co-authored-by: shuiyisong <xixing.sys@gmail.com>
2024-06-17 06:56:31 +00:00
Yingwen
558272de61 refactor: Decouple dedup and merge (#4139)
* feat: remove dedup/filter deleted from merge reader

* feat: impl dedup reader

* feat: support filter deleted flag

* test: test dedup reader

* feat: remove put_only field

* chore: fix clippy

* feat: metrics

* test: test empty batch

* perf: optimize dedup strategy

Avoid iterating all timestamps.

* test: fix test

* feat: generic
2024-06-17 04:09:50 +00:00
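To make the refactor above easier to follow, here is a small, self-contained Rust sketch of the dedup-then-filter strategy the commits describe: keep one row per (key, timestamp) and drop delete tombstones afterwards. The types and function below are hypothetical simplifications, not the actual mito2 dedup reader.

// Hypothetical, simplified model of a row in a sorted scan.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: u64,
    ts: i64,
    sequence: u64, // higher sequence means a newer write
    is_delete: bool,
    value: f64,
}

// Dedup rows so the newest write per (key, ts) wins, then filter delete markers.
fn dedup_and_filter(mut rows: Vec<Row>) -> Vec<Row> {
    // Sort so the highest sequence for each (key, ts) comes first.
    rows.sort_by_key(|r| (r.key, r.ts, std::cmp::Reverse(r.sequence)));
    let mut out = Vec::with_capacity(rows.len());
    let mut last: Option<(u64, i64)> = None;
    for row in rows {
        if last == Some((row.key, row.ts)) {
            continue; // duplicate of an already-chosen (key, ts)
        }
        last = Some((row.key, row.ts));
        if !row.is_delete {
            out.push(row); // tombstones are dropped after dedup picks the winner
        }
    }
    out
}

fn main() {
    let rows = vec![
        Row { key: 1, ts: 100, sequence: 1, is_delete: false, value: 1.0 },
        Row { key: 1, ts: 100, sequence: 3, is_delete: false, value: 2.0 },
        Row { key: 1, ts: 200, sequence: 2, is_delete: true, value: 0.0 },
    ];
    let deduped = dedup_and_filter(rows);
    assert_eq!(deduped.len(), 1);
    assert_eq!(deduped[0].value, 2.0);
}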
LFC
f4a5a44549 refactor: make region manifest checkpoint run in background (#4133)
* refactor: use in-database manifest as checkpoint instead of merging incremental files in object store

* refactor: make region manifest checkpoint run in background

* reduce unnecessary metrics

* Update src/mito2/src/manifest/checkpointer.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

* resolve PR comments

* resolve PR comments

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-06-17 03:47:18 +00:00
zyy17
5390603855 refactor: add Compactor trait to abstract the compaction (#4097)
* refactor: add Compactor trait

* chore: add compact() in Compactor trait and expose compaction module

* refactor: add CompactionRequest and open_compaction_region

* refactor: export the compaction api

* refactor: add DefaultCompactor::new_from_request

* refactor: no need to pass mito_config in open_compaction_region()

* refactor: CompactionRequest -> &CompactionRequest

* fix: typo

* docs: add docs for public apis

* refactor: remove 'Picker' from Compactor

* chore: add logs

* chore: change pub attribute for Picker

* refactor: remove do_merge_ssts()

* refactor: update comments

* refactor: use CompactionRegion argument in Picker

* chore: make compaction module public and remove unnecessary clone

* refactor: move build_compaction_task() in CompactionScheduler{}

* chore: use  in open_compaction_region() and add some comments for public structure

* refactor: add 'manifest_dir()' in store-api

* refactor: move the default implementation to DefaultCompactor

* refactor: remove Options from MergeOutput

* chore: minor modification

* fix: clippy errors

* fix: unit test errors

* refactor: remove 'manifest_dir()' from store-api crate (already have one in opener)

* refactor: use 'region_dir' in CompactionRequest

* refactor: refine naming

* refactor: refine naming

* refactor: remove clone()

* chore: add comments

* refactor: add PickerOutput field in CompactorRequest
2024-06-17 03:03:47 +00:00
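The commit list above introduces a Compactor trait without showing it, so the following is a hedged, self-contained Rust sketch of what such an abstraction can look like. The names mirror the commit messages (Compactor, CompactionRequest, DefaultCompactor, MergeOutput), but the bodies are hypothetical simplifications rather than the actual mito2 code.

// Hypothetical, simplified stand-ins for the types named in the commits above.
#[derive(Debug)]
struct FileMeta {
    size_bytes: u64,
}

struct CompactionRequest {
    region_dir: String,
    inputs: Vec<FileMeta>,
}

#[derive(Debug)]
struct MergeOutput {
    merged_files: Vec<FileMeta>,
    input_count: usize,
}

// The abstraction: given a request describing a region's SSTs, produce merged outputs.
trait Compactor {
    fn compact(&self, req: &CompactionRequest) -> MergeOutput;
}

// A default strategy: merge everything into a single output file.
struct DefaultCompactor;

impl Compactor for DefaultCompactor {
    fn compact(&self, req: &CompactionRequest) -> MergeOutput {
        let total: u64 = req.inputs.iter().map(|f| f.size_bytes).sum();
        MergeOutput {
            merged_files: vec![FileMeta { size_bytes: total }],
            input_count: req.inputs.len(),
        }
    }
}

fn main() {
    let req = CompactionRequest {
        region_dir: "data/region_42".to_string(),
        inputs: vec![FileMeta { size_bytes: 4 << 20 }, FileMeta { size_bytes: 8 << 20 }],
    };
    let out = DefaultCompactor.compact(&req);
    println!("compacted {} files in {}: {:?}", out.input_count, req.region_dir, out.merged_files);
}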
Ruihang Xia
a2e3532a57 docs: add guide for tsbs benchmark (#4151)
* docs: add guide for tsbs benchmark

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-16 16:02:15 +08:00
Ruihang Xia
2faa6d6c97 ci: align docs with develop (#4152)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-06-15 16:33:57 +00:00
Weny Xu
d6392acd65 fix(sqlness): catch different format timestamp (#4149) 2024-06-15 11:13:01 +00:00
localhost
01e3a24cf7 feat: log ingestion support (#4014)
* chore: add log http ingester scaffold

* chore: add some example code

* chore: add log inserter

* chore: add log handler file

* chore: add pipeline lib

* chore: import log handler

* chore: add pipeline http handler

* chore: add pipeline private table

* chore: add pipeline API

* chore: improve error handling

* chore: merge main

* chore: add multi content type support for log handler

* refactor: remove servers dep on pipeline

* refactor: move define_into_tonic_status to common-error

* refactor: bring in pipeline 3eb890c551b8d7f60c4491fcfec18966e2b210a4

* chore: fix typo

* refactor: bring in pipeline a95c9767d7056ab01dd8ca5fa1214456c6ffc72c

* chore: fix typo and license header

* refactor: move http event handler to a separate file

* chore: add test for pipeline

* chore: fmt

* refactor: bring in pipeline 7d2402701877901871dd1294a65ac937605a6a93

* refactor: move `pipeline_operator` to `pipeline` crate

* chore: minor update

* refactor: bring in pipeline 1711f4d46687bada72426d88cda417899e0ae3a4

* chore: add log

* chore: add log

* chore: remove open hook

* chore: minor update

* chore: fix fmt

* chore: minor update

* chore: rename desc for pipeline table

* refactor: remove updated_at in pipelines

* chore: add more content type support for log inserter api

* chore: introduce pipeline crate

* chore: update upload pipeline api

* chore: fix by pr commit

* chore: add some doc for pub fn/struct

* chore: some minor fixes

* chore: add pipeline version support

* chore: impl log pipeline version

* chore: fix format issue

* fix: make the LogicalPlan of a query pipeline sorted in desc order

* chore: remove some debug log

* chore: replacing hashmap cache with moka

* chore: fix by pr commit

* chore: fix toml format issue

* chore: update Cargo.lock

* chore: fix by pr commit

* chore: fix some issue by pr commit

* chore: add more doc for pipeline version

---------

Co-authored-by: shuiyisong <xixing.sys@gmail.com>
2024-06-14 17:03:30 +00:00
608 changed files with 33899 additions and 11745 deletions


@@ -28,3 +28,8 @@ GT_MYSQL_ADDR = localhost:4002
# Setting for unstable fuzz tests
GT_FUZZ_BINARY_PATH=/path/to/
GT_FUZZ_INSTANCE_ROOT_DIR=/tmp/unstable_greptime
GT_FUZZ_INPUT_MAX_ROWS=2048
GT_FUZZ_INPUT_MAX_TABLES=32
GT_FUZZ_INPUT_MAX_COLUMNS=32
GT_FUZZ_INPUT_MAX_ALTER_ACTIONS=256
GT_FUZZ_INPUT_MAX_INSERT_ACTIONS=8
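As a hedged illustration of how limits like the GT_FUZZ_INPUT_* settings above are typically consumed, the Rust snippet below reads a variable and falls back to a default when it is unset or malformed; the helper and defaults are hypothetical and are not taken from the tests-fuzz crate.

// Hypothetical helper: read a numeric limit from the environment with a fallback.
use std::env;

fn env_limit(name: &str, default: usize) -> usize {
    env::var(name)
        .ok()
        .and_then(|value| value.parse::<usize>().ok())
        .unwrap_or(default)
}

fn main() {
    let max_rows = env_limit("GT_FUZZ_INPUT_MAX_ROWS", 2048);
    let max_tables = env_limit("GT_FUZZ_INPUT_MAX_TABLES", 32);
    println!("fuzz input limits: max_rows={max_rows}, max_tables={max_tables}");
}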


@@ -123,10 +123,10 @@ runs:
DST_REGISTRY_PASSWORD: ${{ inputs.dst-image-registry-password }}
run: |
./.github/scripts/copy-image.sh \
${{ inputs.src-image-registry }}/${{ inputs.src-image-namespace }}/${{ inputs.src-image-name }}-centos:latest \
${{ inputs.src-image-registry }}/${{ inputs.src-image-namespace }}/${{ inputs.src-image-name }}-centos:${{ inputs.version }} \
${{ inputs.dst-image-registry }}/${{ inputs.dst-image-namespace }}
- name: Push greptimedb-centos image from DockerHub to ACR
- name: Push latest greptimedb-centos image from DockerHub to ACR
shell: bash
if: ${{ inputs.dev-mode == 'false' && inputs.push-latest-tag == 'true' }}
env:

.github/actions/setup-chaos/action.yml (new file)

@@ -0,0 +1,17 @@
name: Setup Chaos Mesh
description: Deploy Chaos Mesh
runs:
using: composite
steps:
- uses: actions/checkout@v4
- name: Install Chaos Mesh
shell: bash
run: |
helm repo add chaos-mesh https://charts.chaos-mesh.org
kubectl create ns chaos-mesh
helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-mesh --version 2.6.3
- name: Print Chaos-mesh
if: always()
shell: bash
run: |
kubectl get po -n chaos-mesh


@@ -24,29 +24,35 @@ inputs:
description: "Etcd endpoints"
values-filename:
default: "with-minio.yaml"
enable-region-failover:
default: false
runs:
using: composite
steps:
- name: Install GreptimeDB operator
shell: bash
run: |
helm repo add greptime https://greptimeteam.github.io/helm-charts/
helm repo update
helm upgrade \
--install \
--create-namespace \
greptimedb-operator greptime/greptimedb-operator \
-n greptimedb-admin \
--wait \
--wait-for-jobs
uses: nick-fields/retry@v3
with:
timeout_minutes: 3
max_attempts: 3
shell: bash
command: |
helm repo add greptime https://greptimeteam.github.io/helm-charts/
helm repo update
helm upgrade \
--install \
--create-namespace \
greptimedb-operator greptime/greptimedb-operator \
-n greptimedb-admin \
--wait \
--wait-for-jobs
- name: Install GreptimeDB cluster
shell: bash
run: |
helm upgrade \
--install my-greptimedb \
--set meta.etcdEndpoints=${{ inputs.etcd-endpoints }} \
--set meta.enableRegionFailover=${{ inputs.enable-region-failover }} \
--set image.registry=${{ inputs.image-registry }} \
--set image.repository=${{ inputs.image-repository }} \
--set image.tag=${{ inputs.image-tag }} \


@@ -139,6 +139,7 @@ jobs:
name: Fuzz Test
needs: build
runs-on: ubuntu-latest
timeout-minutes: 60
strategy:
matrix:
target: [ "fuzz_create_table", "fuzz_alter_table", "fuzz_create_database", "fuzz_create_logical_table", "fuzz_alter_logical_table", "fuzz_insert", "fuzz_insert_logical_table" ]
@@ -186,6 +187,7 @@ jobs:
name: Unstable Fuzz Test
needs: build-greptime-ci
runs-on: ubuntu-latest
timeout-minutes: 60
strategy:
matrix:
target: [ "unstable_fuzz_create_table_standalone" ]
@@ -278,6 +280,7 @@ jobs:
name: Fuzz Test (Distributed, ${{ matrix.mode.name }}, ${{ matrix.target }})
runs-on: ubuntu-latest
needs: build-greptime-ci
timeout-minutes: 60
strategy:
matrix:
target: [ "fuzz_create_table", "fuzz_alter_table", "fuzz_create_database", "fuzz_create_logical_table", "fuzz_alter_logical_table", "fuzz_insert", "fuzz_insert_logical_table" ]
@@ -409,6 +412,133 @@ jobs:
docker rm $(docker ps -a -q)
docker system prune -f
distributed-fuzztest-with-chaos:
name: Fuzz Test with Chaos (Distributed, ${{ matrix.mode.name }}, ${{ matrix.target }})
runs-on: ubuntu-latest
needs: build-greptime-ci
timeout-minutes: 60
strategy:
matrix:
target: ["fuzz_migrate_mito_regions", "fuzz_failover_mito_regions", "fuzz_failover_metric_regions"]
mode:
- name: "Remote WAL"
minio: true
kafka: true
values: "with-remote-wal.yaml"
steps:
- uses: actions/checkout@v4
- name: Setup Kind
uses: ./.github/actions/setup-kind
- name: Setup Chaos Mesh
uses: ./.github/actions/setup-chaos
- if: matrix.mode.minio
name: Setup Minio
uses: ./.github/actions/setup-minio
- if: matrix.mode.kafka
name: Setup Kafka cluster
uses: ./.github/actions/setup-kafka-cluster
- name: Setup Etcd cluster
uses: ./.github/actions/setup-etcd-cluster
# Prepares for fuzz tests
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
# Shares across multiple jobs
shared-key: "fuzz-test-targets"
- name: Set Rust Fuzz
shell: bash
run: |
sudo apt-get install -y libfuzzer-14-dev
rustup install nightly
cargo +nightly install cargo-fuzz cargo-gc-bin
# Downloads ci image
- name: Download pre-built binary
uses: actions/download-artifact@v4
with:
name: bin
path: .
- name: Unzip binary
run: |
tar -xvf ./bin.tar.gz
rm ./bin.tar.gz
- name: Build and push GreptimeDB image
uses: ./.github/actions/build-and-push-ci-image
- name: Wait for etcd
run: |
kubectl wait \
--for=condition=Ready \
pod -l app.kubernetes.io/instance=etcd \
--timeout=120s \
-n etcd-cluster
- if: matrix.mode.minio
name: Wait for minio
run: |
kubectl wait \
--for=condition=Ready \
pod -l app=minio \
--timeout=120s \
-n minio
- if: matrix.mode.kafka
name: Wait for kafka
run: |
kubectl wait \
--for=condition=Ready \
pod -l app.kubernetes.io/instance=kafka \
--timeout=120s \
-n kafka-cluster
- name: Print etcd info
shell: bash
run: kubectl get all --show-labels -n etcd-cluster
# Setup cluster for test
- name: Setup GreptimeDB cluster
uses: ./.github/actions/setup-greptimedb-cluster
with:
image-registry: localhost:5001
values-filename: ${{ matrix.mode.values }}
enable-region-failover: true
- name: Port forward (mysql)
run: |
kubectl port-forward service/my-greptimedb-frontend 4002:4002 -n my-greptimedb&
- name: Fuzz Test
uses: ./.github/actions/fuzz-test
env:
CUSTOM_LIBFUZZER_PATH: /usr/lib/llvm-14/lib/libFuzzer.a
GT_MYSQL_ADDR: 127.0.0.1:4002
with:
target: ${{ matrix.target }}
max-total-time: 120
- name: Describe Nodes
if: failure()
shell: bash
run: |
kubectl describe nodes
- name: Export kind logs
if: failure()
shell: bash
run: |
kind export logs /tmp/kind
- name: Upload logs
if: failure()
uses: actions/upload-artifact@v4
with:
name: fuzz-tests-kind-logs-${{ matrix.mode.name }}-${{ matrix.target }}
path: /tmp/kind
retention-days: 3
- name: Delete cluster
if: success()
shell: bash
run: |
kind delete cluster
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker system prune -f
sqlness:
name: Sqlness Test (${{ matrix.mode.name }})
needs: build


@@ -67,19 +67,13 @@ jobs:
- run: 'echo "No action required"'
sqlness:
name: Sqlness Test
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ ubuntu-20.04 ]
steps:
- run: 'echo "No action required"'
sqlness-kafka-wal:
name: Sqlness Test with Kafka Wal
name: Sqlness Test (${{ matrix.mode.name }})
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ ubuntu-20.04 ]
mode:
- name: "Basic"
- name: "Remote WAL"
steps:
- run: 'echo "No action required"'


@@ -199,7 +199,7 @@ jobs:
image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
image-registry-password: ${{ secrets.DOCKERHUB_TOKEN }}
version: ${{ needs.allocate-runners.outputs.version }}
push-latest-tag: false # Don't push the latest tag to registry.
push-latest-tag: true
- name: Set nightly build result
id: set-nightly-build-result
@@ -240,7 +240,7 @@ jobs:
aws-cn-region: ${{ vars.AWS_RELEASE_BUCKET_REGION }}
dev-mode: false
update-version-info: false # Don't update version info in S3.
push-latest-tag: false # Don't push the latest tag to registry.
push-latest-tag: true
stop-linux-amd64-runner: # It's always run as the last job in the workflow to make sure that the runner is released.
name: Stop linux-amd64 runner


@@ -16,6 +16,7 @@ repos:
hooks:
- id: fmt
- id: clippy
args: ["--workspace", "--all-targets", "--", "-D", "warnings", "-D", "clippy::print_stdout", "-D", "clippy::print_stderr"]
args: ["--workspace", "--all-targets", "--all-features", "--", "-D", "warnings"]
stages: [push]
- id: cargo-check
args: ["--workspace", "--all-targets", "--all-features"]

AUTHOR.md (new file)

@@ -0,0 +1,43 @@
# GreptimeDB Authors
## Individual Committers (in alphabetical order)
* [CookiePieWw](https://github.com/CookiePieWw)
* [KKould](https://github.com/KKould)
* [NiwakaDev](https://github.com/NiwakaDev)
* [etolbakov](https://github.com/etolbakov)
* [irenjj](https://github.com/irenjj)
## Team Members (in alphabetical order)
* [Breeze-P](https://github.com/Breeze-P)
* [GrepTime](https://github.com/GrepTime)
* [MichaelScofield](https://github.com/MichaelScofield)
* [Wenjie0329](https://github.com/Wenjie0329)
* [WenyXu](https://github.com/WenyXu)
* [ZonaHex](https://github.com/ZonaHex)
* [apdong2022](https://github.com/apdong2022)
* [beryl678](https://github.com/beryl678)
* [daviderli614](https://github.com/daviderli614)
* [discord9](https://github.com/discord9)
* [evenyag](https://github.com/evenyag)
* [fengjiachun](https://github.com/fengjiachun)
* [fengys1996](https://github.com/fengys1996)
* [holalengyu](https://github.com/holalengyu)
* [killme2008](https://github.com/killme2008)
* [nicecui](https://github.com/nicecui)
* [paomian](https://github.com/paomian)
* [shuiyisong](https://github.com/shuiyisong)
* [sunchanglong](https://github.com/sunchanglong)
* [sunng87](https://github.com/sunng87)
* [tisonkun](https://github.com/tisonkun)
* [v0y4g3r](https://github.com/v0y4g3r)
* [waynexia](https://github.com/waynexia)
* [xtang](https://github.com/xtang)
* [zhaoyingnan01](https://github.com/zhaoyingnan01)
* [zhongzc](https://github.com/zhongzc)
* [zyy17](https://github.com/zyy17)
## All Contributors
[![All Contributors](https://contrib.rocks/image?repo=GreptimeTeam/greptimedb)](https://github.com/GreptimeTeam/greptimedb/graphs/contributors)


@@ -2,7 +2,11 @@
Thanks a lot for considering contributing to GreptimeDB. We believe people like you would make GreptimeDB a great product. We intend to build a community where individuals can have open talks, show respect for one another, and speak with true ❤️. Meanwhile, we are to keep transparency and make your effort count here.
Please read the guidelines, and they can help you get started. Communicate with respect to developers maintaining and developing the project. In return, they should reciprocate that respect by addressing your issue, reviewing changes, as well as helping finalize and merge your pull requests.
You can find our contributors at https://github.com/GreptimeTeam/greptimedb/graphs/contributors. When you dedicate yourself to GreptimeDB for a few months and keep bringing high-quality contributions (code, docs, advocacy, etc.), you will be a candidate to become a committer.
A committer will be granted both read & write access to GreptimeDB repos. Check the [AUTHOR.md](AUTHOR.md) file for all current individual committers.
Please read the guidelines, and they can help you get started. Communicate respectfully with the developers maintaining and developing the project. In return, they should reciprocate that respect by addressing your issue, reviewing changes, as well as helping finalize and merge your pull requests.
Follow our [README](https://github.com/GreptimeTeam/greptimedb#readme) to get the whole picture of the project. To learn about the design of GreptimeDB, please refer to the [design docs](https://github.com/GrepTimeTeam/docs).

Cargo.lock (generated)

File diff suppressed because it is too large.


@@ -1,6 +1,5 @@
[workspace]
members = [
"benchmarks",
"src/api",
"src/auth",
"src/catalog",
@@ -46,6 +45,7 @@ members = [
"src/object-store",
"src/operator",
"src/partition",
"src/pipeline",
"src/plugins",
"src/promql",
"src/puffin",
@@ -104,23 +104,22 @@ clap = { version = "4.4", features = ["derive"] }
config = "0.13.0"
crossbeam-utils = "0.8"
dashmap = "5.4"
datafusion = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-common = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-expr = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-functions = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-optimizer = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-physical-expr = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-physical-plan = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-sql = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion-substrait = { git = "https://github.com/apache/datafusion.git", rev = "08e19f4956d32164be6fc66eb5a4c080eb0023d1" }
datafusion = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-common = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-expr = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-functions = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-optimizer = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-physical-expr = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-physical-plan = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-sql = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
datafusion-substrait = { git = "https://github.com/apache/datafusion.git", rev = "729b356ef543ffcda6813c7b5373507a04ae0109" }
derive_builder = "0.12"
dotenv = "0.15"
# TODO(LFC): Wait for https://github.com/etcdv3/etcd-client/pull/76
etcd-client = { git = "https://github.com/MichaelScofield/etcd-client.git", rev = "4c371e9b3ea8e0a8ee2f9cbd7ded26e54a45df3b" }
etcd-client = { version = "0.13" }
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "ae26136accd82fbdf8be540cd502f2e94951077e" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "b83f00958fe4cbc77b85b7407bca206e98bdc845" }
humantime = "2.1"
humantime-serde = "1.1"
itertools = "0.10"
@@ -224,6 +223,7 @@ mito2 = { path = "src/mito2" }
object-store = { path = "src/object-store" }
operator = { path = "src/operator" }
partition = { path = "src/partition" }
pipeline = { path = "src/pipeline" }
plugins = { path = "src/plugins" }
promql = { path = "src/promql" }
puffin = { path = "src/puffin" }


@@ -170,6 +170,10 @@ FUZZ_TARGET ?= fuzz_alter_table
fuzz:
cargo fuzz run ${FUZZ_TARGET} --fuzz-dir tests-fuzz -D -s none -- -runs=${RUNS}
.PHONY: fuzz-ls
fuzz-ls:
cargo fuzz list --fuzz-dir tests-fuzz
.PHONY: check
check: ## Cargo check all the targets.
cargo check --workspace --all-targets --all-features


@@ -183,6 +183,8 @@ Please refer to [contribution guidelines](CONTRIBUTING.md) and [internal concept
## Acknowledgement
Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to [AUTHOR.md](AUTHOR.md).
- GreptimeDB uses [Apache Arrow™](https://arrow.apache.org/) as the memory model and [Apache Parquet™](https://parquet.apache.org/) as the persistent file format.
- GreptimeDB's query engine is powered by [Apache Arrow DataFusion™](https://arrow.apache.org/datafusion/).
- [Apache OpenDAL™](https://opendal.apache.org) gives GreptimeDB a very general and elegant data access abstraction layer.


@@ -1,37 +0,0 @@
[package]
name = "benchmarks"
version.workspace = true
edition.workspace = true
license.workspace = true
[lints]
workspace = true
[dependencies]
api.workspace = true
arrow.workspace = true
chrono.workspace = true
clap.workspace = true
common-base.workspace = true
common-telemetry.workspace = true
common-wal.workspace = true
dotenv.workspace = true
futures.workspace = true
futures-util.workspace = true
humantime.workspace = true
humantime-serde.workspace = true
indicatif = "0.17.1"
itertools.workspace = true
lazy_static.workspace = true
log-store.workspace = true
mito2.workspace = true
num_cpus.workspace = true
parquet.workspace = true
prometheus.workspace = true
rand.workspace = true
rskafka.workspace = true
serde.workspace = true
store-api.workspace = true
tokio.workspace = true
toml.workspace = true
uuid.workspace = true


@@ -1,11 +0,0 @@
Benchmarkers for GreptimeDB
--------------------------------
## Wal Benchmarker
The wal benchmarker serves to evaluate the performance of GreptimeDB's Write-Ahead Log (WAL) component. It meticulously assesses the read/write performance of the WAL under diverse workloads generated by the benchmarker.
### How to use
To compile the benchmarker, navigate to the `greptimedb/benchmarks` directory and execute `cargo build --release`. Subsequently, you'll find the compiled target located at `greptimedb/target/release/wal_bench`.
The `./wal_bench -h` command reveals numerous arguments that the target accepts. Among these, a notable one is the `cfg-file` argument. By utilizing a configuration file in the TOML format, you can bypass the need to repeatedly specify cumbersome arguments.


@@ -1,21 +0,0 @@
# Refer to the documentation of `Args` in `benchmarks/src/wal.rs`.
wal_provider = "kafka"
bootstrap_brokers = ["localhost:9092"]
num_workers = 10
num_topics = 32
num_regions = 1000
num_scrapes = 1000
num_rows = 5
col_types = "ifs"
max_batch_size = "512KB"
linger = "1ms"
backoff_init = "10ms"
backoff_max = "1ms"
backoff_base = 2
backoff_deadline = "3s"
compression = "zstd"
rng_seed = 42
skip_read = false
skip_write = false
random_topics = true
report_metrics = false


@@ -1,326 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#![feature(int_roundings)]
use std::fs;
use std::sync::Arc;
use std::time::Instant;
use api::v1::{ColumnDataType, ColumnSchema, SemanticType};
use benchmarks::metrics;
use benchmarks::wal_bench::{Args, Config, Region, WalProvider};
use clap::Parser;
use common_telemetry::info;
use common_wal::config::kafka::common::BackoffConfig;
use common_wal::config::kafka::DatanodeKafkaConfig as KafkaConfig;
use common_wal::config::raft_engine::RaftEngineConfig;
use common_wal::options::{KafkaWalOptions, WalOptions};
use itertools::Itertools;
use log_store::kafka::log_store::KafkaLogStore;
use log_store::raft_engine::log_store::RaftEngineLogStore;
use mito2::wal::Wal;
use prometheus::{Encoder, TextEncoder};
use rand::distributions::{Alphanumeric, DistString};
use rand::rngs::SmallRng;
use rand::SeedableRng;
use rskafka::client::partition::Compression;
use rskafka::client::ClientBuilder;
use store_api::logstore::LogStore;
use store_api::storage::RegionId;
async fn run_benchmarker<S: LogStore>(cfg: &Config, topics: &[String], wal: Arc<Wal<S>>) {
let chunk_size = cfg.num_regions.div_ceil(cfg.num_workers);
let region_chunks = (0..cfg.num_regions)
.map(|id| {
build_region(
id as u64,
topics,
&mut SmallRng::seed_from_u64(cfg.rng_seed),
cfg,
)
})
.chunks(chunk_size as usize)
.into_iter()
.map(|chunk| Arc::new(chunk.collect::<Vec<_>>()))
.collect::<Vec<_>>();
let mut write_elapsed = 0;
let mut read_elapsed = 0;
if !cfg.skip_write {
info!("Benchmarking write ...");
let num_scrapes = cfg.num_scrapes;
let timer = Instant::now();
futures::future::join_all((0..cfg.num_workers).map(|i| {
let wal = wal.clone();
let regions = region_chunks[i as usize].clone();
tokio::spawn(async move {
for _ in 0..num_scrapes {
let mut wal_writer = wal.writer();
regions
.iter()
.for_each(|region| region.add_wal_entry(&mut wal_writer));
wal_writer.write_to_wal().await.unwrap();
}
})
}))
.await;
write_elapsed += timer.elapsed().as_millis();
}
if !cfg.skip_read {
info!("Benchmarking read ...");
let timer = Instant::now();
futures::future::join_all((0..cfg.num_workers).map(|i| {
let wal = wal.clone();
let regions = region_chunks[i as usize].clone();
tokio::spawn(async move {
for region in regions.iter() {
region.replay(&wal).await;
}
})
}))
.await;
read_elapsed = timer.elapsed().as_millis();
}
dump_report(cfg, write_elapsed, read_elapsed);
}
fn build_region(id: u64, topics: &[String], rng: &mut SmallRng, cfg: &Config) -> Region {
let wal_options = match cfg.wal_provider {
WalProvider::Kafka => {
assert!(!topics.is_empty());
WalOptions::Kafka(KafkaWalOptions {
topic: topics.get(id as usize % topics.len()).cloned().unwrap(),
})
}
WalProvider::RaftEngine => WalOptions::RaftEngine,
};
Region::new(
RegionId::from_u64(id),
build_schema(&parse_col_types(&cfg.col_types), rng),
wal_options,
cfg.num_rows,
cfg.rng_seed,
)
}
fn build_schema(col_types: &[ColumnDataType], mut rng: &mut SmallRng) -> Vec<ColumnSchema> {
col_types
.iter()
.map(|col_type| ColumnSchema {
column_name: Alphanumeric.sample_string(&mut rng, 5),
datatype: *col_type as i32,
semantic_type: SemanticType::Field as i32,
datatype_extension: None,
})
.chain(vec![ColumnSchema {
column_name: "ts".to_string(),
datatype: ColumnDataType::TimestampMillisecond as i32,
semantic_type: SemanticType::Tag as i32,
datatype_extension: None,
}])
.collect()
}
fn dump_report(cfg: &Config, write_elapsed: u128, read_elapsed: u128) {
let cost_report = format!(
"write costs: {} ms, read costs: {} ms",
write_elapsed, read_elapsed,
);
let total_written_bytes = metrics::METRIC_WAL_WRITE_BYTES_TOTAL.get() as u128;
let write_throughput = if write_elapsed > 0 {
(total_written_bytes * 1000).div_floor(write_elapsed)
} else {
0
};
let total_read_bytes = metrics::METRIC_WAL_READ_BYTES_TOTAL.get() as u128;
let read_throughput = if read_elapsed > 0 {
(total_read_bytes * 1000).div_floor(read_elapsed)
} else {
0
};
let throughput_report = format!(
"total written bytes: {} bytes, total read bytes: {} bytes, write throuput: {} bytes/s ({} mb/s), read throughput: {} bytes/s ({} mb/s)",
total_written_bytes,
total_read_bytes,
write_throughput,
write_throughput.div_floor(1 << 20),
read_throughput,
read_throughput.div_floor(1 << 20),
);
let metrics_report = if cfg.report_metrics {
let mut buffer = Vec::new();
let encoder = TextEncoder::new();
let metrics = prometheus::gather();
encoder.encode(&metrics, &mut buffer).unwrap();
String::from_utf8(buffer).unwrap()
} else {
String::new()
};
info!(
r#"
Benchmark config:
{cfg:?}
Benchmark report:
{cost_report}
{throughput_report}
{metrics_report}"#
);
}
async fn create_topics(cfg: &Config) -> Vec<String> {
// Creates topics.
let client = ClientBuilder::new(cfg.bootstrap_brokers.clone())
.build()
.await
.unwrap();
let ctrl_client = client.controller_client().unwrap();
let (topics, tasks): (Vec<_>, Vec<_>) = (0..cfg.num_topics)
.map(|i| {
let topic = if cfg.random_topics {
format!(
"greptime_wal_bench_topic_{}_{}",
uuid::Uuid::new_v4().as_u128(),
i
)
} else {
format!("greptime_wal_bench_topic_{}", i)
};
let task = ctrl_client.create_topic(
topic.clone(),
1,
cfg.bootstrap_brokers.len() as i16,
2000,
);
(topic, task)
})
.unzip();
// Must ignore errors since we allow topics being created more than once.
let _ = futures::future::try_join_all(tasks).await;
topics
}
fn parse_compression(comp: &str) -> Compression {
match comp {
"no" => Compression::NoCompression,
"gzip" => Compression::Gzip,
"lz4" => Compression::Lz4,
"snappy" => Compression::Snappy,
"zstd" => Compression::Zstd,
other => unreachable!("Unrecognized compression {other}"),
}
}
fn parse_col_types(col_types: &str) -> Vec<ColumnDataType> {
let parts = col_types.split('x').collect::<Vec<_>>();
assert!(parts.len() <= 2);
let pattern = parts[0];
let repeat = parts
.get(1)
.map(|r| r.parse::<usize>().unwrap())
.unwrap_or(1);
pattern
.chars()
.map(|c| match c {
'i' | 'I' => ColumnDataType::Int64,
'f' | 'F' => ColumnDataType::Float64,
's' | 'S' => ColumnDataType::String,
other => unreachable!("Cannot parse {other} as a column data type"),
})
.cycle()
.take(pattern.len() * repeat)
.collect()
}
fn main() {
// Sets the global logging level to INFO and suppresses rskafka logging below ERROR.
std::env::set_var("UNITTEST_LOG_LEVEL", "info,rskafka=error");
common_telemetry::init_default_ut_logging();
let args = Args::parse();
let cfg = if !args.cfg_file.is_empty() {
toml::from_str(&fs::read_to_string(&args.cfg_file).unwrap()).unwrap()
} else {
Config::from(args)
};
// Validates arguments.
if cfg.num_regions < cfg.num_workers {
panic!("num_regions must be greater than or equal to num_workers");
}
if cfg
.num_workers
.min(cfg.num_topics)
.min(cfg.num_regions)
.min(cfg.num_scrapes)
.min(cfg.max_batch_size.as_bytes() as u32)
.min(cfg.bootstrap_brokers.len() as u32)
== 0
{
panic!("Invalid arguments");
}
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.build()
.unwrap()
.block_on(async {
match cfg.wal_provider {
WalProvider::Kafka => {
let topics = create_topics(&cfg).await;
let kafka_cfg = KafkaConfig {
broker_endpoints: cfg.bootstrap_brokers.clone(),
max_batch_size: cfg.max_batch_size,
linger: cfg.linger,
backoff: BackoffConfig {
init: cfg.backoff_init,
max: cfg.backoff_max,
base: cfg.backoff_base,
deadline: Some(cfg.backoff_deadline),
},
compression: parse_compression(&cfg.compression),
..Default::default()
};
let store = Arc::new(KafkaLogStore::try_new(&kafka_cfg).await.unwrap());
let wal = Arc::new(Wal::new(store));
run_benchmarker(&cfg, &topics, wal).await;
}
WalProvider::RaftEngine => {
// The benchmarker assumes the raft engine directory exists.
let store = RaftEngineLogStore::try_new(
"/tmp/greptimedb/raft-engine-wal".to_string(),
RaftEngineConfig::default(),
)
.await
.map(Arc::new)
.unwrap();
let wal = Arc::new(Wal::new(store));
run_benchmarker(&cfg, &[], wal).await;
}
}
});
}


@@ -1,39 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use lazy_static::lazy_static;
use prometheus::*;
/// Logstore label.
pub const LOGSTORE_LABEL: &str = "logstore";
/// Operation type label.
pub const OPTYPE_LABEL: &str = "optype";
lazy_static! {
/// Counters of bytes of each operation on a logstore.
pub static ref METRIC_WAL_OP_BYTES_TOTAL: IntCounterVec = register_int_counter_vec!(
"greptime_bench_wal_op_bytes_total",
"wal operation bytes total",
&[OPTYPE_LABEL],
)
.unwrap();
/// Counter of bytes of the append_batch operation.
pub static ref METRIC_WAL_WRITE_BYTES_TOTAL: IntCounter = METRIC_WAL_OP_BYTES_TOTAL.with_label_values(
&["write"],
);
/// Counter of bytes of the read operation.
pub static ref METRIC_WAL_READ_BYTES_TOTAL: IntCounter = METRIC_WAL_OP_BYTES_TOTAL.with_label_values(
&["read"],
);
}


@@ -1,366 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::mem::size_of;
use std::sync::atomic::{AtomicI64, AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::time::Duration;
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, ColumnSchema, Mutation, OpType, Row, Rows, Value, WalEntry};
use clap::{Parser, ValueEnum};
use common_base::readable_size::ReadableSize;
use common_wal::options::WalOptions;
use futures::StreamExt;
use mito2::wal::{Wal, WalWriter};
use rand::distributions::{Alphanumeric, DistString, Uniform};
use rand::rngs::SmallRng;
use rand::{Rng, SeedableRng};
use serde::{Deserialize, Serialize};
use store_api::logstore::provider::Provider;
use store_api::logstore::LogStore;
use store_api::storage::RegionId;
use crate::metrics;
/// The wal provider.
#[derive(Clone, ValueEnum, Default, Debug, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum WalProvider {
#[default]
RaftEngine,
Kafka,
}
#[derive(Parser)]
pub struct Args {
/// The provided configuration file.
/// The example configuration file can be found at `greptimedb/benchmarks/config/wal_bench.example.toml`.
#[clap(long, short = 'c')]
pub cfg_file: String,
/// The wal provider.
#[clap(long, value_enum, default_value_t = WalProvider::default())]
pub wal_provider: WalProvider,
/// The advertised addresses of the kafka brokers.
/// If there are multiple bootstrap brokers, their addresses should be separated by commas, e.g. "localhost:9092,localhost:9093".
#[clap(long, short = 'b', default_value = "localhost:9092")]
pub bootstrap_brokers: String,
/// The number of workers each running in a dedicated thread.
#[clap(long, default_value_t = num_cpus::get() as u32)]
pub num_workers: u32,
/// The number of kafka topics to be created.
#[clap(long, default_value_t = 32)]
pub num_topics: u32,
/// The number of regions.
#[clap(long, default_value_t = 1000)]
pub num_regions: u32,
/// The number of times each region is scraped.
#[clap(long, default_value_t = 1000)]
pub num_scrapes: u32,
/// The number of rows in each wal entry.
/// Each time a region is scraped, a wal entry containing this many rows is produced.
#[clap(long, default_value_t = 5)]
pub num_rows: u32,
/// The column types of the schema for each region.
/// Currently, three column types are supported:
/// - i = ColumnDataType::Int64
/// - f = ColumnDataType::Float64
/// - s = ColumnDataType::String
/// For e.g., "ifs" will be parsed as three columns: i64, f64, and string.
///
/// Additionally, a "x" sign can be provided to repeat the column types for a given number of times.
/// For e.g., "iix2" will be parsed as 4 columns: i64, i64, i64, and i64.
/// This feature is useful if you want to specify many columns.
#[clap(long, default_value = "ifs")]
pub col_types: String,
/// The maximum size of a batch of kafka records.
/// The default value is 1mb.
#[clap(long, default_value = "512KB")]
pub max_batch_size: ReadableSize,
/// The duration the kafka producer lingers before issuing a batch of kafka records.
/// However, a batch of kafka records is issued immediately if a record cannot fit into the batch.
#[clap(long, default_value = "1ms")]
pub linger: String,
/// The initial backoff delay of the kafka consumer.
#[clap(long, default_value = "10ms")]
pub backoff_init: String,
/// The maximum backoff delay of the kafka consumer.
#[clap(long, default_value = "1s")]
pub backoff_max: String,
/// The exponential backoff rate of the kafka consumer. The next backoff = base * the current backoff.
#[clap(long, default_value_t = 2)]
pub backoff_base: u32,
/// The deadline of backoff. The backoff ends if the total backoff delay reaches the deadline.
#[clap(long, default_value = "3s")]
pub backoff_deadline: String,
/// The client-side compression algorithm for kafka records.
#[clap(long, default_value = "zstd")]
pub compression: String,
/// The seed of random number generators.
#[clap(long, default_value_t = 42)]
pub rng_seed: u64,
/// Skips the read phase, aka. region replay, if set to true.
#[clap(long, default_value_t = false)]
pub skip_read: bool,
/// Skips the write phase if set to true.
#[clap(long, default_value_t = false)]
pub skip_write: bool,
/// Randomly generates topic names if set to true.
/// Useful when you want to run the benchmarker without worrying about the topics created before.
#[clap(long, default_value_t = false)]
pub random_topics: bool,
/// Logs out the gathered prometheus metrics when the benchmarker ends.
#[clap(long, default_value_t = false)]
pub report_metrics: bool,
}
/// Benchmarker config.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Config {
pub wal_provider: WalProvider,
pub bootstrap_brokers: Vec<String>,
pub num_workers: u32,
pub num_topics: u32,
pub num_regions: u32,
pub num_scrapes: u32,
pub num_rows: u32,
pub col_types: String,
pub max_batch_size: ReadableSize,
#[serde(with = "humantime_serde")]
pub linger: Duration,
#[serde(with = "humantime_serde")]
pub backoff_init: Duration,
#[serde(with = "humantime_serde")]
pub backoff_max: Duration,
pub backoff_base: u32,
#[serde(with = "humantime_serde")]
pub backoff_deadline: Duration,
pub compression: String,
pub rng_seed: u64,
pub skip_read: bool,
pub skip_write: bool,
pub random_topics: bool,
pub report_metrics: bool,
}
impl From<Args> for Config {
fn from(args: Args) -> Self {
let cfg = Self {
wal_provider: args.wal_provider,
bootstrap_brokers: args
.bootstrap_brokers
.split(',')
.map(ToString::to_string)
.collect::<Vec<_>>(),
num_workers: args.num_workers.min(num_cpus::get() as u32),
num_topics: args.num_topics,
num_regions: args.num_regions,
num_scrapes: args.num_scrapes,
num_rows: args.num_rows,
col_types: args.col_types,
max_batch_size: args.max_batch_size,
linger: humantime::parse_duration(&args.linger).unwrap(),
backoff_init: humantime::parse_duration(&args.backoff_init).unwrap(),
backoff_max: humantime::parse_duration(&args.backoff_max).unwrap(),
backoff_base: args.backoff_base,
backoff_deadline: humantime::parse_duration(&args.backoff_deadline).unwrap(),
compression: args.compression,
rng_seed: args.rng_seed,
skip_read: args.skip_read,
skip_write: args.skip_write,
random_topics: args.random_topics,
report_metrics: args.report_metrics,
};
cfg
}
}
/// The region used for wal benchmarker.
pub struct Region {
id: RegionId,
schema: Vec<ColumnSchema>,
provider: Provider,
next_sequence: AtomicU64,
next_entry_id: AtomicU64,
next_timestamp: AtomicI64,
rng: Mutex<Option<SmallRng>>,
num_rows: u32,
}
impl Region {
/// Creates a new region.
pub fn new(
id: RegionId,
schema: Vec<ColumnSchema>,
wal_options: WalOptions,
num_rows: u32,
rng_seed: u64,
) -> Self {
let provider = match wal_options {
WalOptions::RaftEngine => Provider::raft_engine_provider(id.as_u64()),
WalOptions::Kafka(opts) => Provider::kafka_provider(opts.topic),
};
Self {
id,
schema,
provider,
next_sequence: AtomicU64::new(1),
next_entry_id: AtomicU64::new(1),
next_timestamp: AtomicI64::new(1655276557000),
rng: Mutex::new(Some(SmallRng::seed_from_u64(rng_seed))),
num_rows,
}
}
/// Scrapes the region and adds the generated entry to wal.
pub fn add_wal_entry<S: LogStore>(&self, wal_writer: &mut WalWriter<S>) {
let mutation = Mutation {
op_type: OpType::Put as i32,
sequence: self
.next_sequence
.fetch_add(self.num_rows as u64, Ordering::Relaxed),
rows: Some(self.build_rows()),
};
let entry = WalEntry {
mutations: vec![mutation],
};
metrics::METRIC_WAL_WRITE_BYTES_TOTAL.inc_by(Self::entry_estimated_size(&entry) as u64);
wal_writer
.add_entry(
self.id,
self.next_entry_id.fetch_add(1, Ordering::Relaxed),
&entry,
&self.provider,
)
.unwrap();
}
/// Replays the region.
pub async fn replay<S: LogStore>(&self, wal: &Arc<Wal<S>>) {
let mut wal_stream = wal.scan(self.id, 0, &self.provider).unwrap();
while let Some(res) = wal_stream.next().await {
let (_, entry) = res.unwrap();
metrics::METRIC_WAL_READ_BYTES_TOTAL.inc_by(Self::entry_estimated_size(&entry) as u64);
}
}
/// Computes the estimated size in bytes of the entry.
pub fn entry_estimated_size(entry: &WalEntry) -> usize {
let wrapper_size = size_of::<WalEntry>()
+ entry.mutations.capacity() * size_of::<Mutation>()
+ size_of::<Rows>();
let rows = entry.mutations[0].rows.as_ref().unwrap();
let schema_size = rows.schema.capacity() * size_of::<ColumnSchema>()
+ rows
.schema
.iter()
.map(|s| s.column_name.capacity())
.sum::<usize>();
let values_size = (rows.rows.capacity() * size_of::<Row>())
+ rows
.rows
.iter()
.map(|r| r.values.capacity() * size_of::<Value>())
.sum::<usize>();
wrapper_size + schema_size + values_size
}
fn build_rows(&self) -> Rows {
let cols = self
.schema
.iter()
.map(|col_schema| {
let col_data_type = ColumnDataType::try_from(col_schema.datatype).unwrap();
self.build_col(&col_data_type, self.num_rows)
})
.collect::<Vec<_>>();
let rows = (0..self.num_rows)
.map(|i| {
let values = cols.iter().map(|col| col[i as usize].clone()).collect();
Row { values }
})
.collect();
Rows {
schema: self.schema.clone(),
rows,
}
}
fn build_col(&self, col_data_type: &ColumnDataType, num_rows: u32) -> Vec<Value> {
let mut rng_guard = self.rng.lock().unwrap();
let rng = rng_guard.as_mut().unwrap();
match col_data_type {
ColumnDataType::TimestampMillisecond => (0..num_rows)
.map(|_| {
let ts = self.next_timestamp.fetch_add(1000, Ordering::Relaxed);
Value {
value_data: Some(ValueData::TimestampMillisecondValue(ts)),
}
})
.collect(),
ColumnDataType::Int64 => (0..num_rows)
.map(|_| {
let v = rng.sample(Uniform::new(0, 10_000));
Value {
value_data: Some(ValueData::I64Value(v)),
}
})
.collect(),
ColumnDataType::Float64 => (0..num_rows)
.map(|_| {
let v = rng.sample(Uniform::new(0.0, 5000.0));
Value {
value_data: Some(ValueData::F64Value(v)),
}
})
.collect(),
ColumnDataType::String => (0..num_rows)
.map(|_| {
let v = Alphanumeric.sample_string(rng, 10);
Value {
value_data: Some(ValueData::StringValue(v)),
}
})
.collect(),
_ => unreachable!(),
}
}
}


@@ -1,10 +1,12 @@
# Configurations
- [Standalone Mode](#standalone-mode)
- [Distributed Mode](#distributed-mode)
- [Configurations](#configurations)
- [Standalone Mode](#standalone-mode)
- [Distributed Mode](#distributed-mode)
- [Frontend](#frontend)
- [Metasrv](#metasrv)
- [Datanode](#datanode)
- [Flownode](#flownode)
## Standalone Mode
@@ -23,3 +25,7 @@
### Datanode
{{ toml2docs "./datanode.example.toml" }}
### Flownode
{{ toml2docs "./flownode.example.toml"}}


@@ -1,10 +1,12 @@
# Configurations
- [Standalone Mode](#standalone-mode)
- [Distributed Mode](#distributed-mode)
- [Configurations](#configurations)
- [Standalone Mode](#standalone-mode)
- [Distributed Mode](#distributed-mode)
- [Frontend](#frontend)
- [Metasrv](#metasrv)
- [Datanode](#datanode)
- [Flownode](#flownode)
## Standalone Mode
@@ -16,11 +18,11 @@
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `8` | The number of threads to execute the runtime for global background operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>Support the following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
@@ -66,8 +68,7 @@
| `wal.prefill_log_files` | Bool | `false` | Whether to pre-create log files on start up.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_size` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.linger` | String | `200ms` | The linger duration of a kafka batch producer.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_init` | String | `500ms` | The initial backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
@@ -119,12 +120,20 @@
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: using the default value (1/4 of cpu cores).<br/>- `1`: scan in current thread.<br/>- `n`: scan in parallelism n. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.inverted_index` | -- | -- | The options for inverted index in Mito engine. |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `64M` | Memory threshold for performing an external sort during index creation.<br/>Setting to empty will disable external sorting, forcing all sorting operations to happen in memory. |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | File system path to store intermediate files for external sorting (default `{data_home}/index_intermediate`). |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `auto` | Memory threshold for performing an external sort during index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | Deprecated, use `region_engine.mito.index.aux_path` instead. |
| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `auto` | Memory threshold for index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.memtable` | -- | -- | -- |
| `region_engine.mito.memtable.type` | String | `time_series` | Memtable type.<br/>- `time_series`: time-series memtable<br/>- `partition_tree`: partition tree memtable (experimental) |
| `region_engine.mito.memtable.index_max_keys_per_shard` | Integer | `8192` | The max number of keys in one shard.<br/>Only available for `partition_tree` memtable. |
@@ -161,16 +170,17 @@
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `8` | The number of threads to execute the runtime for global background operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `heartbeat` | -- | -- | The heartbeat options. |
| `heartbeat.interval` | String | `18s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>Support the following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
@@ -251,7 +261,7 @@
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `8` | The number of threads to execute the runtime for global background operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `procedure` | -- | -- | Procedure storage options. |
| `procedure.max_retry_times` | Integer | `12` | Procedure max retry time. |
| `procedure.retry_delay` | String | `500ms` | Initial retry delay of procedures, increases exponentially |
@@ -259,7 +269,7 @@
| `failure_detector` | -- | -- | -- |
| `failure_detector.threshold` | Float | `8.0` | -- |
| `failure_detector.min_std_deviation` | String | `100ms` | -- |
| `failure_detector.acceptable_heartbeat_pause` | String | `3000ms` | -- |
| `failure_detector.acceptable_heartbeat_pause` | String | `10000ms` | -- |
| `failure_detector.first_heartbeat_estimate` | String | `1000ms` | -- |
| `datanode` | -- | -- | Datanode options. |
| `datanode.client` | -- | -- | Datanode client options. |
@@ -306,17 +316,28 @@
| `node_id` | Integer | `None` | The datanode identifier and should be unique in the cluster. |
| `require_lease_before_startup` | Bool | `false` | Start services after regions have obtained leases.<br/>It will block the datanode start if it can't receive leases in the heartbeat from metasrv. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `rpc_addr` | String | `127.0.0.1:3001` | The gRPC address of the datanode. |
| `rpc_hostname` | String | `None` | The hostname of the datanode. |
| `rpc_runtime_size` | Integer | `8` | The number of gRPC server worker threads. |
| `rpc_max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `rpc_max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `rpc_addr` | String | `None` | Deprecated, use `grpc.addr` instead. |
| `rpc_hostname` | String | `None` | Deprecated, use `grpc.hostname` instead. |
| `rpc_runtime_size` | Integer | `None` | Deprecated, use `grpc.runtime_size` instead. |
| `rpc_max_recv_message_size` | String | `None` | Deprecated, use `grpc.rpc_max_recv_message_size` instead. |
| `rpc_max_send_message_size` | String | `None` | Deprecated, use `grpc.rpc_max_send_message_size` instead. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:3001` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
| `grpc.tls.cert_path` | String | `None` | Certificate file path. |
| `grpc.tls.key_path` | String | `None` | Private key file path. |
| `grpc.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload.<br/>For now, gRPC tls config does not support auto reload. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `8` | The number of threads to execute the runtime for global background operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `heartbeat` | -- | -- | The heartbeat options. |
| `heartbeat.interval` | String | `3s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
@@ -342,8 +363,7 @@
| `wal.prefill_log_files` | Bool | `false` | Whether to pre-create log files on start up.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_size` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.linger` | String | `200ms` | The linger duration of a kafka batch producer.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_init` | String | `500ms` | The initial backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
@@ -389,12 +409,20 @@
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: using the default value (1/4 of cpu cores).<br/>- `1`: scan in current thread.<br/>- `n`: scan in parallelism n. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.inverted_index` | -- | -- | The options for inverted index in Mito engine. |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `64M` | Memory threshold for performing an external sort during index creation.<br/>Setting to empty will disable external sorting, forcing all sorting operations to happen in memory. |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | File system path to store intermediate files for external sorting (default `{data_home}/index_intermediate`). |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `auto` | Memory threshold for performing an external sort during index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | Deprecated, use `region_engine.mito.index.aux_path` instead. |
| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `auto` | Memory threshold for index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.memtable` | -- | -- | -- |
| `region_engine.mito.memtable.type` | String | `time_series` | Memtable type.<br/>- `time_series`: time-series memtable<br/>- `partition_tree`: partition tree memtable (experimental) |
| `region_engine.mito.memtable.index_max_keys_per_shard` | Integer | `8192` | The max number of keys in one shard.<br/>Only available for `partition_tree` memtable. |
@@ -418,3 +446,40 @@
| `export_metrics.remote_write.headers` | InlineTable | -- | HTTP headers of Prometheus remote-write carry. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
### Flownode
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `distributed` | The running mode of the flownode. It can be `standalone` or `distributed`. |
| `node_id` | Integer | `None` | The flownode identifier and should be unique in the cluster. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:6800` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
| `grpc.runtime_size` | Integer | `2` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `meta_client` | -- | -- | The metasrv client options. |
| `meta_client.metasrv_addrs` | Array | -- | The addresses of the metasrv. |
| `meta_client.timeout` | String | `3s` | Operation timeout. |
| `meta_client.heartbeat_timeout` | String | `500ms` | Heartbeat timeout. |
| `meta_client.ddl_timeout` | String | `10s` | DDL timeout. |
| `meta_client.connect_timeout` | String | `1s` | Connect server timeout. |
| `meta_client.tcp_nodelay` | Bool | `true` | `TCP_NODELAY` option for accepted connections. |
| `meta_client.metadata_cache_max_capacity` | Integer | `100000` | The configuration about the cache of the metadata. |
| `meta_client.metadata_cache_ttl` | String | `10m` | TTL of the metadata cache. |
| `meta_client.metadata_cache_tti` | String | `5m` | -- |
| `heartbeat` | -- | -- | The heartbeat options. |
| `heartbeat.interval` | String | `3s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |

View File

@@ -13,27 +13,62 @@ require_lease_before_startup = false
## By default, it provides services after all regions have been initialized.
init_regions_in_background = false
## Enable telemetry to collect anonymous usage data.
enable_telemetry = true
## Parallelism of initializing regions.
init_regions_parallelism = 16
## The gRPC address of the datanode.
## Deprecated, use `grpc.addr` instead.
## +toml2docs:none-default
rpc_addr = "127.0.0.1:3001"
## The hostname of the datanode.
## Deprecated, use `grpc.hostname` instead.
## +toml2docs:none-default
rpc_hostname = "127.0.0.1"
## The number of gRPC server worker threads.
## Deprecated, use `grpc.runtime_size` instead.
## +toml2docs:none-default
rpc_runtime_size = 8
## The maximum receive message size for gRPC server.
## Deprecated, use `grpc.rpc_max_recv_message_size` instead.
## +toml2docs:none-default
rpc_max_recv_message_size = "512MB"
## The maximum send message size for gRPC server.
## Deprecated, use `grpc.rpc_max_send_message_size` instead.
## +toml2docs:none-default
rpc_max_send_message_size = "512MB"
## Enable telemetry to collect anonymous usage data.
enable_telemetry = true
## The gRPC server options.
[grpc]
## The address to bind the gRPC server.
addr = "127.0.0.1:3001"
## The hostname advertised to the metasrv,
## and used for connections from outside the host
hostname = "127.0.0.1"
## The number of server worker threads.
runtime_size = 8
## The maximum receive message size for gRPC server.
max_recv_message_size = "512MB"
## The maximum send message size for gRPC server.
max_send_message_size = "512MB"
## gRPC server TLS options, see `mysql.tls` section.
[grpc.tls]
## TLS mode.
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
key_path = ""
## Watch for Certificate and key file change and auto reload.
## For now, gRPC tls config does not support auto reload.
watch = false
## The runtime options.
[runtime]
@@ -42,7 +77,7 @@ read_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 8
bg_rt_size = 4
## The heartbeat options.
[heartbeat]
@@ -132,11 +167,7 @@ broker_endpoints = ["127.0.0.1:9092"]
## The max size of a single producer batch.
## Warning: Kafka has a default limit of 1MB per message in a topic.
## **It's only used when the provider is `kafka`**.
max_batch_size = "1MB"
## The linger duration of a kafka batch producer.
## **It's only used when the provider is `kafka`**.
linger = "200ms"
max_batch_bytes = "1MB"
## The consumer wait timeout.
## **It's only used when the provider is `kafka`**.
@@ -363,31 +394,72 @@ parallel_scan_channel_size = 32
## Whether to allow stale WAL entries read during replay.
allow_stale_entries = false
## The options for index in Mito engine.
[region_engine.mito.index]
## Auxiliary directory path for the index in filesystem, used to store intermediate files for
## creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.
## The default name for this directory is `index_intermediate` for backward compatibility.
##
## This path contains two subdirectories:
## - `__intm`: for storing intermediate files used during creating index.
## - `staging`: for storing staging files used during searching index.
aux_path = ""
## The max capacity of the staging directory.
staging_size = "2GB"
## The options for inverted index in Mito engine.
[region_engine.mito.inverted_index]
## Whether to create the index on flush.
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
create_on_flush = "auto"
## Whether to create the index on compaction.
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
create_on_compaction = "auto"
## Whether to apply the index on query
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
apply_on_query = "auto"
## Memory threshold for performing an external sort during index creation.
## Setting to empty will disable external sorting, forcing all sorting operations to happen in memory.
mem_threshold_on_create = "64M"
## - `auto`: automatically determine the threshold based on the system memory size (default)
## - `unlimited`: no memory limit
## - `[size]` e.g. `64MB`: fixed memory threshold
mem_threshold_on_create = "auto"
## File system path to store intermediate files for external sorting (default `{data_home}/index_intermediate`).
## Deprecated, use `region_engine.mito.index.aux_path` instead.
intermediate_path = ""
## The options for full-text index in Mito engine.
[region_engine.mito.fulltext_index]
## Whether to create the index on flush.
## - `auto`: automatically (default)
## - `disable`: never
create_on_flush = "auto"
## Whether to create the index on compaction.
## - `auto`: automatically (default)
## - `disable`: never
create_on_compaction = "auto"
## Whether to apply the index on query
## - `auto`: automatically (default)
## - `disable`: never
apply_on_query = "auto"
## Memory threshold for index creation.
## - `auto`: automatically determine the threshold based on the system memory size (default)
## - `unlimited`: no memory limit
## - `[size]` e.g. `64MB`: fixed memory threshold
mem_threshold_on_create = "auto"
[region_engine.mito.memtable]
## Memtable type.
## - `time_series`: time-series memtable

View File

@@ -0,0 +1,90 @@
## The running mode of the flownode. It can be `standalone` or `distributed`.
mode = "distributed"
## The flownode identifier and should be unique in the cluster.
## +toml2docs:none-default
node_id = 14
## The gRPC server options.
[grpc]
## The address to bind the gRPC server.
addr = "127.0.0.1:6800"
## The hostname advertised to the metasrv,
## and used for connections from outside the host
hostname = "127.0.0.1"
## The number of server worker threads.
runtime_size = 2
## The maximum receive message size for gRPC server.
max_recv_message_size = "512MB"
## The maximum send message size for gRPC server.
max_send_message_size = "512MB"
## The metasrv client options.
[meta_client]
## The addresses of the metasrv.
metasrv_addrs = ["127.0.0.1:3002"]
## Operation timeout.
timeout = "3s"
## Heartbeat timeout.
heartbeat_timeout = "500ms"
## DDL timeout.
ddl_timeout = "10s"
## Connect server timeout.
connect_timeout = "1s"
## `TCP_NODELAY` option for accepted connections.
tcp_nodelay = true
## The configuration about the cache of the metadata.
metadata_cache_max_capacity = 100000
## TTL of the metadata cache.
metadata_cache_ttl = "10m"
# TTI of the metadata cache.
metadata_cache_tti = "5m"
## The heartbeat options.
[heartbeat]
## Interval for sending heartbeat messages to the metasrv.
interval = "3s"
## Interval for retrying to send heartbeat messages to the metasrv.
retry_interval = "3s"
## The logging options.
[logging]
## The directory to store the log files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
## Whether to append logs to stdout.
append_stdout = true
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The tracing options. Only effect when compiled with `tokio-console` feature.
[tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"

View File

@@ -12,7 +12,7 @@ read_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 8
bg_rt_size = 4
## The heartbeat options.
[heartbeat]
@@ -26,16 +26,20 @@ retry_interval = "3s"
[http]
## The address to bind the HTTP server.
addr = "127.0.0.1:4000"
## HTTP request timeout.
## HTTP request timeout. Set to 0 to disable timeout.
timeout = "30s"
## HTTP request body limit.
## Support the following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## Set to 0 to disable limit.
body_limit = "64MB"
## The gRPC server options.
[grpc]
## The address to bind the gRPC server.
addr = "127.0.0.1:4001"
## The hostname advertised to the metasrv,
## and used for connections from outside the host
hostname = "127.0.0.1"
## The number of server worker threads.
runtime_size = 8

View File

@@ -32,7 +32,7 @@ read_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 8
bg_rt_size = 4
## Procedure storage options.
[procedure]
@@ -54,7 +54,7 @@ max_metadata_value_size = "1500KiB"
[failure_detector]
threshold = 8.0
min_std_deviation = "100ms"
acceptable_heartbeat_pause = "3000ms"
acceptable_heartbeat_pause = "10000ms"
first_heartbeat_estimate = "1000ms"
## Datanode options.

View File

@@ -15,16 +15,17 @@ read_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 8
bg_rt_size = 4
## The HTTP server options.
[http]
## The address to bind the HTTP server.
addr = "127.0.0.1:4000"
## HTTP request timeout.
## HTTP request timeout. Set to 0 to disable timeout.
timeout = "30s"
## HTTP request body limit.
## Support the following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## Set to 0 to disable limit.
body_limit = "64MB"
## The gRPC server options.
@@ -175,11 +176,7 @@ broker_endpoints = ["127.0.0.1:9092"]
## The max size of a single producer batch.
## Warning: Kafka has a default limit of 1MB per message in a topic.
## **It's only used when the provider is `kafka`**.
max_batch_size = "1MB"
## The linger duration of a kafka batch producer.
## **It's only used when the provider is `kafka`**.
linger = "200ms"
max_batch_bytes = "1MB"
## The consumer wait timeout.
## **It's only used when the provider is `kafka`**.
@@ -420,31 +417,72 @@ parallel_scan_channel_size = 32
## Whether to allow stale WAL entries read during replay.
allow_stale_entries = false
## The options for index in Mito engine.
[region_engine.mito.index]
## Auxiliary directory path for the index in filesystem, used to store intermediate files for
## creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.
## The default name for this directory is `index_intermediate` for backward compatibility.
##
## This path contains two subdirectories:
## - `__intm`: for storing intermediate files used during creating index.
## - `staging`: for storing staging files used during searching index.
aux_path = ""
## The max capacity of the staging directory.
staging_size = "2GB"
## The options for inverted index in Mito engine.
[region_engine.mito.inverted_index]
## Whether to create the index on flush.
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
create_on_flush = "auto"
## Whether to create the index on compaction.
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
create_on_compaction = "auto"
## Whether to apply the index on query
## - `auto`: automatically
## - `auto`: automatically (default)
## - `disable`: never
apply_on_query = "auto"
## Memory threshold for performing an external sort during index creation.
## Setting to empty will disable external sorting, forcing all sorting operations to happen in memory.
mem_threshold_on_create = "64M"
## - `auto`: automatically determine the threshold based on the system memory size (default)
## - `unlimited`: no memory limit
## - `[size]` e.g. `64MB`: fixed memory threshold
mem_threshold_on_create = "auto"
## File system path to store intermediate files for external sorting (default `{data_home}/index_intermediate`).
## Deprecated, use `region_engine.mito.index.aux_path` instead.
intermediate_path = ""
## The options for full-text index in Mito engine.
[region_engine.mito.fulltext_index]
## Whether to create the index on flush.
## - `auto`: automatically (default)
## - `disable`: never
create_on_flush = "auto"
## Whether to create the index on compaction.
## - `auto`: automatically (default)
## - `disable`: never
create_on_compaction = "auto"
## Whether to apply the index on query
## - `auto`: automatically (default)
## - `disable`: never
apply_on_query = "auto"
## Memory threshold for index creation.
## - `auto`: automatically determine the threshold based on the system memory size (default)
## - `unlimited`: no memory limit
## - `[size]` e.g. `64MB`: fixed memory threshold
mem_threshold_on_create = "auto"
[region_engine.mito.memtable]
## Memtable type.
## - `time_series`: time-series memtable

View File

@@ -1,5 +1,9 @@
FROM centos:7
# Note: CentOS 7 has reached EOL since 2024-07-01 thus `mirror.centos.org` is no longer available and we need to use `vault.centos.org` instead.
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
RUN yum install -y epel-release \
openssl \
openssl-devel \

View File

@@ -2,6 +2,10 @@ FROM centos:7 as builder
ENV LANG en_US.utf8
# Note: CentOS 7 has reached EOL since 2024-07-01 thus `mirror.centos.org` is no longer available and we need to use `vault.centos.org` instead.
RUN sed -i s/mirror.centos.org/vault.centos.org/g /etc/yum.repos.d/*.repo
RUN sed -i s/^#.*baseurl=http/baseurl=http/g /etc/yum.repos.d/*.repo
# Install dependencies
RUN ulimit -n 1024000 && yum groupinstall -y 'Development Tools'
RUN yum install -y epel-release \
@@ -25,6 +29,10 @@ ENV PATH /opt/rh/rh-python38/root/usr/bin:/usr/local/bin:/root/.cargo/bin/:$PATH
ARG RUST_TOOLCHAIN
RUN rustup toolchain install ${RUST_TOOLCHAIN}
# Install cargo-binstall with a specific version to adapt the current rust toolchain.
# Note: if we use the latest version, we may encounter the following `use of unstable library feature 'io_error_downcast'` error.
RUN cargo install cargo-binstall --version 1.6.6 --locked
# Install nextest.
RUN cargo install cargo-binstall --locked
RUN cargo binstall cargo-nextest --no-confirm

View File

@@ -55,6 +55,9 @@ ENV PATH /root/.cargo/bin/:$PATH
ARG RUST_TOOLCHAIN
RUN rustup toolchain install ${RUST_TOOLCHAIN}
# Install cargo-binstall with a specific version to adapt the current rust toolchain.
# Note: if we use the latest version, we may encounter the following `use of unstable library feature 'io_error_downcast'` error.
RUN cargo install cargo-binstall --version 1.6.6 --locked
# Install nextest.
RUN cargo install cargo-binstall --locked
RUN cargo binstall cargo-nextest --no-confirm

View File

@@ -43,6 +43,9 @@ ENV PATH /root/.cargo/bin/:$PATH
ARG RUST_TOOLCHAIN
RUN rustup toolchain install ${RUST_TOOLCHAIN}
# Install cargo-binstall with a specific version to adapt the current rust toolchain.
# Note: if we use the latest version, we may encounter the following `use of unstable library feature 'io_error_downcast'` error.
RUN cargo install cargo-binstall --version 1.6.6 --locked
# Install nextest.
RUN cargo install cargo-binstall --locked
RUN cargo binstall cargo-nextest --no-confirm

View File

@@ -66,7 +66,7 @@ services:
- --node-id=0
- --rpc-addr=0.0.0.0:3001
- --rpc-hostname=datanode0:3001
- --metasrv-addr=metasrv:3002
- --metasrv-addrs=metasrv:3002
volumes:
- /tmp/greptimedb-cluster-docker-compose/datanode0:/tmp/greptimedb
depends_on:

View File

@@ -0,0 +1,253 @@
# How to run TSBS Benchmark
This document contains the steps to run TSBS Benchmark. Our results are listed in other files in the same directory.
## Prerequisites
You need the following tools to run TSBS Benchmark:
- Go
- git
- make
- rust (optional, if you want to build the DB from source)
## Build TSBS suite
Clone our fork of TSBS:
```shell
git clone https://github.com/GreptimeTeam/tsbs.git
```
Then build it:
```shell
cd tsbs
make
```
You can check the `bin/` directory for compiled binaries. We will only use some of them.
```shell
ls ./bin/
```
Binaries we will use later:
- `tsbs_generate_data`
- `tsbs_generate_queries`
- `tsbs_load_greptime`
- `tsbs_run_queries_influx`
## Generate test data and queries
The data is generated by `tsbs_generate_data`
```shell
mkdir bench-data
./bin/tsbs_generate_data --use-case="cpu-only" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:00Z" \
--log-interval="10s" --format="influx" \
> ./bench-data/influx-data.lp
```
Here we generate 4000 time series spanning 3 days at a 10s interval. We'll write the data via the InfluxDB line protocol, so the target format is `influx`.
Queries are generated by `tsbs_generate_queries`. You can change the parameters, but make sure they match the ones used with `tsbs_generate_data`.
```shell
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type cpu-max-all-1 \
--format="greptime" \
> ./bench-data/greptime-queries-cpu-max-all-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type cpu-max-all-8 \
--format="greptime" \
> ./bench-data/greptime-queries-cpu-max-all-8.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=50 \
--query-type double-groupby-1 \
--format="greptime" \
> ./bench-data/greptime-queries-double-groupby-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=50 \
--query-type double-groupby-5 \
--format="greptime" \
> ./bench-data/greptime-queries-double-groupby-5.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=50 \
--query-type double-groupby-all \
--format="greptime" \
> ./bench-data/greptime-queries-double-groupby-all.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=50 \
--query-type groupby-orderby-limit \
--format="greptime" \
> ./bench-data/greptime-queries-groupby-orderby-limit.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type high-cpu-1 \
--format="greptime" \
> ./bench-data/greptime-queries-high-cpu-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=50 \
--query-type high-cpu-all \
--format="greptime" \
> ./bench-data/greptime-queries-high-cpu-all.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=10 \
--query-type lastpoint \
--format="greptime" \
> ./bench-data/greptime-queries-lastpoint.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-1-1-1 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-1-1-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-1-1-12 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-1-1-12.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-1-8-1 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-1-8-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-5-1-1 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-5-1-1.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-5-1-12 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-5-1-12.dat
./bin/tsbs_generate_queries \
--use-case="devops" --seed=123 --scale=4000 \
--timestamp-start="2023-06-11T00:00:00Z" \
--timestamp-end="2023-06-14T00:00:01Z" \
--queries=100 \
--query-type single-groupby-5-8-1 \
--format="greptime" \
> ./bench-data/greptime-queries-single-groupby-5-8-1.dat
```
## Start GreptimeDB
Refer to our [document](https://docs.greptime.com/getting-started/installation/overview) for how to install and start GreptimeDB, or check this [document](https://docs.greptime.com/contributor-guide/getting-started#compile-and-run) for how to build GreptimeDB from source.
## Write Data
After the DB is started, we can use `tsbs_load_greptime` to test the write performance.
```shell
./bin/tsbs_load_greptime \
--urls=http://localhost:4000 \
--file=./bench-data/influx-data.lp \
--batch-size=3000 \
--gzip=false \
--workers=6
```
The parameters here are only an example; adjust them to match your target scenario.
Note that if you want to rerun `tsbs_load_greptime`, stop the DB, clear its previous data, and restart it first. Existing duplicated data will affect both write and query performance.
## Query Data
After the data is imported, you can run the queries. The following script runs all of them; you can also choose a subset to run.
```shell
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-cpu-max-all-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-cpu-max-all-8.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-double-groupby-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-double-groupby-5.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-double-groupby-all.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-groupby-orderby-limit.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-high-cpu-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-high-cpu-all.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-lastpoint.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-1-1-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-1-1-12.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-1-8-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-5-1-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-5-1-12.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
./bin/tsbs_run_queries_influx --file=./bench-data/greptime-queries-single-groupby-5-8-1.dat \
--db-name=benchmark \
--urls="http://localhost:4000"
```
Rerunning queries does not require re-importing the data; just execute the corresponding command again.

View File

@@ -1,527 +0,0 @@
# Schema Structs
# Common Schemas
The `datatypes` crate defines the elementary schema struct to describe the metadata.
## ColumnSchema
[ColumnSchema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/datatypes/src/schema/column_schema.rs#L36) represents the metadata of a column. It is equivalent to arrow's [Field](https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html) with additional metadata such as default constraint and whether the column is a time index. The time index is the column with a `TIME INDEX` constraint of a table. We can convert the `ColumnSchema` into an arrow `Field` and convert the `Field` back to the `ColumnSchema` without losing metadata.
```rust
pub struct ColumnSchema {
pub name: String,
pub data_type: ConcreteDataType,
is_nullable: bool,
is_time_index: bool,
default_constraint: Option<ColumnDefaultConstraint>,
metadata: Metadata,
}
```
## Schema
[Schema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/datatypes/src/schema.rs#L38) is an ordered sequence of `ColumnSchema`. It is equivalent to arrow's [Schema](https://docs.rs/arrow/latest/arrow/datatypes/struct.Schema.html) with additional metadata including the index of the time index column and the version of this schema. Same as `ColumnSchema`, we can convert our `Schema` from/to arrow's `Schema`.
```rust
use arrow::datatypes::Schema as ArrowSchema;
pub struct Schema {
column_schemas: Vec<ColumnSchema>,
name_to_index: HashMap<String, usize>,
arrow_schema: Arc<ArrowSchema>,
timestamp_index: Option<usize>,
version: u32,
}
pub type SchemaRef = Arc<Schema>;
```
We alias `Arc<Schema>` as `SchemaRef` since it is used frequently. Mostly, we use our `ColumnSchema` and `Schema` structs instead of Arrow's `Field` and `Schema` unless we need to invoke third-party libraries (like DataFusion or ArrowFlight) that rely on Arrow.
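As a quick illustration, here is a minimal sketch that builds a `Schema` from two `ColumnSchema`s and reads back the cached arrow schema. The constructor and accessor names (`Schema::new`, `arrow_schema()`, `ConcreteDataType::timestamp_millisecond_datatype()`) are assumed from the struct definitions here and the conversion figure at the end of this document, not verified against a specific revision.
```rust
// Sketch only: constructor/accessor signatures are assumptions, not verified API.
use std::sync::Arc;
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};

fn build_schema() -> SchemaRef {
    let column_schemas = vec![
        // The time index column is not nullable.
        ColumnSchema::new(
            "ts",
            ConcreteDataType::timestamp_millisecond_datatype(),
            false,
        )
        .with_time_index(true),
        ColumnSchema::new("host", ConcreteDataType::string_datatype(), true),
    ];
    let schema = Arc::new(Schema::new(column_schemas));
    // The cached arrow schema; per the conversion figure, `TryFrom` converts it back
    // into a `Schema` without losing the time index or version.
    let _arrow = schema.arrow_schema().clone();
    schema
}
```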
## RawSchema
`Schema` contains derived fields such as a map from column names to their indices in the `ColumnSchema` sequence and a cached arrow `Schema`. These fields can be reconstructed from the `ColumnSchema` sequence, so we don't need to serialize them. This is why we don't derive `Serialize` and `Deserialize` for `Schema`. Instead, we introduce a new struct [RawSchema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/datatypes/src/schema/raw.rs#L24) which keeps all required fields of a `Schema` and derives the serialization traits. To serialize a `Schema`, we convert it into a `RawSchema` first and serialize the `RawSchema`.
```rust
pub struct RawSchema {
pub column_schemas: Vec<ColumnSchema>,
pub timestamp_index: Option<usize>,
pub version: u32,
}
```
We want to keep the `Schema` simple and avoid putting too much business-related metadata in it as many different structs or traits rely on it.
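A minimal sketch of the round trip, under the assumption that `Schema` implements `TryFrom<RawSchema>` as the conversion figure at the end of this document shows; the field names come from the struct definition above.
```rust
// Sketch only: the TryFrom conversion and exact error type are assumptions.
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, RawSchema, Schema};

fn serialize_roundtrip() -> Result<(), Box<dyn std::error::Error>> {
    let raw = RawSchema {
        column_schemas: vec![
            ColumnSchema::new(
                "ts",
                ConcreteDataType::timestamp_millisecond_datatype(),
                false,
            )
            .with_time_index(true),
            ColumnSchema::new("host", ConcreteDataType::string_datatype(), true),
        ],
        timestamp_index: Some(0),
        version: 0,
    };

    // RawSchema derives Serialize/Deserialize, so it can be persisted directly.
    let json = serde_json::to_string(&raw)?;
    let restored: RawSchema = serde_json::from_str(&json)?;

    // Rebuild the full Schema (name-to-index map, cached arrow schema) from the raw parts.
    let _schema = Schema::try_from(restored)?;
    Ok(())
}
```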
# Schema of the Table
A table maintains its schema in [TableMeta](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/table/src/metadata.rs#L97).
```rust
pub struct TableMeta {
pub schema: SchemaRef,
pub primary_key_indices: Vec<usize>,
pub value_indices: Vec<usize>,
// ...
}
```
The order of columns in `TableMeta::schema` is the same as the order specified in the `CREATE TABLE` statement used to create the table.
The field `primary_key_indices` stores the indices of the primary key columns. The field `value_indices` records the indices of the value columns (columns that are neither primary key nor time index; we sometimes call them field columns). A small sketch of how these indices are derived follows the example below.
Suppose we create a table with the following SQL:
```sql
CREATE TABLE cpu (
ts TIMESTAMP,
host STRING,
usage_user DOUBLE,
usage_system DOUBLE,
datacenter STRING,
TIME INDEX (ts),
PRIMARY KEY(datacenter, host)) ENGINE=mito;
```
Then the table's `TableMeta` may look like this:
```json
{
"schema":{
"column_schemas":[
"ts",
"host",
"usage_user",
"usage_system",
"datacenter"
],
"time_index":0,
"version":0
},
"primary_key_indices":[
4,
1
],
"value_indices":[
2,
3
]
}
```
# Schemas of the storage engine
We split a table into one or more units with the same schema and then store these units in the storage engine. Each unit is a region in the storage engine.
The storage engine maintains schemas of regions in more complicated ways because it
- adds internal columns that are invisible to users to store additional metadata for each row
- provides a data model similar to the key-value model so it organizes columns in a different order
- maintains additional metadata like column id or column family
So the storage engine defines several schema structs:
- RegionSchema
- StoreSchema
- ProjectedSchema
## RegionSchema
A [RegionSchema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/storage/src/schema/region.rs#L37) describes the schema of a region.
```rust
pub struct RegionSchema {
user_schema: SchemaRef,
store_schema: StoreSchemaRef,
columns: ColumnsMetadataRef,
}
```
Each region reserves some columns called `internal columns` for internal usage:
- `__sequence`, sequence number of a row
- `__op_type`, operation type of a row, such as `PUT` or `DELETE`
- `__version`, user-specified version of a row; reserved but not used (we might remove this in the future)
The table engine can't see the `__sequence` and `__op_type` columns, so the `RegionSchema` itself maintains two internal schemas:
- User schema, a `Schema` struct that doesn't have internal columns
- Store schema, a `StoreSchema` struct that has internal columns
The `ColumnsMetadata` struct keeps metadata about all columns, but most of the time we only need the metadata in the user schema and store schema, so we just ignore it here. We may remove this struct in the future.
`RegionSchema` organizes columns in the following order:
```
key columns, timestamp, [__version,] value columns, __sequence, __op_type
```
We can ignore the `__version` column because it is disabled now:
```
key columns, timestamp, value columns, __sequence, __op_type
```
Key columns are columns of a table's primary key. Timestamp is the time index column. A region sorts all rows by key columns, timestamp, sequence, and op type.
So the `RegionSchema` of our `cpu` table above looks like this:
```json
{
"user_schema":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system"
],
"store_schema":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system",
"__sequence",
"__op_type"
]
}
```
## StoreSchema
As described above, a [StoreSchema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/storage/src/schema/store.rs#L36) is a schema that knows all internal columns.
```rust
struct StoreSchema {
columns: Vec<ColumnMetadata>,
schema: SchemaRef,
row_key_end: usize,
user_column_end: usize,
}
```
The columns in the `columns` and `schema` fields have the same order. The `ColumnMetadata` has metadata like column id, column family id, and comment. The `StoreSchema` also stores this metadata in `StoreSchema::schema`, so we can convert the `StoreSchema` to and from arrow's `Schema`. We use this feature to persist the `StoreSchema` in the SST since our SST format is `Parquet`, which can take arrow's `Schema` as its schema.
The `StoreSchema` of the region above is similar to this:
```json
{
"schema":{
"column_schemas":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system",
"__sequence",
"__op_type"
],
"time_index":2,
"version":0
},
"row_key_end":3,
"user_column_end":5
}
```
The key columns and the timestamp column form the row key of a row. We put them together so we can use `row_key_end` to get the indices of all row key columns. Similarly, we can use `user_column_end` to get the indices of all user columns (non-internal columns).
```rust
impl StoreSchema {
#[inline]
pub(crate) fn row_key_indices(&self) -> impl Iterator<Item = usize> {
0..self.row_key_end
}
#[inline]
pub(crate) fn value_indices(&self) -> impl Iterator<Item = usize> {
self.row_key_end..self.user_column_end
}
}
```
Another useful feature of `StoreSchema` is that we ensure it always contains key columns, a timestamp column, and internal columns because we need them to perform merge, deduplication, and delete. Projection on `StoreSchema` only projects value columns.
## ProjectedSchema
To support arbitrary projection, we introduce the [ProjectedSchema](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/storage/src/schema/projected.rs#L106).
```rust
pub struct ProjectedSchema {
projection: Option<Projection>,
schema_to_read: StoreSchemaRef,
projected_user_schema: SchemaRef,
}
```
We need to handle many cases while doing projection:
- The column order of the table and the region is different
- The projection can be in arbitrary order, e.g. `select usage_user, host from cpu` and `select host, usage_user from cpu` have different projection order
- We support `ALTER TABLE` so data files may have different schemas.
### Projection
Let's take an example to see how projection works. Suppose we want to select `ts`, `usage_system` from the `cpu` table.
```sql
CREATE TABLE cpu (
ts TIMESTAMP,
host STRING,
usage_user DOUBLE,
usage_system DOUBLE,
datacenter STRING,
TIME INDEX (ts),
PRIMARY KEY(datacenter, host)) ENGINE=mito;
select ts, usage_system from cpu;
```
The query engine uses the projection `[0, 3]` to scan the table. However, columns in the region have a different order, so the table engine adjusts the projection to `[2, 4]` (a small sketch of this remapping appears after the `ProjectedSchema` example below).
```json
{
"user_schema":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system"
],
}
```
As you can see, the output order is still `[ts, usage_system]`. This is the schema users can see after projection so we call it `projected user schema`.
But the storage engine also needs to read key columns, a timestamp column, and internal columns. So we maintain a `StoreSchema` after projection in the `ProjectedSchema`.
The `Projection` struct is a helper struct to help compute the projected user schema and store schema.
So we can construct the following `ProjectedSchema`:
```json
{
"schema_to_read":{
"schema":{
"column_schemas":[
"datacenter",
"host",
"ts",
"usage_system",
"__sequence",
"__op_type"
],
"time_index":2,
"version":0
},
"row_key_end":3,
"user_column_end":4
},
"projected_user_schema":{
"column_schemas":[
"ts",
"usage_system"
],
"time_index":0
}
}
```
As you can see, `schema_to_read` doesn't contain the column `usage_user` that is not intended to be read (not in projection).
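The remapping from the table's projection `[0, 3]` to the region's `[2, 4]` described above boils down to a lookup by column name; an illustrative sketch (not the actual `Projection` implementation):
```rust
// Illustrative only: remap a projection expressed against the table's column order
// to the region's column order by matching column names.
fn remap_projection(
    table_columns: &[&str],
    region_columns: &[&str],
    projection: &[usize],
) -> Vec<usize> {
    projection
        .iter()
        .map(|&i| {
            let name = table_columns[i];
            region_columns
                .iter()
                .position(|c| *c == name)
                .expect("projected column must exist in the region")
        })
        .collect()
}

// remap_projection(
//     &["ts", "host", "usage_user", "usage_system", "datacenter"],
//     &["datacenter", "host", "ts", "usage_user", "usage_system"],
//     &[0, 3],
// ) == vec![2, 4]
```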
### ReadAdapter
As mentioned above, we can alter a table so the underlying files (SSTs) and memtables in the storage engine may have different schemas.
To simplify the logic of `ProjectedSchema`, we handle the differences between schemas before projection, i.e., before constructing the `ProjectedSchema`. We introduce a [ReadAdapter](https://github.com/GreptimeTeam/greptimedb/blob/9fa871a3fad07f583dc1863a509414da393747f8/src/storage/src/schema/compat.rs#L90) that adapts rows with different source schemas to the same expected schema.
So we can always use the current `RegionSchema` of the region to construct the `ProjectedSchema`, and then create a `ReadAdapter` for each memtable or SST.
```rust
#[derive(Debug)]
pub struct ReadAdapter {
source_schema: StoreSchemaRef,
dest_schema: ProjectedSchemaRef,
indices_in_result: Vec<Option<usize>>,
is_source_needed: Vec<bool>,
}
```
For each column required by `dest_schema`, `indices_in_result` stores the index of that column in the row read from the source memtable or SST. If the source row doesn't contain that column, the index is `None`.
The field `is_source_needed` stores whether a column in the source memtable or SST is needed.
Suppose we add a new column `usage_idle` to the table `cpu`.
```sql
ALTER TABLE cpu ADD COLUMN usage_idle DOUBLE;
```
The new `StoreSchema` becomes:
```json
{
"schema":{
"column_schemas":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system",
"usage_idle",
"__sequence",
"__op_type"
],
"time_index":2,
"version":1
},
"row_key_end":3,
"user_column_end":6
}
```
Note that we bump the version of the schema to 1.
Suppose we want to select `ts`, `usage_system`, and `usage_idle`. While reading data written with the old schema, the storage engine creates a `ReadAdapter` like this:
```json
{
"source_schema":{
"schema":{
"column_schemas":[
"datacenter",
"host",
"ts",
"usage_user",
"usage_system",
"__sequence",
"__op_type"
],
"time_index":2,
"version":0
},
"row_key_end":3,
"user_column_end":5
},
"dest_schema":{
"schema_to_read":{
"schema":{
"column_schemas":[
"datacenter",
"host",
"ts",
"usage_system",
"usage_idle",
"__sequence",
"__op_type"
],
"time_index":2,
"version":1
},
"row_key_end":3,
"user_column_end":5
},
"projected_user_schema":{
"column_schemas":[
"ts",
"usage_system",
"usage_idle"
],
"time_index":0
}
},
"indices_in_result":[
0,
1,
2,
3,
null,
4,
5
],
"is_source_needed":[
true,
true,
true,
false,
true,
true,
true
]
}
```
We don't need to read `usage_user`, so `is_source_needed[3]` is false. The old schema doesn't have the column `usage_idle`, so `indices_in_result[4]` is `null` and the `ReadAdapter` needs to insert a null column into the output row so that the output schema still contains `usage_idle`.
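A sketch of that adaptation step with a toy value type, illustrative only rather than the actual `ReadAdapter` code:
```rust
// Illustrative only: map a row read with the source schema into a row matching the
// destination schema, filling nulls for columns the source does not have.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Null,
    Str(String),
    F64(f64),
}

fn adapt_row(source_row: &[Value], indices_in_result: &[Option<usize>]) -> Vec<Value> {
    indices_in_result
        .iter()
        .map(|idx| match idx {
            // The destination column exists in the source row: copy it over.
            Some(i) => source_row[*i].clone(),
            // Column missing in the old schema (e.g. `usage_idle`): fill with null.
            None => Value::Null,
        })
        .collect()
}

// With indices_in_result = [Some(0), Some(1), Some(2), Some(3), None, Some(4), Some(5)],
// a 6-column source row becomes a 7-column output row with a null at position 4 (`usage_idle`).
```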
The figure below shows the relationship between `RegionSchema`, `StoreSchema`, `ProjectedSchema`, and `ReadAdapter`.
```text
┌──────────────────────────────┐
│ │
│ ┌────────────────────┐ │
│ │ store_schema │ │
│ │ │ │
│ │ StoreSchema │ │
│ │ version 1 │ │
│ └────────────────────┘ │
│ │
│ ┌────────────────────┐ │
│ │ user_schema │ │
│ └────────────────────┘ │
│ │
│ RegionSchema │
│ │
└──────────────┬───────────────┘
┌──────────────▼───────────────┐
│ │
│ ┌──────────────────────────┐ │
│ │ schema_to_read │ │
│ │ │ │
│ │ StoreSchema (projected) │ │
│ │ version 1 │ │
│ └──────────────────────────┘ │
┌───┤ ├───┐
│ │ ┌──────────────────────────┐ │ │
│ │ │ projected_user_schema │ │ │
│ │ └──────────────────────────┘ │ │
│ │ │ │
│ │ ProjectedSchema │ │
dest schema │ └──────────────────────────────┘ │ dest schema
│ │
│ │
┌──────▼───────┐ ┌───────▼──────┐
│ │ │ │
│ ReadAdapter │ │ ReadAdapter │
│ │ │ │
└──────▲───────┘ └───────▲──────┘
│ │
│ │
source schema │ │ source schema
│ │
┌───────┴─────────┐ ┌────────┴────────┐
│ │ │ │
│ ┌─────────────┐ │ │ ┌─────────────┐ │
│ │ │ │ │ │ │ │
│ │ StoreSchema │ │ │ │ StoreSchema │ │
│ │ │ │ │ │ │ │
│ │ version 0 │ │ │ │ version 1 │ │
│ │ │ │ │ │ │ │
│ └─────────────┘ │ │ └─────────────┘ │
│ │ │ │
│ SST 0 │ │ SST 1 │
│ │ │ │
└─────────────────┘ └─────────────────┘
```
# Conversion
This figure shows the conversion between schemas:
```text
┌─────────────┐ schema From ┌─────────────┐
│ ├──────────────────┐ ┌────────────────────────────► │
│ TableMeta │ │ │ │ RawSchema │
│ │ │ │ ┌─────────────────────────┤ │
└─────────────┘ │ │ │ TryFrom └─────────────┘
│ │ │
│ │ │
│ │ │
│ │ │
│ │ │
┌───────────────────┐ ┌─────▼──┴──▼──┐ arrow_schema() ┌─────────────────┐
│ │ │ ├─────────────────────► │
│ ColumnsMetadata │ ┌─────► Schema │ │ ArrowSchema ├──┐
│ │ │ │ ◄─────────────────────┤ │ │
└────┬───────────▲──┘ │ └───▲───▲──────┘ TryFrom └─────────────────┘ │
│ │ │ │ │ │
│ │ │ │ └────────────────────────────────────────┐ │
│ │ │ │ │ │
│ columns │ user_schema() │ │ │
│ │ │ │ projected_user_schema() schema() │
│ │ │ │ │ │
│ ┌───┴─────────────┴─┐ │ ┌────────────────────┐ │ │
columns │ │ │ └─────────────────┤ │ │ │ TryFrom
│ │ RegionSchema │ │ ProjectedSchema │ │ │
│ │ ├─────────────────────────► │ │ │
│ └─────────────────┬─┘ ProjectedSchema::new() └──────────────────┬─┘ │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
┌────▼────────────────────┐ │ store_schema() ┌────▼───────┴──┐ │
│ │ └─────────────────────────────────────────► │ │
│ Vec<ColumnMetadata> │ │ StoreSchema ◄─────┘
│ ◄──────────────────────────────────────────────┤ │
└─────────────────────────┘ columns └───────────────┘
```

View File

@@ -17,10 +17,11 @@ datatypes.workspace = true
greptime-proto.workspace = true
paste = "1.0"
prost.workspace = true
serde_json.workspace = true
snafu.workspace = true
[build-dependencies]
tonic-build = "0.9"
tonic-build = "0.11"
[dev-dependencies]
paste = "1.0"

View File

@@ -58,13 +58,23 @@ pub enum Error {
location: Location,
source: datatypes::error::Error,
},
#[snafu(display("Failed to serialize JSON"))]
SerializeJson {
#[snafu(source)]
error: serde_json::Error,
#[snafu(implicit)]
location: Location,
},
}
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
match self {
Error::UnknownColumnDataType { .. } => StatusCode::InvalidArguments,
Error::IntoColumnDataType { .. } => StatusCode::Unexpected,
Error::IntoColumnDataType { .. } | Error::SerializeJson { .. } => {
StatusCode::Unexpected
}
Error::ConvertColumnDefaultConstraint { source, .. }
| Error::InvalidColumnDefaultConstraint { source, .. } => source.status_code(),
}

View File

@@ -1843,6 +1843,7 @@ mod tests {
null_mask: vec![2],
datatype: ColumnDataType::Boolean as i32,
datatype_extension: None,
options: None,
};
assert!(is_column_type_value_eq(
column1.datatype,

View File

@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#![feature(let_chains)]
pub mod error;
pub mod helper;

View File

@@ -14,13 +14,19 @@
use std::collections::HashMap;
use datatypes::schema::{ColumnDefaultConstraint, ColumnSchema, COMMENT_KEY};
use datatypes::schema::{
ColumnDefaultConstraint, ColumnSchema, FulltextOptions, COMMENT_KEY, FULLTEXT_KEY,
};
use snafu::ResultExt;
use crate::error::{self, Result};
use crate::helper::ColumnDataTypeWrapper;
use crate::v1::ColumnDef;
use crate::v1::{ColumnDef, ColumnOptions, SemanticType};
/// Key used to store fulltext options in gRPC column options.
const FULLTEXT_GRPC_KEY: &str = "fulltext";
/// Tries to construct a `ColumnSchema` from the given `ColumnDef`.
pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
let data_type = ColumnDataTypeWrapper::try_new(
column_def.data_type,
@@ -43,13 +49,147 @@ pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
if !column_def.comment.is_empty() {
metadata.insert(COMMENT_KEY.to_string(), column_def.comment.clone());
}
if let Some(options) = column_def.options.as_ref()
&& let Some(fulltext) = options.options.get(FULLTEXT_GRPC_KEY)
{
metadata.insert(FULLTEXT_KEY.to_string(), fulltext.to_string());
}
Ok(
ColumnSchema::new(&column_def.name, data_type.into(), column_def.is_nullable)
.with_default_constraint(constraint)
.context(error::InvalidColumnDefaultConstraintSnafu {
column: &column_def.name,
})?
.with_metadata(metadata),
)
ColumnSchema::new(&column_def.name, data_type.into(), column_def.is_nullable)
.with_metadata(metadata)
.with_time_index(column_def.semantic_type() == SemanticType::Timestamp)
.with_default_constraint(constraint)
.context(error::InvalidColumnDefaultConstraintSnafu {
column: &column_def.name,
})
}
/// Constructs a `ColumnOptions` from the given `ColumnSchema`.
pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<ColumnOptions> {
let mut options = ColumnOptions::default();
if let Some(fulltext) = column_schema.metadata().get(FULLTEXT_KEY) {
options
.options
.insert(FULLTEXT_GRPC_KEY.to_string(), fulltext.to_string());
}
(!options.options.is_empty()).then_some(options)
}
/// Checks if the `ColumnOptions` contains fulltext options.
pub fn contains_fulltext(options: &Option<ColumnOptions>) -> bool {
options
.as_ref()
.map_or(false, |o| o.options.contains_key(FULLTEXT_GRPC_KEY))
}
/// Tries to construct a `ColumnOptions` from the given `FulltextOptions`.
pub fn options_from_fulltext(fulltext: &FulltextOptions) -> Result<Option<ColumnOptions>> {
let mut options = ColumnOptions::default();
let v = serde_json::to_string(fulltext).context(error::SerializeJsonSnafu)?;
options.options.insert(FULLTEXT_GRPC_KEY.to_string(), v);
Ok((!options.options.is_empty()).then_some(options))
}
#[cfg(test)]
mod tests {
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::FulltextAnalyzer;
use super::*;
use crate::v1::ColumnDataType;
#[test]
fn test_try_as_column_schema() {
let column_def = ColumnDef {
name: "test".to_string(),
data_type: ColumnDataType::String as i32,
is_nullable: true,
default_constraint: ColumnDefaultConstraint::Value("test_default".into())
.try_into()
.unwrap(),
semantic_type: SemanticType::Field as i32,
comment: "test_comment".to_string(),
datatype_extension: None,
options: Some(ColumnOptions {
options: HashMap::from([(
FULLTEXT_GRPC_KEY.to_string(),
"{\"enable\":true}".to_string(),
)]),
}),
};
let schema = try_as_column_schema(&column_def).unwrap();
assert_eq!(schema.name, "test");
assert_eq!(schema.data_type, ConcreteDataType::string_datatype());
assert!(!schema.is_time_index());
assert!(schema.is_nullable());
assert_eq!(
schema.default_constraint().unwrap(),
&ColumnDefaultConstraint::Value("test_default".into())
);
assert_eq!(schema.metadata().get(COMMENT_KEY).unwrap(), "test_comment");
assert_eq!(
schema.fulltext_options().unwrap().unwrap(),
FulltextOptions {
enable: true,
..Default::default()
}
);
}
#[test]
fn test_options_from_column_schema() {
let schema = ColumnSchema::new("test", ConcreteDataType::string_datatype(), true);
let options = options_from_column_schema(&schema);
assert!(options.is_none());
let schema = ColumnSchema::new("test", ConcreteDataType::string_datatype(), true)
.with_fulltext_options(FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
})
.unwrap();
let options = options_from_column_schema(&schema).unwrap();
assert_eq!(
options.options.get(FULLTEXT_GRPC_KEY).unwrap(),
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false}"
);
}
#[test]
fn test_options_with_fulltext() {
let fulltext = FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
};
let options = options_from_fulltext(&fulltext).unwrap().unwrap();
assert_eq!(
options.options.get(FULLTEXT_GRPC_KEY).unwrap(),
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false}"
);
}
#[test]
fn test_contains_fulltext() {
let options = ColumnOptions {
options: HashMap::from([(
FULLTEXT_GRPC_KEY.to_string(),
"{\"enable\":true}".to_string(),
)]),
};
assert!(contains_fulltext(&Some(options)));
let options = ColumnOptions {
options: HashMap::new(),
};
assert!(!contains_fulltext(&Some(options)));
assert!(!contains_fulltext(&None));
}
}
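As an additional illustration beyond the tests above, here is a minimal sketch of round-tripping fulltext options between a `ColumnSchema` and the gRPC `ColumnOptions` using the new helpers. The module path of the helpers is assumed for illustration (it is not shown in this diff); the types and function signatures are taken from the code above.

// Module path assumed for illustration; the helpers are the ones added in this diff.
use api::v1::column_def::{contains_fulltext, options_from_column_schema, options_from_fulltext};
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, FulltextAnalyzer, FulltextOptions};

fn roundtrip_fulltext_options() {
    let fulltext = FulltextOptions {
        enable: true,
        analyzer: FulltextAnalyzer::English,
        case_sensitive: false,
    };

    // FulltextOptions -> gRPC ColumnOptions: serialized as JSON under the fulltext key.
    let grpc_options = options_from_fulltext(&fulltext).unwrap();
    assert!(contains_fulltext(&grpc_options));

    // ColumnSchema -> gRPC ColumnOptions: only emitted when fulltext metadata is present.
    let schema = ColumnSchema::new("log", ConcreteDataType::string_datatype(), true)
        .with_fulltext_options(fulltext)
        .unwrap();
    assert!(contains_fulltext(&options_from_column_schema(&schema)));
}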

View File

@@ -30,6 +30,7 @@ pub enum PermissionReq<'a> {
PromStoreWrite,
PromStoreRead,
Otlp,
LogWrite,
}
#[derive(Debug)]
@@ -41,7 +42,7 @@ pub enum PermissionResp {
pub trait PermissionChecker: Send + Sync {
fn check_permission(
&self,
user_info: Option<UserInfoRef>,
user_info: UserInfoRef,
req: PermissionReq,
) -> Result<PermissionResp>;
}
@@ -49,7 +50,7 @@ pub trait PermissionChecker: Send + Sync {
impl PermissionChecker for Option<&PermissionCheckerRef> {
fn check_permission(
&self,
user_info: Option<UserInfoRef>,
user_info: UserInfoRef,
req: PermissionReq,
) -> Result<PermissionResp> {
match self {

View File

@@ -27,7 +27,7 @@ struct DummyPermissionChecker;
impl PermissionChecker for DummyPermissionChecker {
fn check_permission(
&self,
_user_info: Option<UserInfoRef>,
_user_info: UserInfoRef,
req: PermissionReq,
) -> auth::error::Result<PermissionResp> {
match req {
@@ -45,17 +45,21 @@ fn test_permission_checker() {
let checker: PermissionCheckerRef = Arc::new(DummyPermissionChecker);
let grpc_result = checker.check_permission(
None,
auth::userinfo_by_name(None),
PermissionReq::GrpcRequest(&Request::Query(Default::default())),
);
assert_matches!(grpc_result, Ok(PermissionResp::Allow));
let sql_result = checker.check_permission(
None,
PermissionReq::SqlStatement(&Statement::ShowDatabases(ShowDatabases::new(ShowKind::All))),
auth::userinfo_by_name(None),
PermissionReq::SqlStatement(&Statement::ShowDatabases(ShowDatabases::new(
ShowKind::All,
false,
))),
);
assert_matches!(sql_result, Ok(PermissionResp::Reject));
let err_result = checker.check_permission(None, PermissionReq::Opentsdb);
let err_result =
checker.check_permission(auth::userinfo_by_name(None), PermissionReq::Opentsdb);
assert_matches!(err_result, Err(InternalState { msg }) if msg == "testing");
}
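For reference, a minimal sketch of a call site under the new signature: the caller resolves a concrete `UserInfoRef` up front (here via `auth::userinfo_by_name`) instead of passing an `Option`. The import paths are assumed from the test above.

use auth::{PermissionChecker, PermissionCheckerRef, PermissionReq, PermissionResp, UserInfoRef};

/// Runs a permission check against an optional checker; `Option<&PermissionCheckerRef>`
/// itself implements `PermissionChecker`, as shown in the diff above.
fn authorize(
    checker: Option<&PermissionCheckerRef>,
    user_info: UserInfoRef,
    req: PermissionReq<'_>,
) -> auth::error::Result<PermissionResp> {
    checker.check_permission(user_info, req)
}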

View File

@@ -27,6 +27,7 @@ use datatypes::prelude::{ConcreteDataType, MutableVector, ScalarVectorBuilder, V
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::value::Value;
use datatypes::vectors::{ConstantVector, StringVector, StringVectorBuilder, UInt32VectorBuilder};
use futures_util::TryStreamExt;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{ScanRequest, TableId};
@@ -211,71 +212,58 @@ impl InformationSchemaKeyColumnUsageBuilder {
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let predicates = Predicates::from_scan_request(&request);
let mut primary_constraints = vec![];
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
if !catalog_manager
.schema_exists(&catalog_name, &schema_name)
.await?
{
continue;
}
let mut stream = catalog_manager.tables(&catalog_name, &schema_name);
for table_name in catalog_manager
.table_names(&catalog_name, &schema_name)
.await?
{
if let Some(table) = catalog_manager
.table(&catalog_name, &schema_name, &table_name)
.await?
{
let keys = &table.table_info().meta.primary_key_indices;
let schema = table.schema();
while let Some(table) = stream.try_next().await? {
let mut primary_constraints = vec![];
for (idx, column) in schema.column_schemas().iter().enumerate() {
if column.is_time_index() {
self.add_key_column_usage(
&predicates,
&schema_name,
TIME_INDEX_CONSTRAINT_NAME,
&catalog_name,
&schema_name,
&table_name,
&column.name,
1, // always 1 for time index
);
}
if keys.contains(&idx) {
primary_constraints.push((
catalog_name.clone(),
schema_name.clone(),
table_name.clone(),
column.name.clone(),
));
}
// TODO(dimbtp): foreign key constraint not supported yet
let table_info = table.table_info();
let table_name = &table_info.name;
let keys = &table_info.meta.primary_key_indices;
let schema = table.schema();
for (idx, column) in schema.column_schemas().iter().enumerate() {
if column.is_time_index() {
self.add_key_column_usage(
&predicates,
&schema_name,
TIME_INDEX_CONSTRAINT_NAME,
&catalog_name,
&schema_name,
table_name,
&column.name,
1, // always 1 for time index
);
}
} else {
unreachable!();
if keys.contains(&idx) {
primary_constraints.push((
catalog_name.clone(),
schema_name.clone(),
table_name.to_string(),
column.name.clone(),
));
}
// TODO(dimbtp): foreign key constraint not supported yet
}
for (i, (catalog_name, schema_name, table_name, column_name)) in
primary_constraints.into_iter().enumerate()
{
self.add_key_column_usage(
&predicates,
&schema_name,
PRI_CONSTRAINT_NAME,
&catalog_name,
&schema_name,
&table_name,
&column_name,
i as u32 + 1,
);
}
}
}
for (i, (catalog_name, schema_name, table_name, column_name)) in
primary_constraints.into_iter().enumerate()
{
self.add_key_column_usage(
&predicates,
&schema_name,
PRI_CONSTRAINT_NAME,
&catalog_name,
&schema_name,
&table_name,
&column_name,
i as u32 + 1,
);
}
self.finish()
}

View File

@@ -31,7 +31,7 @@ use datatypes::value::Value;
use datatypes::vectors::{Int64VectorBuilder, StringVectorBuilder, UInt64VectorBuilder};
use futures::{StreamExt, TryStreamExt};
use snafu::{OptionExt, ResultExt};
use store_api::storage::{ScanRequest, TableId};
use store_api::storage::{RegionId, ScanRequest, TableId};
use table::metadata::TableType;
use super::REGION_PEERS;
@@ -205,8 +205,8 @@ impl InformationSchemaRegionPeersBuilder {
table_ids.into_iter().map(|id| (id, vec![])).collect()
};
for routes in table_routes.values() {
self.add_region_peers(&predicates, routes);
for (table_id, routes) in table_routes {
self.add_region_peers(&predicates, table_id, &routes);
}
}
}
@@ -214,9 +214,14 @@ impl InformationSchemaRegionPeersBuilder {
self.finish()
}
fn add_region_peers(&mut self, predicates: &Predicates, routes: &[RegionRoute]) {
fn add_region_peers(
&mut self,
predicates: &Predicates,
table_id: TableId,
routes: &[RegionRoute],
) {
for route in routes {
let region_id = route.region.id.as_u64();
let region_id = RegionId::new(table_id, route.region.id.region_number()).as_u64();
let peer_id = route.leader_peer.clone().map(|p| p.id);
let peer_addr = route.leader_peer.clone().map(|p| p.addr);
let status = if let Some(status) = route.leader_status {
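The key change above is that the region id exposed in `region_peers` is rebuilt from the table id being listed rather than taken verbatim from the route. A small sketch of that composition, using only the `RegionId` methods visible in this diff:

use store_api::storage::{RegionId, TableId};

/// Rebuilds the 64-bit region id under the listed table's id, keeping only the
/// region number carried by the table route.
fn region_id_for_listing(table_id: TableId, route_region_id: RegionId) -> u64 {
    RegionId::new(table_id, route_region_id.region_number()).as_u64()
}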

View File

@@ -17,6 +17,7 @@ use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_SCHEMATA_TABLE_ID;
use common_error::ext::BoxedError;
use common_meta::key::schema_name::SchemaNameKey;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use datafusion::execution::TaskContext;
@@ -32,15 +33,18 @@ use store_api::storage::{ScanRequest, TableId};
use super::SCHEMATA;
use crate::error::{
CreateRecordBatchSnafu, InternalSnafu, Result, UpgradeWeakCatalogManagerRefSnafu,
CreateRecordBatchSnafu, InternalSnafu, Result, TableMetadataManagerSnafu,
UpgradeWeakCatalogManagerRefSnafu,
};
use crate::information_schema::{InformationTable, Predicates};
use crate::information_schema::{utils, InformationTable, Predicates};
use crate::CatalogManager;
pub const CATALOG_NAME: &str = "catalog_name";
pub const SCHEMA_NAME: &str = "schema_name";
const DEFAULT_CHARACTER_SET_NAME: &str = "default_character_set_name";
const DEFAULT_COLLATION_NAME: &str = "default_collation_name";
/// The database options
pub const SCHEMA_OPTS: &str = "options";
const INIT_CAPACITY: usize = 42;
/// The `information_schema.schemata` table implementation.
@@ -74,6 +78,7 @@ impl InformationSchemaSchemata {
false,
),
ColumnSchema::new("sql_path", ConcreteDataType::string_datatype(), true),
ColumnSchema::new(SCHEMA_OPTS, ConcreteDataType::string_datatype(), true),
]))
}
@@ -133,6 +138,7 @@ struct InformationSchemaSchemataBuilder {
charset_names: StringVectorBuilder,
collation_names: StringVectorBuilder,
sql_paths: StringVectorBuilder,
schema_options: StringVectorBuilder,
}
impl InformationSchemaSchemataBuilder {
@@ -150,6 +156,7 @@ impl InformationSchemaSchemataBuilder {
charset_names: StringVectorBuilder::with_capacity(INIT_CAPACITY),
collation_names: StringVectorBuilder::with_capacity(INIT_CAPACITY),
sql_paths: StringVectorBuilder::with_capacity(INIT_CAPACITY),
schema_options: StringVectorBuilder::with_capacity(INIT_CAPACITY),
}
}
@@ -160,21 +167,47 @@ impl InformationSchemaSchemataBuilder {
.catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let table_metadata_manager = utils::table_meta_manager(&self.catalog_manager)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
self.add_schema(&predicates, &catalog_name, &schema_name);
let opts = if let Some(table_metadata_manager) = &table_metadata_manager {
table_metadata_manager
.schema_manager()
.get(SchemaNameKey::new(&catalog_name, &schema_name))
.await
.context(TableMetadataManagerSnafu)?
// information_schema is not available from this table_metadata_manager,
// so the lookup yields None for it
.map(|schema_opts| format!("{schema_opts}"))
} else {
None
};
self.add_schema(
&predicates,
&catalog_name,
&schema_name,
opts.as_deref().unwrap_or(""),
);
}
self.finish()
}
fn add_schema(&mut self, predicates: &Predicates, catalog_name: &str, schema_name: &str) {
fn add_schema(
&mut self,
predicates: &Predicates,
catalog_name: &str,
schema_name: &str,
schema_options: &str,
) {
let row = [
(CATALOG_NAME, &Value::from(catalog_name)),
(SCHEMA_NAME, &Value::from(schema_name)),
(DEFAULT_CHARACTER_SET_NAME, &Value::from("utf8")),
(DEFAULT_COLLATION_NAME, &Value::from("utf8_bin")),
(SCHEMA_OPTS, &Value::from(schema_options)),
];
if !predicates.eval(&row) {
@@ -186,6 +219,7 @@ impl InformationSchemaSchemataBuilder {
self.charset_names.push(Some("utf8"));
self.collation_names.push(Some("utf8_bin"));
self.sql_paths.push(None);
self.schema_options.push(Some(schema_options));
}
fn finish(&mut self) -> Result<RecordBatch> {
@@ -195,6 +229,7 @@ impl InformationSchemaSchemataBuilder {
Arc::new(self.charset_names.finish()),
Arc::new(self.collation_names.finish()),
Arc::new(self.sql_paths.finish()),
Arc::new(self.schema_options.finish()),
];
RecordBatch::new(self.schema.clone(), columns).context(CreateRecordBatchSnafu)
}

View File

@@ -26,11 +26,13 @@ use datafusion::physical_plan::SendableRecordBatchStream as DfSendableRecordBatc
use datatypes::prelude::{ConcreteDataType, ScalarVectorBuilder, VectorRef};
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::value::Value;
use datatypes::vectors::{StringVectorBuilder, UInt32VectorBuilder};
use datatypes::vectors::{
DateTimeVectorBuilder, StringVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder,
};
use futures::TryStreamExt;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{ScanRequest, TableId};
use table::metadata::TableType;
use table::metadata::{TableInfo, TableType};
use super::TABLES;
use crate::error::{
@@ -43,8 +45,26 @@ pub const TABLE_CATALOG: &str = "table_catalog";
pub const TABLE_SCHEMA: &str = "table_schema";
pub const TABLE_NAME: &str = "table_name";
pub const TABLE_TYPE: &str = "table_type";
pub const VERSION: &str = "version";
pub const ROW_FORMAT: &str = "row_format";
pub const TABLE_ROWS: &str = "table_rows";
pub const DATA_LENGTH: &str = "data_length";
pub const INDEX_LENGTH: &str = "index_length";
pub const MAX_DATA_LENGTH: &str = "max_data_length";
pub const AVG_ROW_LENGTH: &str = "avg_row_length";
pub const DATA_FREE: &str = "data_free";
pub const AUTO_INCREMENT: &str = "auto_increment";
pub const CREATE_TIME: &str = "create_time";
pub const UPDATE_TIME: &str = "update_time";
pub const CHECK_TIME: &str = "check_time";
pub const TABLE_COLLATION: &str = "table_collation";
pub const CHECKSUM: &str = "checksum";
pub const CREATE_OPTIONS: &str = "create_options";
pub const TABLE_COMMENT: &str = "table_comment";
pub const MAX_INDEX_LENGTH: &str = "max_index_length";
pub const TEMPORARY: &str = "temporary";
const TABLE_ID: &str = "table_id";
const ENGINE: &str = "engine";
pub const ENGINE: &str = "engine";
const INIT_CAPACITY: usize = 42;
pub(super) struct InformationSchemaTables {
@@ -69,7 +89,25 @@ impl InformationSchemaTables {
ColumnSchema::new(TABLE_NAME, ConcreteDataType::string_datatype(), false),
ColumnSchema::new(TABLE_TYPE, ConcreteDataType::string_datatype(), false),
ColumnSchema::new(TABLE_ID, ConcreteDataType::uint32_datatype(), true),
ColumnSchema::new(DATA_LENGTH, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(MAX_DATA_LENGTH, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(INDEX_LENGTH, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(MAX_INDEX_LENGTH, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(AVG_ROW_LENGTH, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(ENGINE, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(VERSION, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(ROW_FORMAT, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(TABLE_ROWS, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(DATA_FREE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(AUTO_INCREMENT, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(CREATE_TIME, ConcreteDataType::datetime_datatype(), true),
ColumnSchema::new(UPDATE_TIME, ConcreteDataType::datetime_datatype(), true),
ColumnSchema::new(CHECK_TIME, ConcreteDataType::datetime_datatype(), true),
ColumnSchema::new(TABLE_COLLATION, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(CHECKSUM, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(CREATE_OPTIONS, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(TABLE_COMMENT, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(TEMPORARY, ConcreteDataType::string_datatype(), true),
]))
}
@@ -131,7 +169,25 @@ struct InformationSchemaTablesBuilder {
table_names: StringVectorBuilder,
table_types: StringVectorBuilder,
table_ids: UInt32VectorBuilder,
version: UInt64VectorBuilder,
row_format: StringVectorBuilder,
table_rows: UInt64VectorBuilder,
data_length: UInt64VectorBuilder,
max_data_length: UInt64VectorBuilder,
index_length: UInt64VectorBuilder,
avg_row_length: UInt64VectorBuilder,
max_index_length: UInt64VectorBuilder,
data_free: UInt64VectorBuilder,
auto_increment: UInt64VectorBuilder,
create_time: DateTimeVectorBuilder,
update_time: DateTimeVectorBuilder,
check_time: DateTimeVectorBuilder,
table_collation: StringVectorBuilder,
checksum: UInt64VectorBuilder,
create_options: StringVectorBuilder,
table_comment: StringVectorBuilder,
engines: StringVectorBuilder,
temporary: StringVectorBuilder,
}
impl InformationSchemaTablesBuilder {
@@ -149,7 +205,25 @@ impl InformationSchemaTablesBuilder {
table_names: StringVectorBuilder::with_capacity(INIT_CAPACITY),
table_types: StringVectorBuilder::with_capacity(INIT_CAPACITY),
table_ids: UInt32VectorBuilder::with_capacity(INIT_CAPACITY),
data_length: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
max_data_length: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
index_length: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
avg_row_length: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
engines: StringVectorBuilder::with_capacity(INIT_CAPACITY),
version: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
row_format: StringVectorBuilder::with_capacity(INIT_CAPACITY),
table_rows: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
max_index_length: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
data_free: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
auto_increment: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
create_time: DateTimeVectorBuilder::with_capacity(INIT_CAPACITY),
update_time: DateTimeVectorBuilder::with_capacity(INIT_CAPACITY),
check_time: DateTimeVectorBuilder::with_capacity(INIT_CAPACITY),
table_collation: StringVectorBuilder::with_capacity(INIT_CAPACITY),
checksum: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
create_options: StringVectorBuilder::with_capacity(INIT_CAPACITY),
table_comment: StringVectorBuilder::with_capacity(INIT_CAPACITY),
temporary: StringVectorBuilder::with_capacity(INIT_CAPACITY),
}
}
@@ -171,10 +245,8 @@ impl InformationSchemaTablesBuilder {
&predicates,
&catalog_name,
&schema_name,
&table_info.name,
table_info,
table.table_type(),
Some(table_info.ident.table_id),
Some(&table_info.meta.engine),
);
}
}
@@ -188,12 +260,14 @@ impl InformationSchemaTablesBuilder {
predicates: &Predicates,
catalog_name: &str,
schema_name: &str,
table_name: &str,
table_info: Arc<TableInfo>,
table_type: TableType,
table_id: Option<u32>,
engine: Option<&str>,
) {
let table_type = match table_type {
let table_name = table_info.name.as_ref();
let table_id = table_info.table_id();
let engine = table_info.meta.engine.as_ref();
let table_type_text = match table_type {
TableType::Base => "BASE TABLE",
TableType::View => "VIEW",
TableType::Temporary => "LOCAL TEMPORARY",
@@ -203,7 +277,7 @@ impl InformationSchemaTablesBuilder {
(TABLE_CATALOG, &Value::from(catalog_name)),
(TABLE_SCHEMA, &Value::from(schema_name)),
(TABLE_NAME, &Value::from(table_name)),
(TABLE_TYPE, &Value::from(table_type)),
(TABLE_TYPE, &Value::from(table_type_text)),
];
if !predicates.eval(&row) {
@@ -213,9 +287,38 @@ impl InformationSchemaTablesBuilder {
self.catalog_names.push(Some(catalog_name));
self.schema_names.push(Some(schema_name));
self.table_names.push(Some(table_name));
self.table_types.push(Some(table_type));
self.table_ids.push(table_id);
self.engines.push(engine);
self.table_types.push(Some(table_type_text));
self.table_ids.push(Some(table_id));
// TODO(sunng87): use real data for these fields
self.data_length.push(Some(0));
self.max_data_length.push(Some(0));
self.index_length.push(Some(0));
self.avg_row_length.push(Some(0));
self.max_index_length.push(Some(0));
self.checksum.push(Some(0));
self.table_rows.push(Some(0));
self.data_free.push(Some(0));
self.auto_increment.push(Some(0));
self.row_format.push(Some("Fixed"));
self.table_collation.push(None);
self.update_time.push(None);
self.check_time.push(None);
// use MariaDB's default table version number here
self.version.push(Some(11));
self.table_comment.push(table_info.desc.as_deref());
self.create_options
.push(Some(table_info.meta.options.to_string().as_ref()));
self.create_time
.push(Some(table_info.meta.created_on.timestamp_millis().into()));
self.temporary
.push(if matches!(table_type, TableType::Temporary) {
Some("Y")
} else {
Some("N")
});
self.engines.push(Some(engine));
}
fn finish(&mut self) -> Result<RecordBatch> {
@@ -225,7 +328,25 @@ impl InformationSchemaTablesBuilder {
Arc::new(self.table_names.finish()),
Arc::new(self.table_types.finish()),
Arc::new(self.table_ids.finish()),
Arc::new(self.data_length.finish()),
Arc::new(self.max_data_length.finish()),
Arc::new(self.index_length.finish()),
Arc::new(self.max_index_length.finish()),
Arc::new(self.avg_row_length.finish()),
Arc::new(self.engines.finish()),
Arc::new(self.version.finish()),
Arc::new(self.row_format.finish()),
Arc::new(self.table_rows.finish()),
Arc::new(self.data_free.finish()),
Arc::new(self.auto_increment.finish()),
Arc::new(self.create_time.finish()),
Arc::new(self.update_time.finish()),
Arc::new(self.check_time.finish()),
Arc::new(self.table_collation.finish()),
Arc::new(self.checksum.finish()),
Arc::new(self.create_options.finish()),
Arc::new(self.table_comment.finish()),
Arc::new(self.temporary.finish()),
];
RecordBatch::new(self.schema.clone(), columns).context(CreateRecordBatchSnafu)
}

View File

@@ -15,6 +15,7 @@
use std::sync::{Arc, Weak};
use common_config::Mode;
use common_meta::key::TableMetadataManagerRef;
use meta_client::client::MetaClient;
use snafu::OptionExt;
@@ -51,3 +52,17 @@ pub fn meta_client(catalog_manager: &Weak<dyn CatalogManager>) -> Result<Option<
Ok(meta_client)
}
/// Tries to get the [`TableMetadataManagerRef`] from a [`CatalogManager`] weak reference.
pub fn table_meta_manager(
catalog_manager: &Weak<dyn CatalogManager>,
) -> Result<Option<TableMetadataManagerRef>> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
Ok(catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.map(|manager| manager.table_metadata_manager_ref().clone()))
}
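A sketch of how an information_schema builder can consume this helper, mirroring the schemata change earlier in this compare; the error variants and module paths are assumed from that diff.

use std::sync::Weak;

use common_meta::key::schema_name::SchemaNameKey;
use snafu::ResultExt;

use crate::error::{Result, TableMetadataManagerSnafu};
use crate::information_schema::utils;
use crate::CatalogManager;

/// Looks up the stored options of a schema, or returns None when the catalog manager
/// is not backed by a kv store (and therefore exposes no table metadata manager).
async fn schema_options_string(
    catalog_manager: &Weak<dyn CatalogManager>,
    catalog: &str,
    schema: &str,
) -> Result<Option<String>> {
    let Some(manager) = utils::table_meta_manager(catalog_manager)? else {
        return Ok(None);
    };
    let opts = manager
        .schema_manager()
        .get(SchemaNameKey::new(catalog, schema))
        .await
        .context(TableMetadataManagerSnafu)?;
    Ok(opts.map(|o| format!("{o}")))
}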

View File

@@ -57,7 +57,7 @@ impl DfTableSourceProvider {
disallow_cross_catalog_query,
resolved_tables: HashMap::new(),
default_catalog: query_ctx.current_catalog().to_owned(),
default_schema: query_ctx.current_schema().to_owned(),
default_schema: query_ctx.current_schema(),
plan_decoder,
}
}

View File

@@ -14,6 +14,7 @@
use std::sync::Arc;
use api::v1::flow::flow_client::FlowClient as PbFlowClient;
use api::v1::health_check_client::HealthCheckClient;
use api::v1::prometheus_gateway_client::PrometheusGatewayClient;
use api::v1::region::region_client::RegionClient as PbRegionClient;
@@ -183,6 +184,16 @@ impl Client {
Ok((addr, client))
}
pub(crate) fn raw_flow_client(&self) -> Result<(String, PbFlowClient<Channel>)> {
let (addr, channel) = self.find_channel()?;
let client = PbFlowClient::new(channel)
.max_decoding_message_size(self.max_grpc_recv_message_size())
.max_encoding_message_size(self.max_grpc_send_message_size())
.accept_compressed(CompressionEncoding::Zstd)
.send_compressed(CompressionEncoding::Zstd);
Ok((addr, client))
}
pub fn make_prometheus_gateway_client(&self) -> Result<PrometheusGatewayClient<Channel>> {
let (_, channel) = self.find_channel()?;
let client = PrometheusGatewayClient::new(channel)

View File

@@ -21,43 +21,45 @@ use common_meta::node_manager::{DatanodeRef, FlownodeRef, NodeManager};
use common_meta::peer::Peer;
use moka::future::{Cache, CacheBuilder};
use crate::flow::FlowRequester;
use crate::region::RegionRequester;
use crate::Client;
pub struct DatanodeClients {
pub struct NodeClients {
channel_manager: ChannelManager,
clients: Cache<Peer, Client>,
}
impl Default for DatanodeClients {
impl Default for NodeClients {
fn default() -> Self {
Self::new(ChannelConfig::new())
}
}
impl Debug for DatanodeClients {
impl Debug for NodeClients {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
f.debug_struct("DatanodeClients")
f.debug_struct("NodeClients")
.field("channel_manager", &self.channel_manager)
.finish()
}
}
#[async_trait::async_trait]
impl NodeManager for DatanodeClients {
impl NodeManager for NodeClients {
async fn datanode(&self, datanode: &Peer) -> DatanodeRef {
let client = self.get_client(datanode).await;
Arc::new(RegionRequester::new(client))
}
async fn flownode(&self, _node: &Peer) -> FlownodeRef {
// TODO(weny): Support it.
unimplemented!()
async fn flownode(&self, flownode: &Peer) -> FlownodeRef {
let client = self.get_client(flownode).await;
Arc::new(FlowRequester::new(client))
}
}
impl DatanodeClients {
impl NodeClients {
pub fn new(config: ChannelConfig) -> Self {
Self {
channel_manager: ChannelManager::with_config(config),

View File

@@ -98,6 +98,15 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to request FlowServer {}, code: {}", addr, code))]
FlowServer {
addr: String,
code: Code,
source: BoxedError,
#[snafu(implicit)]
location: Location,
},
// Server error carried in Tonic Status's metadata.
#[snafu(display("{}", msg))]
Server {
@@ -136,7 +145,8 @@ impl ErrorExt for Error {
Error::Server { code, .. } => *code,
Error::FlightGet { source, .. }
| Error::HandleRequest { source, .. }
| Error::RegionServer { source, .. } => source.status_code(),
| Error::RegionServer { source, .. }
| Error::FlowServer { source, .. } => source.status_code(),
Error::CreateChannel { source, .. }
| Error::ConvertFlightData { source, .. }
| Error::CreateTlsChannel { source, .. } => source.status_code(),

src/client/src/flow.rs (new file, 104 lines)
View File

@@ -0,0 +1,104 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use api::v1::flow::{FlowRequest, FlowResponse};
use api::v1::region::InsertRequests;
use common_error::ext::BoxedError;
use common_meta::node_manager::Flownode;
use snafu::{location, Location, ResultExt};
use crate::error::Result;
use crate::Client;
#[derive(Debug)]
pub struct FlowRequester {
client: Client,
}
#[async_trait::async_trait]
impl Flownode for FlowRequester {
async fn handle(&self, request: FlowRequest) -> common_meta::error::Result<FlowResponse> {
self.handle_inner(request)
.await
.map_err(BoxedError::new)
.context(common_meta::error::ExternalSnafu)
}
async fn handle_inserts(
&self,
request: InsertRequests,
) -> common_meta::error::Result<FlowResponse> {
self.handle_inserts_inner(request)
.await
.map_err(BoxedError::new)
.context(common_meta::error::ExternalSnafu)
}
}
impl FlowRequester {
pub fn new(client: Client) -> Self {
Self { client }
}
async fn handle_inner(&self, request: FlowRequest) -> Result<FlowResponse> {
let (addr, mut client) = self.client.raw_flow_client()?;
let response = client
.handle_create_remove(request)
.await
.map_err(|e| {
let code = e.code();
let err: crate::error::Error = e.into();
crate::error::Error::FlowServer {
addr,
code,
source: BoxedError::new(err),
location: location!(),
}
})?
.into_inner();
Ok(response)
}
async fn handle_inserts_inner(&self, request: InsertRequests) -> Result<FlowResponse> {
let (addr, mut client) = self.client.raw_flow_client()?;
let requests = api::v1::flow::InsertRequests {
requests: request
.requests
.into_iter()
.map(|insert| api::v1::flow::InsertRequest {
region_id: insert.region_id,
rows: insert.rows,
})
.collect(),
};
let response = client
.handle_mirror_request(requests)
.await
.map_err(|e| {
let code = e.code();
let err: crate::error::Error = e.into();
crate::error::Error::FlowServer {
addr,
code,
source: BoxedError::new(err),
location: location!(),
}
})?
.into_inner();
Ok(response)
}
}
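Combined with the `NodeManager` implementation on `NodeClients` above, a flownode can now be driven through the same client cache as datanodes. A hedged sketch of that path, assuming the `Peer` describing the flownode is resolved elsewhere:

use std::sync::Arc;

use api::v1::flow::{FlowRequest, FlowResponse};
use api::v1::region::InsertRequests;
use client::client_manager::NodeClients;
use common_meta::node_manager::{Flownode, NodeManager};
use common_meta::peer::Peer;

/// Sends a DDL-style flow request and a batch of mirrored inserts to one flownode.
async fn mirror_to_flownode(
    node_clients: &Arc<NodeClients>,
    peer: &Peer,
    ddl: FlowRequest,
    inserts: InsertRequests,
) -> common_meta::error::Result<(FlowResponse, FlowResponse)> {
    // `flownode` now returns a FlowRequester instead of hitting unimplemented!().
    let flownode = node_clients.flownode(peer).await;
    let ddl_resp = flownode.handle(ddl).await?;
    let insert_resp = flownode.handle_inserts(inserts).await?;
    Ok((ddl_resp, insert_resp))
}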

View File

@@ -19,6 +19,7 @@ pub mod client_manager;
#[cfg(feature = "testing")]
mod database;
pub mod error;
pub mod flow;
pub mod load_balance;
mod metrics;
pub mod region;

View File

@@ -74,6 +74,7 @@ substrait.workspace = true
table.workspace = true
tokio.workspace = true
toml.workspace = true
tonic.workspace = true
tracing-appender = "0.2"
[target.'cfg(not(windows))'.dependencies]

View File

@@ -17,7 +17,7 @@
use clap::{Parser, Subcommand};
use cmd::error::Result;
use cmd::options::GlobalOptions;
use cmd::{cli, datanode, frontend, metasrv, standalone, App};
use cmd::{cli, datanode, flownode, frontend, metasrv, standalone, App};
use common_version::version;
#[derive(Parser)]
@@ -37,6 +37,10 @@ enum SubCommand {
#[clap(name = "datanode")]
Datanode(datanode::Command),
/// Start flownode service.
#[clap(name = "flownode")]
Flownode(flownode::Command),
/// Start frontend service.
#[clap(name = "frontend")]
Frontend(frontend::Command),
@@ -72,6 +76,12 @@ async fn start(cli: Command) -> Result<()> {
.run()
.await
}
SubCommand::Flownode(cmd) => {
cmd.build(cmd.load_options(&cli.global_options)?)
.await?
.run()
.await
}
SubCommand::Frontend(cmd) => {
cmd.build(cmd.load_options(&cli.global_options)?)
.await?

View File

@@ -19,19 +19,20 @@ use async_trait::async_trait;
use catalog::kvbackend::MetaKvBackend;
use clap::Parser;
use common_config::Configurable;
use common_telemetry::info;
use common_telemetry::logging::TracingOptions;
use common_telemetry::{info, warn};
use common_version::{short_version, version};
use common_wal::config::DatanodeWalConfig;
use datanode::datanode::{Datanode, DatanodeBuilder};
use datanode::service::DatanodeServiceBuilder;
use meta_client::MetaClientOptions;
use meta_client::{MetaClientOptions, MetaClientType};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard;
use crate::error::{
LoadLayeredConfigSnafu, MissingConfigSnafu, Result, ShutdownDatanodeSnafu, StartDatanodeSnafu,
LoadLayeredConfigSnafu, MetaClientInitSnafu, MissingConfigSnafu, Result, ShutdownDatanodeSnafu,
StartDatanodeSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
@@ -125,7 +126,7 @@ struct StartCommand {
rpc_addr: Option<String>,
#[clap(long)]
rpc_hostname: Option<String>,
#[clap(long, aliases = ["metasrv-addr"], value_delimiter = ',', num_args = 1..)]
#[clap(long, value_delimiter = ',', num_args = 1..)]
metasrv_addrs: Option<Vec<String>>,
#[clap(short, long)]
config_file: Option<String>,
@@ -155,6 +156,7 @@ impl StartCommand {
}
// The precedence order is: cli > config file > environment variables > default values.
#[allow(deprecated)]
fn merge_with_cli_options(
&self,
global_options: &GlobalOptions,
@@ -176,11 +178,32 @@ impl StartCommand {
};
if let Some(addr) = &self.rpc_addr {
opts.rpc_addr.clone_from(addr);
opts.grpc.addr.clone_from(addr);
} else if let Some(addr) = &opts.rpc_addr {
warn!("Use the deprecated attribute `DatanodeOptions.rpc_addr`, please use `grpc.addr` instead.");
opts.grpc.addr.clone_from(addr);
}
if self.rpc_hostname.is_some() {
opts.rpc_hostname.clone_from(&self.rpc_hostname);
if let Some(hostname) = &self.rpc_hostname {
opts.grpc.hostname.clone_from(hostname);
} else if let Some(hostname) = &opts.rpc_hostname {
warn!("Use the deprecated attribute `DatanodeOptions.rpc_hostname`, please use `grpc.hostname` instead.");
opts.grpc.hostname.clone_from(hostname);
}
if let Some(runtime_size) = opts.rpc_runtime_size {
warn!("Use the deprecated attribute `DatanodeOptions.rpc_runtime_size`, please use `grpc.runtime_size` instead.");
opts.grpc.runtime_size = runtime_size;
}
if let Some(max_recv_message_size) = opts.rpc_max_recv_message_size {
warn!("Use the deprecated attribute `DatanodeOptions.rpc_max_recv_message_size`, please use `grpc.max_recv_message_size` instead.");
opts.grpc.max_recv_message_size = max_recv_message_size;
}
if let Some(max_send_message_size) = opts.rpc_max_send_message_size {
warn!("Use the deprecated attribute `DatanodeOptions.rpc_max_send_message_size`, please use `grpc.max_send_message_size` instead.");
opts.grpc.max_send_message_size = max_send_message_size;
}
if let Some(node_id) = self.node_id {
@@ -253,7 +276,8 @@ impl StartCommand {
.await
.context(StartDatanodeSnafu)?;
let node_id = opts
let cluster_id = 0; // TODO(hl): read from config
let member_id = opts
.node_id
.context(MissingConfigSnafu { msg: "'node_id'" })?;
@@ -261,12 +285,16 @@ impl StartCommand {
msg: "'meta_client_options'",
})?;
let meta_client = datanode::heartbeat::new_metasrv_client(node_id, meta_config)
.await
.context(StartDatanodeSnafu)?;
let meta_client = meta_client::create_meta_client(
cluster_id,
MetaClientType::Datanode { member_id },
meta_config,
)
.await
.context(MetaClientInitSnafu)?;
let meta_backend = Arc::new(MetaKvBackend {
client: Arc::new(meta_client.clone()),
client: meta_client.clone(),
});
let mut datanode = DatanodeBuilder::new(opts.clone(), plugins)
@@ -302,6 +330,34 @@ mod tests {
use super::*;
use crate::options::GlobalOptions;
#[test]
fn test_deprecated_cli_options() {
common_telemetry::init_default_ut_logging();
let mut file = create_named_temp_file();
let toml_str = r#"
mode = "distributed"
enable_memory_catalog = false
node_id = 42
rpc_addr = "127.0.0.1:4001"
rpc_hostname = "192.168.0.1"
[grpc]
addr = "127.0.0.1:3001"
hostname = "127.0.0.1"
runtime_size = 8
"#;
write!(file, "{}", toml_str).unwrap();
let cmd = StartCommand {
config_file: Some(file.path().to_str().unwrap().to_string()),
..Default::default()
};
let options = cmd.load_options(&Default::default()).unwrap().component;
assert_eq!("127.0.0.1:4001".to_string(), options.grpc.addr);
assert_eq!("192.168.0.1".to_string(), options.grpc.hostname);
}
#[test]
fn test_read_from_config_file() {
let mut file = create_named_temp_file();
@@ -309,9 +365,11 @@ mod tests {
mode = "distributed"
enable_memory_catalog = false
node_id = 42
rpc_addr = "127.0.0.1:3001"
rpc_hostname = "127.0.0.1"
rpc_runtime_size = 8
[grpc]
addr = "127.0.0.1:3001"
hostname = "127.0.0.1"
runtime_size = 8
[heartbeat]
interval = "300ms"
@@ -358,7 +416,7 @@ mod tests {
let options = cmd.load_options(&Default::default()).unwrap().component;
assert_eq!("127.0.0.1:3001".to_string(), options.rpc_addr);
assert_eq!("127.0.0.1:3001".to_string(), options.grpc.addr);
assert_eq!(Some(42), options.node_id);
let DatanodeWalConfig::RaftEngine(raft_engine_config) = options.wal else {
@@ -475,8 +533,8 @@ mod tests {
enable_memory_catalog = false
node_id = 42
rpc_addr = "127.0.0.1:3001"
rpc_hostname = "127.0.0.1"
rpc_runtime_size = 8
rpc_hostname = "10.103.174.219"
[meta_client]
timeout = "3s"
@@ -572,6 +630,7 @@ mod tests {
opts.http.addr,
DatanodeOptions::default().component.http.addr
);
assert_eq!(opts.grpc.hostname, "10.103.174.219");
},
);
}

View File

@@ -87,6 +87,20 @@ pub enum Error {
source: datanode::error::Error,
},
#[snafu(display("Failed to start flownode"))]
StartFlownode {
#[snafu(implicit)]
location: Location,
source: flow::Error,
},
#[snafu(display("Failed to shutdown flownode"))]
ShutdownFlownode {
#[snafu(implicit)]
location: Location,
source: flow::Error,
},
#[snafu(display("Failed to start frontend"))]
StartFrontend {
#[snafu(implicit)]
@@ -325,6 +339,22 @@ pub enum Error {
location: Location,
source: cache::error::Error,
},
#[snafu(display("Failed to initialize meta client"))]
MetaClientInit {
#[snafu(implicit)]
location: Location,
source: meta_client::error::Error,
},
#[snafu(display("Tonic transport error: {error:?} with msg: {msg:?}"))]
TonicTransport {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: tonic::transport::Error,
msg: Option<String>,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -380,6 +410,11 @@ impl ErrorExt for Error {
Error::BuildRuntime { source, .. } => source.status_code(),
Error::CacheRequired { .. } | Error::BuildCacheRegistry { .. } => StatusCode::Internal,
Self::StartFlownode { source, .. } | Self::ShutdownFlownode { source, .. } => {
source.status_code()
}
Error::MetaClientInit { source, .. } => source.status_code(),
Error::TonicTransport { .. } => StatusCode::Internal,
}
}

src/cmd/src/flownode.rs (new file, 334 lines)
View File

@@ -0,0 +1,334 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::kvbackend::{CachedMetaKvBackendBuilder, KvBackendCatalogManager, MetaKvBackend};
use clap::Parser;
use client::client_manager::NodeClients;
use common_base::Plugins;
use common_config::Configurable;
use common_grpc::channel_manager::ChannelConfig;
use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_meta::key::TableMetadataManager;
use common_telemetry::info;
use common_telemetry::logging::TracingOptions;
use common_version::{short_version, version};
use flow::{FlownodeBuilder, FlownodeInstance, FrontendInvoker};
use frontend::heartbeat::handler::invalidate_table_cache::InvalidateTableCacheHandler;
use meta_client::{MetaClientOptions, MetaClientType};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard;
use crate::error::{
BuildCacheRegistrySnafu, InitMetadataSnafu, LoadLayeredConfigSnafu, MetaClientInitSnafu,
MissingConfigSnafu, Result, ShutdownFlownodeSnafu, StartFlownodeSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
pub const APP_NAME: &str = "greptime-flownode";
type FlownodeOptions = GreptimeOptions<flow::FlownodeOptions>;
pub struct Instance {
flownode: FlownodeInstance,
// Keep the logging guard to prevent the worker from being dropped.
_guard: Vec<WorkerGuard>,
}
impl Instance {
pub fn new(flownode: FlownodeInstance, guard: Vec<WorkerGuard>) -> Self {
Self {
flownode,
_guard: guard,
}
}
pub fn flownode_mut(&mut self) -> &mut FlownodeInstance {
&mut self.flownode
}
pub fn flownode(&self) -> &FlownodeInstance {
&self.flownode
}
}
#[async_trait::async_trait]
impl App for Instance {
fn name(&self) -> &str {
APP_NAME
}
async fn start(&mut self) -> Result<()> {
self.flownode.start().await.context(StartFlownodeSnafu)
}
async fn stop(&self) -> Result<()> {
self.flownode
.shutdown()
.await
.context(ShutdownFlownodeSnafu)
}
}
#[derive(Parser)]
pub struct Command {
#[clap(subcommand)]
subcmd: SubCommand,
}
impl Command {
pub async fn build(&self, opts: FlownodeOptions) -> Result<Instance> {
self.subcmd.build(opts).await
}
pub fn load_options(&self, global_options: &GlobalOptions) -> Result<FlownodeOptions> {
match &self.subcmd {
SubCommand::Start(cmd) => cmd.load_options(global_options),
}
}
}
#[derive(Parser)]
enum SubCommand {
Start(StartCommand),
}
impl SubCommand {
async fn build(&self, opts: FlownodeOptions) -> Result<Instance> {
match self {
SubCommand::Start(cmd) => cmd.build(opts).await,
}
}
}
#[derive(Debug, Parser, Default)]
struct StartCommand {
/// Flownode's id
#[clap(long)]
node_id: Option<u64>,
/// Bind address for the gRPC server.
#[clap(long)]
rpc_addr: Option<String>,
/// Hostname for the gRPC server.
#[clap(long)]
rpc_hostname: Option<String>,
/// Metasrv address list;
#[clap(long, value_delimiter = ',', num_args = 1..)]
metasrv_addrs: Option<Vec<String>>,
/// The configuration file for flownode
#[clap(short, long)]
config_file: Option<String>,
/// The prefix of environment variables, default is `GREPTIMEDB_FLOWNODE`;
#[clap(long, default_value = "GREPTIMEDB_FLOWNODE")]
env_prefix: String,
}
impl StartCommand {
fn load_options(&self, global_options: &GlobalOptions) -> Result<FlownodeOptions> {
let mut opts = FlownodeOptions::load_layered_options(
self.config_file.as_deref(),
self.env_prefix.as_ref(),
)
.context(LoadLayeredConfigSnafu)?;
self.merge_with_cli_options(global_options, &mut opts)?;
Ok(opts)
}
// The precedence order is: cli > config file > environment variables > default values.
fn merge_with_cli_options(
&self,
global_options: &GlobalOptions,
opts: &mut FlownodeOptions,
) -> Result<()> {
let opts = &mut opts.component;
if let Some(dir) = &global_options.log_dir {
opts.logging.dir.clone_from(dir);
}
if global_options.log_level.is_some() {
opts.logging.level.clone_from(&global_options.log_level);
}
opts.tracing = TracingOptions {
#[cfg(feature = "tokio-console")]
tokio_console_addr: global_options.tokio_console_addr.clone(),
};
if let Some(addr) = &self.rpc_addr {
opts.grpc.addr.clone_from(addr);
}
if let Some(hostname) = &self.rpc_hostname {
opts.grpc.hostname.clone_from(hostname);
}
if let Some(node_id) = self.node_id {
opts.node_id = Some(node_id);
}
if let Some(metasrv_addrs) = &self.metasrv_addrs {
opts.meta_client
.get_or_insert_with(MetaClientOptions::default)
.metasrv_addrs
.clone_from(metasrv_addrs);
opts.mode = Mode::Distributed;
}
if let (Mode::Distributed, None) = (&opts.mode, &opts.node_id) {
return MissingConfigSnafu {
msg: "Missing node id option",
}
.fail();
}
Ok(())
}
async fn build(&self, opts: FlownodeOptions) -> Result<Instance> {
common_runtime::init_global_runtimes(&opts.runtime);
let guard = common_telemetry::init_global_logging(
APP_NAME,
&opts.component.logging,
&opts.component.tracing,
opts.component.node_id.map(|x| x.to_string()),
);
log_versions(version!(), short_version!());
info!("Flownode start command: {:#?}", self);
info!("Flownode options: {:#?}", opts);
let opts = opts.component;
// TODO(discord9): make it not optional after cluster id is required
let cluster_id = opts.cluster_id.unwrap_or(0);
let member_id = opts
.node_id
.context(MissingConfigSnafu { msg: "'node_id'" })?;
let meta_config = opts.meta_client.as_ref().context(MissingConfigSnafu {
msg: "'meta_client_options'",
})?;
let meta_client = meta_client::create_meta_client(
cluster_id,
MetaClientType::Flownode { member_id },
meta_config,
)
.await
.context(MetaClientInitSnafu)?;
let cache_max_capacity = meta_config.metadata_cache_max_capacity;
let cache_ttl = meta_config.metadata_cache_ttl;
let cache_tti = meta_config.metadata_cache_tti;
// TODO(discord9): add a helper function to ease the creation of the cache registry and related pieces
let cached_meta_backend = CachedMetaKvBackendBuilder::new(meta_client.clone())
.cache_max_capacity(cache_max_capacity)
.cache_ttl(cache_ttl)
.cache_tti(cache_tti)
.build();
let cached_meta_backend = Arc::new(cached_meta_backend);
// Builds cache registry
let layered_cache_builder = LayeredCacheRegistryBuilder::default().add_cache_registry(
CacheRegistryBuilder::default()
.add_cache(cached_meta_backend.clone())
.build(),
);
let fundamental_cache_registry =
build_fundamental_cache_registry(Arc::new(MetaKvBackend::new(meta_client.clone())));
let layered_cache_registry = Arc::new(
with_default_composite_cache_registry(
layered_cache_builder.add_cache_registry(fundamental_cache_registry),
)
.context(BuildCacheRegistrySnafu)?
.build(),
);
let catalog_manager = KvBackendCatalogManager::new(
opts.mode,
Some(meta_client.clone()),
cached_meta_backend.clone(),
layered_cache_registry.clone(),
);
let table_metadata_manager =
Arc::new(TableMetadataManager::new(cached_meta_backend.clone()));
table_metadata_manager
.init()
.await
.context(InitMetadataSnafu)?;
let executor = HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),
Arc::new(InvalidateTableCacheHandler::new(
layered_cache_registry.clone(),
)),
]);
let heartbeat_task = flow::heartbeat::HeartbeatTask::new(
&opts,
meta_client.clone(),
opts.heartbeat.clone(),
Arc::new(executor),
);
let flownode_builder = FlownodeBuilder::new(
opts,
Plugins::new(),
table_metadata_manager,
catalog_manager.clone(),
)
.with_heartbeat_task(heartbeat_task);
let flownode = flownode_builder.build().await.context(StartFlownodeSnafu)?;
// The flownode's frontend-to-datanode requests need no timeout:
// some queries are expected to take a long time.
let channel_config = ChannelConfig {
timeout: None,
..Default::default()
};
let client = Arc::new(NodeClients::new(channel_config));
let invoker = FrontendInvoker::build_from(
flownode.flow_worker_manager().clone(),
catalog_manager.clone(),
cached_meta_backend.clone(),
layered_cache_registry.clone(),
meta_client.clone(),
client,
)
.await
.context(StartFlownodeSnafu)?;
flownode
.flow_worker_manager()
.set_frontend_invoker(invoker)
.await;
Ok(Instance::new(flownode, guard))
}
}

View File

@@ -19,7 +19,7 @@ use async_trait::async_trait;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::kvbackend::{CachedMetaKvBackendBuilder, KvBackendCatalogManager, MetaKvBackend};
use clap::Parser;
use client::client_manager::DatanodeClients;
use client::client_manager::NodeClients;
use common_config::Configurable;
use common_grpc::channel_manager::ChannelConfig;
use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
@@ -34,14 +34,15 @@ use frontend::heartbeat::HeartbeatTask;
use frontend::instance::builder::FrontendBuilder;
use frontend::instance::{FrontendInstance, Instance as FeInstance};
use frontend::server::Services;
use meta_client::MetaClientOptions;
use meta_client::{MetaClientOptions, MetaClientType};
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard;
use crate::error::{
self, InitTimezoneSnafu, LoadLayeredConfigSnafu, MissingConfigSnafu, Result, StartFrontendSnafu,
self, InitTimezoneSnafu, LoadLayeredConfigSnafu, MetaClientInitSnafu, MissingConfigSnafu,
Result, StartFrontendSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
@@ -147,7 +148,7 @@ pub struct StartCommand {
config_file: Option<String>,
#[clap(short, long)]
influxdb_enable: Option<bool>,
#[clap(long, aliases = ["metasrv-addr"], value_delimiter = ',', num_args = 1..)]
#[clap(long, value_delimiter = ',', num_args = 1..)]
metasrv_addrs: Option<Vec<String>>,
#[clap(long)]
tls_mode: Option<TlsMode>,
@@ -279,10 +280,16 @@ impl StartCommand {
let cache_ttl = meta_client_options.metadata_cache_ttl;
let cache_tti = meta_client_options.metadata_cache_tti;
let meta_client = FeInstance::create_meta_client(meta_client_options)
.await
.context(StartFrontendSnafu)?;
let cluster_id = 0; // TODO(jeremy): it is currently a reserved field and has not been enabled.
let meta_client = meta_client::create_meta_client(
cluster_id,
MetaClientType::Frontend,
meta_client_options,
)
.await
.context(MetaClientInitSnafu)?;
// TODO(discord9): add a helper function to ease the creation of the cache registry and related pieces
let cached_meta_backend = CachedMetaKvBackendBuilder::new(meta_client.clone())
.cache_max_capacity(cache_max_capacity)
.cache_ttl(cache_ttl)
@@ -333,9 +340,10 @@ impl StartCommand {
timeout: None,
..Default::default()
};
let client = DatanodeClients::new(channel_config);
let client = NodeClients::new(channel_config);
let mut instance = FrontendBuilder::new(
opts.clone(),
cached_meta_backend.clone(),
layered_cache_registry.clone(),
catalog_manager,
@@ -349,12 +357,12 @@ impl StartCommand {
.await
.context(StartFrontendSnafu)?;
let servers = Services::new(opts.clone(), Arc::new(instance.clone()), plugins)
let servers = Services::new(opts, Arc::new(instance.clone()), plugins)
.build()
.await
.context(StartFrontendSnafu)?;
instance
.build_servers(opts, servers)
.build_servers(servers)
.context(StartFrontendSnafu)?;
Ok(Instance::new(instance, guard))
@@ -370,7 +378,7 @@ mod tests {
use common_base::readable_size::ReadableSize;
use common_config::ENV_VAR_SEP;
use common_test_util::temp_dir::create_named_temp_file;
use frontend::service_config::GrpcOptions;
use servers::grpc::GrpcOptions;
use servers::http::HttpOptions;
use super::*;

View File

@@ -22,6 +22,7 @@ use crate::error::Result;
pub mod cli;
pub mod datanode;
pub mod error;
pub mod flownode;
pub mod frontend;
pub mod metasrv;
pub mod options;

View File

@@ -21,11 +21,12 @@ use catalog::kvbackend::KvBackendCatalogManager;
use clap::Parser;
use common_catalog::consts::{MIN_USER_FLOW_ID, MIN_USER_TABLE_ID};
use common_config::{metadata_store_dir, Configurable, KvBackendConfig};
use common_error::ext::BoxedError;
use common_meta::cache::LayeredCacheRegistryBuilder;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::ddl::flow_meta::{FlowMetadataAllocator, FlowMetadataAllocatorRef};
use common_meta::ddl::table_meta::{TableMetadataAllocator, TableMetadataAllocatorRef};
use common_meta::ddl::{DdlContext, ProcedureExecutorRef};
use common_meta::ddl::{DdlContext, NoopRegionFailureDetectorControl, ProcedureExecutorRef};
use common_meta::ddl_manager::DdlManager;
use common_meta::key::flow::{FlowMetadataManager, FlowMetadataManagerRef};
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
@@ -43,28 +44,31 @@ use common_wal::config::StandaloneWalConfig;
use datanode::config::{DatanodeOptions, ProcedureConfig, RegionEngineConfig, StorageConfig};
use datanode::datanode::{Datanode, DatanodeBuilder};
use file_engine::config::EngineConfig as FileEngineConfig;
use flow::FlownodeBuilder;
use flow::{FlowWorkerManager, FlownodeBuilder, FrontendInvoker};
use frontend::frontend::FrontendOptions;
use frontend::instance::builder::FrontendBuilder;
use frontend::instance::{FrontendInstance, Instance as FeInstance, StandaloneDatanodeManager};
use frontend::server::Services;
use frontend::service_config::{
GrpcOptions, InfluxdbOptions, MysqlOptions, OpentsdbOptions, PostgresOptions, PromStoreOptions,
InfluxdbOptions, MysqlOptions, OpentsdbOptions, PostgresOptions, PromStoreOptions,
};
use meta_srv::metasrv::{FLOW_ID_SEQ, TABLE_ID_SEQ};
use mito2::config::MitoConfig;
use serde::{Deserialize, Serialize};
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
use servers::http::HttpOptions;
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::ResultExt;
use tokio::sync::broadcast;
use tracing_appender::non_blocking::WorkerGuard;
use crate::error::{
BuildCacheRegistrySnafu, CreateDirSnafu, IllegalConfigSnafu, InitDdlManagerSnafu,
InitMetadataSnafu, InitTimezoneSnafu, LoadLayeredConfigSnafu, Result, ShutdownDatanodeSnafu,
ShutdownFrontendSnafu, StartDatanodeSnafu, StartFrontendSnafu, StartProcedureManagerSnafu,
InitMetadataSnafu, InitTimezoneSnafu, LoadLayeredConfigSnafu, OtherSnafu, Result,
ShutdownDatanodeSnafu, ShutdownFlownodeSnafu, ShutdownFrontendSnafu, StartDatanodeSnafu,
StartFlownodeSnafu, StartFrontendSnafu, StartProcedureManagerSnafu,
StartWalOptionsAllocatorSnafu, StopProcedureManagerSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
@@ -203,7 +207,7 @@ impl StandaloneOptions {
wal: cloned_opts.wal.into(),
storage: cloned_opts.storage,
region_engine: cloned_opts.region_engine,
rpc_addr: cloned_opts.grpc.addr,
grpc: cloned_opts.grpc,
..Default::default()
}
}
@@ -212,6 +216,9 @@ impl StandaloneOptions {
pub struct Instance {
datanode: Datanode,
frontend: FeInstance,
// TODO(discord9): wrap it in a flownode instance instead
flow_worker_manager: Arc<FlowWorkerManager>,
flow_shutdown: broadcast::Sender<()>,
procedure_manager: ProcedureManagerRef,
wal_options_allocator: WalOptionsAllocatorRef,
@@ -243,6 +250,9 @@ impl App for Instance {
.context(StartFrontendSnafu)?;
self.frontend.start().await.context(StartFrontendSnafu)?;
self.flow_worker_manager
.clone()
.run_background(Some(self.flow_shutdown.subscribe()));
Ok(())
}
@@ -261,6 +271,15 @@ impl App for Instance {
.shutdown()
.await
.context(ShutdownDatanodeSnafu)?;
self.flow_shutdown
.send(())
.map_err(|_e| {
flow::error::InternalSnafu {
reason: "Failed to send shutdown signal to flow worker manager, all receiver end already closed".to_string(),
}
.build()
})
.context(ShutdownFlownodeSnafu)?;
info!("Datanode instance stopped.");
Ok(())
@@ -350,7 +369,7 @@ impl StartCommand {
if let Some(addr) = &self.rpc_addr {
// frontend grpc addr conflict with datanode default grpc addr
let datanode_grpc_addr = DatanodeOptions::default().rpc_addr;
let datanode_grpc_addr = DatanodeOptions::default().grpc.addr;
if addr.eq(&datanode_grpc_addr) {
return IllegalConfigSnafu {
msg: format!(
@@ -445,24 +464,29 @@ impl StartCommand {
let table_metadata_manager =
Self::create_table_metadata_manager(kv_backend.clone()).await?;
let flow_builder = FlownodeBuilder::new(
1,
Default::default(),
fe_plugins.clone(),
table_metadata_manager.clone(),
catalog_manager.clone(),
);
let flownode = Arc::new(flow_builder.build().await);
let datanode = DatanodeBuilder::new(dn_opts, fe_plugins.clone())
.with_kv_backend(kv_backend.clone())
.build()
.await
.context(StartDatanodeSnafu)?;
let flow_builder = FlownodeBuilder::new(
Default::default(),
fe_plugins.clone(),
table_metadata_manager.clone(),
catalog_manager.clone(),
);
let flownode = Arc::new(
flow_builder
.build()
.await
.map_err(BoxedError::new)
.context(OtherSnafu)?,
);
let node_manager = Arc::new(StandaloneDatanodeManager {
region_server: datanode.region_server(),
flow_server: flownode.clone(),
flow_server: flownode.flow_worker_manager(),
});
let table_id_sequence = Arc::new(
@@ -502,35 +526,47 @@ impl StartCommand {
.await?;
let mut frontend = FrontendBuilder::new(
kv_backend,
layered_cache_registry,
catalog_manager,
node_manager,
ddl_task_executor,
fe_opts.clone(),
kv_backend.clone(),
layered_cache_registry.clone(),
catalog_manager.clone(),
node_manager.clone(),
ddl_task_executor.clone(),
)
.with_plugin(fe_plugins.clone())
.try_build()
.await
.context(StartFrontendSnafu)?;
let flow_worker_manager = flownode.flow_worker_manager();
// the flow server needs to be able to use the frontend to write insert requests back
flownode
.set_frontend_invoker(Box::new(frontend.clone()))
.await;
// TODO(discord9): unify with adding `start` and `shutdown` method to flownode too.
let _handle = flownode.clone().run_background();
let invoker = FrontendInvoker::build_from(
flow_worker_manager.clone(),
catalog_manager.clone(),
kv_backend.clone(),
layered_cache_registry.clone(),
ddl_task_executor.clone(),
node_manager,
)
.await
.context(StartFlownodeSnafu)?;
flow_worker_manager.set_frontend_invoker(invoker).await;
let servers = Services::new(fe_opts.clone(), Arc::new(frontend.clone()), fe_plugins)
let (tx, _rx) = broadcast::channel(1);
let servers = Services::new(fe_opts, Arc::new(frontend.clone()), fe_plugins)
.build()
.await
.context(StartFrontendSnafu)?;
frontend
.build_servers(fe_opts, servers)
.build_servers(servers)
.context(StartFrontendSnafu)?;
Ok(Instance {
datanode,
frontend,
flow_worker_manager,
flow_shutdown: tx,
procedure_manager,
wal_options_allocator,
_guard: guard,
@@ -556,6 +592,7 @@ impl StartCommand {
table_metadata_allocator,
flow_metadata_manager,
flow_metadata_allocator,
region_failure_detector_controller: Arc::new(NoopRegionFailureDetectorControl),
},
procedure_manager,
true,
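Earlier in this file's diff, the standalone instance starts the flow worker manager in the background and stops it through a broadcast channel. A minimal sketch of that lifecycle, using only the types visible above; the helper functions themselves are illustrative, not part of the change.

use std::sync::Arc;

use flow::FlowWorkerManager;
use tokio::sync::broadcast;

/// Starts the flow worker manager in the background and returns the shutdown sender,
/// mirroring Instance::start above.
fn start_flow_background(manager: Arc<FlowWorkerManager>) -> broadcast::Sender<()> {
    let (tx, rx) = broadcast::channel(1);
    manager.run_background(Some(rx));
    tx
}

/// Signals shutdown, mirroring Instance::stop; the send only fails when every
/// receiver is already gone, which is treated as "already stopped" here.
fn stop_flow_background(tx: &broadcast::Sender<()>) {
    let _ = tx.send(());
}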

View File

@@ -18,6 +18,9 @@ use cmd::options::GreptimeOptions;
use cmd::standalone::StandaloneOptions;
use common_base::readable_size::ReadableSize;
use common_config::Configurable;
use common_grpc::channel_manager::{
DEFAULT_MAX_GRPC_RECV_MESSAGE_SIZE, DEFAULT_MAX_GRPC_SEND_MESSAGE_SIZE,
};
use common_runtime::global::RuntimeOptions;
use common_telemetry::logging::LoggingOptions;
use common_wal::config::raft_engine::RaftEngineConfig;
@@ -30,7 +33,9 @@ use meta_srv::metasrv::MetasrvOptions;
use meta_srv::selector::SelectorType;
use mito2::config::MitoConfig;
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
#[allow(deprecated)]
#[test]
fn test_load_datanode_example_config() {
let example_config = common_test_util::find_workspace_path("config/datanode.example.toml");
@@ -42,11 +47,10 @@ fn test_load_datanode_example_config() {
runtime: RuntimeOptions {
read_rt_size: 8,
write_rt_size: 8,
bg_rt_size: 8,
bg_rt_size: 4,
},
component: DatanodeOptions {
node_id: Some(42),
rpc_hostname: Some("127.0.0.1".to_string()),
meta_client: Some(MetaClientOptions {
metasrv_addrs: vec!["127.0.0.1:3002".to_string()],
timeout: Duration::from_secs(3),
@@ -90,6 +94,12 @@ fn test_load_datanode_example_config() {
remote_write: Some(Default::default()),
..Default::default()
},
grpc: GrpcOptions::default().with_addr("127.0.0.1:3001"),
rpc_addr: Some("127.0.0.1:3001".to_string()),
rpc_hostname: Some("127.0.0.1".to_string()),
rpc_runtime_size: Some(8),
rpc_max_recv_message_size: Some(DEFAULT_MAX_GRPC_RECV_MESSAGE_SIZE),
rpc_max_send_message_size: Some(DEFAULT_MAX_GRPC_SEND_MESSAGE_SIZE),
..Default::default()
},
};
@@ -107,7 +117,7 @@ fn test_load_frontend_example_config() {
runtime: RuntimeOptions {
read_rt_size: 8,
write_rt_size: 8,
bg_rt_size: 8,
bg_rt_size: 4,
},
component: FrontendOptions {
default_timezone: Some("UTC".to_string()),
@@ -155,7 +165,7 @@ fn test_load_metasrv_example_config() {
runtime: RuntimeOptions {
read_rt_size: 8,
write_rt_size: 8,
bg_rt_size: 8,
bg_rt_size: 4,
},
component: MetasrvOptions {
selector: SelectorType::LeaseBased,
@@ -188,7 +198,7 @@ fn test_load_standalone_example_config() {
runtime: RuntimeOptions {
read_rt_size: 8,
write_rt_size: 8,
bg_rt_size: 8,
bg_rt_size: 4,
},
component: StandaloneOptions {
default_timezone: Some("UTC".to_string()),


@@ -81,16 +81,17 @@ impl PartialEq<Bytes> for [u8] {
/// Now this buffer is restricted to only hold valid UTF-8 strings (we only allow constructing `StringBytes`
/// from String or str). We may support other encodings in the future.
#[derive(Debug, Default, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct StringBytes(bytes::Bytes);
pub struct StringBytes(String);
impl StringBytes {
/// View this string as UTF-8 string slice.
///
/// # Safety
/// We only allow constructing `StringBytes` from String/str, so the inner
/// buffer must hold valid UTF-8.
pub fn as_utf8(&self) -> &str {
unsafe { std::str::from_utf8_unchecked(&self.0) }
&self.0
}
/// Convert this string into owned UTF-8 string.
pub fn into_string(self) -> String {
self.0
}
pub fn len(&self) -> usize {
@@ -104,37 +105,37 @@ impl StringBytes {
impl From<String> for StringBytes {
fn from(string: String) -> StringBytes {
StringBytes(bytes::Bytes::from(string))
StringBytes(string)
}
}
impl From<&str> for StringBytes {
fn from(string: &str) -> StringBytes {
StringBytes(bytes::Bytes::copy_from_slice(string.as_bytes()))
StringBytes(string.to_string())
}
}
impl PartialEq<String> for StringBytes {
fn eq(&self, other: &String) -> bool {
self.0 == other.as_bytes()
&self.0 == other
}
}
impl PartialEq<StringBytes> for String {
fn eq(&self, other: &StringBytes) -> bool {
self.as_bytes() == other.0
self == &other.0
}
}
impl PartialEq<str> for StringBytes {
fn eq(&self, other: &str) -> bool {
self.0 == other.as_bytes()
self.0.as_str() == other
}
}
impl PartialEq<StringBytes> for str {
fn eq(&self, other: &StringBytes) -> bool {
self.as_bytes() == other.0
self == other.0
}
}
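Illustrative sketch, not part of the diff: assuming the `StringBytes` definition and trait impls from the hunk above are in scope, the String-backed type behaves as follows for callers (the literal values are made up for the example).

fn main() {
    let s = StringBytes::from("greptime");
    // `as_utf8` no longer goes through `unsafe`: the inner buffer is a `String`.
    assert_eq!(s.as_utf8(), "greptime");
    // The PartialEq impls keep comparisons against String working in both directions.
    assert_eq!(s, "greptime".to_string());
    assert_eq!("greptime".to_string(), s);
    // `into_string` hands back the inner String without copying.
    assert_eq!(s.into_string(), "greptime");
}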


@@ -108,3 +108,7 @@ pub const FILE_ENGINE: &str = "file";
pub const SEMANTIC_TYPE_PRIMARY_KEY: &str = "TAG";
pub const SEMANTIC_TYPE_FIELD: &str = "FIELD";
pub const SEMANTIC_TYPE_TIME_INDEX: &str = "TIMESTAMP";
pub fn is_readonly_schema(schema: &str) -> bool {
matches!(schema, INFORMATION_SCHEMA_NAME)
}


@@ -31,7 +31,9 @@ derive_builder.workspace = true
futures.workspace = true
lazy_static.workspace = true
object-store.workspace = true
orc-rust = { git = "https://github.com/datafusion-contrib/datafusion-orc.git", rev = "502217315726314c4008808fe169764529640599" }
orc-rust = { git = "https://github.com/datafusion-contrib/datafusion-orc.git", rev = "502217315726314c4008808fe169764529640599", default-features = false, features = [
"async",
] }
parquet.workspace = true
paste = "1.0"
rand.workspace = true


@@ -149,7 +149,9 @@ pub fn open_with_decoder<T: ArrowDecoder, F: Fn() -> DataFusionResult<T>>(
.reader(&path)
.await
.map_err(|e| DataFusionError::External(Box::new(e)))?
.into_bytes_stream(..);
.into_bytes_stream(..)
.await
.map_err(|e| DataFusionError::External(Box::new(e)))?;
let mut upstream = compression_type.convert_stream(reader).fuse();


@@ -169,11 +169,14 @@ impl FileFormat for CsvFormat {
.stat(path)
.await
.context(error::ReadObjectSnafu { path })?;
let reader = store
.reader(path)
.await
.context(error::ReadObjectSnafu { path })?
.into_futures_async_read(0..meta.content_length())
.await
.context(error::ReadObjectSnafu { path })?
.compat();
let decoded = self.compression_type.convert_async_read(reader);


@@ -87,11 +87,14 @@ impl FileFormat for JsonFormat {
.stat(path)
.await
.context(error::ReadObjectSnafu { path })?;
let reader = store
.reader(path)
.await
.context(error::ReadObjectSnafu { path })?
.into_futures_async_read(0..meta.content_length())
.await
.context(error::ReadObjectSnafu { path })?
.compat();
let decoded = self.compression_type.convert_async_read(reader);


@@ -52,11 +52,14 @@ impl FileFormat for ParquetFormat {
.stat(path)
.await
.context(error::ReadObjectSnafu { path })?;
let mut reader = store
.reader(path)
.await
.context(error::ReadObjectSnafu { path })?
.into_futures_async_read(0..meta.content_length())
.await
.context(error::ReadObjectSnafu { path })?
.compat();
let metadata = reader
@@ -129,6 +132,7 @@ impl LazyParquetFileReader {
.reader(&self.path)
.await?
.into_futures_async_read(0..meta.content_length())
.await?
.compat();
self.reader = Some(reader);
}


@@ -19,7 +19,6 @@ use std::vec;
use common_test_util::find_workspace_path;
use datafusion::assert_batches_eq;
use datafusion::config::TableParquetOptions;
use datafusion::datasource::physical_plan::{FileOpener, FileScanConfig, FileStream, ParquetExec};
use datafusion::execution::context::TaskContext;
use datafusion::physical_plan::metrics::ExecutionPlanMetricsSet;
@@ -167,8 +166,9 @@ async fn test_parquet_exec() {
.to_string();
let base_config = scan_config(schema.clone(), None, path);
let exec = ParquetExec::new(base_config, None, None, TableParquetOptions::default())
.with_parquet_file_reader_factory(Arc::new(DefaultParquetFileReaderFactory::new(store)));
let exec = ParquetExec::builder(base_config)
.with_parquet_file_reader_factory(Arc::new(DefaultParquetFileReaderFactory::new(store)))
.build();
let ctx = SessionContext::new();


@@ -121,6 +121,11 @@ impl Decimal128 {
let value = (hi | lo) as i128;
Self::new(value, precision, scale)
}
pub fn negative(mut self) -> Self {
self.value = -self.value;
self
}
}
/// The default value of Decimal128 is 0, and its precision is 1 and scale is 0.


@@ -105,6 +105,10 @@ impl BoxedError {
inner: Box::new(err),
}
}
pub fn into_inner(self) -> Box<dyn crate::ext::ErrorExt + Send + Sync> {
self.inner
}
}
impl std::fmt::Debug for BoxedError {


@@ -159,7 +159,6 @@ impl StatusCode {
| StatusCode::Unexpected
| StatusCode::Internal
| StatusCode::Cancelled
| StatusCode::PlanQuery
| StatusCode::EngineExecuteQuery
| StatusCode::StorageUnavailable
| StatusCode::RuntimeResourcesExhausted => true,
@@ -171,6 +170,7 @@ impl StatusCode {
| StatusCode::TableNotFound
| StatusCode::RegionAlreadyExists
| StatusCode::RegionNotFound
| StatusCode::PlanQuery
| StatusCode::FlowAlreadyExists
| StatusCode::FlowNotFound
| StatusCode::RegionNotReady


@@ -13,4 +13,3 @@
// limitations under the License.
pub mod error;
pub mod handler;


@@ -22,6 +22,7 @@ use crate::function::FunctionRef;
use crate::scalars::aggregate::{AggregateFunctionMetaRef, AggregateFunctions};
use crate::scalars::date::DateFunction;
use crate::scalars::expression::ExpressionFunction;
use crate::scalars::matches::MatchesFunction;
use crate::scalars::math::MathFunction;
use crate::scalars::numpy::NumpyFunction;
use crate::scalars::timestamp::TimestampFunction;
@@ -86,6 +87,9 @@ pub static FUNCTION_REGISTRY: Lazy<Arc<FunctionRegistry>> = Lazy::new(|| {
// Aggregate functions
AggregateFunctions::register(&function_registry);
// Full text search function
MatchesFunction::register(&function_registry);
// System and administration functions
SystemFunction::register(&function_registry);
TableFunction::register(&function_registry);


@@ -12,6 +12,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.
#![feature(let_chains)]
#![feature(try_blocks)]
mod macros;
pub mod scalars;
mod system;


@@ -15,6 +15,7 @@
pub mod aggregate;
pub(crate) mod date;
pub mod expression;
pub mod matches;
pub mod math;
pub mod numpy;
#[cfg(test)]

File diff suppressed because it is too large


@@ -44,7 +44,7 @@ impl Function for DatabaseFunction {
fn eval(&self, func_ctx: FunctionContext, _columns: &[VectorRef]) -> Result<VectorRef> {
let db = func_ctx.query_ctx.current_schema();
Ok(Arc::new(StringVector::from_slice(&[db])) as _)
Ok(Arc::new(StringVector::from_slice(&[&db])) as _)
}
}


@@ -16,6 +16,7 @@ common-macro.workspace = true
common-query.workspace = true
common-time.workspace = true
datatypes.workspace = true
prost.workspace = true
snafu.workspace = true
table.workspace = true


@@ -14,6 +14,7 @@
use std::any::Any;
use api::v1::ColumnDataType;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
@@ -104,6 +105,25 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Unknown proto column datatype: {}", datatype))]
UnknownColumnDataType {
datatype: i32,
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: prost::DecodeError,
},
#[snafu(display(
"Fulltext index only supports string type, column: {column_name}, unexpected type: {column_type:?}"
))]
InvalidFulltextColumnType {
column_name: String,
column_type: ColumnDataType,
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -124,6 +144,10 @@ impl ErrorExt for Error {
Error::UnexpectedValuesLength { .. } | Error::UnknownLocationType { .. } => {
StatusCode::InvalidArguments
}
Error::UnknownColumnDataType { .. } | Error::InvalidFulltextColumnType { .. } => {
StatusCode::InvalidArguments
}
}
}


@@ -474,6 +474,7 @@ mod tests {
scale: 10,
})),
}),
options: None,
};
(


@@ -14,24 +14,26 @@
use std::collections::HashSet;
use api::v1::column_def::contains_fulltext;
use api::v1::{
AddColumn, AddColumns, Column, ColumnDataTypeExtension, ColumnDef, ColumnSchema,
CreateTableExpr, SemanticType,
AddColumn, AddColumns, Column, ColumnDataType, ColumnDataTypeExtension, ColumnDef,
ColumnOptions, ColumnSchema, CreateTableExpr, SemanticType,
};
use datatypes::schema::Schema;
use snafu::{ensure, OptionExt};
use snafu::{ensure, OptionExt, ResultExt};
use table::metadata::TableId;
use table::table_reference::TableReference;
use crate::error::{
DuplicatedColumnNameSnafu, DuplicatedTimestampColumnSnafu, MissingTimestampColumnSnafu, Result,
DuplicatedColumnNameSnafu, DuplicatedTimestampColumnSnafu, InvalidFulltextColumnTypeSnafu,
MissingTimestampColumnSnafu, Result, UnknownColumnDataTypeSnafu,
};
pub struct ColumnExpr<'a> {
pub column_name: &'a str,
pub datatype: i32,
pub semantic_type: i32,
pub datatype_extension: &'a Option<ColumnDataTypeExtension>,
pub options: &'a Option<ColumnOptions>,
}
impl<'a> ColumnExpr<'a> {
@@ -53,6 +55,7 @@ impl<'a> From<&'a Column> for ColumnExpr<'a> {
datatype: column.datatype,
semantic_type: column.semantic_type,
datatype_extension: &column.datatype_extension,
options: &column.options,
}
}
}
@@ -64,6 +67,7 @@ impl<'a> From<&'a ColumnSchema> for ColumnExpr<'a> {
datatype: schema.datatype,
semantic_type: schema.semantic_type,
datatype_extension: &schema.datatype_extension,
options: &schema.options,
}
}
}
@@ -99,6 +103,7 @@ pub fn build_create_table_expr(
datatype,
semantic_type,
datatype_extension,
options,
} in column_exprs
{
let mut is_nullable = true;
@@ -119,6 +124,17 @@ pub fn build_create_table_expr(
_ => {}
}
let column_type =
ColumnDataType::try_from(datatype).context(UnknownColumnDataTypeSnafu { datatype })?;
ensure!(
!contains_fulltext(options) || column_type == ColumnDataType::String,
InvalidFulltextColumnTypeSnafu {
column_name,
column_type,
}
);
let column_def = ColumnDef {
name: column_name.to_string(),
data_type: datatype,
@@ -127,6 +143,7 @@ pub fn build_create_table_expr(
semantic_type,
comment: String::new(),
datatype_extension: datatype_extension.clone(),
options: options.clone(),
};
column_defs.push(column_def);
}
@@ -168,6 +185,7 @@ pub fn extract_new_columns(
semantic_type: expr.semantic_type,
comment: String::new(),
datatype_extension: expr.datatype_extension.clone(),
options: expr.options.clone(),
});
AddColumn {
column_def,


@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashSet;
use std::collections::HashMap;
use std::sync::Arc;
use futures::future::BoxFuture;
@@ -26,9 +26,10 @@ use crate::error::Result;
use crate::instruction::{CacheIdent, CreateFlow, DropFlow};
use crate::key::flow::{TableFlowManager, TableFlowManagerRef};
use crate::kv_backend::KvBackendRef;
use crate::peer::Peer;
use crate::FlownodeId;
type FlownodeSet = HashSet<FlownodeId>;
type FlownodeSet = Arc<HashMap<FlownodeId, Peer>>;
pub type TableFlownodeSetCacheRef = Arc<TableFlownodeSetCache>;
@@ -53,13 +54,14 @@ fn init_factory(table_flow_manager: TableFlowManagerRef) -> Initializer<TableId,
Box::pin(async move {
table_flow_manager
.flows(table_id)
.map_ok(|key| key.flownode_id())
.try_collect::<HashSet<_>>()
.map_ok(|(key, value)| (key.flownode_id(), value.peer))
.try_collect::<HashMap<_, _>>()
.await
// We must cache the value even if it's empty,
// to avoid hitting the remote storage again for the same key;
// if the value is later added to the remote storage,
// the corresponding cache invalidation mechanism will invalidate the stale empty entry.
.map(Arc::new)
.map(Some)
})
})
@@ -69,21 +71,23 @@ async fn handle_create_flow(
cache: &Cache<TableId, FlownodeSet>,
CreateFlow {
source_table_ids,
flownode_ids,
flownodes: flownode_peers,
}: &CreateFlow,
) {
for table_id in source_table_ids {
let entry = cache.entry(*table_id);
entry
.and_compute_with(
async |entry: Option<moka::Entry<u32, HashSet<u64>>>| match entry {
async |entry: Option<moka::Entry<u32, Arc<HashMap<u64, _>>>>| match entry {
Some(entry) => {
let mut set = entry.into_value();
set.extend(flownode_ids.iter().cloned());
let mut map = entry.into_value().as_ref().clone();
map.extend(flownode_peers.iter().map(|peer| (peer.id, peer.clone())));
Op::Put(set)
Op::Put(Arc::new(map))
}
None => Op::Put(HashSet::from_iter(flownode_ids.iter().cloned())),
None => Op::Put(Arc::new(HashMap::from_iter(
flownode_peers.iter().map(|peer| (peer.id, peer.clone())),
))),
},
)
.await;
@@ -101,14 +105,14 @@ async fn handle_drop_flow(
let entry = cache.entry(*table_id);
entry
.and_compute_with(
async |entry: Option<moka::Entry<u32, HashSet<u64>>>| match entry {
async |entry: Option<moka::Entry<u32, Arc<HashMap<u64, _>>>>| match entry {
Some(entry) => {
let mut set = entry.into_value();
let mut set = entry.into_value().as_ref().clone();
for flownode_id in flownode_ids {
set.remove(flownode_id);
}
Op::Put(set)
Op::Put(Arc::new(set))
}
None => {
// Do nothing
@@ -140,7 +144,7 @@ fn filter(ident: &CacheIdent) -> bool {
#[cfg(test)]
mod tests {
use std::collections::{BTreeMap, HashSet};
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
@@ -150,8 +154,10 @@ mod tests {
use crate::cache::flow::table_flownode::new_table_flownode_set_cache;
use crate::instruction::{CacheIdent, CreateFlow, DropFlow};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::flow_route::FlowRouteValue;
use crate::key::flow::FlowMetadataManager;
use crate::kv_backend::memory::MemoryKvBackend;
use crate::peer::Peer;
#[tokio::test]
async fn test_cache_empty_set() {
@@ -184,15 +190,31 @@ mod tests {
comment: "comment".to_string(),
options: Default::default(),
},
(1..=3)
.map(|i| {
(
(i - 1) as u32,
FlowRouteValue {
peer: Peer::empty(i),
},
)
})
.collect::<Vec<_>>(),
)
.await
.unwrap();
let cache = CacheBuilder::new(128).build();
let cache = new_table_flownode_set_cache("test".to_string(), cache, mem_kv);
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(set, HashSet::from([1, 2, 3]));
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((1..=3).map(|i| { (i, Peer::empty(i),) }))
);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(set, HashSet::from([1, 2, 3]));
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((1..=3).map(|i| { (i, Peer::empty(i),) }))
);
let result = cache.get(1026).await.unwrap().unwrap();
assert_eq!(result.len(), 0);
}
@@ -204,7 +226,7 @@ mod tests {
let cache = new_table_flownode_set_cache("test".to_string(), cache, mem_kv);
let ident = vec![CacheIdent::CreateFlow(CreateFlow {
source_table_ids: vec![1024, 1025],
flownode_ids: vec![1, 2, 3, 4, 5],
flownodes: (1..=5).map(Peer::empty).collect(),
})];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
@@ -221,11 +243,11 @@ mod tests {
let ident = vec![
CacheIdent::CreateFlow(CreateFlow {
source_table_ids: vec![1024, 1025],
flownode_ids: vec![1, 2, 3, 4, 5],
flownodes: (1..=5).map(Peer::empty).collect(),
}),
CacheIdent::CreateFlow(CreateFlow {
source_table_ids: vec![1024, 1025],
flownode_ids: vec![11, 12],
flownodes: (11..=12).map(Peer::empty).collect(),
}),
];
cache.invalidate(&ident).await.unwrap();
@@ -240,8 +262,14 @@ mod tests {
})];
cache.invalidate(&ident).await.unwrap();
let set = cache.get(1024).await.unwrap().unwrap();
assert_eq!(set, HashSet::from([11, 12]));
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((11..=12).map(|i| { (i, Peer::empty(i),) }))
);
let set = cache.get(1025).await.unwrap().unwrap();
assert_eq!(set, HashSet::from([11, 12]));
assert_eq!(
set.as_ref().clone(),
HashMap::from_iter((11..=12).map(|i| { (i, Peer::empty(i),) }))
);
}
}
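Illustrative sketch, not part of the diff: the cached value changes from a `HashSet<FlownodeId>` to an `Arc<HashMap<FlownodeId, Peer>>`, so every update clones the shared map, extends it, and publishes a fresh `Arc`, mirroring the `and_compute_with` branches above. The `Peer` struct and the addresses below are simplified stand-ins for the crate's types.

use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone)]
struct Peer {
    id: u64,
    addr: String,
}

// Clone-on-write merge: copy the shared map (or start empty), extend it, re-wrap in an Arc.
fn merge_flownodes(
    existing: Option<Arc<HashMap<u64, Peer>>>,
    new_peers: &[Peer],
) -> Arc<HashMap<u64, Peer>> {
    let mut map = existing.map(|m| m.as_ref().clone()).unwrap_or_default();
    map.extend(new_peers.iter().map(|p| (p.id, p.clone())));
    Arc::new(map)
}

fn main() {
    let first = merge_flownodes(None, &[Peer { id: 1, addr: "10.0.0.1:6800".into() }]);
    let second = merge_flownodes(Some(first.clone()), &[Peer { id: 2, addr: "10.0.0.2:6800".into() }]);
    assert_eq!(first.len(), 1); // the previously cached map is left untouched
    assert_eq!(second.len(), 2); // the new map contains both flownodes
}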


@@ -99,6 +99,7 @@ pub struct NodeInfo {
pub enum Role {
Datanode,
Frontend,
Flownode,
Metasrv,
}
@@ -106,6 +107,7 @@ pub enum Role {
pub enum NodeStatus {
Datanode(DatanodeStatus),
Frontend(FrontendStatus),
Flownode(FlownodeStatus),
Metasrv(MetasrvStatus),
Standalone,
}
@@ -116,6 +118,7 @@ impl NodeStatus {
match self {
NodeStatus::Datanode(_) => "DATANODE",
NodeStatus::Frontend(_) => "FRONTEND",
NodeStatus::Flownode(_) => "FLOWNODE",
NodeStatus::Metasrv(_) => "METASRV",
NodeStatus::Standalone => "STANDALONE",
}
@@ -139,6 +142,10 @@ pub struct DatanodeStatus {
#[derive(Debug, Serialize, Deserialize)]
pub struct FrontendStatus {}
/// The status of a flownode.
#[derive(Debug, Serialize, Deserialize)]
pub struct FlownodeStatus {}
/// The status of a metasrv.
#[derive(Debug, Serialize, Deserialize)]
pub struct MetasrvStatus {
@@ -235,7 +242,8 @@ impl From<Role> for i32 {
match role {
Role::Datanode => 0,
Role::Frontend => 1,
Role::Metasrv => 2,
Role::Flownode => 2,
Role::Metasrv => 99,
}
}
}
@@ -247,7 +255,8 @@ impl TryFrom<i32> for Role {
match role {
0 => Ok(Self::Datanode),
1 => Ok(Self::Frontend),
2 => Ok(Self::Metasrv),
2 => Ok(Self::Flownode),
99 => Ok(Self::Metasrv),
_ => InvalidRoleSnafu { role }.fail(),
}
}


@@ -16,7 +16,7 @@ use std::collections::HashMap;
use std::sync::Arc;
use common_telemetry::tracing_context::W3cTrace;
use store_api::storage::{RegionNumber, TableId};
use store_api::storage::{RegionId, RegionNumber, TableId};
use crate::cache_invalidator::CacheInvalidatorRef;
use crate::ddl::flow_meta::FlowMetadataAllocatorRef;
@@ -29,7 +29,7 @@ use crate::node_manager::NodeManagerRef;
use crate::region_keeper::MemoryRegionKeeperRef;
use crate::rpc::ddl::{SubmitDdlTaskRequest, SubmitDdlTaskResponse};
use crate::rpc::procedure::{MigrateRegionRequest, MigrateRegionResponse, ProcedureStateResponse};
use crate::ClusterId;
use crate::{ClusterId, DatanodeId};
pub mod alter_logical_tables;
pub mod alter_table;
@@ -101,6 +101,33 @@ pub struct TableMetadata {
pub region_wal_options: HashMap<RegionNumber, String>,
}
pub type RegionFailureDetectorControllerRef = Arc<dyn RegionFailureDetectorController>;
pub type DetectingRegion = (ClusterId, DatanodeId, RegionId);
/// Used for actively registering Region failure detectors.
///
/// This ensures the Region Supervisor can detect Region failures without relying on the first heartbeat from the datanode.
#[async_trait::async_trait]
pub trait RegionFailureDetectorController: Send + Sync {
/// Registers failure detectors for the given identifiers.
async fn register_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>);
/// Deregisters failure detectors for the given identifiers.
async fn deregister_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>);
}
/// A noop implementation of [`RegionFailureDetectorController`].
#[derive(Debug, Clone)]
pub struct NoopRegionFailureDetectorControl;
#[async_trait::async_trait]
impl RegionFailureDetectorController for NoopRegionFailureDetectorControl {
async fn register_failure_detectors(&self, _detecting_regions: Vec<DetectingRegion>) {}
async fn deregister_failure_detectors(&self, _detecting_regions: Vec<DetectingRegion>) {}
}
/// The context of ddl.
#[derive(Clone)]
pub struct DdlContext {
@@ -118,4 +145,28 @@ pub struct DdlContext {
pub flow_metadata_manager: FlowMetadataManagerRef,
/// Allocator for flow metadata.
pub flow_metadata_allocator: FlowMetadataAllocatorRef,
/// Controller of the region failure detector.
pub region_failure_detector_controller: RegionFailureDetectorControllerRef,
}
impl DdlContext {
/// Notifies the RegionSupervisor to register failure detectors for newly created regions.
///
/// The datanode may crash without sending a heartbeat that contains information about newly created regions,
/// which may prevent the RegionSupervisor from detecting failures in these newly created regions.
pub async fn register_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>) {
self.region_failure_detector_controller
.register_failure_detectors(detecting_regions)
.await;
}
/// Notifies the RegionSupervisor to remove failure detectors.
///
/// Once the regions are dropped, subsequent heartbeats no longer include these regions.
/// Therefore, we should remove the failure detectors for these dropped regions.
async fn deregister_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>) {
self.region_failure_detector_controller
.deregister_failure_detectors(detecting_regions)
.await;
}
}
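Illustrative sketch, not part of the diff: a minimal custom implementation of the RegionFailureDetectorController trait introduced above that only logs registrations. The u64 type aliases are stand-ins for the crate's ClusterId/DatanodeId/RegionId types, and the example assumes the async-trait and futures crates are available.

type ClusterId = u64;
type DatanodeId = u64;
type RegionId = u64;
type DetectingRegion = (ClusterId, DatanodeId, RegionId);

#[async_trait::async_trait]
trait RegionFailureDetectorController: Send + Sync {
    async fn register_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>);
    async fn deregister_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>);
}

// Logs every registration instead of driving real failure detectors.
struct LoggingDetectorController;

#[async_trait::async_trait]
impl RegionFailureDetectorController for LoggingDetectorController {
    async fn register_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>) {
        for (cluster, datanode, region) in detecting_regions {
            println!("register detector: cluster={cluster} datanode={datanode} region={region}");
        }
    }

    async fn deregister_failure_detectors(&self, detecting_regions: Vec<DetectingRegion>) {
        for (cluster, datanode, region) in detecting_regions {
            println!("deregister detector: cluster={cluster} datanode={datanode} region={region}");
        }
    }
}

fn main() {
    let controller = LoggingDetectorController;
    futures::executor::block_on(controller.register_failure_detectors(vec![(0, 1, 4096)]));
}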


@@ -41,8 +41,9 @@ use crate::ddl::DdlContext;
use crate::error::{self, Result};
use crate::instruction::{CacheIdent, CreateFlow};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::flow_route::FlowRouteValue;
use crate::key::table_name::TableNameKey;
use crate::key::FlowId;
use crate::key::{FlowId, FlowPartitionId};
use crate::lock_key::{CatalogLock, FlowNameLock, TableNameLock};
use crate::peer::Peer;
use crate::rpc::ddl::{CreateFlowTask, QueryContext};
@@ -170,9 +171,10 @@ impl CreateFlowProcedure {
// Safety: The flow id must be allocated.
let flow_id = self.data.flow_id.unwrap();
// TODO(weny): Support `or_replace`.
let (flow_info, flow_routes) = (&self.data).into();
self.context
.flow_metadata_manager
.create_flow_metadata(flow_id, (&self.data).into())
.create_flow_metadata(flow_id, flow_info, flow_routes)
.await?;
info!("Created flow metadata for flow {flow_id}");
self.data.state = CreateFlowState::InvalidateFlowCache;
@@ -192,7 +194,7 @@ impl CreateFlowProcedure {
&ctx,
&[CacheIdent::CreateFlow(CreateFlow {
source_table_ids: self.data.source_table_ids.clone(),
flownode_ids: self.data.peers.iter().map(|peer| peer.id).collect(),
flownodes: self.data.peers.clone(),
})],
)
.await?;
@@ -292,7 +294,7 @@ impl From<&CreateFlowData> for CreateRequest {
}
}
impl From<&CreateFlowData> for FlowInfoValue {
impl From<&CreateFlowData> for (FlowInfoValue, Vec<(FlowPartitionId, FlowRouteValue)>) {
fn from(value: &CreateFlowData) -> Self {
let CreateFlowTask {
catalog_name,
@@ -311,17 +313,26 @@ impl From<&CreateFlowData> for FlowInfoValue {
.enumerate()
.map(|(idx, peer)| (idx as u32, peer.id))
.collect::<BTreeMap<_, _>>();
let flow_routes = value
.peers
.iter()
.enumerate()
.map(|(idx, peer)| (idx as u32, FlowRouteValue { peer: peer.clone() }))
.collect::<Vec<_>>();
FlowInfoValue {
source_table_ids: value.source_table_ids.clone(),
sink_table_name,
flownode_ids,
catalog_name,
flow_name,
raw_sql: sql,
expire_after,
comment,
options,
}
(
FlowInfoValue {
source_table_ids: value.source_table_ids.clone(),
sink_table_name,
flownode_ids,
catalog_name,
flow_name,
raw_sql: sql,
expire_after,
comment,
options,
},
flow_routes,
)
}
}


@@ -23,10 +23,11 @@ impl CreateFlowProcedure {
pub(crate) async fn allocate_flow_id(&mut self) -> Result<()> {
// TODO(weny, ruihang): We don't support partitions yet. It's always 1 for now.
let partitions = 1;
let cluster_id = self.data.cluster_id;
let (flow_id, peers) = self
.context
.flow_metadata_allocator
.create(partitions)
.create(cluster_id, partitions)
.await?;
self.data.flow_id = Some(flow_id);
self.data.peers = peers;


@@ -33,7 +33,10 @@ use table::metadata::{RawTableInfo, TableId};
use table::table_reference::TableReference;
use crate::ddl::create_table_template::{build_template, CreateRequestBuilder};
use crate::ddl::utils::{add_peer_context_if_needed, handle_retry_error, region_storage_path};
use crate::ddl::utils::{
add_peer_context_if_needed, convert_region_routes_to_detecting_regions, handle_retry_error,
region_storage_path,
};
use crate::ddl::{DdlContext, TableMetadata, TableMetadataAllocatorContext};
use crate::error::{self, Result};
use crate::key::table_name::TableNameKey;
@@ -265,16 +268,25 @@ impl CreateTableProcedure {
/// - Failed to create table metadata.
async fn on_create_metadata(&mut self) -> Result<Status> {
let table_id = self.table_id();
let cluster_id = self.creator.data.cluster_id;
let manager = &self.context.table_metadata_manager;
let raw_table_info = self.table_info().clone();
// Safety: the region_wal_options must be allocated.
let region_wal_options = self.region_wal_options()?.clone();
// Safety: the table_route must be allocated.
let table_route = TableRouteValue::Physical(self.table_route()?.clone());
let physical_table_route = self.table_route()?.clone();
let detecting_regions = convert_region_routes_to_detecting_regions(
cluster_id,
&physical_table_route.region_routes,
);
let table_route = TableRouteValue::Physical(physical_table_route);
manager
.create_table_metadata(raw_table_info, table_route, region_wal_options)
.await?;
self.context
.register_failure_detectors(detecting_regions)
.await;
info!("Created table metadata for table {table_id}");
self.creator.opening_regions.clear();


@@ -48,6 +48,7 @@ pub(crate) fn build_template(create_table_expr: &CreateTableExpr) -> Result<Crea
semantic_type: semantic_type as i32,
comment: String::new(),
datatype_extension: c.datatype_extension.clone(),
options: c.options.clone(),
}),
column_id: i as u32,
}


@@ -35,6 +35,7 @@ use crate::ddl::DdlContext;
use crate::error::Result;
use crate::key::table_name::TableNameValue;
use crate::lock_key::{CatalogLock, SchemaLock};
use crate::ClusterId;
pub struct DropDatabaseProcedure {
/// The context of procedure runtime.
@@ -53,6 +54,7 @@ pub(crate) enum DropTableTarget {
/// Context of [DropDatabaseProcedure] execution.
pub(crate) struct DropDatabaseContext {
cluster_id: ClusterId,
catalog: String,
schema: String,
drop_if_exists: bool,
@@ -85,6 +87,7 @@ impl DropDatabaseProcedure {
Self {
runtime_context: context,
context: DropDatabaseContext {
cluster_id: 0,
catalog,
schema,
drop_if_exists,
@@ -105,6 +108,7 @@ impl DropDatabaseProcedure {
Ok(Self {
runtime_context,
context: DropDatabaseContext {
cluster_id: 0,
catalog,
schema,
drop_if_exists,


@@ -221,6 +221,7 @@ mod tests {
// It always starts from Logical
let mut state = DropDatabaseCursor::new(DropTableTarget::Logical);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -256,6 +257,7 @@ mod tests {
// It always starts from Logical
let mut state = DropDatabaseCursor::new(DropTableTarget::Logical);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -284,6 +286,7 @@ mod tests {
let ddl_context = new_ddl_context(node_manager);
let mut state = DropDatabaseCursor::new(DropTableTarget::Physical);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,


@@ -96,10 +96,11 @@ impl State for DropDatabaseExecutor {
async fn next(
&mut self,
ddl_ctx: &DdlContext,
_ctx: &mut DropDatabaseContext,
ctx: &mut DropDatabaseContext,
) -> Result<(Box<dyn State>, Status)> {
self.register_dropping_regions(ddl_ctx)?;
let executor = DropTableExecutor::new(self.table_name.clone(), self.table_id, true);
let executor =
DropTableExecutor::new(ctx.cluster_id, self.table_name.clone(), self.table_id, true);
// Deletes metadata for table permanently.
let table_route_value = TableRouteValue::new(
self.table_id,
@@ -186,6 +187,7 @@ mod tests {
DropTableTarget::Physical,
);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -198,6 +200,7 @@ mod tests {
}
// Execute again
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -238,6 +241,7 @@ mod tests {
DropTableTarget::Logical,
);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -250,6 +254,7 @@ mod tests {
}
// Execute again
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -339,6 +344,7 @@ mod tests {
DropTableTarget::Physical,
);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,
@@ -368,6 +374,7 @@ mod tests {
DropTableTarget::Physical,
);
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: DEFAULT_CATALOG_NAME.to_string(),
schema: DEFAULT_SCHEMA_NAME.to_string(),
drop_if_exists: false,


@@ -118,6 +118,7 @@ mod tests {
.unwrap();
let mut state = DropDatabaseRemoveMetadata;
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: "foo".to_string(),
schema: "bar".to_string(),
drop_if_exists: true,
@@ -144,6 +145,7 @@ mod tests {
// Schema not exists
let mut state = DropDatabaseRemoveMetadata;
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: "foo".to_string(),
schema: "bar".to_string(),
drop_if_exists: true,


@@ -89,6 +89,7 @@ mod tests {
let ddl_context = new_ddl_context(node_manager);
let mut step = DropDatabaseStart;
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: "foo".to_string(),
schema: "bar".to_string(),
drop_if_exists: false,
@@ -104,6 +105,7 @@ mod tests {
let ddl_context = new_ddl_context(node_manager);
let mut state = DropDatabaseStart;
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: "foo".to_string(),
schema: "bar".to_string(),
drop_if_exists: true,
@@ -126,6 +128,7 @@ mod tests {
.unwrap();
let mut state = DropDatabaseStart;
let mut ctx = DropDatabaseContext {
cluster_id: 0,
catalog: "foo".to_string(),
schema: "bar".to_string(),
drop_if_exists: false,


@@ -35,8 +35,8 @@ use crate::error::{self, Result};
use crate::flow_name::FlowName;
use crate::instruction::{CacheIdent, DropFlow};
use crate::key::flow::flow_info::FlowInfoValue;
use crate::key::flow::flow_route::FlowRouteValue;
use crate::lock_key::{CatalogLock, FlowLock};
use crate::peer::Peer;
use crate::rpc::ddl::DropFlowTask;
use crate::{metrics, ClusterId};
@@ -59,6 +59,7 @@ impl DropFlowProcedure {
cluster_id,
task,
flow_info_value: None,
flow_route_values: vec![],
},
}
}
@@ -104,10 +105,8 @@ impl DropFlowProcedure {
let flow_id = self.data.task.flow_id;
let mut drop_flow_tasks = Vec::with_capacity(flownode_ids.len());
for flownode in flownode_ids.values() {
// TODO(weny): use the real peer.
let peer = Peer::new(*flownode, "");
let requester = self.context.node_manager.flownode(&peer).await;
for FlowRouteValue { peer } in &self.data.flow_route_values {
let requester = self.context.node_manager.flownode(peer).await;
let request = FlowRequest {
body: Some(flow_request::Body::Drop(DropRequest {
flow_id: Some(api::v1::FlowId { id: flow_id }),
@@ -118,12 +117,13 @@ impl DropFlowProcedure {
drop_flow_tasks.push(async move {
if let Err(err) = requester.handle(request).await {
if err.status_code() != StatusCode::FlowNotFound {
return Err(add_peer_context_if_needed(peer)(err));
return Err(add_peer_context_if_needed(peer.clone())(err));
}
}
Ok(())
});
}
join_all(drop_flow_tasks)
.await
.into_iter()
@@ -221,6 +221,7 @@ pub(crate) struct DropFlowData {
cluster_id: ClusterId,
task: DropFlowTask,
pub(crate) flow_info_value: Option<FlowInfoValue>,
pub(crate) flow_route_values: Vec<FlowRouteValue>,
}
/// The state of drop flow


@@ -13,7 +13,8 @@
// limitations under the License.
use common_catalog::format_full_flow_name;
use snafu::OptionExt;
use futures::TryStreamExt;
use snafu::{ensure, OptionExt};
use crate::ddl::drop_flow::DropFlowProcedure;
use crate::error::{self, Result};
@@ -32,7 +33,23 @@ impl DropFlowProcedure {
.with_context(|| error::FlowNotFoundSnafu {
flow_name: format_full_flow_name(catalog_name, flow_name),
})?;
let flow_route_values = self
.context
.flow_metadata_manager
.flow_route_manager()
.routes(self.data.task.flow_id)
.map_ok(|(_, value)| value)
.try_collect::<Vec<_>>()
.await?;
ensure!(
!flow_route_values.is_empty(),
error::FlowRouteNotFoundSnafu {
flow_name: format_full_flow_name(catalog_name, flow_name),
}
);
self.data.flow_info_value = Some(flow_info_value);
self.data.flow_route_values = flow_route_values;
Ok(())
}


@@ -279,6 +279,7 @@ impl DropTableData {
fn build_executor(&self) -> DropTableExecutor {
DropTableExecutor::new(
self.cluster_id,
self.task.table_name(),
self.task.table_id,
self.task.drop_if_exists,


@@ -26,13 +26,14 @@ use table::metadata::TableId;
use table::table_name::TableName;
use crate::cache_invalidator::Context;
use crate::ddl::utils::add_peer_context_if_needed;
use crate::ddl::utils::{add_peer_context_if_needed, convert_region_routes_to_detecting_regions};
use crate::ddl::DdlContext;
use crate::error::{self, Result};
use crate::instruction::CacheIdent;
use crate::key::table_name::TableNameKey;
use crate::key::table_route::TableRouteValue;
use crate::rpc::router::{find_leader_regions, find_leaders, RegionRoute};
use crate::ClusterId;
/// [Control] indicates to the caller whether to go to the next step.
#[derive(Debug)]
@@ -50,8 +51,14 @@ impl<T> Control<T> {
impl DropTableExecutor {
/// Returns the [DropTableExecutor].
pub fn new(table: TableName, table_id: TableId, drop_if_exists: bool) -> Self {
pub fn new(
cluster_id: ClusterId,
table: TableName,
table_id: TableId,
drop_if_exists: bool,
) -> Self {
Self {
cluster_id,
table,
table_id,
drop_if_exists,
@@ -64,6 +71,7 @@ impl DropTableExecutor {
/// - Invalidates the cache on the Frontend nodes.
/// - Drops the regions on the Datanode nodes.
pub struct DropTableExecutor {
cluster_id: ClusterId,
table: TableName,
table_id: TableId,
drop_if_exists: bool,
@@ -130,7 +138,17 @@ impl DropTableExecutor {
) -> Result<()> {
ctx.table_metadata_manager
.destroy_table_metadata(self.table_id, &self.table, table_route_value)
.await
.await?;
let detecting_regions = if table_route_value.is_physical() {
// Safety: checked.
let regions = table_route_value.region_routes().unwrap();
convert_region_routes_to_detecting_regions(self.cluster_id, regions)
} else {
vec![]
};
ctx.deregister_failure_detectors(detecting_regions).await;
Ok(())
}
/// Restores the table metadata.
@@ -274,6 +292,7 @@ mod tests {
let node_manager = Arc::new(MockDatanodeManager::new(()));
let ctx = new_ddl_context(node_manager);
let executor = DropTableExecutor::new(
0,
TableName::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "my_table"),
1024,
true,
@@ -283,6 +302,7 @@ mod tests {
// Drops a non-existent table
let executor = DropTableExecutor::new(
0,
TableName::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "my_table"),
1024,
false,
@@ -292,6 +312,7 @@ mod tests {
// Drops an existing table
let executor = DropTableExecutor::new(
0,
TableName::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "my_table"),
1024,
false,


@@ -20,6 +20,7 @@ use crate::error::Result;
use crate::key::FlowId;
use crate::peer::Peer;
use crate::sequence::SequenceRef;
use crate::ClusterId;
/// The reference of [FlowMetadataAllocator].
pub type FlowMetadataAllocatorRef = Arc<FlowMetadataAllocator>;
@@ -42,6 +43,16 @@ impl FlowMetadataAllocator {
}
}
pub fn with_peer_allocator(
flow_id_sequence: SequenceRef,
peer_allocator: Arc<dyn PartitionPeerAllocator>,
) -> Self {
Self {
flow_id_sequence,
partition_peer_allocator: peer_allocator,
}
}
/// Allocates a [FlowId].
pub(crate) async fn allocate_flow_id(&self) -> Result<FlowId> {
let flow_id = self.flow_id_sequence.next().await? as FlowId;
@@ -49,9 +60,16 @@ impl FlowMetadataAllocator {
}
/// Allocates the [FlowId] and [Peer]s.
pub async fn create(&self, partitions: usize) -> Result<(FlowId, Vec<Peer>)> {
pub async fn create(
&self,
cluster_id: ClusterId,
partitions: usize,
) -> Result<(FlowId, Vec<Peer>)> {
let flow_id = self.allocate_flow_id().await?;
let peers = self.partition_peer_allocator.alloc(partitions).await?;
let peers = self
.partition_peer_allocator
.alloc(cluster_id, partitions)
.await?;
Ok((flow_id, peers))
}
@@ -61,7 +79,7 @@ impl FlowMetadataAllocator {
#[async_trait]
pub trait PartitionPeerAllocator: Send + Sync {
/// Allocates [Peer] nodes for storing partitions.
async fn alloc(&self, partitions: usize) -> Result<Vec<Peer>>;
async fn alloc(&self, cluster_id: ClusterId, partitions: usize) -> Result<Vec<Peer>>;
}
/// [PartitionPeerAllocatorRef] allocates [Peer]s for partitions.
@@ -71,7 +89,7 @@ struct NoopPartitionPeerAllocator;
#[async_trait]
impl PartitionPeerAllocator for NoopPartitionPeerAllocator {
async fn alloc(&self, partitions: usize) -> Result<Vec<Peer>> {
async fn alloc(&self, _cluster_id: ClusterId, partitions: usize) -> Result<Vec<Peer>> {
Ok(vec![Peer::default(); partitions])
}
}
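Illustrative sketch, not part of the diff: a PartitionPeerAllocator that makes use of the newly added cluster_id argument, handing out flownode peers round-robin from a fixed list. Peer and the String error type are simplified stand-ins for the crate's types, and the async-trait and futures crates are assumed.

use std::sync::atomic::{AtomicUsize, Ordering};

#[derive(Debug, Clone)]
struct Peer {
    id: u64,
    addr: String,
}

#[async_trait::async_trait]
trait PartitionPeerAllocator: Send + Sync {
    async fn alloc(&self, cluster_id: u64, partitions: usize) -> Result<Vec<Peer>, String>;
}

// Hands out peers round-robin; the cluster_id only shows up in the error message here.
struct RoundRobinPeerAllocator {
    peers: Vec<Peer>,
    next: AtomicUsize,
}

#[async_trait::async_trait]
impl PartitionPeerAllocator for RoundRobinPeerAllocator {
    async fn alloc(&self, cluster_id: u64, partitions: usize) -> Result<Vec<Peer>, String> {
        if self.peers.is_empty() {
            return Err(format!("no live flownodes in cluster {cluster_id}"));
        }
        Ok((0..partitions)
            .map(|_| {
                let idx = self.next.fetch_add(1, Ordering::Relaxed) % self.peers.len();
                self.peers[idx].clone()
            })
            .collect())
    }
}

fn main() {
    let allocator = RoundRobinPeerAllocator {
        peers: vec![
            Peer { id: 1, addr: "10.0.0.1:6800".into() },
            Peer { id: 2, addr: "10.0.0.2:6800".into() },
        ],
        next: AtomicUsize::new(0),
    };
    let peers = futures::executor::block_on(allocator.alloc(0, 3)).unwrap();
    assert_eq!(peers.len(), 3);
}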


@@ -45,6 +45,7 @@ impl From<TestColumnDef> for ColumnDef {
semantic_type: semantic_type as i32,
comment,
datatype_extension: None,
options: None,
}
}
}

Some files were not shown because too many files have changed in this diff