Compare commits


24 Commits

Author SHA1 Message Date
Ruihang Xia
1b7ab2957b feat: cache logical region's metadata
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-12 16:16:25 +08:00
Ning Sun
aaa9b32908 feat: add more h3 functions (#4770)
* feat: add more h3 grid functions

* feat: add more traversal functions

* refactor: update some function definitions

* style: format

* refactor: avoid creating slice in nested loop

* feat: ensure column number and length

* refactor: fix lint warnings

* refactor: merge main

* Apply suggestions from code review

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* Update src/common/function/src/scalars/geo/h3.rs

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* style: format

---------

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2024-10-11 17:57:54 +00:00
Weny Xu
4bb1f4f184 feat: introduce LeadershipChangeNotifier and LeadershipChangeListener (#4817)
* feat: introduce `LeadershipChangeNotifier`

* refactor: use `LeadershipChangeNotifier`

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: adjust log styling
2024-10-11 12:48:53 +00:00
Weny Xu
0f907ef99e fix: correct table name formatting (#4819) 2024-10-11 11:32:15 +00:00
Ruihang Xia
a61c0bd1d8 fix: error in admin function is not formatted properly (#4820)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-11 10:02:45 +00:00
Lei, HUANG
7dd0e3ab37 fix: Panic in UNION ALL queries (#4796)
* fix/union_all_panic: Improve MetricCollector by incrementing level and fix underflow issue; add tests for UNION ALL queries

* chore: remove useless documentation

* fix/union_all_panic: Add order by clause to UNION ALL select queries in tests
2024-10-11 08:23:01 +00:00
Yingwen
d168bde226 feat!: move v1/prof API to debug/prof (#4810)
* feat!: move v1/prof to debug/prof

* docs: update readme

* docs: move prof docs to docs dir

* chore: update message

* feat!: remove v1/prof

* docs: update mem prof docs
2024-10-11 04:16:37 +00:00
jeremyhi
4b34f610aa feat: information extension (#4811)
* feat: information extension

* Update manager.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: by comment

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-10-11 03:13:49 +00:00
Weny Xu
695ff1e037 feat: expose RegionMigrationManagerRef (#4812)
* chore: expose `RegionMigrationProcedureTask`

* fix: fix typos

* chore: expose `tracker`
2024-10-11 02:40:51 +00:00
Yohan Wal
288fdc3145 feat: json_path_exists udf (#4807)
* feat: json_path_exists udf

* chore: fix comments

* fix: caution when copy&paste QAQ
2024-10-10 14:15:34 +00:00
discord9
a8ed3db0aa feat: Merge sort Logical plan (#4768)
* feat(WIP): MergeSort

* wip

* feat: MergeSort LogicalPlan

* update sqlness result

* Apply suggestions from code review

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* refactor: per review advice

* refactor: more per review

* chore: per review

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-09 09:37:27 +00:00
Kaifeng Zheng
0dd11f53f5 feat: add json format output for http interface (#4797)
* feat: json output format for http

* feat: add json result test case

* fix: typo and refactor a piece of code

* fix: cargo check

* move affected_rows to top level
2024-10-09 07:11:57 +00:00
Ning Sun
19918928c5 feat: add function to aggregate path into a geojson path (#4798)
* feat: add geojson function to aggregate paths

* test: add sqlness results

* test: add sqlness

* refactor: corrected to aggregation function

* chore: update comments

* fix: make linter happy again

* refactor: rename to remove `geo` from `geojson` function name

The return type is not geojson at all. It's just compatible with geojson's
coordinates part and superset's deckgl path plugin.
2024-10-09 02:38:44 +00:00
shuiyisong
5f0a83b2b1 fix: ts conversion during transform phase (#4790)
* fix: allow ts conversion during transform phase

* chore: replace `unimplemented` with snafu
2024-10-08 17:54:44 +00:00
localhost
71a66d15f7 chore: add json write (#4744)
* chore: add json write

* chore: add test for write json log api

* chore: enhancement of Error Handling

* chore: fix by pr comment

* chore: fix by pr comment

* chore: enhancement of error content and add some doc
2024-10-08 12:11:09 +00:00
Weny Xu
2cdd103874 feat: introduce HeartbeatHandlerGroupBuilderCustomizer (#4803)
* feat: introduce `HeartbeatHandlerGroupBuilderFinalizer`

* chore: rename to `HeartbeatHandlerGroupBuilderCustomizer`
2024-10-08 09:02:06 +00:00
Ning Sun
4dea4cac47 refactor: change sqlness ports to avoid conflict with local instance (#4794) 2024-10-08 07:33:24 +00:00
Kaifeng Zheng
a283e13da7 feat: set max log files to 720 by default, info log only (#4787)
* feat: set max log files to 720 by default, info log only

* expose max_log_files in tomls

* include dir info when panicking, limit max_log_files of err_log to 30, and that of slow_queries to opt.max_log_files

* fix clippy

* update config.md

* update expected config str

* limit err_log max files size to `max_log_files` too, include err info when panicking, put `max_l_f` in right position

* fix typos

* chore: config

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-04 18:05:40 +00:00
JohnsonLee
47a3277d12 feat: customize channel information for sqlness tests (#4729)
* feat: add pg and mysql server_address options

* feat: start pg and mysql server(standalone)

* feat: start pg and mysql in distribute

* feat: finally get there, specify postgres sqlness

* feat: support mysql sqlness

* fix: license

* fix: remove unused import

* fix: toml

* fix: clippy

* refactor: BeginProtocolInterceptorFactory to ProtocolInterceptorFactory

* fix: sqlness pg connect

* fix: clippy

* Apply suggestions from code review

Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: rustfmt

* fix: reconnect pg and mysql when restart

* test: add mysql related sqlness

* fix: wait for start while restarting

* fix: clippy

* fix: cargo lock conflict

fix: Cargo.lock conflict

* fix: usage of '@@tx_isolation' in sqlness

* fix: typos

* feat: retry with backoff when create client

* fix: use millisecond rather than microseconds in backoff

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-10-04 09:26:57 +00:00
Yingwen
caf5f2c7a5 docs: add TM to logos (#4789) 2024-10-01 03:35:04 +00:00
Weny Xu
c1e8084af6 feat: add add_handler_after, add_handler_before, replace_handler (#4788)
* feat: add `add_handler_after`, `add_handler_before`, `replace_handler`

* chore: apply suggestions from CR

* test: add more tests

* feat: use `Vec` instead of `LinkedList`

* Update src/meta-srv/src/lib.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-09-30 08:53:59 +00:00
Weny Xu
6e776d5f98 feat: support to reject write after flushing (#4759)
* refactor: use `RegionRoleState` instead of `RegionState`

* feat: introducing `RegionLeaderState::Downgrading`

* refactor: introduce `set_region_role_state_gracefully`

* refactor: use `set_region_role` instead of `set_writable`

* feat: support to reject write after flushing

* fix: fix unit tests

* test: add unit test for `should_reject_write`

* chore: add comments

* chore: refine comments

* fix: fix unit test

* test: enable fuzz tests for Local WAL

* chore: add logs

* chore: rename `RegionStatus` to `RegionState`

* feat: introduce `DowngradingLeader`

* chore: rename

* refactor: refactor `set_role_state` tests

* test: ensure downgrading region will reject write

* chore: enhance logs

* chore: refine name

* chore: refine comment

* test: add tests for `set_role_role_state`

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-09-30 08:28:51 +00:00
zyy17
e39a9e6feb feat: add StatementStatistics for slow query logging implementation (#4719)
* feat: log slow query

* feat: log slow query for sql

* refactor: add slow query logging options

* ci: fix errors

* feat: add StatementStatistics

* chore: revert modification of servers crate

* docs: update config docs

* fix: clippy errors
2024-09-30 03:26:50 +00:00
Weny Xu
77af4fd981 refactor: introduce HeartbeatHandlerGroupBuilder (#4785) 2024-09-30 02:56:53 +00:00
171 changed files with 5942 additions and 1751 deletions


@@ -442,6 +442,13 @@ jobs:
minio: true
kafka: true
values: "with-remote-wal.yaml"
include:
- target: "fuzz_migrate_mito_regions"
mode:
name: "Local WAL"
minio: true
kafka: false
values: "with-minio.yaml"
steps:
- name: Remove unused software
run: |
@@ -530,7 +537,7 @@ jobs:
with:
image-registry: localhost:5001
values-filename: ${{ matrix.mode.values }}
enable-region-failover: true
enable-region-failover: ${{ matrix.mode.kafka }}
- name: Port forward (mysql)
run: |
kubectl port-forward service/my-greptimedb-frontend 4002:4002 -n my-greptimedb&

Cargo.lock (generated, 801 lines changed): file diff suppressed because it is too large.


@@ -120,7 +120,7 @@ etcd-client = { version = "0.13" }
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "36334744c7020734dcb4a6b8d24d52ae7ed53fe1" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0b4f7c8ab06399f6b90e1626e8d5b9697cb33bb9" }
humantime = "2.1"
humantime-serde = "1.1"
itertools = "0.10"


@@ -161,8 +161,13 @@
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
@@ -247,8 +252,13 @@
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
@@ -312,8 +322,13 @@
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
@@ -464,8 +479,13 @@
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | whether enable export metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
@@ -510,7 +530,12 @@
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions < 0 are treated as 0 |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |


@@ -580,12 +580,28 @@ append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum amount of log files.
max_log_files = 720
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]


@@ -78,12 +78,28 @@ append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum amount of log files.
max_log_files = 720
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The tracing options. Only effect when compiled with `tokio-console` feature.
[tracing]
## The tokio console address.


@@ -185,12 +185,28 @@ append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum amount of log files.
max_log_files = 720
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]


@@ -172,12 +172,28 @@ append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum amount of log files.
max_log_files = 720
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]


@@ -624,12 +624,28 @@ append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum amount of log files.
max_log_files = 720
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]


@@ -9,7 +9,7 @@ cargo build --features=pprof
## HTTP API
Sample at 99 Hertz, for 5 seconds, output report in [protobuf format](https://github.com/google/pprof/blob/master/proto/profile.proto).
```bash
curl -s '0:4000/v1/prof/cpu' > /tmp/pprof.out
curl -s '0:4000/debug/prof/cpu' > /tmp/pprof.out
```
Then you can use `pprof` command with the protobuf file.
@@ -19,10 +19,10 @@ go tool pprof -top /tmp/pprof.out
Sample at 99 Hertz, for 60 seconds, output report in flamegraph format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
curl -s '0:4000/debug/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
```
Sample at 49 Hertz, for 10 seconds, output report in text format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
curl -s '0:4000/debug/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
```


@@ -12,10 +12,10 @@ brew install jemalloc
sudo apt install libjemalloc-dev
```
### [flamegraph](https://github.com/brendangregg/FlameGraph)
```bash
curl https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl > ./flamegraph.pl
```
### Build GreptimeDB with `mem-prof` feature.
@@ -35,7 +35,7 @@ MALLOC_CONF=prof:true,lg_prof_interval:28 ./target/debug/greptime standalone sta
Dump memory profiling data through HTTP API:
```bash
curl localhost:4000/v1/prof/mem > greptime.hprof
curl localhost:4000/debug/prof/mem > greptime.hprof
```
You can periodically dump profiling data and compare them to find the delta memory usage.
@@ -45,6 +45,9 @@ You can periodically dump profiling data and compare them to find the delta memo
To create flamegraph according to dumped profiling data:
```bash
jeprof --svg <path_to_greptimedb_binary> --base=<baseline_prof> <profile_data> > output.svg
sudo apt install -y libjemalloc-dev
jeprof <path_to_greptime_binary> <profile_data> --collapse | ./flamegraph.pl > mem-prof.svg
jeprof <path_to_greptime_binary> --base <baseline_prof> <profile_data> --collapse | ./flamegraph.pl > output.svg
```

Two binary image files changed (25 KiB → 36 KiB and 18 KiB → 25 KiB); contents not shown.


@@ -89,9 +89,8 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to get procedure client in {mode} mode"))]
GetProcedureClient {
mode: String,
#[snafu(display("Failed to get information extension client"))]
GetInformationExtension {
#[snafu(implicit)]
location: Location,
},
@@ -301,7 +300,7 @@ impl ErrorExt for Error {
| Error::CacheNotFound { .. }
| Error::CastManager { .. }
| Error::Json { .. }
| Error::GetProcedureClient { .. }
| Error::GetInformationExtension { .. }
| Error::ProcedureIdNotFound { .. } => StatusCode::Unexpected,
Error::ViewPlanColumnsChanged { .. } => StatusCode::InvalidArguments,


@@ -21,7 +21,6 @@ use common_catalog::consts::{
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, NUMBERS_TABLE_ID,
PG_CATALOG_NAME,
};
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cache::{LayeredCacheRegistryRef, ViewInfoCacheRef};
use common_meta::key::catalog_name::CatalogNameKey;
@@ -34,7 +33,6 @@ use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureManagerRef;
use futures_util::stream::BoxStream;
use futures_util::{StreamExt, TryStreamExt};
use meta_client::client::MetaClient;
use moka::sync::Cache;
use partition::manager::{PartitionRuleManager, PartitionRuleManagerRef};
use session::context::{Channel, QueryContext};
@@ -50,7 +48,7 @@ use crate::error::{
CacheNotFoundSnafu, GetTableCacheSnafu, InvalidTableInfoInCatalogSnafu, ListCatalogsSnafu,
ListSchemasSnafu, ListTablesSnafu, Result, TableMetadataManagerSnafu,
};
use crate::information_schema::InformationSchemaProvider;
use crate::information_schema::{InformationExtensionRef, InformationSchemaProvider};
use crate::kvbackend::TableCacheRef;
use crate::system_schema::pg_catalog::PGCatalogProvider;
use crate::system_schema::SystemSchemaProvider;
@@ -63,9 +61,8 @@ use crate::CatalogManager;
/// comes from `SystemCatalog`, which is static and read-only.
#[derive(Clone)]
pub struct KvBackendCatalogManager {
mode: Mode,
/// Only available in `Distributed` mode.
meta_client: Option<Arc<MetaClient>>,
/// Provides the extension methods for the `information_schema` tables
information_extension: InformationExtensionRef,
/// Manages partition rules.
partition_manager: PartitionRuleManagerRef,
/// Manages table metadata.
@@ -82,15 +79,13 @@ const CATALOG_CACHE_MAX_CAPACITY: u64 = 128;
impl KvBackendCatalogManager {
pub fn new(
mode: Mode,
meta_client: Option<Arc<MetaClient>>,
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
procedure_manager: Option<ProcedureManagerRef>,
) -> Arc<Self> {
Arc::new_cyclic(|me| Self {
mode,
meta_client,
information_extension,
partition_manager: Arc::new(PartitionRuleManager::new(
backend.clone(),
cache_registry
@@ -118,20 +113,15 @@ impl KvBackendCatalogManager {
})
}
/// Returns the server running mode.
pub fn running_mode(&self) -> &Mode {
&self.mode
}
pub fn view_info_cache(&self) -> Result<ViewInfoCacheRef> {
self.cache_registry.get().context(CacheNotFoundSnafu {
name: "view_info_cache",
})
}
/// Returns the `[MetaClient]`.
pub fn meta_client(&self) -> Option<Arc<MetaClient>> {
self.meta_client.clone()
/// Returns the [`InformationExtension`].
pub fn information_extension(&self) -> InformationExtensionRef {
self.information_extension.clone()
}
pub fn partition_manager(&self) -> PartitionRuleManagerRef {


@@ -32,7 +32,11 @@ use std::collections::HashMap;
use std::sync::{Arc, Weak};
use common_catalog::consts::{self, DEFAULT_CATALOG_NAME, INFORMATION_SCHEMA_NAME};
use common_error::ext::ErrorExt;
use common_meta::cluster::NodeInfo;
use common_meta::datanode::RegionStat;
use common_meta::key::flow::FlowMetadataManager;
use common_procedure::ProcedureInfo;
use common_recordbatch::SendableRecordBatchStream;
use datatypes::schema::SchemaRef;
use lazy_static::lazy_static;
@@ -45,7 +49,7 @@ use views::InformationSchemaViews;
use self::columns::InformationSchemaColumns;
use super::{SystemSchemaProviderInner, SystemTable, SystemTableRef};
use crate::error::Result;
use crate::error::{Error, Result};
use crate::system_schema::information_schema::cluster_info::InformationSchemaClusterInfo;
use crate::system_schema::information_schema::flows::InformationSchemaFlows;
use crate::system_schema::information_schema::information_memory_table::get_schema_columns;
@@ -318,3 +322,39 @@ where
InformationTable::to_stream(self, request)
}
}
pub type InformationExtensionRef = Arc<dyn InformationExtension<Error = Error> + Send + Sync>;
/// The `InformationExtension` trait provides the extension methods for the `information_schema` tables.
#[async_trait::async_trait]
pub trait InformationExtension {
type Error: ErrorExt;
/// Gets the nodes information.
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error>;
/// Gets the procedures information.
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error>;
/// Gets the region statistics.
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error>;
}
pub struct NoopInformationExtension;
#[async_trait::async_trait]
impl InformationExtension for NoopInformationExtension {
type Error = Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
Ok(vec![])
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
Ok(vec![])
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
Ok(vec![])
}
}
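
For orientation, below is a minimal sketch, not part of this changeset, of what an implementation of the new extension point could look like. It assumes the `InformationExtension` trait and the import paths shown in the diffs above; the `MyInformationExtension` type and the commented-out wiring are hypothetical, mirroring the reworked `KvBackendCatalogManager::new` call sites elsewhere in this compare.

```rust
use catalog::error::Error;
use catalog::information_schema::InformationExtension;
use common_meta::cluster::NodeInfo;
use common_meta::datanode::RegionStat;
use common_procedure::ProcedureInfo;

/// Hypothetical implementor; answers with empty data, like `NoopInformationExtension`.
struct MyInformationExtension;

#[async_trait::async_trait]
impl InformationExtension for MyInformationExtension {
    type Error = Error;

    /// Backs `information_schema.cluster_info`.
    async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
        Ok(vec![])
    }

    /// Backs `information_schema.procedure_info`; the `String` is the status name.
    async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
        Ok(vec![])
    }

    /// Backs `information_schema.region_statistics`.
    async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
        Ok(vec![])
    }
}

// Hypothetical wiring, mirroring the call sites changed in this compare:
//
// let catalog_manager = KvBackendCatalogManager::new(
//     std::sync::Arc::new(MyInformationExtension),
//     kv_backend,
//     layered_cache_registry,
//     Some(procedure_manager),
// );
```

Within this changeset, the standalone binary fills this slot with `StandaloneInformationExtension` (see the `cmd` diff below) and the distributed deployments use `DistributedInformationExtension`.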


@@ -17,13 +17,10 @@ use std::time::Duration;
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_CLUSTER_INFO_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo, NodeStatus};
use common_meta::peer::Peer;
use common_meta::cluster::NodeInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_telemetry::warn;
use common_time::timestamp::Timestamp;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
@@ -40,7 +37,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::CLUSTER_INFO;
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, ListNodesSnafu, Result};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -70,7 +67,6 @@ const INIT_CAPACITY: usize = 42;
pub(super) struct InformationSchemaClusterInfo {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
}
impl InformationSchemaClusterInfo {
@@ -78,7 +74,6 @@ impl InformationSchemaClusterInfo {
Self {
schema: Self::schema(),
catalog_manager,
start_time_ms: common_time::util::current_time_millis() as u64,
}
}
@@ -100,11 +95,7 @@ impl InformationSchemaClusterInfo {
}
fn builder(&self) -> InformationSchemaClusterInfoBuilder {
InformationSchemaClusterInfoBuilder::new(
self.schema.clone(),
self.catalog_manager.clone(),
self.start_time_ms,
)
InformationSchemaClusterInfoBuilder::new(self.schema.clone(), self.catalog_manager.clone())
}
}
@@ -144,7 +135,6 @@ impl InformationTable for InformationSchemaClusterInfo {
struct InformationSchemaClusterInfoBuilder {
schema: SchemaRef,
start_time_ms: u64,
catalog_manager: Weak<dyn CatalogManager>,
peer_ids: Int64VectorBuilder,
@@ -158,11 +148,7 @@ struct InformationSchemaClusterInfoBuilder {
}
impl InformationSchemaClusterInfoBuilder {
fn new(
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
) -> Self {
fn new(schema: SchemaRef, catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema,
catalog_manager,
@@ -174,56 +160,17 @@ impl InformationSchemaClusterInfoBuilder {
start_times: TimestampMillisecondVectorBuilder::with_capacity(INIT_CAPACITY),
uptimes: StringVectorBuilder::with_capacity(INIT_CAPACITY),
active_times: StringVectorBuilder::with_capacity(INIT_CAPACITY),
start_time_ms,
}
}
/// Construct the `information_schema.cluster_info` virtual table
async fn make_cluster_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
let build_info = common_version::build_info();
self.add_node_info(
&predicates,
NodeInfo {
// For the standalone:
// - id always 0
// - empty string for peer_addr
peer: Peer {
id: 0,
addr: "".to_string(),
},
last_activity_ts: -1,
status: NodeStatus::Standalone,
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// Use `self.start_time_ms` instead.
// It's not precise but enough.
start_time_ms: self.start_time_ms,
},
);
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let node_infos = meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(ListNodesSnafu)?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
} else {
warn!("Could not find meta client in distributed mode.");
}
}
let information_extension = utils::information_extension(&self.catalog_manager)?;
let node_infos = information_extension.nodes().await?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
self.finish()
}


@@ -14,14 +14,10 @@
use std::sync::{Arc, Weak};
use api::v1::meta::{ProcedureMeta, ProcedureStatus};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_PROCEDURE_INFO_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::ddl::{ExecutorContext, ProcedureExecutor};
use common_meta::rpc::procedure;
use common_procedure::{ProcedureInfo, ProcedureState};
use common_procedure::ProcedureInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_time::timestamp::Timestamp;
@@ -38,10 +34,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::PROCEDURE_INFO;
use crate::error::{
ConvertProtoDataSnafu, CreateRecordBatchSnafu, GetProcedureClientSnafu, InternalSnafu,
ListProceduresSnafu, ProcedureIdNotFoundSnafu, Result,
};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -167,45 +160,11 @@ impl InformationSchemaProcedureInfoBuilder {
/// Construct the `information_schema.procedure_info` virtual table
async fn make_procedure_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
if let Some(procedure_manager) = utils::procedure_manager(&self.catalog_manager)? {
let procedures = procedure_manager
.list_procedures()
.await
.map_err(BoxedError::new)
.context(ListProceduresSnafu)?;
for procedure in procedures {
self.add_procedure(
&predicates,
procedure.state.as_str_name().to_string(),
procedure,
);
}
} else {
return GetProcedureClientSnafu { mode: "standalone" }.fail();
}
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let procedures = meta_client
.list_procedures(&ExecutorContext::default())
.await
.map_err(BoxedError::new)
.context(ListProceduresSnafu)?;
for procedure in procedures.procedures {
self.add_procedure_info(&predicates, procedure)?;
}
} else {
return GetProcedureClientSnafu {
mode: "distributed",
}
.fail();
}
}
};
let information_extension = utils::information_extension(&self.catalog_manager)?;
let procedures = information_extension.procedures().await?;
for (status, procedure_info) in procedures {
self.add_procedure(&predicates, status, procedure_info);
}
self.finish()
}
@@ -247,34 +206,6 @@ impl InformationSchemaProcedureInfoBuilder {
self.lock_keys.push(Some(&lock_keys));
}
fn add_procedure_info(
&mut self,
predicates: &Predicates,
procedure: ProcedureMeta,
) -> Result<()> {
let pid = match procedure.id {
Some(pid) => pid,
None => return ProcedureIdNotFoundSnafu {}.fail(),
};
let pid = procedure::pb_pid_to_pid(&pid)
.map_err(BoxedError::new)
.context(ConvertProtoDataSnafu)?;
let status = ProcedureStatus::try_from(procedure.status)
.map(|v| v.as_str_name())
.unwrap_or("Unknown")
.to_string();
let procedure_info = ProcedureInfo {
id: pid,
type_name: procedure.type_name,
start_time_ms: procedure.start_time_ms,
end_time_ms: procedure.end_time_ms,
state: ProcedureState::Running,
lock_keys: procedure.lock_keys,
};
self.add_procedure(predicates, status, procedure_info);
Ok(())
}
fn finish(&mut self) -> Result<RecordBatch> {
let columns: Vec<VectorRef> = vec![
Arc::new(self.procedure_ids.finish()),


@@ -224,8 +224,8 @@ impl InformationSchemaRegionPeersBuilder {
let region_id = RegionId::new(table_id, route.region.id.region_number()).as_u64();
let peer_id = route.leader_peer.clone().map(|p| p.id);
let peer_addr = route.leader_peer.clone().map(|p| p.addr);
let status = if let Some(status) = route.leader_status {
Some(status.as_ref().to_string())
let state = if let Some(state) = route.leader_state {
Some(state.as_ref().to_string())
} else {
// Alive by default
Some("ALIVE".to_string())
@@ -242,7 +242,7 @@ impl InformationSchemaRegionPeersBuilder {
self.peer_ids.push(peer_id);
self.peer_addrs.push(peer_addr.as_deref());
self.is_leaders.push(Some("Yes"));
self.statuses.push(status.as_deref());
self.statuses.push(state.as_deref());
self.down_seconds
.push(route.leader_down_millis().map(|m| m / 1000));
}


@@ -16,13 +16,10 @@ use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_REGION_STATISTICS_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cluster::ClusterInfo;
use common_meta::datanode::RegionStat;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{DfSendableRecordBatchStream, RecordBatch, SendableRecordBatchStream};
use common_telemetry::tracing::warn;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
use datafusion::physical_plan::streaming::PartitionStream as DfPartitionStream;
@@ -34,7 +31,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::{InformationTable, REGION_STATISTICS};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, ListRegionStatsSnafu, Result};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::information_schema::Predicates;
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -167,28 +164,11 @@ impl InformationSchemaRegionStatisticsBuilder {
request: Option<ScanRequest>,
) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
// TODO(weny): implement it
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let region_stats = meta_client
.list_region_stats()
.await
.map_err(BoxedError::new)
.context(ListRegionStatsSnafu)?;
for region_stat in region_stats {
self.add_region_statistic(&predicates, region_stat);
}
} else {
warn!("Meta client is not available");
}
}
let information_extension = utils::information_extension(&self.catalog_manager)?;
let region_stats = information_extension.region_stats().await?;
for region_stat in region_stats {
self.add_region_statistic(&predicates, region_stat);
}
self.finish()
}


@@ -12,48 +12,33 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub mod tables;
use std::sync::Weak;
use std::sync::{Arc, Weak};
use common_config::Mode;
use common_meta::key::TableMetadataManagerRef;
use common_procedure::ProcedureManagerRef;
use meta_client::client::MetaClient;
use snafu::OptionExt;
use crate::error::{Result, UpgradeWeakCatalogManagerRefSnafu};
use crate::error::{GetInformationExtensionSnafu, Result, UpgradeWeakCatalogManagerRefSnafu};
use crate::information_schema::InformationExtensionRef;
use crate::kvbackend::KvBackendCatalogManager;
use crate::CatalogManager;
/// Try to get the server running mode from `[CatalogManager]` weak reference.
pub fn running_mode(catalog_manager: &Weak<dyn CatalogManager>) -> Result<Option<Mode>> {
pub mod tables;
/// Try to get the `[InformationExtension]` from `[CatalogManager]` weak reference.
pub fn information_extension(
catalog_manager: &Weak<dyn CatalogManager>,
) -> Result<InformationExtensionRef> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
Ok(catalog_manager
let information_extension = catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.map(|manager| manager.running_mode())
.copied())
}
.map(|manager| manager.information_extension())
.context(GetInformationExtensionSnafu)?;
/// Try to get the `[MetaClient]` from `[CatalogManager]` weak reference.
pub fn meta_client(catalog_manager: &Weak<dyn CatalogManager>) -> Result<Option<Arc<MetaClient>>> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let meta_client = match catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
{
None => None,
Some(manager) => manager.meta_client(),
};
Ok(meta_client)
Ok(information_extension)
}
/// Try to get the `[TableMetadataManagerRef]` from `[CatalogManager]` weak reference.
@@ -69,17 +54,3 @@ pub fn table_meta_manager(
.downcast_ref::<KvBackendCatalogManager>()
.map(|manager| manager.table_metadata_manager_ref().clone()))
}
/// Try to get the `[ProcedureManagerRef]` from `[CatalogManager]` weak reference.
pub fn procedure_manager(
catalog_manager: &Weak<dyn CatalogManager>,
) -> Result<Option<ProcedureManagerRef>> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
Ok(catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.and_then(|manager| manager.procedure_manager()))
}


@@ -259,7 +259,6 @@ mod tests {
use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use common_config::Mode;
use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
use common_meta::key::TableMetadataManager;
use common_meta::kv_backend::memory::MemoryKvBackend;
@@ -269,6 +268,8 @@ mod tests {
use datafusion::logical_expr::builder::LogicalTableSource;
use datafusion::logical_expr::{col, lit, LogicalPlan, LogicalPlanBuilder};
use crate::information_schema::NoopInformationExtension;
struct MockDecoder;
impl MockDecoder {
pub fn arc() -> Arc<Self> {
@@ -323,8 +324,7 @@ mod tests {
);
let catalog_manager = KvBackendCatalogManager::new(
Mode::Standalone,
None,
Arc::new(NoopInformationExtension),
backend.clone(),
layered_cache_registry,
None,


@@ -158,7 +158,7 @@ fn create_region_routes(regions: Vec<RegionNumber>) -> Vec<RegionRoute> {
addr: String::new(),
}),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
});
}


@@ -46,12 +46,12 @@ use substrait::{DFLogicalSubstraitConvertor, SubstraitPlan};
use crate::cli::cmd::ReplCommand;
use crate::cli::helper::RustylineHelper;
use crate::cli::AttachCommand;
use crate::error;
use crate::error::{
CollectRecordBatchesSnafu, ParseSqlSnafu, PlanStatementSnafu, PrettyPrintRecordBatchesSnafu,
ReadlineSnafu, ReplCreationSnafu, RequestDatabaseSnafu, Result, StartMetaClientSnafu,
SubstraitEncodeLogicalPlanSnafu,
};
use crate::{error, DistributedInformationExtension};
/// Captures the state of the repl, gathers commands and executes them one by one
pub struct Repl {
@@ -275,9 +275,9 @@ async fn create_query_engine(meta_addr: &str) -> Result<DatafusionQueryEngine> {
.build(),
);
let information_extension = Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
Mode::Distributed,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry,
None,


@@ -41,7 +41,7 @@ use crate::error::{
MissingConfigSnafu, Result, ShutdownFlownodeSnafu, StartFlownodeSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
use crate::{log_versions, App, DistributedInformationExtension};
pub const APP_NAME: &str = "greptime-flownode";
@@ -269,9 +269,10 @@ impl StartCommand {
.build(),
);
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
opts.mode,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,


@@ -36,8 +36,8 @@ use frontend::instance::builder::FrontendBuilder;
use frontend::instance::{FrontendInstance, Instance as FeInstance};
use frontend::server::Services;
use meta_client::{MetaClientOptions, MetaClientType};
use query::stats::StatementStatistics;
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard;
@@ -46,7 +46,7 @@ use crate::error::{
Result, StartFrontendSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
use crate::{log_versions, App, DistributedInformationExtension};
type FrontendOptions = GreptimeOptions<frontend::frontend::FrontendOptions>;
@@ -315,9 +315,10 @@ impl StartCommand {
.build(),
);
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
Mode::Distributed,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,
@@ -352,6 +353,7 @@ impl StartCommand {
catalog_manager,
Arc::new(client),
meta_client,
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.with_local_cache_invalidator(layered_cache_registry)


@@ -15,7 +15,17 @@
#![feature(assert_matches, let_chains)]
use async_trait::async_trait;
use catalog::information_schema::InformationExtension;
use client::api::v1::meta::ProcedureStatus;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo};
use common_meta::datanode::RegionStat;
use common_meta::ddl::{ExecutorContext, ProcedureExecutor};
use common_meta::rpc::procedure;
use common_procedure::{ProcedureInfo, ProcedureState};
use common_telemetry::{error, info};
use meta_client::MetaClientRef;
use snafu::ResultExt;
use crate::error::Result;
@@ -94,3 +104,69 @@ fn log_env_flags() {
info!("argument: {}", argument);
}
}
pub struct DistributedInformationExtension {
meta_client: MetaClientRef,
}
impl DistributedInformationExtension {
pub fn new(meta_client: MetaClientRef) -> Self {
Self { meta_client }
}
}
#[async_trait::async_trait]
impl InformationExtension for DistributedInformationExtension {
type Error = catalog::error::Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
self.meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(catalog::error::ListNodesSnafu)
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
let procedures = self
.meta_client
.list_procedures(&ExecutorContext::default())
.await
.map_err(BoxedError::new)
.context(catalog::error::ListProceduresSnafu)?
.procedures;
let mut result = Vec::with_capacity(procedures.len());
for procedure in procedures {
let pid = match procedure.id {
Some(pid) => pid,
None => return catalog::error::ProcedureIdNotFoundSnafu {}.fail(),
};
let pid = procedure::pb_pid_to_pid(&pid)
.map_err(BoxedError::new)
.context(catalog::error::ConvertProtoDataSnafu)?;
let status = ProcedureStatus::try_from(procedure.status)
.map(|v| v.as_str_name())
.unwrap_or("Unknown")
.to_string();
let procedure_info = ProcedureInfo {
id: pid,
type_name: procedure.type_name,
start_time_ms: procedure.start_time_ms,
end_time_ms: procedure.end_time_ms,
state: ProcedureState::Running,
lock_keys: procedure.lock_keys,
};
result.push((status, procedure_info));
}
Ok(result)
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
self.meta_client
.list_region_stats()
.await
.map_err(BoxedError::new)
.context(catalog::error::ListRegionStatsSnafu)
}
}


@@ -17,14 +17,18 @@ use std::{fs, path};
use async_trait::async_trait;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_schema::InformationExtension;
use catalog::kvbackend::KvBackendCatalogManager;
use clap::Parser;
use client::api::v1::meta::RegionRole;
use common_base::Plugins;
use common_catalog::consts::{MIN_USER_FLOW_ID, MIN_USER_TABLE_ID};
use common_config::{metadata_store_dir, Configurable, KvBackendConfig};
use common_error::ext::BoxedError;
use common_meta::cache::LayeredCacheRegistryBuilder;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::cluster::{NodeInfo, NodeStatus};
use common_meta::datanode::RegionStat;
use common_meta::ddl::flow_meta::{FlowMetadataAllocator, FlowMetadataAllocatorRef};
use common_meta::ddl::table_meta::{TableMetadataAllocator, TableMetadataAllocatorRef};
use common_meta::ddl::{DdlContext, NoopRegionFailureDetectorControl, ProcedureExecutorRef};
@@ -33,10 +37,11 @@ use common_meta::key::flow::{FlowMetadataManager, FlowMetadataManagerRef};
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
use common_meta::kv_backend::KvBackendRef;
use common_meta::node_manager::NodeManagerRef;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeper;
use common_meta::sequence::SequenceBuilder;
use common_meta::wal_options_allocator::{WalOptionsAllocator, WalOptionsAllocatorRef};
use common_procedure::ProcedureManagerRef;
use common_procedure::{ProcedureInfo, ProcedureManagerRef};
use common_telemetry::info;
use common_telemetry::logging::{LoggingOptions, TracingOptions};
use common_time::timezone::set_default_timezone;
@@ -44,6 +49,7 @@ use common_version::{short_version, version};
use common_wal::config::DatanodeWalConfig;
use datanode::config::{DatanodeOptions, ProcedureConfig, RegionEngineConfig, StorageConfig};
use datanode::datanode::{Datanode, DatanodeBuilder};
use datanode::region_server::RegionServer;
use file_engine::config::EngineConfig as FileEngineConfig;
use flow::{FlowWorkerManager, FlownodeBuilder, FrontendInvoker};
use frontend::frontend::FrontendOptions;
@@ -55,6 +61,7 @@ use frontend::service_config::{
};
use meta_srv::metasrv::{FLOW_ID_SEQ, TABLE_ID_SEQ};
use mito2::config::MitoConfig;
use query::stats::StatementStatistics;
use serde::{Deserialize, Serialize};
use servers::export_metrics::ExportMetricsOption;
use servers::grpc::GrpcOptions;
@@ -477,9 +484,18 @@ impl StartCommand {
.build(),
);
let datanode = DatanodeBuilder::new(dn_opts, plugins.clone())
.with_kv_backend(kv_backend.clone())
.build()
.await
.context(StartDatanodeSnafu)?;
let information_extension = Arc::new(StandaloneInformationExtension::new(
datanode.region_server(),
procedure_manager.clone(),
));
let catalog_manager = KvBackendCatalogManager::new(
dn_opts.mode,
None,
information_extension,
kv_backend.clone(),
layered_cache_registry.clone(),
Some(procedure_manager.clone()),
@@ -488,12 +504,6 @@ impl StartCommand {
let table_metadata_manager =
Self::create_table_metadata_manager(kv_backend.clone()).await?;
let datanode = DatanodeBuilder::new(dn_opts, plugins.clone())
.with_kv_backend(kv_backend.clone())
.build()
.await
.context(StartDatanodeSnafu)?;
let flow_metadata_manager = Arc::new(FlowMetadataManager::new(kv_backend.clone()));
let flow_builder = FlownodeBuilder::new(
Default::default(),
@@ -557,6 +567,7 @@ impl StartCommand {
catalog_manager.clone(),
node_manager.clone(),
ddl_task_executor.clone(),
StatementStatistics::new(opts.logging.slow_query.clone()),
)
.with_plugin(plugins.clone())
.try_build()
@@ -642,6 +653,91 @@ impl StartCommand {
}
}
struct StandaloneInformationExtension {
region_server: RegionServer,
procedure_manager: ProcedureManagerRef,
start_time_ms: u64,
}
impl StandaloneInformationExtension {
pub fn new(region_server: RegionServer, procedure_manager: ProcedureManagerRef) -> Self {
Self {
region_server,
procedure_manager,
start_time_ms: common_time::util::current_time_millis() as u64,
}
}
}
#[async_trait::async_trait]
impl InformationExtension for StandaloneInformationExtension {
type Error = catalog::error::Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
let build_info = common_version::build_info();
let node_info = NodeInfo {
// For the standalone:
// - id always 0
// - empty string for peer_addr
peer: Peer {
id: 0,
addr: "".to_string(),
},
last_activity_ts: -1,
status: NodeStatus::Standalone,
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// Use `self.start_time_ms` instead.
// It's not precise but enough.
start_time_ms: self.start_time_ms,
};
Ok(vec![node_info])
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
self.procedure_manager
.list_procedures()
.await
.map_err(BoxedError::new)
.map(|procedures| {
procedures
.into_iter()
.map(|procedure| {
let status = procedure.state.as_str_name().to_string();
(status, procedure)
})
.collect::<Vec<_>>()
})
.context(catalog::error::ListProceduresSnafu)
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
let stats = self
.region_server
.reportable_regions()
.into_iter()
.map(|stat| {
let region_stat = self
.region_server
.region_statistic(stat.region_id)
.unwrap_or_default();
RegionStat {
id: stat.region_id,
rcus: 0,
wcus: 0,
approximate_bytes: region_stat.estimated_disk_size() as i64,
engine: stat.engine,
role: RegionRole::from(stat.role).into(),
memtable_size: region_stat.memtable_size,
manifest_size: region_stat.manifest_size,
sst_size: region_stat.sst_size,
}
})
.collect::<Vec<_>>();
Ok(stats)
}
}
#[cfg(test)]
mod tests {
use std::default::Default;


@@ -31,6 +31,7 @@ pub use polyval::PolyvalAccumulatorCreator;
pub use scipy_stats_norm_cdf::ScipyStatsNormCdfAccumulatorCreator;
pub use scipy_stats_norm_pdf::ScipyStatsNormPdfAccumulatorCreator;
use super::geo::encoding::JsonPathEncodeFunctionCreator;
use crate::function_registry::FunctionRegistry;
/// A function creates `AggregateFunctionCreator`.
@@ -91,5 +92,7 @@ impl AggregateFunctions {
register_aggr_func!("argmin", 1, ArgminAccumulatorCreator);
register_aggr_func!("scipystatsnormcdf", 2, ScipyStatsNormCdfAccumulatorCreator);
register_aggr_func!("scipystatsnormpdf", 2, ScipyStatsNormPdfAccumulatorCreator);
register_aggr_func!("json_encode_path", 3, JsonPathEncodeFunctionCreator);
}
}


@@ -16,7 +16,10 @@ use std::cmp::Ordering;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Result};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;


@@ -16,7 +16,10 @@ use std::cmp::Ordering;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Result};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
CreateAccumulatorSnafu, DowncastVectorSnafu, FromScalarValueSnafu, Result,
CreateAccumulatorSnafu, DowncastVectorSnafu, FromScalarValueSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu, Result,
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -18,8 +18,9 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, InvalidInputColSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -13,8 +13,10 @@
// limitations under the License.
use std::sync::Arc;
pub(crate) mod encoding;
mod geohash;
mod h3;
mod helpers;
use geohash::{GeohashFunction, GeohashNeighboursFunction};
@@ -27,18 +29,31 @@ impl GeoFunctions {
// geohash
registry.register(Arc::new(GeohashFunction));
registry.register(Arc::new(GeohashNeighboursFunction));
// h3 family
// h3 index
registry.register(Arc::new(h3::H3LatLngToCell));
registry.register(Arc::new(h3::H3LatLngToCellString));
// h3 index inspection
registry.register(Arc::new(h3::H3CellBase));
registry.register(Arc::new(h3::H3CellCenterChild));
registry.register(Arc::new(h3::H3CellCenterLat));
registry.register(Arc::new(h3::H3CellCenterLng));
registry.register(Arc::new(h3::H3CellIsPentagon));
registry.register(Arc::new(h3::H3CellParent));
registry.register(Arc::new(h3::H3CellResolution));
registry.register(Arc::new(h3::H3CellToString));
registry.register(Arc::new(h3::H3IsNeighbour));
registry.register(Arc::new(h3::H3StringToCell));
registry.register(Arc::new(h3::H3CellToString));
registry.register(Arc::new(h3::H3CellCenterLatLng));
registry.register(Arc::new(h3::H3CellResolution));
// h3 hierarchical grid
registry.register(Arc::new(h3::H3CellCenterChild));
registry.register(Arc::new(h3::H3CellParent));
registry.register(Arc::new(h3::H3CellToChildren));
registry.register(Arc::new(h3::H3CellToChildrenSize));
registry.register(Arc::new(h3::H3CellToChildPos));
registry.register(Arc::new(h3::H3ChildPosToCell));
// h3 grid traversal
registry.register(Arc::new(h3::H3GridDisk));
registry.register(Arc::new(h3::H3GridDiskDistances));
registry.register(Arc::new(h3::H3GridDistance));
registry.register(Arc::new(h3::H3GridPathCells));
}
}
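For orientation, a hedged sketch of how the newly registered h3 families read at the SQL level; only function names visible in this diff are used, and the table and column names are hypothetical:

// -- Hierarchical grid: enumerate children at resolution 10.
// SELECT h3_cell_to_children(cell, 10) FROM cells;
// -- Grid traversal: neighbourhood of radius 2, and pairwise distance.
// SELECT h3_grid_disk(cell, 2) FROM cells;
// SELECT h3_grid_distance(cell_a, cell_b) FROM cell_pairs;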

View File

@@ -0,0 +1,223 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_error::ext::{BoxedError, PlainError};
use common_error::status_code::StatusCode;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{self, InvalidFuncArgsSnafu, InvalidInputStateSnafu, Result};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::AccumulatorCreatorFunction;
use common_time::Timestamp;
use datatypes::prelude::ConcreteDataType;
use datatypes::value::{ListValue, Value};
use datatypes::vectors::VectorRef;
use snafu::{ensure, ResultExt};
use super::helpers::{ensure_columns_len, ensure_columns_n};
/// Accumulator of lat, lng, timestamp tuples
#[derive(Debug)]
pub struct JsonPathAccumulator {
timestamp_type: ConcreteDataType,
lat: Vec<Option<f64>>,
lng: Vec<Option<f64>>,
timestamp: Vec<Option<Timestamp>>,
}
impl JsonPathAccumulator {
fn new(timestamp_type: ConcreteDataType) -> Self {
Self {
lat: Vec::default(),
lng: Vec::default(),
timestamp: Vec::default(),
timestamp_type,
}
}
}
impl Accumulator for JsonPathAccumulator {
fn state(&self) -> Result<Vec<Value>> {
Ok(vec![
Value::List(ListValue::new(
self.lat.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.lng.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.timestamp.iter().map(|i| Value::from(*i)).collect(),
self.timestamp_type.clone(),
)),
])
}
fn update_batch(&mut self, columns: &[VectorRef]) -> Result<()> {
// `update_batch`, as in datafusion, provides the accumulator with its
// original input.
//
// `columns` is a vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::Float64` or a similar type. Each item in
//   the vector is a row in the given dataset.
// - so on and so forth for `lng` and `timestamp`
ensure_columns_n!(columns, 3);
let lat = &columns[0];
let lng = &columns[1];
let ts = &columns[2];
let size = lat.len();
for idx in 0..size {
self.lat.push(lat.get(idx).as_f64_lossy());
self.lng.push(lng.get(idx).as_f64_lossy());
self.timestamp.push(ts.get(idx).as_timestamp());
}
Ok(())
}
fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()> {
// `merge_batch`, as in datafusion, receives the states accumulated from
// child accumulators' `state()` calls.
// In our particular implementation, the data structure is like:
//
// `states` is a vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::List`. Each item in the list is all
//   coordinates from a child accumulator.
// - so on and so forth for `lng` and `timestamp`
ensure_columns_n!(states, 3);
let lat_lists = &states[0];
let lng_lists = &states[1];
let ts_lists = &states[2];
let len = lat_lists.len();
for idx in 0..len {
if let Some(lat_list) = lat_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lat_list.items() {
self.lat.push(v.as_f64_lossy());
}
}
if let Some(lng_list) = lng_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lng_list.items() {
self.lng.push(v.as_f64_lossy());
}
}
if let Some(ts_list) = ts_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in ts_list.items() {
self.timestamp.push(v.as_timestamp());
}
}
}
Ok(())
}
fn evaluate(&self) -> Result<Value> {
let mut work_vec: Vec<(&Option<f64>, &Option<f64>, &Option<Timestamp>)> = self
.lat
.iter()
.zip(self.lng.iter())
.zip(self.timestamp.iter())
.map(|((a, b), c)| (a, b, c))
.collect();
// sort by timestamp; null timestamps are treated as 0
work_vec.sort_unstable_by_key(|tuple| tuple.2.unwrap_or_else(|| Timestamp::new_second(0)));
let result = serde_json::to_string(
&work_vec
.into_iter()
// note that we transform to lng,lat for geojson compatibility
.map(|(lat, lng, _)| vec![lng, lat])
.collect::<Vec<Vec<&Option<f64>>>>(),
)
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("Serialization failure: {}", e),
StatusCode::EngineExecuteQuery,
))
})
.context(error::ExecuteSnafu)?;
Ok(Value::String(result.into()))
}
}
/// This function accepts rows of lat, lng and timestamp, sorts them by
/// timestamp and encodes them into a geojson-like path.
///
/// Example:
///
/// ```sql
/// SELECT json_encode_path(lat, lon, timestamp) FROM table [group by ...];
/// ```
///
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct JsonPathEncodeFunctionCreator {}
impl AggregateFunctionCreator for JsonPathEncodeFunctionCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
let ts_type = types[2].clone();
Ok(Box::new(JsonPathAccumulator::new(ts_type)))
});
creator
}
fn output_type(&self) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::string_datatype())
}
fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
let input_types = self.input_types()?;
ensure!(input_types.len() == 3, InvalidInputStateSnafu);
let timestamp_type = input_types[2].clone();
Ok(vec![
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(timestamp_type),
])
}
}
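A minimal sketch of the accumulator end to end, assuming the `Float64Vector::from_vec` and `TimestampSecondVector::from_vec` constructors from this codebase's datatypes crate; points fed out of order come back sorted by timestamp as [lng, lat] pairs:

use std::sync::Arc;
use datatypes::vectors::{Float64Vector, TimestampSecondVector};

let mut acc = JsonPathAccumulator::new(ConcreteDataType::timestamp_second_datatype());
let lat = Arc::new(Float64Vector::from_vec(vec![10.0, 20.0])) as VectorRef;
let lng = Arc::new(Float64Vector::from_vec(vec![1.0, 2.0])) as VectorRef;
let ts = Arc::new(TimestampSecondVector::from_vec(vec![2, 1])) as VectorRef;
acc.update_batch(&[lat, lng, ts]).unwrap();
// The second point carries the earlier timestamp, so it is emitted first.
assert_eq!(
    acc.evaluate().unwrap(),
    Value::String("[[2.0,20.0],[1.0,10.0]]".into())
);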

View File

@@ -20,18 +20,71 @@ use common_query::error::{self, InvalidFuncArgsSnafu, Result};
use common_query::prelude::{Signature, TypeSignature};
use datafusion::logical_expr::Volatility;
use datatypes::prelude::ConcreteDataType;
use datatypes::scalars::ScalarVectorBuilder;
use datatypes::value::Value;
use datatypes::scalars::{Scalar, ScalarVectorBuilder};
use datatypes::value::{ListValue, Value};
use datatypes::vectors::{
BooleanVectorBuilder, Float64VectorBuilder, MutableVector, StringVectorBuilder,
UInt64VectorBuilder, UInt8VectorBuilder, VectorRef,
BooleanVectorBuilder, Int32VectorBuilder, ListVectorBuilder, MutableVector,
StringVectorBuilder, UInt64VectorBuilder, UInt8VectorBuilder, VectorRef,
};
use derive_more::Display;
use h3o::{CellIndex, LatLng, Resolution};
use once_cell::sync::Lazy;
use snafu::{ensure, ResultExt};
use super::helpers::{ensure_columns_len, ensure_columns_n};
use crate::function::{Function, FunctionContext};
static CELL_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static COORDINATE_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
]
});
static RESOLUTION_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static DISTANCE_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static POSITION_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
/// Function that returns the [h3] cell index for a given geospatial coordinate.
///
/// [h3]: https://h3geo.org/
@@ -50,20 +103,8 @@ impl Function for H3LatLngToCell {
fn signature(&self) -> Signature {
let mut signatures = Vec::new();
for coord_type in &[
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
for coord_type in COORDINATE_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
// latitude
coord_type.clone(),
@@ -78,15 +119,7 @@ impl Function for H3LatLngToCell {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 3,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 3, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 3);
let lat_vec = &columns[0];
let lon_vec = &columns[1];
@@ -142,20 +175,8 @@ impl Function for H3LatLngToCellString {
fn signature(&self) -> Signature {
let mut signatures = Vec::new();
for coord_type in &[
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
for coord_type in COORDINATE_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
// latitude
coord_type.clone(),
@@ -170,15 +191,7 @@ impl Function for H3LatLngToCellString {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 3,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 3, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 3);
let lat_vec = &columns[0];
let lon_vec = &columns[1];
@@ -234,15 +247,7 @@ impl Function for H3CellToString {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -280,15 +285,7 @@ impl Function for H3StringToCell {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let string_vec = &columns[0];
let size = string_vec.len();
@@ -319,18 +316,20 @@ impl Function for H3StringToCell {
}
}
/// Function that returns centroid latitude of given cell id
/// Function that returns centroid latitude and longitude of given cell id
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellCenterLat;
pub struct H3CellCenterLatLng;
impl Function for H3CellCenterLat {
impl Function for H3CellCenterLatLng {
fn name(&self) -> &str {
"h3_cell_center_lat"
"h3_cell_center_latlng"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::float64_datatype())
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::float64_datatype(),
))
}
fn signature(&self) -> Signature {
@@ -338,69 +337,26 @@ impl Function for H3CellCenterLat {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
let mut results = Float64VectorBuilder::with_capacity(size);
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::float64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let lat = cell.map(|cell| LatLng::from(cell).lat());
let latlng = cell.map(LatLng::from);
results.push(lat);
}
Ok(results.to_vector())
}
}
/// Function that returns centroid longitude of given cell id
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellCenterLng;
impl Function for H3CellCenterLng {
fn name(&self) -> &str {
"h3_cell_center_lng"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::float64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
if let Some(latlng) = latlng {
let result = ListValue::new(
vec![latlng.lat().into(), latlng.lng().into()],
ConcreteDataType::float64_datatype(),
);
results.push(Some(result.as_scalar_ref()));
} else {
results.push(None);
}
);
let cell_vec = &columns[0];
let size = cell_vec.len();
let mut results = Float64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let lat = cell.map(|cell| LatLng::from(cell).lng());
results.push(lat);
}
Ok(results.to_vector())
@@ -470,15 +426,7 @@ impl Function for H3CellBase {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -514,15 +462,7 @@ impl Function for H3CellIsPentagon {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -558,15 +498,7 @@ impl Function for H3CellCenterChild {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
@@ -606,15 +538,7 @@ impl Function for H3CellParent {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
@@ -633,48 +557,323 @@ impl Function for H3CellParent {
}
}
/// Function that checks if two cells are neighbour
/// Function that returns children cell list
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3IsNeighbour;
pub struct H3CellToChildren;
impl Function for H3IsNeighbour {
impl Function for H3CellToChildren {
fn name(&self) -> &str {
"h3_is_neighbour"
"h3_cell_to_children"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::boolean_datatype())
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_double_cell()
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let cell2_vec = &columns[1];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = BooleanVectorBuilder::with_capacity(size);
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.children(res)
.map(|child| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns children cell count
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellToChildrenSize;
impl Function for H3CellToChildrenSize {
fn name(&self) -> &str {
"h3_cell_to_children_size"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.map(|cell| cell.children_count(res));
results.push(result);
}
Ok(results.to_vector())
}
}
/// Function that returns the cell's position within its parent at the given resolution
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellToChildPos;
impl Function for H3CellToChildPos {
fn name(&self) -> &str {
"h3_cell_to_child_pos"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.and_then(|cell| cell.child_position(res));
results.push(result);
}
Ok(results.to_vector())
}
}
/// Function that returns the cell at the given position of the parent at the given resolution
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3ChildPosToCell;
impl Function for H3ChildPosToCell {
fn name(&self) -> &str {
"h3_child_pos_to_cell"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
let mut signatures =
Vec::with_capacity(POSITION_TYPES.len() * CELL_TYPES.len() * RESOLUTION_TYPES.len());
for position_type in POSITION_TYPES.as_slice() {
for cell_type in CELL_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
position_type.clone(),
cell_type.clone(),
resolution_type.clone(),
]));
}
}
}
Signature::one_of(signatures, Volatility::Stable)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 3);
let pos_vec = &columns[0];
let cell_vec = &columns[1];
let res_vec = &columns[2];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let pos = value_to_position(pos_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.and_then(|cell| cell.child_at(pos, res).map(u64::from));
results.push(result);
}
Ok(results.to_vector())
}
}
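`h3_cell_to_child_pos` and `h3_child_pos_to_cell` are inverses once the resolutions line up; a hedged SQL-level sketch, assuming the `h3_cell_parent` and `h3_cell_resolution` names from the registrations above:

// -- Recover a cell from its position under its resolution-5 ancestor:
// SELECT h3_child_pos_to_cell(
//     h3_cell_to_child_pos(cell, 5),
//     h3_cell_parent(cell, 5),
//     h3_cell_resolution(cell)
// ) = cell FROM cells;  -- expected: true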
/// Function that returns all cells within k distance of the given cell
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDisk;
impl Function for H3GridDisk {
fn name(&self) -> &str {
"h3_grid_disk"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_cell_and_distance()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let k_vec = &columns[1];
let size = cell_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let k = value_to_distance(k_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.grid_disk::<Vec<_>>(k)
.into_iter()
.map(|child| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns all cells within k distance of the given cell, traversed ring by ring
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDiskDistances;
impl Function for H3GridDiskDistances {
fn name(&self) -> &str {
"h3_grid_disk_distances"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_cell_and_distance()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let k_vec = &columns[1];
let size = cell_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let k = value_to_distance(k_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.grid_disk_distances::<Vec<_>>(k)
.into_iter()
.map(|(child, _distance)| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns the grid distance between two cells
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDistance;
impl Function for H3GridDistance {
fn name(&self) -> &str {
"h3_grid_distance"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::int32_datatype())
}
fn signature(&self) -> Signature {
signature_of_double_cells()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_this_vec = &columns[0];
let cell_that_vec = &columns[1];
let size = cell_this_vec.len();
let mut results = Int32VectorBuilder::with_capacity(size);
for i in 0..size {
let result = match (
cell_from_value(cell_vec.get(i))?,
cell_from_value(cell2_vec.get(i))?,
cell_from_value(cell_this_vec.get(i))?,
cell_from_value(cell_that_vec.get(i))?,
) {
(Some(cell_this), Some(cell_that)) => {
let is_neighbour = cell_this
.is_neighbor_with(cell_that)
let dist = cell_this
.grid_distance(cell_that)
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("H3 error: {}", e),
@@ -682,7 +881,7 @@ impl Function for H3IsNeighbour {
))
})
.context(error::ExecuteSnafu)?;
Some(is_neighbour)
Some(dist)
}
_ => None,
};
@@ -694,6 +893,73 @@ impl Function for H3IsNeighbour {
}
}
/// Function that returns path cells between two cells
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridPathCells;
impl Function for H3GridPathCells {
fn name(&self) -> &str {
"h3_grid_path_cells"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_double_cells()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_this_vec = &columns[0];
let cell_that_vec = &columns[1];
let size = cell_this_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let result = match (
cell_from_value(cell_this_vec.get(i))?,
cell_from_value(cell_that_vec.get(i))?,
) {
(Some(cell_this), Some(cell_that)) => {
let cells = cell_this
.grid_path_cells(cell_that)
.and_then(|x| x.collect::<std::result::Result<Vec<CellIndex>, _>>())
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("H3 error: {}", e),
StatusCode::EngineExecuteQuery,
))
})
.context(error::ExecuteSnafu)?;
Some(ListValue::new(
cells
.into_iter()
.map(|c| Value::from(u64::from(c)))
.collect(),
ConcreteDataType::uint64_datatype(),
))
}
_ => None,
};
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
fn value_to_resolution(v: Value) -> Result<Resolution> {
let r = match v {
Value::Int8(v) => v as u8,
@@ -716,26 +982,59 @@ fn value_to_resolution(v: Value) -> Result<Resolution> {
.context(error::ExecuteSnafu)
}
macro_rules! ensure_and_coerce {
($compare:expr, $coerce:expr) => {{
ensure!(
$compare,
InvalidFuncArgsSnafu {
err_msg: "Argument was outside of the acceptable range"
}
);
Ok($coerce)
}};
}
fn value_to_position(v: Value) -> Result<u64> {
match v {
Value::Int8(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int16(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int32(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int64(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::UInt8(v) => Ok(v as u64),
Value::UInt16(v) => Ok(v as u64),
Value::UInt32(v) => Ok(v as u64),
Value::UInt64(v) => Ok(v),
_ => unreachable!(),
}
}
fn value_to_distance(v: Value) -> Result<u32> {
match v {
Value::Int8(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int16(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int32(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int64(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::UInt8(v) => Ok(v as u32),
Value::UInt16(v) => Ok(v as u32),
Value::UInt32(v) => Ok(v),
Value::UInt64(v) => Ok(v as u32),
_ => unreachable!(),
}
}
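The two coercion helpers lean on `ensure_and_coerce!` to reject negative signed inputs before casting to the unsigned widths h3o expects; a brief sketch of the behavior:

// Negative distances are refused rather than wrapped:
assert!(value_to_distance(Value::Int32(-1)).is_err());
// Non-negative values coerce cleanly:
assert_eq!(value_to_distance(Value::Int32(3)).unwrap(), 3u32);
assert_eq!(value_to_position(Value::Int64(7)).unwrap(), 7u64);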
fn signature_of_cell() -> Signature {
let mut signatures = Vec::new();
for cell_type in &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
] {
let mut signatures = Vec::with_capacity(CELL_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![cell_type.clone()]));
}
Signature::one_of(signatures, Volatility::Stable)
}
fn signature_of_double_cell() -> Signature {
let mut signatures = Vec::new();
let cell_types = &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
];
for cell_type in cell_types {
for cell_type2 in cell_types {
fn signature_of_double_cells() -> Signature {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * CELL_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for cell_type2 in CELL_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
cell_type2.clone(),
@@ -747,21 +1046,9 @@ fn signature_of_double_cell() -> Signature {
}
fn signature_of_cell_and_resolution() -> Signature {
let mut signatures = Vec::new();
for cell_type in &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * RESOLUTION_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
resolution_type.clone(),
@@ -771,6 +1058,19 @@ fn signature_of_cell_and_resolution() -> Signature {
Signature::one_of(signatures, Volatility::Stable)
}
fn signature_of_cell_and_distance() -> Signature {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * DISTANCE_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for distance_type in DISTANCE_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
distance_type.clone(),
]));
}
}
Signature::one_of(signatures, Volatility::Stable)
}
fn cell_from_value(v: Value) -> Result<Option<CellIndex>> {
let cell = match v {
Value::Int64(v) => Some(

View File

@@ -0,0 +1,61 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
macro_rules! ensure_columns_len {
($columns:ident) => {
ensure!(
$columns.windows(2).all(|c| c[0].len() == c[1].len()),
InvalidFuncArgsSnafu {
err_msg: "The lengths of the input columns are different"
}
)
};
($column_a:ident, $column_b:ident, $($column_n:ident),*) => {
ensure!(
{
let mut result = $column_a.len() == $column_b.len();
$(
result = result && ($column_a.len() == $column_n.len());
)*
result
},
InvalidFuncArgsSnafu {
err_msg: "The lengths of the input columns are different"
}
)
};
}
pub(super) use ensure_columns_len;
macro_rules! ensure_columns_n {
($columns:ident, $n:literal) => {
ensure!(
$columns.len() == $n,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of arguments is not correct, expect {}, provided : {}",
stringify!($n),
$columns.len()
),
}
);
if $n > 1 {
ensure_columns_len!($columns);
}
};
}
pub(super) use ensure_columns_n;
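A minimal sketch of how these macros replace the hand-rolled checks deleted from h3.rs above; the function body is hypothetical:

use common_query::error::{InvalidFuncArgsSnafu, Result};
use datatypes::vectors::VectorRef;
use snafu::ensure;

fn check_args(columns: &[VectorRef]) -> Result<()> {
    // Expands to an arity check against 2, then a pairwise
    // equal-length check across the input columns.
    ensure_columns_n!(columns, 2);
    Ok(())
}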

View File

@@ -15,6 +15,7 @@
use std::sync::Arc;
mod json_get;
mod json_is;
mod json_path_exists;
mod json_to_string;
mod parse_json;
@@ -46,5 +47,7 @@ impl JsonFunction {
registry.register(Arc::new(JsonIsBool));
registry.register(Arc::new(JsonIsArray));
registry.register(Arc::new(JsonIsObject));
registry.register(Arc::new(json_path_exists::JsonPathExistsFunction));
}
}

View File

@@ -0,0 +1,172 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::{self, Display};
use common_query::error::{InvalidFuncArgsSnafu, Result, UnsupportedInputDataTypeSnafu};
use common_query::prelude::Signature;
use datafusion::logical_expr::Volatility;
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::VectorRef;
use datatypes::scalars::ScalarVectorBuilder;
use datatypes::vectors::{BooleanVectorBuilder, MutableVector};
use snafu::ensure;
use crate::function::{Function, FunctionContext};
/// Check if the given JSON data contains the given JSON path.
#[derive(Clone, Debug, Default)]
pub struct JsonPathExistsFunction;
const NAME: &str = "json_path_exists";
impl Function for JsonPathExistsFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::boolean_datatype())
}
fn signature(&self) -> Signature {
Signature::exact(
vec![
ConcreteDataType::json_datatype(),
ConcreteDataType::string_datatype(),
],
Volatility::Immutable,
)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly two, have: {}",
columns.len()
),
}
);
let jsons = &columns[0];
let paths = &columns[1];
let size = jsons.len();
let datatype = jsons.data_type();
let mut results = BooleanVectorBuilder::with_capacity(size);
match datatype {
// JSON data type uses binary vector
ConcreteDataType::Binary(_) => {
for i in 0..size {
let json = jsons.get_ref(i);
let path = paths.get_ref(i);
let json = json.as_binary();
let path = path.as_string();
let result = match (json, path) {
(Ok(Some(json)), Ok(Some(path))) => {
let json_path = jsonb::jsonpath::parse_json_path(path.as_bytes());
match json_path {
Ok(json_path) => jsonb::path_exists(json, json_path).ok(),
Err(_) => None,
}
}
_ => None,
};
results.push(result);
}
}
_ => {
return UnsupportedInputDataTypeSnafu {
function: NAME,
datatypes: columns.iter().map(|c| c.data_type()).collect::<Vec<_>>(),
}
.fail();
}
}
Ok(results.to_vector())
}
}
impl Display for JsonPathExistsFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "JSON_PATH_EXISTS")
}
}
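At the SQL level the UDF reads as below; a hedged example with hypothetical table and column names:

// SELECT json_path_exists(payload, '$.a.b') FROM logs;
// -- true when the document has a value at the path; NULL inputs and
// -- unparsable paths yield NULL, per `eval` above.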
#[cfg(test)]
mod tests {
use std::sync::Arc;
use common_query::prelude::TypeSignature;
use datatypes::scalars::ScalarVector;
use datatypes::vectors::{BinaryVector, StringVector};
use super::*;
#[test]
fn test_json_path_exists_function() {
let json_path_exists = JsonPathExistsFunction;
assert_eq!("json_path_exists", json_path_exists.name());
assert_eq!(
ConcreteDataType::boolean_datatype(),
json_path_exists
.return_type(&[ConcreteDataType::json_datatype()])
.unwrap()
);
assert!(matches!(json_path_exists.signature(),
Signature {
type_signature: TypeSignature::Exact(valid_types),
volatility: Volatility::Immutable
} if valid_types == vec![ConcreteDataType::json_datatype(), ConcreteDataType::string_datatype()]
));
let json_strings = [
r#"{"a": {"b": 2}, "b": 2, "c": 3}"#,
r#"{"a": 4, "b": {"c": 6}, "c": 6}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
];
let paths = vec!["$.a.b.c", "$.b", "$.c.a", ".d"];
let results = [false, true, true, false];
let jsonbs = json_strings
.iter()
.map(|s| {
let value = jsonb::parse_value(s.as_bytes()).unwrap();
value.to_vec()
})
.collect::<Vec<_>>();
let json_vector = BinaryVector::from_vec(jsonbs);
let path_vector = StringVector::from_vec(paths);
let args: Vec<VectorRef> = vec![Arc::new(json_vector), Arc::new(path_vector)];
let vector = json_path_exists
.eval(FunctionContext::default(), &args)
.unwrap();
assert_eq!(4, vector.len());
for (i, gt) in results.iter().enumerate() {
let result = vector.get_ref(i);
let result = result.as_boolean().unwrap().unwrap();
assert_eq!(*gt, result);
}
}
}

View File

@@ -21,23 +21,19 @@ use syn::{parse_macro_input, DeriveInput, ItemStruct};
pub(crate) fn impl_aggr_func_type_store(ast: &DeriveInput) -> TokenStream {
let name = &ast.ident;
let gen = quote! {
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::error::{InvalidInputStateSnafu, Error as QueryError};
use datatypes::prelude::ConcreteDataType;
impl AggrFuncTypeStore for #name {
fn input_types(&self) -> std::result::Result<Vec<ConcreteDataType>, QueryError> {
impl common_query::logical_plan::accumulator::AggrFuncTypeStore for #name {
fn input_types(&self) -> std::result::Result<Vec<datatypes::prelude::ConcreteDataType>, common_query::error::Error> {
let input_types = self.input_types.load();
snafu::ensure!(input_types.is_some(), InvalidInputStateSnafu);
snafu::ensure!(input_types.is_some(), common_query::error::InvalidInputStateSnafu);
Ok(input_types.as_ref().unwrap().as_ref().clone())
}
fn set_input_types(&self, input_types: Vec<ConcreteDataType>) -> std::result::Result<(), QueryError> {
fn set_input_types(&self, input_types: Vec<datatypes::prelude::ConcreteDataType>) -> std::result::Result<(), common_query::error::Error> {
let old = self.input_types.swap(Some(std::sync::Arc::new(input_types.clone())));
if let Some(old) = old {
snafu::ensure!(old.len() == input_types.len(), InvalidInputStateSnafu);
snafu::ensure!(old.len() == input_types.len(), common_query::error::InvalidInputStateSnafu);
for (x, y) in old.iter().zip(input_types.iter()) {
snafu::ensure!(x == y, InvalidInputStateSnafu);
snafu::ensure!(x == y, common_query::error::InvalidInputStateSnafu);
}
}
Ok(())
@@ -51,7 +47,7 @@ pub(crate) fn impl_as_aggr_func_creator(_args: TokenStream, input: TokenStream)
let mut item_struct = parse_macro_input!(input as ItemStruct);
if let syn::Fields::Named(ref mut fields) = item_struct.fields {
let result = syn::Field::parse_named.parse2(quote! {
input_types: arc_swap::ArcSwapOption<Vec<ConcreteDataType>>
input_types: arc_swap::ArcSwapOption<Vec<datatypes::prelude::ConcreteDataType>>
});
match result {
Ok(field) => fields.named.push(field),

View File

@@ -24,5 +24,5 @@ struct Foo {}
fn test_derive() {
let _ = Foo::default();
assert_fields!(Foo: input_types);
assert_impl_all!(Foo: std::fmt::Debug, Default, AggrFuncTypeStore);
assert_impl_all!(Foo: std::fmt::Debug, Default, common_query::logical_plan::accumulator::AggrFuncTypeStore);
}

View File

@@ -187,7 +187,7 @@ mod tests {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
}]),
HashMap::new(),

View File

@@ -107,21 +107,21 @@ async fn test_on_submit_alter_request() {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(2)),
follower_peers: vec![Peer::empty(4)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 3)),
leader_peer: Some(Peer::empty(3)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
]),
@@ -193,21 +193,21 @@ async fn test_on_submit_alter_request_with_outdated_request() {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(2)),
follower_peers: vec![Peer::empty(4)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 3)),
leader_peer: Some(Peer::empty(3)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
]),

View File

@@ -119,21 +119,21 @@ async fn test_on_datanode_drop_regions() {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(2)),
follower_peers: vec![Peer::empty(4)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 3)),
leader_peer: Some(Peer::empty(3)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
]),

View File

@@ -18,6 +18,7 @@ use common_procedure::error::Error as ProcedureError;
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metric_engine_consts::LOGICAL_TABLE_METADATA_KEY;
use table::metadata::TableId;
use table::table_reference::TableReference;
use crate::ddl::DetectingRegion;
use crate::error::{Error, OperateDatanodeSnafu, Result, TableNotFoundSnafu, UnsupportedSnafu};
@@ -109,8 +110,8 @@ pub async fn check_and_get_physical_table_id(
.table_name_manager()
.get(physical_table_name)
.await?
.context(TableNotFoundSnafu {
table_name: physical_table_name.to_string(),
.with_context(|| TableNotFoundSnafu {
table_name: TableReference::from(physical_table_name).to_string(),
})
.map(|table| table.table_id())
}
@@ -123,8 +124,8 @@ pub async fn get_physical_table_id(
.table_name_manager()
.get(logical_table_name)
.await?
.context(TableNotFoundSnafu {
table_name: logical_table_name.to_string(),
.with_context(|| TableNotFoundSnafu {
table_name: TableReference::from(logical_table_name).to_string(),
})
.map(|table| table.table_id())?;

View File

@@ -147,6 +147,20 @@ pub enum Error {
source: common_procedure::Error,
},
#[snafu(display("Failed to start procedure manager"))]
StartProcedureManager {
#[snafu(implicit)]
location: Location,
source: common_procedure::Error,
},
#[snafu(display("Failed to stop procedure manager"))]
StopProcedureManager {
#[snafu(implicit)]
location: Location,
source: common_procedure::Error,
},
#[snafu(display(
"Failed to get procedure output, procedure id: {procedure_id}, error: {err_msg}"
))]
@@ -715,7 +729,9 @@ impl ErrorExt for Error {
SubmitProcedure { source, .. }
| QueryProcedure { source, .. }
| WaitProcedure { source, .. } => source.status_code(),
| WaitProcedure { source, .. }
| StartProcedureManager { source, .. }
| StopProcedureManager { source, .. } => source.status_code(),
RegisterProcedureLoader { source, .. } => source.status_code(),
External { source, .. } => source.status_code(),
OperateDatanode { source, .. } => source.status_code(),

View File

@@ -137,14 +137,16 @@ pub struct DowngradeRegion {
/// `None` means no flush is performed before downgrading the region.
#[serde(default)]
pub flush_timeout: Option<Duration>,
/// Rejects all write requests after flushing.
pub reject_write: bool,
}
impl Display for DowngradeRegion {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"DowngradeRegion(region_id={}, flush_timeout={:?})",
self.region_id, self.flush_timeout,
"DowngradeRegion(region_id={}, flush_timeout={:?}, rejct_write={})",
self.region_id, self.flush_timeout, self.reject_write
)
}
}
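A quick sketch of the new field in action, assuming `region_id`, `flush_timeout` and `reject_write` are the struct's only fields and that `RegionId` lives at its usual path:

use store_api::storage::RegionId;

let downgrade = DowngradeRegion {
    region_id: RegionId::new(1024, 1),
    flush_timeout: None,
    reject_write: true,
};
// Prints: DowngradeRegion(region_id=..., flush_timeout=None, reject_write=true)
println!("{downgrade}");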

View File

@@ -140,7 +140,7 @@ use crate::key::table_route::TableRouteKey;
use crate::key::txn_helper::TxnOpGetResponseSet;
use crate::kv_backend::txn::{Txn, TxnOp};
use crate::kv_backend::KvBackendRef;
use crate::rpc::router::{region_distribution, RegionRoute, RegionStatus};
use crate::rpc::router::{region_distribution, LeaderState, RegionRoute};
use crate::rpc::store::BatchDeleteRequest;
use crate::DatanodeId;
@@ -1126,14 +1126,14 @@ impl TableMetadataManager {
next_region_route_status: F,
) -> Result<()>
where
F: Fn(&RegionRoute) -> Option<Option<RegionStatus>>,
F: Fn(&RegionRoute) -> Option<Option<LeaderState>>,
{
let mut new_region_routes = current_table_route_value.region_routes()?.clone();
let mut updated = 0;
for route in &mut new_region_routes {
if let Some(status) = next_region_route_status(route) {
if route.set_leader_status(status) {
if let Some(state) = next_region_route_status(route) {
if route.set_leader_state(state) {
updated += 1;
}
}
@@ -1280,7 +1280,7 @@ mod tests {
use crate::key::{DeserializedValueWithBytes, TableMetadataManager, ViewInfoValue};
use crate::kv_backend::memory::MemoryKvBackend;
use crate::peer::Peer;
use crate::rpc::router::{region_distribution, Region, RegionRoute, RegionStatus};
use crate::rpc::router::{region_distribution, LeaderState, Region, RegionRoute};
#[test]
fn test_deserialized_value_with_bytes() {
@@ -1324,7 +1324,7 @@ mod tests {
},
leader_peer: Some(Peer::new(datanode, "a2")),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
}
}
@@ -1715,7 +1715,7 @@ mod tests {
attrs: BTreeMap::new(),
},
leader_peer: Some(Peer::new(datanode, "a2")),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
follower_peers: vec![],
leader_down_since: Some(current_time_millis()),
},
@@ -1727,7 +1727,7 @@ mod tests {
attrs: BTreeMap::new(),
},
leader_peer: Some(Peer::new(datanode, "a1")),
leader_status: None,
leader_state: None,
follower_peers: vec![],
leader_down_since: None,
},
@@ -1750,10 +1750,10 @@ mod tests {
table_metadata_manager
.update_leader_region_status(table_id, &current_table_route_value, |region_route| {
if region_route.leader_status.is_some() {
if region_route.leader_state.is_some() {
None
} else {
Some(Some(RegionStatus::Downgraded))
Some(Some(LeaderState::Downgrading))
}
})
.await
@@ -1768,8 +1768,8 @@ mod tests {
.unwrap();
assert_eq!(
updated_route_value.region_routes().unwrap()[0].leader_status,
Some(RegionStatus::Downgraded)
updated_route_value.region_routes().unwrap()[0].leader_state,
Some(LeaderState::Downgrading)
);
assert!(updated_route_value.region_routes().unwrap()[0]
@@ -1777,8 +1777,8 @@ mod tests {
.is_some());
assert_eq!(
updated_route_value.region_routes().unwrap()[1].leader_status,
Some(RegionStatus::Downgraded)
updated_route_value.region_routes().unwrap()[1].leader_state,
Some(LeaderState::Downgrading)
);
assert!(updated_route_value.region_routes().unwrap()[1]
.leader_down_since
@@ -1943,21 +1943,21 @@ mod tests {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(2)),
follower_peers: vec![Peer::empty(4)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 3)),
leader_peer: Some(Peer::empty(3)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
]),
@@ -1996,21 +1996,21 @@ mod tests {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(2)),
follower_peers: vec![Peer::empty(4)],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 3)),
leader_peer: Some(Peer::empty(3)),
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
},
]),

View File

@@ -21,6 +21,7 @@ use serde::{Deserialize, Serialize};
use snafu::OptionExt;
use table::metadata::TableId;
use table::table_name::TableName;
use table::table_reference::TableReference;
use super::{MetadataKey, MetadataValue, TABLE_NAME_KEY_PATTERN, TABLE_NAME_KEY_PREFIX};
use crate::error::{Error, InvalidMetadataSnafu, Result};
@@ -122,6 +123,16 @@ impl From<TableNameKey<'_>> for TableName {
}
}
impl<'a> From<TableNameKey<'a>> for TableReference<'a> {
fn from(value: TableNameKey<'a>) -> Self {
Self {
catalog: value.catalog,
schema: value.schema,
table: value.table,
}
}
}
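A hedged sketch of the conversion this enables, which the table-name formatting fix above relies on; the `TableNameKey::new` constructor and the `Display` output are assumptions:

let key = TableNameKey::new("greptime", "public", "metrics");
let table_ref = TableReference::from(key);
// Error messages now render the fully qualified name:
assert_eq!(table_ref.to_string(), "greptime.public.metrics");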
impl<'a> TryFrom<&'a str> for TableNameKey<'a> {
type Error = Error;

View File

@@ -744,6 +744,7 @@ mod tests {
use crate::kv_backend::memory::MemoryKvBackend;
use crate::kv_backend::{KvBackend, TxnService};
use crate::peer::Peer;
use crate::rpc::router::Region;
use crate::rpc::store::PutRequest;
#[test]
@@ -751,11 +752,43 @@ mod tests {
let old_raw_v = r#"{"region_routes":[{"region":{"id":1,"name":"r1","partition":null,"attrs":{}},"leader_peer":{"id":2,"addr":"a2"},"follower_peers":[]},{"region":{"id":1,"name":"r1","partition":null,"attrs":{}},"leader_peer":{"id":2,"addr":"a2"},"follower_peers":[]}],"version":0}"#;
let v = TableRouteValue::try_from_raw_value(old_raw_v.as_bytes()).unwrap();
let new_raw_v = format!("{:?}", v);
assert_eq!(
new_raw_v,
r#"Physical(PhysicalTableRouteValue { region_routes: [RegionRoute { region: Region { id: 1(0, 1), name: "r1", partition: None, attrs: {} }, leader_peer: Some(Peer { id: 2, addr: "a2" }), follower_peers: [], leader_status: None, leader_down_since: None }, RegionRoute { region: Region { id: 1(0, 1), name: "r1", partition: None, attrs: {} }, leader_peer: Some(Peer { id: 2, addr: "a2" }), follower_peers: [], leader_status: None, leader_down_since: None }], version: 0 })"#
);
let expected_table_route = TableRouteValue::Physical(PhysicalTableRouteValue {
region_routes: vec![
RegionRoute {
region: Region {
id: RegionId::new(0, 1),
name: "r1".to_string(),
partition: None,
attrs: Default::default(),
},
leader_peer: Some(Peer {
id: 2,
addr: "a2".to_string(),
}),
follower_peers: vec![],
leader_state: None,
leader_down_since: None,
},
RegionRoute {
region: Region {
id: RegionId::new(0, 1),
name: "r1".to_string(),
partition: None,
attrs: Default::default(),
},
leader_peer: Some(Peer {
id: 2,
addr: "a2".to_string(),
}),
follower_peers: vec![],
leader_state: None,
leader_down_since: None,
},
],
version: 0,
});
assert_eq!(v, expected_table_route);
}
#[test]

View File

@@ -0,0 +1,156 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use async_trait::async_trait;
use common_telemetry::error;
use crate::error::Result;
pub type LeadershipChangeNotifierCustomizerRef = Arc<dyn LeadershipChangeNotifierCustomizer>;
/// A trait for customizing the leadership change notifier.
pub trait LeadershipChangeNotifierCustomizer: Send + Sync {
fn customize(&self, notifier: &mut LeadershipChangeNotifier);
}
/// A trait for handling leadership change events in a distributed system.
#[async_trait]
pub trait LeadershipChangeListener: Send + Sync {
/// Returns the listener name.
fn name(&self) -> &str;
/// Called when the node transitions to the leader role.
async fn on_leader_start(&self) -> Result<()>;
/// Called when the node transitions to the follower role.
async fn on_leader_stop(&self) -> Result<()>;
}
/// A notifier for leadership change events.
#[derive(Default)]
pub struct LeadershipChangeNotifier {
listeners: Vec<Arc<dyn LeadershipChangeListener>>,
}
impl LeadershipChangeNotifier {
/// Adds a listener to the notifier.
pub fn add_listener(&mut self, listener: Arc<dyn LeadershipChangeListener>) {
self.listeners.push(listener);
}
/// Notify all listeners that the node has become a leader.
pub async fn notify_on_leader_start(&self) {
for listener in &self.listeners {
if let Err(err) = listener.on_leader_start().await {
error!(
err;
"Failed to notify listener: {}, event 'on_leader_start'",
listener.name()
);
}
}
}
/// Notify all listeners that the node has become a follower.
pub async fn notify_on_leader_stop(&self) {
for listener in &self.listeners {
if let Err(err) = listener.on_leader_stop().await {
error!(
err;
"Failed to notify listener: {}, event: 'on_follower_start'",
listener.name()
);
}
}
}
}
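A hedged sketch of a concrete listener, connecting this trait to the StartProcedureManager/StopProcedureManager error variants added elsewhere in this changeset; the `start()`/`stop()` methods on the procedure manager are assumptions:

use snafu::ResultExt; // for .context(...)

struct ProcedureManagerListener {
    procedure_manager: ProcedureManagerRef,
}

#[async_trait]
impl LeadershipChangeListener for ProcedureManagerListener {
    fn name(&self) -> &str {
        "ProcedureManagerListener"
    }

    // Run procedures only while this node holds leadership.
    async fn on_leader_start(&self) -> Result<()> {
        self.procedure_manager
            .start()
            .await
            .context(StartProcedureManagerSnafu)
    }

    async fn on_leader_stop(&self) -> Result<()> {
        self.procedure_manager
            .stop()
            .await
            .context(StopProcedureManagerSnafu)
    }
}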
#[cfg(test)]
mod tests {
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use super::*;
struct MockListener {
name: String,
on_leader_start_fn: Option<Box<dyn Fn() -> Result<()> + Send + Sync>>,
on_follower_start_fn: Option<Box<dyn Fn() -> Result<()> + Send + Sync>>,
}
#[async_trait::async_trait]
impl LeadershipChangeListener for MockListener {
fn name(&self) -> &str {
&self.name
}
async fn on_leader_start(&self) -> Result<()> {
if let Some(f) = &self.on_leader_start_fn {
return f();
}
Ok(())
}
async fn on_leader_stop(&self) -> Result<()> {
if let Some(f) = &self.on_follower_start_fn {
return f();
}
Ok(())
}
}
#[tokio::test]
async fn test_leadership_change_notifier() {
let mut notifier = LeadershipChangeNotifier::default();
let listener1 = Arc::new(MockListener {
name: "listener1".to_string(),
on_leader_start_fn: None,
on_follower_start_fn: None,
});
let called_on_leader_start = Arc::new(AtomicBool::new(false));
let called_on_follower_start = Arc::new(AtomicBool::new(false));
let called_on_leader_start_moved = called_on_leader_start.clone();
let called_on_follower_start_moved = called_on_follower_start.clone();
let listener2 = Arc::new(MockListener {
name: "listener2".to_string(),
on_leader_start_fn: Some(Box::new(move || {
called_on_leader_start_moved.store(true, Ordering::Relaxed);
Ok(())
})),
on_follower_start_fn: Some(Box::new(move || {
called_on_follower_start_moved.store(true, Ordering::Relaxed);
Ok(())
})),
});
notifier.add_listener(listener1);
notifier.add_listener(listener2);
let listener1 = notifier.listeners.first().unwrap();
let listener2 = notifier.listeners.get(1).unwrap();
assert_eq!(listener1.name(), "listener1");
assert_eq!(listener2.name(), "listener2");
notifier.notify_on_leader_start().await;
assert!(!called_on_follower_start.load(Ordering::Relaxed));
assert!(called_on_leader_start.load(Ordering::Relaxed));
notifier.notify_on_leader_stop().await;
assert!(called_on_follower_start.load(Ordering::Relaxed));
assert!(called_on_leader_start.load(Ordering::Relaxed));
}
}
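
A minimal usage sketch of the notifier above; the listener type here is hypothetical, and `Result` is this crate's alias:

use std::sync::Arc;

// Hypothetical listener: starts/stops background work on role changes.
struct MetricsFlusher;

#[async_trait::async_trait]
impl LeadershipChangeListener for MetricsFlusher {
    fn name(&self) -> &str {
        "MetricsFlusher"
    }

    async fn on_leader_start(&self) -> Result<()> {
        // start background work here
        Ok(())
    }

    async fn on_leader_stop(&self) -> Result<()> {
        // stop background work here
        Ok(())
    }
}

async fn on_elected() {
    let mut notifier = LeadershipChangeNotifier::default();
    notifier.add_listener(Arc::new(MetricsFlusher));
    // A listener failure is logged but does not abort the remaining listeners.
    notifier.notify_on_leader_start().await;
}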

View File

@@ -32,6 +32,7 @@ pub mod heartbeat;
pub mod instruction;
pub mod key;
pub mod kv_backend;
pub mod leadership_notifier;
pub mod lock_key;
pub mod metrics;
pub mod node_manager;

View File

@@ -58,7 +58,7 @@ impl MemoryRegionKeeper {
Default::default()
}
/// Returns [OpeningRegionGuard] if Region(`region_id`) on Peer(`datanode_id`) does not exist.
/// Returns [OperatingRegionGuard] if Region(`region_id`) on Peer(`datanode_id`) does not exist.
pub fn register(
&self,
datanode_id: DatanodeId,

View File

@@ -108,16 +108,16 @@ pub fn convert_to_region_peer_map(
.collect::<HashMap<_, _>>()
}
/// Returns the HashMap<[RegionNumber], [RegionStatus]>;
pub fn convert_to_region_leader_status_map(
/// Returns the HashMap<[RegionNumber], [LeaderState]>;
pub fn convert_to_region_leader_state_map(
region_routes: &[RegionRoute],
) -> HashMap<RegionNumber, RegionStatus> {
) -> HashMap<RegionNumber, LeaderState> {
region_routes
.iter()
.filter_map(|x| {
x.leader_status
x.leader_state
.as_ref()
.map(|status| (x.region.id.region_number(), *status))
.map(|state| (x.region.id.region_number(), *state))
})
.collect::<HashMap<_, _>>()
}
@@ -205,7 +205,7 @@ impl TableRoute {
region,
leader_peer,
follower_peers,
leader_status: None,
leader_state: None,
leader_down_since: None,
});
}
@@ -259,9 +259,13 @@ pub struct RegionRoute {
pub follower_peers: Vec<Peer>,
/// `None` by default.
#[builder(setter(into, strip_option), default)]
#[serde(default, skip_serializing_if = "Option::is_none")]
pub leader_status: Option<RegionStatus>,
/// The start time when the leader is in `Downgraded` status.
#[serde(
default,
alias = "leader_status",
skip_serializing_if = "Option::is_none"
)]
pub leader_state: Option<LeaderState>,
/// The start time when the leader entered the `Downgrading` state.
#[serde(default)]
#[builder(default = "self.default_leader_down_since()")]
pub leader_down_since: Option<i64>,
@@ -269,76 +273,78 @@ pub struct RegionRoute {
impl RegionRouteBuilder {
fn default_leader_down_since(&self) -> Option<i64> {
match self.leader_status {
Some(Some(RegionStatus::Downgraded)) => Some(current_time_millis()),
match self.leader_state {
Some(Some(LeaderState::Downgrading)) => Some(current_time_millis()),
_ => None,
}
}
}
/// The Status of the [Region].
/// The state of the [`Region`] leader.
/// TODO(dennis): It's better to add more fine-grained states such as `PENDING` etc.
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, AsRefStr)]
#[strum(serialize_all = "UPPERCASE")]
pub enum RegionStatus {
/// The following cases in which the [Region] will be downgraded.
pub enum LeaderState {
/// The [`Region`] will be downgraded in the following cases:
///
/// - The [Region] is unavailable(e.g., Crashed, Network disconnected).
/// - The [Region] was planned to migrate to another [Peer].
Downgraded,
/// - The [`Region`] may be unavailable (e.g., Crashed, Network disconnected).
/// - The [`Region`] was planned to migrate to another [`Peer`].
Downgrading,
}
impl RegionRoute {
/// Returns true if the Leader [Region] is downgraded.
/// Returns true if the Leader [`Region`] is downgrading.
///
/// The following cases in which the [Region] will be downgraded.
/// The [`Region`] will be downgraded in the following cases:
///
/// - The [Region] is unavailable(e.g., Crashed, Network disconnected).
/// - The [Region] was planned to migrate to another [Peer].
/// - The [`Region`] is unavailable (e.g., Crashed, Network disconnected).
/// - The [`Region`] was planned to migrate to another [`Peer`].
///
pub fn is_leader_downgraded(&self) -> bool {
matches!(self.leader_status, Some(RegionStatus::Downgraded))
pub fn is_leader_downgrading(&self) -> bool {
matches!(self.leader_state, Some(LeaderState::Downgrading))
}
/// Marks the Leader [Region] as downgraded.
/// Marks the Leader [`Region`] as [`LeaderState::Downgrading`].
///
/// We should downgrade a [Region] before deactivating it:
/// We should downgrade a [`Region`] before deactivating it:
///
/// - During the [Region] Failover Procedure.
/// - Migrating a [Region].
/// - During the [`Region`] Failover Procedure.
/// - Migrating a [`Region`].
///
/// **Notes:** Meta Server will stop renewing the lease for the downgraded [Region].
/// **Notes:** Meta Server will renew a special lease (`Downgrading`) for the downgrading [`Region`].
///
/// A downgrading region rejects all write requests and only allows its memtable to be flushed to object storage.
///
pub fn downgrade_leader(&mut self) {
self.leader_down_since = Some(current_time_millis());
self.leader_status = Some(RegionStatus::Downgraded)
self.leader_state = Some(LeaderState::Downgrading)
}
/// Returns how long since the leader is in `Downgraded` status.
/// Returns how long the leader has been in the `Downgrading` state.
pub fn leader_down_millis(&self) -> Option<i64> {
self.leader_down_since
.map(|start| current_time_millis() - start)
}
/// Sets the leader status.
/// Sets the leader state.
///
/// Returns true if updated.
pub fn set_leader_status(&mut self, status: Option<RegionStatus>) -> bool {
let updated = self.leader_status != status;
pub fn set_leader_state(&mut self, state: Option<LeaderState>) -> bool {
let updated = self.leader_state != state;
match (status, updated) {
(Some(RegionStatus::Downgraded), true) => {
match (state, updated) {
(Some(LeaderState::Downgrading), true) => {
self.leader_down_since = Some(current_time_millis());
}
(Some(RegionStatus::Downgraded), false) => {
// Do nothing if leader is still in `Downgraded` status.
(Some(LeaderState::Downgrading), false) => {
// Do nothing if the leader is still in the `Downgrading` state.
}
_ => {
self.leader_down_since = None;
}
}
self.leader_status = status;
self.leader_state = state;
updated
}
}
@@ -477,15 +483,15 @@ mod tests {
},
leader_peer: Some(Peer::new(1, "a1")),
follower_peers: vec![Peer::new(2, "a2"), Peer::new(3, "a3")],
leader_status: None,
leader_state: None,
leader_down_since: None,
};
assert!(!region_route.is_leader_downgraded());
assert!(!region_route.is_leader_downgrading());
region_route.downgrade_leader();
assert!(region_route.is_leader_downgraded());
assert!(region_route.is_leader_downgrading());
}
#[test]
@@ -499,7 +505,7 @@ mod tests {
},
leader_peer: Some(Peer::new(1, "a1")),
follower_peers: vec![Peer::new(2, "a2"), Peer::new(3, "a3")],
leader_status: None,
leader_state: None,
leader_down_since: None,
};
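
A sketch of the bookkeeping that `set_leader_state` performs, reusing the fixtures from the tests above; note that the `alias = "leader_status"` attribute keeps previously persisted values deserializable under the old field name:

let mut route = RegionRoute {
    region: Region::new_test(RegionId::new(1024, 1)),
    leader_peer: Some(Peer::new(1, "a1")),
    follower_peers: vec![],
    leader_state: None,
    leader_down_since: None,
};

// Entering `Downgrading` stamps `leader_down_since`...
assert!(route.set_leader_state(Some(LeaderState::Downgrading)));
assert!(route.is_leader_downgrading());
assert!(route.leader_down_millis().is_some());

// ...and leaving the state clears the timestamp again.
assert!(route.set_leader_state(None));
assert!(route.leader_down_millis().is_none());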

View File

@@ -17,6 +17,7 @@ pub mod kafka;
use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use common_wal::config::MetasrvWalConfig;
use common_wal::options::{KafkaWalOptions, WalOptions, WAL_OPTIONS_KEY};
use snafu::ResultExt;
@@ -24,6 +25,7 @@ use store_api::storage::{RegionId, RegionNumber};
use crate::error::{EncodeWalOptionsSnafu, Result};
use crate::kv_backend::KvBackendRef;
use crate::leadership_notifier::LeadershipChangeListener;
use crate::wal_options_allocator::kafka::topic_manager::TopicManager as KafkaTopicManager;
/// Allocates WAL options at region granularity.
@@ -94,6 +96,21 @@ impl WalOptionsAllocator {
}
}
#[async_trait]
impl LeadershipChangeListener for WalOptionsAllocator {
fn name(&self) -> &str {
"WalOptionsAllocator"
}
async fn on_leader_start(&self) -> Result<()> {
self.start().await
}
async fn on_leader_stop(&self) -> Result<()> {
Ok(())
}
}
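
With this impl, the allocator can be driven by the `LeadershipChangeNotifier` instead of being started by hand. A sketch of the wiring (construction of the allocator is elided):

use std::sync::Arc;

fn wire_up(allocator: WalOptionsAllocator, notifier: &mut LeadershipChangeNotifier) {
    // `on_leader_start` forwards to `allocator.start()` via the impl above.
    notifier.add_listener(Arc::new(allocator));
}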
/// Allocates WAL options for each region. The allocated options are encoded immediately.
pub fn allocate_region_wal_options(
regions: Vec<RegionNumber>,

View File

@@ -329,6 +329,7 @@ impl ExecutionPlanVisitor for MetricCollector {
level: self.current_level,
metrics: vec![],
});
self.current_level += 1;
return Ok(true);
};
@@ -365,8 +366,7 @@ impl ExecutionPlanVisitor for MetricCollector {
}
fn post_visit(&mut self, _plan: &dyn ExecutionPlan) -> std::result::Result<bool, Self::Error> {
// the last minus will underflow
self.current_level = self.current_level.wrapping_sub(1);
self.current_level -= 1;
Ok(true)
}
}
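
The underflow existed because an early-return path in `pre_visit` skipped the increment while `post_visit` always decremented. A stripped-down model of the restored invariant (a hypothetical visitor, not the real one):

struct LevelTracker {
    current_level: usize,
}

impl LevelTracker {
    fn pre_visit(&mut self) {
        // ...record the node at `current_level`...
        // Increment on every path, including the early-return one, so each
        // pre_visit is matched by exactly one post_visit.
        self.current_level += 1;
    }

    fn post_visit(&mut self) {
        // Balanced with pre_visit, so plain subtraction cannot underflow.
        self.current_level -= 1;
    }
}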

View File

@@ -17,6 +17,7 @@ backtrace = "0.3"
common-error.workspace = true
console-subscriber = { version = "0.1", optional = true }
greptime-proto.workspace = true
humantime-serde.workspace = true
lazy_static.workspace = true
once_cell.workspace = true
opentelemetry = { version = "0.21.0", default-features = false, features = [

View File

@@ -15,6 +15,7 @@
//! Logging stuff, inspired by databend
use std::env;
use std::sync::{Arc, Mutex, Once};
use std::time::Duration;
use once_cell::sync::{Lazy, OnceCell};
use opentelemetry::{global, KeyValue};
@@ -26,7 +27,7 @@ use serde::{Deserialize, Serialize};
use tracing_appender::non_blocking::WorkerGuard;
use tracing_appender::rolling::{RollingFileAppender, Rotation};
use tracing_log::LogTracer;
use tracing_subscriber::filter::Targets;
use tracing_subscriber::filter::{FilterFn, Targets};
use tracing_subscriber::fmt::Layer;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::prelude::*;
@@ -53,6 +54,9 @@ pub struct LoggingOptions {
/// The log format that can be one of "json" or "text". Default is "text".
pub log_format: LogFormat,
/// The maximum number of log files to keep.
pub max_log_files: usize,
/// Whether to append logs to stdout. Default is true.
pub append_stdout: bool,
@@ -64,6 +68,24 @@ pub struct LoggingOptions {
/// The tracing sample ratio.
pub tracing_sample_ratio: Option<TracingSampleOptions>,
/// The logging options of slow query.
pub slow_query: SlowQueryOptions,
}
/// The options of slow query.
#[derive(Clone, Debug, Serialize, Deserialize, Default)]
#[serde(default)]
pub struct SlowQueryOptions {
/// Whether to enable slow query log.
pub enable: bool,
/// The threshold of slow queries.
#[serde(with = "humantime_serde")]
pub threshold: Option<Duration>,
/// The sample ratio of slow queries.
pub sample_ratio: Option<f64>,
}
#[derive(Clone, Debug, Copy, PartialEq, Eq, Serialize, Deserialize)]
@@ -96,6 +118,9 @@ impl Default for LoggingOptions {
otlp_endpoint: None,
tracing_sample_ratio: None,
append_stdout: true,
slow_query: SlowQueryOptions::default(),
// Hourly rotation, 24 files per day; keeps 30 days of info log files.
max_log_files: 720,
}
}
}
@@ -186,8 +211,17 @@ pub fn init_global_logging(
// Configure the file logging layer with rolling policy.
let file_logging_layer = if !opts.dir.is_empty() {
let rolling_appender =
RollingFileAppender::new(Rotation::HOURLY, &opts.dir, "greptimedb");
let rolling_appender = RollingFileAppender::builder()
.rotation(Rotation::HOURLY)
.filename_prefix("greptimedb")
.max_log_files(opts.max_log_files)
.build(&opts.dir)
.unwrap_or_else(|e| {
panic!(
"initializing rolling file appender at {} failed: {}",
&opts.dir, e
)
});
let (writer, guard) = tracing_appender::non_blocking(rolling_appender);
guards.push(guard);
@@ -208,8 +242,17 @@ pub fn init_global_logging(
// Configure the error file logging layer with rolling policy.
let err_file_logging_layer = if !opts.dir.is_empty() {
let rolling_appender =
RollingFileAppender::new(Rotation::HOURLY, &opts.dir, "greptimedb-err");
let rolling_appender = RollingFileAppender::builder()
.rotation(Rotation::HOURLY)
.filename_prefix("greptimedb-err")
.max_log_files(opts.max_log_files)
.build(&opts.dir)
.unwrap_or_else(|e| {
panic!(
"initializing rolling file appender at {} failed: {}",
&opts.dir, e
)
});
let (writer, guard) = tracing_appender::non_blocking(rolling_appender);
guards.push(guard);
@@ -235,6 +278,51 @@ pub fn init_global_logging(
None
};
let slow_query_logging_layer = if !opts.dir.is_empty() && opts.slow_query.enable {
let rolling_appender = RollingFileAppender::builder()
.rotation(Rotation::HOURLY)
.filename_prefix("greptimedb-slow-queries")
.max_log_files(opts.max_log_files)
.build(&opts.dir)
.unwrap_or_else(|e| {
panic!(
"initializing rolling file appender at {} failed: {}",
&opts.dir, e
)
});
let (writer, guard) = tracing_appender::non_blocking(rolling_appender);
guards.push(guard);
// Only logs if the field contains "slow".
let slow_query_filter = FilterFn::new(|metadata| {
metadata
.fields()
.iter()
.any(|field| field.name().contains("slow"))
});
if opts.log_format == LogFormat::Json {
Some(
Layer::new()
.json()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
} else {
Some(
Layer::new()
.with_writer(writer)
.with_ansi(false)
.with_filter(slow_query_filter)
.boxed(),
)
}
} else {
None
};
// resolve log level settings from:
// - options from command line or config files
// - environment variable: RUST_LOG
@@ -279,6 +367,7 @@ pub fn init_global_logging(
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer)
.with(slow_query_logging_layer)
};
// consume the `tracing_opts` to avoid "unused" warnings.
@@ -289,7 +378,8 @@ pub fn init_global_logging(
.with(dyn_filter)
.with(stdout_logging_layer)
.with(file_logging_layer)
.with(err_file_logging_layer);
.with(err_file_logging_layer)
.with(slow_query_logging_layer);
if opts.enable_otlp_tracing {
global::set_text_map_propagator(TraceContextPropagator::new());
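
A configuration sketch for the new knobs; the field values are illustrative, the remaining fields keep their defaults, and `threshold` accepts humantime-style durations when loaded from config files:

use std::time::Duration;

let opts = LoggingOptions {
    dir: "/var/log/greptimedb".to_string(),
    max_log_files: 720, // hourly rotation: about 30 days of files
    slow_query: SlowQueryOptions {
        enable: true,
        threshold: Some(Duration::from_secs(1)),
        sample_ratio: Some(0.1),
    },
    ..Default::default()
};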

View File

@@ -152,6 +152,17 @@ macro_rules! trace {
};
}
#[macro_export]
macro_rules! slow {
(target: $target:expr, $($arg:tt)+) => {
$crate::log!(target: $target, slow = true, $crate::tracing::Level::INFO, $($arg)+)
};
($($arg:tt)+) => {
$crate::log!($crate::tracing::Level::INFO, slow = true, $($arg)+)
};
}
#[cfg(test)]
mod tests {
use common_error::mock::MockError;
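
A usage sketch of the macro; the injected `slow = true` field is what the slow-query `FilterFn` above matches on. The field name and message are illustrative, and this assumes the crate's `log!` forwards tracing-style fields:

let elapsed_ms = 1_500u64;
let sql = "SELECT * FROM metrics";
// Emitted at INFO with `slow = true`, so the record is routed to the
// `greptimedb-slow-queries` appender when the slow-query layer is enabled.
common_telemetry::slow!(elapsed = elapsed_ms, "slow query detected: {}", sql);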

View File

@@ -129,8 +129,10 @@ impl RegionAliveKeeper {
let request = RegionRequest::Close(RegionCloseRequest {});
if let Err(e) = self.region_server.handle_request(region_id, request).await {
if e.status_code() != StatusCode::RegionNotFound {
let _ = self.region_server.set_writable(region_id, false);
error!(e; "Failed to close staled region {}, set region to readonly.",region_id);
let _ = self
.region_server
.set_region_role(region_id, RegionRole::Follower);
error!(e; "Failed to close staled region {}, convert region to follower.", region_id);
}
}
}
@@ -378,7 +380,7 @@ impl CountdownTask {
}
},
Some(CountdownCommand::Reset((role, deadline))) => {
let _ = self.region_server.set_writable(self.region_id, role.writable());
let _ = self.region_server.set_region_role(self.region_id, role);
trace!(
"Reset deadline of region {region_id} to approximately {} seconds later.",
(deadline - Instant::now()).as_secs_f32(),
@@ -399,8 +401,8 @@ impl CountdownTask {
}
}
() = &mut countdown => {
warn!("The region {region_id} lease is expired, set region to readonly.");
let _ = self.region_server.set_writable(self.region_id, false);
warn!("The region {region_id} lease is expired, convert region to follower.");
let _ = self.region_server.set_region_role(self.region_id, RegionRole::Follower);
// resets the countdown.
let far_future = Instant::now() + Duration::from_secs(86400 * 365 * 30);
countdown.as_mut().reset(far_future);
@@ -436,7 +438,9 @@ mod test {
.handle_request(region_id, RegionRequest::Create(builder.build()))
.await
.unwrap();
region_server.set_writable(region_id, true).unwrap();
region_server
.set_region_role(region_id, RegionRole::Leader)
.unwrap();
// Register a region before starting.
alive_keeper.register_region(region_id).await;

View File

@@ -47,7 +47,7 @@ use servers::server::ServerHandlers;
use servers::Mode;
use snafu::{ensure, OptionExt, ResultExt};
use store_api::path_utils::{region_dir, WAL_DIR};
use store_api::region_engine::RegionEngineRef;
use store_api::region_engine::{RegionEngineRef, RegionRole};
use store_api::region_request::RegionOpenRequest;
use store_api::storage::RegionId;
use tokio::fs;
@@ -546,9 +546,9 @@ async fn open_all_regions(
for region_id in open_regions {
if open_with_writable {
if let Err(e) = region_server.set_writable(region_id, true) {
if let Err(e) = region_server.set_region_role(region_id, RegionRole::Leader) {
error!(
e; "failed to set writable for region {region_id}"
e; "failed to convert region {region_id} to leader"
);
}
}

View File

@@ -126,7 +126,9 @@ impl HeartbeatTask {
let mut follower_region_lease_count = 0;
for lease in &lease.regions {
match lease.role() {
RegionRole::Leader => leader_region_lease_count += 1,
RegionRole::Leader | RegionRole::DowngradingLeader => {
leader_region_lease_count += 1
}
RegionRole::Follower => follower_region_lease_count += 1,
}
}

View File

@@ -153,6 +153,7 @@ mod tests {
use mito2::engine::MITO_ENGINE_NAME;
use mito2::test_util::{CreateRequestBuilder, TestEnv};
use store_api::path_utils::region_dir;
use store_api::region_engine::RegionRole;
use store_api::region_request::{RegionCloseRequest, RegionRequest};
use store_api::storage::RegionId;
use tokio::sync::mpsc::{self, Receiver};
@@ -213,6 +214,7 @@ mod tests {
let instruction = Instruction::DowngradeRegion(DowngradeRegion {
region_id: RegionId::new(2048, 1),
flush_timeout: Some(Duration::from_secs(1)),
reject_write: false,
});
assert!(heartbeat_handler
.is_acceptable(&heartbeat_env.create_handler_ctx((meta.clone(), instruction))));
@@ -295,7 +297,9 @@ mod tests {
}
assert_matches!(
region_server.set_writable(region_id, true).unwrap_err(),
region_server
.set_region_role(region_id, RegionRole::Leader)
.unwrap_err(),
error::Error::RegionNotFound { .. }
);
}
@@ -411,6 +415,7 @@ mod tests {
let instruction = Instruction::DowngradeRegion(DowngradeRegion {
region_id,
flush_timeout: Some(Duration::from_secs(1)),
reject_write: false,
});
let mut ctx = heartbeat_env.create_handler_ctx((meta, instruction));
@@ -433,6 +438,7 @@ mod tests {
let instruction = Instruction::DowngradeRegion(DowngradeRegion {
region_id: RegionId::new(2048, 1),
flush_timeout: Some(Duration::from_secs(1)),
reject_write: false,
});
let mut ctx = heartbeat_env.create_handler_ctx((meta, instruction));
let control = heartbeat_handler.handle(&mut ctx).await.unwrap();

View File

@@ -16,7 +16,7 @@ use common_meta::instruction::{DowngradeRegion, DowngradeRegionReply, Instructio
use common_telemetry::tracing::info;
use common_telemetry::warn;
use futures_util::future::BoxFuture;
use store_api::region_engine::SetReadonlyResponse;
use store_api::region_engine::{SetRegionRoleStateResponse, SettableRegionRoleState};
use store_api::region_request::{RegionFlushRequest, RegionRequest};
use store_api::storage::RegionId;
@@ -24,16 +24,20 @@ use crate::heartbeat::handler::HandlerContext;
use crate::heartbeat::task_tracker::WaitResult;
impl HandlerContext {
async fn set_readonly_gracefully(&self, region_id: RegionId) -> InstructionReply {
match self.region_server.set_readonly_gracefully(region_id).await {
Ok(SetReadonlyResponse::Success { last_entry_id }) => {
async fn downgrade_to_follower_gracefully(&self, region_id: RegionId) -> InstructionReply {
match self
.region_server
.set_region_role_state_gracefully(region_id, SettableRegionRoleState::Follower)
.await
{
Ok(SetRegionRoleStateResponse::Success { last_entry_id }) => {
InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id,
exists: true,
error: None,
})
}
Ok(SetReadonlyResponse::NotFound) => {
Ok(SetRegionRoleStateResponse::NotFound) => {
InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: false,
@@ -53,10 +57,12 @@ impl HandlerContext {
DowngradeRegion {
region_id,
flush_timeout,
reject_write,
}: DowngradeRegion,
) -> BoxFuture<'static, InstructionReply> {
Box::pin(async move {
let Some(writable) = self.region_server.is_writable(region_id) else {
let Some(writable) = self.region_server.is_region_leader(region_id) else {
warn!("Region: {region_id} is not found");
return InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: false,
@@ -64,61 +70,89 @@ impl HandlerContext {
});
};
let region_server_moved = self.region_server.clone();
// Ignores flush request
if !writable {
return self.set_readonly_gracefully(region_id).await;
return self.downgrade_to_follower_gracefully(region_id).await;
}
let region_server_moved = self.region_server.clone();
if let Some(flush_timeout) = flush_timeout {
let register_result = self
.downgrade_tasks
.try_register(
// If flush_timeout is not set, directly convert region to follower.
let Some(flush_timeout) = flush_timeout else {
return self.downgrade_to_follower_gracefully(region_id).await;
};
if reject_write {
// Sets the region to downgrading; a downgrading region rejects all write requests.
match self
.region_server
.set_region_role_state_gracefully(
region_id,
Box::pin(async move {
info!("Flush region: {region_id} before downgrading region");
region_server_moved
.handle_request(
region_id,
RegionRequest::Flush(RegionFlushRequest {
row_group_size: None,
}),
)
.await?;
Ok(())
}),
SettableRegionRoleState::DowngradingLeader,
)
.await;
if register_result.is_busy() {
warn!("Another flush task is running for the region: {region_id}");
}
let mut watcher = register_result.into_watcher();
let result = self.catchup_tasks.wait(&mut watcher, flush_timeout).await;
match result {
WaitResult::Timeout => {
InstructionReply::DowngradeRegion(DowngradeRegionReply {
.await
{
Ok(SetRegionRoleStateResponse::Success { .. }) => {}
Ok(SetRegionRoleStateResponse::NotFound) => {
warn!("Region: {region_id} is not found");
return InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: true,
error: Some(format!(
"Flush region: {region_id} before downgrading region is timeout"
)),
})
exists: false,
error: None,
});
}
WaitResult::Finish(Ok(_)) => self.set_readonly_gracefully(region_id).await,
WaitResult::Finish(Err(err)) => {
InstructionReply::DowngradeRegion(DowngradeRegionReply {
Err(err) => {
warn!(err; "Failed to convert region to downgrading leader");
return InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: true,
error: Some(format!("{err:?}")),
})
});
}
}
} else {
self.set_readonly_gracefully(region_id).await
}
let register_result = self
.downgrade_tasks
.try_register(
region_id,
Box::pin(async move {
info!("Flush region: {region_id} before converting region to follower");
region_server_moved
.handle_request(
region_id,
RegionRequest::Flush(RegionFlushRequest {
row_group_size: None,
}),
)
.await?;
Ok(())
}),
)
.await;
if register_result.is_busy() {
warn!("Another flush task is running for the region: {region_id}");
}
let mut watcher = register_result.into_watcher();
let result = self.catchup_tasks.wait(&mut watcher, flush_timeout).await;
match result {
WaitResult::Timeout => InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: true,
error: Some(format!("Flush region: {region_id} is timeout")),
}),
WaitResult::Finish(Ok(_)) => self.downgrade_to_follower_gracefully(region_id).await,
WaitResult::Finish(Err(err)) => {
InstructionReply::DowngradeRegion(DowngradeRegionReply {
last_entry_id: None,
exists: true,
error: Some(format!("{err:?}")),
})
}
}
})
}
@@ -131,7 +165,7 @@ mod tests {
use common_meta::instruction::{DowngradeRegion, InstructionReply};
use mito2::engine::MITO_ENGINE_NAME;
use store_api::region_engine::{RegionRole, SetReadonlyResponse};
use store_api::region_engine::{RegionRole, SetRegionRoleStateResponse};
use store_api::region_request::RegionRequest;
use store_api::storage::RegionId;
use tokio::time::Instant;
@@ -155,6 +189,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -182,8 +217,9 @@ mod tests {
Ok(0)
}));
region_engine.handle_set_readonly_gracefully_mock_fn =
Some(Box::new(|_| Ok(SetReadonlyResponse::success(Some(1024)))))
region_engine.handle_set_readonly_gracefully_mock_fn = Some(Box::new(|_| {
Ok(SetRegionRoleStateResponse::success(Some(1024)))
}))
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
@@ -195,6 +231,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -215,8 +252,9 @@ mod tests {
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
region_engine.mock_role = Some(Some(RegionRole::Leader));
region_engine.handle_request_delay = Some(Duration::from_secs(100));
region_engine.handle_set_readonly_gracefully_mock_fn =
Some(Box::new(|_| Ok(SetReadonlyResponse::success(Some(1024)))))
region_engine.handle_set_readonly_gracefully_mock_fn = Some(Box::new(|_| {
Ok(SetRegionRoleStateResponse::success(Some(1024)))
}))
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
@@ -227,6 +265,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout: Some(flush_timeout),
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -246,8 +285,9 @@ mod tests {
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
region_engine.mock_role = Some(Some(RegionRole::Leader));
region_engine.handle_request_delay = Some(Duration::from_millis(300));
region_engine.handle_set_readonly_gracefully_mock_fn =
Some(Box::new(|_| Ok(SetReadonlyResponse::success(Some(1024)))))
region_engine.handle_set_readonly_gracefully_mock_fn = Some(Box::new(|_| {
Ok(SetRegionRoleStateResponse::success(Some(1024)))
}))
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
@@ -263,6 +303,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -277,6 +318,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout: Some(Duration::from_millis(500)),
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -304,8 +346,9 @@ mod tests {
}
.fail()
}));
region_engine.handle_set_readonly_gracefully_mock_fn =
Some(Box::new(|_| Ok(SetReadonlyResponse::success(Some(1024)))))
region_engine.handle_set_readonly_gracefully_mock_fn = Some(Box::new(|_| {
Ok(SetRegionRoleStateResponse::success(Some(1024)))
}))
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
@@ -321,6 +364,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -335,6 +379,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout: Some(Duration::from_millis(500)),
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -356,7 +401,7 @@ mod tests {
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
region_engine.mock_role = Some(Some(RegionRole::Leader));
region_engine.handle_set_readonly_gracefully_mock_fn =
Some(Box::new(|_| Ok(SetReadonlyResponse::NotFound)));
Some(Box::new(|_| Ok(SetRegionRoleStateResponse::NotFound)));
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
@@ -365,6 +410,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout: None,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
@@ -396,6 +442,7 @@ mod tests {
.handle_downgrade_region_instruction(DowngradeRegion {
region_id,
flush_timeout: None,
reject_write: false,
})
.await;
assert_matches!(reply, InstructionReply::DowngradeRegion(_));
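
The reworked handler reduces to a small decision ladder; a stripped-down model of the control flow (hypothetical types, not the literal async implementation):

enum Decision {
    ReplyNotFound,          // region unknown to this server
    DowngradeNow,           // already a follower, or no flush_timeout given
    RejectWritesThenFlush,  // reject_write: enter DowngradingLeader, then flush
    FlushThenDowngrade,     // flush within flush_timeout, then become follower
}

fn decide(leader: Option<bool>, flush_timeout: Option<u64>, reject_write: bool) -> Decision {
    match (leader, flush_timeout) {
        (None, _) => Decision::ReplyNotFound,
        (Some(false), _) => Decision::DowngradeNow,
        (Some(true), None) => Decision::DowngradeNow,
        (Some(true), Some(_)) if reject_write => Decision::RejectWritesThenFlush,
        (Some(true), Some(_)) => Decision::FlushThenDowngrade,
    }
}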

View File

@@ -31,7 +31,7 @@ impl HandlerContext {
}: UpgradeRegion,
) -> BoxFuture<'static, InstructionReply> {
Box::pin(async move {
let Some(writable) = self.region_server.is_writable(region_id) else {
let Some(writable) = self.region_server.is_region_leader(region_id) else {
return InstructionReply::UpgradeRegion(UpgradeRegionReply {
ready: false,
exists: false,

View File

@@ -54,7 +54,10 @@ use snafu::{ensure, OptionExt, ResultExt};
use store_api::metric_engine_consts::{
FILE_ENGINE_NAME, LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME,
};
use store_api::region_engine::{RegionEngineRef, RegionRole, RegionStatistic, SetReadonlyResponse};
use store_api::region_engine::{
RegionEngineRef, RegionRole, RegionStatistic, SetRegionRoleStateResponse,
SettableRegionRoleState,
};
use store_api::region_request::{
AffectedRows, RegionCloseRequest, RegionOpenRequest, RegionRequest,
};
@@ -274,37 +277,47 @@ impl RegionServer {
.collect()
}
pub fn is_writable(&self, region_id: RegionId) -> Option<bool> {
// TODO(weny): Find a better way.
pub fn is_region_leader(&self, region_id: RegionId) -> Option<bool> {
self.inner.region_map.get(&region_id).and_then(|engine| {
engine.role(region_id).map(|role| match role {
RegionRole::Follower => false,
RegionRole::Leader => true,
RegionRole::DowngradingLeader => true,
})
})
}
pub fn set_writable(&self, region_id: RegionId, writable: bool) -> Result<()> {
pub fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<()> {
let engine = self
.inner
.region_map
.get(&region_id)
.with_context(|| RegionNotFoundSnafu { region_id })?;
engine
.set_writable(region_id, writable)
.set_region_role(region_id, role)
.with_context(|_| HandleRegionRequestSnafu { region_id })
}
pub async fn set_readonly_gracefully(
/// Set region role state gracefully.
///
/// For [SettableRegionRoleState::Follower]:
/// After the call returns, the engine ensures that
/// no **further** write or flush operations will succeed in this region.
///
/// For [SettableRegionRoleState::DowngradingLeader]:
/// After the call returns, the engine ensures that
/// no **further** write operations will succeed in this region.
pub async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
) -> Result<SetReadonlyResponse> {
state: SettableRegionRoleState,
) -> Result<SetRegionRoleStateResponse> {
match self.inner.region_map.get(&region_id) {
Some(engine) => Ok(engine
.set_readonly_gracefully(region_id)
.set_region_role_state_gracefully(region_id, state)
.await
.with_context(|_| HandleRegionRequestSnafu { region_id })?),
None => Ok(SetReadonlyResponse::NotFound),
None => Ok(SetRegionRoleStateResponse::NotFound),
}
}
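
A usage sketch of the graceful variant during migration; error handling is elided and `server` is a `RegionServer`:

async fn downgrade_for_migration(server: &RegionServer, region_id: RegionId) {
    let resp = server
        .set_region_role_state_gracefully(region_id, SettableRegionRoleState::DowngradingLeader)
        .await;
    if let Ok(SetRegionRoleStateResponse::Success { last_entry_id }) = resp {
        // `last_entry_id` tells the target peer how far to replay the WAL
        // before it can be promoted to leader.
        let _ = last_entry_id;
    }
}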
@@ -842,7 +855,7 @@ impl RegionServerInner {
info!("Region {region_id} is deregistered from engine {engine_type}");
self.region_map
.remove(&region_id)
.map(|(id, engine)| engine.set_writable(id, false));
.map(|(id, engine)| engine.set_region_role(id, RegionRole::Follower));
self.event_listener.on_region_deregistered(region_id);
}
RegionChange::Catchup => {

View File

@@ -32,7 +32,8 @@ use query::{QueryEngine, QueryEngineContext};
use session::context::QueryContextRef;
use store_api::metadata::RegionMetadataRef;
use store_api::region_engine::{
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetReadonlyResponse,
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetRegionRoleStateResponse,
SettableRegionRoleState,
};
use store_api::region_request::{AffectedRows, RegionRequest};
use store_api::storage::{RegionId, ScanRequest};
@@ -106,7 +107,7 @@ pub type MockRequestHandler =
Box<dyn Fn(RegionId, RegionRequest) -> Result<AffectedRows, Error> + Send + Sync>;
pub type MockSetReadonlyGracefullyHandler =
Box<dyn Fn(RegionId) -> Result<SetReadonlyResponse, Error> + Send + Sync>;
Box<dyn Fn(RegionId) -> Result<SetRegionRoleStateResponse, Error> + Send + Sync>;
pub struct MockRegionEngine {
sender: Sender<(RegionId, RegionRequest)>,
@@ -220,14 +221,15 @@ impl RegionEngine for MockRegionEngine {
Ok(())
}
fn set_writable(&self, _region_id: RegionId, _writable: bool) -> Result<(), BoxedError> {
fn set_region_role(&self, _region_id: RegionId, _role: RegionRole) -> Result<(), BoxedError> {
Ok(())
}
async fn set_readonly_gracefully(
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
) -> Result<SetReadonlyResponse, BoxedError> {
_region_role_state: SettableRegionRoleState,
) -> Result<SetRegionRoleStateResponse, BoxedError> {
if let Some(mock_fn) = &self.handle_set_readonly_gracefully_mock_fn {
return mock_fn(region_id).map_err(BoxedError::new);
};

View File

@@ -249,6 +249,15 @@ impl ConcreteDataType {
]
}
pub fn timestamps() -> Vec<ConcreteDataType> {
vec![
ConcreteDataType::timestamp_second_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
ConcreteDataType::timestamp_microsecond_datatype(),
ConcreteDataType::timestamp_nanosecond_datatype(),
]
}
/// Convert arrow data type to [ConcreteDataType].
///
/// # Panics

View File

@@ -26,8 +26,8 @@ use object_store::ObjectStore;
use snafu::{ensure, OptionExt};
use store_api::metadata::RegionMetadataRef;
use store_api::region_engine::{
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetReadonlyResponse,
SinglePartitionScanner,
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetRegionRoleStateResponse,
SettableRegionRoleState, SinglePartitionScanner,
};
use store_api::region_request::{
AffectedRows, RegionCloseRequest, RegionCreateRequest, RegionDropRequest, RegionOpenRequest,
@@ -113,22 +113,23 @@ impl RegionEngine for FileRegionEngine {
None
}
fn set_writable(&self, region_id: RegionId, writable: bool) -> Result<(), BoxedError> {
fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<(), BoxedError> {
self.inner
.set_writable(region_id, writable)
.set_region_role(region_id, role)
.map_err(BoxedError::new)
}
async fn set_readonly_gracefully(
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
) -> Result<SetReadonlyResponse, BoxedError> {
_region_role_state: SettableRegionRoleState,
) -> Result<SetRegionRoleStateResponse, BoxedError> {
let exists = self.inner.get_region(region_id).await.is_some();
if exists {
Ok(SetReadonlyResponse::success(None))
Ok(SetRegionRoleStateResponse::success(None))
} else {
Ok(SetReadonlyResponse::NotFound)
Ok(SetRegionRoleStateResponse::NotFound)
}
}
@@ -189,7 +190,7 @@ impl EngineInner {
Ok(())
}
fn set_writable(&self, _region_id: RegionId, _writable: bool) -> EngineResult<()> {
fn set_region_role(&self, _region_id: RegionId, _region_role: RegionRole) -> EngineResult<()> {
// TODO(zhongzc): Improve the semantics and implementation of this API.
Ok(())
}

View File

@@ -37,6 +37,7 @@ use operator::delete::Deleter;
use operator::insert::Inserter;
use operator::statement::StatementExecutor;
use partition::manager::PartitionRuleManager;
use query::stats::StatementStatistics;
use query::{QueryEngine, QueryEngineFactory};
use servers::error::{AlreadyStartedSnafu, StartGrpcSnafu, TcpBindSnafu, TcpIncomingSnafu};
use servers::server::Server;
@@ -475,6 +476,7 @@ impl FrontendInvoker {
layered_cache_registry.clone(),
inserter.clone(),
table_route_cache,
StatementStatistics::default(),
));
let invoker = FrontendInvoker::new(inserter, deleter, statement_executor);

View File

@@ -33,6 +33,7 @@ use operator::statement::{StatementExecutor, StatementExecutorRef};
use operator::table::TableMutationOperator;
use partition::manager::PartitionRuleManager;
use pipeline::pipeline_operator::PipelineOperator;
use query::stats::StatementStatistics;
use query::QueryEngineFactory;
use servers::server::ServerHandlers;
use snafu::OptionExt;
@@ -55,6 +56,7 @@ pub struct FrontendBuilder {
plugins: Option<Plugins>,
procedure_executor: ProcedureExecutorRef,
heartbeat_task: Option<HeartbeatTask>,
stats: StatementStatistics,
}
impl FrontendBuilder {
@@ -65,6 +67,7 @@ impl FrontendBuilder {
catalog_manager: CatalogManagerRef,
node_manager: NodeManagerRef,
procedure_executor: ProcedureExecutorRef,
stats: StatementStatistics,
) -> Self {
Self {
options,
@@ -76,6 +79,7 @@ impl FrontendBuilder {
plugins: None,
procedure_executor,
heartbeat_task: None,
stats,
}
}
@@ -181,6 +185,7 @@ impl FrontendBuilder {
local_cache_invalidator,
inserter.clone(),
table_route_cache,
self.stats,
));
let pipeline_operator = Arc::new(PipelineOperator::new(

View File

@@ -132,7 +132,7 @@ impl ClientManager {
}
async fn try_create_client(&self, provider: &Arc<KafkaProvider>) -> Result<Client> {
// Sets to Retry to retry connecting if the kafka cluter replies with an UnknownTopic error.
// Sets to Retry to retry connecting if the kafka cluster replies with an UnknownTopic error.
// That's because the topic is believed to exist as the metasrv is expected to create required topics upon start.
// Reconnecting won't stop until it succeeds or a different error is returned.
let client = self
@@ -182,7 +182,7 @@ mod tests {
use super::*;
/// Creates `num_topiocs` number of topics each will be decorated by the given decorator.
/// Creates `num_topics` topics, each decorated by the given decorator.
pub async fn create_topics<F>(
num_topics: usize,
decorator: F,

View File

@@ -733,6 +733,13 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Handler not found: {}", name))]
HandlerNotFound {
name: String,
#[snafu(implicit)]
location: Location,
},
}
impl Error {
@@ -803,7 +810,8 @@ impl ErrorExt for Error {
| Error::InitExportMetricsTask { .. }
| Error::ProcedureNotFound { .. }
| Error::TooManyPartitions { .. }
| Error::TomlFormat { .. } => StatusCode::InvalidArguments,
| Error::TomlFormat { .. }
| Error::HandlerNotFound { .. } => StatusCode::InvalidArguments,
Error::LeaseKeyFromUtf8 { .. }
| Error::LeaseValueFromUtf8 { .. }
| Error::InvalidRegionKeyFromUtf8 { .. }

View File

@@ -22,12 +22,28 @@ use api::v1::meta::{
HeartbeatRequest, HeartbeatResponse, MailboxMessage, RegionLease, RequestHeader,
ResponseHeader, Role, PROTOCOL_VERSION,
};
use check_leader_handler::CheckLeaderHandler;
use collect_cluster_info_handler::{
CollectDatanodeClusterInfoHandler, CollectFlownodeClusterInfoHandler,
CollectFrontendClusterInfoHandler,
};
use collect_stats_handler::CollectStatsHandler;
use common_base::Plugins;
use common_meta::datanode::Stat;
use common_meta::instruction::{Instruction, InstructionReply};
use common_meta::sequence::Sequence;
use common_telemetry::{debug, info, warn};
use dashmap::DashMap;
use extract_stat_handler::ExtractStatHandler;
use failure_handler::RegionFailureHandler;
use filter_inactive_region_stats::FilterInactiveRegionStatsHandler;
use futures::future::join_all;
use keep_lease_handler::{DatanodeKeepLeaseHandler, FlownodeKeepLeaseHandler};
use mailbox_handler::MailboxHandler;
use on_leader_start_handler::OnLeaderStartHandler;
use publish_heartbeat_handler::PublishHeartbeatHandler;
use region_lease_handler::RegionLeaseHandler;
use response_header_handler::ResponseHeaderHandler;
use snafu::{OptionExt, ResultExt};
use store_api::storage::RegionId;
use tokio::sync::mpsc::Sender;
@@ -36,6 +52,7 @@ use tokio::sync::{oneshot, Notify, RwLock};
use crate::error::{self, DeserializeFromJsonSnafu, Result, UnexpectedInstructionReplySnafu};
use crate::metasrv::Context;
use crate::metrics::{METRIC_META_HANDLER_EXECUTE, METRIC_META_HEARTBEAT_CONNECTION_NUM};
use crate::pubsub::PublisherRef;
use crate::service::mailbox::{
BroadcastChannel, Channel, Mailbox, MailboxReceiver, MailboxRef, MessageId,
};
@@ -96,6 +113,7 @@ impl HeartbeatAccumulator {
}
}
/// The pusher of the heartbeat response.
pub struct Pusher {
sender: Sender<std::result::Result<HeartbeatResponse, tonic::Status>>,
res_header: ResponseHeader,
@@ -131,6 +149,7 @@ impl Pusher {
}
}
/// The group of heartbeat pushers.
#[derive(Clone, Default)]
pub struct Pushers(Arc<RwLock<BTreeMap<String, Pusher>>>);
@@ -203,50 +222,46 @@ impl NameCachedHandler {
}
}
#[derive(Clone, Default)]
pub type HeartbeatHandlerGroupRef = Arc<HeartbeatHandlerGroup>;
/// The group of heartbeat handlers.
#[derive(Default)]
pub struct HeartbeatHandlerGroup {
handlers: Arc<RwLock<Vec<NameCachedHandler>>>,
handlers: Vec<NameCachedHandler>,
pushers: Pushers,
}
impl HeartbeatHandlerGroup {
pub(crate) fn new(pushers: Pushers) -> Self {
Self {
handlers: Arc::new(RwLock::new(vec![])),
pushers,
}
}
pub async fn add_handler(&self, handler: impl HeartbeatHandler + 'static) {
let mut handlers = self.handlers.write().await;
handlers.push(NameCachedHandler::new(handler));
}
pub async fn register(&self, key: impl AsRef<str>, pusher: Pusher) {
/// Registers the heartbeat response [`Pusher`] with the given key to the group.
pub async fn register_pusher(&self, key: impl AsRef<str>, pusher: Pusher) {
let key = key.as_ref();
METRIC_META_HEARTBEAT_CONNECTION_NUM.inc();
info!("Pusher register: {}", key);
let _ = self.pushers.insert(key.to_string(), pusher).await;
}
pub async fn deregister(&self, key: impl AsRef<str>) -> Option<Pusher> {
/// Deregisters the heartbeat response [`Pusher`] with the given key from the group.
///
/// Returns the [`Pusher`] if it exists.
pub async fn deregister_push(&self, key: impl AsRef<str>) -> Option<Pusher> {
let key = key.as_ref();
METRIC_META_HEARTBEAT_CONNECTION_NUM.dec();
info!("Pusher unregister: {}", key);
self.pushers.remove(key).await
}
/// Returns the [`Pushers`] of the group.
pub fn pushers(&self) -> Pushers {
self.pushers.clone()
}
/// Handles the heartbeat request.
pub async fn handle(
&self,
req: HeartbeatRequest,
mut ctx: Context,
) -> Result<HeartbeatResponse> {
let mut acc = HeartbeatAccumulator::default();
let handlers = self.handlers.read().await;
let role = req
.header
.as_ref()
@@ -255,7 +270,7 @@ impl HeartbeatHandlerGroup {
err_msg: format!("invalid role: {:?}", req.header),
})?;
for NameCachedHandler { name, handler } in handlers.iter() {
for NameCachedHandler { name, handler } in self.handlers.iter() {
if !handler.is_acceptable(role) {
continue;
}
@@ -426,9 +441,170 @@ impl Mailbox for HeartbeatMailbox {
}
}
/// The builder to build the group of heartbeat handlers.
pub struct HeartbeatHandlerGroupBuilder {
/// The handler to handle region failure.
region_failure_handler: Option<RegionFailureHandler>,
/// The handler to handle region lease.
region_lease_handler: Option<RegionLeaseHandler>,
/// The plugins.
plugins: Option<Plugins>,
/// The heartbeat response pushers.
pushers: Pushers,
/// The group of heartbeat handlers.
handlers: Vec<NameCachedHandler>,
}
impl HeartbeatHandlerGroupBuilder {
pub fn new(pushers: Pushers) -> Self {
Self {
region_failure_handler: None,
region_lease_handler: None,
plugins: None,
pushers,
handlers: vec![],
}
}
pub fn with_region_lease_handler(mut self, handler: Option<RegionLeaseHandler>) -> Self {
self.region_lease_handler = handler;
self
}
/// Sets the [`RegionFailureHandler`].
pub fn with_region_failure_handler(mut self, handler: Option<RegionFailureHandler>) -> Self {
self.region_failure_handler = handler;
self
}
/// Sets the [`Plugins`].
pub fn with_plugins(mut self, plugins: Option<Plugins>) -> Self {
self.plugins = plugins;
self
}
/// Adds the default handlers.
pub fn add_default_handlers(mut self) -> Self {
// Extract the `PublishHeartbeatHandler` from the plugins.
let publish_heartbeat_handler = if let Some(plugins) = self.plugins.as_ref() {
plugins
.get::<PublisherRef>()
.map(|publish| PublishHeartbeatHandler::new(publish.clone()))
} else {
None
};
self.add_handler_last(ResponseHeaderHandler);
// `KeepLeaseHandler` should preferably be in front of `CheckLeaderHandler`,
// because even if the current meta-server node is no longer the leader it can
// still help the datanode to keep its lease.
self.add_handler_last(DatanodeKeepLeaseHandler);
self.add_handler_last(FlownodeKeepLeaseHandler);
self.add_handler_last(CheckLeaderHandler);
self.add_handler_last(OnLeaderStartHandler);
self.add_handler_last(ExtractStatHandler);
self.add_handler_last(CollectDatanodeClusterInfoHandler);
self.add_handler_last(CollectFrontendClusterInfoHandler);
self.add_handler_last(CollectFlownodeClusterInfoHandler);
self.add_handler_last(MailboxHandler);
if let Some(region_lease_handler) = self.region_lease_handler.take() {
self.add_handler_last(region_lease_handler);
}
self.add_handler_last(FilterInactiveRegionStatsHandler);
if let Some(region_failure_handler) = self.region_failure_handler.take() {
self.add_handler_last(region_failure_handler);
}
if let Some(publish_heartbeat_handler) = publish_heartbeat_handler {
self.add_handler_last(publish_heartbeat_handler);
}
self.add_handler_last(CollectStatsHandler::default());
self
}
/// Builds the group of heartbeat handlers.
///
/// Applies the customizer if it exists.
pub fn build(mut self) -> Result<HeartbeatHandlerGroup> {
if let Some(customizer) = self
.plugins
.as_ref()
.and_then(|plugins| plugins.get::<HeartbeatHandlerGroupBuilderCustomizerRef>())
{
debug!("Customizing the heartbeat handler group builder");
customizer.customize(&mut self)?;
}
Ok(HeartbeatHandlerGroup {
handlers: self.handlers.into_iter().collect(),
pushers: self.pushers,
})
}
/// Adds the handler after the specified handler.
pub fn add_handler_after(
&mut self,
target: &'static str,
handler: impl HeartbeatHandler + 'static,
) -> Result<()> {
if let Some(pos) = self.handlers.iter().position(|x| x.name == target) {
self.handlers
.insert(pos + 1, NameCachedHandler::new(handler));
return Ok(());
}
error::HandlerNotFoundSnafu { name: target }.fail()
}
/// Adds the handler before the specified handler.
pub fn add_handler_before(
&mut self,
target: &'static str,
handler: impl HeartbeatHandler + 'static,
) -> Result<()> {
if let Some(pos) = self.handlers.iter().position(|x| x.name == target) {
self.handlers.insert(pos, NameCachedHandler::new(handler));
return Ok(());
}
error::HandlerNotFoundSnafu { name: target }.fail()
}
/// Replaces the handler with the specified name.
pub fn replace_handler(
&mut self,
target: &'static str,
handler: impl HeartbeatHandler + 'static,
) -> Result<()> {
if let Some(pos) = self.handlers.iter().position(|x| x.name == target) {
self.handlers[pos] = NameCachedHandler::new(handler);
return Ok(());
}
error::HandlerNotFoundSnafu { name: target }.fail()
}
fn add_handler_last(&mut self, handler: impl HeartbeatHandler + 'static) {
self.handlers.push(NameCachedHandler::new(handler));
}
}
pub type HeartbeatHandlerGroupBuilderCustomizerRef =
Arc<dyn HeartbeatHandlerGroupBuilderCustomizer>;
/// The customizer of the [`HeartbeatHandlerGroupBuilder`].
pub trait HeartbeatHandlerGroupBuilderCustomizer: Send + Sync {
fn customize(&self, builder: &mut HeartbeatHandlerGroupBuilder) -> Result<()>;
}
#[cfg(test)]
mod tests {
use std::assert_matches::assert_matches;
use std::sync::Arc;
use std::time::Duration;
@@ -437,17 +613,9 @@ mod tests {
use common_meta::sequence::SequenceBuilder;
use tokio::sync::mpsc;
use crate::handler::check_leader_handler::CheckLeaderHandler;
use crate::handler::collect_cluster_info_handler::{
CollectDatanodeClusterInfoHandler, CollectFlownodeClusterInfoHandler,
CollectFrontendClusterInfoHandler,
};
use super::{HeartbeatHandlerGroupBuilder, Pushers};
use crate::error;
use crate::handler::collect_stats_handler::CollectStatsHandler;
use crate::handler::extract_stat_handler::ExtractStatHandler;
use crate::handler::filter_inactive_region_stats::FilterInactiveRegionStatsHandler;
use crate::handler::keep_lease_handler::{DatanodeKeepLeaseHandler, FlownodeKeepLeaseHandler};
use crate::handler::mailbox_handler::MailboxHandler;
use crate::handler::on_leader_start_handler::OnLeaderStartHandler;
use crate::handler::response_header_handler::ResponseHeaderHandler;
use crate::handler::{HeartbeatHandlerGroup, HeartbeatMailbox, Pusher};
use crate::service::mailbox::{Channel, MailboxReceiver, MailboxRef};
@@ -489,7 +657,7 @@ mod tests {
let pusher: Pusher = Pusher::new(pusher_tx, &res_header);
let handler_group = HeartbeatHandlerGroup::default();
handler_group
.register(format!("{}-{}", Role::Datanode as i32, datanode_id), pusher)
.register_pusher(format!("{}-{}", Role::Datanode as i32, datanode_id), pusher)
.await;
let kv_backend = Arc::new(MemoryKvBackend::new());
@@ -517,24 +685,14 @@ mod tests {
(mailbox, receiver)
}
#[tokio::test]
async fn test_handler_name() {
let group = HeartbeatHandlerGroup::default();
group.add_handler(ResponseHeaderHandler).await;
group.add_handler(DatanodeKeepLeaseHandler).await;
group.add_handler(FlownodeKeepLeaseHandler).await;
group.add_handler(CheckLeaderHandler).await;
group.add_handler(OnLeaderStartHandler).await;
group.add_handler(ExtractStatHandler).await;
group.add_handler(CollectDatanodeClusterInfoHandler).await;
group.add_handler(CollectFrontendClusterInfoHandler).await;
group.add_handler(CollectFlownodeClusterInfoHandler).await;
group.add_handler(MailboxHandler).await;
group.add_handler(FilterInactiveRegionStatsHandler).await;
group.add_handler(CollectStatsHandler::default()).await;
let handlers = group.handlers.read().await;
#[test]
fn test_handler_group_builder() {
let group = HeartbeatHandlerGroupBuilder::new(Pushers::default())
.add_default_handlers()
.build()
.unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
let names = [
@@ -556,4 +714,255 @@ mod tests {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_add_before() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.add_handler_before(
"FilterInactiveRegionStatsHandler",
CollectStatsHandler::default(),
)
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_add_before_first() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.add_handler_before("ResponseHeaderHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
let names = [
"CollectStatsHandler",
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_add_after() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.add_handler_after("MailboxHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_add_after_last() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.add_handler_after("CollectStatsHandler", ResponseHeaderHandler)
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
let names = [
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
"ResponseHeaderHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_replace() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.replace_handler("MailboxHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
let names = [
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"CollectStatsHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_replace_last() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.replace_handler("CollectStatsHandler", ResponseHeaderHandler)
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
let names = [
"ResponseHeaderHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"ResponseHeaderHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_replace_first() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
builder
.replace_handler("ResponseHeaderHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
let names = [
"CollectStatsHandler",
"DatanodeKeepLeaseHandler",
"FlownodeKeepLeaseHandler",
"CheckLeaderHandler",
"OnLeaderStartHandler",
"ExtractStatHandler",
"CollectDatanodeClusterInfoHandler",
"CollectFrontendClusterInfoHandler",
"CollectFlownodeClusterInfoHandler",
"MailboxHandler",
"FilterInactiveRegionStatsHandler",
"CollectStatsHandler",
];
for (handler, name) in handlers.iter().zip(names.into_iter()) {
assert_eq!(handler.name, name);
}
}
#[test]
fn test_handler_group_builder_handler_not_found() {
let mut builder =
HeartbeatHandlerGroupBuilder::new(Pushers::default()).add_default_handlers();
let err = builder
.add_handler_before("NotExists", CollectStatsHandler::default())
.unwrap_err();
assert_matches!(err, error::Error::HandlerNotFound { .. });
let err = builder
.add_handler_after("NotExists", CollectStatsHandler::default())
.unwrap_err();
assert_matches!(err, error::Error::HandlerNotFound { .. });
let err = builder
.replace_handler("NotExists", CollectStatsHandler::default())
.unwrap_err();
assert_matches!(err, error::Error::HandlerNotFound { .. });
}
}
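The tests above exercise the builder's name-addressed mutations: add_handler_after, replace_handler, and the HandlerNotFound error path. A minimal standalone sketch of those semantics, using simplified stand-in types rather than the real metasrv definitions:

#[derive(Debug)]
struct HandlerNotFound(String);

struct NamedHandler {
    name: &'static str,
}

struct GroupBuilder {
    handlers: Vec<NamedHandler>,
}

impl GroupBuilder {
    // Locate a handler by name, failing like error::Error::HandlerNotFound.
    fn position_of(&self, target: &str) -> Result<usize, HandlerNotFound> {
        self.handlers
            .iter()
            .position(|h| h.name == target)
            .ok_or_else(|| HandlerNotFound(target.to_string()))
    }

    fn add_handler_after(&mut self, target: &str, handler: NamedHandler) -> Result<(), HandlerNotFound> {
        let at = self.position_of(target)?;
        self.handlers.insert(at + 1, handler);
        Ok(())
    }

    fn replace_handler(&mut self, target: &str, handler: NamedHandler) -> Result<(), HandlerNotFound> {
        let at = self.position_of(target)?;
        self.handlers[at] = handler;
        Ok(())
    }
}

fn main() {
    let mut builder = GroupBuilder {
        handlers: vec![
            NamedHandler { name: "ResponseHeaderHandler" },
            NamedHandler { name: "CollectStatsHandler" },
        ],
    };
    builder
        .add_handler_after("CollectStatsHandler", NamedHandler { name: "ResponseHeaderHandler" })
        .unwrap();
    assert_eq!(builder.handlers.len(), 3);
    assert!(builder.replace_handler("NotExists", NamedHandler { name: "X" }).is_err());
}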


@@ -111,7 +111,7 @@ mod test {
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeper;
use common_meta::rpc::router::{Region, RegionRoute, RegionStatus};
use common_meta::rpc::router::{LeaderState, Region, RegionRoute};
use store_api::region_engine::RegionRole;
use store_api::storage::RegionId;
@@ -297,7 +297,7 @@ mod test {
region: Region::new_test(region_id),
leader_peer: Some(peer.clone()),
follower_peers: vec![follower_peer.clone()],
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
leader_down_since: Some(1),
},
RegionRoute {
@@ -352,7 +352,7 @@ mod test {
assert_region_lease(
acc,
vec![
GrantedRegion::new(region_id, RegionRole::Follower),
GrantedRegion::new(region_id, RegionRole::DowngradingLeader),
GrantedRegion::new(another_region_id, RegionRole::Leader),
],
);


@@ -40,7 +40,6 @@ pub mod selector;
pub mod service;
pub mod state;
pub mod table_meta_alloc;
pub use crate::error::Result;
mod greptimedb_telemetry;


@@ -29,6 +29,9 @@ use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::ddl::ProcedureExecutorRef;
use common_meta::key::TableMetadataManagerRef;
use common_meta::kv_backend::{KvBackendRef, ResettableKvBackend, ResettableKvBackendRef};
use common_meta::leadership_notifier::{
LeadershipChangeNotifier, LeadershipChangeNotifierCustomizerRef,
};
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeperRef;
use common_meta::wal_options_allocator::WalOptionsAllocatorRef;
@@ -52,10 +55,11 @@ use crate::error::{
StopProcedureManagerSnafu,
};
use crate::failure_detector::PhiAccrualFailureDetectorOptions;
use crate::handler::HeartbeatHandlerGroup;
use crate::handler::HeartbeatHandlerGroupRef;
use crate::lease::lookup_datanode_peer;
use crate::lock::DistLockRef;
use crate::procedure::region_migration::manager::RegionMigrationManagerRef;
use crate::procedure::ProcedureManagerListenerAdapter;
use crate::pubsub::{PublisherRef, SubscriptionManagerRef};
use crate::region::supervisor::RegionSupervisorTickerRef;
use crate::selector::{Selector, SelectorType};
@@ -291,17 +295,15 @@ pub type SelectorRef = Arc<dyn Selector<Context = SelectorContext, Output = Vec<
pub type ElectionRef = Arc<dyn Election<Leader = LeaderValue>>;
pub struct MetaStateHandler {
procedure_manager: ProcedureManagerRef,
wal_options_allocator: WalOptionsAllocatorRef,
subscribe_manager: Option<SubscriptionManagerRef>,
greptimedb_telemetry_task: Arc<GreptimeDBTelemetryTask>,
leader_cached_kv_backend: Arc<LeaderCachedKvBackend>,
region_supervisor_ticker: Option<RegionSupervisorTickerRef>,
leadership_change_notifier: LeadershipChangeNotifier,
state: StateRef,
}
impl MetaStateHandler {
pub async fn on_become_leader(&self) {
pub async fn on_leader_start(&self) {
self.state.write().unwrap().next_state(become_leader(false));
if let Err(e) = self.leader_cached_kv_backend.load().await {
@@ -310,33 +312,19 @@ impl MetaStateHandler {
self.state.write().unwrap().next_state(become_leader(true));
}
if let Some(ticker) = self.region_supervisor_ticker.as_ref() {
ticker.start();
}
if let Err(e) = self.procedure_manager.start().await {
error!(e; "Failed to start procedure manager");
}
if let Err(e) = self.wal_options_allocator.start().await {
error!(e; "Failed to start wal options allocator");
}
self.leadership_change_notifier
.notify_on_leader_start()
.await;
self.greptimedb_telemetry_task.should_report(true);
}
pub async fn on_become_follower(&self) {
pub async fn on_leader_stop(&self) {
self.state.write().unwrap().next_state(become_follower());
// Stops the procedures.
if let Err(e) = self.procedure_manager.stop().await {
error!(e; "Failed to stop procedure manager");
}
if let Some(ticker) = self.region_supervisor_ticker.as_ref() {
// Stops the supervisor ticker.
ticker.stop();
}
self.leadership_change_notifier
.notify_on_leader_stop()
.await;
// Suspends reporting.
self.greptimedb_telemetry_task.should_report(false);
@@ -366,7 +354,7 @@ pub struct Metasrv {
selector: SelectorRef,
// The flow selector is used to select a target flownode.
flow_selector: SelectorRef,
handler_group: HeartbeatHandlerGroup,
handler_group: HeartbeatHandlerGroupRef,
election: Option<ElectionRef>,
lock: DistLockRef,
procedure_manager: ProcedureManagerRef,
@@ -410,15 +398,25 @@ impl Metasrv {
greptimedb_telemetry_task
.start()
.context(StartTelemetryTaskSnafu)?;
let region_supervisor_ticker = self.region_supervisor_ticker.clone();
// Builds leadership change notifier.
let mut leadership_change_notifier = LeadershipChangeNotifier::default();
leadership_change_notifier.add_listener(self.wal_options_allocator.clone());
leadership_change_notifier
.add_listener(Arc::new(ProcedureManagerListenerAdapter(procedure_manager)));
if let Some(region_supervisor_ticker) = &self.region_supervisor_ticker {
leadership_change_notifier.add_listener(region_supervisor_ticker.clone() as _);
}
if let Some(customizer) = self.plugins.get::<LeadershipChangeNotifierCustomizerRef>() {
customizer.customize(&mut leadership_change_notifier);
}
let state_handler = MetaStateHandler {
greptimedb_telemetry_task,
subscribe_manager,
procedure_manager,
wal_options_allocator: self.wal_options_allocator.clone(),
state: self.state.clone(),
leader_cached_kv_backend: leader_cached_kv_backend.clone(),
region_supervisor_ticker,
leadership_change_notifier,
};
let _handle = common_runtime::spawn_global(async move {
loop {
@@ -429,12 +427,12 @@ impl Metasrv {
info!("Leader's cache has bean cleared on leader change: {msg}");
match msg {
LeaderChangeMessage::Elected(_) => {
state_handler.on_become_leader().await;
state_handler.on_leader_start().await;
}
LeaderChangeMessage::StepDown(leader) => {
error!("Leader :{:?} step down", leader);
state_handler.on_become_follower().await;
state_handler.on_leader_stop().await;
}
}
}
@@ -448,7 +446,7 @@ impl Metasrv {
}
}
state_handler.on_become_follower().await;
state_handler.on_leader_stop().await;
});
// Register candidate and keep lease in background.
@@ -562,7 +560,7 @@ impl Metasrv {
&self.flow_selector
}
pub fn handler_group(&self) -> &HeartbeatHandlerGroup {
pub fn handler_group(&self) -> &HeartbeatHandlerGroupRef {
&self.handler_group
}
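The wiring above replaces MetaStateHandler's per-service start/stop calls with a single notifier that fans out to registered listeners when the leader is elected or steps down. A minimal sketch of that fan-out, assuming a listener trait shaped like the one this diff introduces (synchronous here, with errors reduced to strings):

use std::sync::Arc;

trait LeadershipChangeListener: Send + Sync {
    fn name(&self) -> &str;
    fn on_leader_start(&self) -> Result<(), String>;
    fn on_leader_stop(&self) -> Result<(), String>;
}

#[derive(Default)]
struct LeadershipChangeNotifier {
    listeners: Vec<Arc<dyn LeadershipChangeListener>>,
}

impl LeadershipChangeNotifier {
    fn add_listener(&mut self, listener: Arc<dyn LeadershipChangeListener>) {
        self.listeners.push(listener);
    }

    // In this sketch a failing listener is only logged, so the remaining
    // listeners are still notified.
    fn notify_on_leader_start(&self) {
        for listener in &self.listeners {
            if let Err(err) = listener.on_leader_start() {
                eprintln!("Listener {} failed on leader start: {err}", listener.name());
            }
        }
    }

    fn notify_on_leader_stop(&self) {
        for listener in &self.listeners {
            if let Err(err) = listener.on_leader_stop() {
                eprintln!("Listener {} failed on leader stop: {err}", listener.name());
            }
        }
    }
}

struct Noop;

impl LeadershipChangeListener for Noop {
    fn name(&self) -> &str {
        "Noop"
    }
    fn on_leader_start(&self) -> Result<(), String> {
        Ok(())
    }
    fn on_leader_stop(&self) -> Result<(), String> {
        Ok(())
    }
}

fn main() {
    let mut notifier = LeadershipChangeNotifier::default();
    notifier.add_listener(Arc::new(Noop));
    notifier.notify_on_leader_start();
    notifier.notify_on_leader_stop();
}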


@@ -46,22 +46,11 @@ use crate::cluster::{MetaPeerClientBuilder, MetaPeerClientRef};
use crate::error::{self, Result};
use crate::flow_meta_alloc::FlowPeerAllocator;
use crate::greptimedb_telemetry::get_greptimedb_telemetry_task;
use crate::handler::check_leader_handler::CheckLeaderHandler;
use crate::handler::collect_cluster_info_handler::{
CollectDatanodeClusterInfoHandler, CollectFlownodeClusterInfoHandler,
CollectFrontendClusterInfoHandler,
};
use crate::handler::collect_stats_handler::CollectStatsHandler;
use crate::handler::extract_stat_handler::ExtractStatHandler;
use crate::handler::failure_handler::RegionFailureHandler;
use crate::handler::filter_inactive_region_stats::FilterInactiveRegionStatsHandler;
use crate::handler::keep_lease_handler::{DatanodeKeepLeaseHandler, FlownodeKeepLeaseHandler};
use crate::handler::mailbox_handler::MailboxHandler;
use crate::handler::on_leader_start_handler::OnLeaderStartHandler;
use crate::handler::publish_heartbeat_handler::PublishHeartbeatHandler;
use crate::handler::region_lease_handler::RegionLeaseHandler;
use crate::handler::response_header_handler::ResponseHeaderHandler;
use crate::handler::{HeartbeatHandlerGroup, HeartbeatMailbox, Pushers};
use crate::handler::{
HeartbeatHandlerGroup, HeartbeatHandlerGroupBuilder, HeartbeatMailbox, Pushers,
};
use crate::lease::MetaPeerLookupService;
use crate::lock::memory::MemLock;
use crate::lock::DistLockRef;
@@ -70,7 +59,6 @@ use crate::metasrv::{
};
use crate::procedure::region_migration::manager::RegionMigrationManager;
use crate::procedure::region_migration::DefaultContextFactory;
use crate::pubsub::PublisherRef;
use crate::region::supervisor::{
HeartbeatAcceptor, RegionFailureDetectorControl, RegionSupervisor, RegionSupervisorTicker,
DEFAULT_TICK_INTERVAL,
@@ -364,41 +352,18 @@ impl MetasrvBuilder {
let handler_group = match handler_group {
Some(handler_group) => handler_group,
None => {
let publish_heartbeat_handler = plugins
.clone()
.and_then(|plugins| plugins.get::<PublisherRef>())
.map(|publish| PublishHeartbeatHandler::new(publish.clone()));
let region_lease_handler = RegionLeaseHandler::new(
distributed_time_constants::REGION_LEASE_SECS,
table_metadata_manager.clone(),
memory_region_keeper.clone(),
);
let group = HeartbeatHandlerGroup::new(pushers);
group.add_handler(ResponseHeaderHandler).await;
// `KeepLeaseHandler` should preferably be in front of `CheckLeaderHandler`,
// because even if the current meta-server node is no longer the leader it can
// still help the datanode to keep lease.
group.add_handler(DatanodeKeepLeaseHandler).await;
group.add_handler(FlownodeKeepLeaseHandler).await;
group.add_handler(CheckLeaderHandler).await;
group.add_handler(OnLeaderStartHandler).await;
group.add_handler(ExtractStatHandler).await;
group.add_handler(CollectDatanodeClusterInfoHandler).await;
group.add_handler(CollectFrontendClusterInfoHandler).await;
group.add_handler(CollectFlownodeClusterInfoHandler).await;
group.add_handler(MailboxHandler).await;
group.add_handler(region_lease_handler).await;
group.add_handler(FilterInactiveRegionStatsHandler).await;
if let Some(region_failover_handler) = region_failover_handler {
group.add_handler(region_failover_handler).await;
}
if let Some(publish_heartbeat_handler) = publish_heartbeat_handler {
group.add_handler(publish_heartbeat_handler).await;
}
group.add_handler(CollectStatsHandler::default()).await;
group
HeartbeatHandlerGroupBuilder::new(pushers)
.with_plugins(plugins.clone())
.with_region_failure_handler(region_failover_handler)
.with_region_lease_handler(Some(region_lease_handler))
.add_default_handlers()
.build()?
}
};
@@ -417,7 +382,7 @@ impl MetasrvBuilder {
selector,
// TODO(jeremy): We do not allow configuring the flow selector.
flow_selector: Arc::new(RoundRobinSelector::new(SelectTarget::Flownode)),
handler_group,
handler_group: Arc::new(handler_group),
election,
lock,
procedure_manager,


@@ -12,7 +12,37 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use async_trait::async_trait;
use common_meta::error::{self, Result};
use common_meta::leadership_notifier::LeadershipChangeListener;
use common_procedure::ProcedureManagerRef;
use snafu::ResultExt;
pub mod region_migration;
#[cfg(test)]
mod tests;
pub mod utils;
#[derive(Clone)]
pub struct ProcedureManagerListenerAdapter(pub ProcedureManagerRef);
#[async_trait]
impl LeadershipChangeListener for ProcedureManagerListenerAdapter {
fn name(&self) -> &str {
"ProcedureManager"
}
async fn on_leader_start(&self) -> Result<()> {
self.0
.start()
.await
.context(error::StartProcedureManagerSnafu)
}
async fn on_leader_stop(&self) -> Result<()> {
self.0
.stop()
.await
.context(error::StopProcedureManagerSnafu)
}
}
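ProcedureManagerListenerAdapter is the usual newtype pattern for implementing a foreign trait on a shared service. Stripped to a synchronous, self-contained form with illustrative names:

use std::sync::Arc;

trait Service {
    fn start(&self);
    fn stop(&self);
}

trait LeadershipListener {
    fn on_leader_start(&self);
    fn on_leader_stop(&self);
}

// The newtype wraps the Arc'd service so the listener trait can be
// implemented without touching the service's own crate.
struct ServiceListenerAdapter(Arc<dyn Service + Send + Sync>);

impl LeadershipListener for ServiceListenerAdapter {
    fn on_leader_start(&self) {
        self.0.start();
    }
    fn on_leader_stop(&self) {
        self.0.stop();
    }
}

struct ProcedureManager;

impl Service for ProcedureManager {
    fn start(&self) {
        println!("procedure manager started");
    }
    fn stop(&self) {
        println!("procedure manager stopped");
    }
}

fn main() {
    let adapter = ServiceListenerAdapter(Arc::new(ProcedureManager));
    adapter.on_leader_start();
    adapter.on_leader_stop();
}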


@@ -43,8 +43,10 @@ use common_procedure::error::{
Error as ProcedureError, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu,
};
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, Status, StringKey};
pub use manager::RegionMigrationProcedureTask;
use manager::{RegionMigrationProcedureGuard, RegionMigrationProcedureTracker};
use manager::RegionMigrationProcedureGuard;
pub use manager::{
RegionMigrationManagerRef, RegionMigrationProcedureTask, RegionMigrationProcedureTracker,
};
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use store_api::storage::RegionId;


@@ -22,8 +22,10 @@ use common_meta::instruction::{
};
use common_procedure::Status;
use common_telemetry::{error, info, warn};
use common_wal::options::WalOptions;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use store_api::storage::RegionId;
use tokio::time::{sleep, Instant};
use super::update_metadata::UpdateMetadata;
@@ -95,15 +97,32 @@ impl DowngradeLeaderRegion {
&self,
ctx: &Context,
flush_timeout: Duration,
reject_write: bool,
) -> Instruction {
let pc = &ctx.persistent_ctx;
let region_id = pc.region_id;
Instruction::DowngradeRegion(DowngradeRegion {
region_id,
flush_timeout: Some(flush_timeout),
reject_write,
})
}
async fn should_reject_write(ctx: &mut Context, region_id: RegionId) -> Result<bool> {
let datanode_table_value = ctx.get_from_peer_datanode_table_value().await?;
if let Some(wal_option) = datanode_table_value
.region_info
.region_wal_options
.get(&region_id.region_number())
{
let options: WalOptions = serde_json::from_str(wal_option)
.with_context(|_| error::DeserializeFromJsonSnafu { input: wal_option })?;
return Ok(matches!(options, WalOptions::RaftEngine));
}
Ok(true)
}
/// Tries to downgrade a leader region.
///
/// Retry:
@@ -118,16 +137,17 @@ impl DowngradeLeaderRegion {
/// - [ExceededDeadline](error::Error::ExceededDeadline)
/// - Invalid JSON.
async fn downgrade_region(&self, ctx: &mut Context) -> Result<()> {
let pc = &ctx.persistent_ctx;
let region_id = pc.region_id;
let leader = &pc.from_peer;
let region_id = ctx.persistent_ctx.region_id;
let operation_timeout =
ctx.next_operation_timeout()
.context(error::ExceededDeadlineSnafu {
operation: "Downgrade region",
})?;
let downgrade_instruction = self.build_downgrade_region_instruction(ctx, operation_timeout);
let reject_write = Self::should_reject_write(ctx, region_id).await?;
let downgrade_instruction =
self.build_downgrade_region_instruction(ctx, operation_timeout, reject_write);
let leader = &ctx.persistent_ctx.from_peer;
let msg = MailboxMessage::json_message(
&format!("Downgrade leader region: {}", region_id),
&format!("Meta@{}", ctx.server_addr()),
@@ -240,8 +260,13 @@ impl DowngradeLeaderRegion {
#[cfg(test)]
mod tests {
use std::assert_matches::assert_matches;
use std::collections::HashMap;
use common_meta::key::table_route::TableRouteValue;
use common_meta::key::test_utils::new_test_table_info;
use common_meta::peer::Peer;
use common_meta::rpc::router::{Region, RegionRoute};
use common_wal::options::KafkaWalOptions;
use store_api::storage::RegionId;
use tokio::time::Instant;
@@ -264,19 +289,73 @@ mod tests {
}
}
async fn prepare_table_metadata(ctx: &Context, wal_options: HashMap<u32, String>) {
let table_info =
new_test_table_info(ctx.persistent_ctx.region_id.table_id(), vec![1]).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(ctx.persistent_ctx.region_id),
leader_peer: Some(ctx.persistent_ctx.from_peer.clone()),
follower_peers: vec![ctx.persistent_ctx.to_peer.clone()],
..Default::default()
}];
ctx.table_metadata_manager
.create_table_metadata(
table_info,
TableRouteValue::physical(region_routes),
wal_options,
)
.await
.unwrap();
}
#[tokio::test]
async fn test_datanode_is_unreachable() {
let state = DowngradeLeaderRegion::default();
let persistent_context = new_persistent_context();
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let err = state.downgrade_region(&mut ctx).await.unwrap_err();
assert_matches!(err, Error::PusherNotFound { .. });
assert!(!err.is_retryable());
}
#[tokio::test]
async fn test_should_reject_writes() {
let persistent_context = new_persistent_context();
let region_id = persistent_context.region_id;
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let wal_options =
HashMap::from([(1, serde_json::to_string(&WalOptions::RaftEngine).unwrap())]);
prepare_table_metadata(&ctx, wal_options).await;
let reject_write = DowngradeLeaderRegion::should_reject_write(&mut ctx, region_id)
.await
.unwrap();
assert!(reject_write);
// Remote WAL
let persistent_context = new_persistent_context();
let region_id = persistent_context.region_id;
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let wal_options = HashMap::from([(
1,
serde_json::to_string(&WalOptions::Kafka(KafkaWalOptions {
topic: "my_topic".to_string(),
}))
.unwrap(),
)]);
prepare_table_metadata(&ctx, wal_options).await;
let reject_write = DowngradeLeaderRegion::should_reject_write(&mut ctx, region_id)
.await
.unwrap();
assert!(!reject_write);
}
#[tokio::test]
async fn test_pusher_dropped() {
let state = DowngradeLeaderRegion::default();
@@ -285,6 +364,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let (tx, rx) = tokio::sync::mpsc::channel(1);
@@ -307,6 +387,7 @@ mod tests {
let persistent_context = new_persistent_context();
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
ctx.volatile_ctx.operations_elapsed = ctx.persistent_ctx.timeout + Duration::from_secs(1);
let err = state.downgrade_region(&mut ctx).await.unwrap_err();
@@ -330,6 +411,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let mailbox = mailbox_ctx.mailbox().clone();
@@ -356,6 +438,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let mailbox = mailbox_ctx.mailbox().clone();
@@ -383,6 +466,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let mailbox = mailbox_ctx.mailbox().clone();
@@ -416,6 +500,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let mailbox = mailbox_ctx.mailbox().clone();
@@ -508,6 +593,7 @@ mod tests {
let mut env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
prepare_table_metadata(&ctx, HashMap::default()).await;
let mailbox_ctx = env.mailbox_context();
let mailbox = mailbox_ctx.mailbox().clone();
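The new should_reject_write step, exercised by test_should_reject_writes above, reduces to a match on the region's WAL options: a region on the local raft-engine WAL rejects writes while its leader is downgrading, a region on a remote Kafka WAL keeps accepting them, and missing options fall back to rejecting. A condensed sketch, where the enum mirrors common_wal::options::WalOptions in shape only:

enum WalOptions {
    RaftEngine,
    Kafka { topic: String },
}

fn should_reject_write(options: Option<&WalOptions>) -> bool {
    // Absent options are treated like the local WAL: reject writes.
    matches!(options, Some(WalOptions::RaftEngine) | None)
}

fn main() {
    assert!(should_reject_write(Some(&WalOptions::RaftEngine)));
    assert!(should_reject_write(None));
    let kafka = WalOptions::Kafka { topic: "my_topic".to_string() };
    assert!(!should_reject_write(Some(&kafka)));
}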


@@ -44,7 +44,7 @@ pub struct RegionMigrationManager {
}
#[derive(Default, Clone)]
pub(crate) struct RegionMigrationProcedureTracker {
pub struct RegionMigrationProcedureTracker {
running_procedures: Arc<RwLock<HashMap<RegionId, RegionMigrationProcedureTask>>>,
}
@@ -149,7 +149,7 @@ impl RegionMigrationManager {
}
/// Returns the [`RegionMigrationProcedureTracker`].
pub(crate) fn tracker(&self) -> &RegionMigrationProcedureTracker {
pub fn tracker(&self) -> &RegionMigrationProcedureTracker {
&self.tracker
}
@@ -246,7 +246,7 @@ impl RegionMigrationManager {
region_route: &RegionRoute,
task: &RegionMigrationProcedureTask,
) -> Result<bool> {
if region_route.is_leader_downgraded() {
if region_route.is_leader_downgrading() {
return Ok(false);
}


@@ -449,7 +449,7 @@ impl ProcedureMigrationTestSuite {
.find(|route| route.region.id == region_id)
.unwrap();
assert!(!region_route.is_leader_downgraded());
assert!(!region_route.is_leader_downgrading());
assert_eq!(
region_route.leader_peer.as_ref().unwrap().id,
expected_leader_id


@@ -13,7 +13,7 @@
// limitations under the License.
use common_error::ext::BoxedError;
use common_meta::rpc::router::RegionStatus;
use common_meta::rpc::router::LeaderState;
use snafu::ResultExt;
use crate::error::{self, Result};
@@ -53,7 +53,7 @@ impl UpdateMetadata {
.as_ref()
.is_some_and(|leader_peer| leader_peer.id == from_peer_id)
{
Some(Some(RegionStatus::Downgraded))
Some(Some(LeaderState::Downgrading))
} else {
None
}
@@ -81,7 +81,7 @@ mod tests {
use common_meta::key::test_utils::new_test_table_info;
use common_meta::peer::Peer;
use common_meta::rpc::router::{Region, RegionRoute, RegionStatus};
use common_meta::rpc::router::{LeaderState, Region, RegionRoute};
use store_api::storage::RegionId;
use crate::error::Error;
@@ -155,7 +155,7 @@ mod tests {
table_metadata_manager
.update_leader_region_status(table_id, &original_table_route, |route| {
if route.region.id == RegionId::new(1024, 2) {
Some(Some(RegionStatus::Downgraded))
Some(Some(LeaderState::Downgrading))
} else {
None
}
@@ -210,7 +210,7 @@ mod tests {
// It should remain unchanged.
assert_eq!(latest_table_route.version().unwrap(), 0);
assert!(!latest_table_route.region_routes().unwrap()[0].is_leader_downgraded());
assert!(!latest_table_route.region_routes().unwrap()[0].is_leader_downgrading());
assert!(ctx.volatile_ctx.table_route.is_none());
}
@@ -251,7 +251,7 @@ mod tests {
.unwrap()
.unwrap();
assert!(latest_table_route.region_routes().unwrap()[0].is_leader_downgraded());
assert!(latest_table_route.region_routes().unwrap()[0].is_leader_downgrading());
assert!(ctx.volatile_ctx.table_route.is_none());
}
}


@@ -65,7 +65,7 @@ mod tests {
use common_meta::key::test_utils::new_test_table_info;
use common_meta::peer::Peer;
use common_meta::rpc::router::{Region, RegionRoute, RegionStatus};
use common_meta::rpc::router::{LeaderState, Region, RegionRoute};
use store_api::storage::RegionId;
use crate::error::Error;
@@ -110,13 +110,13 @@ mod tests {
RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(from_peer.clone()),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
},
RegionRoute {
region: Region::new_test(RegionId::new(1024, 2)),
leader_peer: Some(Peer::empty(4)),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
},
RegionRoute {
@@ -128,8 +128,8 @@ mod tests {
let expected_region_routes = {
let mut region_routes = region_routes.clone();
region_routes[0].leader_status = None;
region_routes[1].leader_status = None;
region_routes[0].leader_state = None;
region_routes[1].leader_state = None;
region_routes
};
@@ -207,13 +207,13 @@ mod tests {
RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(from_peer.clone()),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
},
RegionRoute {
region: Region::new_test(RegionId::new(1024, 2)),
leader_peer: Some(Peer::empty(4)),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
},
RegionRoute {
@@ -225,7 +225,7 @@ mod tests {
let expected_region_routes = {
let mut region_routes = region_routes.clone();
region_routes[0].leader_status = None;
region_routes[0].leader_state = None;
region_routes
};


@@ -43,7 +43,7 @@ impl UpdateMetadata {
.context(error::RegionRouteNotFoundSnafu { region_id })?;
// Removes downgraded status.
region_route.set_leader_status(None);
region_route.set_leader_state(None);
let candidate = &ctx.persistent_ctx.to_peer;
let expected_old_leader = &ctx.persistent_ctx.from_peer;
@@ -106,7 +106,7 @@ impl UpdateMetadata {
if leader_peer.id == candidate_peer_id {
ensure!(
!region_route.is_leader_downgraded(),
!region_route.is_leader_downgrading(),
error::UnexpectedSnafu {
violated: format!("Unexpected intermediate state is found during the update metadata for upgrading region {region_id}"),
}
@@ -190,7 +190,7 @@ mod tests {
use common_meta::key::test_utils::new_test_table_info;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeper;
use common_meta::rpc::router::{Region, RegionRoute, RegionStatus};
use common_meta::rpc::router::{LeaderState, Region, RegionRoute};
use common_time::util::current_time_millis;
use store_api::storage::RegionId;
@@ -286,7 +286,7 @@ mod tests {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(2), Peer::empty(3)],
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
leader_down_since: Some(current_time_millis()),
}];
@@ -298,7 +298,7 @@ mod tests {
.await
.unwrap();
assert!(!new_region_routes[0].is_leader_downgraded());
assert!(!new_region_routes[0].is_leader_downgrading());
assert!(new_region_routes[0].leader_down_since.is_none());
assert_eq!(new_region_routes[0].follower_peers, vec![Peer::empty(3)]);
assert_eq!(new_region_routes[0].leader_peer.as_ref().unwrap().id, 2);
@@ -319,13 +319,13 @@ mod tests {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
follower_peers: vec![Peer::empty(5), Peer::empty(3)],
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
leader_down_since: Some(current_time_millis()),
},
RegionRoute {
region: Region::new_test(RegionId::new(table_id, 2)),
leader_peer: Some(Peer::empty(4)),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
},
];
@@ -382,7 +382,7 @@ mod tests {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(leader_peer),
follower_peers: vec![Peer::empty(2), Peer::empty(3)],
leader_status: None,
leader_state: None,
leader_down_since: None,
}];
@@ -406,7 +406,7 @@ mod tests {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(candidate_peer),
follower_peers: vec![Peer::empty(2), Peer::empty(3)],
leader_status: None,
leader_state: None,
leader_down_since: None,
}];
@@ -430,7 +430,7 @@ mod tests {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(candidate_peer),
follower_peers: vec![Peer::empty(2), Peer::empty(3)],
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
leader_down_since: None,
}];
@@ -455,7 +455,7 @@ mod tests {
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),
leader_status: Some(RegionStatus::Downgraded),
leader_state: Some(LeaderState::Downgrading),
..Default::default()
}];
@@ -485,7 +485,7 @@ mod tests {
assert!(ctx.volatile_ctx.table_route.is_none());
assert!(ctx.volatile_ctx.opening_region_guard.is_none());
assert_eq!(region_routes.len(), 1);
assert!(!region_routes[0].is_leader_downgraded());
assert!(!region_routes[0].is_leader_downgrading());
assert!(region_routes[0].follower_peers.is_empty());
assert_eq!(region_routes[0].leader_peer.as_ref().unwrap().id, 2);
}


@@ -62,8 +62,8 @@ fn renew_region_lease_via_region_route(
// If it's a leader region on this datanode.
if let Some(leader) = &region_route.leader_peer {
if leader.id == datanode_id {
let region_role = if region_route.is_leader_downgraded() {
RegionRole::Follower
let region_role = if region_route.is_leader_downgrading() {
RegionRole::DowngradingLeader
} else {
RegionRole::Leader
};
@@ -220,7 +220,7 @@ mod tests {
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeper;
use common_meta::rpc::router::{Region, RegionRouteBuilder, RegionStatus};
use common_meta::rpc::router::{LeaderState, Region, RegionRouteBuilder};
use store_api::region_engine::RegionRole;
use store_api::storage::RegionId;
use table::metadata::RawTableInfo;
@@ -265,11 +265,11 @@ mod tests {
Some((region_id, RegionRole::Follower))
);
region_route.leader_status = Some(RegionStatus::Downgraded);
region_route.leader_state = Some(LeaderState::Downgrading);
// The downgraded leader region on the datanode.
assert_eq!(
renew_region_lease_via_region_route(&region_route, leader_peer_id, region_id),
Some((region_id, RegionRole::Follower))
Some((region_id, RegionRole::DowngradingLeader))
);
}
@@ -492,7 +492,7 @@ mod tests {
.region(Region::new_test(region_id))
.leader_peer(Peer::empty(leader_peer_id))
.follower_peers(vec![Peer::empty(follower_peer_id)])
.leader_status(RegionStatus::Downgraded)
.leader_state(LeaderState::Downgrading)
.build()
.unwrap();
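The net effect of the region-lease change above: a leader whose route is marked Downgrading now renews its lease as DowngradingLeader instead of being handed Follower outright. A sketch of the mapping, with simplified stand-ins for the store_api and router enums:

#[derive(Debug, PartialEq)]
enum RegionRole {
    Leader,
    Follower,
    DowngradingLeader,
}

enum LeaderState {
    Downgrading,
}

// Role granted when renewing the lease of a leader region on this datanode.
fn renewed_leader_role(leader_state: Option<&LeaderState>) -> RegionRole {
    match leader_state {
        Some(LeaderState::Downgrading) => RegionRole::DowngradingLeader,
        None => RegionRole::Leader,
    }
}

fn main() {
    assert_eq!(
        renewed_leader_role(Some(&LeaderState::Downgrading)),
        RegionRole::DowngradingLeader
    );
    assert_eq!(renewed_leader_role(None), RegionRole::Leader);
}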


@@ -16,10 +16,12 @@ use std::fmt::Debug;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use async_trait::async_trait;
use common_meta::datanode::Stat;
use common_meta::ddl::{DetectingRegion, RegionFailureDetectorController};
use common_meta::key::MAINTENANCE_KEY;
use common_meta::kv_backend::KvBackendRef;
use common_meta::leadership_notifier::LeadershipChangeListener;
use common_meta::peer::PeerLookupServiceRef;
use common_meta::{ClusterId, DatanodeId};
use common_runtime::JoinHandle;
@@ -129,6 +131,23 @@ pub struct RegionSupervisorTicker {
sender: Sender<Event>,
}
#[async_trait]
impl LeadershipChangeListener for RegionSupervisorTicker {
fn name(&self) -> &'static str {
"RegionSupervisorTicker"
}
async fn on_leader_start(&self) -> common_meta::error::Result<()> {
self.start();
Ok(())
}
async fn on_leader_stop(&self) -> common_meta::error::Result<()> {
self.stop();
Ok(())
}
}
impl RegionSupervisorTicker {
pub(crate) fn new(tick_interval: Duration, sender: Sender<Event>) -> Self {
Self {
@@ -223,7 +242,7 @@ impl RegionFailureDetectorController for RegionFailureDetectorControl {
.send(Event::RegisterFailureDetectors(detecting_regions))
.await
{
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
@@ -233,7 +252,7 @@ impl RegionFailureDetectorController for RegionFailureDetectorControl {
.send(Event::DeregisterFailureDetectors(detecting_regions))
.await
{
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
}
@@ -251,13 +270,13 @@ impl HeartbeatAcceptor {
/// Accepts heartbeats from datanodes.
pub(crate) async fn accept(&self, heartbeat: DatanodeHeartbeat) {
if let Err(err) = self.sender.send(Event::HeartbeatArrived(heartbeat)).await {
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
}
impl RegionSupervisor {
/// Returns a a mpsc channel with a buffer capacity of 1024 for sending and receiving `Event` messages.
/// Returns a mpsc channel with a buffer capacity of 1024 for sending and receiving `Event` messages.
pub(crate) fn channel() -> (Sender<Event>, Receiver<Event>) {
tokio::sync::mpsc::channel(1024)
}


@@ -113,7 +113,7 @@ impl heartbeat_server::Heartbeat for Metasrv {
);
if let Some(key) = pusher_key {
let _ = handler_group.deregister(&key).await;
let _ = handler_group.deregister_push(&key).await;
}
});
@@ -177,7 +177,7 @@ async fn register_pusher(
let node_id = get_node_id(header);
let key = format!("{}-{}", role, node_id);
let pusher = Pusher::new(sender, header);
handler_group.register(&key, pusher).await;
handler_group.register_pusher(&key, pusher).await;
Some(key)
}


@@ -36,7 +36,7 @@ pub(crate) fn new_region_route(region_id: u64, peers: &[Peer], leader_node: u64)
region,
leader_peer,
follower_peers: vec![],
leader_status: None,
leader_state: None,
leader_down_since: None,
}
}


@@ -37,7 +37,8 @@ use snafu::ResultExt;
use store_api::metadata::RegionMetadataRef;
use store_api::metric_engine_consts::METRIC_ENGINE_NAME;
use store_api::region_engine::{
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetReadonlyResponse,
RegionEngine, RegionRole, RegionScannerRef, RegionStatistic, SetRegionRoleStateResponse,
SettableRegionRoleState,
};
use store_api::region_request::RegionRequest;
use store_api::storage::{RegionId, ScanRequest};
@@ -201,14 +202,14 @@ impl RegionEngine for MetricEngine {
Ok(())
}
fn set_writable(&self, region_id: RegionId, writable: bool) -> Result<(), BoxedError> {
fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<(), BoxedError> {
// ignore the region not found error
for x in [
utils::to_metadata_region_id(region_id),
utils::to_data_region_id(region_id),
region_id,
] {
if let Err(e) = self.inner.mito.set_writable(x, writable)
if let Err(e) = self.inner.mito.set_region_role(x, role)
&& e.status_code() != StatusCode::RegionNotFound
{
return Err(e);
@@ -217,11 +218,15 @@ impl RegionEngine for MetricEngine {
Ok(())
}
async fn set_readonly_gracefully(
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
) -> std::result::Result<SetReadonlyResponse, BoxedError> {
self.inner.mito.set_readonly_gracefully(region_id).await
region_role_state: SettableRegionRoleState,
) -> std::result::Result<SetRegionRoleStateResponse, BoxedError> {
self.inner
.mito
.set_region_role_state_gracefully(region_id, region_role_state)
.await
}
/// Returns the physical region role.


@@ -64,15 +64,19 @@ impl MetricEngineInner {
/// Return the physical region id behind this logical region
async fn alter_logical_region(
&self,
region_id: RegionId,
logical_region_id: RegionId,
request: RegionAlterRequest,
) -> Result<RegionId> {
let physical_region_id = {
let state = &self.state.read().unwrap();
state.get_physical_region_id(region_id).with_context(|| {
error!("Trying to alter an nonexistent region {region_id}");
LogicalRegionNotFoundSnafu { region_id }
})?
state
.get_physical_region_id(logical_region_id)
.with_context(|| {
error!("Trying to alter an nonexistent region {logical_region_id}");
LogicalRegionNotFoundSnafu {
region_id: logical_region_id,
}
})?
};
// only handle adding column
@@ -87,7 +91,7 @@ impl MetricEngineInner {
.metadata_region
.column_semantic_type(
metadata_region_id,
region_id,
logical_region_id,
&col.column_metadata.column_schema.name,
)
.await?
@@ -102,7 +106,7 @@ impl MetricEngineInner {
self.add_columns_to_physical_data_region(
data_region_id,
metadata_region_id,
region_id,
logical_region_id,
columns_to_add,
)
.await?;
@@ -110,10 +114,16 @@ impl MetricEngineInner {
// register columns to logical region
for col in columns {
self.metadata_region
.add_column(metadata_region_id, region_id, &col.column_metadata)
.add_column(metadata_region_id, logical_region_id, &col.column_metadata)
.await?;
}
// Invalidate the logical column cache.
self.state
.write()
.unwrap()
.invalid_logical_column_cache(logical_region_id);
Ok(physical_region_id)
}


@@ -169,11 +169,11 @@ impl MetricEngineInner {
) -> Result<Vec<usize>> {
// project on logical columns
let all_logical_columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.load_logical_column_names(physical_region_id, logical_region_id)
.await?;
let projected_logical_names = origin_projection
.iter()
.map(|i| all_logical_columns[*i].column_schema.name.clone())
.map(|i| all_logical_columns[*i].clone())
.collect::<Vec<_>>();
// generate physical projection
@@ -200,10 +200,8 @@ impl MetricEngineInner {
logical_region_id: RegionId,
) -> Result<Vec<usize>> {
let logical_columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.await?
.into_iter()
.map(|col| col.column_schema.name);
.load_logical_column_names(physical_region_id, logical_region_id)
.await?;
let mut projection = Vec::with_capacity(logical_columns.len());
let data_region_id = utils::to_data_region_id(physical_region_id);
let physical_metadata = self


@@ -23,13 +23,25 @@ use crate::error::Result;
impl MetricEngineInner {
/// Load column metadata of a logical region.
///
/// The return value is ordered on [ColumnId].
/// The return value is ordered on column name.
pub async fn load_logical_columns(
&self,
physical_region_id: RegionId,
logical_region_id: RegionId,
) -> Result<Vec<ColumnMetadata>> {
// load logical and physical columns, and intersect them to get logical column metadata
// First try to load from state cache
if let Some(columns) = self
.state
.read()
.unwrap()
.logical_columns()
.get(&logical_region_id)
{
return Ok(columns.clone());
}
// Else load from metadata region and update the cache.
// Load logical and physical columns, and intersect them to get logical column metadata.
let mut logical_column_metadata = self
.metadata_region
.logical_columns(physical_region_id, logical_region_id)
@@ -37,11 +49,48 @@ impl MetricEngineInner {
.into_iter()
.map(|(_, column_metadata)| column_metadata)
.collect::<Vec<_>>();
// sort columns on column id to ensure the order
// Sort columns on column name to ensure the order
logical_column_metadata
.sort_unstable_by(|c1, c2| c1.column_schema.name.cmp(&c2.column_schema.name));
// Update cache
self.state
.write()
.unwrap()
.add_logical_columns(logical_region_id, logical_column_metadata.clone());
Ok(logical_column_metadata)
}
/// Load logical column names of a logical region.
///
/// The return value is ordered on column name alphabetically.
pub async fn load_logical_column_names(
&self,
physical_region_id: RegionId,
logical_region_id: RegionId,
) -> Result<Vec<String>> {
// First try to load from state cache
if let Some(columns) = self
.state
.read()
.unwrap()
.logical_columns()
.get(&logical_region_id)
{
return Ok(columns
.iter()
.map(|c| c.column_schema.name.clone())
.collect());
}
// Else load from metadata region
let columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.await?
.into_iter()
.map(|c| c.column_schema.name)
.collect::<Vec<_>>();
Ok(columns)
}
}
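The caching added above is a plain read-through pattern: try the cache under a read lock, otherwise load from the metadata region, keep the alphabetical-order invariant, and fill the cache under a write lock. Reduced to std types with illustrative names:

use std::collections::HashMap;
use std::sync::RwLock;

struct ColumnCache {
    logical_columns: RwLock<HashMap<u64, Vec<String>>>,
}

impl ColumnCache {
    fn load(&self, region_id: u64, fetch: impl FnOnce() -> Vec<String>) -> Vec<String> {
        // Fast path: serve from the cache under a read lock.
        if let Some(columns) = self.logical_columns.read().unwrap().get(&region_id) {
            return columns.clone();
        }
        // Slow path: load, sort by name to keep the ordering invariant,
        // then fill the cache under a write lock.
        let mut columns = fetch();
        columns.sort_unstable();
        self.logical_columns
            .write()
            .unwrap()
            .insert(region_id, columns.clone());
        columns
    }

    // Counterpart of invalid_logical_column_cache: drop the entry so the
    // next read reloads from the metadata region.
    fn invalidate(&self, region_id: u64) {
        self.logical_columns.write().unwrap().remove(&region_id);
    }
}

fn main() {
    let cache = ColumnCache { logical_columns: RwLock::new(HashMap::new()) };
    let columns = cache.load(1, || vec!["greeting".to_string(), "env".to_string()]);
    assert_eq!(columns, vec!["env".to_string(), "greeting".to_string()]);
    cache.invalidate(1);
}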


@@ -17,6 +17,7 @@
use std::collections::{HashMap, HashSet};
use snafu::OptionExt;
use store_api::metadata::ColumnMetadata;
use store_api::storage::RegionId;
use crate::error::{PhysicalRegionNotFoundSnafu, Result};
@@ -35,6 +36,10 @@ pub(crate) struct MetricEngineState {
/// Cache for the columns of physical regions.
/// The region id in key is the data region id.
physical_columns: HashMap<RegionId, HashSet<String>>,
/// Cache for the column metadata of logical regions.
/// The column order is the same as the order in the metadata, which is
/// alphabetically ordered on column name.
logical_columns: HashMap<RegionId, Vec<ColumnMetadata>>,
}
impl MetricEngineState {
@@ -80,6 +85,21 @@ impl MetricEngineState {
.insert(logical_region_id, physical_region_id);
}
/// Add and reorder logical columns.
///
/// Caller should make sure:
/// 1. there are no duplicate columns
/// 2. the column order is the same as the order in the metadata, which is
/// alphabetically ordered on column name.
pub fn add_logical_columns(
&mut self,
logical_region_id: RegionId,
new_columns: impl IntoIterator<Item = ColumnMetadata>,
) {
let columns = self.logical_columns.entry(logical_region_id).or_default();
columns.extend(new_columns);
}
pub fn get_physical_region_id(&self, logical_region_id: RegionId) -> Option<RegionId> {
self.logical_regions.get(&logical_region_id).copied()
}
@@ -88,6 +108,10 @@ impl MetricEngineState {
&self.physical_columns
}
pub fn logical_columns(&self) -> &HashMap<RegionId, Vec<ColumnMetadata>> {
&self.logical_columns
}
pub fn physical_regions(&self) -> &HashMap<RegionId, HashSet<RegionId>> {
&self.physical_regions
}
@@ -129,9 +153,15 @@ impl MetricEngineState {
.unwrap() // Safety: physical_region_id is got from physical_regions
.remove(&logical_region_id);
self.logical_columns.remove(&logical_region_id);
Ok(())
}
pub fn invalid_logical_column_cache(&mut self, logical_region_id: RegionId) {
self.logical_columns.remove(&logical_region_id);
}
pub fn is_logical_region_exist(&self, logical_region_id: RegionId) -> bool {
self.logical_regions().contains_key(&logical_region_id)
}
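add_logical_columns extends the cached vector without deduplicating or re-sorting, so the documented invariants rest entirely on callers. A small, hypothetical debug check of the merged list (strictly increasing names imply both alphabetical order and uniqueness):

fn upholds_invariants(merged_columns: &[String]) -> bool {
    merged_columns.windows(2).all(|pair| pair[0] < pair[1])
}

fn main() {
    let ok = vec!["env".to_string(), "greeting".to_string(), "host".to_string()];
    assert!(upholds_invariants(&ok));
    // A duplicate name breaks the invariant.
    let dup = vec!["env".to_string(), "env".to_string()];
    assert!(!upholds_invariants(&dup));
}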


@@ -39,8 +39,7 @@ use crate::read::Source;
use crate::region::opener::new_manifest_dir;
use crate::region::options::RegionOptions;
use crate::region::version::{VersionBuilder, VersionRef};
use crate::region::ManifestContext;
use crate::region::RegionState::Writable;
use crate::region::{ManifestContext, RegionLeaderState, RegionRoleState};
use crate::schedule::scheduler::LocalScheduler;
use crate::sst::file::{FileMeta, IndexType};
use crate::sst::file_purger::LocalFilePurger;
@@ -129,7 +128,10 @@ pub async fn open_compaction_region(
let manifest = manifest_manager.manifest();
let region_metadata = manifest.metadata.clone();
let manifest_ctx = Arc::new(ManifestContext::new(manifest_manager, Writable));
let manifest_ctx = Arc::new(ManifestContext::new(
manifest_manager,
RegionRoleState::Leader(RegionLeaderState::Writable),
));
let file_purger = {
let purge_scheduler = Arc::new(LocalScheduler::new(mito_config.max_background_jobs));
@@ -379,7 +381,7 @@ impl Compactor for DefaultCompactor {
// TODO: We might leak files if we fail to update manifest. We can add a cleanup task to remove them later.
compaction_region
.manifest_ctx
.update_manifest(Writable, action_list)
.update_manifest(RegionLeaderState::Writable, action_list)
.await?;
Ok(edit)


@@ -53,7 +53,7 @@ mod prune_test;
#[cfg(test)]
mod row_selector_test;
#[cfg(test)]
mod set_readonly_test;
mod set_role_state_test;
#[cfg(test)]
mod truncate_test;
@@ -77,7 +77,7 @@ use store_api::logstore::LogStore;
use store_api::metadata::RegionMetadataRef;
use store_api::region_engine::{
BatchResponses, RegionEngine, RegionRole, RegionScannerRef, RegionStatistic,
SetReadonlyResponse,
SetRegionRoleStateResponse, SettableRegionRoleState,
};
use store_api::region_request::{AffectedRows, RegionOpenRequest, RegionRequest};
use store_api::storage::{RegionId, ScanRequest};
@@ -436,22 +436,27 @@ impl EngineInner {
Ok(scan_region)
}
/// Set writable mode for a region.
fn set_writable(&self, region_id: RegionId, writable: bool) -> Result<()> {
/// Converts the [`RegionRole`].
fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<()> {
let region = self
.workers
.get_region(region_id)
.context(RegionNotFoundSnafu { region_id })?;
region.set_writable(writable);
region.set_role(role);
Ok(())
}
/// Sets read-only for a region and ensures no more writes in the region after it returns.
async fn set_readonly_gracefully(&self, region_id: RegionId) -> Result<SetReadonlyResponse> {
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
region_role_state: SettableRegionRoleState,
) -> Result<SetRegionRoleStateResponse> {
// Note: it acquires mutable ownership of the region to ensure no other
// thread can write to it; therefore, we submit the request to the worker.
let (request, receiver) = WorkerRequest::new_set_readonly_gracefully(region_id);
let (request, receiver) =
WorkerRequest::new_set_readonly_gracefully(region_id, region_role_state);
self.workers.submit_to_worker(region_id, request).await?;
receiver.await.context(RecvSnafu)
@@ -459,7 +464,7 @@ impl EngineInner {
fn role(&self, region_id: RegionId) -> Option<RegionRole> {
self.workers.get_region(region_id).map(|region| {
if region.is_readonly() {
if region.is_follower() {
RegionRole::Follower
} else {
RegionRole::Leader
@@ -547,22 +552,23 @@ impl RegionEngine for MitoEngine {
self.get_region_statistic(region_id)
}
fn set_writable(&self, region_id: RegionId, writable: bool) -> Result<(), BoxedError> {
fn set_region_role(&self, region_id: RegionId, role: RegionRole) -> Result<(), BoxedError> {
self.inner
.set_writable(region_id, writable)
.set_region_role(region_id, role)
.map_err(BoxedError::new)
}
async fn set_readonly_gracefully(
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,
) -> Result<SetReadonlyResponse, BoxedError> {
region_role_state: SettableRegionRoleState,
) -> Result<SetRegionRoleStateResponse, BoxedError> {
let _timer = HANDLE_REQUEST_ELAPSED
.with_label_values(&["set_readonly_gracefully"])
.with_label_values(&["set_region_role_state_gracefully"])
.start_timer();
self.inner
.set_readonly_gracefully(region_id)
.set_region_role_state_gracefully(region_id, region_role_state)
.await
.map_err(BoxedError::new)
}
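The engine surface above swaps the boolean writable flag for role-based APIs. Read that way, the retired set_writable maps onto set_region_role roughly as below, using stand-in types rather than the actual store_api definitions:

use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq)]
enum RegionRole {
    Leader,
    Follower,
    DowngradingLeader,
}

struct Region {
    role: Mutex<RegionRole>,
}

impl Region {
    fn set_role(&self, role: RegionRole) {
        *self.role.lock().unwrap() = role;
    }

    fn is_follower(&self) -> bool {
        *self.role.lock().unwrap() == RegionRole::Follower
    }
}

// Rough equivalent of the old boolean API expressed in role terms.
fn set_writable(region: &Region, writable: bool) {
    region.set_role(if writable {
        RegionRole::Leader
    } else {
        RegionRole::Follower
    });
}

fn main() {
    let region = Region { role: Mutex::new(RegionRole::Leader) };
    set_writable(&region, false);
    assert!(region.is_follower());
}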

Some files were not shown because too many files have changed in this diff Show More