Compare commits

25 Commits

Author SHA1 Message Date
Alan Tang
a32326c887 chore: check for redundant pre-commit hooks (#7506)
Signed-off-by: StandingMan <jmtangcs@gmail.com>
2026-01-07 13:46:42 +00:00
Ruihang Xia
fce1687fa7 fix: incorrect timestamp index inference (#7530)
* add sqlness case, but can't reproduce

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* reproduction

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix wildcard rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sort result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-07 11:18:25 +00:00
Yingwen
ef6dd5b99f fix: precise filter time index if not in projection (#7531)
* fix: precise filter time index if not in projection

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add sqlness test

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-07 11:15:34 +00:00
discord9
ac6d68aa2d fix: simp expr recursively (#7523)
* fix: simp expr recursively

Signed-off-by: discord9 <discord9@163.com>

* test: some simple constant folding case

Signed-off-by: discord9 <discord9@163.com>

* fix: literal ts cast to UTC

Signed-off-by: discord9 <discord9@163.com>

* fix: patch merge scan batch col tz instead

Signed-off-by: discord9 <discord9@163.com>

* test: fix

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-07 09:22:26 +00:00
Ruihang Xia
d39895a970 feat: tune query traces (#7524)
* feat: add partition and region id

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: instrument mito

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* connect region scan span

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* instrument streams

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tweak

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-07 08:11:09 +00:00
jeremyhi
59867cd5b6 fix: remove log_env_flags (#7529)
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2026-01-07 08:08:35 +00:00
Ruihang Xia
9a4b7cbb32 feat: bump promql-parser to v0.7.1 (#7521)
* feat: bump promql-parser to v0.7.0

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update other sqlness results

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update tests/cases/standalone/common/tql/case_sensitive.result

Co-authored-by: Ning Sun <sunng@protonmail.com>

* remove escape on greptimedb side

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update to v0.7.1

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused deps

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
2026-01-07 07:23:40 +00:00
Weny Xu
2f242927a8 feat(repartition): implement region deallocation for repartition procedure (#7522)
* feat: implement deallocate regions for repartition procedure

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(metric-engine): add force flag to drop physical regions with associated logical regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: update table metadata after deallocating regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-07 06:13:48 +00:00
Weny Xu
77310ec5bd refactor: refactor CreateTableProcedure to extract reusable components (#7526)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-07 01:58:53 +00:00
Weny Xu
ada4666e10 refactor: remove region_numbers from TableMeta and TableInfo (#7519)
* refactor: remove `region_numbers` from `TableMeta` and `TableInfo`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: create partitions from region route

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix build

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-06 13:21:36 +00:00
jeremyhi
898e84898c feat!: make heartbeat config only in metasrv (#7510)
* feat: make heartbeat config only in metasrv

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* feat: refine config doc

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: make the heartbeat setup simple

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: revert config

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: proto update

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: fix sqlness wrong cfg

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-06 09:43:36 +00:00
discord9
6f86a22e6f feat: adjust some args to gc worker (#7469)
* chore: less stuff sent

Signed-off-by: discord9 <discord9@163.com>

* after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* fix: clarify comment on manifest file removal for GC worker

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-06 07:37:05 +00:00
Ruihang Xia
5162c1de4d feat: repartition grammar candy (#7518)
* feat: repartition grammar candy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* align keyword

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-06 04:44:13 +00:00
LFC
522ca99cd6 feat: ingest jsonbench data through pipeline (#7312)
Signed-off-by: luofucong <luofc@foxmail.com>
2026-01-05 12:12:34 +00:00
Weny Xu
2d756b24c8 feat: implement RemapManifest and ApplyStagingManifest for repartition procedure (#7509)
* feat: add RemapManifest and ApplyStagingManifest heartbeat handler

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: add `RemapManifest` and `ApplyStagingManifest` states for repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-05 08:33:44 +00:00
shuiyisong
527a1c03f3 fix: pipeline loading issue (#7491)
* fix: pipeline loading

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: change string to str

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: minor fix to save returned version

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* refactor: introduce PipelineContent

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: use found schema

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update CR

Co-authored-by: Yingwen <realevenyag@gmail.com>

* chore: CR issue

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2026-01-05 06:49:44 +00:00
discord9
7e243632c7 fix: dist planner rm col req when rm sort (#7512)
* aha!

Signed-off-by: discord9 <discord9@163.com>

* fix: rm col_req in pql sort

Signed-off-by: discord9 <discord9@163.com>

* ut

Signed-off-by: discord9 <discord9@163.com>

* docs

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* more typo

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-05 03:27:11 +00:00
Ruihang Xia
3556eb4476 chore: add tests to comment column on information_schema (#7514)
* feat: show comment on information_schema

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add to information schema for columns, add sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove duplications

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update integration test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-04 09:05:50 +00:00
Weny Xu
9343da7fe8 feat(meta-srv): fallback to non-TLS connection when etcd TLS prefer mode fail (#7507)
* feat(meta-srv): fallback to non-TLS connection when etcd TLS prefer mode fail

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore(ci): set timeout for deploy cluster

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: simplify etcd TLS prefer mode handling

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-31 10:03:34 +00:00
Alan Tang
8a07dbf605 fix: fix sqlness test error about double precision (#7476)
* fix: fix sqlness test error about double precision

Signed-off-by: StandingMan <jmtangcs@gmail.com>

* fix: use round method to truncate the result

Signed-off-by: StandingMan <jmtangcs@gmail.com>

---------

Signed-off-by: StandingMan <jmtangcs@gmail.com>
2025-12-31 04:55:22 +00:00
Weny Xu
83932c8c9e fix: align backend_tls default value with example config (#7496)
* fix: align backend_tls default value with example config

Signed-off-by: WenyXu <wenymedia@gmail.com>

* Update src/common/meta/src/kv_backend/rds/postgres.rs

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2025-12-31 03:31:08 +00:00
LFC
dc9fc582a0 feat: impl json_get_int for new json type (#7495)
Update src/common/function/src/scalars/json/json_get.rs



impl `json_get_int` for new json type

Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-30 09:42:16 +00:00
Weny Xu
b1d81913f5 feat: update ApplyStagingManifestRequest to fetch manifest from central region (#7493)
* feat: update ApplyStagingManifestRequest to fetch manifest from central region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): rename `StagingDataStorage` to `StagingBlobStorage`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-30 07:29:56 +00:00
Yingwen
554f3943b6 ci: update breaking change title level (#7497)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-30 06:17:51 +00:00
dennis zhuang
e4b5ef275f feat: impl vector index building (#7468)
* feat: impl vector index building

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: supports flat format

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* ci: add vector_index feature to test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions from copilot

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-30 03:38:51 +00:00
240 changed files with 8137 additions and 1672 deletions

View File

@@ -70,19 +70,23 @@ runs:
--wait \
--wait-for-jobs
- name: Wait for GreptimeDB
- shell: bash
- run: |
- while true; do
- PHASE=$(kubectl -n my-greptimedb get gtc my-greptimedb -o jsonpath='{.status.clusterPhase}')
- if [ "$PHASE" == "Running" ]; then
- echo "Cluster is ready"
- break
- else
- echo "Cluster is not ready yet: Current phase: $PHASE"
- kubectl get pods -n my-greptimedb
- sleep 5 # wait for 5 seconds before check again.
- fi
- done
+ uses: nick-fields/retry@v3
+ with:
+ timeout_minutes: 3
+ max_attempts: 1
+ shell: bash
+ command: |
+ while true; do
+ PHASE=$(kubectl -n my-greptimedb get gtc my-greptimedb -o jsonpath='{.status.clusterPhase}')
+ if [ "$PHASE" == "Running" ]; then
+ echo "Cluster is ready"
+ break
+ else
+ echo "Cluster is not ready yet: Current phase: $PHASE"
+ kubectl get pods -n my-greptimedb
+ sleep 5 # wait for 5 seconds before check again.
+ fi
+ done
- name: Print GreptimeDB info
if: always()
shell: bash
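
The hunk above bounds a previously unbounded wait loop by running it under a retry wrapper with a timeout. The same idea, polling a condition but giving up at a deadline, can be sketched in Rust. This is only an illustration under stated assumptions: `wait_until_ready` and its parameters are hypothetical names and are not part of this change set.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `is_ready` every `interval` and gives up once `timeout` has elapsed.
/// Returns `true` if the condition was met before the deadline.
/// (Hypothetical helper, shown only to illustrate a bounded wait loop.)
fn wait_until_ready<F>(mut is_ready: F, interval: Duration, timeout: Duration) -> bool
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        if is_ready() {
            return true;
        }
        if Instant::now() >= deadline {
            // Without this check the loop could spin forever, which is
            // the failure mode the timeout is meant to prevent.
            return false;
        }
        sleep(interval);
    }
}
```

A caller would pass a closure that checks the cluster phase, for example `wait_until_ready(|| phase() == "Running", Duration::from_secs(5), Duration::from_secs(180))`, where `phase` is likewise a placeholder.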

View File

@@ -755,7 +755,7 @@ jobs:
run: ../../.github/scripts/pull-test-deps-images.sh && docker compose up -d --wait
- name: Run nextest cases
- run: cargo nextest run --workspace -F dashboard -F pg_kvbackend -F mysql_kvbackend
+ run: cargo nextest run --workspace -F dashboard -F pg_kvbackend -F mysql_kvbackend -F vector_index
env:
CARGO_BUILD_RUSTFLAGS: "-C link-arg=-fuse-ld=mold"
RUST_BACKTRACE: 1
@@ -813,7 +813,7 @@ jobs:
run: ../../.github/scripts/pull-test-deps-images.sh && docker compose up -d --wait
- name: Run nextest cases
- run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F dashboard -F pg_kvbackend -F mysql_kvbackend
+ run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F dashboard -F pg_kvbackend -F mysql_kvbackend -F vector_index
env:
CARGO_BUILD_RUSTFLAGS: "-C link-arg=-fuse-ld=mold"
RUST_BACKTRACE: 1

View File

@@ -15,8 +15,11 @@ repos:
rev: v1.0
hooks:
- id: fmt
args: ["--", "--check"]
stages: [commit-msg]
- id: clippy
args: ["--workspace", "--all-targets", "--all-features", "--", "-D", "warnings"]
- stages: [pre-push]
+ stages: [commit-msg]
+ - id: cargo-check
+ args: ["--workspace", "--all-targets", "--all-features"]
+ stages: [commit-msg]

Cargo.lock (generated, 22 changes)
View File

@@ -4062,6 +4062,7 @@ dependencies = [
"mito2",
"num_cpus",
"object-store",
"partition",
"prometheus",
"prost 0.13.5",
"query",
@@ -5466,7 +5467,7 @@ dependencies = [
[[package]]
name = "greptime-proto"
version = "0.1.0"
- source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=520fa524f9d590752ea327683e82ffd65721b27c#520fa524f9d590752ea327683e82ffd65721b27c"
+ source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=0e316b86d765e4718d6f0ca77b1ad179f222b822#0e316b86d765e4718d6f0ca77b1ad179f222b822"
dependencies = [
"prost 0.13.5",
"prost-types 0.13.5",
@@ -7779,7 +7780,6 @@ dependencies = [
"either",
"futures",
"greptime-proto",
"humantime",
"humantime-serde",
"index",
"itertools 0.14.0",
@@ -7798,6 +7798,7 @@ dependencies = [
"rand 0.9.1",
"rayon",
"regex",
"roaring",
"rskafka",
"rstest",
"rstest_reuse",
@@ -7816,6 +7817,7 @@ dependencies = [
"tokio-util",
"toml 0.8.23",
"tracing",
"usearch",
"uuid",
]
@@ -9473,6 +9475,7 @@ dependencies = [
"ahash 0.8.12",
"api",
"arrow",
"arrow-schema",
"async-trait",
"catalog",
"chrono",
@@ -9950,9 +9953,9 @@ dependencies = [
[[package]]
name = "promql-parser"
- version = "0.6.0"
+ version = "0.7.1"
  source = "registry+https://github.com/rust-lang/crates.io-index"
- checksum = "328fe69c2443ec4f8e6c33ea925dde04a1026e6c95928e89ed02343944cac9bf"
+ checksum = "6c3c2199b84e1253aade469e92ae16cd8dbe1de031c66a00f4f5cdd650290a86"
dependencies = [
"cfgrammar",
"chrono",
@@ -9962,7 +9965,6 @@ dependencies = [
  "regex",
  "serde",
  "serde_json",
- "unescaper",
  ]
  [[package]]
@@ -10323,7 +10325,6 @@ dependencies = [
  "tokio",
  "tokio-stream",
  "tracing",
- "unescaper",
  "uuid",
  ]
@@ -14166,15 +14167,6 @@ dependencies = [
  "version_check",
  ]
- [[package]]
- name = "unescaper"
- version = "0.1.6"
- source = "registry+https://github.com/rust-lang/crates.io-index"
- checksum = "c01d12e3a56a4432a8b436f293c25f4808bdf9e9f9f98f9260bba1f1bc5a1f26"
- dependencies = [
- "thiserror 2.0.17",
- ]
[[package]]
name = "unicase"
version = "2.8.1"

View File

@@ -151,7 +151,7 @@ etcd-client = { version = "0.16.1", features = [
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
- greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "520fa524f9d590752ea327683e82ffd65721b27c" }
+ greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0e316b86d765e4718d6f0ca77b1ad179f222b822" }
hex = "0.4"
http = "1"
humantime = "2.1"
@@ -189,7 +189,7 @@ paste = "1.0"
pin-project = "1.0"
pretty_assertions = "1.4.0"
prometheus = { version = "0.13.3", features = ["process"] }
- promql-parser = { version = "0.6", features = ["ser"] }
+ promql-parser = { version = "0.7.1", features = ["ser"] }
prost = { version = "0.13", features = ["no-recursion-limit"] }
prost-types = "0.13"
raft-engine = { version = "0.4.1", default-features = false }

View File

@@ -17,7 +17,7 @@ Release date: {{ timestamp | date(format="%B %d, %Y") }}
{%- set breakings = commits | filter(attribute="breaking", value=true) -%}
{%- if breakings | length > 0 %}
- ## Breaking changes
+ ### Breaking changes
{% for commit in breakings %}
* {{ commit.github.pr_title }}\
{% if commit.github.username %} by \

View File

@@ -895,7 +895,7 @@ pub fn is_column_type_value_eq(
.unwrap_or(false)
}
- fn encode_json_value(value: JsonValue) -> v1::JsonValue {
+ pub fn encode_json_value(value: JsonValue) -> v1::JsonValue {
fn helper(json: JsonVariant) -> v1::JsonValue {
let value = match json {
JsonVariant::Null => None,

View File

@@ -17,8 +17,8 @@ use std::collections::HashMap;
use arrow_schema::extension::{EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY};
use datatypes::schema::{
COMMENT_KEY, ColumnDefaultConstraint, ColumnSchema, FULLTEXT_KEY, FulltextAnalyzer,
-     FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, SKIPPING_INDEX_KEY, SkippingIndexOptions,
-     SkippingIndexType,
+     FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, Metadata, SKIPPING_INDEX_KEY,
+     SkippingIndexOptions, SkippingIndexType,
};
use greptime_proto::v1::{
Analyzer, FulltextBackend as PbFulltextBackend, SkippingIndexType as PbSkippingIndexType,
@@ -36,6 +36,14 @@ const INVERTED_INDEX_GRPC_KEY: &str = "inverted_index";
/// Key used to store skip index options in gRPC column options.
const SKIPPING_INDEX_GRPC_KEY: &str = "skipping_index";
+ const COLUMN_OPTION_MAPPINGS: [(&str, &str); 5] = [
+     (FULLTEXT_GRPC_KEY, FULLTEXT_KEY),
+     (INVERTED_INDEX_GRPC_KEY, INVERTED_INDEX_KEY),
+     (SKIPPING_INDEX_GRPC_KEY, SKIPPING_INDEX_KEY),
+     (EXTENSION_TYPE_NAME_KEY, EXTENSION_TYPE_NAME_KEY),
+     (EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_METADATA_KEY),
+ ];
/// Tries to construct a `ColumnSchema` from the given `ColumnDef`.
pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
let data_type = ColumnDataTypeWrapper::try_new(
@@ -131,6 +139,21 @@ pub fn try_as_column_def(column_schema: &ColumnSchema, is_primary_key: bool) ->
})
}
+ /// Collect the [ColumnOptions] into the [Metadata] that can be used in, for example, [ColumnSchema].
+ pub fn collect_column_options(column_options: Option<&ColumnOptions>) -> Metadata {
+     let Some(ColumnOptions { options }) = column_options else {
+         return Metadata::default();
+     };
+     let mut metadata = Metadata::with_capacity(options.len());
+     for (x, y) in COLUMN_OPTION_MAPPINGS {
+         if let Some(v) = options.get(x) {
+             metadata.insert(y.to_string(), v.clone());
+         }
+     }
+     metadata
+ }
/// Constructs a `ColumnOptions` from the given `ColumnSchema`.
pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<ColumnOptions> {
let mut options = ColumnOptions::default();
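
The added `collect_column_options` walks a fixed table of key pairs and copies only the options it recognizes, renaming each key from its gRPC name to its schema-metadata name. A minimal, self-contained sketch of that technique follows. It makes several assumptions: `Metadata` is modeled as a plain `HashMap<String, String>`, the key strings are placeholders rather than the real constant values, and `collect_options` is a hypothetical stand-in for the function in the diff.

```rust
use std::collections::HashMap;

// Stand-in for `datatypes::schema::Metadata`; assumed here to behave like a string map.
type Metadata = HashMap<String, String>;

// Placeholder key pairs: (key on the gRPC side, key in the schema metadata).
// The real constants in the diff may hold different strings.
const OPTION_MAPPINGS: [(&str, &str); 2] = [
    ("fulltext", "schema:fulltext"),
    ("inverted_index", "schema:inverted_index"),
];

/// Copies only the recognized options, renaming each key on the way.
/// Unknown keys are dropped, and `None` yields an empty map.
fn collect_options(options: Option<&HashMap<String, String>>) -> Metadata {
    let Some(options) = options else {
        return Metadata::default();
    };
    let mut metadata = Metadata::with_capacity(options.len());
    for (grpc_key, schema_key) in OPTION_MAPPINGS {
        if let Some(value) = options.get(grpc_key) {
            metadata.insert(schema_key.to_string(), value.clone());
        }
    }
    metadata
}
```

Driving the copy from a constant table keeps the two key namespaces in one place, so supporting another option should only require adding one more pair.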

View File

@@ -32,6 +32,7 @@ use crate::error::Result;
pub mod error;
pub mod information_extension;
pub mod kvbackend;
+ #[cfg(any(test, feature = "testing"))]
pub mod memory;
mod metrics;
pub mod system_schema;

View File

@@ -12,8 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
- pub(crate) const METRIC_DB_LABEL: &str = "db";
use lazy_static::lazy_static;
use prometheus::*;
@@ -25,7 +23,7 @@ lazy_static! {
pub static ref METRIC_CATALOG_MANAGER_TABLE_COUNT: IntGaugeVec = register_int_gauge_vec!(
"greptime_catalog_table_count",
"catalog table count",
- &[METRIC_DB_LABEL]
+ &["db"]
)
.unwrap();
pub static ref METRIC_CATALOG_KV_REMOTE_GET: Histogram =

View File

@@ -24,6 +24,7 @@ use std::sync::Arc;
use common_error::ext::BoxedError;
use common_recordbatch::{RecordBatchStreamWrapper, SendableRecordBatchStream};
+ use common_telemetry::tracing::Span;
use datatypes::schema::SchemaRef;
use futures_util::StreamExt;
use snafu::ResultExt;
@@ -163,6 +164,7 @@ impl DataSource for SystemTableDataSource {
stream: Box::pin(stream),
output_ordering: None,
metrics: Default::default(),
+ span: Span::current(),
};
Ok(Box::pin(stream))

View File

@@ -399,8 +399,8 @@ impl InformationSchemaColumnsBuilder {
self.is_nullables.push(Some("No"));
}
self.column_types.push(Some(&data_type));
- self.column_comments
-     .push(column_schema.column_comment().map(|x| x.as_ref()));
+ let column_comment = column_schema.column_comment().map(|x| x.as_ref());
+ self.column_comments.push(column_comment);
}
fn finish(&mut self) -> Result<RecordBatch> {

View File

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
+ use core::pin::pin;
use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
@@ -31,15 +32,17 @@ use datatypes::value::Value;
use datatypes::vectors::{
StringVectorBuilder, TimestampSecondVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder,
};
- use futures::TryStreamExt;
+ use futures::StreamExt;
  use snafu::{OptionExt, ResultExt};
- use store_api::storage::{RegionId, ScanRequest, TableId};
+ use store_api::storage::{ScanRequest, TableId};
  use table::metadata::{TableInfo, TableType};
  use crate::CatalogManager;
  use crate::error::{
-     CreateRecordBatchSnafu, InternalSnafu, Result, UpgradeWeakCatalogManagerRefSnafu,
+     CreateRecordBatchSnafu, FindRegionRoutesSnafu, InternalSnafu, Result,
+     UpgradeWeakCatalogManagerRefSnafu,
  };
+ use crate::kvbackend::KvBackendCatalogManager;
use crate::system_schema::information_schema::{InformationTable, Predicates, TABLES};
use crate::system_schema::utils;
@@ -247,6 +250,10 @@ impl InformationSchemaTablesBuilder {
.catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
+ let partition_manager = catalog_manager
+     .as_any()
+     .downcast_ref::<KvBackendCatalogManager>()
+     .map(|catalog_manager| catalog_manager.partition_manager());
let predicates = Predicates::from_scan_request(&request);
let information_extension = utils::information_extension(&self.catalog_manager)?;
@@ -267,37 +274,59 @@ impl InformationSchemaTablesBuilder {
};
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
- let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None);
+ let table_stream = catalog_manager.tables(&catalog_name, &schema_name, None);
- while let Some(table) = stream.try_next().await? {
-     let table_info = table.table_info();
+ const BATCH_SIZE: usize = 128;
+ // Split tables into chunks
+ let mut table_chunks = pin!(table_stream.ready_chunks(BATCH_SIZE));
-     // TODO(dennis): make it working for metric engine
-     let table_region_stats =
-         if table_info.meta.engine == MITO_ENGINE || table_info.is_physical_table() {
-             table_info
-                 .meta
-                 .region_numbers
-                 .iter()
-                 .map(|n| RegionId::new(table_info.ident.table_id, *n))
-                 .flat_map(|region_id| {
-                     region_stats
-                         .binary_search_by_key(&region_id, |x| x.id)
-                         .map(|i| &region_stats[i])
-                 })
-                 .collect::<Vec<_>>()
-         } else {
-             vec![]
-         };
+ while let Some(tables) = table_chunks.next().await {
+     let tables = tables.into_iter().collect::<Result<Vec<_>>>()?;
+     let mito_or_physical_table_ids = tables
+         .iter()
+         .filter(|table| {
+             table.table_info().meta.engine == MITO_ENGINE
+                 || table.table_info().is_physical_table()
+         })
+         .map(|table| table.table_info().ident.table_id)
+         .collect::<Vec<_>>();
-     self.add_table(
-         &predicates,
-         &catalog_name,
-         &schema_name,
-         table_info,
-         table.table_type(),
-         &table_region_stats,
-     );
+     let table_routes = if let Some(partition_manager) = &partition_manager {
+         partition_manager
+             .batch_find_region_routes(&mito_or_physical_table_ids)
+             .await
+             .context(FindRegionRoutesSnafu)?
+     } else {
+         mito_or_physical_table_ids
+             .into_iter()
+             .map(|id| (id, vec![]))
+             .collect()
+     };
+     for table in tables {
+         let table_region_stats =
+             match table_routes.get(&table.table_info().ident.table_id) {
+                 Some(routes) => routes
+                     .iter()
+                     .flat_map(|route| {
+                         let region_id = route.region.id;
+                         region_stats
+                             .binary_search_by_key(&region_id, |x| x.id)
+                             .map(|i| &region_stats[i])
+                     })
+                     .collect::<Vec<_>>(),
+                 None => vec![],
+             };
+         self.add_table(
+             &predicates,
+             &catalog_name,
+             &schema_name,
+             table.table_info(),
+             table.table_type(),
+             &table_region_stats,
+         );
+     }
}
}
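
The rewritten loop above no longer reads region numbers from each table's metadata. Instead it groups tables into chunks of `BATCH_SIZE` and resolves the region routes for a whole chunk with one batch lookup. The sketch below shows that batching pattern in synchronous form, using `slice::chunks` as a stand-in for the stream's `ready_chunks`. All names here (`batch_find_routes`, `regions_per_table`) and the `u32`/`u64` id types are hypothetical simplifications, not the project's API.

```rust
use std::collections::HashMap;

/// Hypothetical batch lookup: resolves the routes of many tables in one call.
/// Tables without a known route are simply absent from the result.
fn batch_find_routes(routes: &HashMap<u32, Vec<u64>>, ids: &[u32]) -> HashMap<u32, Vec<u64>> {
    ids.iter()
        .filter_map(|id| routes.get(id).map(|regions| (*id, regions.clone())))
        .collect()
}

/// Walks the table ids in fixed-size chunks and issues one batch lookup per
/// chunk. Returns `(table_id, region_count)` pairs in input order.
/// `batch_size` must be greater than zero, because `chunks(0)` panics.
fn regions_per_table(
    routes: &HashMap<u32, Vec<u64>>,
    table_ids: &[u32],
    batch_size: usize,
) -> Vec<(u32, usize)> {
    let mut out = Vec::with_capacity(table_ids.len());
    for chunk in table_ids.chunks(batch_size) {
        let found = batch_find_routes(routes, chunk);
        for id in chunk {
            // A table missing from the batch result contributes no regions,
            // mirroring the `None => vec![]` arm in the diff.
            let count = found.get(id).map_or(0, Vec::len);
            out.push((*id, count));
        }
    }
    out
}
```

Batching trades a little buffering for far fewer lookups: with a chunk size of 128, a schema of 1,000 tables needs 8 route lookups instead of 1,000.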

View File

@@ -337,7 +337,7 @@ mod tests {
.build();
let table_metadata_manager = TableMetadataManager::new(backend);
- let mut view_info = common_meta::key::test_utils::new_test_table_info(1024, vec![]);
+ let mut view_info = common_meta::key::test_utils::new_test_table_info(1024);
view_info.table_type = TableType::View;
let logical_plan = vec![1, 2, 3];
// Create view metadata

View File

@@ -162,7 +162,6 @@ fn create_table_info(table_id: TableId, table_name: TableName) -> RawTableInfo {
next_column_id: columns as u32 + 1,
value_indices: vec![],
options: Default::default(),
- region_numbers: (1..=100).collect(),
partition_key_indices: vec![],
column_ids: vec![],
};

View File

@@ -92,7 +92,7 @@ impl StoreConfig {
pub fn tls_config(&self) -> Option<TlsOption> {
if self.backend_tls_mode != TlsMode::Disable {
Some(TlsOption {
- mode: self.backend_tls_mode.clone(),
+ mode: self.backend_tls_mode,
cert_path: self.backend_tls_cert_path.clone(),
key_path: self.backend_tls_key_path.clone(),
ca_cert_path: self.backend_tls_ca_cert_path.clone(),

View File

@@ -37,6 +37,7 @@ use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_query::Output;
use common_recordbatch::error::ExternalSnafu;
use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper};
+ use common_telemetry::tracing::Span;
use common_telemetry::tracing_context::W3cTrace;
use common_telemetry::{error, warn};
use futures::future;
@@ -456,6 +457,7 @@ impl Database {
stream,
output_ordering: None,
metrics: Default::default(),
+ span: Span::current(),
};
Ok(Output::new_with_stream(Box::pin(record_batch_stream)))
}

View File

@@ -30,6 +30,7 @@ use common_query::request::QueryRequest;
use common_recordbatch::error::ExternalSnafu;
use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper, SendableRecordBatchStream};
use common_telemetry::error;
+ use common_telemetry::tracing::Span;
use common_telemetry::tracing_context::TracingContext;
use prost::Message;
use query::query_engine::DefaultSerializer;
@@ -242,6 +243,7 @@ impl RegionRequester {
stream,
output_ordering: None,
metrics,
+ span: Span::current(),
};
Ok(Box::pin(record_batch_stream))
}

View File

@@ -18,6 +18,7 @@ default = [
]
enterprise = ["common-meta/enterprise", "frontend/enterprise", "meta-srv/enterprise"]
tokio-console = ["common-telemetry/tokio-console"]
+ vector_index = ["mito2/vector_index"]
[lints]
workspace = true

View File

@@ -330,7 +330,6 @@ mod tests {
use common_config::ENV_VAR_SEP;
use common_test_util::temp_dir::create_named_temp_file;
use object_store::config::{FileConfig, GcsConfig, ObjectStoreConfig, S3Config};
- use servers::heartbeat_options::HeartbeatOptions;
use super::*;
use crate::options::GlobalOptions;
@@ -374,9 +373,6 @@ mod tests {
hostname = "127.0.0.1"
runtime_size = 8
- [heartbeat]
- interval = "300ms"
[meta_client]
metasrv_addrs = ["127.0.0.1:3002"]
timeout = "3s"
@@ -434,13 +430,6 @@ mod tests {
);
assert!(!raft_engine_config.sync_write);
- let HeartbeatOptions {
-     interval: heart_beat_interval,
-     ..
- } = options.heartbeat;
- assert_eq!(300, heart_beat_interval.as_millis());
let MetaClientOptions {
metasrv_addrs: metasrv_addr,
timeout,

View File

@@ -233,6 +233,8 @@ impl ObjbenchCommand {
inverted_index_config: MitoConfig::default().inverted_index,
fulltext_index_config,
bloom_filter_index_config: MitoConfig::default().bloom_filter_index,
+ #[cfg(feature = "vector_index")]
+ vector_index_config: Default::default(),
};
// Write SST

View File

@@ -358,7 +358,6 @@ impl StartCommand {
let heartbeat_task = flow::heartbeat::HeartbeatTask::new(
&opts,
meta_client.clone(),
- opts.heartbeat.clone(),
Arc::new(executor),
Arc::new(resource_stat),
);

View File

@@ -236,7 +236,7 @@ impl StartCommand {
};
let tls_opts = TlsOption::new(
- self.tls_mode.clone(),
+ self.tls_mode,
self.tls_cert_path.clone(),
self.tls_key_path.clone(),
self.tls_watch,

View File

@@ -108,7 +108,7 @@ pub trait App: Send {
}
}
- /// Log the versions of the application, and the arguments passed to the cli.
+ /// Log the versions of the application.
///
/// `version` should be the same as the output of cli "--version";
/// and the `short_version` is the short version of the codes, often consist of git branch and commit.
@@ -118,10 +118,7 @@ pub fn log_versions(version: &str, short_version: &str, app: &str) {
.with_label_values(&[common_version::version(), short_version, app])
.inc();
- // Log version and argument flags.
  info!("GreptimeDB version: {}", version);
- log_env_flags();
}
pub fn create_resource_limit_metrics(app: &str) {
@@ -144,13 +141,6 @@ pub fn create_resource_limit_metrics(app: &str) {
}
}
- fn log_env_flags() {
-     info!("command line arguments");
-     for argument in std::env::args() {
-         info!("argument: {}", argument);
-     }
- }
pub fn maybe_activate_heap_profile(memory_options: &common_options::memory::MemoryOptions) {
if memory_options.enable_heap_profiling {
match activate_heap_profile() {

View File

@@ -261,7 +261,7 @@ impl StartCommand {
};
let tls_opts = TlsOption::new(
- self.tls_mode.clone(),
+ self.tls_mode,
self.tls_cert_path.clone(),
self.tls_key_path.clone(),
self.tls_watch,

View File

@@ -228,7 +228,6 @@ fn test_load_flownode_example_config() {
..Default::default()
},
tracing: Default::default(),
- heartbeat: Default::default(),
// flownode deliberately use a slower query parallelism
// to avoid overwhelming the frontend with too many queries
query: QueryOptions {

View File

@@ -27,7 +27,7 @@ use datafusion_common::arrow::datatypes::DataType;
use datafusion_common::{DataFusionError, Result};
use datafusion_expr::type_coercion::aggregates::STRINGS;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, Volatility};
- use datatypes::arrow_array::string_array_value_at_index;
+ use datatypes::arrow_array::{int_array_value_at_index, string_array_value_at_index};
use datatypes::json::JsonStructureSettings;
use jsonpath_rust::JsonPath;
use serde_json::Value;
@@ -131,13 +131,6 @@ macro_rules! json_get {
};
}
- json_get!(
-     JsonGetInt,
-     Int64,
-     i64,
-     "Get the value from the JSONB by the given path and return it as an integer."
- );
json_get!(
JsonGetFloat,
Float64,
@@ -152,17 +145,65 @@ json_get!(
"Get the value from the JSONB by the given path and return it as a boolean."
);
- /// Get the value from the JSONB by the given path and return it as a string.
- #[derive(Clone, Debug)]
- pub struct JsonGetString {
+ enum JsonResultValue<'a> {
+     Jsonb(Vec<u8>),
+     JsonStructByColumn(&'a ArrayRef, usize),
+     JsonStructByValue(&'a Value),
+ }
+ trait JsonGetResultBuilder {
+     fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()>;
+     fn append_null(&mut self);
+     fn build(&mut self) -> ArrayRef;
+ }
+ /// Common implementation for JSON get scalar functions.
+ ///
+ /// `JsonGet` encapsulates the logic for extracting values from JSON inputs
+ /// based on a path expression. Different JSON get functions reuse this
+ /// implementation by supplying their own `JsonGetResultBuilder` to control
+ /// how the resulting values are materialized into an Arrow array.
+ struct JsonGet {
      signature: Signature,
  }
- impl JsonGetString {
-     pub const NAME: &'static str = "json_get_string";
+ impl JsonGet {
+     fn invoke<F, B>(&self, args: ScalarFunctionArgs, builder_factory: F) -> Result<ColumnarValue>
+     where
+         F: Fn(usize) -> B,
+         B: JsonGetResultBuilder,
+     {
+         let [arg0, arg1] = extract_args("JSON_GET", &args)?;
+         let arg1 = compute::cast(&arg1, &DataType::Utf8View)?;
+         let paths = arg1.as_string_view();
+         let mut builder = (builder_factory)(arg0.len());
+         match arg0.data_type() {
+             DataType::Binary | DataType::LargeBinary | DataType::BinaryView => {
+                 let arg0 = compute::cast(&arg0, &DataType::BinaryView)?;
+                 let jsons = arg0.as_binary_view();
+                 jsonb_get(jsons, paths, &mut builder)?;
+             }
+             DataType::Struct(_) => {
+                 let jsons = arg0.as_struct();
+                 json_struct_get(jsons, paths, &mut builder)?
+             }
+             _ => {
+                 return Err(DataFusionError::Execution(format!(
+                     "JSON_GET not supported argument type {}",
+                     arg0.data_type(),
+                 )));
+             }
+         };
+         Ok(ColumnarValue::Array(builder.build()))
+     }
  }
- impl Default for JsonGetString {
+ impl Default for JsonGet {
fn default() -> Self {
Self {
signature: Signature::any(2, Volatility::Immutable),
@@ -170,6 +211,13 @@ impl Default for JsonGetString {
}
}
#[derive(Default)]
pub struct JsonGetString(JsonGet);
impl JsonGetString {
pub const NAME: &'static str = "json_get_string";
}
impl Function for JsonGetString {
fn name(&self) -> &str {
Self::NAME
@@ -180,61 +228,142 @@ impl Function for JsonGetString {
}
fn signature(&self) -> &Signature {
&self.signature
&self.0.signature
}
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
let [arg0, arg1] = extract_args(self.name(), &args)?;
struct StringResultBuilder(StringViewBuilder);
let arg1 = compute::cast(&arg1, &DataType::Utf8View)?;
let paths = arg1.as_string_view();
impl JsonGetResultBuilder for StringResultBuilder {
fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()> {
match value {
JsonResultValue::Jsonb(value) => {
self.0.append_option(jsonb::to_str(&value).ok())
}
JsonResultValue::JsonStructByColumn(column, i) => {
if let Some(v) = string_array_value_at_index(column, i) {
self.0.append_value(v);
} else {
self.0
.append_value(arrow_cast::display::array_value_to_string(
column, i,
)?);
}
}
JsonResultValue::JsonStructByValue(value) => {
if let Some(s) = value.as_str() {
self.0.append_value(s)
} else {
self.0.append_value(value.to_string())
}
}
}
Ok(())
}
let result = match arg0.data_type() {
DataType::Binary | DataType::LargeBinary | DataType::BinaryView => {
let arg0 = compute::cast(&arg0, &DataType::BinaryView)?;
let jsons = arg0.as_binary_view();
jsonb_get_string(jsons, paths)?
fn append_null(&mut self) {
self.0.append_null();
}
DataType::Struct(_) => {
let jsons = arg0.as_struct();
json_struct_get_string(jsons, paths)?
}
_ => {
return Err(DataFusionError::Execution(format!(
"{} not supported argument type {}",
Self::NAME,
arg0.data_type(),
)));
}
};
Ok(ColumnarValue::Array(result))
fn build(&mut self) -> ArrayRef {
Arc::new(self.0.finish())
}
}
self.0.invoke(args, |len: usize| {
StringResultBuilder(StringViewBuilder::with_capacity(len))
})
}
}
fn jsonb_get_string(jsons: &BinaryViewArray, paths: &StringViewArray) -> Result<ArrayRef> {
let size = jsons.len();
let mut builder = StringViewBuilder::with_capacity(size);
#[derive(Default)]
pub struct JsonGetInt(JsonGet);
impl JsonGetInt {
pub const NAME: &'static str = "json_get_int";
}
impl Function for JsonGetInt {
fn name(&self) -> &str {
Self::NAME
}
fn return_type(&self, _: &[DataType]) -> Result<DataType> {
Ok(DataType::Int64)
}
fn signature(&self) -> &Signature {
&self.0.signature
}
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
struct IntResultBuilder(Int64Builder);
impl JsonGetResultBuilder for IntResultBuilder {
fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()> {
match value {
JsonResultValue::Jsonb(value) => {
self.0.append_option(jsonb::to_i64(&value).ok())
}
JsonResultValue::JsonStructByColumn(column, i) => {
self.0.append_option(int_array_value_at_index(column, i))
}
JsonResultValue::JsonStructByValue(value) => {
self.0.append_option(value.as_i64())
}
}
Ok(())
}
fn append_null(&mut self) {
self.0.append_null();
}
fn build(&mut self) -> ArrayRef {
Arc::new(self.0.finish())
}
}
self.0.invoke(args, |len: usize| {
IntResultBuilder(Int64Builder::with_capacity(len))
})
}
}
impl Display for JsonGetInt {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", Self::NAME.to_ascii_uppercase())
}
}
fn jsonb_get(
jsons: &BinaryViewArray,
paths: &StringViewArray,
builder: &mut impl JsonGetResultBuilder,
) -> Result<()> {
let size = jsons.len();
for i in 0..size {
let json = jsons.is_valid(i).then(|| jsons.value(i));
let path = paths.is_valid(i).then(|| paths.value(i));
let result = match (json, path) {
(Some(json), Some(path)) => {
get_json_by_path(json, path).and_then(|json| jsonb::to_str(&json).ok())
}
(Some(json), Some(path)) => get_json_by_path(json, path),
_ => None,
};
builder.append_option(result);
if let Some(v) = result {
builder.append_value(JsonResultValue::Jsonb(v))?;
} else {
builder.append_null();
}
}
Ok(Arc::new(builder.finish()))
Ok(())
}
fn json_struct_get_string(jsons: &StructArray, paths: &StringViewArray) -> Result<ArrayRef> {
fn json_struct_get(
jsons: &StructArray,
paths: &StringViewArray,
builder: &mut impl JsonGetResultBuilder,
) -> Result<()> {
let size = jsons.len();
let mut builder = StringViewBuilder::with_capacity(size);
for i in 0..size {
if jsons.is_null(i) || paths.is_null(i) {
builder.append_null();
@@ -247,11 +376,7 @@ fn json_struct_get_string(jsons: &StructArray, paths: &StringViewArray) -> Resul
let column = jsons.column_by_name(&field_path);
if let Some(column) = column {
if let Some(v) = string_array_value_at_index(column, i) {
builder.append_value(v);
} else {
builder.append_value(arrow_cast::display::array_value_to_string(column, i)?);
}
builder.append_value(JsonResultValue::JsonStructByColumn(column, i))?;
} else {
let Some(raw) = jsons
.column_by_name(JsonStructureSettings::RAW_FIELD)
@@ -272,27 +397,15 @@ fn json_struct_get_string(jsons: &StructArray, paths: &StringViewArray) -> Resul
Value::Null => builder.append_null(),
Value::Array(values) => match values.as_slice() {
[] => builder.append_null(),
[x] => {
if let Some(s) = x.as_str() {
builder.append_value(s)
} else {
builder.append_value(x.to_string())
}
}
x => builder.append_value(
x.iter()
.map(|v| v.to_string())
.collect::<Vec<_>>()
.join(", "),
),
[x] => builder.append_value(JsonResultValue::JsonStructByValue(x))?,
_ => builder.append_value(JsonResultValue::JsonStructByValue(&value))?,
},
// Safety: guarded by the returns of `path.find` as documented
_ => unreachable!(),
value => builder.append_value(JsonResultValue::JsonStructByValue(&value))?,
}
}
}
Ok(Arc::new(builder.finish()))
Ok(())
}
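The way `json_struct_get` treats the result of a path lookup (null stays null, an empty array becomes null, a one-element array is unwrapped, anything else is passed through whole) can be sketched as follows. The `Json` enum is a hypothetical, reduced stand-in for `serde_json::Value`, and `normalize` is an illustrative helper that does not exist in the change.

```rust
/// Reduced stand-in for `serde_json::Value` (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Int(i64),
    Str(String),
    Array(Vec<Json>),
}

/// Returns the value that should be handed to the result builder,
/// or `None` when a null should be appended instead.
fn normalize(found: &Json) -> Option<&Json> {
    match found {
        Json::Null => None,
        Json::Array(values) => match values.as_slice() {
            // No match at all: treated as null.
            [] => None,
            // Exactly one match: unwrap it.
            [x] => Some(x),
            // Several matches: keep the array as a whole.
            _ => Some(found),
        },
        // Any other value is passed through unchanged.
        other => Some(other),
    }
}
```

Leaving the final rendering to the builder means the integer and string variants can disagree on what a multi-element array becomes without touching this branch logic.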
fn json_struct_to_value(raw: &str, jsons: &StructArray, i: usize) -> Result<Value> {
@@ -479,6 +592,50 @@ mod tests {
use super::*;
/// Create a JSON object like this (as a one-element struct array for testing):
///
/// ```JSON
/// {
/// "kind": "foo",
/// "payload": {
/// "code": 404,
/// "success": false,
/// "result": {
/// "error": "not found",
/// "time_cost": 1.234
/// }
/// }
/// }
/// ```
fn test_json_struct() -> ArrayRef {
Arc::new(StructArray::new(
vec![
Field::new("kind", DataType::Utf8, true),
Field::new("payload.code", DataType::Int64, true),
Field::new("payload.result.time_cost", DataType::Float64, true),
Field::new(JsonStructureSettings::RAW_FIELD, DataType::Utf8View, true),
]
.into(),
vec![
Arc::new(StringArray::from_iter([Some("foo")])) as ArrayRef,
Arc::new(Int64Array::from_iter([Some(404)])),
Arc::new(Float64Array::from_iter([Some(1.234)])),
Arc::new(StringViewArray::from_iter([Some(
json! ({
"payload": {
"success": false,
"result": {
"error": "not found"
}
}
})
.to_string(),
)])),
],
None,
))
}
#[test]
fn test_json_get_int() {
let json_get_int = JsonGetInt::default();
@@ -496,37 +653,55 @@ mod tests {
r#"{"a": 4, "b": {"c": 6}, "c": 6}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
];
let paths = vec!["$.a.b", "$.a", "$.c"];
let results = [Some(2), Some(4), None];
let json_struct = test_json_struct();
let jsonbs = json_strings
let path_expects = vec![
("$.a.b", Some(2)),
("$.a", Some(4)),
("$.c", None),
("$.kind", None),
("$.payload.code", Some(404)),
("$.payload.success", None),
("$.payload.result.time_cost", None),
("$.payload.not-exists", None),
("$.not-exists", None),
("$", None),
];
let mut jsons = json_strings
.iter()
.map(|s| {
let value = jsonb::parse_value(s.as_bytes()).unwrap();
value.to_vec()
Arc::new(BinaryArray::from_iter_values([value.to_vec()])) as ArrayRef
})
.collect::<Vec<_>>();
let json_struct_arrays =
std::iter::repeat_n(json_struct, path_expects.len() - jsons.len()).collect::<Vec<_>>();
jsons.extend(json_struct_arrays);
let args = ScalarFunctionArgs {
args: vec![
ColumnarValue::Array(Arc::new(BinaryArray::from_iter_values(jsonbs))),
ColumnarValue::Array(Arc::new(StringArray::from_iter_values(paths))),
],
arg_fields: vec![],
number_rows: 3,
return_field: Arc::new(Field::new("x", DataType::Int64, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_int
.invoke_with_args(args)
.and_then(|x| x.to_array(3))
.unwrap();
let vector = result.as_primitive::<Int64Type>();
for i in 0..jsons.len() {
let json = &jsons[i];
let (path, expect) = path_expects[i];
assert_eq!(3, vector.len());
for (i, gt) in results.iter().enumerate() {
let result = vector.is_valid(i).then(|| vector.value(i));
assert_eq!(*gt, result);
let args = ScalarFunctionArgs {
args: vec![
ColumnarValue::Array(json.clone()),
ColumnarValue::Scalar(path.into()),
],
arg_fields: vec![],
number_rows: 1,
return_field: Arc::new(Field::new("x", DataType::Int64, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_int
.invoke_with_args(args)
.and_then(|x| x.to_array(1))
.unwrap();
let result = result.as_primitive::<Int64Type>();
assert_eq!(1, result.len());
let actual = result.is_valid(0).then(|| result.value(0));
assert_eq!(actual, expect);
}
}
@@ -649,45 +824,7 @@ mod tests {
r#"{"a": "d", "b": {"c": "e"}, "c": "f"}"#,
r#"{"a": "g", "b": "h", "c": {"a": "g"}}"#,
];
// complete JSON is:
// {
// "kind": "foo",
// "payload": {
// "code": 404,
// "success": false,
// "result": {
// "error": "not found",
// "time_cost": 1.234
// }
// }
// }
let json_struct: ArrayRef = Arc::new(StructArray::new(
vec![
Field::new("kind", DataType::Utf8, true),
Field::new("payload.code", DataType::Int64, true),
Field::new("payload.result.time_cost", DataType::Float64, true),
Field::new(JsonStructureSettings::RAW_FIELD, DataType::Utf8View, true),
]
.into(),
vec![
Arc::new(StringArray::from_iter([Some("foo")])) as ArrayRef,
Arc::new(Int64Array::from_iter([Some(404)])),
Arc::new(Float64Array::from_iter([Some(1.234)])),
Arc::new(StringViewArray::from_iter([Some(
json! ({
"payload": {
"success": false,
"result": {
"error": "not found"
}
}
})
.to_string(),
)])),
],
None,
));
let json_struct = test_json_struct();
let paths = vec![
"$.a.b",


@@ -36,8 +36,7 @@ pub mod create_database;
pub mod create_flow;
pub mod create_logical_tables;
pub mod create_table;
mod create_table_template;
pub(crate) use create_table_template::{CreateRequestBuilder, build_template_from_raw_table_info};
pub(crate) use create_table::{CreateRequestBuilder, build_template_from_raw_table_info};
pub mod create_view;
pub mod drop_database;
pub mod drop_flow;


@@ -30,7 +30,7 @@ use serde::{Deserialize, Serialize};
use snafu::ResultExt;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::ALTER_PHYSICAL_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
use store_api::storage::RegionNumber;
use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId};
@@ -286,14 +286,7 @@ impl CreateTablesData {
.flat_map(|(task, table_id)| {
if table_id.is_none() {
let table_info = task.table_info.clone();
let region_ids = self
.physical_region_numbers
.iter()
.map(|region_number| {
RegionId::new(table_info.ident.table_id, *region_number)
})
.collect();
let table_route = TableRouteValue::logical(self.physical_table_id, region_ids);
let table_route = TableRouteValue::logical(self.physical_table_id);
Some((table_info, table_route))
} else {
None


@@ -22,7 +22,7 @@ use store_api::storage::{RegionId, TableId};
use table::metadata::RawTableInfo;
use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure;
use crate::ddl::create_table_template::{
use crate::ddl::create_table::template::{
CreateRequestBuilder, build_template, build_template_from_raw_table_info,
};
use crate::ddl::utils::region_storage_path;


@@ -12,74 +12,99 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub(crate) mod executor;
pub(crate) mod template;
use std::collections::HashMap;
use api::v1::region::region_request::Body as PbRegionRequest;
use api::v1::region::{RegionRequest, RegionRequestHeader};
use api::v1::CreateTableExpr;
use async_trait::async_trait;
use common_error::ext::BoxedError;
use common_procedure::error::{
ExternalSnafu, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu,
};
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, ProcedureId, Status};
use common_telemetry::tracing_context::TracingContext;
use common_telemetry::{info, warn};
use futures::future::join_all;
use common_telemetry::info;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure};
use snafu::{OptionExt, ResultExt};
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
use store_api::storage::RegionNumber;
use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId};
use table::table_name::TableName;
use table::table_reference::TableReference;
pub(crate) use template::{CreateRequestBuilder, build_template_from_raw_table_info};
use crate::ddl::create_table_template::{CreateRequestBuilder, build_template};
use crate::ddl::utils::raw_table_info::update_table_info_column_ids;
use crate::ddl::utils::{
add_peer_context_if_needed, convert_region_routes_to_detecting_regions,
extract_column_metadatas, map_to_procedure_error, region_storage_path,
};
use crate::ddl::create_table::executor::CreateTableExecutor;
use crate::ddl::create_table::template::build_template;
use crate::ddl::utils::map_to_procedure_error;
use crate::ddl::{DdlContext, TableMetadata};
use crate::error::{self, Result};
use crate::key::table_name::TableNameKey;
use crate::key::table_route::{PhysicalTableRouteValue, TableRouteValue};
use crate::key::table_route::PhysicalTableRouteValue;
use crate::lock_key::{CatalogLock, SchemaLock, TableNameLock};
use crate::metrics;
use crate::region_keeper::OperatingRegionGuard;
use crate::rpc::ddl::CreateTableTask;
use crate::rpc::router::{
RegionRoute, find_leader_regions, find_leaders, operating_leader_regions,
};
use crate::rpc::router::{RegionRoute, operating_leader_regions};
pub struct CreateTableProcedure {
pub context: DdlContext,
pub creator: TableCreator,
/// The serializable data.
pub data: CreateTableData,
/// The guards of the opening regions.
pub opening_regions: Vec<OperatingRegionGuard>,
/// The executor of the procedure.
pub executor: CreateTableExecutor,
}
fn build_executor_from_create_table_data(
create_table_expr: &CreateTableExpr,
) -> Result<CreateTableExecutor> {
let template = build_template(create_table_expr)?;
let builder = CreateRequestBuilder::new(template, None);
let table_name = TableName::new(
create_table_expr.catalog_name.clone(),
create_table_expr.schema_name.clone(),
create_table_expr.table_name.clone(),
);
let executor =
CreateTableExecutor::new(table_name, create_table_expr.create_if_not_exists, builder);
Ok(executor)
}
impl CreateTableProcedure {
pub const TYPE_NAME: &'static str = "metasrv-procedure::CreateTable";
pub fn new(task: CreateTableTask, context: DdlContext) -> Self {
Self {
pub fn new(task: CreateTableTask, context: DdlContext) -> Result<Self> {
let executor = build_executor_from_create_table_data(&task.create_table)?;
Ok(Self {
context,
creator: TableCreator::new(task),
}
data: CreateTableData::new(task),
opening_regions: vec![],
executor,
})
}
pub fn from_json(json: &str, context: DdlContext) -> ProcedureResult<Self> {
let data = serde_json::from_str(json).context(FromJsonSnafu)?;
let data: CreateTableData = serde_json::from_str(json).context(FromJsonSnafu)?;
let create_table_expr = &data.task.create_table;
let executor = build_executor_from_create_table_data(create_table_expr)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
Ok(CreateTableProcedure {
context,
creator: TableCreator {
data,
opening_regions: vec![],
},
data,
opening_regions: vec![],
executor,
})
}
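Both `new` and `from_json` rebuild the executor from the serializable data rather than persisting it. A minimal std-only sketch of that idea follows. All names here (`Data`, `Executor`, `Procedure`, `restore`) are hypothetical, and a two-line text dump stands in for the `serde_json` round trip used by the real procedure.

```rust
/// The only state that is persisted (hypothetical, reduced form).
#[derive(Debug, Clone, PartialEq)]
struct Data {
    table_name: String,
    create_if_not_exists: bool,
}

/// Derived helper that is never persisted; it is rebuilt from `Data`.
#[derive(Debug, PartialEq)]
struct Executor {
    table_name: String,
    create_if_not_exists: bool,
}

impl Executor {
    fn from_data(data: &Data) -> Result<Self, String> {
        // Building the executor can fail, so constructors return `Result`.
        if data.table_name.is_empty() {
            return Err("empty table name".to_string());
        }
        Ok(Self {
            table_name: data.table_name.clone(),
            create_if_not_exists: data.create_if_not_exists,
        })
    }
}

struct Procedure {
    data: Data,
    executor: Executor,
}

impl Procedure {
    fn new(data: Data) -> Result<Self, String> {
        let executor = Executor::from_data(&data)?;
        Ok(Self { data, executor })
    }

    /// Persists only `data`; a toy format standing in for JSON.
    fn dump(&self) -> String {
        format!("{}\n{}", self.data.table_name, self.data.create_if_not_exists)
    }

    /// Restores `data`, then rebuilds the executor the same way `new` does.
    fn restore(dumped: &str) -> Result<Self, String> {
        let (name, flag) = dumped.split_once('\n').ok_or("malformed dump")?;
        let data = Data {
            table_name: name.to_string(),
            create_if_not_exists: flag.parse::<bool>().map_err(|e| e.to_string())?,
        };
        Self::new(data)
    }
}
```

Because both paths go through the same constructor, a restored procedure cannot end up with an executor that disagrees with its persisted data.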
fn table_info(&self) -> &RawTableInfo {
&self.creator.data.task.table_info
&self.data.task.table_info
}
pub(crate) fn table_id(&self) -> TableId {
@@ -87,8 +112,7 @@ impl CreateTableProcedure {
}
fn region_wal_options(&self) -> Result<&HashMap<RegionNumber, String>> {
self.creator
.data
self.data
.region_wal_options
.as_ref()
.context(error::UnexpectedSnafu {
@@ -97,8 +121,7 @@ impl CreateTableProcedure {
}
fn table_route(&self) -> Result<&PhysicalTableRouteValue> {
self.creator
.data
self.data
.table_route
.as_ref()
.context(error::UnexpectedSnafu {
@@ -106,17 +129,6 @@ impl CreateTableProcedure {
})
}
#[cfg(any(test, feature = "testing"))]
pub fn set_allocated_metadata(
&mut self,
table_id: TableId,
table_route: PhysicalTableRouteValue,
region_wal_options: HashMap<RegionNumber, String>,
) {
self.creator
.set_allocated_metadata(table_id, table_route, region_wal_options)
}
/// On the prepare step, it performs:
/// - Checks whether the table exists.
/// - Allocates the table id.
@@ -125,31 +137,16 @@ impl CreateTableProcedure {
/// - TableName exists and `create_if_not_exists` is false.
/// - Failed to allocate [TableMetadata].
pub(crate) async fn on_prepare(&mut self) -> Result<Status> {
let expr = &self.creator.data.task.create_table;
let table_name_value = self
.context
.table_metadata_manager
.table_name_manager()
.get(TableNameKey::new(
&expr.catalog_name,
&expr.schema_name,
&expr.table_name,
))
let table_id = self
.executor
.on_prepare(&self.context.table_metadata_manager)
.await?;
if let Some(value) = table_name_value {
ensure!(
expr.create_if_not_exists,
error::TableAlreadyExistsSnafu {
table_name: self.creator.data.table_ref().to_string(),
}
);
let table_id = value.table_id();
// Return the table id if the table already exists.
if let Some(table_id) = table_id {
return Ok(Status::done_with_output(table_id));
}
self.creator.data.state = CreateTableState::DatanodeCreateRegions;
self.data.state = CreateTableState::DatanodeCreateRegions;
let TableMetadata {
table_id,
table_route,
@@ -157,23 +154,13 @@ impl CreateTableProcedure {
} = self
.context
.table_metadata_allocator
.create(&self.creator.data.task)
.create(&self.data.task)
.await?;
self.creator
.set_allocated_metadata(table_id, table_route, region_wal_options);
self.set_allocated_metadata(table_id, table_route, region_wal_options);
Ok(Status::executing(true))
}
pub fn new_region_request_builder(
&self,
physical_table_id: Option<TableId>,
) -> Result<CreateRequestBuilder> {
let create_table_expr = &self.creator.data.task.create_table;
let template = build_template(create_table_expr)?;
Ok(CreateRequestBuilder::new(template, physical_table_id))
}
/// Creates regions on datanodes
///
/// Abort(non-retry):
@@ -187,90 +174,29 @@ impl CreateTableProcedure {
/// - [Code::Unavailable](tonic::status::Code::Unavailable)
pub async fn on_datanode_create_regions(&mut self) -> Result<Status> {
let table_route = self.table_route()?.clone();
let request_builder = self.new_region_request_builder(None)?;
// Registers opening regions
let guards = self
.creator
.register_opening_regions(&self.context, &table_route.region_routes)?;
let guards = self.register_opening_regions(&self.context, &table_route.region_routes)?;
if !guards.is_empty() {
self.creator.opening_regions = guards;
self.opening_regions = guards;
}
self.create_regions(&table_route.region_routes, request_builder)
.await
self.create_regions(&table_route.region_routes).await
}
async fn create_regions(
&mut self,
region_routes: &[RegionRoute],
request_builder: CreateRequestBuilder,
) -> Result<Status> {
let create_table_data = &self.creator.data;
// Safety: the region_wal_options must be allocated
async fn create_regions(&mut self, region_routes: &[RegionRoute]) -> Result<Status> {
let table_id = self.table_id();
let region_wal_options = self.region_wal_options()?;
let create_table_expr = &create_table_data.task.create_table;
let catalog = &create_table_expr.catalog_name;
let schema = &create_table_expr.schema_name;
let storage_path = region_storage_path(catalog, schema);
let leaders = find_leaders(region_routes);
let mut create_region_tasks = Vec::with_capacity(leaders.len());
let column_metadatas = self
.executor
.on_create_regions(
&self.context.node_manager,
table_id,
region_routes,
region_wal_options,
)
.await?;
let partition_exprs = region_routes
.iter()
.map(|r| (r.region.id.region_number(), r.region.partition_expr()))
.collect();
for datanode in leaders {
let requester = self.context.node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode);
let mut requests = Vec::with_capacity(regions.len());
for region_number in regions {
let region_id = RegionId::new(self.table_id(), region_number);
let create_region_request = request_builder.build_one(
region_id,
storage_path.clone(),
region_wal_options,
&partition_exprs,
);
requests.push(PbRegionRequest::Create(create_region_request));
}
for request in requests {
let request = RegionRequest {
header: Some(RegionRequestHeader {
tracing_context: TracingContext::from_current_span().to_w3c(),
..Default::default()
}),
body: Some(request),
};
let datanode = datanode.clone();
let requester = requester.clone();
create_region_tasks.push(async move {
requester
.handle(request)
.await
.map_err(add_peer_context_if_needed(datanode))
});
}
}
let mut results = join_all(create_region_tasks)
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
self.creator.data.column_metadatas = column_metadatas;
} else {
warn!(
"creating table result doesn't contain extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`, leaving the table's column metadata unchanged"
);
}
self.creator.data.state = CreateTableState::CreateMetadata;
self.data.column_metadatas = column_metadatas;
self.data.state = CreateTableState::CreateMetadata;
Ok(Status::executing(true))
}
@@ -280,107 +206,33 @@ impl CreateTableProcedure {
/// - Failed to create table metadata.
async fn on_create_metadata(&mut self, pid: ProcedureId) -> Result<Status> {
let table_id = self.table_id();
let table_ref = self.creator.data.table_ref();
let table_ref = self.data.table_ref();
let manager = &self.context.table_metadata_manager;
let mut raw_table_info = self.table_info().clone();
if !self.creator.data.column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, &self.creator.data.column_metadatas);
}
let raw_table_info = self.table_info().clone();
// Safety: the region_wal_options must be allocated.
let region_wal_options = self.region_wal_options()?.clone();
// Safety: the table_route must be allocated.
let physical_table_route = self.table_route()?.clone();
let detecting_regions =
convert_region_routes_to_detecting_regions(&physical_table_route.region_routes);
let table_route = TableRouteValue::Physical(physical_table_route);
manager
.create_table_metadata(raw_table_info, table_route, region_wal_options)
self.executor
.on_create_metadata(
manager,
&self.context.region_failure_detector_controller,
raw_table_info,
&self.data.column_metadatas,
physical_table_route,
region_wal_options,
)
.await?;
self.context
.register_failure_detectors(detecting_regions)
.await;
info!(
"Successfully created table: {}, table_id: {}, procedure_id: {}",
table_ref, table_id, pid
);
self.creator.opening_regions.clear();
self.opening_regions.clear();
Ok(Status::done_with_output(table_id))
}
}
#[async_trait]
impl Procedure for CreateTableProcedure {
fn type_name(&self) -> &str {
Self::TYPE_NAME
}
fn recover(&mut self) -> ProcedureResult<()> {
// Only registers regions if the table route is allocated.
if let Some(x) = &self.creator.data.table_route {
self.creator.opening_regions = self
.creator
.register_opening_regions(&self.context, &x.region_routes)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
}
Ok(())
}
async fn execute(&mut self, ctx: &ProcedureContext) -> ProcedureResult<Status> {
let state = &self.creator.data.state;
let _timer = metrics::METRIC_META_PROCEDURE_CREATE_TABLE
.with_label_values(&[state.as_ref()])
.start_timer();
match state {
CreateTableState::Prepare => self.on_prepare().await,
CreateTableState::DatanodeCreateRegions => self.on_datanode_create_regions().await,
CreateTableState::CreateMetadata => self.on_create_metadata(ctx.procedure_id).await,
}
.map_err(map_to_procedure_error)
}
fn dump(&self) -> ProcedureResult<String> {
serde_json::to_string(&self.creator.data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
let table_ref = &self.creator.data.table_ref();
LockKey::new(vec![
CatalogLock::Read(table_ref.catalog).into(),
SchemaLock::read(table_ref.catalog, table_ref.schema).into(),
TableNameLock::new(table_ref.catalog, table_ref.schema, table_ref.table).into(),
])
}
}
pub struct TableCreator {
/// The serializable data.
pub data: CreateTableData,
/// The guards of the opening regions.
pub opening_regions: Vec<OperatingRegionGuard>,
}
impl TableCreator {
pub fn new(task: CreateTableTask) -> Self {
Self {
data: CreateTableData {
state: CreateTableState::Prepare,
column_metadatas: vec![],
task,
table_route: None,
region_wal_options: None,
},
opening_regions: vec![],
}
}
/// Registers and returns the guards of the opening region if they don't exist.
fn register_opening_regions(
@@ -389,7 +241,6 @@ impl TableCreator {
region_routes: &[RegionRoute],
) -> Result<Vec<OperatingRegionGuard>> {
let opening_regions = operating_leader_regions(region_routes);
if self.opening_regions.len() == opening_regions.len() {
return Ok(vec![]);
}
@@ -409,7 +260,7 @@ impl TableCreator {
Ok(opening_region_guards)
}
fn set_allocated_metadata(
pub fn set_allocated_metadata(
&mut self,
table_id: TableId,
table_route: PhysicalTableRouteValue,
@@ -421,6 +272,56 @@ impl TableCreator {
}
}
#[async_trait]
impl Procedure for CreateTableProcedure {
fn type_name(&self) -> &str {
Self::TYPE_NAME
}
fn recover(&mut self) -> ProcedureResult<()> {
// Only registers regions if the table route is allocated.
if let Some(x) = &self.data.table_route {
self.opening_regions = self
.register_opening_regions(&self.context, &x.region_routes)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
}
Ok(())
}
async fn execute(&mut self, ctx: &ProcedureContext) -> ProcedureResult<Status> {
let state = &self.data.state;
let _timer = metrics::METRIC_META_PROCEDURE_CREATE_TABLE
.with_label_values(&[state.as_ref()])
.start_timer();
match state {
CreateTableState::Prepare => self.on_prepare().await,
CreateTableState::DatanodeCreateRegions => self.on_datanode_create_regions().await,
CreateTableState::CreateMetadata => self.on_create_metadata(ctx.procedure_id).await,
}
.map_err(map_to_procedure_error)
}
fn dump(&self) -> ProcedureResult<String> {
serde_json::to_string(&self.data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
let table_ref = &self.data.table_ref();
LockKey::new(vec![
CatalogLock::Read(table_ref.catalog).into(),
SchemaLock::read(table_ref.catalog, table_ref.schema).into(),
TableNameLock::new(table_ref.catalog, table_ref.schema, table_ref.table).into(),
])
}
}
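The `execute` method dispatches on the persisted state, and each step either advances the state or finishes. A reduced sketch of that state machine is shown below, assuming synchronous steps and no failure paths. `CreateTableSketch`, `Status`, and the table-id fields are hypothetical simplifications of the real procedure.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Prepare,
    DatanodeCreateRegions,
    CreateMetadata,
}

#[derive(Debug, PartialEq)]
enum Status {
    /// More steps remain; the caller should call `step` again.
    Executing,
    /// The procedure finished and reports the table id.
    Done(u32),
}

struct CreateTableSketch {
    state: State,
    /// Id of an already existing table, if any (hypothetical input).
    existing_table: Option<u32>,
    /// Id that would be allocated for a new table (hypothetical input).
    allocated_table_id: u32,
}

impl CreateTableSketch {
    fn step(&mut self) -> Status {
        match self.state {
            State::Prepare => {
                // An existing table short-circuits the whole procedure.
                if let Some(id) = self.existing_table {
                    return Status::Done(id);
                }
                self.state = State::DatanodeCreateRegions;
                Status::Executing
            }
            State::DatanodeCreateRegions => {
                self.state = State::CreateMetadata;
                Status::Executing
            }
            State::CreateMetadata => Status::Done(self.allocated_table_id),
        }
    }
}
```

Keeping the state inside the persisted data is what lets a recovered procedure resume at the step it had reached.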
#[derive(Debug, Clone, Serialize, Deserialize, AsRefStr, PartialEq)]
pub enum CreateTableState {
/// Prepares to create the table
@@ -444,6 +345,16 @@ pub struct CreateTableData {
}
impl CreateTableData {
pub fn new(task: CreateTableTask) -> Self {
CreateTableData {
state: CreateTableState::Prepare,
column_metadatas: vec![],
task,
table_route: None,
region_wal_options: None,
}
}
fn table_ref(&self) -> TableReference<'_> {
self.task.table_ref()
}


@@ -0,0 +1,203 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashMap;
use api::v1::region::region_request::Body as PbRegionRequest;
use api::v1::region::{RegionRequest, RegionRequestHeader};
use common_telemetry::tracing_context::TracingContext;
use common_telemetry::warn;
use futures::future::join_all;
use snafu::ensure;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
use table::metadata::{RawTableInfo, TableId};
use table::table_name::TableName;
use crate::ddl::utils::raw_table_info::update_table_info_column_ids;
use crate::ddl::utils::{
add_peer_context_if_needed, convert_region_routes_to_detecting_regions,
extract_column_metadatas, region_storage_path,
};
use crate::ddl::{CreateRequestBuilder, RegionFailureDetectorControllerRef};
use crate::error::{self, Result};
use crate::key::TableMetadataManagerRef;
use crate::key::table_name::TableNameKey;
use crate::key::table_route::{PhysicalTableRouteValue, TableRouteValue};
use crate::node_manager::NodeManagerRef;
use crate::rpc::router::{RegionRoute, find_leader_regions, find_leaders};
/// [CreateTableExecutor] is responsible for:
/// - Creating the metadata of the table.
/// - Creating the regions on the datanodes.
pub struct CreateTableExecutor {
create_if_not_exists: bool,
table_name: TableName,
builder: CreateRequestBuilder,
}
impl CreateTableExecutor {
/// Creates a new [`CreateTableExecutor`].
pub fn new(
table_name: TableName,
create_if_not_exists: bool,
builder: CreateRequestBuilder,
) -> Self {
Self {
create_if_not_exists,
table_name,
builder,
}
}
/// On the prepare step, it performs:
/// - Checks whether the table exists.
/// - Returns the table id if the table exists.
///
/// Abort(non-retry):
/// - Table exists and `create_if_not_exists` is `false`.
/// - Failed to get the table name value.
pub async fn on_prepare(
&self,
table_metadata_manager: &TableMetadataManagerRef,
) -> Result<Option<TableId>> {
let table_name_value = table_metadata_manager
.table_name_manager()
.get(TableNameKey::new(
&self.table_name.catalog_name,
&self.table_name.schema_name,
&self.table_name.table_name,
))
.await?;
if let Some(value) = table_name_value {
ensure!(
self.create_if_not_exists,
error::TableAlreadyExistsSnafu {
table_name: self.table_name.to_string(),
}
);
return Ok(Some(value.table_id()));
}
Ok(None)
}
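The three outcomes of the executor's `on_prepare` (table missing, table present with `create_if_not_exists`, table present without it) can be sketched with a plain map. The `HashMap` below is a hypothetical stand-in for the table name manager, and `PrepareError` is an illustrative error type, not the crate's own.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum PrepareError {
    TableAlreadyExists(String),
}

/// Returns `Ok(Some(id))` when the table already exists and that is acceptable,
/// `Ok(None)` when the table still has to be created, and an error otherwise.
fn prepare(
    known_tables: &HashMap<String, u32>,
    table_name: &str,
    create_if_not_exists: bool,
) -> Result<Option<u32>, PrepareError> {
    match known_tables.get(table_name) {
        Some(id) if create_if_not_exists => Ok(Some(*id)),
        Some(_) => Err(PrepareError::TableAlreadyExists(table_name.to_string())),
        None => Ok(None),
    }
}
```

Returning `Option<TableId>` instead of a procedure status keeps the executor independent of the procedure framework, so the caller decides how to turn an existing id into a finished status.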
pub async fn on_create_regions(
&self,
node_manager: &NodeManagerRef,
table_id: TableId,
region_routes: &[RegionRoute],
region_wal_options: &HashMap<RegionNumber, String>,
) -> Result<Vec<ColumnMetadata>> {
let storage_path =
region_storage_path(&self.table_name.catalog_name, &self.table_name.schema_name);
let leaders = find_leaders(region_routes);
let mut create_region_tasks = Vec::with_capacity(leaders.len());
let partition_exprs = region_routes
.iter()
.map(|r| (r.region.id.region_number(), r.region.partition_expr()))
.collect::<HashMap<_, _>>();
for datanode in leaders {
let requester = node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode);
let mut requests = Vec::with_capacity(regions.len());
for region_number in regions {
let region_id = RegionId::new(table_id, region_number);
let create_region_request = self.builder.build_one(
region_id,
storage_path.clone(),
region_wal_options,
&partition_exprs,
);
requests.push(PbRegionRequest::Create(create_region_request));
}
for request in requests {
let request = RegionRequest {
header: Some(RegionRequestHeader {
tracing_context: TracingContext::from_current_span().to_w3c(),
..Default::default()
}),
body: Some(request),
};
let datanode = datanode.clone();
let requester = requester.clone();
create_region_tasks.push(async move {
requester
.handle(request)
.await
.map_err(add_peer_context_if_needed(datanode))
});
}
}
let mut results = join_all(create_region_tasks)
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
let column_metadatas = if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
column_metadatas
} else {
warn!(
"creating table result doesn't contain extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`, leaving the table's column metadata unchanged"
);
vec![]
};
Ok(column_metadatas)
}
/// Creates table metadata
///
/// Abort(non-retry):
/// - Failed to create table metadata.
pub async fn on_create_metadata(
&self,
table_metadata_manager: &TableMetadataManagerRef,
region_failure_detector_controller: &RegionFailureDetectorControllerRef,
mut raw_table_info: RawTableInfo,
column_metadatas: &[ColumnMetadata],
table_route: PhysicalTableRouteValue,
region_wal_options: HashMap<RegionNumber, String>,
) -> Result<()> {
if !column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, column_metadatas);
}
let detecting_regions =
convert_region_routes_to_detecting_regions(&table_route.region_routes);
let table_route = TableRouteValue::Physical(table_route);
table_metadata_manager
.create_table_metadata(raw_table_info, table_route, region_wal_options)
.await?;
region_failure_detector_controller
.register_failure_detectors(detecting_regions)
.await;
Ok(())
}
/// Returns the builder of the executor.
pub fn builder(&self) -> &CreateRequestBuilder {
&self.builder
}
}
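
The prepare step in this executor reduces to a three-way decision on whether the table name is already registered. Below is a minimal sketch of that decision, assuming a plain `HashMap` in place of the table name manager and a hypothetical `PrepareError` in place of the snafu-generated error; neither stand-in is part of the diff.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the real `TableId` type.
type TableId = u32;

/// Hypothetical error type; the real code builds `TableAlreadyExists` via snafu.
#[derive(Debug, PartialEq)]
enum PrepareError {
    TableAlreadyExists { table_name: String },
}

/// Returns `Ok(Some(id))` when the table exists and `create_if_not_exists` is set,
/// `Ok(None)` when the table does not exist, and an error otherwise.
fn on_prepare(
    existing: &HashMap<String, TableId>,
    table_name: &str,
    create_if_not_exists: bool,
) -> Result<Option<TableId>, PrepareError> {
    match existing.get(table_name) {
        // The table exists and the caller tolerates that: reuse its id.
        Some(&table_id) if create_if_not_exists => Ok(Some(table_id)),
        // The table exists and the caller asked for a strict create: abort.
        Some(_) => Err(PrepareError::TableAlreadyExists {
            table_name: table_name.to_string(),
        }),
        // Nothing registered under this name: the procedure may continue.
        None => Ok(None),
    }
}
```

Returning `Ok(Some(id))` lets the caller finish early with the existing table id, which appears to be what the `Status::Done { output: Some(..) }` assertion in the tests relies on.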

View File

@@ -120,7 +120,13 @@ impl State for DropDatabaseExecutor {
.await?;
executor.invalidate_table_cache(ddl_ctx).await?;
executor
.on_drop_regions(ddl_ctx, &self.physical_region_routes, true)
.on_drop_regions(
&ddl_ctx.node_manager,
&ddl_ctx.leader_region_registry,
&self.physical_region_routes,
true,
false,
)
.await?;
info!("Table: {}({}) is dropped", self.table_name, self.table_id);

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub(crate) mod executor;
pub mod executor;
mod metadata;
use std::collections::HashMap;
@@ -156,7 +156,13 @@ impl DropTableProcedure {
pub async fn on_datanode_drop_regions(&mut self) -> Result<Status> {
self.executor
.on_drop_regions(&self.context, &self.data.physical_region_routes, false)
.on_drop_regions(
&self.context.node_manager,
&self.context.leader_region_registry,
&self.data.physical_region_routes,
false,
false,
)
.await?;
self.data.state = DropTableState::DeleteTombstone;
Ok(Status::executing(true))

View File

@@ -36,6 +36,8 @@ use crate::error::{self, Result};
use crate::instruction::CacheIdent;
use crate::key::table_name::TableNameKey;
use crate::key::table_route::TableRouteValue;
use crate::node_manager::NodeManagerRef;
use crate::region_registry::LeaderRegionRegistryRef;
use crate::rpc::router::{
RegionRoute, find_follower_regions, find_followers, find_leader_regions, find_leaders,
operating_leader_regions,
@@ -212,16 +214,18 @@ impl DropTableExecutor {
/// Drops region on datanode.
pub async fn on_drop_regions(
&self,
ctx: &DdlContext,
node_manager: &NodeManagerRef,
leader_region_registry: &LeaderRegionRegistryRef,
region_routes: &[RegionRoute],
fast_path: bool,
force: bool,
) -> Result<()> {
// Drops leader regions on datanodes.
let leaders = find_leaders(region_routes);
let mut drop_region_tasks = Vec::with_capacity(leaders.len());
let table_id = self.table_id;
for datanode in leaders {
let requester = ctx.node_manager.datanode(&datanode).await;
let requester = node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode);
let region_ids = regions
.iter()
@@ -238,6 +242,7 @@ impl DropTableExecutor {
body: Some(region_request::Body::Drop(PbDropRegionRequest {
region_id: region_id.as_u64(),
fast_path,
force,
})),
};
let datanode = datanode.clone();
@@ -262,7 +267,7 @@ impl DropTableExecutor {
let followers = find_followers(region_routes);
let mut close_region_tasks = Vec::with_capacity(followers.len());
for datanode in followers {
let requester = ctx.node_manager.datanode(&datanode).await;
let requester = node_manager.datanode(&datanode).await;
let regions = find_follower_regions(region_routes, &datanode);
let region_ids = regions
.iter()
@@ -307,8 +312,7 @@ impl DropTableExecutor {
// Deletes the leader region from registry.
let region_ids = operating_leader_regions(region_routes);
ctx.leader_region_registry
.batch_delete(region_ids.into_iter().map(|(region_id, _)| region_id));
leader_region_registry.batch_delete(region_ids.into_iter().map(|(region_id, _)| region_id));
Ok(())
}

View File

@@ -128,7 +128,6 @@ pub fn build_raw_table_info_from_expr(expr: &CreateTableExpr) -> RawTableInfo {
value_indices: vec![],
engine: expr.engine.clone(),
next_column_id: expr.column_defs.len() as u32,
region_numbers: vec![],
options: TableOptions::try_from_iter(&expr.table_options).unwrap(),
created_on: DateTime::default(),
updated_on: DateTime::default(),

View File

@@ -166,7 +166,7 @@ async fn test_on_prepare_logical_table_exists_err() {
.table_metadata_manager
.create_logical_tables_metadata(vec![(
task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(1025, 1)]),
TableRouteValue::logical(1024),
)])
.await
.unwrap();
@@ -208,7 +208,7 @@ async fn test_on_prepare_with_create_if_table_exists() {
.table_metadata_manager
.create_logical_tables_metadata(vec![(
task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]),
TableRouteValue::logical(1024),
)])
.await
.unwrap();
@@ -252,7 +252,7 @@ async fn test_on_prepare_part_logical_tables_exist() {
.table_metadata_manager
.create_logical_tables_metadata(vec![(
task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]),
TableRouteValue::logical(1024),
)])
.await
.unwrap();
@@ -392,7 +392,7 @@ async fn test_on_create_metadata_part_logical_tables_exist() {
.table_metadata_manager
.create_logical_tables_metadata(vec![(
task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]),
TableRouteValue::logical(1024),
)])
.await
.unwrap();
@@ -496,10 +496,7 @@ async fn test_on_create_metadata_err() {
task.table_info.ident.table_id = 1025;
ddl_context
.table_metadata_manager
.create_logical_tables_metadata(vec![(
task.table_info,
TableRouteValue::logical(512, vec![RegionId::new(1026, 1)]),
)])
.create_logical_tables_metadata(vec![(task.table_info, TableRouteValue::logical(512))])
.await
.unwrap();
// Triggers procedure to create table metadata

View File

@@ -162,7 +162,7 @@ async fn test_on_prepare_table_exists_err() {
)
.await
.unwrap();
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let err = procedure.on_prepare().await.unwrap_err();
assert_matches!(err, Error::TableAlreadyExists { .. });
assert_eq!(err.status_code(), StatusCode::TableAlreadyExists);
@@ -185,7 +185,7 @@ async fn test_on_prepare_with_create_if_table_exists() {
)
.await
.unwrap();
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let status = procedure.on_prepare().await.unwrap();
assert_matches!(status, Status::Done { output: Some(..) });
let table_id = *status.downcast_output_ref::<u32>().unwrap();
@@ -198,7 +198,7 @@ async fn test_on_prepare_without_create_if_table_exists() {
let ddl_context = new_ddl_context(node_manager);
let mut task = test_create_table_task("foo");
task.create_table.create_if_not_exists = true;
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let status = procedure.on_prepare().await.unwrap();
assert_matches!(
status,
@@ -217,7 +217,7 @@ async fn test_on_datanode_create_regions_should_retry() {
let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),
@@ -234,7 +234,7 @@ async fn test_on_datanode_create_regions_should_not_retry() {
let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context);
let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),
@@ -251,7 +251,7 @@ async fn test_on_create_metadata_error() {
let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task.clone(), ddl_context.clone());
let mut procedure = CreateTableProcedure::new(task.clone(), ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),
@@ -284,7 +284,7 @@ async fn test_on_create_metadata() {
let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone());
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),
@@ -312,16 +312,16 @@ async fn test_memory_region_keeper_guard_dropped_on_procedure_done() {
let ddl_context = new_ddl_context_with_kv_backend(node_manager, kv_backend);
let task = test_create_table_task("foo");
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone());
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
execute_procedure_until(&mut procedure, |p| {
p.creator.data.state == CreateTableState::CreateMetadata
p.data.state == CreateTableState::CreateMetadata
})
.await;
// Ensure that after running to the state `CreateMetadata`(just past `DatanodeCreateRegions`),
// the opening regions should be recorded:
let guards = &procedure.creator.opening_regions;
let guards = &procedure.opening_regions;
assert_eq!(guards.len(), 1);
let (datanode_id, region_id) = (0, RegionId::new(procedure.table_id(), 0));
assert_eq!(guards[0].info(), (datanode_id, region_id));
@@ -334,7 +334,7 @@ async fn test_memory_region_keeper_guard_dropped_on_procedure_done() {
execute_procedure_until_done(&mut procedure).await;
// Ensure that when run to the end, the opening regions should be cleared:
let guards = &procedure.creator.opening_regions;
let guards = &procedure.opening_regions;
assert!(guards.is_empty());
assert!(
!ddl_context

View File

@@ -259,7 +259,7 @@ async fn test_replace_table() {
{
// Create a `foo` table.
let task = test_create_table_task("foo");
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone());
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext {
procedure_id: ProcedureId::random(),

View File

@@ -231,7 +231,7 @@ impl DdlManager {
) -> Result<(ProcedureId, Option<Output>)> {
let context = self.create_context();
let procedure = CreateTableProcedure::new(create_table_task, context);
let procedure = CreateTableProcedure::new(create_table_task, context)?;
let procedure_with_id = ProcedureWithId::with_random_id(Box::new(procedure));

View File

@@ -530,6 +530,49 @@ impl Display for EnterStagingRegion {
}
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct RemapManifest {
pub region_id: RegionId,
/// Regions to remap manifests from.
pub input_regions: Vec<RegionId>,
/// For each old region, which new regions should receive its files
pub region_mapping: HashMap<RegionId, Vec<RegionId>>,
/// New partition expressions for the new regions.
pub new_partition_exprs: HashMap<RegionId, String>,
}
impl Display for RemapManifest {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"RemapManifest(region_id={}, input_regions={:?}, region_mapping={:?}, new_partition_exprs={:?})",
self.region_id, self.input_regions, self.region_mapping, self.new_partition_exprs
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct ApplyStagingManifest {
/// The region ID to apply the staging manifest to.
pub region_id: RegionId,
/// The partition expression of the staging region.
pub partition_expr: String,
/// The region that stores the staging manifests in its staging blob storage.
pub central_region_id: RegionId,
/// The relative path to the staging manifest within the central region's staging blob storage.
pub manifest_path: String,
}
impl Display for ApplyStagingManifest {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"ApplyStagingManifest(region_id={}, partition_expr={}, central_region_id={}, manifest_path={})",
self.region_id, self.partition_expr, self.central_region_id, self.manifest_path
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, Display, PartialEq)]
pub enum Instruction {
/// Opens regions.
@@ -559,6 +602,10 @@ pub enum Instruction {
Suspend,
/// Makes regions enter staging state.
EnterStagingRegions(Vec<EnterStagingRegion>),
/// Remaps manifests for a region.
RemapManifest(RemapManifest),
/// Applies staging manifests for a region.
ApplyStagingManifests(Vec<ApplyStagingManifest>),
}
impl Instruction {
@@ -737,6 +784,48 @@ impl EnterStagingRegionsReply {
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct RemapManifestReply {
/// Returns false if the region does not exist.
pub exists: bool,
/// A map from region IDs to their corresponding remapped manifest paths.
pub manifest_paths: HashMap<RegionId, String>,
/// The error, if any occurred during the operation.
pub error: Option<String>,
}
impl Display for RemapManifestReply {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"RemapManifestReply(manifest_paths={:?}, error={:?})",
self.manifest_paths, self.error
)
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct ApplyStagingManifestsReply {
pub replies: Vec<ApplyStagingManifestReply>,
}
impl ApplyStagingManifestsReply {
pub fn new(replies: Vec<ApplyStagingManifestReply>) -> Self {
Self { replies }
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct ApplyStagingManifestReply {
pub region_id: RegionId,
/// Returns true if the region is ready to serve reads and writes.
pub ready: bool,
/// Indicates whether the region exists.
pub exists: bool,
/// The error, if any occurred during the operation.
pub error: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum InstructionReply {
@@ -758,6 +847,8 @@ pub enum InstructionReply {
GetFileRefs(GetFileRefsReply),
GcRegions(GcRegionsReply),
EnterStagingRegions(EnterStagingRegionsReply),
RemapManifest(RemapManifestReply),
ApplyStagingManifests(ApplyStagingManifestsReply),
}
impl Display for InstructionReply {
@@ -781,6 +872,12 @@ impl Display for InstructionReply {
reply.replies
)
}
Self::RemapManifest(reply) => write!(f, "InstructionReply::RemapManifest({})", reply),
Self::ApplyStagingManifests(reply) => write!(
f,
"InstructionReply::ApplyStagingManifests({:?})",
reply.replies
),
}
}
}
@@ -828,6 +925,20 @@ impl InstructionReply {
_ => panic!("Expected EnterStagingRegion reply"),
}
}
pub fn expect_remap_manifest_reply(self) -> RemapManifestReply {
match self {
Self::RemapManifest(reply) => reply,
_ => panic!("Expected RemapManifest reply"),
}
}
pub fn expect_apply_staging_manifests_reply(self) -> Vec<ApplyStagingManifestReply> {
match self {
Self::ApplyStagingManifests(reply) => reply.replies,
_ => panic!("Expected ApplyStagingManifests reply"),
}
}
}
#[cfg(test)]
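
The new `expect_*` accessors panic when the reply variant does not match. The sketch below shows how a caller might consume these replies. It uses trimmed stand-ins for the reply types, and both `into_apply_staging_manifests` and `all_applied` are hypothetical helpers that do not appear in the diff.

```rust
/// Simplified stand-ins for the reply types in the diff (fields trimmed for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct RemapManifestReply {
    pub exists: bool,
    pub error: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ApplyStagingManifestReply {
    pub ready: bool,
    pub exists: bool,
    pub error: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum InstructionReply {
    RemapManifest(RemapManifestReply),
    ApplyStagingManifests(Vec<ApplyStagingManifestReply>),
}

impl InstructionReply {
    /// Mirrors the panicking accessor from the diff.
    pub fn expect_remap_manifest_reply(self) -> RemapManifestReply {
        match self {
            Self::RemapManifest(reply) => reply,
            _ => panic!("Expected RemapManifest reply"),
        }
    }

    /// Hypothetical non-panicking variant; not part of the diff.
    pub fn into_apply_staging_manifests(self) -> Option<Vec<ApplyStagingManifestReply>> {
        match self {
            Self::ApplyStagingManifests(replies) => Some(replies),
            _ => None,
        }
    }
}

/// Hypothetical helper: true only if every region exists, is ready, and reported no error.
/// Note that an empty slice also yields `true`.
pub fn all_applied(replies: &[ApplyStagingManifestReply]) -> bool {
    replies
        .iter()
        .all(|r| r.exists && r.ready && r.error.is_none())
}
```

The panicking form fits call sites that have just sent the matching instruction. An `Option`-returning form may be preferable where the reply comes from an untrusted or versioned peer.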

View File

@@ -747,12 +747,10 @@ impl TableMetadataManager {
/// The caller MUST ensure it has the exclusive access to `TableNameKey`.
pub async fn create_table_metadata(
&self,
mut table_info: RawTableInfo,
table_info: RawTableInfo,
table_route_value: TableRouteValue,
region_wal_options: HashMap<RegionNumber, String>,
) -> Result<()> {
let region_numbers = table_route_value.region_numbers();
table_info.meta.region_numbers = region_numbers;
let table_id = table_info.ident.table_id;
let engine = table_info.meta.engine.clone();
@@ -851,8 +849,7 @@ impl TableMetadataManager {
on_create_table_route_failure: F2,
}
let mut on_failures = Vec::with_capacity(len);
for (mut table_info, table_route_value) in tables_data {
table_info.meta.region_numbers = table_route_value.region_numbers();
for (table_info, table_route_value) in tables_data {
let table_id = table_info.ident.table_id;
// Creates table name.
@@ -1543,8 +1540,8 @@ mod tests {
}
}
fn new_test_table_info(region_numbers: impl Iterator<Item = u32>) -> TableInfo {
test_utils::new_test_table_info(10, region_numbers)
fn new_test_table_info() -> TableInfo {
test_utils::new_test_table_info(10)
}
fn new_test_table_names() -> HashSet<TableName> {
@@ -1602,8 +1599,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv.clone());
let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let wal_allocator = WalOptionsAllocator::RaftEngine;
let regions = (0..16).collect();
let region_wal_options =
@@ -1630,8 +1626,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let region_wal_options = create_mock_region_wal_options()
.into_iter()
.map(|(k, v)| (k, serde_json::to_string(&v).unwrap()))
@@ -1713,8 +1708,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
let table_route_value = TableRouteValue::physical(region_routes.clone());
@@ -1779,7 +1773,6 @@ mod tests {
let table_info: RawTableInfo = test_utils::new_test_table_info_with_name(
table_id,
&format!("my_table_{}", table_id),
region_routes.iter().map(|r| r.region.id.region_number()),
)
.into();
let table_route_value = TableRouteValue::physical(region_routes.clone());
@@ -1800,8 +1793,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
let datanode_id = 2;
let region_wal_options = create_mock_region_wal_options();
@@ -1907,8 +1899,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
// creates metadata.
create_physical_table_metadata(
@@ -1984,8 +1975,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
// creates metadata.
create_physical_table_metadata(
@@ -2070,8 +2060,7 @@ mod tests {
leader_down_since: None,
},
];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
let current_table_route_value = DeserializedValueWithBytes::from_inner(
TableRouteValue::physical(region_routes.clone()),
@@ -2153,8 +2142,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo =
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_info: RawTableInfo = new_test_table_info().into();
let table_id = table_info.ident.table_id;
let engine = table_info.meta.engine.as_str();
let region_storage_path =
@@ -2408,7 +2396,7 @@ mod tests {
let mem_kv = Arc::new(MemoryKvBackend::default());
let table_metadata_manager = TableMetadataManager::new(mem_kv);
let view_info: RawTableInfo = new_test_table_info(Vec::<u32>::new().into_iter()).into();
let view_info: RawTableInfo = new_test_table_info().into();
let view_id = view_info.ident.table_id;

View File

@@ -338,7 +338,6 @@ mod tests {
next_column_id: 3,
value_indices: vec![2, 3],
options: Default::default(),
region_numbers: vec![1],
partition_key_indices: vec![],
column_ids: vec![],
};

View File

@@ -71,7 +71,6 @@ pub struct PhysicalTableRouteValue {
#[derive(Debug, PartialEq, Serialize, Deserialize, Clone)]
pub struct LogicalTableRouteValue {
physical_table_id: TableId,
region_ids: Vec<RegionId>,
}
impl TableRouteValue {
@@ -85,14 +84,7 @@ impl TableRouteValue {
if table_id == physical_table_id {
TableRouteValue::physical(region_routes)
} else {
let region_routes = region_routes
.into_iter()
.map(|region| {
debug_assert_eq!(region.region.id.table_id(), physical_table_id);
RegionId::new(table_id, region.region.id.region_number())
})
.collect();
TableRouteValue::logical(physical_table_id, region_routes)
TableRouteValue::logical(physical_table_id)
}
}
@@ -100,8 +92,8 @@ impl TableRouteValue {
Self::Physical(PhysicalTableRouteValue::new(region_routes))
}
pub fn logical(physical_table_id: TableId, region_ids: Vec<RegionId>) -> Self {
Self::Logical(LogicalTableRouteValue::new(physical_table_id, region_ids))
pub fn logical(physical_table_id: TableId) -> Self {
Self::Logical(LogicalTableRouteValue::new(physical_table_id))
}
/// Returns a new version [TableRouteValue] with `region_routes`.
@@ -207,11 +199,9 @@ impl TableRouteValue {
.iter()
.map(|region_route| region_route.region.id.region_number())
.collect(),
TableRouteValue::Logical(x) => x
.region_ids()
.iter()
.map(|region_id| region_id.region_number())
.collect(),
TableRouteValue::Logical(_) => {
vec![]
}
}
}
}
@@ -245,20 +235,13 @@ impl PhysicalTableRouteValue {
}
impl LogicalTableRouteValue {
pub fn new(physical_table_id: TableId, region_ids: Vec<RegionId>) -> Self {
Self {
physical_table_id,
region_ids,
}
pub fn new(physical_table_id: TableId) -> Self {
Self { physical_table_id }
}
pub fn physical_table_id(&self) -> TableId {
self.physical_table_id
}
pub fn region_ids(&self) -> &Vec<RegionId> {
&self.region_ids
}
}
impl MetadataKey<'_, TableRouteKey> for TableRouteKey {
@@ -900,7 +883,6 @@ mod tests {
let table_route_manager = TableRouteManager::new(kv.clone());
let table_route_value = TableRouteValue::Logical(LogicalTableRouteValue {
physical_table_id: 1023,
region_ids: vec![RegionId::new(1023, 1)],
});
let (txn, _) = table_route_manager
.table_route_storage()
@@ -930,14 +912,12 @@ mod tests {
1024,
TableRouteValue::Logical(LogicalTableRouteValue {
physical_table_id: 1023,
region_ids: vec![RegionId::new(1023, 1)],
}),
),
(
1025,
TableRouteValue::Logical(LogicalTableRouteValue {
physical_table_id: 1023,
region_ids: vec![RegionId::new(1023, 2)],
}),
),
];
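
With `region_ids` removed, a logical route only records its physical table, and `region_numbers()` returns an empty list for it. The sketch below illustrates one way a caller could still obtain region numbers by following the logical route to its physical table. The enum is a simplified stand-in and `resolve_region_numbers` is a hypothetical helper, not something this diff introduces.

```rust
use std::collections::HashMap;

type TableId = u32;
type RegionNumber = u32;

/// Simplified stand-in for the route value in the diff.
#[derive(Debug, Clone, PartialEq)]
enum TableRouteValue {
    Physical { region_numbers: Vec<RegionNumber> },
    Logical { physical_table_id: TableId },
}

impl TableRouteValue {
    /// Matches the behavior after the change: logical routes carry no regions.
    fn region_numbers(&self) -> Vec<RegionNumber> {
        match self {
            Self::Physical { region_numbers } => region_numbers.clone(),
            Self::Logical { .. } => vec![],
        }
    }
}

/// Hypothetical helper: resolves region numbers through the physical table.
/// Returns `None` if a route is missing or a logical route points at another logical route.
fn resolve_region_numbers(
    routes: &HashMap<TableId, TableRouteValue>,
    table_id: TableId,
) -> Option<Vec<RegionNumber>> {
    match routes.get(&table_id)? {
        TableRouteValue::Physical { region_numbers } => Some(region_numbers.clone()),
        TableRouteValue::Logical { physical_table_id } => match routes.get(physical_table_id)? {
            TableRouteValue::Physical { region_numbers } => Some(region_numbers.clone()),
            TableRouteValue::Logical { .. } => None,
        },
    }
}
```

Keeping the region list only on the physical route avoids a second copy that could drift when the physical table's regions change.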

View File

@@ -19,11 +19,7 @@ use datatypes::schema::{ColumnSchema, SchemaBuilder};
use store_api::storage::TableId;
use table::metadata::{TableInfo, TableInfoBuilder, TableMetaBuilder};
pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
table_id: TableId,
table_name: &str,
region_numbers: I,
) -> TableInfo {
pub fn new_test_table_info_with_name(table_id: TableId, table_name: &str) -> TableInfo {
let column_schemas = vec![
ColumnSchema::new("col1", ConcreteDataType::int32_datatype(), true),
ColumnSchema::new(
@@ -45,7 +41,6 @@ pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
.primary_key_indices(vec![0])
.engine("engine")
.next_column_id(3)
.region_numbers(region_numbers.into_iter().collect::<Vec<_>>())
.build()
.unwrap();
TableInfoBuilder::default()
@@ -56,9 +51,6 @@ pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
.build()
.unwrap()
}
pub fn new_test_table_info<I: IntoIterator<Item = u32>>(
table_id: TableId,
region_numbers: I,
) -> TableInfo {
new_test_table_info_with_name(table_id, "mytable", region_numbers)
pub fn new_test_table_info(table_id: TableId) -> TableInfo {
new_test_table_info_with_name(table_id, "mytable")
}

View File

@@ -868,6 +868,8 @@ impl PgStore {
let client = match pool.get().await {
Ok(client) => client,
Err(e) => {
// Log the error's debug representation to help diagnose the issue.
common_telemetry::error!(e; "Failed to get Postgres connection.");
return GetPostgresConnectionSnafu {
reason: e.to_string(),
}

View File

@@ -1639,7 +1639,6 @@ mod tests {
value_indices: vec![2],
engine: METRIC_ENGINE_NAME.to_string(),
next_column_id: 0,
region_numbers: vec![0],
options: Default::default(),
created_on: Default::default(),
updated_on: Default::default(),

View File

@@ -21,6 +21,7 @@ use std::sync::Arc;
use std::task::{Context, Poll};
use common_base::readable_size::ReadableSize;
use common_telemetry::tracing::{Span, info_span};
use common_time::util::format_nanoseconds_human_readable;
use datafusion::arrow::compute::cast;
use datafusion::arrow::datatypes::SchemaRef as DfSchemaRef;
@@ -218,6 +219,7 @@ pub struct RecordBatchStreamAdapter {
metrics_2: Metrics,
/// Display plan and metrics in verbose mode.
explain_verbose: bool,
span: Span,
}
/// Json encoded metrics. Contains metric from a whole plan tree.
@@ -238,22 +240,21 @@ impl RecordBatchStreamAdapter {
metrics: None,
metrics_2: Metrics::Unavailable,
explain_verbose: false,
span: Span::current(),
})
}
pub fn try_new_with_metrics_and_df_plan(
stream: DfSendableRecordBatchStream,
metrics: BaselineMetrics,
df_plan: Arc<dyn ExecutionPlan>,
) -> Result<Self> {
pub fn try_new_with_span(stream: DfSendableRecordBatchStream, span: Span) -> Result<Self> {
let schema =
Arc::new(Schema::try_from(stream.schema()).context(error::SchemaConversionSnafu)?);
let subspan = info_span!(parent: &span, "RecordBatchStreamAdapter");
Ok(Self {
schema,
stream,
metrics: Some(metrics),
metrics_2: Metrics::Unresolved(df_plan),
metrics: None,
metrics_2: Metrics::Unavailable,
explain_verbose: false,
span: subspan,
})
}
@@ -300,6 +301,8 @@ impl Stream for RecordBatchStreamAdapter {
.map(|m| m.elapsed_compute().clone())
.unwrap_or_default();
let _guard = timer.timer();
let poll_span = info_span!(parent: &self.span, "poll_next");
let _entered = poll_span.enter();
match Pin::new(&mut self.stream).poll_next(cx) {
Poll::Pending => Poll::Pending,
Poll::Ready(Some(df_record_batch)) => {

View File

@@ -29,6 +29,7 @@ use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use adapter::RecordBatchMetrics;
use arc_swap::ArcSwapOption;
use common_base::readable_size::ReadableSize;
use common_telemetry::tracing::Span;
pub use datafusion::physical_plan::SendableRecordBatchStream as DfSendableRecordBatchStream;
use datatypes::arrow::array::{ArrayRef, AsArray, StringBuilder};
use datatypes::arrow::compute::SortOptions;
@@ -370,6 +371,7 @@ pub struct RecordBatchStreamWrapper<S> {
pub stream: S,
pub output_ordering: Option<Vec<OrderOption>>,
pub metrics: Arc<ArcSwapOption<RecordBatchMetrics>>,
pub span: Span,
}
impl<S> RecordBatchStreamWrapper<S> {
@@ -380,6 +382,7 @@ impl<S> RecordBatchStreamWrapper<S> {
stream,
output_ordering: None,
metrics: Default::default(),
span: Span::current(),
}
}
}
@@ -408,6 +411,7 @@ impl<S: Stream<Item = Result<RecordBatch>> + Unpin> Stream for RecordBatchStream
type Item = Result<RecordBatch>;
fn poll_next(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
let _entered = self.span.clone().entered();
Pin::new(&mut self.stream).poll_next(ctx)
}
}

View File

@@ -77,4 +77,5 @@ common-query.workspace = true
common-test-util.workspace = true
datafusion-common.workspace = true
mito2 = { workspace = true, features = ["test"] }
partition.workspace = true
session.workspace = true

View File

@@ -14,7 +14,7 @@
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use api::v1::meta::GrantedRegion;
use async_trait::async_trait;
@@ -50,7 +50,7 @@ use crate::region_server::RegionServer;
pub struct RegionAliveKeeper {
region_server: RegionServer,
tasks: Arc<Mutex<HashMap<RegionId, Arc<CountdownTaskHandle>>>>,
heartbeat_interval_millis: u64,
heartbeat_interval_millis: Arc<AtomicU64>,
started: Arc<AtomicBool>,
/// The epoch when [RegionAliveKeeper] is created. It's used to get a monotonically non-decreasing
@@ -67,18 +67,26 @@ impl RegionAliveKeeper {
pub fn new(
region_server: RegionServer,
countdown_task_handler_ext: Option<CountdownTaskHandlerExtRef>,
heartbeat_interval_millis: u64,
heartbeat_interval: Duration,
) -> Self {
Self {
region_server,
tasks: Arc::new(Mutex::new(HashMap::new())),
heartbeat_interval_millis,
heartbeat_interval_millis: Arc::new(AtomicU64::new(
heartbeat_interval.as_millis() as u64
)),
started: Arc::new(AtomicBool::new(false)),
epoch: Instant::now(),
countdown_task_handler_ext,
}
}
/// Update the heartbeat interval with the value received from Metasrv.
pub fn update_heartbeat_interval(&self, heartbeat_interval_millis: u64) {
self.heartbeat_interval_millis
.store(heartbeat_interval_millis, Ordering::Relaxed);
}
async fn find_handle(&self, region_id: RegionId) -> Option<Arc<CountdownTaskHandle>> {
self.tasks.lock().await.get(&region_id).cloned()
}
@@ -108,7 +116,9 @@ impl RegionAliveKeeper {
};
if should_start {
handle.start(self.heartbeat_interval_millis).await;
handle
.start(self.heartbeat_interval_millis.load(Ordering::Relaxed))
.await;
info!("Region alive countdown for region {region_id} is started!");
} else {
info!(
@@ -230,8 +240,9 @@ impl RegionAliveKeeper {
}
let tasks = self.tasks.lock().await;
let interval = self.heartbeat_interval_millis.load(Ordering::Relaxed);
for task in tasks.values() {
task.start(self.heartbeat_interval_millis).await;
task.start(interval).await;
}
info!(
@@ -505,7 +516,11 @@ mod test {
let engine = Arc::new(engine);
region_server.register_engine(engine.clone());
let alive_keeper = Arc::new(RegionAliveKeeper::new(region_server.clone(), None, 100));
let alive_keeper = Arc::new(RegionAliveKeeper::new(
region_server.clone(),
None,
Duration::from_millis(100),
));
let region_id = RegionId::new(1024, 1);
let builder = CreateRequestBuilder::new();
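
The keeper now stores the heartbeat interval in an `Arc<AtomicU64>` so it can be changed after construction through a shared reference. Here is a minimal sketch of that pattern using only the standard library; `AliveKeeper` and its `heartbeat_interval` getter are simplified, hypothetical stand-ins for the real `RegionAliveKeeper`.

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Simplified stand-in for `RegionAliveKeeper`, keeping only the interval field.
#[derive(Clone)]
struct AliveKeeper {
    heartbeat_interval_millis: Arc<AtomicU64>,
}

impl AliveKeeper {
    fn new(heartbeat_interval: Duration) -> Self {
        Self {
            heartbeat_interval_millis: Arc::new(AtomicU64::new(
                heartbeat_interval.as_millis() as u64,
            )),
        }
    }

    /// Overwrites the interval; takes `&self` because the atomic provides interior mutability.
    fn update_heartbeat_interval(&self, heartbeat_interval_millis: u64) {
        self.heartbeat_interval_millis
            .store(heartbeat_interval_millis, Ordering::Relaxed);
    }

    /// Hypothetical getter: reads the latest interval as a `Duration`.
    fn heartbeat_interval(&self) -> Duration {
        Duration::from_millis(self.heartbeat_interval_millis.load(Ordering::Relaxed))
    }
}
```

`Ordering::Relaxed` is likely sufficient here because the interval is a standalone value and no other memory is published through it. Every clone of the keeper observes the update because the clones share the same `Arc`.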

View File

@@ -29,7 +29,6 @@ pub(crate) use object_store::config::ObjectStoreConfig;
use query::options::QueryOptions;
use serde::{Deserialize, Serialize};
use servers::grpc::GrpcOptions;
use servers::heartbeat_options::HeartbeatOptions;
use servers::http::HttpOptions;
/// Storage engine config
@@ -71,7 +70,6 @@ pub struct DatanodeOptions {
pub init_regions_in_background: bool,
pub init_regions_parallelism: usize,
pub grpc: GrpcOptions,
pub heartbeat: HeartbeatOptions,
pub http: HttpOptions,
pub meta_client: Option<MetaClientOptions>,
pub wal: DatanodeWalConfig,
@@ -134,7 +132,6 @@ impl Default for DatanodeOptions {
RegionEngineConfig::File(FileEngineConfig::default()),
],
logging: LoggingOptions::default(),
heartbeat: HeartbeatOptions::datanode_default(),
enable_telemetry: true,
tracing: TracingOptions::default(),
query: QueryOptions::default(),

View File

@@ -201,6 +201,7 @@ pub enum Error {
ShutdownServer {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
source: servers::error::Error,
},
@@ -208,6 +209,7 @@ pub enum Error {
ShutdownInstance {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
source: BoxedError,
},

View File

@@ -22,7 +22,7 @@ use api::v1::meta::{DatanodeWorkloads, HeartbeatRequest, NodeInfo, Peer, RegionR
use common_base::Plugins;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::datanode::REGION_STATISTIC_KEY;
use common_meta::distributed_time_constants::META_KEEP_ALIVE_INTERVAL_SECS;
use common_meta::distributed_time_constants::BASE_HEARTBEAT_INTERVAL;
use common_meta::heartbeat::handler::invalidate_table_cache::InvalidateCacheHandler;
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::suspend::SuspendHandler;
@@ -35,6 +35,7 @@ use common_stat::ResourceStatRef;
use common_telemetry::{debug, error, info, trace, warn};
use common_workload::DatanodeWorkloadType;
use meta_client::MetaClientRef;
use meta_client::client::heartbeat::HeartbeatConfig;
use meta_client::client::{HeartbeatSender, MetaClient};
use servers::addrs;
use snafu::{OptionExt as _, ResultExt};
@@ -61,7 +62,6 @@ pub struct HeartbeatTask {
running: Arc<AtomicBool>,
meta_client: MetaClientRef,
region_server: RegionServer,
interval: u64,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
region_alive_keeper: Arc<RegionAliveKeeper>,
resource_stat: ResourceStatRef,
@@ -87,7 +87,7 @@ impl HeartbeatTask {
let region_alive_keeper = Arc::new(RegionAliveKeeper::new(
region_server.clone(),
countdown_task_handler_ext,
opts.heartbeat.interval.as_millis() as u64,
BASE_HEARTBEAT_INTERVAL,
));
let resp_handler_executor = Arc::new(HandlerGroupExecutor::new(vec![
region_alive_keeper.clone(),
@@ -109,7 +109,6 @@ impl HeartbeatTask {
running: Arc::new(AtomicBool::new(false)),
meta_client,
region_server,
interval: opts.heartbeat.interval.as_millis() as u64,
resp_handler_executor,
region_alive_keeper,
resource_stat,
@@ -123,9 +122,9 @@ impl HeartbeatTask {
mailbox: MailboxRef,
mut notify: Option<Arc<Notify>>,
quit_signal: Arc<Notify>,
) -> Result<HeartbeatSender> {
) -> Result<(HeartbeatSender, HeartbeatConfig)> {
let client_id = meta_client.id();
let (tx, mut rx) = meta_client.heartbeat().await.context(MetaClientInitSnafu)?;
let (tx, mut rx, config) = meta_client.heartbeat().await.context(MetaClientInitSnafu)?;
let mut last_received_lease = Instant::now();
@@ -175,7 +174,7 @@ impl HeartbeatTask {
quit_signal.notify_one();
info!("Heartbeat handling loop exit.");
});
Ok(tx)
Ok((tx, config))
}
async fn handle_response(
@@ -204,13 +203,9 @@ impl HeartbeatTask {
warn!("Heartbeat task started multiple times");
return Ok(());
}
let interval = self.interval;
let node_id = self.node_id;
let node_epoch = self.node_epoch;
let addr = &self.peer_addr;
info!(
"Starting heartbeat to Metasrv with interval {interval}. My node id is {node_id}, address is {addr}."
);
let meta_client = self.meta_client.clone();
let region_server_clone = self.region_server.clone();
@@ -222,7 +217,7 @@ impl HeartbeatTask {
let quit_signal = Arc::new(Notify::new());
let mut tx = Self::create_streams(
let (mut tx, config) = Self::create_streams(
&meta_client,
running.clone(),
handler_executor.clone(),
@@ -232,6 +227,17 @@ impl HeartbeatTask {
)
.await?;
let interval = config.interval.as_millis() as u64;
let mut retry_interval = config.retry_interval;
// Update RegionAliveKeeper with the interval from Metasrv
self.region_alive_keeper.update_heartbeat_interval(interval);
info!(
"Starting heartbeat to Metasrv with config: {}. My node id is {}, address is {}.",
config, node_id, addr
);
let self_peer = Some(Peer {
id: node_id,
addr: addr.clone(),
@@ -244,6 +250,7 @@ impl HeartbeatTask {
let total_cpu_millicores = self.resource_stat.get_total_cpu_millicores();
let total_memory_bytes = self.resource_stat.get_total_memory_bytes();
let resource_stat = self.resource_stat.clone();
let region_alive_keeper = self.region_alive_keeper.clone();
let gc_limiter = self
.region_server
.mito_engine()
@@ -363,20 +370,23 @@ impl HeartbeatTask {
)
.await
{
Ok(new_tx) => {
info!("Reconnected to metasrv");
Ok((new_tx, new_config)) => {
info!("Reconnected to metasrv, heartbeat config: {}", new_config);
tx = new_tx;
// Update retry_interval from new config
retry_interval = new_config.retry_interval;
// Update region_alive_keeper's heartbeat interval
region_alive_keeper.update_heartbeat_interval(
new_config.interval.as_millis() as u64,
);
// Triggers to send heartbeat immediately.
sleep.as_mut().reset(Instant::now());
}
Err(e) => {
// Before META_LEASE_SECS expires, any retry is meaningless
// because it always reads the old meta leader address.
// Triggers to retry after META_KEEP_ALIVE_INTERVAL_SECS.
sleep.as_mut().reset(
Instant::now()
+ Duration::from_secs(META_KEEP_ALIVE_INTERVAL_SECS),
);
// Triggers to retry after retry_interval from Metasrv config.
sleep.as_mut().reset(Instant::now() + retry_interval);
error!(e; "Failed to reconnect to metasrv!");
}
}
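
The reconnect branch above picks the next wake-up time from the outcome: send a heartbeat immediately after a successful reconnect, otherwise wait for the retry interval most recently received from Metasrv. A small sketch of that decision as a pure function follows. `RetryConfig` and `next_wakeup` are hypothetical names used only for illustration; the real `HeartbeatConfig` may carry more fields.

```rust
use std::time::{Duration, Instant};

/// Hypothetical subset of the heartbeat config returned by the server.
#[derive(Debug, Clone, Copy)]
pub struct RetryConfig {
    pub interval: Duration,
    pub retry_interval: Duration,
}

/// Returns the next wake-up deadline and the retry interval to use from now on.
///
/// - On success: wake up immediately and adopt the new retry interval.
/// - On failure: wake up after the current retry interval and keep it unchanged.
pub fn next_wakeup<E>(
    now: Instant,
    reconnect: &Result<RetryConfig, E>,
    current_retry: Duration,
) -> (Instant, Duration) {
    match reconnect {
        Ok(config) => (now, config.retry_interval),
        Err(_) => (now + current_retry, current_retry),
    }
}
```

Keeping the decision in a pure function makes it easy to test without a running stream or a timer.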

View File

@@ -22,6 +22,7 @@ use common_telemetry::error;
use snafu::OptionExt;
use store_api::storage::GcReport;
mod apply_staging_manifest;
mod close_region;
mod downgrade_region;
mod enter_staging;
@@ -29,8 +30,10 @@ mod file_ref;
mod flush_region;
mod gc_worker;
mod open_region;
mod remap_manifest;
mod upgrade_region;
use crate::heartbeat::handler::apply_staging_manifest::ApplyStagingManifestsHandler;
use crate::heartbeat::handler::close_region::CloseRegionsHandler;
use crate::heartbeat::handler::downgrade_region::DowngradeRegionsHandler;
use crate::heartbeat::handler::enter_staging::EnterStagingRegionsHandler;
@@ -38,6 +41,7 @@ use crate::heartbeat::handler::file_ref::GetFileRefsHandler;
use crate::heartbeat::handler::flush_region::FlushRegionsHandler;
use crate::heartbeat::handler::gc_worker::GcRegionsHandler;
use crate::heartbeat::handler::open_region::OpenRegionsHandler;
use crate::heartbeat::handler::remap_manifest::RemapManifestHandler;
use crate::heartbeat::handler::upgrade_region::UpgradeRegionsHandler;
use crate::heartbeat::task_tracker::TaskTracker;
use crate::region_server::RegionServer;
@@ -128,6 +132,10 @@ impl RegionHeartbeatResponseHandler {
Instruction::EnterStagingRegions(_) => {
Ok(Some(Box::new(EnterStagingRegionsHandler.into())))
}
Instruction::RemapManifest(_) => Ok(Some(Box::new(RemapManifestHandler.into()))),
Instruction::ApplyStagingManifests(_) => {
Ok(Some(Box::new(ApplyStagingManifestsHandler.into())))
}
}
}
}
@@ -142,6 +150,8 @@ pub enum InstructionHandlers {
GetFileRefs(GetFileRefsHandler),
GcRegions(GcRegionsHandler),
EnterStagingRegions(EnterStagingRegionsHandler),
RemapManifest(RemapManifestHandler),
ApplyStagingManifests(ApplyStagingManifestsHandler),
}
macro_rules! impl_from_handler {
@@ -164,7 +174,9 @@ impl_from_handler!(
UpgradeRegionsHandler => UpgradeRegions,
GetFileRefsHandler => GetFileRefs,
GcRegionsHandler => GcRegions,
EnterStagingRegionsHandler => EnterStagingRegions
EnterStagingRegionsHandler => EnterStagingRegions,
RemapManifestHandler => RemapManifest,
ApplyStagingManifestsHandler => ApplyStagingManifests
);
macro_rules! dispatch_instr {
@@ -209,7 +221,9 @@ dispatch_instr!(
UpgradeRegions => UpgradeRegions,
GetFileRefs => GetFileRefs,
GcRegions => GcRegions,
EnterStagingRegions => EnterStagingRegions
EnterStagingRegions => EnterStagingRegions,
RemapManifest => RemapManifest,
ApplyStagingManifests => ApplyStagingManifests,
);
#[async_trait]
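
The diff registers the two new handlers by adding entries to `impl_from_handler!` and `dispatch_instr!`, whose definitions are not shown here. The sketch below is one way a macro like `impl_from_handler!` could generate the `From` impls that `.into()` relies on. The macro body, the reduced enum, and the `name` helper are assumptions for illustration, not the project's actual code.

```rust
pub struct RemapManifestHandler;
pub struct ApplyStagingManifestsHandler;

/// Reduced version of the handler enum, with only the two new variants.
pub enum InstructionHandlers {
    RemapManifest(RemapManifestHandler),
    ApplyStagingManifests(ApplyStagingManifestsHandler),
}

/// Hypothetical macro: generates `From<Handler> for InstructionHandlers`
/// for each `Handler => Variant` pair. A trailing comma is accepted.
macro_rules! impl_from_handler {
    ($($handler:ident => $variant:ident),* $(,)?) => {
        $(
            impl From<$handler> for InstructionHandlers {
                fn from(handler: $handler) -> Self {
                    InstructionHandlers::$variant(handler)
                }
            }
        )*
    };
}

impl_from_handler!(
    RemapManifestHandler => RemapManifest,
    ApplyStagingManifestsHandler => ApplyStagingManifests,
);

impl InstructionHandlers {
    /// Illustrative helper so the selected variant can be observed.
    pub fn name(&self) -> &'static str {
        match self {
            InstructionHandlers::RemapManifest(_) => "RemapManifest",
            InstructionHandlers::ApplyStagingManifests(_) => "ApplyStagingManifests",
        }
    }
}
```

With impls like these in place, `Box::new(RemapManifestHandler.into())` in the dispatch code type-checks without naming the enum variant at the call site.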

View File

@@ -0,0 +1,287 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use common_meta::instruction::{
ApplyStagingManifest, ApplyStagingManifestReply, ApplyStagingManifestsReply, InstructionReply,
};
use common_telemetry::{error, warn};
use futures::future::join_all;
use store_api::region_request::{ApplyStagingManifestRequest, RegionRequest};
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
pub struct ApplyStagingManifestsHandler;
#[async_trait::async_trait]
impl InstructionHandler for ApplyStagingManifestsHandler {
type Instruction = Vec<ApplyStagingManifest>;
async fn handle(
&self,
ctx: &HandlerContext,
requests: Self::Instruction,
) -> Option<InstructionReply> {
let results = join_all(
requests
.into_iter()
.map(|request| Self::handle_apply_staging_manifest(ctx, request)),
)
.await;
Some(InstructionReply::ApplyStagingManifests(
ApplyStagingManifestsReply::new(results),
))
}
}
impl ApplyStagingManifestsHandler {
async fn handle_apply_staging_manifest(
ctx: &HandlerContext,
request: ApplyStagingManifest,
) -> ApplyStagingManifestReply {
let Some(leader) = ctx.region_server.is_region_leader(request.region_id) else {
warn!("Region: {} is not found", request.region_id);
return ApplyStagingManifestReply {
region_id: request.region_id,
exists: false,
ready: false,
error: None,
};
};
if !leader {
warn!("Region: {} is not leader", request.region_id);
return ApplyStagingManifestReply {
region_id: request.region_id,
exists: true,
ready: false,
error: Some("Region is not leader".into()),
};
}
match ctx
.region_server
.handle_request(
request.region_id,
RegionRequest::ApplyStagingManifest(ApplyStagingManifestRequest {
partition_expr: request.partition_expr,
central_region_id: request.central_region_id,
manifest_path: request.manifest_path,
}),
)
.await
{
Ok(_) => ApplyStagingManifestReply {
region_id: request.region_id,
exists: true,
ready: true,
error: None,
},
Err(err) => {
error!(err; "Failed to apply staging manifest");
ApplyStagingManifestReply {
region_id: request.region_id,
exists: true,
ready: false,
error: Some(format!("{err:?}")),
}
}
}
}
}
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use std::sync::Arc;
use common_meta::instruction::RemapManifest;
use datatypes::value::Value;
use mito2::config::MitoConfig;
use mito2::engine::MITO_ENGINE_NAME;
use mito2::test_util::{CreateRequestBuilder, TestEnv};
use partition::expr::{PartitionExpr, col};
use store_api::path_utils::table_dir;
use store_api::region_engine::RegionRole;
use store_api::region_request::EnterStagingRequest;
use store_api::storage::RegionId;
use super::*;
use crate::heartbeat::handler::remap_manifest::RemapManifestHandler;
use crate::region_server::RegionServer;
use crate::tests::{MockRegionEngine, mock_region_server};
#[tokio::test]
async fn test_region_not_exist() {
let mut mock_region_server = mock_region_server();
let (mock_engine, _) = MockRegionEngine::new(MITO_ENGINE_NAME);
mock_region_server.register_engine(mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let region_id = RegionId::new(1024, 1);
let reply = ApplyStagingManifestsHandler
.handle(
&handler_context,
vec![ApplyStagingManifest {
region_id,
partition_expr: "".to_string(),
central_region_id: RegionId::new(1024, 9999), // use a dummy value
manifest_path: "".to_string(),
}],
)
.await
.unwrap();
let replies = reply.expect_apply_staging_manifests_reply();
let reply = &replies[0];
assert!(!reply.exists);
assert!(!reply.ready);
assert!(reply.error.is_none());
}
#[tokio::test]
async fn test_region_not_leader() {
let mock_region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
let (mock_engine, _) =
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
region_engine.mock_role = Some(Some(RegionRole::Follower));
region_engine.handle_request_mock_fn = Some(Box::new(|_, _| Ok(0)));
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let region_id = RegionId::new(1024, 1);
let reply = ApplyStagingManifestsHandler
.handle(
&handler_context,
vec![ApplyStagingManifest {
region_id,
partition_expr: "".to_string(),
central_region_id: RegionId::new(1024, 2),
manifest_path: "".to_string(),
}],
)
.await
.unwrap();
let replies = reply.expect_apply_staging_manifests_reply();
let reply = &replies[0];
assert!(reply.exists);
assert!(!reply.ready);
assert!(reply.error.is_some());
}
fn range_expr(col_name: &str, start: i64, end: i64) -> PartitionExpr {
col(col_name)
.gt_eq(Value::Int64(start))
.and(col(col_name).lt(Value::Int64(end)))
}
async fn prepare_region(region_server: &RegionServer) {
let region_specs = [
(RegionId::new(1024, 1), range_expr("x", 0, 49)),
(RegionId::new(1024, 2), range_expr("x", 49, 100)),
];
for (region_id, partition_expr) in region_specs {
let builder = CreateRequestBuilder::new();
let mut create_req = builder.build();
create_req.table_dir = table_dir("test", 1024);
region_server
.handle_request(region_id, RegionRequest::Create(create_req))
.await
.unwrap();
region_server
.handle_request(
region_id,
RegionRequest::EnterStaging(EnterStagingRequest {
partition_expr: partition_expr.as_json_str().unwrap(),
}),
)
.await
.unwrap();
}
}
#[tokio::test]
async fn test_apply_staging_manifest() {
common_telemetry::init_default_ut_logging();
let mut region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
let mut engine_env = TestEnv::new().await;
let engine = engine_env.create_engine(MitoConfig::default()).await;
region_server.register_engine(Arc::new(engine.clone()));
prepare_region(&region_server).await;
let handler_context = HandlerContext::new_for_test(region_server);
let region_id2 = RegionId::new(1024, 2);
let reply = RemapManifestHandler
.handle(
&handler_context,
RemapManifest {
region_id,
input_regions: vec![region_id, region_id2],
region_mapping: HashMap::from([
// [0, 49) <- [0, 50)
(region_id, vec![region_id]),
// [49, 100) <- [0, 50), [50, 100)
(region_id2, vec![region_id, region_id2]),
]),
new_partition_exprs: HashMap::from([
(region_id, range_expr("x", 0, 49).as_json_str().unwrap()),
(region_id2, range_expr("x", 49, 100).as_json_str().unwrap()),
]),
},
)
.await
.unwrap();
let reply = reply.expect_remap_manifest_reply();
assert!(reply.exists);
assert!(reply.error.is_none(), "{}", reply.error.unwrap());
assert_eq!(reply.manifest_paths.len(), 2);
let manifest_path_1 = reply.manifest_paths[&region_id].clone();
let manifest_path_2 = reply.manifest_paths[&region_id2].clone();
let reply = ApplyStagingManifestsHandler
.handle(
&handler_context,
vec![ApplyStagingManifest {
region_id,
partition_expr: range_expr("x", 0, 49).as_json_str().unwrap(),
central_region_id: region_id,
manifest_path: manifest_path_1,
}],
)
.await
.unwrap();
let replies = reply.expect_apply_staging_manifests_reply();
let reply = &replies[0];
assert!(reply.exists);
assert!(reply.ready);
assert!(reply.error.is_none());
// partition expr mismatch
let reply = ApplyStagingManifestsHandler
.handle(
&handler_context,
vec![ApplyStagingManifest {
region_id: region_id2,
partition_expr: range_expr("x", 50, 100).as_json_str().unwrap(),
central_region_id: region_id,
manifest_path: manifest_path_2,
}],
)
.await
.unwrap();
let replies = reply.expect_apply_staging_manifests_reply();
let reply = &replies[0];
assert!(reply.exists);
assert!(!reply.ready);
assert!(reply.error.is_some());
}
}
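
The handler in this file maps three situations onto the `exists` / `ready` / `error` fields of its reply: region not found, region present but not leader, and region present as leader (where the outcome of the request decides `ready`). The sketch below isolates that mapping. `Reply` and `build_reply` are hypothetical simplifications; the `Option<bool>` input mirrors how the diff uses the result of `is_region_leader`.

```rust
/// Simplified stand-in for `ApplyStagingManifestReply` (region id omitted).
#[derive(Debug, PartialEq, Eq)]
pub struct Reply {
    pub exists: bool,
    pub ready: bool,
    pub error: Option<String>,
}

/// `leader` is `None` when the region is unknown, `Some(false)` for a follower,
/// and `Some(true)` for a leader. `apply` runs only for a leader.
pub fn build_reply(leader: Option<bool>, apply: impl FnOnce() -> Result<(), String>) -> Reply {
    match leader {
        // Unknown region: not an error, the region simply does not exist here.
        None => Reply { exists: false, ready: false, error: None },
        // Known region, but this node may not apply the manifest.
        Some(false) => Reply {
            exists: true,
            ready: false,
            error: Some("Region is not leader".to_string()),
        },
        // Leader: the result of the request decides `ready` and `error`.
        Some(true) => match apply() {
            Ok(()) => Reply { exists: true, ready: true, error: None },
            Err(e) => Reply { exists: true, ready: false, error: Some(e) },
        },
    }
}
```

The three tests in this file (`test_region_not_exist`, `test_region_not_leader`, `test_apply_staging_manifest`) exercise exactly these branches.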

View File

@@ -0,0 +1,246 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use common_meta::instruction::{InstructionReply, RemapManifest, RemapManifestReply};
use common_telemetry::warn;
use store_api::region_engine::RemapManifestsRequest;
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
pub struct RemapManifestHandler;
#[async_trait::async_trait]
impl InstructionHandler for RemapManifestHandler {
type Instruction = RemapManifest;
async fn handle(
&self,
ctx: &HandlerContext,
request: Self::Instruction,
) -> Option<InstructionReply> {
let RemapManifest {
region_id,
input_regions,
region_mapping,
new_partition_exprs,
} = request;
let Some(leader) = ctx.region_server.is_region_leader(region_id) else {
warn!("Region: {} is not found", region_id);
return Some(InstructionReply::RemapManifest(RemapManifestReply {
exists: false,
manifest_paths: Default::default(),
error: None,
}));
};
if !leader {
warn!("Region: {} is not leader", region_id);
return Some(InstructionReply::RemapManifest(RemapManifestReply {
exists: true,
manifest_paths: Default::default(),
error: Some("Region is not leader".into()),
}));
}
let reply = match ctx
.region_server
.remap_manifests(RemapManifestsRequest {
region_id,
input_regions,
region_mapping,
new_partition_exprs,
})
.await
{
Ok(result) => InstructionReply::RemapManifest(RemapManifestReply {
exists: true,
manifest_paths: result.manifest_paths,
error: None,
}),
Err(e) => InstructionReply::RemapManifest(RemapManifestReply {
exists: true,
manifest_paths: Default::default(),
error: Some(format!("{e:?}")),
}),
};
Some(reply)
}
}
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use std::sync::Arc;
use common_meta::instruction::RemapManifest;
use datatypes::value::Value;
use mito2::config::MitoConfig;
use mito2::engine::MITO_ENGINE_NAME;
use mito2::test_util::{CreateRequestBuilder, TestEnv};
use partition::expr::{PartitionExpr, col};
use store_api::path_utils::table_dir;
use store_api::region_engine::RegionRole;
use store_api::region_request::{EnterStagingRequest, RegionRequest};
use store_api::storage::RegionId;
use crate::heartbeat::handler::remap_manifest::RemapManifestHandler;
use crate::heartbeat::handler::{HandlerContext, InstructionHandler};
use crate::region_server::RegionServer;
use crate::tests::{MockRegionEngine, mock_region_server};
#[tokio::test]
async fn test_region_not_exist() {
let mut mock_region_server = mock_region_server();
let (mock_engine, _) = MockRegionEngine::new(MITO_ENGINE_NAME);
mock_region_server.register_engine(mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let region_id = RegionId::new(1024, 1);
let reply = RemapManifestHandler
.handle(
&handler_context,
RemapManifest {
region_id,
input_regions: vec![],
region_mapping: HashMap::new(),
new_partition_exprs: HashMap::new(),
},
)
.await
.unwrap();
let reply = &reply.expect_remap_manifest_reply();
assert!(!reply.exists);
assert!(reply.error.is_none());
assert!(reply.manifest_paths.is_empty());
}
#[tokio::test]
async fn test_region_not_leader() {
let mock_region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
let (mock_engine, _) =
MockRegionEngine::with_custom_apply_fn(MITO_ENGINE_NAME, |region_engine| {
region_engine.mock_role = Some(Some(RegionRole::Follower));
region_engine.handle_request_mock_fn = Some(Box::new(|_, _| Ok(0)));
});
mock_region_server.register_test_region(region_id, mock_engine);
let handler_context = HandlerContext::new_for_test(mock_region_server);
let reply = RemapManifestHandler
.handle(
&handler_context,
RemapManifest {
region_id,
input_regions: vec![],
region_mapping: HashMap::new(),
new_partition_exprs: HashMap::new(),
},
)
.await
.unwrap();
let reply = reply.expect_remap_manifest_reply();
assert!(reply.exists);
assert!(reply.error.is_some());
}
fn range_expr(col_name: &str, start: i64, end: i64) -> PartitionExpr {
col(col_name)
.gt_eq(Value::Int64(start))
.and(col(col_name).lt(Value::Int64(end)))
}
async fn prepare_region(region_server: &RegionServer) {
let region_specs = [
(RegionId::new(1024, 1), range_expr("x", 0, 50)),
(RegionId::new(1024, 2), range_expr("x", 50, 100)),
];
for (region_id, partition_expr) in region_specs {
let builder = CreateRequestBuilder::new();
let mut create_req = builder.build();
create_req.table_dir = table_dir("test", 1024);
region_server
.handle_request(region_id, RegionRequest::Create(create_req))
.await
.unwrap();
region_server
.handle_request(
region_id,
RegionRequest::EnterStaging(EnterStagingRequest {
partition_expr: partition_expr.as_json_str().unwrap(),
}),
)
.await
.unwrap();
}
}
#[tokio::test]
async fn test_remap_manifest() {
common_telemetry::init_default_ut_logging();
let mut region_server = mock_region_server();
let region_id = RegionId::new(1024, 1);
let mut engine_env = TestEnv::new().await;
let engine = engine_env.create_engine(MitoConfig::default()).await;
region_server.register_engine(Arc::new(engine.clone()));
prepare_region(&region_server).await;
let handler_context = HandlerContext::new_for_test(region_server);
let region_id2 = RegionId::new(1024, 2);
let reply = RemapManifestHandler
.handle(
&handler_context,
RemapManifest {
region_id,
input_regions: vec![region_id, region_id2],
region_mapping: HashMap::from([
(region_id, vec![region_id]),
(region_id2, vec![region_id]),
]),
new_partition_exprs: HashMap::from([(
region_id,
range_expr("x", 0, 100).as_json_str().unwrap(),
)]),
},
)
.await
.unwrap();
let reply = reply.expect_remap_manifest_reply();
assert!(reply.exists);
assert!(reply.error.is_none(), "{}", reply.error.unwrap());
assert_eq!(reply.manifest_paths.len(), 1);
// Remap failed
let reply = RemapManifestHandler
.handle(
&handler_context,
RemapManifest {
region_id,
input_regions: vec![region_id],
region_mapping: HashMap::from([
(region_id, vec![region_id]),
(region_id2, vec![region_id]),
]),
new_partition_exprs: HashMap::from([(
region_id,
range_expr("x", 0, 100).as_json_str().unwrap(),
)]),
},
)
.await
.unwrap();
let reply = reply.expect_remap_manifest_reply();
assert!(reply.exists);
assert!(reply.error.is_some());
assert!(reply.manifest_paths.is_empty());
}
}

View File

@@ -65,8 +65,9 @@ use store_api::metric_engine_consts::{
FILE_ENGINE_NAME, LOGICAL_TABLE_METADATA_KEY, METRIC_ENGINE_NAME,
};
use store_api::region_engine::{
RegionEngineRef, RegionManifestInfo, RegionRole, RegionStatistic, SetRegionRoleStateResponse,
SettableRegionRoleState, SyncRegionFromRequest,
RegionEngineRef, RegionManifestInfo, RegionRole, RegionStatistic, RemapManifestsRequest,
RemapManifestsResponse, SetRegionRoleStateResponse, SettableRegionRoleState,
SyncRegionFromRequest,
};
use store_api::region_request::{
AffectedRows, BatchRegionDdlRequest, RegionCatchupRequest, RegionCloseRequest,
@@ -604,6 +605,25 @@ impl RegionServer {
.await
}
/// Remaps manifests from old regions to new regions.
pub async fn remap_manifests(
&self,
request: RemapManifestsRequest,
) -> Result<RemapManifestsResponse> {
let region_id = request.region_id;
let engine_with_status = self
.inner
.region_map
.get(&region_id)
.with_context(|| RegionNotFoundSnafu { region_id })?;
engine_with_status
.engine()
.remap_manifests(request)
.await
.with_context(|_| HandleRegionRequestSnafu { region_id })
}
fn is_suspended(&self) -> bool {
self.suspend.load(Ordering::Relaxed)
}
@@ -1621,7 +1641,10 @@ mod tests {
let response = mock_region_server
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
RegionRequest::Drop(RegionDropRequest {
fast_path: false,
force: false,
}),
)
.await
.unwrap();
@@ -1719,7 +1742,10 @@ mod tests {
mock_region_server
.handle_request(
region_id,
RegionRequest::Drop(RegionDropRequest { fast_path: false }),
RegionRequest::Drop(RegionDropRequest {
fast_path: false,
force: false,
}),
)
.await
.unwrap_err();

View File

@@ -15,9 +15,10 @@
use arrow::array::{ArrayRef, AsArray};
use arrow::datatypes::{
DataType, DurationMicrosecondType, DurationMillisecondType, DurationNanosecondType,
DurationSecondType, Time32MillisecondType, Time32SecondType, Time64MicrosecondType,
Time64NanosecondType, TimeUnit, TimestampMicrosecondType, TimestampMillisecondType,
TimestampNanosecondType, TimestampSecondType,
DurationSecondType, Int8Type, Int16Type, Int32Type, Int64Type, Time32MillisecondType,
Time32SecondType, Time64MicrosecondType, Time64NanosecondType, TimeUnit,
TimestampMicrosecondType, TimestampMillisecondType, TimestampNanosecondType,
TimestampSecondType, UInt8Type, UInt16Type, UInt32Type, UInt64Type,
};
use arrow_array::Array;
use common_time::time::Time;
@@ -152,3 +153,62 @@ pub fn string_array_value_at_index(array: &ArrayRef, i: usize) -> Option<&str> {
_ => None,
}
}
/// Get the integer value (`i64`) at index `i` for any integer array.
///
/// Returns `None` when:
///
/// - the array type is not an integer type;
/// - the value is larger than `i64::MAX`;
/// - the value is null.
///
/// # Panics
///
/// If index `i` is out of bounds.
pub fn int_array_value_at_index(array: &ArrayRef, i: usize) -> Option<i64> {
match array.data_type() {
DataType::Int8 => {
let array = array.as_primitive::<Int8Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::Int16 => {
let array = array.as_primitive::<Int16Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::Int32 => {
let array = array.as_primitive::<Int32Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::Int64 => {
let array = array.as_primitive::<Int64Type>();
array.is_valid(i).then(|| array.value(i))
}
DataType::UInt8 => {
let array = array.as_primitive::<UInt8Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::UInt16 => {
let array = array.as_primitive::<UInt16Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::UInt32 => {
let array = array.as_primitive::<UInt32Type>();
array.is_valid(i).then(|| array.value(i) as i64)
}
DataType::UInt64 => {
let array = array.as_primitive::<UInt64Type>();
// Values above `i64::MAX` cannot be represented, so they map to `None`.
array
.is_valid(i)
.then(|| i64::try_from(array.value(i)).ok())
.flatten()
}
_ => None,
}
}
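
The `UInt64` arm is the only one in `int_array_value_at_index` that can lose information, because a `u64` may exceed `i64::MAX`. The sketch below shows the same rule on a plain slice of optional values, using the standard library's checked conversion and no Arrow types. `value_at` is a hypothetical helper, with `None` standing in for a null slot.

```rust
/// Returns the value at index `i` as `i64`.
///
/// Yields `None` when the slot is null (`None`) or when the value
/// does not fit into `i64`. Panics if `i` is out of bounds.
pub fn value_at(values: &[Option<u64>], i: usize) -> Option<i64> {
    values[i].and_then(|v| i64::try_from(v).ok())
}
```

`i64::try_from` fails exactly when the value is larger than `i64::MAX`, which matches the documented behaviour of the function above.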

View File

@@ -816,7 +816,7 @@ mod tests {
let result = encode_by_struct(&json_struct, json);
assert_eq!(
result.unwrap_err().to_string(),
"Cannot cast value bar to Number(I64)"
r#"Cannot cast value bar to "<Number>""#
);
let json = json!({

View File

@@ -13,7 +13,7 @@
// limitations under the License.
use std::collections::BTreeMap;
use std::fmt::{Display, Formatter};
use std::fmt::{Debug, Display, Formatter};
use std::str::FromStr;
use std::sync::Arc;
@@ -134,24 +134,24 @@ impl From<&ConcreteDataType> for JsonNativeType {
impl Display for JsonNativeType {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
match self {
JsonNativeType::Null => write!(f, "Null"),
JsonNativeType::Bool => write!(f, "Bool"),
JsonNativeType::Number(t) => {
write!(f, "Number({t:?})")
JsonNativeType::Null => write!(f, r#""<Null>""#),
JsonNativeType::Bool => write!(f, r#""<Bool>""#),
JsonNativeType::Number(_) => {
write!(f, r#""<Number>""#)
}
JsonNativeType::String => write!(f, "String"),
JsonNativeType::String => write!(f, r#""<String>""#),
JsonNativeType::Array(item_type) => {
write!(f, "Array[{}]", item_type)
write!(f, "[{}]", item_type)
}
JsonNativeType::Object(object) => {
write!(
f,
"Object{{{}}}",
"{{{}}}",
object
.iter()
.map(|(k, v)| format!(r#""{k}": {v}"#))
.map(|(k, v)| format!(r#""{k}":{v}"#))
.collect::<Vec<_>>()
.join(", ")
.join(",")
)
}
}
@@ -183,7 +183,11 @@ impl JsonType {
}
}
pub(crate) fn native_type(&self) -> &JsonNativeType {
pub fn is_native_type(&self) -> bool {
matches!(self.format, JsonFormat::Native(_))
}
pub fn native_type(&self) -> &JsonNativeType {
match &self.format {
JsonFormat::Jsonb => &JsonNativeType::String,
JsonFormat::Native(x) => x.as_ref(),
@@ -650,15 +654,16 @@ mod tests {
"list": [1, 2, 3],
"object": {"a": 1}
}"#;
let expected = r#"Json<Object{"hello": String, "list": Array[Number(I64)], "object": Object{"a": Number(I64)}}>"#;
let expected =
r#"Json<{"hello":"<String>","list":["<Number>"],"object":{"a":"<Number>"}}>"#;
test(json, json_type, Ok(expected))?;
// cannot merge with other non-object json values:
let jsons = [r#""s""#, "1", "[1]"];
let expects = [
r#"Failed to merge JSON datatype: datatypes have conflict, this: Object{"hello": String, "list": Array[Number(I64)], "object": Object{"a": Number(I64)}}, that: String"#,
r#"Failed to merge JSON datatype: datatypes have conflict, this: Object{"hello": String, "list": Array[Number(I64)], "object": Object{"a": Number(I64)}}, that: Number(I64)"#,
r#"Failed to merge JSON datatype: datatypes have conflict, this: Object{"hello": String, "list": Array[Number(I64)], "object": Object{"a": Number(I64)}}, that: Array[Number(I64)]"#,
r#"Failed to merge JSON datatype: datatypes have conflict, this: {"hello":"<String>","list":["<Number>"],"object":{"a":"<Number>"}}, that: "<String>""#,
r#"Failed to merge JSON datatype: datatypes have conflict, this: {"hello":"<String>","list":["<Number>"],"object":{"a":"<Number>"}}, that: "<Number>""#,
r#"Failed to merge JSON datatype: datatypes have conflict, this: {"hello":"<String>","list":["<Number>"],"object":{"a":"<Number>"}}, that: ["<Number>"]"#,
];
for (json, expect) in jsons.into_iter().zip(expects.into_iter()) {
test(json, json_type, Err(expect))?;
@@ -670,7 +675,7 @@ mod tests {
"float": 0.123,
"no": 42
}"#;
let expected = r#"Failed to merge JSON datatype: datatypes have conflict, this: String, that: Number(I64)"#;
let expected = r#"Failed to merge JSON datatype: datatypes have conflict, this: "<String>", that: "<Number>""#;
test(json, json_type, Err(expected))?;
// can merge with another json object:
@@ -679,7 +684,7 @@ mod tests {
"float": 0.123,
"int": 42
}"#;
let expected = r#"Json<Object{"float": Number(F64), "hello": String, "int": Number(I64), "list": Array[Number(I64)], "object": Object{"a": Number(I64)}}>"#;
let expected = r#"Json<{"float":"<Number>","hello":"<String>","int":"<Number>","list":["<Number>"],"object":{"a":"<Number>"}}>"#;
test(json, json_type, Ok(expected))?;
// can merge with some complex nested json object:
@@ -689,7 +694,7 @@ mod tests {
"float": 0.456,
"int": 0
}"#;
let expected = r#"Json<Object{"float": Number(F64), "hello": String, "int": Number(I64), "list": Array[Number(I64)], "object": Object{"a": Number(I64), "foo": String, "l": Array[String], "o": Object{"key": String}}}>"#;
let expected = r#"Json<{"float":"<Number>","hello":"<String>","int":"<Number>","list":["<Number>"],"object":{"a":"<Number>","foo":"<String>","l":["<String>"],"o":{"key":"<String>"}}}>"#;
test(json, json_type, Ok(expected))?;
Ok(())
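
The `Display` change above drops the `Object{...}` / `Array[...]` / `Number(I64)` spelling in favour of a compact, JSON-shaped string in which leaf types appear as quoted placeholders such as `"<Number>"`. The sketch below reproduces that format on a reduced, hypothetical `NativeType` enum. The real `JsonNativeType` carries a number subtype, which the new format no longer prints.

```rust
use std::collections::BTreeMap;
use std::fmt::{Display, Formatter};

/// Reduced, hypothetical mirror of `JsonNativeType`.
pub enum NativeType {
    Null,
    Bool,
    Number,
    String,
    Array(Box<NativeType>),
    Object(BTreeMap<String, NativeType>),
}

impl Display for NativeType {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        match self {
            NativeType::Null => write!(f, r#""<Null>""#),
            NativeType::Bool => write!(f, r#""<Bool>""#),
            NativeType::Number => write!(f, r#""<Number>""#),
            NativeType::String => write!(f, r#""<String>""#),
            NativeType::Array(item) => write!(f, "[{item}]"),
            NativeType::Object(fields) => {
                // Keys come out sorted because `BTreeMap` iterates in key order.
                write!(f, "{{")?;
                for (idx, (key, value)) in fields.iter().enumerate() {
                    if idx > 0 {
                        write!(f, ",")?;
                    }
                    // Note: keys are written as-is, without JSON escaping.
                    write!(f, r#""{key}":{value}"#)?;
                }
                write!(f, "}}")
            }
        }
    }
}
```

For keys that need no escaping, the output parses as a JSON document, which is presumably why the updated test expectations read like `{"hello":"<String>","list":["<Number>"]}`.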

View File

@@ -321,10 +321,10 @@ mod tests {
Ok(()),
Ok(()),
Err(
"Failed to merge JSON datatype: datatypes have conflict, this: Number(I64), that: String",
r#"Failed to merge JSON datatype: datatypes have conflict, this: "<Number>", that: "<String>""#,
),
Err(
"Failed to merge JSON datatype: datatypes have conflict, this: Number(I64), that: Array[Bool]",
r#"Failed to merge JSON datatype: datatypes have conflict, this: "<Number>", that: ["<Bool>"]"#,
),
];
let mut builder = JsonVectorBuilder::new(JsonNativeType::Null, 1);
@@ -396,12 +396,12 @@ mod tests {
// test children builders:
assert_eq!(builder.builders.len(), 6);
let expect_types = [
r#"Json<Object{"list": Array[Number(I64)], "s": String}>"#,
r#"Json<Object{"float": Number(F64), "s": String}>"#,
r#"Json<Object{"float": Number(F64), "int": Number(I64)}>"#,
r#"Json<Object{"int": Number(I64), "object": Object{"hello": String, "timestamp": Number(I64)}}>"#,
r#"Json<Object{"nested": Object{"a": Object{"b": Object{"b": Object{"a": String}}}}, "object": Object{"timestamp": Number(I64)}}>"#,
r#"Json<Object{"nested": Object{"a": Object{"b": Object{"a": Object{"b": String}}}}, "object": Object{"timestamp": Number(I64)}}>"#,
r#"Json<{"list":["<Number>"],"s":"<String>"}>"#,
r#"Json<{"float":"<Number>","s":"<String>"}>"#,
r#"Json<{"float":"<Number>","int":"<Number>"}>"#,
r#"Json<{"int":"<Number>","object":{"hello":"<String>","timestamp":"<Number>"}}>"#,
r#"Json<{"nested":{"a":{"b":{"b":{"a":"<String>"}}}},"object":{"timestamp":"<Number>"}}>"#,
r#"Json<{"nested":{"a":{"b":{"a":{"b":"<String>"}}}},"object":{"timestamp":"<Number>"}}>"#,
];
let expect_vectors = [
r#"
@@ -456,7 +456,7 @@ mod tests {
}
// test final merged json type:
let expected = r#"Json<Object{"float": Number(F64), "int": Number(I64), "list": Array[Number(I64)], "nested": Object{"a": Object{"b": Object{"a": Object{"b": String}, "b": Object{"a": String}}}}, "object": Object{"hello": String, "timestamp": Number(I64)}, "s": String}>"#;
let expected = r#"Json<{"float":"<Number>","int":"<Number>","list":["<Number>"],"nested":{"a":{"b":{"a":{"b":"<String>"},"b":{"a":"<String>"}}}},"object":{"hello":"<String>","timestamp":"<Number>"},"s":"<String>"}>"#;
assert_eq!(builder.data_type().to_string(), expected);
// test final produced vector:

@@ -79,7 +79,7 @@ tokio.workspace = true
tonic.workspace = true
[dev-dependencies]
catalog.workspace = true
catalog = { workspace = true, features = ["testing"] }
common-catalog.workspace = true
pretty_assertions.workspace = true
prost.workspace = true

@@ -39,7 +39,6 @@ use query::QueryEngine;
use query::options::QueryOptions;
use serde::{Deserialize, Serialize};
use servers::grpc::GrpcOptions;
use servers::heartbeat_options::HeartbeatOptions;
use servers::http::HttpOptions;
use session::context::QueryContext;
use snafu::{OptionExt, ResultExt, ensure};
@@ -111,7 +110,6 @@ pub struct FlownodeOptions {
pub meta_client: Option<MetaClientOptions>,
pub logging: LoggingOptions,
pub tracing: TracingOptions,
pub heartbeat: HeartbeatOptions,
pub query: QueryOptions,
pub user_provider: Option<String>,
pub memory: MemoryOptions,
@@ -127,7 +125,6 @@ impl Default for FlownodeOptions {
meta_client: None,
logging: LoggingOptions::default(),
tracing: TracingOptions::default(),
heartbeat: HeartbeatOptions::default(),
// flownode's query option is set to 1 to throttle flow's query so
// that it won't use too much cpu or memory
query: QueryOptions {

@@ -24,7 +24,7 @@ use super::*;
pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
table_id: TableId,
table_name: &str,
region_numbers: I,
_region_numbers: I,
) -> TableInfo {
let column_schemas = vec![
ColumnSchema::new("number", ConcreteDataType::int32_datatype(), true),
@@ -46,7 +46,6 @@ pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
.primary_key_indices(vec![0])
.engine("engine")
.next_column_id(3)
.region_numbers(region_numbers.into_iter().collect::<Vec<_>>())
.build()
.unwrap();
TableInfoBuilder::default()

@@ -30,7 +30,6 @@ use common_telemetry::{debug, error, info, warn};
use greptime_proto::v1::meta::NodeInfo;
use meta_client::client::{HeartbeatSender, HeartbeatStream, MetaClient};
use servers::addrs;
use servers::heartbeat_options::HeartbeatOptions;
use snafu::ResultExt;
use tokio::sync::mpsc;
use tokio::time::Duration;
@@ -64,8 +63,6 @@ pub struct HeartbeatTask {
node_epoch: u64,
peer_addr: String,
meta_client: Arc<MetaClient>,
report_interval: Duration,
retry_interval: Duration,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
running: Arc<AtomicBool>,
query_stat_size: Option<SizeReportSender>,
@@ -81,7 +78,6 @@ impl HeartbeatTask {
pub fn new(
opts: &FlownodeOptions,
meta_client: Arc<MetaClient>,
heartbeat_opts: HeartbeatOptions,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
resource_stat: ResourceStatRef,
) -> Self {
@@ -90,8 +86,6 @@ impl HeartbeatTask {
node_epoch: common_time::util::current_time_millis() as u64,
peer_addr: addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr)),
meta_client,
report_interval: heartbeat_opts.interval,
retry_interval: heartbeat_opts.retry_interval,
resp_handler_executor,
running: Arc::new(AtomicBool::new(false)),
query_stat_size: None,
@@ -113,22 +107,26 @@ impl HeartbeatTask {
}
async fn create_streams(&self) -> Result<(), Error> {
info!("Start to establish the heartbeat connection to metasrv.");
let (req_sender, resp_stream) = self
info!("Establishing heartbeat connection to Metasrv...");
let (req_sender, resp_stream, config) = self
.meta_client
.heartbeat()
.await
.map_err(BoxedError::new)
.context(ExternalSnafu)?;
info!("Flownode's heartbeat connection has been established with metasrv");
info!(
"Heartbeat started for flownode {}, Metasrv config: {}",
self.node_id, config
);
let (outgoing_tx, outgoing_rx) = mpsc::channel(16);
let mailbox = Arc::new(HeartbeatMailbox::new(outgoing_tx));
self.start_handle_resp_stream(resp_stream, mailbox);
self.start_handle_resp_stream(resp_stream, mailbox, config.retry_interval);
self.start_heartbeat_report(req_sender, outgoing_rx);
self.start_heartbeat_report(req_sender, outgoing_rx, config.interval);
Ok(())
}
@@ -217,8 +215,8 @@ impl HeartbeatTask {
&self,
req_sender: HeartbeatSender,
mut outgoing_rx: mpsc::Receiver<OutgoingMessage>,
report_interval: Duration,
) {
let report_interval = self.report_interval;
let node_epoch = self.node_epoch;
let self_peer = Some(Peer {
id: self.node_id,
@@ -277,9 +275,13 @@ impl HeartbeatTask {
});
}
fn start_handle_resp_stream(&self, mut resp_stream: HeartbeatStream, mailbox: MailboxRef) {
fn start_handle_resp_stream(
&self,
mut resp_stream: HeartbeatStream,
mailbox: MailboxRef,
retry_interval: Duration,
) {
let capture_self = self.clone();
let retry_interval = self.retry_interval;
let _handle = common_runtime::spawn_hb(async move {
loop {

@@ -25,7 +25,6 @@ use meta_client::MetaClientOptions;
use query::options::QueryOptions;
use serde::{Deserialize, Serialize};
use servers::grpc::GrpcOptions;
use servers::heartbeat_options::HeartbeatOptions;
use servers::http::HttpOptions;
use servers::server::ServerHandlers;
use snafu::ResultExt;
@@ -45,7 +44,6 @@ pub struct FrontendOptions {
pub node_id: Option<String>,
pub default_timezone: Option<String>,
pub default_column_prefix: Option<String>,
pub heartbeat: HeartbeatOptions,
/// Maximum total memory for all concurrent write request bodies and messages (HTTP, gRPC, Flight).
/// Set to 0 to disable the limit. Default: "0" (unlimited)
pub max_in_flight_write_bytes: ReadableSize,
@@ -82,7 +80,6 @@ impl Default for FrontendOptions {
node_id: None,
default_timezone: None,
default_column_prefix: None,
heartbeat: HeartbeatOptions::frontend_default(),
max_in_flight_write_bytes: ReadableSize(0),
write_bytes_exhausted_policy: OnExhaustedPolicy::default(),
http: HttpOptions::default(),
@@ -406,10 +403,6 @@ mod tests {
..Default::default()
},
meta_client: Some(meta_client_options.clone()),
heartbeat: HeartbeatOptions {
interval: Duration::from_secs(1),
..Default::default()
},
..Default::default()
};
@@ -419,7 +412,11 @@ mod tests {
let meta_client = create_meta_client(&meta_client_options, server.clone()).await;
let frontend = create_frontend(&options, meta_client).await?;
let frontend_heartbeat_interval = options.heartbeat.interval;
use common_meta::distributed_time_constants::{
BASE_HEARTBEAT_INTERVAL, frontend_heartbeat_interval,
};
let frontend_heartbeat_interval =
frontend_heartbeat_interval(BASE_HEARTBEAT_INTERVAL) + Duration::from_secs(1);
tokio::time::sleep(frontend_heartbeat_interval).await;
// initial state: not suspend:
assert!(!frontend.instance.is_suspended());

@@ -42,8 +42,6 @@ use crate::metrics::{HEARTBEAT_RECV_COUNT, HEARTBEAT_SENT_COUNT};
pub struct HeartbeatTask {
peer_addr: String,
meta_client: Arc<MetaClient>,
report_interval: Duration,
retry_interval: Duration,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
start_time_ms: u64,
resource_stat: ResourceStatRef,
@@ -66,8 +64,6 @@ impl HeartbeatTask {
addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr))
},
meta_client,
report_interval: opts.heartbeat.interval,
retry_interval: opts.heartbeat.retry_interval,
resp_handler_executor,
start_time_ms: common_time::util::current_time_millis() as u64,
resource_stat,
@@ -75,27 +71,31 @@ impl HeartbeatTask {
}
pub async fn start(&self) -> Result<()> {
let (req_sender, resp_stream) = self
let (req_sender, resp_stream, config) = self
.meta_client
.heartbeat()
.await
.context(error::CreateMetaHeartbeatStreamSnafu)?;
info!("A heartbeat connection has been established with metasrv");
info!("Heartbeat started with Metasrv config: {}", config);
let (outgoing_tx, outgoing_rx) = mpsc::channel(16);
let mailbox = Arc::new(HeartbeatMailbox::new(outgoing_tx));
self.start_handle_resp_stream(resp_stream, mailbox);
self.start_handle_resp_stream(resp_stream, mailbox, config.retry_interval);
self.start_heartbeat_report(req_sender, outgoing_rx);
self.start_heartbeat_report(req_sender, outgoing_rx, config.interval);
Ok(())
}
fn start_handle_resp_stream(&self, mut resp_stream: HeartbeatStream, mailbox: MailboxRef) {
fn start_handle_resp_stream(
&self,
mut resp_stream: HeartbeatStream,
mailbox: MailboxRef,
retry_interval: Duration,
) {
let capture_self = self.clone();
let retry_interval = self.retry_interval;
let _handle = common_runtime::spawn_hb(async move {
loop {
@@ -190,8 +190,8 @@ impl HeartbeatTask {
&self,
req_sender: HeartbeatSender,
mut outgoing_rx: Receiver<OutgoingMessage>,
report_interval: Duration,
) {
let report_interval = self.report_interval;
let start_time_ms = self.start_time_ms;
let self_peer = Some(Peer {
// The node id will be actually calculated from its address (by hashing the address

@@ -91,6 +91,7 @@ use sql::statements::tql::Tql;
use sqlparser::ast::ObjectName;
pub use standalone::StandaloneDatanodeManager;
use table::requests::{OTLP_METRIC_COMPAT_KEY, OTLP_METRIC_COMPAT_PROM};
use tracing::Span;
use crate::error::{
self, Error, ExecLogicalPlanSnafu, ExecutePromqlSnafu, ExternalSnafu, InvalidSqlSnafu,
@@ -508,6 +509,7 @@ fn attach_timeout(output: Output, mut timeout: Duration) -> Result<Output> {
stream: s,
output_ordering: None,
metrics: Default::default(),
span: Span::current(),
};
Output::new(OutputData::Stream(Box::pin(stream)), output.meta)
}

@@ -40,6 +40,7 @@ use servers::query_handler::{
};
use session::context::QueryContextRef;
use snafu::{OptionExt, ResultExt};
use tracing::instrument;
use crate::error::{
CatalogSnafu, ExecLogicalPlanSnafu, PromStoreRemoteQueryPlanSnafu, ReadTableSnafu, Result,
@@ -78,6 +79,7 @@ fn negotiate_response_type(accepted_response_types: &[i32]) -> ServerResult<Resp
Ok(ResponseType::try_from(*response_type).unwrap())
}
#[instrument(skip_all, fields(table_name))]
async fn to_query_result(table_name: &str, output: Output) -> ServerResult<QueryResult> {
let OutputData::Stream(stream) = output.data else {
unreachable!()
@@ -194,6 +196,7 @@ impl PromStoreProtocolHandler for Instance {
Ok(output)
}
#[instrument(skip_all, fields(table_name))]
async fn read(
&self,
request: ReadRequest,

@@ -97,12 +97,16 @@ impl Datanode for RegionInvoker {
}
async fn handle_query(&self, request: QueryRequest) -> MetaResult<SendableRecordBatchStream> {
let region_id = request.region_id.to_string();
let span = request
.header
.as_ref()
.map(|h| TracingContext::from_w3c(&h.tracing_context))
.unwrap_or_default()
.attach(tracing::info_span!("RegionInvoker::handle_query"));
.attach(tracing::info_span!(
"RegionInvoker::handle_query",
region_id = region_id
));
self.region_server
.handle_read(request)
.trace(span)

@@ -44,7 +44,7 @@ async fn run() {
// required only when the heartbeat_client is enabled
meta_client.ask_leader().await.unwrap();
let (sender, mut receiver) = meta_client.heartbeat().await.unwrap();
let (sender, mut receiver, _config) = meta_client.heartbeat().await.unwrap();
// send heartbeats
let _handle = tokio::spawn(async move {

@@ -13,7 +13,7 @@
// limitations under the License.
mod ask_leader;
mod heartbeat;
pub mod heartbeat;
mod load_balance;
mod procedure;
@@ -57,7 +57,7 @@ use common_meta::rpc::store::{
};
use common_telemetry::info;
use futures::TryStreamExt;
use heartbeat::Client as HeartbeatClient;
use heartbeat::{Client as HeartbeatClient, HeartbeatConfig};
use procedure::Client as ProcedureClient;
use snafu::{OptionExt, ResultExt};
use store::Client as StoreClient;
@@ -594,7 +594,9 @@ impl MetaClient {
/// The `datanode` needs to use the sender to continuously send heartbeat
/// packets (some self-state data), and the receiver can receive a response
/// from "metasrv" (which may contain some scheduling instructions).
pub async fn heartbeat(&self) -> Result<(HeartbeatSender, HeartbeatStream)> {
///
/// Returns the heartbeat sender, stream, and configuration received from Metasrv.
pub async fn heartbeat(&self) -> Result<(HeartbeatSender, HeartbeatStream, HeartbeatConfig)> {
self.heartbeat_client()?.heartbeat().await
}
@@ -873,7 +875,7 @@ mod tests {
#[tokio::test]
async fn test_heartbeat() {
let tc = new_client("test_heartbeat").await;
let (sender, mut receiver) = tc.client.heartbeat().await.unwrap();
let (sender, mut receiver, _config) = tc.client.heartbeat().await.unwrap();
// send heartbeats
let request_sent = Arc::new(AtomicUsize::new(0));

@@ -12,14 +12,17 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
use std::sync::Arc;
use std::time::Duration;
use api::v1::meta::heartbeat_client::HeartbeatClient;
use api::v1::meta::{HeartbeatRequest, HeartbeatResponse, RequestHeader, Role};
use common_grpc::channel_manager::ChannelManager;
use common_meta::distributed_time_constants::BASE_HEARTBEAT_INTERVAL;
use common_meta::util;
use common_telemetry::info;
use common_telemetry::tracing_context::TracingContext;
use common_telemetry::{info, warn};
use snafu::{OptionExt, ResultExt, ensure};
use tokio::sync::{RwLock, mpsc};
use tokio_stream::wrappers::ReceiverStream;
@@ -32,6 +35,52 @@ use crate::client::{Id, LeaderProviderRef};
use crate::error;
use crate::error::{InvalidResponseHeaderSnafu, Result};
/// Heartbeat configuration received from Metasrv during handshake.
#[derive(Debug, Clone, Copy)]
pub struct HeartbeatConfig {
pub interval: Duration,
pub retry_interval: Duration,
}
impl Default for HeartbeatConfig {
fn default() -> Self {
Self {
interval: BASE_HEARTBEAT_INTERVAL,
retry_interval: BASE_HEARTBEAT_INTERVAL,
}
}
}
impl fmt::Display for HeartbeatConfig {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"interval={:?}, retry={:?}",
self.interval, self.retry_interval
)
}
}
impl HeartbeatConfig {
/// Extract configuration from HeartbeatResponse.
pub fn from_response(res: &HeartbeatResponse) -> Self {
if let Some(cfg) = &res.heartbeat_config {
// Metasrv provided complete configuration
Self {
interval: Duration::from_millis(cfg.heartbeat_interval_ms),
retry_interval: Duration::from_millis(cfg.retry_interval_ms),
}
} else {
let fallback = Self::default();
warn!(
"Metasrv didn't provide heartbeat_config, using default: {}",
fallback
);
fallback
}
}
}
pub struct HeartbeatSender {
id: Id,
role: Role,
@@ -130,7 +179,9 @@ impl Client {
inner.ask_leader().await
}
pub async fn heartbeat(&mut self) -> Result<(HeartbeatSender, HeartbeatStream)> {
pub async fn heartbeat(
&mut self,
) -> Result<(HeartbeatSender, HeartbeatStream, HeartbeatConfig)> {
let inner = self.inner.read().await;
inner.ask_leader().await?;
inner.heartbeat().await
@@ -198,7 +249,7 @@ impl Inner {
leader_provider.ask_leader().await
}
async fn heartbeat(&self) -> Result<(HeartbeatSender, HeartbeatStream)> {
async fn heartbeat(&self) -> Result<(HeartbeatSender, HeartbeatStream, HeartbeatConfig)> {
ensure!(
self.is_started(),
error::IllegalGrpcClientStateSnafu {
@@ -245,14 +296,18 @@ impl Inner {
.map_err(error::Error::from)?
.context(error::CreateHeartbeatStreamSnafu)?;
// Extract heartbeat configuration from handshake response
let config = HeartbeatConfig::from_response(&res);
info!(
"Success to create heartbeat stream to server: {}, response: {:#?}",
leader_addr, res
"Handshake successful with Metasrv at {}, received config: {}",
leader_addr, config
);
Ok((
HeartbeatSender::new(self.id, self.role, sender),
HeartbeatStream::new(self.id, stream),
config,
))
}
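
The handshake fallback introduced in this file can be illustrated in isolation. The following is a minimal, self-contained sketch rather than the crate's actual code: `WireHeartbeatConfig` and `WireResponse` are hypothetical stand-ins for the protobuf-generated `HeartbeatConfig` and `HeartbeatResponse`, and the 3-second value of `BASE_HEARTBEAT_INTERVAL` is an assumed placeholder.

```rust
use std::time::Duration;

/// ASSUMPTION: placeholder value. The real constant is imported from
/// `common_meta::distributed_time_constants` and is not shown in this diff.
const BASE_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);

/// Hypothetical stand-in for the protobuf `HeartbeatConfig` message.
#[derive(Debug, Clone, Copy)]
pub struct WireHeartbeatConfig {
    pub heartbeat_interval_ms: u64,
    pub retry_interval_ms: u64,
}

/// Hypothetical stand-in for `HeartbeatResponse`, reduced to the one field used here.
#[derive(Debug, Default)]
pub struct WireResponse {
    pub heartbeat_config: Option<WireHeartbeatConfig>,
}

/// Client-side view of the intervals negotiated during the handshake.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HeartbeatConfig {
    pub interval: Duration,
    pub retry_interval: Duration,
}

impl Default for HeartbeatConfig {
    fn default() -> Self {
        Self {
            interval: BASE_HEARTBEAT_INTERVAL,
            retry_interval: BASE_HEARTBEAT_INTERVAL,
        }
    }
}

impl HeartbeatConfig {
    /// Use the server-provided intervals when present; otherwise fall back
    /// to the defaults (for example, when the server omits the field).
    pub fn from_response(res: &WireResponse) -> Self {
        match &res.heartbeat_config {
            Some(cfg) => Self {
                interval: Duration::from_millis(cfg.heartbeat_interval_ms),
                retry_interval: Duration::from_millis(cfg.retry_interval_ms),
            },
            None => Self::default(),
        }
    }
}
```

Because the fallback lives in one constructor, both call sites (`start_handle_resp_stream` and `start_heartbeat_report`) can take plain `Duration` arguments and need no knowledge of whether the server supplied a configuration.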

@@ -110,11 +110,11 @@ pub enum Error {
},
#[snafu(display(
"Another procedure is opening the region: {} on peer: {}",
"Another procedure is operating the region: {} on peer: {}",
region_id,
peer_id
))]
RegionOpeningRace {
RegionOperatingRace {
#[snafu(implicit)]
location: Location,
peer_id: DatanodeId,
@@ -1059,6 +1059,15 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to deallocate regions for table: {}", table_id))]
DeallocateRegions {
#[snafu(implicit)]
location: Location,
table_id: TableId,
#[snafu(source)]
source: common_meta::error::Error,
},
}
impl Error {
@@ -1154,7 +1163,7 @@ impl ErrorExt for Error {
| Error::InvalidUtf8Value { .. }
| Error::UnexpectedInstructionReply { .. }
| Error::Unexpected { .. }
| Error::RegionOpeningRace { .. }
| Error::RegionOperatingRace { .. }
| Error::RegionRouteNotFound { .. }
| Error::MigrationAbort { .. }
| Error::MigrationRunning { .. }
@@ -1206,6 +1215,7 @@ impl ErrorExt for Error {
Error::Other { source, .. } => source.status_code(),
Error::RepartitionCreateSubtasks { source, .. } => source.status_code(),
Error::RepartitionSubprocedureStateReceiver { source, .. } => source.status_code(),
Error::DeallocateRegions { source, .. } => source.status_code(),
Error::NoEnoughAvailableNode { .. } => StatusCode::RuntimeResourcesExhausted,
#[cfg(feature = "pg_kvbackend")]

@@ -73,11 +73,11 @@ impl Default for GcSchedulerOptions {
retry_backoff_duration: Duration::from_secs(5),
region_gc_concurrency: 16,
min_region_size_threshold: 100 * 1024 * 1024, // 100MB
sst_count_weight: 1.0,
file_removed_count_weight: 0.5,
sst_count_weight: 0.5, // more SSTs can mean more removable files: moderate priority
file_removed_count_weight: 1.0, // more files pending deletion: higher priority
gc_cooldown_period: Duration::from_secs(60 * 5), // 5 minutes
regions_per_table_threshold: 20, // Select top 20 regions per table
mailbox_timeout: Duration::from_secs(60), // 60 seconds
regions_per_table_threshold: 20, // Select top 20 regions per table
mailbox_timeout: Duration::from_secs(60), // 60 seconds
// Perform full file listing every 24 hours to find orphan files
full_file_listing_interval: Duration::from_secs(60 * 60 * 24),
// Clean up stale tracker entries every 6 hours

@@ -348,6 +348,8 @@ impl HeartbeatHandlerGroup {
err_msg: format!("invalid role: {:?}", req.header),
})?;
let is_handshake = ctx.is_handshake;
for NameCachedHandler { name, handler } in self.handlers.iter() {
if !handler.is_acceptable(role) {
continue;
@@ -363,10 +365,26 @@ impl HeartbeatHandlerGroup {
}
let header = std::mem::take(&mut acc.header);
let mailbox_message = acc.take_mailbox_message();
// Populate heartbeat_config during handshake
let heartbeat_config = if is_handshake {
let config = ctx.heartbeat_options_for(role).into();
info!(
"Handshake with {:?} node, sending config: {:?}",
role, config
);
Some(config)
} else {
None
};
let res = HeartbeatResponse {
header,
region_lease: acc.region_lease,
mailbox_message,
heartbeat_config,
};
Ok(res)
}

@@ -192,7 +192,7 @@ mod test {
let another_region_id = RegionId::new(table_id, region_number + 1);
let peer = Peer::empty(datanode_id);
let follower_peer = Peer::empty(datanode_id + 1);
let table_info = new_test_table_info(table_id, vec![region_number]).into();
let table_info = new_test_table_info(table_id).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
@@ -328,7 +328,7 @@ mod test {
let no_exist_region_id = RegionId::new(table_id, region_number + 2);
let peer = Peer::empty(datanode_id);
let follower_peer = Peer::empty(datanode_id + 1);
let table_info = new_test_table_info(table_id, vec![region_number]).into();
let table_info = new_test_table_info(table_id).into();
let region_routes = vec![
RegionRoute {

@@ -15,6 +15,7 @@
use std::sync::Arc;
use common_meta::cache_invalidator::{CacheInvalidatorRef, DummyCacheInvalidator};
use common_meta::distributed_time_constants::BASE_HEARTBEAT_INTERVAL;
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
use common_meta::kv_backend::memory::MemoryKvBackend;
use common_meta::kv_backend::{KvBackendRef, ResettableKvBackendRef};
@@ -90,6 +91,8 @@ impl TestEnv {
cache_invalidator: self.cache_invalidator.clone(),
leader_region_registry: self.leader_region_registry.clone(),
topic_stats_registry: self.topic_stats_registry.clone(),
heartbeat_interval: BASE_HEARTBEAT_INTERVAL,
is_handshake: false,
}
}
}

@@ -19,6 +19,7 @@ use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, RwLock};
use std::time::Duration;
use api::v1::meta::{HeartbeatConfig, Role};
use clap::ValueEnum;
use common_base::Plugins;
use common_base::readable_size::ReadableSize;
@@ -27,7 +28,9 @@ use common_event_recorder::EventRecorderOptions;
use common_greptimedb_telemetry::GreptimeDBTelemetryTask;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::ddl_manager::DdlManagerRef;
use common_meta::distributed_time_constants::{self, default_distributed_time_constants};
use common_meta::distributed_time_constants::{
self, BASE_HEARTBEAT_INTERVAL, default_distributed_time_constants, frontend_heartbeat_interval,
};
use common_meta::key::TableMetadataManagerRef;
use common_meta::key::runtime_switch::RuntimeSwitchManagerRef;
use common_meta::kv_backend::{KvBackendRef, ResettableKvBackend, ResettableKvBackendRef};
@@ -121,6 +124,59 @@ impl Default for StatsPersistenceOptions {
}
}
/// Heartbeat configuration for a single node type.
#[derive(Clone, PartialEq, Serialize, Deserialize, Debug)]
#[serde(default)]
pub struct HeartbeatOptions {
/// Heartbeat interval.
#[serde(with = "humantime_serde")]
pub interval: Duration,
/// Retry interval when heartbeat connection fails.
#[serde(with = "humantime_serde")]
pub retry_interval: Duration,
}
impl Default for HeartbeatOptions {
fn default() -> Self {
Self {
interval: BASE_HEARTBEAT_INTERVAL,
retry_interval: BASE_HEARTBEAT_INTERVAL,
}
}
}
impl HeartbeatOptions {
pub fn datanode_from(base_interval: Duration) -> Self {
Self {
interval: base_interval,
retry_interval: base_interval,
}
}
pub fn frontend_from(base_interval: Duration) -> Self {
Self {
interval: frontend_heartbeat_interval(base_interval),
retry_interval: base_interval,
}
}
pub fn flownode_from(base_interval: Duration) -> Self {
Self {
interval: base_interval,
retry_interval: base_interval,
}
}
}
impl From<HeartbeatOptions> for HeartbeatConfig {
fn from(opts: HeartbeatOptions) -> Self {
Self {
heartbeat_interval_ms: opts.interval.as_millis() as u64,
retry_interval_ms: opts.retry_interval.as_millis() as u64,
}
}
}
#[derive(Clone, PartialEq, Serialize, Deserialize, Debug)]
#[serde(default)]
pub struct BackendClientOptions {
@@ -299,7 +355,7 @@ impl Default for MetasrvOptions {
#[allow(deprecated)]
server_addr: String::new(),
store_addrs: vec!["127.0.0.1:2379".to_string()],
backend_tls: None,
backend_tls: Some(TlsOption::prefer()),
selector: SelectorType::default(),
enable_region_failover: false,
heartbeat_interval: distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
@@ -379,6 +435,8 @@ pub struct Context {
pub cache_invalidator: CacheInvalidatorRef,
pub leader_region_registry: LeaderRegionRegistryRef,
pub topic_stats_registry: TopicStatsRegistryRef,
pub heartbeat_interval: Duration,
pub is_handshake: bool,
}
impl Context {
@@ -386,6 +444,19 @@ impl Context {
self.in_memory.reset();
self.leader_region_registry.reset();
}
pub fn with_handshake(mut self, is_handshake: bool) -> Self {
self.is_handshake = is_handshake;
self
}
pub fn heartbeat_options_for(&self, role: Role) -> HeartbeatOptions {
match role {
Role::Datanode => HeartbeatOptions::datanode_from(self.heartbeat_interval),
Role::Frontend => HeartbeatOptions::frontend_from(self.heartbeat_interval),
Role::Flownode => HeartbeatOptions::flownode_from(self.heartbeat_interval),
}
}
}
/// The value of the leader. It is used to store the leader's address.
@@ -903,6 +974,8 @@ impl Metasrv {
cache_invalidator,
leader_region_registry,
topic_stats_registry,
heartbeat_interval: self.options().heartbeat_interval,
is_handshake: false,
}
}
}
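
The server-side half of the change (per-role intervals, sent only on the handshake response) can be sketched as follows. This is a simplified, standalone illustration under stated assumptions: `WireHeartbeatConfig` is a hypothetical stand-in for the protobuf message, `config_for_response` is a hypothetical helper that condenses the logic in `HeartbeatHandlerGroup`, and the `6x` multiplier inside `frontend_heartbeat_interval` is assumed, since that function's body is not part of this diff.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Datanode,
    Frontend,
    Flownode,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HeartbeatOptions {
    pub interval: Duration,
    pub retry_interval: Duration,
}

/// ASSUMPTION: the multiplier is illustrative only; the real function lives in
/// `common_meta::distributed_time_constants`.
pub fn frontend_heartbeat_interval(base: Duration) -> Duration {
    base * 6
}

/// Derive per-role options from a single base interval. Only the frontend
/// reports at a different rate; every role retries at the base interval.
pub fn heartbeat_options_for(role: Role, base: Duration) -> HeartbeatOptions {
    let interval = match role {
        Role::Frontend => frontend_heartbeat_interval(base),
        Role::Datanode | Role::Flownode => base,
    };
    HeartbeatOptions {
        interval,
        retry_interval: base,
    }
}

/// Hypothetical stand-in for the protobuf `HeartbeatConfig` message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WireHeartbeatConfig {
    pub heartbeat_interval_ms: u64,
    pub retry_interval_ms: u64,
}

impl From<HeartbeatOptions> for WireHeartbeatConfig {
    fn from(opts: HeartbeatOptions) -> Self {
        Self {
            heartbeat_interval_ms: opts.interval.as_millis() as u64,
            retry_interval_ms: opts.retry_interval.as_millis() as u64,
        }
    }
}

/// Hypothetical helper: attach the config only to the handshake response,
/// so later heartbeat responses carry `None`.
pub fn config_for_response(
    is_handshake: bool,
    role: Role,
    base: Duration,
) -> Option<WireHeartbeatConfig> {
    is_handshake.then(|| WireHeartbeatConfig::from(heartbeat_options_for(role, base)))
}
```

Keeping the intervals on the server side means a cluster-wide change to the base interval needs only a Metasrv configuration update; nodes pick up the new values the next time they reconnect and handshake.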

@@ -1172,7 +1172,7 @@ mod tests {
let from_peer = persistent_context.from_peer.clone();
let to_peer = persistent_context.to_peer.clone();
let region_id = persistent_context.region_ids[0];
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(from_peer),
@@ -1211,7 +1211,7 @@ mod tests {
let to_peer_id = persistent_context.to_peer.id;
let from_peer = persistent_context.from_peer.clone();
let region_id = persistent_context.region_ids[0];
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(from_peer),
@@ -1299,7 +1299,7 @@ mod tests {
let to_peer_id = persistent_context.to_peer.id;
let from_peer = persistent_context.from_peer.clone();
let region_id = persistent_context.region_ids[0];
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(from_peer),
@@ -1419,7 +1419,7 @@ mod tests {
let from_peer_id = persistent_context.from_peer.id;
let from_peer = persistent_context.from_peer.clone();
let region_id = persistent_context.region_ids[0];
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(from_peer),

@@ -401,7 +401,7 @@ mod tests {
async fn prepare_table_metadata(ctx: &Context, wal_options: HashMap<u32, String>) {
let region_id = ctx.persistent_ctx.region_ids[0];
let table_info = new_test_table_info(region_id.table_id(), vec![1]).into();
let table_info = new_test_table_info(region_id.table_id()).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(ctx.persistent_ctx.from_peer.clone()),

@@ -698,7 +698,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 2)),
leader_peer: Some(Peer::empty(3)),
@@ -726,7 +726,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(3)),
@@ -758,7 +758,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(3)),
@@ -792,7 +792,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(2)),
@@ -822,7 +822,7 @@ mod test {
let err = manager
.verify_table_route(
&TableRouteValue::Logical(LogicalTableRouteValue::new(0, vec![])),
&TableRouteValue::Logical(LogicalTableRouteValue::new(0)),
&task,
)
.unwrap_err();
@@ -864,7 +864,7 @@ mod test {
timeout: Duration::from_millis(1000),
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(2)),
@@ -897,7 +897,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(3)),
@@ -930,7 +930,7 @@ mod test {
trigger_reason: RegionMigrationTriggerReason::Manual,
};
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(3)),
@@ -974,7 +974,7 @@ mod test {
task.trigger_reason,
),
);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 2)),
leader_peer: Some(Peer::empty(1)),

@@ -223,7 +223,7 @@ mod tests {
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![3]).into();
let table_info = new_test_table_info(1024).into();
let region_route = RegionRoute {
region: Region::new_test(RegionId::new(1024, 3)),
leader_peer: Some(from_peer.clone()),
@@ -250,7 +250,7 @@ mod tests {
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(to_peer),
@@ -277,7 +277,7 @@ mod tests {
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(from_peer_id)),
@@ -302,7 +302,7 @@ mod tests {
let env = TestingEnv::new();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes: Vec<RegionRoute> = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(1024)),

@@ -129,7 +129,7 @@ impl OpenCandidateRegion {
let guard = ctx
.opening_region_keeper
.register(candidate.id, *region_id)
.context(error::RegionOpeningRaceSnafu {
.context(error::RegionOperatingRaceSnafu {
peer_id: candidate.id,
region_id: *region_id,
})?;
@@ -302,7 +302,7 @@ mod tests {
.await
.unwrap_err();
assert_matches!(err, Error::RegionOpeningRace { .. });
assert_matches!(err, Error::RegionOperatingRace { .. });
assert!(!err.is_retryable());
}
@@ -425,7 +425,7 @@ mod tests {
let mut env = TestingEnv::new();
// Prepares table
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(Peer::empty(from_peer_id)),

@@ -142,7 +142,7 @@ mod tests {
let mut ctx = env.context_factory().new_context(persistent_context);
let table_id = ctx.persistent_ctx.region_ids[0].table_id();
let table_info = new_test_table_info(1024, vec![1, 2]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(1024)),
@@ -185,7 +185,7 @@ mod tests {
let mut ctx = env.context_factory().new_context(persistent_context);
let table_id = ctx.persistent_ctx.region_ids[0].table_id();
let table_info = new_test_table_info(1024, vec![1, 2]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(from_peer.clone()),

@@ -120,7 +120,7 @@ mod tests {
let mut ctx = env.context_factory().new_context(persistent_context);
let table_id = ctx.persistent_ctx.region_ids[0].table_id();
let table_info = new_test_table_info(1024, vec![1, 2, 3]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![
RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),

@@ -262,7 +262,7 @@ mod tests {
let persistent_context = new_persistent_context();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![2]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 2)),
leader_peer: Some(Peer::empty(4)),
@@ -295,7 +295,7 @@ mod tests {
let persistent_context = new_persistent_context();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(3)),
@@ -330,7 +330,7 @@ mod tests {
let persistent_context = new_persistent_context();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(Peer::empty(1)),
@@ -369,7 +369,7 @@ mod tests {
let leader_peer = persistent_context.from_peer.clone();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(leader_peer),
@@ -396,7 +396,7 @@ mod tests {
let candidate_peer = persistent_context.to_peer.clone();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(candidate_peer),
@@ -424,7 +424,7 @@ mod tests {
let candidate_peer = persistent_context.to_peer.clone();
let mut ctx = env.context_factory().new_context(persistent_context);
let table_info = new_test_table_info(1024, vec![1]).into();
let table_info = new_test_table_info(1024).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(1024, 1)),
leader_peer: Some(candidate_peer),
@@ -454,7 +454,7 @@ mod tests {
let opening_keeper = MemoryRegionKeeper::default();
let table_id = 1024;
let table_info = new_test_table_info(table_id, vec![1]).into();
let table_info = new_test_table_info(table_id).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(RegionId::new(table_id, 1)),
leader_peer: Some(Peer::empty(1)),

@@ -381,7 +381,7 @@ mod tests {
async fn prepare_table_metadata(ctx: &Context, wal_options: HashMap<u32, String>) {
let region_id = ctx.persistent_ctx.region_ids[0];
let table_info = new_test_table_info(region_id.table_id(), vec![1]).into();
let table_info = new_test_table_info(region_id.table_id()).into();
let region_routes = vec![RegionRoute {
region: Region::new_test(region_id),
leader_peer: Some(ctx.persistent_ctx.from_peer.clone()),

@@ -390,10 +390,7 @@ mod tests {
.table_route_storage()
.build_create_txn(
1024,
&TableRouteValue::Logical(LogicalTableRouteValue::new(
1024,
vec![RegionId::new(1023, 1)],
)),
&TableRouteValue::Logical(LogicalTableRouteValue::new(1024)),
)
.unwrap();
kv_backend.txn(txn).await.unwrap();

@@ -20,18 +20,29 @@ pub mod group;
pub mod plan;
pub mod repartition_end;
pub mod repartition_start;
pub mod utils;
use std::any::Any;
use std::fmt::Debug;
use common_error::ext::BoxedError;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::key::TableMetadataManagerRef;
use common_meta::instruction::CacheIdent;
use common_meta::key::datanode_table::RegionInfo;
use common_meta::key::table_route::TableRouteValue;
use common_meta::key::{DeserializedValueWithBytes, TableMetadataManagerRef};
use common_meta::node_manager::NodeManagerRef;
use common_meta::region_keeper::{MemoryRegionKeeperRef, OperatingRegionGuard};
use common_meta::region_registry::LeaderRegionRegistryRef;
use common_meta::rpc::router::RegionRoute;
use common_procedure::{Context as ProcedureContext, Status};
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use store_api::storage::TableId;
use crate::error::Result;
use crate::error::{self, Result};
use crate::procedure::repartition::plan::RepartitionPlanEntry;
use crate::procedure::repartition::utils::get_datanode_table_value;
use crate::service::mailbox::MailboxRef;
#[cfg(test)]
@@ -46,14 +57,115 @@ pub struct PersistentContext {
pub plans: Vec<RepartitionPlanEntry>,
}
pub struct VolatileContext {
pub allocating_regions: Vec<OperatingRegionGuard>,
}
pub struct Context {
pub persistent_ctx: PersistentContext,
pub volatile_ctx: VolatileContext,
pub table_metadata_manager: TableMetadataManagerRef,
pub memory_region_keeper: MemoryRegionKeeperRef,
pub node_manager: NodeManagerRef,
pub leader_region_registry: LeaderRegionRegistryRef,
pub mailbox: MailboxRef,
pub server_addr: String,
pub cache_invalidator: CacheInvalidatorRef,
}
impl Context {
/// Retrieves the table route value for the given table id.
///
/// Retry:
/// - Failed to retrieve the metadata of table.
///
/// Abort:
/// - Table route not found.
pub async fn get_table_route_value(
&self,
) -> Result<DeserializedValueWithBytes<TableRouteValue>> {
let table_id = self.persistent_ctx.table_id;
let table_route_value = self
.table_metadata_manager
.table_route_manager()
.table_route_storage()
.get_with_raw_bytes(table_id)
.await
.map_err(BoxedError::new)
.with_context(|_| error::RetryLaterWithSourceSnafu {
reason: format!("Failed to get table route for table: {}", table_id),
})?
.context(error::TableRouteNotFoundSnafu { table_id })?;
Ok(table_route_value)
}
/// Updates the table route.
///
/// Retry:
/// - Failed to retrieve the metadata of datanode table.
///
/// Abort:
/// - Table route not found.
/// - Failed to update the table route.
pub async fn update_table_route(
&self,
current_table_route_value: &DeserializedValueWithBytes<TableRouteValue>,
new_region_routes: Vec<RegionRoute>,
) -> Result<()> {
let table_id = self.persistent_ctx.table_id;
if new_region_routes.is_empty() {
return error::UnexpectedSnafu {
violated: format!("new_region_routes is empty for table: {}", table_id),
}
.fail();
}
let datanode_id = new_region_routes
.first()
.unwrap()
.leader_peer
.as_ref()
.context(error::NoLeaderSnafu)?
.id;
let datanode_table_value =
get_datanode_table_value(&self.table_metadata_manager, table_id, datanode_id).await?;
let RegionInfo {
region_options,
region_wal_options,
..
} = &datanode_table_value.region_info;
self.table_metadata_manager
.update_table_route(
table_id,
datanode_table_value.region_info.clone(),
current_table_route_value,
new_region_routes,
region_options,
region_wal_options,
)
.await
.context(error::TableMetadataManagerSnafu)
}
/// Broadcasts the invalidate table cache message.
pub async fn invalidate_table_cache(&self) -> Result<()> {
let table_id = self.persistent_ctx.table_id;
let subject = format!(
"Invalidate table cache for repartition table, table: {}",
table_id,
);
let ctx = common_meta::cache_invalidator::Context {
subject: Some(subject),
};
let _ = self
.cache_invalidator
.invalidate(&ctx, &[CacheIdent::TableId(table_id)])
.await;
Ok(())
}
}
#[async_trait::async_trait]
#[typetag::serde(tag = "repartition_state")]
pub(crate) trait State: Sync + Send + Debug {

@@ -13,11 +13,23 @@
// limitations under the License.
use std::any::Any;
use std::collections::{HashMap, HashSet};
use common_meta::ddl::drop_table::executor::DropTableExecutor;
use common_meta::lock_key::TableLock;
use common_meta::node_manager::NodeManagerRef;
use common_meta::region_registry::LeaderRegionRegistryRef;
use common_meta::rpc::router::RegionRoute;
use common_procedure::{Context as ProcedureContext, Status};
use common_telemetry::{info, warn};
use serde::{Deserialize, Serialize};
use snafu::ResultExt;
use store_api::storage::{RegionId, TableId};
use table::table_name::TableName;
use table::table_reference::TableReference;
use crate::error::Result;
use crate::error::{self, Result};
use crate::procedure::repartition::group::region_routes;
use crate::procedure::repartition::repartition_end::RepartitionEnd;
use crate::procedure::repartition::{Context, State};
@@ -30,7 +42,7 @@ impl State for DeallocateRegion {
async fn next(
&mut self,
ctx: &mut Context,
_procedure_ctx: &ProcedureContext,
procedure_ctx: &ProcedureContext,
) -> Result<(Box<dyn State>, Status)> {
let region_to_deallocate = ctx
.persistent_ctx
@@ -42,11 +54,185 @@ impl State for DeallocateRegion {
return Ok((Box::new(RepartitionEnd), Status::done()));
}
// TODO(weny): deallocate regions.
todo!()
let table_id = ctx.persistent_ctx.table_id;
let pending_deallocate_region_ids = ctx
.persistent_ctx
.plans
.iter()
.flat_map(|p| p.pending_deallocate_region_ids.iter())
.cloned()
.collect::<HashSet<_>>();
info!(
"Deallocating regions: {:?} for table: {} during repartition procedure",
pending_deallocate_region_ids, table_id
);
let table_lock = TableLock::Write(table_id).into();
let _guard = procedure_ctx.provider.acquire_lock(&table_lock).await;
let table_route_value = ctx.get_table_route_value().await?;
let deallocating_regions = {
let region_routes = region_routes(table_id, &table_route_value)?;
Self::filter_deallocatable_region_routes(
table_id,
region_routes,
&pending_deallocate_region_ids,
)
};
let table_ref = TableReference::full(
&ctx.persistent_ctx.catalog_name,
&ctx.persistent_ctx.schema_name,
&ctx.persistent_ctx.table_name,
);
// Deallocates the regions on datanodes.
Self::deallocate_regions(
&ctx.node_manager,
&ctx.leader_region_registry,
table_ref.into(),
table_id,
&deallocating_regions,
)
.await?;
// Safety: the table route must be physical, so we can safely unwrap the region routes.
let region_routes = table_route_value.region_routes().unwrap();
let new_region_routes =
Self::generate_region_routes(region_routes, &pending_deallocate_region_ids);
ctx.update_table_route(&table_route_value, new_region_routes)
.await?;
ctx.invalidate_table_cache().await?;
Ok((Box::new(RepartitionEnd), Status::executing(false)))
}
fn as_any(&self) -> &dyn Any {
self
}
}
impl DeallocateRegion {
#[allow(dead_code)]
async fn deallocate_regions(
node_manager: &NodeManagerRef,
leader_region_registry: &LeaderRegionRegistryRef,
table: TableName,
table_id: TableId,
region_routes: &[RegionRoute],
) -> Result<()> {
let executor = DropTableExecutor::new(table, table_id, false);
// Note: Consider adding an option to forcefully drop the physical region,
// which would involve dropping all logical regions associated with that physical region.
executor
.on_drop_regions(
node_manager,
leader_region_registry,
region_routes,
false,
true,
)
.await
.context(error::DeallocateRegionsSnafu { table_id })?;
Ok(())
}
#[allow(dead_code)]
fn filter_deallocatable_region_routes(
table_id: TableId,
region_routes: &[RegionRoute],
pending_deallocate_region_ids: &HashSet<RegionId>,
) -> Vec<RegionRoute> {
let region_routes_map = region_routes
.iter()
.map(|r| (r.region.id, r.clone()))
.collect::<HashMap<_, _>>();
pending_deallocate_region_ids
.iter()
.filter_map(|region_id| match region_routes_map.get(region_id) {
Some(region_route) => Some(region_route.clone()),
None => {
warn!(
"Region {} not found during deallocate regions for table {:?}",
region_id, table_id
);
None
}
})
.collect::<Vec<_>>()
}
#[allow(dead_code)]
fn generate_region_routes(
region_routes: &[RegionRoute],
pending_deallocate_region_ids: &HashSet<RegionId>,
) -> Vec<RegionRoute> {
// Keeps only the routes whose regions are not pending deallocation.
region_routes
.iter()
.filter(|r| !pending_deallocate_region_ids.contains(&r.region.id))
.cloned()
.collect()
}
}
#[cfg(test)]
mod tests {
use std::collections::HashSet;
use common_meta::peer::Peer;
use common_meta::rpc::router::{Region, RegionRoute};
use store_api::storage::{RegionId, TableId};
use crate::procedure::repartition::deallocate_region::DeallocateRegion;
fn test_region_routes(table_id: TableId) -> Vec<RegionRoute> {
vec![
RegionRoute {
region: Region {
id: RegionId::new(table_id, 1),
..Default::default()
},
leader_peer: Some(Peer::empty(1)),
..Default::default()
},
RegionRoute {
region: Region {
id: RegionId::new(table_id, 2),
..Default::default()
},
leader_peer: Some(Peer::empty(2)),
..Default::default()
},
]
}
#[test]
fn test_filter_deallocatable_region_routes() {
let table_id = 1024;
let region_routes = test_region_routes(table_id);
let pending_deallocate_region_ids = HashSet::from([RegionId::new(table_id, 1)]);
let deallocatable_region_routes = DeallocateRegion::filter_deallocatable_region_routes(
table_id,
&region_routes,
&pending_deallocate_region_ids,
);
assert_eq!(deallocatable_region_routes.len(), 1);
assert_eq!(
deallocatable_region_routes[0].region.id,
RegionId::new(table_id, 1)
);
}
#[test]
fn test_generate_region_routes() {
let table_id = 1024;
let region_routes = test_region_routes(table_id);
let pending_deallocate_region_ids = HashSet::from([RegionId::new(table_id, 1)]);
let new_region_routes = DeallocateRegion::generate_region_routes(
&region_routes,
&pending_deallocate_region_ids,
);
assert_eq!(new_region_routes.len(), 1);
assert_eq!(new_region_routes[0].region.id, RegionId::new(table_id, 2));
}
}
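
The two helpers in this file split the current routes into "drop on the datanodes" and "keep in the table route". A minimal sketch of that split, assuming a simplified `Route` struct (hypothetical, standing in for `RegionRoute`) and plain `u64` region ids instead of `RegionId`:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `RegionRoute`: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Route {
    region_id: u64,
    leader_peer: Option<u64>,
}

/// Routes of the regions that should be dropped on the datanodes.
/// Pending ids that have no route are skipped (the code in the diff logs a warning).
fn filter_deallocatable(routes: &[Route], pending: &HashSet<u64>) -> Vec<Route> {
    let by_id: HashMap<u64, &Route> = routes.iter().map(|r| (r.region_id, r)).collect();
    pending
        .iter()
        .filter_map(|id| by_id.get(id).map(|r| (*r).clone()))
        .collect()
}

/// Routes that stay in the table route once the pending regions are removed.
fn remaining_routes(routes: &[Route], pending: &HashSet<u64>) -> Vec<Route> {
    routes
        .iter()
        .filter(|r| !pending.contains(&r.region_id))
        .cloned()
        .collect()
}
```

Because the first helper iterates over a `HashSet`, the order of its result is unspecified; the second helper preserves the order of the input routes.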

@@ -13,18 +13,41 @@
// limitations under the License.
use std::any::Any;
use std::collections::HashMap;
use common_procedure::{Context as ProcedureContext, ProcedureWithId, Status};
use serde::{Deserialize, Serialize};
use store_api::storage::RegionId;
use crate::error::Result;
use crate::procedure::repartition::collect::{Collect, ProcedureMeta};
use crate::procedure::repartition::group::RepartitionGroupProcedure;
use crate::procedure::repartition::plan::RegionDescriptor;
use crate::procedure::repartition::{self, Context, State};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Dispatch;
#[allow(dead_code)]
fn build_region_mapping(
source_regions: &[RegionDescriptor],
target_regions: &[RegionDescriptor],
transition_map: &[Vec<usize>],
) -> HashMap<RegionId, Vec<RegionId>> {
transition_map
.iter()
.enumerate()
.map(|(source_idx, indices)| {
let source_region = source_regions[source_idx].region_id;
let target_regions = indices
.iter()
.map(|&target_idx| target_regions[target_idx].region_id)
.collect::<Vec<_>>();
(source_region, target_regions)
})
.collect::<HashMap<RegionId, _>>()
}
#[async_trait::async_trait]
#[typetag::serde]
impl State for Dispatch {
@@ -37,11 +60,19 @@ impl State for Dispatch {
let mut procedures = Vec::with_capacity(ctx.persistent_ctx.plans.len());
let mut procedure_metas = Vec::with_capacity(ctx.persistent_ctx.plans.len());
for (plan_index, plan) in ctx.persistent_ctx.plans.iter().enumerate() {
let region_mapping = build_region_mapping(
&plan.source_regions,
&plan.target_regions,
&plan.transition_map,
);
let persistent_ctx = repartition::group::PersistentContext::new(
plan.group_id,
table_id,
ctx.persistent_ctx.catalog_name.clone(),
ctx.persistent_ctx.schema_name.clone(),
plan.source_regions.clone(),
plan.target_regions.clone(),
region_mapping,
);
let group_procedure = RepartitionGroupProcedure::new(persistent_ctx, ctx);
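
For readers unfamiliar with the `transition_map` layout, here is a small sketch of what `build_region_mapping` computes. It assumes plain `u64` ids in place of `RegionDescriptor` and `RegionId` (a simplification for illustration, not the types used in the diff):

```rust
use std::collections::HashMap;

/// `transition_map[i]` lists the indices into `targets` that belong to `sources[i]`.
/// Like the code in the diff, this panics if an index is out of range.
fn build_region_mapping(
    sources: &[u64],
    targets: &[u64],
    transition_map: &[Vec<usize>],
) -> HashMap<u64, Vec<u64>> {
    transition_map
        .iter()
        .enumerate()
        .map(|(source_idx, target_indices)| {
            // Resolve each target index to the target's region id.
            let mapped: Vec<u64> = target_indices.iter().map(|&i| targets[i]).collect();
            (sources[source_idx], mapped)
        })
        .collect()
}
```

With sources `[1, 2]`, targets `[10, 11, 12]` and a transition map of `[[0, 1], [1, 2]]`, source 1 maps to targets 10 and 11, and source 2 maps to targets 11 and 12.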

@@ -12,27 +12,34 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub(crate) mod apply_staging_manifest;
pub(crate) mod enter_staging_region;
pub(crate) mod remap_manifest;
pub(crate) mod repartition_end;
pub(crate) mod repartition_start;
pub(crate) mod update_metadata;
pub(crate) mod utils;
use std::any::Any;
use std::collections::HashMap;
use std::fmt::Debug;
use std::time::Duration;
use common_error::ext::BoxedError;
use common_meta::DatanodeId;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::instruction::CacheIdent;
use common_meta::key::datanode_table::{DatanodeTableKey, DatanodeTableValue, RegionInfo};
use common_meta::key::datanode_table::{DatanodeTableValue, RegionInfo};
use common_meta::key::table_route::TableRouteValue;
use common_meta::key::{DeserializedValueWithBytes, TableMetadataManagerRef};
use common_meta::lock_key::{CatalogLock, RegionLock, SchemaLock};
use common_meta::peer::Peer;
use common_meta::rpc::router::RegionRoute;
use common_procedure::error::ToJsonSnafu;
use common_procedure::{
Context as ProcedureContext, LockKey, Procedure, Result as ProcedureResult, Status,
UserMetadata,
Context as ProcedureContext, Error as ProcedureError, LockKey, Procedure,
Result as ProcedureResult, Status, StringKey, UserMetadata,
};
use common_telemetry::error;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use store_api::storage::{RegionId, TableId};
@@ -41,6 +48,7 @@ use uuid::Uuid;
use crate::error::{self, Result};
use crate::procedure::repartition::group::repartition_start::RepartitionStart;
use crate::procedure::repartition::plan::RegionDescriptor;
use crate::procedure::repartition::utils::get_datanode_table_value;
use crate::procedure::repartition::{self};
use crate::service::mailbox::MailboxRef;
@@ -71,6 +79,12 @@ impl RepartitionGroupProcedure {
}
}
#[derive(Debug, Serialize)]
pub struct RepartitionGroupData<'a> {
persistent_ctx: &'a PersistentContext,
state: &'a dyn State,
}
#[async_trait::async_trait]
impl Procedure for RepartitionGroupProcedure {
fn type_name(&self) -> &str {
@@ -78,27 +92,48 @@ impl Procedure for RepartitionGroupProcedure {
}
async fn execute(&mut self, _ctx: &ProcedureContext) -> ProcedureResult<Status> {
todo!()
}
let state = &mut self.state;
async fn rollback(&mut self, _: &ProcedureContext) -> ProcedureResult<()> {
todo!()
match state.next(&mut self.context, _ctx).await {
Ok((next, status)) => {
*state = next;
Ok(status)
}
Err(e) => {
if e.is_retryable() {
Err(ProcedureError::retry_later(e))
} else {
error!(
e;
"Repartition group procedure failed, group id: {}, table id: {}",
self.context.persistent_ctx.group_id,
self.context.persistent_ctx.table_id,
);
Err(ProcedureError::external(e))
}
}
}
}
fn rollback_supported(&self) -> bool {
true
false
}
fn dump(&self) -> ProcedureResult<String> {
todo!()
let data = RepartitionGroupData {
persistent_ctx: &self.context.persistent_ctx,
state: self.state.as_ref(),
};
serde_json::to_string(&data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
todo!()
LockKey::new(self.context.persistent_ctx.lock_key())
}
fn user_metadata(&self) -> Option<UserMetadata> {
todo!()
// TODO(weny): support user metadata.
None
}
}
@@ -123,8 +158,8 @@ pub struct GroupPrepareResult {
pub target_routes: Vec<RegionRoute>,
/// The primary source region id (first source region), used for retrieving region options.
pub central_region: RegionId,
/// The datanode id where the primary source region is located.
pub central_region_datanode_id: DatanodeId,
/// The peer where the primary source region is located.
pub central_region_datanode: Peer,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
@@ -132,30 +167,59 @@ pub struct PersistentContext {
pub group_id: GroupId,
/// The table id of the repartition group.
pub table_id: TableId,
/// The catalog name of the repartition group.
pub catalog_name: String,
/// The schema name of the repartition group.
pub schema_name: String,
/// The source regions of the repartition group.
pub sources: Vec<RegionDescriptor>,
/// The target regions of the repartition group.
pub targets: Vec<RegionDescriptor>,
/// For each `source region`, the corresponding
/// `target regions` that overlap with it.
pub region_mapping: HashMap<RegionId, Vec<RegionId>>,
/// The result of group prepare.
/// The value will be set in [RepartitionStart](crate::procedure::repartition::group::repartition_start::RepartitionStart) state.
pub group_prepare_result: Option<GroupPrepareResult>,
/// The staging manifest paths of the repartition group.
/// The value will be set in [RemapManifest](crate::procedure::repartition::group::remap_manifest::RemapManifest) state.
pub staging_manifest_paths: HashMap<RegionId, String>,
}
impl PersistentContext {
pub fn new(
group_id: GroupId,
table_id: TableId,
catalog_name: String,
schema_name: String,
sources: Vec<RegionDescriptor>,
targets: Vec<RegionDescriptor>,
region_mapping: HashMap<RegionId, Vec<RegionId>>,
) -> Self {
Self {
group_id,
table_id,
catalog_name,
schema_name,
sources,
targets,
region_mapping,
group_prepare_result: None,
staging_manifest_paths: HashMap::new(),
}
}
pub fn lock_key(&self) -> Vec<StringKey> {
let mut lock_keys = Vec::with_capacity(2 + self.sources.len());
lock_keys.extend([
CatalogLock::Read(&self.catalog_name).into(),
SchemaLock::read(&self.catalog_name, &self.schema_name).into(),
]);
for source in &self.sources {
lock_keys.push(RegionLock::Write(source.region_id).into());
}
lock_keys
}
}
impl Context {
@@ -198,24 +262,7 @@ impl Context {
table_id: TableId,
datanode_id: u64,
) -> Result<DatanodeTableValue> {
let datanode_table_value = self
.table_metadata_manager
.datanode_table_manager()
.get(&DatanodeTableKey {
datanode_id,
table_id,
})
.await
.context(error::TableMetadataManagerSnafu)
.map_err(BoxedError::new)
.with_context(|_| error::RetryLaterWithSourceSnafu {
reason: format!("Failed to get DatanodeTable: {table_id}"),
})?
.context(error::DatanodeTableNotFoundSnafu {
table_id,
datanode_id,
})?;
Ok(datanode_table_value)
get_datanode_table_value(&self.table_metadata_manager, table_id, datanode_id).await
}
/// Broadcasts the invalidate table cache message.
@@ -253,7 +300,7 @@ impl Context {
// Safety: prepare result is set in [RepartitionStart] state.
let prepare_result = self.persistent_ctx.group_prepare_result.as_ref().unwrap();
let central_region_datanode_table_value = self
.get_datanode_table_value(table_id, prepare_result.central_region_datanode_id)
.get_datanode_table_value(table_id, prepare_result.central_region_datanode.id)
.await?;
let RegionInfo {
region_options,

@@ -0,0 +1,333 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use std::collections::HashMap;
use std::time::{Duration, Instant};
use api::v1::meta::MailboxMessage;
use common_meta::instruction::{
ApplyStagingManifestReply, ApplyStagingManifestsReply, Instruction, InstructionReply,
};
use common_meta::peer::Peer;
use common_meta::rpc::router::RegionRoute;
use common_procedure::{Context as ProcedureContext, Status};
use common_telemetry::info;
use futures::future::join_all;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure};
use store_api::storage::RegionId;
use crate::error::{self, Error, Result};
use crate::handler::HeartbeatMailbox;
use crate::procedure::repartition::group::update_metadata::UpdateMetadata;
use crate::procedure::repartition::group::utils::{
HandleMultipleResult, group_region_routes_by_peer, handle_multiple_results,
};
use crate::procedure::repartition::group::{Context, State};
use crate::procedure::repartition::plan::RegionDescriptor;
use crate::service::mailbox::{Channel, MailboxRef};
#[derive(Debug, Serialize, Deserialize)]
pub struct ApplyStagingManifest;
#[async_trait::async_trait]
#[typetag::serde]
impl State for ApplyStagingManifest {
async fn next(
&mut self,
ctx: &mut Context,
_procedure_ctx: &ProcedureContext,
) -> Result<(Box<dyn State>, Status)> {
self.apply_staging_manifests(ctx).await?;
Ok((
Box::new(UpdateMetadata::ApplyStaging),
Status::executing(true),
))
}
fn as_any(&self) -> &dyn Any {
self
}
}
impl ApplyStagingManifest {
fn build_apply_staging_manifest_instructions(
staging_manifest_paths: &HashMap<RegionId, String>,
target_routes: &[RegionRoute],
targets: &[RegionDescriptor],
central_region_id: RegionId,
) -> Result<HashMap<Peer, Vec<common_meta::instruction::ApplyStagingManifest>>> {
let target_partition_expr_by_region = targets
.iter()
.map(|target| {
Ok((
target.region_id,
target
.partition_expr
.as_json_str()
.context(error::SerializePartitionExprSnafu)?,
))
})
.collect::<Result<HashMap<_, _>>>()?;
// Safety: `leader_peer` is set for all region routes, checked in `repartition_start`.
let target_region_routes_by_peer = group_region_routes_by_peer(target_routes);
let mut instructions = HashMap::with_capacity(target_region_routes_by_peer.len());
for (peer, region_ids) in target_region_routes_by_peer {
let apply_staging_manifests = region_ids
.into_iter()
.map(|region_id| common_meta::instruction::ApplyStagingManifest {
region_id,
partition_expr: target_partition_expr_by_region[&region_id].clone(),
central_region_id,
manifest_path: staging_manifest_paths[&region_id].clone(),
})
.collect();
instructions.insert(peer.clone(), apply_staging_manifests);
}
Ok(instructions)
}
#[allow(dead_code)]
async fn apply_staging_manifests(&self, ctx: &mut Context) -> Result<()> {
let table_id = ctx.persistent_ctx.table_id;
let group_id = ctx.persistent_ctx.group_id;
let staging_manifest_paths = &ctx.persistent_ctx.staging_manifest_paths;
// Safety: the group prepare result is set in the RepartitionStart state.
let prepare_result = ctx.persistent_ctx.group_prepare_result.as_ref().unwrap();
let targets = &ctx.persistent_ctx.targets;
let target_routes = &prepare_result.target_routes;
let central_region_id = prepare_result.central_region;
let instructions = Self::build_apply_staging_manifest_instructions(
staging_manifest_paths,
target_routes,
targets,
central_region_id,
)?;
let operation_timeout =
ctx.next_operation_timeout()
.context(error::ExceededDeadlineSnafu {
operation: "Apply staging manifests",
})?;
let (peers, tasks): (Vec<_>, Vec<_>) = instructions
.iter()
.map(|(peer, apply_staging_manifests)| {
(
peer,
Self::apply_staging_manifest(
&ctx.mailbox,
&ctx.server_addr,
peer,
apply_staging_manifests,
operation_timeout,
),
)
})
.unzip();
info!(
"Sent apply staging manifests instructions to peers: {:?} for repartition table {}, group id {}",
peers, table_id, group_id
);
let format_err_msg = |idx: usize, error: &Error| {
let peer = peers[idx];
format!(
"Failed to apply staging manifests on datanode {:?}, error: {:?}",
peer, error
)
};
// Waits for all tasks to complete.
let results = join_all(tasks).await;
let result = handle_multiple_results(&results);
match result {
HandleMultipleResult::AllSuccessful => Ok(()),
HandleMultipleResult::AllRetryable(retryable_errors) => error::RetryLaterSnafu {
reason: format!(
"All retryable errors during applying staging manifests for repartition table {}, group id {}: {:?}",
table_id, group_id,
retryable_errors
.iter()
.map(|(idx, error)| format_err_msg(*idx, error))
.collect::<Vec<_>>()
.join(",")
),
}
.fail(),
HandleMultipleResult::AllNonRetryable(non_retryable_errors) => error::UnexpectedSnafu {
violated: format!(
"All non retryable errors during applying staging manifests for repartition table {}, group id {}: {:?}",
table_id, group_id,
non_retryable_errors
.iter()
.map(|(idx, error)| format_err_msg(*idx, error))
.collect::<Vec<_>>()
.join(",")
),
}
.fail(),
HandleMultipleResult::PartialRetryable {
retryable_errors,
non_retryable_errors,
} => error::UnexpectedSnafu {
violated: format!(
"Partial retryable errors during applying staging manifests for repartition table {}, group id {}: {:?}, non retryable errors: {:?}",
table_id, group_id,
retryable_errors
.iter()
.map(|(idx, error)| format_err_msg(*idx, error))
.collect::<Vec<_>>()
.join(","),
non_retryable_errors
.iter()
.map(|(idx, error)| format_err_msg(*idx, error))
.collect::<Vec<_>>()
.join(","),
),
}
.fail(),
}
}
async fn apply_staging_manifest(
mailbox: &MailboxRef,
server_addr: &str,
peer: &Peer,
apply_staging_manifests: &[common_meta::instruction::ApplyStagingManifest],
timeout: Duration,
) -> Result<()> {
let ch = Channel::Datanode(peer.id);
let instruction = Instruction::ApplyStagingManifests(apply_staging_manifests.to_vec());
let message = MailboxMessage::json_message(
&format!(
"Apply staging manifests for regions: {:?}",
apply_staging_manifests
.iter()
.map(|r| r.region_id)
.collect::<Vec<_>>()
),
&format!("Metasrv@{}", server_addr),
&format!("Datanode-{}@{}", peer.id, peer.addr),
common_time::util::current_time_millis(),
&instruction,
)
.with_context(|_| error::SerializeToJsonSnafu {
input: instruction.to_string(),
})?;
let now = Instant::now();
let receiver = mailbox.send(&ch, message, timeout).await;
let receiver = match receiver {
Ok(receiver) => receiver,
Err(error::Error::PusherNotFound { .. }) => error::RetryLaterSnafu {
reason: format!(
"Pusher not found for apply staging manifests on datanode {:?}, elapsed: {:?}",
peer,
now.elapsed()
),
}
.fail()?,
Err(err) => {
return Err(err);
}
};
match receiver.await {
Ok(msg) => {
let reply = HeartbeatMailbox::json_reply(&msg)?;
info!(
"Received apply staging manifests reply: {:?}, elapsed: {:?}",
reply,
now.elapsed()
);
let InstructionReply::ApplyStagingManifests(ApplyStagingManifestsReply { replies }) =
reply
else {
return error::UnexpectedInstructionReplySnafu {
mailbox_message: msg.to_string(),
reason: "expect apply staging manifests reply",
}
.fail();
};
for reply in replies {
Self::handle_apply_staging_manifest_reply(&reply, &now, peer)?;
}
Ok(())
}
Err(error::Error::MailboxTimeout { .. }) => {
let reason = format!(
"Mailbox received timeout for apply staging manifests on datanode {:?}, elapsed: {:?}",
peer,
now.elapsed()
);
error::RetryLaterSnafu { reason }.fail()
}
Err(err) => Err(err),
}
}
fn handle_apply_staging_manifest_reply(
ApplyStagingManifestReply {
region_id,
ready,
exists,
error,
}: &ApplyStagingManifestReply,
now: &Instant,
peer: &Peer,
) -> Result<()> {
ensure!(
exists,
error::UnexpectedSnafu {
violated: format!(
"Region {} doesn't exist on datanode {:?}, elapsed: {:?}",
region_id,
peer,
now.elapsed()
)
}
);
if error.is_some() {
return error::RetryLaterSnafu {
reason: format!(
"Failed to apply staging manifest on datanode {:?}, error: {:?}, elapsed: {:?}",
peer,
error,
now.elapsed()
),
}
.fail();
}
ensure!(
ready,
error::RetryLaterSnafu {
reason: format!(
"Region {} is still applying staging manifest on datanode {:?}, elapsed: {:?}",
region_id,
peer,
now.elapsed()
),
}
);
Ok(())
}
}
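
`apply_staging_manifests` retries only when every failure is retryable; any non-retryable failure aborts the step. The helper `handle_multiple_results` is not part of this diff, so the following is only a sketch of how such a classification could work. `TaskError`, `Summary`, `summarize`, and `should_retry` are hypothetical names, and the real helper's exact semantics may differ.

```rust
/// Hypothetical error type: only records whether a retry may succeed.
#[derive(Debug, Clone, PartialEq)]
struct TaskError {
    retryable: bool,
}

/// Hypothetical summary, mirroring the four variants matched in the diff.
/// Each vector holds the indices of the failed tasks.
#[derive(Debug, PartialEq)]
enum Summary {
    AllSuccessful,
    AllRetryable(Vec<usize>),
    AllNonRetryable(Vec<usize>),
    PartialRetryable {
        retryable: Vec<usize>,
        non_retryable: Vec<usize>,
    },
}

/// Buckets failures by retryability and picks the matching variant.
fn summarize(results: &[Result<(), TaskError>]) -> Summary {
    let mut retryable = Vec::new();
    let mut non_retryable = Vec::new();
    for (idx, result) in results.iter().enumerate() {
        match result {
            Ok(()) => {}
            Err(e) if e.retryable => retryable.push(idx),
            Err(_) => non_retryable.push(idx),
        }
    }
    match (retryable.is_empty(), non_retryable.is_empty()) {
        (true, true) => Summary::AllSuccessful,
        (false, true) => Summary::AllRetryable(retryable),
        (true, false) => Summary::AllNonRetryable(non_retryable),
        (false, false) => Summary::PartialRetryable { retryable, non_retryable },
    }
}

/// Retry only when every failure is retryable, as the match in the diff does.
fn should_retry(summary: &Summary) -> bool {
    matches!(summary, Summary::AllRetryable(_))
}
```

Recording indices rather than errors keeps the sketch small; the code in the diff uses the index to look up the peer when formatting the error message.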

@@ -29,6 +29,7 @@ use snafu::{OptionExt, ResultExt, ensure};
use crate::error::{self, Error, Result};
use crate::handler::HeartbeatMailbox;
use crate::procedure::repartition::group::remap_manifest::RemapManifest;
use crate::procedure::repartition::group::utils::{
HandleMultipleResult, group_region_routes_by_peer, handle_multiple_results,
};
@@ -49,7 +50,7 @@ impl State for EnterStagingRegion {
) -> Result<(Box<dyn State>, Status)> {
self.enter_staging_regions(ctx).await?;
Ok(Self::next_state())
Ok((Box::new(RemapManifest), Status::executing(true)))
}
fn as_any(&self) -> &dyn Any {
@@ -58,16 +59,10 @@ impl State for EnterStagingRegion {
}
impl EnterStagingRegion {
#[allow(dead_code)]
fn next_state() -> (Box<dyn State>, Status) {
// TODO(weny): change it later.
(Box::new(EnterStagingRegion), Status::executing(true))
}
fn build_enter_staging_instructions(
prepare_result: &GroupPrepareResult,
targets: &[RegionDescriptor],
) -> Result<HashMap<Peer, Instruction>> {
) -> Result<HashMap<Peer, Vec<common_meta::instruction::EnterStagingRegion>>> {
let target_partition_expr_by_region = targets
.iter()
.map(|target| {
@@ -93,10 +88,7 @@ impl EnterStagingRegion {
partition_expr: target_partition_expr_by_region[&region_id].clone(),
})
.collect();
instructions.insert(
peer.clone(),
Instruction::EnterStagingRegions(enter_staging_regions),
);
instructions.insert(peer.clone(), enter_staging_regions);
}
Ok(instructions)
@@ -117,14 +109,14 @@ impl EnterStagingRegion {
})?;
let (peers, tasks): (Vec<_>, Vec<_>) = instructions
.iter()
.map(|(peer, instruction)| {
.map(|(peer, enter_staging_regions)| {
(
peer,
Self::enter_staging_region(
&ctx.mailbox,
&ctx.server_addr,
peer,
instruction,
enter_staging_regions,
operation_timeout,
),
)
@@ -208,12 +200,19 @@ impl EnterStagingRegion {
mailbox: &MailboxRef,
server_addr: &str,
peer: &Peer,
instruction: &Instruction,
enter_staging_regions: &[common_meta::instruction::EnterStagingRegion],
timeout: Duration,
) -> Result<()> {
let ch = Channel::Datanode(peer.id);
let instruction = Instruction::EnterStagingRegions(enter_staging_regions.to_vec());
let message = MailboxMessage::json_message(
&format!("Enter staging regions: {:?}", instruction),
&format!(
"Enter staging regions: {:?}",
enter_staging_regions
.iter()
.map(|r| r.region_id)
.collect::<Vec<_>>()
),
&format!("Metasrv@{}", server_addr),
&format!("Datanode-{}@{}", peer.id, peer.addr),
common_time::util::current_time_millis(),
@@ -328,7 +327,6 @@ mod tests {
use std::assert_matches::assert_matches;
use std::time::Duration;
use common_meta::instruction::Instruction;
use common_meta::peer::Peer;
use common_meta::rpc::router::{Region, RegionRoute};
use store_api::storage::RegionId;
@@ -376,7 +374,7 @@ mod tests {
},
],
central_region: RegionId::new(table_id, 1),
central_region_datanode_id: 1,
central_region_datanode: Peer::empty(1),
};
let targets = test_targets();
let instructions =
@@ -384,12 +382,7 @@ mod tests {
.unwrap();
assert_eq!(instructions.len(), 2);
let instruction_1 = instructions
.get(&Peer::empty(1))
.unwrap()
.clone()
.into_enter_staging_regions()
.unwrap();
let instruction_1 = instructions.get(&Peer::empty(1)).unwrap().clone();
assert_eq!(
instruction_1,
vec![common_meta::instruction::EnterStagingRegion {
@@ -397,12 +390,7 @@ mod tests {
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}]
);
let instruction_2 = instructions
.get(&Peer::empty(2))
.unwrap()
.clone()
.into_enter_staging_regions()
.unwrap();
let instruction_2 = instructions.get(&Peer::empty(2)).unwrap().clone();
assert_eq!(
instruction_2,
vec![common_meta::instruction::EnterStagingRegion {
@@ -417,18 +405,17 @@ mod tests {
let env = TestingEnv::new();
let server_addr = "localhost";
let peer = Peer::empty(1);
let instruction =
Instruction::EnterStagingRegions(vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}]);
let enter_staging_regions = vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}];
let timeout = Duration::from_secs(10);
let err = EnterStagingRegion::enter_staging_region(
env.mailbox_ctx.mailbox(),
server_addr,
&peer,
&instruction,
&enter_staging_regions,
timeout,
)
.await
@@ -447,11 +434,10 @@ mod tests {
.await;
let server_addr = "localhost";
let peer = Peer::empty(1);
let instruction =
Instruction::EnterStagingRegions(vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}]);
let enter_staging_regions = vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}];
let timeout = Duration::from_secs(10);
// Sends a timeout error.
@@ -463,7 +449,7 @@ mod tests {
env.mailbox_ctx.mailbox(),
server_addr,
&peer,
&instruction,
&enter_staging_regions,
timeout,
)
.await
@@ -479,11 +465,10 @@ mod tests {
let server_addr = "localhost";
let peer = Peer::empty(1);
let instruction =
Instruction::EnterStagingRegions(vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}]);
let enter_staging_regions = vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}];
let timeout = Duration::from_secs(10);
env.mailbox_ctx
@@ -498,7 +483,7 @@ mod tests {
env.mailbox_ctx.mailbox(),
server_addr,
&peer,
&instruction,
&enter_staging_regions,
timeout,
)
.await
@@ -516,11 +501,10 @@ mod tests {
.await;
let server_addr = "localhost";
let peer = Peer::empty(1);
let instruction =
Instruction::EnterStagingRegions(vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}]);
let enter_staging_regions = vec![common_meta::instruction::EnterStagingRegion {
region_id: RegionId::new(1024, 1),
partition_expr: range_expr("x", 0, 10).as_json_str().unwrap(),
}];
let timeout = Duration::from_secs(10);
// Sends a failed reply.
@@ -538,7 +522,7 @@ mod tests {
env.mailbox_ctx.mailbox(),
server_addr,
&peer,
&instruction,
&enter_staging_regions,
timeout,
)
.await
@@ -565,7 +549,7 @@ mod tests {
env.mailbox_ctx.mailbox(),
server_addr,
&peer,
&instruction,
&enter_staging_regions,
timeout,
)
.await
@@ -596,7 +580,7 @@ mod tests {
},
],
central_region: RegionId::new(table_id, 1),
central_region_datanode_id: 1,
central_region_datanode: Peer::empty(1),
}
}

@@ -0,0 +1,222 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use std::collections::HashMap;
use std::time::{Duration, Instant};
use api::v1::meta::MailboxMessage;
use common_meta::instruction::{Instruction, InstructionReply, RemapManifestReply};
use common_meta::peer::Peer;
use common_procedure::{Context as ProcedureContext, Status};
use common_telemetry::{info, warn};
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure};
use store_api::storage::RegionId;
use crate::error::{self, Result};
use crate::handler::HeartbeatMailbox;
use crate::procedure::repartition::group::apply_staging_manifest::ApplyStagingManifest;
use crate::procedure::repartition::group::{Context, State};
use crate::procedure::repartition::plan::RegionDescriptor;
use crate::service::mailbox::{Channel, MailboxRef};
#[derive(Debug, Serialize, Deserialize)]
pub(crate) struct RemapManifest;
#[async_trait::async_trait]
#[typetag::serde]
impl State for RemapManifest {
async fn next(
&mut self,
ctx: &mut Context,
_procedure_ctx: &ProcedureContext,
) -> Result<(Box<dyn State>, Status)> {
let prepare_result = ctx.persistent_ctx.group_prepare_result.as_ref().unwrap();
let remap = Self::build_remap_manifest_instructions(
&ctx.persistent_ctx.sources,
&ctx.persistent_ctx.targets,
&ctx.persistent_ctx.region_mapping,
prepare_result.central_region,
)?;
let operation_timeout =
ctx.next_operation_timeout()
.context(error::ExceededDeadlineSnafu {
operation: "Remap manifests",
})?;
let manifest_paths = Self::remap_manifests(
&ctx.mailbox,
&ctx.server_addr,
&prepare_result.central_region_datanode,
&remap,
operation_timeout,
)
.await?;
let table_id = ctx.persistent_ctx.table_id;
let group_id = ctx.persistent_ctx.group_id;
if manifest_paths.len() != ctx.persistent_ctx.targets.len() {
warn!(
"Mismatch in manifest paths count: expected {}, got {}. This occurred during remapping manifests for group {} and table {}.",
ctx.persistent_ctx.targets.len(),
manifest_paths.len(),
group_id,
table_id
);
}
ctx.persistent_ctx.staging_manifest_paths = manifest_paths;
Ok((Box::new(ApplyStagingManifest), Status::executing(true)))
}
fn as_any(&self) -> &dyn Any {
self
}
}
impl RemapManifest {
fn build_remap_manifest_instructions(
source_regions: &[RegionDescriptor],
target_regions: &[RegionDescriptor],
region_mapping: &HashMap<RegionId, Vec<RegionId>>,
central_region_id: RegionId,
) -> Result<common_meta::instruction::RemapManifest> {
let new_partition_exprs = target_regions
.iter()
.map(|r| {
Ok((
r.region_id,
r.partition_expr
.as_json_str()
.context(error::SerializePartitionExprSnafu)?,
))
})
.collect::<Result<HashMap<RegionId, String>>>()?;
Ok(common_meta::instruction::RemapManifest {
region_id: central_region_id,
input_regions: source_regions.iter().map(|r| r.region_id).collect(),
region_mapping: region_mapping.clone(),
new_partition_exprs,
})
}
async fn remap_manifests(
mailbox: &MailboxRef,
server_addr: &str,
peer: &Peer,
remap: &common_meta::instruction::RemapManifest,
timeout: Duration,
) -> Result<HashMap<RegionId, String>> {
let ch = Channel::Datanode(peer.id);
let instruction = Instruction::RemapManifest(remap.clone());
let message = MailboxMessage::json_message(
&format!(
"Remap manifests, central region: {}, input regions: {:?}",
remap.region_id, remap.input_regions
),
&format!("Metasrv@{}", server_addr),
&format!("Datanode-{}@{}", peer.id, peer.addr),
common_time::util::current_time_millis(),
&instruction,
)
.with_context(|_| error::SerializeToJsonSnafu {
input: instruction.to_string(),
})?;
let now = Instant::now();
let receiver = mailbox.send(&ch, message, timeout).await;
let receiver = match receiver {
Ok(receiver) => receiver,
Err(error::Error::PusherNotFound { .. }) => error::RetryLaterSnafu {
reason: format!(
"Pusher not found for remap manifests on datanode {:?}, elapsed: {:?}",
peer,
now.elapsed()
),
}
.fail()?,
Err(err) => {
return Err(err);
}
};
match receiver.await {
Ok(msg) => {
let reply = HeartbeatMailbox::json_reply(&msg)?;
info!(
"Received remap manifest reply: {:?}, elapsed: {:?}",
reply,
now.elapsed()
);
let InstructionReply::RemapManifest(reply) = reply else {
return error::UnexpectedInstructionReplySnafu {
mailbox_message: msg.to_string(),
reason: "expect remap manifest reply",
}
.fail();
};
Self::handle_remap_manifest_reply(remap.region_id, reply, &now, peer)
}
Err(error::Error::MailboxTimeout { .. }) => {
let reason = format!(
"Mailbox received timeout for remap manifests on datanode {:?}, elapsed: {:?}",
peer,
now.elapsed()
);
error::RetryLaterSnafu { reason }.fail()
}
Err(err) => Err(err),
}
}
fn handle_remap_manifest_reply(
region_id: RegionId,
RemapManifestReply {
exists,
manifest_paths,
error,
}: RemapManifestReply,
now: &Instant,
peer: &Peer,
) -> Result<HashMap<RegionId, String>> {
ensure!(
exists,
error::UnexpectedSnafu {
violated: format!(
"Region {} doesn't exist on datanode {:?}, elapsed: {:?}",
region_id,
peer,
now.elapsed()
)
}
);
if error.is_some() {
return error::RetryLaterSnafu {
reason: format!(
"Failed to remap manifest on datanode {:?}, error: {:?}, elapsed: {:?}",
peer,
error,
now.elapsed()
),
}
.fail();
}
Ok(manifest_paths)
}
}
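
Note on `build_remap_manifest_instructions` above: it collects an iterator of `Result<(RegionId, String)>` into a `Result<HashMap<..>>`, so the first serialization failure aborts the whole map. The block below is a small sketch of that idiom under simplified assumptions. `build_partition_exprs` is a hypothetical helper, region ids are plain `u64`, and an empty string stands in for a partition expression that fails to serialize.

```rust
use std::collections::HashMap;

/// Hypothetical helper: builds a region-id -> expression map, failing on the
/// first entry that cannot be "serialized" (here: an empty expression).
pub fn build_partition_exprs(targets: &[(u64, &str)]) -> Result<HashMap<u64, String>, String> {
    targets
        .iter()
        .map(|(region_id, expr)| {
            if expr.is_empty() {
                // Stand-in for a serialization failure in the real code.
                Err(format!("cannot serialize partition expr for region {region_id}"))
            } else {
                Ok((*region_id, expr.to_string()))
            }
        })
        // Collecting into `Result<HashMap<_, _>, _>` stops at the first `Err`.
        .collect()
}
```

The same short-circuit behavior is what the `?` after `collect::<Result<HashMap<RegionId, String>>>()` relies on in the diff.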

@@ -0,0 +1,40 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use common_procedure::{Context as ProcedureContext, Status};
use serde::{Deserialize, Serialize};
use crate::error::Result;
use crate::procedure::repartition::group::{Context, State};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RepartitionEnd;
#[async_trait::async_trait]
#[typetag::serde]
impl State for RepartitionEnd {
async fn next(
&mut self,
_ctx: &mut Context,
_procedure_ctx: &ProcedureContext,
) -> Result<(Box<dyn State>, Status)> {
Ok((Box::new(RepartitionEnd), Status::done()))
}
fn as_any(&self) -> &dyn Any {
self
}
}
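
Note on the state transitions touched in this diff: `EnterStagingRegion` now returns `RemapManifest`, `RemapManifest` returns `ApplyStagingManifest`, and `RepartitionEnd` returns itself with `Status::done()`. The block below is a rough sketch of how a driver could walk such a chain. `Step`, `next`, and `run` are hypothetical names, the real states are `Box<dyn State>` trait objects with async `next`, and the `ApplyStagingManifest -> RepartitionEnd` edge is an assumption because that transition is not visible in this part of the diff.

```rust
/// Hypothetical enum standing in for the boxed `State` trait objects.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    EnterStagingRegion,
    RemapManifest,
    ApplyStagingManifest,
    RepartitionEnd,
}

/// Returns the next step and whether the procedure is done.
pub fn next(step: Step) -> (Step, bool) {
    match step {
        Step::EnterStagingRegion => (Step::RemapManifest, false),
        Step::RemapManifest => (Step::ApplyStagingManifest, false),
        // Assumption: the successor of ApplyStagingManifest is not shown here.
        Step::ApplyStagingManifest => (Step::RepartitionEnd, false),
        // Terminal state: returns itself and reports completion.
        Step::RepartitionEnd => (Step::RepartitionEnd, true),
    }
}

/// Drives the chain from `start` until a step reports completion,
/// returning every step visited in order.
pub fn run(start: Step) -> Vec<Step> {
    let mut visited = vec![start];
    let mut current = start;
    loop {
        let (next_step, done) = next(current);
        if done {
            break;
        }
        visited.push(next_step);
        current = next_step;
    }
    visited
}
```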

@@ -22,6 +22,7 @@ use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure};
use crate::error::{self, Result};
use crate::procedure::repartition::group::update_metadata::UpdateMetadata;
use crate::procedure::repartition::group::{
Context, GroupId, GroupPrepareResult, State, region_routes,
};
@@ -109,7 +110,7 @@ impl RepartitionStart {
);
}
let central_region = sources[0].region_id;
let central_region_datanode_id = source_region_routes[0]
let central_region_datanode = source_region_routes[0]
.leader_peer
.as_ref()
.context(error::UnexpectedSnafu {
@@ -118,20 +119,22 @@ impl RepartitionStart {
central_region
),
})?
.id;
.clone();
Ok(GroupPrepareResult {
source_routes: source_region_routes,
target_routes: target_region_routes,
central_region,
central_region_datanode_id,
central_region_datanode,
})
}
#[allow(dead_code)]
fn next_state() -> (Box<dyn State>, Status) {
// TODO(weny): change it later.
(Box::new(RepartitionStart), Status::executing(true))
(
Box::new(UpdateMetadata::ApplyStaging),
Status::executing(true),
)
}
}

Some files were not shown because too many files have changed in this diff.