Compare commits

...

17 Commits

Author SHA1 Message Date
Ruihang Xia
1b7ab2957b feat: cache logical region's metadata
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-12 16:16:25 +08:00
Ning Sun
aaa9b32908 feat: add more h3 functions (#4770)
* feat: add more h3 grid functions

* feat: add more traversal functions

* refactor: update some function definitions

* style: format

* refactor: avoid creating slice in nested loop

* feat: ensure column number and length

* refactor: fix lint warnings

* refactor: merge main

* Apply suggestions from code review

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* Update src/common/function/src/scalars/geo/h3.rs

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* style: format

---------

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2024-10-11 17:57:54 +00:00
Weny Xu
4bb1f4f184 feat: introduce LeadershipChangeNotifier and LeadershipChangeListener (#4817)
* feat: introduce `LeadershipChangeNotifier`

* refactor: use `LeadershipChangeNotifier`

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: adjust log styling
2024-10-11 12:48:53 +00:00
Weny Xu
0f907ef99e fix: correct table name formatting (#4819) 2024-10-11 11:32:15 +00:00
Ruihang Xia
a61c0bd1d8 fix: error in admin function is not formatted properly (#4820)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-11 10:02:45 +00:00
Lei, HUANG
7dd0e3ab37 fix: Panic in UNION ALL queries (#4796)
* fix/union_all_panic: Improve MetricCollector by incrementing level and fix underflow issue; add tests for UNION ALL queries

* chore: remove useless documentation

* fix/union_all_panic: Add order by clause to UNION ALL select queries in tests
2024-10-11 08:23:01 +00:00
Yingwen
d168bde226 feat!: move v1/prof API to debug/prof (#4810)
* feat!: move v1/prof to debug/prof

* docs: update readme

* docs: move prof docs to docs dir

* chore: update message

* feat!: remove v1/prof

* docs: update mem prof docs
2024-10-11 04:16:37 +00:00
jeremyhi
4b34f610aa feat: information extension (#4811)
* feat: information extension

* Update manager.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: by comment

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-10-11 03:13:49 +00:00
Weny Xu
695ff1e037 feat: expose RegionMigrationManagerRef (#4812)
* chore: expose `RegionMigrationProcedureTask`

* fix: fix typos

* chore: expose `tracker`
2024-10-11 02:40:51 +00:00
Yohan Wal
288fdc3145 feat: json_path_exists udf (#4807)
* feat: json_path_exists udf

* chore: fix comments

* fix: caution when copy&paste QAQ
2024-10-10 14:15:34 +00:00
discord9
a8ed3db0aa feat: Merge sort Logical plan (#4768)
* feat(WIP): MergeSort

* wip

* feat: MergeSort LogicalPlan

* update sqlness result

* Apply suggestions from code review

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* refactor: per review advice

* refactor: more per review

* chore: per review

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-09 09:37:27 +00:00
Kaifeng Zheng
0dd11f53f5 feat: add json format output for http interface (#4797)
* feat: json output format for http

* feat: add json result test case

* fix: typo and refactor a piece of code

* fix: cargo check

* move affected_rows to top level
2024-10-09 07:11:57 +00:00
Ning Sun
19918928c5 feat: add function to aggregate path into a geojson path (#4798)
* feat: add geojson function to aggregate paths

* test: add sqlness results

* test: add sqlness

* refactor: corrected to aggregation function

* chore: update comments

* fix: make linter happy again

* refactor: rename to remove `geo` from `geojson` function name

The return type is not geojson at all. It's just compatible with geojson's
coordinates part and superset's deckgl path plugin.
2024-10-09 02:38:44 +00:00
shuiyisong
5f0a83b2b1 fix: ts conversion during transform phase (#4790)
* fix: allow ts conversion during transform phase

* chore: replace `unimplemented` with snafu
2024-10-08 17:54:44 +00:00
localhost
71a66d15f7 chore: add json write (#4744)
* chore: add json write

* chore: add test for write json log api

* chore: enhancement of Error Handling

* chore: fix by pr comment

* chore: fix by pr comment

* chore: enhancement of error content and add some doc
2024-10-08 12:11:09 +00:00
Weny Xu
2cdd103874 feat: introduce HeartbeatHandlerGroupBuilderCustomizer (#4803)
* feat: introduce `HeartbeatHandlerGroupBuilderFinalizer`

* chore: rename to `HeartbeatHandlerGroupBuilderCustomizer`
2024-10-08 09:02:06 +00:00
Ning Sun
4dea4cac47 refactor: change sqlness ports to avoid conflict with local instance (#4794) 2024-10-08 07:33:24 +00:00
89 changed files with 2959 additions and 804 deletions

Cargo.lock generated
View File

@@ -8141,6 +8141,7 @@ dependencies = [
"futures",
"greptime-proto",
"itertools 0.10.5",
"jsonb",
"lazy_static",
"moka",
"once_cell",

View File

@@ -9,7 +9,7 @@ cargo build --features=pprof
## HTTP API
Sample at 99 Hertz, for 5 seconds, output report in [protobuf format](https://github.com/google/pprof/blob/master/proto/profile.proto).
```bash
curl -s '0:4000/v1/prof/cpu' > /tmp/pprof.out
curl -s '0:4000/debug/prof/cpu' > /tmp/pprof.out
```
Then you can use `pprof` command with the protobuf file.
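For example, the invocation referenced in the next hunk's context line:
```bash
# print the top CPU consumers from the protobuf dump produced above
go tool pprof -top /tmp/pprof.out
```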
@@ -19,10 +19,10 @@ go tool pprof -top /tmp/pprof.out
Sample at 99 Hertz, for 60 seconds, output report in flamegraph format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
curl -s '0:4000/debug/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
```
Sample at 49 Hertz, for 10 seconds, output report in text format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
curl -s '0:4000/debug/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
```

View File

@@ -12,10 +12,10 @@ brew install jemalloc
sudo apt install libjemalloc-dev
```
### [flamegraph](https://github.com/brendangregg/FlameGraph)
```bash
curl https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl > ./flamegraph.pl
```
### Build GreptimeDB with `mem-prof` feature.
@@ -35,7 +35,7 @@ MALLOC_CONF=prof:true,lg_prof_interval:28 ./target/debug/greptime standalone sta
Dump memory profiling data through HTTP API:
```bash
curl localhost:4000/v1/prof/mem > greptime.hprof
curl localhost:4000/debug/prof/mem > greptime.hprof
```
You can periodically dump profiling data and compare them to find the delta memory usage.
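A minimal sketch of that workflow (the file names and the sleep interval are placeholders; the endpoint and the `jeprof`/`flamegraph.pl` pipeline are the ones shown in this diff):
```bash
# take a baseline dump, wait a while, then take a second dump
curl localhost:4000/debug/prof/mem > base.hprof
sleep 300
curl localhost:4000/debug/prof/mem > later.hprof

# render only the delta between the two dumps as a flamegraph
jeprof <path_to_greptime_binary> --base base.hprof later.hprof --collapse | ./flamegraph.pl > delta.svg
```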
@@ -45,6 +45,9 @@ You can periodically dump profiling data and compare them to find the delta memo
To create flamegraph according to dumped profiling data:
```bash
jeprof --svg <path_to_greptimedb_binary> --base=<baseline_prof> <profile_data> > output.svg
```
sudo apt install -y libjemalloc-dev
jeprof <path_to_greptime_binary> <profile_data> --collapse | ./flamegraph.pl > mem-prof.svg
jeprof <path_to_greptime_binary> --base <baseline_prof> <profile_data> --collapse | ./flamegraph.pl > output.svg
```

View File

@@ -89,9 +89,8 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to get procedure client in {mode} mode"))]
GetProcedureClient {
mode: String,
#[snafu(display("Failed to get information extension client"))]
GetInformationExtension {
#[snafu(implicit)]
location: Location,
},
@@ -301,7 +300,7 @@ impl ErrorExt for Error {
| Error::CacheNotFound { .. }
| Error::CastManager { .. }
| Error::Json { .. }
| Error::GetProcedureClient { .. }
| Error::GetInformationExtension { .. }
| Error::ProcedureIdNotFound { .. } => StatusCode::Unexpected,
Error::ViewPlanColumnsChanged { .. } => StatusCode::InvalidArguments,

View File

@@ -21,7 +21,6 @@ use common_catalog::consts::{
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, NUMBERS_TABLE_ID,
PG_CATALOG_NAME,
};
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cache::{LayeredCacheRegistryRef, ViewInfoCacheRef};
use common_meta::key::catalog_name::CatalogNameKey;
@@ -34,7 +33,6 @@ use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureManagerRef;
use futures_util::stream::BoxStream;
use futures_util::{StreamExt, TryStreamExt};
use meta_client::client::MetaClient;
use moka::sync::Cache;
use partition::manager::{PartitionRuleManager, PartitionRuleManagerRef};
use session::context::{Channel, QueryContext};
@@ -50,7 +48,7 @@ use crate::error::{
CacheNotFoundSnafu, GetTableCacheSnafu, InvalidTableInfoInCatalogSnafu, ListCatalogsSnafu,
ListSchemasSnafu, ListTablesSnafu, Result, TableMetadataManagerSnafu,
};
use crate::information_schema::InformationSchemaProvider;
use crate::information_schema::{InformationExtensionRef, InformationSchemaProvider};
use crate::kvbackend::TableCacheRef;
use crate::system_schema::pg_catalog::PGCatalogProvider;
use crate::system_schema::SystemSchemaProvider;
@@ -63,9 +61,8 @@ use crate::CatalogManager;
/// comes from `SystemCatalog`, which is static and read-only.
#[derive(Clone)]
pub struct KvBackendCatalogManager {
mode: Mode,
/// Only available in `Distributed` mode.
meta_client: Option<Arc<MetaClient>>,
/// Provides the extension methods for the `information_schema` tables
information_extension: InformationExtensionRef,
/// Manages partition rules.
partition_manager: PartitionRuleManagerRef,
/// Manages table metadata.
@@ -82,15 +79,13 @@ const CATALOG_CACHE_MAX_CAPACITY: u64 = 128;
impl KvBackendCatalogManager {
pub fn new(
mode: Mode,
meta_client: Option<Arc<MetaClient>>,
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
procedure_manager: Option<ProcedureManagerRef>,
) -> Arc<Self> {
Arc::new_cyclic(|me| Self {
mode,
meta_client,
information_extension,
partition_manager: Arc::new(PartitionRuleManager::new(
backend.clone(),
cache_registry
@@ -118,20 +113,15 @@ impl KvBackendCatalogManager {
})
}
/// Returns the server running mode.
pub fn running_mode(&self) -> &Mode {
&self.mode
}
pub fn view_info_cache(&self) -> Result<ViewInfoCacheRef> {
self.cache_registry.get().context(CacheNotFoundSnafu {
name: "view_info_cache",
})
}
/// Returns the `[MetaClient]`.
pub fn meta_client(&self) -> Option<Arc<MetaClient>> {
self.meta_client.clone()
/// Returns the [`InformationExtension`].
pub fn information_extension(&self) -> InformationExtensionRef {
self.information_extension.clone()
}
pub fn partition_manager(&self) -> PartitionRuleManagerRef {

View File

@@ -32,7 +32,11 @@ use std::collections::HashMap;
use std::sync::{Arc, Weak};
use common_catalog::consts::{self, DEFAULT_CATALOG_NAME, INFORMATION_SCHEMA_NAME};
use common_error::ext::ErrorExt;
use common_meta::cluster::NodeInfo;
use common_meta::datanode::RegionStat;
use common_meta::key::flow::FlowMetadataManager;
use common_procedure::ProcedureInfo;
use common_recordbatch::SendableRecordBatchStream;
use datatypes::schema::SchemaRef;
use lazy_static::lazy_static;
@@ -45,7 +49,7 @@ use views::InformationSchemaViews;
use self::columns::InformationSchemaColumns;
use super::{SystemSchemaProviderInner, SystemTable, SystemTableRef};
use crate::error::Result;
use crate::error::{Error, Result};
use crate::system_schema::information_schema::cluster_info::InformationSchemaClusterInfo;
use crate::system_schema::information_schema::flows::InformationSchemaFlows;
use crate::system_schema::information_schema::information_memory_table::get_schema_columns;
@@ -318,3 +322,39 @@ where
InformationTable::to_stream(self, request)
}
}
pub type InformationExtensionRef = Arc<dyn InformationExtension<Error = Error> + Send + Sync>;
/// The `InformationExtension` trait provides the extension methods for the `information_schema` tables.
#[async_trait::async_trait]
pub trait InformationExtension {
type Error: ErrorExt;
/// Gets the nodes information.
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error>;
/// Gets the procedures information.
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error>;
/// Gets the region statistics.
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error>;
}
pub struct NoopInformationExtension;
#[async_trait::async_trait]
impl InformationExtension for NoopInformationExtension {
type Error = Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
Ok(vec![])
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
Ok(vec![])
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
Ok(vec![])
}
}

View File

@@ -17,13 +17,10 @@ use std::time::Duration;
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_CLUSTER_INFO_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo, NodeStatus};
use common_meta::peer::Peer;
use common_meta::cluster::NodeInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_telemetry::warn;
use common_time::timestamp::Timestamp;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
@@ -40,7 +37,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::CLUSTER_INFO;
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, ListNodesSnafu, Result};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -70,7 +67,6 @@ const INIT_CAPACITY: usize = 42;
pub(super) struct InformationSchemaClusterInfo {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
}
impl InformationSchemaClusterInfo {
@@ -78,7 +74,6 @@ impl InformationSchemaClusterInfo {
Self {
schema: Self::schema(),
catalog_manager,
start_time_ms: common_time::util::current_time_millis() as u64,
}
}
@@ -100,11 +95,7 @@ impl InformationSchemaClusterInfo {
}
fn builder(&self) -> InformationSchemaClusterInfoBuilder {
InformationSchemaClusterInfoBuilder::new(
self.schema.clone(),
self.catalog_manager.clone(),
self.start_time_ms,
)
InformationSchemaClusterInfoBuilder::new(self.schema.clone(), self.catalog_manager.clone())
}
}
@@ -144,7 +135,6 @@ impl InformationTable for InformationSchemaClusterInfo {
struct InformationSchemaClusterInfoBuilder {
schema: SchemaRef,
start_time_ms: u64,
catalog_manager: Weak<dyn CatalogManager>,
peer_ids: Int64VectorBuilder,
@@ -158,11 +148,7 @@ struct InformationSchemaClusterInfoBuilder {
}
impl InformationSchemaClusterInfoBuilder {
fn new(
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
) -> Self {
fn new(schema: SchemaRef, catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema,
catalog_manager,
@@ -174,56 +160,17 @@ impl InformationSchemaClusterInfoBuilder {
start_times: TimestampMillisecondVectorBuilder::with_capacity(INIT_CAPACITY),
uptimes: StringVectorBuilder::with_capacity(INIT_CAPACITY),
active_times: StringVectorBuilder::with_capacity(INIT_CAPACITY),
start_time_ms,
}
}
/// Construct the `information_schema.cluster_info` virtual table
async fn make_cluster_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
let build_info = common_version::build_info();
self.add_node_info(
&predicates,
NodeInfo {
// For the standalone:
// - id always 0
// - empty string for peer_addr
peer: Peer {
id: 0,
addr: "".to_string(),
},
last_activity_ts: -1,
status: NodeStatus::Standalone,
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// Use `self.start_time_ms` instead.
// It's not precise but enough.
start_time_ms: self.start_time_ms,
},
);
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let node_infos = meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(ListNodesSnafu)?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
} else {
warn!("Could not find meta client in distributed mode.");
}
}
let information_extension = utils::information_extension(&self.catalog_manager)?;
let node_infos = information_extension.nodes().await?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
self.finish()
}

View File

@@ -14,14 +14,10 @@
use std::sync::{Arc, Weak};
use api::v1::meta::{ProcedureMeta, ProcedureStatus};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_PROCEDURE_INFO_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::ddl::{ExecutorContext, ProcedureExecutor};
use common_meta::rpc::procedure;
use common_procedure::{ProcedureInfo, ProcedureState};
use common_procedure::ProcedureInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_time::timestamp::Timestamp;
@@ -38,10 +34,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::PROCEDURE_INFO;
use crate::error::{
ConvertProtoDataSnafu, CreateRecordBatchSnafu, GetProcedureClientSnafu, InternalSnafu,
ListProceduresSnafu, ProcedureIdNotFoundSnafu, Result,
};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -167,45 +160,11 @@ impl InformationSchemaProcedureInfoBuilder {
/// Construct the `information_schema.procedure_info` virtual table
async fn make_procedure_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
if let Some(procedure_manager) = utils::procedure_manager(&self.catalog_manager)? {
let procedures = procedure_manager
.list_procedures()
.await
.map_err(BoxedError::new)
.context(ListProceduresSnafu)?;
for procedure in procedures {
self.add_procedure(
&predicates,
procedure.state.as_str_name().to_string(),
procedure,
);
}
} else {
return GetProcedureClientSnafu { mode: "standalone" }.fail();
}
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let procedures = meta_client
.list_procedures(&ExecutorContext::default())
.await
.map_err(BoxedError::new)
.context(ListProceduresSnafu)?;
for procedure in procedures.procedures {
self.add_procedure_info(&predicates, procedure)?;
}
} else {
return GetProcedureClientSnafu {
mode: "distributed",
}
.fail();
}
}
};
let information_extension = utils::information_extension(&self.catalog_manager)?;
let procedures = information_extension.procedures().await?;
for (status, procedure_info) in procedures {
self.add_procedure(&predicates, status, procedure_info);
}
self.finish()
}
@@ -247,34 +206,6 @@ impl InformationSchemaProcedureInfoBuilder {
self.lock_keys.push(Some(&lock_keys));
}
fn add_procedure_info(
&mut self,
predicates: &Predicates,
procedure: ProcedureMeta,
) -> Result<()> {
let pid = match procedure.id {
Some(pid) => pid,
None => return ProcedureIdNotFoundSnafu {}.fail(),
};
let pid = procedure::pb_pid_to_pid(&pid)
.map_err(BoxedError::new)
.context(ConvertProtoDataSnafu)?;
let status = ProcedureStatus::try_from(procedure.status)
.map(|v| v.as_str_name())
.unwrap_or("Unknown")
.to_string();
let procedure_info = ProcedureInfo {
id: pid,
type_name: procedure.type_name,
start_time_ms: procedure.start_time_ms,
end_time_ms: procedure.end_time_ms,
state: ProcedureState::Running,
lock_keys: procedure.lock_keys,
};
self.add_procedure(predicates, status, procedure_info);
Ok(())
}
fn finish(&mut self) -> Result<RecordBatch> {
let columns: Vec<VectorRef> = vec![
Arc::new(self.procedure_ids.finish()),

View File

@@ -16,13 +16,10 @@ use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_REGION_STATISTICS_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cluster::ClusterInfo;
use common_meta::datanode::RegionStat;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{DfSendableRecordBatchStream, RecordBatch, SendableRecordBatchStream};
use common_telemetry::tracing::warn;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
use datafusion::physical_plan::streaming::PartitionStream as DfPartitionStream;
@@ -34,7 +31,7 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::{InformationTable, REGION_STATISTICS};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, ListRegionStatsSnafu, Result};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::information_schema::Predicates;
use crate::system_schema::utils;
use crate::CatalogManager;
@@ -167,28 +164,11 @@ impl InformationSchemaRegionStatisticsBuilder {
request: Option<ScanRequest>,
) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
// TODO(weny): implement it
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let region_stats = meta_client
.list_region_stats()
.await
.map_err(BoxedError::new)
.context(ListRegionStatsSnafu)?;
for region_stat in region_stats {
self.add_region_statistic(&predicates, region_stat);
}
} else {
warn!("Meta client is not available");
}
}
let information_extension = utils::information_extension(&self.catalog_manager)?;
let region_stats = information_extension.region_stats().await?;
for region_stat in region_stats {
self.add_region_statistic(&predicates, region_stat);
}
self.finish()
}

View File

@@ -12,48 +12,33 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub mod tables;
use std::sync::Weak;
use std::sync::{Arc, Weak};
use common_config::Mode;
use common_meta::key::TableMetadataManagerRef;
use common_procedure::ProcedureManagerRef;
use meta_client::client::MetaClient;
use snafu::OptionExt;
use crate::error::{Result, UpgradeWeakCatalogManagerRefSnafu};
use crate::error::{GetInformationExtensionSnafu, Result, UpgradeWeakCatalogManagerRefSnafu};
use crate::information_schema::InformationExtensionRef;
use crate::kvbackend::KvBackendCatalogManager;
use crate::CatalogManager;
/// Try to get the server running mode from `[CatalogManager]` weak reference.
pub fn running_mode(catalog_manager: &Weak<dyn CatalogManager>) -> Result<Option<Mode>> {
pub mod tables;
/// Try to get the `[InformationExtension]` from `[CatalogManager]` weak reference.
pub fn information_extension(
catalog_manager: &Weak<dyn CatalogManager>,
) -> Result<InformationExtensionRef> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
Ok(catalog_manager
let information_extension = catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.map(|manager| manager.running_mode())
.copied())
}
.map(|manager| manager.information_extension())
.context(GetInformationExtensionSnafu)?;
/// Try to get the `[MetaClient]` from `[CatalogManager]` weak reference.
pub fn meta_client(catalog_manager: &Weak<dyn CatalogManager>) -> Result<Option<Arc<MetaClient>>> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let meta_client = match catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
{
None => None,
Some(manager) => manager.meta_client(),
};
Ok(meta_client)
Ok(information_extension)
}
/// Try to get the `[TableMetadataManagerRef]` from `[CatalogManager]` weak reference.
@@ -69,17 +54,3 @@ pub fn table_meta_manager(
.downcast_ref::<KvBackendCatalogManager>()
.map(|manager| manager.table_metadata_manager_ref().clone()))
}
/// Try to get the `[ProcedureManagerRef]` from `[CatalogManager]` weak reference.
pub fn procedure_manager(
catalog_manager: &Weak<dyn CatalogManager>,
) -> Result<Option<ProcedureManagerRef>> {
let catalog_manager = catalog_manager
.upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?;
Ok(catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.and_then(|manager| manager.procedure_manager()))
}

View File

@@ -259,7 +259,6 @@ mod tests {
use arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use common_config::Mode;
use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
use common_meta::key::TableMetadataManager;
use common_meta::kv_backend::memory::MemoryKvBackend;
@@ -269,6 +268,8 @@ mod tests {
use datafusion::logical_expr::builder::LogicalTableSource;
use datafusion::logical_expr::{col, lit, LogicalPlan, LogicalPlanBuilder};
use crate::information_schema::NoopInformationExtension;
struct MockDecoder;
impl MockDecoder {
pub fn arc() -> Arc<Self> {
@@ -323,8 +324,7 @@ mod tests {
);
let catalog_manager = KvBackendCatalogManager::new(
Mode::Standalone,
None,
Arc::new(NoopInformationExtension),
backend.clone(),
layered_cache_registry,
None,

View File

@@ -46,12 +46,12 @@ use substrait::{DFLogicalSubstraitConvertor, SubstraitPlan};
use crate::cli::cmd::ReplCommand;
use crate::cli::helper::RustylineHelper;
use crate::cli::AttachCommand;
use crate::error;
use crate::error::{
CollectRecordBatchesSnafu, ParseSqlSnafu, PlanStatementSnafu, PrettyPrintRecordBatchesSnafu,
ReadlineSnafu, ReplCreationSnafu, RequestDatabaseSnafu, Result, StartMetaClientSnafu,
SubstraitEncodeLogicalPlanSnafu,
};
use crate::{error, DistributedInformationExtension};
/// Captures the state of the repl, gathers commands and executes them one by one
pub struct Repl {
@@ -275,9 +275,9 @@ async fn create_query_engine(meta_addr: &str) -> Result<DatafusionQueryEngine> {
.build(),
);
let information_extension = Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
Mode::Distributed,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry,
None,

View File

@@ -41,7 +41,7 @@ use crate::error::{
MissingConfigSnafu, Result, ShutdownFlownodeSnafu, StartFlownodeSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
use crate::{log_versions, App, DistributedInformationExtension};
pub const APP_NAME: &str = "greptime-flownode";
@@ -269,9 +269,10 @@ impl StartCommand {
.build(),
);
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
opts.mode,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,

View File

@@ -38,7 +38,6 @@ use frontend::server::Services;
use meta_client::{MetaClientOptions, MetaClientType};
use query::stats::StatementStatistics;
use servers::tls::{TlsMode, TlsOption};
use servers::Mode;
use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard;
@@ -47,7 +46,7 @@ use crate::error::{
Result, StartFrontendSnafu,
};
use crate::options::{GlobalOptions, GreptimeOptions};
use crate::{log_versions, App};
use crate::{log_versions, App, DistributedInformationExtension};
type FrontendOptions = GreptimeOptions<frontend::frontend::FrontendOptions>;
@@ -316,9 +315,10 @@ impl StartCommand {
.build(),
);
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
Mode::Distributed,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
layered_cache_registry.clone(),
None,

View File

@@ -15,7 +15,17 @@
#![feature(assert_matches, let_chains)]
use async_trait::async_trait;
use catalog::information_schema::InformationExtension;
use client::api::v1::meta::ProcedureStatus;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo};
use common_meta::datanode::RegionStat;
use common_meta::ddl::{ExecutorContext, ProcedureExecutor};
use common_meta::rpc::procedure;
use common_procedure::{ProcedureInfo, ProcedureState};
use common_telemetry::{error, info};
use meta_client::MetaClientRef;
use snafu::ResultExt;
use crate::error::Result;
@@ -94,3 +104,69 @@ fn log_env_flags() {
info!("argument: {}", argument);
}
}
pub struct DistributedInformationExtension {
meta_client: MetaClientRef,
}
impl DistributedInformationExtension {
pub fn new(meta_client: MetaClientRef) -> Self {
Self { meta_client }
}
}
#[async_trait::async_trait]
impl InformationExtension for DistributedInformationExtension {
type Error = catalog::error::Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
self.meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(catalog::error::ListNodesSnafu)
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
let procedures = self
.meta_client
.list_procedures(&ExecutorContext::default())
.await
.map_err(BoxedError::new)
.context(catalog::error::ListProceduresSnafu)?
.procedures;
let mut result = Vec::with_capacity(procedures.len());
for procedure in procedures {
let pid = match procedure.id {
Some(pid) => pid,
None => return catalog::error::ProcedureIdNotFoundSnafu {}.fail(),
};
let pid = procedure::pb_pid_to_pid(&pid)
.map_err(BoxedError::new)
.context(catalog::error::ConvertProtoDataSnafu)?;
let status = ProcedureStatus::try_from(procedure.status)
.map(|v| v.as_str_name())
.unwrap_or("Unknown")
.to_string();
let procedure_info = ProcedureInfo {
id: pid,
type_name: procedure.type_name,
start_time_ms: procedure.start_time_ms,
end_time_ms: procedure.end_time_ms,
state: ProcedureState::Running,
lock_keys: procedure.lock_keys,
};
result.push((status, procedure_info));
}
Ok(result)
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
self.meta_client
.list_region_stats()
.await
.map_err(BoxedError::new)
.context(catalog::error::ListRegionStatsSnafu)
}
}

View File

@@ -17,14 +17,18 @@ use std::{fs, path};
use async_trait::async_trait;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_schema::InformationExtension;
use catalog::kvbackend::KvBackendCatalogManager;
use clap::Parser;
use client::api::v1::meta::RegionRole;
use common_base::Plugins;
use common_catalog::consts::{MIN_USER_FLOW_ID, MIN_USER_TABLE_ID};
use common_config::{metadata_store_dir, Configurable, KvBackendConfig};
use common_error::ext::BoxedError;
use common_meta::cache::LayeredCacheRegistryBuilder;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::cluster::{NodeInfo, NodeStatus};
use common_meta::datanode::RegionStat;
use common_meta::ddl::flow_meta::{FlowMetadataAllocator, FlowMetadataAllocatorRef};
use common_meta::ddl::table_meta::{TableMetadataAllocator, TableMetadataAllocatorRef};
use common_meta::ddl::{DdlContext, NoopRegionFailureDetectorControl, ProcedureExecutorRef};
@@ -33,10 +37,11 @@ use common_meta::key::flow::{FlowMetadataManager, FlowMetadataManagerRef};
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
use common_meta::kv_backend::KvBackendRef;
use common_meta::node_manager::NodeManagerRef;
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeper;
use common_meta::sequence::SequenceBuilder;
use common_meta::wal_options_allocator::{WalOptionsAllocator, WalOptionsAllocatorRef};
use common_procedure::ProcedureManagerRef;
use common_procedure::{ProcedureInfo, ProcedureManagerRef};
use common_telemetry::info;
use common_telemetry::logging::{LoggingOptions, TracingOptions};
use common_time::timezone::set_default_timezone;
@@ -44,6 +49,7 @@ use common_version::{short_version, version};
use common_wal::config::DatanodeWalConfig;
use datanode::config::{DatanodeOptions, ProcedureConfig, RegionEngineConfig, StorageConfig};
use datanode::datanode::{Datanode, DatanodeBuilder};
use datanode::region_server::RegionServer;
use file_engine::config::EngineConfig as FileEngineConfig;
use flow::{FlowWorkerManager, FlownodeBuilder, FrontendInvoker};
use frontend::frontend::FrontendOptions;
@@ -478,9 +484,18 @@ impl StartCommand {
.build(),
);
let datanode = DatanodeBuilder::new(dn_opts, plugins.clone())
.with_kv_backend(kv_backend.clone())
.build()
.await
.context(StartDatanodeSnafu)?;
let information_extension = Arc::new(StandaloneInformationExtension::new(
datanode.region_server(),
procedure_manager.clone(),
));
let catalog_manager = KvBackendCatalogManager::new(
dn_opts.mode,
None,
information_extension,
kv_backend.clone(),
layered_cache_registry.clone(),
Some(procedure_manager.clone()),
@@ -489,12 +504,6 @@ impl StartCommand {
let table_metadata_manager =
Self::create_table_metadata_manager(kv_backend.clone()).await?;
let datanode = DatanodeBuilder::new(dn_opts, plugins.clone())
.with_kv_backend(kv_backend.clone())
.build()
.await
.context(StartDatanodeSnafu)?;
let flow_metadata_manager = Arc::new(FlowMetadataManager::new(kv_backend.clone()));
let flow_builder = FlownodeBuilder::new(
Default::default(),
@@ -644,6 +653,91 @@ impl StartCommand {
}
}
struct StandaloneInformationExtension {
region_server: RegionServer,
procedure_manager: ProcedureManagerRef,
start_time_ms: u64,
}
impl StandaloneInformationExtension {
pub fn new(region_server: RegionServer, procedure_manager: ProcedureManagerRef) -> Self {
Self {
region_server,
procedure_manager,
start_time_ms: common_time::util::current_time_millis() as u64,
}
}
}
#[async_trait::async_trait]
impl InformationExtension for StandaloneInformationExtension {
type Error = catalog::error::Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
let build_info = common_version::build_info();
let node_info = NodeInfo {
// For the standalone:
// - id always 0
// - empty string for peer_addr
peer: Peer {
id: 0,
addr: "".to_string(),
},
last_activity_ts: -1,
status: NodeStatus::Standalone,
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// Use `self.start_time_ms` instead.
// It's not precise but enough.
start_time_ms: self.start_time_ms,
};
Ok(vec![node_info])
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
self.procedure_manager
.list_procedures()
.await
.map_err(BoxedError::new)
.map(|procedures| {
procedures
.into_iter()
.map(|procedure| {
let status = procedure.state.as_str_name().to_string();
(status, procedure)
})
.collect::<Vec<_>>()
})
.context(catalog::error::ListProceduresSnafu)
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
let stats = self
.region_server
.reportable_regions()
.into_iter()
.map(|stat| {
let region_stat = self
.region_server
.region_statistic(stat.region_id)
.unwrap_or_default();
RegionStat {
id: stat.region_id,
rcus: 0,
wcus: 0,
approximate_bytes: region_stat.estimated_disk_size() as i64,
engine: stat.engine,
role: RegionRole::from(stat.role).into(),
memtable_size: region_stat.memtable_size,
manifest_size: region_stat.manifest_size,
sst_size: region_stat.sst_size,
}
})
.collect::<Vec<_>>();
Ok(stats)
}
}
#[cfg(test)]
mod tests {
use std::default::Default;

View File

@@ -31,6 +31,7 @@ pub use polyval::PolyvalAccumulatorCreator;
pub use scipy_stats_norm_cdf::ScipyStatsNormCdfAccumulatorCreator;
pub use scipy_stats_norm_pdf::ScipyStatsNormPdfAccumulatorCreator;
use super::geo::encoding::JsonPathEncodeFunctionCreator;
use crate::function_registry::FunctionRegistry;
/// A function creates `AggregateFunctionCreator`.
@@ -91,5 +92,7 @@ impl AggregateFunctions {
register_aggr_func!("argmin", 1, ArgminAccumulatorCreator);
register_aggr_func!("scipystatsnormcdf", 2, ScipyStatsNormCdfAccumulatorCreator);
register_aggr_func!("scipystatsnormpdf", 2, ScipyStatsNormPdfAccumulatorCreator);
register_aggr_func!("json_encode_path", 3, JsonPathEncodeFunctionCreator);
}
}

View File

@@ -16,7 +16,10 @@ use std::cmp::Ordering;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Result};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -16,7 +16,10 @@ use std::cmp::Ordering;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Result};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
CreateAccumulatorSnafu, DowncastVectorSnafu, FromScalarValueSnafu, Result,
CreateAccumulatorSnafu, DowncastVectorSnafu, FromScalarValueSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu, Result,
BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -18,8 +18,9 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, InvalidInputColSnafu, InvalidInputStateSnafu, Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -17,8 +17,10 @@ use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
self, BadAccumulatorImplSnafu, CreateAccumulatorSnafu, DowncastVectorSnafu,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, Result,
FromScalarValueSnafu, GenerateFunctionSnafu, InvalidInputColSnafu, InvalidInputStateSnafu,
Result,
};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;

View File

@@ -13,8 +13,10 @@
// limitations under the License.
use std::sync::Arc;
pub(crate) mod encoding;
mod geohash;
mod h3;
mod helpers;
use geohash::{GeohashFunction, GeohashNeighboursFunction};
@@ -27,18 +29,31 @@ impl GeoFunctions {
// geohash
registry.register(Arc::new(GeohashFunction));
registry.register(Arc::new(GeohashNeighboursFunction));
// h3 family
// h3 index
registry.register(Arc::new(h3::H3LatLngToCell));
registry.register(Arc::new(h3::H3LatLngToCellString));
// h3 index inspection
registry.register(Arc::new(h3::H3CellBase));
registry.register(Arc::new(h3::H3CellCenterChild));
registry.register(Arc::new(h3::H3CellCenterLat));
registry.register(Arc::new(h3::H3CellCenterLng));
registry.register(Arc::new(h3::H3CellIsPentagon));
registry.register(Arc::new(h3::H3CellParent));
registry.register(Arc::new(h3::H3CellResolution));
registry.register(Arc::new(h3::H3CellToString));
registry.register(Arc::new(h3::H3IsNeighbour));
registry.register(Arc::new(h3::H3StringToCell));
registry.register(Arc::new(h3::H3CellToString));
registry.register(Arc::new(h3::H3CellCenterLatLng));
registry.register(Arc::new(h3::H3CellResolution));
// h3 hierarchical grid
registry.register(Arc::new(h3::H3CellCenterChild));
registry.register(Arc::new(h3::H3CellParent));
registry.register(Arc::new(h3::H3CellToChildren));
registry.register(Arc::new(h3::H3CellToChildrenSize));
registry.register(Arc::new(h3::H3CellToChildPos));
registry.register(Arc::new(h3::H3ChildPosToCell));
// h3 grid traversal
registry.register(Arc::new(h3::H3GridDisk));
registry.register(Arc::new(h3::H3GridDiskDistances));
registry.register(Arc::new(h3::H3GridDistance));
registry.register(Arc::new(h3::H3GridPathCells));
}
}

View File

@@ -0,0 +1,223 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use common_error::ext::{BoxedError, PlainError};
use common_error::status_code::StatusCode;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{self, InvalidFuncArgsSnafu, InvalidInputStateSnafu, Result};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::AccumulatorCreatorFunction;
use common_time::Timestamp;
use datatypes::prelude::ConcreteDataType;
use datatypes::value::{ListValue, Value};
use datatypes::vectors::VectorRef;
use snafu::{ensure, ResultExt};
use super::helpers::{ensure_columns_len, ensure_columns_n};
/// Accumulator of lat, lng, timestamp tuples
#[derive(Debug)]
pub struct JsonPathAccumulator {
timestamp_type: ConcreteDataType,
lat: Vec<Option<f64>>,
lng: Vec<Option<f64>>,
timestamp: Vec<Option<Timestamp>>,
}
impl JsonPathAccumulator {
fn new(timestamp_type: ConcreteDataType) -> Self {
Self {
lat: Vec::default(),
lng: Vec::default(),
timestamp: Vec::default(),
timestamp_type,
}
}
}
impl Accumulator for JsonPathAccumulator {
fn state(&self) -> Result<Vec<Value>> {
Ok(vec![
Value::List(ListValue::new(
self.lat.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.lng.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.timestamp.iter().map(|i| Value::from(*i)).collect(),
self.timestamp_type.clone(),
)),
])
}
fn update_batch(&mut self, columns: &[VectorRef]) -> Result<()> {
// update batch as in datafusion just provides the accumulator original
// input.
//
// columns is vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::Float64` or similar type. Each item in
// the vector is a row in given dataset.
// - so on so forth for `lng` and `timestamp`
ensure_columns_n!(columns, 3);
let lat = &columns[0];
let lng = &columns[1];
let ts = &columns[2];
let size = lat.len();
for idx in 0..size {
self.lat.push(lat.get(idx).as_f64_lossy());
self.lng.push(lng.get(idx).as_f64_lossy());
self.timestamp.push(ts.get(idx).as_timestamp());
}
Ok(())
}
fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()> {
// merge batch as in datafusion gives state accumulated from the data
// returned from child accumulators' state() call
// In our particular implementation, the data structure is like
//
// states is vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::List`. Each item in the list is all
// coordinates from a child accumulator.
// - so on so forth for `lng` and `timestamp`
ensure_columns_n!(states, 3);
let lat_lists = &states[0];
let lng_lists = &states[1];
let ts_lists = &states[2];
let len = lat_lists.len();
for idx in 0..len {
if let Some(lat_list) = lat_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lat_list.items() {
self.lat.push(v.as_f64_lossy());
}
}
if let Some(lng_list) = lng_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lng_list.items() {
self.lng.push(v.as_f64_lossy());
}
}
if let Some(ts_list) = ts_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in ts_list.items() {
self.timestamp.push(v.as_timestamp());
}
}
}
Ok(())
}
fn evaluate(&self) -> Result<Value> {
let mut work_vec: Vec<(&Option<f64>, &Option<f64>, &Option<Timestamp>)> = self
.lat
.iter()
.zip(self.lng.iter())
.zip(self.timestamp.iter())
.map(|((a, b), c)| (a, b, c))
.collect();
// sort by timestamp, we treat null timestamp as 0
work_vec.sort_unstable_by_key(|tuple| tuple.2.unwrap_or_else(|| Timestamp::new_second(0)));
let result = serde_json::to_string(
&work_vec
.into_iter()
// note that we transform to lng,lat for geojson compatibility
.map(|(lat, lng, _)| vec![lng, lat])
.collect::<Vec<Vec<&Option<f64>>>>(),
)
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("Serialization failure: {}", e),
StatusCode::EngineExecuteQuery,
))
})
.context(error::ExecuteSnafu)?;
Ok(Value::String(result.into()))
}
}
/// This function accepts rows of lat, lng and timestamp, sorts them by timestamp and
/// encodes them into a geojson-like path.
///
/// Example:
///
/// ```sql
/// SELECT json_encode_path(lat, lon, timestamp) FROM table [group by ...];
/// ```
///
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct JsonPathEncodeFunctionCreator {}
impl AggregateFunctionCreator for JsonPathEncodeFunctionCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
let ts_type = types[2].clone();
Ok(Box::new(JsonPathAccumulator::new(ts_type)))
});
creator
}
fn output_type(&self) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::string_datatype())
}
fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
let input_types = self.input_types()?;
ensure!(input_types.len() == 3, InvalidInputStateSnafu);
let timestamp_type = input_types[2].clone();
Ok(vec![
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(timestamp_type),
])
}
}

View File

@@ -20,18 +20,71 @@ use common_query::error::{self, InvalidFuncArgsSnafu, Result};
use common_query::prelude::{Signature, TypeSignature};
use datafusion::logical_expr::Volatility;
use datatypes::prelude::ConcreteDataType;
use datatypes::scalars::ScalarVectorBuilder;
use datatypes::value::Value;
use datatypes::scalars::{Scalar, ScalarVectorBuilder};
use datatypes::value::{ListValue, Value};
use datatypes::vectors::{
BooleanVectorBuilder, Float64VectorBuilder, MutableVector, StringVectorBuilder,
UInt64VectorBuilder, UInt8VectorBuilder, VectorRef,
BooleanVectorBuilder, Int32VectorBuilder, ListVectorBuilder, MutableVector,
StringVectorBuilder, UInt64VectorBuilder, UInt8VectorBuilder, VectorRef,
};
use derive_more::Display;
use h3o::{CellIndex, LatLng, Resolution};
use once_cell::sync::Lazy;
use snafu::{ensure, ResultExt};
use super::helpers::{ensure_columns_len, ensure_columns_n};
use crate::function::{Function, FunctionContext};
static CELL_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static COORDINATE_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
]
});
static RESOLUTION_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static DISTANCE_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
static POSITION_TYPES: Lazy<Vec<ConcreteDataType>> = Lazy::new(|| {
vec![
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
]
});
/// Function that returns [h3] encoding cellid for a given geospatial coordinate.
///
/// [h3]: https://h3geo.org/
@@ -50,20 +103,8 @@ impl Function for H3LatLngToCell {
fn signature(&self) -> Signature {
let mut signatures = Vec::new();
for coord_type in &[
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
for coord_type in COORDINATE_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
// latitude
coord_type.clone(),
@@ -78,15 +119,7 @@ impl Function for H3LatLngToCell {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 3,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 3, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 3);
let lat_vec = &columns[0];
let lon_vec = &columns[1];
@@ -142,20 +175,8 @@ impl Function for H3LatLngToCellString {
fn signature(&self) -> Signature {
let mut signatures = Vec::new();
for coord_type in &[
ConcreteDataType::float32_datatype(),
ConcreteDataType::float64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
for coord_type in COORDINATE_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
// latitude
coord_type.clone(),
@@ -170,15 +191,7 @@ impl Function for H3LatLngToCellString {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 3,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 3, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 3);
let lat_vec = &columns[0];
let lon_vec = &columns[1];
@@ -234,15 +247,7 @@ impl Function for H3CellToString {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -280,15 +285,7 @@ impl Function for H3StringToCell {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let string_vec = &columns[0];
let size = string_vec.len();
@@ -319,18 +316,20 @@ impl Function for H3StringToCell {
}
}
/// Function that returns centroid latitude of given cell id
/// Function that returns centroid latitude and longitude of given cell id
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellCenterLat;
pub struct H3CellCenterLatLng;
impl Function for H3CellCenterLat {
impl Function for H3CellCenterLatLng {
fn name(&self) -> &str {
"h3_cell_center_lat"
"h3_cell_center_latlng"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::float64_datatype())
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::float64_datatype(),
))
}
fn signature(&self) -> Signature {
@@ -338,69 +337,26 @@ impl Function for H3CellCenterLat {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
let mut results = Float64VectorBuilder::with_capacity(size);
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::float64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let lat = cell.map(|cell| LatLng::from(cell).lat());
let latlng = cell.map(LatLng::from);
results.push(lat);
}
Ok(results.to_vector())
}
}
/// Function that returns centroid longitude of given cell id
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellCenterLng;
impl Function for H3CellCenterLng {
fn name(&self) -> &str {
"h3_cell_center_lng"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::float64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
if let Some(latlng) = latlng {
let result = ListValue::new(
vec![latlng.lat().into(), latlng.lng().into()],
ConcreteDataType::float64_datatype(),
);
results.push(Some(result.as_scalar_ref()));
} else {
results.push(None);
}
);
let cell_vec = &columns[0];
let size = cell_vec.len();
let mut results = Float64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let lat = cell.map(|cell| LatLng::from(cell).lng());
results.push(lat);
}
Ok(results.to_vector())
@@ -470,15 +426,7 @@ impl Function for H3CellBase {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -514,15 +462,7 @@ impl Function for H3CellIsPentagon {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 1, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 1);
let cell_vec = &columns[0];
let size = cell_vec.len();
@@ -558,15 +498,7 @@ impl Function for H3CellCenterChild {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
@@ -606,15 +538,7 @@ impl Function for H3CellParent {
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
@@ -633,48 +557,323 @@ impl Function for H3CellParent {
}
}
/// Function that checks if two cells are neighbour
/// Function that returns children cell list
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3IsNeighbour;
pub struct H3CellToChildren;
impl Function for H3IsNeighbour {
impl Function for H3CellToChildren {
fn name(&self) -> &str {
"h3_is_neighbour"
"h3_cell_to_children"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::boolean_datatype())
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_double_cell()
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect 2, provided : {}",
columns.len()
),
}
);
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let cell2_vec = &columns[1];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = BooleanVectorBuilder::with_capacity(size);
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.children(res)
.map(|child| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns children cell count
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellToChildrenSize;
impl Function for H3CellToChildrenSize {
fn name(&self) -> &str {
"h3_cell_to_children_size"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.map(|cell| cell.children_count(res));
results.push(result);
}
Ok(results.to_vector())
}
}
/// Function that returns the position of the cell within its parent at the given resolution
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3CellToChildPos;
impl Function for H3CellToChildPos {
fn name(&self) -> &str {
"h3_cell_to_child_pos"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
signature_of_cell_and_resolution()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let res_vec = &columns[1];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.and_then(|cell| cell.child_position(res));
results.push(result);
}
Ok(results.to_vector())
}
}
/// Function that returns the child cell at the given position of the parent at the given resolution
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3ChildPosToCell;
impl Function for H3ChildPosToCell {
fn name(&self) -> &str {
"h3_child_pos_to_cell"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::uint64_datatype())
}
fn signature(&self) -> Signature {
let mut signatures =
Vec::with_capacity(POSITION_TYPES.len() * CELL_TYPES.len() * RESOLUTION_TYPES.len());
for position_type in POSITION_TYPES.as_slice() {
for cell_type in CELL_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
position_type.clone(),
cell_type.clone(),
resolution_type.clone(),
]));
}
}
}
Signature::one_of(signatures, Volatility::Stable)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 3);
let pos_vec = &columns[0];
let cell_vec = &columns[1];
let res_vec = &columns[2];
let size = cell_vec.len();
let mut results = UInt64VectorBuilder::with_capacity(size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let pos = value_to_position(pos_vec.get(i))?;
let res = value_to_resolution(res_vec.get(i))?;
let result = cell.and_then(|cell| cell.child_at(pos, res).map(u64::from));
results.push(result);
}
Ok(results.to_vector())
}
}
/// Function that returns cells within k distance of the given cell
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDisk;
impl Function for H3GridDisk {
fn name(&self) -> &str {
"h3_grid_disk"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_cell_and_distance()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let k_vec = &columns[1];
let size = cell_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let k = value_to_distance(k_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.grid_disk::<Vec<_>>(k)
.into_iter()
.map(|child| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns all cells within k distance of the given cell, via grid disk with distances
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDiskDistances;
impl Function for H3GridDiskDistances {
fn name(&self) -> &str {
"h3_grid_disk_distances"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_cell_and_distance()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_vec = &columns[0];
let k_vec = &columns[1];
let size = cell_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let cell = cell_from_value(cell_vec.get(i))?;
let k = value_to_distance(k_vec.get(i))?;
let result = cell.map(|cell| {
let children: Vec<Value> = cell
.grid_disk_distances::<Vec<_>>(k)
.into_iter()
.map(|(child, _distance)| Value::from(u64::from(child)))
.collect();
ListValue::new(children, ConcreteDataType::uint64_datatype())
});
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
/// Function that returns the grid distance between two cells
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridDistance;
impl Function for H3GridDistance {
fn name(&self) -> &str {
"h3_grid_distance"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::int32_datatype())
}
fn signature(&self) -> Signature {
signature_of_double_cells()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_this_vec = &columns[0];
let cell_that_vec = &columns[1];
let size = cell_this_vec.len();
let mut results = Int32VectorBuilder::with_capacity(size);
for i in 0..size {
let result = match (
cell_from_value(cell_vec.get(i))?,
cell_from_value(cell2_vec.get(i))?,
cell_from_value(cell_this_vec.get(i))?,
cell_from_value(cell_that_vec.get(i))?,
) {
(Some(cell_this), Some(cell_that)) => {
let is_neighbour = cell_this
.is_neighbor_with(cell_that)
let dist = cell_this
.grid_distance(cell_that)
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("H3 error: {}", e),
@@ -682,7 +881,7 @@ impl Function for H3IsNeighbour {
))
})
.context(error::ExecuteSnafu)?;
Some(is_neighbour)
Some(dist)
}
_ => None,
};
@@ -694,6 +893,73 @@ impl Function for H3IsNeighbour {
}
}
/// Function that returns the path of cells between two cells
#[derive(Clone, Debug, Default, Display)]
#[display("{}", self.name())]
pub struct H3GridPathCells;
impl Function for H3GridPathCells {
fn name(&self) -> &str {
"h3_grid_path_cells"
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::list_datatype(
ConcreteDataType::uint64_datatype(),
))
}
fn signature(&self) -> Signature {
signature_of_double_cells()
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure_columns_n!(columns, 2);
let cell_this_vec = &columns[0];
let cell_that_vec = &columns[1];
let size = cell_this_vec.len();
let mut results =
ListVectorBuilder::with_type_capacity(ConcreteDataType::uint64_datatype(), size);
for i in 0..size {
let result = match (
cell_from_value(cell_this_vec.get(i))?,
cell_from_value(cell_that_vec.get(i))?,
) {
(Some(cell_this), Some(cell_that)) => {
let cells = cell_this
.grid_path_cells(cell_that)
.and_then(|x| x.collect::<std::result::Result<Vec<CellIndex>, _>>())
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("H3 error: {}", e),
StatusCode::EngineExecuteQuery,
))
})
.context(error::ExecuteSnafu)?;
Some(ListValue::new(
cells
.into_iter()
.map(|c| Value::from(u64::from(c)))
.collect(),
ConcreteDataType::uint64_datatype(),
))
}
_ => None,
};
if let Some(list_value) = result {
results.push(Some(list_value.as_scalar_ref()));
} else {
results.push(None);
}
}
Ok(results.to_vector())
}
}
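// Hedged sketch (not part of this change set): a minimal check that the new
// grid-distance UDF can be driven directly through `Function::eval` with
// in-memory vectors. The cell index is an illustrative H3 value, and
// `UInt64Vector::from_vec` is assumed to be available, mirroring the `from_vec`
// constructors used by the json_path_exists tests elsewhere in this change set.
#[cfg(test)]
mod h3_grid_distance_sketch {
    use std::sync::Arc;

    use datatypes::vectors::UInt64Vector;

    use super::*;

    #[test]
    fn grid_distance_between_a_cell_and_itself_is_computed() {
        let cell = 0x85283473fffffff_u64;
        let this: VectorRef = Arc::new(UInt64Vector::from_vec(vec![cell]));
        let that: VectorRef = Arc::new(UInt64Vector::from_vec(vec![cell]));
        let out = H3GridDistance
            .eval(FunctionContext::default(), &[this, that])
            .unwrap();
        // One input row yields one output row; the distance itself is 0 here.
        assert_eq!(1, out.len());
    }
}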
fn value_to_resolution(v: Value) -> Result<Resolution> {
let r = match v {
Value::Int8(v) => v as u8,
@@ -716,26 +982,59 @@ fn value_to_resolution(v: Value) -> Result<Resolution> {
.context(error::ExecuteSnafu)
}
macro_rules! ensure_and_coerce {
($compare:expr, $coerce:expr) => {{
ensure!(
$compare,
InvalidFuncArgsSnafu {
err_msg: "Argument was outside of acceptable range "
}
);
Ok($coerce)
}};
}
fn value_to_position(v: Value) -> Result<u64> {
match v {
Value::Int8(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int16(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int32(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::Int64(v) => ensure_and_coerce!(v >= 0, v as u64),
Value::UInt8(v) => Ok(v as u64),
Value::UInt16(v) => Ok(v as u64),
Value::UInt32(v) => Ok(v as u64),
Value::UInt64(v) => Ok(v),
_ => unreachable!(),
}
}
fn value_to_distance(v: Value) -> Result<u32> {
match v {
Value::Int8(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int16(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int32(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::Int64(v) => ensure_and_coerce!(v >= 0, v as u32),
Value::UInt8(v) => Ok(v as u32),
Value::UInt16(v) => Ok(v as u32),
Value::UInt32(v) => Ok(v),
Value::UInt64(v) => Ok(v as u32),
_ => unreachable!(),
}
}
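// Hedged illustration of the coercion helpers above: non-negative signed values
// are widened, while negative ones trip `ensure_and_coerce!` and surface as an
// InvalidFuncArgs error. This is a sketch, not part of the diff.
#[cfg(test)]
mod coercion_sketch {
    use super::*;

    #[test]
    fn negative_positions_and_distances_are_rejected() {
        assert_eq!(3u64, value_to_position(Value::Int32(3)).unwrap());
        assert!(value_to_position(Value::Int64(-1)).is_err());
        assert_eq!(2u32, value_to_distance(Value::UInt8(2)).unwrap());
        assert!(value_to_distance(Value::Int8(-5)).is_err());
    }
}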
fn signature_of_cell() -> Signature {
let mut signatures = Vec::new();
for cell_type in &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
] {
let mut signatures = Vec::with_capacity(CELL_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![cell_type.clone()]));
}
Signature::one_of(signatures, Volatility::Stable)
}
fn signature_of_double_cell() -> Signature {
let mut signatures = Vec::new();
let cell_types = &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
];
for cell_type in cell_types {
for cell_type2 in cell_types {
fn signature_of_double_cells() -> Signature {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * CELL_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for cell_type2 in CELL_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
cell_type2.clone(),
@@ -747,21 +1046,9 @@ fn signature_of_double_cell() -> Signature {
}
fn signature_of_cell_and_resolution() -> Signature {
let mut signatures = Vec::new();
for cell_type in &[
ConcreteDataType::uint64_datatype(),
ConcreteDataType::int64_datatype(),
] {
for resolution_type in &[
ConcreteDataType::int8_datatype(),
ConcreteDataType::int16_datatype(),
ConcreteDataType::int32_datatype(),
ConcreteDataType::int64_datatype(),
ConcreteDataType::uint8_datatype(),
ConcreteDataType::uint16_datatype(),
ConcreteDataType::uint32_datatype(),
ConcreteDataType::uint64_datatype(),
] {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * RESOLUTION_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for resolution_type in RESOLUTION_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
resolution_type.clone(),
@@ -771,6 +1058,19 @@ fn signature_of_cell_and_resolution() -> Signature {
Signature::one_of(signatures, Volatility::Stable)
}
fn signature_of_cell_and_distance() -> Signature {
let mut signatures = Vec::with_capacity(CELL_TYPES.len() * DISTANCE_TYPES.len());
for cell_type in CELL_TYPES.as_slice() {
for distance_type in DISTANCE_TYPES.as_slice() {
signatures.push(TypeSignature::Exact(vec![
cell_type.clone(),
distance_type.clone(),
]));
}
}
Signature::one_of(signatures, Volatility::Stable)
}
fn cell_from_value(v: Value) -> Result<Option<CellIndex>> {
let cell = match v {
Value::Int64(v) => Some(

View File

@@ -0,0 +1,61 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
macro_rules! ensure_columns_len {
($columns:ident) => {
ensure!(
$columns.windows(2).all(|c| c[0].len() == c[1].len()),
InvalidFuncArgsSnafu {
err_msg: "The length of input columns are in different size"
}
)
};
($column_a:ident, $column_b:ident, $($column_n:ident),*) => {
ensure!(
{
let mut result = $column_a.len() == $column_b.len();
$(
result = result && ($column_a.len() == $column_n.len());
)*
result
},
InvalidFuncArgsSnafu {
err_msg: "The length of input columns are in different size"
}
)
};
}
pub(super) use ensure_columns_len;
macro_rules! ensure_columns_n {
($columns:ident, $n:literal) => {
ensure!(
$columns.len() == $n,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of arguments is not correct, expect {}, provided : {}",
stringify!($n),
$columns.len()
),
}
);
if $n > 1 {
ensure_columns_len!($columns);
}
};
}
pub(super) use ensure_columns_n;
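// Hedged usage sketch: for n > 1 the macro also checks that all input columns
// share the same row count via ensure_columns_len!. A typical caller (names
// assumed, mirroring the eval() bodies elsewhere in this change set):
//
// fn eval(&self, _ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
//     ensure_columns_n!(columns, 2);
//     let cells = &columns[0];
//     let resolutions = &columns[1]; // guaranteed to be as long as `cells`
//     /* ... build and return the result vector ... */
// }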

View File

@@ -15,6 +15,7 @@
use std::sync::Arc;
mod json_get;
mod json_is;
mod json_path_exists;
mod json_to_string;
mod parse_json;
@@ -46,5 +47,7 @@ impl JsonFunction {
registry.register(Arc::new(JsonIsBool));
registry.register(Arc::new(JsonIsArray));
registry.register(Arc::new(JsonIsObject));
registry.register(Arc::new(json_path_exists::JsonPathExistsFunction));
}
}

View File

@@ -0,0 +1,172 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::{self, Display};
use common_query::error::{InvalidFuncArgsSnafu, Result, UnsupportedInputDataTypeSnafu};
use common_query::prelude::Signature;
use datafusion::logical_expr::Volatility;
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::VectorRef;
use datatypes::scalars::ScalarVectorBuilder;
use datatypes::vectors::{BooleanVectorBuilder, MutableVector};
use snafu::ensure;
use crate::function::{Function, FunctionContext};
/// Check if the given JSON data contains the given JSON path.
#[derive(Clone, Debug, Default)]
pub struct JsonPathExistsFunction;
const NAME: &str = "json_path_exists";
impl Function for JsonPathExistsFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::boolean_datatype())
}
fn signature(&self) -> Signature {
Signature::exact(
vec![
ConcreteDataType::json_datatype(),
ConcreteDataType::string_datatype(),
],
Volatility::Immutable,
)
}
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
ensure!(
columns.len() == 2,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly two, have: {}",
columns.len()
),
}
);
let jsons = &columns[0];
let paths = &columns[1];
let size = jsons.len();
let datatype = jsons.data_type();
let mut results = BooleanVectorBuilder::with_capacity(size);
match datatype {
// JSON data type uses binary vector
ConcreteDataType::Binary(_) => {
for i in 0..size {
let json = jsons.get_ref(i);
let path = paths.get_ref(i);
let json = json.as_binary();
let path = path.as_string();
let result = match (json, path) {
(Ok(Some(json)), Ok(Some(path))) => {
let json_path = jsonb::jsonpath::parse_json_path(path.as_bytes());
match json_path {
Ok(json_path) => jsonb::path_exists(json, json_path).ok(),
Err(_) => None,
}
}
_ => None,
};
results.push(result);
}
}
_ => {
return UnsupportedInputDataTypeSnafu {
function: NAME,
datatypes: columns.iter().map(|c| c.data_type()).collect::<Vec<_>>(),
}
.fail();
}
}
Ok(results.to_vector())
}
}
impl Display for JsonPathExistsFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "JSON_PATH_EXISTS")
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use common_query::prelude::TypeSignature;
use datatypes::scalars::ScalarVector;
use datatypes::vectors::{BinaryVector, StringVector};
use super::*;
#[test]
fn test_json_path_exists_function() {
let json_path_exists = JsonPathExistsFunction;
assert_eq!("json_path_exists", json_path_exists.name());
assert_eq!(
ConcreteDataType::boolean_datatype(),
json_path_exists
.return_type(&[ConcreteDataType::json_datatype()])
.unwrap()
);
assert!(matches!(json_path_exists.signature(),
Signature {
type_signature: TypeSignature::Exact(valid_types),
volatility: Volatility::Immutable
} if valid_types == vec![ConcreteDataType::json_datatype(), ConcreteDataType::string_datatype()]
));
let json_strings = [
r#"{"a": {"b": 2}, "b": 2, "c": 3}"#,
r#"{"a": 4, "b": {"c": 6}, "c": 6}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
];
let paths = vec!["$.a.b.c", "$.b", "$.c.a", ".d"];
let results = [false, true, true, false];
let jsonbs = json_strings
.iter()
.map(|s| {
let value = jsonb::parse_value(s.as_bytes()).unwrap();
value.to_vec()
})
.collect::<Vec<_>>();
let json_vector = BinaryVector::from_vec(jsonbs);
let path_vector = StringVector::from_vec(paths);
let args: Vec<VectorRef> = vec![Arc::new(json_vector), Arc::new(path_vector)];
let vector = json_path_exists
.eval(FunctionContext::default(), &args)
.unwrap();
assert_eq!(4, vector.len());
for (i, gt) in results.iter().enumerate() {
let result = vector.get_ref(i);
let result = result.as_boolean().unwrap().unwrap();
assert_eq!(*gt, result);
}
}
}

View File

@@ -21,23 +21,19 @@ use syn::{parse_macro_input, DeriveInput, ItemStruct};
pub(crate) fn impl_aggr_func_type_store(ast: &DeriveInput) -> TokenStream {
let name = &ast.ident;
let gen = quote! {
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::error::{InvalidInputStateSnafu, Error as QueryError};
use datatypes::prelude::ConcreteDataType;
impl AggrFuncTypeStore for #name {
fn input_types(&self) -> std::result::Result<Vec<ConcreteDataType>, QueryError> {
impl common_query::logical_plan::accumulator::AggrFuncTypeStore for #name {
fn input_types(&self) -> std::result::Result<Vec<datatypes::prelude::ConcreteDataType>, common_query::error::Error> {
let input_types = self.input_types.load();
snafu::ensure!(input_types.is_some(), InvalidInputStateSnafu);
snafu::ensure!(input_types.is_some(), common_query::error::InvalidInputStateSnafu);
Ok(input_types.as_ref().unwrap().as_ref().clone())
}
fn set_input_types(&self, input_types: Vec<ConcreteDataType>) -> std::result::Result<(), QueryError> {
fn set_input_types(&self, input_types: Vec<datatypes::prelude::ConcreteDataType>) -> std::result::Result<(), common_query::error::Error> {
let old = self.input_types.swap(Some(std::sync::Arc::new(input_types.clone())));
if let Some(old) = old {
snafu::ensure!(old.len() == input_types.len(), InvalidInputStateSnafu);
snafu::ensure!(old.len() == input_types.len(), common_query::error::InvalidInputStateSnafu);
for (x, y) in old.iter().zip(input_types.iter()) {
snafu::ensure!(x == y, InvalidInputStateSnafu);
snafu::ensure!(x == y, common_query::error::InvalidInputStateSnafu);
}
}
Ok(())
@@ -51,7 +47,7 @@ pub(crate) fn impl_as_aggr_func_creator(_args: TokenStream, input: TokenStream)
let mut item_struct = parse_macro_input!(input as ItemStruct);
if let syn::Fields::Named(ref mut fields) = item_struct.fields {
let result = syn::Field::parse_named.parse2(quote! {
input_types: arc_swap::ArcSwapOption<Vec<ConcreteDataType>>
input_types: arc_swap::ArcSwapOption<Vec<datatypes::prelude::ConcreteDataType>>
});
match result {
Ok(field) => fields.named.push(field),

View File

@@ -24,5 +24,5 @@ struct Foo {}
fn test_derive() {
let _ = Foo::default();
assert_fields!(Foo: input_types);
assert_impl_all!(Foo: std::fmt::Debug, Default, AggrFuncTypeStore);
assert_impl_all!(Foo: std::fmt::Debug, Default, common_query::logical_plan::accumulator::AggrFuncTypeStore);
}

View File

@@ -18,6 +18,7 @@ use common_procedure::error::Error as ProcedureError;
use snafu::{ensure, OptionExt, ResultExt};
use store_api::metric_engine_consts::LOGICAL_TABLE_METADATA_KEY;
use table::metadata::TableId;
use table::table_reference::TableReference;
use crate::ddl::DetectingRegion;
use crate::error::{Error, OperateDatanodeSnafu, Result, TableNotFoundSnafu, UnsupportedSnafu};
@@ -109,8 +110,8 @@ pub async fn check_and_get_physical_table_id(
.table_name_manager()
.get(physical_table_name)
.await?
.context(TableNotFoundSnafu {
table_name: physical_table_name.to_string(),
.with_context(|| TableNotFoundSnafu {
table_name: TableReference::from(physical_table_name).to_string(),
})
.map(|table| table.table_id())
}
@@ -123,8 +124,8 @@ pub async fn get_physical_table_id(
.table_name_manager()
.get(logical_table_name)
.await?
.context(TableNotFoundSnafu {
table_name: logical_table_name.to_string(),
.with_context(|| TableNotFoundSnafu {
table_name: TableReference::from(logical_table_name).to_string(),
})
.map(|table| table.table_id())?;

View File

@@ -147,6 +147,20 @@ pub enum Error {
source: common_procedure::Error,
},
#[snafu(display("Failed to start procedure manager"))]
StartProcedureManager {
#[snafu(implicit)]
location: Location,
source: common_procedure::Error,
},
#[snafu(display("Failed to stop procedure manager"))]
StopProcedureManager {
#[snafu(implicit)]
location: Location,
source: common_procedure::Error,
},
#[snafu(display(
"Failed to get procedure output, procedure id: {procedure_id}, error: {err_msg}"
))]
@@ -715,7 +729,9 @@ impl ErrorExt for Error {
SubmitProcedure { source, .. }
| QueryProcedure { source, .. }
| WaitProcedure { source, .. } => source.status_code(),
| WaitProcedure { source, .. }
| StartProcedureManager { source, .. }
| StopProcedureManager { source, .. } => source.status_code(),
RegisterProcedureLoader { source, .. } => source.status_code(),
External { source, .. } => source.status_code(),
OperateDatanode { source, .. } => source.status_code(),

View File

@@ -21,6 +21,7 @@ use serde::{Deserialize, Serialize};
use snafu::OptionExt;
use table::metadata::TableId;
use table::table_name::TableName;
use table::table_reference::TableReference;
use super::{MetadataKey, MetadataValue, TABLE_NAME_KEY_PATTERN, TABLE_NAME_KEY_PREFIX};
use crate::error::{Error, InvalidMetadataSnafu, Result};
@@ -122,6 +123,16 @@ impl From<TableNameKey<'_>> for TableName {
}
}
impl<'a> From<TableNameKey<'a>> for TableReference<'a> {
fn from(value: TableNameKey<'a>) -> Self {
Self {
catalog: value.catalog,
schema: value.schema,
table: value.table,
}
}
}
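// Hedged sketch of why this conversion exists: error paths can now format the
// fully qualified name instead of only the bare table. `TableNameKey::new` is
// assumed to take (catalog, schema, table), matching the struct fields used above.
#[cfg(test)]
mod table_reference_conversion_sketch {
    use super::*;

    #[test]
    fn table_name_key_converts_to_table_reference() {
        let key = TableNameKey::new("greptime", "public", "metrics");
        let reference = TableReference::from(key);
        assert_eq!("greptime", reference.catalog);
        assert_eq!("public", reference.schema);
        assert_eq!("metrics", reference.table);
    }
}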
impl<'a> TryFrom<&'a str> for TableNameKey<'a> {
type Error = Error;

View File

@@ -0,0 +1,156 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use async_trait::async_trait;
use common_telemetry::error;
use crate::error::Result;
pub type LeadershipChangeNotifierCustomizerRef = Arc<dyn LeadershipChangeNotifierCustomizer>;
/// A trait for customizing the leadership change notifier.
pub trait LeadershipChangeNotifierCustomizer: Send + Sync {
fn customize(&self, notifier: &mut LeadershipChangeNotifier);
}
/// A trait for handling leadership change events in a distributed system.
#[async_trait]
pub trait LeadershipChangeListener: Send + Sync {
/// Returns the listener name.
fn name(&self) -> &str;
/// Called when the node transitions to the leader role.
async fn on_leader_start(&self) -> Result<()>;
/// Called when the node transitions to the follower role.
async fn on_leader_stop(&self) -> Result<()>;
}
/// A notifier for leadership change events.
#[derive(Default)]
pub struct LeadershipChangeNotifier {
listeners: Vec<Arc<dyn LeadershipChangeListener>>,
}
impl LeadershipChangeNotifier {
/// Adds a listener to the notifier.
pub fn add_listener(&mut self, listener: Arc<dyn LeadershipChangeListener>) {
self.listeners.push(listener);
}
/// Notify all listeners that the node has become a leader.
pub async fn notify_on_leader_start(&self) {
for listener in &self.listeners {
if let Err(err) = listener.on_leader_start().await {
error!(
err;
"Failed to notify listener: {}, event 'on_leader_start'",
listener.name()
);
}
}
}
/// Notify all listeners that the node has become a follower.
pub async fn notify_on_leader_stop(&self) {
for listener in &self.listeners {
if let Err(err) = listener.on_leader_stop().await {
error!(
err;
"Failed to notify listener: {}, event: 'on_follower_start'",
listener.name()
);
}
}
}
}
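// Hedged usage sketch: a caller builds the notifier once, registers listeners,
// and fans events out on leadership changes; the Metasrv wiring later in this
// change set follows the same shape. Variable names below are illustrative.
//
// let mut notifier = LeadershipChangeNotifier::default();
// notifier.add_listener(wal_options_allocator.clone());
// notifier.add_listener(Arc::new(ProcedureManagerListenerAdapter(procedure_manager)));
// notifier.notify_on_leader_start().await;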
#[cfg(test)]
mod tests {
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use super::*;
struct MockListener {
name: String,
on_leader_start_fn: Option<Box<dyn Fn() -> Result<()> + Send + Sync>>,
on_follower_start_fn: Option<Box<dyn Fn() -> Result<()> + Send + Sync>>,
}
#[async_trait::async_trait]
impl LeadershipChangeListener for MockListener {
fn name(&self) -> &str {
&self.name
}
async fn on_leader_start(&self) -> Result<()> {
if let Some(f) = &self.on_leader_start_fn {
return f();
}
Ok(())
}
async fn on_leader_stop(&self) -> Result<()> {
if let Some(f) = &self.on_follower_start_fn {
return f();
}
Ok(())
}
}
#[tokio::test]
async fn test_leadership_change_notifier() {
let mut notifier = LeadershipChangeNotifier::default();
let listener1 = Arc::new(MockListener {
name: "listener1".to_string(),
on_leader_start_fn: None,
on_follower_start_fn: None,
});
let called_on_leader_start = Arc::new(AtomicBool::new(false));
let called_on_follower_start = Arc::new(AtomicBool::new(false));
let called_on_leader_start_moved = called_on_leader_start.clone();
let called_on_follower_start_moved = called_on_follower_start.clone();
let listener2 = Arc::new(MockListener {
name: "listener2".to_string(),
on_leader_start_fn: Some(Box::new(move || {
called_on_leader_start_moved.store(true, Ordering::Relaxed);
Ok(())
})),
on_follower_start_fn: Some(Box::new(move || {
called_on_follower_start_moved.store(true, Ordering::Relaxed);
Ok(())
})),
});
notifier.add_listener(listener1);
notifier.add_listener(listener2);
let listener1 = notifier.listeners.first().unwrap();
let listener2 = notifier.listeners.get(1).unwrap();
assert_eq!(listener1.name(), "listener1");
assert_eq!(listener2.name(), "listener2");
notifier.notify_on_leader_start().await;
assert!(!called_on_follower_start.load(Ordering::Relaxed));
assert!(called_on_leader_start.load(Ordering::Relaxed));
notifier.notify_on_leader_stop().await;
assert!(called_on_follower_start.load(Ordering::Relaxed));
assert!(called_on_leader_start.load(Ordering::Relaxed));
}
}

View File

@@ -32,6 +32,7 @@ pub mod heartbeat;
pub mod instruction;
pub mod key;
pub mod kv_backend;
pub mod leadership_notifier;
pub mod lock_key;
pub mod metrics;
pub mod node_manager;

View File

@@ -17,6 +17,7 @@ pub mod kafka;
use std::collections::HashMap;
use std::sync::Arc;
use async_trait::async_trait;
use common_wal::config::MetasrvWalConfig;
use common_wal::options::{KafkaWalOptions, WalOptions, WAL_OPTIONS_KEY};
use snafu::ResultExt;
@@ -24,6 +25,7 @@ use store_api::storage::{RegionId, RegionNumber};
use crate::error::{EncodeWalOptionsSnafu, Result};
use crate::kv_backend::KvBackendRef;
use crate::leadership_notifier::LeadershipChangeListener;
use crate::wal_options_allocator::kafka::topic_manager::TopicManager as KafkaTopicManager;
/// Allocates wal options in region granularity.
@@ -94,6 +96,21 @@ impl WalOptionsAllocator {
}
}
#[async_trait]
impl LeadershipChangeListener for WalOptionsAllocator {
fn name(&self) -> &str {
"WalOptionsAllocator"
}
async fn on_leader_start(&self) -> Result<()> {
self.start().await
}
async fn on_leader_stop(&self) -> Result<()> {
Ok(())
}
}
/// Allocates wal options for each region. The allocated wal options are encoded immediately.
pub fn allocate_region_wal_options(
regions: Vec<RegionNumber>,

View File

@@ -329,6 +329,7 @@ impl ExecutionPlanVisitor for MetricCollector {
level: self.current_level,
metrics: vec![],
});
self.current_level += 1;
return Ok(true);
};
@@ -365,8 +366,7 @@ impl ExecutionPlanVisitor for MetricCollector {
}
fn post_visit(&mut self, _plan: &dyn ExecutionPlan) -> std::result::Result<bool, Self::Error> {
// the last minus will underflow
self.current_level = self.current_level.wrapping_sub(1);
self.current_level -= 1;
Ok(true)
}
}

View File

@@ -249,6 +249,15 @@ impl ConcreteDataType {
]
}
pub fn timestamps() -> Vec<ConcreteDataType> {
vec![
ConcreteDataType::timestamp_second_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
ConcreteDataType::timestamp_microsecond_datatype(),
ConcreteDataType::timestamp_nanosecond_datatype(),
]
}
/// Convert arrow data type to [ConcreteDataType].
///
/// # Panics

View File

@@ -527,11 +527,22 @@ impl HeartbeatHandlerGroupBuilder {
}
/// Builds the group of heartbeat handlers.
pub fn build(self) -> HeartbeatHandlerGroup {
HeartbeatHandlerGroup {
///
/// Applies the customizer if it exists.
pub fn build(mut self) -> Result<HeartbeatHandlerGroup> {
if let Some(customizer) = self
.plugins
.as_ref()
.and_then(|plugins| plugins.get::<HeartbeatHandlerGroupBuilderCustomizerRef>())
{
debug!("Customizing the heartbeat handler group builder");
customizer.customize(&mut self)?;
}
Ok(HeartbeatHandlerGroup {
handlers: self.handlers.into_iter().collect(),
pushers: self.pushers,
}
})
}
/// Adds the handler after the specified handler.
@@ -582,6 +593,14 @@ impl HeartbeatHandlerGroupBuilder {
}
}
pub type HeartbeatHandlerGroupBuilderCustomizerRef =
Arc<dyn HeartbeatHandlerGroupBuilderCustomizer>;
/// The customizer of the [`HeartbeatHandlerGroupBuilder`].
pub trait HeartbeatHandlerGroupBuilderCustomizer: Send + Sync {
fn customize(&self, builder: &mut HeartbeatHandlerGroupBuilder) -> Result<()>;
}
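// Hedged sketch: a plugin can provide a customizer through `Plugins` so that
// `build()` applies it before assembling the group. Shape only; the struct name
// is illustrative.
//
// struct MyCustomizer;
//
// impl HeartbeatHandlerGroupBuilderCustomizer for MyCustomizer {
//     fn customize(&self, builder: &mut HeartbeatHandlerGroupBuilder) -> Result<()> {
//         // mutate `builder` here, e.g. add or replace handlers
//         Ok(())
//     }
// }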
#[cfg(test)]
mod tests {
@@ -670,7 +689,8 @@ mod tests {
fn test_handler_group_builder() {
let group = HeartbeatHandlerGroupBuilder::new(Pushers::default())
.add_default_handlers()
.build();
.build()
.unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
@@ -706,7 +726,7 @@ mod tests {
)
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
@@ -739,7 +759,7 @@ mod tests {
.add_handler_before("ResponseHeaderHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
@@ -772,7 +792,7 @@ mod tests {
.add_handler_after("MailboxHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
@@ -805,7 +825,7 @@ mod tests {
.add_handler_after("CollectStatsHandler", ResponseHeaderHandler)
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(13, handlers.len());
@@ -838,7 +858,7 @@ mod tests {
.replace_handler("MailboxHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
@@ -870,7 +890,7 @@ mod tests {
.replace_handler("CollectStatsHandler", ResponseHeaderHandler)
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());
@@ -902,7 +922,7 @@ mod tests {
.replace_handler("ResponseHeaderHandler", CollectStatsHandler::default())
.unwrap();
let group = builder.build();
let group = builder.build().unwrap();
let handlers = group.handlers;
assert_eq!(12, handlers.len());

View File

@@ -40,7 +40,6 @@ pub mod selector;
pub mod service;
pub mod state;
pub mod table_meta_alloc;
pub use crate::error::Result;
mod greptimedb_telemetry;

View File

@@ -29,6 +29,9 @@ use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::ddl::ProcedureExecutorRef;
use common_meta::key::TableMetadataManagerRef;
use common_meta::kv_backend::{KvBackendRef, ResettableKvBackend, ResettableKvBackendRef};
use common_meta::leadership_notifier::{
LeadershipChangeNotifier, LeadershipChangeNotifierCustomizerRef,
};
use common_meta::peer::Peer;
use common_meta::region_keeper::MemoryRegionKeeperRef;
use common_meta::wal_options_allocator::WalOptionsAllocatorRef;
@@ -56,6 +59,7 @@ use crate::handler::HeartbeatHandlerGroupRef;
use crate::lease::lookup_datanode_peer;
use crate::lock::DistLockRef;
use crate::procedure::region_migration::manager::RegionMigrationManagerRef;
use crate::procedure::ProcedureManagerListenerAdapter;
use crate::pubsub::{PublisherRef, SubscriptionManagerRef};
use crate::region::supervisor::RegionSupervisorTickerRef;
use crate::selector::{Selector, SelectorType};
@@ -291,17 +295,15 @@ pub type SelectorRef = Arc<dyn Selector<Context = SelectorContext, Output = Vec<
pub type ElectionRef = Arc<dyn Election<Leader = LeaderValue>>;
pub struct MetaStateHandler {
procedure_manager: ProcedureManagerRef,
wal_options_allocator: WalOptionsAllocatorRef,
subscribe_manager: Option<SubscriptionManagerRef>,
greptimedb_telemetry_task: Arc<GreptimeDBTelemetryTask>,
leader_cached_kv_backend: Arc<LeaderCachedKvBackend>,
region_supervisor_ticker: Option<RegionSupervisorTickerRef>,
leadership_change_notifier: LeadershipChangeNotifier,
state: StateRef,
}
impl MetaStateHandler {
pub async fn on_become_leader(&self) {
pub async fn on_leader_start(&self) {
self.state.write().unwrap().next_state(become_leader(false));
if let Err(e) = self.leader_cached_kv_backend.load().await {
@@ -310,33 +312,19 @@ impl MetaStateHandler {
self.state.write().unwrap().next_state(become_leader(true));
}
if let Some(ticker) = self.region_supervisor_ticker.as_ref() {
ticker.start();
}
if let Err(e) = self.procedure_manager.start().await {
error!(e; "Failed to start procedure manager");
}
if let Err(e) = self.wal_options_allocator.start().await {
error!(e; "Failed to start wal options allocator");
}
self.leadership_change_notifier
.notify_on_leader_start()
.await;
self.greptimedb_telemetry_task.should_report(true);
}
pub async fn on_become_follower(&self) {
pub async fn on_leader_stop(&self) {
self.state.write().unwrap().next_state(become_follower());
// Stops the procedures.
if let Err(e) = self.procedure_manager.stop().await {
error!(e; "Failed to stop procedure manager");
}
if let Some(ticker) = self.region_supervisor_ticker.as_ref() {
// Stops the supervisor ticker.
ticker.stop();
}
self.leadership_change_notifier
.notify_on_leader_stop()
.await;
// Suspends reporting.
self.greptimedb_telemetry_task.should_report(false);
@@ -410,15 +398,25 @@ impl Metasrv {
greptimedb_telemetry_task
.start()
.context(StartTelemetryTaskSnafu)?;
let region_supervisor_ticker = self.region_supervisor_ticker.clone();
// Builds leadership change notifier.
let mut leadership_change_notifier = LeadershipChangeNotifier::default();
leadership_change_notifier.add_listener(self.wal_options_allocator.clone());
leadership_change_notifier
.add_listener(Arc::new(ProcedureManagerListenerAdapter(procedure_manager)));
if let Some(region_supervisor_ticker) = &self.region_supervisor_ticker {
leadership_change_notifier.add_listener(region_supervisor_ticker.clone() as _);
}
if let Some(customizer) = self.plugins.get::<LeadershipChangeNotifierCustomizerRef>() {
customizer.customize(&mut leadership_change_notifier);
}
let state_handler = MetaStateHandler {
greptimedb_telemetry_task,
subscribe_manager,
procedure_manager,
wal_options_allocator: self.wal_options_allocator.clone(),
state: self.state.clone(),
leader_cached_kv_backend: leader_cached_kv_backend.clone(),
region_supervisor_ticker,
leadership_change_notifier,
};
let _handle = common_runtime::spawn_global(async move {
loop {
@@ -429,12 +427,12 @@ impl Metasrv {
info!("Leader's cache has bean cleared on leader change: {msg}");
match msg {
LeaderChangeMessage::Elected(_) => {
state_handler.on_become_leader().await;
state_handler.on_leader_start().await;
}
LeaderChangeMessage::StepDown(leader) => {
error!("Leader :{:?} step down", leader);
state_handler.on_become_follower().await;
state_handler.on_leader_stop().await;
}
}
}
@@ -448,7 +446,7 @@ impl Metasrv {
}
}
state_handler.on_become_follower().await;
state_handler.on_leader_stop().await;
});
// Register candidate and keep lease in background.

View File

@@ -363,7 +363,7 @@ impl MetasrvBuilder {
.with_region_failure_handler(region_failover_handler)
.with_region_lease_handler(Some(region_lease_handler))
.add_default_handlers()
.build()
.build()?
}
};

View File

@@ -12,7 +12,37 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use async_trait::async_trait;
use common_meta::error::{self, Result};
use common_meta::leadership_notifier::LeadershipChangeListener;
use common_procedure::ProcedureManagerRef;
use snafu::ResultExt;
pub mod region_migration;
#[cfg(test)]
mod tests;
pub mod utils;
#[derive(Clone)]
pub struct ProcedureManagerListenerAdapter(pub ProcedureManagerRef);
#[async_trait]
impl LeadershipChangeListener for ProcedureManagerListenerAdapter {
fn name(&self) -> &str {
"ProcedureManager"
}
async fn on_leader_start(&self) -> Result<()> {
self.0
.start()
.await
.context(error::StartProcedureManagerSnafu)
}
async fn on_leader_stop(&self) -> Result<()> {
self.0
.stop()
.await
.context(error::StopProcedureManagerSnafu)
}
}

View File

@@ -43,8 +43,10 @@ use common_procedure::error::{
Error as ProcedureError, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu,
};
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, Status, StringKey};
pub use manager::RegionMigrationProcedureTask;
use manager::{RegionMigrationProcedureGuard, RegionMigrationProcedureTracker};
use manager::RegionMigrationProcedureGuard;
pub use manager::{
RegionMigrationManagerRef, RegionMigrationProcedureTask, RegionMigrationProcedureTracker,
};
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt};
use store_api::storage::RegionId;

View File

@@ -44,7 +44,7 @@ pub struct RegionMigrationManager {
}
#[derive(Default, Clone)]
pub(crate) struct RegionMigrationProcedureTracker {
pub struct RegionMigrationProcedureTracker {
running_procedures: Arc<RwLock<HashMap<RegionId, RegionMigrationProcedureTask>>>,
}
@@ -149,7 +149,7 @@ impl RegionMigrationManager {
}
/// Returns the [`RegionMigrationProcedureTracker`].
pub(crate) fn tracker(&self) -> &RegionMigrationProcedureTracker {
pub fn tracker(&self) -> &RegionMigrationProcedureTracker {
&self.tracker
}

View File

@@ -16,10 +16,12 @@ use std::fmt::Debug;
use std::sync::{Arc, Mutex};
use std::time::Duration;
use async_trait::async_trait;
use common_meta::datanode::Stat;
use common_meta::ddl::{DetectingRegion, RegionFailureDetectorController};
use common_meta::key::MAINTENANCE_KEY;
use common_meta::kv_backend::KvBackendRef;
use common_meta::leadership_notifier::LeadershipChangeListener;
use common_meta::peer::PeerLookupServiceRef;
use common_meta::{ClusterId, DatanodeId};
use common_runtime::JoinHandle;
@@ -129,6 +131,23 @@ pub struct RegionSupervisorTicker {
sender: Sender<Event>,
}
#[async_trait]
impl LeadershipChangeListener for RegionSupervisorTicker {
fn name(&self) -> &'static str {
"RegionSupervisorTicker"
}
async fn on_leader_start(&self) -> common_meta::error::Result<()> {
self.start();
Ok(())
}
async fn on_leader_stop(&self) -> common_meta::error::Result<()> {
self.stop();
Ok(())
}
}
impl RegionSupervisorTicker {
pub(crate) fn new(tick_interval: Duration, sender: Sender<Event>) -> Self {
Self {
@@ -223,7 +242,7 @@ impl RegionFailureDetectorController for RegionFailureDetectorControl {
.send(Event::RegisterFailureDetectors(detecting_regions))
.await
{
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
@@ -233,7 +252,7 @@ impl RegionFailureDetectorController for RegionFailureDetectorControl {
.send(Event::DeregisterFailureDetectors(detecting_regions))
.await
{
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
}
@@ -251,13 +270,13 @@ impl HeartbeatAcceptor {
/// Accepts heartbeats from datanodes.
pub(crate) async fn accept(&self, heartbeat: DatanodeHeartbeat) {
if let Err(err) = self.sender.send(Event::HeartbeatArrived(heartbeat)).await {
error!(err; "RegionSupervisor is stop receiving heartbeat");
error!(err; "RegionSupervisor has stop receiving heartbeat.");
}
}
}
impl RegionSupervisor {
/// Returns a a mpsc channel with a buffer capacity of 1024 for sending and receiving `Event` messages.
/// Returns an mpsc channel with a buffer capacity of 1024 for sending and receiving `Event` messages.
pub(crate) fn channel() -> (Sender<Event>, Receiver<Event>) {
tokio::sync::mpsc::channel(1024)
}

View File

@@ -64,15 +64,19 @@ impl MetricEngineInner {
/// Return the physical region id behind this logical region
async fn alter_logical_region(
&self,
region_id: RegionId,
logical_region_id: RegionId,
request: RegionAlterRequest,
) -> Result<RegionId> {
let physical_region_id = {
let state = &self.state.read().unwrap();
state.get_physical_region_id(region_id).with_context(|| {
error!("Trying to alter an nonexistent region {region_id}");
LogicalRegionNotFoundSnafu { region_id }
})?
state
.get_physical_region_id(logical_region_id)
.with_context(|| {
error!("Trying to alter an nonexistent region {logical_region_id}");
LogicalRegionNotFoundSnafu {
region_id: logical_region_id,
}
})?
};
// only handle adding column
@@ -87,7 +91,7 @@ impl MetricEngineInner {
.metadata_region
.column_semantic_type(
metadata_region_id,
region_id,
logical_region_id,
&col.column_metadata.column_schema.name,
)
.await?
@@ -102,7 +106,7 @@ impl MetricEngineInner {
self.add_columns_to_physical_data_region(
data_region_id,
metadata_region_id,
region_id,
logical_region_id,
columns_to_add,
)
.await?;
@@ -110,10 +114,16 @@ impl MetricEngineInner {
// register columns to logical region
for col in columns {
self.metadata_region
.add_column(metadata_region_id, region_id, &col.column_metadata)
.add_column(metadata_region_id, logical_region_id, &col.column_metadata)
.await?;
}
// Invalidate the logical column cache
self.state
.write()
.unwrap()
.invalid_logical_column_cache(logical_region_id);
Ok(physical_region_id)
}

View File

@@ -169,11 +169,11 @@ impl MetricEngineInner {
) -> Result<Vec<usize>> {
// project on logical columns
let all_logical_columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.load_logical_column_names(physical_region_id, logical_region_id)
.await?;
let projected_logical_names = origin_projection
.iter()
.map(|i| all_logical_columns[*i].column_schema.name.clone())
.map(|i| all_logical_columns[*i].clone())
.collect::<Vec<_>>();
// generate physical projection
@@ -200,10 +200,8 @@ impl MetricEngineInner {
logical_region_id: RegionId,
) -> Result<Vec<usize>> {
let logical_columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.await?
.into_iter()
.map(|col| col.column_schema.name);
.load_logical_column_names(physical_region_id, logical_region_id)
.await?;
let mut projection = Vec::with_capacity(logical_columns.len());
let data_region_id = utils::to_data_region_id(physical_region_id);
let physical_metadata = self

View File

@@ -23,13 +23,25 @@ use crate::error::Result;
impl MetricEngineInner {
/// Load column metadata of a logical region.
///
/// The return value is ordered on [ColumnId].
/// The return value is ordered alphabetically by column name.
pub async fn load_logical_columns(
&self,
physical_region_id: RegionId,
logical_region_id: RegionId,
) -> Result<Vec<ColumnMetadata>> {
// load logical and physical columns, and intersect them to get logical column metadata
// First try to load from state cache
if let Some(columns) = self
.state
.read()
.unwrap()
.logical_columns()
.get(&logical_region_id)
{
return Ok(columns.clone());
}
// Else load from metadata region and update the cache.
// Load logical and physical columns, and intersect them to get logical column metadata.
let mut logical_column_metadata = self
.metadata_region
.logical_columns(physical_region_id, logical_region_id)
@@ -37,11 +49,48 @@ impl MetricEngineInner {
.into_iter()
.map(|(_, column_metadata)| column_metadata)
.collect::<Vec<_>>();
// sort columns on column id to ensure the order
// Sort columns by name to ensure a deterministic order
logical_column_metadata
.sort_unstable_by(|c1, c2| c1.column_schema.name.cmp(&c2.column_schema.name));
// Update cache
self.state
.write()
.unwrap()
.add_logical_columns(logical_region_id, logical_column_metadata.clone());
Ok(logical_column_metadata)
}
/// Load logical column names of a logical region.
///
/// The return value is ordered alphabetically by column name.
pub async fn load_logical_column_names(
&self,
physical_region_id: RegionId,
logical_region_id: RegionId,
) -> Result<Vec<String>> {
// First try to load from state cache
if let Some(columns) = self
.state
.read()
.unwrap()
.logical_columns()
.get(&logical_region_id)
{
return Ok(columns
.iter()
.map(|c| c.column_schema.name.clone())
.collect());
}
// Else load from metadata region
let columns = self
.load_logical_columns(physical_region_id, logical_region_id)
.await?
.into_iter()
.map(|c| c.column_schema.name)
.collect::<Vec<_>>();
Ok(columns)
}
}

View File

@@ -17,6 +17,7 @@
use std::collections::{HashMap, HashSet};
use snafu::OptionExt;
use store_api::metadata::ColumnMetadata;
use store_api::storage::RegionId;
use crate::error::{PhysicalRegionNotFoundSnafu, Result};
@@ -35,6 +36,10 @@ pub(crate) struct MetricEngineState {
/// Cache for the columns of physical regions.
/// The region id in key is the data region id.
physical_columns: HashMap<RegionId, HashSet<String>>,
/// Cache for the column metadata of logical regions.
/// The column order is the same as the order in the metadata, which is
/// sorted alphabetically by column name.
logical_columns: HashMap<RegionId, Vec<ColumnMetadata>>,
}
impl MetricEngineState {
@@ -80,6 +85,21 @@ impl MetricEngineState {
.insert(logical_region_id, physical_region_id);
}
/// Add and reorder logical columns.
///
/// Caller should make sure:
/// 1. there are no duplicate columns
/// 2. the column order is the same as the order in the metadata, which is
///    sorted alphabetically by column name.
pub fn add_logical_columns(
&mut self,
logical_region_id: RegionId,
new_columns: impl IntoIterator<Item = ColumnMetadata>,
) {
let columns = self.logical_columns.entry(logical_region_id).or_default();
columns.extend(new_columns);
}
pub fn get_physical_region_id(&self, logical_region_id: RegionId) -> Option<RegionId> {
self.logical_regions.get(&logical_region_id).copied()
}
@@ -88,6 +108,10 @@ impl MetricEngineState {
&self.physical_columns
}
pub fn logical_columns(&self) -> &HashMap<RegionId, Vec<ColumnMetadata>> {
&self.logical_columns
}
pub fn physical_regions(&self) -> &HashMap<RegionId, HashSet<RegionId>> {
&self.physical_regions
}
@@ -129,9 +153,15 @@ impl MetricEngineState {
.unwrap() // Safety: physical_region_id is got from physical_regions
.remove(&logical_region_id);
self.logical_columns.remove(&logical_region_id);
Ok(())
}
pub fn invalid_logical_column_cache(&mut self, logical_region_id: RegionId) {
self.logical_columns.remove(&logical_region_id);
}
pub fn is_logical_region_exist(&self, logical_region_id: RegionId) -> bool {
self.logical_regions().contains_key(&logical_region_id)
}
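// Hedged sketch of the intended cache flow (assuming the state can be
// default-constructed, which this hunk does not show); region ids and the column
// source below are illustrative:
//
// let mut state = MetricEngineState::default();
// let logical = RegionId::new(1024, 1);
// state.add_logical_columns(logical, columns_loaded_from_metadata_region);
// assert!(state.logical_columns().contains_key(&logical));
// state.invalid_logical_column_cache(logical); // e.g. after an ALTER adds columns
// assert!(!state.logical_columns().contains_key(&logical));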

View File

@@ -59,7 +59,7 @@ impl StatementExecutor {
.map(|arg| {
let FunctionArg::Unnamed(FunctionArgExpr::Expr(Expr::Value(value))) = arg else {
return error::BuildAdminFunctionArgsSnafu {
msg: "unsupported function arg {arg}",
msg: format!("unsupported function arg {arg}"),
}
.fail();
};
@@ -200,7 +200,7 @@ fn values_to_vectors_by_valid_types(
}
error::BuildAdminFunctionArgsSnafu {
msg: "failed to cast {value}",
msg: format!("failed to cast {value}"),
}
.fail()
})

View File

@@ -40,6 +40,7 @@ enum_dispatch = "0.3"
futures.workspace = true
greptime-proto.workspace = true
itertools.workspace = true
jsonb.workspace = true
lazy_static.workspace = true
moka = { workspace = true, features = ["sync"] }
once_cell.workspace = true

View File

@@ -40,7 +40,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("{processor} processor: missing field: {field}"))]
#[snafu(display("Processor {processor}: missing field: {field}"))]
ProcessorMissingField {
processor: String,
field: String,
@@ -48,7 +48,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("{processor} processor: expect string value, but got {v:?}"))]
#[snafu(display("Processor {processor}: expect string value, but got {v:?}"))]
ProcessorExpectString {
processor: String,
v: crate::etl::Value,
@@ -56,7 +56,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("{processor} processor: unsupported value {val}"))]
#[snafu(display("Processor {processor}: unsupported value {val}"))]
ProcessorUnsupportedValue {
processor: &'static str,
val: String,
@@ -64,13 +64,13 @@ pub enum Error {
location: Location,
},
#[snafu(display("processor key must be a string"))]
#[snafu(display("Processor key must be a string"))]
ProcessorKeyMustBeString {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("{kind} processor: failed to parse {value}"))]
#[snafu(display("Processor {kind}: failed to parse {value}"))]
ProcessorFailedToParseString {
kind: String,
value: String,
@@ -78,13 +78,13 @@ pub enum Error {
location: Location,
},
#[snafu(display("processor must have a string key"))]
#[snafu(display("Processor must have a string key"))]
ProcessorMustHaveStringKey {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("unsupported {processor} processor"))]
#[snafu(display("Unsupported {processor} processor"))]
UnsupportedProcessor {
processor: String,
#[snafu(implicit)]
@@ -108,7 +108,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("failed to parse {key} as int: {value}"))]
#[snafu(display("Failed to parse {key} as int: {value}"))]
FailedToParseIntKey {
key: String,
value: String,
@@ -126,7 +126,7 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to parse {key} as float: {value}"))]
#[snafu(display("Failed to parse {key} as float: {value}"))]
FailedToParseFloatKey {
key: String,
value: String,
@@ -136,7 +136,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("{kind} processor.{key} not found in intermediate keys"))]
#[snafu(display("Processor {kind}: {key} not found in intermediate keys"))]
IntermediateKeyIndex {
kind: String,
key: String,
@@ -144,41 +144,41 @@ pub enum Error {
location: Location,
},
#[snafu(display("{k} missing value in {s}"))]
#[snafu(display("Cmcd {k} missing value in {s}"))]
CmcdMissingValue {
k: String,
s: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("{part} missing key in {s}"))]
#[snafu(display("Part: {part} missing key in {s}"))]
CmcdMissingKey {
part: String,
s: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("key must be a string, but got {k:?}"))]
#[snafu(display("Key must be a string, but got {k:?}"))]
KeyMustBeString {
k: yaml_rust::Yaml,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("csv read error"))]
#[snafu(display("Csv read error"))]
CsvRead {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
error: csv::Error,
},
#[snafu(display("expected at least one record from csv format, but got none"))]
#[snafu(display("Expected at least one record from csv format, but got none"))]
CsvNoRecord {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("'{separator}' must be a single character, but got '{value}'"))]
#[snafu(display("Separator '{separator}' must be a single character, but got '{value}'"))]
CsvSeparatorName {
separator: &'static str,
value: String,
@@ -186,7 +186,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("'{quote}' must be a single character, but got '{value}'"))]
#[snafu(display("Quote '{quote}' must be a single character, but got '{value}'"))]
CsvQuoteName {
quote: &'static str,
value: String,
@@ -212,19 +212,19 @@ pub enum Error {
location: Location,
},
#[snafu(display("failed to get local timezone"))]
#[snafu(display("Failed to get local timezone"))]
DateFailedToGetLocalTimezone {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to get timestamp"))]
#[snafu(display("Failed to get timestamp"))]
DateFailedToGetTimestamp {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("{processor} processor: invalid format {s}"))]
#[snafu(display("Processor {processor}: invalid format {s}"))]
DateInvalidFormat {
s: String,
processor: String,
@@ -245,20 +245,20 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("'{split}' exceeds the input"))]
#[snafu(display("Split: '{split}' exceeds the input"))]
DissectSplitExceedsInput {
split: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("'{split}' does not match the input '{input}'"))]
#[snafu(display("Split: '{split}' does not match the input '{input}'"))]
DissectSplitNotMatchInput {
split: String,
input: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("consecutive names are not allowed: '{name1}' '{name2}'"))]
#[snafu(display("Consecutive names are not allowed: '{name1}' '{name2}'"))]
DissectConsecutiveNames {
name1: String,
name2: String,
@@ -270,7 +270,7 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("'{m}' modifier already set, but found {modifier}"))]
#[snafu(display("Modifier '{m}' already set, but found {modifier}"))]
DissectModifierAlreadySet {
m: String,
modifier: String,
@@ -304,23 +304,23 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("invalid resolution: {resolution}"))]
#[snafu(display("Invalid resolution: {resolution}"))]
EpochInvalidResolution {
resolution: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("pattern is required"))]
#[snafu(display("Pattern is required"))]
GsubPatternRequired {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("replacement is required"))]
#[snafu(display("Replacement is required"))]
GsubReplacementRequired {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("invalid regex pattern: {pattern}"))]
#[snafu(display("Invalid regex pattern: {pattern}"))]
Regex {
#[snafu(source)]
error: regex::Error,
@@ -328,72 +328,72 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("separator is required"))]
#[snafu(display("Separator is required"))]
JoinSeparatorRequired {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("invalid method: {method}"))]
#[snafu(display("Invalid method: {method}"))]
LetterInvalidMethod {
method: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("no named group found in regex {origin}"))]
#[snafu(display("No named group found in regex {origin}"))]
RegexNamedGroupNotFound {
origin: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("no valid field found in {processor} processor"))]
#[snafu(display("No valid field found in {processor} processor"))]
RegexNoValidField {
processor: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("no valid pattern found in {processor} processor"))]
#[snafu(display("No valid pattern found in {processor} processor"))]
RegexNoValidPattern {
processor: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("invalid method: {s}"))]
#[snafu(display("Invalid method: {s}"))]
UrlEncodingInvalidMethod {
s: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("url decoding error"))]
#[snafu(display("Url decoding error"))]
UrlEncodingDecode {
#[snafu(source)]
error: std::string::FromUtf8Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("invalid transform on_failure value: {value}"))]
#[snafu(display("Invalid transform on_failure value: {value}"))]
TransformOnFailureInvalidValue {
value: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("transform element must be a map"))]
#[snafu(display("Transform element must be a map"))]
TransformElementMustBeMap {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("transform {fields:?} type MUST BE set before default {default}"))]
#[snafu(display("Transform {fields:?} type MUST BE set before default {default}"))]
TransformTypeMustBeSet {
fields: String,
default: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("transform cannot be empty"))]
#[snafu(display("Transform cannot be empty"))]
TransformEmpty {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("column name must be unique, but got duplicated: {duplicates}"))]
#[snafu(display("Column name must be unique, but got duplicated: {duplicates}"))]
TransformColumnNameMustBeUnique {
duplicates: String,
#[snafu(implicit)]
@@ -407,7 +407,7 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("transform must have exactly one field specified as timestamp Index, but got {count}: {columns}"))]
#[snafu(display("Transform must have exactly one field specified as timestamp Index, but got {count}: {columns}"))]
TransformTimestampIndexCount {
count: usize,
columns: String,
@@ -425,22 +425,33 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("{ty} value not supported for Epoch"))]
#[snafu(display("Type: {ty} value not supported for Epoch"))]
CoerceUnsupportedEpochType {
ty: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to coerce string value '{s}' to type '{ty}'"))]
#[snafu(display("Failed to coerce string value '{s}' to type '{ty}'"))]
CoerceStringToType {
s: String,
ty: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to coerce complex value, not supported"))]
CoerceComplexType {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to coerce value: {msg}"))]
CoerceIncompatibleTypes {
msg: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display(
"invalid resolution: '{resolution}'. Available resolutions: {valid_resolution}"
"Invalid resolution: '{resolution}'. Available resolutions: {valid_resolution}"
))]
ValueInvalidResolution {
resolution: String,
@@ -449,14 +460,14 @@ pub enum Error {
location: Location,
},
#[snafu(display("failed to parse type: '{t}'"))]
#[snafu(display("Failed to parse type: '{t}'"))]
ValueParseType {
t: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("failed to parse {ty}: {v}"))]
#[snafu(display("Failed to parse {ty}: {v}"))]
ValueParseInt {
ty: String,
v: String,
@@ -466,7 +477,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("failed to parse {ty}: {v}"))]
#[snafu(display("Failed to parse {ty}: {v}"))]
ValueParseFloat {
ty: String,
v: String,
@@ -476,7 +487,7 @@ pub enum Error {
location: Location,
},
#[snafu(display("failed to parse {ty}: {v}"))]
#[snafu(display("Failed to parse {ty}: {v}"))]
ValueParseBoolean {
ty: String,
v: String,
@@ -485,19 +496,19 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("default value not unsupported for type {value}"))]
#[snafu(display("Default value not supported for type {value}"))]
ValueDefaultValueUnsupported {
value: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("unsupported number type: {value}"))]
#[snafu(display("Unsupported number type: {value}"))]
ValueUnsupportedNumberType {
value: serde_json::Number,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("unsupported yaml type: {value:?}"))]
#[snafu(display("Unsupported yaml type: {value:?}"))]
ValueUnsupportedYamlType {
value: yaml_rust::Yaml,
#[snafu(implicit)]
@@ -531,12 +542,26 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("unsupported index type: {value}"))]
#[snafu(display("Unsupported index type: {value}"))]
UnsupportedIndexType {
value: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Unsupported number type: {value:?}"))]
UnsupportedNumberType {
value: serde_json::Number,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Column datatype mismatch. For column: {column}, expected datatype: {expected}, actual datatype: {actual}"))]
IdentifyPipelineColumnTypeMismatch {
column: String,
expected: String,
actual: String,
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T> = std::result::Result<T, Error>;
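The changes in this file only capitalize the user-facing error strings; the mechanics stay the same. As a minimal, self-contained sketch (an illustrative variant, without the `location` fields the real enum carries) of how such a snafu display string surfaces through `Display`:

use snafu::Snafu;

#[derive(Debug, Snafu)]
enum DemoError {
    // Same shorthand field interpolation as the variants above.
    #[snafu(display("Failed to parse {key} as int: {value}"))]
    FailedToParseIntKey { key: String, value: String },
}

fn main() {
    let err = DemoError::FailedToParseIntKey {
        key: "count".to_string(),
        value: "abc".to_string(),
    };
    // Prints: Failed to parse count as int: abc
    println!("{err}");
}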

View File

@@ -13,3 +13,4 @@
// limitations under the License.
pub mod greptime;
pub use greptime::identity_pipeline;

View File

@@ -16,13 +16,20 @@ pub mod coerce;
use std::collections::HashSet;
use ahash::HashMap;
use api::helper::proto_value_type;
use api::v1::column_data_type_extension::TypeExt;
use api::v1::value::ValueData;
use api::v1::{ColumnDataType, ColumnDataTypeExtension, JsonTypeExtension, SemanticType};
use coerce::{coerce_columns, coerce_value};
use greptime_proto::v1::{ColumnSchema, Row, Rows, Value as GreptimeValue};
use itertools::Itertools;
use serde_json::{Map, Number};
use crate::etl::error::{
Result, TransformColumnNameMustBeUniqueSnafu, TransformEmptySnafu,
TransformMultipleTimestampIndexSnafu, TransformTimestampIndexCountSnafu,
IdentifyPipelineColumnTypeMismatchSnafu, Result, TransformColumnNameMustBeUniqueSnafu,
TransformEmptySnafu, TransformMultipleTimestampIndexSnafu, TransformTimestampIndexCountSnafu,
UnsupportedNumberTypeSnafu,
};
use crate::etl::field::{InputFieldInfo, OneInputOneOutputField};
use crate::etl::transform::index::Index;
@@ -120,6 +127,7 @@ impl Transformer for GreptimeTransformer {
if let Some(idx) = transform.index {
if idx == Index::Time {
match transform.real_fields.len() {
// Safety: unwrap is fine here because we have checked the length of real_fields
1 => timestamp_columns
.push(transform.real_fields.first().unwrap().input_name()),
_ => {
@@ -194,3 +202,304 @@ impl Transformer for GreptimeTransformer {
&mut self.transforms
}
}
/// Records the schema information resolved so far, along with a lookup index of column names.
/// It changes as the user's input JSON is traversed and ends up recording a superset of the
/// schemas of all input records.
#[derive(Debug, Default)]
struct SchemaInfo {
/// schema info
schema: Vec<ColumnSchema>,
/// index of the column name
index: HashMap<String, usize>,
}
fn resolve_schema(
index: Option<usize>,
value_data: ValueData,
column_schema: ColumnSchema,
row: &mut Vec<GreptimeValue>,
schema_info: &mut SchemaInfo,
) -> Result<()> {
if let Some(index) = index {
let api_value = GreptimeValue {
value_data: Some(value_data),
};
// Safety unwrap is fine here because api_value is always valid
let value_column_data_type = proto_value_type(&api_value).unwrap();
// Safety unwrap is fine here because index is always valid
let schema_column_data_type = schema_info.schema.get(index).unwrap().datatype();
if value_column_data_type != schema_column_data_type {
IdentifyPipelineColumnTypeMismatchSnafu {
column: column_schema.column_name,
expected: schema_column_data_type.as_str_name(),
actual: value_column_data_type.as_str_name(),
}
.fail()
} else {
row[index] = api_value;
Ok(())
}
} else {
let key = column_schema.column_name.clone();
schema_info.schema.push(column_schema);
schema_info.index.insert(key, schema_info.schema.len() - 1);
let api_value = GreptimeValue {
value_data: Some(value_data),
};
row.push(api_value);
Ok(())
}
}
fn resolve_number_schema(
n: Number,
column_name: String,
index: Option<usize>,
row: &mut Vec<GreptimeValue>,
schema_info: &mut SchemaInfo,
) -> Result<()> {
let (value, datatype, semantic_type) = if n.is_i64() {
(
ValueData::I64Value(n.as_i64().unwrap()),
ColumnDataType::Int64 as i32,
SemanticType::Field as i32,
)
} else if n.is_u64() {
(
ValueData::U64Value(n.as_u64().unwrap()),
ColumnDataType::Uint64 as i32,
SemanticType::Field as i32,
)
} else if n.is_f64() {
(
ValueData::F64Value(n.as_f64().unwrap()),
ColumnDataType::Float64 as i32,
SemanticType::Field as i32,
)
} else {
return UnsupportedNumberTypeSnafu { value: n }.fail();
};
resolve_schema(
index,
value,
ColumnSchema {
column_name,
datatype,
semantic_type,
datatype_extension: None,
options: None,
},
row,
schema_info,
)
}
fn json_value_to_row(
schema_info: &mut SchemaInfo,
map: Map<String, serde_json::Value>,
) -> Result<Row> {
let mut row: Vec<GreptimeValue> = Vec::with_capacity(schema_info.schema.len());
for _ in 0..schema_info.schema.len() {
row.push(GreptimeValue { value_data: None });
}
for (column_name, value) in map {
if column_name == DEFAULT_GREPTIME_TIMESTAMP_COLUMN {
continue;
}
let index = schema_info.index.get(&column_name).copied();
match value {
serde_json::Value::Null => {
// do nothing
}
serde_json::Value::String(s) => {
resolve_schema(
index,
ValueData::StringValue(s),
ColumnSchema {
column_name,
datatype: ColumnDataType::String as i32,
semantic_type: SemanticType::Field as i32,
datatype_extension: None,
options: None,
},
&mut row,
schema_info,
)?;
}
serde_json::Value::Bool(b) => {
resolve_schema(
index,
ValueData::BoolValue(b),
ColumnSchema {
column_name,
datatype: ColumnDataType::Boolean as i32,
semantic_type: SemanticType::Field as i32,
datatype_extension: None,
options: None,
},
&mut row,
schema_info,
)?;
}
serde_json::Value::Number(n) => {
resolve_number_schema(n, column_name, index, &mut row, schema_info)?;
}
serde_json::Value::Array(_) | serde_json::Value::Object(_) => {
resolve_schema(
index,
ValueData::BinaryValue(jsonb::Value::from(value).to_vec()),
ColumnSchema {
column_name,
datatype: ColumnDataType::Binary as i32,
semantic_type: SemanticType::Field as i32,
datatype_extension: Some(ColumnDataTypeExtension {
type_ext: Some(TypeExt::JsonType(JsonTypeExtension::JsonBinary.into())),
}),
options: None,
},
&mut row,
schema_info,
)?;
}
}
}
Ok(Row { values: row })
}
/// Identity pipeline for Greptime
/// This pipeline converts the input JSON array to Greptime Rows:
/// 1. It adds a default timestamp column to the schema
/// 2. It does not resolve NULL values
/// 3. It assumes the JSON format is fixed
/// 4. It returns an error if the datatype of the same column mismatches across records
/// 5. It analyzes the schema of each JSON record and merges them to get the final schema
pub fn identity_pipeline(array: Vec<serde_json::Value>) -> Result<Rows> {
let mut rows = Vec::with_capacity(array.len());
let mut schema = SchemaInfo::default();
for value in array {
if let serde_json::Value::Object(map) = value {
let row = json_value_to_row(&mut schema, map)?;
rows.push(row);
}
}
let greptime_timestamp_schema = ColumnSchema {
column_name: DEFAULT_GREPTIME_TIMESTAMP_COLUMN.to_string(),
datatype: ColumnDataType::TimestampNanosecond as i32,
semantic_type: SemanticType::Timestamp as i32,
datatype_extension: None,
options: None,
};
let ns = chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0);
let ts = GreptimeValue {
value_data: Some(ValueData::TimestampNanosecondValue(ns)),
};
let column_count = schema.schema.len();
for row in rows.iter_mut() {
let diff = column_count - row.values.len();
for _ in 0..diff {
row.values.push(GreptimeValue { value_data: None });
}
row.values.push(ts.clone());
}
schema.schema.push(greptime_timestamp_schema);
Ok(Rows {
schema: schema.schema,
rows,
})
}
#[cfg(test)]
mod tests {
use crate::identity_pipeline;
#[test]
fn test_identify_pipeline() {
{
let array = vec![
serde_json::json!({
"woshinull": null,
"name": "Alice",
"age": 20,
"is_student": true,
"score": 99.5,
"hobbies": "reading",
"address": "Beijing",
}),
serde_json::json!({
"name": "Bob",
"age": 21,
"is_student": false,
"score": "88.5",
"hobbies": "swimming",
"address": "Shanghai",
"gaga": "gaga"
}),
];
let rows = identity_pipeline(array);
assert!(rows.is_err());
assert_eq!(
rows.err().unwrap().to_string(),
"Column datatype mismatch. For column: score, expected datatype: FLOAT64, actual datatype: STRING".to_string(),
);
}
{
let array = vec![
serde_json::json!({
"woshinull": null,
"name": "Alice",
"age": 20,
"is_student": true,
"score": 99.5,
"hobbies": "reading",
"address": "Beijing",
}),
serde_json::json!({
"name": "Bob",
"age": 21,
"is_student": false,
"score": 88,
"hobbies": "swimming",
"address": "Shanghai",
"gaga": "gaga"
}),
];
let rows = identity_pipeline(array);
assert!(rows.is_err());
assert_eq!(
rows.err().unwrap().to_string(),
"Column datatype mismatch. For column: score, expected datatype: FLOAT64, actual datatype: INT64".to_string(),
);
}
{
let array = vec![
serde_json::json!({
"woshinull": null,
"name": "Alice",
"age": 20,
"is_student": true,
"score": 99.5,
"hobbies": "reading",
"address": "Beijing",
}),
serde_json::json!({
"name": "Bob",
"age": 21,
"is_student": false,
"score": 88.5,
"hobbies": "swimming",
"address": "Shanghai",
"gaga": "gaga"
}),
];
let rows = identity_pipeline(array);
assert!(rows.is_ok());
let rows = rows.unwrap();
assert_eq!(rows.schema.len(), 8);
assert_eq!(rows.rows.len(), 2);
assert_eq!(8, rows.rows[0].values.len());
assert_eq!(8, rows.rows[1].values.len());
}
}
}
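For the last (successful) case above, the intended shape of the merged output is roughly the sketch below. This is an illustration of the behavior implemented in `identity_pipeline`, not output captured from a run: null-valued input keys such as "woshinull" never become columns, rows missing a later-discovered column are back-filled with nulls, and a `greptime_timestamp` column holding the ingestion time is appended last (column order here follows serde_json's default sorted map iteration and should be treated as illustrative):

schema: address, age, hobbies, is_student, name, score, gaga, greptime_timestamp
row 0:  "Beijing",  20, "reading",  true,  "Alice", 99.5, NULL,   <ingest time>
row 1:  "Shanghai", 21, "swimming", false, "Bob",   88.5, "gaga", <ingest time>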

View File

@@ -20,7 +20,8 @@ use greptime_proto::v1::{ColumnDataType, ColumnSchema, SemanticType};
use snafu::ResultExt;
use crate::etl::error::{
CoerceStringToTypeSnafu, CoerceUnsupportedEpochTypeSnafu, CoerceUnsupportedNullTypeSnafu,
CoerceComplexTypeSnafu, CoerceIncompatibleTypesSnafu, CoerceStringToTypeSnafu,
CoerceUnsupportedEpochTypeSnafu, CoerceUnsupportedNullTypeSnafu,
CoerceUnsupportedNullTypeToSnafu, ColumnOptionsSnafu, Error, Result,
};
use crate::etl::transform::index::Index;
@@ -61,8 +62,7 @@ impl TryFrom<Value> for ValueData {
}
Value::Timestamp(Timestamp::Second(s)) => Ok(ValueData::TimestampSecondValue(s)),
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => CoerceComplexTypeSnafu.fail(),
}
}
}
@@ -134,8 +134,7 @@ fn coerce_type(transform: &Transform) -> Result<ColumnDataType> {
Value::Timestamp(Timestamp::Millisecond(_)) => Ok(ColumnDataType::TimestampMillisecond),
Value::Timestamp(Timestamp::Second(_)) => Ok(ColumnDataType::TimestampSecond),
Value::Array(_) => unimplemented!("Array"),
Value::Map(_) => unimplemented!("Object"),
Value::Array(_) | Value::Map(_) => CoerceComplexTypeSnafu.fail(),
Value::Null => CoerceUnsupportedNullTypeToSnafu {
ty: transform.type_.to_str_type(),
@@ -176,19 +175,28 @@ pub(crate) fn coerce_value(val: &Value, transform: &Transform) -> Result<Option<
Value::Boolean(b) => coerce_bool_value(*b, transform),
Value::String(s) => coerce_string_value(s, transform),
Value::Timestamp(Timestamp::Nanosecond(ns)) => {
Ok(Some(ValueData::TimestampNanosecondValue(*ns)))
}
Value::Timestamp(Timestamp::Microsecond(us)) => {
Ok(Some(ValueData::TimestampMicrosecondValue(*us)))
}
Value::Timestamp(Timestamp::Millisecond(ms)) => {
Ok(Some(ValueData::TimestampMillisecondValue(*ms)))
}
Value::Timestamp(Timestamp::Second(s)) => Ok(Some(ValueData::TimestampSecondValue(*s))),
Value::Timestamp(input_timestamp) => match &transform.type_ {
Value::Timestamp(target_timestamp) => match target_timestamp {
Timestamp::Nanosecond(_) => Ok(Some(ValueData::TimestampNanosecondValue(
input_timestamp.timestamp_nanos(),
))),
Timestamp::Microsecond(_) => Ok(Some(ValueData::TimestampMicrosecondValue(
input_timestamp.timestamp_micros(),
))),
Timestamp::Millisecond(_) => Ok(Some(ValueData::TimestampMillisecondValue(
input_timestamp.timestamp_millis(),
))),
Timestamp::Second(_) => Ok(Some(ValueData::TimestampSecondValue(
input_timestamp.timestamp(),
))),
},
_ => CoerceIncompatibleTypesSnafu {
msg: "Timestamp can only be coerced to another timestamp",
}
.fail(),
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => CoerceComplexTypeSnafu.fail(),
}
}
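A quick sanity check of the resolution conversion this introduces, written as a standalone sketch rather than a call into `coerce_value` (constructing a `Transform` would need more surrounding context); the numbers match the updated epoch/timestamp tests later in this diff:

fn main() {
    // Seconds -> milliseconds: 1722580862 s becomes 1722580862000 ms.
    assert_eq!(1_722_580_862_i64 * 1_000, 1_722_580_862_000);
    // Nanoseconds -> milliseconds: 1722583122284583936 ns becomes 1722583122284 ms.
    assert_eq!(1_722_583_122_284_583_936_i64 / 1_000_000, 1_722_583_122_284);
}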
@@ -220,8 +228,7 @@ fn coerce_bool_value(b: bool, transform: &Transform) -> Result<Option<ValueData>
}
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => return CoerceComplexTypeSnafu.fail(),
Value::Null => return Ok(None),
};
@@ -257,8 +264,7 @@ fn coerce_i64_value(n: i64, transform: &Transform) -> Result<Option<ValueData>>
}
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => return CoerceComplexTypeSnafu.fail(),
Value::Null => return Ok(None),
};
@@ -294,8 +300,7 @@ fn coerce_u64_value(n: u64, transform: &Transform) -> Result<Option<ValueData>>
}
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => return CoerceComplexTypeSnafu.fail(),
Value::Null => return Ok(None),
};
@@ -331,8 +336,7 @@ fn coerce_f64_value(n: f64, transform: &Transform) -> Result<Option<ValueData>>
}
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => return CoerceComplexTypeSnafu.fail(),
Value::Null => return Ok(None),
};
@@ -407,8 +411,7 @@ fn coerce_string_value(s: &String, transform: &Transform) -> Result<Option<Value
None => CoerceUnsupportedEpochTypeSnafu { ty: "String" }.fail(),
},
Value::Array(_) => unimplemented!("Array type not supported"),
Value::Map(_) => unimplemented!("Object type not supported"),
Value::Array(_) | Value::Map(_) => CoerceComplexTypeSnafu.fail(),
Value::Null => Ok(None),
}

View File

@@ -18,6 +18,7 @@ mod metrics;
pub use etl::error::Result;
pub use etl::processor::Processor;
pub use etl::transform::transformer::identity_pipeline;
pub use etl::transform::{GreptimeTransformer, Transformer};
pub use etl::value::{Array, Map, Value};
pub use etl::{parse, Content, Pipeline};

View File

@@ -200,6 +200,8 @@ transform:
#[test]
fn test_default_wrong_resolution() {
// given a number, we have no way to guess its resolution,
// but we can convert the resolution during the transform phase
let test_input = r#"
{
"input_s": "1722580862",
@@ -209,28 +211,30 @@ fn test_default_wrong_resolution() {
let pipeline_yaml = r#"
processors:
- epoch:
fields:
- input_s
- input_nano
field: input_s
resolution: s
- epoch:
field: input_nano
resolution: ns
transform:
- fields:
- input_s
type: epoch, s
type: epoch, ms
- fields:
- input_nano
type: epoch, nano
type: epoch, ms
"#;
let expected_schema = vec![
common::make_column_schema(
"input_s".to_string(),
ColumnDataType::TimestampSecond,
ColumnDataType::TimestampMillisecond,
SemanticType::Field,
),
common::make_column_schema(
"input_nano".to_string(),
ColumnDataType::TimestampNanosecond,
ColumnDataType::TimestampMillisecond,
SemanticType::Field,
),
common::make_column_schema(
@@ -242,14 +246,12 @@ transform:
let output = common::parse_and_exec(test_input, pipeline_yaml);
assert_eq!(output.schema, expected_schema);
// this is actually wrong
// TODO(shuiyisong): add check for type when converting epoch
assert_eq!(
output.rows[0].values[0].value_data,
Some(ValueData::TimestampMillisecondValue(1722580862))
Some(ValueData::TimestampMillisecondValue(1722580862000))
);
assert_eq!(
output.rows[0].values[1].value_data,
Some(ValueData::TimestampMillisecondValue(1722583122284583936))
Some(ValueData::TimestampMillisecondValue(1722583122284))
);
}

View File

@@ -318,6 +318,7 @@ transform:
#[test]
fn test_timestamp_default_wrong_resolution() {
// same as test_default_wrong_resolution from epoch tests
let test_input = r#"
{
"input_s": "1722580862",
@@ -327,28 +328,30 @@ fn test_timestamp_default_wrong_resolution() {
let pipeline_yaml = r#"
processors:
- timestamp:
fields:
- input_s
- input_nano
field: input_s
resolution: s
- timestamp:
field: input_nano
resolution: ns
transform:
- fields:
- input_s
type: timestamp, s
type: timestamp, ms
- fields:
- input_nano
type: timestamp, nano
type: timestamp, ms
"#;
let expected_schema = vec![
common::make_column_schema(
"input_s".to_string(),
ColumnDataType::TimestampSecond,
ColumnDataType::TimestampMillisecond,
SemanticType::Field,
),
common::make_column_schema(
"input_nano".to_string(),
ColumnDataType::TimestampNanosecond,
ColumnDataType::TimestampMillisecond,
SemanticType::Field,
),
common::make_column_schema(
@@ -360,14 +363,12 @@ transform:
let output = common::parse_and_exec(test_input, pipeline_yaml);
assert_eq!(output.schema, expected_schema);
// this is actually wrong
// TODO(shuiyisong): add check for type when converting epoch
assert_eq!(
output.rows[0].values[0].value_data,
Some(ValueData::TimestampMillisecondValue(1722580862))
Some(ValueData::TimestampMillisecondValue(1722580862000))
);
assert_eq!(
output.rows[0].values[1].value_data,
Some(ValueData::TimestampMillisecondValue(1722583122284583936))
Some(ValueData::TimestampMillisecondValue(1722583122284))
);
}

View File

@@ -15,8 +15,9 @@
mod analyzer;
mod commutativity;
mod merge_scan;
mod merge_sort;
mod planner;
pub use analyzer::DistPlannerAnalyzer;
pub use merge_scan::{MergeScanExec, MergeScanLogicalPlan};
pub use planner::DistExtensionPlanner;
pub use planner::{DistExtensionPlanner, MergeSortExtensionPlanner};

View File

@@ -160,7 +160,6 @@ impl PlanRewriter {
{
return true;
}
match Categorizer::check_plan(plan, self.partition_cols.clone()) {
Commutativity::Commutative => {}
Commutativity::PartialCommutative => {
@@ -265,9 +264,10 @@ impl PlanRewriter {
// add merge scan as the new root
let mut node = MergeScanLogicalPlan::new(on_node, false).into_logical_plan();
// expand stages
for new_stage in self.stage.drain(..) {
node = new_stage.with_new_exprs(new_stage.expressions(), vec![node.clone()])?
node = new_stage.with_new_exprs(new_stage.expressions(), vec![node.clone()])?;
}
self.set_expanded();

View File

@@ -21,6 +21,7 @@ use promql::extension_plan::{
EmptyMetric, InstantManipulate, RangeManipulate, SeriesDivide, SeriesNormalize,
};
use crate::dist_plan::merge_sort::{merge_sort_transformer, MergeSortLogicalPlan};
use crate::dist_plan::MergeScanLogicalPlan;
#[allow(dead_code)]
@@ -68,8 +69,9 @@ impl Categorizer {
}
// sort plan needs to consider column priority
// We can implement a merge sort on partially ordered data
Commutativity::PartialCommutative
// Change Sort to MergeSort, which assumes the input streams are already sorted and hence can be more efficient.
// We should ensure the number of partitions is not smaller than the number of regions at present; otherwise this would result in incorrect output.
Commutativity::ConditionalCommutative(Some(Arc::new(merge_sort_transformer)))
}
LogicalPlan::Join(_) => Commutativity::NonCommutative,
LogicalPlan::CrossJoin(_) => Commutativity::NonCommutative,
@@ -118,7 +120,8 @@ impl Categorizer {
|| name == SeriesNormalize::name()
|| name == RangeManipulate::name()
|| name == SeriesDivide::name()
|| name == MergeScanLogicalPlan::name() =>
|| name == MergeScanLogicalPlan::name()
|| name == MergeSortLogicalPlan::name() =>
{
Commutativity::Unimplemented
}

View File

@@ -298,6 +298,14 @@ impl MergeScanExec {
pub fn sub_stage_metrics(&self) -> Vec<RecordBatchMetrics> {
self.sub_stage_metrics.lock().unwrap().clone()
}
pub fn partition_count(&self) -> usize {
self.target_partition
}
pub fn region_count(&self) -> usize {
self.regions.len()
}
}
impl ExecutionPlan for MergeScanExec {

View File

@@ -0,0 +1,124 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Merge sort logical plan for distributed query execution, roughly corresponding to the
//! `SortPreservingMergeExec` operator in datafusion
//!
use std::fmt;
use std::sync::Arc;
use datafusion_common::{DataFusionError, Result};
use datafusion_expr::{Expr, Extension, LogicalPlan, UserDefinedLogicalNodeCore};
/// MergeSort logical plan. It has the same fields as `Sort`, but indicates a merge sort,
/// which assumes each input partition is an already sorted stream and uses `SortPreservingMergeExec`
/// to merge them into a single sorted stream.
#[derive(Hash, PartialEq, Eq, Clone)]
pub struct MergeSortLogicalPlan {
pub expr: Vec<Expr>,
pub input: Arc<LogicalPlan>,
pub fetch: Option<usize>,
}
impl fmt::Debug for MergeSortLogicalPlan {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
UserDefinedLogicalNodeCore::fmt_for_explain(self, f)
}
}
impl MergeSortLogicalPlan {
pub fn new(input: Arc<LogicalPlan>, expr: Vec<Expr>, fetch: Option<usize>) -> Self {
Self { input, expr, fetch }
}
pub fn name() -> &'static str {
"MergeSort"
}
/// Create a [`LogicalPlan::Extension`] node from this merge sort plan
pub fn into_logical_plan(self) -> LogicalPlan {
LogicalPlan::Extension(Extension {
node: Arc::new(self),
})
}
/// Convert self to a [`Sort`] logical plan with the same input and expressions
pub fn into_sort(self) -> LogicalPlan {
LogicalPlan::Sort(datafusion::logical_expr::Sort {
input: self.input.clone(),
expr: self.expr,
fetch: self.fetch,
})
}
}
impl UserDefinedLogicalNodeCore for MergeSortLogicalPlan {
fn name(&self) -> &str {
Self::name()
}
// Allow optimization here
fn inputs(&self) -> Vec<&LogicalPlan> {
vec![self.input.as_ref()]
}
fn schema(&self) -> &datafusion_common::DFSchemaRef {
self.input.schema()
}
// Allow further optimization
fn expressions(&self) -> Vec<datafusion_expr::Expr> {
self.expr.clone()
}
fn fmt_for_explain(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "MergeSort: ")?;
for (i, expr_item) in self.expr.iter().enumerate() {
if i > 0 {
write!(f, ", ")?;
}
write!(f, "{expr_item}")?;
}
if let Some(a) = self.fetch {
write!(f, ", fetch={a}")?;
}
Ok(())
}
fn with_exprs_and_inputs(
&self,
exprs: Vec<datafusion::prelude::Expr>,
mut inputs: Vec<LogicalPlan>,
) -> Result<Self> {
let mut zelf = self.clone();
zelf.expr = exprs;
zelf.input = Arc::new(inputs.pop().ok_or_else(|| {
DataFusionError::Internal("Expected exactly one input with MergeSort".to_string())
})?);
Ok(zelf)
}
}
/// Turn `Sort` into `MergeSort` if possible
pub fn merge_sort_transformer(plan: &LogicalPlan) -> Option<LogicalPlan> {
if let LogicalPlan::Sort(sort) = plan {
Some(
MergeSortLogicalPlan::new(sort.input.clone(), sort.expr.clone(), sort.fetch)
.into_logical_plan(),
)
} else {
None
}
}
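For orientation, with this transformer wired into the categorizer and planner below, an `ORDER BY` on a distributed table that previously planned as a plain `Sort` over `MergeScan` is expected to show up in `EXPLAIN` roughly as (see the sqlness update near the end of this diff):

MergeSort: demo.host ASC NULLS LAST
  MergeScan [is_placeholder=false]

The physical plan is unchanged for now (`SortPreservingMergeExec` over partition-preserving `SortExec`s), since `MergeSortExtensionPlanner` currently lowers `MergeSort` back to an ordinary `Sort`.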

View File

@@ -25,7 +25,7 @@ use datafusion::execution::context::SessionState;
use datafusion::physical_plan::ExecutionPlan;
use datafusion::physical_planner::{ExtensionPlanner, PhysicalPlanner};
use datafusion_common::tree_node::{TreeNode, TreeNodeRecursion, TreeNodeVisitor};
use datafusion_common::TableReference;
use datafusion_common::{DataFusionError, TableReference};
use datafusion_expr::{LogicalPlan, UserDefinedLogicalNode};
use session::context::QueryContext;
use snafu::{OptionExt, ResultExt};
@@ -35,9 +35,69 @@ use table::table::adapter::DfTableProviderAdapter;
use table::table_name::TableName;
use crate::dist_plan::merge_scan::{MergeScanExec, MergeScanLogicalPlan};
use crate::dist_plan::merge_sort::MergeSortLogicalPlan;
use crate::error::{CatalogSnafu, TableNotFoundSnafu};
use crate::region_query::RegionQueryHandlerRef;
/// Planner for converting the merge sort logical plan to a physical plan.
/// It is currently a fallback to sort and doesn't change the execution plan:
/// `MergeSort(MergeScan) -> Sort(MergeScan) -> physical plan -> ...`
/// It should be applied after `DistExtensionPlanner`.
///
/// (For later, when this merge sort is actually implemented:)
///
/// We should ensure the number of partitions is not smaller than the number of regions at present; otherwise this would result in incorrect output.
pub struct MergeSortExtensionPlanner {}
#[async_trait]
impl ExtensionPlanner for MergeSortExtensionPlanner {
async fn plan_extension(
&self,
planner: &dyn PhysicalPlanner,
node: &dyn UserDefinedLogicalNode,
_logical_inputs: &[&LogicalPlan],
physical_inputs: &[Arc<dyn ExecutionPlan>],
session_state: &SessionState,
) -> Result<Option<Arc<dyn ExecutionPlan>>> {
if let Some(merge_sort) = node.as_any().downcast_ref::<MergeSortLogicalPlan>() {
if let LogicalPlan::Extension(ext) = &merge_sort.input.as_ref()
&& ext
.node
.as_any()
.downcast_ref::<MergeScanLogicalPlan>()
.is_some()
{
let merge_scan_exec = physical_inputs
.first()
.and_then(|p| p.as_any().downcast_ref::<MergeScanExec>())
.ok_or(DataFusionError::Internal(format!(
"Expect MergeSort's input is a MergeScanExec, found {:?}",
physical_inputs
)))?;
let partition_cnt = merge_scan_exec.partition_count();
let region_cnt = merge_scan_exec.region_count();
// If the partition count >= region count, we know that every partition stream of the merge scan is ordered
// and we only need to do a merge sort; otherwise fall back to a regular sort
let can_merge_sort = partition_cnt >= region_cnt;
if can_merge_sort {
// TODO(discord9): use `SortPreservingMergeExec` here
}
// For now merge sort only exists in the logical plan and has the same effect as `Sort`;
// it doesn't change the execution plan. This will change in the future.
let ret = planner
.create_physical_plan(&merge_sort.clone().into_sort(), session_state)
.await?;
Ok(Some(ret))
} else {
Ok(None)
}
} else {
Ok(None)
}
}
}
pub struct DistExtensionPlanner {
catalog_manager: CatalogManagerRef,
region_query_handler: RegionQueryHandlerRef,

View File

@@ -42,7 +42,7 @@ use promql::extension_plan::PromExtensionPlanner;
use table::table::adapter::DfTableProviderAdapter;
use table::TableRef;
use crate::dist_plan::{DistExtensionPlanner, DistPlannerAnalyzer};
use crate::dist_plan::{DistExtensionPlanner, DistPlannerAnalyzer, MergeSortExtensionPlanner};
use crate::optimizer::count_wildcard::CountWildcardToTimeIndexRule;
use crate::optimizer::parallelize_scan::ParallelizeScan;
use crate::optimizer::remove_duplicate::RemoveDuplicate;
@@ -295,6 +295,7 @@ impl DfQueryPlanner {
catalog_manager,
region_query_handler,
)));
planners.push(Arc::new(MergeSortExtensionPlanner {}));
}
Self {
physical_planner: DefaultPhysicalPlanner::with_extension_planners(planners),

View File

@@ -19,6 +19,7 @@ use std::sync::Arc;
use common_function::scalars::aggregate::AggregateFunctionMeta;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{CreateAccumulatorSnafu, Result as QueryResult};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use common_recordbatch::{RecordBatch, RecordBatches};

View File

@@ -59,6 +59,7 @@ use crate::http::error_result::ErrorResponse;
use crate::http::greptime_result_v1::GreptimedbV1Response;
use crate::http::influxdb::{influxdb_health, influxdb_ping, influxdb_write_v1, influxdb_write_v2};
use crate::http::influxdb_result_v1::InfluxdbV1Response;
use crate::http::json_result::JsonResponse;
use crate::http::prometheus::{
build_info_query, format_query, instant_query, label_values_query, labels_query, range_query,
series_query,
@@ -97,6 +98,7 @@ pub mod error_result;
pub mod greptime_manage_resp;
pub mod greptime_result_v1;
pub mod influxdb_result_v1;
pub mod json_result;
pub mod table_result;
#[cfg(any(test, feature = "testing"))]
@@ -279,6 +281,7 @@ pub enum ResponseFormat {
#[default]
GreptimedbV1,
InfluxdbV1,
Json,
}
impl ResponseFormat {
@@ -289,6 +292,7 @@ impl ResponseFormat {
"table" => Some(ResponseFormat::Table),
"greptimedb_v1" => Some(ResponseFormat::GreptimedbV1),
"influxdb_v1" => Some(ResponseFormat::InfluxdbV1),
"json" => Some(ResponseFormat::Json),
_ => None,
}
}
@@ -300,6 +304,7 @@ impl ResponseFormat {
ResponseFormat::Table => "table",
ResponseFormat::GreptimedbV1 => "greptimedb_v1",
ResponseFormat::InfluxdbV1 => "influxdb_v1",
ResponseFormat::Json => "json",
}
}
}
@@ -356,6 +361,7 @@ pub enum HttpResponse {
Error(ErrorResponse),
GreptimedbV1(GreptimedbV1Response),
InfluxdbV1(InfluxdbV1Response),
Json(JsonResponse),
}
impl HttpResponse {
@@ -366,6 +372,7 @@ impl HttpResponse {
HttpResponse::Table(resp) => resp.with_execution_time(execution_time).into(),
HttpResponse::GreptimedbV1(resp) => resp.with_execution_time(execution_time).into(),
HttpResponse::InfluxdbV1(resp) => resp.with_execution_time(execution_time).into(),
HttpResponse::Json(resp) => resp.with_execution_time(execution_time).into(),
HttpResponse::Error(resp) => resp.with_execution_time(execution_time).into(),
}
}
@@ -375,6 +382,7 @@ impl HttpResponse {
HttpResponse::Csv(resp) => resp.with_limit(limit).into(),
HttpResponse::Table(resp) => resp.with_limit(limit).into(),
HttpResponse::GreptimedbV1(resp) => resp.with_limit(limit).into(),
HttpResponse::Json(resp) => resp.with_limit(limit).into(),
_ => self,
}
}
@@ -407,6 +415,7 @@ impl IntoResponse for HttpResponse {
HttpResponse::Table(resp) => resp.into_response(),
HttpResponse::GreptimedbV1(resp) => resp.into_response(),
HttpResponse::InfluxdbV1(resp) => resp.into_response(),
HttpResponse::Json(resp) => resp.into_response(),
HttpResponse::Error(resp) => resp.into_response(),
}
}
@@ -452,6 +461,12 @@ impl From<InfluxdbV1Response> for HttpResponse {
}
}
impl From<JsonResponse> for HttpResponse {
fn from(value: JsonResponse) -> Self {
HttpResponse::Json(value)
}
}
async fn serve_api(Extension(api): Extension<OpenApi>) -> impl IntoApiResponse {
Json(api)
}
@@ -715,6 +730,7 @@ impl HttpServer {
authorize::check_http_auth,
)),
)
// Handlers for debug; we don't expect a timeout.
.nest(
"/debug",
Router::new()
@@ -722,19 +738,19 @@ impl HttpServer {
.route(
"/log_level",
routing::get(dyn_log::dyn_log_handler).post(dyn_log::dyn_log_handler),
),
)
// Handlers for debug, we don't expect a timeout.
.nest(
&format!("/{HTTP_API_VERSION}/prof"),
Router::new()
.route(
"/cpu",
routing::get(pprof::pprof_handler).post(pprof::pprof_handler),
)
.route(
"/mem",
routing::get(mem_prof::mem_prof_handler).post(mem_prof::mem_prof_handler),
.nest(
"/prof",
Router::new()
.route(
"/cpu",
routing::get(pprof::pprof_handler).post(pprof::pprof_handler),
)
.route(
"/mem",
routing::get(mem_prof::mem_prof_handler)
.post(mem_prof::mem_prof_handler),
),
),
)
}
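With this routing change the profiling handlers move from `/v1/prof/cpu` and `/v1/prof/mem` to `/debug/prof/cpu` and `/debug/prof/mem`; assuming the default HTTP address, that means e.g. GET http://127.0.0.1:4000/debug/prof/cpu instead of the old versioned path (the handlers themselves, and the requirement that the corresponding profiling features are enabled, are unchanged).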
@@ -1131,6 +1147,7 @@ mod test {
ResponseFormat::Csv,
ResponseFormat::Table,
ResponseFormat::Arrow,
ResponseFormat::Json,
] {
let recordbatches =
RecordBatches::try_new(schema.clone(), vec![recordbatch.clone()]).unwrap();
@@ -1141,6 +1158,7 @@ mod test {
ResponseFormat::Table => TableResponse::from_output(outputs).await,
ResponseFormat::GreptimedbV1 => GreptimedbV1Response::from_output(outputs).await,
ResponseFormat::InfluxdbV1 => InfluxdbV1Response::from_output(outputs, None).await,
ResponseFormat::Json => JsonResponse::from_output(outputs).await,
};
match json_resp {
@@ -1210,6 +1228,21 @@ mod test {
assert_eq!(rb.num_columns(), 2);
assert_eq!(rb.num_rows(), 4);
}
HttpResponse::Json(resp) => {
let output = &resp.output()[0];
if let GreptimeQueryOutput::Records(r) = output {
assert_eq!(r.num_rows(), 4);
assert_eq!(r.num_cols(), 2);
assert_eq!(r.schema.column_schemas[0].name, "numbers");
assert_eq!(r.schema.column_schemas[0].data_type, "UInt32");
assert_eq!(r.rows[0][0], serde_json::Value::from(1));
assert_eq!(r.rows[0][1], serde_json::Value::Null);
} else {
panic!("invalid output type");
}
}
HttpResponse::Error(err) => unreachable!("{err:?}"),
}
}

View File

@@ -50,6 +50,9 @@ use crate::metrics::{
};
use crate::query_handler::LogHandlerRef;
const GREPTIME_INTERNAL_PIPELINE_NAME_PREFIX: &str = "greptime_";
const GREPTIME_INTERNAL_IDENTITY_PIPELINE_NAME: &str = "greptime_identity";
#[derive(Debug, Default, Serialize, Deserialize, JsonSchema)]
pub struct LogIngesterQueryParams {
pub table: Option<String>,
@@ -121,6 +124,12 @@ pub async fn add_pipeline(
reason: "pipeline_name is required in path",
}
);
ensure!(
!pipeline_name.starts_with(GREPTIME_INTERNAL_PIPELINE_NAME_PREFIX),
InvalidParameterSnafu {
reason: "pipeline_name cannot start with greptime_",
}
);
ensure!(
!payload.is_empty(),
InvalidParameterSnafu {
@@ -425,47 +434,54 @@ async fn ingest_logs_inner(
let db = query_ctx.get_db_string();
let exec_timer = std::time::Instant::now();
let pipeline = state
.get_pipeline(&pipeline_name, version, query_ctx.clone())
.await?;
let transform_timer = std::time::Instant::now();
let mut intermediate_state = pipeline.init_intermediate_state();
let mut results = Vec::with_capacity(pipeline_data.len());
let transformed_data: Rows;
if pipeline_name == GREPTIME_INTERNAL_IDENTITY_PIPELINE_NAME {
let rows = pipeline::identity_pipeline(pipeline_data)
.context(PipelineTransformSnafu)
.context(PipelineSnafu)?;
transformed_data = rows;
} else {
let pipeline = state
.get_pipeline(&pipeline_name, version, query_ctx.clone())
.await?;
for v in pipeline_data {
pipeline
.prepare(v, &mut intermediate_state)
.inspect_err(|_| {
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_FAILURE_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
})
.context(PipelineTransformSnafu)
.context(PipelineSnafu)?;
let r = pipeline
.exec_mut(&mut intermediate_state)
.inspect_err(|_| {
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_FAILURE_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
})
.context(PipelineTransformSnafu)
.context(PipelineSnafu)?;
results.push(r);
pipeline.reset_intermediate_state(&mut intermediate_state);
let transform_timer = std::time::Instant::now();
let mut intermediate_state = pipeline.init_intermediate_state();
for v in pipeline_data {
pipeline
.prepare(v, &mut intermediate_state)
.inspect_err(|_| {
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_FAILURE_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
})
.context(PipelineTransformSnafu)
.context(PipelineSnafu)?;
let r = pipeline
.exec_mut(&mut intermediate_state)
.inspect_err(|_| {
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_FAILURE_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
})
.context(PipelineTransformSnafu)
.context(PipelineSnafu)?;
results.push(r);
pipeline.reset_intermediate_state(&mut intermediate_state);
}
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_SUCCESS_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
transformed_data = Rows {
rows: results,
schema: pipeline.schemas().clone(),
};
}
METRIC_HTTP_LOGS_TRANSFORM_ELAPSED
.with_label_values(&[db.as_str(), METRIC_SUCCESS_VALUE])
.observe(transform_timer.elapsed().as_secs_f64());
let transformed_data: Rows = Rows {
rows: results,
schema: pipeline.schemas().clone(),
};
let insert_request = RowInsertRequest {
rows: Some(transformed_data),
table_name: table_name.clone(),

View File

@@ -39,6 +39,7 @@ use crate::http::csv_result::CsvResponse;
use crate::http::error_result::ErrorResponse;
use crate::http::greptime_result_v1::GreptimedbV1Response;
use crate::http::influxdb_result_v1::InfluxdbV1Response;
use crate::http::json_result::JsonResponse;
use crate::http::table_result::TableResponse;
use crate::http::{
ApiState, Epoch, GreptimeOptionsConfigState, GreptimeQueryOutput, HttpRecordsOutput,
@@ -138,6 +139,7 @@ pub async fn sql(
ResponseFormat::Table => TableResponse::from_output(outputs).await,
ResponseFormat::GreptimedbV1 => GreptimedbV1Response::from_output(outputs).await,
ResponseFormat::InfluxdbV1 => InfluxdbV1Response::from_output(outputs, epoch).await,
ResponseFormat::Json => JsonResponse::from_output(outputs).await,
};
if let Some(limit) = query_params.limit {

View File

@@ -0,0 +1,137 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use axum::http::{header, HeaderValue};
use axum::response::{IntoResponse, Response};
use common_error::status_code::StatusCode;
use common_query::Output;
use mime_guess::mime;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use serde_json::{json, Map, Value};
use super::process_with_limit;
use crate::http::error_result::ErrorResponse;
use crate::http::header::{GREPTIME_DB_HEADER_EXECUTION_TIME, GREPTIME_DB_HEADER_FORMAT};
use crate::http::{handler, GreptimeQueryOutput, HttpResponse, ResponseFormat};
/// The JSON format here is different from the default JSON output of the `GreptimedbV1` result.
/// `JsonResponse` is intended to make it easier for users to consume the data.
#[derive(Serialize, Deserialize, Debug, JsonSchema)]
pub struct JsonResponse {
output: Vec<GreptimeQueryOutput>,
execution_time_ms: u64,
}
impl JsonResponse {
pub async fn from_output(outputs: Vec<crate::error::Result<Output>>) -> HttpResponse {
match handler::from_output(outputs).await {
Err(err) => HttpResponse::Error(err),
Ok((output, _)) => {
if output.len() > 1 {
HttpResponse::Error(ErrorResponse::from_error_message(
StatusCode::InvalidArguments,
"cannot output multi-statements result in json format".to_string(),
))
} else {
HttpResponse::Json(JsonResponse {
output,
execution_time_ms: 0,
})
}
}
}
}
pub fn output(&self) -> &[GreptimeQueryOutput] {
&self.output
}
pub fn with_execution_time(mut self, execution_time: u64) -> Self {
self.execution_time_ms = execution_time;
self
}
pub fn execution_time_ms(&self) -> u64 {
self.execution_time_ms
}
pub fn with_limit(mut self, limit: usize) -> Self {
self.output = process_with_limit(self.output, limit);
self
}
}
impl IntoResponse for JsonResponse {
fn into_response(mut self) -> Response {
debug_assert!(
self.output.len() <= 1,
"self.output has extra elements: {}",
self.output.len()
);
let execution_time = self.execution_time_ms;
let payload = match self.output.pop() {
None => String::default(),
Some(GreptimeQueryOutput::AffectedRows(n)) => json!({
"data": [],
"affected_rows": n,
"execution_time_ms": execution_time,
})
.to_string(),
Some(GreptimeQueryOutput::Records(records)) => {
let schema = records.schema();
let data: Vec<Map<String, Value>> = records
.rows
.iter()
.map(|row| {
schema
.column_schemas
.iter()
.enumerate()
.map(|(i, col)| (col.name.clone(), row[i].clone()))
.collect::<Map<String, Value>>()
})
.collect();
json!({
"data": data,
"execution_time_ms": execution_time,
})
.to_string()
}
};
(
[
(
header::CONTENT_TYPE,
HeaderValue::from_static(mime::APPLICATION_JSON.as_ref()),
),
(
GREPTIME_DB_HEADER_FORMAT.clone(),
HeaderValue::from_static(ResponseFormat::Json.as_str()),
),
(
GREPTIME_DB_HEADER_EXECUTION_TIME.clone(),
HeaderValue::from(execution_time),
),
],
payload,
)
.into_response()
}
}
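A hedged sketch of what `format=json` returns for a small query, based on the payload assembled in `into_response` above; the `execution_time_ms` value is illustrative:

GET /v1/sql?format=json&sql=select * from numbers limit 3

{"data":[{"number":0},{"number":1},{"number":2}],"execution_time_ms":3}

An `AffectedRows` output instead yields {"data":[],"affected_rows":N,"execution_time_ms":...}, and multi-statement results are rejected with an error response.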

View File

@@ -23,6 +23,7 @@ use cache::{build_fundamental_cache_registry, with_default_composite_cache_regis
use catalog::kvbackend::{CachedMetaKvBackendBuilder, KvBackendCatalogManager, MetaKvBackend};
use client::client_manager::NodeClients;
use client::Client;
use cmd::DistributedInformationExtension;
use common_base::Plugins;
use common_grpc::channel_manager::{ChannelConfig, ChannelManager};
use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
@@ -366,9 +367,10 @@ impl GreptimeDbClusterBuilder {
.build(),
);
let information_extension =
Arc::new(DistributedInformationExtension::new(meta_client.clone()));
let catalog_manager = KvBackendCatalogManager::new(
Mode::Distributed,
Some(meta_client.clone()),
information_extension,
cached_meta_backend.clone(),
cache_registry.clone(),
None,

View File

@@ -15,6 +15,7 @@
use std::sync::Arc;
use cache::{build_fundamental_cache_registry, with_default_composite_cache_registry};
use catalog::information_schema::NoopInformationExtension;
use catalog::kvbackend::KvBackendCatalogManager;
use cmd::error::StartFlownodeSnafu;
use cmd::standalone::StandaloneOptions;
@@ -146,8 +147,7 @@ impl GreptimeDbStandaloneBuilder {
);
let catalog_manager = KvBackendCatalogManager::new(
Mode::Standalone,
None,
Arc::new(NoopInformationExtension),
kv_backend.clone(),
cache_registry.clone(),
Some(procedure_manager.clone()),

View File

@@ -86,6 +86,7 @@ macro_rules! http_tests {
test_pipeline_api,
test_test_pipeline_api,
test_plain_text_ingestion,
test_identify_pipeline,
test_otlp_metrics,
test_otlp_traces,
@@ -181,6 +182,22 @@ pub async fn test_sql_api(store_type: StorageType) {
})).unwrap()
);
// test json result format
let res = client
.get("/v1/sql?format=json&sql=select * from numbers limit 10")
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
let body = res.json::<Value>().await;
let data = body.get("data").expect("Missing 'data' field in response");
let expected = json!([
{"number": 0}, {"number": 1}, {"number": 2}, {"number": 3}, {"number": 4},
{"number": 5}, {"number": 6}, {"number": 7}, {"number": 8}, {"number": 9}
]);
assert_eq!(data, &expected);
// test insert and select
let res = client
.get("/v1/sql?sql=insert into demo values('host', 66.6, 1024, 0)")
@@ -1076,6 +1093,21 @@ transform:
"#;
// 1. create pipeline
let res = client
.post("/v1/events/pipelines/greptime_guagua")
.header("Content-Type", "application/x-yaml")
.body(body)
.send()
.await;
assert_eq!(res.status(), StatusCode::BAD_REQUEST);
assert_eq!(
res.json::<serde_json::Value>().await["error"]
.as_str()
.unwrap(),
"Invalid request parameter: pipeline_name cannot start with greptime_"
);
let res = client
.post("/v1/events/pipelines/test")
.header("Content-Type", "application/x-yaml")
@@ -1161,6 +1193,61 @@ transform:
guard.remove_all().await;
}
pub async fn test_identify_pipeline(store_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) = setup_test_http_app_with_frontend(store_type, "test_pipeline_api").await;
// handshake
let client = TestClient::new(app);
let body = r#"{"__time__":1453809242,"__topic__":"","__source__":"10.170.***.***","ip":"10.200.**.***","time":"26/Jan/2016:19:54:02 +0800","url":"POST/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>HTTP/1.1","status":"200","user-agent":"aliyun-sdk-java"}
{"__time__":1453809242,"__topic__":"","__source__":"10.170.***.***","ip":"10.200.**.***","time":"26/Jan/2016:19:54:02 +0800","url":"POST/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>HTTP/1.1","status":"200","user-agent":"aliyun-sdk-java","hasagei":"hasagei","dongdongdong":"guaguagua"}"#;
let res = client
.post("/v1/events/logs?db=public&table=logs&pipeline_name=greptime_identity")
.header("Content-Type", "application/json")
.body(body)
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
let body: serde_json::Value = res.json().await;
assert!(body.get("execution_time_ms").unwrap().is_number());
assert_eq!(body["output"][0]["affectedrows"], 2);
let res = client.get("/v1/sql?sql=select * from logs").send().await;
assert_eq!(res.status(), StatusCode::OK);
let line1_expected = r#"["10.170.***.***",1453809242,"","10.200.**.***","200","26/Jan/2016:19:54:02 +0800","POST/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>HTTP/1.1","aliyun-sdk-java","guaguagua","hasagei",null]"#;
let line2_expected = r#"["10.170.***.***",1453809242,"","10.200.**.***","200","26/Jan/2016:19:54:02 +0800","POST/PutData?Category=YunOsAccountOpLog&AccessKeyId=<yourAccessKeyId>&Date=Fri%2C%2028%20Jun%202013%2006%3A53%3A30%20GMT&Topic=raw&Signature=<yourSignature>HTTP/1.1","aliyun-sdk-java",null,null,null]"#;
let res = client.get("/v1/sql?sql=select * from logs").send().await;
assert_eq!(res.status(), StatusCode::OK);
let resp: serde_json::Value = res.json().await;
let result = resp["output"][0]["records"]["rows"].as_array().unwrap();
assert_eq!(result.len(), 2);
let mut line1 = result[0].as_array().unwrap().clone();
let mut line2 = result[1].as_array().unwrap().clone();
assert!(line1.last().unwrap().is_i64());
assert!(line2.last().unwrap().is_i64());
*line1.last_mut().unwrap() = serde_json::Value::Null;
*line2.last_mut().unwrap() = serde_json::Value::Null;
assert_eq!(
line1,
serde_json::from_str::<Vec<Value>>(line1_expected).unwrap()
);
assert_eq!(
line2,
serde_json::from_str::<Vec<Value>>(line2_expected).unwrap()
);
let expected = r#"[["__source__","String","","YES","","FIELD"],["__time__","Int64","","YES","","FIELD"],["__topic__","String","","YES","","FIELD"],["ip","String","","YES","","FIELD"],["status","String","","YES","","FIELD"],["time","String","","YES","","FIELD"],["url","String","","YES","","FIELD"],["user-agent","String","","YES","","FIELD"],["dongdongdong","String","","YES","","FIELD"],["hasagei","String","","YES","","FIELD"],["greptime_timestamp","TimestampNanosecond","PRI","NO","","TIMESTAMP"]]"#;
validate_data(&client, "desc logs", expected).await;
guard.remove_all().await;
}
pub async fn test_test_pipeline_api(store_type: StorageType) {
common_telemetry::init_default_ut_logging();
let (app, mut guard) = setup_test_http_app_with_frontend(store_type, "test_pipeline_api").await;
@@ -1236,7 +1323,7 @@ transform:
.send()
.await;
assert_eq!(res.status(), StatusCode::OK);
let body: serde_json::Value = res.json().await;
let body: Value = res.json().await;
let schema = &body["schema"];
let rows = &body["rows"];
assert_eq!(

View File

@@ -25,7 +25,7 @@ explain SELECT * FROM demo WHERE ts > cast(1000000000 as timestamp) ORDER BY hos
+-+-+
| plan_type_| plan_|
+-+-+
| logical_plan_| Sort: demo.host ASC NULLS LAST_|
| logical_plan_| MergeSort: demo.host ASC NULLS LAST_|
|_|_MergeScan [is_placeholder=false]_|
| physical_plan | SortPreservingMergeExec: [host@0 ASC NULLS LAST]_|
|_|_SortExec: expr=[host@0 ASC NULLS LAST], preserve_partitioning=[true]_|

View File

@@ -20,52 +20,52 @@ DESC TABLE CLUSTER_INFO;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO ORDER BY peer_type;
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|DATANODE|127.0.0.1:4101|Version|Hash|Start_time|Duration|Duration||2|DATANODE|127.0.0.1:4102|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:4103|Version|Hash|Start_time|Duration|Duration||0|FLOWNODE|127.0.0.1:6800|Version|Hash|Start_time|Duration|Duration||1|FRONTEND|127.0.0.1:4001|Version|Hash|Start_time|Duration|Duration||1|METASRV|127.0.0.1:3002|Version|Hash|Start_time|Duration||+++++++++
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|DATANODE|127.0.0.1:29411|Version|Hash|Start_time|Duration|Duration||2|DATANODE|127.0.0.1:29412|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:29413|Version|Hash|Start_time|Duration|Duration||0|FLOWNODE|127.0.0.1:29680|Version|Hash|Start_time|Duration|Duration||1|FRONTEND|127.0.0.1:29401|Version|Hash|Start_time|Duration|Duration||1|METASRV|127.0.0.1:29302|Version|Hash|Start_time|Duration||+++++++++
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'METASRV' ORDER BY peer_type;
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|METASRV|127.0.0.1:3002|Version|Hash|Start_time|Duration||+++++++++
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|METASRV|127.0.0.1:29302|Version|Hash|Start_time|Duration||+++++++++
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'FRONTEND' ORDER BY peer_type;
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|FRONTEND|127.0.0.1:4001|Version|Hash|Start_time|Duration|Duration|+++++++++
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|FRONTEND|127.0.0.1:29401|Version|Hash|Start_time|Duration|Duration|+++++++++
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE != 'FRONTEND' ORDER BY peer_type;
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|DATANODE|127.0.0.1:4101|Version|Hash|Start_time|Duration|Duration||2|DATANODE|127.0.0.1:4102|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:4103|Version|Hash|Start_time|Duration|Duration||0|FLOWNODE|127.0.0.1:6800|Version|Hash|Start_time|Duration|Duration||1|METASRV|127.0.0.1:3002|Version|Hash|Start_time|Duration||+++++++++
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|1|DATANODE|127.0.0.1:29411|Version|Hash|Start_time|Duration|Duration||2|DATANODE|127.0.0.1:29412|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:29413|Version|Hash|Start_time|Duration|Duration||0|FLOWNODE|127.0.0.1:29680|Version|Hash|Start_time|Duration|Duration||1|METASRV|127.0.0.1:29302|Version|Hash|Start_time|Duration||+++++++++
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_ID > 1 ORDER BY peer_type;
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|2|DATANODE|127.0.0.1:4102|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:4103|Version|Hash|Start_time|Duration|Duration|+++++++++
+++++++++|peer_id|peer_type|peer_addr|node_version|git_commit|start_time|uptime|active_time|+++++++++|2|DATANODE|127.0.0.1:29412|Version|Hash|Start_time|Duration|Duration||3|DATANODE|127.0.0.1:29413|Version|Hash|Start_time|Duration|Duration|+++++++++
USE PUBLIC;
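The REPLACE directives above exist because CLUSTER_INFO reports run-specific values (node version, git commit, start time, uptime, active time). Each regex scrubs one of those columns, and the final "[\s\-]+" pass deletes every run of whitespace and dashes, which is why the expected results above collapse into single lines of | and + characters. A minimal sketch of the same idea, using only two of the directives and made-up output values:

-- Sketch only: a hypothetical two-column query; the raw output below is illustrative.
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE [\s\-]+
SELECT peer_id, start_time FROM CLUSTER_INFO WHERE PEER_TYPE = 'METASRV';

-- Raw output (before normalization):
-- +---------+----------------------------+
-- | peer_id | start_time                 |
-- +---------+----------------------------+
-- | 1       | 2024-10-11T12:00:00.123000 |
-- +---------+----------------------------+
--
-- After the timestamp is rewritten to Start_time and whitespace/dashes are removed,
-- the stored expectation becomes roughly:
-- +++|peer_id|start_time|+++|1|Start_time|+++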

View File

@@ -5,7 +5,7 @@ DESC TABLE CLUSTER_INFO;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO ORDER BY peer_type;
@@ -13,7 +13,7 @@ SELECT * FROM CLUSTER_INFO ORDER BY peer_type;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'METASRV' ORDER BY peer_type;
@@ -21,7 +21,7 @@ SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'METASRV' ORDER BY peer_type;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'FRONTEND' ORDER BY peer_type;
@@ -29,7 +29,7 @@ SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE = 'FRONTEND' ORDER BY peer_type;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE != 'FRONTEND' ORDER BY peer_type;
@@ -37,7 +37,7 @@ SELECT * FROM CLUSTER_INFO WHERE PEER_TYPE != 'FRONTEND' ORDER BY peer_type;
-- SQLNESS REPLACE version node_version
-- SQLNESS REPLACE (\s\d\.\d\.\d\s) Version
-- SQLNESS REPLACE (\s[a-z0-9]{7,8}\s) Hash
-- SQLNESS REPLACE (\s[\-0-9T:\.]{15,}) Start_time
-- SQLNESS REPLACE (\s[\-0-9T:\.]{19,}) Start_time
-- SQLNESS REPLACE ((\d+(s|ms|m)\s)+) Duration
-- SQLNESS REPLACE [\s\-]+
SELECT * FROM CLUSTER_INFO WHERE PEER_ID > 1 ORDER BY peer_type;

File diff suppressed because one or more lines are too long

View File

@@ -26,18 +26,34 @@ SELECT h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64), h3_latlng_to_cell_stri
SELECT h3_cell_to_string(h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64)) AS cell_str, h3_string_to_cell(h3_latlng_to_cell_string(37.76938, -122.3889, 8::UInt64)) AS cell_index;
SELECT h3_cell_center_lat(h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64)) AS cell_center_lat, h3_cell_center_lng(h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64)) AS cell_center_lng;
SELECT h3_cell_center_latlng(h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64)) AS cell_center;
SELECT
    h3_cell_resolution(cell) AS resolution,
    h3_cell_base(cell) AS base,
    h3_cell_is_pentagon(cell) AS pentagon,
    h3_cell_parent(cell, 6::UInt64) AS parent,
    h3_cell_to_children(cell, 10::UInt64) AS children,
    h3_cell_to_children_size(cell, 10) AS children_count,
    h3_cell_to_child_pos(cell, 6) AS child_pos,
    h3_child_pos_to_cell(25, cell, 11) AS child
FROM (SELECT h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64) AS cell);
SELECT h3_is_neighbour(cell1, cell2)
FROM (SELECT h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64) AS cell1, h3_latlng_to_cell(36.76938, -122.3889, 8::UInt64) AS cell2);
SELECT
    h3_grid_disk(cell, 0) AS current_cell,
    h3_grid_disk(cell, 3) AS grids,
    h3_grid_disk_distances(cell, 3) AS all_grids,
FROM (SELECT h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64) AS cell);
SELECT
    h3_grid_distance(cell1, cell2) AS distance,
    h3_grid_path_cells(cell1, cell2) AS path_cells,
FROM
    (
        SELECT
            h3_latlng_to_cell(37.76938, -122.3889, 8::UInt64) AS cell1,
            h3_latlng_to_cell(39.634, -104.999, 8::UInt64) AS cell2
    );
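Beyond the literal-value checks above, a hedged usage sketch of the new hierarchy functions: roll raw coordinates from a hypothetical points(lat, lon) table up to a coarser parent cell and count how many points land in each bucket. The table name and resolutions are illustrative and not part of the test suite; only the h3_* functions come from this change.

-- Hypothetical table and resolutions; not a test expectation.
SELECT bucket, COUNT(*) AS point_count
FROM (
    SELECT h3_cell_parent(h3_latlng_to_cell(lat, lon, 8::UInt64), 5::UInt64) AS bucket
    FROM points
)
GROUP BY bucket;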
SELECT geohash(37.76938, -122.3889, 9);
@@ -66,3 +82,16 @@ SELECT geohash(37.76938, -122.3889, 11::UInt32);
SELECT geohash(37.76938, -122.3889, 11::UInt64);
SELECT geohash_neighbours(37.76938, -122.3889, 11);
SELECT json_encode_path(37.76938, -122.3889, 1728083375::TimestampSecond);
SELECT json_encode_path(lat, lon, ts)
FROM (
    SELECT 37.76938 AS lat, -122.3889 AS lon, 1728083375::TimestampSecond AS ts
    UNION ALL
    SELECT 37.76928 AS lat, -122.3839 AS lon, 1728083373::TimestampSecond AS ts
    UNION ALL
    SELECT 37.76930 AS lat, -122.3820 AS lon, 1728083379::TimestampSecond AS ts
    UNION ALL
    SELECT 37.77001 AS lat, -122.3888 AS lon, 1728083372::TimestampSecond AS ts
);
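The query above feeds json_encode_path points whose timestamps are not in ascending order. As a hedged extension of the same aggregate, one path per entity could be built with a plain GROUP BY; the positions(device_id, lat, lon, ts) table here is hypothetical.

-- Hypothetical positions table; json_encode_path is the aggregate added in this change.
SELECT device_id, json_encode_path(lat, lon, ts) AS path
FROM positions
GROUP BY device_id;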

View File

@@ -0,0 +1,33 @@
--- json_path_exists ---
SELECT json_path_exists(parse_json('{"a": 1, "b": 2}'), '$.a');
+--------------------------------------------------------------------+
| json_path_exists(parse_json(Utf8("{"a": 1, "b": 2}")),Utf8("$.a")) |
+--------------------------------------------------------------------+
| true |
+--------------------------------------------------------------------+
SELECT json_path_exists(parse_json('{"a": 1, "b": 2}'), '$.c');
+--------------------------------------------------------------------+
| json_path_exists(parse_json(Utf8("{"a": 1, "b": 2}")),Utf8("$.c")) |
+--------------------------------------------------------------------+
| false |
+--------------------------------------------------------------------+
SELECT json_path_exists(parse_json('[1, 2]'), '[0]');
+----------------------------------------------------------+
| json_path_exists(parse_json(Utf8("[1, 2]")),Utf8("[0]")) |
+----------------------------------------------------------+
| true |
+----------------------------------------------------------+
SELECT json_path_exists(parse_json('[1, 2]'), '[2]');
+----------------------------------------------------------+
| json_path_exists(parse_json(Utf8("[1, 2]")),Utf8("[2]")) |
+----------------------------------------------------------+
| false |
+----------------------------------------------------------+

View File

@@ -0,0 +1,8 @@
--- json_path_exists ---
SELECT json_path_exists(parse_json('{"a": 1, "b": 2}'), '$.a');
SELECT json_path_exists(parse_json('{"a": 1, "b": 2}'), '$.c');
SELECT json_path_exists(parse_json('[1, 2]'), '[0]');
SELECT json_path_exists(parse_json('[1, 2]'), '[2]');
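These cases exercise json_path_exists against literal documents only. A hedged sketch of using it as a row filter, assuming a hypothetical events(payload) table that stores JSON text and that the boolean result can be used directly in WHERE:

-- Hypothetical events table; parse_json and json_path_exists are the functions shown above.
SELECT payload
FROM events
WHERE json_path_exists(parse_json(payload), '$.error');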

View File

@@ -294,7 +294,7 @@ explain analyze select tag from t where num > 6 order by ts desc limit 2;
+-+-+-+
| 0_| 0_|_GlobalLimitExec: skip=0, fetch=2 REDACTED
|_|_|_SortPreservingMergeExec: [ts@1 DESC] REDACTED
|_|_|_SortExec: TopK(fetch=2), expr=[ts@1 DESC], preserve_partitioning=[true] REDACTED
|_|_|_SortExec: expr=[ts@1 DESC], preserve_partitioning=[true] REDACTED
|_|_|_MergeScanExec: REDACTED
|_|_|_|
| 1_| 0_|_GlobalLimitExec: skip=0, fetch=2 REDACTED

View File

@@ -0,0 +1,10 @@
SELECT 123 as a, 'h' as b UNION ALL SELECT 456 as a, 'e' as b UNION ALL SELECT 789 as a, 'l' as b order by a;
+-----+---+
| a   | b |
+-----+---+
| 123 | h |
| 456 | e |
| 789 | l |
+-----+---+

View File

@@ -0,0 +1 @@
SELECT 123 as a, 'h' as b UNION ALL SELECT 456 as a, 'e' as b UNION ALL SELECT 789 as a, 'l' as b order by a;

View File

@@ -1,7 +1,7 @@
node_id = 1
mode = 'distributed'
require_lease_before_startup = true
rpc_addr = '127.0.0.1:4100'
rpc_addr = '127.0.0.1:29410'
rpc_hostname = '127.0.0.1'
rpc_runtime_size = 8
@@ -24,7 +24,7 @@ type = 'File'
data_home = '{data_home}'
[meta_client_options]
metasrv_addrs = ['127.0.0.1:3002']
metasrv_addrs = ['127.0.0.1:29302']
timeout_millis = 3000
connect_timeout_millis = 5000
tcp_nodelay = false

View File

@@ -20,13 +20,13 @@ linger = "5ms"
type = 'File'
data_home = '{data_home}'
[grpc_options]
addr = '127.0.0.1:4001'
[grpc]
addr = '127.0.0.1:29401'
runtime_size = 8
[mysql]
enable = true
addr = "127.0.0.1:4002"
addr = "127.0.0.1:29402"
runtime_size = 2
[mysql.tls]
@@ -34,7 +34,7 @@ mode = "disable"
[postgres]
enable = true
addr = "127.0.0.1:4003"
addr = "127.0.0.1:29403"
runtime_size = 2
[procedure]

View File

@@ -47,10 +47,10 @@ use tokio_postgres::{Client as PgClient, SimpleQueryMessage as PgRow};
use crate::protocol_interceptor::{MYSQL, PROTOCOL_KEY};
use crate::{util, ServerAddr};
const METASRV_ADDR: &str = "127.0.0.1:3002";
const GRPC_SERVER_ADDR: &str = "127.0.0.1:4001";
const MYSQL_SERVER_ADDR: &str = "127.0.0.1:4002";
const POSTGRES_SERVER_ADDR: &str = "127.0.0.1:4003";
const METASRV_ADDR: &str = "127.0.0.1:29302";
const GRPC_SERVER_ADDR: &str = "127.0.0.1:29401";
const MYSQL_SERVER_ADDR: &str = "127.0.0.1:29402";
const POSTGRES_SERVER_ADDR: &str = "127.0.0.1:29403";
const DEFAULT_LOG_LEVEL: &str = "--log-level=debug,hyper=warn,tower=warn,datafusion=warn,reqwest=warn,sqlparser=warn,h2=info,opendal=info";
#[derive(Clone)]
@@ -305,34 +305,55 @@ impl Env {
),
"-c".to_string(),
self.generate_config_file(subcommand, db_ctx),
"--http-addr=127.0.0.1:5002".to_string(),
"--http-addr=127.0.0.1:29502".to_string(),
];
(args, vec![GRPC_SERVER_ADDR.to_string()])
(
    args,
    vec![
        GRPC_SERVER_ADDR.to_string(),
        MYSQL_SERVER_ADDR.to_string(),
        POSTGRES_SERVER_ADDR.to_string(),
    ],
)
}
"frontend" => {
let args = vec![
DEFAULT_LOG_LEVEL.to_string(),
subcommand.to_string(),
"start".to_string(),
"--metasrv-addrs=127.0.0.1:3002".to_string(),
"--http-addr=127.0.0.1:5003".to_string(),
"--metasrv-addrs=127.0.0.1:29302".to_string(),
"--http-addr=127.0.0.1:29503".to_string(),
format!("--rpc-addr={}", GRPC_SERVER_ADDR),
format!("--mysql-addr={}", MYSQL_SERVER_ADDR),
format!("--postgres-addr={}", POSTGRES_SERVER_ADDR),
format!(
"--log-dir={}/greptimedb-frontend/logs",
self.sqlness_home.display()
),
];
(args, vec![GRPC_SERVER_ADDR.to_string()])
(
    args,
    vec![
        GRPC_SERVER_ADDR.to_string(),
        MYSQL_SERVER_ADDR.to_string(),
        POSTGRES_SERVER_ADDR.to_string(),
    ],
)
}
"metasrv" => {
let args = vec![
DEFAULT_LOG_LEVEL.to_string(),
subcommand.to_string(),
"start".to_string(),
"--bind-addr".to_string(),
"127.0.0.1:29302".to_string(),
"--server-addr".to_string(),
"127.0.0.1:29302".to_string(),
"--backend".to_string(),
"memory-store".to_string(),
"--enable-region-failover".to_string(),
"false".to_string(),
"--http-addr=127.0.0.1:5002".to_string(),
"--http-addr=127.0.0.1:29502".to_string(),
format!(
"--log-dir={}/greptimedb-metasrv/logs",
self.sqlness_home.display()
@@ -396,15 +417,15 @@ impl Env {
subcommand.to_string(),
"start".to_string(),
];
args.push(format!("--rpc-addr=127.0.0.1:410{id}"));
args.push(format!("--http-addr=127.0.0.1:430{id}"));
args.push(format!("--rpc-addr=127.0.0.1:2941{id}"));
args.push(format!("--http-addr=127.0.0.1:2943{id}"));
args.push(format!("--data-home={}", data_home.display()));
args.push(format!("--log-dir={}/logs", data_home.display()));
args.push(format!("--node-id={id}"));
args.push("-c".to_string());
args.push(self.generate_config_file(subcommand, db_ctx));
args.push("--metasrv-addrs=127.0.0.1:3002".to_string());
(args, format!("127.0.0.1:410{id}"))
args.push("--metasrv-addrs=127.0.0.1:29302".to_string());
(args, format!("127.0.0.1:2941{id}"))
}
fn flownode_start_args(
@@ -420,14 +441,14 @@ impl Env {
subcommand.to_string(),
"start".to_string(),
];
args.push(format!("--rpc-addr=127.0.0.1:680{id}"));
args.push(format!("--rpc-addr=127.0.0.1:2968{id}"));
args.push(format!("--node-id={id}"));
args.push(format!(
"--log-dir={}/greptimedb-flownode/logs",
sqlness_home.display()
));
args.push("--metasrv-addrs=127.0.0.1:3002".to_string());
(args, format!("127.0.0.1:680{id}"))
args.push("--metasrv-addrs=127.0.0.1:29302".to_string());
(args, format!("127.0.0.1:2968{id}"))
}
/// stop and restart the server process