Compare commits

...

19 Commits

Author SHA1 Message Date
Ning Sun
11bdb33d37 feat: sql query interceptor and plugin refactoring (#773)
* feat: let instance hold plugins

* feat: add sql query interceptor definition

* docs: add comments to key apis

* feat: add implementation for pre-parsing and post-parsing

* feat: add post_execute hook

* test: add tests for interceptor

* chore: add license header

* fix: clippy error

* Update src/cmd/src/frontend.rs

Co-authored-by: LFC <bayinamine@gmail.com>

* refactor: batching post_parsing calls

* refactor: rename AnyMap2 to Plugins

* feat: call pre_execute with logical plan empty at the moment

Co-authored-by: LFC <bayinamine@gmail.com>
2022-12-23 15:22:12 +08:00
LFC
1daba75e7b refactor: use "USE" keyword (#785)
Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-23 14:29:47 +08:00
LFC
dc52a51576 chore: upgrade to Arrow 29.0 and use workspace package and dependencies (#782)
* chore: upgrade to Arrow 29.0 and use workspace package and dependencies

* fix: resolve PR comments

Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-23 14:28:37 +08:00
Ruihang Xia
26af9e6214 ci: setup secrets for setup-protoc job (#783)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-12-23 11:36:39 +08:00
fys
e07791c5e8 chore: make election mod public (#781) 2022-12-22 17:32:35 +08:00
Yingwen
b6d29afcd1 ci: Use lld for coverage (#778)
* ci: Use lld for coverage

* style: Fix clippy
2022-12-22 16:10:37 +08:00
LFC
ea9af42091 chore: upgrade Rust to nightly 2022-12-20 (#772)
* chore: upgrade Rust to nightly 2022-12-20

* chore: upgrade Rust to nightly 2022-12-20

Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-21 19:32:30 +08:00
shuiyisong
d0ebcc3b5a chore: open userinfo constructor (#774) 2022-12-21 17:58:43 +08:00
LFC
77182f5024 chore: upgrade Arrow to version 28, and DataFusion to 15 (#771)
Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-21 17:02:11 +08:00
Ning Sun
539ead5460 feat: check database existence on http api (#764)
* feat: check database existance on http api

* Update src/servers/src/http/handler.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* feat: use database not found status code

* test: add assertion for status code

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2022-12-21 10:28:45 +08:00
Ruihang Xia
bc0e4e2cb0 fix: fill NULL based on row_count (#765)
* fix: fill NULL based on row_count

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify code

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: replace set_len with resize

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-12-20 12:12:48 +08:00
Ruihang Xia
7d29670c86 fix: consider null mask in sqlness display util (#763)
* fix: consider null mask in sqlness display util

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add test case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix test case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change placeholder to null

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-12-19 14:20:28 +08:00
LFC
afd88dd53a fix: test_dist_table_scan block (#761)
* fix: `test_dist_table_scan` block

* fix: resolve PR comments

Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-19 11:20:51 +08:00
Ning Sun
efd85df6be feat: add schema check on postgres startup (#758)
* feat: add schema check on postgres startup

* chore: update pgwire to 0.6.3

* test: add test for unspecified db
2022-12-19 10:53:44 +08:00
Ning Sun
ea1896493b feat: allow multiple sql statements in query string (#699)
* feat: allow multiple sql statement in query string

* test: add a test for multiple statement call

* feat: add temprary workaround for standalone mode

* fix: resolve sql parser issue temporarily

* Update src/datanode/src/instance/sql.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: adopt new sql handler

* refactor: revert changes in query engine

* refactor: assume sql-statement 1-1 on datanode

* test: use frontend for integration test

* refactor: add statement execution api for explicit single statement call

* fix: typo

* refactor: rename query method

* test: add test case for error

* test: data type change adoption

* chore: add todo from review

* chore: remove obsolete comments

* fix: resolve resolve issues

Co-authored-by: Yingwen <realevenyag@gmail.com>
2022-12-16 19:50:20 +08:00
Jiachun Feng
66bca11401 refactor: remove optional from the protos (#756) 2022-12-16 15:47:51 +08:00
Yingwen
7c16a4a17b refactor(storage): Move write_batch::codec to a separate file (#757)
* refactor(storage): Move write_batch::codec to a separate file

* chore: move new_test_batch to write_batch mod
2022-12-16 15:32:59 +08:00
dennis zhuang
28bd7404ad feat: change column's default property to nullable (#751)
* feat: change column's default property to nullable

* chore: use all instead of any

* fix: compile error

* fix: dependencies order in cargo
2022-12-16 11:17:01 +08:00
Lei, HUANG
0653301754 feat: replace arrow2 with official implementation 🎉 (#753)
* chore: kick off. change datafusion/arrow/parquet to target version

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: replace one last datafusion dep

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: arrow_array switch to arrow

* chore: update dep of binary vector

* chore: fix wrong merge commit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: Switch to datatypes2

* feat: Make recordbatch compile

* chore: sort Cargo.toml

* feat: Fix common::recordbatch compiler errors

* feat: Fix recordbatch test compiling issue

* fix: api crate (#708)

* fix: rename ConcreteDataType::timestamp_millis_type to ConcreteDataType::timestamp_millisecond_type. fix other warnings regarding timestamp

* fix: revert changes in datatypes2

* fix: helper

* chore: delete datatypes based on arrow2

* feat: Fix some compiler errors in common::query (#710)

* feat: Fix some compiler errors in common::query

* feat: test_collect use vectors api

* fix: common-query subcrate (#712)

* fix: record batch adapter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix error enum

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix common::query compiler errors (#713)

* feat: Move conversion to ScalarValue to value.rs

* fix: Fix common::query compiler errors

This commit also make InnerError pub(crate)

* feat: Implements diff accumulator using WrapperType (#715)

* feat: Remove usage of opaque error from common::recordbatch

* feat: Remove opaque error from common::query

* feat: Fix diff compiler errors

Now common_function just use common_query's Error and Result. Adds
a LargestType associated type to LogicalPrimitiveType to get the largest
type a logical primitive type can cast to.

* feat: Remove LargestType from NativeType trait

* chore: Update comments

* feat: Restrict Scalar::RefType of WrapperType to itself

Add trait bound `for<'a> Scalar<RefType<'a> = Self>` to WrapperType

* chore: Address CR comments

* chore: Format codes

* fix: fix compile error for mean/polyval/pow/interp ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Revert "fix: fix compile error for mean/polyval/pow/interp ops"

This reverts commit fb0b4eb826.

* fix: Fix compiler errors in argmax/rate/median/norm_cdf (#716)

* fix: Fix compiler errors in argmax/rate/median/norm_cdf

* chore: Address CR comments

* fix: fix compile error for mean/polyval/pow/interp ops (#717)

* fix: fix compile error for mean/polyval/pow/interp ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify type bounds

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf errors (#718)

fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf compiler errors

* fix: fix other compile error in common-function (#719)

* further fixing

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix all compile errors in common function

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix tests and clippy for common-function subcrate (#726)

* further fixing

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix all compile errors in common function

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* revert test changes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: row group pruning (#725)

* fix: row group pruning

* chore: use macro to simplify stats implemetation

* fxi: CR comments

* fix: row group metadata length mismatch

* fix: simplify code

* fix: Fix common::grpc compiler errors (#722)

* fix: Fix common::grpc compiler errors

This commit refactors RecordBatch and holds vectors in the RecordBatch
struct, so we don't need to cast the array to vector when doing
serialization or iterating the batch.

Now we use the vector API instead of the arrow API in grpc crate.

* chore: Address CR comments

* fix common record batch

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix compile error in server subcrate (#727)

* fix: Fix compile error in server subcrate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused type alias

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* explicitly panic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/storage/src/sst/parquet.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: Fix common grpc expr (#730)

* fix compile errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename fn names

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix styles

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix wranings in common-time

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: pre-cast to avoid tremendous match arms (#734)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: upgrade storage crate to arrow and parquet offcial impl (#738)

* fix: compile erros

* fix: parquet reader and writer

* fix: parquet reader and writer

* fix: WriteBatch IPC encode/decode

* fix: clippy errors in storage subcrate

* chore: remove suspicious unwrap

* fix: some cr comments

* fix: CR comments

* fix: CR comments

* fix: Fix compiler errors in catalog and mito crates (#742)

* fix: Fix compiler errors in mito

* fix: Fix compiler errors in catalog crate

* style: Fix clippy

* chore: Fix use

* Merge pull request #745

* fix nyc-taxi and util

* Merge branch 'replace-arrow2' into fix-others

* fix substrait

* fix warnings and error in test

* fix: Fix imports in optimizer.rs

* fix: errors in optimzer

* fix: remove unwrap

* fix: Fix compiler errors in query crate (#746)

* fix: Fix compiler errors in state.rs

* fix: fix compiler errors in state

* feat: upgrade sqlparser to 0.26

* fix: fix datafusion engine compiler errors

* fix: Fix some tests in query crate

* fix: Fix all warnings in tests

* feat: Remove `Type` from timestamp's type name

* fix: fix query tests

Now datafusion already supports median, so this commit also remove the
median function

* style: Fix clippy

* feat: Remove RecordBatch::pretty_print

* chore: Address CR comments

* Update src/query/src/query_engine/state.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* fix: frontend compile errors (#747)

fix: fix compile errors in frontend

* fix: Fix compiler errors in script crate (#749)

* fix: Fix compiler errors in state.rs

* fix: fix compiler errors in state

* feat: upgrade sqlparser to 0.26

* fix: fix datafusion engine compiler errors

* fix: Fix some tests in query crate

* fix: Fix all warnings in tests

* feat: Remove `Type` from timestamp's type name

* fix: fix query tests

Now datafusion already supports median, so this commit also remove the
median function

* style: Fix clippy

* feat: Remove RecordBatch::pretty_print

* chore: Address CR comments

* feat: Add column_by_name to RecordBatch

* feat: modify select_from_rb

* feat: Fix some compiler errors in vector.rs

* feat: Fix more compiler errors in vector.rs

* fix: fix table.rs

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix compiler errors in coprocessor

* fix: Fix some compiler errors

* fix: Fix compiler errors in script

* chore: Remove unused imports and format code

* test: disable interval tests

* test: Fix test_compile_execute test

* style: Fix clippy

* feat: Support interval

* feat: Add RecordBatch::columns and fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix All The Tests! (#752)

* fix: Fix several tests compile errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: some compile errors in tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: compile errors in frontend tests

* fix: compile errors in frontend tests

* test: Fix tests in api and common-query

* test: Fix test in sql crate

* fix: resolve substrait error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: add more test

* test: Fix tests in servers

* fix instance_test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* test: Fix tests in tests-integration

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>

* fix: clippy errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2022-12-15 18:49:12 +08:00
378 changed files with 9390 additions and 19171 deletions

View File

@@ -24,7 +24,7 @@ on:
name: Code coverage
env:
RUST_TOOLCHAIN: nightly-2022-07-14
RUST_TOOLCHAIN: nightly-2022-12-20
jobs:
coverage:
@@ -34,6 +34,11 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: arduino/setup-protoc@v1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: KyleMayes/install-llvm-action@v1
with:
version: "14.0"
- name: Install toolchain
uses: dtolnay/rust-toolchain@master
with:
@@ -48,6 +53,7 @@ jobs:
- name: Collect coverage data
run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info
env:
CARGO_BUILD_RUSTFLAGS: "-C link-arg=-fuse-ld=lld"
RUST_BACKTRACE: 1
CARGO_INCREMENTAL: 0
GT_S3_BUCKET: ${{ secrets.S3_BUCKET }}

View File

@@ -23,7 +23,7 @@ on:
name: CI
env:
RUST_TOOLCHAIN: nightly-2022-07-14
RUST_TOOLCHAIN: nightly-2022-12-20
jobs:
typos:
@@ -41,6 +41,8 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: arduino/setup-protoc@v1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
@@ -81,6 +83,8 @@ jobs:
# path: ./llvm
# key: llvm
# - uses: arduino/setup-protoc@v1
# with:
# repo-token: ${{ secrets.GITHUB_TOKEN }}
# - uses: KyleMayes/install-llvm-action@v1
# with:
# version: "14.0"
@@ -114,6 +118,8 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: arduino/setup-protoc@v1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
@@ -131,6 +137,8 @@ jobs:
steps:
- uses: actions/checkout@v3
- uses: arduino/setup-protoc@v1
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}

View File

@@ -10,7 +10,7 @@ on:
name: Release
env:
RUST_TOOLCHAIN: nightly-2022-07-14
RUST_TOOLCHAIN: nightly-2022-12-20
# FIXME(zyy17): Would be better to use `gh release list -L 1 | cut -f 3` to get the latest release version tag, but for a long time, we will stay at 'v0.1.0-alpha-*'.
SCHEDULED_BUILD_VERSION_PREFIX: v0.1.0-alpha

1822
Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -20,7 +20,6 @@ members = [
"src/common/time",
"src/datanode",
"src/datatypes",
"src/datatypes2",
"src/frontend",
"src/log-store",
"src/meta-client",
@@ -40,5 +39,23 @@ members = [
"tests/runner",
]
[workspace.package]
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
[workspace.dependencies]
arrow = "29.0"
arrow-schema = { version = "29.0", features = ["serde"] }
# TODO(LFC): Use released Datafusion when it officially dpendent on Arrow 29.0
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
datafusion-common = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
datafusion-expr = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
datafusion-optimizer = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
datafusion-physical-expr = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
datafusion-sql = { git = "https://github.com/apache/arrow-datafusion.git", rev = "4917235a398ae20145c87d20984e6367dc1a0c1e" }
parquet = "29.0"
sqlparser = "0.28"
[profile.release]
debug = true

View File

@@ -1,14 +1,14 @@
[package]
name = "benchmarks"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
arrow = "10"
arrow.workspace = true
clap = { version = "4.0", features = ["derive"] }
client = { path = "../src/client" }
indicatif = "0.17.1"
itertools = "0.10.5"
parquet = { version = "*" }
parquet.workspace = true
tokio = { version = "1.21", features = ["full"] }

View File

@@ -15,12 +15,10 @@
//! Use the taxi trip records from New York City dataset to bench. You can download the dataset from
//! [here](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page).
#![feature(once_cell)]
#![allow(clippy::print_stdout)]
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;
use std::time::Instant;
use arrow::array::{ArrayRef, PrimitiveArray, StringArray, TimestampNanosecondArray};
@@ -29,12 +27,10 @@ use arrow::record_batch::RecordBatch;
use clap::Parser;
use client::admin::Admin;
use client::api::v1::column::Values;
use client::api::v1::{Column, ColumnDataType, ColumnDef, CreateExpr, InsertExpr};
use client::api::v1::{Column, ColumnDataType, ColumnDef, CreateTableExpr, InsertExpr, TableId};
use client::{Client, Database, Select};
use indicatif::{MultiProgress, ProgressBar, ProgressStyle};
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::FileReader;
use parquet::file::serialized_reader::SerializedFileReader;
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use tokio::task::JoinSet;
const DATABASE_NAME: &str = "greptime";
@@ -86,14 +82,18 @@ async fn write_data(
pb_style: ProgressStyle,
) -> u128 {
let file = std::fs::File::open(&path).unwrap();
let file_reader = Arc::new(SerializedFileReader::new(file).unwrap());
let row_num = file_reader.metadata().file_metadata().num_rows();
let record_batch_reader = ParquetFileArrowReader::new(file_reader)
.get_record_reader(batch_size)
let record_batch_reader_builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
let row_num = record_batch_reader_builder
.metadata()
.file_metadata()
.num_rows();
let record_batch_reader = record_batch_reader_builder
.with_batch_size(batch_size)
.build()
.unwrap();
let progress_bar = mpb.add(ProgressBar::new(row_num as _));
progress_bar.set_style(pb_style);
progress_bar.set_message(format!("{:?}", path));
progress_bar.set_message(format!("{path:?}"));
let mut total_rpc_elapsed_ms = 0;
@@ -114,10 +114,7 @@ async fn write_data(
progress_bar.inc(row_count as _);
}
progress_bar.finish_with_message(format!(
"file {:?} done in {}ms",
path, total_rpc_elapsed_ms
));
progress_bar.finish_with_message(format!("file {path:?} done in {total_rpc_elapsed_ms}ms",));
total_rpc_elapsed_ms
}
@@ -210,133 +207,134 @@ fn build_values(column: &ArrayRef) -> Values {
| DataType::FixedSizeList(_, _)
| DataType::LargeList(_)
| DataType::Struct(_)
| DataType::Union(_, _)
| DataType::Union(_, _, _)
| DataType::Dictionary(_, _)
| DataType::Decimal(_, _)
| DataType::Decimal128(_, _)
| DataType::Decimal256(_, _)
| DataType::Map(_, _) => todo!(),
}
}
fn create_table_expr() -> CreateExpr {
CreateExpr {
catalog_name: Some(CATALOG_NAME.to_string()),
schema_name: Some(SCHEMA_NAME.to_string()),
fn create_table_expr() -> CreateTableExpr {
CreateTableExpr {
catalog_name: CATALOG_NAME.to_string(),
schema_name: SCHEMA_NAME.to_string(),
table_name: TABLE_NAME.to_string(),
desc: None,
desc: "".to_string(),
column_defs: vec![
ColumnDef {
name: "VendorID".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "tpep_pickup_datetime".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "tpep_dropoff_datetime".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "passenger_count".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "trip_distance".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "RatecodeID".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "store_and_fwd_flag".to_string(),
datatype: ColumnDataType::String as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "PULocationID".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "DOLocationID".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "payment_type".to_string(),
datatype: ColumnDataType::Int64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "fare_amount".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "extra".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "mta_tax".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "tip_amount".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "tolls_amount".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "improvement_surcharge".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "total_amount".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "congestion_surcharge".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "airport_fee".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: true,
default_constraint: None,
default_constraint: vec![],
},
],
time_index: "tpep_pickup_datetime".to_string(),
@@ -344,7 +342,7 @@ fn create_table_expr() -> CreateExpr {
create_if_not_exists: false,
table_options: Default::default(),
region_ids: vec![0],
table_id: Some(0),
table_id: Some(TableId { id: 0 }),
}
}
@@ -353,12 +351,12 @@ fn query_set() -> HashMap<String, String> {
ret.insert(
"count_all".to_string(),
format!("SELECT COUNT(*) FROM {};", TABLE_NAME),
format!("SELECT COUNT(*) FROM {TABLE_NAME};"),
);
ret.insert(
"fare_amt_by_passenger".to_string(),
format!("SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), SUM(fare_amount) FROM {} GROUP BY passenger_count",TABLE_NAME)
format!("SELECT passenger_count, MIN(fare_amount), MAX(fare_amount), SUM(fare_amount) FROM {TABLE_NAME} GROUP BY passenger_count")
);
ret
@@ -371,7 +369,7 @@ async fn do_write(args: &Args, client: &Client) {
let mut write_jobs = JoinSet::new();
let create_table_result = admin.create(create_table_expr()).await;
println!("Create table result: {:?}", create_table_result);
println!("Create table result: {create_table_result:?}");
let progress_bar_style = ProgressStyle::with_template(
"[{elapsed_precise}] {bar:60.cyan/blue} {pos:>7}/{len:7} {msg}",
@@ -404,7 +402,7 @@ async fn do_write(args: &Args, client: &Client) {
async fn do_query(num_iter: usize, db: &Database) {
for (query_name, query) in query_set() {
println!("Running query: {}", query);
println!("Running query: {query}");
for i in 0..num_iter {
let now = Instant::now();
let _res = db.select(Select::Sql(query.clone())).await.unwrap();

View File

@@ -1 +1 @@
nightly-2022-07-14
nightly-2022-12-20

View File

@@ -1,9 +1,8 @@
[package]
name = "api"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
common-base = { path = "../common/base" }

View File

@@ -17,7 +17,7 @@ message AdminResponse {
message AdminExpr {
ExprHeader header = 1;
oneof expr {
CreateExpr create = 2;
CreateTableExpr create_table = 2;
AlterExpr alter = 3;
CreateDatabaseExpr create_database = 4;
DropTableExpr drop_table = 5;
@@ -31,24 +31,23 @@ message AdminResult {
}
}
// TODO(hl): rename to CreateTableExpr
message CreateExpr {
optional string catalog_name = 1;
optional string schema_name = 2;
message CreateTableExpr {
string catalog_name = 1;
string schema_name = 2;
string table_name = 3;
optional string desc = 4;
string desc = 4;
repeated ColumnDef column_defs = 5;
string time_index = 6;
repeated string primary_keys = 7;
bool create_if_not_exists = 8;
map<string, string> table_options = 9;
optional uint32 table_id = 10;
TableId table_id = 10;
repeated uint32 region_ids = 11;
}
message AlterExpr {
optional string catalog_name = 1;
optional string schema_name = 2;
string catalog_name = 1;
string schema_name = 2;
string table_name = 3;
oneof kind {
AddColumns add_columns = 4;
@@ -62,6 +61,11 @@ message DropTableExpr {
string table_name = 3;
}
message CreateDatabaseExpr {
//TODO(hl): maybe rename to schema_name?
string database_name = 1;
}
message AddColumns {
repeated AddColumn add_columns = 1;
}
@@ -79,7 +83,6 @@ message DropColumn {
string name = 1;
}
message CreateDatabaseExpr {
//TODO(hl): maybe rename to schema_name?
string database_name = 1;
message TableId {
uint32 id = 1;
}

View File

@@ -32,7 +32,10 @@ message Column {
repeated int32 date_values = 14;
repeated int64 datetime_values = 15;
repeated int64 ts_millis_values = 16;
repeated int64 ts_second_values = 16;
repeated int64 ts_millisecond_values = 17;
repeated int64 ts_microsecond_values = 18;
repeated int64 ts_nanosecond_values = 19;
}
// The array of non-null values in this column.
//
@@ -56,7 +59,7 @@ message ColumnDef {
string name = 1;
ColumnDataType datatype = 2;
bool is_nullable = 3;
optional bytes default_constraint = 4;
bytes default_constraint = 4;
}
enum ColumnDataType {
@@ -75,5 +78,8 @@ enum ColumnDataType {
STRING = 12;
DATE = 13;
DATETIME = 14;
TIMESTAMP = 15;
TIMESTAMP_SECOND = 15;
TIMESTAMP_MILLISECOND = 16;
TIMESTAMP_MICROSECOND = 17;
TIMESTAMP_NANOSECOND = 18;
}

View File

@@ -15,6 +15,7 @@
use common_base::BitVec;
use common_time::timestamp::TimeUnit;
use datatypes::prelude::ConcreteDataType;
use datatypes::types::TimestampType;
use datatypes::value::Value;
use datatypes::vectors::VectorRef;
use snafu::prelude::*;
@@ -56,7 +57,16 @@ impl From<ColumnDataTypeWrapper> for ConcreteDataType {
ColumnDataType::String => ConcreteDataType::string_datatype(),
ColumnDataType::Date => ConcreteDataType::date_datatype(),
ColumnDataType::Datetime => ConcreteDataType::datetime_datatype(),
ColumnDataType::Timestamp => ConcreteDataType::timestamp_millis_datatype(),
ColumnDataType::TimestampSecond => ConcreteDataType::timestamp_second_datatype(),
ColumnDataType::TimestampMillisecond => {
ConcreteDataType::timestamp_millisecond_datatype()
}
ColumnDataType::TimestampMicrosecond => {
ConcreteDataType::timestamp_microsecond_datatype()
}
ColumnDataType::TimestampNanosecond => {
ConcreteDataType::timestamp_nanosecond_datatype()
}
}
}
}
@@ -81,7 +91,12 @@ impl TryFrom<ConcreteDataType> for ColumnDataTypeWrapper {
ConcreteDataType::String(_) => ColumnDataType::String,
ConcreteDataType::Date(_) => ColumnDataType::Date,
ConcreteDataType::DateTime(_) => ColumnDataType::Datetime,
ConcreteDataType::Timestamp(_) => ColumnDataType::Timestamp,
ConcreteDataType::Timestamp(unit) => match unit {
TimestampType::Second(_) => ColumnDataType::TimestampSecond,
TimestampType::Millisecond(_) => ColumnDataType::TimestampMillisecond,
TimestampType::Microsecond(_) => ColumnDataType::TimestampMicrosecond,
TimestampType::Nanosecond(_) => ColumnDataType::TimestampNanosecond,
},
ConcreteDataType::Null(_) | ConcreteDataType::List(_) => {
return error::IntoColumnDataTypeSnafu { from: datatype }.fail()
}
@@ -153,8 +168,20 @@ impl Values {
datetime_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::Timestamp => Values {
ts_millis_values: Vec::with_capacity(capacity),
ColumnDataType::TimestampSecond => Values {
ts_second_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::TimestampMillisecond => Values {
ts_millisecond_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::TimestampMicrosecond => Values {
ts_microsecond_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::TimestampNanosecond => Values {
ts_nanosecond_values: Vec::with_capacity(capacity),
..Default::default()
},
}
@@ -187,9 +214,12 @@ impl Column {
Value::Binary(val) => values.binary_values.push(val.to_vec()),
Value::Date(val) => values.date_values.push(val.val()),
Value::DateTime(val) => values.datetime_values.push(val.val()),
Value::Timestamp(val) => values
.ts_millis_values
.push(val.convert_to(TimeUnit::Millisecond)),
Value::Timestamp(val) => match val.unit() {
TimeUnit::Second => values.ts_second_values.push(val.value()),
TimeUnit::Millisecond => values.ts_millisecond_values.push(val.value()),
TimeUnit::Microsecond => values.ts_microsecond_values.push(val.value()),
TimeUnit::Nanosecond => values.ts_nanosecond_values.push(val.value()),
},
Value::List(_) => unreachable!(),
});
self.null_mask = null_mask.into_vec();
@@ -200,7 +230,10 @@ impl Column {
mod tests {
use std::sync::Arc;
use datatypes::vectors::BooleanVector;
use datatypes::vectors::{
BooleanVector, TimestampMicrosecondVector, TimestampMillisecondVector,
TimestampNanosecondVector, TimestampSecondVector,
};
use super::*;
@@ -258,8 +291,8 @@ mod tests {
let values = values.datetime_values;
assert_eq!(2, values.capacity());
let values = Values::with_capacity(ColumnDataType::Timestamp, 2);
let values = values.ts_millis_values;
let values = Values::with_capacity(ColumnDataType::TimestampMillisecond, 2);
let values = values.ts_millisecond_values;
assert_eq!(2, values.capacity());
}
@@ -326,8 +359,8 @@ mod tests {
ColumnDataTypeWrapper(ColumnDataType::Datetime).into()
);
assert_eq!(
ConcreteDataType::timestamp_millis_datatype(),
ColumnDataTypeWrapper(ColumnDataType::Timestamp).into()
ConcreteDataType::timestamp_millisecond_datatype(),
ColumnDataTypeWrapper(ColumnDataType::TimestampMillisecond).into()
);
}
@@ -394,8 +427,8 @@ mod tests {
ConcreteDataType::datetime_datatype().try_into().unwrap()
);
assert_eq!(
ColumnDataTypeWrapper(ColumnDataType::Timestamp),
ConcreteDataType::timestamp_millis_datatype()
ColumnDataTypeWrapper(ColumnDataType::TimestampMillisecond),
ConcreteDataType::timestamp_millisecond_datatype()
.try_into()
.unwrap()
);
@@ -412,7 +445,48 @@ mod tests {
assert!(result.is_err());
assert_eq!(
result.unwrap_err().to_string(),
"Failed to create column datatype from List(ListType { inner: Boolean(BooleanType) })"
"Failed to create column datatype from List(ListType { item_type: Boolean(BooleanType) })"
);
}
#[test]
fn test_column_put_timestamp_values() {
let mut column = Column {
column_name: "test".to_string(),
semantic_type: 0,
values: Some(Values {
..Default::default()
}),
null_mask: vec![],
datatype: 0,
};
let vector = Arc::new(TimestampNanosecondVector::from_vec(vec![1, 2, 3]));
column.push_vals(3, vector);
assert_eq!(
vec![1, 2, 3],
column.values.as_ref().unwrap().ts_nanosecond_values
);
let vector = Arc::new(TimestampMillisecondVector::from_vec(vec![4, 5, 6]));
column.push_vals(3, vector);
assert_eq!(
vec![4, 5, 6],
column.values.as_ref().unwrap().ts_millisecond_values
);
let vector = Arc::new(TimestampMicrosecondVector::from_vec(vec![7, 8, 9]));
column.push_vals(3, vector);
assert_eq!(
vec![7, 8, 9],
column.values.as_ref().unwrap().ts_microsecond_values
);
let vector = Arc::new(TimestampSecondVector::from_vec(vec![10, 11, 12]));
column.push_vals(3, vector);
assert_eq!(
vec![10, 11, 12],
column.values.as_ref().unwrap().ts_second_values
);
}

View File

@@ -23,12 +23,13 @@ impl ColumnDef {
pub fn try_as_column_schema(&self) -> Result<ColumnSchema> {
let data_type = ColumnDataTypeWrapper::try_new(self.datatype)?;
let constraint = match &self.default_constraint {
None => None,
Some(v) => Some(
ColumnDefaultConstraint::try_from(&v[..])
let constraint = if self.default_constraint.is_empty() {
None
} else {
Some(
ColumnDefaultConstraint::try_from(self.default_constraint.as_slice())
.context(error::ConvertColumnDefaultConstraintSnafu { column: &self.name })?,
),
)
};
ColumnSchema::new(&self.name, data_type.into(), self.is_nullable)

View File

@@ -1,9 +1,8 @@
[package]
name = "catalog"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
api = { path = "../api" }
@@ -19,9 +18,7 @@ common-recordbatch = { path = "../common/recordbatch" }
common-runtime = { path = "../common/runtime" }
common-telemetry = { path = "../common/telemetry" }
common-time = { path = "../common/time" }
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion.workspace = true
datatypes = { path = "../datatypes" }
futures = "0.3"
futures-util = "0.3"

View File

@@ -17,7 +17,7 @@ use std::any::Any;
use common_error::ext::{BoxedError, ErrorExt};
use common_error::prelude::{Snafu, StatusCode};
use datafusion::error::DataFusionError;
use datatypes::arrow;
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::RawSchema;
use snafu::{Backtrace, ErrorCompat};
@@ -51,14 +51,12 @@ pub enum Error {
SystemCatalog { msg: String, backtrace: Backtrace },
#[snafu(display(
"System catalog table type mismatch, expected: binary, found: {:?} source: {}",
"System catalog table type mismatch, expected: binary, found: {:?}",
data_type,
source
))]
SystemCatalogTypeMismatch {
data_type: arrow::datatypes::DataType,
#[snafu(backtrace)]
source: datatypes::error::Error,
data_type: ConcreteDataType,
backtrace: Backtrace,
},
#[snafu(display("Invalid system catalog entry type: {:?}", entry_type))]
@@ -222,10 +220,11 @@ impl ErrorExt for Error {
| Error::ValueDeserialize { .. }
| Error::Io { .. } => StatusCode::StorageUnavailable,
Error::RegisterTable { .. } => StatusCode::Internal,
Error::RegisterTable { .. } | Error::SystemCatalogTypeMismatch { .. } => {
StatusCode::Internal
}
Error::ReadSystemCatalog { source, .. } => source.status_code(),
Error::SystemCatalogTypeMismatch { source, .. } => source.status_code(),
Error::InvalidCatalogValue { source, .. } => source.status_code(),
Error::TableExists { .. } => StatusCode::TableAlreadyExists,
@@ -265,7 +264,6 @@ impl From<Error> for DataFusionError {
#[cfg(test)]
mod tests {
use common_error::mock::MockError;
use datatypes::arrow::datatypes::DataType;
use snafu::GenerateImplicitData;
use super::*;
@@ -314,11 +312,8 @@ mod tests {
assert_eq!(
StatusCode::Internal,
Error::SystemCatalogTypeMismatch {
data_type: DataType::Boolean,
source: datatypes::error::Error::UnsupportedArrowType {
arrow_type: DataType::Boolean,
backtrace: Backtrace::generate()
}
data_type: ConcreteDataType::binary_datatype(),
backtrace: Backtrace::generate(),
}
.status_code()
);

View File

@@ -33,48 +33,38 @@ const ALPHANUMERICS_NAME_PATTERN: &str = "[a-zA-Z_][a-zA-Z0-9_]*";
lazy_static! {
static ref CATALOG_KEY_PATTERN: Regex = Regex::new(&format!(
"^{}-({})$",
CATALOG_KEY_PREFIX, ALPHANUMERICS_NAME_PATTERN
"^{CATALOG_KEY_PREFIX}-({ALPHANUMERICS_NAME_PATTERN})$"
))
.unwrap();
}
lazy_static! {
static ref SCHEMA_KEY_PATTERN: Regex = Regex::new(&format!(
"^{}-({})-({})$",
SCHEMA_KEY_PREFIX, ALPHANUMERICS_NAME_PATTERN, ALPHANUMERICS_NAME_PATTERN
"^{SCHEMA_KEY_PREFIX}-({ALPHANUMERICS_NAME_PATTERN})-({ALPHANUMERICS_NAME_PATTERN})$"
))
.unwrap();
}
lazy_static! {
static ref TABLE_GLOBAL_KEY_PATTERN: Regex = Regex::new(&format!(
"^{}-({})-({})-({})$",
TABLE_GLOBAL_KEY_PREFIX,
ALPHANUMERICS_NAME_PATTERN,
ALPHANUMERICS_NAME_PATTERN,
ALPHANUMERICS_NAME_PATTERN
"^{TABLE_GLOBAL_KEY_PREFIX}-({ALPHANUMERICS_NAME_PATTERN})-({ALPHANUMERICS_NAME_PATTERN})-({ALPHANUMERICS_NAME_PATTERN})$"
))
.unwrap();
}
lazy_static! {
static ref TABLE_REGIONAL_KEY_PATTERN: Regex = Regex::new(&format!(
"^{}-({})-({})-({})-([0-9]+)$",
TABLE_REGIONAL_KEY_PREFIX,
ALPHANUMERICS_NAME_PATTERN,
ALPHANUMERICS_NAME_PATTERN,
ALPHANUMERICS_NAME_PATTERN
"^{TABLE_REGIONAL_KEY_PREFIX}-({ALPHANUMERICS_NAME_PATTERN})-({ALPHANUMERICS_NAME_PATTERN})-({ALPHANUMERICS_NAME_PATTERN})-([0-9]+)$"
))
.unwrap();
}
pub fn build_catalog_prefix() -> String {
format!("{}-", CATALOG_KEY_PREFIX)
format!("{CATALOG_KEY_PREFIX}-")
}
pub fn build_schema_prefix(catalog_name: impl AsRef<str>) -> String {
format!("{}-{}-", SCHEMA_KEY_PREFIX, catalog_name.as_ref())
format!("{SCHEMA_KEY_PREFIX}-{}-", catalog_name.as_ref())
}
pub fn build_table_global_prefix(
@@ -82,8 +72,7 @@ pub fn build_table_global_prefix(
schema_name: impl AsRef<str>,
) -> String {
format!(
"{}-{}-{}-",
TABLE_GLOBAL_KEY_PREFIX,
"{TABLE_GLOBAL_KEY_PREFIX}-{}-{}-",
catalog_name.as_ref(),
schema_name.as_ref()
)
@@ -138,7 +127,7 @@ impl TableGlobalKey {
/// Table global info contains necessary info for a datanode to create table regions, including
/// table id, table meta(schema...), region id allocation across datanodes.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
pub struct TableGlobalValue {
/// Id of datanode that created the global table info kv. only for debugging.
pub node_id: u64,
@@ -378,7 +367,7 @@ mod tests {
table_info,
};
let serialized = serde_json::to_string(&value).unwrap();
let deserialized = TableGlobalValue::parse(&serialized).unwrap();
let deserialized = TableGlobalValue::parse(serialized).unwrap();
assert_eq!(value, deserialized);
}
}

View File

@@ -157,7 +157,7 @@ pub struct RegisterSchemaRequest {
/// Formats table fully-qualified name
pub fn format_full_table_name(catalog: &str, schema: &str, table: &str) -> String {
format!("{}.{}.{}", catalog, schema, table)
format!("{catalog}.{schema}.{table}")
}
pub trait CatalogProviderFactory {
@@ -187,8 +187,7 @@ pub(crate) async fn handle_system_table_request<'a, M: CatalogManager>(
.await
.with_context(|_| CreateTableSnafu {
table_info: format!(
"{}.{}.{}, id: {}",
catalog_name, schema_name, table_name, table_id,
"{catalog_name}.{schema_name}.{table_name}, id: {table_id}",
),
})?;
manager
@@ -200,7 +199,7 @@ pub(crate) async fn handle_system_table_request<'a, M: CatalogManager>(
table: table.clone(),
})
.await?;
info!("Created and registered system table: {}", table_name);
info!("Created and registered system table: {table_name}");
table
};
if let Some(hook) = req.open_hook {

View File

@@ -145,27 +145,34 @@ impl LocalCatalogManager {
/// Convert `RecordBatch` to a vector of `Entry`.
fn record_batch_to_entry(rb: RecordBatch) -> Result<Vec<Entry>> {
ensure!(
rb.df_recordbatch.columns().len() >= 6,
rb.num_columns() >= 6,
SystemCatalogSnafu {
msg: format!("Length mismatch: {}", rb.df_recordbatch.columns().len())
msg: format!("Length mismatch: {}", rb.num_columns())
}
);
let entry_type = UInt8Vector::try_from_arrow_array(&rb.df_recordbatch.columns()[0])
.with_context(|_| SystemCatalogTypeMismatchSnafu {
data_type: rb.df_recordbatch.columns()[ENTRY_TYPE_INDEX]
.data_type()
.clone(),
let entry_type = rb
.column(ENTRY_TYPE_INDEX)
.as_any()
.downcast_ref::<UInt8Vector>()
.with_context(|| SystemCatalogTypeMismatchSnafu {
data_type: rb.column(ENTRY_TYPE_INDEX).data_type(),
})?;
let key = BinaryVector::try_from_arrow_array(&rb.df_recordbatch.columns()[1])
.with_context(|_| SystemCatalogTypeMismatchSnafu {
data_type: rb.df_recordbatch.columns()[KEY_INDEX].data_type().clone(),
let key = rb
.column(KEY_INDEX)
.as_any()
.downcast_ref::<BinaryVector>()
.with_context(|| SystemCatalogTypeMismatchSnafu {
data_type: rb.column(KEY_INDEX).data_type(),
})?;
let value = BinaryVector::try_from_arrow_array(&rb.df_recordbatch.columns()[3])
.with_context(|_| SystemCatalogTypeMismatchSnafu {
data_type: rb.df_recordbatch.columns()[VALUE_INDEX].data_type().clone(),
let value = rb
.column(VALUE_INDEX)
.as_any()
.downcast_ref::<BinaryVector>()
.with_context(|| SystemCatalogTypeMismatchSnafu {
data_type: rb.column(VALUE_INDEX).data_type(),
})?;
let mut res = Vec::with_capacity(rb.num_rows());
@@ -331,7 +338,7 @@ impl CatalogManager for LocalCatalogManager {
let schema = catalog
.schema(schema_name)?
.with_context(|| SchemaNotFoundSnafu {
schema_info: format!("{}.{}", catalog_name, schema_name),
schema_info: format!("{catalog_name}.{schema_name}"),
})?;
{
@@ -445,7 +452,7 @@ impl CatalogManager for LocalCatalogManager {
let schema = catalog
.schema(schema_name)?
.with_context(|| SchemaNotFoundSnafu {
schema_info: format!("{}.{}", catalog_name, schema_name),
schema_info: format!("{catalog_name}.{schema_name}"),
})?;
schema.table(table_name)
}

View File

@@ -331,10 +331,7 @@ impl RemoteCatalogManager {
.open_table(&context, request)
.await
.with_context(|_| OpenTableSnafu {
table_info: format!(
"{}.{}.{}, id:{}",
catalog_name, schema_name, table_name, table_id
),
table_info: format!("{catalog_name}.{schema_name}.{table_name}, id:{table_id}"),
})? {
Some(table) => {
info!(
@@ -355,7 +352,7 @@ impl RemoteCatalogManager {
.clone()
.try_into()
.context(InvalidTableSchemaSnafu {
table_info: format!("{}.{}.{}", catalog_name, schema_name, table_name,),
table_info: format!("{catalog_name}.{schema_name}.{table_name}"),
schema: meta.schema.clone(),
})?;
let req = CreateTableRequest {
@@ -477,7 +474,7 @@ impl CatalogManager for RemoteCatalogManager {
let schema = catalog
.schema(schema_name)?
.with_context(|| SchemaNotFoundSnafu {
schema_info: format!("{}.{}", catalog_name, schema_name),
schema_info: format!("{catalog_name}.{schema_name}"),
})?;
schema.table(table_name)
}

View File

@@ -21,14 +21,13 @@ use common_catalog::consts::{
SYSTEM_CATALOG_TABLE_ID, SYSTEM_CATALOG_TABLE_NAME,
};
use common_query::logical_plan::Expr;
use common_query::physical_plan::{PhysicalPlanRef, RuntimeEnv};
use common_query::physical_plan::{PhysicalPlanRef, SessionContext};
use common_recordbatch::SendableRecordBatchStream;
use common_telemetry::debug;
use common_time::timestamp::Timestamp;
use common_time::util;
use datatypes::prelude::{ConcreteDataType, ScalarVector};
use datatypes::schema::{ColumnSchema, Schema, SchemaBuilder, SchemaRef};
use datatypes::vectors::{BinaryVector, TimestampVector, UInt8Vector};
use datatypes::vectors::{BinaryVector, TimestampMillisecondVector, UInt8Vector};
use serde::{Deserialize, Serialize};
use snafu::{ensure, OptionExt, ResultExt};
use table::engine::{EngineContext, TableEngineRef};
@@ -62,7 +61,7 @@ impl Table for SystemCatalogTable {
async fn scan(
&self,
_projection: &Option<Vec<usize>>,
_projection: Option<&Vec<usize>>,
_filters: &[Expr],
_limit: Option<usize>,
) -> table::Result<PhysicalPlanRef> {
@@ -127,13 +126,14 @@ impl SystemCatalogTable {
/// Create a stream of all entries inside system catalog table
pub async fn records(&self) -> Result<SendableRecordBatchStream> {
let full_projection = None;
let ctx = SessionContext::new();
let scan = self
.table
.scan(&full_projection, &[], None)
.scan(full_projection, &[], None)
.await
.context(error::SystemCatalogTableScanSnafu)?;
let stream = scan
.execute(0, Arc::new(RuntimeEnv::default()))
.execute(0, ctx.task_ctx())
.context(error::SystemCatalogTableScanExecSnafu)?;
Ok(stream)
}
@@ -161,7 +161,7 @@ fn build_system_catalog_schema() -> Schema {
),
ColumnSchema::new(
"timestamp".to_string(),
ConcreteDataType::timestamp_millis_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
false,
)
.with_time_index(true),
@@ -172,12 +172,12 @@ fn build_system_catalog_schema() -> Schema {
),
ColumnSchema::new(
"gmt_created".to_string(),
ConcreteDataType::timestamp_millis_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
ColumnSchema::new(
"gmt_modified".to_string(),
ConcreteDataType::timestamp_millis_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
];
@@ -197,7 +197,7 @@ pub fn build_table_insert_request(full_table_name: String, table_id: TableId) ->
}
pub fn build_schema_insert_request(catalog_name: String, schema_name: String) -> InsertRequest {
let full_schema_name = format!("{}.{}", catalog_name, schema_name);
let full_schema_name = format!("{catalog_name}.{schema_name}");
build_insert_request(
EntryType::Schema,
full_schema_name.as_bytes(),
@@ -222,7 +222,7 @@ pub fn build_insert_request(entry_type: EntryType, key: &[u8], value: &[u8]) ->
// Timestamp in key part is intentionally left to 0
columns_values.insert(
"timestamp".to_string(),
Arc::new(TimestampVector::from_slice(&[Timestamp::from_millis(0)])) as _,
Arc::new(TimestampMillisecondVector::from_slice(&[0])) as _,
);
columns_values.insert(
@@ -230,18 +230,15 @@ pub fn build_insert_request(entry_type: EntryType, key: &[u8], value: &[u8]) ->
Arc::new(BinaryVector::from_slice(&[value])) as _,
);
let now = util::current_time_millis();
columns_values.insert(
"gmt_created".to_string(),
Arc::new(TimestampVector::from_slice(&[Timestamp::from_millis(
util::current_time_millis(),
)])) as _,
Arc::new(TimestampMillisecondVector::from_slice(&[now])) as _,
);
columns_values.insert(
"gmt_modified".to_string(),
Arc::new(TimestampVector::from_slice(&[Timestamp::from_millis(
util::current_time_millis(),
)])) as _,
Arc::new(TimestampMillisecondVector::from_slice(&[now])) as _,
);
InsertRequest {
@@ -393,7 +390,7 @@ mod tests {
if let Entry::Catalog(e) = entry {
assert_eq!("some_catalog", e.catalog_name);
} else {
panic!("Unexpected type: {:?}", entry);
panic!("Unexpected type: {entry:?}");
}
}
@@ -410,7 +407,7 @@ mod tests {
assert_eq!("some_catalog", e.catalog_name);
assert_eq!("some_schema", e.schema_name);
} else {
panic!("Unexpected type: {:?}", entry);
panic!("Unexpected type: {entry:?}");
}
}
@@ -429,7 +426,7 @@ mod tests {
assert_eq!("some_table", e.table_name);
assert_eq!(42, e.table_id);
} else {
panic!("Unexpected type: {:?}", entry);
panic!("Unexpected type: {entry:?}");
}
}

View File

@@ -26,9 +26,9 @@ use common_query::logical_plan::Expr;
use common_query::physical_plan::PhysicalPlanRef;
use common_recordbatch::error::Result as RecordBatchResult;
use common_recordbatch::{RecordBatch, RecordBatchStream};
use datatypes::prelude::{ConcreteDataType, VectorBuilder};
use datatypes::prelude::{ConcreteDataType, DataType};
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::value::Value;
use datatypes::value::ValueRef;
use datatypes::vectors::VectorRef;
use futures::Stream;
use snafu::ResultExt;
@@ -77,7 +77,7 @@ impl Table for Tables {
async fn scan(
&self,
_projection: &Option<Vec<usize>>,
_projection: Option<&Vec<usize>>,
_filters: &[Expr],
_limit: Option<usize>,
) -> table::error::Result<PhysicalPlanRef> {
@@ -149,26 +149,33 @@ fn tables_to_record_batch(
engine: &str,
) -> Vec<VectorRef> {
let mut catalog_vec =
VectorBuilder::with_capacity(ConcreteDataType::string_datatype(), table_names.len());
ConcreteDataType::string_datatype().create_mutable_vector(table_names.len());
let mut schema_vec =
VectorBuilder::with_capacity(ConcreteDataType::string_datatype(), table_names.len());
ConcreteDataType::string_datatype().create_mutable_vector(table_names.len());
let mut table_name_vec =
VectorBuilder::with_capacity(ConcreteDataType::string_datatype(), table_names.len());
ConcreteDataType::string_datatype().create_mutable_vector(table_names.len());
let mut engine_vec =
VectorBuilder::with_capacity(ConcreteDataType::string_datatype(), table_names.len());
ConcreteDataType::string_datatype().create_mutable_vector(table_names.len());
for table_name in table_names {
catalog_vec.push(&Value::String(catalog_name.into()));
schema_vec.push(&Value::String(schema_name.into()));
table_name_vec.push(&Value::String(table_name.into()));
engine_vec.push(&Value::String(engine.into()));
// Safety: All these vectors are string type.
catalog_vec
.push_value_ref(ValueRef::String(catalog_name))
.unwrap();
schema_vec
.push_value_ref(ValueRef::String(schema_name))
.unwrap();
table_name_vec
.push_value_ref(ValueRef::String(&table_name))
.unwrap();
engine_vec.push_value_ref(ValueRef::String(engine)).unwrap();
}
vec![
catalog_vec.finish(),
schema_vec.finish(),
table_name_vec.finish(),
engine_vec.finish(),
catalog_vec.to_vector(),
schema_vec.to_vector(),
table_name_vec.to_vector(),
engine_vec.to_vector(),
]
}
@@ -340,9 +347,7 @@ fn build_schema_for_tables() -> Schema {
#[cfg(test)]
mod tests {
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_query::physical_plan::RuntimeEnv;
use datatypes::arrow::array::Utf8Array;
use datatypes::arrow::datatypes::DataType;
use common_query::physical_plan::SessionContext;
use futures_util::StreamExt;
use table::table::numbers::NumbersTable;
@@ -365,57 +370,48 @@ mod tests {
.unwrap();
let tables = Tables::new(catalog_list, "test_engine".to_string());
let tables_stream = tables.scan(&None, &[], None).await.unwrap();
let mut tables_stream = tables_stream
.execute(0, Arc::new(RuntimeEnv::default()))
.unwrap();
let tables_stream = tables.scan(None, &[], None).await.unwrap();
let session_ctx = SessionContext::new();
let mut tables_stream = tables_stream.execute(0, session_ctx.task_ctx()).unwrap();
if let Some(t) = tables_stream.next().await {
let batch = t.unwrap().df_recordbatch;
let batch = t.unwrap();
assert_eq!(1, batch.num_rows());
assert_eq!(4, batch.num_columns());
assert_eq!(&DataType::Utf8, batch.column(0).data_type());
assert_eq!(&DataType::Utf8, batch.column(1).data_type());
assert_eq!(&DataType::Utf8, batch.column(2).data_type());
assert_eq!(&DataType::Utf8, batch.column(3).data_type());
assert_eq!(
ConcreteDataType::string_datatype(),
batch.column(0).data_type()
);
assert_eq!(
ConcreteDataType::string_datatype(),
batch.column(1).data_type()
);
assert_eq!(
ConcreteDataType::string_datatype(),
batch.column(2).data_type()
);
assert_eq!(
ConcreteDataType::string_datatype(),
batch.column(3).data_type()
);
assert_eq!(
"greptime",
batch
.column(0)
.as_any()
.downcast_ref::<Utf8Array<i32>>()
.unwrap()
.value(0)
batch.column(0).get_ref(0).as_string().unwrap().unwrap()
);
assert_eq!(
"public",
batch
.column(1)
.as_any()
.downcast_ref::<Utf8Array<i32>>()
.unwrap()
.value(0)
batch.column(1).get_ref(0).as_string().unwrap().unwrap()
);
assert_eq!(
"test_table",
batch
.column(2)
.as_any()
.downcast_ref::<Utf8Array<i32>>()
.unwrap()
.value(0)
batch.column(2).get_ref(0).as_string().unwrap().unwrap()
);
assert_eq!(
"test_engine",
batch
.column(3)
.as_any()
.downcast_ref::<Utf8Array<i32>>()
.unwrap()
.value(0)
batch.column(3).get_ref(0).as_string().unwrap().unwrap()
);
} else {
panic!("Record batch should not be empty!")

View File

@@ -69,8 +69,7 @@ mod tests {
assert!(
err.to_string()
.contains("Table `greptime.public.test_table` already exists"),
"Actual error message: {}",
err
"Actual error message: {err}",
);
}

View File

@@ -189,10 +189,10 @@ impl TableEngine for MockTableEngine {
unimplemented!()
}
fn get_table<'a>(
fn get_table(
&self,
_ctx: &EngineContext,
table_ref: &'a TableReference,
table_ref: &TableReference,
) -> table::Result<Option<TableRef>> {
futures::executor::block_on(async {
Ok(self
@@ -204,7 +204,7 @@ impl TableEngine for MockTableEngine {
})
}
fn table_exists<'a>(&self, _ctx: &EngineContext, table_ref: &'a TableReference) -> bool {
fn table_exists(&self, _ctx: &EngineContext, table_ref: &TableReference) -> bool {
futures::executor::block_on(async {
self.tables
.read()

View File

@@ -1,9 +1,8 @@
[package]
name = "client"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
api = { path = "../api" }
@@ -15,9 +14,7 @@ common-grpc-expr = { path = "../common/grpc-expr" }
common-query = { path = "../common/query" }
common-recordbatch = { path = "../common/recordbatch" }
common-time = { path = "../common/time" }
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion.workspace = true
datatypes = { path = "../datatypes" }
enum_dispatch = "0.3"
parking_lot = "0.12"

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use api::v1::{ColumnDataType, ColumnDef, CreateExpr};
use api::v1::{ColumnDataType, ColumnDef, CreateTableExpr, TableId};
use client::admin::Admin;
use client::{Client, Database};
use prost_09::Message;
@@ -33,36 +33,36 @@ fn main() {
async fn run() {
let client = Client::with_urls(vec!["127.0.0.1:3001"]);
let create_table_expr = CreateExpr {
catalog_name: Some("greptime".to_string()),
schema_name: Some("public".to_string()),
let create_table_expr = CreateTableExpr {
catalog_name: "greptime".to_string(),
schema_name: "public".to_string(),
table_name: "test_logical_dist_exec".to_string(),
desc: None,
desc: "".to_string(),
column_defs: vec![
ColumnDef {
name: "timestamp".to_string(),
datatype: ColumnDataType::Timestamp as i32,
datatype: ColumnDataType::TimestampMillisecond as i32,
is_nullable: false,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "key".to_string(),
datatype: ColumnDataType::Uint64 as i32,
is_nullable: false,
default_constraint: None,
default_constraint: vec![],
},
ColumnDef {
name: "value".to_string(),
datatype: ColumnDataType::Uint64 as i32,
is_nullable: false,
default_constraint: None,
default_constraint: vec![],
},
],
time_index: "timestamp".to_string(),
primary_keys: vec!["key".to_string()],
create_if_not_exists: false,
table_options: Default::default(),
table_id: Some(1024),
table_id: Some(TableId { id: 1024 }),
region_ids: vec![0],
};

View File

@@ -34,13 +34,13 @@ impl Admin {
}
}
pub async fn create(&self, expr: CreateExpr) -> Result<AdminResult> {
pub async fn create(&self, expr: CreateTableExpr) -> Result<AdminResult> {
let header = ExprHeader {
version: PROTOCOL_VERSION,
};
let expr = AdminExpr {
header: Some(header),
expr: Some(admin_expr::Expr::Create(expr)),
expr: Some(admin_expr::Expr::CreateTable(expr)),
};
self.do_request(expr).await
}

View File

@@ -318,12 +318,11 @@ mod tests {
fn create_test_column(vector: VectorRef) -> Column {
let wrapper: ColumnDataTypeWrapper = vector.data_type().try_into().unwrap();
let array = vector.to_arrow_array();
Column {
column_name: "test".to_string(),
semantic_type: 1,
values: Some(values(&[array.clone()]).unwrap()),
null_mask: null_mask(&vec![array], vector.len()),
values: Some(values(&[vector.clone()]).unwrap()),
null_mask: null_mask(&[vector.clone()], vector.len()),
datatype: wrapper.datatype() as i32,
}
}

View File

@@ -1,9 +1,9 @@
[package]
name = "cmd"
version = "0.1.0"
edition = "2021"
version.workspace = true
edition.workspace = true
license.workspace = true
default-run = "greptime"
license = "Apache-2.0"
[[bin]]
name = "greptime"

View File

@@ -14,7 +14,6 @@
use std::sync::Arc;
use anymap::AnyMap;
use clap::Parser;
use frontend::frontend::{Frontend, FrontendOptions};
use frontend::grpc::GrpcOptions;
@@ -23,6 +22,7 @@ use frontend::instance::Instance;
use frontend::mysql::MysqlOptions;
use frontend::opentsdb::OpentsdbOptions;
use frontend::postgres::PostgresOptions;
use frontend::Plugins;
use meta_client::MetaClientOpts;
use servers::auth::UserProviderRef;
use servers::http::HttpOptions;
@@ -88,21 +88,21 @@ pub struct StartCommand {
impl StartCommand {
async fn run(self) -> Result<()> {
let plugins = load_frontend_plugins(&self.user_provider)?;
let plugins = Arc::new(load_frontend_plugins(&self.user_provider)?);
let opts: FrontendOptions = self.try_into()?;
let mut frontend = Frontend::new(
opts.clone(),
Instance::try_new_distributed(&opts)
.await
.context(error::StartFrontendSnafu)?,
plugins,
);
let mut instance = Instance::try_new_distributed(&opts)
.await
.context(error::StartFrontendSnafu)?;
instance.set_plugins(plugins.clone());
let mut frontend = Frontend::new(opts, instance, plugins);
frontend.start().await.context(error::StartFrontendSnafu)
}
}
pub fn load_frontend_plugins(user_provider: &Option<String>) -> Result<AnyMap> {
let mut plugins = AnyMap::new();
pub fn load_frontend_plugins(user_provider: &Option<String>) -> Result<Plugins> {
let mut plugins = Plugins::new();
if let Some(provider) = user_provider {
let provider = auth::user_provider_from_option(provider).context(IllegalAuthConfigSnafu)?;
@@ -138,14 +138,14 @@ impl TryFrom<StartCommand> for FrontendOptions {
if let Some(addr) = cmd.mysql_addr {
opts.mysql_options = Some(MysqlOptions {
addr,
tls: Arc::new(tls_option.clone()),
tls: tls_option.clone(),
..Default::default()
});
}
if let Some(addr) = cmd.postgres_addr {
opts.postgres_options = Some(PostgresOptions {
addr,
tls: Arc::new(tls_option),
tls: tls_option,
..Default::default()
});
}

View File

@@ -14,7 +14,6 @@
use std::sync::Arc;
use anymap::AnyMap;
use clap::Parser;
use common_telemetry::info;
use datanode::datanode::{Datanode, DatanodeOptions, ObjectStoreConfig};
@@ -27,6 +26,7 @@ use frontend::mysql::MysqlOptions;
use frontend::opentsdb::OpentsdbOptions;
use frontend::postgres::PostgresOptions;
use frontend::prometheus::PrometheusOptions;
use frontend::Plugins;
use serde::{Deserialize, Serialize};
use servers::http::HttpOptions;
use servers::tls::{TlsMode, TlsOption};
@@ -152,7 +152,7 @@ impl StartCommand {
async fn run(self) -> Result<()> {
let enable_memory_catalog = self.enable_memory_catalog;
let config_file = self.config_file.clone();
let plugins = load_frontend_plugins(&self.user_provider)?;
let plugins = Arc::new(load_frontend_plugins(&self.user_provider)?);
let fe_opts = FrontendOptions::try_from(self)?;
let dn_opts: DatanodeOptions = {
let mut opts: StandaloneOptions = if let Some(path) = config_file {
@@ -189,11 +189,12 @@ impl StartCommand {
/// Build frontend instance in standalone mode
async fn build_frontend(
fe_opts: FrontendOptions,
plugins: AnyMap,
plugins: Arc<Plugins>,
datanode_instance: InstanceRef,
) -> Result<Frontend<FeInstance>> {
let mut frontend_instance = FeInstance::new_standalone(datanode_instance.clone());
frontend_instance.set_script_handler(datanode_instance);
frontend_instance.set_plugins(plugins.clone());
Ok(Frontend::new(fe_opts, frontend_instance, plugins))
}
@@ -223,8 +224,7 @@ impl TryFrom<StartCommand> for FrontendOptions {
if addr == datanode_grpc_addr {
return IllegalConfigSnafu {
msg: format!(
"gRPC listen address conflicts with datanode reserved gRPC addr: {}",
datanode_grpc_addr
"gRPC listen address conflicts with datanode reserved gRPC addr: {datanode_grpc_addr}",
),
}
.fail();
@@ -262,12 +262,12 @@ impl TryFrom<StartCommand> for FrontendOptions {
let tls_option = TlsOption::new(cmd.tls_mode, cmd.tls_cert_path, cmd.tls_key_path);
if let Some(mut mysql_options) = opts.mysql_options {
mysql_options.tls = Arc::new(tls_option.clone());
mysql_options.tls = tls_option.clone();
opts.mysql_options = Some(mysql_options);
}
if let Some(mut postgres_options) = opts.postgres_options {
postgres_options.tls = Arc::new(tls_option);
postgres_options.tls = tls_option;
opts.postgres_options = Some(postgres_options);
}

View File

@@ -1,8 +1,8 @@
[package]
name = "common-base"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
bitvec = "1.0"

View File

@@ -1,8 +1,8 @@
[package]
name = "common-catalog"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
async-trait = "0.1"

View File

@@ -1,8 +1,8 @@
[package]
name = "common-error"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
snafu = { version = "0.7", features = ["backtraces"] }

View File

@@ -131,7 +131,7 @@ mod tests {
assert!(ErrorCompat::backtrace(&err).is_some());
let msg = format!("{:?}", err);
let msg = format!("{err:?}");
assert!(msg.contains("\nBacktrace:\n"));
let fmt_msg = format!("{:?}", DebugFormat::new(&err));
assert_eq!(msg, fmt_msg);
@@ -151,7 +151,7 @@ mod tests {
assert!(err.as_any().downcast_ref::<MockError>().is_some());
assert!(err.source().is_some());
let msg = format!("{:?}", err);
let msg = format!("{err:?}");
assert!(msg.contains("\nBacktrace:\n"));
assert!(msg.contains("Caused by"));

View File

@@ -31,11 +31,11 @@ impl<'a, E: ErrorExt + ?Sized> fmt::Debug for DebugFormat<'a, E> {
write!(f, "{}.", self.0)?;
if let Some(source) = self.0.source() {
// Source error use debug format for more verbose info.
write!(f, " Caused by: {:?}", source)?;
write!(f, " Caused by: {source:?}")?;
}
if let Some(backtrace) = self.0.backtrace_opt() {
// Add a newline to separate causes and backtrace.
write!(f, "\nBacktrace:\n{}", backtrace)?;
write!(f, "\nBacktrace:\n{backtrace}")?;
}
Ok(())

View File

@@ -51,6 +51,7 @@ pub enum StatusCode {
TableNotFound = 4001,
TableColumnNotFound = 4002,
TableColumnExists = 4003,
DatabaseNotFound = 4004,
// ====== End of catalog related status code =======
// ====== Begin of storage related status code =====
@@ -86,7 +87,7 @@ impl StatusCode {
impl fmt::Display for StatusCode {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
// The current debug format is suitable to display.
write!(f, "{:?}", self)
write!(f, "{self:?}")
}
}
@@ -95,7 +96,7 @@ mod tests {
use super::*;
fn assert_status_code_display(code: StatusCode, msg: &str) {
let code_msg = format!("{}", code);
let code_msg = format!("{code}");
assert_eq!(msg, code_msg);
}

View File

@@ -1,8 +1,8 @@
[package]
name = "common-function-macro"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[lib]
proc-macro = true

View File

@@ -1,8 +1,8 @@
[package]
edition = "2021"
name = "common-function"
version = "0.1.0"
license = "Apache-2.0"
edition.workspace = true
version.workspace = true
license.workspace = true
[dependencies]
arc-swap = "1.0"
@@ -11,7 +11,7 @@ common-error = { path = "../error" }
common-function-macro = { path = "../function-macro" }
common-query = { path = "../query" }
common-time = { path = "../time" }
datafusion-common = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2" }
datafusion.workspace = true
datatypes = { path = "../../datatypes" }
libc = "0.2"
num = "0.4"

View File

@@ -1,69 +0,0 @@
// Copyright 2022 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use common_error::prelude::*;
pub use common_query::error::{Error, Result};
use datatypes::error::Error as DataTypeError;
#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum InnerError {
#[snafu(display("Fail to get scalar vector, {}", source))]
GetScalarVector {
source: DataTypeError,
backtrace: Backtrace,
},
}
impl ErrorExt for InnerError {
fn backtrace_opt(&self) -> Option<&Backtrace> {
ErrorCompat::backtrace(self)
}
fn as_any(&self) -> &dyn Any {
self
}
}
impl From<InnerError> for Error {
fn from(err: InnerError) -> Self {
Self::new(err)
}
}
#[cfg(test)]
mod tests {
use snafu::GenerateImplicitData;
use super::*;
fn raise_datatype_error() -> std::result::Result<(), DataTypeError> {
Err(DataTypeError::Conversion {
from: "test".to_string(),
backtrace: Backtrace::generate(),
})
}
#[test]
fn test_get_scalar_vector_error() {
let err: Error = raise_datatype_error()
.context(GetScalarVectorSnafu)
.err()
.unwrap()
.into();
assert!(err.backtrace_opt().is_some());
}
}

View File

@@ -12,5 +12,4 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub mod error;
pub mod scalars;

View File

@@ -23,6 +23,5 @@ pub(crate) mod test;
mod timestamp;
pub mod udf;
pub use aggregate::MedianAccumulatorCreator;
pub use function::{Function, FunctionRef};
pub use function_registry::{FunctionRegistry, FUNCTION_REGISTRY};

View File

@@ -16,7 +16,6 @@ mod argmax;
mod argmin;
mod diff;
mod mean;
mod median;
mod percentile;
mod polyval;
mod scipy_stats_norm_cdf;
@@ -29,7 +28,6 @@ pub use argmin::ArgminAccumulatorCreator;
use common_query::logical_plan::AggregateFunctionCreatorRef;
pub use diff::DiffAccumulatorCreator;
pub use mean::MeanAccumulatorCreator;
pub use median::MedianAccumulatorCreator;
pub use percentile::PercentileAccumulatorCreator;
pub use polyval::PolyvalAccumulatorCreator;
pub use scipy_stats_norm_cdf::ScipyStatsNormCdfAccumulatorCreator;
@@ -88,7 +86,6 @@ impl AggregateFunctions {
};
}
register_aggr_func!("median", 1, MedianAccumulatorCreator);
register_aggr_func!("diff", 1, DiffAccumulatorCreator);
register_aggr_func!("mean", 1, MeanAccumulatorCreator);
register_aggr_func!("polyval", 2, PolyvalAccumulatorCreator);

View File

@@ -20,24 +20,22 @@ use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Resul
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::vectors::ConstantVector;
use datatypes::types::{LogicalPrimitiveType, WrapperType};
use datatypes::vectors::{ConstantVector, Helper};
use datatypes::with_match_primitive_type_id;
use snafu::ensure;
// https://numpy.org/doc/stable/reference/generated/numpy.argmax.html
// return the index of the max value
#[derive(Debug, Default)]
pub struct Argmax<T>
where
T: Primitive + PartialOrd,
{
pub struct Argmax<T> {
max: Option<T>,
n: u64,
}
impl<T> Argmax<T>
where
T: Primitive + PartialOrd,
T: PartialOrd + Copy,
{
fn update(&mut self, value: T, index: u64) {
if let Some(Ordering::Less) = self.max.partial_cmp(&Some(value)) {
@@ -49,8 +47,7 @@ where
impl<T> Accumulator for Argmax<T>
where
T: Primitive + PartialOrd,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType + PartialOrd,
{
fn state(&self) -> Result<Vec<Value>> {
match self.max {
@@ -66,10 +63,10 @@ where
let column = &values[0];
let column: &<T as Scalar>::VectorType = if column.is_const() {
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
for (i, v) in column.iter_data().enumerate() {
if let Some(value) = v {
@@ -93,8 +90,8 @@ where
let max = &states[0];
let index = &states[1];
let max: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(max) };
let index: &<u64 as Scalar>::VectorType = unsafe { VectorHelper::static_cast(index) };
let max: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(max) };
let index: &<u64 as Scalar>::VectorType = unsafe { Helper::static_cast(index) };
index
.iter_data()
.flatten()
@@ -122,7 +119,7 @@ impl AggregateFunctionCreator for ArgmaxAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Argmax::<$S>::default()))
Ok(Box::new(Argmax::<<$S as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -154,7 +151,7 @@ impl AggregateFunctionCreator for ArgmaxAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::Int32Vector;
use super::*;
#[test]
@@ -166,21 +163,19 @@ mod test {
// test update one not-null value
let mut argmax = Argmax::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Some(42)]))];
assert!(argmax.update_batch(&v).is_ok());
assert_eq!(Value::from(0_u64), argmax.evaluate().unwrap());
// test update one null value
let mut argmax = Argmax::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Option::<i32>::None,
]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Option::<i32>::None]))];
assert!(argmax.update_batch(&v).is_ok());
assert_eq!(Value::Null, argmax.evaluate().unwrap());
// test update no null-value batch
let mut argmax = Argmax::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-1i32),
Some(1),
Some(3),
@@ -190,7 +185,7 @@ mod test {
// test update null-value batch
let mut argmax = Argmax::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-2i32),
None,
Some(4),
@@ -201,7 +196,7 @@ mod test {
// test update with constant vector
let mut argmax = Argmax::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
10,
))];
assert!(argmax.update_batch(&v).is_ok());

View File

@@ -20,23 +20,20 @@ use common_query::error::{BadAccumulatorImplSnafu, CreateAccumulatorSnafu, Resul
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::vectors::ConstantVector;
use datatypes::vectors::{ConstantVector, Helper};
use datatypes::with_match_primitive_type_id;
use snafu::ensure;
// // https://numpy.org/doc/stable/reference/generated/numpy.argmin.html
#[derive(Debug, Default)]
pub struct Argmin<T>
where
T: Primitive + PartialOrd,
{
pub struct Argmin<T> {
min: Option<T>,
n: u32,
}
impl<T> Argmin<T>
where
T: Primitive + PartialOrd,
T: Copy + PartialOrd,
{
fn update(&mut self, value: T, index: u32) {
match self.min {
@@ -56,8 +53,7 @@ where
impl<T> Accumulator for Argmin<T>
where
T: Primitive + PartialOrd,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType + PartialOrd,
{
fn state(&self) -> Result<Vec<Value>> {
match self.min {
@@ -75,10 +71,10 @@ where
let column = &values[0];
let column: &<T as Scalar>::VectorType = if column.is_const() {
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
for (i, v) in column.iter_data().enumerate() {
if let Some(value) = v {
@@ -102,8 +98,8 @@ where
let min = &states[0];
let index = &states[1];
let min: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(min) };
let index: &<u32 as Scalar>::VectorType = unsafe { VectorHelper::static_cast(index) };
let min: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(min) };
let index: &<u32 as Scalar>::VectorType = unsafe { Helper::static_cast(index) };
index
.iter_data()
.flatten()
@@ -131,7 +127,7 @@ impl AggregateFunctionCreator for ArgminAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Argmin::<$S>::default()))
Ok(Box::new(Argmin::<<$S as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -163,7 +159,7 @@ impl AggregateFunctionCreator for ArgminAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::Int32Vector;
use super::*;
#[test]
@@ -175,21 +171,19 @@ mod test {
// test update one not-null value
let mut argmin = Argmin::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Some(42)]))];
assert!(argmin.update_batch(&v).is_ok());
assert_eq!(Value::from(0_u32), argmin.evaluate().unwrap());
// test update one null value
let mut argmin = Argmin::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Option::<i32>::None,
]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Option::<i32>::None]))];
assert!(argmin.update_batch(&v).is_ok());
assert_eq!(Value::Null, argmin.evaluate().unwrap());
// test update no null-value batch
let mut argmin = Argmin::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-1i32),
Some(1),
Some(3),
@@ -199,7 +193,7 @@ mod test {
// test update null-value batch
let mut argmin = Argmin::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-2i32),
None,
Some(4),
@@ -210,7 +204,7 @@ mod test {
// test update with constant vector
let mut argmin = Argmin::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
10,
))];
assert!(argmin.update_batch(&v).is_ok());

View File

@@ -22,40 +22,32 @@ use common_query::error::{
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::types::PrimitiveType;
use datatypes::value::ListValue;
use datatypes::vectors::{ConstantVector, ListVector};
use datatypes::vectors::{ConstantVector, Helper, ListVector};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use snafu::{ensure, OptionExt, ResultExt};
// https://numpy.org/doc/stable/reference/generated/numpy.diff.html
// I is the input type, O is the output type.
#[derive(Debug, Default)]
pub struct Diff<T, SubT>
where
T: Primitive + AsPrimitive<SubT>,
SubT: Primitive + std::ops::Sub<Output = SubT>,
{
values: Vec<T>,
_phantom: PhantomData<SubT>,
pub struct Diff<I, O> {
values: Vec<I>,
_phantom: PhantomData<O>,
}
impl<T, SubT> Diff<T, SubT>
where
T: Primitive + AsPrimitive<SubT>,
SubT: Primitive + std::ops::Sub<Output = SubT>,
{
fn push(&mut self, value: T) {
impl<I, O> Diff<I, O> {
fn push(&mut self, value: I) {
self.values.push(value);
}
}
impl<T, SubT> Accumulator for Diff<T, SubT>
impl<I, O> Accumulator for Diff<I, O>
where
T: Primitive + AsPrimitive<SubT>,
for<'a> T: Scalar<RefType<'a> = T>,
SubT: Primitive + std::ops::Sub<Output = SubT>,
for<'a> SubT: Scalar<RefType<'a> = SubT>,
I: WrapperType,
O: WrapperType,
I::Native: AsPrimitive<O::Native>,
O::Native: std::ops::Sub<Output = O::Native>,
{
fn state(&self) -> Result<Vec<Value>> {
let nums = self
@@ -65,7 +57,7 @@ where
.collect::<Vec<Value>>();
Ok(vec![Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
I::LogicalType::build_data_type(),
))])
}
@@ -78,12 +70,12 @@ where
let column = &values[0];
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
let column: &<I as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
(0..len).for_each(|_| {
for v in column.iter_data().flatten() {
@@ -109,8 +101,9 @@ where
),
})?;
for state in states.values_iter() {
let state = state.context(FromScalarValueSnafu)?;
self.update_batch(&[state])?
if let Some(state) = state.context(FromScalarValueSnafu)? {
self.update_batch(&[state])?;
}
}
Ok(())
}
@@ -122,11 +115,14 @@ where
let diff = self
.values
.windows(2)
.map(|x| (x[1].as_() - x[0].as_()).into())
.map(|x| {
let native = x[1].into_native().as_() - x[0].into_native().as_();
O::from_native(native).into()
})
.collect::<Vec<Value>>();
let diff = Value::List(ListValue::new(
Some(Box::new(diff)),
SubT::default().into().data_type(),
O::LogicalType::build_data_type(),
));
Ok(diff)
}
@@ -143,7 +139,7 @@ impl AggregateFunctionCreator for DiffAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Diff::<$S,<$S as Primitive>::LargestType>::default()))
Ok(Box::new(Diff::<<$S as LogicalPrimitiveType>::Wrapper, <<$S as LogicalPrimitiveType>::LargestType as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -163,7 +159,7 @@ impl AggregateFunctionCreator for DiffAccumulatorCreator {
with_match_primitive_type_id!(
input_types[0].logical_type_id(),
|$S| {
Ok(ConcreteDataType::list_datatype(PrimitiveType::<<$S as Primitive>::LargestType>::default().into()))
Ok(ConcreteDataType::list_datatype($S::default().into()))
},
{
unreachable!()
@@ -177,7 +173,7 @@ impl AggregateFunctionCreator for DiffAccumulatorCreator {
with_match_primitive_type_id!(
input_types[0].logical_type_id(),
|$S| {
Ok(vec![ConcreteDataType::list_datatype(PrimitiveType::<$S>::default().into())])
Ok(vec![ConcreteDataType::list_datatype($S::default().into())])
},
{
unreachable!()
@@ -188,9 +184,10 @@ impl AggregateFunctionCreator for DiffAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::Int32Vector;
use super::*;
#[test]
fn test_update_batch() {
// test update empty batch, expect not updating anything
@@ -201,21 +198,19 @@ mod test {
// test update one not-null value
let mut diff = Diff::<i32, i64>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Some(42)]))];
assert!(diff.update_batch(&v).is_ok());
assert_eq!(Value::Null, diff.evaluate().unwrap());
// test update one null value
let mut diff = Diff::<i32, i64>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Option::<i32>::None,
]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Option::<i32>::None]))];
assert!(diff.update_batch(&v).is_ok());
assert_eq!(Value::Null, diff.evaluate().unwrap());
// test update no null-value batch
let mut diff = Diff::<i32, i64>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-1i32),
Some(1),
Some(2),
@@ -232,7 +227,7 @@ mod test {
// test update null-value batch
let mut diff = Diff::<i32, i64>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-2i32),
None,
Some(3),
@@ -251,7 +246,7 @@ mod test {
// test update with constant vector
let mut diff = Diff::<i32, i64>::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
4,
))];
let values = vec![Value::from(0_i64), Value::from(0_i64), Value::from(0_i64)];

View File

@@ -22,16 +22,14 @@ use common_query::error::{
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::vectors::{ConstantVector, Float64Vector, UInt64Vector};
use datatypes::types::WrapperType;
use datatypes::vectors::{ConstantVector, Float64Vector, Helper, UInt64Vector};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use snafu::{ensure, OptionExt};
#[derive(Debug, Default)]
pub struct Mean<T>
where
T: Primitive + AsPrimitive<f64>,
{
pub struct Mean<T> {
sum: f64,
n: u64,
_phantom: PhantomData<T>,
@@ -39,11 +37,12 @@ where
impl<T> Mean<T>
where
T: Primitive + AsPrimitive<f64>,
T: WrapperType,
T::Native: AsPrimitive<f64>,
{
#[inline(always)]
fn push(&mut self, value: T) {
self.sum += value.as_();
self.sum += value.into_native().as_();
self.n += 1;
}
@@ -56,8 +55,8 @@ where
impl<T> Accumulator for Mean<T>
where
T: Primitive + AsPrimitive<f64>,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType,
T::Native: AsPrimitive<f64>,
{
fn state(&self) -> Result<Vec<Value>> {
Ok(vec![self.sum.into(), self.n.into()])
@@ -73,10 +72,10 @@ where
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
(0..len).for_each(|_| {
for v in column.iter_data().flatten() {
@@ -150,7 +149,7 @@ impl AggregateFunctionCreator for MeanAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Mean::<$S>::default()))
Ok(Box::new(Mean::<<$S as LogicalPrimitiveType>::Native>::default()))
},
{
let err_msg = format!(
@@ -182,7 +181,7 @@ impl AggregateFunctionCreator for MeanAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::Int32Vector;
use super::*;
#[test]
@@ -194,21 +193,19 @@ mod test {
// test update one not-null value
let mut mean = Mean::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Some(42)]))];
assert!(mean.update_batch(&v).is_ok());
assert_eq!(Value::from(42.0_f64), mean.evaluate().unwrap());
// test update one null value
let mut mean = Mean::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Option::<i32>::None,
]))];
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![Option::<i32>::None]))];
assert!(mean.update_batch(&v).is_ok());
assert_eq!(Value::Null, mean.evaluate().unwrap());
// test update no null-value batch
let mut mean = Mean::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-1i32),
Some(1),
Some(2),
@@ -218,7 +215,7 @@ mod test {
// test update null-value batch
let mut mean = Mean::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
let v: Vec<VectorRef> = vec![Arc::new(Int32Vector::from(vec![
Some(-2i32),
None,
Some(3),
@@ -230,7 +227,7 @@ mod test {
// test update with constant vector
let mut mean = Mean::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
10,
))];
assert!(mean.update_batch(&v).is_ok());

View File

@@ -1,289 +0,0 @@
// Copyright 2022 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::sync::Arc;
use common_function_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{
CreateAccumulatorSnafu, DowncastVectorSnafu, FromScalarValueSnafu, Result,
};
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::types::OrdPrimitive;
use datatypes::value::ListValue;
use datatypes::vectors::{ConstantVector, ListVector};
use datatypes::with_match_primitive_type_id;
use num::NumCast;
use snafu::{ensure, OptionExt, ResultExt};
// This median calculation algorithm's details can be found at
// https://leetcode.cn/problems/find-median-from-data-stream/
//
// Basically, it uses two heaps, a maximum heap and a minimum. The maximum heap stores numbers that
// are not greater than the median, and the minimum heap stores the greater. In a streaming of
// numbers, when a number is arrived, we adjust the heaps' tops, so that either one top is the
// median or both tops can be averaged to get the median.
//
// The time complexity to update the median is O(logn), O(1) to get the median; and the space
// complexity is O(n). (Ignore the costs for heap expansion.)
//
// From the point of algorithm, [quick select](https://en.wikipedia.org/wiki/Quickselect) might be
// better. But to use quick select here, we need a mutable self in the final calculation(`evaluate`)
// to swap stored numbers in the states vector. Though we can make our `evaluate` received
// `&mut self`, DataFusion calls our accumulator with `&self` (see `DfAccumulatorAdaptor`). That
// means we have to introduce some kinds of interior mutability, and the overhead is not neglectable.
//
// TODO(LFC): Use quick select to get median when we can modify DataFusion's code, and benchmark with two-heap algorithm.
#[derive(Debug, Default)]
pub struct Median<T>
where
T: Primitive,
{
greater: BinaryHeap<Reverse<OrdPrimitive<T>>>,
not_greater: BinaryHeap<OrdPrimitive<T>>,
}
impl<T> Median<T>
where
T: Primitive,
{
fn push(&mut self, value: T) {
let value = OrdPrimitive::<T>(value);
if self.not_greater.is_empty() {
self.not_greater.push(value);
return;
}
// The `unwrap`s below are safe because there are `push`s before them.
if value <= *self.not_greater.peek().unwrap() {
self.not_greater.push(value);
if self.not_greater.len() > self.greater.len() + 1 {
self.greater.push(Reverse(self.not_greater.pop().unwrap()));
}
} else {
self.greater.push(Reverse(value));
if self.greater.len() > self.not_greater.len() {
self.not_greater.push(self.greater.pop().unwrap().0);
}
}
}
}
// UDAFs are built using the trait `Accumulator`, that offers DataFusion the necessary functions
// to use them.
impl<T> Accumulator for Median<T>
where
T: Primitive,
for<'a> T: Scalar<RefType<'a> = T>,
{
// This function serializes our state to `ScalarValue`, which DataFusion uses to pass this
// state between execution stages. Note that this can be arbitrary data.
//
// The `ScalarValue`s returned here will be passed in as argument `states: &[VectorRef]` to
// `merge_batch` function.
fn state(&self) -> Result<Vec<Value>> {
let nums = self
.greater
.iter()
.map(|x| &x.0)
.chain(self.not_greater.iter())
.map(|&n| n.into())
.collect::<Vec<Value>>();
Ok(vec![Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
))])
}
// DataFusion calls this function to update the accumulator's state for a batch of inputs rows.
// It is expected this function to update the accumulator's state.
fn update_batch(&mut self, values: &[VectorRef]) -> Result<()> {
if values.is_empty() {
return Ok(());
}
ensure!(values.len() == 1, InvalidInputStateSnafu);
// This is a unary accumulator, so only one column is provided.
let column = &values[0];
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
};
(0..len).for_each(|_| {
for v in column.iter_data().flatten() {
self.push(v);
}
});
Ok(())
}
// DataFusion executes accumulators in partitions. In some execution stage, DataFusion will
// merge states from other accumulators (returned by `state()` method).
fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()> {
if states.is_empty() {
return Ok(());
}
// The states here are returned by the `state` method. Since we only returned a vector
// with one value in that method, `states[0]` is fine.
let states = &states[0];
let states = states
.as_any()
.downcast_ref::<ListVector>()
.with_context(|| DowncastVectorSnafu {
err_msg: format!(
"expect ListVector, got vector type {}",
states.vector_type_name()
),
})?;
for state in states.values_iter() {
let state = state.context(FromScalarValueSnafu)?;
// merging state is simply accumulate stored numbers from others', so just call update
self.update_batch(&[state])?
}
Ok(())
}
// DataFusion expects this function to return the final value of this aggregator.
fn evaluate(&self) -> Result<Value> {
if self.not_greater.is_empty() {
assert!(
self.greater.is_empty(),
"not expected in two-heap median algorithm, there must be a bug when implementing it"
);
return Ok(Value::Null);
}
// unwrap is safe because we checked not_greater heap's len above
let not_greater = *self.not_greater.peek().unwrap();
let median = if self.not_greater.len() > self.greater.len() {
not_greater.into()
} else {
// unwrap is safe because greater heap len >= not_greater heap len, which is > 0
let greater = self.greater.peek().unwrap();
// the following three NumCast's `unwrap`s are safe because T is primitive
let not_greater_v: f64 = NumCast::from(not_greater.as_primitive()).unwrap();
let greater_v: f64 = NumCast::from(greater.0.as_primitive()).unwrap();
let median: T = NumCast::from((not_greater_v + greater_v) / 2.0).unwrap();
median.into()
};
Ok(median)
}
}
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct MedianAccumulatorCreator {}
impl AggregateFunctionCreator for MedianAccumulatorCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
let input_type = &types[0];
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Median::<$S>::default()))
},
{
let err_msg = format!(
"\"MEDIAN\" aggregate function not support data type {:?}",
input_type.logical_type_id(),
);
CreateAccumulatorSnafu { err_msg }.fail()?
}
)
});
creator
}
fn output_type(&self) -> Result<ConcreteDataType> {
let input_types = self.input_types()?;
ensure!(input_types.len() == 1, InvalidInputStateSnafu);
// unwrap is safe because we have checked input_types len must equals 1
Ok(input_types.into_iter().next().unwrap())
}
fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
Ok(vec![ConcreteDataType::list_datatype(self.output_type()?)])
}
}
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use super::*;
#[test]
fn test_update_batch() {
// test update empty batch, expect not updating anything
let mut median = Median::<i32>::default();
assert!(median.update_batch(&[]).is_ok());
assert!(median.not_greater.is_empty());
assert!(median.greater.is_empty());
assert_eq!(Value::Null, median.evaluate().unwrap());
// test update one not-null value
let mut median = Median::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)]))];
assert!(median.update_batch(&v).is_ok());
assert_eq!(Value::Int32(42), median.evaluate().unwrap());
// test update one null value
let mut median = Median::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Option::<i32>::None,
]))];
assert!(median.update_batch(&v).is_ok());
assert_eq!(Value::Null, median.evaluate().unwrap());
// test update no null-value batch
let mut median = Median::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
]))];
assert!(median.update_batch(&v).is_ok());
assert_eq!(Value::Int32(1), median.evaluate().unwrap());
// test update null-value batch
let mut median = Median::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-2i32),
None,
Some(3),
Some(4),
]))];
assert!(median.update_batch(&v).is_ok());
assert_eq!(Value::Int32(3), median.evaluate().unwrap());
// test update with constant vector
let mut median = Median::<i32>::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
10,
))];
assert!(median.update_batch(&v).is_ok());
assert_eq!(Value::Int32(4), median.evaluate().unwrap());
}
}

View File

@@ -26,7 +26,7 @@ use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::types::OrdPrimitive;
use datatypes::value::{ListValue, OrderedFloat};
use datatypes::vectors::{ConstantVector, Float64Vector, ListVector};
use datatypes::vectors::{ConstantVector, Float64Vector, Helper, ListVector};
use datatypes::with_match_primitive_type_id;
use num::NumCast;
use snafu::{ensure, OptionExt, ResultExt};
@@ -44,15 +44,15 @@ use snafu::{ensure, OptionExt, ResultExt};
// This optional method parameter specifies the method to use when the desired quantile lies between two data points i < j.
// If g is the fractional part of the index surrounded by i and alpha and beta are correction constants modifying i and j.
// i+g = (q-alpha)/(n-alpha-beta+1)
// Below, q is the quantile value, n is the sample size and alpha and beta are constants. The following formula gives an interpolation i + g of where the quantile would be in the sorted sample.
// With i being the floor and g the fractional part of the result.
// Below, 'q' is the quantile value, 'n' is the sample size and alpha and beta are constants. The following formula gives an interpolation "i + g" of where the quantile would be in the sorted sample.
// With 'i' being the floor and 'g' the fractional part of the result.
// the default method is linear where
// alpha = 1
// beta = 1
#[derive(Debug, Default)]
pub struct Percentile<T>
where
T: Primitive,
T: WrapperType,
{
greater: BinaryHeap<Reverse<OrdPrimitive<T>>>,
not_greater: BinaryHeap<OrdPrimitive<T>>,
@@ -62,7 +62,7 @@ where
impl<T> Percentile<T>
where
T: Primitive,
T: WrapperType,
{
fn push(&mut self, value: T) {
let value = OrdPrimitive::<T>(value);
@@ -93,8 +93,7 @@ where
impl<T> Accumulator for Percentile<T>
where
T: Primitive,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType,
{
fn state(&self) -> Result<Vec<Value>> {
let nums = self
@@ -107,7 +106,7 @@ where
Ok(vec![
Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
T::LogicalType::build_data_type(),
)),
self.p.into(),
])
@@ -129,14 +128,14 @@ where
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
let x = &values[1];
let x = VectorHelper::check_get_scalar::<f64>(x).context(error::InvalidInputsSnafu {
let x = Helper::check_get_scalar::<f64>(x).context(error::InvalidInputTypeSnafu {
err_msg: "expecting \"POLYVAL\" function's second argument to be float64",
})?;
// `get(0)` is safe because we have checked `values[1].len() == values[0].len() != 0`
@@ -209,10 +208,11 @@ where
),
})?;
for value in values.values_iter() {
let value = value.context(FromScalarValueSnafu)?;
let column: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
if let Some(value) = value.context(FromScalarValueSnafu)? {
let column: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
}
}
}
Ok(())
@@ -259,7 +259,7 @@ impl AggregateFunctionCreator for PercentileAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Percentile::<$S>::default()))
Ok(Box::new(Percentile::<<$S as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -292,7 +292,7 @@ impl AggregateFunctionCreator for PercentileAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::{Float64Vector, Int32Vector};
use super::*;
#[test]
@@ -307,8 +307,8 @@ mod test {
// test update one not-null value
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![Some(42)])),
Arc::new(PrimitiveVector::<f64>::from(vec![Some(100.0_f64)])),
Arc::new(Int32Vector::from(vec![Some(42)])),
Arc::new(Float64Vector::from(vec![Some(100.0_f64)])),
];
assert!(percentile.update_batch(&v).is_ok());
assert_eq!(Value::from(42.0_f64), percentile.evaluate().unwrap());
@@ -316,8 +316,8 @@ mod test {
// test update one null value
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![Option::<i32>::None])),
Arc::new(PrimitiveVector::<f64>::from(vec![Some(100.0_f64)])),
Arc::new(Int32Vector::from(vec![Option::<i32>::None])),
Arc::new(Float64Vector::from(vec![Some(100.0_f64)])),
];
assert!(percentile.update_batch(&v).is_ok());
assert_eq!(Value::Null, percentile.evaluate().unwrap());
@@ -325,12 +325,8 @@ mod test {
// test update no null-value batch
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(100.0_f64),
Some(100.0_f64),
Some(100.0_f64),
@@ -342,13 +338,8 @@ mod test {
// test update null-value batch
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-2i32),
None,
Some(3),
Some(4),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-2i32), None, Some(3), Some(4)])),
Arc::new(Float64Vector::from(vec![
Some(100.0_f64),
Some(100.0_f64),
Some(100.0_f64),
@@ -362,13 +353,10 @@ mod test {
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
2,
)),
Arc::new(PrimitiveVector::<f64>::from(vec![
Some(100.0_f64),
Some(100.0_f64),
])),
Arc::new(Float64Vector::from(vec![Some(100.0_f64), Some(100.0_f64)])),
];
assert!(percentile.update_batch(&v).is_ok());
assert_eq!(Value::from(4_f64), percentile.evaluate().unwrap());
@@ -376,12 +364,8 @@ mod test {
// test left border
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(0.0_f64),
Some(0.0_f64),
Some(0.0_f64),
@@ -393,12 +377,8 @@ mod test {
// test medium
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(50.0_f64),
Some(50.0_f64),
Some(50.0_f64),
@@ -410,12 +390,8 @@ mod test {
// test right border
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(100.0_f64),
Some(100.0_f64),
Some(100.0_f64),
@@ -431,12 +407,8 @@ mod test {
// >> 6.400000000000
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(10i32),
Some(7),
Some(4),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(10i32), Some(7), Some(4)])),
Arc::new(Float64Vector::from(vec![
Some(40.0_f64),
Some(40.0_f64),
Some(40.0_f64),
@@ -451,12 +423,8 @@ mod test {
// >> 9.7000000000000011
let mut percentile = Percentile::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(10i32),
Some(7),
Some(4),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(10i32), Some(7), Some(4)])),
Arc::new(Float64Vector::from(vec![
Some(95.0_f64),
Some(95.0_f64),
Some(95.0_f64),

View File

@@ -23,9 +23,9 @@ use common_query::error::{
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::types::PrimitiveType;
use datatypes::types::{LogicalPrimitiveType, WrapperType};
use datatypes::value::ListValue;
use datatypes::vectors::{ConstantVector, Int64Vector, ListVector};
use datatypes::vectors::{ConstantVector, Helper, Int64Vector, ListVector};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use snafu::{ensure, OptionExt, ResultExt};
@@ -34,8 +34,10 @@ use snafu::{ensure, OptionExt, ResultExt};
#[derive(Debug, Default)]
pub struct Polyval<T, PolyT>
where
T: Primitive + AsPrimitive<PolyT>,
PolyT: Primitive + std::ops::Mul<Output = PolyT>,
T: WrapperType,
T::Native: AsPrimitive<PolyT::Native>,
PolyT: WrapperType,
PolyT::Native: std::ops::Mul<Output = PolyT::Native>,
{
values: Vec<T>,
// DataFusion casts constant in into i64 type.
@@ -45,8 +47,10 @@ where
impl<T, PolyT> Polyval<T, PolyT>
where
T: Primitive + AsPrimitive<PolyT>,
PolyT: Primitive + std::ops::Mul<Output = PolyT>,
T: WrapperType,
T::Native: AsPrimitive<PolyT::Native>,
PolyT: WrapperType,
PolyT::Native: std::ops::Mul<Output = PolyT::Native>,
{
fn push(&mut self, value: T) {
self.values.push(value);
@@ -55,11 +59,11 @@ where
impl<T, PolyT> Accumulator for Polyval<T, PolyT>
where
T: Primitive + AsPrimitive<PolyT>,
PolyT: Primitive + std::ops::Mul<Output = PolyT> + std::iter::Sum<PolyT>,
for<'a> T: Scalar<RefType<'a> = T>,
for<'a> PolyT: Scalar<RefType<'a> = PolyT>,
i64: AsPrimitive<PolyT>,
T: WrapperType,
T::Native: AsPrimitive<PolyT::Native>,
PolyT: WrapperType + std::iter::Sum<<PolyT as WrapperType>::Native>,
PolyT::Native: std::ops::Mul<Output = PolyT::Native> + std::iter::Sum<PolyT::Native>,
i64: AsPrimitive<<PolyT as WrapperType>::Native>,
{
fn state(&self) -> Result<Vec<Value>> {
let nums = self
@@ -70,7 +74,7 @@ where
Ok(vec![
Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
T::LogicalType::build_data_type(),
)),
self.x.into(),
])
@@ -91,10 +95,10 @@ where
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
(0..len).for_each(|_| {
for v in column.iter_data().flatten() {
@@ -103,7 +107,7 @@ where
});
let x = &values[1];
let x = VectorHelper::check_get_scalar::<i64>(x).context(error::InvalidInputsSnafu {
let x = Helper::check_get_scalar::<i64>(x).context(error::InvalidInputTypeSnafu {
err_msg: "expecting \"POLYVAL\" function's second argument to be a positive integer",
})?;
// `get(0)` is safe because we have checked `values[1].len() == values[0].len() != 0`
@@ -172,12 +176,14 @@ where
),
})?;
for value in values.values_iter() {
let value = value.context(FromScalarValueSnafu)?;
let column: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
if let Some(value) = value.context(FromScalarValueSnafu)? {
let column: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
}
}
}
Ok(())
}
@@ -196,7 +202,7 @@ where
.values
.iter()
.enumerate()
.map(|(i, &value)| value.as_() * (x.pow((len - 1 - i) as u32)).as_())
.map(|(i, &value)| value.into_native().as_() * x.pow((len - 1 - i) as u32).as_())
.sum();
Ok(polyval.into())
}
@@ -213,7 +219,7 @@ impl AggregateFunctionCreator for PolyvalAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(Polyval::<$S,<$S as Primitive>::LargestType>::default()))
Ok(Box::new(Polyval::<<$S as LogicalPrimitiveType>::Wrapper, <<$S as LogicalPrimitiveType>::LargestType as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -234,7 +240,7 @@ impl AggregateFunctionCreator for PolyvalAccumulatorCreator {
with_match_primitive_type_id!(
input_type,
|$S| {
Ok(PrimitiveType::<<$S as Primitive>::LargestType>::default().into())
Ok(<<$S as LogicalPrimitiveType>::LargestType as LogicalPrimitiveType>::build_data_type())
},
{
unreachable!()
@@ -254,7 +260,7 @@ impl AggregateFunctionCreator for PolyvalAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::Int32Vector;
use super::*;
#[test]
@@ -268,8 +274,8 @@ mod test {
// test update one not-null value
let mut polyval = Polyval::<i32, i64>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![Some(3)])),
Arc::new(PrimitiveVector::<i64>::from(vec![Some(2_i64)])),
Arc::new(Int32Vector::from(vec![Some(3)])),
Arc::new(Int64Vector::from(vec![Some(2_i64)])),
];
assert!(polyval.update_batch(&v).is_ok());
assert_eq!(Value::Int64(3), polyval.evaluate().unwrap());
@@ -277,8 +283,8 @@ mod test {
// test update one null value
let mut polyval = Polyval::<i32, i64>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![Option::<i32>::None])),
Arc::new(PrimitiveVector::<i64>::from(vec![Some(2_i64)])),
Arc::new(Int32Vector::from(vec![Option::<i32>::None])),
Arc::new(Int64Vector::from(vec![Some(2_i64)])),
];
assert!(polyval.update_batch(&v).is_ok());
assert_eq!(Value::Null, polyval.evaluate().unwrap());
@@ -286,12 +292,8 @@ mod test {
// test update no null-value batch
let mut polyval = Polyval::<i32, i64>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(3),
Some(0),
Some(1),
])),
Arc::new(PrimitiveVector::<i64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(3), Some(0), Some(1)])),
Arc::new(Int64Vector::from(vec![
Some(2_i64),
Some(2_i64),
Some(2_i64),
@@ -303,13 +305,8 @@ mod test {
// test update null-value batch
let mut polyval = Polyval::<i32, i64>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(3),
Some(0),
None,
Some(1),
])),
Arc::new(PrimitiveVector::<i64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(3), Some(0), None, Some(1)])),
Arc::new(Int64Vector::from(vec![
Some(2_i64),
Some(2_i64),
Some(2_i64),
@@ -323,10 +320,10 @@ mod test {
let mut polyval = Polyval::<i32, i64>::default();
let v: Vec<VectorRef> = vec![
Arc::new(ConstantVector::new(
Arc::new(PrimitiveVector::<i32>::from_vec(vec![4])),
Arc::new(Int32Vector::from_vec(vec![4])),
2,
)),
Arc::new(PrimitiveVector::<i64>::from(vec![Some(5_i64), Some(5_i64)])),
Arc::new(Int64Vector::from(vec![Some(5_i64), Some(5_i64)])),
];
assert!(polyval.update_batch(&v).is_ok());
assert_eq!(Value::Int64(24), polyval.evaluate().unwrap());

View File

@@ -23,7 +23,7 @@ use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::value::{ListValue, OrderedFloat};
use datatypes::vectors::{ConstantVector, Float64Vector, ListVector};
use datatypes::vectors::{ConstantVector, Float64Vector, Helper, ListVector};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use snafu::{ensure, OptionExt, ResultExt};
@@ -33,18 +33,12 @@ use statrs::statistics::Statistics;
// https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
#[derive(Debug, Default)]
pub struct ScipyStatsNormCdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
{
pub struct ScipyStatsNormCdf<T> {
values: Vec<T>,
x: Option<f64>,
}
impl<T> ScipyStatsNormCdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
{
impl<T> ScipyStatsNormCdf<T> {
fn push(&mut self, value: T) {
self.values.push(value);
}
@@ -52,8 +46,8 @@ where
impl<T> Accumulator for ScipyStatsNormCdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType + std::iter::Sum<T>,
T::Native: AsPrimitive<f64>,
{
fn state(&self) -> Result<Vec<Value>> {
let nums = self
@@ -64,7 +58,7 @@ where
Ok(vec![
Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
T::LogicalType::build_data_type(),
)),
self.x.into(),
])
@@ -86,14 +80,14 @@ where
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
let x = &values[1];
let x = VectorHelper::check_get_scalar::<f64>(x).context(error::InvalidInputsSnafu {
let x = Helper::check_get_scalar::<f64>(x).context(error::InvalidInputTypeSnafu {
err_msg: "expecting \"SCIPYSTATSNORMCDF\" function's second argument to be a positive integer",
})?;
let first = x.get(0);
@@ -160,19 +154,19 @@ where
),
})?;
for value in values.values_iter() {
let value = value.context(FromScalarValueSnafu)?;
let column: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
if let Some(value) = value.context(FromScalarValueSnafu)? {
let column: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
}
}
}
Ok(())
}
fn evaluate(&self) -> Result<Value> {
let values = self.values.iter().map(|&v| v.as_()).collect::<Vec<_>>();
let mean = values.clone().mean();
let std_dev = values.std_dev();
let mean = self.values.iter().map(|v| v.into_native().as_()).mean();
let std_dev = self.values.iter().map(|v| v.into_native().as_()).std_dev();
if mean.is_nan() || std_dev.is_nan() {
Ok(Value::Null)
} else {
@@ -198,7 +192,7 @@ impl AggregateFunctionCreator for ScipyStatsNormCdfAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(ScipyStatsNormCdf::<$S>::default()))
Ok(Box::new(ScipyStatsNormCdf::<<$S as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -230,7 +224,7 @@ impl AggregateFunctionCreator for ScipyStatsNormCdfAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::{Float64Vector, Int32Vector};
use super::*;
#[test]
@@ -244,12 +238,8 @@ mod test {
// test update no null-value batch
let mut scipy_stats_norm_cdf = ScipyStatsNormCdf::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(2.0_f64),
Some(2.0_f64),
Some(2.0_f64),
@@ -264,13 +254,8 @@ mod test {
// test update null-value batch
let mut scipy_stats_norm_cdf = ScipyStatsNormCdf::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-2i32),
None,
Some(3),
Some(4),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-2i32), None, Some(3), Some(4)])),
Arc::new(Float64Vector::from(vec![
Some(2.0_f64),
None,
Some(2.0_f64),

View File

@@ -23,7 +23,7 @@ use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::*;
use datatypes::prelude::*;
use datatypes::value::{ListValue, OrderedFloat};
use datatypes::vectors::{ConstantVector, Float64Vector, ListVector};
use datatypes::vectors::{ConstantVector, Float64Vector, Helper, ListVector};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use snafu::{ensure, OptionExt, ResultExt};
@@ -33,18 +33,12 @@ use statrs::statistics::Statistics;
// https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
#[derive(Debug, Default)]
pub struct ScipyStatsNormPdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
{
pub struct ScipyStatsNormPdf<T> {
values: Vec<T>,
x: Option<f64>,
}
impl<T> ScipyStatsNormPdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
{
impl<T> ScipyStatsNormPdf<T> {
fn push(&mut self, value: T) {
self.values.push(value);
}
@@ -52,8 +46,8 @@ where
impl<T> Accumulator for ScipyStatsNormPdf<T>
where
T: Primitive + AsPrimitive<f64> + std::iter::Sum<T>,
for<'a> T: Scalar<RefType<'a> = T>,
T: WrapperType,
T::Native: AsPrimitive<f64> + std::iter::Sum<T>,
{
fn state(&self) -> Result<Vec<Value>> {
let nums = self
@@ -64,7 +58,7 @@ where
Ok(vec![
Value::List(ListValue::new(
Some(Box::new(nums)),
T::default().into().data_type(),
T::LogicalType::build_data_type(),
)),
self.x.into(),
])
@@ -86,14 +80,14 @@ where
let mut len = 1;
let column: &<T as Scalar>::VectorType = if column.is_const() {
len = column.len();
let column: &ConstantVector = unsafe { VectorHelper::static_cast(column) };
unsafe { VectorHelper::static_cast(column.inner()) }
let column: &ConstantVector = unsafe { Helper::static_cast(column) };
unsafe { Helper::static_cast(column.inner()) }
} else {
unsafe { VectorHelper::static_cast(column) }
unsafe { Helper::static_cast(column) }
};
let x = &values[1];
let x = VectorHelper::check_get_scalar::<f64>(x).context(error::InvalidInputsSnafu {
let x = Helper::check_get_scalar::<f64>(x).context(error::InvalidInputTypeSnafu {
err_msg: "expecting \"SCIPYSTATSNORMPDF\" function's second argument to be a positive integer",
})?;
let first = x.get(0);
@@ -160,19 +154,20 @@ where
),
})?;
for value in values.values_iter() {
let value = value.context(FromScalarValueSnafu)?;
let column: &<T as Scalar>::VectorType = unsafe { VectorHelper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
if let Some(value) = value.context(FromScalarValueSnafu)? {
let column: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(&value) };
for v in column.iter_data().flatten() {
self.push(v);
}
}
}
Ok(())
}
fn evaluate(&self) -> Result<Value> {
let values = self.values.iter().map(|&v| v.as_()).collect::<Vec<_>>();
let mean = values.clone().mean();
let std_dev = values.std_dev();
let mean = self.values.iter().map(|v| v.into_native().as_()).mean();
let std_dev = self.values.iter().map(|v| v.into_native().as_()).std_dev();
if mean.is_nan() || std_dev.is_nan() {
Ok(Value::Null)
} else {
@@ -198,7 +193,7 @@ impl AggregateFunctionCreator for ScipyStatsNormPdfAccumulatorCreator {
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(ScipyStatsNormPdf::<$S>::default()))
Ok(Box::new(ScipyStatsNormPdf::<<$S as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
@@ -230,7 +225,7 @@ impl AggregateFunctionCreator for ScipyStatsNormPdfAccumulatorCreator {
#[cfg(test)]
mod test {
use datatypes::vectors::PrimitiveVector;
use datatypes::vectors::{Float64Vector, Int32Vector};
use super::*;
#[test]
@@ -244,12 +239,8 @@ mod test {
// test update no null-value batch
let mut scipy_stats_norm_pdf = ScipyStatsNormPdf::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-1i32),
Some(1),
Some(2),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-1i32), Some(1), Some(2)])),
Arc::new(Float64Vector::from(vec![
Some(2.0_f64),
Some(2.0_f64),
Some(2.0_f64),
@@ -264,13 +255,8 @@ mod test {
// test update null-value batch
let mut scipy_stats_norm_pdf = ScipyStatsNormPdf::<i32>::default();
let v: Vec<VectorRef> = vec![
Arc::new(PrimitiveVector::<i32>::from(vec![
Some(-2i32),
None,
Some(3),
Some(4),
])),
Arc::new(PrimitiveVector::<f64>::from(vec![
Arc::new(Int32Vector::from(vec![Some(-2i32), None, Some(3), Some(4)])),
Arc::new(Float64Vector::from(vec![
Some(2.0_f64),
None,
Some(2.0_f64),

View File

@@ -14,10 +14,10 @@
use std::iter;
use common_query::error::Result;
use datatypes::prelude::*;
use datatypes::vectors::ConstantVector;
use datatypes::vectors::{ConstantVector, Helper};
use crate::error::Result;
use crate::scalars::expression::ctx::EvalContext;
pub fn scalar_binary_op<L: Scalar, R: Scalar, O: Scalar, F>(
@@ -36,10 +36,9 @@ where
let result = match (l.is_const(), r.is_const()) {
(false, true) => {
let left: &<L as Scalar>::VectorType = unsafe { VectorHelper::static_cast(l) };
let right: &ConstantVector = unsafe { VectorHelper::static_cast(r) };
let right: &<R as Scalar>::VectorType =
unsafe { VectorHelper::static_cast(right.inner()) };
let left: &<L as Scalar>::VectorType = unsafe { Helper::static_cast(l) };
let right: &ConstantVector = unsafe { Helper::static_cast(r) };
let right: &<R as Scalar>::VectorType = unsafe { Helper::static_cast(right.inner()) };
let b = right.get_data(0);
let it = left.iter_data().map(|a| f(a, b, ctx));
@@ -47,8 +46,8 @@ where
}
(false, false) => {
let left: &<L as Scalar>::VectorType = unsafe { VectorHelper::static_cast(l) };
let right: &<R as Scalar>::VectorType = unsafe { VectorHelper::static_cast(r) };
let left: &<L as Scalar>::VectorType = unsafe { Helper::static_cast(l) };
let right: &<R as Scalar>::VectorType = unsafe { Helper::static_cast(r) };
let it = left
.iter_data()
@@ -58,25 +57,22 @@ where
}
(true, false) => {
let left: &ConstantVector = unsafe { VectorHelper::static_cast(l) };
let left: &<L as Scalar>::VectorType =
unsafe { VectorHelper::static_cast(left.inner()) };
let left: &ConstantVector = unsafe { Helper::static_cast(l) };
let left: &<L as Scalar>::VectorType = unsafe { Helper::static_cast(left.inner()) };
let a = left.get_data(0);
let right: &<R as Scalar>::VectorType = unsafe { VectorHelper::static_cast(r) };
let right: &<R as Scalar>::VectorType = unsafe { Helper::static_cast(r) };
let it = right.iter_data().map(|b| f(a, b, ctx));
<O as Scalar>::VectorType::from_owned_iterator(it)
}
(true, true) => {
let left: &ConstantVector = unsafe { VectorHelper::static_cast(l) };
let left: &<L as Scalar>::VectorType =
unsafe { VectorHelper::static_cast(left.inner()) };
let left: &ConstantVector = unsafe { Helper::static_cast(l) };
let left: &<L as Scalar>::VectorType = unsafe { Helper::static_cast(left.inner()) };
let a = left.get_data(0);
let right: &ConstantVector = unsafe { VectorHelper::static_cast(r) };
let right: &<R as Scalar>::VectorType =
unsafe { VectorHelper::static_cast(right.inner()) };
let right: &ConstantVector = unsafe { Helper::static_cast(r) };
let right: &<R as Scalar>::VectorType = unsafe { Helper::static_cast(right.inner()) };
let b = right.get_data(0);
let it = iter::repeat(a)

View File

@@ -13,8 +13,7 @@
// limitations under the License.
use chrono_tz::Tz;
use crate::error::Error;
use common_query::error::Error;
pub struct EvalContext {
_tz: Tz,

View File

@@ -12,10 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_query::error::{self, Result};
use datatypes::prelude::*;
use datatypes::vectors::Helper;
use snafu::ResultExt;
use crate::error::{GetScalarVectorSnafu, Result};
use crate::scalars::expression::ctx::EvalContext;
/// TODO: remove the allow_unused when it's used.
@@ -28,7 +29,7 @@ pub fn scalar_unary_op<L: Scalar, O: Scalar, F>(
where
F: Fn(Option<L::RefType<'_>>, &mut EvalContext) -> Option<O>,
{
let left = VectorHelper::check_get_scalar::<L>(l).context(GetScalarVectorSnafu)?;
let left = Helper::check_get_scalar::<L>(l).context(error::GetScalarVectorSnafu)?;
let it = left.iter_data().map(|a| f(a, ctx));
let result = <O as Scalar>::VectorType::from_owned_iterator(it);

View File

@@ -16,12 +16,11 @@ use std::fmt;
use std::sync::Arc;
use chrono_tz::Tz;
use common_query::error::Result;
use common_query::prelude::Signature;
use datatypes::data_type::ConcreteDataType;
use datatypes::vectors::VectorRef;
use crate::error::Result;
#[derive(Clone)]
pub struct FunctionContext {
pub tz: Tz,

View File

@@ -15,15 +15,16 @@
use std::fmt;
use std::sync::Arc;
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
use datatypes::data_type::DataType;
use datatypes::prelude::ConcreteDataType;
use datatypes::types::LogicalPrimitiveType;
use datatypes::vectors::VectorRef;
use datatypes::with_match_primitive_type_id;
use num::traits::Pow;
use num_traits::AsPrimitive;
use crate::error::Result;
use crate::scalars::expression::{scalar_binary_op, EvalContext};
use crate::scalars::function::{Function, FunctionContext};
@@ -46,7 +47,7 @@ impl Function for PowFunction {
fn eval(&self, _func_ctx: FunctionContext, columns: &[VectorRef]) -> Result<VectorRef> {
with_match_primitive_type_id!(columns[0].data_type().logical_type_id(), |$S| {
with_match_primitive_type_id!(columns[1].data_type().logical_type_id(), |$T| {
let col = scalar_binary_op::<$S, $T, f64, _>(&columns[0], &columns[1], scalar_pow, &mut EvalContext::default())?;
let col = scalar_binary_op::<<$S as LogicalPrimitiveType>::Native, <$T as LogicalPrimitiveType>::Native, f64, _>(&columns[0], &columns[1], scalar_pow, &mut EvalContext::default())?;
Ok(Arc::new(col))
},{
unreachable!()

View File

@@ -14,10 +14,10 @@
use std::fmt;
use arrow::array::Array;
use common_query::error::{FromArrowArraySnafu, Result, TypeCastSnafu};
use common_query::error::{self, Result};
use common_query::prelude::{Signature, Volatility};
use datatypes::arrow;
use datatypes::arrow::compute::kernels::{arithmetic, cast};
use datatypes::arrow::datatypes::DataType;
use datatypes::prelude::*;
use datatypes::vectors::{Helper, VectorRef};
use snafu::ResultExt;
@@ -51,28 +51,21 @@ impl Function for RateFunction {
let val = &columns[0].to_arrow_array();
let val_0 = val.slice(0, val.len() - 1);
let val_1 = val.slice(1, val.len() - 1);
let dv = arrow::compute::arithmetics::sub(&*val_1, &*val_0);
let dv = arithmetic::subtract_dyn(&val_1, &val_0).context(error::ArrowComputeSnafu)?;
let ts = &columns[1].to_arrow_array();
let ts_0 = ts.slice(0, ts.len() - 1);
let ts_1 = ts.slice(1, ts.len() - 1);
let dt = arrow::compute::arithmetics::sub(&*ts_1, &*ts_0);
fn all_to_f64(array: &dyn Array) -> Result<Box<dyn Array>> {
Ok(arrow::compute::cast::cast(
array,
&arrow::datatypes::DataType::Float64,
arrow::compute::cast::CastOptions {
wrapped: true,
partial: true,
},
)
.context(TypeCastSnafu {
typ: arrow::datatypes::DataType::Float64,
})?)
}
let dv = all_to_f64(&*dv)?;
let dt = all_to_f64(&*dt)?;
let rate = arrow::compute::arithmetics::div(&*dv, &*dt);
let v = Helper::try_into_vector(&rate).context(FromArrowArraySnafu)?;
let dt = arithmetic::subtract_dyn(&ts_1, &ts_0).context(error::ArrowComputeSnafu)?;
let dv = cast::cast(&dv, &DataType::Float64).context(error::TypeCastSnafu {
typ: DataType::Float64,
})?;
let dt = cast::cast(&dt, &DataType::Float64).context(error::TypeCastSnafu {
typ: DataType::Float64,
})?;
let rate = arithmetic::divide_dyn(&dv, &dt).context(error::ArrowComputeSnafu)?;
let v = Helper::try_into_vector(&rate).context(error::FromArrowArraySnafu)?;
Ok(v)
}
}
@@ -81,9 +74,8 @@ impl Function for RateFunction {
mod tests {
use std::sync::Arc;
use arrow::array::Float64Array;
use common_query::prelude::TypeSignature;
use datatypes::vectors::{Float32Vector, Int64Vector};
use datatypes::vectors::{Float32Vector, Float64Vector, Int64Vector};
use super::*;
#[test]
@@ -108,9 +100,7 @@ mod tests {
Arc::new(Int64Vector::from_vec(ts)),
];
let vector = rate.eval(FunctionContext::default(), &args).unwrap();
let arr = vector.to_arrow_array();
let expect = Arc::new(Float64Array::from_vec(vec![2.0, 3.0]));
let res = arrow::compute::comparison::eq(&*arr, &*expect);
res.iter().for_each(|x| assert!(matches!(x, Some(true))));
let expect: VectorRef = Arc::new(Float64Vector::from_vec(vec![2.0, 3.0]));
assert_eq!(expect, vector);
}
}

View File

@@ -13,7 +13,6 @@
// limitations under the License.
mod clip;
#[allow(unused)]
mod interp;
use std::sync::Arc;

View File

@@ -15,14 +15,15 @@
use std::fmt;
use std::sync::Arc;
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
use datatypes::data_type::{ConcreteDataType, DataType};
use datatypes::prelude::{Scalar, VectorRef};
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use datatypes::arrow::compute;
use datatypes::arrow::datatypes::ArrowPrimitiveType;
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::*;
use datatypes::vectors::PrimitiveVector;
use paste::paste;
use crate::error::Result;
use crate::scalars::expression::{scalar_binary_op, EvalContext};
use crate::scalars::function::{Function, FunctionContext};
@@ -34,25 +35,32 @@ macro_rules! define_eval {
($O: ident) => {
paste! {
fn [<eval_ $O>](columns: &[VectorRef]) -> Result<VectorRef> {
with_match_primitive_type_id!(columns[0].data_type().logical_type_id(), |$S| {
with_match_primitive_type_id!(columns[1].data_type().logical_type_id(), |$T| {
with_match_primitive_type_id!(columns[2].data_type().logical_type_id(), |$R| {
// clip(a, min, max) is equals to min(max(a, min), max)
let col: VectorRef = Arc::new(scalar_binary_op::<$S, $T, $O, _>(&columns[0], &columns[1], scalar_max, &mut EvalContext::default())?);
let col = scalar_binary_op::<$O, $R, $O, _>(&col, &columns[2], scalar_min, &mut EvalContext::default())?;
Ok(Arc::new(col))
}, {
unreachable!()
})
}, {
unreachable!()
})
}, {
unreachable!()
})
fn cast_vector(input: &VectorRef) -> VectorRef {
Arc::new(PrimitiveVector::<<$O as WrapperType>::LogicalType>::try_from_arrow_array(
compute::cast(&input.to_arrow_array(), &<<<$O as WrapperType>::LogicalType as LogicalPrimitiveType>::ArrowPrimitive as ArrowPrimitiveType>::DATA_TYPE).unwrap()
).unwrap()) as _
}
let operator_1 = cast_vector(&columns[0]);
let operator_2 = cast_vector(&columns[1]);
let operator_3 = cast_vector(&columns[2]);
// clip(a, min, max) is equals to min(max(a, min), max)
let col: VectorRef = Arc::new(scalar_binary_op::<$O, $O, $O, _>(
&operator_1,
&operator_2,
scalar_max,
&mut EvalContext::default(),
)?);
let col = scalar_binary_op::<$O, $O, $O, _>(
&col,
&operator_3,
scalar_min,
&mut EvalContext::default(),
)?;
Ok(Arc::new(col))
}
}
}
};
}
define_eval!(i64);
@@ -108,27 +116,23 @@ pub fn max<T: PartialOrd>(input: T, max: T) -> T {
}
#[inline]
fn scalar_min<S, T, O>(left: Option<S>, right: Option<T>, _ctx: &mut EvalContext) -> Option<O>
fn scalar_min<O>(left: Option<O>, right: Option<O>, _ctx: &mut EvalContext) -> Option<O>
where
S: AsPrimitive<O>,
T: AsPrimitive<O>,
O: Scalar + Copy + PartialOrd,
{
match (left, right) {
(Some(left), Some(right)) => Some(min(left.as_(), right.as_())),
(Some(left), Some(right)) => Some(min(left, right)),
_ => None,
}
}
#[inline]
fn scalar_max<S, T, O>(left: Option<S>, right: Option<T>, _ctx: &mut EvalContext) -> Option<O>
fn scalar_max<O>(left: Option<O>, right: Option<O>, _ctx: &mut EvalContext) -> Option<O>
where
S: AsPrimitive<O>,
T: AsPrimitive<O>,
O: Scalar + Copy + PartialOrd,
{
match (left, right) {
(Some(left), Some(right)) => Some(max(left.as_(), right.as_())),
(Some(left), Some(right)) => Some(max(left, right)),
_ => None,
}
}
@@ -143,11 +147,15 @@ impl fmt::Display for ClipFunction {
mod tests {
use common_query::prelude::TypeSignature;
use datatypes::value::Value;
use datatypes::vectors::{ConstantVector, Float32Vector, Int32Vector, UInt32Vector};
use datatypes::vectors::{
ConstantVector, Float32Vector, Int16Vector, Int32Vector, Int8Vector, UInt16Vector,
UInt32Vector, UInt8Vector,
};
use super::*;
#[test]
fn test_clip_function() {
fn test_clip_signature() {
let clip = ClipFunction::default();
assert_eq!("clip", clip.name());
@@ -190,16 +198,21 @@ mod tests {
volatility: Volatility::Immutable
} if valid_types == ConcreteDataType::numerics()
));
}
#[test]
fn test_clip_fn_signed() {
let clip = ClipFunction::default();
// eval with signed integers
let args: Vec<VectorRef> = vec![
Arc::new(Int32Vector::from_values(0..10)),
Arc::new(ConstantVector::new(
Arc::new(Int32Vector::from_vec(vec![3])),
Arc::new(Int8Vector::from_vec(vec![3])),
10,
)),
Arc::new(ConstantVector::new(
Arc::new(Int32Vector::from_vec(vec![6])),
Arc::new(Int16Vector::from_vec(vec![6])),
10,
)),
];
@@ -217,16 +230,21 @@ mod tests {
assert!(matches!(vector.get(i), Value::Int64(v) if v == 6));
}
}
}
#[test]
fn test_clip_fn_unsigned() {
let clip = ClipFunction::default();
// eval with unsigned integers
let args: Vec<VectorRef> = vec![
Arc::new(UInt32Vector::from_values(0..10)),
Arc::new(UInt8Vector::from_values(0..10)),
Arc::new(ConstantVector::new(
Arc::new(UInt32Vector::from_vec(vec![3])),
10,
)),
Arc::new(ConstantVector::new(
Arc::new(UInt32Vector::from_vec(vec![6])),
Arc::new(UInt16Vector::from_vec(vec![6])),
10,
)),
];
@@ -244,12 +262,17 @@ mod tests {
assert!(matches!(vector.get(i), Value::UInt64(v) if v == 6));
}
}
}
#[test]
fn test_clip_fn_float() {
let clip = ClipFunction::default();
// eval with floats
let args: Vec<VectorRef> = vec![
Arc::new(Int32Vector::from_values(0..10)),
Arc::new(Int8Vector::from_values(0..10)),
Arc::new(ConstantVector::new(
Arc::new(Int32Vector::from_vec(vec![3])),
Arc::new(UInt32Vector::from_vec(vec![3])),
10,
)),
Arc::new(ConstantVector::new(

View File

@@ -14,41 +14,18 @@
use std::sync::Arc;
use datatypes::arrow::array::PrimitiveArray;
use datatypes::arrow::compute::cast::primitive_to_primitive;
use datatypes::arrow::datatypes::DataType::Float64;
use common_query::error::{self, Result};
use datatypes::arrow::compute::cast;
use datatypes::arrow::datatypes::DataType as ArrowDataType;
use datatypes::data_type::DataType;
use datatypes::prelude::ScalarVector;
use datatypes::type_id::LogicalTypeId;
use datatypes::value::Value;
use datatypes::vectors::{Float64Vector, PrimitiveVector, Vector, VectorRef};
use datatypes::{arrow, with_match_primitive_type_id};
use snafu::{ensure, Snafu};
#[derive(Debug, Snafu)]
pub enum Error {
#[snafu(display(
"The length of the args is not enough, expect at least: {}, have: {}",
expect,
actual,
))]
ArgsLenNotEnough { expect: usize, actual: usize },
#[snafu(display("The sample {} is empty", name))]
SampleEmpty { name: String },
#[snafu(display(
"The length of the len1: {} don't match the length of the len2: {}",
len1,
len2,
))]
LenNotEquals { len1: usize, len2: usize },
}
pub type Result<T> = std::result::Result<T, Error>;
use datatypes::vectors::{Float64Vector, Vector, VectorRef};
use datatypes::with_match_primitive_type_id;
use snafu::{ensure, ResultExt};
/* search the biggest number that smaller than x in xp */
fn linear_search_ascending_vector(x: Value, xp: &PrimitiveVector<f64>) -> usize {
fn linear_search_ascending_vector(x: Value, xp: &Float64Vector) -> usize {
for i in 0..xp.len() {
if x < xp.get(i) {
return i - 1;
@@ -58,7 +35,7 @@ fn linear_search_ascending_vector(x: Value, xp: &PrimitiveVector<f64>) -> usize
}
/* search the biggest number that smaller than x in xp */
fn binary_search_ascending_vector(key: Value, xp: &PrimitiveVector<f64>) -> usize {
fn binary_search_ascending_vector(key: Value, xp: &Float64Vector) -> usize {
let mut left = 0;
let mut right = xp.len();
/* If len <= 4 use linear search. */
@@ -77,27 +54,33 @@ fn binary_search_ascending_vector(key: Value, xp: &PrimitiveVector<f64>) -> usiz
left - 1
}
fn concrete_type_to_primitive_vector(arg: &VectorRef) -> Result<PrimitiveVector<f64>> {
fn concrete_type_to_primitive_vector(arg: &VectorRef) -> Result<Float64Vector> {
with_match_primitive_type_id!(arg.data_type().logical_type_id(), |$S| {
let tmp = arg.to_arrow_array();
let from = tmp.as_any().downcast_ref::<PrimitiveArray<$S>>().expect("cast failed");
let array = primitive_to_primitive(from, &Float64);
Ok(PrimitiveVector::new(array))
let array = cast(&tmp, &ArrowDataType::Float64).context(error::TypeCastSnafu {
typ: ArrowDataType::Float64,
})?;
// Safety: array has been cast to Float64Array.
Ok(Float64Vector::try_from_arrow_array(array).unwrap())
},{
unreachable!()
})
}
/// https://github.com/numpy/numpy/blob/b101756ac02e390d605b2febcded30a1da50cc2c/numpy/core/src/multiarray/compiled_base.c#L491
#[allow(unused)]
pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
let mut left = None;
let mut right = None;
ensure!(
args.len() >= 3,
ArgsLenNotEnoughSnafu {
expect: 3_usize,
actual: args.len()
error::InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not enough, expect at least: {}, have: {}",
3,
args.len()
),
}
);
@@ -109,9 +92,12 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
if args.len() > 3 {
ensure!(
args.len() == 5,
ArgsLenNotEnoughSnafu {
expect: 5_usize,
actual: args.len()
error::InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not enough, expect at least: {}, have: {}",
5,
args.len()
),
}
);
@@ -123,14 +109,32 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
.get_data(0);
}
ensure!(x.len() != 0, SampleEmptySnafu { name: "x" });
ensure!(xp.len() != 0, SampleEmptySnafu { name: "xp" });
ensure!(fp.len() != 0, SampleEmptySnafu { name: "fp" });
ensure!(
x.len() != 0,
error::InvalidFuncArgsSnafu {
err_msg: "The sample x is empty",
}
);
ensure!(
xp.len() != 0,
error::InvalidFuncArgsSnafu {
err_msg: "The sample xp is empty",
}
);
ensure!(
fp.len() != 0,
error::InvalidFuncArgsSnafu {
err_msg: "The sample fp is empty",
}
);
ensure!(
xp.len() == fp.len(),
LenNotEqualsSnafu {
len1: xp.len(),
len2: fp.len(),
error::InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the len1: {} don't match the length of the len2: {}",
xp.len(),
fp.len()
),
}
);
@@ -147,7 +151,7 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
let res;
if xp.len() == 1 {
res = x
let datas = x
.iter_data()
.map(|x| {
if Value::from(x) < xp.get(0) {
@@ -158,7 +162,8 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
fp.get_data(0)
}
})
.collect::<Float64Vector>();
.collect::<Vec<_>>();
res = Float64Vector::from(datas);
} else {
let mut j = 0;
/* only pre-calculate slopes if there are relatively few of them. */
@@ -185,7 +190,7 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
}
slopes = Some(slopes_tmp);
}
res = x
let datas = x
.iter_data()
.map(|x| match x {
Some(xi) => {
@@ -248,7 +253,8 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
}
_ => None,
})
.collect::<Float64Vector>();
.collect::<Vec<_>>();
res = Float64Vector::from(datas);
}
Ok(Arc::new(res) as _)
}
@@ -257,8 +263,7 @@ pub fn interp(args: &[VectorRef]) -> Result<VectorRef> {
mod tests {
use std::sync::Arc;
use datatypes::prelude::ScalarVectorBuilder;
use datatypes::vectors::{Int32Vector, Int64Vector, PrimitiveVectorBuilder};
use datatypes::vectors::{Int32Vector, Int64Vector};
use super::*;
#[test]
@@ -338,15 +343,11 @@ mod tests {
Arc::new(Int64Vector::from_vec(fp.clone())),
];
let vector = interp(&args).unwrap();
assert!(matches!(vector.get(0), Value::Float64(v) if v==x[0] as f64));
assert!(matches!(vector.get(0), Value::Float64(v) if v == x[0]));
// x=None output:Null
let input = [None, Some(0.0), Some(0.3)];
let mut builder = PrimitiveVectorBuilder::with_capacity(input.len());
for v in input {
builder.push(v);
}
let x = builder.finish();
let input = vec![None, Some(0.0), Some(0.3)];
let x = Float64Vector::from(input);
let args: Vec<VectorRef> = vec![
Arc::new(x),
Arc::new(Int64Vector::from_vec(xp)),

View File

@@ -15,11 +15,11 @@
use std::fmt;
use std::sync::Arc;
use common_query::error::Result;
use common_query::prelude::{Signature, Volatility};
use datatypes::data_type::ConcreteDataType;
use datatypes::prelude::VectorRef;
use crate::error::Result;
use crate::scalars::expression::{scalar_binary_op, EvalContext};
use crate::scalars::function::{Function, FunctionContext};

View File

@@ -17,16 +17,17 @@
use std::fmt;
use std::sync::Arc;
use common_query::error::{IntoVectorSnafu, UnsupportedInputDataTypeSnafu};
use common_query::error::{
ArrowComputeSnafu, IntoVectorSnafu, Result, TypeCastSnafu, UnsupportedInputDataTypeSnafu,
};
use common_query::prelude::{Signature, Volatility};
use datatypes::arrow::compute::arithmetics;
use datatypes::arrow::datatypes::DataType as ArrowDatatype;
use datatypes::arrow::scalar::PrimitiveScalar;
use datatypes::arrow::compute;
use datatypes::arrow::datatypes::{DataType as ArrowDatatype, Int64Type};
use datatypes::data_type::DataType;
use datatypes::prelude::ConcreteDataType;
use datatypes::vectors::{TimestampVector, VectorRef};
use datatypes::vectors::{TimestampMillisecondVector, VectorRef};
use snafu::ResultExt;
use crate::error::Result;
use crate::scalars::function::{Function, FunctionContext};
#[derive(Clone, Debug, Default)]
@@ -40,7 +41,7 @@ impl Function for FromUnixtimeFunction {
}
fn return_type(&self, _input_types: &[ConcreteDataType]) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::timestamp_millis_datatype())
Ok(ConcreteDataType::timestamp_millisecond_datatype())
}
fn signature(&self) -> Signature {
@@ -56,14 +57,18 @@ impl Function for FromUnixtimeFunction {
ConcreteDataType::Int64(_) => {
let array = columns[0].to_arrow_array();
// Our timestamp vector's time unit is millisecond
let array = arithmetics::mul_scalar(
&*array,
&PrimitiveScalar::new(ArrowDatatype::Int64, Some(1000i64)),
);
let array = compute::multiply_scalar_dyn::<Int64Type>(&array, 1000i64)
.context(ArrowComputeSnafu)?;
let arrow_datatype = &self.return_type(&[]).unwrap().as_arrow_type();
Ok(Arc::new(
TimestampVector::try_from_arrow_array(array).context(IntoVectorSnafu {
data_type: ArrowDatatype::Int64,
TimestampMillisecondVector::try_from_arrow_array(
compute::cast(&array, arrow_datatype).context(TypeCastSnafu {
typ: ArrowDatatype::Int64,
})?,
)
.context(IntoVectorSnafu {
data_type: arrow_datatype.clone(),
})?,
))
}
@@ -71,8 +76,7 @@ impl Function for FromUnixtimeFunction {
function: NAME,
datatypes: columns.iter().map(|c| c.data_type()).collect::<Vec<_>>(),
}
.fail()
.map_err(|e| e.into()),
.fail(),
}
}
}
@@ -96,7 +100,7 @@ mod tests {
let f = FromUnixtimeFunction::default();
assert_eq!("from_unixtime", f.name());
assert_eq!(
ConcreteDataType::timestamp_millis_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
f.return_type(&[]).unwrap()
);

View File

@@ -19,7 +19,8 @@ use common_query::prelude::{
ColumnarValue, ReturnTypeFunction, ScalarFunctionImplementation, ScalarUdf, ScalarValue,
};
use datatypes::error::Error as DataTypeError;
use datatypes::prelude::{ConcreteDataType, VectorHelper};
use datatypes::prelude::*;
use datatypes::vectors::Helper;
use snafu::ResultExt;
use crate::scalars::function::{FunctionContext, FunctionRef};
@@ -47,7 +48,7 @@ pub fn create_udf(func: FunctionRef) -> ScalarUdf {
let args: Result<Vec<_>, DataTypeError> = args
.iter()
.map(|arg| match arg {
ColumnarValue::Scalar(v) => VectorHelper::try_from_scalar_value(v.clone(), rows),
ColumnarValue::Scalar(v) => Helper::try_from_scalar_value(v.clone(), rows),
ColumnarValue::Vector(v) => Ok(v.clone()),
})
.collect();
@@ -126,12 +127,7 @@ mod tests {
assert_eq!(4, vec.len());
for i in 0..4 {
assert_eq!(
i == 0 || i == 3,
vec.get_data(i).unwrap(),
"failed at {}",
i
)
assert_eq!(i == 0 || i == 3, vec.get_data(i).unwrap(), "Failed at {i}",)
}
}
_ => unreachable!(),

View File

@@ -1,8 +1,8 @@
[package]
name = "common-grpc-expr"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
api = { path = "../../api" }

View File

@@ -15,7 +15,7 @@
use std::sync::Arc;
use api::v1::alter_expr::Kind;
use api::v1::{AlterExpr, CreateExpr, DropColumns};
use api::v1::{AlterExpr, CreateTableExpr, DropColumns};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use datatypes::schema::{ColumnSchema, SchemaBuilder, SchemaRef};
use snafu::{ensure, OptionExt, ResultExt};
@@ -29,6 +29,16 @@ use crate::error::{
/// Convert an [`AlterExpr`] to an optional [`AlterTableRequest`]
pub fn alter_expr_to_request(expr: AlterExpr) -> Result<Option<AlterTableRequest>> {
let catalog_name = if expr.catalog_name.is_empty() {
None
} else {
Some(expr.catalog_name)
};
let schema_name = if expr.schema_name.is_empty() {
None
} else {
Some(expr.schema_name)
};
match expr.kind {
Some(Kind::AddColumns(add_columns)) => {
let add_column_requests = add_columns
@@ -57,8 +67,8 @@ pub fn alter_expr_to_request(expr: AlterExpr) -> Result<Option<AlterTableRequest
};
let request = AlterTableRequest {
catalog_name: expr.catalog_name,
schema_name: expr.schema_name,
catalog_name,
schema_name,
table_name: expr.table_name,
alter_kind,
};
@@ -70,8 +80,8 @@ pub fn alter_expr_to_request(expr: AlterExpr) -> Result<Option<AlterTableRequest
};
let request = AlterTableRequest {
catalog_name: expr.catalog_name,
schema_name: expr.schema_name,
catalog_name,
schema_name,
table_name: expr.table_name,
alter_kind,
};
@@ -81,7 +91,7 @@ pub fn alter_expr_to_request(expr: AlterExpr) -> Result<Option<AlterTableRequest
}
}
pub fn create_table_schema(expr: &CreateExpr) -> Result<SchemaRef> {
pub fn create_table_schema(expr: &CreateTableExpr) -> Result<SchemaRef> {
let column_schemas = expr
.column_defs
.iter()
@@ -96,7 +106,7 @@ pub fn create_table_schema(expr: &CreateExpr) -> Result<SchemaRef> {
.iter()
.any(|column| column.name == expr.time_index),
MissingTimestampColumnSnafu {
msg: format!("CreateExpr: {:?}", expr)
msg: format!("CreateExpr: {expr:?}")
}
);
@@ -119,7 +129,10 @@ pub fn create_table_schema(expr: &CreateExpr) -> Result<SchemaRef> {
))
}
pub fn create_expr_to_request(table_id: TableId, expr: CreateExpr) -> Result<CreateTableRequest> {
pub fn create_expr_to_request(
table_id: TableId,
expr: CreateTableExpr,
) -> Result<CreateTableRequest> {
let schema = create_table_schema(&expr)?;
let primary_key_indices = expr
.primary_keys
@@ -134,12 +147,19 @@ pub fn create_expr_to_request(table_id: TableId, expr: CreateExpr) -> Result<Cre
})
.collect::<Result<Vec<usize>>>()?;
let catalog_name = expr
.catalog_name
.unwrap_or_else(|| DEFAULT_CATALOG_NAME.to_string());
let schema_name = expr
.schema_name
.unwrap_or_else(|| DEFAULT_SCHEMA_NAME.to_string());
let mut catalog_name = expr.catalog_name;
if catalog_name.is_empty() {
catalog_name = DEFAULT_CATALOG_NAME.to_string();
}
let mut schema_name = expr.schema_name;
if schema_name.is_empty() {
schema_name = DEFAULT_SCHEMA_NAME.to_string();
}
let desc = if expr.desc.is_empty() {
None
} else {
Some(expr.desc)
};
let region_ids = if expr.region_ids.is_empty() {
vec![0]
@@ -152,7 +172,7 @@ pub fn create_expr_to_request(table_id: TableId, expr: CreateExpr) -> Result<Cre
catalog_name,
schema_name,
table_name: expr.table_name,
desc: expr.desc,
desc,
schema,
region_numbers: region_ids,
primary_key_indices,
@@ -171,8 +191,8 @@ mod tests {
#[test]
fn test_alter_expr_to_request() {
let expr = AlterExpr {
catalog_name: None,
schema_name: None,
catalog_name: "".to_string(),
schema_name: "".to_string(),
table_name: "monitor".to_string(),
kind: Some(Kind::AddColumns(AddColumns {
@@ -181,7 +201,7 @@ mod tests {
name: "mem_usage".to_string(),
datatype: ColumnDataType::Float64 as i32,
is_nullable: false,
default_constraint: None,
default_constraint: vec![],
}),
is_key: false,
}],
@@ -208,8 +228,8 @@ mod tests {
#[test]
fn test_drop_column_expr() {
let expr = AlterExpr {
catalog_name: Some("test_catalog".to_string()),
schema_name: Some("test_schema".to_string()),
catalog_name: "test_catalog".to_string(),
schema_name: "test_schema".to_string(),
table_name: "monitor".to_string(),
kind: Some(Kind::DropColumns(DropColumns {

View File

@@ -18,15 +18,15 @@ use std::sync::Arc;
use api::helper::ColumnDataTypeWrapper;
use api::v1::column::{SemanticType, Values};
use api::v1::{AddColumn, AddColumns, Column, ColumnDataType, ColumnDef, CreateExpr};
use api::v1::{AddColumn, AddColumns, Column, ColumnDataType, ColumnDef, CreateTableExpr};
use common_base::BitVec;
use common_time::timestamp::Timestamp;
use common_time::{Date, DateTime};
use datatypes::data_type::ConcreteDataType;
use datatypes::data_type::{ConcreteDataType, DataType};
use datatypes::prelude::{ValueRef, VectorRef};
use datatypes::schema::SchemaRef;
use datatypes::value::Value;
use datatypes::vectors::VectorBuilder;
use datatypes::vectors::MutableVector;
use snafu::{ensure, OptionExt, ResultExt};
use table::metadata::TableId;
use table::requests::{AddColumnRequest, AlterKind, AlterTableRequest, InsertRequest};
@@ -45,7 +45,7 @@ fn build_column_def(column_name: &str, datatype: i32, nullable: bool) -> ColumnD
name: column_name.to_string(),
datatype,
is_nullable: nullable,
default_constraint: None,
default_constraint: vec![],
}
}
@@ -99,7 +99,7 @@ pub fn column_to_vector(column: &Column, rows: u32) -> Result<VectorRef> {
let column_datatype = wrapper.datatype();
let rows = rows as usize;
let mut vector = VectorBuilder::with_capacity(wrapper.into(), rows);
let mut vector = ConcreteDataType::from(wrapper).create_mutable_vector(rows);
if let Some(values) = &column.values {
let values = collect_column_values(column_datatype, values);
@@ -110,21 +110,31 @@ pub fn column_to_vector(column: &Column, rows: u32) -> Result<VectorRef> {
for i in 0..rows {
if let Some(true) = nulls_iter.next() {
vector.push_null();
vector
.push_value_ref(ValueRef::Null)
.context(CreateVectorSnafu)?;
} else {
let value_ref = values_iter.next().context(InvalidColumnProtoSnafu {
err_msg: format!(
"value not found at position {} of column {}",
i, &column.column_name
),
})?;
vector.try_push_ref(value_ref).context(CreateVectorSnafu)?;
let value_ref = values_iter
.next()
.with_context(|| InvalidColumnProtoSnafu {
err_msg: format!(
"value not found at position {} of column {}",
i, &column.column_name
),
})?;
vector
.push_value_ref(value_ref)
.context(CreateVectorSnafu)?;
}
}
} else {
(0..rows).for_each(|_| vector.push_null());
(0..rows).try_for_each(|_| {
vector
.push_value_ref(ValueRef::Null)
.context(CreateVectorSnafu)
})?;
}
Ok(vector.finish())
Ok(vector.to_vector())
}
fn collect_column_values(column_datatype: ColumnDataType, values: &Values) -> Vec<ValueRef> {
@@ -144,7 +154,7 @@ fn collect_column_values(column_datatype: ColumnDataType, values: &Values) -> Ve
collect_values!(values.i32_values, |v| ValueRef::from(*v))
}
ColumnDataType::Int64 => {
collect_values!(values.i64_values, |v| ValueRef::from(*v as i64))
collect_values!(values.i64_values, |v| ValueRef::from(*v))
}
ColumnDataType::Uint8 => {
collect_values!(values.u8_values, |v| ValueRef::from(*v as u8))
@@ -156,7 +166,7 @@ fn collect_column_values(column_datatype: ColumnDataType, values: &Values) -> Ve
collect_values!(values.u32_values, |v| ValueRef::from(*v))
}
ColumnDataType::Uint64 => {
collect_values!(values.u64_values, |v| ValueRef::from(*v as u64))
collect_values!(values.u64_values, |v| ValueRef::from(*v))
}
ColumnDataType::Float32 => collect_values!(values.f32_values, |v| ValueRef::from(*v)),
ColumnDataType::Float64 => collect_values!(values.f64_values, |v| ValueRef::from(*v)),
@@ -174,9 +184,24 @@ fn collect_column_values(column_datatype: ColumnDataType, values: &Values) -> Ve
DateTime::new(*v)
))
}
ColumnDataType::Timestamp => {
collect_values!(values.ts_millis_values, |v| ValueRef::Timestamp(
Timestamp::from_millis(*v)
ColumnDataType::TimestampSecond => {
collect_values!(values.ts_second_values, |v| ValueRef::Timestamp(
Timestamp::new_second(*v)
))
}
ColumnDataType::TimestampMillisecond => {
collect_values!(values.ts_millisecond_values, |v| ValueRef::Timestamp(
Timestamp::new_millisecond(*v)
))
}
ColumnDataType::TimestampMicrosecond => {
collect_values!(values.ts_millisecond_values, |v| ValueRef::Timestamp(
Timestamp::new_microsecond(*v)
))
}
ColumnDataType::TimestampNanosecond => {
collect_values!(values.ts_millisecond_values, |v| ValueRef::Timestamp(
Timestamp::new_nanosecond(*v)
))
}
}
@@ -189,7 +214,7 @@ pub fn build_create_expr_from_insertion(
table_id: Option<TableId>,
table_name: &str,
columns: &[Column],
) -> Result<CreateExpr> {
) -> Result<CreateTableExpr> {
let mut new_columns: HashSet<String> = HashSet::default();
let mut column_defs = Vec::default();
let mut primary_key_indices = Vec::default();
@@ -238,17 +263,17 @@ pub fn build_create_expr_from_insertion(
.map(|idx| columns[*idx].column_name.clone())
.collect::<Vec<_>>();
let expr = CreateExpr {
catalog_name: Some(catalog_name.to_string()),
schema_name: Some(schema_name.to_string()),
let expr = CreateTableExpr {
catalog_name: catalog_name.to_string(),
schema_name: schema_name.to_string(),
table_name: table_name.to_string(),
desc: Some("Created on insertion".to_string()),
desc: "Created on insertion".to_string(),
column_defs,
time_index: timestamp_field_name,
primary_keys,
create_if_not_exists: true,
table_options: Default::default(),
table_id,
table_id: table_id.map(|id| api::v1::TableId { id }),
region_ids: vec![0], // TODO:(hl): region id should be allocated by frontend
};
@@ -289,10 +314,7 @@ pub fn insertion_expr_to_request(
},
)?;
let data_type = &column_schema.data_type;
entry.insert(VectorBuilder::with_capacity(
data_type.clone(),
row_count as usize,
))
entry.insert(data_type.create_mutable_vector(row_count as usize))
}
};
add_values_to_builder(vector_builder, values, row_count as usize, null_mask)?;
@@ -300,7 +322,7 @@ pub fn insertion_expr_to_request(
}
let columns_values = columns_builders
.into_iter()
.map(|(column_name, mut vector_builder)| (column_name, vector_builder.finish()))
.map(|(column_name, mut vector_builder)| (column_name, vector_builder.to_vector()))
.collect();
Ok(InsertRequest {
@@ -312,7 +334,7 @@ pub fn insertion_expr_to_request(
}
fn add_values_to_builder(
builder: &mut VectorBuilder,
builder: &mut Box<dyn MutableVector>,
values: Values,
row_count: usize,
null_mask: Vec<u8>,
@@ -323,9 +345,11 @@ fn add_values_to_builder(
if null_mask.is_empty() {
ensure!(values.len() == row_count, IllegalInsertDataSnafu);
values.iter().for_each(|value| {
builder.push(value);
});
values.iter().try_for_each(|value| {
builder
.push_value_ref(value.as_value_ref())
.context(CreateVectorSnafu)
})?;
} else {
let null_mask = BitVec::from_vec(null_mask);
ensure!(
@@ -336,9 +360,13 @@ fn add_values_to_builder(
let mut idx_of_values = 0;
for idx in 0..row_count {
match is_null(&null_mask, idx) {
Some(true) => builder.push(&Value::Null),
Some(true) => builder
.push_value_ref(ValueRef::Null)
.context(CreateVectorSnafu)?,
_ => {
builder.push(&values[idx_of_values]);
builder
.push_value_ref(values[idx_of_values].as_value_ref())
.context(CreateVectorSnafu)?;
idx_of_values += 1
}
}
@@ -418,9 +446,9 @@ fn convert_values(data_type: &ConcreteDataType, values: Values) -> Vec<Value> {
.map(|v| Value::Date(v.into()))
.collect(),
ConcreteDataType::Timestamp(_) => values
.ts_millis_values
.ts_millisecond_values
.into_iter()
.map(|v| Value::Timestamp(Timestamp::from_millis(v)))
.map(|v| Value::Timestamp(Timestamp::new_millisecond(v)))
.collect(),
ConcreteDataType::Null(_) => unreachable!(),
ConcreteDataType::List(_) => unreachable!(),
@@ -488,9 +516,9 @@ mod tests {
build_create_expr_from_insertion("", "", table_id, table_name, &insert_batch.0)
.unwrap();
assert_eq!(table_id, create_expr.table_id);
assert_eq!(table_id, create_expr.table_id.map(|x| x.id));
assert_eq!(table_name, create_expr.table_name);
assert_eq!(Some("Created on insertion".to_string()), create_expr.desc);
assert_eq!("Created on insertion".to_string(), create_expr.desc);
assert_eq!(
vec![create_expr.column_defs[0].name.clone()],
create_expr.primary_keys
@@ -543,7 +571,7 @@ mod tests {
);
assert_eq!(
ConcreteDataType::timestamp_millis_datatype(),
ConcreteDataType::timestamp_millisecond_datatype(),
ConcreteDataType::from(
ColumnDataTypeWrapper::try_new(
column_defs
@@ -624,8 +652,8 @@ mod tests {
assert_eq!(Value::Float64(0.1.into()), memory.get(1));
let ts = insert_req.columns_values.get("ts").unwrap();
assert_eq!(Value::Timestamp(Timestamp::from_millis(100)), ts.get(0));
assert_eq!(Value::Timestamp(Timestamp::from_millis(101)), ts.get(1));
assert_eq!(Value::Timestamp(Timestamp::new_millisecond(100)), ts.get(0));
assert_eq!(Value::Timestamp(Timestamp::new_millisecond(101)), ts.get(1));
}
#[test]
@@ -675,8 +703,12 @@ mod tests {
ColumnSchema::new("host", ConcreteDataType::string_datatype(), false),
ColumnSchema::new("cpu", ConcreteDataType::float64_datatype(), true),
ColumnSchema::new("memory", ConcreteDataType::float64_datatype(), true),
ColumnSchema::new("ts", ConcreteDataType::timestamp_millis_datatype(), true)
.with_time_index(true),
ColumnSchema::new(
"ts",
ConcreteDataType::timestamp_millisecond_datatype(),
true,
)
.with_time_index(true),
];
Arc::new(
@@ -693,7 +725,7 @@ mod tests {
async fn scan(
&self,
_projection: &Option<Vec<usize>>,
_projection: Option<&Vec<usize>>,
_filters: &[Expr],
_limit: Option<usize>,
) -> TableResult<PhysicalPlanRef> {
@@ -741,7 +773,7 @@ mod tests {
};
let ts_vals = column::Values {
ts_millis_values: vec![100, 101],
ts_millisecond_values: vec![100, 101],
..Default::default()
};
let ts_column = Column {
@@ -749,7 +781,7 @@ mod tests {
semantic_type: TIMESTAMP_SEMANTIC_TYPE,
values: Some(ts_vals),
null_mask: vec![0],
datatype: ColumnDataType::Timestamp as i32,
datatype: ColumnDataType::TimestampMillisecond as i32,
};
(

View File

@@ -1,4 +1,3 @@
#![feature(assert_matches)]
// Copyright 2022 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");

View File

@@ -1,8 +1,8 @@
[package]
name = "common-grpc"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
api = { path = "../../api" }
@@ -13,9 +13,7 @@ common-query = { path = "../query" }
common-recordbatch = { path = "../recordbatch" }
common-runtime = { path = "../runtime" }
dashmap = "5.4"
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion.workspace = true
datatypes = { path = "../../datatypes" }
snafu = { version = "0.7", features = ["backtraces"] }
tokio = { version = "1.0", features = ["full"] }

View File

@@ -26,7 +26,7 @@ async fn do_bench_channel_manager() {
let join = tokio::spawn(async move {
for _ in 0..10000 {
let idx = rand::random::<usize>() % 100;
let ret = m_clone.get(format!("{}", idx));
let ret = m_clone.get(format!("{idx}"));
assert!(ret.is_ok());
}
});

View File

@@ -120,7 +120,7 @@ impl ChannelManager {
fn build_endpoint(&self, addr: &str) -> Result<Endpoint> {
let mut endpoint =
Endpoint::new(format!("http://{}", addr)).context(error::CreateChannelSnafu)?;
Endpoint::new(format!("http://{addr}")).context(error::CreateChannelSnafu)?;
if let Some(dur) = self.config.timeout {
endpoint = endpoint.timeout(dur);

View File

@@ -12,8 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use api::helper::ColumnDataTypeWrapper;
use api::result::{build_err_result, ObjectResultBuilder};
use api::v1::codec::SelectResult;
@@ -24,9 +22,14 @@ use common_error::prelude::ErrorExt;
use common_error::status_code::StatusCode;
use common_query::Output;
use common_recordbatch::{RecordBatches, SendableRecordBatchStream};
use datatypes::arrow::array::{Array, BooleanArray, PrimitiveArray};
use datatypes::arrow_array::{BinaryArray, StringArray};
use datatypes::schema::SchemaRef;
use datatypes::types::{TimestampType, WrapperType};
use datatypes::vectors::{
BinaryVector, BooleanVector, DateTimeVector, DateVector, Float32Vector, Float64Vector,
Int16Vector, Int32Vector, Int64Vector, Int8Vector, StringVector, TimestampMicrosecondVector,
TimestampMillisecondVector, TimestampNanosecondVector, TimestampSecondVector, UInt16Vector,
UInt32Vector, UInt64Vector, UInt8Vector, VectorRef,
};
use snafu::{OptionExt, ResultExt};
use crate::error::{self, ConversionSnafu, Result};
@@ -46,6 +49,7 @@ pub async fn to_object_result(output: std::result::Result<Output, impl ErrorExt>
Err(e) => build_err_result(&e),
}
}
async fn collect(stream: SendableRecordBatchStream) -> Result<ObjectResult> {
let recordbatches = RecordBatches::try_collect(stream)
.await
@@ -78,10 +82,7 @@ fn try_convert(record_batches: RecordBatches) -> Result<SelectResult> {
let schema = record_batches.schema();
let record_batches = record_batches.take();
let row_count: usize = record_batches
.iter()
.map(|r| r.df_recordbatch.num_rows())
.sum();
let row_count: usize = record_batches.iter().map(|r| r.num_rows()).sum();
let schemas = schema.column_schemas();
let mut columns = Vec::with_capacity(schemas.len());
@@ -89,9 +90,9 @@ fn try_convert(record_batches: RecordBatches) -> Result<SelectResult> {
for (idx, column_schema) in schemas.iter().enumerate() {
let column_name = column_schema.name.clone();
let arrays: Vec<Arc<dyn Array>> = record_batches
let arrays: Vec<_> = record_batches
.iter()
.map(|r| r.df_recordbatch.columns()[idx].clone())
.map(|r| r.column(idx).clone())
.collect();
let column = Column {
@@ -112,7 +113,7 @@ fn try_convert(record_batches: RecordBatches) -> Result<SelectResult> {
})
}
pub fn null_mask(arrays: &Vec<Arc<dyn Array>>, row_count: usize) -> Vec<u8> {
pub fn null_mask(arrays: &[VectorRef], row_count: usize) -> Vec<u8> {
let null_count: usize = arrays.iter().map(|a| a.null_count()).sum();
if null_count == 0 {
@@ -122,10 +123,12 @@ pub fn null_mask(arrays: &Vec<Arc<dyn Array>>, row_count: usize) -> Vec<u8> {
let mut null_mask = BitVec::with_capacity(row_count);
for array in arrays {
let validity = array.validity();
if let Some(v) = validity {
v.iter().for_each(|x| null_mask.push(!x));
} else {
if validity.is_all_valid() {
null_mask.extend_from_bitslice(&BitVec::repeat(false, array.len()));
} else {
for i in 0..array.len() {
null_mask.push(!validity.is_set(i));
}
}
}
null_mask.into_vec()
@@ -133,7 +136,9 @@ pub fn null_mask(arrays: &Vec<Arc<dyn Array>>, row_count: usize) -> Vec<u8> {
macro_rules! convert_arrow_array_to_grpc_vals {
($data_type: expr, $arrays: ident, $(($Type: pat, $CastType: ty, $field: ident, $MapFunction: expr)), +) => {{
use datatypes::arrow::datatypes::{DataType, TimeUnit};
use datatypes::data_type::{ConcreteDataType};
use datatypes::prelude::ScalarVector;
match $data_type {
$(
$Type => {
@@ -143,52 +148,114 @@ macro_rules! convert_arrow_array_to_grpc_vals {
from: format!("{:?}", $data_type),
})?;
vals.$field.extend(array
.iter()
.iter_data()
.filter_map(|i| i.map($MapFunction))
.collect::<Vec<_>>());
}
return Ok(vals);
},
)+
_ => unimplemented!(),
ConcreteDataType::Null(_) | ConcreteDataType::List(_) => unreachable!("Should not send {:?} in gRPC", $data_type),
}
}};
}
pub fn values(arrays: &[Arc<dyn Array>]) -> Result<Values> {
pub fn values(arrays: &[VectorRef]) -> Result<Values> {
if arrays.is_empty() {
return Ok(Values::default());
}
let data_type = arrays[0].data_type();
convert_arrow_array_to_grpc_vals!(
data_type, arrays,
(DataType::Boolean, BooleanArray, bool_values, |x| {x}),
(DataType::Int8, PrimitiveArray<i8>, i8_values, |x| {*x as i32}),
(DataType::Int16, PrimitiveArray<i16>, i16_values, |x| {*x as i32}),
(DataType::Int32, PrimitiveArray<i32>, i32_values, |x| {*x}),
(DataType::Int64, PrimitiveArray<i64>, i64_values, |x| {*x}),
(DataType::UInt8, PrimitiveArray<u8>, u8_values, |x| {*x as u32}),
(DataType::UInt16, PrimitiveArray<u16>, u16_values, |x| {*x as u32}),
(DataType::UInt32, PrimitiveArray<u32>, u32_values, |x| {*x}),
(DataType::UInt64, PrimitiveArray<u64>, u64_values, |x| {*x}),
(DataType::Float32, PrimitiveArray<f32>, f32_values, |x| {*x}),
(DataType::Float64, PrimitiveArray<f64>, f64_values, |x| {*x}),
(DataType::Binary, BinaryArray, binary_values, |x| {x.into()}),
(DataType::LargeBinary, BinaryArray, binary_values, |x| {x.into()}),
(DataType::Utf8, StringArray, string_values, |x| {x.into()}),
(DataType::LargeUtf8, StringArray, string_values, |x| {x.into()}),
(DataType::Date32, PrimitiveArray<i32>, date_values, |x| {*x as i32}),
(DataType::Date64, PrimitiveArray<i64>, datetime_values,|x| {*x as i64}),
(DataType::Timestamp(TimeUnit::Millisecond, _), PrimitiveArray<i64>, ts_millis_values, |x| {*x})
data_type,
arrays,
(
ConcreteDataType::Boolean(_),
BooleanVector,
bool_values,
|x| { x }
),
(ConcreteDataType::Int8(_), Int8Vector, i8_values, |x| {
i32::from(x)
}),
(ConcreteDataType::Int16(_), Int16Vector, i16_values, |x| {
i32::from(x)
}),
(ConcreteDataType::Int32(_), Int32Vector, i32_values, |x| {
x
}),
(ConcreteDataType::Int64(_), Int64Vector, i64_values, |x| {
x
}),
(ConcreteDataType::UInt8(_), UInt8Vector, u8_values, |x| {
u32::from(x)
}),
(ConcreteDataType::UInt16(_), UInt16Vector, u16_values, |x| {
u32::from(x)
}),
(ConcreteDataType::UInt32(_), UInt32Vector, u32_values, |x| {
x
}),
(ConcreteDataType::UInt64(_), UInt64Vector, u64_values, |x| {
x
}),
(
ConcreteDataType::Float32(_),
Float32Vector,
f32_values,
|x| { x }
),
(
ConcreteDataType::Float64(_),
Float64Vector,
f64_values,
|x| { x }
),
(
ConcreteDataType::Binary(_),
BinaryVector,
binary_values,
|x| { x.into() }
),
(
ConcreteDataType::String(_),
StringVector,
string_values,
|x| { x.into() }
),
(ConcreteDataType::Date(_), DateVector, date_values, |x| {
x.val()
}),
(
ConcreteDataType::DateTime(_),
DateTimeVector,
datetime_values,
|x| { x.val() }
),
(
ConcreteDataType::Timestamp(TimestampType::Second(_)),
TimestampSecondVector,
ts_second_values,
|x| { x.into_native() }
),
(
ConcreteDataType::Timestamp(TimestampType::Millisecond(_)),
TimestampMillisecondVector,
ts_millisecond_values,
|x| { x.into_native() }
),
(
ConcreteDataType::Timestamp(TimestampType::Microsecond(_)),
TimestampMicrosecondVector,
ts_microsecond_values,
|x| { x.into_native() }
),
(
ConcreteDataType::Timestamp(TimestampType::Nanosecond(_)),
TimestampNanosecondVector,
ts_nanosecond_values,
|x| { x.into_native() }
)
)
}
@@ -197,14 +264,10 @@ mod tests {
use std::sync::Arc;
use common_recordbatch::{RecordBatch, RecordBatches};
use datafusion::field_util::SchemaExt;
use datatypes::arrow::array::{Array, BooleanArray, PrimitiveArray};
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use datatypes::arrow_array::StringArray;
use datatypes::schema::Schema;
use datatypes::vectors::{UInt32Vector, VectorRef};
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema};
use crate::select::{null_mask, try_convert, values};
use super::*;
#[test]
fn test_convert_record_batches_to_select_result() {
@@ -230,9 +293,8 @@ mod tests {
#[test]
fn test_convert_arrow_arrays_i32() {
let array: PrimitiveArray<i32> =
PrimitiveArray::from(vec![Some(1), Some(2), None, Some(3)]);
let array: Arc<dyn Array> = Arc::new(array);
let array = Int32Vector::from(vec![Some(1), Some(2), None, Some(3)]);
let array: VectorRef = Arc::new(array);
let values = values(&[array]).unwrap();
@@ -241,14 +303,14 @@ mod tests {
#[test]
fn test_convert_arrow_arrays_string() {
let array = StringArray::from(vec![
let array = StringVector::from(vec![
Some("1".to_string()),
Some("2".to_string()),
None,
Some("3".to_string()),
None,
]);
let array: Arc<dyn Array> = Arc::new(array);
let array: VectorRef = Arc::new(array);
let values = values(&[array]).unwrap();
@@ -257,8 +319,8 @@ mod tests {
#[test]
fn test_convert_arrow_arrays_bool() {
let array = BooleanArray::from(vec![Some(true), Some(false), None, Some(false), None]);
let array: Arc<dyn Array> = Arc::new(array);
let array = BooleanVector::from(vec![Some(true), Some(false), None, Some(false), None]);
let array: VectorRef = Arc::new(array);
let values = values(&[array]).unwrap();
@@ -267,43 +329,42 @@ mod tests {
#[test]
fn test_convert_arrow_arrays_empty() {
let array = BooleanArray::from(vec![None, None, None, None, None]);
let array: Arc<dyn Array> = Arc::new(array);
let array = BooleanVector::from(vec![None, None, None, None, None]);
let array: VectorRef = Arc::new(array);
let values = values(&[array]).unwrap();
assert_eq!(Vec::<bool>::default(), values.bool_values);
assert!(values.bool_values.is_empty());
}
#[test]
fn test_null_mask() {
let a1: Arc<dyn Array> = Arc::new(PrimitiveArray::from(vec![None, Some(2), None]));
let a2: Arc<dyn Array> =
Arc::new(PrimitiveArray::from(vec![Some(1), Some(2), None, Some(4)]));
let mask = null_mask(&vec![a1, a2], 3 + 4);
let a1: VectorRef = Arc::new(Int32Vector::from(vec![None, Some(2), None]));
let a2: VectorRef = Arc::new(Int32Vector::from(vec![Some(1), Some(2), None, Some(4)]));
let mask = null_mask(&[a1, a2], 3 + 4);
assert_eq!(vec![0b0010_0101], mask);
let empty: Arc<dyn Array> = Arc::new(PrimitiveArray::<i32>::from(vec![None, None, None]));
let mask = null_mask(&vec![empty.clone(), empty.clone(), empty], 9);
let empty: VectorRef = Arc::new(Int32Vector::from(vec![None, None, None]));
let mask = null_mask(&[empty.clone(), empty.clone(), empty], 9);
assert_eq!(vec![0b1111_1111, 0b0000_0001], mask);
let a1: Arc<dyn Array> = Arc::new(PrimitiveArray::from(vec![Some(1), Some(2), Some(3)]));
let a2: Arc<dyn Array> = Arc::new(PrimitiveArray::from(vec![Some(4), Some(5), Some(6)]));
let mask = null_mask(&vec![a1, a2], 3 + 3);
let a1: VectorRef = Arc::new(Int32Vector::from(vec![Some(1), Some(2), Some(3)]));
let a2: VectorRef = Arc::new(Int32Vector::from(vec![Some(4), Some(5), Some(6)]));
let mask = null_mask(&[a1, a2], 3 + 3);
assert_eq!(Vec::<u8>::default(), mask);
let a1: Arc<dyn Array> = Arc::new(PrimitiveArray::from(vec![Some(1), Some(2), Some(3)]));
let a2: Arc<dyn Array> = Arc::new(PrimitiveArray::from(vec![Some(4), Some(5), None]));
let mask = null_mask(&vec![a1, a2], 3 + 3);
let a1: VectorRef = Arc::new(Int32Vector::from(vec![Some(1), Some(2), Some(3)]));
let a2: VectorRef = Arc::new(Int32Vector::from(vec![Some(4), Some(5), None]));
let mask = null_mask(&[a1, a2], 3 + 3);
assert_eq!(vec![0b0010_0000], mask);
}
fn mock_record_batch() -> RecordBatch {
let arrow_schema = Arc::new(ArrowSchema::new(vec![
Field::new("c1", DataType::UInt32, false),
Field::new("c2", DataType::UInt32, false),
]));
let schema = Arc::new(Schema::try_from(arrow_schema).unwrap());
let column_schemas = vec![
ColumnSchema::new("c1", ConcreteDataType::uint32_datatype(), true),
ColumnSchema::new("c2", ConcreteDataType::uint32_datatype(), true),
];
let schema = Arc::new(Schema::try_new(column_schemas).unwrap());
let v1 = Arc::new(UInt32Vector::from(vec![Some(1), Some(2), None]));
let v2 = Arc::new(UInt32Vector::from(vec![Some(1), None, None]));

View File

@@ -45,11 +45,11 @@ impl LinesWriter {
pub fn write_ts(&mut self, column_name: &str, value: (i64, Precision)) -> Result<()> {
let (idx, column) = self.mut_column(
column_name,
ColumnDataType::Timestamp,
ColumnDataType::TimestampMillisecond,
SemanticType::Timestamp,
);
ensure!(
column.datatype == ColumnDataType::Timestamp as i32,
column.datatype == ColumnDataType::TimestampMillisecond as i32,
TypeMismatchSnafu {
column_name,
expected: "timestamp",
@@ -58,7 +58,9 @@ impl LinesWriter {
);
// It is safe to use unwrap here, because values has been initialized in mut_column()
let values = column.values.as_mut().unwrap();
values.ts_millis_values.push(to_ms_ts(value.1, value.0));
values
.ts_millisecond_values
.push(to_ms_ts(value.1, value.0));
self.null_masks[idx].push(false);
Ok(())
}
@@ -224,23 +226,23 @@ impl LinesWriter {
pub fn to_ms_ts(p: Precision, ts: i64) -> i64 {
match p {
Precision::NANOSECOND => ts / 1_000_000,
Precision::MICROSECOND => ts / 1000,
Precision::MILLISECOND => ts,
Precision::SECOND => ts * 1000,
Precision::MINUTE => ts * 1000 * 60,
Precision::HOUR => ts * 1000 * 60 * 60,
Precision::Nanosecond => ts / 1_000_000,
Precision::Microsecond => ts / 1000,
Precision::Millisecond => ts,
Precision::Second => ts * 1000,
Precision::Minute => ts * 1000 * 60,
Precision::Hour => ts * 1000 * 60 * 60,
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
NANOSECOND,
MICROSECOND,
MILLISECOND,
SECOND,
MINUTE,
HOUR,
Nanosecond,
Microsecond,
Millisecond,
Second,
Minute,
Hour,
}
#[cfg(test)]
@@ -261,13 +263,13 @@ mod tests {
writer.write_f64("memory", 0.4).unwrap();
writer.write_string("name", "name1").unwrap();
writer
.write_ts("ts", (101011000, Precision::MILLISECOND))
.write_ts("ts", (101011000, Precision::Millisecond))
.unwrap();
writer.commit();
writer.write_tag("host", "host2").unwrap();
writer
.write_ts("ts", (102011001, Precision::MILLISECOND))
.write_ts("ts", (102011001, Precision::Millisecond))
.unwrap();
writer.write_bool("enable_reboot", true).unwrap();
writer.write_u64("year_of_service", 2).unwrap();
@@ -278,7 +280,7 @@ mod tests {
writer.write_f64("cpu", 0.4).unwrap();
writer.write_u64("cpu_core_num", 16).unwrap();
writer
.write_ts("ts", (103011002, Precision::MILLISECOND))
.write_ts("ts", (103011002, Precision::Millisecond))
.unwrap();
writer.commit();
@@ -321,11 +323,11 @@ mod tests {
let column = &columns[4];
assert_eq!("ts", column.column_name);
assert_eq!(ColumnDataType::Timestamp as i32, column.datatype);
assert_eq!(ColumnDataType::TimestampMillisecond as i32, column.datatype);
assert_eq!(SemanticType::Timestamp as i32, column.semantic_type);
assert_eq!(
vec![101011000, 102011001, 103011002],
column.values.as_ref().unwrap().ts_millis_values
column.values.as_ref().unwrap().ts_millisecond_values
);
verify_null_mask(&column.null_mask, vec![false, false, false]);
@@ -367,16 +369,16 @@ mod tests {
#[test]
fn test_to_ms() {
assert_eq!(100, to_ms_ts(Precision::NANOSECOND, 100110000));
assert_eq!(100110, to_ms_ts(Precision::MICROSECOND, 100110000));
assert_eq!(100110000, to_ms_ts(Precision::MILLISECOND, 100110000));
assert_eq!(100, to_ms_ts(Precision::Nanosecond, 100110000));
assert_eq!(100110, to_ms_ts(Precision::Microsecond, 100110000));
assert_eq!(100110000, to_ms_ts(Precision::Millisecond, 100110000));
assert_eq!(
100110000 * 1000 * 60,
to_ms_ts(Precision::MINUTE, 100110000)
to_ms_ts(Precision::Minute, 100110000)
);
assert_eq!(
100110000 * 1000 * 60 * 60,
to_ms_ts(Precision::HOUR, 100110000)
to_ms_ts(Precision::Hour, 100110000)
);
}
}

View File

@@ -1,19 +1,17 @@
[package]
name = "common-query"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
async-trait = "0.1"
common-error = { path = "../error" }
common-recordbatch = { path = "../recordbatch" }
common-time = { path = "../time" }
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion-common = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2" }
datafusion-expr = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2" }
datafusion.workspace = true
datafusion-common.workspace = true
datafusion-expr.workspace = true
datatypes = { path = "../../datatypes" }
snafu = { version = "0.7", features = ["backtraces"] }
statrs = "0.15"

View File

@@ -23,16 +23,9 @@ use datatypes::error::Error as DataTypeError;
use datatypes::prelude::ConcreteDataType;
use statrs::StatsError;
common_error::define_opaque_error!(Error);
#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum InnerError {
#[snafu(display("Fail to cast array to {:?}, source: {}", typ, source))]
TypeCast {
source: ArrowError,
typ: arrow::datatypes::DataType,
},
pub enum Error {
#[snafu(display("Fail to execute function, source: {}", source))]
ExecuteFunction {
source: DataFusionError,
@@ -83,8 +76,8 @@ pub enum InnerError {
backtrace: Backtrace,
},
#[snafu(display("Invalid inputs: {}", err_msg))]
InvalidInputs {
#[snafu(display("Invalid input type: {}", err_msg))]
InvalidInputType {
#[snafu(backtrace)]
source: DataTypeError,
err_msg: String,
@@ -133,37 +126,74 @@ pub enum InnerError {
#[snafu(backtrace)]
source: BoxedError,
},
#[snafu(display("Failed to cast array to {:?}, source: {}", typ, source))]
TypeCast {
source: ArrowError,
typ: arrow::datatypes::DataType,
backtrace: Backtrace,
},
#[snafu(display(
"Failed to perform compute operation on arrow arrays, source: {}",
source
))]
ArrowCompute {
source: ArrowError,
backtrace: Backtrace,
},
#[snafu(display("Query engine fail to cast value: {}", source))]
ToScalarValue {
#[snafu(backtrace)]
source: DataTypeError,
},
#[snafu(display("Failed to get scalar vector, {}", source))]
GetScalarVector {
#[snafu(backtrace)]
source: DataTypeError,
},
#[snafu(display("Invalid function args: {}", err_msg))]
InvalidFuncArgs {
err_msg: String,
backtrace: Backtrace,
},
}
pub type Result<T> = std::result::Result<T, Error>;
impl ErrorExt for InnerError {
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
match self {
InnerError::ExecuteFunction { .. }
| InnerError::GenerateFunction { .. }
| InnerError::CreateAccumulator { .. }
| InnerError::DowncastVector { .. }
| InnerError::InvalidInputState { .. }
| InnerError::InvalidInputCol { .. }
| InnerError::BadAccumulatorImpl { .. } => StatusCode::EngineExecuteQuery,
Error::ExecuteFunction { .. }
| Error::GenerateFunction { .. }
| Error::CreateAccumulator { .. }
| Error::DowncastVector { .. }
| Error::InvalidInputState { .. }
| Error::InvalidInputCol { .. }
| Error::BadAccumulatorImpl { .. }
| Error::ToScalarValue { .. }
| Error::GetScalarVector { .. }
| Error::ArrowCompute { .. } => StatusCode::EngineExecuteQuery,
InnerError::InvalidInputs { source, .. }
| InnerError::IntoVector { source, .. }
| InnerError::FromScalarValue { source }
| InnerError::ConvertArrowSchema { source }
| InnerError::FromArrowArray { source } => source.status_code(),
Error::InvalidInputType { source, .. }
| Error::IntoVector { source, .. }
| Error::FromScalarValue { source }
| Error::ConvertArrowSchema { source }
| Error::FromArrowArray { source } => source.status_code(),
InnerError::ExecuteRepeatedly { .. }
| InnerError::GeneralDataFusion { .. }
| InnerError::DataFusionExecutionPlan { .. } => StatusCode::Unexpected,
Error::ExecuteRepeatedly { .. }
| Error::GeneralDataFusion { .. }
| Error::DataFusionExecutionPlan { .. } => StatusCode::Unexpected,
InnerError::UnsupportedInputDataType { .. } | InnerError::TypeCast { .. } => {
StatusCode::InvalidArguments
}
Error::UnsupportedInputDataType { .. }
| Error::TypeCast { .. }
| Error::InvalidFuncArgs { .. } => StatusCode::InvalidArguments,
InnerError::ConvertDfRecordBatchStream { source, .. } => source.status_code(),
InnerError::ExecutePhysicalPlan { source } => source.status_code(),
Error::ConvertDfRecordBatchStream { source, .. } => source.status_code(),
Error::ExecutePhysicalPlan { source } => source.status_code(),
}
}
@@ -176,12 +206,6 @@ impl ErrorExt for InnerError {
}
}
impl From<InnerError> for Error {
fn from(e: InnerError) -> Error {
Error::new(e)
}
}
impl From<Error> for DataFusionError {
fn from(e: Error) -> DataFusionError {
DataFusionError::External(Box::new(e))
@@ -190,7 +214,7 @@ impl From<Error> for DataFusionError {
impl From<BoxedError> for Error {
fn from(source: BoxedError) -> Self {
InnerError::ExecutePhysicalPlan { source }.into()
Error::ExecutePhysicalPlan { source }
}
}
@@ -206,60 +230,51 @@ mod tests {
}
fn assert_error(err: &Error, code: StatusCode) {
let inner_err = err.as_any().downcast_ref::<InnerError>().unwrap();
let inner_err = err.as_any().downcast_ref::<Error>().unwrap();
assert_eq!(code, inner_err.status_code());
assert!(inner_err.backtrace_opt().is_some());
}
#[test]
fn test_datafusion_as_source() {
let err: Error = throw_df_error()
let err = throw_df_error()
.context(ExecuteFunctionSnafu)
.err()
.unwrap()
.into();
.unwrap();
assert_error(&err, StatusCode::EngineExecuteQuery);
let err: Error = throw_df_error()
.context(GeneralDataFusionSnafu)
.err()
.unwrap()
.into();
.unwrap();
assert_error(&err, StatusCode::Unexpected);
let err: Error = throw_df_error()
let err = throw_df_error()
.context(DataFusionExecutionPlanSnafu)
.err()
.unwrap()
.into();
.unwrap();
assert_error(&err, StatusCode::Unexpected);
}
#[test]
fn test_execute_repeatedly_error() {
let error: Error = None::<i32>
.context(ExecuteRepeatedlySnafu)
.err()
.unwrap()
.into();
assert_eq!(error.inner.status_code(), StatusCode::Unexpected);
let error = None::<i32>.context(ExecuteRepeatedlySnafu).err().unwrap();
assert_eq!(error.status_code(), StatusCode::Unexpected);
assert!(error.backtrace_opt().is_some());
}
#[test]
fn test_convert_df_recordbatch_stream_error() {
let result: std::result::Result<i32, common_recordbatch::error::Error> =
Err(common_recordbatch::error::InnerError::PollStream {
source: ArrowError::Overflow,
Err(common_recordbatch::error::Error::PollStream {
source: ArrowError::DivideByZero,
backtrace: Backtrace::generate(),
}
.into());
let error: Error = result
});
let error = result
.context(ConvertDfRecordBatchStreamSnafu)
.err()
.unwrap()
.into();
assert_eq!(error.inner.status_code(), StatusCode::Internal);
.unwrap();
assert_eq!(error.status_code(), StatusCode::Internal);
assert!(error.backtrace_opt().is_some());
}
@@ -272,13 +287,12 @@ mod tests {
#[test]
fn test_into_vector_error() {
let err: Error = raise_datatype_error()
let err = raise_datatype_error()
.context(IntoVectorSnafu {
data_type: ArrowDatatype::Int32,
})
.err()
.unwrap()
.into();
.unwrap();
assert!(err.backtrace_opt().is_some());
let datatype_err = raise_datatype_error().err().unwrap();
assert_eq!(datatype_err.status_code(), err.status_code());

View File

@@ -161,12 +161,7 @@ mod tests {
assert_eq!(4, vec.len());
for i in 0..4 {
assert_eq!(
i == 0 || i == 3,
vec.get_data(i).unwrap(),
"failed at {}",
i
)
assert_eq!(i == 0 || i == 3, vec.get_data(i).unwrap(), "Failed at {i}")
}
}
_ => unreachable!(),

View File

@@ -22,7 +22,7 @@ use std::sync::Arc;
use datatypes::prelude::ConcreteDataType;
pub use self::accumulator::{Accumulator, AggregateFunctionCreator, AggregateFunctionCreatorRef};
pub use self::expr::Expr;
pub use self::expr::{DfExpr, Expr};
pub use self::udaf::AggregateFunction;
pub use self::udf::ScalarUdf;
use crate::function::{ReturnTypeFunction, ScalarFunctionImplementation};
@@ -148,9 +148,7 @@ mod tests {
let args = vec![
DfColumnarValue::Scalar(ScalarValue::Boolean(Some(true))),
DfColumnarValue::Array(Arc::new(BooleanArray::from_slice(vec![
true, false, false, true,
]))),
DfColumnarValue::Array(Arc::new(BooleanArray::from(vec![true, false, false, true]))),
];
// call the function

View File

@@ -17,12 +17,10 @@
use std::fmt::Debug;
use std::sync::Arc;
use common_time::timestamp::TimeUnit;
use datafusion_common::Result as DfResult;
use datafusion_expr::Accumulator as DfAccumulator;
use datatypes::arrow::array::ArrayRef;
use datatypes::prelude::*;
use datatypes::value::ListValue;
use datatypes::vectors::{Helper as VectorHelper, VectorRef};
use snafu::ResultExt;
@@ -133,351 +131,48 @@ impl DfAccumulator for DfAccumulatorAdaptor {
let state_types = self.creator.state_types()?;
if state_values.len() != state_types.len() {
return error::BadAccumulatorImplSnafu {
err_msg: format!("Accumulator {:?} returned state values size do not match its state types size.", self),
err_msg: format!("Accumulator {self:?} returned state values size do not match its state types size."),
}
.fail()
.map_err(Error::from)?;
.fail()?;
}
Ok(state_values
.into_iter()
.zip(state_types.iter())
.map(|(v, t)| try_into_scalar_value(v, t))
.collect::<Result<Vec<_>>>()
.map_err(Error::from)?)
.map(|(v, t)| v.try_to_scalar_value(t).context(error::ToScalarValueSnafu))
.collect::<Result<Vec<_>>>()?)
}
fn update_batch(&mut self, values: &[ArrayRef]) -> DfResult<()> {
let vectors = VectorHelper::try_into_vectors(values)
.context(FromScalarValueSnafu)
.map_err(Error::from)?;
self.accumulator
.update_batch(&vectors)
.map_err(|e| e.into())
let vectors = VectorHelper::try_into_vectors(values).context(FromScalarValueSnafu)?;
self.accumulator.update_batch(&vectors)?;
Ok(())
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> DfResult<()> {
let mut vectors = Vec::with_capacity(states.len());
for array in states.iter() {
vectors.push(
VectorHelper::try_into_vector(array)
.context(IntoVectorSnafu {
data_type: array.data_type().clone(),
})
.map_err(Error::from)?,
VectorHelper::try_into_vector(array).context(IntoVectorSnafu {
data_type: array.data_type().clone(),
})?,
);
}
self.accumulator.merge_batch(&vectors).map_err(|e| e.into())
self.accumulator.merge_batch(&vectors)?;
Ok(())
}
fn evaluate(&self) -> DfResult<ScalarValue> {
let value = self.accumulator.evaluate()?;
let output_type = self.creator.output_type()?;
Ok(try_into_scalar_value(value, &output_type)?)
}
}
fn try_into_scalar_value(value: Value, datatype: &ConcreteDataType) -> Result<ScalarValue> {
if !matches!(value, Value::Null) && datatype != &value.data_type() {
return error::BadAccumulatorImplSnafu {
err_msg: format!(
"expect value to return datatype {:?}, actual: {:?}",
datatype,
value.data_type()
),
}
.fail()?;
}
Ok(match value {
Value::Boolean(v) => ScalarValue::Boolean(Some(v)),
Value::UInt8(v) => ScalarValue::UInt8(Some(v)),
Value::UInt16(v) => ScalarValue::UInt16(Some(v)),
Value::UInt32(v) => ScalarValue::UInt32(Some(v)),
Value::UInt64(v) => ScalarValue::UInt64(Some(v)),
Value::Int8(v) => ScalarValue::Int8(Some(v)),
Value::Int16(v) => ScalarValue::Int16(Some(v)),
Value::Int32(v) => ScalarValue::Int32(Some(v)),
Value::Int64(v) => ScalarValue::Int64(Some(v)),
Value::Float32(v) => ScalarValue::Float32(Some(v.0)),
Value::Float64(v) => ScalarValue::Float64(Some(v.0)),
Value::String(v) => ScalarValue::Utf8(Some(v.as_utf8().to_string())),
Value::Binary(v) => ScalarValue::LargeBinary(Some(v.to_vec())),
Value::Date(v) => ScalarValue::Date32(Some(v.val())),
Value::DateTime(v) => ScalarValue::Date64(Some(v.val())),
Value::Null => try_convert_null_value(datatype)?,
Value::List(list) => try_convert_list_value(list)?,
Value::Timestamp(t) => timestamp_to_scalar_value(t.unit(), Some(t.value())),
})
}
fn timestamp_to_scalar_value(unit: TimeUnit, val: Option<i64>) -> ScalarValue {
match unit {
TimeUnit::Second => ScalarValue::TimestampSecond(val, None),
TimeUnit::Millisecond => ScalarValue::TimestampMillisecond(val, None),
TimeUnit::Microsecond => ScalarValue::TimestampMicrosecond(val, None),
TimeUnit::Nanosecond => ScalarValue::TimestampNanosecond(val, None),
}
}
fn try_convert_null_value(datatype: &ConcreteDataType) -> Result<ScalarValue> {
Ok(match datatype {
ConcreteDataType::Boolean(_) => ScalarValue::Boolean(None),
ConcreteDataType::Int8(_) => ScalarValue::Int8(None),
ConcreteDataType::Int16(_) => ScalarValue::Int16(None),
ConcreteDataType::Int32(_) => ScalarValue::Int32(None),
ConcreteDataType::Int64(_) => ScalarValue::Int64(None),
ConcreteDataType::UInt8(_) => ScalarValue::UInt8(None),
ConcreteDataType::UInt16(_) => ScalarValue::UInt16(None),
ConcreteDataType::UInt32(_) => ScalarValue::UInt32(None),
ConcreteDataType::UInt64(_) => ScalarValue::UInt64(None),
ConcreteDataType::Float32(_) => ScalarValue::Float32(None),
ConcreteDataType::Float64(_) => ScalarValue::Float64(None),
ConcreteDataType::Binary(_) => ScalarValue::LargeBinary(None),
ConcreteDataType::String(_) => ScalarValue::Utf8(None),
ConcreteDataType::Timestamp(t) => timestamp_to_scalar_value(t.unit, None),
_ => {
return error::BadAccumulatorImplSnafu {
err_msg: format!(
"undefined transition from null value to datatype {:?}",
datatype
),
}
.fail()?
}
})
}
fn try_convert_list_value(list: ListValue) -> Result<ScalarValue> {
let vs = if let Some(items) = list.items() {
Some(Box::new(
items
.iter()
.map(|v| try_into_scalar_value(v.clone(), list.datatype()))
.collect::<Result<Vec<_>>>()?,
))
} else {
None
};
Ok(ScalarValue::List(
vs,
Box::new(list.datatype().as_arrow_type()),
))
}
#[cfg(test)]
mod tests {
use common_base::bytes::{Bytes, StringBytes};
use datafusion_common::ScalarValue;
use datatypes::arrow::datatypes::DataType;
use datatypes::value::{ListValue, OrderedFloat};
use super::*;
#[test]
fn test_not_null_value_to_scalar_value() {
assert_eq!(
ScalarValue::Boolean(Some(true)),
try_into_scalar_value(Value::Boolean(true), &ConcreteDataType::boolean_datatype())
.unwrap()
);
assert_eq!(
ScalarValue::Boolean(Some(false)),
try_into_scalar_value(Value::Boolean(false), &ConcreteDataType::boolean_datatype())
.unwrap()
);
assert_eq!(
ScalarValue::UInt8(Some(u8::MIN + 1)),
try_into_scalar_value(
Value::UInt8(u8::MIN + 1),
&ConcreteDataType::uint8_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::UInt16(Some(u16::MIN + 2)),
try_into_scalar_value(
Value::UInt16(u16::MIN + 2),
&ConcreteDataType::uint16_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::UInt32(Some(u32::MIN + 3)),
try_into_scalar_value(
Value::UInt32(u32::MIN + 3),
&ConcreteDataType::uint32_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::UInt64(Some(u64::MIN + 4)),
try_into_scalar_value(
Value::UInt64(u64::MIN + 4),
&ConcreteDataType::uint64_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Int8(Some(i8::MIN + 4)),
try_into_scalar_value(Value::Int8(i8::MIN + 4), &ConcreteDataType::int8_datatype())
.unwrap()
);
assert_eq!(
ScalarValue::Int16(Some(i16::MIN + 5)),
try_into_scalar_value(
Value::Int16(i16::MIN + 5),
&ConcreteDataType::int16_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Int32(Some(i32::MIN + 6)),
try_into_scalar_value(
Value::Int32(i32::MIN + 6),
&ConcreteDataType::int32_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Int64(Some(i64::MIN + 7)),
try_into_scalar_value(
Value::Int64(i64::MIN + 7),
&ConcreteDataType::int64_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Float32(Some(8.0f32)),
try_into_scalar_value(
Value::Float32(OrderedFloat(8.0f32)),
&ConcreteDataType::float32_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Float64(Some(9.0f64)),
try_into_scalar_value(
Value::Float64(OrderedFloat(9.0f64)),
&ConcreteDataType::float64_datatype()
)
.unwrap()
);
assert_eq!(
ScalarValue::Utf8(Some("hello".to_string())),
try_into_scalar_value(
Value::String(StringBytes::from("hello")),
&ConcreteDataType::string_datatype(),
)
.unwrap()
);
assert_eq!(
ScalarValue::LargeBinary(Some("world".as_bytes().to_vec())),
try_into_scalar_value(
Value::Binary(Bytes::from("world".as_bytes())),
&ConcreteDataType::binary_datatype()
)
.unwrap()
);
}
#[test]
fn test_null_value_to_scalar_value() {
assert_eq!(
ScalarValue::Boolean(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::boolean_datatype()).unwrap()
);
assert_eq!(
ScalarValue::UInt8(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::uint8_datatype()).unwrap()
);
assert_eq!(
ScalarValue::UInt16(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::uint16_datatype()).unwrap()
);
assert_eq!(
ScalarValue::UInt32(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::uint32_datatype()).unwrap()
);
assert_eq!(
ScalarValue::UInt64(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::uint64_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Int8(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::int8_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Int16(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::int16_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Int32(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::int32_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Int64(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::int64_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Float32(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::float32_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Float64(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::float64_datatype()).unwrap()
);
assert_eq!(
ScalarValue::Utf8(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::string_datatype()).unwrap()
);
assert_eq!(
ScalarValue::LargeBinary(None),
try_into_scalar_value(Value::Null, &ConcreteDataType::binary_datatype()).unwrap()
);
}
#[test]
fn test_list_value_to_scalar_value() {
let items = Some(Box::new(vec![Value::Int32(-1), Value::Null]));
let list = Value::List(ListValue::new(items, ConcreteDataType::int32_datatype()));
let df_list = try_into_scalar_value(
list,
&ConcreteDataType::list_datatype(ConcreteDataType::int32_datatype()),
)
.unwrap();
assert!(matches!(df_list, ScalarValue::List(_, _)));
match df_list {
ScalarValue::List(vs, datatype) => {
assert_eq!(*datatype, DataType::Int32);
assert!(vs.is_some());
let vs = *vs.unwrap();
assert_eq!(
vs,
vec![ScalarValue::Int32(Some(-1)), ScalarValue::Int32(None)]
);
}
_ => unreachable!(),
}
}
#[test]
pub fn test_timestamp_to_scalar_value() {
assert_eq!(
ScalarValue::TimestampSecond(Some(1), None),
timestamp_to_scalar_value(TimeUnit::Second, Some(1))
);
assert_eq!(
ScalarValue::TimestampMillisecond(Some(1), None),
timestamp_to_scalar_value(TimeUnit::Millisecond, Some(1))
);
assert_eq!(
ScalarValue::TimestampMicrosecond(Some(1), None),
timestamp_to_scalar_value(TimeUnit::Microsecond, Some(1))
);
assert_eq!(
ScalarValue::TimestampNanosecond(Some(1), None),
timestamp_to_scalar_value(TimeUnit::Nanosecond, Some(1))
);
let scalar_value = value
.try_to_scalar_value(&output_type)
.context(error::ToScalarValueSnafu)
.map_err(Error::from)?;
Ok(scalar_value)
}
fn size(&self) -> usize {
// TODO(LFC): Implement new "size" method for Accumulator.
0
}
}

View File

@@ -12,11 +12,11 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use datafusion::logical_plan::Expr as DfExpr;
pub use datafusion_expr::expr::Expr as DfExpr;
/// Central struct of query API.
/// Represent logical expressions such as `A + 1`, or `CAST(c1 AS int)`.
#[derive(Clone, PartialEq, Hash, Debug)]
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct Expr {
df_expr: DfExpr,
}

View File

@@ -104,7 +104,7 @@ fn to_df_accumulator_func(
accumulator: AccumulatorFunctionImpl,
creator: AggregateFunctionCreatorRef,
) -> DfAccumulatorFunctionImplementation {
Arc::new(move || {
Arc::new(move |_| {
let accumulator = accumulator()?;
let creator = creator.clone();
Ok(Box::new(DfAccumulatorAdaptor::new(accumulator, creator)))

View File

@@ -16,12 +16,11 @@ use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;
use async_trait::async_trait;
use common_recordbatch::adapter::{AsyncRecordBatchStreamAdapter, DfRecordBatchStreamAdapter};
use common_recordbatch::adapter::{DfRecordBatchStreamAdapter, RecordBatchStreamAdapter};
use common_recordbatch::{DfSendableRecordBatchStream, SendableRecordBatchStream};
use datafusion::arrow::datatypes::SchemaRef as DfSchemaRef;
use datafusion::error::Result as DfResult;
pub use datafusion::execution::runtime_env::RuntimeEnv;
pub use datafusion::execution::context::{SessionContext, TaskContext};
use datafusion::physical_plan::expressions::PhysicalSortExpr;
pub use datafusion::physical_plan::Partitioning;
use datafusion::physical_plan::Statistics;
@@ -63,7 +62,7 @@ pub trait PhysicalPlan: Debug + Send + Sync {
fn execute(
&self,
partition: usize,
runtime: Arc<RuntimeEnv>,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream>;
}
@@ -111,6 +110,7 @@ impl PhysicalPlan for PhysicalPlanAdapter {
.collect();
let plan = self
.df_plan
.clone()
.with_new_children(children)
.context(error::GeneralDataFusionSnafu)?;
Ok(Arc::new(PhysicalPlanAdapter::new(self.schema(), plan)))
@@ -119,20 +119,22 @@ impl PhysicalPlan for PhysicalPlanAdapter {
fn execute(
&self,
partition: usize,
runtime: Arc<RuntimeEnv>,
context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream> {
let df_plan = self.df_plan.clone();
let stream = Box::pin(async move { df_plan.execute(partition, runtime).await });
let stream = AsyncRecordBatchStreamAdapter::new(self.schema(), stream);
let stream = df_plan
.execute(partition, context)
.context(error::GeneralDataFusionSnafu)?;
let adapter = RecordBatchStreamAdapter::try_new(stream)
.context(error::ConvertDfRecordBatchStreamSnafu)?;
Ok(Box::pin(stream))
Ok(Box::pin(adapter))
}
}
#[derive(Debug)]
pub struct DfPhysicalPlanAdapter(pub PhysicalPlanRef);
#[async_trait]
impl DfPhysicalPlan for DfPhysicalPlanAdapter {
fn as_any(&self) -> &dyn Any {
self
@@ -159,15 +161,14 @@ impl DfPhysicalPlan for DfPhysicalPlanAdapter {
}
fn with_new_children(
&self,
self: Arc<Self>,
children: Vec<Arc<dyn DfPhysicalPlan>>,
) -> DfResult<Arc<dyn DfPhysicalPlan>> {
let df_schema = self.schema();
let schema: SchemaRef = Arc::new(
df_schema
.try_into()
.context(error::ConvertArrowSchemaSnafu)
.map_err(error::Error::from)?,
.context(error::ConvertArrowSchemaSnafu)?,
);
let children = children
.into_iter()
@@ -177,12 +178,12 @@ impl DfPhysicalPlan for DfPhysicalPlanAdapter {
Ok(Arc::new(DfPhysicalPlanAdapter(plan)))
}
async fn execute(
fn execute(
&self,
partition: usize,
runtime: Arc<RuntimeEnv>,
context: Arc<TaskContext>,
) -> DfResult<DfSendableRecordBatchStream> {
let stream = self.0.execute(partition, runtime)?;
let stream = self.0.execute(partition, context)?;
Ok(Box::pin(DfRecordBatchStreamAdapter::new(stream)))
}
@@ -194,16 +195,16 @@ impl DfPhysicalPlan for DfPhysicalPlanAdapter {
#[cfg(test)]
mod test {
use async_trait::async_trait;
use common_recordbatch::{RecordBatch, RecordBatches};
use datafusion::arrow_print;
use datafusion::datasource::TableProvider as DfTableProvider;
use datafusion::logical_plan::LogicalPlanBuilder;
use datafusion::datasource::{DefaultTableSource, TableProvider as DfTableProvider, TableType};
use datafusion::execution::context::{SessionContext, SessionState};
use datafusion::physical_plan::collect;
use datafusion::physical_plan::empty::EmptyExec;
use datafusion::prelude::ExecutionContext;
use datafusion_common::field_util::SchemaExt;
use datafusion_expr::Expr;
use datafusion_expr::logical_plan::builder::LogicalPlanBuilder;
use datafusion_expr::{Expr, TableSource};
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use datatypes::arrow::util::pretty;
use datatypes::schema::Schema;
use datatypes::vectors::Int32Vector;
@@ -225,9 +226,14 @@ mod test {
)]))
}
fn table_type(&self) -> TableType {
TableType::Base
}
async fn scan(
&self,
_projection: &Option<Vec<usize>>,
_ctx: &SessionState,
_projection: Option<&Vec<usize>>,
_filters: &[Expr],
_limit: Option<usize>,
) -> DfResult<Arc<dyn DfPhysicalPlan>> {
@@ -240,6 +246,14 @@ mod test {
}
}
impl MyDfTableProvider {
fn table_source() -> Arc<dyn TableSource> {
Arc::new(DefaultTableSource {
table_provider: Arc::new(Self),
})
}
}
#[derive(Debug)]
struct MyExecutionPlan {
schema: SchemaRef,
@@ -269,7 +283,7 @@ mod test {
fn execute(
&self,
_partition: usize,
_runtime: Arc<RuntimeEnv>,
_context: Arc<TaskContext>,
) -> Result<SendableRecordBatchStream> {
let schema = self.schema();
let recordbatches = RecordBatches::try_new(
@@ -295,20 +309,26 @@ mod test {
// Test our physical plan can be executed by DataFusion, through adapters.
#[tokio::test]
async fn test_execute_physical_plan() {
let ctx = ExecutionContext::new();
let logical_plan = LogicalPlanBuilder::scan("test", Arc::new(MyDfTableProvider), None)
.unwrap()
.build()
.unwrap();
let ctx = SessionContext::new();
let logical_plan =
LogicalPlanBuilder::scan("test", MyDfTableProvider::table_source(), None)
.unwrap()
.build()
.unwrap();
let physical_plan = ctx.create_physical_plan(&logical_plan).await.unwrap();
let df_recordbatches = collect(physical_plan, Arc::new(RuntimeEnv::default()))
let df_recordbatches = collect(physical_plan, Arc::new(TaskContext::from(&ctx)))
.await
.unwrap();
let pretty_print = arrow_print::write(&df_recordbatches);
let pretty_print = pretty_print.lines().collect::<Vec<&str>>();
let pretty_print = pretty::pretty_format_batches(&df_recordbatches).unwrap();
assert_eq!(
pretty_print,
vec!["+---+", "| a |", "+---+", "| 1 |", "| 2 |", "| 3 |", "+---+",]
pretty_print.to_string(),
r#"+---+
| a |
+---+
| 1 |
| 2 |
| 3 |
+---+"#
);
}

View File

@@ -15,7 +15,7 @@
//! Signature module contains foundational types that are used to represent signatures, types,
//! and return types of functions.
//! Copied and modified from datafusion.
pub use datafusion::physical_plan::functions::Volatility;
pub use datafusion_expr::Volatility;
use datafusion_expr::{Signature as DfSignature, TypeSignature as DfTypeSignature};
use datatypes::arrow::datatypes::DataType as ArrowDataType;
use datatypes::data_type::DataType;

View File

@@ -1,15 +1,13 @@
[package]
name = "common-recordbatch"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
common-error = { path = "../error" }
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion-common = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2" }
datafusion.workspace = true
datafusion-common.workspace = true
datatypes = { path = "../../datatypes" }
futures = "0.3"
paste = "1.0"

View File

@@ -19,7 +19,6 @@ use std::task::{Context, Poll};
use datafusion::arrow::datatypes::SchemaRef as DfSchemaRef;
use datafusion::physical_plan::RecordBatchStream as DfRecordBatchStream;
use datafusion_common::record_batch::RecordBatch as DfRecordBatch;
use datafusion_common::DataFusionError;
use datatypes::arrow::error::{ArrowError, Result as ArrowResult};
use datatypes::schema::{Schema, SchemaRef};
@@ -28,7 +27,8 @@ use snafu::ResultExt;
use crate::error::{self, Result};
use crate::{
DfSendableRecordBatchStream, RecordBatch, RecordBatchStream, SendableRecordBatchStream, Stream,
DfRecordBatch, DfSendableRecordBatchStream, RecordBatch, RecordBatchStream,
SendableRecordBatchStream, Stream,
};
type FutureStream = Pin<
@@ -63,8 +63,8 @@ impl Stream for DfRecordBatchStreamAdapter {
match Pin::new(&mut self.stream).poll_next(cx) {
Poll::Pending => Poll::Pending,
Poll::Ready(Some(recordbatch)) => match recordbatch {
Ok(recordbatch) => Poll::Ready(Some(Ok(recordbatch.df_recordbatch))),
Err(e) => Poll::Ready(Some(Err(ArrowError::External("".to_owned(), Box::new(e))))),
Ok(recordbatch) => Poll::Ready(Some(Ok(recordbatch.into_df_record_batch()))),
Err(e) => Poll::Ready(Some(Err(ArrowError::ExternalError(Box::new(e))))),
},
Poll::Ready(None) => Poll::Ready(None),
}
@@ -102,10 +102,13 @@ impl Stream for RecordBatchStreamAdapter {
fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
match Pin::new(&mut self.stream).poll_next(cx) {
Poll::Pending => Poll::Pending,
Poll::Ready(Some(df_recordbatch)) => Poll::Ready(Some(Ok(RecordBatch {
schema: self.schema(),
df_recordbatch: df_recordbatch.context(error::PollStreamSnafu)?,
}))),
Poll::Ready(Some(df_record_batch)) => {
let df_record_batch = df_record_batch.context(error::PollStreamSnafu)?;
Poll::Ready(Some(RecordBatch::try_from_df_record_batch(
self.schema(),
df_record_batch,
)))
}
Poll::Ready(None) => Poll::Ready(None),
}
}
@@ -118,7 +121,8 @@ impl Stream for RecordBatchStreamAdapter {
enum AsyncRecordBatchStreamAdapterState {
Uninit(FutureStream),
Inited(std::result::Result<DfSendableRecordBatchStream, DataFusionError>),
Ready(DfSendableRecordBatchStream),
Failed,
}
pub struct AsyncRecordBatchStreamAdapter {
@@ -148,31 +152,26 @@ impl Stream for AsyncRecordBatchStreamAdapter {
loop {
match &mut self.state {
AsyncRecordBatchStreamAdapterState::Uninit(stream_future) => {
self.state = AsyncRecordBatchStreamAdapterState::Inited(ready!(Pin::new(
stream_future
)
.poll(cx)));
continue;
match ready!(Pin::new(stream_future).poll(cx)) {
Ok(stream) => {
self.state = AsyncRecordBatchStreamAdapterState::Ready(stream);
continue;
}
Err(e) => {
self.state = AsyncRecordBatchStreamAdapterState::Failed;
return Poll::Ready(Some(
Err(e).context(error::InitRecordbatchStreamSnafu),
));
}
};
}
AsyncRecordBatchStreamAdapterState::Inited(stream) => match stream {
Ok(stream) => {
return Poll::Ready(ready!(Pin::new(stream).poll_next(cx)).map(|df| {
Ok(RecordBatch {
schema: self.schema(),
df_recordbatch: df.context(error::PollStreamSnafu)?,
})
}));
}
Err(e) => {
return Poll::Ready(Some(
error::CreateRecordBatchesSnafu {
reason: format!("Read error {:?} from stream", e),
}
.fail()
.map_err(|e| e.into()),
))
}
},
AsyncRecordBatchStreamAdapterState::Ready(stream) => {
return Poll::Ready(ready!(Pin::new(stream).poll_next(cx)).map(|x| {
let df_record_batch = x.context(error::PollStreamSnafu)?;
RecordBatch::try_from_df_record_batch(self.schema(), df_record_batch)
}))
}
AsyncRecordBatchStreamAdapterState::Failed => return Poll::Ready(None),
}
}
}
@@ -183,3 +182,104 @@ impl Stream for AsyncRecordBatchStreamAdapter {
(0, None)
}
}
#[cfg(test)]
mod test {
use common_error::mock::MockError;
use common_error::prelude::{BoxedError, StatusCode};
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::ColumnSchema;
use datatypes::vectors::Int32Vector;
use super::*;
use crate::RecordBatches;
#[tokio::test]
async fn test_async_recordbatch_stream_adaptor() {
struct MaybeErrorRecordBatchStream {
items: Vec<Result<RecordBatch>>,
}
impl RecordBatchStream for MaybeErrorRecordBatchStream {
fn schema(&self) -> SchemaRef {
unimplemented!()
}
}
impl Stream for MaybeErrorRecordBatchStream {
type Item = Result<RecordBatch>;
fn poll_next(
mut self: Pin<&mut Self>,
_: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
if let Some(batch) = self.items.pop() {
Poll::Ready(Some(Ok(batch?)))
} else {
Poll::Ready(None)
}
}
}
fn new_future_stream(
maybe_recordbatches: Result<Vec<Result<RecordBatch>>>,
) -> FutureStream {
Box::pin(async move {
maybe_recordbatches
.map(|items| {
Box::pin(DfRecordBatchStreamAdapter::new(Box::pin(
MaybeErrorRecordBatchStream { items },
))) as _
})
.map_err(|e| DataFusionError::External(Box::new(e)))
})
}
let schema = Arc::new(Schema::new(vec![ColumnSchema::new(
"a",
ConcreteDataType::int32_datatype(),
false,
)]));
let batch1 = RecordBatch::new(
schema.clone(),
vec![Arc::new(Int32Vector::from_slice(&[1])) as _],
)
.unwrap();
let batch2 = RecordBatch::new(
schema.clone(),
vec![Arc::new(Int32Vector::from_slice(&[2])) as _],
)
.unwrap();
let success_stream = new_future_stream(Ok(vec![Ok(batch1.clone()), Ok(batch2.clone())]));
let adapter = AsyncRecordBatchStreamAdapter::new(schema.clone(), success_stream);
let collected = RecordBatches::try_collect(Box::pin(adapter)).await.unwrap();
assert_eq!(
collected,
RecordBatches::try_new(schema.clone(), vec![batch2.clone(), batch1.clone()]).unwrap()
);
let poll_err_stream = new_future_stream(Ok(vec![
Ok(batch1.clone()),
Err(error::Error::External {
source: BoxedError::new(MockError::new(StatusCode::Unknown)),
}),
]));
let adapter = AsyncRecordBatchStreamAdapter::new(schema.clone(), poll_err_stream);
let result = RecordBatches::try_collect(Box::pin(adapter)).await;
assert_eq!(
result.unwrap_err().to_string(),
"Failed to poll stream, source: External error: External error, source: Unknown"
);
let failed_to_init_stream = new_future_stream(Err(error::Error::External {
source: BoxedError::new(MockError::new(StatusCode::Internal)),
}));
let adapter = AsyncRecordBatchStreamAdapter::new(schema.clone(), failed_to_init_stream);
let result = RecordBatches::try_collect(Box::pin(adapter)).await;
assert_eq!(
result.unwrap_err().to_string(),
"Failed to init Recordbatch stream, source: External error: External error, source: Internal"
);
}
}

View File

@@ -17,13 +17,12 @@ use std::any::Any;
use common_error::ext::BoxedError;
use common_error::prelude::*;
common_error::define_opaque_error!(Error);
pub type Result<T> = std::result::Result<T, Error>;
#[derive(Debug, Snafu)]
#[snafu(visibility(pub))]
pub enum InnerError {
pub enum Error {
#[snafu(display("Fail to create datafusion record batch, source: {}", source))]
NewDfRecordBatch {
source: datatypes::arrow::error::ArrowError,
@@ -59,20 +58,34 @@ pub enum InnerError {
source: datatypes::arrow::error::ArrowError,
backtrace: Backtrace,
},
#[snafu(display("Fail to format record batch, source: {}", source))]
Format {
source: datatypes::arrow::error::ArrowError,
backtrace: Backtrace,
},
#[snafu(display("Failed to init Recordbatch stream, source: {}", source))]
InitRecordbatchStream {
source: datafusion_common::DataFusionError,
backtrace: Backtrace,
},
}
impl ErrorExt for InnerError {
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
match self {
InnerError::NewDfRecordBatch { .. } => StatusCode::InvalidArguments,
Error::NewDfRecordBatch { .. } => StatusCode::InvalidArguments,
InnerError::DataTypes { .. }
| InnerError::CreateRecordBatches { .. }
| InnerError::PollStream { .. } => StatusCode::Internal,
Error::DataTypes { .. }
| Error::CreateRecordBatches { .. }
| Error::PollStream { .. }
| Error::Format { .. }
| Error::InitRecordbatchStream { .. } => StatusCode::Internal,
InnerError::External { source } => source.status_code(),
Error::External { source } => source.status_code(),
InnerError::SchemaConversion { source, .. } => source.status_code(),
Error::SchemaConversion { source, .. } => source.status_code(),
}
}
@@ -84,9 +97,3 @@ impl ErrorExt for InnerError {
self
}
}
impl From<InnerError> for Error {
fn from(e: InnerError) -> Error {
Error::new(e)
}
}

View File

@@ -20,16 +20,17 @@ pub mod util;
use std::pin::Pin;
use std::sync::Arc;
use datafusion::arrow_print;
use datafusion::physical_plan::memory::MemoryStream;
pub use datafusion::physical_plan::SendableRecordBatchStream as DfSendableRecordBatchStream;
pub use datatypes::arrow::record_batch::RecordBatch as DfRecordBatch;
use datatypes::arrow::util::pretty;
use datatypes::prelude::VectorRef;
use datatypes::schema::{Schema, SchemaRef};
use error::Result;
use futures::task::{Context, Poll};
use futures::{Stream, TryStreamExt};
pub use recordbatch::RecordBatch;
use snafu::ensure;
use snafu::{ensure, ResultExt};
pub trait RecordBatchStream: Stream<Item = Result<RecordBatch>> {
fn schema(&self) -> SchemaRef;
@@ -65,7 +66,7 @@ impl Stream for EmptyRecordBatchStream {
}
}
#[derive(Debug)]
#[derive(Debug, PartialEq)]
pub struct RecordBatches {
schema: SchemaRef,
batches: Vec<RecordBatch>,
@@ -98,17 +99,18 @@ impl RecordBatches {
self.batches.iter()
}
pub fn pretty_print(&self) -> String {
arrow_print::write(
&self
.iter()
.map(|x| x.df_recordbatch.clone())
.collect::<Vec<_>>(),
)
pub fn pretty_print(&self) -> Result<String> {
let df_batches = &self
.iter()
.map(|x| x.df_record_batch().clone())
.collect::<Vec<_>>();
let result = pretty::pretty_format_batches(df_batches).context(error::FormatSnafu)?;
Ok(result.to_string())
}
pub fn try_new(schema: SchemaRef, batches: Vec<RecordBatch>) -> Result<Self> {
for batch in batches.iter() {
for batch in &batches {
ensure!(
batch.schema == schema,
error::CreateRecordBatchesSnafu {
@@ -144,7 +146,7 @@ impl RecordBatches {
let df_record_batches = self
.batches
.into_iter()
.map(|batch| batch.df_recordbatch)
.map(|batch| batch.into_df_record_batch())
.collect();
// unwrap safety: `MemoryStream::try_new` won't fail
Box::pin(
@@ -229,8 +231,7 @@ mod tests {
assert_eq!(
result.unwrap_err().to_string(),
format!(
"Failed to create RecordBatches, reason: expect RecordBatch schema equals {:?}, actual: {:?}",
schema1, schema2
"Failed to create RecordBatches, reason: expect RecordBatch schema equals {schema1:?}, actual: {schema2:?}",
)
);
@@ -242,7 +243,7 @@ mod tests {
| 1 | hello |
| 2 | world |
+---+-------+";
assert_eq!(batches.pretty_print(), expected);
assert_eq!(batches.pretty_print().unwrap(), expected);
assert_eq!(schema1, batches.schema());
assert_eq!(vec![batch1], batches.take());

View File

@@ -12,8 +12,6 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use datafusion_common::record_batch::RecordBatch as DfRecordBatch;
use datatypes::arrow_array::arrow_array_get;
use datatypes::schema::SchemaRef;
use datatypes::value::Value;
use datatypes::vectors::{Helper, VectorRef};
@@ -22,32 +20,88 @@ use serde::{Serialize, Serializer};
use snafu::ResultExt;
use crate::error::{self, Result};
use crate::DfRecordBatch;
// TODO(yingwen): We should hold vectors in the RecordBatch.
/// A two-dimensional batch of column-oriented data with a defined schema.
#[derive(Clone, Debug, PartialEq)]
pub struct RecordBatch {
pub schema: SchemaRef,
pub df_recordbatch: DfRecordBatch,
columns: Vec<VectorRef>,
df_record_batch: DfRecordBatch,
}
impl RecordBatch {
/// Create a new [`RecordBatch`] from `schema` and `columns`.
pub fn new<I: IntoIterator<Item = VectorRef>>(
schema: SchemaRef,
columns: I,
) -> Result<RecordBatch> {
let arrow_arrays = columns.into_iter().map(|v| v.to_arrow_array()).collect();
let columns: Vec<_> = columns.into_iter().collect();
let arrow_arrays = columns.iter().map(|v| v.to_arrow_array()).collect();
let df_recordbatch = DfRecordBatch::try_new(schema.arrow_schema().clone(), arrow_arrays)
let df_record_batch = DfRecordBatch::try_new(schema.arrow_schema().clone(), arrow_arrays)
.context(error::NewDfRecordBatchSnafu)?;
Ok(RecordBatch {
schema,
df_recordbatch,
columns,
df_record_batch,
})
}
/// Create a new [`RecordBatch`] from `schema` and `df_record_batch`.
///
/// This method doesn't check the schema.
pub fn try_from_df_record_batch(
schema: SchemaRef,
df_record_batch: DfRecordBatch,
) -> Result<RecordBatch> {
let columns = df_record_batch
.columns()
.iter()
.map(|c| Helper::try_into_vector(c.clone()).context(error::DataTypesSnafu))
.collect::<Result<Vec<_>>>()?;
Ok(RecordBatch {
schema,
columns,
df_record_batch,
})
}
#[inline]
pub fn df_record_batch(&self) -> &DfRecordBatch {
&self.df_record_batch
}
#[inline]
pub fn into_df_record_batch(self) -> DfRecordBatch {
self.df_record_batch
}
#[inline]
pub fn columns(&self) -> &[VectorRef] {
&self.columns
}
#[inline]
pub fn column(&self, idx: usize) -> &VectorRef {
&self.columns[idx]
}
pub fn column_by_name(&self, name: &str) -> Option<&VectorRef> {
let idx = self.schema.column_index_by_name(name)?;
Some(&self.columns[idx])
}
#[inline]
pub fn num_columns(&self) -> usize {
self.columns.len()
}
#[inline]
pub fn num_rows(&self) -> usize {
self.df_recordbatch.num_rows()
self.df_record_batch.num_rows()
}
/// Create an iterator to traverse the data by row
@@ -61,14 +115,15 @@ impl Serialize for RecordBatch {
where
S: Serializer,
{
// TODO(yingwen): arrow and arrow2's schemas have different fields, so
// it might be better to use our `RawSchema` as serialized field.
let mut s = serializer.serialize_struct("record", 2)?;
s.serialize_field("schema", &self.schema.arrow_schema())?;
s.serialize_field("schema", &**self.schema.arrow_schema())?;
let df_columns = self.df_recordbatch.columns();
let vec = df_columns
let vec = self
.columns
.iter()
.map(|c| Helper::try_into_vector(c.clone())?.serialize_to_json())
.map(|c| c.serialize_to_json())
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(S::Error::custom)?;
@@ -88,8 +143,8 @@ impl<'a> RecordBatchRowIterator<'a> {
fn new(record_batch: &'a RecordBatch) -> RecordBatchRowIterator {
RecordBatchRowIterator {
record_batch,
rows: record_batch.df_recordbatch.num_rows(),
columns: record_batch.df_recordbatch.num_columns(),
rows: record_batch.df_record_batch.num_rows(),
columns: record_batch.df_record_batch.num_columns(),
row_cursor: 0,
}
}
@@ -104,15 +159,9 @@ impl<'a> Iterator for RecordBatchRowIterator<'a> {
} else {
let mut row = Vec::with_capacity(self.columns);
// TODO(yingwen): Get from the vector if RecordBatch also holds vectors.
for col in 0..self.columns {
let column_array = self.record_batch.df_recordbatch.column(col);
match arrow_array_get(column_array.as_ref(), self.row_cursor)
.context(error::DataTypesSnafu)
{
Ok(field) => row.push(field),
Err(e) => return Some(Err(e.into())),
}
let column = self.record_batch.column(col);
row.push(column.get(self.row_cursor));
}
self.row_cursor += 1;
@@ -125,63 +174,60 @@ impl<'a> Iterator for RecordBatchRowIterator<'a> {
mod tests {
use std::sync::Arc;
use datafusion_common::field_util::SchemaExt;
use datafusion_common::record_batch::RecordBatch as DfRecordBatch;
use datatypes::arrow::array::UInt32Array;
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use datatypes::prelude::*;
use datatypes::data_type::ConcreteDataType;
use datatypes::schema::{ColumnSchema, Schema};
use datatypes::vectors::{StringVector, UInt32Vector, Vector};
use datatypes::vectors::{StringVector, UInt32Vector};
use super::*;
#[test]
fn test_new_record_batch() {
fn test_record_batch() {
let arrow_schema = Arc::new(ArrowSchema::new(vec![
Field::new("c1", DataType::UInt32, false),
Field::new("c2", DataType::UInt32, false),
]));
let schema = Arc::new(Schema::try_from(arrow_schema).unwrap());
let v = Arc::new(UInt32Vector::from_slice(&[1, 2, 3]));
let columns: Vec<VectorRef> = vec![v.clone(), v.clone()];
let c1 = Arc::new(UInt32Vector::from_slice(&[1, 2, 3]));
let c2 = Arc::new(UInt32Vector::from_slice(&[4, 5, 6]));
let columns: Vec<VectorRef> = vec![c1, c2];
let batch = RecordBatch::new(schema.clone(), columns).unwrap();
let expect = v.to_arrow_array();
for column in batch.df_recordbatch.columns() {
let array = column.as_any().downcast_ref::<UInt32Array>().unwrap();
assert_eq!(
expect.as_any().downcast_ref::<UInt32Array>().unwrap(),
array
);
let batch = RecordBatch::new(schema.clone(), columns.clone()).unwrap();
assert_eq!(3, batch.num_rows());
assert_eq!(&columns, batch.columns());
for (i, expect) in columns.iter().enumerate().take(batch.num_columns()) {
let column = batch.column(i);
assert_eq!(expect, column);
}
assert_eq!(schema, batch.schema);
assert_eq!(columns[0], *batch.column_by_name("c1").unwrap());
assert_eq!(columns[1], *batch.column_by_name("c2").unwrap());
assert!(batch.column_by_name("c3").is_none());
let converted =
RecordBatch::try_from_df_record_batch(schema, batch.df_record_batch().clone()).unwrap();
assert_eq!(batch, converted);
assert_eq!(*batch.df_record_batch(), converted.into_df_record_batch());
}
#[test]
pub fn test_serialize_recordbatch() {
let arrow_schema = Arc::new(ArrowSchema::new(vec![Field::new(
let column_schemas = vec![ColumnSchema::new(
"number",
DataType::UInt32,
ConcreteDataType::uint32_datatype(),
false,
)]));
let schema = Arc::new(Schema::try_from(arrow_schema.clone()).unwrap());
)];
let schema = Arc::new(Schema::try_new(column_schemas).unwrap());
let numbers: Vec<u32> = (0..10).collect();
let df_batch = DfRecordBatch::try_new(
arrow_schema,
vec![Arc::new(UInt32Array::from_slice(&numbers))],
)
.unwrap();
let batch = RecordBatch {
schema,
df_recordbatch: df_batch,
};
let columns = vec![Arc::new(UInt32Vector::from_slice(&numbers)) as VectorRef];
let batch = RecordBatch::new(schema, columns).unwrap();
let output = serde_json::to_string(&batch).unwrap();
assert_eq!(
r#"{"schema":{"fields":[{"name":"number","data_type":"UInt32","is_nullable":false,"metadata":{}}],"metadata":{}},"columns":[[0,1,2,3,4,5,6,7,8,9]]}"#,
r#"{"schema":{"fields":[{"name":"number","data_type":"UInt32","nullable":false,"dict_id":0,"dict_is_ordered":false,"metadata":{}}],"metadata":{"greptime:version":"0"}},"columns":[[0,1,2,3,4,5,6,7,8,9]]}"#,
output
);
}

View File

@@ -15,23 +15,29 @@
use futures::TryStreamExt;
use crate::error::Result;
use crate::{RecordBatch, SendableRecordBatchStream};
use crate::{RecordBatch, RecordBatches, SendableRecordBatchStream};
/// Collect all the items from the stream into a vector of [`RecordBatch`].
pub async fn collect(stream: SendableRecordBatchStream) -> Result<Vec<RecordBatch>> {
stream.try_collect::<Vec<_>>().await
}
/// Collect all the items from the stream into [RecordBatches].
pub async fn collect_batches(stream: SendableRecordBatchStream) -> Result<RecordBatches> {
let schema = stream.schema();
let batches = stream.try_collect::<Vec<_>>().await?;
RecordBatches::try_new(schema, batches)
}
#[cfg(test)]
mod tests {
use std::mem;
use std::pin::Pin;
use std::sync::Arc;
use datafusion_common::field_util::SchemaExt;
use datafusion_common::record_batch::RecordBatch as DfRecordBatch;
use datatypes::arrow::array::UInt32Array;
use datatypes::arrow::datatypes::{DataType, Field, Schema as ArrowSchema};
use datatypes::schema::{Schema, SchemaRef};
use datatypes::prelude::*;
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::vectors::UInt32Vector;
use futures::task::{Context, Poll};
use futures::Stream;
@@ -65,12 +71,13 @@ mod tests {
#[tokio::test]
async fn test_collect() {
let arrow_schema = Arc::new(ArrowSchema::new(vec![Field::new(
let column_schemas = vec![ColumnSchema::new(
"number",
DataType::UInt32,
ConcreteDataType::uint32_datatype(),
false,
)]));
let schema = Arc::new(Schema::try_from(arrow_schema.clone()).unwrap());
)];
let schema = Arc::new(Schema::try_new(column_schemas).unwrap());
let stream = MockRecordBatchStream {
schema: schema.clone(),
@@ -81,24 +88,23 @@ mod tests {
assert_eq!(0, batches.len());
let numbers: Vec<u32> = (0..10).collect();
let df_batch = DfRecordBatch::try_new(
arrow_schema.clone(),
vec![Arc::new(UInt32Array::from_slice(&numbers))],
)
.unwrap();
let batch = RecordBatch {
schema: schema.clone(),
df_recordbatch: df_batch,
};
let columns = [Arc::new(UInt32Vector::from_vec(numbers)) as _];
let batch = RecordBatch::new(schema.clone(), columns).unwrap();
let stream = MockRecordBatchStream {
schema: Arc::new(Schema::try_from(arrow_schema).unwrap()),
schema: schema.clone(),
batch: Some(batch.clone()),
};
let batches = collect(Box::pin(stream)).await.unwrap();
assert_eq!(1, batches.len());
assert_eq!(batch, batches[0]);
let stream = MockRecordBatchStream {
schema: schema.clone(),
batch: Some(batch.clone()),
};
let batches = collect_batches(Box::pin(stream)).await.unwrap();
let expect_batches = RecordBatches::try_new(schema.clone(), vec![batch]).unwrap();
assert_eq!(expect_batches, batches);
}
}

View File

@@ -1,8 +1,8 @@
[package]
name = "common-runtime"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
common-error = { path = "../error" }

View File

@@ -1,8 +1,8 @@
[package]
name = "substrait"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
bytes = "1.1"
@@ -10,10 +10,8 @@ catalog = { path = "../../catalog" }
common-catalog = { path = "../catalog" }
common-error = { path = "../error" }
common-telemetry = { path = "../telemetry" }
datafusion = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2", features = [
"simd",
] }
datafusion-expr = { git = "https://github.com/apache/arrow-datafusion.git", branch = "arrow2" }
datafusion.workspace = true
datafusion-expr.workspace = true
datatypes = { path = "../../datatypes" }
futures = "0.3"
prost = "0.9"

View File

@@ -14,7 +14,7 @@
use std::collections::HashMap;
use datafusion::logical_plan::DFSchemaRef;
use datafusion::common::DFSchemaRef;
use substrait_proto::protobuf::extensions::simple_extension_declaration::{
ExtensionFunction, MappingType,
};

View File

@@ -15,8 +15,9 @@
use std::collections::VecDeque;
use std::str::FromStr;
use datafusion::logical_plan::{Column, Expr};
use datafusion_expr::{expr_fn, lit, BuiltinScalarFunction, Operator};
use datafusion::common::Column;
use datafusion_expr::expr::Sort;
use datafusion_expr::{expr_fn, lit, Between, BinaryExpr, BuiltinScalarFunction, Expr, Operator};
use datatypes::schema::Schema;
use snafu::{ensure, OptionExt};
use substrait_proto::protobuf::expression::field_reference::ReferenceType as FieldReferenceType;
@@ -61,7 +62,7 @@ pub(crate) fn to_df_expr(
| RexType::Cast(_)
| RexType::Subquery(_)
| RexType::Enum(_) => UnsupportedExprSnafu {
name: format!("substrait expression {:?}", expr_rex_type),
name: format!("substrait expression {expr_rex_type:?}"),
}
.fail()?,
}
@@ -109,7 +110,7 @@ pub fn convert_scalar_function(
let fn_name = ctx
.find_scalar_fn(anchor)
.with_context(|| InvalidParametersSnafu {
reason: format!("Unregistered scalar function reference: {}", anchor),
reason: format!("Unregistered scalar function reference: {anchor}"),
})?;
// convenient util
@@ -311,39 +312,39 @@ pub fn convert_scalar_function(
// skip GetIndexedField, unimplemented.
"between" => {
ensure_arg_len(3)?;
Expr::Between {
Expr::Between(Between {
expr: Box::new(inputs.pop_front().unwrap()),
negated: false,
low: Box::new(inputs.pop_front().unwrap()),
high: Box::new(inputs.pop_front().unwrap()),
}
})
}
"not_between" => {
ensure_arg_len(3)?;
Expr::Between {
Expr::Between(Between {
expr: Box::new(inputs.pop_front().unwrap()),
negated: true,
low: Box::new(inputs.pop_front().unwrap()),
high: Box::new(inputs.pop_front().unwrap()),
}
})
}
// skip Case, is covered in substrait::SwitchExpression.
// skip Cast and TryCast, is covered in substrait::Cast.
"sort" | "sort_des" => {
ensure_arg_len(1)?;
Expr::Sort {
Expr::Sort(Sort {
expr: Box::new(inputs.pop_front().unwrap()),
asc: false,
nulls_first: false,
}
})
}
"sort_asc" => {
ensure_arg_len(1)?;
Expr::Sort {
Expr::Sort(Sort {
expr: Box::new(inputs.pop_front().unwrap()),
asc: true,
nulls_first: false,
}
})
}
// those are datafusion built-in "scalar functions".
"abs"
@@ -435,7 +436,7 @@ pub fn convert_scalar_function(
// skip Wildcard, unimplemented.
// end other direct expr
_ => UnsupportedExprSnafu {
name: format!("scalar function {}", fn_name),
name: format!("scalar function {fn_name}"),
}
.fail()?,
};
@@ -477,7 +478,7 @@ pub fn expression_from_df_expr(
rex_type: Some(RexType::Literal(l)),
}
}
Expr::BinaryExpr { left, op, right } => {
Expr::BinaryExpr(BinaryExpr { left, op, right }) => {
let left = expression_from_df_expr(ctx, left, schema)?;
let right = expression_from_df_expr(ctx, right, schema)?;
let arguments = utils::expression_to_argument(vec![left, right]);
@@ -518,12 +519,12 @@ pub fn expression_from_df_expr(
name: expr.to_string(),
}
.fail()?,
Expr::Between {
Expr::Between(Between {
expr,
negated,
low,
high,
} => {
}) => {
let expr = expression_from_df_expr(ctx, expr, schema)?;
let low = expression_from_df_expr(ctx, low, schema)?;
let high = expression_from_df_expr(ctx, high, schema)?;
@@ -537,11 +538,11 @@ pub fn expression_from_df_expr(
name: expr.to_string(),
}
.fail()?,
Expr::Sort {
Expr::Sort(Sort {
expr,
asc,
nulls_first: _,
} => {
}) => {
let expr = expression_from_df_expr(ctx, expr, schema)?;
let arguments = utils::expression_to_argument(vec![expr]);
let op_name = if *asc { "sort_asc" } else { "sort_des" };
@@ -564,7 +565,22 @@ pub fn expression_from_df_expr(
| Expr::WindowFunction { .. }
| Expr::AggregateUDF { .. }
| Expr::InList { .. }
| Expr::Wildcard => UnsupportedExprSnafu {
| Expr::Wildcard
| Expr::Like(_)
| Expr::ILike(_)
| Expr::SimilarTo(_)
| Expr::IsTrue(_)
| Expr::IsFalse(_)
| Expr::IsUnknown(_)
| Expr::IsNotTrue(_)
| Expr::IsNotFalse(_)
| Expr::IsNotUnknown(_)
| Expr::Exists { .. }
| Expr::InSubquery { .. }
| Expr::ScalarSubquery(..)
| Expr::Placeholder { .. }
| Expr::QualifiedWildcard { .. } => todo!(),
Expr::GroupingSet(_) => UnsupportedExprSnafu {
name: expr.to_string(),
}
.fail()?,
@@ -581,8 +597,8 @@ pub fn convert_column(column: &Column, schema: &Schema) -> Result<FieldReference
schema
.column_index_by_name(column_name)
.with_context(|| MissingFieldSnafu {
field: format!("{:?}", column),
plan: format!("schema: {:?}", schema),
field: format!("{column:?}"),
plan: format!("schema: {schema:?}"),
})?;
Ok(FieldReference {
@@ -628,6 +644,12 @@ mod utils {
Operator::RegexNotIMatch => "regex_not_i_match",
Operator::BitwiseAnd => "bitwise_and",
Operator::BitwiseOr => "bitwise_or",
Operator::BitwiseXor => "bitwise_xor",
Operator::BitwiseShiftRight => "bitwise_shift_right",
Operator::BitwiseShiftLeft => "bitwise_shift_left",
Operator::StringConcat => "string_concat",
Operator::ILike => "i_like",
Operator::NotILike => "not_i_like",
}
}
@@ -679,7 +701,6 @@ mod utils {
BuiltinScalarFunction::Sqrt => "sqrt",
BuiltinScalarFunction::Tan => "tan",
BuiltinScalarFunction::Trunc => "trunc",
BuiltinScalarFunction::Array => "make_array",
BuiltinScalarFunction::Ascii => "ascii",
BuiltinScalarFunction::BitLength => "bit_length",
BuiltinScalarFunction::Btrim => "btrim",
@@ -723,6 +744,17 @@ mod utils {
BuiltinScalarFunction::Trim => "trim",
BuiltinScalarFunction::Upper => "upper",
BuiltinScalarFunction::RegexpMatch => "regexp_match",
BuiltinScalarFunction::Atan2 => "atan2",
BuiltinScalarFunction::Coalesce => "coalesce",
BuiltinScalarFunction::Power => "power",
BuiltinScalarFunction::MakeArray => "make_array",
BuiltinScalarFunction::DateBin => "date_bin",
BuiltinScalarFunction::FromUnixtime => "from_unixtime",
BuiltinScalarFunction::CurrentDate => "current_date",
BuiltinScalarFunction::CurrentTime => "current_time",
BuiltinScalarFunction::Uuid => "uuid",
BuiltinScalarFunction::Struct => "struct",
BuiltinScalarFunction::ArrowTypeof => "arrow_type_of",
}
}
}

View File

@@ -19,10 +19,10 @@ use catalog::CatalogManagerRef;
use common_error::prelude::BoxedError;
use common_telemetry::debug;
use datafusion::arrow::datatypes::SchemaRef as ArrowSchemaRef;
use datafusion::datasource::TableProvider;
use datafusion::logical_plan::plan::Filter;
use datafusion::logical_plan::{LogicalPlan, TableScan, ToDFSchema};
use datafusion::common::{DFField, DFSchema};
use datafusion::datasource::DefaultTableSource;
use datafusion::physical_plan::project_schema;
use datafusion_expr::{Filter, LogicalPlan, TableScan, TableSource};
use prost::Message;
use snafu::{ensure, OptionExt, ResultExt};
use substrait_proto::protobuf::expression::mask_expression::{StructItem, StructSelect};
@@ -144,7 +144,7 @@ impl DFLogicalSubstraitConvertor {
.context(error::ConvertDfSchemaSnafu)?;
let predicate = to_df_expr(ctx, *condition, &schema)?;
LogicalPlan::Filter(Filter { predicate, input })
LogicalPlan::Filter(Filter::try_new(predicate, input).context(DFInternalSnafu)?)
}
RelType::Fetch(_fetch_rel) => UnsupportedPlanSnafu {
name: "Fetch Relation",
@@ -236,9 +236,11 @@ impl DFLogicalSubstraitConvertor {
.map_err(BoxedError::new)
.context(InternalSnafu)?
.context(TableNotFoundSnafu {
name: format!("{}.{}.{}", catalog_name, schema_name, table_name),
name: format!("{catalog_name}.{schema_name}.{table_name}"),
})?;
let adapter = Arc::new(DfTableProviderAdapter::new(table_ref));
let adapter = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(table_ref),
)));
// Get schema directly from the table, and compare it with the schema retrieved from substrait proto.
let stored_schema = adapter.schema();
@@ -260,21 +262,31 @@ impl DFLogicalSubstraitConvertor {
};
// Calculate the projected schema
let projected_schema = project_schema(&stored_schema, projection.as_ref())
.context(DFInternalSnafu)?
.to_dfschema_ref()
.context(DFInternalSnafu)?;
let qualified = &format!("{catalog_name}.{schema_name}.{table_name}");
let projected_schema = Arc::new(
project_schema(&stored_schema, projection.as_ref())
.and_then(|x| {
DFSchema::new_with_metadata(
x.fields()
.iter()
.map(|f| DFField::from_qualified(qualified, f.clone()))
.collect(),
x.metadata().clone(),
)
})
.context(DFInternalSnafu)?,
);
ctx.set_df_schema(projected_schema.clone());
// TODO(ruihang): Support limit
// TODO(ruihang): Support limit(fetch)
Ok(LogicalPlan::TableScan(TableScan {
table_name: format!("{}.{}.{}", catalog_name, schema_name, table_name),
table_name: format!("{catalog_name}.{schema_name}.{table_name}"),
source: adapter,
projection,
projected_schema,
filters,
limit: None,
fetch: None,
}))
}
@@ -302,7 +314,7 @@ impl DFLogicalSubstraitConvertor {
.fail()?,
LogicalPlan::Filter(filter) => {
let input = Some(Box::new(
self.logical_plan_to_rel(ctx, filter.input.clone())?,
self.logical_plan_to_rel(ctx, filter.input().clone())?,
));
let schema = plan
@@ -312,7 +324,7 @@ impl DFLogicalSubstraitConvertor {
.context(error::ConvertDfSchemaSnafu)?;
let condition = Some(Box::new(expression_from_df_expr(
ctx,
&filter.predicate,
filter.predicate(),
&schema,
)?));
@@ -368,16 +380,25 @@ impl DFLogicalSubstraitConvertor {
name: "DataFusion Logical Limit",
}
.fail()?,
LogicalPlan::CreateExternalTable(_)
LogicalPlan::Subquery(_)
| LogicalPlan::SubqueryAlias(_)
| LogicalPlan::CreateView(_)
| LogicalPlan::CreateCatalogSchema(_)
| LogicalPlan::CreateCatalog(_)
| LogicalPlan::DropView(_)
| LogicalPlan::Distinct(_)
| LogicalPlan::SetVariable(_)
| LogicalPlan::CreateExternalTable(_)
| LogicalPlan::CreateMemoryTable(_)
| LogicalPlan::DropTable(_)
| LogicalPlan::Values(_)
| LogicalPlan::Explain(_)
| LogicalPlan::Analyze(_)
| LogicalPlan::Extension(_) => InvalidParametersSnafu {
| LogicalPlan::Extension(_)
| LogicalPlan::Prepare(_) => InvalidParametersSnafu {
reason: format!(
"Trying to convert DDL/DML plan to substrait proto, plan: {:?}",
plan
"Trying to convert DDL/DML plan to substrait proto, plan: {plan:?}",
),
}
.fail()?,
@@ -414,6 +435,10 @@ impl DFLogicalSubstraitConvertor {
let provider = table_scan
.source
.as_any()
.downcast_ref::<DefaultTableSource>()
.context(UnknownPlanSnafu)?
.table_provider
.as_any()
.downcast_ref::<DfTableProviderAdapter>()
.context(UnknownPlanSnafu)?;
let table_info = provider.table().table_info();
@@ -485,7 +510,9 @@ impl DFLogicalSubstraitConvertor {
fn same_schema_without_metadata(lhs: &ArrowSchemaRef, rhs: &ArrowSchemaRef) -> bool {
lhs.fields.len() == rhs.fields.len()
&& lhs.fields.iter().zip(rhs.fields.iter()).all(|(x, y)| {
x.name == y.name && x.data_type == y.data_type && x.is_nullable == y.is_nullable
x.name() == y.name()
&& x.data_type() == y.data_type()
&& x.is_nullable() == y.is_nullable()
})
}
@@ -494,7 +521,7 @@ mod test {
use catalog::local::{LocalCatalogManager, MemoryCatalogProvider, MemorySchemaProvider};
use catalog::{CatalogList, CatalogProvider, RegisterTableRequest};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use datafusion::logical_plan::DFSchema;
use datafusion::common::{DFSchema, ToDFSchema};
use datatypes::schema::Schema;
use table::requests::CreateTableRequest;
use table::test_util::{EmptyTable, MockTableEngine};
@@ -545,7 +572,7 @@ mod test {
let proto = convertor.encode(plan.clone()).unwrap();
let tripped_plan = convertor.decode(proto, catalog).unwrap();
assert_eq!(format!("{:?}", plan), format!("{:?}", tripped_plan));
assert_eq!(format!("{plan:?}"), format!("{tripped_plan:?}"));
}
#[tokio::test]
@@ -564,7 +591,9 @@ mod test {
})
.await
.unwrap();
let adapter = Arc::new(DfTableProviderAdapter::new(table_ref));
let adapter = Arc::new(DefaultTableSource::new(Arc::new(
DfTableProviderAdapter::new(table_ref),
)));
let projection = vec![1, 3, 5];
let df_schema = adapter.schema().to_dfschema().unwrap();
@@ -577,14 +606,13 @@ mod test {
let table_scan_plan = LogicalPlan::TableScan(TableScan {
table_name: format!(
"{}.{}.{}",
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, DEFAULT_TABLE_NAME
"{DEFAULT_CATALOG_NAME}.{DEFAULT_SCHEMA_NAME}.{DEFAULT_TABLE_NAME}",
),
source: adapter,
projection: Some(projection),
projected_schema,
filters: vec![],
limit: None,
fetch: None,
});
logical_plan_round_trip(table_scan_plan, catalog_manager).await;

View File

@@ -87,7 +87,7 @@ pub fn to_concrete_type(ty: &SType) -> Result<(ConcreteDataType, bool)> {
| Kind::List(_)
| Kind::Map(_)
| Kind::UserDefinedTypeReference(_) => UnsupportedSubstraitTypeSnafu {
ty: format!("{:?}", kind),
ty: format!("{kind:?}"),
}
.fail(),
}
@@ -154,7 +154,7 @@ pub(crate) fn scalar_value_as_literal_type(v: &ScalarValue) -> Result<LiteralTyp
// TODO(LFC): Implement other conversions: ScalarValue => LiteralType
_ => {
return error::UnsupportedExprSnafu {
name: format!("{:?}", v),
name: format!("{v:?}"),
}
.fail()
}
@@ -177,7 +177,7 @@ pub(crate) fn literal_type_to_scalar_value(t: LiteralType) -> Result<ScalarValue
// TODO(LFC): Implement other conversions: Kind => ScalarValue
_ => {
return error::UnsupportedSubstraitTypeSnafu {
ty: format!("{:?}", kind),
ty: format!("{kind:?}"),
}
.fail()
}
@@ -194,7 +194,7 @@ pub(crate) fn literal_type_to_scalar_value(t: LiteralType) -> Result<ScalarValue
// TODO(LFC): Implement other conversions: LiteralType => ScalarValue
_ => {
return error::UnsupportedSubstraitTypeSnafu {
ty: format!("{:?}", t),
ty: format!("{t:?}"),
}
.fail()
}

View File

@@ -1,8 +1,8 @@
[package]
name = "common-telemetry"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[features]
console = ["console-subscriber"]

View File

@@ -28,7 +28,7 @@ pub fn set_panic_hook() {
let default_hook = panic::take_hook();
panic::set_hook(Box::new(move |panic| {
let backtrace = Backtrace::new();
let backtrace = format!("{:?}", backtrace);
let backtrace = format!("{backtrace:?}");
if let Some(location) = panic.location() {
tracing::error!(
message = %panic,

View File

@@ -1,8 +1,8 @@
[package]
name = "common-time"
version = "0.1.0"
edition = "2021"
license = "Apache-2.0"
version.workspace = true
edition.workspace = true
license.workspace = true
[dependencies]
chrono = "0.4"

View File

@@ -55,8 +55,11 @@ impl From<i32> for Date {
impl Display for Date {
/// [Date] is formatted according to ISO-8601 standard.
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
let abs_date = NaiveDate::from_num_days_from_ce(UNIX_EPOCH_FROM_CE + self.0);
f.write_str(&abs_date.format("%F").to_string())
if let Some(abs_date) = NaiveDate::from_num_days_from_ce_opt(UNIX_EPOCH_FROM_CE + self.0) {
write!(f, "{}", abs_date.format("%F"))
} else {
write!(f, "Date({})", self.0)
}
}
}
@@ -95,7 +98,7 @@ mod tests {
Date::from_str("1969-01-01").unwrap().to_string()
);
let now = Utc::now().date().format("%F").to_string();
let now = Utc::now().date_naive().format("%F").to_string();
assert_eq!(now, Date::from_str(&now).unwrap().to_string());
}

Some files were not shown because too many files have changed in this diff Show More