mirror of
https://github.com/GreptimeTeam/greptimedb.git
synced 2025-12-22 22:20:02 +00:00
fix: typos (#6084)
@@ -11,6 +11,6 @@ And database will reply with something like:
Log Level changed from Some("info") to "trace,flow=debug"%
```

The data is a string in the format of `global_level,module1=level1,module2=level2,...` that follows the same rules as `RUST_LOG`.

The module is the module name of the log, and the level is the log level. The log level can be one of the following: `trace`, `debug`, `info`, `warn`, `error`, `off` (case insensitive).

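For illustration only, the same directive syntax can be parsed with `tracing-subscriber`'s `EnvFilter` (a standalone sketch, not GreptimeDB's actual handler; it needs the crate's `env-filter` feature):

```rust
// Standalone sketch: parse a RUST_LOG-style directive string.
// "trace,flow=debug" means: global level `trace`, while the `flow` module is capped at `debug`.
use tracing_subscriber::EnvFilter;

fn main() {
    let filter = EnvFilter::try_new("trace,flow=debug").expect("valid log level directives");
    println!("parsed filter: {filter}");
}
```
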
@@ -14,7 +14,7 @@ impl SqlQueryHandler for Instance {
```

Normally, when a SQL query arrives at GreptimeDB, the `do_query` method will be called. After some parsing work, the SQL
will be fed into `StatementExecutor`:

```rust
// in Frontend Instance:
@@ -27,7 +27,7 @@ an example.

Now, what if the statements should be handled differently for GreptimeDB Standalone and Cluster? You can see there's
a `SqlStatementExecutor` field in `StatementExecutor`. GreptimeDB Standalone and Cluster each have their own implementation
of `SqlStatementExecutor`. If you are going to implement a statement differently in the two modes (like `CREATE TABLE`),
you have to implement it in their respective `SqlStatementExecutor`s.
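
As a rough illustration of the idea (not the actual trait definition, whose method names and signatures differ), one trait with a separate implementation per deployment mode could look like this:

```rust
// Minimal sketch: one trait, one implementation per deployment mode.
// The real `SqlStatementExecutor` trait in GreptimeDB has different signatures.
trait SqlStatementExecutorSketch {
    fn execute_create_table(&self, stmt: &str) -> String;
}

struct StandaloneSqlStatementExecutor;
struct DistributedSqlStatementExecutor;

impl SqlStatementExecutorSketch for StandaloneSqlStatementExecutor {
    fn execute_create_table(&self, stmt: &str) -> String {
        format!("standalone: create the table locally for `{stmt}`")
    }
}

impl SqlStatementExecutorSketch for DistributedSqlStatementExecutor {
    fn execute_create_table(&self, stmt: &str) -> String {
        format!("cluster: create the table's regions across datanodes for `{stmt}`")
    }
}
```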

This is summarized in the diagram below:

@@ -1,8 +1,8 @@
Currently, our query engine is based on DataFusion, so all aggregate functions are executed by DataFusion through its UDAF interface. You can find DataFusion's UDAF example [here](https://github.com/apache/arrow-datafusion/blob/arrow2/datafusion-examples/examples/simple_udaf.rs). Basically, we provide the same way as DataFusion to write aggregate functions: both are centered on a struct called an "Accumulator" that accumulates state along the way during aggregation.

However, DataFusion's UDAF implementation has a significant restriction: it requires the user to provide a concrete "Accumulator". Take the `Median` aggregate function, for example: to aggregate a `u32` column, you have to write a `MedianU32` and use `SELECT MEDIANU32(x)` in SQL, and `MedianU32` cannot be used to aggregate an `i32` column. Alternatively, you can use a special type that can hold all kinds of data (like our `Value` enum or Arrow's `ScalarValue`) and `match` all the way through the aggregate calculations. It works, though it is rather tedious. (But I think it's DataFusion's preferred way to write a UDAF.)

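A hedged sketch of that second, `match`-heavy approach, using DataFusion's `ScalarValue` (variant names as in recent DataFusion releases; only two of the many arms are shown):

```rust
use datafusion::scalar::ScalarValue;

// Every supported input type needs its own arm, which is the tedium described above.
fn add_to_sum(sum: &mut f64, value: &ScalarValue) {
    match value {
        ScalarValue::UInt32(Some(v)) => *sum += *v as f64,
        ScalarValue::Int32(Some(v)) => *sum += *v as f64,
        // ...and so on for every other numeric variant
        _ => {}
    }
}
```
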
So, is there a way to make an aggregate function that automatically matches the input data's type? For example, a `Median` aggregator that works on both `u32` and `i32` columns? The answer is yes, provided we can bypass DataFusion's restriction that it simply doesn't pass the input data's type when creating an Accumulator.

> There's an example in `my_sum_udaf_example.rs`; take that as a quick start.

@@ -16,7 +16,7 @@ You must first define a struct that will be used to create your accumulator. For
struct MySumAccumulatorCreator {}
```

The attribute macro `#[as_aggr_func_creator]` and the derive macro `#[derive(Debug, AggrFuncTypeStore)]` must both be applied to the struct. They work together to provide storage for the aggregate function's input data types, which is needed to create the generic accumulator later.

> Note that the `as_aggr_func_creator` macro will add fields to the struct, so the struct cannot be defined as an empty struct without fields like `struct Foo;`, nor as a newtype like `struct Foo(Bar)`.

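Putting the two macros together, the creator struct from the snippet above looks roughly like this (a sketch assembled from the names mentioned in this section; the real example may derive additional traits, and the macros come from GreptimeDB's own crates):

```rust
// Sketch: both macros applied to the creator struct named above.
// `as_aggr_func_creator` injects the fields that store the input data types,
// which is why braces `{}` (rather than `struct Foo;`) are required.
#[as_aggr_func_creator]
#[derive(Debug, AggrFuncTypeStore)]
struct MySumAccumulatorCreator {}
```
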
@@ -32,11 +32,11 @@ pub trait AggregateFunctionCreator: Send + Sync + Debug {

You can use the input data's types in the methods that return the output type and the state types (just invoke `input_types()`).

The output type is the aggregate function's output data type. For example, the `SUM` aggregate function's output type is `u64` for a `u32` column. The state types are the accumulator's internal states' types. Take the `AVG` aggregate function on an `i32` column as an example: its state types are `i64` (for the sum) and `u64` (for the count).

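As a toy illustration of those state types (field names are ours, not an actual GreptimeDB type):

```rust
// Internal state of an AVG over an `i32` column, per the description above.
struct AvgState {
    sum: i64,   // state type `i64`: running sum of the inputs
    count: u64, // state type `u64`: number of accumulated values
}

impl AvgState {
    fn evaluate(&self) -> f64 {
        // the final AVG result is derived from the two states
        self.sum as f64 / self.count as f64
    }
}
```
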
The `creator` function is where you define how an accumulator (that will be used in DataFusion) is created. You define "how" to create the accumulator (instead of "what" to create), using the input data's types as arguments. With the input datatypes known, you can create the accumulator generically.

# 2. Impl the `Accumulator` trait for your accumulator.

The accumulator is where you store the aggregate calculation states and evaluate a result. You must impl the `Accumulator` trait for it. The trait's definition is:

@@ -49,7 +49,7 @@ pub trait Accumulator: Send + Sync + Debug {
}
```

DataFusion basically executes an aggregate like this (a toy end-to-end sketch follows the list):

1. Partition all the input data for the aggregate, and create an accumulator for each partition.
2. Call `update_batch` on each accumulator with its partition of the data, to let you update your aggregate calculation.
@@ -57,16 +57,16 @@ The DataFusion basically execute aggregate like this:
4. Call `merge_batch` to merge all the accumulators' internal states into one.
5. Execute `evaluate` on the chosen one to get the final calculation result.

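As a self-contained toy in plain Rust (not the real `Accumulator` trait), the flow above looks like this:

```rust
// Toy walk-through of the execution flow above, using a simple SUM.
struct SumAcc {
    sum: i64,
}

impl SumAcc {
    fn update_batch(&mut self, values: &[i64]) {
        self.sum += values.iter().sum::<i64>(); // step 2: accumulate one partition
    }
    fn state(&self) -> i64 {
        self.sum // the internal state that gets exchanged between accumulators
    }
    fn merge_batch(&mut self, states: &[i64]) {
        self.sum += states.iter().sum::<i64>(); // step 4: merge other accumulators' states
    }
    fn evaluate(&self) -> i64 {
        self.sum // step 5: produce the final result
    }
}

fn main() {
    // step 1: partition the input and create one accumulator per partition
    let partitions = vec![vec![1, 2, 3], vec![4, 5]];
    let mut accs: Vec<SumAcc> = partitions.iter().map(|_| SumAcc { sum: 0 }).collect();
    for (acc, part) in accs.iter_mut().zip(&partitions) {
        acc.update_batch(part);
    }
    // collect every accumulator's state (the step between 2 and 4, elided in this diff)
    let states: Vec<i64> = accs.iter().map(SumAcc::state).collect();
    let mut merged = SumAcc { sum: 0 };
    merged.merge_batch(&states);
    assert_eq!(merged.evaluate(), 15);
}
```
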
Once you know the meaning of each method, you can easily write your accumulator. You can refer to the `Median` accumulator or the `SUM` accumulator defined in `my_sum_udaf_example.rs` for more details.

# 3. Register your aggregate function with our query engine.

You can call the `register_aggregate_function` method of the query engine to register your aggregate function. To do that, you have to create an instance of the struct `AggregateFunctionMeta`. The struct has three fields. The first is the name of your aggregate function. The function name is case-sensitive due to DataFusion's restriction, and we strongly recommend using a lowercase name. If you have to use an uppercase name, wrap your aggregate function in quotation marks. For example, if you define an aggregate function named "my_aggr", you can use "`SELECT MY_AGGR(x)`"; if you define "my_AGGR", you have to use "`SELECT "my_AGGR"(x)`".

The second field is `arg_counts`, the count of the arguments. Take the `percentile` accumulator, which calculates the p-percentile of a column, as an example: we need to input both the column values and the value of p to calculate it, so the count of the arguments is two.

The third field is a function that creates the accumulator creator you defined in step 1 above. Creating a creator sounds a bit intertwined, but it is how we make DataFusion use a newly created aggregate function each time it executes a SQL query, preventing the stored input types from affecting each other. To dig into the key details, start from the `get_aggregate_meta` method of our `DfContextProviderAdapter` struct.

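Putting the three fields together, registration looks roughly like this. This is a hedged sketch assembled from the description above; the exact constructor, closure shape, and `Arc` wrapping in GreptimeDB may differ:

```rust
// Hedged sketch only: the three arguments mirror the three fields described above.
let meta = AggregateFunctionMeta::new(
    "my_aggr", // 1. the function name (lowercase recommended)
    1,         // 2. arg_counts: number of arguments the function takes
    Arc::new(|| Arc::new(MySumAccumulatorCreator::default())), // 3. how to create the creator
);
query_engine.register_aggregate_function(Arc::new(meta));
```
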
# (Optional) 4. Make your aggregate function automatically registered.

If you've written a great aggregate function and want to let everyone use it, you can make it automatically register to our query engine at start time. It's quick and simple: just refer to the `AggregateFunctions::register` function in `common/function/src/scalars/aggregate/mod.rs`.

@@ -3,7 +3,7 @@
This document introduces how to write fuzz tests in GreptimeDB.

## What is a fuzz test
A fuzz test is a tool that leverages deterministic random generation to assist in finding bugs. The goal of fuzz tests is to identify inputs generated by the fuzzer that cause system panics, crashes, or unexpected behaviors. We are using [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz) to run our fuzz test targets.

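For orientation, a minimal cargo-fuzz target has the following shape. This is only a generic sketch of the layout; GreptimeDB's real fuzz targets exercise the database itself rather than this toy parser:

```rust
// fuzz/fuzz_targets/toy_parse.rs -- generic cargo-fuzz layout, not a real GreptimeDB target.
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    // The fuzzer supplies deterministic random bytes; the target must never panic on any input.
    if let Ok(s) = std::str::from_utf8(data) {
        let _ = s.parse::<u64>();
    }
});
```
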
## Why we need them
- Find bugs by leveraging random generation