diff --git a/AUTHOR.md b/AUTHOR.md
index bdf0d6b39a..021d7b299f 100644
--- a/AUTHOR.md
+++ b/AUTHOR.md
@@ -10,12 +10,10 @@
 * [NiwakaDev](https://github.com/NiwakaDev)
 * [tisonkun](https://github.com/tisonkun)
-
 ## Team Members (in alphabetical order)
 * [apdong2022](https://github.com/apdong2022)
 * [beryl678](https://github.com/beryl678)
-* [Breeze-P](https://github.com/Breeze-P)
 * [daviderli614](https://github.com/daviderli614)
 * [discord9](https://github.com/discord9)
 * [evenyag](https://github.com/evenyag)
diff --git a/docs/how-to/how-to-write-aggregate-function.md b/docs/how-to/how-to-write-aggregate-function.md
index 04102c6543..ff34c10720 100644
--- a/docs/how-to/how-to-write-aggregate-function.md
+++ b/docs/how-to/how-to-write-aggregate-function.md
@@ -1,4 +1,4 @@
-Currently, our query engine is based on DataFusion, so all aggregate function is executed by DataFusion, through its UDAF interface. You can find DataFusion's UDAF example [here](https://github.com/apache/arrow-datafusion/blob/arrow2/datafusion-examples/examples/simple_udaf.rs). Basically, we provide the same way as DataFusion to write aggregate functions: both are centered in a struct called "Accumulator" to accumulates states along the way in aggregation.
+Currently, our query engine is based on DataFusion, so all aggregate functions are executed by DataFusion, through its UDAF interface. You can find DataFusion's UDAF example [here](https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/simple_udaf.rs). Basically, we provide the same way as DataFusion to write aggregate functions: both are centered on a struct called "Accumulator" that accumulates states along the way in aggregation.

 However, DataFusion's UDAF implementation has a huge restriction, that it requires user to provide a concrete "Accumulator". Take `Median` aggregate function for example, to aggregate a `u32` datatype column, you have to write a `MedianU32`, and use `SELECT MEDIANU32(x)` in SQL. `MedianU32` cannot be used to aggregate a `i32` datatype column. Or, there's another way: you can use a special type that can hold all kinds of data (like our `Value` enum or Arrow's `ScalarValue`), and `match` all the way up to do aggregate calculations. It might work, though rather tedious. (But I think it's DataFusion's preferred way to write UDAF.)
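The "Accumulator" idea referenced in the hunk above can be sketched roughly as follows. This is a minimal, framework-agnostic illustration only: the `Accumulator` trait and `Median` type here are invented for the sketch and are not DataFusion's actual `Accumulator` trait or GreptimeDB's UDAF API. It also hints at why a type-generic accumulator is preferable to hand-writing a `MedianU32` per concrete datatype.

```rust
// Framework-agnostic sketch of an accumulator; names are illustrative only.
trait Accumulator<T> {
    /// Fold a batch of input values into the accumulator's internal state.
    fn update_batch(&mut self, values: &[T]);
    /// Produce the final aggregate value from the accumulated state.
    fn evaluate(&self) -> Option<T>;
}

/// A median accumulator generic over the element type, in contrast to a
/// hand-written `MedianU32` that only works for `u32` columns.
struct Median<T> {
    values: Vec<T>,
}

impl<T: Ord + Copy> Accumulator<T> for Median<T> {
    fn update_batch(&mut self, values: &[T]) {
        self.values.extend_from_slice(values);
    }

    fn evaluate(&self) -> Option<T> {
        if self.values.is_empty() {
            return None;
        }
        let mut sorted = self.values.clone();
        sorted.sort_unstable();
        Some(sorted[sorted.len() / 2])
    }
}

fn main() {
    // The same generic accumulator aggregates a `u32` column...
    let mut acc = Median::<u32> { values: Vec::new() };
    acc.update_batch(&[3, 1, 2]);
    assert_eq!(acc.evaluate(), Some(2));

    // ...and an `i32` column, with no `match` over a catch-all value type.
    let mut acc = Median::<i32> { values: Vec::new() };
    acc.update_batch(&[-5, 7, 0]);
    assert_eq!(acc.evaluate(), Some(0));
}
```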
diff --git a/docs/rfcs/2023-02-01-table-compaction.md b/docs/rfcs/2023-02-01-table-compaction.md
index 645bf2d440..311d95a351 100644
--- a/docs/rfcs/2023-02-01-table-compaction.md
+++ b/docs/rfcs/2023-02-01-table-compaction.md
@@ -76,7 +76,7 @@ pub trait CompactionStrategy {
 ```
 The most suitable compaction strategy for time-series scenario would be
-a hybrid strategy that combines time window compaction with size-tired compaction, just like [Cassandra](https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html) and [ScyllaDB](https://docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#time-window-compaction-strategy-twcs) does.
+a hybrid strategy that combines time window compaction with size-tiered compaction, just like [Cassandra](https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html) and [ScyllaDB](https://docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#time-window-compaction-strategy-twcs) do.

 We can first group SSTs in level n into buckets according to some predefined time window.
 Within that window, SSTs are compacted in a size-tired manner (find SSTs with similar size and compact them to level n+1).
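A rough sketch of the hybrid strategy described in the RFC hunk above: SSTs are first bucketed by a fixed time window, then similar-sized SSTs within each window are grouped into a compaction task targeting level n+1. The `SstMeta` type, the 2x "similar size" threshold, and the planner function are all invented for this sketch and do not mirror GreptimeDB's actual compaction code.

```rust
use std::collections::BTreeMap;

#[derive(Clone)]
struct SstMeta {
    /// Start timestamp of the data in this SST, in seconds.
    start_ts: i64,
    /// File size in bytes.
    size: u64,
}

/// Bucket SSTs by a fixed time window, then within each window group SSTs whose
/// sizes are within 2x of each other; each group with more than one file becomes
/// a compaction task whose output would go to level n+1.
fn plan_compaction(ssts: &[SstMeta], window_secs: i64) -> Vec<Vec<SstMeta>> {
    // Time-window bucketing first.
    let mut buckets: BTreeMap<i64, Vec<SstMeta>> = BTreeMap::new();
    for sst in ssts {
        buckets.entry(sst.start_ts / window_secs).or_default().push(sst.clone());
    }

    let mut tasks = Vec::new();
    for (_window, mut files) in buckets {
        // Size-tiered grouping inside one time window.
        files.sort_by_key(|f| f.size);
        let mut group: Vec<SstMeta> = Vec::new();
        for file in files {
            let similar = group.last().map_or(true, |prev| file.size <= prev.size * 2);
            if !similar {
                // Close the current group; only multi-file groups become tasks.
                if group.len() > 1 {
                    tasks.push(std::mem::take(&mut group));
                } else {
                    group.clear();
                }
            }
            group.push(file);
        }
        if group.len() > 1 {
            tasks.push(group);
        }
    }
    tasks
}

fn main() {
    let ssts = vec![
        SstMeta { start_ts: 0, size: 8 << 20 },
        SstMeta { start_ts: 100, size: 10 << 20 },
        SstMeta { start_ts: 7_200, size: 512 << 20 }, // later window, much larger file
    ];
    // With a 1-hour window, only the two similar-sized files in the first window
    // are picked for compaction.
    let tasks = plan_compaction(&ssts, 3_600);
    assert_eq!(tasks.len(), 1);
    assert_eq!(tasks[0].len(), 2);
}
```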
diff --git a/docs/rfcs/2024-01-17-dataflow-framework.md b/docs/rfcs/2024-01-17-dataflow-framework.md
index 46da2175e1..cbcf8a55ba 100644
--- a/docs/rfcs/2024-01-17-dataflow-framework.md
+++ b/docs/rfcs/2024-01-17-dataflow-framework.md
@@ -28,7 +28,7 @@ In order to do those things while maintaining a low memory footprint, you need t
 - Greptime Flow's is built on top of [Hydroflow](https://github.com/hydro-project/hydroflow).
 - We have three choices for the Dataflow/Streaming process framework for our simple continuous aggregation feature:
 1. Based on the timely/differential dataflow crate that [materialize](https://github.com/MaterializeInc/materialize) based on. Later, it's proved too obscure for a simple usage, and is hard to customize memory usage control.
-2. Based on a simple dataflow framework that we write from ground up, like what [arroyo](https://www.arroyo.dev/) or [risingwave](https://www.risingwave.dev/) did, for example the core streaming logic of [arroyo](https://github.com/ArroyoSystems/arroyo/blob/master/arroyo-datastream/src/lib.rs) only takes up to 2000 line of codes. However, it means maintaining another layer of dataflow framework, which might seem easy in the beginning, but I fear it might be too burdensome to maintain once we need more features.
+2. Based on a simple dataflow framework that we write from the ground up, like what [arroyo](https://www.arroyo.dev/) or [risingwave](https://www.risingwave.dev/) did; for example, the core streaming logic of [arroyo](https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-datastream/src/lib.rs) only takes up to 2,000 lines of code. However, it means maintaining another layer of dataflow framework, which might seem easy in the beginning, but I fear it might be too burdensome to maintain once we need more features.
 3. Based on a simple and lower level dataflow framework that someone else write, like [hydroflow](https://github.com/hydro-project/hydroflow), this approach combines the best of both worlds. Firstly, it boasts ease of comprehension and customization. Secondly, the dataflow framework offers precisely the necessary features for crafting uncomplicated single-node dataflow programs while delivering decent performance.

 Hence, we choose the third option, and use a simple logical plan that's anagonistic to the underlying dataflow framework, as it only describe how the dataflow graph should be doing, not how it do that. And we built operator in hydroflow to execute the plan. And the result hydroflow graph is wrapped in a engine that only support data in/out and tick event to flush and compute the result. This provide a thin middle layer that's easy to maintain and allow switching to other dataflow framework if necessary.
diff --git a/grafana/README.md b/grafana/README.md
index e23d8d3c2f..6f06c1e4f6 100644
--- a/grafana/README.md
+++ b/grafana/README.md
@@ -83,7 +83,7 @@ If you use the [Helm Chart](https://github.com/GreptimeTeam/helm-charts) to depl
 - `monitoring.enabled=true`: Deploys a standalone GreptimeDB instance dedicated to monitoring the cluster;
 - `grafana.enabled=true`: Deploys Grafana and automatically imports the monitoring dashboard;

-The standalone GreptimeDB instance will collect metrics from your cluster, and the dashboard will be available in the Grafana UI. For detailed deployment instructions, please refer to our [Kubernetes deployment guide](https://docs.greptime.com/user-guide/deployments-administration-administration/deploy-on-kubernetes/getting-started).
+The standalone GreptimeDB instance will collect metrics from your cluster, and the dashboard will be available in the Grafana UI. For detailed deployment instructions, please refer to our [Kubernetes deployment guide](https://docs.greptime.com/user-guide/deployments-administration/deploy-on-kubernetes/overview).

 ### Self-host Prometheus and import dashboards manually
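The "thin engine" layer described in the dataflow RFC hunk above, which only accepts data in/out plus a tick event that flushes and computes results, can be sketched as below. The `Engine`/`Row` names and the toy sum-per-key aggregation are invented for illustration; this is not Greptime Flow's or Hydroflow's actual API.

```rust
use std::collections::HashMap;

/// Input row for a toy "sum per key" continuous aggregation.
struct Row {
    key: String,
    value: i64,
}

/// The thin engine wrapper: data in, tick to compute, data out.
struct Engine {
    // Stand-in for the wrapped dataflow graph's buffered input and state.
    pending: Vec<Row>,
    state: HashMap<String, i64>,
}

impl Engine {
    fn new() -> Self {
        Engine { pending: Vec::new(), state: HashMap::new() }
    }

    /// Data in: buffer rows until the next tick.
    fn push(&mut self, row: Row) {
        self.pending.push(row);
    }

    /// Tick: flush buffered input through the (stand-in) dataflow graph and
    /// return the updated aggregate results.
    fn tick(&mut self) -> Vec<(String, i64)> {
        for row in self.pending.drain(..) {
            *self.state.entry(row.key).or_insert(0) += row.value;
        }
        let mut out: Vec<(String, i64)> =
            self.state.iter().map(|(k, v)| (k.clone(), *v)).collect();
        out.sort();
        out
    }
}

fn main() {
    let mut engine = Engine::new();
    engine.push(Row { key: "host-a".to_string(), value: 1 });
    engine.push(Row { key: "host-a".to_string(), value: 2 });
    engine.push(Row { key: "host-b".to_string(), value: 5 });
    // One tick computes the continuous aggregate over everything buffered so far.
    assert_eq!(
        engine.tick(),
        vec![("host-a".to_string(), 3), ("host-b".to_string(), 5)]
    );
}
```

Keeping callers against an interface this small is what makes the middle layer easy to maintain and, if necessary, lets the underlying dataflow framework be swapped out.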