refactor: use DataFusion's UDAF implementation directly (#6776)

* refactor: use DataFusion's UDAF implementation directly

Signed-off-by: luofucong <luofc@foxmail.com>

* remove: delete how-to guide for writing aggregate functions

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

* refactor: port json_encode_path to datafusion udaf

Signed-off-by: Ning Sun <sunning@greptime.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Signed-off-by: Ning Sun <sunning@greptime.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
LFC
2025-08-20 16:25:00 +08:00
committed by GitHub
parent 819531393f
commit 05529387d9
17 changed files with 526 additions and 1200 deletions

View File

@@ -1,72 +0,0 @@
Currently, our query engine is based on DataFusion, so all aggregate functions are executed by DataFusion through its UDAF interface. You can find DataFusion's UDAF example [here](https://github.com/apache/datafusion/tree/main/datafusion-examples/examples/simple_udaf.rs). We provide basically the same way as DataFusion to write aggregate functions: both are centered on a struct called "Accumulator" that accumulates state along the way during aggregation.
However, DataFusion's UDAF implementation has a significant restriction: it requires the user to provide a concrete "Accumulator". Take the `Median` aggregate function for example: to aggregate a `u32` column, you have to write a `MedianU32` and use `SELECT MEDIANU32(x)` in SQL, and that `MedianU32` cannot be used to aggregate an `i32` column. Alternatively, you can use a special type that can hold all kinds of data (like our `Value` enum or Arrow's `ScalarValue`) and `match` your way through the aggregate calculations. That works, though it is rather tedious. (It does seem to be DataFusion's preferred way to write a UDAF; see the sketch below.)
So is there a way to make an aggregate function that automatically matches the input data's type? For example, a `Median` aggregator that works on both `u32` and `i32` columns? The answer is yes, once we bypass one restriction of DataFusion: it simply doesn't pass the input data's type when creating an Accumulator.
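For reference, here is a minimal sketch of that plain-DataFusion style, written against a recent DataFusion API. The names (`SumF64`, `sum_f64`) are made up for illustration; note how the accumulator is hard-wired to one concrete input type:
```Rust
use std::sync::Arc;

use datafusion::arrow::array::ArrayRef;
use datafusion::arrow::datatypes::{DataType, Float64Type};
use datafusion::common::cast::as_primitive_array;
use datafusion::error::Result;
use datafusion::logical_expr::{Accumulator, AggregateUDF, Volatility};
use datafusion::prelude::create_udaf;
use datafusion::scalar::ScalarValue;

#[derive(Debug, Default)]
struct SumF64 {
    sum: f64,
}

impl Accumulator for SumF64 {
    fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
        // The input column is concretely typed: this accumulator only handles f64.
        let arr = as_primitive_array::<Float64Type>(&values[0])?;
        self.sum += arr.iter().flatten().sum::<f64>();
        Ok(())
    }

    fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
        // The serialized state has the same shape as the input, so reuse update_batch.
        self.update_batch(states)
    }

    fn state(&mut self) -> Result<Vec<ScalarValue>> {
        Ok(vec![ScalarValue::Float64(Some(self.sum))])
    }

    fn evaluate(&mut self) -> Result<ScalarValue> {
        Ok(ScalarValue::Float64(Some(self.sum)))
    }

    fn size(&self) -> usize {
        std::mem::size_of_val(self)
    }
}

fn sum_f64_udaf() -> AggregateUDF {
    create_udaf(
        "sum_f64",
        vec![DataType::Float64],           // input type, fixed to f64
        Arc::new(DataType::Float64),       // output type
        Volatility::Immutable,
        Arc::new(|_| Ok(Box::new(SumF64::default()))),
        Arc::new(vec![DataType::Float64]), // intermediate state types
    )
}
```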
> There's an example in `my_sum_udaf_example.rs`; take that as a quick start.
# 1. Implement the `AggregateFunctionCreator` trait for your accumulator creator.
You must first define a struct that will be used to create your accumulator. For example,
```Rust
#[as_aggr_func_creator]
#[derive(Debug, AggrFuncTypeStore)]
struct MySumAccumulatorCreator {}
```
The attribute macro `#[as_aggr_func_creator]` and the derive macro `#[derive(Debug, AggrFuncTypeStore)]` must both be applied to the struct. Together they provide storage for the aggregate function's input data types, which is needed later to create the accumulator generically.
> Note that the `as_aggr_func_creator` macro adds fields to the struct, so the struct cannot be defined as a unit struct like `struct Foo;`, nor as a tuple struct like `struct Foo(Bar)`. A rough sketch of what the two macros provide follows.
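Conceptually the macros expand to something like the snippet below. This is only a sketch: the `input_types` field name matches the generated code, but the storage type chosen here (a `Mutex`) is an assumption for illustration.
```Rust
use std::sync::Mutex;

use common_query::error::Result;
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use datatypes::prelude::ConcreteDataType;

#[derive(Debug, Default)]
struct MySumAccumulatorCreator {
    // Added by `#[as_aggr_func_creator]`; the real generated storage type may differ.
    input_types: Mutex<Option<Vec<ConcreteDataType>>>,
}

// What `#[derive(AggrFuncTypeStore)]` provides, in spirit.
impl AggrFuncTypeStore for MySumAccumulatorCreator {
    fn input_types(&self) -> Result<Vec<ConcreteDataType>> {
        // Return the types stored when DataFusion evaluated the return type.
        Ok(self.input_types.lock().unwrap().clone().unwrap_or_default())
    }

    fn set_input_types(&self, input_types: Vec<ConcreteDataType>) -> Result<()> {
        *self.input_types.lock().unwrap() = Some(input_types);
        Ok(())
    }
}
```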
Then implement the `AggregateFunctionCreator` trait on it. The trait is defined as:
```Rust
pub trait AggregateFunctionCreator: AggrFuncTypeStore {
    fn creator(&self) -> AccumulatorCreatorFunction;
    fn output_type(&self) -> Result<ConcreteDataType>;
    fn state_types(&self) -> Result<Vec<ConcreteDataType>>;
}
```
You can use the input data's types in the methods that return the output type and state types (just invoke `input_types()`).
The output type is the type of the aggregate function's result. For example, the `SUM` aggregate function's output type is `u64` for a `u32` column. The state types are the types of the accumulator's internal states. Take the `AVG` aggregate function on an `i32` column as an example: its state types are `i64` (for the sum) and `u64` (for the count).
The `creator` function is where you define how an accumulator (the one that will be used in DataFusion) is created. You define "how" to create the accumulator (instead of "what" to create), with the input data's types passed in as arguments. With the input datatypes known, you can create the accumulator generically. For example:
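Below is a simplified sketch of such a creator that dispatches on just two input types; the real `my_sum_udaf_example.rs` covers all primitive types with the `with_match_primitive_type_id!` macro.
```Rust
use std::sync::Arc;

use common_query::error::{CreateAccumulatorSnafu, Result};
use common_query::logical_plan::{Accumulator, AggregateFunctionCreator};
use common_query::prelude::AccumulatorCreatorFunction;
use datatypes::prelude::ConcreteDataType;

impl AggregateFunctionCreator for MySumAccumulatorCreator {
    fn creator(&self) -> AccumulatorCreatorFunction {
        let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
            // Dispatch on the input column's type to build a concrete accumulator.
            match &types[0] {
                ConcreteDataType::UInt32(_) => {
                    Ok(Box::new(MySumAccumulator::<u32, u64>::default()) as Box<dyn Accumulator>)
                }
                ConcreteDataType::Int32(_) => {
                    Ok(Box::new(MySumAccumulator::<i32, i64>::default()) as Box<dyn Accumulator>)
                }
                other => {
                    let err_msg = format!("\"MY_SUM\" does not support data type {other:?}");
                    CreateAccumulatorSnafu { err_msg }.fail()?
                }
            }
        });
        creator
    }

    fn output_type(&self) -> Result<ConcreteDataType> {
        // Summing `u32` yields `u64`; summing `i32` yields `i64`.
        Ok(match &self.input_types()?[0] {
            ConcreteDataType::UInt32(_) => ConcreteDataType::uint64_datatype(),
            _ => ConcreteDataType::int64_datatype(),
        })
    }

    fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
        // The only internal state of a sum accumulator is the sum itself.
        Ok(vec![self.output_type()?])
    }
}
```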
# 2. Implement the `Accumulator` trait for your accumulator.
The accumulator is where you store the aggregate calculation's states and evaluate a result. You must implement the `Accumulator` trait for it. The trait is defined as:
```Rust
pub trait Accumulator: Send + Sync + Debug {
    fn state(&self) -> Result<Vec<Value>>;
    fn update_batch(&mut self, values: &[VectorRef]) -> Result<()>;
    fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()>;
    fn evaluate(&self) -> Result<Value>;
}
```
DataFusion basically executes an aggregation like this:
1. Partition all the input data. Create an accumulator for each partition.
2. Call `update_batch` on each accumulator with its partition's data, to let you update your aggregate calculation.
3. Call `state` to get each accumulator's internal state, the intermediate calculation result.
4. Call `merge_batch` to merge all accumulators' internal states into one.
5. Execute `evaluate` on the merged accumulator to get the final calculation result.
Once you know the meaning of each method, you can easily write your accumulator; a minimal sketch follows. You can also refer to the `Median` accumulator, or the `SUM` accumulator defined in `my_sum_udaf_example.rs`, for more details.
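For illustration, here is a minimal accumulator: a sum over `f64` columns, with the generics omitted for brevity (the `SumF64Accumulator` name is made up for this sketch).
```Rust
use common_query::error::Result;
use common_query::logical_plan::Accumulator;
use datatypes::prelude::Value;
use datatypes::vectors::VectorRef;

#[derive(Debug, Default)]
struct SumF64Accumulator {
    sum: f64,
}

impl Accumulator for SumF64Accumulator {
    fn state(&self) -> Result<Vec<Value>> {
        // The only internal state is the running sum (step 3 above).
        Ok(vec![Value::from(self.sum)])
    }

    fn update_batch(&mut self, values: &[VectorRef]) -> Result<()> {
        // `values[0]` is this partition's input column (step 2 above).
        let column = &values[0];
        for i in 0..column.len() {
            if let Some(v) = column.get(i).as_f64_lossy() {
                self.sum += v;
            }
        }
        Ok(())
    }

    fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()> {
        // States have the same shape as what `state()` returns (step 4 above),
        // so merging is just summing them up again.
        self.update_batch(states)
    }

    fn evaluate(&self) -> Result<Value> {
        Ok(Value::from(self.sum))
    }
}
```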
# 3. Register your aggregate function with our query engine.
You can call the `register_aggregate_function` method of the query engine to register your aggregate function. To do that, you have to create an instance of the struct `AggregateFunctionMeta`. The struct has three fields. The first is your aggregate function's name. The name is case-sensitive due to DataFusion's restriction; we strongly recommend using a lowercase name. If you have to use an uppercase name, wrap your aggregate function in quotation marks. For example, if you define an aggregate function named "my_aggr", you can use "`SELECT MY_AGGR(x)`"; if you define "my_AGGR", you have to use "`SELECT "my_AGGR"(x)`".
The second field is `arg_counts`, the count of the arguments. Take the `percentile` accumulator, which calculates the p-th percentile of a column: it needs both the column values and the value of `p` as input, so its argument count is two.
The third field is a function that creates the accumulator creator you defined in step 1 above. Creating a creator is a bit convoluted, but it is how we make DataFusion use a newly created aggregate function each time it executes a SQL statement, preventing the stored input types from affecting each other. For the key details, start with the `get_aggregate_meta` method of our `DfContextProviderAdapter` struct. A hedged registration sketch is shown below.
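The exact `AggregateFunctionMeta::new` signature and import paths in this sketch are assumptions; only the three fields described above (name, argument count, creator factory) are taken from the text.
```Rust
use std::sync::Arc;

use common_query::logical_plan::AggregateFunctionCreatorRef;

// Imports for `AggregateFunctionMeta` and `QueryEngineRef` are omitted here;
// their paths depend on the crate layout.
fn register_my_sum(query_engine: &QueryEngineRef) {
    query_engine.register_aggregate_function(Arc::new(AggregateFunctionMeta::new(
        // The function name; lowercase, per the case-sensitivity note above.
        "my_sum",
        // `my_sum` takes exactly one argument.
        1,
        // A factory that builds a fresh creator for every query, so stored
        // input types from concurrent queries cannot affect each other.
        Arc::new(|| Arc::new(MySumAccumulatorCreator::default()) as AggregateFunctionCreatorRef),
    )));
}
```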
# (Optional) 4. Make your aggregate function automatically registered.
If you've written a great aggregate function and want to let everyone use it, you can make it automatically register to our query engine at start time. It's quick and simple: just refer to the `AggregateFunctions::register` function in `common/function/src/scalars/aggregate/mod.rs`.

View File

@@ -21,7 +21,7 @@ pub(crate) struct GeoFunction;
impl GeoFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_aggr(encoding::JsonEncodePathAccumulator::uadf_impl());
registry.register_aggr(geo_path::GeoPathAccumulator::uadf_impl());
registry.register_aggr(encoding::JsonPathAccumulator::uadf_impl());
}
}

View File

@@ -14,223 +14,332 @@
use std::sync::Arc;
use common_error::ext::{BoxedError, PlainError};
use common_error::status_code::StatusCode;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{self, InvalidInputStateSnafu, Result};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{
create_aggregate_function, Accumulator, AggregateFunctionCreator,
use arrow::array::AsArray;
use datafusion::arrow::array::{Array, ArrayRef};
use datafusion::common::cast::as_primitive_array;
use datafusion::error::{DataFusionError, Result as DfResult};
use datafusion::logical_expr::{Accumulator as DfAccumulator, AggregateUDF, Volatility};
use datafusion::prelude::create_udaf;
use datafusion_common::cast::{as_list_array, as_struct_array};
use datafusion_common::ScalarValue;
use datatypes::arrow::array::{Float64Array, Int64Array, ListArray, StructArray};
use datatypes::arrow::datatypes::{
DataType, Field, Float64Type, Int64Type, TimeUnit, TimestampNanosecondType,
};
use common_query::prelude::AccumulatorCreatorFunction;
use common_time::Timestamp;
use datafusion_expr::AggregateUDF;
use datatypes::prelude::ConcreteDataType;
use datatypes::value::{ListValue, Value};
use datatypes::vectors::VectorRef;
use snafu::{ensure, ResultExt};
use datatypes::compute::{self, sort_to_indices};
use crate::scalars::geo::helpers::{ensure_columns_len, ensure_columns_n};
pub const JSON_ENCODE_PATH_NAME: &str = "json_encode_path";
/// Accumulator of lat, lng, timestamp tuples
#[derive(Debug)]
pub struct JsonPathAccumulator {
timestamp_type: ConcreteDataType,
const LATITUDE_FIELD: &str = "lat";
const LONGITUDE_FIELD: &str = "lng";
const TIMESTAMP_FIELD: &str = "timestamp";
const DEFAULT_LIST_FIELD_NAME: &str = "item";
#[derive(Debug, Default)]
pub struct JsonEncodePathAccumulator {
lat: Vec<Option<f64>>,
lng: Vec<Option<f64>>,
timestamp: Vec<Option<Timestamp>>,
timestamp: Vec<Option<i64>>,
}
impl JsonPathAccumulator {
fn new(timestamp_type: ConcreteDataType) -> Self {
Self {
lat: Vec::default(),
lng: Vec::default(),
timestamp: Vec::default(),
timestamp_type,
}
impl JsonEncodePathAccumulator {
pub fn new() -> Self {
Self::default()
}
/// Create a new `AggregateUDF` for the `json_encode_path` aggregate function.
pub fn uadf_impl() -> AggregateUDF {
create_aggregate_function(
"json_encode_path".to_string(),
3,
Arc::new(JsonPathEncodeFunctionCreator::default()),
create_udaf(
JSON_ENCODE_PATH_NAME,
// Input types: lat, lng, timestamp
vec![
DataType::Float64,
DataType::Float64,
DataType::Timestamp(TimeUnit::Nanosecond, None),
],
// Output type: geojson compatible linestring
Arc::new(DataType::Utf8),
Volatility::Immutable,
// Create the accumulator
Arc::new(|_| Ok(Box::new(Self::new()))),
// Intermediate state types
Arc::new(vec![DataType::Struct(
vec![
Field::new(
LATITUDE_FIELD,
DataType::List(Arc::new(Field::new(
DEFAULT_LIST_FIELD_NAME,
DataType::Float64,
true,
))),
false,
),
Field::new(
LONGITUDE_FIELD,
DataType::List(Arc::new(Field::new(
DEFAULT_LIST_FIELD_NAME,
DataType::Float64,
true,
))),
false,
),
Field::new(
TIMESTAMP_FIELD,
DataType::List(Arc::new(Field::new(
DEFAULT_LIST_FIELD_NAME,
DataType::Int64,
true,
))),
false,
),
]
.into(),
)]),
)
.into()
}
}
impl Accumulator for JsonPathAccumulator {
fn state(&self) -> Result<Vec<Value>> {
Ok(vec![
Value::List(ListValue::new(
self.lat.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.lng.iter().map(|i| Value::from(*i)).collect(),
ConcreteDataType::float64_datatype(),
)),
Value::List(ListValue::new(
self.timestamp.iter().map(|i| Value::from(*i)).collect(),
self.timestamp_type.clone(),
)),
])
}
impl DfAccumulator for JsonEncodePathAccumulator {
fn update_batch(&mut self, values: &[ArrayRef]) -> datafusion::error::Result<()> {
if values.len() != 3 {
return Err(DataFusionError::Internal(format!(
"Expected 3 columns for json_encode_path, got {}",
values.len()
)));
}
fn update_batch(&mut self, columns: &[VectorRef]) -> Result<()> {
// `update_batch` in DataFusion just provides the accumulator with the
// original input.
//
// `columns` is a vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::Float64` or a similar type. Each item in
// the vector is a row in the given dataset.
// - so on and so forth for `lng` and `timestamp`
ensure_columns_n!(columns, 3);
let lat_array = as_primitive_array::<Float64Type>(&values[0])?;
let lng_array = as_primitive_array::<Float64Type>(&values[1])?;
let ts_array = as_primitive_array::<TimestampNanosecondType>(&values[2])?;
let lat = &columns[0];
let lng = &columns[1];
let ts = &columns[2];
let size = lat.len();
let size = lat_array.len();
self.lat.reserve(size);
self.lng.reserve(size);
for idx in 0..size {
self.lat.push(lat.get(idx).as_f64_lossy());
self.lng.push(lng.get(idx).as_f64_lossy());
self.timestamp.push(ts.get(idx).as_timestamp());
self.lat.push(if lat_array.is_null(idx) {
None
} else {
Some(lat_array.value(idx))
});
self.lng.push(if lng_array.is_null(idx) {
None
} else {
Some(lng_array.value(idx))
});
self.timestamp.push(if ts_array.is_null(idx) {
None
} else {
Some(ts_array.value(idx))
});
}
Ok(())
}
fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()> {
// `merge_batch` in DataFusion receives the states accumulated from the data
// returned by child accumulators' `state()` calls.
// In our particular implementation, the data structure looks like this:
//
// states is vec of [`lat`, `lng`, `timestamp`]
// where
// - `lat` is a vector of `Value::List`. Each item in the list is all
// coordinates from a child accumulator.
// - so on and so forth for `lng` and `timestamp`
fn evaluate(&mut self) -> DfResult<ScalarValue> {
let unordered_lng_array = Float64Array::from(self.lng.clone());
let unordered_lat_array = Float64Array::from(self.lat.clone());
let ts_array = Int64Array::from(self.timestamp.clone());
ensure_columns_n!(states, 3);
let ordered_indices = sort_to_indices(&ts_array, None, None)?;
let lat_array = compute::take(&unordered_lat_array, &ordered_indices, None)?;
let lng_array = compute::take(&unordered_lng_array, &ordered_indices, None)?;
let lat_lists = &states[0];
let lng_lists = &states[1];
let ts_lists = &states[2];
let len = ts_array.len();
let lat_array = lat_array.as_primitive::<Float64Type>();
let lng_array = lng_array.as_primitive::<Float64Type>();
let len = lat_lists.len();
let mut coords = Vec::with_capacity(len);
for i in 0..len {
let lng = lng_array.value(i);
let lat = lat_array.value(i);
coords.push(vec![lng, lat]);
}
for idx in 0..len {
if let Some(lat_list) = lat_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lat_list.items() {
self.lat.push(v.as_f64_lossy());
}
}
let result = serde_json::to_string(&coords)
.map_err(|e| DataFusionError::Execution(format!("Failed to encode json, {}", e)))?;
if let Some(lng_list) = lng_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in lng_list.items() {
self.lng.push(v.as_f64_lossy());
}
}
Ok(ScalarValue::Utf8(Some(result)))
}
if let Some(ts_list) = ts_lists
.get(idx)
.as_list()
.map_err(BoxedError::new)
.context(error::ExecuteSnafu)?
{
for v in ts_list.items() {
self.timestamp.push(v.as_timestamp());
}
}
fn size(&self) -> usize {
// Base size of JsonEncodePathAccumulator struct fields
let mut total_size = std::mem::size_of::<Self>();
// Size of vectors (approximation)
total_size += self.lat.capacity() * std::mem::size_of::<Option<f64>>();
total_size += self.lng.capacity() * std::mem::size_of::<Option<f64>>();
total_size += self.timestamp.capacity() * std::mem::size_of::<Option<i64>>();
total_size
}
fn state(&mut self) -> datafusion::error::Result<Vec<ScalarValue>> {
let lat_array = Arc::new(ListArray::from_iter_primitive::<Float64Type, _, _>(vec![
Some(self.lat.clone()),
]));
let lng_array = Arc::new(ListArray::from_iter_primitive::<Float64Type, _, _>(vec![
Some(self.lng.clone()),
]));
let ts_array = Arc::new(ListArray::from_iter_primitive::<Int64Type, _, _>(vec![
Some(self.timestamp.clone()),
]));
let state_struct = StructArray::new(
vec![
Field::new(
LATITUDE_FIELD,
DataType::List(Arc::new(Field::new("item", DataType::Float64, true))),
false,
),
Field::new(
LONGITUDE_FIELD,
DataType::List(Arc::new(Field::new("item", DataType::Float64, true))),
false,
),
Field::new(
TIMESTAMP_FIELD,
DataType::List(Arc::new(Field::new("item", DataType::Int64, true))),
false,
),
]
.into(),
vec![lat_array, lng_array, ts_array],
None,
);
Ok(vec![ScalarValue::Struct(Arc::new(state_struct))])
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> datafusion::error::Result<()> {
if states.len() != 1 {
return Err(DataFusionError::Internal(format!(
"Expected 1 states for json_encode_path, got {}",
states.len()
)));
}
for state in states {
let state = as_struct_array(state)?;
let lat_list = as_list_array(state.column(0))?.value(0);
let lat_array = as_primitive_array::<Float64Type>(&lat_list)?;
let lng_list = as_list_array(state.column(1))?.value(0);
let lng_array = as_primitive_array::<Float64Type>(&lng_list)?;
let ts_list = as_list_array(state.column(2))?.value(0);
let ts_array = as_primitive_array::<Int64Type>(&ts_list)?;
self.lat.extend(lat_array);
self.lng.extend(lng_array);
self.timestamp.extend(ts_array);
}
Ok(())
}
fn evaluate(&self) -> Result<Value> {
let mut work_vec: Vec<(&Option<f64>, &Option<f64>, &Option<Timestamp>)> = self
.lat
.iter()
.zip(self.lng.iter())
.zip(self.timestamp.iter())
.map(|((a, b), c)| (a, b, c))
.collect();
// sort by timestamp, we treat null timestamp as 0
work_vec.sort_unstable_by_key(|tuple| tuple.2.unwrap_or_else(|| Timestamp::new_second(0)));
let result = serde_json::to_string(
&work_vec
.into_iter()
// note that we transform to lng,lat for geojson compatibility
.map(|(lat, lng, _)| vec![lng, lat])
.collect::<Vec<Vec<&Option<f64>>>>(),
)
.map_err(|e| {
BoxedError::new(PlainError::new(
format!("Serialization failure: {}", e),
StatusCode::EngineExecuteQuery,
))
})
.context(error::ExecuteSnafu)?;
Ok(Value::String(result.into()))
}
}
/// This function accepts rows of lat, lng and timestamp, sorts them by
/// timestamp and encodes them into a geojson-like path.
///
/// Example:
///
/// ```sql
/// SELECT json_encode_path(lat, lon, timestamp) FROM table [group by ...];
/// ```
///
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct JsonPathEncodeFunctionCreator {}
#[cfg(test)]
mod tests {
use datafusion::arrow::array::{Float64Array, TimestampNanosecondArray};
use datafusion::scalar::ScalarValue;
impl AggregateFunctionCreator for JsonPathEncodeFunctionCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
let ts_type = types[2].clone();
Ok(Box::new(JsonPathAccumulator::new(ts_type)))
});
use super::*;
creator
#[test]
fn test_json_encode_path_basic() {
let mut accumulator = JsonEncodePathAccumulator::new();
// Create test data
let lat_array = Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0]));
let lng_array = Arc::new(Float64Array::from(vec![4.0, 5.0, 6.0]));
let ts_array = Arc::new(TimestampNanosecondArray::from(vec![100, 200, 300]));
// Update batch
accumulator
.update_batch(&[lat_array, lng_array, ts_array])
.unwrap();
// Evaluate
let result = accumulator.evaluate().unwrap();
assert_eq!(
result,
ScalarValue::Utf8(Some("[[4.0,1.0],[5.0,2.0],[6.0,3.0]]".to_string()))
);
}
fn output_type(&self) -> Result<ConcreteDataType> {
Ok(ConcreteDataType::string_datatype())
#[test]
fn test_json_encode_path_sort_by_timestamp() {
let mut accumulator = JsonEncodePathAccumulator::new();
// Create test data with unordered timestamps
let lat_array = Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0]));
let lng_array = Arc::new(Float64Array::from(vec![4.0, 5.0, 6.0]));
let ts_array = Arc::new(TimestampNanosecondArray::from(vec![300, 100, 200]));
// Update batch
accumulator
.update_batch(&[lat_array, lng_array, ts_array])
.unwrap();
// Evaluate
let result = accumulator.evaluate().unwrap();
assert_eq!(
result,
ScalarValue::Utf8(Some("[[5.0,2.0],[6.0,3.0],[4.0,1.0]]".to_string()))
);
}
fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
let input_types = self.input_types()?;
ensure!(input_types.len() == 3, InvalidInputStateSnafu);
#[test]
fn test_json_encode_path_merge() {
let mut accumulator1 = JsonEncodePathAccumulator::new();
let mut accumulator2 = JsonEncodePathAccumulator::new();
let timestamp_type = input_types[2].clone();
// Create test data for first accumulator
let lat_array1 = Arc::new(Float64Array::from(vec![1.0]));
let lng_array1 = Arc::new(Float64Array::from(vec![4.0]));
let ts_array1 = Arc::new(TimestampNanosecondArray::from(vec![100]));
Ok(vec![
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(ConcreteDataType::float64_datatype()),
ConcreteDataType::list_datatype(timestamp_type),
])
// Create test data for second accumulator
let lat_array2 = Arc::new(Float64Array::from(vec![2.0]));
let lng_array2 = Arc::new(Float64Array::from(vec![5.0]));
let ts_array2 = Arc::new(TimestampNanosecondArray::from(vec![200]));
// Update batches
accumulator1
.update_batch(&[lat_array1, lng_array1, ts_array1])
.unwrap();
accumulator2
.update_batch(&[lat_array2, lng_array2, ts_array2])
.unwrap();
// Get states
let state1 = accumulator1.state().unwrap();
let state2 = accumulator2.state().unwrap();
// Create a merged accumulator
let mut merged = JsonEncodePathAccumulator::new();
// Extract the struct arrays from the states
let state_array1 = match &state1[0] {
ScalarValue::Struct(array) => array.clone(),
_ => panic!("Expected Struct scalar value"),
};
let state_array2 = match &state2[0] {
ScalarValue::Struct(array) => array.clone(),
_ => panic!("Expected Struct scalar value"),
};
// Merge state arrays
merged.merge_batch(&[state_array1]).unwrap();
merged.merge_batch(&[state_array2]).unwrap();
// Evaluate merged result
let result = merged.evaluate().unwrap();
assert_eq!(
result,
ScalarValue::Utf8(Some("[[4.0,1.0],[5.0,2.0]]".to_string()))
);
}
}

View File

@@ -12,21 +12,20 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::borrow::Cow;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{CreateAccumulatorSnafu, Error, InvalidFuncArgsSnafu};
use common_query::logical_plan::{
create_aggregate_function, Accumulator, AggregateFunctionCreator,
};
use common_query::prelude::AccumulatorCreatorFunction;
use datafusion_expr::AggregateUDF;
use datatypes::prelude::{ConcreteDataType, Value, *};
use datatypes::vectors::VectorRef;
use arrow::array::{Array, ArrayRef, AsArray, BinaryArray, StringArray};
use arrow_schema::{DataType, Field};
use datafusion::logical_expr::{Signature, TypeSignature, Volatility};
use datafusion_common::{Result, ScalarValue};
use datafusion_expr::{Accumulator, AggregateUDF, SimpleAggregateUDF};
use datafusion_functions_aggregate_common::accumulator::AccumulatorArgs;
use nalgebra::{Const, DVectorView, Dyn, OVector};
use snafu::ensure;
use crate::scalars::vector::impl_conv::{as_veclit, as_veclit_if_const, veclit_to_binlit};
use crate::scalars::vector::impl_conv::{
binlit_as_veclit, parse_veclit_from_strlit, veclit_to_binlit,
};
/// Aggregates by multiplying elements across the same dimension, returns a vector.
#[derive(Debug, Default)]
@@ -35,57 +34,42 @@ pub struct VectorProduct {
has_null: bool,
}
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct VectorProductCreator {}
impl AggregateFunctionCreator for VectorProductCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
ensure!(
types.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly one, have: {}",
types.len()
)
}
);
let input_type = &types[0];
match input_type {
ConcreteDataType::String(_) | ConcreteDataType::Binary(_) => {
Ok(Box::new(VectorProduct::default()))
}
_ => {
let err_msg = format!(
"\"VEC_PRODUCT\" aggregate function not support data type {:?}",
input_type.logical_type_id(),
);
CreateAccumulatorSnafu { err_msg }.fail()?
}
}
});
creator
}
fn output_type(&self) -> common_query::error::Result<ConcreteDataType> {
Ok(ConcreteDataType::binary_datatype())
}
fn state_types(&self) -> common_query::error::Result<Vec<ConcreteDataType>> {
Ok(vec![self.output_type()?])
}
}
impl VectorProduct {
/// Create a new `AggregateUDF` for the `vec_product` aggregate function.
pub fn uadf_impl() -> AggregateUDF {
create_aggregate_function(
"vec_product".to_string(),
1,
Arc::new(VectorProductCreator::default()),
)
.into()
let signature = Signature::one_of(
vec![
TypeSignature::Exact(vec![DataType::Utf8]),
TypeSignature::Exact(vec![DataType::Binary]),
],
Volatility::Immutable,
);
let udaf = SimpleAggregateUDF::new_with_signature(
"vec_product",
signature,
DataType::Binary,
Arc::new(Self::accumulator),
vec![Arc::new(Field::new("x", DataType::Binary, true))],
);
AggregateUDF::from(udaf)
}
fn accumulator(args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> {
if args.schema.fields().len() != 1 {
return Err(datafusion_common::DataFusionError::Internal(format!(
"expect creating `VEC_PRODUCT` with only one input field, actual {}",
args.schema.fields().len()
)));
}
let t = args.schema.field(0).data_type();
if !matches!(t, DataType::Utf8 | DataType::Binary) {
return Err(datafusion_common::DataFusionError::Internal(format!(
"unexpected input datatype {t} when creating `VEC_PRODUCT`"
)));
}
Ok(Box::new(VectorProduct::default()))
}
fn inner(&mut self, len: usize) -> &mut OVector<f32, Dyn> {
@@ -94,67 +78,82 @@ impl VectorProduct {
})
}
fn update(&mut self, values: &[VectorRef], is_update: bool) -> Result<(), Error> {
fn update(&mut self, values: &[ArrayRef], is_update: bool) -> Result<()> {
if values.is_empty() || self.has_null {
return Ok(());
};
let column = &values[0];
let len = column.len();
match as_veclit_if_const(column)? {
Some(column) => {
let vec_column = DVectorView::from_slice(&column, column.len()).scale(len as f32);
*self.inner(vec_column.len()) =
(*self.inner(vec_column.len())).component_mul(&vec_column);
let vectors = match values[0].data_type() {
DataType::Utf8 => {
let arr: &StringArray = values[0].as_string();
arr.iter()
.filter_map(|x| x.map(|s| parse_veclit_from_strlit(s).map_err(Into::into)))
.map(|x| x.map(Cow::Owned))
.collect::<Result<Vec<_>>>()?
}
None => {
for i in 0..len {
let Some(arg0) = as_veclit(column.get_ref(i))? else {
if is_update {
self.has_null = true;
self.product = None;
}
return Ok(());
};
let vec_column = DVectorView::from_slice(&arg0, arg0.len());
*self.inner(vec_column.len()) =
(*self.inner(vec_column.len())).component_mul(&vec_column);
}
DataType::Binary => {
let arr: &BinaryArray = values[0].as_binary();
arr.iter()
.filter_map(|x| x.map(|b| binlit_as_veclit(b).map_err(Into::into)))
.collect::<Result<Vec<_>>>()?
}
_ => {
return Err(datafusion_common::DataFusionError::NotImplemented(format!(
"unsupported data type {} for `VEC_PRODUCT`",
values[0].data_type()
)))
}
};
if vectors.len() != values[0].len() {
if is_update {
self.has_null = true;
self.product = None;
}
return Ok(());
}
vectors.iter().for_each(|v| {
let v = DVectorView::from_slice(v, v.len());
let inner = self.inner(v.len());
*inner = inner.component_mul(&v);
});
Ok(())
}
}
impl Accumulator for VectorProduct {
fn state(&self) -> common_query::error::Result<Vec<Value>> {
fn state(&mut self) -> Result<Vec<ScalarValue>> {
self.evaluate().map(|v| vec![v])
}
fn update_batch(&mut self, values: &[VectorRef]) -> common_query::error::Result<()> {
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
self.update(values, true)
}
fn merge_batch(&mut self, states: &[VectorRef]) -> common_query::error::Result<()> {
fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
self.update(states, false)
}
fn evaluate(&self) -> common_query::error::Result<Value> {
fn evaluate(&mut self) -> Result<ScalarValue> {
match &self.product {
None => Ok(Value::Null),
Some(vector) => {
let v = vector.as_slice();
Ok(Value::from(veclit_to_binlit(v)))
}
None => Ok(ScalarValue::Binary(None)),
Some(vector) => Ok(ScalarValue::Binary(Some(veclit_to_binlit(
vector.as_slice(),
)))),
}
}
fn size(&self) -> usize {
size_of_val(self)
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datatypes::vectors::{ConstantVector, StringVector};
use datatypes::scalars::ScalarVector;
use datatypes::vectors::{ConstantVector, StringVector, Vector};
use super::*;
@@ -165,59 +164,60 @@ mod tests {
vec_product.update_batch(&[]).unwrap();
assert!(vec_product.product.is_none());
assert!(!vec_product.has_null);
assert_eq!(Value::Null, vec_product.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_product.evaluate().unwrap());
// test update one not-null value
let mut vec_product = VectorProduct::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![Some(
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![Some(
"[1.0,2.0,3.0]".to_string(),
)]))];
vec_product.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[1.0, 2.0, 3.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[1.0, 2.0, 3.0]))),
vec_product.evaluate().unwrap()
);
// test update one null value
let mut vec_product = VectorProduct::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![Option::<String>::None]))];
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![Option::<String>::None]))];
vec_product.update_batch(&v).unwrap();
assert_eq!(Value::Null, vec_product.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_product.evaluate().unwrap());
// test update no null-value batch
let mut vec_product = VectorProduct::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
Some("[4.0,5.0,6.0]".to_string()),
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_product.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[28.0, 80.0, 162.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[28.0, 80.0, 162.0]))),
vec_product.evaluate().unwrap()
);
// test update null-value batch
let mut vec_product = VectorProduct::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
None,
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_product.update_batch(&v).unwrap();
assert_eq!(Value::Null, vec_product.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_product.evaluate().unwrap());
// test update with constant vector
let mut vec_product = VectorProduct::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
let v: Vec<ArrayRef> = vec![Arc::new(ConstantVector::new(
Arc::new(StringVector::from_vec(vec!["[1.0,2.0,3.0]".to_string()])),
4,
))];
))
.to_arrow_array()];
vec_product.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[4.0, 8.0, 12.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[1.0, 16.0, 81.0]))),
vec_product.evaluate().unwrap()
);
}

View File

@@ -14,19 +14,18 @@
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{CreateAccumulatorSnafu, Error, InvalidFuncArgsSnafu};
use common_query::logical_plan::{
create_aggregate_function, Accumulator, AggregateFunctionCreator,
use arrow::array::{Array, ArrayRef, AsArray, BinaryArray, StringArray};
use arrow_schema::{DataType, Field};
use datafusion_common::{Result, ScalarValue};
use datafusion_expr::{
Accumulator, AggregateUDF, Signature, SimpleAggregateUDF, TypeSignature, Volatility,
};
use common_query::prelude::AccumulatorCreatorFunction;
use datafusion_expr::AggregateUDF;
use datatypes::prelude::{ConcreteDataType, Value, *};
use datatypes::vectors::VectorRef;
use datafusion_functions_aggregate_common::accumulator::AccumulatorArgs;
use nalgebra::{Const, DVectorView, Dyn, OVector};
use snafu::ensure;
use crate::scalars::vector::impl_conv::{as_veclit, as_veclit_if_const, veclit_to_binlit};
use crate::scalars::vector::impl_conv::{
binlit_as_veclit, parse_veclit_from_strlit, veclit_to_binlit,
};
/// The accumulator for the `vec_sum` aggregate function.
#[derive(Debug, Default)]
@@ -35,57 +34,42 @@ pub struct VectorSum {
has_null: bool,
}
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
pub struct VectorSumCreator {}
impl AggregateFunctionCreator for VectorSumCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
ensure!(
types.len() == 1,
InvalidFuncArgsSnafu {
err_msg: format!(
"The length of the args is not correct, expect exactly one, have: {}",
types.len()
)
}
);
let input_type = &types[0];
match input_type {
ConcreteDataType::String(_) | ConcreteDataType::Binary(_) => {
Ok(Box::new(VectorSum::default()))
}
_ => {
let err_msg = format!(
"\"VEC_SUM\" aggregate function not support data type {:?}",
input_type.logical_type_id(),
);
CreateAccumulatorSnafu { err_msg }.fail()?
}
}
});
creator
}
fn output_type(&self) -> common_query::error::Result<ConcreteDataType> {
Ok(ConcreteDataType::binary_datatype())
}
fn state_types(&self) -> common_query::error::Result<Vec<ConcreteDataType>> {
Ok(vec![self.output_type()?])
}
}
impl VectorSum {
/// Create a new `AggregateUDF` for the `vec_sum` aggregate function.
pub fn uadf_impl() -> AggregateUDF {
create_aggregate_function(
"vec_sum".to_string(),
1,
Arc::new(VectorSumCreator::default()),
)
.into()
let signature = Signature::one_of(
vec![
TypeSignature::Exact(vec![DataType::Utf8]),
TypeSignature::Exact(vec![DataType::Binary]),
],
Volatility::Immutable,
);
let udaf = SimpleAggregateUDF::new_with_signature(
"vec_sum",
signature,
DataType::Binary,
Arc::new(Self::accumulator),
vec![Arc::new(Field::new("x", DataType::Binary, true))],
);
AggregateUDF::from(udaf)
}
fn accumulator(args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> {
if args.schema.fields().len() != 1 {
return Err(datafusion_common::DataFusionError::Internal(format!(
"expect creating `VEC_SUM` with only one input field, actual {}",
args.schema.fields().len()
)));
}
let t = args.schema.field(0).data_type();
if !matches!(t, DataType::Utf8 | DataType::Binary) {
return Err(datafusion_common::DataFusionError::Internal(format!(
"unexpected input datatype {t} when creating `VEC_SUM`"
)));
}
Ok(Box::new(VectorSum::default()))
}
fn inner(&mut self, len: usize) -> &mut OVector<f32, Dyn> {
@@ -93,62 +77,87 @@ impl VectorSum {
.get_or_insert_with(|| OVector::zeros_generic(Dyn(len), Const::<1>))
}
fn update(&mut self, values: &[VectorRef], is_update: bool) -> Result<(), Error> {
fn update(&mut self, values: &[ArrayRef], is_update: bool) -> Result<()> {
if values.is_empty() || self.has_null {
return Ok(());
};
let column = &values[0];
let len = column.len();
match as_veclit_if_const(column)? {
Some(column) => {
let vec_column = DVectorView::from_slice(&column, column.len()).scale(len as f32);
*self.inner(vec_column.len()) += vec_column;
}
None => {
for i in 0..len {
let Some(arg0) = as_veclit(column.get_ref(i))? else {
match values[0].data_type() {
DataType::Utf8 => {
let arr: &StringArray = values[0].as_string();
for s in arr.iter() {
let Some(s) = s else {
if is_update {
self.has_null = true;
self.sum = None;
}
return Ok(());
};
let vec_column = DVectorView::from_slice(&arg0, arg0.len());
let values = parse_veclit_from_strlit(s)?;
let vec_column = DVectorView::from_slice(&values, values.len());
*self.inner(vec_column.len()) += vec_column;
}
}
DataType::Binary => {
let arr: &BinaryArray = values[0].as_binary();
for b in arr.iter() {
let Some(b) = b else {
if is_update {
self.has_null = true;
self.sum = None;
}
return Ok(());
};
let values = binlit_as_veclit(b)?;
let vec_column = DVectorView::from_slice(&values, values.len());
*self.inner(vec_column.len()) += vec_column;
}
}
_ => {
return Err(datafusion_common::DataFusionError::NotImplemented(format!(
"unsupported data type {} for `VEC_SUM`",
values[0].data_type()
)))
}
}
Ok(())
}
}
impl Accumulator for VectorSum {
fn state(&self) -> common_query::error::Result<Vec<Value>> {
fn state(&mut self) -> Result<Vec<ScalarValue>> {
self.evaluate().map(|v| vec![v])
}
fn update_batch(&mut self, values: &[VectorRef]) -> common_query::error::Result<()> {
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
self.update(values, true)
}
fn merge_batch(&mut self, states: &[VectorRef]) -> common_query::error::Result<()> {
fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
self.update(states, false)
}
fn evaluate(&self) -> common_query::error::Result<Value> {
fn evaluate(&mut self) -> Result<ScalarValue> {
match &self.sum {
None => Ok(Value::Null),
Some(vector) => Ok(Value::from(veclit_to_binlit(vector.as_slice()))),
None => Ok(ScalarValue::Binary(None)),
Some(vector) => Ok(ScalarValue::Binary(Some(veclit_to_binlit(
vector.as_slice(),
)))),
}
}
fn size(&self) -> usize {
size_of_val(self)
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datatypes::vectors::{ConstantVector, StringVector};
use arrow::array::StringArray;
use datatypes::scalars::ScalarVector;
use datatypes::vectors::{ConstantVector, StringVector, Vector};
use super::*;
@@ -159,57 +168,58 @@ mod tests {
vec_sum.update_batch(&[]).unwrap();
assert!(vec_sum.sum.is_none());
assert!(!vec_sum.has_null);
assert_eq!(Value::Null, vec_sum.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_sum.evaluate().unwrap());
// test update one not-null value
let mut vec_sum = VectorSum::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![Some(
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![Some(
"[1.0,2.0,3.0]".to_string(),
)]))];
vec_sum.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[1.0, 2.0, 3.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[1.0, 2.0, 3.0]))),
vec_sum.evaluate().unwrap()
);
// test update one null value
let mut vec_sum = VectorSum::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![Option::<String>::None]))];
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![Option::<String>::None]))];
vec_sum.update_batch(&v).unwrap();
assert_eq!(Value::Null, vec_sum.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_sum.evaluate().unwrap());
// test update no null-value batch
let mut vec_sum = VectorSum::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
Some("[4.0,5.0,6.0]".to_string()),
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_sum.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[12.0, 15.0, 18.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[12.0, 15.0, 18.0]))),
vec_sum.evaluate().unwrap()
);
// test update null-value batch
let mut vec_sum = VectorSum::default();
let v: Vec<VectorRef> = vec![Arc::new(StringVector::from(vec![
let v: Vec<ArrayRef> = vec![Arc::new(StringArray::from(vec![
Some("[1.0,2.0,3.0]".to_string()),
None,
Some("[7.0,8.0,9.0]".to_string()),
]))];
vec_sum.update_batch(&v).unwrap();
assert_eq!(Value::Null, vec_sum.evaluate().unwrap());
assert_eq!(ScalarValue::Binary(None), vec_sum.evaluate().unwrap());
// test update with constant vector
let mut vec_sum = VectorSum::default();
let v: Vec<VectorRef> = vec![Arc::new(ConstantVector::new(
let v: Vec<ArrayRef> = vec![Arc::new(ConstantVector::new(
Arc::new(StringVector::from_vec(vec!["[1.0,2.0,3.0]".to_string()])),
4,
))];
))
.to_arrow_array()];
vec_sum.update_batch(&v).unwrap();
assert_eq!(
Value::from(veclit_to_binlit(&[4.0, 8.0, 12.0])),
ScalarValue::Binary(Some(veclit_to_binlit(&[4.0, 8.0, 12.0]))),
vec_sum.evaluate().unwrap()
);
}

View File

@@ -12,21 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore, ToRow};
use common_macro::ToRow;
use greptime_proto::v1::{ColumnDataType, ColumnSchema, Row, SemanticType};
use static_assertions::{assert_fields, assert_impl_all};
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
struct Foo {}
#[test]
#[allow(clippy::extra_unused_type_parameters)]
fn test_derive() {
let _ = Foo::default();
assert_fields!(Foo: input_types);
assert_impl_all!(Foo: std::fmt::Debug, Default, common_query::logical_plan::accumulator::AggrFuncTypeStore);
}
#[derive(ToRow)]
struct ToRowOwned {

View File

@@ -59,19 +59,9 @@ pub enum Error {
data_type: ArrowDatatype,
},
#[snafu(display("Failed to create accumulator: {}", err_msg))]
CreateAccumulator { err_msg: String },
#[snafu(display("Failed to downcast vector: {}", err_msg))]
DowncastVector { err_msg: String },
#[snafu(display("Bad accumulator implementation: {}", err_msg))]
BadAccumulatorImpl {
err_msg: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid input type: {}", err_msg))]
InvalidInputType {
#[snafu(implicit)]
@@ -223,10 +213,8 @@ pub type Result<T> = std::result::Result<T, Error>;
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
match self {
Error::CreateAccumulator { .. }
| Error::DowncastVector { .. }
Error::DowncastVector { .. }
| Error::InvalidInputState { .. }
| Error::BadAccumulatorImpl { .. }
| Error::ToScalarValue { .. }
| Error::GetScalarVector { .. }
| Error::ArrowCompute { .. }

View File

@@ -1,57 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use datafusion_expr::ReturnTypeFunction as DfReturnTypeFunction;
use datatypes::arrow::datatypes::DataType as ArrowDataType;
use datatypes::prelude::{ConcreteDataType, DataType};
use crate::error::Result;
use crate::logical_plan::Accumulator;
/// A function's return type
pub type ReturnTypeFunction =
Arc<dyn Fn(&[ConcreteDataType]) -> Result<Arc<ConcreteDataType>> + Send + Sync>;
/// Accumulator creator that will be used by DataFusion
pub type AccumulatorFunctionImpl = Arc<dyn Fn() -> Result<Box<dyn Accumulator>> + Send + Sync>;
/// Create Accumulator with the data type of input columns.
pub type AccumulatorCreatorFunction =
Arc<dyn Fn(&[ConcreteDataType]) -> Result<Box<dyn Accumulator>> + Sync + Send>;
/// This signature corresponds to which types an aggregator serializes
/// its state, given its return datatype.
pub type StateTypeFunction =
Arc<dyn Fn(&ConcreteDataType) -> Result<Arc<Vec<ConcreteDataType>>> + Send + Sync>;
pub fn to_df_return_type(func: ReturnTypeFunction) -> DfReturnTypeFunction {
let df_func = move |data_types: &[ArrowDataType]| {
// DataFusion DataType -> ConcreteDataType
let concrete_data_types = data_types
.iter()
.map(ConcreteDataType::from_arrow_type)
.collect::<Vec<_>>();
// evaluate ConcreteDataType
let eval_result = (func)(&concrete_data_types);
// ConcreteDataType -> DataFusion DataType
eval_result
.map(|t| Arc::new(t.as_arrow_type()))
.map_err(|e| e.into())
};
Arc::new(df_func)
}

View File

@@ -14,7 +14,6 @@
pub mod columnar_value;
pub mod error;
pub mod function;
pub mod logical_plan;
pub mod prelude;
pub mod request;

View File

@@ -12,9 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub mod accumulator;
mod expr;
mod udaf;
use std::sync::Arc;
@@ -28,29 +26,7 @@ use datafusion_expr::{col, DmlStatement, TableSource, WriteOp};
pub use expr::{build_filter_from_timestamp, build_same_type_ts_filter};
use snafu::ResultExt;
pub use self::accumulator::{Accumulator, AggregateFunctionCreator, AggregateFunctionCreatorRef};
pub use self::udaf::AggregateFunction;
use crate::error::{GeneralDataFusionSnafu, Result};
use crate::logical_plan::accumulator::*;
use crate::signature::{Signature, Volatility};
pub fn create_aggregate_function(
name: String,
args_count: u8,
creator: Arc<dyn AggregateFunctionCreator>,
) -> AggregateFunction {
let return_type = make_return_function(creator.clone());
let accumulator = make_accumulator_function(creator.clone());
let state_type = make_state_function(creator.clone());
AggregateFunction::new(
name,
Signature::any(args_count as usize, Volatility::Immutable),
return_type,
accumulator,
state_type,
creator,
)
}
/// Rename columns by applying a new projection. Returns an error if the column to be
/// renamed does not exist. The `renames` parameter is a `Vector` with elements
@@ -176,64 +152,8 @@ mod tests {
use datafusion_expr::builder::LogicalTableSource;
use datafusion_expr::lit;
use datatypes::arrow::datatypes::{DataType, Field, Schema, SchemaRef};
use datatypes::prelude::*;
use datatypes::vectors::VectorRef;
use super::*;
use crate::error::Result;
use crate::function::AccumulatorCreatorFunction;
use crate::signature::TypeSignature;
#[derive(Debug)]
struct DummyAccumulator;
impl Accumulator for DummyAccumulator {
fn state(&self) -> Result<Vec<Value>> {
Ok(vec![])
}
fn update_batch(&mut self, _values: &[VectorRef]) -> Result<()> {
Ok(())
}
fn merge_batch(&mut self, _states: &[VectorRef]) -> Result<()> {
Ok(())
}
fn evaluate(&self) -> Result<Value> {
Ok(Value::Int32(0))
}
}
#[derive(Debug)]
struct DummyAccumulatorCreator;
impl AggrFuncTypeStore for DummyAccumulatorCreator {
fn input_types(&self) -> Result<Vec<ConcreteDataType>> {
Ok(vec![ConcreteDataType::float64_datatype()])
}
fn set_input_types(&self, _: Vec<ConcreteDataType>) -> Result<()> {
Ok(())
}
}
impl AggregateFunctionCreator for DummyAccumulatorCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
Arc::new(|_| Ok(Box::new(DummyAccumulator)))
}
fn output_type(&self) -> Result<ConcreteDataType> {
Ok(self.input_types()?.into_iter().next().unwrap())
}
fn state_types(&self) -> Result<Vec<ConcreteDataType>> {
Ok(vec![
ConcreteDataType::float64_datatype(),
ConcreteDataType::uint32_datatype(),
])
}
}
fn mock_plan() -> LogicalPlan {
let schema = Schema::new(vec![
@@ -268,27 +188,4 @@ Projection: person.id AS a, person.name AS b
format!("\n{}", new_plan)
);
}
#[test]
fn test_create_udaf() {
let creator = DummyAccumulatorCreator;
let udaf = create_aggregate_function("dummy".to_string(), 1, Arc::new(creator));
assert_eq!("dummy", udaf.name);
let signature = udaf.signature;
assert_eq!(TypeSignature::Any(1), signature.type_signature);
assert_eq!(Volatility::Immutable, signature.volatility);
assert_eq!(
Arc::new(ConcreteDataType::float64_datatype()),
(udaf.return_type)(&[ConcreteDataType::float64_datatype()]).unwrap()
);
assert_eq!(
Arc::new(vec![
ConcreteDataType::float64_datatype(),
ConcreteDataType::uint32_datatype(),
]),
(udaf.state_type)(&ConcreteDataType::float64_datatype()).unwrap()
);
}
}

View File

@@ -1,176 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Accumulator module contains the trait definition for aggregation function's accumulators.
use std::fmt::Debug;
use std::sync::Arc;
use datafusion_common::Result as DfResult;
use datafusion_expr::Accumulator as DfAccumulator;
use datatypes::arrow::array::ArrayRef;
use datatypes::prelude::*;
use datatypes::vectors::{Helper as VectorHelper, VectorRef};
use snafu::ResultExt;
use crate::error::{self, FromScalarValueSnafu, IntoVectorSnafu, Result};
use crate::prelude::*;
pub type AggregateFunctionCreatorRef = Arc<dyn AggregateFunctionCreator>;
/// An accumulator represents a stateful object that lives throughout the evaluation of multiple rows and
/// generically accumulates values.
///
/// An accumulator knows how to:
/// * update its state from inputs via `update_batch`
/// * convert its internal state to a vector of scalar values
/// * update its state from multiple accumulators' states via `merge_batch`
/// * compute the final value from its internal state via `evaluate`
///
/// Modified from DataFusion.
pub trait Accumulator: Send + Sync + Debug {
/// Returns the state of the accumulator at the end of the accumulation.
// in the case of an average on which we track `sum` and `n`, this function should return a vector
// of two values, sum and n.
fn state(&self) -> Result<Vec<Value>>;
/// updates the accumulator's state from a vector of arrays.
fn update_batch(&mut self, values: &[VectorRef]) -> Result<()>;
/// updates the accumulator's state from a vector of states.
fn merge_batch(&mut self, states: &[VectorRef]) -> Result<()>;
/// returns its value based on its current state.
fn evaluate(&self) -> Result<Value>;
}
/// An `AggregateFunctionCreator` dynamically creates `Accumulator`.
///
/// An `AggregateFunctionCreator` often has a companion struct, that
/// can store the input data types (impl [AggrFuncTypeStore]), and knows the output and states
/// types of an Accumulator.
pub trait AggregateFunctionCreator: AggrFuncTypeStore {
/// Create a function that can create a new accumulator with some input data type.
fn creator(&self) -> AccumulatorCreatorFunction;
/// Get the Accumulator's output data type.
fn output_type(&self) -> Result<ConcreteDataType>;
/// Get the Accumulator's state data types.
fn state_types(&self) -> Result<Vec<ConcreteDataType>>;
}
/// `AggrFuncTypeStore` stores the aggregate function's input data's types.
///
/// When creating Accumulator generically, we have to know the input data's types.
/// However, DataFusion does not provide the input data's types at the time of creating Accumulator.
/// To solve the problem, we store the datatypes upfront here.
pub trait AggrFuncTypeStore: Send + Sync + Debug {
/// Get the input data types of the Accumulator.
fn input_types(&self) -> Result<Vec<ConcreteDataType>>;
/// Store the input data types that are provided by DataFusion at runtime (when it is evaluating
/// return type function).
fn set_input_types(&self, input_types: Vec<ConcreteDataType>) -> Result<()>;
}
pub fn make_accumulator_function(
creator: Arc<dyn AggregateFunctionCreator>,
) -> AccumulatorFunctionImpl {
Arc::new(move || {
let input_types = creator.input_types()?;
let creator = creator.creator();
creator(&input_types)
})
}
pub fn make_return_function(creator: Arc<dyn AggregateFunctionCreator>) -> ReturnTypeFunction {
Arc::new(move |input_types| {
creator.set_input_types(input_types.to_vec())?;
let output_type = creator.output_type()?;
Ok(Arc::new(output_type))
})
}
pub fn make_state_function(creator: Arc<dyn AggregateFunctionCreator>) -> StateTypeFunction {
Arc::new(move |_| Ok(Arc::new(creator.state_types()?)))
}
/// A wrapper type for our Accumulator to DataFusion's Accumulator,
/// so to make our Accumulator able to be executed by DataFusion query engine.
#[derive(Debug)]
pub struct DfAccumulatorAdaptor {
accumulator: Box<dyn Accumulator>,
creator: AggregateFunctionCreatorRef,
}
impl DfAccumulatorAdaptor {
pub fn new(accumulator: Box<dyn Accumulator>, creator: AggregateFunctionCreatorRef) -> Self {
Self {
accumulator,
creator,
}
}
}
impl DfAccumulator for DfAccumulatorAdaptor {
fn state(&mut self) -> DfResult<Vec<ScalarValue>> {
let state_values = self.accumulator.state()?;
let state_types = self.creator.state_types()?;
if state_values.len() != state_types.len() {
return error::BadAccumulatorImplSnafu {
err_msg: format!("Accumulator {self:?} returned state values size do not match its state types size."),
}
.fail()?;
}
Ok(state_values
.into_iter()
.zip(state_types.iter())
.map(|(v, t)| v.try_to_scalar_value(t).context(error::ToScalarValueSnafu))
.collect::<Result<Vec<_>>>()?)
}
fn update_batch(&mut self, values: &[ArrayRef]) -> DfResult<()> {
let vectors = VectorHelper::try_into_vectors(values).context(FromScalarValueSnafu)?;
self.accumulator.update_batch(&vectors)?;
Ok(())
}
fn merge_batch(&mut self, states: &[ArrayRef]) -> DfResult<()> {
let mut vectors = Vec::with_capacity(states.len());
for array in states.iter() {
vectors.push(
VectorHelper::try_into_vector(array).context(IntoVectorSnafu {
data_type: array.data_type().clone(),
})?,
);
}
self.accumulator.merge_batch(&vectors)?;
Ok(())
}
fn evaluate(&mut self) -> DfResult<ScalarValue> {
let value = self.accumulator.evaluate()?;
let output_type = self.creator.output_type()?;
let scalar_value = value
.try_to_scalar_value(&output_type)
.context(error::ToScalarValueSnafu)?;
Ok(scalar_value)
}
fn size(&self) -> usize {
0
}
}

View File

@@ -1,167 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Udaf module contains functions and structs supporting user-defined aggregate functions.
//!
//! Modified from DataFusion.
use std::any::Any;
use std::fmt::{self, Debug, Formatter};
use std::sync::Arc;
use datafusion::arrow::datatypes::Field;
use datafusion_common::Result;
use datafusion_expr::function::{AccumulatorArgs, StateFieldsArgs};
use datafusion_expr::{
Accumulator, AccumulatorFactoryFunction, AggregateUDF as DfAggregateUdf, AggregateUDFImpl,
};
use datatypes::arrow::datatypes::{DataType as ArrowDataType, FieldRef};
use datatypes::data_type::DataType;
use crate::function::{
to_df_return_type, AccumulatorFunctionImpl, ReturnTypeFunction, StateTypeFunction,
};
use crate::logical_plan::accumulator::DfAccumulatorAdaptor;
use crate::logical_plan::AggregateFunctionCreatorRef;
use crate::signature::Signature;
/// Logical representation of a user-defined aggregate function (UDAF)
/// A UDAF is different from a UDF in that it is stateful across batches.
#[derive(Clone)]
pub struct AggregateFunction {
/// name
pub name: String,
/// signature
pub signature: Signature,
/// Return type
pub return_type: ReturnTypeFunction,
/// actual implementation
pub accumulator: AccumulatorFunctionImpl,
/// the accumulator's state's description as a function of the return type
pub state_type: StateTypeFunction,
/// the creator that creates aggregate functions
creator: AggregateFunctionCreatorRef,
}
impl Debug for AggregateFunction {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
f.debug_struct("AggregateUDF")
.field("name", &self.name)
.field("signature", &self.signature)
.field("fun", &"<FUNC>")
.finish()
}
}
impl PartialEq for AggregateFunction {
fn eq(&self, other: &Self) -> bool {
self.name == other.name && self.signature == other.signature
}
}
impl AggregateFunction {
/// Create a new AggregateUDF
pub fn new(
name: String,
signature: Signature,
return_type: ReturnTypeFunction,
accumulator: AccumulatorFunctionImpl,
state_type: StateTypeFunction,
creator: AggregateFunctionCreatorRef,
) -> Self {
Self {
name,
signature,
return_type,
accumulator,
state_type,
creator,
}
}
}
struct DfUdafAdapter {
name: String,
signature: datafusion_expr::Signature,
return_type_func: datafusion_expr::ReturnTypeFunction,
accumulator: AccumulatorFactoryFunction,
creator: AggregateFunctionCreatorRef,
}
impl Debug for DfUdafAdapter {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
f.debug_struct("DfUdafAdapter")
.field("name", &self.name)
.field("signature", &self.signature)
.finish()
}
}
impl AggregateUDFImpl for DfUdafAdapter {
fn as_any(&self) -> &dyn Any {
self
}
fn name(&self) -> &str {
&self.name
}
fn signature(&self) -> &datafusion_expr::Signature {
&self.signature
}
fn return_type(&self, arg_types: &[ArrowDataType]) -> Result<ArrowDataType> {
(self.return_type_func)(arg_types).map(|x| x.as_ref().clone())
}
fn accumulator(&self, acc_args: AccumulatorArgs) -> Result<Box<dyn Accumulator>> {
(self.accumulator)(acc_args)
}
fn state_fields(&self, args: StateFieldsArgs) -> Result<Vec<FieldRef>> {
let state_types = self.creator.state_types()?;
let fields = state_types
.into_iter()
.enumerate()
.map(|(i, t)| {
let name = format!("{}_{i}", args.name);
Arc::new(Field::new(name, t.as_arrow_type(), true))
})
.collect::<Vec<_>>();
Ok(fields)
}
}
impl From<AggregateFunction> for DfAggregateUdf {
fn from(udaf: AggregateFunction) -> Self {
DfAggregateUdf::new_from_impl(DfUdafAdapter {
name: udaf.name,
signature: udaf.signature.into(),
return_type_func: to_df_return_type(udaf.return_type),
accumulator: to_df_accumulator_func(udaf.accumulator, udaf.creator.clone()),
creator: udaf.creator,
})
}
}
fn to_df_accumulator_func(
accumulator: AccumulatorFunctionImpl,
creator: AggregateFunctionCreatorRef,
) -> AccumulatorFactoryFunction {
Arc::new(move |_| {
let accumulator = accumulator()?;
let creator = creator.clone();
Ok(Box::new(DfAccumulatorAdaptor::new(accumulator, creator)) as _)
})
}
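As a usage note: the `From` impl above is what lets one of our `AggregateFunction`s plug into DataFusion. A minimal sketch, not part of this diff, assuming the `AggregateFunction` type defined above is in scope and `my_aggregate_fn` was built elsewhere (e.g. via `create_aggregate_function`):

```Rust
use datafusion::prelude::SessionContext;
use datafusion_expr::AggregateUDF;

fn register_my_udaf(ctx: &SessionContext, my_aggregate_fn: AggregateFunction) {
    // The `From<AggregateFunction> for AggregateUDF` impl above performs the adaptation.
    let df_udaf: AggregateUDF = my_aggregate_fn.into();
    ctx.register_udaf(df_udaf);
}
```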


@@ -15,8 +15,6 @@
pub use datafusion_common::ScalarValue;
pub use crate::columnar_value::ColumnarValue;
pub use crate::function::*;
pub use crate::logical_plan::AggregateFunction;
pub use crate::signature::{Signature, TypeSignature, Volatility};
/// Default timestamp column name for Prometheus metrics.


@@ -22,7 +22,6 @@ use crate::options::QueryOptions;
use crate::parser::QueryLanguageParser;
use crate::{QueryEngineFactory, QueryEngineRef};
mod my_sum_udaf_example;
mod query_engine_test;
mod time_range_filter_test;


@@ -1,227 +0,0 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::Debug;
use std::marker::PhantomData;
use std::sync::Arc;
use common_macro::{as_aggr_func_creator, AggrFuncTypeStore};
use common_query::error::{CreateAccumulatorSnafu, Result as QueryResult};
use common_query::logical_plan::accumulator::AggrFuncTypeStore;
use common_query::logical_plan::{
create_aggregate_function, Accumulator, AggregateFunctionCreator,
};
use common_query::prelude::*;
use common_recordbatch::{RecordBatch, RecordBatches};
use datatypes::prelude::*;
use datatypes::schema::{ColumnSchema, Schema};
use datatypes::types::{LogicalPrimitiveType, WrapperType};
use datatypes::vectors::Helper;
use datatypes::with_match_primitive_type_id;
use num_traits::AsPrimitive;
use table::test_util::MemTable;
use crate::error::Result;
use crate::tests::{exec_selection, new_query_engine_with_table};
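/// A toy sum accumulator: `T` is the input element type and `SumT` the widened
/// accumulation type (the input type's `LargestType`), reducing overflow risk.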
#[derive(Debug, Default)]
struct MySumAccumulator<T, SumT> {
sum: SumT,
_phantom: PhantomData<T>,
}
impl<T, SumT> MySumAccumulator<T, SumT>
where
T: WrapperType,
SumT: WrapperType,
T::Native: AsPrimitive<SumT::Native>,
SumT::Native: std::ops::AddAssign,
{
#[inline(always)]
fn add(&mut self, v: T) {
let mut sum_native = self.sum.into_native();
sum_native += v.into_native().as_();
self.sum = SumT::from_native(sum_native);
}
#[inline(always)]
fn merge(&mut self, s: SumT) {
let mut sum_native = self.sum.into_native();
sum_native += s.into_native();
self.sum = SumT::from_native(sum_native);
}
}
#[as_aggr_func_creator]
#[derive(Debug, Default, AggrFuncTypeStore)]
struct MySumAccumulatorCreator {}
impl AggregateFunctionCreator for MySumAccumulatorCreator {
fn creator(&self) -> AccumulatorCreatorFunction {
let creator: AccumulatorCreatorFunction = Arc::new(move |types: &[ConcreteDataType]| {
let input_type = &types[0];
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(Box::new(MySumAccumulator::<<$S as LogicalPrimitiveType>::Wrapper, <<$S as LogicalPrimitiveType>::LargestType as LogicalPrimitiveType>::Wrapper>::default()))
},
{
let err_msg = format!(
"\"MY_SUM\" aggregate function not support data type {:?}",
input_type.logical_type_id(),
);
CreateAccumulatorSnafu { err_msg }.fail()?
}
)
});
creator
}
fn output_type(&self) -> QueryResult<ConcreteDataType> {
let input_type = &self.input_types()?[0];
with_match_primitive_type_id!(
input_type.logical_type_id(),
|$S| {
Ok(<<$S as LogicalPrimitiveType>::LargestType>::build_data_type())
},
{
unreachable!()
}
)
}
fn state_types(&self) -> QueryResult<Vec<ConcreteDataType>> {
Ok(vec![self.output_type()?])
}
}
impl<T, SumT> Accumulator for MySumAccumulator<T, SumT>
where
T: WrapperType,
SumT: WrapperType,
T::Native: AsPrimitive<SumT::Native>,
SumT::Native: std::ops::AddAssign,
{
fn state(&self) -> QueryResult<Vec<Value>> {
Ok(vec![self.sum.into()])
}
fn update_batch(&mut self, values: &[VectorRef]) -> QueryResult<()> {
if values.is_empty() {
return Ok(());
        }
let column = &values[0];
let column: &<T as Scalar>::VectorType = unsafe { Helper::static_cast(column) };
for v in column.iter_data().flatten() {
self.add(v)
}
Ok(())
}
fn merge_batch(&mut self, states: &[VectorRef]) -> QueryResult<()> {
if states.is_empty() {
return Ok(());
        }
let states = &states[0];
let states: &<SumT as Scalar>::VectorType = unsafe { Helper::static_cast(states) };
for s in states.iter_data().flatten() {
self.merge(s)
}
Ok(())
}
fn evaluate(&self) -> QueryResult<Value> {
Ok(self.sum.into())
}
}
#[tokio::test]
async fn test_my_sum() -> Result<()> {
common_telemetry::init_default_ut_logging();
test_my_sum_with(
(1..=10).collect::<Vec<u32>>(),
r#"+--------+
| my_sum |
+--------+
| 55 |
+--------+"#,
)
.await?;
test_my_sum_with(
(-10..=11).collect::<Vec<i32>>(),
r#"+--------+
| my_sum |
+--------+
| 11 |
+--------+"#,
)
.await?;
test_my_sum_with(
vec![-1.0f32, 1.0, 2.0, 3.0, 4.0],
r#"+--------+
| my_sum |
+--------+
| 9.0 |
+--------+"#,
)
.await?;
test_my_sum_with(
vec![u32::MAX, u32::MAX],
r#"+------------+
| my_sum |
+------------+
| 8589934590 |
+------------+"#,
)
.await?;
Ok(())
}
async fn test_my_sum_with<T>(numbers: Vec<T>, expected: &str) -> Result<()>
where
T: WrapperType,
{
let table_name = format!("{}_numbers", std::any::type_name::<T>());
let column_name = format!("{}_number", std::any::type_name::<T>());
let column_schemas = vec![ColumnSchema::new(
column_name.clone(),
T::LogicalType::build_data_type(),
true,
)];
let schema = Arc::new(Schema::new(column_schemas.clone()));
let column: VectorRef = Arc::new(T::VectorType::from_vec(numbers));
let recordbatch = RecordBatch::new(schema, vec![column]).unwrap();
let testing_table = MemTable::table(&table_name, recordbatch);
let engine = new_query_engine_with_table(testing_table);
engine.register_aggregate_function(
create_aggregate_function(
"my_sum".to_string(),
1,
Arc::new(MySumAccumulatorCreator::default()),
)
.into(),
);
let sql = format!("select MY_SUM({column_name}) as my_sum from {table_name}");
let batches = exec_selection(engine, &sql).await;
let batches = RecordBatches::try_new(batches.first().unwrap().schema.clone(), batches).unwrap();
let pretty_print = batches.pretty_print().unwrap();
assert_eq!(expected, pretty_print);
Ok(())
}
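For comparison with the hand-rolled machinery above, here is a minimal sketch of how the same toy sum looks when written against DataFusion's `Accumulator` trait directly, which is the approach this commit moves to. It assumes a recent DataFusion (where `state` and `evaluate` take `&mut self`) and handles only `Int64` input; `MySumInt64` is a hypothetical name.

```Rust
use datafusion::arrow::array::{ArrayRef, AsArray};
use datafusion::arrow::datatypes::Int64Type;
use datafusion_common::{Result, ScalarValue};
use datafusion_expr::Accumulator;

#[derive(Debug, Default)]
struct MySumInt64 {
    sum: i64,
}

impl Accumulator for MySumInt64 {
    fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
        // Nulls are skipped by `flatten`.
        let arr = values[0].as_primitive::<Int64Type>();
        self.sum += arr.iter().flatten().sum::<i64>();
        Ok(())
    }

    fn merge_batch(&mut self, states: &[ArrayRef]) -> Result<()> {
        // Partial sums have the same representation as the input column.
        self.update_batch(states)
    }

    fn state(&mut self) -> Result<Vec<ScalarValue>> {
        Ok(vec![ScalarValue::Int64(Some(self.sum))])
    }

    fn evaluate(&mut self) -> Result<ScalarValue> {
        Ok(ScalarValue::Int64(Some(self.sum)))
    }

    fn size(&self) -> usize {
        std::mem::size_of_val(self)
    }
}
```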


@@ -333,6 +333,31 @@ FROM cell_cte;
| 9263763445276221387 | 808f7fc59ef01fcb | 30 | 9277415232383221760 |
+---------------------+---------------------------------+------------------------------+----------------------------------------+
SELECT json_encode_path(37.76938, -122.3889, 1728083375::TimestampSecond);
+----------------------------------------------------------------------------------------------------------------------+
| json_encode_path(Float64(37.76938),Float64(-122.3889),arrow_cast(Int64(1728083375),Utf8("Timestamp(Second, None)"))) |
+----------------------------------------------------------------------------------------------------------------------+
| [[-122.3889,37.76938]] |
+----------------------------------------------------------------------------------------------------------------------+
SELECT json_encode_path(lat, lon, ts)
FROM(
SELECT 37.76938 AS lat, -122.3889 AS lon, 1728083375::TimestampSecond AS ts
UNION ALL
SELECT 37.76928 AS lat, -122.3839 AS lon, 1728083373::TimestampSecond AS ts
UNION ALL
SELECT 37.76930 AS lat, -122.3820 AS lon, 1728083379::TimestampSecond AS ts
UNION ALL
SELECT 37.77001 AS lat, -122.3888 AS lon, 1728083372::TimestampSecond AS ts
);
+-------------------------------------------------------------------------------------+
| json_encode_path(lat,lon,ts) |
+-------------------------------------------------------------------------------------+
| [[-122.3888,37.77001],[-122.3839,37.76928],[-122.3889,37.76938],[-122.382,37.7693]] |
+-------------------------------------------------------------------------------------+
SELECT UNNEST(geo_path(37.76938, -122.3889, 1728083375::TimestampSecond));
+----------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------+


@@ -119,6 +119,19 @@ SELECT cell,
s2_cell_parent(cell, 3)
FROM cell_cte;
SELECT json_encode_path(37.76938, -122.3889, 1728083375::TimestampSecond);
SELECT json_encode_path(lat, lon, ts)
FROM(
SELECT 37.76938 AS lat, -122.3889 AS lon, 1728083375::TimestampSecond AS ts
UNION ALL
SELECT 37.76928 AS lat, -122.3839 AS lon, 1728083373::TimestampSecond AS ts
UNION ALL
SELECT 37.76930 AS lat, -122.3820 AS lon, 1728083379::TimestampSecond AS ts
UNION ALL
SELECT 37.77001 AS lat, -122.3888 AS lon, 1728083372::TimestampSecond AS ts
);
SELECT UNNEST(geo_path(37.76938, -122.3889, 1728083375::TimestampSecond));
SELECT UNNEST(geo_path(lat, lon, ts))