mirror of
https://github.com/GreptimeTeam/greptimedb.git
synced 2025-12-26 08:00:01 +00:00
feat: allow one to many VRL pipeline (#7342)
* feat/allow-one-to-many-pipeline:
  ### Enhance Pipeline Processing for One-to-Many Transformations
  - **Support One-to-Many Transformations**:
    - Updated `processor.rs`, `etl.rs`, `vrl_processor.rs`, and `greptime.rs` to handle one-to-many transformations by allowing VRL processors to return arrays, expanding each element into separate rows.
    - Introduced `transform_array_elements` and `values_to_rows` functions to facilitate this transformation.
  - **Error Handling Enhancements**:
    - Added new error types in `error.rs` to handle cases where array elements are not objects and for transformation failures.
  - **Testing Enhancements**:
    - Added tests in `pipeline.rs` to verify one-to-many transformations, single object processing, and error handling for non-object array elements.
  - **Context Management**:
    - Modified `ctx_req.rs` to clone `ContextOpt` when adding rows, ensuring correct context management during transformations.
  - **Server Pipeline Adjustments**:
    - Updated `pipeline.rs` in `servers` to handle transformed outputs with one-to-many row expansions, ensuring correct row padding and request formation.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline: Add one-to-many VRL pipeline test in `http.rs`
  - Introduced `test_pipeline_one_to_many_vrl` to verify the VRL processor's ability to expand a single input row into multiple output rows.
  - Updated the `http_tests!` macro to include the new test.
  - Implemented test scenarios for single and multiple input rows, ensuring correct data transformation and row count validation.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Add Tests for VRL Pipeline Transformations
  - **File:** `src/pipeline/src/etl.rs`
    - Added tests for one-to-many VRL pipeline expansion to ensure multiple output rows from a single input.
    - Introduced tests to verify backward compatibility for single object output.
    - Implemented tests to confirm zero rows are produced from empty arrays.
    - Added validation tests to ensure array elements must be objects.
    - Developed tests for one-to-many transformations with table suffix hints from VRL.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Enhance Pipeline Transformation with Per-Row Table Suffixes
  - **`src/pipeline/src/etl.rs`**: Updated `TransformedOutput` to include per-row table suffixes, allowing for more flexible routing of transformed data. Modified `PipelineExecOutput` and related methods to handle the new structure.
  - **`src/pipeline/src/etl/transform/transformer/greptime.rs`**: Enhanced `values_to_rows` to support per-row table suffix extraction and application.
  - **`src/pipeline/tests/common.rs`** and **`src/pipeline/tests/pipeline.rs`**: Adjusted tests to validate the new per-row table suffix functionality, ensuring backward compatibility and correct behavior in one-to-many transformations.
  - **`src/servers/src/pipeline.rs`**: Modified `run_custom_pipeline` to process transformed outputs with per-row table suffixes, grouping rows by `(opt, table_name)` for insertion.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Update VRL Processor Type Checks
  - **File:** `vrl_processor.rs`
  - **Changes:** Updated the type checking logic to use the `contains_object()` and `contains_array()` methods instead of `is_object()` and `is_array()`. This ensures compatibility with VRL type inference, which may report multiple possible return types.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
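For a concrete picture of the one-to-many behavior described above, here is a condensed sketch in the style of the tests added later in this diff (the VRL source and field names are adapted from those tests; it is an illustration, not additional shipped code):

```rust
// A pipeline whose VRL processor returns an array: one input object with an
// "events" list expands into one output row per event.
let pipeline_yaml = r#"
processors:
  - epoch:
      field: timestamp
      resolution: ms
  - vrl:
      source: |
        events = del(.events)
        base_ts = del(.timestamp)
        map_values(array!(events)) -> |event| {
          {
            "event_type": event.type,
            "event_value": event.value,
            "timestamp": base_ts
          }
        }

transform:
  - field: event_type
    type: string
  - field: event_value
    type: int32
  - field: timestamp
    type: timestamp, ms
    index: time
"#;
// Executing this pipeline over {"timestamp": ..., "events": [e1, e2, e3]}
// now yields three rows instead of one (see test_one_to_many_vrl_expansion below).
```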
* feat/allow-one-to-many-pipeline:
  - **Enhance Error Handling**: Added new error types `ArrayElementMustBeObjectSnafu` and `TransformArrayElementSnafu` to improve error handling in `etl.rs` and `greptime.rs`.
  - **Refactor Error Usage**: Moved the error usage declarations in the `transform_array_elements` and `values_to_rows` functions to the top of the file for better organization in `etl.rs` and `greptime.rs`.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Update `greptime.rs` to Enhance Error Handling
  - **Error Handling**: Modified the `values_to_rows` function to handle invalid array elements based on the `skip_error` parameter. If `skip_error` is true, invalid elements are skipped; otherwise, an error is returned.
  - **Testing**: Added unit tests in `greptime.rs` to verify the behavior of `values_to_rows` with different `skip_error` settings, ensuring correct processing of valid objects and appropriate error handling for invalid elements.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Commit Summary
  - **Enhance `TransformedOutput` Structure**: Refactored `TransformedOutput` to use a `HashMap` for grouping rows by `ContextOpt`, allowing for per-row configuration options. Updated methods in `PipelineExecOutput` to support the new structure (`src/pipeline/src/etl.rs`).
  - **Add New Transformation Method**: Introduced `transform_array_elements_to_hashmap` to handle array inputs with per-row `ContextOpt` in `HashMap` format (`src/pipeline/src/etl.rs`).
  - **Update Pipeline Execution**: Modified `run_custom_pipeline` to process `TransformedOutput` using the new `HashMap` structure, ensuring rows are grouped by `ContextOpt` and table name (`src/servers/src/pipeline.rs`).
  - **Add Tests for New Structure**: Implemented tests to verify the functionality of the new `HashMap` structure in `TransformedOutput`, including scenarios for one-to-many mapping, single object input, and empty arrays (`src/pipeline/src/etl.rs`).

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Refactor `values_to_rows` to Return `HashMap` Grouped by `ContextOpt`
  - **`etl.rs`**:
    - Updated `values_to_rows` to return a `HashMap` grouped by `ContextOpt` instead of a vector.
    - Adjusted logic to handle single object and array inputs, ensuring rows are grouped by their `ContextOpt`.
    - Modified functions to extract rows from the default `ContextOpt` and apply table suffixes accordingly.
  - **`greptime.rs`**:
    - Enhanced `values_to_rows` to handle errors gracefully with `skip_error` logic.
    - Added logic to group rows by `ContextOpt` for array inputs.
  - **Tests**:
    - Updated existing tests to validate the new `HashMap` return structure.
    - Added a new test to verify correct grouping of rows by per-element `ContextOpt`.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
  ### Refactor and Enhance Error Handling in ETL Pipeline
  - **Refactored Functionality**:
    - Replaced `transform_array_elements` with `transform_array_elements_by_ctx` in `etl.rs` to streamline the transformation logic and improve error handling.
    - Updated `values_to_rows` in `greptime.rs` to use `or_default` for cleaner code.
  - **Enhanced Error Handling**:
    - Introduced the `unwrap_or_continue_if_err` macro in `etl.rs` to allow skipping errors based on pipeline context, improving robustness in data processing.

  These changes enhance the maintainability and error resilience of the ETL pipeline.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
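As a rough sketch of how a caller consumes the new `HashMap`-grouped output (modeled on `into_transformed_hashmap` and the `(opt, table_name)` grouping described above; `base_table` is a hypothetical name used only for illustration):

```rust
// Each ContextOpt keys its own batch of rows; every row carries an optional
// per-row table suffix used to route it to a different table.
let rows_by_context = pipeline
    .exec_mut(payload, &pipeline_ctx, &mut schema_info)?
    .into_transformed_hashmap()
    .expect("expected transformed output");

for (opt, rows_with_suffix) in rows_by_context {
    for (row, table_suffix) in rows_with_suffix {
        // Resolve the target table: append the per-row suffix when present.
        let table_name = match &table_suffix {
            Some(suffix) => format!("{base_table}{suffix}"),
            None => base_table.to_string(),
        };
        // Rows are then grouped by (opt, table_name) before the RowInsertRequests
        // are built, so per-row options such as TTL are preserved.
        let _ = (opt.clone(), table_name, row);
    }
}
```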
* feat/allow-one-to-many-pipeline:
  ### Update `Row` Handling in ETL Pipeline
  - **Refactor `Row` Type**: Introduced the `RowWithTableSuffix` type alias to simplify handling of rows with optional table suffixes across the ETL pipeline.
  - **Modify Function Signatures**: Updated function signatures in `etl.rs` and `greptime.rs` to use `RowWithTableSuffix` for better clarity and consistency.
  - **Enhance Test Coverage**: Adjusted test logic in `greptime.rs` to align with the new `RowWithTableSuffix` type, ensuring correct grouping and processing of rows by TTL.

  Files affected: `etl.rs`, `greptime.rs`.

  Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
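A small usage sketch of the new alias (the alias definition and the destructuring pattern come from the diff below; the surrounding snippet is illustrative only):

```rust
// RowWithTableSuffix pairs a transformed Row with its optional table suffix.
pub type RowWithTableSuffix = (Row, Option<String>);

// Callers now destructure the pair instead of handling a bare Row:
let rows: Vec<RowWithTableSuffix> = output.into_transformed().expect("transformed");
for (row, table_suffix) in &rows {
    // e.g. table_suffix == Some("_cpu") routes this row to "<table>_cpu"
    assert!(!row.values.is_empty());
    let _ = table_suffix;
}
```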
@@ -33,7 +33,7 @@ fn processor_mut(
             .exec_mut(v, pipeline_ctx, schema_info)?
             .into_transformed()
             .expect("expect transformed result ");
-        result.push(r.0);
+        result.extend(r.into_iter().map(|v| v.0));
     }

     Ok(result)
@@ -19,6 +19,7 @@ use common_error::status_code::StatusCode;
 use common_macro::stack_trace_debug;
 use datatypes::timestamp::TimestampNanosecond;
 use snafu::{Location, Snafu};
+use vrl::value::Kind;

 #[derive(Snafu)]
 #[snafu(visibility(pub))]
@@ -676,8 +677,12 @@ pub enum Error {
         location: Location,
     },

-    #[snafu(display("Vrl script should return `.` in the end"))]
+    #[snafu(display(
+        "Vrl script should return object or array in the end, got `{:?}`",
+        result_kind
+    ))]
     VrlReturnValue {
+        result_kind: Kind,
         #[snafu(implicit)]
         location: Location,
     },
@@ -695,6 +700,25 @@ pub enum Error {
         location: Location,
     },

+    #[snafu(display(
+        "Array element at index {index} must be an object for one-to-many transformation, got {actual_type}"
+    ))]
+    ArrayElementMustBeObject {
+        index: usize,
+        actual_type: String,
+        #[snafu(implicit)]
+        location: Location,
+    },
+
+    #[snafu(display("Failed to transform array element at index {index}: {source}"))]
+    TransformArrayElement {
+        index: usize,
+        #[snafu(source)]
+        source: Box<Error>,
+        #[snafu(implicit)]
+        location: Location,
+    },
+
     #[snafu(display("Failed to build DataFusion logical plan"))]
     BuildDfLogicalPlan {
         #[snafu(source)]
@@ -792,7 +816,10 @@ impl ErrorExt for Error {
             | InvalidPipelineVersion { .. }
             | InvalidCustomTimeIndex { .. }
             | TimeIndexMustBeNonNull { .. } => StatusCode::InvalidArguments,
-            MultiPipelineWithDiffSchema { .. } | ValueMustBeMap { .. } => StatusCode::IllegalState,
+            MultiPipelineWithDiffSchema { .. }
+            | ValueMustBeMap { .. }
+            | ArrayElementMustBeObject { .. } => StatusCode::IllegalState,
+            TransformArrayElement { source, .. } => source.status_code(),
             BuildDfLogicalPlan { .. } | RecordBatchLenNotMatch { .. } => StatusCode::Internal,
             ExecuteInternalStatement { source, .. } => source.status_code(),
             DataFrame { source, .. } => source.status_code(),
@@ -19,6 +19,8 @@ pub mod processor;
 pub mod transform;
 pub mod value;

+use std::collections::HashMap;
+
 use api::v1::Row;
 use common_time::timestamp::TimeUnit;
 use itertools::Itertools;
@@ -30,13 +32,17 @@ use yaml_rust::{Yaml, YamlLoader};

 use crate::dispatcher::{Dispatcher, Rule};
 use crate::error::{
-    AutoTransformOneTimestampSnafu, Error, IntermediateKeyIndexSnafu, InvalidVersionNumberSnafu,
-    Result, YamlLoadSnafu, YamlParseSnafu,
+    ArrayElementMustBeObjectSnafu, AutoTransformOneTimestampSnafu, Error,
+    IntermediateKeyIndexSnafu, InvalidVersionNumberSnafu, Result, TransformArrayElementSnafu,
+    YamlLoadSnafu, YamlParseSnafu,
 };
 use crate::etl::processor::ProcessorKind;
-use crate::etl::transform::transformer::greptime::values_to_row;
+use crate::etl::transform::transformer::greptime::{RowWithTableSuffix, values_to_rows};
 use crate::tablesuffix::TableSuffixTemplate;
-use crate::{ContextOpt, GreptimeTransformer, IdentityTimeIndex, PipelineContext, SchemaInfo};
+use crate::{
+    ContextOpt, GreptimeTransformer, IdentityTimeIndex, PipelineContext, SchemaInfo,
+    unwrap_or_continue_if_err,
+};

 const DESCRIPTION: &str = "description";
 const DOC_VERSION: &str = "version";
@@ -230,21 +236,51 @@ pub enum PipelineExecOutput {
     Filtered,
 }

+/// Output from a successful pipeline transformation.
+///
+/// Rows are grouped by their ContextOpt, with each row having its own optional
+/// table_suffix for routing to different tables when using one-to-many expansion.
+/// This enables true per-row configuration options where different rows can have
+/// different database settings (TTL, merge mode, etc.).
 #[derive(Debug)]
 pub struct TransformedOutput {
-    pub opt: ContextOpt,
-    pub row: Row,
-    pub table_suffix: Option<String>,
+    /// Rows grouped by their ContextOpt, each with optional table suffix
+    pub rows_by_context: HashMap<ContextOpt, Vec<RowWithTableSuffix>>,
 }

 impl PipelineExecOutput {
     // Note: This is a test only function, do not use it in production.
-    pub fn into_transformed(self) -> Option<(Row, Option<String>)> {
-        if let Self::Transformed(TransformedOutput {
-            row, table_suffix, ..
-        }) = self
-        {
-            Some((row, table_suffix))
+    pub fn into_transformed(self) -> Option<Vec<RowWithTableSuffix>> {
+        if let Self::Transformed(TransformedOutput { rows_by_context }) = self {
+            // For backward compatibility, merge all rows with a default ContextOpt
+            Some(rows_by_context.into_values().flatten().collect())
+        } else {
+            None
+        }
+    }
+
+    // New method for accessing the HashMap structure directly
+    pub fn into_transformed_hashmap(self) -> Option<HashMap<ContextOpt, Vec<RowWithTableSuffix>>> {
+        if let Self::Transformed(TransformedOutput { rows_by_context }) = self {
+            Some(rows_by_context)
+        } else {
+            None
+        }
+    }
+
+    // Backward compatibility helper that returns first ContextOpt with all its rows
+    // or merges all rows with default ContextOpt for multi-context scenarios
+    pub fn into_legacy_format(self) -> Option<(ContextOpt, Vec<RowWithTableSuffix>)> {
+        if let Self::Transformed(TransformedOutput { rows_by_context }) = self {
+            if rows_by_context.len() == 1 {
+                let (opt, rows) = rows_by_context.into_iter().next().unwrap();
+                Some((opt, rows))
+            } else {
+                // Multiple contexts: merge all rows with default ContextOpt for test compatibility
+                let all_rows: Vec<RowWithTableSuffix> =
+                    rows_by_context.into_values().flatten().collect();
+                Some((ContextOpt::default(), all_rows))
+            }
         } else {
             None
         }
@@ -285,45 +321,43 @@ impl Pipeline {
             return Ok(PipelineExecOutput::DispatchedTo(rule.into(), val));
         }

-        // extract the options first
-        // this might be a breaking change, for table_suffix is now right after the processors
-        let mut opt = ContextOpt::from_pipeline_map_to_opt(&mut val)?;
-        let table_suffix = opt.resolve_table_suffix(self.tablesuffix.as_ref(), &val);
+        let mut val = if val.is_array() {
+            val
+        } else {
+            VrlValue::Array(vec![val])
+        };

-        let row = match self.transformer() {
+        let rows_by_context = match self.transformer() {
             TransformerMode::GreptimeTransformer(greptime_transformer) => {
-                let values = greptime_transformer.transform_mut(&mut val, self.is_v1())?;
-                if self.is_v1() {
-                    // v1 dont combine with auto-transform
-                    // so return immediately
-                    return Ok(PipelineExecOutput::Transformed(TransformedOutput {
-                        opt,
-                        row: Row { values },
-                        table_suffix,
-                    }));
-                }
-                // continue v2 process, and set the rest fields with auto-transform
-                // if transformer presents, then ts has been set
-                values_to_row(schema_info, val, pipeline_ctx, Some(values), false)?
+                transform_array_elements_by_ctx(
+                    // SAFETY: by line 326, val must be an array
+                    val.as_array_mut().unwrap(),
+                    greptime_transformer,
+                    self.is_v1(),
+                    schema_info,
+                    pipeline_ctx,
+                    self.tablesuffix.as_ref(),
+                )?
             }
             TransformerMode::AutoTransform(ts_name, time_unit) => {
-                // infer ts from the context
-                // we've check that only one timestamp should exist
-
-                // Create pipeline context with the found timestamp
                 let def = crate::PipelineDefinition::GreptimeIdentityPipeline(Some(
                     IdentityTimeIndex::Epoch(ts_name.clone(), *time_unit, false),
                 ));
                 let n_ctx =
                     PipelineContext::new(&def, pipeline_ctx.pipeline_param, pipeline_ctx.channel);
-                values_to_row(schema_info, val, &n_ctx, None, true)?
+                values_to_rows(
+                    schema_info,
+                    val,
+                    &n_ctx,
+                    None,
+                    true,
+                    self.tablesuffix.as_ref(),
+                )?
             }
         };

         Ok(PipelineExecOutput::Transformed(TransformedOutput {
-            opt,
-            row,
-            table_suffix,
+            rows_by_context,
         }))
     }

@@ -350,6 +384,65 @@ impl Pipeline {
     }
 }

+/// Transforms an array of VRL values into rows grouped by their ContextOpt.
+/// Each element can have its own ContextOpt for per-row configuration.
+fn transform_array_elements_by_ctx(
+    arr: &mut [VrlValue],
+    transformer: &GreptimeTransformer,
+    is_v1: bool,
+    schema_info: &mut SchemaInfo,
+    pipeline_ctx: &PipelineContext<'_>,
+    tablesuffix_template: Option<&TableSuffixTemplate>,
+) -> Result<HashMap<ContextOpt, Vec<RowWithTableSuffix>>> {
+    let skip_error = pipeline_ctx.pipeline_param.skip_error();
+    let mut rows_by_context = HashMap::new();
+
+    for (index, element) in arr.iter_mut().enumerate() {
+        if !element.is_object() {
+            unwrap_or_continue_if_err!(
+                ArrayElementMustBeObjectSnafu {
+                    index,
+                    actual_type: element.kind_str().to_string(),
+                }
+                .fail(),
+                skip_error
+            );
+        }
+
+        let values =
+            unwrap_or_continue_if_err!(transformer.transform_mut(element, is_v1), skip_error);
+        if is_v1 {
+            // v1 mode: just use transformer output directly
+            let mut opt = unwrap_or_continue_if_err!(
+                ContextOpt::from_pipeline_map_to_opt(element),
+                skip_error
+            );
+            let table_suffix = opt.resolve_table_suffix(tablesuffix_template, element);
+            rows_by_context
+                .entry(opt)
+                .or_insert_with(Vec::new)
+                .push((Row { values }, table_suffix));
+        } else {
+            // v2 mode: combine with auto-transform for remaining fields
+            let element_rows_map = values_to_rows(
+                schema_info,
+                element.clone(),
+                pipeline_ctx,
+                Some(values),
+                false,
+                tablesuffix_template,
+            )
+            .map_err(Box::new)
+            .context(TransformArrayElementSnafu { index })?;
+            for (k, v) in element_rows_map {
+                rows_by_context.entry(k).or_default().extend(v);
+            }
+        }
+    }
+
+    Ok(rows_by_context)
+}
+
 pub(crate) fn find_key_index(intermediate_keys: &[String], key: &str, kind: &str) -> Result<usize> {
     intermediate_keys
         .iter()
|
|||||||
/// The schema_info cannot be used in auto-transform ts-infer mode for lacking the ts schema.
|
/// The schema_info cannot be used in auto-transform ts-infer mode for lacking the ts schema.
|
||||||
///
|
///
|
||||||
/// Usage:
|
/// Usage:
|
||||||
/// ```rust
|
/// ```ignore
|
||||||
/// let (pipeline, schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
/// let (pipeline, schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
/// let pipeline_ctx = PipelineContext::new(&pipeline_def, &pipeline_param, Channel::Unknown);
|
/// let pipeline_ctx = PipelineContext::new(&pipeline_def, &pipeline_param, Channel::Unknown);
|
||||||
/// ```
|
/// ```
|
||||||
@@ -382,6 +475,7 @@ macro_rules! setup_pipeline {
|
|||||||
(pipeline, schema_info, pipeline_def, pipeline_param)
|
(pipeline, schema_info, pipeline_def, pipeline_param)
|
||||||
}};
|
}};
|
||||||
}
|
}
|
||||||
|
|
||||||
#[cfg(test)]
|
#[cfg(test)]
|
||||||
mod tests {
|
mod tests {
|
||||||
use std::collections::BTreeMap;
|
use std::collections::BTreeMap;
|
||||||
@@ -433,15 +527,16 @@ transform:
         );

         let payload = input_value.into();
-        let result = pipeline
+        let mut result = pipeline
             .exec_mut(payload, &pipeline_ctx, &mut schema_info)
             .unwrap()
             .into_transformed()
             .unwrap();

-        assert_eq!(result.0.values[0].value_data, Some(ValueData::U32Value(1)));
-        assert_eq!(result.0.values[1].value_data, Some(ValueData::U32Value(2)));
-        match &result.0.values[2].value_data {
+        let (row, _table_suffix) = result.swap_remove(0);
+        assert_eq!(row.values[0].value_data, Some(ValueData::U32Value(1)));
+        assert_eq!(row.values[1].value_data, Some(ValueData::U32Value(2)));
+        match &row.values[2].value_data {
             Some(ValueData::TimestampNanosecondValue(v)) => {
                 assert_ne!(v, &0);
             }
@@ -504,7 +599,7 @@ transform:
             .into_transformed()
             .unwrap();

-        assert_eq!(schema_info.schema.len(), result.0.values.len());
+        assert_eq!(schema_info.schema.len(), result[0].0.values.len());
         let test = [
             (
                 ColumnDataType::String as i32,
@@ -545,7 +640,7 @@ transform:
         let schema = pipeline.schemas().unwrap();
         for i in 0..schema.len() {
             let schema = &schema[i];
-            let value = &result.0.values[i];
+            let value = &result[0].0.values[i];
             assert_eq!(schema.datatype, test[i].0);
             assert_eq!(value.value_data, test[i].1);
         }
@@ -595,9 +690,15 @@ transform:
             .unwrap()
             .into_transformed()
             .unwrap();
-        assert_eq!(result.0.values[0].value_data, Some(ValueData::U32Value(1)));
-        assert_eq!(result.0.values[1].value_data, Some(ValueData::U32Value(2)));
-        match &result.0.values[2].value_data {
+        assert_eq!(
+            result[0].0.values[0].value_data,
+            Some(ValueData::U32Value(1))
+        );
+        assert_eq!(
+            result[0].0.values[1].value_data,
+            Some(ValueData::U32Value(2))
+        );
+        match &result[0].0.values[2].value_data {
             Some(ValueData::TimestampNanosecondValue(v)) => {
                 assert_ne!(v, &0);
             }
@@ -644,14 +745,14 @@ transform:
         let schema = pipeline.schemas().unwrap().clone();
         let result = input_value.into();

-        let row = pipeline
+        let rows_with_suffix = pipeline
             .exec_mut(result, &pipeline_ctx, &mut schema_info)
             .unwrap()
             .into_transformed()
             .unwrap();
         let output = Rows {
             schema,
-            rows: vec![row.0],
+            rows: rows_with_suffix.into_iter().map(|(r, _)| r).collect(),
         };
         let schemas = output.schema;

@@ -804,4 +905,566 @@ transform:
|
|||||||
let r: Result<Pipeline> = parse(&Content::Yaml(bad_yaml3));
|
let r: Result<Pipeline> = parse(&Content::Yaml(bad_yaml3));
|
||||||
assert!(r.is_err());
|
assert!(r.is_err());
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/// Test one-to-many VRL pipeline expansion.
|
||||||
|
/// A VRL processor can return an array, which results in multiple output rows.
|
||||||
|
#[test]
|
||||||
|
fn test_one_to_many_vrl_expansion() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: timestamp
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
events = del(.events)
|
||||||
|
base_host = del(.host)
|
||||||
|
base_ts = del(.timestamp)
|
||||||
|
map_values(array!(events)) -> |event| {
|
||||||
|
{
|
||||||
|
"host": base_host,
|
||||||
|
"event_type": event.type,
|
||||||
|
"event_value": event.value,
|
||||||
|
"timestamp": base_ts
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: host
|
||||||
|
type: string
|
||||||
|
- field: event_type
|
||||||
|
type: string
|
||||||
|
- field: event_value
|
||||||
|
type: int32
|
||||||
|
- field: timestamp
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Input with 3 events
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"host": "server1",
|
||||||
|
"timestamp": 1716668197217,
|
||||||
|
"events": [
|
||||||
|
{"type": "cpu", "value": 80},
|
||||||
|
{"type": "memory", "value": 60},
|
||||||
|
{"type": "disk", "value": 45}
|
||||||
|
]
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap()
|
||||||
|
.into_transformed()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Should produce 3 rows from 1 input
|
||||||
|
assert_eq!(result.len(), 3);
|
||||||
|
|
||||||
|
// Verify each row has correct structure
|
||||||
|
for (row, _table_suffix) in &result {
|
||||||
|
assert_eq!(row.values.len(), 4); // host, event_type, event_value, timestamp
|
||||||
|
// First value should be "server1"
|
||||||
|
assert_eq!(
|
||||||
|
row.values[0].value_data,
|
||||||
|
Some(ValueData::StringValue("server1".to_string()))
|
||||||
|
);
|
||||||
|
// Last value should be the timestamp
|
||||||
|
assert_eq!(
|
||||||
|
row.values[3].value_data,
|
||||||
|
Some(ValueData::TimestampMillisecondValue(1716668197217))
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Verify event types
|
||||||
|
let event_types: Vec<_> = result
|
||||||
|
.iter()
|
||||||
|
.map(|(r, _)| match &r.values[1].value_data {
|
||||||
|
Some(ValueData::StringValue(s)) => s.clone(),
|
||||||
|
_ => panic!("expected string"),
|
||||||
|
})
|
||||||
|
.collect();
|
||||||
|
assert!(event_types.contains(&"cpu".to_string()));
|
||||||
|
assert!(event_types.contains(&"memory".to_string()));
|
||||||
|
assert!(event_types.contains(&"disk".to_string()));
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that single object output still works (backward compatibility)
|
||||||
|
#[test]
|
||||||
|
fn test_single_object_output_unchanged() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: ts
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.processed = true
|
||||||
|
.
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: name
|
||||||
|
type: string
|
||||||
|
- field: processed
|
||||||
|
type: boolean
|
||||||
|
- field: ts
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"name": "test",
|
||||||
|
"ts": 1716668197217
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap()
|
||||||
|
.into_transformed()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Should produce exactly 1 row
|
||||||
|
assert_eq!(result.len(), 1);
|
||||||
|
assert_eq!(
|
||||||
|
result[0].0.values[0].value_data,
|
||||||
|
Some(ValueData::StringValue("test".to_string()))
|
||||||
|
);
|
||||||
|
assert_eq!(
|
||||||
|
result[0].0.values[1].value_data,
|
||||||
|
Some(ValueData::BoolValue(true))
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that empty array produces zero rows
|
||||||
|
#[test]
|
||||||
|
fn test_empty_array_produces_zero_rows() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.events
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: value
|
||||||
|
type: int32
|
||||||
|
- field: greptime_timestamp
|
||||||
|
type: timestamp, ns
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(r#"{"events": []}"#).unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap()
|
||||||
|
.into_transformed()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Empty array should produce zero rows
|
||||||
|
assert_eq!(result.len(), 0);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that array elements must be objects
|
||||||
|
#[test]
|
||||||
|
fn test_array_element_must_be_object() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.items
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: value
|
||||||
|
type: int32
|
||||||
|
- field: greptime_timestamp
|
||||||
|
type: timestamp, ns
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Array with non-object elements should fail
|
||||||
|
let input_value: serde_json::Value =
|
||||||
|
serde_json::from_str(r#"{"items": [1, 2, 3]}"#).unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline.exec_mut(payload, &pipeline_ctx, &mut schema_info);
|
||||||
|
|
||||||
|
assert!(result.is_err());
|
||||||
|
let err_msg = result.unwrap_err().to_string();
|
||||||
|
assert!(
|
||||||
|
err_msg.contains("must be an object"),
|
||||||
|
"Expected error about non-object element, got: {}",
|
||||||
|
err_msg
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test one-to-many with table suffix from VRL hint
|
||||||
|
#[test]
|
||||||
|
fn test_one_to_many_with_table_suffix_hint() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: ts
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.greptime_table_suffix = "_" + string!(.category)
|
||||||
|
.
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: name
|
||||||
|
type: string
|
||||||
|
- field: category
|
||||||
|
type: string
|
||||||
|
- field: ts
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"name": "test",
|
||||||
|
"category": "metrics",
|
||||||
|
"ts": 1716668197217
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap()
|
||||||
|
.into_transformed()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Should have table suffix extracted per row
|
||||||
|
assert_eq!(result.len(), 1);
|
||||||
|
assert_eq!(result[0].1, Some("_metrics".to_string()));
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test one-to-many with per-row table suffix
|
||||||
|
#[test]
|
||||||
|
fn test_one_to_many_per_row_table_suffix() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: timestamp
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
events = del(.events)
|
||||||
|
base_ts = del(.timestamp)
|
||||||
|
|
||||||
|
map_values(array!(events)) -> |event| {
|
||||||
|
suffix = "_" + string!(event.category)
|
||||||
|
{
|
||||||
|
"name": event.name,
|
||||||
|
"value": event.value,
|
||||||
|
"timestamp": base_ts,
|
||||||
|
"greptime_table_suffix": suffix
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: name
|
||||||
|
type: string
|
||||||
|
- field: value
|
||||||
|
type: int32
|
||||||
|
- field: timestamp
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Input with events that should go to different tables
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"timestamp": 1716668197217,
|
||||||
|
"events": [
|
||||||
|
{"name": "cpu_usage", "value": 80, "category": "cpu"},
|
||||||
|
{"name": "mem_usage", "value": 60, "category": "memory"},
|
||||||
|
{"name": "cpu_temp", "value": 45, "category": "cpu"}
|
||||||
|
]
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap()
|
||||||
|
.into_transformed()
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Should produce 3 rows
|
||||||
|
assert_eq!(result.len(), 3);
|
||||||
|
|
||||||
|
// Collect table suffixes
|
||||||
|
let table_suffixes: Vec<_> = result.iter().map(|(_, suffix)| suffix.clone()).collect();
|
||||||
|
|
||||||
|
// Should have different table suffixes per row
|
||||||
|
assert!(table_suffixes.contains(&Some("_cpu".to_string())));
|
||||||
|
assert!(table_suffixes.contains(&Some("_memory".to_string())));
|
||||||
|
|
||||||
|
// Count rows per table suffix
|
||||||
|
let cpu_count = table_suffixes
|
||||||
|
.iter()
|
||||||
|
.filter(|s| *s == &Some("_cpu".to_string()))
|
||||||
|
.count();
|
||||||
|
let memory_count = table_suffixes
|
||||||
|
.iter()
|
||||||
|
.filter(|s| *s == &Some("_memory".to_string()))
|
||||||
|
.count();
|
||||||
|
assert_eq!(cpu_count, 2);
|
||||||
|
assert_eq!(memory_count, 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that one-to-many mapping preserves per-row ContextOpt in HashMap
|
||||||
|
#[test]
|
||||||
|
fn test_one_to_many_hashmap_contextopt_preservation() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: timestamp
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
events = del(.events)
|
||||||
|
base_ts = del(.timestamp)
|
||||||
|
|
||||||
|
map_values(array!(events)) -> |event| {
|
||||||
|
# Set different TTL values per event type
|
||||||
|
ttl = if event.type == "critical" {
|
||||||
|
"1h"
|
||||||
|
} else if event.type == "warning" {
|
||||||
|
"24h"
|
||||||
|
} else {
|
||||||
|
"7d"
|
||||||
|
}
|
||||||
|
|
||||||
|
{
|
||||||
|
"host": del(.host),
|
||||||
|
"event_type": event.type,
|
||||||
|
"event_value": event.value,
|
||||||
|
"timestamp": base_ts,
|
||||||
|
"greptime_ttl": ttl
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: host
|
||||||
|
type: string
|
||||||
|
- field: event_type
|
||||||
|
type: string
|
||||||
|
- field: event_value
|
||||||
|
type: int32
|
||||||
|
- field: timestamp
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Input with events that should have different ContextOpt values
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"host": "server1",
|
||||||
|
"timestamp": 1716668197217,
|
||||||
|
"events": [
|
||||||
|
{"type": "critical", "value": 100},
|
||||||
|
{"type": "warning", "value": 50},
|
||||||
|
{"type": "info", "value": 25}
|
||||||
|
]
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Extract the HashMap structure
|
||||||
|
let rows_by_context = result.into_transformed_hashmap().unwrap();
|
||||||
|
|
||||||
|
// Should have 3 different ContextOpt groups due to different TTL values
|
||||||
|
assert_eq!(rows_by_context.len(), 3);
|
||||||
|
|
||||||
|
// Verify each ContextOpt group has exactly 1 row and different configurations
|
||||||
|
let mut context_opts = Vec::new();
|
||||||
|
for (opt, rows) in &rows_by_context {
|
||||||
|
assert_eq!(rows.len(), 1); // Each group should have exactly 1 row
|
||||||
|
context_opts.push(opt.clone());
|
||||||
|
}
|
||||||
|
|
||||||
|
// ContextOpts should be different due to different TTL values
|
||||||
|
assert_ne!(context_opts[0], context_opts[1]);
|
||||||
|
assert_ne!(context_opts[1], context_opts[2]);
|
||||||
|
assert_ne!(context_opts[0], context_opts[2]);
|
||||||
|
|
||||||
|
// Verify the rows are correctly structured
|
||||||
|
for rows in rows_by_context.values() {
|
||||||
|
for (row, _table_suffix) in rows {
|
||||||
|
assert_eq!(row.values.len(), 4); // host, event_type, event_value, timestamp
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that single object input still works with HashMap structure
|
||||||
|
#[test]
|
||||||
|
fn test_single_object_hashmap_compatibility() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- epoch:
|
||||||
|
field: ts
|
||||||
|
resolution: ms
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.processed = true
|
||||||
|
.
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: name
|
||||||
|
type: string
|
||||||
|
- field: processed
|
||||||
|
type: boolean
|
||||||
|
- field: ts
|
||||||
|
type: timestamp, ms
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(
|
||||||
|
r#"{
|
||||||
|
"name": "test",
|
||||||
|
"ts": 1716668197217
|
||||||
|
}"#,
|
||||||
|
)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Extract the HashMap structure
|
||||||
|
let rows_by_context = result.into_transformed_hashmap().unwrap();
|
||||||
|
|
||||||
|
// Single object should produce exactly 1 ContextOpt group
|
||||||
|
assert_eq!(rows_by_context.len(), 1);
|
||||||
|
|
||||||
|
let (_opt, rows) = rows_by_context.into_iter().next().unwrap();
|
||||||
|
assert_eq!(rows.len(), 1);
|
||||||
|
|
||||||
|
// Verify the row structure
|
||||||
|
let (row, _table_suffix) = &rows[0];
|
||||||
|
assert_eq!(row.values.len(), 3); // name, processed, timestamp
|
||||||
|
}
|
||||||
|
|
||||||
|
/// Test that empty arrays work correctly with HashMap structure
|
||||||
|
#[test]
|
||||||
|
fn test_empty_array_hashmap() {
|
||||||
|
let pipeline_yaml = r#"
|
||||||
|
processors:
|
||||||
|
- vrl:
|
||||||
|
source: |
|
||||||
|
.events
|
||||||
|
|
||||||
|
transform:
|
||||||
|
- field: value
|
||||||
|
type: int32
|
||||||
|
- field: greptime_timestamp
|
||||||
|
type: timestamp, ns
|
||||||
|
index: time
|
||||||
|
"#;
|
||||||
|
|
||||||
|
let pipeline: Pipeline = parse(&Content::Yaml(pipeline_yaml)).unwrap();
|
||||||
|
let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
|
||||||
|
let pipeline_ctx = PipelineContext::new(
|
||||||
|
&pipeline_def,
|
||||||
|
&pipeline_param,
|
||||||
|
session::context::Channel::Unknown,
|
||||||
|
);
|
||||||
|
|
||||||
|
let input_value: serde_json::Value = serde_json::from_str(r#"{"events": []}"#).unwrap();
|
||||||
|
|
||||||
|
let payload = input_value.into();
|
||||||
|
let result = pipeline
|
||||||
|
.exec_mut(payload, &pipeline_ctx, &mut schema_info)
|
||||||
|
.unwrap();
|
||||||
|
|
||||||
|
// Extract the HashMap structure
|
||||||
|
let rows_by_context = result.into_transformed_hashmap().unwrap();
|
||||||
|
|
||||||
|
// Empty array should produce empty HashMap
|
||||||
|
assert_eq!(rows_by_context.len(), 0);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -57,7 +57,7 @@ const PIPELINE_HINT_PREFIX: &str = "greptime_";
 ///
 /// The options are set in the format of hint keys. See [`PIPELINE_HINT_KEYS`].
 /// It's is used as the key in [`ContextReq`] for grouping the row insert requests.
-#[derive(Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
+#[derive(Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash, Clone)]
 pub struct ContextOpt {
     // table options, that need to be set in the query context before making row insert requests
     auto_create_table: Option<String>,
@@ -192,8 +192,15 @@ impl ContextReq {
         Self { req: req_map }
     }

-    pub fn add_row(&mut self, opt: ContextOpt, req: RowInsertRequest) {
-        self.req.entry(opt).or_default().push(req);
+    pub fn add_row(&mut self, opt: &ContextOpt, req: RowInsertRequest) {
+        match self.req.get_mut(opt) {
+            None => {
+                self.req.insert(opt.clone(), vec![req]);
+            }
+            Some(e) => {
+                e.push(req);
+            }
+        }
     }

     pub fn add_rows(&mut self, opt: ContextOpt, reqs: impl IntoIterator<Item = RowInsertRequest>) {
@@ -15,7 +15,7 @@
 use std::collections::BTreeMap;

 use chrono_tz::Tz;
-use snafu::OptionExt;
+use snafu::{OptionExt, ensure};
 use vrl::compiler::runtime::Runtime;
 use vrl::compiler::{Program, TargetValue, compile};
 use vrl::diagnostic::Formatter;
@@ -53,9 +53,15 @@ impl VrlProcessor {
         // check if the return value is have regex
         let result_def = program.final_type_info().result;
         let kind = result_def.kind();
-        if !kind.is_object() {
-            return VrlReturnValueSnafu.fail();
-        }
+        // Check if the return type could possibly be an object or array.
+        // We use contains_* methods since VRL type inference may return
+        // a Kind that represents multiple possible types.
+        ensure!(
+            kind.contains_object() || kind.contains_array(),
+            VrlReturnValueSnafu {
+                result_kind: kind.clone(),
+            }
+        );
         check_regex_output(kind)?;

         Ok(Self { source, program })
@@ -111,13 +117,7 @@ impl crate::etl::processor::Processor for VrlProcessor {
     }

     fn exec_mut(&self, val: VrlValue) -> Result<VrlValue> {
-        let val = self.resolve(val)?;
-
-        if let VrlValue::Object(_) = val {
-            Ok(val)
-        } else {
-            VrlRegexValueSnafu.fail()
-        }
+        self.resolve(val)
     }
 }

@@ -37,8 +37,8 @@ use vrl::prelude::{Bytes, VrlValueConvert};
 use vrl::value::{KeyString, Value as VrlValue};

 use crate::error::{
-    IdentifyPipelineColumnTypeMismatchSnafu, InvalidTimestampSnafu, Result,
-    TimeIndexMustBeNonNullSnafu, TransformColumnNameMustBeUniqueSnafu,
+    ArrayElementMustBeObjectSnafu, IdentifyPipelineColumnTypeMismatchSnafu, InvalidTimestampSnafu,
+    Result, TimeIndexMustBeNonNullSnafu, TransformColumnNameMustBeUniqueSnafu,
     TransformMultipleTimestampIndexSnafu, TransformTimestampIndexCountSnafu, ValueMustBeMapSnafu,
 };
 use crate::etl::PipelineDocVersion;
@@ -50,6 +50,9 @@ use crate::{PipelineContext, truthy, unwrap_or_continue_if_err};

 const DEFAULT_MAX_NESTED_LEVELS_FOR_JSON_FLATTENING: usize = 10;

+/// Row with potentially designated table suffix.
+pub type RowWithTableSuffix = (Row, Option<String>);
+
 /// fields not in the columns will be discarded
 /// to prevent automatic column creation in GreptimeDB
 #[derive(Debug, Clone)]
@@ -363,6 +366,73 @@ fn calc_ts(p_ctx: &PipelineContext, values: &VrlValue) -> Result<Option<ValueDat
     }
 }

+/// Converts VRL values to Greptime rows grouped by their ContextOpt.
+/// # Returns
+/// A HashMap where keys are `ContextOpt` and values are vectors of (row, table_suffix) pairs.
+/// Single object input produces one ContextOpt group with one row.
+/// Array input groups rows by their per-element ContextOpt values.
+///
+/// # Errors
+/// - `ArrayElementMustBeObject` if an array element is not an object
+pub(crate) fn values_to_rows(
+    schema_info: &mut SchemaInfo,
+    mut values: VrlValue,
+    pipeline_ctx: &PipelineContext<'_>,
+    row: Option<Vec<GreptimeValue>>,
+    need_calc_ts: bool,
+    tablesuffix_template: Option<&crate::tablesuffix::TableSuffixTemplate>,
+) -> Result<std::collections::HashMap<ContextOpt, Vec<RowWithTableSuffix>>> {
+    let skip_error = pipeline_ctx.pipeline_param.skip_error();
+    let VrlValue::Array(arr) = values else {
+        // Single object: extract ContextOpt and table_suffix
+        let mut result = std::collections::HashMap::new();
+
+        let mut opt = match ContextOpt::from_pipeline_map_to_opt(&mut values) {
+            Ok(r) => r,
+            Err(e) => return if skip_error { Ok(result) } else { Err(e) },
+        };
+
+        let table_suffix = opt.resolve_table_suffix(tablesuffix_template, &values);
+        let row = match values_to_row(schema_info, values, pipeline_ctx, row, need_calc_ts) {
+            Ok(r) => r,
+            Err(e) => return if skip_error { Ok(result) } else { Err(e) },
+        };
+        result.insert(opt, vec![(row, table_suffix)]);
+        return Ok(result);
+    };

+    let mut rows_by_context: std::collections::HashMap<ContextOpt, Vec<RowWithTableSuffix>> =
+        std::collections::HashMap::new();
+    for (index, mut value) in arr.into_iter().enumerate() {
+        if !value.is_object() {
+            unwrap_or_continue_if_err!(
+                ArrayElementMustBeObjectSnafu {
+                    index,
+                    actual_type: value.kind_str().to_string(),
+                }
+                .fail(),
+                skip_error
+            );
+        }
+
+        // Extract ContextOpt and table_suffix for this element
+        let mut opt = unwrap_or_continue_if_err!(
+            ContextOpt::from_pipeline_map_to_opt(&mut value),
+            skip_error
+        );
+        let table_suffix = opt.resolve_table_suffix(tablesuffix_template, &value);
+        let transformed_row = unwrap_or_continue_if_err!(
+            values_to_row(schema_info, value, pipeline_ctx, row.clone(), need_calc_ts),
+            skip_error
+        );
+        rows_by_context
+            .entry(opt)
+            .or_default()
+            .push((transformed_row, table_suffix));
+    }
+    Ok(rows_by_context)
+}
+
 /// `need_calc_ts` happens in two cases:
 /// 1. full greptime_identity
 /// 2. auto-transform without transformer
@@ -992,4 +1062,139 @@ mod tests {
            assert_eq!(flattened_object, expected);
        }
    }
+
+    use ahash::HashMap as AHashMap;
+    #[test]
+    fn test_values_to_rows_skip_error_handling() {
+        let table_suffix_template: Option<crate::tablesuffix::TableSuffixTemplate> = None;
+
+        // Case 1: skip_error=true, mixed valid/invalid elements
+        {
+            let schema_info = &mut SchemaInfo::default();
+            let input_array = vec![
+                // Valid object
+                serde_json::json!({"name": "Alice", "age": 25}).into(),
+                // Invalid element (string)
+                VrlValue::Bytes("invalid_string".into()),
+                // Valid object
+                serde_json::json!({"name": "Bob", "age": 30}).into(),
+                // Invalid element (number)
+                VrlValue::Integer(42),
+                // Valid object
+                serde_json::json!({"name": "Charlie", "age": 35}).into(),
+            ];
+
+            let params = GreptimePipelineParams::from_map(AHashMap::from_iter([(
+                "skip_error".to_string(),
+                "true".to_string(),
+            )]));
+
+            let pipeline_ctx = PipelineContext::new(
+                &PipelineDefinition::GreptimeIdentityPipeline(None),
+                &params,
+                Channel::Unknown,
+            );
+
+            let result = values_to_rows(
+                schema_info,
+                VrlValue::Array(input_array),
+                &pipeline_ctx,
+                None,
+                true,
+                table_suffix_template.as_ref(),
+            );
+
+            // Should succeed and only process valid objects
+            assert!(result.is_ok());
+            let rows_by_context = result.unwrap();
+            // Count total rows across all ContextOpt groups
+            let total_rows: usize = rows_by_context.values().map(|v| v.len()).sum();
+            assert_eq!(total_rows, 3); // Only 3 valid objects
+        }
+
+        // Case 2: skip_error=false, invalid elements present
+        {
+            let schema_info = &mut SchemaInfo::default();
+            let input_array = vec![
+                serde_json::json!({"name": "Alice", "age": 25}).into(),
+                VrlValue::Bytes("invalid_string".into()), // This should cause error
+            ];
+
+            let params = GreptimePipelineParams::default(); // skip_error = false
+
+            let pipeline_ctx = PipelineContext::new(
+                &PipelineDefinition::GreptimeIdentityPipeline(None),
+                &params,
+                Channel::Unknown,
+            );
+
+            let result = values_to_rows(
+                schema_info,
+                VrlValue::Array(input_array),
+                &pipeline_ctx,
+                None,
+                true,
+                table_suffix_template.as_ref(),
+            );
+
+            // Should fail with ArrayElementMustBeObject error
+            assert!(result.is_err());
+            let error_msg = result.unwrap_err().to_string();
+            assert!(error_msg.contains("Array element at index 1 must be an object for one-to-many transformation, got string"));
+        }
+    }
+
+    /// Test that values_to_rows correctly groups rows by per-element ContextOpt
+    #[test]
+    fn test_values_to_rows_per_element_context_opt() {
+        let table_suffix_template: Option<crate::tablesuffix::TableSuffixTemplate> = None;
+        let schema_info = &mut SchemaInfo::default();
+
+        // Create array with elements having different TTL values (ContextOpt)
+        let input_array = vec![
+            serde_json::json!({"name": "Alice", "greptime_ttl": "1h"}).into(),
+            serde_json::json!({"name": "Bob", "greptime_ttl": "1h"}).into(),
+            serde_json::json!({"name": "Charlie", "greptime_ttl": "24h"}).into(),
+        ];
+
+        let params = GreptimePipelineParams::default();
+        let pipeline_ctx = PipelineContext::new(
+            &PipelineDefinition::GreptimeIdentityPipeline(None),
+            &params,
+            Channel::Unknown,
+        );
+
+        let result = values_to_rows(
+            schema_info,
+            VrlValue::Array(input_array),
+            &pipeline_ctx,
+            None,
+            true,
+            table_suffix_template.as_ref(),
+        );
+
+        assert!(result.is_ok());
+        let rows_by_context = result.unwrap();
+
+        // Should have 2 different ContextOpt groups (1h TTL and 24h TTL)
+        assert_eq!(rows_by_context.len(), 2);
+
+        // Count rows per group
+        let total_rows: usize = rows_by_context.values().map(|v| v.len()).sum();
+        assert_eq!(total_rows, 3);
+
+        // Verify that rows are correctly grouped by TTL
+        let mut ttl_1h_count = 0;
+        let mut ttl_24h_count = 0;
+        for rows in rows_by_context.values() {
+            // ContextOpt doesn't expose ttl directly, but we can count by group size
+            if rows.len() == 2 {
+                ttl_1h_count = rows.len();
+            } else if rows.len() == 1 {
+                ttl_24h_count = rows.len();
+            }
+        }
+        assert_eq!(ttl_1h_count, 2); // Alice and Bob with 1h TTL
+        assert_eq!(ttl_24h_count, 1); // Charlie with 24h TTL
+    }
}
@@ -35,21 +35,25 @@ pub fn parse_and_exec(input_str: &str, pipeline_yaml: &str) -> Rows {
    match input_value {
        VrlValue::Array(array) => {
            for value in array {
-                let row = pipeline
+                let rows_with_suffix = pipeline
                    .exec_mut(value, &pipeline_ctx, &mut schema_info)
                    .expect("failed to exec pipeline")
                    .into_transformed()
                    .expect("expect transformed result ");
-                rows.push(row.0);
+                for (r, _) in rows_with_suffix {
+                    rows.push(r);
+                }
            }
        }
        VrlValue::Object(_) => {
-            let row = pipeline
+            let rows_with_suffix = pipeline
                .exec_mut(input_value, &pipeline_ctx, &mut schema_info)
                .expect("failed to exec pipeline")
                .into_transformed()
                .expect("expect transformed result ");
-            rows.push(row.0);
+            for (r, _) in rows_with_suffix {
+                rows.push(r);
+            }
        }
        _ => {
            panic!("invalid input value");
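Both arms now flatten the `(row, table_suffix)` pairs with an explicit loop and discard the per-row suffix. Where a whole collection is wanted in a single expression, the equivalent iterator form (the same `.map(|(r, _)| r)` used elsewhere in this change) looks like the sketch below, with `String` standing in for the proto row type:

fn flatten_rows(rows_with_suffix: Vec<(String, Option<String>)>) -> Vec<String> {
    // Drop the per-row table suffix and keep only the rows.
    rows_with_suffix.into_iter().map(|(r, _)| r).collect()
}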
@@ -427,7 +427,7 @@ transform:
    );
    let stats = input_value.into();

-    let row = pipeline
+    let rows_with_suffix = pipeline
        .exec_mut(stats, &pipeline_ctx, &mut schema_info)
        .expect("failed to exec pipeline")
        .into_transformed()
@@ -435,7 +435,7 @@ transform:

    let output = Rows {
        schema: pipeline.schemas().unwrap().clone(),
-        rows: vec![row.0],
+        rows: rows_with_suffix.into_iter().map(|(r, _)| r).collect(),
    };

    assert_eq!(output.rows.len(), 1);
@@ -501,13 +501,13 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");
+    let (row, _) = rows_with_suffix.swap_remove(0);
    let r = row
-        .0
        .values
        .into_iter()
        .map(|v| v.value_data.unwrap())
@@ -616,15 +616,16 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");

+    let (row, _) = rows_with_suffix.swap_remove(0);
    let r = row
-        .0
        .values
+        .clone()
        .into_iter()
        .map(|v| v.value_data.unwrap())
        .collect::<Vec<_>>();
@@ -688,13 +689,13 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");
+    let (row, _) = rows_with_suffix.swap_remove(0);
    let r = row
-        .0
        .values
        .into_iter()
        .map(|v| v.value_data.unwrap())
@@ -734,14 +735,14 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");

+    let (row, _) = rows_with_suffix.swap_remove(0);
    let r = row
-        .0
        .values
        .into_iter()
        .map(|v| v.value_data.unwrap())
@@ -799,14 +800,14 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");

+    let (row, _) = rows_with_suffix.swap_remove(0);
    let mut r = row
-        .0
        .values
        .into_iter()
        .map(|v| v.value_data.unwrap())
@@ -846,13 +847,14 @@ transform:
    );

    let status = input_value.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");

-    row.0.values.into_iter().for_each(|v| {
+    let (row, _) = rows_with_suffix.swap_remove(0);
+    row.values.into_iter().for_each(|v| {
        if let ValueData::TimestampNanosecondValue(v) = v.value_data.unwrap() {
            let now = chrono::Utc::now().timestamp_nanos_opt().unwrap();
            assert!(now - v < 5_000_000);
@@ -923,13 +925,13 @@ transform:
    assert_eq!(dispatched_to.pipeline.unwrap(), "access_log_pipeline");

    let status = input_value2.into();
-    let row = pipeline
+    let mut rows_with_suffix = pipeline
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap()
        .into_transformed()
        .expect("expect transformed result ");
+    let (row, _) = rows_with_suffix.swap_remove(0);
    let r = row
-        .0
        .values
        .into_iter()
        .map(|v| v.value_data.unwrap())
@@ -988,8 +990,8 @@ table_suffix: _${logger}
        .exec_mut(status, &pipeline_ctx, &mut schema_info)
        .unwrap();

-    let (row, table_name) = exec_re.into_transformed().unwrap();
-    let values = row.values;
+    let mut rows_with_suffix = exec_re.into_transformed().unwrap();
+    let (row, table_suffix) = rows_with_suffix.swap_remove(0);
    let expected_values = vec![
        Value {
            value_data: Some(ValueData::StringValue("hello world".into())),
@@ -998,6 +1000,234 @@ table_suffix: _${logger}
            value_data: Some(ValueData::TimestampNanosecondValue(1716668197217000000)),
        },
    ];
-    assert_eq!(expected_values, values);
-    assert_eq!(table_name, Some("_http".to_string()));
+    assert_eq!(expected_values, row.values);
+    assert_eq!(table_suffix, Some("_http".to_string()));
+}
+
+/// Test one-to-many pipeline expansion using VRL processor that returns an array
+#[test]
+fn test_one_to_many_pipeline() {
+    // Input: single log entry with a list of events
+    let input_value = serde_json::json!({
+        "request_id": "req-123",
+        "events": [
+            {"type": "click", "value": 100},
+            {"type": "scroll", "value": 200},
+            {"type": "submit", "value": 300}
+        ]
+    });
+
+    // VRL processor that expands events into separate rows using map
+    let pipeline_yaml = r#"
+processors:
+  - vrl:
+      source: |
+        events = del(.events)
+        request_id = del(.request_id)
+        map_values(array!(events)) -> |event| {
+          {
+            "request_id": request_id,
+            "event_type": event.type,
+            "event_value": event.value
+          }
+        }
+
+transform:
+  - field: request_id
+    type: string
+  - field: event_type
+    type: string
+  - field: event_value
+    type: uint64
+"#;
+
+    let yaml_content = Content::Yaml(pipeline_yaml);
+    let pipeline: Pipeline = parse(&yaml_content).expect("failed to parse pipeline");
+    let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
+    let pipeline_ctx = PipelineContext::new(
+        &pipeline_def,
+        &pipeline_param,
+        session::context::Channel::Unknown,
+    );
+
+    let status = input_value.into();
+    let rows_with_suffix = pipeline
+        .exec_mut(status, &pipeline_ctx, &mut schema_info)
+        .expect("failed to exec pipeline")
+        .into_transformed()
+        .expect("expect transformed result");
+
+    // Should produce 3 rows from the single input
+    assert_eq!(rows_with_suffix.len(), 3);
+
+    // Row 0: click event
+    assert_eq!(
+        rows_with_suffix[0].0.values[0].value_data,
+        Some(StringValue("req-123".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[0].0.values[1].value_data,
+        Some(StringValue("click".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[0].0.values[2].value_data,
+        Some(U64Value(100))
+    );
+
+    // Row 1: scroll event
+    assert_eq!(
+        rows_with_suffix[1].0.values[0].value_data,
+        Some(StringValue("req-123".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[1].0.values[1].value_data,
+        Some(StringValue("scroll".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[1].0.values[2].value_data,
+        Some(U64Value(200))
+    );
+
+    // Row 2: submit event
+    assert_eq!(
+        rows_with_suffix[2].0.values[0].value_data,
+        Some(StringValue("req-123".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[2].0.values[1].value_data,
+        Some(StringValue("submit".into()))
+    );
+    assert_eq!(
+        rows_with_suffix[2].0.values[2].value_data,
+        Some(U64Value(300))
+    );
+}
+
+/// Test that single object input still works correctly (backward compatibility)
+#[test]
+fn test_one_to_many_single_object_unchanged() {
+    let input_value = serde_json::json!({
+        "name": "Alice",
+        "age": 30
+    });
+
+    let pipeline_yaml = r#"
+processors:
+  - vrl:
+      source: |
+        .processed = true
+        .
+
+transform:
+  - field: name
+    type: string
+  - field: age
+    type: uint32
+  - field: processed
+    type: boolean
+"#;
+
+    let yaml_content = Content::Yaml(pipeline_yaml);
+    let pipeline: Pipeline = parse(&yaml_content).expect("failed to parse pipeline");
+    let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
+    let pipeline_ctx = PipelineContext::new(
+        &pipeline_def,
+        &pipeline_param,
+        session::context::Channel::Unknown,
+    );
+
+    let status = input_value.into();
+    let rows_with_suffix = pipeline
+        .exec_mut(status, &pipeline_ctx, &mut schema_info)
+        .expect("failed to exec pipeline")
+        .into_transformed()
+        .expect("expect transformed result");
+
+    // Should produce exactly 1 row
+    assert_eq!(rows_with_suffix.len(), 1);
+
+    let (row, _) = &rows_with_suffix[0];
+    assert_eq!(row.values[0].value_data, Some(StringValue("Alice".into())));
+    assert_eq!(row.values[1].value_data, Some(U32Value(30)));
+    assert_eq!(row.values[2].value_data, Some(BoolValue(true)));
+}
+
+/// Test error handling when array contains non-object elements
+#[test]
+fn test_one_to_many_array_element_validation() {
+    let input_value = serde_json::json!({
+        "items": ["string", 123, true]
+    });
+
+    // VRL that returns an array with non-object elements
+    let pipeline_yaml = r#"
+processors:
+  - vrl:
+      source: |
+        .items
+
+transform:
+  - field: value
+    type: string
+"#;
+
+    let yaml_content = Content::Yaml(pipeline_yaml);
+    let pipeline: Pipeline = parse(&yaml_content).expect("failed to parse pipeline");
+    let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
+    let pipeline_ctx = PipelineContext::new(
+        &pipeline_def,
+        &pipeline_param,
+        session::context::Channel::Unknown,
+    );
+
+    let status = input_value.into();
+    let result = pipeline.exec_mut(status, &pipeline_ctx, &mut schema_info);
+
+    // Should fail because array elements are not objects
+    assert!(result.is_err());
+    let err = result.unwrap_err();
+    let err_msg = err.to_string();
+    assert!(
+        err_msg.contains("must be an object"),
+        "Expected 'must be an object' error, got: {}",
+        err_msg
+    );
+}
+
+/// Test that empty array produces zero rows
+#[test]
+fn test_one_to_many_empty_array() {
+    let input_value = serde_json::json!({
+        "events": []
+    });
+
+    let pipeline_yaml = r#"
+processors:
+  - vrl:
+      source: |
+        .events
+
+transform:
+  - field: value
+    type: string
+"#;
+
+    let yaml_content = Content::Yaml(pipeline_yaml);
+    let pipeline: Pipeline = parse(&yaml_content).expect("failed to parse pipeline");
+    let (pipeline, mut schema_info, pipeline_def, pipeline_param) = setup_pipeline!(pipeline);
+    let pipeline_ctx = PipelineContext::new(
+        &pipeline_def,
+        &pipeline_param,
+        session::context::Channel::Unknown,
+    );
+
+    let status = input_value.into();
+    let rows_with_suffix = pipeline
+        .exec_mut(status, &pipeline_ctx, &mut schema_info)
+        .expect("failed to exec pipeline")
+        .into_transformed()
+        .expect("expect transformed result");
+
+    // Empty array should produce zero rows
+    assert_eq!(rows_with_suffix.len(), 0);
}
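For readers unfamiliar with VRL, the fan-out these tests rely on can be pictured in plain Rust: one record whose `events` field is an array becomes one flat record per event. A rough stand-alone sketch using serde_json (not the pipeline crate; the function name is illustrative):

use serde_json::{json, Value};

// Fan one input record with an `events` array out into one flat record per event,
// mirroring the shape the VRL `map_values(array!(events))` block above produces.
fn expand_events(input: &Value) -> Vec<Value> {
    let request_id = input["request_id"].clone();
    input["events"]
        .as_array()
        .map(|events| {
            events
                .iter()
                .map(|event| {
                    json!({
                        "request_id": request_id.clone(),
                        "event_type": event["type"].clone(),
                        "event_value": event["value"].clone(),
                    })
                })
                .collect()
        })
        .unwrap_or_default()
}

fn main() {
    let input = json!({
        "request_id": "req-123",
        "events": [
            {"type": "click", "value": 100},
            {"type": "scroll", "value": 200},
            {"type": "submit", "value": 300}
        ]
    });
    let rows = expand_events(&input);
    assert_eq!(rows.len(), 3);
    assert_eq!(rows[0]["event_type"], "click");
}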
@@ -16,9 +16,8 @@ use std::collections::BTreeMap;
use std::sync::Arc;

use ahash::{HashMap, HashMapExt};
-use api::greptime_proto;
use api::v1::helper::time_index_column_schema;
-use api::v1::{ColumnDataType, RowInsertRequest, Rows};
+use api::v1::{ColumnDataType, RowInsertRequest, Rows, Value};
use common_time::timestamp::TimeUnit;
use pipeline::{
    ContextReq, DispatchedTo, GREPTIME_INTERNAL_IDENTITY_PIPELINE_NAME, Pipeline, PipelineContext,
@@ -154,13 +153,18 @@ async fn run_custom_pipeline(

        let r = unwrap_or_continue_if_err!(result, skip_error);
        match r {
-            PipelineExecOutput::Transformed(TransformedOutput {
-                opt,
-                row,
-                table_suffix,
-            }) => {
+            PipelineExecOutput::Transformed(TransformedOutput { rows_by_context }) => {
+                // Process each ContextOpt group separately
+                for (opt, rows_with_suffix) in rows_by_context {
+                    // Group rows by table name within each context
+                    for (row, table_suffix) in rows_with_suffix {
                        let act_table_name = table_suffix_to_table_name(&table_name, table_suffix);
-                        push_to_map!(transformed_map, (opt, act_table_name), row, arr_len);
+                        transformed_map
+                            .entry((opt.clone(), act_table_name))
+                            .or_insert_with(|| Vec::with_capacity(arr_len))
+                            .push(row);
+                    }
+                }
            }
            PipelineExecOutput::DispatchedTo(dispatched_to, val) => {
                push_to_map!(dispatched, dispatched_to, val, arr_len);
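The bucketing above replaces the old single-row `push_to_map!` call: every expanded row is appended to a Vec keyed by `(opt, table_name)`, pre-sized to the input batch length so repeated pushes avoid reallocation. A rough stand-alone sketch of the same pattern, with `String` keys standing in for `ContextOpt`, the table name, and the proto row (names are illustrative only):

use std::collections::HashMap;

fn bucket_rows(
    expanded: Vec<((String, String), String)>,
    batch_len: usize,
) -> HashMap<(String, String), Vec<String>> {
    let mut transformed_map: HashMap<(String, String), Vec<String>> = HashMap::new();
    for (key, row) in expanded {
        // One Vec per (context, table) pair; the capacity hint mirrors `arr_len` above.
        transformed_map
            .entry(key)
            .or_insert_with(|| Vec::with_capacity(batch_len))
            .push(row);
    }
    transformed_map
}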
@@ -173,22 +177,26 @@ async fn run_custom_pipeline(

    let mut results = ContextReq::default();

-    let s_len = schema_info.schema.len();
-    // if transformed
+    // Process transformed outputs. Each entry in transformed_map contains
+    // Vec<Row> grouped by (opt, table_name).
+    let column_count = schema_info.schema.len();
    for ((opt, table_name), mut rows) in transformed_map {
-        for row in rows.iter_mut() {
-            row.values
-                .resize(s_len, greptime_proto::v1::Value::default());
+        // Pad rows to match final schema size (schema may have evolved during processing)
+        for row in &mut rows {
+            let diff = column_count.saturating_sub(row.values.len());
+            for _ in 0..diff {
+                row.values.push(Value { value_data: None });
+            }
        }

        results.add_row(
-            opt,
+            &opt,
            RowInsertRequest {
                rows: Some(Rows {
                    rows,
                    schema: schema_info.schema.clone(),
                }),
-                table_name,
+                table_name: table_name.clone(),
            },
        );
    }
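Padding ensures that rows produced early in a batch, before later rows widened the schema, still carry one value per final column. A minimal, self-contained sketch of the same idea with a simplified value type (the struct is a stand-in, not the proto type):

#[derive(Clone, Default, Debug, PartialEq)]
struct Val {
    value_data: Option<i64>, // simplified stand-in for the proto value payload
}

fn pad_row(values: &mut Vec<Val>, column_count: usize) {
    // Append one null value per missing column; rows already at full width are untouched.
    let diff = column_count.saturating_sub(values.len());
    for _ in 0..diff {
        values.push(Val { value_data: None });
    }
}

fn main() {
    let mut row = vec![Val { value_data: Some(1) }];
    pad_row(&mut row, 3);
    assert_eq!(row.len(), 3);
    assert_eq!(row[1], Val::default());
}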
@@ -122,6 +122,7 @@ macro_rules! http_tests {
        test_pipeline_context,
        test_pipeline_with_vrl,
        test_pipeline_with_hint_vrl,
+        test_pipeline_one_to_many_vrl,
        test_pipeline_2,
        test_pipeline_skip_error,
        test_pipeline_filter,
@@ -3285,6 +3286,151 @@ transform:
    guard.remove_all().await;
}

+/// Test one-to-many VRL pipeline expansion.
+/// This test verifies that a VRL processor can return an array, which results in
+/// multiple output rows from a single input row.
+pub async fn test_pipeline_one_to_many_vrl(storage_type: StorageType) {
+    common_telemetry::init_default_ut_logging();
+    let (app, mut guard) =
+        setup_test_http_app_with_frontend(storage_type, "test_pipeline_one_to_many_vrl").await;
+
+    let client = TestClient::new(app).await;
+
+    // Pipeline that expands events array into multiple rows
+    let pipeline = r#"
+processors:
+  - date:
+      field: timestamp
+      formats:
+        - "%Y-%m-%d %H:%M:%S"
+      ignore_missing: true
+  - vrl:
+      source: |
+        # Extract events array and expand each event into a separate row
+        events = del(.events)
+        base_host = del(.host)
+        base_timestamp = del(.timestamp)
+
+        # Map each event to a complete row object
+        map_values(array!(events)) -> |event| {
+          {
+            "host": base_host,
+            "event_type": event.type,
+            "event_value": event.value,
+            "timestamp": base_timestamp
+          }
+        }
+
+transform:
+  - field: host
+    type: string
+  - field: event_type
+    type: string
+  - field: event_value
+    type: int32
+  - field: timestamp
+    type: time
+    index: timestamp
+"#;
+
+    // 1. create pipeline
+    let res = client
+        .post("/v1/events/pipelines/one_to_many")
+        .header("Content-Type", "application/x-yaml")
+        .body(pipeline)
+        .send()
+        .await;
+    assert_eq!(res.status(), StatusCode::OK);
+
+    // 2. write data - single input with multiple events
+    let data_body = r#"
+[
+  {
+    "host": "server1",
+    "timestamp": "2024-05-25 20:16:37",
+    "events": [
+      {"type": "cpu", "value": 80},
+      {"type": "memory", "value": 60},
+      {"type": "disk", "value": 45}
+    ]
+  }
+]
+"#;
+    let res = client
+        .post("/v1/events/logs?db=public&table=metrics&pipeline_name=one_to_many")
+        .header("Content-Type", "application/json")
+        .body(data_body)
+        .send()
+        .await;
+    assert_eq!(res.status(), StatusCode::OK);
+
+    // 3. verify: one input row should produce three output rows
+    validate_data(
+        "test_pipeline_one_to_many_vrl_count",
+        &client,
+        "select count(*) from metrics",
+        "[[3]]",
+    )
+    .await;
+
+    // 4. verify the actual data
+    validate_data(
+        "test_pipeline_one_to_many_vrl_data",
+        &client,
+        "select host, event_type, event_value from metrics order by event_type",
+        "[[\"server1\",\"cpu\",80],[\"server1\",\"disk\",45],[\"server1\",\"memory\",60]]",
+    )
+    .await;
+
+    // 5. Test with multiple input rows, each producing multiple output rows
+    let data_body2 = r#"
+[
+  {
+    "host": "server2",
+    "timestamp": "2024-05-25 20:17:00",
+    "events": [
+      {"type": "cpu", "value": 90},
+      {"type": "memory", "value": 70}
+    ]
+  },
+  {
+    "host": "server3",
+    "timestamp": "2024-05-25 20:18:00",
+    "events": [
+      {"type": "cpu", "value": 50}
+    ]
+  }
+]
+"#;
+    let res = client
+        .post("/v1/events/logs?db=public&table=metrics&pipeline_name=one_to_many")
+        .header("Content-Type", "application/json")
+        .body(data_body2)
+        .send()
+        .await;
+    assert_eq!(res.status(), StatusCode::OK);
+
+    // 6. verify total count: 3 (from first batch) + 2 + 1 = 6 rows
+    validate_data(
+        "test_pipeline_one_to_many_vrl_total_count",
+        &client,
+        "select count(*) from metrics",
+        "[[6]]",
+    )
+    .await;
+
+    // 7. verify rows per host
+    validate_data(
+        "test_pipeline_one_to_many_vrl_per_host",
+        &client,
+        "select host, count(*) as cnt from metrics group by host order by host",
+        "[[\"server1\",3],[\"server2\",2],[\"server3\",1]]",
+    )
+    .await;
+
+    guard.remove_all().await;
+}
+
pub async fn test_pipeline_2(storage_type: StorageType) {
    common_telemetry::init_default_ut_logging();
    let (app, mut guard) = setup_test_http_app_with_frontend(storage_type, "test_pipeline_2").await;