greptimedb/src/servers/tests/mod.rs
Lei, HUANG f298a110f9 feat: bridge bulk insert (#5927)
* feat/bridge-bulk-insert:
 ## Implement Bulk Insert and Update Dependencies

 - **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData` (a decode sketch follows this list).
 - **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
 - **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
 - **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
 - **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.
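
 A minimal sketch of the decode step, assuming no dictionary-encoded columns: `FlightDecoder` is GreptimeDB's own wrapper in `common-grpc`, so the stock `arrow-flight` helper stands in for it here, and the function name is illustrative rather than the actual `handle_bulk_inserts` code.

 ```rust
 use std::collections::HashMap;
 use std::sync::Arc;

 use arrow::array::ArrayRef;
 use arrow::datatypes::Schema;
 use arrow::error::ArrowError;
 use arrow::record_batch::RecordBatch;
 use arrow_flight::utils::flight_data_to_arrow_batch;
 use arrow_flight::FlightData;

 /// Decodes one Arrow IPC data frame into a `RecordBatch`, given the schema
 /// learned from an earlier schema frame.
 fn decode_data_frame(
     schema: Arc<Schema>,
     data: &FlightData,
 ) -> Result<RecordBatch, ArrowError> {
     // Assumption: the stream carries no dictionary batches.
     let dictionaries: HashMap<i64, ArrayRef> = HashMap::new();
     flight_data_to_arrow_batch(data, schema, &dictionaries)
 }
 ```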

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
 - **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity (a filtering sketch follows this list).
 - **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.
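
 The per-region filtering can be pictured as a boolean mask over rows plus Arrow's filter kernel; a hedged sketch, where the mask-building helper and the row-assignment input are assumptions, not the actual `region_request.rs` code:

 ```rust
 use arrow::array::BooleanArray;
 use arrow::compute::filter_record_batch;
 use arrow::error::ArrowError;
 use arrow::record_batch::RecordBatch;

 /// Builds the mask for `region`: true where the row was assigned to it.
 fn region_mask(assignments: &[u64], region: u64) -> BooleanArray {
     assignments.iter().map(|r| Some(*r == region)).collect()
 }

 /// Copies only the rows belonging to one region into a new batch.
 fn rows_for_region(
     batch: &RecordBatch,
     mask: &BooleanArray,
 ) -> Result<RecordBatch, ArrowError> {
     filter_record_batch(batch, mask)
 }
 ```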

* fix: clippy

* fix schema serialization

* feat/bridge-bulk-insert:
 Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`

 - Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs` (sketched after this list).
 - Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
 - Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.
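
 Roughly what the new variants and `context` call sites look like under snafu; a sketch only, since the real `MetadataError` carries more variants and location info:

 ```rust
 use snafu::{ResultExt, Snafu};

 #[derive(Debug, Snafu)]
 pub enum MetadataError {
     #[snafu(display("Failed to encode/decode flight message"))]
     FlightCodec { source: arrow::error::ArrowError },

     #[snafu(display("Failed to decode protobuf payload"))]
     Prost { source: prost::DecodeError },
 }

 /// The generated `ProstSnafu` selector converts a `prost::DecodeError`
 /// into `MetadataError::Prost` at the call site.
 fn decode_len(buf: &[u8]) -> Result<usize, MetadataError> {
     prost::decode_length_delimiter(buf).context(ProstSnafu)
 }
 ```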

* fix: test

* refactor: rename

* allow empty app_metadata in FlightData

* feat/bridge-bulk-insert:
 - **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
 - **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
 - **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.

* fix: standalone test

* feat/bridge-bulk-insert:
 ### Enhance Bulk Insert Handling and Metadata Management

 - **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
 - **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
 - **`handle_bulk_insert.rs`**:
   - Added `handle_record_batch` function to streamline processing of bulk insert payloads.
   - Improved error handling and task management for bulk insert operations.
   - Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access (sketched below).
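
 The name-to-index map is a one-pass build over the schemas; sketched here against an Arrow `Schema` rather than the actual region metadata types:

 ```rust
 use std::collections::HashMap;

 use arrow::datatypes::Schema;

 /// Maps each column name to its position so later lookups are O(1)
 /// instead of a linear scan per column.
 fn name_index_map(schema: &Schema) -> HashMap<String, usize> {
     schema
         .fields()
         .iter()
         .enumerate()
         .map(|(i, field)| (field.name().clone(), i))
         .collect()
 }
 ```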

* feat/bridge-bulk-insert:
 - **Refactor `handle_bulk_insert.rs`:**
   - Replaced `handle_record_batch` with `handle_payload` for handling payloads.
   - Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.

 - **Optimize `multi_dim.rs`:**
   - Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch` (sketched below).
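
 The single-region fast path amounts to skipping rule evaluation when there is only one possible target; the names below are illustrative, not the actual `MultiDimPartitionRule` API:

 ```rust
 use std::collections::HashMap;

 use arrow::record_batch::RecordBatch;

 type RegionNumber = u32;

 /// Returns row indexes per region. With a single region, every row maps
 /// to it, so no partition expression is evaluated at all.
 fn partition_rows(
     regions: &[RegionNumber],
     batch: &RecordBatch,
 ) -> HashMap<RegionNumber, Vec<usize>> {
     if let [only] = regions {
         return HashMap::from([(*only, (0..batch.num_rows()).collect())]);
     }
     unimplemented!("multi-region path: evaluate partition exprs per row")
 }
 ```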

* feat/bridge-bulk-insert:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
 - **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
 - **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling (see the sketch after this list).
 - **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
 - **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.
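
 The `Bytes` switch matters because slicing a `Bytes` buffer is reference-counted rather than copying; a small sketch, with a hypothetical frame layout:

 ```rust
 use bytes::Bytes;

 /// Splits one wire buffer into header and body views; both share the
 /// original allocation, so no payload bytes are copied.
 fn split_frame(frame: Bytes, header_len: usize) -> (Bytes, Bytes) {
     let header = frame.slice(..header_len);
     let body = frame.slice(header_len..);
     (header, body)
 }
 ```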

* fix: clippy warnings

* feat/bridge-bulk-insert:
 **Add Metrics and Improve Error Handling**

 - **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to monitor performance (registration sketched after this list).
 - **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
 - **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
 - **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.
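
 The histograms follow the usual `lazy_static` + `prometheus` registration pattern; the metric name and help string below are placeholders, not the values in `metrics.rs`:

 ```rust
 use lazy_static::lazy_static;
 use prometheus::{register_histogram, Histogram};

 lazy_static! {
     static ref HANDLE_BULK_INSERT_ELAPSED: Histogram = register_histogram!(
         "handle_bulk_insert_elapsed", // placeholder metric name
         "elapsed time of handling one bulk insert request"
     )
     .unwrap();
 }

 fn handle_bulk_insert() {
     // The timer observes elapsed seconds into the histogram when dropped.
     let _timer = HANDLE_BULK_INSERT_ELAPSED.start_timer();
     // ... decode, partition, and dispatch the request ...
 }
 ```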

* chore: rebase main

* chore: merge main
2025-05-06 09:53:25 +00:00


// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

use std::sync::Arc;

use api::v1::greptime_request::Request;
use api::v1::query_request::Query;
use arrow_flight::FlightData;
use async_trait::async_trait;
use catalog::memory::MemoryCatalogManager;
use common_base::AffectedRows;
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_grpc::flight::FlightDecoder;
use common_query::Output;
use datafusion_expr::LogicalPlan;
use query::options::QueryOptions;
use query::parser::{PromQuery, QueryLanguageParser, QueryStatement};
use query::query_engine::DescribeResult;
use query::{QueryEngineFactory, QueryEngineRef};
use servers::error::{Error, NotSupportedSnafu, Result};
use servers::query_handler::grpc::{GrpcQueryHandler, ServerGrpcQueryHandlerRef};
use servers::query_handler::sql::{ServerSqlQueryHandlerRef, SqlQueryHandler};
use session::context::QueryContextRef;
use snafu::ensure;
use sql::statements::statement::Statement;
use table::metadata::TableId;
use table::table_name::TableName;
use table::TableRef;

mod grpc;
mod http;
mod interceptor;
mod mysql;
mod postgres;

const LOCALHOST_WITH_0: &str = "127.0.0.1:0";

pub struct DummyInstance {
    query_engine: QueryEngineRef,
}

impl DummyInstance {
    fn new(query_engine: QueryEngineRef) -> Self {
        Self { query_engine }
    }
}

#[async_trait]
impl SqlQueryHandler for DummyInstance {
    type Error = Error;

    async fn do_query(&self, query: &str, query_ctx: QueryContextRef) -> Vec<Result<Output>> {
        let stmt = QueryLanguageParser::parse_sql(query, &query_ctx).unwrap();
        let plan = self
            .query_engine
            .planner()
            .plan(&stmt, query_ctx.clone())
            .await
            .unwrap();
        let output = self.query_engine.execute(plan, query_ctx).await.unwrap();
        vec![Ok(output)]
    }

    async fn do_exec_plan(&self, plan: LogicalPlan, query_ctx: QueryContextRef) -> Result<Output> {
        Ok(self.query_engine.execute(plan, query_ctx).await.unwrap())
    }

    async fn do_promql_query(
        &self,
        _: &PromQuery,
        _: QueryContextRef,
    ) -> Vec<std::result::Result<Output, Self::Error>> {
        unimplemented!()
    }

    async fn do_describe(
        &self,
        stmt: Statement,
        query_ctx: QueryContextRef,
    ) -> Result<Option<DescribeResult>> {
        if let Statement::Query(_) = stmt {
            let plan = self
                .query_engine
                .planner()
                .plan(&QueryStatement::Sql(stmt), query_ctx.clone())
                .await
                .unwrap();
            let schema = self.query_engine.describe(plan, query_ctx).await.unwrap();
            Ok(Some(schema))
        } else {
            Ok(None)
        }
    }

    async fn is_valid_schema(&self, catalog: &str, schema: &str) -> Result<bool> {
        Ok(catalog == DEFAULT_CATALOG_NAME && schema == DEFAULT_SCHEMA_NAME)
    }
}

#[async_trait]
impl GrpcQueryHandler for DummyInstance {
    type Error = Error;

    async fn do_query(
        &self,
        request: Request,
        ctx: QueryContextRef,
    ) -> std::result::Result<Output, Self::Error> {
        let output = match request {
            Request::Inserts(_)
            | Request::Deletes(_)
            | Request::RowInserts(_)
            | Request::RowDeletes(_) => unimplemented!(),
            Request::Query(query_request) => {
                let query = query_request.query.unwrap();
                match query {
                    Query::Sql(sql) => {
                        let mut result = SqlQueryHandler::do_query(self, &sql, ctx).await;
                        ensure!(
                            result.len() == 1,
                            NotSupportedSnafu {
                                feat: "execute multiple statements in SQL query string through GRPC interface"
                            }
                        );
                        result.remove(0)?
                    }
                    Query::LogicalPlan(_) | Query::InsertIntoPlan(_) => unimplemented!(),
                    Query::PromRangeQuery(promql) => {
                        let prom_query = PromQuery {
                            query: promql.query,
                            start: promql.start,
                            end: promql.end,
                            step: promql.step,
                            lookback: promql.lookback,
                        };
                        let mut result =
                            SqlQueryHandler::do_promql_query(self, &prom_query, ctx).await;
                        ensure!(
                            result.len() == 1,
                            NotSupportedSnafu {
                                feat: "execute multiple statements in PromQL query string through GRPC interface"
                            }
                        );
                        result.remove(0)?
                    }
                }
            }
            Request::Ddl(_) => unimplemented!(),
        };
        Ok(output)
    }

    async fn put_record_batch(
        &self,
        table: &TableName,
        table_id: &mut Option<TableId>,
        decoder: &mut FlightDecoder,
        data: FlightData,
    ) -> std::result::Result<AffectedRows, Self::Error> {
        let _ = table;
        let _ = data;
        let _ = table_id;
        let _ = decoder;
        unimplemented!()
    }
}

fn create_testing_instance(table: TableRef) -> DummyInstance {
    let catalog_manager = MemoryCatalogManager::new_with_table(table);
    let query_engine = QueryEngineFactory::new(
        catalog_manager,
        None,
        None,
        None,
        None,
        false,
        QueryOptions::default(),
    )
    .query_engine();
    DummyInstance::new(query_engine)
}

fn create_testing_sql_query_handler(table: TableRef) -> ServerSqlQueryHandlerRef {
    Arc::new(create_testing_instance(table)) as _
}

fn create_testing_grpc_query_handler(table: TableRef) -> ServerGrpcQueryHandlerRef {
    Arc::new(create_testing_instance(table)) as _
}