Set up tests for multiple storage backends
To run the integration tests, copy .env.example to .env in the project root folder and change the values as needed.
Take S3 as an example. You need to set your S3 bucket, region, access key ID, and secret access key:
# Settings for s3 test
GT_S3_BUCKET=S3 bucket
GT_S3_REGION=S3 region
GT_S3_ACCESS_KEY_ID=S3 access key id
GT_S3_ACCESS_KEY=S3 secret access key
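The settings above can be read from the environment before an S3-backed test runs, so the test can be skipped when they are absent. The following is a minimal sketch, not the project's actual test harness: `S3TestConfig`, `s3_config_from`, and `s3_config_from_env` are hypothetical names, and only the four variable names come from this README.

```rust
use std::env;

/// Settings an S3-backed integration test needs.
/// Hypothetical struct for illustration; not part of the project.
#[derive(Debug, PartialEq)]
pub struct S3TestConfig {
    pub bucket: String,
    pub region: String,
    pub access_key_id: String,
    pub secret_access_key: String,
}

/// Builds the config from any key lookup.
/// Returns `None` if a variable is missing or blank.
pub fn s3_config_from(lookup: impl Fn(&str) -> Option<String>) -> Option<S3TestConfig> {
    // Treat blank values the same as unset ones.
    let get = |key: &str| lookup(key).filter(|v| !v.trim().is_empty());
    Some(S3TestConfig {
        bucket: get("GT_S3_BUCKET")?,
        region: get("GT_S3_REGION")?,
        access_key_id: get("GT_S3_ACCESS_KEY_ID")?,
        secret_access_key: get("GT_S3_ACCESS_KEY")?,
    })
}

/// Reads the same variables from the process environment.
pub fn s3_config_from_env() -> Option<S3TestConfig> {
    s3_config_from(|key| env::var(key).ok())
}
```

A test could call `s3_config_from_env()` first and return early when it yields `None`. Taking the lookup as a parameter keeps the parsing logic testable without touching the real environment.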
Run
Execute the following command in the project root folder:
cargo test integration
Test s3 storage:
cargo test s3
Test oss storage:
cargo test oss
Test azblob storage:
cargo test azblob
Set up tests with Kafka WAL
To run the integration tests, copy .env.example to .env in the project root folder and change the values as needed.
GT_KAFKA_ENDPOINTS = localhost:9092
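A test can turn the `GT_KAFKA_ENDPOINTS` value into a list of broker addresses before connecting. The sketch below is illustrative only: it assumes that multiple endpoints are separated by commas, which this README does not state, and `parse_kafka_endpoints` and `kafka_endpoints_from_env` are hypothetical helper names.

```rust
use std::env;

/// Splits a `GT_KAFKA_ENDPOINTS` value into individual `host:port` entries.
/// Assumption: multiple brokers are separated by commas.
pub fn parse_kafka_endpoints(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim) // tolerate spaces around each entry
        .filter(|s| !s.is_empty()) // drop empty entries such as a trailing comma
        .map(String::from)
        .collect()
}

/// Reads the variable from the process environment.
/// Returns `None` when it is unset or contains no endpoints.
pub fn kafka_endpoints_from_env() -> Option<Vec<String>> {
    let raw = env::var("GT_KAFKA_ENDPOINTS").ok()?;
    let endpoints = parse_kafka_endpoints(&raw);
    if endpoints.is_empty() {
        None
    } else {
        Some(endpoints)
    }
}
```

Trimming each entry means a value with surrounding whitespace still resolves to a clean `localhost:9092` address.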
Set up Kafka standalone
cd tests-integration/fixtures
docker compose -f docker-compose.yml up kafka
Set up tests with etcd TLS
This guide explains how to set up and test TLS-enabled etcd connections in GreptimeDB integration tests.
Quick Start
TLS certificates are already at tests-integration/fixtures/etcd-tls-certs/.
- Start TLS-enabled etcd:
cd tests-integration/fixtures
docker compose up etcd-tls -d
- Start all services (including etcd-tls):
cd tests-integration/fixtures
docker compose up -d --wait
Certificate Details
The checked-in certificates include:
- ca.crt: Certificate Authority certificate
- server.crt/server-key.pem: Server certificate for etcd-tls service
- client.crt/client-key.pem: Client certificate for connecting to etcd-tls
The server certificate includes SANs for localhost, etcd-tls, 127.0.0.1, and ::1.
Regenerating Certificates (Optional)
If you need to regenerate the etcd certificates:
# Regenerate certificates (overwrites existing ones)
./scripts/generate-etcd-tls-certs.sh
# Or generate in custom location
./scripts/generate-etcd-tls-certs.sh /path/to/cert/directory
If you need to regenerate the MySQL and PostgreSQL certificates:
# Regenerate certificates (overwrites existing ones)
./scripts/generate_certs.sh
# Or generate in custom location
./scripts/generate_certs.sh /path/to/cert/directory
Note: The checked-in certificates are for testing purposes only and should never be used in production.