* feat: support flat format in SeqScan
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support flat format in unordered scan
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support parallel read for flat format in SeqScan
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: rename flat DedupReader to FlatDedupReader
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: address review comments
It also precomputes the input arrow schema
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: add FlushRegionsV2 instruction with unified semantics
- Add FlushRegionsV2 struct supporting both single and batch operations
- Preserve original FlushRegion(RegionId) API for backward compatibility
- Support configurable FlushStrategy (Sync/Async) and FlushErrorStrategy (FailFast/TryAll)
- Add detailed per-region error reporting in FlushRegionReply
- Update datanode handlers to support both legacy and enhanced flush instructions
- Maintain zero breaking changes through automatic conversion of legacy formats
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* chore: run make fmt
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* Apply suggestions from code review
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* refactor: extract shared perform_region_flush fn
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* refactor: use consistent error type across similar methods
see gh copilot suggestion: https://github.com/GreptimeTeam/greptimedb/pull/6819#discussion_r2299603698
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* chore: make fmt
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* refactor: consolidate FlushRegion instructions
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
---------
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* feat: implements method to write flat batch for ParquetWriter
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: add update method for flat RecordBatch in Indexer
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: calls indexer to write flat batch in ParquetWriter
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: handle empty projection for flat format
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: eval array in precise_filter_flat
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: cache column lookup result in inverted indexer
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add test
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support dict type in dense codec
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: remove read part in test as it need modifying the reader
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support dictionary type in other methods for dense codec
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: fulltext use string array directly
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore/change-encode-raw-values-sig:
### Update Sparse Encoding to Use Byte Slices
- **`bench_sparse_encoding.rs`**: Modified the `encode_raw_tag_value` function to use byte slices instead of `Bytes` for tag values.
- **`sparse.rs`**: Updated the `encode_raw_tag_value` method in `SparsePrimaryKeyCodec` to accept byte slices (`&[u8]`) instead of `Bytes`. Adjusted related test cases to reflect this change.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/change-encode-raw-values-sig:
### Add `Clear` Trait Implementation for Byte Slices
- Implemented the `Clear` trait for byte slices (`&[u8]`) in `repeated_field.rs` to enhance trait coverage and provide a default clear operation for byte slice types.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: update rate limiter to use semaphore that will block without return error
Signed-off-by: Ning Sun <sunning@greptime.com>
* fix: remove unused error
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: support different key type for the dictionary vector
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support more dictionary type in try_into_vector
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: use key array's type as key type
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: cache the cloned page bytes
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: cache the whole row group pages
The opendal reader may merge IO requests so the pages of different
columns can share the same Bytes.
When we use a per-column page cache, the page cache may still referencing
the whole Bytes after eviction if there are other columns in the cache that
share the same Bytes.
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: check possible max byte range and copy pages if needed
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: always copy pages
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: returns the copied pages
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: compute cache size by MERGE_GAP
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: align to buf size
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: aligh to 2MB
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: remove unused code
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix typo
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix parquet read with cache test
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: Add support for TWCS time window hints in insert operations
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: set system events table time window to 1d
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* perf/sparse-encoder:
- **Update Dependencies**: Updated `criterion-plot` to version `0.5.0` and added `criterion` version `0.7.0` in `Cargo.lock`. Added `bytes` to `Cargo.toml` in `src/metric-engine`.
- **Benchmarking**: Added a new benchmark for sparse encoding in `bench_sparse_encoding.rs` and updated `Cargo.toml` in `src/mito-codec` to include `criterion` as a dev-dependency.
- **Sparse Encoding Enhancements**: Modified `SparsePrimaryKeyCodec` in `sparse.rs` to include new methods `encode_raw_tag_value` and `encode_internal`. Added public constants `RESERVED_COLUMN_ID_TSID` and `RESERVED_COLUMN_ID_TABLE_ID`.
- **HTTP Server**: Made `try_decompress` function public in `prom_store.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/sparse-encoder:
Improve buffer handling in `sparse.rs`
- Refactored buffer reservation logic to use `value_len` for clarity.
- Optimized chunk processing by calculating `num_chunks` and `remainder` for efficient data handling.
- Enhanced manual serialization of bytes to avoid byte-by-byte operations, improving performance.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Update src/mito-codec/src/row_converter/sparse.rs
Co-authored-by: Yingwen <realevenyag@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
* test: add failing test for #6791
* test: add support for = and =~
* fix: lint
* fix: code merge issue
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* chore: improve error message when there are more than one time index
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: style
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: optimize CreateFlowData with lightweight FlowQueryContext
Replace full QueryContext with FlowQueryContext containing only essential fields (catalog, schema, timezone) in CreateFlowData struct. This improves serialization performance by eliminating unused extensions HashMap and channel fields.
Key changes:
- Add FlowQueryContext struct with conversion implementations
- Update CreateFlowData to use FlowQueryContext with backward compatibility
- Add tests for serialization and conversions
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* chore: run make fmt
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
---------
Signed-off-by: Alex Araujo <alexaraujo@gmail.com>
* refactor: use DataFusion's UDAF implementation directly
Signed-off-by: luofucong <luofc@foxmail.com>
* remove: delete how-to guide for writing aggregate functions
Signed-off-by: luofucong <luofc@foxmail.com>
* fix ci
Signed-off-by: luofucong <luofc@foxmail.com>
* refactor: port json_encode_path to datafusion udaf
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: luofucong <luofc@foxmail.com>
Signed-off-by: Ning Sun <sunning@greptime.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
* feat: Implements FlatLastNonNull strategy
Dedup rows and keep last non null fields
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add basic test for FlatLastNonNull
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: port more last non null test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: do merge rows after delete op
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: rename num_pk_columns to field_column_start
So we can support different format later
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: address comment
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: add u64 for EqualValue and set expr is true when filter is empty
* Update src/log-query/src/log_query.rs
Co-authored-by: Yingwen <realevenyag@gmail.com>
* chore: update EqualValue Uinit to UInt
---------
Co-authored-by: Yingwen <realevenyag@gmail.com>
chore/impl-cast-to-primitives-for-path-type:
### Add `num_enum` for Enum Conversion and Update `PathType`
- **Added `num_enum` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include `num_enum` for enum conversion functionality.
- Files: `Cargo.lock`, `src/store-api/Cargo.toml`
- **Enhanced `PathType` Enum**: Implemented `TryFromPrimitive` for `PathType` to enable conversion from primitive types.
- Files: `src/store-api/src/region_request.rs`
- **Added Unit Tests**: Introduced tests to verify the conversion of `PathType` enum to and from primitive types.
- Files: `src/store-api/src/region_request.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* test: reproduce windows ci issue
* chore: update sqlx
* chore: update pgwire
* chore: update to a debug version of pgwire
* fix: update pgwire to resolve peek after read on windows
* ci: remove windows task from regular ci
refactor: extract the common codes of creating proto ColumnSchema and Row to helper functions
fix: explicitly set the follower max sequence when finding extension ranges to avoid potential concurrency hazard
Signed-off-by: luofucong <luofc@foxmail.com>
* feat: update pgwire api
* feat: update pgwire and override on_query/on_execute
* feat: update pgwire to 0.32
* chore: remove code example
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* perf: cached reader do not get page concurrently
Otherwise they will all fetch the same pages in parallel
Signed-off-by: evenyag <realevenyag@gmail.com>
* perf: always disable zstd for bloom
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore/optimize-catalog:
### Add `table_id` Method to `CatalogManager`
- **Files Modified**:
- `src/catalog/src/kvbackend/manager.rs`
- `src/catalog/src/lib.rs`
- **Key Changes**:
- Introduced a new asynchronous method `table_id` in the `CatalogManager` trait to retrieve the table ID based on catalog, schema, and table name.
- Implemented the `table_id` method in `KvBackendCatalogManager` to fetch the table ID from the system catalog or cache, with a fallback to `pg_catalog` for Postgres channels.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/optimize-catalog:
### Add `table_info_by_id` Method to Catalog Managers
- **`manager.rs`**: Introduced the `table_info_by_id` method in `KvBackendCatalogManager` to retrieve table information by table ID using the `TableInfoCacheRef`.
- **`lib.rs`**: Updated the `CatalogManager` trait to include the new `table_info_by_id` method.
- **`memory/manager.rs`**: Implemented the `table_info_by_id` method in `MemoryCatalogManager` to fetch table information by table ID from in-memory catalogs.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: not mark all deleted when partial trunc¬ update manifest when partial file range is empty
Signed-off-by: discord9 <discord9@163.com>
* docs: note
Signed-off-by: discord9 <discord9@163.com>
---------
Signed-off-by: discord9 <discord9@163.com>
* feat: initial support for __schema__ in label values
* feat: filter database with matches
* refactor: skip unnecessary check
* fix: resolve schema matcher in label values
* test: add a test case for table not exists
* refactor: add matchop check on db label
* chore: merge main
fix/compaction-concurrency:
Add delay before compaction in `compaction_test.rs`
- Introduced a 2-millisecond delay using `tokio::time::sleep` before the `compact` function call in `test_compaction_region_with_overlapping_delete_all` to ensure proper timing and synchronization during the test execution.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add `SET DEFAULT` syntax
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* test: add `CURRENT_TIMESTAMP()` as default value for `SET DEFAULT` syntax
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* refactor: Make the error types more precise.
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* chore: a minor error display enchancement for `SET DEFAULT`
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* refactor: Using `MODIFY COLUMN` for `DROP/SET DEFUALT`
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* chore: update `greptime-proto`
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
---------
Signed-off-by: Yihai Lin <yihai-lin@foxmail.com>
* feat: supports more db options
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: use btree map for consistent results
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: adds compaction keys into valid db options
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* bulk-multiparts-merge-reader:
**Enhance Memtable Iteration and Flushing Logic**
- **`flush.rs`**: Updated `RegionFlushTask` to handle multiple ranges using `MergeReaderBuilder` for improved source management during flush operations.
- **`memtable.rs`**: Introduced `build_prune_iter` and `build_iter` methods in `MemtableRange` for flexible iteration. Added `MemtableRanges` struct to manage multiple contexts.
- **`simple_bulk_memtable.rs`**: Refactored to use `BatchIterBuilder` and `BatchIterBuilderDeprecated` for iteration, supporting new `read_to_values` method in `Series`.
- **`time_series.rs`**: Added `read_to_values` and `finish_cloned` methods in `Series` and `ValueBuilder` for efficient data handling.
- **`scan_util.rs`**: Replaced `build_iter` with `build_prune_iter` for range iteration, enhancing scan utility.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
- **Add Rayon for Parallel Processing**: Introduced `rayon` for parallel processing in `simple_bulk_memtable.rs` and updated `Cargo.toml` and `Cargo.lock` to include `rayon` dependency.
- **Enhance Benchmarking**: Added new benchmarks in `simple_bulk_memtable.rs` to compare parallel vs sequential processing, projection, sequence filtering, and write performance.
- **Make Structs and Methods Public**: Changed visibility of several structs and methods to `pub` in `simple_bulk_memtable.rs`, `memtable.rs`, `time_series.rs`, and `test_util.rs` to facilitate testing and benchmarking.
- **Update Criterion Features**: Modified `Cargo.toml` to include `html_reports` feature for `criterion`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
### Commit Summary
- **Refactor `SimpleBulkMemtable`**:
- Moved `ranges_sequential` function to a new `test_only` module and made it a method of `SimpleBulkMemtable`.
- Made several fields in `SimpleBulkMemtable` private and added a `region_metadata` getter.
- Affected files: `simple_bulk_memtable.rs`, `test_only.rs`.
- **Benchmark Adjustments**:
- Updated benchmark functions to use the new `ranges_sequential` method.
- Affected file: `simple_bulk_memtable.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
### Add Test Configuration for `iter` Method in Memtable Implementations
- **Enhancements**:
- Added `#[cfg(any(test, feature = "test"))]` attribute to the `iter` method in various `Memtable` implementations to enable conditional compilation for testing purposes.
- Affected files:
- `src/mito2/src/memtable.rs`
- `src/mito2/src/memtable/bulk.rs`
- `src/mito2/src/memtable/partition_tree.rs`
- `src/mito2/src/memtable/simple_bulk_memtable.rs`
- `src/mito2/src/memtable/time_series.rs`
- `src/mito2/src/test_util/memtable_util.rs`
- **Benchmark Adjustments**:
- Removed `black_box` usage in `bench_memtable_write_performance` function to streamline benchmarking.
- Affected file: `src/mito2/benches/simple_bulk_memtable.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
**Enhance Async Support and Refactor Iteration in `mito2`**
- **Add Async Features**: Updated `Cargo.toml` to include `async` and `async_tokio` features for `criterion`.
- **Async Iteration**: Introduced async functions `flush` and `flush_original` in `simple_bulk_memtable.rs` to handle memtable flushing using async iterators.
- **Refactor Iteration Logic**: Moved `create_iter` and `BatchIterBuilderDeprecated` to `test_only.rs` for better separation of concerns.
- **Public API Change**: Made `next_batch` in `read.rs` public to support async batch processing.
- **Benchmark Updates**: Modified benchmarks in `simple_bulk_memtable.rs` to use async runtime for performance testing.
Files affected: `Cargo.toml`, `simple_bulk_memtable.rs`, `test_only.rs`, `read.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
**Enhance Benchmarking for Memtable**
- Refactored `create_large_memtable` to `create_memtable_with_rows` in `simple_bulk_memtable.rs` to allow dynamic row count configuration.
- Introduced parameterized benchmarking in `bench_ranges_parallel_vs_sequential` to test various row counts, improving the flexibility and coverage of performance tests.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
### Enhance Memory Management and Public API
- **`builder.rs`**: Made `next_offset` method public to allow external access to offset calculations.
- **`simple_bulk_memtable.rs`**: Simplified the `series.extend` method by removing the iterator conversion for `fields`.
- **`time_series.rs`**:
- Added `can_accommodate` method to `ValueBuilder` to check if fields can be accommodated without offset overflow.
- Modified `extend` method to use a `Vec` for `fields` instead of an iterator, improving memory management and error handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
Add License and Enhance Testing in `simple_bulk_memtable.rs`
- Added Apache License header to `simple_bulk_memtable.rs`.
- Modified test configuration in `simple_bulk_memtable.rs` to include `any(test, feature = "test")`.
- Introduced a new test `test_write_read_large_string` in `simple_bulk_memtable.rs` to verify handling of large strings.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
Update `Cargo.toml` dependencies
- Adjust features for `common-meta` and `mito-codec` to include "testing".
- Maintain `criterion` version and features for async support.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
### Update Predicate Type in Memtable Iterators
- **Files Modified**:
- `src/mito2/src/memtable.rs`
- `src/mito2/src/memtable/bulk.rs`
- `src/mito2/src/memtable/simple_bulk_memtable.rs`
- **Key Changes**:
- Updated the `iter` method in `Memtable` trait and its implementations to use `Option<table::predicate::Predicate>` instead of `Option<Predicate>`.
- Adjusted return type in `BulkMemtable`'s `iter` method to `Result<crate::memtable::BoxedBatchIterator>`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
**Enhance Memtable Functionality**
- **`memtable.rs`**:
- Added `Clone` trait to `MemtableStats` and made `num_ranges` public.
- Introduced `num_rows` field in `MemtableRange` and updated its constructor.
- Added `num_rows` method to `MemtableRange`.
- **`partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`**:
- Updated `MemtableRange` instantiation to include `num_rows`.
- **`range.rs`**:
- Refactored `MemRangeBuilder` to handle a single `MemtableRange` and `MemtableStats`.
- **`scan_region.rs`**:
- Enhanced memtable filtering based on time range and updated `MemRangeBuilder` usage.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
**Enhancements and Bug Fixes**
- **Deduplication Enhancements**:
- Introduced `DedupReader` and `LastRow` as public structs in `dedup.rs` to enhance deduplication capabilities.
- Added `LastNonNull` deduplication strategy in `flush.rs` and `simple_bulk_memtable.rs`.
- **Memtable Improvements**:
- Updated `SimpleBulkMemtable` to support batch size configuration and deduplication strategies.
- Modified `Series` struct in `time_series.rs` to include a configurable capacity.
- **Testing Enhancements**:
- Added new test `test_write_dedup` in `simple_bulk_memtable.rs` to verify deduplication functionality.
- Updated existing tests to include `OpType` parameter for better operation type handling.
- **Refactoring**:
- Renamed `BatchIterBuilder` to `BatchRangeBuilder` in `simple_bulk_memtable.rs` for clarity.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* bulk-multiparts-merge-reader:
- **Refactor `flush.rs`:** Removed `LastNonNullIter` usage and adjusted `DedupReader` instantiation to use `LastRow::new(false)` and `LastNonNull::new(false)`.
- **Enhance `simple_bulk_memtable.rs`:** Added logic to handle `LastNonNull` merge mode in `IterBuilder`. Introduced new tests: `test_delete_only` and `test_single_range` to verify delete operations and single range handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: tests
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: remove staled manifest structures
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* add RegionId to FileId
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* rename method
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* fix test cases
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* fix test
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* refactor: introduce RegionFileId
- FileId still only consist of an uuid
- PathProvider accepts RegionFileId and doesn't need to keep a region id
in it
- All Index applier takes RegionFileId and respects the region id in the RegionFileId
- FileMeta can still derive Serialize/Deserialize
- Refactor the CacheManager to accept RegionFileId
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: define PathType
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: adding PathType WIP
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: fix compiler errors
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: add path_type to region_dir_from_table_dir
Move region_dir_from_table_dir to mito and use join_dir internally
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: set path type to ApplierBuilder
Signed-off-by: evenyag <realevenyag@gmail.com>
* style: fmt code
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: fix passing incorrect dir to access layer
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove region_dir from CompactionRegion
We can get table_dir and path_type from the access layer
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix unit tests
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix typo
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: update comment
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: correct marker path
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: use AccessLayer::build_region_dir to get region dir
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: log entries in test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: set path type in catchup
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix test_open_region_failure test
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix compiler errors
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat/add-sst-file-num-in-region-stat:
### Add SST File Count to Region Statistics
- **Enhancements**:
- Added `sst_num` to track the number of SST files in region statistics across multiple modules.
- Updated `RegionStat` and `RegionStatistic` structs in `datanode.rs` and `region_engine.rs` to include `sst_num`.
- Modified `MitoRegion` and `SstVersion` in `region.rs` and `version.rs` to compute and return the number of SST files.
- Adjusted test cases in `collect_leader_region_handler.rs`, `failure_handler.rs`, `region_lease_handler.rs`, and `weight_compute.rs` to initialize `sst_num`.
- Updated `get_region_statistic` in `utils.rs` to sum `sst_num` from metadata and data statistics.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/add-sst-file-num-in-region-stat:
Add `sst_num` to `region_statistics`
- Updated `region_statistics.rs` to include a new constant `SST_NUM` and added it to the schema and builder structures.
- Modified `information_schema.result` to reflect the addition of `sst_num` in the `region_statistics` table.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-opendal-dashboard:
### Update Grafana Dashboard Queries
- **Enhanced Metrics Queries**: Updated Prometheus queries in `dashboard.json`, `dashboard.md`, and `dashboard.yaml` files for both `cluster` and `standalone` dashboards to include additional operations (`Reader::read`, `Writer::write`, `Writer::close`) in the metrics calculations.
- **Legend Format Adjustments**: Modified legend formats to include the `operation` field for better clarity in visualizations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/update-opendal-dashboard:
Enhance Legend Format in Grafana Dashboards
- Updated the `legendFormat` in `dashboard.json`, `dashboard.md`, and `dashboard.yaml` files for both `cluster` and `standalone` dashboards to include the `operation` field.
- This change affects the following files:
- `grafana/dashboards/metrics/cluster/dashboard.json`
- `grafana/dashboards/metrics/cluster/dashboard.md`
- `grafana/dashboards/metrics/cluster/dashboard.yaml`
- `grafana/dashboards/metrics/standalone/dashboard.json`
- `grafana/dashboards/metrics/standalone/dashboard.md`
- `grafana/dashboards/metrics/standalone/dashboard.yaml`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: supports null reponse format for http API
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: license header and assertion
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: in seconds
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix/check-grpc-client-unavailable:
Improve async handling in `greptime_handler.rs`
- Updated the `DoPut` response handling to use `await` with `result_sender.send` for better asynchronous operation.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/check-grpc-client-unavailable:
### Improve Error Handling in `greptime_handler.rs`
- Enhanced error handling for the `DoPut` operation by switching from `send` to `try_send` for the `result_sender`.
- Added specific logging for unreachable clients, including `request_id` in the warning message.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* wip
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
### Commit Message
Enhance DDL Module Accessibility and Refactor `verify_alter` Function
- **`statement.rs`**: Made the `ddl` module public to enhance accessibility.
- **`ddl.rs`**:
- Made `NAME_PATTERN_REG` public for broader usage.
- Refactored `verify_alter` function to be a standalone public function, improving modularity and reusability.
- Made `parse_partitions` function public to allow external access.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
### Add Parquet Writer and Enhance Row Modifier
- **Add Parquet Writer Module**: Introduced a new module `parquet_writer.rs` to bridge `opendal` `Writer` with `parquet` `AsyncFileWriter`.
- **Enhance Row Modifier**: Updated `RowModifier` to use `Default` trait and made `fill_internal_columns` a public static method in `row_modifier.rs`.
- **Expose Internal Structures**: Made `RowsIter`, `RowIter`, `TablesBuilder`, and `TableBuilder` structs public in `row_modifier.rs` and `prom_row_builder.rs`.
- **Update Metric Engine**: Changed `RowModifier` instantiation to use `default()` in `engine.rs`.
- **Modify Table Options Handling**: Added `fill_table_options_for_create` function in `insert.rs` to handle table options based on `AutoCreateTableType`.
- **Make Constants Public**: Changed `DEFAULT_ROW_GROUP_SIZE` to public in `parquet.rs`.
- **Expose Functions**: Made `extract_add_columns_expr` public in `expr_helper.rs` and `AutoCreateTableType` public in `insert.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
### Commit Message
Enhance HTTP Server and Prometheus Integration
- **`http.rs`**: Made `extractor` module public to allow external access.
- **`prom_store.rs`**: Refactored `decode_remote_write_request` to return `TablesBuilder` and adjusted logic for processing requests based on pipeline usage.
- **`lib.rs`**: Made `metrics` module public for broader accessibility.
- **`prom_row_builder.rs`**: Exposed `tables` field in `TablesBuilder` for external manipulation.
- **`proto.rs`**: Changed visibility of `table_data` in `PromWriteRequest` to `pub(crate)` for internal module access.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
### Add Accessor Methods for Managers and Executors
- **`src/frontend/src/instance.rs`**: Added accessor methods for `NodeManagerRef`, `PartitionRuleManagerRef`, `CacheInvalidatorRef`, and `ProcedureExecutorRef` to the `Instance` struct.
- **`src/operator/src/insert.rs`**: Introduced methods to access `NodeManagerRef` and `PartitionRuleManagerRef` in the `Inserter` struct.
- **`src/operator/src/statement.rs`**: Added methods to retrieve `ProcedureExecutorRef` and `CacheInvalidatorRef` in the `StatementExecutor` struct.
### Change HashMap Implementation
- **`src/servers/src/prom_row_builder.rs`**: Replaced `ahash::HashMap` with `std::collections::HashMap`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
Refactor table option handling in `insert.rs`
- Replaced `Vec` with `HashMap` for `table_options` to improve efficiency.
- Extracted logic for filling table options into a new function `fill_table_options_for_create`.
- Modified `fill_table_options_for_create` to return the engine name based on `create_type`.
- Simplified the insertion of table options into `create_table_expr` by using `extend` method.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/expose-bulk-symbols:
Refactor `insert.rs` to separate engine name logic from table options
- Updated `Inserter` implementation to determine `engine_name` separately from `fill_table_options_for_create`.
- Modified `fill_table_options_for_create` to no longer return an engine name, focusing solely on populating table options.
- Adjusted logic to set `engine_name` based on `AutoCreateTableType`, using `METRIC_ENGINE_NAME` for logical tables and `default_engine()` otherwise.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: allow alternative version string
* refactor: rename original version function to verbose_version
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* chore: allow float number literal as step
Signed-off-by: Ning Sun <sunning@greptime.com>
* chore: switch to released version of promql parser
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
refactor/expose-config:
### Make SubCommand and Fields Public in `frontend.rs`
- Made `subcmd` field in `Command` struct public.
- Made `SubCommand` enum public.
- Made `config_file` and `env_prefix` fields in `StartCommand` struct public.
These changes enhance the accessibility of command-related structures and fields, facilitating external usage and integration.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/filter-empty-batch-in-bulk-insert-api:
**Add Early Return for Empty Record Batches in `bulk_insert.rs`**
- Implemented an early return in the `Inserter` implementation to handle cases where `record_batch.num_rows()` is zero, improving efficiency by avoiding unnecessary processing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/filter-empty-batch-in-bulk-insert-api:
**Improve Bulk Insert Handling**
- **`handle_bulk_insert.rs`**: Added a check to handle cases where the batch has zero rows, immediately returning and sending a success response with zero rows processed.
- **`bulk_insert.rs`**: Enhanced logic to skip processing for masks that select none, optimizing the bulk insert operation by avoiding unnecessary iterations.
These changes improve the efficiency and robustness of the bulk insert process by handling edge cases more effectively.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/filter-empty-batch-in-bulk-insert-api:
### Refactor and Error Handling Enhancements
- **Refactored Timestamp Handling**: Introduced `timestamp_array_to_primitive` function in `timestamp.rs` to streamline conversion of timestamp arrays to primitive arrays, reducing redundancy in `handle_bulk_insert.rs` and `bulk_insert.rs`.
- **Error Handling**: Added `InconsistentTimestampLength` error in `error.rs` to handle mismatched timestamp column lengths in bulk insert operations.
- **Bulk Insert Logic**: Updated `handle_bulk_insert.rs` to utilize the new timestamp conversion function and added checks for timestamp length consistency.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/filter-empty-batch-in-bulk-insert-api:
**Refactor `bulk_insert.rs` to streamline imports**
- Simplified import statements by removing unused timestamp-related arrays and data types from the `arrow` crate in `bulk_insert.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: actually split window to limit time range
feat: truly limit time range by split window
Update src/flow/src/batching_mode/state.rs
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Signed-off-by: discord9 <discord9@163.com>
* chore: added stalled time window range
Signed-off-by: discord9 <discord9@163.com>
* fix: not flush all time range as too expensive
Signed-off-by: discord9 <discord9@163.com>
* test: make it more robust
Signed-off-by: discord9 <discord9@163.com>
* what
Signed-off-by: discord9 <discord9@163.com>
* feat: denfensively handle surplus
Signed-off-by: discord9 <discord9@163.com>
* refactor: per review,explain flush flow
Signed-off-by: discord9 <discord9@163.com>
* chore: per bugbot
Signed-off-by: discord9 <discord9@163.com>
* fix: a temp fix to make mirror insert go first(still need better fix to sync with mirror insert that happens before
Signed-off-by: discord9 <discord9@163.com>
* chore: add todo
Signed-off-by: discord9 <discord9@163.com>
---------
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
* fix: label_replace and label_join functions in expressions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: remove update_fields
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: tql eval -> TQL EVAL
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: empty regex and not existing source label
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: simplfy test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix/process-manager-skip-fail-nodes:
- **Enhance Error Handling in `process_manager.rs`:**
Improved error handling by adding a warning log for failing nodes in the `list_process` method. This ensures that the process listing continues even if some nodes fail to respond.
- **Add Error Type Import in `process_manager.rs`:**
Included the `Error` type from the `error` module to handle errors more effectively within the `ProcessManager` implementation.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/process-manager-skip-fail-nodes:
**Enhancements to Debugging and Trait Implementation**
- **`process_manager.rs`**: Improved logging by adding more detailed error messages when skipping failing nodes.
- **`selector.rs`**: Enhanced the `FrontendClient` trait by adding the `Debug` trait bound to improve debugging capabilities.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
refactor/building-backend-in-object-store:
### Refactor Object Store Configuration
- **Centralize Object Store Configurations**: Moved object store configurations (`FileConfig`, `S3Config`, `OssConfig`, `AzblobConfig`, `GcsConfig`) to `object-store/src/config.rs`.
- **Error Handling Enhancements**: Introduced `object-store/src/error.rs` for improved error handling related to object store operations.
- **Factory Pattern for Object Store**: Implemented `object-store/src/factory.rs` to create object store instances, consolidating logic from `datanode/src/store.rs`.
- **Remove Redundant Store Implementations**: Deleted individual store files (`azblob.rs`, `fs.rs`, `gcs.rs`, `oss.rs`, `s3.rs`) from `datanode/src/store/`.
- **Update Usage of Object Store Config**: Updated references to `ObjectStoreConfig` in `datanode.rs`, `standalone.rs`, `config.rs`, and `error.rs` to use the new centralized configuration.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/reordered-write-cause-incorrect-kv:
- **Enhance Testing in `partition_tree.rs`**: Added comprehensive test functions such as `kv_region_metadata`, `key_values`, and `collect_kvs` to improve the robustness of key-value operations and ensure correct behavior of the `PartitionTreeMemtable`.
- **Improve Key Handling in `dict.rs`**: Modified `KeyDictBuilder` to handle both full and sparse keys, ensuring correct mapping and insertion. Added a new test `test_builder_finish_with_sparse_key` to validate the handling of sparse keys.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/reordered-write-cause-incorrect-kv:
### Refactor `partition_tree.rs` for Improved Key Handling
- **Refactored Key Handling**: Simplified the `key_values` function to accept an iterator of keys, removing hardcoded key-value pairs. This change enhances flexibility and reduces redundancy in key management.
- **Updated Test Cases**: Modified test cases to use the new `key_values` function signature, ensuring they iterate over keys dynamically rather than relying on predefined lists.
Files affected:
- `src/mito2/src/memtable/partition_tree.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix/reordered-write-cause-incorrect-kv:
Enhance Testing in `partition_tree.rs`
- Added assertions to verify key-value collection after `memtable` and `forked` operations.
- Refactored key-value writing logic for clarity in `forked` operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/print-series-count-after-wal-replay:
### Add Series Count Functionality and Logging Enhancements
- **`time_partition.rs`**: Introduced `series_count` method to calculate the total timeseries count across all time partitions.
- **`opener.rs`**: Enhanced logging to include the total timeseries replayed during WAL replay.
- **`version.rs`**: Added `series_count` method to `VersionControlData` for approximating timeseries count in the current version.
- **`handler.rs`**: Added entry and exit logging for the `sql` function to trace execution flow.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/print-series-count-after-wal-replay:
### Remove Unused Import
- **File Modified**: `src/servers/src/http/handler.rs`
- **Change Summary**: Removed the unused `info` import from `common_telemetry`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/series-metrics:
### Add Metrics for Active Series and Values in Memtable
- **`simple_bulk_memtable.rs`**: Implemented `Drop` trait for `SimpleBulkMemtable` to decrement `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` upon dropping.
- **`time_series.rs`**:
- Introduced `SeriesMap` with `Drop` implementation to manage active series and values count.
- Updated `SeriesSet` and `Iter` to use `SeriesMap`.
- Added `num_values` method in `Series` to calculate the number of values.
- **`metrics.rs`**: Added `MEMTABLE_ACTIVE_SERIES_COUNT` and `MEMTABLE_ACTIVE_VALUES_COUNT` metrics to track active series and values in `TimeSeriesMemtable`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/series-metrics:
- Add metrics for active series and field builders
- Update dashboard
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/series-metrics:
**Add Series Count Tracking in Memtables**
- **`flush.rs`**: Updated `RegionFlushTask` to track and log the series count during memtable flush operations.
- **`memtable.rs`**: Introduced `series_count` in `MemtableStats` and added a method to retrieve it.
- **`partition_tree.rs`, `partition.rs`, `tree.rs`**: Implemented series count calculation in `PartitionTreeMemtable` and its components.
- **`simple_bulk_memtable.rs`, `time_series.rs`**: Integrated series count tracking in `SimpleBulkMemtable` and `TimeSeriesMemtable` implementations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Update src/mito2/src/memtable.rs
Co-authored-by: Yingwen <realevenyag@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
* fix: use `limit` params in jaeger http
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* refactor: only parse `max_duration` and `min_duration` when it's not empty
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: handle the input for empty `limit` string
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: missing the fileter for `service_name`
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* test: fix ci errors
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: incorrect behavior of find_traces
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: the logic of `find_traces()`
The correct logic should be:
1. Get all trace ids that match the filters;
2. Get all traces that match the trace ids from the previous query;
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: integration test errors
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* refactor: add `empty_string_as_none`
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* refactor: refine naming
Signed-off-by: zyy17 <zyylsxm@gmail.com>
---------
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* feat/answer-ctrl-c-in-mysql:
## Implement Connection ID-based Query Killing
### Key Changes:
- **Connection ID Management:**
- Added `connection_id` to `Session` and `QueryContext` in `src/session/src/lib.rs` and `src/session/src/context.rs`.
- Updated `MysqlInstanceShim` and `MysqlServer` to handle `connection_id` in `src/servers/src/mysql/handler.rs` and `src/servers/src/mysql/server.rs`.
- **KILL Statement Enhancements:**
- Introduced `Kill` enum to handle both `ProcessId` and `ConnectionId` in `src/sql/src/statements/kill.rs`.
- Updated `ParserContext` to parse `KILL QUERY <connection_id>` in `src/sql/src/parser.rs`.
- Modified `StatementExecutor` to support killing queries by `connection_id` in `src/operator/src/statement/kill.rs`.
- **Process Management:**
- Refactored `ProcessManager` to include `connection_id` in `src/catalog/src/process_manager.rs`.
- Added `kill_local_process` method for local query termination.
- **Testing:**
- Added tests for `KILL` statement parsing and execution in `src/sql/src/parser.rs`.
### Affected Files:
- `Cargo.lock`, `Cargo.toml`
- `src/catalog/src/process_manager.rs`
- `src/frontend/src/instance.rs`
- `src/frontend/src/stream_wrapper.rs`
- `src/operator/src/statement.rs`
- `src/operator/src/statement/kill.rs`
- `src/servers/src/mysql/federated.rs`
- `src/servers/src/mysql/handler.rs`
- `src/servers/src/mysql/server.rs`
- `src/servers/src/postgres.rs`
- `src/session/src/context.rs`
- `src/session/src/lib.rs`
- `src/sql/src/parser.rs`
- `src/sql/src/statements.rs`
- `src/sql/src/statements/kill.rs`
- `src/sql/src/statements/statement.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Conflicts:
Cargo.lock
Cargo.toml
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/answer-ctrl-c-in-mysql:
### Enhance Process Management and Execution
- **`process_manager.rs`**: Added a new method `find_processes_by_connection_id` to filter processes by connection ID, improving process management capabilities.
- **`kill.rs`**: Refactored the process killing logic to utilize the new `find_processes_by_connection_id` method, streamlining the execution flow and reducing redundant checks.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/answer-ctrl-c-in-mysql:
## Commit Message
### Update Process ID Type and Refactor Code
- **Change Process ID Type**: Updated the process ID type from `u64` to `u32` across multiple files to optimize memory usage. Affected files include `process_manager.rs`, `lib.rs`, `database.rs`, `instance.rs`, `server.rs`, `stream_wrapper.rs`, `kill.rs`, `federated.rs`, `handler.rs`, `server.rs`,
`postgres.rs`, `mysql_server_test.rs`, `context.rs`, `lib.rs`, and `test_util.rs`.
- **Remove Connection ID**: Removed the `connection_id` field and related logic from `process_manager.rs`, `lib.rs`, `instance.rs`, `server.rs`, `stream_wrapper.rs`, `kill.rs`, `federated.rs`, `handler.rs`, `server.rs`, `postgres.rs`, `mysql_server_test.rs`, `context.rs`, `lib.rs`, and `test_util.rs` to
simplify the codebase.
- **Refactor Process Management**: Refactored process management logic to improve clarity and maintainability in `process_manager.rs`, `kill.rs`, and `handler.rs`.
- **Enhance MySQL Server Handling**: Improved MySQL server handling by integrating process management in `server.rs` and `mysql_server_test.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/answer-ctrl-c-in-mysql:
### Add Process Manager to Postgres Server
- **`src/frontend/src/server.rs`**: Updated server initialization to include `process_manager`.
- **`src/servers/src/postgres.rs`**: Modified `MakePostgresServerHandler` to accept `process_id` for session creation.
- **`src/servers/src/postgres/server.rs`**: Integrated `process_manager` into `PostgresServer` for generating `process_id` during connection handling.
- **`src/servers/tests/postgres/mod.rs`** and **`tests-integration/src/test_util.rs`**: Adjusted test server setup to accommodate optional `process_manager`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/answer-ctrl-c-in-mysql:
Update `greptime-proto` Dependency
- Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
- `Cargo.lock`: Changed source revision from `d75a56e05a87594fe31ad5c48525e9b2124149ba` to `fdcbe5f1c7c467634c90a1fd1a00a784b92a4e80`.
- `Cargo.toml`: Updated the `greptime-proto` git revision to match the new commit.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
### Refactor and Enhance Timestamp Handling in gRPC and Bulk Insert
- **Refactor Table Handling**:
- Updated `put_record_batch` method to use `TableRef` instead of `TableId` in `grpc.rs`, `greptime_handler.rs`, and `grpc.rs`.
- Modified `handle_bulk_insert` to accept `TableRef` and extract `TableId` internally in `bulk_insert.rs`.
- **Enhance Timestamp Processing**:
- Added `compute_timestamp_range` function to calculate timestamp range in `bulk_insert.rs`.
- Introduced error handling for invalid time index types in `error.rs`.
- **Test Adjustments**:
- Updated `DummyInstance` implementation in `tests/mod.rs` to align with new method signatures.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
### Add Dirty Window Handling in Flow Module
- **Updated `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Flow Module Enhancements**:
- Added `DirtyWindowRequest` handling in `flow.rs`, `node_manager.rs`, `test_util.rs`, `flownode_impl.rs`, and `server.rs`.
- Implemented `handle_mark_window_dirty` function to manage dirty time windows.
- **Bulk Insert Enhancements**:
- Modified `bulk_insert.rs` to notify flownodes about dirty time windows using `update_flow_dirty_window`.
- **Removed Unused Imports**: Cleaned up unused imports in `greptime_handler.rs`, `grpc.rs`, and `mod.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: mark dirty time window
* feat: metrics
* metrics: more useful metrics batching mode
* feat/bulk-support-flow-batch:
**Refactor Timestamp Handling and Update Dependencies**
- **Dependency Update**: Updated `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml` to a new revision.
- **Batching Engine Refactor**: Modified `src/flow/src/batching_mode/engine.rs` to replace `dirty_time_ranges` with `timestamps` for improved timestamp handling.
- **Bulk Insert Refactor**: Updated `src/operator/src/bulk_insert.rs` to refactor timestamp extraction and handling. Replaced `compute_timestamp_range` with `extract_timestamps` and adjusted related logic to handle timestamps directly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
### Update Metrics in Batching Mode Engine
- **Modified Metrics**: Replaced `METRIC_FLOW_BATCHING_ENGINE_BULK_MARK_TIME_WINDOW_RANGE` with `METRIC_FLOW_BATCHING_ENGINE_BULK_MARK_TIME_WINDOW` to track the count of time windows instead of their range.
- Files affected: `engine.rs`, `metrics.rs`
- **New Method**: Added `len()` method to `DirtyTimeWindows` to return the number of dirty windows.
- File affected: `state.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
**Refactor and Enhance Timestamp Handling in `bulk_insert.rs`**
- **Refactored Timestamp Extraction**: Moved timestamp extraction logic to a new method `maybe_update_flow_dirty_window` to improve code readability and maintainability.
- **Enhanced Flow Update Logic**: Updated the flow dirty window update mechanism to conditionally notify flownodes only if they are configured, using `table_info` and `record_batch`.
- **Imports Adjusted**: Updated imports to reflect changes in table metadata handling, replacing `TableId` with `TableInfoRef`.
Files affected:
- `src/operator/src/bulk_insert.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
## Update `handle_mark_window_dirty` Method in `flownode_impl.rs`
- Replaced `unimplemented!()` with `unreachable!()` in the `handle_mark_window_dirty` method for both `FlowDualEngine` and `StreamingEngine` implementations in `flownode_impl.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/bulk-support-flow-batch:
Update `greptime-proto` Dependency
- Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
- `Cargo.lock`: Changed the source revision from `f0913f179ee1d2ce428f8b85a9ea12b5f69ad636` to `17971523673f4fbc982510d3c9d6647ff642e16f`.
- `Cargo.toml`: Updated the `greptime-proto` git revision to `17971523673f4fbc982510d3c9d6647ff642e16f`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
* fix(mito2): handle corner case in catchup where compacted entry id exceeds region last entry id
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* fix: event api content type only check type and subtype
Signed-off-by: paomian <xpaomian@gmail.com>
* chore: make clippy happy
Signed-off-by: paomian <xpaomian@gmail.com>
---------
Signed-off-by: paomian <xpaomian@gmail.com>
chore/add-conn-info-to-query-ctx:
### Add Connection Information to Query Context
- **`src/frontend/src/instance.rs`**: Updated to use `query_ctx.conn_info().to_string()` for connection information instead of a placeholder string.
- **`src/session/src/context.rs`**: Introduced `conn_info` field in `QueryContext` and added a method `conn_info()` to retrieve it. Updated `QueryContextBuilder` to handle `conn_info`.
- **`src/session/src/lib.rs`**: Modified `Session` to include `conn_info` in the query context building process.
These changes enhance the query context by incorporating connection information, allowing for more detailed session management.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
### Add Cancellation Support and Enhance Process Management
- **Cancellation Handle Implementation**: Introduced `CancellationHandle` in `cancellation_handle.rs` to facilitate cancellation of futures and streams.
- **Process Management Enhancements**:
- Updated `ProcessManager` in `process_manager.rs` to support cancellable processes using `CancellableProcess`.
- Added `kill_process` method for terminating processes.
- **Stream Wrapper Update**:
- Replaced `StreamWrapper` with `CancellableStreamWrapper` in `stream_wrapper.rs` and `instance.rs` to handle stream cancellation.
- **Error Handling**:
- Added `StreamCancelled` error variant in `error.rs` to handle stream cancellation scenarios.
- **gRPC Handler Update**:
- Added `kill_process` gRPC method in `frontend_grpc_handler.rs` to allow external process termination.
- **Dependency Updates**:
- Updated `Cargo.lock` and `Cargo.toml` to include `common-base` and `tokio-util`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
**Enhancements and Bug Fixes**
- **Dependency Update**: Updated `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml` to a new revision.
- **Error Handling Improvements**:
- Modified error variants in `src/catalog/src/error.rs` and `src/common/frontend/src/error.rs` to improve error messages and handling.
- Added `FrontendNotFound` error variant for better error specificity.
- **Process Management Enhancements**:
- Updated `ProcessManager` in `src/catalog/src/process_manager.rs` to include `kill_process` functionality with server address validation.
- Enhanced `FrontendClient` trait in `src/common/frontend/src/selector.rs` to support `kill_process` requests.
- **gRPC Handler Update**:
- Refactored `FrontendGrpcHandler` in `src/servers/src/grpc/frontend_grpc_handler.rs` to handle `kill_process` requests asynchronously and return process status.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
### Add Kill Process Functionality
- **`Cargo.lock`, `Cargo.toml`**: Added `common-frontend` as a dependency.
- **`server.rs`, `builder.rs`, `instance.rs`**: Updated `FrontendInvoker` and `FrontendBuilder` to support process management.
- **`error.rs`**: Introduced `InvalidProcessId` error for handling invalid process IDs.
- **`statement.rs`, `kill.rs`**: Implemented `execute_kill` method in `StatementExecutor` to handle the `KILL` statement.
- **`parser.rs`, `statement.rs`**: Updated SQL parser to recognize and parse the `KILL` statement.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
## Add Cancellation Support to Query Execution
- **`process_manager.rs`**: Updated `CancellationHandle` initialization to use `default()` method.
- **`cancellation_handle.rs`**: Implemented `Debug` trait for `CancellationHandle` and added `Cancellation` and `CancellableFuture` structs to support cancellable futures.
- **`error.rs`**: Introduced `Cancelled` error variant to handle query cancellations.
- **`instance.rs`**: Integrated `CancellableFuture` to manage query execution with cancellation support.
- **`stream_wrapper.rs`**: Modified `CancellableStreamWrapper` to use the new `waker()` method for cancellation handling.
- **`statement.rs`**: Added `#[allow(clippy::too_many_arguments)]` to `StatementExecutor::new` to suppress clippy warnings.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
- **Add `MetaClientMissing` Error**: Introduced a new error variant `MetaClientMissing` in `error.rs` to handle missing meta client scenarios.
- **Refactor Cancellation Handling**: Merged `cancellation_handle.rs` into `cancellation.rs` and updated related logic in `process_manager.rs`, `instance.rs`, and `stream_wrapper.rs`.
- **Enhance Process Management**: Improved process management logic in `process_manager.rs` to handle process cancellation more effectively.
- **Update Tests**: Added and updated tests in `cancellation.rs` and `stream_wrapper.rs` to cover new cancellation logic and error handling.
- **Cargo.toml Update**: Adjusted workspace settings in `Cargo.toml` for `common-frontend`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
- **Add Tests for Process Management**: Introduced multiple async tests in `process_manager.rs` to verify query registration, deregistration, cancellation, and process killing functionalities.
- **Update Error Message in SQL Parser**: Modified the expected error message in `parser.rs` to clarify the expected token as a "process id string literal".
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
### Add Process Count Metrics to Catalog
- **`metrics.rs`**: Introduced a new metric `PROCESS_LIST_COUNT` to track the count of running processes per catalog using `IntGaugeVec`.
- **`process_manager.rs`**: Updated `CancellableProcess` to increment and decrement `PROCESS_LIST_COUNT` upon creation and destruction, respectively. Added a `Drop` implementation for `CancellableProcess` to handle metric updates.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
### Fix process removal logic in `process_manager.rs`
- Corrected the condition for removing an entry from the catalog in `ProcessManager` by using `o.get()` instead of `o.get_mut()`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
- **Error Handling Improvements**:
- Updated status codes for `Error::FrontendNotFound` and `Error::MetaClientMissing` to `StatusCode::Unexpected` in `src/catalog/src/error.rs`.
- Changed `InvokeFrontend` error display message and status code in `src/common/frontend/src/error.rs`.
- Added `ProcessManagerMissing` error in `src/operator/src/error.rs` and updated its handling in `src/operator/src/statement/kill.rs`.
- **Process Management Enhancements**:
- Added documentation for `ProcessManager` and `register_query` in `src/catalog/src/process_manager.rs`.
- Modified `kill_process` response handling in `src/servers/src/grpc/frontend_grpc_handler.rs`.
- **Cancellation Logic Update**:
- Improved cancellation logic in `src/common/base/src/cancellation.rs` to use `compare_exchange` for atomic operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
### Add Process Kill Count Metric and Refactor Cancellation Handle
- **Metrics Update**: Added a new metric `PROCESS_KILL_COUNT` in `metrics.rs` to track the count of completed kill process requests per catalog.
- **Refactor Cancellation Handle**: Renamed `cancellation_handler` to `cancellation_handle` across multiple files for consistency:
- `process_manager.rs`
- `instance.rs`
- `stream_wrapper.rs`
- **Process Management**: Updated process management logic in `process_manager.rs` to increment the `PROCESS_KILL_COUNT` metric upon successful process termination.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
Update metric description in `metrics.rs`
- Changed the description of `PROCESS_KILL_COUNT` to reflect the count of killed processes instead of running processes in `metrics.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/kill-process:
Update `greptime-proto` Dependency and Fix Response Field
- **Updated Dependency**: Changed the `greptime-proto` Git revision in `Cargo.lock` and `Cargo.toml` to `f0913f1`.
- **Code Fix**: Modified `frontend_grpc_handler.rs` to correct the response field from `found` to `success` in `KillProcessResponse`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: process id for session, query context and postgres
Signed-off-by: Ning Sun <sunning@greptime.com>
* feat: add sql functions to retrieve connection/process id
Signed-off-by: Ning Sun <sunning@greptime.com>
---------
Signed-off-by: Ning Sun <sunning@greptime.com>
* fix/file-group-in-compaction:
### Enhance Compaction Logic with File Grouping
- **`run.rs`**: Introduced `FileGroup` struct to manage groups of `FileHandle` objects, allowing for more efficient compaction operations. Updated `Ranged` and `Item` trait implementations to work with `FileGroup`.
- **`test_util.rs`**: Added `new_file_handle_with_sequence` function to support file handles with sequence numbers, enhancing test utilities.
- **`twcs.rs`**: Modified `TwcsPicker` to utilize `FileGroup` for managing files within windows, improving compaction logic. Updated `Window` struct to use `HashMap` for storing `FileGroup` objects.
- **`version_util.rs`**: Updated version control utilities to handle sequence numbers in file metadata, aligning with new compaction logic.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* fix/file-group-in-compaction:
### Add Test for File Group Assignment in TWCS
- **Enhancements in `twcs.rs`:**
- Added a new test `test_assign_file_groups_to_windows` to verify the correct assignment of file groups to windows.
- Enhanced `test_assign_compacting_to_windows` with a new case to ensure files with overlapping time ranges and the same sequence are treated as one `FileGroup`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* fix/file-group-in-compaction:
**Enhance Compaction Task Documentation and Initialization**
- **`run.rs`**: Added documentation for `FileGroup` to clarify its role in representing a group of files created by the same compaction task.
- **`twcs.rs`**: Introduced comments in the `Window` struct to explain the mapping of file sequences to file groups, indicating files created from the same compaction task. Simplified the initialization of the `files` hashmap using `HashMap::from`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* ### Add Process List Management
- **Error Handling Enhancements**:
* refactor: Update test IP addresses to include ports in ProcessKey
* feat/show-process-list:
Refactor Process Management in Meta Module
- Introduced `ProcessManager` for handling process registration and deregistration.
- Added methods for managing and querying process states, including `register_query`, `deregister_query`, and `list_all_processes`.
- Removed redundant process management code from the query module.
- Updated error handling to reflect changes in process management.
- Enhanced test coverage for process management functionalities.
* chore: rebase main
* add information schema process list table
* integrate process list table to system catalog
* build ProcessManager on frontend and standalone mode
* feat/show-process-list:
**Add Process Management Enhancements**
- **`manager.rs`**: Introduced `process_manager` to `SystemCatalog` and `KvBackendCatalogManager` for improved process handling.
- **`information_schema.rs`**: Updated table insertion logic to conditionally include `PROCESS_LIST`.
- **`frontend.rs`, `standalone.rs`**: Enhanced `StartCommand` to clone `process_manager` for better resource management.
- **`instance.rs`, `builder.rs`**: Integrated `ProcessManager` into `Instance` and `FrontendBuilder` to manage query
* feat/show-process-list:
### Add Process Listing and Error Handling Enhancements
- **Error Handling**: Introduced a new error variant `ListProcess` in `error.rs` to handle failures when listing running processes.
- **Process List Implementation**: Enhanced `InformationSchemaProcessList` in `process_list.rs` to track running queries, including defining column names and implementing the `make_process_list` function to build the process list.
- **Frontend Builder**: Added a `#[allow(clippy::too_many_arguments)]` attribute in `builder.rs` to suppress Clippy warnings for the `FrontendBuilder::new` function.
These changes improve error handling and process tracking capabilities within the system.
* feat/show-process-list:
Refactor imports in `process_list.rs`
- Updated import paths for `Predicates` and `InformationTable` in `process_list.rs` to align with the new module structure.
* feat/show-process-list:
Refactor process list generation in `process_list.rs`
- Simplified the process list generation by removing intermediate row storage and directly building vectors.
- Updated `process_to_row` function to use a mutable vector for current row data, improving memory efficiency.
- Removed `rows_to_record_batch` function, integrating its logic directly into the main loop for streamlined processing.
* wip: move ProcessManager to catalog crate
* feat/show-process-list:
- **Refactor Row Construction**: Updated row construction in multiple files to use references for `Value` objects, improving memory efficiency. Affected files include:
- `cluster_info.rs`
- `columns.rs`
- `flows.rs`
- `key_column_usage.rs`
- `partitions.rs`
- `procedure_info.rs`
- `process_list.rs`
- `region_peers.rs`
- `region_statistics.rs`
- `schemata.rs`
- `table_constraints.rs`
- `tables.rs`
- `views.rs`
- `pg_class.rs`
- `pg_database.rs`
- `pg_namespace.rs`
- **Remove Unused Code**: Deleted unused functions and error variants related to process management in `process_list.rs` and `error.rs`.
- **Predicate Evaluation Update**: Modified predicate evaluation functions in `predicate.rs` to work with references, enhancing performance.
* feat/show-process-list:
### Implement Process Management Enhancements
- **Error Handling Enhancements**:
- Added new error variants `BumpSequence`, `StartReportTask`, `ReportProcess`, and `BuildProcessManager` in `error.rs` to improve error handling for process management tasks.
- Updated `ErrorExt` implementations to handle new error types.
- **Process Manager Improvements**:
- Introduced `ProcessManager` enhancements in `process_manager.rs` to manage process states using `ProcessWithState` and `ProcessState` enums.
- Implemented periodic task `ReportTask` to report running queries to the KV backend.
- Modified `register_query` and `deregister_query` methods to use the new state management system.
- **Testing and Validation**:
- Updated tests in `process_manager.rs` to validate new process management logic.
- Replaced `dump` method with `list_all_processes` for listing processes.
- **Integration with Frontend and Standalone**:
- Updated `frontend.rs` and `standalone.rs` to handle `ProcessManager` initialization errors using `BuildProcessManager` error variant.
- **Schema Adjustments**:
- Modified `process_list.rs` in `system_schema/information_schema` to use the updated process listing method.
- **Key-Value Conversion**:
- Added `TryFrom` implementation for converting `Process` to `KeyValue` in `process_list.rs`.
* chore: remove register
* fix: sqlness tests
* merge main
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml` to a new revision.
- **Refactor `ProcessManager`**: Simplified the `ProcessManager` implementation by removing the use of `KvBackendRef` and `SequenceRef`, and replaced them with `AtomicU64` and `RwLock` for managing process IDs and catalogs in `process_manager.rs`.
- **Remove Process List Metadata**: Deleted the `process_list.rs` file and removed related metadata key definitions in `key.rs`.
- **Update Process List Logic**: Modified the process list logic in `process_list.rs` to use the new `ProcessManager` structure.
- **Adjust Frontend and Standalone Start Commands**: Updated `frontend.rs` and `standalone.rs` to use the new `ProcessManager` constructor.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency version in `Cargo.lock` and `Cargo.toml` to a new commit hash.
- **Refactor Error Handling**: Removed unused error variants and added a new `ParseProcessId` error in `src/catalog/src/error.rs`.
- **Enhance Process Management**: Introduced `DisplayProcessId` struct for better process ID representation and parsing in `src/catalog/src/process_manager.rs`.
- **Revise Process List Schema**: Updated the schema and logic for process listing in `src/catalog/src/system_schema/information_schema/process_list.rs` to include new fields like `client` and `frontend`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Commit Message
**Enhancements and Refactoring**
- **Process Management:**
- Refactored `ProcessManager` to list local processes with an optional catalog filter in `process_manager.rs`.
- Updated related tests in `process_manager.rs` and `process_list.rs`.
- **Client Enhancements:**
- Added `frontend_client` method in `client.rs` to support gRPC communication with the frontend.
- **Error Handling:**
- Extended error handling in `error.rs` to include gRPC and Meta errors.
- **Frontend Module:**
- Introduced `selector.rs` for frontend client selection and process listing.
- Updated `Cargo.toml` to include new dependencies and dev-dependencies.
- **gRPC Server:**
- Integrated `FrontendServer` in `builder.rs` for enhanced gRPC server capabilities.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Commit Message
**Refactor Process Management and Frontend Integration**
- **Add `common-frontend` Dependency**:
- Updated `Cargo.lock`, `Cargo.toml` files to include `common-frontend` as a dependency.
- **Refactor Process Management**:
- Moved `ProcessManager` trait and `DisplayProcessId` struct to `common-frontend`.
- Updated `process_manager.rs` to use `MetaProcessManager` and `ProcessManagerRef`.
- Removed `ParseProcessId` error variant from `error.rs` in `catalog` and `frontend`.
- **Frontend gRPC Service**:
- Added `frontend_grpc_handler.rs` to handle gRPC requests for frontend processes.
- Updated `grpc.rs` and `builder.rs` to integrate `FrontendGrpcHandler`.
- **Update Tests**:
- Modified tests in `process_manager.rs` to align with new `ProcessManager` implementation.
- **Remove Unused Code**:
- Removed `DisplayProcessId` and related parsing logic from `process_manager.rs`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Add `MetaClientRef` to `MetaProcessManager` and Update Instantiation
- **Files Modified**:
- `src/catalog/src/process_manager.rs`
- `src/cmd/src/frontend.rs`
- `src/cmd/src/standalone.rs`
- **Key Changes**:
- Added `MetaClientRef` as an optional parameter to the `MetaProcessManager::new` method.
- Updated instantiation of `MetaProcessManager` to include `MetaClientRef` where applicable.
### Update `ProcessManagerRef` Usage
- **Files Modified**:
- `src/catalog/src/kvbackend/manager.rs`
- `src/catalog/src/system_schema/information_schema.rs`
- `src/catalog/src/system_schema/information_schema/process_list.rs`
- `src/frontend/src/instance.rs`
- `src/frontend/src/instance/builder.rs`
- **Key Changes**:
- Ensured consistent usage of `ProcessManagerRef` across various modules.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
## Refactor Process Management
- **Unified Process Manager**:
- Replaced `MetaProcessManager` with `ProcessManager` across the codebase.
- Updated `ProcessManager` to use `Arc` for shared references and introduced a `Ticket` struct for query registration and deregistration.
- Affected files: `manager.rs`, `process_manager.rs`, `frontend.rs`, `standalone.rs`, `frontend_grpc_handler.rs`, `instance.rs`, `builder.rs`, `cluster.rs`, `standalone.rs`.
- **Stream Wrapper Implementation**:
- Added `StreamWrapper` to handle record batch streams with process management.
- Affected file: `stream_wrapper.rs`.
- **Test Adjustments**:
- Updated tests to align with the new `ProcessManager` implementation.
- Affected file: `tests-integration/src/cluster.rs`, `tests-integration/src/standalone.rs`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Add Error Handling and Process Management
- **Error Handling Enhancements**:
- Added new error variants `ListProcess` and `CreateChannel` in `error.rs` to handle specific gRPC service invocation failures.
- Updated error handling in `selector.rs` to use the new error variants for better context and error propagation.
- **Process Management Integration**:
- Introduced `process_manager` method in `instance.rs` to access the process manager.
- Integrated `FrontendGrpcHandler` with process management in `server.rs` to handle gRPC requests related to process management.
- **gRPC Server Enhancements**:
- Made `frontend_grpc_handler` public in `grpc.rs` to allow external access and integration with other modules.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
Update `greptime-proto` dependency and enhance process management
- **Dependency Update**: Updated `greptime-proto` in `Cargo.lock` and `Cargo.toml` to a new revision.
- **Process Management**:
- Modified `process_manager.rs` to include catalog filtering in `list_process`.
- Updated `frontend_grpc_handler.rs` to handle catalog filtering in `list_process` requests.
- **System Schema**: Added a TODO comment in `process_list.rs` for future user catalog filtering implementation.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
- **Update Workspace Dependencies**:
- Modified `Cargo.toml` files in `src/catalog`, `src/common/frontend`, and `src/servers` to adjust workspace dependencies.
- **Refactor `ProcessManager` Logic**:
- Updated `process_manager.rs` to simplify the condition in the `select` method.
- **Remove Unused Error Variants**:
- Deleted `BuildProcessManager` error variant from `error.rs` in `src/cmd`.
- Removed `InvalidProcessKey` error variant from `error.rs` in `src/common/meta`.
- **Add License Header**:
- Added Apache License header to `stream_wrapper.rs` in `src/frontend`.
- **Update Test Results**:
- Adjusted expected results in `information_schema.result` to reflect changes in the schema.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Add Error Handling for Process Listing
- **`src/catalog/src/error.rs`**: Introduced a new error variant `ListProcess` to handle failures in listing frontend nodes.
- **`src/catalog/src/process_manager.rs`**: Updated `local_processes` and `list_all_processes` methods to return the new error type, adding context for error handling.
- **`src/catalog/src/system_schema/information_schema/process_list.rs`**: Modified `make_process_list` to propagate errors using the new error handling mechanism.
- **`src/servers/src/grpc/frontend_grpc_handler.rs`**: Enhanced error handling in the `list_process` method to log errors and return appropriate gRPC status codes.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
Update `greptime-proto` Dependency and Remove `frontend_client` Method
- **Cargo.lock** and **Cargo.toml**: Updated the `greptime-proto` dependency to a new revision (`5f6119ac7952878d39dcde0343c4bf828d18ffc8`).
- **src/client/src/client.rs**: Removed the `frontend_client` method from the `Client` implementation.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Add Query Registration with Pre-Generated ID
- **`process_manager.rs`**: Introduced `register_query_with_id` method to allow registering queries with a pre-generated ID. This includes creating a `ProcessInfo` instance and inserting it into the catalog. Added `next_id` method to generate the next process ID.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Update Process List Retrieval Method
- **File**: `process_list.rs`
- Updated the method for retrieving process lists from `local_processes` to `list_all_processes` to support asynchronous operations.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/show-process-list:
### Update error handling in `error.rs`
- Refined status code handling for `CreateChannel` error by delegating to `source.status_code()`.
- Separated `ListProcess` and `CreateChannel` error handling for clarity.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
---------
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
fix/config-docs:
Update `config.md` to specify default compression mode
- Added default value `none` for `grpc.flight_compression` in both frontend and datanode sections of `config/config.md`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* chore/enable-flight-encoder:
### Add Flight Compression Support
- **Configuration Updates**:
- Added `grpc.flight_compression` option to `config/config.md`, `config/datanode.example.toml`, and `config/frontend.example.toml` to specify compression modes for Arrow IPC service.
- **Code Enhancements**:
- Updated `FlightEncoder` in `src/common/grpc/src/flight.rs` to support compression modes.
- Modified `RegionServer` and `DatanodeBuilder` in `src/datanode/src/datanode.rs` and `src/datanode/src/region_server.rs` to handle `FlightCompression`.
- Integrated `FlightCompression` in `src/servers/src/grpc.rs` and `src/servers/src/grpc/flight.rs` to manage compression settings.
- **Testing and Integration**:
- Updated test utilities and integration tests in `tests-integration/src/grpc/flight.rs` and `tests-integration/src/test_util.rs` to include `FlightCompression`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/enable-flight-encoder:
### Enable Compression in FlightClient
- **`client.rs`**: Updated `make_flight_client` to accept `send_compression` and `accept_compression` parameters, enabling Zstd compression for sending and receiving messages.
- **`client_manager.rs`**: Modified `datanode` method to pass compression settings from `ChannelConfig` to `RegionRequester`.
- **`database.rs`**: Adjusted calls to `make_flight_client` to include compression parameters.
- **`region.rs`**: Updated `RegionRequester` to store and utilize compression settings.
- **`frontend.rs`**: Configured `ChannelConfig` to enable compression based on options.
- **`channel_manager.rs`**: Added `send_compression` and `accept_compression` fields to `ChannelConfig` with default values and updated tests accordingly.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* chore/enable-flight-encoder:
### Update Compression Defaults and Documentation
- **Configuration Files**: Updated `datanode.example.toml` and `frontend.example.toml` to include a default setting comment for `flight_compression`, specifying it defaults to `none`.
- **gRPC Server Code**: Modified `grpc.rs` to set `None` as the default for `FlightCompression` instead of `ArrowIpc`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* chore: describe pods on CI failure
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: increase memory limit for main pod template from 2Gi to 3Gi
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat/disable-flight-compression:
### Commit Summary
- **Add Compression Control in Flight Encoder**: Introduced a new method `with_compression_disabled` in `FlightEncoder` to allow encoding without compression in `flight.rs`.
- **Update Flight Stream Initialization**: Modified `FlightRecordBatchStream` to use the new `FlightEncoder::with_compression_disabled` method for initializing the encoder in `stream.rs`.
* feat/disable-flight-compression:
Remove Unused Import in `flight.rs`
- Removed the unused import `write_message` from `flight.rs` to clean up the codebase.
* feat/disable-flight-compression:
### Disable Compression in Flight Encoder
- Updated `tests-integration/src/grpc/flight.rs` to use `FlightEncoder::with_compression_disabled()` instead of `FlightEncoder::default()` for encoding `FlightMessage::Schema` and `FlightMessage::RecordBatch`. This change disables compression in the Flight encoder for these operations.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* disable flight client compression
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
---------
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* fix unit tests
* fix: sqlness
* fix/default-time-window:
## Add Helper Functions and Enhance Compaction Tests
- **Refactor Compaction Logic**: Introduced helper functions `flush` and `compact` in `compaction_test.rs` to streamline compaction operations.
- **Enhance Compaction Tests**: Added a new test `test_infer_compaction_time_window` in `compaction_test.rs` to verify compaction time window inference.
- **Testing Improvements**: Added `#[cfg(test)]` attribute to `new_multi_partitions` in `time_partition.rs` to ensure it's only included in test builds.
* fix/default-time-window:
- **Refactor `TimePartition` Struct**: Removed unnecessary comments regarding `time_range` in `time_partition.rs`.
- **Enhance `TimePartitions` Functionality**: Added a method `part_duration_or_default` to provide a default partition duration in `time_partition.rs`.
- **Update SQL Test Cases**: Modified SQL operations and expected results in `scan_big_varchar.result` and `scan_big_varchar.sql` to reflect changes in data manipulation logic.
* fix/default-time-window:
### Update Time Partition Default Duration
- **Refactor Default Duration**: Introduced `INITIAL_TIME_WINDOW` constant to define the default time window duration as `Duration::from_days(1)`. This change replaces multiple instances of the hardcoded default duration across the `time_partition.rs` file.
- **Files Affected**: `time_partition.rs`
* fix/default-time-window:
## Update Partition Duration Handling
- **`time_partition.rs`**: Refactored `part_duration` to be non-optional, removing `Option` wrapper. Updated logic to use `unwrap_or` with `INITIAL_TIME_WINDOW` where necessary. Adjusted related methods and tests to accommodate this change.
- **`version.rs` (memtable and region)**: Updated handling of `part_duration` to align with changes in `time_partition.rs`, ensuring consistent use of non-optional `Duration`.
* fix/default-time-window:
### Improve Error Context in `time_partition.rs`
- Enhanced error context message in `time_partition.rs` to provide clearer information on partition time range issues, including bucket size details.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
---------
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat(object_store): add support for Alibaba Cloud OSS
- Implement OSS backend in object_store module
- Add OSS-related options to ExportCommand
- Update build_operator to support OSS
- Modify parse_url to handle OSS schema
Signed-off-by: Logic <zqr10159@dromara.org>
* feat(object_store): add support for Alibaba Cloud OSS
- Implement OSS backend in object_store module
- Add OSS-related options to ExportCommand
- Update build_operator to support OSS
- Modify parse_url to handle OSS schema
Signed-off-by: Logic <zqr10159@dromara.org>
* test(object_store): update OSS backend tests with comprehensive scenarios
- Remove minimal case test for OSS backend
- Update test for OSS backend with all fields valid- Remove invalid allow_anonymous test case
Signed-off-by: Logic <zqr10159@dromara.org>
* feat(datasource): add support for OSS (Object Storage Service)
- Implement is_supported_in_oss function to check if a key is supported in OSS configuration- Add build_oss_backend function for creating an OSS backend
- Update requests module to include OSS support check
Signed-off-by: Logic <zqr10159@dromara.org>
* refactor(export): enhance security and logging for sensitive data
- Replace plain strings with SecretString for sensitive information- Implement masking of sensitive data in SQL logs
- Update handling of S3 and OSS credentials
Signed-off-by: Logic <zqr10159@dromara.org>
* refactor(export): generalize remote storage support and rename options
- Rename `s3_ddl_local_dir` to `ddl_local_dir` for better clarity
- Update comments to support both S3 and OSS remote storage options
- Modify logic to handle remote storage options more generically
Signed-off-by: Logic <zqr10159@dromara.org>
* refactor(export): generalize remote storage support and rename options
- Rename `s3_ddl_local_dir` to `ddl_local_dir` for better clarity
- Update comments to support both S3 and OSS remote storage options
- Modify logic to handle remote storage options more generically
Signed-off-by: Logic <zqr10159@dromara.org>
---------
Signed-off-by: Logic <zqr10159@dromara.org>
* wip
* feat: add cpu and memory limit gauge
* chore: add some test cases
* docs: polish some docs
* refactor: remove '#[cfg(target_os = linux)]'
* refactor: add cfg(target_os) in get_cpu_limit() and get_memory_limit()
* feat: pipeline recognize hints from exec
* chore: rename and add test
* chore: minor improve
* chore: rename and add comments
* fix: typos
* feat: add initial impl for vrl processor
* chore: update processors to allow vrl process
* feat: pipeline recognize hints from exec
* chore: rename and add test
* chore: minor improve
* chore: rename and add comments
* fix: typos
* chore: remove unnecessory clone fn
* chore: group metrics
* chore: use struct in transform output enum
* test: add test for vrl
* fix: leaked conflicts
* chore: merge branch code & add check in compile
* fix: check condition
* fix: check auto-transform timeindex
* chore: support table_suffix in hint
* chore: add test for table suffix in vrl hint
* refactor: change context_opt to a struct
chore/allow-numberic-values-in-alter:
### Commit Message
Enhance `alter_parser.rs` to Support Numeric Values
- Updated `parse_string_options` function in `alter_parser.rs` to handle numeric literals in addition to string literals and `NULL` for alter table statements.
- Added a new test `test_parse_alter_with_numeric_value` in `alter_parser.rs` to verify the parsing of numeric values in alter table options.
* refactor: extract some common functions and structs in election module
* chore: add comments and modify a function name
* chore: add comments and modify a function name
* fix: missing 2 lines in license header
* fix: acqrel
* chore: apply comment suggestions
* Update src/meta-srv/src/election.rs
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
---------
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
* fix/initial-builder-cap:
### Enhance Series Initialization and Capacity Management
- **`simple_bulk_memtable.rs`**: Updated the `Series` initialization to use `with_capacity` with a specified capacity of 8192, improving memory management.
- **`time_series.rs`**: Introduced `with_capacity` method in `Series` to allow custom initial capacity for `ValueBuilder`. Adjusted `INITIAL_BUILDER_CAPACITY` to 16 for more efficient memory usage. Added a new `new` method to maintain backward compatibility.
* fix/initial-builder-cap:
### Adjust Memory Allocation in Memtable
- **`simple_bulk_memtable.rs`**: Reduced the initial capacity of `Series` from 8192 to 1024 to optimize memory usage.
- **`time_series.rs`**: Decreased `INITIAL_BUILDER_CAPACITY` from 16 to 4 to improve efficiency in vector building.
* chore: support shared pipeline under catalog with compatibility
* test: add test for cross schema ref
* chore: use empty string schema by default
* chore: remove unwrap in the patch
* fix: df check
* feat: support SQL parsing for trigger show
* add excludes in licenserc
* refine comment
* fix: typo
* fix: add show/trigger.rs to excludes in licenserc
* feat/lossy-string-validation-in-prom-remote-write:
### Commit Message
#### Refactor Prometheus Validation Mode
- **Replace `is_strict_mode` with `PromValidationMode` Enum:**
- Updated `HttpOptions` and related structures to use `PromValidationMode` enum instead of the boolean `is_strict_mode`.
- Modified functions and tests to accommodate the new enum, ensuring flexible validation modes (`Strict`, `Lossy`, `Unchecked`).
- Affected files: `server.rs`, `prom_decode.rs`, `http.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`, `prom_store_test.rs`, `test_util.rs`, `http.rs`.
- **Enhance UTF-8 String Decoding:**
- Introduced `decode_string` function to handle UTF-8 string decoding based on the selected `PromValidationMode`.
- Affected files: `proto.rs`, `prom_row_builder.rs`.
This refactor improves the flexibility and clarity of Prometheus request handling by allowing different validation strategies.
* feat/lossy-string-validation-in-prom-remote-write:
- **Add Prometheus Validation Mode Configuration:**
- Updated `config/config.md`, `config/frontend.example.toml`, and `config/standalone.example.toml` to include `http.prom_validation_mode` setting for Prometheus remote write requests.
- **Enhance Benchmarking for Prometheus Requests:**
- Modified `src/servers/benches/prom_decode.rs` to benchmark different Prometheus validation modes (`Strict`, `Lossy`, `Unchecked`).
- **Implement and Test String Decoding:**
- Added `decode_string` function and comprehensive tests in `src/servers/src/proto.rs` to handle string decoding with different validation modes.
* feat/lossy-string-validation-in-prom-remote-write:
### Add Histogram Buckets to Metrics
- **Files Modified**: `src/servers/src/metrics.rs`
- **Key Changes**:
- Added specific histogram buckets to `METRIC_MYSQL_QUERY_TIMER`, `METRIC_POSTGRES_QUERY_TIMER`, and `METRIC_SERVER_GRPC_PROM_REQUEST_TIMER` to enhance granularity in query elapsed time metrics.
* feat/lossy-string-validation-in-prom-remote-write:
### Update Prometheus Validation Mode Default
- **Config Documentation**: Updated the default description for `http.prom_validation_mode` to indicate that "strict" is the default option in `config.md`, `frontend.example.toml`, and `standalone.example.toml`.
- **HTTP Server Implementation**: Changed the default `prom_validation_mode` to `PromValidationMode::Strict` in `src/servers/src/http.rs`.
* feat/lossy-string-validation-in-prom-remote-write:
**Commit Message:**
Update Prometheus Validation Mode to Strict
- Changed `http.prom_validation_mode` from `unchecked` to `strict` in `config.md`, `frontend.example.toml`, and
`standalone.example.toml` to enforce strict validation of Prometheus remote write requests.
* feat/bulk-wal:
### Refactor: Simplify Data Handling in LogStore Implementations
- **`kafka/log_store.rs`, `raft_engine/log_store.rs`, `wal.rs`, `raw_entry_reader.rs`, `logstore.rs`:**
- Refactored `entry` and `build_entry` functions to accept `Vec<u8>` directly instead of `&mut Vec<u8>`.
- Removed usage of `std::mem::take` for data handling, simplifying the code and improving readability.
- Updated test cases to align with the new function signatures.
* feat/bulk-wal:
### Add Support for Bulk WAL Entries and Flight Data Encoding
- **Add `raw_data` field to `BulkPart` and related structs**: Updated `BulkPart` and related structures in `src/mito2/src/memtable/bulk/part.rs`, `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/region_write_ctx.rs`,
`src/mito2/src/worker/handle_bulk_insert.rs`, and `src/store-api/src/region_request.rs` to include a new `raw_data` field for handling Arrow IPC data.
- **Implement Flight Data Encoding**: Added a new module `flight` in `src/common/test-util/src/flight.rs` to encode record batches to Flight data format.
- **Update `greptime-proto` dependency**: Changed the revision of the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml`.
- **Enhance WAL Writer and Tests**: Modified `src/mito2/src/wal.rs` and related test files to support bulk WAL entries and added tests for encoding and handling bulk data.
* feat/bulk-wal:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Add `common-grpc` Dependency**: Added `common-grpc` as a dependency in `Cargo.lock` and `src/mito2/Cargo.toml`.
- **Refactor `BulkPart` Structure**: Removed `num_rows` field and added `num_rows()` method in `src/mito2/src/memtable/bulk/part.rs`. Updated related usages in `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/memtable/time_series.rs`,
`src/mito2/src/region_write_ctx.rs`, and `src/mito2/src/worker/handle_bulk_insert.rs`.
- **Implement `TryFrom` and `From` for `BulkWalEntry`**: Added implementations for converting between `BulkPart` and `BulkWalEntry` in `src/mito2/src/memtable/bulk/part.rs`.
- **Handle Bulk Entries in Region Opener**: Added logic to process bulk entries in `src/mito2/src/region/opener.rs`.
- **Fix `BulkInsertRequest` Handling**: Corrected `region_id` handling in `src/operator/src/bulk_insert.rs` and `src/store-api/src/region_request.rs`.
- **Add Error Variant for `ConvertBulkWalEntry`**: Added a new error variant in `src/mito2/src/error.rs` for handling bulk WAL entry conversion errors.
* fix: ci
* feat/bulk-wal:
Add bulk write operation in `opener.rs`
- Enhanced the region write context by adding a call to `write_bulk()` after `write_memtable()` in `opener.rs`.
- This change aims to improve the efficiency of writing operations by enabling bulk writes.
* feat/bulk-wal:
Enhance error handling and metrics in `bulk_insert.rs`
- Updated `Inserter` to improve error handling by capturing the result of `datanode.handle(request)` and incrementing the `DIST_INGEST_ROW_COUNT` metric with the number of affected rows.
* feat/bulk-wal:
### Remove Encode Error Handling for WAL Entries
- **`error.rs`**: Removed the `EncodeWal` error variant and its associated handling.
- **`wal.rs`**: Eliminated the `entry_encode_buf` buffer and its usage for encoding WAL entries. Replaced with direct encoding to a vector using `encode_to_vec()`.
* chore: add tool to export db meta
* chore: add meta restore command
* chore: fmt code
* chore: remove useless error
* chore: support key prefix
* chore: add clean check for meta restore
* chore: add more log for meta restore
* chore: resolve s3 and local file root in command meta-snapshot
* chore: remove the pg mysql features from the build script as they are already in the default feature
* chore: fix by pr comment
* fix: alter table update table column default
* fix: fuzz test also cast default value
* chore: more testcase
* test: non-zero value
* refactor: per review
* tests: unexpected alter result(WIP on fix)
* ub
* ub more
* test: update sqlness
* refactor/flight-codec:
### Refactor and Enhance Schema and RecordBatch Handling
- **Add `datatypes` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include the `datatypes` dependency.
- **Schema Conversion and Error Handling**:
- Updated `src/client/src/database.rs` and `src/client/src/region.rs` to handle schema conversion using `Arc` and added error handling for schema conversion.
- Enhanced error handling in `src/client/src/error.rs` and `src/common/grpc/src/error.rs` by adding `ConvertSchema` error and removing unused errors.
- **FlightMessage and RecordBatch Refactoring**:
- Refactored `FlightMessage` enum in `src/common/grpc/src/flight.rs` to use `RecordBatch` instead of `Recordbatch`.
- Updated related functions and tests in `src/common/grpc/benches/bench_flight_decoder.rs`, `src/operator/src/bulk_insert.rs`, `src/servers/src/grpc/flight/stream.rs`, and `tests-integration/src/grpc/flight.rs` to align with the new `FlightMessage` structure.
* refactor/flight-codec:
Remove `ConvertArrowSchema` Error Variant
- Removed the `ConvertArrowSchema` error variant from `error.rs`.
- Updated the `ErrorExt` implementation to exclude `ConvertArrowSchema`.
- Affected file: `src/common/query/src/error.rs`.
* fix: cr
* fix/bulk-insert-case-sensitive:
Add error inspection for gRPC bulk insert in `greptime_handler.rs`
- Enhanced error handling by adding `inspect_err` to log errors during the `put_record_batch` operation in `greptime_handler.rs`.
* fix: silient error while bulk ingest with uppercase columns
* main:
**Enhancements to Flight Data Handling and Error Management**
- **Flight Data Handling:**
- Added `bytes` dependency in `Cargo.lock` and `Cargo.toml`.
- Introduced `try_from_schema_bytes` and `try_decode_record_batch` methods in `FlightDecoder` to handle schema and record batch decoding more efficiently in `src/common/grpc/src/flight.rs`.
- Updated `Inserter` in `src/operator/src/bulk_insert.rs` to utilize schema bytes directly, improving bulk insert operations.
- **Error Management:**
- Added `ArrowError` handling in `src/common/grpc/src/error.rs` to manage errors related to Arrow operations.
- **Region Request Processing:**
- Modified `make_region_bulk_inserts` in `src/store-api/src/region_request.rs` to use the new `FlightDecoder` methods for decoding Arrow IPC data.
* - **Flight Data Handling:**
- Added `bytes` dependency in `Cargo.lock` and `Cargo.toml`.
- Introduced `try_from_schema_bytes` and `try_decode_record_batch` methods in `FlightDecoder` to handle schema and record batch decoding more efficiently in `src/common/grpc/src/flight.rs`.
- Updated `Inserter` in `src/operator/src/bulk_insert.rs` to utilize schema bytes directly, improving bulk insert operations.
- **Error Management:**
- Added `ArrowError` handling in `src/common/grpc/src/error.rs` to manage errors related to Arrow operations.
- **Region Request Processing:**
- Modified `make_region_bulk_inserts` in `src/store-api/src/region_request.rs` to use the new `FlightDecoder` methods for decoding Arrow IPC data.
* perf/optimize-bulk-encode-decode:
Update `greptime-proto` dependency and refactor error handling
- **Dependency Update**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Error Handling Refactor**: Removed the `Prost` error variant from `MetadataError` in `src/store-api/src/metadata.rs`.
- **Error Handling Improvement**: Replaced `unwrap` with `context(FlightCodecSnafu)` for error handling in `make_region_bulk_inserts` function in `src/store-api/src/region_request.rs`.
* fix: clippy
* fix: toml
* perf/optimize-bulk-encode-decode:
### Update `Cargo.toml` Dependencies
- Updated the `bytes` dependency to use the workspace version in `Cargo.toml`.
* perf/optimize-bulk-encode-decode:
**Fix payload assignment in `bulk_insert.rs`**
- Corrected the assignment of the `payload` field in the `ArrowIpc` struct within the `Inserter` implementation in `bulk_insert.rs`.
* use main branch proto
* chore: invalid table flow mapping
* chore: exists
* fix: invalid all related keys in kv cache when drop flow&refactor: per review
* fix: flow not found status code
* chore: rm unused error code
* chore: stuff
* chore: unused
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name.
- **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`.
- **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints.
**Refactor Parquet Writer Initialization and File Handling**
- Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management.
- Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability.
- Enhanced error handling and logging with `debug` statements for better traceability during file operations.
- **Removed Output Size Enforcement in `twcs.rs`:**
- Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling.
- **Added Max File Size Option in `parquet.rs`:**
- Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files.
- **Refactored Indexer Management in `parquet/writer.rs`:**
- Changed `current_indexer` from an `Option` to a direct `Indexer` type.
- Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`.
- Simplified indexer initialization and management logic.
- **Refactored SST File Handling**:
- Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
- Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
- Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
- Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
- Updated `ParquetWriter` to handle multiple indexers and file IDs.
- Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
- Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
- Updated related logic to dynamically generate file IDs where necessary.
- Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
- Updated tests to align with new path and indexer management.
- Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
- Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
* chore: rebase main
* feat/multiple-compaction-output:
### Add Benchmarking and Refactor Compaction Logic
- **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions.
- **Compaction Module Enhancements**:
- Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public.
- Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions.
- Introduced `find_overlapping_items` for identifying overlapping items.
- **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`.
* feat/multiple-compaction-output:
### Enhance Compaction Logic and Add Benchmarks
- **Compaction Logic Improvements**:
- Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty.
- Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency.
- **Benchmark Enhancements**:
- Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function.
- Extended existing benchmarks to include larger data sizes.
- **Testing Enhancements**:
- Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`.
- **Logging and Debugging**:
- Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions.
* feat/multiple-compaction-output:
### Refactor and Enhance Compaction Logic
- **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`.
- **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`.
- **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`.
- **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files.
- **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`.
- **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`.
* feat/multiple-compaction-output:
### Enhance Compaction Logic and Testing
- **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated
benchmarks in `run_bench.rs` to include `bench_merge_seq_files`.
- **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap.
- **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies.
- **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed.
* feat/multiple-compaction-output:
### Refactor TWCS Compaction Options
- **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`.
- **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`.
- **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`.
- **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files.
This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase.
* chore: add trailing space
* fix license header
* feat/revise-compaction-picker:
**Limit File Processing and Optimize Merge Logic in `run.rs`**
- Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity.
- Adjusted logic to calculate `target_size` and iterate over files using the limited set of files.
- Updated scoring calculations to use the limited file set, ensuring efficient file merging.
* feat/revise-compaction-picker:
### Add Compaction Metrics and Remove Debug Logging
- **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process.
- **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`.
* feat/revise-compaction-picker:
## Enhance Compaction Logic and Metrics
- **Compaction Logic Improvements**:
- Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations.
- Updated `Compactor` implementation to use these methods for metrics tracking.
- Modified `Ranged` trait logic in `run.rs` to improve range comparison.
- Enhanced test cases in `run.rs` to reflect changes in compaction logic.
- **Metrics Enhancements**:
- Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking.
- **Debugging and Logging**:
- Added detailed logging for compaction pick results in `twcs.rs`.
- Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output.
- **Testing Enhancements**:
- Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files.
- Updated `merge_mode_test.rs` to reflect changes in file handling during scans.
* feat/revise-compaction-picker:
### Update `FileHandle` Debug Implementation
- **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`.
- **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`.
* Trigger CI
* feat/revise-compaction-picker:
**Update compaction logic and default options**
- **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability.
- **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance.
* feat/revise-compaction-picker:
Refactor `find_overlapping_items` to use an external result vector
- Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency.
- Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature.
- Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach.
* feat/revise-compaction-picker:
Improve file merging logic in `run.rs`
- Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups.
- Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection.
- Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences.
* feat/revise-compaction-picker:
Enhance `find_overlapping_items` with `SortedRun` and Update Tests
- Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity.
- Introduce a `sorted` flag in `SortedRun` to optimize sorting operations.
- Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`.
- Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors.
* feat/revise-compaction-picker:
**Enhancements in `compaction/run.rs`:**
- Added `ReadableSize` import to handle size calculations.
- Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided.
* feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction
Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
* fix: always check for shutdown signal in flow
chore: correct log msg for flows that shouldn't exist
feat: use time window size/2 as sleep interval
* chore: better slower query refresh time
* chore
* refactor: per review
fix/stall-metrics:
Improve stalled request handling in `handle_write.rs`
- Updated logic to account for both `write_requests` and `bulk_requests` when adjusting `stalled_count`.
- Modified `reject_region_stalled_requests` and `handle_region_stalled_requests` to correctly subtract the combined length of `requests` and `bulk` from `stalled_count`.
fix/flaky-prom-gateway-test:
**Refactor gRPC Test Assertions in `grpc.rs`**
- Updated test assertions for `test_prom_gateway_query` to improve clarity and maintainability.
- Replaced direct comparison with expected `PrometheusJsonResponse` objects with individual field assertions.
- Added sorting for `vector` and `matrix` results to ensure consistent test outcomes.
* 1. rename `greptime_mito_flush_errors_total` metric to `greptime_mito_flush_errors_total` for consistency
2. update grafana dashboard to add following panel:
- compaction input/output bytes
- bulk insert handle elasped time in frontend and region worker
* chore: supporting more data type for pipeline dryrun API
* chore: add docs for parse_dryrun_data
* chore: fix by pr comment
* chore: add user-friendly error message
* chore: change EventPayloadResolver content_type field type from owner to ref
* Apply suggestions from code review
Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
---------
Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
* feat: export to s3 add more options
* chore: rm output dir override logic
* fix: s3 root export data
* feat: use output_dir and s3 at same time
* refactor: per review
* fix: keep same behavior
* fix/fast-path-for-single-region-bulk-insert:
### Commit Summary
- **Refactor `try_decode` Method**: Updated the `try_decode` method in `FlightDecoder` to accept a reference to `FlightData` instead of consuming it. This change affects multiple files including `database.rs`, `region.rs`, `flight.rs`, `bulk_insert.rs`, `stream.rs`, and `region_request.rs`.
- **Optimize Bulk Insert Handling**: Added a fast path for handling bulk inserts when only one region is involved in `bulk_insert.rs`.
* fix/fast-path-for-single-region-bulk-insert:
Improve `FlightDecoder` usage in tests
- Updated `try_decode` method calls in `flight.rs` to remove unnecessary references for `d1`, `d2`, and `d3`.
- Ensured consistency in handling `FlightMessage` variants within test cases.
* fix/fast-path-for-single-region-bulk-insert:
**Enhancement: Skip Empty Regions in Bulk Insert**
- Updated `bulk_insert.rs` to improve efficiency by skipping regions without data during the bulk insert process. This change ensures that regions with a `true_count` of zero are not processed, optimizing resource usage and performance.
* fix/fast-path-for-single-region-bulk-insert:
### Commit Summary
- **Refactor `RegionMask` Handling**:
- Introduced `RegionMask` struct to encapsulate boolean array and selected rows count.
- Updated methods to use `RegionMask` instead of `BooleanArray` for region selection.
- Affected files: `bulk_insert.rs`, `multi_dim.rs`, `partition.rs`, `splitter.rs`.
- **Optimize Region Selection**:
- Removed unnecessary checks for empty regions in `bulk_insert.rs`.
- Improved logic for handling default regions in `multi_dim.rs`.
- **Update Tests**:
- Modified test cases to accommodate `RegionMask` changes.
- Affected files: `multi_dim.rs`, `splitter.rs`.
* fix/fast-path-for-single-region-bulk-insert:
**Enhancements to MultiDimPartitionRule Logic and Tests**
- **`multi_dim.rs`**: Improved the logic for selecting rows in `MultiDimPartitionRule` by optimizing the selection process when only one region is present.
- **Tests**: Added new test cases to verify the behavior of default regions with unselected rows, existing default regions, and scenarios where all rows are selected. These tests ensure robust handling of partition rules and validate the correct assignment of rows to regions.
* feat: improve topic management and add stale records cleanup
* fix: fix unit tests
* chore: apply suggestions from CR
* chore: apply suggestions from CR
* fix: remove files under atomic dir on failure
* fix: clean atomic dir on download failure
* chore: update comment
* fix: clean if failed to write without write cache
* feat: add a TempFileCleaner to clean files on failure
* chore: after merge fix
* chore: more fix
---------
Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com>
Co-authored-by: discord9 <discord9@163.com>
* add benchmark for splitting according to time partition
* feat/write-to-multiple-time-partitions:
**Enhancements to Bulk Processing and Time Partitioning**
- **`part.rs`**: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns.
- **`simple_bulk_memtable.rs`**: Updated tests to include `timestamp_index` initialization.
- **`time_partition.rs`**: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`.
- **`handle_bulk_insert.rs`**: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`.
* feat/write-to-multiple-time-partitions:
### Enhance Time Partitioning Logic
- **`time_partition.rs`**:
- Introduced `HashSet` for efficient partition management.
- Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions.
- Updated `get_or_create_time_partition` to manage partition creation.
- Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges.
- **Tests**:
- Added `test_find_partitions_by_time_range` to validate new partitioning logic.
- Updated `test_split_record_batch` to ensure correct record batch splitting behavior.
* feat/write-to-multiple-time-partitions:
### Enhance Time Partitioning and Testing in `time_partition.rs`
- **Time Partitioning Enhancements**:
- Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`.
- Improved filtering logic for timestamp arrays to support various time units.
- **Testing Enhancements**:
- Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`.
- Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning.
- **Imports and Dependencies**:
- Added necessary imports for new timestamp array types and testing utilities.
* feat/write-to-multiple-time-partitions:
### Refactor and Enhance Time Partition Filtering
- **Refactor Filtering Logic**: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability.
- **Enhance `BulkPart` Struct**: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`.
- **Rename Function**: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`.
- **Add Feature Flag**: Introduced `int_roundings` feature in `lib.rs` to support new functionality.
* refactor tests
* feat/write-to-multiple-time-partitions:
Improve timestamp handling in `time_partition.rs`
- Enhanced safety comments for timestamp conversion to ensure clarity.
- Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations.
- Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`.
* feat/write-to-multiple-time-partitions:
**Refactor timestamp handling and add utility function**
- **Refactor `time_partition.rs`:** Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion.
- **Enhance `metadata.rs`:** Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code.
* feat/write-to-multiple-time-partitions:
Refactor time partition variable names in `time_partition.rs`
- Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`.
- Updated related logic to use new variable names for improved readability and maintainability.
* feat/write-to-multiple-time-partitions:
**Refactor variable names in `time_partition.rs`**
- Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency.
- Modified function calls and loop iterations to align with the new variable names.
- Affected file: `src/mito2/src/memtable/time_partition.rs`
* feat/write-to-multiple-time-partitions:
### Refactor variable names in `time_partition.rs`
- Updated variable names for clarity in `time_partition.rs`:
- Renamed `matchings` to `matching_parts`
- Renamed `missings` to `missing_parts`
- Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`.
* feat/write-to-multiple-time-partitions:
### Enhance Time Partition Handling
- **`time_partition.rs`**:
- Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks.
- Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`).
- Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling.
---------
Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>
* ci: automatically update helm-charts when release
* Update .github/workflows/release.yml
Co-authored-by: Ning Sun <classicning@gmail.com>
* Update update-helm-charts-version.sh
---------
Co-authored-by: Ning Sun <classicning@gmail.com>
* fix: select after alter
* fix: insert a proper row&catch a bug
* fix: alter table modify type modify default value type too
* refactor: per review
* chore: per review
* refactor: per review
* refactor: per review
* feat/bridge-bulk-insert:
## Implement Bulk Insert and Update Dependencies
- **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`.
- **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
- **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
- **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
- **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.
* feat/bridge-bulk-insert:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
- **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity.
- **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.
* fix: clippy
* fix schema serialization
* feat/bridge-bulk-insert:
Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`
- Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`.
- Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
- Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.
* fix: test
* refactor: rename
* allow empty app_metadata in FlightData
* feat/bridge-bulk-insert:
- **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
- **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
- **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.
* fix: standalone test
* feat/bridge-bulk-insert:
### Enhance Bulk Insert Handling and Metadata Management
- **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
- **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
- **`handle_bulk_insert.rs`**:
- Added `handle_record_batch` function to streamline processing of bulk insert payloads.
- Improved error handling and task management for bulk insert operations.
- Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access.
* feat/bridge-bulk-insert:
- **Refactor `handle_bulk_insert.rs`:**
- Replaced `handle_record_batch` with `handle_payload` for handling payloads.
- Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.
- **Optimize `multi_dim.rs`:**
- Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`.
* feat/bridge-bulk-insert:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
- **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
- **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling.
- **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
- **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.
* fix: clippy warnings
* feat/bridge-bulk-insert:
**Add Metrics and Improve Error Handling**
- **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to
monitor performance.
- **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
- **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
- **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.
* chore: rebase main
* implement simple bulk memtable
* impl write_bulk
* implement simple bulk memtable
* feat/simple-bulk-memtable:
### Enhance Time-Series Memtable and Bulk Insert Handling
- **Visibility Modifications**: Made `mutable_array` in `PrimitiveVectorBuilder` and `StringVectorBuilder` public in `primitive.rs` and `string.rs`.
- **New Module**: Added `builder.rs` to `memtable` for time-series builders, including `FieldBuilder` and `StringBuilder` implementations.
- **Bulk Insert Enhancements**:
- Added `sequence` field to `BulkPart` in `part.rs` and updated its handling in `simple_bulk_memtable.rs` and `region_write_ctx.rs`.
- Introduced metrics for bulk insert operations in `metrics.rs` and `bulk_insert.rs`.
- **Performance Metrics**: Added timing metrics for write operations in `metrics.rs`, `region_write_ctx.rs`, and `handle_write.rs`.
- **Region Request Handling**: Updated `make_region_bulk_inserts` in `region_request.rs` to include performance metrics.
* feat/simple-bulk-memtable:
**Improve Memtable Stats Calculation and Add Metrics Timer**
- **`simple_bulk_memtable.rs`**: Refactored `stats` method to use `num_rows` for checking if rows have been written, improving accuracy in memory table statistics.
- **`handle_bulk_insert.rs`**: Introduced a metrics timer to measure the elapsed time for processing bulk requests, enhancing performance monitoring.
* feat/simple-bulk-memtable:
### Commit Message
**Enhancements and Bug Fixes**
- **Dependency Update**: Updated `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Feature Addition**: Implemented `to_mutation` method in `BulkPart` to convert `BulkPart` to `Mutation` for fallback `write_bulk` implementation in `src/mito2/src/memtable/bulk/part.rs`.
- **Functionality Improvement**: Modified `write_bulk` method in `TimeSeriesMemtable` to support default implementation fallback to row iteration in `src/mito2/src/memtable/time_series.rs`.
- **Performance Optimization**: Enhanced `bulk_insert` handling by optimizing region request processing and data partitioning in `src/operator/src/bulk_insert.rs`.
- **Error Handling**: Added `ComputeArrow` error variant for better error management in `src/operator/src/error.rs`.
- **Code Refactoring**: Simplified region bulk insert request processing in `src/store-api/src/region_request.rs`.
* fix: some clippy warnings
* feat/simple-bulk-memtable:
### Commit Summary
- **Refactor Return Types to `Result`:**
Updated the return type of the `ranges` method in `memtable.rs`, `bulk.rs`, `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `memtable_util.rs` to return `Result<MemtableRanges>` for better error handling.
- **Enhance Metrics Tracking:**
Improved metrics tracking by adding `num_rows` and `max_sequence` to `WriteMetrics` in `stats.rs`. Updated related methods in `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`, and `scan_region.rs` to utilize these metrics.
- **Remove Unused Imports:**
Cleaned up unused imports in `time_series.rs` to streamline the codebase.
* merge main
* remove useless error vairant
* use newer version of proto
* feat/simple-bulk-memtable:
Commit Message
Summary
Enhance FieldBuilder and StringBuilder functionality, add tests, and improve error handling.
Key Changes
• builder.rs:
• Added documentation for FieldBuilder methods.
• Renamed append_string_vector to append_vector in StringBuilder.
• simple_bulk_memtable.rs:
• Added new test cases for write_one, write_bulk, is_empty, stats, fork, and sequence_filter.
• time_series.rs:
• Improved error handling in ValueBuilder for type mismatches.
• memtable_util.rs:
• Removed unused imports and streamlined code.
These changes enhance the robustness and test coverage of the memtable components.
* feat/simple-bulk-memtable:
Improve Time Partition Matching Logic in `time_partition.rs`
- Enhanced the `write_bulk` method in `time_partition.rs` to improve the logic for matching partitions based on time ranges.
- Introduced a new mechanism to filter and select partitions that overlap with the record batch's timestamp range before writing.
* feat/simple-bulk-memtable:
Improve Metrics Handling in `bulk_insert.rs`
- Removed the `group_request_timer` and its associated metric observation to streamline the timing logic.
- Moved the `BULK_REQUEST_ROWS` metric observation to occur after filtering, ensuring accurate row count metrics.
* feat/simple-bulk-memtable:
**Enhance Stalled Requests Calculation and Update Metrics**
- **`worker.rs`**: Updated the `stalled_count` method to include both `reqs` and `bulk_reqs` in the calculation of stalled requests.
- **`bulk_insert.rs`**: Removed duplicate observation of `BULK_REQUEST_MESSAGE_SIZE` metric.
- **`metrics.rs`**: Changed the bucket strategy for `BULK_REQUEST_ROWS` from linear to exponential, improving the granularity of metrics collection.
* feat/simple-bulk-memtable:
**Refactor `StringVector` Usage and Update Method Signatures**
- **`src/datatypes/src/vectors/string.rs`**: Changed `StringVector`'s `array` field from public to private.
- **`src/mito2/src/memtable/builder.rs`**: Refactored `append_vector` method to `append_array`, updating its usage to work directly with `StringArray` instead of `StringVector`.
- **`src/mito2/src/memtable/time_series.rs`**: Updated `ValueBuilder` to handle `StringArray` directly, replacing `StringVector` usage with `StringArray` in the `FieldBuilder::String` case.
* feat/simple-bulk-memtable:
- **Refactor `PrimitiveVectorBuilder`**: Made `mutable_array` private in `src/datatypes/src/vectors/primitive.rs`.
- **Optimize `ValueBuilder`**: Replaced `UInt64VectorBuilder` and `UInt8VectorBuilder` with `Vec<u64>` and `Vec<u8>` for `sequence` and `op_type` in `src/mito2/src/memtable/time_series.rs`.
- **Improve Metrics Initialization**: Updated histogram bucket initialization to use `exponential_buckets` in `src/mito2/src/metrics.rs`.
* feat/simple-bulk-memtable:
Improve error handling in `simple_bulk_memtable.rs` and `time_series.rs`
- Enhanced error handling by using `OptionExt` for more concise error context management in `simple_bulk_memtable.rs` and `time_series.rs`.
- Replaced `ok_or` with `with_context` to streamline error context creation in both files.
* feat/simple-bulk-memtable:
**Enhance Time Partition Handling in `time_partition.rs`**
- Introduced `create_time_partition` function to streamline the creation of new time partitions, ensuring thread safety by acquiring a lock.
- Modified logic to handle cases where no matching time partitions exist, creating new partitions as needed.
- Updated `write_record_batch` and `write_one` methods to utilize the new partition creation logic, improving partition management and data writing efficiency.
* replace proto
* feat/simple-bulk-memtable:
Update `metrics.rs` to adjust the range of exponential buckets for bulk insert message rows from `10 ~ 1_000_000` to `10 ~ 100_000`.
* feat: update pgwire to 0.29
* chore: only build default binary in nix ci
* Update src/servers/Cargo.toml
Co-authored-by: dennis zhuang <killme2008@gmail.com>
---------
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* feat: flow add static user/pwd auth
* fix: not print password
* chore: rm explict Any bound
* refactor: per review
* refactor: move away from plugin
* refactor: not use any
* chore: per revieww
* chore: complete a todo
* chore: fix after rebase
* feat: support auto transform
* refactor: replace hashbrown with ahash
* refactor: params of run identity pipeline
* refactor: minor update
* test: add test for auto transform
* feat: add select processor
* test: select processor
* chore: use include and exclude for key
* fix: typos
* chore: address CR comment
* chore: typo
* chore: typo
* chore: address CR comment
* chore: use with_context
* fix: do not add projection for cast
Use cast to build time filter directly instead of adding a projection,
which will cause column not found
* feat: cast before creating plan
* feat/bridge-bulk-insert:
## Implement Bulk Insert and Update Dependencies
- **Bulk Insert Implementation**: Added `handle_bulk_inserts` method in `src/operator/src/bulk_insert.rs` to manage bulk insert requests using `FlightDecoder` and `FlightData`.
- **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to use the latest revision of `greptime-proto` and added new dependencies like `arrow`, `arrow-ipc`, `bytes`, and `prost`.
- **gRPC Enhancements**: Modified `put_record_batch` method in `src/frontend/src/instance/grpc.rs` and `src/servers/src/grpc/flight.rs` to handle `FlightData` instead of `RawRecordBatch`.
- **Error Handling**: Added new error types in `src/operator/src/error.rs` for handling Arrow operations and decoding flight data.
- **Miscellaneous**: Updated `src/operator/src/insert.rs` to expose `partition_manager` and `node_manager` as public fields.
* feat/bridge-bulk-insert:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
- **Refactor gRPC Query Handling**: Removed `RawRecordBatch` usage from `grpc.rs`, `flight.rs`, `greptime_handler.rs`, and test files, simplifying the gRPC query handling.
- **Enhance Bulk Insert Logic**: Improved bulk insert logic in `bulk_insert.rs` and `region_request.rs` by using `FlightDecoder` and `BooleanArray` for better performance and clarity.
- **Add `common-grpc` Dependency**: Added `common-grpc` as a workspace dependency in `store-api/Cargo.toml` to support gRPC functionalities.
* fix: clippy
* fix schema serialization
* feat/bridge-bulk-insert:
Add error handling for encoding/decoding in `metadata.rs` and `region_request.rs`
- Introduced new error variants `FlightCodec` and `Prost` in `MetadataError` to handle encoding/decoding failures in `metadata.rs`.
- Updated `make_region_bulk_inserts` function in `region_request.rs` to use `context` for error handling with `ProstSnafu` and `FlightCodecSnafu`.
- Enhanced error handling for `FlightData` decoding and `filter_record_batch` operations.
* fix: test
* refactor: rename
* allow empty app_metadata in FlightData
* feat/bridge-bulk-insert:
- **Remove Logging**: Removed unnecessary logging of affected rows in `region_server.rs`.
- **Error Handling Enhancement**: Improved error handling in `bulk_insert.rs` by adding context to `split_record_batch` and handling single datanode fast path.
- **Error Enum Cleanup**: Removed unused `Arrow` error variant from `error.rs`.
* fix: standalone test
* feat/bridge-bulk-insert:
### Enhance Bulk Insert Handling and Metadata Management
- **`lib.rs`**: Enabled the `result_flattening` feature for improved error handling.
- **`request.rs`**: Made `name_to_index` and `has_null` fields public in `WriteRequest` for better accessibility.
- **`handle_bulk_insert.rs`**:
- Added `handle_record_batch` function to streamline processing of bulk insert payloads.
- Improved error handling and task management for bulk insert operations.
- Updated `region_metadata_to_column_schema` to return both column schemas and a name-to-index map for efficient data access.
* feat/bridge-bulk-insert:
- **Refactor `handle_bulk_insert.rs`:**
- Replaced `handle_record_batch` with `handle_payload` for handling payloads.
- Modified the fast path to use `common_runtime::spawn_global` for asynchronous task execution.
- **Optimize `multi_dim.rs`:**
- Added a fast path for single-region scenarios in `MultiDimPartitionRule::partition_record_batch`.
* feat/bridge-bulk-insert:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in both `Cargo.lock` and `Cargo.toml`.
- **Optimize Memory Allocation**: Increased initial and builder capacities in `time_series.rs` to improve performance.
- **Enhance Data Handling**: Modified `bulk_insert.rs` to use `Bytes` for efficient data handling.
- **Improve Bulk Insert Logic**: Refined the bulk insert logic in `region_request.rs` to handle schema and payload data more effectively and optimize record batch filtering.
- **String Handling Improvement**: Updated string conversion in `helper.rs` for better performance.
* fix: clippy warnings
* feat/bridge-bulk-insert:
**Add Metrics and Improve Error Handling**
- **Metrics Enhancements**: Introduced new metrics for bulk insert operations in `metrics.rs`, `bulk_insert.rs`, `greptime_handler.rs`, and `region_request.rs`. Added `HANDLE_BULK_INSERT_ELAPSED`, `BULK_REQUEST_MESSAGE_SIZE`, and `GRPC_BULK_INSERT_ELAPSED` histograms to
monitor performance.
- **Error Handling Improvements**: Removed unnecessary error handling in `handle_bulk_insert.rs` by eliminating redundant `let _ =` patterns.
- **Dependency Updates**: Added `lazy_static` and `prometheus` to `Cargo.lock` and `Cargo.toml` for metrics support.
- **Code Refactoring**: Simplified function calls in `region_server.rs` and `handle_bulk_insert.rs` for better readability.
* chore: rebase main
* chore: merge main
* ci: update website greptimedb version when releasing automatically
* fix: token name
* chore: tweak readme
* fix: style
* chore: license year
* refactor: simplify bump-website-version.ts
* chore: being used
* fix: make ci happy
* chore: insert support string to numeric auto cast
* test: add sqlness test
* chore: remove log
* test: fix sql test
* style: fix clippy
* test: test invalid number
* feat: do not convert to default if unable to parse
* chore: update comment
* test: update sqlness test
* test: update prepare test
* feat: support auto transform
* refactor: replace hashbrown with ahash
* refactor: params of run identity pipeline
* refactor: minor update
* test: add test for auto transform
* chore: fix cr issues
* chore: only retry when retry-able
* chore: revert dbg change
* refactor: per review
* fix: check for available frontend first
* docs: more explain&longer timeout&feat: more retry at every level&try send select 1
* fix: use `sql` method for "SELECT 1"
* fix: also put recover flows in spawned task and a dead loop
* test: update transient error in flow rebuild test
* chore: sleep after sqlness sleep
* chore: add a warning
* chore: wait even more time after reboot
* test: incorrect test result when filtering pk with multiple columns
* fix: prune non first tag correctly
Distinguish no column and no stats and only use default value when no
column
* test: update test result
* refactor: rename test file
* test: add test for null filter
* fix: use StatValues for null counts
* test: drop table
* test: fix unstable flow test
fix/checking-memtable-empty-and-stats:
- **Refactor timestamp updates**: Simplified timestamp range updates in `PartitionTreeMemtable` and `TimeSeriesMemtable` by replacing `update_timestamp_range` with `fetch_max` and `fetch_min` methods for `max_timestamp` and `min_timestamp`.
- Affected files: `partition_tree.rs`, `time_series.rs`
- **Remove unused code**: Deleted the `update_timestamp_range` method from `WriteMetrics` and removed unnecessary imports.
- Affected file: `stats.rs`
- **Optimize memtable filtering**: Streamlined the check for empty memtables in `ScanRegion` by directly using `time_range`.
- Affected file: `scan_region.rs`
* try prune one less
* test: also not add one
* ci: use longer fuzz time
* revert fuzz time&per review
* chore: no (
* docs: add explain to offset used in delete records
* test: fix test_procedure_execution
* feat: use flow batching engine
broken: try using logical plan
fix: use dummy catalog for logical plan
fix: insert plan exec&sqlness grpc addr
feat: use frontend instance in flownode in standalone
feat: flow type in metasrv&fix: flush flow out of sync& column name alias
tests: sqlness update
tests: sqlness flow rebuild udpate
chore: per review
refactor: keep chnl mgr
refactor: use catalog mgr for get table
tests: use valid sql
fix: add more check
refactor: put flow type determine to frontend
* chore: update proto
* chore: update proto to main branch
* fix: add locks for create/drop flow&docs: update docs
* feat: flush_flow flush all ranges now
* test: add align time window test
* docs: explain `nodeid` use in check task
* refactor: AddAutoColumnRewriter check for Projection
* refactor: per review
* fix: query without time window also clean dirty time window
* chore: better logging
* chore: add comments per review
* refactor: per review
* chore: per review
* chore: per review rename args
* refactor: per review partially
* chore: update docs
* chore: use better error variant
* chore: better error variant
* refactor: rename FlowWorkerManager to FlowStreamingEngine
* rename again
* refactor: per review
* chore: rebase after #5963 merged
* refactor: rename all flow_worker_manager occurs
* docs: rm resolved TODO
* fix: store flow schema on creation
* chore: update sqlness
* refactor: save the entire query context to flow info
* chore: sqlness update
* chore: rm pub
* fix: keep old version compatibility
* fix: remove obsolete failover detectors after region leader change
* chore: apply suggestions from CR
* fix: fix unit tests
* fix: fix unit test
* fix: failover logic
* [wip]: implement arrow service
* add service
* feat/otel-arrow:
### Add OpenTelemetry Arrow Support
- **`Cargo.toml`, `Cargo.lock`**: Updated `otel-arrow-rust` dependency to use a local path and added `arrow-ipc` as a dependency.
- **`src/servers/src/grpc.rs`, `src/servers/src/grpc/builder.rs`**: Integrated `ArrowMetricsServiceServer` with gRPC server, including support for custom header interception and message compression.
- **`src/servers/src/otel_arrow.rs`**: Implemented `OtelArrowServiceHandler` for handling OpenTelemetry Arrow metrics and added `HeaderInterceptor` for custom header handling.
* feat/otel-arrow:
Add error handling for OpenTelemetry Arrow requests
- **`src/error.rs`**: Introduced a new error variant `HandleOtelArrowRequest` to handle failures in processing OpenTelemetry Arrow requests.
- **`src/otel_arrow.rs`**: Implemented error handling for receiving and consuming batches from the OpenTelemetry Arrow client. Added logging for errors and updated the response status accordingly.
* feat/otel-arrow:
Remove `otel_arrow` Module from gRPC Server
- Deleted the `otel_arrow` module from the gRPC server implementation.
- Removed the `otel_arrow` module import from `grpc.rs`.
- Deleted the `otel_arrow.rs` file, which contained the `OtelArrowServer` struct and its implementation.
* feat/otel-arrow:
## Remove `Arc` Implementations for Protocol and Pipeline Handlers
- **Removed `Arc` Implementations**: Deleted `Arc` implementations for `OpenTelemetryProtocolHandler` and `PipelineHandler` traits in `query_handler.rs`. This change simplifies the code by removing redundant async trait implementations for `Arc<T>`.
- **File Affected**: `src/servers/src/query_handler.rs`
* feat/otel-arrow:
Improve error handling and metadata processing in `otel_arrow.rs`
- Updated error handling by ignoring the result of `sender.send` to prevent panic on failure.
- Enhanced metadata processing in `HeaderInterceptor` by using `Ok` to safely handle `grpc-encoding` entry retrieval.
* fix dependency
* feat/otel-arrow:
- **Update Dependencies**:
- Moved `otel-arrow-rust` dependency in `Cargo.toml`.
- Adjusted workspace dependencies in `src/frontend/Cargo.toml`.
- **Error Handling**:
- Removed `MissingQueryContext` error variant from `src/servers/src/error.rs`.
* fix: toml format
* remove useless code
* chore: resolve conflicts
* test: sqlness test case
* feat: use correct default while pruning row groups
* fix: consider default in SimpleFilterContext
* test: update sqlness test
* test: add order by
* feat: enable submitting wal prune procedure periodically
* chore: fix and add options
* test: add unit test
* test: fix unit test
* test: enable active_wal_pruning in test
* test: update default config
* chore: update config name
* refactor: use semaphore to control the number of prune process
* refactor: use split client for wal prune manager and topic creator
* chore: add configs
* chore: apply review comments
* fix: use tracker properly
* fix: use guard to track semaphore
* test: update unit tests
* chore: update config name
* chore: use prunable_entry_id
* refactor: semaphore to only limit the process of submitting
* chore: remove legacy sort
* chore: better configs
* fix: update config.md
* chore: respect fmt
* test: update unit tests
* chore: use interval_at
* fix: fix unit test
* test: fix unit test
* test: fix unit test
* chore: apply review comments
* docs: update config docs
* feat: close follower regions after dropping leader regions
* chore: upgrade greptime-proto
* feat: sync region followers after alter region operations
* test: add tests
* chore: apply suggestions from CR
* chore: apply suggestions from CR
* feat: cache regex in evaluator
* chore: fix warnings
* chore: add reference
* refactor: address CR comments
* Add negative to state
* Don't create the evaluator if the regex is invalid
* test: add test for maybe_build_regex
* fix/pg-timestamp-diff:
### Add Support for `Duration` Type in PostgreSQL Encoding
- **Enhanced `encode_value` Functionality**: Updated `src/servers/src/postgres/types.rs` to support encoding of `Value::Duration` using `PgInterval`.
- **Implemented `Duration` Conversion**: Added conversion logic from `Duration` to `PgInterval` in `src/servers/src/postgres/types/interval.rs`.
- **Added Unit Tests**: Introduced tests for `Duration` to `PgInterval` conversion in `src/servers/src/postgres/types/interval.rs`.
- **Updated SQL Test Cases**: Modified `tests/cases/standalone/common/types/timestamp/timestamp.sql` and `timestamp.result` to include tests for timestamp subtraction using PostgreSQL protocol.
* fix: overflow
* fix/pg-timestamp-diff:
Update `timestamp.sql` to ensure newline consistency
- Modified `timestamp.sql` to add a newline at the end of the file for consistency.
* fix/pg-timestamp-diff:
### Add Documentation for Month Approximation in Interval Calculation
- **File Modified**: `src/servers/src/postgres/types/interval.rs`
- **Key Change**: Added a comment explaining the approximation of one month as 30.44 days in the interval calculations.
* feat: implement Arrow Flight "DoPut" in Frontend
* support auth for "do_put"
* set request_id in DoPut requests and responses
* set "db" in request header
* wip: implement basic request handling
* feat/bulk-insert:
### Add Error Handling and Enhance Bulk Insert Functionality
- **Error Handling**: Introduced a new error variant `ConvertDataType` in `error.rs` to handle conversion failures from `ConcreteDataType` to `ColumnDataType`.
- **Bulk Insert Enhancements**:
- Updated `WorkerRequest::BulkInserts` in `request.rs` to include metadata and sender.
- Implemented `handle_bulk_inserts` in `worker.rs` to process bulk insert requests with region metadata.
- Added functions `region_metadata_to_column_schema` and `record_batch_to_rows` in `handle_bulk_insert.rs` for schema conversion and row processing.
- **API Changes**: Modified `RegionBulkInsertsRequest` in `region_request.rs` to include `region_id`.
Files affected: `error.rs`, `request.rs`, `worker.rs`, `handle_bulk_insert.rs`, `region_request.rs`.
* feat/bulk-insert:
**Enhance Error Handling and Add Unit Tests**
- Improved error handling in `record_batch_to_rows` function within `handle_bulk_insert.rs` by returning `Result` and handling errors with `context`.
- Added unit tests for `region_metadata_to_column_schema` and `record_batch_to_rows` functions in `handle_bulk_insert.rs` to ensure correct functionality and error handling.
* chore: update proto version
* feat/bulk-insert:
- **Refactor Error Handling**: Updated error handling in `error.rs` by modifying the `ConvertDataType` error handling.
- **Improve Logging and Error Reporting**: Enhanced logging and error reporting in `worker.rs` by adding error messages for missing region metadata.
- **Add New Error Type**: Introduced `DecodeArrowIpc` error in `metadata.rs` to handle Arrow IPC decoding failures.
- **Handle Arrow IPC Decoding**: Updated `region_request.rs` to handle Arrow IPC decoding errors using the new `DecodeArrowIpc` error type.
* chore: update proto version
* feat/bulk-insert:
Refactor `handle_bulk_insert.rs` to simplify row construction
- Removed the mutable `current_row` vector and refactored `row_at` function to return a new vector directly.
- Updated `record_batch_to_rows` to utilize the refactored `row_at` function for constructing rows.
* feat/bulk-insert:
### Commit Summary
**Enhancements in Region Server Request Handling**
- Updated `region_server.rs` to include `RegionRequest::BulkInserts(_)` in the `RegionChange::Ingest` category, improving the handling of bulk insert operations.
- Refined the categorization of region requests to ensure accurate mapping to `RegionChange` actions.
* wip: naive impl
* feat/column-partition:
### Add support for DataFusion physical expressions
- **`Cargo.lock` & `Cargo.toml`**: Added `datafusion-physical-expr` as a dependency to support physical expression creation.
- **`expr.rs`**: Implemented conversion methods `try_as_logical_expr` and `try_as_physical_expr` for `Operand` and `PartitionExpr` to facilitate logical and physical expression handling.
- **`multi_dim.rs`**: Enhanced `MultiDimPartitionRule` to utilize physical expressions for partitioning logic, including new methods for evaluating record batches.
- **Tests**: Added unit tests for logical and physical expression conversions and partitioning logic in `expr.rs` and `multi_dim.rs`.
* feat/column-partition:
### Refactor and Enhance Partition Handling
- **Refactor Partition Parsing Logic**: Moved partition parsing logic from `src/operator/src/statement/ddl.rs` to a new utility module `src/partition/src/utils.rs`. This includes functions like `parse_partitions`, `find_partition_bounds`, and `convert_one_expr`.
- **Error Handling Improvements**: Added new error variants `ColumnNotFound`, `InvalidPartitionRule`, and `ParseSqlValue` in `src/partition/src/error.rs` to improve error reporting for partition-related operations.
- **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to include new dependencies `common-time` and `session`.
- **Code Cleanup**: Removed redundant partition parsing functions from `src/operator/src/error.rs` and `src/operator/src/statement/ddl.rs`.
* feat/column-partition:
## Refactor and Enhance SQL and Table Handling
- **Refactor Column Definitions and Error Handling**
- Made `FULLTEXT_GRPC_KEY`, `INVERTED_INDEX_GRPC_KEY`, and `SKIPPING_INDEX_GRPC_KEY` public in `column_def.rs`.
- Removed `IllegalPrimaryKeysDef` error from `error.rs` and moved it to `sql/src/error.rs`.
- Updated error handling in `fill_impure_default.rs` and `expr_helper.rs`.
- **Enhance SQL Utility Functions**
- Moved and refactored functions like `create_to_expr`, `find_primary_keys`, and `validate_create_expr` to `sql/src/util.rs`.
- Added new utility functions for SQL parsing and validation in `sql/src/util.rs`.
- **Improve Partition Handling**
- Added `parse_partition_columns_and_exprs` function in `partition/src/utils.rs`.
- Updated partition rule tests in `partition/src/multi_dim.rs` to use SQL-based partitioning.
- **Simplify Table Name Handling**
- Re-exported `table_idents_to_full_name` from `sql::util` in `session/src/table_name.rs`.
- **Test Enhancements**
- Updated tests in `partition/src/multi_dim.rs` to use SQL for partition rule creation.
* feat/column-partition:
**Add Benchmarking and Enhance Partitioning Logic**
- **Benchmarking**: Introduced a new benchmark for `split_record_batch` in `bench_split_record_batch.rs` using `criterion` and `rand` as development dependencies in `Cargo.toml`.
- **Partitioning Logic**: Enhanced `MultiDimPartitionRule` in `multi_dim.rs` to include a default region for unmatched partition expressions and optimized the `split_record_batch` method.
- **Refactoring**: Moved `sql_to_partition_rule` function to a public scope for reuse in `multi_dim.rs`.
- **Testing**: Added new test module `test_split_record_batch` to validate the partitioning logic.
* Revert "feat/column-partition: ### Refactor and Enhance Partition Handling"
This reverts commit 183fa19f
* fix: revert refctoring parse_partition
* revert some refactor
* feat/column-partition:
### Enhance Partitioning and Error Handling
- **Benchmark Enhancements**: Added new benchmark `bench_split_record_batch_vs_row` in `bench_split_record_batch.rs` to compare row and column-based splitting.
- **Error Handling Improvements**: Introduced new error variants in `error.rs` for better error reporting related to record batch evaluation and arrow kernel computation.
- **Expression Handling**: Updated `expr.rs` to improve error context when converting schemas and creating physical expressions.
- **Partition Rule Enhancements**: Made `row_at` and `record_batch_to_cols` methods public in `multi_dim.rs` and improved error handling for physical expression evaluation and boolean operations.
* feat/column-partition:
### Add `eq` Method and Optimize Expression Caching
- **`expr.rs`**: Added a new `eq` method to the `Operand` struct for equality comparisons.
- **`multi_dim.rs`**: Introduced a caching mechanism for physical expressions using `RwLock` to improve performance in `MultiDimPartitionRule`.
- **`lib.rs`**: Enabled the `let_chains` feature for more concise code.
- **`multi_dim.rs` Tests**: Enhanced test coverage with new test cases for multi-dimensional partitioning, including random record batch generation and default region handling.
* feat/column-partition:
### Add `split_record_batch` Method to `PartitionRule` Trait
- **Files Modified**:
- `src/partition/src/multi_dim.rs`
- `src/partition/src/partition.rs`
- `src/partition/src/splitter.rs`
Added a new method `split_record_batch` to the `PartitionRule` trait, allowing record batches to be split into multiple regions based on partition values. Implemented this method in `MultiDimPartitionRule` and provided unimplemented stubs in test modules.
### Dependency Update
- **File Modified**:
- `src/operator/src/expr_helper.rs`
Removed unused import `ColumnDataType` and `Timezone` from the test module.
### Miscellaneous
- **File Modified**:
- `src/partition/Cargo.toml`
No functional changes; only minor formatting adjustments.
* chore: add license header
* chore: remove useless fules
* feat/column-partition:
Add support for handling unsupported partition expression values
- **`error.rs`**: Introduced a new error variant `UnsupportedPartitionExprValue` to handle unsupported partition expression values, and updated `ErrorExt` to map this error to `StatusCode::InvalidArguments`.
- **`expr.rs`**: Modified the `Operand` implementation to return the new error when encountering unsupported partition expression values.
- **`multi_dim.rs`**: Added a fast path to optimize the selection process when all rows are selected.
* feat/column-partition: Add validation for expression and region length in MultiDimPartitionRule constructor
• Ensure the lengths of exprs and regions match to prevent mismatches.
• Introduce error handling for length discrepancies with a descriptive error message.
* chore: add debug log
* feat/column-partition: Removed the validation check for matching lengths between exprs and regions in MultiDimPartitionRule constructor, simplifying the initialization process.
* fix: unit tests
* fix: gRPC connection pool leak
* use .config() instead of .inner.config
* cancel the bg task if it is running
* fix: cr
* add unit test for pool release
* Avoid potential data races
* fix/remove-metadata-region-options:
### Add `SKIP_WAL_KEY` Option to Metric Engine
- **Enhancements**:
- Introduced `SKIP_WAL_KEY` to the metric engine options in `create.rs` and `mito_engine_options.rs`.
- Updated test cases in `create.rs` to include `skip_wal` option and ensure it is removed for metadata regions.
- **Refactoring**:
- Updated `requests.rs` to use `SKIP_WAL_KEY` from `store_api::mito_engine_options`.
These changes enhance the metric engine by allowing the option to skip Write-Ahead Logging (WAL) and ensure consistent usage of option keys across modules.
* fix/remove-metadata-region-options: Add note for new options in mito_engine_options.rs
• Introduce a comment to remind developers to check if new options should be removed in region_options_for_metadata_region within metric_engine::engine::create.
* empty
* feat: partial impl of rr task/state
* feat: recording rule engine
* chore: rm unused
* chore: per review partially
* test: gen create table
* chore: rm some unused
* test: merge time window
* refactor: rename to batching mode
* refactor: per review
* refactor(partially): per review
* refactor: split engine.rs into three files
* refactor: use plan not sql
* chore: per review
* chore: per review
* refactor: per review
* refactor: per review
* chore: more per review
* refactor: per review
* refactor(partial): per review
* refactor: per review
* chore: clone task cheaper&more comments
* chore: fmt
* chore: typo
* refactor: improve jaeger '/api/services' performance by adding the trace services table
* chore: refine some logic
* chore: compatible v0
* test: add integration test
* chore: expand default limit from 100 to 2000
* test: fix integration test
* refactor: make trace service table configurable
* refactor: use a timestamp(2100-01-01 00:00:00) as large as possible
* refactor: use '<trace_table>_services' as trace services table name
* perf: introduce simd_json for parsing ndjson
* fix: some tests
* fix: some tests
* fix: es test case
* chore: use `as_bytes_mut()`
* chore: remove unnecessary `to_string`
* chore: add safety comment
* refactor: remove mode option in configuration files
* chore: remove mode in configuration file
* remvoe mode field in FlownodeOptions
* add comment for test
* update config.md
* remove mode field in standalone options
* fix: ci
* feat: time window expr
* chore: comments
* refactor: per review
* chore: partially per review
* chore: per review
* chore: per review use query engine's session
* chore: add table name template in pipeline yaml
* chore: implement apply function and add simple test
* chore: add comment and integration test
* chore: minor update
* fix: typos
* chore: change to table suffix
* chore: update comment and test
* chore: change name to table_suffix
* feat: add metrics list to scanner
* chore: add report metrics method
* feat: use df metrics in PartitionMetrics
* feat: pass execution metrics to scan partition
* refactor: remove PartitionMetricsList
* feat: better debug format for ScanMetricsSet
* feat: do not expose all metrics to execution metrics by default
* refactor: use struct destruction
* feat: add metrics list to scanner
* chore: Add custom Debug for ScanMetricsSet and partition metrics display
* test: update sqlness result
* fix: fix region follower procedure
* feat: add table related info to region peers table and follower regions
* feat: impl show region
* chore: apply suggestions from CR
* chore: utils for rr
* chore: one more test
* chore: more test case
* test: even more tests
* chore: per review
* tests: add more&update testcase
* chore: update comment
* chore: add Noop Wal option
* remove: WalOptionsAllocator::alloc method
* feat/no-op-wal:
### Add Noop WAL Option
- **`engine.rs`, `opener.rs`, `wal.rs`, `entry_reader.rs`, `handle_write.rs`, `provider.rs`**:
- Introduced a new `WalOptions::Noop` variant to handle scenarios where no write-ahead logging is required.
- Implemented `NoopEntryReader` to provide a no-operation entry reader.
- Updated logic to skip WAL operations for regions with `Noop` option.
- Added `Provider::Noop` to handle `Noop` operations in the provider logic.
* feat/no-op-wal:
### Add `skip_wal` Option to Table Metadata
- **Enhancements in `table_meta.rs`**:
- Added a `skip_wal` parameter to the `create_wal_options` function to allow skipping WAL writes.
- Updated the `create_table_route` function to utilize the `skip_wal` option from `table_info.meta.options`.
- **Updates in `wal_options_allocator.rs`**:
- Modified `alloc_batch` to handle the `skip_wal` flag, setting WAL options to `Noop` when true.
- Added a test case `test_allocator_with_skip_wal` to verify the `skip_wal` functionality.
- **Changes in `requests.rs`**:
- Introduced `skip_wal` in `TableOptions` and added parsing logic.
- Updated `TableOptions` display to include `skip_wal`.
These changes introduce the ability to skip WAL writes for tables, enhancing flexibility in table metadata management.
* feat/no-op-wal:
**Add WAL Option Handling and Table Option Validation**
- **`handle_write.rs`**: Introduced a check for `WalOptions::Noop` in the `RegionWorkerLoop` to skip WAL writing for regions with this option.
- **`requests.rs`**: Added `SKIP_WAL_KEY` to the list of valid table options for enhanced table configuration validation.
* feat/no-op-wal:
### Update WAL Options Allocation
- **`key.rs`**: Modified the `allocate_region_wal_options` function to include an additional boolean parameter, enhancing the allocation logic.
- **`wal_options_allocator.rs`**: Simplified the `test_allocator_with_skip_wal` test by removing unnecessary variable declarations and directly using `WalOptionsAllocator::RaftEngine`.
These changes improve the flexibility and efficiency of WAL options allocation in the system.
* chore: reformat code
* feat/no-op-wal:
**Enhancement:** Conditional Addition of `SKIP_WAL_KEY` in `requests.rs`
- Updated `TableOptions` implementation in `requests.rs` to conditionally add `SKIP_WAL_KEY` to `key_vals` only when `self.skip_wal` is true, optimizing the key-value pair generation.
* feat/no-op-wal:
Update `requests.rs` tests to reflect changes in `skip_wal` option
- Modified test assertions in `requests.rs` to remove `skip_wal=false` from expected strings.
- Added a new test case to verify `skip_wal=true` is correctly represented in `TableOptions`.
* feat/no-op-wal: Add Debug Logging and Improve Error Handling for WAL and Table Options
• Introduced debug logging in wal.rs to skip obsolete regions, enhancing traceability.
• Improved error handling in requests.rs by replacing warn with error propagation for invalid skip_wal values.
• Added new test cases for skip_wal functionality, including SQL scripts and expected results, to ensure correct behavior and validation of the changes.
* Add explain_verbose to QueryContext
* feat: fmt plan by display type
* feat: update proto to use ExplainOptions
* feat: display more info in verbose mode
* chore: fix clippy
* test: add sqlness test
* test: update sqlness result
* chore: update proto version
* chore: Simplify QueryContextBuilder::explain_options using get_or_insert_default
* chore: minor refactor
* chore: minor refactor
* chore: support custom ts for identity pipeline
* chore: fix clippy
* chore: minor refactor & update tests
* chore: use ref on identity pipeline param
* feat: add mysql election
* feat: add mysql election
* chore: fix deps
* chore: fix deps
* fix: duplicate container
* fix: duplicate setup for sqlness
* fix: call once
* fix: do not use NOWAIT for mysql 5.7
* chore: apply comments
* fix: no parallel sqlness for mysql
* chore: comments and minor revert
* chore: apply comments
* chore: apply comments
* chore: add to table name
* ci: use 2 metasrv to detect election bugs
* refactor: better election logic
* chore: apply comments
* chore: apply comments
* feat: version check before startup
* refactor: remove trace id in primary key
* refactor: remove trace id in primary key in v0 model
* refactor: add span id in v1
* fix: integration test
* feat: add vec_kth_elem function
Signed-off-by: pikady <2652917633@qq.com>
* code format
Signed-off-by: pikady <2652917633@qq.com>
* add test sql
Signed-off-by: pikady <2652917633@qq.com>
* change indexing from 1-based to 0-based
Signed-off-by: pikady <2652917633@qq.com>
* improve code formatting and correct spelling errors
Signed-off-by: pikady <2652917633@qq.com>
* Update tests/cases/standalone/common/function/vector/vector.sql
I noticed the two lines are identical. Could you clarify the reason for the change? Thanks!
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
---------
Signed-off-by: pikady <2652917633@qq.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
* feat: update to disable http timeout by default
* feat: make http timeout default to 0
* test: correct test case
* chore: generate new config doc
* test: correct tests
* refactor: update jaeger api implementation
* test: add tests for v1 data model
* feat: customize trace table name
* fix: update column requirements to use Column type instead of String
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* fix: lint fix
* refactor: accumulate resource attributes for v1
* fix: add empty check for additional string
* feat: add table option to mark data model version
* fix: do not overwrite all tags
* feat: use table option to mark table data model version and process accordingly
* chore: update comments to reflect query changes
* feat: use header for jaeger table name
* feat: update index for service_name, drop index for span_name
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: zyy17 <zyylsxm@gmail.com>
* refactor: use proc macro to generate conversion between TableMeta and TableMetaBuilder
* chore: format
* fix/partition-key-index:
### Update `TableMeta` and Add Partition and Alter Table Tests
- **`metadata.rs`**: Modified `new_meta_builder` method in `TableMeta` to manually remove `value_indices` by setting it to `None` in the `TableMetaBuilder`.
- **`partition_and_alter.result` & `partition_and_alter.sql`**: Added new test cases for creating, inserting, selecting, altering, and dropping a partitioned table `molestiAe`. These tests verify partitioning on the `sImiLiQUE` column and altering the table with a TTL
setting.
fix/partition-key-index:
### Remove Obsolete TODO Comment in `metadata.rs`
- Removed an outdated TODO comment regarding the `new_meta_builder` function in `src/table/src/metadata.rs`.
chore: check struct name in derive_meta_builder
refactor: Simplify TableMeta struct name check in macro
refactor: Improve ToMetaBuilder derive macro validation and error handling
refactor: Enforce ToMetaBuilder macro for table::metadata::TableMeta struct
* fix/partition-key-index:
Update `partition_and_alter.sql` to modify TTL setting
- Modified the TTL setting for the `molestiAe` table to '1d' in `partition_and_alter.sql`.
* fix: sqlness
* fix/partition-key-index:
### Update `TableMeta` and Test File Structure
- **Enhancement**: Added a note in `metadata.rs` to always use `new_meta_builder` for creating `TableMetaBuilder`.
- **Refactor**: Renamed test result and SQL files for better organization:
- `partition_and_alter.result` to `alter/partition_and_alter.result`
- `partition_and_alter.sql` to `alter/partition_and_alter.sql`
* refactor: Simplify `derive_meta_builder` by initializing fields with `Default::default()`
* fix/partition-key-index:
### Commit Summary
- **Refactor `TableMetaBuilder` Initialization**:
- Replaced `TableMetaBuilder::default()` with `TableMetaBuilder::empty()` across multiple files for initializing `TableMetaBuilder` instances.
- Affected files include:
- `src/catalog/src/system_schema.rs`
- `src/common/meta/src/key/test_utils.rs`
- `src/operator/src/req_convert/insert/fill_impure_default.rs`
- `src/query/src/log_query/planner.rs`
- `src/query/src/promql/planner.rs`
- `src/query/src/range_select/plan_rewrite.rs`
- `src/query/src/sql/show_create_table.rs`
- `src/table/src/test_util/memtable.rs`
- `src/table/src/test_util/table_info.rs`
- **Enhance `TableMetaBuilder`**:
- Added `custom_constructor` to `TableMeta` and implemented an `empty` method for `TableMetaBuilder`.
- Modified `TableMetaBuilder` to include a `new_external_table` method with default values.
- Updated `src/table/src/metadata.rs` to reflect these changes.
- **Add Testing Feature**:
- Introduced a conditional compilation for `test_util` in `src/table/src/lib.rs` to include testing utilities when the `testing` feature is enabled.
- **Update `Cargo.toml`**:
- Enabled the `testing` feature for the `table` module in `src/common/meta/Cargo.toml`.
- **Modify `NumbersTable` Initialization**:
- Replaced `TableMetaBuilder` with direct `TableMeta` struct initialization in `src/table/src/table/numbers.rs`.
- **Test Result Update**:
- Updated test results in `tests/cases/standalone/common/alter/partition_and_alter.result` to reflect changes in table meta handling.
* fix: rename default to empty
* docs: add doc for TableMetaBuilder::empty
* chore: Update src/table/src/metadata.rs
---------
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat: enhancement information_schema.flows
* feat: enhancement information_schema.flows
* u
* u
* u
* u
* u
* u
* u
* u
* u
* update
* update
* update
* delete unused code
* u
* u
* Update src/flow/src/adapter/worker.rs
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* Update src/common/meta/src/key/flow/flow_state.rs
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* Update src/common/meta/src/key/flow/flow_info.rs
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* Update src/common/meta/src/key/flow/flow_state.rs
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* Update src/common/meta/src/key/flow/flow_info.rs
Co-authored-by: dennis zhuang <killme2008@gmail.com>
* u
* u
* u
* u
* u
* u
* chore: fix sqlness
* chore: update proto
* fix: remove date time
* fix: update result of information_schema test
---------
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
* feat: add region follower manager
* feat: add region procudure
* refactor: make add, remove follower procedure look nice
* feat: add region follower procedure
* chore: undo some chane, possibly made by AI
* feat: on prepare cheking
* feat: on update metadata
* feat: on broadcast
* chore: unit test
* feat: add remove follower operation
* feat: add or remove region follower procedure
* chore: ut
* chore: rename
* chore: by comment
* chore: by comment
---------
Co-authored-by: jeremy <jeremy@greptime.local>
chore/move-wal-sync-to-bg:
### Refactor Log Store Task Management
- **Error Handling Enhancements**: Updated error handling for task management in `error.rs` by renaming `StartGcTask` and `StopGcTask` to `StartWalTask` and `StopWalTask`, respectively, and added a `name` field for more descriptive error messages.
- **Task Management Improvements**: Introduced `SyncWalTaskFunction` in `log_store.rs` to handle periodic synchronization of WAL tasks, replacing the previous atomic-based sync logic.
- **Backend Adjustments**: Modified `backend.rs` to use the new `StartWalTaskSnafu` for starting tasks, ensuring consistency with the updated error handling approach.
* feat: add predicate group
* feat: pass predicate group
* feat: memtable prune by time filters
* test: test PruneTimeIterator with time filters
* feat: push down returns exact for timestamp simple filters
---------
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* fix: skip schema check to avoid schema mismatch brought by metadata
* docs: add some comment to remind me add that check back
* test: add sqlness case
* fix/skip-schema-check:
### Update CTE Test Cases
- **Added GRPC Latencies Test**: Introduced a new test case for GRPC latencies in `cte.result` and `cte.sql` under `standalone/common/cte`.
- **Removed Redundant Test Files**: Deleted `cte.result` and `cte.sql` under `standalone/common/range` as they were duplicates of the new test case.
* fix: other col alias to time index column handle
* test: update sqlness
* chore: per review
* test: more sqlness
* test: mv some to optimizer folder
* fix: resolve alias properly
* fix: also retain old name
* chore: remove wrong comment
* chore: fix sqlness
* test: standalone/dist more projection diff
* chore: resolve conflicts
* chore: merge main
* test: add compatibility test for DatanodeLeaseKey with missing cluster_id
* test: add compatibility test for DatanodeLeaseKey without cluster_id
* refactor/remove-cluster-id:
- **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml` to a new revision.
- **Remove `cluster_id` Usage**: Removed the `cluster_id` field and its related logic from various files, including `cluster.rs`, `datanode.rs`, `rpc.rs`,
`adapter.rs`, `client.rs`, `ask_leader.rs`, `heartbeat.rs`, `procedure.rs`, `store.rs`, `handler.rs`, `response_header_handler.rs`, `key.rs`, `datanode.rs`,
`lease.rs`, `metrics.rs`, `cluster.rs`, `heartbeat.rs`, `procedure.rs`, and `store.rs`.
- **Refactor Tests**: Updated tests in `client.rs`, `response_header_handler.rs`, `store.rs`, and `service` modules to reflect the removal of `cluster_id`.
* fix: clippy
* refactor/remove-cluster-id:
**Refactor and Cleanup in Meta Server**
- **`response_header_handler.rs`**: Removed unused import of `HeartbeatResponse` and cleaned up the test function by eliminating the creation of an unused `HeartbeatResponse` object.
- **`node_lease.rs`**: Simplified parameter handling in `HttpHandler` implementation by using an underscore for unused parameters.
* refactor/remove-cluster-id:
### Remove `TableMetadataAllocatorContext` and Refactor Code
- **Removed `TableMetadataAllocatorContext`**: Eliminated the `TableMetadataAllocatorContext` struct and its usage across multiple files, including `ddl.rs`, `create_table.rs`, `create_view.rs`, `table_meta.rs`, `test_util.rs`, `create_logical_tables.rs`,
`drop_table.rs`, and `table_meta_alloc.rs`.
- **Refactored Function Signatures**: Updated function signatures to remove the `TableMetadataAllocatorContext` parameter in methods like `create`, `create_view`, and `alloc` in `table_meta.rs` and `table_meta_alloc.rs`.
- **Updated Imports**: Adjusted import statements to reflect the removal of `TableMetadataAllocatorContext` in affected files.
These changes simplify the codebase by removing an unnecessary context struct and updating related function calls.
* refactor/remove-cluster-id:
### Update `datanode.rs` to Modify Key Prefix
- **File Modified**: `src/common/meta/src/datanode.rs`
- **Key Changes**:
- Updated `DatanodeStatKey::prefix_key` and `From<DatanodeStatKey>` to remove the cluster ID from the key prefix.
- Adjusted comments to reflect the changes in key prefix handling.
* reformat code
* refactor/remove-cluster-id:
### Commit Summary
- **Refactor `Pusher` Initialization**: Removed the `RequestHeader` parameter from the `Pusher::new` method across multiple files, including `handler.rs`, `test_util.rs`, and `heartbeat.rs`. This change simplifies the `Pusher` initialization process by eliminating th
unnecessary parameter.
- **Update Imports**: Adjusted import statements in `handler.rs` and `test_util.rs` to remove unused `RequestHeader` references, ensuring cleaner and more efficient code.
* chore: update proto
* feat: include trace v1 encoding
* feat: add trace ingestion in inserter
* feat: add partition rules and index for trace_id
* chore: format
* chore: fmt
* fix: issue introduced with merge
* feat: adjust index and add integration test for v1
* refactor: remove comment key
* fix: update default value of skip index granularity
* fix: update default value of skip index granularity
* refactor: rename some functions
* feat: remove skipping index from span_id
* refactor: made span_id part of primary key for potential dedup purpose
* feat: move the special attribute resource_attribute.service.name to top level
---------
Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
* fix/interval-cast-rewrite:
### Enhance Interval Parsing and Casting
- **`create_parser.rs`**: Added a test case `test_parse_interval_cast` to verify the parsing of interval casts.
- **`expand_interval.rs`**: Refactored interval casting logic to handle `CastKind` and `format` attributes. Removed the `create_interval` function and integrated its logic directly into the casting process.
- **`interval.result`**: Updated test results to reflect changes in interval representation, switching from `IntervalMonthDayNano` to `Utf8` format for interval operations.
* reformat code
fix/comment-in-cjk:
### Update `OptionMap` Formatting and Add Tests
- **Enhancements in `OptionMap`**:
- Changed formatting from `escape_default` to `escape_debug` for better handling of special characters in `src/sql/src/statements/option_map.rs`.
- Added unit tests to verify the new formatting behavior.
- **Test Cases for CJK Comments**:
- Added test cases for tables with comments in CJK (Chinese, Japanese, Korean) characters in `tests/cases/standalone/common/show/show_create.sql` and `show_create.result`.
* fix/frontend-node-state: Refactor NodeInfoKey and Context Handling in Meta Server
• Removed unused cluster_id from NodeInfoKey struct.
• Updated HeartbeatHandlerGroup to return Context alongside HeartbeatResponse.
• Added current_node_info to Context for tracking node information.
• Implemented on_node_disconnect in Context to handle node disconnection events, specifically for Frontend roles.
• Adjusted register_pusher function to return PusherId directly.
• Updated tests to accommodate changes in Context structure.
* fix/frontend-node-state: Refactor Heartbeat Handler Context Management
Refactored the HeartbeatHandlerGroup::handle method to use a mutable reference for Context instead of passing it by value. This change simplifies the
context management by eliminating the need to return the context with the response. Updated the Metasrv implementation to align with this new context
handling approach, improving code clarity and reducing unnecessary context cloning.
* revert: clean cluster info on disconnect
* fix/frontend-node-state: Add Frontend Expiry Listener and Update NodeInfoKey Conversion
• Introduced FrontendExpiryListener to manage the expiration of frontend nodes, including its integration with leadership change notifications.
• Modified NodeInfoKey conversion to use references, enhancing efficiency and consistency across the codebase.
• Updated collect_cluster_info_handler and metasrv to incorporate the new listener and conversion changes.
• Added frontend_expiry module to the project structure for better organization and maintainability.
* chore: add config for node expiry
* add some doc
* fix: clippy
* fix/frontend-node-state:
### Refactor Node Expiry Handling
- **Configuration Update**: Removed `node_expiry_tick` from `metasrv.example.toml` and `MetasrvOptions` in `metasrv.rs`.
- **Module Renaming**: Renamed `frontend_expiry.rs` to `node_expiry_listener.rs` and updated references in `lib.rs`.
- **Code Refactoring**: Replaced `FrontendExpiryListener` with `NodeExpiryListener` in `node_expiry_listener.rs` and `metasrv.rs`, removing the tick interval and adjusting logic to use a fixed 60-second interval for node expiry checks.
* fix/frontend-node-state:
Improve logging in `node_expiry_listener.rs`
- Enhanced warning message to include peer information when an unrecognized node info key is encountered in `node_expiry_listener.rs`.
* docs: update config docs
* fix/frontend-node-state:
**Refactor Context Handling in Heartbeat Services**
- Updated `HeartbeatHandlerGroup` in `handler.rs` to pass `Context` by value instead of by mutable reference, allowing for more flexible context
management.
- Modified `Metasrv` implementation in `heartbeat.rs` to clone `Context` when passing to `handle` method, ensuring thread safety and consistency in
asynchronous operations.
* fix/reject-ddl-in-follower-metasrv:
Add leader check and logging for gRPC requests in `procedure.rs`
- Implemented leader verification for `query_procedure_state`, `ddl`, and `procedure_details` gRPC requests in `procedure.rs`.
- Added logging with `warn` for requests reaching a non-leader node.
- Introduced `ResponseHeader` and `Error::is_not_leader()` to handle non-leader responses.
* fix/reject-ddl-in-follower-metasrv:
Improve leader address handling in `heartbeat.rs`
- Refactor leader address retrieval by renaming `leader` to `leader_addr` for clarity.
- Update `make_client` function to use a reference to `leader_addr`.
- Enhance logging to include the leader address in the success message for creating a heartbeat stream.
* fmt
* fix/reject-ddl-in-follower-metasrv:
**Enhance Leader Check in `procedure.rs`**
- Updated the leader verification logic in `procedure.rs` to return a failed `MigrateRegionResponse` when the server is not the leader.
- Added logging to warn when a migrate request is received by a non-leader server.
* perf: do not delete columns when drop logical region in drop database
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: make ci happy
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: address review comments
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: address some comments
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: drop stupid comments by copilot
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* chore: minor refactor
* chore: minor refactor
* chore: update grpetime-proto
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
* fix: use alias expr to check commutativity
* chore: debug sort
* feat: consider alias in window sort optimizer
* test: sqlness test
* test: update sqlness result
* TODO: snapshot read
* feat: RegionEngine get last seq
* feat: query context snapshot
* chore: use new proto
* feat: get_region_seqs in region engine
* chore: typo
* chore: toml
* feat: make snapshots modifiable
* feat: add hint for snapshot read
* chore: some typo
* refactor: remove hint as not used
* fix: use commited seqs
* refactor: remove sequences variant on RegionRequest
* refactor: per review
* chore: rebase solve conflict
* refactor: rm unused key
* chore: per review
* chore: per review
* feat(metric-engine): introduce batch alter request handling
* refactor: minor refactor
* refactor: push down filter to mito
* chore: apply suggestions from CR
* feat: handle filter for window sort
* test: sqlness filter test for window sort
* test: add test on tag column filter
* test: test for filter on ts
* test: update sqlness test
* feat: change cache policy for file cache
* feat: file cache run pending task after put
* feat: run pending task in put_dir
* feat: run pending task after stager recovered
* feat: purge recycle bin periodically
* feat: use lru policy for read cache
* feat remove datetime type
* chore: fix unit test
* chore: add column test
* refactor: move create and alter validation to one place
* chore: minor refactor ut
* refactor: rename expr_factory to expr_helper
* chore: remove unnecessary args
fix: use fixed tonistiigi/binfmt:qemu-v7.0.0-28 image version instead of latest version to avoid segmentation fault
Co-authored-by: Yingwen <realevenyag@gmail.com>
ci: skip nightly ci jobs (#9)
(cherry picked from commit 345b4c30474f47a0477263bfba9894d7b4acda2d)
(cherry picked from commit dcd779cd668802fb1ea12fefb4dc3f83f34e30a2)
* refactor: rename grpc options
* refactor: make the arg clearly
* chore: comments on server_addr
* chore: fix test
* chore: remove the store_addr alias
* refactor: cli option rpc_server_addr
* chore: keep store-addr alias
* chore: by comment
* fix: do not transform exprs in the limit plan
* chore: keep some logs for debug
* feat: workaround for limit in other rules
* test: add sqlness tests for offset 0
* chore: add fixme
* - **Refactored SST File Handling**:
- Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
- Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
- Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.
- **Enhanced Indexer Management**:
- Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
- Updated `ParquetWriter` to handle multiple indexers and file IDs.
- Files affected: `index.rs`, `parquet.rs`, `writer.rs`.
- **Removed Redundant File ID Handling**:
- Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
- Updated related logic to dynamically generate file IDs where necessary.
- Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.
- **Test Adjustments**:
- Updated tests to align with new path and indexer management.
- Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
- Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.
* chore: merge main
* refactor/generate-file-id-in-parquet-writer:
**Enhance Logging in Compactor**
- Updated `compactor.rs` to improve logging of compaction process.
- Added `itertools::Itertools` for efficient string joining.
- Moved logging of compaction inputs and outputs to the async block for better context.
- Enhanced log message to include both input and output file names for better traceability.
* refactor: support to flatten json object in greptime_identity pipeline
* refactor: add GreptimeIdentityPipelineParams to configure greptime_identity pipeline
* refactor: pass greptime identity pipeline params by one header kv
* refactor: code review
* refactor: make pipeline params more general for all internal pipelines
* chore: remove axum deps from pipeline
* fix: clippy errors
* chore: fix and add test
* test: adopt api change for test client
---------
Co-authored-by: shuiyisong <xixing.sys@gmail.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
* change dep
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* feat: adapt to arrow's interval array
* chore: fix compile errors in datatypes crate
* chore: fix api crate compiler errors
* chore: fix compiler errors in common-grpc
* chore: fix common-datasource errors
* chore: fix deprecated code in common-datasource
* fix promql and physical plan related
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* wip: upgrading network deps
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* block on updating `sqlparser`
* upgrade sqlparser
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* adapt new df's trait requirements
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* chore: fix compiler errors in mito2
* chore: fix common-function crate errors
* chore: fix catalog errors
* change import path
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* chore: fix some errors in query crate
* chore: fix some errors in query crate
* aggr expr and some other tiny fixes
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* chore: fix expr related errors in query crate
* chore: fix query serializer and admin command
* chore: fix grpc services
* feat: axum serve
* chore: fix http server
* remove handle_error handler
* refactor timeout layer
* serve axum
* chore: fix flow aggr functions
* chore: fix flow
* feat: fix errors in meta-srv
* boxed()
* use TokioIo
* feat!: Remove script crate and python feature (#5321)
* feat: exclude script crate
* chore: simplify feature
* feat: remove the script crate
* chore: remove python feature and some comments
* chore: fix warning
* chore: fix servers tests compiler errors
* feat: fix tests-integration errors
* chore: fix unused
* test: fix catalog test
* chore: fix compiler errors for crates using common-meta
testing feature is enabled when check with --workspace
* test: use display for logical plan test
* test: implement rewrite for ScanHintRule
* fix: http server build panic
* test: fix mito test
* fix: sql parser type alias error
* test: fix TestClient not listen
* test: some flow tests
* test(flow): more fix
* fix: test_otlp_logs
* test: fix promql test that using deprecated method fun()
* fix: sql type replace supports Int8 ~ Int64, UInt8 ~ UInt64
* test: fix infer schema test case
* test: fix tests related to plan display
* chore: fix last flow test
* test: fix function format related assertion
* test: use larger port range for tests
* fix: test_otlp_traces
* fix: test_otlp_metrics
* fix range query and dist plan
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* fix: flow handle distinct use deprecated field
* fix: can't pass Join plan expressions to LogicalPlan::with_new_exprs
* test: fix deserialize test
* test: reduce split key case num
* tests: lower case aggr func name
* test: fix some sqlness tests
* tests: more sqlness fix
* tests: fixed sqlness test
* commit non-bug changes
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* fix: make our udf correct
* fix: implement empty methods of ContextProvider for DfContextProviderAdapter
* test: update sqlness test result
* chore: remove unused
* fix: provide alias name for AggregateExprBuilder in range plan
* test: update range query result
* fix: implement missing ContextProvider methods for DfContextProviderAdapter
* test: update timestamps, cte result
* fix: supports empty projection in mito
* test: update comment for cte test
* fix: support projection for numbers
* test: update test cases after projection fix
* fix: fix range select first_value/last_value
* fix: handle CAST and time index conflict
* fix: handle order by correctly in range first_value/last_value
* test: update sqlness result
* test: update view test result
* test: update decimal test
wait for https://github.com/apache/datafusion/pull/14126 to fix this
* feat: remove redundant physical optimization
todo(ruihang): Check if we can remove this.
* test: update sqlness test result
* chore: range select default sort use nulls_first = false
* test: update filter push down test result
* test: comment deciaml test to avoid different panic message
* test: update some distributed test result
* test: update test for distributed count and filter push down
* test: update subqueries test
* fix: SessionState may overwrite our UDFs
* chore: fix compiler errors after merging main
* fix: fix elasticsearch and dashboard router panic
* chore: fix common-functions tests
* chore: update sqlness result
* test: fix id keyword and update sqlness result
* test: fix flow_null test
* fix: enlarge thread size in debug mode to avoid overflow
* chore: fix warnings in common-function
* chore: fix warning in flow
* chore: fix warnings in query crate
* chore: remove unused warnings
* chore: fix deprecated warnings for parquet
* chore: fix deprecated warning in servers crate
* style: fix clippy
* test: enlarge mito cache tttl test ttl time
* chore: fix typo
* style: fmt toml
* refactor: reimplement PartialOrd for RangeSelect
* chore: remove script crate files introduced by merge
* fix: return error if sql option is not kv
* chore: do not use ..default::default()
* chore: per review
* chore: update error message in BuildAdminFunctionArgsSnafu
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
* refactor: typed precision
* update sqlness view case
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* chore: flow per review
* chore: add example in comment
* chore: warn if parquet stats of timestamp is not INT64
* style: add a newline before derive to make the comment more clear
* test: update sqlness result
* fix: flow from substrait
* chore: change update_range_context log to debug level
* chore: move axum-extra axum-macros to workspace
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: luofucong <luofc@foxmail.com>
Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: shuiyisong <xixing.sys@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
* fix/avoid-suppress-manual-compaction:
**Refactor Compaction Logic**
- Removed `PendingCompaction` struct and integrated its functionality directly into `CompactionStatus` in `compaction.rs`.
- Simplified waiter management by consolidating waiter handling logic into `CompactionStatus`.
- Updated `CompactionRequest` creation to directly handle waiters without intermediate structures.
- Adjusted test cases in `compaction.rs` to align with the new waiter management approach.
(cherry picked from commit 87e2d1c2cc9bd82c02991d22e429bef25c5ee348)
* fix/avoid-suppress-manual-compaction:
### Add Support for Manual Compaction Requests
- **Compaction Logic Enhancements**:
- Updated `CompactionScheduler` in `compaction.rs` to handle manual compaction requests using `Options::StrictWindow`.
- Introduced `PendingCompaction` struct to manage pending manual compaction requests.
- Added logic to reschedule manual compaction requests once the current compaction task is completed.
- **Testing**:
- Added `test_manual_compaction_when_compaction_in_progress` to verify the handling of manual compaction requests during ongoing compaction processes.
These changes enhance the compaction scheduling mechanism by allowing manual compaction requests to be queued and processed efficiently.
(cherry picked from commit bc38ed0f2f8ba2c4690e0d0e251aeb2acce308ca)
* chore: fix conflicts
* fix/avoid-suppress-manual-compaction:
### Add Error Handling for Manual Compaction Override
- **`compaction.rs`**: Enhanced the `set_pending_request` method to handle manual compaction overrides by sending an error to the waiter if a previous request exists.
- **`error.rs`**: Introduced a new error variant `ManualCompactionOverride` to represent manual compaction being overridden, and mapped it to the `Cancelled` status code.
* fix: format
* fix/avoid-suppress-manual-compaction:
**Add Error Handling for Pending Compaction Requests**
- Enhanced error handling in `compaction.rs` by adding logic to handle errors for pending compaction requests.
- Introduced a mechanism to send errors using `waiter.send` when a pending compaction request fails, ensuring proper error propagation and context with `CompactRegionSnafu`.
* fix/avoid-suppress-manual-compaction:
**Fix Typo and Simplify Code Logic in `compaction.rs`**
- Corrected a typo in the license comment from "langucage" to "language".
- Simplified the logic for handling `pending_compaction` in `CompactionStatus` by removing unnecessary pattern matching and directly accessing `waiter`.
* fix: typo
* feat: make instant_query and range_query to supports not-equal matchers
* feat: impl query_metric_names
* feat: forgot some files and refactor
* chore: test and docs
* fix: typo
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* refactor: parse_query
* chore: improve test
* fix: use current catalog to query information_schema
---------
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* fix: vector function for PromQL need to ignore the time index also close#5392
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: do not affect scalar function
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: betteer name for it
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* refactor: use MetadataKey
* fix: match all prefix
* refactor: introduce TopicPool
* fix: fix test, some rename
* test: add unit test for legacy restore
* fix: add _ between prefix and topic id
* chore: readable legacy topics
* refactor: a refactor
* Apply suggestions from code review
* Apply suggestions from code review
* refactor: introduce TopicPool
* fix: fix unit test
* chore: fix unit test and add some comments
* fix: fix unit test
* refactor: just refactor
* refactor: rename
* chore: rename, comments and remove unnecessary clone
* chore/change-authorization-header:
### Add Custom Authorization Header Support
- **Files Modified**: `http.rs`, `authorize.rs`, `authorize.rs` (tests)
- **Key Changes**:
- Introduced a custom authorization header `x-greptime-auth` in `http.rs`.
- Updated authorization logic in `authorize.rs` to support both `x-greptime-auth` and the standard `Authorization` header.
- Enhanced test cases in `authorize.rs` to validate the new custom header functionality.
* chore: add more tests
chore/change-default-compaction-output-size-limit:
### Update `TwcsOptions` Default Configuration
- Modified the default value of `max_output_file_size` in `TwcsOptions` to `Some(ReadableSize::gb(2))` in `src/mito2/src/region/options.rs`.
* feat: use time window in compaction options for compaction window
* test: add tests for overwriting options
* chore: typo
* chore: fix a grammar issue in log
This patch support pg_database for pg_catalog, also add query replace,
in fixtures.rs for the reason that datafusion do not support sql like
'select 1,1;' more can check issue #5344.
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* feat: introduce `PrimaryKeyEncoding`
* fix: fix unit tests
* chore: add empty line
* test: add unit tests
* chore: fmt code
* refactor: introduce new codec trait to support various encoding
* fix: fix unit tests
* chore: update sqlness result
* chore: apply suggestions from CR
* chore: apply suggestions from CR
* feat: more workers
* feat: use round robin
* refactor: per review
* refactor: per bot review
* chore: per review
* docs: example
* docs: update config.md
* docs: update
* chore: per review
* refactor: set workers to cpu/2.max(1)
* fix: flow config in standalone mode
* test: fix config test
* docs: update docs&opt name
* chore: update config.md
* refactor: per review, sanitize at top
* chore: per review
* chore: config.md
* refactor(elasticsearch): use `_index` as greptimedb table in log ingestion and add `/${index}/_bulk` API
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* refactor: code review
---------
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* fix: drop all python embedding code for docker and doc
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: address comments drop the left python
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* feat: support `select session_user;`
This commit is part of support DBeaver that support function
select session_user like postgres did.
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: lint problem
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: address comments add tests
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* test: optimize out partition split insert requests if there is only one region
* Now that the optimization for single region insert has been lifted up, the original "fast path" can be obsoleted.
* resolve PR comments
* feat(flow): (Part 1) refill utils
* chore: after rebase fix
* chore: more rebase
* rm refill.rs to reduce pr size
* chore: simpler args
* refactor: per review
* docs: more explain for instant requests
* refactor: per review
* fix: drop unused dep using udeps to minial the size
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* fix: adress comments fix the problem
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
* feat: impl COPY a query resultset to external file
* chore: add more tests for parse `copy_table_to`
* chore: add more tests for parse `copy_table_to`
* feat: txn for pg kv backend
* chore: clippy
* fix: txn uses one client
* test: clean up and txn test
* test: clean up
* test: change lock_id to avoid conflict in test
* test: use different prefix in pg election test
* fix(test): just a fix
* test: aggregate multiple test to avoid concurrency problem
* test: use uuid instead of rng
* perf: batch cmp in txn
* perf: batch same op in txn
chore/suppress-list-warning:
### Update logging level in `intermediate.rs`
- Changed logging level from `warn` to `debug` for unexpected directory entries in index creation.
- Added `debug` to the `common_telemetry` import to support the logging level change.
* test: test adding existing columns
* chore: add more checks to AlterKind
* chore: update logs
* fix: check and build table info first
* feat: Add add_if_not_exists flag to alter expr
* feat: skip existing columns when building alter kind
* checks in make_region_alter_kind()
* reuse the alter kind
* test: fix tests in common-meta
* chore: fix typos
* chore: update comments
* feat: update partition duration of memtable using compaction window
* chore: only use provided duration if it is not None
* test: more tests
* test: test compaction apply window
* style: fix clippy
* feat: init PgElection
fix: release advisory lock
fix: handle duplicate keys
chore: update comments
fix: unlock if acquired the lock
chore: add TODO and avoid unwrap
refactor: check both lock and expire time, add more comments
chore: fmt
fix: deal with multiple edge cases
feat: init PgElection with candidate registration
chore: fmt
chore: remove
* test: add unit test for pg candidate registration
* test: add unit test for pg candidate registration
* chore: update pg env
* chore: make ci happy
* fix: spawn a background connection thread
* chore: typo
* fix: shadow the election client for now
* fix: fix ci
* chore: readability
* chore: follow review comments
* refactor: use kvbackend for pg election
* chore: rename
* chore: make clippy happy
* refactor: use pg server time instead of local ones
* chore: typo
* chore: rename infancy to leader_infancy for clarification
* chore: clean up
* chore: follow review comments
* chore: follow review comments
* ci: unit test should test all features
* ci: fix
* ci: just test pg
* wip: row group reader base
* wip: memtable row group reader
* Refactor MemtableRowGroupReader to streamline data fetching
- Added early return when fetch_ranges is empty to optimize performance.
- Replaced inline chunk data assignment with a call to `assign_dense_chunk` for cleaner code.
* wip: row group reader
* wip: reuse RowGroupReader
* wip: bulk part reader
* Enhance BulkPart Iteration with Filtering
- Introduced `RangeBase` to `BulkIterContext` for improved filter handling.
- Implemented filter application in `BulkPartIter` to prune batches based on predicates.
- Updated `SimpleFilterContext::new_opt` to be public for broader access.
* chore: add prune test
* fix: clippy
* fix: introduce prune reader for memtable and add more prune test
* Enhance BulkPart read method to return Option<BoxedBatchIterator>
- Modified `BulkPart::read` to return `Option<BoxedBatchIterator>` to handle cases where no row groups are selected.
- Added logic to return `None` when all row groups are filtered out.
- Updated tests to handle the new return type and added a test case to verify behavior when no row groups match the pr
* refactor/separate-paraquet-reader: Add helper function to parse parquet metadata and integrate it into BulkPartEncoder
* refactor/separate-paraquet-reader:
Change BulkPartEncoder row_group_size from Option to usize and update tests
* refactor/separate-paraquet-reader: Add context module for bulk memtable iteration and refactor part reading
• Introduce context module to encapsulate context for bulk memtable iteration.
• Refactor BulkPart to use BulkIterContextRef for reading operations.
• Remove redundant code in BulkPart by centralizing context creation and row group pruning logic in the new context module.
• Create new file context.rs with structures and logic for handling iteration context.
• Adjust part_reader.rs and row_group_reader.rs to reference the new BulkIterContextRef.
* refactor/separate-paraquet-reader: Refactor RowGroupReader traits and implementations in memtable and parquet reader modules
• Rename RowGroupReaderVirtual to RowGroupReaderContext for clarity.
• Replace BulkPartVirt with direct usage of BulkIterContextRef in MemtableRowGroupReader.
• Simplify MemtableRowGroupReaderBuilder by directly passing context instead of creating a BulkPartVirt instance.
• Update RowGroupReaderBase to use context field instead of virt, reflecting the trait renaming and usage.
• Modify FileRangeVirt to FileRangeContextRef and adjust implementations accordingly.
* refactor/separate-paraquet-reader: Refactor column page reader creation and remove unused code
• Centralize creation of SerializedPageReader in RowGroupBase::column_reader method.
• Remove unused RowGroupCachedReader and related code from MemtableRowGroupPageFetcher.
• Eliminate redundant error handling for invalid column index in multiple places.
* chore: rebase main and resolve conflicts
* fix: some comments
* chore: resolve conflicts
* chore: resolve conflicts
* chore: improve nix-shell support
* fix: add pkg-config
* ci: add a github action to ensure build on clean system
* ci: optimise dependencies of task
* ci: move clean build to nightly
Add ORDER BY clause to subquery union tests
Updated the SQL and result files for subquery union tests to include an ORDER BY clause, ensuring consistent result ordering. This change aligns with the test case from the DuckDB repository.
* feat: do not remove time filters
* chore: remove `time_range` from parquet reader
* chore: print more message in the check script
* chore: fix unused error
* perf/avoid-holding-memtable-during-compaction: Refactor Compaction Version Handling
• Introduced CompactionVersion struct to encapsulate region version details for compaction, removing dependency on VersionRef.
• Updated CompactionRequest and CompactionRegion to use CompactionVersion.
• Modified open_compaction_region to construct CompactionVersion without memtables.
• Adjusted WindowedCompactionPicker to work with CompactionVersion.
• Enhanced flush logic in WriteBufferManager to improve memory usage checks and logging.
* reformat code
* chore: change log level
* reformat code
---------
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat: simple version switch
* chore: remove debug print
* chore: add common folder
* tests: add drop table
* feat: pull versioned binary
* chore: don't use native-tls
* chore: rm outdated docs
* chore: new line
* fix: save old bin dir
* fix: switch version restart all node
* feat: use etcd
* fix: wait for election
* fix: normal sqlness
* refactor: hashmap for bin dir
* test: past 3 major version compat crate table
* refactor: allow using without setup etcd
* add metrics
* chore/bench-metrics: Add INFLIGHT_FLUSH_COUNT Metric to Flush Process
• Introduced INFLIGHT_FLUSH_COUNT metric to track the number of ongoing flush operations.
• Incremented INFLIGHT_FLUSH_COUNT in FlushScheduler to monitor active flushes.
• Removed redundant increment of INFLIGHT_FLUSH_COUNT in RegionWorkerLoop to prevent double counting.
* chore/bench-metrics: Add Metrics for Compaction and Flush Operations
• Introduced INFLIGHT_COMPACTION_COUNT and INFLIGHT_FLUSH_COUNT metrics to track the number of ongoing compaction and flush operations.
• Incremented INFLIGHT_COMPACTION_COUNT when scheduling remote and local compaction jobs, and decremented it upon completion.
• Added INFLIGHT_FLUSH_COUNT increment and decrement logic around flush tasks to monitor active flush operations.
• Removed redundant metric updates in worker.rs and handle_compaction.rs to streamline metric handling.
* chore: add metrics for remote compaction jobs
* chore: format
* chore: also add dashbaord
* feat: cache inverted index by page instead of file
* fix: add unit test and fix bugs
* chore: typo
* chore: ci
* fix: math
* chore: apply review comments
* chore: renames
* test: add unit test for index key calculation
* refactor: use ReadableSize
* feat: add config for inverted index page size
* chore: update config file
* refactor: handle multiple range read and fix some related bugs
* fix: add config
* test: turn to a fs reader to match behaviors of object store
* chore: decide tag column in log api follow table schema if table exists
* chore: add more test for greptime_identity pipeline
* chore: change pipeline get_table function signature
* chore: change identity_pipeline_inner tag_column_names type
* feat(fuzz): add set table options to alter fuzzer
* chore: clippy is happy, I'm sad
* chore: happy ci happy
* fix: unit test
* feat(fuzz): add unset table options to alter fuzzer
* fix: unit test
* feat(fuzz): add table option validator
* fix: make clippy happy
* chore: add comments
* chore: apply review comments
* fix: unit test
* feat(fuzz): add more ttl options
* fix: #5108
* chore: add comments
* chore: add comments
* feat: assign partition ranges by rows
* feat: balance partition rows
* feat: get uppoer bound for part nums
* feat: only split in non-compaction seq scan
* fix: parallel scan on multiple sources
* fix: can split check
* feat: scanner prepare by request
* feat: remove scan_parallelism
* docs: upate docs
* chore: update comment
* style: fix clippy
* feat: skip merge and dedup if there is only one source
* chore: Revert "feat: skip merge and dedup if there is only one source"
Since memtable won't do dedup jobs
This reverts commit 2fc7a54b11.
* test: avoid compaction in sqlness window sort test
* chore: do not create semaphore if num partitions is enough
* chore: more assertions
* chore: fix typo
* fix: compaction flag not set
* chore: address review comments
* feat: ttl zero filter
* refactor: use TimeToLive enum
* fix: unit test
* tests: sqlness
* refactor: Option<TTL> None means UNSET
* tests: sqlness
* fix: 10000 years --> forever
* chore: minor refactor from reviews
* chore: rename back TimeToLive
* refactor: split imme request from normal requests
* fix: use correct lifetime
* refactor: rename immediate to instant
* tests: flow sink table default ttl
* refactor: per review
* tests: sqlness
* fix: ttl alter to instant
* tests: sqlness
* refactor: per review
* chore: per review
* feat: add db ttl type&forbid instant for db
* tests: more unit test
* fix: use SchemaCache to locate database metadata
* main:
Refactor SchemaMetadataManager to use TableInfoCacheRef
- Replace TableInfoManagerRef with TableInfoCacheRef in SchemaMetadataManager
- Update DatanodeBuilder to pass TableInfoCacheRef to SchemaMetadataManager
- Rename error MissingCacheRegistrySnafu to MissingCacheSnafu in datanode module
- Adjust tests to use new mock_schema_metadata_manager with TableInfoCacheRef
* fix/schema-cache-invalidation: Add cache module and integrate cache registry into datanode
• Implement build_datanode_cache_registry function to create cache registry for datanode
• Integrate cache registry into datanode by modifying DatanodeBuilder and HeartbeatTask
• Refactor InvalidateTableCacheHandler to InvalidateCacheHandler and move to common-meta crate
• Update Cargo.toml to include cache as a dev-dependency for datanode
• Adjust related modules (flownode, frontend, tests-integration, standalone) to use new cache handler and registry
• Remove obsolete handler module from frontend crate
* fix: fuzz imports
* chore: add some doc for cahce builder functions
* refactor: change table info cache to table schema cache
* fix: remove unused variants
* fix fuzz
* chore: apply suggestion
Co-authored-by: Weny Xu <wenymedia@gmail.com>
* chore: apply suggestion
Co-authored-by: Weny Xu <wenymedia@gmail.com>
* fix: compile
---------
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
* feat: add cache for schema options
* fix/use-cache-kv-manager: Add cache invalidation handling to Datanode's heartbeat task
• Implement InvalidateSchemaCacheHandler in heartbeat.rs to handle cache invalidation instructions.
• Update HeartbeatTask constructor to accept cached_kv_backend and pass it to InvalidateSchemaCacheHandler.
• Modify DatanodeBuilder to clone cached_kv_backend when creating schema_metadata_manager.
• Refactor MetasrvCacheInvalidator in cache_invalidator.rs to reuse MailboxMessage for broadcasting to different channels.
* fix: only remove schema related cache entries
* chore: add more tests
* fix/use-cache-kv-manager: Moved InvalidateSchemaCacheHandler to a separate module
• Extracted InvalidateSchemaCacheHandler and associated tests into a new file cache_invalidator.rs
• Removed async_trait and CacheInvalidator related code from heartbeat.rs
• Added cache_invalidator module declaration in handler.rs
* fix: unit tests
* fix/use-cache-kv-manager:
Standardize TODO comment format in CachedKvBackend txn method
* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs
* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs
* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs
---------
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
* fix/metric-metadata-region-options: Remove APPEND_MODE_KEY and refactor TTL option handling in MetricEngineInner
* fix/metric-metadata-region-options: Refactor metadata region options into a shared function
• Extract metadata region options into region_options_for_metadata_region function
• Replace inline options map with a call to the new shared function in both create.rs and open.rs files
* fix: exclude typos
* fix/metric-metadata-region-options:
Refactor metadata region options to accept original options and remove APPEND_MODE_KEY
* feat: prune in each partition
* chore: change pick log to trace
* chore: add in progress partition scan to metrics
* feat: seqscan support pruning in partition
* chore: remove commented codes
* feat: Replace flow
* refactor: better show create flow&tests: better check
* tests: sqlness result update
* tests: unit test for update
* refactor: cmp with raw bytes
* refactor: rename
* refactor: per review
* support set and show on statement/execution timeout session variables.
* implement statement timeout for mysql read, and postgres queries
* add mysql test with max execution time
* tests: more flow testcase
* tests(WIP): more tests
* tests: more flow tests
* test: wired regex for sqlness
* refactor: put blog&example to two files
* fix: result of nulls
* update test result
* fix null behaviors, add null tests
* update NULL tests
* error handler when parsing json_path
* change the logic to: items' datatype in the input arrays are all the same.
* remove a comment
* refactor: better logic
* drop unnecessary err check
* added an error test case
* main:
Add common-meta dependency and implement SchemaMetadataManager
- Introduce `common-meta` as a new dependency in `mito2`.
- Implement `SchemaMetadataManager` for managing schema-level metadata.
- Update `DatanodeBuilder` and `MitoEngine` to pass `KvBackendRef` for schema metadata management.
- Add `SchemaMetadataManager` to `RegionWorkerLoop` for compaction handling.
- Include `SchemaNameKey` usage in compaction-related code.
- Add `database_metadata_manager` module with `SchemaMetadataManager` struct and associated logic.
* fix/database-base-ttl:
Refactor metadata management and update compaction logic
- Remove `database_metadata_manager` and introduce `schema_metadata_manager`
- Update compaction logic to handle TTL based on schema metadata
- Adjust tests to use `schema_metadata_manager` for setting up schema options
- Fix engine creation in tests to pass `kv_backend` explicitly
- Remove unused imports and apply minor code cleanups
* fix/database-base-ttl:
Extend CREATE TABLE LIKE to inherit schema options
- Implement inheritance of database level options for CREATE TABLE LIKE
- Add schema options to SHOW CREATE TABLE output
- Refactor create_table_stmt to include schema_options in SQL generation
- Update error handling to include TableMetadataManagerSnafu
* fix/database-base-ttl:
Refactor error handling and remove schema dependency in table creation
- Replace expect with the ? operator for error handling in open_compaction_region
- Simplify create_logical_tables by removing catalog and schema name parameters
- Remove unnecessary schema retrieval and merging of schema options in create_table_info
- Clean up unused imports and redundant code
* fix/database-base-ttl:
Refactor error handling and update documentation comments
- Update comment to reflect retrieval of schema options instead of metadata
- Introduce new error type `GetSchemaMetadataSnafu` for schema metadata retrieval failures
- Implement error handling for schema metadata retrieval in `find_ttl` function
* fix: toml
* fix/database-base-ttl:
Refactor SchemaMetadataManager and adjust Cargo.toml dependencies
- Remove unused imports in schema_metadata_manager.rs
- Add conditional compilation for SchemaMetadataManager::new
- Update Cargo.toml to remove "testing" feature from common-meta dependency in main section and add it to dev-dependencies
* fix/database-base-ttl:
Fix typos in comments and function names across multiple modules
- Correct spelling of 'parallelism' in region_server, engine, and scan_region modules
- Amend typo in TODO comment from 'persisent' to 'persistent' in server module
- Update incorrect test query from 'versiona' to 'version' in federated module tests
* fix/database-base-ttl: Add schema existence check in StatementExecutor for CREATE TABLE operation
* fix/database-base-ttl: Add warning log for failed TTL retrieval in compaction region open function
* fix/database-base-ttl:
Refactor to use SchemaMetadataManagerRef in Datanode and MitoEngine
- Replace KvBackendRef with SchemaMetadataManagerRef across various components.
- Update DatanodeBuilder and MitoEngine to pass SchemaMetadataManagerRef instead of KvBackendRef.
- Adjust test cases to use get_schema_metadata_manager method for consistency.
* fix: data_length, index_length, table_rows in tables
* feat: table stats only works for mito engine currently
* fix: tests
* fix: typo
* chore: log error when region_stats fails
fix/database-base-ttl:
Fix typos in comments and function names across multiple modules
- Correct spelling of 'parallelism' in region_server, engine, and scan_region modules
- Amend typo in TODO comment from 'persisent' to 'persistent' in server module
- Update incorrect test query from 'versiona' to 'version' in federated module tests
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
2024-11-04 01:56:11 +00:00
2298 changed files with 288725 additions and 86128 deletions
release-dev-builder-images-cn: # Note: Be careful issue:https://github.com/containers/skopeo/issues/1874 and we decide to use the latest stable skopeo container.
@@ -55,14 +55,18 @@ GreptimeDB uses the [Apache 2.0 license](https://github.com/GreptimeTeam/greptim
- To ensure that community is free and confident in its ability to use your contributions, please sign the Contributor License Agreement (CLA) which will be incorporated in the pull request process.
- Make sure all files have proper license header (running `docker run --rm -v $(pwd):/github/workspace ghcr.io/korandoru/hawkeye-native:v3 format` from the project root).
- Make sure all your codes are formatted and follow the [coding style](https://pingcap.github.io/style-guide/rust/) and [style guide](docs/style-guide.md).
- Make sure all unit tests are passed using [nextest](https://nexte.st/index.html) `cargo nextest run`.
- Make sure all clippy warnings are fixed (you can check it locally by running `cargo clippy --workspace --all-targets -- -D warnings`).
- Make sure all unit tests are passed using [nextest](https://nexte.st/index.html) `cargo nextest run --workspace --features pg_kvbackend,mysql_kvbackend` or `make test`.
- Make sure all clippy warnings are fixed (you can check it locally by running `cargo clippy --workspace --all-targets -- -D warnings` or `make clippy`).
- Ensure there are no unused dependencies by running `make check-udeps` (clean them up with `make fix-udeps` if reported).
- If you must keep a target-specific dependency (e.g. under `[target.'cfg(...)'.dev-dependencies]`), add a cargo-udeps ignore entry in the same `Cargo.toml`, for example:
`[package.metadata.cargo-udeps.ignore]` with `development = ["rexpect"]` (or `dependencies`/`build` as appropriate).
- When modifying sample configuration files in `config/`, run `make config-docs` (which requires Docker to be installed) to update the configuration documentation and include it in your commit.
#### `pre-commit` Hooks
You could setup the [`pre-commit`](https://pre-commit.com/#plugins) hooks to run these checks on every commit automatically.
1. Install `pre-commit`
1. Install `pre-commit`
pip install pre-commit
@@ -70,7 +74,7 @@ You could setup the [`pre-commit`](https://pre-commit.com/#plugins) hooks to run
brew install pre-commit
2. Install the `pre-commit` hooks
2. Install the `pre-commit` hooks
$ pre-commit install
pre-commit installed at .git/hooks/pre-commit
@@ -108,7 +112,7 @@ of what you were trying to do and what went wrong. You can also reach for help i
The core team will be thrilled if you would like to participate in any way you like. When you are stuck, try to ask for help by filing an issue, with a detailed description of what you were trying to do and what went wrong. If you have any questions or if you would like to get involved in our community, please check out:
- [GreptimeDB Community Slack](https://greptime.com/slack)
# This is commented, since we are not using aws-lc-sys, if we need to use it, we need to uncomment this line or use a release after this commit, or it wouldn't compile with gcc < 8.1
**GreptimeDB** is an open-source unified time-series database for **Metrics**, **Logs**, and **Events** (also **Traces** in plan). You can gain real-time insights from Edge to Cloud at any scale.
**GreptimeDB** is an open-source, cloud-native database purpose-built for the unified collection and analysis of observability data (metrics, logs, and traces). Whether you’re operating on the edge, in the cloud, or across hybrid environments, GreptimeDB empowers real-time insights at massive scale — all in one system.
## Why GreptimeDB
## Features
Our core developers have been building time-series data platforms for years. Based on our best-practices, GreptimeDB is born to give you:
| Feature | Description |
| --------- | ----------- |
| [Unified Observability Data](https://docs.greptime.com/user-guide/concepts/why-greptimedb) | Store metrics, logs, and traces as timestamped, contextual wide events. Query via [SQL](https://docs.greptime.com/user-guide/query-data/sql), [PromQL](https://docs.greptime.com/user-guide/query-data/promql), and [streaming](https://docs.greptime.com/user-guide/flow-computation/overview). |
| [High Performance & Cost Effective](https://docs.greptime.com/user-guide/manage-data/data-index) | Written in Rust, with a distributed query engine, [rich indexing](https://docs.greptime.com/user-guide/manage-data/data-index), and optimized columnar storage, delivering sub-second responses at PB scale. |
| [Cloud-Native Architecture](https://docs.greptime.com/user-guide/concepts/architecture) | Designed for [Kubernetes](https://docs.greptime.com/user-guide/deployments-administration/deploy-on-kubernetes/greptimedb-operator-management), with compute/storage separation, native object storage (AWS S3, Azure Blob, etc.) and seamless cross-cloud access. |
| [Developer-Friendly](https://docs.greptime.com/user-guide/protocols/overview) | Access via SQL/PromQL interfaces, REST API, MySQL/PostgreSQL protocols, and popular ingestion [protocols](https://docs.greptime.com/user-guide/protocols/overview). |
| [Flexible Deployment](https://docs.greptime.com/user-guide/deployments-administration/overview) | Deploy anywhere: edge (including ARM/[Android](https://docs.greptime.com/user-guide/deployments-administration/run-on-android)) or cloud, with unified APIs and efficient data sync. |
* **Unified all kinds of time series**
Learn more in [Why GreptimeDB](https://docs.greptime.com/user-guide/concepts/why-greptimedb) and [Observability 2.0 and the Database for It](https://greptime.com/blogs/2025-04-25-greptimedb-observability2-new-database).
GreptimeDB treats all time series as contextual events with timestamp, and thus unifies the processing of metrics, logs, and events. It supports analyzing metrics, logs, and events with SQL and PromQL, and doing streaming with continuous aggregation.
GreptimeDB can be deployed on ARM architecture-compatible Android/Linux systems as well as cloud environments from various vendors. Both sides run the same software, providing identical APIs and control planes, so your application can run at the edge or on the cloud without modification, and data synchronization also becomes extremely easy and efficient.
**Performance:**
* [GreptimeDB tops JSONBench's billion-record cold run test!](https://greptime.com/blogs/2025-03-18-jsonbench-greptimedb-performance)
By leveraging object storage (S3 and others), separating compute and storage, scaling stateless compute nodes arbitrarily, GreptimeDB implements seamless scalability. It also supports cross-cloud deployment with a built-in unified data access layer over different object storages.
## Architecture
***Performance and Cost-effective**
Flexible indexing capabilities and distributed, parallel-processing query engine, tackling high cardinality issues down. Optimized columnar layout for handling time-series data; compacted, compressed, and stored on various storage backends, particularly cloud object storage with 50x cost efficiency.
* **Compatible with InfluxDB, Prometheus and more protocols**
Widely adopted database protocols and APIs, including MySQL, PostgreSQL, and Prometheus Remote Storage, etc. [Read more](https://docs.greptime.com/user-guide/protocols/overview).
*Read the [architecture](https://docs.greptime.com/contributor-guide/overview/#architecture) document.
* [DeepWiki](https://deepwiki.com/GreptimeTeam/greptimedb/1-overview) provides an in-depth look at GreptimeDB:
<imgalt="GreptimeDB System Overview"src="docs/architecture.png">
*Python toolchain (optional): Required only if built with PyO3 backend. More detail for compiling with PyO3 can be found in its [documentation](https://pyo3.rs/v0.18.1/building_and_distribution#configuring-the-python-version).
*C/C++ building essentials, including `gcc`/`g++`/`autoconf` and glibc library (eg. `libc6-dev` on Ubuntu and `glibc-devel` on Fedora)
* Python toolchain (optional): Required only if using some test scripts.
Build GreptimeDB binary:
```shell
**Build and Run:**
```bash
make
```
Run a standalone server:
```shell
cargo run -- standalone start
```
## Extension
## Tools & Extensions
### Dashboard
-[The dashboard UI for GreptimeDB](https://github.com/GreptimeTeam/dashboard)
### SDK
- [GreptimeDB Go Ingester](https://github.com/GreptimeTeam/greptimedb-ingester-go)
The current version has not yet reached the standards for General Availability.
According to our Greptime 2024 Roadmap, we aim to achieve a production-level version with the release of v1.0 by the end of 2024. [Join Us](https://github.com/GreptimeTeam/greptimedb/issues/3412)
> **Status:** Beta.
> **GA (v1.0):** Targeted for mid 2025.
We welcome you to test and use GreptimeDB. Some users have already adopted it in their production environments. If you're interested in trying it out, please use the latest stable release available.
- Being used in production by early adopters
- Stable, actively maintained, with regular releases ([version info](https://docs.greptime.com/nightly/reference/about-greptimedb-version))
- Suitable for evaluation and pilot deployments
For production use, we recommend using the latest stable release.
[](https://www.star-history.com/#GreptimeTeam/GreptimeDB&Date)
If you find this project useful, a ⭐ would mean a lot to us!
GreptimeDB uses the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0.txt) to strike a balance between
open contributions and allowing you to use the software however you want.
GreptimeDB is licensed under the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0.txt).
## Commercial Support
Running GreptimeDB in your organization?
We offer enterprise add-ons, services, training, and consulting.
[Contact us](https://greptime.com/contactus) for details.
## Contributing
Please refer to [contribution guidelines](CONTRIBUTING.md) and [internal concepts docs](https://docs.greptime.com/contributor-guide/overview.html) for more information.
- Explore [Internal Concepts](https://docs.greptime.com/contributor-guide/overview.html) and [DeepWiki](https://deepwiki.com/GreptimeTeam/greptimedb).
- Pick up a [good first issue](https://github.com/GreptimeTeam/greptimedb/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) and join the #contributors [Slack](https://greptime.com/slack) channel.
## Acknowledgement
Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to [AUTHOR.md](AUTHOR.md).
Special thanks to all contributors! See [AUTHORS.md](https://github.com/GreptimeTeam/greptimedb/blob/main/AUTHOR.md).
-GreptimeDB uses [Apache Arrow™](https://arrow.apache.org/) as the memory model and [Apache Parquet™](https://parquet.apache.org/) as the persistent file format.
-GreptimeDB's query engine is powered by [Apache Arrow DataFusion™](https://arrow.apache.org/datafusion/).
- [Apache OpenDAL™](https://opendal.apache.org) gives GreptimeDB a very general and elegant data access abstraction layer.
-GreptimeDB's meta service is based on [etcd](https://etcd.io/).
- GreptimeDB uses [RustPython](https://github.com/RustPython/RustPython) for experimental embedded python scripting.
| `default_timezone` | String | Unset | The default timezone of the server. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `max_concurrent_queries` | Integer | `0` | The maximum current queries allowed to be executed. Zero means unlimited. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `max_in_flight_write_bytes` | String | Unset | The maximum in-flight write bytes. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `http.enable_cors` | Bool | `true` | HTTP CORS support, it's turned on by default<br/>This allows browser to access http APIs without CORS restrictions |
| `prom_store.enable` | Bool | `true` | Whether to enable Prometheus remote write and read in HTTP API. |
| `prom_store.with_metric_engine` | Bool | `true` | Whether to store the data from Prometheus remote write in metric engine. |
| `wal` | -- | -- | The WAL options. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the wal is stored in the local file system by raft-engine.<br/>- `kafka`: it's remote wal that data is stored in Kafka. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a purge.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `1m` | The interval to trigger a purge.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
@@ -79,22 +87,24 @@
| `wal.create_topic_timeout` | String | `30s` | Above which a topic creation operation will be cancelled.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_init` | String | `500ms` | The initial backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_base` | Integer | `2` | The exponential backoff rate, i.e. next backoff = base * current backoff.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_deadline` | String | `5mins` | The deadline of retries.<br/>**It's only used when the provider is `kafka`**. |
| `wal.overwrite_entry_start_id` | Bool | `false` | Ignore missing entries during read WAL.<br/>**It's only used when the provider is `kafka`**.<br/><br/>This option ensures that when Kafka messages are deleted, the system<br/>can still successfully replay memtable data without throwing an<br/>out-of-range error.<br/>However, enabling this option might lead to unexpected data loss,<br/>as the system will skip over missing entries instead of treating<br/>them as critical errors. |
| `procedure.max_running_procedures` | Integer | `128` | Max running procedures.<br/>The maximum number of procedures that can be running at the same time.<br/>If the number of running procedures exceeds this limit, the procedure will be rejected. |
| `flow` | -- | -- | flow engine options. |
| `flow.num_workers` | Integer | `0` | The number of flow worker in flownode.<br/>Not setting(or set to 0) this value will use the number of CPU cores divided by 2. |
| `query` | -- | -- | The query engine options. |
| `query.parallelism` | Integer | `0` | Parallelism of the query engine.<br/>Default to 0, which means the number of CPU cores. |
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Cache configuration for object storage such as 'S3' etc.<br/>The local file cache directory. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -109,6 +119,12 @@
| `storage.sas_token` | String | Unset | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | Unset | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | Unset | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client` | -- | -- | The http client options to the storage.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client.pool_max_idle_per_host` | Integer | `1024` | The maximum idle connection per host allowed in the pool. |
| `storage.http_client.connect_timeout` | String | `30s` | The timeout for only the connect phase of a http client. |
| `storage.http_client.timeout` | String | `30s` | The total request timeout, applied from when the request starts connecting until the response body has finished.<br/>Also considered a total deadline. |
| `storage.http_client.pool_idle_timeout` | String | `90s` | The timeout for idle sockets being kept-alive. |
| `storage.http_client.skip_ssl_validation` | Bool | `false` | To skip the ssl verification<br/>**Security Notice**: Setting `skip_ssl_validation = true` disables certificate verification, making connections vulnerable to man-in-the-middle attacks. Only use this in development or trusted private networks. |
| `[[region_engine]]` | -- | -- | The region engine options. You can configure multiple region engines. |
| `region_engine.mito.num_workers` | Integer | `8` | Number of region workers. |
@@ -126,61 +142,76 @@
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.enable_write_cache` | Bool | `false` | Whether to enable the write cache, it's enabled by default when using object storage. It is recommended to enable it when using object storage for better performance. |
| `region_engine.mito.write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}`. |
| `region_engine.mito.write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: using the default value (1/4 of cpu cores).<br/>- `1`: scan in current thread.<br/>- `n`: scan in parallelism n. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.min_compaction_interval` | String | `0m` | Minimum time interval between two compactions.<br/>To align with the old behavior, the default value is 0 (no restrictions). |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.index.staging_ttl` | String | `7d` | The TTL of the staging directory.<br/>Defaults to 7 days.<br/>Setting it to "0s" to disable TTL. |
| `region_engine.mito.index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.index.content_cache_page_size` | String | `64KiB` | Page size for inverted index content cache. |
| `region_engine.mito.index.result_cache_size` | String | `128MiB` | Cache size for index result. |
| `region_engine.mito.inverted_index` | -- | -- | The options for inverted index in Mito engine. |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `auto` | Memory threshold for performing an external sort during index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.inverted_index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.inverted_index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `auto` | Memory threshold for index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.bloom_filter_index` | -- | -- | The options for bloom filter in Mito engine. |
| `region_engine.mito.bloom_filter_index.create_on_flush` | String | `auto` | Whether to create the bloom filter on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.create_on_compaction` | String | `auto` | Whether to create the bloom filter on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.apply_on_query` | String | `auto` | Whether to apply the bloom filter on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.mem_threshold_on_create` | String | `auto` | Memory threshold for bloom filter creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.memtable.index_max_keys_per_shard` | Integer | `8192` | The max number of keys in one shard.<br/>Only available for `partition_tree` memtable. |
| `region_engine.mito.memtable.data_freeze_threshold` | Integer | `32768` | The max rows of data inside the actively writing buffer in one shard.<br/>Only available for `partition_tree` memtable. |
| `region_engine.mito.memtable.fork_dictionary_bytes` | String | `1GiB` | Max dictionary bytes.<br/>Only available for `partition_tree` memtable. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
| `logging.otlp_export_protocol` | String | `http` | The OTLP tracing export protocol. Can be `grpc`/`http`. |
| `logging.otlp_headers` | -- | -- | Additional OTLP headers, only valid when using OTLP http |
| `logging.tracing_sample_ratio` | -- | Unset | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
|`export_metrics`|--|--|ThedatanodecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.sendto`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
|`export_metrics`|--|--|ThestandalonecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.`greptimedb`)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by itself<br/>You must create the database before enabling it. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.url` | String | `""` | The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
| `memory` | -- | -- | The memory options. |
| `memory.enable_heap_profiling` | Bool | `true` | Whether to enable heap profiling activation during startup.<br/>When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable<br/>is set to "prof:true,prof_active:false". The official image adds this env variable.<br/>Default is true. |
## Distributed Mode
@@ -190,6 +221,7 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `max_in_flight_write_bytes` | String | Unset | The maximum in-flight write bytes. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
@@ -198,21 +230,37 @@
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `http.enable_cors` | Bool | `true` | HTTP CORS support, it's turned on by default<br/>This allows browser to access http APIs without CORS restrictions |
| `http.prom_validation_mode` | String | `strict` | Whether to enable validation for Prometheus remote write requests.<br/>Available options:<br/>- strict: deny invalid UTF-8 strings (default).<br/>- lossy: allow invalid UTF-8 strings, replace invalid characters with REPLACEMENT_CHARACTER(U+FFFD).<br/>- unchecked: do not valid strings. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
| `grpc.bind_addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:4001` | The address advertised to the metasrv,and used for connections from outside the host.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `grpc.bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.flight_compression` | String | `arrow_ipc` | Compression mode for frontend side Arrow IPC service. Available options:<br/>- `none`: disable all compression<br/>- `transport`: only enable gRPC transport compression (zstd)<br/>- `arrow_ipc`: only enable Arrow IPC compression (lz4)<br/>- `all`: enable all compression.<br/>Default to `none` |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload.<br/>For now, gRPC tls config does not support auto reload. |
| `internal_grpc` | -- | -- | The internal gRPC server options. Internal gRPC port for nodes inside cluster to access frontend. |
| `internal_grpc.bind_addr` | String | `127.0.0.1:4010` | The address to bind the gRPC server. |
| `internal_grpc.server_addr` | String | `127.0.0.1:4010` | The address advertised to the metasrv, and used for connections from outside the host.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `grpc.bind_addr`. |
| `internal_grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `internal_grpc.flight_compression` | String | `arrow_ipc` | Compression mode for frontend side Arrow IPC service. Available options:<br/>- `none`: disable all compression<br/>- `transport`: only enable gRPC transport compression (zstd)<br/>- `arrow_ipc`: only enable Arrow IPC compression (lz4)<br/>- `all`: enable all compression.<br/>Default to `none` |
| `internal_grpc.tls` | -- | -- | internal gRPC server TLS options, see `mysql.tls` section. |
| `internal_grpc.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload.<br/>For now, gRPC tls config does not support auto reload. |
| `query.parallelism` | Integer | `0` | Parallelism of the query engine.<br/>Default to 0, which means the number of CPU cores. |
| `query.allow_query_fallback` | Bool | `false` | Whether to allow query fallback when push down optimize fails.<br/>Default to false, meaning when push down optimize failed, return error msg |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
| `logging.otlp_export_protocol` | String | `http` | The OTLP tracing export protocol. Can be `grpc`/`http`. |
| `logging.otlp_headers` | -- | -- | Additional OTLP headers, only valid when using OTLP http |
| `logging.tracing_sample_ratio` | -- | Unset | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
|`export_metrics`|--|--|ThedatanodecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.sendto`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
|`slow_query.record_type`|String|`system_table`|Therecordtypeofslowqueries.Itcanbe`system_table`or`log`.<br/>If `system_table` is selected, the slow queries will be recorded in a system table `greptime_private.slow_queries`.<br/>If `log` is selected, the slow queries will be logged in a log file `greptimedb-slow-queries.*`. |
| `slow_query.threshold` | String | `30s` | The threshold of slowquery. It can be human readable time string, for example: `10s`, `100ms`, `1s`. |
| `slow_query.sample_ratio` | Float | `1.0` | The sampling ratio of slow query log. The value should be in the range of (0, 1]. For example, `0.1` means 10% of the slow queries will be logged and `1.0` means all slow queries will be logged. |
| `slow_query.ttl` | String | `90d` | The TTL of the `slow_queries` system table. Default is `90d` when `record_type` is `system_table`. |
| `export_metrics` | -- | -- | The frontend can export its metrics and send to Prometheus compatible service (e.g. `greptimedb` itself) from remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself<br/>You must create the database before enabling it. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.url` | String | `""` | The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
| `memory` | -- | -- | The memory options. |
| `memory.enable_heap_profiling` | Bool | `true` | Whether to enable heap profiling activation during startup.<br/>When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable<br/>is set to "prof:true,prof_active:false". The official image adds this env variable.<br/>Default is true. |
| `event_recorder` | -- | -- | Configuration options for the event recorder. |
| `event_recorder.ttl` | String | `90d` | TTL for the events table that will be used to store the events. Default is `90d`. |
### Metasrv
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `data_home` | String | `/tmp/metasrv/` | The working home directory. |
| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. |
| `server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost. |
| `store_addr` | String | `127.0.0.1:2379` | Store server address default to etcd store. |
| `data_home` | String | `./greptimedb_data` | The working home directory. |
| `store_addrs` | Array | -- | Store server address default to etcd store.<br/>For postgres store, the format is:<br/>"password=password dbname=postgres user=postgres host=localhost port=5432"<br/>For etcd store, the format is:<br/>"127.0.0.1:2379" |
| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `backend` | String | `etcd_store` | The datastore for meta server.<br/>Available values:<br/>- `etcd_store` (default value)<br/>- `memory_store`<br/>- `postgres_store`<br/>- `mysql_store` |
| `meta_table_name` | String | `greptime_metakv` | Table name in RDS to store metadata. Effect when using a RDS kvbackend.<br/>**Only used when backend is `postgres_store`.** |
| `meta_schema_name` | String | `greptime_schema` | Optional PostgreSQL schema for metadata table and election table name qualification.<br/>When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public),<br/>set this to a writable schema. GreptimeDB will use `meta_schema_name`.`meta_table_name`.<br/>GreptimeDB will NOT create the schema automatically; please ensure it exists or the user has permission.<br/>**Only used when backend is `postgres_store`.** |
| `meta_election_lock_id` | Integer | `1` | Advisory lock id in PostgreSQL for election. Effect when using PostgreSQL as kvbackend<br/>Only used when backend is `postgres_store`. |
| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available on GreptimeDB running on cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., s3). |
| `backend` | String | `EtcdStore` | The datastore for meta server. |
| `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection.<br/>This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started.<br/>Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. |
| `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.<br/>**This option is not recommended to be set to true, because it may lead to data loss during failover.** |
| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
| `backend_tls` | -- | -- | TLS configuration for kv store backend (applicable for etcd, PostgreSQL, and MySQL backends)<br/>When using etcd, PostgreSQL, or MySQL as metadata store, you can configure TLS here |
| `backend_tls.mode` | String | `prefer` | TLS mode, refer to https://www.postgresql.org/docs/current/libpq-ssl.html<br/>- "disable" - No TLS<br/>- "prefer" (default) - Try TLS, fallback to plain<br/>- "require" - Require TLS<br/>- "verify_ca" - Require TLS and verify CA<br/>- "verify_full" - Require TLS and verify hostname |
| `backend_tls.ca_cert_path` | String | `""` | Path to CA certificate file (for server certificate verification)<br/>Required when using custom CAs or self-signed certificates<br/>Leave empty to use system root certificates only<br/>Like "/path/to/ca.crt" |
| `backend_tls.watch` | Bool | `false` | Watch for certificate file changes and auto reload |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.bind_addr` | String | `127.0.0.1:3002` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:3002` | The communication server address for the frontend and datanode to connect to metasrv.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `procedure.max_metadata_value_size` | String | `1500KiB` | Auto split large value<br/>GreptimeDB procedure uses etcd as the default metadata storage backend.<br/>The etcd the maximum size of any request is 1.5 MiB<br/>1500KiB = 1536KiB (1.5MiB) - 36KiB (reserved size of key)<br/>Comments out the `max_metadata_value_size`, for don't split large value (no limit). |
| `procedure.max_running_procedures` | Integer | `128` | Max running procedures.<br/>The maximum number of procedures that can be running at the same time.<br/>If the number of running procedures exceeds this limit, the procedure will be rejected. |
| `failure_detector` | -- | -- | -- |
| `failure_detector.threshold` | Float | `8.0` | The threshold value used by the failure detector to determine failure conditions. |
| `failure_detector.min_std_deviation` | String | `100ms` | The minimum standard deviation of the heartbeat intervals, used to calculate acceptable variations. |
| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster. |
| `wal.auto_create_topics` | Bool | `true` | Automatically create topics for WAL.<br/>Set to `true` to automatically create topics for WAL.<br/>Otherwise, use topics named `topic_name_prefix_[0..num_topics)` |
| `wal.num_topics` | Integer |`64`| Number of topics. |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.<br/>i.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1. |
| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. |
| `wal.create_topic_timeout` | String | `30s` | Above which a topic creation operation will be cancelled. |
| `wal.backoff_init` | String | `500ms` | The initial backoff for kafka clients. |
| `wal.backoff_max` | String | `10s` | The maximum backoff for kafka clients. |
| `wal.backoff_base` | Integer | `2` | Exponential backoff rate, i.e. next backoff = base * current backoff. |
| `wal.backoff_deadline` | String | `5mins` | Stop reconnecting if the total wait time reaches the deadline. If this config is missing, the reconnecting won't terminate. |
| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster.<br/><br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_create_topics` | Bool | `true` | Automatically create topics for WAL.<br/>Set to `true` to automatically create topics for WAL.<br/>Otherwise, use topics named `topic_name_prefix_[0..num_topics)`<br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_prune_interval` | String | `30m` | Interval of automatically WAL pruning.<br/>Set to`0s`to disable automatically WAL pruning which delete unused remote WAL entries periodically.<br/>**It's only used when the provider is `kafka`**. |
| `wal.flush_trigger_size` | String | `512MB` | Estimated size threshold to trigger a flush when using Kafka remote WAL.<br/>Since multiple regions may share a Kafka topic, the estimated size is calculated as:<br/> (latest_entry_id - flushed_entry_id) * avg_record_size<br/>MetaSrv triggers a flush for a region when this estimated size exceeds `flush_trigger_size`.<br/>- `latest_entry_id`: The latest entry ID in the topic.<br/>- `flushed_entry_id`: The last flushed entry ID for the region.<br/>Set to "0" to let the system decide the flush trigger size.<br/>**It's only used when the provider is `kafka`**. |
| `wal.checkpoint_trigger_size` | String | `128MB` | Estimated size threshold to trigger a checkpoint when using Kafka remote WAL.<br/>The estimated size is calculated as:<br/> (latest_entry_id - last_checkpoint_entry_id) * avg_record_size<br/>MetaSrv triggers a checkpoint for a region when this estimated size exceeds `checkpoint_trigger_size`.<br/>Set to "0" to let the system decide the checkpoint trigger size.<br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_prune_parallelism` | Integer | `10` | Concurrent task limit for automatically WAL pruning.<br/>**It's only used when the provider is `kafka`**. |
| `wal.num_topics` | Integer | `64` | Number of topics used for remote WAL.<br/>**It's only used when the provider is `kafka`**. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default)<br/>**It's only used when the provider is `kafka`**. |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.<br/>Only accepts strings that match the following regular expression pattern:<br/>[a-zA-Z_:-][a-zA-Z0-9_:\-\.@#]*<br/>i.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1.<br/>**It's only used when the provider is `kafka`**. |
| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_topic_timeout` | String | `30s` | The timeout for creating a Kafka topic.<br/>**It's only used when the provider is `kafka`**. |
| `event_recorder` | -- | -- | Configuration options for the event recorder. |
| `event_recorder.ttl` | String | `90d` | TTL for the events table that will be used to store the events. Default is `90d`. |
| `stats_persistence` | -- | -- | Configuration options for the stats persistence. |
| `stats_persistence.ttl` | String | `0s` | TTL for the stats table that will be used to store the stats.<br/>Set to `0s` to disable stats persistence.<br/>Default is `0s`.<br/>If you want to enable stats persistence, set the TTL to a value greater than 0.<br/>It is recommended to set a small value, e.g., `3h`. |
| `stats_persistence.interval` | String | `10m` | The interval to persist the stats. Default is `10m`.<br/>The minimum value is `10m`, if the value is less than `10m`, it will be overridden to `10m`. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.dir` | String | `./greptimedb_data/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
| `logging.otlp_export_protocol` | String | `http` | The OTLP tracing export protocol. Can be `grpc`/`http`. |
| `logging.otlp_headers` | -- | -- | Additional OTLP headers, only valid when using OTLP http |
| `logging.tracing_sample_ratio` | -- | Unset | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
|`export_metrics`|--|--|ThedatanodecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.sendto`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
|`export_metrics`|--|--|ThemetasrvcanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself<br/>You must create the database before enabling it. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.url` | String | `""` | The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
| `memory` | -- | -- | The memory options. |
| `memory.enable_heap_profiling` | Bool | `true` | Whether to enable heap profiling activation during startup.<br/>When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable<br/>is set to "prof:true,prof_active:false". The official image adds this env variable.<br/>Default is true. |
### Datanode
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `standalone` | The running mode of the datanode. It can be `standalone` or `distributed`. |
| `node_id` | Integer | Unset | The datanode identifier and should be unique in the cluster. |
| `require_lease_before_startup` | Bool | `false` | Start services after regions have obtained leases.<br/>It will block the datanode start if it can't receive leases in the heartbeat from metasrv. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:3001` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
| `grpc.bind_addr` | String | `127.0.0.1:3001` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:3001` | The address advertised to the metasrv,and used for connections from outside the host.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `grpc.bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `grpc.flight_compression` | String | `arrow_ipc` | Compression mode for datanode side Arrow IPC service. Available options:<br/>- `none`: disable all compression<br/>- `transport`: only enable gRPC transport compression (zstd)<br/>- `arrow_ipc`: only enable Arrow IPC compression (lz4)<br/>- `all`: enable all compression.<br/>Default to `none` |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the wal is stored in the local file system by raft-engine.<br/>- `kafka`: it's remote wal that data is stored in Kafka. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a purge.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `1m` | The interval to trigger a purge.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
@@ -406,18 +485,16 @@
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_init` | String | `500ms` | The initial backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_base` | Integer | `2` | The exponential backoff rate, i.e. next backoff = base * current backoff.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_deadline` | String | `5mins` | The deadline of retries.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_index` | Bool | `true` | Whether to enable WAL index creation.<br/>**It's only used when the provider is `kafka`**. |
| `wal.dump_index_interval` | String | `60s` | The interval for dumping WAL indexes.<br/>**It's only used when the provider is `kafka`**. |
| `wal.overwrite_entry_start_id` | Bool | `false` | Ignore missing entries during read WAL.<br/>**It's only used when the provider is `kafka`**.<br/><br/>This option ensures that when Kafka messages are deleted, the system<br/>can still successfully replay memtable data without throwing an<br/>out-of-range error.<br/>However, enabling this option might lead to unexpected data loss,<br/>as the system will skip over missing entries instead of treating<br/>them as critical errors. |
| `query` | -- | -- | The query engine options. |
| `query.parallelism` | Integer | `0` | Parallelism of the query engine.<br/>Default to 0, which means the number of CPU cores. |
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Cache configuration for object storage such as 'S3' etc.<br/>The local file cache directory. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -432,12 +509,20 @@
| `storage.sas_token` | String | Unset | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | Unset | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | Unset | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client` | -- | -- | The http client options to the storage.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client.pool_max_idle_per_host` | Integer | `1024` | The maximum idle connection per host allowed in the pool. |
| `storage.http_client.connect_timeout` | String | `30s` | The timeout for only the connect phase of a http client. |
| `storage.http_client.timeout` | String | `30s` | The total request timeout, applied from when the request starts connecting until the response body has finished.<br/>Also considered a total deadline. |
| `storage.http_client.pool_idle_timeout` | String | `90s` | The timeout for idle sockets being kept-alive. |
| `storage.http_client.skip_ssl_validation` | Bool | `false` | To skip the ssl verification<br/>**Security Notice**: Setting `skip_ssl_validation = true` disables certificate verification, making connections vulnerable to man-in-the-middle attacks. Only use this in development or trusted private networks. |
| `[[region_engine]]` | -- | -- | The region engine options. You can configure multiple region engines. |
| `region_engine.mito.num_workers` | Integer | `8` | Number of region workers. |
| `region_engine.mito.worker_channel_size` | Integer | `128` | Request channel size of each worker. |
| `region_engine.mito.worker_request_batch_size` | Integer | `64` | Max batch size for a worker to handle requests. |
| `region_engine.mito.manifest_checkpoint_distance` | Integer | `10` | Number of meta action updated to trigger a new checkpoint for the manifest. |
| `region_engine.mito.experimental_manifest_keep_removed_file_count` | Integer | `256` | Number of removed files to keep in manifest's `removed_files` field before also<br/>remove them from `removed_files`. Mostly for debugging purpose.<br/>If set to 0, it will only use `keep_removed_file_ttl` to decide when to remove files<br/>from `removed_files` field. |
| `region_engine.mito.experimental_manifest_keep_removed_file_ttl` | String | `1h` | How long to keep removed files in the `removed_files` field of manifest<br/>after they are removed from manifest.<br/>files will only be removed from `removed_files` field<br/>if both `keep_removed_file_count` and `keep_removed_file_ttl` is reached. |
| `region_engine.mito.compress_manifest` | Bool | `false` | Whether to compress manifest and checkpoint file by gzip (default false). |
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
@@ -449,18 +534,23 @@
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.enable_write_cache` | Bool | `false` | Whether to enable the write cache, it's enabled by default when using object storage. It is recommended to enable it when using object storage for better performance. |
| `region_engine.mito.write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}`. |
| `region_engine.mito.write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: using the default value (1/4 of cpu cores).<br/>- `1`: scan in current thread.<br/>- `n`: scan in parallelism n. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.min_compaction_interval` | String | `0m` | Minimum time interval between two compactions.<br/>To align with the old behavior, the default value is 0 (no restrictions). |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
| `region_engine.mito.index.staging_ttl` | String | `7d` | The TTL of the staging directory.<br/>Defaults to 7 days.<br/>Setting it to "0s" to disable TTL. |
| `region_engine.mito.index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.index.content_cache_page_size` | String | `64KiB` | Page size for inverted index content cache. |
| `region_engine.mito.index.result_cache_size` | String | `128MiB` | Cache size for index result. |
| `region_engine.mito.inverted_index` | -- | -- | The options for inverted index in Mito engine. |
| `region_engine.mito.inverted_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
@@ -472,50 +562,76 @@
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.mem_threshold_on_create` | String | `auto` | Memory threshold for index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.bloom_filter_index` | -- | -- | The options for bloom filter index in Mito engine. |
| `region_engine.mito.bloom_filter_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.bloom_filter_index.mem_threshold_on_create` | String | `auto` | Memory threshold for the index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.memtable.index_max_keys_per_shard` | Integer | `8192` | The max number of keys in one shard.<br/>Only available for `partition_tree` memtable. |
| `region_engine.mito.memtable.data_freeze_threshold` | Integer | `32768` | The max rows of data inside the actively writing buffer in one shard.<br/>Only available for `partition_tree` memtable. |
| `region_engine.mito.memtable.fork_dictionary_bytes` | String | `1GiB` | Max dictionary bytes.<br/>Only available for `partition_tree` memtable. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
| `logging.otlp_export_protocol` | String | `http` | The OTLP tracing export protocol. Can be `grpc`/`http`. |
| `logging.otlp_headers` | -- | -- | Additional OTLP headers, only valid when using OTLP http |
| `logging.tracing_sample_ratio` | -- | Unset | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
|`export_metrics`|--|--|ThedatanodecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.sendto`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
|`export_metrics`|--|--|ThedatanodecanexportitsmetricsandsendtoPrometheuscompatibleservice(e.g.`greptimedb`itself)fromremote-writeAPI.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself<br/>You must create the database before enabling it. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.url` | String | `""` | The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `tracing` | -- | -- | The tracing options. Only effect when compiled with `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
| `memory` | -- | -- | The memory options. |
| `memory.enable_heap_profiling` | Bool | `true` | Whether to enable heap profiling activation during startup.<br/>When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable<br/>is set to "prof:true,prof_active:false". The official image adds this env variable.<br/>Default is true. |
### Flownode
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `distributed` | The running mode of the flownode. It can be `standalone` or `distributed`. |
| `node_id` | Integer | Unset | The flownode identifier and should be unique in the cluster. |
| `flow` | -- | -- | flow engine options. |
| `flow.num_workers` | Integer | `0` | The number of flow worker in flownode.<br/>Not setting(or set to 0) this value will use the number of CPU cores divided by 2. |
| `flow.batching_mode` | -- | -- | -- |
| `flow.batching_mode.query_timeout` | String | `600s` | The default batching engine query timeout is 10 minutes. |
| `flow.batching_mode.slow_query_threshold` | String | `60s` | will output a warn log for any query that runs for more that this threshold |
| `flow.batching_mode.experimental_min_refresh_duration` | String | `5s` | The minimum duration between two queries execution by batching mode task |
| `flow.batching_mode.experimental_grpc_max_retries` | Integer | `3` | The gRPC max retry number |
| `flow.batching_mode.experimental_frontend_scan_timeout` | String | `30s` | Flow wait for available frontend timeout,<br/>if failed to find available frontend after frontend_scan_timeout elapsed, return error<br/>which prevent flownode from starting |
| `flow.batching_mode.experimental_frontend_activity_timeout` | String | `60s` | Frontend activity timeout<br/>if frontend is down(not sending heartbeat) for more than frontend_activity_timeout,<br/>it will be removed from the list that flownode use to connect |
| `flow.batching_mode.experimental_max_filter_num_per_query` | Integer | `20` | Maximum number of filters allowed in a single query |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
| `logging.otlp_export_protocol` | String | `http` | The OTLP tracing export protocol. Can be `grpc`/`http`. |
| `logging.otlp_headers` | -- | -- | Additional OTLP headers, only valid when using OTLP http |
| `logging.tracing_sample_ratio` | -- | Unset | The percentage of tracing will be sampled and exported.<br/>Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.<br/>ratio > 1 are treated as 1. Fractions <0aretreatedas0|
|`query.parallelism`|Integer|`1`|Parallelismofthequeryengineforquerysentbyflownode.<br/>Default to 1, so it won't use too much cpu or memory |
| `memory` | -- | -- | The memory options. |
| `memory.enable_heap_profiling` | Bool | `true` | Whether to enable heap profiling activation during startup.<br/>When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable<br/>is set to "prof:true,prof_active:false". The official image adds this env variable.<br/>Default is true. |
## Cache configuration for object storage such as 'S3' etc.
## The local file cache directory.
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
cache_path ="/path/local_cache"
#+ cache_path = ""
## The local file cache capacity in bytes.
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity="256MB"
cache_capacity="5GiB"
## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
[storage.http_client]
## The maximum idle connection per host allowed in the pool.
pool_max_idle_per_host=1024
## The timeout for only the connect phase of a http client.
connect_timeout="30s"
## The total request timeout, applied from when the request starts connecting until the response body has finished.
## Also considered a total deadline.
timeout="30s"
## The timeout for idle sockets being kept-alive.
pool_idle_timeout="90s"
## To skip the ssl verification
## **Security Notice**: Setting `skip_ssl_validation = true` disables certificate verification, making connections vulnerable to man-in-the-middle attacks. Only use this in development or trusted private networks.
## Number of meta action updated to trigger a new checkpoint for the manifest.
manifest_checkpoint_distance=10
## Number of removed files to keep in manifest's `removed_files` field before also
## remove them from `removed_files`. Mostly for debugging purpose.
## If set to 0, it will only use `keep_removed_file_ttl` to decide when to remove files
## from `removed_files` field.
experimental_manifest_keep_removed_file_count=256
## How long to keep removed files in the `removed_files` field of manifest
## after they are removed from manifest.
## files will only be removed from `removed_files` field
## if both `keep_removed_file_count` and `keep_removed_file_ttl` is reached.
experimental_manifest_keep_removed_file_ttl="1h"
## Whether to compress manifest and checkpoint file by gzip (default false).
compress_manifest=false
@@ -459,31 +468,28 @@ auto_flush_interval = "1h"
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"
## Whether to enable the experimental write cache.
enable_experimental_write_cache=false
## Whether to enable the write cache, it's enabled by default when using object storage. It is recommended to enable it when using object storage for better performance.
enable_write_cache=false
## File system path for write cache, defaults to `{data_home}/write_cache`.
experimental_write_cache_path=""
## File system path for write cache, defaults to `{data_home}`.
write_cache_path=""
## Capacity for write cache.
experimental_write_cache_size="512MB"
## Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger.
write_cache_size="5GiB"
## TTL for write cache.
## @toml2docs:none-default
experimental_write_cache_ttl="8h"
write_cache_ttl="8h"
## Buffer size for SST writing.
sst_write_buffer_size="8MB"
## Parallelism to scan a region (default: 1/4 of cpu cores).
## - `0`: using the default value (1/4 of cpu cores).
## - `1`: scan in current thread.
## - `n`: scan in parallelism n.
scan_parallelism=0
## Capacity of the channel to send data from parallel scan tasks to the main task.
parallel_scan_channel_size=32
## Maximum number of SST files to scan concurrently.
max_concurrent_scan_files=384
## Whether to allow stale WAL entries read during replay.
allow_stale_entries=false
@@ -506,6 +512,23 @@ aux_path = ""
## The max capacity of the staging directory.
staging_size="2GB"
## The TTL of the staging directory.
## Defaults to 7 days.
## Setting it to "0s" to disable TTL.
staging_ttl="7d"
## Cache size for inverted index metadata.
metadata_cache_size="64MiB"
## Cache size for inverted index content.
content_cache_size="128MiB"
## Page size for inverted index content cache.
content_cache_page_size="64KiB"
## Cache size for index result.
result_cache_size="128MiB"
## The options for inverted index in Mito engine.
[region_engine.mito.inverted_index]
@@ -557,6 +580,30 @@ apply_on_query = "auto"
## - `[size]` e.g. `64MB`: fixed memory threshold
mem_threshold_on_create="auto"
## The options for bloom filter index in Mito engine.
[region_engine.mito.bloom_filter_index]
## Whether to create the index on flush.
## - `auto`: automatically (default)
## - `disable`: never
create_on_flush="auto"
## Whether to create the index on compaction.
## - `auto`: automatically (default)
## - `disable`: never
create_on_compaction="auto"
## Whether to apply the index on query
## - `auto`: automatically (default)
## - `disable`: never
apply_on_query="auto"
## Memory threshold for the index creation.
## - `auto`: automatically determine the threshold based on the system memory size (default)
## Whether to enable the experimental sparse primary key encoding.
experimental_sparse_primary_key_encoding=false
## The logging options.
[logging]
## The directory to store the log files. If set to empty, logs will not be written to files.
dir="/tmp/greptimedb/logs"
dir="./greptimedb_data/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## @toml2docs:none-default
@@ -592,7 +645,7 @@ level = "info"
enable_otlp_tracing=false
## The OTLP tracing endpoint.
otlp_endpoint="http://localhost:4317"
otlp_endpoint="http://localhost:4318/v1/traces"
## Whether to append logs to stdout.
append_stdout=true
@@ -603,43 +656,32 @@ log_format = "text"
## The maximum amount of log files.
max_log_files=720
## The OTLP tracing export protocol. Can be `grpc`/`http`.
otlp_export_protocol="http"
## Additional OTLP headers, only valid when using OTLP http
[logging.otlp_headers]
## @toml2docs:none-default
#Authorization = "Bearer my-token"
## @toml2docs:none-default
#Database = "My database"
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio=1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable=false
## The threshold of slow query.
## @toml2docs:none-default
threshold="10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio=1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## The datanode can export its metrics and send to Prometheus compatible service (e.g. `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
## whether enable export metrics.
enable=false
## The interval of export metrics.
write_interval="30s"
## For `standalone` mode, `self_import` is recommend to collect metrics generated by itself
## You must create the database before enabling it.
[export_metrics.self_import]
## @toml2docs:none-default
db="greptime_metrics"
[export_metrics.remote_write]
## The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
## The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url=""
## HTTP headers of Prometheus remote-write carry.
@@ -650,3 +692,11 @@ headers = { }
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"
## The memory options.
[memory]
## Whether to enable heap profiling activation during startup.
## When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable
## is set to "prof:true,prof_active:false". The official image adds this env variable.
## The address advertised to the metasrv, and used for connections from outside the host.
## If left empty or unset, the server will automatically use the IP address of the first network interface
## on the host, with the same port number as the one specified in `grpc.bind_addr`.
server_addr="127.0.0.1:4001"
## The number of server worker threads.
runtime_size=8
## Compression mode for frontend side Arrow IPC service. Available options:
## - `none`: disable all compression
## - `transport`: only enable gRPC transport compression (zstd)
## - `arrow_ipc`: only enable Arrow IPC compression (lz4)
## - `all`: enable all compression.
## Default to `none`
flight_compression="arrow_ipc"
## gRPC server TLS options, see `mysql.tls` section.
[grpc.tls]
@@ -55,6 +79,42 @@ key_path = ""
## For now, gRPC tls config does not support auto reload.
watch=false
## The internal gRPC server options. Internal gRPC port for nodes inside cluster to access frontend.
[internal_grpc]
## The address to bind the gRPC server.
bind_addr="127.0.0.1:4010"
## The address advertised to the metasrv, and used for connections from outside the host.
## If left empty or unset, the server will automatically use the IP address of the first network interface
## on the host, with the same port number as the one specified in `grpc.bind_addr`.
server_addr="127.0.0.1:4010"
## The number of server worker threads.
runtime_size=8
## Compression mode for frontend side Arrow IPC service. Available options:
## - `none`: disable all compression
## - `transport`: only enable gRPC transport compression (zstd)
## - `arrow_ipc`: only enable Arrow IPC compression (lz4)
## - `all`: enable all compression.
## Default to `none`
flight_compression="arrow_ipc"
## internal gRPC server TLS options, see `mysql.tls` section.
[internal_grpc.tls]
## TLS mode.
mode="disable"
## Certificate file path.
## @toml2docs:none-default
cert_path=""
## Private key file path.
## @toml2docs:none-default
key_path=""
## Watch for Certificate and key file change and auto reload.
## For now, gRPC tls config does not support auto reload.
watch=false
## MySQL server options.
[mysql]
## Whether to enable.
@@ -63,6 +123,11 @@ enable = true
addr="127.0.0.1:4002"
## The number of server worker threads.
runtime_size=2
## Server-side keep-alive time.
## Set to 0 (default) to disable.
keep_alive="0s"
## Maximum entries in the MySQL prepared statement cache; default is 10,000.
prepared_stmt_cache_size=10000
# MySQL server TLS options.
[mysql.tls]
@@ -94,6 +159,9 @@ enable = true
addr="127.0.0.1:4003"
## The number of server worker threads.
runtime_size=2
## Server-side keep-alive time.
## Set to 0 (default) to disable.
keep_alive="0s"
## PostgresSQL server TLS options, see `mysql.tls` section.
[postgres.tls]
@@ -121,6 +189,11 @@ enable = true
## Whether to enable InfluxDB protocol in HTTP API.
enable=true
## Jaeger protocol options.
[jaeger]
## Whether to enable Jaeger protocol in HTTP API.
enable=true
## Prometheus remote storage options
[prom_store]
## Whether to enable Prometheus remote write and read in HTTP API.
@@ -157,6 +230,15 @@ metadata_cache_ttl = "10m"
# TTI of the metadata cache.
metadata_cache_tti="5m"
## The query engine options.
[query]
## Parallelism of the query engine.
## Default to 0, which means the number of CPU cores.
parallelism=0
## Whether to allow query fallback when push down optimize fails.
## Default to false, meaning when push down optimize failed, return error msg
allow_query_fallback=false
## Datanode options.
[datanode]
## Datanode client options.
@@ -167,7 +249,7 @@ tcp_nodelay = true
## The logging options.
[logging]
## The directory to store the log files. If set to empty, logs will not be written to files.
dir="/tmp/greptimedb/logs"
dir="./greptimedb_data/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## @toml2docs:none-default
@@ -177,7 +259,7 @@ level = "info"
enable_otlp_tracing=false
## The OTLP tracing endpoint.
otlp_endpoint="http://localhost:4317"
otlp_endpoint="http://localhost:4318/v1/traces"
## Whether to append logs to stdout.
append_stdout=true
@@ -188,6 +270,16 @@ log_format = "text"
## The maximum amount of log files.
max_log_files=720
## The OTLP tracing export protocol. Can be `grpc`/`http`.
otlp_export_protocol="http"
## Additional OTLP headers, only valid when using OTLP http
[logging.otlp_headers]
## @toml2docs:none-default
#Authorization = "Bearer my-token"
## @toml2docs:none-default
#Database = "My database"
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
@@ -195,36 +287,34 @@ max_log_files = 720
default_ratio=1.0
## The slow query log options.
[logging.slow_query]
[slow_query]
## Whether to enable slow query log.
enable=false
enable=true
## The threshold of slow query.
## @toml2docs:none-default
threshold="10s"
## The record type of slow queries. It can be `system_table` or `log`.
## If `system_table` is selected, the slow queries will be recorded in a system table `greptime_private.slow_queries`.
## If `log` is selected, the slow queries will be logged in a log file `greptimedb-slow-queries.*`.
record_type="system_table"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
## The threshold of slow query. It can be human readable time string, for example: `10s`, `100ms`, `1s`.
threshold="30s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1]. For example, `0.1` means 10% of the slow queries will be logged and `1.0` means all slow queries will be logged.
sample_ratio=1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## The TTL of the `slow_queries` system table. Default is `90d` when `record_type` is `system_table`.
ttl="90d"
## The frontend can export its metrics and send to Prometheus compatible service (e.g. `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
## whether enable export metrics.
enable=false
## The interval of export metrics.
write_interval="30s"
## For `standalone` mode, `self_import` is recommend to collect metrics generated by itself
## You must create the database before enabling it.
[export_metrics.self_import]
## @toml2docs:none-default
db="greptime_metrics"
[export_metrics.remote_write]
## The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
## The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url=""
## HTTP headers of Prometheus remote-write carry.
@@ -235,3 +325,16 @@ headers = { }
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"
## The memory options.
[memory]
## Whether to enable heap profiling activation during startup.
## When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable
## is set to "prof:true,prof_active:false". The official image adds this env variable.
## Default is true.
enable_heap_profiling=true
## Configuration options for the event recorder.
[event_recorder]
## TTL for the events table that will be used to store the events. Default is `90d`.
## **It's only used when the provider is `kafka`**.
topic_name_prefix="greptimedb_wal_topic"
## Expected number of replicas of each partition.
## **It's only used when the provider is `kafka`**.
replication_factor=1
## Above which a topic creation operation will be cancelled.
## The timeout for creating a Kafka topic.
## **It's only used when the provider is `kafka`**.
create_topic_timeout="30s"
## The initial backoff for kafka clients.
backoff_init="500ms"
## The maximum backoff for kafka clients.
backoff_max="10s"
## Exponential backoff rate, i.e. next backoff = base * current backoff.
backoff_base=2
## Stop reconnecting if the total wait time reaches the deadline. If this config is missing, the reconnecting won't terminate.
backoff_deadline="5mins"
# The Kafka SASL configuration.
# **It's only used when the provider is `kafka`**.
@@ -151,10 +267,27 @@ backoff_deadline = "5mins"
# client_cert_path = "/path/to/client_cert"
# client_key_path = "/path/to/key"
## Configuration options for the event recorder.
[event_recorder]
## TTL for the events table that will be used to store the events. Default is `90d`.
ttl="90d"
## Configuration options for the stats persistence.
[stats_persistence]
## TTL for the stats table that will be used to store the stats.
## Set to `0s` to disable stats persistence.
## Default is `0s`.
## If you want to enable stats persistence, set the TTL to a value greater than 0.
## It is recommended to set a small value, e.g., `3h`.
ttl="0s"
## The interval to persist the stats. Default is `10m`.
## The minimum value is `10m`, if the value is less than `10m`, it will be overridden to `10m`.
interval="10m"
## The logging options.
[logging]
## The directory to store the log files. If set to empty, logs will not be written to files.
dir="/tmp/greptimedb/logs"
dir="./greptimedb_data/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## @toml2docs:none-default
@@ -164,7 +297,7 @@ level = "info"
enable_otlp_tracing=false
## The OTLP tracing endpoint.
otlp_endpoint="http://localhost:4317"
otlp_endpoint="http://localhost:4318/v1/traces"
## Whether to append logs to stdout.
append_stdout=true
@@ -175,43 +308,33 @@ log_format = "text"
## The maximum amount of log files.
max_log_files=720
## The OTLP tracing export protocol. Can be `grpc`/`http`.
otlp_export_protocol="http"
## Additional OTLP headers, only valid when using OTLP http
[logging.otlp_headers]
## @toml2docs:none-default
#Authorization = "Bearer my-token"
## @toml2docs:none-default
#Database = "My database"
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
[logging.tracing_sample_ratio]
default_ratio=1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable=false
## The threshold of slow query.
## @toml2docs:none-default
threshold="10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio=1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## The metasrv can export its metrics and send to Prometheus compatible service (e.g. `greptimedb` itself) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
## whether enable export metrics.
enable=false
## The interval of export metrics.
write_interval="30s"
## For `standalone` mode, `self_import` is recommend to collect metrics generated by itself
## You must create the database before enabling it.
[export_metrics.self_import]
## @toml2docs:none-default
db="greptime_metrics"
[export_metrics.remote_write]
## The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
## The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url=""
## HTTP headers of Prometheus remote-write carry.
@@ -222,3 +345,11 @@ headers = { }
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"
## The memory options.
[memory]
## Whether to enable heap profiling activation during startup.
## When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable
## is set to "prof:true,prof_active:false". The official image adds this env variable.
## Cache configuration for object storage such as 'S3' etc.
## The local file cache directory.
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
cache_path ="/path/local_cache"
#+ cache_path = ""
## The local file cache capacity in bytes.
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity="256MB"
cache_capacity="5GiB"
## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
[storage.http_client]
## The maximum idle connection per host allowed in the pool.
pool_max_idle_per_host=1024
## The timeout for only the connect phase of a http client.
connect_timeout="30s"
## The total request timeout, applied from when the request starts connecting until the response body has finished.
## Also considered a total deadline.
timeout="30s"
## The timeout for idle sockets being kept-alive.
pool_idle_timeout="90s"
## To skip the ssl verification
## **Security Notice**: Setting `skip_ssl_validation = true` disables certificate verification, making connections vulnerable to man-in-the-middle attacks. Only use this in development or trusted private networks.
skip_ssl_validation=false
# Custom storage options
# [[storage.providers]]
# name = "S3"
@@ -497,31 +547,28 @@ auto_flush_interval = "1h"
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"
## Whether to enable the experimental write cache.
enable_experimental_write_cache=false
## Whether to enable the write cache, it's enabled by default when using object storage. It is recommended to enable it when using object storage for better performance.
enable_write_cache=false
## File system path for write cache, defaults to `{data_home}/write_cache`.
experimental_write_cache_path=""
## File system path for write cache, defaults to `{data_home}`.
write_cache_path=""
## Capacity for write cache.
experimental_write_cache_size="512MB"
## Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger.
write_cache_size="5GiB"
## TTL for write cache.
## @toml2docs:none-default
experimental_write_cache_ttl="8h"
write_cache_ttl="8h"
## Buffer size for SST writing.
sst_write_buffer_size="8MB"
## Parallelism to scan a region (default: 1/4 of cpu cores).
## - `0`: using the default value (1/4 of cpu cores).
## - `1`: scan in current thread.
## - `n`: scan in parallelism n.
scan_parallelism=0
## Capacity of the channel to send data from parallel scan tasks to the main task.
parallel_scan_channel_size=32
## Maximum number of SST files to scan concurrently.
max_concurrent_scan_files=384
## Whether to allow stale WAL entries read during replay.
## Whether to enable the experimental sparse primary key encoding.
experimental_sparse_primary_key_encoding=false
## The logging options.
[logging]
## The directory to store the log files. If set to empty, logs will not be written to files.
dir="/tmp/greptimedb/logs"
dir="./greptimedb_data/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## @toml2docs:none-default
@@ -636,7 +724,7 @@ level = "info"
enable_otlp_tracing=false
## The OTLP tracing endpoint.
otlp_endpoint="http://localhost:4317"
otlp_endpoint="http://localhost:4318/v1/traces"
## Whether to append logs to stdout.
append_stdout=true
@@ -647,6 +735,16 @@ log_format = "text"
## The maximum amount of log files.
max_log_files=720
## The OTLP tracing export protocol. Can be `grpc`/`http`.
otlp_export_protocol="http"
## Additional OTLP headers, only valid when using OTLP http
[logging.otlp_headers]
## @toml2docs:none-default
#Authorization = "Bearer my-token"
## @toml2docs:none-default
#Database = "My database"
## The percentage of tracing will be sampled and exported.
## Valid range `[0, 1]`, 1 means all traces are sampled, 0 means all traces are not sampled, the default value is 1.
## ratio > 1 are treated as 1. Fractions < 0 are treated as 0
@@ -654,25 +752,27 @@ max_log_files = 720
default_ratio=1.0
## The slow query log options.
[logging.slow_query]
[slow_query]
## Whether to enable slow query log.
enable = false
#+ enable = false
## The record type of slow queries. It can be `system_table` or `log`.
## @toml2docs:none-default
#+ record_type = "system_table"
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
#+ threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
#+ sample_ratio = 1.0
## The datanode can export its metrics and send to Prometheus compatible service (e.g. send to `greptimedb` itself) from remote-write API.
## The standalone can export its metrics and send to Prometheus compatible service (e.g. `greptimedb`) from remote-write API.
## This is only used for `greptimedb` to export its own metrics internally. It's different from prometheus scrape.
[export_metrics]
## whether enable export metrics.
enable=false
## The interval of export metrics.
write_interval="30s"
@@ -683,7 +783,7 @@ write_interval = "30s"
db="greptime_metrics"
[export_metrics.remote_write]
## The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
## The prometheus remote write endpoint that the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url=""
## HTTP headers of Prometheus remote-write carry.
@@ -694,3 +794,11 @@ headers = { }
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"
## The memory options.
[memory]
## Whether to enable heap profiling activation during startup.
## When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable
## is set to "prof:true,prof_active:false". The official image adds this env variable.
@@ -48,4 +48,4 @@ Please refer to [SQL query](./query.sql) for GreptimeDB and Clickhouse, and [que
## Addition
- You can tune GreptimeDB's configuration to get better performance.
- You can setup GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/deployments/configuration#storage-options).
- You can setup GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/deployments-administration/configuration#storage-options).
Log Level changed from Some("info") to "trace;flow=debug"%
Log Level changed from Some("info") to "trace,flow=debug"%
```
The data is a string in the format of `global_level;module1=level1;module2=level2;...` that follow the same rule of `RUST_LOG`.
The data is a string in the format of `global_level,module1=level1,module2=level2,...` that follows the same rule of `RUST_LOG`.
The module is the module name of the log, and the level is the log level. The log level can be one of the following: `trace`, `debug`, `info`, `warn`, `error`, `off`(case insensitive).
This crate provides an easy approach to dump memory profiling info.
This crate provides an easy approach to dump memory profiling info. A set of ready to use scripts is provided in [docs/how-to/memory-profile-scripts](./memory-profile-scripts/scripts).
## Prerequisites
### jemalloc
jeprof is already compiled in the target directory of GreptimeDB. You can find the binary and use it.
```
# find jeprof binary
find . -name 'jeprof'
# add executable permission
chmod +x <path_to_jeprof>
```
The path is usually under `./target/${PROFILE}/build/tikv-jemalloc-sys-${HASH}/out/build/bin/jeprof`.
The default version of jemalloc installed from the package manager may not have the `--collapsed` option.
You may need to check the whether the `jeprof` version is >= `5.3.0` if you want to install it from the package manager.
You can control heap profiling activation through configuration. Add the following to your configuration file:
```toml
[memory]
# Whether to enable heap profiling activation during startup.
# When enabled, heap profiling will be activated if the `MALLOC_CONF` environment variable
# is set to "prof:true,prof_active:false". The official image adds this env variable.
# Default is true.
enable_heap_profiling=true
```
By default, if you set `MALLOC_CONF=prof:true,prof_active:false`, the database will enable profiling during startup. You can disable this behavior by setting `enable_heap_profiling = false` in the configuration.
### Starting with environment variables
Start GreptimeDB instance with environment variables:
Currently, our query engine is based on DataFusion, so all aggregate function is executed by DataFusion, through its UDAF interface. You can find DataFusion's UDAF example [here](https://github.com/apache/arrow-datafusion/blob/arrow2/datafusion-examples/examples/simple_udaf.rs). Basically, we provide the same way as DataFusion to write aggregate functions: both are centered in a struct called "Accumulator" to accumulates states along the way in aggregation.
However, DataFusion's UDAF implementation has a huge restriction, that it requires user to provide a concrete "Accumulator". Take `Median` aggregate function for example, to aggregate a `u32` datatype column, you have to write a `MedianU32`, and use `SELECT MEDIANU32(x)` in SQL. `MedianU32` cannot be used to aggregate a `i32` datatype column. Or, there's another way: you can use a special type that can hold all kinds of data (like our `Value` enum or Arrow's `ScalarValue`), and `match` all the way up to do aggregate calculations. It might work, though rather tedious. (But I think it's DataFusion's prefer way to write UDAF.)
So is there a way we can make an aggregate function that automatically match the input data's type? For example, a `Median` aggregator that can work on both `u32` column and `i32`? The answer is yes until we found a way to bypassing DataFusion's restriction, a restriction that DataFusion simply don't pass the input data's type when creating an Accumulator.
> There's an example in `my_sum_udaf_example.rs`, take that as quick start.
# 1. Impl `AggregateFunctionCreator` trait for your accumulator creator.
You must first define a struct that will be used to create your accumulator. For example,
```Rust
#[as_aggr_func_creator]
#[derive(Debug, AggrFuncTypeStore)]
structMySumAccumulatorCreator{}
```
Attribute macro `#[as_aggr_func_creator]` and derive macro `#[derive(Debug, AggrFuncTypeStore)]` must both annotated on the struct. They work together to provide a storage of aggregate function's input data types, which are needed for creating generic accumulator later.
> Note that the `as_aggr_func_creator` macro will add fields to the struct, so the struct cannot be defined as an empty struct without field like `struct Foo;`, neither as a new type like `struct Foo(bar)`.
Then impl `AggregateFunctionCreator` trait on it. The definition of the trait is:
You can use input data's type in methods that return output type and state types (just invoke `input_types()`).
The output type is aggregate function's output data's type. For example, `SUM` aggregate function's output type is `u64` for a `u32` datatype column. The state types are accumulator's internal states' types. Take `AVG` aggregate function on a `i32` column as example, it's state types are `i64` (for sum) and `u64` (for count).
The `creator` function is where you define how an accumulator (that will be used in DataFusion) is created. You define "how" to create the accumulator (instead of "what" to create), using the input data's type as arguments. With input datatype known, you can create accumulator generically.
# 2. Impl `Accumulator` trait for you accumulator.
The accumulator is where you store the aggregate calculation states and evaluate a result. You must impl `Accumulator` trait for it. The trait's definition is:
The DataFusion basically execute aggregate like this:
1. Partitioning all input data for aggregate. Create an accumulator for each part.
2. Call `update_batch` on each accumulator with partitioned data, to let you update your aggregate calculation.
3. Call `state` to get each accumulator's internal state, the medial calculation result.
4. Call `merge_batch` to merge all accumulator's internal state to one.
5. Execute `evaluate` on the chosen one to get the final calculation result.
Once you know the meaning of each method, you can easily write your accumulator. You can refer to `Median` accumulator or `SUM` accumulator defined in file `my_sum_udaf_example.rs` for more details.
# 3. Register your aggregate function to our query engine.
You can call `register_aggregate_function` method in query engine to register your aggregate function. To do that, you have to new an instance of struct `AggregateFunctionMeta`. The struct has three fields, first is the name of your aggregate function's name. The function name is case-sensitive due to DataFusion's restriction. We strongly recommend using lowercase for your name. If you have to use uppercase name, wrap your aggregate function with quotation marks. For example, if you define an aggregate function named "my_aggr", you can use "`SELECT MY_AGGR(x)`"; if you define "my_AGGR", you have to use "`SELECT "my_AGGR"(x)`".
The second field is arg_counts ,the count of the arguments. Like accumulator `percentile`, calculating the p_number of the column. We need to input the value of column and the value of p to cacalate, and so the count of the arguments is two.
The third field is a function about how to create your accumulator creator that you defined in step 1 above. Create creator, that's a bit intertwined, but it is how we make DataFusion use a newly created aggregate function each time it executes a SQL, preventing the stored input types from affecting each other. The key detail can be starting looking at our `DfContextProviderAdapter` struct's `get_aggregate_meta` method.
# (Optional) 4. Make your aggregate function automatically registered.
If you've written a great aggregate function that want to let everyone use it, you can make it automatically registered to our query engine at start time. It's quick simple, just refer to the `AggregateFunctions::register` function in `common/function/src/scalars/aggregate/mod.rs`.
This document introduces how to write fuzz tests in GreptimeDB.
## What is a fuzz test
Fuzz test is tool that leverage deterministic random generation to assist in finding bugs. The goal of fuzz tests is to identify inputs generated by the fuzzer that cause system panics, crashes, or unexpected behaviors to occur. And we are using the [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz) to run our fuzz test targets.
Fuzz test is tool that leverages deterministic random generation to assist in finding bugs. The goal of fuzz tests is to identify inputs generated by the fuzzer that cause system panics, crashes, or unexpected behaviors to occur. And we are using the [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz) to run our fuzz test targets.
## Why we need them
- Find bugs by leveraging random generation
@@ -13,7 +13,7 @@ Fuzz test is tool that leverage deterministic random generation to assist in fin
All fuzz test-related resources are located in the `/tests-fuzz` directory.
There are two types of resources: (1) fundamental components and (2) test targets.
### Fundamental components
### Fundamental components
They are located in the `/tests-fuzz/src` directory. The fundamental components define how to generate SQLs (including dialects for different protocols) and validate execution results (e.g., column attribute validation), etc.
### Test targets
@@ -21,25 +21,25 @@ They are located in the `/tests-fuzz/targets` directory, with each file represen
Figure 1 illustrates the fundamental components of the fuzz test provide the ability to generate random SQLs. It utilizes a Random Number Generator (Rng) to generate the Intermediate Representation (IR), then employs a DialectTranslator to produce specified dialects for different protocols. Finally, the fuzz tests send the generated SQL via the specified protocol and verify that the execution results meet expectations.
This section will guide you through the process of analyzing memory usage for greptimedb.
1. Get the `jeprof` tool script, see the next section("Getting the `jeprof` tool") for details.
2. After starting `greptimedb`(with env var `MALLOC_CONF=prof:true`), execute the `dump.sh` script with the PID of the `greptimedb` process as an argument. This continuously monitors memory usage and captures profiles when exceeding thresholds (e.g. +20MB within 10 minutes). Outputs `greptime-{timestamp}.gprof` files.
3. With 2-3 gprof files, run `gen_flamegraph.sh` in the same environment to generate flame graphs showing memory allocation call stacks.
4.**NOTE:** The `gen_flamegraph.sh` script requires `jeprof` and optionally `flamegraph.pl` to be in the current directory. If needed to gen flamegraph now, run the `get_flamegraph_tool.sh` script, which downloads the flame graph generation tool `flamegraph.pl` to the current directory.
where `<binary_path>` is the path to the greptimedb binary, `<gprof_directory>` is the directory containing the gprof files(the directory `dump.sh` is dumping profiles to).
Example call: `./gen_flamegraph.sh ./greptime .`
Generating the flame graph might take a few minutes. The generated flame graphs are located in the `<gprof_directory>/flamegraphs` directory. Or if no `flamegraph.pl` is found, it will only contain `.collapse` files which is also fine.
5. You can send the generated flame graphs(the entire folder of `<gprof_directory>/flamegraphs`) to developers for further analysis.
## Getting the `jeprof` tool
there are three ways to get `jeprof`, list in here from simple to complex, using any one of those methods is ok, as long as it's the same environment as the `greptimedb` will be running on:
1. If you are compiling greptimedb from source, then `jeprof` is already produced during compilation. After running `cargo build`, execute `find_compiled_jeprof.sh`. This will copy `jeprof` to the current directory.
2. Or, if you have the Rust toolchain installed locally, simply follow these commands:
after that the `jeprof` tool is produced. Now run `find_compiled_jeprof.sh` in current directory, it will copy the `jeprof` tool to the current directory.
3. compile jemalloc from source
you can first clone this repo, and checkout to this commit:
The most suitable compaction strategy for time-series scenario would be
a hybrid strategy that combines time window compaction with size-tired compaction, just like [Cassandra](https://cassandra.apache.org/doc/latest/cassandra/operating/compaction/twcs.html) and [ScyllaDB](https://docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#time-window-compaction-strategy-twcs) does.
a hybrid strategy that combines time window compaction with size-tired compaction, just like [Cassandra](https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html) and [ScyllaDB](https://docs.scylladb.com/stable/architecture/compaction/compaction-strategies.html#time-window-compaction-strategy-twcs) does.
We can first group SSTs in level n into buckets according to some predefined time window. Within that window,
SSTs are compacted in a size-tired manner (find SSTs with similar size and compact them to level n+1).
@@ -28,7 +28,7 @@ In order to do those things while maintaining a low memory footprint, you need t
- Greptime Flow's is built on top of [Hydroflow](https://github.com/hydro-project/hydroflow).
- We have three choices for the Dataflow/Streaming process framework for our simple continuous aggregation feature:
1. Based on the timely/differential dataflow crate that [materialize](https://github.com/MaterializeInc/materialize) based on. Later, it's proved too obscure for a simple usage, and is hard to customize memory usage control.
2. Based on a simple dataflow framework that we write from ground up, like what [arroyo](https://www.arroyo.dev/) or [risingwave](https://www.risingwave.dev/) did, for example the core streaming logic of [arroyo](https://github.com/ArroyoSystems/arroyo/blob/master/arroyo-datastream/src/lib.rs) only takes up to 2000 line of codes. However, it means maintaining another layer of dataflow framework, which might seem easy in the beginning, but I fear it might be too burdensome to maintain once we need more features.
2. Based on a simple dataflow framework that we write from ground up, like what [arroyo](https://www.arroyo.dev/) or [risingwave](https://www.risingwave.dev/) did, for example the core streaming logic of [arroyo](https://github.com/ArroyoSystems/arroyo/blob/master/crates/arroyo-datastream/src/lib.rs) only takes up to 2000 line of codes. However, it means maintaining another layer of dataflow framework, which might seem easy in the beginning, but I fear it might be too burdensome to maintain once we need more features.
3. Based on a simple and lower level dataflow framework that someone else write, like [hydroflow](https://github.com/hydro-project/hydroflow), this approach combines the best of both worlds. Firstly, it boasts ease of comprehension and customization. Secondly, the dataflow framework offers precisely the necessary features for crafting uncomplicated single-node dataflow programs while delivering decent performance.
Hence, we choose the third option, and use a simple logical plan that's anagonistic to the underlying dataflow framework, as it only describe how the dataflow graph should be doing, not how it do that. And we built operator in hydroflow to execute the plan. And the result hydroflow graph is wrapped in a engine that only support data in/out and tick event to flush and compute the result. This provide a thin middle layer that's easy to maintain and allow switching to other dataflow framework if necessary.
2.`kafka_topic_last_entry_id` which is the last entry id of the topic in use. Can be lazily updated and needed when region has empty memtable.
3. Kafka topics that each region uses.
The states are maintained through:
1. Heartbeat: Datanode sends `last_entry_id` to metasrv in heartbeat. As for regions with empty memtable, `last_entry_id` should equals to `kafka_topic_last_entry_id`.
2. Metasrv maintains a topic-region map to know which region uses which topic.
`kafka_topic_last_entry_id` will be maintained by the region itself. Region will update the value after `k` heartbeats if the memtable is empty.
### Purge procedure
We can better handle locks utilizing current procedure. It's quite similar to the region migration procedure.
After a period of time, metasrv will submit a purge procedure to ProcedureManager. The purge will apply to all topics.
The procedure is divided into following stages:
1. Preparation:
- Retrieve `last_entry_id` of each region kvbackend.
- Choose regions that have a relatively small `last_entry_id` as candidate regions, which means we need to send a flush request to these regions.
2. Communication:
- Send flush requests to candidate regions.
3. Purge:
- Choose proper entry id to delete for each topic. The entry should be the smallest `last_entry_id - 1` among all regions.
- Delete legacy entries in Kafka.
- Store the `last_purged_entry_id` in kvbackend. It should be locked to prevent other regions from replaying the purged entries.
### After purge
After purge, there may be some regions that have `last_entry_id` smaller than the entry we just deleted. It's legal since we only delete the entries that are not needed anymore.
When restarting a region, it should query the `last_purged_entry_id` from metasrv and replay from `min(last_entry_id, last_purged_entry_id)`.
### Error handling
No persisted states are needed since all states are maintained in kvbackend.
Retry when failed to retrieving metadata from kvbackend.
# Alternatives
Purge time can depend on the size of the WAL entries instead of a fixed period of time, which may be more efficient.
This phase is for static analysis of the new partition rule. The server can know whether the repartitioning is possible, how to do the repartitioning, and how much resources are needed.
In theory, the input and output partition rules for repartitioning can be completely unrelated. But in practice, to avoid a very large change set, we'll only allow two simple kinds of change. One splits one region into two regions (region split) and another merges two regions into one (region merge).
After validating the new partition rule using the same validation logic as table creation, we compute the difference between the old and new partition rules. The resulting diff may contain several independent groups of changes. During subsequent processing, each group of changes can be handled independently and can succeed or fail without affecting other groups or creating non-idempotently retryable scenarios.
Next, we generate a repartition plan for each group of changes. Each plan contains this information for all regions involved in that particular plan. And one target region will only be referenced by a single plan.
With those plans, we can determine the resource requirements for the repartition operation, where resources here primarily refer to Regions. Metasrv will coordinate with PaaS layer to pre-allocate the necessary regions at this stage. These new regions start completely empty, and their metadata and manifests will be populated during subsequent modification steps.
## Data Processing
This phase is primarily for region's change, including region's metadata (route table and the corresponding rule) and manifest.
Once we start processing one plan through a procedure, we'll first stop the region's compaction and snapshot. This is to avoid any states being removed due to compaction (which may removes old SST files) and snapshot (which may removes old manifest files).
Metasrv will trying to update the metadata of partition, or the region route table (related to `PartitionRuleManager`). This step is in the "no ingestion" scope, so no new data will be ingested. Since this won't take much time, the affection to the cluster is minimized. Metasrv will also update the region rule to corresponding regions on Datanodes.
Every regions and all the ingestion requests to the region server will have a version of region rule, to identify under which rule the request is processed. The version can be something like `hash(region_rule)`. Once the region rule on region server is updated, all ingestion request with old rule will be rejected, and all requests with new rule will be accepted but not visible. They can still be flushed to persisted storage, but their version change (new manifest) will be staged.
Then region 0 (or let metasrv to pick any operational region) will compute the new manifests for all target regions. This step is done by first reading all old manifests, and remapping the files with new partition rule, to get the content of new manifests. Notice this step only handles the manifests before region rule change on region server, and won't touch those staged manifests, as they are already with the new rule.
Those new manifest will be submitted to the corresponding target regions by region 0 via a `RegionEdit` request. If this request falls after a few retries, region 0 will try to rollback this change by directly overwriting the manifest on object storage. and report this failure to metasrv and let the entire repartition procedure to fail. And we can also optionally compute the new manifest for those staged version changes (like another repartition) and submit them to the target regions to make the also visible even if the repartition fails.
In the other hand, a successful `RegionEdit` request also acknowledges those staged version changes and make them visible.
After this step, the repartition is done in the data plane. We can start to process compaction and snapshot again.
## Postprocessing
After the main processing is done, we can do some extra postprocessing to reduce the performance impact of repartition. Including reloading caches in frontend's route table, metasrv's kv cache and datanode's read/write/page cache etc.
We can also schedule an optional compaction to reorganize all the data file under the new partition rule to reduce potential fragmentation or read amplification.
## Procedure
Here describe the repartition procedure step by step:
-<onfrontend> Validating repartition request
-<onfrontend> Initialize the repartition procedure
- Calculate rule diff and repartition plan group
- Allocate necessary new regions
- Lock the table key
- For each repartition subprocedure
- Stop compaction and snapshot
- Forbid new ingestion requests, update metadata, allow ingestion requests.
- Update region rule to regions
- Pick one region to calculate new manifest for all regions in this repartition group
- Let that region to apply new manifest to each region via `RegionEdit`
- If failed after some retries, revert this manifest change to other succeeded regions and mark this failure.
- If all succeeded, acknowledge those staged version changes and make them visible.
- Return result
- Collect results from subprocedure.
- For those who failed, we need to restart those regions to force reconstruct their status from manifests
- For those who succeeded, collect and merge their rule diff
- Unlock the table key
- Report the result to user.
-<inbackground> Reload cache
-<inbackground> Maybe trigger a special compaction
In addition of sequential step, rollback is also an important part of this procedure. There are three steps can be rolled back when unrecoverable failure occurs.
If the metadata update is not committed, we can overwrite the metadata to previous version. This step is scoped in the "no ingestion" period, so no new data will be ingested and the status of both datanode and metasrv will be consistent.
If the `RegionEdit` to other regions is not acknowledged, or partial acknowledged, we can directly overwrite the manifest on object storage from the central region (who computes the new manifest), and force region server to reload corresponding region to load its state from object storage to recover.
If the staged version changes are not acknowledged, we can re-compute manifest based on old rule for staged data, and apply them directly like above. This is like another smaller repartition for those staged data.
## Region rule validation and diff calculation
In the current codebase, the rule checker is not complete. It can't check uniqueness and completeness of the rule. This RFC also propose a new way to validate the rule.
The proposed validation way is based on a check-point system, which first generates a group of check-points from the rule, and then check if all the point is covered and only covered by one rule.
All the partition rule expressionis limited to be the form of `<column> <operator> <value>`, and the operator is limited to be comparison operators. Those expressions are allowed to be nested with `AND` and `OR` operators. Based on this, we can first extract all the unique values on each column, adding and subtracting a little epsilon to cover its left and right boundary.
Since we accept integer, float and string as the value type, compute on them directly is not convenient. So we'll first normalize them to a common type and only need to preserve the relative partial ordering. This also avoids the problem of "what is next/previous value" of string and "what's a good precision" for float.
After normalization, we get a set of scatter points for each column. Then we can generate a set of check-points by combining all the scatter points like building a cartesian product. This might bring a large number of check-points, so we can do an prune optimization to remove some of them by merging some of the expression zones. Those expressions who have identical N-1 edge sub-expressions with one adjacent edge can be merged together. This prune check is with a time complexity of O(N * M * log(M)), where N is the number of active dimensions and M is the number of expression zones. Diff calculation is also done by finding different expression zones between the old and new rule set, and check if we can transform one to another by merging some of the expression zones.
The step to validate the check-points set against expressions can be treated as a tiny expression of `PhysicalExpr`. This evaluation will give a boolean matrix of K*M shape, where K is the number of check-points. We then check in each row of the matrix, if there is one and only one true value.
## Compute and use new manifest
We can generate a new set of manifest file based on old manifest and two versions of rule. From abvoe rule processing part, we can tell how a new rule & region is from previous one. So a simple way to get the new manifest is also apply the step of change to manifest files. E.g., if region A is from region B and C, we simply combine all file IDs from B and C to generate the content of A.
If necessary, we can do this better by involving some metadata related to data, like min-max statistics of each file, and pre-evaluate over min-max to filter out unneeded files when generating new manifest.
The way to use new manifest needs one more extra step based on the current implementation. We'll need to record either in manifest or in file metadata, of what rule is used when generating (flush or compaction) a SST file. Then in every single read request, we need to append the current region rule as predicate to the read request, to ensure no data belong to other regions will be read. We can use the stored region rule to reduce the number of new predicates to apply, by removing the identical predicate between the current region rule and the stored region rule. So ideally in a table that has not been repartitioned recently, the overhead of checking region rule is minimal.
## Pre-required tasks
In above steps, we assume some functionalities are implemented. Here list them with where they are used and how to implement them.
### Cross-region read
The current data directory structure is `{table_id}/{region_id}/[data/metadata]/{file_id}`, every region can only access files under their own directory. After repartition, data file may be placed in other previous old regions. So we need to support cross-region read. This new access method allows region to access any file under the same table. Related tracking issue is <https://github.com/GreptimeTeam/greptimedb/issues/6409>.
### Global GC worker
This is to simplify state management of data files. As one file may be referenced in multiple manifests, or no manifest at all. After this, every region and the repartition process only need to care about generateing and using new files, without tracking whether a file should be deleted or not. Leaving the deletion to the global GC worker. This worker basically works by counting reference from manifest file, and remove unused one. Related tracking issue is **TBD**.
# Alternatives
In the "Data Processing" section, we can enlarge the "no ingestion" period to include almost all the steps. This can simplify the entire procedure by a lot, but will bring a longer time of ingestion pause which may not be acceptable.
This RFC proposes a compatibility test framework for GreptimeDB to ensure backward/forward compatibility for different versions of GreptimeDB.
# Motivation
In current practice, we don't have a systematic way to test and ensure the compatibility of different versions of GreptimeDB. Each time we release a new version, we need to manually test the compatibility with ad-hoc cases. This is not only time-consuming, but also prone to errors and unmaintainable. Highly rely on the release manager to ensure the compatibility of different versions of GreptimeDB.
We don't have a detailed guide on the release SoP of how to test and ensure the compatibility of the new version. And has broken the compatibility of the new version many times (`v0.14.1` and `v0.15.1` are two examples, which are both released right after the major release).
# Details
This RFC proposes a compatibility test framework that is easy to maintain, extend and run. It can tell the compatibility between any given two versions of GreptimeDB, both backward and forward. It's based on the Sqlness library but used in a different way.
Generally speaking, the framework is composed of two parts:
1. Test cases: A set of test cases that are maintained dedicatedly for the compatibility test. Still in the `.sql` and `.result` format.
2. Test framework: A new sqlness runner that is used to run the test cases. With some new features that is not required by the integration sqlness test.
## Test Cases
### Structure
The case set is organized in three parts:
-`1.feature`: Use a new feature
-`2.verify`: Verify database behavior
-`3.cleanup`: Paired with `1.feature`, cleanup the test environment.
These three parts are organized in a tree structure, and should be run in sequence:
```
compatibility_test/
├── 1.feature/
│ ├── feature-a/
│ ├── feature-b/
│ └── feature-c/
├── 2.verify/
│ ├── verify-metadata/
│ ├── verify-data/
│ └── verify-schema/
└── 3.cleanup/
├── cleanup-a/
├── cleanup-b/
└── cleanup-c/
```
### Example
For example, for a new feature like adding new index option ([#6416](https://github.com/GreptimeTeam/greptimedb/pull/6416)), we (who implement the feature) create a new test case like this:
Since this new feature don't require some special way to verify the database behavior, we can reuse existing test cases in `2.verify/` to verify the database behavior. For example, we can reuse the `verify-metadata` test case to verify the metadata of the table.
In this example, we use some new sqlness features that will be introduced in the next section (`since`, `IGNORE_RESULT`, `TEMPLATE`).
### Maintenance
Each time implement a new feature that should be covered by the compatibility test, we should create a new test case in `1.feature/` and `3.cleanup/` for them. And check if existing cases in `2.verify/` can be reused to verify the database behavior.
This simulates an enthusiastic user who uses all the new features at the first time. All the new Maintenance burden is on the feature implementer to write one more test case for the new feature, to "fixation" the behavior. And once there is a breaking change in the future, it can be detected by the compatibility test framework automatically.
Another topic is about deprecation. If a feature is deprecated, we should also mark it in the test case. Still use above example, assume we deprecate the `index.granularity` and `index.false_positive_rate` index options in `v0.99.0`, we can mark them as:
```sql
-- SQLNESS ARG since=0.15.0 till=0.99.0
...
```
This tells the framework to ignore this feature in version `v0.99.0` and later. Currently, we have so many experimental features that are scheduled to be broken in the future, this is a good way to mark them.
## Test Framework
This section is about new sqlness features required by this framework.
### Since and Till
Follows the `ARG` interceptor in sqlness, we can mark a feature is available between two given versions. Only the `since` is required:
`IGNORE_RESULT` is a new interceptor, it tells the runner to ignore the result of the query, only check whether the query is executed successfully.
This is useful to reduce the Maintenance burden of the test cases, unlike the integration sqlness test, in most cases we don't care about the result of the query, only need to make sure the query is executed successfully.
### TEMPLATE
`TEMPLATE` is another new interceptor, it can generate queries from a template based on a runtime data.
In above example, we need to run the `SHOW CREATE TABLE` query for all existing tables, so we can use the `TEMPLATE` interceptor to generate the query with a dynamic table list.
### RUNNER
There are also some extra requirement for the runner itself:
- It should run the test cases in sequence, first `1.feature/`, then `2.verify/`, and finally `3.cleanup/`.
- It should be able to fetch required version automatically to finish the test.
- It should handle the `since` and `till` properly.
On the `1.feature` phase, the runner needs to identify all features need to be tested by version number. And then restart with a new version (the `to` version) to run `2.verify/` and `3.cleanup/` phase.
## Test Report
Finally, we can run the compatibility test to verify the compatibility between any given two versions of GreptimeDB, for example:
```bash
# check backward compatibility between v0.15.0 and v0.16.0 when releasing v0.16.0
./sqlness run --from=0.15.0 --to=0.16.0
# check forward compatibility when downgrading from v0.15.0 to v0.13.0
./sqlness run --from=0.15.0 --to=0.13.0
```
We can also use a script to run the compatibility test for all the versions in a given range to give a quick report with all versions we need.
And we always bump the version in `Cargo.toml` to the next major release version, so the next major release version can be used as "latest" unpublished version for scenarios like local testing.
# Alternatives
There was a previous attempt to implement a compatibility test framework that was disabled due to some reasons [#3728](https://github.com/GreptimeTeam/greptimedb/issues/3728).
This RFC proposes the integration of a garbage collection (GC) mechanism within the Compaction process. This mechanism aims to manage and remove stale files that are no longer actively used by any system component, thereby reclaiming storage space.
## Motivation
With the introduction of features such as table repartitioning, a substantial number of Parquet files can become obsolete. Furthermore, failures during manifest updates may result in orphaned files that are never referenced by the system. Therefore, a periodic garbage collection mechanism is essential to reclaim storage space by systematically removing these unused files.
## Details
### Overview
The garbage collection process will be integrated directly into the Compaction process. Upon the completion of a Compaction for a given region, the GC worker will be automatically triggered. Its primary function will be to identify and subsequently delete obsolete files that have persisted beyond their designated retention period. This integration ensures that garbage collection is performed in close conjunction with data lifecycle management, effectively leveraging the compaction process's inherent knowledge of file states.
This design prioritizes correctness and safety by explicitly linking GC execution to a well-defined operational boundary: the successful completion of a compaction cycle.
### Terminology
- **Unused File**: Refers to a file present in the storage directory that has never been formally recorded in any manifest. A common scenario for this includes cases where a new SST file is successfully written to storage, but the subsequent update to the manifest fails, leaving the file unreferenced.
- **Obsolete File**: Denotes a file that was previously recorded in a manifest but has since been explicitly marked for removal. This typically occurs following operations such as data repartitioning or compaction.
### GC Worker Process
The GC worker operates as an integral part of the Compaction process. Once a Compaction for a specific region is completed, the GC worker is automatically triggered. Executing this process on a `datanode` is preferred to eliminate the overhead associated with having to set object storage configurations in the `metasrv`.
The detailed process is as follows:
1.**Invocation**: Upon the successful completion of a Compaction for a region, the GC worker is invoked.
2.**Manifest Reading**: The worker reads the region's primary manifest to obtain a comprehensive list of all files marked as obsolete. Concurrently, it reads any temporary manifests generated by long-running queries to identify files that are currently in active use, thereby preventing their premature deletion.
3.**Lingering Time Check (Obsolete Files)**: For each identified obsolete file, the GC worker evaluates its "lingering time." Which is the time passed after it had been removed from manifest.
4.**Deletion Marking (Obsolete Files)**: Files that have exceeded their maximum configurable lingering time and are not referenced by any active temporary manifests are marked for deletion.
5.**Lingering Time (Unused Files)**: Unused files (those never recorded in any manifest) are also subject to a configurable maximum lingering time before they are eligible for deletion.
Following flowchart illustrates the GC worker's process:
```mermaid
flowchart TD
A[Compaction Completed] --> B[Trigger GC Worker]
B --> C[Scan Region Manifest]
C --> D[Identify File Types]
D --> E[Unused Files<br/>Never recorded in manifest]
D --> F[Obsolete Files<br/>Previously in manifest<br/>but marked for removal]
E --> G[Check Lingering Time]
F --> G
G --> H{File exceeds<br/>configured lingering time?}
H -->|No| I[Skip deletion]
H -->|Yes| J[Check Temporary Manifest]
J --> K{File in use by<br/>active queries?}
K -->|Yes| L[Retain file<br/>Wait for next GC cycle]
K -->|No| M[Safely delete file]
I --> N[End GC cycle]
L --> N
M --> O[Update Manifest]
O --> N
N --> P[Wait for next Compaction]
P --> A
style A fill:#e1f5fe
style B fill:#f3e5f5
style M fill:#e8f5e8
style L fill:#fff3e0
```
#### Handling Obsolete Files
An obsolete file is permanently deleted only if two conditions are met:
1. The time elapsed since its removal from the manifest (its obsolescence timestamp) exceeds a configurable threshold.
2. It is not currently referenced by any active temporary manifests.
#### Handling Unused Files
With the integration of the GC worker into the Compaction process, the risk of accidentally deleting newly created SST files that have not yet been recorded in the manifest is significantly mitigated. Consequently, the concept of "Unused Files" as a distinct category primarily susceptible to accidental deletion is largely resolved. Any files that are genuinely "unused" (i.e., never referenced by any manifest, including temporary ones) can be safely deleted after a configurable maximum lingering time.
For debugging and auditing purposes, a comprehensive list of recently deleted files can be maintained.
### Ensuring Read Consistency
To prevent the GC worker from inadvertently deleting files that are actively being utilized by long-running analytical queries, a robust protection mechanism is introduced. This mechanism relies on temporary manifests that are actively kept "alive" by the queries using them.
When a long-running query is detected (e.g., by a slow query recorder), it will write a temporary manifest to the region's manifest directory. This manifest lists all files required for the query. However, simply creating this file is not enough, as a query runner might crash, leaving the temporary manifest orphaned and preventing garbage collection indefinitely.
To address this, the following "heartbeat" mechanism is implemented:
1.**Periodic Updates**: The process executing the long-running query is responsible for periodically updating the modification timestamp of its temporary manifest file (i.e., "touching" the file). This serves as a heartbeat, signaling that the query is still active.
2.**GC Worker Verification**: When the GC worker runs, it scans for temporary manifests. For each one it finds, it checks the file's last modification time.
3.**Stale File Handling**: If a temporary manifest's last modification time is older than a configurable threshold, the GC worker considers it stale (left over from a crashed or terminated query). The GC worker will then delete this stale temporary manifest. Files that were protected only by this stale manifest are no longer shielded from garbage collection.
This approach ensures that only files for genuinely active queries are protected. The lifecycle of the temporary manifest is managed dynamically: it is created when a long query starts, kept alive through periodic updates, and is either deleted by the query upon normal completion or automatically cleaned up by the GC worker if the query terminates unexpectedly.
This mechanism may be too complex to implement at once. We can consider a two-phased approach:
1.**Phase 1 (Simple Time-Based Deletion)**: Initially, implement a simpler GC strategy that deletes obsolete files based solely on a configurable lingering time. This provides a baseline for space reclamation without the complexity of temporary manifests.
2.**Phase 2 (Consistency-Aware GC)**: Based on the practical effectiveness and observed issues from Phase 1, we can then decide whether to implement the full temporary manifest and heartbeat mechanism to handle long-running queries. This iterative approach allows for a quicker initial implementation while gathering real-world data to justify the need for a more complex solution.
## Drawbacks
- **Dependency on Compaction Frequency**: The integration of the GC worker with Compaction means that GC cycles are directly tied to the frequency of compactions. In environments with infrequent compaction operations, obsolete files may accumulate for extended periods before being reclaimed, potentially leading to increased storage consumption.
- **Race Condition with Long-Running Queries**: A potential race condition exists if a long-running query initiates but haven't write its temporary manifest in time, while a compaction process simultaneously begins and marks files used by that query as obsolete. This scenario could lead to the premature deletion of files still required by the active query. To mitigate this, the threshold time for writing a temporary manifest should be significantly shorter than the lingering time configured for obsolete files, ensuring that next GC worker runs do not delete files that are now referenced by a temporary manifest if the query is still running.
Also the read replica shouldn't be later in manifest version for more than the lingering time of obsolete files, otherwise it might ref to files that are already deleted by the GC worker.
- need to upload tmp manifest to object storage, which may introduce additional complexity and potential performance overhead. But since long-running queries are typically not frequent, the performance impact is expected to be minimal.
## Conclusion and Rationale
This section summarizes the key aspects and trade-offs of the proposed integrated GC worker, highlighting its advantages and potential challenges.
| Aspect | Current Proposal (Integrated GC) |
| :--- | :--- |
| **Implementation Complexity** | **Medium**. Requires careful integration with the compaction process and the slow query recorder for temporary manifest management. |
| **Reliability** | **High**. Integration with compaction and leveraging temporary manifests from long-running queries significantly mitigates the risk of incorrect deletion. Accurate management of lingering times for obsolete files and prevention of accidental deletion of newly created SSTs enhance data safety. |
| **Performance Overhead** | **Low to Medium**. The GC worker runs post-compaction, minimizing direct impact on write paths. Overhead from temporary manifest management by the slow query recorder is expected to be acceptable for long-running queries. |
| **Impact on Other Components** | **Moderate**. Requires modifications to the compaction process to trigger GC and the slow query recorder to manage temporary manifests. This introduces some coupling but enhances overall data safety. |
| **Deletion Strategy** | **State- and Time-Based**. Obsolete files are deleted based on a configurable lingering time, which is paused if the file is referenced by a temporary manifest. Unused files (never in a manifest) are also subject to a lingering time. |
## Unresolved Questions and Future Work
This section outlines key areas requiring further discussion and defines potential avenues for future development.
***Slow Query Recorder Implementation**: Detailed specifications for modify slow query recorder's implementation and its precise interaction mechanisms with temporary manifests are needed.
***Configurable Lingering Times**: Establish and make configurable the specific lingering times for both obsolete and unused files to optimize storage reclamation and data availability.
## Alternatives
### 1. Standalone GC Service
Instead of integrating the GC worker directly into the Compaction process, a standalone GC service could be implemented. This service would operate independently, periodically scanning the storage for obsolete and unused files based on manifest information and predefined retention policies.
**Pros:**
* **Decoupling**: Separates GC logic from compaction, allowing independent scaling and deployment.
* **Flexibility**: Can be configured to run at different frequencies and with different strategies than compaction.
**Cons:**
* **Increased Complexity**: Requires a separate service to manage, monitor, and coordinate with other components.
* **Potential for Redundancy**: May duplicate some file scanning logic already present in compaction.
* **Consistency Challenges**: Ensuring read consistency would require more complex coordination mechanisms between the standalone GC service and active queries, potentially involving a distributed lock manager or a more sophisticated temporary manifest system.
This alternative could be implemented in the future if the integrated GC worker proves insufficient or if there is a need for more advanced GC strategies.
### 2. Manifest-Driven Deletion (No Lingering Time)
This alternative would involve immediate deletion of files once they are removed from the manifest, without a lingering time.
**Pros:**
* **Simplicity**: Simplifies the GC logic by removing the need for lingering time management.
* **Immediate Space Reclamation**: Storage space is reclaimed as soon as files are marked for deletion.
**Cons:**
* **Increased Risk of Data Loss**: Higher risk of deleting files still in use by long-running queries or other processes if not perfectly synchronized.
* **Complex Read Consistency**: Requires extremely robust and immediate mechanisms to ensure that no active queries are referencing files marked for deletion, potentially leading to performance bottlenecks or complex error handling.
* **Debugging Challenges**: Difficult to debug issues related to premature file deletion due to the immediate nature of the operation.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.