greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-05-29 19:30:37 +00:00

Author	SHA1	Message	Date
Yingwen	fd94f55193	refactor(mito2): remove dead scan code (#7925) * refactor(mito2): remove dead batch parallel scan helpers Signed-off-by: evenyag <realevenyag@gmail.com> * refactor(mito2): remove dead merge reader path Signed-off-by: evenyag <realevenyag@gmail.com> * refactor(mito2): remove dead batch dedup reader Signed-off-by: evenyag <realevenyag@gmail.com> * test(mito2): remove obsolete batch source helper Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove unused plain batch Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2026-04-10 03:12:33 +00:00
Yingwen	0e22d6a72b	feat: implement partition range cache stream (#7842 ) * feat: add cache stream helpers, key construction, config wiring, and metrics for partition range cache Add range result cache size config field and wire it through cache builder chains. Implement cache key building (build_range_cache_key), stream replay/store helpers (cached_flat_range_stream, cache_flat_range_stream), dictionary compaction (compact_pk_dictionary), and partition range row group collection. Add range cache metrics (size, hit, miss) to ScanMetricsSet and PartitionMetrics. Move fingerprint tests from scan_region to range_cache module. These functions are not yet wired into scan execution. Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add benchmark for cache stream Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: move bench_util to test_util Signed-off-by: evenyag <realevenyag@gmail.com> * feat: share dict Signed-off-by: evenyag <realevenyag@gmail.com> * test: test ptr_eq Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fmt code Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: simplify value array handling Signed-off-by: evenyag <realevenyag@gmail.com> * chore: add todo for estimate size Signed-off-by: evenyag <realevenyag@gmail.com> * feat: simplify size calculation Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove one test Signed-off-by: evenyag <realevenyag@gmail.com> * test: update config test Signed-off-by: evenyag <realevenyag@gmail.com> * chore: address review comment Only ignore exprs that can extract time ranges Signed-off-by: evenyag <realevenyag@gmail.com> * test: fix tests Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2026-03-24 10:01:13 +00:00
Lei, HUANG	be4a7a6d37	refactor: remove Memtable::iter (#7809) * refactor: remove Memtable::iter Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: review comments Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2026-03-16 07:49:31 +00:00
Yingwen	04cd2c8a05	feat: flat read path support primary_key format memtables (#7759 ) * feat: add adapter for batch to flat recordbatch Signed-off-by: evenyag <realevenyag@gmail.com> * feat: support batch to flat record batch in MemtableRange Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: address review issues for BatchToRecordBatchAdapter - Extract duplicated read_column_ids computation into a shared `read_column_ids_from_projection` helper function - Cache `FormatProjection` in `BatchToRecordBatchContext::new()` instead of recomputing it on every `adapt_iter()` call - Remove unnecessary `Arc` wrapping of `read_column_ids` in `SimpleBulkMemtable::ranges()` - Fix clippy `filter_map_bool_then` warning in `batch_adapter.rs` Signed-off-by: evenyag <realevenyag@gmail.com> * chore: simplify comments Signed-off-by: evenyag <realevenyag@gmail.com> * refactor(mito2): use read column ids in batch adapter Signed-off-by: evenyag <realevenyag@gmail.com> * test: test build_record_batch_iter Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fmt code Signed-off-by: evenyag <realevenyag@gmail.com> * test: test build_record_batch_iter for all old memtables Signed-off-by: evenyag <realevenyag@gmail.com> * chore: address comment Signed-off-by: evenyag <realevenyag@gmail.com> * fix: prune time range before adapter Signed-off-by: evenyag <realevenyag@gmail.com> * chore: share BatchToRecordBatchContext in simple_bulk_memtable.rs Signed-off-by: evenyag <realevenyag@gmail.com> * chore: use ScalarValue::to_array_of_size to build repeated value array Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2026-03-10 12:46:39 +00:00
LFC	5eac4f10aa	chore: remove dependency on "atty" (#7725) Signed-off-by: luofucong <luofc@foxmail.com>	2026-02-26 09:58:01 +00:00
Yingwen	7711661618	feat: BulkMemtable compact parts without encoding into Parquet (#7617 ) * feat: implement MultiBulkPart to hold a list of batches in BulkMemtable Signed-off-by: evenyag <realevenyag@gmail.com> * feat: Only encode parts when there are enough rows Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: merge MultiBulkPartIter and BulkPartBatchIter Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove some enums and structs Signed-off-by: evenyag <realevenyag@gmail.com> * chore: reuse code in merging bulk/encoded parts Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: collect part groups directly Signed-off-by: evenyag <realevenyag@gmail.com> * test: add unit tests Signed-off-by: evenyag <realevenyag@gmail.com> * feat: enlarge merge threshold and configure by env - GREPTIME_BULK_MERGE_THRESHOLD - GREPTIME_BULK_ENCODE_ROW_THRESHOLD - GREPTIME_BULK_ENCODE_BYTES_THRESHOLD Signed-off-by: evenyag <realevenyag@gmail.com> * feat: change flush strategy Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add BulkMemtableConfig Signed-off-by: evenyag <realevenyag@gmail.com> * chore: limit max groups and adjust threshold Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add flush file number metrics Signed-off-by: evenyag <realevenyag@gmail.com> * chore: add bulk filter 1 host bench Signed-off-by: evenyag <realevenyag@gmail.com> * feat: adjust bulk compact threshold Signed-off-by: evenyag <realevenyag@gmail.com> * feat: flush a file if == min_flush_rows Signed-off-by: evenyag <realevenyag@gmail.com> * test: fix test_index_build_type_compact test Signed-off-by: evenyag <realevenyag@gmail.com> * test: fix mito tests Signed-off-by: evenyag <realevenyag@gmail.com> * fix: remove regions from catchup_regions before notify Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2026-01-30 08:03:36 +00:00
Yingwen	a22d08f1b1	feat: collect merge and dedup metrics (#7375 ) * feat: collect FlatMergeReader metrics Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add MergeMetricsReporter, rename Metrics to MergeMetrics Signed-off-by: evenyag <realevenyag@gmail.com> * feat: remove num_input_rows from MergeMetrics The merge reader won't dedup so there is no need to collect input rows Signed-off-by: evenyag <realevenyag@gmail.com> * feat: report merge metrics to PartitionMetrics Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add dedup cost to DedupMetrics Signed-off-by: evenyag <realevenyag@gmail.com> * feat: collect dedup metrics Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove metrics from FlatMergeIterator Signed-off-by: evenyag <realevenyag@gmail.com> * feat: remove num_output_rows from MergeMetrics Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * feat: implement merge() for merge and dedup metrics Signed-off-by: evenyag <realevenyag@gmail.com> * fix: report metrics after observe metrics Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-12-10 09:16:20 +00:00
Yingwen	3001c2d719	feat: BulkMemtable stores small fragments in another buffer (#7164) * feat: buffer small parts in bulk memtable Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: use assert_eq instead of assert Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix compiler errors Signed-off-by: evenyag <realevenyag@gmail.com> * chore: collect bulk memtable scan metrics Signed-off-by: evenyag <realevenyag@gmail.com> * chore: report metrics early Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-11-05 06:35:32 +00:00
Yingwen	acf38a7091	fix: avoid filtering rows with delete op by fields under merge mode (#7154 ) * chore: clear allow dead_code for flat format Signed-off-by: evenyag <realevenyag@gmail.com> * chore: pass exprs to build appliers Signed-off-by: evenyag <realevenyag@gmail.com> * feat: split field filters and index appliers Signed-off-by: evenyag <realevenyag@gmail.com> * feat: support skip filtering fields in RowGroupPruningStats Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add PreFilterMode to config whether to skip filtering fields Adds the PreFilterMode to the RangeBase and sets it in ParquetReaderBuilder Signed-off-by: evenyag <realevenyag@gmail.com> * feat: support skipping fields in prune reader Signed-off-by: evenyag <realevenyag@gmail.com> * feat: support pre filter mode in bulk memtable Signed-off-by: evenyag <realevenyag@gmail.com> * feat: pass PreFilterMode to memtable Signed-off-by: evenyag <realevenyag@gmail.com> * test: test mito filter delete Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix compiler errors Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove commented code Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: move predicate and sequence to RangesOptions Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fmt code Signed-off-by: evenyag <realevenyag@gmail.com> * ci: skip cargo gc Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix cargo build warning Signed-off-by: evenyag <realevenyag@gmail.com> * Revert "ci: skip cargo gc" This reverts commit `1ec9594a6d`. Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-10-30 12:14:45 +00:00
Yingwen	cff9cb6327	feat: converts batches in old format to the flat format in query time (#6987 ) * feat: use correct projection index for old format Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove allow dead_code from format Signed-off-by: evenyag <realevenyag@gmail.com> * feat: check and convert old format to flat format Signed-off-by: evenyag <realevenyag@gmail.com> * fix: sub primary key num from projection Signed-off-by: evenyag <realevenyag@gmail.com> * fix: always convert the batch in FlatRowGroupReader Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: Change &Option<&[]> to Option<&[]> Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: only build arrow schema once adds a method flat_sst_arrow_schema_column_num() to get the field num Signed-off-by: evenyag <realevenyag@gmail.com> * feat: Handle flat format and old format separately Adds two structs ParquetFlat and ParquetPrimaryKeyToFlat. ParquetPrimaryKeyToFlat delegates stats and projection to the PrimaryKeyReadFormat. 
Signed-off-by: evenyag <realevenyag@gmail.com> * fix: handle non string tag correctly Signed-off-by: evenyag <realevenyag@gmail.com> * fix: do not register file cache twice Signed-off-by: evenyag <realevenyag@gmail.com> * fix: clean temp files Signed-off-by: evenyag <realevenyag@gmail.com> * chore: add rows and bytes to flush success log Signed-off-by: evenyag <realevenyag@gmail.com> * chore: convert format in memtable Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: add compaction flag to ScanInput Signed-off-by: evenyag <realevenyag@gmail.com> * fix: compaction should use old format for sparse encoding Signed-off-by: evenyag <realevenyag@gmail.com> * fix: merge schema use old format in sparse encoding Signed-off-by: evenyag <realevenyag@gmail.com> * feat: reads legacy format but not convert if skip_auto_convert Signed-off-by: evenyag <realevenyag@gmail.com> * fix: support sparse encoding in bulk parts Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-09-25 06:42:22 +00:00
Yingwen	9c22092189	feat: Implements compaction for bulk memtable (#6923) * feat: initial bulk memtable compaction implementation Signed-off-by: evenyag <realevenyag@gmail.com> * feat: implement compact for memtable Signed-off-by: evenyag <realevenyag@gmail.com> * test: add tests for bulk memtable compaction Signed-off-by: evenyag <realevenyag@gmail.com> * style: clippy Signed-off-by: evenyag <realevenyag@gmail.com> * chore: address review comments Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-09-09 09:15:25 +00:00
Ruihang Xia	c9377e7c5a	build: bump rust edition to 2024 (#6920 ) * bump edition Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * format Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * gen keyword Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * lifetime and env var Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * one more gen fix Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * lifetime of temporaries in tail expressions Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * format again Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * clippy nested if Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * clippy let and return Signed-off-by: Ruihang Xia <waynestxia@gmail.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-09-08 02:37:18 +00:00
Yingwen	8fc3a9a9d7	feat: Implements async FlatMergeReader and FlatDedupReader (#6761 ) * refactor: Add Flat prefix to MergeIterator and DedupIterator Signed-off-by: evenyag <realevenyag@gmail.com> * feat: implement MergeReader for RecordBatch Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: use GenericNode for IterNode Signed-off-by: evenyag <realevenyag@gmail.com> * feat: flat merge reader to stream Signed-off-by: evenyag <realevenyag@gmail.com> * feat: implement FlatDedupReader Signed-off-by: evenyag <realevenyag@gmail.com> * chore: add a benchmark for FlatMergeIterator Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: rename plain_projection to flat_projection Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-08-19 06:30:59 +00:00
Yingwen	39e2f122eb	feat: EncodedBulkPartIter iters flat format and returns RecordBatch (#6655) * feat: implements iter to read bulk part Signed-off-by: evenyag <realevenyag@gmail.com> * feat: BulkPartEncoder encodes BulkPart instead of mutation Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-08-06 06:50:01 +00:00
Yingwen	50f7f61fdc	feat: Implements an iterator to read the RecordBatch in BulkPart (#6647) * feat: impl RecordBatchIter for BulkPart Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: rename BulkPartIter to EncodedBulkPartIter Signed-off-by: evenyag <realevenyag@gmail.com> * chore: add iter benchmark Signed-off-by: evenyag <realevenyag@gmail.com> * feat: filter by primary key columns Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: move struct definitions Signed-off-by: evenyag <realevenyag@gmail.com> * feat: bulk iter for flat schema Signed-off-by: evenyag <realevenyag@gmail.com> * feat: iter filter benchmark Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix compiler errors Signed-off-by: evenyag <realevenyag@gmail.com> * fix: use correct sequence array to compare Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove RecordBatchIter Signed-off-by: evenyag <realevenyag@gmail.com> * chore: update comments Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * feat: apply projection first Signed-off-by: evenyag <realevenyag@gmail.com> * chore: address comment No need to check number of rows after filter Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-08-05 08:11:28 +00:00
Yingwen	52466fdd92	feat: Implement a converter to converts KeyValues into BulkPart (#6620 ) * chore: add api to memtable to check bulk capability Signed-off-by: evenyag <realevenyag@gmail.com> * feat: Add a converter to convert KeyValues into BulkPart Signed-off-by: evenyag <realevenyag@gmail.com> * feat: move supports_bulk_insert to MemtableBuilder Signed-off-by: evenyag <realevenyag@gmail.com> * chore: benchmark Signed-off-by: evenyag <realevenyag@gmail.com> * feat: use write_bulk if the memtable benefits from it Signed-off-by: evenyag <realevenyag@gmail.com> * test: test BulkPartConverter Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add a flag to store unencoded primary keys Signed-off-by: evenyag <realevenyag@gmail.com> * feat: cache schema for converter Implements to_flat_sst_arrow_schema Signed-off-by: evenyag <realevenyag@gmail.com> * chore: simplify tests Signed-off-by: evenyag <realevenyag@gmail.com> * fix: don't use bulk convert branch now Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * chore: address review comments * simplify primary_key_column_builders check * return error if value is not string Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add FlatSchemaOptions::from_encoding and test sparse encoding Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-08-01 07:59:11 +00:00
Lei, HUANG	c3201c32c3	feat(mito): replace `Memtable::iter` with `Memtable::ranges` (#6549 ) * bulk-multiparts-merge-reader: Enhance Memtable Iteration and Flushing Logic - `flush.rs`: Updated `RegionFlushTask` to handle multiple ranges using `MergeReaderBuilder` for improved source management during flush operations. - `memtable.rs`: Introduced `build_prune_iter` and `build_iter` methods in `MemtableRange` for flexible iteration. Added `MemtableRanges` struct to manage multiple contexts. - `simple_bulk_memtable.rs`: Refactored to use `BatchIterBuilder` and `BatchIterBuilderDeprecated` for iteration, supporting new `read_to_values` method in `Series`. - `time_series.rs`: Added `read_to_values` and `finish_cloned` methods in `Series` and `ValueBuilder` for efficient data handling. - `scan_util.rs`: Replaced `build_iter` with `build_prune_iter` for range iteration, enhancing scan utility. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: - Add Rayon for Parallel Processing: Introduced `rayon` for parallel processing in `simple_bulk_memtable.rs` and updated `Cargo.toml` and `Cargo.lock` to include `rayon` dependency. - Enhance Benchmarking: Added new benchmarks in `simple_bulk_memtable.rs` to compare parallel vs sequential processing, projection, sequence filtering, and write performance. - Make Structs and Methods Public: Changed visibility of several structs and methods to `pub` in `simple_bulk_memtable.rs`, `memtable.rs`, `time_series.rs`, and `test_util.rs` to facilitate testing and benchmarking. - Update Criterion Features: Modified `Cargo.toml` to include `html_reports` feature for `criterion`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: ### Commit Summary - Refactor `SimpleBulkMemtable`: - Moved `ranges_sequential` function to a new `test_only` module and made it a method of `SimpleBulkMemtable`. - Made several fields in `SimpleBulkMemtable` private and added a `region_metadata` getter. 
- Affected files: `simple_bulk_memtable.rs`, `test_only.rs`. - Benchmark Adjustments: - Updated benchmark functions to use the new `ranges_sequential` method. - Affected file: `simple_bulk_memtable.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: ### Add Test Configuration for `iter` Method in Memtable Implementations - Enhancements: - Added `#[cfg(any(test, feature = "test"))]` attribute to the `iter` method in various `Memtable` implementations to enable conditional compilation for testing purposes. - Affected files: - `src/mito2/src/memtable.rs` - `src/mito2/src/memtable/bulk.rs` - `src/mito2/src/memtable/partition_tree.rs` - `src/mito2/src/memtable/simple_bulk_memtable.rs` - `src/mito2/src/memtable/time_series.rs` - `src/mito2/src/test_util/memtable_util.rs` - Benchmark Adjustments: - Removed `black_box` usage in `bench_memtable_write_performance` function to streamline benchmarking. - Affected file: `src/mito2/benches/simple_bulk_memtable.rs` Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Enhance Async Support and Refactor Iteration in `mito2` - Add Async Features: Updated `Cargo.toml` to include `async` and `async_tokio` features for `criterion`. - Async Iteration: Introduced async functions `flush` and `flush_original` in `simple_bulk_memtable.rs` to handle memtable flushing using async iterators. - Refactor Iteration Logic: Moved `create_iter` and `BatchIterBuilderDeprecated` to `test_only.rs` for better separation of concerns. - Public API Change: Made `next_batch` in `read.rs` public to support async batch processing. - Benchmark Updates: Modified benchmarks in `simple_bulk_memtable.rs` to use async runtime for performance testing. Files affected: `Cargo.toml`, `simple_bulk_memtable.rs`, `test_only.rs`, `read.rs`. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Enhance Benchmarking for Memtable - Refactored `create_large_memtable` to `create_memtable_with_rows` in `simple_bulk_memtable.rs` to allow dynamic row count configuration. - Introduced parameterized benchmarking in `bench_ranges_parallel_vs_sequential` to test various row counts, improving the flexibility and coverage of performance tests. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: ### Enhance Memory Management and Public API - `builder.rs`: Made `next_offset` method public to allow external access to offset calculations. - `simple_bulk_memtable.rs`: Simplified the `series.extend` method by removing the iterator conversion for `fields`. - `time_series.rs`: - Added `can_accommodate` method to `ValueBuilder` to check if fields can be accommodated without offset overflow. - Modified `extend` method to use a `Vec` for `fields` instead of an iterator, improving memory management and error handling. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Add License and Enhance Testing in `simple_bulk_memtable.rs` - Added Apache License header to `simple_bulk_memtable.rs`. - Modified test configuration in `simple_bulk_memtable.rs` to include `any(test, feature = "test")`. - Introduced a new test `test_write_read_large_string` in `simple_bulk_memtable.rs` to verify handling of large strings. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Update `Cargo.toml` dependencies - Adjust features for `common-meta` and `mito-codec` to include "testing". - Maintain `criterion` version and features for async support. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: ### Update Predicate Type in Memtable Iterators - Files Modified: - `src/mito2/src/memtable.rs` - `src/mito2/src/memtable/bulk.rs` - `src/mito2/src/memtable/simple_bulk_memtable.rs` - Key Changes: - Updated the `iter` method in `Memtable` trait and its implementations to use `Option<table::predicate::Predicate>` instead of `Option<Predicate>`. - Adjusted return type in `BulkMemtable`'s `iter` method to `Result<crate::memtable::BoxedBatchIterator>`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Enhance Memtable Functionality - `memtable.rs`: - Added `Clone` trait to `MemtableStats` and made `num_ranges` public. - Introduced `num_rows` field in `MemtableRange` and updated its constructor. - Added `num_rows` method to `MemtableRange`. - `partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`: - Updated `MemtableRange` instantiation to include `num_rows`. - `range.rs`: - Refactored `MemRangeBuilder` to handle a single `MemtableRange` and `MemtableStats`. - `scan_region.rs`: - Enhanced memtable filtering based on time range and updated `MemRangeBuilder` usage. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: Enhancements and Bug Fixes - Deduplication Enhancements: - Introduced `DedupReader` and `LastRow` as public structs in `dedup.rs` to enhance deduplication capabilities. - Added `LastNonNull` deduplication strategy in `flush.rs` and `simple_bulk_memtable.rs`. - Memtable Improvements: - Updated `SimpleBulkMemtable` to support batch size configuration and deduplication strategies. - Modified `Series` struct in `time_series.rs` to include a configurable capacity. - Testing Enhancements: - Added new test `test_write_dedup` in `simple_bulk_memtable.rs` to verify deduplication functionality. - Updated existing tests to include `OpType` parameter for better operation type handling. 
- Refactoring: - Renamed `BatchIterBuilder` to `BatchRangeBuilder` in `simple_bulk_memtable.rs` for clarity. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * bulk-multiparts-merge-reader: - Refactor `flush.rs`: Removed `LastNonNullIter` usage and adjusted `DedupReader` instantiation to use `LastRow::new(false)` and `LastNonNull::new(false)`. - Enhance `simple_bulk_memtable.rs`: Added logic to handle `LastNonNull` merge mode in `IterBuilder`. Introduced new tests: `test_delete_only` and `test_single_range` to verify delete operations and single range handling. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: tests Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-07-25 03:05:54 +00:00
Yingwen	eaf1e1198f	refactor: Extract mito codec part into a new crate (#6307 ) * chore: add a new crate mito-codec Signed-off-by: evenyag <realevenyag@gmail.com> * feat: port necessary mods for primary key codec Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: use codec utils in mito-codec Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove unused mods Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove Partition::is_partition_column() Signed-off-by: evenyag <realevenyag@gmail.com> * refactor: remove duplicated test utils Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove unused comment Signed-off-by: evenyag <realevenyag@gmail.com> * fix: fix is_partition_column check Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-06-13 07:14:29 +00:00
Lei, HUANG	4e615e8906	feat(wal): support bulk wal entries (#6178 ) * feat/bulk-wal: ### Refactor: Simplify Data Handling in LogStore Implementations - `kafka/log_store.rs`, `raft_engine/log_store.rs`, `wal.rs`, `raw_entry_reader.rs`, `logstore.rs`: - Refactored `entry` and `build_entry` functions to accept `Vec<u8>` directly instead of `&mut Vec<u8>`. - Removed usage of `std::mem::take` for data handling, simplifying the code and improving readability. - Updated test cases to align with the new function signatures. * feat/bulk-wal: ### Add Support for Bulk WAL Entries and Flight Data Encoding - Add `raw_data` field to `BulkPart` and related structs: Updated `BulkPart` and related structures in `src/mito2/src/memtable/bulk/part.rs`, `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/region_write_ctx.rs`, `src/mito2/src/worker/handle_bulk_insert.rs`, and `src/store-api/src/region_request.rs` to include a new `raw_data` field for handling Arrow IPC data. - Implement Flight Data Encoding: Added a new module `flight` in `src/common/test-util/src/flight.rs` to encode record batches to Flight data format. - Update `greptime-proto` dependency: Changed the revision of the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml`. - Enhance WAL Writer and Tests: Modified `src/mito2/src/wal.rs` and related test files to support bulk WAL entries and added tests for encoding and handling bulk data. * feat/bulk-wal: - Update `greptime-proto` Dependency: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`. - Add `common-grpc` Dependency: Added `common-grpc` as a dependency in `Cargo.lock` and `src/mito2/Cargo.toml`. - Refactor `BulkPart` Structure: Removed `num_rows` field and added `num_rows()` method in `src/mito2/src/memtable/bulk/part.rs`. 
Updated related usages in `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/memtable/time_series.rs`, `src/mito2/src/region_write_ctx.rs`, and `src/mito2/src/worker/handle_bulk_insert.rs`. - Implement `TryFrom` and `From` for `BulkWalEntry`: Added implementations for converting between `BulkPart` and `BulkWalEntry` in `src/mito2/src/memtable/bulk/part.rs`. - Handle Bulk Entries in Region Opener: Added logic to process bulk entries in `src/mito2/src/region/opener.rs`. - Fix `BulkInsertRequest` Handling: Corrected `region_id` handling in `src/operator/src/bulk_insert.rs` and `src/store-api/src/region_request.rs`. - Add Error Variant for `ConvertBulkWalEntry`: Added a new error variant in `src/mito2/src/error.rs` for handling bulk WAL entry conversion errors. * fix: ci * feat/bulk-wal: Add bulk write operation in `opener.rs` - Enhanced the region write context by adding a call to `write_bulk()` after `write_memtable()` in `opener.rs`. - This change aims to improve the efficiency of writing operations by enabling bulk writes. * feat/bulk-wal: Enhance error handling and metrics in `bulk_insert.rs` - Updated `Inserter` to improve error handling by capturing the result of `datanode.handle(request)` and incrementing the `DIST_INGEST_ROW_COUNT` metric with the number of affected rows. * feat/bulk-wal: ### Remove Encode Error Handling for WAL Entries - `error.rs`: Removed the `EncodeWal` error variant and its associated handling. - `wal.rs`: Eliminated the `entry_encode_buf` buffer and its usage for encoding WAL entries. Replaced with direct encoding to a vector using `encode_to_vec()`.	2025-05-29 09:10:30 +00:00
Lei, HUANG	4b71e493f7	feat!: revise compaction picker (#6121 ) * - Refactor `RegionFilePathFactory` to `RegionFilePathProvider`: Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name. - Add `max_file_size` support in compaction: Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`. - Enhance Parquet writing logic: Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints. Refactor Parquet Writer Initialization and File Handling - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management. - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability. - Enhanced error handling and logging with `debug` statements for better traceability during file operations. - Removed Output Size Enforcement in `twcs.rs`: - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling. - Added Max File Size Option in `parquet.rs`: - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files. - Refactored Indexer Management in `parquet/writer.rs`: - Changed `current_indexer` from an `Option` to a direct `Indexer` type. - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`. - Simplified indexer initialization and management logic. - Refactored SST File Handling: - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths. 
- Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management. - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths. - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`. - Enhanced Indexer Management: - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation. - Updated `ParquetWriter` to handle multiple indexers and file IDs. - Files affected: `index.rs`, `parquet.rs`, `writer.rs`. - Removed Redundant File ID Handling: - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`. - Updated related logic to dynamically generate file IDs where necessary. - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`. - Test Adjustments: - Updated tests to align with new path and indexer management. - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes. - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`. * chore: rebase main * feat/multiple-compaction-output: ### Add Benchmarking and Refactor Compaction Logic - Benchmarking: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions. - Compaction Module Enhancements: - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public. - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions. - Introduced `find_overlapping_items` for identifying overlapping items. - Code Cleanup: Removed redundant code and tests related to `MergeItems` in `run.rs`. 
* feat/multiple-compaction-output: ### Enhance Compaction Logic and Add Benchmarks - Compaction Logic Improvements: - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty. - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency. - Benchmark Enhancements: - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function. - Extended existing benchmarks to include larger data sizes. - Testing Enhancements: - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`. - Logging and Debugging: - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions. * feat/multiple-compaction-output: ### Refactor and Enhance Compaction Logic - Refactor `find_overlapping_items` Function: Changed the function signature to accept slices instead of mutable vectors in `run.rs`. - Rename and Update Struct Fields: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`. - Enhance `reduce_runs` Function: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`. - Add `merge_seq_files` Function: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files. - Modify `TwcsPicker` Logic: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`. - Remove `enforce_file_num` Function: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`. * feat/multiple-compaction-output: ### Enhance Compaction Logic and Testing - Add `merge_seq_files` Functionality: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. 
Updated benchmarks in `run_bench.rs` to include `bench_merge_seq_files`. - Improve Compaction Strategy in `twcs.rs`: Modified the compaction logic to handle file merging more effectively, considering file size and overlap. - Update Tests: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies. - Remove Unused Function: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed. * feat/multiple-compaction-output: ### Refactor TWCS Compaction Options - Refactor Compaction Logic: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`. - Update Tests: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`. - Modify Engine Options: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`. - Fuzz Testing: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files. This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase. * chore: add trailing space * fix license header * feat/revise-compaction-picker: Limit File Processing and Optimize Merge Logic in `run.rs` - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity. - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files. - Updated scoring calculations to use the limited file set, ensuring efficient file merging. 
* feat/revise-compaction-picker: ### Add Compaction Metrics and Remove Debug Logging - Compaction Metrics: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process. - Logging Cleanup: Removed debug logging of file ranges during the merge process in `twcs.rs`. * feat/revise-compaction-picker: ## Enhance Compaction Logic and Metrics - Compaction Logic Improvements: - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations. - Updated `Compactor` implementation to use these methods for metrics tracking. - Modified `Ranged` trait logic in `run.rs` to improve range comparison. - Enhanced test cases in `run.rs` to reflect changes in compaction logic. - Metrics Enhancements: - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking. - Debugging and Logging: - Added detailed logging for compaction pick results in `twcs.rs`. - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output. - Testing Enhancements: - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files. - Updated `merge_mode_test.rs` to reflect changes in file handling during scans. * feat/revise-compaction-picker: ### Update `FileHandle` Debug Implementation - Refactor Debug Output: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`. - Atomic Operations: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`. 
* Trigger CI * feat/revise-compaction-picker: Update compaction logic and default options - `twcs.rs`: Enhanced logging for compaction pick results by improving the formatting for better readability. - `options.rs`: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance. * feat/revise-compaction-picker: Refactor `find_overlapping_items` to use an external result vector - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency. - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature. - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach. * feat/revise-compaction-picker: Improve file merging logic in `run.rs` - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups. - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection. - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences. * feat/revise-compaction-picker: Enhance `find_overlapping_items` with `SortedRun` and Update Tests - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity. - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations. - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`. - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors. * feat/revise-compaction-picker: Enhancements in `compaction/run.rs`: - Added `ReadableSize` import to handle size calculations. 
- Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided. * feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.	2025-05-23 03:29:08 +00:00
Lei, HUANG	5a9023d6b3	feat(bulk): write to multiple time partitions (#6086 ) * add benchmark for splitting according to time partition * feat/write-to-multiple-time-partitions: Enhancements to Bulk Processing and Time Partitioning - `part.rs`: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns. - `simple_bulk_memtable.rs`: Updated tests to include `timestamp_index` initialization. - `time_partition.rs`: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`. - `handle_bulk_insert.rs`: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`. * feat/write-to-multiple-time-partitions: ### Enhance Time Partitioning Logic - `time_partition.rs`: - Introduced `HashSet` for efficient partition management. - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions. - Updated `get_or_create_time_partition` to manage partition creation. - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges. - Tests: - Added `test_find_partitions_by_time_range` to validate new partitioning logic. - Updated `test_split_record_batch` to ensure correct record batch splitting behavior. * feat/write-to-multiple-time-partitions: ### Enhance Time Partitioning and Testing in `time_partition.rs` - Time Partitioning Enhancements: - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`. - Improved filtering logic for timestamp arrays to support various time units. 
- Testing Enhancements: - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`. - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning. - Imports and Dependencies: - Added necessary imports for new timestamp array types and testing utilities. * feat/write-to-multiple-time-partitions: ### Refactor and Enhance Time Partition Filtering - Refactor Filtering Logic: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability. - Enhance `BulkPart` Struct: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`. - Rename Function: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`. - Add Feature Flag: Introduced `int_roundings` feature in `lib.rs` to support new functionality. * refactor tests * feat/write-to-multiple-time-partitions: Improve timestamp handling in `time_partition.rs` - Enhanced safety comments for timestamp conversion to ensure clarity. - Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations. - Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`. * feat/write-to-multiple-time-partitions: Refactor timestamp handling and add utility function - Refactor `time_partition.rs`: Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion. - Enhance `metadata.rs`: Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code. 
* feat/write-to-multiple-time-partitions: Refactor time partition variable names in `time_partition.rs` - Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`. - Updated related logic to use new variable names for improved readability and maintainability. * feat/write-to-multiple-time-partitions: Refactor variable names in `time_partition.rs` - Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency. - Modified function calls and loop iterations to align with the new variable names. - Affected file: `src/mito2/src/memtable/time_partition.rs` * feat/write-to-multiple-time-partitions: ### Refactor variable names in `time_partition.rs` - Updated variable names for clarity in `time_partition.rs`: - Renamed `matchings` to `matching_parts` - Renamed `missings` to `missing_parts` - Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`. * feat/write-to-multiple-time-partitions: ### Enhance Time Partition Handling - `time_partition.rs`: - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks. - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`). - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling. --------- Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>	2025-05-14 05:09:59 +00:00
fys	2b2ea5bf72	chore: upgrade some dependencies (#5777 ) * chore: upgrade some dependencies * chore: upgrade some dependencies * fix: cr * fix: ci * fix: test * fix: cargo fmt	2025-03-27 02:48:44 +00:00
Weny Xu	965a48656f	feat(metric-engine): introduce `RowModifier` for MetricEngine (#5380 ) * feat(metric-engine): store physical table ColumnIds in `MetricEngineState` * feat(metric-engine): introduce `RowModifier` for MetricEngine * chore: upgrade greptime-proto * feat: introduce `WriteHint` to `RegionPutRequest` * chore: apply suggestions from CR * chore: update greptime-proto * chore: apply suggestions from CR * chore: add comments * chore: update proto	2025-01-22 05:16:44 +00:00
discord9	758aef39d8	feat: filter batch by sequence in memtable (#5367 ) * feat: add seq field * feat: filter by sequence * chore: per review * docs: explain why not prune * chore: correct doc * test: test filter by seq	2025-01-16 04:44:28 +00:00
Weny Xu	b64c075cdb	feat: introduce `PrimaryKeyEncoding` (#5312 ) * feat: introduce `PrimaryKeyEncoding` * fix: fix unit tests * chore: add empty line * test: add unit tests * chore: fmt code * refactor: introduce new codec trait to support various encoding * fix: fix unit tests * chore: update sqlness result * chore: apply suggestions from CR * chore: apply suggestions from CR	2025-01-15 06:16:53 +00:00
Yingwen	10b7a3d24d	feat: Implements `merge_mode` region options (#4208 ) * feat: add update_mode to region options * test: add test * feat: last not null iter * feat: time series last not null * feat: partition tree update mode * feat: partition tree * fix: last not null iter slice * test: add test for compaction * test: use second resolution * style: fix clippy * chore: merge two lines Co-authored-by: Jeremyhi <jiachun_feng@proton.me> * chore: address CR comments * refactor: UpdateMode -> MergeMode * refactor: LastNotNull -> LastNonNull * chore: return None earlier * feat: validate region options make merge mode optional and use default while it is None * test: fix tests --------- Co-authored-by: Jeremyhi <jiachun_feng@proton.me>	2024-06-27 07:52:58 +00:00
maco	40c585890a	refactor: replace Expr with datafusion::Expr (#3995 ) * refactor: replace Expr with datafusion::Expr * fix: fmt-toml * fix: cr comment	2024-05-21 06:40:29 +00:00
Yingwen	39b69f1e3b	refactor!: Renames the new memtable to PartitionTreeMemtable (#3547 ) * refactor: rename mod merge_tree to partition_tree * refactor: rename merge_tree * refactor: change merge tree comment * refactor: rename merge tree struct * refactor: memtable options	2024-03-20 06:40:41 +00:00
Lei, HUANG	ddbcff68dd	feat: support append-only mode in time-series memtable (#3540 ) * feat: support append-only mode in time-series memtable * fix: rename sort_and_dedup to sort	2024-03-19 20:37:54 +00:00
Yingwen	7c895e2605	perf: more benchmarks for memtables (#3491 ) * chore: remove duplicate bench * refactor: rename bench * perf: add full scan bench for memtable * feat: filter bench and add time series to bench group * chore: comment * refactor: rename * style: fix clippy	2024-03-12 12:02:58 +00:00
Lei, HUANG	376409b857	feat: employ sparse key encoding for shard lookup (#3410 ) * feat: employ short key encoding for shard lookup * fix: license * chore: simplify code * refactor: only enable sparse encoding to speed lookup on metric engine * fix: names	2024-03-01 06:22:15 +00:00

31 Commits