Commit Graph

31 Commits

Author SHA1 Message Date
Yingwen
fd94f55193 refactor(mito2): remove dead scan code (#7925)
* refactor(mito2): remove dead batch parallel scan helpers

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): remove dead merge reader path

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): remove dead batch dedup reader

Signed-off-by: evenyag <realevenyag@gmail.com>

* test(mito2): remove obsolete batch source helper

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused plain batch

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-04-10 03:12:33 +00:00
Yingwen
0e22d6a72b feat: implement partition range cache stream (#7842)
* feat: add cache stream helpers, key construction, config wiring, and metrics for partition range cache

Add range result cache size config field and wire it through cache builder
chains. Implement cache key building (build_range_cache_key), stream
replay/store helpers (cached_flat_range_stream, cache_flat_range_stream),
dictionary compaction (compact_pk_dictionary), and partition range row group
collection. Add range cache metrics (size, hit, miss) to ScanMetricsSet
and PartitionMetrics. Move fingerprint tests from scan_region to
range_cache module. These functions are not yet wired into scan execution.

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add benchmark for cache stream

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move bench_util to test_util

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: share dict

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test ptr_eq

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify value array handling

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add todo for estimate size

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: simplify size calculation

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove one test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update config test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address review comment

Only ignore exprs that can extract time ranges

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-24 10:01:13 +00:00
Lei, HUANG
be4a7a6d37 refactor: remove Memtable::iter (#7809)
* refactor: remove Memtable::iter

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: review comments

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-03-16 07:49:31 +00:00
Yingwen
04cd2c8a05 feat: flat read path support primary_key format memtables (#7759)
* feat: add adapter for batch to flat recordbatch

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support batch to flat record batch in MemtableRange

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: address review issues for BatchToRecordBatchAdapter

- Extract duplicated read_column_ids computation into a shared
  `read_column_ids_from_projection` helper function
- Cache `FormatProjection` in `BatchToRecordBatchContext::new()` instead
  of recomputing it on every `adapt_iter()` call
- Remove unnecessary `Arc` wrapping of `read_column_ids` in
  `SimpleBulkMemtable::ranges()`
- Fix clippy `filter_map_bool_then` warning in `batch_adapter.rs`

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: simplify comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): use read column ids in batch adapter

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test build_record_batch_iter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test build_record_batch_iter for all old memtables

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: prune time range before adapter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: share BatchToRecordBatchContext in simple_bulk_memtable.rs

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: use ScalarValue::to_array_of_size to build repeated value array

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-10 12:46:39 +00:00
LFC
5eac4f10aa chore: remove dependency on "atty" (#7725)
Signed-off-by: luofucong <luofc@foxmail.com>
2026-02-26 09:58:01 +00:00
Yingwen
7711661618 feat: BulkMemtable compact parts without encoding into Parquet (#7617)
* feat: implement MultiBulkPart to hold a list of batches in BulkMemtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: Only encode parts when there are enough rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: merge MultiBulkPartIter and BulkPartBatchIter

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove some enums and structs

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reuse code in merging bulk/encoded parts

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: collect part groups directly

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add unit tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: enlarge merge threshold and configure by env

- GREPTIME_BULK_MERGE_THRESHOLD
- GREPTIME_BULK_ENCODE_ROW_THRESHOLD
- GREPTIME_BULK_ENCODE_BYTES_THRESHOLD

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: change flush strategy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add BulkMemtableConfig

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: limit max groups and adjust threshold

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add flush file number metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add bulk filter 1 host bench

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: adjust bulk compact threshold

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: flush a file if == min_flush_rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix test_index_build_type_compact test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix mito tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: remove regions from catchup_regions before notify

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-30 08:03:36 +00:00
Yingwen
a22d08f1b1 feat: collect merge and dedup metrics (#7375)
* feat: collect FlatMergeReader metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add MergeMetricsReporter, rename Metrics to MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_input_rows from MergeMetrics

The merge reader won't dedup so there is no need to collect input rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: report merge metrics to PartitionMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add dedup cost to DedupMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove metrics from FlatMergeIterator

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_output_rows from MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement merge() for merge and dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: report metrics after observe metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-10 09:16:20 +00:00
Yingwen
3001c2d719 feat: BulkMemtable stores small fragments in another buffer (#7164)
* feat: buffer small parts in bulk memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use assert_eq instead of assert

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: collect bulk memtable scan metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: report metrics early

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-11-05 06:35:32 +00:00
Yingwen
acf38a7091 fix: avoid filtering rows with delete op by fields under merge mode (#7154)
* chore: clear allow dead_code for flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: pass exprs to build appliers

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: split field filters and index appliers

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support skip filtering fields in RowGroupPruningStats

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add PreFilterMode to config whether to skip filtering fields

Adds the PreFilterMode to the RangeBase and sets it in
ParquetReaderBuilder

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support skipping fields in prune reader

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support pre filter mode in bulk memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: pass PreFilterMode to memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test mito filter delete

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove commented code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move predicate and sequence to RangesOptions

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* ci: skip cargo gc

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix cargo build warning

Signed-off-by: evenyag <realevenyag@gmail.com>

* Revert "ci: skip cargo gc"

This reverts commit 1ec9594a6d.

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-10-30 12:14:45 +00:00
Yingwen
cff9cb6327 feat: converts batches in old format to the flat format in query time (#6987)
* feat: use correct projection index for old format

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove allow dead_code from format

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: check and convert old format to flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: sub primary key num from projection

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: always convert the batch in FlatRowGroupReader

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: Change &Option<&[]> to Option<&[]>

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: only build arrow schema once

adds a method flat_sst_arrow_schema_column_num() to get the field num

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: Handle flat format and old format separately

Adds two structs ParquetFlat and ParquetPrimaryKeyToFlat.
ParquetPrimaryKeyToFlat delegates stats and projection to the
PrimaryKeyReadFormat.

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle non string tag correctly

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: do not register file cache twice

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: clean temp files

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add rows and bytes to flush success log

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: convert format in memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: add compaction flag to ScanInput

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: compaction should use old format for sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: merge schema use old format in sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: reads legacy format but not convert if skip_auto_convert

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: suppport sparse encoding in bulk parts

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-09-25 06:42:22 +00:00
Yingwen
9c22092189 feat: Implements compaction for bulk memtable (#6923)
* feat: initial bulk memtable compaction implementation

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement compact for memtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for bulk memtable compaction

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address review comments

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-09-09 09:15:25 +00:00
Ruihang Xia
c9377e7c5a build: bump rust edition to 2024 (#6920)
* bump edition

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* format

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* gen keyword

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* lifetime and env var

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* one more gen fix

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* lifetime of temporaries in tail expressions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* format again

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clippy nested if

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clippy let and return

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-09-08 02:37:18 +00:00
Yingwen
8fc3a9a9d7 feat: Implements async FlatMergeReader and FlatDedupReader (#6761)
* refactor: Add Flat prefix to MergeIterator and DedupIterator

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement MergeReader for RecordBatch

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use GenericNode for IterNode

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: flat merge reader to stream

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement FlatDedupReader

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add a benchmark for FlatMergeIterator

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename plain_projection to flat_projection

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-08-19 06:30:59 +00:00
Yingwen
39e2f122eb feat: EncodedBulkPartIter iters flat format and returns RecordBatch (#6655)
* feat: implements iter to read bulk part

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: BulkPartEncoder encodes BulkPart instead of mutation

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-08-06 06:50:01 +00:00
Yingwen
50f7f61fdc feat: Implements an iterator to read the RecordBatch in BulkPart (#6647)
* feat: impl RecordBatchIter for BulkPart

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename BulkPartIter to EncodedBulkPartIter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add iter benchmark

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: filter by primary key columns

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move struct definitions

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: bulk iter for flat schema

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: iter filter benchmark

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: use corrent sequence array to compare

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove RecordBatchIter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: apply projection first

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comment

No need to check number of rows after filter

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-08-05 08:11:28 +00:00
Yingwen
52466fdd92 feat: Implement a converter to converts KeyValues into BulkPart (#6620)
* chore: add api to memtable to check bulk capability

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: Add a converter to convert KeyValues into BulkPart

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: move supports_bulk_insert to MemtableBuilder

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: benchmark

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use write_bulk if the memtable benefits from it

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test BulkPartConverter

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add a flag to store unencoded primary keys

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: cache schema for converter

Implements to_flat_sst_arrow_schema

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: simplify tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: don't use bulk convert branch now

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address review comments

* simplify primary_key_column_builders check
* return error if value is not string

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add FlatSchemaOptions::from_encoding and test sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-08-01 07:59:11 +00:00
Lei, HUANG
c3201c32c3 feat(mito): replace Memtable::iter with Memtable::ranges (#6549)
* bulk-multiparts-merge-reader:
 **Enhance Memtable Iteration and Flushing Logic**

 - **`flush.rs`**: Updated `RegionFlushTask` to handle multiple ranges using `MergeReaderBuilder` for improved source management during flush operations.
 - **`memtable.rs`**: Introduced `build_prune_iter` and `build_iter` methods in `MemtableRange` for flexible iteration. Added `MemtableRanges` struct to manage multiple contexts.
 - **`simple_bulk_memtable.rs`**: Refactored to use `BatchIterBuilder` and `BatchIterBuilderDeprecated` for iteration, supporting new `read_to_values` method in `Series`.
 - **`time_series.rs`**: Added `read_to_values` and `finish_cloned` methods in `Series` and `ValueBuilder` for efficient data handling.
 - **`scan_util.rs`**: Replaced `build_iter` with `build_prune_iter` for range iteration, enhancing scan utility.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 - **Add Rayon for Parallel Processing**: Introduced `rayon` for parallel processing in `simple_bulk_memtable.rs` and updated `Cargo.toml` and `Cargo.lock` to include `rayon` dependency.
 - **Enhance Benchmarking**: Added new benchmarks in `simple_bulk_memtable.rs` to compare parallel vs sequential processing, projection, sequence filtering, and write performance.
 - **Make Structs and Methods Public**: Changed visibility of several structs and methods to `pub` in `simple_bulk_memtable.rs`, `memtable.rs`, `time_series.rs`, and `test_util.rs` to facilitate testing and benchmarking.
 - **Update Criterion Features**: Modified `Cargo.toml` to include `html_reports` feature for `criterion`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 ### Commit Summary

 - **Refactor `SimpleBulkMemtable`**:
   - Moved `ranges_sequential` function to a new `test_only` module and made it a method of `SimpleBulkMemtable`.
   - Made several fields in `SimpleBulkMemtable` private and added a `region_metadata` getter.
   - Affected files: `simple_bulk_memtable.rs`, `test_only.rs`.

 - **Benchmark Adjustments**:
   - Updated benchmark functions to use the new `ranges_sequential` method.
   - Affected file: `simple_bulk_memtable.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 ### Add Test Configuration for `iter` Method in Memtable Implementations

 - **Enhancements**:
   - Added `#[cfg(any(test, feature = "test"))]` attribute to the `iter` method in various `Memtable` implementations to enable conditional compilation for testing purposes.
   - Affected files:
     - `src/mito2/src/memtable.rs`
     - `src/mito2/src/memtable/bulk.rs`
     - `src/mito2/src/memtable/partition_tree.rs`
     - `src/mito2/src/memtable/simple_bulk_memtable.rs`
     - `src/mito2/src/memtable/time_series.rs`
     - `src/mito2/src/test_util/memtable_util.rs`

 - **Benchmark Adjustments**:
   - Removed `black_box` usage in `bench_memtable_write_performance` function to streamline benchmarking.
   - Affected file: `src/mito2/benches/simple_bulk_memtable.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 **Enhance Async Support and Refactor Iteration in `mito2`**

 - **Add Async Features**: Updated `Cargo.toml` to include `async` and `async_tokio` features for `criterion`.
 - **Async Iteration**: Introduced async functions `flush` and `flush_original` in `simple_bulk_memtable.rs` to handle memtable flushing using async iterators.
 - **Refactor Iteration Logic**: Moved `create_iter` and `BatchIterBuilderDeprecated` to `test_only.rs` for better separation of concerns.
 - **Public API Change**: Made `next_batch` in `read.rs` public to support async batch processing.
 - **Benchmark Updates**: Modified benchmarks in `simple_bulk_memtable.rs` to use async runtime for performance testing.

 Files affected: `Cargo.toml`, `simple_bulk_memtable.rs`, `test_only.rs`, `read.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 **Enhance Benchmarking for Memtable**

 - Refactored `create_large_memtable` to `create_memtable_with_rows` in `simple_bulk_memtable.rs` to allow dynamic row count configuration.
 - Introduced parameterized benchmarking in `bench_ranges_parallel_vs_sequential` to test various row counts, improving the flexibility and coverage of performance tests.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 ### Enhance Memory Management and Public API

 - **`builder.rs`**: Made `next_offset` method public to allow external access to offset calculations.
 - **`simple_bulk_memtable.rs`**: Simplified the `series.extend` method by removing the iterator conversion for `fields`.
 - **`time_series.rs`**:
   - Added `can_accommodate` method to `ValueBuilder` to check if fields can be accommodated without offset overflow.
   - Modified `extend` method to use a `Vec` for `fields` instead of an iterator, improving memory management and error handling.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 Add License and Enhance Testing in `simple_bulk_memtable.rs`

 - Added Apache License header to `simple_bulk_memtable.rs`.
 - Modified test configuration in `simple_bulk_memtable.rs` to include `any(test, feature = "test")`.
 - Introduced a new test `test_write_read_large_string` in `simple_bulk_memtable.rs` to verify handling of large strings.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 Update `Cargo.toml` dependencies

 - Adjust features for `common-meta` and `mito-codec` to include "testing".
 - Maintain `criterion` version and features for async support.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 ### Update Predicate Type in Memtable Iterators

 - **Files Modified**:
   - `src/mito2/src/memtable.rs`
   - `src/mito2/src/memtable/bulk.rs`
   - `src/mito2/src/memtable/simple_bulk_memtable.rs`

 - **Key Changes**:
   - Updated the `iter` method in `Memtable` trait and its implementations to use `Option<table::predicate::Predicate>` instead of `Option<Predicate>`.
   - Adjusted return type in `BulkMemtable`'s `iter` method to `Result<crate::memtable::BoxedBatchIterator>`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 **Enhance Memtable Functionality**

 - **`memtable.rs`**:
   - Added `Clone` trait to `MemtableStats` and made `num_ranges` public.
   - Introduced `num_rows` field in `MemtableRange` and updated its constructor.
   - Added `num_rows` method to `MemtableRange`.

 - **`partition_tree.rs`, `simple_bulk_memtable.rs`, `time_series.rs`**:
   - Updated `MemtableRange` instantiation to include `num_rows`.

 - **`range.rs`**:
   - Refactored `MemRangeBuilder` to handle a single `MemtableRange` and `MemtableStats`.

 - **`scan_region.rs`**:
   - Enhanced memtable filtering based on time range and updated `MemRangeBuilder` usage.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 **Enhancements and Bug Fixes**

 - **Deduplication Enhancements**:
   - Introduced `DedupReader` and `LastRow` as public structs in `dedup.rs` to enhance deduplication capabilities.
   - Added `LastNonNull` deduplication strategy in `flush.rs` and `simple_bulk_memtable.rs`.

 - **Memtable Improvements**:
   - Updated `SimpleBulkMemtable` to support batch size configuration and deduplication strategies.
   - Modified `Series` struct in `time_series.rs` to include a configurable capacity.

 - **Testing Enhancements**:
   - Added new test `test_write_dedup` in `simple_bulk_memtable.rs` to verify deduplication functionality.
   - Updated existing tests to include `OpType` parameter for better operation type handling.

 - **Refactoring**:
   - Renamed `BatchIterBuilder` to `BatchRangeBuilder` in `simple_bulk_memtable.rs` for clarity.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* bulk-multiparts-merge-reader:
 - **Refactor `flush.rs`:** Removed `LastNonNullIter` usage and adjusted `DedupReader` instantiation to use `LastRow::new(false)` and `LastNonNull::new(false)`.
 - **Enhance `simple_bulk_memtable.rs`:** Added logic to handle `LastNonNull` merge mode in `IterBuilder`. Introduced new tests: `test_delete_only` and `test_single_range` to verify delete operations and single range handling.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: tests

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-07-25 03:05:54 +00:00
Yingwen
eaf1e1198f refactor: Extract mito codec part into a new crate (#6307)
* chore: add a new crate mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: port necessary mods for primary key codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use codec utils in mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused mods

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove Partition::is_partition_column()

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove duplicated test utils

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fix is_partition_column check

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-06-13 07:14:29 +00:00
Lei, HUANG
4e615e8906 feat(wal): support bulk wal entries (#6178)
* feat/bulk-wal:
 ### Refactor: Simplify Data Handling in LogStore Implementations

 - **`kafka/log_store.rs`, `raft_engine/log_store.rs`, `wal.rs`, `raw_entry_reader.rs`, `logstore.rs`:**
   - Refactored `entry` and `build_entry` functions to accept `Vec<u8>` directly instead of `&mut Vec<u8>`.
   - Removed usage of `std::mem::take` for data handling, simplifying the code and improving readability.
   - Updated test cases to align with the new function signatures.

* feat/bulk-wal:
 ### Add Support for Bulk WAL Entries and Flight Data Encoding

 - **Add `raw_data` field to `BulkPart` and related structs**: Updated `BulkPart` and related structures in `src/mito2/src/memtable/bulk/part.rs`, `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/region_write_ctx.rs`,
 `src/mito2/src/worker/handle_bulk_insert.rs`, and `src/store-api/src/region_request.rs` to include a new `raw_data` field for handling Arrow IPC data.
 - **Implement Flight Data Encoding**: Added a new module `flight` in `src/common/test-util/src/flight.rs` to encode record batches to Flight data format.
 - **Update `greptime-proto` dependency**: Changed the revision of the `greptime-proto` dependency in `Cargo.lock` and `Cargo.toml`.
 - **Enhance WAL Writer and Tests**: Modified `src/mito2/src/wal.rs` and related test files to support bulk WAL entries and added tests for encoding and handling bulk data.

* feat/bulk-wal:
 - **Update `greptime-proto` Dependency**: Updated the `greptime-proto` dependency to a new revision in `Cargo.lock` and `Cargo.toml`.
 - **Add `common-grpc` Dependency**: Added `common-grpc` as a dependency in `Cargo.lock` and `src/mito2/Cargo.toml`.
 - **Refactor `BulkPart` Structure**: Removed `num_rows` field and added `num_rows()` method in `src/mito2/src/memtable/bulk/part.rs`. Updated related usages in `src/mito2/src/memtable/simple_bulk_memtable.rs`, `src/mito2/src/memtable/time_partition.rs`, `src/mito2/src/memtable/time_series.rs`,
 `src/mito2/src/region_write_ctx.rs`, and `src/mito2/src/worker/handle_bulk_insert.rs`.
 - **Implement `TryFrom` and `From` for `BulkWalEntry`**: Added implementations for converting between `BulkPart` and `BulkWalEntry` in `src/mito2/src/memtable/bulk/part.rs`.
 - **Handle Bulk Entries in Region Opener**: Added logic to process bulk entries in `src/mito2/src/region/opener.rs`.
 - **Fix `BulkInsertRequest` Handling**: Corrected `region_id` handling in `src/operator/src/bulk_insert.rs` and `src/store-api/src/region_request.rs`.
 - **Add Error Variant for `ConvertBulkWalEntry`**: Added a new error variant in `src/mito2/src/error.rs` for handling bulk WAL entry conversion errors.

* fix: ci

* feat/bulk-wal:
 Add bulk write operation in `opener.rs`

 - Enhanced the region write context by adding a call to `write_bulk()` after `write_memtable()` in `opener.rs`.
 - This change aims to improve the efficiency of writing operations by enabling bulk writes.

* feat/bulk-wal:
 Enhance error handling and metrics in `bulk_insert.rs`

 - Updated `Inserter` to improve error handling by capturing the result of `datanode.handle(request)` and incrementing the `DIST_INGEST_ROW_COUNT` metric with the number of affected rows.

* feat/bulk-wal:
 ### Remove Encode Error Handling for WAL Entries

 - **`error.rs`**: Removed the `EncodeWal` error variant and its associated handling.
 - **`wal.rs`**: Eliminated the `entry_encode_buf` buffer and its usage for encoding WAL entries. Replaced with direct encoding to a vector using `encode_to_vec()`.
2025-05-29 09:10:30 +00:00
Lei, HUANG
4b71e493f7 feat!: revise compaction picker (#6121)
* - **Refactor `RegionFilePathFactory` to `RegionFilePathProvider`:** Updated references and implementations in `access_layer.rs`, `write_cache.rs`, and related test files to use the new struct name.
 - **Add `max_file_size` support in compaction:** Introduced `max_file_size` option in `PickerOutput`, `SerializedPickerOutput`, and `WriteOptions` in `compactor.rs`, `picker.rs`, `twcs.rs`, and `window.rs`.
 - **Enhance Parquet writing logic:** Modified `parquet.rs` and `parquet/writer.rs` to support optional `max_file_size` and added a test case `test_write_multiple_files` to verify writing multiple files based on size constraints.

 **Refactor Parquet Writer Initialization and File Handling**
 - Updated `ParquetWriter` in `writer.rs` to handle `current_indexer` as an `Option`, allowing for more flexible initialization and management.
 - Introduced `finish_current_file` method to encapsulate logic for completing and transitioning between SST files, improving code clarity and maintainability.
 - Enhanced error handling and logging with `debug` statements for better traceability during file operations.

 - **Removed Output Size Enforcement in `twcs.rs`:**
   - Deleted the `enforce_max_output_size` function and related logic to simplify compaction input handling.

 - **Added Max File Size Option in `parquet.rs`:**
   - Introduced `max_file_size` in `WriteOptions` to control the maximum size of output files.

 - **Refactored Indexer Management in `parquet/writer.rs`:**
   - Changed `current_indexer` from an `Option` to a direct `Indexer` type.
   - Implemented `roll_to_next_file` to handle file transitions when exceeding `max_file_size`.
   - Simplified indexer initialization and management logic.

 - **Refactored SST File Handling**:
   - Introduced `FilePathProvider` trait and its implementations (`WriteCachePathProvider`, `RegionFilePathFactory`) to manage SST and index file paths.
   - Updated `AccessLayer`, `WriteCache`, and `ParquetWriter` to use `FilePathProvider` for path management.
   - Modified `SstWriteRequest` and `SstUploadRequest` to use path providers instead of direct paths.
   - Files affected: `access_layer.rs`, `write_cache.rs`, `parquet.rs`, `writer.rs`.

 - **Enhanced Indexer Management**:
   - Replaced `IndexerBuilder` with `IndexerBuilderImpl` and made it async to support dynamic indexer creation.
   - Updated `ParquetWriter` to handle multiple indexers and file IDs.
   - Files affected: `index.rs`, `parquet.rs`, `writer.rs`.

 - **Removed Redundant File ID Handling**:
   - Removed `file_id` from `SstWriteRequest` and `CompactionOutput`.
   - Updated related logic to dynamically generate file IDs where necessary.
   - Files affected: `compaction.rs`, `flush.rs`, `picker.rs`, `twcs.rs`, `window.rs`.

 - **Test Adjustments**:
   - Updated tests to align with new path and indexer management.
   - Introduced `FixedPathProvider` and `NoopIndexBuilder` for testing purposes.
   - Files affected: `sst_util.rs`, `version_util.rs`, `parquet.rs`.

* chore: rebase main

* feat/multiple-compaction-output:
 ### Add Benchmarking and Refactor Compaction Logic

 - **Benchmarking**: Added a new benchmark `run_bench` in `Cargo.toml` and implemented benchmarks in `benches/run_bench.rs` using Criterion for `find_sorted_runs` and `reduce_runs` functions.
 - **Compaction Module Enhancements**:
   - Made `run.rs` public and refactored the `Ranged` and `Item` traits to be public.
   - Simplified the logic in `find_sorted_runs` and `reduce_runs` by removing `MergeItems` and related functions.
   - Introduced `find_overlapping_items` for identifying overlapping items.
 - **Code Cleanup**: Removed redundant code and tests related to `MergeItems` in `run.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Add Benchmarks

 - **Compaction Logic Improvements**:
   - Updated `reduce_runs` function in `src/mito2/src/compaction/run.rs` to remove the target parameter and improve the logic for selecting files to merge based on minimum penalty.
   - Enhanced `find_overlapping_items` to handle unsorted inputs and improve overlap detection efficiency.

 - **Benchmark Enhancements**:
   - Added `bench_find_overlapping_items` in `src/mito2/benches/run_bench.rs` to benchmark the new `find_overlapping_items` function.
   - Extended existing benchmarks to include larger data sizes.

 - **Testing Enhancements**:
   - Updated tests in `src/mito2/src/compaction/run.rs` to reflect changes in `reduce_runs` and added new tests for `find_overlapping_items`.

 - **Logging and Debugging**:
   - Improved logging in `src/mito2/src/compaction/twcs.rs` to provide more detailed information about compaction decisions.

* feat/multiple-compaction-output:
 ### Refactor and Enhance Compaction Logic

 - **Refactor `find_overlapping_items` Function**: Changed the function signature to accept slices instead of mutable vectors in `run.rs`.
 - **Rename and Update Struct Fields**: Renamed `penalty` to `size` in `SortedRun` struct and updated related logic in `run.rs`.
 - **Enhance `reduce_runs` Function**: Improved logic to sort runs by size and limit probe runs to 100 in `run.rs`.
 - **Add `merge_seq_files` Function**: Introduced a new function `merge_seq_files` in `run.rs` for merging sequential files.
 - **Modify `TwcsPicker` Logic**: Updated the compaction logic to use `merge_seq_files` when only one run is found in `twcs.rs`.
 - **Remove `enforce_file_num` Function**: Deleted the `enforce_file_num` function and its related test cases in `twcs.rs`.

* feat/multiple-compaction-output:
 ### Enhance Compaction Logic and Testing

 - **Add `merge_seq_files` Functionality**: Implemented the `merge_seq_files` function in `run.rs` to optimize file merging based on scoring systems. Updated
 benchmarks in `run_bench.rs` to include `bench_merge_seq_files`.
 - **Improve Compaction Strategy in `twcs.rs`**: Modified the compaction logic to handle file merging more effectively, considering file size and overlap.
 - **Update Tests**: Enhanced test coverage in `compaction_test.rs` and `append_mode_test.rs` to validate new compaction logic and file merging strategies.
 - **Remove Unused Function**: Deleted `new_file_handles` from `test_util.rs` as it was no longer needed.

* feat/multiple-compaction-output:
 ### Refactor TWCS Compaction Options

 - **Refactor Compaction Logic**: Simplified the TWCS compaction logic by replacing multiple parameters (`max_active_window_runs`, `max_active_window_files`, `max_inactive_window_runs`, `max_inactive_window_files`) with a single `trigger_file_num` parameter in `picker.rs`, `twcs.rs`, and `options.rs`.
 - **Update Tests**: Adjusted test cases to reflect the new compaction logic in `append_mode_test.rs`, `compaction_test.rs`, `filter_deleted_test.rs`, `merge_mode_test.rs`, and various test files under `tests/cases`.
 - **Modify Engine Options**: Updated engine option keys to use `trigger_file_num` in `mito_engine_options.rs` and `region_request.rs`.
 - **Fuzz Testing**: Updated fuzz test generators and translators to accommodate the new compaction parameter in `alter_expr.rs` and related files.

 This refactor aims to streamline the compaction configuration by reducing the number of parameters and simplifying the codebase.

* chore: add trailing space

* fix license header

* feat/revise-compaction-picker:
 **Limit File Processing and Optimize Merge Logic in `run.rs`**

 - Introduced a limit to process a maximum of 100 files in `merge_seq_files` to control time complexity.
 - Adjusted logic to calculate `target_size` and iterate over files using the limited set of files.
 - Updated scoring calculations to use the limited file set, ensuring efficient file merging.

* feat/revise-compaction-picker:
 ### Add Compaction Metrics and Remove Debug Logging

 - **Compaction Metrics**: Introduced new histograms `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` to track compaction input and output file sizes in `metrics.rs`. Updated `compactor.rs` to observe these metrics during the compaction process.
 - **Logging Cleanup**: Removed debug logging of file ranges during the merge process in `twcs.rs`.

* feat/revise-compaction-picker:
 ## Enhance Compaction Logic and Metrics

 - **Compaction Logic Improvements**:
   - Added methods `input_file_size` and `output_file_size` to `MergeOutput` in `compactor.rs` to streamline file size calculations.
   - Updated `Compactor` implementation to use these methods for metrics tracking.
   - Modified `Ranged` trait logic in `run.rs` to improve range comparison.
   - Enhanced test cases in `run.rs` to reflect changes in compaction logic.

 - **Metrics Enhancements**:
   - Changed `COMPACTION_INPUT_BYTES` and `COMPACTION_OUTPUT_BYTES` from histograms to counters in `metrics.rs` for better performance tracking.

 - **Debugging and Logging**:
   - Added detailed logging for compaction pick results in `twcs.rs`.
   - Implemented custom `Debug` trait for `FileMeta` in `file.rs` to improve debugging output.

 - **Testing Enhancements**:
   - Added new test `test_compaction_overlapping_files` in `compaction_test.rs` to verify compaction behavior with overlapping files.
   - Updated `merge_mode_test.rs` to reflect changes in file handling during scans.

* feat/revise-compaction-picker:
 ### Update `FileHandle` Debug Implementation

 - **Refactor Debug Output**: Simplified the `fmt::Debug` implementation for `FileHandle` in `src/mito2/src/sst/file.rs` by consolidating multiple fields into a single `meta` field using `meta_ref()`.
 - **Atomic Operations**: Updated the `deleted` field to use atomic loading with `Ordering::Relaxed`.

* Trigger CI

* feat/revise-compaction-picker:
 **Update compaction logic and default options**

 - **`twcs.rs`**: Enhanced logging for compaction pick results by improving the formatting for better readability.
 - **`options.rs`**: Modified the default `max_output_file_size` in `TwcsOptions` from 2GB to 512MB to optimize file handling and performance.

* feat/revise-compaction-picker:
 Refactor `find_overlapping_items` to use an external result vector

 - Updated `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to accept a mutable result vector instead of returning a new vector, improving memory efficiency.
 - Modified benchmarks in `src/mito2/benches/bench_compaction_picker.rs` to accommodate the new function signature.
 - Adjusted tests in `src/mito2/src/compaction/run.rs` to use the updated function signature, ensuring correct functionality with the new approach.

* feat/revise-compaction-picker:
 Improve file merging logic in `run.rs`

 - Refactor the loop logic in `merge_seq_files` to simplify the iteration over file groups.
 - Adjust the range for `end_idx` to include the endpoint, allowing for more flexible group selection.
 - Remove the condition that skips groups with only one file, enabling more comprehensive processing of file sequences.

* feat/revise-compaction-picker:
 Enhance `find_overlapping_items` with `SortedRun` and Update Tests

 - Refactor `find_overlapping_items` in `src/mito2/src/compaction/run.rs` to utilize the `SortedRun` struct for improved efficiency and clarity.
 - Introduce a `sorted` flag in `SortedRun` to optimize sorting operations.
 - Update test cases in `src/mito2/benches/bench_compaction_picker.rs` to accommodate changes in `find_overlapping_items` by using `SortedRun`.
 - Add `From<Vec<T>>` implementation for `SortedRun` to facilitate easy conversion from vectors.

* feat/revise-compaction-picker:
 **Enhancements in `compaction/run.rs`:**

 - Added `ReadableSize` import to handle size calculations.
 - Modified the logic in `merge_seq_files` to clamp the calculated target size to a maximum of 2GB when `max_file_size` is not provided.

* feat/revise-compaction-picker: Add Default Max Output Size Constant for Compaction

Introduce DEFAULT_MAX_OUTPUT_SIZE constant to define the default maximum compaction output file size as 2GB. Refactor the merge_seq_files function to utilize this constant, ensuring consistent and maintainable code for handling file size limits during compaction.
2025-05-23 03:29:08 +00:00
Lei, HUANG
5a9023d6b3 feat(bulk): write to multiple time partitions (#6086)
* add benchmark for splitting according to time partition

* feat/write-to-multiple-time-partitions:
 **Enhancements to Bulk Processing and Time Partitioning**

 - **`part.rs`**: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns.
 - **`simple_bulk_memtable.rs`**: Updated tests to include `timestamp_index` initialization.
 - **`time_partition.rs`**: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`.
 - **`handle_bulk_insert.rs`**: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning Logic

 - **`time_partition.rs`**:
   - Introduced `HashSet` for efficient partition management.
   - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions.
   - Updated `get_or_create_time_partition` to manage partition creation.
   - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges.

 - **Tests**:
   - Added `test_find_partitions_by_time_range` to validate new partitioning logic.
   - Updated `test_split_record_batch` to ensure correct record batch splitting behavior.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partitioning and Testing in `time_partition.rs`

 - **Time Partitioning Enhancements**:
   - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`.
   - Improved filtering logic for timestamp arrays to support various time units.

 - **Testing Enhancements**:
   - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`.
   - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning.

 - **Imports and Dependencies**:
   - Added necessary imports for new timestamp array types and testing utilities.

* feat/write-to-multiple-time-partitions:
 ### Refactor and Enhance Time Partition Filtering

 - **Refactor Filtering Logic**: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability.
 - **Enhance `BulkPart` Struct**: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`.
 - **Rename Function**: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`.
 - **Add Feature Flag**: Introduced `int_roundings` feature in `lib.rs` to support new functionality.

* refactor tests

* feat/write-to-multiple-time-partitions:
 Improve timestamp handling in `time_partition.rs`

 - Enhanced safety comments for timestamp conversion to ensure clarity.
 - Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations.
 - Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`.

* feat/write-to-multiple-time-partitions:
 **Refactor timestamp handling and add utility function**

 - **Refactor `time_partition.rs`:** Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion.
 - **Enhance `metadata.rs`:** Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code.

* feat/write-to-multiple-time-partitions:
 Refactor time partition variable names in `time_partition.rs`

 - Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`.
 - Updated related logic to use new variable names for improved readability and maintainability.

* feat/write-to-multiple-time-partitions:
 **Refactor variable names in `time_partition.rs`**

 - Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency.
 - Modified function calls and loop iterations to align with the new variable names.
 - Affected file: `src/mito2/src/memtable/time_partition.rs`

* feat/write-to-multiple-time-partitions:
 ### Refactor variable names in `time_partition.rs`

 - Updated variable names for clarity in `time_partition.rs`:
   - Renamed `matchings` to `matching_parts`
   - Renamed `missings` to `missing_parts`
 - Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`.

* feat/write-to-multiple-time-partitions:
 ### Enhance Time Partition Handling

 - **`time_partition.rs`**:
   - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks.
   - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`).
   - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling.

---------

Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>
2025-05-14 05:09:59 +00:00
fys
2b2ea5bf72 chore: upgrade some dependencies (#5777)
* chore: upgrade some dependencies

* chore: upgrade some dependencies

* fix: cr

* fix: ci

* fix: test

* fix: cargo fmt
2025-03-27 02:48:44 +00:00
Weny Xu
965a48656f feat(metric-engine): introduce RowModifier for MetricEngine (#5380)
* feat(metric-engine): store physical table ColumnIds in `MetricEngineState`

* feat(metric-engine): introduce `RowModifier` for MetricEngine

* chore: upgrade greptime-proto

* feat: introduce `WriteHint` to `RegionPutRequest`

* chore: apply suggestions from CR

* chore: udpate greptime-proto

* chore: apply suggestions from CR

* chore: add comments

* chore: update proto
2025-01-22 05:16:44 +00:00
discord9
758aef39d8 feat: filter batch by sequence in memtable (#5367)
* feat: add seq field

* feat: filter by sequence

* chore: per review

* docs: explain why not prune

* chore: correct doc

* test: test filter by seq
2025-01-16 04:44:28 +00:00
Weny Xu
b64c075cdb feat: introduce PrimaryKeyEncoding (#5312)
* feat: introduce `PrimaryKeyEncoding`

* fix: fix unit tests

* chore: add empty line

* test: add unit tests

* chore: fmt code

* refactor: introduce new codec trait to support various encoding

* fix: fix unit tests

* chore: update sqlness result

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2025-01-15 06:16:53 +00:00
Yingwen
10b7a3d24d feat: Implements merge_mode region options (#4208)
* feat: add update_mode to region options

* test: add test

* feat: last not null iter

* feat: time series last not null

* feat: partition tree update mode

* feat: partition tree

* fix: last not null iter slice

* test: add test for compaction

* test: use second resolution

* style: fix clippy

* chore: merge two lines

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>

* chore: address CR comments

* refactor: UpdateMode -> MergeMode

* refactor: LastNotNull -> LastNonNull

* chore: return None earlier

* feat: validate region options

make merge mode optional and use default while it is None

* test: fix tests

---------

Co-authored-by: Jeremyhi <jiachun_feng@proton.me>
2024-06-27 07:52:58 +00:00
maco
40c585890a refactor: replace Expr with datafusion::Expr (#3995)
* refactor: replace Expr with datafusion::Expr

* fix: fmt-toml

* fix: cr comment
2024-05-21 06:40:29 +00:00
Yingwen
39b69f1e3b refactor!: Renames the new memtable to PartitionTreeMemtable (#3547)
* refactor: rename mod merge_tree to partition_tree

* refactor: rename merge_tree

* refactor: change merge tree comment

* refactor: rename merge tree struct

* refactor: memtable options
2024-03-20 06:40:41 +00:00
Lei, HUANG
ddbcff68dd feat: support append-only mode in time-series memtable (#3540)
* feat: support append-only mode in time-series memtable

* fix: rename sort_and_dedup to sort
2024-03-19 20:37:54 +00:00
Yingwen
7c895e2605 perf: more benchmarks for memtables (#3491)
* chore: remove duplicate bench

* refactor: rename bench

* perf: add full scan bench for memtable

* feat: filter bench and add time series to bench group

* chore: comment

* refactor: rename

* style: fix clippy
2024-03-12 12:02:58 +00:00
Lei, HUANG
376409b857 feat: employ sparse key encoding for shard lookup (#3410)
* feat: employ short key encoding for shard lookup

* fix: license

* chore: simplify code

* refactor: only enable sparse encoding to speed lookup on metric engine

* fix: names
2024-03-01 06:22:15 +00:00