Commit Graph

857 Commits

Author SHA1 Message Date
Lei, HUANG
35c5a4adb7 fix(mito2): accept post-truncate flush for skip-wal tables (#7858)
Allow flush edits with equal entry ids when flushed sequence advances, so close-time flush after truncate still succeeds for skip-wal regions while stale pre-truncate flushes are rejected. Add a regression test for create->truncate->write->close timing.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-03-25 12:26:27 +00:00
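The acceptance rule described in #7858 can be sketched as a small predicate. This is a minimal sketch, assuming a region tracks a flushed entry id and a flushed sequence, and that a truncate resets both watermarks. FlushWatermark and accept_flush_edit are hypothetical names, not mito2 APIs.

```rust
/// Simplified, hypothetical view of a region's flush watermarks.
/// Field names are illustrative, not mito2's actual types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FlushWatermark {
    /// Last WAL entry id covered by the flush (stays constant for skip-wal regions).
    pub entry_id: u64,
    /// Last sequence number covered by the flush.
    pub sequence: u64,
}

/// Decides whether a flush edit may be applied on top of `current`.
///
/// A strictly larger entry id is always accepted. With an equal entry id
/// (the normal case for skip-wal regions, which never advance entry ids),
/// the edit is accepted only if it advances the flushed sequence, so a stale
/// flush that started before a truncate is still rejected.
pub fn accept_flush_edit(current: FlushWatermark, edit: FlushWatermark) -> bool {
    if edit.entry_id > current.entry_id {
        return true;
    }
    edit.entry_id == current.entry_id && edit.sequence > current.sequence
}
```

With watermarks (entry_id 0, sequence 10) after a truncate, a close-time flush at sequence 15 is accepted, while a stale pre-truncate flush at sequence 8 is rejected.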
Yingwen
04aa84af62 feat: use ArrowReaderBuilder instead of the RowGroups API (#7853)
* feat: use ArrowReaderBuilder instead of the RowGroups API

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: make row_group_idx required

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused variant

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: collect total_fetch_elapsed metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-25 03:10:19 +00:00
Yingwen
0e22d6a72b feat: implement partition range cache stream (#7842)
* feat: add cache stream helpers, key construction, config wiring, and metrics for partition range cache

Add range result cache size config field and wire it through cache builder
chains. Implement cache key building (build_range_cache_key), stream
replay/store helpers (cached_flat_range_stream, cache_flat_range_stream),
dictionary compaction (compact_pk_dictionary), and partition range row group
collection. Add range cache metrics (size, hit, miss) to ScanMetricsSet
and PartitionMetrics. Move fingerprint tests from scan_region to
range_cache module. These functions are not yet wired into scan execution.

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add benchmark for cache stream

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move bench_util to test_util

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: share dict

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test ptr_eq

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify value array handling

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add todo for estimate size

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: simplify size calculation

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove one test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update config test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address review comment

Only ignore exprs that can extract time ranges

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-24 10:01:13 +00:00
Yingwen
5231ee40c8 feat: add parquet pk prefilter helpers (#7850)
* feat: extract parquet pk prefilter helpers

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix warnings

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update todo

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-24 03:57:18 +00:00
Ruihang Xia
f999d5e70e feat: avoid some vector-array conversions on flat projection (#7804)
* perf(mito2): optimize flat projection conversion

* shrink the diff size

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* apply Gemini's suggestions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* nit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-03-24 00:11:37 +00:00
Lei, HUANG
7874282089 feat(mito): flat scan for time series memtable (#7814)
* feat/flat-for-time-series:
 ### Commit Message

 Enhance `TimeSeriesMemtable` with Record Batch Support

 - **`time_series.rs`**:
   - Introduced `BatchToRecordBatchContext` to facilitate conversion of batch iterators to record batch iterators.
   - Added `build_record_batch` method in `TimeSeriesIterBuilder` to support record batch creation.
   - Implemented multiple test cases to validate the functionality of record batch creation, including tests for projections,
 deduplication, sequence filtering, and data correctness.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/flat-for-time-series:
 Refactor `TimeSeriesMemtable` and `TimeSeriesIterBuilder`

 - Renamed `adapter_context` to `batch_to_record_batch` in `TimeSeriesMemtable` for clarity.
 - Simplified `MemtableRangeContext` initialization by removing the `batch_to_record_batch` parameter.
 - Added `is_record_batch` method to `TimeSeriesIterBuilder` to indicate record batch status.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/flat-for-time-series:
 ### Add Time Range Filtering and Predicate Group Enhancements

 - **`memtable.rs`**: Updated `IterBuilder` to include `time_range` parameter in `build_record_batch` method, enhancing record batch iteration with time range filtering.
 - **`time_series.rs`**: Modified `TimeSeriesIterBuilder` to use `PredicateGroup` instead of `Predicate`, and integrated `PruneTimeIterator` for time-based filtering.
 - **`memtable_util.rs`**: Removed unused `Predicate` import, reflecting changes in predicate handling.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-03-23 19:39:57 +00:00
Ruihang Xia
2af3951944 feat: cache decoded region metadata along with parquet metadata (#7813)
* cache decoded region metadata

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: account for decoded sst metadata cache weight

* take optional pre-existing metadata

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-03-19 03:09:47 +00:00
Yingwen
e0aadffb91 feat: add flat last row reader to the final stream (#7818)
Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-17 07:55:48 +00:00
Yingwen
5a37e58b4f feat(mito2): add partition range cache infrastructure (#7798)
* feat: add partition range cache infra

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: optimize scan request fingerprint cloning

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: merge loops

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: more docs

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update estimated size method and comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: only cache when we scan files

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: address PR review comments for partition range cache

- Remove TimeSeriesDistribution from fingerprint as it only affects yield order
- Disable range cache when dyn filters are present since they change at runtime

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-17 03:53:20 +00:00
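The two review fixes in #7798 suggest a fingerprint rule along these lines: fields that only change yield order are left out of the key, and dynamic filters disable caching entirely. This is a sketch under assumptions: ScanRequest here is a hypothetical, simplified struct, and std's DefaultHasher stands in for whatever hashing the real cache key uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified scan request; field names are illustrative.
pub struct ScanRequest {
    pub projection: Vec<usize>,
    pub filters: Vec<String>,
    /// Only affects the order in which rows are yielded, not which rows.
    pub distribution: Option<String>,
    /// Dynamic filters can change while the query runs.
    pub has_dyn_filters: bool,
}

/// Returns a cache fingerprint, or `None` when the range cache must be skipped.
pub fn range_cache_fingerprint(req: &ScanRequest) -> Option<u64> {
    if req.has_dyn_filters {
        // A cached result could be stale once the dynamic filter tightens.
        return None;
    }
    let mut hasher = DefaultHasher::new();
    req.projection.hash(&mut hasher);
    req.filters.hash(&mut hasher);
    // `distribution` is deliberately not hashed: two requests that differ
    // only in yield order can share the same cached ranges.
    Some(hasher.finish())
}
```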
Ning Sun
dd82fcac00 chore: update visibility of BatchToRecordBatchAdapter::new (#7817) 2026-03-16 09:56:34 +00:00
Lei, HUANG
be4a7a6d37 refactor: remove Memtable::iter (#7809)
* refactor: remove Memtable::iter

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: review comments

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-03-16 07:49:31 +00:00
Yingwen
e215851c8a refactor: unify flush and compaction to always use FlatSource (#7799)
* feat: support write flat as primary key format

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: migrate flush to always use FlatSource

Add FormatType propagation in SstWriteRequest and use it to choose
Flat vs PrimaryKey write paths (write_all_flat vs
write_all_flat_as_primary_key) in AccessLayer and WriteCache. Make
compactor and flush derive the sst_write_format from region options or
engine config. Simplify flush logic and remove the old memtable_source
helper. Update tests to set default sst_write_format.

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: compaction use flat source

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: read parquet sequentially as flat batches

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove new_batch_with_binary in favor of new_record_batch_with_binary

Replace PrimaryKeyWriteFormat with FlatWriteFormat in test_read_large_binary
test and use new_record_batch_with_binary directly, removing the now-unused
new_batch_with_binary function and its BinaryArray import.

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for PrimaryKeyWriteFormat::convert_flat_batch

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove Either from SstWriteRequest

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle index build mode

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: consider sparse encoding and last non null in flush

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add unit tests for field_column_start edge cases

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-13 09:44:13 +00:00
discord9
922f9cb3d6 feat: use dyn filter (#7545)
* parent b2074e3863
author discord9 <discord9@163.com> 1767869295 +0800
committer discord9 <discord9@163.com> 1772529023 +0800

feat: use dyn filter

Signed-off-by: discord9 <discord9@163.com>

not supported

Signed-off-by: discord9 <discord9@163.com>

refactor: use make_mut instead

Signed-off-by: discord9 <discord9@163.com>

refactor: rm need to clone stream ctx

Signed-off-by: discord9 <discord9@163.com>

r

Signed-off-by: discord9 <discord9@163.com>

pcr

Signed-off-by: discord9 <discord9@163.com>

test: wait for datafusion update

Signed-off-by: discord9 <discord9@163.com>

refactor: use arc swap for dyn filters

Signed-off-by: discord9 <discord9@163.com>

* test: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: comment out sqlness

Signed-off-by: discord9 <discord9@163.com>

* test: update sqlness

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness fix

Signed-off-by: discord9 <discord9@163.com>

* refactor: predicate without option

Signed-off-by: discord9 <discord9@163.com>

* feat: print dyn filters & more tests

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness vector result update

Signed-off-by: discord9 <discord9@163.com>

* chore: log

Signed-off-by: discord9 <discord9@163.com>

* test: properly redact

Signed-off-by: discord9 <discord9@163.com>

* test: better data dist for non empty dyn filter

Signed-off-by: discord9 <discord9@163.com>

* test: properly redacted

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* properly redact

Signed-off-by: discord9 <discord9@163.com>

* docs: explain why not to do it

Signed-off-by: discord9 <discord9@163.com>

* chore: rename update to add as it's more appropriate

Signed-off-by: discord9 <discord9@163.com>

* chore: remove unneeded clone

Signed-off-by: discord9 <discord9@163.com>

* docs: per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-11 03:06:24 +00:00
Weny Xu
b1b91e88f4 fix(mito2): ensure enter staging waits for compaction (#7776)
* fix: do not schedule compaction

Signed-off-by: WenyXu <wenymedia@gmail.com>

* mito2: add pending ddl queue primitives to compaction scheduler

Signed-off-by: WenyXu <wenymedia@gmail.com>

* mito2: hand off pending ddls on compaction finish

Signed-off-by: WenyXu <wenymedia@gmail.com>

* mito2: defer enter staging via compaction pending ddl queue

Signed-off-by: WenyXu <wenymedia@gmail.com>

* mito2: cover pending-ddl failure paths on region lifecycle events

Signed-off-by: WenyXu <wenymedia@gmail.com>

* mito2: replay pending ddls directly in compaction handler

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: styling

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: styling

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add unit test

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-03-10 13:49:40 +00:00
Yingwen
04cd2c8a05 feat: flat read path support primary_key format memtables (#7759)
* feat: add adapter for batch to flat recordbatch

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support batch to flat record batch in MemtableRange

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: address review issues for BatchToRecordBatchAdapter

- Extract duplicated read_column_ids computation into a shared
  `read_column_ids_from_projection` helper function
- Cache `FormatProjection` in `BatchToRecordBatchContext::new()` instead
  of recomputing it on every `adapt_iter()` call
- Remove unnecessary `Arc` wrapping of `read_column_ids` in
  `SimpleBulkMemtable::ranges()`
- Fix clippy `filter_map_bool_then` warning in `batch_adapter.rs`

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: simplify comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): use read column ids in batch adapter

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test build_record_batch_iter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test build_record_batch_iter for all old memtables

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: prune time range before adapter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: share BatchToRecordBatchContext in simple_bulk_memtable.rs

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: use ScalarValue::to_array_of_size to build repeated value array

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-10 12:46:39 +00:00
Ruihang Xia
58528d1334 feat: fast path for empty selection files (#7780)
* perf(mito2): skip reader context for empty selections

* refactor(mito2): make parquet reader input optional
2026-03-10 12:22:03 +00:00
Weny Xu
3e81345d7f fix(mito2): avoid parquet redownload when write cache already contains file (#7777)
* feat: add idempotent write cache download API

* fix: skip parquet redownload in manifest edit path

* test: cover download_if_absent cache hit and miss

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-03-09 12:16:53 +00:00
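The idempotent download added in #7777 follows a check-then-fetch pattern. A minimal sketch, assuming an in-memory map in place of the real on-disk write cache; WriteCache here is a hypothetical stand-in and only the name download_if_absent comes from the commit.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for a local write cache keyed by file name.
#[derive(Debug, Default)]
pub struct WriteCache {
    files: HashMap<String, Vec<u8>>,
    downloads: usize,
}

impl WriteCache {
    /// Fetches `key` with `fetch` only when it is not cached yet.
    /// Returns `true` if a download happened, `false` on a cache hit.
    pub fn download_if_absent<F>(&mut self, key: &str, fetch: F) -> bool
    where
        F: FnOnce() -> Vec<u8>,
    {
        if self.files.contains_key(key) {
            // Cache hit: skip the remote read entirely.
            return false;
        }
        let data = fetch();
        self.files.insert(key.to_string(), data);
        self.downloads += 1;
        true
    }

    /// Number of downloads performed so far.
    pub fn download_count(&self) -> usize {
        self.downloads
    }
}
```

Passing the fetch step as a closure keeps the remote read lazy, so a cache hit never touches object storage.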
Yingwen
4232ce8eaa test: fix unstable index meta list test (#7774)
* test: fix unstable index meta list test

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: raise bucket_size threshold to avoid bucketing sizes in [512, 999] to 0

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-09 07:57:43 +00:00
Ruihang Xia
a71df9477a perf(mito2): speed up parquet scan via minmax caches (#7708)
* perf(mito2): speed up parquet scan via meta caches

* fix(mito2): fix parquet pruning and metadata cache

* revert config changes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* enhance cache file enter logic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* resolve tiny cr comments

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* only preload from fs or cache

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix vector test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-03-07 06:32:47 +00:00
Yingwen
93c48a078c feat: implement last row cache reader for flat format (#7757)
* feat: initial implementation

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle multiple series

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: reset state in finish()

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle duplicated last timestamps across batches

Signed-off-by: evenyag <realevenyag@gmail.com>

* perf: compact primary key array

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix(mito2): simplify flat last timestamp selector state

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): rebuild flat pk dictionary from selector state

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: reduce tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: more logs to debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: concat batches in last row reader

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor(mito2): simplify flat last row selector output buffer

- Replace VecDeque with BatchBuffer struct for output buffering
- Remove rebuild_pk_dictionary_for_key as batches go directly into buffer
- Remove unused push method and make BatchBuffer pub(crate)
- Remove debug logging in maybe_update_cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comments

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-06 12:08:58 +00:00
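The "handle duplicated last timestamps across batches" fix in #7757 concerns a series whose final timestamp spills over a batch boundary. A minimal sketch of that selection, assuming rows arrive globally sorted by key and timestamp ascending with sequence descending for equal timestamps; SeriesRow and select_last_rows are hypothetical and ignore the flat Arrow layout of the real reader.

```rust
/// Simplified row: (series key, timestamp, sequence).
pub type SeriesRow = (String, i64, u64);

/// Picks the last row of every series from a stream of batches.
///
/// Because sequence is descending for equal timestamps, the first row seen
/// for a series' maximum timestamp is the newest version. The same timestamp
/// may reappear at the start of the next batch; such duplicates are skipped
/// instead of being emitted twice.
pub fn select_last_rows(batches: &[Vec<SeriesRow>]) -> Vec<SeriesRow> {
    let mut out: Vec<SeriesRow> = Vec::new();
    for row in batches.iter().flatten() {
        let same_series = out.last().is_some_and(|last| last.0 == row.0);
        if !same_series {
            out.push(row.clone());
            continue;
        }
        let last = out.last_mut().expect("checked above");
        // A later timestamp replaces the current candidate; an equal one is an
        // older duplicate and is ignored.
        if row.1 > last.1 {
            *last = row.clone();
        }
    }
    out
}
```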
Yingwen
33acbf985d fix: Allow overriding sequence when writing flat SSTs (#7764)
Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-06 09:43:58 +00:00
discord9
56ee8baa3f feat: admin gc table/regions (#7619)
* feat: gc table

Signed-off-by: discord9 <discord9@163.com>

* test: admin gc

Signed-off-by: discord9 <discord9@163.com>

* chore: after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* refactor: GcStats

Signed-off-by: discord9 <discord9@163.com>

* refactor: use gc ticker for admin gc

Signed-off-by: discord9 <discord9@163.com>

* fix: region routes override

Signed-off-by: discord9 <discord9@163.com>

* test: non happy path

Signed-off-by: discord9 <discord9@163.com>

* refactor: gc job report enum

Signed-off-by: discord9 <discord9@163.com>

* test: process 0 regions

Signed-off-by: discord9 <discord9@163.com>

* after rebase

Signed-off-by: discord9 <discord9@163.com>

* feat: allow manual gc to return error

Signed-off-by: discord9 <discord9@163.com>

* chore: update proto

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* chore: timeout and update proto

Signed-off-by: discord9 <discord9@163.com>

* chore: update proto

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-06 08:25:44 +00:00
discord9
5e6d2b221e feat: gc batch delete files (#7733)
* feat: batch delete

Signed-off-by: discord9 <discord9@163.com>

* chore: explicit error msg

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* refactor: not fallback when batch failure

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* refactor: batch delete in access layer

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-06 07:17:20 +00:00
discord9
4c30b9efaf fix: null first for part expr as logical expr (#7747)
* fix: null first for part expr as logical expr

Signed-off-by: discord9 <discord9@163.com>

* test: update tests

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* fix: null handling & non-null filter

Signed-off-by: discord9 <discord9@163.com>

* chore: doc test

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-06 02:53:05 +00:00
Yingwen
5c8ece27e0 feat: improve filter support for scanbench (#7736)
* feat: cast filters type for scanbench

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: pub file_range mod

So we can use the pub struct FileRange in other places

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: add api as dev-dependency to cmd for clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support profiling after warmup

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-03 09:00:41 +00:00
LFC
b2074e3863 chore: upgrade DataFusion family, again (#7578)
* chore: upgrade DataFusion family

Signed-off-by: luofucong <luofc@foxmail.com>

* chore: switch to released version of datafusion-pg-catalog

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
2026-03-03 07:36:39 +00:00
Yingwen
f4b4d61651 feat: add extension range api for flat format (#7730)
feat: add flat extension api

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-27 09:48:18 +00:00
LFC
5eac4f10aa chore: remove dependency on "atty" (#7725)
Signed-off-by: luofucong <luofc@foxmail.com>
2026-02-26 09:58:01 +00:00
Yingwen
0c30bf1a10 feat: add a subcommand to bench scan (#7722)
* feat: support scan bench

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support projection by name

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support force flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: spawn tasks to poll streams

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support filter config

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: scan bench support wal

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: support not providing provider in wal

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: skip wal replay

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: wrap EngineComponents

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: add scanbench doc

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: change --skip-wal-replay to --enable-wal

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove limit from config

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-26 06:37:40 +00:00
discord9
46683f908a chore: tracing for gc (#7723)
* chore: tracing for gc

Signed-off-by: discord9 <discord9@163.com>

* chore: rm some clone

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-02-26 06:37:29 +00:00
Weny Xu
df04267c54 fix(repartition): reject writes on deallocating regions during region merge (#7694)
* feat(meta): add write route policy to region route with backward compatibility

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix(meta): use partition_expr compatibility accessor in repartition matching

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): introduce staging partition rule enum for repartition instructions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(datanode): plumb staging partition rule enum through heartbeat handlers

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): mark pending-deallocate regions as reject-all during merge staging

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(partition): exclude reject-all regions from write partitioning

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(mito): store staging partition rule enum in region state

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(mito): reject writes in staging when partition rule is reject-all

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): send enter staging instruction with reject-all

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix(repartition): preserve reject-all on exit, merge enter-staging instructions, and allow staged bulk writes

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: refactor to ignore all writes

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename StagingPartitionRule to StagingPartitionDirective across staging flow

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-25 07:04:38 +00:00
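#7694 rejects writes on two sides: the writer excludes reject-all regions from partitioning, and the region itself refuses writes while staging under a reject-all directive. A sketch of both checks, assuming simplified types; only the enum name StagingPartitionDirective comes from the commit, and its variants, RegionRoute, and both helper functions are hypothetical.

```rust
/// Directive sent with an enter-staging instruction. The enum name comes from
/// #7694; the variants below are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StagingPartitionDirective {
    /// Keep accepting writes, routed by the given partition expression.
    UpdatePartitionExpr(String),
    /// The region is pending deallocation during a merge: reject every write.
    RejectAllWrites,
}

/// Region-side guard applied to writes while the region is staging.
pub fn check_staging_write(directive: Option<&StagingPartitionDirective>) -> Result<(), String> {
    match directive {
        Some(StagingPartitionDirective::RejectAllWrites) => {
            Err("region is pending deallocation; write rejected".to_string())
        }
        _ => Ok(()),
    }
}

/// Simplified route entry on the writer side.
#[derive(Debug, Clone)]
pub struct RegionRoute {
    pub region_id: u64,
    pub reject_all_writes: bool,
}

/// Writer-side filter: never partition rows into reject-all regions.
pub fn writable_regions(routes: &[RegionRoute]) -> Vec<u64> {
    routes
        .iter()
        .filter(|route| !route.reject_all_writes)
        .map(|route| route.region_id)
        .collect()
}
```

Keeping the region-side guard even when the writer already filters routes protects against writers holding an outdated route table.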
Yingwen
42ad842434 feat: support changing table's append_mode to true (#7669)
* feat: support alter append_mode to true

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add sqlness test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: clear merge mode in mito when setting append mode

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: sanitize open request and options with both append/merge mode

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: clear merge mode when append mode is true

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-25 04:11:23 +00:00
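The sanitizing step in #7669 (clear merge mode when append mode is true) can be sketched as follows. This assumes a simplified options struct; RegionOptions, MergeMode, sanitize, and enable_append_mode are hypothetical stand-ins for mito2's real option types.

```rust
/// Merge modes a non-append table may use (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MergeMode {
    LastRow,
    LastNonNull,
}

/// Hypothetical, simplified region options.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct RegionOptions {
    pub append_mode: bool,
    pub merge_mode: Option<MergeMode>,
}

impl RegionOptions {
    /// Append-only tables keep every row, so a merge mode has no effect; drop it.
    pub fn sanitize(&mut self) {
        if self.append_mode {
            self.merge_mode = None;
        }
    }

    /// Models switching append_mode from false to true, the only direction
    /// the commit adds support for.
    pub fn enable_append_mode(&mut self) {
        self.append_mode = true;
        self.sanitize();
    }
}
```

Running the same sanitize step on open requests means options persisted with both settings still load into a consistent state.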
discord9
279b009583 chore: more gc metrics (#7661)
* chore: more gc metrics

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* refactor: simple metrics

Signed-off-by: discord9 <discord9@163.com>

* unused metrics

Signed-off-by: discord9 <discord9@163.com>

* fix(meta-srv): count need-retry regions in GC failure metric

Signed-off-by: discord9 <discord9@163.com>

* chore: better bucketing

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-02-24 08:21:10 +00:00
Weny Xu
0ed3b83099 refactor: rename partition rule version to partition expr version (#7696)
* refactor: rename partition rule version to partition expr version

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-10 10:12:47 +00:00
Weny Xu
45a3e1121d fix(mito2): introduce PartitionExprChange in staging flow and keep memtables on metadata-only updates (#7695)
* feat(mito2): add RegionMetaAction::PartitionExprChange

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): apply partition-expr action in manifest builder and manager

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): add partition-expr action merge rules and conflict guard

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): use partition-rule action in enter staging

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix(mito2): validate Change and route to metadata-only update on staging exit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test(mito2): cover partition-expr action staging flow and conflict cases

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test(mito2): add apply-staging coverage for Change metadata validation

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test(mito2): add apply-staging coverage for Change metadata validation

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: remove unused error

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add warn

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: preserves unflushed memtable

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-10 08:26:25 +00:00
fys
c75f6d8ca8 fix(mito2): filter extension ranges in pruner (#7693)
* fix: ci

* fix: cargo fmt
2026-02-10 02:41:22 +00:00
Weny Xu
8026b23834 feat: partition rule version validation for writes and staging (#7628)
* feat: verify partition rule

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: add partition version cache

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: header check

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt toml

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: minor refactor

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: header

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fix clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: minor refactor

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-06 12:16:34 +00:00
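A version check like the one in #7628 can be sketched as a region-side comparison plus a writer-side cache that is invalidated on mismatch. This is a sketch under stated assumptions: strict equality as the acceptance rule, u64 region ids and versions, and all names here (VersionCheck, check_partition_expr_version, PartitionVersionCache) are hypothetical; the naming follows the later rename to "partition expr version" in #7696.

```rust
use std::collections::HashMap;

/// Outcome of checking a write against the region's partition expr version.
#[derive(Debug, PartialEq, Eq)]
pub enum VersionCheck {
    Accepted,
    /// The writer routed with an outdated rule and should refresh its cache.
    Stale { region_version: u64, request_version: u64 },
}

/// Region-side check: accept only writes routed with the current version.
pub fn check_partition_expr_version(region_version: u64, request_version: u64) -> VersionCheck {
    if region_version == request_version {
        VersionCheck::Accepted
    } else {
        VersionCheck::Stale { region_version, request_version }
    }
}

/// Writer-side cache of the last known version per region id.
#[derive(Debug, Default)]
pub struct PartitionVersionCache {
    versions: HashMap<u64, u64>,
}

impl PartitionVersionCache {
    pub fn get(&self, region_id: u64) -> Option<u64> {
        self.versions.get(&region_id).copied()
    }

    pub fn put(&mut self, region_id: u64, version: u64) {
        self.versions.insert(region_id, version);
    }

    /// Drops a cached entry after a Stale response so the next write reloads it.
    pub fn invalidate(&mut self, region_id: u64) {
        self.versions.remove(&region_id);
    }
}
```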
Yingwen
581c777dce feat: Implement a shared pruner for partitions in the same scanner (#7635)
* feat: implement pruner

feat: initial implementation for pruner

Signed-off-by: evenyag <realevenyag@gmail.com>

refactor: simplify worker

Signed-off-by: evenyag <realevenyag@gmail.com>

refactor: increase remaining counts in each scan partition

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: pre filter files prepared to read

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: add metrics for pruner

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: more logs for worker

Signed-off-by: evenyag <realevenyag@gmail.com>

feat: log files

Signed-off-by: evenyag <realevenyag@gmail.com>

fix: move senders to pruner to avoid cyclic references

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect ReaderMetrics in pruner worker

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: report build part cost for per file metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: skip files with no ranges in top metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update comments for pruner

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: wrap method to get worker idx

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused PruneStatus

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-05 15:28:28 +00:00
dennis zhuang
8883022742 refactor(vector-index): use protobuf for metadata and align code (#7648)
* refactor(vector-index): use protobuf for metadata and introduce lifecycle traits

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: minor change

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: by suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: format

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: style

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: remove usearch from mito2

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: tweak errors

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: update index size in result

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: clippy

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: update proto deps

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-02-05 02:41:48 +00:00
dennis zhuang
c08f3a4472 test: adds sqlness test for vector index (#7634)
* test: adds sqlness test for vector index

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: CI

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: redacted flat map and size

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: simplify the replace rules

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: update comments and tests

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-02-04 03:54:47 +00:00
Yingwen
c6ce4485a2 chore: adjust manifest cache log level (#7655)
* chore: adjust manifest cache log level

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add consts to words

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-03 07:08:52 +00:00
Ruihang Xia
d5285e965b perf(mito2): merge last_non_null within memtable batches (#7653)
* perf(mito2): merge last_non_null within memtable batches

* fix(mito2): apply sequence filter before memtable merge

* test(mito2): cover merge_last_non_null

* refactor(mito2): remove redundant loop label
2026-02-02 13:43:32 +00:00
Yingwen
a8f1ed7fc9 feat: add recover_sync to ManifestCache::new (#7652)
feat: add recover_sync to ManifestCache::new

This ensures tests can recover synchronously.

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-02 12:33:41 +00:00
discord9
dd4698002a fix: send get file ref to all regions (#7640)
* fix: send get file ref to all regions

Signed-off-by: discord9 <discord9@163.com>

* refactor: return err on fail to get table route

Signed-off-by: discord9 <discord9@163.com>

* refactor: batch get

Signed-off-by: discord9 <discord9@163.com>

* chore: add logging in all places

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-30 10:36:59 +00:00
Weny Xu
ac9c830365 fix: clean up staging blob directory on clear (#7642)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-30 09:39:37 +00:00
Yingwen
7711661618 feat: BulkMemtable compact parts without encoding into Parquet (#7617)
* feat: implement MultiBulkPart to hold a list of batches in BulkMemtable

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: Only encode parts when there are enough rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: merge MultiBulkPartIter and BulkPartBatchIter

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove some enums and structs

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reuse code in merging bulk/encoded parts

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: collect part groups directly

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add unit tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: enlarge merge threshold and configure by env

- GREPTIME_BULK_MERGE_THRESHOLD
- GREPTIME_BULK_ENCODE_ROW_THRESHOLD
- GREPTIME_BULK_ENCODE_BYTES_THRESHOLD

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: change flush strategy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add BulkMemtableConfig

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: limit max groups and adjust threshold

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add flush file number metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add bulk filter 1 host bench

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: adjust bulk compact threshold

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: flush a file if == min_flush_rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix test_index_build_type_compact test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: fix mito tests

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: remove regions from catchup_regions before notify

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-30 08:03:36 +00:00
Ruihang Xia
d99a946d33 refactor: remove duplications from mito (#7632)
* parse wal options

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* new memtable from version

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* file path

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use wal entry reader

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* map batch responses

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-28 09:03:22 +00:00
discord9
00f568ed28 fix: gc update repart map properly (#7606)
* feat: update repart map

Signed-off-by: discord9 <discord9@163.com>

* fix: table id write lock

Signed-off-by: discord9 <discord9@163.com>

* chore: default value

Signed-off-by: discord9 <discord9@163.com>

* chore: config

Signed-off-by: discord9 <discord9@163.com>

* test: update repartition map

Signed-off-by: discord9 <discord9@163.com>

* fix: empty file ref set

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* chore: properly log error

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-28 04:31:19 +00:00
Weny Xu
5bfc728d32 fix(repartition): improve physical region allocation and compaction read path correctness (#7621)
* fix: fix metadata region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: adjust repartition flow and compaction read compatibility

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: remove logs

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename compaction mapper and pk projection

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename `CompactionProjectionMapper`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: clarify compaction projection naming

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: allow create physical table with internal columns

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix template logic

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit test

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update sqlness result

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-28 04:04:05 +00:00
dennis zhuang
238bc4fa2c feat: impl vector index query (#7564)
* feat: impl vector index query

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: remove VectorSearchRule and merge it into scan hint rule

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: vector search hint

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: join and subquery

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: clippy when feature disabled

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: push hint only when column is non-nullable or an explicit IS NOT NULL filter exists

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: transformed = true

Co-authored-by: Yingwen <realevenyag@gmail.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: remove adapter vector hint

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: revert transformed

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2026-01-28 03:40:56 +00:00