Yingwen
04cd2c8a05
feat: flat read path support primary_key format memtables ( #7759 )
...
* feat: add adapter for batch to flat recordbatch
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: support batch to flat record batch in MemtableRange
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: address review issues for BatchToRecordBatchAdapter
- Extract duplicated read_column_ids computation into a shared
`read_column_ids_from_projection` helper function
- Cache `FormatProjection` in `BatchToRecordBatchContext::new()` instead
of recomputing it on every `adapt_iter()` call
- Remove unnecessary `Arc` wrapping of `read_column_ids` in
`SimpleBulkMemtable::ranges()`
- Fix clippy `filter_map_bool_then` warning in `batch_adapter.rs`
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: simplify comments
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor(mito2): use read column ids in batch adapter
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: test build_record_batch_iter
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: fmt code
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: test build_record_batch_iter for all old memtables
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: address comment
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: prune time range before adapter
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: share BatchToRecordBatchContext in simple_bulk_memtable.rs
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: use ScalarValue::to_array_of_size to build repeated value array
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-03-10 12:46:39 +00:00
Ruihang Xia
58528d1334
feat: fast path for empty selection files ( #7780 )
...
* perf(mito2): skip reader context for empty selections
* refactor(mito2): make parquet reader input optional
2026-03-10 12:22:03 +00:00
Weny Xu
3e81345d7f
fix(mito2): avoid parquet redownload when write cache already contains file ( #7777 )
...
* feat: add idempotent write cache download API
* fix: skip parquet redownload in manifest edit path
* test: cover download_if_absent cache hit and miss
* chore: add comments
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-03-09 12:16:53 +00:00
Yingwen
4232ce8eaa
test: fix unstable index meta list test ( #7774 )
...
* test: fix unstable index meta list test
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: raise bucket_size threshold to avoid bucketing sizes in [512, 999] to 0
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-03-09 07:57:43 +00:00
Ruihang Xia
a71df9477a
perf(mito2): speed up parquet scan via minmax caches ( #7708 )
...
* perf(mito2): speed up parquet scan via meta caches
* fix(mito2): fix parquet pruning and metadata cache
* revert config changes
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* enhance cache file enter logic
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* resolve tiny cr comments
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* only preload from fs or cache
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* fix vector test
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2026-03-07 06:32:47 +00:00
Yingwen
93c48a078c
feat: implement last row cache reader for flat format ( #7757 )
...
* feat: initial implementation
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: handle multiple series
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: reset state in finish()
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: handle duplicated last timestamps across batches
Signed-off-by: evenyag <realevenyag@gmail.com >
* perf: compact primary key array
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix(mito2): simplify flat last timestamp selector state
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor(mito2): rebuild flat pk dictionary from selector state
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: reduce tests
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: update comment
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: more logs to debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: concat batches in last row reader
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor(mito2): simplify flat last row selector output buffer
- Replace VecDeque with BatchBuffer struct for output buffering
- Remove rebuild_pk_dictionary_for_key as batches go directly into buffer
- Remove unused push method and make BatchBuffer pub(crate)
- Remove debug logging in maybe_update_cache
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: address comments
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-03-06 12:08:58 +00:00
Yingwen
33acbf985d
fix: Allow overriding sequence when writing flat SSTs ( #7764 )
...
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-03-06 09:43:58 +00:00
discord9
56ee8baa3f
feat: admin gc table/regions ( #7619 )
...
* feat: gc table
Signed-off-by: discord9 <discord9@163.com >
* test: admin gc
Signed-off-by: discord9 <discord9@163.com >
* chore: after rebase fix
Signed-off-by: discord9 <discord9@163.com >
* refactor: GcStats
Signed-off-by: discord9 <discord9@163.com >
* refactor: use gc ticker for admin gc
Signed-off-by: discord9 <discord9@163.com >
* fix: region routes override
Signed-off-by: discord9 <discord9@163.com >
* test: non happy path
Signed-off-by: discord9 <discord9@163.com >
* refactor: gc job report enum
Signed-off-by: discord9 <discord9@163.com >
* test: process 0 regions
Signed-off-by: discord9 <discord9@163.com >
* after rebase
Signed-off-by: discord9 <discord9@163.com >
* feat: allow manual gc to return error
Signed-off-by: discord9 <discord9@163.com >
* chore: update proto
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* chore: timeout and update proto
Signed-off-by: discord9 <discord9@163.com >
* chore: update proto
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-03-06 08:25:44 +00:00
discord9
5e6d2b221e
feat: gc batch delete files ( #7733 )
...
* feat: batch delete
Signed-off-by: discord9 <discord9@163.com >
* chore: explicit error msg
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* refactor: not fallback when batch failure
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* pcr
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* refactor: batch delete in access layer
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-03-06 07:17:20 +00:00
discord9
4c30b9efaf
fix: null first for part expr as logical expr ( #7747 )
...
* fix: null first for part expr as logical expr
Signed-off-by: discord9 <discord9@163.com >
* test: update tests
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* fix: null handling & non-null filter
Signed-off-by: discord9 <discord9@163.com >
* chore: doc test
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-03-06 02:53:05 +00:00
Yingwen
5c8ece27e0
feat: improve filter support for scanbench ( #7736 )
...
* feat: cast filters type for scanbench
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: pub file_range mod
So we can use the pub struct FileRange in other places
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: add api as dev-dependency to cmd for clippy
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: support profiling after warmup
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-03-03 09:00:41 +00:00
LFC
b2074e3863
chore: upgrade DataFusion family, again ( #7578 )
...
* chore: upgrade DataFusion family
Signed-off-by: luofucong <luofc@foxmail.com >
* chore: switch to released version of datafusion-pg-catalog
---------
Signed-off-by: luofucong <luofc@foxmail.com >
Co-authored-by: Ning Sun <sunning@greptime.com >
Co-authored-by: Ning Sun <sunng@protonmail.com >
2026-03-03 07:36:39 +00:00
Yingwen
f4b4d61651
feat: add extension range api for flat format ( #7730 )
...
feat: add flat extension api
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-27 09:48:18 +00:00
LFC
5eac4f10aa
chore: remove dependency on "atty" ( #7725 )
...
Signed-off-by: luofucong <luofc@foxmail.com >
2026-02-26 09:58:01 +00:00
Yingwen
0c30bf1a10
feat: add a subcommand to bench scan ( #7722 )
...
* feat: support scan bench
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: support projection by name
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: support force flat format
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: spawn tasks to poll streams
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: support filter config
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: scan bench support wal
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: support not providing provider in wal
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: skip wal replay
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: wrap EngineComponents
Signed-off-by: evenyag <realevenyag@gmail.com >
* docs: add scanbench doc
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: change --skip-wal-replay to --enable-wal
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: remove limit from config
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-26 06:37:40 +00:00
discord9
46683f908a
chore: tracing for gc ( #7723 )
...
* chore: tracing for gc
Signed-off-by: discord9 <discord9@163.com >
* chore: rm some clone
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-02-26 06:37:29 +00:00
Weny Xu
df04267c54
fix(repartition): reject writes on deallocating regions during region merge ( #7694 )
...
* feat(meta): add write route policy to region route with backward compatibility
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix(meta): use partition_expr compatibility accessor in repartition matching
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(meta): introduce staging partition rule enum for repartition instructions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(datanode): plumb staging partition rule enum through heartbeat handlers
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(meta): mark pending-deallocate regions as reject-all during merge staging
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(partition): exclude reject-all regions from write partitioning
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(mito): store staging partition rule enum in region state
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(mito): reject writes in staging when partition rule is reject-all
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(meta): send enter staging instruction with reject-all
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix(repartition): preserve reject-all on exit, merge enter-staging instructions, and allow staged bulk writes
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: refactor to ignore all writes
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: rename StagingPartitionRule to StagingPartitionDirective across staging flow
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: add comments
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: clippy
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: nit
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: rename
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-02-25 07:04:38 +00:00
Yingwen
42ad842434
feat: support changing table's append_mode to true ( #7669 )
...
* feat: support alter append_mode to true
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: add sqlness test
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: remove comment
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: fix compiler errors
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: clear merge mode in mito when setting append mode
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: sanitize open request and options with both append/merge mode
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: clear merge mode when append mode is true
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-25 04:11:23 +00:00
discord9
279b009583
chore: more gc metrics ( #7661 )
...
* chore: more gc metrics
Signed-off-by: discord9 <discord9@163.com >
* clippy
Signed-off-by: discord9 <discord9@163.com >
* refactor: simple metrics
Signed-off-by: discord9 <discord9@163.com >
* unused metrics
Signed-off-by: discord9 <discord9@163.com >
* fix(meta-srv): count need-retry regions in GC failure metric
Signed-off-by: discord9 <discord9@163.com >
* chore: better bucketing
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-02-24 08:21:10 +00:00
Weny Xu
0ed3b83099
refactor: rename partition rule version to partition expr version ( #7696 )
...
* refactor: rename partition rule version to partition expr version
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: clippy
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-02-10 10:12:47 +00:00
Weny Xu
45a3e1121d
fix(mito2): introduce PartitionExprChange in staging flow and keep memtables on metadata-only updates ( #7695 )
...
* feat(mito2): add RegionMetaAction::PartitionExprChange
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor(mito2): apply partition-expr action in manifest builder and manager
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor(mito2): add partition-expr action merge rules and conflict guard
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor(mito2): use partition-rule action in enter staging
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix(mito2): validate Change and route to metadata-only update on staging exit
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test(mito2): cover partition-expr action staging flow and conflict cases
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test(mito2): add apply-staging coverage for Change metadata validation
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test(mito2): add apply-staging coverage for Change metadata validation
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: fmt
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: remove unused error
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: add warn
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: add comments
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test: preserves unflushed memtable
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: fmt
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-02-10 08:26:25 +00:00
fys
c75f6d8ca8
fix(mito2): filter extension ranges in pruner ( #7693 )
...
* fix: ci
* fix: cargo fmt
2026-02-10 02:41:22 +00:00
Weny Xu
8026b23834
feat: partition rule version validation for writes and staging ( #7628 )
...
* feat: verify partition rule
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: add partition version cache
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: header check
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: fmt toml
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: minor refactor
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: header
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: fix clippy
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: fix unit tests
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: minor refactor
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: nit
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: nit
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-02-06 12:16:34 +00:00
Yingwen
581c777dce
feat: Implement a shared pruner for partitions in the same scanner ( #7635 )
...
* feat: implement pruner
feat: initial implementation for pruner
Signed-off-by: evenyag <realevenyag@gmail.com >
refactor: simplify worker
Signed-off-by: evenyag <realevenyag@gmail.com >
refactor: increase remaining counts in each scan partition
Signed-off-by: evenyag <realevenyag@gmail.com >
feat: pre filter files prepared to read
Signed-off-by: evenyag <realevenyag@gmail.com >
feat: add metrics for pruner
Signed-off-by: evenyag <realevenyag@gmail.com >
feat: more logs for worker
Signed-off-by: evenyag <realevenyag@gmail.com >
feat: log files
Signed-off-by: evenyag <realevenyag@gmail.com >
fix: move senders to pruner to avoid cyclic ref
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect ReaderMetrics in pruner worker
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: report build part cost for per file metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: skip files with no ranges in top metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: update comments for pruner
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: wrap method to get worker idx
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: remove unused PruneStatus
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-05 15:28:28 +00:00
dennis zhuang
8883022742
refactor(vector-index): use protobuf for metadata and align code ( #7648 )
...
* refactor(vector-index): use protobuf for metadata and introduce lifecycle traits
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: minor change
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* refactor: by suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: format
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: style
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: remove usearch from mito2
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: tweak errors
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: update index size in result
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: clippy
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: update proto deps
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
2026-02-05 02:41:48 +00:00
dennis zhuang
c08f3a4472
test: adds sqlness test for vector index ( #7634 )
...
* test: adds sqlness test for vector index
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: CI
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: redacted flat map and size
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: simplify the replace rules
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: update comments and tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
2026-02-04 03:54:47 +00:00
Yingwen
c6ce4485a2
chore: adjust manifest cache log level ( #7655 )
...
* chore: adjust manifest cache log level
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: add consts to words
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-03 07:08:52 +00:00
Ruihang Xia
d5285e965b
perf(mito2): merge last_non_null within memtable batches ( #7653 )
...
* perf(mito2): merge last_non_null within memtable batches
* fix(mito2): apply sequence filter before memtable merge
* test(mito2): cover merge_last_non_null
* refactor(mito2): remove redundant loop label
2026-02-02 13:43:32 +00:00
Yingwen
a8f1ed7fc9
feat: add recover_sync to ManifestCache::new ( #7652 )
...
feat: add recover_sync to ManifestCache::new
This ensures tests can recover in sync
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-02-02 12:33:41 +00:00
discord9
dd4698002a
fix: send get file ref to all regions ( #7640 )
...
* fix: send get file ref to all regions
Signed-off-by: discord9 <discord9@163.com >
* refactor: return err on fail to get table route
Signed-off-by: discord9 <discord9@163.com >
* refactor: batch get
Signed-off-by: discord9 <discord9@163.com >
* chore: add logging in all places
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-01-30 10:36:59 +00:00
Weny Xu
ac9c830365
fix: clean up staging blob directory on clear ( #7642 )
...
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-01-30 09:39:37 +00:00
Yingwen
7711661618
feat: BulkMemtable compact parts without encoding into Parquet ( #7617 )
...
* feat: implement MultiBulkPart to hold a list of batches in BulkMemtable
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: Only encode parts when there are enough rows
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: merge MultiBulkPartIter and BulkPartBatchIter
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: remove some enums and structs
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: reuse code in merging bulk/encoded parts
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: collect part groups directly
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: add unit tests
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: enlarge merge threshold and configure by env
- GREPTIME_BULK_MERGE_THRESHOLD
- GREPTIME_BULK_ENCODE_ROW_THRESHOLD
- GREPTIME_BULK_ENCODE_BYTES_THRESHOLD
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: change flush strategy
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add BulkMemtableConfig
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: limit max groups and adjust threshold
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add flush file number metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: add bulk filter 1 host bench
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: adjust bulk compact threshold
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: flush a file if == min_flush_rows
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: fix test_index_build_type_compact test
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: fix mito tests
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: remove regions from catchup_regions before notify
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-01-30 08:03:36 +00:00
Ruihang Xia
d99a946d33
refactor: remove duplications from mito ( #7632 )
...
* parse wal options
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* new memtable from version
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* file path
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* use wal entry reader
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* map batch responses
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2026-01-28 09:03:22 +00:00
discord9
00f568ed28
fix: gc update repart map properly ( #7606 )
...
* feat: update repart map
Signed-off-by: discord9 <discord9@163.com >
* fix: table id write lock
Signed-off-by: discord9 <discord9@163.com >
* chore: default value
Signed-off-by: discord9 <discord9@163.com >
* chore: config
Signed-off-by: discord9 <discord9@163.com >
* test: update repartition map
Signed-off-by: discord9 <discord9@163.com >
* fix: empty file ref set
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* chore: properly log error
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-01-28 04:31:19 +00:00
Weny Xu
5bfc728d32
fix(repartition): improve physical region allocation and compaction read path correctness ( #7621 )
...
* fix: fix metadata region
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: adjust repartition flow and compaction read compatibility
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: remove logs
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: rename compaction mapper and pk projection
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: rename `CompactionProjectionMapper`
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: clarify compaction projection naming
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: add comments
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: fmt
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: allow create physical table with internal columns
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test: add tests
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: fix template logic
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: fix unit test
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: update sqlness result
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-01-28 04:04:05 +00:00
dennis zhuang
238bc4fa2c
feat: impl vector index query ( #7564 )
...
* feat: impl vector index query
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* feat: remove VectorSearchRule and merge it into scan hint rule
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* refactor: vector search hint
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: join and subquery
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: clippy when feature disabled
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: push hint only when column is non-nullable or an explicit IS NOT NULL filter exists
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: transformed = true
Co-authored-by: Yingwen <realevenyag@gmail.com >
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: remove adapter vector hint
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: revert transformed
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
Co-authored-by: Yingwen <realevenyag@gmail.com >
2026-01-28 03:40:56 +00:00
Lei, HUANG
4ae9245eb4
fix: flaky compaction test ( #7627 )
...
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
2026-01-27 15:07:24 +00:00
Weny Xu
d0c610f3c7
feat: add partial_drop to DropRequest ( #7597 )
...
* feat: add `partial_drop` to `DropRequest`
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: handle non-partial-drop drop task
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: remove files immediately
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-01-27 10:46:52 +00:00
Weny Xu
4fb61047cb
test: add integration tests for repartition ( #7560 )
...
* test: add integration tests for mito repartition
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: update test result
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test: add integration tests for metric repartition
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: correct results
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: enable tests for object store
Signed-off-by: WenyXu <wenymedia@gmail.com >
* test: add compaction and gc
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: fix unit test
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* more cases
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: file ref also in repart mapping
Signed-off-by: discord9 <discord9@163.com >
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: set a longer timeout for mock metasrv channel manager
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
Signed-off-by: discord9 <discord9@163.com >
Co-authored-by: discord9 <discord9@163.com >
2026-01-22 10:14:40 +00:00
discord9
576eb0aadc
chore: not ignore error now bug is fixed in #7579 ( #7596 )
...
* chore: not ignore error now bug is fixed in #7579
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* chore: test
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-01-21 06:13:59 +00:00
Yingwen
c34d142e7d
fix: clear unused range builders eagerly ( #7569 )
...
* feat: clear the range builder after one part
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect peak memory usage of build ranges
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect peak range builder nums in metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: num_range_builders_peak -> num_peak_range_builders
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: track file range counts
* Ensure the reader won't be released until all ranges scanned.
* This fixes unordered scan in which each partition range is a row group
Signed-off-by: evenyag <realevenyag@gmail.com >
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: change to isize
The metrics may init to 0.
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-01-21 03:25:44 +00:00
discord9
67e51b4573
feat: gc worker on dropped region ( #7537 )
...
* feat: allow clean up for dropped region
Signed-off-by: discord9 <discord9@163.com >
* clippy
Signed-off-by: discord9 <discord9@163.com >
* pcr
Signed-off-by: discord9 <discord9@163.com >
* fix: get access layer correct
Signed-off-by: discord9 <discord9@163.com >
* chore: invalid gc args
Signed-off-by: discord9 <discord9@163.com >
* chore: fix test
Signed-off-by: discord9 <discord9@163.com >
* feat: more defensive checks
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* feat: messy impl of drop region
Signed-off-by: discord9 <discord9@163.com >
* feat: add dropped region GC handling module and integrate with GcScheduler
Signed-off-by: discord9 <discord9@163.com >
* refactor: simplify access layer creation
Signed-off-by: discord9 <discord9@163.com >
* c
Signed-off-by: discord9 <discord9@163.com >
* fix: path type
Signed-off-by: discord9 <discord9@163.com >
* feat: gc handle drop
Signed-off-by: discord9 <discord9@163.com >
* chore: use proper const
Signed-off-by: discord9 <discord9@163.com >
* fix: recursive list when check empty dir
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* refactor: with gc only delete if metadata region
Signed-off-by: discord9 <discord9@163.com >
* feat: add batch_get_table_route method to SchedulerCtx and MockSchedulerCtx
Signed-off-by: discord9 <discord9@163.com >
* chore: comment
Signed-off-by: discord9 <discord9@163.com >
* refactor: retry delete method
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-01-20 11:45:37 +00:00
discord9
aa3daf7053
fix: read filter's column ( #7579 )
...
* fix: read columns need for filter
Signed-off-by: discord9 <discord9@163.com >
* c
Signed-off-by: discord9 <discord9@163.com >
* feat: add support for explicit read columns in projection mappers
Signed-off-by: discord9 <discord9@163.com >
* test: add compatibility tests for projection mappers
Signed-off-by: discord9 <discord9@163.com >
* c
Signed-off-by: discord9 <discord9@163.com >
* fix: rename variable for clarity and improve column ID retrieval logic
Signed-off-by: discord9 <discord9@163.com >
* fix: update scan input construction to include read column IDs
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* test: sqlness for projection filter
Signed-off-by: discord9 <discord9@163.com >
* refactor: per review
Signed-off-by: discord9 <discord9@163.com >
* chore: more redacting
Signed-off-by: discord9 <discord9@163.com >
* chore: more redact
Signed-off-by: discord9 <discord9@163.com >
* c
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-01-20 08:19:50 +00:00
discord9
d916409d04
feat: exact partition filter ( #7571 )
...
* feat(mito2): add repartition tests
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: filter(VIBED NOT REVIEW YET)
Signed-off-by: discord9 <discord9@163.com >
* feat: only use related columns
Signed-off-by: discord9 <discord9@163.com >
* feat: add partition filter tests and enhance pruning logic
Signed-off-by: discord9 <discord9@163.com >
* pre review
Signed-off-by: discord9 <discord9@163.com >
* feat: refine partition filter logic and update related function names
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* c
Signed-off-by: discord9 <discord9@163.com >
* rm useless test
Signed-off-by: discord9 <discord9@163.com >
* feat: enhance partition filter error handling to skip failures
Signed-off-by: discord9 <discord9@163.com >
* chore: per review
Signed-off-by: discord9 <discord9@163.com >
* test: use real column
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
* feat: add TagDecodeState initialization to filter processing
Signed-off-by: discord9 <discord9@163.com >
* chore: update test doc
Signed-off-by: discord9 <discord9@163.com >
* per review
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
Signed-off-by: discord9 <discord9@163.com >
Co-authored-by: WenyXu <wenymedia@gmail.com >
2026-01-19 13:06:32 +00:00
Lei, HUANG
e0285209cb
feat: flush region before close when skip-wal is enabled ( #7549 )
...
* feat: flush region before close when skip-wal is enabled
When closing a region with Noop WAL provider, the region is now flushed
before closing to ensure data durability. This prevents data loss for
regions configured with skip_wal.
Changes:
- Add `Closing` variant to `FlushReason` enum
- Modify `handle_close_request` to trigger flush for Noop WAL regions
- Pass flush reason through the flush pipeline
- Add test to verify data persistence after close with skip-wal
The flush-on-close flow completes the region cleanup after the flush
finishes, ensuring the region is properly removed from all schedulers.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
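A minimal sketch of the close decision described above, assuming simplified stand-in types. `WalProvider`, `CloseAction`, and `plan_close` are hypothetical names for illustration and are not the mito2 definitions; only the `Closing` flush reason is taken from the commit message.

```rust
// Illustrative stand-ins only; the real mito2 types carry far more state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WalProvider {
    /// No WAL is written (skip-wal), so unflushed data exists only in memory.
    Noop,
    /// Placeholder for any provider that persists a log.
    RaftEngine,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlushReason {
    /// Placeholder for the pre-existing reasons (assumption).
    Manual,
    /// The variant this change adds.
    Closing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CloseAction {
    /// Flush first; region cleanup runs once the flush finishes.
    FlushThenRemove(FlushReason),
    /// The WAL already holds the data, so the region can be removed directly.
    RemoveNow,
}

/// Decide how to close a region based on its WAL provider.
pub fn plan_close(provider: WalProvider) -> CloseAction {
    match provider {
        WalProvider::Noop => CloseAction::FlushThenRemove(FlushReason::Closing),
        WalProvider::RaftEngine => CloseAction::RemoveNow,
    }
}
```

Modeling the decision as a returned action keeps the sketch free of I/O; the actual handler presumably triggers the flush and defers cleanup itself.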
* refactor: extract region cleanup logic into dedicated method
Extracts common region cleanup logic (stop, remove, and scheduler cleanup) into a new `remove_region` method to avoid duplication between `handle_close` and `handle_flush_request`. This improves code maintainability and reduces redundancy.
Also updates `RegionMap::remove_region` to return the removed region reference, allowing the caller to perform cleanup operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
* test: split skip-wal region close test into pending and no-pending cases
Split the test_close_region_skip_wal test into two separate test cases:
- test_close_region_skip_wal_with_pending_data: Tests the scenario where
data is inserted before closing a region with skip-wal enabled
- test_close_region_skip_wal_without_pending_data: Tests the scenario
where a region with skip-wal is closed without any data insertion
This improves test clarity and ensures both scenarios are properly covered.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
* fix: skip request handling and compaction for flush-on-close regions
When a region is flushed as part of the close operation (flush_on_close=true),
the region is immediately removed from the server. Therefore, there's no need
to handle pending requests or schedule compactions for such regions.
This fix moves the on_flush_success listener call outside the conditional
block and wraps all post-flush operations (request handling, compaction
scheduling) in an else branch, ensuring they only execute for normal flush
operations where the region remains active.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
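The branching described above can be sketched roughly as follows. `post_flush_steps` and the step labels other than `on_flush_success` are hypothetical, and placing the listener call first is an assumption; the commit message only says it moved outside the conditional block.

```rust
/// Returns the names of the steps that run after a flush finishes.
/// This is an illustration of the control flow, not the mito2 handler.
pub fn post_flush_steps(flush_on_close: bool) -> Vec<&'static str> {
    // The listener is notified for every successful flush (ordering assumed).
    let mut steps = vec!["on_flush_success"];
    if flush_on_close {
        // The region is removed right away, so there is nothing left to serve.
        steps.push("remove_region");
    } else {
        // Normal flush: the region stays active and resumes its work.
        steps.push("handle_pending_requests");
        steps.push("schedule_compaction");
    }
    steps
}
```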
* test: add close follower region test with skip-wal
Adds a test case for closing a follower region with skip-wal enabled.
The test verifies that when a region transitions from Follower to Leader
before closing, the flush mechanism works correctly even with WAL disabled.
Also refactors flushable_region() to return Option instead of erroring
when region is not operable, allowing more flexible handling of region
states during flush operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
* fix: fmt
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
* revise test logic for closing a follower region
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com >
2026-01-15 09:15:14 +00:00
Weny Xu
2ae20daa62
feat: add sync region instruction for repartition procedure ( #7562 )
...
* feat: add sync region instruction for repartition procedure
This commit introduces a new sync region instruction and integrates it
into the repartition procedure flow, specifically for metric engine tables.
Changes:
- Add SyncRegion instruction type and SyncRegionsReply in instruction.rs
- Implement SyncRegionHandler in datanode to handle sync region requests
- Add SyncRegion state in repartition procedure to sync newly allocated regions
- Integrate sync region step after enter_staging_region for metric engine tables
- Add sync_region flag and allocated_region_ids to PersistentContext
- Make SyncRegionFromRequest serializable for instruction transmission
- Add test utilities and mock support for sync region operations
The sync region step is conditionally executed based on the table engine type,
ensuring that newly allocated regions in metric engine tables are properly
synced from their source regions before proceeding with manifest remapping.
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: add logs
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat(repartition): improve staging region handling and support metric engine repartition
- Reorder sync region flow: move SyncRegion from EnterStagingRegion to RepartitionStart to sync before applying staging
- Add ExitStaging metadata update state to properly clear staging leader info after repartition completes
- Update build_template_from_raw_table_info to optionally skip metric engine internal columns when creating region requests
- Fix region state transition: set_dropping now expects specific state (Staging or Writable) for proper validation
- Adjust region drop and copy handlers to handle staging regions correctly
- Add comprehensive test cases for metric engine SPLIT/MERGE partition operations on physical tables with logical tables
- Improve logging for table route updates, region drops, and repartition operations
Signed-off-by: WenyXu <wenymedia@gmail.com >
* refactor: removes code duplication
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: update result
Signed-off-by: WenyXu <wenymedia@gmail.com >
* chore: refine comments
Signed-off-by: WenyXu <wenymedia@gmail.com >
* feat: add error strategy support for flush region and flush pending deallocate regions
- **Add `ErrorStrategy` enum** in `procedure/utils.rs`:
  - Supports `Ignore` and `Retry` strategies for error handling
  - Refactor `flush_region` to accept `error_strategy` parameter
  - Extract `handle_flush_region_reply` helper function for better code organization
- **Add pending deallocate region support**:
  - Add `pending_deallocate_region_ids` field to `PersistentContext`
  - Implement `flush_pending_deallocate_regions` in `EnterStagingRegion` state
  - Flush pending deallocate regions before entering staging regions to ensure data consistency
- **Update error handling**:
  - `flush_leader_region`: Use `ErrorStrategy::Ignore` to skip unreachable datanodes
  - `sync_region`: Use `ErrorStrategy::Retry` for critical operations
  - `enter_staging_region`: Use `ErrorStrategy::Retry` when flushing pending deallocate regions
This change improves the robustness of the repartition procedure by:
1. Providing flexible error handling strategies for flush operations
2. Ensuring pending deallocate regions are properly flushed before repartitioning
3. Preventing data inconsistency during region migration
Signed-off-by: WenyXu <wenymedia@gmail.com >
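A rough sketch of how the two strategies might differ when a flush fails. Only the `ErrorStrategy` name and its `Ignore` and `Retry` variants come from the commit message; `FlushOutcome`, `handle_flush_result`, and the `String` error type are hypothetical simplifications.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorStrategy {
    /// Log and move on, e.g. when a datanode is unreachable.
    Ignore,
    /// Surface the failure so the procedure step is attempted again.
    Retry,
}

/// Hypothetical result of applying a strategy to one flush attempt.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FlushOutcome {
    Done,
    Skipped(String),
    RetryLater(String),
}

/// Map a flush result to an outcome according to the chosen strategy.
pub fn handle_flush_result(result: Result<(), String>, strategy: ErrorStrategy) -> FlushOutcome {
    match (result, strategy) {
        (Ok(()), _) => FlushOutcome::Done,
        (Err(e), ErrorStrategy::Ignore) => FlushOutcome::Skipped(e),
        (Err(e), ErrorStrategy::Retry) => FlushOutcome::RetryLater(e),
    }
}
```

Passing the strategy as a parameter lets each call site state how critical the flush is, which matches the per-caller choices listed above.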
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com >
* fix: compile
Signed-off-by: WenyXu <wenymedia@gmail.com >
---------
Signed-off-by: WenyXu <wenymedia@gmail.com >
2026-01-15 04:52:57 +00:00
LFC
e64c31e59a
chore: upgrade DataFusion family ( #7558 )
...
* chore: upgrade DataFusion family
Signed-off-by: luofucong <luofc@foxmail.com >
* use main proto
Signed-off-by: luofucong <luofc@foxmail.com >
* fix ci
Signed-off-by: luofucong <luofc@foxmail.com >
---------
Signed-off-by: luofucong <luofc@foxmail.com >
2026-01-14 14:02:31 +00:00
Ruihang Xia
a5cb0116a2
perf: avoid boundary checks on accessing array items ( #7570 )
...
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2026-01-14 12:56:39 +00:00
Yingwen
4b3bd7317b
feat: add per-partition convert, result cache metrics ( #7539 )
...
* fix: show convert cost in explain analyze verbose
Signed-off-by: evenyag <realevenyag@gmail.com >
* fix: increase puffin metadata cache metric
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add result cache hit/miss to filter metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: print flat format in debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* test: update sqlness test
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: make scan cost contain part/reader build cost
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect divider cost
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: remove unused field in ScannerMetrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect metadata read bytes
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: collect read metrics in get_parquet_meta_data
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2026-01-13 09:17:09 +00:00
dennis zhuang
a56a00224f
feat: impl vector index scan in storage ( #7528 )
...
* feat: impl vector index scan in storage
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* feat: fallback to read remote blob when blob not found
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: refactor encoding and decoding and apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: license
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: add apply_with_k tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: forgot to align nulls when the vector column is not in the batch
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: add test for when the vector column is not in a batch while building
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
2026-01-12 08:30:51 +00:00