Commit Graph

857 Commits

Author SHA1 Message Date
Lei, HUANG
4ae9245eb4 fix: flaky compaction test (#7627)
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-01-27 15:07:24 +00:00
Weny Xu
d0c610f3c7 feat: add partial_drop to DropRequest (#7597)
* feat: add `partial_drop` to `DropRequest`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: handle non-partial-drop drop task

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: remove files immediately

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-27 10:46:52 +00:00
Weny Xu
4fb61047cb test: add integration tests for repartition (#7560)
* test: add integration tests for mito repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update test result

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add integration tests for metric repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: correct results

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: enable tests for object store

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add compaction and gc

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit test

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* more cases

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: file ref also in repart mapping

Signed-off-by: discord9 <discord9@163.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: set a longer timeout for mock metasrv channel manager

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
2026-01-22 10:14:40 +00:00
discord9
576eb0aadc chore: not ignore error now bug is fixed in #7579 (#7596)
* chore: not ignore error now bug is fixed in #7579

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* chor: test

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-21 06:13:59 +00:00
Yingwen
c34d142e7d fix: clear unused range builders eagerly (#7569)
* feat: clear the range builder after one part

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect peak memory usage of build ranges

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect peak range builder nums in metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: num_range_builders_peak -> num_peak_range_builders

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: track file range counts

* Ensure the reader won't be released until all ranges scanned.
* This fixes unordered scan which each partition range is a row group

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: change to isize

The metrics may init to 0.

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-21 03:25:44 +00:00
discord9
67e51b4573 feat: gc worker on dropped region (#7537)
* feat: allow clean up for dropped region

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* fix: get access layer correct

Signed-off-by: discord9 <discord9@163.com>

* chore: invalid gc args

Signed-off-by: discord9 <discord9@163.com>

* chore: fix test

Signed-off-by: discord9 <discord9@163.com>

* feat: more defend check

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* feat: messy impl of drop region

Signed-off-by: discord9 <discord9@163.com>

* feat: add dropped region GC handling module and integrate with GcScheduler

Signed-off-by: discord9 <discord9@163.com>

* refactor: simplify access layer creation

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* fix: path type

Signed-off-by: discord9 <discord9@163.com>

* feat: gc handle drop

Signed-off-by: discord9 <discord9@163.com>

* chore: use proper const

Signed-off-by: discord9 <discord9@163.com>

* fix: recursive list when check empty dir

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* refactor: with gc only delete if metadata region

Signed-off-by: discord9 <discord9@163.com>

* feat: add batch_get_table_route method to SchedulerCtx and MockSchedulerCtx

Signed-off-by: discord9 <discord9@163.com>

* chore: comment

Signed-off-by: discord9 <discord9@163.com>

* refactor: retry delete method

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-20 11:45:37 +00:00
discord9
aa3daf7053 fix: read filter's column (#7579)
* fix: read columns need for filter

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* feat: add support for explicit read columns in projection mappers

Signed-off-by: discord9 <discord9@163.com>

* test: add compatibility tests for projection mappers

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* fix: rename variable for clarity and improve column ID retrieval logic

Signed-off-by: discord9 <discord9@163.com>

* fix: update scan input construction to include read column IDs

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* test: sqlness for projection filter

Signed-off-by: discord9 <discord9@163.com>

* refactor: per review

Signed-off-by: discord9 <discord9@163.com>

* chore: more redacting

Signed-off-by: discord9 <discord9@163.com>

* chore: more redact

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-20 08:19:50 +00:00
discord9
d916409d04 feat: exact partition filter (#7571)
* feat(mito2): add repartition tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: filter(VIBED NOT REVIEW YET)

Signed-off-by: discord9 <discord9@163.com>

* feat: only use related columns

Signed-off-by: discord9 <discord9@163.com>

* feat: add partition filter tests and enhance pruning logic

Signed-off-by: discord9 <discord9@163.com>

* pre review

Signed-off-by: discord9 <discord9@163.com>

* feat: refine partition filter logic and update related function names

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* rm useless test

Signed-off-by: discord9 <discord9@163.com>

* feat: enhance partition filter error handling to skip failures

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* test: use real column

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* feat: add TagDecodeState initialization to filter processing

Signed-off-by: discord9 <discord9@163.com>

* chore: update test doc

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
2026-01-19 13:06:32 +00:00
Lei, HUANG
e0285209cb feat: flush region before close when skip-wal is enabled (#7549)
* feat: flush region before close when skip-wal is enabled

When closing a region with Noop WAL provider, the region is now flushed
before closing to ensure data durability. This prevents data loss for
regions configured with skip_wal.

Changes:
- Add `Closing` variant to `FlushReason` enum
- Modify `handle_close_request` to trigger flush for Noop WAL regions
- Pass flush reason through the flush pipeline
- Add test to verify data persistence after close with skip-wal

The flush-on-close flow completes the region cleanup after the flush
finishes, ensuring the region is properly removed from all schedulers.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: extract region cleanup logic into dedicated method

Extracts common region cleanup logic (stop, remove, and scheduler cleanup) into a new `remove_region` method to avoid duplication between `handle_close` and `handle_flush_request`. This improves code maintainability and reduces redundancy.

Also updates `RegionMap::remove_region` to return the removed region reference, allowing the caller to perform cleanup operations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* test: split skip-wal region close test into pending and no-pending cases

Split the test_close_region_skip_wal test into two separate test cases:
- test_close_region_skip_wal_with_pending_data: Tests the scenario where
  data is inserted before closing a region with skip-wal enabled
- test_close_region_skip_wal_without_pending_data: Tests the scenario
  where a region with skip-wal is closed without any data insertion

This improves test clarity and ensures both scenarios are properly covered.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: skip request handling and compaction for flush-on-close regions

When a region is flushed as part of the close operation (flush_on_close=true),
the region is immediately removed from the server. Therefore, there's no need
to handle pending requests or schedule compactions for such regions.

This fix moves the on_flush_success listener call outside the conditional
block and wraps all post-flush operations (request handling, compaction
scheduling) in an else branch, ensuring they only execute for normal flush
operations where the region remains active.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* test: add close follower region test with skip-wal

Adds a test case for closing a follower region with skip-wal enabled.
The test verifies that when a region transitions from Follower to Leader
before closing, the flush mechanism works correctly even with WAL disabled.

Also refactors flushable_region() to return Option instead of erroring
when region is not operable, allowing more flexible handling of region
states during flush operations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: fmt

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* revise test logic for closing a follower region

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-01-15 09:15:14 +00:00
Weny Xu
2ae20daa62 feat: add sync region instruction for repartition procedure (#7562)
* feat: add sync region instruction for repartition procedure

This commit introduces a new sync region instruction and integrates it
into the repartition procedure flow, specifically for metric engine tables.

Changes:
- Add SyncRegion instruction type and SyncRegionsReply in instruction.rs
- Implement SyncRegionHandler in datanode to handle sync region requests
- Add SyncRegion state in repartition procedure to sync newly allocated regions
- Integrate sync region step after enter_staging_region for metric engine tables
- Add sync_region flag and allocated_region_ids to PersistentContext
- Make SyncRegionFromRequest serializable for instruction transmission
- Add test utilities and mock support for sync region operations

The sync region step is conditionally executed based on the table engine type,
ensuring that newly allocated regions in metric engine tables are properly
synced from their source regions before proceeding with manifest remapping.

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add logs

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(repartition): improve staging region handling and support metric engine repartition
- Reorder sync region flow: move SyncRegion from EnterStagingRegion to RepartitionStart to sync before applying staging
- Add ExitStaging metadata update state to properly clear staging leader info after repartition completes
- Update build_template_from_raw_table_info to optionally skip metric engine internal columns when creating region requests
- Fix region state transition: set_dropping now expects specific state (Staging or Writable) for proper validation
- Adjust region drop and copy handlers to handle staging regions correctly
- Add comprehensive test cases for metric engine SPLIT/MERGE partition operations on physical tables with logical tables
- Improve logging for table route updates, region drops, and repartition operations

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: removes code duplication

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: update result

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: add error strategy support for flush region and flush pending deallocate regions

- **Add `ErrorStrategy` enum** in `procedure/utils.rs`:
  - Supports `Ignore` and `Retry` strategies for error handling
  - Refactor `flush_region` to accept `error_strategy` parameter
  - Extract `handle_flush_region_reply` helper function for better code organization

- **Add pending deallocate region support**:
  - Add `pending_deallocate_region_ids` field to `PersistentContext`
  - Implement `flush_pending_deallocate_regions` in `EnterStagingRegion` state
  - Flush pending deallocate regions before entering staging regions to ensure data consistency

- **Update error handling**:
  - `flush_leader_region`: Use `ErrorStrategy::Ignore` to skip unreachable datanodes
  - `sync_region`: Use `ErrorStrategy::Retry` for critical operations
  - `enter_staging_region`: Use `ErrorStrategy::Retry` when flushing pending deallocate regions

This change improves the robustness of the repartition procedure by:
1. Providing flexible error handling strategies for flush operations
2. Ensuring pending deallocate regions are properly flushed before repartitioning
3. Preventing data inconsistency during region migration

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: compile

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-15 04:52:57 +00:00
LFC
e64c31e59a chore: upgrade DataFusion family (#7558)
* chore: upgrade DataFusion family

Signed-off-by: luofucong <luofc@foxmail.com>

* use main proto

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2026-01-14 14:02:31 +00:00
Ruihang Xia
a5cb0116a2 perf: avoid boundary checks on accessing array items (#7570)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-14 12:56:39 +00:00
Yingwen
4b3bd7317b feat: add per-partition convert, result cache metrics (#7539)
* fix: show convert cost in explain analyze verbose

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: increase puffin metadata cache metric

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add result cache hit/miss to filter metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: print flat format in debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update sqlness test

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: make scan cost contains part/reader build cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect divider cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused field in ScannerMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect metadata read bytes

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: collect read metrics in get_parquet_meta_data

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-13 09:17:09 +00:00
dennis zhuang
a56a00224f feat: impl vector index scan in storage (#7528)
* feat: impl vector index scan in storage

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: fallback to read remote blob when blob not found

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: refactor encoding and decoding and apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: license

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: add apply_with_k tests

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: forgot to align nulls when the vector column is not in the batch

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: add test for vector column is not in a batch while buiilding

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-01-12 08:30:51 +00:00
discord9
6487f14f70 feat: gc schd update repart mapping (#7517)
* feat(gc): batch gc now alos handle routing

Signed-off-by: discord9 <discord9@163.com>

typo

Signed-off-by: discord9 <discord9@163.com>

s

Signed-off-by: discord9 <discord9@163.com>

feat: use batch gc procedure

Signed-off-by: discord9 <discord9@163.com>

feat: cross region refs

Signed-off-by: discord9 <discord9@163.com>

feat: clean up repartition

Signed-off-by: discord9 <discord9@163.com>

chore: cleanup

Signed-off-by: discord9 <discord9@163.com>

per review

Signed-off-by: discord9 <discord9@163.com>

test: update mock test

Signed-off-by: discord9 <discord9@163.com>

refactor: rm unused

Signed-off-by: discord9 <discord9@163.com>

refactor: invert related_regions

Signed-off-by: discord9 <discord9@163.com>

clippy

Signed-off-by: discord9 <discord9@163.com>

pcr

Signed-off-by: discord9 <discord9@163.com>

chore: remove unused

Signed-off-by: discord9 <discord9@163.com>

fix: after invert fix

Signed-off-by: discord9 <discord9@163.com>

chore: rm unused

Signed-off-by: discord9 <discord9@163.com>

refactor: eff

Signed-off-by: discord9 <discord9@163.com>

docs: chore

Signed-off-by: discord9 <discord9@163.com>

* after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* chore

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* fix: mssing region

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-12 08:28:34 +00:00
Yingwen
ef6dd5b99f fix: precise filter time index if not in projection (#7531)
* fix: precise filter time index if not in projection

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add sqlness test

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-07 11:15:34 +00:00
Ruihang Xia
d39895a970 feat: tune query traces (#7524)
* feat: add partition and region id

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: instrument mito

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* connect region scan span

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* instrument streams

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tweak

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-07 08:11:09 +00:00
Weny Xu
2f242927a8 feat(repartition): implement region deallocation for repartition procedure (#7522)
* feat: implement deallocate regions for repartition procedure

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(metric-engine): add force flag to drop physical regions with associated logical regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: update table metadata after deallocating regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-07 06:13:48 +00:00
discord9
6f86a22e6f feat: adjust some args to gc worker (#7469)
* chore: less stuff sent

Signed-off-by: discord9 <discord9@163.com>

* after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* fix: clarify comment on manifest file removal for GC worker

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-06 07:37:05 +00:00
Weny Xu
b1d81913f5 feat: update ApplyStagingManifestRequest to fetch manifest from central region (#7493)
* feat: update ApplyStagingManifestRequest to fetch manifest from central region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): rename `StagingDataStorage` to `StagingBlobStorage`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-30 07:29:56 +00:00
dennis zhuang
e4b5ef275f feat: impl vector index building (#7468)
* feat: impl vector index building

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: supports flat format

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* ci: add vector_index feature to test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions from copilot

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-30 03:38:51 +00:00
Lei, HUANG
7bc0934eb3 refactor(mito2): make MemtableStats fields public (#7488)
Change visibility of estimated_bytes, time_range, max_sequence, and
series_count fields from private to public for external access.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-26 09:57:18 +00:00
Yingwen
89b9469250 feat: Implement per range stats for bulk memtable (#7486)
* feat: implement per range stats for MemtableRange

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: extract methods to MemtableRanges

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: simple bulk memtable set other fields in stats

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use time_index_type()

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use time index type

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-26 07:24:11 +00:00
Weny Xu
518a4e013b refactor(mito2): reorganize manifest storage into modular components (#7483)
* refactor(mito2): reorganize manifest storage into modular components

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: sort

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-26 02:24:27 +00:00
Weny Xu
294f19fa1d feat(metric-engine): support sync logical regions from source region (#7438)
* chore: move file

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(metric-engine): support sync logical regions from source region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-25 09:06:58 +00:00
discord9
aea4e9fa55 fix: RemovedFiles deser compatibility (#7475)
* fix: compat for RemovedFiles

Signed-off-by: discord9 <discord9@163.com>

* cr

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-25 02:50:34 +00:00
AntiTopQuark
cea578244c fix(compaction): unify behavior of database compaction options with TTL (#7402)
* fix: fix dynamic compactiom option,unify behavior of database compaction options with TTL option

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* fix unit test

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* add debug log

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

---------

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>
2025-12-25 02:34:42 +00:00
Weny Xu
e1b18614ee feat(mito2): implement ApplyStagingManifest request handling (#7456)
* feat(mito2): implement `ApplyStagingManifest` request handling

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix logic

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-24 09:05:09 +00:00
Weny Xu
2d9967b981 fix(mito2): pass partition expr explicitly to flush task for region (#7461)
* fix(mito2): pass partition expr explicitly to flush task for staging mode

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: rename

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-24 04:18:06 +00:00
discord9
dec0d522f8 feat: gc versioned index (#7412)
* feat: add index version to file ref

Signed-off-by: discord9 <discord9@163.com>

* refactor wip

Signed-off-by: discord9 <discord9@163.com>

* wip

Signed-off-by: discord9 <discord9@163.com>

* update gc worker

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

* gc report for index files

Signed-off-by: discord9 <discord9@163.com>

* fix: type

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* chore: metrics

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* chore: naming

Signed-off-by: discord9 <discord9@163.com>

* docs: update explain

Signed-off-by: discord9 <discord9@163.com>

* test: parse file id/type from file path

Signed-off-by: discord9 <discord9@163.com>

* chore: change parse method visibility to crate

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* chore

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-24 03:07:53 +00:00
discord9
b3bc3c76f1 feat: file range dynamic filter (#7441)
* feat: add dynamic filtering support in file range and predicate handling

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-23 06:15:30 +00:00
Lei, HUANG
a8b512dded chore: expose symbols (#7451)
* chore/expose-symbols:
 ### Commit Message

 Enhance `merge_and_dedup` Functionality in `flush.rs`

 - **Function Signature Update**: Modified the `merge_and_dedup` function to accept `append_mode` and `merge_mode` as separate parameters instead of using `options`.
 - **Function Accessibility**: Changed the visibility of `merge_and_dedup` to `pub` to allow external access.
 - **Function Calls Update**: Updated calls to `merge_and_dedup` within `memtable_flat_sources` to align with the new function signature, passing `options.append_mode` and `options.merge_mode()` directly.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/expose-symbols:
 ### Add Merge and Deduplication Functionality

 - **File**: `src/mito2/src/flush.rs`
   - Introduced `merge_and_dedup` function to merge multiple record batch iterators and apply deduplication based on specified modes.
   - Added detailed documentation for the function, explaining its arguments, behavior, and usage examples.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-22 05:39:03 +00:00
Yingwen
fed6cb0806 fix: flat format use correct encoding in indexer for tags (#7440)
* test: add inverted and skipping test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: Add tests for fulltext index

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: index dictionary type in correct encoding in flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use encode_data_type() in SortField

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: refine imports

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update list test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: simplify tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-19 07:36:44 +00:00
Lanqing Yang
658332fe68 chore(mito): nit remove extra hashset in gc workers (#7399)
chore(mito): remove extra hashset in gc workers

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-18 13:09:32 +00:00
jeremyhi
95eccd6cde feat: introduce granularity for memory manager (#7416)
* feat: introduce granularity for memory manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add unit test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: remove granularity getter for mamanger

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/common/memory-manager/src/manager.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* feat: acquire_with_policy for manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-17 11:08:51 +00:00
Lei, HUANG
da964880f5 chore: expose symbols (#7417)
* refactor/expose-symbols:
 ## Refactor `bulk/part.rs` to Simplify Mutation Handling

 - Removed the `mutations_to_record_batch` function and its associated helper functions, including `ArraysSorter`, `timestamp_array_to_iter`, and `binary_array_to_dictionary`, to simplify the mutation handling logic in `bulk/part.rs`.
 - Deleted related test functions `check_binary_array_to_dictionary` and `check_mutations_to_record_batches` from the test module, along with their associated test cases.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/expose-symbols:
 ### Commit Message

 **Refactor and Enhance Deduplication Logic**

 - **`flush.rs`**: Refactored `maybe_dedup_one` function to accept `append_mode` and `merge_mode` as parameters instead of `RegionOptions`. This change enhances flexibility in deduplication logic.
 - **`memtable/bulk.rs`**: Made `BulkRangeIterBuilder` struct and its fields public to allow external access and modification, improving extensibility.
 - **`sst.rs`**: Corrected a typo in the schema documentation, changing `__prmary_key` to `__primary_key` for clarity and accuracy.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-17 01:29:36 +00:00
Yingwen
f6afb10e33 feat!: download file to fill the cache on write cache miss (#7294)
* feat: download inverted index file

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: download for bloom and fulltext

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement maybe_download_background for FileCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: load file for parquet

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reduce channel size

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use ManifestCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: pass cache to ManifestObjectStore::new

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix fmt and clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove manifest cache ttl

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove read cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: clean old read cache path

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config examples

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix CI

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: also clean the root directory

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update manifest test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: skip file if it exists

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove warn in replace

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add a flag to enable/disable background download

set the concurrency to 1 for background download

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename write_cache_enable_background_download to enable_refill_cache_on_read

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update config test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config.md

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-16 08:31:26 +00:00
Weny Xu
f7d5c87ac0 feat: introduce copy_region_from for mito engine (#7389)
* feat: introduce `copy_region_from`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-16 06:12:06 +00:00
jeremyhi
32f9cc5286 feat: move memory_manager to common crate (#7408)
* feat: move memory_manager to common crate

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add license header

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by AI comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2025-12-15 13:15:33 +00:00
Yingwen
5232a12a8c feat: per file scan metrics (#7396)
* feat: collect per file metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: divide build_cost to build_part_cost and build_reader_cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: limit the file metrics num to display

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: use sorted iter to get sorted files

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: output metrics in desc order

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-15 12:52:03 +00:00
jeremyhi
baffed8c6a feat: mem manager on compaction (#7305)
* feat: mem manager on compaction

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by copilot review comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: experimental_

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: refine estimate_compaction_bytes

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: make them into config example

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by copilot comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/mito2/src/compaction.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* fix: dedup the regions waiting

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: minor change

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add AdditionalMemoryGuard for the running compaction task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: do OnExhaustedPolicy before running task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: use OwnedSemaphorePermit to impl guard

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add early_release_partial method to release a portion of memory

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: 0 bytes make request_additional unlimited

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: fail-fast on acquire

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-12 06:49:58 +00:00
Lanqing Yang
f5e0e94e3a chore(mito): nit avoid clone the batch object on inverted index building (#7388)
fix: avoid clone the batch object on inverted index building

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-12 04:58:37 +00:00
discord9
f06a64ff90 feat: mark index outdated (#7383)
* feat: mark index outdated

Signed-off-by: discord9 <discord9@163.com>

* refactor: move IndexVerwsion to store-api

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* fix: condition for add files

Signed-off-by: discord9 <discord9@163.com>

* cleanup

Signed-off-by: discord9 <discord9@163.com>

* refactor(sst): extract index version check into method

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-11 12:08:45 +00:00
discord9
a26dee0ca1 fix: gc listing op first (#7385)
Signed-off-by: discord9 <discord9@163.com>
2025-12-11 03:25:05 +00:00
Yingwen
a22d08f1b1 feat: collect merge and dedup metrics (#7375)
* feat: collect FlatMergeReader metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add MergeMetricsReporter, rename Metrics to MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_input_rows from MergeMetrics

The merge reader won't dedup so there is no need to collect input rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: report merge metrics to PartitionMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add dedup cost to DedupMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove metrics from FlatMergeIterator

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_output_rows from MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement merge() for merge and dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: report metrics after observe metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-10 09:16:20 +00:00
Lei, HUANG
2f9130a2de chore(mito): expose some symbols (#7373)
chore/expose-symbols:
 ### Commit Summary

 - **Visibility Changes**: Updated visibility of functions in `bulk/part.rs`:
   - Made `record_batch_estimated_size` and `sort_primary_key_record_batch` functions public.
 - **Enhancements**: Enhanced functionality in `memtable.rs` by exposing additional components from `bulk::part`:
   - `BulkPartEncoder`, `BulkPartMeta`, `UnorderedPart`, `record_batch_estimated_size`, and `sort_primary_key_record_batch`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-09 14:33:14 +00:00
discord9
9197e818ec refactor: use versioned index for index file (#7309)
* refactor: use versioned index for index file

Signed-off-by: discord9 <discord9@163.com>

* fix: sst entry table

Signed-off-by: discord9 <discord9@163.com>

* update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: unit type

Signed-off-by: discord9 <discord9@163.com>

* fix: missing version

Signed-off-by: discord9 <discord9@163.com>

* more fix build index

Signed-off-by: discord9 <discord9@163.com>

* fix: use proper index id

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* test: update

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* test: test_list_ssts fixed

Signed-off-by: discord9 <discord9@163.com>

* test: fix test

Signed-off-by: discord9 <discord9@163.com>

* feat: stuff

Signed-off-by: discord9 <discord9@163.com>

* fix: clean temp index file on abort&delete all index version when delete file

Signed-off-by: discord9 <discord9@163.com>

* docs: explain

Signed-off-by: discord9 <discord9@163.com>

* fix: actually clean up tmp dir

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* clean tmp dir only when write cache enabled

Signed-off-by: discord9 <discord9@163.com>

* refactor: add version to index cache

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* test: update size

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-09 07:31:12 +00:00
Ruihang Xia
edb1f6086f feat: decode pk eagerly (#7350)
* feat: decode pk eagerly

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* merge primary_key_codec and decode_primary_key_values

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-05 09:11:51 +00:00
Yingwen
84e4e42ee7 feat: add more verbose metrics to scanners (#7336)
* feat: add inverted applier metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add metrics to bloom applier

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add metrics to fulltext index applier

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement BloomFilterReadMetrics for BloomFilterReader

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect read metrics for inverted index

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add metrics for range_read and metadata

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename elapsed to fetch_elapsed

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect metadata fetch metrics for inverted index

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect cache metrics for inverted and bloom index

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect read metrics in appliers

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect fulltext dir metrics for applier

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect parquet row group metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add parquet metadata metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add apply metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect more metrics for memory row group

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add fetch metrics to ReaderMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: init verbose metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: debug print metrics in ScanMetricsSet

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement debug for new metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: update parquet fetch metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect the whole fetch time

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add file_scan_cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: parquet fetch add cache_miss counter

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: print index read metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: use actual bytes to increase counter

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove provided implementations for index reader traits

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: change get_parquet_meta_data() method to receive metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename file_scan_cost to sst_scan_cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: refine ParquetFetchMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove useless inner method

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: collect page size actual needed

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify InvertedIndexReadMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplfy InvertedIndexApplyMetrics Debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify BloomFilterReadMetrics Debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify BloomFilterIndexApplyMetrics Debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify FulltextIndexApplyMetrics implementation

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify ParquetFetchMetrics Debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: simplify MetadataCacheMetrics Debug

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: only print verbose metrics when they are not empty.

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use mutex to protect ParquetFetchMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use duration for elapsed in ParquetFetchMetricsData

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-04 13:40:18 +00:00
Yingwen
d5c616a9ff feat: implement a cache for manifest files (#7326)
* feat: use cache in manifest store

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use ManifestCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: clean empty manifest dir

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: get last checkpoint from cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add hit/miss counter for manifest cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: pass cache to ManifestObjectStore::new

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: cache checkpoint

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: cache checkpoint in write

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler warnings

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update config comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: manifest store cache for staging

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move recover_inner to FileCacheInner

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove manifest cache config from MitoConfig

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: reduce clone when cache is enabled

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: do not cache staging manifests

We clean staging manifests by remove_all which isn't easy to clean
the cache in the same way

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fix paths in manifest cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: don't clean dir if it is too new

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: reuse write cache ttl as manifest cache ttl

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: clean all empty subdirectories

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-04 12:51:09 +00:00