greptimedb/src/mito2/AGENTS.md
dennis zhuang 3b8f55e490 docs(agents): add per-crate guides, architecture invariants, and generated-files list (#8346)
2026-06-24 09:14:09 +00:00


mito2 — Agent & Contributor Guide

Navigation aid for src/mito2. Keep it short and point to code; do not duplicate the code here. Paths are relative to the repo root.

Repo-wide rules that apply here: .agents/architecture-invariants.md.

What this crate does

Mito2 is GreptimeDB's primary time-series region storage engine. It owns the write path (memtable + WAL), flushing memtables to Parquet SST files, TWCS/windowed compaction, and the read path (multi-level merge + dedup with snapshot isolation). It implements the RegionEngine trait from store-api.

Module map

| Module | Path | Purpose |
| --- | --- | --- |
| engine | src/mito2/src/engine.rs | MitoEngine (the RegionEngine impl) and request dispatch |
| worker | src/mito2/src/worker.rs, src/mito2/src/worker/ | Per-region worker loop; write/alter/flush handlers |
| region | src/mito2/src/region.rs, src/mito2/src/region/version.rs | MitoRegion state and copy-on-write VersionControl snapshots |
| request | src/mito2/src/request.rs | WriteRequest/RegionRequest types and result channels |
| wal | src/mito2/src/wal.rs | Write-ahead log wrapper over log-store |
| memtable | src/mito2/src/memtable/ | In-memory write buffers (time-series / bulk / partition) |
| flush | src/mito2/src/flush.rs | FlushScheduler, WriteBufferManager, memtable → SST |
| compaction | src/mito2/src/compaction/ | TWCS picker, strict-window manual picker, compactor, memory control |
| access_layer | src/mito2/src/access_layer.rs | SST read/write over the object store |
| sst | src/mito2/src/sst/ | Parquet format, file metadata, index layout |
| read | src/mito2/src/read/ | ScanRegion, merge, dedup, projection, streaming |
| manifest | src/mito2/src/manifest/ | RegionManifestManager, manifest actions/edits |
| cache | src/mito2/src/cache.rs | Write/file/page caches |
| gc | src/mito2/src/gc.rs, src/mito2/src/gc/ | Dropped-file cleanup worker |
| schedule | src/mito2/src/schedule/ | Local/remote background job scheduling |
| remap_manifest | src/mito2/src/remap_manifest.rs | Manifest path remapping for region copy/migration |
| config | src/mito2/src/config.rs | MitoConfig tuning knobs |
| test_util | src/mito2/src/test_util.rs | TestEnv and builders (under the test feature) |
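
The region row above mentions copy-on-write VersionControl snapshots. As a rough illustration of that pattern only, here is a std-only sketch. The `Version` and `VersionControl` types below are hypothetical simplifications, not the definitions in region/version.rs; the real fields and method names may differ.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, simplified snapshot of what a reader may see.
#[derive(Debug, Clone, Default)]
pub struct Version {
    pub memtables: Vec<String>,
    pub ssts: Vec<String>,
}

/// Hypothetical copy-on-write holder: readers clone the Arc,
/// writers build a new Version and swap the pointer.
#[derive(Default)]
pub struct VersionControl {
    current: RwLock<Arc<Version>>,
}

impl VersionControl {
    /// Cheap snapshot; the returned Version never changes afterwards.
    pub fn current(&self) -> Arc<Version> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Copy the current Version, apply the edit, then publish the copy.
    pub fn apply(&self, edit: impl FnOnce(&mut Version)) {
        let mut guard = self.current.write().unwrap();
        let mut next: Version = (**guard).clone();
        edit(&mut next);
        *guard = Arc::new(next);
    }
}
```

A reader that called `current()` before an `apply` keeps seeing the old file list, which is what makes long-running scans safe against concurrent flushes in this model.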

Write path

MitoEngine::handle_request (engine.rs) → worker loop (worker/handle_write.rs) → sequence + WAL assembly (region_write_ctx.rs) → wal.rs → memtable (memtable/) → when buffer pressure trips, flush.rs writes SSTs via access_layer.rs and appends a RegionEdit to the manifest (manifest/manager.rs).
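
The ordering in the chain above (sequence assignment, WAL, memtable, then flush plus manifest edit under buffer pressure) can be sketched with a toy in-memory model. Everything here is a hypothetical stand-in: `Region`, `Row`, `flush_threshold`, and the `sst-N` file names are illustrative assumptions, and the real code is asynchronous and spread across the files named above.

```rust
/// Hypothetical row type; the real request types live in request.rs.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: String,
    pub value: i64,
}

/// Hypothetical in-memory model of one region's write-side state.
pub struct Region {
    last_sequence: u64,
    wal: Vec<(u64, Row)>,
    memtable: Vec<(u64, Row)>,
    ssts: Vec<Vec<(u64, Row)>>,
    manifest: Vec<String>,
    flush_threshold: usize,
}

impl Region {
    pub fn new(flush_threshold: usize) -> Self {
        Region {
            last_sequence: 0,
            wal: Vec::new(),
            memtable: Vec::new(),
            ssts: Vec::new(),
            manifest: Vec::new(),
            flush_threshold,
        }
    }

    /// Writes a batch and returns the last sequence assigned to it.
    pub fn write(&mut self, rows: Vec<Row>) -> u64 {
        // 1. Assign strictly increasing sequence numbers.
        let mut entries = Vec::with_capacity(rows.len());
        for row in rows {
            self.last_sequence += 1;
            entries.push((self.last_sequence, row));
        }
        // 2. Log first, so a crash after this point can replay the batch.
        self.wal.extend(entries.iter().cloned());
        // 3. Only then make the rows visible in the memtable.
        self.memtable.extend(entries);
        // 4. Under buffer pressure, flush and record the new file.
        if self.memtable.len() >= self.flush_threshold {
            self.flush();
        }
        self.last_sequence
    }

    fn flush(&mut self) {
        let file = format!("sst-{}", self.ssts.len());
        self.ssts.push(std::mem::take(&mut self.memtable));
        self.manifest.push(file); // stands in for appending a RegionEdit
    }

    pub fn memtable_len(&self) -> usize {
        self.memtable.len()
    }

    pub fn wal_len(&self) -> usize {
        self.wal.len()
    }

    pub fn manifest(&self) -> &[String] {
        &self.manifest
    }
}
```

The sketch does not model WAL truncation after a flush or concurrent writers; it only shows why the steps run in this order.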

Read path

MitoEngine::handle_query (engine.rs) → read/scan_region.rs takes an immutable Version (region/version.rs) → scans memtables and Parquet SSTs (sst/parquet.rs) → merges (read/) and dedups by sequence → projected, filtered RecordBatch stream.
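
To make "merges and dedups by sequence" concrete, the sketch below combines rows from several sources and keeps, for each (key, timestamp) pair, the row with the highest sequence. It is a simplification under stated assumptions: it sorts everything in memory rather than doing the streaming multi-level merge the crate uses, the `Row` shape is hypothetical, and the `visible_sequence` cutoff is one assumed way to model a snapshot read.

```rust
/// Hypothetical row shape; real batches are columnar.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub key: String,
    pub ts: i64,
    pub sequence: u64,
    pub value: i64,
}

/// Merge rows from all sources and keep the newest visible row per (key, ts).
/// Rows with a sequence above `visible_sequence` are ignored.
pub fn merge_dedup(sources: &[Vec<Row>], visible_sequence: u64) -> Vec<Row> {
    let mut rows: Vec<Row> = sources
        .iter()
        .flatten()
        .filter(|r| r.sequence <= visible_sequence)
        .cloned()
        .collect();
    // Order by key, then timestamp, then newest sequence first.
    rows.sort_by(|a, b| {
        a.key
            .cmp(&b.key)
            .then(a.ts.cmp(&b.ts))
            .then(b.sequence.cmp(&a.sequence))
    });
    // The first row of each (key, ts) group is the newest; drop the rest.
    rows.dedup_by(|next, kept| next.key == kept.key && next.ts == kept.ts);
    rows
}
```

Because dedup picks the winner purely by comparing sequences, any change that lets two writes share a sequence, or lets a later write get a smaller one, would silently change query results.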

Public surface

  • Entry: MitoEngine in src/mito2/src/engine.rs, built via MitoEngineBuilder.
  • Trait: impl RegionEngine for MitoEngine (store-api's region engine contract).
  • Consumed by datanode (sends RegionRequests) and the query layer (scans).

When you change X, also touch Y

  • Manifest format (manifest/action.rs): affects crash recovery and follower replay. Keep it backward compatible.
  • SST/Parquet layout (sst/): readers must stay compatible with existing files.
  • Request types (request.rs): usually tied to proto definitions consumed by datanode.
  • WAL/memtable encoding (wal/, memtable/): breaks replay if changed incompatibly.

Testing

cargo nextest run -p mito2

Tests live next to the code as *_test.rs (e.g. src/mito2/src/engine/flush_test.rs). TestEnv in test_util.rs spins up an engine over an in-process object store.

Gotchas

  • Sequence numbers are strictly increasing per region; dedup and snapshot reads depend on this. Do not change assignment lightly.
  • Manifest version is monotonic — never reset or skip it.
  • Lock ordering: take the manifest lock before updating version_control; the reverse deadlocks against concurrent flush/compaction.
  • All region I/O runs on tokio workers; never block_on inside a worker.
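
The lock-ordering rule above can be illustrated with plain `std::sync::Mutex`. This is a hedged sketch, not the crate's code: `RegionState`, `commit_file`, and the two `Vec<String>` fields are hypothetical stand-ins for the manifest and version_control, which in mito2 are async and more involved. The point is only that every code path takes the two locks in the same order.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a region's manifest plus version state.
#[derive(Default)]
pub struct RegionState {
    manifest: Mutex<Vec<String>>,
    version: Mutex<Vec<String>>,
}

impl RegionState {
    /// Always lock the manifest first, then the version state.
    pub fn commit_file(&self, file: &str) {
        let mut manifest = self.manifest.lock().unwrap();
        manifest.push(file.to_string());
        let mut version = self.version.lock().unwrap();
        version.push(file.to_string());
    }

    /// Readers that need both follow the same order.
    pub fn counts(&self) -> (usize, usize) {
        let manifest = self.manifest.lock().unwrap();
        let version = self.version.lock().unwrap();
        (manifest.len(), version.len())
    }
}

/// Simulates concurrent flush/compaction jobs committing files.
pub fn run_concurrent_commits(jobs: usize) -> (usize, usize) {
    let state = Arc::new(RegionState::default());
    let handles: Vec<_> = (0..jobs)
        .map(|i| {
            let state = Arc::clone(&state);
            thread::spawn(move || state.commit_file(&format!("sst-{}", i)))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    state.counts()
}
```

If one path took `version` before `manifest`, two threads could each hold one lock while waiting for the other. A single fixed order rules that out.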

Maintenance contract

Update this file when you add/rename a top-level module, change the write/read path entry points, or alter a persisted format (manifest, SST, WAL).