mito2 — Agent & Contributor Guide
Navigation aid for src/mito2. Keep it short and point to code; do not duplicate
the code here. Paths are relative to the repo root.
Repo-wide rules that apply here: .agents/architecture-invariants.md.
What this crate does
Mito2 is GreptimeDB's primary time-series region storage engine. It owns the
write path (memtable + WAL), flushing memtables to Parquet SST files,
TWCS/windowed compaction, and the read path (multi-level merge + dedup with
snapshot isolation). It implements the RegionEngine trait from store-api.
Module map
| Module | Path | Purpose |
|---|---|---|
| engine | src/mito2/src/engine.rs | MitoEngine (the RegionEngine impl) and request dispatch |
| worker | src/mito2/src/worker.rs, src/mito2/src/worker/ | Per-region worker loop; write/alter/flush handlers |
| region | src/mito2/src/region.rs, src/mito2/src/region/version.rs | MitoRegion state and copy-on-write VersionControl snapshots |
| request | src/mito2/src/request.rs | WriteRequest/RegionRequest types and result channels |
| wal | src/mito2/src/wal.rs | Write-ahead log wrapper over log-store |
| memtable | src/mito2/src/memtable/ | In-memory write buffers (time-series / bulk / partition) |
| flush | src/mito2/src/flush.rs | FlushScheduler, WriteBufferManager, memtable → SST |
| compaction | src/mito2/src/compaction/ | TWCS picker, strict-window manual picker, compactor, memory control |
| access_layer | src/mito2/src/access_layer.rs | SST read/write over the object store |
| sst | src/mito2/src/sst/ | Parquet format, file metadata, index layout |
| read | src/mito2/src/read/ | ScanRegion, merge, dedup, projection, streaming |
| manifest | src/mito2/src/manifest/ | RegionManifestManager, manifest actions/edits |
| cache | src/mito2/src/cache.rs | Write/file/page caches |
| gc | src/mito2/src/gc.rs, src/mito2/src/gc/ | Dropped-file cleanup worker |
| schedule | src/mito2/src/schedule/ | Local/remote background job scheduling |
| remap_manifest | src/mito2/src/remap_manifest.rs | Manifest path remapping for region copy/migration |
| config | src/mito2/src/config.rs | MitoConfig tuning knobs |
| test_util | src/mito2/src/test_util.rs | TestEnv and builders (under the test feature) |
Write path
MitoEngine::handle_request (engine.rs) → worker loop
(worker/handle_write.rs) → sequence + WAL assembly (region_write_ctx.rs) →
wal.rs → memtable (memtable/) → when buffer pressure trips, flush.rs
writes SSTs via access_layer.rs and appends a RegionEdit to the manifest
(manifest/manager.rs).
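The ordering in the write path can be pictured with a toy model. This is a minimal sketch using only the standard library, not mito2's code: `ToyRegion`, `Row`, `flush_threshold`, and the `"add sst-N"` manifest strings are hypothetical names invented for illustration, and the worker loop and request channels are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type; the real request types live in request.rs.
#[derive(Clone, Debug, PartialEq)]
struct Row {
    key: String,
    ts: i64,
    value: f64,
}

/// Toy stand-in for one region. All fields are assumptions for this sketch.
struct ToyRegion {
    next_sequence: u64,
    wal: Vec<(u64, Row)>,                          // stand-in for wal.rs
    memtable: BTreeMap<(String, i64), (u64, f64)>, // stand-in for memtable/
    flush_threshold: usize,                        // stand-in for buffer pressure
    ssts: Vec<Vec<((String, i64), (u64, f64))>>,   // stand-in for SST files
    manifest: Vec<String>,                         // stand-in for RegionEdit records
}

impl ToyRegion {
    fn new(flush_threshold: usize) -> Self {
        ToyRegion {
            next_sequence: 0,
            wal: Vec::new(),
            memtable: BTreeMap::new(),
            flush_threshold,
            ssts: Vec::new(),
            manifest: Vec::new(),
        }
    }

    /// Assign sequences, log every row to the WAL, then apply to the memtable.
    fn write(&mut self, rows: Vec<Row>) {
        let mut sequenced = Vec::with_capacity(rows.len());
        for row in rows {
            let seq = self.next_sequence;
            self.next_sequence += 1;
            self.wal.push((seq, row.clone()));
            sequenced.push((seq, row));
        }
        for (seq, row) in sequenced {
            self.memtable.insert((row.key, row.ts), (seq, row.value));
        }
        if self.memtable.len() >= self.flush_threshold {
            self.flush();
        }
    }

    /// Move the memtable into a new "SST" and record the edit in the manifest.
    fn flush(&mut self) {
        if self.memtable.is_empty() {
            return;
        }
        let sst: Vec<_> = std::mem::take(&mut self.memtable).into_iter().collect();
        let name = format!("sst-{}", self.ssts.len());
        self.ssts.push(sst);
        self.manifest.push(format!("add {name}"));
    }
}
```

In this model a row reaches the memtable only after its WAL entry exists, and the manifest gains an entry only after the SST has been produced, which mirrors the order of the arrows above.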
Read path
MitoEngine::handle_query (engine.rs) → read/scan_region.rs takes an
immutable Version (region/version.rs) → scans memtables and Parquet SSTs
(sst/parquet.rs) → merges (read/) and dedups by sequence → projected,
filtered RecordBatch stream.
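To make "merges and dedups by sequence" concrete, here is a small sketch under stated assumptions. `Entry`, `merge_dedup`, and `snapshot_sequence` are hypothetical names, and a `BTreeMap` stands in for the streaming multi-way merge, so this shows the visibility and dedup rule only, not how read/ implements it.

```rust
use std::collections::BTreeMap;

/// Hypothetical row version as seen by a scan (not a mito2 type).
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    key: String,
    ts: i64,
    sequence: u64,
    value: f64,
}

/// Combine several sources (memtables, SSTs) and keep, for each (key, ts),
/// the version with the highest sequence that is visible at the snapshot.
fn merge_dedup(sources: &[Vec<Entry>], snapshot_sequence: u64) -> Vec<Entry> {
    let mut latest: BTreeMap<(String, i64), Entry> = BTreeMap::new();
    for entry in sources.iter().flatten() {
        // Writes newer than the snapshot are invisible to this read.
        if entry.sequence > snapshot_sequence {
            continue;
        }
        let slot = (entry.key.clone(), entry.ts);
        let is_newer = latest
            .get(&slot)
            .map_or(true, |existing| entry.sequence > existing.sequence);
        if is_newer {
            latest.insert(slot, entry.clone());
        }
    }
    // BTreeMap iteration yields rows ordered by (key, ts).
    latest.into_values().collect()
}
```

A reader holding an older snapshot sequence keeps seeing the older version of a row even after a newer write lands, which is why the dedup rule relies on per-region sequence ordering.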
Public surface
- Entry: MitoEngine in src/mito2/src/engine.rs, built via MitoEngineBuilder.
- Trait: impl RegionEngine for MitoEngine (store-api's region engine contract).
- Consumed by datanode (sends RegionRequests) and the query layer (scans).
When you change X, also touch Y
- Manifest format (manifest/action.rs): affects crash recovery and follower replay. Keep it backward compatible.
- SST/Parquet layout (sst/): readers must stay compatible with existing files.
- Request types (request.rs): usually tied to proto definitions consumed by datanode.
- WAL/memtable encoding (wal/, memtable/): breaks replay if changed incompatibly.
Testing
cargo nextest run -p mito2
Tests live next to the code as *_test.rs (e.g. src/mito2/src/engine/flush_test.rs).
TestEnv in test_util.rs spins up an engine over an in-process object store.
Gotchas
- Sequence numbers are strictly increasing per region; dedup and snapshot reads depend on this. Do not change assignment lightly.
- Manifest version is monotonic — never reset or skip it.
- Lock ordering: take the manifest lock before updating version_control; the reverse deadlocks against concurrent flush/compaction.
- All region I/O runs on tokio workers; never block_on inside a worker.
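The lock-ordering and monotonic-version rules can be illustrated with a toy region. This is only a sketch: `ToyManifest`, `ToyVersion`, `ToyRegion`, and `apply_edit` are hypothetical, and it uses blocking `std::sync` locks for brevity, which says nothing about the lock types the crate actually uses.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical manifest state: a version counter plus the live file list.
struct ToyManifest {
    version: u64,
    files: Vec<String>,
}

/// Hypothetical immutable snapshot handed to readers.
struct ToyVersion {
    manifest_version: u64,
    files: Vec<String>,
}

struct ToyRegion {
    manifest: Mutex<ToyManifest>,
    version_control: RwLock<Arc<ToyVersion>>,
}

impl ToyRegion {
    fn new() -> Self {
        ToyRegion {
            manifest: Mutex::new(ToyManifest { version: 0, files: Vec::new() }),
            version_control: RwLock::new(Arc::new(ToyVersion {
                manifest_version: 0,
                files: Vec::new(),
            })),
        }
    }

    /// Every writer takes the locks in the same order: manifest first,
    /// then version_control. The manifest version only ever increments.
    fn apply_edit(&self, new_file: &str) -> u64 {
        let mut manifest = self.manifest.lock().unwrap();
        manifest.version += 1;
        manifest.files.push(new_file.to_string());

        // Copy-on-write: build a fresh snapshot and swap the pointer.
        let next = Arc::new(ToyVersion {
            manifest_version: manifest.version,
            files: manifest.files.clone(),
        });
        *self.version_control.write().unwrap() = next;
        manifest.version
    }

    /// Readers clone the Arc and keep an unchanging view of the region.
    fn snapshot(&self) -> Arc<ToyVersion> {
        let guard = self.version_control.read().unwrap();
        Arc::clone(&guard)
    }
}
```

Because every edit acquires the manifest lock before touching `version_control`, two concurrent edits cannot each hold the lock the other is waiting for, and a snapshot taken before an edit is unaffected by it.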
Maintenance contract
Update this file when you add/rename a top-level module, change the write/read path entry points, or alter a persisted format (manifest, SST, WAL).