mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-07-03 20:40:37 +00:00


dennis zhuang 3b8f55e490 docs(agents): add per-crate guides, architecture invariants, and generated-files list (#8346)

* docs(agents): add per-crate guides, architecture invariants, and generated-files list

Add agent/contributor navigation docs modeled on the AGENTS.md convention:

- Per-crate AGENTS.md for hot crates (mito2, metric-engine, flow, frontend,
  meta-srv): module map, read/write paths, change-coupling points, test
  commands, and gotchas.
- .agents/architecture-invariants.md: repo-wide rules that clippy and the
  style guide do not cover (format compatibility, crate layering, async
  runtimes, error handling, experimental gating, the DataFusion fork).
- .agents/generated-files.md: tool-generated artifacts that must not be
  hand-edited (sqlness .result, config.md, dashboards, build.rs output, proto).
- Anchor the .gitignore CLAUDE.md/AGENTS.md rules to the repo root so per-crate
  AGENTS.md files are tracked while root-level personal config stays ignored.

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: update crate AGENTS.md and fix config.md path

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* docs(agents): fix DataFusion patch layout and SQL query lifecycle order

Address review feedback on #8346:

- architecture-invariants: the DataFusion sub-crates pin an exact crates.io
  version in [workspace.dependencies] and are redirected to the fork rev in
  [patch.crates-io]; the two sections hold different forms, not the same rev.
- frontend: the SQL query lifecycle runs the pre_parsing/post_parsing
  interceptors around parsing, before the per-statement permission check.

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

2026-06-24 09:14:09 +00:00


metric-engine — Agent & Contributor Guide

Navigation aid for src/metric-engine. Keep it short and point to code. Paths are relative to the repo root.

Repo-wide rules that apply here: .agents/architecture-invariants.md.

What this crate does

The Metric Engine is optimized for Prometheus-style workloads with a huge number of small tables. Many logical regions (one per metric table) share a single physical pair of Mito2 regions: a data region and a metadata region. Rows are multiplexed with metric-engine internal identity: dense primary-key mode injects __table_id and __tsid, while the default sparse mode encodes them into __primary_key. Reads still add a logical-table filter before forwarding to the physical data region. It implements RegionEngine and delegates all real storage to mito2.
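
The difference between the two primary-key modes can be sketched roughly as follows. This is an illustration only: the `Encoding` and `Identity` types, the `attach_identity` function, and the byte layout of the sparse key are all hypothetical, and the real encoding lives in `row_modifier.rs` and the mito-codec crate.

```rust
/// Hypothetical stand-in for the data region's primary-key encoding.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Encoding {
    Dense,
    Sparse,
}

/// How the metric-engine identity travels with a row (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Identity {
    /// Dense mode: two extra columns, standing in for __table_id and __tsid.
    Columns { table_id: u32, tsid: u64 },
    /// Sparse mode: one opaque key, standing in for __primary_key.
    PrimaryKey(Vec<u8>),
}

/// Attaches the logical-table identity to a row according to the encoding.
pub fn attach_identity(encoding: Encoding, table_id: u32, tsid: u64) -> Identity {
    match encoding {
        Encoding::Dense => Identity::Columns { table_id, tsid },
        Encoding::Sparse => {
            // Made-up layout for this sketch: big-endian table id, then big-endian tsid.
            let mut key = Vec::with_capacity(12);
            key.extend_from_slice(&table_id.to_be_bytes());
            key.extend_from_slice(&tsid.to_be_bytes());
            Identity::PrimaryKey(key)
        }
    }
}
```

In both modes the identity ends up in the physical data region, which is what lets many logical tables share it.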

The architecture is documented at the top of src/metric-engine/src/lib.rs.

Module map

| Module | Path | Purpose |
| --- | --- | --- |
| `engine` | `src/metric-engine/src/engine.rs`, `src/metric-engine/src/engine/` | `MetricEngine` (`RegionEngine` impl) and per-op handlers (`create.rs`, `put.rs`, `read.rs`, `alter.rs`, `drop.rs`, ...) |
| `metadata_region` | `src/metric-engine/src/metadata_region.rs` | K-V over a Mito2 region storing logical table/column metadata, with an LRU cache |
| `data_region` | `src/metric-engine/src/data_region.rs` | Wraps the Mito2 data region; forwards writes and manages physical columns |
| `row_modifier` | `src/metric-engine/src/row_modifier.rs` | Rewrites incoming rows for dense or sparse primary-key encoding |
| `batch_modifier` | `src/metric-engine/src/batch_modifier.rs` | RecordBatch-level TSID computation and sparse primary-key encoding |
| `state` | `src/metric-engine/src/engine/state.rs` | In-memory cache of physical columns and logical column metadata |
| `repeated_task` | `src/metric-engine/src/repeated_task.rs` | Periodic metadata-region flush task |
| `utils` | `src/metric-engine/src/utils.rs` | `RegionId` conversions (data vs metadata group), manifest encoding |
| `config` | `src/metric-engine/src/config.rs` | `EngineConfig` (metadata flush interval, sparse PK) |
| `test_util` | `src/metric-engine/src/test_util.rs` | `TestEnv` building the Mito2 + Metric stack |

Write path

MetricEngine::handle_request(Put) (engine/put.rs) rejects direct writes to a physical region, resolves the physical region for the logical id, loads logical columns from metadata_region, then row_modifier rewrites the rows according to the data region's primary-key encoding and forwards the request to the Mito2 data region.
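
The order of checks in the write path can be pictured with a toy model. Everything here (`ToyEngine`, `WriteError`, string rows, plain `u64` region ids) is a hypothetical simplification rather than the crate's API, and the step that loads logical columns from the metadata region is left out.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    PhysicalRegionWrite(u64),
    UnknownLogicalRegion(u64),
}

/// Toy stand-in for the engine state; not the real MetricEngine.
pub struct ToyEngine {
    /// Logical region id -> physical data region id.
    pub logical_to_physical: HashMap<u64, u64>,
    /// Rows forwarded to each physical region, tagged with their logical id.
    pub forwarded: HashMap<u64, Vec<(u64, String)>>,
}

impl ToyEngine {
    /// Mirrors the order of steps described above for a Put request.
    pub fn put(&mut self, region_id: u64, rows: Vec<String>) -> Result<usize, WriteError> {
        // 1. Direct writes to a physical region are rejected.
        if self.logical_to_physical.values().any(|p| *p == region_id) {
            return Err(WriteError::PhysicalRegionWrite(region_id));
        }
        // 2. Resolve the physical region that backs this logical region.
        let physical = *self
            .logical_to_physical
            .get(&region_id)
            .ok_or(WriteError::UnknownLogicalRegion(region_id))?;
        // 3. Rewrite rows: here we only tag them with the logical id,
        //    as a stand-in for what row_modifier does.
        let count = rows.len();
        let tagged = rows.into_iter().map(|row| (region_id, row));
        // 4. Forward the rewritten rows to the physical data region.
        self.forwarded.entry(physical).or_default().extend(tagged);
        Ok(count)
    }
}
```

With logical region 10 mapped to physical region 1, `put(10, ...)` succeeds and the rows land under key 1, while `put(1, ...)` fails with `PhysicalRegionWrite(1)`.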

Read path

MetricEngine::handle_query (engine/read.rs): a query against a logical region is rewritten to add a __table_id == <logical_id> filter and forwarded to the Mito2 data region. Queries against a physical region pass straight through.
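
A minimal sketch of that rewrite, assuming a tiny hand-rolled filter type. `Filter`, `Target`, and `rewrite_query` are hypothetical names for illustration; the real code works on the engine's scan request and expression types.

```rust
/// Hypothetical, minimal filter expression for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Filter {
    /// `column == value`
    Eq(String, u64),
    /// Conjunction of two filters.
    And(Box<Filter>, Box<Filter>),
}

/// Which kind of region the query targets (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    /// A logical region, identified by its logical table id.
    Logical(u32),
    /// A physical region; queries pass through untouched.
    Physical,
}

/// Adds the logical-table filter for logical regions; leaves physical queries as-is.
pub fn rewrite_query(target: Target, user_filter: Option<Filter>) -> Option<Filter> {
    match target {
        Target::Physical => user_filter,
        Target::Logical(table_id) => {
            let table_filter = Filter::Eq("__table_id".to_string(), u64::from(table_id));
            Some(match user_filter {
                Some(f) => Filter::And(Box::new(table_filter), Box::new(f)),
                None => table_filter,
            })
        }
    }
}
```

The point of the sketch is that the table filter is always combined with the caller's filter rather than replacing it, so rows from other logical tables in the shared data region never match.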

Public surface

Entry: MetricEngine in src/metric-engine/src/engine.rs, built via try_new(mito, config).
Trait: impl RegionEngine for MetricEngine (name = "metric").
Depends on mito2, store-api, and mito-codec (sparse primary key codec).

When you change X, also touch Y

Reserved column ids / names (__tsid, __table_id, __primary_key): see store-api's metric engine consts; keep them in sync with engine.rs.
Metadata K-V encoding (metadata_region.rs): changes the on-disk metadata layout.
RegionId group mapping (utils.rs): data vs metadata region derivation.
Physical column rules (engine/alter/): a physical region allows only one field column.
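
To see why the group mapping is a coupling point, here is a sketch of deriving a data or metadata region id from the same table id and sequence. The bit layout (32-bit table id, 8-bit group, 24-bit sequence) and the group values 0 and 1 are assumptions made for this example; check `utils.rs` and store-api for the real definitions. The function names are hypothetical.

```rust
/// Assumed group value for data regions (illustrative).
pub const DATA_GROUP: u8 = 0;
/// Assumed group value for metadata regions (illustrative).
pub const METADATA_GROUP: u8 = 1;

/// Replaces the group bits of a region id, keeping table id and sequence.
/// Assumed layout: [table_id: 32 bits][group: 8 bits][sequence: 24 bits].
pub fn with_group(region_id: u64, group: u8) -> u64 {
    let table_id = region_id >> 32;
    let sequence = region_id & 0x00FF_FFFF;
    (table_id << 32) | (u64::from(group) << 24) | sequence
}

/// Extracts the group bits under the same assumed layout.
pub fn group_of(region_id: u64) -> u8 {
    ((region_id >> 24) & 0xFF) as u8
}

/// Sketch of a data-region conversion.
pub fn sketch_to_data_region_id(region_id: u64) -> u64 {
    with_group(region_id, DATA_GROUP)
}

/// Sketch of a metadata-region conversion.
pub fn sketch_to_metadata_region_id(region_id: u64) -> u64 {
    with_group(region_id, METADATA_GROUP)
}
```

Under these assumptions the two conversions are inverses on the group bits only, so a change to the layout or the group constants would silently point existing regions at the wrong sibling.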

Testing

cargo nextest run -p metric-engine

TestEnv in test_util.rs gives you mito() and metric() handles.

Gotchas

Physical vs logical region confusion: physical regions reject direct user writes; operate on logical region ids.
TSID must be stable for the same tag set: it is a hash over sorted tag names and values, and may be stored in __tsid or encoded into __primary_key.
Metadata is cached (LRU with a TTL); after an alter, stale reads are possible until invalidation/expiry.
Always convert ids via utils::to_data_region_id / to_metadata_region_id.
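
The TSID stability requirement can be illustrated with a small order-independent hash. This sketch uses FNV-1a as a stand-in; the hash function the crate actually uses is not stated in this guide, and `toy_tsid` is a hypothetical name.

```rust
/// Computes an order-independent id for a tag set (illustrative only).
/// FNV-1a is a stand-in here, not the crate's actual hash function.
pub fn toy_tsid(tags: &[(&str, &str)]) -> u64 {
    // Sort by tag name (then value) so input order does not matter.
    let mut sorted: Vec<(&str, &str)> = tags.to_vec();
    sorted.sort();

    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for (name, value) in sorted {
        for part in [name, value] {
            // 0xff never appears in valid UTF-8, so it works as a separator
            // that keeps ("ab", "c") distinct from ("a", "bc").
            for byte in part.bytes().chain(std::iter::once(0xff)) {
                hash ^= u64::from(byte);
                hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
            }
        }
    }
    hash
}
```

Sorting before hashing is what makes the id stable: the same tags supplied in a different order produce the same value, while changing any tag value produces a different one.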

Maintenance contract

Update this file when you change the logical/physical region model, the injected columns, the metadata encoding, or the public engine entry points.
