prost has no size caching, so WalEntry::encode_to_vec recomputes the
encoded_len of every nested message for each length delimiter. In the deep
WAL tree (WalEntry/Mutation/Rows/Row/Value) a leaf Value's length is
recomputed once per ancestor level (~5x).
Add WalEntryEncoder: one size pass caches every message body length into a
flat vector (pre-order), one encode pass writes bytes reading the cached
lengths back via a cursor, so each length is computed exactly once. Leaf
messages (Value, ColumnSchema, WriteHint, BulkWalEntry) delegate their body
encoding to prost but have their single computed length cached.
Output is byte-for-byte identical to encode_to_vec (asserted in tests
covering delete op_type, null values, empty entry, encoder reuse), which is
required for WAL replay compatibility. The encoder is wired into
WalWriter::add_entry, with one instance reused per write batch.
This compensates for prost lacking a cached_size mechanism and can be
removed if such caching lands upstream.
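
A minimal sketch of the two-pass idea, not the actual WalEntryEncoder. `Msg`,
`CachedEncoder`, and the wire layout are hypothetical stand-ins (a payload plus
length-delimited children); only the technique is taken from the description
above: a size pass fills a flat pre-order vector, and the encode pass reads it
back through a cursor.

```rust
/// Hypothetical stand-in for a nested protobuf message: a payload plus
/// length-delimited children. Not the real WAL types.
struct Msg {
    payload: Vec<u8>,
    children: Vec<Msg>,
}

/// Number of bytes a varint length delimiter occupies.
fn varint_len(mut v: usize) -> usize {
    let mut n = 1;
    while v >= 0x80 {
        v >>= 7;
        n += 1;
    }
    n
}

fn put_varint(mut v: usize, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v & 0x7f) as u8 | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Naive body length: walks the whole subtree on every call.
fn body_len(m: &Msg) -> usize {
    let mut n = varint_len(m.payload.len()) + m.payload.len();
    for c in &m.children {
        let l = body_len(c);
        n += varint_len(l) + l;
    }
    n
}

/// Baseline encoder: a leaf's length is recomputed once per ancestor level.
fn encode_naive(m: &Msg, out: &mut Vec<u8>) {
    put_varint(m.payload.len(), out);
    out.extend_from_slice(&m.payload);
    for c in &m.children {
        put_varint(body_len(c), out);
        encode_naive(c, out);
    }
}

/// Reusable two-pass encoder; `lens` holds body lengths in pre-order.
#[derive(Default)]
struct CachedEncoder {
    lens: Vec<usize>,
}

impl CachedEncoder {
    /// Pass 1: compute every body length once and store it in its slot.
    fn size_pass(&mut self, m: &Msg) -> usize {
        let slot = self.lens.len();
        self.lens.push(0); // reserve this message's pre-order slot
        let mut n = varint_len(m.payload.len()) + m.payload.len();
        for c in &m.children {
            let l = self.size_pass(c);
            n += varint_len(l) + l;
        }
        self.lens[slot] = n;
        n
    }

    /// Pass 2: write bytes; the cursor points at the next message's slot.
    fn encode_pass(&self, m: &Msg, cursor: &mut usize, out: &mut Vec<u8>) {
        *cursor += 1; // consume this message's own slot
        put_varint(m.payload.len(), out);
        out.extend_from_slice(&m.payload);
        for c in &m.children {
            put_varint(self.lens[*cursor], out); // child's cached length
            self.encode_pass(c, cursor, out);
        }
    }

    fn encode(&mut self, m: &Msg) -> Vec<u8> {
        self.lens.clear(); // keep the allocation when the encoder is reused
        let total = self.size_pass(m);
        let mut out = Vec::with_capacity(total);
        let mut cursor = 0;
        self.encode_pass(m, &mut cursor, &mut out);
        out
    }
}
```

Comparing `CachedEncoder::encode` against `encode_naive` on the same tree is the
same kind of byte-for-byte check the tests above rely on.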
Signed-off-by: lyang24 <lanqingy93@gmail.com>
* feat: add datanode runtime options
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add datanode runtime handles
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: wire datanode runtimes into region server
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: route datanode ingestion to ingestion runtime
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add datanode query runtime stream bridge
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: route datanode reads to query runtime
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add datanode global runtimes
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: use common datanode runtimes
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: run mito scan tasks on query runtime
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: split datanode runtime options
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: share global fallback for datanode runtimes
Use the global runtime as the fallback for datanode query and ingestion
runtimes when datanode-specific pools are not initialized. This avoids
creating unused datanode worker pools in non-datanode services.
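
The fallback can be pictured roughly as follows. This is a sketch under
assumptions: `Runtime`, `GLOBAL_RT`, `QUERY_RT`, and `init_query_runtime` are
hypothetical names, and the real handles wrap thread pools rather than a label.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a runtime handle.
#[derive(Debug)]
struct Runtime {
    name: &'static str,
}

static GLOBAL_RT: OnceLock<Runtime> = OnceLock::new();
static QUERY_RT: OnceLock<Runtime> = OnceLock::new();

/// The general-purpose runtime, created lazily on first use.
fn global_runtime() -> &'static Runtime {
    GLOBAL_RT.get_or_init(|| Runtime { name: "global" })
}

/// Assumed to be called only by the datanode at startup.
fn init_query_runtime() {
    let _ = QUERY_RT.set(Runtime { name: "query" });
}

/// Dedicated pool if one was initialized, otherwise the shared global runtime.
/// Services that never call `init_query_runtime` create no extra pool.
fn query_runtime() -> &'static Runtime {
    QUERY_RT.get().unwrap_or_else(global_runtime)
}
```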
Files:
- `src/common/runtime/src/global.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: docs
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: forward query runtime stream metrics
Forward inner stream metrics through the datanode query runtime bridge so
`EXPLAIN ANALYZE` can report plan metrics after stream polling moves to the
query runtime.
Files:
- `src/datanode/src/query_stream.rs`
- `src/datanode/src/region_server.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: route metric batch puts to ingest runtime
Run the optimized metric batch put path on the datanode ingest runtime so
metric ingestion does not bypass runtime isolation.
Files:
- `src/datanode/src/region_server.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: abort query producer on stream drop
Abort the datanode query runtime producer when the returned read stream is
dropped so cancelled clients do not leave query work running in the
background.
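
A std-only analogy of cancel-on-drop, using a thread and a cancel flag in place
of an async task and its abort handle. `BridgedStream` and `spawn_bridge` are
hypothetical names; the sketch only illustrates that dropping the consumer side
stops the producer.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{sync_channel, Receiver};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Consumer side of the bridge; dropping it cancels the producer.
struct BridgedStream {
    rx: Receiver<u64>,
    cancel: Arc<AtomicBool>,
}

impl BridgedStream {
    fn next(&mut self) -> Option<u64> {
        self.rx.recv().ok()
    }
}

impl Drop for BridgedStream {
    fn drop(&mut self) {
        // Signal cancellation. `rx` is dropped right after this runs, which
        // also unblocks a producer waiting on a full channel.
        self.cancel.store(true, Ordering::SeqCst);
    }
}

/// Spawns a producer that emits 0, 1, 2, ... until cancelled.
/// The join handle yields how many items were successfully sent.
fn spawn_bridge() -> (BridgedStream, JoinHandle<u64>) {
    let (tx, rx) = sync_channel(1);
    let cancel = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&cancel);
    let producer = thread::spawn(move || {
        let mut sent = 0u64;
        while !flag.load(Ordering::SeqCst) {
            if tx.send(sent).is_err() {
                break; // receiver is gone
            }
            sent += 1;
        }
        sent
    });
    (BridgedStream { rx, cancel }, producer)
}
```

After `drop(stream)`, joining the producer returns promptly instead of leaving
the work running in the background.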
Files:
- `src/datanode/src/query_stream.rs`
- `src/datanode/src/region_server.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: simplify query stream bridge setup
Create the inner read stream before spawning the datanode query runtime
producer so setup does not use an extra task and initialization channel.
Files:
- `src/datanode/src/region_server.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/runtime-priority:
### Update Datanode Runtime Options and Region Server Logic
- **`global.rs`**: Changed the default `datanode_ingest_rt_size` from `1` to the number of available CPUs.
- **`region_server.rs`**: Simplified the collection of `put_requests` and optimized the `put_regions_batch` call for better efficiency.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/runtime-priority:
### Remove Redundant Checks and Simplify Code
- **`global.rs`**: Removed the assertion check for already initialized global runtimes to streamline the initialization process.
- **`region_server.rs`**: Simplified the extraction of `Put` requests by removing unnecessary cloning and restructuring the iterator logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: remove redundant spawn_datanode_query in RegionServer::handle_read
The outer `spawn_datanode_query` wrapped `handle_read_inner` on the
same runtime, creating a nested spawn that consumed query runtime
threads unnecessarily under concurrent read load. The gRPC handler
already provides runtime isolation, so the inner call is sufficient.
- `src/datanode/src/region_server.rs` — inline `handle_read_inner`
directly instead of spawning onto the datanode query runtime
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: resolve test mismatch and redundant spawn in handle_remote_read
- `src/common/runtime/src/global.rs` — update test assertion to match
default `datanode_ingest_rt_size` of `cpus` instead of `1`
- `src/datanode/src/region_server.rs` — inline `handle_remote_read_inner`
directly instead of spawning onto the datanode query runtime
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: rename datanode runtimes
Summary:
- Rename datanode runtime APIs from `datanode_query` and `datanode_ingest` to `query` and `ingest`.
- Rename runtime config keys from `datanode_query_rt_size` and `datanode_ingest_rt_size` to `query_rt_size` and `ingest_rt_size`.
- Update config docs, example config, and config-loading coverage.
Files:
- `src/common/runtime/src/global.rs`
- `src/common/runtime/src/lib.rs`
- `src/cmd/tests/load_config_test.rs`
- `src/datanode/src/region_server.rs`
- `src/mito2/src/read/pruner.rs`
- `src/mito2/src/read/range_cache.rs`
- `src/mito2/src/read/scan_region.rs`
- `src/mito2/src/read/series_scan.rs`
- `config/datanode.example.toml`
- `config/config.md`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: consolidate runtime options
Summary:
- Embed datanode runtime sizes in shared `RuntimeOptions` and remove the extra `GreptimeOptions` runtime type parameter.
- Use the unified `RuntimeOptions` for datanode global and datanode-specific runtime initialization.
- Update datanode runtime config coverage and ingest runtime default documentation.
Files:
- `src/common/runtime/src/global.rs`
- `src/common/runtime/src/lib.rs`
- `src/cmd/src/options.rs`
- `src/cmd/src/datanode.rs`
- `src/cmd/src/datanode/builder.rs`
- `src/cmd/tests/load_config_test.rs`
- `config/datanode.example.toml`
- `config/config.md`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: guard against double initialization of datanode runtimes
Add an assertion in `init_datanode_runtimes` to panic when global runtimes
are already initialized, preventing silent overwrites.
- `src/common/runtime/src/global.rs` — assert guard in `init_datanode_runtimes`
and test `test_set_datanode_runtimes_panics_after_global_runtimes_initialized`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: table semantic layer information_schema view (Phase 3)
Add `information_schema.table_semantics`, a queryable view over the table
semantic layer. One row per table that carries at least one
`greptime.semantic.*` option: the signal-agnostic keys
(signal_type/source/pipeline/metadata_quality) are promoted to columns and
the remaining signal-specific keys are folded into a `semantic_options`
JSON string. Tables with no semantic key are excluded.
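
A rough sketch of how one table's options could map to a view row. The full key
spelling (`greptime.semantic.` plus the short name), the `SemanticsRow` type,
the hand-rolled JSON, and returning no `semantic_options` when nothing is left
to fold are all assumptions for illustration; real code would use a JSON
library.

```rust
use std::collections::BTreeMap;

const PREFIX: &str = "greptime.semantic.";
/// Keys promoted to their own columns (short names from the text above).
const PROMOTED: [&str; 4] = ["signal_type", "source", "pipeline", "metadata_quality"];

/// One view row: four promoted columns plus the JSON-encoded remainder.
#[derive(Debug, PartialEq)]
struct SemanticsRow {
    promoted: [Option<String>; 4],
    semantic_options: Option<String>,
}

/// Escapes only backslash and double quote; enough for this sketch.
fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Returns `None` for tables that carry no `greptime.semantic.*` option.
fn to_row(options: &BTreeMap<String, String>) -> Option<SemanticsRow> {
    let mut promoted: [Option<String>; 4] = Default::default();
    let mut rest = Vec::new();
    let mut any = false;
    for (key, value) in options {
        let Some(short) = key.strip_prefix(PREFIX) else { continue };
        any = true;
        match PROMOTED.iter().position(|p| *p == short) {
            Some(i) => promoted[i] = Some(value.clone()),
            None => rest.push(format!(
                "\"{}\":\"{}\"",
                json_escape(short),
                json_escape(value)
            )),
        }
    }
    if !any {
        return None; // table is excluded from the view
    }
    let semantic_options = (!rest.is_empty()).then(|| format!("{{{}}}", rest.join(",")));
    Some(SemanticsRow { promoted, semantic_options })
}
```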
Stacked on Phase 2.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: address PR review on table_semantics
- fold JSON serialization failure into None instead of unwrap/panic
- drop per-row Vec allocation in predicate eval; use a fixed array
- align RFC view name with the shipped `table_semantics`
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: update results
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: table semantic layer per-table enrichment (Phase 2)
Phase 2 of the table semantic layer, plus a vocabulary trim so the layer only
records what a machine consumer cannot cheaply recover on its own.
Per-table metric enrichment (OTLP), via an internal per-table channel:
- A `SemanticIndex` accumulator records, per emitted table, the declared metric
keys: type / unit / temporality / metadata_quality=declared / original_name.
Conflicting single-valued keys collapse to `mixed`/`unknown`.
- Recording happens at the `encode_metrics` level where the base name, metric
type, and proto fields are all in scope, so histogram/summary fan-out gets the
correct per-subtable type (`_bucket`=histogram, `_sum`/`_count`=counter)
without threading state through every encoder.
- The index is serialized onto the `greptime.internal.semantic.per_table_index`
context extension; `apply_per_table_semantic_options` folds each table's keys
into its options at auto-create time.
- `trace.conventions` is refined from the request's resource/scope `schema_url`s
(concrete when uniform, else `mixed`/`unknown`).
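
The collapse rule and the histogram fan-out from the list above, sketched under
assumptions: this `SemanticIndex` is a simplified stand-in for the real
accumulator, `histogram_tables` is a hypothetical helper, and the `unknown`
case is not modeled because the text does not say when it applies.

```rust
use std::collections::BTreeMap;

/// Sentinel for a key that saw more than one distinct value.
const MIXED: &str = "mixed";

/// Simplified accumulator: table name -> (semantic key -> value).
#[derive(Default)]
struct SemanticIndex {
    tables: BTreeMap<String, BTreeMap<String, String>>,
}

impl SemanticIndex {
    /// Records one declared value; a differing later value collapses the key
    /// to `mixed`, and it stays `mixed` from then on.
    fn record(&mut self, table: &str, key: &str, value: &str) {
        let keys = self.tables.entry(table.to_string()).or_default();
        let slot = keys
            .entry(key.to_string())
            .or_insert_with(|| value.to_string());
        if slot.as_str() != value {
            *slot = MIXED.to_string();
        }
    }

    fn get(&self, table: &str, key: &str) -> Option<&str> {
        self.tables.get(table)?.get(key).map(String::as_str)
    }
}

/// Per-subtable types for a histogram's fan-out, as listed in the text:
/// `_bucket` is a histogram, `_sum` and `_count` are counters.
fn histogram_tables(base: &str) -> [(String, &'static str); 3] {
    [
        (format!("{base}_bucket"), "histogram"),
        (format!("{base}_sum"), "counter"),
        (format!("{base}_count"), "counter"),
    ]
}
```

Recording at the point where the base name and metric type are both known lets
each sub-table receive its own type without passing state through the encoders.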
Vocabulary trimmed to only meaningful keys.
Kept: signal_type, source, pipeline, trace.conventions,
metric.{type,unit,temporality,metadata_quality,original_name}.
Dropped:
- metric.monotonic (a function of type)
- trace.has_events/has_links (constant + derivable from columns)
- log.severity_scheme/body_format (constant / derivable, and body_format cost an
  O(rows) scan)
- resource/scope lineage (restates columns / collector-config concern)
- source_version (no cheap non-constant value today)
Prometheus carries type/unit in the metric name by convention, so it gets
identity only, with no inferred enrichment.
Identity (signal_type + source) extended to the remaining ingest protocols so
the discovery view is complete: InfluxDB and OpenTSDB (metric), Loki and
Elasticsearch (log). These protocols carry no type/unit metadata, so identity is
all that applies.
Tests: unit coverage for the accumulator, per-metric-type fan-out, and trace
conventions; integration goldens updated for the OTLP metric/trace SHOW CREATE
output and the new Loki identity.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: validate the option value
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>