* feat: table semantic layer information_schema view (Phase 3)
Add `information_schema.table_semantics`, a queryable view over the table
semantic layer. One row per table that carries at least one
`greptime.semantic.*` option: the signal-agnostic keys
(signal_type/source/pipeline/metadata_quality) are promoted to columns and
the remaining signal-specific keys are folded into a `semantic_options`
JSON string. Tables with no semantic key are excluded.
Stacked on Phase 2.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: address PR review on table_semantics
- fold JSON serialization failure into None instead of unwrap/panic
- drop per-row Vec allocation in predicate eval; use a fixed array
- align RFC view name with the shipped `table_semantics`
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: update results
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: table semantic layer per-table enrichment (Phase 2)
Phase 2 of the table semantic layer, plus a vocabulary trim so the layer only
records what a machine consumer cannot cheaply recover on its own.
Per-table metric enrichment (OTLP), via an internal per-table channel:
- A `SemanticIndex` accumulator records, per emitted table, the declared metric
keys: type / unit / temporality / metadata_quality=declared / original_name.
Conflicting single-valued keys collapse to `mixed`/`unknown`.
- Recording happens at the `encode_metrics` level where the base name, metric
type, and proto fields are all in scope, so histogram/summary fan-out gets the
correct per-subtable type (`_bucket`=histogram, `_sum`/`_count`=counter)
without threading state through every encoder.
- The index is serialized onto the `greptime.internal.semantic.per_table_index`
context extension; `apply_per_table_semantic_options` folds each table's keys
into its options at auto-create time.
- `trace.conventions` is refined from the request's resource/scope `schema_url`s
(concrete when uniform, else `mixed`/`unknown`).
Vocabulary trimmed to only meaningful keys. Kept: signal_type, source, pipeline,
trace.conventions, metric.{type,unit,temporality,metadata_quality,original_name}.
Dropped: metric.monotonic (a function of type), trace.has_events/has_links
(constant + derivable from columns), log.severity_scheme/body_format (constant /
derivable, and body_format cost an O(rows) scan), resource/scope lineage
(restates columns / collector-config concern), source_version (no cheap
non-constant value today). Prometheus carries type/unit in the metric name by
convention, so it gets identity only — no inferred enrichment.
Identity (signal_type + source) extended to the remaining ingest protocols so
the discovery view is complete: InfluxDB and OpenTSDB (metric), Loki and
Elasticsearch (log). These protocols carry no type/unit metadata, so identity is
all that applies.
Tests: unit coverage for the accumulator, per-metric-type fan-out, and trace
conventions; integration goldens updated for the OTLP metric/trace SHOW CREATE
output and the new Loki identity.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: validate the option value
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: give instance to start plugin functions
* refactor: remove start_standalone_plugins because it's never used
* chore: make sure code is nicely formatted
* feat: table semantic layer identity (Phase 1)
Attach a thin layer of semantic metadata to ingested tables via the existing
`table_options` slot, so machine consumers (LLM agents, alert/dashboard builders,
MCP servers, ETL) can align a table with the observability concept it stands for
without guessing from column names. See docs/rfcs/2026-05-28-table-semantic-layer.md.
Phase 1 (identity) only:
- New `table::requests::semantic` module: the `greptime.semantic.*` vocabulary
(signal/source/source_version/pipeline + trace/metric/log/resource-scope keys,
defined now, populated by later phases), value constants, the internal
`greptime.internal.semantic.per_table_index` transport key (reserved for Phase 2,
deliberately outside the public namespace), and `is_semantic_option_key`.
- `validate_table_option` accepts the `greptime.semantic.*` prefix, so the keys are
valid both on the auto-create path and on explicit `CREATE TABLE ... WITH (...)`.
- `fill_table_options_for_create` copies every semantic ctx extension into the new
table's options (prefix passthrough alongside the fixed allowlist).
- Frontend stamps identity on the context at each ingest entry: OTLP metrics
(metric/opentelemetry), traces (+pipeline, has_events/has_links/conventions for
the v1 model), logs (log/opentelemetry), and Prometheus remote write
(metric/prometheus, metadata_quality=inferred). OTLP metric metadata_quality is
left for Phase 2 (declared).
- Trace identity is stamped only on the main span table; the derived
`_services` / `_operations` lookup tables keep the unstamped context and carry no
semantic identity (cross-table relationships are out of scope).
Semantic options appear in SHOW CREATE TABLE (like table_data_model /
otlp_metric_compat) and in information_schema, so an LLM inspecting a table sees its
semantics directly.
Tests: unit (validation prefix + internal-key rejection, ctx passthrough) and
integration assertions that the common keys land for OTLP metrics (metric-engine
logical table), traces, logs, and Prometheus remote write; SHOW CREATE goldens
updated.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: prom batcher not cover and white list for semantic keys/values
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: typo
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: use last compaction scheduled time instead of finished time
Signed-off-by: luofucong <luofc@foxmail.com>
* add some logs for debugging
Signed-off-by: luofucong <luofc@foxmail.com>
---------
Signed-off-by: luofucong <luofc@foxmail.com>
* feat: global switch for creating table automatically
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: make auto_create_table as comment by default
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: respect gloabl switch for metric engine
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: support defer_on_missing_source for pending flow creation
Add `defer_on_missing_source` flow option that allows creating flows
even when source tables do not yet exist. The flow enters a pending
state and is automatically activated when source tables become available.
Key changes:
- New `FlowStatus::PendingSources` and fields in `FlowInfoValue` for
unresolved source table names and last activation error
- `defer_on_missing_source` create-time-only option: stripped from
runtime/flownode `CreateRequest` but preserved in metadata for
SQL round-trip (`SHOW CREATE FLOW`, `information_schema.flows`)
- `CreateFlowProcedure` creates pending metadata when sources are
missing and `defer_on_missing_source=true`; falls back to
`FlowType::Batching` for missing-source flows
- `PendingFlowReconcileManager` in meta-srv periodically checks
pending flows and activates them when source tables resolve
- `ActivatePendingFlowProcedure` handles activation: allocates peers,
creates flows on flownodes, updates metadata, invalidates cache
- `OR REPLACE` properly handles pending<->active transitions,
including peer allocation and flownode flow teardown
- `FlowMetadataAllocator::alloc_peers` for peer allocation at
activation time
- Validated flow options: only `defer_on_missing_source` allowed;
unknown options rejected
- Known issue: standalone mode does not support flownodes, so
pending flow flush/sink behavior covered only in distributed
sqlness; operator and meta unit tests cover activation logic
Tests:
- operator `determine_flow_type_for_source_state` (3 passed)
- common-meta `create_flow` (19 passed) including replacement
- common-meta `activate_flow` (4 passed)
- meta-srv `flow` (11 passed)
- sqlness: `flow_pending` covers create/replace/round-trip
Signed-off-by: discord9 <discord9@163.com>
* chore: simplify pending flow PR scope
Reduce PR #8124 to the metadata-only MVP after complexity review.
Changes:
- Remove automatic activation procedure and meta-srv reconcile wiring
- Remove activation tests and activation-only metadata fields
- Reject cross-state pending<->active `OR REPLACE` transitions for now
- Keep pending metadata creation and SQL round-trip behavior
- Allow `DROP FLOW` for pending flows without routes
- Reduce flow_pending sqlness to metadata/round-trip/drop coverage only
Deferred follow-ups are documented locally in `.tmp/tasks/pending-defer-semantics/deferred-followups.md` and intentionally not committed.
Tests:
- `cargo test -p operator determine_flow_type_for_source_state`
- `cargo test -p common-meta create_flow`
- `cargo test -p common-meta drop_flow`
- `cargo sqlness bare --test-filter flow_pending --bins-dir /mnt/nvme_rust/rust-targets/pending_defer/debug`
Signed-off-by: discord9 <discord9@163.com>
* test: cover pending flow metadata edge cases
Signed-off-by: discord9 <discord9@163.com>
* test: fix pending flow metadata test lint
Signed-off-by: discord9 <discord9@163.com>
* docs: document pending flow metadata fields
Signed-off-by: discord9 <discord9@163.com>
* chore: more sleep when test
Signed-off-by: discord9 <discord9@163.com>
---------
Signed-off-by: discord9 <discord9@163.com>