Commit Graph

375 Commits

Author SHA1 Message Date
dennis zhuang
31c2c1f6db feat: table semantic layer per-table enrichment (Phase 2) (#8218)
* feat: table semantic layer per-table enrichment (Phase 2)

Phase 2 of the table semantic layer, plus a vocabulary trim so the layer only
records what a machine consumer cannot cheaply recover on its own.

Per-table metric enrichment (OTLP), via an internal per-table channel:
- A `SemanticIndex` accumulator records, per emitted table, the declared metric
  keys: type / unit / temporality / metadata_quality=declared / original_name.
  Conflicting single-valued keys collapse to `mixed`/`unknown`.
- Recording happens at the `encode_metrics` level where the base name, metric
  type, and proto fields are all in scope, so histogram/summary fan-out gets the
  correct per-subtable type (`_bucket`=histogram, `_sum`/`_count`=counter)
  without threading state through every encoder.
- The index is serialized onto the `greptime.internal.semantic.per_table_index`
  context extension; `apply_per_table_semantic_options` folds each table's keys
  into its options at auto-create time.
- `trace.conventions` is refined from the request's resource/scope `schema_url`s
  (concrete when uniform, else `mixed`/`unknown`).

Vocabulary trimmed to only meaningful keys. Kept: signal_type, source, pipeline,
trace.conventions, metric.{type,unit,temporality,metadata_quality,original_name}.
Dropped: metric.monotonic (a function of type), trace.has_events/has_links
(constant + derivable from columns), log.severity_scheme/body_format (constant /
derivable, and body_format cost an O(rows) scan), resource/scope lineage
(restates columns / collector-config concern), source_version (no cheap
non-constant value today). Prometheus carries type/unit in the metric name by
convention, so it gets identity only — no inferred enrichment.

Identity (signal_type + source) extended to the remaining ingest protocols so
the discovery view is complete: InfluxDB and OpenTSDB (metric), Loki and
Elasticsearch (log). These protocols carry no type/unit metadata, so identity is
all that applies.

Tests: unit coverage for the accumulator, per-metric-type fan-out, and trace
conventions; integration goldens updated for the OTLP metric/trace SHOW CREATE
output and the new Loki identity.

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: validate the option value

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-06-04 12:52:00 +00:00
QuakeWang
6ed67167bb feat: support headerless CSV copy from (#8233)
* feat: support headerless CSV copy from

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* fix: update csv copy sqlness result

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* test: cover headerless CSV copy from

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* test: cover headerless CSV column count mismatch

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

---------

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
2026-06-04 12:04:15 +00:00
Lei, HUANG
da9d314fa4 fix: align bulk insert schema handling (#8222)
* fix: align bulk insert schema handling

Carry `aligned_schema_version` through bulk insert requests so datanode can recheck stale schemas only when needed. Use the physical data region schema version for metric logical bulk inserts, while external bulk insert callers leave alignment unknown.

Related files:
- `Cargo.toml`
- `Cargo.lock`
- `src/store-api/src/region_request.rs`
- `src/metric-engine/src/engine/bulk_insert.rs`
- `src/mito2/src/request.rs`
- `src/mito2/src/worker/handle_bulk_insert.rs`
- `src/mito2/src/worker/handle_write.rs`
- `src/operator/src/bulk_insert.rs`
- `src/servers/src/pending_rows_batcher.rs`
- `src/mito2/src/engine/edit_region_test.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* test: cover bulk insert missing columns

Add Flight bulk insert coverage for batches that omit nullable columns and verify the database fills the missing values with nulls.

Related files:
- `tests-integration/src/grpc/flight.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: update proto to main branch commit

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: align physical metric bulk insert schema

Pass the physical schema version through metric bulk insert requests so
physical and logical region writes share the same schema alignment behavior.

Files:
- \`src/metric-engine/src/engine/bulk_insert.rs\` — updates \`bulk_insert_physical_region\` and reuses \`physical_schema_version\` for bulk insert schema alignment.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: add some doc

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-06-04 02:55:09 +00:00
QuakeWang
ca07a53deb feat: support CSV copy skip bad records (#8198)
* feat: support CSV copy headers and skip bad records

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* refactor: reuse CSV skippable error helper

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* fix: address CSV skip bad records review

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

* docs: clarify CSV skip bad records performance tradeoff

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>

---------

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
2026-06-03 12:11:02 +00:00
dennis zhuang
f25ab67af1 feat: table semantic layer identity (Phase 1) (#8210)
* feat: table semantic layer identity (Phase 1)

Attach a thin layer of semantic metadata to ingested tables via the existing
`table_options` slot, so machine consumers (LLM agents, alert/dashboard builders,
MCP servers, ETL) can align a table with the observability concept it stands for
without guessing from column names. See docs/rfcs/2026-05-28-table-semantic-layer.md.

Phase 1 (identity) only:
- New `table::requests::semantic` module: the `greptime.semantic.*` vocabulary
  (signal/source/source_version/pipeline + trace/metric/log/resource-scope keys,
  defined now, populated by later phases), value constants, the internal
  `greptime.internal.semantic.per_table_index` transport key (reserved for Phase 2,
  deliberately outside the public namespace), and `is_semantic_option_key`.
- `validate_table_option` accepts the `greptime.semantic.*` prefix, so the keys are
  valid both on the auto-create path and on explicit `CREATE TABLE ... WITH (...)`.
- `fill_table_options_for_create` copies every semantic ctx extension into the new
  table's options (prefix passthrough alongside the fixed allowlist).
- Frontend stamps identity on the context at each ingest entry: OTLP metrics
  (metric/opentelemetry), traces (+pipeline, has_events/has_links/conventions for
  the v1 model), logs (log/opentelemetry), and Prometheus remote write
  (metric/prometheus, metadata_quality=inferred). OTLP metric metadata_quality is
  left for Phase 2 (declared).
- Trace identity is stamped only on the main span table; the derived
  `_services` / `_operations` lookup tables keep the unstamped context and carry no
  semantic identity (cross-table relationships are out of scope).

Semantic options appear in SHOW CREATE TABLE (like table_data_model /
otlp_metric_compat) and in information_schema, so an LLM inspecting a table sees its
semantics directly.

Tests: unit (validation prefix + internal-key rejection, ctx passthrough) and
integration assertions that the common keys land for OTLP metrics (metric-engine
logical table), traces, logs, and Prometheus remote write; SHOW CREATE goldens
updated.

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: prom batcher not cover and white list for semantic keys/values

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: typo

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-06-02 01:56:29 +00:00
discord9
28fd796f4e fix(flow): harden incremental read correctness (#8196)
* fix(flow): harden incremental read correctness

Signed-off-by: discord9 <discord9@163.com>

* fix(flow): propagate dirty window options

Signed-off-by: discord9 <discord9@163.com>

* test: more

Signed-off-by: discord9 <discord9@163.com>

* chore: test config api

Signed-off-by: discord9 <discord9@163.com>

* refactor: split gen

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* fix: allowlist key

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-06-01 02:48:00 +00:00
dennis zhuang
ed9312f8e3 feat: global switch for creating tables automatically (#8203)
* feat: global switch for creating table automatically

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: make auto_create_table as comment by default

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: respect gloabl switch for metric engine

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2026-05-31 23:51:14 +00:00
discord9
ba15a9c056 feat: support pending flow metadata with defer_on_missing_source (#8124)
* feat: support defer_on_missing_source for pending flow creation

Add `defer_on_missing_source` flow option that allows creating flows
even when source tables do not yet exist. The flow enters a pending
state and is automatically activated when source tables become available.

Key changes:
- New `FlowStatus::PendingSources` and fields in `FlowInfoValue` for
  unresolved source table names and last activation error
- `defer_on_missing_source` create-time-only option: stripped from
  runtime/flownode `CreateRequest` but preserved in metadata for
  SQL round-trip (`SHOW CREATE FLOW`, `information_schema.flows`)
- `CreateFlowProcedure` creates pending metadata when sources are
  missing and `defer_on_missing_source=true`; falls back to
  `FlowType::Batching` for missing-source flows
- `PendingFlowReconcileManager` in meta-srv periodically checks
  pending flows and activates them when source tables resolve
- `ActivatePendingFlowProcedure` handles activation: allocates peers,
  creates flows on flownodes, updates metadata, invalidates cache
- `OR REPLACE` properly handles pending<->active transitions,
  including peer allocation and flownode flow teardown
- `FlowMetadataAllocator::alloc_peers` for peer allocation at
  activation time
- Validated flow options: only `defer_on_missing_source` allowed;
  unknown options rejected
- Known issue: standalone mode does not support flownodes, so
  pending flow flush/sink behavior covered only in distributed
  sqlness; operator and meta unit tests cover activation logic

Tests:
- operator `determine_flow_type_for_source_state` (3 passed)
- common-meta `create_flow` (19 passed) including replacement
- common-meta `activate_flow` (4 passed)
- meta-srv `flow` (11 passed)
- sqlness: `flow_pending` covers create/replace/round-trip

Signed-off-by: discord9 <discord9@163.com>

* chore: simplify pending flow PR scope

Reduce PR #8124 to the metadata-only MVP after complexity review.

Changes:
- Remove automatic activation procedure and meta-srv reconcile wiring
- Remove activation tests and activation-only metadata fields
- Reject cross-state pending<->active `OR REPLACE` transitions for now
- Keep pending metadata creation and SQL round-trip behavior
- Allow `DROP FLOW` for pending flows without routes
- Reduce flow_pending sqlness to metadata/round-trip/drop coverage only

Deferred follow-ups are documented locally in `.tmp/tasks/pending-defer-semantics/deferred-followups.md` and intentionally not committed.

Tests:
- `cargo test -p operator determine_flow_type_for_source_state`
- `cargo test -p common-meta create_flow`
- `cargo test -p common-meta drop_flow`
- `cargo sqlness bare --test-filter flow_pending --bins-dir /mnt/nvme_rust/rust-targets/pending_defer/debug`

Signed-off-by: discord9 <discord9@163.com>

* test: cover pending flow metadata edge cases

Signed-off-by: discord9 <discord9@163.com>

* test: fix pending flow metadata test lint

Signed-off-by: discord9 <discord9@163.com>

* docs: document pending flow metadata fields

Signed-off-by: discord9 <discord9@163.com>

* chore: more sleep when test

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-05-29 06:59:21 +00:00
Weny Xu
f513b77ccc feat: support alter table partition syntax (#8177)
* feat(sql): support alter table partition syntax

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: support repartition source proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update greptime-proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-05-26 15:06:14 +00:00
Lei, HUANG
f8df016623 feat: add InfluxDB default merge mode config (#8134)
* feat/influxdb-default-merge-mode: add InfluxDB merge mode config

- `influxdb` config: add `default_merge_mode` parsing and defaults in `src/frontend/src/service_config/influxdb.rs` and `src/frontend/src/service_config.rs`
- auto-create behavior: apply configured `merge_mode` for InfluxDB ingestion in `src/frontend/src/instance.rs`, `src/frontend/src/instance/builder.rs`, `src/frontend/src/instance/influxdb.rs`, and `src/operator/src/insert.rs`
- config docs: document `influxdb.default_merge_mode` in `config/frontend.example.toml`, `config/standalone.example.toml`, and `config/config.md`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/influxdb-default-merge-mode: derive merge mode default

- `influxdb` config: derive `Default` for `InfluxdbMergeMode` in `src/frontend/src/service_config/influxdb.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/influxdb-default-merge-mode: update config API snapshot

- `config API`: include `default_merge_mode` in `tests-integration/tests/http.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/influxdb-default-merge-mode: avoid default context clone

- `InfluxDB merge mode`: avoid cloning `QueryContext` for default `last_non_null` in `src/frontend/src/instance/influxdb.rs`
- `InfluxDB merge mode`: cover default, configured, and explicit `MERGE_MODE_KEY` paths in `src/frontend/src/instance/influxdb.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-05-19 16:54:36 +00:00
Lei, HUANG
2fdbe6c8c3 feat: expose node info for placement selectors (#8095)
* feat: expose node info for placement selectors

Return `NodeInfo` from `PeerDiscovery` methods and keep OSS selectors mapping back to `Peer`.

Carry `__greptime_origin_frontend.addr` from frontend create-table DDLs into selector `extensions`, and thread `PeerAllocContext` through table-route allocation.

Persist datanode `NodeInfo` when heartbeat stats are absent so collected env vars remain available after restart.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: skip datanode node info without stats

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: avoid unnecessary workload clones

Skip workload cloning for inactive nodes and for active node-info lookups without workload filters.

Files: `src/meta-srv/src/discovery/utils.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: require frontend origin address

Require `StatementExecutor` to carry a concrete frontend origin address and always attach it to meta DDL query contexts.

Files: `src/operator/src/statement.rs`, `src/operator/src/statement/ddl.rs`, `src/operator/src/utils.rs`, `src/frontend/src/instance/builder.rs`, `src/frontend/src/heartbeat.rs`, `src/flow/src/server.rs`, `src/cmd/src/standalone.rs`, `src/cmd/src/flownode.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: reuse resolved frontend address

Resolve the frontend peer address once in the frontend builder, store it on the instance, and reuse it for heartbeat and flow invoker origins.

Files: `src/frontend/src/instance/builder.rs`, `src/frontend/src/instance.rs`, `src/frontend/src/heartbeat.rs`, `src/cmd/src/frontend.rs`, `src/cmd/src/standalone.rs`, `src/frontend/src/frontend.rs`, `src/frontend/src/heartbeat/tests.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: preserve datanode lease liveness

Filter active datanode node infos through lease timestamps and workloads while preserving node info fields such as reported env vars.

Files: `src/meta-srv/src/discovery/utils.rs`, `src/meta-srv/src/discovery/lease.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* Remove stale datanode lease helper

- `discovery`: remove the obsolete `alive_datanodes` helper and related tests in `src/meta-srv/src/discovery/utils.rs` and `src/meta-srv/src/discovery/lease.rs`
- `integration`: update cluster and standalone setup paths in `tests-integration/src/cluster.rs` and `tests-integration/src/standalone.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/env-based-region-selector-oss: simplify lease discovery

- `lease-discovery`: simplify logic and remove unused utilities in `src/meta-srv/src/discovery/lease.rs` and `src/meta-srv/src/discovery/utils.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-05-15 07:35:18 +00:00
Lei, HUANG
796aae3d9f feat(operator): allow last_row merge mode with append mode (#8065)
* feat(operator): allow last_row merge_mode when append_mode is enabled

- Update RegionOptions::validate to allow last_row merge_mode with append_mode.
- Update fill_table_options_for_create to automatically set merge_mode to last_row when append_mode is enabled for LastNonNull table type.
- Add unit tests in mito2 and operator to verify options validation and table creation.
- Add integration test for InfluxDB write with append mode hint.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix(operator): simplify append mode options

Group `LastNonNull` auto-create options in a single append-mode branch.

Files:

- `src/operator/src/insert.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: sqlness

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-05-07 07:21:37 +00:00
discord9
d2d256909f feat(flow): parse defer on miss src table (#7980)
* feat: parse create flow with

Signed-off-by: discord9 <discord9@163.com>

* feat: validate after parse

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* chore: sqlness

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-04-27 03:02:13 +00:00
shuiyisong
0effc30778 chore: update the opendal to 0.56 rc2 (#8003)
* chore: update opendal version

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update opendal version

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fix test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: grpc init

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: dep versions

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: remove aws-lc-rs in reqwest

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: rebase main and fix compile

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: remove unused deps

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* Revert "fix: remove aws-lc-rs in reqwest"

This reverts commit 90bfafca06f877befb36f3e54bb72fcfc4c56778.

* chore: remove aws-lc-sys from blacklist

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fix sqlness

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add tls deps

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: idemptent install in rds

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: use aws-lc-sys as possible

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: lint

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: address comments

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issue

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: sync opendal compat adapter with upstream

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: address compat clippy warnings

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2026-04-26 09:59:48 +00:00
LFC
c7427fa6d2 feat: json2 insert (#7981)
* feat: json2 insert

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* use main greptime-proto

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2026-04-20 09:45:32 +00:00
yxrxy
6c924efc2f fix: remove redundant error messages in admin functions (#7953)
Closes #7938

Signed-off-by: yxrxy <yxrxytrigger@gmail.com>
2026-04-17 08:21:22 +00:00
fys
62013217c7 fix: cargo check -p common-meta (#7964)
fix: moka feature
2026-04-14 08:27:22 +00:00
Ning Sun
32a2990802 feat: allow customizing trace table partitions (#7944)
* feat: allow customizing trace table partitions

* feat: add hint

* feat: return error on invalid partition number

* feat: add full failure/partial success/full success

* chore: format

* fix: address review suggestion
2026-04-13 14:10:55 +00:00
Lei, HUANG
2b4e12c358 feat: auto-align Prometheus schemas in pending rows batching (#7877)
* feat/auto-schema-align:
 - **Error Handling Improvements**:
   - Removed `CatalogSnafu` context from various `.await` calls in `dashboard.rs`, `influxdb.rs`, `jaeger.rs`, `prometheus.rs`, `event.rs`, and `pipeline.rs` to streamline error handling.

 - **Prometheus Store Enhancements**:
   - Added support for auto-creating tables and adding missing Prometheus tag columns in `prom_store.rs` and `pending_rows_batcher.rs`.
   - Introduced `PendingRowsSchemaAlterer` trait for schema alterations in `pending_rows_batcher.rs`.

 - **Test Additions**:
   - Added tests for new Prometheus store functionalities in `prom_store.rs` and `pending_rows_batcher.rs`.

 - **Error Message Improvements**:
   - Enhanced error messages for catalog access in `error.rs`.

 - **Server Configuration Updates**:
   - Updated server configuration to include Prometheus store options in `server.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* reformat

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Add DataTypes Error Handling and Column Renaming Logic

 - **`error.rs`**: Introduced a new `DataTypes` error variant to handle errors from `datatypes::error::Error`. Updated `ErrorExt` implementation to include `DataTypes`.
 - **`pending_rows_batcher.rs`**: Added functions `find_prom_special_column_names` and `rename_prom_special_columns_for_existing_schema` to handle renaming of special Prometheus columns. Updated `build_prom_create_table_schema` to simplify error handling with
 `ConcreteDataType`.
 - **Tests**: Added a test case `test_rename_prom_special_columns_for_existing_schema` to verify the renaming logic for Prometheus special columns.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - Refactored `PendingRowsBatcher` to accommodate Prometheus record batches:
   - Introduced `accommodate_record_batch_for_target_schema` to normalize incoming record batches against existing table schemas.
   - Removed `collect_missing_prom_tag_columns` and `rename_prom_special_columns_for_existing_schema` in favor of the new function.
   - Added `unzip_logical_region_schema` to extract schema components.
 - Updated tests in `pending_rows_batcher.rs`:
   - Added tests for `accommodate_record_batch_for_target_schema` to verify handling of missing tag columns and renaming of special columns.
   - Ensured error handling for missing timestamp and field columns in target schema.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Commit Summary

 - **Enhancement in Table Creation Logic**: Updated `prom_store.rs` to modify the handling of `table_options` during table creation. Specifically, `table_options` are now extended differently based on the `AutoCreateTableType`. For `Physical` tables, enforced
 `sst_format=flat` to optimize pending-rows writes by leveraging bulk memtables.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 Enhance Performance Monitoring in `pending_rows_batcher.rs`

 - Added performance monitoring timers to various stages of the `PendingRowsBatcher` process, including schema cache checks, table resolution, schema creation, and record batch alignment.
 - Improved schema handling by adding timers around schema alteration and missing column addition processes.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Enhance Concurrent Write Handling**: Introduced `FlushRegionWrite` and `FlushWriteResult` structs to manage region writes and their results. Added `flush_region_writes_concurrently` function to handle concurrent flushing of region writes based on
 `should_dispatch_concurrently` logic in `pending_rows_batcher.rs`.
 - **Testing Enhancements**: Added tests for concurrent dispatching of region writes and the logic for determining concurrent dispatch in `pending_rows_batcher.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Add Histogram for Flush Stage Elapsed Time

 - **`metrics.rs`**: Introduced a new `HistogramVec` named `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to track the elapsed time of pending rows batch flush stages.
 - **`pending_rows_batcher.rs`**: Replaced instances of `PENDING_ROWS_BATCH_INGEST_STAGE_ELAPSED` with `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to measure the elapsed time for various flush stages, including `flush_write_region`, `flush_concat_table_batches`,
 `flush_resolve_table`, `flush_fetch_partition_rule`, `flush_split_record_batch`, `flush_filter_record_batch`, `flush_resolve_region_leader`, and `flush_encode_ipc`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* Add design doc for physical table batching in PendingRowsBatcher

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* Add implementation plan for physical table batching in PendingRowsBatcher

* feat/auto-schema-align:
 ### Commit Message

 **Enhance Metric Engine with Physical Batch Processing**

 - **Add `metric-engine` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include `metric-engine` as a workspace dependency.

 - **Expose Batch Modifier Functions**: Changed visibility of `TagColumnInfo`, `compute_tsid_array`, and `modify_batch_sparse` in `batch_modifier.rs` to public, and made `batch_modifier` a public module in `lib.rs`.

 - **Implement Physical Batch Processing**:
   - Added functions `bulk_insert_physical_region` and `bulk_insert_logical_region` in `bulk_insert.rs` to handle physical and logical batch insertions.
   - Updated `pending_rows_batcher.rs` to attempt physical batch processing before falling back to logical processing, including new functions `flush_batch_physical` and `flush_batch_per_logical_table`.

 - **Enhance Testing**:
   - Added tests for physical region passthrough and empty batch handling in `bulk_insert.rs`.
   - Introduced `with_mito_config` in `test_util.rs` for customized test environments.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Enhance Batch Processing for Table Creation and Alteration

 - **`prom_store.rs`**:
   - Added `create_tables_if_missing_batch` and
 `add_missing_prom_tag_columns_batch` methods to handle batch creation of tables
 and batch alteration to add missing tag columns.
   - Implemented logic to determine missing tables and columns, and perform batch
 operations accordingly.

 - **`pending_rows_batcher.rs`**:
   - Updated `PendingRowsBatcher` to utilize batch methods for creating tables an
 adding missing columns.
   - Enhanced logic to resolve table schemas and accommodate record batches after
 batch operations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* perf: concurrent catalog lookups and eliminate redundant concat_batches on ingest path

Replace sequential catalog_manager.table() calls with concurrent
futures::future::join_all in align_table_batches_to_region_schema.
This affects all three lookup loops: initial table resolution,
post-create resolution, and post-alter schema refresh. Reduces
O(N) sequential RPC latency to O(1) wall-clock time for requests
with many distinct logical tables (e.g. Prometheus remote_write).

Remove the per-logical-table concat_batches in flush_batch_physical.
Instead of merging all chunks of a table into one RecordBatch before
calling modify_batch_sparse, apply modify_batch_sparse directly to
each chunk and collect all modified chunks for a single final concat.
This eliminates one full data copy per logical table on the flush path.

* refactor: extract Prometheus schema alignment helpers into prom_row_builder module

Move six functions and their eight unit tests from pending_rows_batcher.rs
(~2386 lines) into a new prom_row_builder.rs module (~776 lines), leaving
the batcher at ~1665 lines focused on flush/worker machinery.

Extracted functions:
- accommodate_record_batch_for_target_schema (normalize incoming batch
  against existing table schema)
- unzip_logical_region_schema (extract ts/field/tag columns)
- build_prom_create_table_schema (build ColumnSchema vec for table creation)
- align_record_batch_to_schema (reorder/fill/cast columns to target schema)
- rows_to_record_batch (convert proto Rows to Arrow RecordBatch)
- build_arrow_array (build Arrow arrays from proto values)

Cleaned up 12 now-unused imports from pending_rows_batcher.rs.

* feat/auto-schema-align:
 ### Enhance `PendingRowsBatcher` and `prom_row_builder` for Efficient Schema Handling

 - **`pending_rows_batcher.rs`:**
   - Refactored `submit` method to integrate table batch building and alignment into a single method `build_and_align_table_batches`.
   - Removed intermediate `RecordBatch` creation, optimizing the process by directly converting proto `RowInsertRequests` into aligned `RecordBatch`es.
   - Enhanced schema handling by identifying missing columns directly from proto schemas.

 - **`prom_row_builder.rs`:**
   - Introduced `rows_to_aligned_record_batch` for direct conversion of proto `Rows` into aligned `RecordBatch`es.
   - Added `identify_missing_columns_from_proto` to detect absent tag columns without intermediate `RecordBatch`.
   - Implemented `build_prom_create_table_schema_from_proto` to construct table schemas directly from proto schemas.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 Add elapsed time metrics for bulk insert operations

 - Updated `bulk_insert` method in `bulk_insert.rs` to record elapsed time metrics using `MITO_OPERATION_ELAPSED` for both physical and logical regions.
 - Added a new test `test_bulk_insert_records_elapsed_metric` to verify that the elapsed time metric is recorded correctly during bulk insert operations.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* remove flush per logical region

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 **Refactor `flush_batch` and `flush_batch_physical` functions**

 - Removed unused `catalog` and `schema` variables from `flush_batch` in `pending_rows_batcher.rs`.
 - Updated `flush_batch_physical` to directly use `ctx.current_catalog()` and `ctx.current_schema()` for resolving table names.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Remove Unused Function and Associated Test

 - **File:** `src/servers/src/prom_row_builder.rs`
   - Removed the unused function `build_prom_create_table_schema` which was responsible for building a `Vec<ColumnSchema>` from an Arrow schema.
   - Deleted the associated test `test_build_prom_create_table_schema_from_request_schema` that validated the removed function.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Remove Test**: Deleted the `test_bulk_insert_records_elapsed_metric` test from `bulk_insert.rs`.
 - **Refactor Table Resolution**: Introduced `TableResolutionPlan` struct and refactored table resolution logic in `pending_rows_batcher.rs`.
 - **Enhance Table Handling**: Added functions for collecting non-empty table rows, unique table schemas, and handling table creation and alteration in `pending_rows_batcher.rs`.
 - **Add Tests**: Implemented tests for `collect_non_empty_table_rows` and `collect_unique_table_schemas` in `pending_rows_batcher.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Refactor Error Handling**: Updated error handling in `pending_rows_batcher.rs` and `prom_row_builder.rs` to use `Snafu` error context for more descriptive error messages.
 - **Remove Unused Functionality**: Eliminated the `rows_to_record_batch` function and related test in `prom_row_builder.rs` as it was redundant.
 - **Simplify Function Return Types**: Modified `rows_to_aligned_record_batch` in `prom_row_builder.rs` to return only `RecordBatch` without missing columns, simplifying the function's interface and related tests.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Add Helper Function for Table Options in `prom_store.rs`

 - Introduced `fill_metric_physical_table_options` function to encapsulate logic for setting table options, ensuring the use of flat SST format and physical table metadata.
 - Updated `Instance` implementation to utilize the new helper function for setting table options.
 - Added a unit test `test_metric_physical_table_options_forces_flat_sst_format` to verify the correct application of table options.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Refactor `PendingRowsBatcher`**: Simplified worker retrieval logic in `get_or_spawn_worker` method by using a more concise conditional check.
 - **Metrics Update**: Added `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` metric in `pending_rows_batcher.rs`.
 - **Remove Unused Code**: Deleted multiple test functions related to record batch alignment and schema preparation in `pending_rows_batcher.rs` and `prom_row_builder.rs`.
 - **Function Visibility Change**: Made `build_prom_create_table_schema_from_proto` public in `prom_row_builder.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: remove plan

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Refactor and Simplify Schema Alteration Logic

 - **Removed Unused Methods**: Deleted `create_table_if_missing` and `add_missing_prom_tag_columns` methods from `PendingRowsSchemaAlterer` trait in `prom_store.rs` and `pending_rows_batcher.rs`.
 - **Error Handling Improvement**: Enhanced error handling in `create_tables_if_missing_batch` method to return a specific error message for unsupported `AutoCreateTableType` in `prom_store.rs`.
 - **Visibility Change**: Made `as_str` method public in `AutoCreateTableType` enum in `insert.rs` to support external access.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Commit Message

 Improve safety in `prom_row_builder.rs`

 - Updated `unzip_logical_region_schema` to use `saturating_sub` for safer capacity calculation of `tag_columns`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 Add TODO comments for future improvements in `pending_rows_batcher.rs`

 - Added a TODO comment to consider bounding the `flush_region_writes_concurrently` function.
 - Added a TODO comment to potentially limit the maximum rows to concatenate in the `flush_batch_physical` function.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Commit Message

 Enhance error handling in `pending_rows_batcher.rs`

 - Updated `collect_unique_table_schemas` to return a `Result` type, enabling error handling for duplicate table names.
 - Modified the function to return an error when duplicate table names are found in `table_rows`.
 - Adjusted test cases to handle the new `Result` return type in `collect_unique_table_schemas`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Refactor `partition_columns` Method**: Updated the `partition_columns` method in `multi_dim.rs`, `partition.rs`, and `splitter.rs` to return a slice reference instead of a cloned vector, improving performance by avoiding unnecessary cloning.
 - **Enhance Partition Handling**: Added functions `collect_tag_columns_and_non_tag_indices` and `strip_partition_columns_from_batch` in `pending_rows_batcher.rs` to manage partition columns more efficiently, including stripping partition columns from record batches.
 - **Update Tests**: Modified existing tests and added new ones in `pending_rows_batcher.rs` to verify the functionality of partition column handling, ensuring correct behavior of the new methods.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Enhance Schema Handling and Validation in `pending_rows_batcher.rs`

 - **Schema Validation Enhancements**:
   - Added checks for essential columns (`timestamp`, `value`) in `collect_tag_columns_and_non_tag_indices`.
   - Introduced `PHYSICAL_REGION_ESSENTIAL_COLUMN_COUNT` to ensure minimum column count in `strip_partition_columns_from_batch`.
   - Improved error handling for unexpected data types and duplicated columns.

 - **Function Modifications**:
   - Updated `strip_partition_columns_from_batch` to project essential columns without lookup.
   - Modified `flush_batch_physical` to use `essential_col_indices` instead of `non_tag_indices`.

 - **Test Enhancements**:
   - Added tests for schema validation, including checks for unexpected data types and duplicated columns.
   - Verified correct projection of essential columns in `strip_partition_columns_from_batch`.

 Files affected: `pending_rows_batcher.rs`, `tests`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 - **Add `smallvec` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include `smallvec` as a workspace dependency.
 - **Refactor Function**: Renamed `collect_tag_columns_and_non_tag_indices` to `columns_taxonomy` in `pending_rows_batcher.rs` and updated its return type to use `SmallVec`.
 - **Update Tests**: Modified test cases in `pending_rows_batcher.rs` to reflect changes in function name and return type.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 **Refactor `pending_rows_batcher.rs` to Simplify Table ID Handling**

 - Updated `TableBatch` struct to use `TableId` directly instead of `Option<u32>` for `table_id`.
 - Simplified logic in `flush_batch_physical` by removing the check for `None` in `table_id`.
 - Adjusted related logic in `start_worker` to accommodate the change in `table_id` handling.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Enhance Batch Processing Logic

 - **`pending_rows_batcher.rs`**:
   - Moved column taxonomy resolution inside the loop to handle schema variations across batches.
   - Added checks to skip processing if both tag columns and essential column indices are empty.

 - **Tests**:
   - Added `test_modify_batch_sparse_with_taxonomy_per_batch` to verify batch modification logic with varying schemas.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Remove Primary Key Column Check in `pending_rows_batcher.rs`

 - Removed the check for the primary key column and other essential column names in the function `strip_partition_columns_from_batch` within `pending_rows_batcher.rs`.
 - Simplified the logic by eliminating the validation of column order against expected essential names.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Refactor error handling and iteration in `otlp.rs` and `pending_rows_batcher.rs`

 - **`otlp.rs`**: Simplified error handling by removing `CatalogSnafu` context when awaiting table retrieval.
 - **`pending_rows_batcher.rs`**: Streamlined iteration over tables by removing unnecessary `into_iter()` calls, improving code readability and efficiency.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/metrics-for-bulk:
 Add timing metrics for batch processing in `pending_rows_batcher.rs`

 - Introduced `modify_elapsed` and `columns_taxonomy_elapsed` to measure time spent in `modify_batch_sparse` and `columns_taxonomy` functions.
 - Updated `flush_batch_physical` to record these metrics using `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Commit Summary

 - **Remove Unused Code**: Eliminated the `#[allow(dead_code)]` attribute from the `compute_tsid_array` function in `batch_modifier.rs`.
 - **Error Handling Improvement**: Enhanced error handling in `flush_batch_physical` function by adjusting the `match` block in `pending_rows_batcher.rs`.
 - **Simplify Logic**: Streamlined the logic in `rows_to_aligned_record_batch` by removing unnecessary type casting in `prom_row_builder.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 **Refactor `flush_batch_physical` in `pending_rows_batcher.rs`:**

 - Moved partition column stripping logic to a single location before processing region batches.
 - Updated the use of `combined_batch` to `stripped_batch` for consistency in batch processing.
 - Removed redundant partition column stripping logic within the region batch loop.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/auto-schema-align:
 ### Update `batch_modifier.rs` Documentation and Parameter Naming

 - Enhanced documentation for `compute_tsid_array` and `modify_batch_sparse` functions to clarify their logic and parameters.
 - Renamed parameter `non_tag_column_indices` to `extra_column_indices` in `modify_batch_sparse` for better clarity.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2026-04-01 02:45:26 +00:00
Ning Sun
e14404c677 chore: update rust toolchain to 2026-03-21 (#7849)
* chore: update rust toolchain to 2026-03-21

* chore: new format

* fix: lint

* chore: resolve lint issues

* chore: remove as_millis_f64

* chore: deps up
2026-03-30 12:13:14 +00:00
discord9
d7bc5ad16b feat: add incremental read context and scan boundaries (#7848)
* feat: add incremental read context and scan boundaries

Signed-off-by: discord9 <discord9@163.com>

* chore: per review

Signed-off-by: discord9 <discord9@163.com>

* docs: explain field

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-26 07:00:22 +00:00
discord9
56ee8baa3f feat: admin gc table/regions (#7619)
* feat: gc table

Signed-off-by: discord9 <discord9@163.com>

* test: admin gc

Signed-off-by: discord9 <discord9@163.com>

* chore: after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* refactor: GcStats

Signed-off-by: discord9 <discord9@163.com>

* refactor: use gc ticker for admin gc

Signed-off-by: discord9 <discord9@163.com>

* fix: region routes override

Signed-off-by: discord9 <discord9@163.com>

* test: non happy path

Signed-off-by: discord9 <discord9@163.com>

* refactor: gc job report enum

Signed-off-by: discord9 <discord9@163.com>

* test: process 0 regions

Signed-off-by: discord9 <discord9@163.com>

* after rebase

Signed-off-by: discord9 <discord9@163.com>

* feat: allow manual gc to return error

Signed-off-by: discord9 <discord9@163.com>

* chore: update proto

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* chore: timeout and update proto

Signed-off-by: discord9 <discord9@163.com>

* chore: udpate proto

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-06 08:25:44 +00:00
discord9
d1151b665b feat: flow tql cte (#7702)
* feat: flow tql cte

Signed-off-by: discord9 <discord9@163.com>

* fix: creating flow TQL CTE source tables lose cte part in query

Signed-off-by: discord9 <discord9@163.com>

* test: update sqlness result

Signed-off-by: discord9 <discord9@163.com>

* chore

Signed-off-by: discord9 <discord9@163.com>

* fix: properly canonicalize ident

Signed-off-by: discord9 <discord9@163.com>

* feat: even stricter check

Signed-off-by: discord9 <discord9@163.com>

* chore: sqlness update

Signed-off-by: discord9 <discord9@163.com>

* chore: after rebase fix

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-03-06 03:36:42 +00:00
LFC
b2074e3863 chore: upgrade DataFusion family, again (#7578)
* chore: upgrade DataFusion family

Signed-off-by: luofucong <luofc@foxmail.com>

* chore: switch to released version of datafusion-pg-catalog

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
2026-03-03 07:36:39 +00:00
Weny Xu
df04267c54 fix(repartition): reject writes on deallocating regions during region merge (#7694)
* feat(meta): add write route policy to region route with backward compatibility

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix(meta): use partition_expr compatibility accessor in repartition matching

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): introduce staging partition rule enum for repartition instructions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(datanode): plumb staging partition rule enum through heartbeat handlers

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): mark pending-deallocate regions as reject-all during merge staging

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(partition): exclude reject-all regions from write partitioning

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(mito): store staging partition rule enum in region state

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(mito): reject writes in staging when partition rule is reject-all

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(meta): send enter staging instruction with reject-all

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix(repartition): preserve reject-all on exit, merge enter-staging instructions, and allow staged bulk writes

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: refactor to ignore all writes

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename StagingPartitionRule to StagingPartitionDirective across staging flow

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: rename

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-25 07:04:38 +00:00
Ruihang Xia
db46849f40 feat: track flow source tables for TQL and info schema (#7697)
* feat: track flow source tables for TQL and info schema

* handle schema matcher

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* cover __name__ case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-02-11 03:19:09 +00:00
Ning Sun
43afb7962a refactor: remove session from common meta (#7698)
* refactor: remove session dependency from common-meta

* chore: add udeps

* chore: format

* fix: lint issues

* chore: update oneshot

* chore: update unused deps
2026-02-11 03:04:45 +00:00
fys
1aa80d9363 fix: incorrect-tql-explain result (#7675) 2026-02-11 02:30:15 +00:00
Weny Xu
0ed3b83099 refactor: rename partition rule version to partition expr version (#7696)
* refactor: rename partition rule version to partition expr version

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-10 10:12:47 +00:00
LFC
8c23b29725 refactor: remove the RawTableMeta and RawTableInfo to make codes more concise (#7626)
* refactor: remove the `RawTableMeta` and `RawTableInfo` to make codes more concise

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2026-02-10 07:10:04 +00:00
Yingwen
39d3744f4f fix: bump prometheus to 0.14 (#7686)
* chore: add metrics test for fs backend

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test opendal metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: update prometheus to 0.14

The opendal metrics require 0.14.

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-02-09 09:51:00 +00:00
Weny Xu
8026b23834 feat: partition rule version validation for writes and staging (#7628)
* feat: verify partition rule

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: add partition version cache

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: header check

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt toml

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: minor refactor

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: header

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fix clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: minor refactor

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: nit

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-02-06 12:16:34 +00:00
discord9
3006ac54af fix: get correct table info when insert create/alter table (#7641)
* fix: set partition column&other newly acquired table info

Signed-off-by: discord9 <discord9@163.com>

* fix: update after alter table info

Signed-off-by: discord9 <discord9@163.com>

* refactor: use table name instead

Signed-off-by: discord9 <discord9@163.com>

* chore: log

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-30 10:32:31 +00:00
Ning Sun
124478f577 feat: use arrow-pg for postgres data encoding (#7591)
* feat: use arrow-pg for encode_row

* refactor: remove bytea and datetime module

* feat: port more encodings to arrow-pg

* feat: implement intervalstyle

* chore: format

* chore: remove error that is no longer used

* chore: use released arrow-pg

* Apply suggestions from code review

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

---------

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2026-01-28 02:34:02 +00:00
fys
3c915a382b fix: unit tests when enterprise feature is enabled (#7625) 2026-01-28 02:13:17 +00:00
Ruihang Xia
c83868c4eb feat: partition rule simplifier (#7622)
* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* reuse collider

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify range helpers

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* notes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update unit test resule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-27 14:31:20 +00:00
Weny Xu
4fb61047cb test: add integration tests for repartition (#7560)
* test: add integration tests for mito repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update test result

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add integration tests for metric repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: correct results

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: enable tests for object store

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add compaction and gc

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit test

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* more cases

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: file ref also in repart mapping

Signed-off-by: discord9 <discord9@163.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: set a longer timeout for mock metasrv channel manager

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
2026-01-22 10:14:40 +00:00
Weny Xu
25687bb282 feat: add ddl timeout/wait options, repartition WITH parsing, meta-client startup refactor (#7589)
* feat: add ddl request timeouts and unify meta client startup

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: omplement ALTER TABLE repartition DDL options parsing

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add sqlness tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: add unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: pass timeout argument to procedure

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: assert timeout

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-20 09:26:53 +00:00
LFC
e64c31e59a chore: upgrade DataFusion family (#7558)
* chore: upgrade DataFusion family

Signed-off-by: luofucong <luofc@foxmail.com>

* use main proto

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2026-01-14 14:02:31 +00:00
Ruihang Xia
45b4067721 feat: always canonicalize partition expr (#7553)
* feat: always canonicalize partition expr

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix ut assertion

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-12 07:24:29 +00:00
Weny Xu
567d3e66e9 feat: integrate repartition procedure into DdlManager (#7548)
* feat: add repartition procedure factory support to DdlManager

- Introduce RepartitionProcedureFactory trait for creating and registering
  repartition procedures
- Implement DefaultRepartitionProcedureFactory for metasrv with full support
- Implement StandaloneRepartitionProcedureFactory for standalone (unsupported)
- Add procedure loader registration for RepartitionProcedure and
  RepartitionGroupProcedure
- Add helper methods to TableMetadataAllocator for allocator access
- Add error types for repartition procedure operations
- Update DdlManager to accept and use RepartitionProcedureFactoryRef

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: integrate repartition procedure into DdlManager

- Add submit_repartition_task() to handle repartition from alter table
- Route Repartition operations in submit_alter_table_task() to repartition factory
- Refactor: rename submit_procedure() to execute_procedure_and_wait()
- Make all DDL operations wait for completion by default
- Add submit_procedure() for fire-and-forget submissions
- Add CreateRepartitionProcedure error type
- Add placeholder Repartition handling in grpc-expr (unsupported)
- Update greptime-proto dependency

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: implement ALTER TABLE REPARTITION procedure submission

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(repartition): handle central region in apply staging manifest

- Introduce ApplyStagingManifestInstructions struct to organize instructions
- Add special handling for central region when applying staging manifests
- Transition state from UpdateMetadata to RepartitionEnd after applying staging manifests
- Remove next_state() method in RepartitionStart and inline state transitions
- Improve logging and expression serialization in DDL statement executor
- Move repartition tests from standalone to distributed test suite

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-09 08:37:21 +00:00
Weny Xu
aadfcd7821 feat(repartition): implement validation logic for repartition table (#7538)
* feat(repartition): implement validation logic for repartition_table

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: minor refactor

Signed-off-by: WenyXu <wenymedia@gmail.com>

* test: update sqlness

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-08 12:18:39 +00:00
Weny Xu
ada4666e10 refactor: remove region_numbers from TableMeta and TableInfo (#7519)
* refactor: remove `region_numbers` from `TableMeta` and `TableInfo`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: create partitions from region route

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix build

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-06 13:21:36 +00:00
AntiTopQuark
cea578244c fix(compaction): unify behavior of database compaction options with TTL (#7402)
* fix: fix dynamic compactiom option,unify behavior of database compaction options with TTL option

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* fix unit test

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* add debug log

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

---------

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>
2025-12-25 02:34:42 +00:00
LFC
dc9f3a702e refactor: explicitly define json struct to ingest jsonbench data (#7462)
ingest jsonbench data

Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-24 07:30:22 +00:00
Ruihang Xia
564cc0c750 feat: table/column/flow COMMENT (#7060)
* initial impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* avoid unimplemented panic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* validate flow

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix table column comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* table level comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify table info serde

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* don't txn

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove empty trait

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: procedure

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update proto

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* grpc support

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Apply suggestions from code review

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* try from pb struct

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* doc comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* check unchanged fast case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix merge error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use try_as_raw_value

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2025-12-10 15:08:47 +00:00
Lei, HUANG
11ecb7a28a refactor(servers): bulk insert service (#7329)
* refactor/bulk-insert-service:
 refactor: decode FlightData early in put_record_batch pipeline

 - Move FlightDecoder usage from Inserter up to PutRecordBatchRequestStream,
   passing decoded RecordBatch and schema bytes instead of raw FlightData.
 - Eliminate redundant per-request decoding/encoding in Inserter; encode
   once and reuse for all region requests.
 - Streamline GrpcQueryHandler trait and implementations to accept
   PutRecordBatchRequest containing pre-decoded data.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 feat: stream-based bulk insert with per-batch responses

 - Introduce handle_put_record_batch_stream() to process Flight DoPut streams
 - Resolve table & permissions once, yield (request_id, AffectedRows) per batch
 - Replace loop-over-request with async-stream in frontend & server
 - Make PutRecordBatchRequestStream public for cross-crate usage

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 fix: propagate request_id with errors in bulk insert stream

 Changes the bulk-insert stream item type from
 Result<(i64, AffectedRows), E> to (i64, Result<AffectedRows, E>)
 so every emitted tuple carries the request_id even on failure,
 letting callers correlate errors with the originating request.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 refactor: unify DoPut response stream to return DoPutResponse

 Replace the tuple (i64, Result<AffectedRows>) with Result<DoPutResponse>
 throughout the gRPC bulk-insert path so the handler, adapter and server
 all speak the same type.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 feat: add elapsed_secs to DoPutResponse for bulk-insert timing

 - DoPutResponse now carries elapsed_secs field
 - Frontend measures and attaches insert duration
 - Server observes GRPC_BULK_INSERT_ELAPSED metric from response

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 refactor: unify Bytes import in flight module

 - Replace `bytes::Bytes` with `Bytes` alias for consistency
 - Remove redundant `ProstBytes` alias

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 fix: terminate gRPC stream on error and optimize FlightData handling

 - Stop retrying on stream errors in gRPC handler
 - Replace Vec1 indexing with into_iter().next() for FlightData
 - Remove redundant clones in bulk_insert and flight modules

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 Improve permission check placement in `grpc.rs`

 - Moved the permission check for `BulkInsert` to occur before resolving the table reference in `GrpcQueryHandler` implementation.
 - Ensures permission validation is performed earlier in the process, potentially avoiding unnecessary operations if permission is denied.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 **Refactor Bulk Insert Handling in gRPC**

 - **`grpc.rs`**:
   - Switched from `async_stream::stream` to `async_stream::try_stream` for error handling.
   - Removed `body_size` parameter and added `flight_data` to `handle_bulk_insert`.
   - Simplified error handling and permission checks in `GrpcQueryHandler`.

 - **`bulk_insert.rs`**:
   - Added `raw_flight_data` parameter to `handle_bulk_insert`.
   - Calculated `body_size` from `raw_flight_data` and removed redundant encoding logic.

 - **`flight.rs`**:
   - Replaced `body_size` with `flight_data` in `PutRecordBatchRequest`.
   - Updated memory usage calculation to include `flight_data` components.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/bulk-insert-service:
 perf(bulk_insert): encode record batch once per datanode

 Move FlightData encoding outside the per-region loop so the same
 encoded bytes are reused when mask.select_all(), eliminating redundant
 serialisation work.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-04 07:08:02 +00:00
dennis zhuang
1f91422bae feat!: improve mysql/pg compatibility (#7315)
* feat(mysql): add SHOW WARNINGS support and return warnings for unsupported SET variables

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat(function): add MySQL IF() function and PostgreSQL description functions for connector compatibility

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: show tables for mysql

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: partitions table in information_schema and add starrocks external catalog compatibility

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: async udf

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: set warnings

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: impl pg_my_temp_schema and make description functions simple

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: add test for issue 7313

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: partition_expression and partition_description

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: test

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: unit tests

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: saerch_path only works for pg

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: improve warnings processing

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: warnings while writing affected rows and refactor

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: improve ShobjDescriptionFunction signature

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: array_to_boolean

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-01 20:41:14 +00:00
LFC
fdab75ce27 feat: simple read write new json type values (#7175)
feat: basic json read and write

Signed-off-by: luofucong <luofc@foxmail.com>
2025-11-27 12:40:35 +00:00
fys
020477994b feat: add some configurable points (#7227)
* feat: enhance extension

* fix: cr

* move information schema table factories trait to standalone

* fix: self cr

* remove extension factory

* refactor

* remove extension filed from greptime options struct

* refactor

* minor refactor

* fix: cargo check

* fix: clippy

* fix: license check

* feat: enhance grpc and http configurator in servers crate

* grpc builder configurator

* remove unused file

* complete the remaining expansion points.

* fix: self-cr

* rename

* fix: typo
2025-11-27 09:21:46 +00:00