Compare commits

...

512 Commits

Author SHA1 Message Date
Yingwen
17d75c767c chore: update all arm64 builders (#5215) 2024-12-21 18:14:45 +08:00
Yingwen
a1ed450c0c chore: arm64 use 8xlarge runner (#5213) 2024-12-20 22:29:25 +08:00
evenyag
ea4ce9d1e3 chore: set version to 0.11.1 2024-12-20 14:12:19 +08:00
evenyag
1f7d9666b7 chore: Downgrade opendal for releasing 0.11.1
Revert "feat: bump opendal and switch prometheus layer to the upstream impl (#5179)"

This reverts commit 422d18da8b.
2024-12-20 14:12:19 +08:00
LFC
9f1a0d78b2 ci: install latest protobuf in dev-builder image (#5196) 2024-12-20 14:12:19 +08:00
discord9
ed8e418716 fix: auto created table ttl check (#5203)
* fix: auto created table ttl check

* tests: with hint
2024-12-20 14:12:19 +08:00
discord9
9e7121c1bb fix(flow): batch builder with type (#5195)
* fix: typed builder

* chore: clippy

* chore: rename

* fix: unit tests

* refactor: per review
2024-12-20 14:12:19 +08:00
dennis zhuang
94a49ed4f0 chore: update PR template (#5199) 2024-12-20 14:12:19 +08:00
discord9
f5e743379f feat: show flow's mem usage in INFORMATION_SCHEMA.FLOWS (#4890)
* feat: add flow mem size to sys table

* chore: rm dup def

* chore: remove unused variant

* chore: minor refactor

* refactor: per review
2024-12-20 14:12:19 +08:00
Ruihang Xia
6735e5867e feat: bump opendal and switch prometheus layer to the upstream impl (#5179)
* feat: bump opendal and switch prometheus layer to the upstream impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused files

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused things

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove root dir on recovering cache

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* filter out non-files entry in test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-20 14:12:19 +08:00
Weny Xu
925525726b fix: ensure table route metadata is eventually rolled back on failure (#5174)
* fix: ensure table route metadata is eventually rolled back on procedure failure

* fix(fuzz): enhance procedure condition checking

* chore: add logs

* feat: close downgraded leader region actively

* chore: apply suggestions from CR
2024-12-20 14:12:19 +08:00
Ning Sun
6427682a9a feat: show create postgresql foreign table (#5143)
* feat: add show create table for pg in parser

* feat: implement show create table operation

* fix: adopt upstream changes
2024-12-20 14:12:19 +08:00
Ning Sun
55b0022676 chore: make nix compilation environment config more robust (#5183)
* chore: improve nix-shell support

* fix: add pkg-config

* ci: add a github action to ensure build on clean system

* ci: optimise dependencies of task

* ci: move clean build to nightly
2024-12-20 14:12:19 +08:00
Ruihang Xia
2d84cc8d87 refactor: remove unused symbols (#5193)
chore: remove unused symbols

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-20 14:12:19 +08:00
Yingwen
c030705b17 docs: fix grafana dashboard row (#5192) 2024-12-20 14:12:19 +08:00
Ruihang Xia
443c600bd0 fix: validate matcher op for __name__ in promql (#5191)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-20 14:12:19 +08:00
Lei, HUANG
39cadfe10b fix(sqlness): enforce order in union tests (#5190)
Add ORDER BY clause to subquery union tests

 Updated the SQL and result files for subquery union tests to include an ORDER BY clause, ensuring consistent result ordering. This change aligns with the test case from the DuckDB repository.
2024-12-20 14:12:19 +08:00
jeremyhi
9b5e4e80f7 feat: extract hints from http header (#5128)
* feat: extract hints from http header

* Update src/servers/src/http/hints.rs

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>

* chore: by comment

* refactor: get instead of loop

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2024-12-20 14:12:19 +08:00
Yingwen
041a276b66 feat: do not remove time filters in ScanRegion (#5180)
* feat: do not remove time filters

* chore: remove `time_range` from parquet reader

* chore: print more message in the check script

* chore: fix unused error
2024-12-20 14:12:19 +08:00
Yingwen
614a25ddc5 feat: do not keep MemtableRefs in ScanInput (#5184) 2024-12-20 14:12:19 +08:00
dennis zhuang
4337e20010 feat: impl label_join and label_replace for promql (#5153)
* feat: impl label_join and label_replace for promql

* chore: style

* fix: dst_label is equal to src_label

* fix: forgot to sort the results

* fix: processing empty source label
2024-12-20 14:12:19 +08:00
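The `label_replace` semantics this commit implements (including its edge cases: destination equal to source, empty source label) come down to a full regex match against the source label value. Below is a minimal Rust sketch of those semantics using the `regex` crate; the function name and label-set representation are illustrative, not GreptimeDB's actual code:

```rust
use regex::Regex;
use std::collections::BTreeMap;

/// PromQL-style `label_replace` over one series' label set. Applies only
/// when `re` matches the *entire* source value; capture groups referenced
/// in `replacement` (e.g. "$1") are expanded into the destination label,
/// and an empty expansion drops it.
fn label_replace(
    labels: &mut BTreeMap<String, String>,
    dst: &str,
    replacement: &str,
    src: &str,
    re: &Regex,
) {
    // A missing source label behaves as an empty string.
    let value = labels.get(src).cloned().unwrap_or_default();
    if let Some(caps) = re.captures(&value) {
        if caps.get(0).map(|m| m.as_str()) != Some(value.as_str()) {
            return; // partial match: no-op, as in PromQL
        }
        let mut expanded = String::new();
        caps.expand(replacement, &mut expanded);
        if expanded.is_empty() {
            labels.remove(dst);
        } else {
            labels.insert(dst.to_string(), expanded);
        }
    }
}

fn main() {
    let re = Regex::new("(.*):.*").unwrap();
    let mut labels = BTreeMap::from([("instance".to_string(), "db1:9090".to_string())]);
    label_replace(&mut labels, "host", "$1", "instance", &re);
    assert_eq!(labels.get("host").map(String::as_str), Some("db1"));
}
```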
Lanqing Yang
65c52cc698 fix: display inverted and fulltext index in show index (#5169) 2024-12-20 14:12:19 +08:00
Yohan Wal
50f31fd681 feat: introduce Buffer for non-continuous bytes (#5164)
* feat: introduce Buffer for non-continuous bytes

* Update src/mito2/src/cache/index.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: apply review comments

* refactor: use opendal::Buffer

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-12-20 14:12:19 +08:00
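The idea behind a non-continuous `Buffer` is one logical byte range backed by several chunks, so index pages fetched separately need not be copied into a single contiguous allocation. A sketch under that assumption (the PR ultimately switched to `opendal::Buffer`, which provides the same abstraction); the type below is illustrative:

```rust
/// A read-only view over non-contiguous byte chunks, addressed as one
/// logical buffer. Illustrative sketch, not the mito2 type.
struct Buffer {
    chunks: Vec<Vec<u8>>,
    len: usize,
}

impl Buffer {
    fn new(chunks: Vec<Vec<u8>>) -> Self {
        let len = chunks.iter().map(Vec::len).sum();
        Self { chunks, len }
    }

    /// Copies `buf.len()` bytes starting at logical `offset`, crossing
    /// chunk boundaries as needed. Returns false if out of bounds.
    fn read_at(&self, mut offset: usize, buf: &mut [u8]) -> bool {
        if offset + buf.len() > self.len {
            return false;
        }
        let mut written = 0;
        for chunk in &self.chunks {
            if offset >= chunk.len() {
                offset -= chunk.len(); // skip whole chunks before the range
                continue;
            }
            let take = (chunk.len() - offset).min(buf.len() - written);
            buf[written..written + take].copy_from_slice(&chunk[offset..offset + take]);
            written += take;
            offset = 0;
            if written == buf.len() {
                return true;
            }
        }
        false
    }
}

fn main() {
    let buffer = Buffer::new(vec![b"hello ".to_vec(), b"world".to_vec()]);
    let mut out = [0u8; 8];
    assert!(buffer.read_at(3, &mut out)); // read spans both chunks
    assert_eq!(&out, b"lo world");
}
```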
LFC
b5af5aaf8d refactor: produce BatchBuilder from a Batch to modify it again (#5186)
chore: pub some mods
2024-12-20 14:12:19 +08:00
Lei, HUANG
27693c7f1e perf: avoid holding memtable during compaction (#5157)
* perf/avoid-holding-memtable-during-compaction: Refactor Compaction Version Handling

 • Introduced CompactionVersion struct to encapsulate region version details for compaction, removing dependency on VersionRef.
 • Updated CompactionRequest and CompactionRegion to use CompactionVersion.
 • Modified open_compaction_region to construct CompactionVersion without memtables.
 • Adjusted WindowedCompactionPicker to work with CompactionVersion.
 • Enhanced flush logic in WriteBufferManager to improve memory usage checks and logging.

* reformat code

* chore: change log level

* reformat code

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-12-20 14:12:19 +08:00
discord9
a59fef9ffb test: sqlness upgrade compatibility tests (#5126)
* feat: simple version switch

* chore: remove debug print

* chore: add common folder

* tests: add drop table

* feat: pull versioned binary

* chore: don't use native-tls

* chore: rm outdated docs

* chore: new line

* fix: save old bin dir

* fix: switch version restart all node

* feat: use etcd

* fix: wait for election

* fix: normal sqlness

* refactor: hashmap for bin dir

* test: past 3 major version compat crate table

* refactor: allow using without setup etcd
2024-12-20 14:12:19 +08:00
Zhenchi
bcecd8ce52 feat(bloom-filter): add basic bloom filter creator (Part 1) (#5177)
* feat(bloom-filter): add a simple bloom filter creator (Part 1)

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: clippy

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: header

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* docs: add format comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-12-20 14:12:19 +08:00
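A basic bloom filter creator of this kind hashes each value to k bit positions; lookups may report false positives but never false negatives, which is what makes the structure safe for pruning. A self-contained sketch using the standard double-hashing construction (`h1 + i*h2`); sizing and hash choice here are illustrative, not the on-disk format the PR defines:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal bloom filter: k probe positions derived from two hashes.
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: usize,
    num_hashes: usize,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: usize) -> Self {
        Self { bits: vec![0; (num_bits + 63) / 64], num_bits, num_hashes }
    }

    fn hash_pair<T: Hash>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        a.hash(&mut h2); // derive a second stream from the first
        (a, h2.finish() | 1) // odd stride so probes don't collapse
    }

    fn insert<T: Hash>(&mut self, item: &T) {
        let (a, b) = Self::hash_pair(item);
        for i in 0..self.num_hashes as u64 {
            let bit = (a.wrapping_add(i.wrapping_mul(b)) % self.num_bits as u64) as usize;
            self.bits[bit / 64] |= 1 << (bit % 64);
        }
    }

    /// May return false positives, never false negatives.
    fn may_contain<T: Hash>(&self, item: &T) -> bool {
        let (a, b) = Self::hash_pair(item);
        (0..self.num_hashes as u64).all(|i| {
            let bit = (a.wrapping_add(i.wrapping_mul(b)) % self.num_bits as u64) as usize;
            self.bits[bit / 64] & (1 << (bit % 64)) != 0
        })
    }
}

fn main() {
    let mut filter = BloomFilter::new(1024, 4);
    filter.insert(&"tag_value");
    assert!(filter.may_contain(&"tag_value"));
}
```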
Yingwen
ffdcb8c1ac fix: deletion between two put may not work in last_non_null mode (#5168)
* fix: deletion between rows with the same key may not work

* test: add sqlness test case

* chore: comments
2024-12-20 14:12:19 +08:00
Yingwen
554121ad79 chore: add aquamarine to dep lists (#5181) 2024-12-20 14:12:19 +08:00
Weny Xu
43c12b4f2c fix: correct set_region_role_state_gracefully behaviors (#5171)
* fix: reduce default max rows for fuzz testing

* chore: remove Postgres setup from fuzz test workflow

* chore(fuzz): increase resource limits for GreptimeDB cluster

* chore(fuzz): increase resource limits for kafka

* fix: correct `set_region_role_state_gracefully` behaviors

* chore: remove Postgres setup from fuzz test workflow

* chore(fuzz): reduce resource limits for GreptimeDB & kafka
2024-12-20 14:12:19 +08:00
discord9
7aa8c28fe4 test: flow rebuild (#5162)
* tests: rebuild flow

* tests: more rebuild

* tests: restart

* chore: drop clean
2024-12-20 14:12:19 +08:00
Ning Sun
34fbe7739e chore: add nix-shell configure for a minimal environment for development (#5175)
* chore: add nix-shell development environment

* chore: add rust-analyzer

* chore: use .envrc as a private file
2024-12-20 14:12:19 +08:00
ZonaHe
06d7bd99dd feat: update dashboard to v0.7.3 (#5172)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2024-12-20 14:12:19 +08:00
Ruihang Xia
b71d842615 feat: introduce SKIPPING index (part 1) (#5155)
* skip index parser

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl show create part

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add empty line

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change keyword to SKIPPING INDEX

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename local variables

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-20 14:12:19 +08:00
Lei, HUANG
7f71693b8e chore: gauge for flush compaction (#5156)
* add metrics

* chore/bench-metrics: Add INFLIGHT_FLUSH_COUNT Metric to Flush Process

 • Introduced INFLIGHT_FLUSH_COUNT metric to track the number of ongoing flush operations.
 • Incremented INFLIGHT_FLUSH_COUNT in FlushScheduler to monitor active flushes.
 • Removed redundant increment of INFLIGHT_FLUSH_COUNT in RegionWorkerLoop to prevent double counting.

* chore/bench-metrics: Add Metrics for Compaction and Flush Operations

 • Introduced INFLIGHT_COMPACTION_COUNT and INFLIGHT_FLUSH_COUNT metrics to track the number of ongoing compaction and flush operations.
 • Incremented INFLIGHT_COMPACTION_COUNT when scheduling remote and local compaction jobs, and decremented it upon completion.
 • Added INFLIGHT_FLUSH_COUNT increment and decrement logic around flush tasks to monitor active flush operations.
 • Removed redundant metric updates in worker.rs and handle_compaction.rs to streamline metric handling.

* chore: add metrics for remote compaction jobs

* chore: format

* chore: also add dashboard
2024-12-20 14:12:19 +08:00
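Inflight gauges like these must be incremented and decremented in exactly matching pairs, which is why the commit removes a redundant increment that double-counted. One way to keep such a gauge balanced is an RAII guard; this is a hedged sketch using the `prometheus` crate, and the metric names are made up:

```rust
use prometheus::IntGauge;

/// RAII guard: incremented on creation, decremented on drop, so early
/// returns and errors can't leave the inflight count skewed.
struct InflightGuard {
    gauge: IntGauge,
}

impl InflightGuard {
    fn new(gauge: IntGauge) -> Self {
        gauge.inc();
        Self { gauge }
    }
}

impl Drop for InflightGuard {
    fn drop(&mut self) {
        self.gauge.dec();
    }
}

fn main() {
    // Illustrative metric; the real ones live in mito2's metrics module.
    let inflight_flush = IntGauge::new("inflight_flush_count", "ongoing flushes").unwrap();
    {
        let _guard = InflightGuard::new(inflight_flush.clone());
        assert_eq!(inflight_flush.get(), 1); // flush work would happen here
    }
    assert_eq!(inflight_flush.get(), 0); // guard dropped, gauge restored
}
```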
Lin Yihai
615ea1a171 feat: Add vector_scalar_mul function. (#5166) 2024-12-20 14:12:19 +08:00
shuiyisong
4e725d259d chore: remove unused dep (#5163)
* chore: remove unused dep

* chore: remove more unused dep
2024-12-20 14:12:19 +08:00
Niwaka
dc2252eb6d fix: support alter table ~ add ~ custom_type (#5165) 2024-12-20 14:12:19 +08:00
Yingwen
6d4cc2e070 ci: use 4xlarge for nightly build (#5158) 2024-12-20 14:12:19 +08:00
localhost
6066ce2c4a fix: loki write row len error (#5161) 2024-12-20 14:12:19 +08:00
Yingwen
b90d8f7dbd docs: Add index panels to standalone grafana dashboard (#5140)
* docs: Add index panels to standalone grafana dashboard

* docs: fix flush/compaction op
2024-12-20 14:12:19 +08:00
Yohan Wal
fdccf4ff84 refactor: cache inverted index with fixed-size page (#5114)
* feat: cache inverted index by page instead of file

* fix: add unit test and fix bugs

* chore: typo

* chore: ci

* fix: math

* chore: apply review comments

* chore: renames

* test: add unit test for index key calculation

* refactor: use ReadableSize

* feat: add config for inverted index page size

* chore: update config file

* refactor: handle multiple range read and fix some related bugs

* fix: add config

* test: turn to a fs reader to match behaviors of object store
2024-12-20 14:12:19 +08:00
localhost
8b1484c064 chore: pipeline dryrun api can currently receive pipeline raw content (#5142)
* chore: pipeline dryrun api can currently receive pipeline raw content

* chore: remove dryrun v1 and add test

* chore: change dryrun pipeline api body schema

* chore: remove useless struct PipelineInfo

* chore: update PipelineDryrunParams doc

* chore: increase code readability

* chore: add some comment for pipeline dryrun test

* Apply suggestions from code review

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>

* chore: format code

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2024-12-20 14:12:19 +08:00
Yingwen
576e20ac78 feat: collect reader metrics from prune reader (#5152) 2024-12-20 14:12:19 +08:00
localhost
10b3e3da0f chore: decide tag column in log api following table schema if table exists (#5138)
* chore: decide tag column in log api following table schema if table exists

* chore: add more test for greptime_identity pipeline

* chore: change pipeline get_table function signature

* chore: change identity_pipeline_inner tag_column_names type
2024-12-20 14:12:19 +08:00
Weny Xu
4a3ef2d718 feat(index): add file_size_hint for remote blob reader (#5147)
feat(index): add file_size_hint for remote blob reader
2024-12-20 14:12:19 +08:00
Yohan Wal
65eabb2a05 feat(fuzz): add alter table options for alter fuzzer (#5074)
* feat(fuzz): add set table options to alter fuzzer

* chore: clippy is happy, I'm sad

* chore: happy ci happy

* fix: unit test

* feat(fuzz): add unset table options to alter fuzzer

* fix: unit test

* feat(fuzz): add table option validator

* fix: make clippy happy

* chore: add comments

* chore: apply review comments

* fix: unit test

* feat(fuzz): add more ttl options

* fix: #5108

* chore: add comments

* chore: add comments
2024-12-20 14:12:19 +08:00
Weny Xu
bc5a57f51f feat: introduce PuffinMetadataCache (#5148)
* feat: introduce `PuffinMetadataCache`

* refactor: remove too_many_arguments

* chore: fmt toml
2024-12-20 14:12:19 +08:00
Weny Xu
f24b9d8814 feat: add prefetch support to InvertedIndexFooterReader for reduced I/O time (#5146)
* feat: add prefetch support to `InvertedIndeFooterReader`

* chore: correct struct name

* chore: apply suggestions from CR
2024-12-20 14:12:19 +08:00
Weny Xu
dd4d0a88ce feat: add prefetch support to PuffinFileFooterReader for reduced I/O time (#5145)
* feat: introduce `PuffinFileFooterReader`

* refactor: remove `SyncReader` trait and impl

* refactor: replace `FooterParser` with `PuffinFileFooterReader`

* chore: remove unused errors
2024-12-20 14:12:19 +08:00
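Both prefetch PRs follow the same pattern: instead of several small tail reads (magic, then footer size, then payload), read one larger chunk from the end of the file and parse the footer out of it, falling back to a second read only if the prefetch window was too small. A synchronous Rust sketch of that pattern; the footer layout (payload followed by a 4-byte little-endian length, prefetch assumed to be at least 4 bytes) is an assumption for illustration, and the real readers are async and ranged:

```rust
use std::io::{Read, Seek, SeekFrom};

/// Reads a footer with one prefetched tail read, trading a few extra
/// bytes of I/O for fewer round trips to remote storage.
fn read_footer<R: Read + Seek>(
    reader: &mut R,
    file_size: u64,
    prefetch: u64,
) -> std::io::Result<Vec<u8>> {
    let len = prefetch.min(file_size); // clamp the window to the file
    reader.seek(SeekFrom::Start(file_size - len))?;
    let mut tail = vec![0u8; len as usize];
    reader.read_exact(&mut tail)?;
    let footer_len =
        u32::from_le_bytes(tail[tail.len() - 4..].try_into().unwrap()) as usize;
    if footer_len + 4 <= tail.len() {
        // Fast path: the payload already sits inside the prefetched tail.
        let start = tail.len() - 4 - footer_len;
        return Ok(tail[start..tail.len() - 4].to_vec());
    }
    // Slow path: one more read, now exactly sized.
    reader.seek(SeekFrom::Start(file_size - 4 - footer_len as u64))?;
    let mut footer = vec![0u8; footer_len];
    reader.read_exact(&mut footer)?;
    Ok(footer)
}

fn main() -> std::io::Result<()> {
    // A fake "file": payload "meta" followed by its little-endian length.
    let mut file = std::io::Cursor::new(b"....meta\x04\x00\x00\x00".to_vec());
    let size = file.get_ref().len() as u64;
    assert_eq!(read_footer(&mut file, size, 64)?, b"meta".to_vec());
    Ok(())
}
```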
Niwaka
3d2096fe9d feat: support push down IN filter (#5129)
* feat: support push down IN filter

* chore: move tests to prune.sql
2024-12-20 14:12:19 +08:00
Ruihang Xia
35715bb710 feat: implement v1/sql/parse endpoint to parse GreptimeDB's SQL dialect (#5144)
* derive ser/de

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* impl method

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove deserialize

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-20 14:12:19 +08:00
ZonaHe
08a3befa67 feat: update dashboard to v0.7.2 (#5141)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2024-12-20 14:12:19 +08:00
Yohan Wal
ca1758d4e7 test: part of parser test migrated from duckdb (#5125)
* test: update test

* fix: fix test
2024-12-20 14:12:19 +08:00
Zhenchi
42bf818167 feat(vector): add scalar add function (#5119)
* refactor: extract implicit conversion helper functions of vector

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(vector): add scalar add function

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix fmt

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-12-20 14:12:19 +08:00
Lei, HUANG
2c9b117224 perf: avoid cache during compaction (#5135)
* Revert "refactor: Avoid wrapping Option for CacheManagerRef (#4996)"

This reverts commit 42bf7e9965.

* fix: memory usage during log ingestion

* fix: fmt
2024-12-20 14:12:19 +08:00
dennis zhuang
3edf2317e1 feat: adjust WAL purge default configurations (#5107)
* feat: adjust WAL purge default configurations

* fix: config

* feat: change raft engine file_size default to 128Mib
2024-12-20 14:12:19 +08:00
jeremyhi
85d72a3cd0 chore: set store_key_prefix for all kvbackend (#5132) 2024-12-20 14:12:19 +08:00
discord9
928172bd82 chore: fix aws_lc not in depend tree check in CI (#5121)
* chore: fix aws_lc check in CI

* chore: update lock file
2024-12-20 14:12:19 +08:00
shuiyisong
e9f5bddeff chore: add /ready api for health checking (#5124)
* chore: add ready endpoint for health checking

* chore: add test
2024-12-20 14:12:19 +08:00
Yingwen
486755d795 chore: bump main branch version to 0.12 (#5133)
chore: bump version to v0.12.0
2024-12-20 14:12:19 +08:00
dennis zhuang
03a28320d6 feat!: enable read cache and write cache when using remote object stores (#5093)
* feat: enable read cache and write cache when using remote object stores

* feat: make read cache be aware of remote store names

* chore: docs

* chore: apply review suggestions

* chore: trim write cache path

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-12-10 04:03:44 +00:00
Lei, HUANG
ce86ba3425 chore: Reduce FETCH_OPTION_TIMEOUT from 10 to 3 seconds in config.rs (#5117)
Reduce FETCH_OPTION_TIMEOUT from 10 to 3 seconds in config.rs
2024-12-09 13:39:18 +00:00
Yingwen
2fcb95f50a fix!: fix regression caused by unbalanced partitions and splitting ranges (#5090)
* feat: assign partition ranges by rows

* feat: balance partition rows

* feat: get upper bound for part nums

* feat: only split in non-compaction seq scan

* fix: parallel scan on multiple sources

* fix: can split check

* feat: scanner prepare by request

* feat: remove scan_parallelism

* docs: update docs

* chore: update comment

* style: fix clippy

* feat: skip merge and dedup if there is only one source

* chore: Revert "feat: skip merge and dedup if there is only one source"

Since memtable won't do dedup jobs

This reverts commit 2fc7a54b11.

* test: avoid compaction in sqlness window sort test

* chore: do not create semaphore if num partitions is enough

* chore: more assertions

* chore: fix typo

* fix: compaction flag not set

* chore: address review comments
2024-12-09 12:50:57 +00:00
ZonaHe
1b642ea6a9 feat: update dashboard to v0.7.1 (#5123)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2024-12-09 10:27:35 +00:00
Weny Xu
b35221ccb6 ci: set meta replicas to 1 (#5111) 2024-12-09 07:22:47 +00:00
Zhenchi
bac7e7bac9 refactor: extract implicit conversion helper functions of vector type (#5118)
refactor: extract implicit conversion helper functions of vector

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-12-09 07:19:00 +00:00
dennis zhuang
903da8f4cb fix: show create table doesn't quote option keys which contains dot (#5108)
* fix: show create table doesn't quote option keys which contains dot

* fix: compile
2024-12-09 03:27:46 +00:00
Ning Sun
c0f498b00c feat: update pgwire to 0.28 (#5113)
* feat: update pgwire to 0.28

* test: update tests
2024-12-09 03:12:11 +00:00
Lin Yihai
19373d806d chore: Add timeout setting for find_ttl. (#5088) 2024-12-06 15:02:15 +00:00
Ning Sun
3133f3fb4e feat: add cursor statements (#5094)
* feat: add sql parsers for cursor operations

* feat: cursor operator

* feat: implement RecordBatchStreamCursor

* feat: implement cursor storage and execution

* test: add tests

* chore: update docstring

* feat: add a temporary sql rewrite for cast in limit

this issue is described in #5097

* test: add more sql for cursor integration test

* feat: reject non-select query for cursor statement

* refactor: address review issues

* test: add empty result case

* feat: address review comments
2024-12-06 09:32:22 +00:00
discord9
8b944268da feat: ttl=0/instant/forever/humantime&ttl refactor (#5089)
* feat: ttl zero filter

* refactor: use TimeToLive enum

* fix: unit test

* tests: sqlness

* refactor: Option<TTL> None means UNSET

* tests: sqlness

* fix: 10000 years --> forever

* chore: minor refactor from reviews

* chore: rename back TimeToLive

* refactor: split imme request from normal requests

* fix: use correct lifetime

* refactor: rename immediate to instant

* tests: flow sink table default ttl

* refactor: per review

* tests: sqlness

* fix: ttl alter to instant

* tests: sqlness

* refactor: per review

* chore: per review

* feat: add db ttl type&forbid instant for db

* tests: more unit test
2024-12-06 09:20:42 +00:00
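The refactor models TTL as an explicit enum rather than a bare duration, with `Option<TimeToLive>::None` reserved for "unset, inherit the database default". A sketch of such an enum and its parsing via `humantime`; mapping a zero duration onto `Instant` follows the commit's ttl=0/instant grouping but is an assumption here:

```rust
use std::time::Duration;

/// Illustrative TTL setting: `Instant` expires rows on ingestion,
/// `Forever` never expires, `Duration` uses a humantime-style duration.
#[derive(Debug, PartialEq)]
enum TimeToLive {
    Instant,
    Forever,
    Duration(Duration),
}

fn parse_ttl(s: &str) -> Result<TimeToLive, String> {
    match s.trim().to_ascii_lowercase().as_str() {
        "instant" => Ok(TimeToLive::Instant),
        "forever" => Ok(TimeToLive::Forever),
        other => humantime::parse_duration(other)
            .map(|d| {
                // Assumption: a zero duration is folded into `Instant`.
                if d.is_zero() {
                    TimeToLive::Instant
                } else {
                    TimeToLive::Duration(d)
                }
            })
            .map_err(|e| e.to_string()),
    }
}

fn main() {
    assert_eq!(parse_ttl("instant"), Ok(TimeToLive::Instant));
    assert_eq!(parse_ttl("FOREVER"), Ok(TimeToLive::Forever));
    assert_eq!(
        parse_ttl("1h 30m"),
        Ok(TimeToLive::Duration(Duration::from_secs(5400)))
    );
}
```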
Ning Sun
dc83b0aa15 feat: add more transaction related statement for postgres interface (#5081)
* fix: add match for start and abort transactions

* feat: add commit transaction statement

* feat: add warning on transaction start

* chore: update message
2024-12-06 08:22:25 +00:00
Weny Xu
2b699e735c chore: correct example config file (#5105)
* chore: correct example config file

* fix: fix unit test
2024-12-06 03:14:08 +00:00
Yingwen
7a3d6f2bd5 docs: remove lg_prof_interval from env (#5103) 2024-12-06 02:59:16 +00:00
Ruihang Xia
f9ebb58a12 fix: put PipelineChecker at the end (#5092)
fix: put PipelineChecker in the end

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-06 02:10:17 +00:00
ZonaHe
c732016fa0 feat: update dashboard to v0.7.0 (#5100)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2024-12-05 13:42:36 +00:00
Weny Xu
01a308fe6b refactor: relocate CLI to a dedicated directory (#5101)
* refactor: relocate CLI to a dedicated directory

* chore: expose method and const

* refactor: use BoxedError

* chore: expose DatabaseClient

* chore: add clone derive
2024-12-05 12:12:24 +00:00
Ruihang Xia
cf0c84bed1 feat!: remove GET method in /debug path (#5102)
* feat: remove GET method in /debug path

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update how-to document as well

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-05 08:21:55 +00:00
Yingwen
66c0445974 perf: take a new batch to reduce last row cache usage (#5095)
* feat: take and cache last row to save memory

* style: fix clippy
2024-12-05 03:59:28 +00:00
Ruihang Xia
7d8b256942 refactor: replace LogHandler with PipelineHandler (#5096)
* refactor: replace LogHandler with PipelineHandler

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change method name

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename transform to insert

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-04 11:48:55 +00:00
Ruihang Xia
5092f5f451 feat: define basic structures and implement TimeFilter (#5086)
* feat: define basic structures and implement TimeFilter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* document column filter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* define context

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change variable name to avoid typo checker

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change error referring style

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refine context definition

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-12-04 07:39:33 +00:00
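A `TimeFilter` of this shape is essentially a timestamp range plus overlap tests, so whole data parts whose min/max timestamps fall outside the query range can be skipped. An illustrative sketch; the field names and millisecond resolution are assumptions, not the crate's actual definitions:

```rust
/// Inclusive-start, exclusive-end timestamp range used for pruning.
#[derive(Clone, Copy)]
struct TimeFilter {
    start_ms: i64, // inclusive
    end_ms: i64,   // exclusive
}

impl TimeFilter {
    /// True when a part covering `[min_ms, max_ms]` may contain matching
    /// rows; parts that fail this check can be skipped entirely.
    fn overlaps(&self, min_ms: i64, max_ms: i64) -> bool {
        min_ms < self.end_ms && max_ms >= self.start_ms
    }

    fn contains(&self, ts_ms: i64) -> bool {
        ts_ms >= self.start_ms && ts_ms < self.end_ms
    }
}

fn main() {
    let filter = TimeFilter { start_ms: 1_000, end_ms: 2_000 };
    assert!(filter.overlaps(1_500, 3_000)); // straddles the range
    assert!(!filter.overlaps(2_000, 3_000)); // starts at the exclusive end
    assert!(filter.contains(1_999));
}
```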
dennis zhuang
ff4c153d4b test: adds sqlness test for TTL (#5063)
* test: adds sqlness test for TTL

* chore: restart cluster

* fix: typo

* test: adds database TTL with metric engine tables
2024-12-03 11:32:40 +00:00
Lei, HUANG
a51853846a fix: schema cache invalidation (#5067)
* fix: use SchemaCache to locate database metadata

* main:
 Refactor SchemaMetadataManager to use TableInfoCacheRef

 - Replace TableInfoManagerRef with TableInfoCacheRef in SchemaMetadataManager
 - Update DatanodeBuilder to pass TableInfoCacheRef to SchemaMetadataManager
 - Rename error MissingCacheRegistrySnafu to MissingCacheSnafu in datanode module
 - Adjust tests to use new mock_schema_metadata_manager with TableInfoCacheRef

* fix/schema-cache-invalidation: Add cache module and integrate cache registry into datanode

 • Implement build_datanode_cache_registry function to create cache registry for datanode
 • Integrate cache registry into datanode by modifying DatanodeBuilder and HeartbeatTask
 • Refactor InvalidateTableCacheHandler to InvalidateCacheHandler and move to common-meta crate
 • Update Cargo.toml to include cache as a dev-dependency for datanode
 • Adjust related modules (flownode, frontend, tests-integration, standalone) to use new cache handler and registry
 • Remove obsolete handler module from frontend crate

* fix: fuzz imports

* chore: add some doc for cache builder functions

* refactor: change table info cache to table schema cache

* fix: remove unused variants

* fix fuzz

* chore: apply suggestion

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: apply suggestion

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* fix: compile

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-12-03 10:44:29 +00:00
Weny Xu
51c6eafb16 feat: recover file cache index asynchronously (#5087)
* feat: recover file cache index asynchronously

* chore: apply suggestions from CR
2024-12-03 09:33:52 +00:00
Weny Xu
5bdea1a755 fix: correct is_exceeded_size_limit behavior for in-memory store (#5082)
* fix: correct `is_exceeded_size_limit` behavior for in-memory store

* chore: rename `MetaClientExceededSizeLimit` to `ResponseExceededSizeLimit`
2024-12-02 12:26:02 +00:00
Ning Sun
bcadce3988 chore: remove openssl deps (#5079)
* chore: remove openssl deps

* ci: add ci task to check blacklisted dependencies
2024-12-02 07:12:15 +00:00
Weny Xu
0f116c8501 feat: enable compression for metasrv client (#5078)
* feat: enable compression for metasrv client

* refactor: simplify gRPC service router registration

* chore: fix unit tests
2024-12-02 05:18:25 +00:00
Ruihang Xia
c049ce6ab1 feat: add decolorize processor (#5065)
* feat: add decolorize processor

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/pipeline/src/etl/processor/cmcd.rs

* add crate level integration test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-11-29 11:18:02 +00:00
dennis zhuang
6308e86e21 docs: tweak readme and AUTHOR (#5069)
* docs: tweak readme

* chore: links

* fix: typo

* fix: link

* docs: update AUTHOR
2024-11-29 04:08:53 +00:00
Ning Sun
36263830bb refactor: remove built-in apidocs and schemars (#5068)
* feat: feature gate apidocs

* refactor: remove built-in apidocs and schemars

* remove redoc html
2024-11-29 03:31:02 +00:00
discord9
d931389a4c fix(flow): minor fix about count(*)&sink keyword (#5061)
* fix: SiNk

* fix: sink&count(*)

* tests: SiNk

* refactor: per review
2024-11-29 03:06:27 +00:00
Lanqing Yang
8bdef776b3 fix: allow physical region to alter region options (#5046)
allow physical region to alter region options
2024-11-27 08:24:34 +00:00
Zhenchi
91e933517a chore: bump version of main branch to v0.11.0 (#5057)
chore: bump nightly version to v0.11

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-27 02:19:24 +00:00
Lei, HUANG
a617e0dbef feat: use cache kv manager for SchemaMetadataManager (#5053)
* feat: add cache for schema options

* fix/use-cache-kv-manager: Add cache invalidation handling to Datanode's heartbeat task

 • Implement InvalidateSchemaCacheHandler in heartbeat.rs to handle cache invalidation instructions.
 • Update HeartbeatTask constructor to accept cached_kv_backend and pass it to InvalidateSchemaCacheHandler.
 • Modify DatanodeBuilder to clone cached_kv_backend when creating schema_metadata_manager.
 • Refactor MetasrvCacheInvalidator in cache_invalidator.rs to reuse MailboxMessage for broadcasting to different channels.

* fix: only remove schema related cache entries

* chore: add more tests

* fix/use-cache-kv-manager: Moved InvalidateSchemaCacheHandler to a separate module

 • Extracted InvalidateSchemaCacheHandler and associated tests into a new file cache_invalidator.rs
 • Removed async_trait and CacheInvalidator related code from heartbeat.rs
 • Added cache_invalidator module declaration in handler.rs

* fix: unit tests

* fix/use-cache-kv-manager:
 Standardize TODO comment format in CachedKvBackend txn method

* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs

* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs

* Update src/datanode/src/heartbeat/handler/cache_invalidator.rs

---------

Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2024-11-26 12:24:47 +00:00
Yingwen
6130c70b63 fix: pass series row selector to file range reader (#5054) 2024-11-26 11:13:00 +00:00
Lei, HUANG
fae141ad0a fix(metric-engine): set ttl also on opening metadata regions (#5051)
* fix/metric-metadata-region-options: Remove APPEND_MODE_KEY and refactor TTL option handling in MetricEngineInner

* fix/metric-metadata-region-options: Refactor metadata region options into a shared function

 • Extract metadata region options into region_options_for_metadata_region function
 • Replace inline options map with a call to the new shared function in both create.rs and open.rs files

* fix: exclude typos

* fix/metric-metadata-region-options:
 Refactor metadata region options to accept original options and remove APPEND_MODE_KEY
2024-11-26 08:14:41 +00:00
LFC
57f31d14c8 refactor: expose configs for http clients used in object store (#5041) 2024-11-25 03:49:54 +00:00
Zhenchi
1cd6abb61f chore: bump version to v0.10.1 (#5048)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-25 03:20:54 +00:00
Ruihang Xia
e3927ea6f7 fix: prevent metadata region from inheriting database ttl (#5044)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-11-25 02:13:11 +00:00
Zhenchi
a6571d3392 chore: bump version to 0.10.0 (#5040)
* chore: bump version to 0.10.0

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix sqlness version regex

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-22 03:56:25 +00:00
Yohan Wal
1255638e84 refactor: unify mysql execute through cli and protocol (#5038)
refactor: mysql execute
2024-11-22 03:55:09 +00:00
Yohan Wal
1578c004b0 fix: prepare param mismatch (#5025)
* fix: prepare param mismatch

* test: clear state

* fix: minus 1
2024-11-22 02:31:53 +00:00
Yohan Wal
5f8d849981 feat: alter database ttl (#5035)
* feat: alter database ttl

* fix: make clippy happy

* feat: add unset database option

* fix: happy ci

* fix: happy clippy

* chore: fmt toml

* fix: fix header

* refactor: introduce `AlterDatabaseKind`

* chore: apply suggestions from CR

* refactor: add unset database option support

* test: add unit tests

* test: add sqlness tests

* feat: invalidate schema name value cache

* Apply suggestions from code review

* chore: fmt

* chore: update error messages

* test: add more test cases

* test: add more test cases

* Apply suggestions from code review

* chore: apply suggestions from CR

---------

Co-authored-by: WenyXu <wenymedia@gmail.com>
2024-11-21 12:41:41 +00:00
Lei, HUANG
3029b47a89 fix: find latest window (#5037)
* fix: find latest window

* more test files
2024-11-21 04:56:03 +00:00
Weny Xu
14d997e2d1 feat: add unset table options support (#5034)
* feat: add unset table options support

* test: add tests

* chore: update greptime-proto

* chore: add comments
2024-11-21 03:52:56 +00:00
Zhenchi
0aab68c23b feat(vector): add conversion between vector and string (#5029)
* feat(vector): add conversion between vector and string

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix sqlness

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-20 08:42:00 +00:00
Weny Xu
027284ed1b chore(cli): set default timeout for cli commands (#5021)
* chore(cli): set default timeout for cli commands

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: update comments

* fix: treat `None` as `0s` to disable server-side default timeout

* chore: update comments
2024-11-20 07:36:17 +00:00
Ruihang Xia
6a958e2c36 feat: reimplement limit in PartSort to reduce memory footprint (#5018)
* feat: support windowed sort with where condition

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix split logic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* modify fuzz test to reflect logic change

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: handle sort that won't preserve partitioning

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix test case and add more cases

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* install topk

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tests: add test for limit

* add debug assertion

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
2024-11-20 07:11:49 +00:00
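Pushing `limit` into PartSort means each partition only has to retain the k best rows rather than buffering and fully sorting its input. The memory win is the classic bounded-heap top-k, sketched below on plain integers (the real operator works on record batches and sort keys):

```rust
use std::collections::BinaryHeap;

/// Keeps the k smallest values with O(k) memory: a max-heap of size k,
/// evicting the current maximum whenever it overflows.
fn top_k_ascending(values: impl IntoIterator<Item = i64>, k: usize) -> Vec<i64> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for v in values {
        heap.push(v);
        if heap.len() > k {
            heap.pop(); // the current largest can't be in the top k
        }
    }
    let mut out = heap.into_vec();
    out.sort_unstable();
    out
}

fn main() {
    let sorted = top_k_ascending([5, 1, 9, 3, 7, 2], 3);
    assert_eq!(sorted, vec![1, 2, 3]);
}
```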
Zhenchi
db345c92df feat(vector): remove simsimd and use nalgebra instead (#5027)
* feat(vector): remove `simsimd` and use `nalgebra` instead

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* keep thing simple

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-20 06:57:10 +00:00
Yohan Wal
55ced9aa71 refactor: split up different stmts (#4997)
* refactor: set and unset

* chore: error message

* fix: reset Cargo.lock

* Apply suggestions from code review

Co-authored-by: jeremyhi <jiachun_feng@proton.me>

* Apply suggestions from code review

Co-authored-by: Weny Xu <wenymedia@gmail.com>

---------

Co-authored-by: jeremyhi <jiachun_feng@proton.me>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-11-20 06:02:51 +00:00
discord9
3633f25d0c feat: also shutdown gracefully on sigterm on unix (#5023)
* feat: support SIGTERM on unix

* chore: log

* fix: Result type
2024-11-19 15:20:33 +00:00
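On Unix, graceful shutdown has to listen for SIGTERM as well as Ctrl-C (SIGINT), since orchestrators such as Kubernetes send SIGTERM on pod termination. A minimal tokio sketch of waiting on both; the actual shutdown body is elided:

```rust
use tokio::signal::unix::{signal, SignalKind};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // SIGTERM must be registered explicitly; ctrl_c() covers SIGINT.
    let mut sigterm = signal(SignalKind::terminate())?;
    tokio::select! {
        _ = tokio::signal::ctrl_c() => println!("received SIGINT, shutting down"),
        _ = sigterm.recv() => println!("received SIGTERM, shutting down"),
    }
    // ...flush memtables, close regions, then exit...
    Ok(())
}
```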
Yingwen
63bbfd04c7 fix: prune memtable/files range independently in each partition (#4998)
* feat: prune in each partition

* chore: change pick log to trace

* chore: add in progress partition scan to metrics

* feat: seqscan support pruning in partition

* chore: remove commented codes
2024-11-19 12:43:30 +00:00
Zhenchi
2f260d8b27 fix: android build failed due to simsimd (#5019)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-19 11:21:58 +00:00
discord9
4d8fe29ea8 feat: CREATE OR REPLACE FLOW (#5001)
* feat: Replace flow

* refactor: better show create flow&tests: better check

* tests: sqlness result update

* tests: unit test for update

* refactor: cmp with raw bytes

* refactor: rename

* refactor: per review
2024-11-19 08:44:57 +00:00
Zhenchi
dbb3f2d98d fix: inverted index constraint to be case-insensitive (#5020)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-19 08:30:20 +00:00
ZonaHe
9926e3bc78 feat: update dashboard to v0.6.1 (#5017)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-11-19 07:28:10 +00:00
dennis zhuang
0dd02e93cf feat: make greatest supports timestamp and datetime types (#5005)
* feat: make greatest supports timestamp and datetime types

* chore: style

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* refactor: greatest with gt_time_types macro

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-11-19 07:01:24 +00:00
Zhenchi
73e6bf399d test: reduce round precision to avoid platform diff (#5013)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-18 16:37:15 +00:00
discord9
4402f638cd fix: distinct respect in range (#5015)
* fix: distinct respect in range

* tests: sqlness

* chore: newline
2024-11-18 12:11:07 +00:00
shuiyisong
c199604ece feat: Loki remote write (#4941)
* chore: add debug loki remote write url

* chore: add decode snappy

* chore: format output

* feature: impl loki remote write

* fix: special labels deserialize

* chore: move result to folder

* chore: finish todo in loki write

* test: loki write

* chore: fix cr issue

* chore: fix cr issue

* chore: fix cr issue

* chore: update pre-commit config

* chore: fix cr issue

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-11-18 08:39:17 +00:00
Yohan Wal
2b72e66536 test: subquery test migrated from duckdb (#4985)
* test: subquery test migrated from duckdb

* test: update test

* test: skip unsupported features and add sources
2024-11-18 08:37:06 +00:00
Weny Xu
7c135c0ef9 feat: introduce DynamicTimeoutLayer (#5006)
* feat: introduce `DynamicTimeoutLayer`

* test: add unit test

* chore: apply suggestions from CR

* feat: add timeout option for cli
2024-11-18 07:10:40 +00:00
Weny Xu
9289265f54 fix: correct unset_maintenance_mode behavior (#5009) 2024-11-18 06:39:13 +00:00
Lanqing Yang
485782af51 fix: ensure Create Or Replace and If Not Exist cannot coexist in create view (#5003)
ensure Create Or Replace and If Not Exist cannot coexist in create view statement
2024-11-17 07:08:30 +00:00
Weny Xu
4b263ef1cc fix: obsolete wal entries while opening a migrated region (#4993)
* fix: delete obsolete wal entries while opening a migrated region

* chore: add logs

* chore: rust fmt

* fix: fix fuzz test
2024-11-15 10:55:22 +00:00
Weny Xu
08f59008cc refactor: introduce MaintenanceModeManager (#4994)
* refactor: introduce MaintenanceModeManager

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-11-15 07:15:22 +00:00
Yohan Wal
a2852affeb chore: rename change to modify (#5000)
* chore: rename change to modify

* chore: proto rev
2024-11-15 06:58:26 +00:00
Lanqing Yang
cdba7b442f feat: implement statement/execution timeout session variable (#4792)
* support set and show on statement/execution timeout session variables.

* implement statement timeout for mysql read, and postgres queries

* add mysql test with max execution time
2024-11-15 06:19:39 +00:00
Lin Yihai
42bf7e9965 refactor: Avoid wrapping Option for CacheManagerRef (#4996) 2024-11-14 13:37:02 +00:00
discord9
a70b4d7eba chore: update greptime-proto to e1070a (#4992)
* chore: update proto to e1070a

* fix: update proto usage
2024-11-14 11:59:00 +00:00
Zhenchi
408013c22b feat: add distance functions (#4987)
* feat: add distance functions

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: f64 instead

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* address comments

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* tiny adjust

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-14 10:18:58 +00:00
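Distance functions over vectors reduce to simple kernels; per the "fix: f64 instead" commit, results are returned as f64. A sketch of squared-Euclidean and cosine distance over `f32` slices; function names and exact signatures are illustrative:

```rust
/// Squared Euclidean distance, accumulated in f64.
fn l2_sq_distance(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = (*x - *y) as f64;
            d * d
        })
        .sum()
}

/// Cosine distance: 1 - cos(angle between a and b).
fn cosine_distance(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0f64, 0f64, 0f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (*x as f64, *y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    1.0 - dot / (na.sqrt() * nb.sqrt())
}

fn main() {
    let (a, b) = ([1.0f32, 0.0], [0.0f32, 1.0]);
    assert_eq!(l2_sq_distance(&a, &b), 2.0);
    assert!((cosine_distance(&a, &b) - 1.0).abs() < 1e-9); // orthogonal
}
```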
zyy17
22c8a7656b chore: update cluster dashboard (#4995) 2024-11-14 10:18:50 +00:00
discord9
35898f0b2e test: more sqlness tests for flow (#4988)
* tests: more flow testcase

* tests(WIP): more tests

* tests: more flow tests

* test: wired regex for sqlness

* refactor: put blog&example to two files
2024-11-14 07:40:14 +00:00
zyy17
1101e98651 refactor(grafana): update cluster dashboard (#4980) 2024-11-14 07:12:30 +00:00
zyy17
0089cf1b4f fix: run install.sh error (#4989)
* fix: use '/bin/sh' shebang and remove function key word

* ci: check install.sh in nightly CI
2024-11-13 21:54:24 +00:00
dennis zhuang
d7c3c8e124 fix: physical table statistics info (#4975)
* fix: physical table statistics info

* refactor: is_physical_table

* fix: remove file
2024-11-13 14:29:44 +00:00
Yohan Wal
f4b9eac465 build(deps): switch to upstream jsonb (#4986)
chore: jsonb
2024-11-13 11:15:37 +00:00
Yohan Wal
aa6c2de42a refactor: use UNSET instead of enable (#4983)
* refactor: use UNSET instead of enable

* fix: test

* chore: comment
2024-11-13 08:28:20 +00:00
discord9
175fddb3b5 fix: alter table add column id alloc mismatch (#4972)
* fix: alter table add column id alloc mismatch

* chore: remove debug code

* chore: typos

* chore: error variant

* chore: more checks for invariant

* refactor: better check&explain

* fix: exist column metadata correct

* chore: remove unused error variant
2024-11-13 07:02:35 +00:00
Lei, HUANG
6afc4e778a refactor(mito): tidy memtable stats (#4982)
* wip: share same WriteMetrics struct between different memtable implementations

* refactor: extract function to update memtable timestamp range
2024-11-13 06:57:27 +00:00
Lanqing Yang
3bbcde8e58 feat: support alter twcs compaction options (#4965)
support alter twcs compaction options
2024-11-13 06:27:32 +00:00
Weny Xu
3bf9981aab refactor: support distinct JSON format and improve type conversions (#4979)
refactor: support distinct JSON format
2024-11-13 03:03:51 +00:00
Weny Xu
c47ad548a4 feat: refine region state checks and handle stalled requests (#4971)
* feat: refine region state checks and handle stalled requests

* test: add tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: add comments
2024-11-13 02:51:27 +00:00
Lin Yihai
0b6d78a527 refactor: consolidate DatanodeClientOptions (#4966)
refactor!: consolidate `DatanodeClientOptions`
2024-11-12 09:57:41 +00:00
Zhenchi
d616bd92ef feat: introduce vector type (#4964)
* feat: introduce vector type

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: fix prepared stmt

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: add grpc test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: parse vector value

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: column to row

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: sqlness

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: merge issue

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: add check for bytes size

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* Update tests/cases/standalone/common/types/vector/vector.sql

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* chore: update proto

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: simplify cargo

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: address comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-11-12 08:28:44 +00:00
Yohan Wal
84aa5b7b22 feat: alter fulltext options (#4952)
* feat(WIP): alter fulltext index

Co-Authored-By: irenjj <renj.jiang@gmail.com>

* feat: alter column fulltext option

Co-Authored-By: irenjj <renj.jiang@gmail.com>

* chore: fmt

* test: add unit and integration tests

Co-Authored-By: irenjj <renj.jiang@gmail.com>

* test: update sqlness test

* chore: new line

* chore: lock file update

* chore: apply review comments

* test: update sqlness test

* test: update sqlness test

* fix: convert

* chore: apply review comments

* fix: toml fmt

* fix: tests

* test: add test for mito

* chore: error message

* fix: test

* fix: test

* fix: wrong comment

* chore: change proto rev

* chore: apply review comments

* chore: apply review comments

* chore: fmt

---------

Co-authored-by: irenjj <renj.jiang@gmail.com>
2024-11-12 03:04:04 +00:00
Zhenchi
cbf21e53a9 feat(puffin): apply range reader (#4928)
* wip

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(puffin): apply range reader

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: read_vec reduce iteration

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: simplify rangereader for vec<u8>

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: add unit test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: toml format

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-12 02:36:38 +00:00
Zhenchi
6248a6ccf5 feat(index): support SQL to specify inverted index columns (#4929)
* feat(index): support building inverted index for the field column

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* feat(index): support SQL to specify inverted index columns

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: fix sqlness

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: consider compatibility

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* polish

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* compatibility

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: ignore case

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* refactor: reduce dup

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: clippy

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-11-11 08:06:23 +00:00
localhost
0e0c4faf0d fix(otlp): replace otlp trace attr type from string to jsonb (#4918)
* chore: minor update

* chore: replace otlp trace attr type from string to jsonb

* chore: add new util file and remove useless code

* chore: add license header

* chore: remove unused error

* chore: adjust otlp traces column order

* chore: update test

* chore: minor fix

---------

Co-authored-by: shuiyisong <xixing.sys@gmail.com>
2024-11-08 06:34:49 +00:00
Kaifeng Zheng
1a02fc31c2 fix: json_path_exists null results (#4881)
* fix: result of nulls

* update test result

* fix null behaviors, add null tests

* update NULL tests

* error handler when parsing json_path

* change the logic so that items' datatypes in the input arrays are all the same

* remove a comment

* refactor: better logic

* drop unnecessary err check

* added an error test case
2024-11-08 03:01:45 +00:00
Ruihang Xia
8efbafa538 feat: support filter with windowed sort (#4960)
* feat: support windowed sort with where condition

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix split logic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* modify fuzz test to reflect logic change

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: handle sort that won't preserve partitioning

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix test case and add more cases

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-11-08 02:49:36 +00:00
Ruihang Xia
fcd0ceea94 fix: column already exists (#4961)
* fix: merge fetched logical metadata with existing cache

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix log acquire

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/metric-engine/src/engine/region_metadata.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-11-07 13:25:05 +00:00
jeremyhi
22f31f5929 chore: paginated query region stats (#4942) 2024-11-07 03:01:12 +00:00
Lei, HUANG
5d20acca44 fix: round euclidean result in sqlness (#4956) 2024-11-07 02:29:49 +00:00
Yingwen
e3733344fe fix: do not pick compacting/expired files (#4955) 2024-11-06 21:38:26 +00:00
dennis zhuang
305767e226 fix: bugs introduced by alter table options (#4953)
* fix: ChangeTableOptions display

* fix: partition number disappear after altering table options

* Update src/table/src/metadata.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-11-06 19:56:49 +00:00
dennis zhuang
22a662f6bc docs: add TOC to readme (#4949)
* docs: add TOC to readme

* chore: remove some links

* chore: tweak TOC

* chore: tweak TOC
2024-11-06 13:27:00 +00:00
Lin Yihai
1431393fc8 fix: the region_stats API will return an error in instance test (#4951) 2024-11-06 13:15:48 +00:00
localhost
dfe8cf25f9 chore: add json path for pipeline (#4925)
* chore: add json path for pipeline

* chore: change jsonpath lib verion

* chore: remove useless doc

* chore: fix json path test

* chore: fix pipeline json path test
2024-11-06 11:44:59 +00:00
Weny Xu
cccd25ddbb chore: fix typos in change log level doc (#4948)
chore: fix typos in change log level
2024-11-06 07:49:00 +00:00
discord9
ac387bd2af fix: pprof (#4938) 2024-11-05 09:22:29 +00:00
LFC
2e9737c01d refactor: pass LogicalPlan to promql execution interceptor (#4937) 2024-11-05 08:49:05 +00:00
Ning Sun
a8b426aebe feat: add more geo functions (#4888)
* chore: add type conversion for array types

* feat: add h3_cells_contains

* refactor: resolve lint issues

* feat: add sphere distance function

* feat: euclidean distance between h3 centroids

* test: round float number

* feat: add more geospatial functions

* test: add tests for geometry functions

* refactor: move wkt function to dedicated module

* feat: add st_area

* refactor: only allow sphere distance between points
2024-11-05 03:44:25 +00:00
jeremyhi
f3509fa312 chore: minor refactor for weighted choose (#4917)
* chore: minor refactor for weighted choose

* chore: by comment, remove the fast path of choose_multiple
2024-11-05 02:55:11 +00:00
Lei, HUANG
3dcd6b8e51 fix: database base ttl (#4926)
* main:
 Add common-meta dependency and implement SchemaMetadataManager

 - Introduce `common-meta` as a new dependency in `mito2`.
 - Implement `SchemaMetadataManager` for managing schema-level metadata.
 - Update `DatanodeBuilder` and `MitoEngine` to pass `KvBackendRef` for schema metadata management.
 - Add `SchemaMetadataManager` to `RegionWorkerLoop` for compaction handling.
 - Include `SchemaNameKey` usage in compaction-related code.
 - Add `database_metadata_manager` module with `SchemaMetadataManager` struct and associated logic.

* fix/database-base-ttl:
 Refactor metadata management and update compaction logic

 - Remove `database_metadata_manager` and introduce `schema_metadata_manager`
 - Update compaction logic to handle TTL based on schema metadata
 - Adjust tests to use `schema_metadata_manager` for setting up schema options
 - Fix engine creation in tests to pass `kv_backend` explicitly
 - Remove unused imports and apply minor code cleanups

* fix/database-base-ttl:
 Extend CREATE TABLE LIKE to inherit schema options

 - Implement inheritance of database level options for CREATE TABLE LIKE
 - Add schema options to SHOW CREATE TABLE output
 - Refactor create_table_stmt to include schema_options in SQL generation
 - Update error handling to include TableMetadataManagerSnafu

* fix/database-base-ttl:
 Refactor error handling and remove schema dependency in table creation

 - Replace expect with the ? operator for error handling in open_compaction_region
 - Simplify create_logical_tables by removing catalog and schema name parameters
 - Remove unnecessary schema retrieval and merging of schema options in create_table_info
 - Clean up unused imports and redundant code

* fix/database-base-ttl:
 Refactor error handling and update documentation comments

 - Update comment to reflect retrieval of schema options instead of metadata
 - Introduce new error type `GetSchemaMetadataSnafu` for schema metadata retrieval failures
 - Implement error handling for schema metadata retrieval in `find_ttl` function

* fix: toml

* fix/database-base-ttl:
 Refactor SchemaMetadataManager and adjust Cargo.toml dependencies

 - Remove unused imports in schema_metadata_manager.rs
 - Add conditional compilation for SchemaMetadataManager::new
 - Update Cargo.toml to remove "testing" feature from common-meta dependency in main section and add it to dev-dependencies

* fix/database-base-ttl:
 Fix typos in comments and function names across multiple modules

 - Correct spelling of 'parallelism' in region_server, engine, and scan_region modules
 - Amend typo in TODO comment from 'persisent' to 'persistent' in server module
 - Update incorrect test query from 'versiona' to 'version' in federated module tests

* fix/database-base-ttl: Add schema existence check in StatementExecutor for CREATE TABLE operation

* fix/database-base-ttl: Add warning log for failed TTL retrieval in compaction region open function

* fix/database-base-ttl:
 Refactor to use SchemaMetadataManagerRef in Datanode and MitoEngine

 - Replace KvBackendRef with SchemaMetadataManagerRef across various components.
 - Update DatanodeBuilder and MitoEngine to pass SchemaMetadataManagerRef instead of KvBackendRef.
 - Adjust test cases to use get_schema_metadata_manager method for consistency.
2024-11-05 02:51:32 +00:00
Weny Xu
f221ee30fd fix: violations of elided_named_lifetimes (#4936) 2024-11-04 10:52:39 +00:00
Yohan Wal
fb822987a9 refactor: refactor alter parser (#4933)
refactor: alter parser
2024-11-04 09:00:30 +00:00
Weny Xu
4ab6dc2825 feat: support to insert json data via grpc protocol (#4908)
* feat: support to insert json data via grpc protocol

* chore: handle error

* feat: introduce `prepare_rows`

* chore: fmt toml

* test: add row deletion test

* test: fix unit test

* chore: remove log

* chore: apply suggestions from CR
2024-11-04 08:55:47 +00:00
dennis zhuang
191755fc42 fix: data_length, index_length, table_rows in tables (#4927)
* fix: data_length, index_length, table_rows in tables

* feat: table stats only works for mito engine currently

* fix: tests

* fix: typo

* chore: log error when region_stats fails
2024-11-04 07:44:13 +00:00
Yohan Wal
1676d02149 fix: panic when jsonb corrupted (#4919)
* refactor: json type update

* test: update test

* fix: convert when needed

* revert: leave sqlness tests unchanged

* fix: fmt

* refactor: just refactor

* Apply suggestions from code review

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* refactor: parse jsonb first

* test: add bad cases

* Update src/datatypes/src/vectors/binary.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* fix: fmt

* fix: fix clippy/check

* fix: corrupted jsonb panic

* chore(deps): change to rev

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-11-04 06:55:14 +00:00
dennis zhuang
edc49623de chore: update default cache size to 1Gib (#4923)
* chore: update default cache size to 1Gib for object storage read/write cache

* feat: update docs

* fix: test
2024-11-04 03:53:17 +00:00
jeremyhi
9405d1c578 feat: heartbeat_flush_threshold option (#4924)
* feat: heartbeat_flush_threshold

* chore: rename to flush_stats_factor

* Update src/meta-srv/src/handler/collect_stats_handler.rs
2024-11-04 03:34:50 +00:00
dennis zhuang
7a4276c24a fix: typo (#4931)
fix/database-base-ttl:
 Fix typos in comments and function names across multiple modules

 - Correct spelling of 'parallelism' in region_server, engine, and scan_region modules
 - Amend typo in TODO comment from 'persisent' to 'persistent' in server module
 - Update incorrect test query from 'versiona' to 'version' in federated module tests

Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
2024-11-04 01:56:11 +00:00
Ruihang Xia
be72d3bedb feat: simple limit impl in PartSort (#4922)
* feat: simple limit impl in PartSort

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: update time_index method to return a non-optional String

Co-authored-by: Yingwen <realevenyag@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use builtin limit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add more info to analyze display

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-11-01 09:25:03 +00:00
discord9
1ff29d8fde chore: short desc markdown about change log level (#4921)
* chore: tiny doc about change log level

* chore: per review

* chore
2024-11-01 07:10:57 +00:00
Yingwen
39ab1a6415 feat: get row group time range from cached metadata (#4869)
* feat: get part range min-max from cache for unordered scan

* feat: seq scan push row groups if num_row_groups > 0

* test: test split

* feat: update comment

* test: fix split test

* refactor: rename get meta data method
2024-11-01 06:35:03 +00:00
Weny Xu
758ad0a8c5 refactor: simplify WeightedChoose (#4916)
* refactor: simplify WeightedChoose

* chore: remove unused errors
2024-10-31 06:22:30 +00:00
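Weighted selection of the `WeightedChoose` kind picks an item with probability proportional to its weight. A sketch using `rand`'s `WeightedIndex` (0.8-era API), which already samples indices in proportion to their weights; the item list and weights are made up, and this is not the meta-srv's actual selector:

```rust
use rand::distributions::{Distribution, WeightedIndex};

fn main() {
    // Hypothetical candidates and weights, e.g. datanodes by capacity.
    let items = ["datanode-a", "datanode-b", "datanode-c"];
    let weights = [5u32, 3, 2];
    // WeightedIndex samples index i with probability weight_i / sum.
    let dist = WeightedIndex::new(&weights).expect("non-negative weights, not all zero");
    let mut rng = rand::thread_rng();
    for _ in 0..3 {
        println!("chose {}", items[dist.sample(&mut rng)]);
    }
}
```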
Ruihang Xia
8b60c27c2e feat: enhance windowed-sort optimizer rule (#4910)
* add RegionScanner::metadata

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* skip PartSort when there is no tag column

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add more sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* handle desc

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: should keep part sort on DESC

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-31 06:15:45 +00:00
Yingwen
ea6df9ba49 fix: prune batches from memtable by time range (#4913)
* feat: add an iter to prune by time range

* feat: filter rows from mem range
2024-10-31 05:13:35 +00:00
Ning Sun
69420793e2 feat: implement parse_query api (#4860)
* feat: implement parse_query api

* chore: switch to upstream

* fix: add post method for parse_query

* chore: bump promql-parser

* test: use latest promql ast serialization
2024-10-30 12:16:22 +00:00
Yingwen
0da112b335 chore: provide more info in check batch message (#4906)
* chore: provide more info in check message

* chore: set timeout to 240s

---------

Co-authored-by: WenyXu <wenymedia@gmail.com>
2024-10-30 11:56:10 +00:00
dennis zhuang
dcc08f6b3e feat: adds the number of rows and index files size to region_statistics table (#4909)
* feat: adds index size to region statistics

* feat: adds the number of rows for region statistics

* test: adds sqlness test for region_statistics

* fix: test
2024-10-30 11:12:58 +00:00
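As a rough illustration of the region statistics surface that #4909 above extends (row counts and index file sizes), a hedged query sketch follows; placing the table under INFORMATION_SCHEMA and selecting all columns are assumptions for illustration, not taken from the commit itself.

```sql
-- Hypothetical query against the region statistics table extended in #4909;
-- the INFORMATION_SCHEMA location and column set are assumptions.
SELECT * FROM INFORMATION_SCHEMA.REGION_STATISTICS LIMIT 10;
```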
dennis zhuang
a34035a1f2 fix: set transaction variables not working in mysql protocol (#4912) 2024-10-30 10:59:13 +00:00
LFC
fd8eba36a8 refactor: make use of the "pre_execute" in sql execution interceptor (#4875)
* feat: dynamic definition of plugin options

* rebase

* revert

* fix ci
2024-10-30 09:16:46 +00:00
Ruihang Xia
9712295177 fix(config): update tracing section headers in example TOML files (#4898)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-30 08:31:31 +00:00
Lei, HUANG
d275cdd570 feat: Support altering table TTL (#4848)
* feat/alter-ttl:
 Update greptime-proto source and add ChangeTableOptions handling

 - Change greptime-proto source repository and revision in Cargo.lock and Cargo.toml
 - Implement handling for ChangeTableOptions in grpc-expr and meta modules
 - Add support for parsing and applying region option changes in mito2
 - Introduce new error type for invalid change table option requests
 - Add humantime dependency to store-api
 - Fix SQL syntax in tests for changing column types

* chore: remove write buffer size option handling since we don't support specifying write_buffer_size for single table or region

* persist ttl to manifest

* chore: add sqlness

* fix: sqlness

* fix: typo and toml format

* fix: tests

* update: change alter syntax

* feat/alter-ttl: Add Clone trait to RegionFlushRequest and remove redundant Default derive in region_request.rs.

* feat/alter-ttl: Refactor code to replace 'ChangeTableOption' with 'ChangeRegionOption' and handle TTL as a region option

 • Rename ChangeTableOption to ChangeRegionOption across various files.
 • Update AlterKind::ChangeTableOptions to AlterKind::ChangeRegionOptions.
 • Modify TTL handling to treat '0d' as None for TTL in table options.
 • Adjust related function names and comments to reflect the change from table to region options.
 • Include test case updates to verify the new TTL handling behavior.

* chore: update format

* refactor: update region options in DatanodeTableValue

* feat/alter-ttl:
 Remove TTL handling from RegionManifest and related structures

 - Eliminate TTL fields from `RegionManifest`, `RegionChange`, and associated handling logic.
 - Update tests and checksums to reflect removal of TTL.
 - Refactor `RegionOpener` and `handle_alter` to adjust to TTL removal.
 - Simplify `RegionChangeResult` by replacing `change` with `new_meta`.

* chore: fmt

* remove useless delete op

* feat/alter-ttl: Switch alter handling to store-api's ChangeOption
 - Updated Cargo.lock and the gRPC expression Cargo.toml to include the store-api dependency.
 - Refactored alter.rs to use ChangeOption from store-api instead of ChangeTableOptionRequest.
 - Adjusted error handling in error.rs to use MetadataError.
 - Modified handle_alter.rs to handle TTL changes with ChangeOption.
 - Simplified region_request.rs by replacing ChangeRegionOption with ChangeOption and removing redundant code.
 - Removed the UnsupportedTableOptionChange error in table/src/error.rs.
 - Updated metadata.rs to use ChangeOption for table options.
 - Removed the ChangeTableOptionRequest enum and related conversion code from requests.rs.

* feat/alter-ttl: Update greptime-proto dependency to revision 53ab9a9553

* chore: format code

* chore: update greptime-proto
2024-10-30 04:39:48 +00:00
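A minimal sketch of the table-level TTL alteration introduced by #4848 above. The exact ALTER syntax shown is an assumption (the commit notes the alter syntax changed during review); the '0d' behavior follows the commit's own description.

```sql
-- Hypothetical usage of the alterable table TTL from #4848; syntax is illustrative only.
ALTER TABLE metrics SET ttl = '7d';  -- retain rows for seven days
ALTER TABLE metrics SET ttl = '0d';  -- per the commit notes, '0d' is treated as "no TTL"
```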
Weny Xu
83eb777d21 test: add fuzz test for metric region migration (#4862)
* test: add fuzz tests for migrate metric regions

* test: insert values before migrating metric region

* feat: correct table num

* chore: apply suggestions from CR
2024-10-29 15:47:48 +00:00
Yohan Wal
8ed5bc5305 refactor: json conversion (#4893)
* refactor: json type update

* test: update test

* fix: convert when needed

* revert: leave sqlness tests unchanged

* fix: fmt

* refactor: just refactor

* Apply suggestions from code review

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* refactor: parse jsonb first

* test: add bad cases

* Update src/datatypes/src/vectors/binary.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* fix: fmt

* fix: fix clippy/check

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-10-29 15:46:24 +00:00
Weny Xu
9ded314905 feat: add json datatype for grpc protocol (#4897)
* chore: update greptime-proto

* feat: add json datatype for grpc protocol
2024-10-29 12:37:53 +00:00
discord9
702a55a235 chore: update proto depend (#4899) 2024-10-29 09:32:28 +00:00
discord9
f3e5a5a7aa ci: install numpy in CI (#4895)
chore: install numpy in CI
2024-10-29 07:57:40 +00:00
Zhenchi
9c79baca4b feat(index): support building inverted index for the field column on Mito (#4887)
feat(index): support building inverted index for the field column

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-10-29 07:57:17 +00:00
Ruihang Xia
03f2fa219d feat: optimizer rule for windowed sort (#4874)
* basic impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* implement physical rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: install windowed sort physical rule and optimize partition ranges

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add logs and sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: introduce PartSortExec for partitioned sorting

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune exec nodes' properties and metrics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* debug: add more info when results are very wrong

* debug: also print overlap ranges

* feat: add check when emit PartSort Stream

* dbg: info on overlap working range

* feat: check batch range is inside part range

* set distinguish partition range param

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: more logs

* update sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune optimizer

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix lints

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix windowed sort rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: early terminate sort stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: remove min/max check

* chore: remove unused windowed_sort module, uuid feature and refactor region_scanner to synchronous

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: print more fuzz log

* chore: more log

* fix: part sort should skip empty part

* chore: remove insert logs

* tests: empty PartitionRange

* refactor: testcase

* docs: update comment&tests: all empty

* ci: enlarge etcd cpu limit

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2024-10-29 07:46:05 +00:00
Lei, HUANG
0ee455a980 fix: pyo3 ut (#4894) 2024-10-29 04:47:57 +00:00
Lei, HUANG
eab9e3a48d chore: remove struct size assertion (#4885)
chore/remove-struct-size-assertion: Remove unit tests for parquet_meta_size function in cache_size.rs
2024-10-28 08:50:10 +00:00
Yingwen
1008af5324 feat!: Divide flush and compaction job pool (#4871)
* feat: divide flush/compact job pool

* feat!: divide bg jobs config

* docs: update config examples

* test: fix tests
2024-10-25 23:36:16 +00:00
discord9
2485f66077 chore: graceful exit on bind fail (#4882) 2024-10-25 09:29:39 +00:00
Weny Xu
4f3afb13b6 fix: fix broken import (#4880) 2024-10-25 07:09:51 +00:00
shuiyisong
32a0023010 chore: add schema urls to otlp logs (#4876)
* chore: add schema urls to otlp logs table

* chore: update meter-macros version to remove anymap warning

* chore: change span id and trace id to field
2024-10-25 03:45:24 +00:00
Kaifeng Zheng
4e9c251041 feat: add json_path_match udf (#4864)
* add json_path_match udf

* sql tests for json_path_match

* fix clippy & comment

* fix null value behavior

* added null tests

* adjust function's behavior on nulls

* update test cases

* fix null check of json
2024-10-25 03:13:34 +00:00
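A hedged sketch of the json_path_match UDF added in #4864 above, combined with parse_json from the earlier JSON work; the argument forms and the path-predicate syntax are assumptions for illustration.

```sql
-- Hypothetical call to json_path_match (#4864); the path-predicate syntax is assumed.
SELECT json_path_match(parse_json('{"a": {"b": 2}}'), '$.a.b == 2');
```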
Lei, HUANG
e328c7067c chore: update Rust toolchain to 2024-10-19 (#4857)
* update rust toolchain

* change toolchain to 2024-10-17

* fix: clippy

* fix: ut

* bump shadow-rs

* fix: use nightly-2024-10-19

* fix: clippy

* chore/udapte-toolchain-2024-10-17: Update DEV_BUILDER_IMAGE_TAG to 2024-10-19-a5c00e85-20241024184445 in Makefile
2024-10-25 00:23:32 +00:00
Weny Xu
8b307e4548 feat: introduce the PluginOptions (#4835)
* feat: introduce the `PluginOptions`

* chore: apply suggestions from CR
2024-10-24 12:02:10 +00:00
discord9
ff38abde2e chore: better column schema check for flow (#4855)
* chore: better column schema check for flow

* chore: better msg

* tests: clean up after tests

* chore: better msg

* chore: per review

* tests: sqlness
2024-10-24 09:43:32 +00:00
jeremyhi
aa9a265984 chore: make pusher log easy to understand (#4841)
* chore: make pusher log easy to understand

* Update src/meta-srv/src/service/heartbeat.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

* Update src/meta-srv/src/service/heartbeat.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

* chore: by comment

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-10-24 07:44:16 +00:00
pa
9d3ee6384a feat: Limit CPU in runtime (#3685) (#4782)
feat: add throttle runtime (#3685)
2024-10-24 07:30:24 +00:00
localhost
fcde0a4874 feat: Add functionality to the OpenTelemetry write interface to extract fields from attr to top-level data. (#4859)
* chore: add otlp select

* chore: change otlp select

* chore: remove json path

* chore: format toml

* chore: change opentelemetry extract keys header name

* chore: add some doc and remove useless code and lib

* chore: make clippy happy

* chore: fix by pr comment

* chore: fix by pr comment

* chore: change the default semantic type for some opentelemetry logs select keys
2024-10-24 05:55:57 +00:00
Weny Xu
5d42e63ab0 fix!: replace timeout_millis and connect_timeout_millis with Duration in DatanodeClientOptions (#4867)
* fix: correct options struct

* fix: fix unit test
2024-10-23 08:20:34 +00:00
discord9
0c01532a37 feat: Sort within each PartitionRange (#4847)
* feat: PartSort

* chore: rm unused

* chore: typo

* chore: mem pool df

* chore: add location to arrow error

* refactor: test_util

* refactor: per review

* chore: rm unused

* chore: more cases

* chore: test&buffer clear

* fix: remove fetch

* chore: fmt

* chore: per review

* chore: rm unused
2024-10-23 07:01:55 +00:00
ZonaHe
6d503b047a feat: update dashboard to v0.6.0 (#4861)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-10-22 02:34:09 +00:00
Yingwen
5d28f7a912 feat: yields empty batch after reading a range (#4845)
* feat: add empty batch to end of range stream

* feat: add batch validation

* fix: validate batch order

* fix: not yield empty batch in compaction

* fix: empty record batch

* feat: add a flag to enable empty batch
2024-10-21 13:52:47 +00:00
Lei, HUANG
a50eea76a6 chore: bump greptime-meter (#4858)
chore/bump-greptime-meter: Add meter-core package and update meter-core dependency across various packages to
new git revision.
2024-10-21 08:18:30 +00:00
Yingwen
2ee1ce2ba1 docs: change cpu/mem panel to time-series (#4844)
* docs: change cpu/mem panel to time-series

* docs: update version
2024-10-18 08:42:01 +00:00
Weny Xu
c02b5dae93 chore: bump version to 0.9.5 (#4853) 2024-10-18 08:07:13 +00:00
Weny Xu
081c6d9e74 fix: flush metric metadata region (#4852)
* fix: flush metric metadata region

* chore: apply suggestions from CR
2024-10-18 07:21:35 +00:00
Weny Xu
ca6e02980e fix: overwrite entry_id if entry id is less than start_offset (#4842)
* fix: overwrite entry_id if entry id is less than start_offset

* feat: add `overwrite_entry_start_id` to options

* chore: update config.md
2024-10-18 06:31:02 +00:00
Weny Xu
74bdba4613 fix: fix metadata forward compatibility issue (#4846) 2024-10-18 06:26:41 +00:00
Weny Xu
2e0e82ddc8 chore: update greptime-proto to b4d3011 (#4850) 2024-10-18 04:10:22 +00:00
Yingwen
e0c4157ad8 feat: Seq scanner scans data by time range (#4809)
* feat: seq scan by partition

* feat: part metrics

* chore: remove unused codes

* chore: fmt stream

* feat: build ranges returns smallvec

* feat: move scan mem/file ranges to util and reuse

* feat: log metrics

* chore: correct some metrics

* feat: get explain info from ranges

* test: group test and remove unused codes

* chore: fix clippy

* feat: change PartitionRange end to exclusive

* test: add tests
2024-10-17 11:05:12 +00:00
discord9
613e07afb4 feat: window sort physical plan (#4814)
* WIP

* feat: range split& tests

* WIP: split range

* add sort exprs

* chore: typo

* WIP

* feat: find successive runs

* WIP

* READY FOR REVIEW PART ONE: more tests

* refactor: break into smaller functions

* feat: precompute working range(need testing)

* tests: on working range

* tests: on working range

* feat: support rev working range

* feat(to be tested): core logic of merge sort

* fix: poll results

* fix: find_slice_from_range&test

* chore: remove some unused util func&fields

* chore: typos

* chore: impl exec plan for WindowedSortExec

* test(WIP): window sort stream

* test: window sort stream

* chore: remove unused

* fix: fetch

* fix: WIP intersection remaining

* test: fix and test!

* chore: remove outdated comments

* chore: rename test

* chore: remove dbg line

* chore: sorted runs

* feat: handling unexpected data

* chore: unused

* chore: remove a print in test

* chore: per review

* docs: wrong comment

* chore: more test cases
2024-10-16 11:50:25 +00:00
Weny Xu
0ce93f0b88 chore: add more metrics for region migration (#4838) 2024-10-16 09:36:57 +00:00
Ning Sun
c231eee7c1 fix: respect feature flags for geo function (#4836) 2024-10-16 07:46:31 +00:00
Yiran
176f2df5b3 fix: dead links (#4837) 2024-10-16 07:43:14 +00:00
localhost
4622412dfe feat: add API to write OpenTelemetry logs to GreptimeDB (#4755)
* chore: otlp logs api

* feat: add API to write OpenTelemetry logs to GreptimeDB

* chore: fix test data schema error

* chore: modify the underlying data structure of the pipeline value map type from hashmap to btreemap to keep key order

* chore: fix by pr comment

* chore: resolve conflicts and add some test

* chore: remove useless error

* chore: change otlp header name

* chore: fmt code

* chore: fix integration test for otlp log write api

* chore: fix by pr comment

* chore: set otlp body with fulltext default
2024-10-16 04:36:08 +00:00
jeremyhi
59ec90299b refactor: metasrv cannot be cloned (#4834)
* refactor: metasrv cannot be cloned

* chore: remove MetasrvInstance's clone
2024-10-15 11:36:48 +00:00
discord9
16b8cdc3d5 chore: bump version v0.9.4 (#4833) 2024-10-15 10:48:03 +00:00
Weny Xu
3197b8b535 feat: introduce default customizers (#4831)
* feat: introduce `DefaultHeartbeatHandlerGroupBuilderCustomizer` and `DefaultLeadershipChangeNotifierCustomizer`

* chore: code styling
2024-10-15 09:48:13 +00:00
zyy17
972c2441af chore: bump promql-parser to v0.4.1 and use to_string() for EvalStmt (#4832)
chore: bump promql-parser to v0.4.1 and use to_string() for EvalStmt
2024-10-15 08:50:37 +00:00
Ning Sun
bb8b54b5d3 feat: add some s2 geo functions (#4823)
* feat: add first batch of s2 functions

* refactor: update reusable code from main

* test: add sqlness tests for s2

* feat: add tostring function for s2

* Update src/common/function/src/scalars/geo/s2.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* Apply suggestions from code review

* one more change

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-15 06:47:29 +00:00
Weny Xu
b5233e500b feat: defer HeartbeatHandlerGroup construction and enhance LeadershipChangeNotifier (#4826)
* feat: enhance `HeartbeatHandlerGroup`

* chore: apply suggestions from CR

* chore: minor refactoring

* chore: code styling

* chore: apply suggestions from CR
2024-10-15 03:35:31 +00:00
Ruihang Xia
b61a388d04 refactor: replace info logs with debug logs in region server (#4829)
* refactor: replace info logs with debug logs in region server

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: update error handling for closing and opening nonexistent regions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-14 12:46:07 +00:00
Ruihang Xia
06e565d25a feat: cache logical region's metadata (#4827)
* feat: cache logical region's metadata

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: implement logical region locking for metadata operations

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: correct typo in comment for MetadataRegion struct

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-14 08:44:13 +00:00
Yingwen
3b2ce31a19 feat: enable prof features by default (#4815)
* feat: enable prof by default

* docs: don't need to build with features

* feat: add common-pprof as optional dep for pprof feature

* build: remove optional

* feat: use dump_text
2024-10-14 03:32:47 +00:00
Ruihang Xia
a889ea88ca fix: case sensitive for __field__ matcher (#4822)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-14 03:18:59 +00:00
Yingwen
2f2b4b306c feat!: implement interval type by multiple structs (#4772)
* define structs and methods

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: re-implement interval types in time crate

* feat: use new

* feat: interval value

* feat: query crate interval

* feat: pg and mysql interval

* chore: remove unused imports

* chore: remove commented codes

* feat: make flow compile but may not work

* feat: flow datetime

* test: fix some tests

* test: fix some flow tests(WIP)

* chore: some fix test&docs

* fix: change interval order

* chore: remove unused codes

* chore: fix clippy

* chore: now signature change

* chore: remove todo

* feat: update error message

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: discord9 <discord9@163.com>
2024-10-14 03:09:03 +00:00
jeremyhi
856c0280f5 feat: remove the distributed lock (#4825)
* feat: remove the distributed lock as we do not need it any more

* chore: delete todo comment

* chore: remove unused error
2024-10-12 09:04:22 +00:00
Ning Sun
aaa9b32908 feat: add more h3 functions (#4770)
* feat: add more h3 grid functions

* feat: add more traversal functions

* refactor: update some function definitions

* style: format

* refactor: avoid creating slice in nested loop

* feat: ensure column number and length

* refactor: fix lint warnings

* refactor: merge main

* Apply suggestions from code review

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* Update src/common/function/src/scalars/geo/h3.rs

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* style: format

---------

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2024-10-11 17:57:54 +00:00
Weny Xu
4bb1f4f184 feat: introduce LeadershipChangeNotifier and LeadershipChangeListener (#4817)
* feat: introduce `LeadershipChangeNotifier`

* refactor: use `LeadershipChangeNotifier`

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: adjust log styling
2024-10-11 12:48:53 +00:00
Weny Xu
0f907ef99e fix: correct table name formatting (#4819) 2024-10-11 11:32:15 +00:00
Ruihang Xia
a61c0bd1d8 fix: error in admin function is not formatted properly (#4820)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-10-11 10:02:45 +00:00
Lei, HUANG
7dd0e3ab37 fix: Panic in UNION ALL queries (#4796)
* fix/union_all_panic:
 Improve MetricCollector by incrementing level and fix underflow issue; add tests for UNION ALL queries

* chore: remove useless documentation

* fix/union_all_panic: Add order by clause to UNION ALL select queries in tests
2024-10-11 08:23:01 +00:00
Yingwen
d168bde226 feat!: move v1/prof API to debug/prof (#4810)
* feat!: move v1/prof to debug/prof

* docs: update readme

* docs: move prof docs to docs dir

* chore: update message

* feat!: remove v1/prof

* docs: update mem prof docs
2024-10-11 04:16:37 +00:00
jeremyhi
4b34f610aa feat: information extension (#4811)
* feat: information extension

* Update manager.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* chore: by comment

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-10-11 03:13:49 +00:00
Weny Xu
695ff1e037 feat: expose RegionMigrationManagerRef (#4812)
* chore: expose `RegionMigrationProcedureTask`

* fix: fix typos

* chore: expose `tracker`
2024-10-11 02:40:51 +00:00
Yohan Wal
288fdc3145 feat: json_path_exists udf (#4807)
* feat: json_path_exists udf

* chore: fix comments

* fix: caution when copy&paste QAQ
2024-10-10 14:15:34 +00:00
discord9
a8ed3db0aa feat: Merge sort Logical plan (#4768)
* feat(WIP): MergeSort

* wip

* feat: MergeSort LogicalPlan

* update sqlness result

* Apply suggestions from code review

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* refactor: per review advice

* refactor: more per review

* chore: per review

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-09 09:37:27 +00:00
Kaifeng Zheng
0dd11f53f5 feat: add json format output for http interface (#4797)
* feat: json output format for http

* feat: add json result test case

* fix: typo and refactor a piece of code

* fix: cargo check

* move affected_rows to top level
2024-10-09 07:11:57 +00:00
Ning Sun
19918928c5 feat: add function to aggregate path into a geojson path (#4798)
* feat: add geojson function to aggregate paths

* test: add sqlness results

* test: add sqlness

* refactor: corrected to aggregation function

* chore: update comments

* fix: make linter happy again

* refactor: rename to remove `geo` from `geojson` function name

The return type is not geojson at all. It's just compatible with geojson's
coordinates part and superset's deckgl path plugin.
2024-10-09 02:38:44 +00:00
shuiyisong
5f0a83b2b1 fix: ts conversion during transform phase (#4790)
* fix: allow ts conversion during transform phase

* chore: replace `unimplemented` with snafu
2024-10-08 17:54:44 +00:00
localhost
71a66d15f7 chore: add json write (#4744)
* chore: add json write

* chore: add test for write json log api

* chore: enhancement of Error Handling

* chore: fix by pr comment

* chore: fix by pr comment

* chore: enhancement of error content and add some doc
2024-10-08 12:11:09 +00:00
Weny Xu
2cdd103874 feat: introduce HeartbeatHandlerGroupBuilderCustomizer (#4803)
* feat: introduce `HeartbeatHandlerGroupBuilderFinalizer`

* chore: rename to `HeartbeatHandlerGroupBuilderCustomizer`
2024-10-08 09:02:06 +00:00
Ning Sun
4dea4cac47 refactor: change sqlness ports to avoid conflict with local instance (#4794) 2024-10-08 07:33:24 +00:00
Kaifeng Zheng
a283e13da7 feat: set max log files to 720 by default, info log only (#4787)
* feat: set max log files to 720 by default, info log only

* expose max_log_files in tomls

* include dir info when panicking, limit max_log_files of err_log to 30, and that of slow_queries to opt.max_log_files

* fix clippy

* update config.md

* update expected config str

* limit err_log max files size to `max_log_files` too, include err info when panicking, put `max_l_f` in right position

* fix typos

* chore: config

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-10-04 18:05:40 +00:00
JohnsonLee
47a3277d12 feat: customize channel information for sqlness tests (#4729)
* feat: add pg and mysql server_address options

* feat: start pg and mysql server(standalone)

* feat: start pg and mysql in distribute

* feat: finally get there, specify postgres sqlness

* feat: support mysql sqlness

* fix: license

* fix: remove unused import

* fix: toml

* fix: clippy

* refactor: BeginProtocolInterceptorFactory to ProtocolInterceptorFactory

* fix: sqlness pg connect

* fix: clippy

* Apply suggestions from code review

Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: rustfmt

* fix: reconnect pg and mysql when restart

* test: add mysql related sqlness

* fix: wait for start while restarting

* fix: clippy

* fix: cargo lock conflict

fix: Cargo.lock conflict

* fix: usage of '@@tx_isolation' in sqlness

* fix: typos

* feat: retry with backoff when create client

* fix: use millisecond rather than microseconds in backoff

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-10-04 09:26:57 +00:00
Yingwen
caf5f2c7a5 docs: add TM to logos (#4789) 2024-10-01 03:35:04 +00:00
Weny Xu
c1e8084af6 feat: add add_handler_after, add_handler_before, replace_handler (#4788)
* feat: add `add_handler_after`, `add_handler_before`, `replace_handler`

* chore: apply suggestions from CR

* test: add more tests

* feat: use `Vec` instead of `LinkedList`

* Update src/meta-srv/src/lib.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-09-30 08:53:59 +00:00
Weny Xu
6e776d5f98 feat: support to reject write after flushing (#4759)
* refactor: use `RegionRoleState` instead of `RegionState`

* feat: introducing `RegionLeaderState::Downgrading`

* refactor: introduce `set_region_role_state_gracefully`

* refactor: use `set_region_role` instead of `set_writable`

* feat: support to reject write after flushing

* fix: fix unit tests

* test: add unit test for `should_reject_write`

* chore: add comments

* chore: refine comments

* fix: fix unit test

* test: enable fuzz tests for Local WAL

* chore: add logs

* chore: rename `RegionStatus` to `RegionState`

* feat: introduce `DowngradingLeader`

* chore: rename

* refactor: refactor `set_role_state` tests

* test: ensure downgrading region will reject write

* chore: enhance logs

* chore: refine name

* chore: refine comment

* test: add tests for `set_role_role_state`

* fix: fix unit tests

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-09-30 08:28:51 +00:00
zyy17
e39a9e6feb feat: add StatementStatistics for slow query logging implementation (#4719)
* feat: log slow query

* feat: log slow query for sql

* refactor: add slow query logging options

* ci: fix errors

* feat: add StatementStatistics

* chore: revert modification of servers crate

* docs: update config docs

* fix: clippy errors
2024-09-30 03:26:50 +00:00
Weny Xu
77af4fd981 refactor: introduce HeartbeatHandlerGroupBuilder (#4785) 2024-09-30 02:56:53 +00:00
Yingwen
cd55202136 feat: unordered scanner scans data by time ranges (#4757)
* feat: define range meta

* feat: group ranges

* feat: split range

* feat: build ranges from the scan input

* feat: get partition range from range meta

* feat: build file range

* feat: unordered scan read by ranges

* feat: wip for mem ranges

* feat: build ranges

* feat: remove unused codes

* chore: update comments

* feat: update metrics

* chore: address review comments

* chore: debug assertion
2024-09-29 07:14:48 +00:00
Lei, HUANG
50cb59587d chore: replace anymap with anymap2 (#4781) 2024-09-29 06:27:35 +00:00
Lei, HUANG
0a82b12d08 fix(sqlness): sqlness isolation (#4780)
* fix: isolate logs

* fix: copy cases

* fix: clippy
2024-09-29 05:54:01 +00:00
LFC
d9f2f0ccf0 feat: add a new status code for "external" errors (#4775)
* feat: add a new status code for "external" errors

* Update src/auth/src/error.rs

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>

* support mysql cli cleartext auth

* resolve PR comments

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2024-09-29 03:38:50 +00:00
Ning Sun
cedbbcf2b8 fix: update pgwire for potential issue with connection establish (#4783) 2024-09-28 19:27:31 +00:00
Ning Sun
d6be44bc7f fix: dead loop on detecting postgres ssl handshake (#4778) 2024-09-28 01:39:29 +00:00
JohnsonLee
3a46c1b235 fix: use information_schema returns Unknown database (#4774)
* fix: use information_schema returns Unknown database 'information_schema'

* test: make sure 'use information_schma' successful
2024-09-27 23:15:48 +00:00
Lei, HUANG
934bc13967 feat(mito): limit compaction output file size (#4754)
* Commit Message

 Clarify documentation for CompactionOutput struct

 Updated the documentation for the `CompactionOutput` struct to specify that the output time range is only relevant for windowed compaction.

* Add max_output_file_size to TwcsPicker and TwcsOptions

 - Introduced `max_output_file_size` to `TwcsPicker` struct and its logic to enforce output file size limits during compaction.
 - Updated `TwcsOptions` to include `max_output_file_size` and adjusted related tests.
 - Modified `new_picker` function to initialize `TwcsPicker` with the new `max_output_file_size` field.

* feat/limit-compaction-output-size:
 Refactor compaction picker and TWCS to support append mode and improve options handling

 - Update compaction picker to accept a reference to options and append mode flag
 - Modify TWCS picker logic to consider append mode when filtering deleted rows
 - Remove VersionControl usage in compactor and simplify return type
 - Adjust enforce_max_output_size logic in TWCS picker to handle max output file size
 - Add append mode flag to TwcsPicker struct
 - Fix incorrect condition in TWCS picker for enforcing max output size
 - Update region options tests to reflect new max output file size format (1GB and 7MB)
 - Simplify InvalidTableOptionSnafu error handling in create_parser
 - Add `compaction.twcs.max_output_file_size` to mito engine option keys

* resolve some comments
2024-09-27 11:17:36 +00:00
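To make the new option key named in #4754 concrete, a hedged DDL sketch follows; the table schema and the 'compaction.type' key are assumptions, while the `compaction.twcs.max_output_file_size` key and the '1GB' value format come from the commit message itself.

```sql
-- Hypothetical table using the TWCS output-size limit from #4754; schema is illustrative.
CREATE TABLE sensor_data (
  ts TIMESTAMP TIME INDEX,
  host STRING,
  cpu DOUBLE,
  PRIMARY KEY (host)
) WITH (
  'compaction.type' = 'twcs',
  'compaction.twcs.max_output_file_size' = '1GB'
);
```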
Weny Xu
4045298cb2 feat: add region_statistics table (#4771)
* refactor: introduce `region_statistic`

* refactor: move DatanodeStat related structs to common_meta

* chore: add comments

* feat: implement `list_region_stats` for `ClusterInfo` trait

* feat: add `region_statistics` table

* feat: add table_id and region_number fields

* chore: rename unused snafu

* chore: update sqlness results

* chore: avoid to print source in error msg

* chore: move `procedure_info` under `greptime` catalog

* chore: apply suggestions from CR

* Update src/common/meta/src/datanode.rs

Co-authored-by: jeremyhi <jiachun_feng@proton.me>

---------

Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2024-09-27 09:54:52 +00:00
Lanqing Yang
cc4106cbd2 feat: protect datanode with concurrency limit. (#4699)
Adding parallelism in region server to protect datanode from query overload.
2024-09-27 06:12:57 +00:00
localhost
627a326273 refactor: Change the error type in the pipeline crate from String to Error (#4763)
* chore: in process

* chore: change pipeline crate error type

* chore: improve event error

* chore: fix by pr comment

* chore: use snafu context replace ok_or_else

* refactor: update snafu usage

---------

Co-authored-by: Ning Sun <sunng@protonmail.com>
2024-09-25 19:32:34 +00:00
discord9
0274e752ae chore: make sure aws-lc-sys wouldn't be built (#4767)
* chore: make sure aws-lc-sys wouldn't be built

* fix: bash

* refactor: multiple line bash
2024-09-25 17:11:37 +00:00
Lei, HUANG
cd4bf239d0 chore: relax table name constraint (#4766)
chore/relax-table-name-constraint: Updated NAME_PATTERN to allow '@' and '#' characters and adjusted tests for new table name validation rules.
2024-09-25 02:45:18 +00:00
Ning Sun
e3c0b5482f feat: returning warning instead of error on unsupported SET statement (#4761)
* feat: add capability to send warning to pgclient

* fix: refactor query context to carry query scope data

* feat: return a warning for unsupported postgres statement
2024-09-24 08:45:55 +00:00
Weny Xu
d1b252736d refactor: unify the styling in create_or_alter_tables_on_demand (#4756)
* refactor: refactor `create_or_alter_tables_on_demand`

* chore: apply suggestions from CR
2024-09-24 03:48:16 +00:00
discord9
54f6e13d13 fix: Release CI & make rustls use ring (#4750)
* feat: new dev-build

* fix: copy

* chore: rm dbg run

* fix: binstall install script

* chore: typos

* fix: update useable image

* fix: properly uninstall gcc-9

* chore: another try

* chore: print current cc version

* chore: another try to release

* fix: use gcc-10 by `update-alternatives`

* chore: update dev-build image

* remove gcc-9 again and install make/cmake

* use new image

* alias gcc-10/g++-10 to every variant

* again.....

* fix....

* again release ....

* chore: remove auto remove

* feat: rustls default to ring

* chore: update Cargo.lock

* chore: update Cargo.lock

* Apply suggestions from code review

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-24 03:11:12 +00:00
Ruihang Xia
5c64f0ce09 refactor!: simplify NativeType trait and remove percentile UDAF (#4758)
* refactor!: simplify NativeType trait and remove percentile UDAF

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove NativeType

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* recover a mis-deleted case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-23 10:55:20 +00:00
Ruihang Xia
2feddca1cb feat: include order by to commutativity rule set (#4753)
* feat: include order by to commutativity rule set

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune sqlness replace interceptor

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-23 08:35:06 +00:00
Ning Sun
0f99218386 feat: list/array/timezone support for postgres output (#4727)
* feat: list/array support for postgres output

* fix: implement time zone support for postgrsql

* feat: add a geohash function that returns array

* fix: typo

* fix: lint warnings

* test: add sqlness test

* refactor: check resolution range before convert value

* fix: test result for sqlness

* feat: upgrade pgwire apis
2024-09-22 02:39:38 +00:00
Weny Xu
163cea81c2 feat: migrate local WAL regions (#4715)
* feat: allow to flush region before migrating

* fix: fix unit tests

* feat: allow to set `flush_timeout`

* feat: skip to replay memtable

* fix: fix unit tests

* test: add more tests

* refactor: simplify timeout logical

* test: add unit tests

* test: add unit tests

* chore: update comments

* fix: fix unit tests

* fix: fmt and clippy

* feat: change default timeout to 30s

* fix: throw `ExceededDeadline` error

* test: add tests for `downgrade_region_with_retry`

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* chore: update proto to `3633474`

* refactor: refactor `upgrade_region_with_retry`

* chore: apply suggestions from CR
2024-09-20 08:27:20 +00:00
taobo
0c9b8eb0d2 feat: improve observability for procedure (#4675)
* feat: improve observability for procedure

* fix: test error

* test: add sqlness test for information_schema.procedure_info

* fix: sqlness test error

* fix: cr comment

* chore: update proto version

* fix: apply cr comment

* update version

* fix: cr comment

* optimize procedure type output format

* upgrade dep version

* fix: clippy error

* fix: `procedure` borrowed error

* fix: optimize code
2024-09-20 06:07:53 +00:00
Ning Sun
75c6fad1a3 feat: add more h3 scalar functions (#4707)
* feat: add more h3 scalar functions

* chore: comment up
2024-09-20 04:19:50 +00:00
Yingwen
e12ffbeb2f feat: flush other workers if still need flush (#4746) 2024-09-20 02:55:31 +00:00
discord9
c4e52ebf91 feat: use new image for gcc-10 (#4748)
feat: use new image
2024-09-20 02:34:45 +00:00
Yingwen
f02410c39b fix: disable field pruning in last non null mode (#4740)
* fix: don't prune fields in last non null mode

* test: add sqlness test for field pruning

* test: add flush

* refine implementation

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-20 00:35:37 +00:00
Ruihang Xia
f5cf25b0db refactor: remove DfPlan wrapper (#4733)
* refactor: remove DfPlan wrapper

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix test assertion

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-19 12:29:33 +00:00
liyang
1acda74c26 fix: cannot input tag for the dev-builder image (#4743) 2024-09-19 11:14:34 +00:00
Yohan Wal
95787825f1 build(deps): use original jsonb repo (#4742) 2024-09-19 09:44:44 +00:00
Weny Xu
49004391d3 chore(fuzz): print table name for debugging (#4738)
* chore(fuzz): print table name for debugging

* chore: apply suggestions
2024-09-19 09:40:10 +00:00
discord9
d0f5b2ad7d fix: use gcc-10 in release dev build (#4741) 2024-09-19 09:34:06 +00:00
Yohan Wal
0295f8dbea docs: json datatype rfc (#4515)
* docs: json datatype rfc

* docs: turn to a jsonb proposal

* chore: fix typo

* feat: add store and query process

* fix: typo

* fix: use query nodes instead of query plans

* feat: a detailed overview of query

* fix: grammar

* fix: use independent cast function

* fix: unify cast function

* fix: refine, make statements clear

* docs: update rfc according to impl

* docs: refine

* docs: fix wrong arrows

* docs: refine

* docs: fix some errors qaq
2024-09-19 05:49:10 +00:00
Ning Sun
8786624515 feat: improve support for postgres extended protocol (#4721)
* feat: improve support for postgres extended protocol

* fix: lint fix

* fix: test code

* fix: adopt upstream

* refactor: remove dup code

* refactor: avoid copy on error message
2024-09-19 05:30:56 +00:00
shuiyisong
52d627e37d chore: add log ingest interceptor (#4734)
* chore: add log ingest interceptor

* chore: rename

* chore: update interceptor signature
2024-09-19 05:14:47 +00:00
Lei, HUANG
b5f7138d33 refactor(tables): improve tables performance (#4737)
* chore: cherrypick 52e8eebb2dbbbe81179583c05094004a5eedd7fd

* refactor/tables: Change variable from immutable to mutable in KvBackendCatalogManager's method

* refactor/tables: Replace unbounded channel with bounded and use semaphore for concurrency control in KvBackendCatalogManager

* refactor/tables: Add common-runtime dependency and update KvBackendCatalogManager to use common_runtime::spawn_global

* refactor/tables: Await on sending error through channel in KvBackendCatalogManager
2024-09-19 04:44:02 +00:00
Ning Sun
08bd40333c feat: add an option to turn on compression for arrow output (#4730)
* feat: add an option to turn on compression for arrow output

* fix: typo
2024-09-19 04:38:41 +00:00
discord9
d1e0602c76 fix: opensrv Use After Free update (#4732)
* chore: version skew

* fix: even more version skew

* feat: use `ring` instead of `aws-lc` for remove nasm assembler on windows

* feat: use `ring` for pgwire

* feat: change to use `aws-lc-sys` on windows instead

* feat: change back to use `ring`

* chore: provide CryptoProvider

* feat: use upstream repo

* feat: install ring crypto lib in main

* chore: use same fn to install in tests

* feat: make pgwire use `ring`
2024-09-19 04:12:13 +00:00
Weny Xu
befb6d85f0 fix: determine region role by using is_readonly (#4725)
fix: correct `is_writable` behavior
2024-09-18 22:17:39 +00:00
Yohan Wal
f73fb82133 feat: add respective json_is UDFs for JSON type (#4726)
* feat: add respective json_is UDFs

* refactor: rename to_json to parse_json

* chore: happy clippy

* chore: some rename

* fix: small fixes
2024-09-18 11:07:30 +00:00
shuiyisong
50b3bb4c0d fix: sort cargo toml (#4735) 2024-09-18 09:19:05 +00:00
zyy17
0847ff36ce fix: config test failed and use similar_asserts::assert_eq to replace assert_eq for long string compare (#4731)
* fix: config test failed and use 'similar_asserts::assert_eq' to replace 'assert_eq' for long string compare

* Update Cargo.toml

Co-authored-by: Yingwen <realevenyag@gmail.com>

* Update src/cmd/tests/load_config_test.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-09-18 07:41:25 +00:00
shuiyisong
c014e875f3 chore: add auto-decompression layer for otlp http request (#4723)
* chore: add auto-decompression for http request

* test: otlp
2024-09-18 04:32:00 +00:00
Zhenchi
3b5b906543 feat(index): add explicit adapter between RangeReader and AsyncRead (#4724)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-09-18 03:33:55 +00:00
Weny Xu
d1dfffcdaf chore: enable fuzz test for append table (#4702)
* chore: enable fuzz test for append table

* fix: fix mysql translator
2024-09-18 03:01:30 +00:00
localhost
36b1bafbf0 fix: pipeline dissect error is returned directly to the user, instead of printing a warn log (#4709)
* fix: pipeline dissect error is returned directly to the user, instead of printing a warn log

* chore: add more test for pipeline
2024-09-12 18:21:05 +00:00
Yohan Wal
67fb3d003e feat: add respective get_by_path UDFs for JSON type (#4720)
* feat: add respective get_by_path udf for json type

* Apply review comments

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* fix: fix compile error

* refactor: change name of UDFs, add some tests

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-09-11 08:17:57 +00:00
zyy17
aa03d3b11c docs: use docs comment prefix and bump toml2docs version (#4711) 2024-09-11 07:49:23 +00:00
discord9
a3d567f0c9 perf(flow): use batch mode for flow (#4599)
* generic bundle trait

* feat: impl get/let

* fix: drop batch

* test: tumble batch

* feat: use batch eval flow

* fix: div use arrow::div not mul

* perf: not append batch

* perf: use bool mask for reduce

* perf: tiny opt

* perf: refactor slow path

* feat: opt if then

* fix: WIP

* perf: if then

* chore: use trace instead

* fix: reduce missing non-first batch

* perf: flow if then using interleave

* docs: add TODO

* perf: remove unnecessary eq

* chore: remove unused import

* fix: run_available no longer loop forever

* feat: blocking on high input buf

* chore: increase threshold

* chore: after rebase

* chore: per review

* chore: per review

* fix: allow empty values in reduce&test

* tests: more flow doc example tests

* chore: per review

* chore: per review
2024-09-11 03:31:52 +00:00
Zhenchi
f252599ac6 feat(index): add RangeReader trait (#4718)
* feat(index): add `RangeReader` trait`

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: return content_length as read bytes

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* chore: remove buffer & use `BufMut`

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-09-10 15:24:06 +00:00
Ruihang Xia
ff40d512bd fix: support append-only physical table (#4716)
* fix: support append-only physical table

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/metric-engine/src/engine/create.rs

Co-authored-by: jeremyhi <jiachun_feng@proton.me>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2024-09-10 12:23:23 +00:00
jeremyhi
dcae21208b chore: refresh route table (#4673)
* chore: remove error::

* chore: avoid using get_raw if unnecessary

* chore: clearer method name

* feat: remap node addresses in table route

* chore: add unit test for remap address

* feat: refresh node address mapping via heartbeat

* feat: broadcast table cache invalidate on new epoch

* chore: clarify heartbeat log

* chore: remove InvalidHeartbeatRequest

* chore: add log

* feat: add role into NodeAddressKey

* chore: fix test

* Update src/common/meta/src/key/table_route.rs

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>

* chore: simplify code

---------

Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2024-09-10 12:08:59 +00:00
Weny Xu
d0fd79ac7f chore: remove validate_request_with_table (#4710)
perf: remove `validate_request_with_table`
2024-09-10 11:56:18 +00:00
Yingwen
3e17c09e45 feat: skip caching uncompressed pages if they are large (#4705)
* feat: cache each uncompressed page

* chore: remove unused function

* chore: log

* chore: log

* chore: row group pages cache kv

* feat: also support row group level cache

* chore: fix range count

* feat: don't cache compressed page for row group cache

* feat: use function to get part

* chore: log whether scan is from compaction

* chore: avoid get column

* feat: add timer metrics

* chore: Revert "feat: add timer metrics"

This reverts commit 4618f57fa2ba13b1e1a8dec83afd01c00ae4c867.

* feat: don't cache individual uncompressed page

* feat: append in row group level under append mode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: fetch pages cost

* perf: yield

* Update src/mito2/src/sst/parquet/row_group.rs

* refactor: cache key

* feat: print file num and row groups num in explain

* test: update sqlness test

* chore: Update src/mito2/src/sst/parquet/page_reader.rs

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-10 11:52:16 +00:00
jeremyhi
04de3ed929 chore: avoid schema check when auto_create_table_hint is disabled (#4712)
chore: avoid schema check when auto-create-table-hint is disabled
2024-09-10 07:13:28 +00:00
Ruihang Xia
29f215531a feat: parallel in row group level under append mode (#4704)
feat: append in row group level under append mode

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-10 07:12:23 +00:00
jeremyhi
545a80c6e0 chore: remove unused method (#4703) 2024-09-09 12:14:17 +00:00
Yohan Wal
04e7dd6fd5 feat: add json data type (#4619)
* feat: add json type and vector

* fix: allow to create and insert json data

* feat: udf to query json as string

* refactor: remove JsonbValue and JsonVector

* feat: show json value as strings

* chore: make ci happy

* test: add unit test and sqlness test

* refactor: use binary as grpc value of json

* fix: use non-preserve-order jsonb

* test: revert changed test

* refactor: change udf get_by_path to jq

* chore: make ci happy

* fix: distinguish binary and json in proto

* chore: delete udf for future pr

* refactor: remove Value(Json)

* chore: follow review comments

* test: some tests and checks

* test: fix unit tests

* chore: follow review comments

* chore: corresponding changes to proto

* fix: change grpc and pgsql server behavior alongside with sqlness/crud tests

* chore: follow review comments

* feat: udf of conversions between json and strings, used for grpc server

* refactor: rename to_string to json_to_string

* test: add more sqlness test for json

* chore: thanks for review :)

* Apply suggestions from code review

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-09-09 11:41:36 +00:00
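A small end-to-end sketch of the JSON data type from #4619 above, using the parse_json and json_to_string conversion UDFs named in the commit; the table definition and values are assumptions for illustration.

```sql
-- Hypothetical round trip through the JSON type added in #4619; DDL is illustrative.
CREATE TABLE events (
  ts TIMESTAMP TIME INDEX,
  payload JSON
);
INSERT INTO events VALUES (0, parse_json('{"level": "info", "code": 200}'));
SELECT json_to_string(payload) FROM events;
```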
jeremyhi
dc89944570 feat: gRPC auto create table hint (#4700)
* feat: gRPC auto create table hint

* chore: remove the checking of auto_create_table_hint
2024-09-09 09:07:07 +00:00
Weny Xu
8bf549c2fa chore: print downgraded region last_entry_id (#4701) 2024-09-09 08:14:55 +00:00
Lei, HUANG
208afe402b feat(wal): increase recovery parallelism (#4689)
* Refactor RaftEngineLogStore to use references for config

 - Updated `RaftEngineLogStore::try_new` to accept a reference to `RaftEngineConfig` instead of taking ownership.
 - Replaced direct usage of `config` with individual fields (`sync_write`, `sync_period`, `read_batch_size`).
 - Adjusted test cases to pass references to `RaftEngineConfig`.

* Add parallelism configuration for WAL recovery

 - Introduced `recovery_parallelism` setting in `datanode.example.toml` and `standalone.example.toml` for configuring parallelism during WAL recovery.
 - Updated `Cargo.lock` and `Cargo.toml` to include `num_cpus` dependency.
 - Modified `RaftEngineConfig` to include `recovery_parallelism` with a default value set to the number of CP

* feat/wal-recovery-parallelism:
 Add `wal.recovery_parallelism` configuration option

 - Introduced `wal.recovery_parallelism` to config.md for specifying parallelism during WAL recovery.
 - Updated `RaftEngineLogStore` to include `recovery_threads` from the new configuration.

* fix: ut
2024-09-09 04:25:24 +00:00
Ning Sun
c22a398f59 fix: return version string based on request protocol (#4680)
* fix: return version string based on request protocol

* fix: resolve lint issue
2024-09-09 03:36:54 +00:00
JohnsonLee
a8477e4142 fix: table resolving logic related to pg_catalog (#4580)
* fix: table resolving logic related to pg_catalog

refer to
https://github.com/GreptimeTeam/greptimedb/issues/3560#issuecomment-2287794348
and #4543

* refactor: remove CatalogProtocol type

* fix: sqlness

* fix: forbid create database pg_catalog with mysql client

* refactor: use QueryContext as arguments rather than Channel

* refactor: pass None as default behaviour in information_schema

* test: fix test
2024-09-09 00:47:59 +00:00
Yiran
b950e705f5 chore: update the document link in README.md (#4690) 2024-09-07 15:27:32 +00:00
Ruihang Xia
d2d62e0c6f fix: unconditional statistics (#4694)
* fix: unconditional statistics

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add more sqlness case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-07 04:28:11 +00:00
localhost
5d9f8a3be7 feat: add test pipeline api (#4667)
* chore: add test pipeline api

* chore: add test for test pipeline api

* chore: fix taplo check

* chore: change pipeline dryrun api path

* chore: add more info for pipeline dryrun api
2024-09-06 08:36:49 +00:00
jeremyhi
e88465840d feat: add extension field to HeartbeatRequest (#4688)
* feat: add extension field to HeartbeatRequest

* chore: extension to extensions

* chore: upgrade proto
2024-09-06 08:29:20 +00:00
localhost
67d95d2088 refactor!: add processor builder and transform buidler (#4571)
* chore: add processor builder and transform buidler

* chore: in process

* chore: intermediate state from hashmap to vector in pipeline

* chore: remove useless code and rename some struct

* chore: fix typos

* chore: format code

* chore: add error handling and optimize code readability

* chore: fix typos

* chore: remove useless code

* chore: add some doc

* chore: fix by pr commit

* chore: remove useless code and change struct name

* chore: modify the location of the find_key_index function.
2024-09-06 07:51:08 +00:00
Yingwen
506dc20765 fix: last non null iter not init (#4687) 2024-09-06 04:13:23 +00:00
Lei, HUANG
114772ba87 chore: bump version v0.9.3 (#4684) 2024-09-06 02:31:41 +00:00
liyang
89a3da8a3a chore(dockerfile): remove mysql and postgresql clients in greptimedb image (#4685) 2024-09-05 16:00:53 +00:00
jeremyhi
8814695b58 feat: invalidate cache via invalidator on region migration (#4682)
feat: invalidate table via invalidator on region migration
2024-09-05 06:15:38 +00:00
Lanqing Yang
86cef648cd feat: add more spans to mito engine (#4643)
feat: add more spans on mito engine
2024-09-05 06:13:22 +00:00
Ning Sun
e476e36647 feat: add geohash and h3 as built-in functions (#4656)
* feat: add built-in functions h3 and geohash

* tests: add sqlness tests for geo functions

* doc: correct h3 comment

* fix: lint error

* fix: toml format

* refactor: address review comments

* test: add more sqlness cases

* Apply suggestions from code review

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-05 04:42:29 +00:00
shuiyisong
4781b327f3 fix: ref to auth err (#4681)
* fix: ref to auth err

* fix: typo
2024-09-05 04:05:39 +00:00
LFC
3e4a69017d build: add mysql and postgresql clients to greptimedb image (#4677) 2024-09-04 11:38:47 +00:00
LFC
d43e31c7ed feat: schedule compaction when adding sst files by editing region (#4648)
* feat: schedule compaction when adding sst files by editing region

* add minimum time interval for two successive compactions

* resolve PR comments
2024-09-04 10:10:07 +00:00
discord9
19e2a9d44b feat: change log level dynamically (#4653)
* feat: add dyn_log handle

* feat: use reload handle

* chore: per review
2024-09-04 07:54:50 +00:00
zyy17
8453df1392 refactor: make init_global_logging() clean and add log_format (#4657)
refactor: refine the code logic of init_global_logging and add json output format
2024-09-04 03:04:51 +00:00
Ruihang Xia
8ca35a4a1a fix: use number of partitions as parallelism in region scanner (#4669)
* fix: use number of partitions as parallelism in region scanner

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Co-authored-by: Lei HUANG <mrsatangel@gmail.com>

* order by ts

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* debug print time range

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei HUANG <mrsatangel@gmail.com>
2024-09-03 13:42:38 +00:00
Ruihang Xia
93f202694c refactor: remove unused error variants (#4666)
* add python script

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix all negative cases

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* setup CI

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add license header

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-03 13:19:38 +00:00
Weny Xu
b52e3c694a chore(ci): set etcd resources limits (#4665) 2024-09-03 07:25:23 +00:00
dennis zhuang
a612b67470 feat: supports name in object storage config (#4630)
* feat: supports name in object storage config

* fix: integration test

* fix: integration test

* fix: update sample config

* fix: config api test
2024-09-03 07:02:55 +00:00
jeremyhi
9b03940e03 chore: refactor metadata key value trait (#4664) 2024-09-03 07:00:24 +00:00
jeremyhi
8d6cd8ae16 feat: export import database (#4654)
* feat: export database create sql

* feat: import create database

* Update src/cmd/src/cli/export.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* Update src/cmd/src/cli/import.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* Update src/cmd/src/error.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* chore: make show create fail fast

---------

Co-authored-by: Weny Xu <wenymedia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-09-03 04:45:25 +00:00
dennis zhuang
8f4ec536de feat: grpc writing supports TTL hint (#4651) 2024-09-03 02:15:01 +00:00
zyy17
f0e2d6e663 fix: use 'target' for 'actions-rust-lang/setup-rust-toolchain' to fix cross build failed (#4661) 2024-09-02 06:11:12 +00:00
Weny Xu
306bd25c64 fix: expose missing options for initializing regions (#4660)
* fix: expose `init_regions_in_background` and `init_regions_parallelism` opts

* fix: ci
2024-09-02 03:11:18 +00:00
zyy17
ddafcc678c ci: disable macos integration test and some minor refactoring (#4658) 2024-09-02 03:06:17 +00:00
Weny Xu
2564b5daee fix: correct otlp endpoint formatting (#4646) 2024-09-02 02:59:50 +00:00
Lei, HUANG
37dcf34bb9 fix(mito): avoid caching empty batches in row group (#4652)
* fix: avoid caching empty batches in row group

* fix: clippy

* Update tests/cases/standalone/common/select/last_value.sql

* fix: sqlness
2024-09-02 02:43:00 +00:00
Yingwen
8eda36bfe3 feat: remove files from the write cache in purger (#4655)
* feat: remove files from the write cache in purger

* chore: fix typo
2024-08-31 04:19:52 +00:00
Ruihang Xia
68b59e0e5e feat: remove the requirement that partition column must be PK (#4647)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-31 03:16:01 +00:00
Ruihang Xia
a37aeb2814 feat: initialize partition range from ScanInput (#4635)
* feat: initialize partition range from ScanInput

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use num_rows instead

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add todo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* setup unordered scan

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/mito2/src/read/scan_region.rs

Co-authored-by: jeremyhi <jiachun_feng@proton.me>

* leave unordered scan unchanged

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2024-08-30 07:30:37 +00:00
jeremyhi
f641c562c2 feat: show create database (#4642)
* feat: show create database

* feat: add sqlness test

* chore: reorder mod and use

* feat: show create schema

* Update src/frontend/src/instance.rs
2024-08-30 03:58:11 +00:00
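A minimal usage sketch, assuming the statement mirrors SHOW CREATE TABLE; the database name and output shape are illustrative:

SHOW CREATE DATABASE test;
-- | Database | Create Database                    |
-- | test     | CREATE DATABASE IF NOT EXISTS test |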
Lanqing Yang
9286e963e7 chore: adding heartbeat sent/recv counts in greptimedb nodes (#4624)
obs: adding heartbeat sent/recv counts in greptimedb nodes
2024-08-30 03:57:16 +00:00
LFC
8ea4f67e4b refactor: reduce a object store "stat" call (#4645) 2024-08-30 03:31:19 +00:00
jeremyhi
5e4bac2633 feat: import cli tool (#4639)
* feat: import create tables

* feat: import database

* fix: export view schema
2024-08-29 09:32:21 +00:00
LFC
d45b04180c feat: pre-download the ingested sst (#4636)
* refactor: pre-read the ingested sst file in object store to fill the local cache to accelerate first query

* feat: pre-download the ingested SST from remote to accelerate following reads

* resolve PR comments

* resolve PR comments
2024-08-29 08:36:41 +00:00
discord9
8c8499ce53 perf(flow): Map&Reduce Operator use batch to reduce alloc (#4567)
* feat: partial impl mfp

* feat: eval batch inner

* chore: fmt

* feat: mfp eval_batch

* WIP

* feat: Collection generic over row&Batch

* feat: render source batch

* chore: chore

* feat: render mfp batch

* feat: render reduce batch(WIP)

* feat(WIP): render reduce

* feat: reduce batch

* feat: render sink batch

* feat: render constant batch

* chore: error handling& mfp batch test

* test: mfp batch

* chore: rm import

* test: render reduce batch

* chore: add TODO

* chore: per bot review

* refactor: per review

* chore: cmt

* chore: rename

* docs: update no panic
2024-08-29 07:28:13 +00:00
Weny Xu
79f40a762b fix: set selector_result_cache_size in unit test again (#4641) 2024-08-29 07:14:40 +00:00
jeremyhi
b062d8515d feat: copy database ignores view and temporary tables (#4640)
feat: copy database ignores view and temporary tables
2024-08-29 06:17:51 +00:00
discord9
9f9c1dab60 feat(flow): use DataFusion's optimizer (#4489)
* feat: use datafusion optimization

refactor: mv `sql_to_flow_plan` elsewhere

feat(WIP): use df optimization

WIP analyzer rule

feat(WIP): avg expander

fix: transform avg expander

fix: avg expand

feat: names from substrait

fix: avg rewrite

test: update `test_avg`&`test_avg_group_by`

test: fix `test_sum`

test: fix some tests

chore: remove unused flow plan transform

feat: tumble expander

test: update tests

* chore: clippy

* fix: tumble lose `group expr`

* test: sqlness test update

* test: rm unused cast

* test: simplify sqlness

* refactor: per review

* chore: after rebase

* fix: remove an outdated test

* test: add comment

* fix: report error when not literal

* chore: update sqlness test after rebase

* refactor: per review
2024-08-29 02:52:00 +00:00
dennis zhuang
841e66c810 fix: config api and export metrics default database (#4633) 2024-08-28 14:28:49 +00:00
shuiyisong
d1c635085c chore: modify grafana config to accord with version 9 (#4634)
chore: update grafana config to accord with version 9
2024-08-28 12:53:35 +00:00
Weny Xu
47657ebbc8 feat: replay WAL entries respect index (#4565)
* feat(log_store): use new `Consumer`

* feat: add `from_peer_id`

* feat: read WAL entries respect index

* test: add test for `build_region_wal_index_iterator`

* fix: keep the handle

* fix: incorrect last index

* fix: replay last entry id may be greater than expected

* chore: remove unused code

* chore: apply suggestions from CR

* chore: rename `datanode_id` to `location_id`

* chore: rename `from_peer_id` to `location_id`

* chore: rename `from_peer_id` to `location_id`

* chore: apply suggestions from CR
2024-08-28 11:37:18 +00:00
Ruihang Xia
64ae32def0 feat: remove some redundant clone/conversion on constructing MergeScan stream (#4632)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-28 08:52:09 +00:00
Weny Xu
744946957e fix: set selector_result_cache_size in unit test (#4631) 2024-08-28 07:24:17 +00:00
Ruihang Xia
d5455db2d5 fix: update properties on updating partitions (#4627)
* fix: update properties on updating partitions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add unit test and handle insufficient ranges

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-28 07:06:01 +00:00
Yingwen
28bf549907 fix: fallback to window size in manifest (#4629) 2024-08-28 06:43:56 +00:00
zyy17
4ea412249a ci: add check-builder-rust-version job in release and change release-dev-builder-images trigger condition (#4615) 2024-08-27 16:59:01 +00:00
zyy17
eacc7bc471 refactor: add app in greptime_app_version metric (#4626)
refactor: add app in greptime_app_version metric
2024-08-27 11:19:01 +00:00
Ruihang Xia
b72d3bc71d build(deps): bump backon to 1.0 (#4625)
* build(deps): bump backon to 1.0

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-27 09:38:12 +00:00
Ning Sun
0b102ef846 ci: improve toolchain resolution in ci (#4614)
* ci: improve toolchain resolution in ci

* fix: yaml format
2024-08-27 07:46:51 +00:00
liyang
e404e9dafc chore: setting docker authentication in dev-build image (#4623) 2024-08-27 03:49:53 +00:00
liyang
63a442632e fix: failed to get version (#4622) 2024-08-26 15:33:30 +00:00
liyang
d39bafcfbd fix: change toolchain file name (#4621) 2024-08-26 13:04:06 +00:00
liyang
1717445ebe fix: failed to get github sha (#4620) 2024-08-26 11:42:07 +00:00
liyang
55d65da24d ci: add push dev-build images to aws ecr (#4618)
* ci: add push dev-build images to aws ecr

* chore: use toolchain file to generate dev-build image tag

* chore: change dev-build version

* Update .github/workflows/release-dev-builder-images.yaml

Co-authored-by: zyy17 <zyylsxm@gmail.com>

---------

Co-authored-by: zyy17 <zyylsxm@gmail.com>
2024-08-26 09:36:55 +00:00
Weny Xu
3297d5f657 feat: allow skipping topic creation (#4616)
* feat: introduce `create_topics` opt

* feat: allow skipping topic creation

* chore: refine docs

* chore: apply suggestions from CR
2024-08-26 08:34:27 +00:00
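A hedged metasrv TOML sketch of the feature; the boolean key is named after the `create_topics` option mentioned above and may be spelled differently in the final config:

[wal]
provider = "kafka"
broker_endpoints = ["kafka:9092"]
# skip topic creation when topics are managed externally (assumed key name)
create_topics = false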
Ning Sun
d6865911ee feat: add postgres response for trasaction related statements (#4562)
* feat: add postgres fixtures WIP

* feat: implement more postgres fixtures

* feat: add compatibility for transaction/set transaction/show transaction

* fix: improve regex for set transaction
2024-08-26 08:09:21 +00:00
dennis zhuang
63f2463273 feat!: impl admin command (#4600)
* feat: impl admin statement parser

* feat: introduce AsyncFunction and implements it for admin functions

* feat: execute admin functions

* fix: license header

* fix: panic in test

* chore: fixed by code review
2024-08-26 07:53:40 +00:00
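A usage sketch of the new statement, assuming the admin functions keep the names of the table functions they replace (e.g. flush_table, compact_table):

ADMIN flush_table('monitor');
ADMIN compact_table('monitor');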
Ruihang Xia
da337a9635 perf: accelerate scatter query (#4607)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-26 03:03:30 +00:00
fys
3973d6b01f chore: optimize common_version build (#4611) 2024-08-23 12:36:28 +00:00
discord9
2c731c76ad chore: add stats feature for jemalloc-ctl (#4610) 2024-08-23 11:18:30 +00:00
ozewr
40e7b58c80 feat: refactoring LruCacheLayer with list_with_metakey and concurrent_stat_in_list (#4596)
* use list_with_metakey and concurrent_stat_in_list

* change concurrent in recover_cache like before

* remove stat function

* use 8 concurrent

* use const value

* fmt code

* Apply suggestions from code review

---------

Co-authored-by: ozewr <l19ht@google.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-08-23 03:22:00 +00:00
zyy17
5177717f71 refactor: add fallback_to_local region option (#4578)
* refactor: add 'fallback_to_local_compaction' region option

* refactor: use 'fallback_to_local'
2024-08-23 03:09:43 +00:00
Weny Xu
8d61e6fe49 chore: bump rskafka to 75535b (#4608) 2024-08-23 03:05:52 +00:00
Ruihang Xia
a3b8d2fe8f chore: bump rust toolchain to 2024-08-21 (#4606)
* chore: bump rust toolchain to 2024-08-22

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update workflow

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* try 20240606

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-22 15:38:10 +00:00
Ning Sun
863ee073a9 chore: add commercial support section (#4601)
doc: add commercial support section
2024-08-22 12:03:20 +00:00
Weny Xu
25cd61b310 chore: upgrade toolchain to nightly-2024-08-07 (#4549)
* chore: upgrade toolchain to `nightly-2024-08-07`

* chore(ci): upgrade toolchain

* fix: fix unit test
2024-08-22 11:02:18 +00:00
fys
3517c13192 fix: incremental compilation always compile the common-version crate (#4605)
fix: wrong cargo:rerun
2024-08-22 11:00:33 +00:00
Ruihang Xia
b9cedf2c1a perf: optimize series divide algo (#4603)
* perf: optimize series divide algo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove dead code

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-22 09:16:36 +00:00
LFC
883c5bc5b0 refactor: skip checking the existence of the SST files (#4602)
refactor: skip checking the existence of the SST files when region is directly edited
2024-08-22 08:32:27 +00:00
Yingwen
d628079f4c feat: collect filters metrics for scanners (#4591)
* feat: collect filter metrics

* refactor: reuse ReaderFilterMetrics

* feat: record read rows from parquet by type

* feat: unordered scan observe rows

also fix read type

* chore: rename label
2024-08-22 03:22:05 +00:00
Weny Xu
0025fa6ec7 chore: bump opendal version to 0.49 (#4587)
* chore: bump opendal version to 0.49

* chore: apply suggestions from CR

* Update src/object-store/src/util.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-08-22 03:05:36 +00:00
Lanqing Yang
ff04109ee6 docs: add example configs introduced by pg_kvbackend (#4573)
chore: add example configs that were introduced by pg_kvbackend
2024-08-22 01:52:02 +00:00
Yingwen
9c1704d4cb docs: move v0.9.1 benchmark report to tsbs dir (#4598)
* docs: move v0.9.1 benchmark report to tsbs dir

* docs: add newlines
2024-08-21 09:31:05 +00:00
Yingwen
a12a905578 chore: disable ttl for write cache by default (#4595)
* chore: remove default write cache ttl

* docs: update example config

* chore: fix ci
2024-08-21 08:38:38 +00:00
shuiyisong
449236360d docs: log benchmark (#4597)
* chore: add log benchmark stuff

* chore: minor update
2024-08-21 07:12:32 +00:00
localhost
bf16422cee fix: pipeline prepare loop break detects a conditional error (#4593) 2024-08-21 06:20:09 +00:00
Ran Joe
9db08dbbe0 refactor(mito2): reduce duplicate IndexOutput struct (#4592)
* refactor(mito2): reduce duplicate IndexOutput struct

* docs(mito2): add index output note
2024-08-20 12:30:17 +00:00
fys
9d885fa0c2 chore: bump tikv-jemalloc* to "0.6" (#4590)
chore: bump tikv-jemalloc* to "0.6"
2024-08-20 09:08:21 +00:00
Ruihang Xia
b25a2b117e feat: remove sql in error desc (#4589)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-20 06:37:30 +00:00
fys
6fccff4810 chore: keep symbol table in nightly profile (#4588)
chore: keep symbol table in nightly profile
2024-08-20 02:27:31 +00:00
ozewr
30af78700f feat: Implement the Buf to avoid extra memory allocation (#4585)
* feat: Implement the Buf to avoid extra memory allocation

* fmt toml

* fmt code

* mv entry.into_buffer to raw_entry_buffer

* less reuse opendal

* remove todo #4065

* Update src/mito2/src/wal/entry_reader.rs

Co-authored-by: Weny Xu <wenymedia@gmail.com>

* fmt code

---------

Co-authored-by: ozewr <l19ht@google.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-08-19 12:11:08 +00:00
Ruihang Xia
8de11a0e34 perf: set simple filter on primary key columns to exact filter (#4564)
* perf: set simple filter on primary key columns to exact filter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-19 09:07:35 +00:00
Lei, HUANG
975b8c69e5 fix(sqlness): redact all volatile text (#4583)
Add SQLNESS replacements for RoundRobinBatch and region patterns
2024-08-19 08:04:54 +00:00
Weny Xu
8036b44347 chore: setup kafka before downloading binary step (#4582) 2024-08-19 06:44:33 +00:00
Zhenchi
4c72b3f3fe chore: bump version to v0.9.2 (#4581)
chore: bump version to 0.9.2

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-08-19 06:11:36 +00:00
Weny Xu
76dc906574 feat(log_store): introduce the CollectionTask (#4530)
* feat: introduce the `CollectionTask`

* feat: add config of index collector

* chore: remove unused code

* feat: truncate indexes

* chore: apply suggestions from CR

* chore: update config examples

* refactor: retrieve latest offset while dumping indexes

* chore: print warn
2024-08-19 03:48:35 +00:00
Ran Joe
2a73e0937f fix(common_version): short_version with empty branch (#4572) 2024-08-19 03:14:49 +00:00
Zhenchi
c8de8b80f4 fix(fulltext-index): single segment is not sufficient for >50M rows SST (#4552)
* fix(fulltext-index): single segment is not sufficient for a >50M rows SST

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* fix: update doc comment

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-08-16 09:14:33 +00:00
LFC
ec59ce5c9a feat: able to handle concurrent region edit requests (#4569)
* feat: able to handle concurrent region edit requests

* resolve PR comments
2024-08-16 03:29:03 +00:00
liyang
f578155602 feat: add GcsConfig credential field (#4568) 2024-08-16 03:11:20 +00:00
Weny Xu
d1472782d0 chore(log_store): remove redundant metrics (#4570)
chore(log_store): remove unused metrics
2024-08-16 02:23:21 +00:00
Lanqing Yang
93be81c041 feat: implement postgres kvbackend (#4421) 2024-08-14 22:49:32 +00:00
discord9
2c3fccb516 feat(flow): add eval_batch for ScalarExpr (#4551)
* refactor: better perf flow

* feat(WIP): batching proc

* feat: UnaryFunc::eval_batch untested

* feat: BinaryFunc::eval_batch untested

* feat: VariadicFunc::eval_batch untested

* feat: literal eval_batch

* refactor: move DfScalarFunc to separate file

* chore: remove unused imports

* feat: eval_batch df func&ifthen

* chore: remove unused file

* refactor: use Batch type

* chore: remove unused

* chore: remove a done TODO

* refactor: per review

* chore: import

* refactor: eval_batch if then

* chore: typo
2024-08-14 11:29:30 +00:00
Lei, HUANG
c1b1be47ba fix: append table stats (#4561)
* fix: append table stats

* fix: clippy
2024-08-14 09:01:42 +00:00
Weny Xu
0f85037024 chore: remove unused code (#4559) 2024-08-14 06:55:54 +00:00
discord9
f88705080b chore: set topic to 3 for sqlness test (#4560) 2024-08-14 06:32:26 +00:00
discord9
cbb06cd0c6 feat(flow): add some metrics (#4539)
* feat: add some metrics

* fix: tmp rate limiter

* feat: add task count metrics

* refactor: use bounded channel anyway

* refactor: better metrics
2024-08-14 03:23:49 +00:00
discord9
b59a93dfbc chore: Helper function to convert Vec<Value> to VectorRef (#4546)
* chore: `try_from_row_into_vector` helper

* test: try_from_row

* refactor: simplify with builder

* fix: decimal set prec&scale

* refactor: more simplify

* refactor: use ref
2024-08-14 03:11:44 +00:00
localhost
202c730363 perf: Optimizing pipeline performance (#4390)
* chore: improve pipeline performance

* chore: use arc to improve time type

* chore: improve pipeline coerce

* chore: add vec refactor

* chore: add vec pp

* chore: improve pipeline

* inprocess

* chore: set log ingester to use new pipeline

* chore: fix some error by pr comment

* chore: fix typo

* chore: use enum_dispatch to simplify code

* chore: some minor fix

* chore: format code

* chore: update by pr comment

* chore: fix typo

* chore: make clippy happy

* chore: fix by pr comment

* chore: remove epoch and date processors, add new timestamp processor

* chore: add more test for pipeline

* chore: restore epoch and date processor

* chore: compatibility issue

* chore: fix by pr comment

* chore: move the evaluation out of the loop

* chore: fix by pr comment

* chore: fix dissect output key filter

* chore: fix ordering error in transform output greptime value

* chore: keep pipeline transform output order

* chore: revert tests

* chore: simplify pipeline prepare implementation

* chore: add test for timestamp pipeline processor

* chore: make clippy happy

* chore: replace is_some check to match

---------

Co-authored-by: shuiyisong <xixing.sys@gmail.com>
2024-08-13 11:32:04 +00:00
zyy17
63e1892dc1 refactor(plugin): add SetupPlugin and StartPlugin error (#4554) 2024-08-13 11:22:48 +00:00
Lei, HUANG
216bce6973 perf: count(*) for append-only tables (#4545)
* feat: support fast count(*) for append-only tables

* fix: total_rows stats in time series memtable

* fix: sqlness result changes for SinglePartitionScanner -> StreamScanAdapter

* fix: some cr comments
2024-08-13 09:27:50 +00:00
Yingwen
4466fee580 docs: update grafana readme (#4550)
* docs: update grafana readme

* docs: simplify example
2024-08-13 08:45:06 +00:00
shuiyisong
5aa4c70057 chore: update validator signature (#4548) 2024-08-13 08:06:12 +00:00
Yingwen
72a1732fb4 docs: Adds more panels to grafana dashboards (#4540)
* docs: update standalone grafana

* docs: add more panels to grafana dashboards

* docs: replace source name

* docs: bump dashboard version

* docs: update hit rate expr

* docs: greptime_pod to instance, add panels for cache
2024-08-13 06:29:28 +00:00
Weny Xu
c821d21111 feat(log_store): introduce the IndexCollector (#4461)
* feat: introduce the IndexCollector

* refactor: separate BackgroundProducerWorker code into files

* feat: introduce index related operations

* feat: introduce the `GlobalIndexCollector`

* refactor: move collector to index mod

* refactor: refactor `GlobalIndexCollector`

* chore: remove unused collector.rs

* chore: add comments

* chore: add comments

* chore: apply suggestions from CR

* chore: apply suggestions from CR
2024-08-13 06:15:24 +00:00
Weny Xu
2e2eacf3b2 feat: add SASL and TLS config for Kafka client (#4536)
* feat: add SASL and TLS config

* feat: add SASL/PLAIN and TLS config for Kafka client

* chore: use `ring`

* feat: support SASL SCRAM-SHA-256 and SCRAM-SHA-512

* fix: correct unit test

* test: add integration test

* chore: apply suggestions from CR

* refactor: introduce `KafkaConnectionConfig`

* chore: refine toml examples

* docs: add missing fields

* chore: refine examples

* feat: allow no server ca cert

* chore: refine examples

* chore: fix clippy

* feat: load system ca certs

* chore: fmt toml

* chore: unpin version

* Update src/common/wal/src/error.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-08-12 12:27:11 +00:00
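A minimal sketch of the new Kafka WAL auth options in a datanode TOML, assuming [wal.sasl] and [wal.tls] sections as in the refined examples; credentials and paths are placeholders:

[wal]
provider = "kafka"
broker_endpoints = ["kafka.example.com:9093"]

[wal.sasl]
type = "SCRAM-SHA-512"
username = "wal_user"
password = "secret"

[wal.tls]
server_ca_cert_path = "/path/to/ca.crt"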
Ruihang Xia
9bcaeaaa0e refactor: reuse aligned ts array in range manipulate exec (#4535)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-12 06:26:11 +00:00
Weny Xu
90cfe276b4 chore: upload kind logs (#4544) 2024-08-12 05:01:13 +00:00
JohnsonLee
6694d2a930 fix: change the type of oid in pg_namespace to u32 (#4541)
* fix:  change the type of oid in pg_namespace to u32

* fix: header and correct logic of update oid
2024-08-10 15:06:14 +00:00
Ning Sun
9532ffb954 fix: configuration example for selector (#4532)
* fix: configuration example for selector

* docs: update config docs

* test: update unit tests for configuration in meta
2024-08-09 09:51:05 +00:00
Weny Xu
665b7e5c6e perf: merge small byte ranges for optimized fetching (#4520) 2024-08-09 08:17:54 +00:00
Weny Xu
27d9aa0f3b fix: rollback only if dropping the metric physical table fails (#4525)
* fix: rollback only if dropping the metric physical table fails

* chore: apply suggestions from CR
2024-08-09 08:01:11 +00:00
discord9
8f3293d4fb fix: larger stack size in debug mode (#4521)
* fix: larger stack size in debug mode

* chore: typo

* chore: clippy

* chore: per review

* chore: rename thread

* chore: per review

* refactor: better looking cfg

* chore: async main entry
2024-08-09 07:01:20 +00:00
LFC
7dd20b0348 chore: make mysql server version changeable (#4531)
zyy17
4c1a3f29c0 ci: download the latest stable released version by default and do some small refactoring (#4529)
refactor: download the latest stable released version by default and do some small refactoring
2024-08-08 07:46:09 +00:00
Jeremyhi
0d70961448 feat: change the default selector to RoundRobin (#4528)
* feat: change the default selector to rr

* Update src/meta-srv/src/selector.rs

* fix: unit test
2024-08-08 04:58:20 +00:00
LFC
a75cfaa516 chore: update snafu to make clippy happy (#4507)
* chore: update snafu to make clippy happy

* fix ci
2024-08-07 16:12:00 +00:00
Lei, HUANG
aa3f53f08a fix: install script (#4527)
fix: install script always installs v0.9.0-nightly-20240709 instead of the latest nightly
2024-08-07 14:07:32 +00:00
Ruihang Xia
8f0959fa9f fix: fix incorrect result of topk with cte (#4523)
* fix: fix incorrect result of topk with cte

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clean up cargo toml

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-07 09:13:38 +00:00
Weny Xu
4a3982ca60 chore: use configData (#4522)
* chore: use `configData`

* chore: add an empty line
2024-08-07 07:43:04 +00:00
Yingwen
559219496d ci: fix windows temp path (#4518) 2024-08-06 13:53:12 +00:00
LFC
685aa7dd8f ci: squeeze some disk space for complex fuzz tests (#4519)
* ci: squeeze some disk space for complex fuzz tests

* Update .github/workflows/develop.yml

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2024-08-06 11:52:34 +00:00
Lei, HUANG
be5364a056 chore: support swcs as the short name for strict window compaction (#4517) 2024-08-06 07:38:07 +00:00
Weny Xu
a25d9f736f chore: set default otlp_endpoint (#4508)
* chore: set default `otlp_endpoint`

* fix: fix ci
2024-08-06 06:48:14 +00:00
dependabot[bot]
2cd4a78f17 build(deps): bump zerovec from 0.10.2 to 0.10.4 (#4335)
Bumps [zerovec](https://github.com/unicode-org/icu4x) from 0.10.2 to 0.10.4.
- [Release notes](https://github.com/unicode-org/icu4x/releases)
- [Changelog](https://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md)
- [Commits](https://github.com/unicode-org/icu4x/commits/ind/zerovec@0.10.4)

---
updated-dependencies:
- dependency-name: zerovec
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-08-06 00:40:03 +00:00
dependabot[bot]
188e182d75 build(deps): bump zerovec-derive from 0.10.2 to 0.10.3 (#4346)
Bumps [zerovec-derive](https://github.com/unicode-org/icu4x) from 0.10.2 to 0.10.3.
- [Release notes](https://github.com/unicode-org/icu4x/releases)
- [Changelog](https://github.com/unicode-org/icu4x/blob/main/CHANGELOG.md)
- [Commits](https://github.com/unicode-org/icu4x/commits/ind/zerovec-derive@0.10.3)

---
updated-dependencies:
- dependency-name: zerovec-derive
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-08-05 23:58:30 +00:00
Yingwen
d64cc79ab4 docs: add v0.9.1 bench result (#4511) 2024-08-05 16:53:32 +00:00
discord9
e6cc4df8c8 feat: flow recreate on reboot (#4509)
* feat: flow reboot clean

* refactor: per review

* refactor: per review

* test: sqlness flow reboot
2024-08-05 13:57:48 +00:00
LFC
803780030d fix: too large shadow-rs consts (#4506) 2024-08-05 07:05:14 +00:00
Weny Xu
79f10d0415 chore: reduce fuzz tests in CI (#4505) 2024-08-05 06:56:41 +00:00
Weny Xu
3937e67694 feat: introduce new kafka topic consumer respecting WAL index (#4424)
* feat: introduce new kafka topic consumer respecting WAL index

* chore: fmt

* chore: fmt toml

* chore: add comments

* feat: merge close ranges

* fix: fix unit tests

* chore: fix typos

* chore: use loop

* chore: use unstable sort

* chore: use gt instead of gte

* chore: add comments

* chore: rename to `current_entry_id`

* chore: apply suggestions from CR

* chore: apply suggestions from CR

* refactor: minor refactor

* chore: apply suggestions from CR
2024-08-05 06:56:25 +00:00
Weny Xu
4c93fe6c2d chore: bump rust-postgres to 0.7.11 (#4504) 2024-08-05 04:26:46 +00:00
LFC
c4717abb68 chore: bump shadow-rs version to set the path to find the correct git repo (#4494) 2024-08-05 02:24:12 +00:00
shuiyisong
3b701d8f5e test: more on processors (#4493)
* test: add date test

* test: add epoch test

* test: add letter test and complete some others

* test: add urlencoding test

* chore: typo
2024-08-04 08:29:31 +00:00
Weny Xu
cb4cffe636 chore: bump opendal version to 0.48 (#4499) 2024-08-04 00:46:04 +00:00
Ruihang Xia
cc7f33c90c fix(tql): avoid unwrap on parsing tql query (#4502)
* fix(tql): avoid unwrap on parsing tql query

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add unit test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-03 20:58:53 +00:00
Ruihang Xia
fe1cfbf2b3 fix: partition column with mixed quoted and unquoted idents (#4491)
* fix: partition column with mixed quoted and unquoted idents

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update error message

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-02 09:06:31 +00:00
Yingwen
ded874da04 feat: enlarge default page cache size (#4490) 2024-08-02 07:24:20 +00:00
Lei, HUANG
fe2d29a2a0 chore: bump version v0.9.1 (#4486)
Update package versions to 0.9.1

 - Bump version for multiple packages from 0.9.0 to 0.9.1 in Cargo.lock
2024-08-02 07:10:05 +00:00
Yingwen
b388829a96 fix: avoid total size overflow (#4487)
feat: avoid total size overflow
2024-08-02 06:16:37 +00:00
zyy17
8e7c027bf5 ci: make docker image args configurable from env vars (#4484)
refactor: make docker image args configurable from env vars
2024-08-02 03:17:09 +00:00
Lei, HUANG
9d5d7c1f9a feat(compaction): add file number limits to TWCS compaction (#4481)
* Add file number limits to TWCS compaction

 - Introduce `max_active_window_files` and `max_inactive_window_files` to `TwcsOptions`.

* feat/limit-files-in-windows: Add max active/inactive window files options to mito engine config

* feat/limit-files-in-windows: Add Debug derive to TwcsPicker and implement max file enforcement logging in TWCS compaction

* fix: clippy
2024-08-01 12:42:09 +00:00
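A hedged SQL sketch of the new limits as TWCS table options; the option keys are assumed from the `TwcsOptions` field names above and may be spelled differently:

CREATE TABLE sensor_data (
  ts TIMESTAMP TIME INDEX,
  val DOUBLE
)
WITH (
  'compaction.type' = 'twcs',
  'compaction.twcs.max_active_window_files' = '4',
  'compaction.twcs.max_inactive_window_files' = '1'
);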
ZonaHe
efe5eeef14 feat: update dashboard to v0.5.4 (#4483)
Co-authored-by: ZonaHex <ZonaHex@users.noreply.github.com>
2024-08-01 12:19:38 +00:00
Ruihang Xia
ca54b05be3 feat: time poll elapsed for RegionScan plan (#4482)
* feat: time poll elapsed for RegionScan plan

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* also record await time

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-08-01 12:19:15 +00:00
Jeremyhi
d67314789c feat: export all schemas and data at once in export tool (#4478)
* feat: export all schemas and data at once

* feat: introduce export all to export schemas and data at once

* feat: default value for target

* feat: refactor export target

* chore: fix unit test
2024-08-01 09:14:44 +00:00
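A plausible invocation of the export tool after this change; the flag names and the `all` target are assumptions based on the commit messages:

greptime cli export --addr 127.0.0.1:4000 --output-dir /tmp/backup --target all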
Yingwen
6c4b8b63a5 fix: notify flush receiver after write buffer is released (#4476)
* fix: notify the worker after write buffer is released

* feat: worker level region count
2024-08-01 07:15:36 +00:00
Jeremyhi
62a0defd63 feat: improve extract hints (#4479) 2024-08-01 07:06:13 +00:00
Jeremyhi
291d9d55a4 feat: hint options for gRPC insert (#4454)
* feat: hint options for gRPC insert

* chore: unit test for extract_hints

* feat: add integration test for grpc hint

* test: add integration test for hints
2024-08-01 02:59:38 +00:00
Weny Xu
90301a6250 fix: generate unique timestamp for inserting tests (#4472) 2024-07-31 12:19:43 +00:00
shuiyisong
c66d3090b6 fix: prometheus api only returns 200 (#4471)
fix: prometheus api returns http status other than 200
2024-07-31 07:42:50 +00:00
dennis zhuang
656050722c fix: overflow when parsing default value with negative numbers (#4459)
* fix: overflow when parsing default value with negative numbers

* test: adds sqlness test
2024-07-31 07:41:49 +00:00
Ning Sun
b741a7181b feat: track channels with query context and w/rcu (#4448)
* feat: add source channel to meter recorders

* feat: provide channel for query context

* fix: testing and extension get for query context

* chore: revert cargo toml structure changes

* fix: querycontext modification for prometheus and pipeline

* chore: switch git dependency to main branches

* chore: remove TODO

* refactor: rename other to unknown

---------

Co-authored-by: shuiyisong <113876041+shuiyisong@users.noreply.github.com>
2024-07-31 07:30:50 +00:00
Weny Xu
dd23d47743 chore(ci): bring back chaos tests (#4456)
* Revert "chore: temporarily disable fuzz chaos tests (#4457)"

This reverts commit f0c953f84a.

* chore: update config

* Update .github/actions/setup-greptimedb-cluster/with-remote-wal.yaml

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

---------

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2024-07-31 07:29:31 +00:00
Ran Miller
80aaa7725e docs(contributing): replace expired links (#4468) 2024-07-31 06:11:30 +00:00
Ran Miller
c24de8b908 refactor(servers): improve postgres error message (#4463)
* refactor(servers): improve postgres error message

* refactor(servers): remove numerical representation of ErrorSeverity
2024-07-31 06:06:15 +00:00
Yingwen
f382a7695f perf: reduce lock scope and improve log (#4453)
* feat: refine logs for scan

* feat: improve build parts and unordered scan metrics

* feat: change to debug log

* fix: release lock before reading part

* test: replace region id

* test: fix sqlness

* chore: add todo

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-31 04:07:34 +00:00
Jeremyhi
1ea43da9ea feat: default export catalog name (#4464)
* feat: default export catalog name

* chore: default catalog name
2024-07-31 03:39:39 +00:00
dennis zhuang
6113f46284 docs: tweak readme (#4465) 2024-07-31 02:35:29 +00:00
LFC
6d8a502430 chore: add more metrics about parquet and cache (#4410)
* chore: add more metrics about parquet and cache

* resolve PR comments

* resolve PR comments

* resolve PR comments

* resolve PR comments
2024-07-30 12:01:49 +00:00
Ruihang Xia
2d992f4f12 fix: check_partition uses unqualified name (#4452)
* fix: check_partition uses unqualified name

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2024-07-30 11:28:28 +00:00
Ruihang Xia
7daf24c47f feat: remove dedicated runtime for grpc, mysql and pg protocols (#4436)
* feat: remove dedicated runtime for grpc, mysql and pg protocols

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove other runtimes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* spawn compact task into compact_runtime

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refine naming

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/servers/tests/mysql/mysql_server_test.rs

Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* turnoff fuzz test matrix fail fast option

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: update rt config for ci tests

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Zhenchi <zhongzc_arch@outlook.com>
Co-authored-by: Weny Xu <wenymedia@gmail.com>
2024-07-30 06:17:58 +00:00
shuiyisong
567f5105bf fix: missing pre_write check on prometheus remote write (#4460)
fix: missing pre_write check on prometheus remote write
2024-07-30 04:55:19 +00:00
Yingwen
78962015dd ci: keep sqlness log by default (#4449)
* ci: keep sqlness log by default

* chore: not preserve state in makefile by default

* ci: use make
2024-07-29 17:11:24 +00:00
taobo
1138f32af9 feat: support setting time range in Copy From statement (#4405)
* feat: support setting time range in Copy From statement

* test: add batch_filter_test

* fix: ts data type inconsistent error

* test: add sqlness test for copy from with statement

* fix: sqlness result error

* fix: cr comments
2024-07-29 16:55:19 +00:00
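A sketch of the new time-range filter on COPY FROM; the START_TIME/END_TIME option names are assumed to match the existing COPY options:

COPY demo FROM '/path/to/demo.parquet'
WITH (FORMAT = 'parquet', START_TIME = '2024-07-01 00:00:00', END_TIME = '2024-07-02 00:00:00');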
shuiyisong
53fc14a50b fix: use status code to http status mapping in error IntoResponse (#4455) 2024-07-29 16:37:04 +00:00
Ruihang Xia
1895a5478b feat: track prometheus HTTP API's query latency (#4458)
* feat: track prometheus HTTP API's query latency

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update grafana config

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: Update src/servers/src/metrics.rs

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2024-07-29 15:00:09 +00:00
Weny Xu
f0c953f84a chore: temporarily disable fuzz chaos tests (#4457) 2024-07-29 13:23:40 +00:00
zyy17
1a38f36d2d refactor!: Remove Mode from FrontendOptions (#4401)
refactor: remove `Mode` from `FrontendOptions`

Signed-off-by: zyy17 <zyylsxm@gmail.com>
2024-07-29 06:57:01 +00:00
Zhenchi
cb94bd45d3 fix(fulltext-search): pruning rows in a row group forgot to take the remainder (#4447)
* fix(fulltext-search): pruning rows in a row group forgot to take the remainder

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

* test: add unit test

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>

---------

Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-29 06:20:07 +00:00
Ning Sun
b298b35b3b feat: show root cause and db name on the error line (#4442)
* feat: show root cause on the error line

* feat: show root error for grpc

* feat: add error information for http error

* feat: add db information on error mysql/postgres logs
2024-07-29 03:59:42 +00:00
Weny Xu
164232e073 fix: use heartbeat runtime instead of background runtime (#4445) 2024-07-29 03:29:30 +00:00
JohnsonLee
9a5fa49955 feat: support pg_namespace, pg_class and related psql command (#4428)
* feat: add function 'pg_catalog.pg_table_is_visible'

* feat: add 'pg_class' and 'pg_namespace', now we can run '\d' and '\dt'!

* refactor: move memory_table::tables to utils::tables

* refactor: move out predicate to system_schema to reuse it

* feat: predicates pushdown

* test: add pg_namespace, pg_class related sqlness test

* fix: typos and license header

* fix: sqlness test

* refactor: use `expect` instead of `unwrap` here

* refactor: remove the `information_schema::utils` mod

* doc: make the comment in pg_get_userbyid more precise

* doc: add TODO and comment in pg_catalog

* fix: typo

* fix: sqlness

* doc: change the comment on PGClassBuilder to a TODO
2024-07-28 12:04:54 +00:00
dennis zhuang
92d6d4e64a docs: update project status (#4440)
* docs: update project status

* docs: update project status
2024-07-27 05:24:09 +00:00
discord9
021ec7b6ac feat(flow): flush_flow function (#4416)
* refactor: df err variant

* WIP

* chore: update proto version

* chore: revert mistaken rust-toolchain

* feat(WIP): added FlowService to QueryEngine

* refactor: move flow service to operator

* refactor: flush use flow name not id

* refactor: use full path in macro

* feat: flush flow

* feat: impl flush flow

* chore: remove unused

* chore: meaningful response

* chore: remove unused

* chore: clippy

* fix: flush_flow with proper blocking

* test: sqlness tests added back for flow

* test: better predicate for flush_flow

* refactor: rwlock

* fix: flush lock

* fix: flush lock write then drop

* test: add a new flow sqlness test

* fix: sqlness testcase

* chore: style

---------

Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-26 23:04:13 +00:00
dennis zhuang
0710e6ff36 fix: remove to_timezone function (#4439)
fix: remove to_timezone, it doesn't make sense
2024-07-26 07:40:07 +00:00
dennis zhuang
db3a07804e fix: information_schema tables and views column value (#4438) 2024-07-26 07:39:58 +00:00
Lei, HUANG
bdd3d2d9ce chore: add dynamic cache size adjustment for InvertedIndexConfig (#4433)
* Add dynamic cache size adjustment for InvertedIndexConfig

* Increase cache sizes in integration tests for HTTP

 - Updated `metadata_cache_size` from 32MiB to 64MiB

* Remove cache size settings from config and update drop_lines_with_inconsistent_results function to handle them

* Add cache size configurations for inverted index metadata and content

 - Introduced `metadata_cache_size` with a default of 64MiB.
 - Introduced `content_cache_size` with a default of 128MiB.

* chore/index-content-cache-default-size: Add cache size configuration options for Mito engine's inverted index
2024-07-26 03:36:20 +00:00
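A hedged TOML sketch using the defaults quoted above; the section path is an assumption:

[region_engine.mito.inverted_index]
metadata_cache_size = "64MiB"
content_cache_size = "128MiB"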
zyy17
b81d3a28e6 refactor: add RetryInterceptor to print detailed error (#4434) 2024-07-25 11:52:28 +00:00
Weny Xu
89b86c87a2 chore: add docs for config file (#4432) 2024-07-25 08:11:10 +00:00
Lei, HUANG
0b0ed03ee6 fix(metrics): RowGroupLastRowCachedReader metrics (#4418)
fix/reader-metrics:
 Refactor cache hit/miss logic and update metrics in mito2

 - Simplify cache retrieval logic in CacheManager by removing inline update_hit_miss function call.
 - Add separate functions for incrementing cache hit and miss metrics.
 - Update RowGroupLastRowCachedReader to use new cache hit/miss functions and refactor to new helper methods for creating Hit and Miss variants.
2024-07-25 06:45:43 +00:00
dennis zhuang
ea4a71b387 docs: update readme (#4431) 2024-07-25 06:17:45 +00:00
dennis zhuang
4cd5ec7769 docs: update readme (#4430) 2024-07-25 02:42:18 +00:00
Ruihang Xia
c8f4a85720 chore: update grafana dashboard to reflect recent metric changes (#4417)
* chore: update grafana dashboard to reflect recent metric changes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: add a blank line at the end

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2024-07-24 20:05:44 +00:00
discord9
024dac8171 chore: add a compile cfg for python in cmd package (#4406)
* chore: add a compile cfg for python

* fix: make feature gate additive, turn off default features in workspace & add cfg in place

* chore: remove unused in different cfg
2024-07-24 20:03:53 +00:00
Ran Miller
918be099cd docs(common_error): format enum StatusCode docs (#4427)
* fix: format comments to end with a `.`
* docs: add comment for RegionReadonly
* fix: comment error for DatabaseAlreadyExists
2024-07-24 15:54:35 +00:00
Zhenchi
91dbac4141 fix(fulltext-index): clean up 0-value timer (#4423)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>
2024-07-24 15:03:36 +00:00
Ran Miller
e935bf7574 refactor: Remove PhysicalOptimizer and LogicalOptimizer trait (#4426)
* refactor(query): Remove LogicalOptimizer trait

* refactor(query): Remove PhysicalOptimizer trait
2024-07-24 13:01:44 +00:00
Ran Miller
f7872654cc refactor(query): Remove PhysicalPlanner trait (#4412) 2024-07-24 03:06:46 +00:00
shuiyisong
547730a467 chore: add metrics for log ingestion (#4411)
* chore: add metrics for log ingestion

* chore: record result as well
2024-07-23 08:05:11 +00:00
Ning Sun
49f22f0fc5 fix: add back AuthBackend which is required by custom auth backend (#4409) 2024-07-23 05:35:29 +00:00
1265 changed files with 100105 additions and 29171 deletions

View File

@@ -14,10 +14,11 @@ GT_AZBLOB_CONTAINER=AZBLOB container
GT_AZBLOB_ACCOUNT_NAME=AZBLOB account name
GT_AZBLOB_ACCOUNT_KEY=AZBLOB account key
GT_AZBLOB_ENDPOINT=AZBLOB endpoint
# Settings for gcs test
GT_GCS_BUCKET = GCS bucket
GT_GCS_SCOPE = GCS scope
GT_GCS_CREDENTIAL_PATH = GCS credential path
GT_GCS_CREDENTIAL = GCS credential
GT_GCS_ENDPOINT = GCS end point
# Settings for kafka wal test
GT_KAFKA_ENDPOINTS = localhost:9092

View File

@@ -50,7 +50,7 @@ runs:
BUILDX_MULTI_PLATFORM_BUILD=all \
IMAGE_REGISTRY=${{ inputs.dockerhub-image-registry }} \
IMAGE_NAMESPACE=${{ inputs.dockerhub-image-namespace }} \
IMAGE_TAG=${{ inputs.version }}
DEV_BUILDER_IMAGE_TAG=${{ inputs.version }}
- name: Build and push dev-builder-centos image
shell: bash
@@ -61,7 +61,7 @@ runs:
BUILDX_MULTI_PLATFORM_BUILD=amd64 \
IMAGE_REGISTRY=${{ inputs.dockerhub-image-registry }} \
IMAGE_NAMESPACE=${{ inputs.dockerhub-image-namespace }} \
IMAGE_TAG=${{ inputs.version }}
DEV_BUILDER_IMAGE_TAG=${{ inputs.version }}
- name: Build and push dev-builder-android image # Only build image for amd64 platform.
shell: bash
@@ -71,6 +71,6 @@ runs:
BASE_IMAGE=android \
IMAGE_REGISTRY=${{ inputs.dockerhub-image-registry }} \
IMAGE_NAMESPACE=${{ inputs.dockerhub-image-namespace }} \
IMAGE_TAG=${{ inputs.version }} && \
DEV_BUILDER_IMAGE_TAG=${{ inputs.version }} && \
docker push ${{ inputs.dockerhub-image-registry }}/${{ inputs.dockerhub-image-namespace }}/dev-builder-android:${{ inputs.version }}

View File

@@ -17,6 +17,12 @@ inputs:
description: Enable dev mode, only build standard greptime
required: false
default: "false"
image-namespace:
description: Image Namespace
required: true
image-registry:
description: Image Registry
required: true
working-dir:
description: Working directory to build the artifacts
required: false
@@ -31,8 +37,8 @@ runs:
run: |
cd ${{ inputs.working-dir }} && \
make run-it-in-container BUILD_JOBS=4 \
IMAGE_NAMESPACE=i8k6a5e1/greptime \
IMAGE_REGISTRY=public.ecr.aws
IMAGE_NAMESPACE=${{ inputs.image-namespace }} \
IMAGE_REGISTRY=${{ inputs.image-registry }}
- name: Upload sqlness logs
if: ${{ failure() && inputs.disable-run-tests == 'false' }} # Only upload logs when the integration tests failed.
@@ -51,8 +57,8 @@ runs:
artifacts-dir: greptime-linux-${{ inputs.arch }}-pyo3-${{ inputs.version }}
version: ${{ inputs.version }}
working-dir: ${{ inputs.working-dir }}
image-registry: public.ecr.aws
image-namespace: i8k6a5e1/greptime
image-registry: ${{ inputs.image-registry }}
image-namespace: ${{ inputs.image-namespace }}
- name: Build greptime without pyo3
if: ${{ inputs.dev-mode == 'false' }}
@@ -64,8 +70,8 @@ runs:
artifacts-dir: greptime-linux-${{ inputs.arch }}-${{ inputs.version }}
version: ${{ inputs.version }}
working-dir: ${{ inputs.working-dir }}
image-registry: public.ecr.aws
image-namespace: i8k6a5e1/greptime
image-registry: ${{ inputs.image-registry }}
image-namespace: ${{ inputs.image-namespace }}
- name: Clean up the target directory # Clean up the target directory for the centos7 base image, or it will still use the objects of last build.
shell: bash
@@ -82,8 +88,8 @@ runs:
artifacts-dir: greptime-linux-${{ inputs.arch }}-centos-${{ inputs.version }}
version: ${{ inputs.version }}
working-dir: ${{ inputs.working-dir }}
image-registry: public.ecr.aws
image-namespace: i8k6a5e1/greptime
image-registry: ${{ inputs.image-registry }}
image-namespace: ${{ inputs.image-namespace }}
- name: Build greptime on android base image
uses: ./.github/actions/build-greptime-binary
@@ -94,5 +100,5 @@ runs:
version: ${{ inputs.version }}
working-dir: ${{ inputs.working-dir }}
build-android-artifacts: true
image-registry: public.ecr.aws
image-namespace: i8k6a5e1/greptime
image-registry: ${{ inputs.image-registry }}
image-namespace: ${{ inputs.image-namespace }}

View File

@@ -4,9 +4,6 @@ inputs:
arch:
description: Architecture to build
required: true
rust-toolchain:
description: Rust toolchain to use
required: true
cargo-profile:
description: Cargo profile to build
required: true
@@ -43,10 +40,9 @@ runs:
brew install protobuf
- name: Install rust toolchain
uses: dtolnay/rust-toolchain@master
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ inputs.rust-toolchain }}
targets: ${{ inputs.arch }}
target: ${{ inputs.arch }}
- name: Start etcd # For integration tests.
if: ${{ inputs.disable-run-tests == 'false' }}
@@ -62,12 +58,13 @@ runs:
# Get proper backtraces in mac Sonoma. Currently there's an issue with the new
# linker that prevents backtraces from getting printed correctly.
#
# <https://github.com/rust-lang/rust/issues/113783>
- name: Run integration tests
if: ${{ inputs.disable-run-tests == 'false' }}
shell: bash
env:
CARGO_BUILD_RUSTFLAGS: "-Clink-arg=-Wl,-ld_classic"
SQLNESS_OPTS: "--preserve-state"
run: |
make test sqlness-test
@@ -81,7 +78,7 @@ runs:
- name: Build greptime binary
shell: bash
env:
CARGO_BUILD_RUSTFLAGS: "-Clink-arg=-Wl,-ld_classic"
run: |
make build \

View File

@@ -4,9 +4,6 @@ inputs:
arch:
description: Architecture to build
required: true
rust-toolchain:
description: Rust toolchain to use
required: true
cargo-profile:
description: Cargo profile to build
required: true
@@ -28,10 +25,9 @@ runs:
- uses: arduino/setup-protoc@v3
- name: Install rust toolchain
uses: dtolnay/rust-toolchain@master
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ inputs.rust-toolchain }}
targets: ${{ inputs.arch }}
target: ${{ inputs.arch }}
components: llvm-tools-preview
- name: Rust Cache
@@ -40,11 +36,11 @@ runs:
- name: Install Python
uses: actions/setup-python@v5
with:
python-version: '3.10'
python-version: "3.10"
- name: Install PyArrow Package
shell: pwsh
run: pip install pyarrow
run: pip install pyarrow numpy
- name: Install WSL distribution
uses: Vampire/setup-wsl@v2
@@ -62,13 +58,14 @@ runs:
env:
RUSTUP_WINDOWS_PATH_ADD_BIN: 1 # Workaround for https://github.com/nextest-rs/nextest/issues/1493
RUST_BACKTRACE: 1
SQLNESS_OPTS: "--preserve-state"
- name: Upload sqlness logs
if: ${{ failure() }} # Only upload logs when the integration tests failed.
uses: actions/upload-artifact@v4
with:
name: sqlness-logs
path: /tmp/greptime-*.log
path: C:\Users\RUNNER~1\AppData\Local\Temp\sqlness*
retention-days: 3
- name: Build greptime binary

View File

@@ -18,6 +18,8 @@ runs:
--set replicaCount=${{ inputs.etcd-replicas }} \
--set resources.requests.cpu=50m \
--set resources.requests.memory=128Mi \
--set resources.limits.cpu=1500m \
--set resources.limits.memory=2Gi \
--set auth.rbac.create=false \
--set auth.rbac.token.enabled=false \
--set persistence.size=2Gi \

View File

@@ -8,7 +8,7 @@ inputs:
default: 2
description: "Number of Datanode replicas"
meta-replicas:
default: 3
default: 1
description: "Number of Metasrv replicas"
image-registry:
default: "docker.io"
@@ -58,7 +58,7 @@ runs:
--set image.tag=${{ inputs.image-tag }} \
--set base.podTemplate.main.resources.requests.cpu=50m \
--set base.podTemplate.main.resources.requests.memory=256Mi \
--set base.podTemplate.main.resources.limits.cpu=1000m \
--set base.podTemplate.main.resources.limits.cpu=2000m \
--set base.podTemplate.main.resources.limits.memory=2Gi \
--set frontend.replicas=${{ inputs.frontend-replicas }} \
--set datanode.replicas=${{ inputs.datanode-replicas }} \

View File

@@ -1,18 +1,13 @@
meta:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
datanode:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
compact_rt_size = 2
frontend:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4

View File

@@ -1,29 +1,24 @@
meta:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[datanode]
[datanode.client]
timeout = "60s"
datanode:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
compact_rt_size = 2
[storage]
cache_path = "/data/greptimedb/s3cache"
cache_capacity = "256MB"
frontend:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[meta_client]
ddl_timeout = "60s"

View File

@@ -1,25 +1,20 @@
meta:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[datanode]
[datanode.client]
timeout = "60s"
datanode:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
compact_rt_size = 2
frontend:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[meta_client]
ddl_timeout = "60s"

View File

@@ -1,9 +1,7 @@
meta:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[wal]
provider = "kafka"
@@ -15,22 +13,19 @@ meta:
[datanode.client]
timeout = "60s"
datanode:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
compact_rt_size = 2
[wal]
provider = "kafka"
broker_endpoints = ["kafka.kafka-cluster.svc.cluster.local:9092"]
linger = "2ms"
frontend:
config: |-
configData: |-
[runtime]
read_rt_size = 8
write_rt_size = 8
bg_rt_size = 8
global_rt_size = 4
[meta_client]
ddl_timeout = "60s"
@@ -43,3 +38,8 @@ objectStorage:
credentials:
accessKeyId: rootuser
secretAccessKey: rootpass123
remoteWal:
enabled: true
kafka:
brokerEndpoints:
- "kafka.kafka-cluster.svc.cluster.local:9092"

View File

@@ -18,6 +18,8 @@ runs:
--set controller.replicaCount=${{ inputs.controller-replicas }} \
--set controller.resources.requests.cpu=50m \
--set controller.resources.requests.memory=128Mi \
--set controller.resources.limits.cpu=2000m \
--set controller.resources.limits.memory=2Gi \
--set listeners.controller.protocol=PLAINTEXT \
--set listeners.client.protocol=PLAINTEXT \
--create-namespace \

View File

@@ -0,0 +1,30 @@
name: Setup PostgreSQL
description: Deploy PostgreSQL on Kubernetes
inputs:
postgres-replicas:
default: 1
description: "Number of PostgreSQL replicas"
namespace:
default: "postgres-namespace"
postgres-version:
default: "14.2"
description: "PostgreSQL version"
storage-size:
default: "1Gi"
description: "Storage size for PostgreSQL"
runs:
using: composite
steps:
- name: Install PostgreSQL
shell: bash
run: |
helm upgrade \
--install postgresql oci://registry-1.docker.io/bitnamicharts/postgresql \
--set replicaCount=${{ inputs.postgres-replicas }} \
--set image.tag=${{ inputs.postgres-version }} \
--set persistence.size=${{ inputs.storage-size }} \
--set postgresql.username=greptimedb \
--set postgresql.password=admin \
--create-namespace \
-n ${{ inputs.namespace }}

View File

@@ -38,7 +38,7 @@ runs:
steps:
- name: Configure AWS credentials
if: startsWith(inputs.runner, 'ec2')
uses: aws-actions/configure-aws-credentials@v2
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ inputs.aws-access-key-id }}
aws-secret-access-key: ${{ inputs.aws-secret-access-key }}

View File

@@ -25,7 +25,7 @@ runs:
steps:
- name: Configure AWS credentials
if: ${{ inputs.label && inputs.ec2-instance-id }}
uses: aws-actions/configure-aws-credentials@v2
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ inputs.aws-access-key-id }}
aws-secret-access-key: ${{ inputs.aws-secret-access-key }}

.github/cargo-blacklist.txt vendored Normal file
View File

@@ -0,0 +1,3 @@
native-tls
openssl
aws-lc-sys

View File

@@ -4,7 +4,8 @@ I hereby agree to the terms of the [GreptimeDB CLA](https://github.com/GreptimeT
## What's changed and what's your intention?
__!!! DO NOT LEAVE THIS BLOCK EMPTY !!!__
<!--
__!!! DO NOT LEAVE THIS BLOCK EMPTY !!!__
Please explain IN DETAIL what the changes are in this PR and why they are needed:
@@ -12,9 +13,14 @@ Please explain IN DETAIL what the changes are in this PR and why they are needed
- How does this PR work? Need a brief introduction for the changed logic (optional)
- Describe clearly one logical change and avoid lazy messages (optional)
- Describe any limitations of the current code (optional)
- Describe if this PR will break **API or data compatibility** (optional)
-->
## Checklist
## PR Checklist
Please convert it to a draft if some of the following conditions are not met.
- [ ] I have written the necessary rustdoc comments.
- [ ] I have added the necessary unit tests and integration tests.
- [ ] This PR requires documentation updates.
- [ ] API changes are backward compatible.
- [ ] Schema or data changes are backward compatible.

.github/scripts/check-install-script.sh vendored Executable file
View File

@@ -0,0 +1,14 @@
#!/bin/sh
set -e
# Get the latest version of github.com/GreptimeTeam/greptimedb
VERSION=$(curl -s https://api.github.com/repos/GreptimeTeam/greptimedb/releases/latest | jq -r '.tag_name')
echo "Downloading the latest version: $VERSION"
# Download the install script
curl -fsSL https://raw.githubusercontent.com/greptimeteam/greptimedb/main/scripts/install.sh | sh -s $VERSION
# Execute the `greptime` command
./greptime --version

View File

@@ -12,9 +12,6 @@ on:
name: Build API docs
env:
RUST_TOOLCHAIN: nightly-2024-04-20
jobs:
apidoc:
runs-on: ubuntu-20.04
@@ -23,9 +20,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- run: cargo doc --workspace --no-deps --document-private-items
- run: |
cat <<EOF > target/doc/index.html

.github/workflows/dependency-check.yml vendored Normal file
View File

@@ -0,0 +1,36 @@
name: Check Dependencies
on:
push:
branches:
- main
pull_request:
branches:
- main
jobs:
check-dependencies:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Rust
uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Run cargo tree
run: cargo tree --prefix none > dependencies.txt
- name: Extract dependency names
run: awk '{print $1}' dependencies.txt > dependency_names.txt
- name: Check for blacklisted crates
run: |
while read -r dep; do
if grep -qFx "$dep" dependency_names.txt; then
echo "Blacklisted crate '$dep' found in dependencies."
exit 1
fi
done < .github/cargo-blacklist.txt
echo "No blacklisted crates found."

View File

@@ -29,7 +29,7 @@ on:
linux_arm64_runner:
type: choice
description: The runner uses to build linux-arm64 artifacts
default: ec2-c6g.4xlarge-arm64
default: ec2-c6g.8xlarge-arm64
options:
- ec2-c6g.xlarge-arm64 # 4C8G
- ec2-c6g.2xlarge-arm64 # 8C16G
@@ -177,6 +177,8 @@ jobs:
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
dev-mode: true # Only build the standard greptime binary.
working-dir: ${{ env.CHECKOUT_GREPTIMEDB_PATH }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
build-linux-arm64-artifacts:
name: Build linux-arm64 artifacts
@@ -206,6 +208,8 @@ jobs:
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
dev-mode: true # Only build the standard greptime binary.
working-dir: ${{ env.CHECKOUT_GREPTIMEDB_PATH }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
release-images-to-dockerhub:
name: Build and push images to DockerHub

View File

@@ -29,9 +29,6 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
RUST_TOOLCHAIN: nightly-2024-04-20
jobs:
check-typos-and-docs:
name: Check typos and docs
@@ -64,9 +61,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -82,9 +77,7 @@ jobs:
timeout-minutes: 60
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@master
with:
toolchain: stable
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -107,9 +100,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- uses: Swatinem/rust-cache@v2
with:
# Shares across multiple jobs
@@ -141,16 +132,27 @@ jobs:
runs-on: ubuntu-latest
timeout-minutes: 60
strategy:
fail-fast: false
matrix:
target: [ "fuzz_create_table", "fuzz_alter_table", "fuzz_create_database", "fuzz_create_logical_table", "fuzz_alter_logical_table", "fuzz_insert", "fuzz_insert_logical_table" ]
steps:
- name: Remove unused software
run: |
echo "Disk space before:"
df -h
[[ -d /usr/share/dotnet ]] && sudo rm -rf /usr/share/dotnet
[[ -d /usr/local/lib/android ]] && sudo rm -rf /usr/local/lib/android
[[ -d /opt/ghc ]] && sudo rm -rf /opt/ghc
[[ -d /opt/hostedtoolcache/CodeQL ]] && sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker image prune --all --force
sudo docker builder prune -a
echo "Disk space after:"
df -h
- uses: actions/checkout@v4
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -168,7 +170,7 @@ jobs:
name: bins
path: .
- name: Unzip binaries
run: |
tar -xvf ./bins.tar.gz
rm ./bins.tar.gz
- name: Run GreptimeDB
@@ -192,13 +194,23 @@ jobs:
matrix:
target: [ "unstable_fuzz_create_table_standalone" ]
steps:
- name: Remove unused software
run: |
echo "Disk space before:"
df -h
[[ -d /usr/share/dotnet ]] && sudo rm -rf /usr/share/dotnet
[[ -d /usr/local/lib/android ]] && sudo rm -rf /usr/local/lib/android
[[ -d /opt/ghc ]] && sudo rm -rf /opt/ghc
[[ -d /opt/hostedtoolcache/CodeQL ]] && sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker image prune --all --force
sudo docker builder prune -a
echo "Disk space after:"
df -h
- uses: actions/checkout@v4
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -249,9 +261,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- uses: Swatinem/rust-cache@v2
with:
# Shares across multiple jobs
@@ -262,7 +272,7 @@ jobs:
- name: Build greptime binary
shell: bash
# `cargo gc` will invoke `cargo build` with specified args
run: cargo gc --profile ci -- --bin greptime
- name: Pack greptime binary
shell: bash
run: |
@@ -276,7 +286,7 @@ jobs:
artifacts-dir: bin
version: current
distributed-fuzztest:
name: Fuzz Test (Distributed, ${{ matrix.mode.name }}, ${{ matrix.target }})
runs-on: ubuntu-latest
needs: build-greptime-ci
@@ -284,24 +294,24 @@ jobs:
strategy:
matrix:
target: [ "fuzz_create_table", "fuzz_alter_table", "fuzz_create_database", "fuzz_create_logical_table", "fuzz_alter_logical_table", "fuzz_insert", "fuzz_insert_logical_table" ]
mode:
- name: "Disk"
minio: false
kafka: false
values: "with-disk.yaml"
- name: "Minio"
minio: true
kafka: false
values: "with-minio.yaml"
- name: "Minio with Cache"
minio: true
kafka: false
values: "with-minio-and-cache.yaml"
mode:
- name: "Remote WAL"
minio: true
kafka: true
values: "with-remote-wal.yaml"
steps:
- name: Remove unused software
run: |
echo "Disk space before:"
df -h
[[ -d /usr/share/dotnet ]] && sudo rm -rf /usr/share/dotnet
[[ -d /usr/local/lib/android ]] && sudo rm -rf /usr/local/lib/android
[[ -d /opt/ghc ]] && sudo rm -rf /opt/ghc
[[ -d /opt/hostedtoolcache/CodeQL ]] && sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker image prune --all --force
sudo docker builder prune -a
echo "Disk space after:"
df -h
- uses: actions/checkout@v4
- name: Setup Kind
uses: ./.github/actions/setup-kind
@@ -317,9 +327,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -389,12 +397,12 @@ jobs:
- name: Describe Nodes
if: failure()
shell: bash
run: |
kubectl describe nodes
- name: Export kind logs
if: failure()
shell: bash
run: |
kind export logs /tmp/kind
- name: Upload logs
if: failure()
@@ -406,26 +414,51 @@ jobs:
- name: Delete cluster
if: success()
shell: bash
run: |
kind delete cluster
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker system prune -f
distributed-fuzztest-with-chaos:
name: Fuzz Test with Chaos (Distributed, ${{ matrix.mode.name }}, ${{ matrix.target }})
runs-on: ubuntu-latest
needs: build-greptime-ci
timeout-minutes: 60
strategy:
matrix:
target: ["fuzz_migrate_mito_regions", "fuzz_failover_mito_regions", "fuzz_failover_metric_regions"]
target: ["fuzz_migrate_mito_regions", "fuzz_migrate_metric_regions", "fuzz_failover_mito_regions", "fuzz_failover_metric_regions"]
mode:
- name: "Remote WAL"
minio: true
kafka: true
values: "with-remote-wal.yaml"
include:
- target: "fuzz_migrate_mito_regions"
mode:
name: "Local WAL"
minio: true
kafka: false
values: "with-minio.yaml"
- target: "fuzz_migrate_metric_regions"
mode:
name: "Local WAL"
minio: true
kafka: false
values: "with-minio.yaml"
steps:
- name: Remove unused software
run: |
echo "Disk space before:"
df -h
[[ -d /usr/share/dotnet ]] && sudo rm -rf /usr/share/dotnet
[[ -d /usr/local/lib/android ]] && sudo rm -rf /usr/local/lib/android
[[ -d /opt/ghc ]] && sudo rm -rf /opt/ghc
[[ -d /opt/hostedtoolcache/CodeQL ]] && sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo docker image prune --all --force
sudo docker builder prune -a
echo "Disk space after:"
df -h
- uses: actions/checkout@v4
- name: Setup Kind
uses: ./.github/actions/setup-kind
@@ -443,9 +476,7 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
@@ -501,7 +532,7 @@ jobs:
with:
image-registry: localhost:5001
values-filename: ${{ matrix.mode.values }}
enable-region-failover: true
enable-region-failover: ${{ matrix.mode.kafka }}
- name: Port forward (mysql)
run: |
kubectl port-forward service/my-greptimedb-frontend 4002:4002 -n my-greptimedb&
@@ -516,12 +547,12 @@ jobs:
- name: Describe Nodes
if: failure()
shell: bash
run: |
kubectl describe nodes
- name: Export kind logs
if: failure()
shell: bash
run: |
kind export logs /tmp/kind
- name: Upload logs
if: failure()
@@ -533,7 +564,7 @@ jobs:
- name: Delete cluster
if: success()
shell: bash
run: |
kind delete cluster
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
@@ -556,6 +587,10 @@ jobs:
timeout-minutes: 60
steps:
- uses: actions/checkout@v4
- if: matrix.mode.kafka
name: Setup kafka server
working-directory: tests-integration/fixtures/kafka
run: docker compose -f docker-compose-standalone.yml up -d --wait
- name: Download pre-built binaries
uses: actions/download-artifact@v4
with:
@@ -563,10 +598,6 @@ jobs:
path: .
- name: Unzip binaries
run: tar -xvf ./bins.tar.gz
- if: matrix.mode.kafka
name: Setup kafka server
working-directory: tests-integration/fixtures/kafka
run: docker compose -f docker-compose-standalone.yml up -d --wait
- name: Run sqlness
run: RUST_BACKTRACE=1 ./bins/sqlness-runner ${{ matrix.mode.opts }} -c ./tests/cases --bins-dir ./bins --preserve-state
- name: Upload sqlness logs
@@ -586,17 +617,16 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
components: rustfmt
- name: Rust Cache
uses: Swatinem/rust-cache@v2
with:
# Shares across multiple jobs
shared-key: "check-rust-fmt"
- name: Run cargo fmt
run: cargo fmt --all -- --check
- name: Check format
run: make fmt-check
clippy:
name: Clippy
@@ -607,9 +637,8 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
- uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
components: clippy
- name: Rust Cache
uses: Swatinem/rust-cache@v2
@@ -633,9 +662,8 @@ jobs:
with:
version: "14.0"
- name: Install toolchain
uses: dtolnay/rust-toolchain@master
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
components: llvm-tools-preview
- name: Rust Cache
uses: Swatinem/rust-cache@v2
@@ -655,7 +683,7 @@ jobs:
with:
python-version: '3.10'
- name: Install PyArrow Package
run: pip install pyarrow
run: pip install pyarrow numpy
- name: Setup etcd server
working-directory: tests-integration/fixtures/etcd
run: docker compose -f docker-compose-standalone.yml up -d --wait
@@ -665,6 +693,9 @@ jobs:
- name: Setup minio
working-directory: tests-integration/fixtures/minio
run: docker compose -f docker-compose-standalone.yml up -d --wait
- name: Setup postgres server
working-directory: tests-integration/fixtures/postgres
run: docker compose -f docker-compose-standalone.yml up -d --wait
- name: Run nextest cases
run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F pyo3_backend -F dashboard
env:
@@ -681,7 +712,9 @@ jobs:
GT_MINIO_REGION: us-west-2
GT_MINIO_ENDPOINT_URL: http://127.0.0.1:9000
GT_ETCD_ENDPOINTS: http://127.0.0.1:2379
GT_POSTGRES_ENDPOINTS: postgres://greptimedb:admin@127.0.0.1:5432/postgres
GT_KAFKA_ENDPOINTS: 127.0.0.1:9092
GT_KAFKA_SASL_ENDPOINTS: 127.0.0.1:9093
UNITTEST_LOG_DIR: "__unittest_logs"
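With the new fixture and `GT_POSTGRES_ENDPOINTS`, the coverage job now also exercises the Postgres-backed code paths. A rough local equivalent of just those steps, assuming the compose file exists at the path shown above:

```shell
(cd tests-integration/fixtures/postgres && docker compose -f docker-compose-standalone.yml up -d --wait)
export GT_POSTGRES_ENDPOINTS="postgres://greptimedb:admin@127.0.0.1:5432/postgres"
cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F pyo3_backend -F dashboard
```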
- name: Codecov upload
uses: codecov/codecov-action@v4


@@ -12,7 +12,7 @@ on:
linux_amd64_runner:
type: choice
description: The runner used to build linux-amd64 artifacts
default: ec2-c6i.2xlarge-amd64
default: ec2-c6i.4xlarge-amd64
options:
- ubuntu-20.04
- ubuntu-20.04-8-cores
@@ -27,7 +27,7 @@ on:
linux_arm64_runner:
type: choice
description: The runner used to build linux-arm64 artifacts
default: ec2-c6g.2xlarge-arm64
default: ec2-c6g.8xlarge-arm64
options:
- ec2-c6g.xlarge-arm64 # 4C8G
- ec2-c6g.2xlarge-arm64 # 8C16G
@@ -154,6 +154,8 @@ jobs:
cargo-profile: ${{ env.CARGO_PROFILE }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
build-linux-arm64-artifacts:
name: Build linux-arm64 artifacts
@@ -173,6 +175,8 @@ jobs:
cargo-profile: ${{ env.CARGO_PROFILE }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
release-images-to-dockerhub:
name: Build and push images to DockerHub


@@ -9,9 +9,6 @@ concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true
env:
RUST_TOOLCHAIN: nightly-2024-04-20
permissions:
issues: write
@@ -25,6 +22,10 @@ jobs:
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Check install.sh
run: ./.github/scripts/check-install-script.sh
- name: Run sqlness test
uses: ./.github/actions/sqlness-test
with:
@@ -33,6 +34,13 @@ jobs:
aws-region: ${{ vars.AWS_CI_TEST_BUCKET_REGION }}
aws-access-key-id: ${{ secrets.AWS_CI_TEST_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_CI_TEST_SECRET_ACCESS_KEY }}
- name: Upload sqlness logs
if: failure()
uses: actions/upload-artifact@v4
with:
name: sqlness-logs-kind
path: /tmp/kind/
retention-days: 3
sqlness-windows:
name: Sqlness tests on Windows
@@ -45,19 +53,19 @@ jobs:
- uses: arduino/setup-protoc@v3
with:
repo-token: ${{ secrets.GITHUB_TOKEN }}
- uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
- uses: actions-rust-lang/setup-rust-toolchain@v1
- name: Rust Cache
uses: Swatinem/rust-cache@v2
- name: Run sqlness
run: cargo sqlness
run: make sqlness-test
env:
SQLNESS_OPTS: "--preserve-state"
- name: Upload sqlness logs
if: always()
if: failure()
uses: actions/upload-artifact@v4
with:
name: sqlness-logs
path: /tmp/greptime-*.log
path: C:\Users\RUNNER~1\AppData\Local\Temp\sqlness*
retention-days: 3
test-on-windows:
@@ -76,9 +84,8 @@ jobs:
with:
version: "14.0"
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@master
uses: actions-rust-lang/setup-rust-toolchain@v1
with:
toolchain: ${{ env.RUST_TOOLCHAIN }}
components: llvm-tools-preview
- name: Rust Cache
uses: Swatinem/rust-cache@v2
@@ -89,7 +96,7 @@ jobs:
with:
python-version: "3.10"
- name: Install PyArrow Package
run: pip install pyarrow
run: pip install pyarrow numpy
- name: Install WSL distribution
uses: Vampire/setup-wsl@v2
with:
@@ -107,13 +114,20 @@ jobs:
GT_S3_REGION: ${{ vars.AWS_CI_TEST_BUCKET_REGION }}
UNITTEST_LOG_DIR: "__unittest_logs"
cleanbuild-linux-nix:
runs-on: ubuntu-latest-8-cores
timeout-minutes: 60
needs: [coverage, fmt, clippy, check]
steps:
- uses: actions/checkout@v4
- uses: cachix/install-nix-action@v27
with:
nix_path: nixpkgs=channel:nixos-unstable
- run: nix-shell --pure --run "cargo build"
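`nix-shell --pure` drops the ambient environment, so the build only sees tools declared in the repository's Nix expression; that is what makes this a clean-system check. Reproducing it locally (assuming Nix is installed and the repo provides a `shell.nix`):

```shell
nix-shell --pure --run "cargo build"
```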
check-status:
name: Check status
needs: [
sqlness-test,
sqlness-windows,
test-on-windows,
]
needs: [sqlness-test, sqlness-windows, test-on-windows]
if: ${{ github.repository == 'GreptimeTeam/greptimedb' }}
runs-on: ubuntu-20.04
outputs:
@@ -127,9 +141,7 @@ jobs:
notification:
if: ${{ github.repository == 'GreptimeTeam/greptimedb' && always() }} # Not requiring successful dependent jobs, always run.
name: Send notification to Greptime team
needs: [
check-status
]
needs: [check-status]
runs-on: ubuntu-20.04
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_DEVELOP_CHANNEL }}


@@ -1,12 +1,14 @@
name: Release dev-builder images
on:
push:
branches:
- main
paths:
- rust-toolchain.toml
- 'docker/dev-builder/**'
workflow_dispatch: # Allows you to run this workflow manually.
inputs:
version:
description: Version of the dev-builder
required: false
default: latest
release_dev_builder_ubuntu_image:
type: boolean
description: Release dev-builder-ubuntu image
@@ -28,22 +30,103 @@ jobs:
name: Release dev builder images
if: ${{ inputs.release_dev_builder_ubuntu_image || inputs.release_dev_builder_centos_image || inputs.release_dev_builder_android_image }} # Only manually trigger this job.
runs-on: ubuntu-20.04-16-cores
outputs:
version: ${{ steps.set-version.outputs.version }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Configure build image version
id: set-version
shell: bash
run: |
commitShortSHA=`echo ${{ github.sha }} | cut -c1-8`
buildTime=`date +%Y%m%d%H%M%S`
BUILD_VERSION="$commitShortSHA-$buildTime"
RUST_TOOLCHAIN_VERSION=$(cat rust-toolchain.toml | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}')
IMAGE_VERSION="${RUST_TOOLCHAIN_VERSION}-${BUILD_VERSION}"
echo "VERSION=${IMAGE_VERSION}" >> $GITHUB_ENV
echo "version=$IMAGE_VERSION" >> $GITHUB_OUTPUT
- name: Build and push dev builder images
uses: ./.github/actions/build-dev-builder-images
with:
version: ${{ inputs.version }}
version: ${{ env.VERSION }}
dockerhub-image-registry-username: ${{ secrets.DOCKERHUB_USERNAME }}
dockerhub-image-registry-token: ${{ secrets.DOCKERHUB_TOKEN }}
build-dev-builder-ubuntu: ${{ inputs.release_dev_builder_ubuntu_image }}
build-dev-builder-centos: ${{ inputs.release_dev_builder_centos_image }}
build-dev-builder-android: ${{ inputs.release_dev_builder_android_image }}
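The image version concatenates the toolchain date parsed from `rust-toolchain.toml`, the short commit SHA, and a build timestamp. A worked example with illustrative values:

```shell
commitShortSHA=$(echo "a5c00e85deadbeef" | cut -c1-8)   # -> a5c00e85
buildTime="20241024184445"                              # output shape of `date +%Y%m%d%H%M%S`
RUST_TOOLCHAIN_VERSION="2024-10-19"                     # date grepped from rust-toolchain.toml
echo "${RUST_TOOLCHAIN_VERSION}-${commitShortSHA}-${buildTime}"
# -> 2024-10-19-a5c00e85-20241024184445, the same shape as the
#    DEV_BUILDER_IMAGE_TAG pinned in the Makefile diff below.
```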
release-dev-builder-images-ecr:
name: Release dev builder images to AWS ECR
runs-on: ubuntu-20.04
needs: [
release-dev-builder-images
]
steps:
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ECR_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_ECR_SECRET_ACCESS_KEY }}
aws-region: ${{ vars.ECR_REGION }}
- name: Login to Amazon ECR
id: login-ecr-public
uses: aws-actions/amazon-ecr-login@v2
env:
AWS_REGION: ${{ vars.ECR_REGION }}
with:
registry-type: public
- name: Push dev-builder-ubuntu image
shell: bash
if: ${{ inputs.release_dev_builder_ubuntu_image }}
run: |
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ needs.release-dev-builder-images.outputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:latest \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-ubuntu:latest
- name: Push dev-builder-centos image
shell: bash
if: ${{ inputs.release_dev_builder_centos_image }}
run: |
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-centos:${{ needs.release-dev-builder-images.outputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:latest \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-centos:latest
- name: Push dev-builder-android image
shell: bash
if: ${{ inputs.release_dev_builder_android_image }}
run: |
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-android:${{ needs.release-dev-builder-images.outputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:latest \
docker://${{ vars.ECR_IMAGE_REGISTRY }}/${{ vars.ECR_IMAGE_NAMESPACE }}/dev-builder-android:latest
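All three push steps share one pattern: run skopeo inside its official container, mount the local Docker config read-only, and point `REGISTRY_AUTH_FILE` at it so credentials never appear on the command line; `copy -a` carries over every architecture of a multi-arch image. A minimal sketch with placeholder image names:

```shell
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
  -e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
  quay.io/skopeo/stable:latest \
  copy -a \
  docker://docker.io/example-namespace/dev-builder-ubuntu:latest \
  docker://public.ecr.aws/example-namespace/dev-builder-ubuntu:latest
```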
release-dev-builder-images-cn: # Note: Be careful of https://github.com/containers/skopeo/issues/1874; we decided to use the latest stable skopeo container.
name: Release dev builder images to CN region
runs-on: ubuntu-20.04
@@ -51,35 +134,39 @@ jobs:
release-dev-builder-images
]
steps:
- name: Login to AliCloud Container Registry
uses: docker/login-action@v3
with:
registry: ${{ vars.ACR_IMAGE_REGISTRY }}
username: ${{ secrets.ALICLOUD_USERNAME }}
password: ${{ secrets.ALICLOUD_PASSWORD }}
- name: Push dev-builder-ubuntu image
shell: bash
if: ${{ inputs.release_dev_builder_ubuntu_image }}
env:
DST_REGISTRY_USERNAME: ${{ secrets.ALICLOUD_USERNAME }}
DST_REGISTRY_PASSWORD: ${{ secrets.ALICLOUD_PASSWORD }}
run: |
docker run quay.io/skopeo/stable:latest copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ inputs.version }} \
--dest-creds "$DST_REGISTRY_USERNAME":"$DST_REGISTRY_PASSWORD" \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ inputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-ubuntu:${{ needs.release-dev-builder-images.outputs.version }}
- name: Push dev-builder-centos image
shell: bash
if: ${{ inputs.release_dev_builder_centos_image }}
env:
DST_REGISTRY_USERNAME: ${{ secrets.ALICLOUD_USERNAME }}
DST_REGISTRY_PASSWORD: ${{ secrets.ALICLOUD_PASSWORD }}
run: |
docker run quay.io/skopeo/stable:latest copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:${{ inputs.version }} \
--dest-creds "$DST_REGISTRY_USERNAME":"$DST_REGISTRY_PASSWORD" \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:${{ inputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-centos:${{ needs.release-dev-builder-images.outputs.version }}
- name: Push dev-builder-android image
shell: bash
if: ${{ inputs.release_dev_builder_android_image }}
env:
DST_REGISTRY_USERNAME: ${{ secrets.ALICLOUD_USERNAME }}
DST_REGISTRY_PASSWORD: ${{ secrets.ALICLOUD_PASSWORD }}
run: |
docker run quay.io/skopeo/stable:latest copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:${{ inputs.version }} \
--dest-creds "$DST_REGISTRY_USERNAME":"$DST_REGISTRY_PASSWORD" \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:${{ inputs.version }}
docker run -v "${DOCKER_CONFIG:-$HOME/.docker}:/root/.docker:ro" \
-e "REGISTRY_AUTH_FILE=/root/.docker/config.json" \
quay.io/skopeo/stable:latest \
copy -a docker://docker.io/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:${{ needs.release-dev-builder-images.outputs.version }} \
docker://${{ vars.ACR_IMAGE_REGISTRY }}/${{ vars.IMAGE_NAMESPACE }}/dev-builder-android:${{ needs.release-dev-builder-images.outputs.version }}


@@ -31,8 +31,9 @@ on:
linux_arm64_runner:
type: choice
description: The runner used to build linux-arm64 artifacts
default: ec2-c6g.4xlarge-arm64
default: ec2-c6g.8xlarge-arm64
options:
- ubuntu-2204-32-cores-arm
- ec2-c6g.xlarge-arm64 # 4C8G
- ec2-c6g.2xlarge-arm64 # 8C16G
- ec2-c6g.4xlarge-arm64 # 16C32G
@@ -82,7 +83,6 @@ on:
# Use env variables to control all the release process.
env:
# The arguments of building greptime.
RUST_TOOLCHAIN: nightly-2024-04-20
CARGO_PROFILE: nightly
# Controls whether to run tests, include unit-test, integration-test and sqlness.
@@ -91,7 +91,7 @@ env:
# The scheduled version is '${{ env.NEXT_RELEASE_VERSION }}-nightly-YYYYMMDD', like v0.2.0-nightly-20230313;
NIGHTLY_RELEASE_PREFIX: nightly
# Note: The NEXT_RELEASE_VERSION should be updated manually for every formal release.
NEXT_RELEASE_VERSION: v0.10.0
NEXT_RELEASE_VERSION: v0.11.0
# Permission reference: https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs
permissions:
@@ -123,6 +123,11 @@ jobs:
with:
fetch-depth: 0
- name: Check Rust toolchain version
shell: bash
run: |
./scripts/check-builder-rust-version.sh
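The body of `check-builder-rust-version.sh` is not part of this diff; a purely hypothetical sketch of what such a guard could look like, comparing the toolchain date against the pinned dev-builder tag:

```shell
# Hypothetical sketch only -- the real script is not shown in this diff.
toolchain_date=$(grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' rust-toolchain.toml)
builder_date=$(grep -Eo 'DEV_BUILDER_IMAGE_TAG \?= [0-9]{4}-[0-9]{2}-[0-9]{2}' Makefile \
  | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}')
if [ "$toolchain_date" != "$builder_date" ]; then
  echo "rust-toolchain.toml ($toolchain_date) != dev-builder image ($builder_date)" >&2
  exit 1
fi
```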
# The create-version will create a global variable named 'version' in the global workflows.
# - If it's a tag push release, the version is the tag name(${{ github.ref_name }});
# - If it's a scheduled release, the version is '${{ env.NEXT_RELEASE_VERSION }}-nightly-$buildTime', like v0.2.0-nightly-20230313;
@@ -183,6 +188,8 @@ jobs:
cargo-profile: ${{ env.CARGO_PROFILE }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
build-linux-arm64-artifacts:
name: Build linux-arm64 artifacts
@@ -202,6 +209,8 @@ jobs:
cargo-profile: ${{ env.CARGO_PROFILE }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
image-registry: ${{ vars.ECR_IMAGE_REGISTRY }}
image-namespace: ${{ vars.ECR_IMAGE_NAMESPACE }}
build-macos-artifacts:
name: Build macOS artifacts
@@ -240,11 +249,11 @@ jobs:
- uses: ./.github/actions/build-macos-artifacts
with:
arch: ${{ matrix.arch }}
rust-toolchain: ${{ env.RUST_TOOLCHAIN }}
cargo-profile: ${{ env.CARGO_PROFILE }}
features: ${{ matrix.features }}
version: ${{ needs.allocate-runners.outputs.version }}
disable-run-tests: ${{ env.DISABLE_RUN_TESTS }}
# We decided to disable the integration tests on macOS because they're unnecessary and time-consuming.
disable-run-tests: true
artifacts-dir: ${{ matrix.artifacts-dir-prefix }}-${{ needs.allocate-runners.outputs.version }}
- name: Set build macos result
@@ -283,7 +292,6 @@ jobs:
- uses: ./.github/actions/build-windows-artifacts
with:
arch: ${{ matrix.arch }}
rust-toolchain: ${{ env.RUST_TOOLCHAIN }}
cargo-profile: ${{ env.CARGO_PROFILE }}
features: ${{ matrix.features }}
version: ${{ needs.allocate-runners.outputs.version }}

.gitignore vendored (6 changes)

@@ -47,6 +47,10 @@ benchmarks/data
venv/
# Fuzz tests
tests-fuzz/artifacts/
tests-fuzz/corpus/
# Nix
.direnv
.envrc


@@ -17,6 +17,6 @@ repos:
- id: fmt
- id: clippy
args: ["--workspace", "--all-targets", "--all-features", "--", "-D", "warnings"]
stages: [push]
stages: [pre-push]
- id: cargo-check
args: ["--workspace", "--all-targets", "--all-features"]


@@ -7,6 +7,8 @@
* [NiwakaDev](https://github.com/NiwakaDev)
* [etolbakov](https://github.com/etolbakov)
* [irenjj](https://github.com/irenjj)
* [tisonkun](https://github.com/tisonkun)
* [Lanqing Yang](https://github.com/lyang24)
## Team Members (in alphabetical order)
@@ -30,7 +32,6 @@
* [shuiyisong](https://github.com/shuiyisong)
* [sunchanglong](https://github.com/sunchanglong)
* [sunng87](https://github.com/sunng87)
* [tisonkun](https://github.com/tisonkun)
* [v0y4g3r](https://github.com/v0y4g3r)
* [waynexia](https://github.com/waynexia)
* [xtang](https://github.com/xtang)


@@ -14,7 +14,7 @@ Follow our [README](https://github.com/GreptimeTeam/greptimedb#readme) to get th
It can feel intimidating to contribute to a complex project, but it can also be exciting and fun. These general notes will help everyone participate in this communal activity.
- Follow the [Code of Conduct](https://github.com/GreptimeTeam/greptimedb/blob/main/CODE_OF_CONDUCT.md)
- Follow the [Code of Conduct](https://github.com/GreptimeTeam/.github/blob/main/.github/CODE_OF_CONDUCT.md)
- Small changes make huge differences. We will happily accept a PR making a single character change if it helps move forward. Don't wait to have everything working.
- Check the closed issues before opening your issue.
- Try to follow the existing style of the code.
@@ -30,7 +30,7 @@ Pull requests are great, but we accept all kinds of other help if you like. Such
## Code of Conduct
Also, there are things that we are not looking for because they don't match the goals of the product or benefit the community. Please read [Code of Conduct](https://github.com/GreptimeTeam/greptimedb/blob/main/CODE_OF_CONDUCT.md); we hope everyone can keep good manners and become an honored member.
Also, there are things that we are not looking for because they don't match the goals of the product or benefit the community. Please read [Code of Conduct](https://github.com/GreptimeTeam/.github/blob/main/.github/CODE_OF_CONDUCT.md); we hope everyone can keep good manners and become an honored member.
## License

Cargo.lock generated (3554 changes): file diff suppressed because it is too large.


@@ -2,24 +2,28 @@
members = [
"src/api",
"src/auth",
"src/catalog",
"src/cache",
"src/catalog",
"src/cli",
"src/client",
"src/cmd",
"src/common/base",
"src/common/catalog",
"src/common/config",
"src/common/datasource",
"src/common/decimal",
"src/common/error",
"src/common/frontend",
"src/common/function",
"src/common/macro",
"src/common/greptimedb-telemetry",
"src/common/grpc",
"src/common/grpc-expr",
"src/common/macro",
"src/common/mem-prof",
"src/common/meta",
"src/common/options",
"src/common/plugins",
"src/common/pprof",
"src/common/procedure",
"src/common/procedure-test",
"src/common/query",
@@ -29,7 +33,6 @@ members = [
"src/common/telemetry",
"src/common/test-util",
"src/common/time",
"src/common/decimal",
"src/common/version",
"src/common/wal",
"src/datanode",
@@ -37,6 +40,8 @@ members = [
"src/file-engine",
"src/flow",
"src/frontend",
"src/index",
"src/log-query",
"src/log-store",
"src/meta-client",
"src/meta-srv",
@@ -56,7 +61,6 @@ members = [
"src/sql",
"src/store-api",
"src/table",
"src/index",
"tests-fuzz",
"tests-integration",
"tests/runner",
@@ -64,7 +68,7 @@ members = [
resolver = "2"
[workspace.package]
version = "0.9.0"
version = "0.11.1"
edition = "2021"
license = "Apache-2.0"
@@ -77,6 +81,7 @@ clippy.readonly_write_lock = "allow"
rust.unknown_lints = "deny"
# Remove this after https://github.com/PyO3/pyo3/issues/4094
rust.non_local_definitions = "allow"
rust.unexpected_cfgs = { level = "warn", check-cfg = ['cfg(tokio_unstable)'] }
[workspace.dependencies]
# We turn off default-features for some dependencies here so the workspaces which inherit them can
@@ -89,7 +94,7 @@ aquamarine = "0.3"
arrow = { version = "51.0.0", features = ["prettyprint"] }
arrow-array = { version = "51.0.0", default-features = false, features = ["chrono-tz"] }
arrow-flight = "51.0"
arrow-ipc = { version = "51.0.0", default-features = false, features = ["lz4"] }
arrow-ipc = { version = "51.0.0", default-features = false, features = ["lz4", "zstd"] }
arrow-schema = { version = "51.0", features = ["serde"] }
async-stream = "0.3"
async-trait = "0.1"
@@ -98,33 +103,35 @@ base64 = "0.21"
bigdecimal = "0.4.2"
bitflags = "2.4.1"
bytemuck = "1.12"
bytes = { version = "1.5", features = ["serde"] }
bytes = { version = "1.7", features = ["serde"] }
chrono = { version = "0.4", features = ["serde"] }
clap = { version = "4.4", features = ["derive"] }
config = "0.13.0"
crossbeam-utils = "0.8"
dashmap = "5.4"
datafusion = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-common = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-functions = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-optimizer = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-physical-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-physical-plan = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-sql = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion-substrait = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "d7bda5c9b762426e81f144296deadc87e5f4a0b8" }
datafusion = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-common = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-functions = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-optimizer = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-physical-expr = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-physical-plan = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-sql = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
datafusion-substrait = { git = "https://github.com/waynexia/arrow-datafusion.git", rev = "7823ef2f63663907edab46af0d51359900f608d6" }
derive_builder = "0.12"
dotenv = "0.15"
etcd-client = { version = "0.13" }
etcd-client = "0.13"
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "5c801650435d464891114502539b701c77a1b914" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "a875e976441188028353f7274a46a7e6e065c5d4" }
hex = "0.4"
humantime = "2.1"
humantime-serde = "1.1"
itertools = "0.10"
jsonb = { git = "https://github.com/databendlabs/jsonb.git", rev = "8c8d2fc294a39f3ff08909d60f718639cfba3875", default-features = false }
lazy_static = "1.4"
meter-core = { git = "https://github.com/GreptimeTeam/greptime-meter.git", rev = "80b72716dcde47ec4161478416a5c6c21343364d" }
meter-core = { git = "https://github.com/GreptimeTeam/greptime-meter.git", rev = "a10facb353b41460eeb98578868ebf19c2084fac" }
mockall = "0.11.4"
moka = "0.12"
notify = "6.1"
@@ -134,46 +141,59 @@ opentelemetry-proto = { version = "0.5", features = [
"gen-tonic",
"metrics",
"trace",
"with-serde",
"logs",
] }
parking_lot = "0.12"
parquet = { version = "51.0.0", default-features = false, features = ["arrow", "async", "object_store"] }
paste = "1.0"
pin-project = "1.0"
prometheus = { version = "0.13.3", features = ["process"] }
promql-parser = { version = "0.4" }
promql-parser = { version = "0.4.3", features = ["ser"] }
prost = "0.12"
raft-engine = { version = "0.4.1", default-features = false }
rand = "0.8"
ratelimit = "0.9"
regex = "1.8"
regex-automata = { version = "0.4" }
regex-automata = "0.4"
reqwest = { version = "0.12", default-features = false, features = [
"json",
"rustls-tls-native-roots",
"stream",
"multipart",
] }
rskafka = "0.5"
rskafka = { git = "https://github.com/influxdata/rskafka.git", rev = "75535b5ad9bae4a5dbb582c82e44dfd81ec10105", features = [
"transport-tls",
] }
rstest = "0.21"
rstest_reuse = "0.7"
rust_decimal = "1.33"
schemars = "0.8"
rustc-hash = "2.0"
serde = { version = "1.0", features = ["derive"] }
serde_json = { version = "1.0", features = ["float_roundtrip"] }
serde_with = "3"
shadow-rs = "0.35"
similar-asserts = "1.6.0"
smallvec = { version = "1", features = ["serde"] }
snafu = "0.8"
sysinfo = "0.30"
# on branch v0.44.x
sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "54a267ac89c09b11c0c88934690530807185d3e7", features = [
"visitor",
"serde",
] }
strum = { version = "0.25", features = ["derive"] }
tempfile = "3"
tokio = { version = "1.36", features = ["full"] }
tokio-stream = { version = "0.1" }
tokio = { version = "1.40", features = ["full"] }
tokio-postgres = "0.7"
tokio-stream = "0.1"
tokio-util = { version = "0.7", features = ["io-util", "compat"] }
toml = "0.8.8"
tonic = { version = "0.11", features = ["tls", "gzip", "zstd"] }
tower = { version = "0.4" }
tower = "0.4"
tracing-appender = "0.2"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json", "fmt"] }
typetag = "0.2"
uuid = { version = "1.7", features = ["serde", "v4", "fast-rng"] }
zstd = "0.13"
@@ -182,8 +202,9 @@ api = { path = "src/api" }
auth = { path = "src/auth" }
cache = { path = "src/cache" }
catalog = { path = "src/catalog" }
cli = { path = "src/cli" }
client = { path = "src/client" }
cmd = { path = "src/cmd" }
cmd = { path = "src/cmd", default-features = false }
common-base = { path = "src/common/base" }
common-catalog = { path = "src/common/catalog" }
common-config = { path = "src/common/config" }
@@ -198,7 +219,9 @@ common-grpc-expr = { path = "src/common/grpc-expr" }
common-macro = { path = "src/common/macro" }
common-mem-prof = { path = "src/common/mem-prof" }
common-meta = { path = "src/common/meta" }
common-options = { path = "src/common/options" }
common-plugins = { path = "src/common/plugins" }
common-pprof = { path = "src/common/pprof" }
common-procedure = { path = "src/common/procedure" }
common-procedure-test = { path = "src/common/procedure-test" }
common-query = { path = "src/common/query" }
@@ -213,7 +236,7 @@ datanode = { path = "src/datanode" }
datatypes = { path = "src/datatypes" }
file-engine = { path = "src/file-engine" }
flow = { path = "src/flow" }
frontend = { path = "src/frontend" }
frontend = { path = "src/frontend", default-features = false }
index = { path = "src/index" }
log-store = { path = "src/log-store" }
meta-client = { path = "src/meta-client" }
@@ -236,16 +259,27 @@ store-api = { path = "src/store-api" }
substrait = { path = "src/common/substrait" }
table = { path = "src/table" }
[patch.crates-io]
# change all rustls dependencies to use our fork to default to `ring` to make it "just work"
hyper-rustls = { git = "https://github.com/GreptimeTeam/hyper-rustls" }
rustls = { git = "https://github.com/GreptimeTeam/rustls" }
tokio-rustls = { git = "https://github.com/GreptimeTeam/tokio-rustls" }
# This is commented out since we are not using aws-lc-sys. If we need it, uncomment this line or use a release after this commit; otherwise it won't compile with gcc < 8.1.
# see https://github.com/aws/aws-lc-rs/pull/526
# aws-lc-sys = { git ="https://github.com/aws/aws-lc-rs", rev = "556558441e3494af4b156ae95ebc07ebc2fd38aa" }
# Apply a fix for pprof for unaligned pointer access
pprof = { git = "https://github.com/GreptimeTeam/pprof-rs", rev = "1bd1e21" }
[workspace.dependencies.meter-macros]
git = "https://github.com/GreptimeTeam/greptime-meter.git"
rev = "80b72716dcde47ec4161478416a5c6c21343364d"
rev = "a10facb353b41460eeb98578868ebf19c2084fac"
[profile.release]
debug = 1
[profile.nightly]
inherits = "release"
strip = true
strip = "debuginfo"
lto = "thin"
debug = false
incremental = false
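Note the profile change: `strip = true` removes the symbol table along with debug info, while `strip = "debuginfo"` drops only the debug sections, keeping symbols so release backtraces stay readable. One way to inspect the difference on a built binary (paths illustrative):

```shell
cargo build --profile nightly
# With strip = "debuginfo", .symtab is still present while .debug_* sections are gone:
readelf -S target/nightly/greptime | grep -E '\.symtab|\.debug_'
```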


@@ -8,6 +8,7 @@ CARGO_BUILD_OPTS := --locked
IMAGE_REGISTRY ?= docker.io
IMAGE_NAMESPACE ?= greptime
IMAGE_TAG ?= latest
DEV_BUILDER_IMAGE_TAG ?= 2024-10-19-a5c00e85-20241024184445
BUILDX_MULTI_PLATFORM_BUILD ?= false
BUILDX_BUILDER_NAME ?= gtbuilder
BASE_IMAGE ?= ubuntu
@@ -15,6 +16,7 @@ RUST_TOOLCHAIN ?= $(shell cat rust-toolchain.toml | grep channel | cut -d'"' -f2
CARGO_REGISTRY_CACHE ?= ${HOME}/.cargo/registry
ARCH := $(shell uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')
OUTPUT_DIR := $(shell if [ "$(RELEASE)" = "true" ]; then echo "release"; elif [ ! -z "$(CARGO_PROFILE)" ]; then echo "$(CARGO_PROFILE)" ; else echo "debug"; fi)
SQLNESS_OPTS ?=
# The arguments for running integration tests.
ETCD_VERSION ?= v3.5.9
@@ -76,7 +78,7 @@ build: ## Build debug version greptime.
build-by-dev-builder: ## Build greptime by dev-builder.
docker run --network=host \
-v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:latest \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${DEV_BUILDER_IMAGE_TAG} \
make build \
CARGO_EXTENSION="${CARGO_EXTENSION}" \
CARGO_PROFILE=${CARGO_PROFILE} \
@@ -90,7 +92,7 @@ build-by-dev-builder: ## Build greptime by dev-builder.
build-android-bin: ## Build greptime binary for android.
docker run --network=host \
-v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-android:latest \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-android:${DEV_BUILDER_IMAGE_TAG} \
make build \
CARGO_EXTENSION="ndk --platform 23 -t aarch64-linux-android" \
CARGO_PROFILE=release \
@@ -104,8 +106,8 @@ build-android-bin: ## Build greptime binary for android.
strip-android-bin: build-android-bin ## Strip greptime binary for android.
docker run --network=host \
-v ${PWD}:/greptimedb \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-android:latest \
bash -c '$${NDK_ROOT}/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip /greptimedb/target/aarch64-linux-android/release/greptime'
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-android:${DEV_BUILDER_IMAGE_TAG} \
bash -c '$${NDK_ROOT}/toolchains/llvm/prebuilt/linux-x86_64/bin/llvm-strip --strip-debug /greptimedb/target/aarch64-linux-android/release/greptime'
.PHONY: clean
clean: ## Clean the project.
@@ -144,7 +146,7 @@ dev-builder: multi-platform-buildx ## Build dev-builder image.
docker buildx build --builder ${BUILDX_BUILDER_NAME} \
--build-arg="RUST_TOOLCHAIN=${RUST_TOOLCHAIN}" \
-f docker/dev-builder/${BASE_IMAGE}/Dockerfile \
-t ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${IMAGE_TAG} ${BUILDX_MULTI_PLATFORM_BUILD_OPTS} .
-t ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${DEV_BUILDER_IMAGE_TAG} ${BUILDX_MULTI_PLATFORM_BUILD_OPTS} .
.PHONY: multi-platform-buildx
multi-platform-buildx: ## Create buildx multi-platform builder.
@@ -161,7 +163,7 @@ nextest: ## Install nextest tools.
.PHONY: sqlness-test
sqlness-test: ## Run sqlness test.
cargo sqlness
cargo sqlness ${SQLNESS_OPTS}
# Run fuzz test ${FUZZ_TARGET}.
RUNS ?= 1
@@ -172,7 +174,7 @@ fuzz:
.PHONY: fuzz-ls
fuzz-ls:
cargo fuzz list --fuzz-dir tests-fuzz
.PHONY: check
check: ## Cargo check all the targets.
@@ -189,6 +191,7 @@ fix-clippy: ## Fix clippy violations.
.PHONY: fmt-check
fmt-check: ## Check code format.
cargo fmt --all -- --check
python3 scripts/check-snafu.py
.PHONY: start-etcd
start-etcd: ## Start single node etcd for testing purpose.
@@ -202,7 +205,7 @@ stop-etcd: ## Stop single node etcd for testing purpose.
run-it-in-container: start-etcd ## Run integration tests in dev-builder.
docker run --network=host \
-v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry -v /tmp:/tmp \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:latest \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${DEV_BUILDER_IMAGE_TAG} \
make test sqlness-test BUILD_JOBS=${BUILD_JOBS}
.PHONY: start-cluster
@@ -218,7 +221,7 @@ config-docs: ## Generate configuration documentation from toml files.
docker run --rm \
-v ${PWD}:/greptimedb \
-w /greptimedb/config \
toml2docs/toml2docs:v0.1.1 \
toml2docs/toml2docs:v0.1.3 \
-p '##' \
-t ./config-docs-template.md \
-o ./config.md
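Taken together, the Makefile changes pin the builder image via `DEV_BUILDER_IMAGE_TAG` instead of `:latest` and thread sqlness options through the new `SQLNESS_OPTS` variable. Hypothetical invocations:

```shell
make build-by-dev-builder                               # uses the pinned dev-builder tag
make build-by-dev-builder DEV_BUILDER_IMAGE_TAG=latest  # opt back into the floating tag
make sqlness-test SQLNESS_OPTS="--preserve-state"       # forwarded to `cargo sqlness`
```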


@@ -6,7 +6,7 @@
</picture>
</p>
<h2 align="center">Unified Time Series Database for Metrics, Events, and Logs</h2>
<h2 align="center">Unified & Cost-Effective Time Series Database for Metrics, Logs, and Events</h2>
<div align="center">
<h3 align="center">
@@ -48,37 +48,51 @@
</a>
</div>
- [Introduction](#introduction)
- [**Features: Why GreptimeDB**](#why-greptimedb)
- [Architecture](https://docs.greptime.com/contributor-guide/overview/#architecture)
- [Try it for free](#try-greptimedb)
- [Getting Started](#getting-started)
- [Project Status](#project-status)
- [Join the community](#community)
- [Contributing](#contributing)
- [Tools & Extensions](#tools--extensions)
- [License](#license)
- [Acknowledgement](#acknowledgement)
## Introduction
**GreptimeDB** is an open-source unified time-series database for **Metrics**, **Events**, and **Logs** (also **Traces** in plan). You can gain real-time insights from Edge to Cloud at any scale.
**GreptimeDB** is an open-source unified & cost-effective time-series database for **Metrics**, **Logs**, and **Events** (also **Traces** in plan). You can gain real-time insights from Edge to Cloud at Any Scale.
## Why GreptimeDB
Our core developers have been building time-series data platforms for years. Based on our best-practices, GreptimeDB is born to give you:
Our core developers have been building time-series data platforms for years. Based on our best practices, GreptimeDB was born to give you:
* **Unified all kinds of time series**
* **Unified Processing of Metrics, Logs, and Events**
GreptimeDB treats all time series as contextual events with timestamp, and thus unifies the processing of metrics and events. It supports analyzing metrics and events with SQL and PromQL, and doing streaming with continuous aggregation.
GreptimeDB unifies time series data processing by treating all data - whether metrics, logs, or events - as timestamped events with context. Users can analyze this data using either [SQL](https://docs.greptime.com/user-guide/query-data/sql) or [PromQL](https://docs.greptime.com/user-guide/query-data/promql) and leverage stream processing ([Flow](https://docs.greptime.com/user-guide/continuous-aggregation/overview)) to enable continuous aggregation. [Read more](https://docs.greptime.com/user-guide/concepts/data-model).
* **Cloud-Edge collaboration**
* **Cloud-native Distributed Database**
GreptimeDB can be deployed on ARM architecture-compatible Android/Linux systems as well as cloud environments from various vendors. Both sides run the same software, providing identical APIs and control planes, so your application can run at the edge or on the cloud without modification, and data synchronization also becomes extremely easy and efficient.
* **Cloud-native distributed database**
By leveraging object storage (S3 and others), separating compute and storage, scaling stateless compute nodes arbitrarily, GreptimeDB implements seamless scalability. It also supports cross-cloud deployment with a built-in unified data access layer over different object storages.
Built for [Kubernetes](https://docs.greptime.com/user-guide/deployments/deploy-on-kubernetes/greptimedb-operator-management). GreptimeDB achieves seamless scalability with its [cloud-native architecture](https://docs.greptime.com/user-guide/concepts/architecture) of separated compute and storage, built on object storage (AWS S3, Azure Blob Storage, etc.) while enabling cross-cloud deployment through a unified data access layer.
* **Performance and Cost-effective**
Flexible indexing capabilities and distributed, parallel-processing query engine, tackling high cardinality issues down. Optimized columnar layout for handling time-series data; compacted, compressed, and stored on various storage backends, particularly cloud object storage with 50x cost efficiency.
Written in pure Rust for superior performance and reliability. GreptimeDB features a distributed query engine with intelligent indexing to handle high cardinality data efficiently. Its optimized columnar storage achieves 50x cost efficiency on cloud object storage through advanced compression. [Benchmark reports](https://www.greptime.com/blogs/2024-09-09-report-summary).
* **Compatible with InfluxDB, Prometheus and more protocols**
* **Cloud-Edge Collaboration**
Widely adopted database protocols and APIs, including MySQL, PostgreSQL, and Prometheus Remote Storage, etc. [Read more](https://docs.greptime.com/user-guide/clients/overview).
GreptimeDB seamlessly operates across cloud and edge (ARM/Android/Linux), providing consistent APIs and control plane for unified data management and efficient synchronization. [Learn how to run on Android](https://docs.greptime.com/user-guide/deployments/run-on-android/).
* **Multi-protocol Ingestion, SQL & PromQL Ready**
Widely adopted database protocols and APIs, including MySQL, PostgreSQL, InfluxDB, OpenTelemetry, Loki and Prometheus, etc. Effortless Adoption & Seamless Migration. [Supported Protocols Overview](https://docs.greptime.com/user-guide/protocols/overview).
For more detailed info please read [Why GreptimeDB](https://docs.greptime.com/user-guide/concepts/why-greptimedb).
## Try GreptimeDB
### 1. [GreptimePlay](https://greptime.com/playground)
### 1. [Live Demo](https://greptime.com/playground)
Try out the features of GreptimeDB right from your browser.
@@ -97,17 +111,26 @@ docker pull greptime/greptimedb
Start a GreptimeDB container with:
```shell
docker run --rm --name greptime --net=host greptime/greptimedb standalone start
docker run -p 127.0.0.1:4000-4003:4000-4003 \
-v "$(pwd)/greptimedb:/tmp/greptimedb" \
--name greptime --rm \
greptime/greptimedb:latest standalone start \
--http-addr 0.0.0.0:4000 \
--rpc-addr 0.0.0.0:4001 \
--mysql-addr 0.0.0.0:4002 \
--postgres-addr 0.0.0.0:4003
```
Access the dashboard via `http://localhost:4000/dashboard`.
Read more about [Installation](https://docs.greptime.com/getting-started/installation/overview) on docs.
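Once the container is up, the mapped ports serve the usual protocols; for example, connecting with a stock MySQL client (assuming one is installed):

```shell
mysql -h 127.0.0.1 -P 4002
```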
## Getting Started
* [Quickstart](https://docs.greptime.com/getting-started/quick-start/overview)
* [Write Data](https://docs.greptime.com/user-guide/clients/overview)
* [Query Data](https://docs.greptime.com/user-guide/query-data/overview)
* [Operations](https://docs.greptime.com/user-guide/operations/overview)
* [Quickstart](https://docs.greptime.com/getting-started/quick-start)
* [User Guide](https://docs.greptime.com/user-guide/overview)
* [Demos](https://github.com/GreptimeTeam/demo-scene)
* [FAQ](https://docs.greptime.com/faq-and-others/faq)
## Build
@@ -129,7 +152,7 @@ Run a standalone server:
cargo run -- standalone start
```
## Extension
## Tools & Extensions
### Dashboard
@@ -146,13 +169,19 @@ cargo run -- standalone start
### Grafana Dashboard
Our official Grafana dashboard is available at [grafana](grafana/README.md) directory.
Our official Grafana dashboard for monitoring GreptimeDB is available at [grafana](grafana/README.md) directory.
## Project Status
The current version has not yet reached General Availability version standards.
In line with our Greptime 2024 Roadmap, we plan to achieve a production-level
version with the update to v1.0 in August. [[Join Force]](https://github.com/GreptimeTeam/greptimedb/issues/3412)
GreptimeDB is currently in Beta. We are targeting GA (General Availability) with v1.0 release by Early 2025.
While in Beta, GreptimeDB is already:
* Being used in production by early adopters
* Actively maintained with regular releases (see [about version numbers](https://docs.greptime.com/nightly/reference/about-greptimedb-version))
* Suitable for testing and evaluation
For production use, we recommend using the latest stable release.
## Community
@@ -171,6 +200,13 @@ In addition, you may:
- Connect us with [Linkedin](https://www.linkedin.com/company/greptime/)
- Follow us on [Twitter](https://twitter.com/greptime)
## Commercial Support
If you are running GreptimeDB OSS in your organization, we offer additional
enterprise add-ons, installation services, training, and consulting. [Contact
us](https://greptime.com/contactus) and we will reach out to you with more
detail of our commercial license.
## License
GreptimeDB uses the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0.txt) to strike a balance between


@@ -13,12 +13,14 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `standalone` | The running mode of the datanode. It can be `standalone` or `distributed`. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. |
| `default_timezone` | String | `None` | The default timezone of the server. |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `max_concurrent_queries` | Integer | `0` | The maximum concurrent queries allowed to be executed. Zero means unlimited. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global compaction operations. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
@@ -28,8 +30,8 @@
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
| `grpc.tls.cert_path` | String | `None` | Certificate file path. |
| `grpc.tls.key_path` | String | `None` | Private key file path. |
| `grpc.tls.cert_path` | String | Unset | Certificate file path. |
| `grpc.tls.key_path` | String | Unset | Private key file path. |
| `grpc.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload.<br/>For now, gRPC tls config does not support auto reload. |
| `mysql` | -- | -- | MySQL server options. |
| `mysql.enable` | Bool | `true` | Whether to enable. |
@@ -37,8 +39,8 @@
| `mysql.runtime_size` | Integer | `2` | The number of server worker threads. |
| `mysql.tls` | -- | -- | -- |
| `mysql.tls.mode` | String | `disable` | TLS mode, refer to https://www.postgresql.org/docs/current/libpq-ssl.html<br/>- `disable` (default value)<br/>- `prefer`<br/>- `require`<br/>- `verify-ca`<br/>- `verify-full` |
| `mysql.tls.cert_path` | String | `None` | Certificate file path. |
| `mysql.tls.key_path` | String | `None` | Private key file path. |
| `mysql.tls.cert_path` | String | Unset | Certificate file path. |
| `mysql.tls.key_path` | String | Unset | Private key file path. |
| `mysql.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload |
| `postgres` | -- | -- | PostgreSQL server options. |
| `postgres.enable` | Bool | `true` | Whether to enable |
@@ -46,8 +48,8 @@
| `postgres.runtime_size` | Integer | `2` | The number of server worker threads. |
| `postgres.tls` | -- | -- | PostgreSQL server TLS options, see `mysql.tls` section. |
| `postgres.tls.mode` | String | `disable` | TLS mode. |
| `postgres.tls.cert_path` | String | `None` | Certificate file path. |
| `postgres.tls.key_path` | String | `None` | Private key file path. |
| `postgres.tls.cert_path` | String | Unset | Certificate file path. |
| `postgres.tls.key_path` | String | Unset | Private key file path. |
| `postgres.tls.watch` | Bool | `false` | Watch for Certificate and key file change and auto reload |
| `opentsdb` | -- | -- | OpenTSDB protocol options. |
| `opentsdb.enable` | Bool | `true` | Whether to enable OpenTSDB put in HTTP API. |
@@ -58,22 +60,30 @@
| `prom_store.with_metric_engine` | Bool | `true` | Whether to store the data from Prometheus remote write in metric engine. |
| `wal` | -- | -- | The WAL options. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the WAL is stored in the local file system by raft-engine.<br/>- `kafka`: remote WAL; data is stored in Kafka. |
| `wal.dir` | String | `None` | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `1m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.prefill_log_files` | Bool | `false` | Whether to pre-create log files on start up.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_create_topics` | Bool | `true` | Whether to automatically create topics for WAL.<br/>If set to `false`, topics named `topic_name_prefix_[0..num_topics)` are used instead. |
| `wal.num_topics` | Integer | `64` | Number of topics.<br/>**It's only used when the provider is `kafka`**. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default)<br/>**It's only used when the provider is `kafka`**. |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.<br/>e.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1.<br/>**It's only used when the provider is `kafka`**. |
| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_topic_timeout` | String | `30s` | The timeout above which a topic creation operation will be cancelled.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_init` | String | `500ms` | The initial backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_base` | Integer | `2` | The exponential backoff rate, i.e. next backoff = base * current backoff.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_deadline` | String | `5mins` | The deadline of retries.<br/>**It's only used when the provider is `kafka`**. |
| `wal.overwrite_entry_start_id` | Bool | `false` | Ignore missing entries when reading the WAL.<br/>**It's only used when the provider is `kafka`**.<br/><br/>This option ensures that when Kafka messages are deleted, the system<br/>can still successfully replay memtable data without throwing an<br/>out-of-range error.<br/>However, enabling this option might lead to unexpected data loss,<br/>as the system will skip over missing entries instead of treating<br/>them as critical errors. |
| `metadata_store` | -- | -- | Metadata storage options. |
| `metadata_store.file_size` | String | `256MB` | Kv file size in bytes. |
| `metadata_store.purge_threshold` | String | `4GB` | Kv purge threshold. |
@@ -83,21 +93,27 @@
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | `None` | Cache configuration for object storage such as 'S3' etc.<br/>The local file cache directory. |
| `storage.cache_capacity` | String | `None` | The local file cache capacity in bytes. |
| `storage.bucket` | String | `None` | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | `None` | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | `None` | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
| `storage.secret_access_key` | String | `None` | The secret access key of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3`**. |
| `storage.access_key_secret` | String | `None` | The secret access key of the aliyun account.<br/>**It's only used when the storage type is `Oss`**. |
| `storage.account_name` | String | `None` | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.account_key` | String | `None` | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.scope` | String | `None` | The scope of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential_path` | String | `None` | The credential path of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.container` | String | `None` | The container of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.sas_token` | String | `None` | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | `None` | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | `None` | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and configuring it is recommended for better performance.<br/>A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string means the cache is disabled. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it to a larger value. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
| `storage.secret_access_key` | String | Unset | The secret access key of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3`**. |
| `storage.access_key_secret` | String | Unset | The secret access key of the aliyun account.<br/>**It's only used when the storage type is `Oss`**. |
| `storage.account_name` | String | Unset | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.account_key` | String | Unset | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.scope` | String | Unset | The scope of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential_path` | String | Unset | The credential path of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential` | String | Unset | The credential of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.container` | String | Unset | The container of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.sas_token` | String | Unset | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | Unset | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | Unset | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client` | -- | -- | The http client options to the storage.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client.pool_max_idle_per_host` | Integer | `1024` | The maximum idle connection per host allowed in the pool. |
| `storage.http_client.connect_timeout` | String | `30s` | The timeout for only the connect phase of a http client. |
| `storage.http_client.timeout` | String | `30s` | The total request timeout, applied from when the request starts connecting until the response body has finished.<br/>Also considered a total deadline. |
| `storage.http_client.pool_idle_timeout` | String | `90s` | The timeout for idle sockets being kept-alive. |
| `[[region_engine]]` | -- | -- | The region engine options. You can configure multiple region engines. |
| `region_engine.mito` | -- | -- | The Mito engine options. |
| `region_engine.mito.num_workers` | Integer | `8` | Number of region workers. |
@@ -105,21 +121,24 @@
| `region_engine.mito.worker_request_batch_size` | Integer | `64` | Max batch size for a worker to handle requests. |
| `region_engine.mito.manifest_checkpoint_distance` | Integer | `10` | Number of meta action updated to trigger a new checkpoint for the manifest. |
| `region_engine.mito.compress_manifest` | Bool | `false` | Whether to compress manifest and checkpoint file by gzip (default false). |
| `region_engine.mito.max_background_jobs` | Integer | `4` | Max number of running background jobs |
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | `1GB` | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | `2GB` | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size` |
| `region_engine.mito.sst_meta_cache_size` | String | `128MB` | Cache size for SST metadata. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/32 of OS memory with a max limitation of 128MB. |
| `region_engine.mito.vector_cache_size` | String | `512MB` | Cache size for vectors and arrow arrays. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | `512MB` | Cache size for pages of SST row groups. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.experimental_write_cache_size` | String | `512MB` | Capacity for write cache. |
| `region_engine.mito.experimental_write_cache_ttl` | String | `1h` | TTL for write cache. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it defaults to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it defaults to 2 times `global_write_buffer_size`. |
| `region_engine.mito.sst_meta_cache_size` | String | Auto | Cache size for SST metadata. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/32 of OS memory with a max limitation of 128MB. |
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is enabled by default when using object storage, and enabling it is recommended for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/object_cache/write`. |
| `region_engine.mito.experimental_write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.experimental_write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: using the default value (1/4 of cpu cores).<br/>- `1`: scan in current thread.<br/>- `n`: scan in parallelism n. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.min_compaction_interval` | String | `0m` | Minimum time interval between two compactions.<br/>To align with the old behavior, the default value is 0 (no restrictions). |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index, defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: for storing intermediate files used during creating index.<br/>- `staging`: for storing staging files used during searching index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
@@ -129,6 +148,9 @@
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `auto` | Memory threshold for performing an external sort during index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | Deprecated, use `region_engine.mito.index.aux_path` instead. |
| `region_engine.mito.inverted_index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.inverted_index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.inverted_index.content_cache_page_size` | String | `8MiB` | Page size for inverted index content cache. |
| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
@@ -141,23 +163,29 @@
| `region_engine.mito.memtable.fork_dictionary_bytes` | String | `1GiB` | Max dictionary bytes.<br/>Only available for `partition_tree` memtable. |
| `region_engine.file` | -- | -- | Enable the file engine. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing that will be sampled and exported.<br/>Valid range `[0, 1]`: 1 means all traces are sampled, 0 means no traces are sampled, and the default value is 1.<br/>Ratios > 1 are treated as 1. Ratios < 0 are treated as 0. |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from Prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | Whether to enable exporting metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself |
| `export_metrics.self_import.db` | String | `None` | -- |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by itself.<br/>You must create the database before enabling it. |
| `export_metrics.self_import.db` | String | Unset | -- |
| `export_metrics.remote_write` | -- | -- | -- |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.headers` | InlineTable | -- | HTTP headers of Prometheus remote-write carry. |
| `tracing` | -- | -- | The tracing options. Only takes effect when compiled with the `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
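
To make the standalone options above easier to scan, here is a minimal, hypothetical TOML sketch assembled only from keys documented in the table; the bucket name, cache sizes, and log level are illustrative placeholders rather than recommendations. Options whose default is `Unset` can simply be omitted.

```toml
# Illustrative standalone snippet; values are examples, not tuned defaults.
[wal]
provider = "raft_engine"   # local WAL stored by raft-engine
file_size = "128MB"
purge_threshold = "1GB"
purge_interval = "1m"

[storage]
type = "S3"
bucket = "my-bucket"       # placeholder bucket name
root = "greptimedb"        # data stored under s3://my-bucket/greptimedb
cache_capacity = "5GiB"    # read cache capacity for object storage

[[region_engine]]
[region_engine.mito]
enable_experimental_write_cache = true
experimental_write_cache_size = "5GiB"

[logging]
dir = "/tmp/greptimedb/logs"
level = "info"
```
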
## Distributed Mode
@@ -166,12 +194,10 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `standalone` | The running mode of the datanode. It can be `standalone` or `distributed`. |
| `default_timezone` | String | `None` | The default timezone of the server. |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for compaction operations. |
| `heartbeat` | -- | -- | The heartbeat options. |
| `heartbeat.interval` | String | `18s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
@@ -185,8 +211,8 @@
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
| `grpc.tls.cert_path` | String | `None` | Certificate file path. |
| `grpc.tls.key_path` | String | `None` | Private key file path. |
| `grpc.tls.cert_path` | String | Unset | Certificate file path. |
| `grpc.tls.key_path` | String | Unset | Private key file path. |
| `grpc.tls.watch` | Bool | `false` | Watch for certificate and key file changes and reload automatically.<br/>For now, the gRPC TLS config does not support auto reload. |
| `mysql` | -- | -- | MySQL server options. |
| `mysql.enable` | Bool | `true` | Whether to enable. |
@@ -194,8 +220,8 @@
| `mysql.runtime_size` | Integer | `2` | The number of server worker threads. |
| `mysql.tls` | -- | -- | -- |
| `mysql.tls.mode` | String | `disable` | TLS mode, refer to https://www.postgresql.org/docs/current/libpq-ssl.html<br/>- `disable` (default value)<br/>- `prefer`<br/>- `require`<br/>- `verify-ca`<br/>- `verify-full` |
| `mysql.tls.cert_path` | String | `None` | Certificate file path. |
| `mysql.tls.key_path` | String | `None` | Private key file path. |
| `mysql.tls.cert_path` | String | Unset | Certificate file path. |
| `mysql.tls.key_path` | String | Unset | Private key file path. |
| `mysql.tls.watch` | Bool | `false` | Watch for certificate and key file changes and reload automatically. |
| `postgres` | -- | -- | PostgreSQL server options. |
| `postgres.enable` | Bool | `true` | Whether to enable. |
@@ -203,8 +229,8 @@
| `postgres.runtime_size` | Integer | `2` | The number of server worker threads. |
| `postgres.tls` | -- | -- | PostgreSQL server TLS options, see `mysql.tls` section. |
| `postgres.tls.mode` | String | `disable` | TLS mode. |
| `postgres.tls.cert_path` | String | `None` | Certificate file path. |
| `postgres.tls.key_path` | String | `None` | Private key file path. |
| `postgres.tls.cert_path` | String | Unset | Certificate file path. |
| `postgres.tls.key_path` | String | Unset | Private key file path. |
| `postgres.tls.watch` | Bool | `false` | Watch for certificate and key file changes and reload automatically. |
| `opentsdb` | -- | -- | OpenTSDB protocol options. |
| `opentsdb.enable` | Bool | `true` | Whether to enable OpenTSDB put in HTTP API. |
@@ -228,23 +254,29 @@
| `datanode.client.connect_timeout` | String | `10s` | -- |
| `datanode.client.tcp_nodelay` | Bool | `true` | -- |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing that will be sampled and exported.<br/>Valid range `[0, 1]`: 1 means all traces are sampled, 0 means no traces are sampled, and the default value is 1.<br/>Ratios > 1 are treated as 1. Ratios < 0 are treated as 0. |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from Prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | Whether to enable exporting metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself |
| `export_metrics.self_import.db` | String | `None` | -- |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by itself.<br/>You must create the database before enabling it. |
| `export_metrics.self_import.db` | String | Unset | -- |
| `export_metrics.remote_write` | -- | -- | -- |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.headers` | InlineTable | -- | HTTP headers of Prometheus remote-write carry. |
| `tracing` | -- | -- | The tracing options. Only takes effect when compiled with the `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
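
As a hedged illustration of the server options above, the sketch below enables TLS for the MySQL and PostgreSQL servers and turns on internal metric export via remote write; all file paths are placeholders and the URL simply reuses the documented example.

```toml
# Hypothetical snippet; certificate paths are placeholders.
[mysql.tls]
mode = "require"
cert_path = "/path/to/server.crt"
key_path = "/path/to/server.key"
watch = true               # reload when the certificate or key changes

[postgres.tls]
mode = "require"
cert_path = "/path/to/server.crt"
key_path = "/path/to/server.key"

[export_metrics]
enable = true
write_interval = "30s"

[export_metrics.remote_write]
url = "http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics"
```
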
### Metasrv
@@ -254,36 +286,37 @@
| `data_home` | String | `/tmp/metasrv/` | The working home directory. |
| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. |
| `server_addr` | String | `127.0.0.1:3002` | The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost. |
| `store_addr` | String | `127.0.0.1:2379` | Etcd server address. |
| `selector` | String | `lease_based` | Datanode selector type.<br/>- `lease_based` (default value).<br/>- `load_based`<br/>For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". |
| `use_memory_store` | Bool | `false` | Store data in memory. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. |
| `store_addrs` | Array | -- | Store server addresses. Defaults to the etcd store. |
| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `backend` | String | `EtcdStore` | The datastore for meta server. |
| `selector` | String | `round_robin` | Datanode selector type.<br/>- `round_robin` (default value)<br/>- `lease_based`<br/>- `load_based`<br/>For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". |
| `use_memory_store` | Bool | `false` | Store data in memory. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available for GreptimeDB running in cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., S3). |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for compaction operations. |
| `procedure` | -- | -- | Procedure storage options. |
| `procedure.max_retry_times` | Integer | `12` | Procedure max retry time. |
| `procedure.retry_delay` | String | `500ms` | Initial retry delay of procedures; increases exponentially. |
| `procedure.max_metadata_value_size` | String | `1500KiB` | Automatically split large values.<br/>GreptimeDB procedures use etcd as the default metadata storage backend.<br/>The maximum size of any etcd request is 1.5 MiB.<br/>1500KiB = 1536KiB (1.5MiB) - 36KiB (reserved size of the key).<br/>Comment out `max_metadata_value_size` to keep large values unsplit (no limit). |
| `failure_detector` | -- | -- | -- |
| `failure_detector.threshold` | Float | `8.0` | -- |
| `failure_detector.min_std_deviation` | String | `100ms` | -- |
| `failure_detector.acceptable_heartbeat_pause` | String | `10000ms` | -- |
| `failure_detector.first_heartbeat_estimate` | String | `1000ms` | -- |
| `failure_detector.threshold` | Float | `8.0` | The threshold value used by the failure detector to determine failure conditions. |
| `failure_detector.min_std_deviation` | String | `100ms` | The minimum standard deviation of the heartbeat intervals, used to calculate acceptable variations. |
| `failure_detector.acceptable_heartbeat_pause` | String | `10000ms` | The acceptable pause duration between heartbeats, used to determine if a heartbeat interval is acceptable. |
| `failure_detector.first_heartbeat_estimate` | String | `1000ms` | The initial estimate of the heartbeat interval used by the failure detector. |
| `datanode` | -- | -- | Datanode options. |
| `datanode.client` | -- | -- | Datanode client options. |
| `datanode.client.timeout` | String | `10s` | -- |
| `datanode.client.connect_timeout` | String | `10s` | -- |
| `datanode.client.tcp_nodelay` | Bool | `true` | -- |
| `datanode.client.timeout` | String | `10s` | Operation timeout. |
| `datanode.client.connect_timeout` | String | `10s` | Connect server timeout. |
| `datanode.client.tcp_nodelay` | Bool | `true` | `TCP_NODELAY` option for accepted connections. |
| `wal` | -- | -- | -- |
| `wal.provider` | String | `raft_engine` | -- |
| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster. |
| `wal.num_topics` | Integer | `64` | Number of topics to be created upon start. |
| `wal.auto_create_topics` | Bool | `true` | Whether to automatically create topics for WAL.<br/>If set to `false`, topics named `topic_name_prefix_[0..num_topics)` are used instead. |
| `wal.num_topics` | Integer | `64` | Number of topics. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default) |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`. |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.<br/>e.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1. |
| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. |
| `wal.create_topic_timeout` | String | `30s` | The timeout above which a topic creation operation will be cancelled. |
| `wal.backoff_init` | String | `500ms` | The initial backoff for kafka clients. |
@@ -291,23 +324,29 @@
| `wal.backoff_base` | Integer | `2` | Exponential backoff rate, i.e. next backoff = base * current backoff. |
| `wal.backoff_deadline` | String | `5mins` | Stop reconnecting if the total wait time reaches the deadline. If this config is missing, the reconnecting won't terminate. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum amount of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing that will be sampled and exported.<br/>Valid range `[0, 1]`: 1 means all traces are sampled, 0 means no traces are sampled, and the default value is 1.<br/>Ratios > 1 are treated as 1. Ratios < 0 are treated as 0. |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally. It's different from Prometheus scrape. |
| `export_metrics.enable` | Bool | `false` | Whether to enable exporting metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommend to collect metrics generated by itself |
| `export_metrics.self_import.db` | String | `None` | -- |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by itself.<br/>You must create the database before enabling it. |
| `export_metrics.self_import.db` | String | Unset | -- |
| `export_metrics.remote_write` | -- | -- | -- |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`. |
| `export_metrics.remote_write.url` | String | `""` | The url the metrics send to. The url example can be: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.headers` | InlineTable | -- | HTTP headers of Prometheus remote-write carry. |
| `tracing` | -- | -- | The tracing options. Only takes effect when compiled with the `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
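
For the metasrv options above, a minimal, hypothetical sketch might look like the following; the etcd and Kafka endpoints are placeholders, and the failure-detector values simply restate the documented defaults.

```toml
# Hypothetical metasrv snippet; endpoints are placeholders.
data_home = "/tmp/metasrv/"
bind_addr = "127.0.0.1:3002"
server_addr = "127.0.0.1:3002"
store_addrs = ["127.0.0.1:2379"]    # etcd endpoints (default backend)
backend = "EtcdStore"
selector = "round_robin"

[failure_detector]
threshold = 8.0
min_std_deviation = "100ms"
acceptable_heartbeat_pause = "10000ms"
first_heartbeat_estimate = "1000ms"

[wal]
provider = "kafka"                      # use Kafka as the remote WAL
broker_endpoints = ["127.0.0.1:9092"]   # placeholder Kafka broker
auto_create_topics = true
topic_name_prefix = "greptimedb_wal_topic"
```
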
### Datanode
@@ -315,16 +354,21 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `standalone` | The running mode of the datanode. It can be `standalone` or `distributed`. |
| `node_id` | Integer | `None` | The datanode identifier and should be unique in the cluster. |
| `node_id` | Integer | Unset | The datanode identifier; it should be unique in the cluster. |
| `require_lease_before_startup` | Bool | `false` | Start services after regions have obtained leases.<br/>It will block the datanode start if it can't receive leases in the heartbeat from metasrv. |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `rpc_addr` | String | `None` | Deprecated, use `grpc.addr` instead. |
| `rpc_hostname` | String | `None` | Deprecated, use `grpc.hostname` instead. |
| `rpc_runtime_size` | Integer | `None` | Deprecated, use `grpc.runtime_size` instead. |
| `rpc_max_recv_message_size` | String | `None` | Deprecated, use `grpc.rpc_max_recv_message_size` instead. |
| `rpc_max_send_message_size` | String | `None` | Deprecated, use `grpc.rpc_max_send_message_size` instead. |
| `max_concurrent_queries` | Integer | `0` | The maximum number of concurrent queries allowed to be executed. Zero means unlimited. |
| `rpc_addr` | String | Unset | Deprecated, use `grpc.addr` instead. |
| `rpc_hostname` | String | Unset | Deprecated, use `grpc.hostname` instead. |
| `rpc_runtime_size` | Integer | Unset | Deprecated, use `grpc.runtime_size` instead. |
| `rpc_max_recv_message_size` | String | Unset | Deprecated, use `grpc.rpc_max_recv_message_size` instead. |
| `rpc_max_send_message_size` | String | Unset | Deprecated, use `grpc.rpc_max_send_message_size` instead. |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `30s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:3001` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host |
@@ -333,13 +377,12 @@
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
| `grpc.tls.cert_path` | String | `None` | Certificate file path. |
| `grpc.tls.key_path` | String | `None` | Private key file path. |
| `grpc.tls.cert_path` | String | Unset | Certificate file path. |
| `grpc.tls.key_path` | String | Unset | Private key file path. |
| `grpc.tls.watch` | Bool | `false` | Watch for certificate and key file changes and reload automatically.<br/>For now, the gRPC TLS config does not support auto reload. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.read_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.write_rt_size` | Integer | `8` | The number of threads to execute the runtime for global write operations. |
| `runtime.bg_rt_size` | Integer | `4` | The number of threads to execute the runtime for global background operations. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for compaction operations. |
| `heartbeat` | -- | -- | The heartbeat options. |
| `heartbeat.interval` | String | `3s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
@@ -355,15 +398,16 @@
| `meta_client.metadata_cache_tti` | String | `5m` | -- |
| `wal` | -- | -- | The WAL options. |
| `wal.provider` | String | `raft_engine` | The provider of the WAL.<br/>- `raft_engine`: the wal is stored in the local file system by raft-engine.<br/>- `kafka`: it's remote wal that data is stored in Kafka. |
| `wal.dir` | String | `None` | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `256MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `4GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `10m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.dir` | String | Unset | The directory to store the WAL files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.file_size` | String | `128MB` | The size of the WAL segment file.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_threshold` | String | `1GB` | The threshold of the WAL size to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.purge_interval` | String | `1m` | The interval to trigger a flush.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.read_batch_size` | Integer | `128` | The read batch size.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_write` | Bool | `false` | Whether to use sync write.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.enable_log_recycle` | Bool | `true` | Whether to reuse logically truncated log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.prefill_log_files` | Bool | `false` | Whether to pre-create log files on start up.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
@@ -371,24 +415,33 @@
| `wal.backoff_max` | String | `10s` | The maximum backoff delay.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_base` | Integer | `2` | The exponential backoff rate, i.e. next backoff = base * current backoff.<br/>**It's only used when the provider is `kafka`**. |
| `wal.backoff_deadline` | String | `5mins` | The deadline of retries.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_index` | Bool | `true` | Whether to enable WAL index creation.<br/>**It's only used when the provider is `kafka`**. |
| `wal.dump_index_interval` | String | `60s` | The interval for dumping WAL indexes.<br/>**It's only used when the provider is `kafka`**. |
| `wal.overwrite_entry_start_id` | Bool | `false` | Ignore missing entries when reading the WAL.<br/>**It's only used when the provider is `kafka`**.<br/><br/>This option ensures that when Kafka messages are deleted, the system<br/>can still successfully replay memtable data without throwing an<br/>out-of-range error.<br/>However, enabling this option might lead to unexpected data loss,<br/>as the system will skip over missing entries instead of treating<br/>them as critical errors. |
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `/tmp/greptimedb/` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | `None` | Cache configuration for object storage such as 'S3' etc.<br/>The local file cache directory. |
| `storage.cache_capacity` | String | `None` | The local file cache capacity in bytes. |
| `storage.bucket` | String | `None` | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | `None` | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | `None` | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
| `storage.secret_access_key` | String | `None` | The secret access key of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3`**. |
| `storage.access_key_secret` | String | `None` | The secret access key of the aliyun account.<br/>**It's only used when the storage type is `Oss`**. |
| `storage.account_name` | String | `None` | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.account_key` | String | `None` | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.scope` | String | `None` | The scope of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential_path` | String | `None` | The credential path of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.container` | String | `None` | The container of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.sas_token` | String | `None` | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | `None` | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | `None` | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc. It is configured by default when using object storage, and configuring it is recommended for better performance.<br/>A local file directory, defaulting to `{data_home}/object_cache/read`. An empty string means the cache is disabled. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it to a larger value. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
| `storage.secret_access_key` | String | Unset | The secret access key of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3`**. |
| `storage.access_key_secret` | String | Unset | The secret access key of the aliyun account.<br/>**It's only used when the storage type is `Oss`**. |
| `storage.account_name` | String | Unset | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.account_key` | String | Unset | The account key of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.scope` | String | Unset | The scope of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential_path` | String | Unset | The credential path of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.credential` | String | Unset | The credential of the google cloud storage.<br/>**It's only used when the storage type is `Gcs`**. |
| `storage.container` | String | Unset | The container of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.sas_token` | String | Unset | The sas token of the azure account.<br/>**It's only used when the storage type is `Azblob`**. |
| `storage.endpoint` | String | Unset | The endpoint of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.region` | String | Unset | The region of the S3 service.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client` | -- | -- | The http client options to the storage.<br/>**It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**. |
| `storage.http_client.pool_max_idle_per_host` | Integer | `1024` | The maximum idle connection per host allowed in the pool. |
| `storage.http_client.connect_timeout` | String | `30s` | The timeout for only the connect phase of a http client. |
| `storage.http_client.timeout` | String | `30s` | The total request timeout, applied from when the request starts connecting until the response body has finished.<br/>Also considered a total deadline. |
| `storage.http_client.pool_idle_timeout` | String | `90s` | The timeout for idle sockets being kept-alive. |
| `[[region_engine]]` | -- | -- | The region engine options. You can configure multiple region engines. |
| `region_engine.mito` | -- | -- | The Mito engine options. |
| `region_engine.mito.num_workers` | Integer | `8` | Number of region workers. |
@@ -396,21 +449,24 @@
| `region_engine.mito.worker_request_batch_size` | Integer | `64` | Max batch size for a worker to handle requests. |
| `region_engine.mito.manifest_checkpoint_distance` | Integer | `10` | Number of meta action updated to trigger a new checkpoint for the manifest. |
| `region_engine.mito.compress_manifest` | Bool | `false` | Whether to compress manifest and checkpoint file by gzip (default false). |
| `region_engine.mito.max_background_jobs` | Integer | `4` | Max number of running background jobs |
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | `1GB` | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | `2GB` | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size` |
| `region_engine.mito.sst_meta_cache_size` | String | `128MB` | Cache size for SST metadata. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/32 of OS memory with a max limitation of 128MB. |
| `region_engine.mito.vector_cache_size` | String | `512MB` | Cache size for vectors and arrow arrays. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | `512MB` | Cache size for pages of SST row groups. Setting it to 0 to disable the cache.<br/>If not set, it's default to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/write_cache`. |
| `region_engine.mito.experimental_write_cache_size` | String | `512MB` | Capacity for write cache. |
| `region_engine.mito.experimental_write_cache_ttl` | String | `1h` | TTL for write cache. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it defaults to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it defaults to 2 times `global_write_buffer_size`. |
| `region_engine.mito.sst_meta_cache_size` | String | Auto | Cache size for SST metadata. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/32 of OS memory with a max limitation of 128MB. |
| `region_engine.mito.vector_cache_size` | String | Auto | Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.page_cache_size` | String | Auto | Cache size for pages of SST row groups. Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/8 of OS memory. |
| `region_engine.mito.selector_result_cache_size` | String | Auto | Cache size for time series selector (e.g. `last_value()`). Set it to 0 to disable the cache.<br/>If not set, it defaults to 1/16 of OS memory with a max limitation of 512MB. |
| `region_engine.mito.enable_experimental_write_cache` | Bool | `false` | Whether to enable the experimental write cache. It is enabled by default when using object storage, and enabling it is recommended for better performance. |
| `region_engine.mito.experimental_write_cache_path` | String | `""` | File system path for write cache, defaults to `{data_home}/object_cache/write`. |
| `region_engine.mito.experimental_write_cache_size` | String | `5GiB` | Capacity for write cache. If your disk space is sufficient, it is recommended to set it larger. |
| `region_engine.mito.experimental_write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.scan_parallelism` | Integer | `0` | Parallelism to scan a region (default: 1/4 of cpu cores).<br/>- `0`: use the default value (1/4 of cpu cores).<br/>- `1`: scan in the current thread.<br/>- `n`: scan with parallelism `n`. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.allow_stale_entries` | Bool | `false` | Whether to allow stale WAL entries read during replay. |
| `region_engine.mito.min_compaction_interval` | String | `0m` | Minimum time interval between two compactions.<br/>To align with the old behavior, the default value is 0 (no restrictions). |
| `region_engine.mito.index` | -- | -- | The options for index in Mito engine. |
| `region_engine.mito.index.aux_path` | String | `""` | Auxiliary directory path for the index in the filesystem, used to store intermediate files for<br/>creating the index and staging files for searching the index; defaults to `{data_home}/index_intermediate`.<br/>The default name for this directory is `index_intermediate` for backward compatibility.<br/><br/>This path contains two subdirectories:<br/>- `__intm`: stores intermediate files used while creating the index.<br/>- `staging`: stores staging files used while searching the index. |
| `region_engine.mito.index.staging_size` | String | `2GB` | The max capacity of the staging directory. |
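The dotted keys in this table correspond to nested tables in the TOML configuration file. A minimal sketch of how the mito options above nest; the values shown are illustrative, not recommendations:

```toml
[[region_engine]]
[region_engine.mito]
# Leave the cache sizes unset to use the automatic defaults described above.
auto_flush_interval = "1h"
sst_write_buffer_size = "8MB"

[region_engine.mito.index]
# Defaults to `{data_home}/index_intermediate` when left empty.
aux_path = ""
staging_size = "2GB"
```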
@@ -420,6 +476,9 @@
| `region_engine.mito.inverted_index.apply_on_query` | String | `auto` | Whether to apply the index on query<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.inverted_index.mem_threshold_on_create` | String | `auto` | Memory threshold for performing an external sort during index creation.<br/>- `auto`: automatically determine the threshold based on the system memory size (default)<br/>- `unlimited`: no memory limit<br/>- `[size]` e.g. `64MB`: fixed memory threshold |
| `region_engine.mito.inverted_index.intermediate_path` | String | `""` | Deprecated, use `region_engine.mito.index.aux_path` instead. |
| `region_engine.mito.inverted_index.metadata_cache_size` | String | `64MiB` | Cache size for inverted index metadata. |
| `region_engine.mito.inverted_index.content_cache_size` | String | `128MiB` | Cache size for inverted index content. |
| `region_engine.mito.inverted_index.content_cache_page_size` | String | `8MiB` | Page size for inverted index content cache. |
| `region_engine.mito.fulltext_index` | -- | -- | The options for full-text index in Mito engine. |
| `region_engine.mito.fulltext_index.create_on_flush` | String | `auto` | Whether to create the index on flush.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
| `region_engine.mito.fulltext_index.create_on_compaction` | String | `auto` | Whether to create the index on compaction.<br/>- `auto`: automatically (default)<br/>- `disable`: never |
@@ -432,23 +491,29 @@
| `region_engine.mito.memtable.fork_dictionary_bytes` | String | `1GiB` | Max dictionary bytes.<br/>Only available for `partition_tree` memtable. |
| `region_engine.file` | -- | -- | Enable the file engine. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum number of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing that will be sampled and exported.<br/>Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.<br/>Ratios > 1 are treated as 1; ratios < 0 are treated as 0. |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `export_metrics` | -- | -- | The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.<br/>This is only used for `greptimedb` to export its own metrics internally; it's different from Prometheus scraping. |
| `export_metrics.enable` | Bool | `false` | Whether to enable exporting of metrics. |
| `export_metrics.write_interval` | String | `30s` | The interval of export metrics. |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by the instance itself |
| `export_metrics.self_import.db` | String | `None` | -- |
| `export_metrics.self_import` | -- | -- | For `standalone` mode, `self_import` is recommended to collect metrics generated by the instance itself.<br/>You must create the database before enabling it. |
| `export_metrics.self_import.db` | String | Unset | -- |
| `export_metrics.remote_write` | -- | -- | -- |
| `export_metrics.remote_write.url` | String | `""` | The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`. |
| `export_metrics.remote_write.url` | String | `""` | The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`. |
| `export_metrics.remote_write.headers` | InlineTable | -- | HTTP headers carried by the Prometheus remote-write requests. |
| `tracing` | -- | -- | The tracing options. Only effective when compiled with the `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
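As a worked example of the `export_metrics` options above, a standalone instance could self-import its own metrics; `greptime_metrics` is the example database name used elsewhere on this page, and it must be created beforehand:

```toml
[export_metrics]
enable = true
write_interval = "30s"

# `self_import` is for standalone mode; create the database first.
[export_metrics.self_import]
db = "greptime_metrics"
```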
### Flownode
@@ -456,7 +521,7 @@
| Key | Type | Default | Descriptions |
| --- | -----| ------- | ----------- |
| `mode` | String | `distributed` | The running mode of the flownode. It can be `standalone` or `distributed`. |
| `node_id` | Integer | `None` | The flownode identifier; it should be unique in the cluster. |
| `node_id` | Integer | Unset | The flownode identifier; it should be unique in the cluster. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.addr` | String | `127.0.0.1:6800` | The address to bind the gRPC server. |
| `grpc.hostname` | String | `127.0.0.1` | The hostname advertised to the metasrv,<br/>and used for connections from outside the host. |
@@ -477,12 +542,18 @@
| `heartbeat.interval` | String | `3s` | Interval for sending heartbeat messages to the metasrv. |
| `heartbeat.retry_interval` | String | `3s` | Interval for retrying to send heartbeat messages to the metasrv. |
| `logging` | -- | -- | The logging options. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. |
| `logging.level` | String | `None` | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.dir` | String | `/tmp/greptimedb/logs` | The directory to store the log files. If set to empty, logs will not be written to files. |
| `logging.level` | String | Unset | The log level. Can be `info`/`debug`/`warn`/`error`. |
| `logging.enable_otlp_tracing` | Bool | `false` | Enable OTLP tracing. |
| `logging.otlp_endpoint` | String | `None` | The OTLP tracing endpoint. |
| `logging.otlp_endpoint` | String | `http://localhost:4317` | The OTLP tracing endpoint. |
| `logging.append_stdout` | Bool | `true` | Whether to append logs to stdout. |
| `logging.log_format` | String | `text` | The log format. Can be `text`/`json`. |
| `logging.max_log_files` | Integer | `720` | The maximum number of log files. |
| `logging.tracing_sample_ratio` | -- | -- | The percentage of tracing that will be sampled and exported.<br/>Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.<br/>Ratios > 1 are treated as 1; ratios < 0 are treated as 0. |
| `logging.tracing_sample_ratio.default_ratio` | Float | `1.0` | -- |
| `logging.slow_query` | -- | -- | The slow query log options. |
| `logging.slow_query.enable` | Bool | `false` | Whether to enable slow query log. |
| `logging.slow_query.threshold` | String | Unset | The threshold of slow query. |
| `logging.slow_query.sample_ratio` | Float | Unset | The sampling ratio of slow query log. The value should be in the range of (0, 1]. |
| `tracing` | -- | -- | The tracing options. Only effective when compiled with the `tokio-console` feature. |
| `tracing.tokio_console_addr` | String | `None` | The tokio console address. |
| `tracing.tokio_console_addr` | String | Unset | The tokio console address. |
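For instance, the slow query log described above could be enabled with a 10-second threshold and 50% sampling; the values here are illustrative:

```toml
[logging.slow_query]
enable = true
threshold = "10s"
sample_ratio = 0.5
```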

View File

@@ -2,7 +2,7 @@
mode = "standalone"
## The datanode identifier; it should be unique in the cluster.
## +toml2docs:none-default
## @toml2docs:none-default
node_id = 42
## Start services after regions have obtained leases.
@@ -13,32 +13,46 @@ require_lease_before_startup = false
## By default, it provides services after all regions have been initialized.
init_regions_in_background = false
## Enable telemetry to collect anonymous usage data.
enable_telemetry = true
## Parallelism of initializing regions.
init_regions_parallelism = 16
## The maximum number of concurrent queries allowed to execute. Zero means unlimited.
max_concurrent_queries = 0
## Deprecated, use `grpc.addr` instead.
## +toml2docs:none-default
## @toml2docs:none-default
rpc_addr = "127.0.0.1:3001"
## Deprecated, use `grpc.hostname` instead.
## +toml2docs:none-default
## @toml2docs:none-default
rpc_hostname = "127.0.0.1"
## Deprecated, use `grpc.runtime_size` instead.
## +toml2docs:none-default
## @toml2docs:none-default
rpc_runtime_size = 8
## Deprecated, use `grpc.rpc_max_recv_message_size` instead.
## +toml2docs:none-default
## @toml2docs:none-default
rpc_max_recv_message_size = "512MB"
## Deprecated, use `grpc.rpc_max_send_message_size` instead.
## +toml2docs:none-default
## @toml2docs:none-default
rpc_max_send_message_size = "512MB"
## Enable telemetry to collect anonymous usage data. Enabled by default.
#+ enable_telemetry = true
## The HTTP server options.
[http]
## The address to bind the HTTP server.
addr = "127.0.0.1:4000"
## HTTP request timeout. Set to 0 to disable timeout.
timeout = "30s"
## HTTP request body limit.
## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## Set to 0 to disable limit.
body_limit = "64MB"
## The gRPC server options.
[grpc]
## The address to bind the gRPC server.
@@ -59,11 +73,11 @@ max_send_message_size = "512MB"
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -71,13 +85,11 @@ key_path = ""
watch = false
## The runtime options.
[runtime]
#+ [runtime]
## The number of threads to execute the runtime for global read operations.
read_rt_size = 8
#+ global_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 4
#+ compact_rt_size = 4
## The heartbeat options.
[heartbeat]
@@ -125,20 +137,20 @@ provider = "raft_engine"
## The directory to store the WAL files.
## **It's only used when the provider is `raft_engine`**.
## +toml2docs:none-default
## @toml2docs:none-default
dir = "/tmp/greptimedb/wal"
## The size of the WAL segment file.
## **It's only used when the provider is `raft_engine`**.
file_size = "256MB"
file_size = "128MB"
## The threshold of the WAL size to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
purge_threshold = "4GB"
purge_threshold = "1GB"
## The interval to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
purge_interval = "10m"
purge_interval = "1m"
## The read batch size.
## **It's only used when the provider is `raft_engine`**.
@@ -160,6 +172,9 @@ prefill_log_files = false
## **It's only used when the provider is `raft_engine`**.
sync_period = "10s"
## Parallelism during WAL recovery.
recovery_parallelism = 2
## The Kafka broker endpoints.
## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"]
@@ -189,6 +204,43 @@ backoff_base = 2
## **It's only used when the provider is `kafka`**.
backoff_deadline = "5mins"
## Whether to enable WAL index creation.
## **It's only used when the provider is `kafka`**.
create_index = true
## The interval for dumping WAL indexes.
## **It's only used when the provider is `kafka`**.
dump_index_interval = "60s"
## Ignore missing entries when reading the WAL.
## **It's only used when the provider is `kafka`**.
##
## This option ensures that when Kafka messages are deleted, the system
## can still successfully replay memtable data without throwing an
## out-of-range error.
## However, enabling this option might lead to unexpected data loss,
## as the system will skip over missing entries instead of treating
## them as critical errors.
overwrite_entry_start_id = false
# The Kafka SASL configuration.
# **It's only used when the provider is `kafka`**.
# Available SASL mechanisms:
# - `PLAIN`
# - `SCRAM-SHA-256`
# - `SCRAM-SHA-512`
# [wal.sasl]
# type = "SCRAM-SHA-512"
# username = "user_kafka"
# password = "secret"
# The Kafka TLS configuration.
# **It's only used when the provider is `kafka`**.
# [wal.tls]
# server_ca_cert_path = "/path/to/server_cert"
# client_cert_path = "/path/to/client_cert"
# client_key_path = "/path/to/key"
# Example of using S3 as the storage.
# [storage]
# type = "S3"
@@ -225,6 +277,7 @@ backoff_deadline = "5mins"
# root = "data"
# scope = "test"
# credential_path = "123456"
# credential = "base64-credential"
# endpoint = "https://storage.googleapis.com"
## The data storage options.
@@ -240,87 +293,123 @@ data_home = "/tmp/greptimedb/"
## - `Oss`: the data is stored in the Aliyun OSS.
type = "File"
## Cache configuration for object storage such as 'S3' etc.
## The local file cache directory.
## +toml2docs:none-default
cache_path = "/path/local_cache"
## Read cache configuration for object storage such as 'S3'. It's configured by default when using object storage, and recommended for better performance.
## A local file directory, defaults to `{data_home}/object_cache/read`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## The local file cache capacity in bytes.
## +toml2docs:none-default
cache_capacity = "256MB"
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set a larger value.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
bucket = "greptimedb"
## The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.
## **It's only used when the storage type is `S3`, `Oss` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
root = "greptimedb"
## The access key id of the aws account.
## It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.
## **It's only used when the storage type is `S3` and `Oss`**.
## +toml2docs:none-default
## @toml2docs:none-default
access_key_id = "test"
## The secret access key of the aws account.
## It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.
## **It's only used when the storage type is `S3`**.
## +toml2docs:none-default
## @toml2docs:none-default
secret_access_key = "test"
## The secret access key of the aliyun account.
## **It's only used when the storage type is `Oss`**.
## +toml2docs:none-default
## @toml2docs:none-default
access_key_secret = "test"
## The account key of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
account_name = "test"
## The account key of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
account_key = "test"
## The scope of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
scope = "test"
## The credential path of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
credential_path = "test"
## The credential of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## @toml2docs:none-default
credential = "base64-credential"
## The container of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
container = "greptimedb"
## The sas token of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
sas_token = ""
## The endpoint of the S3 service.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
endpoint = "https://s3.amazonaws.com"
## The region of the S3 service.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
region = "us-west-2"
## The http client options to the storage.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
[storage.http_client]
## The maximum idle connection per host allowed in the pool.
pool_max_idle_per_host = 1024
## The timeout for only the connect phase of an HTTP client.
connect_timeout = "30s"
## The total request timeout, applied from when the request starts connecting until the response body has finished.
## Also considered a total deadline.
timeout = "30s"
## The timeout for idle sockets being kept-alive.
pool_idle_timeout = "90s"
# Custom storage options
# [[storage.providers]]
# name = "S3"
# type = "S3"
# bucket = "greptimedb"
# root = "data"
# access_key_id = "test"
# secret_access_key = "123456"
# endpoint = "https://s3.amazonaws.com"
# region = "us-west-2"
# [[storage.providers]]
# name = "Gcs"
# type = "Gcs"
# bucket = "greptimedb"
# root = "data"
# scope = "test"
# credential_path = "123456"
# credential = "base64-credential"
# endpoint = "https://storage.googleapis.com"
## The region engine options. You can configure multiple region engines.
[[region_engine]]
@@ -329,7 +418,7 @@ region = "us-west-2"
[region_engine.mito]
## Number of region workers.
num_workers = 8
#+ num_workers = 8
## Request channel size of each worker.
worker_channel_size = 128
@@ -343,57 +432,75 @@ manifest_checkpoint_distance = 10
## Whether to compress manifest and checkpoint files with gzip (default false).
compress_manifest = false
## Max number of running background jobs
max_background_jobs = 4
## Max number of running background flush jobs (default: 1/2 of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_flushes = 4
## Max number of running background compaction jobs (default: 1/4 of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_compactions = 2
## Max number of running background purge jobs (default: number of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_purges = 8
## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h"
## Global write buffer size for all regions. If not set, it defaults to 1/8 of OS memory, with a maximum of 1GB.
global_write_buffer_size = "1GB"
## @toml2docs:none-default="Auto"
#+ global_write_buffer_size = "1GB"
## Global write buffer size threshold to reject write requests. If not set, it defaults to twice `global_write_buffer_size`.
global_write_buffer_reject_size = "2GB"
## @toml2docs:none-default="Auto"
#+ global_write_buffer_reject_size = "2GB"
## Cache size for SST metadata. Set it to 0 to disable the cache.
## If not set, it defaults to 1/32 of OS memory, with a maximum of 128MB.
sst_meta_cache_size = "128MB"
## @toml2docs:none-default="Auto"
#+ sst_meta_cache_size = "128MB"
## Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
vector_cache_size = "512MB"
## @toml2docs:none-default="Auto"
#+ vector_cache_size = "512MB"
## Cache size for pages of SST row groups. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
page_cache_size = "512MB"
## If not set, it defaults to 1/8 of OS memory.
## @toml2docs:none-default="Auto"
#+ page_cache_size = "512MB"
## Whether to enable the experimental write cache.
## Cache size for time series selector (e.g. `last_value()`) results. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"
## Whether to enable the experimental write cache. It's enabled by default when using object storage, and recommended for better performance in that case.
enable_experimental_write_cache = false
## File system path for write cache, defaults to `{data_home}/write_cache`.
## File system path for write cache, defaults to `{data_home}/object_cache/write`.
experimental_write_cache_path = ""
## Capacity for write cache.
experimental_write_cache_size = "512MB"
## Capacity for write cache. If your disk space is sufficient, it is recommended to set a larger value.
experimental_write_cache_size = "5GiB"
## TTL for write cache.
experimental_write_cache_ttl = "1h"
## @toml2docs:none-default
experimental_write_cache_ttl = "8h"
## Buffer size for SST writing.
sst_write_buffer_size = "8MB"
## Parallelism to scan a region (default: 1/4 of cpu cores).
## - `0`: use the default value (1/4 of cpu cores).
## - `1`: scan in the current thread.
## - `n`: scan with parallelism `n`.
scan_parallelism = 0
## Capacity of the channel to send data from parallel scan tasks to the main task.
parallel_scan_channel_size = 32
## Whether to allow stale WAL entries read during replay.
allow_stale_entries = false
## Minimum time interval between two compactions.
## To align with the old behavior, the default value is 0 (no restrictions).
min_compaction_interval = "0m"
## The options for index in Mito engine.
[region_engine.mito.index]
@@ -436,6 +543,15 @@ mem_threshold_on_create = "auto"
## Deprecated, use `region_engine.mito.index.aux_path` instead.
intermediate_path = ""
## Cache size for inverted index metadata.
metadata_cache_size = "64MiB"
## Cache size for inverted index content.
content_cache_size = "128MiB"
## Page size for inverted index content cache.
content_cache_page_size = "8MiB"
## The options for full-text index in Mito engine.
[region_engine.mito.fulltext_index]
@@ -484,29 +600,47 @@ fork_dictionary_bytes = "1GiB"
## The logging options.
[logging]
## The directory to store the log files.
## The directory to store the log files. If set to empty, logs will not be written to files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
## @toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
otlp_endpoint = "http://localhost:4317"
## Whether to append logs to stdout.
append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum number of log files.
max_log_files = 720
## The percentage of tracing that will be sampled and exported.
## Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.
## Ratios > 1 are treated as 1; ratios < 0 are treated as 0.
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.
## This is only used for `greptimedb` to export its own metrics internally; it's different from Prometheus scraping.
[export_metrics]
@@ -518,19 +652,20 @@ enable = false
write_interval = "30s"
## For `standalone` mode, `self_import` is recommended to collect metrics generated by the instance itself.
## You must create the database before enabling it.
[export_metrics.self_import]
## +toml2docs:none-default
db = "information_schema"
## @toml2docs:none-default
db = "greptime_metrics"
[export_metrics.remote_write]
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`.
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url = ""
## HTTP headers carried by the Prometheus remote-write requests.
headers = { }
## The tracing options. Only effective when compiled with the `tokio-console` feature.
[tracing]
#+ [tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"

View File

@@ -2,7 +2,7 @@
mode = "distributed"
## The flownode identifier; it should be unique in the cluster.
## +toml2docs:none-default
## @toml2docs:none-default
node_id = 14
## The gRPC server options.
@@ -59,32 +59,50 @@ retry_interval = "3s"
## The logging options.
[logging]
## The directory to store the log files.
## The directory to store the log files. If set to empty, logs will not be written to files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
## @toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
otlp_endpoint = "http://localhost:4317"
## Whether to append logs to stdout.
append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum number of log files.
max_log_files = 720
## The percentage of tracing that will be sampled and exported.
## Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.
## Ratios > 1 are treated as 1; ratios < 0 are treated as 0.
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The tracing options. Only effective when compiled with the `tokio-console` feature.
[tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The tracing options. Only effective when compiled with the `tokio-console` feature.
#+ [tracing]
## The tokio console address.
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"

View File

@@ -1,18 +1,13 @@
## The running mode of the datanode. It can be `standalone` or `distributed`.
mode = "standalone"
## The default timezone of the server.
## +toml2docs:none-default
## @toml2docs:none-default
default_timezone = "UTC"
## The runtime options.
[runtime]
#+ [runtime]
## The number of threads to execute the runtime for global read operations.
read_rt_size = 8
#+ global_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 4
#+ compact_rt_size = 4
## The heartbeat options.
[heartbeat]
@@ -49,11 +44,11 @@ runtime_size = 8
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -81,11 +76,11 @@ runtime_size = 2
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -106,11 +101,11 @@ runtime_size = 2
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -171,29 +166,47 @@ tcp_nodelay = true
## The logging options.
[logging]
## The directory to store the log files.
## The directory to store the log files. If set to empty, logs will not be written to files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
## @toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
otlp_endpoint = "http://localhost:4317"
## Whether to append logs to stdout.
append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum number of log files.
max_log_files = 720
## The percentage of tracing that will be sampled and exported.
## Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.
## Ratios > 1 are treated as 1; ratios < 0 are treated as 0.
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.
## This is only used for `greptimedb` to export its own metrics internally; it's different from Prometheus scraping.
[export_metrics]
@@ -205,19 +218,20 @@ enable = false
write_interval = "30s"
## For `standalone` mode, `self_import` is recommended to collect metrics generated by the instance itself.
## You must create the database before enabling it.
[export_metrics.self_import]
## +toml2docs:none-default
db = "information_schema"
## @toml2docs:none-default
db = "greptime_metrics"
[export_metrics.remote_write]
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`.
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url = ""
## HTTP headers carried by the Prometheus remote-write requests.
headers = { }
## The tracing options. Only effective when compiled with the `tokio-console` feature.
[tracing]
#+ [tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"

View File

@@ -7,38 +7,40 @@ bind_addr = "127.0.0.1:3002"
## The communication server address for frontend and datanode to connect to metasrv, "127.0.0.1:3002" by default for localhost.
server_addr = "127.0.0.1:3002"
## Etcd server address.
store_addr = "127.0.0.1:2379"
## Datanode selector type.
## - `lease_based` (default value).
## - `load_based`
## For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector".
selector = "lease_based"
## Store data in memory.
use_memory_store = false
## Whether to enable greptimedb telemetry.
enable_telemetry = true
## Store server addresses; defaults to the etcd store.
store_addrs = ["127.0.0.1:2379"]
## If it's not empty, the metasrv will store all data with this key prefix.
store_key_prefix = ""
## The datastore for meta server.
backend = "EtcdStore"
## Datanode selector type.
## - `round_robin` (default value)
## - `lease_based`
## - `load_based`
## For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector".
selector = "round_robin"
## Store data in memory.
use_memory_store = false
## Whether to enable region failover.
## This feature is only available for GreptimeDB running in cluster mode and
## - Using Remote WAL
## - Using shared storage (e.g., s3).
enable_region_failover = false
## Whether to enable greptimedb telemetry. Enabled by default.
#+ enable_telemetry = true
## The runtime options.
[runtime]
#+ [runtime]
## The number of threads to execute the runtime for global read operations.
read_rt_size = 8
#+ global_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 4
#+ compact_rt_size = 4
## Procedure storage options.
[procedure]
@@ -58,17 +60,32 @@ max_metadata_value_size = "1500KiB"
# Failure detectors options.
[failure_detector]
## The threshold value used by the failure detector to determine failure conditions.
threshold = 8.0
## The minimum standard deviation of the heartbeat intervals, used to calculate acceptable variations.
min_std_deviation = "100ms"
## The acceptable pause duration between heartbeats, used to determine if a heartbeat interval is acceptable.
acceptable_heartbeat_pause = "10000ms"
## The initial estimate of the heartbeat interval used by the failure detector.
first_heartbeat_estimate = "1000ms"
## Datanode options.
[datanode]
## Datanode client options.
[datanode.client]
## Operation timeout.
timeout = "10s"
## Connect server timeout.
connect_timeout = "10s"
## `TCP_NODELAY` option for accepted connections.
tcp_nodelay = true
[wal]
@@ -82,7 +99,12 @@ provider = "raft_engine"
## The broker endpoints of the Kafka cluster.
broker_endpoints = ["127.0.0.1:9092"]
## Number of topics to be created upon start.
## Automatically create topics for WAL.
## Set to `true` to automatically create topics for the WAL.
## Otherwise, topics named `topic_name_prefix_[0..num_topics)` are used.
auto_create_topics = true
## Number of topics.
num_topics = 64
## Topic selector type.
@@ -91,6 +113,7 @@ num_topics = 64
selector_type = "round_robin"
## A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.
## e.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1.
topic_name_prefix = "greptimedb_wal_topic"
## Expected number of replicas of each partition.
@@ -110,31 +133,67 @@ backoff_base = 2
## Stop reconnecting if the total wait time reaches the deadline. If this config is missing, reconnection won't terminate.
backoff_deadline = "5mins"
# The Kafka SASL configuration.
# **It's only used when the provider is `kafka`**.
# Available SASL mechanisms:
# - `PLAIN`
# - `SCRAM-SHA-256`
# - `SCRAM-SHA-512`
# [wal.sasl]
# type = "SCRAM-SHA-512"
# username = "user_kafka"
# password = "secret"
# The Kafka TLS configuration.
# **It's only used when the provider is `kafka`**.
# [wal.tls]
# server_ca_cert_path = "/path/to/server_cert"
# client_cert_path = "/path/to/client_cert"
# client_key_path = "/path/to/key"
## The logging options.
[logging]
## The directory to store the log files.
## The directory to store the log files. If set to empty, logs will not be written to files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
## @toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
otlp_endpoint = "http://localhost:4317"
## Whether to append logs to stdout.
append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum number of log files.
max_log_files = 720
## The percentage of tracing that will be sampled and exported.
## Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.
## Ratios > 1 are treated as 1; ratios < 0 are treated as 0.
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.
## This is only used for `greptimedb` to export its own metrics internally; it's different from Prometheus scraping.
[export_metrics]
@@ -146,19 +205,20 @@ enable = false
write_interval = "30s"
## For `standalone` mode, `self_import` is recommended to collect metrics generated by the instance itself.
## You must create the database before enabling it.
[export_metrics.self_import]
## +toml2docs:none-default
db = "information_schema"
## @toml2docs:none-default
db = "greptime_metrics"
[export_metrics.remote_write]
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`.
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url = ""
## HTTP headers carried by the Prometheus remote-write requests.
headers = { }
## The tracing options. Only effective when compiled with the `tokio-console` feature.
[tracing]
#+ [tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"

View File

@@ -1,21 +1,29 @@
## The running mode of the datanode. It can be `standalone` or `distributed`.
mode = "standalone"
## Enable telemetry to collect anonymous usage data.
enable_telemetry = true
## The default timezone of the server.
## +toml2docs:none-default
## @toml2docs:none-default
default_timezone = "UTC"
## Initialize all regions in the background during the startup.
## By default, it provides services after all regions have been initialized.
init_regions_in_background = false
## Parallelism of initializing regions.
init_regions_parallelism = 16
## The maximum number of concurrent queries allowed to execute. Zero means unlimited.
max_concurrent_queries = 0
## Enable telemetry to collect anonymous usage data. Enabled by default.
#+ enable_telemetry = true
## The runtime options.
[runtime]
#+ [runtime]
## The number of threads to execute the runtime for global read operations.
read_rt_size = 8
#+ global_rt_size = 8
## The number of threads to execute the runtime for global write operations.
write_rt_size = 8
## The number of threads to execute the runtime for global background operations.
bg_rt_size = 4
#+ compact_rt_size = 4
## The HTTP server options.
[http]
@@ -41,11 +49,11 @@ runtime_size = 8
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -73,11 +81,11 @@ runtime_size = 2
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -98,11 +106,11 @@ runtime_size = 2
mode = "disable"
## Certificate file path.
## +toml2docs:none-default
## @toml2docs:none-default
cert_path = ""
## Private key file path.
## +toml2docs:none-default
## @toml2docs:none-default
key_path = ""
## Watch for certificate and key file changes and auto-reload.
@@ -134,20 +142,20 @@ provider = "raft_engine"
## The directory to store the WAL files.
## **It's only used when the provider is `raft_engine`**.
## +toml2docs:none-default
## @toml2docs:none-default
dir = "/tmp/greptimedb/wal"
## The size of the WAL segment file.
## **It's only used when the provider is `raft_engine`**.
file_size = "256MB"
file_size = "128MB"
## The threshold of the WAL size to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
purge_threshold = "4GB"
purge_threshold = "1GB"
## The interval to trigger a flush.
## **It's only used when the provider is `raft_engine`**.
purge_interval = "10m"
purge_interval = "1m"
## The read batch size.
## **It's only used when the provider is `raft_engine`**.
@@ -169,10 +177,41 @@ prefill_log_files = false
## **It's only used when the provider is `raft_engine`**.
sync_period = "10s"
## Parallelism during WAL recovery.
recovery_parallelism = 2
## The Kafka broker endpoints.
## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"]
## Automatically create topics for WAL.
## Set to `true` to automatically create topics for the WAL.
## Otherwise, topics named `topic_name_prefix_[0..num_topics)` are used.
auto_create_topics = true
## Number of topics.
## **It's only used when the provider is `kafka`**.
num_topics = 64
## Topic selector type.
## Available selector types:
## - `round_robin` (default)
## **It's only used when the provider is `kafka`**.
selector_type = "round_robin"
## A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.
## e.g., greptimedb_wal_topic_0, greptimedb_wal_topic_1.
## **It's only used when the provider is `kafka`**.
topic_name_prefix = "greptimedb_wal_topic"
## Expected number of replicas of each partition.
## **It's only used when the provider is `kafka`**.
replication_factor = 1
## The timeout above which a topic creation operation will be cancelled.
## **It's only used when the provider is `kafka`**.
create_topic_timeout = "30s"
## The max size of a single producer batch.
## Warning: Kafka has a default limit of 1MB per message in a topic.
## **It's only used when the provider is `kafka`**.
@@ -198,6 +237,35 @@ backoff_base = 2
## **It's only used when the provider is `kafka`**.
backoff_deadline = "5mins"
## Ignore missing entries when reading the WAL.
## **It's only used when the provider is `kafka`**.
##
## This option ensures that when Kafka messages are deleted, the system
## can still successfully replay memtable data without throwing an
## out-of-range error.
## However, enabling this option might lead to unexpected data loss,
## as the system will skip over missing entries instead of treating
## them as critical errors.
overwrite_entry_start_id = false
# The Kafka SASL configuration.
# **It's only used when the provider is `kafka`**.
# Available SASL mechanisms:
# - `PLAIN`
# - `SCRAM-SHA-256`
# - `SCRAM-SHA-512`
# [wal.sasl]
# type = "SCRAM-SHA-512"
# username = "user_kafka"
# password = "secret"
# The Kafka TLS configuration.
# **It's only used when the provider is `kafka`**.
# [wal.tls]
# server_ca_cert_path = "/path/to/server_cert"
# client_cert_path = "/path/to/client_cert"
# client_key_path = "/path/to/key"
## Metadata storage options.
[metadata_store]
## Kv file size in bytes.
@@ -248,6 +316,7 @@ retry_delay = "500ms"
# root = "data"
# scope = "test"
# credential_path = "123456"
# credential = "base64-credential"
# endpoint = "https://storage.googleapis.com"
## The data storage options.
@@ -263,87 +332,123 @@ data_home = "/tmp/greptimedb/"
## - `Oss`: the data is stored in the Aliyun OSS.
type = "File"
## Cache configuration for object storage such as 'S3' etc.
## The local file cache directory.
## +toml2docs:none-default
cache_path = "/path/local_cache"
## Read cache configuration for object storage such as 'S3'. It's configured by default when using object storage, and recommended for better performance.
## A local file directory, defaults to `{data_home}/object_cache/read`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## The local file cache capacity in bytes.
## +toml2docs:none-default
cache_capacity = "256MB"
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set a larger value.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
bucket = "greptimedb"
## The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.
## **It's only used when the storage type is `S3`, `Oss` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
root = "greptimedb"
## The access key id of the aws account.
## It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.
## **It's only used when the storage type is `S3` and `Oss`**.
## +toml2docs:none-default
## @toml2docs:none-default
access_key_id = "test"
## The secret access key of the aws account.
## It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.
## **It's only used when the storage type is `S3`**.
## +toml2docs:none-default
## @toml2docs:none-default
secret_access_key = "test"
## The secret access key of the aliyun account.
## **It's only used when the storage type is `Oss`**.
## +toml2docs:none-default
## @toml2docs:none-default
access_key_secret = "test"
## The account key of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
account_name = "test"
## The account key of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
account_key = "test"
## The scope of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
scope = "test"
## The credential path of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## +toml2docs:none-default
## @toml2docs:none-default
credential_path = "test"
## The credential of the google cloud storage.
## **It's only used when the storage type is `Gcs`**.
## @toml2docs:none-default
credential = "base64-credential"
## The container of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
container = "greptimedb"
## The sas token of the azure account.
## **It's only used when the storage type is `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
sas_token = ""
## The endpoint of the S3 service.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
endpoint = "https://s3.amazonaws.com"
## The region of the S3 service.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
## +toml2docs:none-default
## @toml2docs:none-default
region = "us-west-2"
## The http client options to the storage.
## **It's only used when the storage type is `S3`, `Oss`, `Gcs` and `Azblob`**.
[storage.http_client]
## The maximum idle connection per host allowed in the pool.
pool_max_idle_per_host = 1024
## The timeout for only the connect phase of an HTTP client.
connect_timeout = "30s"
## The total request timeout, applied from when the request starts connecting until the response body has finished.
## Also considered a total deadline.
timeout = "30s"
## The timeout for idle sockets being kept-alive.
pool_idle_timeout = "90s"
# Custom storage options
# [[storage.providers]]
# name = "S3"
# type = "S3"
# bucket = "greptimedb"
# root = "data"
# access_key_id = "test"
# secret_access_key = "123456"
# endpoint = "https://s3.amazonaws.com"
# region = "us-west-2"
# [[storage.providers]]
# name = "Gcs"
# type = "Gcs"
# bucket = "greptimedb"
# root = "data"
# scope = "test"
# credential_path = "123456"
# credential = "base64-credential"
# endpoint = "https://storage.googleapis.com"
## The region engine options. You can configure multiple region engines.
[[region_engine]]
@@ -352,7 +457,7 @@ region = "us-west-2"
[region_engine.mito]
## Number of region workers.
num_workers = 8
#+ num_workers = 8
## Request channel size of each worker.
worker_channel_size = 128
@@ -366,57 +471,75 @@ manifest_checkpoint_distance = 10
## Whether to compress manifest and checkpoint files with gzip (default false).
compress_manifest = false
## Max number of running background jobs
max_background_jobs = 4
## Max number of running background flush jobs (default: 1/2 of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_flushes = 4
## Max number of running background compaction jobs (default: 1/4 of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_compactions = 2
## Max number of running background purge jobs (default: number of cpu cores).
## @toml2docs:none-default="Auto"
#+ max_background_purges = 8
## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h"
## Global write buffer size for all regions. If not set, it defaults to 1/8 of OS memory, with a maximum of 1GB.
global_write_buffer_size = "1GB"
## @toml2docs:none-default="Auto"
#+ global_write_buffer_size = "1GB"
## Global write buffer size threshold to reject write requests. If not set, it defaults to twice `global_write_buffer_size`.
global_write_buffer_reject_size = "2GB"
## Global write buffer size threshold to reject write requests. If not set, it defaults to twice `global_write_buffer_size`.
## @toml2docs:none-default="Auto"
#+ global_write_buffer_reject_size = "2GB"
## Cache size for SST metadata. Set it to 0 to disable the cache.
## If not set, it defaults to 1/32 of OS memory, with a maximum of 128MB.
sst_meta_cache_size = "128MB"
## @toml2docs:none-default="Auto"
#+ sst_meta_cache_size = "128MB"
## Cache size for vectors and arrow arrays. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
vector_cache_size = "512MB"
## @toml2docs:none-default="Auto"
#+ vector_cache_size = "512MB"
## Cache size for pages of SST row groups. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
page_cache_size = "512MB"
## If not set, it defaults to 1/8 of OS memory.
## @toml2docs:none-default="Auto"
#+ page_cache_size = "512MB"
## Whether to enable the experimental write cache.
## Cache size for time series selector (e.g. `last_value()`) results. Set it to 0 to disable the cache.
## If not set, it defaults to 1/16 of OS memory, with a maximum of 512MB.
## @toml2docs:none-default="Auto"
#+ selector_result_cache_size = "512MB"
## Whether to enable the experimental write cache. It's enabled by default when using object storage, and recommended for better performance in that case.
enable_experimental_write_cache = false
## File system path for write cache, defaults to `{data_home}/write_cache`.
## File system path for write cache, defaults to `{data_home}/object_cache/write`.
experimental_write_cache_path = ""
## Capacity for write cache.
experimental_write_cache_size = "512MB"
## Capacity for write cache. If your disk space is sufficient, it is recommended to set a larger value.
experimental_write_cache_size = "5GiB"
## TTL for write cache.
experimental_write_cache_ttl = "1h"
## @toml2docs:none-default
experimental_write_cache_ttl = "8h"
## Buffer size for SST writing.
sst_write_buffer_size = "8MB"
## Parallelism to scan a region (default: 1/4 of cpu cores).
## - `0`: use the default value (1/4 of cpu cores).
## - `1`: scan in the current thread.
## - `n`: scan with parallelism `n`.
scan_parallelism = 0
## Capacity of the channel to send data from parallel scan tasks to the main task.
parallel_scan_channel_size = 32
## Whether to allow stale WAL entries read during replay.
allow_stale_entries = false
## Minimum time interval between two compactions.
## To align with the old behavior, the default value is 0 (no restrictions).
min_compaction_interval = "0m"
## The options for index in Mito engine.
[region_engine.mito.index]
@@ -459,6 +582,15 @@ mem_threshold_on_create = "auto"
## Deprecated, use `region_engine.mito.index.aux_path` instead.
intermediate_path = ""
## Cache size for inverted index metadata.
metadata_cache_size = "64MiB"
## Cache size for inverted index content.
content_cache_size = "128MiB"
## Page size for inverted index content cache.
content_cache_page_size = "8MiB"
## The options for full-text index in Mito engine.
[region_engine.mito.fulltext_index]
@@ -507,29 +639,47 @@ fork_dictionary_bytes = "1GiB"
## The logging options.
[logging]
## The directory to store the log files.
## The directory to store the log files. If set to empty, logs will not be written to files.
dir = "/tmp/greptimedb/logs"
## The log level. Can be `info`/`debug`/`warn`/`error`.
## +toml2docs:none-default
## @toml2docs:none-default
level = "info"
## Enable OTLP tracing.
enable_otlp_tracing = false
## The OTLP tracing endpoint.
## +toml2docs:none-default
otlp_endpoint = ""
otlp_endpoint = "http://localhost:4317"
## Whether to append logs to stdout.
append_stdout = true
## The log format. Can be `text`/`json`.
log_format = "text"
## The maximum number of log files.
max_log_files = 720
## The percentage of tracing that will be sampled and exported.
## Valid range `[0, 1]`: 1 means all traces are sampled, 0 means none are sampled; the default value is 1.
## Ratios > 1 are treated as 1; ratios < 0 are treated as 0.
[logging.tracing_sample_ratio]
default_ratio = 1.0
## The slow query log options.
[logging.slow_query]
## Whether to enable slow query log.
enable = false
## The threshold of slow query.
## @toml2docs:none-default
threshold = "10s"
## The sampling ratio of slow query log. The value should be in the range of (0, 1].
## @toml2docs:none-default
sample_ratio = 1.0
## The datanode can export its metrics and send them to a Prometheus-compatible service (e.g. `greptimedb` itself) via the remote-write API.
## This is only used for `greptimedb` to export its own metrics internally; it's different from Prometheus scraping.
[export_metrics]
@@ -540,20 +690,21 @@ enable = false
## The interval of export metrics.
write_interval = "30s"
## For `standalone` mode, `self_import` is recommend to collect metrics generated by itself
## For `standalone` mode, `self_import` is recommended to collect metrics generated by itself
## You must create the database before enabling it.
[export_metrics.self_import]
## +toml2docs:none-default
db = "information_schema"
## @toml2docs:none-default
db = "greptime_metrics"
[export_metrics.remote_write]
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=information_schema`.
## The URL the metrics are sent to. An example URL: `http://127.0.0.1:4000/v1/prometheus/write?db=greptime_metrics`.
url = ""
## HTTP headers carried by the Prometheus remote-write requests.
headers = { }
## The tracing options. Only effective when compiled with the `tokio-console` feature.
[tracing]
#+ [tracing]
## The tokio console address.
## +toml2docs:none-default
tokio_console_addr = "127.0.0.1"
## @toml2docs:none-default
#+ tokio_console_addr = "127.0.0.1"

View File

@@ -0,0 +1,50 @@
#!/bin/bash
set -euxo pipefail
cd "$(mktemp -d)"
# Pin the version to v1.6.6; this differs from the latest version used in the original install script at
# https://raw.githubusercontent.com/cargo-bins/cargo-binstall/main/install-from-binstall-release.sh
base_url="https://github.com/cargo-bins/cargo-binstall/releases/download/v1.6.6/cargo-binstall-"
os="$(uname -s)"
if [ "$os" == "Darwin" ]; then
url="${base_url}universal-apple-darwin.zip"
curl -LO --proto '=https' --tlsv1.2 -sSf "$url"
unzip cargo-binstall-universal-apple-darwin.zip
elif [ "$os" == "Linux" ]; then
machine="$(uname -m)"
if [ "$machine" == "armv7l" ]; then
machine="armv7"
fi
target="${machine}-unknown-linux-musl"
if [ "$machine" == "armv7" ]; then
target="${target}eabihf"
fi
url="${base_url}${target}.tgz"
curl -L --proto '=https' --tlsv1.2 -sSf "$url" | tar -xvzf -
elif [ "${OS-}" = "Windows_NT" ]; then
machine="$(uname -m)"
target="${machine}-pc-windows-msvc"
url="${base_url}${target}.zip"
curl -LO --proto '=https' --tlsv1.2 -sSf "$url"
unzip "cargo-binstall-${target}.zip"
else
echo "Unsupported OS ${os}"
exit 1
fi
./cargo-binstall -y --force cargo-binstall
CARGO_HOME="${CARGO_HOME:-$HOME/.cargo}"
if ! [[ ":$PATH:" == *":$CARGO_HOME/bin:"* ]]; then
if [ -n "${CI:-}" ] && [ -n "${GITHUB_PATH:-}" ]; then
echo "$CARGO_HOME/bin" >> "$GITHUB_PATH"
else
echo
printf "\033[0;31mYour path is missing %s, you might want to add it.\033[0m\n" "$CARGO_HOME/bin"
echo
fi
fi

View File

@@ -32,7 +32,9 @@ RUN rustup toolchain install ${RUST_TOOLCHAIN}
# Install cargo-binstall with a specific version to adapt the current rust toolchain.
# Note: if we use the latest version, we may encounter a `use of unstable library feature 'io_error_downcast'` error.
RUN cargo install cargo-binstall --version 1.6.6 --locked
# Compiling from source takes too long, so we use the precompiled binary instead.
COPY $DOCKER_BUILD_ROOT/docker/dev-builder/binstall/pull_binstall.sh /usr/local/bin/pull_binstall.sh
RUN chmod +x /usr/local/bin/pull_binstall.sh && /usr/local/bin/pull_binstall.sh
# Install nextest.
RUN cargo binstall cargo-nextest --no-confirm

View File

@@ -15,8 +15,8 @@ RUN apt-get update && \
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
libssl-dev \
tzdata \
protobuf-compiler \
curl \
unzip \
ca-certificates \
git \
build-essential \
@@ -24,6 +24,29 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
python3.10 \
python3.10-dev
ARG TARGETPLATFORM
RUN echo "target platform: $TARGETPLATFORM"
# Install protobuf, because the one in apt is too old (v3.12).
RUN if [ "$TARGETPLATFORM" = "linux/arm64" ]; then \
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v29.1/protoc-29.1-linux-aarch_64.zip && \
unzip protoc-29.1-linux-aarch_64.zip -d protoc3; \
elif [ "$TARGETPLATFORM" = "linux/amd64" ]; then \
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v29.1/protoc-29.1-linux-x86_64.zip && \
unzip protoc-29.1-linux-x86_64.zip -d protoc3; \
fi
RUN mv protoc3/bin/* /usr/local/bin/
RUN mv protoc3/include/* /usr/local/include/
# https://github.com/GreptimeTeam/greptimedb/actions/runs/10935485852/job/30357457188#step:3:7106
# `aws-lc-sys` requires gcc >= 10.3.0 to work, hence alias gcc to gcc-10
RUN apt-get remove -y gcc-9 g++-9 cpp-9 && \
apt-get install -y gcc-10 g++-10 cpp-10 make cmake && \
ln -sf /usr/bin/gcc-10 /usr/bin/gcc && ln -sf /usr/bin/g++-10 /usr/bin/g++ && \
ln -sf /usr/bin/gcc-10 /usr/bin/cc && \
ln -sf /usr/bin/g++-10 /usr/bin/cpp && ln -sf /usr/bin/g++-10 /usr/bin/c++ && \
cc --version && gcc --version && g++ --version && cpp --version && c++ --version
# Remove Python 3.8 and install pip.
RUN apt-get -y purge python3.8 && \
apt-get -y autoremove && \
@@ -40,7 +63,7 @@ RUN apt-get -y purge python3.8 && \
# wildcard here. However, that requires git's config files and the submodules to all be owned by the very same user.
# It's troublesome to do this since the dev build runs in Docker, which is under user "root"; while outside the Docker,
# it can be a different user that has prepared the submodules.
RUN git config --global --add safe.directory *
RUN git config --global --add safe.directory '*'
# Install Python dependencies.
COPY $DOCKER_BUILD_ROOT/docker/python/requirements.txt /etc/greptime/requirements.txt
@@ -57,7 +80,9 @@ RUN rustup toolchain install ${RUST_TOOLCHAIN}
# Install cargo-binstall with a specific version to adapt the current rust toolchain.
# Note: if we use the latest version, we may encounter a `use of unstable library feature 'io_error_downcast'` error.
RUN cargo install cargo-binstall --version 1.6.6 --locked
# Compiling from source takes too long, so we use the precompiled binary instead.
COPY $DOCKER_BUILD_ROOT/docker/dev-builder/binstall/pull_binstall.sh /usr/local/bin/pull_binstall.sh
RUN chmod +x /usr/local/bin/pull_binstall.sh && /usr/local/bin/pull_binstall.sh
# Install nextest.
RUN cargo binstall cargo-nextest --no-confirm

View File

@@ -1,9 +1,9 @@
x-custom:
etcd_initial_cluster_token: &etcd_initial_cluster_token "--initial-cluster-token=etcd-cluster"
etcd_common_settings: &etcd_common_settings
image: quay.io/coreos/etcd:v3.5.10
image: "${ETCD_REGISTRY:-quay.io}/${ETCD_NAMESPACE:-coreos}/etcd:${ETCD_VERSION:-v3.5.10}"
entrypoint: /usr/local/bin/etcd
greptimedb_image: &greptimedb_image docker.io/greptimedb/greptimedb:latest
greptimedb_image: &greptimedb_image "${GREPTIMEDB_REGISTRY:-docker.io}/${GREPTIMEDB_NAMESPACE:-greptime}/greptimedb:${GREPTIMEDB_VERSION:-latest}"
services:
etcd0:

View File

@@ -0,0 +1,51 @@
# Log benchmark configuration
This repo holds the configuration we used to benchmark GreptimeDB, Clickhouse and Elasticsearch.
Here are the versions of the databases we used in the benchmark:
| name | version |
| :------------ | :--------- |
| GreptimeDB | v0.9.2 |
| Clickhouse | 24.9.1.219 |
| Elasticsearch | 8.15.0 |
## Structured model vs Unstructured model
We divide the test into two parts, using the structured model and the unstructured model respectively. You can also see the difference in the create table clauses.
__Structured model__
The log data is pre-processed into columns by vector. For example, an insert request looks like the following:
```SQL
INSERT INTO test_table (bytes, http_version, ip, method, path, status, user, timestamp) VALUES ()
```
The goal is to test string/text support for each database. In real scenarios it means the data source (or log data producers) has separate fields defined, or has already processed the raw input.
__Unstructured model__
The log data is inserted as a long string, and then we build a fulltext index on these strings. For example, an insert request looks like the following:
```SQL
INSERT INTO test_table (message, timestamp) VALUES ()
```
The goal is to test fuzzy search performance for each database. In real scenarios it means the log is produced by some kind of middleware and inserted directly into the database.
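To make the two models concrete, here is a minimal sketch of what a single row could look like in each case, written through GreptimeDB's MySQL protocol (the port 4002 and the literal values are assumptions for illustration; in the actual benchmark the rows are generated and inserted by vector):
```bash
# Structured model: the log line is already split into columns.
mysql -h 127.0.0.1 -P 4002 <<'SQL'
INSERT INTO test_table (`bytes`, `http_version`, `ip`, `method`, `path`, `status`, `user`, `timestamp`)
VALUES (1024, 'HTTP/1.1', '10.0.0.1', 'GET', '/user/booperbot124', 200, 'CrucifiX', '2024-08-16 04:30:44.000');
SQL

# Unstructured model: the whole raw line goes into the fulltext-indexed column.
mysql -h 127.0.0.1 -P 4002 <<'SQL'
INSERT INTO test_table (`message`, `timestamp`)
VALUES ('10.0.0.1 - CrucifiX [16/Aug/2024:04:30:44 +0000] "GET /user/booperbot124 HTTP/1.1" 200 1024', '2024-08-16 04:30:44.000');
SQL
```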
## Creating tables
See [here](./create_table.sql) for GreptimeDB's and Clickhouse's create table clauses.
The Elasticsearch mapping is created automatically.
## Vector Configuration
We use vector to generate random log data and send inserts to databases.
Please refer to [structured config](./structured_vector.toml) and [unstructured config](./unstructured_vector.toml) for detailed configuration.
## SQLs and payloads
Please refer to [SQL query](./query.sql) for GreptimeDB and Clickhouse, and [query payload](./query.md) for Elasticsearch.
## Steps to reproduce
0. Decide whether to run the structured model test or the unstructured model test.
1. Build the vector binary (see vector's config file for the specific branch) and the database binaries accordingly.
2. Create the tables in GreptimeDB and Clickhouse in advance.
3. Run vector to insert data (see the sketch after this list).
4. When data insertion is finished, run queries against each database. Note: you'll need to update the time range values after data insertion.
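A minimal sketch of step 3, assuming the patched vector binary has been built locally (the binary path is a placeholder; pick the structured or unstructured config to match the model you are testing):
```bash
# Stream generated log data into the databases using the structured config.
./vector --config ./structured_vector.toml

# Or, for the unstructured run:
./vector --config ./unstructured_vector.toml
```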
## Additional notes
- You can tune GreptimeDB's configuration to get better performance.
- You can set up GreptimeDB to use S3 as storage, see [here](https://docs.greptime.com/user-guide/deployments/configuration#storage-options).

View File

@@ -0,0 +1,56 @@
-- GreptimeDB create table clause
-- structured test, use vector to pre-process log data into fields
CREATE TABLE IF NOT EXISTS `test_table` (
`bytes` Int64 NULL,
`http_version` STRING NULL,
`ip` STRING NULL,
`method` STRING NULL,
`path` STRING NULL,
`status` SMALLINT UNSIGNED NULL,
`user` STRING NULL,
`timestamp` TIMESTAMP(3) NOT NULL,
PRIMARY KEY (`user`, `path`, `status`),
TIME INDEX (`timestamp`)
)
ENGINE=mito
WITH(
append_mode = 'true'
);
-- unstructured test, build fulltext index on message column
CREATE TABLE IF NOT EXISTS `test_table` (
`message` STRING NULL FULLTEXT WITH(analyzer = 'English', case_sensitive = 'false'),
`timestamp` TIMESTAMP(3) NOT NULL,
TIME INDEX (`timestamp`)
)
ENGINE=mito
WITH(
append_mode = 'true'
);
-- Clickhouse create table clause
-- structured test
CREATE TABLE IF NOT EXISTS test_table
(
bytes UInt64 NOT NULL,
http_version String NOT NULL,
ip String NOT NULL,
method String NOT NULL,
path String NOT NULL,
status UInt8 NOT NULL,
user String NOT NULL,
timestamp String NOT NULL,
)
ENGINE = MergeTree()
ORDER BY (user, path, status);
-- unstructured test
SET allow_experimental_full_text_index = true;
CREATE TABLE IF NOT EXISTS test_table
(
message String,
timestamp String,
INDEX inv_idx(message) TYPE full_text(0) GRANULARITY 1
)
ENGINE = MergeTree()
ORDER BY tuple();

View File

@@ -0,0 +1,199 @@
# Query URL and payload for Elasticsearch
## Count
URL: `http://127.0.0.1:9200/_count`
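A minimal sketch of the count request (no body is required; without an index in the path it counts documents across all indices):
```bash
curl -s -XGET 'http://127.0.0.1:9200/_count?pretty'
```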
## Query by timerange
URL: `http://127.0.0.1:9200/_search`
You can use the following payload to get the full timerange first.
```JSON
{"size":0,"aggs":{"max_timestamp":{"max":{"field":"timestamp"}},"min_timestamp":{"min":{"field":"timestamp"}}}}
```
And then use this payload to query by timerange.
```JSON
{
"from": 0,
"size": 1000,
"query": {
"range": {
"timestamp": {
"gte": "2024-08-16T04:30:44.000Z",
"lte": "2024-08-16T04:51:52.000Z"
}
}
}
}
```
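The payloads in this document can all be sent the same way; a sketch with `payload.json` as a placeholder file holding the JSON body:
```bash
curl -s -XPOST 'http://127.0.0.1:9200/_search' \
  -H 'Content-Type: application/json' \
  -d @payload.json
```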
## Query by condition
URL: `http://127.0.0.1:9200/_search`
### Structured payload
```JSON
{
"from": 0,
"size": 10000,
"query": {
"bool": {
"must": [
{
"term": {
"user.keyword": "CrucifiX"
}
},
{
"term": {
"method.keyword": "OPTION"
}
},
{
"term": {
"path.keyword": "/user/booperbot124"
}
},
{
"term": {
"http_version.keyword": "HTTP/1.1"
}
},
{
"term": {
"status": "401"
}
}
]
}
}
}
```
### Unstructured payload
```JSON
{
"from": 0,
"size": 10000,
"query": {
"bool": {
"must": [
{
"match_phrase": {
"message": "CrucifiX"
}
},
{
"match_phrase": {
"message": "OPTION"
}
},
{
"match_phrase": {
"message": "/user/booperbot124"
}
},
{
"match_phrase": {
"message": "HTTP/1.1"
}
},
{
"match_phrase": {
"message": "401"
}
}
]
}
}
}
```
## Query by condition and timerange
URL: `http://127.0.0.1:9200/_search`
### Structured payload
```JSON
{
"size": 10000,
"query": {
"bool": {
"must": [
{
"term": {
"user.keyword": "CrucifiX"
}
},
{
"term": {
"method.keyword": "OPTION"
}
},
{
"term": {
"path.keyword": "/user/booperbot124"
}
},
{
"term": {
"http_version.keyword": "HTTP/1.1"
}
},
{
"term": {
"status": "401"
}
},
{
"range": {
"timestamp": {
"gte": "2024-08-19T07:03:37.383Z",
"lte": "2024-08-19T07:24:58.883Z"
}
}
}
]
}
}
}
```
### Unstructured payload
```JSON
{
"size": 10000,
"query": {
"bool": {
"must": [
{
"match_phrase": {
"message": "CrucifiX"
}
},
{
"match_phrase": {
"message": "OPTION"
}
},
{
"match_phrase": {
"message": "/user/booperbot124"
}
},
{
"match_phrase": {
"message": "HTTP/1.1"
}
},
{
"match_phrase": {
"message": "401"
}
},
{
"range": {
"timestamp": {
"gte": "2024-08-19T05:16:17.099Z",
"lte": "2024-08-19T05:46:02.722Z"
}
}
}
]
}
}
}
```

View File

@@ -0,0 +1,50 @@
-- Structured query for GreptimeDB and Clickhouse
-- query count
select count(*) from test_table;
-- query by timerange. Note: place the timestamp range in the where clause
-- GreptimeDB
-- you can use `select max(timestamp)::bigint from test_table;` and `select min(timestamp)::bigint from test_table;`
-- to get the full timestamp range
select * from test_table where timestamp between 1723710843619 and 1723711367588;
-- Clickhouse
-- you can use `select max(timestamp) from test_table;` and `select min(timestamp) from test_table;`
-- to get the full timestamp range
select * from test_table where timestamp between '2024-08-16T03:58:46Z' and '2024-08-16T04:03:50Z';
-- query by condition
SELECT * FROM test_table WHERE user = 'CrucifiX' and method = 'OPTION' and path = '/user/booperbot124' and http_version = 'HTTP/1.1' and status = 401;
-- query by condition and timerange
-- GreptimeDB
SELECT * FROM test_table WHERE user = "CrucifiX" and method = "OPTION" and path = "/user/booperbot124" and http_version = "HTTP/1.1" and status = 401
and timestamp between 1723774396760 and 1723774788760;
-- Clickhouse
SELECT * FROM test_table WHERE user = 'CrucifiX' and method = 'OPTION' and path = '/user/booperbot124' and http_version = 'HTTP/1.1' and status = 401
and timestamp between '2024-08-16T03:58:46Z' and '2024-08-16T04:03:50Z';
-- Unstructured query for GreptimeDB and Clickhouse
-- query by condition
-- GreptimeDB
SELECT * FROM test_table WHERE MATCHES(message, "+CrucifiX +OPTION +/user/booperbot124 +HTTP/1.1 +401");
-- Clickhouse
SELECT * FROM test_table WHERE (message LIKE '%CrucifiX%')
AND (message LIKE '%OPTION%')
AND (message LIKE '%/user/booperbot124%')
AND (message LIKE '%HTTP/1.1%')
AND (message LIKE '%401%');
-- query by condition and timerange
-- GreptimeDB
SELECT * FROM test_table WHERE MATCHES(message, "+CrucifiX +OPTION +/user/booperbot124 +HTTP/1.1 +401")
and timestamp between 1723710843619 and 1723711367588;
-- Clickhouse
SELECT * FROM test_table WHERE (message LIKE '%CrucifiX%')
AND (message LIKE '%OPTION%')
AND (message LIKE '%/user/booperbot124%')
AND (message LIKE '%HTTP/1.1%')
AND (message LIKE '%401%')
AND timestamp between '2024-08-15T10:25:26.524000000Z' AND '2024-08-15T10:31:31.746000000Z';

View File

@@ -0,0 +1,57 @@
# Please note we use a patched branch to build vector
# https://github.com/shuiyisong/vector/tree/chore/greptime_log_ingester_logitem
[sources.demo_logs]
type = "demo_logs"
format = "apache_common"
# interval value = 1 / rps
# say you want to insert at 20k/s, that is 1 / 20000 = 0.00005
# set to 0 to run as fast as possible
interval = 0
# total rows to insert
count = 100000000
lines = [ "line1" ]
[transforms.parse_logs]
type = "remap"
inputs = ["demo_logs"]
source = '''
. = parse_regex!(.message, r'^(?P<ip>\S+) - (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<http_version>\S+)" (?P<status>\d+) (?P<bytes>\d+)$')
# Convert timestamp to a standard format
.timestamp = parse_timestamp!(.timestamp, format: "%d/%b/%Y:%H:%M:%S %z")
# Convert status and bytes to integers
.status = to_int!(.status)
.bytes = to_int!(.bytes)
'''
[sinks.sink_greptime_logs]
type = "greptimedb_logs"
# The table to insert into
table = "test_table"
pipeline_name = "demo_pipeline"
compression = "none"
inputs = [ "parse_logs" ]
endpoint = "http://127.0.0.1:4000"
# Batch size for each insertion
batch.max_events = 4000
[sinks.clickhouse]
type = "clickhouse"
inputs = [ "parse_logs" ]
database = "default"
endpoint = "http://127.0.0.1:8123"
format = "json_each_row"
# The table to insert into
table = "test_table"
[sinks.sink_elasticsearch]
type = "elasticsearch"
inputs = [ "parse_logs" ]
api_version = "auto"
compression = "none"
doc_type = "_doc"
endpoints = [ "http://127.0.0.1:9200" ]
id_key = "id"
mode = "bulk"

View File

@@ -0,0 +1,43 @@
# Please note we use a patched branch to build vector
# https://github.com/shuiyisong/vector/tree/chore/greptime_log_ingester_ft
[sources.demo_logs]
type = "demo_logs"
format = "apache_common"
# interval value = 1 / rps
# say you want to insert at 20k/s, that is 1 / 20000 = 0.00005
# set to 0 to run as fast as possible
interval = 0
# total rows to insert
count = 100000000
lines = [ "line1" ]
[sinks.sink_greptime_logs]
type = "greptimedb_logs"
# The table to insert into
table = "test_table"
pipeline_name = "demo_pipeline"
compression = "none"
inputs = [ "demo_logs" ]
endpoint = "http://127.0.0.1:4000"
# Batch size for each insertion
batch.max_events = 500
[sinks.clickhouse]
type = "clickhouse"
inputs = [ "demo_logs" ]
database = "default"
endpoint = "http://127.0.0.1:8123"
format = "json_each_row"
# The table to insert into
table = "test_table"
[sinks.sink_elasticsearch]
type = "elasticsearch"
inputs = [ "demo_logs" ]
api_version = "auto"
compression = "none"
doc_type = "_doc"
endpoints = [ "http://127.0.0.1:9200" ]
id_key = "id"
mode = "bulk"

View File

@@ -0,0 +1,58 @@
# TSBS benchmark - v0.9.1
## Environment
### Local
| | |
| ------ | ---------------------------------- |
| CPU | AMD Ryzen 7 7735HS (8 core 3.2GHz) |
| Memory | 32GB |
| Disk | SOLIDIGM SSDPFKNU010TZ |
| OS | Ubuntu 22.04.2 LTS |
### Amazon EC2
| | |
| ------- | ----------------------- |
| Machine | c5d.2xlarge |
| CPU | 8 core |
| Memory | 16GB |
| Disk | 100GB (GP3) |
| OS | Ubuntu Server 24.04 LTS |
## Write performance
| Environment | Ingest rate (rows/s) |
| --------------- | -------------------- |
| Local | 387697.68 |
| EC2 c5d.2xlarge | 234620.19 |
## Query performance
| Query type | Local (ms) | EC2 c5d.2xlarge (ms) |
| --------------------- | ---------- | -------------------- |
| cpu-max-all-1 | 21.14 | 14.75 |
| cpu-max-all-8 | 36.79 | 30.69 |
| double-groupby-1 | 529.02 | 987.85 |
| double-groupby-5 | 1064.53 | 1455.95 |
| double-groupby-all | 1625.33 | 2143.96 |
| groupby-orderby-limit | 529.19 | 1353.49 |
| high-cpu-1 | 12.09 | 8.24 |
| high-cpu-all | 3619.47 | 5312.82 |
| lastpoint | 224.91 | 576.06 |
| single-groupby-1-1-1 | 10.82 | 6.01 |
| single-groupby-1-1-12 | 11.16 | 7.42 |
| single-groupby-1-8-1 | 13.50 | 10.20 |
| single-groupby-5-1-1 | 11.99 | 6.70 |
| single-groupby-5-1-12 | 13.17 | 8.72 |
| single-groupby-5-8-1 | 16.01 | 12.07 |
`single-groupby-1-1-1` query throughput
| Environment | Client concurrency | mean time (ms) | qps (queries/sec) |
| --------------- | ------------------ | -------------- | ----------------- |
| Local | 50 | 33.04 | 1511.74 |
| Local | 100 | 67.70 | 1476.14 |
| EC2 c5d.2xlarge | 50 | 61.93 | 806.97 |
| EC2 c5d.2xlarge | 100 | 126.31 | 791.40 |

View File

@@ -0,0 +1,16 @@
# Change Log Level on the Fly
## HTTP API
example:
```bash
curl --data "trace,flow=debug" 127.0.0.1:4000/debug/log_level
```
And the database will reply with something like:
```bash
Log Level changed from Some("info") to "trace,flow=debug"%
```
The data is a string in the format of `global_level,module1=level1,module2=level2,...` that follows the same rules as `RUST_LOG`.
The module is the module name of the log, and the level is the log level. The log level can be one of the following: `trace`, `debug`, `info`, `warn`, `error`, `off` (case insensitive).
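For example, to keep the global level at `info` while turning on debug logs for a single module (the module name `mito2` here is only illustrative):
```bash
curl --data "info,mito2=debug" 127.0.0.1:4000/debug/log_level
```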

View File

@@ -1,15 +1,9 @@
# Profiling CPU
## Build GreptimeDB with `pprof` feature
```bash
cargo build --features=pprof
```
## HTTP API
Sample at 99 Hertz, for 5 seconds, output report in [protobuf format](https://github.com/google/pprof/blob/master/proto/profile.proto).
```bash
curl -s '0:4000/v1/prof/cpu' > /tmp/pprof.out
curl -X POST -s '0:4000/debug/prof/cpu' > /tmp/pprof.out
```
Then you can use `pprof` command with the protobuf file.
@@ -19,10 +13,10 @@ go tool pprof -top /tmp/pprof.out
Sample at 99 Hertz, for 60 seconds, output report in flamegraph format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
curl -X POST -s '0:4000/debug/prof/cpu?seconds=60&output=flamegraph' > /tmp/pprof.svg
```
Sample at 49 Hertz, for 10 seconds, output report in text format.
```bash
curl -s '0:4000/v1/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
curl -X POST -s '0:4000/debug/prof/cpu?seconds=10&frequency=49&output=text' > /tmp/pprof.txt
```

View File

@@ -12,16 +12,10 @@ brew install jemalloc
sudo apt install libjemalloc-dev
```
### [flamegraph](https://github.com/brendangregg/FlameGraph)
### [flamegraph](https://github.com/brendangregg/FlameGraph)
```bash
curl https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl > ./flamegraph.pl
```
### Build GreptimeDB with `mem-prof` feature.
```bash
cargo build --features=mem-prof
curl https://raw.githubusercontent.com/brendangregg/FlameGraph/master/flamegraph.pl > ./flamegraph.pl
```
## Profiling
@@ -29,13 +23,13 @@ cargo build --features=mem-prof
Start GreptimeDB instance with environment variables:
```bash
MALLOC_CONF=prof:true,lg_prof_interval:28 ./target/debug/greptime standalone start
MALLOC_CONF=prof:true ./target/debug/greptime standalone start
```
Dump memory profiling data through HTTP API:
```bash
curl localhost:4000/v1/prof/mem > greptime.hprof
curl -X POST localhost:4000/debug/prof/mem > greptime.hprof
```
You can periodically dump profiling data and compare them to find the delta memory usage.
@@ -45,6 +39,9 @@ You can periodically dump profiling data and compare them to find the delta memo
To create flamegraph according to dumped profiling data:
```bash
jeprof --svg <path_to_greptimedb_binary> --base=<baseline_prof> <profile_data> > output.svg
```
sudo apt install -y libjemalloc-dev
jeprof <path_to_greptime_binary> <profile_data> --collapse | ./flamegraph.pl > mem-prof.svg
jeprof <path_to_greptime_binary> --base <baseline_prof> <profile_data> --collapse | ./flamegraph.pl > output.svg
```

View File

@@ -105,7 +105,7 @@ use tests_fuzz::utils::{init_greptime_connections, Connections};
fuzz_target!(|input: FuzzInput| {
common_telemetry::init_default_ut_logging();
common_runtime::block_on_write(async {
common_runtime::block_on_global(async {
let Connections { mysql } = init_greptime_connections().await;
let mut rng = ChaChaRng::seed_from_u64(input.seed);
let columns = rng.gen_range(2..30);

Binary file not shown (before: 25 KiB, after: 36 KiB)

Binary file not shown (before: 18 KiB, after: 25 KiB)

View File

@@ -0,0 +1,197 @@
---
Feature Name: Json Datatype
Tracking Issue: https://github.com/GreptimeTeam/greptimedb/issues/4230
Date: 2024-8-6
Author: "Yuhan Wang <profsyb@gmail.com>"
---
# Summary
This RFC proposes a method for storing and querying JSON data in the database.
# Motivation
JSON is widely used across various scenarios. Direct support for writing and querying JSON can significantly enhance the database's flexibility.
# Details
## Storage and Query
GreptimeDB's type system is built on Arrow/DataFusion, where each data type in GreptimeDB corresponds to a data type in Arrow/DataFusion. The proposed JSON type will be implemented on top of the existing `Binary` type, leveraging the current `datatype::value::Value` and `datatype::vectors::BinaryVector` implementations, utilizing the JSONB format as the encoding of JSON data. JSON data is stored and processed similarly to binary data within the storage layer and query engine.
This approach introduces challenges when handling insertions and queries on JSON columns.
## Insertion
Users commonly write JSON data as strings. Thus we need to convert between strings and JSONB. There are two ways to do this:
1. MySQL and PostgreSQL servers provide auto-conversions between strings and JSONB. When a string is inserted into a JSON column, the server will try to parse the string as JSON and convert it to JSONB. The non-JSON strings will be rejected.
2. A function `parse_json` is provided to convert string to JSONB. If the string is not a valid JSON string, the function will return an error.
For example, in MySQL client:
```SQL
CREATE TABLE IF NOT EXISTS test (
ts TIMESTAMP TIME INDEX,
a INT,
b JSON
);
INSERT INTO test VALUES(
0,
0,
'{
"name": "jHl2oDDnPc1i2OzlP5Y",
"timestamp": "2024-07-25T04:33:11.369386Z",
"attributes": { "event_attributes": 48.28667 }
}'
);
INSERT INTO test VALUES(
0,
0,
parse_json('{
"name": "jHl2oDDnPc1i2OzlP5Y",
"timestamp": "2024-07-25T04:33:11.369386Z",
"attributes": { "event_attributes": 48.28667 }
}')
);
```
Both inserts are valid.
The dataflow of the insertion process is as follows:
```
Insert JSON strings directly through client:
Parse Insert
String(Serialized JSON)┌──────────┐Arrow Binary(JSONB)┌──────┐Arrow Binary(JSONB)
Client ---------------------->│ Server │------------------>│ Mito │------------------> Storage
└──────────┘ └──────┘
(Server identifies JSON type and performs auto-conversion)
Insert JSON strings through parse_json function:
Parse Insert
String(Serialized JSON)┌──────────┐String(Serialized JSON)┌─────┐Arrow Binary(JSONB)┌──────┐Arrow Binary(JSONB)
Client ---------------------->│ Server │---------------------->│ UDF │------------------>│ Mito │------------------> Storage
└──────────┘ └─────┘ └──────┘
(Conversion is performed by UDF inside Query Engine)
```
Servers identify JSON columns through the column schema and perform auto-conversions. But when using prepared statements with bound parameters, the cached DataFusion plans generated for prepared statements cannot identify JSON columns. In this case, the servers identify JSON columns through the given parameters and perform the auto-conversions.
The following is an example of inserting JSON data through prepared statements:
```Rust
sqlx::query(
"create table test(ts timestamp time index, j json)",
)
.execute(&pool)
.await
.unwrap();
let json = serde_json::json!({
"code": 200,
"success": true,
"payload": {
"features": [
"serde",
"json"
],
"homepage": null
}
});
// Valid, can identify serde_json::Value as JSON type
sqlx::query("insert into test values($1, $2)")
.bind(i)
.bind(json)
.execute(&pool)
.await
.unwrap();
// Invalid, cannot identify String as JSON type
sqlx::query("insert into test values($1, $2)")
.bind(i)
.bind(json.to_string())
.execute(&pool)
.await
.unwrap();
```
## Query
Correspondingly, users prefer to display JSON data as strings. Thus we need to convert JSON data to strings before presenting it. There are also two ways to do this: auto-conversions on the MySQL and PostgreSQL servers, and the function `json_to_string`.
For example, in MySQL client:
```SQL
SELECT b FROM test;
SELECT json_to_string(b) FROM test;
```
Both will return the JSON as human-readable strings.
Specifically, to perform auto-conversions, we attach a message to JSON data in the `metadata` of the `Field` in the Arrow/DataFusion schema when scanning a JSON column. Frontend servers can then identify JSON data and convert it to strings.
The dataflow of the query process is as follows:
```
Query directly through client:
Decode Scan
String(Serialized JSON)┌──────────┐Arrow Binary(JSONB)┌──────────────┐Arrow Binary(JSONB)
Client <----------------------│ Server │<------------------│ Query Engine │<----------------- Storage
└──────────┘ └──────────────┘
(Server identifies JSON type and performs auto-conversion based on column metadata)
Query through json_to_string function:
Scan & Decode
String(Serialized JSON)┌──────────┐String(Serialized JSON)┌──────────────┐Arrow Binary(JSONB)
Client <----------------------│ Server │<----------------------│ Query Engine │<----------------- Storage
└──────────┘ └──────────────┘
(Conversion is performed by UDF inside Query Engine)
```
However, if a function uses the JSON type as its return type, the metadata method mentioned above is not applicable. Thus functions on the JSON type should specify their return type explicitly instead of returning a JSON type, such as `json_get_int` and `json_get_float`, which return data of `INT` and `FLOAT` type respectively.
## Functions
Similar to the common JSON type, JSON data can be queried with functions.
For example:
```SQL
CREATE TABLE IF NOT EXISTS test (
ts TIMESTAMP TIME INDEX,
a INT,
b JSON
);
INSERT INTO test VALUES(
0,
0,
'{
"name": "jHl2oDDnPc1i2OzlP5Y",
"timestamp": "2024-07-25T04:33:11.369386Z",
"attributes": { "event_attributes": 48.28667 }
}'
);
SELECT json_get_string(b, 'name') FROM test;
+---------------------+
| b.name |
+---------------------+
| jHl2oDDnPc1i2OzlP5Y |
+---------------------+
SELECT json_get_float(b, 'attributes.event_attributes') FROM test;
+--------------------------------+
| b.attributes.event_attributes |
+--------------------------------+
| 48.28667 |
+--------------------------------+
```
And more functions can be added in the future.
# Drawbacks
As a general-purpose JSON data type, JSONB may not be as efficient as specialized data types for specific scenarios.
The auto-conversion mechanism is not supported in all scenarios, so we need to find workarounds for those cases.
# Alternatives
Extract and flatten the JSON schema and store it in a structured format through a pipeline. For nested data, we can provide nested types like `STRUCT` or `ARRAY`.

View File

@@ -5,6 +5,13 @@ GreptimeDB's official Grafana dashboard.
Status note: we are still working on this config. It's expected to change frequently in the coming days. Please feel free to submit your feedback and/or contributions to this dashboard 🤗
If you use Helm [chart](https://github.com/GreptimeTeam/helm-charts) to deploy GreptimeDB cluster, you can enable self-monitoring by setting the following values in your Helm chart:
- `monitoring.enabled=true`: Deploys a standalone GreptimeDB instance dedicated to monitoring the cluster;
- `grafana.enabled=true`: Deploys Grafana and automatically imports the monitoring dashboard;
The standalone GreptimeDB instance will collect metrics from your cluster and the dashboard will be available in the Grafana UI. For detailed deployment instructions, please refer to our [Kubernetes deployment guide](https://docs.greptime.com/nightly/user-guide/deployments/deploy-on-kubernetes/getting-started).
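A sketch of enabling both values on an existing Helm release (the chart and release names below are placeholders, assuming the cluster is installed from the official helm-charts repository):
```bash
helm upgrade --install mycluster greptime/greptimedb-cluster \
  --set monitoring.enabled=true \
  --set grafana.enabled=true \
  -n default
```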
# How to use
## `greptimedb.json`
@@ -25,7 +32,7 @@ Please ensure the following configuration before importing the dashboard into Gr
__1. Prometheus scrape config__
Assign `greptime_pod` label to each host target. We use this label to identify each node instance.
Configure Prometheus to scrape the cluster.
```yml
# example config
@@ -34,27 +41,15 @@ Assign `greptime_pod` label to each host target. We use this label to identify e
scrape_configs:
- job_name: metasrv
static_configs:
- targets: ['<ip>:<port>']
labels:
greptime_pod: metasrv
- targets: ['<metasrv-ip>:<port>']
- job_name: datanode
static_configs:
- targets: ['<ip>:<port>']
labels:
greptime_pod: datanode1
- targets: ['<ip>:<port>']
labels:
greptime_pod: datanode2
- targets: ['<ip>:<port>']
labels:
greptime_pod: datanode3
- targets: ['<datanode0-ip>:<port>', '<datanode1-ip>:<port>', '<datanode2-ip>:<port>']
- job_name: frontend
static_configs:
- targets: ['<ip>:<port>']
labels:
greptime_pod: frontend
- targets: ['<frontend-ip>:<port>']
```
__2. Grafana config__
@@ -63,4 +58,4 @@ Create a Prometheus data source in Grafana before using this dashboard. We use `
### Usage
Use `datasource` or `greptime_pod` on the upper-left corner to filter data from certain node.
Use `datasource` or `instance` on the upper-left corner to filter data from certain node.

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -1,2 +1,3 @@
[toolchain]
channel = "nightly-2024-04-20"
channel = "nightly-2024-10-19"
components = ["rust-analyzer"]

View File

@@ -0,0 +1,42 @@
#!/usr/bin/env bash
set -e
RUST_TOOLCHAIN_VERSION_FILE="rust-toolchain.toml"
DEV_BUILDER_UBUNTU_REGISTRY="docker.io"
DEV_BUILDER_UBUNTU_NAMESPACE="greptime"
DEV_BUILDER_UBUNTU_NAME="dev-builder-ubuntu"
function check_rust_toolchain_version() {
DEV_BUILDER_IMAGE_TAG=$(grep "DEV_BUILDER_IMAGE_TAG ?= " Makefile | cut -d= -f2 | sed 's/^[ \t]*//')
if [ -z "$DEV_BUILDER_IMAGE_TAG" ]; then
echo "Error: No DEV_BUILDER_IMAGE_TAG found in Makefile"
exit 1
fi
DEV_BUILDER_UBUNTU_IMAGE="$DEV_BUILDER_UBUNTU_REGISTRY/$DEV_BUILDER_UBUNTU_NAMESPACE/$DEV_BUILDER_UBUNTU_NAME:$DEV_BUILDER_IMAGE_TAG"
CURRENT_VERSION=$(grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}' "$RUST_TOOLCHAIN_VERSION_FILE")
if [ -z "$CURRENT_VERSION" ]; then
echo "Error: No rust toolchain version found in $RUST_TOOLCHAIN_VERSION_FILE"
exit 1
fi
RUST_TOOLCHAIN_VERSION_IN_BUILDER=$(docker run "$DEV_BUILDER_UBUNTU_IMAGE" rustc --version | grep -Eo '[0-9]{4}-[0-9]{2}-[0-9]{2}')
if [ -z "$RUST_TOOLCHAIN_VERSION_IN_BUILDER" ]; then
echo "Error: No rustc version found in $DEV_BUILDER_UBUNTU_IMAGE"
exit 1
fi
# Compare the version and the difference should be less than 1 day.
current_rust_toolchain_seconds=$(date -d "$CURRENT_VERSION" +%s)
rust_toolchain_in_dev_builder_ubuntu_seconds=$(date -d "$RUST_TOOLCHAIN_VERSION_IN_BUILDER" +%s)
date_diff=$(( (current_rust_toolchain_seconds - rust_toolchain_in_dev_builder_ubuntu_seconds) / 86400 ))
if [ $date_diff -gt 1 ]; then
echo "Error: The rust toolchain '$RUST_TOOLCHAIN_VERSION_IN_BUILDER' in builder '$DEV_BUILDER_UBUNTU_IMAGE' maybe outdated, please update it to '$CURRENT_VERSION'"
exit 1
fi
}
check_rust_toolchain_version

scripts/check-snafu.py Normal file
View File

@@ -0,0 +1,71 @@
# Copyright 2023 Greptime Team
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import re
def find_rust_files(directory):
error_files = []
other_rust_files = []
for root, _, files in os.walk(directory):
for file in files:
if file == "error.rs":
error_files.append(os.path.join(root, file))
elif file.endswith(".rs"):
other_rust_files.append(os.path.join(root, file))
return error_files, other_rust_files
def extract_branch_names(file_content):
pattern = re.compile(r"#\[snafu\(display\([^\)]*\)\)\]\s*(\w+)\s*\{")
return pattern.findall(file_content)
def check_snafu_in_files(branch_name, rust_files):
branch_name_snafu = f"{branch_name}Snafu"
for rust_file in rust_files:
with open(rust_file, "r") as file:
content = file.read()
if branch_name_snafu in content:
return True
return False
def main():
error_files, other_rust_files = find_rust_files(".")
branch_names = []
for error_file in error_files:
with open(error_file, "r") as file:
content = file.read()
branch_names.extend(extract_branch_names(content))
unused_snafu = [
branch_name
for branch_name in branch_names
if not check_snafu_in_files(branch_name, other_rust_files)
]
if unused_snafu:
print("Unused error variants:")
for name in unused_snafu:
print(name)
if unused_snafu:
raise SystemExit(1)
if __name__ == "__main__":
main()

View File

@@ -4,59 +4,69 @@ set -ue
OS_TYPE=
ARCH_TYPE=
# Set the GitHub token to avoid GitHub API rate limit.
# You can run with `GITHUB_TOKEN`:
# GITHUB_TOKEN=<your_token> ./scripts/install.sh
GITHUB_TOKEN=${GITHUB_TOKEN:-}
VERSION=${1:-latest}
GITHUB_ORG=GreptimeTeam
GITHUB_REPO=greptimedb
BIN=greptime
get_os_type() {
os_type="$(uname -s)"
os_type="$(uname -s)"
case "$os_type" in
case "$os_type" in
Darwin)
OS_TYPE=darwin
;;
OS_TYPE=darwin
;;
Linux)
OS_TYPE=linux
;;
OS_TYPE=linux
;;
*)
echo "Error: Unknown OS type: $os_type"
exit 1
esac
echo "Error: Unknown OS type: $os_type"
exit 1
esac
}
get_arch_type() {
arch_type="$(uname -m)"
arch_type="$(uname -m)"
case "$arch_type" in
case "$arch_type" in
arm64)
ARCH_TYPE=arm64
;;
ARCH_TYPE=arm64
;;
aarch64)
ARCH_TYPE=arm64
;;
ARCH_TYPE=arm64
;;
x86_64)
ARCH_TYPE=amd64
;;
ARCH_TYPE=amd64
;;
amd64)
ARCH_TYPE=amd64
;;
ARCH_TYPE=amd64
;;
*)
echo "Error: Unknown CPU type: $arch_type"
exit 1
esac
echo "Error: Unknown CPU type: $arch_type"
exit 1
esac
}
get_os_type
get_arch_type
if [ -n "${OS_TYPE}" ] && [ -n "${ARCH_TYPE}" ]; then
# Use the latest nightly version.
download_artifact() {
if [ -n "${OS_TYPE}" ] && [ -n "${ARCH_TYPE}" ]; then
# Use the latest stable released version.
# GitHub API reference: https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#get-the-latest-release.
if [ "${VERSION}" = "latest" ]; then
VERSION=$(curl -s -XGET "https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/releases" | grep tag_name | grep nightly | cut -d: -f 2 | sed 's/.*"\(.*\)".*/\1/' | uniq | sort -r | head -n 1)
if [ -z "${VERSION}" ]; then
echo "Failed to get the latest version."
exit 1
# To avoid other tools dependency, we choose to use `curl` to get the version metadata and parsed by `sed`.
VERSION=$(curl -sL \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
${GITHUB_TOKEN:+-H "Authorization: Bearer $GITHUB_TOKEN"} \
"https://api.github.com/repos/${GITHUB_ORG}/${GITHUB_REPO}/releases/latest" | sed -n 's/.*"tag_name": "\([^"]*\)".*/\1/p')
if [ -z "${VERSION}" ]; then
echo "Failed to get the latest stable released version."
exit 1
fi
fi
@@ -73,4 +83,9 @@ if [ -n "${OS_TYPE}" ] && [ -n "${ARCH_TYPE}" ]; then
rm -r "${PACKAGE_NAME%.tar.gz}" && \
echo "Run './${BIN} --help' to get started"
fi
fi
fi
}
get_os_type
get_arch_type
download_artifact

shell.nix Normal file
View File

@@ -0,0 +1,27 @@
let
nixpkgs = fetchTarball "https://github.com/NixOS/nixpkgs/tarball/nixos-unstable";
fenix = import (fetchTarball "https://github.com/nix-community/fenix/archive/main.tar.gz") {};
pkgs = import nixpkgs { config = {}; overlays = []; };
in
pkgs.mkShell rec {
nativeBuildInputs = with pkgs; [
pkg-config
git
clang
gcc
protobuf
mold
(fenix.fromToolchainFile {
dir = ./.;
})
cargo-nextest
taplo
];
buildInputs = with pkgs; [
libgit2
];
LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath buildInputs;
}

View File

@@ -17,10 +17,11 @@ use std::sync::Arc;
use common_base::BitVec;
use common_decimal::decimal128::{DECIMAL128_DEFAULT_SCALE, DECIMAL128_MAX_PRECISION};
use common_decimal::Decimal128;
use common_time::interval::IntervalUnit;
use common_time::time::Time;
use common_time::timestamp::TimeUnit;
use common_time::{Date, DateTime, Interval, Timestamp};
use common_time::{
Date, DateTime, IntervalDayTime, IntervalMonthDayNano, IntervalYearMonth, Timestamp,
};
use datatypes::prelude::{ConcreteDataType, ValueRef};
use datatypes::scalars::ScalarVector;
use datatypes::types::{
@@ -35,14 +36,14 @@ use datatypes::vectors::{
TimestampMillisecondVector, TimestampNanosecondVector, TimestampSecondVector, UInt32Vector,
UInt64Vector, VectorRef,
};
use greptime_proto::v1;
use greptime_proto::v1::column_data_type_extension::TypeExt;
use greptime_proto::v1::ddl_request::Expr;
use greptime_proto::v1::greptime_request::Request;
use greptime_proto::v1::query_request::Query;
use greptime_proto::v1::value::ValueData;
use greptime_proto::v1::{
ColumnDataTypeExtension, DdlRequest, DecimalTypeExtension, QueryRequest, Row, SemanticType,
self, ColumnDataTypeExtension, DdlRequest, DecimalTypeExtension, JsonTypeExtension,
QueryRequest, Row, SemanticType, VectorTypeExtension,
};
use paste::paste;
use snafu::prelude::*;
@@ -103,7 +104,18 @@ impl From<ColumnDataTypeWrapper> for ConcreteDataType {
ColumnDataType::Uint64 => ConcreteDataType::uint64_datatype(),
ColumnDataType::Float32 => ConcreteDataType::float32_datatype(),
ColumnDataType::Float64 => ConcreteDataType::float64_datatype(),
ColumnDataType::Binary => ConcreteDataType::binary_datatype(),
ColumnDataType::Binary => {
if let Some(TypeExt::JsonType(_)) = datatype_wrapper
.datatype_ext
.as_ref()
.and_then(|datatype_ext| datatype_ext.type_ext.as_ref())
{
ConcreteDataType::json_datatype()
} else {
ConcreteDataType::binary_datatype()
}
}
ColumnDataType::Json => ConcreteDataType::json_datatype(),
ColumnDataType::String => ConcreteDataType::string_datatype(),
ColumnDataType::Date => ConcreteDataType::date_datatype(),
ColumnDataType::Datetime => ConcreteDataType::datetime_datatype(),
@@ -137,6 +149,17 @@ impl From<ColumnDataTypeWrapper> for ConcreteDataType {
ConcreteDataType::decimal128_default_datatype()
}
}
ColumnDataType::Vector => {
if let Some(TypeExt::VectorType(d)) = datatype_wrapper
.datatype_ext
.as_ref()
.and_then(|datatype_ext| datatype_ext.type_ext.as_ref())
{
ConcreteDataType::vector_datatype(d.dim)
} else {
ConcreteDataType::vector_default_datatype()
}
}
}
}
}
@@ -218,6 +241,15 @@ impl ColumnDataTypeWrapper {
}),
}
}
pub fn vector_datatype(dim: u32) -> Self {
ColumnDataTypeWrapper {
datatype: ColumnDataType::Vector,
datatype_ext: Some(ColumnDataTypeExtension {
type_ext: Some(TypeExt::VectorType(VectorTypeExtension { dim })),
}),
}
}
}
impl TryFrom<ConcreteDataType> for ColumnDataTypeWrapper {
@@ -258,6 +290,8 @@ impl TryFrom<ConcreteDataType> for ColumnDataTypeWrapper {
IntervalType::MonthDayNano(_) => ColumnDataType::IntervalMonthDayNano,
},
ConcreteDataType::Decimal128(_) => ColumnDataType::Decimal128,
ConcreteDataType::Json(_) => ColumnDataType::Json,
ConcreteDataType::Vector(_) => ColumnDataType::Vector,
ConcreteDataType::Null(_)
| ConcreteDataType::List(_)
| ConcreteDataType::Dictionary(_)
@@ -276,6 +310,18 @@ impl TryFrom<ConcreteDataType> for ColumnDataTypeWrapper {
})),
})
}
ColumnDataType::Json => datatype.as_json().map(|_| ColumnDataTypeExtension {
type_ext: Some(TypeExt::JsonType(JsonTypeExtension::JsonBinary.into())),
}),
ColumnDataType::Vector => {
datatype
.as_vector()
.map(|vector_type| ColumnDataTypeExtension {
type_ext: Some(TypeExt::VectorType(VectorTypeExtension {
dim: vector_type.dim as _,
})),
})
}
_ => None,
};
Ok(Self {
@@ -395,6 +441,14 @@ pub fn values_with_capacity(datatype: ColumnDataType, capacity: usize) -> Values
decimal128_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::Json => Values {
string_values: Vec::with_capacity(capacity),
..Default::default()
},
ColumnDataType::Vector => Values {
binary_values: Vec::with_capacity(capacity),
..Default::default()
},
}
}
@@ -435,13 +489,11 @@ pub fn push_vals(column: &mut Column, origin_count: usize, vector: VectorRef) {
TimeUnit::Microsecond => values.time_microsecond_values.push(val.value()),
TimeUnit::Nanosecond => values.time_nanosecond_values.push(val.value()),
},
Value::Interval(val) => match val.unit() {
IntervalUnit::YearMonth => values.interval_year_month_values.push(val.to_i32()),
IntervalUnit::DayTime => values.interval_day_time_values.push(val.to_i64()),
IntervalUnit::MonthDayNano => values
.interval_month_day_nano_values
.push(convert_i128_to_interval(val.to_i128())),
},
Value::IntervalYearMonth(val) => values.interval_year_month_values.push(val.to_i32()),
Value::IntervalDayTime(val) => values.interval_day_time_values.push(val.to_i64()),
Value::IntervalMonthDayNano(val) => values
.interval_month_day_nano_values
.push(convert_month_day_nano_to_pb(val)),
Value::Decimal128(val) => values.decimal128_values.push(convert_to_pb_decimal128(val)),
Value::List(_) | Value::Duration(_) => unreachable!(),
});
@@ -475,25 +527,24 @@ fn ddl_request_type(request: &DdlRequest) -> &'static str {
match request.expr {
Some(Expr::CreateDatabase(_)) => "ddl.create_database",
Some(Expr::CreateTable(_)) => "ddl.create_table",
Some(Expr::Alter(_)) => "ddl.alter",
Some(Expr::AlterTable(_)) => "ddl.alter_table",
Some(Expr::DropTable(_)) => "ddl.drop_table",
Some(Expr::TruncateTable(_)) => "ddl.truncate_table",
Some(Expr::CreateFlow(_)) => "ddl.create_flow",
Some(Expr::DropFlow(_)) => "ddl.drop_flow",
Some(Expr::CreateView(_)) => "ddl.create_view",
Some(Expr::DropView(_)) => "ddl.drop_view",
Some(Expr::AlterDatabase(_)) => "ddl.alter_database",
None => "ddl.empty",
}
}
/// Converts an i128 value to google protobuf type [IntervalMonthDayNano].
pub fn convert_i128_to_interval(v: i128) -> v1::IntervalMonthDayNano {
let interval = Interval::from_i128(v);
let (months, days, nanoseconds) = interval.to_month_day_nano();
/// Converts an interval to google protobuf type [IntervalMonthDayNano].
pub fn convert_month_day_nano_to_pb(v: IntervalMonthDayNano) -> v1::IntervalMonthDayNano {
v1::IntervalMonthDayNano {
months,
days,
nanoseconds,
months: v.months,
days: v.days,
nanoseconds: v.nanoseconds,
}
}
@@ -541,11 +592,15 @@ pub fn pb_value_to_value_ref<'a>(
ValueData::TimeMillisecondValue(t) => ValueRef::Time(Time::new_millisecond(*t)),
ValueData::TimeMicrosecondValue(t) => ValueRef::Time(Time::new_microsecond(*t)),
ValueData::TimeNanosecondValue(t) => ValueRef::Time(Time::new_nanosecond(*t)),
ValueData::IntervalYearMonthValue(v) => ValueRef::Interval(Interval::from_i32(*v)),
ValueData::IntervalDayTimeValue(v) => ValueRef::Interval(Interval::from_i64(*v)),
ValueData::IntervalYearMonthValue(v) => {
ValueRef::IntervalYearMonth(IntervalYearMonth::from_i32(*v))
}
ValueData::IntervalDayTimeValue(v) => {
ValueRef::IntervalDayTime(IntervalDayTime::from_i64(*v))
}
ValueData::IntervalMonthDayNanoValue(v) => {
let interval = Interval::from_month_day_nano(v.months, v.days, v.nanoseconds);
ValueRef::Interval(interval)
let interval = IntervalMonthDayNano::new(v.months, v.days, v.nanoseconds);
ValueRef::IntervalMonthDayNano(interval)
}
ValueData::Decimal128Value(v) => {
// get precision and scale from datatype_extension
@@ -636,7 +691,7 @@ pub fn pb_values_to_vector_ref(data_type: &ConcreteDataType, values: Values) ->
IntervalType::MonthDayNano(_) => {
Arc::new(IntervalMonthDayNanoVector::from_iter_values(
values.interval_month_day_nano_values.iter().map(|x| {
Interval::from_month_day_nano(x.months, x.days, x.nanoseconds).to_i128()
IntervalMonthDayNano::new(x.months, x.days, x.nanoseconds).to_i128()
}),
))
}
@@ -646,10 +701,12 @@ pub fn pb_values_to_vector_ref(data_type: &ConcreteDataType, values: Values) ->
Decimal128::from_value_precision_scale(x.hi, x.lo, d.precision(), d.scale()).into()
}),
)),
ConcreteDataType::Vector(_) => Arc::new(BinaryVector::from_vec(values.binary_values)),
ConcreteDataType::Null(_)
| ConcreteDataType::List(_)
| ConcreteDataType::Dictionary(_)
| ConcreteDataType::Duration(_) => {
| ConcreteDataType::Duration(_)
| ConcreteDataType::Json(_) => {
unreachable!()
}
}
@@ -780,18 +837,18 @@ pub fn pb_values_to_values(data_type: &ConcreteDataType, values: Values) -> Vec<
ConcreteDataType::Interval(IntervalType::YearMonth(_)) => values
.interval_year_month_values
.into_iter()
.map(|v| Value::Interval(Interval::from_i32(v)))
.map(|v| Value::IntervalYearMonth(IntervalYearMonth::from_i32(v)))
.collect(),
ConcreteDataType::Interval(IntervalType::DayTime(_)) => values
.interval_day_time_values
.into_iter()
.map(|v| Value::Interval(Interval::from_i64(v)))
.map(|v| Value::IntervalDayTime(IntervalDayTime::from_i64(v)))
.collect(),
ConcreteDataType::Interval(IntervalType::MonthDayNano(_)) => values
.interval_month_day_nano_values
.into_iter()
.map(|v| {
Value::Interval(Interval::from_month_day_nano(
Value::IntervalMonthDayNano(IntervalMonthDayNano::new(
v.months,
v.days,
v.nanoseconds,
@@ -810,10 +867,12 @@ pub fn pb_values_to_values(data_type: &ConcreteDataType, values: Values) -> Vec<
))
})
.collect(),
ConcreteDataType::Vector(_) => values.binary_values.into_iter().map(|v| v.into()).collect(),
ConcreteDataType::Null(_)
| ConcreteDataType::List(_)
| ConcreteDataType::Dictionary(_)
| ConcreteDataType::Duration(_) => {
| ConcreteDataType::Duration(_)
| ConcreteDataType::Json(_) => {
unreachable!()
}
}
@@ -831,7 +890,10 @@ pub fn is_column_type_value_eq(
expect_type: &ConcreteDataType,
) -> bool {
ColumnDataTypeWrapper::try_new(type_value, type_extension)
.map(|wrapper| ConcreteDataType::from(wrapper) == *expect_type)
.map(|wrapper| {
let datatype = ConcreteDataType::from(wrapper);
expect_type == &datatype
})
.unwrap_or(false)
}
@@ -912,18 +974,16 @@ pub fn to_proto_value(value: Value) -> Option<v1::Value> {
value_data: Some(ValueData::TimeNanosecondValue(v.value())),
},
},
Value::Interval(v) => match v.unit() {
IntervalUnit::YearMonth => v1::Value {
value_data: Some(ValueData::IntervalYearMonthValue(v.to_i32())),
},
IntervalUnit::DayTime => v1::Value {
value_data: Some(ValueData::IntervalDayTimeValue(v.to_i64())),
},
IntervalUnit::MonthDayNano => v1::Value {
value_data: Some(ValueData::IntervalMonthDayNanoValue(
convert_i128_to_interval(v.to_i128()),
)),
},
Value::IntervalYearMonth(v) => v1::Value {
value_data: Some(ValueData::IntervalYearMonthValue(v.to_i32())),
},
Value::IntervalDayTime(v) => v1::Value {
value_data: Some(ValueData::IntervalDayTimeValue(v.to_i64())),
},
Value::IntervalMonthDayNano(v) => v1::Value {
value_data: Some(ValueData::IntervalMonthDayNanoValue(
convert_month_day_nano_to_pb(v),
)),
},
Value::Decimal128(v) => v1::Value {
value_data: Some(ValueData::Decimal128Value(convert_to_pb_decimal128(v))),
@@ -1015,13 +1075,11 @@ pub fn value_to_grpc_value(value: Value) -> GrpcValue {
TimeUnit::Microsecond => ValueData::TimeMicrosecondValue(v.value()),
TimeUnit::Nanosecond => ValueData::TimeNanosecondValue(v.value()),
}),
Value::Interval(v) => Some(match v.unit() {
IntervalUnit::YearMonth => ValueData::IntervalYearMonthValue(v.to_i32()),
IntervalUnit::DayTime => ValueData::IntervalDayTimeValue(v.to_i64()),
IntervalUnit::MonthDayNano => {
ValueData::IntervalMonthDayNanoValue(convert_i128_to_interval(v.to_i128()))
}
}),
Value::IntervalYearMonth(v) => Some(ValueData::IntervalYearMonthValue(v.to_i32())),
Value::IntervalDayTime(v) => Some(ValueData::IntervalDayTimeValue(v.to_i64())),
Value::IntervalMonthDayNano(v) => Some(ValueData::IntervalMonthDayNanoValue(
convert_month_day_nano_to_pb(v),
)),
Value::Decimal128(v) => Some(ValueData::Decimal128Value(convert_to_pb_decimal128(v))),
Value::List(_) | Value::Duration(_) => unreachable!(),
},
@@ -1032,6 +1090,7 @@ pub fn value_to_grpc_value(value: Value) -> GrpcValue {
mod tests {
use std::sync::Arc;
use common_time::interval::IntervalUnit;
use datatypes::types::{
Int32Type, IntervalDayTimeType, IntervalMonthDayNanoType, IntervalYearMonthType,
TimeMillisecondType, TimeSecondType, TimestampMillisecondType, TimestampSecondType,
@@ -1120,6 +1179,10 @@ mod tests {
let values = values_with_capacity(ColumnDataType::Decimal128, 2);
let values = values.decimal128_values;
assert_eq!(2, values.capacity());
let values = values_with_capacity(ColumnDataType::Vector, 2);
let values = values.binary_values;
assert_eq!(2, values.capacity());
}
#[test]
@@ -1207,7 +1270,11 @@ mod tests {
assert_eq!(
ConcreteDataType::decimal128_datatype(10, 2),
ColumnDataTypeWrapper::decimal128_datatype(10, 2).into()
)
);
assert_eq!(
ConcreteDataType::vector_datatype(3),
ColumnDataTypeWrapper::vector_datatype(3).into()
);
}
#[test]
@@ -1303,6 +1370,10 @@ mod tests {
.try_into()
.unwrap()
);
assert_eq!(
ColumnDataTypeWrapper::vector_datatype(3),
ConcreteDataType::vector_datatype(3).try_into().unwrap()
);
let result: Result<ColumnDataTypeWrapper> = ConcreteDataType::null_datatype().try_into();
assert!(result.is_err());
@@ -1477,11 +1548,11 @@ mod tests {
#[test]
fn test_convert_i128_to_interval() {
let i128_val = 3000;
let interval = convert_i128_to_interval(i128_val);
let i128_val = 3;
let interval = convert_month_day_nano_to_pb(IntervalMonthDayNano::from_i128(i128_val));
assert_eq!(interval.months, 0);
assert_eq!(interval.days, 0);
assert_eq!(interval.nanoseconds, 3000);
assert_eq!(interval.nanoseconds, 3);
}
#[test]
@@ -1561,9 +1632,9 @@ mod tests {
},
);
let expect = vec![
Value::Interval(Interval::from_year_month(1_i32)),
Value::Interval(Interval::from_year_month(2_i32)),
Value::Interval(Interval::from_year_month(3_i32)),
Value::IntervalYearMonth(IntervalYearMonth::new(1_i32)),
Value::IntervalYearMonth(IntervalYearMonth::new(2_i32)),
Value::IntervalYearMonth(IntervalYearMonth::new(3_i32)),
];
assert_eq!(expect, actual);
@@ -1576,9 +1647,9 @@ mod tests {
},
);
let expect = vec![
Value::Interval(Interval::from_i64(1_i64)),
Value::Interval(Interval::from_i64(2_i64)),
Value::Interval(Interval::from_i64(3_i64)),
Value::IntervalDayTime(IntervalDayTime::from_i64(1_i64)),
Value::IntervalDayTime(IntervalDayTime::from_i64(2_i64)),
Value::IntervalDayTime(IntervalDayTime::from_i64(3_i64)),
];
assert_eq!(expect, actual);
@@ -1607,9 +1678,9 @@ mod tests {
},
);
let expect = vec![
Value::Interval(Interval::from_month_day_nano(1, 2, 3)),
Value::Interval(Interval::from_month_day_nano(5, 6, 7)),
Value::Interval(Interval::from_month_day_nano(9, 10, 11)),
Value::IntervalMonthDayNano(IntervalMonthDayNano::new(1, 2, 3)),
Value::IntervalMonthDayNano(IntervalMonthDayNano::new(5, 6, 7)),
Value::IntervalMonthDayNano(IntervalMonthDayNano::new(9, 10, 11)),
];
assert_eq!(expect, actual);
}

View File

@@ -21,14 +21,14 @@ use greptime_proto::v1::region::RegionResponse as RegionResponseV1;
#[derive(Debug)]
pub struct RegionResponse {
pub affected_rows: AffectedRows,
pub extension: HashMap<String, Vec<u8>>,
pub extensions: HashMap<String, Vec<u8>>,
}
impl RegionResponse {
pub fn from_region_response(region_response: RegionResponseV1) -> Self {
Self {
affected_rows: region_response.affected_rows as _,
extension: region_response.extension,
extensions: region_response.extensions,
}
}
@@ -36,7 +36,7 @@ impl RegionResponse {
pub fn new(affected_rows: AffectedRows) -> Self {
Self {
affected_rows,
extension: Default::default(),
extensions: Default::default(),
}
}
}

View File

@@ -15,8 +15,10 @@
use std::collections::HashMap;
use datatypes::schema::{
ColumnDefaultConstraint, ColumnSchema, FulltextOptions, COMMENT_KEY, FULLTEXT_KEY,
ColumnDefaultConstraint, ColumnSchema, FulltextAnalyzer, FulltextOptions, COMMENT_KEY,
FULLTEXT_KEY, INVERTED_INDEX_KEY, SKIPPING_INDEX_KEY,
};
use greptime_proto::v1::Analyzer;
use snafu::ResultExt;
use crate::error::{self, Result};
@@ -25,6 +27,10 @@ use crate::v1::{ColumnDef, ColumnOptions, SemanticType};
/// Key used to store fulltext options in gRPC column options.
const FULLTEXT_GRPC_KEY: &str = "fulltext";
/// Key used to store inverted index options in gRPC column options.
const INVERTED_INDEX_GRPC_KEY: &str = "inverted_index";
/// Key used to store skip index options in gRPC column options.
const SKIPPING_INDEX_GRPC_KEY: &str = "skipping_index";
/// Tries to construct a `ColumnSchema` from the given `ColumnDef`.
pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
@@ -49,10 +55,16 @@ pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
if !column_def.comment.is_empty() {
metadata.insert(COMMENT_KEY.to_string(), column_def.comment.clone());
}
if let Some(options) = column_def.options.as_ref()
&& let Some(fulltext) = options.options.get(FULLTEXT_GRPC_KEY)
{
metadata.insert(FULLTEXT_KEY.to_string(), fulltext.to_string());
if let Some(options) = column_def.options.as_ref() {
if let Some(fulltext) = options.options.get(FULLTEXT_GRPC_KEY) {
metadata.insert(FULLTEXT_KEY.to_string(), fulltext.clone());
}
if let Some(inverted_index) = options.options.get(INVERTED_INDEX_GRPC_KEY) {
metadata.insert(INVERTED_INDEX_KEY.to_string(), inverted_index.clone());
}
if let Some(skipping_index) = options.options.get(SKIPPING_INDEX_GRPC_KEY) {
metadata.insert(SKIPPING_INDEX_KEY.to_string(), skipping_index.clone());
}
}
ColumnSchema::new(&column_def.name, data_type.into(), column_def.is_nullable)
@@ -70,7 +82,17 @@ pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<Column
if let Some(fulltext) = column_schema.metadata().get(FULLTEXT_KEY) {
options
.options
.insert(FULLTEXT_GRPC_KEY.to_string(), fulltext.to_string());
.insert(FULLTEXT_GRPC_KEY.to_string(), fulltext.clone());
}
if let Some(inverted_index) = column_schema.metadata().get(INVERTED_INDEX_KEY) {
options
.options
.insert(INVERTED_INDEX_GRPC_KEY.to_string(), inverted_index.clone());
}
if let Some(skipping_index) = column_schema.metadata().get(SKIPPING_INDEX_KEY) {
options
.options
.insert(SKIPPING_INDEX_GRPC_KEY.to_string(), skipping_index.clone());
}
(!options.options.is_empty()).then_some(options)
@@ -93,6 +115,14 @@ pub fn options_from_fulltext(fulltext: &FulltextOptions) -> Result<Option<Column
Ok((!options.options.is_empty()).then_some(options))
}
/// Tries to construct a `FulltextAnalyzer` from the given analyzer.
pub fn as_fulltext_option(analyzer: Analyzer) -> FulltextAnalyzer {
match analyzer {
Analyzer::English => FulltextAnalyzer::English,
Analyzer::Chinese => FulltextAnalyzer::Chinese,
}
}
#[cfg(test)]
mod tests {
@@ -115,10 +145,13 @@ mod tests {
comment: "test_comment".to_string(),
datatype_extension: None,
options: Some(ColumnOptions {
options: HashMap::from([(
FULLTEXT_GRPC_KEY.to_string(),
"{\"enable\":true}".to_string(),
)]),
options: HashMap::from([
(
FULLTEXT_GRPC_KEY.to_string(),
"{\"enable\":true}".to_string(),
),
(INVERTED_INDEX_GRPC_KEY.to_string(), "true".to_string()),
]),
}),
};
@@ -139,6 +172,7 @@ mod tests {
..Default::default()
}
);
assert!(schema.is_inverted_indexed());
}
#[test]
@@ -153,12 +187,17 @@ mod tests {
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
})
.unwrap();
.unwrap()
.set_inverted_index(true);
let options = options_from_column_schema(&schema).unwrap();
assert_eq!(
options.options.get(FULLTEXT_GRPC_KEY).unwrap(),
"{\"enable\":true,\"analyzer\":\"English\",\"case-sensitive\":false}"
);
assert_eq!(
options.options.get(INVERTED_INDEX_GRPC_KEY).unwrap(),
"true"
);
}
#[test]

View File

@@ -75,6 +75,16 @@ pub enum Password<'a> {
PgMD5(HashedPassword<'a>, Salt<'a>),
}
impl Password<'_> {
pub fn r#type(&self) -> &str {
match self {
Password::PlainText(_) => "plain_text",
Password::MysqlNativePassword(_, _) => "mysql_native_password",
Password::PgMD5(_, _) => "pg_md5",
}
}
}
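The new accessor exposes only the variant name, which is useful for logging or metrics labels without touching the secret itself; a small illustrative sketch (the helper function is not part of this change):

fn auth_attempt_label(password: &Password<'_>) -> String {
    // Only the protocol-facing type is surfaced; the credential bytes stay untouched.
    format!("auth attempt via {}", password.r#type())
}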
pub fn auth_mysql(
auth_data: HashedPassword,
salt: Salt,

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_error::ext::ErrorExt;
use common_error::ext::{BoxedError, ErrorExt};
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
use snafu::{Location, Snafu};
@@ -38,6 +38,14 @@ pub enum Error {
location: Location,
},
#[snafu(display("Authentication source failure"))]
AuthBackend {
#[snafu(implicit)]
location: Location,
#[snafu(source)]
source: BoxedError,
},
#[snafu(display("User not found, username: {}", username))]
UserNotFound { username: String },
@@ -81,6 +89,7 @@ impl ErrorExt for Error {
Error::FileWatch { .. } => StatusCode::InvalidArguments,
Error::InternalState { .. } => StatusCode::Unexpected,
Error::Io { .. } => StatusCode::StorageUnavailable,
Error::AuthBackend { source, .. } => source.status_code(),
Error::UserNotFound { .. } => StatusCode::UserNotFound,
Error::UnsupportedPasswordType { .. } => StatusCode::UnsupportedPasswordType,

View File

@@ -13,9 +13,11 @@
// limitations under the License.
use common_base::secrets::ExposeSecret;
use common_error::ext::BoxedError;
use snafu::{OptionExt, ResultExt};
use crate::error::{
AccessDeniedSnafu, Result, UnsupportedPasswordTypeSnafu, UserNotFoundSnafu,
AccessDeniedSnafu, AuthBackendSnafu, Result, UnsupportedPasswordTypeSnafu, UserNotFoundSnafu,
UserPasswordMismatchSnafu,
};
use crate::user_info::DefaultUserInfo;
@@ -49,6 +51,19 @@ impl MockUserProvider {
info.schema.clone_into(&mut self.schema);
info.username.clone_into(&mut self.username);
}
// This function deliberately references AuthBackendSnafu so that the variant
// is not flagged as unused and removed in the future.
pub fn ref_auth_backend_snafu(&self) -> Result<()> {
let none_option = None;
none_option
.context(UserNotFoundSnafu {
username: "no_user".to_string(),
})
.map_err(BoxedError::new)
.context(AuthBackendSnafu)
}
}
#[async_trait::async_trait]

View File

@@ -57,6 +57,11 @@ pub trait UserProvider: Send + Sync {
self.authorize(catalog, schema, &user_info).await?;
Ok(user_info)
}
/// Returns whether this user provider implementation is backed by an external system.
fn external(&self) -> bool {
false
}
}
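Since `external()` defaults to false, existing providers keep their behaviour; a hedged caller-side sketch (the function below is illustrative, not part of this change):

fn auth_source_label(provider: &dyn UserProvider) -> &'static str {
    if provider.external() {
        // e.g. a provider backed by an external auth service
        "external"
    } else {
        "builtin"
    }
}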
fn load_credential_from_file(filepath: &str) -> Result<Option<HashMap<String, Vec<u8>>>> {

View File

@@ -33,7 +33,7 @@ impl StaticUserProvider {
value: value.to_string(),
msg: "StaticUserProviderOption must be in format `<option>:<value>`",
})?;
return match mode {
match mode {
"file" => {
let users = load_credential_from_file(content)?
.context(InvalidConfigSnafu {
@@ -58,7 +58,7 @@ impl StaticUserProvider {
msg: "StaticUserProviderOption must be in format `file:<path>` or `cmd:<values>`",
}
.fail(),
};
}
}
}

View File

@@ -18,6 +18,7 @@ use std::sync::Arc;
use api::v1::greptime_request::Request;
use auth::error::Error::InternalState;
use auth::error::InternalStateSnafu;
use auth::{PermissionChecker, PermissionCheckerRef, PermissionReq, PermissionResp, UserInfoRef};
use sql::statements::show::{ShowDatabases, ShowKind};
use sql::statements::statement::Statement;
@@ -33,9 +34,10 @@ impl PermissionChecker for DummyPermissionChecker {
match req {
PermissionReq::GrpcRequest(_) => Ok(PermissionResp::Allow),
PermissionReq::SqlStatement(_) => Ok(PermissionResp::Reject),
_ => Err(InternalState {
_ => InternalStateSnafu {
msg: "testing".to_string(),
}),
}
.fail(),
}
}
}

View File

@@ -11,4 +11,3 @@ common-macro.workspace = true
common-meta.workspace = true
moka.workspace = true
snafu.workspace = true
substrait.workspace = true

src/cache/src/lib.rs
View File

@@ -19,9 +19,9 @@ use std::time::Duration;
use catalog::kvbackend::new_table_cache;
use common_meta::cache::{
new_table_flownode_set_cache, new_table_info_cache, new_table_name_cache,
new_table_route_cache, new_view_info_cache, CacheRegistry, CacheRegistryBuilder,
LayeredCacheRegistryBuilder,
new_schema_cache, new_table_flownode_set_cache, new_table_info_cache, new_table_name_cache,
new_table_route_cache, new_table_schema_cache, new_view_info_cache, CacheRegistry,
CacheRegistryBuilder, LayeredCacheRegistryBuilder,
};
use common_meta::kv_backend::KvBackendRef;
use moka::future::CacheBuilder;
@@ -37,9 +37,47 @@ pub const TABLE_INFO_CACHE_NAME: &str = "table_info_cache";
pub const VIEW_INFO_CACHE_NAME: &str = "view_info_cache";
pub const TABLE_NAME_CACHE_NAME: &str = "table_name_cache";
pub const TABLE_CACHE_NAME: &str = "table_cache";
pub const SCHEMA_CACHE_NAME: &str = "schema_cache";
pub const TABLE_SCHEMA_NAME_CACHE_NAME: &str = "table_schema_name_cache";
pub const TABLE_FLOWNODE_SET_CACHE_NAME: &str = "table_flownode_set_cache";
pub const TABLE_ROUTE_CACHE_NAME: &str = "table_route_cache";
/// Builds cache registry for datanode, including:
/// - Schema cache.
/// - Table id to schema name cache.
pub fn build_datanode_cache_registry(kv_backend: KvBackendRef) -> CacheRegistry {
// Builds table id schema name cache that never expires.
let cache = CacheBuilder::new(DEFAULT_CACHE_MAX_CAPACITY).build();
let table_id_schema_cache = Arc::new(new_table_schema_cache(
TABLE_SCHEMA_NAME_CACHE_NAME.to_string(),
cache,
kv_backend.clone(),
));
// Builds schema cache
let cache = CacheBuilder::new(DEFAULT_CACHE_MAX_CAPACITY)
.time_to_live(DEFAULT_CACHE_TTL)
.time_to_idle(DEFAULT_CACHE_TTI)
.build();
let schema_cache = Arc::new(new_schema_cache(
SCHEMA_CACHE_NAME.to_string(),
cache,
kv_backend.clone(),
));
CacheRegistryBuilder::default()
.add_cache(table_id_schema_cache)
.add_cache(schema_cache)
.build()
}
/// Builds cache registry for frontend and datanode, including:
/// - Table info cache
/// - Table name cache
/// - Table route cache
/// - Table flow node cache
/// - View cache
/// - Schema cache
pub fn build_fundamental_cache_registry(kv_backend: KvBackendRef) -> CacheRegistry {
// Builds table info cache
let cache = CacheBuilder::new(DEFAULT_CACHE_MAX_CAPACITY)
@@ -95,12 +133,30 @@ pub fn build_fundamental_cache_registry(kv_backend: KvBackendRef) -> CacheRegist
kv_backend.clone(),
));
// Builds schema cache
let cache = CacheBuilder::new(DEFAULT_CACHE_MAX_CAPACITY)
.time_to_live(DEFAULT_CACHE_TTL)
.time_to_idle(DEFAULT_CACHE_TTI)
.build();
let schema_cache = Arc::new(new_schema_cache(
SCHEMA_CACHE_NAME.to_string(),
cache,
kv_backend.clone(),
));
let table_id_schema_cache = Arc::new(new_table_schema_cache(
TABLE_SCHEMA_NAME_CACHE_NAME.to_string(),
CacheBuilder::new(DEFAULT_CACHE_MAX_CAPACITY).build(),
kv_backend,
));
CacheRegistryBuilder::default()
.add_cache(table_info_cache)
.add_cache(table_name_cache)
.add_cache(table_route_cache)
.add_cache(view_info_cache)
.add_cache(table_flownode_set_cache)
.add_cache(schema_cache)
.add_cache(table_id_schema_cache)
.build()
}
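A hedged wiring sketch for `build_datanode_cache_registry` (the wrapper function is illustrative; only the API shown above is assumed):

// On a datanode, the kv backend handed in is typically the metasrv-backed one.
fn datanode_caches(kv_backend: KvBackendRef) -> CacheRegistry {
    // Registers the never-expiring table-id -> schema-name cache plus the
    // TTL/TTI-bounded schema cache described in the doc comment above.
    build_datanode_cache_registry(kv_backend)
}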

View File

@@ -18,12 +18,13 @@ async-stream.workspace = true
async-trait = "0.1"
bytes.workspace = true
common-catalog.workspace = true
common-config.workspace = true
common-error.workspace = true
common-macro.workspace = true
common-meta.workspace = true
common-procedure.workspace = true
common-query.workspace = true
common-recordbatch.workspace = true
common-runtime.workspace = true
common-telemetry.workspace = true
common-time.workspace = true
common-version.workspace = true
@@ -40,6 +41,7 @@ moka = { workspace = true, features = ["future", "sync"] }
partition.workspace = true
paste = "1.0"
prometheus.workspace = true
rustc-hash.workspace = true
serde_json.workspace = true
session.workspace = true
snafu.workspace = true
@@ -47,6 +49,7 @@ sql.workspace = true
store-api.workspace = true
table.workspace = true
tokio.workspace = true
tokio-stream = "0.1"
[dev-dependencies]
cache.workspace = true
@@ -54,7 +57,5 @@ catalog = { workspace = true, features = ["testing"] }
chrono.workspace = true
common-meta = { workspace = true, features = ["testing"] }
common-query = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
log-store.workspace = true
object-store.workspace = true
tokio.workspace = true

View File

@@ -50,13 +50,27 @@ pub enum Error {
source: BoxedError,
},
#[snafu(display("Failed to list nodes in cluster: {source}"))]
#[snafu(display("Failed to list nodes in cluster"))]
ListNodes {
#[snafu(implicit)]
location: Location,
source: BoxedError,
},
#[snafu(display("Failed to region stats in cluster"))]
ListRegionStats {
#[snafu(implicit)]
location: Location,
source: BoxedError,
},
#[snafu(display("Failed to list flow stats"))]
ListFlowStats {
#[snafu(implicit)]
location: Location,
source: BoxedError,
},
#[snafu(display("Failed to list flows in catalog {catalog}"))]
ListFlows {
#[snafu(implicit)]
@@ -82,6 +96,32 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to get information extension client"))]
GetInformationExtension {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to list procedures"))]
ListProcedures {
#[snafu(implicit)]
location: Location,
source: BoxedError,
},
#[snafu(display("Procedure id not found"))]
ProcedureIdNotFound {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("convert proto data error"))]
ConvertProtoData {
#[snafu(implicit)]
location: Location,
source: BoxedError,
},
#[snafu(display("Failed to re-compile script due to internal error"))]
CompileScriptInternal {
#[snafu(implicit)]
@@ -97,13 +137,6 @@ pub enum Error {
source: table::error::Error,
},
#[snafu(display("System catalog is not valid: {}", msg))]
SystemCatalog {
msg: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Cannot find catalog by name: {}", catalog_name))]
CatalogNotFound {
catalog_name: String,
@@ -152,6 +185,12 @@ pub enum Error {
location: Location,
},
#[snafu(display("Partition manager not found, it's not expected."))]
PartitionManagerNotFound {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to find table partitions"))]
FindPartitions { source: partition::error::Error },
@@ -186,13 +225,6 @@ pub enum Error {
source: common_query::error::Error,
},
#[snafu(display("Failed to perform metasrv operation"))]
Metasrv {
#[snafu(implicit)]
location: Location,
source: meta_client::error::Error,
},
#[snafu(display("Invalid table info in catalog"))]
InvalidTableInfoInCatalog {
#[snafu(implicit)]
@@ -280,7 +312,10 @@ impl ErrorExt for Error {
| Error::FindRegionRoutes { .. }
| Error::CacheNotFound { .. }
| Error::CastManager { .. }
| Error::Json { .. } => StatusCode::Unexpected,
| Error::Json { .. }
| Error::GetInformationExtension { .. }
| Error::PartitionManagerNotFound { .. }
| Error::ProcedureIdNotFound { .. } => StatusCode::Unexpected,
Error::ViewPlanColumnsChanged { .. } => StatusCode::InvalidArguments,
@@ -288,8 +323,6 @@ impl ErrorExt for Error {
Error::FlowInfoNotFound { .. } => StatusCode::FlowNotFound,
Error::SystemCatalog { .. } => StatusCode::StorageUnavailable,
Error::UpgradeWeakCatalogManagerRef { .. } => StatusCode::Internal,
Error::CreateRecordBatch { source, .. } => source.status_code(),
@@ -299,11 +332,14 @@ impl ErrorExt for Error {
| Error::ListNodes { source, .. }
| Error::ListSchemas { source, .. }
| Error::ListTables { source, .. }
| Error::ListFlows { source, .. } => source.status_code(),
| Error::ListFlows { source, .. }
| Error::ListFlowStats { source, .. }
| Error::ListProcedures { source, .. }
| Error::ListRegionStats { source, .. }
| Error::ConvertProtoData { source, .. } => source.status_code(),
Error::CreateTable { source, .. } => source.status_code(),
Error::Metasrv { source, .. } => source.status_code(),
Error::DecodePlan { source, .. } => source.status_code(),
Error::InvalidTableInfoInCatalog { source, .. } => source.status_code(),
@@ -338,27 +374,6 @@ mod tests {
use super::*;
#[test]
pub fn test_error_status_code() {
assert_eq!(
StatusCode::TableAlreadyExists,
Error::TableExists {
table: "some_table".to_string(),
location: Location::generate(),
}
.status_code()
);
assert_eq!(
StatusCode::StorageUnavailable,
Error::SystemCatalog {
msg: String::default(),
location: Location::generate(),
}
.status_code()
);
}
#[test]
pub fn test_errors_to_datafusion_error() {
let e: DataFusionError = Error::TableExists {

View File

@@ -0,0 +1,101 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use api::v1::meta::ProcedureStatus;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo};
use common_meta::datanode::RegionStat;
use common_meta::ddl::{ExecutorContext, ProcedureExecutor};
use common_meta::key::flow::flow_state::FlowStat;
use common_meta::rpc::procedure;
use common_procedure::{ProcedureInfo, ProcedureState};
use meta_client::MetaClientRef;
use snafu::ResultExt;
use crate::error;
use crate::information_schema::InformationExtension;
pub struct DistributedInformationExtension {
meta_client: MetaClientRef,
}
impl DistributedInformationExtension {
pub fn new(meta_client: MetaClientRef) -> Self {
Self { meta_client }
}
}
#[async_trait::async_trait]
impl InformationExtension for DistributedInformationExtension {
type Error = crate::error::Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
self.meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(error::ListNodesSnafu)
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
let procedures = self
.meta_client
.list_procedures(&ExecutorContext::default())
.await
.map_err(BoxedError::new)
.context(error::ListProceduresSnafu)?
.procedures;
let mut result = Vec::with_capacity(procedures.len());
for procedure in procedures {
let pid = match procedure.id {
Some(pid) => pid,
None => return error::ProcedureIdNotFoundSnafu {}.fail(),
};
let pid = procedure::pb_pid_to_pid(&pid)
.map_err(BoxedError::new)
.context(error::ConvertProtoDataSnafu)?;
let status = ProcedureStatus::try_from(procedure.status)
.map(|v| v.as_str_name())
.unwrap_or("Unknown")
.to_string();
let procedure_info = ProcedureInfo {
id: pid,
type_name: procedure.type_name,
start_time_ms: procedure.start_time_ms,
end_time_ms: procedure.end_time_ms,
state: ProcedureState::Running,
lock_keys: procedure.lock_keys,
};
result.push((status, procedure_info));
}
Ok(result)
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
self.meta_client
.list_region_stats()
.await
.map_err(BoxedError::new)
.context(error::ListRegionStatsSnafu)
}
async fn flow_stats(&self) -> std::result::Result<Option<FlowStat>, Self::Error> {
self.meta_client
.list_flow_stats()
.await
.map_err(BoxedError::new)
.context(crate::error::ListFlowStatsSnafu)
}
}

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
pub use client::{CachedMetaKvBackend, CachedMetaKvBackendBuilder, MetaKvBackend};
pub use client::{CachedKvBackend, CachedKvBackendBuilder, MetaKvBackend};
mod client;
mod manager;

View File

@@ -20,8 +20,9 @@ use std::time::Duration;
use common_error::ext::BoxedError;
use common_meta::cache_invalidator::KvCacheInvalidator;
use common_meta::error::Error::{CacheNotGet, GetKvCache};
use common_meta::error::{CacheNotGetSnafu, Error, ExternalSnafu, Result};
use common_meta::error::Error::CacheNotGet;
use common_meta::error::{CacheNotGetSnafu, Error, ExternalSnafu, GetKvCacheSnafu, Result};
use common_meta::kv_backend::txn::{Txn, TxnResponse};
use common_meta::kv_backend::{KvBackend, KvBackendRef, TxnService};
use common_meta::rpc::store::{
BatchDeleteRequest, BatchDeleteResponse, BatchGetRequest, BatchGetResponse, BatchPutRequest,
@@ -42,20 +43,20 @@ const DEFAULT_CACHE_MAX_CAPACITY: u64 = 10000;
const DEFAULT_CACHE_TTL: Duration = Duration::from_secs(10 * 60);
const DEFAULT_CACHE_TTI: Duration = Duration::from_secs(5 * 60);
pub struct CachedMetaKvBackendBuilder {
pub struct CachedKvBackendBuilder {
cache_max_capacity: Option<u64>,
cache_ttl: Option<Duration>,
cache_tti: Option<Duration>,
meta_client: Arc<MetaClient>,
inner: KvBackendRef,
}
impl CachedMetaKvBackendBuilder {
pub fn new(meta_client: Arc<MetaClient>) -> Self {
impl CachedKvBackendBuilder {
pub fn new(inner: KvBackendRef) -> Self {
Self {
cache_max_capacity: None,
cache_ttl: None,
cache_tti: None,
meta_client,
inner,
}
}
@@ -74,7 +75,7 @@ impl CachedMetaKvBackendBuilder {
self
}
pub fn build(self) -> CachedMetaKvBackend {
pub fn build(self) -> CachedKvBackend {
let cache_max_capacity = self
.cache_max_capacity
.unwrap_or(DEFAULT_CACHE_MAX_CAPACITY);
@@ -85,14 +86,11 @@ impl CachedMetaKvBackendBuilder {
.time_to_live(cache_ttl)
.time_to_idle(cache_tti)
.build();
let kv_backend = Arc::new(MetaKvBackend {
client: self.meta_client,
});
let kv_backend = self.inner;
let name = format!("CachedKvBackend({})", kv_backend.name());
let version = AtomicUsize::new(0);
CachedMetaKvBackend {
CachedKvBackend {
kv_backend,
cache,
name,
@@ -112,19 +110,29 @@ pub type CacheBackend = Cache<Vec<u8>, KeyValue>;
/// Therefore, it is recommended to use CachedMetaKvBackend to only read metadata related
/// information. Note: If you read other information, you may read expired data, which depends on
/// TTL and TTI for cache.
pub struct CachedMetaKvBackend {
pub struct CachedKvBackend {
kv_backend: KvBackendRef,
cache: CacheBackend,
name: String,
version: AtomicUsize,
}
impl TxnService for CachedMetaKvBackend {
#[async_trait::async_trait]
impl TxnService for CachedKvBackend {
type Error = Error;
async fn txn(&self, txn: Txn) -> std::result::Result<TxnResponse, Self::Error> {
// TODO(hl): txn of CachedKvBackend simply passes through to the inner backend without invalidating caches.
self.kv_backend.txn(txn).await
}
fn max_txn_ops(&self) -> usize {
self.kv_backend.max_txn_ops()
}
}
#[async_trait::async_trait]
impl KvBackend for CachedMetaKvBackend {
impl KvBackend for CachedKvBackend {
fn name(&self) -> &str {
&self.name
}
@@ -282,8 +290,11 @@ impl KvBackend for CachedMetaKvBackend {
_ => Err(e),
},
}
.map_err(|e| GetKvCache {
err_msg: e.to_string(),
.map_err(|e| {
GetKvCacheSnafu {
err_msg: e.to_string(),
}
.build()
});
// "cache.invalidate_key" and "cache.try_get_with_by_ref" are not mutually exclusive. So we need
@@ -302,7 +313,7 @@ impl KvBackend for CachedMetaKvBackend {
}
#[async_trait::async_trait]
impl KvCacheInvalidator for CachedMetaKvBackend {
impl KvCacheInvalidator for CachedKvBackend {
async fn invalidate_key(&self, key: &[u8]) {
self.create_new_version();
self.cache.invalidate(key).await;
@@ -310,7 +321,7 @@ impl KvCacheInvalidator for CachedMetaKvBackend {
}
}
impl CachedMetaKvBackend {
impl CachedKvBackend {
// only for test
#[cfg(test)]
fn wrap(kv_backend: KvBackendRef) -> Self {
@@ -463,7 +474,7 @@ mod tests {
use common_meta::rpc::KeyValue;
use dashmap::DashMap;
use super::CachedMetaKvBackend;
use super::CachedKvBackend;
#[derive(Default)]
pub struct SimpleKvBackend {
@@ -537,7 +548,7 @@ mod tests {
async fn test_cached_kv_backend() {
let simple_kv = Arc::new(SimpleKvBackend::default());
let get_execute_times = simple_kv.get_execute_times.clone();
let cached_kv = CachedMetaKvBackend::wrap(simple_kv);
let cached_kv = CachedKvBackend::wrap(simple_kv);
add_some_vals(&cached_kv).await;
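With the rename, the cache layer no longer builds its own MetaKvBackend from a MetaClient; a hedged sketch of how a caller might now assemble the same stack (names outside this diff, such as the function name, are assumptions):

fn cached_meta_backend(meta_client: Arc<MetaClient>) -> CachedKvBackend {
    // Any KvBackendRef can sit behind the cache now, not only a MetaClient-backed one.
    let inner: KvBackendRef = Arc::new(MetaKvBackend { client: meta_client });
    CachedKvBackendBuilder::new(inner).build()
}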

View File

@@ -21,7 +21,6 @@ use common_catalog::consts::{
DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, INFORMATION_SCHEMA_NAME, NUMBERS_TABLE_ID,
PG_CATALOG_NAME,
};
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cache::{LayeredCacheRegistryRef, ViewInfoCacheRef};
use common_meta::key::catalog_name::CatalogNameKey;
@@ -31,22 +30,25 @@ use common_meta::key::table_info::TableInfoValue;
use common_meta::key::table_name::TableNameKey;
use common_meta::key::{TableMetadataManager, TableMetadataManagerRef};
use common_meta::kv_backend::KvBackendRef;
use common_procedure::ProcedureManagerRef;
use futures_util::stream::BoxStream;
use futures_util::{StreamExt, TryStreamExt};
use meta_client::client::MetaClient;
use moka::sync::Cache;
use partition::manager::{PartitionRuleManager, PartitionRuleManagerRef};
use session::context::{Channel, QueryContext};
use snafu::prelude::*;
use table::dist_table::DistTable;
use table::table::numbers::{NumbersTable, NUMBERS_TABLE_NAME};
use table::table_name::TableName;
use table::TableRef;
use tokio::sync::Semaphore;
use tokio_stream::wrappers::ReceiverStream;
use crate::error::{
CacheNotFoundSnafu, GetTableCacheSnafu, InvalidTableInfoInCatalogSnafu, ListCatalogsSnafu,
ListSchemasSnafu, ListTablesSnafu, Result, TableMetadataManagerSnafu,
};
use crate::information_schema::InformationSchemaProvider;
use crate::information_schema::{InformationExtensionRef, InformationSchemaProvider};
use crate::kvbackend::TableCacheRef;
use crate::system_schema::pg_catalog::PGCatalogProvider;
use crate::system_schema::SystemSchemaProvider;
@@ -59,27 +61,31 @@ use crate::CatalogManager;
/// comes from `SystemCatalog`, which is static and read-only.
#[derive(Clone)]
pub struct KvBackendCatalogManager {
mode: Mode,
meta_client: Option<Arc<MetaClient>>,
/// Provides the extension methods for the `information_schema` tables
information_extension: InformationExtensionRef,
/// Manages partition rules.
partition_manager: PartitionRuleManagerRef,
/// Manages table metadata.
table_metadata_manager: TableMetadataManagerRef,
/// A sub-CatalogManager that handles system tables
system_catalog: SystemCatalog,
/// Cache registry for all caches.
cache_registry: LayeredCacheRegistryRef,
/// Only available in `Standalone` mode.
procedure_manager: Option<ProcedureManagerRef>,
}
const CATALOG_CACHE_MAX_CAPACITY: u64 = 128;
impl KvBackendCatalogManager {
pub fn new(
mode: Mode,
meta_client: Option<Arc<MetaClient>>,
information_extension: InformationExtensionRef,
backend: KvBackendRef,
cache_registry: LayeredCacheRegistryRef,
procedure_manager: Option<ProcedureManagerRef>,
) -> Arc<Self> {
Arc::new_cyclic(|me| Self {
mode,
meta_client,
information_extension,
partition_manager: Arc::new(PartitionRuleManager::new(
backend.clone(),
cache_registry
@@ -103,23 +109,19 @@ impl KvBackendCatalogManager {
backend,
},
cache_registry,
procedure_manager,
})
}
/// Returns the server running mode.
pub fn running_mode(&self) -> &Mode {
&self.mode
}
pub fn view_info_cache(&self) -> Result<ViewInfoCacheRef> {
self.cache_registry.get().context(CacheNotFoundSnafu {
name: "view_info_cache",
})
}
/// Returns the `[MetaClient]`.
pub fn meta_client(&self) -> Option<Arc<MetaClient>> {
self.meta_client.clone()
/// Returns the [`InformationExtension`].
pub fn information_extension(&self) -> InformationExtensionRef {
self.information_extension.clone()
}
pub fn partition_manager(&self) -> PartitionRuleManagerRef {
@@ -129,6 +131,10 @@ impl KvBackendCatalogManager {
pub fn table_metadata_manager_ref(&self) -> &TableMetadataManagerRef {
&self.table_metadata_manager
}
pub fn procedure_manager(&self) -> Option<ProcedureManagerRef> {
self.procedure_manager.clone()
}
}
#[async_trait::async_trait]
@@ -152,7 +158,11 @@ impl CatalogManager for KvBackendCatalogManager {
Ok(keys)
}
async fn schema_names(&self, catalog: &str) -> Result<Vec<String>> {
async fn schema_names(
&self,
catalog: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>> {
let stream = self
.table_metadata_manager
.schema_manager()
@@ -163,27 +173,29 @@ impl CatalogManager for KvBackendCatalogManager {
.map_err(BoxedError::new)
.context(ListSchemasSnafu { catalog })?;
keys.extend(self.system_catalog.schema_names());
keys.extend(self.system_catalog.schema_names(query_ctx));
Ok(keys.into_iter().collect())
}
async fn table_names(&self, catalog: &str, schema: &str) -> Result<Vec<String>> {
let stream = self
async fn table_names(
&self,
catalog: &str,
schema: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>> {
let mut tables = self
.table_metadata_manager
.table_name_manager()
.tables(catalog, schema);
let mut tables = stream
.tables(catalog, schema)
.map_ok(|(table_name, _)| table_name)
.try_collect::<Vec<_>>()
.await
.map_err(BoxedError::new)
.context(ListTablesSnafu { catalog, schema })?
.into_iter()
.map(|(k, _)| k)
.collect::<Vec<_>>();
tables.extend_from_slice(&self.system_catalog.table_names(schema));
.context(ListTablesSnafu { catalog, schema })?;
Ok(tables.into_iter().collect())
tables.extend(self.system_catalog.table_names(schema, query_ctx));
Ok(tables)
}
async fn catalog_exists(&self, catalog: &str) -> Result<bool> {
@@ -194,8 +206,13 @@ impl CatalogManager for KvBackendCatalogManager {
.context(TableMetadataManagerSnafu)
}
async fn schema_exists(&self, catalog: &str, schema: &str) -> Result<bool> {
if self.system_catalog.schema_exists(schema) {
async fn schema_exists(
&self,
catalog: &str,
schema: &str,
query_ctx: Option<&QueryContext>,
) -> Result<bool> {
if self.system_catalog.schema_exists(schema, query_ctx) {
return Ok(true);
}
@@ -206,8 +223,14 @@ impl CatalogManager for KvBackendCatalogManager {
.context(TableMetadataManagerSnafu)
}
async fn table_exists(&self, catalog: &str, schema: &str, table: &str) -> Result<bool> {
if self.system_catalog.table_exists(schema, table) {
async fn table_exists(
&self,
catalog: &str,
schema: &str,
table: &str,
query_ctx: Option<&QueryContext>,
) -> Result<bool> {
if self.system_catalog.table_exists(schema, table, query_ctx) {
return Ok(true);
}
@@ -225,10 +248,12 @@ impl CatalogManager for KvBackendCatalogManager {
catalog_name: &str,
schema_name: &str,
table_name: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Option<TableRef>> {
if let Some(table) = self
.system_catalog
.table(catalog_name, schema_name, table_name)
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
if let Some(table) =
self.system_catalog
.table(catalog_name, schema_name, table_name, query_ctx)
{
return Ok(Some(table));
}
@@ -236,58 +261,112 @@ impl CatalogManager for KvBackendCatalogManager {
let table_cache: TableCacheRef = self.cache_registry.get().context(CacheNotFoundSnafu {
name: "table_cache",
})?;
table_cache
if let Some(table) = table_cache
.get_by_ref(&TableName {
catalog_name: catalog_name.to_string(),
schema_name: schema_name.to_string(),
table_name: table_name.to_string(),
})
.await
.context(GetTableCacheSnafu)
.context(GetTableCacheSnafu)?
{
return Ok(Some(table));
}
if channel == Channel::Postgres {
// fall back to pg_catalog
if let Some(table) =
self.system_catalog
.table(catalog_name, PG_CATALOG_NAME, table_name, query_ctx)
{
return Ok(Some(table));
}
}
return Ok(None);
}
fn tables<'a>(&'a self, catalog: &'a str, schema: &'a str) -> BoxStream<'a, Result<TableRef>> {
fn tables<'a>(
&'a self,
catalog: &'a str,
schema: &'a str,
query_ctx: Option<&'a QueryContext>,
) -> BoxStream<'a, Result<TableRef>> {
let sys_tables = try_stream!({
// System tables
let sys_table_names = self.system_catalog.table_names(schema);
let sys_table_names = self.system_catalog.table_names(schema, query_ctx);
for table_name in sys_table_names {
if let Some(table) = self.system_catalog.table(catalog, schema, &table_name) {
if let Some(table) =
self.system_catalog
.table(catalog, schema, &table_name, query_ctx)
{
yield table;
}
}
});
let table_id_stream = self
.table_metadata_manager
.table_name_manager()
.tables(catalog, schema)
.map_ok(|(_, v)| v.table_id());
const BATCH_SIZE: usize = 128;
let user_tables = try_stream!({
const CONCURRENCY: usize = 8;
let (tx, rx) = tokio::sync::mpsc::channel(64);
let metadata_manager = self.table_metadata_manager.clone();
let catalog = catalog.to_string();
let schema = schema.to_string();
let semaphore = Arc::new(Semaphore::new(CONCURRENCY));
common_runtime::spawn_global(async move {
let table_id_stream = metadata_manager
.table_name_manager()
.tables(&catalog, &schema)
.map_ok(|(_, v)| v.table_id());
// Split table ids into chunks
let mut table_id_chunks = table_id_stream.ready_chunks(BATCH_SIZE);
while let Some(table_ids) = table_id_chunks.next().await {
let table_ids = table_ids
let table_ids = match table_ids
.into_iter()
.collect::<std::result::Result<Vec<_>, _>>()
.map_err(BoxedError::new)
.context(ListTablesSnafu { catalog, schema })?;
.context(ListTablesSnafu {
catalog: &catalog,
schema: &schema,
}) {
Ok(table_ids) => table_ids,
Err(e) => {
let _ = tx.send(Err(e)).await;
return;
}
};
let table_info_values = self
.table_metadata_manager
.table_info_manager()
.batch_get(&table_ids)
.await
.context(TableMetadataManagerSnafu)?;
let metadata_manager = metadata_manager.clone();
let tx = tx.clone();
let semaphore = semaphore.clone();
common_runtime::spawn_global(async move {
// We don't explicitly close the semaphore, so just ignore the potential acquire error.
let _ = semaphore.acquire().await;
let table_info_values = match metadata_manager
.table_info_manager()
.batch_get(&table_ids)
.await
.context(TableMetadataManagerSnafu)
{
Ok(table_info_values) => table_info_values,
Err(e) => {
let _ = tx.send(Err(e)).await;
return;
}
};
for table_info_value in table_info_values.into_values() {
yield build_table(table_info_value)?;
}
for table in table_info_values.into_values().map(build_table) {
if tx.send(table).await.is_err() {
return;
}
}
});
}
});
let user_tables = ReceiverStream::new(rx);
Box::pin(sys_tables.chain(user_tables))
}
}
@@ -313,25 +392,34 @@ struct SystemCatalog {
catalog_cache: Cache<String, Arc<InformationSchemaProvider>>,
pg_catalog_cache: Cache<String, Arc<PGCatalogProvider>>,
// system_schema_provier for default catalog
// system_schema_provider for default catalog
information_schema_provider: Arc<InformationSchemaProvider>,
pg_catalog_provider: Arc<PGCatalogProvider>,
backend: KvBackendRef,
}
impl SystemCatalog {
// TODO(j0hn50n133): remove the duplicated hard-coded table names logic
fn schema_names(&self) -> Vec<String> {
vec![
INFORMATION_SCHEMA_NAME.to_string(),
PG_CATALOG_NAME.to_string(),
]
fn schema_names(&self, query_ctx: Option<&QueryContext>) -> Vec<String> {
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
match channel {
// pg_catalog only visible under postgres protocol
Channel::Postgres => vec![
INFORMATION_SCHEMA_NAME.to_string(),
PG_CATALOG_NAME.to_string(),
],
_ => {
vec![INFORMATION_SCHEMA_NAME.to_string()]
}
}
}
fn table_names(&self, schema: &str) -> Vec<String> {
fn table_names(&self, schema: &str, query_ctx: Option<&QueryContext>) -> Vec<String> {
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
match schema {
INFORMATION_SCHEMA_NAME => self.information_schema_provider.table_names(),
PG_CATALOG_NAME => self.pg_catalog_provider.table_names(),
PG_CATALOG_NAME if channel == Channel::Postgres => {
self.pg_catalog_provider.table_names()
}
DEFAULT_SCHEMA_NAME => {
vec![NUMBERS_TABLE_NAME.to_string()]
}
@@ -339,23 +427,35 @@ impl SystemCatalog {
}
}
fn schema_exists(&self, schema: &str) -> bool {
schema == INFORMATION_SCHEMA_NAME || schema == PG_CATALOG_NAME
fn schema_exists(&self, schema: &str, query_ctx: Option<&QueryContext>) -> bool {
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
match channel {
Channel::Postgres => schema == PG_CATALOG_NAME || schema == INFORMATION_SCHEMA_NAME,
_ => schema == INFORMATION_SCHEMA_NAME,
}
}
fn table_exists(&self, schema: &str, table: &str) -> bool {
fn table_exists(&self, schema: &str, table: &str, query_ctx: Option<&QueryContext>) -> bool {
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
if schema == INFORMATION_SCHEMA_NAME {
self.information_schema_provider.table(table).is_some()
} else if schema == DEFAULT_SCHEMA_NAME {
table == NUMBERS_TABLE_NAME
} else if schema == PG_CATALOG_NAME {
} else if schema == PG_CATALOG_NAME && channel == Channel::Postgres {
self.pg_catalog_provider.table(table).is_some()
} else {
false
}
}
fn table(&self, catalog: &str, schema: &str, table_name: &str) -> Option<TableRef> {
fn table(
&self,
catalog: &str,
schema: &str,
table_name: &str,
query_ctx: Option<&QueryContext>,
) -> Option<TableRef> {
let channel = query_ctx.map_or(Channel::Unknown, |ctx| ctx.channel());
if schema == INFORMATION_SCHEMA_NAME {
let information_schema_provider =
self.catalog_cache.get_with_by_ref(catalog, move || {
@@ -366,7 +466,7 @@ impl SystemCatalog {
))
});
information_schema_provider.table(table_name)
} else if schema == PG_CATALOG_NAME {
} else if schema == PG_CATALOG_NAME && channel == Channel::Postgres {
if catalog == DEFAULT_CATALOG_NAME {
self.pg_catalog_provider.table(table_name)
} else {

View File

@@ -20,14 +20,17 @@ use std::fmt::{Debug, Formatter};
use std::sync::Arc;
use api::v1::CreateTableExpr;
use common_catalog::consts::{INFORMATION_SCHEMA_NAME, PG_CATALOG_NAME};
use futures::future::BoxFuture;
use futures_util::stream::BoxStream;
use session::context::QueryContext;
use table::metadata::TableId;
use table::TableRef;
use crate::error::Result;
pub mod error;
pub mod information_extension;
pub mod kvbackend;
pub mod memory;
mod metrics;
@@ -44,15 +47,35 @@ pub trait CatalogManager: Send + Sync {
async fn catalog_names(&self) -> Result<Vec<String>>;
async fn schema_names(&self, catalog: &str) -> Result<Vec<String>>;
async fn schema_names(
&self,
catalog: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>>;
async fn table_names(&self, catalog: &str, schema: &str) -> Result<Vec<String>>;
async fn table_names(
&self,
catalog: &str,
schema: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>>;
async fn catalog_exists(&self, catalog: &str) -> Result<bool>;
async fn schema_exists(&self, catalog: &str, schema: &str) -> Result<bool>;
async fn schema_exists(
&self,
catalog: &str,
schema: &str,
query_ctx: Option<&QueryContext>,
) -> Result<bool>;
async fn table_exists(&self, catalog: &str, schema: &str, table: &str) -> Result<bool>;
async fn table_exists(
&self,
catalog: &str,
schema: &str,
table: &str,
query_ctx: Option<&QueryContext>,
) -> Result<bool>;
/// Returns the table by catalog, schema and table name.
async fn table(
@@ -60,10 +83,25 @@ pub trait CatalogManager: Send + Sync {
catalog: &str,
schema: &str,
table_name: &str,
query_ctx: Option<&QueryContext>,
) -> Result<Option<TableRef>>;
/// Returns all tables with a stream by catalog and schema.
fn tables<'a>(&'a self, catalog: &'a str, schema: &'a str) -> BoxStream<'a, Result<TableRef>>;
fn tables<'a>(
&'a self,
catalog: &'a str,
schema: &'a str,
query_ctx: Option<&'a QueryContext>,
) -> BoxStream<'a, Result<TableRef>>;
/// Checks if `schema` is a reserved schema name.
fn is_reserved_schema_name(&self, schema: &str) -> bool {
// We have to check whether a schema name is reserved before creating a schema.
// We need this rather than using schema_exists directly because `pg_catalog` is
// only visible via the Postgres protocol, so without this check a MySQL client
// could create a schema named `pg_catalog`, which would be malformed.
schema == INFORMATION_SCHEMA_NAME || schema == PG_CATALOG_NAME
}
}
pub type CatalogManagerRef = Arc<dyn CatalogManager>;
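A small guard sketch for the create-schema path that the comment above motivates (illustrative, not part of this change):

fn validate_new_schema_name(manager: &dyn CatalogManager, schema: &str) -> std::result::Result<(), String> {
    // `pg_catalog` is only surfaced to Postgres clients, so an existence check alone
    // would not stop e.g. a MySQL client from creating a schema by that name.
    if manager.is_reserved_schema_name(schema) {
        return Err(format!("schema name `{schema}` is reserved"));
    }
    Ok(())
}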

View File

@@ -26,6 +26,7 @@ use common_catalog::consts::{
use common_meta::key::flow::FlowMetadataManager;
use common_meta::kv_backend::memory::MemoryKvBackend;
use futures_util::stream::BoxStream;
use session::context::QueryContext;
use snafu::OptionExt;
use table::TableRef;
@@ -53,7 +54,11 @@ impl CatalogManager for MemoryCatalogManager {
Ok(self.catalogs.read().unwrap().keys().cloned().collect())
}
async fn schema_names(&self, catalog: &str) -> Result<Vec<String>> {
async fn schema_names(
&self,
catalog: &str,
_query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>> {
Ok(self
.catalogs
.read()
@@ -67,7 +72,12 @@ impl CatalogManager for MemoryCatalogManager {
.collect())
}
async fn table_names(&self, catalog: &str, schema: &str) -> Result<Vec<String>> {
async fn table_names(
&self,
catalog: &str,
schema: &str,
_query_ctx: Option<&QueryContext>,
) -> Result<Vec<String>> {
Ok(self
.catalogs
.read()
@@ -87,11 +97,22 @@ impl CatalogManager for MemoryCatalogManager {
self.catalog_exist_sync(catalog)
}
async fn schema_exists(&self, catalog: &str, schema: &str) -> Result<bool> {
async fn schema_exists(
&self,
catalog: &str,
schema: &str,
_query_ctx: Option<&QueryContext>,
) -> Result<bool> {
self.schema_exist_sync(catalog, schema)
}
async fn table_exists(&self, catalog: &str, schema: &str, table: &str) -> Result<bool> {
async fn table_exists(
&self,
catalog: &str,
schema: &str,
table: &str,
_query_ctx: Option<&QueryContext>,
) -> Result<bool> {
let catalogs = self.catalogs.read().unwrap();
Ok(catalogs
.get(catalog)
@@ -108,6 +129,7 @@ impl CatalogManager for MemoryCatalogManager {
catalog: &str,
schema: &str,
table_name: &str,
_query_ctx: Option<&QueryContext>,
) -> Result<Option<TableRef>> {
let result = try {
self.catalogs
@@ -121,7 +143,12 @@ impl CatalogManager for MemoryCatalogManager {
Ok(result)
}
fn tables<'a>(&'a self, catalog: &'a str, schema: &'a str) -> BoxStream<'a, Result<TableRef>> {
fn tables<'a>(
&'a self,
catalog: &'a str,
schema: &'a str,
_query_ctx: Option<&QueryContext>,
) -> BoxStream<'a, Result<TableRef>> {
let catalogs = self.catalogs.read().unwrap();
let Some(schemas) = catalogs.get(catalog) else {
@@ -371,11 +398,12 @@ mod tests {
DEFAULT_CATALOG_NAME,
DEFAULT_SCHEMA_NAME,
NUMBERS_TABLE_NAME,
None,
)
.await
.unwrap()
.unwrap();
let stream = catalog_list.tables(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME);
let stream = catalog_list.tables(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, None);
let tables = stream.try_collect::<Vec<_>>().await.unwrap();
assert_eq!(tables.len(), 1);
assert_eq!(
@@ -384,7 +412,12 @@ mod tests {
);
assert!(catalog_list
.table(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, "not_exists")
.table(
DEFAULT_CATALOG_NAME,
DEFAULT_SCHEMA_NAME,
"not_exists",
None
)
.await
.unwrap()
.is_none());
@@ -411,7 +444,7 @@ mod tests {
};
catalog.register_table_sync(register_table_req).unwrap();
assert!(catalog
.table(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, table_name)
.table(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, table_name, None)
.await
.unwrap()
.is_some());
@@ -423,7 +456,7 @@ mod tests {
};
catalog.deregister_table_sync(deregister_table_req).unwrap();
assert!(catalog
.table(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, table_name)
.table(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, table_name, None)
.await
.unwrap()
.is_none());

View File

@@ -15,6 +15,8 @@
pub mod information_schema;
mod memory_table;
pub mod pg_catalog;
mod predicate;
mod utils;
use std::collections::HashMap;
use std::sync::Arc;

View File

@@ -18,26 +18,30 @@ pub mod flows;
mod information_memory_table;
pub mod key_column_usage;
mod partitions;
mod predicate;
mod procedure_info;
mod region_peers;
mod region_statistics;
mod runtime_metrics;
pub mod schemata;
mod table_constraints;
mod table_names;
pub mod tables;
pub(crate) mod utils;
mod views;
use std::collections::HashMap;
use std::sync::{Arc, Weak};
use common_catalog::consts::{self, DEFAULT_CATALOG_NAME, INFORMATION_SCHEMA_NAME};
use common_error::ext::ErrorExt;
use common_meta::cluster::NodeInfo;
use common_meta::datanode::RegionStat;
use common_meta::key::flow::flow_state::FlowStat;
use common_meta::key::flow::FlowMetadataManager;
use common_procedure::ProcedureInfo;
use common_recordbatch::SendableRecordBatchStream;
use datatypes::schema::SchemaRef;
use lazy_static::lazy_static;
use paste::paste;
pub(crate) use predicate::Predicates;
use store_api::storage::{ScanRequest, TableId};
use table::metadata::TableType;
use table::TableRef;
@@ -46,7 +50,7 @@ use views::InformationSchemaViews;
use self::columns::InformationSchemaColumns;
use super::{SystemSchemaProviderInner, SystemTable, SystemTableRef};
use crate::error::Result;
use crate::error::{Error, Result};
use crate::system_schema::information_schema::cluster_info::InformationSchemaClusterInfo;
use crate::system_schema::information_schema::flows::InformationSchemaFlows;
use crate::system_schema::information_schema::information_memory_table::get_schema_columns;
@@ -58,6 +62,7 @@ use crate::system_schema::information_schema::schemata::InformationSchemaSchemat
use crate::system_schema::information_schema::table_constraints::InformationSchemaTableConstraints;
use crate::system_schema::information_schema::tables::InformationSchemaTables;
use crate::system_schema::memory_table::MemoryTable;
pub(crate) use crate::system_schema::predicate::Predicates;
use crate::system_schema::SystemSchemaProvider;
use crate::CatalogManager;
@@ -188,8 +193,19 @@ impl SystemSchemaProviderInner for InformationSchemaProvider {
)) as _),
FLOWS => Some(Arc::new(InformationSchemaFlows::new(
self.catalog_name.clone(),
self.catalog_manager.clone(),
self.flow_metadata_manager.clone(),
)) as _),
PROCEDURE_INFO => Some(
Arc::new(procedure_info::InformationSchemaProcedureInfo::new(
self.catalog_manager.clone(),
)) as _,
),
REGION_STATISTICS => Some(Arc::new(
region_statistics::InformationSchemaRegionStatistics::new(
self.catalog_manager.clone(),
),
) as _),
_ => None,
}
}
@@ -237,6 +253,14 @@ impl InformationSchemaProvider {
CLUSTER_INFO.to_string(),
self.build_table(CLUSTER_INFO).unwrap(),
);
tables.insert(
PROCEDURE_INFO.to_string(),
self.build_table(PROCEDURE_INFO).unwrap(),
);
tables.insert(
REGION_STATISTICS.to_string(),
self.build_table(REGION_STATISTICS).unwrap(),
);
}
tables.insert(TABLES.to_string(), self.build_table(TABLES).unwrap());
@@ -252,7 +276,6 @@ impl InformationSchemaProvider {
self.build_table(TABLE_CONSTRAINTS).unwrap(),
);
tables.insert(FLOWS.to_string(), self.build_table(FLOWS).unwrap());
// Add memory tables
for name in MEMORY_TABLES.iter() {
tables.insert((*name).to_string(), self.build_table(name).expect(name));
@@ -301,3 +324,46 @@ where
InformationTable::to_stream(self, request)
}
}
pub type InformationExtensionRef = Arc<dyn InformationExtension<Error = Error> + Send + Sync>;
/// The `InformationExtension` trait provides the extension methods for the `information_schema` tables.
#[async_trait::async_trait]
pub trait InformationExtension {
type Error: ErrorExt;
/// Gets the nodes information.
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error>;
/// Gets the procedures information.
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error>;
/// Gets the region statistics.
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error>;
/// Get the flow statistics. If no flownode is available, return `None`.
async fn flow_stats(&self) -> std::result::Result<Option<FlowStat>, Self::Error>;
}
pub struct NoopInformationExtension;
#[async_trait::async_trait]
impl InformationExtension for NoopInformationExtension {
type Error = Error;
async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
Ok(vec![])
}
async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
Ok(vec![])
}
async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
Ok(vec![])
}
async fn flow_stats(&self) -> std::result::Result<Option<FlowStat>, Self::Error> {
Ok(None)
}
}
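For contrast with the distributed implementation shown earlier, a hedged standalone-style sketch reconstructed from the removed cluster_info branch (the struct name is an assumption; the node fields mirror the removed code, and the remaining methods mirror NoopInformationExtension):

use common_meta::cluster::{NodeInfo, NodeStatus};
use common_meta::datanode::RegionStat;
use common_meta::key::flow::flow_state::FlowStat;
use common_meta::peer::Peer;
use common_procedure::ProcedureInfo;

pub struct LocalInformationExtension {
    start_time_ms: u64,
}

#[async_trait::async_trait]
impl InformationExtension for LocalInformationExtension {
    type Error = Error;

    async fn nodes(&self) -> std::result::Result<Vec<NodeInfo>, Self::Error> {
        let build_info = common_version::build_info();
        // A single local node: id 0, empty peer address, standalone status.
        Ok(vec![NodeInfo {
            peer: Peer {
                id: 0,
                addr: String::new(),
            },
            last_activity_ts: -1,
            status: NodeStatus::Standalone,
            version: build_info.version.to_string(),
            git_commit: build_info.commit_short.to_string(),
            start_time_ms: self.start_time_ms,
        }])
    }

    async fn procedures(&self) -> std::result::Result<Vec<(String, ProcedureInfo)>, Self::Error> {
        Ok(vec![])
    }

    async fn region_stats(&self) -> std::result::Result<Vec<RegionStat>, Self::Error> {
        Ok(vec![])
    }

    async fn flow_stats(&self) -> std::result::Result<Option<FlowStat>, Self::Error> {
        Ok(None)
    }
}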

View File

@@ -17,13 +17,10 @@ use std::time::Duration;
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_CLUSTER_INFO_TABLE_ID;
use common_config::Mode;
use common_error::ext::BoxedError;
use common_meta::cluster::{ClusterInfo, NodeInfo, NodeStatus};
use common_meta::peer::Peer;
use common_meta::cluster::NodeInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_telemetry::warn;
use common_time::timestamp::Timestamp;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
@@ -40,8 +37,9 @@ use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::CLUSTER_INFO;
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, ListNodesSnafu, Result};
use crate::system_schema::information_schema::{utils, InformationTable, Predicates};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
const PEER_ID: &str = "peer_id";
@@ -69,7 +67,6 @@ const INIT_CAPACITY: usize = 42;
pub(super) struct InformationSchemaClusterInfo {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
}
impl InformationSchemaClusterInfo {
@@ -77,7 +74,6 @@ impl InformationSchemaClusterInfo {
Self {
schema: Self::schema(),
catalog_manager,
start_time_ms: common_time::util::current_time_millis() as u64,
}
}
@@ -99,11 +95,7 @@ impl InformationSchemaClusterInfo {
}
fn builder(&self) -> InformationSchemaClusterInfoBuilder {
InformationSchemaClusterInfoBuilder::new(
self.schema.clone(),
self.catalog_manager.clone(),
self.start_time_ms,
)
InformationSchemaClusterInfoBuilder::new(self.schema.clone(), self.catalog_manager.clone())
}
}
@@ -143,7 +135,6 @@ impl InformationTable for InformationSchemaClusterInfo {
struct InformationSchemaClusterInfoBuilder {
schema: SchemaRef,
start_time_ms: u64,
catalog_manager: Weak<dyn CatalogManager>,
peer_ids: Int64VectorBuilder,
@@ -157,11 +148,7 @@ struct InformationSchemaClusterInfoBuilder {
}
impl InformationSchemaClusterInfoBuilder {
fn new(
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
start_time_ms: u64,
) -> Self {
fn new(schema: SchemaRef, catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema,
catalog_manager,
@@ -173,56 +160,17 @@ impl InformationSchemaClusterInfoBuilder {
start_times: TimestampMillisecondVectorBuilder::with_capacity(INIT_CAPACITY),
uptimes: StringVectorBuilder::with_capacity(INIT_CAPACITY),
active_times: StringVectorBuilder::with_capacity(INIT_CAPACITY),
start_time_ms,
}
}
/// Construct the `information_schema.cluster_info` virtual table
async fn make_cluster_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let mode = utils::running_mode(&self.catalog_manager)?.unwrap_or(Mode::Standalone);
match mode {
Mode::Standalone => {
let build_info = common_version::build_info();
self.add_node_info(
&predicates,
NodeInfo {
// For the standalone:
// - id always 0
// - empty string for peer_addr
peer: Peer {
id: 0,
addr: "".to_string(),
},
last_activity_ts: -1,
status: NodeStatus::Standalone,
version: build_info.version.to_string(),
git_commit: build_info.commit_short.to_string(),
// Use `self.start_time_ms` instead.
// It's not precise but enough.
start_time_ms: self.start_time_ms,
},
);
}
Mode::Distributed => {
if let Some(meta_client) = utils::meta_client(&self.catalog_manager)? {
let node_infos = meta_client
.list_nodes(None)
.await
.map_err(BoxedError::new)
.context(ListNodesSnafu)?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
} else {
warn!("Could not find meta client in distributed mode.");
}
}
let information_extension = utils::information_extension(&self.catalog_manager)?;
let node_infos = information_extension.nodes().await?;
for node_info in node_infos {
self.add_node_info(&predicates, node_info);
}
self.finish()
}

View File

@@ -257,8 +257,8 @@ impl InformationSchemaColumnsBuilder {
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name);
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None);
while let Some(table) = stream.try_next().await? {
let keys = &table.table_info().meta.primary_key_indices;

View File

@@ -12,11 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use std::sync::{Arc, Weak};
use common_catalog::consts::INFORMATION_SCHEMA_FLOW_TABLE_ID;
use common_error::ext::BoxedError;
use common_meta::key::flow::flow_info::FlowInfoValue;
use common_meta::key::flow::flow_state::FlowStat;
use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::FlowId;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
@@ -28,7 +29,9 @@ use datatypes::prelude::ConcreteDataType as CDT;
use datatypes::scalars::ScalarVectorBuilder;
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::value::Value;
use datatypes::vectors::{Int64VectorBuilder, StringVectorBuilder, UInt32VectorBuilder, VectorRef};
use datatypes::vectors::{
Int64VectorBuilder, StringVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder, VectorRef,
};
use futures::TryStreamExt;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{ScanRequest, TableId};
@@ -38,6 +41,8 @@ use crate::error::{
};
use crate::information_schema::{Predicates, FLOWS};
use crate::system_schema::information_schema::InformationTable;
use crate::system_schema::utils;
use crate::CatalogManager;
const INIT_CAPACITY: usize = 42;
@@ -45,6 +50,7 @@ const INIT_CAPACITY: usize = 42;
// pk is (flow_name, flow_id, table_catalog)
pub const FLOW_NAME: &str = "flow_name";
pub const FLOW_ID: &str = "flow_id";
pub const STATE_SIZE: &str = "state_size";
pub const TABLE_CATALOG: &str = "table_catalog";
pub const FLOW_DEFINITION: &str = "flow_definition";
pub const COMMENT: &str = "comment";
@@ -55,20 +61,24 @@ pub const FLOWNODE_IDS: &str = "flownode_ids";
pub const OPTIONS: &str = "options";
/// The `information_schema.flows` table provides information about flows in databases.
///
pub(super) struct InformationSchemaFlows {
schema: SchemaRef,
catalog_name: String,
catalog_manager: Weak<dyn CatalogManager>,
flow_metadata_manager: Arc<FlowMetadataManager>,
}
impl InformationSchemaFlows {
pub(super) fn new(
catalog_name: String,
catalog_manager: Weak<dyn CatalogManager>,
flow_metadata_manager: Arc<FlowMetadataManager>,
) -> Self {
Self {
schema: Self::schema(),
catalog_name,
catalog_manager,
flow_metadata_manager,
}
}
@@ -80,6 +90,7 @@ impl InformationSchemaFlows {
vec![
(FLOW_NAME, CDT::string_datatype(), false),
(FLOW_ID, CDT::uint32_datatype(), false),
(STATE_SIZE, CDT::uint64_datatype(), true),
(TABLE_CATALOG, CDT::string_datatype(), false),
(FLOW_DEFINITION, CDT::string_datatype(), false),
(COMMENT, CDT::string_datatype(), true),
@@ -99,6 +110,7 @@ impl InformationSchemaFlows {
InformationSchemaFlowsBuilder::new(
self.schema.clone(),
self.catalog_name.clone(),
self.catalog_manager.clone(),
&self.flow_metadata_manager,
)
}
@@ -144,10 +156,12 @@ impl InformationTable for InformationSchemaFlows {
struct InformationSchemaFlowsBuilder {
schema: SchemaRef,
catalog_name: String,
catalog_manager: Weak<dyn CatalogManager>,
flow_metadata_manager: Arc<FlowMetadataManager>,
flow_names: StringVectorBuilder,
flow_ids: UInt32VectorBuilder,
state_sizes: UInt64VectorBuilder,
table_catalogs: StringVectorBuilder,
raw_sqls: StringVectorBuilder,
comments: StringVectorBuilder,
@@ -162,15 +176,18 @@ impl InformationSchemaFlowsBuilder {
fn new(
schema: SchemaRef,
catalog_name: String,
catalog_manager: Weak<dyn CatalogManager>,
flow_metadata_manager: &Arc<FlowMetadataManager>,
) -> Self {
Self {
schema,
catalog_name,
catalog_manager,
flow_metadata_manager: flow_metadata_manager.clone(),
flow_names: StringVectorBuilder::with_capacity(INIT_CAPACITY),
flow_ids: UInt32VectorBuilder::with_capacity(INIT_CAPACITY),
state_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
table_catalogs: StringVectorBuilder::with_capacity(INIT_CAPACITY),
raw_sqls: StringVectorBuilder::with_capacity(INIT_CAPACITY),
comments: StringVectorBuilder::with_capacity(INIT_CAPACITY),
@@ -195,6 +212,11 @@ impl InformationSchemaFlowsBuilder {
.flow_names(&catalog_name)
.await;
let flow_stat = {
let information_extension = utils::information_extension(&self.catalog_manager)?;
information_extension.flow_stats().await?
};
while let Some((flow_name, flow_id)) = stream
.try_next()
.await
@@ -213,7 +235,7 @@ impl InformationSchemaFlowsBuilder {
catalog_name: catalog_name.to_string(),
flow_name: flow_name.to_string(),
})?;
self.add_flow(&predicates, flow_id.flow_id(), flow_info)?;
self.add_flow(&predicates, flow_id.flow_id(), flow_info, &flow_stat)?;
}
self.finish()
@@ -224,6 +246,7 @@ impl InformationSchemaFlowsBuilder {
predicates: &Predicates,
flow_id: FlowId,
flow_info: FlowInfoValue,
flow_stat: &Option<FlowStat>,
) -> Result<()> {
let row = [
(FLOW_NAME, &Value::from(flow_info.flow_name().to_string())),
@@ -238,6 +261,11 @@ impl InformationSchemaFlowsBuilder {
}
self.flow_names.push(Some(flow_info.flow_name()));
self.flow_ids.push(Some(flow_id));
self.state_sizes.push(
flow_stat
.as_ref()
.and_then(|state| state.state_size.get(&flow_id).map(|v| *v as u64)),
);
self.table_catalogs.push(Some(flow_info.catalog_name()));
self.raw_sqls.push(Some(flow_info.raw_sql()));
self.comments.push(Some(flow_info.comment()));
@@ -270,6 +298,7 @@ impl InformationSchemaFlowsBuilder {
let columns: Vec<VectorRef> = vec![
Arc::new(self.flow_names.finish()),
Arc::new(self.flow_ids.finish()),
Arc::new(self.state_sizes.finish()),
Arc::new(self.table_catalogs.finish()),
Arc::new(self.raw_sqls.finish()),
Arc::new(self.comments.finish()),

View File

@@ -19,7 +19,7 @@ use datatypes::schema::{Schema, SchemaRef};
use datatypes::vectors::{Int64Vector, StringVector, VectorRef};
use super::table_names::*;
use crate::system_schema::memory_table::tables::{
use crate::system_schema::utils::tables::{
bigint_column, datetime_column, string_column, string_columns,
};

View File

@@ -54,6 +54,10 @@ const INIT_CAPACITY: usize = 42;
pub(crate) const PRI_CONSTRAINT_NAME: &str = "PRIMARY";
/// Time index constraint name
pub(crate) const TIME_INDEX_CONSTRAINT_NAME: &str = "TIME INDEX";
/// Inverted index constraint name
pub(crate) const INVERTED_INDEX_CONSTRAINT_NAME: &str = "INVERTED INDEX";
/// Fulltext index constraint name
pub(crate) const FULLTEXT_INDEX_CONSTRAINT_NAME: &str = "FULLTEXT INDEX";
/// The virtual table implementation for `information_schema.KEY_COLUMN_USAGE`.
pub(super) struct InformationSchemaKeyColumnUsage {
@@ -212,18 +216,17 @@ impl InformationSchemaKeyColumnUsageBuilder {
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name);
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None);
while let Some(table) = stream.try_next().await? {
let mut primary_constraints = vec![];
let table_info = table.table_info();
let table_name = &table_info.name;
let keys = &table_info.meta.primary_key_indices;
let schema = table.schema();
for (idx, column) in schema.column_schemas().iter().enumerate() {
let mut constraints = vec![];
if column.is_time_index() {
self.add_key_column_usage(
&predicates,
@@ -236,30 +239,31 @@ impl InformationSchemaKeyColumnUsageBuilder {
1, //always 1 for time index
);
}
if keys.contains(&idx) {
primary_constraints.push((
catalog_name.clone(),
schema_name.clone(),
table_name.to_string(),
column.name.clone(),
));
}
// TODO(dimbtp): foreign key constraint not supported yet
}
if keys.contains(&idx) {
constraints.push(PRI_CONSTRAINT_NAME);
}
if column.is_inverted_indexed() {
constraints.push(INVERTED_INDEX_CONSTRAINT_NAME);
}
for (i, (catalog_name, schema_name, table_name, column_name)) in
primary_constraints.into_iter().enumerate()
{
self.add_key_column_usage(
&predicates,
&schema_name,
PRI_CONSTRAINT_NAME,
&catalog_name,
&schema_name,
&table_name,
&column_name,
i as u32 + 1,
);
if column.has_fulltext_index_key() {
constraints.push(FULLTEXT_INDEX_CONSTRAINT_NAME);
}
if !constraints.is_empty() {
let aggregated_constraints = constraints.join(", ");
self.add_key_column_usage(
&predicates,
&schema_name,
&aggregated_constraints,
&catalog_name,
&schema_name,
table_name,
&column.name,
idx as u32 + 1,
);
}
}
}
}

View File

@@ -34,15 +34,14 @@ use datatypes::vectors::{
};
use futures::{StreamExt, TryStreamExt};
use partition::manager::PartitionInfo;
use partition::partition::PartitionDef;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{RegionId, ScanRequest, TableId};
use store_api::storage::{ScanRequest, TableId};
use table::metadata::{TableInfo, TableType};
use super::PARTITIONS;
use crate::error::{
CreateRecordBatchSnafu, FindPartitionsSnafu, InternalSnafu, Result,
UpgradeWeakCatalogManagerRefSnafu,
CreateRecordBatchSnafu, FindPartitionsSnafu, InternalSnafu, PartitionManagerNotFoundSnafu,
Result, UpgradeWeakCatalogManagerRefSnafu,
};
use crate::kvbackend::KvBackendCatalogManager;
use crate::system_schema::information_schema::{InformationTable, Predicates};
@@ -236,13 +235,14 @@ impl InformationSchemaPartitionsBuilder {
let partition_manager = catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.map(|catalog_manager| catalog_manager.partition_manager());
.map(|catalog_manager| catalog_manager.partition_manager())
.context(PartitionManagerNotFoundSnafu)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let table_info_stream = catalog_manager
.tables(&catalog_name, &schema_name)
.tables(&catalog_name, &schema_name, None)
.try_filter_map(|t| async move {
let table_info = t.table_info();
if table_info.table_type == TableType::Temporary {
@@ -262,27 +262,10 @@ impl InformationSchemaPartitionsBuilder {
let table_ids: Vec<TableId> =
table_infos.iter().map(|info| info.ident.table_id).collect();
let mut table_partitions = if let Some(partition_manager) = &partition_manager {
partition_manager
.batch_find_table_partitions(&table_ids)
.await
.context(FindPartitionsSnafu)?
} else {
// The current node must be a standalone instance, containing only one partition by default.
// TODO(dennis): change it when we support multi-regions for standalone.
table_ids
.into_iter()
.map(|table_id| {
(
table_id,
vec![PartitionInfo {
id: RegionId::new(table_id, 0),
partition: PartitionDef::new(vec![], vec![]),
}],
)
})
.collect()
};
let mut table_partitions = partition_manager
.batch_find_table_partitions(&table_ids)
.await
.context(FindPartitionsSnafu)?;
for table_info in table_infos {
let partitions = table_partitions
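With the standalone fallback removed, a missing partition manager now surfaces as an error instead of being silently papered over. A minimal sketch, assuming the snafu crate and a toy error type, of how `OptionExt::context` performs that `Option`-to-`Result` conversion:

use snafu::{OptionExt, Snafu};

// Toy stand-in for the real PartitionManagerNotFound error.
#[derive(Debug, Snafu)]
#[snafu(display("Partition manager not found"))]
struct PartitionManagerNotFound;

fn require_partition_manager(pm: Option<&str>) -> Result<&str, PartitionManagerNotFound> {
    // `context` maps `None` to the error produced by the generated selector.
    pm.context(PartitionManagerNotFoundSnafu)
}

fn main() {
    assert!(require_partition_manager(None).is_err());
    assert_eq!(require_partition_manager(Some("pm")).unwrap(), "pm");
}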


@@ -0,0 +1,241 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_PROCEDURE_INFO_TABLE_ID;
use common_error::ext::BoxedError;
use common_procedure::ProcedureInfo;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_time::timestamp::Timestamp;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
use datafusion::physical_plan::streaming::PartitionStream as DfPartitionStream;
use datafusion::physical_plan::SendableRecordBatchStream as DfSendableRecordBatchStream;
use datatypes::prelude::{ConcreteDataType, ScalarVectorBuilder, VectorRef};
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::timestamp::TimestampMillisecond;
use datatypes::value::Value;
use datatypes::vectors::{StringVectorBuilder, TimestampMillisecondVectorBuilder};
use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::PROCEDURE_INFO;
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
const PROCEDURE_ID: &str = "procedure_id";
const PROCEDURE_TYPE: &str = "procedure_type";
const START_TIME: &str = "start_time";
const END_TIME: &str = "end_time";
const STATUS: &str = "status";
const LOCK_KEYS: &str = "lock_keys";
const INIT_CAPACITY: usize = 42;
/// The `PROCEDURE_INFO` table provides information about the procedures currently known to the cluster.
///
/// - `procedure_id`: the unique identifier of the procedure.
/// - `procedure_type`: the type of the procedure.
/// - `start_time`: the time at which the procedure started executing.
/// - `end_time`: the time at which the procedure finished executing.
/// - `status`: the status of the procedure.
/// - `lock_keys`: the lock keys held by the procedure.
///
pub(super) struct InformationSchemaProcedureInfo {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
}
impl InformationSchemaProcedureInfo {
pub(super) fn new(catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema: Self::schema(),
catalog_manager,
}
}
pub(crate) fn schema() -> SchemaRef {
Arc::new(Schema::new(vec![
ColumnSchema::new(PROCEDURE_ID, ConcreteDataType::string_datatype(), false),
ColumnSchema::new(PROCEDURE_TYPE, ConcreteDataType::string_datatype(), false),
ColumnSchema::new(
START_TIME,
ConcreteDataType::timestamp_millisecond_datatype(),
true,
),
ColumnSchema::new(
END_TIME,
ConcreteDataType::timestamp_millisecond_datatype(),
true,
),
ColumnSchema::new(STATUS, ConcreteDataType::string_datatype(), false),
ColumnSchema::new(LOCK_KEYS, ConcreteDataType::string_datatype(), true),
]))
}
fn builder(&self) -> InformationSchemaProcedureInfoBuilder {
InformationSchemaProcedureInfoBuilder::new(
self.schema.clone(),
self.catalog_manager.clone(),
)
}
}
impl InformationTable for InformationSchemaProcedureInfo {
fn table_id(&self) -> TableId {
INFORMATION_SCHEMA_PROCEDURE_INFO_TABLE_ID
}
fn table_name(&self) -> &'static str {
PROCEDURE_INFO
}
fn schema(&self) -> SchemaRef {
self.schema.clone()
}
fn to_stream(&self, request: ScanRequest) -> Result<SendableRecordBatchStream> {
let schema = self.schema.arrow_schema().clone();
let mut builder = self.builder();
let stream = Box::pin(DfRecordBatchStreamAdapter::new(
schema,
futures::stream::once(async move {
builder
.make_procedure_info(Some(request))
.await
.map(|x| x.into_df_record_batch())
.map_err(Into::into)
}),
));
Ok(Box::pin(
RecordBatchStreamAdapter::try_new(stream)
.map_err(BoxedError::new)
.context(InternalSnafu)?,
))
}
}
struct InformationSchemaProcedureInfoBuilder {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
procedure_ids: StringVectorBuilder,
procedure_types: StringVectorBuilder,
start_times: TimestampMillisecondVectorBuilder,
end_times: TimestampMillisecondVectorBuilder,
statuses: StringVectorBuilder,
lock_keys: StringVectorBuilder,
}
impl InformationSchemaProcedureInfoBuilder {
fn new(schema: SchemaRef, catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema,
catalog_manager,
procedure_ids: StringVectorBuilder::with_capacity(INIT_CAPACITY),
procedure_types: StringVectorBuilder::with_capacity(INIT_CAPACITY),
start_times: TimestampMillisecondVectorBuilder::with_capacity(INIT_CAPACITY),
end_times: TimestampMillisecondVectorBuilder::with_capacity(INIT_CAPACITY),
statuses: StringVectorBuilder::with_capacity(INIT_CAPACITY),
lock_keys: StringVectorBuilder::with_capacity(INIT_CAPACITY),
}
}
/// Construct the `information_schema.procedure_info` virtual table
async fn make_procedure_info(&mut self, request: Option<ScanRequest>) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let information_extension = utils::information_extension(&self.catalog_manager)?;
let procedures = information_extension.procedures().await?;
for (status, procedure_info) in procedures {
self.add_procedure(&predicates, status, procedure_info);
}
self.finish()
}
fn add_procedure(
&mut self,
predicates: &Predicates,
status: String,
procedure_info: ProcedureInfo,
) {
let ProcedureInfo {
id,
type_name,
start_time_ms,
end_time_ms,
lock_keys,
..
} = procedure_info;
let pid = id.to_string();
let start_time = TimestampMillisecond(Timestamp::new_millisecond(start_time_ms));
let end_time = TimestampMillisecond(Timestamp::new_millisecond(end_time_ms));
let lock_keys = lock_keys.join(",");
let row = [
(PROCEDURE_ID, &Value::from(pid.clone())),
(PROCEDURE_TYPE, &Value::from(type_name.clone())),
(START_TIME, &Value::from(start_time)),
(END_TIME, &Value::from(end_time)),
(STATUS, &Value::from(status.clone())),
(LOCK_KEYS, &Value::from(lock_keys.clone())),
];
if !predicates.eval(&row) {
return;
}
self.procedure_ids.push(Some(&pid));
self.procedure_types.push(Some(&type_name));
self.start_times.push(Some(start_time));
self.end_times.push(Some(end_time));
self.statuses.push(Some(&status));
self.lock_keys.push(Some(&lock_keys));
}
fn finish(&mut self) -> Result<RecordBatch> {
let columns: Vec<VectorRef> = vec![
Arc::new(self.procedure_ids.finish()),
Arc::new(self.procedure_types.finish()),
Arc::new(self.start_times.finish()),
Arc::new(self.end_times.finish()),
Arc::new(self.statuses.finish()),
Arc::new(self.lock_keys.finish()),
];
RecordBatch::new(self.schema.clone(), columns).context(CreateRecordBatchSnafu)
}
}
impl DfPartitionStream for InformationSchemaProcedureInfo {
fn schema(&self) -> &ArrowSchemaRef {
self.schema.arrow_schema()
}
fn execute(&self, _: Arc<TaskContext>) -> DfSendableRecordBatchStream {
let schema = self.schema.arrow_schema().clone();
let mut builder = self.builder();
Box::pin(DfRecordBatchStreamAdapter::new(
schema,
futures::stream::once(async move {
builder
.make_procedure_info(None)
.await
.map(|x| x.into_df_record_batch())
.map_err(Into::into)
}),
))
}
}
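Note how `add_procedure` assembles each row twice over: first as `(column, Value)` pairs handed to `Predicates::eval`, and only then pushed into the vector builders, so predicate push-down prunes rows before they are materialized. A minimal sketch of that filter-then-push pattern, with a closure standing in for the real `Predicates` type:

fn add_row<F>(eval: F, status: &str, rows: &mut Vec<String>)
where
    F: Fn(&[(&str, &str)]) -> bool,
{
    let row = [("status", status)];
    if !eval(&row) {
        return; // the predicate rejected the row; nothing is pushed
    }
    rows.push(status.to_string());
}

fn main() {
    let mut rows = Vec::new();
    let only_done =
        |row: &[(&str, &str)]| row.iter().any(|(k, v)| *k == "status" && *v == "Done");
    add_row(only_done, "Running", &mut rows);
    add_row(only_done, "Done", &mut rows);
    assert_eq!(rows, vec!["Done".to_string()]);
}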


@@ -176,9 +176,9 @@ impl InformationSchemaRegionPeersBuilder {
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let table_id_stream = catalog_manager
.tables(&catalog_name, &schema_name)
.tables(&catalog_name, &schema_name, None)
.try_filter_map(|t| async move {
let table_info = t.table_info();
if table_info.table_type == TableType::Temporary {
@@ -224,8 +224,8 @@ impl InformationSchemaRegionPeersBuilder {
let region_id = RegionId::new(table_id, route.region.id.region_number()).as_u64();
let peer_id = route.leader_peer.clone().map(|p| p.id);
let peer_addr = route.leader_peer.clone().map(|p| p.addr);
let status = if let Some(status) = route.leader_status {
Some(status.as_ref().to_string())
let state = if let Some(state) = route.leader_state {
Some(state.as_ref().to_string())
} else {
// Alive by default
Some("ALIVE".to_string())
@@ -242,7 +242,7 @@ impl InformationSchemaRegionPeersBuilder {
self.peer_ids.push(peer_id);
self.peer_addrs.push(peer_addr.as_deref());
self.is_leaders.push(Some("Yes"));
self.statuses.push(status.as_deref());
self.statuses.push(state.as_deref());
self.down_seconds
.push(route.leader_down_millis().map(|m| m / 1000));
}
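The rename from `leader_status` to `leader_state` preserves the defaulting behavior: a route without a recorded leader state is still reported as ALIVE. A minimal sketch of that logic, assuming a hypothetical `LeaderState` enum that implements `AsRef<str>`:

enum LeaderState {
    Downgrading,
}

impl AsRef<str> for LeaderState {
    fn as_ref(&self) -> &str {
        match self {
            LeaderState::Downgrading => "DOWNGRADING",
        }
    }
}

fn state_text(leader_state: Option<&LeaderState>) -> String {
    // A missing state means the leader is considered alive by default.
    leader_state
        .map(|s| s.as_ref().to_string())
        .unwrap_or_else(|| "ALIVE".to_string())
}

fn main() {
    assert_eq!(state_text(None), "ALIVE");
    assert_eq!(state_text(Some(&LeaderState::Downgrading)), "DOWNGRADING");
}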


@@ -0,0 +1,261 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_REGION_STATISTICS_TABLE_ID;
use common_error::ext::BoxedError;
use common_meta::datanode::RegionStat;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{DfSendableRecordBatchStream, RecordBatch, SendableRecordBatchStream};
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
use datafusion::physical_plan::streaming::PartitionStream as DfPartitionStream;
use datatypes::prelude::{ConcreteDataType, ScalarVectorBuilder, VectorRef};
use datatypes::schema::{ColumnSchema, Schema, SchemaRef};
use datatypes::value::Value;
use datatypes::vectors::{StringVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder};
use snafu::ResultExt;
use store_api::storage::{ScanRequest, TableId};
use super::{InformationTable, REGION_STATISTICS};
use crate::error::{CreateRecordBatchSnafu, InternalSnafu, Result};
use crate::information_schema::Predicates;
use crate::system_schema::utils;
use crate::CatalogManager;
const REGION_ID: &str = "region_id";
const TABLE_ID: &str = "table_id";
const REGION_NUMBER: &str = "region_number";
const REGION_ROWS: &str = "region_rows";
const DISK_SIZE: &str = "disk_size";
const MEMTABLE_SIZE: &str = "memtable_size";
const MANIFEST_SIZE: &str = "manifest_size";
const SST_SIZE: &str = "sst_size";
const INDEX_SIZE: &str = "index_size";
const ENGINE: &str = "engine";
const REGION_ROLE: &str = "region_role";
const INIT_CAPACITY: usize = 42;
/// The `REGION_STATISTICS` table provides statistics about regions, including the following fields:
///
/// - `region_id`: The region id.
/// - `table_id`: The table id.
/// - `region_number`: The region number.
/// - `region_rows`: The number of rows in the region.
/// - `memtable_size`: The memtable size in bytes.
/// - `disk_size`: The approximate disk size in bytes.
/// - `manifest_size`: The manifest size in bytes.
/// - `sst_size`: The total size of SST data files in bytes.
/// - `index_size`: The total size of SST index files in bytes.
/// - `engine`: The engine type.
/// - `region_role`: The region role.
///
pub(super) struct InformationSchemaRegionStatistics {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
}
impl InformationSchemaRegionStatistics {
pub(super) fn new(catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema: Self::schema(),
catalog_manager,
}
}
pub(crate) fn schema() -> SchemaRef {
Arc::new(Schema::new(vec![
ColumnSchema::new(REGION_ID, ConcreteDataType::uint64_datatype(), false),
ColumnSchema::new(TABLE_ID, ConcreteDataType::uint32_datatype(), false),
ColumnSchema::new(REGION_NUMBER, ConcreteDataType::uint32_datatype(), false),
ColumnSchema::new(REGION_ROWS, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(DISK_SIZE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(MEMTABLE_SIZE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(MANIFEST_SIZE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(SST_SIZE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(INDEX_SIZE, ConcreteDataType::uint64_datatype(), true),
ColumnSchema::new(ENGINE, ConcreteDataType::string_datatype(), true),
ColumnSchema::new(REGION_ROLE, ConcreteDataType::string_datatype(), true),
]))
}
fn builder(&self) -> InformationSchemaRegionStatisticsBuilder {
InformationSchemaRegionStatisticsBuilder::new(
self.schema.clone(),
self.catalog_manager.clone(),
)
}
}
impl InformationTable for InformationSchemaRegionStatistics {
fn table_id(&self) -> TableId {
INFORMATION_SCHEMA_REGION_STATISTICS_TABLE_ID
}
fn table_name(&self) -> &'static str {
REGION_STATISTICS
}
fn schema(&self) -> SchemaRef {
self.schema.clone()
}
fn to_stream(&self, request: ScanRequest) -> Result<SendableRecordBatchStream> {
let schema = self.schema.arrow_schema().clone();
let mut builder = self.builder();
let stream = Box::pin(DfRecordBatchStreamAdapter::new(
schema,
futures::stream::once(async move {
builder
.make_region_statistics(Some(request))
.await
.map(|x| x.into_df_record_batch())
.map_err(Into::into)
}),
));
Ok(Box::pin(
RecordBatchStreamAdapter::try_new(stream)
.map_err(BoxedError::new)
.context(InternalSnafu)?,
))
}
}
struct InformationSchemaRegionStatisticsBuilder {
schema: SchemaRef,
catalog_manager: Weak<dyn CatalogManager>,
region_ids: UInt64VectorBuilder,
table_ids: UInt32VectorBuilder,
region_numbers: UInt32VectorBuilder,
region_rows: UInt64VectorBuilder,
disk_sizes: UInt64VectorBuilder,
memtable_sizes: UInt64VectorBuilder,
manifest_sizes: UInt64VectorBuilder,
sst_sizes: UInt64VectorBuilder,
index_sizes: UInt64VectorBuilder,
engines: StringVectorBuilder,
region_roles: StringVectorBuilder,
}
impl InformationSchemaRegionStatisticsBuilder {
fn new(schema: SchemaRef, catalog_manager: Weak<dyn CatalogManager>) -> Self {
Self {
schema,
catalog_manager,
region_ids: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
table_ids: UInt32VectorBuilder::with_capacity(INIT_CAPACITY),
region_numbers: UInt32VectorBuilder::with_capacity(INIT_CAPACITY),
region_rows: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
disk_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
memtable_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
manifest_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
sst_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
index_sizes: UInt64VectorBuilder::with_capacity(INIT_CAPACITY),
engines: StringVectorBuilder::with_capacity(INIT_CAPACITY),
region_roles: StringVectorBuilder::with_capacity(INIT_CAPACITY),
}
}
/// Construct the `information_schema.region_statistics` virtual table from the collected region statistics.
async fn make_region_statistics(
&mut self,
request: Option<ScanRequest>,
) -> Result<RecordBatch> {
let predicates = Predicates::from_scan_request(&request);
let information_extension = utils::information_extension(&self.catalog_manager)?;
let region_stats = information_extension.region_stats().await?;
for region_stat in region_stats {
self.add_region_statistic(&predicates, region_stat);
}
self.finish()
}
fn add_region_statistic(&mut self, predicate: &Predicates, region_stat: RegionStat) {
let row = [
(REGION_ID, &Value::from(region_stat.id.as_u64())),
(TABLE_ID, &Value::from(region_stat.id.table_id())),
(REGION_NUMBER, &Value::from(region_stat.id.region_number())),
(REGION_ROWS, &Value::from(region_stat.num_rows)),
(DISK_SIZE, &Value::from(region_stat.approximate_bytes)),
(MEMTABLE_SIZE, &Value::from(region_stat.memtable_size)),
(MANIFEST_SIZE, &Value::from(region_stat.manifest_size)),
(SST_SIZE, &Value::from(region_stat.sst_size)),
(INDEX_SIZE, &Value::from(region_stat.index_size)),
(ENGINE, &Value::from(region_stat.engine.as_str())),
(REGION_ROLE, &Value::from(region_stat.role.to_string())),
];
if !predicate.eval(&row) {
return;
}
self.region_ids.push(Some(region_stat.id.as_u64()));
self.table_ids.push(Some(region_stat.id.table_id()));
self.region_numbers
.push(Some(region_stat.id.region_number()));
self.region_rows.push(Some(region_stat.num_rows));
self.disk_sizes.push(Some(region_stat.approximate_bytes));
self.memtable_sizes.push(Some(region_stat.memtable_size));
self.manifest_sizes.push(Some(region_stat.manifest_size));
self.sst_sizes.push(Some(region_stat.sst_size));
self.index_sizes.push(Some(region_stat.index_size));
self.engines.push(Some(&region_stat.engine));
self.region_roles.push(Some(&region_stat.role.to_string()));
}
fn finish(&mut self) -> Result<RecordBatch> {
let columns: Vec<VectorRef> = vec![
Arc::new(self.region_ids.finish()),
Arc::new(self.table_ids.finish()),
Arc::new(self.region_numbers.finish()),
Arc::new(self.region_rows.finish()),
Arc::new(self.disk_sizes.finish()),
Arc::new(self.memtable_sizes.finish()),
Arc::new(self.manifest_sizes.finish()),
Arc::new(self.sst_sizes.finish()),
Arc::new(self.index_sizes.finish()),
Arc::new(self.engines.finish()),
Arc::new(self.region_roles.finish()),
];
RecordBatch::new(self.schema.clone(), columns).context(CreateRecordBatchSnafu)
}
}
impl DfPartitionStream for InformationSchemaRegionStatistics {
fn schema(&self) -> &ArrowSchemaRef {
self.schema.arrow_schema()
}
fn execute(&self, _: Arc<TaskContext>) -> DfSendableRecordBatchStream {
let schema = self.schema.arrow_schema().clone();
let mut builder = self.builder();
Box::pin(DfRecordBatchStreamAdapter::new(
schema,
futures::stream::once(async move {
builder
.make_region_statistics(None)
.await
.map(|x| x.into_df_record_batch())
.map_err(Into::into)
}),
))
}
}
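A `RegionId` is a single `u64` that the rows above decompose three ways via `as_u64()`, `table_id()`, and `region_number()`. A minimal sketch of that decomposition, assuming the usual layout of the table id in the high 32 bits and the region number in the low 32:

// Hypothetical mirror of the RegionId bit layout assumed above.
fn region_id(table_id: u32, region_number: u32) -> u64 {
    ((table_id as u64) << 32) | region_number as u64
}

fn table_id(region_id: u64) -> u32 {
    (region_id >> 32) as u32
}

fn region_number(region_id: u64) -> u32 {
    region_id as u32 // truncation keeps the low 32 bits
}

fn main() {
    let id = region_id(42, 7);
    assert_eq!(table_id(id), 42);
    assert_eq!(region_number(id), 7);
}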


@@ -36,7 +36,8 @@ use crate::error::{
CreateRecordBatchSnafu, InternalSnafu, Result, TableMetadataManagerSnafu,
UpgradeWeakCatalogManagerRefSnafu,
};
use crate::system_schema::information_schema::{utils, InformationTable, Predicates};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
pub const CATALOG_NAME: &str = "catalog_name";
@@ -170,7 +171,7 @@ impl InformationSchemaSchemataBuilder {
let table_metadata_manager = utils::table_meta_manager(&self.catalog_manager)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let opts = if let Some(table_metadata_manager) = &table_metadata_manager {
table_metadata_manager
.schema_manager()
@@ -179,7 +180,7 @@ impl InformationSchemaSchemataBuilder {
.context(TableMetadataManagerSnafu)?
// information_schema is not available from this
// table_metadata_manager, so we return None
.map(|schema_opts| format!("{schema_opts}"))
.map(|schema_opts| format!("{}", schema_opts.into_inner()))
} else {
None
};
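The `format!` change above unwraps the value returned by the schema manager before rendering it, so the stored options string comes from the inner options' own `Display` rather than from the wrapper. A minimal sketch of the difference, using a hypothetical wrapper type:

// Hypothetical stand-in for a wrapper that carries raw bytes
// alongside the deserialized value.
struct WithRawBytes<T> {
    value: T,
    raw: Vec<u8>,
}

impl<T> WithRawBytes<T> {
    fn into_inner(self) -> T {
        self.value
    }
}

fn main() {
    let opts = WithRawBytes {
        value: "ttl='7d'".to_string(),
        raw: b"ttl='7d'".to_vec(),
    };
    assert_eq!(opts.raw.len(), 8);
    // Format the inner value; the wrapper itself need not implement Display.
    assert_eq!(format!("{}", opts.into_inner()), "ttl='7d'");
}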


@@ -176,8 +176,8 @@ impl InformationSchemaTableConstraintsBuilder {
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name);
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None);
while let Some(table) = stream.try_next().await? {
let keys = &table.table_info().meta.primary_key_indices;


@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
/// All table names in `information_schema`.
//! All table names in `information_schema`.
pub const TABLES: &str = "tables";
pub const COLUMNS: &str = "columns";
@@ -45,3 +45,5 @@ pub const TABLE_CONSTRAINTS: &str = "table_constraints";
pub const CLUSTER_INFO: &str = "cluster_info";
pub const VIEWS: &str = "views";
pub const FLOWS: &str = "flows";
pub const PROCEDURE_INFO: &str = "procedure_info";
pub const REGION_STATISTICS: &str = "region_statistics";


@@ -12,13 +12,16 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashSet;
use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef;
use common_catalog::consts::INFORMATION_SCHEMA_TABLES_TABLE_ID;
use common_catalog::consts::{INFORMATION_SCHEMA_TABLES_TABLE_ID, MITO_ENGINE};
use common_error::ext::BoxedError;
use common_meta::datanode::RegionStat;
use common_recordbatch::adapter::RecordBatchStreamAdapter;
use common_recordbatch::{RecordBatch, SendableRecordBatchStream};
use common_telemetry::error;
use datafusion::execution::TaskContext;
use datafusion::physical_plan::stream::RecordBatchStreamAdapter as DfRecordBatchStreamAdapter;
use datafusion::physical_plan::streaming::PartitionStream as DfPartitionStream;
@@ -31,7 +34,7 @@ use datatypes::vectors::{
};
use futures::TryStreamExt;
use snafu::{OptionExt, ResultExt};
use store_api::storage::{ScanRequest, TableId};
use store_api::storage::{RegionId, ScanRequest, TableId};
use table::metadata::{TableInfo, TableType};
use super::TABLES;
@@ -39,6 +42,7 @@ use crate::error::{
CreateRecordBatchSnafu, InternalSnafu, Result, UpgradeWeakCatalogManagerRefSnafu,
};
use crate::system_schema::information_schema::{InformationTable, Predicates};
use crate::system_schema::utils;
use crate::CatalogManager;
pub const TABLE_CATALOG: &str = "table_catalog";
@@ -234,17 +238,51 @@ impl InformationSchemaTablesBuilder {
.context(UpgradeWeakCatalogManagerRefSnafu)?;
let predicates = Predicates::from_scan_request(&request);
for schema_name in catalog_manager.schema_names(&catalog_name).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name);
let information_extension = utils::information_extension(&self.catalog_manager)?;
// TODO(dennis): the `region_stats` API is not stable in a distributed cluster because of network issues etc.
// But we don't want statements such as `show tables` to fail,
// so we use `unwrap_or_else` here instead of the `?` operator.
let region_stats = information_extension
.region_stats()
.await
.map_err(|e| {
error!(e; "Failed to call region_stats");
e
})
.unwrap_or_else(|_| vec![]);
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None);
while let Some(table) = stream.try_next().await? {
let table_info = table.table_info();
// TODO(dennis): make it work for the metric engine
let table_region_stats =
if table_info.meta.engine == MITO_ENGINE || table_info.is_physical_table() {
let region_ids = table_info
.meta
.region_numbers
.iter()
.map(|n| RegionId::new(table_info.ident.table_id, *n))
.collect::<HashSet<_>>();
region_stats
.iter()
.filter(|stat| region_ids.contains(&stat.id))
.collect::<Vec<_>>()
} else {
vec![]
};
self.add_table(
&predicates,
&catalog_name,
&schema_name,
table_info,
table.table_type(),
&table_region_stats,
);
}
}
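The hunk above joins region statistics back to their owning table: each of the table's region numbers is combined with the table id into a `RegionId`, the ids are collected into a `HashSet`, and only matching stats are kept. A minimal, self-contained sketch with `(table_id, region_number)` pairs standing in for real `RegionId`s:

use std::collections::HashSet;

fn regions_for_table(table_id: u32, region_numbers: &[u32]) -> HashSet<(u32, u32)> {
    region_numbers.iter().map(|n| (table_id, *n)).collect()
}

fn main() {
    let region_ids = regions_for_table(42, &[0, 1]);
    let stats = [(42u32, 0u32), (42, 1), (7, 0)];
    // Keep only the stats whose region id belongs to table 42.
    let mine: Vec<_> = stats.iter().filter(|id| region_ids.contains(*id)).collect();
    assert_eq!(mine.len(), 2);
}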
@@ -260,6 +298,7 @@ impl InformationSchemaTablesBuilder {
schema_name: &str,
table_info: Arc<TableInfo>,
table_type: TableType,
region_stats: &[&RegionStat],
) {
let table_name = table_info.name.as_ref();
let table_id = table_info.table_id();
@@ -273,7 +312,9 @@ impl InformationSchemaTablesBuilder {
let row = [
(TABLE_CATALOG, &Value::from(catalog_name)),
(TABLE_ID, &Value::from(table_id)),
(TABLE_SCHEMA, &Value::from(schema_name)),
(ENGINE, &Value::from(engine)),
(TABLE_NAME, &Value::from(table_name)),
(TABLE_TYPE, &Value::from(table_type_text)),
];
@@ -287,21 +328,39 @@ impl InformationSchemaTablesBuilder {
self.table_names.push(Some(table_name));
self.table_types.push(Some(table_type_text));
self.table_ids.push(Some(table_id));
let data_length = region_stats.iter().map(|stat| stat.sst_size).sum();
let table_rows = region_stats.iter().map(|stat| stat.num_rows).sum();
let index_length = region_stats.iter().map(|stat| stat.index_size).sum();
// Not a precise value, but an acceptable approximation for long-term data storage.
let avg_row_length = if table_rows > 0 {
let total_data_length = data_length
+ region_stats
.iter()
.map(|stat| stat.memtable_size)
.sum::<u64>();
total_data_length / table_rows
} else {
0
};
self.data_length.push(Some(data_length));
self.index_length.push(Some(index_length));
self.table_rows.push(Some(table_rows));
self.avg_row_length.push(Some(avg_row_length));
// TODO(sunng87): use real data for these fields
self.data_length.push(Some(0));
self.max_data_length.push(Some(0));
self.index_length.push(Some(0));
self.avg_row_length.push(Some(0));
self.max_index_length.push(Some(0));
self.checksum.push(Some(0));
self.table_rows.push(Some(0));
self.max_index_length.push(Some(0));
self.data_free.push(Some(0));
self.auto_increment.push(Some(0));
self.row_format.push(Some("Fixed"));
self.table_collation.push(None);
self.table_collation.push(Some("utf8_bin"));
self.update_time.push(None);
self.check_time.push(None);
// Use the MariaDB default table version number here
self.version.push(Some(11));
self.table_comment.push(table_info.desc.as_deref());
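The block above swaps the hard-coded zeros for real figures: `data_length` sums SST bytes, `index_length` sums index bytes, `table_rows` sums row counts, and the average row length divides (SST + memtable) bytes by the row count. A minimal sketch of that arithmetic:

fn avg_row_length(sst_size: u64, memtable_size: u64, table_rows: u64) -> u64 {
    if table_rows > 0 {
        // Unflushed memtable bytes count toward the data that backs the rows.
        (sst_size + memtable_size) / table_rows
    } else {
        0 // avoid dividing by zero for empty tables
    }
}

fn main() {
    assert_eq!(avg_row_length(4096, 1024, 256), 20);
    assert_eq!(avg_row_length(0, 0, 0), 0);
}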
