Compare commits

..

96 Commits

Author SHA1 Message Date
Ruihang Xia
fce1687fa7 fix: incorrect timestamp index inference (#7530)
* add sqlness case, but can't reproduce

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* reproduction

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix wildcard rule

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sort result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-07 11:18:25 +00:00
Yingwen
ef6dd5b99f fix: precise filter time index if not in projection (#7531)
* fix: precise filter time index if not in projection

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add sqlness test

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-01-07 11:15:34 +00:00
discord9
ac6d68aa2d fix: simp expr recursively (#7523)
* fix: simp expr recursively

Signed-off-by: discord9 <discord9@163.com>

* test: some simple constant folding case

Signed-off-by: discord9 <discord9@163.com>

* fix: literal ts cast to UTC

Signed-off-by: discord9 <discord9@163.com>

* fix: patch merge scan batch col tz instead

Signed-off-by: discord9 <discord9@163.com>

* test: fix

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-07 09:22:26 +00:00
Ruihang Xia
d39895a970 feat: tune query traces (#7524)
* feat: add partition and region id

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: instrument mito

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* connect region scan span

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* instrument streams

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tweak

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-07 08:11:09 +00:00
jeremyhi
59867cd5b6 fix: remove log_env_flags (#7529)
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2026-01-07 08:08:35 +00:00
Ruihang Xia
9a4b7cbb32 feat: bump promql-parser to v0.7.1 (#7521)
* feat: bump promql-parser to v0.7.0

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update other sqlness results

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update tests/cases/standalone/common/tql/case_sensitive.result

Co-authored-by: Ning Sun <sunng@protonmail.com>

* remove escape on greptimedb side

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update to v0.7.1

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused deps

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ning Sun <sunng@protonmail.com>
2026-01-07 07:23:40 +00:00
Weny Xu
2f242927a8 feat(repartition): implement region deallocation for repartition procedure (#7522)
* feat: implement deallocate regions for repartition procedure

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(metric-engine): add force flag to drop physical regions with associated logical regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: update table metadata after deallocating regions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-07 06:13:48 +00:00
Weny Xu
77310ec5bd refactor: refactor CreateTableProcedure to extract reusable components (#7526)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-07 01:58:53 +00:00
Weny Xu
ada4666e10 refactor: remove region_numbers from TableMeta and TableInfo (#7519)
* refactor: remove `region_numbers` from `TableMeta` and `TableInfo`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: create partitions from region route

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix build

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-06 13:21:36 +00:00
jeremyhi
898e84898c feat!: make heartbeat config only in metasrv (#7510)
* feat: make heartbeat config only in metasrv

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* feat: refine config doc

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: make the heartbeat setup simple

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: revert config

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: proto update

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: fix sqlness wrong cfg

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-06 09:43:36 +00:00
discord9
6f86a22e6f feat: adjust some args to gc worker (#7469)
* chore: less stuff sent

Signed-off-by: discord9 <discord9@163.com>

* after rebase fix

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* fix: clarify comment on manifest file removal for GC worker

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-06 07:37:05 +00:00
Ruihang Xia
5162c1de4d feat: repartition grammar candy (#7518)
* feat: repartition grammar candy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* align keyword

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-06 04:44:13 +00:00
LFC
522ca99cd6 feat: ingest jsonbench data through pipeline (#7312)
Signed-off-by: luofucong <luofc@foxmail.com>
2026-01-05 12:12:34 +00:00
Weny Xu
2d756b24c8 feat: implement RemapManifest and ApplyStagingManifest for repartition procedure (#7509)
* feat: add RemapManifest and ApplyStagingManifest heartbeat handler

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat: add `RemapManifest` and `ApplyStagingManifest` states for repartition

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2026-01-05 08:33:44 +00:00
shuiyisong
527a1c03f3 fix: pipeline loading issue (#7491)
* fix: pipeline loading

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: change string to str

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: minor fix to save returned version

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* refactor: introduce PipelineContent

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: use found schema

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update CR

Co-authored-by: Yingwen <realevenyag@gmail.com>

* chore: CR issue

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2026-01-05 06:49:44 +00:00
discord9
7e243632c7 fix: dist planner rm col req when rm sort (#7512)
* aha!

Signed-off-by: discord9 <discord9@163.com>

* fix: rm col_req in pql sort

Signed-off-by: discord9 <discord9@163.com>

* ut

Signed-off-by: discord9 <discord9@163.com>

* docs

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* more typo

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2026-01-05 03:27:11 +00:00
Ruihang Xia
3556eb4476 chore: add tests to comment column on information_schema (#7514)
* feat: show comment on information_schema

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add to information schema for columns, add sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove duplications

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix typo

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update integration test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-04 09:05:50 +00:00
Weny Xu
9343da7fe8 feat(meta-srv): fallback to non-TLS connection when etcd TLS prefer mode fail (#7507)
* feat(meta-srv): fallback to non-TLS connection when etcd TLS prefer mode fail

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore(ci): set timeout for deploy cluster

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor: simplify etcd TLS prefer mode handling

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-31 10:03:34 +00:00
Alan Tang
8a07dbf605 fix: fix sqlness test error about double precision (#7476)
* fix: fix sqlness test error about double precision

Signed-off-by: StandingMan <jmtangcs@gmail.com>

* fix: use round method to truncate the result

Signed-off-by: StandingMan <jmtangcs@gmail.com>

---------

Signed-off-by: StandingMan <jmtangcs@gmail.com>
2025-12-31 04:55:22 +00:00
Weny Xu
83932c8c9e fix: align backend_tls default value with example config (#7496)
* fix: align backend_tls default value with example config

Signed-off-by: WenyXu <wenymedia@gmail.com>

* Update src/common/meta/src/kv_backend/rds/postgres.rs

Co-authored-by: dennis zhuang <killme2008@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
2025-12-31 03:31:08 +00:00
LFC
dc9fc582a0 feat: impl json_get_int for new json type (#7495)
Update src/common/function/src/scalars/json/json_get.rs



impl `json_get_int` for new json type

Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-30 09:42:16 +00:00
Weny Xu
b1d81913f5 feat: update ApplyStagingManifestRequest to fetch manifest from central region (#7493)
* feat: update ApplyStagingManifestRequest to fetch manifest from central region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: refine comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* refactor(mito2): rename `StagingDataStorage` to `StagingBlobStorage`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-30 07:29:56 +00:00
Yingwen
554f3943b6 ci: update breaking change title level (#7497)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-30 06:17:51 +00:00
dennis zhuang
e4b5ef275f feat: impl vector index building (#7468)
* feat: impl vector index building

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: supports flat format

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* ci: add vector_index feature to test

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: apply suggestions from copilot

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-30 03:38:51 +00:00
Yingwen
f2a9d50071 ci: handle prerelease version (#7492)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-29 08:21:05 +00:00
LFC
0c54e70e1f feat: impl json_get_string with new json type (#7489)
* impl `json_get_string` with new json type

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-29 04:35:53 +00:00
Yingwen
b51f62c3c2 feat: bump version to beta.4 (#7490)
Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-29 04:20:00 +00:00
discord9
1ddc535b52 feat: repartition map kv (#7420)
* table partition key

Signed-off-by: discord9 <discord9@163.com>

* feat: table part key

Signed-off-by: discord9 <discord9@163.com>

* ut

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

* feat: add Default trait to TablePartValue struct

Signed-off-by: discord9 <discord9@163.com>

* rename to Rep

Signed-off-by: discord9 <discord9@163.com>

* rename file

Signed-off-by: discord9 <discord9@163.com>

* more rename

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* test: update err msg

Signed-off-by: discord9 <discord9@163.com>

* feat: add TableRepartKey to TableMetadataManager

Signed-off-by: discord9 <discord9@163.com>

* feat: add TableRepartManager to TableMetadataManager

Signed-off-by: discord9 <discord9@163.com>

* docs: update

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-29 02:45:35 +00:00
Weny Xu
b25f24c6fe feat(meta-srv): add repartition procedure skeleton (#7487)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-26 11:23:47 +00:00
Lei, HUANG
7bc0934eb3 refactor(mito2): make MemtableStats fields public (#7488)
Change visibility of estimated_bytes, time_range, max_sequence, and
series_count fields from private to public for external access.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-26 09:57:18 +00:00
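The commit above only widens field visibility. A minimal sketch of what that looks like, assuming the field names from the commit description; the actual `MemtableStats` in `mito2` uses the crate's own timestamp and size types:

```rust
/// Hypothetical stand-in for mito2's `MemtableStats`; types are illustrative.
pub struct MemtableStats {
    /// Estimated memory usage of the memtable in bytes (now `pub`).
    pub estimated_bytes: usize,
    /// Min/max timestamps covered by the memtable (now `pub`).
    pub time_range: Option<(i64, i64)>,
    /// Largest write sequence number observed (now `pub`).
    pub max_sequence: u64,
    /// Number of distinct time series held by the memtable (now `pub`).
    pub series_count: usize,
}
```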
Yingwen
89b9469250 feat: Implement per range stats for bulk memtable (#7486)
* feat: implement per range stats for MemtableRange

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: extract methods to MemtableRanges

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: simple bulk memtable set other fields in stats

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use time_index_type()

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use time index type

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-26 07:24:11 +00:00
Weny Xu
518a4e013b refactor(mito2): reorganize manifest storage into modular components (#7483)
* refactor(mito2): reorganize manifest storage into modular components

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: sort

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-26 02:24:27 +00:00
Lei, HUANG
fffad499ca chore: mount cargo git cache in docker builds (#7484)
Mount the cargo git cache directory (${HOME}/.cargo/git) in docker build
containers to improve rebuild performance by caching git dependencies.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-26 01:56:11 +00:00
yihong
0c9f58316d fix: more wait time for sqlness start and better message (#7485)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-12-26 01:55:20 +00:00
ZonaHe
4f290111db feat: update dashboard to v0.11.11 (#7481)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-12-25 18:43:14 +00:00
Weny Xu
294f19fa1d feat(metric-engine): support sync logical regions from source region (#7438)
* chore: move file

Signed-off-by: WenyXu <wenymedia@gmail.com>

* feat(metric-engine): support sync logical regions from source region

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: add comments

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-25 09:06:58 +00:00
ZonaHe
be530ac1de feat: update dashboard to v0.11.10 (#7479)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-12-25 04:27:10 +00:00
jeremyhi
434b4d8183 feat: refine the MemoryGuard (#7466)
* feat: refine MemoryGuard

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2025-12-25 04:09:32 +00:00
Lei, HUANG
3ad0b60c4b chore(metric-engine): set default compaction time window for data region (#7474)
chore: set compaction time window for metric engine data region to 1 day by default

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-25 03:55:17 +00:00
Ning Sun
19ae845225 refactor: cache server memory limiter for other components (#7470) 2025-12-25 03:46:50 +00:00
dennis zhuang
3866512cf6 feat: add more MySQL-compatible string functions (#7454)
* feat: add more mysql string functions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: use datafusion aliasing mechanism, close #7415

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: comment

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: comment and style

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-25 03:28:57 +00:00
LFC
d4870ee2af fix: typo in AI-assisted contributions policy (#7472)
* Fix typo in AI-assisted contributions policy

* Update project name from DataFusion to GreptimeDB
2025-12-25 03:03:14 +00:00
discord9
aea4e9fa55 fix: RemovedFiles deser compatibility (#7475)
* fix: compat for RemovedFiles

Signed-off-by: discord9 <discord9@163.com>

* cr

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-25 02:50:34 +00:00
AntiTopQuark
cea578244c fix(compaction): unify behavior of database compaction options with TTL (#7402)
* fix: fix dynamic compaction option, unify behavior of database compaction options with TTL option

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* fix unit test

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

* add debug log

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>

---------

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>
2025-12-25 02:34:42 +00:00
Weny Xu
e1b18614ee feat(mito2): implement ApplyStagingManifest request handling (#7456)
* feat(mito2): implement `ApplyStagingManifest` request handling

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: fmt

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix logic

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update proto

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-24 09:05:09 +00:00
Frost Ming
4bae75ccdb docs: refer to the correct project name in AI guidelines (#7471)
doc: refer to the correct project name in AI guidelines
2025-12-24 07:58:36 +00:00
LFC
dc9f3a702e refactor: explicitly define json struct to ingest jsonbench data (#7462)
ingest jsonbench data

Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-24 07:30:22 +00:00
Weny Xu
2d9967b981 fix(mito2): pass partition expr explicitly to flush task for region (#7461)
* fix(mito2): pass partition expr explicitly to flush task for staging mode

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: rename

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-24 04:18:06 +00:00
discord9
dec0d522f8 feat: gc versioned index (#7412)
* feat: add index version to file ref

Signed-off-by: discord9 <discord9@163.com>

* refactor wip

Signed-off-by: discord9 <discord9@163.com>

* wip

Signed-off-by: discord9 <discord9@163.com>

* update gc worker

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

* gc report for index files

Signed-off-by: discord9 <discord9@163.com>

* fix: type

Signed-off-by: discord9 <discord9@163.com>

* stuff

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* chore: metrics

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* typo

Signed-off-by: discord9 <discord9@163.com>

* chore: naming

Signed-off-by: discord9 <discord9@163.com>

* docs: update explain

Signed-off-by: discord9 <discord9@163.com>

* test: parse file id/type from file path

Signed-off-by: discord9 <discord9@163.com>

* chore: change parse method visibility to crate

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* chore

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-24 03:07:53 +00:00
dennis zhuang
17e2b98132 docs: rfc for vector index (#7353)
* docs: rfc for vector index

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: explain why choose USearch and distributed query

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: row id mapping

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refine proposal

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename rfc file

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-24 02:54:25 +00:00
Weny Xu
ee86987912 feat(repartition): implement enter staging region state (#7447)
* feat(repartition): implement enter staging region state

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-24 02:50:27 +00:00
Ruihang Xia
0cea58c642 docs: about AI-assisted contributions (#7464)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-23 14:20:21 +00:00
discord9
fdedbb8261 fix: part sort share same topk dyn filter&early stop use dyn filter (#7460)
* fix: part sort share same topk dyn filter

Signed-off-by: discord9 <discord9@163.com>

* test: one

Signed-off-by: discord9 <discord9@163.com>

* feat: use dyn filter properly instead

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* docs: explain why dyn filter work

Signed-off-by: discord9 <discord9@163.com>

* chore: after rebase fix

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-23 09:24:55 +00:00
Lanqing Yang
8d9afc83e3 feat: allow auto schema creation for pg (#7459)
Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-23 08:55:24 +00:00
LFC
625fdd09ea refactor!: remove not working metasrv cli option (#7446)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-23 06:55:17 +00:00
discord9
b3bc3c76f1 feat: file range dynamic filter (#7441)
* feat: add dynamic filtering support in file range and predicate handling

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* c

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-23 06:15:30 +00:00
yihong
342eb47e19 fix: close issue #7457 guard against empty buffer (#7458)
* fix: close issue #7457 guard against empty buffer

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

* fix: add unittests for it

Signed-off-by: yihong0618 <zouzou0208@gmail.com>

---------

Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-12-23 03:11:00 +00:00
jeremyhi
6a6b34c709 feat!: memory limiter unification write path (#7437)
* feat: remove option max_in_flight_write_bytes

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: replace RequestMemoryLimiter

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add integration test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: fix test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by AI comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: global permit pool on writing

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by ai comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2025-12-23 02:18:49 +00:00
Lei, HUANG
a8b512dded chore: expose symbols (#7451)
* chore/expose-symbols:
 ### Commit Message

 Enhance `merge_and_dedup` Functionality in `flush.rs`

 - **Function Signature Update**: Modified the `merge_and_dedup` function to accept `append_mode` and `merge_mode` as separate parameters instead of using `options`.
 - **Function Accessibility**: Changed the visibility of `merge_and_dedup` to `pub` to allow external access.
 - **Function Calls Update**: Updated calls to `merge_and_dedup` within `memtable_flat_sources` to align with the new function signature, passing `options.append_mode` and `options.merge_mode()` directly.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/expose-symbols:
 ### Add Merge and Deduplication Functionality

 - **File**: `src/mito2/src/flush.rs`
   - Introduced `merge_and_dedup` function to merge multiple record batch iterators and apply deduplication based on specified modes.
   - Added detailed documentation for the function, explaining its arguments, behavior, and usage examples.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-22 05:39:03 +00:00
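A hedged sketch of the signature change described above: `merge_and_dedup` becomes `pub` and takes `append_mode`/`merge_mode` directly instead of an options struct. The row/iterator types and the naive sort-then-dedup body below are placeholders, not mito2's streaming implementation:

```rust
/// Placeholder row and iterator types; mito2's real types differ.
type Row = (i64, String);
type BoxedIter = Box<dyn Iterator<Item = Row>>;

#[derive(Clone, Copy)]
pub enum MergeMode {
    LastRow,
    LastNonNull,
}

/// Sketch: `pub`, with `append_mode` and `merge_mode` passed explicitly.
pub fn merge_and_dedup(
    iters: Vec<BoxedIter>,
    append_mode: bool,
    _merge_mode: MergeMode,
) -> BoxedIter {
    let mut rows: Vec<Row> = iters.into_iter().flatten().collect();
    rows.sort_by_key(|(ts, _)| *ts);
    if !append_mode {
        // Append mode keeps duplicate timestamps; otherwise dedup on the sort key.
        rows.dedup_by_key(|(ts, _)| *ts);
    }
    Box::new(rows.into_iter())
}
```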
Ning Sun
bd8ffd3db9 feat: pgwire 0.37 (#7443) 2025-12-22 05:13:39 +00:00
discord9
c0652f6dd5 chore: release push check against Cargo.toml (#7426)
Signed-off-by: discord9 <discord9@163.com>
2025-12-19 13:16:15 +00:00
Yingwen
fed6cb0806 fix: flat format use correct encoding in indexer for tags (#7440)
* test: add inverted and skipping test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: Add tests for fulltext index

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: index dictionary type in correct encoding in flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use encode_data_type() in SortField

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: refine imports

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update list test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: simplify tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-19 07:36:44 +00:00
discord9
69659211f6 chore: fix bincode version (#7445)
Signed-off-by: discord9 <discord9@163.com>
2025-12-19 07:36:28 +00:00
LFC
6332d91884 test: reduce execution time of test test_suspend_frontend (#7444)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-19 07:25:36 +00:00
Weny Xu
4d66bd96b8 feat: make distributed time constants and client timeouts configurable (#7433)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-19 02:23:20 +00:00
Ning Sun
2f4a15ec40 ci: ensure commits from main branch for whitelisted git dependencies (#7434)
* chore: update proto to include native histogram

* ci: add a CI check to ensure whitelisted dependencies are using their main branch

* chore: add changes to Cargo.toml to trigger CI

* chore: update proto

* test: update test to include histogram
2025-12-18 14:10:33 +00:00
Lanqing Yang
658332fe68 chore(mito): nit remove extra hashset in gc workers (#7399)
chore(mito): remove extra hashset in gc workers

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-18 13:09:32 +00:00
shuiyisong
c088d361a4 chore: expose disable_ec2_metadata option (#7439)
chore: add option for disable ec2 metadata

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-18 11:55:08 +00:00
shuiyisong
a85864067e chore: remove canonicalize (#7430)
* chore: remove canonicalize

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add match file name option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update field name

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: modify tls option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update config file

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update config md

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update option to `enable_filename_match`

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issues

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove unused test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-18 09:39:10 +00:00
LFC
0df69c95aa chore: use official etcd-client (#7432)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-18 06:25:48 +00:00
McKnight22
72eede8b38 refactor(cli): unify storage configuration for export command (#7280)
* refactor(cli): unify storage configuration for export command

- Utilize ObjectStoreConfig to unify storage configuration for export command
- Support export command for Fs, S3, OSS, GCS and Azblob
- Fix the Display implementation for SecretString always returned the string
  "SecretString([REDACTED])" even when the internal secret was empty.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Change the encapsulation permissions of each configuration
  options for every storage backend to public access.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Update the implementation of ObjectStoreConfig::build_xxx() using macro solutions

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Introduce config validation for each storage type

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Enable trait-based polymorphism for storage type handling
  (from inherent impl to trait impl)
- Extract helper functions to reduce code duplication

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Improve SecretString handling and validation
  (Distinguishing between "not provided" and "empty string")
- Add validation when using filesystem storage

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Refactor storage field validation with macro

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- support GCS Application Default Credentials (like GKE, Cloud Run, or local development with ) in export
  (Enabling ADC without validating  or  to be present)
  (Making  optional in GCS validation (defaults to https://storage.googleapis.com))

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

This commit refactors the validation logic for object store configurations in the CLI to leverage clap features and reduce boilerplate.

Key changes:
- Update wrap_with_clap_prefix macro to use clap's requires attribute.
  This ensures that storage-specific options (e.g., --s3-bucket) are only accepted when the corresponding backend is enabled (e.g., --s3).
- Simplify FieldValidator trait by removing the is_provided method, as dependency checks are now handled by clap.
- Introduce validate_backend! macro to standardize the validation of required fields for enabled backends.
- Refactor ExportCommand to remove explicit validation calls (validate_s3, etc.) and rely on the validation within backend constructors.
- Add integration tests for ExportCommand to verify build success with S3, OSS, GCS, and Azblob configurations.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Use macros to simplify storage export implementation

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Rollback StorageExport trait implementation to not using macro for better code clarity and maintainability
- Introduce format_uri helper function to unify URI formatting logic
- Fix OSS URI path bug inherited from legacy code

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Remove unnecessary async_trait

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>

---------

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2025-12-18 03:16:53 +00:00
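The clap wiring described in the commit messages above (storage-specific flags only accepted when that backend is enabled) can be expressed with the derive API's `requires` attribute. A minimal, hypothetical sketch — flag names are illustrative, not the export command's exact CLI:

```rust
use clap::Parser;

/// Backend-specific flags declare `requires = "s3"`, so clap rejects them
/// unless `--s3` is also passed, replacing hand-written validation.
#[derive(Parser, Debug)]
struct ExportArgs {
    /// Enable the S3 backend.
    #[arg(long)]
    s3: bool,

    /// S3 bucket; only valid together with `--s3`.
    #[arg(long, requires = "s3")]
    s3_bucket: Option<String>,

    /// S3 endpoint; only valid together with `--s3`.
    #[arg(long, requires = "s3")]
    s3_endpoint: Option<String>,
}

fn main() {
    // `--s3-bucket my-bucket` without `--s3` fails argument parsing.
    let args = ExportArgs::parse();
    println!("{args:?}");
}
```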
jeremyhi
95eccd6cde feat: introduce granularity for memory manager (#7416)
* feat: introduce granularity for memory manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add unit test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: remove granularity getter for manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/common/memory-manager/src/manager.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* feat: acquire_with_policy for manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-17 11:08:51 +00:00
fys
0bc5a305be chore: add wait_initialized method for frontend client (#7414)
* chore: add wait_initialized method for frontend client

* fix: some

* fix: cargo fmt

* add comment

* add unit test

* rename

* fix: cargo check

* fix: cr by copilot
2025-12-17 08:13:36 +00:00
discord9
1afcddd5a9 chore: feature gate vector_index (#7428)
Signed-off-by: discord9 <discord9@163.com>
2025-12-17 07:14:25 +00:00
shuiyisong
62808b887b fix: using anonymous s3 access when ak and sk is not provided (#7425)
* chore: allow s3 anon

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: disable ec2 metadata

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-17 06:34:29 +00:00
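A hedged sketch of the decision the fix above describes, kept independent of the actual object-store builder API: fall back to anonymous S3 access only when neither credential is configured.

```rust
/// Hypothetical credential holder; the real config type lives in the object
/// store layer and carries more fields.
struct S3Credentials {
    access_key_id: Option<String>,
    secret_access_key: Option<String>,
}

/// Use anonymous access only when both the access key and secret key are
/// missing or empty; otherwise signed requests are expected.
fn use_anonymous_access(creds: &S3Credentials) -> bool {
    let empty = |v: &Option<String>| v.as_deref().map_or(true, |s| s.trim().is_empty());
    empty(&creds.access_key_id) && empty(&creds.secret_access_key)
}
```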
discord9
04ddd40e00 chore: bump version to beta.3 (#7423)
chore: bump to beta.3

Signed-off-by: discord9 <discord9@163.com>
2025-12-17 04:18:23 +00:00
liyang
b4f028be5f chore: change etcd endpoints to array in the test scripts (#7419)
chore: change etcd endpoint

Signed-off-by: liyang <daviderli614@gmail.com>
2025-12-17 03:14:35 +00:00
Lei, HUANG
da964880f5 chore: expose symbols (#7417)
* refactor/expose-symbols:
 ## Refactor `bulk/part.rs` to Simplify Mutation Handling

 - Removed the `mutations_to_record_batch` function and its associated helper functions, including `ArraysSorter`, `timestamp_array_to_iter`, and `binary_array_to_dictionary`, to simplify the mutation handling logic in `bulk/part.rs`.
 - Deleted related test functions `check_binary_array_to_dictionary` and `check_mutations_to_record_batches` from the test module, along with their associated test cases.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/expose-symbols:
 ### Commit Message

 **Refactor and Enhance Deduplication Logic**

 - **`flush.rs`**: Refactored `maybe_dedup_one` function to accept `append_mode` and `merge_mode` as parameters instead of `RegionOptions`. This change enhances flexibility in deduplication logic.
 - **`memtable/bulk.rs`**: Made `BulkRangeIterBuilder` struct and its fields public to allow external access and modification, improving extensibility.
 - **`sst.rs`**: Corrected a typo in the schema documentation, changing `__prmary_key` to `__primary_key` for clarity and accuracy.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-17 01:29:36 +00:00
dennis zhuang
a35a39f726 feat(vector_index): adds the foundational types and SQL parsing support for vector index (#7366)
* feat: adds the foundational types and SQL parsing support for vector index

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: by suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: ensure index option values must be greater than zero

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: validate connectivity strictly

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: compile error

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: disable SIMD for ci

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-16 22:45:36 +00:00
Lei, HUANG
e0c1566e92 fix(servers): flight stuck on waiting for first message (#7413)
* fix/flight-stuck-on-first-message:
 **Refactor GRPC Stream Handling and Table Resolution**

 - **`grpc.rs`**: Refactored the `GrpcQueryHandler` to resolve table references and check permissions only once per stream, improving efficiency. Introduced a mechanism to handle table resolution and permission checks after receiving the first `RecordBatch`.
 - **`flight.rs`**: Enhanced `PutRecordBatchRequestStream` to manage stream states (`Init` and `Ready`) for better handling of schema and table name extraction. Improved error handling and logging for unexpected flight messages.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: add some doc

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-16 08:54:13 +00:00
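The commit message above describes an `Init`/`Ready` state machine so the stream no longer blocks waiting for metadata before the first message. A hypothetical sketch of that shape — the real `PutRecordBatchRequestStream` works on Arrow Flight messages, not strings:

```rust
/// Two-state handling: stay in `Init` until the first message yields the
/// table name, then serve data messages from `Ready`.
enum StreamState {
    /// Waiting for the first message, which carries the table name and schema.
    Init,
    /// Table resolved and permission checked once; later messages are data only.
    Ready { table: String },
}

struct PutStream {
    state: StreamState,
}

impl PutStream {
    fn on_message(&mut self, table_hint: Option<String>) -> Option<&str> {
        if let StreamState::Init = self.state {
            // Resolve the table from the first message instead of blocking
            // before any message has arrived.
            let table = table_hint?;
            self.state = StreamState::Ready { table };
        }
        match &self.state {
            StreamState::Ready { table } => Some(table.as_str()),
            StreamState::Init => None,
        }
    }
}
```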
Yingwen
f6afb10e33 feat!: download file to fill the cache on write cache miss (#7294)
* feat: download inverted index file

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: download for bloom and fulltext

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement maybe_download_background for FileCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: load file for parquet

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reduce channel size

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use ManifestCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: pass cache to ManifestObjectStore::new

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix fmt and clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove manifest cache ttl

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove read cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: clean old read cache path

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config examples

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix CI

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: also clean the root directory

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update manifest test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: skip file if it exists

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove warn in replace

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add a flag to enable/disable background download

set the concurrency to 1 for background download

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename write_cache_enable_background_download to enable_refill_cache_on_read

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update config test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config.md

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-16 08:31:26 +00:00
dennis zhuang
2dfcf35fee feat: support function aliases and add MySQL-compatible aliases (#7410)
* feat: support function aliases and add MySQL-compatible aliases

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: get_table_function_source

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: add function_alias mod

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: license

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-16 06:56:23 +00:00
Weny Xu
f7d5c87ac0 feat: introduce copy_region_from for mito engine (#7389)
* feat: introduce `copy_region_from`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-16 06:12:06 +00:00
Weny Xu
9cd57e9342 fix: use verified recycling method for PostgreSQL connection pool (#7407)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-16 02:49:01 +00:00
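For context on the fix above: pools such as deadpool-postgres can verify a connection while recycling it instead of trusting it blindly. A hedged sketch, assuming deadpool-postgres is the pool in use (an assumption, not confirmed by the commit message):

```rust
use deadpool_postgres::{Config, ManagerConfig, Pool, RecyclingMethod, Runtime};
use tokio_postgres::NoTls;

/// Sketch: recycle connections with `Verified` so broken connections are
/// re-checked (and replaced) before being handed out again.
fn build_pool(mut cfg: Config) -> Pool {
    cfg.manager = Some(ManagerConfig {
        recycling_method: RecyclingMethod::Verified,
    });
    cfg.create_pool(Some(Runtime::Tokio1), NoTls)
        .expect("failed to build PostgreSQL pool")
}
```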
jeremyhi
32f9cc5286 feat: move memory_manager to common crate (#7408)
* feat: move memory_manager to common crate

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add license header

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by AI comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2025-12-15 13:15:33 +00:00
Yingwen
5232a12a8c feat: per file scan metrics (#7396)
* feat: collect per file metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: divide build_cost to build_part_cost and build_reader_cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: limit the file metrics num to display

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: use sorted iter to get sorted files

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: output metrics in desc order

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-15 12:52:03 +00:00
fys
913ac325e5 chore: add is_initialized method for frontend client (#7409)
chore: add `is_initialized` for frontend client
2025-12-15 12:51:09 +00:00
LFC
0c52d5bb34 fix: cpu cores got wrongly calculated to 0 (#7405)
* fix: cpu cores got wrongly calculated to 0

Signed-off-by: luofucong <luofc@foxmail.com>

* Update src/common/stat/src/resource.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-15 09:40:49 +00:00
Ruihang Xia
e0697790e6 chore: sort histogram sqlness result (#7406)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-15 08:12:12 +00:00
shuiyisong
64e74916b9 fix: TLS option validate and merge (#7401)
* chore: unify gRPC server tls behaviour

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add validate and merge tls

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove mut in func sig and add back test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-15 02:53:21 +00:00
Ruihang Xia
b601781604 feat: optimize and fix part sort on overlapping time windows (#7387)
* enforce two ends sort

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* primary end scope drain

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* correct fuzzy generator, no zero limit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* early stop check

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* correct test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* simplify implementation by removing some old logic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* what

Signed-off-by: discord9 <discord9@163.com>

* maybe

Signed-off-by: discord9 <discord9@163.com>

* fix: reread topk

Signed-off-by: discord9 <discord9@163.com>

* remove: unused topk_buffer_fulfilled method

Fixes clippy dead code warning by removing the unused method.

Signed-off-by: discord9 <discord9@163.com>

* fix: correct test expectations for windowed sort with limit

Updated test expectations in windowed sort tests to match actual algorithm behavior:
- Fixed descending sort test to expect global top 4 values [95, 94, 90, 85] instead of group-local selection
- Fixed ascending sort test to expect global smallest 4 values [5, 6, 7, 8] and adjusted read count accordingly
- Updated comments to reflect correct algorithm behavior for threshold-based boundary detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: discord9 <discord9@163.com>

* skip fuzzy test for now

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 14:04:32 +00:00
Ruihang Xia
bd3ad60910 fix: promql offset direction (#7392)
* fix: promql offset direction

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sort sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* commit forgotten file

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-12 07:51:35 +00:00
Ruihang Xia
cbfdeca64c fix: promql histogram with aggregation (#7393)
* fix: promql histogram with aggregation

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update test constructors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* redact partition number

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-12 07:32:04 +00:00
jeremyhi
baffed8c6a feat: mem manager on compaction (#7305)
* feat: mem manager on compaction

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by copilot review comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: experimental_

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: refine estimate_compaction_bytes

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: make them into config example

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by copilot comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/mito2/src/compaction.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* fix: dedup the regions waiting

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: minor change

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add AdditionalMemoryGuard for the running compaction task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: do OnExhaustedPolicy before running task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: use OwnedSemaphorePermit to impl guard

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add early_release_partial method to release a portion of memory

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: 0 bytes make request_additional unlimited

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: fail-fast on acquire

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-12 06:49:58 +00:00
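Several commits above (#7305, #7437, #7466) revolve around guarding memory with permits. A minimal sketch of the `OwnedSemaphorePermit`-backed guard idea mentioned in the commit message, with byte granularity and fail-fast acquisition; names and units are illustrative, not the project's actual API:

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore, TryAcquireError};

/// Guard holding permits proportional to the reserved bytes; the permits are
/// returned to the pool automatically when the guard is dropped.
pub struct MemoryGuard {
    _permit: OwnedSemaphorePermit,
}

pub struct MemoryManager {
    pool: Arc<Semaphore>,
    /// One permit stands for this many bytes (the "granularity").
    bytes_per_permit: u64,
}

impl MemoryManager {
    pub fn new(limit_bytes: u64, bytes_per_permit: u64) -> Self {
        assert!(bytes_per_permit > 0);
        let permits = (limit_bytes / bytes_per_permit) as usize;
        Self {
            pool: Arc::new(Semaphore::new(permits)),
            bytes_per_permit,
        }
    }

    /// Fail fast when the requested budget is not available right now.
    pub fn try_acquire(&self, bytes: u64) -> Result<MemoryGuard, TryAcquireError> {
        let permits = bytes.div_ceil(self.bytes_per_permit) as u32;
        let permit = self.pool.clone().try_acquire_many_owned(permits)?;
        Ok(MemoryGuard { _permit: permit })
    }
}
```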
discord9
11a5e1618d test: test_tracker_cleanup skip non linux (#7398)
test: skip non linux

Signed-off-by: discord9 <discord9@163.com>
2025-12-12 06:27:57 +00:00
Lanqing Yang
f5e0e94e3a chore(mito): nit avoid clone the batch object on inverted index building (#7388)
fix: avoid clone the batch object on inverted index building

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-12 04:58:37 +00:00
474 changed files with 30225 additions and 7016 deletions

View File

@@ -51,7 +51,7 @@ runs:
       run: |
         helm upgrade \
           --install my-greptimedb \
-          --set meta.backendStorage.etcd.endpoints=${{ inputs.etcd-endpoints }} \
+          --set 'meta.backendStorage.etcd.endpoints[0]=${{ inputs.etcd-endpoints }}' \
           --set meta.enableRegionFailover=${{ inputs.enable-region-failover }} \
           --set image.registry=${{ inputs.image-registry }} \
           --set image.repository=${{ inputs.image-repository }} \
@@ -70,19 +70,23 @@ runs:
           --wait \
           --wait-for-jobs
     - name: Wait for GreptimeDB
-      shell: bash
-      run: |
-        while true; do
-          PHASE=$(kubectl -n my-greptimedb get gtc my-greptimedb -o jsonpath='{.status.clusterPhase}')
-          if [ "$PHASE" == "Running" ]; then
-            echo "Cluster is ready"
-            break
-          else
-            echo "Cluster is not ready yet: Current phase: $PHASE"
-            kubectl get pods -n my-greptimedb
-            sleep 5 # wait for 5 seconds before check again.
-          fi
-        done
+      uses: nick-fields/retry@v3
+      with:
+        timeout_minutes: 3
+        max_attempts: 1
+        shell: bash
+        command: |
+          while true; do
+            PHASE=$(kubectl -n my-greptimedb get gtc my-greptimedb -o jsonpath='{.status.clusterPhase}')
+            if [ "$PHASE" == "Running" ]; then
+              echo "Cluster is ready"
+              break
+            else
+              echo "Cluster is not ready yet: Current phase: $PHASE"
+              kubectl get pods -n my-greptimedb
+              sleep 5 # wait for 5 seconds before check again.
+            fi
+          done
     - name: Print GreptimeDB info
       if: always()
       shell: bash

View File

@@ -49,6 +49,17 @@ function create_version() {
       echo "GITHUB_REF_NAME is empty in push event" >&2
       exit 1
     fi
+    # For tag releases, ensure GITHUB_REF_NAME matches the version in Cargo.toml
+    CARGO_VERSION=$(grep '^version = ' Cargo.toml | cut -d '"' -f 2 | head -n 1)
+    EXPECTED_REF_NAME="v${CARGO_VERSION}"
+    if [ "$GITHUB_REF_NAME" != "$EXPECTED_REF_NAME" ]; then
+      echo "Error: GITHUB_REF_NAME '$GITHUB_REF_NAME' does not match Cargo.toml version 'v${CARGO_VERSION}'" >&2
+      echo "Expected tag name: '$EXPECTED_REF_NAME'" >&2
+      exit 1
+    fi
     echo "$GITHUB_REF_NAME"
   elif [ "$GITHUB_EVENT_NAME" = workflow_dispatch ]; then
     echo "$NEXT_RELEASE_VERSION-$(git rev-parse --short HEAD)-$(date "+%Y%m%d-%s")"

View File

@@ -81,7 +81,7 @@ function deploy_greptimedb_cluster() {
     --create-namespace \
     --set image.tag="$GREPTIMEDB_IMAGE_TAG" \
     --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
-    --set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
+    --set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
     --set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
     -n "$install_namespace"
@@ -119,7 +119,7 @@ function deploy_greptimedb_cluster_with_s3_storage() {
     --create-namespace \
     --set image.tag="$GREPTIMEDB_IMAGE_TAG" \
     --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
-    --set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
+    --set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
     --set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
     --set objectStorage.s3.bucket="$AWS_CI_TEST_BUCKET" \
     --set objectStorage.s3.region="$AWS_REGION" \

154
.github/workflows/check-git-deps.yml vendored Normal file
View File

@@ -0,0 +1,154 @@
name: Check Git Dependencies on Main Branch
on:
  pull_request:
    branches: [main]
    paths:
      - 'Cargo.toml'
  push:
    branches: [main]
    paths:
      - 'Cargo.toml'
jobs:
  check-git-deps:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
      - name: Check git dependencies
        env:
          WHITELIST_DEPS: "greptime-proto,meter-core,meter-macros"
        run: |
          #!/bin/bash
          set -e
          echo "Checking whitelisted git dependencies..."
          # Function to check if a commit is on main branch
          check_commit_on_main() {
            local repo_url="$1"
            local commit="$2"
            local repo_name=$(basename "$repo_url" .git)
            echo "Checking $repo_name"
            echo "Repo: $repo_url"
            echo "Commit: $commit"
            # Create a temporary directory for cloning
            local temp_dir=$(mktemp -d)
            # Clone the repository
            if git clone "$repo_url" "$temp_dir" 2>/dev/null; then
              cd "$temp_dir"
              # Try to determine the main branch name
              local main_branch="main"
              if ! git rev-parse --verify origin/main >/dev/null 2>&1; then
                if git rev-parse --verify origin/master >/dev/null 2>&1; then
                  main_branch="master"
                else
                  # Try to get the default branch
                  main_branch=$(git symbolic-ref refs/remotes/origin/HEAD | sed 's@^refs/remotes/origin/@@')
                fi
              fi
              echo "Main branch: $main_branch"
              # Check if commit exists
              if git cat-file -e "$commit" 2>/dev/null; then
                # Check if commit is on main branch
                if git merge-base --is-ancestor "$commit" "origin/$main_branch" 2>/dev/null; then
                  echo "PASS: Commit $commit is on $main_branch branch"
                  cd - >/dev/null
                  rm -rf "$temp_dir"
                  return 0
                else
                  echo "FAIL: Commit $commit is NOT on $main_branch branch"
                  # Try to find which branch contains this commit
                  local branch_name=$(git branch -r --contains "$commit" 2>/dev/null | head -1 | sed 's/^[[:space:]]*origin\///' | sed 's/[[:space:]]*$//')
                  if [[ -n "$branch_name" ]]; then
                    echo "Found on branch: $branch_name"
                  fi
                  cd - >/dev/null
                  rm -rf "$temp_dir"
                  return 1
                fi
              else
                echo "FAIL: Commit $commit not found in repository"
                cd - >/dev/null
                rm -rf "$temp_dir"
                return 1
              fi
            else
              echo "FAIL: Failed to clone $repo_url"
              rm -rf "$temp_dir"
              return 1
            fi
          }
          # Extract whitelisted git dependencies from Cargo.toml
          echo "Extracting git dependencies from Cargo.toml..."
          # Create temporary array to store dependencies
          declare -a deps=()
          # Build awk pattern from whitelist
          IFS=',' read -ra WHITELIST <<< "$WHITELIST_DEPS"
          awk_pattern=""
          for dep in "${WHITELIST[@]}"; do
            if [[ -n "$awk_pattern" ]]; then
              awk_pattern="$awk_pattern|"
            fi
            awk_pattern="$awk_pattern$dep"
          done
          # Extract whitelisted dependencies
          while IFS= read -r line; do
            if [[ -n "$line" ]]; then
              deps+=("$line")
            fi
          done < <(awk -v pattern="$awk_pattern" '
            $0 ~ pattern ".*git = \"https:/" {
              match($0, /git = "([^"]+)"/, arr)
              git_url = arr[1]
              if (match($0, /rev = "([^"]+)"/, rev_arr)) {
                rev = rev_arr[1]
                print git_url " " rev
              } else {
                # Check next line for rev
                getline
                if (match($0, /rev = "([^"]+)"/, rev_arr)) {
                  rev = rev_arr[1]
                  print git_url " " rev
                }
              }
            }
          ' Cargo.toml)
          echo "Found ${#deps[@]} dependencies to check:"
          for dep in "${deps[@]}"; do
            echo " $dep"
          done
          failed=0
          for dep in "${deps[@]}"; do
            read -r repo_url commit <<< "$dep"
            if ! check_commit_on_main "$repo_url" "$commit"; then
              failed=1
            fi
          done
          echo "Check completed."
          if [[ $failed -eq 1 ]]; then
            echo "ERROR: Some git dependencies are not on their main branches!"
            echo "Please update the commits to point to main branch commits."
            exit 1
          else
            echo "SUCCESS: All git dependencies are on their main branches!"
          fi
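For reference, the awk extraction above targets workspace dependency entries shaped like the ones in this repository's Cargo.toml. A minimal sketch of the two forms it handles (the greptime-proto entry is the real one from the Cargo.toml diff further down; the second entry's URL and rev are placeholders, not taken from this compare):

# Single-line form: name, git URL and rev on one line; the first awk branch captures both.
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0e316b86d765e4718d6f0ca77b1ad179f222b822" }

# Split form: `rev` lands on the following line; the script's `getline` branch covers it.
# Placeholder URL and rev, for illustration only.
meter-core = { git = "https://github.com/<org>/<repo>.git",
    rev = "<commit-on-main>" }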


@@ -755,7 +755,7 @@ jobs:
run: ../../.github/scripts/pull-test-deps-images.sh && docker compose up -d --wait
- name: Run nextest cases
- run: cargo nextest run --workspace -F dashboard -F pg_kvbackend -F mysql_kvbackend
+ run: cargo nextest run --workspace -F dashboard -F pg_kvbackend -F mysql_kvbackend -F vector_index
env:
CARGO_BUILD_RUSTFLAGS: "-C link-arg=-fuse-ld=mold"
RUST_BACKTRACE: 1
@@ -813,7 +813,7 @@ jobs:
run: ../../.github/scripts/pull-test-deps-images.sh && docker compose up -d --wait
- name: Run nextest cases
- run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F dashboard -F pg_kvbackend -F mysql_kvbackend
+ run: cargo llvm-cov nextest --workspace --lcov --output-path lcov.info -F dashboard -F pg_kvbackend -F mysql_kvbackend -F vector_index
env:
CARGO_BUILD_RUSTFLAGS: "-C link-arg=-fuse-ld=mold"
RUST_BACKTRACE: 1

.gitignore (3 lines changed)

@@ -67,3 +67,6 @@ greptimedb_data
# Claude code
CLAUDE.md
# AGENTS.md
AGENTS.md


@@ -102,6 +102,30 @@ like `feat`/`fix`/`docs`, with a concise summary of code change following. AVOID
All commit messages SHOULD adhere to the [Conventional Commits specification](https://conventionalcommits.org/).
## AI-Assisted contributions
We have the following policy for AI-assisted PRs:
- The PR author should **understand the core ideas** behind the implementation **end-to-end**, and be able to justify the design and code during review.
- **Call out unknowns and assumptions**. It's okay not to fully understand some bits of AI-generated code. You should comment on these cases and point them out to reviewers so that they can use their knowledge of the codebase to clear up any concerns. For example, you might comment: "calling this function here seems to work, but I'm not familiar with how it works internally; I wonder if there's a race condition if it is called concurrently".
### Why fully AI-generated PRs without understanding are not helpful
Today, AI tools cannot reliably make complex changes to GreptimeDB on their own, which is why we rely on pull requests and code review.
The purposes of code review are:
1. To finish the intended task.
2. To share knowledge between authors and reviewers, as a long-term investment in the project. For this reason, even if someone familiar with the codebase could finish a task quickly, we're still happy to help a new contributor work on it, even if it takes longer.
An AI dump for an issue doesn't meet these purposes. Maintainers could finish the task faster by using AI directly, and submitters gain little knowledge if they act only as a pass-through AI proxy without understanding.
Please understand that reviewing capacity for the project is **very limited**, so large PRs that appear to lack the requisite understanding might not get reviewed, and may eventually be closed or redirected.
### Better ways to contribute than an “AI dump”
It's recommended to write a high-quality issue with a clear problem statement and a minimal, reproducible example. This can make it easier for others to contribute.
## Getting Help
There are many ways to get help when you're stuck. It is recommended to ask for help by opening an issue, with a detailed description

Cargo.lock (generated, 359 changed lines; diff suppressed because it is too large)


@@ -21,6 +21,7 @@ members = [
"src/common/grpc-expr", "src/common/grpc-expr",
"src/common/macro", "src/common/macro",
"src/common/mem-prof", "src/common/mem-prof",
"src/common/memory-manager",
"src/common/meta", "src/common/meta",
"src/common/options", "src/common/options",
"src/common/plugins", "src/common/plugins",
@@ -74,7 +75,7 @@ members = [
resolver = "2" resolver = "2"
[workspace.package] [workspace.package]
version = "1.0.0-beta.2" version = "1.0.0-beta.4"
edition = "2024" edition = "2024"
license = "Apache-2.0" license = "Apache-2.0"
@@ -102,6 +103,7 @@ aquamarine = "0.6"
arrow = { version = "56.2", features = ["prettyprint"] }
arrow-array = { version = "56.2", default-features = false, features = ["chrono-tz"] }
arrow-buffer = "56.2"
arrow-cast = "56.2"
arrow-flight = "56.2"
arrow-ipc = { version = "56.2", default-features = false, features = ["lz4", "zstd"] }
arrow-schema = { version = "56.2", features = ["serde"] }
@@ -142,14 +144,14 @@ derive_builder = "0.20"
derive_more = { version = "2.1", features = ["full"] }
dotenv = "0.15"
either = "1.15"
- etcd-client = { git = "https://github.com/GreptimeTeam/etcd-client", rev = "f62df834f0cffda355eba96691fe1a9a332b75a7", features = [
+ etcd-client = { version = "0.16.1", features = [
"tls",
"tls-roots",
] }
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
- greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0423fa30203187c75e2937a668df1da699c8b96c" }
+ greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0e316b86d765e4718d6f0ca77b1ad179f222b822" }
hex = "0.4"
http = "1"
humantime = "2.1"
@@ -187,7 +189,7 @@ paste = "1.0"
pin-project = "1.0" pin-project = "1.0"
pretty_assertions = "1.4.0" pretty_assertions = "1.4.0"
prometheus = { version = "0.13.3", features = ["process"] } prometheus = { version = "0.13.3", features = ["process"] }
promql-parser = { version = "0.6", features = ["ser"] } promql-parser = { version = "0.7.1", features = ["ser"] }
prost = { version = "0.13", features = ["no-recursion-limit"] } prost = { version = "0.13", features = ["no-recursion-limit"] }
prost-types = "0.13" prost-types = "0.13"
raft-engine = { version = "0.4.1", default-features = false } raft-engine = { version = "0.4.1", default-features = false }
@@ -266,6 +268,7 @@ common-grpc = { path = "src/common/grpc" }
common-grpc-expr = { path = "src/common/grpc-expr" }
common-macro = { path = "src/common/macro" }
common-mem-prof = { path = "src/common/mem-prof" }
common-memory-manager = { path = "src/common/memory-manager" }
common-meta = { path = "src/common/meta" }
common-options = { path = "src/common/options" }
common-plugins = { path = "src/common/plugins" }
@@ -330,7 +333,7 @@ datafusion-physical-plan = { git = "https://github.com/GreptimeTeam/datafusion.g
datafusion-datasource = { git = "https://github.com/GreptimeTeam/datafusion.git", rev = "fd4b2abcf3c3e43e94951bda452c9fd35243aab0" }
datafusion-sql = { git = "https://github.com/GreptimeTeam/datafusion.git", rev = "fd4b2abcf3c3e43e94951bda452c9fd35243aab0" }
datafusion-substrait = { git = "https://github.com/GreptimeTeam/datafusion.git", rev = "fd4b2abcf3c3e43e94951bda452c9fd35243aab0" }
- sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "4b519a5caa95472cc3988f5556813a583dd35af1" } # branch = "v0.58.x"
+ sqlparser = { git = "https://github.com/GreptimeTeam/sqlparser-rs.git", rev = "a0ce2bc6eb3e804532932f39833c32432f5c9a39" } # branch = "v0.58.x"
[profile.release]
debug = 1


@@ -14,6 +14,7 @@ BUILDX_BUILDER_NAME ?= gtbuilder
BASE_IMAGE ?= ubuntu
RUST_TOOLCHAIN ?= $(shell cat rust-toolchain.toml | grep channel | cut -d'"' -f2)
CARGO_REGISTRY_CACHE ?= ${HOME}/.cargo/registry
CARGO_GIT_CACHE ?= ${HOME}/.cargo/git
ARCH := $(shell uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')
OUTPUT_DIR := $(shell if [ "$(RELEASE)" = "true" ]; then echo "release"; elif [ ! -z "$(CARGO_PROFILE)" ]; then echo "$(CARGO_PROFILE)" ; else echo "debug"; fi)
SQLNESS_OPTS ?=
@@ -86,7 +87,7 @@ build: ## Build debug version greptime.
build-by-dev-builder: ## Build greptime by dev-builder.
docker run --network=host \
${ASSEMBLED_EXTRA_BUILD_ENV} \
- -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry \
+ -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry -v ${CARGO_GIT_CACHE}:/root/.cargo/git \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${DEV_BUILDER_IMAGE_TAG} \
make build \
CARGO_EXTENSION="${CARGO_EXTENSION}" \
@@ -100,7 +101,7 @@ build-by-dev-builder: ## Build greptime by dev-builder.
.PHONY: build-android-bin
build-android-bin: ## Build greptime binary for android.
docker run --network=host \
- -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry \
+ -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry -v ${CARGO_GIT_CACHE}:/root/.cargo/git \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-android:${DEV_BUILDER_IMAGE_TAG} \
make build \
CARGO_EXTENSION="ndk --platform 23 -t aarch64-linux-android" \
@@ -224,7 +225,7 @@ stop-etcd: ## Stop single node etcd for testing purpose.
.PHONY: run-it-in-container
run-it-in-container: start-etcd ## Run integration tests in dev-builder.
docker run --network=host \
- -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry -v /tmp:/tmp \
+ -v ${PWD}:/greptimedb -v ${CARGO_REGISTRY_CACHE}:/root/.cargo/registry -v ${CARGO_GIT_CACHE}:/root/.cargo/git -v /tmp:/tmp \
-w /greptimedb ${IMAGE_REGISTRY}/${IMAGE_NAMESPACE}/dev-builder-${BASE_IMAGE}:${DEV_BUILDER_IMAGE_TAG} \
make test sqlness-test BUILD_JOBS=${BUILD_JOBS}


@@ -17,7 +17,7 @@ Release date: {{ timestamp | date(format="%B %d, %Y") }}
{%- set breakings = commits | filter(attribute="breaking", value=true) -%}
{%- if breakings | length > 0 %}
- ## Breaking changes
+ ### Breaking changes
{% for commit in breakings %}
* {{ commit.github.pr_title }}\
{% if commit.github.username %} by \


@@ -14,11 +14,12 @@
| --- | -----| ------- | ----------- |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `default_column_prefix` | String | Unset | The default column prefix for auto-created time index and value columns. |
| `max_in_flight_write_bytes` | String | Unset | Maximum total memory for all concurrent write request bodies and messages (HTTP, gRPC, Flight).<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `write_bytes_exhausted_policy` | String | Unset | Policy when write bytes quota is exhausted.<br/>Options: "wait" (default, 10s timeout), "wait(<duration>)" (e.g., "wait(30s)"), "fail" |
| `init_regions_in_background` | Bool | `false` | Initialize all regions in the background during the startup.<br/>By default, it provides services after all regions have been initialized. |
| `init_regions_parallelism` | Integer | `16` | Parallelism of initializing regions. |
| `max_concurrent_queries` | Integer | `0` | The maximum current queries allowed to be executed. Zero means unlimited.<br/>NOTE: This setting affects scan_memory_limit's privileged tier allocation.<br/>When set, 70% of queries get privileged memory access (full scan_memory_limit).<br/>The remaining 30% get standard tier access (70% of scan_memory_limit). |
| `enable_telemetry` | Bool | `true` | Enable telemetry to collect anonymous usage data. Enabled by default. |
| `max_in_flight_write_bytes` | String | Unset | The maximum in-flight write bytes. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
@@ -26,14 +27,12 @@
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `http.max_total_body_memory` | String | Unset | Maximum total memory for all concurrent HTTP request bodies.<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `http.enable_cors` | Bool | `true` | HTTP CORS support, it's turned on by default<br/>This allows browser to access http APIs without CORS restrictions |
| `http.cors_allowed_origins` | Array | Unset | Customize allowed origins for HTTP CORS. |
| `http.prom_validation_mode` | String | `strict` | Whether to enable validation for Prometheus remote write requests.<br/>Available options:<br/>- strict: deny invalid UTF-8 strings (default).<br/>- lossy: allow invalid UTF-8 strings, replace invalid characters with REPLACEMENT_CHARACTER(U+FFFD).<br/>- unchecked: do not valid strings. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.bind_addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_total_message_memory` | String | Unset | Maximum total memory for all concurrent gRPC request messages.<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `grpc.max_connection_age` | String | Unset | The maximum connection age for gRPC connection.<br/>The value can be a human-readable time string. For example: `10m` for ten minutes or `1h` for one hour.<br/>Refer to https://grpc.io/docs/guides/keepalive/ for more details. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
| `grpc.tls.mode` | String | `disable` | TLS mode. |
@@ -83,6 +82,8 @@
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.connect_timeout` | String | `3s` | The connect timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.timeout` | String | `3s` | The timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_create_topics` | Bool | `true` | Automatically create topics for WAL.<br/>Set to `true` to automatically create topics for WAL.<br/>Otherwise, use topics named `topic_name_prefix_[0..num_topics)` |
| `wal.num_topics` | Integer | `64` | Number of topics.<br/>**It's only used when the provider is `kafka`**. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default)<br/>**It's only used when the provider is `kafka`**. |
@@ -108,9 +109,6 @@
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.enable_read_cache` | Bool | `true` | Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -141,6 +139,8 @@
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.experimental_compaction_memory_limit` | String | 0 | Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit. |
| `region_engine.mito.experimental_compaction_on_exhausted` | String | wait | Behavior when compaction cannot acquire memory from the budget.<br/>Options: "wait" (default, 10s), "wait(<duration>)", "fail" |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size`. |
@@ -154,6 +154,8 @@
| `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. |
| `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. |
| `region_engine.mito.enable_refill_cache_on_read` | Bool | `true` | Enable refilling cache on read operations (default: true).<br/>When disabled, cache refilling on read won't happen. |
| `region_engine.mito.manifest_cache_size` | String | `256MB` | Capacity for manifest cache (default: 256MB). |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
@@ -224,7 +226,8 @@
| --- | -----| ------- | ----------- |
| `default_timezone` | String | Unset | The default timezone of the server. |
| `default_column_prefix` | String | Unset | The default column prefix for auto-created time index and value columns. |
- | `max_in_flight_write_bytes` | String | Unset | The maximum in-flight write bytes. |
+ | `max_in_flight_write_bytes` | String | Unset | Maximum total memory for all concurrent write request bodies and messages (HTTP, gRPC, Flight).<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `write_bytes_exhausted_policy` | String | Unset | Policy when write bytes quota is exhausted.<br/>Options: "wait" (default, 10s timeout), "wait(<duration>)" (e.g., "wait(30s)"), "fail" |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
| `runtime.compact_rt_size` | Integer | `4` | The number of threads to execute the runtime for global write operations. |
@@ -235,7 +238,6 @@
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
| `http.body_limit` | String | `64MB` | HTTP request body limit.<br/>The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.<br/>Set to 0 to disable limit. |
| `http.max_total_body_memory` | String | Unset | Maximum total memory for all concurrent HTTP request bodies.<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `http.enable_cors` | Bool | `true` | HTTP CORS support, it's turned on by default<br/>This allows browser to access http APIs without CORS restrictions |
| `http.cors_allowed_origins` | Array | Unset | Customize allowed origins for HTTP CORS. |
| `http.prom_validation_mode` | String | `strict` | Whether to enable validation for Prometheus remote write requests.<br/>Available options:<br/>- strict: deny invalid UTF-8 strings (default).<br/>- lossy: allow invalid UTF-8 strings, replace invalid characters with REPLACEMENT_CHARACTER(U+FFFD).<br/>- unchecked: do not valid strings. |
@@ -243,7 +245,6 @@
| `grpc.bind_addr` | String | `127.0.0.1:4001` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:4001` | The address advertised to the metasrv, and used for connections from outside the host.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `grpc.bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_total_message_memory` | String | Unset | Maximum total memory for all concurrent gRPC request messages.<br/>Set to 0 to disable the limit. Default: "0" (unlimited) |
| `grpc.flight_compression` | String | `arrow_ipc` | Compression mode for frontend side Arrow IPC service. Available options:<br/>- `none`: disable all compression<br/>- `transport`: only enable gRPC transport compression (zstd)<br/>- `arrow_ipc`: only enable Arrow IPC compression (lz4)<br/>- `all`: enable all compression.<br/>Default to `none` |
| `grpc.max_connection_age` | String | Unset | The maximum connection age for gRPC connection.<br/>The value can be a human-readable time string. For example: `10m` for ten minutes or `1h` for one hour.<br/>Refer to https://grpc.io/docs/guides/keepalive/ for more details. |
| `grpc.tls` | -- | -- | gRPC server TLS options, see `mysql.tls` section. |
@@ -343,14 +344,15 @@
| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `backend` | String | `etcd_store` | The datastore for meta server.<br/>Available values:<br/>- `etcd_store` (default value)<br/>- `memory_store`<br/>- `postgres_store`<br/>- `mysql_store` |
| `meta_table_name` | String | `greptime_metakv` | Table name in RDS to store metadata. Effect when using a RDS kvbackend.<br/>**Only used when backend is `postgres_store`.** |
- | `meta_schema_name` | String | `greptime_schema` | Optional PostgreSQL schema for metadata table and election table name qualification.<br/>When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public),<br/>set this to a writable schema. GreptimeDB will use `meta_schema_name`.`meta_table_name`.<br/>GreptimeDB will NOT create the schema automatically; please ensure it exists or the user has permission.<br/>**Only used when backend is `postgres_store`.** |
+ | `meta_schema_name` | String | `greptime_schema` | Optional PostgreSQL schema for metadata table and election table name qualification.<br/>When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public),<br/>set this to a writable schema. GreptimeDB will use `meta_schema_name`.`meta_table_name`.<br/>**Only used when backend is `postgres_store`.** |
| `auto_create_schema` | Bool | `true` | Automatically create PostgreSQL schema if it doesn't exist.<br/>When enabled, the system will execute `CREATE SCHEMA IF NOT EXISTS <schema_name>`<br/>before creating metadata tables. This is useful in production environments where<br/>manual schema creation may be restricted.<br/>Default is true.<br/>Note: The PostgreSQL user must have CREATE SCHEMA permission for this to work.<br/>**Only used when backend is `postgres_store`.** |
| `meta_election_lock_id` | Integer | `1` | Advisory lock id in PostgreSQL for election. Effect when using PostgreSQL as kvbackend<br/>Only used when backend is `postgres_store`. |
| `selector` | String | `round_robin` | Datanode selector type.<br/>- `round_robin` (default value)<br/>- `lease_based`<br/>- `load_based`<br/>For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". |
| `use_memory_store` | Bool | `false` | Store data in memory. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available on GreptimeDB running on cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., s3). |
| `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection.<br/>This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started.<br/>Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. |
| `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.<br/>**This option is not recommended to be set to true, because it may lead to data loss during failover.** |
| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. |
| `heartbeat_interval` | String | `3s` | Base heartbeat interval for calculating distributed time constants.<br/>The frontend heartbeat interval is 6 times of the base heartbeat interval.<br/>The flownode/datanode heartbeat interval is 1 times of the base heartbeat interval.<br/>e.g., If the base heartbeat interval is 3s, the frontend heartbeat interval is 18s, the flownode/datanode heartbeat interval is 3s.<br/>If you change this value, you need to change the heartbeat interval of the flownode/frontend/datanode accordingly. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
@@ -360,12 +362,18 @@
| `backend_tls.cert_path` | String | `""` | Path to client certificate file (for client authentication)<br/>Like "/path/to/client.crt" |
| `backend_tls.key_path` | String | `""` | Path to client private key file (for client authentication)<br/>Like "/path/to/client.key" |
| `backend_tls.ca_cert_path` | String | `""` | Path to CA certificate file (for server certificate verification)<br/>Required when using custom CAs or self-signed certificates<br/>Leave empty to use system root certificates only<br/>Like "/path/to/ca.crt" |
| `backend_client` | -- | -- | The backend client options.<br/>Currently, only applicable when using etcd as the metadata store. |
| `backend_client.keep_alive_timeout` | String | `3s` | The keep alive timeout for backend client. |
| `backend_client.keep_alive_interval` | String | `10s` | The keep alive interval for backend client. |
| `backend_client.connect_timeout` | String | `3s` | The connect timeout for backend client. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.bind_addr` | String | `127.0.0.1:3002` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:3002` | The communication server address for the frontend and datanode to connect to metasrv.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `grpc.http2_keep_alive_interval` | String | `10s` | The server side HTTP/2 keep-alive interval |
| `grpc.http2_keep_alive_timeout` | String | `3s` | The server side HTTP/2 keep-alive timeout. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
@@ -475,6 +483,8 @@
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.connect_timeout` | String | `3s` | The connect timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.timeout` | String | `3s` | The timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_index` | Bool | `true` | Whether to enable WAL index creation.<br/>**It's only used when the provider is `kafka`**. |
@@ -486,9 +496,6 @@
| `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.enable_read_cache` | Bool | `true` | Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -521,6 +528,8 @@
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.experimental_compaction_memory_limit` | String | 0 | Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit. |
| `region_engine.mito.experimental_compaction_on_exhausted` | String | wait | Behavior when compaction cannot acquire memory from the budget.<br/>Options: "wait" (default, 10s), "wait(<duration>)", "fail" |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size` |
@@ -534,6 +543,8 @@
| `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. |
| `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. |
| `region_engine.mito.enable_refill_cache_on_read` | Bool | `true` | Enable refilling cache on read operations (default: true).<br/>When disabled, cache refilling on read won't happen. |
| `region_engine.mito.manifest_cache_size` | String | `256MB` | Capacity for manifest cache (default: 256MB). |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
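As a rough illustration, the new metasrv options documented above would look roughly like this in the metasrv configuration file (a minimal sketch: key names and defaults are taken from the table rows, while the section layout and surrounding file are assumed, not part of this diff):

## The backend client options.
## Currently, only applicable when using etcd as the metadata store.
[backend_client]
## The keep alive interval for backend client.
keep_alive_interval = "10s"
## The keep alive timeout for backend client.
keep_alive_timeout = "3s"
## The connect timeout for backend client.
connect_timeout = "3s"

[grpc]
## The server side HTTP/2 keep-alive interval and timeout.
http2_keep_alive_interval = "10s"
http2_keep_alive_timeout = "3s"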


@@ -169,6 +169,14 @@ recovery_parallelism = 2
## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"]
## The connect timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ connect_timeout = "3s"
## The timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ timeout = "3s"
## The max size of a single producer batch.
## Warning: Kafka has a default limit of 1MB per message in a topic.
## **It's only used when the provider is `kafka`**.
@@ -225,6 +233,7 @@ overwrite_entry_start_id = false
# endpoint = "https://s3.amazonaws.com"
# region = "us-west-2"
# enable_virtual_host_style = false
# disable_ec2_metadata = false
# Example of using Oss as the storage.
# [storage]
@@ -281,18 +290,6 @@ data_home = "./greptimedb_data"
## - `Oss`: the data is stored in the Aliyun OSS.
type = "File"
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage.
#+ enable_read_cache = true
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## @toml2docs:none-default
@@ -452,6 +449,15 @@ compress_manifest = false
## @toml2docs:none-default="Auto" ## @toml2docs:none-default="Auto"
#+ max_background_purges = 8 #+ max_background_purges = 8
## Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit.
## @toml2docs:none-default="0"
#+ experimental_compaction_memory_limit = "0"
## Behavior when compaction cannot acquire memory from the budget.
## Options: "wait" (default, 10s), "wait(<duration>)", "fail"
## @toml2docs:none-default="wait"
#+ experimental_compaction_on_exhausted = "wait"
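
The `wait` / `wait(<duration>)` / `fail` values above form a tiny grammar. The sketch below shows one way such a policy string could be parsed; the names are assumptions and only whole-second durations such as `wait(30s)` are handled, so this is not the actual GreptimeDB option parser.

```rust
use std::time::Duration;

// Illustrative only: models the "wait" / "wait(<duration>)" / "fail" values
// described above, not GreptimeDB's real parsing code.
#[derive(Debug, PartialEq)]
enum ExhaustedPolicy {
    Wait(Duration),
    Fail,
}

fn parse_policy(s: &str) -> Option<ExhaustedPolicy> {
    match s.trim() {
        "fail" => Some(ExhaustedPolicy::Fail),
        // Bare "wait" uses the documented default timeout of 10s.
        "wait" => Some(ExhaustedPolicy::Wait(Duration::from_secs(10))),
        other => {
            // Accept "wait(<seconds>s)"-style values.
            let inner = other.strip_prefix("wait(")?.strip_suffix(')')?;
            let secs: u64 = inner.strip_suffix('s')?.parse().ok()?;
            Some(ExhaustedPolicy::Wait(Duration::from_secs(secs)))
        }
    }
}

fn main() {
    assert_eq!(parse_policy("wait"), Some(ExhaustedPolicy::Wait(Duration::from_secs(10))));
    assert_eq!(parse_policy("wait(30s)"), Some(ExhaustedPolicy::Wait(Duration::from_secs(30))));
    assert_eq!(parse_policy("fail"), Some(ExhaustedPolicy::Fail));
    assert_eq!(parse_policy("retry"), None);
}
```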
## Interval to auto flush a region if it has not flushed yet. ## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h" auto_flush_interval = "1h"
@@ -507,6 +513,13 @@ preload_index_cache = true
## 1GiB is reserved for index files and 4GiB for data files. ## 1GiB is reserved for index files and 4GiB for data files.
index_cache_percent = 20 index_cache_percent = 20
## Enable refilling the cache on read operations (default: true).
## When disabled, reads will not refill the cache.
enable_refill_cache_on_read = true
## Capacity for manifest cache (default: 256MB).
manifest_cache_size = "256MB"
## Buffer size for SST writing. ## Buffer size for SST writing.
sst_write_buffer_size = "8MB" sst_write_buffer_size = "8MB"

View File

@@ -6,9 +6,15 @@ default_timezone = "UTC"
## @toml2docs:none-default ## @toml2docs:none-default
default_column_prefix = "greptime" default_column_prefix = "greptime"
## The maximum in-flight write bytes. ## Maximum total memory for all concurrent write request bodies and messages (HTTP, gRPC, Flight).
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default ## @toml2docs:none-default
#+ max_in_flight_write_bytes = "500MB" #+ max_in_flight_write_bytes = "1GB"
## Policy when write bytes quota is exhausted.
## Options: "wait" (default, 10s timeout), "wait(<duration>)" (e.g., "wait(30s)"), "fail"
## @toml2docs:none-default
#+ write_bytes_exhausted_policy = "wait"
## The runtime options. ## The runtime options.
#+ [runtime] #+ [runtime]
@@ -35,10 +41,6 @@ timeout = "0s"
## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`. ## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## Set to 0 to disable limit. ## Set to 0 to disable limit.
body_limit = "64MB" body_limit = "64MB"
## Maximum total memory for all concurrent HTTP request bodies.
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default
#+ max_total_body_memory = "1GB"
## HTTP CORS support, it's turned on by default ## HTTP CORS support, it's turned on by default
## This allows browser to access http APIs without CORS restrictions ## This allows browser to access http APIs without CORS restrictions
enable_cors = true enable_cors = true
@@ -62,10 +64,6 @@ bind_addr = "127.0.0.1:4001"
server_addr = "127.0.0.1:4001" server_addr = "127.0.0.1:4001"
## The number of server worker threads. ## The number of server worker threads.
runtime_size = 8 runtime_size = 8
## Maximum total memory for all concurrent gRPC request messages.
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default
#+ max_total_message_memory = "1GB"
## Compression mode for frontend side Arrow IPC service. Available options: ## Compression mode for frontend side Arrow IPC service. Available options:
## - `none`: disable all compression ## - `none`: disable all compression
## - `transport`: only enable gRPC transport compression (zstd) ## - `transport`: only enable gRPC transport compression (zstd)
@@ -131,7 +129,6 @@ key_path = ""
## For now, gRPC tls config does not support auto reload. ## For now, gRPC tls config does not support auto reload.
watch = false watch = false
## MySQL server options. ## MySQL server options.
[mysql] [mysql]
## Whether to enable. ## Whether to enable.

View File

@@ -34,11 +34,18 @@ meta_table_name = "greptime_metakv"
## Optional PostgreSQL schema for metadata table and election table name qualification. ## Optional PostgreSQL schema for metadata table and election table name qualification.
## When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public), ## When PostgreSQL public schema is not writable (e.g., PostgreSQL 15+ with restricted public),
## set this to a writable schema. GreptimeDB will use `meta_schema_name`.`meta_table_name`. ## set this to a writable schema. GreptimeDB will use `meta_schema_name`.`meta_table_name`.
## GreptimeDB will NOT create the schema automatically; please ensure it exists or the user has permission.
## **Only used when backend is `postgres_store`.** ## **Only used when backend is `postgres_store`.**
meta_schema_name = "greptime_schema" meta_schema_name = "greptime_schema"
## Automatically create PostgreSQL schema if it doesn't exist.
## When enabled, the system will execute `CREATE SCHEMA IF NOT EXISTS <schema_name>`
## before creating metadata tables. This is useful in production environments where
## manual schema creation may be restricted.
## Default is true.
## Note: The PostgreSQL user must have CREATE SCHEMA permission for this to work.
## **Only used when backend is `postgres_store`.**
auto_create_schema = true
## Advisory lock id in PostgreSQL for election. Effect when using PostgreSQL as kvbackend ## Advisory lock id in PostgreSQL for election. Effect when using PostgreSQL as kvbackend
## Only used when backend is `postgres_store`. ## Only used when backend is `postgres_store`.
meta_election_lock_id = 1 meta_election_lock_id = 1
@@ -50,9 +57,6 @@ meta_election_lock_id = 1
## For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector". ## For details, please see "https://docs.greptime.com/developer-guide/metasrv/selector".
selector = "round_robin" selector = "round_robin"
## Store data in memory.
use_memory_store = false
## Whether to enable region failover. ## Whether to enable region failover.
## This feature is only available on GreptimeDB running on cluster mode and ## This feature is only available on GreptimeDB running on cluster mode and
## - Using Remote WAL ## - Using Remote WAL
@@ -71,6 +75,13 @@ allow_region_failover_on_local_wal = false
## Max allowed idle time before removing node info from metasrv memory. ## Max allowed idle time before removing node info from metasrv memory.
node_max_idle_time = "24hours" node_max_idle_time = "24hours"
## Base heartbeat interval for calculating distributed time constants.
## The frontend heartbeat interval is 6 times the base heartbeat interval.
## The flownode/datanode heartbeat interval is equal to the base heartbeat interval.
## For example, if the base heartbeat interval is 3s, the frontend heartbeat interval is 18s and the flownode/datanode heartbeat interval is 3s.
## If you change this value, adjust the heartbeat intervals of the flownode/frontend/datanode accordingly.
#+ heartbeat_interval = "3s"
## Whether to enable greptimedb telemetry. Enabled by default. ## Whether to enable greptimedb telemetry. Enabled by default.
#+ enable_telemetry = true #+ enable_telemetry = true
@@ -109,6 +120,16 @@ key_path = ""
## Like "/path/to/ca.crt" ## Like "/path/to/ca.crt"
ca_cert_path = "" ca_cert_path = ""
## The backend client options.
## Currently, only applicable when using etcd as the metadata store.
#+ [backend_client]
## The keep alive timeout for backend client.
#+ keep_alive_timeout = "3s"
## The keep alive interval for backend client.
#+ keep_alive_interval = "10s"
## The connect timeout for backend client.
#+ connect_timeout = "3s"
## The gRPC server options. ## The gRPC server options.
[grpc] [grpc]
## The address to bind the gRPC server. ## The address to bind the gRPC server.
@@ -123,6 +144,10 @@ runtime_size = 8
max_recv_message_size = "512MB" max_recv_message_size = "512MB"
## The maximum send message size for gRPC server. ## The maximum send message size for gRPC server.
max_send_message_size = "512MB" max_send_message_size = "512MB"
## The server side HTTP/2 keep-alive interval
#+ http2_keep_alive_interval = "10s"
## The server side HTTP/2 keep-alive timeout.
#+ http2_keep_alive_timeout = "3s"
## The HTTP server options. ## The HTTP server options.
[http] [http]

View File

@@ -6,6 +6,16 @@ default_timezone = "UTC"
## @toml2docs:none-default ## @toml2docs:none-default
default_column_prefix = "greptime" default_column_prefix = "greptime"
## Maximum total memory for all concurrent write request bodies and messages (HTTP, gRPC, Flight).
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default
#+ max_in_flight_write_bytes = "1GB"
## Policy when write bytes quota is exhausted.
## Options: "wait" (default, 10s timeout), "wait(<duration>)" (e.g., "wait(30s)"), "fail"
## @toml2docs:none-default
#+ write_bytes_exhausted_policy = "wait"
## Initialize all regions in the background during the startup. ## Initialize all regions in the background during the startup.
## By default, it provides services after all regions have been initialized. ## By default, it provides services after all regions have been initialized.
init_regions_in_background = false init_regions_in_background = false
@@ -22,10 +32,6 @@ max_concurrent_queries = 0
## Enable telemetry to collect anonymous usage data. Enabled by default. ## Enable telemetry to collect anonymous usage data. Enabled by default.
#+ enable_telemetry = true #+ enable_telemetry = true
## The maximum in-flight write bytes.
## @toml2docs:none-default
#+ max_in_flight_write_bytes = "500MB"
## The runtime options. ## The runtime options.
#+ [runtime] #+ [runtime]
## The number of threads to execute the runtime for global read operations. ## The number of threads to execute the runtime for global read operations.
@@ -43,10 +49,6 @@ timeout = "0s"
## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`. ## The following units are supported: `B`, `KB`, `KiB`, `MB`, `MiB`, `GB`, `GiB`, `TB`, `TiB`, `PB`, `PiB`.
## Set to 0 to disable limit. ## Set to 0 to disable limit.
body_limit = "64MB" body_limit = "64MB"
## Maximum total memory for all concurrent HTTP request bodies.
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default
#+ max_total_body_memory = "1GB"
## HTTP CORS support, it's turned on by default ## HTTP CORS support, it's turned on by default
## This allows browser to access http APIs without CORS restrictions ## This allows browser to access http APIs without CORS restrictions
enable_cors = true enable_cors = true
@@ -67,10 +69,6 @@ prom_validation_mode = "strict"
bind_addr = "127.0.0.1:4001" bind_addr = "127.0.0.1:4001"
## The number of server worker threads. ## The number of server worker threads.
runtime_size = 8 runtime_size = 8
## Maximum total memory for all concurrent gRPC request messages.
## Set to 0 to disable the limit. Default: "0" (unlimited)
## @toml2docs:none-default
#+ max_total_message_memory = "1GB"
## The maximum connection age for gRPC connection. ## The maximum connection age for gRPC connection.
## The value can be a human-readable time string. For example: `10m` for ten minutes or `1h` for one hour. ## The value can be a human-readable time string. For example: `10m` for ten minutes or `1h` for one hour.
## Refer to https://grpc.io/docs/guides/keepalive/ for more details. ## Refer to https://grpc.io/docs/guides/keepalive/ for more details.
@@ -230,6 +228,14 @@ recovery_parallelism = 2
## **It's only used when the provider is `kafka`**. ## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"] broker_endpoints = ["127.0.0.1:9092"]
## The connect timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ connect_timeout = "3s"
## The timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ timeout = "3s"
## Automatically create topics for WAL. ## Automatically create topics for WAL.
## Set to `true` to automatically create topics for WAL. ## Set to `true` to automatically create topics for WAL.
## Otherwise, use topics named `topic_name_prefix_[0..num_topics)` ## Otherwise, use topics named `topic_name_prefix_[0..num_topics)`
@@ -332,6 +338,7 @@ max_running_procedures = 128
# endpoint = "https://s3.amazonaws.com" # endpoint = "https://s3.amazonaws.com"
# region = "us-west-2" # region = "us-west-2"
# enable_virtual_host_style = false # enable_virtual_host_style = false
# disable_ec2_metadata = false
# Example of using Oss as the storage. # Example of using Oss as the storage.
# [storage] # [storage]
@@ -388,18 +395,6 @@ data_home = "./greptimedb_data"
## - `Oss`: the data is stored in the Aliyun OSS. ## - `Oss`: the data is stored in the Aliyun OSS.
type = "File" type = "File"
## Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage.
#+ enable_read_cache = true
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name. ## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**. ## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## @toml2docs:none-default ## @toml2docs:none-default
@@ -546,6 +541,15 @@ compress_manifest = false
## @toml2docs:none-default="Auto" ## @toml2docs:none-default="Auto"
#+ max_background_purges = 8 #+ max_background_purges = 8
## Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit.
## @toml2docs:none-default="0"
#+ experimental_compaction_memory_limit = "0"
## Behavior when compaction cannot acquire memory from the budget.
## Options: "wait" (default, 10s), "wait(<duration>)", "fail"
## @toml2docs:none-default="wait"
#+ experimental_compaction_on_exhausted = "wait"
## Interval to auto flush a region if it has not flushed yet. ## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h" auto_flush_interval = "1h"
@@ -601,6 +605,13 @@ preload_index_cache = true
## 1GiB is reserved for index files and 4GiB for data files. ## 1GiB is reserved for index files and 4GiB for data files.
index_cache_percent = 20 index_cache_percent = 20
## Enable refilling the cache on read operations (default: true).
## When disabled, reads will not refill the cache.
enable_refill_cache_on_read = true
## Capacity for manifest cache (default: 256MB).
manifest_cache_size = "256MB"
## Buffer size for SST writing. ## Buffer size for SST writing.
sst_write_buffer_size = "8MB" sst_write_buffer_size = "8MB"

View File

@@ -57,6 +57,20 @@ const REPO_CONFIGS: Record<string, RepoConfig> = {
return ['bump-nightly-version.yml', version]; return ['bump-nightly-version.yml', version];
} }
// Check for prerelease versions (e.g., 1.0.0-beta.3, 1.0.0-rc.1)
const prereleaseMatch = version.match(/^(\d+)\.(\d+)\.(\d+)-(beta|rc)\.(\d+)$/);
if (prereleaseMatch) {
const [, major, minor, patch, prereleaseType, prereleaseNum] = prereleaseMatch;
// If it's beta.1 and patch version is 0, treat as major version
if (prereleaseType === 'beta' && prereleaseNum === '1' && patch === '0') {
return ['bump-version.yml', `${major}.${minor}`];
}
// Otherwise (beta.x where x > 1, or rc.x), treat as patch version
return ['bump-patch-version.yml', version];
}
const parts = version.split('.'); const parts = version.split('.');
if (parts.length !== 3) { if (parts.length !== 3) {
throw new Error('Invalid version format'); throw new Error('Invalid version format');

View File

@@ -0,0 +1,94 @@
---
Feature Name: Vector Index
Tracking Issue: TBD
Date: 2025-12-04
Author: "TBD"
---
# Summary
Introduce a per-SST approximate nearest neighbor (ANN) index for `VECTOR(dim)` columns with a pluggable engine. USearch HNSW is the initial engine, while the design keeps VSAG (default when linked) and future engines selectable at DDL or alter time and encoded in the index metadata. The index is built alongside SST creation and accelerates `ORDER BY vec_*_distance(column, <literal vector>) LIMIT k` queries, falling back to the existing brute-force path when an index is unavailable or ineligible.
# Motivation
Vector distances are currently computed with nalgebra across all rows (O(N)) before sorting, which does not scale to millions of vectors. An on-disk ANN index with sub-linear search reduces latency and compute cost for common RAG, semantic search, and recommendation workloads without changing SQL.
# Details
## Current Behavior
`VECTOR(dim)` values are stored as binary blobs. Queries call `vec_cos_distance`/`vec_l2sq_distance`/`vec_dot_product` via nalgebra for every row and then sort; there is no indexing or caching.
## Index Eligibility and Configuration
Only `VECTOR(dim)` columns can be indexed. A column metadata flag follows the existing column-option pattern with an intentionally small surface area:
- `engine`: `vsag` (default when the binding is built) or `usearch`. If the configured engine is unavailable at runtime, the builder logs a warning and falls back to `usearch` while leaving the option intact for future rebuilds.
- `metric`: `cosine` (default), `l2sq`, or `dot`; mismatches with query functions force brute-force execution.
- `m`: HNSW graph connectivity (higher = denser graph, more memory, better recall), default `16`.
- `ef_construct`: build-time expansion, default `128`.
- `ef_search`: query-time expansion, default `64`; engines may clamp values.
Option semantics mirror HNSW defaults so both USearch and VSAG can honor them; engine-specific tunables stay in reserved key-value pairs inside the blob header for forward compatibility.
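To make the option surface above concrete, here is a hedged sketch of how these options and their documented defaults could be modeled. The type and field names are illustrative, not the actual GreptimeDB structures.
```rust
// Illustrative sketch of the vector index column options listed above.
// Names and shapes are assumptions, not GreptimeDB's real types.
#[derive(Debug, Clone, Copy)]
enum VectorEngine {
    Vsag,    // default when the VSAG binding is compiled in
    Usearch, // always-available fallback
}

#[derive(Debug, Clone, Copy)]
enum VectorMetric {
    Cosine, // default; must match the query's vec_*_distance function
    L2Sq,
    Dot,
}

#[derive(Debug, Clone)]
struct VectorIndexOptions {
    engine: VectorEngine,
    metric: VectorMetric,
    m: u32,            // HNSW graph connectivity
    ef_construct: u32, // build-time expansion
    ef_search: u32,    // query-time expansion; engines may clamp it
}

impl Default for VectorIndexOptions {
    fn default() -> Self {
        Self {
            engine: VectorEngine::Vsag,
            metric: VectorMetric::Cosine,
            m: 16,
            ef_construct: 128,
            ef_search: 64,
        }
    }
}

fn main() {
    println!("{:?}", VectorIndexOptions::default());
}
```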
DDL reuses column extensions similar to inverted/fulltext indexes:
```sql
CREATE TABLE embeddings (
ts TIMESTAMP TIME INDEX,
id STRING PRIMARY KEY,
vec VECTOR(384) VECTOR INDEX WITH (engine = 'vsag', metric = 'cosine', ef_search = 64)
);
```
Altering column options toggles the flag, can switch engines (for example `usearch` -> `vsag`), and triggers rebuilds through the existing alter/compaction flow. Engine choice stays in table metadata and each blob header; new SSTs use the configured engine while older SSTs remain readable under their recorded engine until compaction or a manual rebuild rewrites them.
## Storage and Format
- One vector index per indexed column per SST, stored as a Puffin blob with type `greptime-vector-index-v1`.
- Each blob records the engine (`usearch`, `vsag`, future values) and engine parameters in the header so readers can select the matching decoder. Mixed-engine SSTs remain readable because the engine id travels with the blob.
- USearch uses `f32` vectors and SST row offsets (`u64`) as keys; nulls and `OpType::Delete` rows are skipped. Row ids are the absolute SST ordinal so readers can derive `RowSelection` directly from parquet row group lengths without extra side tables.
- Blob layout (a minimal sketch follows this list):
- Header: version, column id, dimension, engine id, metric, `m`, `ef_construct`, `ef_search`, and reserved engine-specific key-value pairs.
- Counts: total rows written and indexed rows.
- Payload: USearch binary produced by `save_to_buffer`.
- An empty index (no eligible vectors) results in no available index entry for that column.
- `puffin_manager` registers the blob type so caches and readers discover it alongside inverted/fulltext/bloom blobs in the same index file.
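A minimal Rust sketch of the blob layout above; the field names, integer widths, and engine ids are assumptions for illustration, not the shipped `greptime-vector-index-v1` encoding.
```rust
// Illustrative sketch of the per-SST vector index blob described above.
// Field names, widths, and engine ids are assumptions, not the real format.
struct VectorIndexBlobHeader {
    version: u16,
    column_id: u32,
    dimension: u32,
    engine_id: u8,                   // e.g. 0 = usearch, 1 = vsag (made-up ids)
    metric: u8,                      // cosine / l2sq / dot
    m: u32,                          // HNSW connectivity
    ef_construct: u32,
    ef_search: u32,
    reserved: Vec<(String, String)>, // engine-specific key-value pairs
}

struct VectorIndexBlob {
    header: VectorIndexBlobHeader,
    total_rows: u64,   // every row written to the SST
    indexed_rows: u64, // rows actually inserted into the graph
    payload: Vec<u8>,  // engine binary, e.g. the USearch `save_to_buffer` output
}
```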
## Row Visibility and Duplicates
- The indexer increments `row_offset` for every incoming row (including skipped/null/delete rows) so offsets stay aligned with parquet ordering across row groups.
- Only `OpType::Put` rows with the expected dimension are inserted; `OpType::Delete` and malformed rows are skipped but still advance `row_offset`, matching the data plane's visibility rules (see the sketch after this list).
- Multiple versions of the same primary key remain in the graph; the read path intersects search hits with the standard mito2 deduplication/visibility pipeline (sequence-aware dedup, delete filtering, projection) before returning results.
- Searches overfetch beyond `k` to compensate for rows discarded by visibility checks and to avoid reissuing index reads.
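A hedged sketch of the offset-tracking rule from the list above: every row advances the offset, while only in-dimension `Put` rows are inserted. `AnnIndex` and the surrounding types are placeholders, not the mito2 or USearch APIs.
```rust
// Illustrative only: mirrors the visibility rules above. `AnnIndex` stands in
// for whatever engine (USearch/VSAG) is configured; it is not a real API here.
enum OpType {
    Put,
    Delete,
}

struct Row {
    op: OpType,
    vector: Option<Vec<f32>>, // None when the VECTOR column is null
}

trait AnnIndex {
    fn insert(&mut self, key: u64, vector: &[f32]);
}

fn index_rows(rows: &[Row], dimension: usize, index: &mut dyn AnnIndex) -> (u64, u64) {
    let mut row_offset: u64 = 0; // absolute SST ordinal, spans row groups
    let mut indexed: u64 = 0;
    for row in rows {
        // Only Put rows with the declared dimension are inserted; everything
        // else is skipped but still advances the offset so keys stay aligned
        // with parquet ordering.
        if let (OpType::Put, Some(vec)) = (&row.op, &row.vector) {
            if vec.len() == dimension {
                index.insert(row_offset, vec);
                indexed += 1;
            }
        }
        row_offset += 1;
    }
    (row_offset, indexed) // (total rows written, indexed rows)
}
```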
## Build Path (mito2 write)
Extend `sst::index::Indexer` to optionally create a `VectorIndexer` when region metadata marks a column as vector-indexed, mirroring how inverted/fulltext/bloom filters attach to `IndexerBuilderImpl` in `mito2`.
The indexer consumes `Batch`/`RecordBatch` data and shares memory tracking and abort semantics with existing indexers:
- Maintain a running `row_offset` that follows SST write order and spans row groups so the search result can be turned into `RowSelection`.
- For each `OpType::Put`, if the vector is non-null and matches the declared dimension, insert into USearch with `row_offset` as the key; otherwise skip.
- Track memory with existing index build metrics; on failure, abort only the vector index while keeping SST writing unaffected.
Engine selection is table-driven: the builder picks the configured engine (default `vsag`, fallback `usearch` if `vsag` is not compiled in) and dispatches to the matching implementation. Unknown engines skip index build with a warning.
On `finish`, serialize the engine-tagged index into the Puffin writer and record `IndexType::Vector` metadata for the column. `IndexOutput` and `FileMeta::indexes/available_indexes` gain a vector entry so manifest updates and `RegionVersion` surface per-column presence, following patterns used by inverted/fulltext/bloom indexes. Planner/metadata validation ensures that mismatched dimensions only reduce the indexed-row count and do not break reads.
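Because engine selection is table-driven with a fallback, a small trait plus a selection helper is one way to keep the rest of the write pipeline engine-agnostic. The sketch below uses assumed names (`VectorIndexEngine`, `select_engine`) and is not the actual mito2/index interface.
```rust
// Illustrative sketch of the pluggable-engine idea described above.
trait VectorIndexEngine {
    /// Stable identifier written into the blob header (e.g. "usearch", "vsag").
    fn id(&self) -> &'static str;
    /// Insert one vector keyed by its absolute SST row offset.
    fn add(&mut self, row_offset: u64, vector: &[f32]);
    /// Serialize the finished graph into the Puffin blob payload.
    fn finish(self: Box<Self>) -> Vec<u8>;
}

/// Table-driven selection mirroring the behavior described above: known but
/// unavailable engines fall back to usearch, unknown engines skip the build.
fn select_engine(
    configured: &str,
    build: impl Fn(&str) -> Option<Box<dyn VectorIndexEngine>>,
) -> Option<Box<dyn VectorIndexEngine>> {
    match configured {
        // Known engines: fall back to usearch if the configured one (e.g. vsag)
        // is not compiled in, leaving the column option untouched.
        "usearch" | "vsag" => build(configured).or_else(|| {
            eprintln!("engine `{configured}` unavailable, falling back to usearch");
            build("usearch")
        }),
        // Unknown engines: skip the vector index build with a warning.
        other => {
            eprintln!("unknown vector index engine `{other}`, skipping index build");
            None
        }
    }
}
```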
## Read Path (mito2 query)
A planner rule in `query` identifies eligible plans on mito2 tables: a single `ORDER BY vec_cos_distance|vec_l2sq_distance|vec_dot_product(<vector column>, <literal vector>)` in ascending order plus a `LIMIT`/`TopK`. The rule rejects plans with multiple sort keys, non-literal query vectors, or additional projections that would change the distance expression and falls back to brute-force in those cases.
For eligible scans, build a `VectorIndexScan` execution node that:
- Consults SST metadata for `IndexType::Vector`, loads the index via Puffin using the existing `mito2::cache::index` infrastructure, and dispatches to the engine declared in the blob header (USearch/VSAG/etc.).
- Runs the engine's `search` with an overfetch (for example 2×k) to tolerate rows filtered by deletes, dimension mismatches, or late-stage dedup; keys already match SST row offsets produced by the writer.
- Converts hits to `RowSelection` using parquet row group lengths and reuses the parquet reader so visibility, projection, and deduplication logic stay unchanged; distances are recomputed with `vec_*_distance` before the final trim to k to guarantee ordering and to merge distributed partial results deterministically.
Any unsupported shape, load error, or cache miss falls back to the current brute-force execution path.
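A hedged sketch of the hit-to-selection conversion described above. Plain vectors of in-group row offsets stand in for parquet's `RowSelection`, and the function name and shape are illustrative rather than the mito2 implementation.
```rust
/// Illustrative only: groups ANN hits (absolute SST row offsets) by parquet
/// row group so they can be turned into a row selection downstream.
fn hits_to_row_groups(mut hits: Vec<u64>, row_group_lengths: &[u64]) -> Vec<Vec<u64>> {
    hits.sort_unstable();
    let mut result = vec![Vec::new(); row_group_lengths.len()];
    let mut group_start = 0u64;
    let mut group = 0usize;
    for hit in hits {
        // Advance to the row group containing this absolute offset.
        while group < row_group_lengths.len()
            && hit >= group_start + row_group_lengths[group]
        {
            group_start += row_group_lengths[group];
            group += 1;
        }
        if group < row_group_lengths.len() {
            // Store the offset relative to the start of its row group.
            result[group].push(hit - group_start);
        }
    }
    result
}

fn main() {
    // Two row groups of 4 rows each; hits 1, 3 and 6 map to
    // group 0 -> rows [1, 3] and group 1 -> row [2].
    let selection = hits_to_row_groups(vec![6, 1, 3], &[4, 4]);
    assert_eq!(selection, vec![vec![1, 3], vec![2]]);
}
```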
## Lifecycle and Maintenance
Lifecycle piggybacks on the existing SST/index flow: rebuilds run where other secondary indexes do, graphs are always rebuilt from source rows (no HNSW merge), and cleanup/versioning/caching reuse the existing Puffin and index cache paths.
# Implementation Plan
1. Add the `usearch` dependency (wrapper module in `index` or `mito2`) and map minimal HNSW options; keep an engine trait that allows plugging VSAG without changing the rest of the pipeline.
2. Introduce `IndexType::Vector` and a column metadata key for vector index options (including `engine`); add SQL parser and `SHOW CREATE TABLE` support for `VECTOR INDEX WITH (...)`.
3. Implement `vector_index` build/read modules under `mito2` (and `index` if shared), including Puffin serialization that records engine id, blob-type registration with `puffin_manager`, and integration with the `Indexer` builder, `IndexOutput`, manifest updates, and compaction rebuild.
4. Extend the query planner/execution to detect eligible plans and drive a `RowSelection`-based ANN scan with a fallback path, dispatching by engine at read time and using existing Puffin and index caches.
5. Add unit tests for serialization/search correctness and an end-to-end test covering plan rewrite, cache usage, engine selection, and fallback; add a mixed-engine test to confirm old USearch blobs still serve after a VSAG switch.
6. Follow up with an optional VSAG engine binding (feature flag), validate parity with USearch on dense vectors, exercise alternative algorithms (for example PQ), and flip the default `engine` to `vsag` when the binding is present.
# Alternatives
- **VSAG (follow-up engine):** C++ library with HNSW and additional algorithms (for example SINDI for sparse vectors and PQ) targeting in-memory and disk-friendly search. Provides parameter generators and a roadmap for GPU-assisted build and graph compression. Compared to FAISS it is newer and has fewer integrations, but it bundles sparse and dense coverage with an out-of-core focus in a single engine. Fits the pluggable-engine design and would become the default `engine = 'vsag'` when linked; USearch remains available for lighter dependencies.
- **FAISS:** Broad feature set (IVF/IVFPQ/PQ/HNSW, GPU acceleration, scalar filtering, pre/post filters) and battle-tested performance across datasets, but it requires a heavier C++/GPU toolchain, has no official Rust binding, and is less disk-centric than VSAG; integrating it would add more build/distribution burden than USearch/VSAG.
- **Do nothing:** Keep brute-force evaluation, which remains O(N) and unacceptable at scale.

View File

@@ -895,7 +895,7 @@ pub fn is_column_type_value_eq(
.unwrap_or(false) .unwrap_or(false)
} }
fn encode_json_value(value: JsonValue) -> v1::JsonValue { pub fn encode_json_value(value: JsonValue) -> v1::JsonValue {
fn helper(json: JsonVariant) -> v1::JsonValue { fn helper(json: JsonVariant) -> v1::JsonValue {
let value = match json { let value = match json {
JsonVariant::Null => None, JsonVariant::Null => None,

View File

@@ -17,8 +17,8 @@ use std::collections::HashMap;
use arrow_schema::extension::{EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY}; use arrow_schema::extension::{EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_NAME_KEY};
use datatypes::schema::{ use datatypes::schema::{
COMMENT_KEY, ColumnDefaultConstraint, ColumnSchema, FULLTEXT_KEY, FulltextAnalyzer, COMMENT_KEY, ColumnDefaultConstraint, ColumnSchema, FULLTEXT_KEY, FulltextAnalyzer,
FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, SKIPPING_INDEX_KEY, SkippingIndexOptions, FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, Metadata, SKIPPING_INDEX_KEY,
SkippingIndexType, SkippingIndexOptions, SkippingIndexType,
}; };
use greptime_proto::v1::{ use greptime_proto::v1::{
Analyzer, FulltextBackend as PbFulltextBackend, SkippingIndexType as PbSkippingIndexType, Analyzer, FulltextBackend as PbFulltextBackend, SkippingIndexType as PbSkippingIndexType,
@@ -36,6 +36,14 @@ const INVERTED_INDEX_GRPC_KEY: &str = "inverted_index";
/// Key used to store skip index options in gRPC column options. /// Key used to store skip index options in gRPC column options.
const SKIPPING_INDEX_GRPC_KEY: &str = "skipping_index"; const SKIPPING_INDEX_GRPC_KEY: &str = "skipping_index";
const COLUMN_OPTION_MAPPINGS: [(&str, &str); 5] = [
(FULLTEXT_GRPC_KEY, FULLTEXT_KEY),
(INVERTED_INDEX_GRPC_KEY, INVERTED_INDEX_KEY),
(SKIPPING_INDEX_GRPC_KEY, SKIPPING_INDEX_KEY),
(EXTENSION_TYPE_NAME_KEY, EXTENSION_TYPE_NAME_KEY),
(EXTENSION_TYPE_METADATA_KEY, EXTENSION_TYPE_METADATA_KEY),
];
/// Tries to construct a `ColumnSchema` from the given `ColumnDef`. /// Tries to construct a `ColumnSchema` from the given `ColumnDef`.
pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> { pub fn try_as_column_schema(column_def: &ColumnDef) -> Result<ColumnSchema> {
let data_type = ColumnDataTypeWrapper::try_new( let data_type = ColumnDataTypeWrapper::try_new(
@@ -131,6 +139,21 @@ pub fn try_as_column_def(column_schema: &ColumnSchema, is_primary_key: bool) ->
}) })
} }
/// Collect the [ColumnOptions] into the [Metadata] that can be used in, for example, [ColumnSchema].
pub fn collect_column_options(column_options: Option<&ColumnOptions>) -> Metadata {
let Some(ColumnOptions { options }) = column_options else {
return Metadata::default();
};
let mut metadata = Metadata::with_capacity(options.len());
for (x, y) in COLUMN_OPTION_MAPPINGS {
if let Some(v) = options.get(x) {
metadata.insert(y.to_string(), v.clone());
}
}
metadata
}
/// Constructs a `ColumnOptions` from the given `ColumnSchema`. /// Constructs a `ColumnOptions` from the given `ColumnSchema`.
pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<ColumnOptions> { pub fn options_from_column_schema(column_schema: &ColumnSchema) -> Option<ColumnOptions> {
let mut options = ColumnOptions::default(); let mut options = ColumnOptions::default();

View File

@@ -32,6 +32,7 @@ use crate::error::Result;
pub mod error; pub mod error;
pub mod information_extension; pub mod information_extension;
pub mod kvbackend; pub mod kvbackend;
#[cfg(any(test, feature = "testing"))]
pub mod memory; pub mod memory;
mod metrics; mod metrics;
pub mod system_schema; pub mod system_schema;

View File

@@ -12,8 +12,6 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
pub(crate) const METRIC_DB_LABEL: &str = "db";
use lazy_static::lazy_static; use lazy_static::lazy_static;
use prometheus::*; use prometheus::*;
@@ -25,7 +23,7 @@ lazy_static! {
pub static ref METRIC_CATALOG_MANAGER_TABLE_COUNT: IntGaugeVec = register_int_gauge_vec!( pub static ref METRIC_CATALOG_MANAGER_TABLE_COUNT: IntGaugeVec = register_int_gauge_vec!(
"greptime_catalog_table_count", "greptime_catalog_table_count",
"catalog table count", "catalog table count",
&[METRIC_DB_LABEL] &["db"]
) )
.unwrap(); .unwrap();
pub static ref METRIC_CATALOG_KV_REMOTE_GET: Histogram = pub static ref METRIC_CATALOG_KV_REMOTE_GET: Histogram =

View File

@@ -24,6 +24,7 @@ use std::sync::Arc;
use common_error::ext::BoxedError; use common_error::ext::BoxedError;
use common_recordbatch::{RecordBatchStreamWrapper, SendableRecordBatchStream}; use common_recordbatch::{RecordBatchStreamWrapper, SendableRecordBatchStream};
use common_telemetry::tracing::Span;
use datatypes::schema::SchemaRef; use datatypes::schema::SchemaRef;
use futures_util::StreamExt; use futures_util::StreamExt;
use snafu::ResultExt; use snafu::ResultExt;
@@ -163,6 +164,7 @@ impl DataSource for SystemTableDataSource {
stream: Box::pin(stream), stream: Box::pin(stream),
output_ordering: None, output_ordering: None,
metrics: Default::default(), metrics: Default::default(),
span: Span::current(),
}; };
Ok(Box::pin(stream)) Ok(Box::pin(stream))

View File

@@ -428,7 +428,7 @@ pub trait InformationExtension {
} }
/// The request to inspect the datanode. /// The request to inspect the datanode.
#[derive(Debug, Clone, PartialEq, Eq)] #[derive(Debug, Clone, PartialEq)]
pub struct DatanodeInspectRequest { pub struct DatanodeInspectRequest {
/// Kind to fetch from datanode. /// Kind to fetch from datanode.
pub kind: DatanodeInspectKind, pub kind: DatanodeInspectKind,

View File

@@ -399,8 +399,8 @@ impl InformationSchemaColumnsBuilder {
self.is_nullables.push(Some("No")); self.is_nullables.push(Some("No"));
} }
self.column_types.push(Some(&data_type)); self.column_types.push(Some(&data_type));
self.column_comments let column_comment = column_schema.column_comment().map(|x| x.as_ref());
.push(column_schema.column_comment().map(|x| x.as_ref())); self.column_comments.push(column_comment);
} }
fn finish(&mut self) -> Result<RecordBatch> { fn finish(&mut self) -> Result<RecordBatch> {

View File

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
use core::pin::pin;
use std::sync::{Arc, Weak}; use std::sync::{Arc, Weak};
use arrow_schema::SchemaRef as ArrowSchemaRef; use arrow_schema::SchemaRef as ArrowSchemaRef;
@@ -31,15 +32,17 @@ use datatypes::value::Value;
use datatypes::vectors::{ use datatypes::vectors::{
StringVectorBuilder, TimestampSecondVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder, StringVectorBuilder, TimestampSecondVectorBuilder, UInt32VectorBuilder, UInt64VectorBuilder,
}; };
use futures::TryStreamExt; use futures::StreamExt;
use snafu::{OptionExt, ResultExt}; use snafu::{OptionExt, ResultExt};
use store_api::storage::{RegionId, ScanRequest, TableId}; use store_api::storage::{ScanRequest, TableId};
use table::metadata::{TableInfo, TableType}; use table::metadata::{TableInfo, TableType};
use crate::CatalogManager; use crate::CatalogManager;
use crate::error::{ use crate::error::{
CreateRecordBatchSnafu, InternalSnafu, Result, UpgradeWeakCatalogManagerRefSnafu, CreateRecordBatchSnafu, FindRegionRoutesSnafu, InternalSnafu, Result,
UpgradeWeakCatalogManagerRefSnafu,
}; };
use crate::kvbackend::KvBackendCatalogManager;
use crate::system_schema::information_schema::{InformationTable, Predicates, TABLES}; use crate::system_schema::information_schema::{InformationTable, Predicates, TABLES};
use crate::system_schema::utils; use crate::system_schema::utils;
@@ -247,6 +250,10 @@ impl InformationSchemaTablesBuilder {
.catalog_manager .catalog_manager
.upgrade() .upgrade()
.context(UpgradeWeakCatalogManagerRefSnafu)?; .context(UpgradeWeakCatalogManagerRefSnafu)?;
let partition_manager = catalog_manager
.as_any()
.downcast_ref::<KvBackendCatalogManager>()
.map(|catalog_manager| catalog_manager.partition_manager());
let predicates = Predicates::from_scan_request(&request); let predicates = Predicates::from_scan_request(&request);
let information_extension = utils::information_extension(&self.catalog_manager)?; let information_extension = utils::information_extension(&self.catalog_manager)?;
@@ -267,37 +274,59 @@ impl InformationSchemaTablesBuilder {
}; };
for schema_name in catalog_manager.schema_names(&catalog_name, None).await? { for schema_name in catalog_manager.schema_names(&catalog_name, None).await? {
let mut stream = catalog_manager.tables(&catalog_name, &schema_name, None); let table_stream = catalog_manager.tables(&catalog_name, &schema_name, None);
while let Some(table) = stream.try_next().await? { const BATCH_SIZE: usize = 128;
let table_info = table.table_info(); // Split tables into chunks
let mut table_chunks = pin!(table_stream.ready_chunks(BATCH_SIZE));
// TODO(dennis): make it working for metric engine while let Some(tables) = table_chunks.next().await {
let table_region_stats = let tables = tables.into_iter().collect::<Result<Vec<_>>>()?;
if table_info.meta.engine == MITO_ENGINE || table_info.is_physical_table() { let mito_or_physical_table_ids = tables
table_info .iter()
.meta .filter(|table| {
.region_numbers table.table_info().meta.engine == MITO_ENGINE
.iter() || table.table_info().is_physical_table()
.map(|n| RegionId::new(table_info.ident.table_id, *n)) })
.flat_map(|region_id| { .map(|table| table.table_info().ident.table_id)
region_stats .collect::<Vec<_>>();
.binary_search_by_key(&region_id, |x| x.id)
.map(|i| &region_stats[i])
})
.collect::<Vec<_>>()
} else {
vec![]
};
self.add_table( let table_routes = if let Some(partition_manager) = &partition_manager {
&predicates, partition_manager
&catalog_name, .batch_find_region_routes(&mito_or_physical_table_ids)
&schema_name, .await
table_info, .context(FindRegionRoutesSnafu)?
table.table_type(), } else {
&table_region_stats, mito_or_physical_table_ids
); .into_iter()
.map(|id| (id, vec![]))
.collect()
};
for table in tables {
let table_region_stats =
match table_routes.get(&table.table_info().ident.table_id) {
Some(routes) => routes
.iter()
.flat_map(|route| {
let region_id = route.region.id;
region_stats
.binary_search_by_key(&region_id, |x| x.id)
.map(|i| &region_stats[i])
})
.collect::<Vec<_>>(),
None => vec![],
};
self.add_table(
&predicates,
&catalog_name,
&schema_name,
table.table_info(),
table.table_type(),
&table_region_stats,
);
}
} }
} }

View File

@@ -337,7 +337,7 @@ mod tests {
.build(); .build();
let table_metadata_manager = TableMetadataManager::new(backend); let table_metadata_manager = TableMetadataManager::new(backend);
let mut view_info = common_meta::key::test_utils::new_test_table_info(1024, vec![]); let mut view_info = common_meta::key::test_utils::new_test_table_info(1024);
view_info.table_type = TableType::View; view_info.table_type = TableType::View;
let logical_plan = vec![1, 2, 3]; let logical_plan = vec![1, 2, 3];
// Create view metadata // Create view metadata

View File

@@ -67,6 +67,7 @@ tracing-appender.workspace = true
[dev-dependencies] [dev-dependencies]
common-meta = { workspace = true, features = ["testing"] } common-meta = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
common-version.workspace = true common-version.workspace = true
serde.workspace = true serde.workspace = true
tempfile.workspace = true tempfile.workspace = true

View File

@@ -162,7 +162,6 @@ fn create_table_info(table_id: TableId, table_name: TableName) -> RawTableInfo {
next_column_id: columns as u32 + 1, next_column_id: columns as u32 + 1,
value_indices: vec![], value_indices: vec![],
options: Default::default(), options: Default::default(),
region_numbers: (1..=100).collect(),
partition_key_indices: vec![], partition_key_indices: vec![],
column_ids: vec![], column_ids: vec![],
}; };

View File

@@ -15,5 +15,8 @@
mod object_store; mod object_store;
mod store; mod store;
pub use object_store::{ObjectStoreConfig, new_fs_object_store}; pub use object_store::{
ObjectStoreConfig, PrefixedAzblobConnection, PrefixedGcsConnection, PrefixedOssConnection,
PrefixedS3Connection, new_fs_object_store,
};
pub use store::StoreConfig; pub use store::StoreConfig;

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
use common_base::secrets::SecretString; use common_base::secrets::{ExposeSecret, SecretString};
use common_error::ext::BoxedError; use common_error::ext::BoxedError;
use object_store::services::{Azblob, Fs, Gcs, Oss, S3}; use object_store::services::{Azblob, Fs, Gcs, Oss, S3};
use object_store::util::{with_instrument_layers, with_retry_layers}; use object_store::util::{with_instrument_layers, with_retry_layers};
@@ -22,9 +22,69 @@ use snafu::ResultExt;
use crate::error::{self}; use crate::error::{self};
/// Trait to convert CLI field types to target struct field types.
/// This enables `Option<SecretString>` (CLI) -> `SecretString` (target) conversions,
/// allowing us to distinguish "not provided" from "provided but empty".
trait IntoField<T> {
fn into_field(self) -> T;
}
/// Identity conversion for types that are the same.
impl<T> IntoField<T> for T {
fn into_field(self) -> T {
self
}
}
/// Convert `Option<SecretString>` to `SecretString`, using default for None.
impl IntoField<SecretString> for Option<SecretString> {
fn into_field(self) -> SecretString {
self.unwrap_or_default()
}
}
/// Trait for checking if a field is effectively empty.
///
/// **`is_empty()`**: Checks if the field has no meaningful value
/// - Used when backend is enabled to validate required fields
/// - `None`, `Some("")`, `false`, or `""` are considered empty
trait FieldValidator {
/// Check if the field is empty (has no meaningful value).
fn is_empty(&self) -> bool;
}
/// String fields: empty if the string is empty
impl FieldValidator for String {
fn is_empty(&self) -> bool {
self.is_empty()
}
}
/// Bool fields: false is considered "empty", true is "provided"
impl FieldValidator for bool {
fn is_empty(&self) -> bool {
!self
}
}
/// Option<String> fields: None or empty content is empty
impl FieldValidator for Option<String> {
fn is_empty(&self) -> bool {
self.as_ref().is_none_or(|s| s.is_empty())
}
}
/// Option<SecretString> fields: None or empty secret is empty
/// For secrets, Some("") is treated as "not provided" for both checks
impl FieldValidator for Option<SecretString> {
fn is_empty(&self) -> bool {
self.as_ref().is_none_or(|s| s.expose_secret().is_empty())
}
}
macro_rules! wrap_with_clap_prefix { macro_rules! wrap_with_clap_prefix {
( (
$new_name:ident, $prefix:literal, $base:ty, { $new_name:ident, $prefix:literal, $enable_flag:literal, $base:ty, {
$( $( #[doc = $doc:expr] )? $( #[alias = $alias:literal] )? $field:ident : $type:ty $( = $default:expr )? ),* $(,)? $( $( #[doc = $doc:expr] )? $( #[alias = $alias:literal] )? $field:ident : $type:ty $( = $default:expr )? ),* $(,)?
} }
) => { ) => {
@@ -34,15 +94,16 @@ macro_rules! wrap_with_clap_prefix {
$( $(
$( #[doc = $doc] )? $( #[doc = $doc] )?
$( #[clap(alias = $alias)] )? $( #[clap(alias = $alias)] )?
#[clap(long $(, default_value_t = $default )? )] #[clap(long, requires = $enable_flag $(, default_value_t = $default )? )]
[<$prefix $field>]: $type, pub [<$prefix $field>]: $type,
)* )*
} }
impl From<$new_name> for $base { impl From<$new_name> for $base {
fn from(w: $new_name) -> Self { fn from(w: $new_name) -> Self {
Self { Self {
$( $field: w.[<$prefix $field>] ),* // Use into_field() to handle Option<SecretString> -> SecretString conversion
$( $field: w.[<$prefix $field>].into_field() ),*
} }
} }
} }
@@ -50,9 +111,90 @@ macro_rules! wrap_with_clap_prefix {
}; };
} }
/// Macro for declarative backend validation.
///
/// # Validation Rules
///
/// For each storage backend (S3, OSS, GCS, Azblob), this function validates:
/// **When backend is enabled** (e.g., `--s3`): All required fields must be non-empty
///
/// Note: When backend is disabled, clap's `requires` attribute ensures no configuration
/// fields can be provided at parse time.
///
/// # Syntax
///
/// ```ignore
/// validate_backend!(
/// enable: self.enable_s3,
/// name: "S3",
/// required: [(field1, "name1"), (field2, "name2"), ...],
/// custom_validator: |missing| { ... } // optional
/// )
/// ```
///
/// # Arguments
///
/// - `enable`: Boolean expression indicating if backend is enabled
/// - `name`: Human-readable backend name for error messages
/// - `required`: Array of (field_ref, field_name) tuples for required fields
/// - `custom_validator`: Optional closure for complex validation logic
///
/// # Example
///
/// ```ignore
/// validate_backend!(
/// enable: self.enable_s3,
/// name: "S3",
/// required: [
/// (&self.s3.s3_bucket, "bucket"),
/// (&self.s3.s3_access_key_id, "access key ID"),
/// ]
/// )
/// ```
macro_rules! validate_backend {
(
enable: $enable:expr,
name: $backend_name:expr,
required: [ $( ($field:expr, $field_name:expr) ),* $(,)? ]
$(, custom_validator: $custom_validator:expr)?
) => {{
if $enable {
// Check required fields when backend is enabled
let mut missing = Vec::new();
$(
if FieldValidator::is_empty($field) {
missing.push($field_name);
}
)*
// Run custom validation if provided
$(
$custom_validator(&mut missing);
)?
if !missing.is_empty() {
return Err(BoxedError::new(
error::MissingConfigSnafu {
msg: format!(
"{} {} must be set when --{} is enabled.",
$backend_name,
missing.join(", "),
$backend_name.to_lowercase()
),
}
.build(),
));
}
}
Ok(())
}};
}
wrap_with_clap_prefix! { wrap_with_clap_prefix! {
PrefixedAzblobConnection, PrefixedAzblobConnection,
"azblob-", "azblob-",
"enable_azblob",
AzblobConnection, AzblobConnection,
{ {
#[doc = "The container of the object store."] #[doc = "The container of the object store."]
@@ -60,9 +202,9 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."] #[doc = "The root of the object store."]
root: String = Default::default(), root: String = Default::default(),
#[doc = "The account name of the object store."] #[doc = "The account name of the object store."]
account_name: SecretString = Default::default(), account_name: Option<SecretString>,
#[doc = "The account key of the object store."] #[doc = "The account key of the object store."]
account_key: SecretString = Default::default(), account_key: Option<SecretString>,
#[doc = "The endpoint of the object store."] #[doc = "The endpoint of the object store."]
endpoint: String = Default::default(), endpoint: String = Default::default(),
#[doc = "The SAS token of the object store."] #[doc = "The SAS token of the object store."]
@@ -70,9 +212,33 @@ wrap_with_clap_prefix! {
} }
} }
impl PrefixedAzblobConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "AzBlob",
required: [
(&self.azblob_container, "container"),
(&self.azblob_root, "root"),
(&self.azblob_account_name, "account name"),
(&self.azblob_endpoint, "endpoint"),
],
custom_validator: |missing: &mut Vec<&str>| {
// account_key is only required if sas_token is not provided
if self.azblob_sas_token.is_none()
&& self.azblob_account_key.is_empty()
{
missing.push("account key (when sas_token is not provided)");
}
}
)
}
}
wrap_with_clap_prefix! { wrap_with_clap_prefix! {
PrefixedS3Connection, PrefixedS3Connection,
"s3-", "s3-",
"enable_s3",
S3Connection, S3Connection,
{ {
#[doc = "The bucket of the object store."] #[doc = "The bucket of the object store."]
@@ -80,25 +246,39 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."] #[doc = "The root of the object store."]
root: String = Default::default(), root: String = Default::default(),
#[doc = "The access key ID of the object store."] #[doc = "The access key ID of the object store."]
access_key_id: SecretString = Default::default(), access_key_id: Option<SecretString>,
#[doc = "The secret access key of the object store."] #[doc = "The secret access key of the object store."]
secret_access_key: SecretString = Default::default(), secret_access_key: Option<SecretString>,
#[doc = "The endpoint of the object store."] #[doc = "The endpoint of the object store."]
endpoint: Option<String>, endpoint: Option<String>,
#[doc = "The region of the object store."] #[doc = "The region of the object store."]
region: Option<String>, region: Option<String>,
#[doc = "Enable virtual host style for the object store."] #[doc = "Enable virtual host style for the object store."]
enable_virtual_host_style: bool = Default::default(), enable_virtual_host_style: bool = Default::default(),
#[doc = "Allow anonymous access (disable credential signing) for testing."] #[doc = "Disable EC2 metadata service for the object store."]
allow_anonymous: bool = Default::default(), disable_ec2_metadata: bool = Default::default(),
#[doc = "Disable config load from environment and files for testing."] }
disable_config_load: bool = Default::default(), }
impl PrefixedS3Connection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "S3",
required: [
(&self.s3_bucket, "bucket"),
(&self.s3_access_key_id, "access key ID"),
(&self.s3_secret_access_key, "secret access key"),
(&self.s3_region, "region"),
]
)
} }
} }
wrap_with_clap_prefix! { wrap_with_clap_prefix! {
PrefixedOssConnection, PrefixedOssConnection,
"oss-", "oss-",
"enable_oss",
OssConnection, OssConnection,
{ {
#[doc = "The bucket of the object store."] #[doc = "The bucket of the object store."]
@@ -106,17 +286,33 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."] #[doc = "The root of the object store."]
root: String = Default::default(), root: String = Default::default(),
#[doc = "The access key ID of the object store."] #[doc = "The access key ID of the object store."]
access_key_id: SecretString = Default::default(), access_key_id: Option<SecretString>,
#[doc = "The access key secret of the object store."] #[doc = "The access key secret of the object store."]
access_key_secret: SecretString = Default::default(), access_key_secret: Option<SecretString>,
#[doc = "The endpoint of the object store."] #[doc = "The endpoint of the object store."]
endpoint: String = Default::default(), endpoint: String = Default::default(),
} }
} }
impl PrefixedOssConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "OSS",
required: [
(&self.oss_bucket, "bucket"),
(&self.oss_access_key_id, "access key ID"),
(&self.oss_access_key_secret, "access key secret"),
(&self.oss_endpoint, "endpoint"),
]
)
}
}
wrap_with_clap_prefix! { wrap_with_clap_prefix! {
PrefixedGcsConnection, PrefixedGcsConnection,
"gcs-", "gcs-",
"enable_gcs",
GcsConnection, GcsConnection,
{ {
#[doc = "The root of the object store."] #[doc = "The root of the object store."]
@@ -126,40 +322,72 @@ wrap_with_clap_prefix! {
#[doc = "The scope of the object store."] #[doc = "The scope of the object store."]
scope: String = Default::default(), scope: String = Default::default(),
#[doc = "The credential path of the object store."] #[doc = "The credential path of the object store."]
credential_path: SecretString = Default::default(), credential_path: Option<SecretString>,
#[doc = "The credential of the object store."] #[doc = "The credential of the object store."]
credential: SecretString = Default::default(), credential: Option<SecretString>,
#[doc = "The endpoint of the object store."] #[doc = "The endpoint of the object store."]
endpoint: String = Default::default(), endpoint: String = Default::default(),
} }
} }
/// common config for object store. impl PrefixedGcsConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "GCS",
required: [
(&self.gcs_bucket, "bucket"),
(&self.gcs_root, "root"),
(&self.gcs_scope, "scope"),
]
// No custom_validator needed: GCS supports Application Default Credentials (ADC)
// where neither credential_path nor credential is required.
// Endpoint is also optional (defaults to https://storage.googleapis.com).
)
}
}
/// Common config for object store.
///
/// # Dependency Enforcement
///
/// Each backend's configuration fields (e.g., `--s3-bucket`) require the corresponding
/// enable flag (e.g., `--s3`) to be present. This is enforced by `clap` at parse time
/// using the `requires` attribute.
///
/// For example, attempting to use `--s3-bucket my-bucket` without `--s3` will result in:
/// ```text
/// error: The argument '--s3-bucket <BUCKET>' requires '--s3'
/// ```
///
/// This ensures that users cannot accidentally provide backend-specific configuration
/// without explicitly enabling that backend.
#[derive(clap::Parser, Debug, Clone, PartialEq, Default)] #[derive(clap::Parser, Debug, Clone, PartialEq, Default)]
#[clap(group(clap::ArgGroup::new("storage_backend").required(false).multiple(false)))]
pub struct ObjectStoreConfig { pub struct ObjectStoreConfig {
/// Whether to use S3 object store. /// Whether to use S3 object store.
#[clap(long, alias = "s3")] #[clap(long = "s3", group = "storage_backend")]
pub enable_s3: bool, pub enable_s3: bool,
#[clap(flatten)] #[clap(flatten)]
pub s3: PrefixedS3Connection, pub s3: PrefixedS3Connection,
/// Whether to use OSS. /// Whether to use OSS.
#[clap(long, alias = "oss")] #[clap(long = "oss", group = "storage_backend")]
pub enable_oss: bool, pub enable_oss: bool,
#[clap(flatten)] #[clap(flatten)]
pub oss: PrefixedOssConnection, pub oss: PrefixedOssConnection,
/// Whether to use GCS. /// Whether to use GCS.
#[clap(long, alias = "gcs")] #[clap(long = "gcs", group = "storage_backend")]
pub enable_gcs: bool, pub enable_gcs: bool,
#[clap(flatten)] #[clap(flatten)]
pub gcs: PrefixedGcsConnection, pub gcs: PrefixedGcsConnection,
/// Whether to use Azure Blob. /// Whether to use Azure Blob.
#[clap(long, alias = "azblob")] #[clap(long = "azblob", group = "storage_backend")]
pub enable_azblob: bool, pub enable_azblob: bool,
#[clap(flatten)] #[clap(flatten)]
@@ -177,52 +405,66 @@ pub fn new_fs_object_store(root: &str) -> std::result::Result<ObjectStore, Boxed
Ok(with_instrument_layers(object_store, false)) Ok(with_instrument_layers(object_store, false))
} }
macro_rules! gen_object_store_builder {
($method:ident, $field:ident, $conn_type:ty, $service_type:ty) => {
pub fn $method(&self) -> Result<ObjectStore, BoxedError> {
let config = <$conn_type>::from(self.$field.clone());
common_telemetry::info!(
"Building object store with {}: {:?}",
stringify!($field),
config
);
let object_store = ObjectStore::new(<$service_type>::from(&config))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish();
Ok(with_instrument_layers(
with_retry_layers(object_store),
false,
))
}
};
}
impl ObjectStoreConfig { impl ObjectStoreConfig {
gen_object_store_builder!(build_s3, s3, S3Connection, S3);
gen_object_store_builder!(build_oss, oss, OssConnection, Oss);
gen_object_store_builder!(build_gcs, gcs, GcsConnection, Gcs);
gen_object_store_builder!(build_azblob, azblob, AzblobConnection, Azblob);
pub fn validate(&self) -> Result<(), BoxedError> {
if self.enable_s3 {
self.s3.validate()?;
}
if self.enable_oss {
self.oss.validate()?;
}
if self.enable_gcs {
self.gcs.validate()?;
}
if self.enable_azblob {
self.azblob.validate()?;
}
Ok(())
}
/// Builds the object store from the config. /// Builds the object store from the config.
pub fn build(&self) -> Result<Option<ObjectStore>, BoxedError> { pub fn build(&self) -> Result<Option<ObjectStore>, BoxedError> {
let object_store = if self.enable_s3 { self.validate()?;
let s3 = S3Connection::from(self.s3.clone());
common_telemetry::info!("Building object store with s3: {:?}", s3); if self.enable_s3 {
Some( self.build_s3().map(Some)
ObjectStore::new(S3::from(&s3))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
} else if self.enable_oss { } else if self.enable_oss {
let oss = OssConnection::from(self.oss.clone()); self.build_oss().map(Some)
common_telemetry::info!("Building object store with oss: {:?}", oss);
Some(
ObjectStore::new(Oss::from(&oss))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
} else if self.enable_gcs { } else if self.enable_gcs {
let gcs = GcsConnection::from(self.gcs.clone()); self.build_gcs().map(Some)
common_telemetry::info!("Building object store with gcs: {:?}", gcs);
Some(
ObjectStore::new(Gcs::from(&gcs))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
} else if self.enable_azblob { } else if self.enable_azblob {
let azblob = AzblobConnection::from(self.azblob.clone()); self.build_azblob().map(Some)
common_telemetry::info!("Building object store with azblob: {:?}", azblob);
Some(
ObjectStore::new(Azblob::from(&azblob))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
} else { } else {
None Ok(None)
}; }
let object_store = object_store
.map(|object_store| with_instrument_layers(with_retry_layers(object_store), false));
Ok(object_store)
} }
} }

View File

@@ -19,7 +19,7 @@ use common_error::ext::BoxedError;
use common_meta::kv_backend::KvBackendRef; use common_meta::kv_backend::KvBackendRef;
use common_meta::kv_backend::chroot::ChrootKvBackend; use common_meta::kv_backend::chroot::ChrootKvBackend;
use common_meta::kv_backend::etcd::EtcdStore; use common_meta::kv_backend::etcd::EtcdStore;
use meta_srv::metasrv::BackendImpl; use meta_srv::metasrv::{BackendClientOptions, BackendImpl};
use meta_srv::utils::etcd::create_etcd_client_with_tls; use meta_srv::utils::etcd::create_etcd_client_with_tls;
use servers::tls::{TlsMode, TlsOption}; use servers::tls::{TlsMode, TlsOption};
@@ -61,6 +61,12 @@ pub struct StoreConfig {
#[cfg(feature = "pg_kvbackend")] #[cfg(feature = "pg_kvbackend")]
#[clap(long)] #[clap(long)]
pub meta_schema_name: Option<String>, pub meta_schema_name: Option<String>,
/// Automatically create PostgreSQL schema if it doesn't exist (default: true).
#[cfg(feature = "pg_kvbackend")]
#[clap(long, default_value_t = true)]
pub auto_create_schema: bool,
/// TLS mode for backend store connections (etcd, PostgreSQL, MySQL) /// TLS mode for backend store connections (etcd, PostgreSQL, MySQL)
#[clap(long = "backend-tls-mode", value_enum, default_value = "disable")] #[clap(long = "backend-tls-mode", value_enum, default_value = "disable")]
pub backend_tls_mode: TlsMode, pub backend_tls_mode: TlsMode,
@@ -86,7 +92,7 @@ impl StoreConfig {
pub fn tls_config(&self) -> Option<TlsOption> { pub fn tls_config(&self) -> Option<TlsOption> {
if self.backend_tls_mode != TlsMode::Disable { if self.backend_tls_mode != TlsMode::Disable {
Some(TlsOption { Some(TlsOption {
mode: self.backend_tls_mode.clone(), mode: self.backend_tls_mode,
cert_path: self.backend_tls_cert_path.clone(), cert_path: self.backend_tls_cert_path.clone(),
key_path: self.backend_tls_key_path.clone(), key_path: self.backend_tls_key_path.clone(),
ca_cert_path: self.backend_tls_ca_cert_path.clone(), ca_cert_path: self.backend_tls_ca_cert_path.clone(),
@@ -112,9 +118,13 @@ impl StoreConfig {
let kvbackend = match self.backend { let kvbackend = match self.backend {
BackendImpl::EtcdStore => { BackendImpl::EtcdStore => {
let tls_config = self.tls_config(); let tls_config = self.tls_config();
                let etcd_client = create_etcd_client_with_tls(store_addrs, tls_config.as_ref())
                    .await
                    .map_err(BoxedError::new)?;
                let etcd_client = create_etcd_client_with_tls(
                    store_addrs,
                    &BackendClientOptions::default(),
                    tls_config.as_ref(),
                )
                .await
                .map_err(BoxedError::new)?;
Ok(EtcdStore::with_etcd_client(etcd_client, max_txn_ops)) Ok(EtcdStore::with_etcd_client(etcd_client, max_txn_ops))
} }
#[cfg(feature = "pg_kvbackend")] #[cfg(feature = "pg_kvbackend")]
@@ -134,6 +144,7 @@ impl StoreConfig {
schema_name, schema_name,
table_name, table_name,
max_txn_ops, max_txn_ops,
self.auto_create_schema,
) )
.await .await
.map_err(BoxedError::new)?) .map_err(BoxedError::new)?)

View File

@@ -14,6 +14,7 @@
mod export; mod export;
mod import; mod import;
mod storage_export;
use clap::Subcommand; use clap::Subcommand;
use client::DEFAULT_CATALOG_NAME; use client::DEFAULT_CATALOG_NAME;

File diff suppressed because it is too large

View File

@@ -0,0 +1,373 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::path::PathBuf;
use common_base::secrets::{ExposeSecret, SecretString};
use common_error::ext::BoxedError;
use crate::common::{
PrefixedAzblobConnection, PrefixedGcsConnection, PrefixedOssConnection, PrefixedS3Connection,
};
/// Helper function to extract secret string from Option<SecretString>.
/// Returns empty string if None.
fn expose_optional_secret(secret: &Option<SecretString>) -> &str {
secret
.as_ref()
.map(|s| s.expose_secret().as_str())
.unwrap_or("")
}
/// Helper function to format root path with leading slash if non-empty.
fn format_root_path(root: &str) -> String {
if root.is_empty() {
String::new()
} else {
format!("/{}", root)
}
}
/// Helper function to mask multiple secrets in a string.
fn mask_secrets(mut sql: String, secrets: &[&str]) -> String {
for secret in secrets {
if !secret.is_empty() {
sql = sql.replace(secret, "[REDACTED]");
}
}
sql
}
/// Helper function to format storage URI.
fn format_uri(scheme: &str, bucket: &str, root: &str, path: &str) -> String {
let root = format_root_path(root);
format!("{}://{}{}/{}", scheme, bucket, root, path)
}
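Taken together, these helpers behave as in the following illustrative snippet (not part of the diff):

// Illustrative expectations for the helpers above.
assert_eq!(format_root_path(""), "");
assert_eq!(format_root_path("exports"), "/exports");
assert_eq!(
    format_uri("s3", "my-bucket", "exports", "greptime/public/"),
    "s3://my-bucket/exports/greptime/public/"
);
assert_eq!(
    mask_secrets("ACCESS_KEY_ID='abc123'".to_string(), &["abc123", ""]),
    "ACCESS_KEY_ID='[REDACTED]'"
);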
/// Trait for storage backends that can be used for data export.
pub trait StorageExport: Send + Sync {
/// Generate the storage path for COPY DATABASE command.
/// Returns (path, connection_string) where connection_string includes CONNECTION clause.
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String);
/// Format the output path for logging purposes.
fn format_output_path(&self, file_path: &str) -> String;
/// Mask sensitive information in SQL commands for safe logging.
fn mask_sensitive_info(&self, sql: &str) -> String;
}
macro_rules! define_backend {
($name:ident, $config:ty) => {
#[derive(Clone)]
pub struct $name {
config: $config,
}
impl $name {
pub fn new(config: $config) -> Result<Self, BoxedError> {
config.validate()?;
Ok(Self { config })
}
}
};
}
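For readers unfamiliar with the macro, `define_backend!(S3Backend, PrefixedS3Connection)` expands to roughly the following:

// Rough expansion of define_backend!(S3Backend, PrefixedS3Connection).
#[derive(Clone)]
pub struct S3Backend {
    config: PrefixedS3Connection,
}
impl S3Backend {
    pub fn new(config: PrefixedS3Connection) -> Result<Self, BoxedError> {
        config.validate()?;
        Ok(Self { config })
    }
}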
/// Local file system storage backend.
#[derive(Clone)]
pub struct FsBackend {
output_dir: String,
}
impl FsBackend {
pub fn new(output_dir: String) -> Self {
Self { output_dir }
}
}
impl StorageExport for FsBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
if self.output_dir.is_empty() {
unreachable!("output_dir must be set when not using remote storage")
}
let path = PathBuf::from(&self.output_dir)
.join(catalog)
.join(format!("{schema}/"))
.to_string_lossy()
.to_string();
(path, String::new())
}
fn format_output_path(&self, file_path: &str) -> String {
format!("{}/{}", self.output_dir, file_path)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
sql.to_string()
}
}
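On Unix-style paths the filesystem backend therefore produces output like this (illustrative, not part of the diff):

// Illustrative only; assumes Unix-style path separators.
let fs = FsBackend::new("/tmp/export".to_string());
let (path, connection) = fs.get_storage_path("greptime", "public");
assert_eq!(path, "/tmp/export/greptime/public/");
assert_eq!(connection, "");
assert_eq!(
    fs.format_output_path("greptime/public/up.parquet"),
    "/tmp/export/greptime/public/up.parquet"
);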
define_backend!(S3Backend, PrefixedS3Connection);
impl StorageExport for S3Backend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let s3_path = format_uri(
"s3",
&self.config.s3_bucket,
&self.config.s3_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = vec![
format!(
"ACCESS_KEY_ID='{}'",
expose_optional_secret(&self.config.s3_access_key_id)
),
format!(
"SECRET_ACCESS_KEY='{}'",
expose_optional_secret(&self.config.s3_secret_access_key)
),
];
if let Some(region) = &self.config.s3_region {
connection_options.push(format!("REGION='{}'", region));
}
if let Some(endpoint) = &self.config.s3_endpoint {
connection_options.push(format!("ENDPOINT='{}'", endpoint));
}
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(s3_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"s3",
&self.config.s3_bucket,
&self.config.s3_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.s3_access_key_id),
expose_optional_secret(&self.config.s3_secret_access_key),
],
)
}
}
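The `(path, connection)` pair is what the export code splices into its COPY statement; a hypothetical caller might assemble something along these lines (the exact SQL used by the export command is outside this diff, and the credentials are placeholders):

// Hypothetical assembly of the export SQL from the trait outputs.
let (path, connection) = backend.get_storage_path("greptime", "public");
let sql = format!(
    r#"COPY DATABASE "public" TO '{}' WITH (format = 'parquet'){};"#,
    path, connection
);
// Strip credentials before the statement is logged.
let safe_sql = backend.mask_sensitive_info(&sql);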
define_backend!(OssBackend, PrefixedOssConnection);
impl StorageExport for OssBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let oss_path = format_uri(
"oss",
&self.config.oss_bucket,
&self.config.oss_root,
&format!("{}/{}/", catalog, schema),
);
let connection_options = [
format!(
"ACCESS_KEY_ID='{}'",
expose_optional_secret(&self.config.oss_access_key_id)
),
format!(
"ACCESS_KEY_SECRET='{}'",
expose_optional_secret(&self.config.oss_access_key_secret)
),
];
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(oss_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"oss",
&self.config.oss_bucket,
&self.config.oss_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.oss_access_key_id),
expose_optional_secret(&self.config.oss_access_key_secret),
],
)
}
}
define_backend!(GcsBackend, PrefixedGcsConnection);
impl StorageExport for GcsBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let gcs_path = format_uri(
"gcs",
&self.config.gcs_bucket,
&self.config.gcs_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = Vec::new();
let credential_path = expose_optional_secret(&self.config.gcs_credential_path);
if !credential_path.is_empty() {
connection_options.push(format!("CREDENTIAL_PATH='{}'", credential_path));
}
let credential = expose_optional_secret(&self.config.gcs_credential);
if !credential.is_empty() {
connection_options.push(format!("CREDENTIAL='{}'", credential));
}
if !self.config.gcs_endpoint.is_empty() {
connection_options.push(format!("ENDPOINT='{}'", self.config.gcs_endpoint));
}
let connection_str = if connection_options.is_empty() {
String::new()
} else {
format!(" CONNECTION ({})", connection_options.join(", "))
};
(gcs_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"gcs",
&self.config.gcs_bucket,
&self.config.gcs_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.gcs_credential_path),
expose_optional_secret(&self.config.gcs_credential),
],
)
}
}
define_backend!(AzblobBackend, PrefixedAzblobConnection);
impl StorageExport for AzblobBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let azblob_path = format_uri(
"azblob",
&self.config.azblob_container,
&self.config.azblob_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = vec![
format!(
"ACCOUNT_NAME='{}'",
expose_optional_secret(&self.config.azblob_account_name)
),
format!(
"ACCOUNT_KEY='{}'",
expose_optional_secret(&self.config.azblob_account_key)
),
];
if let Some(sas_token) = &self.config.azblob_sas_token {
connection_options.push(format!("SAS_TOKEN='{}'", sas_token));
}
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(azblob_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"azblob",
&self.config.azblob_container,
&self.config.azblob_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.azblob_account_name),
expose_optional_secret(&self.config.azblob_account_key),
],
)
}
}
#[derive(Clone)]
pub enum StorageType {
Fs(FsBackend),
S3(S3Backend),
Oss(OssBackend),
Gcs(GcsBackend),
Azblob(AzblobBackend),
}
impl StorageExport for StorageType {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
match self {
StorageType::Fs(backend) => backend.get_storage_path(catalog, schema),
StorageType::S3(backend) => backend.get_storage_path(catalog, schema),
StorageType::Oss(backend) => backend.get_storage_path(catalog, schema),
StorageType::Gcs(backend) => backend.get_storage_path(catalog, schema),
StorageType::Azblob(backend) => backend.get_storage_path(catalog, schema),
}
}
fn format_output_path(&self, file_path: &str) -> String {
match self {
StorageType::Fs(backend) => backend.format_output_path(file_path),
StorageType::S3(backend) => backend.format_output_path(file_path),
StorageType::Oss(backend) => backend.format_output_path(file_path),
StorageType::Gcs(backend) => backend.format_output_path(file_path),
StorageType::Azblob(backend) => backend.format_output_path(file_path),
}
}
fn mask_sensitive_info(&self, sql: &str) -> String {
match self {
StorageType::Fs(backend) => backend.mask_sensitive_info(sql),
StorageType::S3(backend) => backend.mask_sensitive_info(sql),
StorageType::Oss(backend) => backend.mask_sensitive_info(sql),
StorageType::Gcs(backend) => backend.mask_sensitive_info(sql),
StorageType::Azblob(backend) => backend.mask_sensitive_info(sql),
}
}
}
impl StorageType {
/// Returns true if the storage backend is remote (not local filesystem).
pub fn is_remote_storage(&self) -> bool {
!matches!(self, StorageType::Fs(_))
}
}
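Callers can then stay generic over the concrete backend. A minimal sketch, assuming `s3_config` is an already-parsed `PrefixedS3Connection` and that `use_s3`, `output_dir`, `catalog`, and `schema` come from the CLI:

// Minimal sketch; all free variables here are assumptions for illustration.
let storage = if use_s3 {
    StorageType::S3(S3Backend::new(s3_config)?)
} else {
    StorageType::Fs(FsBackend::new(output_dir.clone()))
};
if storage.is_remote_storage() {
    // Remote targets skip the local output-directory handling.
}
let (path, connection) = storage.get_storage_path(catalog, schema);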

View File

@@ -253,12 +253,6 @@ pub enum Error {
error: ObjectStoreError, error: ObjectStoreError,
}, },
#[snafu(display("S3 config need be set"))]
S3ConfigNotSet {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Output directory not set"))] #[snafu(display("Output directory not set"))]
OutputDirNotSet { OutputDirNotSet {
#[snafu(implicit)] #[snafu(implicit)]
@@ -364,9 +358,9 @@ impl ErrorExt for Error {
Error::Other { source, .. } => source.status_code(), Error::Other { source, .. } => source.status_code(),
Error::OpenDal { .. } | Error::InitBackend { .. } => StatusCode::Internal, Error::OpenDal { .. } | Error::InitBackend { .. } => StatusCode::Internal,
            Error::S3ConfigNotSet { .. }
            | Error::OutputDirNotSet { .. }
            | Error::EmptyStoreAddrs { .. } => StatusCode::InvalidArguments,
            Error::OutputDirNotSet { .. } | Error::EmptyStoreAddrs { .. } => {
                StatusCode::InvalidArguments
            }
Error::BuildRuntime { source, .. } => source.status_code(), Error::BuildRuntime { source, .. } => source.status_code(),

View File

@@ -37,6 +37,7 @@ use common_grpc::flight::{FlightDecoder, FlightMessage};
use common_query::Output; use common_query::Output;
use common_recordbatch::error::ExternalSnafu; use common_recordbatch::error::ExternalSnafu;
use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper}; use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper};
use common_telemetry::tracing::Span;
use common_telemetry::tracing_context::W3cTrace; use common_telemetry::tracing_context::W3cTrace;
use common_telemetry::{error, warn}; use common_telemetry::{error, warn};
use futures::future; use futures::future;
@@ -456,6 +457,7 @@ impl Database {
stream, stream,
output_ordering: None, output_ordering: None,
metrics: Default::default(), metrics: Default::default(),
span: Span::current(),
}; };
Ok(Output::new_with_stream(Box::pin(record_batch_stream))) Ok(Output::new_with_stream(Box::pin(record_batch_stream)))
} }

View File

@@ -30,6 +30,7 @@ use common_query::request::QueryRequest;
use common_recordbatch::error::ExternalSnafu; use common_recordbatch::error::ExternalSnafu;
use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper, SendableRecordBatchStream}; use common_recordbatch::{RecordBatch, RecordBatchStreamWrapper, SendableRecordBatchStream};
use common_telemetry::error; use common_telemetry::error;
use common_telemetry::tracing::Span;
use common_telemetry::tracing_context::TracingContext; use common_telemetry::tracing_context::TracingContext;
use prost::Message; use prost::Message;
use query::query_engine::DefaultSerializer; use query::query_engine::DefaultSerializer;
@@ -242,6 +243,7 @@ impl RegionRequester {
stream, stream,
output_ordering: None, output_ordering: None,
metrics, metrics,
span: Span::current(),
}; };
Ok(Box::pin(record_batch_stream)) Ok(Box::pin(record_batch_stream))
} }

View File

@@ -18,6 +18,7 @@ default = [
] ]
enterprise = ["common-meta/enterprise", "frontend/enterprise", "meta-srv/enterprise"] enterprise = ["common-meta/enterprise", "frontend/enterprise", "meta-srv/enterprise"]
tokio-console = ["common-telemetry/tokio-console"] tokio-console = ["common-telemetry/tokio-console"]
vector_index = ["mito2/vector_index"]
[lints] [lints]
workspace = true workspace = true

View File

@@ -330,7 +330,6 @@ mod tests {
use common_config::ENV_VAR_SEP; use common_config::ENV_VAR_SEP;
use common_test_util::temp_dir::create_named_temp_file; use common_test_util::temp_dir::create_named_temp_file;
use object_store::config::{FileConfig, GcsConfig, ObjectStoreConfig, S3Config}; use object_store::config::{FileConfig, GcsConfig, ObjectStoreConfig, S3Config};
use servers::heartbeat_options::HeartbeatOptions;
use super::*; use super::*;
use crate::options::GlobalOptions; use crate::options::GlobalOptions;
@@ -374,9 +373,6 @@ mod tests {
hostname = "127.0.0.1" hostname = "127.0.0.1"
runtime_size = 8 runtime_size = 8
[heartbeat]
interval = "300ms"
[meta_client] [meta_client]
metasrv_addrs = ["127.0.0.1:3002"] metasrv_addrs = ["127.0.0.1:3002"]
timeout = "3s" timeout = "3s"
@@ -434,13 +430,6 @@ mod tests {
); );
assert!(!raft_engine_config.sync_write); assert!(!raft_engine_config.sync_write);
let HeartbeatOptions {
interval: heart_beat_interval,
..
} = options.heartbeat;
assert_eq!(300, heart_beat_interval.as_millis());
let MetaClientOptions { let MetaClientOptions {
metasrv_addrs: metasrv_addr, metasrv_addrs: metasrv_addr,
timeout, timeout,

View File

@@ -145,6 +145,17 @@ impl ObjbenchCommand {
let region_meta = extract_region_metadata(&self.source, &parquet_meta)?; let region_meta = extract_region_metadata(&self.source, &parquet_meta)?;
let num_rows = parquet_meta.file_metadata().num_rows() as u64; let num_rows = parquet_meta.file_metadata().num_rows() as u64;
let num_row_groups = parquet_meta.num_row_groups() as u64; let num_row_groups = parquet_meta.num_row_groups() as u64;
let max_row_group_uncompressed_size: u64 = parquet_meta
.row_groups()
.iter()
.map(|rg| {
rg.columns()
.iter()
.map(|c| c.uncompressed_size() as u64)
.sum::<u64>()
})
.max()
.unwrap_or(0);
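// Worked example (illustrative): for a file with two row groups whose per-column
// uncompressed sizes are [3 MiB, 5 MiB] and [4 MiB, 2 MiB], the per-group sums are
// 8 MiB and 6 MiB, so max_row_group_uncompressed_size is 8 MiB; a file without row
// groups yields 0 via unwrap_or(0).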
println!( println!(
"{} Metadata loaded - rows: {}, size: {} bytes", "{} Metadata loaded - rows: {}, size: {} bytes",
@@ -160,6 +171,7 @@ impl ObjbenchCommand {
time_range: Default::default(), time_range: Default::default(),
level: 0, level: 0,
file_size, file_size,
max_row_group_uncompressed_size,
available_indexes: Default::default(), available_indexes: Default::default(),
indexes: Default::default(), indexes: Default::default(),
index_file_size: 0, index_file_size: 0,
@@ -221,6 +233,8 @@ impl ObjbenchCommand {
inverted_index_config: MitoConfig::default().inverted_index, inverted_index_config: MitoConfig::default().inverted_index,
fulltext_index_config, fulltext_index_config,
bloom_filter_index_config: MitoConfig::default().bloom_filter_index, bloom_filter_index_config: MitoConfig::default().bloom_filter_index,
#[cfg(feature = "vector_index")]
vector_index_config: Default::default(),
}; };
// Write SST // Write SST

View File

@@ -358,7 +358,6 @@ impl StartCommand {
let heartbeat_task = flow::heartbeat::HeartbeatTask::new( let heartbeat_task = flow::heartbeat::HeartbeatTask::new(
&opts, &opts,
meta_client.clone(), meta_client.clone(),
opts.heartbeat.clone(),
Arc::new(executor), Arc::new(executor),
Arc::new(resource_stat), Arc::new(resource_stat),
); );

View File

@@ -52,7 +52,7 @@ use plugins::frontend::context::{
}; };
use servers::addrs; use servers::addrs;
use servers::grpc::GrpcOptions; use servers::grpc::GrpcOptions;
use servers::tls::{TlsMode, TlsOption}; use servers::tls::{TlsMode, TlsOption, merge_tls_option};
use snafu::{OptionExt, ResultExt}; use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard; use tracing_appender::non_blocking::WorkerGuard;
@@ -236,7 +236,7 @@ impl StartCommand {
}; };
let tls_opts = TlsOption::new( let tls_opts = TlsOption::new(
self.tls_mode.clone(), self.tls_mode,
self.tls_cert_path.clone(), self.tls_cert_path.clone(),
self.tls_key_path.clone(), self.tls_key_path.clone(),
self.tls_watch, self.tls_watch,
@@ -256,7 +256,7 @@ impl StartCommand {
if let Some(addr) = &self.rpc_bind_addr { if let Some(addr) = &self.rpc_bind_addr {
opts.grpc.bind_addr.clone_from(addr); opts.grpc.bind_addr.clone_from(addr);
opts.grpc.tls = tls_opts.clone(); opts.grpc.tls = merge_tls_option(&opts.grpc.tls, tls_opts.clone());
} }
if let Some(addr) = &self.rpc_server_addr { if let Some(addr) = &self.rpc_server_addr {
@@ -291,13 +291,13 @@ impl StartCommand {
if let Some(addr) = &self.mysql_addr { if let Some(addr) = &self.mysql_addr {
opts.mysql.enable = true; opts.mysql.enable = true;
opts.mysql.addr.clone_from(addr); opts.mysql.addr.clone_from(addr);
opts.mysql.tls = tls_opts.clone(); opts.mysql.tls = merge_tls_option(&opts.mysql.tls, tls_opts.clone());
} }
if let Some(addr) = &self.postgres_addr { if let Some(addr) = &self.postgres_addr {
opts.postgres.enable = true; opts.postgres.enable = true;
opts.postgres.addr.clone_from(addr); opts.postgres.addr.clone_from(addr);
opts.postgres.tls = tls_opts; opts.postgres.tls = merge_tls_option(&opts.postgres.tls, tls_opts.clone());
} }
if let Some(enable) = self.influxdb_enable { if let Some(enable) = self.influxdb_enable {

View File

@@ -108,7 +108,7 @@ pub trait App: Send {
} }
} }
/// Log the versions of the application, and the arguments passed to the cli. /// Log the versions of the application.
/// ///
/// `version` should be the same as the output of cli "--version"; /// `version` should be the same as the output of cli "--version";
/// and the `short_version` is the short version of the codes, often consist of git branch and commit. /// and the `short_version` is the short version of the codes, often consist of git branch and commit.
@@ -118,10 +118,7 @@ pub fn log_versions(version: &str, short_version: &str, app: &str) {
.with_label_values(&[common_version::version(), short_version, app]) .with_label_values(&[common_version::version(), short_version, app])
.inc(); .inc();
// Log version and argument flags.
info!("GreptimeDB version: {}", version); info!("GreptimeDB version: {}", version);
log_env_flags();
} }
pub fn create_resource_limit_metrics(app: &str) { pub fn create_resource_limit_metrics(app: &str) {
@@ -144,13 +141,6 @@ pub fn create_resource_limit_metrics(app: &str) {
} }
} }
fn log_env_flags() {
info!("command line arguments");
for argument in std::env::args() {
info!("argument: {}", argument);
}
}
pub fn maybe_activate_heap_profile(memory_options: &common_options::memory::MemoryOptions) { pub fn maybe_activate_heap_profile(memory_options: &common_options::memory::MemoryOptions) {
if memory_options.enable_heap_profiling { if memory_options.enable_heap_profiling {
match activate_heap_profile() { match activate_heap_profile() {

View File

@@ -20,6 +20,7 @@ use async_trait::async_trait;
use clap::Parser; use clap::Parser;
use common_base::Plugins; use common_base::Plugins;
use common_config::Configurable; use common_config::Configurable;
use common_meta::distributed_time_constants::init_distributed_time_constants;
use common_telemetry::info; use common_telemetry::info;
use common_telemetry::logging::{DEFAULT_LOGGING_DIR, TracingOptions}; use common_telemetry::logging::{DEFAULT_LOGGING_DIR, TracingOptions};
use common_version::{short_version, verbose_version}; use common_version::{short_version, verbose_version};
@@ -154,8 +155,6 @@ pub struct StartCommand {
#[clap(short, long)] #[clap(short, long)]
selector: Option<String>, selector: Option<String>,
#[clap(long)] #[clap(long)]
use_memory_store: Option<bool>,
#[clap(long)]
enable_region_failover: Option<bool>, enable_region_failover: Option<bool>,
#[clap(long)] #[clap(long)]
http_addr: Option<String>, http_addr: Option<String>,
@@ -185,7 +184,6 @@ impl Debug for StartCommand {
.field("store_addrs", &self.sanitize_store_addrs()) .field("store_addrs", &self.sanitize_store_addrs())
.field("config_file", &self.config_file) .field("config_file", &self.config_file)
.field("selector", &self.selector) .field("selector", &self.selector)
.field("use_memory_store", &self.use_memory_store)
.field("enable_region_failover", &self.enable_region_failover) .field("enable_region_failover", &self.enable_region_failover)
.field("http_addr", &self.http_addr) .field("http_addr", &self.http_addr)
.field("http_timeout", &self.http_timeout) .field("http_timeout", &self.http_timeout)
@@ -267,10 +265,6 @@ impl StartCommand {
.context(error::UnsupportedSelectorTypeSnafu { selector_type })?; .context(error::UnsupportedSelectorTypeSnafu { selector_type })?;
} }
if let Some(use_memory_store) = self.use_memory_store {
opts.use_memory_store = use_memory_store;
}
if let Some(enable_region_failover) = self.enable_region_failover { if let Some(enable_region_failover) = self.enable_region_failover {
opts.enable_region_failover = enable_region_failover; opts.enable_region_failover = enable_region_failover;
} }
@@ -327,6 +321,7 @@ impl StartCommand {
log_versions(verbose_version(), short_version(), APP_NAME); log_versions(verbose_version(), short_version(), APP_NAME);
maybe_activate_heap_profile(&opts.component.memory); maybe_activate_heap_profile(&opts.component.memory);
create_resource_limit_metrics(APP_NAME); create_resource_limit_metrics(APP_NAME);
init_distributed_time_constants(opts.component.heartbeat_interval);
info!("Metasrv start command: {:#?}", self); info!("Metasrv start command: {:#?}", self);
@@ -389,7 +384,6 @@ mod tests {
server_addr = "127.0.0.1:3002" server_addr = "127.0.0.1:3002"
store_addr = "127.0.0.1:2379" store_addr = "127.0.0.1:2379"
selector = "LeaseBased" selector = "LeaseBased"
use_memory_store = false
[logging] [logging]
level = "debug" level = "debug"
@@ -468,7 +462,6 @@ mod tests {
server_addr = "127.0.0.1:3002" server_addr = "127.0.0.1:3002"
datanode_lease_secs = 15 datanode_lease_secs = 15
selector = "LeaseBased" selector = "LeaseBased"
use_memory_store = false
[http] [http]
addr = "127.0.0.1:4000" addr = "127.0.0.1:4000"

View File

@@ -62,7 +62,7 @@ use plugins::frontend::context::{
CatalogManagerConfigureContext, StandaloneCatalogManagerConfigureContext, CatalogManagerConfigureContext, StandaloneCatalogManagerConfigureContext,
}; };
use plugins::standalone::context::DdlManagerConfigureContext; use plugins::standalone::context::DdlManagerConfigureContext;
use servers::tls::{TlsMode, TlsOption}; use servers::tls::{TlsMode, TlsOption, merge_tls_option};
use snafu::ResultExt; use snafu::ResultExt;
use standalone::StandaloneInformationExtension; use standalone::StandaloneInformationExtension;
use standalone::options::StandaloneOptions; use standalone::options::StandaloneOptions;
@@ -261,7 +261,7 @@ impl StartCommand {
}; };
let tls_opts = TlsOption::new( let tls_opts = TlsOption::new(
self.tls_mode.clone(), self.tls_mode,
self.tls_cert_path.clone(), self.tls_cert_path.clone(),
self.tls_key_path.clone(), self.tls_key_path.clone(),
self.tls_watch, self.tls_watch,
@@ -293,19 +293,20 @@ impl StartCommand {
), ),
}.fail(); }.fail();
} }
opts.grpc.bind_addr.clone_from(addr) opts.grpc.bind_addr.clone_from(addr);
opts.grpc.tls = merge_tls_option(&opts.grpc.tls, tls_opts.clone());
} }
if let Some(addr) = &self.mysql_addr { if let Some(addr) = &self.mysql_addr {
opts.mysql.enable = true; opts.mysql.enable = true;
opts.mysql.addr.clone_from(addr); opts.mysql.addr.clone_from(addr);
opts.mysql.tls = tls_opts.clone(); opts.mysql.tls = merge_tls_option(&opts.mysql.tls, tls_opts.clone());
} }
if let Some(addr) = &self.postgres_addr { if let Some(addr) = &self.postgres_addr {
opts.postgres.enable = true; opts.postgres.enable = true;
opts.postgres.addr.clone_from(addr); opts.postgres.addr.clone_from(addr);
opts.postgres.tls = tls_opts; opts.postgres.tls = merge_tls_option(&opts.postgres.tls, tls_opts.clone());
} }
if self.influxdb_enable { if self.influxdb_enable {
@@ -551,9 +552,8 @@ impl StartCommand {
let grpc_handler = fe_instance.clone() as Arc<dyn GrpcQueryHandlerWithBoxedError>; let grpc_handler = fe_instance.clone() as Arc<dyn GrpcQueryHandlerWithBoxedError>;
let weak_grpc_handler = Arc::downgrade(&grpc_handler); let weak_grpc_handler = Arc::downgrade(&grpc_handler);
        frontend_instance_handler
            .lock()
            .unwrap()
            .replace(weak_grpc_handler);
        frontend_instance_handler
            .set_handler(weak_grpc_handler)
            .await;
// set the frontend invoker for flownode // set the frontend invoker for flownode
let flow_streaming_engine = flownode.flow_engine().streaming_engine(); let flow_streaming_engine = flownode.flow_engine().streaming_engine();
@@ -765,7 +765,6 @@ mod tests {
user_provider: Some("static_user_provider:cmd:test=test".to_string()), user_provider: Some("static_user_provider:cmd:test=test".to_string()),
mysql_addr: Some("127.0.0.1:4002".to_string()), mysql_addr: Some("127.0.0.1:4002".to_string()),
postgres_addr: Some("127.0.0.1:4003".to_string()), postgres_addr: Some("127.0.0.1:4003".to_string()),
tls_watch: true,
..Default::default() ..Default::default()
}; };
@@ -782,8 +781,6 @@ mod tests {
assert_eq!("./greptimedb_data/test/logs", opts.logging.dir); assert_eq!("./greptimedb_data/test/logs", opts.logging.dir);
assert_eq!("debug", opts.logging.level.unwrap()); assert_eq!("debug", opts.logging.level.unwrap());
assert!(opts.mysql.tls.watch);
assert!(opts.postgres.tls.watch);
} }
#[test] #[test]

View File

@@ -228,7 +228,6 @@ fn test_load_flownode_example_config() {
..Default::default() ..Default::default()
}, },
tracing: Default::default(), tracing: Default::default(),
heartbeat: Default::default(),
// flownode deliberately use a slower query parallelism // flownode deliberately use a slower query parallelism
// to avoid overwhelming the frontend with too many queries // to avoid overwhelming the frontend with too many queries
query: QueryOptions { query: QueryOptions {

View File

@@ -59,15 +59,6 @@ pub enum Error {
location: Location, location: Location,
}, },
#[snafu(display("Failed to canonicalize path: {}", path))]
CanonicalizePath {
path: String,
#[snafu(source)]
error: std::io::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid path '{}': expected a file, not a directory", path))] #[snafu(display("Invalid path '{}': expected a file, not a directory", path))]
InvalidPath { InvalidPath {
path: String, path: String,
@@ -82,8 +73,7 @@ impl ErrorExt for Error {
Error::TomlFormat { .. } Error::TomlFormat { .. }
| Error::LoadLayeredConfig { .. } | Error::LoadLayeredConfig { .. }
| Error::FileWatch { .. } | Error::FileWatch { .. }
| Error::InvalidPath { .. } | Error::InvalidPath { .. } => StatusCode::InvalidArguments,
| Error::CanonicalizePath { .. } => StatusCode::InvalidArguments,
Error::SerdeJson { .. } => StatusCode::Unexpected, Error::SerdeJson { .. } => StatusCode::Unexpected,
} }
} }

View File

@@ -30,7 +30,7 @@ use common_telemetry::{error, info, warn};
use notify::{EventKind, RecursiveMode, Watcher}; use notify::{EventKind, RecursiveMode, Watcher};
use snafu::ResultExt; use snafu::ResultExt;
use crate::error::{CanonicalizePathSnafu, FileWatchSnafu, InvalidPathSnafu, Result}; use crate::error::{FileWatchSnafu, InvalidPathSnafu, Result};
/// Configuration for the file watcher behavior. /// Configuration for the file watcher behavior.
#[derive(Debug, Clone, Default)] #[derive(Debug, Clone, Default)]
@@ -41,15 +41,10 @@ pub struct FileWatcherConfig {
impl FileWatcherConfig { impl FileWatcherConfig {
pub fn new() -> Self { pub fn new() -> Self {
Self::default() Default::default()
} }
    pub fn with_modify_and_create(mut self) -> Self {
        self.include_remove_events = false;
        self
    }
    pub fn with_remove_events(mut self) -> Self {
        self.include_remove_events = true;
        self
    }
    pub fn include_remove_events(mut self) -> Self {
        self.include_remove_events = true;
        self
    }
@@ -93,11 +88,8 @@ impl FileWatcherBuilder {
path: path.display().to_string(), path: path.display().to_string(),
} }
); );
// Canonicalize the path for reliable comparison with event paths
        let canonical = path.canonicalize().context(CanonicalizePathSnafu {
            path: path.display().to_string(),
        })?;
        self.file_paths.push(canonical);
        self.file_paths.push(path.to_path_buf());
Ok(self) Ok(self)
} }
@@ -144,7 +136,6 @@ impl FileWatcherBuilder {
} }
let config = self.config; let config = self.config;
let watched_files: HashSet<PathBuf> = self.file_paths.iter().cloned().collect();
info!( info!(
"Spawning file watcher for paths: {:?} (watching parent directories)", "Spawning file watcher for paths: {:?} (watching parent directories)",
@@ -165,25 +156,7 @@ impl FileWatcherBuilder {
continue; continue;
} }
                        // Check if any of the event paths match our watched files
                        let is_watched_file = event.paths.iter().any(|event_path| {
                            // Try to canonicalize the event path for comparison
                            // If the file was deleted, canonicalize will fail, so we also
                            // compare the raw path
                            if let Ok(canonical) = event_path.canonicalize()
                                && watched_files.contains(&canonical)
                            {
                                return true;
                            }
                            // For deleted files, compare using the raw path
                            watched_files.contains(event_path)
                        });
                        if !is_watched_file {
                            continue;
                        }
                        info!(?event.kind, ?event.paths, "Detected file change");
                        info!(?event.kind, ?event.paths, "Detected folder change");
callback(); callback();
} }
Err(err) => { Err(err) => {
@@ -301,55 +274,4 @@ mod tests {
"Watcher should have detected file recreation" "Watcher should have detected file recreation"
); );
} }
#[test]
fn test_file_watcher_ignores_other_files() {
common_telemetry::init_default_ut_logging();
let dir = create_temp_dir("test_file_watcher_other");
let watched_file = dir.path().join("watched.txt");
let other_file = dir.path().join("other.txt");
// Create both files
std::fs::write(&watched_file, "watched content").unwrap();
std::fs::write(&other_file, "other content").unwrap();
let counter = Arc::new(AtomicUsize::new(0));
let counter_clone = counter.clone();
FileWatcherBuilder::new()
.watch_path(&watched_file)
.unwrap()
.config(FileWatcherConfig::new())
.spawn(move || {
counter_clone.fetch_add(1, Ordering::SeqCst);
})
.unwrap();
// Give watcher time to start
std::thread::sleep(Duration::from_millis(100));
// Modify the other file - should NOT trigger callback
std::fs::write(&other_file, "modified other content").unwrap();
// Wait for potential event
std::thread::sleep(Duration::from_millis(500));
assert_eq!(
counter.load(Ordering::SeqCst),
0,
"Watcher should not have detected changes to other files"
);
// Now modify the watched file - SHOULD trigger callback
std::fs::write(&watched_file, "modified watched content").unwrap();
// Wait for the event to be processed
std::thread::sleep(Duration::from_millis(500));
assert!(
counter.load(Ordering::SeqCst) >= 1,
"Watcher should have detected change to watched file"
);
}
} }

View File

@@ -27,6 +27,7 @@ const SECRET_ACCESS_KEY: &str = "secret_access_key";
const SESSION_TOKEN: &str = "session_token"; const SESSION_TOKEN: &str = "session_token";
const REGION: &str = "region"; const REGION: &str = "region";
const ENABLE_VIRTUAL_HOST_STYLE: &str = "enable_virtual_host_style"; const ENABLE_VIRTUAL_HOST_STYLE: &str = "enable_virtual_host_style";
const DISABLE_EC2_METADATA: &str = "disable_ec2_metadata";
pub fn is_supported_in_s3(key: &str) -> bool { pub fn is_supported_in_s3(key: &str) -> bool {
[ [
@@ -36,6 +37,7 @@ pub fn is_supported_in_s3(key: &str) -> bool {
SESSION_TOKEN, SESSION_TOKEN,
REGION, REGION,
ENABLE_VIRTUAL_HOST_STYLE, ENABLE_VIRTUAL_HOST_STYLE,
DISABLE_EC2_METADATA,
] ]
.contains(&key) .contains(&key)
} }
@@ -82,6 +84,21 @@ pub fn build_s3_backend(
} }
} }
if let Some(disable_str) = connection.get(DISABLE_EC2_METADATA) {
let disable = disable_str.as_str().parse::<bool>().map_err(|e| {
error::InvalidConnectionSnafu {
msg: format!(
"failed to parse the option {}={}, {}",
DISABLE_EC2_METADATA, disable_str, e
),
}
.build()
})?;
if disable {
builder = builder.disable_ec2_metadata();
}
}
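// Note (illustrative, not part of the diff): the value is parsed with str::parse::<bool>,
// so it must be supplied as the literal string "true" or "false", e.g.
// CONNECTION (..., disable_ec2_metadata = 'true'); any other value fails with
// an InvalidConnection error.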
// TODO(weny): Consider finding a better way to eliminate duplicate code. // TODO(weny): Consider finding a better way to eliminate duplicate code.
Ok(ObjectStore::new(builder) Ok(ObjectStore::new(builder)
.context(error::BuildBackendSnafu)? .context(error::BuildBackendSnafu)?
@@ -109,6 +126,7 @@ mod tests {
assert!(is_supported_in_s3(SESSION_TOKEN)); assert!(is_supported_in_s3(SESSION_TOKEN));
assert!(is_supported_in_s3(REGION)); assert!(is_supported_in_s3(REGION));
assert!(is_supported_in_s3(ENABLE_VIRTUAL_HOST_STYLE)); assert!(is_supported_in_s3(ENABLE_VIRTUAL_HOST_STYLE));
assert!(is_supported_in_s3(DISABLE_EC2_METADATA));
assert!(!is_supported_in_s3("foo")) assert!(!is_supported_in_s3("foo"))
} }
} }

View File

@@ -17,9 +17,10 @@ ahash.workspace = true
api.workspace = true api.workspace = true
arc-swap = "1.0" arc-swap = "1.0"
arrow.workspace = true arrow.workspace = true
arrow-cast.workspace = true
arrow-schema.workspace = true arrow-schema.workspace = true
async-trait.workspace = true async-trait.workspace = true
bincode = "1.3" bincode = "=1.3.3"
catalog.workspace = true catalog.workspace = true
chrono.workspace = true chrono.workspace = true
common-base.workspace = true common-base.workspace = true
@@ -46,6 +47,7 @@ geohash = { version = "0.13", optional = true }
h3o = { version = "0.6", optional = true } h3o = { version = "0.6", optional = true }
hyperloglogplus = "0.4" hyperloglogplus = "0.4"
jsonb.workspace = true jsonb.workspace = true
jsonpath-rust = "0.7.5"
memchr = "2.7" memchr = "2.7"
mito-codec.workspace = true mito-codec.workspace = true
nalgebra.workspace = true nalgebra.workspace = true

View File

@@ -13,17 +13,24 @@
// limitations under the License. // limitations under the License.
use std::fmt::{self, Display}; use std::fmt::{self, Display};
use std::str::FromStr;
use std::sync::Arc; use std::sync::Arc;
use arrow::array::{ArrayRef, BinaryViewArray, StringViewArray, StructArray};
use arrow::compute; use arrow::compute;
use datafusion_common::DataFusionError;
use arrow::datatypes::{Float64Type, Int64Type, UInt64Type};
use datafusion_common::arrow::array::{ use datafusion_common::arrow::array::{
Array, AsArray, BinaryViewBuilder, BooleanBuilder, Float64Builder, Int64Builder, Array, AsArray, BinaryViewBuilder, BooleanBuilder, Float64Builder, Int64Builder,
StringViewBuilder, StringViewBuilder,
}; };
use datafusion_common::arrow::datatypes::DataType; use datafusion_common::arrow::datatypes::DataType;
use datafusion_common::{DataFusionError, Result};
use datafusion_expr::type_coercion::aggregates::STRINGS; use datafusion_expr::type_coercion::aggregates::STRINGS;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature}; use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, Volatility};
use datatypes::arrow_array::{int_array_value_at_index, string_array_value_at_index};
use datatypes::json::JsonStructureSettings;
use jsonpath_rust::JsonPath;
use serde_json::Value;
use crate::function::{Function, extract_args}; use crate::function::{Function, extract_args};
use crate::helper; use crate::helper;
@@ -124,13 +131,6 @@ macro_rules! json_get {
}; };
} }
json_get!(
JsonGetInt,
Int64,
i64,
"Get the value from the JSONB by the given path and return it as an integer."
);
json_get!( json_get!(
JsonGetFloat, JsonGetFloat,
Float64, Float64,
@@ -145,70 +145,356 @@ json_get!(
"Get the value from the JSONB by the given path and return it as a boolean." "Get the value from the JSONB by the given path and return it as a boolean."
); );
/// Get the value from the JSONB by the given path and return it as a string.
#[derive(Clone, Debug)]
pub struct JsonGetString {
    signature: Signature,
}
impl JsonGetString {
    pub const NAME: &'static str = "json_get_string";
}
impl Default for JsonGetString {
    fn default() -> Self {
        Self {
            // TODO(LFC): Use a more clear type here instead of "Binary" for Json input, once we have a "Json" type.
            signature: helper::one_of_sigs2(
                vec![DataType::Binary, DataType::BinaryView],
                vec![DataType::Utf8, DataType::Utf8View],
            ),
        }
    }
}
impl Function for JsonGetString {
    fn name(&self) -> &str {
        Self::NAME
    }
    fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
        Ok(DataType::Utf8View)
    }
    fn signature(&self) -> &Signature {
        &self.signature
    }
    fn invoke_with_args(
        &self,
        args: ScalarFunctionArgs,
    ) -> datafusion_common::Result<ColumnarValue> {
        let [arg0, arg1] = extract_args(self.name(), &args)?;
        let arg0 = compute::cast(&arg0, &DataType::BinaryView)?;
        let jsons = arg0.as_binary_view();
        let arg1 = compute::cast(&arg1, &DataType::Utf8View)?;
        let paths = arg1.as_string_view();
        let size = jsons.len();
        let mut builder = StringViewBuilder::with_capacity(size);
        for i in 0..size {
            let json = jsons.is_valid(i).then(|| jsons.value(i));
            let path = paths.is_valid(i).then(|| paths.value(i));
            let result = match (json, path) {
                (Some(json), Some(path)) => {
                    get_json_by_path(json, path).and_then(|json| jsonb::to_str(&json).ok())
                }
                _ => None,
            };
            builder.append_option(result);
        }
        Ok(ColumnarValue::Array(Arc::new(builder.finish())))
    }
}
enum JsonResultValue<'a> {
    Jsonb(Vec<u8>),
    JsonStructByColumn(&'a ArrayRef, usize),
    JsonStructByValue(&'a Value),
}
trait JsonGetResultBuilder {
    fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()>;
    fn append_null(&mut self);
    fn build(&mut self) -> ArrayRef;
}
/// Common implementation for JSON get scalar functions.
///
/// `JsonGet` encapsulates the logic for extracting values from JSON inputs
/// based on a path expression. Different JSON get functions reuse this
/// implementation by supplying their own `JsonGetResultBuilder` to control
/// how the resulting values are materialized into an Arrow array.
struct JsonGet {
    signature: Signature,
}
impl JsonGet {
    fn invoke<F, B>(&self, args: ScalarFunctionArgs, builder_factory: F) -> Result<ColumnarValue>
    where
        F: Fn(usize) -> B,
        B: JsonGetResultBuilder,
    {
        let [arg0, arg1] = extract_args("JSON_GET", &args)?;
        let arg1 = compute::cast(&arg1, &DataType::Utf8View)?;
        let paths = arg1.as_string_view();
        let mut builder = (builder_factory)(arg0.len());
        match arg0.data_type() {
            DataType::Binary | DataType::LargeBinary | DataType::BinaryView => {
                let arg0 = compute::cast(&arg0, &DataType::BinaryView)?;
                let jsons = arg0.as_binary_view();
                jsonb_get(jsons, paths, &mut builder)?;
            }
            DataType::Struct(_) => {
                let jsons = arg0.as_struct();
                json_struct_get(jsons, paths, &mut builder)?
            }
            _ => {
                return Err(DataFusionError::Execution(format!(
                    "JSON_GET not supported argument type {}",
                    arg0.data_type(),
                )));
            }
        };
        Ok(ColumnarValue::Array(builder.build()))
    }
}
impl Default for JsonGet {
    fn default() -> Self {
        Self {
            signature: Signature::any(2, Volatility::Immutable),
        }
    }
}
#[derive(Default)]
pub struct JsonGetString(JsonGet);
impl JsonGetString {
    pub const NAME: &'static str = "json_get_string";
}
impl Function for JsonGetString {
    fn name(&self) -> &str {
        Self::NAME
    }
    fn return_type(&self, _: &[DataType]) -> Result<DataType> {
        Ok(DataType::Utf8View)
    }
    fn signature(&self) -> &Signature {
        &self.0.signature
    }
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        struct StringResultBuilder(StringViewBuilder);
        impl JsonGetResultBuilder for StringResultBuilder {
            fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()> {
                match value {
                    JsonResultValue::Jsonb(value) => {
                        self.0.append_option(jsonb::to_str(&value).ok())
                    }
                    JsonResultValue::JsonStructByColumn(column, i) => {
                        if let Some(v) = string_array_value_at_index(column, i) {
                            self.0.append_value(v);
                        } else {
                            self.0
                                .append_value(arrow_cast::display::array_value_to_string(
                                    column, i,
                                )?);
                        }
                    }
                    JsonResultValue::JsonStructByValue(value) => {
                        if let Some(s) = value.as_str() {
                            self.0.append_value(s)
                        } else {
                            self.0.append_value(value.to_string())
                        }
                    }
                }
                Ok(())
            }
            fn append_null(&mut self) {
                self.0.append_null();
            }
            fn build(&mut self) -> ArrayRef {
                Arc::new(self.0.finish())
            }
        }
        self.0.invoke(args, |len: usize| {
            StringResultBuilder(StringViewBuilder::with_capacity(len))
        })
    }
}
#[derive(Default)]
pub struct JsonGetInt(JsonGet);
impl JsonGetInt {
pub const NAME: &'static str = "json_get_int";
}
impl Function for JsonGetInt {
fn name(&self) -> &str {
Self::NAME
}
fn return_type(&self, _: &[DataType]) -> Result<DataType> {
Ok(DataType::Int64)
}
fn signature(&self) -> &Signature {
&self.0.signature
}
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
struct IntResultBuilder(Int64Builder);
impl JsonGetResultBuilder for IntResultBuilder {
fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()> {
match value {
JsonResultValue::Jsonb(value) => {
self.0.append_option(jsonb::to_i64(&value).ok())
}
JsonResultValue::JsonStructByColumn(column, i) => {
self.0.append_option(int_array_value_at_index(column, i))
}
JsonResultValue::JsonStructByValue(value) => {
self.0.append_option(value.as_i64())
}
}
Ok(())
}
fn append_null(&mut self) {
self.0.append_null();
}
fn build(&mut self) -> ArrayRef {
Arc::new(self.0.finish())
}
}
self.0.invoke(args, |len: usize| {
IntResultBuilder(Int64Builder::with_capacity(len))
})
}
}
impl Display for JsonGetInt {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", Self::NAME.to_ascii_uppercase())
}
}
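Other scalar variants plug into the same scaffold by supplying their own builder. A sketch of what a float variant could look like, assuming `jsonb::to_f64` exists alongside `to_i64`/`to_str` and leaving column-backed struct values out for brevity:

// Sketch only: mirrors IntResultBuilder above; jsonb::to_f64 and the omitted
// column-backed handling are assumptions, not part of this diff.
struct FloatResultBuilder(Float64Builder);
impl JsonGetResultBuilder for FloatResultBuilder {
    fn append_value(&mut self, value: JsonResultValue<'_>) -> Result<()> {
        match value {
            JsonResultValue::Jsonb(value) => self.0.append_option(jsonb::to_f64(&value).ok()),
            JsonResultValue::JsonStructByValue(value) => self.0.append_option(value.as_f64()),
            // A real implementation would need a float counterpart of int_array_value_at_index.
            JsonResultValue::JsonStructByColumn(_, _) => self.0.append_null(),
        }
        Ok(())
    }
    fn append_null(&mut self) {
        self.0.append_null();
    }
    fn build(&mut self) -> ArrayRef {
        Arc::new(self.0.finish())
    }
}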
fn jsonb_get(
jsons: &BinaryViewArray,
paths: &StringViewArray,
builder: &mut impl JsonGetResultBuilder,
) -> Result<()> {
let size = jsons.len();
for i in 0..size {
let json = jsons.is_valid(i).then(|| jsons.value(i));
let path = paths.is_valid(i).then(|| paths.value(i));
let result = match (json, path) {
(Some(json), Some(path)) => get_json_by_path(json, path),
_ => None,
};
if let Some(v) = result {
builder.append_value(JsonResultValue::Jsonb(v))?;
} else {
builder.append_null();
}
}
Ok(())
}
fn json_struct_get(
jsons: &StructArray,
paths: &StringViewArray,
builder: &mut impl JsonGetResultBuilder,
) -> Result<()> {
let size = jsons.len();
for i in 0..size {
if jsons.is_null(i) || paths.is_null(i) {
builder.append_null();
continue;
}
let path = paths.value(i);
        // Naively treat the JSON path as a direct index into the struct fields by stripping its "$." root.
let field_path = path.trim().replace("$.", "");
let column = jsons.column_by_name(&field_path);
if let Some(column) = column {
builder.append_value(JsonResultValue::JsonStructByColumn(column, i))?;
} else {
let Some(raw) = jsons
.column_by_name(JsonStructureSettings::RAW_FIELD)
.and_then(|x| string_array_value_at_index(x, i))
else {
builder.append_null();
continue;
};
let path: JsonPath<Value> = JsonPath::try_from(path).map_err(|e| {
DataFusionError::Execution(format!("{path} is not a valid JSON path: {e}"))
})?;
            // The wanted field is not directly retrievable from the JSON struct columns, so we
            // have to combine everything (columns and the "_raw" field) into a complete JSON value to find it.
let value = json_struct_to_value(raw, jsons, i)?;
match path.find(&value) {
Value::Null => builder.append_null(),
Value::Array(values) => match values.as_slice() {
[] => builder.append_null(),
[x] => builder.append_value(JsonResultValue::JsonStructByValue(x))?,
_ => builder.append_value(JsonResultValue::JsonStructByValue(&value))?,
},
value => builder.append_value(JsonResultValue::JsonStructByValue(&value))?,
}
}
}
Ok(())
}
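To make the fallback concrete: a field without a dedicated struct column (such as `$.payload.success` in the tests below) is resolved against the reassembled JSON value through jsonpath_rust, roughly like this illustrative snippet:

// Illustrative only; mirrors the fallback branch above with a hand-built value.
let value: Value = serde_json::json!({"payload": {"code": 404, "success": false}});
let path: JsonPath<Value> = JsonPath::try_from("$.payload.success").unwrap();
let found = path.find(&value); // a JSON array of matches, e.g. [false]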
fn json_struct_to_value(raw: &str, jsons: &StructArray, i: usize) -> Result<Value> {
let Ok(mut json) = Value::from_str(raw) else {
return Err(DataFusionError::Internal(format!(
"inner field '{}' is not a valid JSON string",
JsonStructureSettings::RAW_FIELD
)));
};
for (column_name, column) in jsons.column_names().into_iter().zip(jsons.columns()) {
if column_name == JsonStructureSettings::RAW_FIELD {
continue;
}
let (json_pointer, field) = if let Some((json_object, field)) = column_name.rsplit_once(".")
{
let json_pointer = format!("/{}", json_object.replace(".", "/"));
(json_pointer, field)
} else {
("".to_string(), column_name)
};
let Some(json_object) = json
.pointer_mut(&json_pointer)
.and_then(|x| x.as_object_mut())
else {
return Err(DataFusionError::Internal(format!(
"value at JSON pointer '{}' is not an object",
json_pointer
)));
};
macro_rules! insert {
($column: ident, $i: ident, $json_object: ident, $field: ident) => {{
if let Some(value) = $column
.is_valid($i)
.then(|| serde_json::Value::from($column.value($i)))
{
$json_object.insert($field.to_string(), value);
}
}};
}
match column.data_type() {
// boolean => Value::Bool
DataType::Boolean => {
let column = column.as_boolean();
insert!(column, i, json_object, field);
}
// int => Value::Number
DataType::Int64 => {
let column = column.as_primitive::<Int64Type>();
insert!(column, i, json_object, field);
}
DataType::UInt64 => {
let column = column.as_primitive::<UInt64Type>();
insert!(column, i, json_object, field);
}
DataType::Float64 => {
let column = column.as_primitive::<Float64Type>();
insert!(column, i, json_object, field);
}
// string => Value::String
DataType::Utf8 => {
let column = column.as_string::<i32>();
insert!(column, i, json_object, field);
}
DataType::LargeUtf8 => {
let column = column.as_string::<i64>();
insert!(column, i, json_object, field);
}
DataType::Utf8View => {
let column = column.as_string_view();
insert!(column, i, json_object, field);
}
// other => Value::Array and Value::Object
_ => {
return Err(DataFusionError::NotImplemented(format!(
"{} is not yet supported to be executed with field {} of datatype {}",
JsonGetString::NAME,
column_name,
column.data_type()
)));
}
}
}
Ok(json)
}
impl Display for JsonGetString { impl Display for JsonGetString {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", Self::NAME.to_ascii_uppercase()) write!(f, "{}", Self::NAME.to_ascii_uppercase())
@@ -296,14 +582,60 @@ impl Display for JsonGetObject {
mod tests { mod tests {
use std::sync::Arc; use std::sync::Arc;
use arrow::array::{Float64Array, Int64Array, StructArray};
use arrow_schema::Field; use arrow_schema::Field;
use datafusion_common::ScalarValue; use datafusion_common::ScalarValue;
use datafusion_common::arrow::array::{BinaryArray, BinaryViewArray, StringArray}; use datafusion_common::arrow::array::{BinaryArray, BinaryViewArray, StringArray};
use datafusion_common::arrow::datatypes::{Float64Type, Int64Type}; use datafusion_common::arrow::datatypes::{Float64Type, Int64Type};
use datatypes::types::parse_string_to_jsonb; use datatypes::types::parse_string_to_jsonb;
use serde_json::json;
use super::*; use super::*;
/// Create a JSON object like this (as a one element struct array for testing):
///
/// ```JSON
/// {
/// "kind": "foo",
/// "payload": {
/// "code": 404,
/// "success": false,
/// "result": {
/// "error": "not found",
/// "time_cost": 1.234
/// }
/// }
/// }
/// ```
fn test_json_struct() -> ArrayRef {
Arc::new(StructArray::new(
vec![
Field::new("kind", DataType::Utf8, true),
Field::new("payload.code", DataType::Int64, true),
Field::new("payload.result.time_cost", DataType::Float64, true),
Field::new(JsonStructureSettings::RAW_FIELD, DataType::Utf8View, true),
]
.into(),
vec![
Arc::new(StringArray::from_iter([Some("foo")])) as ArrayRef,
Arc::new(Int64Array::from_iter([Some(404)])),
Arc::new(Float64Array::from_iter([Some(1.234)])),
Arc::new(StringViewArray::from_iter([Some(
json! ({
"payload": {
"success": false,
"result": {
"error": "not found"
}
}
})
.to_string(),
)])),
],
None,
))
}
#[test] #[test]
fn test_json_get_int() { fn test_json_get_int() {
let json_get_int = JsonGetInt::default(); let json_get_int = JsonGetInt::default();
@@ -321,37 +653,55 @@ mod tests {
r#"{"a": 4, "b": {"c": 6}, "c": 6}"#, r#"{"a": 4, "b": {"c": 6}, "c": 6}"#,
r#"{"a": 7, "b": 8, "c": {"a": 7}}"#, r#"{"a": 7, "b": 8, "c": {"a": 7}}"#,
]; ];
let paths = vec!["$.a.b", "$.a", "$.c"]; let json_struct = test_json_struct();
let results = [Some(2), Some(4), None];
let jsonbs = json_strings let path_expects = vec![
("$.a.b", Some(2)),
("$.a", Some(4)),
("$.c", None),
("$.kind", None),
("$.payload.code", Some(404)),
("$.payload.success", None),
("$.payload.result.time_cost", None),
("$.payload.not-exists", None),
("$.not-exists", None),
("$", None),
];
let mut jsons = json_strings
.iter() .iter()
.map(|s| { .map(|s| {
let value = jsonb::parse_value(s.as_bytes()).unwrap(); let value = jsonb::parse_value(s.as_bytes()).unwrap();
value.to_vec() Arc::new(BinaryArray::from_iter_values([value.to_vec()])) as ArrayRef
}) })
.collect::<Vec<_>>(); .collect::<Vec<_>>();
let json_struct_arrays =
std::iter::repeat_n(json_struct, path_expects.len() - jsons.len()).collect::<Vec<_>>();
jsons.extend(json_struct_arrays);
let args = ScalarFunctionArgs { for i in 0..jsons.len() {
args: vec![ let json = &jsons[i];
ColumnarValue::Array(Arc::new(BinaryArray::from_iter_values(jsonbs))), let (path, expect) = path_expects[i];
ColumnarValue::Array(Arc::new(StringArray::from_iter_values(paths))),
],
arg_fields: vec![],
number_rows: 3,
return_field: Arc::new(Field::new("x", DataType::Int64, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_int
.invoke_with_args(args)
.and_then(|x| x.to_array(3))
.unwrap();
let vector = result.as_primitive::<Int64Type>();
assert_eq!(3, vector.len()); let args = ScalarFunctionArgs {
for (i, gt) in results.iter().enumerate() { args: vec![
let result = vector.is_valid(i).then(|| vector.value(i)); ColumnarValue::Array(json.clone()),
assert_eq!(*gt, result); ColumnarValue::Scalar(path.into()),
],
arg_fields: vec![],
number_rows: 1,
return_field: Arc::new(Field::new("x", DataType::Int64, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_int
.invoke_with_args(args)
.and_then(|x| x.to_array(1))
.unwrap();
let result = result.as_primitive::<Int64Type>();
assert_eq!(1, result.len());
let actual = result.is_valid(0).then(|| result.value(0));
assert_eq!(actual, expect);
} }
} }
@@ -474,42 +824,85 @@ mod tests {
r#"{"a": "d", "b": {"c": "e"}, "c": "f"}"#, r#"{"a": "d", "b": {"c": "e"}, "c": "f"}"#,
r#"{"a": "g", "b": "h", "c": {"a": "g"}}"#, r#"{"a": "g", "b": "h", "c": {"a": "g"}}"#,
]; ];
let paths = vec!["$.a.b", "$.a", ""]; let json_struct = test_json_struct();
let results = [Some("a"), Some("d"), None];
let jsonbs = json_strings let paths = vec![
"$.a.b",
"$.a",
"",
"$.kind",
"$.payload.code",
"$.payload.result.time_cost",
"$.payload",
"$.payload.success",
"$.payload.result",
"$.payload.result.error",
"$.payload.result.not-exists",
"$.payload.not-exists",
"$.not-exists",
"$",
];
let expects = [
Some("a"),
Some("d"),
None,
Some("foo"),
Some("404"),
Some("1.234"),
Some(
r#"{"code":404,"result":{"error":"not found","time_cost":1.234},"success":false}"#,
),
Some("false"),
Some(r#"{"error":"not found","time_cost":1.234}"#),
Some("not found"),
None,
None,
None,
Some(
r#"{"kind":"foo","payload":{"code":404,"result":{"error":"not found","time_cost":1.234},"success":false}}"#,
),
];
let mut jsons = json_strings
.iter() .iter()
.map(|s| { .map(|s| {
let value = jsonb::parse_value(s.as_bytes()).unwrap(); let value = jsonb::parse_value(s.as_bytes()).unwrap();
value.to_vec() Arc::new(BinaryArray::from_iter_values([value.to_vec()])) as ArrayRef
}) })
.collect::<Vec<_>>(); .collect::<Vec<_>>();
let json_struct_arrays =
std::iter::repeat_n(json_struct, expects.len() - jsons.len()).collect::<Vec<_>>();
jsons.extend(json_struct_arrays);
let args = ScalarFunctionArgs { for i in 0..jsons.len() {
args: vec![ let json = &jsons[i];
ColumnarValue::Array(Arc::new(BinaryArray::from_iter_values(jsonbs))), let path = paths[i];
ColumnarValue::Array(Arc::new(StringArray::from_iter_values(paths))), let expect = expects[i];
],
arg_fields: vec![],
number_rows: 3,
return_field: Arc::new(Field::new("x", DataType::Utf8View, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_string
.invoke_with_args(args)
.and_then(|x| x.to_array(3))
.unwrap();
let vector = result.as_string_view();
assert_eq!(3, vector.len()); let args = ScalarFunctionArgs {
for (i, gt) in results.iter().enumerate() { args: vec![
let result = vector.is_valid(i).then(|| vector.value(i)); ColumnarValue::Array(json.clone()),
assert_eq!(*gt, result); ColumnarValue::Scalar(path.into()),
],
arg_fields: vec![],
number_rows: 1,
return_field: Arc::new(Field::new("x", DataType::Utf8View, false)),
config_options: Arc::new(Default::default()),
};
let result = json_get_string
.invoke_with_args(args)
.and_then(|x| x.to_array(1))
.unwrap();
let result = result.as_string_view();
assert_eq!(1, result.len());
let actual = result.is_valid(0).then(|| result.value(0));
assert_eq!(actual, expect);
} }
} }
#[test] #[test]
fn test_json_get_object() -> datafusion_common::Result<()> { fn test_json_get_object() -> Result<()> {
let udf = JsonGetObject::default(); let udf = JsonGetObject::default();
assert_eq!("json_get_object", udf.name()); assert_eq!("json_get_object", udf.name());
assert_eq!( assert_eq!(

View File

@@ -14,13 +14,31 @@
//! String scalar functions //! String scalar functions
mod elt;
mod field;
mod format;
mod insert;
mod locate;
mod regexp_extract; mod regexp_extract;
mod space;
pub(crate) use elt::EltFunction;
pub(crate) use field::FieldFunction;
pub(crate) use format::FormatFunction;
pub(crate) use insert::InsertFunction;
pub(crate) use locate::LocateFunction;
pub(crate) use regexp_extract::RegexpExtractFunction; pub(crate) use regexp_extract::RegexpExtractFunction;
pub(crate) use space::SpaceFunction;
use crate::function_registry::FunctionRegistry; use crate::function_registry::FunctionRegistry;
/// Register all string functions /// Register all string functions
pub fn register_string_functions(registry: &FunctionRegistry) { pub fn register_string_functions(registry: &FunctionRegistry) {
EltFunction::register(registry);
FieldFunction::register(registry);
FormatFunction::register(registry);
InsertFunction::register(registry);
LocateFunction::register(registry);
RegexpExtractFunction::register(registry); RegexpExtractFunction::register(registry);
SpaceFunction::register(registry);
} }

View File

@@ -0,0 +1,252 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible ELT function implementation.
//!
//! ELT(N, str1, str2, str3, ...) - Returns the Nth string from the list.
//! Returns NULL if N < 1 or N > number of strings.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, ArrayRef, AsArray, LargeStringBuilder};
use datafusion_common::arrow::compute::cast;
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "elt";
/// MySQL-compatible ELT function.
///
/// Syntax: ELT(N, str1, str2, str3, ...)
/// Returns the Nth string argument. N is 1-based.
/// Returns NULL if N is NULL, N < 1, or N > number of string arguments.
#[derive(Debug)]
pub struct EltFunction {
signature: Signature,
}
impl EltFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(EltFunction::default());
}
}
impl Default for EltFunction {
fn default() -> Self {
Self {
// ELT takes a variable number of arguments: (Int64, String, String, ...)
signature: Signature::variadic_any(Volatility::Immutable),
}
}
}
impl fmt::Display for EltFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for EltFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::LargeUtf8)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
if args.args.len() < 2 {
return Err(DataFusionError::Execution(
"ELT requires at least 2 arguments: ELT(N, str1, ...)".to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
let len = arrays[0].len();
let num_strings = arrays.len() - 1;
// First argument is the index (N) - try to cast to Int64
let index_array = if arrays[0].data_type() == &DataType::Null {
// All NULLs - return all NULLs
let mut builder = LargeStringBuilder::with_capacity(len, 0);
for _ in 0..len {
builder.append_null();
}
return Ok(ColumnarValue::Array(Arc::new(builder.finish())));
} else {
cast(arrays[0].as_ref(), &DataType::Int64).map_err(|e| {
DataFusionError::Execution(format!("ELT: index argument cast failed: {}", e))
})?
};
// Cast string arguments to LargeUtf8
let string_arrays: Vec<ArrayRef> = arrays[1..]
.iter()
.enumerate()
.map(|(i, arr)| {
cast(arr.as_ref(), &DataType::LargeUtf8).map_err(|e| {
DataFusionError::Execution(format!(
"ELT: string argument {} cast failed: {}",
i + 1,
e
))
})
})
.collect::<datafusion_common::Result<Vec<_>>>()?;
let mut builder = LargeStringBuilder::with_capacity(len, len * 32);
for i in 0..len {
if index_array.is_null(i) {
builder.append_null();
continue;
}
let n = index_array
.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>()
.value(i);
// N is 1-based, check bounds
if n < 1 || n as usize > num_strings {
builder.append_null();
continue;
}
let str_idx = (n - 1) as usize;
let str_array = string_arrays[str_idx].as_string::<i64>();
if str_array.is_null(i) {
builder.append_null();
} else {
builder.append_value(str_array.value(i));
}
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::{Int64Array, StringArray};
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::LargeUtf8, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_elt_basic() {
let function = EltFunction::default();
let n = Arc::new(Int64Array::from(vec![1, 2, 3]));
let s1 = Arc::new(StringArray::from(vec!["a", "a", "a"]));
let s2 = Arc::new(StringArray::from(vec!["b", "b", "b"]));
let s3 = Arc::new(StringArray::from(vec!["c", "c", "c"]));
let args = create_args(vec![n, s1, s2, s3]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "a");
assert_eq!(str_array.value(1), "b");
assert_eq!(str_array.value(2), "c");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_elt_out_of_bounds() {
let function = EltFunction::default();
let n = Arc::new(Int64Array::from(vec![0, 4, -1]));
let s1 = Arc::new(StringArray::from(vec!["a", "a", "a"]));
let s2 = Arc::new(StringArray::from(vec!["b", "b", "b"]));
let s3 = Arc::new(StringArray::from(vec!["c", "c", "c"]));
let args = create_args(vec![n, s1, s2, s3]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert!(str_array.is_null(0)); // 0 is out of bounds
assert!(str_array.is_null(1)); // 4 is out of bounds
assert!(str_array.is_null(2)); // -1 is out of bounds
} else {
panic!("Expected array result");
}
}
#[test]
fn test_elt_with_nulls() {
let function = EltFunction::default();
// Row 0: n=1, select s1="a" -> "a"
// Row 1: n=NULL -> NULL
// Row 2: n=1, select s1=NULL -> NULL
let n = Arc::new(Int64Array::from(vec![Some(1), None, Some(1)]));
let s1 = Arc::new(StringArray::from(vec![Some("a"), Some("a"), None]));
let s2 = Arc::new(StringArray::from(vec![Some("b"), Some("b"), Some("b")]));
let args = create_args(vec![n, s1, s2]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "a");
assert!(str_array.is_null(1)); // N is NULL
assert!(str_array.is_null(2)); // Selected string is NULL
} else {
panic!("Expected array result");
}
}
}
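As a reading aid only, and not part of the diffed elt.rs above, the 1-based selection rule that EltFunction applies per row can be sketched in plain Rust; the helper name elt_scalar is invented here for illustration.
// Illustrative sketch only (not in the diff): per-row ELT selection rule.
fn elt_scalar<'a>(n: Option<i64>, strings: &[Option<&'a str>]) -> Option<&'a str> {
    // A NULL index yields NULL.
    let n = n?;
    // Out-of-range indices (N < 1 or N > number of strings) yield NULL, as in MySQL.
    if n < 1 || n as usize > strings.len() {
        return None;
    }
    // Indexing is 1-based; a NULL element stays NULL.
    strings[(n - 1) as usize]
}
// elt_scalar(Some(2), &[Some("a"), Some("b"), Some("c")]) == Some("b")
// elt_scalar(Some(4), &[Some("a"), Some("b"), Some("c")]) == None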


@@ -0,0 +1,224 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible FIELD function implementation.
//!
//! FIELD(str, str1, str2, str3, ...) - Returns the 1-based index of str in the list.
//! Returns 0 if str is not found or is NULL.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, ArrayRef, AsArray, Int64Builder};
use datafusion_common::arrow::compute::cast;
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "field";
/// MySQL-compatible FIELD function.
///
/// Syntax: FIELD(str, str1, str2, str3, ...)
/// Returns the 1-based index of str in the argument list (str1, str2, str3, ...).
/// Returns 0 if str is not found or is NULL.
#[derive(Debug)]
pub struct FieldFunction {
signature: Signature,
}
impl FieldFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(FieldFunction::default());
}
}
impl Default for FieldFunction {
fn default() -> Self {
Self {
// FIELD takes a variable number of arguments: (String, String, String, ...)
signature: Signature::variadic_any(Volatility::Immutable),
}
}
}
impl fmt::Display for FieldFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for FieldFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::Int64)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
if args.args.len() < 2 {
return Err(DataFusionError::Execution(
"FIELD requires at least 2 arguments: FIELD(str, str1, ...)".to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
let len = arrays[0].len();
// Cast all arguments to LargeUtf8
let string_arrays: Vec<ArrayRef> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
cast(arr.as_ref(), &DataType::LargeUtf8).map_err(|e| {
DataFusionError::Execution(format!("FIELD: argument {} cast failed: {}", i, e))
})
})
.collect::<datafusion_common::Result<Vec<_>>>()?;
let search_str = string_arrays[0].as_string::<i64>();
let mut builder = Int64Builder::with_capacity(len);
for i in 0..len {
// If search string is NULL, return 0
if search_str.is_null(i) {
builder.append_value(0);
continue;
}
let needle = search_str.value(i);
let mut found_idx = 0i64;
// Search through the list (starting from index 1 in string_arrays)
for (j, str_arr) in string_arrays[1..].iter().enumerate() {
let str_array = str_arr.as_string::<i64>();
if !str_array.is_null(i) && str_array.value(i) == needle {
found_idx = (j + 1) as i64; // 1-based index
break;
}
}
builder.append_value(found_idx);
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::StringArray;
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::Int64, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_field_basic() {
let function = FieldFunction::default();
let search = Arc::new(StringArray::from(vec!["b", "d", "a"]));
let s1 = Arc::new(StringArray::from(vec!["a", "a", "a"]));
let s2 = Arc::new(StringArray::from(vec!["b", "b", "b"]));
let s3 = Arc::new(StringArray::from(vec!["c", "c", "c"]));
let args = create_args(vec![search, s1, s2, s3]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 2); // "b" is at index 2
assert_eq!(int_array.value(1), 0); // "d" not found
assert_eq!(int_array.value(2), 1); // "a" is at index 1
} else {
panic!("Expected array result");
}
}
#[test]
fn test_field_with_null_search() {
let function = FieldFunction::default();
let search = Arc::new(StringArray::from(vec![Some("a"), None]));
let s1 = Arc::new(StringArray::from(vec!["a", "a"]));
let s2 = Arc::new(StringArray::from(vec!["b", "b"]));
let args = create_args(vec![search, s1, s2]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 1); // "a" found at index 1
assert_eq!(int_array.value(1), 0); // NULL search returns 0
} else {
panic!("Expected array result");
}
}
#[test]
fn test_field_case_sensitive() {
let function = FieldFunction::default();
let search = Arc::new(StringArray::from(vec!["A", "a"]));
let s1 = Arc::new(StringArray::from(vec!["a", "a"]));
let s2 = Arc::new(StringArray::from(vec!["A", "A"]));
let args = create_args(vec![search, s1, s2]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 2); // "A" matches at index 2
assert_eq!(int_array.value(1), 1); // "a" matches at index 1
} else {
panic!("Expected array result");
}
}
}
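For reference only (not part of field.rs above), the per-row FIELD lookup reduces to a case-sensitive search returning a 1-based index or 0; field_scalar is an invented name.
// Illustrative sketch only (not in the diff): per-row FIELD lookup rule.
fn field_scalar(needle: Option<&str>, haystack: &[Option<&str>]) -> i64 {
    // A NULL search value returns 0, as does a missing match.
    let Some(needle) = needle else { return 0 };
    haystack
        .iter()
        .position(|s| *s == Some(needle)) // case-sensitive comparison
        .map(|i| (i + 1) as i64)          // convert to a 1-based index
        .unwrap_or(0)
}
// field_scalar(Some("b"), &[Some("a"), Some("b"), Some("c")]) == 2
// field_scalar(None, &[Some("a")]) == 0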


@@ -0,0 +1,512 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible FORMAT function implementation.
//!
//! FORMAT(X, D) - Formats the number X with D decimal places using thousand separators.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, AsArray, LargeStringBuilder};
use datafusion_common::arrow::datatypes as arrow_types;
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, TypeSignature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "format";
/// MySQL-compatible FORMAT function.
///
/// Syntax: FORMAT(X, D)
/// Formats the number X to a format like '#,###,###.##', rounded to D decimal places.
/// D can be 0 to 30.
///
/// Note: This implementation uses the en_US locale (comma as thousand separator,
/// period as decimal separator).
#[derive(Debug)]
pub struct FormatFunction {
signature: Signature,
}
impl FormatFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(FormatFunction::default());
}
}
impl Default for FormatFunction {
fn default() -> Self {
let mut signatures = Vec::new();
// Support various numeric types for X
let numeric_types = [
DataType::Float64,
DataType::Float32,
DataType::Int64,
DataType::Int32,
DataType::Int16,
DataType::Int8,
DataType::UInt64,
DataType::UInt32,
DataType::UInt16,
DataType::UInt8,
];
// D can be various integer types
let int_types = [
DataType::Int64,
DataType::Int32,
DataType::Int16,
DataType::Int8,
DataType::UInt64,
DataType::UInt32,
DataType::UInt16,
DataType::UInt8,
];
for x_type in &numeric_types {
for d_type in &int_types {
signatures.push(TypeSignature::Exact(vec![x_type.clone(), d_type.clone()]));
}
}
Self {
signature: Signature::one_of(signatures, Volatility::Immutable),
}
}
}
impl fmt::Display for FormatFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for FormatFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::LargeUtf8)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
if args.args.len() != 2 {
return Err(DataFusionError::Execution(
"FORMAT requires exactly 2 arguments: FORMAT(X, D)".to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
let len = arrays[0].len();
let x_array = &arrays[0];
let d_array = &arrays[1];
let mut builder = LargeStringBuilder::with_capacity(len, len * 20);
for i in 0..len {
if x_array.is_null(i) || d_array.is_null(i) {
builder.append_null();
continue;
}
let decimal_places = get_decimal_places(d_array, i)?.clamp(0, 30) as usize;
let formatted = match x_array.data_type() {
DataType::Float64 | DataType::Float32 => {
format_number_float(get_float_value(x_array, i)?, decimal_places)
}
DataType::Int64
| DataType::Int32
| DataType::Int16
| DataType::Int8
| DataType::UInt64
| DataType::UInt32
| DataType::UInt16
| DataType::UInt8 => format_number_integer(x_array, i, decimal_places)?,
_ => {
return Err(DataFusionError::Execution(format!(
"FORMAT: unsupported type {:?}",
x_array.data_type()
)));
}
};
builder.append_value(&formatted);
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
/// Get float value from various numeric types.
fn get_float_value(
array: &datafusion_common::arrow::array::ArrayRef,
index: usize,
) -> datafusion_common::Result<f64> {
match array.data_type() {
DataType::Float64 => Ok(array
.as_primitive::<arrow_types::Float64Type>()
.value(index)),
DataType::Float32 => Ok(array
.as_primitive::<arrow_types::Float32Type>()
.value(index) as f64),
_ => Err(DataFusionError::Execution(format!(
"FORMAT: unsupported type {:?}",
array.data_type()
))),
}
}
/// Get decimal places from various integer types.
///
/// MySQL clamps decimal places to `0..=30`. This function returns an `i64` so the caller can clamp.
fn get_decimal_places(
array: &datafusion_common::arrow::array::ArrayRef,
index: usize,
) -> datafusion_common::Result<i64> {
match array.data_type() {
DataType::Int64 => Ok(array.as_primitive::<arrow_types::Int64Type>().value(index)),
DataType::Int32 => Ok(array.as_primitive::<arrow_types::Int32Type>().value(index) as i64),
DataType::Int16 => Ok(array.as_primitive::<arrow_types::Int16Type>().value(index) as i64),
DataType::Int8 => Ok(array.as_primitive::<arrow_types::Int8Type>().value(index) as i64),
DataType::UInt64 => {
let v = array.as_primitive::<arrow_types::UInt64Type>().value(index);
Ok(if v > i64::MAX as u64 {
i64::MAX
} else {
v as i64
})
}
DataType::UInt32 => Ok(array.as_primitive::<arrow_types::UInt32Type>().value(index) as i64),
DataType::UInt16 => Ok(array.as_primitive::<arrow_types::UInt16Type>().value(index) as i64),
DataType::UInt8 => Ok(array.as_primitive::<arrow_types::UInt8Type>().value(index) as i64),
_ => Err(DataFusionError::Execution(format!(
"FORMAT: unsupported type {:?}",
array.data_type()
))),
}
}
fn format_number_integer(
array: &datafusion_common::arrow::array::ArrayRef,
index: usize,
decimal_places: usize,
) -> datafusion_common::Result<String> {
let (is_negative, abs_digits) = match array.data_type() {
DataType::Int64 => {
let v = array.as_primitive::<arrow_types::Int64Type>().value(index) as i128;
(v.is_negative(), v.unsigned_abs().to_string())
}
DataType::Int32 => {
let v = array.as_primitive::<arrow_types::Int32Type>().value(index) as i128;
(v.is_negative(), v.unsigned_abs().to_string())
}
DataType::Int16 => {
let v = array.as_primitive::<arrow_types::Int16Type>().value(index) as i128;
(v.is_negative(), v.unsigned_abs().to_string())
}
DataType::Int8 => {
let v = array.as_primitive::<arrow_types::Int8Type>().value(index) as i128;
(v.is_negative(), v.unsigned_abs().to_string())
}
DataType::UInt64 => {
let v = array.as_primitive::<arrow_types::UInt64Type>().value(index) as u128;
(false, v.to_string())
}
DataType::UInt32 => {
let v = array.as_primitive::<arrow_types::UInt32Type>().value(index) as u128;
(false, v.to_string())
}
DataType::UInt16 => {
let v = array.as_primitive::<arrow_types::UInt16Type>().value(index) as u128;
(false, v.to_string())
}
DataType::UInt8 => {
let v = array.as_primitive::<arrow_types::UInt8Type>().value(index) as u128;
(false, v.to_string())
}
_ => {
return Err(DataFusionError::Execution(format!(
"FORMAT: unsupported type {:?}",
array.data_type()
)));
}
};
let mut result = String::new();
if is_negative {
result.push('-');
}
result.push_str(&add_thousand_separators(&abs_digits));
if decimal_places > 0 {
result.push('.');
result.push_str(&"0".repeat(decimal_places));
}
Ok(result)
}
/// Format a float with thousand separators and `decimal_places` digits after decimal point.
fn format_number_float(x: f64, decimal_places: usize) -> String {
// Handle special cases
if x.is_nan() {
return "NaN".to_string();
}
if x.is_infinite() {
return if x.is_sign_positive() {
"Infinity".to_string()
} else {
"-Infinity".to_string()
};
}
// Round to decimal_places
let multiplier = 10f64.powi(decimal_places as i32);
let rounded = (x * multiplier).round() / multiplier;
// Split into integer and fractional parts
let is_negative = rounded < 0.0;
let abs_value = rounded.abs();
// Format with the specified decimal places
let formatted = if decimal_places == 0 {
format!("{:.0}", abs_value)
} else {
format!("{:.prec$}", abs_value, prec = decimal_places)
};
// Split at decimal point
let parts: Vec<&str> = formatted.split('.').collect();
let int_part = parts[0];
let dec_part = parts.get(1).copied();
// Add thousand separators to integer part
let int_with_sep = add_thousand_separators(int_part);
// Build result
let mut result = String::new();
if is_negative {
result.push('-');
}
result.push_str(&int_with_sep);
if let Some(dec) = dec_part {
result.push('.');
result.push_str(dec);
}
result
}
/// Add thousand separators (commas) to an integer string.
fn add_thousand_separators(s: &str) -> String {
let chars: Vec<char> = s.chars().collect();
let len = chars.len();
if len <= 3 {
return s.to_string();
}
let mut result = String::with_capacity(len + len / 3);
let first_group_len = len % 3;
let first_group_len = if first_group_len == 0 {
3
} else {
first_group_len
};
for (i, ch) in chars.iter().enumerate() {
if i > 0 && i >= first_group_len && (i - first_group_len) % 3 == 0 {
result.push(',');
}
result.push(*ch);
}
result
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::{Float64Array, Int64Array};
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<datafusion_common::arrow::array::ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::LargeUtf8, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_format_basic() {
let function = FormatFunction::default();
let x = Arc::new(Float64Array::from(vec![1234567.891, 1234.5, 1234567.0]));
let d = Arc::new(Int64Array::from(vec![2, 0, 3]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "1,234,567.89");
assert_eq!(str_array.value(1), "1,235"); // rounded
assert_eq!(str_array.value(2), "1,234,567.000");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_format_negative() {
let function = FormatFunction::default();
let x = Arc::new(Float64Array::from(vec![-1234567.891]));
let d = Arc::new(Int64Array::from(vec![2]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "-1,234,567.89");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_format_small_numbers() {
let function = FormatFunction::default();
let x = Arc::new(Float64Array::from(vec![0.5, 12.345, 123.0]));
let d = Arc::new(Int64Array::from(vec![2, 2, 0]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "0.50");
assert_eq!(str_array.value(1), "12.35"); // rounded
assert_eq!(str_array.value(2), "123");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_format_with_nulls() {
let function = FormatFunction::default();
let x = Arc::new(Float64Array::from(vec![Some(1234.5), None]));
let d = Arc::new(Int64Array::from(vec![2, 2]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "1,234.50");
assert!(str_array.is_null(1));
} else {
panic!("Expected array result");
}
}
#[test]
fn test_add_thousand_separators() {
assert_eq!(add_thousand_separators("1"), "1");
assert_eq!(add_thousand_separators("12"), "12");
assert_eq!(add_thousand_separators("123"), "123");
assert_eq!(add_thousand_separators("1234"), "1,234");
assert_eq!(add_thousand_separators("12345"), "12,345");
assert_eq!(add_thousand_separators("123456"), "123,456");
assert_eq!(add_thousand_separators("1234567"), "1,234,567");
assert_eq!(add_thousand_separators("12345678"), "12,345,678");
assert_eq!(add_thousand_separators("123456789"), "123,456,789");
}
#[test]
fn test_format_large_int_no_float_precision_loss() {
let function = FormatFunction::default();
// 2^53 + 1 cannot be represented exactly as f64.
let x = Arc::new(Int64Array::from(vec![9_007_199_254_740_993i64]));
let d = Arc::new(Int64Array::from(vec![0]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "9,007,199,254,740,993");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_format_decimal_places_u64_overflow_clamps() {
use datafusion_common::arrow::array::UInt64Array;
let function = FormatFunction::default();
let x = Arc::new(Int64Array::from(vec![1]));
let d = Arc::new(UInt64Array::from(vec![u64::MAX]));
let args = create_args(vec![x, d]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), format!("1.{}", "0".repeat(30)));
} else {
panic!("Expected array result");
}
}
}
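As an aside (not part of format.rs above), the comma-grouping rule in add_thousand_separators is equivalent to inserting a separator whenever the count of remaining digits is a positive multiple of three; group_thousands below is an invented name for this minimal sketch.
// Illustrative sketch only (not in the diff): equivalent comma grouping over ASCII digits.
fn group_thousands(digits: &str) -> String {
    let bytes = digits.as_bytes();
    let mut out = String::with_capacity(bytes.len() + bytes.len() / 3);
    for (i, b) in bytes.iter().enumerate() {
        // Insert a comma when the remaining digit count is a multiple of three.
        if i > 0 && (bytes.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(*b as char);
    }
    out
}
// group_thousands("1234567") == "1,234,567", matching add_thousand_separators("1234567").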


@@ -0,0 +1,345 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible INSERT function implementation.
//!
//! INSERT(str, pos, len, newstr) - Inserts newstr into str at position pos,
//! replacing len characters.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, ArrayRef, AsArray, LargeStringBuilder};
use datafusion_common::arrow::compute::cast;
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, TypeSignature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "insert";
/// MySQL-compatible INSERT function.
///
/// Syntax: INSERT(str, pos, len, newstr)
/// Returns str with the substring beginning at position pos and len characters long
/// replaced by newstr.
///
/// - pos is 1-based
/// - If pos is out of range, returns the original string
/// - If len is out of range, replaces from pos to end of string
#[derive(Debug)]
pub struct InsertFunction {
signature: Signature,
}
impl InsertFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(InsertFunction::default());
}
}
impl Default for InsertFunction {
fn default() -> Self {
let mut signatures = Vec::new();
let string_types = [DataType::Utf8, DataType::LargeUtf8, DataType::Utf8View];
let int_types = [
DataType::Int64,
DataType::Int32,
DataType::Int16,
DataType::Int8,
DataType::UInt64,
DataType::UInt32,
DataType::UInt16,
DataType::UInt8,
];
for str_type in &string_types {
for newstr_type in &string_types {
for pos_type in &int_types {
for len_type in &int_types {
signatures.push(TypeSignature::Exact(vec![
str_type.clone(),
pos_type.clone(),
len_type.clone(),
newstr_type.clone(),
]));
}
}
}
}
Self {
signature: Signature::one_of(signatures, Volatility::Immutable),
}
}
}
impl fmt::Display for InsertFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for InsertFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::LargeUtf8)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
if args.args.len() != 4 {
return Err(DataFusionError::Execution(
"INSERT requires exactly 4 arguments: INSERT(str, pos, len, newstr)".to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
let len = arrays[0].len();
// Cast string arguments to LargeUtf8
let str_array = cast_to_large_utf8(&arrays[0], "str")?;
let newstr_array = cast_to_large_utf8(&arrays[3], "newstr")?;
let pos_array = cast_to_int64(&arrays[1], "pos")?;
let replace_len_array = cast_to_int64(&arrays[2], "len")?;
let str_arr = str_array.as_string::<i64>();
let pos_arr = pos_array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
let len_arr =
replace_len_array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
let newstr_arr = newstr_array.as_string::<i64>();
let mut builder = LargeStringBuilder::with_capacity(len, len * 32);
for i in 0..len {
// Check for NULLs
if str_arr.is_null(i)
|| pos_array.is_null(i)
|| replace_len_array.is_null(i)
|| newstr_arr.is_null(i)
{
builder.append_null();
continue;
}
let original = str_arr.value(i);
let pos = pos_arr.value(i);
let replace_len = len_arr.value(i);
let new_str = newstr_arr.value(i);
let result = insert_string(original, pos, replace_len, new_str);
builder.append_value(&result);
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
/// Cast array to LargeUtf8 for uniform string access.
fn cast_to_large_utf8(array: &ArrayRef, name: &str) -> datafusion_common::Result<ArrayRef> {
cast(array.as_ref(), &DataType::LargeUtf8)
.map_err(|e| DataFusionError::Execution(format!("INSERT: {} cast failed: {}", name, e)))
}
fn cast_to_int64(array: &ArrayRef, name: &str) -> datafusion_common::Result<ArrayRef> {
cast(array.as_ref(), &DataType::Int64)
.map_err(|e| DataFusionError::Execution(format!("INSERT: {} cast failed: {}", name, e)))
}
/// Perform the INSERT string operation.
/// pos is 1-based. If pos < 1 or pos > len(str) + 1, returns original string.
fn insert_string(original: &str, pos: i64, replace_len: i64, new_str: &str) -> String {
let char_count = original.chars().count();
// MySQL behavior: if pos < 1 or pos > string length + 1, return original
if pos < 1 || pos as usize > char_count + 1 {
return original.to_string();
}
let start_idx = (pos - 1) as usize; // Convert to 0-based
// Calculate end index for replacement
let replace_len = if replace_len < 0 {
0
} else {
replace_len as usize
};
let end_idx = (start_idx + replace_len).min(char_count);
let start_byte = char_to_byte_idx(original, start_idx);
let end_byte = char_to_byte_idx(original, end_idx);
let mut result = String::with_capacity(original.len() + new_str.len());
result.push_str(&original[..start_byte]);
result.push_str(new_str);
result.push_str(&original[end_byte..]);
result
}
fn char_to_byte_idx(s: &str, char_idx: usize) -> usize {
s.char_indices()
.nth(char_idx)
.map(|(idx, _)| idx)
.unwrap_or(s.len())
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::{Int64Array, StringArray};
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::LargeUtf8, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_insert_basic() {
let function = InsertFunction::default();
// INSERT('Quadratic', 3, 4, 'What') => 'QuWhattic'
let str_arr = Arc::new(StringArray::from(vec!["Quadratic"]));
let pos = Arc::new(Int64Array::from(vec![3]));
let len = Arc::new(Int64Array::from(vec![4]));
let newstr = Arc::new(StringArray::from(vec!["What"]));
let args = create_args(vec![str_arr, pos, len, newstr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "QuWhattic");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_insert_out_of_range_pos() {
let function = InsertFunction::default();
// INSERT('Quadratic', 0, 4, 'What') => 'Quadratic' (pos < 1)
let str_arr = Arc::new(StringArray::from(vec!["Quadratic", "Quadratic"]));
let pos = Arc::new(Int64Array::from(vec![0, 100]));
let len = Arc::new(Int64Array::from(vec![4, 4]));
let newstr = Arc::new(StringArray::from(vec!["What", "What"]));
let args = create_args(vec![str_arr, pos, len, newstr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "Quadratic"); // pos < 1
assert_eq!(str_array.value(1), "Quadratic"); // pos > length
} else {
panic!("Expected array result");
}
}
#[test]
fn test_insert_replace_to_end() {
let function = InsertFunction::default();
// INSERT('Quadratic', 3, 100, 'What') => 'QuWhat' (len exceeds remaining)
let str_arr = Arc::new(StringArray::from(vec!["Quadratic"]));
let pos = Arc::new(Int64Array::from(vec![3]));
let len = Arc::new(Int64Array::from(vec![100]));
let newstr = Arc::new(StringArray::from(vec!["What"]));
let args = create_args(vec![str_arr, pos, len, newstr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "QuWhat");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_insert_unicode() {
let function = InsertFunction::default();
// INSERT('hello世界', 6, 1, 'の') => 'helloの界'
let str_arr = Arc::new(StringArray::from(vec!["hello世界"]));
let pos = Arc::new(Int64Array::from(vec![6]));
let len = Arc::new(Int64Array::from(vec![1]));
let newstr = Arc::new(StringArray::from(vec!["の"]));
let args = create_args(vec![str_arr, pos, len, newstr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "helloの界");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_insert_with_nulls() {
let function = InsertFunction::default();
let str_arr = Arc::new(StringArray::from(vec![Some("hello"), None]));
let pos = Arc::new(Int64Array::from(vec![1, 1]));
let len = Arc::new(Int64Array::from(vec![1, 1]));
let newstr = Arc::new(StringArray::from(vec!["X", "X"]));
let args = create_args(vec![str_arr, pos, len, newstr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "Xello");
assert!(str_array.is_null(1));
} else {
panic!("Expected array result");
}
}
}
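For reference only (not part of insert.rs above), the character-based replacement performed by insert_string can be sketched directly over a char vector; insert_demo is an invented name.
// Illustrative sketch only (not in the diff): INSERT(str, pos, len, newstr) on character indices.
fn insert_demo(original: &str, pos: i64, len: i64, newstr: &str) -> String {
    let chars: Vec<char> = original.chars().collect();
    // pos is 1-based; out-of-range positions return the original string, as in MySQL.
    if pos < 1 || pos as usize > chars.len() + 1 {
        return original.to_string();
    }
    let start = (pos - 1) as usize;
    // Negative lengths replace nothing; lengths past the end replace to the end.
    let end = start.saturating_add(len.max(0) as usize).min(chars.len());
    let mut out: String = chars[..start].iter().collect();
    out.push_str(newstr);
    out.extend(chars[end..].iter());
    out
}
// insert_demo("Quadratic", 3, 4, "What") == "QuWhattic"
// insert_demo("Quadratic", 0, 4, "What") == "Quadratic"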


@@ -0,0 +1,373 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible LOCATE function implementation.
//!
//! LOCATE(substr, str) - Returns the position of the first occurrence of substr in str (1-based).
//! LOCATE(substr, str, pos) - Returns the position of the first occurrence of substr in str,
//! starting from position pos.
//! Returns 0 if substr is not found.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, ArrayRef, AsArray, Int64Builder};
use datafusion_common::arrow::compute::cast;
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, TypeSignature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "locate";
/// MySQL-compatible LOCATE function.
///
/// Syntax:
/// - LOCATE(substr, str) - Returns 1-based position of substr in str, or 0 if not found.
/// - LOCATE(substr, str, pos) - Same, but starts searching from position pos.
#[derive(Debug)]
pub struct LocateFunction {
signature: Signature,
}
impl LocateFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(LocateFunction::default());
}
}
impl Default for LocateFunction {
fn default() -> Self {
// Support 2 or 3 arguments with various string types
let mut signatures = Vec::new();
let string_types = [DataType::Utf8, DataType::LargeUtf8, DataType::Utf8View];
let int_types = [
DataType::Int64,
DataType::Int32,
DataType::Int16,
DataType::Int8,
DataType::UInt64,
DataType::UInt32,
DataType::UInt16,
DataType::UInt8,
];
// 2-argument form: LOCATE(substr, str)
for substr_type in &string_types {
for str_type in &string_types {
signatures.push(TypeSignature::Exact(vec![
substr_type.clone(),
str_type.clone(),
]));
}
}
// 3-argument form: LOCATE(substr, str, pos)
for substr_type in &string_types {
for str_type in &string_types {
for pos_type in &int_types {
signatures.push(TypeSignature::Exact(vec![
substr_type.clone(),
str_type.clone(),
pos_type.clone(),
]));
}
}
}
Self {
signature: Signature::one_of(signatures, Volatility::Immutable),
}
}
}
impl fmt::Display for LocateFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for LocateFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::Int64)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
let arg_count = args.args.len();
if !(2..=3).contains(&arg_count) {
return Err(DataFusionError::Execution(
"LOCATE requires 2 or 3 arguments: LOCATE(substr, str) or LOCATE(substr, str, pos)"
.to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
// Cast string arguments to LargeUtf8 for uniform access
let substr_array = cast_to_large_utf8(&arrays[0], "substr")?;
let str_array = cast_to_large_utf8(&arrays[1], "str")?;
let substr = substr_array.as_string::<i64>();
let str_arr = str_array.as_string::<i64>();
let len = substr.len();
// Handle optional pos argument
let pos_array: Option<ArrayRef> = if arg_count == 3 {
Some(cast_to_int64(&arrays[2], "pos")?)
} else {
None
};
let mut builder = Int64Builder::with_capacity(len);
for i in 0..len {
if substr.is_null(i) || str_arr.is_null(i) {
builder.append_null();
continue;
}
let needle = substr.value(i);
let haystack = str_arr.value(i);
// Get starting position (1-based in MySQL, convert to 0-based)
let start_pos = if let Some(ref pos_arr) = pos_array {
if pos_arr.is_null(i) {
builder.append_null();
continue;
}
let pos = pos_arr
.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>()
.value(i);
if pos < 1 {
// MySQL returns 0 for pos < 1
builder.append_value(0);
continue;
}
(pos - 1) as usize
} else {
0
};
// Find position using character-based indexing (for Unicode support)
let result = locate_substr(haystack, needle, start_pos);
builder.append_value(result);
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
/// Cast array to LargeUtf8 for uniform string access.
fn cast_to_large_utf8(array: &ArrayRef, name: &str) -> datafusion_common::Result<ArrayRef> {
cast(array.as_ref(), &DataType::LargeUtf8)
.map_err(|e| DataFusionError::Execution(format!("LOCATE: {} cast failed: {}", name, e)))
}
fn cast_to_int64(array: &ArrayRef, name: &str) -> datafusion_common::Result<ArrayRef> {
cast(array.as_ref(), &DataType::Int64)
.map_err(|e| DataFusionError::Execution(format!("LOCATE: {} cast failed: {}", name, e)))
}
/// Find the 1-based position of needle in haystack, starting from start_pos (0-based character index).
/// Returns 0 if not found.
fn locate_substr(haystack: &str, needle: &str, start_pos: usize) -> i64 {
// Handle empty needle - MySQL returns start_pos + 1
if needle.is_empty() {
let char_count = haystack.chars().count();
return if start_pos <= char_count {
(start_pos + 1) as i64
} else {
0
};
}
// Convert start_pos (character index) to byte index
let byte_start = haystack
.char_indices()
.nth(start_pos)
.map(|(idx, _)| idx)
.unwrap_or(haystack.len());
if byte_start >= haystack.len() {
return 0;
}
// Search in the substring
let search_str = &haystack[byte_start..];
if let Some(byte_pos) = search_str.find(needle) {
// Convert byte position back to character position
let char_pos = search_str[..byte_pos].chars().count();
// Return 1-based position relative to original string
(start_pos + char_pos + 1) as i64
} else {
0
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::StringArray;
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::Int64, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_locate_basic() {
let function = LocateFunction::default();
let substr = Arc::new(StringArray::from(vec!["world", "xyz", "hello"]));
let str_arr = Arc::new(StringArray::from(vec![
"hello world",
"hello world",
"hello world",
]));
let args = create_args(vec![substr, str_arr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 7); // "world" at position 7
assert_eq!(int_array.value(1), 0); // "xyz" not found
assert_eq!(int_array.value(2), 1); // "hello" at position 1
} else {
panic!("Expected array result");
}
}
#[test]
fn test_locate_with_position() {
let function = LocateFunction::default();
let substr = Arc::new(StringArray::from(vec!["o", "o", "o"]));
let str_arr = Arc::new(StringArray::from(vec![
"hello world",
"hello world",
"hello world",
]));
let pos = Arc::new(datafusion_common::arrow::array::Int64Array::from(vec![
1, 5, 8,
]));
let args = create_args(vec![substr, str_arr, pos]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 5); // first 'o' at position 5
assert_eq!(int_array.value(1), 5); // 'o' at position 5 (start from 5)
assert_eq!(int_array.value(2), 8); // 'o' in "world" at position 8
} else {
panic!("Expected array result");
}
}
#[test]
fn test_locate_unicode() {
let function = LocateFunction::default();
let substr = Arc::new(StringArray::from(vec!["世", "界"]));
let str_arr = Arc::new(StringArray::from(vec!["hello世界", "hello世界"]));
let args = create_args(vec![substr, str_arr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 6); // "世" at position 6
assert_eq!(int_array.value(1), 7); // "界" at position 7
} else {
panic!("Expected array result");
}
}
#[test]
fn test_locate_empty_needle() {
let function = LocateFunction::default();
let substr = Arc::new(StringArray::from(vec!["", ""]));
let str_arr = Arc::new(StringArray::from(vec!["hello", "hello"]));
let pos = Arc::new(datafusion_common::arrow::array::Int64Array::from(vec![
1, 3,
]));
let args = create_args(vec![substr, str_arr, pos]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 1); // empty string at pos 1
assert_eq!(int_array.value(1), 3); // empty string at pos 3
} else {
panic!("Expected array result");
}
}
#[test]
fn test_locate_with_nulls() {
let function = LocateFunction::default();
let substr = Arc::new(StringArray::from(vec![Some("o"), None]));
let str_arr = Arc::new(StringArray::from(vec![Some("hello"), Some("hello")]));
let args = create_args(vec![substr, str_arr]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let int_array = array.as_primitive::<datafusion_common::arrow::datatypes::Int64Type>();
assert_eq!(int_array.value(0), 5);
assert!(int_array.is_null(1));
} else {
panic!("Expected array result");
}
}
}
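As a reading aid (not part of locate.rs above), the 1-based, character-oriented search implemented by locate_substr can be sketched as follows; locate_demo and its 1-based start parameter are invented for illustration.
// Illustrative sketch only (not in the diff): LOCATE(substr, str, pos) on character positions.
fn locate_demo(needle: &str, haystack: &str, start: usize) -> i64 {
    let chars: Vec<char> = haystack.chars().collect();
    // MySQL returns 0 for positions < 1; anything beyond length + 1 cannot match.
    if start == 0 || start > chars.len() + 1 {
        return 0;
    }
    let tail: String = chars[start - 1..].iter().collect();
    match tail.find(needle) {
        // Convert the byte offset inside `tail` back to a character offset.
        Some(byte_pos) => (start + tail[..byte_pos].chars().count()) as i64,
        None => 0,
    }
}
// locate_demo("world", "hello world", 1) == 7
// locate_demo("o", "hello world", 6) == 8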


@@ -0,0 +1,252 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! MySQL-compatible SPACE function implementation.
//!
//! SPACE(N) - Returns a string consisting of N space characters.
use std::fmt;
use std::sync::Arc;
use datafusion_common::DataFusionError;
use datafusion_common::arrow::array::{Array, AsArray, LargeStringBuilder};
use datafusion_common::arrow::datatypes::DataType;
use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, Signature, TypeSignature, Volatility};
use crate::function::Function;
use crate::function_registry::FunctionRegistry;
const NAME: &str = "space";
// Safety limit for maximum number of spaces
const MAX_SPACE_COUNT: i64 = 1024 * 1024; // 1MB of spaces
/// MySQL-compatible SPACE function.
///
/// Syntax: SPACE(N)
/// Returns a string consisting of N space characters.
/// Returns NULL if N is NULL.
/// Returns empty string if N < 0.
#[derive(Debug)]
pub struct SpaceFunction {
signature: Signature,
}
impl SpaceFunction {
pub fn register(registry: &FunctionRegistry) {
registry.register_scalar(SpaceFunction::default());
}
}
impl Default for SpaceFunction {
fn default() -> Self {
Self {
signature: Signature::one_of(
vec![
TypeSignature::Exact(vec![DataType::Int64]),
TypeSignature::Exact(vec![DataType::Int32]),
TypeSignature::Exact(vec![DataType::Int16]),
TypeSignature::Exact(vec![DataType::Int8]),
TypeSignature::Exact(vec![DataType::UInt64]),
TypeSignature::Exact(vec![DataType::UInt32]),
TypeSignature::Exact(vec![DataType::UInt16]),
TypeSignature::Exact(vec![DataType::UInt8]),
],
Volatility::Immutable,
),
}
}
}
impl fmt::Display for SpaceFunction {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
write!(f, "{}", NAME.to_ascii_uppercase())
}
}
impl Function for SpaceFunction {
fn name(&self) -> &str {
NAME
}
fn return_type(&self, _: &[DataType]) -> datafusion_common::Result<DataType> {
Ok(DataType::LargeUtf8)
}
fn signature(&self) -> &Signature {
&self.signature
}
fn invoke_with_args(
&self,
args: ScalarFunctionArgs,
) -> datafusion_common::Result<ColumnarValue> {
if args.args.len() != 1 {
return Err(DataFusionError::Execution(
"SPACE requires exactly 1 argument: SPACE(N)".to_string(),
));
}
let arrays = ColumnarValue::values_to_arrays(&args.args)?;
let len = arrays[0].len();
let n_array = &arrays[0];
let mut builder = LargeStringBuilder::with_capacity(len, len * 10);
for i in 0..len {
if n_array.is_null(i) {
builder.append_null();
continue;
}
let n = get_int_value(n_array, i)?;
if n < 0 {
// MySQL returns empty string for negative values
builder.append_value("");
} else if n > MAX_SPACE_COUNT {
return Err(DataFusionError::Execution(format!(
"SPACE: requested {} spaces exceeds maximum allowed ({})",
n, MAX_SPACE_COUNT
)));
} else {
let spaces = " ".repeat(n as usize);
builder.append_value(&spaces);
}
}
Ok(ColumnarValue::Array(Arc::new(builder.finish())))
}
}
/// Extract integer value from various integer types.
fn get_int_value(
array: &datafusion_common::arrow::array::ArrayRef,
index: usize,
) -> datafusion_common::Result<i64> {
use datafusion_common::arrow::datatypes as arrow_types;
match array.data_type() {
DataType::Int64 => Ok(array.as_primitive::<arrow_types::Int64Type>().value(index)),
DataType::Int32 => Ok(array.as_primitive::<arrow_types::Int32Type>().value(index) as i64),
DataType::Int16 => Ok(array.as_primitive::<arrow_types::Int16Type>().value(index) as i64),
DataType::Int8 => Ok(array.as_primitive::<arrow_types::Int8Type>().value(index) as i64),
DataType::UInt64 => {
let v = array.as_primitive::<arrow_types::UInt64Type>().value(index);
if v > i64::MAX as u64 {
Err(DataFusionError::Execution(format!(
"SPACE: value {} exceeds maximum",
v
)))
} else {
Ok(v as i64)
}
}
DataType::UInt32 => Ok(array.as_primitive::<arrow_types::UInt32Type>().value(index) as i64),
DataType::UInt16 => Ok(array.as_primitive::<arrow_types::UInt16Type>().value(index) as i64),
DataType::UInt8 => Ok(array.as_primitive::<arrow_types::UInt8Type>().value(index) as i64),
_ => Err(DataFusionError::Execution(format!(
"SPACE: unsupported type {:?}",
array.data_type()
))),
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use datafusion_common::arrow::array::Int64Array;
use datafusion_common::arrow::datatypes::Field;
use datafusion_expr::ScalarFunctionArgs;
use super::*;
fn create_args(arrays: Vec<datafusion_common::arrow::array::ArrayRef>) -> ScalarFunctionArgs {
let arg_fields: Vec<_> = arrays
.iter()
.enumerate()
.map(|(i, arr)| {
Arc::new(Field::new(
format!("arg_{}", i),
arr.data_type().clone(),
true,
))
})
.collect();
ScalarFunctionArgs {
args: arrays.iter().cloned().map(ColumnarValue::Array).collect(),
arg_fields,
return_field: Arc::new(Field::new("result", DataType::LargeUtf8, true)),
number_rows: arrays[0].len(),
config_options: Arc::new(datafusion_common::config::ConfigOptions::default()),
}
}
#[test]
fn test_space_basic() {
let function = SpaceFunction::default();
let n = Arc::new(Int64Array::from(vec![0, 1, 5]));
let args = create_args(vec![n]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "");
assert_eq!(str_array.value(1), " ");
assert_eq!(str_array.value(2), " ");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_space_negative() {
let function = SpaceFunction::default();
let n = Arc::new(Int64Array::from(vec![-1, -100]));
let args = create_args(vec![n]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), "");
assert_eq!(str_array.value(1), "");
} else {
panic!("Expected array result");
}
}
#[test]
fn test_space_with_nulls() {
let function = SpaceFunction::default();
let n = Arc::new(Int64Array::from(vec![Some(3), None]));
let args = create_args(vec![n]);
let result = function.invoke_with_args(args).unwrap();
if let ColumnarValue::Array(array) = result {
let str_array = array.as_string::<i64>();
assert_eq!(str_array.value(0), " ");
assert!(str_array.is_null(1));
} else {
panic!("Expected array result");
}
}
}
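For completeness (not part of space.rs above), the per-row rule is a clamped repeat; space_demo is an invented name, and the real UDF raises an execution error rather than returning None when the limit is exceeded.
// Illustrative sketch only (not in the diff): per-row SPACE rule.
fn space_demo(n: i64) -> Option<String> {
    const MAX: i64 = 1024 * 1024; // mirrors MAX_SPACE_COUNT above
    if n > MAX {
        return None; // the UDF reports an error here instead of returning a value
    }
    // Negative counts produce an empty string, as in MySQL.
    Some(" ".repeat(n.max(0) as usize))
}
// space_demo(3) == Some("   ".to_string())
// space_demo(-5) == Some(String::new())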


@@ -0,0 +1,20 @@
[package]
name = "common-memory-manager"
version.workspace = true
edition.workspace = true
license.workspace = true
[lints]
workspace = true
[dependencies]
common-error = { workspace = true }
common-macro = { workspace = true }
common-telemetry = { workspace = true }
humantime = { workspace = true }
serde = { workspace = true }
snafu = { workspace = true }
tokio = { workspace = true, features = ["sync"] }
[dev-dependencies]
tokio = { workspace = true, features = ["rt", "macros"] }


@@ -0,0 +1,63 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use std::time::Duration;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
use snafu::Snafu;
pub type Result<T> = std::result::Result<T, Error>;
#[derive(Snafu)]
#[snafu(visibility(pub))]
#[stack_trace_debug]
pub enum Error {
#[snafu(display(
"Memory limit exceeded: requested {requested_bytes} bytes, limit {limit_bytes} bytes"
))]
MemoryLimitExceeded {
requested_bytes: u64,
limit_bytes: u64,
},
#[snafu(display("Memory semaphore unexpectedly closed"))]
MemorySemaphoreClosed,
#[snafu(display(
"Timeout waiting for memory quota: requested {requested_bytes} bytes, waited {waited:?}"
))]
MemoryAcquireTimeout {
requested_bytes: u64,
waited: Duration,
},
}
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
use Error::*;
match self {
MemoryLimitExceeded { .. } => StatusCode::RuntimeResourcesExhausted,
MemorySemaphoreClosed => StatusCode::Unexpected,
MemoryAcquireTimeout { .. } => StatusCode::RuntimeResourcesExhausted,
}
}
fn as_any(&self) -> &dyn Any {
self
}
}


@@ -0,0 +1,168 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
/// Memory permit granularity for different use cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PermitGranularity {
/// 1 KB per permit
///
/// Use for:
/// - HTTP/gRPC request limiting (small, high-concurrency operations)
/// - Small batch operations
/// - Scenarios requiring fine-grained fairness
Kilobyte,
/// 1 MB per permit (default)
///
/// Use for:
/// - Query execution memory management
/// - Compaction memory control
/// - Large, long-running operations
#[default]
Megabyte,
}
impl PermitGranularity {
/// Returns the number of bytes per permit.
#[inline]
pub const fn bytes(self) -> u64 {
match self {
Self::Kilobyte => 1024,
Self::Megabyte => 1024 * 1024,
}
}
/// Returns a human-readable string representation.
pub const fn as_str(self) -> &'static str {
match self {
Self::Kilobyte => "1KB",
Self::Megabyte => "1MB",
}
}
/// Converts bytes to permits based on this granularity.
///
/// Rounds up to ensure the requested bytes are fully covered.
/// Clamped to Semaphore::MAX_PERMITS.
#[inline]
pub fn bytes_to_permits(self, bytes: u64) -> u32 {
use tokio::sync::Semaphore;
let granularity_bytes = self.bytes();
bytes
.saturating_add(granularity_bytes - 1)
.saturating_div(granularity_bytes)
.min(Semaphore::MAX_PERMITS as u64)
.min(u32::MAX as u64) as u32
}
/// Converts permits to bytes based on this granularity.
#[inline]
pub fn permits_to_bytes(self, permits: u32) -> u64 {
(permits as u64).saturating_mul(self.bytes())
}
}
impl fmt::Display for PermitGranularity {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", self.as_str())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bytes_to_permits_kilobyte() {
let granularity = PermitGranularity::Kilobyte;
// Exact multiples
assert_eq!(granularity.bytes_to_permits(1024), 1);
assert_eq!(granularity.bytes_to_permits(2048), 2);
assert_eq!(granularity.bytes_to_permits(10 * 1024), 10);
// Rounds up
assert_eq!(granularity.bytes_to_permits(1), 1);
assert_eq!(granularity.bytes_to_permits(1025), 2);
assert_eq!(granularity.bytes_to_permits(2047), 2);
}
#[test]
fn test_bytes_to_permits_megabyte() {
let granularity = PermitGranularity::Megabyte;
// Exact multiples
assert_eq!(granularity.bytes_to_permits(1024 * 1024), 1);
assert_eq!(granularity.bytes_to_permits(2 * 1024 * 1024), 2);
// Rounds up
assert_eq!(granularity.bytes_to_permits(1), 1);
assert_eq!(granularity.bytes_to_permits(1024), 1);
assert_eq!(granularity.bytes_to_permits(1024 * 1024 + 1), 2);
}
#[test]
fn test_bytes_to_permits_zero_bytes() {
assert_eq!(PermitGranularity::Kilobyte.bytes_to_permits(0), 0);
assert_eq!(PermitGranularity::Megabyte.bytes_to_permits(0), 0);
}
#[test]
fn test_bytes_to_permits_clamps_to_maximum() {
use tokio::sync::Semaphore;
let max_permits = (Semaphore::MAX_PERMITS as u64).min(u32::MAX as u64) as u32;
assert_eq!(
PermitGranularity::Kilobyte.bytes_to_permits(u64::MAX),
max_permits
);
assert_eq!(
PermitGranularity::Megabyte.bytes_to_permits(u64::MAX),
max_permits
);
}
#[test]
fn test_permits_to_bytes() {
assert_eq!(PermitGranularity::Kilobyte.permits_to_bytes(1), 1024);
assert_eq!(PermitGranularity::Kilobyte.permits_to_bytes(10), 10 * 1024);
assert_eq!(PermitGranularity::Megabyte.permits_to_bytes(1), 1024 * 1024);
assert_eq!(
PermitGranularity::Megabyte.permits_to_bytes(10),
10 * 1024 * 1024
);
}
#[test]
fn test_round_trip_conversion() {
// Kilobyte: bytes -> permits -> bytes (should round up)
let kb = PermitGranularity::Kilobyte;
let permits = kb.bytes_to_permits(1500);
let bytes = kb.permits_to_bytes(permits);
assert!(bytes >= 1500); // Must cover original request
assert_eq!(bytes, 2048); // 2KB
// Megabyte: bytes -> permits -> bytes (should round up)
let mb = PermitGranularity::Megabyte;
let permits = mb.bytes_to_permits(1500);
let bytes = mb.permits_to_bytes(permits);
assert!(bytes >= 1500);
assert_eq!(bytes, 1024 * 1024); // 1MB
}
}

View File

@@ -0,0 +1,231 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::{fmt, mem};
use common_telemetry::debug;
use snafu::ensure;
use tokio::sync::{OwnedSemaphorePermit, TryAcquireError};
use crate::error::{
MemoryAcquireTimeoutSnafu, MemoryLimitExceededSnafu, MemorySemaphoreClosedSnafu, Result,
};
use crate::manager::{MemoryMetrics, MemoryQuota};
use crate::policy::OnExhaustedPolicy;
/// Guard representing a slice of reserved memory.
pub struct MemoryGuard<M: MemoryMetrics> {
pub(crate) state: GuardState<M>,
}
pub(crate) enum GuardState<M: MemoryMetrics> {
Unlimited,
Limited {
permit: OwnedSemaphorePermit,
quota: MemoryQuota<M>,
},
}
impl<M: MemoryMetrics> MemoryGuard<M> {
pub(crate) fn unlimited() -> Self {
Self {
state: GuardState::Unlimited,
}
}
pub(crate) fn limited(permit: OwnedSemaphorePermit, quota: MemoryQuota<M>) -> Self {
Self {
state: GuardState::Limited { permit, quota },
}
}
/// Returns the granted quota in bytes (always 0 in unlimited mode).
pub fn granted_bytes(&self) -> u64 {
match &self.state {
GuardState::Unlimited => 0,
GuardState::Limited { permit, quota } => {
quota.permits_to_bytes(permit.num_permits() as u32)
}
}
}
/// Acquires additional memory, waiting if necessary until enough is available.
///
/// On success, merges the new memory into this guard.
///
/// # Errors
/// - Returns error if requested bytes would exceed the manager's total limit
/// - Returns error if the semaphore is unexpectedly closed
pub async fn acquire_additional(&mut self, bytes: u64) -> Result<()> {
match &mut self.state {
GuardState::Unlimited => Ok(()),
GuardState::Limited { permit, quota } => {
if bytes == 0 {
return Ok(());
}
let additional_permits = quota.bytes_to_permits(bytes);
let current_permits = permit.num_permits() as u32;
ensure!(
current_permits.saturating_add(additional_permits) <= quota.limit_permits,
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: quota.permits_to_bytes(quota.limit_permits)
}
);
let additional_permit = quota
.semaphore
.clone()
.acquire_many_owned(additional_permits)
.await
.map_err(|_| MemorySemaphoreClosedSnafu.build())?;
permit.merge(additional_permit);
quota.update_in_use_metric();
debug!("Acquired additional {} bytes", bytes);
Ok(())
}
}
}
/// Tries to acquire additional memory without waiting.
///
/// On success, merges the new memory into this guard and returns true.
/// On failure, returns false and leaves this guard unchanged.
pub fn try_acquire_additional(&mut self, bytes: u64) -> bool {
match &mut self.state {
GuardState::Unlimited => true,
GuardState::Limited { permit, quota } => {
if bytes == 0 {
return true;
}
let additional_permits = quota.bytes_to_permits(bytes);
match quota
.semaphore
.clone()
.try_acquire_many_owned(additional_permits)
{
Ok(additional_permit) => {
permit.merge(additional_permit);
quota.update_in_use_metric();
debug!("Acquired additional {} bytes", bytes);
true
}
Err(TryAcquireError::NoPermits) | Err(TryAcquireError::Closed) => {
quota.metrics.inc_rejected("try_acquire_additional");
false
}
}
}
}
}
/// Acquires additional memory based on the given policy.
///
/// - For `OnExhaustedPolicy::Wait`: Waits up to the timeout duration for memory to become available
/// - For `OnExhaustedPolicy::Fail`: Returns immediately if memory is not available
///
/// # Errors
/// - `MemoryLimitExceeded`: Requested bytes would exceed the total limit (both policies), or memory is currently exhausted (Fail policy only)
/// - `MemoryAcquireTimeout`: Timeout elapsed while waiting for memory (Wait policy only)
/// - `MemorySemaphoreClosed`: The internal semaphore is unexpectedly closed (rare, indicates system issue)
pub async fn acquire_additional_with_policy(
&mut self,
bytes: u64,
policy: OnExhaustedPolicy,
) -> Result<()> {
match policy {
OnExhaustedPolicy::Wait { timeout } => {
match tokio::time::timeout(timeout, self.acquire_additional(bytes)).await {
Ok(Ok(())) => Ok(()),
Ok(Err(e)) => Err(e),
Err(_elapsed) => MemoryAcquireTimeoutSnafu {
requested_bytes: bytes,
waited: timeout,
}
.fail(),
}
}
OnExhaustedPolicy::Fail => {
if self.try_acquire_additional(bytes) {
Ok(())
} else {
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: match &self.state {
GuardState::Unlimited => 0, // unreachable: unlimited mode always succeeds
GuardState::Limited { quota, .. } => {
quota.permits_to_bytes(quota.limit_permits)
}
},
}
.fail()
}
}
}
}
/// Releases a portion of the granted memory back to the pool without dropping the guard.
///
/// Returns `true` if the release succeeds or is a no-op; `false` if the requested amount exceeds what the guard currently holds.
pub fn release_partial(&mut self, bytes: u64) -> bool {
match &mut self.state {
GuardState::Unlimited => true,
GuardState::Limited { permit, quota } => {
if bytes == 0 {
return true;
}
let release_permits = quota.bytes_to_permits(bytes);
match permit.split(release_permits as usize) {
Some(released_permit) => {
let released_bytes =
quota.permits_to_bytes(released_permit.num_permits() as u32);
drop(released_permit);
quota.update_in_use_metric();
debug!("Released {} bytes from memory guard", released_bytes);
true
}
None => false,
}
}
}
}
}
impl<M: MemoryMetrics> Drop for MemoryGuard<M> {
fn drop(&mut self) {
if let GuardState::Limited { permit, quota } =
mem::replace(&mut self.state, GuardState::Unlimited)
{
let bytes = quota.permits_to_bytes(permit.num_permits() as u32);
drop(permit);
quota.update_in_use_metric();
debug!("Released memory: {} bytes", bytes);
}
}
}
impl<M: MemoryMetrics> fmt::Debug for MemoryGuard<M> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("MemoryGuard")
.field("granted_bytes", &self.granted_bytes())
.finish()
}
}
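
A minimal sketch of the guard lifecycle, written in the same crate-internal style as the tests (use crate::…); the 64 MB budget and the fallback behaviour are illustrative assumptions, not something the crate prescribes.

use crate::{MemoryManager, NoOpMetrics};

const MB: u64 = 1024 * 1024;

async fn guard_lifecycle() -> crate::Result<()> {
    let manager = MemoryManager::new(64 * MB, NoOpMetrics);

    // Reserve an initial slice; waits if the pool is currently exhausted.
    let mut guard = manager.acquire(8 * MB).await?;

    // Grow the reservation opportunistically; `false` means not enough is left,
    // so the caller can proceed with the smaller reservation instead of failing.
    if !guard.try_acquire_additional(4 * MB) {
        // Degrade gracefully, e.g. use a smaller batch size.
    }

    // Hand part of the reservation back early so other tasks can make progress.
    guard.release_partial(2 * MB);

    // Dropping the guard returns whatever it still holds.
    drop(guard);
    assert_eq!(manager.used_bytes(), 0);
    Ok(())
}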

View File

@@ -0,0 +1,49 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Generic memory management for resource-constrained operations.
//!
//! This crate provides a reusable memory quota system based on semaphores,
//! allowing different subsystems (compaction, flush, index build, etc.) to
//! share the same allocation logic while using their own metrics.
mod error;
mod granularity;
mod guard;
mod manager;
mod policy;
#[cfg(test)]
mod tests;
pub use error::{Error, Result};
pub use granularity::PermitGranularity;
pub use guard::MemoryGuard;
pub use manager::{MemoryManager, MemoryMetrics};
pub use policy::{DEFAULT_MEMORY_WAIT_TIMEOUT, OnExhaustedPolicy};
/// No-op metrics implementation for testing.
#[derive(Clone, Copy, Debug, Default)]
pub struct NoOpMetrics;
impl MemoryMetrics for NoOpMetrics {
#[inline(always)]
fn set_limit(&self, _: i64) {}
#[inline(always)]
fn set_in_use(&self, _: i64) {}
#[inline(always)]
fn inc_rejected(&self, _: &str) {}
}
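
A sketch of wiring subsystem-specific metrics into the manager, which is the point of keeping the crate generic over MemoryMetrics. The CompactionMemoryMetrics type and its atomics are hypothetical stand-ins for real gauges and counters.

use std::sync::Arc;
use std::sync::atomic::{AtomicI64, Ordering};

use crate::{MemoryManager, MemoryMetrics};

#[derive(Clone, Default)]
struct CompactionMemoryMetrics {
    limit: Arc<AtomicI64>,
    in_use: Arc<AtomicI64>,
}

impl MemoryMetrics for CompactionMemoryMetrics {
    fn set_limit(&self, bytes: i64) {
        self.limit.store(bytes, Ordering::Relaxed);
    }
    fn set_in_use(&self, bytes: i64) {
        self.in_use.store(bytes, Ordering::Relaxed);
    }
    fn inc_rejected(&self, _reason: &str) {
        // A real implementation would bump a counter labelled by `reason`.
    }
}

fn compaction_memory_manager(limit_bytes: u64) -> MemoryManager<CompactionMemoryMetrics> {
    // `limit_bytes = 0` keeps the manager unlimited, mirroring `MemoryManager::new`.
    MemoryManager::new(limit_bytes, CompactionMemoryMetrics::default())
}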

View File

@@ -0,0 +1,222 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use snafu::ensure;
use tokio::sync::{Semaphore, TryAcquireError};
use crate::error::{
MemoryAcquireTimeoutSnafu, MemoryLimitExceededSnafu, MemorySemaphoreClosedSnafu, Result,
};
use crate::granularity::PermitGranularity;
use crate::guard::MemoryGuard;
use crate::policy::OnExhaustedPolicy;
/// Trait for recording memory usage metrics.
pub trait MemoryMetrics: Clone + Send + Sync + 'static {
fn set_limit(&self, bytes: i64);
fn set_in_use(&self, bytes: i64);
fn inc_rejected(&self, reason: &str);
}
/// Generic memory manager for quota-controlled operations.
#[derive(Clone)]
pub struct MemoryManager<M: MemoryMetrics> {
quota: Option<MemoryQuota<M>>,
}
impl<M: MemoryMetrics + Default> Default for MemoryManager<M> {
fn default() -> Self {
Self::new(0, M::default())
}
}
#[derive(Clone)]
pub(crate) struct MemoryQuota<M: MemoryMetrics> {
pub(crate) semaphore: Arc<Semaphore>,
pub(crate) limit_permits: u32,
pub(crate) granularity: PermitGranularity,
pub(crate) metrics: M,
}
impl<M: MemoryMetrics> MemoryManager<M> {
/// Creates a new memory manager with the given limit in bytes.
/// `limit_bytes = 0` disables the limit.
pub fn new(limit_bytes: u64, metrics: M) -> Self {
Self::with_granularity(limit_bytes, PermitGranularity::default(), metrics)
}
/// Creates a new memory manager with specified granularity.
pub fn with_granularity(limit_bytes: u64, granularity: PermitGranularity, metrics: M) -> Self {
if limit_bytes == 0 {
metrics.set_limit(0);
return Self { quota: None };
}
let limit_permits = granularity.bytes_to_permits(limit_bytes);
let limit_aligned_bytes = granularity.permits_to_bytes(limit_permits);
metrics.set_limit(limit_aligned_bytes as i64);
Self {
quota: Some(MemoryQuota {
semaphore: Arc::new(Semaphore::new(limit_permits as usize)),
limit_permits,
granularity,
metrics,
}),
}
}
/// Returns the configured limit in bytes (0 if unlimited).
pub fn limit_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| quota.permits_to_bytes(quota.limit_permits))
.unwrap_or(0)
}
/// Returns currently used bytes.
pub fn used_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| quota.permits_to_bytes(quota.used_permits()))
.unwrap_or(0)
}
/// Returns available bytes.
pub fn available_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| quota.permits_to_bytes(quota.available_permits_clamped()))
.unwrap_or(0)
}
/// Acquires memory, waiting if necessary until enough is available.
///
/// # Errors
/// - Returns error if requested bytes exceed the total limit
/// - Returns error if the semaphore is unexpectedly closed
pub async fn acquire(&self, bytes: u64) -> Result<MemoryGuard<M>> {
match &self.quota {
None => Ok(MemoryGuard::unlimited()),
Some(quota) => {
let permits = quota.bytes_to_permits(bytes);
ensure!(
permits <= quota.limit_permits,
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: self.limit_bytes()
}
);
let permit = quota
.semaphore
.clone()
.acquire_many_owned(permits)
.await
.map_err(|_| MemorySemaphoreClosedSnafu.build())?;
quota.update_in_use_metric();
Ok(MemoryGuard::limited(permit, quota.clone()))
}
}
}
/// Tries to acquire memory. Returns Some(guard) on success, None if insufficient.
pub fn try_acquire(&self, bytes: u64) -> Option<MemoryGuard<M>> {
match &self.quota {
None => Some(MemoryGuard::unlimited()),
Some(quota) => {
let permits = quota.bytes_to_permits(bytes);
match quota.semaphore.clone().try_acquire_many_owned(permits) {
Ok(permit) => {
quota.update_in_use_metric();
Some(MemoryGuard::limited(permit, quota.clone()))
}
Err(TryAcquireError::NoPermits) | Err(TryAcquireError::Closed) => {
quota.metrics.inc_rejected("try_acquire");
None
}
}
}
}
}
/// Acquires memory based on the given policy.
///
/// - For `OnExhaustedPolicy::Wait`: Waits up to the timeout duration for memory to become available
/// - For `OnExhaustedPolicy::Fail`: Returns immediately if memory is not available
///
/// # Errors
/// - `MemoryLimitExceeded`: Requested bytes exceed the total limit (both policies), or memory is currently exhausted (Fail policy only)
/// - `MemoryAcquireTimeout`: Timeout elapsed while waiting for memory (Wait policy only)
/// - `MemorySemaphoreClosed`: The internal semaphore is unexpectedly closed (rare, indicates system issue)
pub async fn acquire_with_policy(
&self,
bytes: u64,
policy: OnExhaustedPolicy,
) -> Result<MemoryGuard<M>> {
match policy {
OnExhaustedPolicy::Wait { timeout } => {
match tokio::time::timeout(timeout, self.acquire(bytes)).await {
Ok(Ok(guard)) => Ok(guard),
Ok(Err(e)) => Err(e),
Err(_elapsed) => {
// Timeout elapsed while waiting
MemoryAcquireTimeoutSnafu {
requested_bytes: bytes,
waited: timeout,
}
.fail()
}
}
}
OnExhaustedPolicy::Fail => self.try_acquire(bytes).ok_or_else(|| {
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: self.limit_bytes(),
}
.build()
}),
}
}
}
impl<M: MemoryMetrics> MemoryQuota<M> {
pub(crate) fn bytes_to_permits(&self, bytes: u64) -> u32 {
self.granularity.bytes_to_permits(bytes)
}
pub(crate) fn permits_to_bytes(&self, permits: u32) -> u64 {
self.granularity.permits_to_bytes(permits)
}
pub(crate) fn used_permits(&self) -> u32 {
self.limit_permits
.saturating_sub(self.available_permits_clamped())
}
pub(crate) fn available_permits_clamped(&self) -> u32 {
self.semaphore
.available_permits()
.min(self.limit_permits as usize) as u32
}
pub(crate) fn update_in_use_metric(&self) {
let bytes = self.permits_to_bytes(self.used_permits());
self.metrics.set_in_use(bytes as i64);
}
}
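
A short sketch contrasting the two exhaustion policies against a small 8 MB pool; the sizes and the 500 ms timeout are arbitrary example values.

use std::time::Duration;

use crate::{MemoryManager, NoOpMetrics, OnExhaustedPolicy};

async fn policy_demo() {
    let manager = MemoryManager::new(8 * 1024 * 1024, NoOpMetrics);

    // Wait: block up to 500 ms for quota, then fail with MemoryAcquireTimeout.
    let wait_policy = OnExhaustedPolicy::Wait {
        timeout: Duration::from_millis(500),
    };
    let _guard = manager
        .acquire_with_policy(4 * 1024 * 1024, wait_policy)
        .await
        .expect("pool is empty, so 4 MB is available");

    // Fail: return MemoryLimitExceeded immediately when the pool is exhausted.
    let res = manager
        .acquire_with_policy(8 * 1024 * 1024, OnExhaustedPolicy::Fail)
        .await;
    assert!(res.is_err()); // only 4 MB left, the request needs 8 MB
}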

View File

@@ -0,0 +1,83 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::time::Duration;
use humantime::{format_duration, parse_duration};
use serde::{Deserialize, Serialize};
/// Default wait timeout for memory acquisition.
pub const DEFAULT_MEMORY_WAIT_TIMEOUT: Duration = Duration::from_secs(10);
/// Defines how to react when memory cannot be acquired immediately.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnExhaustedPolicy {
/// Wait until enough memory is released, bounded by timeout.
Wait { timeout: Duration },
/// Fail immediately if memory is not available.
Fail,
}
impl Default for OnExhaustedPolicy {
fn default() -> Self {
OnExhaustedPolicy::Wait {
timeout: DEFAULT_MEMORY_WAIT_TIMEOUT,
}
}
}
impl Serialize for OnExhaustedPolicy {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let text = match self {
OnExhaustedPolicy::Fail => "fail".to_string(),
OnExhaustedPolicy::Wait { timeout } if *timeout == DEFAULT_MEMORY_WAIT_TIMEOUT => {
"wait".to_string()
}
OnExhaustedPolicy::Wait { timeout } => format!("wait({})", format_duration(*timeout)),
};
serializer.serialize_str(&text)
}
}
impl<'de> Deserialize<'de> for OnExhaustedPolicy {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let raw = String::deserialize(deserializer)?;
let lower = raw.to_ascii_lowercase();
// Accept both "skip" (legacy) and "fail".
if lower == "skip" || lower == "fail" {
return Ok(OnExhaustedPolicy::Fail);
}
if lower == "wait" {
return Ok(OnExhaustedPolicy::default());
}
if lower.starts_with("wait(") && lower.ends_with(')') {
let inner = &raw[5..raw.len() - 1];
let timeout = parse_duration(inner).map_err(serde::de::Error::custom)?;
return Ok(OnExhaustedPolicy::Wait { timeout });
}
Err(serde::de::Error::custom(format!(
"invalid memory policy: {}, expected wait, wait(<duration>), fail",
raw
)))
}
}
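
The serde implementations above accept a small string grammar. A sketch of the accepted forms, using serde_json only as a convenient (de)serializer; in a real deployment the value would typically come from a TOML config field.

use std::time::Duration;

use crate::{DEFAULT_MEMORY_WAIT_TIMEOUT, OnExhaustedPolicy};

fn parse(text: &str) -> OnExhaustedPolicy {
    serde_json::from_value(serde_json::Value::String(text.to_string())).unwrap()
}

fn policy_strings() {
    // "wait" uses the default 10s timeout.
    assert_eq!(
        parse("wait"),
        OnExhaustedPolicy::Wait { timeout: DEFAULT_MEMORY_WAIT_TIMEOUT }
    );
    // "wait(<duration>)" accepts any humantime duration.
    assert_eq!(
        parse("wait(30s)"),
        OnExhaustedPolicy::Wait { timeout: Duration::from_secs(30) }
    );
    assert_eq!(parse("fail"), OnExhaustedPolicy::Fail);
    // "skip" is accepted as a legacy alias for "fail".
    assert_eq!(parse("skip"), OnExhaustedPolicy::Fail);
}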

View File

@@ -0,0 +1,411 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use tokio::time::{Duration, sleep};
use crate::{MemoryManager, NoOpMetrics, PermitGranularity};
// Helper constant for tests - use default Megabyte granularity
const PERMIT_GRANULARITY_BYTES: u64 = PermitGranularity::Megabyte.bytes();
#[test]
fn test_try_acquire_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics);
let guard = manager.try_acquire(10 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.limit_bytes(), 0);
assert_eq!(guard.granted_bytes(), 0);
}
#[test]
fn test_try_acquire_limited_success_and_release() {
let bytes = 2 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(bytes, NoOpMetrics);
{
let guard = manager.try_acquire(PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(guard.granted_bytes(), PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), PERMIT_GRANULARITY_BYTES);
drop(guard);
}
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_try_acquire_exceeds_limit() {
let limit = PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let result = manager.try_acquire(limit + PERMIT_GRANULARITY_BYTES);
assert!(result.is_none());
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_blocks_and_unblocks() {
let bytes = 2 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(bytes, NoOpMetrics);
let guard = manager.try_acquire(bytes).unwrap();
// Spawn a task that will block on acquire()
let waiter = {
let manager = manager.clone();
tokio::spawn(async move {
// This will block until memory is available
let _guard = manager.acquire(bytes).await.unwrap();
})
};
sleep(Duration::from_millis(10)).await;
// Release memory - this should unblock the waiter
drop(guard);
// Waiter should complete now
waiter.await.unwrap();
}
#[test]
fn test_request_additional_success() {
let limit = 10 * PERMIT_GRANULARITY_BYTES; // 10MB limit
let manager = MemoryManager::new(limit, NoOpMetrics);
// Acquire base quota (5MB)
let base = 5 * PERMIT_GRANULARITY_BYTES;
let mut guard = manager.try_acquire(base).unwrap();
assert_eq!(guard.granted_bytes(), base);
assert_eq!(manager.used_bytes(), base);
// Request additional memory (3MB) - should succeed and merge
assert!(guard.try_acquire_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_request_additional_exceeds_limit() {
let limit = 10 * PERMIT_GRANULARITY_BYTES; // 10MB limit
let manager = MemoryManager::new(limit, NoOpMetrics);
// Acquire base quota (5MB)
let base = 5 * PERMIT_GRANULARITY_BYTES;
let mut guard = manager.try_acquire(base).unwrap();
// Request additional memory (3MB) - should succeed
assert!(guard.try_acquire_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// Request more (3MB) - should fail (would exceed 10MB limit)
let result = guard.try_acquire_additional(3 * PERMIT_GRANULARITY_BYTES);
assert!(!result);
// Still at 8MB
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
assert_eq!(guard.granted_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_request_additional_auto_release_on_guard_drop() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
{
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request additional - memory is merged into guard
assert!(guard.try_acquire_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// When guard drops, all memory (base + additional) is released together
}
// After scope, all memory should be released
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_request_additional_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics); // Unlimited
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Should always succeed with unlimited manager
assert!(guard.try_acquire_additional(100 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 0);
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_request_additional_zero_bytes() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request 0 bytes should succeed without affecting anything
assert!(guard.try_acquire_additional(0));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_success() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(8 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// Release half
assert!(guard.release_partial(4 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 4 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 4 * PERMIT_GRANULARITY_BYTES);
// Released memory should be available to others
let _guard2 = manager.try_acquire(4 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_exceeds_granted() {
let manager = MemoryManager::new(10 * PERMIT_GRANULARITY_BYTES, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Try to release more than granted - should fail
assert!(!guard.release_partial(10 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics);
let mut guard = manager.try_acquire(100 * PERMIT_GRANULARITY_BYTES).unwrap();
// Unlimited guard - release should succeed (no-op)
assert!(guard.release_partial(50 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 0);
}
#[test]
fn test_request_and_early_release_symmetry() {
let limit = 20 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request additional
assert!(guard.try_acquire_additional(5 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 10 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 10 * PERMIT_GRANULARITY_BYTES);
// Early release some
assert!(guard.release_partial(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
// Request again
assert!(guard.try_acquire_additional(2 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
// Early release again
assert!(guard.release_partial(4 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
drop(guard);
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_small_allocation_rounds_up() {
// Test that allocations smaller than PERMIT_GRANULARITY_BYTES
// round up to 1 permit and can use try_acquire_additional()
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(512 * 1024).unwrap(); // 512KB
assert_eq!(guard.granted_bytes(), PERMIT_GRANULARITY_BYTES); // Rounds up to 1MB
assert!(guard.try_acquire_additional(2 * PERMIT_GRANULARITY_BYTES)); // Can request more
assert_eq!(guard.granted_bytes(), 3 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_acquire_zero_bytes_lazy_allocation() {
// Test that acquire(0) returns 0 permits but can try_acquire_additional() later
let manager = MemoryManager::new(10 * PERMIT_GRANULARITY_BYTES, NoOpMetrics);
let mut guard = manager.try_acquire(0).unwrap();
assert_eq!(guard.granted_bytes(), 0); // No permits consumed
assert_eq!(manager.used_bytes(), 0);
assert!(guard.try_acquire_additional(3 * PERMIT_GRANULARITY_BYTES)); // Lazy allocation
assert_eq!(guard.granted_bytes(), 3 * PERMIT_GRANULARITY_BYTES);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_blocks_and_unblocks() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
// First guard takes 9MB, leaving only 1MB available
let mut guard1 = manager.try_acquire(9 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
// Spawn a task that will block trying to acquire an additional 5MB (only 1MB is free, so it must wait)
let manager_clone = manager.clone();
let waiter = tokio::spawn(async move {
let mut guard2 = manager_clone.try_acquire(0).unwrap();
// This will block until enough memory is available
guard2
.acquire_additional(5 * PERMIT_GRANULARITY_BYTES)
.await
.unwrap();
guard2
});
sleep(Duration::from_millis(10)).await;
// Release 5MB from guard1 - this should unblock the waiter
assert!(guard1.release_partial(5 * PERMIT_GRANULARITY_BYTES));
// Waiter should complete now
let guard2 = waiter.await.unwrap();
assert_eq!(guard2.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
// Total: guard1 has 4MB, guard2 has 5MB = 9MB
assert_eq!(manager.used_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_exceeds_total_limit() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(8 * PERMIT_GRANULARITY_BYTES).unwrap();
// Try to acquire additional 5MB - would exceed total limit of 10MB
let result = guard.acquire_additional(5 * PERMIT_GRANULARITY_BYTES).await;
assert!(result.is_err());
// Guard should remain unchanged
assert_eq!(guard.granted_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_success() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(3 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 3 * PERMIT_GRANULARITY_BYTES);
// Acquire additional 4MB - should succeed
guard
.acquire_additional(4 * PERMIT_GRANULARITY_BYTES)
.await
.unwrap();
assert_eq!(guard.granted_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_with_policy_wait_success() {
use crate::policy::OnExhaustedPolicy;
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard1 = manager.try_acquire(8 * PERMIT_GRANULARITY_BYTES).unwrap();
let manager_clone = manager.clone();
let waiter = tokio::spawn(async move {
let mut guard2 = manager_clone.try_acquire(0).unwrap();
// Wait policy with 1 second timeout
guard2
.acquire_additional_with_policy(
5 * PERMIT_GRANULARITY_BYTES,
OnExhaustedPolicy::Wait {
timeout: Duration::from_secs(1),
},
)
.await
.unwrap();
guard2
});
sleep(Duration::from_millis(10)).await;
// Release memory to unblock waiter
assert!(guard1.release_partial(5 * PERMIT_GRANULARITY_BYTES));
let guard2 = waiter.await.unwrap();
assert_eq!(guard2.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_with_policy_wait_timeout() {
use crate::policy::OnExhaustedPolicy;
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
// Take all memory
let _guard1 = manager.try_acquire(10 * PERMIT_GRANULARITY_BYTES).unwrap();
let mut guard2 = manager.try_acquire(0).unwrap();
// Try to acquire with short timeout - should timeout
let result = guard2
.acquire_additional_with_policy(
5 * PERMIT_GRANULARITY_BYTES,
OnExhaustedPolicy::Wait {
timeout: Duration::from_millis(50),
},
)
.await;
assert!(result.is_err());
assert_eq!(guard2.granted_bytes(), 0);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_with_policy_fail() {
use crate::policy::OnExhaustedPolicy;
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let _guard1 = manager.try_acquire(8 * PERMIT_GRANULARITY_BYTES).unwrap();
let mut guard2 = manager.try_acquire(0).unwrap();
// Fail policy - should return error immediately
let result = guard2
.acquire_additional_with_policy(5 * PERMIT_GRANULARITY_BYTES, OnExhaustedPolicy::Fail)
.await;
assert!(result.is_err());
assert_eq!(guard2.granted_bytes(), 0);
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_additional_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics); // Unlimited
let mut guard = manager.try_acquire(0).unwrap();
// Should always succeed with unlimited manager
guard
.acquire_additional(1000 * PERMIT_GRANULARITY_BYTES)
.await
.unwrap();
assert_eq!(guard.granted_bytes(), 0);
assert_eq!(manager.used_bytes(), 0);
}

View File

@@ -36,8 +36,7 @@ pub mod create_database;
pub mod create_flow;
pub mod create_logical_tables;
pub mod create_table;
- mod create_table_template;
- pub(crate) use create_table_template::{CreateRequestBuilder, build_template_from_raw_table_info};
+ pub(crate) use create_table::{CreateRequestBuilder, build_template_from_raw_table_info};
pub mod create_view;
pub mod drop_database;
pub mod drop_flow;

View File

@@ -30,7 +30,7 @@ use serde::{Deserialize, Serialize};
use snafu::ResultExt;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::ALTER_PHYSICAL_EXTENSION_KEY;
- use store_api::storage::{RegionId, RegionNumber};
+ use store_api::storage::RegionNumber;
use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId};
@@ -286,14 +286,7 @@ impl CreateTablesData {
.flat_map(|(task, table_id)| {
if table_id.is_none() {
let table_info = task.table_info.clone();
- let region_ids = self
-     .physical_region_numbers
-     .iter()
-     .map(|region_number| {
-         RegionId::new(table_info.ident.table_id, *region_number)
-     })
-     .collect();
- let table_route = TableRouteValue::logical(self.physical_table_id, region_ids);
+ let table_route = TableRouteValue::logical(self.physical_table_id);
Some((table_info, table_route))
} else {
None

View File

@@ -22,7 +22,7 @@ use store_api::storage::{RegionId, TableId};
use table::metadata::RawTableInfo;
use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure;
- use crate::ddl::create_table_template::{
+ use crate::ddl::create_table::template::{
CreateRequestBuilder, build_template, build_template_from_raw_table_info,
};
use crate::ddl::utils::region_storage_path;

View File

@@ -12,74 +12,99 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
pub(crate) mod executor;
pub(crate) mod template;
use std::collections::HashMap; use std::collections::HashMap;
use api::v1::region::region_request::Body as PbRegionRequest; use api::v1::CreateTableExpr;
use api::v1::region::{RegionRequest, RegionRequestHeader};
use async_trait::async_trait; use async_trait::async_trait;
use common_error::ext::BoxedError; use common_error::ext::BoxedError;
use common_procedure::error::{ use common_procedure::error::{
ExternalSnafu, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu, ExternalSnafu, FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu,
}; };
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, ProcedureId, Status}; use common_procedure::{Context as ProcedureContext, LockKey, Procedure, ProcedureId, Status};
use common_telemetry::tracing_context::TracingContext; use common_telemetry::info;
use common_telemetry::{info, warn};
use futures::future::join_all;
use serde::{Deserialize, Serialize}; use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure}; use snafu::{OptionExt, ResultExt};
use store_api::metadata::ColumnMetadata; use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY; use store_api::storage::RegionNumber;
use store_api::storage::{RegionId, RegionNumber};
use strum::AsRefStr; use strum::AsRefStr;
use table::metadata::{RawTableInfo, TableId}; use table::metadata::{RawTableInfo, TableId};
use table::table_name::TableName;
use table::table_reference::TableReference; use table::table_reference::TableReference;
pub(crate) use template::{CreateRequestBuilder, build_template_from_raw_table_info};
use crate::ddl::create_table_template::{CreateRequestBuilder, build_template}; use crate::ddl::create_table::executor::CreateTableExecutor;
use crate::ddl::utils::raw_table_info::update_table_info_column_ids; use crate::ddl::create_table::template::build_template;
use crate::ddl::utils::{ use crate::ddl::utils::map_to_procedure_error;
add_peer_context_if_needed, convert_region_routes_to_detecting_regions,
extract_column_metadatas, map_to_procedure_error, region_storage_path,
};
use crate::ddl::{DdlContext, TableMetadata}; use crate::ddl::{DdlContext, TableMetadata};
use crate::error::{self, Result}; use crate::error::{self, Result};
use crate::key::table_name::TableNameKey; use crate::key::table_route::PhysicalTableRouteValue;
use crate::key::table_route::{PhysicalTableRouteValue, TableRouteValue};
use crate::lock_key::{CatalogLock, SchemaLock, TableNameLock}; use crate::lock_key::{CatalogLock, SchemaLock, TableNameLock};
use crate::metrics; use crate::metrics;
use crate::region_keeper::OperatingRegionGuard; use crate::region_keeper::OperatingRegionGuard;
use crate::rpc::ddl::CreateTableTask; use crate::rpc::ddl::CreateTableTask;
use crate::rpc::router::{ use crate::rpc::router::{RegionRoute, operating_leader_regions};
RegionRoute, find_leader_regions, find_leaders, operating_leader_regions,
};
pub struct CreateTableProcedure { pub struct CreateTableProcedure {
pub context: DdlContext, pub context: DdlContext,
pub creator: TableCreator, /// The serializable data.
pub data: CreateTableData,
/// The guards of opening.
pub opening_regions: Vec<OperatingRegionGuard>,
/// The executor of the procedure.
pub executor: CreateTableExecutor,
}
fn build_executor_from_create_table_data(
create_table_expr: &CreateTableExpr,
) -> Result<CreateTableExecutor> {
let template = build_template(create_table_expr)?;
let builder = CreateRequestBuilder::new(template, None);
let table_name = TableName::new(
create_table_expr.catalog_name.clone(),
create_table_expr.schema_name.clone(),
create_table_expr.table_name.clone(),
);
let executor =
CreateTableExecutor::new(table_name, create_table_expr.create_if_not_exists, builder);
Ok(executor)
} }
impl CreateTableProcedure { impl CreateTableProcedure {
pub const TYPE_NAME: &'static str = "metasrv-procedure::CreateTable"; pub const TYPE_NAME: &'static str = "metasrv-procedure::CreateTable";
pub fn new(task: CreateTableTask, context: DdlContext) -> Self { pub fn new(task: CreateTableTask, context: DdlContext) -> Result<Self> {
Self { let executor = build_executor_from_create_table_data(&task.create_table)?;
Ok(Self {
context, context,
creator: TableCreator::new(task), data: CreateTableData::new(task),
} opening_regions: vec![],
executor,
})
} }
pub fn from_json(json: &str, context: DdlContext) -> ProcedureResult<Self> { pub fn from_json(json: &str, context: DdlContext) -> ProcedureResult<Self> {
let data = serde_json::from_str(json).context(FromJsonSnafu)?; let data: CreateTableData = serde_json::from_str(json).context(FromJsonSnafu)?;
let create_table_expr = &data.task.create_table;
let executor = build_executor_from_create_table_data(create_table_expr)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
Ok(CreateTableProcedure { Ok(CreateTableProcedure {
context, context,
creator: TableCreator { data,
data, opening_regions: vec![],
opening_regions: vec![], executor,
},
}) })
} }
fn table_info(&self) -> &RawTableInfo { fn table_info(&self) -> &RawTableInfo {
&self.creator.data.task.table_info &self.data.task.table_info
} }
pub(crate) fn table_id(&self) -> TableId { pub(crate) fn table_id(&self) -> TableId {
@@ -87,8 +112,7 @@ impl CreateTableProcedure {
} }
fn region_wal_options(&self) -> Result<&HashMap<RegionNumber, String>> { fn region_wal_options(&self) -> Result<&HashMap<RegionNumber, String>> {
self.creator self.data
.data
.region_wal_options .region_wal_options
.as_ref() .as_ref()
.context(error::UnexpectedSnafu { .context(error::UnexpectedSnafu {
@@ -97,8 +121,7 @@ impl CreateTableProcedure {
} }
fn table_route(&self) -> Result<&PhysicalTableRouteValue> { fn table_route(&self) -> Result<&PhysicalTableRouteValue> {
self.creator self.data
.data
.table_route .table_route
.as_ref() .as_ref()
.context(error::UnexpectedSnafu { .context(error::UnexpectedSnafu {
@@ -106,17 +129,6 @@ impl CreateTableProcedure {
}) })
} }
#[cfg(any(test, feature = "testing"))]
pub fn set_allocated_metadata(
&mut self,
table_id: TableId,
table_route: PhysicalTableRouteValue,
region_wal_options: HashMap<RegionNumber, String>,
) {
self.creator
.set_allocated_metadata(table_id, table_route, region_wal_options)
}
/// On the prepare step, it performs: /// On the prepare step, it performs:
/// - Checks whether the table exists. /// - Checks whether the table exists.
/// - Allocates the table id. /// - Allocates the table id.
@@ -125,31 +137,16 @@ impl CreateTableProcedure {
/// - TableName exists and `create_if_not_exists` is false. /// - TableName exists and `create_if_not_exists` is false.
/// - Failed to allocate [TableMetadata]. /// - Failed to allocate [TableMetadata].
pub(crate) async fn on_prepare(&mut self) -> Result<Status> { pub(crate) async fn on_prepare(&mut self) -> Result<Status> {
let expr = &self.creator.data.task.create_table; let table_id = self
let table_name_value = self .executor
.context .on_prepare(&self.context.table_metadata_manager)
.table_metadata_manager
.table_name_manager()
.get(TableNameKey::new(
&expr.catalog_name,
&expr.schema_name,
&expr.table_name,
))
.await?; .await?;
// Return the table id if the table already exists.
if let Some(value) = table_name_value { if let Some(table_id) = table_id {
ensure!(
expr.create_if_not_exists,
error::TableAlreadyExistsSnafu {
table_name: self.creator.data.table_ref().to_string(),
}
);
let table_id = value.table_id();
return Ok(Status::done_with_output(table_id)); return Ok(Status::done_with_output(table_id));
} }
self.creator.data.state = CreateTableState::DatanodeCreateRegions; self.data.state = CreateTableState::DatanodeCreateRegions;
let TableMetadata { let TableMetadata {
table_id, table_id,
table_route, table_route,
@@ -157,23 +154,13 @@ impl CreateTableProcedure {
} = self } = self
.context .context
.table_metadata_allocator .table_metadata_allocator
.create(&self.creator.data.task) .create(&self.data.task)
.await?; .await?;
self.creator self.set_allocated_metadata(table_id, table_route, region_wal_options);
.set_allocated_metadata(table_id, table_route, region_wal_options);
Ok(Status::executing(true)) Ok(Status::executing(true))
} }
pub fn new_region_request_builder(
&self,
physical_table_id: Option<TableId>,
) -> Result<CreateRequestBuilder> {
let create_table_expr = &self.creator.data.task.create_table;
let template = build_template(create_table_expr)?;
Ok(CreateRequestBuilder::new(template, physical_table_id))
}
/// Creates regions on datanodes /// Creates regions on datanodes
/// ///
/// Abort(non-retry): /// Abort(non-retry):
@@ -187,90 +174,29 @@ impl CreateTableProcedure {
/// - [Code::Unavailable](tonic::status::Code::Unavailable) /// - [Code::Unavailable](tonic::status::Code::Unavailable)
pub async fn on_datanode_create_regions(&mut self) -> Result<Status> { pub async fn on_datanode_create_regions(&mut self) -> Result<Status> {
let table_route = self.table_route()?.clone(); let table_route = self.table_route()?.clone();
let request_builder = self.new_region_request_builder(None)?;
// Registers opening regions // Registers opening regions
let guards = self let guards = self.register_opening_regions(&self.context, &table_route.region_routes)?;
.creator
.register_opening_regions(&self.context, &table_route.region_routes)?;
if !guards.is_empty() { if !guards.is_empty() {
self.creator.opening_regions = guards; self.opening_regions = guards;
} }
self.create_regions(&table_route.region_routes, request_builder) self.create_regions(&table_route.region_routes).await
.await
} }
async fn create_regions( async fn create_regions(&mut self, region_routes: &[RegionRoute]) -> Result<Status> {
&mut self, let table_id = self.table_id();
region_routes: &[RegionRoute],
request_builder: CreateRequestBuilder,
) -> Result<Status> {
let create_table_data = &self.creator.data;
// Safety: the region_wal_options must be allocated
let region_wal_options = self.region_wal_options()?; let region_wal_options = self.region_wal_options()?;
let create_table_expr = &create_table_data.task.create_table; let column_metadatas = self
let catalog = &create_table_expr.catalog_name; .executor
let schema = &create_table_expr.schema_name; .on_create_regions(
let storage_path = region_storage_path(catalog, schema); &self.context.node_manager,
let leaders = find_leaders(region_routes); table_id,
let mut create_region_tasks = Vec::with_capacity(leaders.len()); region_routes,
region_wal_options,
)
.await?;
let partition_exprs = region_routes self.data.column_metadatas = column_metadatas;
.iter() self.data.state = CreateTableState::CreateMetadata;
.map(|r| (r.region.id.region_number(), r.region.partition_expr()))
.collect();
for datanode in leaders {
let requester = self.context.node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode);
let mut requests = Vec::with_capacity(regions.len());
for region_number in regions {
let region_id = RegionId::new(self.table_id(), region_number);
let create_region_request = request_builder.build_one(
region_id,
storage_path.clone(),
region_wal_options,
&partition_exprs,
);
requests.push(PbRegionRequest::Create(create_region_request));
}
for request in requests {
let request = RegionRequest {
header: Some(RegionRequestHeader {
tracing_context: TracingContext::from_current_span().to_w3c(),
..Default::default()
}),
body: Some(request),
};
let datanode = datanode.clone();
let requester = requester.clone();
create_region_tasks.push(async move {
requester
.handle(request)
.await
.map_err(add_peer_context_if_needed(datanode))
});
}
}
let mut results = join_all(create_region_tasks)
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
self.creator.data.column_metadatas = column_metadatas;
} else {
warn!(
"creating table result doesn't contains extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`,leaving the table's column metadata unchanged"
);
}
self.creator.data.state = CreateTableState::CreateMetadata;
Ok(Status::executing(true)) Ok(Status::executing(true))
} }
@@ -280,107 +206,33 @@ impl CreateTableProcedure {
/// - Failed to create table metadata. /// - Failed to create table metadata.
async fn on_create_metadata(&mut self, pid: ProcedureId) -> Result<Status> { async fn on_create_metadata(&mut self, pid: ProcedureId) -> Result<Status> {
let table_id = self.table_id(); let table_id = self.table_id();
let table_ref = self.creator.data.table_ref(); let table_ref = self.data.table_ref();
let manager = &self.context.table_metadata_manager; let manager = &self.context.table_metadata_manager;
let mut raw_table_info = self.table_info().clone(); let raw_table_info = self.table_info().clone();
if !self.creator.data.column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, &self.creator.data.column_metadatas);
}
// Safety: the region_wal_options must be allocated. // Safety: the region_wal_options must be allocated.
let region_wal_options = self.region_wal_options()?.clone(); let region_wal_options = self.region_wal_options()?.clone();
// Safety: the table_route must be allocated. // Safety: the table_route must be allocated.
let physical_table_route = self.table_route()?.clone(); let physical_table_route = self.table_route()?.clone();
let detecting_regions = self.executor
convert_region_routes_to_detecting_regions(&physical_table_route.region_routes); .on_create_metadata(
let table_route = TableRouteValue::Physical(physical_table_route); manager,
manager &self.context.region_failure_detector_controller,
.create_table_metadata(raw_table_info, table_route, region_wal_options) raw_table_info,
&self.data.column_metadatas,
physical_table_route,
region_wal_options,
)
.await?; .await?;
self.context
.register_failure_detectors(detecting_regions)
.await;
info!( info!(
"Successfully created table: {}, table_id: {}, procedure_id: {}", "Successfully created table: {}, table_id: {}, procedure_id: {}",
table_ref, table_id, pid table_ref, table_id, pid
); );
self.creator.opening_regions.clear(); self.opening_regions.clear();
Ok(Status::done_with_output(table_id)) Ok(Status::done_with_output(table_id))
} }
}
#[async_trait]
impl Procedure for CreateTableProcedure {
fn type_name(&self) -> &str {
Self::TYPE_NAME
}
fn recover(&mut self) -> ProcedureResult<()> {
// Only registers regions if the table route is allocated.
if let Some(x) = &self.creator.data.table_route {
self.creator.opening_regions = self
.creator
.register_opening_regions(&self.context, &x.region_routes)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
}
Ok(())
}
async fn execute(&mut self, ctx: &ProcedureContext) -> ProcedureResult<Status> {
let state = &self.creator.data.state;
let _timer = metrics::METRIC_META_PROCEDURE_CREATE_TABLE
.with_label_values(&[state.as_ref()])
.start_timer();
match state {
CreateTableState::Prepare => self.on_prepare().await,
CreateTableState::DatanodeCreateRegions => self.on_datanode_create_regions().await,
CreateTableState::CreateMetadata => self.on_create_metadata(ctx.procedure_id).await,
}
.map_err(map_to_procedure_error)
}
fn dump(&self) -> ProcedureResult<String> {
serde_json::to_string(&self.creator.data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
let table_ref = &self.creator.data.table_ref();
LockKey::new(vec![
CatalogLock::Read(table_ref.catalog).into(),
SchemaLock::read(table_ref.catalog, table_ref.schema).into(),
TableNameLock::new(table_ref.catalog, table_ref.schema, table_ref.table).into(),
])
}
}
pub struct TableCreator {
/// The serializable data.
pub data: CreateTableData,
/// The guards of opening.
pub opening_regions: Vec<OperatingRegionGuard>,
}
impl TableCreator {
pub fn new(task: CreateTableTask) -> Self {
Self {
data: CreateTableData {
state: CreateTableState::Prepare,
column_metadatas: vec![],
task,
table_route: None,
region_wal_options: None,
},
opening_regions: vec![],
}
}
/// Registers and returns the guards of the opening region if they don't exist. /// Registers and returns the guards of the opening region if they don't exist.
fn register_opening_regions( fn register_opening_regions(
@@ -389,7 +241,6 @@ impl TableCreator {
region_routes: &[RegionRoute], region_routes: &[RegionRoute],
) -> Result<Vec<OperatingRegionGuard>> { ) -> Result<Vec<OperatingRegionGuard>> {
let opening_regions = operating_leader_regions(region_routes); let opening_regions = operating_leader_regions(region_routes);
if self.opening_regions.len() == opening_regions.len() { if self.opening_regions.len() == opening_regions.len() {
return Ok(vec![]); return Ok(vec![]);
} }
@@ -409,7 +260,7 @@ impl TableCreator {
Ok(opening_region_guards) Ok(opening_region_guards)
} }
fn set_allocated_metadata( pub fn set_allocated_metadata(
&mut self, &mut self,
table_id: TableId, table_id: TableId,
table_route: PhysicalTableRouteValue, table_route: PhysicalTableRouteValue,
@@ -421,6 +272,56 @@ impl TableCreator {
} }
} }
#[async_trait]
impl Procedure for CreateTableProcedure {
fn type_name(&self) -> &str {
Self::TYPE_NAME
}
fn recover(&mut self) -> ProcedureResult<()> {
// Only registers regions if the table route is allocated.
if let Some(x) = &self.data.table_route {
self.opening_regions = self
.register_opening_regions(&self.context, &x.region_routes)
.map_err(BoxedError::new)
.context(ExternalSnafu {
clean_poisons: false,
})?;
}
Ok(())
}
async fn execute(&mut self, ctx: &ProcedureContext) -> ProcedureResult<Status> {
let state = &self.data.state;
let _timer = metrics::METRIC_META_PROCEDURE_CREATE_TABLE
.with_label_values(&[state.as_ref()])
.start_timer();
match state {
CreateTableState::Prepare => self.on_prepare().await,
CreateTableState::DatanodeCreateRegions => self.on_datanode_create_regions().await,
CreateTableState::CreateMetadata => self.on_create_metadata(ctx.procedure_id).await,
}
.map_err(map_to_procedure_error)
}
fn dump(&self) -> ProcedureResult<String> {
serde_json::to_string(&self.data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
let table_ref = &self.data.table_ref();
LockKey::new(vec![
CatalogLock::Read(table_ref.catalog).into(),
SchemaLock::read(table_ref.catalog, table_ref.schema).into(),
TableNameLock::new(table_ref.catalog, table_ref.schema, table_ref.table).into(),
])
}
}
#[derive(Debug, Clone, Serialize, Deserialize, AsRefStr, PartialEq)] #[derive(Debug, Clone, Serialize, Deserialize, AsRefStr, PartialEq)]
pub enum CreateTableState { pub enum CreateTableState {
/// Prepares to create the table /// Prepares to create the table
@@ -444,6 +345,16 @@ pub struct CreateTableData {
} }
impl CreateTableData { impl CreateTableData {
pub fn new(task: CreateTableTask) -> Self {
CreateTableData {
state: CreateTableState::Prepare,
column_metadatas: vec![],
task,
table_route: None,
region_wal_options: None,
}
}
fn table_ref(&self) -> TableReference<'_> { fn table_ref(&self) -> TableReference<'_> {
self.task.table_ref() self.task.table_ref()
} }

View File

@@ -0,0 +1,203 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::HashMap;
use api::v1::region::region_request::Body as PbRegionRequest;
use api::v1::region::{RegionRequest, RegionRequestHeader};
use common_telemetry::tracing_context::TracingContext;
use common_telemetry::warn;
use futures::future::join_all;
use snafu::ensure;
use store_api::metadata::ColumnMetadata;
use store_api::metric_engine_consts::TABLE_COLUMN_METADATA_EXTENSION_KEY;
use store_api::storage::{RegionId, RegionNumber};
use table::metadata::{RawTableInfo, TableId};
use table::table_name::TableName;
use crate::ddl::utils::raw_table_info::update_table_info_column_ids;
use crate::ddl::utils::{
add_peer_context_if_needed, convert_region_routes_to_detecting_regions,
extract_column_metadatas, region_storage_path,
};
use crate::ddl::{CreateRequestBuilder, RegionFailureDetectorControllerRef};
use crate::error::{self, Result};
use crate::key::TableMetadataManagerRef;
use crate::key::table_name::TableNameKey;
use crate::key::table_route::{PhysicalTableRouteValue, TableRouteValue};
use crate::node_manager::NodeManagerRef;
use crate::rpc::router::{RegionRoute, find_leader_regions, find_leaders};
/// [CreateTableExecutor] performs:
/// - Creates the regions on the datanodes.
/// - Creates the metadata of the table.
pub struct CreateTableExecutor {
create_if_not_exists: bool,
table_name: TableName,
builder: CreateRequestBuilder,
}
impl CreateTableExecutor {
/// Creates a new [`CreateTableExecutor`].
pub fn new(
table_name: TableName,
create_if_not_exists: bool,
builder: CreateRequestBuilder,
) -> Self {
Self {
create_if_not_exists,
table_name,
builder,
}
}
/// On the prepare step, it performs:
/// - Checks whether the table exists.
/// - Returns the table id if the table exists.
///
/// Abort(non-retry):
/// - Table exists and `create_if_not_exists` is `false`.
/// - Failed to get the table name value.
pub async fn on_prepare(
&self,
table_metadata_manager: &TableMetadataManagerRef,
) -> Result<Option<TableId>> {
let table_name_value = table_metadata_manager
.table_name_manager()
.get(TableNameKey::new(
&self.table_name.catalog_name,
&self.table_name.schema_name,
&self.table_name.table_name,
))
.await?;
if let Some(value) = table_name_value {
ensure!(
self.create_if_not_exists,
error::TableAlreadyExistsSnafu {
table_name: self.table_name.to_string(),
}
);
return Ok(Some(value.table_id()));
}
Ok(None)
}
pub async fn on_create_regions(
&self,
node_manager: &NodeManagerRef,
table_id: TableId,
region_routes: &[RegionRoute],
region_wal_options: &HashMap<RegionNumber, String>,
) -> Result<Vec<ColumnMetadata>> {
let storage_path =
region_storage_path(&self.table_name.catalog_name, &self.table_name.schema_name);
let leaders = find_leaders(region_routes);
let mut create_region_tasks = Vec::with_capacity(leaders.len());
let partition_exprs = region_routes
.iter()
.map(|r| (r.region.id.region_number(), r.region.partition_expr()))
.collect::<HashMap<_, _>>();
for datanode in leaders {
let requester = node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode);
let mut requests = Vec::with_capacity(regions.len());
for region_number in regions {
let region_id = RegionId::new(table_id, region_number);
let create_region_request = self.builder.build_one(
region_id,
storage_path.clone(),
region_wal_options,
&partition_exprs,
);
requests.push(PbRegionRequest::Create(create_region_request));
}
for request in requests {
let request = RegionRequest {
header: Some(RegionRequestHeader {
tracing_context: TracingContext::from_current_span().to_w3c(),
..Default::default()
}),
body: Some(request),
};
let datanode = datanode.clone();
let requester = requester.clone();
create_region_tasks.push(async move {
requester
.handle(request)
.await
.map_err(add_peer_context_if_needed(datanode))
});
}
}
let mut results = join_all(create_region_tasks)
.await
.into_iter()
.collect::<Result<Vec<_>>>()?;
let column_metadatas = if let Some(column_metadatas) =
extract_column_metadatas(&mut results, TABLE_COLUMN_METADATA_EXTENSION_KEY)?
{
column_metadatas
} else {
warn!(
"creating table result doesn't contains extension key `{TABLE_COLUMN_METADATA_EXTENSION_KEY}`,leaving the table's column metadata unchanged"
);
vec![]
};
Ok(column_metadatas)
}
/// Creates table metadata
///
/// Abort(non-retry):
/// - Failed to create table metadata.
pub async fn on_create_metadata(
&self,
table_metadata_manager: &TableMetadataManagerRef,
region_failure_detector_controller: &RegionFailureDetectorControllerRef,
mut raw_table_info: RawTableInfo,
column_metadatas: &[ColumnMetadata],
table_route: PhysicalTableRouteValue,
region_wal_options: HashMap<RegionNumber, String>,
) -> Result<()> {
if !column_metadatas.is_empty() {
update_table_info_column_ids(&mut raw_table_info, column_metadatas);
}
let detecting_regions =
convert_region_routes_to_detecting_regions(&table_route.region_routes);
let table_route = TableRouteValue::Physical(table_route);
table_metadata_manager
.create_table_metadata(raw_table_info, table_route, region_wal_options)
.await?;
region_failure_detector_controller
.register_failure_detectors(detecting_regions)
.await;
Ok(())
}
/// Returns the builder of the executor.
pub fn builder(&self) -> &CreateRequestBuilder {
&self.builder
}
}
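
For orientation, a minimal sketch of how the three phases above are intended to be driven. The real wiring lives in `CreateTableProcedure`, so the free function, its argument list, and the simplified error handling here are assumptions for illustration only; it borrows the imports of the file above.

```rust
// Hypothetical driver; every name not defined in the file above is assumed.
async fn drive_create_table(
    executor: &CreateTableExecutor,
    table_metadata_manager: &TableMetadataManagerRef,
    node_manager: &NodeManagerRef,
    region_failure_detector_controller: &RegionFailureDetectorControllerRef,
    table_id: TableId,
    raw_table_info: RawTableInfo,
    table_route: PhysicalTableRouteValue,
    region_wal_options: HashMap<RegionNumber, String>,
) -> Result<()> {
    // Phase 1: abort early if the table already exists. With `create_if_not_exists`,
    // the existing table id is returned instead of an error.
    if executor.on_prepare(table_metadata_manager).await?.is_some() {
        return Ok(());
    }

    // Phase 2: create the regions on the datanodes and collect the column metadata
    // extracted from their responses (may be empty, see the warning above).
    let column_metadatas = executor
        .on_create_regions(
            node_manager,
            table_id,
            &table_route.region_routes,
            &region_wal_options,
        )
        .await?;

    // Phase 3: persist the table metadata and register region failure detectors.
    executor
        .on_create_metadata(
            table_metadata_manager,
            region_failure_detector_controller,
            raw_table_info,
            &column_metadatas,
            table_route,
            region_wal_options,
        )
        .await
}
```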

View File

@@ -120,7 +120,13 @@ impl State for DropDatabaseExecutor {
.await?; .await?;
executor.invalidate_table_cache(ddl_ctx).await?; executor.invalidate_table_cache(ddl_ctx).await?;
executor executor
.on_drop_regions(ddl_ctx, &self.physical_region_routes, true) .on_drop_regions(
&ddl_ctx.node_manager,
&ddl_ctx.leader_region_registry,
&self.physical_region_routes,
true,
false,
)
.await?; .await?;
info!("Table: {}({}) is dropped", self.table_name, self.table_id); info!("Table: {}({}) is dropped", self.table_name, self.table_id);

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
pub(crate) mod executor; pub mod executor;
mod metadata; mod metadata;
use std::collections::HashMap; use std::collections::HashMap;
@@ -156,7 +156,13 @@ impl DropTableProcedure {
pub async fn on_datanode_drop_regions(&mut self) -> Result<Status> { pub async fn on_datanode_drop_regions(&mut self) -> Result<Status> {
self.executor self.executor
.on_drop_regions(&self.context, &self.data.physical_region_routes, false) .on_drop_regions(
&self.context.node_manager,
&self.context.leader_region_registry,
&self.data.physical_region_routes,
false,
false,
)
.await?; .await?;
self.data.state = DropTableState::DeleteTombstone; self.data.state = DropTableState::DeleteTombstone;
Ok(Status::executing(true)) Ok(Status::executing(true))

View File

@@ -36,6 +36,8 @@ use crate::error::{self, Result};
use crate::instruction::CacheIdent; use crate::instruction::CacheIdent;
use crate::key::table_name::TableNameKey; use crate::key::table_name::TableNameKey;
use crate::key::table_route::TableRouteValue; use crate::key::table_route::TableRouteValue;
use crate::node_manager::NodeManagerRef;
use crate::region_registry::LeaderRegionRegistryRef;
use crate::rpc::router::{ use crate::rpc::router::{
RegionRoute, find_follower_regions, find_followers, find_leader_regions, find_leaders, RegionRoute, find_follower_regions, find_followers, find_leader_regions, find_leaders,
operating_leader_regions, operating_leader_regions,
@@ -212,16 +214,18 @@ impl DropTableExecutor {
/// Drops region on datanode. /// Drops region on datanode.
pub async fn on_drop_regions( pub async fn on_drop_regions(
&self, &self,
ctx: &DdlContext, node_manager: &NodeManagerRef,
leader_region_registry: &LeaderRegionRegistryRef,
region_routes: &[RegionRoute], region_routes: &[RegionRoute],
fast_path: bool, fast_path: bool,
force: bool,
) -> Result<()> { ) -> Result<()> {
// Drops leader regions on datanodes. // Drops leader regions on datanodes.
let leaders = find_leaders(region_routes); let leaders = find_leaders(region_routes);
let mut drop_region_tasks = Vec::with_capacity(leaders.len()); let mut drop_region_tasks = Vec::with_capacity(leaders.len());
let table_id = self.table_id; let table_id = self.table_id;
for datanode in leaders { for datanode in leaders {
let requester = ctx.node_manager.datanode(&datanode).await; let requester = node_manager.datanode(&datanode).await;
let regions = find_leader_regions(region_routes, &datanode); let regions = find_leader_regions(region_routes, &datanode);
let region_ids = regions let region_ids = regions
.iter() .iter()
@@ -238,6 +242,7 @@ impl DropTableExecutor {
body: Some(region_request::Body::Drop(PbDropRegionRequest { body: Some(region_request::Body::Drop(PbDropRegionRequest {
region_id: region_id.as_u64(), region_id: region_id.as_u64(),
fast_path, fast_path,
force,
})), })),
}; };
let datanode = datanode.clone(); let datanode = datanode.clone();
@@ -262,7 +267,7 @@ impl DropTableExecutor {
let followers = find_followers(region_routes); let followers = find_followers(region_routes);
let mut close_region_tasks = Vec::with_capacity(followers.len()); let mut close_region_tasks = Vec::with_capacity(followers.len());
for datanode in followers { for datanode in followers {
let requester = ctx.node_manager.datanode(&datanode).await; let requester = node_manager.datanode(&datanode).await;
let regions = find_follower_regions(region_routes, &datanode); let regions = find_follower_regions(region_routes, &datanode);
let region_ids = regions let region_ids = regions
.iter() .iter()
@@ -307,8 +312,7 @@ impl DropTableExecutor {
// Deletes the leader region from registry. // Deletes the leader region from registry.
let region_ids = operating_leader_regions(region_routes); let region_ids = operating_leader_regions(region_routes);
ctx.leader_region_registry leader_region_registry.batch_delete(region_ids.into_iter().map(|(region_id, _)| region_id));
.batch_delete(region_ids.into_iter().map(|(region_id, _)| region_id));
Ok(()) Ok(())
} }
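
Stated as a sketch, the two updated call sites above now take this shape; `executor`, `ctx`, and `region_routes` are assumed to come from the caller (e.g. the drop-table procedure).

```rust
// Sketch only. `fast_path` differs per caller (true when dropping a whole
// database, false in DropTableProcedure); both call sites currently pass
// `force = false`, and the flag is forwarded into each PbDropRegionRequest.
executor
    .on_drop_regions(
        &ctx.node_manager,
        &ctx.leader_region_registry,
        region_routes,
        /* fast_path */ false,
        /* force */ false,
    )
    .await?;
```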

View File

@@ -128,7 +128,6 @@ pub fn build_raw_table_info_from_expr(expr: &CreateTableExpr) -> RawTableInfo {
value_indices: vec![], value_indices: vec![],
engine: expr.engine.clone(), engine: expr.engine.clone(),
next_column_id: expr.column_defs.len() as u32, next_column_id: expr.column_defs.len() as u32,
region_numbers: vec![],
options: TableOptions::try_from_iter(&expr.table_options).unwrap(), options: TableOptions::try_from_iter(&expr.table_options).unwrap(),
created_on: DateTime::default(), created_on: DateTime::default(),
updated_on: DateTime::default(), updated_on: DateTime::default(),

View File

@@ -166,7 +166,7 @@ async fn test_on_prepare_logical_table_exists_err() {
.table_metadata_manager .table_metadata_manager
.create_logical_tables_metadata(vec![( .create_logical_tables_metadata(vec![(
task.table_info.clone(), task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(1025, 1)]), TableRouteValue::logical(1024),
)]) )])
.await .await
.unwrap(); .unwrap();
@@ -208,7 +208,7 @@ async fn test_on_prepare_with_create_if_table_exists() {
.table_metadata_manager .table_metadata_manager
.create_logical_tables_metadata(vec![( .create_logical_tables_metadata(vec![(
task.table_info.clone(), task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]), TableRouteValue::logical(1024),
)]) )])
.await .await
.unwrap(); .unwrap();
@@ -252,7 +252,7 @@ async fn test_on_prepare_part_logical_tables_exist() {
.table_metadata_manager .table_metadata_manager
.create_logical_tables_metadata(vec![( .create_logical_tables_metadata(vec![(
task.table_info.clone(), task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]), TableRouteValue::logical(1024),
)]) )])
.await .await
.unwrap(); .unwrap();
@@ -392,7 +392,7 @@ async fn test_on_create_metadata_part_logical_tables_exist() {
.table_metadata_manager .table_metadata_manager
.create_logical_tables_metadata(vec![( .create_logical_tables_metadata(vec![(
task.table_info.clone(), task.table_info.clone(),
TableRouteValue::logical(1024, vec![RegionId::new(8192, 1)]), TableRouteValue::logical(1024),
)]) )])
.await .await
.unwrap(); .unwrap();
@@ -496,10 +496,7 @@ async fn test_on_create_metadata_err() {
task.table_info.ident.table_id = 1025; task.table_info.ident.table_id = 1025;
ddl_context ddl_context
.table_metadata_manager .table_metadata_manager
.create_logical_tables_metadata(vec![( .create_logical_tables_metadata(vec![(task.table_info, TableRouteValue::logical(512))])
task.table_info,
TableRouteValue::logical(512, vec![RegionId::new(1026, 1)]),
)])
.await .await
.unwrap(); .unwrap();
// Triggers procedure to create table metadata // Triggers procedure to create table metadata

View File

@@ -162,7 +162,7 @@ async fn test_on_prepare_table_exists_err() {
) )
.await .await
.unwrap(); .unwrap();
let mut procedure = CreateTableProcedure::new(task, ddl_context); let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let err = procedure.on_prepare().await.unwrap_err(); let err = procedure.on_prepare().await.unwrap_err();
assert_matches!(err, Error::TableAlreadyExists { .. }); assert_matches!(err, Error::TableAlreadyExists { .. });
assert_eq!(err.status_code(), StatusCode::TableAlreadyExists); assert_eq!(err.status_code(), StatusCode::TableAlreadyExists);
@@ -185,7 +185,7 @@ async fn test_on_prepare_with_create_if_table_exists() {
) )
.await .await
.unwrap(); .unwrap();
let mut procedure = CreateTableProcedure::new(task, ddl_context); let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let status = procedure.on_prepare().await.unwrap(); let status = procedure.on_prepare().await.unwrap();
assert_matches!(status, Status::Done { output: Some(..) }); assert_matches!(status, Status::Done { output: Some(..) });
let table_id = *status.downcast_output_ref::<u32>().unwrap(); let table_id = *status.downcast_output_ref::<u32>().unwrap();
@@ -198,7 +198,7 @@ async fn test_on_prepare_without_create_if_table_exists() {
let ddl_context = new_ddl_context(node_manager); let ddl_context = new_ddl_context(node_manager);
let mut task = test_create_table_task("foo"); let mut task = test_create_table_task("foo");
task.create_table.create_if_not_exists = true; task.create_table.create_if_not_exists = true;
let mut procedure = CreateTableProcedure::new(task, ddl_context); let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
let status = procedure.on_prepare().await.unwrap(); let status = procedure.on_prepare().await.unwrap();
assert_matches!( assert_matches!(
status, status,
@@ -217,7 +217,7 @@ async fn test_on_datanode_create_regions_should_retry() {
let ddl_context = new_ddl_context(node_manager); let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists); assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context); let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
procedure.on_prepare().await.unwrap(); procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext { let ctx = ProcedureContext {
procedure_id: ProcedureId::random(), procedure_id: ProcedureId::random(),
@@ -234,7 +234,7 @@ async fn test_on_datanode_create_regions_should_not_retry() {
let ddl_context = new_ddl_context(node_manager); let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists); assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context); let mut procedure = CreateTableProcedure::new(task, ddl_context).unwrap();
procedure.on_prepare().await.unwrap(); procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext { let ctx = ProcedureContext {
procedure_id: ProcedureId::random(), procedure_id: ProcedureId::random(),
@@ -251,7 +251,7 @@ async fn test_on_create_metadata_error() {
let ddl_context = new_ddl_context(node_manager); let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists); assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task.clone(), ddl_context.clone()); let mut procedure = CreateTableProcedure::new(task.clone(), ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap(); procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext { let ctx = ProcedureContext {
procedure_id: ProcedureId::random(), procedure_id: ProcedureId::random(),
@@ -284,7 +284,7 @@ async fn test_on_create_metadata() {
let ddl_context = new_ddl_context(node_manager); let ddl_context = new_ddl_context(node_manager);
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
assert!(!task.create_table.create_if_not_exists); assert!(!task.create_table.create_if_not_exists);
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()); let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap(); procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext { let ctx = ProcedureContext {
procedure_id: ProcedureId::random(), procedure_id: ProcedureId::random(),
@@ -312,16 +312,16 @@ async fn test_memory_region_keeper_guard_dropped_on_procedure_done() {
let ddl_context = new_ddl_context_with_kv_backend(node_manager, kv_backend); let ddl_context = new_ddl_context_with_kv_backend(node_manager, kv_backend);
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()); let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
execute_procedure_until(&mut procedure, |p| { execute_procedure_until(&mut procedure, |p| {
p.creator.data.state == CreateTableState::CreateMetadata p.data.state == CreateTableState::CreateMetadata
}) })
.await; .await;
// Ensure that after running to the state `CreateMetadata`(just past `DatanodeCreateRegions`), // Ensure that after running to the state `CreateMetadata`(just past `DatanodeCreateRegions`),
// the opening regions should be recorded: // the opening regions should be recorded:
let guards = &procedure.creator.opening_regions; let guards = &procedure.opening_regions;
assert_eq!(guards.len(), 1); assert_eq!(guards.len(), 1);
let (datanode_id, region_id) = (0, RegionId::new(procedure.table_id(), 0)); let (datanode_id, region_id) = (0, RegionId::new(procedure.table_id(), 0));
assert_eq!(guards[0].info(), (datanode_id, region_id)); assert_eq!(guards[0].info(), (datanode_id, region_id));
@@ -334,7 +334,7 @@ async fn test_memory_region_keeper_guard_dropped_on_procedure_done() {
execute_procedure_until_done(&mut procedure).await; execute_procedure_until_done(&mut procedure).await;
// Ensure that when run to the end, the opening regions should be cleared: // Ensure that when run to the end, the opening regions should be cleared:
let guards = &procedure.creator.opening_regions; let guards = &procedure.opening_regions;
assert!(guards.is_empty()); assert!(guards.is_empty());
assert!( assert!(
!ddl_context !ddl_context

View File

@@ -259,7 +259,7 @@ async fn test_replace_table() {
{ {
// Create a `foo` table. // Create a `foo` table.
let task = test_create_table_task("foo"); let task = test_create_table_task("foo");
let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()); let mut procedure = CreateTableProcedure::new(task, ddl_context.clone()).unwrap();
procedure.on_prepare().await.unwrap(); procedure.on_prepare().await.unwrap();
let ctx = ProcedureContext { let ctx = ProcedureContext {
procedure_id: ProcedureId::random(), procedure_id: ProcedureId::random(),

View File

@@ -231,7 +231,7 @@ impl DdlManager {
) -> Result<(ProcedureId, Option<Output>)> { ) -> Result<(ProcedureId, Option<Output>)> {
let context = self.create_context(); let context = self.create_context();
let procedure = CreateTableProcedure::new(create_table_task, context); let procedure = CreateTableProcedure::new(create_table_task, context)?;
let procedure_with_id = ProcedureWithId::with_random_id(Box::new(procedure)); let procedure_with_id = ProcedureWithId::with_random_id(Box::new(procedure));

View File

@@ -12,27 +12,10 @@
// See the License for the specific language governing permissions and // See the License for the specific language governing permissions and
// limitations under the License. // limitations under the License.
use std::sync::OnceLock;
use std::time::Duration; use std::time::Duration;
use etcd_client::ConnectOptions; pub const BASE_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);
/// Heartbeat interval time (is the basic unit of various time).
pub const HEARTBEAT_INTERVAL_MILLIS: u64 = 3000;
/// The frontend will also send heartbeats to Metasrv, sending an empty
/// heartbeat every HEARTBEAT_INTERVAL_MILLIS * 6 seconds.
pub const FRONTEND_HEARTBEAT_INTERVAL_MILLIS: u64 = HEARTBEAT_INTERVAL_MILLIS * 6;
/// The lease seconds of a region. It's set by 3 heartbeat intervals
/// (HEARTBEAT_INTERVAL_MILLIS × 3), plus some extra buffer (1 second).
pub const REGION_LEASE_SECS: u64 =
Duration::from_millis(HEARTBEAT_INTERVAL_MILLIS * 3).as_secs() + 1;
/// When creating table or region failover, a target node needs to be selected.
/// If the node's lease has expired, the `Selector` will not select it.
pub const DATANODE_LEASE_SECS: u64 = REGION_LEASE_SECS;
pub const FLOWNODE_LEASE_SECS: u64 = DATANODE_LEASE_SECS;
/// The lease seconds of metasrv leader. /// The lease seconds of metasrv leader.
pub const META_LEASE_SECS: u64 = 5; pub const META_LEASE_SECS: u64 = 5;
@@ -52,14 +35,6 @@ pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_INTERVAL_SECS: Duration = Duration::from_
/// The keep-alive timeout of the heartbeat channel. /// The keep-alive timeout of the heartbeat channel.
pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_TIMEOUT_SECS: Duration = Duration::from_secs(5); pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_TIMEOUT_SECS: Duration = Duration::from_secs(5);
/// The default options for the etcd client.
pub fn default_etcd_client_options() -> ConnectOptions {
ConnectOptions::new()
.with_keep_alive_while_idle(true)
.with_keep_alive(Duration::from_secs(15), Duration::from_secs(5))
.with_connect_timeout(Duration::from_secs(10))
}
/// The default mailbox round-trip timeout. /// The default mailbox round-trip timeout.
pub const MAILBOX_RTT_SECS: u64 = 1; pub const MAILBOX_RTT_SECS: u64 = 1;
@@ -68,3 +43,60 @@ pub const TOPIC_STATS_REPORT_INTERVAL_SECS: u64 = 15;
/// The retention seconds of topic stats. /// The retention seconds of topic stats.
pub const TOPIC_STATS_RETENTION_SECS: u64 = TOPIC_STATS_REPORT_INTERVAL_SECS * 100; pub const TOPIC_STATS_RETENTION_SECS: u64 = TOPIC_STATS_REPORT_INTERVAL_SECS * 100;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// The distributed time constants.
pub struct DistributedTimeConstants {
pub heartbeat_interval: Duration,
pub frontend_heartbeat_interval: Duration,
pub region_lease: Duration,
pub datanode_lease: Duration,
pub flownode_lease: Duration,
}
/// The frontend heartbeat interval is 6 times of the base heartbeat interval.
pub fn frontend_heartbeat_interval(base_heartbeat_interval: Duration) -> Duration {
base_heartbeat_interval * 6
}
impl DistributedTimeConstants {
/// Create a new DistributedTimeConstants from the heartbeat interval.
pub fn from_heartbeat_interval(heartbeat_interval: Duration) -> Self {
let region_lease = heartbeat_interval * 3 + Duration::from_secs(1);
let datanode_lease = region_lease;
let flownode_lease = datanode_lease;
Self {
heartbeat_interval,
frontend_heartbeat_interval: frontend_heartbeat_interval(heartbeat_interval),
region_lease,
datanode_lease,
flownode_lease,
}
}
}
impl Default for DistributedTimeConstants {
fn default() -> Self {
Self::from_heartbeat_interval(BASE_HEARTBEAT_INTERVAL)
}
}
static DEFAULT_DISTRIBUTED_TIME_CONSTANTS: OnceLock<DistributedTimeConstants> = OnceLock::new();
/// Get the default distributed time constants.
pub fn default_distributed_time_constants() -> &'static DistributedTimeConstants {
DEFAULT_DISTRIBUTED_TIME_CONSTANTS.get_or_init(Default::default)
}
/// Initialize the default distributed time constants.
pub fn init_distributed_time_constants(base_heartbeat_interval: Duration) {
let distributed_time_constants =
DistributedTimeConstants::from_heartbeat_interval(base_heartbeat_interval);
DEFAULT_DISTRIBUTED_TIME_CONSTANTS
.set(distributed_time_constants)
.expect("Failed to set default distributed time constants");
common_telemetry::info!(
"Initialized default distributed time constants: {:#?}",
distributed_time_constants
);
}
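
As a self-contained illustration of the arithmetic in `from_heartbeat_interval`, the following restates the derived durations for the default 3-second base using only the standard library.

```rust
use std::time::Duration;

fn main() {
    // BASE_HEARTBEAT_INTERVAL
    let heartbeat_interval = Duration::from_secs(3);

    // Derived exactly as in `DistributedTimeConstants::from_heartbeat_interval`.
    let frontend_heartbeat_interval = heartbeat_interval * 6; // 18s
    let region_lease = heartbeat_interval * 3 + Duration::from_secs(1); // 10s
    let datanode_lease = region_lease; // 10s
    let flownode_lease = datanode_lease; // 10s

    assert_eq!(frontend_heartbeat_interval, Duration::from_secs(18));
    assert_eq!(region_lease, Duration::from_secs(10));
    assert_eq!(datanode_lease, Duration::from_secs(10));
    assert_eq!(flownode_lease, Duration::from_secs(10));
}
```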

View File

@@ -224,6 +224,13 @@ pub enum Error {
location: Location, location: Location,
}, },
#[snafu(display("Failed to find table repartition metadata for table id {}", table_id))]
TableRepartNotFound {
table_id: TableId,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to decode protobuf"))] #[snafu(display("Failed to decode protobuf"))]
DecodeProto { DecodeProto {
#[snafu(implicit)] #[snafu(implicit)]
@@ -1091,6 +1098,7 @@ impl ErrorExt for Error {
| DecodeProto { .. } | DecodeProto { .. }
| BuildTableMeta { .. } | BuildTableMeta { .. }
| TableRouteNotFound { .. } | TableRouteNotFound { .. }
| TableRepartNotFound { .. }
| ConvertRawTableInfo { .. } | ConvertRawTableInfo { .. }
| RegionOperatingRace { .. } | RegionOperatingRace { .. }
| EncodeWalOptions { .. } | EncodeWalOptions { .. }

View File

@@ -514,6 +514,65 @@ impl Display for GcRegionsReply {
} }
} }
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct EnterStagingRegion {
pub region_id: RegionId,
pub partition_expr: String,
}
impl Display for EnterStagingRegion {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"EnterStagingRegion(region_id={}, partition_expr={})",
self.region_id, self.partition_expr
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct RemapManifest {
pub region_id: RegionId,
/// Regions to remap manifests from.
pub input_regions: Vec<RegionId>,
/// For each old region, which new regions should receive its files
pub region_mapping: HashMap<RegionId, Vec<RegionId>>,
/// New partition expressions for the new regions.
pub new_partition_exprs: HashMap<RegionId, String>,
}
impl Display for RemapManifest {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"RemapManifest(region_id={}, input_regions={:?}, region_mapping={:?}, new_partition_exprs={:?})",
self.region_id, self.input_regions, self.region_mapping, self.new_partition_exprs
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct ApplyStagingManifest {
/// The region ID to apply the staging manifest to.
pub region_id: RegionId,
/// The partition expression of the staging region.
pub partition_expr: String,
/// The region that stores the staging manifests in its staging blob storage.
pub central_region_id: RegionId,
/// The relative path to the staging manifest within the central region's staging blob storage.
pub manifest_path: String,
}
impl Display for ApplyStagingManifest {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"ApplyStagingManifest(region_id={}, partition_expr={}, central_region_id={}, manifest_path={})",
self.region_id, self.partition_expr, self.central_region_id, self.manifest_path
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize, Display, PartialEq)] #[derive(Debug, Clone, Serialize, Deserialize, Display, PartialEq)]
pub enum Instruction { pub enum Instruction {
/// Opens regions. /// Opens regions.
@@ -541,6 +600,12 @@ pub enum Instruction {
GcRegions(GcRegions), GcRegions(GcRegions),
/// Temporary suspend serving reads or writes /// Temporary suspend serving reads or writes
Suspend, Suspend,
/// Makes regions enter staging state.
EnterStagingRegions(Vec<EnterStagingRegion>),
/// Remaps manifests for a region.
RemapManifest(RemapManifest),
/// Applies staging manifests for a region.
ApplyStagingManifests(Vec<ApplyStagingManifest>),
} }
impl Instruction { impl Instruction {
@@ -597,6 +662,13 @@ impl Instruction {
_ => None, _ => None,
} }
} }
pub fn into_enter_staging_regions(self) -> Option<Vec<EnterStagingRegion>> {
match self {
Self::EnterStagingRegions(enter_staging) => Some(enter_staging),
_ => None,
}
}
} }
/// The reply of [UpgradeRegion]. /// The reply of [UpgradeRegion].
@@ -690,6 +762,70 @@ where
}) })
} }
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct EnterStagingRegionReply {
pub region_id: RegionId,
/// Returns true if the region is under the new region rule.
pub ready: bool,
/// Indicates whether the region exists.
pub exists: bool,
/// Returns the error, if any, encountered during the operation.
pub error: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct EnterStagingRegionsReply {
pub replies: Vec<EnterStagingRegionReply>,
}
impl EnterStagingRegionsReply {
pub fn new(replies: Vec<EnterStagingRegionReply>) -> Self {
Self { replies }
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct RemapManifestReply {
/// Returns false if the region does not exist.
pub exists: bool,
/// A map from region IDs to their corresponding remapped manifest paths.
pub manifest_paths: HashMap<RegionId, String>,
/// Returns the error, if any, encountered during the operation.
pub error: Option<String>,
}
impl Display for RemapManifestReply {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(
f,
"RemapManifestReply(manifest_paths={:?}, error={:?})",
self.manifest_paths, self.error
)
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct ApplyStagingManifestsReply {
pub replies: Vec<ApplyStagingManifestReply>,
}
impl ApplyStagingManifestsReply {
pub fn new(replies: Vec<ApplyStagingManifestReply>) -> Self {
Self { replies }
}
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
pub struct ApplyStagingManifestReply {
pub region_id: RegionId,
/// Returns true if the region is ready to serve reads and writes.
pub ready: bool,
/// Indicates whether the region exists.
pub exists: bool,
/// Returns the error, if any, encountered during the operation.
pub error: Option<String>,
}
#[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)] #[derive(Debug, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[serde(tag = "type", rename_all = "snake_case")] #[serde(tag = "type", rename_all = "snake_case")]
pub enum InstructionReply { pub enum InstructionReply {
@@ -710,6 +846,9 @@ pub enum InstructionReply {
FlushRegions(FlushRegionReply), FlushRegions(FlushRegionReply),
GetFileRefs(GetFileRefsReply), GetFileRefs(GetFileRefsReply),
GcRegions(GcRegionsReply), GcRegions(GcRegionsReply),
EnterStagingRegions(EnterStagingRegionsReply),
RemapManifest(RemapManifestReply),
ApplyStagingManifests(ApplyStagingManifestsReply),
} }
impl Display for InstructionReply { impl Display for InstructionReply {
@@ -726,6 +865,19 @@ impl Display for InstructionReply {
Self::FlushRegions(reply) => write!(f, "InstructionReply::FlushRegions({})", reply), Self::FlushRegions(reply) => write!(f, "InstructionReply::FlushRegions({})", reply),
Self::GetFileRefs(reply) => write!(f, "InstructionReply::GetFileRefs({})", reply), Self::GetFileRefs(reply) => write!(f, "InstructionReply::GetFileRefs({})", reply),
Self::GcRegions(reply) => write!(f, "InstructionReply::GcRegion({})", reply), Self::GcRegions(reply) => write!(f, "InstructionReply::GcRegion({})", reply),
Self::EnterStagingRegions(reply) => {
write!(
f,
"InstructionReply::EnterStagingRegions({:?})",
reply.replies
)
}
Self::RemapManifest(reply) => write!(f, "InstructionReply::RemapManifest({})", reply),
Self::ApplyStagingManifests(reply) => write!(
f,
"InstructionReply::ApplyStagingManifests({:?})",
reply.replies
),
} }
} }
} }
@@ -766,13 +918,34 @@ impl InstructionReply {
_ => panic!("Expected FlushRegions reply"), _ => panic!("Expected FlushRegions reply"),
} }
} }
pub fn expect_enter_staging_regions_reply(self) -> Vec<EnterStagingRegionReply> {
match self {
Self::EnterStagingRegions(reply) => reply.replies,
_ => panic!("Expected EnterStagingRegion reply"),
}
}
pub fn expect_remap_manifest_reply(self) -> RemapManifestReply {
match self {
Self::RemapManifest(reply) => reply,
_ => panic!("Expected RemapManifest reply"),
}
}
pub fn expect_apply_staging_manifests_reply(self) -> Vec<ApplyStagingManifestReply> {
match self {
Self::ApplyStagingManifests(reply) => reply.replies,
_ => panic!("Expected ApplyStagingManifest reply"),
}
}
} }
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use std::collections::HashSet; use std::collections::HashSet;
use store_api::storage::FileId; use store_api::storage::{FileId, FileRef};
use super::*; use super::*;
@@ -1147,12 +1320,14 @@ mod tests {
let mut manifest = FileRefsManifest::default(); let mut manifest = FileRefsManifest::default();
let r0 = RegionId::new(1024, 1); let r0 = RegionId::new(1024, 1);
let r1 = RegionId::new(1024, 2); let r1 = RegionId::new(1024, 2);
manifest manifest.file_refs.insert(
.file_refs r0,
.insert(r0, HashSet::from([FileId::random()])); HashSet::from([FileRef::new(r0, FileId::random(), None)]),
manifest );
.file_refs manifest.file_refs.insert(
.insert(r1, HashSet::from([FileId::random()])); r1,
HashSet::from([FileRef::new(r1, FileId::random(), None)]),
);
manifest.manifest_version.insert(r0, 10); manifest.manifest_version.insert(r0, 10);
manifest.manifest_version.insert(r1, 20); manifest.manifest_version.insert(r1, 20);
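
To make the new repartition instructions concrete, here is a hedged sketch of building an `EnterStagingRegions` instruction and consuming its reply; all region ids and the partition expression are invented, and the instruction module's types are assumed to be in scope.

```rust
// Illustrative only; all values are invented for the example.
let instruction = Instruction::EnterStagingRegions(vec![EnterStagingRegion {
    region_id: RegionId::new(1024, 1),
    // Hypothetical placeholder; real expressions are produced by the repartition procedure.
    partition_expr: "example partition expr".to_string(),
}]);

// The accessor returns `None` for every other variant.
let staging_regions = instruction.into_enter_staging_regions().unwrap();
assert_eq!(staging_regions[0].region_id, RegionId::new(1024, 1));

// On the reply path, `expect_enter_staging_regions_reply` panics for any other
// variant, mirroring the existing `expect_*` helpers.
let reply = InstructionReply::EnterStagingRegions(EnterStagingRegionsReply::new(vec![
    EnterStagingRegionReply {
        region_id: RegionId::new(1024, 1),
        ready: true,
        exists: true,
        error: None,
    },
]));
let replies = reply.expect_enter_staging_regions_reply();
assert!(replies.iter().all(|r| r.ready && r.exists));
```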

View File

@@ -106,6 +106,7 @@ mod schema_metadata_manager;
pub mod schema_name; pub mod schema_name;
pub mod table_info; pub mod table_info;
pub mod table_name; pub mod table_name;
pub mod table_repart;
pub mod table_route; pub mod table_route;
#[cfg(any(test, feature = "testing"))] #[cfg(any(test, feature = "testing"))]
pub mod test_utils; pub mod test_utils;
@@ -156,6 +157,7 @@ use crate::DatanodeId;
use crate::error::{self, Result, SerdeJsonSnafu}; use crate::error::{self, Result, SerdeJsonSnafu};
use crate::key::flow::flow_state::FlowStateValue; use crate::key::flow::flow_state::FlowStateValue;
use crate::key::node_address::NodeAddressValue; use crate::key::node_address::NodeAddressValue;
use crate::key::table_repart::{TableRepartKey, TableRepartManager};
use crate::key::table_route::TableRouteKey; use crate::key::table_route::TableRouteKey;
use crate::key::topic_region::TopicRegionValue; use crate::key::topic_region::TopicRegionValue;
use crate::key::txn_helper::TxnOpGetResponseSet; use crate::key::txn_helper::TxnOpGetResponseSet;
@@ -178,6 +180,7 @@ pub const TABLE_NAME_KEY_PREFIX: &str = "__table_name";
pub const CATALOG_NAME_KEY_PREFIX: &str = "__catalog_name"; pub const CATALOG_NAME_KEY_PREFIX: &str = "__catalog_name";
pub const SCHEMA_NAME_KEY_PREFIX: &str = "__schema_name"; pub const SCHEMA_NAME_KEY_PREFIX: &str = "__schema_name";
pub const TABLE_ROUTE_PREFIX: &str = "__table_route"; pub const TABLE_ROUTE_PREFIX: &str = "__table_route";
pub const TABLE_REPART_PREFIX: &str = "__table_repart";
pub const NODE_ADDRESS_PREFIX: &str = "__node_address"; pub const NODE_ADDRESS_PREFIX: &str = "__node_address";
pub const KAFKA_TOPIC_KEY_PREFIX: &str = "__topic_name/kafka"; pub const KAFKA_TOPIC_KEY_PREFIX: &str = "__topic_name/kafka";
// The legacy topic key prefix is used to store the topic name in previous versions. // The legacy topic key prefix is used to store the topic name in previous versions.
@@ -288,6 +291,11 @@ lazy_static! {
Regex::new(&format!("^{TABLE_ROUTE_PREFIX}/([0-9]+)$")).unwrap(); Regex::new(&format!("^{TABLE_ROUTE_PREFIX}/([0-9]+)$")).unwrap();
} }
lazy_static! {
pub(crate) static ref TABLE_REPART_KEY_PATTERN: Regex =
Regex::new(&format!("^{TABLE_REPART_PREFIX}/([0-9]+)$")).unwrap();
}
lazy_static! { lazy_static! {
static ref DATANODE_TABLE_KEY_PATTERN: Regex = static ref DATANODE_TABLE_KEY_PATTERN: Regex =
Regex::new(&format!("^{DATANODE_TABLE_KEY_PREFIX}/([0-9]+)/([0-9]+)$")).unwrap(); Regex::new(&format!("^{DATANODE_TABLE_KEY_PREFIX}/([0-9]+)/([0-9]+)$")).unwrap();
@@ -386,6 +394,7 @@ pub struct TableMetadataManager {
catalog_manager: CatalogManager, catalog_manager: CatalogManager,
schema_manager: SchemaManager, schema_manager: SchemaManager,
table_route_manager: TableRouteManager, table_route_manager: TableRouteManager,
table_repart_manager: TableRepartManager,
tombstone_manager: TombstoneManager, tombstone_manager: TombstoneManager,
topic_name_manager: TopicNameManager, topic_name_manager: TopicNameManager,
topic_region_manager: TopicRegionManager, topic_region_manager: TopicRegionManager,
@@ -538,6 +547,7 @@ impl TableMetadataManager {
catalog_manager: CatalogManager::new(kv_backend.clone()), catalog_manager: CatalogManager::new(kv_backend.clone()),
schema_manager: SchemaManager::new(kv_backend.clone()), schema_manager: SchemaManager::new(kv_backend.clone()),
table_route_manager: TableRouteManager::new(kv_backend.clone()), table_route_manager: TableRouteManager::new(kv_backend.clone()),
table_repart_manager: TableRepartManager::new(kv_backend.clone()),
tombstone_manager: TombstoneManager::new(kv_backend.clone()), tombstone_manager: TombstoneManager::new(kv_backend.clone()),
topic_name_manager: TopicNameManager::new(kv_backend.clone()), topic_name_manager: TopicNameManager::new(kv_backend.clone()),
topic_region_manager: TopicRegionManager::new(kv_backend.clone()), topic_region_manager: TopicRegionManager::new(kv_backend.clone()),
@@ -558,6 +568,7 @@ impl TableMetadataManager {
catalog_manager: CatalogManager::new(kv_backend.clone()), catalog_manager: CatalogManager::new(kv_backend.clone()),
schema_manager: SchemaManager::new(kv_backend.clone()), schema_manager: SchemaManager::new(kv_backend.clone()),
table_route_manager: TableRouteManager::new(kv_backend.clone()), table_route_manager: TableRouteManager::new(kv_backend.clone()),
table_repart_manager: TableRepartManager::new(kv_backend.clone()),
tombstone_manager: TombstoneManager::new_with_prefix( tombstone_manager: TombstoneManager::new_with_prefix(
kv_backend.clone(), kv_backend.clone(),
tombstone_prefix, tombstone_prefix,
@@ -616,6 +627,10 @@ impl TableMetadataManager {
&self.table_route_manager &self.table_route_manager
} }
pub fn table_repart_manager(&self) -> &TableRepartManager {
&self.table_repart_manager
}
pub fn topic_name_manager(&self) -> &TopicNameManager { pub fn topic_name_manager(&self) -> &TopicNameManager {
&self.topic_name_manager &self.topic_name_manager
} }
@@ -732,12 +747,10 @@ impl TableMetadataManager {
/// The caller MUST ensure it has the exclusive access to `TableNameKey`. /// The caller MUST ensure it has the exclusive access to `TableNameKey`.
pub async fn create_table_metadata( pub async fn create_table_metadata(
&self, &self,
mut table_info: RawTableInfo, table_info: RawTableInfo,
table_route_value: TableRouteValue, table_route_value: TableRouteValue,
region_wal_options: HashMap<RegionNumber, String>, region_wal_options: HashMap<RegionNumber, String>,
) -> Result<()> { ) -> Result<()> {
let region_numbers = table_route_value.region_numbers();
table_info.meta.region_numbers = region_numbers;
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
let engine = table_info.meta.engine.clone(); let engine = table_info.meta.engine.clone();
@@ -836,8 +849,7 @@ impl TableMetadataManager {
on_create_table_route_failure: F2, on_create_table_route_failure: F2,
} }
let mut on_failures = Vec::with_capacity(len); let mut on_failures = Vec::with_capacity(len);
for (mut table_info, table_route_value) in tables_data { for (table_info, table_route_value) in tables_data {
table_info.meta.region_numbers = table_route_value.region_numbers();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
// Creates table name. // Creates table name.
@@ -923,6 +935,7 @@ impl TableMetadataManager {
); );
let table_info_key = TableInfoKey::new(table_id); let table_info_key = TableInfoKey::new(table_id);
let table_route_key = TableRouteKey::new(table_id); let table_route_key = TableRouteKey::new(table_id);
let table_repart_key = TableRepartKey::new(table_id);
let datanode_table_keys = datanode_ids let datanode_table_keys = datanode_ids
.into_iter() .into_iter()
.map(|datanode_id| DatanodeTableKey::new(datanode_id, table_id)) .map(|datanode_id| DatanodeTableKey::new(datanode_id, table_id))
@@ -937,6 +950,7 @@ impl TableMetadataManager {
keys.push(table_name.to_bytes()); keys.push(table_name.to_bytes());
keys.push(table_info_key.to_bytes()); keys.push(table_info_key.to_bytes());
keys.push(table_route_key.to_bytes()); keys.push(table_route_key.to_bytes());
keys.push(table_repart_key.to_bytes());
for key in &datanode_table_keys { for key in &datanode_table_keys {
keys.push(key.to_bytes()); keys.push(key.to_bytes());
} }
@@ -1526,8 +1540,8 @@ mod tests {
} }
} }
fn new_test_table_info(region_numbers: impl Iterator<Item = u32>) -> TableInfo { fn new_test_table_info() -> TableInfo {
test_utils::new_test_table_info(10, region_numbers) test_utils::new_test_table_info(10)
} }
fn new_test_table_names() -> HashSet<TableName> { fn new_test_table_names() -> HashSet<TableName> {
@@ -1585,8 +1599,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv.clone()); let table_metadata_manager = TableMetadataManager::new(mem_kv.clone());
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()]; let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let wal_allocator = WalOptionsAllocator::RaftEngine; let wal_allocator = WalOptionsAllocator::RaftEngine;
let regions = (0..16).collect(); let regions = (0..16).collect();
let region_wal_options = let region_wal_options =
@@ -1613,8 +1626,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()]; let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let region_wal_options = create_mock_region_wal_options() let region_wal_options = create_mock_region_wal_options()
.into_iter() .into_iter()
.map(|(k, v)| (k, serde_json::to_string(&v).unwrap())) .map(|(k, v)| (k, serde_json::to_string(&v).unwrap()))
@@ -1696,8 +1708,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()]; let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
let table_route_value = TableRouteValue::physical(region_routes.clone()); let table_route_value = TableRouteValue::physical(region_routes.clone());
@@ -1762,7 +1773,6 @@ mod tests {
let table_info: RawTableInfo = test_utils::new_test_table_info_with_name( let table_info: RawTableInfo = test_utils::new_test_table_info_with_name(
table_id, table_id,
&format!("my_table_{}", table_id), &format!("my_table_{}", table_id),
region_routes.iter().map(|r| r.region.id.region_number()),
) )
.into(); .into();
let table_route_value = TableRouteValue::physical(region_routes.clone()); let table_route_value = TableRouteValue::physical(region_routes.clone());
@@ -1783,8 +1793,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = &vec![region_route.clone()]; let region_routes = &vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
let datanode_id = 2; let datanode_id = 2;
let region_wal_options = create_mock_region_wal_options(); let region_wal_options = create_mock_region_wal_options();
@@ -1890,8 +1899,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()]; let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
// creates metadata. // creates metadata.
create_physical_table_metadata( create_physical_table_metadata(
@@ -1967,8 +1975,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()]; let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
// creates metadata. // creates metadata.
create_physical_table_metadata( create_physical_table_metadata(
@@ -2053,8 +2060,7 @@ mod tests {
leader_down_since: None, leader_down_since: None,
}, },
]; ];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
let current_table_route_value = DeserializedValueWithBytes::from_inner( let current_table_route_value = DeserializedValueWithBytes::from_inner(
TableRouteValue::physical(region_routes.clone()), TableRouteValue::physical(region_routes.clone()),
@@ -2136,8 +2142,7 @@ mod tests {
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let region_route = new_test_region_route(); let region_route = new_test_region_route();
let region_routes = vec![region_route.clone()]; let region_routes = vec![region_route.clone()];
let table_info: RawTableInfo = let table_info: RawTableInfo = new_test_table_info().into();
new_test_table_info(region_routes.iter().map(|r| r.region.id.region_number())).into();
let table_id = table_info.ident.table_id; let table_id = table_info.ident.table_id;
let engine = table_info.meta.engine.as_str(); let engine = table_info.meta.engine.as_str();
let region_storage_path = let region_storage_path =
@@ -2391,7 +2396,7 @@ mod tests {
let mem_kv = Arc::new(MemoryKvBackend::default()); let mem_kv = Arc::new(MemoryKvBackend::default());
let table_metadata_manager = TableMetadataManager::new(mem_kv); let table_metadata_manager = TableMetadataManager::new(mem_kv);
let view_info: RawTableInfo = new_test_table_info(Vec::<u32>::new().into_iter()).into(); let view_info: RawTableInfo = new_test_table_info().into();
let view_id = view_info.ident.table_id; let view_id = view_info.ident.table_id;
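
A brief sketch of how callers are expected to reach the new repart metadata via `TableMetadataManager`; `get_dst_regions` is defined on `TableRepartManager` (shown further below), and the region id here is invented.

```rust
// Sketch only, inside some async context that has a `table_metadata_manager`.
let src_region = RegionId::new(1024, 1);
// `None` means no repartition mapping has been recorded for this source region.
if let Some(dsts) = table_metadata_manager
    .table_repart_manager()
    .get_dst_regions(src_region)
    .await?
{
    // Per the key's documentation, the GC scheduler uses this mapping to decide
    // which destination regions may still reference files from `src_region`.
    let _ = dsts;
}
```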

View File

@@ -338,7 +338,6 @@ mod tests {
next_column_id: 3, next_column_id: 3,
value_indices: vec![2, 3], value_indices: vec![2, 3],
options: Default::default(), options: Default::default(),
region_numbers: vec![1],
partition_key_indices: vec![], partition_key_indices: vec![],
column_ids: vec![], column_ids: vec![],
}; };

View File

@@ -0,0 +1,856 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::collections::{BTreeMap, BTreeSet, HashMap};
use std::fmt::Display;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt as _, ResultExt, ensure};
use store_api::storage::RegionId;
use table::metadata::TableId;
use crate::error::{InvalidMetadataSnafu, Result, SerdeJsonSnafu};
use crate::key::txn_helper::TxnOpGetResponseSet;
use crate::key::{
DeserializedValueWithBytes, MetadataKey, MetadataValue, TABLE_REPART_KEY_PATTERN,
TABLE_REPART_PREFIX,
};
use crate::kv_backend::KvBackendRef;
use crate::kv_backend::txn::Txn;
use crate::rpc::store::BatchGetRequest;
/// The key stores table repartition metadata.
/// Specifically, it records the relation between source and destination regions after a repartition operation is completed.
/// This is distinct from the initial partitioning scheme of the table.
/// For example, after repartition, a destination region may still hold files from a source region; this mapping should be updated once repartition is done.
/// The GC scheduler uses this information to clean up those files (and removes this mapping if all files from the source region are cleaned).
///
/// The layout: `__table_repart/{table_id}`.
#[derive(Debug, PartialEq)]
pub struct TableRepartKey {
/// The unique identifier of the table whose re-partition information is stored in this key.
pub table_id: TableId,
}
impl TableRepartKey {
pub fn new(table_id: TableId) -> Self {
Self { table_id }
}
/// Returns the range prefix of the table repartition key.
pub fn range_prefix() -> Vec<u8> {
format!("{}/", TABLE_REPART_PREFIX).into_bytes()
}
}
impl MetadataKey<'_, TableRepartKey> for TableRepartKey {
fn to_bytes(&self) -> Vec<u8> {
self.to_string().into_bytes()
}
fn from_bytes(bytes: &[u8]) -> Result<TableRepartKey> {
let key = std::str::from_utf8(bytes).map_err(|e| {
InvalidMetadataSnafu {
err_msg: format!(
"TableRepartKey '{}' is not a valid UTF8 string: {e}",
String::from_utf8_lossy(bytes)
),
}
.build()
})?;
let captures = TABLE_REPART_KEY_PATTERN
.captures(key)
.context(InvalidMetadataSnafu {
err_msg: format!("Invalid TableRepartKey '{key}'"),
})?;
// Safety: pass the regex check above
let table_id = captures[1].parse::<TableId>().unwrap();
Ok(TableRepartKey { table_id })
}
}
impl Display for TableRepartKey {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}/{}", TABLE_REPART_PREFIX, self.table_id)
}
}
#[derive(Debug, PartialEq, Serialize, Deserialize, Clone, Default)]
pub struct TableRepartValue {
/// A mapping from source region IDs to sets of destination region IDs after repartition.
///
/// Each key in the map is a `RegionId` representing a source region that has been repartitioned.
/// The corresponding value is a `BTreeSet<RegionId>` containing the IDs of destination regions
/// that currently hold files originally from the source region. This mapping is updated after
/// repartition and is used by the GC scheduler to track and clean up files that have been moved.
pub src_to_dst: BTreeMap<RegionId, BTreeSet<RegionId>>,
}
impl TableRepartValue {
/// Creates a new TableRepartValue with an empty src_to_dst map.
pub fn new() -> Self {
Default::default()
}
/// Updates mappings from src region to dst regions. Should be called once repartition is done.
///
/// If `dst` is empty, this method does nothing.
pub fn update_mappings(&mut self, src: RegionId, dst: &[RegionId]) {
if dst.is_empty() {
return;
}
self.src_to_dst.entry(src).or_default().extend(dst);
}
/// Removes mappings from src region to dst regions. Should be called once files from the src region are cleaned up in the dst regions.
pub fn remove_mappings(&mut self, src: RegionId, dsts: &[RegionId]) {
if let Some(dst_set) = self.src_to_dst.get_mut(&src) {
for dst in dsts {
dst_set.remove(dst);
}
if dst_set.is_empty() {
self.src_to_dst.remove(&src);
}
}
}
}
impl MetadataValue for TableRepartValue {
fn try_from_raw_value(raw_value: &[u8]) -> Result<Self> {
serde_json::from_slice::<TableRepartValue>(raw_value).context(SerdeJsonSnafu)
}
fn try_as_raw_value(&self) -> Result<Vec<u8>> {
serde_json::to_vec(self).context(SerdeJsonSnafu)
}
}
pub type TableRepartValueDecodeResult =
Result<Option<DeserializedValueWithBytes<TableRepartValue>>>;
pub struct TableRepartManager {
kv_backend: KvBackendRef,
}
impl TableRepartManager {
pub fn new(kv_backend: KvBackendRef) -> Self {
Self { kv_backend }
}
/// Builds a create table repart transaction;
/// it expects that the `__table_repart/{table_id}` key isn't occupied.
pub fn build_create_txn(
&self,
table_id: TableId,
table_repart_value: &TableRepartValue,
) -> Result<(
Txn,
impl FnOnce(&mut TxnOpGetResponseSet) -> TableRepartValueDecodeResult + use<>,
)> {
let key = TableRepartKey::new(table_id);
let raw_key = key.to_bytes();
let txn = Txn::put_if_not_exists(raw_key.clone(), table_repart_value.try_as_raw_value()?);
Ok((
txn,
TxnOpGetResponseSet::decode_with(TxnOpGetResponseSet::filter(raw_key)),
))
}
/// Builds an update table repart transaction;
/// it expects the remote value to equal the `current_table_repart_value`.
/// It retrieves the latest value if the comparison fails.
pub fn build_update_txn(
&self,
table_id: TableId,
current_table_repart_value: &DeserializedValueWithBytes<TableRepartValue>,
new_table_repart_value: &TableRepartValue,
) -> Result<(
Txn,
impl FnOnce(&mut TxnOpGetResponseSet) -> TableRepartValueDecodeResult + use<>,
)> {
let key = TableRepartKey::new(table_id);
let raw_key = key.to_bytes();
let raw_value = current_table_repart_value.get_raw_bytes();
let new_raw_value: Vec<u8> = new_table_repart_value.try_as_raw_value()?;
let txn = Txn::compare_and_put(raw_key.clone(), raw_value, new_raw_value);
Ok((
txn,
TxnOpGetResponseSet::decode_with(TxnOpGetResponseSet::filter(raw_key)),
))
}
/// Returns the [`TableRepartValue`].
pub async fn get(&self, table_id: TableId) -> Result<Option<TableRepartValue>> {
self.get_inner(table_id).await
}
async fn get_inner(&self, table_id: TableId) -> Result<Option<TableRepartValue>> {
let key = TableRepartKey::new(table_id);
self.kv_backend
.get(&key.to_bytes())
.await?
.map(|kv| TableRepartValue::try_from_raw_value(&kv.value))
.transpose()
}
/// Returns the [`TableRepartValue`] wrapped with [`DeserializedValueWithBytes`].
pub async fn get_with_raw_bytes(
&self,
table_id: TableId,
) -> Result<Option<DeserializedValueWithBytes<TableRepartValue>>> {
self.get_with_raw_bytes_inner(table_id).await
}
async fn get_with_raw_bytes_inner(
&self,
table_id: TableId,
) -> Result<Option<DeserializedValueWithBytes<TableRepartValue>>> {
let key = TableRepartKey::new(table_id);
self.kv_backend
.get(&key.to_bytes())
.await?
.map(|kv| DeserializedValueWithBytes::from_inner_slice(&kv.value))
.transpose()
}
/// Returns a batch of [`TableRepartValue`] that respects the order of `table_ids`.
pub async fn batch_get(&self, table_ids: &[TableId]) -> Result<Vec<Option<TableRepartValue>>> {
let raw_table_reparts = self.batch_get_inner(table_ids).await?;
Ok(raw_table_reparts
.into_iter()
.map(|v| v.map(|x| x.inner))
.collect())
}
/// Returns a batch of [`TableRepartValue`] wrapped with [`DeserializedValueWithBytes`].
pub async fn batch_get_with_raw_bytes(
&self,
table_ids: &[TableId],
) -> Result<Vec<Option<DeserializedValueWithBytes<TableRepartValue>>>> {
self.batch_get_inner(table_ids).await
}
async fn batch_get_inner(
&self,
table_ids: &[TableId],
) -> Result<Vec<Option<DeserializedValueWithBytes<TableRepartValue>>>> {
let keys = table_ids
.iter()
.map(|id| TableRepartKey::new(*id).to_bytes())
.collect::<Vec<_>>();
let resp = self
.kv_backend
.batch_get(BatchGetRequest { keys: keys.clone() })
.await?;
let kvs = resp
.kvs
.into_iter()
.map(|kv| (kv.key, kv.value))
.collect::<HashMap<_, _>>();
keys.into_iter()
.map(|key| {
if let Some(value) = kvs.get(&key) {
Ok(Some(DeserializedValueWithBytes::from_inner_slice(value)?))
} else {
Ok(None)
}
})
.collect()
}
/// Updates mappings from src region to dst regions.
/// Should be called once repartition is done.
pub async fn update_mappings(&self, src: RegionId, dst: &[RegionId]) -> Result<()> {
let table_id = src.table_id();
// Get current table repart with raw bytes for CAS operation
let current_table_repart = self
.get_with_raw_bytes(table_id)
.await?
.context(crate::error::TableRepartNotFoundSnafu { table_id })?;
// Clone the current repart value and update mappings
let mut new_table_repart_value = current_table_repart.inner.clone();
new_table_repart_value.update_mappings(src, dst);
// Execute atomic update
let (txn, _) =
self.build_update_txn(table_id, &current_table_repart, &new_table_repart_value)?;
let result = self.kv_backend.txn(txn).await?;
ensure!(
result.succeeded,
crate::error::MetadataCorruptionSnafu {
err_msg: format!(
"Failed to update mappings for table {}: CAS operation failed",
table_id
),
}
);
Ok(())
}
/// Removes mappings from src region to dst regions.
/// Should be called once files from src region are cleaned up in dst regions.
pub async fn remove_mappings(&self, src: RegionId, dsts: &[RegionId]) -> Result<()> {
let table_id = src.table_id();
// Get current table repart with raw bytes for CAS operation
let current_table_repart = self
.get_with_raw_bytes(table_id)
.await?
.context(crate::error::TableRepartNotFoundSnafu { table_id })?;
// Clone the current repart value and remove mappings
let mut new_table_repart_value = current_table_repart.inner.clone();
new_table_repart_value.remove_mappings(src, dsts);
// Execute atomic update
let (txn, _) =
self.build_update_txn(table_id, &current_table_repart, &new_table_repart_value)?;
let result = self.kv_backend.txn(txn).await?;
ensure!(
result.succeeded,
crate::error::MetadataCorruptionSnafu {
err_msg: format!(
"Failed to remove mappings for table {}: CAS operation failed",
table_id
),
}
);
Ok(())
}
/// Returns the destination regions for a given source region.
pub async fn get_dst_regions(
&self,
src_region: RegionId,
) -> Result<Option<BTreeSet<RegionId>>> {
let table_id = src_region.table_id();
let table_repart = self.get(table_id).await?;
Ok(table_repart.and_then(|repart| repart.src_to_dst.get(&src_region).cloned()))
}
}
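
Before the unit tests, a hedged end-to-end sketch of the manager against an in-memory backend. It assumes an async test context plus the same imports the tests below use (`Arc`, `MemoryKvBackend`, `TxnService`), and the table/region ids are invented. Note that `update_mappings` requires the `__table_repart/{table_id}` entry to exist, so an empty value is persisted first via `build_create_txn`.

```rust
// Sketch only: run inside an async test.
let kv_backend: KvBackendRef = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv_backend.clone());

// Persist an empty `__table_repart/42` entry so later CAS updates have a base value.
let src = RegionId::new(42, 1);
let (create_txn, _decoder) = manager
    .build_create_txn(src.table_id(), &TableRepartValue::new())
    .unwrap();
assert!(kv_backend.txn(create_txn).await.unwrap().succeeded);

// Record that files from region (42, 1) may now live in regions (42, 2) and (42, 3)...
manager
    .update_mappings(src, &[RegionId::new(42, 2), RegionId::new(42, 3)])
    .await
    .unwrap();

// ...and drop one destination once its files from the source have been cleaned up.
manager
    .remove_mappings(src, &[RegionId::new(42, 2)])
    .await
    .unwrap();

let dsts = manager.get_dst_regions(src).await.unwrap().unwrap();
assert_eq!(dsts, BTreeSet::from([RegionId::new(42, 3)]));
```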
#[cfg(test)]
mod tests {
use std::collections::BTreeMap;
use std::sync::Arc;
use super::*;
use crate::kv_backend::TxnService;
use crate::kv_backend::memory::MemoryKvBackend;
#[test]
fn test_table_repart_key_serialization() {
let key = TableRepartKey::new(42);
let raw_key = key.to_bytes();
assert_eq!(raw_key, b"__table_repart/42");
}
#[test]
fn test_table_repart_key_deserialization() {
let expected = TableRepartKey::new(42);
let key = TableRepartKey::from_bytes(b"__table_repart/42").unwrap();
assert_eq!(key, expected);
}
#[test]
fn test_table_repart_key_deserialization_invalid_utf8() {
let result = TableRepartKey::from_bytes(b"__table_repart/\xff");
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("not a valid UTF8 string")
);
}
#[test]
fn test_table_repart_key_deserialization_invalid_format() {
let result = TableRepartKey::from_bytes(b"invalid_key_format");
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("Invalid TableRepartKey")
);
}
#[test]
fn test_table_repart_value_serialization_deserialization() {
let mut src_to_dst = BTreeMap::new();
let src_region = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2), RegionId::new(1, 3)];
src_to_dst.insert(src_region, dst_regions.into_iter().collect());
let value = TableRepartValue { src_to_dst };
let serialized = value.try_as_raw_value().unwrap();
let deserialized = TableRepartValue::try_from_raw_value(&serialized).unwrap();
assert_eq!(value, deserialized);
}
#[test]
fn test_table_repart_value_update_mappings_new_src() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let dst = vec![RegionId::new(1, 2), RegionId::new(1, 3)];
value.update_mappings(src, &dst);
assert_eq!(value.src_to_dst.len(), 1);
assert!(value.src_to_dst.contains_key(&src));
assert_eq!(value.src_to_dst.get(&src).unwrap().len(), 2);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 2))
);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 3))
);
}
#[test]
fn test_table_repart_value_update_mappings_existing_src() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let initial_dst = vec![RegionId::new(1, 2)];
let additional_dst = vec![RegionId::new(1, 3), RegionId::new(1, 4)];
// Initial mapping
value.update_mappings(src, &initial_dst);
// Update with additional destinations
value.update_mappings(src, &additional_dst);
assert_eq!(value.src_to_dst.len(), 1);
assert_eq!(value.src_to_dst.get(&src).unwrap().len(), 3);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 2))
);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 3))
);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 4))
);
}
#[test]
fn test_table_repart_value_remove_mappings_existing() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let dst_regions = vec![
RegionId::new(1, 2),
RegionId::new(1, 3),
RegionId::new(1, 4),
];
value.update_mappings(src, &dst_regions);
// Remove some mappings
let to_remove = vec![RegionId::new(1, 2), RegionId::new(1, 3)];
value.remove_mappings(src, &to_remove);
assert_eq!(value.src_to_dst.len(), 1);
assert_eq!(value.src_to_dst.get(&src).unwrap().len(), 1);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 4))
);
}
#[test]
fn test_table_repart_value_remove_mappings_all() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2), RegionId::new(1, 3)];
value.update_mappings(src, &dst_regions);
// Remove all mappings
value.remove_mappings(src, &dst_regions);
assert_eq!(value.src_to_dst.len(), 0);
}
#[test]
fn test_table_repart_value_remove_mappings_nonexistent() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2)];
value.update_mappings(src, &dst_regions);
// Try to remove non-existent mappings
let nonexistent_dst = vec![RegionId::new(1, 3), RegionId::new(1, 4)];
value.remove_mappings(src, &nonexistent_dst);
// Should remain unchanged
assert_eq!(value.src_to_dst.len(), 1);
assert_eq!(value.src_to_dst.get(&src).unwrap().len(), 1);
assert!(
value
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1, 2))
);
}
#[test]
fn test_table_repart_value_remove_mappings_nonexistent_src() {
let mut value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let src = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2)];
// Try to remove mappings for non-existent source
value.remove_mappings(src, &dst_regions);
// Should remain empty
assert_eq!(value.src_to_dst.len(), 0);
}
#[tokio::test]
async fn test_table_repart_manager_get_empty() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv);
let result = manager.get(1024).await.unwrap();
assert!(result.is_none());
}
#[tokio::test]
async fn test_table_repart_manager_get_with_raw_bytes_empty() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv);
let result = manager.get_with_raw_bytes(1024).await.unwrap();
assert!(result.is_none());
}
#[tokio::test]
async fn test_table_repart_manager_create_and_get() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
let mut src_to_dst = BTreeMap::new();
let src_region = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2), RegionId::new(1, 3)];
src_to_dst.insert(src_region, dst_regions.into_iter().collect());
let value = TableRepartValue { src_to_dst };
// Create the table repart
let (txn, _) = manager.build_create_txn(1024, &value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
// Get the table repart
let retrieved = manager.get(1024).await.unwrap().unwrap();
assert_eq!(retrieved, value);
}
#[tokio::test]
async fn test_table_repart_manager_update_txn() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
let initial_value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
// Create initial table repart
let (create_txn, _) = manager.build_create_txn(1024, &initial_value).unwrap();
let result = kv.txn(create_txn).await.unwrap();
assert!(result.succeeded);
// Get current value with raw bytes
let current_value = manager.get_with_raw_bytes(1024).await.unwrap().unwrap();
// Create updated value
let mut updated_src_to_dst = BTreeMap::new();
let src_region = RegionId::new(1, 1);
let dst_regions = vec![RegionId::new(1, 2)];
updated_src_to_dst.insert(src_region, dst_regions.into_iter().collect());
let updated_value = TableRepartValue {
src_to_dst: updated_src_to_dst,
};
// Build update transaction
let (update_txn, _) = manager
.build_update_txn(1024, &current_value, &updated_value)
.unwrap();
let result = kv.txn(update_txn).await.unwrap();
assert!(result.succeeded);
// Verify update
let retrieved = manager.get(1024).await.unwrap().unwrap();
assert_eq!(retrieved, updated_value);
}
#[tokio::test]
async fn test_table_repart_manager_batch_get() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
// Create multiple table reparts
let table_reparts = vec![
(
1024,
TableRepartValue {
src_to_dst: {
let mut map = BTreeMap::new();
map.insert(
RegionId::new(1, 1),
vec![RegionId::new(1, 2)].into_iter().collect(),
);
map
},
},
),
(
1025,
TableRepartValue {
src_to_dst: {
let mut map = BTreeMap::new();
map.insert(
RegionId::new(2, 1),
vec![RegionId::new(2, 2), RegionId::new(2, 3)]
.into_iter()
.collect(),
);
map
},
},
),
];
for (table_id, value) in &table_reparts {
let (txn, _) = manager.build_create_txn(*table_id, value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
}
// Batch get
let results = manager.batch_get(&[1024, 1025, 1026]).await.unwrap();
assert_eq!(results.len(), 3);
assert_eq!(results[0].as_ref().unwrap(), &table_reparts[0].1);
assert_eq!(results[1].as_ref().unwrap(), &table_reparts[1].1);
assert!(results[2].is_none());
}
#[tokio::test]
async fn test_table_repart_manager_update_mappings() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
// Create initial table repart
let initial_value = TableRepartValue {
src_to_dst: BTreeMap::new(),
};
let (txn, _) = manager.build_create_txn(1024, &initial_value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
// Update mappings
let src = RegionId::new(1024, 1);
let dst = vec![RegionId::new(1024, 2), RegionId::new(1024, 3)];
manager.update_mappings(src, &dst).await.unwrap();
// Verify update
let retrieved = manager.get(1024).await.unwrap().unwrap();
assert_eq!(retrieved.src_to_dst.len(), 1);
assert!(retrieved.src_to_dst.contains_key(&src));
assert_eq!(retrieved.src_to_dst.get(&src).unwrap().len(), 2);
}
#[tokio::test]
async fn test_table_repart_manager_remove_mappings() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
// Create initial table repart with mappings
let mut initial_src_to_dst = BTreeMap::new();
let src = RegionId::new(1024, 1);
let dst_regions = vec![
RegionId::new(1024, 2),
RegionId::new(1024, 3),
RegionId::new(1024, 4),
];
initial_src_to_dst.insert(src, dst_regions.into_iter().collect());
let initial_value = TableRepartValue {
src_to_dst: initial_src_to_dst,
};
let (txn, _) = manager.build_create_txn(1024, &initial_value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
// Remove some mappings
let to_remove = vec![RegionId::new(1024, 2), RegionId::new(1024, 3)];
manager.remove_mappings(src, &to_remove).await.unwrap();
// Verify removal
let retrieved = manager.get(1024).await.unwrap().unwrap();
assert_eq!(retrieved.src_to_dst.len(), 1);
assert_eq!(retrieved.src_to_dst.get(&src).unwrap().len(), 1);
assert!(
retrieved
.src_to_dst
.get(&src)
.unwrap()
.contains(&RegionId::new(1024, 4))
);
}
#[tokio::test]
async fn test_table_repart_manager_get_dst_regions() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
// Create initial table repart with mappings
let mut initial_src_to_dst = BTreeMap::new();
let src = RegionId::new(1024, 1);
let dst_regions = vec![RegionId::new(1024, 2), RegionId::new(1024, 3)];
initial_src_to_dst.insert(src, dst_regions.into_iter().collect());
let initial_value = TableRepartValue {
src_to_dst: initial_src_to_dst,
};
let (txn, _) = manager.build_create_txn(1024, &initial_value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
// Get destination regions
let dst_regions = manager.get_dst_regions(src).await.unwrap();
assert!(dst_regions.is_some());
let dst_set = dst_regions.unwrap();
assert_eq!(dst_set.len(), 2);
assert!(dst_set.contains(&RegionId::new(1024, 2)));
assert!(dst_set.contains(&RegionId::new(1024, 3)));
// Test non-existent source region
let nonexistent_src = RegionId::new(1024, 99);
let result = manager.get_dst_regions(nonexistent_src).await.unwrap();
assert!(result.is_none());
}
#[tokio::test]
async fn test_table_repart_manager_operations_on_nonexistent_table() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv);
let src = RegionId::new(1024, 1);
let dst = vec![RegionId::new(1024, 2)];
// Try to update mappings on non-existent table
let result = manager.update_mappings(src, &dst).await;
assert!(result.is_err());
let err_msg = result.unwrap_err().to_string();
assert!(
err_msg.contains("Failed to find table repartition metadata for table id 1024"),
"{err_msg}"
);
// Try to remove mappings on non-existent table
let result = manager.remove_mappings(src, &dst).await;
assert!(result.is_err());
let err_msg = result.unwrap_err().to_string();
assert!(
err_msg.contains("Failed to find table repartition metadata for table id 1024"),
"{err_msg}"
);
}
#[tokio::test]
async fn test_table_repart_manager_batch_get_with_raw_bytes() {
let kv = Arc::new(MemoryKvBackend::default());
let manager = TableRepartManager::new(kv.clone());
// Create table repart
let value = TableRepartValue {
src_to_dst: {
let mut map = BTreeMap::new();
map.insert(
RegionId::new(1, 1),
vec![RegionId::new(1, 2)].into_iter().collect(),
);
map
},
};
let (txn, _) = manager.build_create_txn(1024, &value).unwrap();
let result = kv.txn(txn).await.unwrap();
assert!(result.succeeded);
// Batch get with raw bytes
let results = manager
.batch_get_with_raw_bytes(&[1024, 1025])
.await
.unwrap();
assert_eq!(results.len(), 2);
assert!(results[0].is_some());
assert!(results[1].is_none());
let retrieved = &results[0].as_ref().unwrap().inner;
assert_eq!(retrieved, &value);
}
}
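A minimal end-to-end sketch of the mapping lifecycle described by the doc comments above (create the record, call update_mappings once the repartition is done, resolve destinations with get_dst_regions, then remove_mappings after the source region's files are cleaned up). It simply mirrors the unit tests in this file; the MemoryKvBackend and the concrete ids are illustrative, not part of the source.

// Hedged sketch, not part of the source file.
async fn repart_mapping_lifecycle_sketch() {
    let kv = Arc::new(MemoryKvBackend::default());
    let manager = TableRepartManager::new(kv.clone());

    // Register an (initially empty) repartition record for table 1024.
    let initial = TableRepartValue { src_to_dst: BTreeMap::new() };
    let (txn, _) = manager.build_create_txn(1024, &initial).unwrap();
    assert!(kv.txn(txn).await.unwrap().succeeded);

    // Once the repartition finishes, record that region 1 was split into regions 2 and 3.
    let src = RegionId::new(1024, 1);
    let dsts = vec![RegionId::new(1024, 2), RegionId::new(1024, 3)];
    manager.update_mappings(src, &dsts).await.unwrap();

    // Readers resolve the destination regions of the old source region.
    assert_eq!(manager.get_dst_regions(src).await.unwrap().unwrap().len(), 2);

    // After the source region's files are cleaned up in the destinations, drop the mapping.
    manager.remove_mappings(src, &dsts).await.unwrap();
}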

View File

@@ -71,7 +71,6 @@ pub struct PhysicalTableRouteValue {
 #[derive(Debug, PartialEq, Serialize, Deserialize, Clone)]
 pub struct LogicalTableRouteValue {
     physical_table_id: TableId,
-    region_ids: Vec<RegionId>,
 }

 impl TableRouteValue {
@@ -85,14 +84,7 @@ impl TableRouteValue {
         if table_id == physical_table_id {
             TableRouteValue::physical(region_routes)
         } else {
-            let region_routes = region_routes
-                .into_iter()
-                .map(|region| {
-                    debug_assert_eq!(region.region.id.table_id(), physical_table_id);
-                    RegionId::new(table_id, region.region.id.region_number())
-                })
-                .collect();
-            TableRouteValue::logical(physical_table_id, region_routes)
+            TableRouteValue::logical(physical_table_id)
         }
     }
@@ -100,8 +92,8 @@
         Self::Physical(PhysicalTableRouteValue::new(region_routes))
     }

-    pub fn logical(physical_table_id: TableId, region_ids: Vec<RegionId>) -> Self {
-        Self::Logical(LogicalTableRouteValue::new(physical_table_id, region_ids))
+    pub fn logical(physical_table_id: TableId) -> Self {
+        Self::Logical(LogicalTableRouteValue::new(physical_table_id))
     }

     /// Returns a new version [TableRouteValue] with `region_routes`.
@@ -207,11 +199,9 @@
                 .iter()
                 .map(|region_route| region_route.region.id.region_number())
                 .collect(),
-            TableRouteValue::Logical(x) => x
-                .region_ids()
-                .iter()
-                .map(|region_id| region_id.region_number())
-                .collect(),
+            TableRouteValue::Logical(_) => {
+                vec![]
+            }
         }
     }
 }
@@ -245,20 +235,13 @@
 }

 impl LogicalTableRouteValue {
-    pub fn new(physical_table_id: TableId, region_ids: Vec<RegionId>) -> Self {
-        Self {
-            physical_table_id,
-            region_ids,
-        }
+    pub fn new(physical_table_id: TableId) -> Self {
+        Self { physical_table_id }
     }

     pub fn physical_table_id(&self) -> TableId {
         self.physical_table_id
     }
-
-    pub fn region_ids(&self) -> &Vec<RegionId> {
-        &self.region_ids
-    }
 }

 impl MetadataKey<'_, TableRouteKey> for TableRouteKey {
@@ -900,7 +883,6 @@ mod tests {
         let table_route_manager = TableRouteManager::new(kv.clone());
         let table_route_value = TableRouteValue::Logical(LogicalTableRouteValue {
             physical_table_id: 1023,
-            region_ids: vec![RegionId::new(1023, 1)],
         });
         let (txn, _) = table_route_manager
             .table_route_storage()
@@ -930,14 +912,12 @@
                 1024,
                 TableRouteValue::Logical(LogicalTableRouteValue {
                     physical_table_id: 1023,
-                    region_ids: vec![RegionId::new(1023, 1)],
                 }),
             ),
             (
                 1025,
                 TableRouteValue::Logical(LogicalTableRouteValue {
                     physical_table_id: 1023,
-                    region_ids: vec![RegionId::new(1023, 2)],
                 }),
             ),
         ];
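For orientation, a hedged before/after sketch of the API change in these hunks (the table id is illustrative): a logical table route now records only its physical table id, and per-logical-table region ids are no longer materialized.

// Old signature (removed above): TableRouteValue::logical(physical_table_id, region_ids)
// New signature:
let route = TableRouteValue::logical(1023);
assert!(matches!(route, TableRouteValue::Logical(_)));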

View File

@@ -19,11 +19,7 @@ use datatypes::schema::{ColumnSchema, SchemaBuilder};
 use store_api::storage::TableId;
 use table::metadata::{TableInfo, TableInfoBuilder, TableMetaBuilder};

-pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
-    table_id: TableId,
-    table_name: &str,
-    region_numbers: I,
-) -> TableInfo {
+pub fn new_test_table_info_with_name(table_id: TableId, table_name: &str) -> TableInfo {
     let column_schemas = vec![
         ColumnSchema::new("col1", ConcreteDataType::int32_datatype(), true),
         ColumnSchema::new(
@@ -45,7 +41,6 @@ pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
         .primary_key_indices(vec![0])
         .engine("engine")
         .next_column_id(3)
-        .region_numbers(region_numbers.into_iter().collect::<Vec<_>>())
         .build()
         .unwrap();
     TableInfoBuilder::default()
@@ -56,9 +51,6 @@ pub fn new_test_table_info_with_name<I: IntoIterator<Item = u32>>(
         .build()
         .unwrap()
 }

-pub fn new_test_table_info<I: IntoIterator<Item = u32>>(
-    table_id: TableId,
-    region_numbers: I,
-) -> TableInfo {
-    new_test_table_info_with_name(table_id, "mytable", region_numbers)
+pub fn new_test_table_info(table_id: TableId) -> TableInfo {
+    new_test_table_info_with_name(table_id, "mytable")
 }

View File

@@ -848,7 +848,7 @@ impl PgStore {
                 .context(CreatePostgresPoolSnafu)?,
         };

-        Self::with_pg_pool(pool, None, table_name, max_txn_ops).await
+        Self::with_pg_pool(pool, None, table_name, max_txn_ops, false).await
     }

     /// Create [PgStore] impl of [KvBackendRef] from url (backward compatibility).
@@ -862,20 +862,37 @@
         schema_name: Option<&str>,
         table_name: &str,
         max_txn_ops: usize,
+        auto_create_schema: bool,
     ) -> Result<KvBackendRef> {
         // Ensure the postgres metadata backend is ready to use.
         let client = match pool.get().await {
             Ok(client) => client,
             Err(e) => {
+                // We need to log the debug for the error to help diagnose the issue.
+                common_telemetry::error!(e; "Failed to get Postgres connection.");
                 return GetPostgresConnectionSnafu {
                     reason: e.to_string(),
                 }
                 .fail();
             }
         };
+        // Automatically create schema if enabled and schema_name is provided.
+        if auto_create_schema
+            && let Some(schema) = schema_name
+            && !schema.is_empty()
+        {
+            let create_schema_sql = format!("CREATE SCHEMA IF NOT EXISTS \"{}\"", schema);
+            client
+                .execute(&create_schema_sql, &[])
+                .await
+                .with_context(|_| PostgresExecutionSnafu {
+                    sql: create_schema_sql.clone(),
+                })?;
+        }
         let template_factory = PgSqlTemplateFactory::new(schema_name, table_name);
         let sql_template_set = template_factory.build();
+        // Do not attempt to create schema implicitly.
         client
             .execute(&sql_template_set.create_table_statement, &[])
             .await
@@ -959,7 +976,7 @@ mod tests {
         let Some(pool) = build_pg15_pool().await else {
             return;
         };
-        let res = PgStore::with_pg_pool(pool, None, "pg15_public_should_fail", 128).await;
+        let res = PgStore::with_pg_pool(pool, None, "pg15_public_should_fail", 128, false).await;
         assert!(
             res.is_err(),
             "creating table in public should fail for test_user"
@@ -1214,4 +1231,249 @@ mod tests {
         let t = PgSqlTemplateFactory::format_table_ident(Some(""), "test_table");
         assert_eq!(t, "\"test_table\"");
     }
#[tokio::test]
async fn test_auto_create_schema_enabled() {
common_telemetry::init_default_ut_logging();
maybe_skip_postgres_integration_test!();
let endpoints = std::env::var("GT_POSTGRES_ENDPOINTS").unwrap();
let mut cfg = Config::new();
cfg.url = Some(endpoints);
let pool = cfg
.create_pool(Some(Runtime::Tokio1), NoTls)
.context(CreatePostgresPoolSnafu)
.unwrap();
let schema_name = "test_auto_create_enabled";
let table_name = "test_table";
// Drop the schema if it exists to start clean
let client = pool.get().await.unwrap();
let _ = client
.execute(
&format!("DROP SCHEMA IF EXISTS \"{}\" CASCADE", schema_name),
&[],
)
.await;
// Create store with auto_create_schema enabled
let _ = PgStore::with_pg_pool(pool.clone(), Some(schema_name), table_name, 128, true)
.await
.unwrap();
// Verify schema was created
let row = client
.query_one(
"SELECT schema_name FROM information_schema.schemata WHERE schema_name = $1",
&[&schema_name],
)
.await
.unwrap();
let created_schema: String = row.get(0);
assert_eq!(created_schema, schema_name);
// Verify table was created in the schema
let row = client
.query_one(
"SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = $1 AND table_name = $2",
&[&schema_name, &table_name],
)
.await
.unwrap();
let created_table_schema: String = row.get(0);
let created_table_name: String = row.get(1);
assert_eq!(created_table_schema, schema_name);
assert_eq!(created_table_name, table_name);
// Cleanup
let _ = client
.execute(
&format!("DROP SCHEMA IF EXISTS \"{}\" CASCADE", schema_name),
&[],
)
.await;
}
#[tokio::test]
async fn test_auto_create_schema_disabled() {
common_telemetry::init_default_ut_logging();
maybe_skip_postgres_integration_test!();
let endpoints = std::env::var("GT_POSTGRES_ENDPOINTS").unwrap();
let mut cfg = Config::new();
cfg.url = Some(endpoints);
let pool = cfg
.create_pool(Some(Runtime::Tokio1), NoTls)
.context(CreatePostgresPoolSnafu)
.unwrap();
let schema_name = "test_auto_create_disabled";
let table_name = "test_table";
// Drop the schema if it exists to start clean
let client = pool.get().await.unwrap();
let _ = client
.execute(
&format!("DROP SCHEMA IF EXISTS \"{}\" CASCADE", schema_name),
&[],
)
.await;
// Try to create store with auto_create_schema disabled (should fail)
let result =
PgStore::with_pg_pool(pool.clone(), Some(schema_name), table_name, 128, false).await;
// Verify it failed because schema doesn't exist
assert!(
result.is_err(),
"Expected error when schema doesn't exist and auto_create_schema is disabled"
);
}
#[tokio::test]
async fn test_auto_create_schema_already_exists() {
common_telemetry::init_default_ut_logging();
maybe_skip_postgres_integration_test!();
let endpoints = std::env::var("GT_POSTGRES_ENDPOINTS").unwrap();
let mut cfg = Config::new();
cfg.url = Some(endpoints);
let pool = cfg
.create_pool(Some(Runtime::Tokio1), NoTls)
.context(CreatePostgresPoolSnafu)
.unwrap();
let schema_name = "test_auto_create_existing";
let table_name = "test_table";
// Manually create the schema first
let client = pool.get().await.unwrap();
let _ = client
.execute(
&format!("DROP SCHEMA IF EXISTS \"{}\" CASCADE", schema_name),
&[],
)
.await;
client
.execute(&format!("CREATE SCHEMA \"{}\"", schema_name), &[])
.await
.unwrap();
// Create store with auto_create_schema enabled (should succeed idempotently)
let _ = PgStore::with_pg_pool(pool.clone(), Some(schema_name), table_name, 128, true)
.await
.unwrap();
// Verify schema still exists
let row = client
.query_one(
"SELECT schema_name FROM information_schema.schemata WHERE schema_name = $1",
&[&schema_name],
)
.await
.unwrap();
let created_schema: String = row.get(0);
assert_eq!(created_schema, schema_name);
// Verify table was created in the schema
let row = client
.query_one(
"SELECT table_schema, table_name FROM information_schema.tables WHERE table_schema = $1 AND table_name = $2",
&[&schema_name, &table_name],
)
.await
.unwrap();
let created_table_schema: String = row.get(0);
let created_table_name: String = row.get(1);
assert_eq!(created_table_schema, schema_name);
assert_eq!(created_table_name, table_name);
// Cleanup
let _ = client
.execute(
&format!("DROP SCHEMA IF EXISTS \"{}\" CASCADE", schema_name),
&[],
)
.await;
}
#[tokio::test]
async fn test_auto_create_schema_no_schema_name() {
common_telemetry::init_default_ut_logging();
maybe_skip_postgres_integration_test!();
let endpoints = std::env::var("GT_POSTGRES_ENDPOINTS").unwrap();
let mut cfg = Config::new();
cfg.url = Some(endpoints);
let pool = cfg
.create_pool(Some(Runtime::Tokio1), NoTls)
.context(CreatePostgresPoolSnafu)
.unwrap();
let table_name = "test_table_no_schema";
// Create store with auto_create_schema enabled but no schema name (should succeed)
// This should create the table in the default schema (public)
let _ = PgStore::with_pg_pool(pool.clone(), None, table_name, 128, true)
.await
.unwrap();
// Verify table was created in public schema
let client = pool.get().await.unwrap();
let row = client
.query_one(
"SELECT table_schema, table_name FROM information_schema.tables WHERE table_name = $1",
&[&table_name],
)
.await
.unwrap();
let created_table_schema: String = row.get(0);
let created_table_name: String = row.get(1);
assert_eq!(created_table_name, table_name);
// Verify it's in public schema (or whichever is the default)
assert!(created_table_schema == "public" || !created_table_schema.is_empty());
// Cleanup
let _ = client
.execute(&format!("DROP TABLE IF EXISTS \"{}\"", table_name), &[])
.await;
}
#[tokio::test]
async fn test_auto_create_schema_with_empty_schema_name() {
common_telemetry::init_default_ut_logging();
maybe_skip_postgres_integration_test!();
let endpoints = std::env::var("GT_POSTGRES_ENDPOINTS").unwrap();
let mut cfg = Config::new();
cfg.url = Some(endpoints);
let pool = cfg
.create_pool(Some(Runtime::Tokio1), NoTls)
.context(CreatePostgresPoolSnafu)
.unwrap();
let table_name = "test_table_empty_schema";
// Create store with auto_create_schema enabled but empty schema name (should succeed)
// This should create the table in the default schema (public)
let _ = PgStore::with_pg_pool(pool.clone(), Some(""), table_name, 128, true)
.await
.unwrap();
// Verify table was created in public schema
let client = pool.get().await.unwrap();
let row = client
.query_one(
"SELECT table_schema, table_name FROM information_schema.tables WHERE table_name = $1",
&[&table_name],
)
.await
.unwrap();
let created_table_schema: String = row.get(0);
let created_table_name: String = row.get(1);
assert_eq!(created_table_name, table_name);
// Verify it's in public schema (or whichever is the default)
assert!(created_table_schema == "public" || !created_table_schema.is_empty());
// Cleanup
let _ = client
.execute(&format!("DROP TABLE IF EXISTS \"{}\"", table_name), &[])
.await;
}
} }
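A hedged usage note for the new trailing auto_create_schema argument shown above, assuming a pool built the same way as in the tests; the schema and table names are illustrative, not from the diff:

// With `true`, the store first runs `CREATE SCHEMA IF NOT EXISTS "greptime_meta"`,
// then creates its KV table inside that schema.
let backend = PgStore::with_pg_pool(pool, Some("greptime_meta"), "greptime_metakv", 128, true).await?;
// With `false` (the value the url-based constructor passes), no schema is created
// implicitly, so the call fails if "greptime_meta" does not exist yet.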

View File

@@ -1639,7 +1639,6 @@ mod tests {
             value_indices: vec![2],
             engine: METRIC_ENGINE_NAME.to_string(),
             next_column_id: 0,
-            region_numbers: vec![0],
             options: Default::default(),
             created_on: Default::default(),
             updated_on: Default::default(),

View File

@@ -14,7 +14,7 @@
 use common_telemetry::{debug, error, info};
 use common_wal::config::kafka::common::{
-    DEFAULT_BACKOFF_CONFIG, DEFAULT_CONNECT_TIMEOUT, KafkaConnectionConfig, KafkaTopicConfig,
+    DEFAULT_BACKOFF_CONFIG, KafkaConnectionConfig, KafkaTopicConfig,
 };
 use rskafka::client::error::Error as RsKafkaError;
 use rskafka::client::error::ProtocolError::TopicAlreadyExists;
@@ -211,7 +211,8 @@ pub async fn build_kafka_client(connection: &KafkaConnectionConfig) -> Result<Cl
     // Builds an kafka controller client for creating topics.
     let mut builder = ClientBuilder::new(connection.broker_endpoints.clone())
         .backoff_config(DEFAULT_BACKOFF_CONFIG)
-        .connect_timeout(Some(DEFAULT_CONNECT_TIMEOUT));
+        .connect_timeout(Some(connection.connect_timeout))
+        .timeout(Some(connection.timeout));
     if let Some(sasl) = &connection.sasl {
         builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
     };
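In effect, both the connect and request timeouts now come from the connection configuration instead of the hard-coded DEFAULT_CONNECT_TIMEOUT. A hedged sketch: the field names follow the hunk above, but the struct-update syntax, the Default impl, and the std::time::Duration values are assumptions for illustration.

let connection = KafkaConnectionConfig {
    connect_timeout: Duration::from_secs(10), // bootstrap connect timeout
    timeout: Duration::from_secs(30),         // per-request timeout, newly configurable
    ..Default::default()
};
let client = build_kafka_client(&connection).await?;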

View File

@@ -21,6 +21,7 @@ use std::sync::Arc;
 use std::task::{Context, Poll};

 use common_base::readable_size::ReadableSize;
+use common_telemetry::tracing::{Span, info_span};
 use common_time::util::format_nanoseconds_human_readable;
 use datafusion::arrow::compute::cast;
 use datafusion::arrow::datatypes::SchemaRef as DfSchemaRef;
@@ -218,6 +219,7 @@ pub struct RecordBatchStreamAdapter {
     metrics_2: Metrics,
     /// Display plan and metrics in verbose mode.
     explain_verbose: bool,
+    span: Span,
 }

 /// Json encoded metrics. Contains metric from a whole plan tree.
@@ -238,22 +240,21 @@ impl RecordBatchStreamAdapter {
             metrics: None,
             metrics_2: Metrics::Unavailable,
             explain_verbose: false,
+            span: Span::current(),
         })
     }

-    pub fn try_new_with_metrics_and_df_plan(
-        stream: DfSendableRecordBatchStream,
-        metrics: BaselineMetrics,
-        df_plan: Arc<dyn ExecutionPlan>,
-    ) -> Result<Self> {
+    pub fn try_new_with_span(stream: DfSendableRecordBatchStream, span: Span) -> Result<Self> {
         let schema =
             Arc::new(Schema::try_from(stream.schema()).context(error::SchemaConversionSnafu)?);
+        let subspan = info_span!(parent: &span, "RecordBatchStreamAdapter");
         Ok(Self {
             schema,
             stream,
-            metrics: Some(metrics),
-            metrics_2: Metrics::Unresolved(df_plan),
+            metrics: None,
+            metrics_2: Metrics::Unavailable,
             explain_verbose: false,
+            span: subspan,
         })
     }
@@ -300,6 +301,8 @@ impl Stream for RecordBatchStreamAdapter {
             .map(|m| m.elapsed_compute().clone())
             .unwrap_or_default();
         let _guard = timer.timer();
+        let poll_span = info_span!(parent: &self.span, "poll_next");
+        let _entered = poll_span.enter();
         match Pin::new(&mut self.stream).poll_next(cx) {
             Poll::Pending => Poll::Pending,
             Poll::Ready(Some(df_record_batch)) => {
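A hedged sketch of how the new constructor is meant to be used: the caller hands its current tracing span to the adapter, which then records each poll of the wrapped DataFusion stream under child "RecordBatchStreamAdapter" / "poll_next" spans. The span name and the df_stream variable below are illustrative.

let span = info_span!("collect_query_stream"); // caller-side span, name is illustrative
let stream = RecordBatchStreamAdapter::try_new_with_span(df_stream, span)?;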

View File

@@ -29,6 +29,7 @@ use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
 use adapter::RecordBatchMetrics;
 use arc_swap::ArcSwapOption;
 use common_base::readable_size::ReadableSize;
+use common_telemetry::tracing::Span;
 pub use datafusion::physical_plan::SendableRecordBatchStream as DfSendableRecordBatchStream;
 use datatypes::arrow::array::{ArrayRef, AsArray, StringBuilder};
 use datatypes::arrow::compute::SortOptions;
@@ -370,6 +371,7 @@ pub struct RecordBatchStreamWrapper<S> {
     pub stream: S,
     pub output_ordering: Option<Vec<OrderOption>>,
     pub metrics: Arc<ArcSwapOption<RecordBatchMetrics>>,
+    pub span: Span,
 }

 impl<S> RecordBatchStreamWrapper<S> {
@@ -380,6 +382,7 @@ impl<S> RecordBatchStreamWrapper<S> {
             stream,
             output_ordering: None,
             metrics: Default::default(),
+            span: Span::current(),
         }
     }
 }
@@ -408,6 +411,7 @@ impl<S: Stream<Item = Result<RecordBatch>> + Unpin> Stream for RecordBatchStream
     type Item = Result<RecordBatch>;

     fn poll_next(mut self: Pin<&mut Self>, ctx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
+        let _entered = self.span.clone().entered();
         Pin::new(&mut self.stream).poll_next(ctx)
     }
 }

View File

@@ -5,10 +5,12 @@ edition.workspace = true
 license.workspace = true

 [dependencies]
+arrow-schema.workspace = true
 common-base.workspace = true
 common-decimal.workspace = true
 common-error.workspace = true
 common-macro.workspace = true
+common-telemetry.workspace = true
 common-time.workspace = true
 datafusion-sql.workspace = true
 datatypes.workspace = true

View File

@@ -14,11 +14,12 @@
 use std::str::FromStr;

+use arrow_schema::extension::ExtensionType;
 use common_time::Timestamp;
 use common_time::timezone::Timezone;
-use datatypes::json::JsonStructureSettings;
+use datatypes::extension::json::JsonExtensionType;
 use datatypes::prelude::ConcreteDataType;
-use datatypes::schema::ColumnDefaultConstraint;
+use datatypes::schema::{ColumnDefaultConstraint, ColumnSchema};
 use datatypes::types::{JsonFormat, parse_string_to_jsonb, parse_string_to_vector_type_value};
 use datatypes::value::{OrderedF32, OrderedF64, Value};
 use snafu::{OptionExt, ResultExt, ensure};
@@ -124,13 +125,14 @@ pub(crate) fn sql_number_to_value(data_type: &ConcreteDataType, n: &str) -> Resu
 /// If `auto_string_to_numeric` is true, tries to cast the string value to numeric values,
 /// and returns error if the cast fails.
 pub fn sql_value_to_value(
-    column_name: &str,
-    data_type: &ConcreteDataType,
+    column_schema: &ColumnSchema,
     sql_val: &SqlValue,
     timezone: Option<&Timezone>,
     unary_op: Option<UnaryOperator>,
     auto_string_to_numeric: bool,
 ) -> Result<Value> {
+    let column_name = &column_schema.name;
+    let data_type = &column_schema.data_type;
     let mut value = match sql_val {
         SqlValue::Number(n, _) => sql_number_to_value(data_type, n)?,
         SqlValue::Null => Value::Null,
@@ -146,13 +148,9 @@ pub fn sql_value_to_value(
             (*b).into()
         }
-        SqlValue::DoubleQuotedString(s) | SqlValue::SingleQuotedString(s) => parse_string_to_value(
-            column_name,
-            s.clone(),
-            data_type,
-            timezone,
-            auto_string_to_numeric,
-        )?,
+        SqlValue::DoubleQuotedString(s) | SqlValue::SingleQuotedString(s) => {
+            parse_string_to_value(column_schema, s.clone(), timezone, auto_string_to_numeric)?
+        }
         SqlValue::HexStringLiteral(s) => {
             // Should not directly write binary into json column
             ensure!(
@@ -244,12 +242,12 @@
 }

 pub(crate) fn parse_string_to_value(
-    column_name: &str,
+    column_schema: &ColumnSchema,
     s: String,
-    data_type: &ConcreteDataType,
     timezone: Option<&Timezone>,
     auto_string_to_numeric: bool,
 ) -> Result<Value> {
+    let data_type = &column_schema.data_type;
     if auto_string_to_numeric && let Some(value) = auto_cast_to_numeric(&s, data_type)? {
         return Ok(value);
     }
@@ -257,7 +255,7 @@
     ensure!(
         data_type.is_stringifiable(),
         ColumnTypeMismatchSnafu {
-            column_name,
+            column_name: column_schema.name.clone(),
             expect: data_type.clone(),
             actual: ConcreteDataType::string_datatype(),
         }
@@ -303,23 +301,21 @@
             }
         }
         ConcreteDataType::Binary(_) => Ok(Value::Binary(s.as_bytes().into())),
-        ConcreteDataType::Json(j) => {
-            match &j.format {
-                JsonFormat::Jsonb => {
-                    let v = parse_string_to_jsonb(&s).context(DatatypeSnafu)?;
-                    Ok(Value::Binary(v.into()))
-                }
-                JsonFormat::Native(_inner) => {
-                    // Always use the structured version at this level.
-                    let serde_json_value =
-                        serde_json::from_str(&s).context(DeserializeSnafu { json: s })?;
-                    let json_structure_settings = JsonStructureSettings::Structured(None);
-                    json_structure_settings
-                        .encode(serde_json_value)
-                        .context(DatatypeSnafu)
-                }
-            }
-        }
+        ConcreteDataType::Json(j) => match &j.format {
+            JsonFormat::Jsonb => {
+                let v = parse_string_to_jsonb(&s).context(DatatypeSnafu)?;
+                Ok(Value::Binary(v.into()))
+            }
+            JsonFormat::Native(_) => {
+                let extension_type: Option<JsonExtensionType> =
+                    column_schema.extension_type().context(DatatypeSnafu)?;
+                let json_structure_settings = extension_type
+                    .and_then(|x| x.metadata().json_structure_settings.clone())
+                    .unwrap_or_default();
+                let v = serde_json::from_str(&s).context(DeserializeSnafu { json: s })?;
+                json_structure_settings.encode(v).context(DatatypeSnafu)
+            }
+        },
         ConcreteDataType::Vector(d) => {
             let v = parse_string_to_vector_type_value(&s, Some(d.dim)).context(DatatypeSnafu)?;
             Ok(Value::Binary(v.into()))
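A hedged sketch of the new call shape: the column name, data type, and (for native JSON columns) the extension metadata now travel together in a ColumnSchema rather than as separate column_name / data_type arguments. This mirrors the refactored tests in the hunk below; the column definition itself is illustrative.

let column_schema = ColumnSchema::new("a", ConcreteDataType::float64_datatype(), true);
let value = sql_value_to_value(
    &column_schema,
    &SqlValue::Number("3.0".to_string(), false),
    None,  // timezone
    None,  // unary_op
    false, // auto_string_to_numeric
)?;
assert_eq!(Value::Float64(OrderedFloat(3.0)), value);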
@@ -417,305 +413,265 @@ mod test {
use super::*; use super::*;
macro_rules! call_parse_string_to_value {
($column_name: expr, $input: expr, $data_type: expr) => {
call_parse_string_to_value!($column_name, $input, $data_type, None)
};
($column_name: expr, $input: expr, $data_type: expr, timezone = $timezone: expr) => {
call_parse_string_to_value!($column_name, $input, $data_type, Some($timezone))
};
($column_name: expr, $input: expr, $data_type: expr, $timezone: expr) => {{
let column_schema = ColumnSchema::new($column_name, $data_type, true);
parse_string_to_value(&column_schema, $input, $timezone, true)
}};
}
#[test] #[test]
fn test_string_to_value_auto_numeric() { fn test_string_to_value_auto_numeric() -> Result<()> {
// Test string to boolean with auto cast // Test string to boolean with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"true".to_string(), "true".to_string(),
&ConcreteDataType::boolean_datatype(), ConcreteDataType::boolean_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Boolean(true), result); assert_eq!(Value::Boolean(true), result);
// Test invalid string to boolean with auto cast // Test invalid string to boolean with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_boolean".to_string(), "not_a_boolean".to_string(),
&ConcreteDataType::boolean_datatype(), ConcreteDataType::boolean_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to int8 // Test string to int8
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"42".to_string(), "42".to_string(),
&ConcreteDataType::int8_datatype(), ConcreteDataType::int8_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Int8(42), result); assert_eq!(Value::Int8(42), result);
// Test invalid string to int8 with auto cast // Test invalid string to int8 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_an_int8".to_string(), "not_an_int8".to_string(),
&ConcreteDataType::int8_datatype(), ConcreteDataType::int8_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to int16 // Test string to int16
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"1000".to_string(), "1000".to_string(),
&ConcreteDataType::int16_datatype(), ConcreteDataType::int16_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Int16(1000), result); assert_eq!(Value::Int16(1000), result);
// Test invalid string to int16 with auto cast // Test invalid string to int16 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_an_int16".to_string(), "not_an_int16".to_string(),
&ConcreteDataType::int16_datatype(), ConcreteDataType::int16_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to int32 // Test string to int32
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"100000".to_string(), "100000".to_string(),
&ConcreteDataType::int32_datatype(), ConcreteDataType::int32_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Int32(100000), result); assert_eq!(Value::Int32(100000), result);
// Test invalid string to int32 with auto cast // Test invalid string to int32 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_an_int32".to_string(), "not_an_int32".to_string(),
&ConcreteDataType::int32_datatype(), ConcreteDataType::int32_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to int64 // Test string to int64
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"1000000".to_string(), "1000000".to_string(),
&ConcreteDataType::int64_datatype(), ConcreteDataType::int64_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Int64(1000000), result); assert_eq!(Value::Int64(1000000), result);
// Test invalid string to int64 with auto cast // Test invalid string to int64 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_an_int64".to_string(), "not_an_int64".to_string(),
&ConcreteDataType::int64_datatype(), ConcreteDataType::int64_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to uint8 // Test string to uint8
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"200".to_string(), "200".to_string(),
&ConcreteDataType::uint8_datatype(), ConcreteDataType::uint8_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::UInt8(200), result); assert_eq!(Value::UInt8(200), result);
// Test invalid string to uint8 with auto cast // Test invalid string to uint8 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_uint8".to_string(), "not_a_uint8".to_string(),
&ConcreteDataType::uint8_datatype(), ConcreteDataType::uint8_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to uint16 // Test string to uint16
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"60000".to_string(), "60000".to_string(),
&ConcreteDataType::uint16_datatype(), ConcreteDataType::uint16_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::UInt16(60000), result); assert_eq!(Value::UInt16(60000), result);
// Test invalid string to uint16 with auto cast // Test invalid string to uint16 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_uint16".to_string(), "not_a_uint16".to_string(),
&ConcreteDataType::uint16_datatype(), ConcreteDataType::uint16_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to uint32 // Test string to uint32
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"4000000000".to_string(), "4000000000".to_string(),
&ConcreteDataType::uint32_datatype(), ConcreteDataType::uint32_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::UInt32(4000000000), result); assert_eq!(Value::UInt32(4000000000), result);
// Test invalid string to uint32 with auto cast // Test invalid string to uint32 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_uint32".to_string(), "not_a_uint32".to_string(),
&ConcreteDataType::uint32_datatype(), ConcreteDataType::uint32_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to uint64 // Test string to uint64
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"18446744073709551615".to_string(), "18446744073709551615".to_string(),
&ConcreteDataType::uint64_datatype(), ConcreteDataType::uint64_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::UInt64(18446744073709551615), result); assert_eq!(Value::UInt64(18446744073709551615), result);
// Test invalid string to uint64 with auto cast // Test invalid string to uint64 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_uint64".to_string(), "not_a_uint64".to_string(),
&ConcreteDataType::uint64_datatype(), ConcreteDataType::uint64_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to float32 // Test string to float32
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"3.5".to_string(), "3.5".to_string(),
&ConcreteDataType::float32_datatype(), ConcreteDataType::float32_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Float32(OrderedF32::from(3.5)), result); assert_eq!(Value::Float32(OrderedF32::from(3.5)), result);
// Test invalid string to float32 with auto cast // Test invalid string to float32 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_float32".to_string(), "not_a_float32".to_string(),
&ConcreteDataType::float32_datatype(), ConcreteDataType::float32_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
// Test string to float64 // Test string to float64
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"3.5".to_string(), "3.5".to_string(),
&ConcreteDataType::float64_datatype(), ConcreteDataType::float64_datatype()
None, )?;
true,
)
.unwrap();
assert_eq!(Value::Float64(OrderedF64::from(3.5)), result); assert_eq!(Value::Float64(OrderedF64::from(3.5)), result);
// Test invalid string to float64 with auto cast // Test invalid string to float64 with auto cast
let result = parse_string_to_value( let result = call_parse_string_to_value!(
"col", "col",
"not_a_float64".to_string(), "not_a_float64".to_string(),
&ConcreteDataType::float64_datatype(), ConcreteDataType::float64_datatype()
None,
true,
); );
assert!(result.is_err()); assert!(result.is_err());
Ok(())
} }
#[test] macro_rules! call_sql_value_to_value {
fn test_sql_value_to_value() { ($column_name: expr, $data_type: expr, $sql_value: expr) => {
let sql_val = SqlValue::Null; call_sql_value_to_value!($column_name, $data_type, $sql_value, None, None, false)
assert_eq!( };
Value::Null, ($column_name: expr, $data_type: expr, $sql_value: expr, timezone = $timezone: expr) => {
sql_value_to_value( call_sql_value_to_value!(
"a", $column_name,
&ConcreteDataType::float64_datatype(), $data_type,
&sql_val, $sql_value,
None, Some($timezone),
None, None,
false false
) )
.unwrap() };
($column_name: expr, $data_type: expr, $sql_value: expr, unary_op = $unary_op: expr) => {
call_sql_value_to_value!(
$column_name,
$data_type,
$sql_value,
None,
Some($unary_op),
false
)
};
($column_name: expr, $data_type: expr, $sql_value: expr, auto_string_to_numeric) => {
call_sql_value_to_value!($column_name, $data_type, $sql_value, None, None, true)
};
($column_name: expr, $data_type: expr, $sql_value: expr, $timezone: expr, $unary_op: expr, $auto_string_to_numeric: expr) => {{
let column_schema = ColumnSchema::new($column_name, $data_type, true);
sql_value_to_value(
&column_schema,
$sql_value,
$timezone,
$unary_op,
$auto_string_to_numeric,
)
}};
}
#[test]
fn test_sql_value_to_value() -> Result<()> {
let sql_val = SqlValue::Null;
assert_eq!(
Value::Null,
call_sql_value_to_value!("a", ConcreteDataType::float64_datatype(), &sql_val)?
); );
let sql_val = SqlValue::Boolean(true); let sql_val = SqlValue::Boolean(true);
assert_eq!( assert_eq!(
Value::Boolean(true), Value::Boolean(true),
sql_value_to_value( call_sql_value_to_value!("a", ConcreteDataType::boolean_datatype(), &sql_val)?
"a",
&ConcreteDataType::boolean_datatype(),
&sql_val,
None,
None,
false
)
.unwrap()
); );
let sql_val = SqlValue::Number("3.0".to_string(), false); let sql_val = SqlValue::Number("3.0".to_string(), false);
assert_eq!( assert_eq!(
Value::Float64(OrderedFloat(3.0)), Value::Float64(OrderedFloat(3.0)),
sql_value_to_value( call_sql_value_to_value!("a", ConcreteDataType::float64_datatype(), &sql_val)?
"a",
&ConcreteDataType::float64_datatype(),
&sql_val,
None,
None,
false
)
.unwrap()
); );
let sql_val = SqlValue::Number("3.0".to_string(), false); let sql_val = SqlValue::Number("3.0".to_string(), false);
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::boolean_datatype(), &sql_val);
"a",
&ConcreteDataType::boolean_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
assert!(format!("{v:?}").contains("Failed to parse number '3.0' to boolean column type")); assert!(format!("{v:?}").contains("Failed to parse number '3.0' to boolean column type"));
let sql_val = SqlValue::Boolean(true); let sql_val = SqlValue::Boolean(true);
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::float64_datatype(), &sql_val);
"a",
&ConcreteDataType::float64_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
assert!( assert!(
format!("{v:?}").contains( format!("{v:?}").contains(
@@ -725,41 +681,18 @@ mod test {
); );
let sql_val = SqlValue::HexStringLiteral("48656c6c6f20776f726c6421".to_string()); let sql_val = SqlValue::HexStringLiteral("48656c6c6f20776f726c6421".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::binary_datatype(), &sql_val)?;
"a",
&ConcreteDataType::binary_datatype(),
&sql_val,
None,
None,
false,
)
.unwrap();
assert_eq!(Value::Binary(Bytes::from(b"Hello world!".as_slice())), v); assert_eq!(Value::Binary(Bytes::from(b"Hello world!".as_slice())), v);
let sql_val = SqlValue::DoubleQuotedString("MorningMyFriends".to_string()); let sql_val = SqlValue::DoubleQuotedString("MorningMyFriends".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::binary_datatype(), &sql_val)?;
"a",
&ConcreteDataType::binary_datatype(),
&sql_val,
None,
None,
false,
)
.unwrap();
assert_eq!( assert_eq!(
Value::Binary(Bytes::from(b"MorningMyFriends".as_slice())), Value::Binary(Bytes::from(b"MorningMyFriends".as_slice())),
v v
); );
let sql_val = SqlValue::HexStringLiteral("9AF".to_string()); let sql_val = SqlValue::HexStringLiteral("9AF".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::binary_datatype(), &sql_val);
"a",
&ConcreteDataType::binary_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
assert!( assert!(
format!("{v:?}").contains("odd number of digits"), format!("{v:?}").contains("odd number of digits"),
@@ -767,38 +700,16 @@ mod test {
); );
let sql_val = SqlValue::HexStringLiteral("AG".to_string()); let sql_val = SqlValue::HexStringLiteral("AG".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::binary_datatype(), &sql_val);
"a",
&ConcreteDataType::binary_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
assert!(format!("{v:?}").contains("invalid character"), "v is {v:?}",); assert!(format!("{v:?}").contains("invalid character"), "v is {v:?}",);
let sql_val = SqlValue::DoubleQuotedString("MorningMyFriends".to_string()); let sql_val = SqlValue::DoubleQuotedString("MorningMyFriends".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::json_datatype(), &sql_val);
"a",
&ConcreteDataType::json_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
let sql_val = SqlValue::DoubleQuotedString(r#"{"a":"b"}"#.to_string()); let sql_val = SqlValue::DoubleQuotedString(r#"{"a":"b"}"#.to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::json_datatype(), &sql_val)?;
"a",
&ConcreteDataType::json_datatype(),
&sql_val,
None,
None,
false,
)
.unwrap();
assert_eq!( assert_eq!(
Value::Binary(Bytes::from( Value::Binary(Bytes::from(
jsonb::parse_value(r#"{"a":"b"}"#.as_bytes()) jsonb::parse_value(r#"{"a":"b"}"#.as_bytes())
@@ -808,16 +719,15 @@ mod test {
)), )),
v v
); );
Ok(())
} }
#[test] #[test]
fn test_parse_json_to_jsonb() { fn test_parse_json_to_jsonb() {
match parse_string_to_value( match call_parse_string_to_value!(
"json_col", "json_col",
r#"{"a": "b"}"#.to_string(), r#"{"a": "b"}"#.to_string(),
&ConcreteDataType::json_datatype(), ConcreteDataType::json_datatype()
None,
false,
) { ) {
Ok(Value::Binary(b)) => { Ok(Value::Binary(b)) => {
assert_eq!( assert_eq!(
@@ -833,12 +743,10 @@ mod test {
} }
assert!( assert!(
parse_string_to_value( call_parse_string_to_value!(
"json_col", "json_col",
r#"Nicola Kovac is the best rifler in the world"#.to_string(), r#"Nicola Kovac is the best rifler in the world"#.to_string(),
&ConcreteDataType::json_datatype(), ConcreteDataType::json_datatype()
None,
false,
) )
.is_err() .is_err()
) )
@@ -878,13 +786,10 @@ mod test {
#[test] #[test]
fn test_parse_date_literal() { fn test_parse_date_literal() {
let value = sql_value_to_value( let value = call_sql_value_to_value!(
"date", "date",
&ConcreteDataType::date_datatype(), ConcreteDataType::date_datatype(),
&SqlValue::DoubleQuotedString("2022-02-22".to_string()), &SqlValue::DoubleQuotedString("2022-02-22".to_string())
None,
None,
false,
) )
.unwrap(); .unwrap();
assert_eq!(ConcreteDataType::date_datatype(), value.data_type()); assert_eq!(ConcreteDataType::date_datatype(), value.data_type());
@@ -895,13 +800,11 @@ mod test {
} }
// with timezone // with timezone
let value = sql_value_to_value( let value = call_sql_value_to_value!(
"date", "date",
&ConcreteDataType::date_datatype(), ConcreteDataType::date_datatype(),
&SqlValue::DoubleQuotedString("2022-02-22".to_string()), &SqlValue::DoubleQuotedString("2022-02-22".to_string()),
Some(&Timezone::from_tz_string("+07:00").unwrap()), timezone = &Timezone::from_tz_string("+07:00").unwrap()
None,
false,
) )
.unwrap(); .unwrap();
assert_eq!(ConcreteDataType::date_datatype(), value.data_type()); assert_eq!(ConcreteDataType::date_datatype(), value.data_type());
@@ -913,16 +816,12 @@ mod test {
} }
#[test] #[test]
fn test_parse_timestamp_literal() { fn test_parse_timestamp_literal() -> Result<()> {
match parse_string_to_value( match call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01+08:00".to_string(), "2022-02-22T00:01:01+08:00".to_string(),
&ConcreteDataType::timestamp_millisecond_datatype(), ConcreteDataType::timestamp_millisecond_datatype()
None, )? {
false,
)
.unwrap()
{
Value::Timestamp(ts) => { Value::Timestamp(ts) => {
assert_eq!(1645459261000, ts.value()); assert_eq!(1645459261000, ts.value());
assert_eq!(TimeUnit::Millisecond, ts.unit()); assert_eq!(TimeUnit::Millisecond, ts.unit());
@@ -932,15 +831,11 @@ mod test {
} }
} }
match parse_string_to_value( match call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01+08:00".to_string(), "2022-02-22T00:01:01+08:00".to_string(),
&ConcreteDataType::timestamp_datatype(TimeUnit::Second), ConcreteDataType::timestamp_datatype(TimeUnit::Second)
None, )? {
false,
)
.unwrap()
{
Value::Timestamp(ts) => { Value::Timestamp(ts) => {
assert_eq!(1645459261, ts.value()); assert_eq!(1645459261, ts.value());
assert_eq!(TimeUnit::Second, ts.unit()); assert_eq!(TimeUnit::Second, ts.unit());
@@ -950,15 +845,11 @@ mod test {
} }
} }
match parse_string_to_value( match call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01+08:00".to_string(), "2022-02-22T00:01:01+08:00".to_string(),
&ConcreteDataType::timestamp_datatype(TimeUnit::Microsecond), ConcreteDataType::timestamp_datatype(TimeUnit::Microsecond)
None, )? {
false,
)
.unwrap()
{
Value::Timestamp(ts) => { Value::Timestamp(ts) => {
assert_eq!(1645459261000000, ts.value()); assert_eq!(1645459261000000, ts.value());
assert_eq!(TimeUnit::Microsecond, ts.unit()); assert_eq!(TimeUnit::Microsecond, ts.unit());
@@ -968,15 +859,11 @@ mod test {
} }
} }
match parse_string_to_value( match call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01+08:00".to_string(), "2022-02-22T00:01:01+08:00".to_string(),
&ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond), ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond)
None, )? {
false,
)
.unwrap()
{
Value::Timestamp(ts) => { Value::Timestamp(ts) => {
assert_eq!(1645459261000000000, ts.value()); assert_eq!(1645459261000000000, ts.value());
assert_eq!(TimeUnit::Nanosecond, ts.unit()); assert_eq!(TimeUnit::Nanosecond, ts.unit());
@@ -987,26 +874,21 @@ mod test {
} }
assert!( assert!(
parse_string_to_value( call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01+08".to_string(), "2022-02-22T00:01:01+08".to_string(),
&ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond), ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond)
None,
false,
) )
.is_err() .is_err()
); );
// with timezone // with timezone
match parse_string_to_value( match call_parse_string_to_value!(
"timestamp_col", "timestamp_col",
"2022-02-22T00:01:01".to_string(), "2022-02-22T00:01:01".to_string(),
&ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond), ConcreteDataType::timestamp_datatype(TimeUnit::Nanosecond),
Some(&Timezone::from_tz_string("Asia/Shanghai").unwrap()), timezone = &Timezone::from_tz_string("Asia/Shanghai").unwrap()
false, )? {
)
.unwrap()
{
Value::Timestamp(ts) => { Value::Timestamp(ts) => {
assert_eq!(1645459261000000000, ts.value()); assert_eq!(1645459261000000000, ts.value());
assert_eq!("2022-02-21 16:01:01+0000", ts.to_iso8601_string()); assert_eq!("2022-02-21 16:01:01+0000", ts.to_iso8601_string());
@@ -1016,51 +898,42 @@ mod test {
unreachable!() unreachable!()
} }
} }
Ok(())
} }
#[test] #[test]
fn test_parse_placeholder_value() { fn test_parse_placeholder_value() {
assert!( assert!(
sql_value_to_value( call_sql_value_to_value!(
"test", "test",
&ConcreteDataType::string_datatype(), ConcreteDataType::string_datatype(),
&SqlValue::Placeholder("default".into())
)
.is_err()
);
assert!(
call_sql_value_to_value!(
"test",
ConcreteDataType::string_datatype(),
&SqlValue::Placeholder("default".into()), &SqlValue::Placeholder("default".into()),
None, unary_op = UnaryOperator::Minus
None,
false
) )
.is_err() .is_err()
); );
assert!( assert!(
sql_value_to_value( call_sql_value_to_value!(
"test", "test",
&ConcreteDataType::string_datatype(), ConcreteDataType::uint16_datatype(),
&SqlValue::Placeholder("default".into()),
None,
Some(UnaryOperator::Minus),
false
)
.is_err()
);
assert!(
sql_value_to_value(
"test",
&ConcreteDataType::uint16_datatype(),
&SqlValue::Number("3".into(), false), &SqlValue::Number("3".into(), false),
None, unary_op = UnaryOperator::Minus
Some(UnaryOperator::Minus),
false
) )
.is_err() .is_err()
); );
assert!( assert!(
sql_value_to_value( call_sql_value_to_value!(
"test", "test",
&ConcreteDataType::uint16_datatype(), ConcreteDataType::uint16_datatype(),
&SqlValue::Number("3".into(), false), &SqlValue::Number("3".into(), false)
None,
None,
false
) )
.is_ok() .is_ok()
); );
@@ -1070,77 +943,60 @@ mod test {
fn test_auto_string_to_numeric() { fn test_auto_string_to_numeric() {
// Test with auto_string_to_numeric=true // Test with auto_string_to_numeric=true
let sql_val = SqlValue::SingleQuotedString("123".to_string()); let sql_val = SqlValue::SingleQuotedString("123".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!(
"a", "a",
&ConcreteDataType::int32_datatype(), ConcreteDataType::int32_datatype(),
&sql_val, &sql_val,
None, auto_string_to_numeric
None,
true,
) )
.unwrap(); .unwrap();
assert_eq!(Value::Int32(123), v); assert_eq!(Value::Int32(123), v);
// Test with a float string // Test with a float string
let sql_val = SqlValue::SingleQuotedString("3.5".to_string()); let sql_val = SqlValue::SingleQuotedString("3.5".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!(
"a", "a",
&ConcreteDataType::float64_datatype(), ConcreteDataType::float64_datatype(),
&sql_val, &sql_val,
None, auto_string_to_numeric
None,
true,
) )
.unwrap(); .unwrap();
assert_eq!(Value::Float64(OrderedFloat(3.5)), v); assert_eq!(Value::Float64(OrderedFloat(3.5)), v);
// Test with auto_string_to_numeric=false // Test with auto_string_to_numeric=false
let sql_val = SqlValue::SingleQuotedString("123".to_string()); let sql_val = SqlValue::SingleQuotedString("123".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!("a", ConcreteDataType::int32_datatype(), &sql_val);
"a",
&ConcreteDataType::int32_datatype(),
&sql_val,
None,
None,
false,
);
assert!(v.is_err()); assert!(v.is_err());
// Test with an invalid numeric string but auto_string_to_numeric=true // Test with an invalid numeric string but auto_string_to_numeric=true
// Should return an error now with the new auto_cast_to_numeric behavior // Should return an error now with the new auto_cast_to_numeric behavior
let sql_val = SqlValue::SingleQuotedString("not_a_number".to_string()); let sql_val = SqlValue::SingleQuotedString("not_a_number".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!(
"a", "a",
&ConcreteDataType::int32_datatype(), ConcreteDataType::int32_datatype(),
&sql_val, &sql_val,
None, auto_string_to_numeric
None,
true,
); );
assert!(v.is_err()); assert!(v.is_err());
// Test with boolean type // Test with boolean type
let sql_val = SqlValue::SingleQuotedString("true".to_string()); let sql_val = SqlValue::SingleQuotedString("true".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!(
"a", "a",
&ConcreteDataType::boolean_datatype(), ConcreteDataType::boolean_datatype(),
&sql_val, &sql_val,
None, auto_string_to_numeric
None,
true,
) )
.unwrap(); .unwrap();
assert_eq!(Value::Boolean(true), v); assert_eq!(Value::Boolean(true), v);
// Non-numeric types should still be handled normally // Non-numeric types should still be handled normally
let sql_val = SqlValue::SingleQuotedString("hello".to_string()); let sql_val = SqlValue::SingleQuotedString("hello".to_string());
let v = sql_value_to_value( let v = call_sql_value_to_value!(
"a", "a",
&ConcreteDataType::string_datatype(), ConcreteDataType::string_datatype(),
&sql_val, &sql_val,
None, auto_string_to_numeric
None,
true,
); );
assert!(v.is_ok()); assert!(v.is_ok());
} }
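
The hunks above replace the long positional sql_value_to_value / parse_string_to_value calls with call_sql_value_to_value! / call_parse_string_to_value! wrapper macros whose definitions are not part of this diff. Judging only from the call sites, the macros appear to default the optional arguments (timezone = ..., unary_op = ..., auto_string_to_numeric) when they are omitted, and, given the ColumnSchema change in the next file, presumably also build the schema that the refactored sql_value_to_value now takes. The sketch below is illustrative only: it uses hypothetical stand-in types and a simplified convert function, not the project's actual macros.

// Illustrative stand-in for the wrapper-macro pattern; `convert` mimics the
// six-argument shape of the original free function, and the macro fills in the
// optional arguments when the caller omits them.
fn convert(
    column: &str,
    value: &str,
    timezone: Option<&str>,
    negate: bool, // stands in for Option<UnaryOperator>
    auto_string_to_numeric: bool,
) -> String {
    format!("{column}={value} tz={timezone:?} neg={negate} auto={auto_string_to_numeric}")
}

macro_rules! call_convert {
    // plain call: every optional argument defaulted
    ($col:expr, $val:expr) => {
        convert($col, $val, None, false, false)
    };
    // named optional timezone
    ($col:expr, $val:expr, timezone = $tz:expr) => {
        convert($col, $val, Some($tz), false, false)
    };
    // named optional unary operator, reduced here to a negation flag
    ($col:expr, $val:expr, unary_op = $neg:expr) => {
        convert($col, $val, None, $neg, false)
    };
    // opt-in string-to-numeric coercion
    ($col:expr, $val:expr, auto_string_to_numeric) => {
        convert($col, $val, None, false, true)
    };
}

fn main() {
    println!("{}", call_convert!("a", "123"));
    println!("{}", call_convert!("date", "2022-02-22", timezone = "+07:00"));
    println!("{}", call_convert!("n", "3", unary_op = true));
    println!("{}", call_convert!("a", "123", auto_string_to_numeric));
}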


@@ -14,8 +14,8 @@
use common_time::timezone::Timezone;
use datatypes::prelude::ConcreteDataType;
-use datatypes::schema::ColumnDefaultConstraint;
use datatypes::schema::constraint::{CURRENT_TIMESTAMP, CURRENT_TIMESTAMP_FN};
+use datatypes::schema::{ColumnDefaultConstraint, ColumnSchema};
use snafu::ensure;
use sqlparser::ast::ValueWithSpan;
pub use sqlparser::ast::{
@@ -47,9 +47,12 @@ pub fn parse_column_default_constraint(
);
let default_constraint = match &opt.option {
-ColumnOption::Default(Expr::Value(v)) => ColumnDefaultConstraint::Value(
-sql_value_to_value(column_name, data_type, &v.value, timezone, None, false)?,
-),
+ColumnOption::Default(Expr::Value(v)) => {
+let schema = ColumnSchema::new(column_name, data_type.clone(), true);
+ColumnDefaultConstraint::Value(sql_value_to_value(
+&schema, &v.value, timezone, None, false,
+)?)
+}
ColumnOption::Default(Expr::Function(func)) => {
let mut func = format!("{func}").to_lowercase();
// normalize CURRENT_TIMESTAMP to CURRENT_TIMESTAMP()
@@ -80,8 +83,7 @@ pub fn parse_column_default_constraint(
if let Expr::Value(v) = &**expr {
let value = sql_value_to_value(
-column_name,
-data_type,
+&ColumnSchema::new(column_name, data_type.clone(), true),
&v.value,
timezone,
Some(*op),
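
The hunks above change sql_value_to_value to accept a &ColumnSchema instead of a separate column name and data type, so call sites now build a throw-away nullable schema. A minimal, self-contained sketch of that call-site migration, using stub types in place of the real datatypes crate (only the shape of the call is meant to match):

// Stub types standing in for datatypes::schema::ColumnSchema and
// datatypes::prelude::ConcreteDataType.
#[derive(Clone, Debug)]
struct ConcreteDataType(&'static str);

struct ColumnSchema {
    name: String,
    data_type: ConcreteDataType,
    nullable: bool,
}

impl ColumnSchema {
    fn new(name: &str, data_type: ConcreteDataType, nullable: bool) -> Self {
        Self { name: name.to_string(), data_type, nullable }
    }
}

// New-style signature: the column identity travels inside the schema.
fn sql_value_to_value(schema: &ColumnSchema, value: &str) -> String {
    format!(
        "{} ({:?}, nullable={}) <- {value}",
        schema.name, schema.data_type, schema.nullable
    )
}

fn main() {
    let data_type = ConcreteDataType("date");
    // Before: sql_value_to_value(column_name, &data_type, value, ...)
    // After: construct a nullable ColumnSchema at the call site and pass it by reference.
    let schema = ColumnSchema::new("birthday", data_type.clone(), true);
    println!("{}", sql_value_to_value(&schema, "2022-02-22"));
}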


@@ -58,10 +58,14 @@ pub fn get_total_memory_bytes() -> i64 {
}
}
-/// Get the total CPU cores. The result will be rounded to the nearest integer.
-/// For example, if the total CPU is 1.5 cores(1500 millicores), the result will be 2.
+/// Get the total CPU cores. The result will be rounded up to the next integer (ceiling).
+/// For example, if the total CPU is 1.1 cores (1100 millicores) or 1.5 cores (1500 millicores), the result will be 2.
pub fn get_total_cpu_cores() -> usize {
-((get_total_cpu_millicores() as f64) / 1000.0).round() as usize
+cpu_cores(get_total_cpu_millicores())
+}
+fn cpu_cores(cpu_millicores: i64) -> usize {
+((cpu_millicores as f64) / 1_000.0).ceil() as usize
}
/// Get the total memory in readable size.
@@ -178,6 +182,13 @@ mod tests {
#[test]
fn test_get_total_cpu_cores() {
assert!(get_total_cpu_cores() > 0);
+assert_eq!(cpu_cores(1), 1);
+assert_eq!(cpu_cores(100), 1);
+assert_eq!(cpu_cores(500), 1);
+assert_eq!(cpu_cores(1000), 1);
+assert_eq!(cpu_cores(1100), 2);
+assert_eq!(cpu_cores(1900), 2);
+assert_eq!(cpu_cores(10_000), 10);
}
#[test]
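
The updated doc comment above pins down the new behaviour: fractional CPU allocations are rounded up rather than to the nearest integer, so a 1.1-core (1100-millicore) limit now reports 2 cores instead of 1. A self-contained comparison of the old round-based computation and the new ceiling-based one (helper names here are illustrative; the diff's real helper is cpu_cores):

fn cpu_cores_round(cpu_millicores: i64) -> usize {
    // old behaviour: round to the nearest whole core
    ((cpu_millicores as f64) / 1_000.0).round() as usize
}

fn cpu_cores_ceil(cpu_millicores: i64) -> usize {
    // new behaviour: always round up
    ((cpu_millicores as f64) / 1_000.0).ceil() as usize
}

fn main() {
    // 1100 millicores: rounding reported 1 core, the ceiling reports 2.
    assert_eq!(cpu_cores_round(1_100), 1);
    assert_eq!(cpu_cores_ceil(1_100), 2);
    // 1500 millicores: both report 2, which is why the old doc example still held.
    assert_eq!(cpu_cores_round(1_500), 2);
    assert_eq!(cpu_cores_ceil(1_500), 2);
    println!("ok");
}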


@@ -71,6 +71,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
}),
MetricType::GAUGE => timeseries.push(TimeSeries {
labels: convert_label(m.get_label(), mf_name, None),
@@ -79,6 +80,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
}),
MetricType::HISTOGRAM => {
let h = m.get_histogram();
@@ -97,6 +99,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
if upper_bound.is_sign_positive() && upper_bound.is_infinite() {
inf_seen = true;
@@ -114,6 +117,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
}
timeseries.push(TimeSeries {
@@ -127,6 +131,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
timeseries.push(TimeSeries {
labels: convert_label(
@@ -139,6 +144,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
}
MetricType::SUMMARY => {
@@ -155,6 +161,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
}
timeseries.push(TimeSeries {
@@ -168,6 +175,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
timeseries.push(TimeSeries {
labels: convert_label(
@@ -180,6 +188,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
+histograms: vec![],
});
}
MetricType::UNTYPED => {
@@ -274,7 +283,7 @@ mod test {
assert_eq!(
format!("{:?}", write_quest.timeseries),
-r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }]"#
+r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
let gauge_opts = Opts::new("test_gauge", "test help")
@@ -288,7 +297,7 @@ mod test {
let write_quest = convert_metric_to_write_request(mf, None, 0);
assert_eq!(
format!("{:?}", write_quest.timeseries),
-r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_gauge" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 42.0, timestamp: 0 }], exemplars: [] }]"#
+r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_gauge" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 42.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
}
@@ -305,20 +314,20 @@ mod test {
.iter()
.map(|x| format!("{:?}", x))
.collect();
-let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.005" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
+let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.005" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.01" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.01" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.025" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.025" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.05" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.05" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.1" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.1" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.25" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.25" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "2.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "2.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "10" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "10" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "+Inf" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "+Inf" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_sum" }, Label { name: "a", value: "1" }], samples: [Sample { value: 0.25, timestamp: 0 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_sum" }, Label { name: "a", value: "1" }], samples: [Sample { value: 0.25, timestamp: 0 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" }, Label { name: "a", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }"#;
+TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" }, Label { name: "a", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }"#;
assert_eq!(write_quest_str.join("\n"), ans);
}
@@ -355,10 +364,10 @@ TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" },
.iter()
.map(|x| format!("{:?}", x))
.collect();
-let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "50" }], samples: [Sample { value: 3.0, timestamp: 20 }], exemplars: [] }
+let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "50" }], samples: [Sample { value: 3.0, timestamp: 20 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "100" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "100" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_summary_sum" }], samples: [Sample { value: 15.0, timestamp: 20 }], exemplars: [] }
+TimeSeries { labels: [Label { name: "__name__", value: "test_summary_sum" }], samples: [Sample { value: 15.0, timestamp: 20 }], exemplars: [], histograms: [] }
-TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [] }"#;
+TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [], histograms: [] }"#;
assert_eq!(write_quest_str.join("\n"), ans);
}
@@ -385,11 +394,11 @@ TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }],
let write_quest2 = convert_metric_to_write_request(mf, Some(&filter), 0);
assert_eq!(
format!("{:?}", write_quest1.timeseries),
-r#"[TimeSeries { labels: [Label { name: "__name__", value: "filter_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }, TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [] }]"#
+r#"[TimeSeries { labels: [Label { name: "__name__", value: "filter_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }, TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
assert_eq!(
format!("{:?}", write_quest2.timeseries),
-r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [] }]"#
+r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
}
}
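
Every TimeSeries literal in the hunks above gains a histograms: vec![] field, consistent with the remote-write protobuf type being regenerated with a histograms field. As an aside on the mechanics: when a generated message struct implements Default (as prost-generated types typically do), the struct-update syntax avoids editing every literal when a new field appears. This is a generic sketch with a stand-in struct, not the project's actual type:

// Stand-in for a generated message struct that keeps growing new fields.
#[derive(Debug, Default)]
struct TimeSeries {
    labels: Vec<String>,
    samples: Vec<f64>,
    exemplars: Vec<String>,
    histograms: Vec<String>, // newly added field
}

fn main() {
    // Explicit form: every field must be listed, so adding `histograms`
    // forces an edit at each construction site (as in the diff above).
    let explicit = TimeSeries {
        labels: vec!["__name__=test_counter".to_string()],
        samples: vec![1.0],
        exemplars: vec![],
        histograms: vec![],
    };
    // Struct-update form: unspecified fields fall back to their Default values,
    // so adding a field does not require touching this call site.
    let updated = TimeSeries {
        labels: vec!["__name__=test_counter".to_string()],
        samples: vec![1.0],
        ..Default::default()
    };
    println!("{explicit:?}");
    println!("{updated:?}");
}

The explicit form does have the advantage that a new field is a compile error at every construction site, which is exactly the breakage this diff is resolving.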


@@ -206,6 +206,8 @@ mod tests {
client_cert_path: None,
client_key_path: None,
}),
+connect_timeout: Duration::from_secs(3),
+timeout: Duration::from_secs(3),
},
kafka_topic: KafkaTopicConfig {
num_topics: 32,
@@ -239,6 +241,8 @@ mod tests {
client_cert_path: None,
client_key_path: None,
}),
+connect_timeout: Duration::from_secs(3),
+timeout: Duration::from_secs(3),
},
max_batch_bytes: ReadableSize::mb(1),
consumer_wait_timeout: Duration::from_millis(100),

Some files were not shown because too many files have changed in this diff.