Compare commits

...

46 Commits

Author SHA1 Message Date
discord9
1afcddd5a9 chore: feature gate vector_index (#7428)
Signed-off-by: discord9 <discord9@163.com>
2025-12-17 07:14:25 +00:00
shuiyisong
62808b887b fix: use anonymous s3 access when ak and sk are not provided (#7425)
* chore: allow s3 anon

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: disable ec2 metadata

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-17 06:34:29 +00:00
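
The fallback in #7425 can be pictured with opendal, the object-store layer GreptimeDB builds on. A minimal sketch, assuming opendal's S3 builder API; the bucket and region are placeholder values, not the project's actual wiring:

```rust
use opendal::services::S3;
use opendal::Operator;

/// Builds an S3 operator, falling back to anonymous access when no
/// access key / secret key pair is configured.
fn build_s3_operator(ak: Option<&str>, sk: Option<&str>) -> opendal::Result<Operator> {
    let mut builder = S3::default()
        .bucket("my-bucket") // placeholder bucket
        .region("us-east-1"); // placeholder region
    builder = match (ak, sk) {
        (Some(ak), Some(sk)) => builder.access_key_id(ak).secret_access_key(sk),
        // No credentials: request objects anonymously, and skip the EC2
        // metadata endpoint so credential probing cannot stall requests.
        _ => builder.allow_anonymous().disable_ec2_metadata(),
    };
    Ok(Operator::new(builder)?.finish())
}
```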
discord9
04ddd40e00 chore: bump version to beta.3 (#7423)
chore: bump to beta.3

Signed-off-by: discord9 <discord9@163.com>
2025-12-17 04:18:23 +00:00
liyang
b4f028be5f chore: change etcd endpoints to array in the test scripts (#7419)
chore: change etcd endpoint

Signed-off-by: liyang <daviderli614@gmail.com>
2025-12-17 03:14:35 +00:00
Lei, HUANG
da964880f5 chore: expose symbols (#7417)
* refactor/expose-symbols:
 ## Refactor `bulk/part.rs` to Simplify Mutation Handling

 - Removed the `mutations_to_record_batch` function and its associated helper functions, including `ArraysSorter`, `timestamp_array_to_iter`, and `binary_array_to_dictionary`, to simplify the mutation handling logic in `bulk/part.rs`.
 - Deleted related test functions `check_binary_array_to_dictionary` and `check_mutations_to_record_batches` from the test module, along with their associated test cases.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/expose-symbols:
 ### Commit Message

 **Refactor and Enhance Deduplication Logic**

 - **`flush.rs`**: Refactored `maybe_dedup_one` function to accept `append_mode` and `merge_mode` as parameters instead of `RegionOptions`. This change enhances flexibility in deduplication logic.
 - **`memtable/bulk.rs`**: Made `BulkRangeIterBuilder` struct and its fields public to allow external access and modification, improving extensibility.
 - **`sst.rs`**: Corrected a typo in the schema documentation, changing `__prmary_key` to `__primary_key` for clarity and accuracy.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-17 01:29:36 +00:00
dennis zhuang
a35a39f726 feat(vector_index): adds the foundational types and SQL parsing support for vector index (#7366)
* feat: adds the foundational types and SQL parsing support for vector index

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: by suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: ensure index option values must be greater than zero

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: validate connectivity strictly

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: compile error

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: disable SIMD for ci

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-16 22:45:36 +00:00
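
The option validation called out above ("index option values must be greater than zero", strict connectivity checks) reduces to a simple guard. A hedged sketch with hypothetical field names, not the actual types introduced by #7366:

```rust
#[derive(Debug)]
struct VectorIndexOptions {
    /// HNSW graph connectivity (neighbors per node). Hypothetical field.
    connectivity: u32,
    /// Candidate list size used while building the index. Hypothetical field.
    expansion_add: u32,
    /// Candidate list size used while searching. Hypothetical field.
    expansion_search: u32,
}

impl VectorIndexOptions {
    fn validate(&self) -> Result<(), String> {
        for (name, value) in [
            ("connectivity", self.connectivity),
            ("expansion_add", self.expansion_add),
            ("expansion_search", self.expansion_search),
        ] {
            if value == 0 {
                return Err(format!(
                    "vector index option `{name}` must be greater than zero"
                ));
            }
        }
        Ok(())
    }
}
```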
Lei, HUANG
e0c1566e92 fix(servers): flight stuck on waiting for first message (#7413)
* fix/flight-stuck-on-first-message:
 **Refactor GRPC Stream Handling and Table Resolution**

 - **`grpc.rs`**: Refactored the `GrpcQueryHandler` to resolve table references and check permissions only once per stream, improving efficiency. Introduced a mechanism to handle table resolution and permission checks after receiving the first `RecordBatch`.
 - **`flight.rs`**: Enhanced `PutRecordBatchRequestStream` to manage stream states (`Init` and `Ready`) for better handling of schema and table name extraction. Improved error handling and logging for unexpected flight messages.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: add some doc

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-16 08:54:13 +00:00
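
The `Init`/`Ready` states from the commit message suggest a small state machine: resolve the table and check permissions once on the first message, then ingest the rest of the stream without repeating that work. A sketch under that reading; `FlightMessage` here is a stand-in, not the real arrow-flight type:

```rust
enum StreamState {
    /// Waiting for the first message, which carries the schema and table name.
    Init,
    /// Table resolved and permissions checked once; later batches just ingest.
    Ready { table_name: String },
}

enum FlightMessage {
    Schema { table_name: String }, // stand-in for the real schema message
    RecordBatch(Vec<u8>),          // payload stands in for an Arrow batch
}

fn on_message(state: &mut StreamState, msg: FlightMessage) -> Result<(), String> {
    match msg {
        FlightMessage::Schema { table_name } => match state {
            StreamState::Init => {
                // Resolve the table and check permissions exactly once.
                *state = StreamState::Ready { table_name };
                Ok(())
            }
            StreamState::Ready { .. } => Err("duplicate schema message".into()),
        },
        FlightMessage::RecordBatch(_batch) => match state {
            StreamState::Init => Err("unexpected flight message before schema".into()),
            StreamState::Ready { .. } => Ok(()), // ingest; resolution already done
        },
    }
}
```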
Yingwen
f6afb10e33 feat!: download file to fill the cache on write cache miss (#7294)
* feat: download inverted index file

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: download for bloom and fulltext

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement maybe_download_background for FileCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: load file for parquet

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reduce channel size

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use ManifestCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: pass cache to ManifestObjectStore::new

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix fmt and clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove manifest cache ttl

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove read cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: clean old read cache path

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config examples

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix CI

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: also clean the root directory

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update manifest test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix compiler errors

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: skip file if it exists

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove warn in replace

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add a flag to enable/disable background download

set the concurrency to 1 for background download

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: rename write_cache_enable_background_download to enable_refill_cache_on_read

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update config test

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update config.md

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-16 08:31:26 +00:00
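
A rough sketch of the refill-on-miss flow this PR describes: serve the miss from remote storage directly, and when `enable_refill_cache_on_read` is set, queue a background download over a small bounded channel (the PR sets background download concurrency to 1). Names are illustrative; the real `FileCache` in mito2 differs:

```rust
use std::path::{Path, PathBuf};
use tokio::sync::mpsc;

struct FileCache {
    // Small channel: a single background worker drains it, so the
    // effective download concurrency stays at 1.
    download_tx: mpsc::Sender<PathBuf>,
    enable_refill_cache_on_read: bool,
}

impl FileCache {
    async fn read(&self, file: PathBuf) -> Vec<u8> {
        if let Some(bytes) = self.try_local(&file) {
            return bytes; // cache hit
        }
        // Miss: serve this read from remote storage...
        let bytes = self.read_remote(&file).await;
        // ...and optionally schedule a download to fill the cache. A full
        // channel simply drops the request instead of blocking the read path.
        if self.enable_refill_cache_on_read {
            let _ = self.download_tx.try_send(file);
        }
        bytes
    }

    fn try_local(&self, _file: &Path) -> Option<Vec<u8>> {
        None // stub: look up the local file cache
    }

    async fn read_remote(&self, _file: &Path) -> Vec<u8> {
        Vec::new() // stub: fetch from object storage
    }
}
```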
dennis zhuang
2dfcf35fee feat: support function aliases and add MySQL-compatible aliases (#7410)
* feat: support function aliases and add MySQL-compatible aliases

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: get_table_function_source

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: add function_alias mod

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: license

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-16 06:56:23 +00:00
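
Conceptually, an alias table resolves a name to its canonical function before registry lookup. A minimal sketch; the example MySQL-style pairs are illustrative and may not match the set actually added in #7410:

```rust
use std::collections::HashMap;

/// alias -> canonical name; entries are illustrative examples.
fn mysql_aliases() -> HashMap<&'static str, &'static str> {
    HashMap::from([("ucase", "upper"), ("lcase", "lower"), ("mid", "substr")])
}

/// Resolves a (case-insensitive) function name to its canonical form,
/// or returns it unchanged when no alias is registered.
fn resolve<'a>(aliases: &HashMap<&'static str, &'static str>, name: &'a str) -> &'a str {
    let lower = name.to_ascii_lowercase();
    match aliases.get(lower.as_str()) {
        Some(&canonical) => canonical,
        None => name,
    }
}
// e.g. resolve(&mysql_aliases(), "UCASE") yields "upper".
```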
Weny Xu
f7d5c87ac0 feat: introduce copy_region_from for mito engine (#7389)
* feat: introduce `copy_region_from`

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix clippy

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: apply suggestions from CR

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-16 06:12:06 +00:00
Weny Xu
9cd57e9342 fix: use verified recycling method for PostgreSQL connection pool (#7407)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-16 02:49:01 +00:00
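
"Verified" recycling is deadpool's term for pinging a pooled connection before handing it out, which is how stale connections left behind by a network failure get discarded instead of reused. A hedged sketch with deadpool-postgres, using placeholder connection settings:

```rust
use deadpool_postgres::{Config, ManagerConfig, RecyclingMethod, Runtime};

fn build_pool() -> deadpool_postgres::Pool {
    let mut cfg = Config::new();
    cfg.host = Some("127.0.0.1".into()); // placeholder endpoint
    cfg.dbname = Some("metasrv".into()); // placeholder database
    cfg.manager = Some(ManagerConfig {
        // Verified: issue a lightweight check on checkout so dead
        // connections are dropped rather than handed to callers.
        recycling_method: RecyclingMethod::Verified,
    });
    cfg.create_pool(Some(Runtime::Tokio1), tokio_postgres::NoTls)
        .expect("failed to create pool")
}
```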
jeremyhi
32f9cc5286 feat: move memory_manager to common crate (#7408)
* feat: move memory_manager to common crate

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add license header

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by AI comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
2025-12-15 13:15:33 +00:00
Yingwen
5232a12a8c feat: per file scan metrics (#7396)
* feat: collect per file metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: divide build_cost to build_part_cost and build_reader_cost

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: limit the file metrics num to display

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: use sorted iter to get sorted files

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: output metrics in desc order

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-15 12:52:03 +00:00
fys
913ac325e5 chore: add is_initialized method for frontend client (#7409)
chore: add `is_initialized` for frontend client
2025-12-15 12:51:09 +00:00
LFC
0c52d5bb34 fix: cpu cores got wrongly calculated to 0 (#7405)
* fix: cpu cores got wrongly calculated to 0

Signed-off-by: luofucong <luofc@foxmail.com>

* Update src/common/stat/src/resource.rs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-15 09:40:49 +00:00
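
The zero-cores bug is the classic truncation trap: a fractional cgroup CPU quota (say 0.5 cores) truncates to 0 when cast to an integer. A minimal sketch of the guard, with illustrative names rather than the actual `common-stat` code:

```rust
/// Converts a cgroup CPU quota to a usable core count without collapsing to 0.
fn cpu_cores(quota_us: u64, period_us: u64) -> usize {
    let cores = quota_us as f64 / period_us as f64;
    // Round up and clamp: a 0.5-core quota should report 1 core, never 0.
    (cores.ceil() as usize).max(1)
}

fn main() {
    assert_eq!(cpu_cores(50_000, 100_000), 1); // 0.5 cores -> 1, not 0
    assert_eq!(cpu_cores(400_000, 100_000), 4);
}
```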
Ruihang Xia
e0697790e6 chore: sort histogram sqlness result (#7406)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-15 08:12:12 +00:00
shuiyisong
64e74916b9 fix: TLS option validate and merge (#7401)
* chore: unify gRPC server tls behaviour

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add validate and merge tls

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove mut in func sig and add back test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-15 02:53:21 +00:00
Ruihang Xia
b601781604 feat: optimize and fix part sort on overlapping time windows (#7387)
* enforce two ends sort

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* primary end scope drain

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* correct fuzzy generator, no zero limit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* early stop check

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* correct test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* simplify implementation by removing some old logic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>

* what

Signed-off-by: discord9 <discord9@163.com>

* maybe

Signed-off-by: discord9 <discord9@163.com>

* fix: reread topk

Signed-off-by: discord9 <discord9@163.com>

* remove: unused topk_buffer_fulfilled method

Fixes clippy dead code warning by removing the unused method.

Signed-off-by: discord9 <discord9@163.com>

* fix: correct test expectations for windowed sort with limit

Updated test expectations in windowed sort tests to match actual algorithm behavior:
- Fixed descending sort test to expect global top 4 values [95, 94, 90, 85] instead of group-local selection
- Fixed ascending sort test to expect global smallest 4 values [5, 6, 7, 8] and adjusted read count accordingly
- Updated comments to reflect correct algorithm behavior for threshold-based boundary detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: discord9 <discord9@163.com>

* skip fuzzy test for now

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: discord9 <discord9@163.com>
Co-authored-by: discord9 <discord9@163.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-12 14:04:32 +00:00
Ruihang Xia
bd3ad60910 fix: promql offset direction (#7392)
* fix: promql offset direction

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sort sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* commit forgotten file

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-12 07:51:35 +00:00
Ruihang Xia
cbfdeca64c fix: promql histogram with aggregation (#7393)
* fix: promql histogram with aggregation

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update test constructors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* redact partition number

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-12 07:32:04 +00:00
jeremyhi
baffed8c6a feat: mem manager on compaction (#7305)
* feat: mem manager on compaction

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: by copilot review comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: experimental_

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: refine estimate_compaction_bytes

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: make them into config example

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by copilot comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/mito2/src/compaction.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* fix: dedup the regions waiting

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: by comment

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: minor change

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add AdditionalMemoryGuard for the running compaction task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: do OnExhaustedPolicy before running task

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* refactor: use OwnedSemaphorePermit to impl guard

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* feat: add early_release_partial method to release a portion of memory

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: 0 bytes make request_additional unlimited

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* fix: fail-fast on acquire

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-12 06:49:58 +00:00
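
The guard described in the later commits maps naturally onto tokio's owned semaphore permits: treat the memory budget as permits, acquire a task's estimate up front, fail fast when the budget is exhausted, and release automatically on drop. A sketch under those assumptions; the real `OnExhaustedPolicy` and `early_release_partial` handling are more involved:

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore, TryAcquireError};

/// One permit == 1 MiB of the compaction memory budget.
struct MemoryGuard {
    _permits: OwnedSemaphorePermit, // dropping the guard returns the memory
}

fn try_reserve(budget: &Arc<Semaphore>, bytes: u64) -> Result<MemoryGuard, TryAcquireError> {
    // Round up to whole MiB and keep the request within permit range.
    let mib = bytes.div_ceil(1 << 20).clamp(1, u32::MAX as u64) as u32;
    // try_acquire fails fast instead of queueing, matching the
    // "fail-fast on acquire" fix in this PR.
    let permits = budget.clone().try_acquire_many_owned(mib)?;
    Ok(MemoryGuard { _permits: permits })
}
```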
discord9
11a5e1618d test: test_tracker_cleanup skip non linux (#7398)
test: skip non linux

Signed-off-by: discord9 <discord9@163.com>
2025-12-12 06:27:57 +00:00
Lanqing Yang
f5e0e94e3a chore(mito): nit: avoid cloning the batch object on inverted index building (#7388)
fix: avoid cloning the batch object on inverted index building

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-12 04:58:37 +00:00
Weny Xu
ba4eda40e5 refactor: optimize heartbeat channel and etcd client keepalive settings (#7390)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-11 13:32:11 +00:00
discord9
f06a64ff90 feat: mark index outdated (#7383)
* feat: mark index outdated

Signed-off-by: discord9 <discord9@163.com>

* refactor: move IndexVersion to store-api

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* fix: condition for add files

Signed-off-by: discord9 <discord9@163.com>

* cleanup

Signed-off-by: discord9 <discord9@163.com>

* refactor(sst): extract index version check into method

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-11 12:08:45 +00:00
fys
84b4777925 fix: parse "KEEP FIRING FOR" (#7386)
* fix: parse "KEEP FIRING FOR"

* fix: cargo fmt
2025-12-11 03:54:47 +00:00
discord9
a26dee0ca1 fix: gc listing op first (#7385)
Signed-off-by: discord9 <discord9@163.com>
2025-12-11 03:25:05 +00:00
Ning Sun
276f6bf026 feat: grafana postgresql data source query builder support (#7379)
* feat: grafana postgresql data source query builder support

* test: add sqlness test cases
2025-12-11 03:18:35 +00:00
Weny Xu
1d5291b06d fix(procedure): update procedure state correctly during execution and on failure (#7376)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-11 02:30:32 +00:00
Ruihang Xia
564cc0c750 feat: table/column/flow COMMENT (#7060)
* initial impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* avoid unimplemented panic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* validate flow

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update sqlness result

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix table column comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* table level comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify table info serde

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* don't txn

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove empty trait

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* wip: procedure

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* update proto

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* grpc support

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Apply suggestions from code review

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* try from pb struct

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* doc comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* check unchanged fast case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix merge error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* use try_as_raw_value

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
2025-12-10 15:08:47 +00:00
LFC
f1abe5d215 feat: suspend frontend and datanode (#7370)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-10 12:18:24 +00:00
Ruihang Xia
ab426cbf89 refactor: remove duplication coverage and code from window sort tests (#7384)
* refactor: remove duplication coverage and code from window sort tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* allow clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-10 10:11:19 +00:00
Weny Xu
cb0f1afb01 fix: improve network failure detection (#7382)
* fix(meta): add default etcd client options with keep-alive settings (#7363)

* fix: improve network failure detection (#7367)

* Update src/meta-srv/src/handler.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

---------

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-10 09:48:36 +00:00
Yingwen
a22d08f1b1 feat: collect merge and dedup metrics (#7375)
* feat: collect FlatMergeReader metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add MergeMetricsReporter, rename Metrics to MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_input_rows from MergeMetrics

The merge reader won't dedup so there is no need to collect input rows

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: report merge metrics to PartitionMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add dedup cost to DedupMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: collect dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove metrics from FlatMergeIterator

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: remove num_output_rows from MergeMetrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: implement merge() for merge and dedup metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: report metrics after observe metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-10 09:16:20 +00:00
Ruihang Xia
6817a376b5 fix: part sort behavior (#7374)
* fix: part sort behavior

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* tune tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* debug assertion and remove produced count

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-10 07:44:44 +00:00
discord9
4d1a587079 chore: saturating duration since (#7380)
chore: sat duration since

Signed-off-by: discord9 <discord9@163.com>
2025-12-10 07:10:46 +00:00
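
This commit and the gc-tracker fix further down (#7369) guard the same footgun: subtracting `Instant`s that were captured out of order. The standard-library pattern:

```rust
use std::time::{Duration, Instant};

fn elapsed_since(earlier: Instant) -> Duration {
    // Clamps to Duration::ZERO when `earlier` is unexpectedly in the
    // future, instead of underflowing or panicking.
    Instant::now().saturating_duration_since(earlier)
}
```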
Lei, HUANG
9f1aefe98f feat: allow one to many VRL pipeline (#7342)
* feat/allow-one-to-many-pipeline:
 ### Enhance Pipeline Processing for One-to-Many Transformations

 - **Support One-to-Many Transformations**:
   - Updated `processor.rs`, `etl.rs`, `vrl_processor.rs`, and `greptime.rs` to handle one-to-many transformations by allowing VRL processors to return arrays, expanding each element into separate rows.
   - Introduced `transform_array_elements` and `values_to_rows` functions to facilitate this transformation.

 - **Error Handling Enhancements**:
   - Added new error types in `error.rs` to handle cases where array elements are not objects and for transformation failures.

 - **Testing Enhancements**:
   - Added tests in `pipeline.rs` to verify one-to-many transformations, single object processing, and error handling for non-object array elements.

 - **Context Management**:
   - Modified `ctx_req.rs` to clone `ContextOpt` when adding rows, ensuring correct context management during transformations.

 - **Server Pipeline Adjustments**:
   - Updated `pipeline.rs` in `servers` to handle transformed outputs with one-to-many row expansions, ensuring correct row padding and request formation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 Add one-to-many VRL pipeline test in `http.rs`

 - Introduced `test_pipeline_one_to_many_vrl` to verify VRL processor's ability to expand a single input row into multiple output rows.
 - Updated `http_tests!` macro to include the new test.
 - Implemented test scenarios for single and multiple input rows, ensuring correct data transformation and row count validation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Add Tests for VRL Pipeline Transformations

 - **File:** `src/pipeline/src/etl.rs`
   - Added tests for one-to-many VRL pipeline expansion to ensure multiple output rows from a single input.
   - Introduced tests to verify backward compatibility for single object output.
   - Implemented tests to confirm zero rows are produced from empty arrays.
   - Added validation tests to ensure array elements must be objects.
   - Developed tests for one-to-many transformations with table suffix hints from VRL.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Enhance Pipeline Transformation with Per-Row Table Suffixes

 - **`src/pipeline/src/etl.rs`**: Updated `TransformedOutput` to include per-row table suffixes, allowing for more flexible routing of transformed data. Modified `PipelineExecOutput` and related methods to
 handle the new structure.
 - **`src/pipeline/src/etl/transform/transformer/greptime.rs`**: Enhanced `values_to_rows` to support per-row table suffix extraction and application.
 - **`src/pipeline/tests/common.rs`** and **`src/pipeline/tests/pipeline.rs`**: Adjusted tests to validate the new per-row table suffix functionality, ensuring backward compatibility and correct behavior in
 one-to-many transformations.
 - **`src/servers/src/pipeline.rs`**: Modified `run_custom_pipeline` to process transformed outputs with per-row table suffixes, grouping rows by `(opt, table_name)` for insertion.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Update VRL Processor Type Checks

 - **File:** `vrl_processor.rs`
 - **Changes:** Updated type checking logic to use `contains_object()` and `contains_array()` methods instead of `is_object()` and `is_array()`. This change ensures
 compatibility with VRL type inference that may return multiple possible types.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 - **Enhance Error Handling**: Added new error types `ArrayElementMustBeObjectSnafu` and `TransformArrayElementSnafu` to improve error handling in `etl.rs` and `greptime.rs`.
 - **Refactor Error Usage**: Moved error usage declarations in `transform_array_elements` and `values_to_rows` functions to the top of the file for better organization in `etl.rs` and `greptime.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Update `greptime.rs` to Enhance Error Handling

 - **Error Handling**: Modified the `values_to_rows` function to handle invalid array elements based on the `skip_error` parameter. If `skip_error` is true, invalid elements are skipped; otherwise, an error is returned.
 - **Testing**: Added unit tests in `greptime.rs` to verify the behavior of `values_to_rows` with different `skip_error` settings, ensuring correct processing of valid objects and appropriate error handling for invalid elements.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Commit Summary

 - **Enhance `TransformedOutput` Structure**: Refactored `TransformedOutput` to use a `HashMap` for grouping rows by `ContextOpt`, allowing for per-row configuration options. Updated methods in `PipelineExecOutput` to support the new structure (`src/pipeline/src/etl.rs`).

 - **Add New Transformation Method**: Introduced `transform_array_elements_to_hashmap` to handle array inputs with per-row `ContextOpt` in `HashMap` format (`src/pipeline/src/etl.rs`).

 - **Update Pipeline Execution**: Modified `run_custom_pipeline` to process `TransformedOutput` using the new `HashMap` structure, ensuring rows are grouped by `ContextOpt` and table name (`src/servers/src/pipeline.rs`).

 - **Add Tests for New Structure**: Implemented tests to verify the functionality of the new `HashMap` structure in `TransformedOutput`, including scenarios for one-to-many mapping, single object input, and empty arrays (`src/pipeline/src/etl.rs`).

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Refactor `values_to_rows` to Return `HashMap` Grouped by `ContextOpt`

 - **`etl.rs`**:
   - Updated `values_to_rows` to return a `HashMap` grouped by `ContextOpt` instead of a vector.
   - Adjusted logic to handle single object and array inputs, ensuring rows are grouped by their `ContextOpt`.
   - Modified functions to extract rows from default `ContextOpt` and apply table suffixes accordingly.

 - **`greptime.rs`**:
   - Enhanced `values_to_rows` to handle errors gracefully with `skip_error` logic.
   - Added logic to group rows by `ContextOpt` for array inputs.

 - **Tests**:
   - Updated existing tests to validate the new `HashMap` return structure.
   - Added a new test to verify correct grouping of rows by per-element `ContextOpt`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Refactor and Enhance Error Handling in ETL Pipeline

 - **Refactored Functionality**:
   - Replaced `transform_array_elements` with `transform_array_elements_by_ctx` in `etl.rs` to streamline transformation logic and improve error handling.
   - Updated `values_to_rows` in `greptime.rs` to use `or_default` for cleaner code.

 - **Enhanced Error Handling**:
   - Introduced `unwrap_or_continue_if_err` macro in `etl.rs` to allow skipping errors based on pipeline context, improving robustness in data processing.

 These changes enhance the maintainability and error resilience of the ETL pipeline.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
 ### Update `Row` Handling in ETL Pipeline

 - **Refactor `Row` Type**: Introduced `RowWithTableSuffix` type alias to simplify handling of rows with optional table suffixes across the ETL pipeline.
 - **Modify Function Signatures**: Updated function signatures in `etl.rs` and `greptime.rs` to use `RowWithTableSuffix` for better clarity and consistency.
 - **Enhance Test Coverage**: Adjusted test logic in `greptime.rs` to align with the new `RowWithTableSuffix` type, ensuring correct grouping and processing of rows by TTL.

 Files affected: `etl.rs`, `greptime.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-10 06:38:44 +00:00
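
At its core the one-to-many rule is small: an object stays one row, an array fans out one row per element, every element must itself be an object, and an empty array yields zero rows. A sketch of that contract using serde_json values as stand-ins for VRL values; the real `values_to_rows` also threads `ContextOpt` and per-row table suffixes through, as the commits above describe:

```rust
use serde_json::Value;

fn values_to_rows(output: Value) -> Result<Vec<Value>, String> {
    match output {
        // Backward compatible: one object in, one row out.
        obj @ Value::Object(_) => Ok(vec![obj]),
        // One-to-many: every element must itself be an object;
        // an empty array produces zero rows.
        Value::Array(elements) => elements
            .into_iter()
            .map(|elem| match elem {
                obj @ Value::Object(_) => Ok(obj),
                other => Err(format!("array element must be an object, got {other}")),
            })
            .collect(),
        other => Err(format!("pipeline output must be an object or array, got {other}")),
    }
}
```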
Lei, HUANG
2f9130a2de chore(mito): expose some symbols (#7373)
chore/expose-symbols:
 ### Commit Summary

 - **Visibility Changes**: Updated visibility of functions in `bulk/part.rs`:
   - Made `record_batch_estimated_size` and `sort_primary_key_record_batch` functions public.
 - **Enhancements**: Enhanced functionality in `memtable.rs` by exposing additional components from `bulk::part`:
   - `BulkPartEncoder`, `BulkPartMeta`, `UnorderedPart`, `record_batch_estimated_size`, and `sort_primary_key_record_batch`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-09 14:33:14 +00:00
shuiyisong
fa2b4e5e63 refactor: extract file watcher to common-config (#7357)
* refactor: extract file watcher to common-config

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* fix: add file check

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: watch dir instead of file

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issues

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-09 11:23:26 +00:00
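
The "watch dir instead of file" commit matters because editors and Kubernetes secret mounts replace files atomically, which silently drops an inotify watch placed on the file itself. A hedged sketch with the notify crate; paths and the handler are hypothetical:

```rust
use std::path::Path;

use notify::{RecommendedWatcher, RecursiveMode, Watcher};

fn watch_config(path: &Path) -> notify::Result<RecommendedWatcher> {
    let mut watcher =
        notify::recommended_watcher(|event: notify::Result<notify::Event>| {
            if let Ok(event) = event {
                // Hypothetical handler: reload the config on any change.
                println!("config change detected: {:?}", event.paths);
            }
        })?;
    // Watch the containing directory, not the file, so atomic
    // rename-and-replace updates keep triggering events.
    let dir = path.parent().unwrap_or_else(|| Path::new("."));
    watcher.watch(dir, RecursiveMode::NonRecursive)?;
    Ok(watcher)
}
```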
discord9
9197e818ec refactor: use versioned index for index file (#7309)
* refactor: use versioned index for index file

Signed-off-by: discord9 <discord9@163.com>

* fix: sst entry table

Signed-off-by: discord9 <discord9@163.com>

* update sqlness

Signed-off-by: discord9 <discord9@163.com>

* chore: unit type

Signed-off-by: discord9 <discord9@163.com>

* fix: missing version

Signed-off-by: discord9 <discord9@163.com>

* more fix build index

Signed-off-by: discord9 <discord9@163.com>

* fix: use proper index id

Signed-off-by: discord9 <discord9@163.com>

* pcr

Signed-off-by: discord9 <discord9@163.com>

* test: update

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* test: test_list_ssts fixed

Signed-off-by: discord9 <discord9@163.com>

* test: fix test

Signed-off-by: discord9 <discord9@163.com>

* feat: stuff

Signed-off-by: discord9 <discord9@163.com>

* fix: clean temp index file on abort&delete all index version when delete file

Signed-off-by: discord9 <discord9@163.com>

* docs: explain

Signed-off-by: discord9 <discord9@163.com>

* fix: actually clean up tmp dir

Signed-off-by: discord9 <discord9@163.com>

* clippy

Signed-off-by: discord9 <discord9@163.com>

* clean tmp dir only when write cache enabled

Signed-off-by: discord9 <discord9@163.com>

* refactor: add version to index cache

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

* test: update size

Signed-off-by: discord9 <discord9@163.com>

* per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-12-09 07:31:12 +00:00
discord9
36d89c3baf fix: use saturating in gc tracker (#7369)
chore: use saturating

Signed-off-by: discord9 <discord9@163.com>
2025-12-09 06:38:59 +00:00
Ruihang Xia
0ebfd161d8 feat: allow publishing new nightly release when some platforms are absent (#7354)
* feat: allow publishing new nightly release when some platforms are absent

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* unify linux platforms

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* always evaluate conditions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-09 04:59:50 +00:00
ZonaHe
8b26a98c3b feat: update dashboard to v0.11.9 (#7364)
Co-authored-by: sunchanglong <sunchanglong@users.noreply.github.com>
2025-12-09 02:37:44 +00:00
discord9
7199823be9 chore: rename to avoid git reserved name (#7359)
rename to avoid reserved name

Signed-off-by: discord9 <discord9@163.com>
2025-12-08 04:01:25 +00:00
Ruihang Xia
60f752d306 feat: run histogram quantile in safe mode for incomplete data (#7297)
* initial impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* sqlness test and fix

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* correct sqlness case

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplification

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refine code and comment

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-05 09:19:21 +00:00
Ruihang Xia
edb1f6086f feat: decode pk eagerly (#7350)
* feat: decode pk eagerly

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* merge primary_key_codec and decode_primary_key_values

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-12-05 09:11:51 +00:00
254 changed files with 14292 additions and 4082 deletions

[changed file]

@@ -51,7 +51,7 @@ runs:
     run: |
       helm upgrade \
         --install my-greptimedb \
-        --set meta.backendStorage.etcd.endpoints=${{ inputs.etcd-endpoints }} \
+        --set 'meta.backendStorage.etcd.endpoints[0]=${{ inputs.etcd-endpoints }}' \
         --set meta.enableRegionFailover=${{ inputs.enable-region-failover }} \
         --set image.registry=${{ inputs.image-registry }} \
         --set image.repository=${{ inputs.image-repository }} \

[changed file]

@@ -81,7 +81,7 @@ function deploy_greptimedb_cluster() {
     --create-namespace \
     --set image.tag="$GREPTIMEDB_IMAGE_TAG" \
     --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
-    --set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
+    --set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
     --set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
     -n "$install_namespace"
@@ -119,7 +119,7 @@ function deploy_greptimedb_cluster_with_s3_storage() {
     --create-namespace \
     --set image.tag="$GREPTIMEDB_IMAGE_TAG" \
     --set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
-    --set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
+    --set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
     --set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
     --set objectStorage.s3.bucket="$AWS_CI_TEST_BUCKET" \
     --set objectStorage.s3.region="$AWS_REGION" \

[changed file]

@@ -49,14 +49,9 @@ on:
         description: Do not run integration tests during the build
         type: boolean
         default: true
-      build_linux_amd64_artifacts:
+      build_linux_artifacts:
         type: boolean
-        description: Build linux-amd64 artifacts
-        required: false
-        default: false
-      build_linux_arm64_artifacts:
-        type: boolean
-        description: Build linux-arm64 artifacts
+        description: Build linux artifacts (both amd64 and arm64)
         required: false
         default: false
       build_macos_artifacts:
@@ -144,7 +139,7 @@ jobs:
           ./.github/scripts/check-version.sh "${{ steps.create-version.outputs.version }}"
       - name: Allocate linux-amd64 runner
-        if: ${{ inputs.build_linux_amd64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
+        if: ${{ inputs.build_linux_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
         uses: ./.github/actions/start-runner
         id: start-linux-amd64-runner
         with:
@@ -158,7 +153,7 @@ jobs:
           subnet-id: ${{ vars.EC2_RUNNER_SUBNET_ID }}
       - name: Allocate linux-arm64 runner
-        if: ${{ inputs.build_linux_arm64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
+        if: ${{ inputs.build_linux_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
         uses: ./.github/actions/start-runner
         id: start-linux-arm64-runner
         with:
@@ -173,7 +168,7 @@ jobs:
   build-linux-amd64-artifacts:
     name: Build linux-amd64 artifacts
-    if: ${{ inputs.build_linux_amd64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
+    if: ${{ inputs.build_linux_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
     needs: [
       allocate-runners,
     ]
@@ -195,7 +190,7 @@ jobs:
   build-linux-arm64-artifacts:
     name: Build linux-arm64 artifacts
-    if: ${{ inputs.build_linux_arm64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
+    if: ${{ inputs.build_linux_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
     needs: [
       allocate-runners,
     ]
@@ -217,7 +212,7 @@ jobs:
   run-multi-lang-tests:
     name: Run Multi-language SDK Tests
-    if: ${{ inputs.build_linux_amd64_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
+    if: ${{ inputs.build_linux_artifacts || github.event_name == 'push' || github.event_name == 'schedule' }}
     needs: [
       allocate-runners,
       build-linux-amd64-artifacts,
@@ -386,7 +381,18 @@ jobs:
   publish-github-release:
     name: Create GitHub release and upload artifacts
-    if: ${{ inputs.publish_github_release || github.event_name == 'push' || github.event_name == 'schedule' }}
+    # Use always() to run even when optional jobs (macos, windows) are skipped.
+    # Then check that required jobs succeeded and optional jobs didn't fail.
+    if: |
+      always() &&
+      (inputs.publish_github_release || github.event_name == 'push' || github.event_name == 'schedule') &&
+      needs.allocate-runners.result == 'success' &&
+      (needs.build-linux-amd64-artifacts.result == 'success' || needs.build-linux-amd64-artifacts.result == 'skipped') &&
+      (needs.build-linux-arm64-artifacts.result == 'success' || needs.build-linux-arm64-artifacts.result == 'skipped') &&
+      (needs.build-macos-artifacts.result == 'success' || needs.build-macos-artifacts.result == 'skipped') &&
+      (needs.build-windows-artifacts.result == 'success' || needs.build-windows-artifacts.result == 'skipped') &&
+      (needs.release-images-to-dockerhub.result == 'success' || needs.release-images-to-dockerhub.result == 'skipped') &&
+      (needs.run-multi-lang-tests.result == 'success' || needs.run-multi-lang-tests.result == 'skipped')
     needs: [ # The job have to wait for all the artifacts are built.
       allocate-runners,
       build-linux-amd64-artifacts,

[changed file: Cargo.lock (generated), 321 changed lines]

@@ -212,7 +212,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
 [[package]]
 name = "api"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "arrow-schema",
  "common-base",
@@ -733,17 +733,17 @@ dependencies = [
 [[package]]
 name = "auth"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-trait",
  "common-base",
+ "common-config",
  "common-error",
  "common-macro",
  "common-telemetry",
  "common-test-util",
  "digest",
- "notify",
  "sha1",
  "snafu 0.8.6",
  "sql",
@@ -1383,7 +1383,7 @@ dependencies = [
 [[package]]
 name = "cache"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "catalog",
  "common-error",
@@ -1418,7 +1418,7 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
 [[package]]
 name = "catalog"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arrow",
@@ -1763,7 +1763,7 @@ checksum = "b94f61472cee1439c0b966b47e3aca9ae07e45d070759512cd390ea2bebc6675"
 [[package]]
 name = "cli"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-stream",
  "async-trait",
@@ -1816,7 +1816,7 @@ dependencies = [
 [[package]]
 name = "client"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arc-swap",
@@ -1849,7 +1849,7 @@ dependencies = [
  "snafu 0.8.6",
  "store-api",
  "substrait 0.37.3",
- "substrait 1.0.0-beta.2",
+ "substrait 1.0.0-beta.3",
  "tokio",
  "tokio-stream",
  "tonic 0.13.1",
@@ -1889,7 +1889,7 @@ dependencies = [
 [[package]]
 name = "cmd"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-trait",
  "auth",
@@ -1977,6 +1977,17 @@ dependencies = [
  "unicode-width 0.2.1",
 ]

+[[package]]
+name = "codespan-reporting"
+version = "0.13.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "af491d569909a7e4dee0ad7db7f5341fef5c614d5b8ec8cf765732aba3cff681"
+dependencies = [
+ "serde",
+ "termcolor",
+ "unicode-width 0.2.1",
+]
+
 [[package]]
 name = "colorchoice"
 version = "1.0.4"
@@ -2012,7 +2023,7 @@ checksum = "55b672471b4e9f9e95499ea597ff64941a309b2cdbffcc46f2cc5e2d971fd335"
 [[package]]
 name = "common-base"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "anymap2",
  "async-trait",
@@ -2036,14 +2047,14 @@ dependencies = [
 [[package]]
 name = "common-catalog"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "const_format",
 ]

 [[package]]
 name = "common-config"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-base",
  "common-error",
@@ -2055,6 +2066,7 @@ dependencies = [
  "datanode",
  "humantime-serde",
  "meta-client",
+ "notify",
  "object-store",
  "serde",
  "serde_json",
@@ -2067,7 +2079,7 @@ dependencies = [
 [[package]]
 name = "common-datasource"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "arrow",
  "arrow-schema",
@@ -2102,7 +2114,7 @@ dependencies = [
 [[package]]
 name = "common-decimal"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "bigdecimal 0.4.8",
  "common-error",
@@ -2115,7 +2127,7 @@ dependencies = [
 [[package]]
 name = "common-error"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-macro",
  "http 1.3.1",
@@ -2126,7 +2138,7 @@ dependencies = [
 [[package]]
 name = "common-event-recorder"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-trait",
@@ -2148,7 +2160,7 @@ dependencies = [
 [[package]]
 name = "common-frontend"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-trait",
@@ -2170,7 +2182,7 @@ dependencies = [
 [[package]]
 name = "common-function"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "ahash 0.8.12",
  "api",
@@ -2230,7 +2242,7 @@ dependencies = [
 [[package]]
 name = "common-greptimedb-telemetry"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-trait",
  "common-runtime",
@@ -2247,12 +2259,13 @@ dependencies = [
 [[package]]
 name = "common-grpc"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arrow-flight",
  "bytes",
  "common-base",
+ "common-config",
  "common-error",
  "common-macro",
  "common-recordbatch",
@@ -2266,7 +2279,6 @@ dependencies = [
  "hyper 1.6.0",
  "hyper-util",
  "lazy_static",
- "notify",
  "prost 0.13.5",
  "rand 0.9.1",
  "serde",
@@ -2282,7 +2294,7 @@ dependencies = [
 [[package]]
 name = "common-grpc-expr"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "common-base",
@@ -2302,7 +2314,7 @@ dependencies = [
 [[package]]
 name = "common-macro"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "greptime-proto",
  "once_cell",
@@ -2313,7 +2325,7 @@ dependencies = [
 [[package]]
 name = "common-mem-prof"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "anyhow",
  "common-error",
@@ -2327,9 +2339,22 @@ dependencies = [
  "tokio",
 ]

+[[package]]
+name = "common-memory-manager"
+version = "1.0.0-beta.3"
+dependencies = [
+ "common-error",
+ "common-macro",
+ "common-telemetry",
+ "humantime",
+ "serde",
+ "snafu 0.8.6",
+ "tokio",
+]
+
 [[package]]
 name = "common-meta"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "anymap2",
  "api",
@@ -2401,7 +2426,7 @@ dependencies = [
 [[package]]
 name = "common-options"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-grpc",
  "humantime-serde",
@@ -2410,11 +2435,11 @@ dependencies = [
 [[package]]
 name = "common-plugins"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"

 [[package]]
 name = "common-pprof"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-error",
  "common-macro",
@@ -2426,7 +2451,7 @@ dependencies = [
 [[package]]
 name = "common-procedure"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-stream",
@@ -2455,7 +2480,7 @@ dependencies = [
 [[package]]
 name = "common-procedure-test"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-trait",
  "common-procedure",
@@ -2465,7 +2490,7 @@ dependencies = [
 [[package]]
 name = "common-query"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-trait",
@@ -2491,7 +2516,7 @@ dependencies = [
 [[package]]
 name = "common-recordbatch"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "arc-swap",
  "common-base",
@@ -2515,7 +2540,7 @@ dependencies = [
 [[package]]
 name = "common-runtime"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-trait",
  "clap 4.5.40",
@@ -2544,7 +2569,7 @@ dependencies = [
 [[package]]
 name = "common-session"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "serde",
  "strum 0.27.1",
@@ -2552,7 +2577,7 @@ dependencies = [
 [[package]]
 name = "common-sql"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-base",
  "common-decimal",
@@ -2570,7 +2595,7 @@ dependencies = [
 [[package]]
 name = "common-stat"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-base",
  "common-runtime",
@@ -2585,7 +2610,7 @@ dependencies = [
 [[package]]
 name = "common-telemetry"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "backtrace",
  "common-base",
@@ -2614,7 +2639,7 @@ dependencies = [
 [[package]]
 name = "common-test-util"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "client",
  "common-grpc",
@@ -2627,7 +2652,7 @@ dependencies = [
 [[package]]
 name = "common-time"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "arrow",
  "chrono",
@@ -2645,7 +2670,7 @@ dependencies = [
 [[package]]
 name = "common-version"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "build-data",
  "cargo-manifest",
@@ -2656,7 +2681,7 @@ dependencies = [
 [[package]]
 name = "common-wal"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-base",
  "common-error",
@@ -2679,7 +2704,7 @@ dependencies = [
 [[package]]
 name = "common-workload"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "common-telemetry",
  "serde",
@@ -2845,6 +2870,15 @@ dependencies = [
  "unicode-segmentation",
 ]

+[[package]]
+name = "convert_case"
+version = "0.10.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "633458d4ef8c78b72454de2d54fd6ab2e60f9e02be22f3c6104cdc8a4e0fceb9"
+dependencies = [
+ "unicode-segmentation",
+]
+
 [[package]]
 name = "core-foundation"
 version = "0.9.4"
@@ -3146,6 +3180,68 @@ dependencies = [
  "cipher",
 ]

+[[package]]
+name = "cxx"
+version = "1.0.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a7620f6cfc4dcca21f2b085b7a890e16c60fd66f560cd69ee60594908dc72ab1"
+dependencies = [
+ "cc",
+ "cxx-build",
+ "cxxbridge-cmd",
+ "cxxbridge-flags",
+ "cxxbridge-macro",
+ "foldhash 0.2.0",
+ "link-cplusplus",
+]
+
+[[package]]
+name = "cxx-build"
+version = "1.0.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7a9bc1a22964ff6a355fbec24cf68266a0ed28f8b84c0864c386474ea3d0e479"
+dependencies = [
+ "cc",
+ "codespan-reporting 0.13.1",
+ "indexmap 2.11.4",
+ "proc-macro2",
+ "quote",
+ "scratch",
+ "syn 2.0.106",
+]
+
+[[package]]
+name = "cxxbridge-cmd"
+version = "1.0.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b1f29a879d35f7906e3c9b77d7a1005a6a0787d330c09dfe4ffb5f617728cb44"
+dependencies = [
+ "clap 4.5.40",
+ "codespan-reporting 0.13.1",
+ "indexmap 2.11.4",
+ "proc-macro2",
+ "quote",
+ "syn 2.0.106",
+]
+
+[[package]]
+name = "cxxbridge-flags"
+version = "1.0.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d67109015f93f683e364085aa6489a5b2118b4a40058482101d699936a7836d6"
+
+[[package]]
+name = "cxxbridge-macro"
+version = "1.0.190"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d187e019e7b05a1f3e69a8396b70800ee867aa9fc2ab972761173ccee03742df"
+dependencies = [
+ "indexmap 2.11.4",
+ "proc-macro2",
+ "quote",
+ "syn 2.0.106",
+]
+
 [[package]]
 name = "darling"
 version = "0.14.4"
@@ -3741,9 +3837,9 @@ dependencies = [
 [[package]]
 name = "datafusion-pg-catalog"
-version = "0.12.2"
+version = "0.12.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "755393864c0c2dd95575ceed4b25e348686028e1b83d06f8f39914209999f821"
+checksum = "09bfd1feed7ed335227af0b65955ed825e467cf67fad6ecd089123202024cfd1"
 dependencies = [
  "async-trait",
  "datafusion",
@@ -3916,7 +4012,7 @@ dependencies = [
 [[package]]
 name = "datanode"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arrow-flight",
@@ -3980,7 +4076,7 @@ dependencies = [
 [[package]]
 name = "datatypes"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "arrow",
  "arrow-array",
@@ -4184,21 +4280,23 @@ dependencies = [
 [[package]]
 name = "derive_more"
-version = "1.0.0"
+version = "2.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "4a9b99b9cbbe49445b21764dc0625032a89b145a2642e67603e1c936f5458d05"
+checksum = "10b768e943bed7bf2cab53df09f4bc34bfd217cdb57d971e769874c9a6710618"
 dependencies = [
  "derive_more-impl",
 ]

 [[package]]
 name = "derive_more-impl"
-version = "1.0.0"
+version = "2.1.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "cb7330aeadfbe296029522e6c40f315320aba36fc43a5b3632f3795348f3bd22"
+checksum = "6d286bfdaf75e988b4a78e013ecd79c581e06399ab53fbacd2d916c2f904f30b"
 dependencies = [
+ "convert_case 0.10.0",
  "proc-macro2",
  "quote",
+ "rustc_version",
  "syn 2.0.106",
  "unicode-xid",
 ]
@@ -4652,7 +4750,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
 [[package]]
 name = "file-engine"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "async-trait",
@@ -4784,7 +4882,7 @@ checksum = "8bf7cc16383c4b8d58b9905a8509f02926ce3058053c056376248d958c9df1e8"
 [[package]]
 name = "flow"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arrow",
@@ -4853,7 +4951,7 @@ dependencies = [
  "sql",
  "store-api",
  "strum 0.27.1",
- "substrait 1.0.0-beta.2",
+ "substrait 1.0.0-beta.3",
  "table",
  "tokio",
  "tonic 0.13.1",
@@ -4891,6 +4989,12 @@ version = "0.1.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"

+[[package]]
+name = "foldhash"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "77ce24cb58228fbb8aa041425bb1050850ac19177686ea6e0f41a70416f56fdb"
+
 [[package]]
 name = "form_urlencoded"
 version = "1.2.2"
@@ -4908,13 +5012,14 @@ checksum = "28dd6caf6059519a65843af8fe2a3ae298b14b80179855aeb4adc2c1934ee619"
 [[package]]
 name = "frontend"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "api",
  "arc-swap",
  "async-stream",
  "async-trait",
  "auth",
+ "axum 0.8.4",
  "bytes",
  "cache",
  "catalog",
@@ -4949,9 +5054,11 @@ dependencies = [
  "hostname 0.4.1",
  "humantime",
  "humantime-serde",
+ "hyper-util",
  "lazy_static",
  "log-query",
  "meta-client",
+ "meta-srv",
  "num_cpus",
  "opentelemetry-proto",
  "operator",
@@ -4963,6 +5070,7 @@ dependencies = [
  "prost 0.13.5",
  "query",
  "rand 0.9.1",
+ "reqwest",
  "serde",
  "serde_json",
  "servers",
@@ -5351,7 +5459,7 @@ dependencies = [
 [[package]]
 name = "greptime-proto"
 version = "0.1.0"
-source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=0df99f09f1d6785055b2d9da96fc4ecc2bdf6803#0df99f09f1d6785055b2d9da96fc4ecc2bdf6803"
+source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=0423fa30203187c75e2937a668df1da699c8b96c#0423fa30203187c75e2937a668df1da699c8b96c"
 dependencies = [
  "prost 0.13.5",
  "prost-types 0.13.5",
@@ -5487,7 +5595,7 @@ checksum = "5971ac85611da7067dbfcabef3c70ebb5606018acd9e2a3903a0da507521e0d5"
 dependencies = [
  "allocator-api2",
  "equivalent",
- "foldhash",
+ "foldhash 0.1.5",
 ]

 [[package]]
@@ -6119,7 +6227,7 @@ dependencies = [
 [[package]]
 name = "index"
-version = "1.0.0-beta.2"
+version = "1.0.0-beta.3"
 dependencies = [
  "async-trait",
  "asynchronous-codec",
@@ -6132,6 +6240,7 @@ dependencies = [
  "common-telemetry",
  "common-test-util",
  "criterion 0.4.0",
+ "datatypes",
  "fastbloom",
  "fst",
  "futures",
@@ -6140,6 +6249,7 @@ dependencies = [
  "jieba-rs",
  "lazy_static",
  "mockall",
+ "nalgebra",
  "pin-project",
  "prost 0.13.5",
  "puffin",
@@ -6157,6 +6267,7 @@ dependencies = [
  "tempfile",
  "tokio",
  "tokio-util",
+ "usearch",
  "uuid",
 ]
@@ -6988,6 +7099,15 @@ dependencies = [
  "vcpkg",
 ]

+[[package]]
+name = "link-cplusplus"
+version = "1.0.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7f78c730aaa7d0b9336a299029ea49f9ee53b0ed06e9202e8cb7db9bae7b8c82"
+dependencies = [
+ "cc",
+]
+
 [[package]]
 name = "linked-hash-map"
 version = "0.5.6"
@@ -7048,7 +7168,7 @@ checksum = "13dc2df351e3202783a1fe0d44375f7295ffb4049267b0f3018346dc122a1d94"
[[package]] [[package]]
name = "log-query" name = "log-query"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"chrono", "chrono",
"common-error", "common-error",
@@ -7060,7 +7180,7 @@ dependencies = [
[[package]] [[package]]
name = "log-store" name = "log-store"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"async-stream", "async-stream",
"async-trait", "async-trait",
@@ -7301,12 +7421,6 @@ dependencies = [
"digest", "digest",
] ]
[[package]]
name = "md5"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]] [[package]]
name = "md5" name = "md5"
version = "0.8.0" version = "0.8.0"
@@ -7367,7 +7481,7 @@ dependencies = [
[[package]] [[package]]
name = "meta-client" name = "meta-client"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"async-trait", "async-trait",
@@ -7395,7 +7509,7 @@ dependencies = [
[[package]] [[package]]
name = "meta-srv" name = "meta-srv"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"async-trait", "async-trait",
@@ -7495,7 +7609,7 @@ dependencies = [
[[package]] [[package]]
name = "metric-engine" name = "metric-engine"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"aquamarine", "aquamarine",
@@ -7592,7 +7706,7 @@ dependencies = [
[[package]] [[package]]
name = "mito-codec" name = "mito-codec"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"bytes", "bytes",
@@ -7617,7 +7731,7 @@ dependencies = [
[[package]] [[package]]
name = "mito2" name = "mito2"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"aquamarine", "aquamarine",
@@ -7635,6 +7749,7 @@ dependencies = [
"common-function", "common-function",
"common-grpc", "common-grpc",
"common-macro", "common-macro",
"common-memory-manager",
"common-meta", "common-meta",
"common-query", "common-query",
"common-recordbatch", "common-recordbatch",
@@ -7656,6 +7771,7 @@ dependencies = [
"either", "either",
"futures", "futures",
"greptime-proto", "greptime-proto",
"humantime",
"humantime-serde", "humantime-serde",
"index", "index",
"itertools 0.14.0", "itertools 0.14.0",
@@ -8355,7 +8471,7 @@ dependencies = [
[[package]] [[package]]
name = "object-store" name = "object-store"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"anyhow", "anyhow",
"bytes", "bytes",
@@ -8368,7 +8484,6 @@ dependencies = [
"futures", "futures",
"humantime-serde", "humantime-serde",
"lazy_static", "lazy_static",
"md5 0.7.0",
"moka", "moka",
"opendal", "opendal",
"prometheus", "prometheus",
@@ -8641,7 +8756,7 @@ dependencies = [
[[package]] [[package]]
name = "operator" name = "operator"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"api", "api",
@@ -8701,7 +8816,7 @@ dependencies = [
"sql", "sql",
"sqlparser", "sqlparser",
"store-api", "store-api",
"substrait 1.0.0-beta.2", "substrait 1.0.0-beta.3",
"table", "table",
"tokio", "tokio",
"tokio-util", "tokio-util",
@@ -8987,7 +9102,7 @@ dependencies = [
[[package]] [[package]]
name = "partition" name = "partition"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"async-trait", "async-trait",
@@ -9215,7 +9330,7 @@ dependencies = [
"futures", "futures",
"hex", "hex",
"lazy-regex", "lazy-regex",
"md5 0.8.0", "md5",
"postgres-types", "postgres-types",
"rand 0.9.1", "rand 0.9.1",
"ring", "ring",
@@ -9344,7 +9459,7 @@ checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]] [[package]]
name = "pipeline" name = "pipeline"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"api", "api",
@@ -9500,7 +9615,7 @@ dependencies = [
[[package]] [[package]]
name = "plugins" name = "plugins"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"auth", "auth",
"catalog", "catalog",
@@ -9802,7 +9917,7 @@ dependencies = [
[[package]] [[package]]
name = "promql" name = "promql"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"async-trait", "async-trait",
@@ -10085,7 +10200,7 @@ dependencies = [
[[package]] [[package]]
name = "puffin" name = "puffin"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"async-compression 0.4.19", "async-compression 0.4.19",
"async-trait", "async-trait",
@@ -10127,7 +10242,7 @@ dependencies = [
[[package]] [[package]]
name = "query" name = "query"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"api", "api",
@@ -10194,7 +10309,7 @@ dependencies = [
"sql", "sql",
"sqlparser", "sqlparser",
"store-api", "store-api",
"substrait 1.0.0-beta.2", "substrait 1.0.0-beta.3",
"table", "table",
"tokio", "tokio",
"tokio-stream", "tokio-stream",
@@ -10837,7 +10952,7 @@ dependencies = [
[[package]] [[package]]
name = "rskafka" name = "rskafka"
version = "0.6.0" version = "0.6.0"
source = "git+https://github.com/WenyXu/rskafka.git?rev=7b0f31ed39db049b4ee2e5f1e95b5a30be9baf76#7b0f31ed39db049b4ee2e5f1e95b5a30be9baf76" source = "git+https://github.com/GreptimeTeam/rskafka.git?rev=f5688f83e7da591cda3f2674c2408b4c0ed4ed50#f5688f83e7da591cda3f2674c2408b4c0ed4ed50"
dependencies = [ dependencies = [
"bytes", "bytes",
"chrono", "chrono",
@@ -11266,6 +11381,12 @@ version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49" checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
[[package]]
name = "scratch"
version = "1.0.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d68f2ec51b097e4c1a75b681a8bec621909b5e91f15bb7b840c4f2f7b01148b2"
[[package]] [[package]]
name = "scrypt" name = "scrypt"
version = "0.11.0" version = "0.11.0"
@@ -11530,7 +11651,7 @@ dependencies = [
[[package]] [[package]]
name = "servers" name = "servers"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"api", "api",
@@ -11658,7 +11779,7 @@ dependencies = [
[[package]] [[package]]
name = "session" name = "session"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"ahash 0.8.12", "ahash 0.8.12",
"api", "api",
@@ -11992,7 +12113,7 @@ dependencies = [
[[package]] [[package]]
name = "sql" name = "sql"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"arrow-buffer", "arrow-buffer",
@@ -12052,7 +12173,7 @@ dependencies = [
[[package]] [[package]]
name = "sqlness-runner" name = "sqlness-runner"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"async-trait", "async-trait",
"clap 4.5.40", "clap 4.5.40",
@@ -12329,7 +12450,7 @@ dependencies = [
[[package]] [[package]]
name = "standalone" name = "standalone"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"async-trait", "async-trait",
"catalog", "catalog",
@@ -12370,7 +12491,7 @@ checksum = "a2eb9349b6444b326872e140eb1cf5e7c522154d69e7a0ffb0fb81c06b37543f"
[[package]] [[package]]
name = "store-api" name = "store-api"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"aquamarine", "aquamarine",
@@ -12583,7 +12704,7 @@ dependencies = [
[[package]] [[package]]
name = "substrait" name = "substrait"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"async-trait", "async-trait",
"bytes", "bytes",
@@ -12706,7 +12827,7 @@ dependencies = [
[[package]] [[package]]
name = "table" name = "table"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"async-trait", "async-trait",
@@ -12975,7 +13096,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
[[package]] [[package]]
name = "tests-fuzz" name = "tests-fuzz"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"arbitrary", "arbitrary",
"async-trait", "async-trait",
@@ -13019,7 +13140,7 @@ dependencies = [
[[package]] [[package]]
name = "tests-integration" name = "tests-integration"
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
dependencies = [ dependencies = [
"api", "api",
"arrow-flight", "arrow-flight",
@@ -13094,7 +13215,7 @@ dependencies = [
"sqlx", "sqlx",
"standalone", "standalone",
"store-api", "store-api",
"substrait 1.0.0-beta.2", "substrait 1.0.0-beta.3",
"table", "table",
"tempfile", "tempfile",
"time", "time",
@@ -14119,6 +14240,16 @@ version = "2.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "daf8dba3b7eb870caf1ddeed7bc9d2a049f3cfdfae7cb521b087cc33ae4c49da" checksum = "daf8dba3b7eb870caf1ddeed7bc9d2a049f3cfdfae7cb521b087cc33ae4c49da"
[[package]]
name = "usearch"
version = "2.21.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2cc9fc5f872a3a4f9081d5f42624d788231b763e1846c829b9968a3755ac884d"
dependencies = [
"cxx",
"cxx-build",
]
[[package]] [[package]]
name = "utf8-ranges" name = "utf8-ranges"
version = "1.0.5" version = "1.0.5"
@@ -14258,7 +14389,7 @@ dependencies = [
"ciborium", "ciborium",
"cidr", "cidr",
"clap 4.5.40", "clap 4.5.40",
"codespan-reporting", "codespan-reporting 0.12.0",
"community-id", "community-id",
"convert_case 0.7.1", "convert_case 0.7.1",
"crc", "crc",


@@ -21,6 +21,7 @@ members = [
"src/common/grpc-expr", "src/common/grpc-expr",
"src/common/macro", "src/common/macro",
"src/common/mem-prof", "src/common/mem-prof",
"src/common/memory-manager",
"src/common/meta", "src/common/meta",
"src/common/options", "src/common/options",
"src/common/plugins", "src/common/plugins",
@@ -74,7 +75,7 @@ members = [
resolver = "2" resolver = "2"
[workspace.package] [workspace.package]
version = "1.0.0-beta.2" version = "1.0.0-beta.3"
edition = "2024" edition = "2024"
license = "Apache-2.0" license = "Apache-2.0"
@@ -131,7 +132,7 @@ datafusion-functions = "50"
datafusion-functions-aggregate-common = "50" datafusion-functions-aggregate-common = "50"
datafusion-optimizer = "50" datafusion-optimizer = "50"
datafusion-orc = "0.5" datafusion-orc = "0.5"
datafusion-pg-catalog = "0.12.2" datafusion-pg-catalog = "0.12.3"
datafusion-physical-expr = "50" datafusion-physical-expr = "50"
datafusion-physical-plan = "50" datafusion-physical-plan = "50"
datafusion-sql = "50" datafusion-sql = "50"
@@ -139,6 +140,7 @@ datafusion-substrait = "50"
deadpool = "0.12" deadpool = "0.12"
deadpool-postgres = "0.14" deadpool-postgres = "0.14"
derive_builder = "0.20" derive_builder = "0.20"
derive_more = { version = "2.1", features = ["full"] }
dotenv = "0.15" dotenv = "0.15"
either = "1.15" either = "1.15"
etcd-client = { git = "https://github.com/GreptimeTeam/etcd-client", rev = "f62df834f0cffda355eba96691fe1a9a332b75a7", features = [ etcd-client = { git = "https://github.com/GreptimeTeam/etcd-client", rev = "f62df834f0cffda355eba96691fe1a9a332b75a7", features = [
@@ -148,7 +150,7 @@ etcd-client = { git = "https://github.com/GreptimeTeam/etcd-client", rev = "f62d
fst = "0.4.7" fst = "0.4.7"
futures = "0.3" futures = "0.3"
futures-util = "0.3" futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0df99f09f1d6785055b2d9da96fc4ecc2bdf6803" } greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0423fa30203187c75e2937a668df1da699c8b96c" }
hex = "0.4" hex = "0.4"
http = "1" http = "1"
humantime = "2.1" humantime = "2.1"
@@ -200,7 +202,8 @@ reqwest = { version = "0.12", default-features = false, features = [
"stream", "stream",
"multipart", "multipart",
] } ] }
rskafka = { git = "https://github.com/WenyXu/rskafka.git", rev = "7b0f31ed39db049b4ee2e5f1e95b5a30be9baf76", features = [ # Branch: feat/request-timeout
rskafka = { git = "https://github.com/GreptimeTeam/rskafka.git", rev = "f5688f83e7da591cda3f2674c2408b4c0ed4ed50", features = [
"transport-tls", "transport-tls",
] } ] }
rstest = "0.25" rstest = "0.25"
@@ -264,6 +267,7 @@ common-grpc = { path = "src/common/grpc" }
common-grpc-expr = { path = "src/common/grpc-expr" } common-grpc-expr = { path = "src/common/grpc-expr" }
common-macro = { path = "src/common/macro" } common-macro = { path = "src/common/macro" }
common-mem-prof = { path = "src/common/mem-prof" } common-mem-prof = { path = "src/common/mem-prof" }
common-memory-manager = { path = "src/common/memory-manager" }
common-meta = { path = "src/common/meta" } common-meta = { path = "src/common/meta" }
common-options = { path = "src/common/options" } common-options = { path = "src/common/options" }
common-plugins = { path = "src/common/plugins" } common-plugins = { path = "src/common/plugins" }


@@ -108,9 +108,6 @@
| `storage` | -- | -- | The data storage options. | | `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. | | `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. | | `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.enable_read_cache` | Bool | `true` | Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. | | `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. | | `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. | | `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -141,6 +138,8 @@
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). | | `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). | | `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). | | `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.experimental_compaction_memory_limit` | String | 0 | Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit. |
| `region_engine.mito.experimental_compaction_on_exhausted` | String | wait | Behavior when compaction cannot acquire memory from the budget.<br/>Options: "wait" (default, 10s), "wait(<duration>)", "fail" |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. | | `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. | | `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size`. | | `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size`. |
@@ -154,6 +153,8 @@
| `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. | | `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. | | `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. |
| `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. | | `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. |
| `region_engine.mito.enable_refill_cache_on_read` | Bool | `true` | Enable refilling cache on read operations (default: true).<br/>When disabled, cache refilling on read won't happen. |
| `region_engine.mito.manifest_cache_size` | String | `256MB` | Capacity for manifest cache (default: 256MB). |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. | | `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. | | `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. | | `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
@@ -486,9 +487,6 @@
| `storage` | -- | -- | The data storage options. | | `storage` | -- | -- | The data storage options. |
| `storage.data_home` | String | `./greptimedb_data` | The working home directory. | | `storage.data_home` | String | `./greptimedb_data` | The working home directory. |
| `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. | | `storage.type` | String | `File` | The storage type used to store the data.<br/>- `File`: the data is stored in the local file system.<br/>- `S3`: the data is stored in the S3 object storage.<br/>- `Gcs`: the data is stored in the Google Cloud Storage.<br/>- `Azblob`: the data is stored in the Azure Blob Storage.<br/>- `Oss`: the data is stored in the Aliyun OSS. |
| `storage.cache_path` | String | Unset | Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.<br/>A local file directory, defaults to `{data_home}`. An empty string means disabling. |
| `storage.enable_read_cache` | Bool | `true` | Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage. |
| `storage.cache_capacity` | String | Unset | The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger. |
| `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. | | `storage.bucket` | String | Unset | The S3 bucket name.<br/>**It's only used when the storage type is `S3`, `Oss` and `Gcs`**. |
| `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. | | `storage.root` | String | Unset | The S3 data will be stored in the specified prefix, for example, `s3://${bucket}/${root}`.<br/>**It's only used when the storage type is `S3`, `Oss` and `Azblob`**. |
| `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. | | `storage.access_key_id` | String | Unset | The access key id of the aws account.<br/>It's **highly recommended** to use AWS IAM roles instead of hardcoding the access key id and secret key.<br/>**It's only used when the storage type is `S3` and `Oss`**. |
@@ -521,6 +519,8 @@
| `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). | | `region_engine.mito.max_background_flushes` | Integer | Auto | Max number of running background flush jobs (default: 1/2 of cpu cores). |
| `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). | | `region_engine.mito.max_background_compactions` | Integer | Auto | Max number of running background compaction jobs (default: 1/4 of cpu cores). |
| `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). | | `region_engine.mito.max_background_purges` | Integer | Auto | Max number of running background purge jobs (default: number of cpu cores). |
| `region_engine.mito.experimental_compaction_memory_limit` | String | 0 | Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit. |
| `region_engine.mito.experimental_compaction_on_exhausted` | String | wait | Behavior when compaction cannot acquire memory from the budget.<br/>Options: "wait" (default, 10s), "wait(<duration>)", "fail" |
| `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. | | `region_engine.mito.auto_flush_interval` | String | `1h` | Interval to auto flush a region if it has not flushed yet. |
| `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. | | `region_engine.mito.global_write_buffer_size` | String | Auto | Global write buffer size for all regions. If not set, it's default to 1/8 of OS memory with a max limitation of 1GB. |
| `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size` | | `region_engine.mito.global_write_buffer_reject_size` | String | Auto | Global write buffer size threshold to reject write requests. If not set, it's default to 2 times of `global_write_buffer_size` |
@@ -534,6 +534,8 @@
| `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. | | `region_engine.mito.write_cache_ttl` | String | Unset | TTL for write cache. |
| `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. | | `region_engine.mito.preload_index_cache` | Bool | `true` | Preload index (puffin) files into cache on region open (default: true).<br/>When enabled, index files are loaded into the write cache during region initialization,<br/>which can improve query performance at the cost of longer startup times. |
| `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. | | `region_engine.mito.index_cache_percent` | Integer | `20` | Percentage of write cache capacity allocated for index (puffin) files (default: 20).<br/>The remaining capacity is used for data (parquet) files.<br/>Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,<br/>1GiB is reserved for index files and 4GiB for data files. |
| `region_engine.mito.enable_refill_cache_on_read` | Bool | `true` | Enable refilling cache on read operations (default: true).<br/>When disabled, cache refilling on read won't happen. |
| `region_engine.mito.manifest_cache_size` | String | `256MB` | Capacity for manifest cache (default: 256MB). |
| `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. | | `region_engine.mito.sst_write_buffer_size` | String | `8MB` | Buffer size for SST writing. |
| `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. | | `region_engine.mito.parallel_scan_channel_size` | Integer | `32` | Capacity of the channel to send data from parallel scan tasks to the main task. |
| `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. | | `region_engine.mito.max_concurrent_scan_files` | Integer | `384` | Maximum number of SST files to scan concurrently. |
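Taken together, the new knobs documented in the two tables above land in the `[region_engine.mito]` section of the datanode and standalone configs. A minimal sketch with illustrative values (not recommendations; the cache options are shown at their documented defaults):

[region_engine.mito]
## Give compaction a 2GB memory budget; "0" or "unlimited" disables the limit.
experimental_compaction_memory_limit = "2GB"
## On budget exhaustion, wait up to 30s for memory instead of the default 10s.
experimental_compaction_on_exhausted = "wait(30s)"
## Cache options at their documented defaults.
enable_refill_cache_on_read = true
manifest_cache_size = "256MB"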


@@ -281,18 +281,6 @@ data_home = "./greptimedb_data"
## - `Oss`: the data is stored in the Aliyun OSS. ## - `Oss`: the data is stored in the Aliyun OSS.
type = "File" type = "File"
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage.
#+ enable_read_cache = true
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name. ## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**. ## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## @toml2docs:none-default ## @toml2docs:none-default
@@ -452,6 +440,15 @@ compress_manifest = false
## @toml2docs:none-default="Auto" ## @toml2docs:none-default="Auto"
#+ max_background_purges = 8 #+ max_background_purges = 8
## Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit.
## @toml2docs:none-default="0"
#+ experimental_compaction_memory_limit = "0"
## Behavior when compaction cannot acquire memory from the budget.
## Options: "wait" (default, 10s), "wait(<duration>)", "fail"
## @toml2docs:none-default="wait"
#+ experimental_compaction_on_exhausted = "wait"
## Interval to auto flush a region if it has not flushed yet. ## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h" auto_flush_interval = "1h"
@@ -507,6 +504,13 @@ preload_index_cache = true
## 1GiB is reserved for index files and 4GiB for data files. ## 1GiB is reserved for index files and 4GiB for data files.
index_cache_percent = 20 index_cache_percent = 20
## Enable refilling cache on read operations (default: true).
## When disabled, cache refilling on read won't happen.
enable_refill_cache_on_read = true
## Capacity for manifest cache (default: 256MB).
manifest_cache_size = "256MB"
## Buffer size for SST writing. ## Buffer size for SST writing.
sst_write_buffer_size = "8MB" sst_write_buffer_size = "8MB"


@@ -388,18 +388,6 @@ data_home = "./greptimedb_data"
## - `Oss`: the data is stored in the Aliyun OSS. ## - `Oss`: the data is stored in the Aliyun OSS.
type = "File" type = "File"
## Whether to enable read cache. If not set, the read cache will be enabled by default when using object storage.
#+ enable_read_cache = true
## Read cache configuration for object storage such as 'S3' etc, it's configured by default when using object storage. It is recommended to configure it when using object storage for better performance.
## A local file directory, defaults to `{data_home}`. An empty string means disabling.
## @toml2docs:none-default
#+ cache_path = ""
## The local file cache capacity in bytes. If your disk space is sufficient, it is recommended to set it larger.
## @toml2docs:none-default
cache_capacity = "5GiB"
## The S3 bucket name. ## The S3 bucket name.
## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**. ## **It's only used when the storage type is `S3`, `Oss` and `Gcs`**.
## @toml2docs:none-default ## @toml2docs:none-default
@@ -546,6 +534,15 @@ compress_manifest = false
## @toml2docs:none-default="Auto" ## @toml2docs:none-default="Auto"
#+ max_background_purges = 8 #+ max_background_purges = 8
## Memory budget for compaction tasks. Setting it to 0 or "unlimited" disables the limit.
## @toml2docs:none-default="0"
#+ experimental_compaction_memory_limit = "0"
## Behavior when compaction cannot acquire memory from the budget.
## Options: "wait" (default, 10s), "wait(<duration>)", "fail"
## @toml2docs:none-default="wait"
#+ experimental_compaction_on_exhausted = "wait"
## Interval to auto flush a region if it has not flushed yet. ## Interval to auto flush a region if it has not flushed yet.
auto_flush_interval = "1h" auto_flush_interval = "1h"
@@ -601,6 +598,13 @@ preload_index_cache = true
## 1GiB is reserved for index files and 4GiB for data files. ## 1GiB is reserved for index files and 4GiB for data files.
index_cache_percent = 20 index_cache_percent = 20
## Enable refilling cache on read operations (default: true).
## When disabled, cache refilling on read won't happen.
enable_refill_cache_on_read = true
## Capacity for manifest cache (default: 256MB).
manifest_cache_size = "256MB"
## Buffer size for SST writing. ## Buffer size for SST writing.
sst_write_buffer_size = "8MB" sst_write_buffer_size = "8MB"

flake.lock (generated)

@@ -8,11 +8,11 @@
"rust-analyzer-src": "rust-analyzer-src" "rust-analyzer-src": "rust-analyzer-src"
}, },
"locked": { "locked": {
"lastModified": 1760078406, "lastModified": 1765252472,
"narHash": "sha256-JeJK0ZA845PtkCHkfo4KjeI1mYrsr2s3cxBYKhF4BoE=", "narHash": "sha256-byMt/uMi7DJ8tRniFopDFZMO3leSjGp6GS4zWOFT+uQ=",
"owner": "nix-community", "owner": "nix-community",
"repo": "fenix", "repo": "fenix",
"rev": "351277c60d104944122ee389cdf581c5ce2c6732", "rev": "8456b985f6652e3eef0632ee9992b439735c5544",
"type": "github" "type": "github"
}, },
"original": { "original": {
@@ -41,16 +41,16 @@
}, },
"nixpkgs": { "nixpkgs": {
"locked": { "locked": {
"lastModified": 1759994382, "lastModified": 1764983851,
"narHash": "sha256-wSK+3UkalDZRVHGCRikZ//CyZUJWDJkBDTQX1+G77Ow=", "narHash": "sha256-y7RPKl/jJ/KAP/VKLMghMgXTlvNIJMHKskl8/Uuar7o=",
"owner": "NixOS", "owner": "NixOS",
"repo": "nixpkgs", "repo": "nixpkgs",
"rev": "5da4a26309e796daa7ffca72df93dbe53b8164c7", "rev": "d9bc5c7dceb30d8d6fafa10aeb6aa8a48c218454",
"type": "github" "type": "github"
}, },
"original": { "original": {
"owner": "NixOS", "owner": "NixOS",
"ref": "nixos-25.05", "ref": "nixos-25.11",
"repo": "nixpkgs", "repo": "nixpkgs",
"type": "github" "type": "github"
} }
@@ -65,11 +65,11 @@
"rust-analyzer-src": { "rust-analyzer-src": {
"flake": false, "flake": false,
"locked": { "locked": {
"lastModified": 1760014945, "lastModified": 1765120009,
"narHash": "sha256-ySdl7F9+oeWNHVrg3QL/brazqmJvYFEdpGnF3pyoDH8=", "narHash": "sha256-nG76b87rkaDzibWbnB5bYDm6a52b78A+fpm+03pqYIw=",
"owner": "rust-lang", "owner": "rust-lang",
"repo": "rust-analyzer", "repo": "rust-analyzer",
"rev": "90d2e1ce4dfe7dc49250a8b88a0f08ffdb9cb23f", "rev": "5e3e9c4e61bba8a5e72134b9ffefbef8f531d008",
"type": "github" "type": "github"
}, },
"original": { "original": {


@@ -2,7 +2,7 @@
description = "Development environment flake"; description = "Development environment flake";
inputs = { inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.05"; nixpkgs.url = "github:NixOS/nixpkgs/nixos-25.11";
fenix = { fenix = {
url = "github:nix-community/fenix"; url = "github:nix-community/fenix";
inputs.nixpkgs.follows = "nixpkgs"; inputs.nixpkgs.follows = "nixpkgs";
@@ -48,7 +48,7 @@
gnuplot ## for cargo bench gnuplot ## for cargo bench
]; ];
LD_LIBRARY_PATH = pkgs.lib.makeLibraryPath buildInputs; buildInputs = buildInputs;
NIX_HARDENING_ENABLE = ""; NIX_HARDENING_ENABLE = "";
}; };
}); });


@@ -708,6 +708,7 @@ fn ddl_request_type(request: &DdlRequest) -> &'static str {
Some(Expr::CreateView(_)) => "ddl.create_view", Some(Expr::CreateView(_)) => "ddl.create_view",
Some(Expr::DropView(_)) => "ddl.drop_view", Some(Expr::DropView(_)) => "ddl.drop_view",
Some(Expr::AlterDatabase(_)) => "ddl.alter_database", Some(Expr::AlterDatabase(_)) => "ddl.alter_database",
Some(Expr::CommentOn(_)) => "ddl.comment_on",
None => "ddl.empty", None => "ddl.empty",
} }
} }


@@ -15,11 +15,11 @@ workspace = true
api.workspace = true api.workspace = true
async-trait.workspace = true async-trait.workspace = true
common-base.workspace = true common-base.workspace = true
common-config.workspace = true
common-error.workspace = true common-error.workspace = true
common-macro.workspace = true common-macro.workspace = true
common-telemetry.workspace = true common-telemetry.workspace = true
digest = "0.10" digest = "0.10"
notify.workspace = true
sha1 = "0.10" sha1 = "0.10"
snafu.workspace = true snafu.workspace = true
sql.workspace = true sql.workspace = true


@@ -75,11 +75,12 @@ pub enum Error {
username: String, username: String,
}, },
#[snafu(display("Failed to initialize a watcher for file {}", path))] #[snafu(display("Failed to initialize a file watcher"))]
FileWatch { FileWatch {
path: String,
#[snafu(source)] #[snafu(source)]
error: notify::Error, source: common_config::error::Error,
#[snafu(implicit)]
location: Location,
}, },
#[snafu(display("User is not authorized to perform this action"))] #[snafu(display("User is not authorized to perform this action"))]


@@ -12,16 +12,14 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.

-use std::path::Path;
-use std::sync::mpsc::channel;
 use std::sync::{Arc, Mutex};

 use async_trait::async_trait;
+use common_config::file_watcher::{FileWatcherBuilder, FileWatcherConfig};
 use common_telemetry::{info, warn};
-use notify::{EventKind, RecursiveMode, Watcher};
-use snafu::{ResultExt, ensure};
+use snafu::ResultExt;

-use crate::error::{FileWatchSnafu, InvalidConfigSnafu, Result};
+use crate::error::{FileWatchSnafu, Result};
 use crate::user_provider::{UserInfoMap, authenticate_with_credential, load_credential_from_file};
 use crate::{Identity, Password, UserInfoRef, UserProvider};
@@ -41,61 +39,36 @@ impl WatchFileUserProvider {
     pub fn new(filepath: &str) -> Result<Self> {
         let credential = load_credential_from_file(filepath)?;
         let users = Arc::new(Mutex::new(credential));
-        let this = WatchFileUserProvider {
-            users: users.clone(),
-        };
-        let (tx, rx) = channel::<notify::Result<notify::Event>>();
-        let mut debouncer =
-            notify::recommended_watcher(tx).context(FileWatchSnafu { path: "<none>" })?;
-        let mut dir = Path::new(filepath).to_path_buf();
-        ensure!(
-            dir.pop(),
-            InvalidConfigSnafu {
-                value: filepath,
-                msg: "UserProvider path must be a file path",
-            }
-        );
-        debouncer
-            .watch(&dir, RecursiveMode::NonRecursive)
-            .context(FileWatchSnafu { path: filepath })?;
-        let filepath = filepath.to_string();
-        std::thread::spawn(move || {
-            let filename = Path::new(&filepath).file_name();
-            let _hold = debouncer;
-            while let Ok(res) = rx.recv() {
-                if let Ok(event) = res {
-                    let is_this_file = event.paths.iter().any(|p| p.file_name() == filename);
-                    let is_relevant_event = matches!(
-                        event.kind,
-                        EventKind::Modify(_) | EventKind::Create(_) | EventKind::Remove(_)
-                    );
-                    if is_this_file && is_relevant_event {
-                        info!(?event.kind, "User provider file {} changed", &filepath);
-                        match load_credential_from_file(&filepath) {
-                            Ok(credential) => {
-                                let mut users =
-                                    users.lock().expect("users credential must be valid");
-                                #[cfg(not(test))]
-                                info!("User provider file {filepath} reloaded");
-                                #[cfg(test)]
-                                info!("User provider file {filepath} reloaded: {credential:?}");
-                                *users = credential;
-                            }
-                            Err(err) => {
-                                warn!(
-                                    ?err,
-                                    "Fail to load credential from file {filepath}; keep the old one",
-                                )
-                            }
-                        }
-                    }
-                }
-            }
-        });
-        Ok(this)
+        let users_clone = users.clone();
+        let filepath_owned = filepath.to_string();
+        FileWatcherBuilder::new()
+            .watch_path(filepath)
+            .context(FileWatchSnafu)?
+            .config(FileWatcherConfig::new())
+            .spawn(move || match load_credential_from_file(&filepath_owned) {
+                Ok(credential) => {
+                    let mut users = users_clone.lock().expect("users credential must be valid");
+                    #[cfg(not(test))]
+                    info!("User provider file {} reloaded", &filepath_owned);
+                    #[cfg(test)]
+                    info!(
+                        "User provider file {} reloaded: {:?}",
+                        &filepath_owned, credential
+                    );
+                    *users = credential;
+                }
+                Err(err) => {
+                    warn!(
+                        ?err,
+                        "Fail to load credential from file {}; keep the old one", &filepath_owned
+                    )
+                }
+            })
+            .context(FileWatchSnafu)?;
+        Ok(WatchFileUserProvider { users })
     }
 }
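The rewrite above delegates all of the notify plumbing to the new common_config::file_watcher module, whose diff appears further below. For context, a minimal sketch of that builder API in isolation, using only the signatures visible in this diff (watch_for_reload and its body are hypothetical):

use common_config::error::Result;
use common_config::file_watcher::{FileWatcherBuilder, FileWatcherConfig};

// Hypothetical hot-reload hook: run a callback whenever `path` changes on disk.
fn watch_for_reload(path: &str) -> Result<()> {
    let path_owned = path.to_string();
    FileWatcherBuilder::new()
        .watch_path(path)? // errors if `path` is a directory or cannot be canonicalized
        .config(FileWatcherConfig::new()) // default config: Modify/Create events only
        .spawn(move || {
            // Called for every relevant event on the watched file; the parent
            // directory is monitored, so delete-and-recreate is caught as well.
            println!("{path_owned} changed; reloading");
        })
}

Because spawn takes a plain Fn() + Send + 'static callback and owns the watcher thread, callers such as WatchFileUserProvider no longer expose notify types in their error surface; the auth crate now wraps common_config::error::Error instead.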


@@ -428,7 +428,7 @@ pub trait InformationExtension {
} }
/// The request to inspect the datanode. /// The request to inspect the datanode.
#[derive(Debug, Clone, PartialEq, Eq)] #[derive(Debug, Clone, PartialEq)]
pub struct DatanodeInspectRequest { pub struct DatanodeInspectRequest {
/// Kind to fetch from datanode. /// Kind to fetch from datanode.
pub kind: DatanodeInspectKind, pub kind: DatanodeInspectKind,


@@ -145,6 +145,17 @@ impl ObjbenchCommand {
let region_meta = extract_region_metadata(&self.source, &parquet_meta)?; let region_meta = extract_region_metadata(&self.source, &parquet_meta)?;
let num_rows = parquet_meta.file_metadata().num_rows() as u64; let num_rows = parquet_meta.file_metadata().num_rows() as u64;
let num_row_groups = parquet_meta.num_row_groups() as u64; let num_row_groups = parquet_meta.num_row_groups() as u64;
let max_row_group_uncompressed_size: u64 = parquet_meta
.row_groups()
.iter()
.map(|rg| {
rg.columns()
.iter()
.map(|c| c.uncompressed_size() as u64)
.sum::<u64>()
})
.max()
.unwrap_or(0);
println!( println!(
"{} Metadata loaded - rows: {}, size: {} bytes", "{} Metadata loaded - rows: {}, size: {} bytes",
@@ -160,10 +171,11 @@ impl ObjbenchCommand {
time_range: Default::default(), time_range: Default::default(),
level: 0, level: 0,
file_size, file_size,
max_row_group_uncompressed_size,
available_indexes: Default::default(), available_indexes: Default::default(),
indexes: Default::default(), indexes: Default::default(),
index_file_size: 0, index_file_size: 0,
index_file_id: None, index_version: 0,
num_rows, num_rows,
num_row_groups, num_row_groups,
sequence: None, sequence: None,
@@ -564,7 +576,7 @@ fn new_noop_file_purger() -> FilePurgerRef {
#[derive(Debug)] #[derive(Debug)]
struct Noop; struct Noop;
impl FilePurger for Noop { impl FilePurger for Noop {
fn remove_file(&self, _file_meta: FileMeta, _is_delete: bool) {} fn remove_file(&self, _file_meta: FileMeta, _is_delete: bool, _index_outdated: bool) {}
} }
Arc::new(Noop) Arc::new(Noop)
} }


@@ -35,6 +35,7 @@ use common_meta::cache::{CacheRegistryBuilder, LayeredCacheRegistryBuilder};
use common_meta::heartbeat::handler::HandlerGroupExecutor; use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_meta::heartbeat::handler::invalidate_table_cache::InvalidateCacheHandler; use common_meta::heartbeat::handler::invalidate_table_cache::InvalidateCacheHandler;
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler; use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::suspend::SuspendHandler;
use common_query::prelude::set_default_prefix; use common_query::prelude::set_default_prefix;
use common_stat::ResourceStatImpl; use common_stat::ResourceStatImpl;
use common_telemetry::info; use common_telemetry::info;
@@ -45,13 +46,13 @@ use frontend::frontend::Frontend;
use frontend::heartbeat::HeartbeatTask; use frontend::heartbeat::HeartbeatTask;
use frontend::instance::builder::FrontendBuilder; use frontend::instance::builder::FrontendBuilder;
use frontend::server::Services; use frontend::server::Services;
use meta_client::{MetaClientOptions, MetaClientType}; use meta_client::{MetaClientOptions, MetaClientRef, MetaClientType};
use plugins::frontend::context::{ use plugins::frontend::context::{
CatalogManagerConfigureContext, DistributedCatalogManagerConfigureContext, CatalogManagerConfigureContext, DistributedCatalogManagerConfigureContext,
}; };
use servers::addrs; use servers::addrs;
use servers::grpc::GrpcOptions; use servers::grpc::GrpcOptions;
use servers::tls::{TlsMode, TlsOption}; use servers::tls::{TlsMode, TlsOption, merge_tls_option};
use snafu::{OptionExt, ResultExt}; use snafu::{OptionExt, ResultExt};
use tracing_appender::non_blocking::WorkerGuard; use tracing_appender::non_blocking::WorkerGuard;
@@ -255,7 +256,7 @@ impl StartCommand {
if let Some(addr) = &self.rpc_bind_addr { if let Some(addr) = &self.rpc_bind_addr {
opts.grpc.bind_addr.clone_from(addr); opts.grpc.bind_addr.clone_from(addr);
opts.grpc.tls = tls_opts.clone(); opts.grpc.tls = merge_tls_option(&opts.grpc.tls, tls_opts.clone());
} }
if let Some(addr) = &self.rpc_server_addr { if let Some(addr) = &self.rpc_server_addr {
@@ -290,13 +291,13 @@ impl StartCommand {
if let Some(addr) = &self.mysql_addr { if let Some(addr) = &self.mysql_addr {
opts.mysql.enable = true; opts.mysql.enable = true;
opts.mysql.addr.clone_from(addr); opts.mysql.addr.clone_from(addr);
opts.mysql.tls = tls_opts.clone(); opts.mysql.tls = merge_tls_option(&opts.mysql.tls, tls_opts.clone());
} }
if let Some(addr) = &self.postgres_addr { if let Some(addr) = &self.postgres_addr {
opts.postgres.enable = true; opts.postgres.enable = true;
opts.postgres.addr.clone_from(addr); opts.postgres.addr.clone_from(addr);
opts.postgres.tls = tls_opts; opts.postgres.tls = merge_tls_option(&opts.postgres.tls, tls_opts.clone());
} }
if let Some(enable) = self.influxdb_enable { if let Some(enable) = self.influxdb_enable {
@@ -440,30 +441,13 @@ impl StartCommand {
}; };
let catalog_manager = builder.build(); let catalog_manager = builder.build();
let executor = HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),
Arc::new(InvalidateCacheHandler::new(layered_cache_registry.clone())),
]);
let mut resource_stat = ResourceStatImpl::default();
resource_stat.start_collect_cpu_usage();
let heartbeat_task = HeartbeatTask::new(
&opts,
meta_client.clone(),
opts.heartbeat.clone(),
Arc::new(executor),
Arc::new(resource_stat),
);
let heartbeat_task = Some(heartbeat_task);
let instance = FrontendBuilder::new( let instance = FrontendBuilder::new(
opts.clone(), opts.clone(),
cached_meta_backend.clone(), cached_meta_backend.clone(),
layered_cache_registry.clone(), layered_cache_registry.clone(),
catalog_manager, catalog_manager,
client, client,
meta_client, meta_client.clone(),
process_manager, process_manager,
) )
.with_plugin(plugins.clone()) .with_plugin(plugins.clone())
@@ -471,6 +455,9 @@ impl StartCommand {
.try_build() .try_build()
.await .await
.context(error::StartFrontendSnafu)?; .context(error::StartFrontendSnafu)?;
let heartbeat_task = Some(create_heartbeat_task(&opts, meta_client, &instance));
let instance = Arc::new(instance); let instance = Arc::new(instance);
let servers = Services::new(opts, instance.clone(), plugins) let servers = Services::new(opts, instance.clone(), plugins)
@@ -487,6 +474,28 @@ impl StartCommand {
} }
} }
pub fn create_heartbeat_task(
options: &frontend::frontend::FrontendOptions,
meta_client: MetaClientRef,
instance: &frontend::instance::Instance,
) -> HeartbeatTask {
let executor = Arc::new(HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),
Arc::new(SuspendHandler::new(instance.suspend_state())),
Arc::new(InvalidateCacheHandler::new(
instance.cache_invalidator().clone(),
)),
]));
let stat = {
let mut stat = ResourceStatImpl::default();
stat.start_collect_cpu_usage();
Arc::new(stat)
};
HeartbeatTask::new(options, meta_client, executor, stat)
}
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {
use std::io::Write; use std::io::Write;


@@ -62,7 +62,7 @@ use plugins::frontend::context::{
CatalogManagerConfigureContext, StandaloneCatalogManagerConfigureContext, CatalogManagerConfigureContext, StandaloneCatalogManagerConfigureContext,
}; };
use plugins::standalone::context::DdlManagerConfigureContext; use plugins::standalone::context::DdlManagerConfigureContext;
use servers::tls::{TlsMode, TlsOption}; use servers::tls::{TlsMode, TlsOption, merge_tls_option};
use snafu::ResultExt; use snafu::ResultExt;
use standalone::StandaloneInformationExtension; use standalone::StandaloneInformationExtension;
use standalone::options::StandaloneOptions; use standalone::options::StandaloneOptions;
@@ -293,19 +293,20 @@ impl StartCommand {
), ),
}.fail(); }.fail();
} }
opts.grpc.bind_addr.clone_from(addr) opts.grpc.bind_addr.clone_from(addr);
opts.grpc.tls = merge_tls_option(&opts.grpc.tls, tls_opts.clone());
} }
if let Some(addr) = &self.mysql_addr { if let Some(addr) = &self.mysql_addr {
opts.mysql.enable = true; opts.mysql.enable = true;
opts.mysql.addr.clone_from(addr); opts.mysql.addr.clone_from(addr);
opts.mysql.tls = tls_opts.clone(); opts.mysql.tls = merge_tls_option(&opts.mysql.tls, tls_opts.clone());
} }
if let Some(addr) = &self.postgres_addr { if let Some(addr) = &self.postgres_addr {
opts.postgres.enable = true; opts.postgres.enable = true;
opts.postgres.addr.clone_from(addr); opts.postgres.addr.clone_from(addr);
opts.postgres.tls = tls_opts; opts.postgres.tls = merge_tls_option(&opts.postgres.tls, tls_opts.clone());
} }
if self.influxdb_enable { if self.influxdb_enable {
@@ -765,7 +766,6 @@ mod tests {
user_provider: Some("static_user_provider:cmd:test=test".to_string()), user_provider: Some("static_user_provider:cmd:test=test".to_string()),
mysql_addr: Some("127.0.0.1:4002".to_string()), mysql_addr: Some("127.0.0.1:4002".to_string()),
postgres_addr: Some("127.0.0.1:4003".to_string()), postgres_addr: Some("127.0.0.1:4003".to_string()),
tls_watch: true,
..Default::default() ..Default::default()
}; };
@@ -782,8 +782,6 @@ mod tests {
assert_eq!("./greptimedb_data/test/logs", opts.logging.dir); assert_eq!("./greptimedb_data/test/logs", opts.logging.dir);
assert_eq!("debug", opts.logging.level.unwrap()); assert_eq!("debug", opts.logging.level.unwrap());
assert!(opts.mysql.tls.watch);
assert!(opts.postgres.tls.watch);
} }
#[test] #[test]


@@ -11,8 +11,10 @@ workspace = true
common-base.workspace = true common-base.workspace = true
common-error.workspace = true common-error.workspace = true
common-macro.workspace = true common-macro.workspace = true
common-telemetry.workspace = true
config.workspace = true config.workspace = true
humantime-serde.workspace = true humantime-serde.workspace = true
notify.workspace = true
object-store.workspace = true object-store.workspace = true
serde.workspace = true serde.workspace = true
serde_json.workspace = true serde_json.workspace = true


@@ -49,14 +49,41 @@ pub enum Error {
#[snafu(implicit)] #[snafu(implicit)]
location: Location, location: Location,
}, },
#[snafu(display("Failed to watch file: {}", path))]
FileWatch {
path: String,
#[snafu(source)]
error: notify::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to canonicalize path: {}", path))]
CanonicalizePath {
path: String,
#[snafu(source)]
error: std::io::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid path '{}': expected a file, not a directory", path))]
InvalidPath {
path: String,
#[snafu(implicit)]
location: Location,
},
} }
impl ErrorExt for Error { impl ErrorExt for Error {
fn status_code(&self) -> StatusCode { fn status_code(&self) -> StatusCode {
match self { match self {
-            Error::TomlFormat { .. } | Error::LoadLayeredConfig { .. } => {
-                StatusCode::InvalidArguments
-            }
+            Error::TomlFormat { .. }
+            | Error::LoadLayeredConfig { .. }
+            | Error::FileWatch { .. }
+            | Error::InvalidPath { .. }
+            | Error::CanonicalizePath { .. } => StatusCode::InvalidArguments,
Error::SerdeJson { .. } => StatusCode::Unexpected, Error::SerdeJson { .. } => StatusCode::Unexpected,
} }
} }


@@ -0,0 +1,355 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Common file watching utilities for configuration hot-reloading.
//!
//! This module provides a generic file watcher that can be used to watch
//! files for changes and trigger callbacks when changes occur.
//!
//! The watcher monitors the parent directory of each file rather than the
//! file itself. This ensures that file deletions and recreations are properly
//! tracked, which is common with editors that use atomic saves or when
//! configuration files are replaced.
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::mpsc::channel;
use common_telemetry::{error, info, warn};
use notify::{EventKind, RecursiveMode, Watcher};
use snafu::ResultExt;
use crate::error::{CanonicalizePathSnafu, FileWatchSnafu, InvalidPathSnafu, Result};
/// Configuration for the file watcher behavior.
#[derive(Debug, Clone, Default)]
pub struct FileWatcherConfig {
/// Whether to include Remove events in addition to Modify and Create.
pub include_remove_events: bool,
}
impl FileWatcherConfig {
pub fn new() -> Self {
Self::default()
}
pub fn with_modify_and_create(mut self) -> Self {
self.include_remove_events = false;
self
}
pub fn with_remove_events(mut self) -> Self {
self.include_remove_events = true;
self
}
}
/// A builder for creating file watchers with flexible configuration.
///
/// The watcher monitors the parent directory of each file to handle file
/// deletion and recreation properly. Events are filtered to only trigger
/// callbacks for the specific files being watched.
pub struct FileWatcherBuilder {
config: FileWatcherConfig,
/// Canonicalized paths of files to watch.
file_paths: Vec<PathBuf>,
}
impl FileWatcherBuilder {
/// Create a new builder with default configuration.
pub fn new() -> Self {
Self {
config: FileWatcherConfig::default(),
file_paths: Vec::new(),
}
}
/// Set the watcher configuration.
pub fn config(mut self, config: FileWatcherConfig) -> Self {
self.config = config;
self
}
/// Add a file path to watch.
///
/// Returns an error if the path is a directory.
/// The path is canonicalized for reliable comparison with events.
pub fn watch_path<P: AsRef<Path>>(mut self, path: P) -> Result<Self> {
let path = path.as_ref();
snafu::ensure!(
path.is_file(),
InvalidPathSnafu {
path: path.display().to_string(),
}
);
// Canonicalize the path for reliable comparison with event paths
let canonical = path.canonicalize().context(CanonicalizePathSnafu {
path: path.display().to_string(),
})?;
self.file_paths.push(canonical);
Ok(self)
}
/// Add multiple file paths to watch.
///
/// Returns an error if any path is a directory.
pub fn watch_paths<P: AsRef<Path>, I: IntoIterator<Item = P>>(
mut self,
paths: I,
) -> Result<Self> {
for path in paths {
self = self.watch_path(path)?;
}
Ok(self)
}
/// Build and spawn the file watcher with the given callback.
///
/// The callback is invoked when relevant file events are detected for
/// the watched files. The watcher monitors the parent directories to
/// handle file deletion and recreation properly.
///
/// The spawned watcher thread runs for the lifetime of the process.
pub fn spawn<F>(self, callback: F) -> Result<()>
where
F: Fn() + Send + 'static,
{
let (tx, rx) = channel::<notify::Result<notify::Event>>();
let mut watcher =
notify::recommended_watcher(tx).context(FileWatchSnafu { path: "<none>" })?;
// Collect unique parent directories to watch
let mut watched_dirs: HashSet<PathBuf> = HashSet::new();
for file_path in &self.file_paths {
if let Some(parent) = file_path.parent()
&& watched_dirs.insert(parent.to_path_buf())
{
watcher
.watch(parent, RecursiveMode::NonRecursive)
.context(FileWatchSnafu {
path: parent.display().to_string(),
})?;
}
}
let config = self.config;
let watched_files: HashSet<PathBuf> = self.file_paths.iter().cloned().collect();
info!(
"Spawning file watcher for paths: {:?} (watching parent directories)",
self.file_paths
.iter()
.map(|p| p.display().to_string())
.collect::<Vec<_>>()
);
std::thread::spawn(move || {
// Keep watcher alive in the thread
let _watcher = watcher;
while let Ok(res) = rx.recv() {
match res {
Ok(event) => {
if !is_relevant_event(&event.kind, &config) {
continue;
}
// Check if any of the event paths match our watched files
let is_watched_file = event.paths.iter().any(|event_path| {
// Try to canonicalize the event path for comparison
// If the file was deleted, canonicalize will fail, so we also
// compare the raw path
if let Ok(canonical) = event_path.canonicalize()
&& watched_files.contains(&canonical)
{
return true;
}
// For deleted files, compare using the raw path
watched_files.contains(event_path)
});
if !is_watched_file {
continue;
}
info!(?event.kind, ?event.paths, "Detected file change");
callback();
}
Err(err) => {
warn!("File watcher error: {}", err);
}
}
}
error!("File watcher channel closed unexpectedly");
});
Ok(())
}
}
impl Default for FileWatcherBuilder {
fn default() -> Self {
Self::new()
}
}
/// Check if an event kind is relevant based on the configuration.
fn is_relevant_event(kind: &EventKind, config: &FileWatcherConfig) -> bool {
match kind {
EventKind::Modify(_) | EventKind::Create(_) => true,
EventKind::Remove(_) => config.include_remove_events,
_ => false,
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Duration;
use common_test_util::temp_dir::create_temp_dir;
use super::*;
#[test]
fn test_file_watcher_detects_changes() {
common_telemetry::init_default_ut_logging();
let dir = create_temp_dir("test_file_watcher");
let file_path = dir.path().join("test_file.txt");
// Create initial file
std::fs::write(&file_path, "initial content").unwrap();
let counter = Arc::new(AtomicUsize::new(0));
let counter_clone = counter.clone();
FileWatcherBuilder::new()
.watch_path(&file_path)
.unwrap()
.config(FileWatcherConfig::new())
.spawn(move || {
counter_clone.fetch_add(1, Ordering::SeqCst);
})
.unwrap();
// Give watcher time to start
std::thread::sleep(Duration::from_millis(100));
// Modify the file
std::fs::write(&file_path, "modified content").unwrap();
// Wait for the event to be processed
std::thread::sleep(Duration::from_millis(500));
assert!(
counter.load(Ordering::SeqCst) >= 1,
"Watcher should have detected at least one change"
);
}
#[test]
fn test_file_watcher_detects_delete_and_recreate() {
common_telemetry::init_default_ut_logging();
let dir = create_temp_dir("test_file_watcher_recreate");
let file_path = dir.path().join("test_file.txt");
// Create initial file
std::fs::write(&file_path, "initial content").unwrap();
let counter = Arc::new(AtomicUsize::new(0));
let counter_clone = counter.clone();
FileWatcherBuilder::new()
.watch_path(&file_path)
.unwrap()
.config(FileWatcherConfig::new())
.spawn(move || {
counter_clone.fetch_add(1, Ordering::SeqCst);
})
.unwrap();
// Give watcher time to start
std::thread::sleep(Duration::from_millis(100));
// Delete the file
std::fs::remove_file(&file_path).unwrap();
std::thread::sleep(Duration::from_millis(100));
// Recreate the file - this should still be detected because we watch the directory
std::fs::write(&file_path, "recreated content").unwrap();
// Wait for the event to be processed
std::thread::sleep(Duration::from_millis(500));
assert!(
counter.load(Ordering::SeqCst) >= 1,
"Watcher should have detected file recreation"
);
}
#[test]
fn test_file_watcher_ignores_other_files() {
common_telemetry::init_default_ut_logging();
let dir = create_temp_dir("test_file_watcher_other");
let watched_file = dir.path().join("watched.txt");
let other_file = dir.path().join("other.txt");
// Create both files
std::fs::write(&watched_file, "watched content").unwrap();
std::fs::write(&other_file, "other content").unwrap();
let counter = Arc::new(AtomicUsize::new(0));
let counter_clone = counter.clone();
FileWatcherBuilder::new()
.watch_path(&watched_file)
.unwrap()
.config(FileWatcherConfig::new())
.spawn(move || {
counter_clone.fetch_add(1, Ordering::SeqCst);
})
.unwrap();
// Give watcher time to start
std::thread::sleep(Duration::from_millis(100));
// Modify the other file - should NOT trigger callback
std::fs::write(&other_file, "modified other content").unwrap();
// Wait for potential event
std::thread::sleep(Duration::from_millis(500));
assert_eq!(
counter.load(Ordering::SeqCst),
0,
"Watcher should not have detected changes to other files"
);
// Now modify the watched file - SHOULD trigger callback
std::fs::write(&watched_file, "modified watched content").unwrap();
// Wait for the event to be processed
std::thread::sleep(Duration::from_millis(500));
assert!(
counter.load(Ordering::SeqCst) >= 1,
"Watcher should have detected change to watched file"
);
}
}
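
A minimal usage sketch of the new builder. The caller (`watch_config_file`) and its path argument are hypothetical; the builder API itself is taken from the file above. Opting into Remove events means deletions of the file also fire the callback, on top of the delete-then-recreate handling the directory watch already provides.

use common_config::error::Result;
use common_config::file_watcher::{FileWatcherBuilder, FileWatcherConfig};

// Hypothetical caller: hot-reload a single config file, including the
// atomic-save (replace) and delete/recreate patterns described above.
fn watch_config_file(path: &str) -> Result<()> {
    FileWatcherBuilder::new()
        .watch_path(path)?
        .config(FileWatcherConfig::new().with_remove_events())
        .spawn(|| {
            // Re-read and apply the configuration here.
            common_telemetry::info!("config file changed");
        })
}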

View File

@@ -14,6 +14,7 @@
pub mod config;
pub mod error;
pub mod file_watcher;
use std::time::Duration;

View File

@@ -21,6 +21,8 @@ pub mod status_code;
use http::{HeaderMap, HeaderValue};
pub use snafu;
use crate::status_code::StatusCode;
// HACK - these headers are here for shared in gRPC services. For common HTTP headers,
// please define in `src/servers/src/http/header.rs`.
pub const GREPTIME_DB_HEADER_ERROR_CODE: &str = "x-greptime-err-code";
@@ -46,6 +48,29 @@ pub fn from_err_code_msg_to_header(code: u32, msg: &str) -> HeaderMap {
header
}
/// Extract [StatusCode] and error message from [HeaderMap], if any.
///
/// Note that if the [StatusCode] is illegal, for example, a random number that is not pre-defined
/// as a [StatusCode], the result is still `None`.
pub fn from_header_to_err_code_msg(headers: &HeaderMap) -> Option<(StatusCode, &str)> {
let code = headers
.get(GREPTIME_DB_HEADER_ERROR_CODE)
.and_then(|value| {
value
.to_str()
.ok()
.and_then(|x| x.parse::<u32>().ok())
.and_then(StatusCode::from_u32)
});
let msg = headers
.get(GREPTIME_DB_HEADER_ERROR_MSG)
.and_then(|x| x.to_str().ok());
match (code, msg) {
(Some(code), Some(msg)) => Some((code, msg)),
_ => None,
}
}
/// Returns the external root cause of the source error (exclude the current error).
pub fn root_source(err: &dyn std::error::Error) -> Option<&dyn std::error::Error> {
// There are some divergence about the behavior of the `sources()` API
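
A unit-test-style sketch of the round trip through the two helpers. It assumes the existing encoder `from_err_code_msg_to_header` (whose signature appears in the hunk header) sets both the code and message headers, and that `StatusCode` implements `PartialEq`:

#[test]
fn err_code_header_round_trip() {
    use common_error::status_code::StatusCode;
    use common_error::{from_err_code_msg_to_header, from_header_to_err_code_msg};

    let headers = from_err_code_msg_to_header(StatusCode::Suspended as u32, "quota exceeded");
    // A well-formed pair decodes back; an unknown numeric code would yield None.
    let (code, msg) = from_header_to_err_code_msg(&headers).unwrap();
    assert_eq!(code, StatusCode::Suspended);
    assert_eq!(msg, "quota exceeded");
}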

View File

@@ -42,6 +42,8 @@ pub enum StatusCode {
External = 1007,
/// The request is deadline exceeded (typically server-side).
DeadlineExceeded = 1008,
/// Service got suspended for various reason. For example, resources exceed limit.
Suspended = 1009,
// ====== End of common status code ================
// ====== Begin of SQL related status code =========
@@ -175,7 +177,8 @@ impl StatusCode {
| StatusCode::AccessDenied
| StatusCode::PermissionDenied
| StatusCode::RequestOutdated
-| StatusCode::External => false,
+| StatusCode::External
+| StatusCode::Suspended => false,
}
}
@@ -223,7 +226,8 @@ impl StatusCode {
| StatusCode::InvalidAuthHeader
| StatusCode::AccessDenied
| StatusCode::PermissionDenied
-| StatusCode::RequestOutdated => false,
+| StatusCode::RequestOutdated
+| StatusCode::Suspended => false,
}
}
@@ -347,7 +351,8 @@ pub fn status_to_tonic_code(status_code: StatusCode) -> Code {
| StatusCode::RegionNotReady => Code::Unavailable,
StatusCode::RuntimeResourcesExhausted
| StatusCode::RateLimited
-| StatusCode::RegionBusy
+| StatusCode::RegionBusy
+| StatusCode::Suspended => Code::ResourceExhausted,
StatusCode::UnsupportedPasswordType
| StatusCode::UserPasswordMismatch
| StatusCode::AuthHeaderNotFound
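
A small sketch pinning down the new code's wire behavior, assuming `Code` in the function signature above is `tonic::Code` as the name suggests. The asserts follow directly from the hunks: `Suspended = 1009`, grouped with the resource-exhaustion codes:

#[test]
fn suspended_code_mapping() {
    use common_error::status_code::{StatusCode, status_to_tonic_code};

    // 1009 round-trips through from_u32, and the tonic mapping groups
    // Suspended with the other resource-exhaustion codes.
    assert_eq!(StatusCode::from_u32(1009), Some(StatusCode::Suspended));
    assert_eq!(
        status_to_tonic_code(StatusCode::Suspended),
        tonic::Code::ResourceExhausted
    );
}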

View File

@@ -39,7 +39,7 @@ datafusion-functions-aggregate-common.workspace = true
datafusion-pg-catalog.workspace = true
datafusion-physical-expr.workspace = true
datatypes.workspace = true
-derive_more = { version = "1", default-features = false, features = ["display"] }
+derive_more.workspace = true
geo = { version = "0.29", optional = true }
geo-types = { version = "0.7", optional = true }
geohash = { version = "0.13", optional = true }

View File

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::Display;
use std::sync::Arc;
use datafusion_common::arrow::array::{Array, AsArray, BooleanBuilder};

View File

@@ -387,6 +387,8 @@ impl PGCatalogFunction {
registry.register(pg_catalog::create_pg_stat_get_numscans());
registry.register(pg_catalog::create_pg_get_constraintdef());
registry.register(pg_catalog::create_pg_get_partition_ancestors_udf());
registry.register(pg_catalog::quote_ident_udf::create_quote_ident_udf());
registry.register(pg_catalog::quote_ident_udf::create_parse_ident_udf());
registry.register_scalar(ObjDescriptionFunction::new());
registry.register_scalar(ColDescriptionFunction::new());
registry.register_scalar(ShobjDescriptionFunction::new());

View File

@@ -12,6 +12,7 @@ api.workspace = true
arrow-flight.workspace = true
bytes.workspace = true
common-base.workspace = true
common-config.workspace = true
common-error.workspace = true
common-macro.workspace = true
common-recordbatch.workspace = true
@@ -23,7 +24,6 @@ datatypes.workspace = true
flatbuffers = "25.2"
hyper.workspace = true
lazy_static.workspace = true
-notify.workspace = true
prost.workspace = true
serde.workspace = true
serde_json.workspace = true

View File

@@ -38,11 +38,10 @@ pub enum Error {
location: Location,
},
-#[snafu(display("Failed to watch config file path: {}", path))]
+#[snafu(display("Failed to watch config file"))]
FileWatch {
-path: String,
#[snafu(source)]
-error: notify::Error,
+source: common_config::error::Error,
#[snafu(implicit)]
location: Location,
},

View File

@@ -15,11 +15,10 @@
use std::path::Path;
use std::result::Result as StdResult;
use std::sync::atomic::{AtomicUsize, Ordering};
-use std::sync::mpsc::channel;
use std::sync::{Arc, RwLock};
+use common_config::file_watcher::{FileWatcherBuilder, FileWatcherConfig};
use common_telemetry::{error, info};
-use notify::{EventKind, RecursiveMode, Watcher};
use snafu::ResultExt;
use crate::error::{FileWatchSnafu, Result};
@@ -119,45 +118,28 @@ where
return Ok(());
}
+let watch_paths: Vec<_> = tls_config
+.get_tls_option()
+.watch_paths()
+.iter()
+.map(|p| p.to_path_buf())
+.collect();
let tls_config_for_watcher = tls_config.clone();
-let (tx, rx) = channel::<notify::Result<notify::Event>>();
-let mut watcher = notify::recommended_watcher(tx).context(FileWatchSnafu { path: "<none>" })?;
-// Watch all paths returned by the TlsConfigLoader
-for path in tls_config.get_tls_option().watch_paths() {
-watcher
-.watch(path, RecursiveMode::NonRecursive)
-.with_context(|_| FileWatchSnafu {
-path: path.display().to_string(),
-})?;
-}
-info!("Spawning background task for watching TLS cert/key file changes");
-std::thread::spawn(move || {
-let _watcher = watcher;
-loop {
-match rx.recv() {
-Ok(Ok(event)) => {
-if let EventKind::Modify(_) | EventKind::Create(_) = event.kind {
-info!("Detected TLS cert/key file change: {:?}", event);
+FileWatcherBuilder::new()
+.watch_paths(&watch_paths)
+.context(FileWatchSnafu)?
+.config(FileWatcherConfig::new())
+.spawn(move || {
if let Err(err) = tls_config_for_watcher.reload() {
error!("Failed to reload TLS config: {}", err);
} else {
info!("Reloaded TLS cert/key file successfully.");
on_reload();
}
-}
-}
-Ok(Err(err)) => {
-error!("Failed to watch TLS cert/key file: {}", err);
-}
-Err(err) => {
-error!("TLS cert/key file watcher channel closed: {}", err);
-}
-}
-}
-});
+})
+.context(FileWatchSnafu)?;
Ok(())
}

View File

@@ -0,0 +1,20 @@
[package]
name = "common-memory-manager"
version.workspace = true
edition.workspace = true
license.workspace = true
[lints]
workspace = true
[dependencies]
common-error = { workspace = true }
common-macro = { workspace = true }
common-telemetry = { workspace = true }
humantime = { workspace = true }
serde = { workspace = true }
snafu = { workspace = true }
tokio = { workspace = true, features = ["sync"] }
[dev-dependencies]
tokio = { workspace = true, features = ["rt", "macros"] }

View File

@@ -0,0 +1,53 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::any::Any;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
use common_macro::stack_trace_debug;
use snafu::Snafu;
pub type Result<T> = std::result::Result<T, Error>;
#[derive(Snafu)]
#[snafu(visibility(pub))]
#[stack_trace_debug]
pub enum Error {
#[snafu(display(
"Memory limit exceeded: requested {requested_bytes} bytes, limit {limit_bytes} bytes"
))]
MemoryLimitExceeded {
requested_bytes: u64,
limit_bytes: u64,
},
#[snafu(display("Memory semaphore unexpectedly closed"))]
MemorySemaphoreClosed,
}
impl ErrorExt for Error {
fn status_code(&self) -> StatusCode {
use Error::*;
match self {
MemoryLimitExceeded { .. } => StatusCode::RuntimeResourcesExhausted,
MemorySemaphoreClosed => StatusCode::Unexpected,
}
}
fn as_any(&self) -> &dyn Any {
self
}
}

View File

@@ -0,0 +1,138 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::{fmt, mem};
use common_telemetry::debug;
use tokio::sync::{OwnedSemaphorePermit, TryAcquireError};
use crate::manager::{MemoryMetrics, MemoryQuota, bytes_to_permits, permits_to_bytes};
/// Guard representing a slice of reserved memory.
pub struct MemoryGuard<M: MemoryMetrics> {
pub(crate) state: GuardState<M>,
}
pub(crate) enum GuardState<M: MemoryMetrics> {
Unlimited,
Limited {
permit: OwnedSemaphorePermit,
quota: MemoryQuota<M>,
},
}
impl<M: MemoryMetrics> MemoryGuard<M> {
pub(crate) fn unlimited() -> Self {
Self {
state: GuardState::Unlimited,
}
}
pub(crate) fn limited(permit: OwnedSemaphorePermit, quota: MemoryQuota<M>) -> Self {
Self {
state: GuardState::Limited { permit, quota },
}
}
/// Returns granted quota in bytes.
pub fn granted_bytes(&self) -> u64 {
match &self.state {
GuardState::Unlimited => 0,
GuardState::Limited { permit, .. } => permits_to_bytes(permit.num_permits() as u32),
}
}
/// Tries to allocate additional memory during task execution.
///
/// On success, merges the new memory into this guard and returns true.
/// On failure, returns false and leaves this guard unchanged.
pub fn request_additional(&mut self, bytes: u64) -> bool {
match &mut self.state {
GuardState::Unlimited => true,
GuardState::Limited { permit, quota } => {
if bytes == 0 {
return true;
}
let additional_permits = bytes_to_permits(bytes);
match quota
.semaphore
.clone()
.try_acquire_many_owned(additional_permits)
{
Ok(additional_permit) => {
permit.merge(additional_permit);
quota.update_in_use_metric();
debug!("Allocated additional {} bytes", bytes);
true
}
Err(TryAcquireError::NoPermits) | Err(TryAcquireError::Closed) => {
quota.metrics.inc_rejected("request_additional");
false
}
}
}
}
}
/// Releases a portion of granted memory back to the pool early,
/// before the guard is dropped.
///
/// Returns true if the release succeeds or is a no-op; false if the request exceeds granted.
pub fn early_release_partial(&mut self, bytes: u64) -> bool {
match &mut self.state {
GuardState::Unlimited => true,
GuardState::Limited { permit, quota } => {
if bytes == 0 {
return true;
}
let release_permits = bytes_to_permits(bytes);
match permit.split(release_permits as usize) {
Some(released_permit) => {
let released_bytes = permits_to_bytes(released_permit.num_permits() as u32);
drop(released_permit);
quota.update_in_use_metric();
debug!("Early released {} bytes from memory guard", released_bytes);
true
}
None => false,
}
}
}
}
}
impl<M: MemoryMetrics> Drop for MemoryGuard<M> {
fn drop(&mut self) {
if let GuardState::Limited { permit, quota } =
mem::replace(&mut self.state, GuardState::Unlimited)
{
let bytes = permits_to_bytes(permit.num_permits() as u32);
drop(permit);
quota.update_in_use_metric();
debug!("Released memory: {} bytes", bytes);
}
}
}
impl<M: MemoryMetrics> fmt::Debug for MemoryGuard<M> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.debug_struct("MemoryGuard")
.field("granted_bytes", &self.granted_bytes())
.finish()
}
}

View File

@@ -0,0 +1,47 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Generic memory management for resource-constrained operations.
//!
//! This crate provides a reusable memory quota system based on semaphores,
//! allowing different subsystems (compaction, flush, index build, etc.) to
//! share the same allocation logic while using their own metrics.
mod error;
mod guard;
mod manager;
mod policy;
#[cfg(test)]
mod tests;
pub use error::{Error, Result};
pub use guard::MemoryGuard;
pub use manager::{MemoryManager, MemoryMetrics, PERMIT_GRANULARITY_BYTES};
pub use policy::{DEFAULT_MEMORY_WAIT_TIMEOUT, OnExhaustedPolicy};
/// No-op metrics implementation for testing.
#[derive(Clone, Copy, Debug, Default)]
pub struct NoOpMetrics;
impl MemoryMetrics for NoOpMetrics {
#[inline(always)]
fn set_limit(&self, _: i64) {}
#[inline(always)]
fn set_in_use(&self, _: i64) {}
#[inline(always)]
fn inc_rejected(&self, _: &str) {}
}
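
For callers that want real metrics rather than `NoOpMetrics`, a hypothetical implementation of the `MemoryMetrics` trait (defined in `manager.rs` below) backed by atomics might look like the following sketch; a production integration would forward to Prometheus gauges and counters instead:

use std::sync::Arc;
use std::sync::atomic::{AtomicI64, Ordering};

use common_memory_manager::{MemoryManager, MemoryMetrics};

// Hypothetical metrics sink; a real integration would update gauges in
// set_limit/set_in_use and a labeled counter in inc_rejected.
#[derive(Clone, Default)]
struct AtomicMetrics {
    limit: Arc<AtomicI64>,
    in_use: Arc<AtomicI64>,
}

impl MemoryMetrics for AtomicMetrics {
    fn set_limit(&self, bytes: i64) {
        self.limit.store(bytes, Ordering::Relaxed);
    }
    fn set_in_use(&self, bytes: i64) {
        self.in_use.store(bytes, Ordering::Relaxed);
    }
    fn inc_rejected(&self, _reason: &str) {}
}

fn flush_memory_manager() -> MemoryManager<AtomicMetrics> {
    // 64 MiB limit; a limit of 0 would disable accounting entirely.
    MemoryManager::new(64 << 20, AtomicMetrics::default())
}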

View File

@@ -0,0 +1,173 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use snafu::ensure;
use tokio::sync::{Semaphore, TryAcquireError};
use crate::error::{MemoryLimitExceededSnafu, MemorySemaphoreClosedSnafu, Result};
use crate::guard::MemoryGuard;
/// Minimum bytes controlled by one semaphore permit.
pub const PERMIT_GRANULARITY_BYTES: u64 = 1 << 20; // 1 MB
/// Trait for recording memory usage metrics.
pub trait MemoryMetrics: Clone + Send + Sync + 'static {
fn set_limit(&self, bytes: i64);
fn set_in_use(&self, bytes: i64);
fn inc_rejected(&self, reason: &str);
}
/// Generic memory manager for quota-controlled operations.
#[derive(Clone)]
pub struct MemoryManager<M: MemoryMetrics> {
quota: Option<MemoryQuota<M>>,
}
#[derive(Clone)]
pub(crate) struct MemoryQuota<M: MemoryMetrics> {
pub(crate) semaphore: Arc<Semaphore>,
pub(crate) limit_permits: u32,
pub(crate) metrics: M,
}
impl<M: MemoryMetrics> MemoryManager<M> {
/// Creates a new memory manager with the given limit in bytes.
/// `limit_bytes = 0` disables the limit.
pub fn new(limit_bytes: u64, metrics: M) -> Self {
if limit_bytes == 0 {
metrics.set_limit(0);
return Self { quota: None };
}
let limit_permits = bytes_to_permits(limit_bytes);
let limit_aligned_bytes = permits_to_bytes(limit_permits);
metrics.set_limit(limit_aligned_bytes as i64);
Self {
quota: Some(MemoryQuota {
semaphore: Arc::new(Semaphore::new(limit_permits as usize)),
limit_permits,
metrics,
}),
}
}
/// Returns the configured limit in bytes (0 if unlimited).
pub fn limit_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.limit_permits))
.unwrap_or(0)
}
/// Returns currently used bytes.
pub fn used_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.used_permits()))
.unwrap_or(0)
}
/// Returns available bytes.
pub fn available_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.available_permits_clamped()))
.unwrap_or(0)
}
/// Acquires memory, waiting if necessary until enough is available.
///
/// # Errors
/// - Returns error if requested bytes exceed the total limit
/// - Returns error if the semaphore is unexpectedly closed
pub async fn acquire(&self, bytes: u64) -> Result<MemoryGuard<M>> {
match &self.quota {
None => Ok(MemoryGuard::unlimited()),
Some(quota) => {
let permits = bytes_to_permits(bytes);
ensure!(
permits <= quota.limit_permits,
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: permits_to_bytes(quota.limit_permits),
}
);
let permit = quota
.semaphore
.clone()
.acquire_many_owned(permits)
.await
.map_err(|_| MemorySemaphoreClosedSnafu.build())?;
quota.update_in_use_metric();
Ok(MemoryGuard::limited(permit, quota.clone()))
}
}
}
/// Tries to acquire memory. Returns Some(guard) on success, None if insufficient.
pub fn try_acquire(&self, bytes: u64) -> Option<MemoryGuard<M>> {
match &self.quota {
None => Some(MemoryGuard::unlimited()),
Some(quota) => {
let permits = bytes_to_permits(bytes);
match quota.semaphore.clone().try_acquire_many_owned(permits) {
Ok(permit) => {
quota.update_in_use_metric();
Some(MemoryGuard::limited(permit, quota.clone()))
}
Err(TryAcquireError::NoPermits) | Err(TryAcquireError::Closed) => {
quota.metrics.inc_rejected("try_acquire");
None
}
}
}
}
}
}
impl<M: MemoryMetrics> MemoryQuota<M> {
pub(crate) fn used_permits(&self) -> u32 {
self.limit_permits
.saturating_sub(self.available_permits_clamped())
}
pub(crate) fn available_permits_clamped(&self) -> u32 {
self.semaphore
.available_permits()
.min(self.limit_permits as usize) as u32
}
pub(crate) fn update_in_use_metric(&self) {
let bytes = permits_to_bytes(self.used_permits());
self.metrics.set_in_use(bytes as i64);
}
}
pub(crate) fn bytes_to_permits(bytes: u64) -> u32 {
bytes
.saturating_add(PERMIT_GRANULARITY_BYTES - 1)
.saturating_div(PERMIT_GRANULARITY_BYTES)
.min(Semaphore::MAX_PERMITS as u64)
.min(u32::MAX as u64) as u32
}
pub(crate) fn permits_to_bytes(permits: u32) -> u64 {
(permits as u64).saturating_mul(PERMIT_GRANULARITY_BYTES)
}
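
The 1 MiB permit granularity means every request rounds up to whole permits, and the limit reported via `set_limit` is aligned the same way. A test-style sketch of the observable effect, using only the crate's public API:

#[test]
fn one_byte_costs_a_full_permit() {
    use common_memory_manager::{MemoryManager, NoOpMetrics, PERMIT_GRANULARITY_BYTES};

    let manager = MemoryManager::new(PERMIT_GRANULARITY_BYTES, NoOpMetrics);
    // A 1-byte request rounds up to one 1 MiB permit...
    let guard = manager.try_acquire(1).unwrap();
    assert_eq!(guard.granted_bytes(), PERMIT_GRANULARITY_BYTES);
    // ...so the single permit of this 1 MiB pool is now fully consumed.
    assert_eq!(manager.available_bytes(), 0);
}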

View File

@@ -0,0 +1,83 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::time::Duration;
use humantime::{format_duration, parse_duration};
use serde::{Deserialize, Serialize};
/// Default wait timeout for memory acquisition.
pub const DEFAULT_MEMORY_WAIT_TIMEOUT: Duration = Duration::from_secs(10);
/// Defines how to react when memory cannot be acquired immediately.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnExhaustedPolicy {
/// Wait until enough memory is released, bounded by timeout.
Wait { timeout: Duration },
/// Fail immediately if memory is not available.
Fail,
}
impl Default for OnExhaustedPolicy {
fn default() -> Self {
OnExhaustedPolicy::Wait {
timeout: DEFAULT_MEMORY_WAIT_TIMEOUT,
}
}
}
impl Serialize for OnExhaustedPolicy {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let text = match self {
OnExhaustedPolicy::Fail => "fail".to_string(),
OnExhaustedPolicy::Wait { timeout } if *timeout == DEFAULT_MEMORY_WAIT_TIMEOUT => {
"wait".to_string()
}
OnExhaustedPolicy::Wait { timeout } => format!("wait({})", format_duration(*timeout)),
};
serializer.serialize_str(&text)
}
}
impl<'de> Deserialize<'de> for OnExhaustedPolicy {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let raw = String::deserialize(deserializer)?;
let lower = raw.to_ascii_lowercase();
// Accept both "skip" (legacy) and "fail".
if lower == "skip" || lower == "fail" {
return Ok(OnExhaustedPolicy::Fail);
}
if lower == "wait" {
return Ok(OnExhaustedPolicy::default());
}
if lower.starts_with("wait(") && lower.ends_with(')') {
let inner = &raw[5..raw.len() - 1];
let timeout = parse_duration(inner).map_err(serde::de::Error::custom)?;
return Ok(OnExhaustedPolicy::Wait { timeout });
}
Err(serde::de::Error::custom(format!(
"invalid memory policy: {}, expected wait, wait(<duration>), fail",
raw
)))
}
}
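
A test-style sketch of the accepted policy strings and their serialized forms; `serde_json` is used here purely for illustration (it is not a declared dependency of this crate):

#[test]
fn policy_string_round_trip() {
    use std::time::Duration;

    use common_memory_manager::OnExhaustedPolicy;

    let p: OnExhaustedPolicy = serde_json::from_str(r#""wait(500ms)""#).unwrap();
    assert_eq!(p, OnExhaustedPolicy::Wait { timeout: Duration::from_millis(500) });
    assert_eq!(serde_json::to_string(&p).unwrap(), r#""wait(500ms)""#);
    // Legacy "skip" still parses, but serializes back as "fail".
    let legacy: OnExhaustedPolicy = serde_json::from_str(r#""skip""#).unwrap();
    assert_eq!(serde_json::to_string(&legacy).unwrap(), r#""fail""#);
}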

View File

@@ -0,0 +1,247 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use tokio::time::{Duration, sleep};
use crate::{MemoryManager, NoOpMetrics, PERMIT_GRANULARITY_BYTES};
#[test]
fn test_try_acquire_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics);
let guard = manager.try_acquire(10 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.limit_bytes(), 0);
assert_eq!(guard.granted_bytes(), 0);
}
#[test]
fn test_try_acquire_limited_success_and_release() {
let bytes = 2 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(bytes, NoOpMetrics);
{
let guard = manager.try_acquire(PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(guard.granted_bytes(), PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), PERMIT_GRANULARITY_BYTES);
drop(guard);
}
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_try_acquire_exceeds_limit() {
let limit = PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let result = manager.try_acquire(limit + PERMIT_GRANULARITY_BYTES);
assert!(result.is_none());
}
#[tokio::test(flavor = "current_thread")]
async fn test_acquire_blocks_and_unblocks() {
let bytes = 2 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(bytes, NoOpMetrics);
let guard = manager.try_acquire(bytes).unwrap();
// Spawn a task that will block on acquire()
let waiter = {
let manager = manager.clone();
tokio::spawn(async move {
// This will block until memory is available
let _guard = manager.acquire(bytes).await.unwrap();
})
};
sleep(Duration::from_millis(10)).await;
// Release memory - this should unblock the waiter
drop(guard);
// Waiter should complete now
waiter.await.unwrap();
}
#[test]
fn test_request_additional_success() {
let limit = 10 * PERMIT_GRANULARITY_BYTES; // 10MB limit
let manager = MemoryManager::new(limit, NoOpMetrics);
// Acquire base quota (5MB)
let base = 5 * PERMIT_GRANULARITY_BYTES;
let mut guard = manager.try_acquire(base).unwrap();
assert_eq!(guard.granted_bytes(), base);
assert_eq!(manager.used_bytes(), base);
// Request additional memory (3MB) - should succeed and merge
assert!(guard.request_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_request_additional_exceeds_limit() {
let limit = 10 * PERMIT_GRANULARITY_BYTES; // 10MB limit
let manager = MemoryManager::new(limit, NoOpMetrics);
// Acquire base quota (5MB)
let base = 5 * PERMIT_GRANULARITY_BYTES;
let mut guard = manager.try_acquire(base).unwrap();
// Request additional memory (3MB) - should succeed
assert!(guard.request_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// Request more (3MB) - should fail (would exceed 10MB limit)
let result = guard.request_additional(3 * PERMIT_GRANULARITY_BYTES);
assert!(!result);
// Still at 8MB
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
assert_eq!(guard.granted_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_request_additional_auto_release_on_guard_drop() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
{
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request additional - memory is merged into guard
assert!(guard.request_additional(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// When guard drops, all memory (base + additional) is released together
}
// After scope, all memory should be released
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_request_additional_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics); // Unlimited
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Should always succeed with unlimited manager
assert!(guard.request_additional(100 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 0);
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_request_additional_zero_bytes() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request 0 bytes should succeed without affecting anything
assert!(guard.request_additional(0));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_success() {
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(8 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
// Release half
assert!(guard.early_release_partial(4 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 4 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 4 * PERMIT_GRANULARITY_BYTES);
// Released memory should be available to others
let _guard2 = manager.try_acquire(4 * PERMIT_GRANULARITY_BYTES).unwrap();
assert_eq!(manager.used_bytes(), 8 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_exceeds_granted() {
let manager = MemoryManager::new(10 * PERMIT_GRANULARITY_BYTES, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Try to release more than granted - should fail
assert!(!guard.early_release_partial(10 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_early_release_partial_unlimited() {
let manager = MemoryManager::new(0, NoOpMetrics);
let mut guard = manager.try_acquire(100 * PERMIT_GRANULARITY_BYTES).unwrap();
// Unlimited guard - release should succeed (no-op)
assert!(guard.early_release_partial(50 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 0);
}
#[test]
fn test_request_and_early_release_symmetry() {
let limit = 20 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(5 * PERMIT_GRANULARITY_BYTES).unwrap();
// Request additional
assert!(guard.request_additional(5 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 10 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 10 * PERMIT_GRANULARITY_BYTES);
// Early release some
assert!(guard.early_release_partial(3 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 7 * PERMIT_GRANULARITY_BYTES);
// Request again
assert!(guard.request_additional(2 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 9 * PERMIT_GRANULARITY_BYTES);
// Early release again
assert!(guard.early_release_partial(4 * PERMIT_GRANULARITY_BYTES));
assert_eq!(guard.granted_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
assert_eq!(manager.used_bytes(), 5 * PERMIT_GRANULARITY_BYTES);
drop(guard);
assert_eq!(manager.used_bytes(), 0);
}
#[test]
fn test_small_allocation_rounds_up() {
// Test that allocations smaller than PERMIT_GRANULARITY_BYTES
// round up to 1 permit and can use request_additional()
let limit = 10 * PERMIT_GRANULARITY_BYTES;
let manager = MemoryManager::new(limit, NoOpMetrics);
let mut guard = manager.try_acquire(512 * 1024).unwrap(); // 512KB
assert_eq!(guard.granted_bytes(), PERMIT_GRANULARITY_BYTES); // Rounds up to 1MB
assert!(guard.request_additional(2 * PERMIT_GRANULARITY_BYTES)); // Can request more
assert_eq!(guard.granted_bytes(), 3 * PERMIT_GRANULARITY_BYTES);
}
#[test]
fn test_acquire_zero_bytes_lazy_allocation() {
// Test that acquire(0) returns 0 permits but can request_additional() later
let manager = MemoryManager::new(10 * PERMIT_GRANULARITY_BYTES, NoOpMetrics);
let mut guard = manager.try_acquire(0).unwrap();
assert_eq!(guard.granted_bytes(), 0); // No permits consumed
assert_eq!(manager.used_bytes(), 0);
assert!(guard.request_additional(3 * PERMIT_GRANULARITY_BYTES)); // Lazy allocation
assert_eq!(guard.granted_bytes(), 3 * PERMIT_GRANULARITY_BYTES);
}

View File

@@ -12,6 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt::{Display, Formatter};
use std::hash::{DefaultHasher, Hash, Hasher};
use std::str::FromStr;
@@ -60,7 +61,7 @@ pub trait ClusterInfo {
}
/// The key of [NodeInfo] in the storage. The format is `__meta_cluster_node_info-0-{role}-{node_id}`.
-#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize)]
+#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize, PartialOrd, Ord)]
pub struct NodeInfoKey {
/// The role of the node. It can be `[Role::Datanode]` or `[Role::Frontend]`.
pub role: Role,
@@ -135,7 +136,7 @@ pub struct NodeInfo {
pub hostname: String,
}
-#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize)]
+#[derive(Debug, Clone, Copy, Eq, Hash, PartialEq, Serialize, Deserialize, PartialOrd, Ord)]
pub enum Role {
Datanode,
Frontend,
@@ -241,6 +242,12 @@ impl From<&NodeInfoKey> for Vec<u8> {
}
}
impl Display for NodeInfoKey {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
write!(f, "{:?}-{}", self.role, self.node_id)
}
}
impl FromStr for NodeInfo {
type Err = Error;

View File

@@ -31,6 +31,7 @@ use crate::region_registry::LeaderRegionRegistryRef;
pub mod alter_database;
pub mod alter_logical_tables;
pub mod alter_table;
pub mod comment_on;
pub mod create_database;
pub mod create_flow;
pub mod create_logical_tables;

View File

@@ -301,8 +301,8 @@ fn build_new_table_info(
| AlterKind::UnsetTableOptions { .. }
| AlterKind::SetIndexes { .. }
| AlterKind::UnsetIndexes { .. }
-| AlterKind::DropDefaults { .. } => {}
-AlterKind::SetDefaults { .. } => {}
+| AlterKind::DropDefaults { .. }
+| AlterKind::SetDefaults { .. } => {}
}
info!(

View File

@@ -0,0 +1,509 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use async_trait::async_trait;
use chrono::Utc;
use common_catalog::format_full_table_name;
use common_procedure::error::{FromJsonSnafu, Result as ProcedureResult, ToJsonSnafu};
use common_procedure::{Context as ProcedureContext, LockKey, Procedure, Status};
use common_telemetry::tracing::info;
use datatypes::schema::COMMENT_KEY as COLUMN_COMMENT_KEY;
use serde::{Deserialize, Serialize};
use snafu::{OptionExt, ResultExt, ensure};
use store_api::storage::TableId;
use strum::AsRefStr;
use table::metadata::RawTableInfo;
use table::requests::COMMENT_KEY as TABLE_COMMENT_KEY;
use table::table_name::TableName;
use crate::cache_invalidator::Context;
use crate::ddl::DdlContext;
use crate::ddl::utils::map_to_procedure_error;
use crate::error::{ColumnNotFoundSnafu, FlowNotFoundSnafu, Result, TableNotFoundSnafu};
use crate::instruction::CacheIdent;
use crate::key::flow::flow_info::{FlowInfoKey, FlowInfoValue};
use crate::key::table_info::{TableInfoKey, TableInfoValue};
use crate::key::table_name::TableNameKey;
use crate::key::{DeserializedValueWithBytes, FlowId, MetadataKey, MetadataValue};
use crate::lock_key::{CatalogLock, FlowNameLock, SchemaLock, TableNameLock};
use crate::rpc::ddl::{CommentObjectType, CommentOnTask};
use crate::rpc::store::PutRequest;
pub struct CommentOnProcedure {
pub context: DdlContext,
pub data: CommentOnData,
}
impl CommentOnProcedure {
pub const TYPE_NAME: &'static str = "metasrv-procedure::CommentOn";
pub fn new(task: CommentOnTask, context: DdlContext) -> Self {
Self {
context,
data: CommentOnData::new(task),
}
}
pub fn from_json(json: &str, context: DdlContext) -> ProcedureResult<Self> {
let data = serde_json::from_str(json).context(FromJsonSnafu)?;
Ok(Self { context, data })
}
pub async fn on_prepare(&mut self) -> Result<Status> {
match self.data.object_type {
CommentObjectType::Table | CommentObjectType::Column => {
self.prepare_table_or_column().await?;
}
CommentObjectType::Flow => {
self.prepare_flow().await?;
}
}
// Fast path: if comment is unchanged, skip update
if self.data.is_unchanged {
let object_desc = match self.data.object_type {
CommentObjectType::Table => format!(
"table {}",
format_full_table_name(
&self.data.catalog_name,
&self.data.schema_name,
&self.data.object_name,
)
),
CommentObjectType::Column => format!(
"column {}.{}",
format_full_table_name(
&self.data.catalog_name,
&self.data.schema_name,
&self.data.object_name,
),
self.data.column_name.as_ref().unwrap()
),
CommentObjectType::Flow => {
format!("flow {}.{}", self.data.catalog_name, self.data.object_name)
}
};
info!("Comment unchanged for {}, skipping update", object_desc);
return Ok(Status::done());
}
self.data.state = CommentOnState::UpdateMetadata;
Ok(Status::executing(true))
}
async fn prepare_table_or_column(&mut self) -> Result<()> {
let table_name_key = TableNameKey::new(
&self.data.catalog_name,
&self.data.schema_name,
&self.data.object_name,
);
let table_id = self
.context
.table_metadata_manager
.table_name_manager()
.get(table_name_key)
.await?
.with_context(|| TableNotFoundSnafu {
table_name: format_full_table_name(
&self.data.catalog_name,
&self.data.schema_name,
&self.data.object_name,
),
})?
.table_id();
let table_info = self
.context
.table_metadata_manager
.table_info_manager()
.get(table_id)
.await?
.with_context(|| TableNotFoundSnafu {
table_name: format_full_table_name(
&self.data.catalog_name,
&self.data.schema_name,
&self.data.object_name,
),
})?;
// For column comments, validate the column exists
if self.data.object_type == CommentObjectType::Column {
let column_name = self.data.column_name.as_ref().unwrap();
let column_exists = table_info
.table_info
.meta
.schema
.column_schemas
.iter()
.any(|col| &col.name == column_name);
ensure!(
column_exists,
ColumnNotFoundSnafu {
column_name,
column_id: 0u32, // column_id is not known here
}
);
}
self.data.table_id = Some(table_id);
// Check if comment is unchanged for early exit optimization
match self.data.object_type {
CommentObjectType::Table => {
let current_comment = &table_info.table_info.desc;
if &self.data.comment == current_comment {
self.data.is_unchanged = true;
}
}
CommentObjectType::Column => {
let column_name = self.data.column_name.as_ref().unwrap();
let column_schema = table_info
.table_info
.meta
.schema
.column_schemas
.iter()
.find(|col| &col.name == column_name)
.unwrap(); // Safe: validated above
let current_comment = column_schema.metadata().get(COLUMN_COMMENT_KEY);
if self.data.comment.as_deref() == current_comment.map(String::as_str) {
self.data.is_unchanged = true;
}
}
CommentObjectType::Flow => {
// this branch is handled in `prepare_flow`
}
}
self.data.table_info = Some(table_info);
Ok(())
}
async fn prepare_flow(&mut self) -> Result<()> {
let flow_name_value = self
.context
.flow_metadata_manager
.flow_name_manager()
.get(&self.data.catalog_name, &self.data.object_name)
.await?
.with_context(|| FlowNotFoundSnafu {
flow_name: &self.data.object_name,
})?;
let flow_id = flow_name_value.flow_id();
let flow_info = self
.context
.flow_metadata_manager
.flow_info_manager()
.get_raw(flow_id)
.await?
.with_context(|| FlowNotFoundSnafu {
flow_name: &self.data.object_name,
})?;
self.data.flow_id = Some(flow_id);
// Check if comment is unchanged for early exit optimization
let current_comment = &flow_info.get_inner_ref().comment;
let new_comment = self.data.comment.as_deref().unwrap_or("");
if new_comment == current_comment.as_str() {
self.data.is_unchanged = true;
}
self.data.flow_info = Some(flow_info);
Ok(())
}
pub async fn on_update_metadata(&mut self) -> Result<Status> {
match self.data.object_type {
CommentObjectType::Table => {
self.update_table_comment().await?;
}
CommentObjectType::Column => {
self.update_column_comment().await?;
}
CommentObjectType::Flow => {
self.update_flow_comment().await?;
}
}
self.data.state = CommentOnState::InvalidateCache;
Ok(Status::executing(true))
}
async fn update_table_comment(&mut self) -> Result<()> {
let table_info_value = self.data.table_info.as_ref().unwrap();
let mut new_table_info = table_info_value.table_info.clone();
new_table_info.desc = self.data.comment.clone();
// Sync comment to table options
sync_table_comment_option(
&mut new_table_info.meta.options,
new_table_info.desc.as_deref(),
);
self.update_table_info(table_info_value, new_table_info)
.await?;
info!(
"Updated comment for table {}.{}.{}",
self.data.catalog_name, self.data.schema_name, self.data.object_name
);
Ok(())
}
async fn update_column_comment(&mut self) -> Result<()> {
let table_info_value = self.data.table_info.as_ref().unwrap();
let mut new_table_info = table_info_value.table_info.clone();
let column_name = self.data.column_name.as_ref().unwrap();
let column_schema = new_table_info
.meta
.schema
.column_schemas
.iter_mut()
.find(|col| &col.name == column_name)
.unwrap(); // Safe: validated in prepare
update_column_comment_metadata(column_schema, self.data.comment.clone());
self.update_table_info(table_info_value, new_table_info)
.await?;
info!(
"Updated comment for column {}.{}.{}.{}",
self.data.catalog_name, self.data.schema_name, self.data.object_name, column_name
);
Ok(())
}
async fn update_flow_comment(&mut self) -> Result<()> {
let flow_id = self.data.flow_id.unwrap();
let flow_info_value = self.data.flow_info.as_ref().unwrap();
let mut new_flow_info = flow_info_value.get_inner_ref().clone();
new_flow_info.comment = self.data.comment.clone().unwrap_or_default();
new_flow_info.updated_time = Utc::now();
let raw_value = new_flow_info.try_as_raw_value()?;
self.context
.table_metadata_manager
.kv_backend()
.put(
PutRequest::new()
.with_key(FlowInfoKey::new(flow_id).to_bytes())
.with_value(raw_value),
)
.await?;
info!(
"Updated comment for flow {}.{}",
self.data.catalog_name, self.data.object_name
);
Ok(())
}
async fn update_table_info(
&self,
current_table_info: &DeserializedValueWithBytes<TableInfoValue>,
new_table_info: RawTableInfo,
) -> Result<()> {
let table_id = current_table_info.table_info.ident.table_id;
let new_table_info_value = current_table_info.update(new_table_info);
let raw_value = new_table_info_value.try_as_raw_value()?;
self.context
.table_metadata_manager
.kv_backend()
.put(
PutRequest::new()
.with_key(TableInfoKey::new(table_id).to_bytes())
.with_value(raw_value),
)
.await?;
Ok(())
}
pub async fn on_invalidate_cache(&mut self) -> Result<Status> {
let cache_invalidator = &self.context.cache_invalidator;
match self.data.object_type {
CommentObjectType::Table | CommentObjectType::Column => {
let table_id = self.data.table_id.unwrap();
let table_name = TableName::new(
self.data.catalog_name.clone(),
self.data.schema_name.clone(),
self.data.object_name.clone(),
);
let cache_ident = vec![
CacheIdent::TableId(table_id),
CacheIdent::TableName(table_name),
];
cache_invalidator
.invalidate(&Context::default(), &cache_ident)
.await?;
}
CommentObjectType::Flow => {
let flow_id = self.data.flow_id.unwrap();
let cache_ident = vec![CacheIdent::FlowId(flow_id)];
cache_invalidator
.invalidate(&Context::default(), &cache_ident)
.await?;
}
}
Ok(Status::done())
}
}
#[async_trait]
impl Procedure for CommentOnProcedure {
fn type_name(&self) -> &str {
Self::TYPE_NAME
}
async fn execute(&mut self, _ctx: &ProcedureContext) -> ProcedureResult<Status> {
match self.data.state {
CommentOnState::Prepare => self.on_prepare().await,
CommentOnState::UpdateMetadata => self.on_update_metadata().await,
CommentOnState::InvalidateCache => self.on_invalidate_cache().await,
}
.map_err(map_to_procedure_error)
}
fn dump(&self) -> ProcedureResult<String> {
serde_json::to_string(&self.data).context(ToJsonSnafu)
}
fn lock_key(&self) -> LockKey {
let catalog = &self.data.catalog_name;
let schema = &self.data.schema_name;
let lock_key = match self.data.object_type {
CommentObjectType::Table | CommentObjectType::Column => {
vec![
CatalogLock::Read(catalog).into(),
SchemaLock::read(catalog, schema).into(),
TableNameLock::new(catalog, schema, &self.data.object_name).into(),
]
}
CommentObjectType::Flow => {
vec![
CatalogLock::Read(catalog).into(),
FlowNameLock::new(catalog, &self.data.object_name).into(),
]
}
};
LockKey::new(lock_key)
}
}
#[derive(Debug, Serialize, Deserialize, AsRefStr)]
enum CommentOnState {
Prepare,
UpdateMetadata,
InvalidateCache,
}
/// The data of comment on procedure.
#[derive(Debug, Serialize, Deserialize)]
pub struct CommentOnData {
state: CommentOnState,
catalog_name: String,
schema_name: String,
object_type: CommentObjectType,
object_name: String,
/// Column name (only for Column comments)
column_name: Option<String>,
comment: Option<String>,
/// Cached table ID (for Table/Column)
#[serde(skip_serializing_if = "Option::is_none")]
table_id: Option<TableId>,
/// Cached table info (for Table/Column)
#[serde(skip)]
table_info: Option<DeserializedValueWithBytes<TableInfoValue>>,
/// Cached flow ID (for Flow)
#[serde(skip_serializing_if = "Option::is_none")]
flow_id: Option<FlowId>,
/// Cached flow info (for Flow)
#[serde(skip)]
flow_info: Option<DeserializedValueWithBytes<FlowInfoValue>>,
/// Whether the comment is unchanged (optimization for early exit)
#[serde(skip)]
is_unchanged: bool,
}
impl CommentOnData {
pub fn new(task: CommentOnTask) -> Self {
Self {
state: CommentOnState::Prepare,
catalog_name: task.catalog_name,
schema_name: task.schema_name,
object_type: task.object_type,
object_name: task.object_name,
column_name: task.column_name,
comment: task.comment,
table_id: None,
table_info: None,
flow_id: None,
flow_info: None,
is_unchanged: false,
}
}
}
fn update_column_comment_metadata(
column_schema: &mut datatypes::schema::ColumnSchema,
comment: Option<String>,
) {
match comment {
Some(value) => {
column_schema
.mut_metadata()
.insert(COLUMN_COMMENT_KEY.to_string(), value);
}
None => {
column_schema.mut_metadata().remove(COLUMN_COMMENT_KEY);
}
}
}
fn sync_table_comment_option(options: &mut table::requests::TableOptions, comment: Option<&str>) {
match comment {
Some(value) => {
options
.extra_options
.insert(TABLE_COMMENT_KEY.to_string(), value.to_string());
}
None => {
options.extra_options.remove(TABLE_COMMENT_KEY);
}
}
}
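
A unit-test-style sketch of the metadata helper, written as if inside this module since `update_column_comment_metadata` is private. `ColumnSchema::new` and `ConcreteDataType::string_datatype` are assumed from the `datatypes` crate and are not part of this diff:

#[cfg(test)]
mod comment_metadata_tests {
    use datatypes::prelude::ConcreteDataType;
    use datatypes::schema::{COMMENT_KEY, ColumnSchema};

    use super::update_column_comment_metadata;

    #[test]
    fn set_and_clear_column_comment() {
        let mut col = ColumnSchema::new("host", ConcreteDataType::string_datatype(), true);

        // Some(..) inserts/overwrites the comment in the column metadata map.
        update_column_comment_metadata(&mut col, Some("hostname label".to_string()));
        assert_eq!(
            col.metadata().get(COMMENT_KEY).map(String::as_str),
            Some("hostname label")
        );

        // None removes the entry, matching `COMMENT ON COLUMN ... IS NULL`.
        update_column_comment_metadata(&mut col, None);
        assert!(col.metadata().get(COMMENT_KEY).is_none());
    }
}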

View File

@@ -27,6 +27,7 @@ use store_api::storage::TableId;
use crate::ddl::alter_database::AlterDatabaseProcedure; use crate::ddl::alter_database::AlterDatabaseProcedure;
use crate::ddl::alter_logical_tables::AlterLogicalTablesProcedure; use crate::ddl::alter_logical_tables::AlterLogicalTablesProcedure;
use crate::ddl::alter_table::AlterTableProcedure; use crate::ddl::alter_table::AlterTableProcedure;
use crate::ddl::comment_on::CommentOnProcedure;
use crate::ddl::create_database::CreateDatabaseProcedure; use crate::ddl::create_database::CreateDatabaseProcedure;
use crate::ddl::create_flow::CreateFlowProcedure; use crate::ddl::create_flow::CreateFlowProcedure;
use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure; use crate::ddl::create_logical_tables::CreateLogicalTablesProcedure;
@@ -52,18 +53,18 @@ use crate::rpc::ddl::DdlTask::CreateTrigger;
#[cfg(feature = "enterprise")] #[cfg(feature = "enterprise")]
use crate::rpc::ddl::DdlTask::DropTrigger; use crate::rpc::ddl::DdlTask::DropTrigger;
use crate::rpc::ddl::DdlTask::{ use crate::rpc::ddl::DdlTask::{
AlterDatabase, AlterLogicalTables, AlterTable, CreateDatabase, CreateFlow, CreateLogicalTables, AlterDatabase, AlterLogicalTables, AlterTable, CommentOn, CreateDatabase, CreateFlow,
CreateTable, CreateView, DropDatabase, DropFlow, DropLogicalTables, DropTable, DropView, CreateLogicalTables, CreateTable, CreateView, DropDatabase, DropFlow, DropLogicalTables,
TruncateTable, DropTable, DropView, TruncateTable,
}; };
#[cfg(feature = "enterprise")] #[cfg(feature = "enterprise")]
use crate::rpc::ddl::trigger::CreateTriggerTask; use crate::rpc::ddl::trigger::CreateTriggerTask;
#[cfg(feature = "enterprise")] #[cfg(feature = "enterprise")]
use crate::rpc::ddl::trigger::DropTriggerTask; use crate::rpc::ddl::trigger::DropTriggerTask;
use crate::rpc::ddl::{ use crate::rpc::ddl::{
AlterDatabaseTask, AlterTableTask, CreateDatabaseTask, CreateFlowTask, CreateTableTask, AlterDatabaseTask, AlterTableTask, CommentOnTask, CreateDatabaseTask, CreateFlowTask,
CreateViewTask, DropDatabaseTask, DropFlowTask, DropTableTask, DropViewTask, QueryContext, CreateTableTask, CreateViewTask, DropDatabaseTask, DropFlowTask, DropTableTask, DropViewTask,
SubmitDdlTaskRequest, SubmitDdlTaskResponse, TruncateTableTask, QueryContext, SubmitDdlTaskRequest, SubmitDdlTaskResponse, TruncateTableTask,
}; };
use crate::rpc::router::RegionRoute; use crate::rpc::router::RegionRoute;
@@ -192,7 +193,8 @@ impl DdlManager {
TruncateTableProcedure,
CreateDatabaseProcedure,
DropDatabaseProcedure,
DropViewProcedure,
CommentOnProcedure
);
for (type_name, loader_factory) in loaders {
@@ -408,6 +410,19 @@ impl DdlManager {
self.submit_procedure(procedure_with_id).await
}
/// Submits and executes a COMMENT ON task.
#[tracing::instrument(skip_all)]
pub async fn submit_comment_on_task(
&self,
comment_on_task: CommentOnTask,
) -> Result<(ProcedureId, Option<Output>)> {
let context = self.create_context();
let procedure = CommentOnProcedure::new(comment_on_task, context);
let procedure_with_id = ProcedureWithId::with_random_id(Box::new(procedure));
self.submit_procedure(procedure_with_id).await
}
async fn submit_procedure(
&self,
procedure_with_id: ProcedureWithId,
@@ -476,6 +491,7 @@ impl DdlManager {
handle_create_view_task(self, create_view_task).await
}
DropView(drop_view_task) => handle_drop_view_task(self, drop_view_task).await,
CommentOn(comment_on_task) => handle_comment_on_task(self, comment_on_task).await,
#[cfg(feature = "enterprise")] #[cfg(feature = "enterprise")]
CreateTrigger(create_trigger_task) => { CreateTrigger(create_trigger_task) => {
handle_create_trigger_task( handle_create_trigger_task(
@@ -907,6 +923,26 @@ async fn handle_create_view_task(
})
}
async fn handle_comment_on_task(
ddl_manager: &DdlManager,
comment_on_task: CommentOnTask,
) -> Result<SubmitDdlTaskResponse> {
let (id, _) = ddl_manager
.submit_comment_on_task(comment_on_task.clone())
.await?;
let procedure_id = id.to_string();
info!(
"Comment on {}.{}.{} is updated via procedure_id {id:?}",
comment_on_task.catalog_name, comment_on_task.schema_name, comment_on_task.object_name
);
Ok(SubmitDdlTaskResponse {
key: procedure_id.into(),
..Default::default()
})
}
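As a usage sketch (not part of this diff): a caller holding a DdlManager can submit a COMMENT ON task directly through the new entry point; the function and variable names below are illustrative.
// Hedged sketch, assuming `ddl_manager: &DdlManager` and a prepared task.
async fn comment_on(ddl_manager: &DdlManager, task: CommentOnTask) -> Result<String> {
let (procedure_id, _output) = ddl_manager.submit_comment_on_task(task).await?;
// The procedure id doubles as the response key, mirroring handle_comment_on_task above.
Ok(procedure_id.to_string())
}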
#[cfg(test)]
mod tests {
use std::sync::Arc;

View File

@@ -14,6 +14,8 @@
use std::time::Duration;
use etcd_client::ConnectOptions;
/// Heartbeat interval time (the basic unit of the various timings below).
pub const HEARTBEAT_INTERVAL_MILLIS: u64 = 3000;
@@ -45,12 +47,18 @@ pub const META_KEEP_ALIVE_INTERVAL_SECS: u64 = META_LEASE_SECS / 2;
pub const HEARTBEAT_TIMEOUT: Duration = Duration::from_secs(META_KEEP_ALIVE_INTERVAL_SECS + 1);
/// The keep-alive interval of the heartbeat channel.
pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_INTERVAL_SECS: Duration = Duration::from_secs(15);
/// The keep-alive timeout of the heartbeat channel.
pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_TIMEOUT_SECS: Duration = Duration::from_secs(5);
/// The default options for the etcd client.
pub fn default_etcd_client_options() -> ConnectOptions {
ConnectOptions::new()
.with_keep_alive_while_idle(true)
.with_keep_alive(Duration::from_secs(15), Duration::from_secs(5))
.with_connect_timeout(Duration::from_secs(10))
}
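A minimal sketch (an assumption, not from this diff) of how a caller might apply these defaults with the etcd_client crate; the endpoint list is a placeholder.
// Hedged sketch: connect an etcd client with the shared options defined above.
async fn connect_etcd(endpoints: &[&str]) -> Result<etcd_client::Client, etcd_client::Error> {
etcd_client::Client::connect(endpoints, Some(default_etcd_client_options())).await
}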
/// The default mailbox round-trip timeout.
pub const MAILBOX_RTT_SECS: u64 = 1;

View File

@@ -272,13 +272,6 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to send message: {err_msg}"))]
SendMessage {
err_msg: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to serde json"))] #[snafu(display("Failed to serde json"))]
SerdeJson { SerdeJson {
#[snafu(source)] #[snafu(source)]
@@ -1118,7 +1111,7 @@ impl ErrorExt for Error {
| DeserializeFlexbuffers { .. }
| ConvertTimeRanges { .. } => StatusCode::Unexpected,
GetKvCache { .. } | CacheNotGet { .. } => StatusCode::Internal,
SchemaAlreadyExists { .. } => StatusCode::DatabaseAlreadyExists,

View File

@@ -23,6 +23,7 @@ use crate::heartbeat::mailbox::{IncomingMessage, MailboxRef};
pub mod invalidate_table_cache;
pub mod parse_mailbox_message;
pub mod suspend;
#[cfg(test)]
mod tests;

View File

@@ -0,0 +1,69 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use async_trait::async_trait;
use common_telemetry::{info, warn};
use crate::error::Result;
use crate::heartbeat::handler::{
HandleControl, HeartbeatResponseHandler, HeartbeatResponseHandlerContext,
};
use crate::instruction::Instruction;
/// A heartbeat response handler for the special "suspend" instruction.
/// It simply sets or clears (if previously set) the inner suspend atomic state.
pub struct SuspendHandler {
suspend: Arc<AtomicBool>,
}
impl SuspendHandler {
pub fn new(suspend: Arc<AtomicBool>) -> Self {
Self { suspend }
}
}
#[async_trait]
impl HeartbeatResponseHandler for SuspendHandler {
fn is_acceptable(&self, context: &HeartbeatResponseHandlerContext) -> bool {
matches!(
context.incoming_message,
Some((_, Instruction::Suspend)) | None
)
}
async fn handle(&self, context: &mut HeartbeatResponseHandlerContext) -> Result<HandleControl> {
let flip_state = |expect: bool| {
self.suspend
.compare_exchange(expect, !expect, Ordering::Relaxed, Ordering::Relaxed)
.is_ok()
};
if let Some((_, Instruction::Suspend)) = context.incoming_message.take() {
if flip_state(false) {
warn!("Suspend instruction received from meta, entering suspension state");
}
} else {
// Suspended components should always be trying to get out of this state on their own;
// we don't want a dedicated "un-suspend" instruction to resume them, as that can be
// error-prone. So if the "suspend" instruction is absent from the heartbeat, just
// unset the state.
if flip_state(true) {
info!("Cleared suspend state");
}
}
Ok(HandleControl::Continue)
}
}
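A short usage sketch: the flag is shared between this handler and the serving component, and the handler is registered after ParseMailboxMessageHandler so the instruction is already decoded (this mirrors the datanode wiring later in this diff; names are illustrative).
// Hedged sketch of wiring SuspendHandler into a handler group.
let suspend = Arc::new(AtomicBool::new(false));
let _executor = HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),
Arc::new(SuspendHandler::new(suspend.clone())),
]);
// Request paths can then gate on the flag cheaply:
let _suspended = suspend.load(Ordering::Relaxed);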

View File

@@ -15,8 +15,8 @@
use std::sync::Arc;
use tokio::sync::mpsc::Sender;
use tokio::sync::mpsc::error::SendError;
use crate::error::{self, Result};
use crate::instruction::{Instruction, InstructionReply};
pub type IncomingMessage = (MessageMeta, Instruction);
@@ -51,13 +51,8 @@ impl HeartbeatMailbox {
Self { sender }
}
pub async fn send(&self, message: OutgoingMessage) -> Result<(), SendError<OutgoingMessage>> {
self.sender.send(message).await
}
}

View File

@@ -539,6 +539,8 @@ pub enum Instruction {
GetFileRefs(GetFileRefs),
/// Triggers garbage collection for a region.
GcRegions(GcRegions),
/// Temporarily suspend serving reads or writes.
Suspend,
}
impl Instruction {

View File

@@ -94,7 +94,7 @@ impl TableInfoValue {
}
}
pub fn update(&self, new_table_info: RawTableInfo) -> Self {
Self {
table_info: new_table_info,
version: self.version + 1,

View File

@@ -23,19 +23,20 @@ use api::v1::alter_database_expr::Kind as PbAlterDatabaseKind;
use api::v1::meta::ddl_task_request::Task;
use api::v1::meta::{
AlterDatabaseTask as PbAlterDatabaseTask, AlterTableTask as PbAlterTableTask,
AlterTableTasks as PbAlterTableTasks, CommentOnTask as PbCommentOnTask,
CreateDatabaseTask as PbCreateDatabaseTask, CreateFlowTask as PbCreateFlowTask,
CreateTableTask as PbCreateTableTask, CreateTableTasks as PbCreateTableTasks,
CreateViewTask as PbCreateViewTask, DdlTaskRequest as PbDdlTaskRequest,
DdlTaskResponse as PbDdlTaskResponse, DropDatabaseTask as PbDropDatabaseTask,
DropFlowTask as PbDropFlowTask, DropTableTask as PbDropTableTask,
DropTableTasks as PbDropTableTasks, DropViewTask as PbDropViewTask, Partition, ProcedureId,
TruncateTableTask as PbTruncateTableTask,
};
use api::v1::{
AlterDatabaseExpr, AlterTableExpr, CommentObjectType as PbCommentObjectType, CommentOnExpr,
CreateDatabaseExpr, CreateFlowExpr, CreateTableExpr, CreateViewExpr, DropDatabaseExpr,
DropFlowExpr, DropTableExpr, DropViewExpr, EvalInterval, ExpireAfter, Option as PbOption,
QueryContext as PbQueryContext, TruncateTableExpr,
};
use base64::Engine as _;
use base64::engine::general_purpose;
@@ -78,6 +79,7 @@ pub enum DdlTask {
DropView(DropViewTask),
#[cfg(feature = "enterprise")]
CreateTrigger(trigger::CreateTriggerTask),
CommentOn(CommentOnTask),
}
impl DdlTask {
@@ -200,6 +202,11 @@ impl DdlTask {
view_info,
})
}
/// Creates a [`DdlTask`] to comment on a table, column, or flow.
pub fn new_comment_on(task: CommentOnTask) -> Self {
DdlTask::CommentOn(task)
}
}
impl TryFrom<Task> for DdlTask {
@@ -278,6 +285,7 @@ impl TryFrom<Task> for DdlTask {
.fail()
}
}
Task::CommentOnTask(comment_on) => Ok(DdlTask::CommentOn(comment_on.try_into()?)),
}
}
}
@@ -332,6 +340,7 @@ impl TryFrom<SubmitDdlTaskRequest> for PbDdlTaskRequest {
DdlTask::CreateTrigger(task) => Task::CreateTriggerTask(task.try_into()?),
#[cfg(feature = "enterprise")]
DdlTask::DropTrigger(task) => Task::DropTriggerTask(task.into()),
DdlTask::CommentOn(task) => Task::CommentOnTask(task.into()),
};
Ok(Self {
@@ -1277,6 +1286,119 @@ impl From<DropFlowTask> for PbDropFlowTask {
}
}
/// Represents the ID of the object being commented on (Table or Flow).
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum CommentObjectId {
Table(TableId),
Flow(FlowId),
}
/// Comment on a table, column, or flow.
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct CommentOnTask {
pub catalog_name: String,
pub schema_name: String,
pub object_type: CommentObjectType,
pub object_name: String,
/// Column name (only for Column comments)
pub column_name: Option<String>,
/// Object ID (Table or Flow) for validation and cache invalidation
pub object_id: Option<CommentObjectId>,
pub comment: Option<String>,
}
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub enum CommentObjectType {
Table,
Column,
Flow,
}
impl CommentOnTask {
pub fn table_ref(&self) -> TableReference<'_> {
TableReference {
catalog: &self.catalog_name,
schema: &self.schema_name,
table: &self.object_name,
}
}
}
// Proto conversions for CommentObjectType
impl From<CommentObjectType> for PbCommentObjectType {
fn from(object_type: CommentObjectType) -> Self {
match object_type {
CommentObjectType::Table => PbCommentObjectType::Table,
CommentObjectType::Column => PbCommentObjectType::Column,
CommentObjectType::Flow => PbCommentObjectType::Flow,
}
}
}
impl TryFrom<i32> for CommentObjectType {
type Error = error::Error;
fn try_from(value: i32) -> Result<Self> {
match value {
0 => Ok(CommentObjectType::Table),
1 => Ok(CommentObjectType::Column),
2 => Ok(CommentObjectType::Flow),
_ => error::InvalidProtoMsgSnafu {
err_msg: format!(
"Invalid CommentObjectType value: {}. Valid values are: 0 (Table), 1 (Column), 2 (Flow)",
value
),
}
.fail(),
}
}
}
// Proto conversions for CommentOnTask
impl TryFrom<PbCommentOnTask> for CommentOnTask {
type Error = error::Error;
fn try_from(pb: PbCommentOnTask) -> Result<Self> {
let comment_on = pb.comment_on.context(error::InvalidProtoMsgSnafu {
err_msg: "expected comment_on",
})?;
Ok(CommentOnTask {
catalog_name: comment_on.catalog_name,
schema_name: comment_on.schema_name,
object_type: comment_on.object_type.try_into()?,
object_name: comment_on.object_name,
column_name: if comment_on.column_name.is_empty() {
None
} else {
Some(comment_on.column_name)
},
comment: if comment_on.comment.is_empty() {
None
} else {
Some(comment_on.comment)
},
object_id: None,
})
}
}
impl From<CommentOnTask> for PbCommentOnTask {
fn from(task: CommentOnTask) -> Self {
let pb_object_type: PbCommentObjectType = task.object_type.into();
PbCommentOnTask {
comment_on: Some(CommentOnExpr {
catalog_name: task.catalog_name,
schema_name: task.schema_name,
object_type: pb_object_type as i32,
object_name: task.object_name,
column_name: task.column_name.unwrap_or_default(),
comment: task.comment.unwrap_or_default(),
}),
}
}
}
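To pin down the conversion contract, a hedged round-trip sketch using only the types above (values are illustrative; note that empty column_name/comment strings decode back to None, and object_id is not carried over the wire):
// Hedged sketch of the protobuf round trip.
let task = CommentOnTask {
catalog_name: "greptime".to_string(),
schema_name: "public".to_string(),
object_type: CommentObjectType::Table,
object_name: "monitor".to_string(),
column_name: None,
object_id: None,
comment: Some("cpu metrics".to_string()),
};
let pb: PbCommentOnTask = task.clone().into();
let decoded = CommentOnTask::try_from(pb).unwrap();
assert_eq!(task, decoded);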
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
pub struct QueryContext {
pub(crate) current_catalog: String,

View File

@@ -14,7 +14,7 @@
use common_telemetry::{debug, error, info};
use common_wal::config::kafka::common::{
DEFAULT_BACKOFF_CONFIG, DEFAULT_CONNECT_TIMEOUT, KafkaConnectionConfig, KafkaTopicConfig,
};
use rskafka::client::error::Error as RsKafkaError;
use rskafka::client::error::ProtocolError::TopicAlreadyExists;
@@ -205,11 +205,13 @@ impl KafkaTopicCreator {
self.partition_client(topic).await.unwrap()
}
}
/// Builds a Kafka [Client](rskafka::client::Client).
pub async fn build_kafka_client(connection: &KafkaConnectionConfig) -> Result<Client> {
// Builds a Kafka controller client for creating topics.
let mut builder = ClientBuilder::new(connection.broker_endpoints.clone())
.backoff_config(DEFAULT_BACKOFF_CONFIG)
.connect_timeout(Some(DEFAULT_CONNECT_TIMEOUT));
if let Some(sasl) = &connection.sasl {
builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
};

View File

@@ -331,8 +331,29 @@ impl Runner {
}
match status {
Status::Executing { .. } => {
let prev_state = self.meta.state();
if !matches!(prev_state, ProcedureState::Running) {
info!(
"Set Procedure {}-{} state to running, prev_state: {:?}",
self.procedure.type_name(),
self.meta.id,
prev_state
);
self.meta.set_state(ProcedureState::Running);
}
}
Status::Suspended { subprocedures, .. } => {
let prev_state = self.meta.state();
if !matches!(prev_state, ProcedureState::Running) {
info!(
"Set Procedure {}-{} state to running, prev_state: {:?}",
self.procedure.type_name(),
self.meta.id,
prev_state
);
self.meta.set_state(ProcedureState::Running);
}
self.on_suspended(subprocedures).await;
}
Status::Done { output } => {
@@ -393,8 +414,12 @@ impl Runner {
return;
}
if self.procedure.rollback_supported() {
self.meta
.set_state(ProcedureState::prepare_rollback(Arc::new(e)));
} else {
self.meta.set_state(ProcedureState::failed(Arc::new(e)));
}
}
}
}
@@ -1080,20 +1105,10 @@ mod tests {
let mut runner = new_runner(meta.clone(), Box::new(fail), procedure_store.clone());
runner.manager_ctx.start();
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_failed(), "{state:?}");
check_files(&object_store, &procedure_store, ctx.procedure_id, &[]).await;
}
#[tokio::test]
@@ -1146,6 +1161,8 @@ mod tests {
async move {
if times == 1 {
Err(Error::retry_later(MockError::new(StatusCode::Unexpected)))
} else if times == 2 {
Ok(Status::executing(false))
} else {
Ok(Status::done())
}
@@ -1172,6 +1189,10 @@ mod tests {
let state = runner.meta.state();
assert!(state.is_retrying(), "{state:?}");
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_running(), "{state:?}");
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_done(), "{state:?}");
@@ -1185,6 +1206,86 @@ mod tests {
.await;
}
#[tokio::test(flavor = "multi_thread")]
async fn test_execute_on_retry_later_error_with_child() {
common_telemetry::init_default_ut_logging();
let mut times = 0;
let child_id = ProcedureId::random();
let exec_fn = move |_| {
times += 1;
async move {
debug!("times: {}", times);
if times == 1 {
Err(Error::retry_later(MockError::new(StatusCode::Unexpected)))
} else if times == 2 {
let exec_fn = |_| {
async { Err(Error::external(MockError::new(StatusCode::Unexpected))) }
.boxed()
};
let fail = ProcedureAdapter {
data: "fail".to_string(),
lock_key: LockKey::single_exclusive("catalog.schema.table.region-0"),
poison_keys: PoisonKeys::default(),
exec_fn,
rollback_fn: None,
};
Ok(Status::Suspended {
subprocedures: vec![ProcedureWithId {
id: child_id,
procedure: Box::new(fail),
}],
persist: true,
})
} else {
Ok(Status::done())
}
}
.boxed()
};
let retry_later = ProcedureAdapter {
data: "retry_later".to_string(),
lock_key: LockKey::single_exclusive("catalog.schema.table"),
poison_keys: PoisonKeys::default(),
exec_fn,
rollback_fn: None,
};
let dir = create_temp_dir("retry_later");
let meta = retry_later.new_meta(ROOT_ID);
let ctx = context_without_provider(meta.id);
let object_store = test_util::new_object_store(&dir);
let procedure_store = Arc::new(ProcedureStore::from_object_store(object_store.clone()));
let mut runner = new_runner(meta.clone(), Box::new(retry_later), procedure_store.clone());
runner.manager_ctx.start();
debug!("execute_once 1");
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_retrying(), "{state:?}");
let moved_meta = meta.clone();
tokio::spawn(async move {
moved_meta.child_notify.notify_one();
});
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_running(), "{state:?}");
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_done(), "{state:?}");
assert!(meta.state().is_done());
check_files(
&object_store,
&procedure_store,
ctx.procedure_id,
&["0000000000.step", "0000000001.commit"],
)
.await;
}
#[tokio::test]
async fn test_execute_exceed_max_retry_later() {
let exec_fn =
@@ -1304,7 +1405,7 @@ mod tests {
async fn test_child_error() {
let mut times = 0;
let child_id = ProcedureId::random();
common_telemetry::init_default_ut_logging();
let exec_fn = move |ctx: Context| {
times += 1;
async move {
@@ -1529,7 +1630,7 @@ mod tests {
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_failed(), "{state:?}");
let procedure_id = runner
.manager_ctx
@@ -1596,11 +1697,6 @@ mod tests {
let state = runner.meta.state();
assert!(state.is_running(), "{state:?}");
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_prepare_rollback(), "{state:?}");
assert!(meta.state().is_prepare_rollback());
runner.execute_once(&ctx).await;
let state = runner.meta.state();
assert!(state.is_failed(), "{state:?}");

View File

@@ -46,6 +46,22 @@ pub enum OutputData {
Stream(SendableRecordBatchStream),
}
impl OutputData {
/// Consumes the data into a pretty-printed string.
pub async fn pretty_print(self) -> String {
match self {
OutputData::AffectedRows(x) => {
format!("Affected Rows: {x}")
}
OutputData::RecordBatches(x) => x.pretty_print().unwrap_or_else(|e| e.to_string()),
OutputData::Stream(x) => common_recordbatch::util::collect_batches(x)
.await
.and_then(|x| x.pretty_print())
.unwrap_or_else(|e| e.to_string()),
}
}
}
/// OutputMeta stores meta information produced/generated during the execution
#[derive(Debug, Default)]
pub struct OutputMeta {

View File

@@ -58,10 +58,14 @@ pub fn get_total_memory_bytes() -> i64 {
}
}
/// Get the total CPU cores. The result will be rounded up to the next integer (ceiling).
/// For example, if the total CPU is 1.1 cores (1100 millicores) or 1.5 cores (1500 millicores), the result will be 2.
pub fn get_total_cpu_cores() -> usize {
cpu_cores(get_total_cpu_millicores())
}
fn cpu_cores(cpu_millicores: i64) -> usize {
((cpu_millicores as f64) / 1_000.0).ceil() as usize
}
/// Get the total memory in readable size.
@@ -178,6 +182,13 @@ mod tests {
#[test]
fn test_get_total_cpu_cores() {
assert!(get_total_cpu_cores() > 0);
assert_eq!(cpu_cores(1), 1);
assert_eq!(cpu_cores(100), 1);
assert_eq!(cpu_cores(500), 1);
assert_eq!(cpu_cores(1000), 1);
assert_eq!(cpu_cores(1100), 2);
assert_eq!(cpu_cores(1900), 2);
assert_eq!(cpu_cores(10_000), 10);
}
#[test]

View File

@@ -36,6 +36,9 @@ pub const DEFAULT_BACKOFF_CONFIG: BackoffConfig = BackoffConfig {
deadline: Some(Duration::from_secs(3)),
};
/// The default connect timeout for kafka client.
pub const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
/// Default interval for auto WAL pruning.
pub const DEFAULT_AUTO_PRUNE_INTERVAL: Duration = Duration::from_mins(30);
/// Default limit for concurrent auto pruning tasks.

View File

@@ -22,6 +22,7 @@ use common_base::Plugins;
use common_error::ext::BoxedError;
use common_greptimedb_telemetry::GreptimeDBTelemetryTask;
use common_meta::cache::{LayeredCacheRegistry, SchemaCacheRef, TableSchemaCacheRef};
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::datanode::TopicStatsReporter;
use common_meta::key::runtime_switch::RuntimeSwitchManager;
use common_meta::key::{SchemaMetadataManager, SchemaMetadataManagerRef};
@@ -281,21 +282,11 @@ impl DatanodeBuilder {
open_all_regions.await?;
}
let heartbeat_task = if let Some(meta_client) = meta_client {
let task = self
.create_heartbeat_task(&region_server, meta_client, cache_registry)
.await?;
Some(task)
} else {
None
};
@@ -324,6 +315,29 @@ impl DatanodeBuilder {
})
}
async fn create_heartbeat_task(
&self,
region_server: &RegionServer,
meta_client: MetaClientRef,
cache_invalidator: CacheInvalidatorRef,
) -> Result<HeartbeatTask> {
let stat = {
let mut stat = ResourceStatImpl::default();
stat.start_collect_cpu_usage();
Arc::new(stat)
};
HeartbeatTask::try_new(
&self.opts,
region_server.clone(),
meta_client,
cache_invalidator,
self.plugins.clone(),
stat,
)
.await
}
/// Builds [ObjectStoreManager] from [StorageConfig].
pub async fn build_object_store_manager(cfg: &StorageConfig) -> Result<ObjectStoreManagerRef> {
let object_store = store::new_object_store(cfg.store.clone(), &cfg.data_home).await?;

View File

@@ -410,14 +410,6 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to build cache store"))]
BuildCacheStore {
#[snafu(source)]
error: object_store::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Not yet implemented: {what}"))] #[snafu(display("Not yet implemented: {what}"))]
NotYetImplemented { what: String }, NotYetImplemented { what: String },
} }
@@ -493,7 +485,6 @@ impl ErrorExt for Error {
SerializeJson { .. } => StatusCode::Internal,
ObjectStore { source, .. } => source.status_code(),
BuildCacheStore { .. } => StatusCode::StorageUnavailable,
}
}

View File

@@ -25,6 +25,7 @@ use common_meta::datanode::REGION_STATISTIC_KEY;
use common_meta::distributed_time_constants::META_KEEP_ALIVE_INTERVAL_SECS;
use common_meta::heartbeat::handler::invalidate_table_cache::InvalidateCacheHandler;
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::suspend::SuspendHandler;
use common_meta::heartbeat::handler::{
HandlerGroupExecutor, HeartbeatResponseHandlerContext, HeartbeatResponseHandlerExecutorRef,
};
@@ -91,6 +92,7 @@ impl HeartbeatTask {
let resp_handler_executor = Arc::new(HandlerGroupExecutor::new(vec![
region_alive_keeper.clone(),
Arc::new(ParseMailboxMessageHandler),
Arc::new(SuspendHandler::new(region_server.suspend_state())),
Arc::new(
RegionHeartbeatResponseHandler::new(region_server.clone())
.with_open_region_parallelism(opts.init_regions_parallelism),

View File

@@ -99,26 +99,30 @@ impl RegionHeartbeatResponseHandler {
self
}
fn build_handler(
&self,
instruction: &Instruction,
) -> MetaResult<Option<Box<InstructionHandlers>>> {
match instruction {
Instruction::CloseRegions(_) => Ok(Some(Box::new(CloseRegionsHandler.into()))),
Instruction::OpenRegions(_) => Ok(Some(Box::new(
OpenRegionsHandler {
open_region_parallelism: self.open_region_parallelism,
}
.into(),
))),
Instruction::FlushRegions(_) => Ok(Some(Box::new(FlushRegionsHandler.into()))),
Instruction::DowngradeRegions(_) => Ok(Some(Box::new(DowngradeRegionsHandler.into()))),
Instruction::UpgradeRegions(_) => Ok(Some(Box::new(
UpgradeRegionsHandler {
upgrade_region_parallelism: self.open_region_parallelism,
}
.into(),
))),
Instruction::GetFileRefs(_) => Ok(Some(Box::new(GetFileRefsHandler.into()))),
Instruction::GcRegions(_) => Ok(Some(Box::new(GcRegionsHandler.into()))),
Instruction::InvalidateCaches(_) => InvalidHeartbeatResponseSnafu.fail(),
Instruction::Suspend => Ok(None),
}
}
}
@@ -216,30 +220,24 @@ impl HeartbeatResponseHandler for RegionHeartbeatResponseHandler {
.context(InvalidHeartbeatResponseSnafu)?;
let mailbox = ctx.mailbox.clone();
if let Some(handler) = self.build_handler(&instruction)? {
let context = HandlerContext {
region_server: self.region_server.clone(),
downgrade_tasks: self.downgrade_tasks.clone(),
flush_tasks: self.flush_tasks.clone(),
gc_tasks: self.gc_tasks.clone(),
};
let _handle = common_runtime::spawn_global(async move {
let reply = handler.handle(&context, instruction).await;
if let Some(reply) = reply
&& let Err(e) = mailbox.send((meta, reply)).await
{
let error = e.to_string();
let (meta, reply) = e.0;
error!("Failed to send reply {reply} to {meta:?}: {error}");
}
});
}
Ok(HandleControl::Continue)
}

View File

@@ -17,6 +17,7 @@ mod catalog;
use std::collections::HashMap;
use std::fmt::Debug;
use std::ops::Deref;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use std::time::Duration;
@@ -52,7 +53,9 @@ pub use query::dummy_catalog::{
DummyCatalogList, DummyTableProviderFactory, TableProviderFactoryRef,
};
use serde_json;
use servers::error::{
self as servers_error, ExecuteGrpcRequestSnafu, Result as ServerResult, SuspendedSnafu,
};
use servers::grpc::FlightCompression;
use servers::grpc::flight::{FlightCraft, FlightRecordBatchStream, TonicStream};
use servers::grpc::region_server::RegionServerHandler;
@@ -89,6 +92,7 @@ use crate::region_server::catalog::{NameAwareCatalogList, NameAwareDataSourceInj
pub struct RegionServer {
inner: Arc<RegionServerInner>,
flight_compression: FlightCompression,
suspend: Arc<AtomicBool>,
}
pub struct RegionStat {
@@ -136,6 +140,7 @@ impl RegionServer {
),
)),
flight_compression,
suspend: Arc::new(AtomicBool::new(false)),
}
}
@@ -595,6 +600,14 @@ impl RegionServer {
.handle_sync_region(engine_with_status.engine(), region_id, manifest_info)
.await
}
fn is_suspended(&self) -> bool {
self.suspend.load(Ordering::Relaxed)
}
pub(crate) fn suspend_state(&self) -> Arc<AtomicBool> {
self.suspend.clone()
}
}
#[async_trait]
@@ -644,6 +657,8 @@ impl FlightCraft for RegionServer {
&self,
request: Request<Ticket>,
) -> TonicResult<Response<TonicStream<FlightData>>> {
ensure!(!self.is_suspended(), SuspendedSnafu);
let ticket = request.into_inner().ticket;
let request = api::v1::region::QueryRequest::decode(ticket.as_ref())
.context(servers_error::InvalidFlightTicketSnafu)?;

View File

@@ -14,15 +14,10 @@
//! object storage utilities
use common_telemetry::{info, warn};
use object_store::factory::new_raw_object_store;
use object_store::util::{clean_temp_dir, join_dir, with_instrument_layers, with_retry_layers};
use object_store::{ATOMIC_WRITE_DIR, ObjectStore};
use snafu::prelude::*;
use crate::config::ObjectStoreConfig;
@@ -47,23 +42,58 @@ pub(crate) async fn new_object_store_without_cache(
Ok(object_store)
}
/// Cleans up old LRU read cache directories left behind by the removed read cache layer.
fn clean_old_read_cache(store: &ObjectStoreConfig, data_home: &str) {
if !store.is_object_storage() {
return;
}
let Some(cache_config) = store.cache_config() else {
return;
};
// Only cleans if read cache was enabled
if !cache_config.enable_read_cache {
return;
}
let cache_base_dir = if cache_config.cache_path.is_empty() {
data_home
} else {
&cache_config.cache_path
};
// Cleans up the old read cache directory
let old_read_cache_dir = join_dir(cache_base_dir, "cache/object/read");
info!(
"Cleaning up old read cache directory: {}",
old_read_cache_dir
);
if let Err(e) = clean_temp_dir(&old_read_cache_dir) {
warn!(e; "Failed to clean old read cache directory {}", old_read_cache_dir);
}
// Cleans up the atomic temp dir used by the cache layer
let cache_atomic_temp_dir = join_dir(cache_base_dir, ATOMIC_WRITE_DIR);
info!(
"Cleaning up old cache atomic temp directory: {}",
cache_atomic_temp_dir
);
if let Err(e) = clean_temp_dir(&cache_atomic_temp_dir) {
warn!(e; "Failed to clean old cache atomic temp directory {}", cache_atomic_temp_dir);
}
}
pub async fn new_object_store(store: ObjectStoreConfig, data_home: &str) -> Result<ObjectStore> {
// Cleans up old LRU read cache directories.
// TODO: Remove this line after the 1.0 release.
clean_old_read_cache(&store, data_home);
let object_store = new_raw_object_store(&store, data_home)
.await
.context(error::ObjectStoreSnafu)?;
// Enables retry layer for non-fs object storages
let object_store = if store.is_object_storage() {
// Adds retry layer
with_retry_layers(object_store)
} else {
@@ -73,40 +103,3 @@ pub async fn new_object_store(store: ObjectStoreConfig, data_home: &str) -> Resu
let object_store = with_instrument_layers(object_store, true);
Ok(object_store)
}
async fn build_cache_layer(
cache_config: &ObjectStorageCacheConfig,
data_home: &str,
) -> Result<Option<LruCacheLayer<impl Access>>> {
// No need to build cache layer if read cache is disabled.
if !cache_config.enable_read_cache {
return Ok(None);
}
let cache_base_dir = if cache_config.cache_path.is_empty() {
data_home
} else {
&cache_config.cache_path
};
let atomic_temp_dir = join_dir(cache_base_dir, ATOMIC_WRITE_DIR);
clean_temp_dir(&atomic_temp_dir).context(error::ObjectStoreSnafu)?;
let cache_store = Fs::default()
.root(cache_base_dir)
.atomic_write_dir(&atomic_temp_dir)
.build()
.context(error::BuildCacheStoreSnafu)?;
let cache_layer = LruCacheLayer::new(
Arc::new(cache_store),
cache_config.cache_capacity.0 as usize,
)
.context(error::BuildCacheStoreSnafu)?;
cache_layer.recover_cache(false).await;
info!(
"Enabled local object storage cache, path: {}, capacity: {}.",
cache_config.cache_path, cache_config.cache_capacity
);
Ok(Some(cache_layer))
}

View File

@@ -33,9 +33,9 @@ use servers::grpc::FlightCompression;
use session::context::QueryContextRef;
use store_api::metadata::RegionMetadataRef;
use store_api::region_engine::{
CopyRegionFromRequest, CopyRegionFromResponse, RegionEngine, RegionManifestInfo, RegionRole,
RegionScannerRef, RegionStatistic, RemapManifestsRequest, RemapManifestsResponse,
SetRegionRoleStateResponse, SettableRegionRoleState, SyncManifestResponse,
};
use store_api::region_request::{AffectedRows, RegionRequest};
use store_api::storage::{RegionId, ScanRequest, SequenceNumber};
@@ -299,6 +299,14 @@ impl RegionEngine for MockRegionEngine {
unimplemented!()
}
async fn copy_region_from(
&self,
_region_id: RegionId,
_request: CopyRegionFromRequest,
) -> Result<CopyRegionFromResponse, BoxedError> {
unimplemented!()
}
fn as_any(&self) -> &dyn Any {
self
}

View File

@@ -33,7 +33,8 @@ pub use crate::schema::column_schema::{
COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE, COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY,
COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE, COMMENT_KEY, ColumnExtType, ColumnSchema, FULLTEXT_KEY,
FulltextAnalyzer, FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, Metadata,
SKIPPING_INDEX_KEY, SkippingIndexOptions, SkippingIndexType, TIME_INDEX_KEY, VECTOR_INDEX_KEY,
VectorDistanceMetric, VectorIndexEngineType, VectorIndexOptions,
};
pub use crate::schema::constraint::ColumnDefaultConstraint;
pub use crate::schema::raw::RawSchema;

View File

@@ -46,6 +46,8 @@ pub const FULLTEXT_KEY: &str = "greptime:fulltext";
pub const INVERTED_INDEX_KEY: &str = "greptime:inverted_index";
/// Key used to store skip options in arrow field's metadata.
pub const SKIPPING_INDEX_KEY: &str = "greptime:skipping_index";
/// Key used to store vector index options in arrow field's metadata.
pub const VECTOR_INDEX_KEY: &str = "greptime:vector_index";
/// Keys used in fulltext options
pub const COLUMN_FULLTEXT_CHANGE_OPT_KEY_ENABLE: &str = "enable";
@@ -216,6 +218,53 @@ impl ColumnSchema {
self.metadata.contains_key(INVERTED_INDEX_KEY)
}
/// Checks if this column has a vector index.
pub fn is_vector_indexed(&self) -> bool {
match self.vector_index_options() {
Ok(opts) => opts.is_some(),
Err(e) => {
common_telemetry::warn!(
"Failed to deserialize vector_index_options for column '{}': {}",
self.name,
e
);
false
}
}
}
/// Gets the vector index options.
pub fn vector_index_options(&self) -> Result<Option<VectorIndexOptions>> {
match self.metadata.get(VECTOR_INDEX_KEY) {
None => Ok(None),
Some(json) => {
let options =
serde_json::from_str(json).context(error::DeserializeSnafu { json })?;
Ok(Some(options))
}
}
}
/// Sets the vector index options.
pub fn set_vector_index_options(&mut self, options: &VectorIndexOptions) -> Result<()> {
self.metadata.insert(
VECTOR_INDEX_KEY.to_string(),
serde_json::to_string(options).context(error::SerializeSnafu)?,
);
Ok(())
}
/// Removes the vector index options.
pub fn unset_vector_index_options(&mut self) {
self.metadata.remove(VECTOR_INDEX_KEY);
}
/// Sets vector index options and returns self for chaining.
pub fn with_vector_index_options(mut self, options: &VectorIndexOptions) -> Result<Self> {
self.set_vector_index_options(options)?;
Ok(self)
}
/// Set default constraint.
///
/// If a default constraint exists for the column, this method will
@@ -964,6 +1013,181 @@ impl TryFrom<HashMap<String, String>> for SkippingIndexOptions {
}
}
/// Distance metric for vector similarity search.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default, Visit, VisitMut)]
#[serde(rename_all = "lowercase")]
pub enum VectorDistanceMetric {
/// Squared Euclidean distance (L2^2).
#[default]
L2sq,
/// Cosine distance (1 - cosine similarity).
Cosine,
/// Inner product (negative, for maximum inner product search).
#[serde(alias = "ip")]
InnerProduct,
}
impl fmt::Display for VectorDistanceMetric {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
VectorDistanceMetric::L2sq => write!(f, "l2sq"),
VectorDistanceMetric::Cosine => write!(f, "cosine"),
VectorDistanceMetric::InnerProduct => write!(f, "ip"),
}
}
}
impl std::str::FromStr for VectorDistanceMetric {
type Err = String;
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s.to_lowercase().as_str() {
"l2sq" | "l2" | "euclidean" => Ok(VectorDistanceMetric::L2sq),
"cosine" | "cos" => Ok(VectorDistanceMetric::Cosine),
"inner_product" | "ip" | "dot" => Ok(VectorDistanceMetric::InnerProduct),
_ => Err(format!(
"Unknown distance metric: {}. Expected: l2sq, cosine, or ip",
s
)),
}
}
}
impl VectorDistanceMetric {
/// Returns the metric as u8 for blob serialization.
pub fn as_u8(&self) -> u8 {
match self {
Self::L2sq => 0,
Self::Cosine => 1,
Self::InnerProduct => 2,
}
}
/// Parses metric from u8 (used when reading blob).
pub fn try_from_u8(v: u8) -> Option<Self> {
match v {
0 => Some(Self::L2sq),
1 => Some(Self::Cosine),
2 => Some(Self::InnerProduct),
_ => None,
}
}
}
/// Default HNSW connectivity parameter.
const DEFAULT_VECTOR_INDEX_CONNECTIVITY: u32 = 16;
/// Default expansion factor during index construction.
const DEFAULT_VECTOR_INDEX_EXPANSION_ADD: u32 = 128;
/// Default expansion factor during search.
const DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH: u32 = 64;
fn default_vector_index_connectivity() -> u32 {
DEFAULT_VECTOR_INDEX_CONNECTIVITY
}
fn default_vector_index_expansion_add() -> u32 {
DEFAULT_VECTOR_INDEX_EXPANSION_ADD
}
fn default_vector_index_expansion_search() -> u32 {
DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH
}
/// Supported vector index engine types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize, Visit, VisitMut)]
#[serde(rename_all = "lowercase")]
pub enum VectorIndexEngineType {
/// USearch HNSW implementation.
#[default]
Usearch,
// Future: Vsag,
}
impl VectorIndexEngineType {
/// Returns the engine type as u8 for blob serialization.
pub fn as_u8(&self) -> u8 {
match self {
Self::Usearch => 0,
}
}
/// Parses engine type from u8 (used when reading blob).
pub fn try_from_u8(v: u8) -> Option<Self> {
match v {
0 => Some(Self::Usearch),
_ => None,
}
}
}
impl fmt::Display for VectorIndexEngineType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::Usearch => write!(f, "usearch"),
}
}
}
impl std::str::FromStr for VectorIndexEngineType {
type Err = String;
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s.to_lowercase().as_str() {
"usearch" => Ok(Self::Usearch),
_ => Err(format!(
"Unknown vector index engine: {}. Expected: usearch",
s
)),
}
}
}
/// Options for vector index (HNSW).
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Visit, VisitMut)]
#[serde(rename_all = "kebab-case")]
pub struct VectorIndexOptions {
/// Vector index engine type (default: usearch).
#[serde(default)]
pub engine: VectorIndexEngineType,
/// Distance metric for similarity search.
#[serde(default)]
pub metric: VectorDistanceMetric,
/// HNSW connectivity parameter (M in the paper).
/// Higher values improve recall but increase memory usage.
#[serde(default = "default_vector_index_connectivity")]
pub connectivity: u32,
/// Expansion factor during index construction (ef_construction).
/// Higher values improve index quality but slow down construction.
#[serde(default = "default_vector_index_expansion_add")]
pub expansion_add: u32,
/// Expansion factor during search (ef_search).
/// Higher values improve recall but slow down search.
#[serde(default = "default_vector_index_expansion_search")]
pub expansion_search: u32,
}
impl Default for VectorIndexOptions {
fn default() -> Self {
Self {
engine: VectorIndexEngineType::default(),
metric: VectorDistanceMetric::default(),
connectivity: DEFAULT_VECTOR_INDEX_CONNECTIVITY,
expansion_add: DEFAULT_VECTOR_INDEX_EXPANSION_ADD,
expansion_search: DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH,
}
}
}
impl fmt::Display for VectorIndexOptions {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"engine={}, metric={}, connectivity={}, expansion_add={}, expansion_search={}",
self.engine, self.metric, self.connectivity, self.expansion_add, self.expansion_search
)
}
}
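A hedged sketch of attaching these options to a column via the ColumnSchema methods added earlier in this diff; the column name, dimension, and vector type constructor are illustrative assumptions:
// Hedged sketch: a cosine-metric HNSW index on a vector column.
fn vector_column() -> Result<ColumnSchema> {
let options = VectorIndexOptions {
metric: VectorDistanceMetric::Cosine,
..Default::default()
};
// `vector_datatype(384)` is a placeholder constructor for a 384-dim vector type.
let column = ColumnSchema::new("embedding", ConcreteDataType::vector_datatype(384), false)
.with_vector_index_options(&options)?;
debug_assert!(column.is_vector_indexed());
Ok(column)
}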
#[cfg(test)]
mod tests {
use std::sync::Arc;

View File

@@ -26,10 +26,10 @@ use object_store::ObjectStore;
use snafu::{OptionExt, ensure};
use store_api::metadata::RegionMetadataRef;
use store_api::region_engine::{
CopyRegionFromRequest, CopyRegionFromResponse, RegionEngine, RegionManifestInfo, RegionRole,
RegionScannerRef, RegionStatistic, RemapManifestsRequest, RemapManifestsResponse,
SetRegionRoleStateResponse, SetRegionRoleStateSuccess, SettableRegionRoleState,
SinglePartitionScanner, SyncManifestResponse,
};
use store_api::region_request::{
AffectedRows, RegionCloseRequest, RegionCreateRequest, RegionDropRequest, RegionOpenRequest,
@@ -163,6 +163,19 @@ impl RegionEngine for FileRegionEngine {
))
}
async fn copy_region_from(
&self,
_region_id: RegionId,
_request: CopyRegionFromRequest,
) -> Result<CopyRegionFromResponse, BoxedError> {
Err(BoxedError::new(
UnsupportedSnafu {
operation: "copy_region_from",
}
.build(),
))
}
fn role(&self, region_id: RegionId) -> Option<RegionRole> {
self.inner.state(region_id)
}

View File

@@ -110,6 +110,26 @@ impl FrontendClient {
)
}
/// Check if the frontend client is initialized.
///
/// In distributed mode, it is always initialized.
/// In standalone mode, it checks if the database client is set.
pub fn is_initialized(&self) -> bool {
match self {
FrontendClient::Distributed { .. } => true,
FrontendClient::Standalone {
database_client, ..
} => {
let guard = database_client.lock();
if let Ok(guard) = guard {
guard.is_some()
} else {
false
}
}
}
}
pub fn from_meta_client(
meta_client: Arc<MetaClient>,
auth: Option<FlowAuthHeader>,

View File

@@ -17,6 +17,7 @@ arc-swap = "1.0"
async-stream.workspace = true
async-trait.workspace = true
auth.workspace = true
axum.workspace = true
bytes.workspace = true
cache.workspace = true
catalog.workspace = true
@@ -85,6 +86,9 @@ common-test-util.workspace = true
datanode.workspace = true
datatypes.workspace = true
futures.workspace = true
hyper-util = { workspace = true, features = ["tokio"] }
meta-srv.workspace = true
reqwest.workspace = true
serde_json.workspace = true
strfmt = "0.2"
tower.workspace = true

View File

@@ -364,6 +364,12 @@ pub enum Error {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Service suspended"))]
Suspended {
#[snafu(implicit)]
location: Location,
},
}
pub type Result<T> = std::result::Result<T, Error>;
@@ -444,6 +450,8 @@ impl ErrorExt for Error {
Error::StatementTimeout { .. } => StatusCode::Cancelled,
Error::AcquireLimiter { .. } => StatusCode::Internal,
Error::Suspended { .. } => StatusCode::Suspended,
}
}

View File

@@ -141,7 +141,43 @@ impl Frontend {
#[cfg(test)]
mod tests {
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;
use api::v1::meta::heartbeat_server::HeartbeatServer;
use api::v1::meta::mailbox_message::Payload;
use api::v1::meta::{
AskLeaderRequest, AskLeaderResponse, HeartbeatRequest, HeartbeatResponse, MailboxMessage,
Peer, ResponseHeader, Role, heartbeat_server,
};
use async_trait::async_trait;
use client::{Client, Database};
use common_catalog::consts::{DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME};
use common_error::ext::ErrorExt;
use common_error::from_header_to_err_code_msg;
use common_error::status_code::StatusCode;
use common_grpc::channel_manager::ChannelManager;
use common_meta::distributed_time_constants::FRONTEND_HEARTBEAT_INTERVAL_MILLIS;
use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::suspend::SuspendHandler;
use common_meta::instruction::Instruction;
use common_stat::ResourceStatImpl;
use meta_client::MetaClientRef;
use meta_client::client::MetaClientBuilder;
use meta_srv::service::GrpcStream;
use servers::grpc::{FlightCompression, GRPC_SERVER};
use servers::http::HTTP_SERVER;
use servers::http::result::greptime_result_v1::GreptimedbV1Response;
use tokio::sync::mpsc;
use tonic::codec::CompressionEncoding;
use tonic::codegen::tokio_stream::StreamExt;
use tonic::codegen::tokio_stream::wrappers::ReceiverStream;
use tonic::{Request, Response, Status, Streaming};
use super::*;
use crate::instance::builder::FrontendBuilder;
use crate::server::Services;
#[test]
fn test_toml() {
@@ -149,4 +185,277 @@ mod tests {
let toml_string = toml::to_string(&opts).unwrap();
let _parsed: FrontendOptions = toml::from_str(&toml_string).unwrap();
}
struct SuspendableHeartbeatServer {
suspend: Arc<AtomicBool>,
}
#[async_trait]
impl heartbeat_server::Heartbeat for SuspendableHeartbeatServer {
type HeartbeatStream = GrpcStream<HeartbeatResponse>;
async fn heartbeat(
&self,
request: Request<Streaming<HeartbeatRequest>>,
) -> std::result::Result<Response<Self::HeartbeatStream>, Status> {
let (tx, rx) = mpsc::channel(4);
common_runtime::spawn_global({
let mut requests = request.into_inner();
let suspend = self.suspend.clone();
async move {
while let Some(request) = requests.next().await {
if let Err(e) = request {
let _ = tx.send(Err(e)).await;
return;
}
let mailbox_message =
suspend.load(Ordering::Relaxed).then(|| MailboxMessage {
payload: Some(Payload::Json(
serde_json::to_string(&Instruction::Suspend).unwrap(),
)),
..Default::default()
});
let response = HeartbeatResponse {
header: Some(ResponseHeader::success()),
mailbox_message,
..Default::default()
};
let _ = tx.send(Ok(response)).await;
}
}
});
Ok(Response::new(Box::pin(ReceiverStream::new(rx))))
}
async fn ask_leader(
&self,
_: Request<AskLeaderRequest>,
) -> std::result::Result<Response<AskLeaderResponse>, Status> {
Ok(Response::new(AskLeaderResponse {
header: Some(ResponseHeader::success()),
leader: Some(Peer {
addr: "localhost:0".to_string(),
..Default::default()
}),
}))
}
}
async fn create_meta_client(
options: &MetaClientOptions,
heartbeat_server: Arc<SuspendableHeartbeatServer>,
) -> MetaClientRef {
let (client, server) = tokio::io::duplex(1024);
// create the heartbeat server:
common_runtime::spawn_global(async move {
let mut router = tonic::transport::Server::builder();
let router = router.add_service(
HeartbeatServer::from_arc(heartbeat_server)
.accept_compressed(CompressionEncoding::Zstd)
.send_compressed(CompressionEncoding::Zstd),
);
router
.serve_with_incoming(futures::stream::iter([Ok::<_, std::io::Error>(server)]))
.await
});
// Move client to an option so we can _move_ the inner value
// on the first attempt to connect. All other attempts will fail.
let mut client = Some(client);
let connector = tower::service_fn(move |_| {
let client = client.take();
async move {
if let Some(client) = client {
Ok(hyper_util::rt::TokioIo::new(client))
} else {
Err(std::io::Error::other("client already taken"))
}
}
});
let manager = ChannelManager::new();
manager
.reset_with_connector("localhost:0", connector)
.unwrap();
// create the heartbeat client:
let mut client = MetaClientBuilder::new(0, Role::Frontend)
.enable_heartbeat()
.heartbeat_channel_manager(manager)
.build();
client.start(&options.metasrv_addrs).await.unwrap();
Arc::new(client)
}
async fn create_frontend(
options: &FrontendOptions,
meta_client: MetaClientRef,
) -> Result<Frontend> {
let instance = Arc::new(
FrontendBuilder::new_test(options, meta_client.clone())
.try_build()
.await?,
);
let servers =
Services::new(options.clone(), instance.clone(), Default::default()).build()?;
let executor = Arc::new(HandlerGroupExecutor::new(vec![
Arc::new(ParseMailboxMessageHandler),
Arc::new(SuspendHandler::new(instance.suspend_state())),
]));
let heartbeat_task = Some(HeartbeatTask::new(
options,
meta_client,
executor,
Arc::new(ResourceStatImpl::default()),
));
let mut frontend = Frontend {
instance,
servers,
heartbeat_task,
};
frontend.start().await?;
Ok(frontend)
}
async fn verify_suspend_state_by_http(
frontend: &Frontend,
expected: std::result::Result<&str, (StatusCode, &str)>,
) {
let addr = frontend.server_handlers().addr(HTTP_SERVER).unwrap();
let response = reqwest::get(format!("http://{}/v1/sql?sql=SELECT 1", addr))
.await
.unwrap();
let headers = response.headers();
let response = if let Some((code, error)) = from_header_to_err_code_msg(headers) {
Err((code, error))
} else {
Ok(response.text().await.unwrap())
};
match (response, expected) {
(Ok(response), Ok(expected)) => {
let response: GreptimedbV1Response = serde_json::from_str(&response).unwrap();
let response = serde_json::to_string(response.output()).unwrap();
assert_eq!(&response, expected);
}
(Err(actual), Err(expected)) => assert_eq!(actual, expected),
_ => unreachable!(),
}
}
async fn verify_suspend_state_by_grpc(
frontend: &Frontend,
expected: std::result::Result<&str, (StatusCode, &str)>,
) {
let addr = frontend.server_handlers().addr(GRPC_SERVER).unwrap();
let client = Client::with_urls([addr.to_string()]);
let client = Database::new(DEFAULT_CATALOG_NAME, DEFAULT_SCHEMA_NAME, client);
let response = client.sql("SELECT 1").await;
match (response, expected) {
(Ok(response), Ok(expected)) => {
let response = response.data.pretty_print().await;
assert_eq!(&response, expected.trim());
}
(Err(actual), Err(expected)) => {
assert_eq!(actual.status_code(), expected.0);
assert_eq!(actual.output_msg(), expected.1);
}
_ => unreachable!(),
}
}
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn test_suspend_frontend() -> Result<()> {
common_telemetry::init_default_ut_logging();
let meta_client_options = MetaClientOptions {
metasrv_addrs: vec!["localhost:0".to_string()],
..Default::default()
};
let options = FrontendOptions {
http: HttpOptions {
addr: "127.0.0.1:0".to_string(),
..Default::default()
},
grpc: GrpcOptions {
bind_addr: "127.0.0.1:0".to_string(),
flight_compression: FlightCompression::None,
..Default::default()
},
mysql: MysqlOptions {
enable: false,
..Default::default()
},
postgres: PostgresOptions {
enable: false,
..Default::default()
},
meta_client: Some(meta_client_options.clone()),
..Default::default()
};
let server = Arc::new(SuspendableHeartbeatServer {
suspend: Arc::new(AtomicBool::new(false)),
});
let meta_client = create_meta_client(&meta_client_options, server.clone()).await;
let frontend = create_frontend(&options, meta_client).await?;
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
// initial state: not suspend:
assert!(!frontend.instance.is_suspended());
verify_suspend_state_by_http(&frontend, Ok(r#"[{"records":{"schema":{"column_schemas":[{"name":"Int64(1)","data_type":"Int64"}]},"rows":[[1]],"total_rows":1}}]"#)).await;
verify_suspend_state_by_grpc(
&frontend,
Ok(r#"
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+"#),
)
.await;
// make the heartbeat server return the "suspend" instruction,
server.suspend.store(true, Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
// ... then the frontend is suspended:
assert!(frontend.instance.is_suspended());
verify_suspend_state_by_http(
&frontend,
Err((
StatusCode::Suspended,
"error: Service suspended, execution_time_ms: 0",
)),
)
.await;
verify_suspend_state_by_grpc(&frontend, Err((StatusCode::Suspended, "Service suspended")))
.await;
// make the heartbeat server NOT return the "suspend" instruction,
server.suspend.store(false, Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
// ... then frontend's suspend state is cleared:
assert!(!frontend.instance.is_suspended());
verify_suspend_state_by_http(&frontend, Ok(r#"[{"records":{"schema":{"column_schemas":[{"name":"Int64(1)","data_type":"Int64"}]},"rows":[[1]],"total_rows":1}}]"#)).await;
verify_suspend_state_by_grpc(
&frontend,
Ok(r#"
+----------+
| Int64(1) |
+----------+
| 1 |
+----------+"#),
)
.await;
Ok(())
}
}

View File

@@ -27,7 +27,6 @@ use common_stat::ResourceStatRef;
use common_telemetry::{debug, error, info, warn};
use meta_client::client::{HeartbeatSender, HeartbeatStream, MetaClient};
use servers::addrs;
-use servers::heartbeat_options::HeartbeatOptions;
use snafu::ResultExt;
use tokio::sync::mpsc;
use tokio::sync::mpsc::Receiver;
@@ -54,7 +53,6 @@ impl HeartbeatTask {
pub fn new(
opts: &FrontendOptions,
meta_client: Arc<MetaClient>,
-heartbeat_opts: HeartbeatOptions,
resp_handler_executor: HeartbeatResponseHandlerExecutorRef,
resource_stat: ResourceStatRef,
) -> Self {
@@ -68,8 +66,8 @@ impl HeartbeatTask {
addrs::resolve_addr(&opts.grpc.bind_addr, Some(&opts.grpc.server_addr))
},
meta_client,
-report_interval: heartbeat_opts.interval,
-retry_interval: heartbeat_opts.retry_interval,
+report_interval: opts.heartbeat.interval,
+retry_interval: opts.heartbeat.retry_interval,
resp_handler_executor,
start_time_ms: common_time::util::current_time_millis() as u64,
resource_stat,
@@ -196,7 +194,8 @@ impl HeartbeatTask {
let report_interval = self.report_interval;
let start_time_ms = self.start_time_ms;
let self_peer = Some(Peer {
-// The peer id doesn't make sense for frontend, so we just set it 0.
+// The node id will actually be calculated from its address (by hashing the
+// address string) in the metasrv, so it can be set to 0 here as a placeholder.
id: 0,
addr: self.peer_addr.clone(),
});

View File

@@ -26,7 +26,8 @@ mod region_query;
pub mod standalone;
use std::pin::Pin;
-use std::sync::Arc;
+use std::sync::atomic::AtomicBool;
+use std::sync::{Arc, atomic};
use std::time::{Duration, SystemTime};
use async_stream::stream;
@@ -83,6 +84,7 @@ use snafu::prelude::*;
use sql::ast::ObjectNamePartExt;
use sql::dialect::Dialect;
use sql::parser::{ParseOptions, ParserContext};
use sql::statements::comment::CommentObject;
use sql::statements::copy::{CopyDatabase, CopyTable};
use sql::statements::statement::Statement;
use sql::statements::tql::Tql;
@@ -119,6 +121,7 @@ pub struct Instance {
limiter: Option<LimiterRef>,
process_manager: ProcessManagerRef,
slow_query_options: SlowQueryOptions,
suspend: Arc<AtomicBool>,
// cache for otlp metrics
// first layer key: db-string
@@ -171,6 +174,14 @@ impl Instance {
pub fn procedure_executor(&self) -> &ProcedureExecutorRef {
self.statement_executor.procedure_executor()
}
pub fn suspend_state(&self) -> Arc<AtomicBool> {
self.suspend.clone()
}
pub(crate) fn is_suspended(&self) -> bool {
self.suspend.load(atomic::Ordering::Relaxed)
}
}
fn parse_stmt(sql: &str, dialect: &(dyn Dialect + Send + Sync)) -> Result<Vec<Statement>> {
@@ -513,6 +524,10 @@ impl SqlQueryHandler for Instance {
#[tracing::instrument(skip_all)]
async fn do_query(&self, query: &str, query_ctx: QueryContextRef) -> Vec<Result<Output>> {
if self.is_suspended() {
return vec![error::SuspendedSnafu {}.fail()];
}
let query_interceptor_opt = self.plugins.get::<SqlQueryInterceptorRef<Error>>();
let query_interceptor = query_interceptor_opt.as_ref();
let query = match query_interceptor.pre_parsing(query, query_ctx.clone()) {
@@ -580,6 +595,8 @@ impl SqlQueryHandler for Instance {
plan: LogicalPlan,
query_ctx: QueryContextRef,
) -> Result<Output> {
ensure!(!self.is_suspended(), error::SuspendedSnafu);
if should_capture_statement(stmt.as_ref()) {
// It's safe to unwrap here because we've already checked the type.
let stmt = stmt.unwrap();
@@ -641,6 +658,10 @@ impl SqlQueryHandler for Instance {
query: &PromQuery,
query_ctx: QueryContextRef,
) -> Vec<Result<Output>> {
if self.is_suspended() {
return vec![error::SuspendedSnafu {}.fail()];
}
// check will be done in prometheus handler's do_query
let result = PrometheusHandler::do_query(self, query, query_ctx)
.await
@@ -655,6 +676,8 @@ impl SqlQueryHandler for Instance {
stmt: Statement,
query_ctx: QueryContextRef,
) -> Result<Option<DescribeResult>> {
ensure!(!self.is_suspended(), error::SuspendedSnafu);
if matches!(
stmt,
Statement::Insert(_) | Statement::Query(_) | Statement::Delete(_)
@@ -875,7 +898,7 @@ pub fn check_permission(
validate_param(&stmt.table_name, query_ctx)?;
}
Statement::ShowCreateFlow(stmt) => {
-validate_param(&stmt.flow_name, query_ctx)?;
+validate_flow(&stmt.flow_name, query_ctx)?;
}
#[cfg(feature = "enterprise")]
Statement::ShowCreateTrigger(stmt) => {
@@ -908,6 +931,12 @@ pub fn check_permission(
// show charset and show collation won't be checked
Statement::ShowCharset(_) | Statement::ShowCollation(_) => {}
Statement::Comment(comment) => match &comment.object {
CommentObject::Table(table) => validate_param(table, query_ctx)?,
CommentObject::Column { table, .. } => validate_param(table, query_ctx)?,
CommentObject::Flow(flow) => validate_flow(flow, query_ctx)?,
},
Statement::Insert(insert) => {
let name = insert.table_name().context(ParseSqlSnafu)?;
validate_param(name, query_ctx)?;
@@ -993,6 +1022,27 @@ fn validate_param(name: &ObjectName, query_ctx: &QueryContextRef) -> Result<()>
.context(SqlExecInterceptedSnafu)
}
fn validate_flow(name: &ObjectName, query_ctx: &QueryContextRef) -> Result<()> {
let catalog = match &name.0[..] {
[_flow] => query_ctx.current_catalog().to_string(),
[catalog, _flow] => catalog.to_string_unquoted(),
_ => {
return InvalidSqlSnafu {
err_msg: format!(
"expect flow name to be <catalog>.<flow_name> or <flow_name>, actual: {name}",
),
}
.fail();
}
};
let schema = query_ctx.current_schema();
validate_catalog_and_schema(&catalog, &schema, query_ctx)
.map_err(BoxedError::new)
.context(SqlExecInterceptedSnafu)
}
fn validate_database(name: &ObjectName, query_ctx: &QueryContextRef) -> Result<()> {
let (catalog, schema) = match &name.0[..] {
[schema] => (
@@ -1251,6 +1301,28 @@ mod tests {
// test describe table
let sql = "DESC TABLE {catalog}{schema}demo;";
-replace_test(sql, plugins, &query_ctx);
+replace_test(sql, plugins.clone(), &query_ctx);
let comment_flow_cases = [
("COMMENT ON FLOW my_flow IS 'comment';", true),
("COMMENT ON FLOW greptime.my_flow IS 'comment';", true),
("COMMENT ON FLOW wrongcatalog.my_flow IS 'comment';", false),
];
for (sql, is_ok) in comment_flow_cases {
let stmt = &parse_stmt(sql, &GreptimeDbDialect {}).unwrap()[0];
let result = check_permission(plugins.clone(), stmt, &query_ctx);
assert_eq!(result.is_ok(), is_ok);
}
let show_flow_cases = [
("SHOW CREATE FLOW my_flow;", true),
("SHOW CREATE FLOW greptime.my_flow;", true),
("SHOW CREATE FLOW wrongcatalog.my_flow;", false),
];
for (sql, is_ok) in show_flow_cases {
let stmt = &parse_stmt(sql, &GreptimeDbDialect {}).unwrap()[0];
let result = check_permission(plugins.clone(), stmt, &query_ctx);
assert_eq!(result.is_ok(), is_ok);
}
}
}

View File

@@ -13,6 +13,7 @@
// limitations under the License.
use std::sync::Arc;
use std::sync::atomic::AtomicBool;
use cache::{TABLE_FLOWNODE_SET_CACHE_NAME, TABLE_ROUTE_CACHE_NAME};
use catalog::CatalogManagerRef;
@@ -87,6 +88,33 @@ impl FrontendBuilder {
}
}
#[cfg(test)]
pub(crate) fn new_test(
options: &FrontendOptions,
meta_client: meta_client::MetaClientRef,
) -> Self {
let kv_backend = Arc::new(common_meta::kv_backend::memory::MemoryKvBackend::new());
let layered_cache_registry = Arc::new(
common_meta::cache::LayeredCacheRegistryBuilder::default()
.add_cache_registry(cache::build_fundamental_cache_registry(kv_backend.clone()))
.build(),
);
Self::new(
options.clone(),
kv_backend,
layered_cache_registry,
catalog::memory::MemoryCatalogManager::with_default_setup(),
Arc::new(client::client_manager::NodeClients::default()),
meta_client,
Arc::new(catalog::process_manager::ProcessManager::new(
"".to_string(),
None,
)),
)
}
pub fn with_local_cache_invalidator(self, cache_invalidator: CacheInvalidatorRef) -> Self {
Self {
local_cache_invalidator: Some(cache_invalidator),
@@ -242,6 +270,7 @@ impl FrontendBuilder {
process_manager,
otlp_metrics_table_legacy_cache: DashMap::new(),
slow_query_options: self.options.slow_query.clone(),
suspend: Arc::new(AtomicBool::new(false)),
})
}
}

View File

@@ -234,6 +234,11 @@ impl GrpcQueryHandler for Instance {
DdlExpr::DropView(_) => {
todo!("implemented in the following PR")
}
DdlExpr::CommentOn(expr) => {
self.statement_executor
.comment_by_expr(expr, ctx.clone())
.await?
}
}
}
};
@@ -296,22 +301,35 @@ impl GrpcQueryHandler for Instance {
mut stream: servers::grpc::flight::PutRecordBatchRequestStream,
ctx: QueryContextRef,
) -> Pin<Box<dyn Stream<Item = Result<DoPutResponse>> + Send>> {
-// Resolve table once for the stream
// Clone all necessary data to make it 'static
let catalog_manager = self.catalog_manager().clone();
let plugins = self.plugins.clone();
let inserter = self.inserter.clone();
-let table_name = stream.table_name().clone();
let ctx = ctx.clone();
+let mut table_ref: Option<TableRef> = None;
+let mut table_checked = false;
Box::pin(try_stream! {
+// Process each request in the stream
+while let Some(request_result) = stream.next().await {
+let request = request_result.map_err(|e| {
+let error_msg = format!("Stream error: {:?}", e);
+IncompleteGrpcRequestSnafu { err_msg: error_msg }.build()
+})?;
+// Resolve table and check permissions on first RecordBatch (after schema is received)
+if !table_checked {
+let table_name = &request.table_name;
plugins
.get::<PermissionCheckerRef>()
.as_ref()
.check_permission(ctx.current_user(), PermissionReq::BulkInsert)
.context(PermissionSnafu)?;
-// Cache for resolved table reference - resolve once and reuse
-let table_ref = catalog_manager
+// Resolve table reference
+table_ref = Some(
+catalog_manager
.table(
&table_name.catalog_name,
&table_name.schema_name,
@@ -322,25 +340,22 @@ impl GrpcQueryHandler for Instance {
.context(CatalogSnafu)?
.with_context(|| TableNotFoundSnafu {
table_name: table_name.to_string(),
-})?;
+})?,
+);
-// Check permissions once for the stream
+// Check permissions for the table
let interceptor_ref = plugins.get::<GrpcQueryInterceptorRef<Error>>();
let interceptor = interceptor_ref.as_ref();
-interceptor.pre_bulk_insert(table_ref.clone(), ctx.clone())?;
+interceptor.pre_bulk_insert(table_ref.clone().unwrap(), ctx.clone())?;
+table_checked = true;
+}
-// Process each request in the stream
-while let Some(request_result) = stream.next().await {
-let request = request_result.map_err(|e| {
-let error_msg = format!("Stream error: {:?}", e);
-IncompleteGrpcRequestSnafu { err_msg: error_msg }.build()
-})?;
let request_id = request.request_id;
let start = Instant::now();
let rows = inserter
.handle_bulk_insert(
-table_ref.clone(),
+table_ref.clone().unwrap(),
request.flight_data,
request.record_batch,
request.schema_bytes,
@@ -399,6 +414,9 @@ fn fill_catalog_and_schema_from_context(ddl_expr: &mut DdlExpr, ctx: &QueryConte
Expr::DropView(expr) => {
check_and_fill!(expr);
}
Expr::CommentOn(expr) => {
check_and_fill!(expr);
}
}
}

View File

@@ -65,8 +65,7 @@ impl JaegerQueryHandler for Instance {
// It's equivalent to `SELECT DISTINCT(service_name) FROM {db}.{trace_table}`.
Ok(query_trace_table(
ctx,
-self.catalog_manager(),
-self.query_engine(),
+self,
vec![SelectExpr::from(col(SERVICE_NAME_COLUMN))],
vec![],
vec![],
@@ -107,8 +106,7 @@ impl JaegerQueryHandler for Instance {
// ```.
Ok(query_trace_table(
ctx,
-self.catalog_manager(),
-self.query_engine(),
+self,
vec![
SelectExpr::from(col(SPAN_NAME_COLUMN)),
SelectExpr::from(col(SPAN_KIND_COLUMN)),
@@ -160,8 +158,7 @@ impl JaegerQueryHandler for Instance {
Ok(query_trace_table(
ctx,
-self.catalog_manager(),
-self.query_engine(),
+self,
selects,
filters,
vec![col(TIMESTAMP_COLUMN).sort(false, false)], // Sort by timestamp in descending order.
@@ -220,8 +217,7 @@ impl JaegerQueryHandler for Instance {
// ```.
let output = query_trace_table(
ctx.clone(),
-self.catalog_manager(),
-self.query_engine(),
+self,
vec![wildcard()],
filters,
vec![],
@@ -285,8 +281,7 @@ impl JaegerQueryHandler for Instance {
// query all spans
Ok(query_trace_table(
ctx,
-self.catalog_manager(),
-self.query_engine(),
+self,
vec![wildcard()],
filters,
vec![col(TIMESTAMP_COLUMN).sort(false, false)], // Sort by timestamp in descending order.
@@ -303,8 +298,7 @@ impl JaegerQueryHandler for Instance {
#[allow(clippy::too_many_arguments)]
async fn query_trace_table(
ctx: QueryContextRef,
-catalog_manager: &CatalogManagerRef,
-query_engine: &QueryEngineRef,
+instance: &Instance,
selects: Vec<SelectExpr>,
filters: Vec<Expr>,
sorts: Vec<SortExpr>,
@@ -334,7 +328,8 @@ async fn query_trace_table(
}
};
-let table = catalog_manager
+let table = instance
+.catalog_manager()
.table(
ctx.current_catalog(),
&ctx.current_schema(),
@@ -367,7 +362,7 @@ async fn query_trace_table(
.map(|s| format!("\"{}\"", s))
.collect::<HashSet<String>>();
-let df_context = create_df_context(query_engine)?;
+let df_context = create_df_context(instance.query_engine())?;
let dataframe = df_context
.read_table(Arc::new(DfTableProviderAdapter::new(table)))

View File

@@ -16,6 +16,9 @@ use std::net::SocketAddr;
use std::sync::Arc;
use auth::UserProviderRef;
use axum::extract::{Request, State};
use axum::middleware::Next;
use axum::response::IntoResponse;
use common_base::Plugins;
use common_config::Configurable;
use common_telemetry::info;
@@ -27,6 +30,7 @@ use servers::grpc::frontend_grpc_handler::FrontendGrpcHandler;
use servers::grpc::greptime_handler::GreptimeRequestHandler;
use servers::grpc::{GrpcOptions, GrpcServer};
use servers::http::event::LogValidatorRef;
use servers::http::result::error_result::ErrorResponse;
use servers::http::utils::router::RouterConfigurator;
use servers::http::{HttpServer, HttpServerBuilder};
use servers::interceptor::LogIngestInterceptorRef;
@@ -39,6 +43,7 @@ use servers::query_handler::sql::ServerSqlQueryHandlerAdapter;
use servers::server::{Server, ServerHandlers};
use servers::tls::{ReloadableTlsServerConfig, maybe_watch_server_tls_config};
use snafu::ResultExt;
use tonic::Status;
use crate::error::{self, Result, StartServerSnafu, TomlFormatSnafu};
use crate::frontend::FrontendOptions;
@@ -125,7 +130,16 @@ where
builder = builder.with_extra_router(configurator.router());
}
-builder
+builder.add_layer(axum::middleware::from_fn_with_state(
+self.instance.clone(),
+async move |State(state): State<Arc<Instance>>, request: Request, next: Next| {
+if state.is_suspended() {
+return ErrorResponse::from_error(servers::error::SuspendedSnafu.build())
+.into_response();
+}
+next.run(request).await
+},
+))
}
pub fn with_grpc_server_builder(self, builder: GrpcServerBuilder) -> Self {
@@ -197,7 +211,17 @@ where
self.instance.clone(),
user_provider.clone(),
))
-.flight_handler(flight_handler);
+.flight_handler(flight_handler)
+.add_layer(axum::middleware::from_fn_with_state(
+self.instance.clone(),
+async move |State(state): State<Arc<Instance>>, request: Request, next: Next| {
+if state.is_suspended() {
+let status = Status::from(servers::error::SuspendedSnafu.build());
+return status.into_http();
+}
+next.run(request).await
+},
+));
let grpc_server = if !external {
let frontend_grpc_handler =
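The suspend gate used for both the HTTP and gRPC servers above boils down to one pattern: a shared `AtomicBool` driving an axum `from_fn_with_state` layer. A self-contained sketch of that pattern (illustrative only; the real layers return `ErrorResponse`/`tonic::Status` rather than a plain status code, and the state is the `Instance`, not a raw flag):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};

use axum::Router;
use axum::extract::{Request, State};
use axum::http::StatusCode;
use axum::middleware::{self, Next};
use axum::response::{IntoResponse, Response};
use axum::routing::get;

// Reject every request while the shared suspend flag is set.
async fn suspend_gate(
    State(suspended): State<Arc<AtomicBool>>,
    request: Request,
    next: Next,
) -> Response {
    if suspended.load(Ordering::Relaxed) {
        return (StatusCode::SERVICE_UNAVAILABLE, "service suspended").into_response();
    }
    next.run(request).await
}

fn app(suspended: Arc<AtomicBool>) -> Router {
    Router::new()
        .route("/v1/sql", get(|| async { "ok" }))
        .layer(middleware::from_fn_with_state(suspended, suspend_gate))
}
```

Flipping the flag from the heartbeat handler (as the `SuspendHandler` does) immediately affects all in-flight routing decisions, with no server restart.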

View File

@@ -7,6 +7,9 @@ license.workspace = true
[lints]
workspace = true
[features]
vector_index = ["dep:usearch"]
[dependencies]
async-trait.workspace = true
asynchronous-codec = "0.7.0"
@@ -17,6 +20,7 @@ common-error.workspace = true
common-macro.workspace = true
common-runtime.workspace = true
common-telemetry.workspace = true
datatypes.workspace = true
fastbloom = "0.8"
fst.workspace = true
futures.workspace = true
@@ -25,6 +29,7 @@ itertools.workspace = true
jieba-rs = "0.8"
lazy_static.workspace = true
mockall.workspace = true
nalgebra.workspace = true
pin-project.workspace = true
prost.workspace = true
puffin.workspace = true
@@ -39,6 +44,7 @@ tantivy = { version = "0.24", features = ["zstd-compression"] }
tantivy-jieba = "0.16"
tokio.workspace = true
tokio-util.workspace = true
usearch = { version = "2.21", default-features = false, features = ["fp16lib"], optional = true }
uuid.workspace = true
[dev-dependencies]

View File

@@ -22,6 +22,8 @@ pub mod external_provider;
pub mod fulltext_index;
pub mod inverted_index;
pub mod target;
#[cfg(feature = "vector_index")]
pub mod vector;
pub type Bytes = Vec<u8>;
pub type BytesRef<'a> = &'a [u8];

src/index/src/vector.rs Normal file
View File

@@ -0,0 +1,163 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Vector index types and options.
//!
//! This module re-exports types from `datatypes` and provides conversions
//! to USearch types, as well as distance computation functions.
pub use datatypes::schema::{VectorDistanceMetric, VectorIndexOptions};
use nalgebra::DVectorView;
pub use usearch::MetricKind;
/// Converts a VectorDistanceMetric to a USearch MetricKind.
pub fn distance_metric_to_usearch(metric: VectorDistanceMetric) -> MetricKind {
match metric {
VectorDistanceMetric::L2sq => MetricKind::L2sq,
VectorDistanceMetric::Cosine => MetricKind::Cos,
VectorDistanceMetric::InnerProduct => MetricKind::IP,
}
}
/// Computes distance between two vectors using the specified metric.
///
/// Uses SIMD-optimized implementations via nalgebra.
///
/// **Note:** The caller must ensure that the two vectors have the same length.
/// Empty vectors are handled defensively and return 0.0 for all metrics.
pub fn compute_distance(v1: &[f32], v2: &[f32], metric: VectorDistanceMetric) -> f32 {
// Empty vectors are degenerate; return 0.0 uniformly across all metrics.
if v1.is_empty() || v2.is_empty() {
return 0.0;
}
match metric {
VectorDistanceMetric::L2sq => l2sq(v1, v2),
VectorDistanceMetric::Cosine => cosine(v1, v2),
VectorDistanceMetric::InnerProduct => -dot(v1, v2),
}
}
/// Calculates the squared L2 distance between two vectors.
fn l2sq(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs = DVectorView::from_slice(lhs, lhs.len());
let rhs = DVectorView::from_slice(rhs, rhs.len());
(lhs - rhs).norm_squared()
}
/// Calculates the cosine distance between two vectors.
///
/// Returns a value in `[0.0, 2.0]` where 0.0 means identical direction and 2.0 means
/// opposite direction. For degenerate cases (zero or near-zero magnitude vectors),
/// returns 1.0 (maximum uncertainty) to avoid NaN and ensure safe index operations.
fn cosine(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs_vec = DVectorView::from_slice(lhs, lhs.len());
let rhs_vec = DVectorView::from_slice(rhs, rhs.len());
let dot_product = lhs_vec.dot(&rhs_vec);
let lhs_norm = lhs_vec.norm();
let rhs_norm = rhs_vec.norm();
// Zero-magnitude vectors have undefined direction; return max distance as safe fallback.
if dot_product.abs() < f32::EPSILON
|| lhs_norm.abs() < f32::EPSILON
|| rhs_norm.abs() < f32::EPSILON
{
return 1.0;
}
let cos_similar = dot_product / (lhs_norm * rhs_norm);
let res = 1.0 - cos_similar;
// Clamp near-zero results to exactly 0.0 to avoid floating-point artifacts.
if res.abs() < f32::EPSILON { 0.0 } else { res }
}
/// Calculates the dot product between two vectors.
fn dot(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs = DVectorView::from_slice(lhs, lhs.len());
let rhs = DVectorView::from_slice(rhs, rhs.len());
lhs.dot(&rhs)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_distance_metric_to_usearch() {
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::L2sq),
MetricKind::L2sq
);
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::Cosine),
MetricKind::Cos
);
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::InnerProduct),
MetricKind::IP
);
}
#[test]
fn test_vector_index_options_default() {
let options = VectorIndexOptions::default();
assert_eq!(options.metric, VectorDistanceMetric::L2sq);
assert_eq!(options.connectivity, 16);
assert_eq!(options.expansion_add, 128);
assert_eq!(options.expansion_search, 64);
}
#[test]
fn test_compute_distance_l2sq() {
let v1 = vec![1.0, 2.0, 3.0];
let v2 = vec![4.0, 5.0, 6.0];
// L2sq = (4-1)^2 + (5-2)^2 + (6-3)^2 = 9 + 9 + 9 = 27
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::L2sq);
assert!((dist - 27.0).abs() < 1e-6);
}
#[test]
fn test_compute_distance_cosine() {
let v1 = vec![1.0, 0.0, 0.0];
let v2 = vec![0.0, 1.0, 0.0];
// Orthogonal vectors have cosine similarity of 0, distance of 1
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::Cosine);
assert!((dist - 1.0).abs() < 1e-6);
}
#[test]
fn test_compute_distance_inner_product() {
let v1 = vec![1.0, 2.0, 3.0];
let v2 = vec![4.0, 5.0, 6.0];
// Inner product = 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
// Distance is negated: -32
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::InnerProduct);
assert!((dist - (-32.0)).abs() < 1e-6);
}
#[test]
fn test_compute_distance_empty_vectors() {
// Empty vectors should return 0.0 uniformly for all metrics
assert_eq!(compute_distance(&[], &[], VectorDistanceMetric::L2sq), 0.0);
assert_eq!(
compute_distance(&[], &[], VectorDistanceMetric::Cosine),
0.0
);
assert_eq!(
compute_distance(&[], &[], VectorDistanceMetric::InnerProduct),
0.0
);
}
}
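For orientation, a small sketch exercising the helpers above (assuming the crate is imported as `index` with the `vector_index` feature enabled, matching this repo layout); recall that the cosine distance is 1 − (a·b)/(‖a‖‖b‖):

```rust
use index::vector::{VectorDistanceMetric, compute_distance};

fn main() {
    let a = [1.0_f32, 0.0, 0.0];
    let b = [0.0_f32, 1.0, 0.0];
    // L2sq: (1-0)^2 + (0-1)^2 + (0-0)^2 = 2.0
    assert!((compute_distance(&a, &b, VectorDistanceMetric::L2sq) - 2.0).abs() < 1e-6);
    // Orthogonal vectors: cosine distance = 1 - 0 = 1.0
    assert!((compute_distance(&a, &b, VectorDistanceMetric::Cosine) - 1.0).abs() < 1e-6);
    // Inner product is negated so that "smaller is closer": -(a . b) = 0.0
    assert!(compute_distance(&a, &b, VectorDistanceMetric::InnerProduct).abs() < 1e-6);
}
```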

View File

@@ -16,7 +16,7 @@ use std::collections::HashMap;
use std::sync::Arc;
use common_wal::config::kafka::DatanodeKafkaConfig;
-use common_wal::config::kafka::common::DEFAULT_BACKOFF_CONFIG;
+use common_wal::config::kafka::common::{DEFAULT_BACKOFF_CONFIG, DEFAULT_CONNECT_TIMEOUT};
use dashmap::DashMap;
use rskafka::client::ClientBuilder;
use rskafka::client::partition::{Compression, PartitionClient, UnknownTopicHandling};
@@ -78,7 +78,8 @@ impl ClientManager {
) -> Result<Self> {
// Sets backoff config for the top-level kafka client and all clients constructed by it.
let mut builder = ClientBuilder::new(config.connection.broker_endpoints.clone())
-.backoff_config(DEFAULT_BACKOFF_CONFIG);
+.backoff_config(DEFAULT_BACKOFF_CONFIG)
+.connect_timeout(Some(DEFAULT_CONNECT_TIMEOUT));
if let Some(sasl) = &config.connection.sasl {
builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
};

View File

@@ -14,6 +14,7 @@
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use api::v1::meta::cluster_server::ClusterServer;
use api::v1::meta::heartbeat_server::HeartbeatServer;
@@ -49,16 +50,21 @@ use crate::metasrv::builder::MetasrvBuilder;
use crate::metasrv::{
BackendImpl, ElectionRef, Metasrv, MetasrvOptions, SelectTarget, SelectorRef,
};
-use crate::selector::SelectorType;
use crate::selector::lease_based::LeaseBasedSelector;
use crate::selector::load_based::LoadBasedSelector;
use crate::selector::round_robin::RoundRobinSelector;
use crate::selector::weight_compute::RegionNumsBasedWeightCompute;
+use crate::selector::{Selector, SelectorType};
use crate::service::admin;
use crate::service::admin::admin_axum_router;
use crate::utils::etcd::create_etcd_client_with_tls;
use crate::{Result, error};
/// The default keep-alive interval for gRPC.
const DEFAULT_GRPC_KEEP_ALIVE_INTERVAL: Duration = Duration::from_secs(10);
/// The default keep-alive timeout for gRPC.
const DEFAULT_GRPC_KEEP_ALIVE_TIMEOUT: Duration = Duration::from_secs(10);
pub struct MetasrvInstance {
metasrv: Arc<Metasrv>,
@@ -245,7 +251,12 @@ macro_rules! add_compressed_service {
}
pub fn router(metasrv: Arc<Metasrv>) -> Router {
-let mut router = tonic::transport::Server::builder().accept_http1(true); // for admin services
+let mut router = tonic::transport::Server::builder()
+// for admin services
+.accept_http1(true)
+// For quick detection of network failures.
+.http2_keepalive_interval(Some(DEFAULT_GRPC_KEEP_ALIVE_INTERVAL))
+.http2_keepalive_timeout(Some(DEFAULT_GRPC_KEEP_ALIVE_TIMEOUT));
let router = add_compressed_service!(router, HeartbeatServer::from_arc(metasrv.clone()));
let router = add_compressed_service!(router, StoreServer::from_arc(metasrv.clone()));
let router = add_compressed_service!(router, ClusterServer::from_arc(metasrv.clone()));
@@ -280,7 +291,7 @@ pub async fn metasrv_builder(
use common_meta::distributed_time_constants::POSTGRES_KEEP_ALIVE_SECS;
use common_meta::kv_backend::rds::PgStore;
-use deadpool_postgres::Config;
+use deadpool_postgres::{Config, ManagerConfig, RecyclingMethod};
use crate::election::rds::postgres::{ElectionPgClient, PgElection};
use crate::utils::postgres::create_postgres_pool;
@@ -294,8 +305,15 @@ pub async fn metasrv_builder(
let mut cfg = Config::new();
cfg.keepalives = Some(true);
cfg.keepalives_idle = Some(Duration::from_secs(POSTGRES_KEEP_ALIVE_SECS));
-// We use a separate pool for election since we need a different session keep-alive idle time.
-let pool = create_postgres_pool(&opts.store_addrs, Some(cfg), opts.backend_tls.clone())
+cfg.manager = Some(ManagerConfig {
+recycling_method: RecyclingMethod::Verified,
+});
+// Use a dedicated pool for the election client to allow customized session settings.
+let pool = create_postgres_pool(
+&opts.store_addrs,
+Some(cfg.clone()),
+opts.backend_tls.clone(),
+)
.await?;
let election_client = ElectionPgClient::new(
@@ -316,8 +334,8 @@ pub async fn metasrv_builder(
)
.await?;
-let pool =
-create_postgres_pool(&opts.store_addrs, None, opts.backend_tls.clone()).await?;
+let pool = create_postgres_pool(&opts.store_addrs, Some(cfg), opts.backend_tls.clone())
+.await?;
let kv_backend = PgStore::with_pg_pool(
pool,
opts.meta_schema_name.as_deref(),
@@ -393,7 +411,12 @@ pub async fn metasrv_builder(
info!("Using selector from plugins"); info!("Using selector from plugins");
selector selector
} else { } else {
let selector = match opts.selector { let selector: Arc<
dyn Selector<
Context = crate::metasrv::SelectorContext,
Output = Vec<common_meta::peer::Peer>,
>,
> = match opts.selector {
SelectorType::LoadBased => Arc::new(LoadBasedSelector::new( SelectorType::LoadBased => Arc::new(LoadBasedSelector::new(
RegionNumsBasedWeightCompute, RegionNumsBasedWeightCompute,
meta_peer_client.clone(), meta_peer_client.clone(),

View File

@@ -63,22 +63,6 @@ pub struct EtcdElection {
}
impl EtcdElection {
-pub async fn with_endpoints<E, S>(
-leader_value: E,
-endpoints: S,
-store_key_prefix: String,
-) -> Result<ElectionRef>
-where
-E: AsRef<str>,
-S: AsRef<[E]>,
-{
-let client = Client::connect(endpoints, None)
-.await
-.context(error::ConnectEtcdSnafu)?;
-Self::with_etcd_client(leader_value, client, store_key_prefix).await
-}
pub async fn with_etcd_client<E>(
leader_value: E,
client: Client,

View File

@@ -88,7 +88,8 @@ impl GcScheduler {
// Skip regions that are in cooldown period
if let Some(gc_info) = tracker.get(&region_stat.id)
-&& now.duration_since(gc_info.last_gc_time) < self.config.gc_cooldown_period
+&& now.saturating_duration_since(gc_info.last_gc_time)
+< self.config.gc_cooldown_period
{
debug!("Skipping region {} due to cooldown", region_stat.id);
continue;

View File

@@ -434,7 +434,7 @@ impl GcScheduler {
if let Some(gc_info) = gc_tracker.get(&region_id) {
if let Some(last_full_listing) = gc_info.last_full_listing_time {
// check if the cooling-down interval has passed since the last full listing
-let elapsed = now.duration_since(last_full_listing);
+let elapsed = now.saturating_duration_since(last_full_listing);
elapsed >= self.config.full_file_listing_interval
} else {
// Never did full listing for this region, do it now

View File

@@ -14,7 +14,7 @@
mod basic;
mod candidate_select;
-mod con;
+mod concurrent;
mod config;
mod err_handle;
mod full_list;

View File

@@ -135,6 +135,9 @@ async fn test_full_gc_workflow() {
);
}
/// Due to https://github.com/rust-lang/rust/issues/100141, an `Instant` can't be earlier
/// than the process start time on non-Linux OSes.
/// This is fine since in real usage the instant will always be after process start time.
#[cfg(target_os = "linux")]
#[tokio::test]
async fn test_tracker_cleanup() {
init_default_ut_logging();

View File

@@ -50,7 +50,7 @@ impl GcScheduler {
let now = Instant::now();
// Check if enough time has passed since last cleanup
-if now.duration_since(last_cleanup) < self.config.tracker_cleanup_interval {
+if now.saturating_duration_since(last_cleanup) < self.config.tracker_cleanup_interval {
return Ok(());
}
@@ -92,7 +92,7 @@ impl GcScheduler {
if let Some(gc_info) = gc_tracker.get(&region_id) {
if let Some(last_full_listing) = gc_info.last_full_listing_time {
-let elapsed = now.duration_since(last_full_listing);
+let elapsed = now.saturating_duration_since(last_full_listing);
elapsed >= self.config.full_file_listing_interval
} else {
// Never did full listing for this region, do it now
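A standalone sketch of why these call sites moved to `saturating_duration_since`: it clamps to `Duration::ZERO` when the argument is in the future, avoiding the panic/misbehavior edge cases of `duration_since` on older toolchains (see the rust-lang issue referenced in the test section above):

```rust
use std::time::{Duration, Instant};

// Safe cooldown check: well-defined even if `last` was recorded
// "after" `now`, as can happen with instants synthesized in tests.
fn cooldown_elapsed(now: Instant, last: Instant, cooldown: Duration) -> bool {
    now.saturating_duration_since(last) >= cooldown
}

fn main() {
    let now = Instant::now();
    let future = now + Duration::from_secs(5);
    // Clamps to zero instead of panicking or underflowing.
    assert_eq!(now.saturating_duration_since(future), Duration::ZERO);
    assert!(!cooldown_elapsed(now, future, Duration::from_secs(1)));
}
```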

View File

@@ -32,7 +32,7 @@ use collect_leader_region_handler::CollectLeaderRegionHandler;
use collect_stats_handler::CollectStatsHandler;
use common_base::Plugins;
use common_meta::datanode::Stat;
-use common_meta::instruction::{Instruction, InstructionReply};
+use common_meta::instruction::InstructionReply;
use common_meta::sequence::Sequence;
use common_telemetry::{debug, info, warn};
use dashmap::DashMap;
@@ -114,16 +114,19 @@ pub enum HandleControl {
#[derive(Debug, Default)]
pub struct HeartbeatAccumulator {
pub header: Option<ResponseHeader>,
-pub instructions: Vec<Instruction>,
+mailbox_message: Option<MailboxMessage>,
pub stat: Option<Stat>,
pub inactive_region_ids: HashSet<RegionId>,
pub region_lease: Option<RegionLease>,
}
impl HeartbeatAccumulator {
-pub fn into_mailbox_message(self) -> Option<MailboxMessage> {
-// TODO(jiachun): to HeartbeatResponse payload
-None
-}
+pub(crate) fn take_mailbox_message(&mut self) -> Option<MailboxMessage> {
+self.mailbox_message.take()
+}
+pub fn set_mailbox_message(&mut self, message: MailboxMessage) {
+let _ = self.mailbox_message.insert(message);
+}
}
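A test-style sketch of the new accumulator contract (in-crate, since `take_mailbox_message` is `pub(crate)`; `MailboxMessage` is assumed to be the `api::v1::meta` type used throughout this file): at most one mailbox message is buffered, and taking it consumes it.

```rust
use api::v1::meta::MailboxMessage;

#[test]
fn mailbox_message_is_taken_once() {
    let mut acc = HeartbeatAccumulator::default();
    acc.set_mailbox_message(MailboxMessage::default());
    // The response builder takes the buffered message exactly once...
    assert!(acc.take_mailbox_message().is_some());
    // ...after which the slot is empty again.
    assert!(acc.take_mailbox_message().is_none());
}
```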
@@ -275,6 +278,15 @@ impl Pushers {
async fn remove(&self, pusher_id: &str) -> Option<Pusher> {
self.0.write().await.remove(pusher_id)
}
pub(crate) async fn clear(&self) -> Vec<String> {
let mut pushers = self.0.write().await;
let keys = pushers.keys().cloned().collect::<Vec<_>>();
if !keys.is_empty() {
pushers.clear();
}
keys
}
}
#[derive(Clone)]
@@ -309,12 +321,11 @@ impl HeartbeatHandlerGroup {
}
/// Deregisters the heartbeat response [`Pusher`] with the given key from the group.
-///
-/// Returns the [`Pusher`] if it exists.
-pub async fn deregister_push(&self, pusher_id: PusherId) -> Option<Pusher> {
-METRIC_META_HEARTBEAT_CONNECTION_NUM.dec();
+pub async fn deregister_push(&self, pusher_id: PusherId) {
info!("Pusher unregister: {}", pusher_id);
-self.pushers.remove(&pusher_id.string_key()).await
+if self.pushers.remove(&pusher_id.string_key()).await.is_some() {
+METRIC_META_HEARTBEAT_CONNECTION_NUM.dec();
+}
}
/// Returns the [`Pushers`] of the group.
@@ -351,10 +362,11 @@ impl HeartbeatHandlerGroup {
}
}
let header = std::mem::take(&mut acc.header);
+let mailbox_message = acc.take_mailbox_message();
let res = HeartbeatResponse {
header,
region_lease: acc.region_lease,
-..Default::default()
+mailbox_message,
};
Ok(res)
}
@@ -382,7 +394,9 @@ impl HeartbeatMailbox {
/// Parses the [Instruction] from [MailboxMessage].
#[cfg(test)]
-pub fn json_instruction(msg: &MailboxMessage) -> Result<Instruction> {
+pub(crate) fn json_instruction(
+msg: &MailboxMessage,
+) -> Result<common_meta::instruction::Instruction> {
let Payload::Json(payload) =
msg.payload
.as_ref()
@@ -519,6 +533,14 @@ impl Mailbox for HeartbeatMailbox {
Ok(())
}
async fn reset(&self) {
let keys = self.pushers.clear().await;
if !keys.is_empty() {
info!("Reset mailbox, deregister pushers: {:?}", keys);
METRIC_META_HEARTBEAT_CONNECTION_NUM.sub(keys.len() as i64);
}
}
}
/// The builder to build the group of heartbeat handlers.

View File

@@ -452,6 +452,7 @@ pub struct MetaStateHandler {
greptimedb_telemetry_task: Arc<GreptimeDBTelemetryTask>,
leader_cached_kv_backend: Arc<LeaderCachedKvBackend>,
leadership_change_notifier: LeadershipChangeNotifier,
mailbox: MailboxRef,
state: StateRef,
}
@@ -475,6 +476,9 @@ impl MetaStateHandler {
pub async fn on_leader_stop(&self) {
self.state.write().unwrap().next_state(become_follower());
// Enforces the mailbox to clear all pushers.
// The remaining heartbeat connections will be closed by the remote peer or keep-alive detection.
self.mailbox.reset().await;
self.leadership_change_notifier
.notify_on_leader_stop()
.await;
@@ -602,6 +606,7 @@ impl Metasrv {
state: self.state.clone(),
leader_cached_kv_backend: leader_cached_kv_backend.clone(),
leadership_change_notifier,
mailbox: self.mailbox.clone(),
};
let _handle = common_runtime::spawn_global(async move {
loop {

View File

@@ -207,6 +207,9 @@ pub trait Mailbox: Send + Sync {
async fn broadcast(&self, ch: &BroadcastChannel, msg: &MailboxMessage) -> Result<()>;
async fn on_recv(&self, id: MessageId, maybe_msg: Result<MailboxMessage>) -> Result<()>;
/// Reset all pushers of the mailbox.
async fn reset(&self);
}
#[cfg(test)] #[cfg(test)]

View File

@@ -12,8 +12,9 @@
// See the License for the specific language governing permissions and
// limitations under the License.
+use common_meta::distributed_time_constants::default_etcd_client_options;
use common_meta::kv_backend::etcd::create_etcd_tls_options;
-use etcd_client::{Client, ConnectOptions};
+use etcd_client::Client;
use servers::tls::{TlsMode, TlsOption};
use snafu::ResultExt;
@@ -30,14 +31,15 @@ pub async fn create_etcd_client_with_tls(
.filter(|x| !x.is_empty())
.collect::<Vec<_>>();
-let connect_options = tls_config
-.map(|c| create_etcd_tls_options(&convert_tls_option(c)))
-.transpose()
-.context(BuildTlsOptionsSnafu)?
-.flatten()
-.map(|tls_options| ConnectOptions::new().with_tls(tls_options));
+let mut connect_options = default_etcd_client_options();
+if let Some(tls_config) = tls_config
+&& let Some(tls_options) = create_etcd_tls_options(&convert_tls_option(tls_config))
+.context(BuildTlsOptionsSnafu)?
+{
+connect_options = connect_options.with_tls(tls_options);
+}
-Client::connect(&etcd_endpoints, connect_options)
+Client::connect(&etcd_endpoints, Some(connect_options))
.await
.context(error::ConnectEtcdSnafu)
}

View File

@@ -43,9 +43,10 @@ pub(crate) use state::MetricEngineState;
use store_api::metadata::RegionMetadataRef;
use store_api::metric_engine_consts::METRIC_ENGINE_NAME;
use store_api::region_engine::{
-BatchResponses, RegionEngine, RegionManifestInfo, RegionRole, RegionScannerRef,
-RegionStatistic, RemapManifestsRequest, RemapManifestsResponse, SetRegionRoleStateResponse,
-SetRegionRoleStateSuccess, SettableRegionRoleState, SyncManifestResponse,
+BatchResponses, CopyRegionFromRequest, CopyRegionFromResponse, RegionEngine,
+RegionManifestInfo, RegionRole, RegionScannerRef, RegionStatistic, RemapManifestsRequest,
+RemapManifestsResponse, SetRegionRoleStateResponse, SetRegionRoleStateSuccess,
+SettableRegionRoleState, SyncManifestResponse,
};
use store_api::region_request::{
BatchRegionDdlRequest, RegionCatchupRequest, RegionOpenRequest, RegionRequest,
@@ -375,6 +376,14 @@ impl RegionEngine for MetricEngine {
}
}
async fn copy_region_from(
&self,
_region_id: RegionId,
_request: CopyRegionFromRequest,
) -> Result<CopyRegionFromResponse, BoxedError> {
todo!()
}
async fn set_region_role_state_gracefully(
&self,
region_id: RegionId,

View File

@@ -119,7 +119,7 @@ mod tests {
.index_file_path
.map(|path| path.replace(&e.file_id, "<file_id>"));
e.file_id = "<file_id>".to_string();
-e.index_file_id = e.index_file_id.map(|_| "<index_file_id>".to_string());
+e.index_version = 0;
format!("\n{:?}", e)
})
.sorted()
@@ -128,12 +128,12 @@ mod tests {
assert_eq!(
debug_format,
r#"
-ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/11_0000000001/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/11_0000000001/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(20), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
+ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/11_0000000001/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/11_0000000001/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(20), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
-ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/11_0000000002/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/11_0000000002/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
+ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/11_0000000002/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/11_0000000002/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
-ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417473(11, 16777217), table_id: 11, region_number: 16777217, region_group: 1, region_sequence: 1, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/11_0000000001/metadata/<file_id>.parquet", file_size: 3487, index_file_path: None, index_file_size: None, num_rows: 8, num_row_groups: 1, num_series: Some(8), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(8), origin_region_id: 47261417473(11, 16777217), node_id: None, visible: true }
+ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417473(11, 16777217), table_id: 11, region_number: 16777217, region_group: 1, region_sequence: 1, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/11_0000000001/metadata/<file_id>.parquet", file_size: 3487, index_file_path: None, index_file_size: None, num_rows: 8, num_row_groups: 1, num_series: Some(8), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(8), origin_region_id: 47261417473(11, 16777217), node_id: None, visible: true }
-ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417474(11, 16777218), table_id: 11, region_number: 16777218, region_group: 1, region_sequence: 2, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/11_0000000002/metadata/<file_id>.parquet", file_size: 3471, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 47261417474(11, 16777218), node_id: None, visible: true }
+ManifestSstEntry { table_dir: "test_metric_region/", region_id: 47261417474(11, 16777218), table_id: 11, region_number: 16777218, region_group: 1, region_sequence: 2, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/11_0000000002/metadata/<file_id>.parquet", file_size: 3471, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 47261417474(11, 16777218), node_id: None, visible: true }
-ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", index_file_id: Some("<index_file_id>"), level: 0, file_path: "test_metric_region/22_0000000042/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/22_0000000042/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }
+ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/22_0000000042/data/<file_id>.parquet", file_size: 3217, index_file_path: Some("test_metric_region/22_0000000042/data/index/<file_id>.puffin"), index_file_size: Some(235), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94506057770(22, 16777258), table_id: 22, region_number: 16777258, region_group: 1, region_sequence: 42, file_id: "<file_id>", index_file_id: None, level: 0, file_path: "test_metric_region/22_0000000042/metadata/<file_id>.parquet", file_size: 3471, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 94506057770(22, 16777258), node_id: None, visible: true }"# ManifestSstEntry { table_dir: "test_metric_region/", region_id: 94506057770(22, 16777258), table_id: 22, region_number: 16777258, region_group: 1, region_sequence: 42, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test_metric_region/22_0000000042/metadata/<file_id>.parquet", file_size: 3471, index_file_path: None, index_file_size: None, num_rows: 4, num_row_groups: 1, num_series: Some(4), min_ts: 0::Millisecond, max_ts: 0::Millisecond, sequence: Some(4), origin_region_id: 94506057770(22, 16777258), node_id: None, visible: true }"#,
); );
// list from storage // list from storage
let storage_entries = mito let storage_entries = mito
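
The index_version field above replaces the old index_file_id: rewriting an index now bumps a version counter on the same SST instead of minting a new file id. A simplified sketch of the naming this implies; the version-0 behavior follows the FilePathProvider doc comment later in this diff ("version defaults to 0 and is not shown in the path"), while the versioned form matches the cache-key tests further down:

    // Simplified sketch, not the real `location::index_file_path` helper:
    // version 0 keeps the legacy `<file_id>.puffin` name, later versions
    // embed the version number.
    fn index_file_name(file_id: &str, version: u64) -> String {
        if version == 0 {
            format!("{file_id}.puffin")
        } else {
            format!("{file_id}.{version}.puffin")
        }
    }

    fn main() {
        assert_eq!(index_file_name("abc", 0), "abc.puffin");
        assert_eq!(index_file_name("abc", 42), "abc.42.puffin");
    }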

View File

@@ -30,6 +30,7 @@ common-error.workspace = true
 common-grpc.workspace = true
 common-macro.workspace = true
 common-meta.workspace = true
+common-memory-manager.workspace = true
 common-query.workspace = true
 common-recordbatch.workspace = true
 common-runtime.workspace = true
@@ -48,6 +49,7 @@ dotenv.workspace = true
 either.workspace = true
 futures.workspace = true
 humantime-serde.workspace = true
+humantime.workspace = true
 index.workspace = true
 itertools.workspace = true
 greptime-proto.workspace = true

View File

@@ -144,6 +144,7 @@ async fn flush(mem: &SimpleBulkMemtable) {
         let reader = Box::new(DedupReader::new(
             merge_reader,
             read::dedup::LastRow::new(true),
+            None,
         ));
         Source::Reader(reader)
     };

View File

@@ -37,7 +37,7 @@ use crate::error::{CleanDirSnafu, DeleteIndexSnafu, DeleteSstSnafu, OpenDalSnafu
 use crate::metrics::{COMPACTION_STAGE_ELAPSED, FLUSH_ELAPSED};
 use crate::read::{FlatSource, Source};
 use crate::region::options::IndexOptions;
-use crate::sst::file::{FileHandle, RegionFileId};
+use crate::sst::file::{FileHandle, RegionFileId, RegionIndexId};
 use crate::sst::index::IndexerBuilderImpl;
 use crate::sst::index::intermediate::IntermediateManager;
 use crate::sst::index::puffin_manager::{PuffinManagerFactory, SstPuffinManager};
@@ -216,7 +216,7 @@ impl AccessLayer {
     pub(crate) async fn delete_sst(
         &self,
         region_file_id: &RegionFileId,
-        index_file_id: &RegionFileId,
+        index_file_id: &RegionIndexId,
     ) -> Result<()> {
         let path = location::sst_file_path(&self.table_dir, *region_file_id, self.path_type);
         self.object_store
@@ -226,14 +226,30 @@ impl AccessLayer {
                 file_id: region_file_id.file_id(),
             })?;

-        let path = location::index_file_path(&self.table_dir, *index_file_id, self.path_type);
+        // Delete all versions of the index file.
+        for version in 0..=index_file_id.version {
+            let index_id = RegionIndexId::new(*region_file_id, version);
+            self.delete_index(index_id).await?;
+        }
+        Ok(())
+    }
+
+    pub(crate) async fn delete_index(
+        &self,
+        index_file_id: RegionIndexId,
+    ) -> Result<(), crate::error::Error> {
+        let path = location::index_file_path(
+            &self.table_dir,
+            RegionIndexId::new(index_file_id.file_id, index_file_id.version),
+            self.path_type,
+        );
         self.object_store
             .delete(&path)
             .await
             .context(DeleteIndexSnafu {
-                file_id: region_file_id.file_id(),
+                file_id: index_file_id.file_id(),
             })?;
         Ok(())
     }
@@ -291,6 +307,7 @@ impl AccessLayer {
             puffin_manager: self
                 .puffin_manager_factory
                 .build(store, path_provider.clone()),
+            write_cache_enabled: false,
             intermediate_manager: self.intermediate_manager.clone(),
             index_options: request.index_options,
             inverted_index_config: request.inverted_index_config,
@@ -468,9 +485,10 @@ impl TempFileCleaner {
     }

     /// Removes the SST and index file from the local atomic dir by the file id.
+    /// This only removes the initial index; since the index version is always 0 for a new SST, it is safe to pass 0.
     pub(crate) async fn clean_by_file_id(&self, file_id: FileId) {
         let sst_key = IndexKey::new(self.region_id, file_id, FileType::Parquet).to_string();
-        let index_key = IndexKey::new(self.region_id, file_id, FileType::Puffin).to_string();
+        let index_key = IndexKey::new(self.region_id, file_id, FileType::Puffin(0)).to_string();
         Self::clean_atomic_dir_files(&self.object_store, &[&sst_key, &index_key]).await;
     }
@@ -553,9 +571,12 @@ async fn clean_dir(dir: &str) -> Result<()> {
 /// Path provider for SST file and index file.
 pub trait FilePathProvider: Send + Sync {
-    /// Creates index file path of given file id.
+    /// Creates index file path of given file id. The version defaults to 0 and is not shown in the path.
     fn build_index_file_path(&self, file_id: RegionFileId) -> String;

+    /// Creates index file path of given index id (with version support).
+    fn build_index_file_path_with_version(&self, index_id: RegionIndexId) -> String;
+
     /// Creates SST file path of given file id.
     fn build_sst_file_path(&self, file_id: RegionFileId) -> String;
 }
@@ -575,7 +596,16 @@ impl WriteCachePathProvider {
 impl FilePathProvider for WriteCachePathProvider {
     fn build_index_file_path(&self, file_id: RegionFileId) -> String {
-        let puffin_key = IndexKey::new(file_id.region_id(), file_id.file_id(), FileType::Puffin);
+        let puffin_key = IndexKey::new(file_id.region_id(), file_id.file_id(), FileType::Puffin(0));
+        self.file_cache.cache_file_path(puffin_key)
+    }
+
+    fn build_index_file_path_with_version(&self, index_id: RegionIndexId) -> String {
+        let puffin_key = IndexKey::new(
+            index_id.region_id(),
+            index_id.file_id(),
+            FileType::Puffin(index_id.version),
+        );
         self.file_cache.cache_file_path(puffin_key)
     }
@@ -605,7 +635,11 @@ impl RegionFilePathFactory {
 impl FilePathProvider for RegionFilePathFactory {
     fn build_index_file_path(&self, file_id: RegionFileId) -> String {
-        location::index_file_path(&self.table_dir, file_id, self.path_type)
+        location::index_file_path_legacy(&self.table_dir, file_id, self.path_type)
+    }
+
+    fn build_index_file_path_with_version(&self, index_id: RegionIndexId) -> String {
+        location::index_file_path(&self.table_dir, index_id, self.path_type)
     }

     fn build_sst_file_path(&self, file_id: RegionFileId) -> String {
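
The reworked delete_sst above fans out over every index version that may exist for the SST before the SST itself is considered gone. A standalone sketch of that fan-out, assuming (as the 0..=version loop does) that versions are dense from 0 up to the latest, and reusing the naming convention sketched earlier:

    // Standalone sketch of the version fan-out in `delete_sst` above.
    // Assumes versions 0..=latest may all exist on object storage.
    fn index_paths_to_delete(file_id: &str, latest_version: u64) -> Vec<String> {
        (0..=latest_version)
            .map(|version| {
                if version == 0 {
                    format!("index/{file_id}.puffin")
                } else {
                    format!("index/{file_id}.{version}.puffin")
                }
            })
            .collect()
    }

    fn main() {
        // Deleting an SST whose index was rewritten twice removes three files.
        assert_eq!(
            index_paths_to_delete("abc", 2),
            vec![
                "index/abc.puffin".to_string(),
                "index/abc.1.puffin".to_string(),
                "index/abc.2.puffin".to_string(),
            ]
        );
    }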

View File

@@ -34,6 +34,7 @@ use index::bloom_filter_index::{BloomFilterIndexCache, BloomFilterIndexCacheRef}
 use index::result_cache::IndexResultCache;
 use moka::notification::RemovalCause;
 use moka::sync::Cache;
+use object_store::ObjectStore;
 use parquet::file::metadata::ParquetMetaData;
 use puffin::puffin_manager::cache::{PuffinMetadataCache, PuffinMetadataCacheRef};
 use store_api::storage::{ConcreteDataType, FileId, RegionId, TimeSeriesRowSelector};
@@ -44,7 +45,7 @@ use crate::cache::index::inverted_index::{InvertedIndexCache, InvertedIndexCache
 use crate::cache::write_cache::WriteCacheRef;
 use crate::metrics::{CACHE_BYTES, CACHE_EVICTION, CACHE_HIT, CACHE_MISS};
 use crate::read::Batch;
-use crate::sst::file::RegionFileId;
+use crate::sst::file::{RegionFileId, RegionIndexId};
 use crate::sst::parquet::reader::MetadataCacheMetrics;

 /// Metrics type key for sst meta.
@@ -180,7 +181,7 @@ impl CacheStrategy {
     }

     /// Calls [CacheManager::evict_puffin_cache()].
-    pub async fn evict_puffin_cache(&self, file_id: RegionFileId) {
+    pub async fn evict_puffin_cache(&self, file_id: RegionIndexId) {
         match self {
             CacheStrategy::EnableAll(cache_manager) => {
                 cache_manager.evict_puffin_cache(file_id).await
@@ -263,6 +264,26 @@ impl CacheStrategy {
             CacheStrategy::Compaction(_) | CacheStrategy::Disabled => None,
         }
     }
+
+    /// Triggers download if the strategy is [CacheStrategy::EnableAll] and write cache is available.
+    pub fn maybe_download_background(
+        &self,
+        index_key: IndexKey,
+        remote_path: String,
+        remote_store: ObjectStore,
+        file_size: u64,
+    ) {
+        if let CacheStrategy::EnableAll(cache_manager) = self
+            && let Some(write_cache) = cache_manager.write_cache()
+        {
+            write_cache.file_cache().maybe_download_background(
+                index_key,
+                remote_path,
+                remote_store,
+                file_size,
+            );
+        }
+    }
 }

 /// Manages cached data for the engine.
@@ -400,7 +421,7 @@ impl CacheManager {
     }

     /// Evicts every puffin-related cache entry for the given file.
-    pub async fn evict_puffin_cache(&self, file_id: RegionFileId) {
+    pub async fn evict_puffin_cache(&self, file_id: RegionIndexId) {
         if let Some(cache) = &self.bloom_filter_index_cache {
             cache.invalidate_file(file_id.file_id());
         }
@@ -422,7 +443,7 @@ impl CacheManager {
             .remove(IndexKey::new(
                 file_id.region_id(),
                 file_id.file_id(),
-                FileType::Puffin,
+                FileType::Puffin(file_id.version),
             ))
             .await;
     }
@@ -949,7 +970,7 @@ mod tests {
         let cache = Arc::new(cache);

         let region_id = RegionId::new(1, 1);
-        let region_file_id = RegionFileId::new(region_id, FileId::random());
+        let index_id = RegionIndexId::new(RegionFileId::new(region_id, FileId::random()), 0);
         let column_id: ColumnId = 1;

         let bloom_cache = cache.bloom_filter_index_cache().unwrap().clone();
@@ -957,16 +978,21 @@ mod tests {
         let result_cache = cache.index_result_cache().unwrap();
         let puffin_metadata_cache = cache.puffin_metadata_cache().unwrap().clone();

-        let bloom_key = (region_file_id.file_id(), column_id, Tag::Skipping);
+        let bloom_key = (
+            index_id.file_id(),
+            index_id.version,
+            column_id,
+            Tag::Skipping,
+        );
         bloom_cache.put_metadata(bloom_key, Arc::new(BloomFilterMeta::default()));
         inverted_cache.put_metadata(
-            region_file_id.file_id(),
+            (index_id.file_id(), index_id.version),
             Arc::new(InvertedIndexMetas::default()),
         );
         let predicate = PredicateKey::new_bloom(Arc::new(BTreeMap::new()));
         let selection = Arc::new(RowGroupSelection::default());
-        result_cache.put(predicate.clone(), region_file_id.file_id(), selection);
-        let file_id_str = region_file_id.to_string();
+        result_cache.put(predicate.clone(), index_id.file_id(), selection);
+        let file_id_str = index_id.to_string();
         let metadata = Arc::new(FileMetadata {
             blobs: Vec::new(),
             properties: HashMap::new(),
@@ -976,40 +1002,32 @@ mod tests {
         assert!(bloom_cache.get_metadata(bloom_key).is_some());
         assert!(
             inverted_cache
-                .get_metadata(region_file_id.file_id())
-                .is_some()
-        );
-        assert!(
-            result_cache
-                .get(&predicate, region_file_id.file_id())
+                .get_metadata((index_id.file_id(), index_id.version))
                 .is_some()
         );
+        assert!(result_cache.get(&predicate, index_id.file_id()).is_some());
         assert!(puffin_metadata_cache.get_metadata(&file_id_str).is_some());

-        cache.evict_puffin_cache(region_file_id).await;
+        cache.evict_puffin_cache(index_id).await;

         assert!(bloom_cache.get_metadata(bloom_key).is_none());
         assert!(
             inverted_cache
-                .get_metadata(region_file_id.file_id())
-                .is_none()
-        );
-        assert!(
-            result_cache
-                .get(&predicate, region_file_id.file_id())
+                .get_metadata((index_id.file_id(), index_id.version))
                 .is_none()
         );
+        assert!(result_cache.get(&predicate, index_id.file_id()).is_none());
         assert!(puffin_metadata_cache.get_metadata(&file_id_str).is_none());

         // Refill caches and evict via CacheStrategy to ensure delegation works.
         bloom_cache.put_metadata(bloom_key, Arc::new(BloomFilterMeta::default()));
         inverted_cache.put_metadata(
-            region_file_id.file_id(),
+            (index_id.file_id(), index_id.version),
             Arc::new(InvertedIndexMetas::default()),
         );
         result_cache.put(
             predicate.clone(),
-            region_file_id.file_id(),
+            index_id.file_id(),
             Arc::new(RowGroupSelection::default()),
         );
         puffin_metadata_cache.put_metadata(
@@ -1021,19 +1039,15 @@ mod tests {
         );

         let strategy = CacheStrategy::EnableAll(cache.clone());
-        strategy.evict_puffin_cache(region_file_id).await;
+        strategy.evict_puffin_cache(index_id).await;

         assert!(bloom_cache.get_metadata(bloom_key).is_none());
         assert!(
             inverted_cache
-                .get_metadata(region_file_id.file_id())
-                .is_none()
-        );
-        assert!(
-            result_cache
-                .get(&predicate, region_file_id.file_id())
+                .get_metadata((index_id.file_id(), index_id.version))
                 .is_none()
         );
+        assert!(result_cache.get(&predicate, index_id.file_id()).is_none());
         assert!(puffin_metadata_cache.get_metadata(&file_id_str).is_none());
     }
 }
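
For context on the new maybe_download_background delegation above: it is fire-and-forget, and a no-op unless the strategy is EnableAll with a write cache attached. A standalone sketch of just that gating, with local stand-in types in place of the real CacheManager and write cache:

    // Local stand-ins for the gating in `CacheStrategy::maybe_download_background`.
    struct WriteCache;

    impl WriteCache {
        fn maybe_download_background(&self, key: &str) {
            println!("queued background download for {key}");
        }
    }

    struct CacheManager {
        write_cache: Option<WriteCache>,
    }

    enum CacheStrategy {
        EnableAll(CacheManager),
        Disabled,
    }

    impl CacheStrategy {
        // Fire-and-forget: only EnableAll with a write cache queues a download.
        fn maybe_download_background(&self, key: &str) {
            if let CacheStrategy::EnableAll(manager) = self {
                if let Some(write_cache) = &manager.write_cache {
                    write_cache.maybe_download_background(key);
                }
            }
        }
    }

    fn main() {
        let enabled = CacheStrategy::EnableAll(CacheManager {
            write_cache: Some(WriteCache),
        });
        enabled.maybe_download_background("1234.abcd.0.puffin"); // queued
        CacheStrategy::Disabled.maybe_download_background("ignored"); // no-op
    }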

View File

@@ -31,7 +31,7 @@ use object_store::{ErrorKind, ObjectStore, Reader};
 use parquet::file::metadata::ParquetMetaData;
 use snafu::ResultExt;
 use store_api::storage::{FileId, RegionId};
-use tokio::sync::mpsc::UnboundedReceiver;
+use tokio::sync::mpsc::{Sender, UnboundedReceiver};

 use crate::access_layer::TempFileCleaner;
 use crate::cache::{FILE_TYPE, INDEX_TYPE};
@@ -55,6 +55,17 @@ pub(crate) const DEFAULT_INDEX_CACHE_PERCENT: u8 = 20;
 /// Minimum capacity for each cache (512MB).
 const MIN_CACHE_CAPACITY: u64 = 512 * 1024 * 1024;

+/// Channel capacity for background download tasks.
+const DOWNLOAD_TASK_CHANNEL_SIZE: usize = 64;
+
+/// A task to download a file in the background.
+struct DownloadTask {
+    index_key: IndexKey,
+    remote_path: String,
+    remote_store: ObjectStore,
+    file_size: u64,
+}
+
 /// Inner struct for FileCache that can be used in spawned tasks.
 #[derive(Debug)]
 struct FileCacheInner {
@@ -71,7 +82,7 @@ impl FileCacheInner {
     fn memory_index(&self, file_type: FileType) -> &Cache<IndexKey, IndexValue> {
         match file_type {
             FileType::Parquet => &self.parquet_index,
-            FileType::Puffin => &self.puffin_index,
+            FileType::Puffin { .. } => &self.puffin_index,
         }
     }
@@ -130,7 +141,7 @@ impl FileCacheInner {
             // Track sizes separately for each file type
             match key.file_type {
                 FileType::Parquet => parquet_size += size,
-                FileType::Puffin => puffin_size += size,
+                FileType::Puffin { .. } => puffin_size += size,
             }
         }
         // The metrics is a signed int gauge so we can updates it finally.
@@ -170,21 +181,21 @@ impl FileCacheInner {
         remote_path: &str,
         remote_store: &ObjectStore,
         file_size: u64,
+        concurrency: usize,
     ) -> Result<()> {
-        const DOWNLOAD_READER_CONCURRENCY: usize = 8;
         const DOWNLOAD_READER_CHUNK_SIZE: ReadableSize = ReadableSize::mb(8);

         let file_type = index_key.file_type;
         let timer = WRITE_CACHE_DOWNLOAD_ELAPSED
             .with_label_values(&[match file_type {
                 FileType::Parquet => "download_parquet",
-                FileType::Puffin => "download_puffin",
+                FileType::Puffin { .. } => "download_puffin",
             }])
             .start_timer();

         let reader = remote_store
             .reader_with(remote_path)
-            .concurrent(DOWNLOAD_READER_CONCURRENCY)
+            .concurrent(concurrency)
             .chunk(DOWNLOAD_READER_CHUNK_SIZE.as_bytes() as usize)
             .await
             .context(error::OpenDalSnafu)?
@@ -238,11 +249,14 @@ impl FileCacheInner {
         remote_path: &str,
         remote_store: &ObjectStore,
         file_size: u64,
+        concurrency: usize,
     ) -> Result<()> {
         if let Err(e) = self
-            .download_without_cleaning(index_key, remote_path, remote_store, file_size)
+            .download_without_cleaning(index_key, remote_path, remote_store, file_size, concurrency)
             .await
         {
+            error!(e; "Failed to download file '{}' for region {}", remote_path, index_key.region_id);
             let filename = index_key.to_string();
             TempFileCleaner::clean_atomic_dir_files(&self.local_store, &[&filename]).await;
@@ -251,6 +265,11 @@ impl FileCacheInner {
         Ok(())
     }
+
+    /// Checks if the key is in the file cache.
+    fn contains_key(&self, key: &IndexKey) -> bool {
+        self.memory_index(key.file_type).contains_key(key)
+    }
 }

 /// A file cache manages files on local store and evict files based
@@ -261,6 +280,8 @@ pub(crate) struct FileCache {
     inner: Arc<FileCacheInner>,
     /// Capacity of the puffin (index) cache in bytes.
     puffin_capacity: u64,
+    /// Channel for background download tasks. None if background worker is disabled.
+    download_task_tx: Option<Sender<DownloadTask>>,
 }

 pub(crate) type FileCacheRef = Arc<FileCache>;
@@ -272,6 +293,7 @@ impl FileCache {
         capacity: ReadableSize,
         ttl: Option<Duration>,
         index_cache_percent: Option<u8>,
+        enable_background_worker: bool,
     ) -> FileCache {
         // Validate and use the provided percent or default
         let index_percent = index_cache_percent
@@ -306,12 +328,54 @@ impl FileCache {
             puffin_index,
         });

+        // Only create the channel and spawn the worker if background download is enabled.
+        let download_task_tx = if enable_background_worker {
+            let (tx, rx) = tokio::sync::mpsc::channel(DOWNLOAD_TASK_CHANNEL_SIZE);
+            Self::spawn_download_worker(inner.clone(), rx);
+            Some(tx)
+        } else {
+            None
+        };
+
         FileCache {
             inner,
             puffin_capacity,
+            download_task_tx,
         }
     }

+    /// Spawns a background worker to process download tasks.
+    fn spawn_download_worker(
+        inner: Arc<FileCacheInner>,
+        mut download_task_rx: tokio::sync::mpsc::Receiver<DownloadTask>,
+    ) {
+        tokio::spawn(async move {
+            info!("Background download worker started");
+            while let Some(task) = download_task_rx.recv().await {
+                // Check if the file is already in the cache
+                if inner.contains_key(&task.index_key) {
+                    debug!(
+                        "Skipping background download for region {}, file {} - already in cache",
+                        task.index_key.region_id, task.index_key.file_id
+                    );
+                    continue;
+                }
+                // Ignores background download errors.
+                let _ = inner
+                    .download(
+                        task.index_key,
+                        &task.remote_path,
+                        &task.remote_store,
+                        task.file_size,
+                        1, // Background downloads use concurrency=1
+                    )
+                    .await;
+            }
+            info!("Background download worker stopped");
+        });
+    }
+
     /// Builds a cache for a specific file type.
     fn build_cache(
         local_store: ObjectStore,
@@ -333,11 +397,9 @@ impl FileCache {
             let file_path = cache_file_path(FILE_DIR, *key);
             async move {
                 if let RemovalCause::Replaced = cause {
-                    // The cache is replaced by another file. This is unexpected, we don't remove the same
+                    // The cache is replaced by another file (maybe downloaded again). We don't remove the same
                     // file but updates the metrics as the file is already replaced by users.
                     CACHE_BYTES.with_label_values(&[label]).sub(value.file_size.into());
-                    // TODO(yingwen): Don't log warn later.
-                    warn!("Replace existing cache {} for region {} unexpectedly", file_path, key.region_id);
                     return;
                 }
@@ -553,7 +615,7 @@ impl FileCache {
     /// Checks if the key is in the file cache.
     pub(crate) fn contains_key(&self, key: &IndexKey) -> bool {
-        self.inner.memory_index(key.file_type).contains_key(key)
+        self.inner.contains_key(key)
     }

     /// Returns the capacity of the puffin (index) cache in bytes.
@@ -576,9 +638,42 @@ impl FileCache {
         file_size: u64,
     ) -> Result<()> {
         self.inner
-            .download(index_key, remote_path, remote_store, file_size)
+            .download(index_key, remote_path, remote_store, file_size, 8) // Foreground uses concurrency=8
             .await
     }
+
+    /// Downloads a file in `remote_path` from the remote object store to the local cache
+    /// (specified by `index_key`) in the background. Errors are logged but not returned.
+    ///
+    /// This method attempts to send a download task to the background worker.
+    /// If the channel is full, the task is silently dropped.
+    pub(crate) fn maybe_download_background(
+        &self,
+        index_key: IndexKey,
+        remote_path: String,
+        remote_store: ObjectStore,
+        file_size: u64,
+    ) {
+        // Do nothing if the background worker is disabled (channel is None).
+        let Some(tx) = &self.download_task_tx else {
+            return;
+        };
+
+        let task = DownloadTask {
+            index_key,
+            remote_path,
+            remote_store,
+            file_size,
+        };
+
+        // Try to send the task; if the channel is full, just drop it.
+        if let Err(e) = tx.try_send(task) {
+            debug!(
+                "Failed to queue background download task for region {}, file {}: {:?}",
+                index_key.region_id, index_key.file_id, e
+            );
+        }
+    }
 }

 /// Key of file cache index.
@@ -607,7 +702,7 @@ impl fmt::Display for IndexKey {
             "{}.{}.{}",
             self.region_id.as_u64(),
             self.file_id,
-            self.file_type.as_str()
+            self.file_type
         )
     }
 }
@@ -618,7 +713,16 @@ pub enum FileType {
     /// Parquet file.
     Parquet,
     /// Puffin file.
-    Puffin,
+    Puffin(u64),
+}
+
+impl fmt::Display for FileType {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match self {
+            FileType::Parquet => write!(f, "parquet"),
+            FileType::Puffin(version) => write!(f, "{}.puffin", version),
+        }
+    }
 }

 impl FileType {
@@ -626,16 +730,16 @@ impl FileType {
     fn parse(s: &str) -> Option<FileType> {
         match s {
             "parquet" => Some(FileType::Parquet),
-            "puffin" => Some(FileType::Puffin),
-            _ => None,
+            "puffin" => Some(FileType::Puffin(0)),
+            _ => {
+                // If suffixed with `.puffin`, try to parse the version.
+                if let Some(version_str) = s.strip_suffix(".puffin") {
+                    let version = version_str.parse::<u64>().ok()?;
+                    Some(FileType::Puffin(version))
+                } else {
+                    None
+                }
+            }
         }
     }
-
-    /// Converts the file type to string.
-    fn as_str(&self) -> &'static str {
-        match self {
-            FileType::Parquet => "parquet",
-            FileType::Puffin => "puffin",
-        }
-    }
@@ -643,7 +747,7 @@ impl FileType {
     fn metric_label(&self) -> &'static str {
         match self {
             FileType::Parquet => FILE_TYPE,
-            FileType::Puffin => INDEX_TYPE,
+            FileType::Puffin(_) => INDEX_TYPE,
         }
     }
 }
@@ -699,6 +803,7 @@ mod tests {
             ReadableSize::mb(10),
             Some(Duration::from_millis(10)),
             None,
+            true, // enable_background_worker
         );
         let region_id = RegionId::new(2000, 0);
         let file_id = FileId::random();
@@ -735,7 +840,13 @@ mod tests {
         let dir = create_temp_dir("");
         let local_store = new_fs_store(dir.path().to_str().unwrap());
-        let cache = FileCache::new(local_store.clone(), ReadableSize::mb(10), None, None);
+        let cache = FileCache::new(
+            local_store.clone(),
+            ReadableSize::mb(10),
+            None,
+            None,
+            true, // enable_background_worker
+        );
         let region_id = RegionId::new(2000, 0);
         let file_id = FileId::random();
         let key = IndexKey::new(region_id, file_id, FileType::Parquet);
@@ -783,7 +894,13 @@ mod tests {
         let dir = create_temp_dir("");
         let local_store = new_fs_store(dir.path().to_str().unwrap());
-        let cache = FileCache::new(local_store.clone(), ReadableSize::mb(10), None, None);
+        let cache = FileCache::new(
+            local_store.clone(),
+            ReadableSize::mb(10),
+            None,
+            None,
+            true, // enable_background_worker
+        );
         let region_id = RegionId::new(2000, 0);
         let file_id = FileId::random();
         let key = IndexKey::new(region_id, file_id, FileType::Parquet);
@@ -815,7 +932,13 @@ mod tests {
     async fn test_file_cache_recover() {
         let dir = create_temp_dir("");
         let local_store = new_fs_store(dir.path().to_str().unwrap());
-        let cache = FileCache::new(local_store.clone(), ReadableSize::mb(10), None, None);
+        let cache = FileCache::new(
+            local_store.clone(),
+            ReadableSize::mb(10),
+            None,
+            None,
+            true, // enable_background_worker
+        );
         let region_id = RegionId::new(2000, 0);
         let file_type = FileType::Parquet;
@@ -841,7 +964,13 @@ mod tests {
         }

         // Recover the cache.
-        let cache = FileCache::new(local_store.clone(), ReadableSize::mb(10), None, None);
+        let cache = FileCache::new(
+            local_store.clone(),
+            ReadableSize::mb(10),
+            None,
+            None,
+            true, // enable_background_worker
+        );
         // No entry before recovery.
         assert!(
             cache
@@ -870,7 +999,13 @@ mod tests {
     async fn test_file_cache_read_ranges() {
         let dir = create_temp_dir("");
         let local_store = new_fs_store(dir.path().to_str().unwrap());
-        let file_cache = FileCache::new(local_store.clone(), ReadableSize::mb(10), None, None);
+        let file_cache = FileCache::new(
+            local_store.clone(),
+            ReadableSize::mb(10),
+            None,
+            None,
+            true, // enable_background_worker
+        );
         let region_id = RegionId::new(2000, 0);
         let file_id = FileId::random();
         let key = IndexKey::new(region_id, file_id, FileType::Parquet);
@@ -921,6 +1056,15 @@ mod tests {
             IndexKey::new(region_id, file_id, FileType::Parquet),
             parse_index_key("5299989643269.3368731b-a556-42b8-a5df-9c31ce155095.parquet").unwrap()
         );
+        assert_eq!(
+            IndexKey::new(region_id, file_id, FileType::Puffin(0)),
+            parse_index_key("5299989643269.3368731b-a556-42b8-a5df-9c31ce155095.puffin").unwrap()
+        );
+        assert_eq!(
+            IndexKey::new(region_id, file_id, FileType::Puffin(42)),
+            parse_index_key("5299989643269.3368731b-a556-42b8-a5df-9c31ce155095.42.puffin")
+                .unwrap()
+        );
         assert!(parse_index_key("").is_none());
         assert!(parse_index_key(".").is_none());
         assert!(parse_index_key("5299989643269").is_none());
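
The background worker introduced in this file is deliberately lossy: tasks flow through a bounded channel, and try_send drops a task when the queue is full, so cache warm-up can never backpressure the caller. A standalone tokio sketch of the same behavior (capacity shrunk from the diff's DOWNLOAD_TASK_CHANNEL_SIZE = 64 down to 1 so the drop is observable; assumes the tokio runtime with the "rt" and "macros" features):

    use tokio::sync::mpsc;

    #[derive(Debug)]
    struct DownloadTask {
        key: String,
    }

    #[tokio::main]
    async fn main() {
        // Bounded channel, as in the diff above.
        let (tx, mut rx) = mpsc::channel::<DownloadTask>(1);

        // Worker drains tasks; the real one also skips keys already cached.
        let worker = tokio::spawn(async move {
            while let Some(task) = rx.recv().await {
                println!("downloading {}", task.key);
            }
        });

        // try_send never blocks: tasks beyond the channel capacity may be
        // dropped, which is logged and otherwise ignored.
        for i in 0..3 {
            if let Err(e) = tx.try_send(DownloadTask { key: format!("file-{i}") }) {
                println!("dropped background download: {e:?}");
            }
        }

        drop(tx); // close the channel so the worker exits
        worker.await.unwrap();
    }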

View File

@@ -21,7 +21,7 @@ use async_trait::async_trait;
 use bytes::Bytes;
 use index::bloom_filter::error::Result;
 use index::bloom_filter::reader::{BloomFilterReadMetrics, BloomFilterReader};
-use store_api::storage::{ColumnId, FileId};
+use store_api::storage::{ColumnId, FileId, IndexVersion};

 use crate::cache::index::{INDEX_METADATA_TYPE, IndexCache, PageKey};
 use crate::metrics::{CACHE_HIT, CACHE_MISS};
@@ -35,8 +35,10 @@ pub enum Tag {
     Fulltext,
 }

+pub type BloomFilterIndexKey = (FileId, IndexVersion, ColumnId, Tag);
+
 /// Cache for bloom filter index.
-pub type BloomFilterIndexCache = IndexCache<(FileId, ColumnId, Tag), BloomFilterMeta>;
+pub type BloomFilterIndexCache = IndexCache<BloomFilterIndexKey, BloomFilterMeta>;
 pub type BloomFilterIndexCacheRef = Arc<BloomFilterIndexCache>;

 impl BloomFilterIndexCache {
@@ -59,11 +61,9 @@ impl BloomFilterIndexCache {
 }

 /// Calculates weight for bloom filter index metadata.
-fn bloom_filter_index_metadata_weight(
-    k: &(FileId, ColumnId, Tag),
-    meta: &Arc<BloomFilterMeta>,
-) -> u32 {
+fn bloom_filter_index_metadata_weight(k: &BloomFilterIndexKey, meta: &Arc<BloomFilterMeta>) -> u32 {
     let base = k.0.as_bytes().len()
+        + std::mem::size_of::<IndexVersion>()
         + std::mem::size_of::<ColumnId>()
         + std::mem::size_of::<Tag>()
         + std::mem::size_of::<BloomFilterMeta>();
@@ -75,16 +75,14 @@ fn bloom_filter_index_metadata_weight(
 }

 /// Calculates weight for bloom filter index content.
-fn bloom_filter_index_content_weight(
-    (k, _): &((FileId, ColumnId, Tag), PageKey),
-    v: &Bytes,
-) -> u32 {
+fn bloom_filter_index_content_weight((k, _): &(BloomFilterIndexKey, PageKey), v: &Bytes) -> u32 {
     (k.0.as_bytes().len() + std::mem::size_of::<ColumnId>() + v.len()) as u32
 }

 /// Bloom filter index blob reader with cache.
 pub struct CachedBloomFilterIndexBlobReader<R> {
     file_id: FileId,
+    index_version: IndexVersion,
     column_id: ColumnId,
     tag: Tag,
     blob_size: u64,
@@ -96,6 +94,7 @@ impl<R> CachedBloomFilterIndexBlobReader<R> {
     /// Creates a new bloom filter index blob reader with cache.
     pub fn new(
         file_id: FileId,
+        index_version: IndexVersion,
         column_id: ColumnId,
         tag: Tag,
         blob_size: u64,
@@ -104,6 +103,7 @@ impl<R> CachedBloomFilterIndexBlobReader<R> {
     ) -> Self {
         Self {
             file_id,
+            index_version,
             column_id,
             tag,
             blob_size,
@@ -126,7 +126,7 @@ impl<R: BloomFilterReader + Send> BloomFilterReader for CachedBloomFilterIndexBl
         let (result, cache_metrics) = self
             .cache
             .get_or_load(
-                (self.file_id, self.column_id, self.tag),
+                (self.file_id, self.index_version, self.column_id, self.tag),
                 self.blob_size,
                 offset,
                 size,
@@ -161,7 +161,7 @@ impl<R: BloomFilterReader + Send> BloomFilterReader for CachedBloomFilterIndexBl
         let (page, cache_metrics) = self
             .cache
             .get_or_load(
-                (self.file_id, self.column_id, self.tag),
+                (self.file_id, self.index_version, self.column_id, self.tag),
                 self.blob_size,
                 range.start,
                 (range.end - range.start) as u32,
@@ -191,9 +191,9 @@ impl<R: BloomFilterReader + Send> BloomFilterReader for CachedBloomFilterIndexBl
         &self,
         metrics: Option<&mut BloomFilterReadMetrics>,
     ) -> Result<BloomFilterMeta> {
-        if let Some(cached) = self
-            .cache
-            .get_metadata((self.file_id, self.column_id, self.tag))
+        if let Some(cached) =
+            self.cache
+                .get_metadata((self.file_id, self.index_version, self.column_id, self.tag))
         {
             CACHE_HIT.with_label_values(&[INDEX_METADATA_TYPE]).inc();
             if let Some(m) = metrics {
@@ -203,7 +203,7 @@ impl<R: BloomFilterReader + Send> BloomFilterReader for CachedBloomFilterIndexBl
         } else {
             let meta = self.inner.metadata(metrics).await?;
             self.cache.put_metadata(
-                (self.file_id, self.column_id, self.tag),
+                (self.file_id, self.index_version, self.column_id, self.tag),
                 Arc::new(meta.clone()),
             );
             CACHE_MISS.with_label_values(&[INDEX_METADATA_TYPE]).inc();
@@ -223,6 +223,7 @@ mod test {
     #[test]
     fn bloom_filter_metadata_weight_counts_vec_contents() {
         let file_id = FileId::parse_str("00000000-0000-0000-0000-000000000001").unwrap();
+        let version = 0;
        let column_id: ColumnId = 42;
         let tag = Tag::Skipping;
@@ -246,10 +247,13 @@ mod test {
             ],
         };

-        let weight =
-            bloom_filter_index_metadata_weight(&(file_id, column_id, tag), &Arc::new(meta.clone()));
+        let weight = bloom_filter_index_metadata_weight(
+            &(file_id, version, column_id, tag),
+            &Arc::new(meta.clone()),
+        );
         let base = file_id.as_bytes().len()
+            + std::mem::size_of::<IndexVersion>()
             + std::mem::size_of::<ColumnId>()
             + std::mem::size_of::<Tag>()
             + std::mem::size_of::<BloomFilterMeta>();
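
To summarize the key change in this file: bloom filter cache entries are now addressed by (FileId, IndexVersion, ColumnId, Tag), so a rewritten index becomes a brand-new entry while, as the invalidate_file implementations in this diff suggest, evicting a file still sweeps every version for that file id. A standalone sketch with a plain HashMap standing in for the moka cache:

    use std::collections::HashMap;

    type FileId = u64; // stand-in; the real type is a UUID-like id
    type IndexVersion = u64;
    type ColumnId = u32;

    #[allow(dead_code)]
    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
    enum Tag {
        Skipping,
        Fulltext,
    }

    type BloomFilterIndexKey = (FileId, IndexVersion, ColumnId, Tag);

    fn main() {
        let mut cache: HashMap<BloomFilterIndexKey, &str> = HashMap::new();
        cache.insert((1, 0, 42, Tag::Skipping), "meta for version 0");

        // After an index rewrite, version 1 entries coexist with version 0.
        cache.insert((1, 1, 42, Tag::Skipping), "meta for version 1");
        assert!(cache.contains_key(&(1, 0, 42, Tag::Skipping)));
        assert!(cache.contains_key(&(1, 1, 42, Tag::Skipping)));

        // Evicting a file drops every version for that file id.
        cache.retain(|key, _| key.0 != 1);
        assert!(cache.is_empty());
    }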

View File

@@ -22,7 +22,7 @@ use bytes::Bytes;
 use index::inverted_index::error::Result;
 use index::inverted_index::format::reader::{InvertedIndexReadMetrics, InvertedIndexReader};
 use prost::Message;
-use store_api::storage::FileId;
+use store_api::storage::{FileId, IndexVersion};

 use crate::cache::index::{INDEX_METADATA_TYPE, IndexCache, PageKey};
 use crate::metrics::{CACHE_HIT, CACHE_MISS};
@@ -30,7 +30,7 @@ use crate::metrics::{CACHE_HIT, CACHE_MISS};
 const INDEX_TYPE_INVERTED_INDEX: &str = "inverted_index";

 /// Cache for inverted index.
-pub type InvertedIndexCache = IndexCache<FileId, InvertedIndexMetas>;
+pub type InvertedIndexCache = IndexCache<(FileId, IndexVersion), InvertedIndexMetas>;
 pub type InvertedIndexCacheRef = Arc<InvertedIndexCache>;

 impl InvertedIndexCache {
@@ -48,23 +48,24 @@ impl InvertedIndexCache {
     /// Removes all cached entries for the given `file_id`.
     pub fn invalidate_file(&self, file_id: FileId) {
-        self.invalidate_if(move |key| *key == file_id);
+        self.invalidate_if(move |key| key.0 == file_id);
     }
 }

 /// Calculates weight for inverted index metadata.
-fn inverted_index_metadata_weight(k: &FileId, v: &Arc<InvertedIndexMetas>) -> u32 {
-    (k.as_bytes().len() + v.encoded_len()) as u32
+fn inverted_index_metadata_weight(k: &(FileId, IndexVersion), v: &Arc<InvertedIndexMetas>) -> u32 {
+    (k.0.as_bytes().len() + size_of::<IndexVersion>() + v.encoded_len()) as u32
 }

 /// Calculates weight for inverted index content.
-fn inverted_index_content_weight((k, _): &(FileId, PageKey), v: &Bytes) -> u32 {
-    (k.as_bytes().len() + v.len()) as u32
+fn inverted_index_content_weight((k, _): &((FileId, IndexVersion), PageKey), v: &Bytes) -> u32 {
+    (k.0.as_bytes().len() + size_of::<IndexVersion>() + v.len()) as u32
 }

 /// Inverted index blob reader with cache.
 pub struct CachedInvertedIndexBlobReader<R> {
     file_id: FileId,
+    index_version: IndexVersion,
     blob_size: u64,
     inner: R,
     cache: InvertedIndexCacheRef,
@@ -72,9 +73,16 @@ pub struct CachedInvertedIndexBlobReader<R> {
 impl<R> CachedInvertedIndexBlobReader<R> {
     /// Creates a new inverted index blob reader with cache.
-    pub fn new(file_id: FileId, blob_size: u64, inner: R, cache: InvertedIndexCacheRef) -> Self {
+    pub fn new(
+        file_id: FileId,
+        index_version: IndexVersion,
+        blob_size: u64,
+        inner: R,
+        cache: InvertedIndexCacheRef,
+    ) -> Self {
         Self {
             file_id,
+            index_version,
             blob_size,
             inner,
             cache,
@@ -96,7 +104,7 @@ impl<R: InvertedIndexReader> InvertedIndexReader for CachedInvertedIndexBlobRead
         let (result, cache_metrics) = self
             .cache
             .get_or_load(
-                self.file_id,
+                (self.file_id, self.index_version),
                 self.blob_size,
                 offset,
                 size,
@@ -129,7 +137,7 @@ impl<R: InvertedIndexReader> InvertedIndexReader for CachedInvertedIndexBlobRead
         let (page, cache_metrics) = self
             .cache
             .get_or_load(
-                self.file_id,
+                (self.file_id, self.index_version),
                 self.blob_size,
                 range.start,
                 (range.end - range.start) as u32,
@@ -156,7 +164,7 @@ impl<R: InvertedIndexReader> InvertedIndexReader for CachedInvertedIndexBlobRead
         &self,
         metrics: Option<&'a mut InvertedIndexReadMetrics>,
     ) -> Result<Arc<InvertedIndexMetas>> {
-        if let Some(cached) = self.cache.get_metadata(self.file_id) {
+        if let Some(cached) = self.cache.get_metadata((self.file_id, self.index_version)) {
             CACHE_HIT.with_label_values(&[INDEX_METADATA_TYPE]).inc();
             if let Some(m) = metrics {
                 m.cache_hit += 1;
@@ -164,7 +172,8 @@ impl<R: InvertedIndexReader> InvertedIndexReader for CachedInvertedIndexBlobRead
             Ok(cached)
         } else {
             let meta = self.inner.metadata(metrics).await?;
-            self.cache.put_metadata(self.file_id, meta.clone());
+            self.cache
+                .put_metadata((self.file_id, self.index_version), meta.clone());
             CACHE_MISS.with_label_values(&[INDEX_METADATA_TYPE]).inc();
             Ok(meta)
         }
@@ -299,6 +308,7 @@ mod test {
         // Init a test range reader in local fs.
         let mut env = TestEnv::new().await;
         let file_size = blob.len() as u64;
+        let index_version = 0;
         let store = env.init_object_store_manager();
         let temp_path = "data";
         store.write(temp_path, blob).await.unwrap();
@@ -314,6 +324,7 @@ mod test {
         let reader = InvertedIndexBlobReader::new(range_reader);
         let cached_reader = CachedInvertedIndexBlobReader::new(
             FileId::random(),
+            index_version,
             file_size,
             reader,
             Arc::new(InvertedIndexCache::new(8192, 8192, 50)),
@@ -450,7 +461,7 @@ mod test {
         let (read, _cache_metrics) = cached_reader
             .cache
             .get_or_load(
-                cached_reader.file_id,
+                (cached_reader.file_id, cached_reader.index_version),
                 file_size,
                 offset,
                 size,
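
The inverted-index reader mirrors the bloom filter change: metadata is keyed by (FileId, IndexVersion), with a get-or-load fallback to the inner reader on a miss followed by repopulating the cache. A standalone, synchronous sketch of that flow (a HashMap stands in for the async moka cache, and all types are local simplifications):

    use std::collections::HashMap;
    use std::sync::Arc;

    type FileId = u64; // stand-in for the UUID-based FileId
    type IndexVersion = u64;

    struct Metas; // stand-in for InvertedIndexMetas

    struct CachedReader {
        file_id: FileId,
        index_version: IndexVersion,
        cache: HashMap<(FileId, IndexVersion), Arc<Metas>>,
    }

    impl CachedReader {
        /// Get-or-load flow mirroring `CachedInvertedIndexBlobReader::metadata`.
        fn metadata(&mut self) -> Arc<Metas> {
            let key = (self.file_id, self.index_version);
            if let Some(cached) = self.cache.get(&key) {
                // Cache hit: return the shared metadata.
                cached.clone()
            } else {
                // Cache miss: read from the inner source, then repopulate.
                let meta = Arc::new(Metas);
                self.cache.insert(key, meta.clone());
                meta
            }
        }
    }

    fn main() {
        let mut reader = CachedReader {
            file_id: 7,
            index_version: 1,
            cache: HashMap::new(),
        };
        let first = reader.metadata(); // miss, populates (7, 1)
        let second = reader.metadata(); // hit
        assert!(Arc::ptr_eq(&first, &second));
    }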

Some files were not shown because too many files have changed in this diff Show More