Compare commits

...

29 Commits

Author SHA1 Message Date
discord9
1d3cfdc0e5 clippy
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 21:07:24 +08:00
discord9
088401c3e9 c
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 21:05:23 +08:00
discord9
4419e0254f refactor: add list test
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 21:05:13 +08:00
discord9
709ccd3e31 c
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 18:39:14 +08:00
discord9
5b50b4824d wt
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 18:35:19 +08:00
discord9
1ef5c2e024 chore
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 18:22:03 +08:00
discord9
d20727f335 test: better fuzz
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 18:15:02 +08:00
discord9
2391ab1941 even more test
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 16:21:27 +08:00
discord9
ec77a5d53a sanity check
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 13:54:19 +08:00
discord9
dbad96eb80 more test
Signed-off-by: discord9 <discord9@163.com>
2025-12-22 13:44:00 +08:00
discord9
c0652f6dd5 chore: release push check against Cargo.toml (#7426)
Signed-off-by: discord9 <discord9@163.com>
2025-12-19 13:16:15 +00:00
Yingwen
fed6cb0806 fix: flat format use correct encoding in indexer for tags (#7440)
* test: add inverted and skipping test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: Add tests for fulltext index

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: index dictionary type in correct encoding in flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use encode_data_type() in SortField

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: refine imports

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update list test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: simplify tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-19 07:36:44 +00:00
discord9
69659211f6 chore: fix bincode version (#7445)
Signed-off-by: discord9 <discord9@163.com>
2025-12-19 07:36:28 +00:00
LFC
6332d91884 test: reduce execution time of test test_suspend_frontend (#7444)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-19 07:25:36 +00:00
Weny Xu
4d66bd96b8 feat: make distributed time constants and client timeouts configurable (#7433)
Signed-off-by: WenyXu <wenymedia@gmail.com>
2025-12-19 02:23:20 +00:00
Ning Sun
2f4a15ec40 ci: ensure commits from main branch for whitelisted git dependencies (#7434)
* chore: update proto to include native histogram

* ci: add a CI check to ensure whitelisted dependencies are using their main branch

* chore: add changes to Cargo.toml to trigger CI

* chore: update proto

* test: update test to include histogram
2025-12-18 14:10:33 +00:00
Lanqing Yang
658332fe68 chore(mito): nit remove extra hashset in gc workers (#7399)
chore(mito): remove extra hashset in gc workers

Signed-off-by: lyang24 <lanqingy93@gmail.com>
2025-12-18 13:09:32 +00:00
shuiyisong
c088d361a4 chore: expose disable_ec2_metadata option (#7439)
chore: add option to disable ec2 metadata

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-18 11:55:08 +00:00
shuiyisong
a85864067e chore: remove canonicalize (#7430)
* chore: remove canonicalize

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add match file name option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update field name

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: modify tls option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update config file

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update config md

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update option to `enable_filename_match`

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issues

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove option

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove unused test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-18 09:39:10 +00:00
LFC
0df69c95aa chore: use official etcd-client (#7432)
Signed-off-by: luofucong <luofc@foxmail.com>
2025-12-18 06:25:48 +00:00
McKnight22
72eede8b38 refactor(cli): unify storage configuration for export command (#7280)
* refactor(cli): unify storage configuration for export command

- Utilize ObjectStoreConfig to unify storage configuration for export command
- Support export command for Fs, S3, OSS, GCS and Azblob
- Fix the Display implementation for SecretString, which always returned the string
  "SecretString([REDACTED])" even when the internal secret was empty.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Change the visibility of the configuration options
  for every storage backend to public access.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Update the implementation of ObjectStoreConfig::build_xxx() using macro solutions

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Introduce config validation for each storage type

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Enable trait-based polymorphism for storage type handling
  (from inherent impl to trait impl)
- Extract helper functions to reduce code duplication

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Improve SecretString handling and validation
  (Distinguishing between "not provided" and "empty string")
- Add validation when using filesystem storage

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Refactor storage field validation with macro

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- support GCS Application Default Credentials (like GKE, Cloud Run, or local development) in export
  (Enabling ADC without validating `credential_path` or `credential` to be present)
  (Making `endpoint` optional in GCS validation (defaults to https://storage.googleapis.com))

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

This commit refactors the validation logic for object store configurations in the CLI to leverage clap features and reduce boilerplate.

Key changes:
- Update wrap_with_clap_prefix macro to use clap's requires attribute.
  This ensures that storage-specific options (e.g., --s3-bucket) are only accepted when the corresponding backend is enabled (e.g., --s3).
- Simplify FieldValidator trait by removing the is_provided method, as dependency checks are now handled by clap.
- Introduce validate_backend! macro to standardize the validation of required fields for enabled backends.
- Refactor ExportCommand to remove explicit validation calls (validate_s3, etc.) and rely on the validation within backend constructors.
- Add integration tests for ExportCommand to verify build success with S3, OSS, GCS, and Azblob configurations.

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>

* refactor(cli): unify storage configuration for export command

- Use macros to simplify storage export implementation

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Rollback StorageExport trait implementation to not using macro for better code clarity and maintainability
- Introduce format_uri helper function to unify URI formatting logic
- Fix OSS URI path bug inherited from legacy code

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>

* refactor(cli): unify storage configuration for export command

- Remove unnecessary async_trait

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>

---------

Signed-off-by: McKnight22 <tao.wang.22@outlook.com>
Co-authored-by: WenyXu <wenymedia@gmail.com>
Co-authored-by: jeremyhi <jiachun_feng@proton.me>
2025-12-18 03:16:53 +00:00
jeremyhi
95eccd6cde feat: introduce granularity for memory manager (#7416)
* feat: introduce granularity for memory manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: add unit test

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* chore: remove granularity getter for manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

* Update src/common/memory-manager/src/manager.rs

Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>

* feat: acquire_with_policy for manager

Signed-off-by: jeremyhi <fengjiachun@gmail.com>

---------

Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
2025-12-17 11:08:51 +00:00
fys
0bc5a305be chore: add wait_initialized method for frontend client (#7414)
* chore: add wait_initialized method for frontend client

* fix: some

* fix: cargo fmt

* add comment

* add unit test

* rename

* fix: cargo check

* fix: cr by copilot
2025-12-17 08:13:36 +00:00
discord9
1afcddd5a9 chore: feature gate vector_index (#7428)
Signed-off-by: discord9 <discord9@163.com>
2025-12-17 07:14:25 +00:00
shuiyisong
62808b887b fix: using anonymous s3 access when ak and sk is not provided (#7425)
* chore: allow s3 anon

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: disable ec2 metadata

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-12-17 06:34:29 +00:00
discord9
04ddd40e00 chore: bump version to beta.3 (#7423)
chore: bump to beta.3

Signed-off-by: discord9 <discord9@163.com>
2025-12-17 04:18:23 +00:00
liyang
b4f028be5f chore: change etcd endpoints to array in the test scripts (#7419)
chore: change etcd endpoint

Signed-off-by: liyang <daviderli614@gmail.com>
2025-12-17 03:14:35 +00:00
Lei, HUANG
da964880f5 chore: expose symbols (#7417)
* refactor/expose-symbols:
 ## Refactor `bulk/part.rs` to Simplify Mutation Handling

 - Removed the `mutations_to_record_batch` function and its associated helper functions, including `ArraysSorter`, `timestamp_array_to_iter`, and `binary_array_to_dictionary`, to simplify the mutation handling logic in `bulk/part.rs`.
 - Deleted related test functions `check_binary_array_to_dictionary` and `check_mutations_to_record_batches` from the test module, along with their associated test cases.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor/expose-symbols:
 ### Commit Message

 **Refactor and Enhance Deduplication Logic**

 - **`flush.rs`**: Refactored `maybe_dedup_one` function to accept `append_mode` and `merge_mode` as parameters instead of `RegionOptions`. This change enhances flexibility in deduplication logic.
 - **`memtable/bulk.rs`**: Made `BulkRangeIterBuilder` struct and its fields public to allow external access and modification, improving extensibility.
 - **`sst.rs`**: Corrected a typo in the schema documentation, changing `__prmary_key` to `__primary_key` for clarity and accuracy.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-12-17 01:29:36 +00:00
dennis zhuang
a35a39f726 feat(vector_index): adds the foundational types and SQL parsing support for vector index (#7366)
* feat: adds the foundational types and SQL parsing support for vector index

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: by suggestions

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: ensure index option values must be greater than zero

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: validate connectivity strictly

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: compile error

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: disable SIMD for ci

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
2025-12-16 22:45:36 +00:00
91 changed files with 5441 additions and 2228 deletions

View File

@@ -51,7 +51,7 @@ runs:
run: |
helm upgrade \
--install my-greptimedb \
--set meta.backendStorage.etcd.endpoints=${{ inputs.etcd-endpoints }} \
--set 'meta.backendStorage.etcd.endpoints[0]=${{ inputs.etcd-endpoints }}' \
--set meta.enableRegionFailover=${{ inputs.enable-region-failover }} \
--set image.registry=${{ inputs.image-registry }} \
--set image.repository=${{ inputs.image-repository }} \
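
The hunk above switches the etcd endpoint value from a plain string to Helm's indexed array syntax. As a minimal sketch of that syntax (the endpoint hostnames and the chart reference below are placeholders, not values from this change), extra endpoints are added by repeating the flag with increasing indices:
helm upgrade \
  --install my-greptimedb \
  --set 'meta.backendStorage.etcd.endpoints[0]=etcd-0.etcd:2379' \
  --set 'meta.backendStorage.etcd.endpoints[1]=etcd-1.etcd:2379' \
  greptime/greptimedb-cluster  # chart reference assumed for illustration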

View File

@@ -49,6 +49,17 @@ function create_version() {
echo "GITHUB_REF_NAME is empty in push event" >&2
exit 1
fi
# For tag releases, ensure GITHUB_REF_NAME matches the version in Cargo.toml
CARGO_VERSION=$(grep '^version = ' Cargo.toml | cut -d '"' -f 2 | head -n 1)
EXPECTED_REF_NAME="v${CARGO_VERSION}"
if [ "$GITHUB_REF_NAME" != "$EXPECTED_REF_NAME" ]; then
echo "Error: GITHUB_REF_NAME '$GITHUB_REF_NAME' does not match Cargo.toml version 'v${CARGO_VERSION}'" >&2
echo "Expected tag name: '$EXPECTED_REF_NAME'" >&2
exit 1
fi
echo "$GITHUB_REF_NAME"
elif [ "$GITHUB_EVENT_NAME" = workflow_dispatch ]; then
echo "$NEXT_RELEASE_VERSION-$(git rev-parse --short HEAD)-$(date "+%Y%m%d-%s")"

View File

@@ -81,7 +81,7 @@ function deploy_greptimedb_cluster() {
--create-namespace \
--set image.tag="$GREPTIMEDB_IMAGE_TAG" \
--set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
--set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
--set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
--set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
-n "$install_namespace"
@@ -119,7 +119,7 @@ function deploy_greptimedb_cluster_with_s3_storage() {
--create-namespace \
--set image.tag="$GREPTIMEDB_IMAGE_TAG" \
--set initializer.tag="$GREPTIMEDB_INITIALIZER_IMAGE_TAG" \
--set meta.backendStorage.etcd.endpoints="etcd.$install_namespace:2379" \
--set "meta.backendStorage.etcd.endpoints[0]=etcd.$install_namespace.svc.cluster.local:2379" \
--set meta.backendStorage.etcd.storeKeyPrefix="$cluster_name" \
--set objectStorage.s3.bucket="$AWS_CI_TEST_BUCKET" \
--set objectStorage.s3.region="$AWS_REGION" \

.github/workflows/check-git-deps.yml (new file, 154 changed lines)
View File

@@ -0,0 +1,154 @@
name: Check Git Dependencies on Main Branch
on:
pull_request:
branches: [main]
paths:
- 'Cargo.toml'
push:
branches: [main]
paths:
- 'Cargo.toml'
jobs:
check-git-deps:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v6
- name: Check git dependencies
env:
WHITELIST_DEPS: "greptime-proto,meter-core,meter-macros"
run: |
#!/bin/bash
set -e
echo "Checking whitelisted git dependencies..."
# Function to check if a commit is on main branch
check_commit_on_main() {
local repo_url="$1"
local commit="$2"
local repo_name=$(basename "$repo_url" .git)
echo "Checking $repo_name"
echo "Repo: $repo_url"
echo "Commit: $commit"
# Create a temporary directory for cloning
local temp_dir=$(mktemp -d)
# Clone the repository
if git clone "$repo_url" "$temp_dir" 2>/dev/null; then
cd "$temp_dir"
# Try to determine the main branch name
local main_branch="main"
if ! git rev-parse --verify origin/main >/dev/null 2>&1; then
if git rev-parse --verify origin/master >/dev/null 2>&1; then
main_branch="master"
else
# Try to get the default branch
main_branch=$(git symbolic-ref refs/remotes/origin/HEAD | sed 's@^refs/remotes/origin/@@')
fi
fi
echo "Main branch: $main_branch"
# Check if commit exists
if git cat-file -e "$commit" 2>/dev/null; then
# Check if commit is on main branch
if git merge-base --is-ancestor "$commit" "origin/$main_branch" 2>/dev/null; then
echo "PASS: Commit $commit is on $main_branch branch"
cd - >/dev/null
rm -rf "$temp_dir"
return 0
else
echo "FAIL: Commit $commit is NOT on $main_branch branch"
# Try to find which branch contains this commit
local branch_name=$(git branch -r --contains "$commit" 2>/dev/null | head -1 | sed 's/^[[:space:]]*origin\///' | sed 's/[[:space:]]*$//')
if [[ -n "$branch_name" ]]; then
echo "Found on branch: $branch_name"
fi
cd - >/dev/null
rm -rf "$temp_dir"
return 1
fi
else
echo "FAIL: Commit $commit not found in repository"
cd - >/dev/null
rm -rf "$temp_dir"
return 1
fi
else
echo "FAIL: Failed to clone $repo_url"
rm -rf "$temp_dir"
return 1
fi
}
# Extract whitelisted git dependencies from Cargo.toml
echo "Extracting git dependencies from Cargo.toml..."
# Create temporary array to store dependencies
declare -a deps=()
# Build awk pattern from whitelist
IFS=',' read -ra WHITELIST <<< "$WHITELIST_DEPS"
awk_pattern=""
for dep in "${WHITELIST[@]}"; do
if [[ -n "$awk_pattern" ]]; then
awk_pattern="$awk_pattern|"
fi
awk_pattern="$awk_pattern$dep"
done
# Extract whitelisted dependencies
while IFS= read -r line; do
if [[ -n "$line" ]]; then
deps+=("$line")
fi
done < <(awk -v pattern="$awk_pattern" '
$0 ~ pattern ".*git = \"https:/" {
match($0, /git = "([^"]+)"/, arr)
git_url = arr[1]
if (match($0, /rev = "([^"]+)"/, rev_arr)) {
rev = rev_arr[1]
print git_url " " rev
} else {
# Check next line for rev
getline
if (match($0, /rev = "([^"]+)"/, rev_arr)) {
rev = rev_arr[1]
print git_url " " rev
}
}
}
' Cargo.toml)
echo "Found ${#deps[@]} dependencies to check:"
for dep in "${deps[@]}"; do
echo " $dep"
done
failed=0
for dep in "${deps[@]}"; do
read -r repo_url commit <<< "$dep"
if ! check_commit_on_main "$repo_url" "$commit"; then
failed=1
fi
done
echo "Check completed."
if [[ $failed -eq 1 ]]; then
echo "ERROR: Some git dependencies are not on their main branches!"
echo "Please update the commits to point to main branch commits."
exit 1
else
echo "SUCCESS: All git dependencies are on their main branches!"
fi
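
The core of the workflow above is the git merge-base --is-ancestor test. A hand-run sketch of the same check for a single whitelisted dependency follows; the repository URL and rev are the greptime-proto coordinates that appear in the Cargo.toml change later in this compare:
git clone https://github.com/GreptimeTeam/greptime-proto.git /tmp/greptime-proto
cd /tmp/greptime-proto
REV=173efe5ec62722089db7c531c0b0d470a072b915
if git merge-base --is-ancestor "$REV" origin/main; then
  echo "PASS: $REV is on main"
else
  echo "FAIL: $REV is not on main"
fi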

Cargo.lock (generated, 271 changed lines)
View File

@@ -212,7 +212,7 @@ checksum = "d301b3b94cb4b2f23d7917810addbbaff90738e0ca2be692bd027e70d7e0330c"
[[package]]
name = "api"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arrow-schema",
"common-base",
@@ -733,7 +733,7 @@ dependencies = [
[[package]]
name = "auth"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -1383,7 +1383,7 @@ dependencies = [
[[package]]
name = "cache"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"catalog",
"common-error",
@@ -1418,7 +1418,7 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "catalog"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow",
@@ -1763,7 +1763,7 @@ checksum = "b94f61472cee1439c0b966b47e3aca9ae07e45d070759512cd390ea2bebc6675"
[[package]]
name = "cli"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-stream",
"async-trait",
@@ -1786,6 +1786,7 @@ dependencies = [
"common-recordbatch",
"common-runtime",
"common-telemetry",
"common-test-util",
"common-time",
"common-version",
"common-wal",
@@ -1816,7 +1817,7 @@ dependencies = [
[[package]]
name = "client"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arc-swap",
@@ -1849,7 +1850,7 @@ dependencies = [
"snafu 0.8.6",
"store-api",
"substrait 0.37.3",
"substrait 1.0.0-beta.2",
"substrait 1.0.0-beta.3",
"tokio",
"tokio-stream",
"tonic 0.13.1",
@@ -1889,7 +1890,7 @@ dependencies = [
[[package]]
name = "cmd"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"auth",
@@ -1977,6 +1978,17 @@ dependencies = [
"unicode-width 0.2.1",
]
[[package]]
name = "codespan-reporting"
version = "0.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "af491d569909a7e4dee0ad7db7f5341fef5c614d5b8ec8cf765732aba3cff681"
dependencies = [
"serde",
"termcolor",
"unicode-width 0.2.1",
]
[[package]]
name = "colorchoice"
version = "1.0.4"
@@ -2012,7 +2024,7 @@ checksum = "55b672471b4e9f9e95499ea597ff64941a309b2cdbffcc46f2cc5e2d971fd335"
[[package]]
name = "common-base"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"anymap2",
"async-trait",
@@ -2036,14 +2048,14 @@ dependencies = [
[[package]]
name = "common-catalog"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"const_format",
]
[[package]]
name = "common-config"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-base",
"common-error",
@@ -2068,7 +2080,7 @@ dependencies = [
[[package]]
name = "common-datasource"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arrow",
"arrow-schema",
@@ -2103,7 +2115,7 @@ dependencies = [
[[package]]
name = "common-decimal"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"bigdecimal 0.4.8",
"common-error",
@@ -2116,7 +2128,7 @@ dependencies = [
[[package]]
name = "common-error"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-macro",
"http 1.3.1",
@@ -2127,7 +2139,7 @@ dependencies = [
[[package]]
name = "common-event-recorder"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -2149,7 +2161,7 @@ dependencies = [
[[package]]
name = "common-frontend"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -2171,7 +2183,7 @@ dependencies = [
[[package]]
name = "common-function"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -2231,7 +2243,7 @@ dependencies = [
[[package]]
name = "common-greptimedb-telemetry"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"common-runtime",
@@ -2248,7 +2260,7 @@ dependencies = [
[[package]]
name = "common-grpc"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow-flight",
@@ -2283,7 +2295,7 @@ dependencies = [
[[package]]
name = "common-grpc-expr"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"common-base",
@@ -2303,7 +2315,7 @@ dependencies = [
[[package]]
name = "common-macro"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"greptime-proto",
"once_cell",
@@ -2314,7 +2326,7 @@ dependencies = [
[[package]]
name = "common-mem-prof"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"anyhow",
"common-error",
@@ -2330,7 +2342,7 @@ dependencies = [
[[package]]
name = "common-memory-manager"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-error",
"common-macro",
@@ -2343,7 +2355,7 @@ dependencies = [
[[package]]
name = "common-meta"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"anymap2",
"api",
@@ -2415,7 +2427,7 @@ dependencies = [
[[package]]
name = "common-options"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-grpc",
"humantime-serde",
@@ -2424,11 +2436,11 @@ dependencies = [
[[package]]
name = "common-plugins"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
[[package]]
name = "common-pprof"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-error",
"common-macro",
@@ -2440,7 +2452,7 @@ dependencies = [
[[package]]
name = "common-procedure"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-stream",
@@ -2469,7 +2481,7 @@ dependencies = [
[[package]]
name = "common-procedure-test"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"common-procedure",
@@ -2479,7 +2491,7 @@ dependencies = [
[[package]]
name = "common-query"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -2505,7 +2517,7 @@ dependencies = [
[[package]]
name = "common-recordbatch"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arc-swap",
"common-base",
@@ -2529,7 +2541,7 @@ dependencies = [
[[package]]
name = "common-runtime"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"clap 4.5.40",
@@ -2558,7 +2570,7 @@ dependencies = [
[[package]]
name = "common-session"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"serde",
"strum 0.27.1",
@@ -2566,7 +2578,7 @@ dependencies = [
[[package]]
name = "common-sql"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-base",
"common-decimal",
@@ -2584,7 +2596,7 @@ dependencies = [
[[package]]
name = "common-stat"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-base",
"common-runtime",
@@ -2599,7 +2611,7 @@ dependencies = [
[[package]]
name = "common-telemetry"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"backtrace",
"common-base",
@@ -2628,7 +2640,7 @@ dependencies = [
[[package]]
name = "common-test-util"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"client",
"common-grpc",
@@ -2641,7 +2653,7 @@ dependencies = [
[[package]]
name = "common-time"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arrow",
"chrono",
@@ -2659,7 +2671,7 @@ dependencies = [
[[package]]
name = "common-version"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"build-data",
"cargo-manifest",
@@ -2670,7 +2682,7 @@ dependencies = [
[[package]]
name = "common-wal"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-base",
"common-error",
@@ -2693,7 +2705,7 @@ dependencies = [
[[package]]
name = "common-workload"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"common-telemetry",
"serde",
@@ -3169,6 +3181,68 @@ dependencies = [
"cipher",
]
[[package]]
name = "cxx"
version = "1.0.190"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7620f6cfc4dcca21f2b085b7a890e16c60fd66f560cd69ee60594908dc72ab1"
dependencies = [
"cc",
"cxx-build",
"cxxbridge-cmd",
"cxxbridge-flags",
"cxxbridge-macro",
"foldhash 0.2.0",
"link-cplusplus",
]
[[package]]
name = "cxx-build"
version = "1.0.190"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a9bc1a22964ff6a355fbec24cf68266a0ed28f8b84c0864c386474ea3d0e479"
dependencies = [
"cc",
"codespan-reporting 0.13.1",
"indexmap 2.11.4",
"proc-macro2",
"quote",
"scratch",
"syn 2.0.106",
]
[[package]]
name = "cxxbridge-cmd"
version = "1.0.190"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b1f29a879d35f7906e3c9b77d7a1005a6a0787d330c09dfe4ffb5f617728cb44"
dependencies = [
"clap 4.5.40",
"codespan-reporting 0.13.1",
"indexmap 2.11.4",
"proc-macro2",
"quote",
"syn 2.0.106",
]
[[package]]
name = "cxxbridge-flags"
version = "1.0.190"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d67109015f93f683e364085aa6489a5b2118b4a40058482101d699936a7836d6"
[[package]]
name = "cxxbridge-macro"
version = "1.0.190"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d187e019e7b05a1f3e69a8396b70800ee867aa9fc2ab972761173ccee03742df"
dependencies = [
"indexmap 2.11.4",
"proc-macro2",
"quote",
"syn 2.0.106",
]
[[package]]
name = "darling"
version = "0.14.4"
@@ -3939,7 +4013,7 @@ dependencies = [
[[package]]
name = "datanode"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow-flight",
@@ -4003,7 +4077,7 @@ dependencies = [
[[package]]
name = "datatypes"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arrow",
"arrow-array",
@@ -4560,8 +4634,9 @@ dependencies = [
[[package]]
name = "etcd-client"
version = "0.15.0"
source = "git+https://github.com/GreptimeTeam/etcd-client?rev=f62df834f0cffda355eba96691fe1a9a332b75a7#f62df834f0cffda355eba96691fe1a9a332b75a7"
version = "0.16.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "88365f1a5671eb2f7fc240adb216786bc6494b38ce15f1d26ad6eaa303d5e822"
dependencies = [
"http 1.3.1",
"prost 0.13.5",
@@ -4677,7 +4752,7 @@ checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
[[package]]
name = "file-engine"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -4809,7 +4884,7 @@ checksum = "8bf7cc16383c4b8d58b9905a8509f02926ce3058053c056376248d958c9df1e8"
[[package]]
name = "flow"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow",
@@ -4878,7 +4953,7 @@ dependencies = [
"sql",
"store-api",
"strum 0.27.1",
"substrait 1.0.0-beta.2",
"substrait 1.0.0-beta.3",
"table",
"tokio",
"tonic 0.13.1",
@@ -4916,6 +4991,12 @@ version = "0.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"
[[package]]
name = "foldhash"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "77ce24cb58228fbb8aa041425bb1050850ac19177686ea6e0f41a70416f56fdb"
[[package]]
name = "form_urlencoded"
version = "1.2.2"
@@ -4933,7 +5014,7 @@ checksum = "28dd6caf6059519a65843af8fe2a3ae298b14b80179855aeb4adc2c1934ee619"
[[package]]
name = "frontend"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arc-swap",
@@ -5380,7 +5461,7 @@ dependencies = [
[[package]]
name = "greptime-proto"
version = "0.1.0"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=0423fa30203187c75e2937a668df1da699c8b96c#0423fa30203187c75e2937a668df1da699c8b96c"
source = "git+https://github.com/GreptimeTeam/greptime-proto.git?rev=173efe5ec62722089db7c531c0b0d470a072b915#173efe5ec62722089db7c531c0b0d470a072b915"
dependencies = [
"prost 0.13.5",
"prost-types 0.13.5",
@@ -5516,7 +5597,7 @@ checksum = "5971ac85611da7067dbfcabef3c70ebb5606018acd9e2a3903a0da507521e0d5"
dependencies = [
"allocator-api2",
"equivalent",
"foldhash",
"foldhash 0.1.5",
]
[[package]]
@@ -6148,7 +6229,7 @@ dependencies = [
[[package]]
name = "index"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"asynchronous-codec",
@@ -6161,6 +6242,7 @@ dependencies = [
"common-telemetry",
"common-test-util",
"criterion 0.4.0",
"datatypes",
"fastbloom",
"fst",
"futures",
@@ -6169,6 +6251,7 @@ dependencies = [
"jieba-rs",
"lazy_static",
"mockall",
"nalgebra",
"pin-project",
"prost 0.13.5",
"puffin",
@@ -6186,6 +6269,7 @@ dependencies = [
"tempfile",
"tokio",
"tokio-util",
"usearch",
"uuid",
]
@@ -7017,6 +7101,15 @@ dependencies = [
"vcpkg",
]
[[package]]
name = "link-cplusplus"
version = "1.0.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f78c730aaa7d0b9336a299029ea49f9ee53b0ed06e9202e8cb7db9bae7b8c82"
dependencies = [
"cc",
]
[[package]]
name = "linked-hash-map"
version = "0.5.6"
@@ -7077,7 +7170,7 @@ checksum = "13dc2df351e3202783a1fe0d44375f7295ffb4049267b0f3018346dc122a1d94"
[[package]]
name = "log-query"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"chrono",
"common-error",
@@ -7089,7 +7182,7 @@ dependencies = [
[[package]]
name = "log-store"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-stream",
"async-trait",
@@ -7390,7 +7483,7 @@ dependencies = [
[[package]]
name = "meta-client"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -7418,7 +7511,7 @@ dependencies = [
[[package]]
name = "meta-srv"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -7518,7 +7611,7 @@ dependencies = [
[[package]]
name = "metric-engine"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"aquamarine",
@@ -7615,7 +7708,7 @@ dependencies = [
[[package]]
name = "mito-codec"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"bytes",
@@ -7640,7 +7733,7 @@ dependencies = [
[[package]]
name = "mito2"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"aquamarine",
@@ -8380,7 +8473,7 @@ dependencies = [
[[package]]
name = "object-store"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"anyhow",
"bytes",
@@ -8665,7 +8758,7 @@ dependencies = [
[[package]]
name = "operator"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -8725,7 +8818,7 @@ dependencies = [
"sql",
"sqlparser",
"store-api",
"substrait 1.0.0-beta.2",
"substrait 1.0.0-beta.3",
"table",
"tokio",
"tokio-util",
@@ -9011,7 +9104,7 @@ dependencies = [
[[package]]
name = "partition"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -9368,7 +9461,7 @@ checksum = "8b870d8c151b6f2fb93e84a13146138f05d02ed11c7e7c54f8826aaaf7c9f184"
[[package]]
name = "pipeline"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -9524,7 +9617,7 @@ dependencies = [
[[package]]
name = "plugins"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"auth",
"catalog",
@@ -9826,7 +9919,7 @@ dependencies = [
[[package]]
name = "promql"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"async-trait",
@@ -10109,7 +10202,7 @@ dependencies = [
[[package]]
name = "puffin"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-compression 0.4.19",
"async-trait",
@@ -10151,7 +10244,7 @@ dependencies = [
[[package]]
name = "query"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -10218,7 +10311,7 @@ dependencies = [
"sql",
"sqlparser",
"store-api",
"substrait 1.0.0-beta.2",
"substrait 1.0.0-beta.3",
"table",
"tokio",
"tokio-stream",
@@ -11290,6 +11383,12 @@ version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
[[package]]
name = "scratch"
version = "1.0.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d68f2ec51b097e4c1a75b681a8bec621909b5e91f15bb7b840c4f2f7b01148b2"
[[package]]
name = "scrypt"
version = "0.11.0"
@@ -11554,7 +11653,7 @@ dependencies = [
[[package]]
name = "servers"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -11682,7 +11781,7 @@ dependencies = [
[[package]]
name = "session"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"ahash 0.8.12",
"api",
@@ -12016,7 +12115,7 @@ dependencies = [
[[package]]
name = "sql"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow-buffer",
@@ -12076,7 +12175,7 @@ dependencies = [
[[package]]
name = "sqlness-runner"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"clap 4.5.40",
@@ -12353,7 +12452,7 @@ dependencies = [
[[package]]
name = "standalone"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"catalog",
@@ -12394,7 +12493,7 @@ checksum = "a2eb9349b6444b326872e140eb1cf5e7c522154d69e7a0ffb0fb81c06b37543f"
[[package]]
name = "store-api"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"aquamarine",
@@ -12607,7 +12706,7 @@ dependencies = [
[[package]]
name = "substrait"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"async-trait",
"bytes",
@@ -12730,7 +12829,7 @@ dependencies = [
[[package]]
name = "table"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"async-trait",
@@ -12999,7 +13098,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
[[package]]
name = "tests-fuzz"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"arbitrary",
"async-trait",
@@ -13043,7 +13142,7 @@ dependencies = [
[[package]]
name = "tests-integration"
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
dependencies = [
"api",
"arrow-flight",
@@ -13118,7 +13217,7 @@ dependencies = [
"sqlx",
"standalone",
"store-api",
"substrait 1.0.0-beta.2",
"substrait 1.0.0-beta.3",
"table",
"tempfile",
"time",
@@ -14143,6 +14242,16 @@ version = "2.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "daf8dba3b7eb870caf1ddeed7bc9d2a049f3cfdfae7cb521b087cc33ae4c49da"
[[package]]
name = "usearch"
version = "2.21.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2cc9fc5f872a3a4f9081d5f42624d788231b763e1846c829b9968a3755ac884d"
dependencies = [
"cxx",
"cxx-build",
]
[[package]]
name = "utf8-ranges"
version = "1.0.5"
@@ -14282,7 +14391,7 @@ dependencies = [
"ciborium",
"cidr",
"clap 4.5.40",
"codespan-reporting",
"codespan-reporting 0.12.0",
"community-id",
"convert_case 0.7.1",
"crc",

View File

@@ -75,7 +75,7 @@ members = [
resolver = "2"
[workspace.package]
version = "1.0.0-beta.2"
version = "1.0.0-beta.3"
edition = "2024"
license = "Apache-2.0"
@@ -143,14 +143,14 @@ derive_builder = "0.20"
derive_more = { version = "2.1", features = ["full"] }
dotenv = "0.15"
either = "1.15"
etcd-client = { git = "https://github.com/GreptimeTeam/etcd-client", rev = "f62df834f0cffda355eba96691fe1a9a332b75a7", features = [
etcd-client = { version = "0.16.1", features = [
"tls",
"tls-roots",
] }
fst = "0.4.7"
futures = "0.3"
futures-util = "0.3"
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "0423fa30203187c75e2937a668df1da699c8b96c" }
greptime-proto = { git = "https://github.com/GreptimeTeam/greptime-proto.git", rev = "173efe5ec62722089db7c531c0b0d470a072b915" }
hex = "0.4"
http = "1"
humantime = "2.1"
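
With etcd-client moved from a git pin to the crates.io 0.16.1 release above, the resolved source can be double-checked from the workspace root; a small sketch using standard cargo and grep invocations (not part of this change):
# The lock file entry should now point at the registry, not the GreptimeTeam fork.
grep -A 2 '^name = "etcd-client"' Cargo.lock
# Show which workspace crates pull the dependency in.
cargo tree -i etcd-client | head -n 20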

View File

@@ -83,6 +83,8 @@
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.connect_timeout` | String | `3s` | The connect timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.timeout` | String | `3s` | The timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.auto_create_topics` | Bool | `true` | Automatically create topics for WAL.<br/>Set to `true` to automatically create topics for WAL.<br/>Otherwise, use topics named `topic_name_prefix_[0..num_topics)` |
| `wal.num_topics` | Integer | `64` | Number of topics.<br/>**It's only used when the provider is `kafka`**. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default)<br/>**It's only used when the provider is `kafka`**. |
@@ -352,6 +354,7 @@
| `region_failure_detector_initialization_delay` | String | `10m` | The delay before starting region failure detection.<br/>This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started.<br/>Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled. |
| `allow_region_failover_on_local_wal` | Bool | `false` | Whether to allow region failover on local WAL.<br/>**This option is not recommended to be set to true, because it may lead to data loss during failover.** |
| `node_max_idle_time` | String | `24hours` | Max allowed idle time before removing node info from metasrv memory. |
| `heartbeat_interval` | String | `3s` | Base heartbeat interval for calculating distributed time constants.<br/>The frontend heartbeat interval is 6 times of the base heartbeat interval.<br/>The flownode/datanode heartbeat interval is 1 times of the base heartbeat interval.<br/>e.g., If the base heartbeat interval is 3s, the frontend heartbeat interval is 18s, the flownode/datanode heartbeat interval is 3s.<br/>If you change this value, you need to change the heartbeat interval of the flownode/frontend/datanode accordingly. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. Enabled by default. |
| `runtime` | -- | -- | The runtime options. |
| `runtime.global_rt_size` | Integer | `8` | The number of threads to execute the runtime for global read operations. |
@@ -361,12 +364,18 @@
| `backend_tls.cert_path` | String | `""` | Path to client certificate file (for client authentication)<br/>Like "/path/to/client.crt" |
| `backend_tls.key_path` | String | `""` | Path to client private key file (for client authentication)<br/>Like "/path/to/client.key" |
| `backend_tls.ca_cert_path` | String | `""` | Path to CA certificate file (for server certificate verification)<br/>Required when using custom CAs or self-signed certificates<br/>Leave empty to use system root certificates only<br/>Like "/path/to/ca.crt" |
| `backend_client` | -- | -- | The backend client options.<br/>Currently, only applicable when using etcd as the metadata store. |
| `backend_client.keep_alive_timeout` | String | `3s` | The keep alive timeout for backend client. |
| `backend_client.keep_alive_interval` | String | `10s` | The keep alive interval for backend client. |
| `backend_client.connect_timeout` | String | `3s` | The connect timeout for backend client. |
| `grpc` | -- | -- | The gRPC server options. |
| `grpc.bind_addr` | String | `127.0.0.1:3002` | The address to bind the gRPC server. |
| `grpc.server_addr` | String | `127.0.0.1:3002` | The communication server address for the frontend and datanode to connect to metasrv.<br/>If left empty or unset, the server will automatically use the IP address of the first network interface<br/>on the host, with the same port number as the one specified in `bind_addr`. |
| `grpc.runtime_size` | Integer | `8` | The number of server worker threads. |
| `grpc.max_recv_message_size` | String | `512MB` | The maximum receive message size for gRPC server. |
| `grpc.max_send_message_size` | String | `512MB` | The maximum send message size for gRPC server. |
| `grpc.http2_keep_alive_interval` | String | `10s` | The server side HTTP/2 keep-alive interval |
| `grpc.http2_keep_alive_timeout` | String | `3s` | The server side HTTP/2 keep-alive timeout. |
| `http` | -- | -- | The HTTP server options. |
| `http.addr` | String | `127.0.0.1:4000` | The address to bind the HTTP server. |
| `http.timeout` | String | `0s` | HTTP request timeout. Set to 0 to disable timeout. |
@@ -476,6 +485,8 @@
| `wal.sync_period` | String | `10s` | Duration for fsyncing log files.<br/>**It's only used when the provider is `raft_engine`**. |
| `wal.recovery_parallelism` | Integer | `2` | Parallelism during WAL recovery. |
| `wal.broker_endpoints` | Array | -- | The Kafka broker endpoints.<br/>**It's only used when the provider is `kafka`**. |
| `wal.connect_timeout` | String | `3s` | The connect timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.timeout` | String | `3s` | The timeout for kafka client.<br/>**It's only used when the provider is `kafka`**. |
| `wal.max_batch_bytes` | String | `1MB` | The max size of a single producer batch.<br/>Warning: Kafka has a default limit of 1MB per message in a topic.<br/>**It's only used when the provider is `kafka`**. |
| `wal.consumer_wait_timeout` | String | `100ms` | The consumer wait timeout.<br/>**It's only used when the provider is `kafka`**. |
| `wal.create_index` | Bool | `true` | Whether to enable WAL index creation.<br/>**It's only used when the provider is `kafka`**. |

View File

@@ -169,6 +169,14 @@ recovery_parallelism = 2
## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"]
## The connect timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ connect_timeout = "3s"
## The timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ timeout = "3s"
## The max size of a single producer batch.
## Warning: Kafka has a default limit of 1MB per message in a topic.
## **It's only used when the provider is `kafka`**.
@@ -225,6 +233,7 @@ overwrite_entry_start_id = false
# endpoint = "https://s3.amazonaws.com"
# region = "us-west-2"
# enable_virtual_host_style = false
# disable_ec2_metadata = false
# Example of using Oss as the storage.
# [storage]

View File

@@ -131,7 +131,6 @@ key_path = ""
## For now, gRPC tls config does not support auto reload.
watch = false
## MySQL server options.
[mysql]
## Whether to enable.

View File

@@ -71,6 +71,13 @@ allow_region_failover_on_local_wal = false
## Max allowed idle time before removing node info from metasrv memory.
node_max_idle_time = "24hours"
## Base heartbeat interval for calculating distributed time constants.
## The frontend heartbeat interval is 6 times of the base heartbeat interval.
## The flownode/datanode heartbeat interval is 1 times of the base heartbeat interval.
## e.g., If the base heartbeat interval is 3s, the frontend heartbeat interval is 18s, the flownode/datanode heartbeat interval is 3s.
## If you change this value, you need to change the heartbeat interval of the flownode/frontend/datanode accordingly.
#+ heartbeat_interval = "3s"
## Whether to enable greptimedb telemetry. Enabled by default.
#+ enable_telemetry = true
@@ -109,6 +116,16 @@ key_path = ""
## Like "/path/to/ca.crt"
ca_cert_path = ""
## The backend client options.
## Currently, only applicable when using etcd as the metadata store.
#+ [backend_client]
## The keep alive timeout for backend client.
#+ keep_alive_timeout = "3s"
## The keep alive interval for backend client.
#+ keep_alive_interval = "10s"
## The connect timeout for backend client.
#+ connect_timeout = "3s"
## The gRPC server options.
[grpc]
## The address to bind the gRPC server.
@@ -123,6 +140,10 @@ runtime_size = 8
max_recv_message_size = "512MB"
## The maximum send message size for gRPC server.
max_send_message_size = "512MB"
## The server side HTTP/2 keep-alive interval
#+ http2_keep_alive_interval = "10s"
## The server side HTTP/2 keep-alive timeout.
#+ http2_keep_alive_timeout = "3s"
## The HTTP server options.
[http]

View File

@@ -230,6 +230,14 @@ recovery_parallelism = 2
## **It's only used when the provider is `kafka`**.
broker_endpoints = ["127.0.0.1:9092"]
## The connect timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ connect_timeout = "3s"
## The timeout for kafka client.
## **It's only used when the provider is `kafka`**.
#+ timeout = "3s"
## Automatically create topics for WAL.
## Set to `true` to automatically create topics for WAL.
## Otherwise, use topics named `topic_name_prefix_[0..num_topics)`
@@ -332,6 +340,7 @@ max_running_procedures = 128
# endpoint = "https://s3.amazonaws.com"
# region = "us-west-2"
# enable_virtual_host_style = false
# disable_ec2_metadata = false
# Example of using Oss as the storage.
# [storage]

View File

@@ -428,7 +428,7 @@ pub trait InformationExtension {
}
/// The request to inspect the datanode.
#[derive(Debug, Clone, PartialEq, Eq)]
#[derive(Debug, Clone, PartialEq)]
pub struct DatanodeInspectRequest {
/// Kind to fetch from datanode.
pub kind: DatanodeInspectKind,

View File

@@ -67,6 +67,7 @@ tracing-appender.workspace = true
[dev-dependencies]
common-meta = { workspace = true, features = ["testing"] }
common-test-util.workspace = true
common-version.workspace = true
serde.workspace = true
tempfile.workspace = true

View File

@@ -15,5 +15,8 @@
mod object_store;
mod store;
pub use object_store::{ObjectStoreConfig, new_fs_object_store};
pub use object_store::{
ObjectStoreConfig, PrefixedAzblobConnection, PrefixedGcsConnection, PrefixedOssConnection,
PrefixedS3Connection, new_fs_object_store,
};
pub use store::StoreConfig;

View File

@@ -12,7 +12,7 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_base::secrets::SecretString;
use common_base::secrets::{ExposeSecret, SecretString};
use common_error::ext::BoxedError;
use object_store::services::{Azblob, Fs, Gcs, Oss, S3};
use object_store::util::{with_instrument_layers, with_retry_layers};
@@ -22,9 +22,69 @@ use snafu::ResultExt;
use crate::error::{self};
/// Trait to convert CLI field types to target struct field types.
/// This enables `Option<SecretString>` (CLI) -> `SecretString` (target) conversions,
/// allowing us to distinguish "not provided" from "provided but empty".
trait IntoField<T> {
fn into_field(self) -> T;
}
/// Identity conversion for types that are the same.
impl<T> IntoField<T> for T {
fn into_field(self) -> T {
self
}
}
/// Convert `Option<SecretString>` to `SecretString`, using default for None.
impl IntoField<SecretString> for Option<SecretString> {
fn into_field(self) -> SecretString {
self.unwrap_or_default()
}
}
/// Trait for checking if a field is effectively empty.
///
/// **`is_empty()`**: Checks if the field has no meaningful value
/// - Used when backend is enabled to validate required fields
/// - `None`, `Some("")`, `false`, or `""` are considered empty
trait FieldValidator {
/// Check if the field is empty (has no meaningful value).
fn is_empty(&self) -> bool;
}
/// String fields: empty if the string is empty
impl FieldValidator for String {
fn is_empty(&self) -> bool {
self.is_empty()
}
}
/// Bool fields: false is considered "empty", true is "provided"
impl FieldValidator for bool {
fn is_empty(&self) -> bool {
!self
}
}
/// Option<String> fields: None or empty content is empty
impl FieldValidator for Option<String> {
fn is_empty(&self) -> bool {
self.as_ref().is_none_or(|s| s.is_empty())
}
}
/// Option<SecretString> fields: None or empty secret is empty
/// For secrets, Some("") is treated as "not provided" for both checks
impl FieldValidator for Option<SecretString> {
fn is_empty(&self) -> bool {
self.as_ref().is_none_or(|s| s.expose_secret().is_empty())
}
}
macro_rules! wrap_with_clap_prefix {
(
$new_name:ident, $prefix:literal, $base:ty, {
$new_name:ident, $prefix:literal, $enable_flag:literal, $base:ty, {
$( $( #[doc = $doc:expr] )? $( #[alias = $alias:literal] )? $field:ident : $type:ty $( = $default:expr )? ),* $(,)?
}
) => {
@@ -34,15 +94,16 @@ macro_rules! wrap_with_clap_prefix {
$(
$( #[doc = $doc] )?
$( #[clap(alias = $alias)] )?
#[clap(long $(, default_value_t = $default )? )]
[<$prefix $field>]: $type,
#[clap(long, requires = $enable_flag $(, default_value_t = $default )? )]
pub [<$prefix $field>]: $type,
)*
}
impl From<$new_name> for $base {
fn from(w: $new_name) -> Self {
Self {
$( $field: w.[<$prefix $field>] ),*
// Use into_field() to handle Option<SecretString> -> SecretString conversion
$( $field: w.[<$prefix $field>].into_field() ),*
}
}
}
@@ -50,9 +111,90 @@ macro_rules! wrap_with_clap_prefix {
};
}
/// Macro for declarative backend validation.
///
/// # Validation Rules
///
/// For each storage backend (S3, OSS, GCS, Azblob), this function validates:
/// **When backend is enabled** (e.g., `--s3`): All required fields must be non-empty
///
/// Note: When backend is disabled, clap's `requires` attribute ensures no configuration
/// fields can be provided at parse time.
///
/// # Syntax
///
/// ```ignore
/// validate_backend!(
/// enable: self.enable_s3,
/// name: "S3",
/// required: [(field1, "name1"), (field2, "name2"), ...],
/// custom_validator: |missing| { ... } // optional
/// )
/// ```
///
/// # Arguments
///
/// - `enable`: Boolean expression indicating if backend is enabled
/// - `name`: Human-readable backend name for error messages
/// - `required`: Array of (field_ref, field_name) tuples for required fields
/// - `custom_validator`: Optional closure for complex validation logic
///
/// # Example
///
/// ```ignore
/// validate_backend!(
/// enable: self.enable_s3,
/// name: "S3",
/// required: [
/// (&self.s3.s3_bucket, "bucket"),
/// (&self.s3.s3_access_key_id, "access key ID"),
/// ]
/// )
/// ```
macro_rules! validate_backend {
(
enable: $enable:expr,
name: $backend_name:expr,
required: [ $( ($field:expr, $field_name:expr) ),* $(,)? ]
$(, custom_validator: $custom_validator:expr)?
) => {{
if $enable {
// Check required fields when backend is enabled
let mut missing = Vec::new();
$(
if FieldValidator::is_empty($field) {
missing.push($field_name);
}
)*
// Run custom validation if provided
$(
$custom_validator(&mut missing);
)?
if !missing.is_empty() {
return Err(BoxedError::new(
error::MissingConfigSnafu {
msg: format!(
"{} {} must be set when --{} is enabled.",
$backend_name,
missing.join(", "),
$backend_name.to_lowercase()
),
}
.build(),
));
}
}
Ok(())
}};
}
wrap_with_clap_prefix! {
PrefixedAzblobConnection,
"azblob-",
"enable_azblob",
AzblobConnection,
{
#[doc = "The container of the object store."]
@@ -60,9 +202,9 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."]
root: String = Default::default(),
#[doc = "The account name of the object store."]
account_name: SecretString = Default::default(),
account_name: Option<SecretString>,
#[doc = "The account key of the object store."]
account_key: SecretString = Default::default(),
account_key: Option<SecretString>,
#[doc = "The endpoint of the object store."]
endpoint: String = Default::default(),
#[doc = "The SAS token of the object store."]
@@ -70,9 +212,33 @@ wrap_with_clap_prefix! {
}
}
impl PrefixedAzblobConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "AzBlob",
required: [
(&self.azblob_container, "container"),
(&self.azblob_root, "root"),
(&self.azblob_account_name, "account name"),
(&self.azblob_endpoint, "endpoint"),
],
custom_validator: |missing: &mut Vec<&str>| {
// account_key is only required if sas_token is not provided
if self.azblob_sas_token.is_none()
&& self.azblob_account_key.is_empty()
{
missing.push("account key (when sas_token is not provided)");
}
}
)
}
}
wrap_with_clap_prefix! {
PrefixedS3Connection,
"s3-",
"enable_s3",
S3Connection,
{
#[doc = "The bucket of the object store."]
@@ -80,21 +246,39 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."]
root: String = Default::default(),
#[doc = "The access key ID of the object store."]
access_key_id: SecretString = Default::default(),
access_key_id: Option<SecretString>,
#[doc = "The secret access key of the object store."]
secret_access_key: SecretString = Default::default(),
secret_access_key: Option<SecretString>,
#[doc = "The endpoint of the object store."]
endpoint: Option<String>,
#[doc = "The region of the object store."]
region: Option<String>,
#[doc = "Enable virtual host style for the object store."]
enable_virtual_host_style: bool = Default::default(),
#[doc = "Disable EC2 metadata service for the object store."]
disable_ec2_metadata: bool = Default::default(),
}
}
impl PrefixedS3Connection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "S3",
required: [
(&self.s3_bucket, "bucket"),
(&self.s3_access_key_id, "access key ID"),
(&self.s3_secret_access_key, "secret access key"),
(&self.s3_region, "region"),
]
)
}
}
wrap_with_clap_prefix! {
PrefixedOssConnection,
"oss-",
"enable_oss",
OssConnection,
{
#[doc = "The bucket of the object store."]
@@ -102,17 +286,33 @@ wrap_with_clap_prefix! {
#[doc = "The root of the object store."]
root: String = Default::default(),
#[doc = "The access key ID of the object store."]
access_key_id: SecretString = Default::default(),
access_key_id: Option<SecretString>,
#[doc = "The access key secret of the object store."]
access_key_secret: SecretString = Default::default(),
access_key_secret: Option<SecretString>,
#[doc = "The endpoint of the object store."]
endpoint: String = Default::default(),
}
}
impl PrefixedOssConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "OSS",
required: [
(&self.oss_bucket, "bucket"),
(&self.oss_access_key_id, "access key ID"),
(&self.oss_access_key_secret, "access key secret"),
(&self.oss_endpoint, "endpoint"),
]
)
}
}
wrap_with_clap_prefix! {
PrefixedGcsConnection,
"gcs-",
"enable_gcs",
GcsConnection,
{
#[doc = "The root of the object store."]
@@ -122,40 +322,72 @@ wrap_with_clap_prefix! {
#[doc = "The scope of the object store."]
scope: String = Default::default(),
#[doc = "The credential path of the object store."]
credential_path: SecretString = Default::default(),
credential_path: Option<SecretString>,
#[doc = "The credential of the object store."]
credential: SecretString = Default::default(),
credential: Option<SecretString>,
#[doc = "The endpoint of the object store."]
endpoint: String = Default::default(),
}
}
/// common config for object store.
impl PrefixedGcsConnection {
pub fn validate(&self) -> Result<(), BoxedError> {
validate_backend!(
enable: true,
name: "GCS",
required: [
(&self.gcs_bucket, "bucket"),
(&self.gcs_root, "root"),
(&self.gcs_scope, "scope"),
]
// No custom_validator needed: GCS supports Application Default Credentials (ADC)
// where neither credential_path nor credential is required.
// Endpoint is also optional (defaults to https://storage.googleapis.com).
)
}
}
/// Common config for object store.
///
/// # Dependency Enforcement
///
/// Each backend's configuration fields (e.g., `--s3-bucket`) require the corresponding
/// enable flag (e.g., `--s3`) to be present. This is enforced by `clap` at parse time
/// using the `requires` attribute.
///
/// For example, attempting to use `--s3-bucket my-bucket` without `--s3` will result in:
/// ```text
/// error: The argument '--s3-bucket <BUCKET>' requires '--s3'
/// ```
///
/// This ensures that users cannot accidentally provide backend-specific configuration
/// without explicitly enabling that backend.
#[derive(clap::Parser, Debug, Clone, PartialEq, Default)]
#[clap(group(clap::ArgGroup::new("storage_backend").required(false).multiple(false)))]
pub struct ObjectStoreConfig {
/// Whether to use S3 object store.
#[clap(long, alias = "s3")]
#[clap(long = "s3", group = "storage_backend")]
pub enable_s3: bool,
#[clap(flatten)]
pub s3: PrefixedS3Connection,
/// Whether to use OSS.
#[clap(long, alias = "oss")]
#[clap(long = "oss", group = "storage_backend")]
pub enable_oss: bool,
#[clap(flatten)]
pub oss: PrefixedOssConnection,
/// Whether to use GCS.
#[clap(long, alias = "gcs")]
#[clap(long = "gcs", group = "storage_backend")]
pub enable_gcs: bool,
#[clap(flatten)]
pub gcs: PrefixedGcsConnection,
/// Whether to use Azure Blob.
#[clap(long, alias = "azblob")]
#[clap(long = "azblob", group = "storage_backend")]
pub enable_azblob: bool,
#[clap(flatten)]
@@ -173,52 +405,66 @@ pub fn new_fs_object_store(root: &str) -> std::result::Result<ObjectStore, Boxed
Ok(with_instrument_layers(object_store, false))
}
macro_rules! gen_object_store_builder {
($method:ident, $field:ident, $conn_type:ty, $service_type:ty) => {
pub fn $method(&self) -> Result<ObjectStore, BoxedError> {
let config = <$conn_type>::from(self.$field.clone());
common_telemetry::info!(
"Building object store with {}: {:?}",
stringify!($field),
config
);
let object_store = ObjectStore::new(<$service_type>::from(&config))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish();
Ok(with_instrument_layers(
with_retry_layers(object_store),
false,
))
}
};
}
impl ObjectStoreConfig {
gen_object_store_builder!(build_s3, s3, S3Connection, S3);
gen_object_store_builder!(build_oss, oss, OssConnection, Oss);
gen_object_store_builder!(build_gcs, gcs, GcsConnection, Gcs);
gen_object_store_builder!(build_azblob, azblob, AzblobConnection, Azblob);
pub fn validate(&self) -> Result<(), BoxedError> {
if self.enable_s3 {
self.s3.validate()?;
}
if self.enable_oss {
self.oss.validate()?;
}
if self.enable_gcs {
self.gcs.validate()?;
}
if self.enable_azblob {
self.azblob.validate()?;
}
Ok(())
}
/// Builds the object store from the config.
pub fn build(&self) -> Result<Option<ObjectStore>, BoxedError> {
let object_store = if self.enable_s3 {
let s3 = S3Connection::from(self.s3.clone());
common_telemetry::info!("Building object store with s3: {:?}", s3);
Some(
ObjectStore::new(S3::from(&s3))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
self.validate()?;
if self.enable_s3 {
self.build_s3().map(Some)
} else if self.enable_oss {
let oss = OssConnection::from(self.oss.clone());
common_telemetry::info!("Building object store with oss: {:?}", oss);
Some(
ObjectStore::new(Oss::from(&oss))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
self.build_oss().map(Some)
} else if self.enable_gcs {
let gcs = GcsConnection::from(self.gcs.clone());
common_telemetry::info!("Building object store with gcs: {:?}", gcs);
Some(
ObjectStore::new(Gcs::from(&gcs))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
self.build_gcs().map(Some)
} else if self.enable_azblob {
let azblob = AzblobConnection::from(self.azblob.clone());
common_telemetry::info!("Building object store with azblob: {:?}", azblob);
Some(
ObjectStore::new(Azblob::from(&azblob))
.context(error::InitBackendSnafu)
.map_err(BoxedError::new)?
.finish(),
)
self.build_azblob().map(Some)
} else {
None
};
let object_store = object_store
.map(|object_store| with_instrument_layers(with_retry_layers(object_store), false));
Ok(object_store)
Ok(None)
}
}
}


@@ -19,7 +19,7 @@ use common_error::ext::BoxedError;
use common_meta::kv_backend::KvBackendRef;
use common_meta::kv_backend::chroot::ChrootKvBackend;
use common_meta::kv_backend::etcd::EtcdStore;
use meta_srv::metasrv::BackendImpl;
use meta_srv::metasrv::{BackendClientOptions, BackendImpl};
use meta_srv::utils::etcd::create_etcd_client_with_tls;
use servers::tls::{TlsMode, TlsOption};
@@ -112,9 +112,13 @@ impl StoreConfig {
let kvbackend = match self.backend {
BackendImpl::EtcdStore => {
let tls_config = self.tls_config();
let etcd_client = create_etcd_client_with_tls(store_addrs, tls_config.as_ref())
.await
.map_err(BoxedError::new)?;
let etcd_client = create_etcd_client_with_tls(
store_addrs,
&BackendClientOptions::default(),
tls_config.as_ref(),
)
.await
.map_err(BoxedError::new)?;
Ok(EtcdStore::with_etcd_client(etcd_client, max_txn_ops))
}
#[cfg(feature = "pg_kvbackend")]


@@ -14,6 +14,7 @@
mod export;
mod import;
mod storage_export;
use clap::Subcommand;
use client::DEFAULT_CATALOG_NAME;

File diff suppressed because it is too large


@@ -0,0 +1,373 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::path::PathBuf;
use common_base::secrets::{ExposeSecret, SecretString};
use common_error::ext::BoxedError;
use crate::common::{
PrefixedAzblobConnection, PrefixedGcsConnection, PrefixedOssConnection, PrefixedS3Connection,
};
/// Helper function to extract secret string from Option<SecretString>.
/// Returns an empty string if `None`.
fn expose_optional_secret(secret: &Option<SecretString>) -> &str {
secret
.as_ref()
.map(|s| s.expose_secret().as_str())
.unwrap_or("")
}
/// Helper function to format root path with leading slash if non-empty.
fn format_root_path(root: &str) -> String {
if root.is_empty() {
String::new()
} else {
format!("/{}", root)
}
}
/// Helper function to mask multiple secrets in a string.
fn mask_secrets(mut sql: String, secrets: &[&str]) -> String {
for secret in secrets {
if !secret.is_empty() {
sql = sql.replace(secret, "[REDACTED]");
}
}
sql
}
/// Helper function to format storage URI.
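/// For example, `format_uri("s3", "my-bucket", "data", "greptime/public/")` yields
/// `s3://my-bucket/data/greptime/public/`, while an empty root yields
/// `s3://my-bucket/greptime/public/`.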
fn format_uri(scheme: &str, bucket: &str, root: &str, path: &str) -> String {
let root = format_root_path(root);
format!("{}://{}{}/{}", scheme, bucket, root, path)
}
/// Trait for storage backends that can be used for data export.
pub trait StorageExport: Send + Sync {
/// Generate the storage path for the `COPY DATABASE` command.
/// Returns `(path, connection_string)`, where `connection_string` includes the `CONNECTION` clause.
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String);
/// Format the output path for logging purposes.
fn format_output_path(&self, file_path: &str) -> String;
/// Mask sensitive information in SQL commands for safe logging.
fn mask_sensitive_info(&self, sql: &str) -> String;
}
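// A minimal usage sketch (the helper below is hypothetical): combine the
// per-schema path with the backend's CONNECTION clause and return only the
// masked form so credentials never reach the logs.
fn masked_export_target(backend: &dyn StorageExport, catalog: &str, schema: &str) -> String {
    let (path, connection) = backend.get_storage_path(catalog, schema);
    // `connection` is empty for the local filesystem backend and a
    // " CONNECTION (...)" clause for remote backends.
    backend.mask_sensitive_info(&format!("{}{}", path, connection))
}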
macro_rules! define_backend {
($name:ident, $config:ty) => {
#[derive(Clone)]
pub struct $name {
config: $config,
}
impl $name {
pub fn new(config: $config) -> Result<Self, BoxedError> {
config.validate()?;
Ok(Self { config })
}
}
};
}
/// Local file system storage backend.
#[derive(Clone)]
pub struct FsBackend {
output_dir: String,
}
impl FsBackend {
pub fn new(output_dir: String) -> Self {
Self { output_dir }
}
}
impl StorageExport for FsBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
if self.output_dir.is_empty() {
unreachable!("output_dir must be set when not using remote storage")
}
let path = PathBuf::from(&self.output_dir)
.join(catalog)
.join(format!("{schema}/"))
.to_string_lossy()
.to_string();
(path, String::new())
}
fn format_output_path(&self, file_path: &str) -> String {
format!("{}/{}", self.output_dir, file_path)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
sql.to_string()
}
}
define_backend!(S3Backend, PrefixedS3Connection);
impl StorageExport for S3Backend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let s3_path = format_uri(
"s3",
&self.config.s3_bucket,
&self.config.s3_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = vec![
format!(
"ACCESS_KEY_ID='{}'",
expose_optional_secret(&self.config.s3_access_key_id)
),
format!(
"SECRET_ACCESS_KEY='{}'",
expose_optional_secret(&self.config.s3_secret_access_key)
),
];
if let Some(region) = &self.config.s3_region {
connection_options.push(format!("REGION='{}'", region));
}
if let Some(endpoint) = &self.config.s3_endpoint {
connection_options.push(format!("ENDPOINT='{}'", endpoint));
}
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(s3_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"s3",
&self.config.s3_bucket,
&self.config.s3_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.s3_access_key_id),
expose_optional_secret(&self.config.s3_secret_access_key),
],
)
}
}
define_backend!(OssBackend, PrefixedOssConnection);
impl StorageExport for OssBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let oss_path = format_uri(
"oss",
&self.config.oss_bucket,
&self.config.oss_root,
&format!("{}/{}/", catalog, schema),
);
let connection_options = [
format!(
"ACCESS_KEY_ID='{}'",
expose_optional_secret(&self.config.oss_access_key_id)
),
format!(
"ACCESS_KEY_SECRET='{}'",
expose_optional_secret(&self.config.oss_access_key_secret)
),
];
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(oss_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"oss",
&self.config.oss_bucket,
&self.config.oss_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.oss_access_key_id),
expose_optional_secret(&self.config.oss_access_key_secret),
],
)
}
}
define_backend!(GcsBackend, PrefixedGcsConnection);
impl StorageExport for GcsBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let gcs_path = format_uri(
"gcs",
&self.config.gcs_bucket,
&self.config.gcs_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = Vec::new();
let credential_path = expose_optional_secret(&self.config.gcs_credential_path);
if !credential_path.is_empty() {
connection_options.push(format!("CREDENTIAL_PATH='{}'", credential_path));
}
let credential = expose_optional_secret(&self.config.gcs_credential);
if !credential.is_empty() {
connection_options.push(format!("CREDENTIAL='{}'", credential));
}
if !self.config.gcs_endpoint.is_empty() {
connection_options.push(format!("ENDPOINT='{}'", self.config.gcs_endpoint));
}
let connection_str = if connection_options.is_empty() {
String::new()
} else {
format!(" CONNECTION ({})", connection_options.join(", "))
};
(gcs_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"gcs",
&self.config.gcs_bucket,
&self.config.gcs_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.gcs_credential_path),
expose_optional_secret(&self.config.gcs_credential),
],
)
}
}
define_backend!(AzblobBackend, PrefixedAzblobConnection);
impl StorageExport for AzblobBackend {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
let azblob_path = format_uri(
"azblob",
&self.config.azblob_container,
&self.config.azblob_root,
&format!("{}/{}/", catalog, schema),
);
let mut connection_options = vec![
format!(
"ACCOUNT_NAME='{}'",
expose_optional_secret(&self.config.azblob_account_name)
),
format!(
"ACCOUNT_KEY='{}'",
expose_optional_secret(&self.config.azblob_account_key)
),
];
if let Some(sas_token) = &self.config.azblob_sas_token {
connection_options.push(format!("SAS_TOKEN='{}'", sas_token));
}
let connection_str = format!(" CONNECTION ({})", connection_options.join(", "));
(azblob_path, connection_str)
}
fn format_output_path(&self, file_path: &str) -> String {
format_uri(
"azblob",
&self.config.azblob_container,
&self.config.azblob_root,
file_path,
)
}
fn mask_sensitive_info(&self, sql: &str) -> String {
mask_secrets(
sql.to_string(),
&[
expose_optional_secret(&self.config.azblob_account_name),
expose_optional_secret(&self.config.azblob_account_key),
],
)
}
}
#[derive(Clone)]
pub enum StorageType {
Fs(FsBackend),
S3(S3Backend),
Oss(OssBackend),
Gcs(GcsBackend),
Azblob(AzblobBackend),
}
impl StorageExport for StorageType {
fn get_storage_path(&self, catalog: &str, schema: &str) -> (String, String) {
match self {
StorageType::Fs(backend) => backend.get_storage_path(catalog, schema),
StorageType::S3(backend) => backend.get_storage_path(catalog, schema),
StorageType::Oss(backend) => backend.get_storage_path(catalog, schema),
StorageType::Gcs(backend) => backend.get_storage_path(catalog, schema),
StorageType::Azblob(backend) => backend.get_storage_path(catalog, schema),
}
}
fn format_output_path(&self, file_path: &str) -> String {
match self {
StorageType::Fs(backend) => backend.format_output_path(file_path),
StorageType::S3(backend) => backend.format_output_path(file_path),
StorageType::Oss(backend) => backend.format_output_path(file_path),
StorageType::Gcs(backend) => backend.format_output_path(file_path),
StorageType::Azblob(backend) => backend.format_output_path(file_path),
}
}
fn mask_sensitive_info(&self, sql: &str) -> String {
match self {
StorageType::Fs(backend) => backend.mask_sensitive_info(sql),
StorageType::S3(backend) => backend.mask_sensitive_info(sql),
StorageType::Oss(backend) => backend.mask_sensitive_info(sql),
StorageType::Gcs(backend) => backend.mask_sensitive_info(sql),
StorageType::Azblob(backend) => backend.mask_sensitive_info(sql),
}
}
}
impl StorageType {
/// Returns true if the storage backend is remote (not local filesystem).
pub fn is_remote_storage(&self) -> bool {
!matches!(self, StorageType::Fs(_))
}
}
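// A minimal dispatch sketch with illustrative values: the filesystem variant is
// the only non-remote backend and carries no CONNECTION clause (on Unix the path
// comes out as "/tmp/export/greptime/public/").
fn fs_export_example() {
    let storage = StorageType::Fs(FsBackend::new("/tmp/export".to_string()));
    let (_path, connection) = storage.get_storage_path("greptime", "public");
    assert!(!storage.is_remote_storage());
    assert!(connection.is_empty());
}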


@@ -253,12 +253,6 @@ pub enum Error {
error: ObjectStoreError,
},
#[snafu(display("S3 config need be set"))]
S3ConfigNotSet {
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Output directory not set"))]
OutputDirNotSet {
#[snafu(implicit)]
@@ -364,9 +358,9 @@ impl ErrorExt for Error {
Error::Other { source, .. } => source.status_code(),
Error::OpenDal { .. } | Error::InitBackend { .. } => StatusCode::Internal,
Error::S3ConfigNotSet { .. }
| Error::OutputDirNotSet { .. }
| Error::EmptyStoreAddrs { .. } => StatusCode::InvalidArguments,
Error::OutputDirNotSet { .. } | Error::EmptyStoreAddrs { .. } => {
StatusCode::InvalidArguments
}
Error::BuildRuntime { source, .. } => source.status_code(),


@@ -20,6 +20,7 @@ use async_trait::async_trait;
use clap::Parser;
use common_base::Plugins;
use common_config::Configurable;
use common_meta::distributed_time_constants::init_distributed_time_constants;
use common_telemetry::info;
use common_telemetry::logging::{DEFAULT_LOGGING_DIR, TracingOptions};
use common_version::{short_version, verbose_version};
@@ -327,6 +328,7 @@ impl StartCommand {
log_versions(verbose_version(), short_version(), APP_NAME);
maybe_activate_heap_profile(&opts.component.memory);
create_resource_limit_metrics(APP_NAME);
init_distributed_time_constants(opts.component.heartbeat_interval);
info!("Metasrv start command: {:#?}", self);


@@ -552,9 +552,8 @@ impl StartCommand {
let grpc_handler = fe_instance.clone() as Arc<dyn GrpcQueryHandlerWithBoxedError>;
let weak_grpc_handler = Arc::downgrade(&grpc_handler);
frontend_instance_handler
.lock()
.unwrap()
.replace(weak_grpc_handler);
.set_handler(weak_grpc_handler)
.await;
// set the frontend invoker for flownode
let flow_streaming_engine = flownode.flow_engine().streaming_engine();


@@ -59,15 +59,6 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to canonicalize path: {}", path))]
CanonicalizePath {
path: String,
#[snafu(source)]
error: std::io::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Invalid path '{}': expected a file, not a directory", path))]
InvalidPath {
path: String,
@@ -82,8 +73,7 @@ impl ErrorExt for Error {
Error::TomlFormat { .. }
| Error::LoadLayeredConfig { .. }
| Error::FileWatch { .. }
| Error::InvalidPath { .. }
| Error::CanonicalizePath { .. } => StatusCode::InvalidArguments,
| Error::InvalidPath { .. } => StatusCode::InvalidArguments,
Error::SerdeJson { .. } => StatusCode::Unexpected,
}
}


@@ -30,7 +30,7 @@ use common_telemetry::{error, info, warn};
use notify::{EventKind, RecursiveMode, Watcher};
use snafu::ResultExt;
use crate::error::{CanonicalizePathSnafu, FileWatchSnafu, InvalidPathSnafu, Result};
use crate::error::{FileWatchSnafu, InvalidPathSnafu, Result};
/// Configuration for the file watcher behavior.
#[derive(Debug, Clone, Default)]
@@ -41,15 +41,10 @@ pub struct FileWatcherConfig {
impl FileWatcherConfig {
pub fn new() -> Self {
Self::default()
Default::default()
}
pub fn with_modify_and_create(mut self) -> Self {
self.include_remove_events = false;
self
}
pub fn with_remove_events(mut self) -> Self {
pub fn include_remove_events(mut self) -> Self {
self.include_remove_events = true;
self
}
@@ -93,11 +88,8 @@ impl FileWatcherBuilder {
path: path.display().to_string(),
}
);
// Canonicalize the path for reliable comparison with event paths
let canonical = path.canonicalize().context(CanonicalizePathSnafu {
path: path.display().to_string(),
})?;
self.file_paths.push(canonical);
self.file_paths.push(path.to_path_buf());
Ok(self)
}
@@ -144,7 +136,6 @@ impl FileWatcherBuilder {
}
let config = self.config;
let watched_files: HashSet<PathBuf> = self.file_paths.iter().cloned().collect();
info!(
"Spawning file watcher for paths: {:?} (watching parent directories)",
@@ -165,25 +156,7 @@ impl FileWatcherBuilder {
continue;
}
// Check if any of the event paths match our watched files
let is_watched_file = event.paths.iter().any(|event_path| {
// Try to canonicalize the event path for comparison
// If the file was deleted, canonicalize will fail, so we also
// compare the raw path
if let Ok(canonical) = event_path.canonicalize()
&& watched_files.contains(&canonical)
{
return true;
}
// For deleted files, compare using the raw path
watched_files.contains(event_path)
});
if !is_watched_file {
continue;
}
info!(?event.kind, ?event.paths, "Detected file change");
info!(?event.kind, ?event.paths, "Detected folder change");
callback();
}
Err(err) => {
@@ -301,55 +274,4 @@ mod tests {
"Watcher should have detected file recreation"
);
}
#[test]
fn test_file_watcher_ignores_other_files() {
common_telemetry::init_default_ut_logging();
let dir = create_temp_dir("test_file_watcher_other");
let watched_file = dir.path().join("watched.txt");
let other_file = dir.path().join("other.txt");
// Create both files
std::fs::write(&watched_file, "watched content").unwrap();
std::fs::write(&other_file, "other content").unwrap();
let counter = Arc::new(AtomicUsize::new(0));
let counter_clone = counter.clone();
FileWatcherBuilder::new()
.watch_path(&watched_file)
.unwrap()
.config(FileWatcherConfig::new())
.spawn(move || {
counter_clone.fetch_add(1, Ordering::SeqCst);
})
.unwrap();
// Give watcher time to start
std::thread::sleep(Duration::from_millis(100));
// Modify the other file - should NOT trigger callback
std::fs::write(&other_file, "modified other content").unwrap();
// Wait for potential event
std::thread::sleep(Duration::from_millis(500));
assert_eq!(
counter.load(Ordering::SeqCst),
0,
"Watcher should not have detected changes to other files"
);
// Now modify the watched file - SHOULD trigger callback
std::fs::write(&watched_file, "modified watched content").unwrap();
// Wait for the event to be processed
std::thread::sleep(Duration::from_millis(500));
assert!(
counter.load(Ordering::SeqCst) >= 1,
"Watcher should have detected change to watched file"
);
}
}


@@ -27,6 +27,7 @@ const SECRET_ACCESS_KEY: &str = "secret_access_key";
const SESSION_TOKEN: &str = "session_token";
const REGION: &str = "region";
const ENABLE_VIRTUAL_HOST_STYLE: &str = "enable_virtual_host_style";
const DISABLE_EC2_METADATA: &str = "disable_ec2_metadata";
pub fn is_supported_in_s3(key: &str) -> bool {
[
@@ -36,6 +37,7 @@ pub fn is_supported_in_s3(key: &str) -> bool {
SESSION_TOKEN,
REGION,
ENABLE_VIRTUAL_HOST_STYLE,
DISABLE_EC2_METADATA,
]
.contains(&key)
}
@@ -82,6 +84,21 @@ pub fn build_s3_backend(
}
}
if let Some(disable_str) = connection.get(DISABLE_EC2_METADATA) {
let disable = disable_str.as_str().parse::<bool>().map_err(|e| {
error::InvalidConnectionSnafu {
msg: format!(
"failed to parse the option {}={}, {}",
DISABLE_EC2_METADATA, disable_str, e
),
}
.build()
})?;
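// `bool` parsing accepts exactly "true" or "false"; any other value (e.g. "1")
// is rejected with the InvalidConnection error above rather than being ignored.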
if disable {
builder = builder.disable_ec2_metadata();
}
}
// TODO(weny): Consider finding a better way to eliminate duplicate code.
Ok(ObjectStore::new(builder)
.context(error::BuildBackendSnafu)?
@@ -109,6 +126,7 @@ mod tests {
assert!(is_supported_in_s3(SESSION_TOKEN));
assert!(is_supported_in_s3(REGION));
assert!(is_supported_in_s3(ENABLE_VIRTUAL_HOST_STYLE));
assert!(is_supported_in_s3(DISABLE_EC2_METADATA));
assert!(!is_supported_in_s3("foo"))
}
}


@@ -19,7 +19,7 @@ arc-swap = "1.0"
arrow.workspace = true
arrow-schema.workspace = true
async-trait.workspace = true
bincode = "1.3"
bincode = "=1.3.3"
catalog.workspace = true
chrono.workspace = true
common-base.workspace = true


@@ -13,6 +13,7 @@
// limitations under the License.
use std::any::Any;
use std::time::Duration;
use common_error::ext::ErrorExt;
use common_error::status_code::StatusCode;
@@ -35,6 +36,14 @@ pub enum Error {
#[snafu(display("Memory semaphore unexpectedly closed"))]
MemorySemaphoreClosed,
#[snafu(display(
"Timeout waiting for memory quota: requested {requested_bytes} bytes, waited {waited:?}"
))]
MemoryAcquireTimeout {
requested_bytes: u64,
waited: Duration,
},
}
impl ErrorExt for Error {
@@ -44,6 +53,7 @@ impl ErrorExt for Error {
match self {
MemoryLimitExceeded { .. } => StatusCode::RuntimeResourcesExhausted,
MemorySemaphoreClosed => StatusCode::Unexpected,
MemoryAcquireTimeout { .. } => StatusCode::RuntimeResourcesExhausted,
}
}


@@ -0,0 +1,168 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
use std::fmt;
/// Memory permit granularity for different use cases.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PermitGranularity {
/// 1 KB per permit
///
/// Use for:
/// - HTTP/gRPC request limiting (small, high-concurrency operations)
/// - Small batch operations
/// - Scenarios requiring fine-grained fairness
Kilobyte,
/// 1 MB per permit (default)
///
/// Use for:
/// - Query execution memory management
/// - Compaction memory control
/// - Large, long-running operations
#[default]
Megabyte,
}
impl PermitGranularity {
/// Returns the number of bytes per permit.
#[inline]
pub const fn bytes(self) -> u64 {
match self {
Self::Kilobyte => 1024,
Self::Megabyte => 1024 * 1024,
}
}
/// Returns a human-readable string representation.
pub const fn as_str(self) -> &'static str {
match self {
Self::Kilobyte => "1KB",
Self::Megabyte => "1MB",
}
}
/// Converts bytes to permits based on this granularity.
///
/// Rounds up to ensure the requested bytes are fully covered.
/// The result is clamped to `Semaphore::MAX_PERMITS`.
#[inline]
pub fn bytes_to_permits(self, bytes: u64) -> u32 {
use tokio::sync::Semaphore;
let granularity_bytes = self.bytes();
bytes
.saturating_add(granularity_bytes - 1)
.saturating_div(granularity_bytes)
.min(Semaphore::MAX_PERMITS as u64)
.min(u32::MAX as u64) as u32
}
/// Converts permits to bytes based on this granularity.
#[inline]
pub fn permits_to_bytes(self, permits: u32) -> u64 {
(permits as u64).saturating_mul(self.bytes())
}
}
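// In effect `bytes_to_permits` computes ceil(bytes / granularity): with the
// Megabyte granularity, 1 byte and 1 MiB both map to 1 permit, 1 MiB + 1 byte
// maps to 2 permits, and `permits_to_bytes(2)` hands back 2 MiB, so a
// reservation always covers the requested amount.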
impl fmt::Display for PermitGranularity {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}", self.as_str())
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bytes_to_permits_kilobyte() {
let granularity = PermitGranularity::Kilobyte;
// Exact multiples
assert_eq!(granularity.bytes_to_permits(1024), 1);
assert_eq!(granularity.bytes_to_permits(2048), 2);
assert_eq!(granularity.bytes_to_permits(10 * 1024), 10);
// Rounds up
assert_eq!(granularity.bytes_to_permits(1), 1);
assert_eq!(granularity.bytes_to_permits(1025), 2);
assert_eq!(granularity.bytes_to_permits(2047), 2);
}
#[test]
fn test_bytes_to_permits_megabyte() {
let granularity = PermitGranularity::Megabyte;
// Exact multiples
assert_eq!(granularity.bytes_to_permits(1024 * 1024), 1);
assert_eq!(granularity.bytes_to_permits(2 * 1024 * 1024), 2);
// Rounds up
assert_eq!(granularity.bytes_to_permits(1), 1);
assert_eq!(granularity.bytes_to_permits(1024), 1);
assert_eq!(granularity.bytes_to_permits(1024 * 1024 + 1), 2);
}
#[test]
fn test_bytes_to_permits_zero_bytes() {
assert_eq!(PermitGranularity::Kilobyte.bytes_to_permits(0), 0);
assert_eq!(PermitGranularity::Megabyte.bytes_to_permits(0), 0);
}
#[test]
fn test_bytes_to_permits_clamps_to_maximum() {
use tokio::sync::Semaphore;
let max_permits = (Semaphore::MAX_PERMITS as u64).min(u32::MAX as u64) as u32;
assert_eq!(
PermitGranularity::Kilobyte.bytes_to_permits(u64::MAX),
max_permits
);
assert_eq!(
PermitGranularity::Megabyte.bytes_to_permits(u64::MAX),
max_permits
);
}
#[test]
fn test_permits_to_bytes() {
assert_eq!(PermitGranularity::Kilobyte.permits_to_bytes(1), 1024);
assert_eq!(PermitGranularity::Kilobyte.permits_to_bytes(10), 10 * 1024);
assert_eq!(PermitGranularity::Megabyte.permits_to_bytes(1), 1024 * 1024);
assert_eq!(
PermitGranularity::Megabyte.permits_to_bytes(10),
10 * 1024 * 1024
);
}
#[test]
fn test_round_trip_conversion() {
// Kilobyte: bytes -> permits -> bytes (should round up)
let kb = PermitGranularity::Kilobyte;
let permits = kb.bytes_to_permits(1500);
let bytes = kb.permits_to_bytes(permits);
assert!(bytes >= 1500); // Must cover original request
assert_eq!(bytes, 2048); // 2KB
// Megabyte: bytes -> permits -> bytes (should round up)
let mb = PermitGranularity::Megabyte;
let permits = mb.bytes_to_permits(1500);
let bytes = mb.permits_to_bytes(permits);
assert!(bytes >= 1500);
assert_eq!(bytes, 1024 * 1024); // 1MB
}
}


@@ -17,7 +17,7 @@ use std::{fmt, mem};
use common_telemetry::debug;
use tokio::sync::{OwnedSemaphorePermit, TryAcquireError};
use crate::manager::{MemoryMetrics, MemoryQuota, bytes_to_permits, permits_to_bytes};
use crate::manager::{MemoryMetrics, MemoryQuota};
/// Guard representing a slice of reserved memory.
pub struct MemoryGuard<M: MemoryMetrics> {
@@ -49,7 +49,9 @@ impl<M: MemoryMetrics> MemoryGuard<M> {
pub fn granted_bytes(&self) -> u64 {
match &self.state {
GuardState::Unlimited => 0,
GuardState::Limited { permit, .. } => permits_to_bytes(permit.num_permits() as u32),
GuardState::Limited { permit, quota } => {
quota.permits_to_bytes(permit.num_permits() as u32)
}
}
}
@@ -65,7 +67,7 @@ impl<M: MemoryMetrics> MemoryGuard<M> {
return true;
}
let additional_permits = bytes_to_permits(bytes);
let additional_permits = quota.bytes_to_permits(bytes);
match quota
.semaphore
@@ -99,11 +101,12 @@ impl<M: MemoryMetrics> MemoryGuard<M> {
return true;
}
let release_permits = bytes_to_permits(bytes);
let release_permits = quota.bytes_to_permits(bytes);
match permit.split(release_permits as usize) {
Some(released_permit) => {
let released_bytes = permits_to_bytes(released_permit.num_permits() as u32);
let released_bytes =
quota.permits_to_bytes(released_permit.num_permits() as u32);
drop(released_permit);
quota.update_in_use_metric();
debug!("Early released {} bytes from memory guard", released_bytes);
@@ -121,7 +124,7 @@ impl<M: MemoryMetrics> Drop for MemoryGuard<M> {
if let GuardState::Limited { permit, quota } =
mem::replace(&mut self.state, GuardState::Unlimited)
{
let bytes = permits_to_bytes(permit.num_permits() as u32);
let bytes = quota.permits_to_bytes(permit.num_permits() as u32);
drop(permit);
quota.update_in_use_metric();
debug!("Released memory: {} bytes", bytes);


@@ -19,6 +19,7 @@
//! share the same allocation logic while using their own metrics.
mod error;
mod granularity;
mod guard;
mod manager;
mod policy;
@@ -27,8 +28,9 @@ mod policy;
mod tests;
pub use error::{Error, Result};
pub use granularity::PermitGranularity;
pub use guard::MemoryGuard;
pub use manager::{MemoryManager, MemoryMetrics, PERMIT_GRANULARITY_BYTES};
pub use manager::{MemoryManager, MemoryMetrics};
pub use policy::{DEFAULT_MEMORY_WAIT_TIMEOUT, OnExhaustedPolicy};
/// No-op metrics implementation for testing.


@@ -17,11 +17,12 @@ use std::sync::Arc;
use snafu::ensure;
use tokio::sync::{Semaphore, TryAcquireError};
use crate::error::{MemoryLimitExceededSnafu, MemorySemaphoreClosedSnafu, Result};
use crate::error::{
MemoryAcquireTimeoutSnafu, MemoryLimitExceededSnafu, MemorySemaphoreClosedSnafu, Result,
};
use crate::granularity::PermitGranularity;
use crate::guard::MemoryGuard;
/// Minimum bytes controlled by one semaphore permit.
pub const PERMIT_GRANULARITY_BYTES: u64 = 1 << 20; // 1 MB
use crate::policy::OnExhaustedPolicy;
/// Trait for recording memory usage metrics.
pub trait MemoryMetrics: Clone + Send + Sync + 'static {
@@ -40,6 +41,7 @@ pub struct MemoryManager<M: MemoryMetrics> {
pub(crate) struct MemoryQuota<M: MemoryMetrics> {
pub(crate) semaphore: Arc<Semaphore>,
pub(crate) limit_permits: u32,
pub(crate) granularity: PermitGranularity,
pub(crate) metrics: M,
}
@@ -47,19 +49,25 @@ impl<M: MemoryMetrics> MemoryManager<M> {
/// Creates a new memory manager with the given limit in bytes.
/// `limit_bytes = 0` disables the limit.
pub fn new(limit_bytes: u64, metrics: M) -> Self {
Self::with_granularity(limit_bytes, PermitGranularity::default(), metrics)
}
/// Creates a new memory manager with specified granularity.
pub fn with_granularity(limit_bytes: u64, granularity: PermitGranularity, metrics: M) -> Self {
if limit_bytes == 0 {
metrics.set_limit(0);
return Self { quota: None };
}
let limit_permits = bytes_to_permits(limit_bytes);
let limit_aligned_bytes = permits_to_bytes(limit_permits);
let limit_permits = granularity.bytes_to_permits(limit_bytes);
let limit_aligned_bytes = granularity.permits_to_bytes(limit_permits);
metrics.set_limit(limit_aligned_bytes as i64);
Self {
quota: Some(MemoryQuota {
semaphore: Arc::new(Semaphore::new(limit_permits as usize)),
limit_permits,
granularity,
metrics,
}),
}
@@ -69,7 +77,7 @@ impl<M: MemoryMetrics> MemoryManager<M> {
pub fn limit_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.limit_permits))
.map(|quota| quota.permits_to_bytes(quota.limit_permits))
.unwrap_or(0)
}
@@ -77,7 +85,7 @@ impl<M: MemoryMetrics> MemoryManager<M> {
pub fn used_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.used_permits()))
.map(|quota| quota.permits_to_bytes(quota.used_permits()))
.unwrap_or(0)
}
@@ -85,7 +93,7 @@ impl<M: MemoryMetrics> MemoryManager<M> {
pub fn available_bytes(&self) -> u64 {
self.quota
.as_ref()
.map(|quota| permits_to_bytes(quota.available_permits_clamped()))
.map(|quota| quota.permits_to_bytes(quota.available_permits_clamped()))
.unwrap_or(0)
}
@@ -98,13 +106,13 @@ impl<M: MemoryMetrics> MemoryManager<M> {
match &self.quota {
None => Ok(MemoryGuard::unlimited()),
Some(quota) => {
let permits = bytes_to_permits(bytes);
let permits = quota.bytes_to_permits(bytes);
ensure!(
permits <= quota.limit_permits,
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: permits_to_bytes(quota.limit_permits),
limit_bytes: self.limit_bytes()
}
);
@@ -125,7 +133,7 @@ impl<M: MemoryMetrics> MemoryManager<M> {
match &self.quota {
None => Some(MemoryGuard::unlimited()),
Some(quota) => {
let permits = bytes_to_permits(bytes);
let permits = quota.bytes_to_permits(bytes);
match quota.semaphore.clone().try_acquire_many_owned(permits) {
Ok(permit) => {
@@ -140,9 +148,56 @@ impl<M: MemoryMetrics> MemoryManager<M> {
}
}
}
/// Acquires memory based on the given policy.
///
/// - For `OnExhaustedPolicy::Wait`: Waits up to the timeout duration for memory to become available.
/// - For `OnExhaustedPolicy::Fail`: Returns immediately if memory is not available
///
/// # Errors
/// - `MemoryLimitExceeded`: Requested bytes exceed the total limit (both policies), or memory is currently exhausted (Fail policy only)
/// - `MemoryAcquireTimeout`: Timeout elapsed while waiting for memory (Wait policy only)
/// - `MemorySemaphoreClosed`: The internal semaphore is unexpectedly closed (rare, indicates system issue)
pub async fn acquire_with_policy(
&self,
bytes: u64,
policy: OnExhaustedPolicy,
) -> Result<MemoryGuard<M>> {
match policy {
OnExhaustedPolicy::Wait { timeout } => {
match tokio::time::timeout(timeout, self.acquire(bytes)).await {
Ok(Ok(guard)) => Ok(guard),
Ok(Err(e)) => Err(e),
Err(_elapsed) => {
// Timeout elapsed while waiting
MemoryAcquireTimeoutSnafu {
requested_bytes: bytes,
waited: timeout,
}
.fail()
}
}
}
OnExhaustedPolicy::Fail => self.try_acquire(bytes).ok_or_else(|| {
MemoryLimitExceededSnafu {
requested_bytes: bytes,
limit_bytes: self.limit_bytes(),
}
.build()
}),
}
}
}
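// A minimal caller sketch (the function, size and timeout below are
// illustrative): wait up to one second for 64 MiB and hold the returned guard
// for the lifetime of the operation.
async fn reserve_for_scan<M: MemoryMetrics>(
    manager: &MemoryManager<M>,
) -> Result<MemoryGuard<M>> {
    manager
        .acquire_with_policy(
            64 * 1024 * 1024,
            OnExhaustedPolicy::Wait {
                timeout: std::time::Duration::from_secs(1),
            },
        )
        .await
}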
impl<M: MemoryMetrics> MemoryQuota<M> {
pub(crate) fn bytes_to_permits(&self, bytes: u64) -> u32 {
self.granularity.bytes_to_permits(bytes)
}
pub(crate) fn permits_to_bytes(&self, permits: u32) -> u64 {
self.granularity.permits_to_bytes(permits)
}
pub(crate) fn used_permits(&self) -> u32 {
self.limit_permits
.saturating_sub(self.available_permits_clamped())
@@ -155,19 +210,7 @@ impl<M: MemoryMetrics> MemoryQuota<M> {
}
pub(crate) fn update_in_use_metric(&self) {
let bytes = permits_to_bytes(self.used_permits());
let bytes = self.permits_to_bytes(self.used_permits());
self.metrics.set_in_use(bytes as i64);
}
}
pub(crate) fn bytes_to_permits(bytes: u64) -> u32 {
bytes
.saturating_add(PERMIT_GRANULARITY_BYTES - 1)
.saturating_div(PERMIT_GRANULARITY_BYTES)
.min(Semaphore::MAX_PERMITS as u64)
.min(u32::MAX as u64) as u32
}
pub(crate) fn permits_to_bytes(permits: u32) -> u64 {
(permits as u64).saturating_mul(PERMIT_GRANULARITY_BYTES)
}


@@ -14,7 +14,10 @@
use tokio::time::{Duration, sleep};
use crate::{MemoryManager, NoOpMetrics, PERMIT_GRANULARITY_BYTES};
use crate::{MemoryManager, NoOpMetrics, PermitGranularity};
// Helper constant for tests - use default Megabyte granularity
const PERMIT_GRANULARITY_BYTES: u64 = PermitGranularity::Megabyte.bytes();
#[test]
fn test_try_acquire_unlimited() {


@@ -12,27 +12,10 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use std::sync::OnceLock;
use std::time::Duration;
use etcd_client::ConnectOptions;
/// Heartbeat interval time (is the basic unit of various time).
pub const HEARTBEAT_INTERVAL_MILLIS: u64 = 3000;
/// The frontend will also send heartbeats to Metasrv, sending an empty
/// heartbeat every HEARTBEAT_INTERVAL_MILLIS * 6 seconds.
pub const FRONTEND_HEARTBEAT_INTERVAL_MILLIS: u64 = HEARTBEAT_INTERVAL_MILLIS * 6;
/// The lease seconds of a region. It's set by 3 heartbeat intervals
/// (HEARTBEAT_INTERVAL_MILLIS × 3), plus some extra buffer (1 second).
pub const REGION_LEASE_SECS: u64 =
Duration::from_millis(HEARTBEAT_INTERVAL_MILLIS * 3).as_secs() + 1;
/// When creating table or region failover, a target node needs to be selected.
/// If the node's lease has expired, the `Selector` will not select it.
pub const DATANODE_LEASE_SECS: u64 = REGION_LEASE_SECS;
pub const FLOWNODE_LEASE_SECS: u64 = DATANODE_LEASE_SECS;
pub const BASE_HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);
/// The lease seconds of metasrv leader.
pub const META_LEASE_SECS: u64 = 5;
@@ -52,14 +35,6 @@ pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_INTERVAL_SECS: Duration = Duration::from_
/// The keep-alive timeout of the heartbeat channel.
pub const HEARTBEAT_CHANNEL_KEEP_ALIVE_TIMEOUT_SECS: Duration = Duration::from_secs(5);
/// The default options for the etcd client.
pub fn default_etcd_client_options() -> ConnectOptions {
ConnectOptions::new()
.with_keep_alive_while_idle(true)
.with_keep_alive(Duration::from_secs(15), Duration::from_secs(5))
.with_connect_timeout(Duration::from_secs(10))
}
/// The default mailbox round-trip timeout.
pub const MAILBOX_RTT_SECS: u64 = 1;
@@ -68,3 +43,60 @@ pub const TOPIC_STATS_REPORT_INTERVAL_SECS: u64 = 15;
/// The retention seconds of topic stats.
pub const TOPIC_STATS_RETENTION_SECS: u64 = TOPIC_STATS_REPORT_INTERVAL_SECS * 100;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
/// The distributed time constants.
pub struct DistributedTimeConstants {
pub heartbeat_interval: Duration,
pub frontend_heartbeat_interval: Duration,
pub region_lease: Duration,
pub datanode_lease: Duration,
pub flownode_lease: Duration,
}
/// The frontend heartbeat interval is 6 times the base heartbeat interval.
pub fn frontend_heartbeat_interval(base_heartbeat_interval: Duration) -> Duration {
base_heartbeat_interval * 6
}
impl DistributedTimeConstants {
/// Create a new DistributedTimeConstants from the heartbeat interval.
pub fn from_heartbeat_interval(heartbeat_interval: Duration) -> Self {
let region_lease = heartbeat_interval * 3 + Duration::from_secs(1);
let datanode_lease = region_lease;
let flownode_lease = datanode_lease;
Self {
heartbeat_interval,
frontend_heartbeat_interval: frontend_heartbeat_interval(heartbeat_interval),
region_lease,
datanode_lease,
flownode_lease,
}
}
}
impl Default for DistributedTimeConstants {
fn default() -> Self {
Self::from_heartbeat_interval(BASE_HEARTBEAT_INTERVAL)
}
}
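// With the default 3-second base interval, `Default` yields region_lease = 3s * 3 + 1s
// = 10s, datanode_lease = flownode_lease = 10s, and frontend_heartbeat_interval
// = 6 * 3s = 18s.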
static DEFAULT_DISTRIBUTED_TIME_CONSTANTS: OnceLock<DistributedTimeConstants> = OnceLock::new();
/// Get the default distributed time constants.
pub fn default_distributed_time_constants() -> &'static DistributedTimeConstants {
DEFAULT_DISTRIBUTED_TIME_CONSTANTS.get_or_init(Default::default)
}
/// Initialize the default distributed time constants.
pub fn init_distributed_time_constants(base_heartbeat_interval: Duration) {
let distributed_time_constants =
DistributedTimeConstants::from_heartbeat_interval(base_heartbeat_interval);
DEFAULT_DISTRIBUTED_TIME_CONSTANTS
.set(distributed_time_constants)
.expect("Failed to set default distributed time constants");
common_telemetry::info!(
"Initialized default distributed time constants: {:#?}",
distributed_time_constants
);
}


@@ -14,7 +14,7 @@
use common_telemetry::{debug, error, info};
use common_wal::config::kafka::common::{
DEFAULT_BACKOFF_CONFIG, DEFAULT_CONNECT_TIMEOUT, KafkaConnectionConfig, KafkaTopicConfig,
DEFAULT_BACKOFF_CONFIG, KafkaConnectionConfig, KafkaTopicConfig,
};
use rskafka::client::error::Error as RsKafkaError;
use rskafka::client::error::ProtocolError::TopicAlreadyExists;
@@ -211,7 +211,8 @@ pub async fn build_kafka_client(connection: &KafkaConnectionConfig) -> Result<Cl
// Builds a Kafka controller client for creating topics.
let mut builder = ClientBuilder::new(connection.broker_endpoints.clone())
.backoff_config(DEFAULT_BACKOFF_CONFIG)
.connect_timeout(Some(DEFAULT_CONNECT_TIMEOUT));
.connect_timeout(Some(connection.connect_timeout))
.timeout(Some(connection.timeout));
if let Some(sasl) = &connection.sasl {
builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
};


@@ -71,6 +71,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
}),
MetricType::GAUGE => timeseries.push(TimeSeries {
labels: convert_label(m.get_label(), mf_name, None),
@@ -79,6 +80,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
}),
MetricType::HISTOGRAM => {
let h = m.get_histogram();
@@ -97,6 +99,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
if upper_bound.is_sign_positive() && upper_bound.is_infinite() {
inf_seen = true;
@@ -114,6 +117,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
}
timeseries.push(TimeSeries {
@@ -127,6 +131,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
timeseries.push(TimeSeries {
labels: convert_label(
@@ -139,6 +144,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
}
MetricType::SUMMARY => {
@@ -155,6 +161,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
}
timeseries.push(TimeSeries {
@@ -168,6 +175,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
timeseries.push(TimeSeries {
labels: convert_label(
@@ -180,6 +188,7 @@ pub fn convert_metric_to_write_request(
timestamp,
}],
exemplars: vec![],
histograms: vec![],
});
}
MetricType::UNTYPED => {
@@ -274,7 +283,7 @@ mod test {
assert_eq!(
format!("{:?}", write_quest.timeseries),
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }]"#
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
let gauge_opts = Opts::new("test_gauge", "test help")
@@ -288,7 +297,7 @@ mod test {
let write_quest = convert_metric_to_write_request(mf, None, 0);
assert_eq!(
format!("{:?}", write_quest.timeseries),
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_gauge" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 42.0, timestamp: 0 }], exemplars: [] }]"#
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_gauge" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 42.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
}
@@ -305,20 +314,20 @@ mod test {
.iter()
.map(|x| format!("{:?}", x))
.collect();
let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.005" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.01" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.025" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.05" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.1" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.25" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "2.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "10" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "+Inf" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_sum" }, Label { name: "a", value: "1" }], samples: [Sample { value: 0.25, timestamp: 0 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" }, Label { name: "a", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }"#;
let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.005" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.01" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.025" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.05" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.1" }], samples: [Sample { value: 0.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.25" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "0.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "2.5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "5" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "10" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_bucket" }, Label { name: "a", value: "1" }, Label { name: "le", value: "+Inf" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_sum" }, Label { name: "a", value: "1" }], samples: [Sample { value: 0.25, timestamp: 0 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" }, Label { name: "a", value: "1" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }"#;
assert_eq!(write_quest_str.join("\n"), ans);
}
@@ -355,10 +364,10 @@ TimeSeries { labels: [Label { name: "__name__", value: "test_histogram_count" },
.iter()
.map(|x| format!("{:?}", x))
.collect();
let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "50" }], samples: [Sample { value: 3.0, timestamp: 20 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "100" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary_sum" }], samples: [Sample { value: 15.0, timestamp: 20 }], exemplars: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [] }"#;
let ans = r#"TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "50" }], samples: [Sample { value: 3.0, timestamp: 20 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary" }, Label { name: "quantile", value: "100" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary_sum" }], samples: [Sample { value: 15.0, timestamp: 20 }], exemplars: [], histograms: [] }
TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }], samples: [Sample { value: 5.0, timestamp: 20 }], exemplars: [], histograms: [] }"#;
assert_eq!(write_quest_str.join("\n"), ans);
}
@@ -385,11 +394,11 @@ TimeSeries { labels: [Label { name: "__name__", value: "test_summary_count" }],
let write_quest2 = convert_metric_to_write_request(mf, Some(&filter), 0);
assert_eq!(
format!("{:?}", write_quest1.timeseries),
r#"[TimeSeries { labels: [Label { name: "__name__", value: "filter_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [] }, TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [] }]"#
r#"[TimeSeries { labels: [Label { name: "__name__", value: "filter_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 1.0, timestamp: 0 }], exemplars: [], histograms: [] }, TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
assert_eq!(
format!("{:?}", write_quest2.timeseries),
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [] }]"#
r#"[TimeSeries { labels: [Label { name: "__name__", value: "test_counter" }, Label { name: "a", value: "1" }, Label { name: "b", value: "2" }], samples: [Sample { value: 2.0, timestamp: 0 }], exemplars: [], histograms: [] }]"#
);
}
}


@@ -206,6 +206,8 @@ mod tests {
client_cert_path: None,
client_key_path: None,
}),
connect_timeout: Duration::from_secs(3),
timeout: Duration::from_secs(3),
},
kafka_topic: KafkaTopicConfig {
num_topics: 32,
@@ -239,6 +241,8 @@ mod tests {
client_cert_path: None,
client_key_path: None,
}),
connect_timeout: Duration::from_secs(3),
timeout: Duration::from_secs(3),
},
max_batch_bytes: ReadableSize::mb(1),
consumer_wait_timeout: Duration::from_millis(100),


@@ -36,9 +36,6 @@ pub const DEFAULT_BACKOFF_CONFIG: BackoffConfig = BackoffConfig {
deadline: Some(Duration::from_secs(3)),
};
/// The default connect timeout for kafka client.
pub const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
/// Default interval for auto WAL pruning.
pub const DEFAULT_AUTO_PRUNE_INTERVAL: Duration = Duration::from_mins(30);
/// Default limit for concurrent auto pruning tasks.
@@ -167,6 +164,12 @@ pub struct KafkaConnectionConfig {
pub sasl: Option<KafkaClientSasl>,
/// Client TLS config
pub tls: Option<KafkaClientTls>,
/// The connect timeout for the Kafka client.
#[serde(with = "humantime_serde")]
pub connect_timeout: Duration,
/// The timeout for the Kafka client.
#[serde(with = "humantime_serde")]
pub timeout: Duration,
}
impl Default for KafkaConnectionConfig {
@@ -175,6 +178,8 @@ impl Default for KafkaConnectionConfig {
broker_endpoints: vec![BROKER_ENDPOINT.to_string()],
sasl: None,
tls: None,
connect_timeout: Duration::from_secs(3),
timeout: Duration::from_secs(3),
}
}
}
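// A minimal override sketch (the helper below is hypothetical): callers that
// need laxer limits set only the two new timeout fields and let `Default`
// fill in the rest.
fn relaxed_connection() -> KafkaConnectionConfig {
    KafkaConnectionConfig {
        connect_timeout: Duration::from_secs(10),
        timeout: Duration::from_secs(30),
        ..Default::default()
    }
}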


@@ -33,7 +33,8 @@ pub use crate::schema::column_schema::{
COLUMN_SKIPPING_INDEX_OPT_KEY_FALSE_POSITIVE_RATE, COLUMN_SKIPPING_INDEX_OPT_KEY_GRANULARITY,
COLUMN_SKIPPING_INDEX_OPT_KEY_TYPE, COMMENT_KEY, ColumnExtType, ColumnSchema, FULLTEXT_KEY,
FulltextAnalyzer, FulltextBackend, FulltextOptions, INVERTED_INDEX_KEY, Metadata,
SKIPPING_INDEX_KEY, SkippingIndexOptions, SkippingIndexType, TIME_INDEX_KEY,
SKIPPING_INDEX_KEY, SkippingIndexOptions, SkippingIndexType, TIME_INDEX_KEY, VECTOR_INDEX_KEY,
VectorDistanceMetric, VectorIndexEngineType, VectorIndexOptions,
};
pub use crate::schema::constraint::ColumnDefaultConstraint;
pub use crate::schema::raw::RawSchema;


@@ -46,6 +46,8 @@ pub const FULLTEXT_KEY: &str = "greptime:fulltext";
pub const INVERTED_INDEX_KEY: &str = "greptime:inverted_index";
/// Key used to store skip options in arrow field's metadata.
pub const SKIPPING_INDEX_KEY: &str = "greptime:skipping_index";
/// Key used to store vector index options in arrow field's metadata.
pub const VECTOR_INDEX_KEY: &str = "greptime:vector_index";
/// Keys used in fulltext options
pub const COLUMN_FULLTEXT_CHANGE_OPT_KEY_ENABLE: &str = "enable";
@@ -216,6 +218,53 @@ impl ColumnSchema {
self.metadata.contains_key(INVERTED_INDEX_KEY)
}
/// Checks if this column has a vector index.
pub fn is_vector_indexed(&self) -> bool {
match self.vector_index_options() {
Ok(opts) => opts.is_some(),
Err(e) => {
common_telemetry::warn!(
"Failed to deserialize vector_index_options for column '{}': {}",
self.name,
e
);
false
}
}
}
/// Gets the vector index options.
pub fn vector_index_options(&self) -> Result<Option<VectorIndexOptions>> {
match self.metadata.get(VECTOR_INDEX_KEY) {
None => Ok(None),
Some(json) => {
let options =
serde_json::from_str(json).context(error::DeserializeSnafu { json })?;
Ok(Some(options))
}
}
}
/// Sets the vector index options.
pub fn set_vector_index_options(&mut self, options: &VectorIndexOptions) -> Result<()> {
self.metadata.insert(
VECTOR_INDEX_KEY.to_string(),
serde_json::to_string(options).context(error::SerializeSnafu)?,
);
Ok(())
}
/// Removes the vector index options.
pub fn unset_vector_index_options(&mut self) {
self.metadata.remove(VECTOR_INDEX_KEY);
}
/// Sets vector index options and returns self for chaining.
pub fn with_vector_index_options(mut self, options: &VectorIndexOptions) -> Result<Self> {
self.set_vector_index_options(options)?;
Ok(self)
}
/// Set default constraint.
///
/// If a default constraint exists for the column, this method will
@@ -964,6 +1013,181 @@ impl TryFrom<HashMap<String, String>> for SkippingIndexOptions {
}
}
/// Distance metric for vector similarity search.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default, Visit, VisitMut)]
#[serde(rename_all = "lowercase")]
pub enum VectorDistanceMetric {
/// Squared Euclidean distance (L2^2).
#[default]
L2sq,
/// Cosine distance (1 - cosine similarity).
Cosine,
/// Inner product (negative, for maximum inner product search).
#[serde(alias = "ip")]
InnerProduct,
}
impl fmt::Display for VectorDistanceMetric {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
VectorDistanceMetric::L2sq => write!(f, "l2sq"),
VectorDistanceMetric::Cosine => write!(f, "cosine"),
VectorDistanceMetric::InnerProduct => write!(f, "ip"),
}
}
}
impl std::str::FromStr for VectorDistanceMetric {
type Err = String;
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s.to_lowercase().as_str() {
"l2sq" | "l2" | "euclidean" => Ok(VectorDistanceMetric::L2sq),
"cosine" | "cos" => Ok(VectorDistanceMetric::Cosine),
"inner_product" | "ip" | "dot" => Ok(VectorDistanceMetric::InnerProduct),
_ => Err(format!(
"Unknown distance metric: {}. Expected: l2sq, cosine, or ip",
s
)),
}
}
}
impl VectorDistanceMetric {
/// Returns the metric as u8 for blob serialization.
pub fn as_u8(&self) -> u8 {
match self {
Self::L2sq => 0,
Self::Cosine => 1,
Self::InnerProduct => 2,
}
}
/// Parses metric from u8 (used when reading blob).
pub fn try_from_u8(v: u8) -> Option<Self> {
match v {
0 => Some(Self::L2sq),
1 => Some(Self::Cosine),
2 => Some(Self::InnerProduct),
_ => None,
}
}
}
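// A short sketch of the alias handling and blob round-trip defined above:
// `FromStr` accepts several spellings per metric, and `as_u8`/`try_from_u8`
// round-trip the serialized form.
fn metric_round_trip() {
    let metric: VectorDistanceMetric = "euclidean".parse().unwrap();
    assert_eq!(metric, VectorDistanceMetric::L2sq);
    assert_eq!(
        VectorDistanceMetric::try_from_u8(VectorDistanceMetric::Cosine.as_u8()),
        Some(VectorDistanceMetric::Cosine)
    );
}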
/// Default HNSW connectivity parameter.
const DEFAULT_VECTOR_INDEX_CONNECTIVITY: u32 = 16;
/// Default expansion factor during index construction.
const DEFAULT_VECTOR_INDEX_EXPANSION_ADD: u32 = 128;
/// Default expansion factor during search.
const DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH: u32 = 64;
fn default_vector_index_connectivity() -> u32 {
DEFAULT_VECTOR_INDEX_CONNECTIVITY
}
fn default_vector_index_expansion_add() -> u32 {
DEFAULT_VECTOR_INDEX_EXPANSION_ADD
}
fn default_vector_index_expansion_search() -> u32 {
DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH
}
/// Supported vector index engine types.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize, Visit, VisitMut)]
#[serde(rename_all = "lowercase")]
pub enum VectorIndexEngineType {
/// USearch HNSW implementation.
#[default]
Usearch,
// Future: Vsag,
}
impl VectorIndexEngineType {
/// Returns the engine type as u8 for blob serialization.
pub fn as_u8(&self) -> u8 {
match self {
Self::Usearch => 0,
}
}
/// Parses engine type from u8 (used when reading blob).
pub fn try_from_u8(v: u8) -> Option<Self> {
match v {
0 => Some(Self::Usearch),
_ => None,
}
}
}
impl fmt::Display for VectorIndexEngineType {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Self::Usearch => write!(f, "usearch"),
}
}
}
impl std::str::FromStr for VectorIndexEngineType {
type Err = String;
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s.to_lowercase().as_str() {
"usearch" => Ok(Self::Usearch),
_ => Err(format!(
"Unknown vector index engine: {}. Expected: usearch",
s
)),
}
}
}
/// Options for vector index (HNSW).
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize, Visit, VisitMut)]
#[serde(rename_all = "kebab-case")]
pub struct VectorIndexOptions {
/// Vector index engine type (default: usearch).
#[serde(default)]
pub engine: VectorIndexEngineType,
/// Distance metric for similarity search.
#[serde(default)]
pub metric: VectorDistanceMetric,
/// HNSW connectivity parameter (M in the paper).
/// Higher values improve recall but increase memory usage.
#[serde(default = "default_vector_index_connectivity")]
pub connectivity: u32,
/// Expansion factor during index construction (ef_construction).
/// Higher values improve index quality but slow down construction.
#[serde(default = "default_vector_index_expansion_add")]
pub expansion_add: u32,
/// Expansion factor during search (ef_search).
/// Higher values improve recall but slow down search.
#[serde(default = "default_vector_index_expansion_search")]
pub expansion_search: u32,
}
impl Default for VectorIndexOptions {
fn default() -> Self {
Self {
engine: VectorIndexEngineType::default(),
metric: VectorDistanceMetric::default(),
connectivity: DEFAULT_VECTOR_INDEX_CONNECTIVITY,
expansion_add: DEFAULT_VECTOR_INDEX_EXPANSION_ADD,
expansion_search: DEFAULT_VECTOR_INDEX_EXPANSION_SEARCH,
}
}
}
impl fmt::Display for VectorIndexOptions {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(
f,
"engine={}, metric={}, connectivity={}, expansion_add={}, expansion_search={}",
self.engine, self.metric, self.connectivity, self.expansion_add, self.expansion_search
)
}
}
#[cfg(test)]
mod tests {
use std::sync::Arc;
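// A hedged usage sketch of the vector index metadata API added above: build
// options, attach them to a column, and read them back. The column name and
// the `ConcreteDataType::vector_datatype` constructor are illustrative
// assumptions; any column carrying the metadata key behaves the same way.
fn vector_index_options_round_trip() {
    let options = VectorIndexOptions {
        metric: VectorDistanceMetric::Cosine,
        ..Default::default()
    };
    let column = ColumnSchema::new("embedding", ConcreteDataType::vector_datatype(3), false)
        .with_vector_index_options(&options)
        .unwrap();
    assert!(column.is_vector_indexed());
    assert_eq!(column.vector_index_options().unwrap(), Some(options));
}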

View File

@@ -15,7 +15,7 @@
//! Frontend client for running a flow as a batching task: a time-window-aware query triggered on every tick configured by the user
use std::collections::HashMap;
use std::sync::{Arc, Weak};
use std::sync::{Arc, Mutex, Weak};
use std::time::SystemTime;
use api::v1::greptime_request::Request;
@@ -38,6 +38,7 @@ use servers::query_handler::grpc::GrpcQueryHandler;
use session::context::{QueryContextBuilder, QueryContextRef};
use session::hints::READ_PREFERENCE_HINT;
use snafu::{OptionExt, ResultExt};
use tokio::sync::SetOnce;
use crate::batching_mode::BatchingModeOptions;
use crate::error::{
@@ -75,7 +76,19 @@ impl<E: ErrorExt + Send + Sync + 'static, T: GrpcQueryHandler<Error = E> + Send
}
}
type HandlerMutable = Arc<std::sync::Mutex<Option<Weak<dyn GrpcQueryHandlerWithBoxedError>>>>;
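/// Shared, mutable handle to the frontend gRPC query handler.
///
/// `set_handler` installs the handler and sets the `SetOnce` flag, so callers
/// can await initialization (see `FrontendClient::wait_initialized`) without
/// polling the mutex.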
#[derive(Debug, Clone)]
pub struct HandlerMutable {
handler: Arc<Mutex<Option<Weak<dyn GrpcQueryHandlerWithBoxedError>>>>,
is_initialized: Arc<SetOnce<()>>,
}
impl HandlerMutable {
pub async fn set_handler(&self, handler: Weak<dyn GrpcQueryHandlerWithBoxedError>) {
*self.handler.lock().unwrap() = Some(handler);
// Ignore the error, as we allow the handler to be set multiple times.
let _ = self.is_initialized.set(());
}
}
/// A simple frontend client able to execute sql using grpc protocol
///
@@ -100,7 +113,11 @@ pub enum FrontendClient {
impl FrontendClient {
/// Creates a new empty frontend client, with a `HandlerMutable` to set the gRPC handler later.
pub fn from_empty_grpc_handler(query: QueryOptions) -> (Self, HandlerMutable) {
let handler = Arc::new(std::sync::Mutex::new(None));
let is_initialized = Arc::new(SetOnce::new());
let handler = HandlerMutable {
handler: Arc::new(Mutex::new(None)),
is_initialized,
};
(
Self::Standalone {
database_client: handler.clone(),
@@ -110,23 +127,13 @@ impl FrontendClient {
)
}
/// Check if the frontend client is initialized.
///
/// In distributed mode, it is always initialized.
/// In standalone mode, it checks if the database client is set.
pub fn is_initialized(&self) -> bool {
match self {
FrontendClient::Distributed { .. } => true,
FrontendClient::Standalone {
database_client, ..
} => {
let guard = database_client.lock();
if let Ok(guard) = guard {
guard.is_some()
} else {
false
}
}
/// Waits until the frontend client is initialized.
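///
/// In distributed mode this returns immediately; in standalone mode it waits
/// until the gRPC handler has been set via `HandlerMutable::set_handler`.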
pub async fn wait_initialized(&self) {
if let FrontendClient::Standalone {
database_client, ..
} = self
{
database_client.is_initialized.wait().await;
}
}
@@ -158,8 +165,14 @@ impl FrontendClient {
grpc_handler: Weak<dyn GrpcQueryHandlerWithBoxedError>,
query: QueryOptions,
) -> Self {
let is_initialized = Arc::new(SetOnce::new_with(Some(())));
let handler = HandlerMutable {
handler: Arc::new(Mutex::new(Some(grpc_handler))),
is_initialized: is_initialized.clone(),
};
Self::Standalone {
database_client: Arc::new(std::sync::Mutex::new(Some(grpc_handler))),
database_client: handler,
query,
}
}
@@ -341,6 +354,7 @@ impl FrontendClient {
{
let database_client = {
database_client
.handler
.lock()
.map_err(|e| {
UnexpectedSnafu {
@@ -418,6 +432,7 @@ impl FrontendClient {
{
let database_client = {
database_client
.handler
.lock()
.map_err(|e| {
UnexpectedSnafu {
@@ -480,3 +495,73 @@ impl std::fmt::Display for PeerDesc {
}
}
}
#[cfg(test)]
mod tests {
use std::time::Duration;
use common_query::Output;
use tokio::time::timeout;
use super::*;
#[derive(Debug)]
struct NoopHandler;
#[async_trait::async_trait]
impl GrpcQueryHandlerWithBoxedError for NoopHandler {
async fn do_query(
&self,
_query: Request,
_ctx: QueryContextRef,
) -> std::result::Result<Output, BoxedError> {
Ok(Output::new_with_affected_rows(0))
}
}
#[tokio::test]
async fn wait_initialized() {
let (client, handler_mut) =
FrontendClient::from_empty_grpc_handler(QueryOptions::default());
assert!(
timeout(Duration::from_millis(50), client.wait_initialized())
.await
.is_err()
);
let handler: Arc<dyn GrpcQueryHandlerWithBoxedError> = Arc::new(NoopHandler);
handler_mut.set_handler(Arc::downgrade(&handler)).await;
timeout(Duration::from_secs(1), client.wait_initialized())
.await
.expect("wait_initialized should complete after handler is set");
timeout(Duration::from_millis(10), client.wait_initialized())
.await
.expect("wait_initialized should be a no-op once initialized");
let handler: Arc<dyn GrpcQueryHandlerWithBoxedError> = Arc::new(NoopHandler);
let client =
FrontendClient::from_grpc_handler(Arc::downgrade(&handler), QueryOptions::default());
assert!(
timeout(Duration::from_millis(10), client.wait_initialized())
.await
.is_ok()
);
let meta_client = Arc::new(MetaClient::default());
let client = FrontendClient::from_meta_client(
meta_client,
None,
QueryOptions::default(),
BatchingModeOptions::default(),
)
.unwrap();
assert!(
timeout(Duration::from_millis(10), client.wait_initialized())
.await
.is_ok()
);
}
}

View File

@@ -157,7 +157,6 @@ mod tests {
use common_error::from_header_to_err_code_msg;
use common_error::status_code::StatusCode;
use common_grpc::channel_manager::ChannelManager;
use common_meta::distributed_time_constants::FRONTEND_HEARTBEAT_INTERVAL_MILLIS;
use common_meta::heartbeat::handler::HandlerGroupExecutor;
use common_meta::heartbeat::handler::parse_mailbox_message::ParseMailboxMessageHandler;
use common_meta::heartbeat::handler::suspend::SuspendHandler;
@@ -400,6 +399,10 @@ mod tests {
..Default::default()
},
meta_client: Some(meta_client_options.clone()),
heartbeat: HeartbeatOptions {
interval: Duration::from_secs(1),
..Default::default()
},
..Default::default()
};
@@ -409,7 +412,8 @@ mod tests {
let meta_client = create_meta_client(&meta_client_options, server.clone()).await;
let frontend = create_frontend(&options, meta_client).await?;
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
let frontend_heartbeat_interval = options.heartbeat.interval;
tokio::time::sleep(frontend_heartbeat_interval).await;
// initial state: not suspended:
assert!(!frontend.instance.is_suspended());
verify_suspend_state_by_http(&frontend, Ok(r#"[{"records":{"schema":{"column_schemas":[{"name":"Int64(1)","data_type":"Int64"}]},"rows":[[1]],"total_rows":1}}]"#)).await;
@@ -426,7 +430,7 @@ mod tests {
// make the heartbeat server return the "suspend" instruction,
server.suspend.store(true, Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
tokio::time::sleep(frontend_heartbeat_interval).await;
// ... then the frontend is suspended:
assert!(frontend.instance.is_suspended());
verify_suspend_state_by_http(
@@ -442,7 +446,7 @@ mod tests {
// make the heartbeat server NOT return the "suspend" instruction,
server.suspend.store(false, Ordering::Relaxed);
tokio::time::sleep(Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS)).await;
tokio::time::sleep(frontend_heartbeat_interval).await;
// ... then frontend's suspend state is cleared:
assert!(!frontend.instance.is_suspended());
verify_suspend_state_by_http(&frontend, Ok(r#"[{"records":{"schema":{"column_schemas":[{"name":"Int64(1)","data_type":"Int64"}]},"rows":[[1]],"total_rows":1}}]"#)).await;

View File

@@ -7,6 +7,9 @@ license.workspace = true
[lints]
workspace = true
[features]
vector_index = ["dep:usearch"]
[dependencies]
async-trait.workspace = true
asynchronous-codec = "0.7.0"
@@ -17,6 +20,7 @@ common-error.workspace = true
common-macro.workspace = true
common-runtime.workspace = true
common-telemetry.workspace = true
datatypes.workspace = true
fastbloom = "0.8"
fst.workspace = true
futures.workspace = true
@@ -25,6 +29,7 @@ itertools.workspace = true
jieba-rs = "0.8"
lazy_static.workspace = true
mockall.workspace = true
nalgebra.workspace = true
pin-project.workspace = true
prost.workspace = true
puffin.workspace = true
@@ -39,6 +44,7 @@ tantivy = { version = "0.24", features = ["zstd-compression"] }
tantivy-jieba = "0.16"
tokio.workspace = true
tokio-util.workspace = true
usearch = { version = "2.21", default-features = false, features = ["fp16lib"], optional = true }
uuid.workspace = true
[dev-dependencies]

View File

@@ -22,6 +22,8 @@ pub mod external_provider;
pub mod fulltext_index;
pub mod inverted_index;
pub mod target;
#[cfg(feature = "vector_index")]
pub mod vector;
pub type Bytes = Vec<u8>;
pub type BytesRef<'a> = &'a [u8];

src/index/src/vector.rs (new file)
View File

@@ -0,0 +1,163 @@
// Copyright 2023 Greptime Team
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//! Vector index types and options.
//!
//! This module re-exports types from `datatypes` and provides conversions
//! to USearch types, as well as distance computation functions.
pub use datatypes::schema::{VectorDistanceMetric, VectorIndexOptions};
use nalgebra::DVectorView;
pub use usearch::MetricKind;
/// Converts a VectorDistanceMetric to a USearch MetricKind.
pub fn distance_metric_to_usearch(metric: VectorDistanceMetric) -> MetricKind {
match metric {
VectorDistanceMetric::L2sq => MetricKind::L2sq,
VectorDistanceMetric::Cosine => MetricKind::Cos,
VectorDistanceMetric::InnerProduct => MetricKind::IP,
}
}
/// Computes distance between two vectors using the specified metric.
///
/// Uses SIMD-optimized implementations via nalgebra.
///
/// **Note:** The caller must ensure that the two vectors have the same length.
/// Empty vectors are treated as degenerate and return 0.0 for all metrics.
pub fn compute_distance(v1: &[f32], v2: &[f32], metric: VectorDistanceMetric) -> f32 {
// Empty vectors are degenerate; return 0.0 uniformly across all metrics.
if v1.is_empty() || v2.is_empty() {
return 0.0;
}
match metric {
VectorDistanceMetric::L2sq => l2sq(v1, v2),
VectorDistanceMetric::Cosine => cosine(v1, v2),
VectorDistanceMetric::InnerProduct => -dot(v1, v2),
}
}
/// Calculates the squared L2 distance between two vectors.
fn l2sq(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs = DVectorView::from_slice(lhs, lhs.len());
let rhs = DVectorView::from_slice(rhs, rhs.len());
(lhs - rhs).norm_squared()
}
/// Calculates the cosine distance between two vectors.
///
/// Returns a value in `[0.0, 2.0]` where 0.0 means identical direction and 2.0 means
/// opposite direction. For degenerate cases (zero or near-zero magnitude vectors),
/// returns 1.0 (maximum uncertainty) to avoid NaN and ensure safe index operations.
fn cosine(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs_vec = DVectorView::from_slice(lhs, lhs.len());
let rhs_vec = DVectorView::from_slice(rhs, rhs.len());
let dot_product = lhs_vec.dot(&rhs_vec);
let lhs_norm = lhs_vec.norm();
let rhs_norm = rhs_vec.norm();
// A near-zero dot product means orthogonal vectors (cosine distance 1.0); zero-magnitude
// vectors have undefined direction, so also fall back to 1.0 to avoid NaN.
if dot_product.abs() < f32::EPSILON
|| lhs_norm.abs() < f32::EPSILON
|| rhs_norm.abs() < f32::EPSILON
{
return 1.0;
}
let cos_similar = dot_product / (lhs_norm * rhs_norm);
let res = 1.0 - cos_similar;
// Clamp near-zero results to exactly 0.0 to avoid floating-point artifacts.
if res.abs() < f32::EPSILON { 0.0 } else { res }
}
/// Calculates the dot product between two vectors.
fn dot(lhs: &[f32], rhs: &[f32]) -> f32 {
let lhs = DVectorView::from_slice(lhs, lhs.len());
let rhs = DVectorView::from_slice(rhs, rhs.len());
lhs.dot(&rhs)
}
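// A short sketch of the degenerate case documented on `cosine` above: a
// zero-magnitude vector has no direction, so the distance falls back to 1.0.
fn cosine_zero_vector_fallback() {
    let dist = compute_distance(&[0.0, 0.0], &[1.0, 0.0], VectorDistanceMetric::Cosine);
    assert!((dist - 1.0).abs() < f32::EPSILON);
}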
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_distance_metric_to_usearch() {
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::L2sq),
MetricKind::L2sq
);
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::Cosine),
MetricKind::Cos
);
assert_eq!(
distance_metric_to_usearch(VectorDistanceMetric::InnerProduct),
MetricKind::IP
);
}
#[test]
fn test_vector_index_options_default() {
let options = VectorIndexOptions::default();
assert_eq!(options.metric, VectorDistanceMetric::L2sq);
assert_eq!(options.connectivity, 16);
assert_eq!(options.expansion_add, 128);
assert_eq!(options.expansion_search, 64);
}
#[test]
fn test_compute_distance_l2sq() {
let v1 = vec![1.0, 2.0, 3.0];
let v2 = vec![4.0, 5.0, 6.0];
// L2sq = (4-1)^2 + (5-2)^2 + (6-3)^2 = 9 + 9 + 9 = 27
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::L2sq);
assert!((dist - 27.0).abs() < 1e-6);
}
#[test]
fn test_compute_distance_cosine() {
let v1 = vec![1.0, 0.0, 0.0];
let v2 = vec![0.0, 1.0, 0.0];
// Orthogonal vectors have cosine similarity of 0, distance of 1
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::Cosine);
assert!((dist - 1.0).abs() < 1e-6);
}
#[test]
fn test_compute_distance_inner_product() {
let v1 = vec![1.0, 2.0, 3.0];
let v2 = vec![4.0, 5.0, 6.0];
// Inner product = 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
// Distance is negated: -32
let dist = compute_distance(&v1, &v2, VectorDistanceMetric::InnerProduct);
assert!((dist - (-32.0)).abs() < 1e-6);
}
#[test]
fn test_compute_distance_empty_vectors() {
// Empty vectors should return 0.0 uniformly for all metrics
assert_eq!(compute_distance(&[], &[], VectorDistanceMetric::L2sq), 0.0);
assert_eq!(
compute_distance(&[], &[], VectorDistanceMetric::Cosine),
0.0
);
assert_eq!(
compute_distance(&[], &[], VectorDistanceMetric::InnerProduct),
0.0
);
}
}

View File

@@ -16,7 +16,7 @@ use std::collections::HashMap;
use std::sync::Arc;
use common_wal::config::kafka::DatanodeKafkaConfig;
use common_wal::config::kafka::common::{DEFAULT_BACKOFF_CONFIG, DEFAULT_CONNECT_TIMEOUT};
use common_wal::config::kafka::common::DEFAULT_BACKOFF_CONFIG;
use dashmap::DashMap;
use rskafka::client::ClientBuilder;
use rskafka::client::partition::{Compression, PartitionClient, UnknownTopicHandling};
@@ -79,7 +79,8 @@ impl ClientManager {
// Sets backoff config for the top-level kafka client and all clients constructed by it.
let mut builder = ClientBuilder::new(config.connection.broker_endpoints.clone())
.backoff_config(DEFAULT_BACKOFF_CONFIG)
.connect_timeout(Some(DEFAULT_CONNECT_TIMEOUT));
.connect_timeout(Some(config.connection.connect_timeout))
.timeout(Some(config.connection.timeout));
if let Some(sasl) = &config.connection.sasl {
builder = builder.sasl_config(sasl.config.clone().into_sasl_config());
};

View File

@@ -14,7 +14,6 @@
use std::net::SocketAddr;
use std::sync::Arc;
use std::time::Duration;
use api::v1::meta::cluster_server::ClusterServer;
use api::v1::meta::heartbeat_server::HeartbeatServer;
@@ -60,11 +59,6 @@ use crate::service::admin::admin_axum_router;
use crate::utils::etcd::create_etcd_client_with_tls;
use crate::{Result, error};
/// The default keep-alive interval for gRPC.
const DEFAULT_GRPC_KEEP_ALIVE_INTERVAL: Duration = Duration::from_secs(10);
/// The default keep-alive timeout for gRPC.
const DEFAULT_GRPC_KEEP_ALIVE_TIMEOUT: Duration = Duration::from_secs(10);
pub struct MetasrvInstance {
metasrv: Arc<Metasrv>,
@@ -255,8 +249,8 @@ pub fn router(metasrv: Arc<Metasrv>) -> Router {
// for admin services
.accept_http1(true)
// For quick network failures detection.
.http2_keepalive_interval(Some(DEFAULT_GRPC_KEEP_ALIVE_INTERVAL))
.http2_keepalive_timeout(Some(DEFAULT_GRPC_KEEP_ALIVE_TIMEOUT));
.http2_keepalive_interval(Some(metasrv.options().grpc.http2_keep_alive_interval))
.http2_keepalive_timeout(Some(metasrv.options().grpc.http2_keep_alive_timeout));
let router = add_compressed_service!(router, HeartbeatServer::from_arc(metasrv.clone()));
let router = add_compressed_service!(router, StoreServer::from_arc(metasrv.clone()));
let router = add_compressed_service!(router, ClusterServer::from_arc(metasrv.clone()));
@@ -273,8 +267,12 @@ pub async fn metasrv_builder(
(Some(kv_backend), _) => (kv_backend, None),
(None, BackendImpl::MemoryStore) => (Arc::new(MemoryKvBackend::new()) as _, None),
(None, BackendImpl::EtcdStore) => {
let etcd_client =
create_etcd_client_with_tls(&opts.store_addrs, opts.backend_tls.as_ref()).await?;
let etcd_client = create_etcd_client_with_tls(
&opts.store_addrs,
&opts.backend_client,
opts.backend_tls.as_ref(),
)
.await?;
let kv_backend = EtcdStore::with_etcd_client(etcd_client.clone(), opts.max_txn_ops);
let election = EtcdElection::with_etcd_client(
&opts.grpc.server_addr,

View File

@@ -16,13 +16,9 @@ pub mod lease;
pub mod node_info;
pub mod utils;
use std::time::Duration;
use api::v1::meta::heartbeat_request::NodeWorkloads;
use common_error::ext::BoxedError;
use common_meta::distributed_time_constants::{
DATANODE_LEASE_SECS, FLOWNODE_LEASE_SECS, FRONTEND_HEARTBEAT_INTERVAL_MILLIS,
};
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::error::Result;
use common_meta::peer::{Peer, PeerDiscovery, PeerResolver};
use common_meta::{DatanodeId, FlownodeId};
@@ -38,7 +34,7 @@ impl PeerDiscovery for MetaPeerClient {
utils::alive_frontends(
&DefaultSystemTimer,
self,
Duration::from_millis(FRONTEND_HEARTBEAT_INTERVAL_MILLIS),
default_distributed_time_constants().frontend_heartbeat_interval,
)
.await
.map_err(BoxedError::new)
@@ -52,7 +48,7 @@ impl PeerDiscovery for MetaPeerClient {
utils::alive_datanodes(
&DefaultSystemTimer,
self,
Duration::from_secs(DATANODE_LEASE_SECS),
default_distributed_time_constants().datanode_lease,
filter,
)
.await
@@ -67,7 +63,7 @@ impl PeerDiscovery for MetaPeerClient {
utils::alive_flownodes(
&DefaultSystemTimer,
self,
Duration::from_secs(FLOWNODE_LEASE_SECS),
default_distributed_time_constants().flownode_lease,
filter,
)
.await

View File

@@ -102,7 +102,7 @@ mod tests {
use api::v1::meta::heartbeat_request::NodeWorkloads;
use api::v1::meta::{DatanodeWorkloads, FlownodeWorkloads};
use common_meta::cluster::{FrontendStatus, NodeInfo, NodeInfoKey, NodeStatus, Role};
use common_meta::distributed_time_constants::FRONTEND_HEARTBEAT_INTERVAL_MILLIS;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::kv_backend::ResettableKvBackendRef;
use common_meta::peer::{Peer, PeerDiscovery};
use common_meta::rpc::store::PutRequest;
@@ -473,8 +473,10 @@ mod tests {
let client = create_meta_peer_client();
let in_memory = client.memory_backend();
let frontend_heartbeat_interval =
default_distributed_time_constants().frontend_heartbeat_interval;
let last_activity_ts =
current_time_millis() - FRONTEND_HEARTBEAT_INTERVAL_MILLIS as i64 - 1000;
current_time_millis() - frontend_heartbeat_interval.as_millis() as i64 - 1000;
let active_frontend_node = NodeInfo {
peer: Peer {
id: 0,

View File

@@ -15,7 +15,6 @@
use std::collections::VecDeque;
use std::time::Duration;
use common_meta::distributed_time_constants;
use serde::{Deserialize, Serialize};
const FIRST_HEARTBEAT_ESTIMATE_MILLIS: i64 = 1000;
@@ -79,9 +78,7 @@ impl Default for PhiAccrualFailureDetectorOptions {
Self {
threshold: 8_f32,
min_std_deviation: Duration::from_millis(100),
acceptable_heartbeat_pause: Duration::from_secs(
distributed_time_constants::DATANODE_LEASE_SECS,
),
acceptable_heartbeat_pause: Duration::from_secs(10),
}
}
}

View File

@@ -134,7 +134,7 @@ mod test {
use std::sync::Arc;
use common_meta::datanode::{RegionManifestInfo, RegionStat, Stat};
use common_meta::distributed_time_constants;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::key::TableMetadataManager;
use common_meta::key::table_route::TableRouteValue;
use common_meta::key::test_utils::new_test_table_info;
@@ -236,7 +236,7 @@ mod test {
let opening_region_keeper = Arc::new(MemoryRegionKeeper::default());
let handler = RegionLeaseHandler::new(
distributed_time_constants::REGION_LEASE_SECS,
default_distributed_time_constants().region_lease.as_secs(),
table_metadata_manager.clone(),
opening_region_keeper.clone(),
None,
@@ -266,7 +266,7 @@ mod test {
assert_eq!(
acc.region_lease.as_ref().unwrap().lease_seconds,
distributed_time_constants::REGION_LEASE_SECS
default_distributed_time_constants().region_lease.as_secs()
);
assert_region_lease(
@@ -300,7 +300,7 @@ mod test {
assert_eq!(
acc.region_lease.as_ref().unwrap().lease_seconds,
distributed_time_constants::REGION_LEASE_SECS
default_distributed_time_constants().region_lease.as_secs()
);
assert_region_lease(
@@ -379,7 +379,7 @@ mod test {
});
let handler = RegionLeaseHandler::new(
distributed_time_constants::REGION_LEASE_SECS,
default_distributed_time_constants().region_lease.as_secs(),
table_metadata_manager.clone(),
Default::default(),
None,
@@ -461,7 +461,7 @@ mod test {
..Default::default()
});
let handler = RegionLeaseHandler::new(
distributed_time_constants::REGION_LEASE_SECS,
default_distributed_time_constants().region_lease.as_secs(),
table_metadata_manager.clone(),
Default::default(),
None,

View File

@@ -27,7 +27,7 @@ use common_event_recorder::EventRecorderOptions;
use common_greptimedb_telemetry::GreptimeDBTelemetryTask;
use common_meta::cache_invalidator::CacheInvalidatorRef;
use common_meta::ddl_manager::DdlManagerRef;
use common_meta::distributed_time_constants;
use common_meta::distributed_time_constants::{self, default_distributed_time_constants};
use common_meta::key::TableMetadataManagerRef;
use common_meta::key::runtime_switch::RuntimeSwitchManagerRef;
use common_meta::kv_backend::{KvBackendRef, ResettableKvBackend, ResettableKvBackendRef};
@@ -121,6 +121,27 @@ impl Default for StatsPersistenceOptions {
}
}
#[derive(Clone, PartialEq, Serialize, Deserialize, Debug)]
#[serde(default)]
pub struct BackendClientOptions {
#[serde(with = "humantime_serde")]
pub keep_alive_timeout: Duration,
#[serde(with = "humantime_serde")]
pub keep_alive_interval: Duration,
#[serde(with = "humantime_serde")]
pub connect_timeout: Duration,
}
impl Default for BackendClientOptions {
fn default() -> Self {
Self {
keep_alive_interval: Duration::from_secs(10),
keep_alive_timeout: Duration::from_secs(3),
connect_timeout: Duration::from_secs(3),
}
}
}
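// A minimal sketch, assuming only the fields defined above: overriding the etcd
// client keep-alive interval while keeping the other defaults (10s/3s/3s).
fn backend_client_with_longer_keep_alive() -> BackendClientOptions {
    BackendClientOptions {
        keep_alive_interval: Duration::from_secs(30),
        ..Default::default()
    }
}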
#[derive(Clone, PartialEq, Serialize, Deserialize)]
#[serde(default)]
pub struct MetasrvOptions {
@@ -136,12 +157,22 @@ pub struct MetasrvOptions {
/// Only applicable when using PostgreSQL or MySQL as the metadata store
#[serde(default)]
pub backend_tls: Option<TlsOption>,
/// The backend client options.
/// Currently, only applicable when using etcd as the metadata store.
#[serde(default)]
pub backend_client: BackendClientOptions,
/// The type of selector.
pub selector: SelectorType,
/// Whether to use the memory store.
pub use_memory_store: bool,
/// Whether to enable region failover.
pub enable_region_failover: bool,
/// The base heartbeat interval.
///
/// This value is used to calculate the distributed time constants for components.
/// For example, the region lease time is `heartbeat_interval * 3 + Duration::from_secs(1)`.
#[serde(with = "humantime_serde")]
pub heartbeat_interval: Duration,
/// The delay before starting region failure detection.
/// This delay helps prevent Metasrv from triggering unnecessary region failovers before all Datanodes are fully started.
/// Especially useful when the cluster is not deployed with GreptimeDB Operator and maintenance mode is not enabled.
@@ -240,7 +271,9 @@ impl fmt::Debug for MetasrvOptions {
.field("tracing", &self.tracing)
.field("backend", &self.backend)
.field("event_recorder", &self.event_recorder)
.field("stats_persistence", &self.stats_persistence);
.field("stats_persistence", &self.stats_persistence)
.field("heartbeat_interval", &self.heartbeat_interval)
.field("backend_client", &self.backend_client);
#[cfg(any(feature = "pg_kvbackend", feature = "mysql_kvbackend"))]
debug_struct.field("meta_table_name", &self.meta_table_name);
@@ -270,6 +303,7 @@ impl Default for MetasrvOptions {
selector: SelectorType::default(),
use_memory_store: false,
enable_region_failover: false,
heartbeat_interval: distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
region_failure_detector_initialization_delay: Duration::from_secs(10 * 60),
allow_region_failover_on_local_wal: false,
grpc: GrpcOptions {
@@ -307,6 +341,7 @@ impl Default for MetasrvOptions {
event_recorder: EventRecorderOptions::default(),
stats_persistence: StatsPersistenceOptions::default(),
gc: GcSchedulerOptions::default(),
backend_client: BackendClientOptions::default(),
}
}
}
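// A worked sketch of the relationship described in the `heartbeat_interval`
// doc comment above. The helper and the 3s value are illustrative only; the
// actual values come from `default_distributed_time_constants()`.
fn region_lease_for(heartbeat_interval: Duration) -> Duration {
    // e.g. a 3s base interval yields 3 * 3s + 1s = 10s of region lease.
    heartbeat_interval * 3 + Duration::from_secs(1)
}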
@@ -747,7 +782,7 @@ impl Metasrv {
&DefaultSystemTimer,
self.meta_peer_client.as_ref(),
peer_id,
Duration::from_secs(distributed_time_constants::DATANODE_LEASE_SECS),
default_distributed_time_constants().datanode_lease,
)
.await
}

View File

@@ -29,7 +29,7 @@ use common_meta::ddl::{
DdlContext, NoopRegionFailureDetectorControl, RegionFailureDetectorControllerRef,
};
use common_meta::ddl_manager::{DdlManager, DdlManagerConfiguratorRef};
use common_meta::distributed_time_constants::{self};
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::key::TableMetadataManager;
use common_meta::key::flow::FlowMetadataManager;
use common_meta::key::flow::flow_state::FlowStateManager;
@@ -513,7 +513,7 @@ impl MetasrvBuilder {
Some(handler_group_builder) => handler_group_builder,
None => {
let region_lease_handler = RegionLeaseHandler::new(
distributed_time_constants::REGION_LEASE_SECS,
default_distributed_time_constants().region_lease.as_secs(),
table_metadata_manager.clone(),
memory_region_keeper.clone(),
customized_region_lease_renewer,

View File

@@ -921,7 +921,7 @@ mod tests {
use std::assert_matches::assert_matches;
use std::sync::Arc;
use common_meta::distributed_time_constants::REGION_LEASE_SECS;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::instruction::Instruction;
use common_meta::key::test_utils::new_test_table_info;
use common_meta::rpc::router::{Region, RegionRoute};
@@ -1192,8 +1192,10 @@ mod tests {
.run_once()
.await;
let region_lease = default_distributed_time_constants().region_lease.as_secs();
// Ensure it didn't run into the slow path.
assert!(timer.elapsed().as_secs() < REGION_LEASE_SECS / 2);
assert!(timer.elapsed().as_secs() < region_lease / 2);
runner.suite.verify_table_metadata().await;
}
@@ -1539,8 +1541,9 @@ mod tests {
.run_once()
.await;
let region_lease = default_distributed_time_constants().region_lease.as_secs();
// Ensure it didn't run into the slow path.
assert!(timer.elapsed().as_secs() < REGION_LEASE_SECS);
assert!(timer.elapsed().as_secs() < region_lease);
runner.suite.verify_table_metadata().await;
}
}

View File

@@ -13,11 +13,10 @@
// limitations under the License.
use std::any::Any;
use std::time::Duration;
use api::v1::meta::MailboxMessage;
use common_meta::RegionIdent;
use common_meta::distributed_time_constants::REGION_LEASE_SECS;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::instruction::{Instruction, InstructionReply, SimpleReply};
use common_procedure::{Context as ProcedureContext, Status};
use common_telemetry::{info, warn};
@@ -30,9 +29,6 @@ use crate::procedure::region_migration::migration_end::RegionMigrationEnd;
use crate::procedure::region_migration::{Context, State};
use crate::service::mailbox::Channel;
/// Uses lease time of a region as the timeout of closing a downgraded region.
const CLOSE_DOWNGRADED_REGION_TIMEOUT: Duration = Duration::from_secs(REGION_LEASE_SECS);
#[derive(Debug, Serialize, Deserialize)]
pub struct CloseDowngradedRegion;
@@ -112,7 +108,7 @@ impl CloseDowngradedRegion {
let ch = Channel::Datanode(downgrade_leader_datanode.id);
let receiver = ctx
.mailbox
.send(&ch, msg, CLOSE_DOWNGRADED_REGION_TIMEOUT)
.send(&ch, msg, default_distributed_time_constants().region_lease)
.await?;
match receiver.await {

View File

@@ -17,7 +17,7 @@ use std::time::Duration;
use api::v1::meta::MailboxMessage;
use common_error::ext::BoxedError;
use common_meta::distributed_time_constants::REGION_LEASE_SECS;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::instruction::{
DowngradeRegion, DowngradeRegionReply, DowngradeRegionsReply, Instruction, InstructionReply,
};
@@ -64,7 +64,7 @@ impl State for DowngradeLeaderRegion {
let now = Instant::now();
// Ensures the `leader_region_lease_deadline` must exist after recovering.
ctx.volatile_ctx
.set_leader_region_lease_deadline(Duration::from_secs(REGION_LEASE_SECS));
.set_leader_region_lease_deadline(default_distributed_time_constants().region_lease);
match self.downgrade_region_with_retry(ctx).await {
Ok(_) => {
@@ -277,14 +277,14 @@ impl DowngradeLeaderRegion {
if let Some(last_connection_at) = last_connection_at {
let now = current_time_millis();
let elapsed = now - last_connection_at;
let region_lease = Duration::from_secs(REGION_LEASE_SECS);
let region_lease = default_distributed_time_constants().region_lease;
// It's safe to update the region leader lease deadline here because:
// 1. The old region leader has already been marked as downgraded in metadata,
// which means any attempts to renew its lease will be rejected.
// 2. The pusher disconnect time record only gets removed when the datanode (from_peer)
// establishes a new heartbeat connection stream.
if elapsed >= (REGION_LEASE_SECS * 1000) as i64 {
if elapsed >= (region_lease.as_secs() * 1000) as i64 {
ctx.volatile_ctx.reset_leader_region_lease_deadline();
info!(
"Datanode {}({}) has been disconnected for longer than the region lease period ({:?}), reset leader region lease deadline to None, region: {:?}",
@@ -697,7 +697,8 @@ mod tests {
let procedure_ctx = new_procedure_context();
let (next, _) = state.next(&mut ctx, &procedure_ctx).await.unwrap();
let elapsed = timer.elapsed().as_secs();
assert!(elapsed < REGION_LEASE_SECS / 2);
let region_lease = default_distributed_time_constants().region_lease.as_secs();
assert!(elapsed < region_lease / 2);
assert_eq!(
ctx.volatile_ctx
.leader_region_last_entry_ids

View File

@@ -14,11 +14,10 @@
use std::any::Any;
use std::ops::Div;
use std::time::Duration;
use api::v1::meta::MailboxMessage;
use common_meta::RegionIdent;
use common_meta::distributed_time_constants::REGION_LEASE_SECS;
use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_meta::instruction::{Instruction, InstructionReply, OpenRegion, SimpleReply};
use common_meta::key::datanode_table::RegionInfo;
use common_procedure::{Context as ProcedureContext, Status};
@@ -33,9 +32,6 @@ use crate::procedure::region_migration::flush_leader_region::PreFlushRegion;
use crate::procedure::region_migration::{Context, State};
use crate::service::mailbox::Channel;
/// Uses lease time of a region as the timeout of opening a candidate region.
const OPEN_CANDIDATE_REGION_TIMEOUT: Duration = Duration::from_secs(REGION_LEASE_SECS);
#[derive(Debug, Serialize, Deserialize)]
pub struct OpenCandidateRegion;
@@ -157,7 +153,9 @@ impl OpenCandidateRegion {
.context(error::ExceededDeadlineSnafu {
operation: "Open candidate region",
})?;
let operation_timeout = operation_timeout.div(2).max(OPEN_CANDIDATE_REGION_TIMEOUT);
let operation_timeout = operation_timeout
.div(2)
.max(default_distributed_time_constants().region_lease);
let ch = Channel::Datanode(candidate.id);
let now = Instant::now();
let receiver = ctx.mailbox.send(&ch, msg, operation_timeout).await?;

View File

@@ -99,6 +99,7 @@ impl heartbeat_server::Heartbeat for Metasrv {
error!("Client disconnected: broken pipe");
break;
}
error!(err; "Sending heartbeat response error");
if tx.send(Err(err)).await.is_err() {
info!("ReceiverStream was dropped; shutting down");

View File

@@ -12,17 +12,18 @@
// See the License for the specific language governing permissions and
// limitations under the License.
use common_meta::distributed_time_constants::default_etcd_client_options;
use common_meta::kv_backend::etcd::create_etcd_tls_options;
use etcd_client::Client;
use etcd_client::{Client, ConnectOptions};
use servers::tls::{TlsMode, TlsOption};
use snafu::ResultExt;
use crate::error::{self, BuildTlsOptionsSnafu, Result};
use crate::metasrv::BackendClientOptions;
/// Creates an etcd client with TLS configuration.
pub async fn create_etcd_client_with_tls(
store_addrs: &[String],
client_options: &BackendClientOptions,
tls_config: Option<&TlsOption>,
) -> Result<Client> {
let etcd_endpoints = store_addrs
@@ -31,7 +32,12 @@ pub async fn create_etcd_client_with_tls(
.filter(|x| !x.is_empty())
.collect::<Vec<_>>();
let mut connect_options = default_etcd_client_options();
let mut connect_options = ConnectOptions::new()
.with_keep_alive_while_idle(true)
.with_keep_alive(
client_options.keep_alive_interval,
client_options.keep_alive_timeout,
);
if let Some(tls_config) = tls_config
&& let Some(tls_options) = create_etcd_tls_options(&convert_tls_option(tls_config))
.context(BuildTlsOptionsSnafu)?

View File

@@ -48,7 +48,7 @@ impl IndexValueCodec {
) -> Result<()> {
ensure!(!value.is_null(), IndexEncodeNullSnafu);
if field.data_type().is_string() {
if field.encode_data_type().is_string() {
let value = value
.try_into_string()
.context(FieldTypeMismatchSnafu)?

View File

@@ -57,15 +57,20 @@ impl SortField {
&self.data_type
}
pub fn estimated_size(&self) -> usize {
/// Returns the physical data type used to encode the field.
///
/// For example, a dictionary field will be encoded as its value type.
pub fn encode_data_type(&self) -> &ConcreteDataType {
match &self.data_type {
ConcreteDataType::Dictionary(dict_type) => {
Self::estimated_size_by_type(dict_type.value_type())
}
data_type => Self::estimated_size_by_type(data_type),
ConcreteDataType::Dictionary(dict_type) => dict_type.value_type(),
_ => &self.data_type,
}
}
pub fn estimated_size(&self) -> usize {
Self::estimated_size_by_type(self.encode_data_type())
}
fn estimated_size_by_type(data_type: &ConcreteDataType) -> usize {
match data_type {
ConcreteDataType::Boolean(_) => 2,
@@ -98,12 +103,7 @@ impl SortField {
serializer: &mut Serializer<&mut Vec<u8>>,
value: &ValueRef,
) -> Result<()> {
match self.data_type() {
ConcreteDataType::Dictionary(dict_type) => {
Self::serialize_by_type(dict_type.value_type(), serializer, value)
}
data_type => Self::serialize_by_type(data_type, serializer, value),
}
Self::serialize_by_type(self.encode_data_type(), serializer, value)
}
fn serialize_by_type(
@@ -194,12 +194,7 @@ impl SortField {
/// Deserialize a value from the deserializer.
pub fn deserialize<B: Buf>(&self, deserializer: &mut Deserializer<B>) -> Result<Value> {
match &self.data_type {
ConcreteDataType::Dictionary(dict_type) => {
Self::deserialize_by_type(dict_type.value_type(), deserializer)
}
data_type => Self::deserialize_by_type(data_type, deserializer),
}
Self::deserialize_by_type(self.encode_data_type(), deserializer)
}
fn deserialize_by_type<B: Buf>(
@@ -301,12 +296,7 @@ impl SortField {
return Ok(1);
}
match &self.data_type {
ConcreteDataType::Dictionary(dict_type) => {
Self::skip_deserialize_by_type(dict_type.value_type(), bytes, deserializer)
}
data_type => Self::skip_deserialize_by_type(data_type, bytes, deserializer),
}
Self::skip_deserialize_by_type(self.encode_data_type(), bytes, deserializer)
}
fn skip_deserialize_by_type(

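// A hedged sketch of the dictionary handling described by `encode_data_type()`
// above: a dictionary sort field is encoded using its value type. The
// `SortField::new` and `ConcreteDataType::dictionary_datatype` constructors
// are assumed here for illustration.
fn dictionary_field_encodes_as_value_type() {
    let field = SortField::new(ConcreteDataType::dictionary_datatype(
        ConcreteDataType::uint32_datatype(),
        ConcreteDataType::string_datatype(),
    ));
    assert!(field.encode_data_type().is_string());
}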
View File

@@ -25,7 +25,7 @@ use tokio::sync::mpsc;
use crate::compaction::compactor::{CompactionRegion, Compactor};
use crate::compaction::memory_manager::{CompactionMemoryGuard, CompactionMemoryManager};
use crate::compaction::picker::{CompactionTask, PickerOutput};
use crate::error::{CompactRegionSnafu, CompactionMemoryExhaustedSnafu, MemoryAcquireFailedSnafu};
use crate::error::{CompactRegionSnafu, CompactionMemoryExhaustedSnafu};
use crate::manifest::action::{RegionEdit, RegionMetaAction, RegionMetaActionList};
use crate::metrics::{COMPACTION_FAILURE_COUNT, COMPACTION_MEMORY_WAIT, COMPACTION_STAGE_ELAPSED};
use crate::region::RegionRoleState;
@@ -95,80 +95,16 @@ impl CompactionTaskImpl {
async fn acquire_memory_with_policy(&self) -> error::Result<CompactionMemoryGuard> {
let region_id = self.compaction_region.region_id;
let requested_bytes = self.estimated_memory_bytes;
let limit_bytes = self.memory_manager.limit_bytes();
let policy = self.memory_policy;
if limit_bytes > 0 && requested_bytes > limit_bytes {
warn!(
"Compaction for region {} requires {} bytes but limit is {} bytes; cannot satisfy request",
region_id, requested_bytes, limit_bytes
);
return Err(CompactionMemoryExhaustedSnafu {
let _timer = COMPACTION_MEMORY_WAIT.start_timer();
self.memory_manager
.acquire_with_policy(requested_bytes, policy)
.await
.context(CompactionMemoryExhaustedSnafu {
region_id,
required_bytes: requested_bytes,
limit_bytes,
policy: "exceed_limit".to_string(),
}
.build());
}
match self.memory_policy {
OnExhaustedPolicy::Wait {
timeout: wait_timeout,
} => {
let timer = COMPACTION_MEMORY_WAIT.start_timer();
match tokio::time::timeout(
wait_timeout,
self.memory_manager.acquire(requested_bytes),
)
.await
{
Ok(Ok(guard)) => {
timer.observe_duration();
Ok(guard)
}
Ok(Err(e)) => {
timer.observe_duration();
Err(e).with_context(|_| MemoryAcquireFailedSnafu {
region_id,
policy: format!("wait_timeout({}ms)", wait_timeout.as_millis()),
})
}
Err(_) => {
timer.observe_duration();
warn!(
"Compaction for region {} waited {:?} for {} bytes but timed out",
region_id, wait_timeout, requested_bytes
);
CompactionMemoryExhaustedSnafu {
region_id,
required_bytes: requested_bytes,
limit_bytes,
policy: format!("wait_timeout({}ms)", wait_timeout.as_millis()),
}
.fail()
}
}
}
OnExhaustedPolicy::Fail => {
// Try to acquire, fail immediately if not available
self.memory_manager
.try_acquire(requested_bytes)
.ok_or_else(|| {
warn!(
"Compaction memory exhausted for region {} (policy=fail, need {} bytes, limit {} bytes)",
region_id, requested_bytes, limit_bytes
);
CompactionMemoryExhaustedSnafu {
region_id,
required_bytes: requested_bytes,
limit_bytes,
policy: "fail".to_string(),
}
.build()
})
}
}
policy: format!("{policy:?}"),
})
}
/// Remove expired ssts files, update manifest immediately

View File

@@ -872,9 +872,9 @@ StorageSstEntry { file_path: "test/11_0000000002/index/<file_id>.puffin", file_s
StorageSstEntry { file_path: "test/22_0000000042/<file_id>.parquet", file_size: None, last_modified_ms: None, node_id: None }
StorageSstEntry { file_path: "test/22_0000000042/index/<file_id>.puffin", file_size: None, last_modified_ms: None, node_id: None }"#).await;
test_list_ssts_with_format(true, r#"
ManifestSstEntry { table_dir: "test/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/11_0000000001/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/11_0000000001/index/<file_id>.puffin"), index_file_size: Some(292), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/11_0000000002/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/11_0000000002/index/<file_id>.puffin"), index_file_size: Some(292), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/22_0000000042/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/22_0000000042/index/<file_id>.puffin"), index_file_size: Some(292), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }"#,
ManifestSstEntry { table_dir: "test/", region_id: 47244640257(11, 1), table_id: 11, region_number: 1, region_group: 0, region_sequence: 1, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/11_0000000001/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/11_0000000001/index/<file_id>.puffin"), index_file_size: Some(250), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 47244640257(11, 1), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test/", region_id: 47244640258(11, 2), table_id: 11, region_number: 2, region_group: 0, region_sequence: 2, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/11_0000000002/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/11_0000000002/index/<file_id>.puffin"), index_file_size: Some(250), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 47244640258(11, 2), node_id: None, visible: true }
ManifestSstEntry { table_dir: "test/", region_id: 94489280554(22, 42), table_id: 22, region_number: 42, region_group: 0, region_sequence: 42, file_id: "<file_id>", index_version: 0, level: 0, file_path: "test/22_0000000042/<file_id>.parquet", file_size: 2837, index_file_path: Some("test/22_0000000042/index/<file_id>.puffin"), index_file_size: Some(250), num_rows: 10, num_row_groups: 1, num_series: Some(1), min_ts: 0::Millisecond, max_ts: 9000::Millisecond, sequence: Some(10), origin_region_id: 94489280554(22, 42), node_id: None, visible: true }"#,
r#"
StorageSstEntry { file_path: "test/11_0000000001/<file_id>.parquet", file_size: None, last_modified_ms: None, node_id: None }
StorageSstEntry { file_path: "test/11_0000000001/index/<file_id>.puffin", file_size: None, last_modified_ms: None, node_id: None }

View File

@@ -1042,20 +1042,8 @@ pub enum Error {
#[snafu(display("Manual compaction is override by following operations."))]
ManualCompactionOverride {},
#[snafu(display(
"Compaction memory limit exceeded for region {region_id}: required {required_bytes} bytes, limit {limit_bytes} bytes (policy: {policy})",
))]
#[snafu(display("Compaction memory exhausted for region {region_id} (policy: {policy})"))]
CompactionMemoryExhausted {
region_id: RegionId,
required_bytes: u64,
limit_bytes: u64,
policy: String,
#[snafu(implicit)]
location: Location,
},
#[snafu(display("Failed to acquire memory for region {region_id} (policy: {policy})"))]
MemoryAcquireFailed {
region_id: RegionId,
policy: String,
#[snafu(source)]
@@ -1359,9 +1347,7 @@ impl ErrorExt for Error {
ManualCompactionOverride {} => StatusCode::Cancelled,
CompactionMemoryExhausted { .. } => StatusCode::RuntimeResourcesExhausted,
MemoryAcquireFailed { source, .. } => source.status_code(),
CompactionMemoryExhausted { source, .. } => source.status_code(),
IncompatibleWalProviderChange { .. } => StatusCode::InvalidArguments,

View File

@@ -774,7 +774,12 @@ fn memtable_flat_sources(
let iter = only_range.build_record_batch_iter(None)?;
// Dedup according to append mode and merge mode.
// Even single range may have duplicate rows.
let iter = maybe_dedup_one(options, field_column_start, iter);
let iter = maybe_dedup_one(
options.append_mode,
options.merge_mode(),
field_column_start,
iter,
);
flat_sources.sources.push(FlatSource::Iter(iter));
};
} else {
@@ -842,17 +847,18 @@ fn merge_and_dedup(
Ok(maybe_dedup)
}
fn maybe_dedup_one(
options: &RegionOptions,
pub fn maybe_dedup_one(
append_mode: bool,
merge_mode: MergeMode,
field_column_start: usize,
input_iter: BoxedRecordBatchIterator,
) -> BoxedRecordBatchIterator {
if options.append_mode {
if append_mode {
// No dedup in append mode
input_iter
} else {
// Dedup according to merge mode.
match options.merge_mode() {
match merge_mode {
MergeMode::LastRow => {
Box::new(FlatDedupIterator::new(input_iter, FlatLastRow::new(false)))
}

View File

@@ -540,7 +540,7 @@ impl LocalGcWorker {
fn filter_deletable_files(
&self,
entries: Vec<Entry>,
in_use_filenames: &HashSet<&FileId>,
in_use_filenames: &HashSet<FileId>,
may_linger_filenames: &HashSet<&FileId>,
eligible_for_removal: &HashSet<&FileId>,
unknown_file_may_linger_until: chrono::DateTime<chrono::Utc>,
@@ -641,9 +641,6 @@ impl LocalGcWorker {
.flatten()
.collect::<HashSet<_>>();
// in-use filenames, including sst and index files
let in_use_filenames = in_used.iter().collect::<HashSet<_>>();
// When full_file_listing is false, skip expensive list operations and only delete
// files that are tracked in recently_removed_files
if !self.full_file_listing {
@@ -653,7 +650,7 @@ impl LocalGcWorker {
// 3. Have passed the lingering time
let files_to_delete: Vec<FileId> = eligible_for_removal
.iter()
.filter(|file_id| !in_use_filenames.contains(*file_id))
.filter(|file_id| !in_used.contains(*file_id))
.map(|&f| *f)
.collect();
@@ -672,7 +669,7 @@ impl LocalGcWorker {
let (all_unused_files_ready_for_delete, all_in_exist_linger_files) = self
.filter_deletable_files(
all_entries,
&in_use_filenames,
in_used,
&may_linger_filenames,
&eligible_for_removal,
unknown_file_may_linger_until,

View File

@@ -55,10 +55,8 @@ pub mod time_partition;
pub mod time_series;
pub(crate) mod version;
#[cfg(any(test, feature = "test"))]
pub use bulk::part::BulkPart;
pub use bulk::part::{
BulkPartEncoder, BulkPartMeta, UnorderedPart, record_batch_estimated_size,
BulkPart, BulkPartEncoder, BulkPartMeta, UnorderedPart, record_batch_estimated_size,
sort_primary_key_record_batch,
};
#[cfg(any(test, feature = "test"))]

View File

@@ -668,10 +668,10 @@ impl BulkMemtable {
}
/// Iterator builder for bulk range
struct BulkRangeIterBuilder {
part: BulkPart,
context: Arc<BulkIterContext>,
sequence: Option<SequenceRange>,
pub struct BulkRangeIterBuilder {
pub part: BulkPart,
pub context: Arc<BulkIterContext>,
pub sequence: Option<SequenceRange>,
}
impl IterBuilder for BulkRangeIterBuilder {
@@ -1188,7 +1188,6 @@ impl MemtableBuilder for BulkMemtableBuilder {
#[cfg(test)]
mod tests {
use mito_codec::row_converter::build_primary_key_codec;
use super::*;

View File

@@ -1211,343 +1211,24 @@ impl BulkPartEncoder {
}
}
/// Converts mutations to record batches.
fn mutations_to_record_batch(
mutations: &[Mutation],
metadata: &RegionMetadataRef,
pk_encoder: &DensePrimaryKeyCodec,
dedup: bool,
) -> Result<Option<(RecordBatch, i64, i64)>> {
let total_rows: usize = mutations
.iter()
.map(|m| m.rows.as_ref().map(|r| r.rows.len()).unwrap_or(0))
.sum();
if total_rows == 0 {
return Ok(None);
}
let mut pk_builder = BinaryBuilder::with_capacity(total_rows, 0);
let mut ts_vector: Box<dyn MutableVector> = metadata
.time_index_column()
.column_schema
.data_type
.create_mutable_vector(total_rows);
let mut sequence_builder = UInt64Builder::with_capacity(total_rows);
let mut op_type_builder = UInt8Builder::with_capacity(total_rows);
let mut field_builders: Vec<Box<dyn MutableVector>> = metadata
.field_columns()
.map(|f| f.column_schema.data_type.create_mutable_vector(total_rows))
.collect();
let mut pk_buffer = vec![];
for m in mutations {
let Some(key_values) = KeyValuesRef::new(metadata, m) else {
continue;
};
for row in key_values.iter() {
pk_buffer.clear();
pk_encoder
.encode_to_vec(row.primary_keys(), &mut pk_buffer)
.context(EncodeSnafu)?;
pk_builder.append_value(pk_buffer.as_bytes());
ts_vector.push_value_ref(&row.timestamp());
sequence_builder.append_value(row.sequence());
op_type_builder.append_value(row.op_type() as u8);
for (builder, field) in field_builders.iter_mut().zip(row.fields()) {
builder.push_value_ref(&field);
}
}
}
let arrow_schema = to_sst_arrow_schema(metadata);
// safety: timestamp column must be valid, and values must not be None.
let timestamp_unit = metadata
.time_index_column()
.column_schema
.data_type
.as_timestamp()
.unwrap()
.unit();
let sorter = ArraysSorter {
encoded_primary_keys: pk_builder.finish(),
timestamp_unit,
timestamp: ts_vector.to_vector().to_arrow_array(),
sequence: sequence_builder.finish(),
op_type: op_type_builder.finish(),
fields: field_builders
.iter_mut()
.map(|f| f.to_vector().to_arrow_array()),
dedup,
arrow_schema,
};
sorter.sort().map(Some)
}
struct ArraysSorter<I> {
encoded_primary_keys: BinaryArray,
timestamp_unit: TimeUnit,
timestamp: ArrayRef,
sequence: UInt64Array,
op_type: UInt8Array,
fields: I,
dedup: bool,
arrow_schema: SchemaRef,
}
impl<I> ArraysSorter<I>
where
I: Iterator<Item = ArrayRef>,
{
/// Converts arrays to record batch.
fn sort(self) -> Result<(RecordBatch, i64, i64)> {
debug_assert!(!self.timestamp.is_empty());
debug_assert!(self.timestamp.len() == self.sequence.len());
debug_assert!(self.timestamp.len() == self.op_type.len());
debug_assert!(self.timestamp.len() == self.encoded_primary_keys.len());
let timestamp_iter = timestamp_array_to_iter(self.timestamp_unit, &self.timestamp);
let (mut min_timestamp, mut max_timestamp) = (i64::MAX, i64::MIN);
let mut to_sort = self
.encoded_primary_keys
.iter()
.zip(timestamp_iter)
.zip(self.sequence.iter())
.map(|((pk, timestamp), sequence)| {
max_timestamp = max_timestamp.max(*timestamp);
min_timestamp = min_timestamp.min(*timestamp);
(pk, timestamp, sequence)
})
.enumerate()
.collect::<Vec<_>>();
to_sort.sort_unstable_by(|(_, (l_pk, l_ts, l_seq)), (_, (r_pk, r_ts, r_seq))| {
l_pk.cmp(r_pk)
.then(l_ts.cmp(r_ts))
.then(l_seq.cmp(r_seq).reverse())
});
if self.dedup {
// Dedup by timestamps while ignore sequence.
to_sort.dedup_by(|(_, (l_pk, l_ts, _)), (_, (r_pk, r_ts, _))| {
l_pk == r_pk && l_ts == r_ts
});
}
let indices = UInt32Array::from_iter_values(to_sort.iter().map(|v| v.0 as u32));
let pk_dictionary = Arc::new(binary_array_to_dictionary(
// safety: pk must be BinaryArray
arrow::compute::take(
&self.encoded_primary_keys,
&indices,
Some(TakeOptions {
check_bounds: false,
}),
)
.context(ComputeArrowSnafu)?
.as_any()
.downcast_ref::<BinaryArray>()
.unwrap(),
)?) as ArrayRef;
let mut arrays = Vec::with_capacity(self.arrow_schema.fields.len());
for arr in self.fields {
arrays.push(
arrow::compute::take(
&arr,
&indices,
Some(TakeOptions {
check_bounds: false,
}),
)
.context(ComputeArrowSnafu)?,
);
}
let timestamp = arrow::compute::take(
&self.timestamp,
&indices,
Some(TakeOptions {
check_bounds: false,
}),
)
.context(ComputeArrowSnafu)?;
arrays.push(timestamp);
arrays.push(pk_dictionary);
arrays.push(
arrow::compute::take(
&self.sequence,
&indices,
Some(TakeOptions {
check_bounds: false,
}),
)
.context(ComputeArrowSnafu)?,
);
arrays.push(
arrow::compute::take(
&self.op_type,
&indices,
Some(TakeOptions {
check_bounds: false,
}),
)
.context(ComputeArrowSnafu)?,
);
let batch = RecordBatch::try_new(self.arrow_schema, arrays).context(NewRecordBatchSnafu)?;
Ok((batch, min_timestamp, max_timestamp))
}
}
/// Converts timestamp array to an iter of i64 values.
fn timestamp_array_to_iter(
timestamp_unit: TimeUnit,
timestamp: &ArrayRef,
) -> impl Iterator<Item = &i64> {
match timestamp_unit {
// safety: timestamp column must be valid.
TimeUnit::Second => timestamp
.as_any()
.downcast_ref::<TimestampSecondArray>()
.unwrap()
.values()
.iter(),
TimeUnit::Millisecond => timestamp
.as_any()
.downcast_ref::<TimestampMillisecondArray>()
.unwrap()
.values()
.iter(),
TimeUnit::Microsecond => timestamp
.as_any()
.downcast_ref::<TimestampMicrosecondArray>()
.unwrap()
.values()
.iter(),
TimeUnit::Nanosecond => timestamp
.as_any()
.downcast_ref::<TimestampNanosecondArray>()
.unwrap()
.values()
.iter(),
}
}
/// Converts a **sorted** [BinaryArray] to [DictionaryArray].
fn binary_array_to_dictionary(input: &BinaryArray) -> Result<PrimaryKeyArray> {
if input.is_empty() {
return Ok(DictionaryArray::new(
UInt32Array::from(Vec::<u32>::new()),
Arc::new(BinaryArray::from_vec(vec![])) as ArrayRef,
));
}
let mut keys = Vec::with_capacity(16);
let mut values = BinaryBuilder::new();
let mut prev: usize = 0;
keys.push(prev as u32);
values.append_value(input.value(prev));
for current_bytes in input.iter().skip(1) {
// safety: encoded pk must present.
let current_bytes = current_bytes.unwrap();
let prev_bytes = input.value(prev);
if current_bytes != prev_bytes {
values.append_value(current_bytes);
prev += 1;
}
keys.push(prev as u32);
}
Ok(DictionaryArray::new(
UInt32Array::from(keys),
Arc::new(values.finish()) as ArrayRef,
))
}
#[cfg(test)]
mod tests {
use std::collections::VecDeque;
use api::v1::{Row, SemanticType, WriteHint};
use datafusion_common::ScalarValue;
use datatypes::arrow::array::Float64Array;
use datatypes::prelude::{ConcreteDataType, ScalarVector, Value};
use datatypes::prelude::{ConcreteDataType, Value};
use datatypes::schema::ColumnSchema;
use datatypes::vectors::{Float64Vector, TimestampMillisecondVector};
use store_api::metadata::{ColumnMetadata, RegionMetadataBuilder};
use store_api::storage::RegionId;
use store_api::storage::consts::ReservedColumnId;
use super::*;
use crate::memtable::bulk::context::BulkIterContext;
use crate::sst::parquet::format::{PrimaryKeyReadFormat, ReadFormat};
use crate::sst::{FlatSchemaOptions, to_flat_sst_arrow_schema};
use crate::test_util::memtable_util::{
build_key_values_with_ts_seq_values, metadata_for_test, region_metadata_to_row_schema,
};
fn check_binary_array_to_dictionary(
input: &[&[u8]],
expected_keys: &[u32],
expected_values: &[&[u8]],
) {
let input = BinaryArray::from_iter_values(input.iter());
let array = binary_array_to_dictionary(&input).unwrap();
assert_eq!(
&expected_keys,
&array.keys().iter().map(|v| v.unwrap()).collect::<Vec<_>>()
);
assert_eq!(
expected_values,
&array
.values()
.as_any()
.downcast_ref::<BinaryArray>()
.unwrap()
.iter()
.map(|v| v.unwrap())
.collect::<Vec<_>>()
);
}
#[test]
fn test_binary_array_to_dictionary() {
check_binary_array_to_dictionary(&[], &[], &[]);
check_binary_array_to_dictionary(&["a".as_bytes()], &[0], &["a".as_bytes()]);
check_binary_array_to_dictionary(
&["a".as_bytes(), "a".as_bytes()],
&[0, 0],
&["a".as_bytes()],
);
check_binary_array_to_dictionary(
&["a".as_bytes(), "a".as_bytes(), "b".as_bytes()],
&[0, 0, 1],
&["a".as_bytes(), "b".as_bytes()],
);
check_binary_array_to_dictionary(
&[
"a".as_bytes(),
"a".as_bytes(),
"b".as_bytes(),
"c".as_bytes(),
],
&[0, 0, 1, 2],
&["a".as_bytes(), "b".as_bytes(), "c".as_bytes()],
);
}
struct MutationInput<'a> {
k0: &'a str,
k1: u32,
@@ -1563,232 +1244,6 @@ mod tests {
v1: &'a [Option<f64>],
}
fn check_mutations_to_record_batches(
input: &[MutationInput],
expected: &[BatchOutput],
expected_timestamp: (i64, i64),
dedup: bool,
) {
let metadata = metadata_for_test();
let mutations = input
.iter()
.map(|m| {
build_key_values_with_ts_seq_values(
&metadata,
m.k0.to_string(),
m.k1,
m.timestamps.iter().copied(),
m.v1.iter().copied(),
m.sequence,
)
.mutation
})
.collect::<Vec<_>>();
let total_rows: usize = mutations
.iter()
.flat_map(|m| m.rows.iter())
.map(|r| r.rows.len())
.sum();
let pk_encoder = DensePrimaryKeyCodec::new(&metadata);
let (batch, _, _) = mutations_to_record_batch(&mutations, &metadata, &pk_encoder, dedup)
.unwrap()
.unwrap();
let read_format = PrimaryKeyReadFormat::new_with_all_columns(metadata.clone());
let mut batches = VecDeque::new();
read_format
.convert_record_batch(&batch, None, &mut batches)
.unwrap();
if !dedup {
assert_eq!(
total_rows,
batches.iter().map(|b| { b.num_rows() }).sum::<usize>()
);
}
let batch_values = batches
.into_iter()
.map(|b| {
let pk_values = pk_encoder.decode(b.primary_key()).unwrap().into_dense();
let timestamps = b
.timestamps()
.as_any()
.downcast_ref::<TimestampMillisecondVector>()
.unwrap()
.iter_data()
.map(|v| v.unwrap().0.value())
.collect::<Vec<_>>();
let float_values = b.fields()[1]
.data
.as_any()
.downcast_ref::<Float64Vector>()
.unwrap()
.iter_data()
.collect::<Vec<_>>();
(pk_values, timestamps, float_values)
})
.collect::<Vec<_>>();
assert_eq!(expected.len(), batch_values.len());
for idx in 0..expected.len() {
assert_eq!(expected[idx].pk_values, &batch_values[idx].0);
assert_eq!(expected[idx].timestamps, &batch_values[idx].1);
assert_eq!(expected[idx].v1, &batch_values[idx].2);
}
}
#[test]
fn test_mutations_to_record_batch() {
check_mutations_to_record_batches(
&[MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.1)],
sequence: 0,
}],
&[BatchOutput {
pk_values: &[Value::String("a".into()), Value::UInt32(0)],
timestamps: &[0],
v1: &[Some(0.1)],
}],
(0, 0),
true,
);
check_mutations_to_record_batches(
&[
MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.1)],
sequence: 0,
},
MutationInput {
k0: "b",
k1: 0,
timestamps: &[0],
v1: &[Some(0.0)],
sequence: 0,
},
MutationInput {
k0: "a",
k1: 0,
timestamps: &[1],
v1: &[Some(0.2)],
sequence: 1,
},
MutationInput {
k0: "a",
k1: 1,
timestamps: &[1],
v1: &[Some(0.3)],
sequence: 2,
},
],
&[
BatchOutput {
pk_values: &[Value::String("a".into()), Value::UInt32(0)],
timestamps: &[0, 1],
v1: &[Some(0.1), Some(0.2)],
},
BatchOutput {
pk_values: &[Value::String("a".into()), Value::UInt32(1)],
timestamps: &[1],
v1: &[Some(0.3)],
},
BatchOutput {
pk_values: &[Value::String("b".into()), Value::UInt32(0)],
timestamps: &[0],
v1: &[Some(0.0)],
},
],
(0, 1),
true,
);
check_mutations_to_record_batches(
&[
MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.1)],
sequence: 0,
},
MutationInput {
k0: "b",
k1: 0,
timestamps: &[0],
v1: &[Some(0.0)],
sequence: 0,
},
MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.2)],
sequence: 1,
},
],
&[
BatchOutput {
pk_values: &[Value::String("a".into()), Value::UInt32(0)],
timestamps: &[0],
v1: &[Some(0.2)],
},
BatchOutput {
pk_values: &[Value::String("b".into()), Value::UInt32(0)],
timestamps: &[0],
v1: &[Some(0.0)],
},
],
(0, 0),
true,
);
check_mutations_to_record_batches(
&[
MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.1)],
sequence: 0,
},
MutationInput {
k0: "b",
k1: 0,
timestamps: &[0],
v1: &[Some(0.0)],
sequence: 0,
},
MutationInput {
k0: "a",
k1: 0,
timestamps: &[0],
v1: &[Some(0.2)],
sequence: 1,
},
],
&[
BatchOutput {
pk_values: &[Value::String("a".into()), Value::UInt32(0)],
timestamps: &[0, 0],
v1: &[Some(0.2), Some(0.1)],
},
BatchOutput {
pk_values: &[Value::String("b".into()), Value::UInt32(0)],
timestamps: &[0],
v1: &[Some(0.0)],
},
],
(0, 0),
false,
);
}
fn encode(input: &[MutationInput]) -> EncodedBulkPart {
let metadata = metadata_for_test();
let kvs = input


@@ -121,7 +121,7 @@ impl FlatSchemaOptions {
///
/// The schema is:
/// ```text
/// primary key columns, field columns, time index, __prmary_key, __sequence, __op_type
/// primary key columns, field columns, time index, __primary_key, __sequence, __op_type
/// ```
///
/// # Panics
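// An illustrative sketch of the column order documented above, using assumed column names
// (tag_0, field_0, ts); it is not the actual `to_flat_sst_arrow_schema` implementation,
// only the shape of the schema it describes. Tag columns and the encoded primary key are
// dictionary-encoded, matching the builders used in the tests below.
fn example_flat_sst_schema() -> datatypes::arrow::datatypes::Schema {
    use datatypes::arrow::datatypes::{DataType, Field, Schema, TimeUnit};
    Schema::new(vec![
        // primary key columns
        Field::new(
            "tag_0",
            DataType::Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Utf8)),
            true,
        ),
        // field columns
        Field::new("field_0", DataType::UInt64, true),
        // time index
        Field::new("ts", DataType::Timestamp(TimeUnit::Millisecond, None), false),
        // encoded primary key
        Field::new(
            "__primary_key",
            DataType::Dictionary(Box::new(DataType::UInt32), Box::new(DataType::Binary)),
            false,
        ),
        Field::new("__sequence", DataType::UInt64, false),
        Field::new("__op_type", DataType::UInt8, false),
    ])
}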


@@ -95,21 +95,32 @@ mod tests {
use std::collections::HashSet;
use std::sync::Arc;
use api::v1::OpType;
use api::v1::{OpType, SemanticType};
use common_function::function::FunctionRef;
use common_function::function_factory::ScalarFunctionFactory;
use common_function::scalars::matches::MatchesFunction;
use common_function::scalars::matches_term::MatchesTermFunction;
use common_time::Timestamp;
use datafusion_common::{Column, ScalarValue};
use datafusion_expr::expr::ScalarFunction;
use datafusion_expr::{BinaryExpr, Expr, Literal, Operator, col, lit};
use datatypes::arrow;
use datatypes::arrow::array::{
ArrayRef, BinaryDictionaryBuilder, RecordBatch, StringDictionaryBuilder,
ArrayRef, BinaryDictionaryBuilder, RecordBatch, StringArray, StringDictionaryBuilder,
TimestampMillisecondArray, UInt8Array, UInt64Array,
};
use datatypes::arrow::datatypes::{DataType, Field, Schema, UInt32Type};
use datatypes::prelude::ConcreteDataType;
use datatypes::schema::{FulltextAnalyzer, FulltextBackend, FulltextOptions};
use object_store::ObjectStore;
use parquet::arrow::AsyncArrowWriter;
use parquet::basic::{Compression, Encoding, ZstdLevel};
use parquet::file::metadata::KeyValue;
use parquet::file::properties::WriterProperties;
use store_api::codec::PrimaryKeyEncoding;
use store_api::metadata::{ColumnMetadata, RegionMetadata, RegionMetadataBuilder};
use store_api::region_request::PathType;
use store_api::storage::{ColumnSchema, RegionId};
use table::predicate::Predicate;
use tokio_util::compat::FuturesAsyncWriteCompatExt;
@@ -122,6 +133,7 @@ mod tests {
use crate::sst::file::{FileHandle, FileMeta, RegionFileId, RegionIndexId};
use crate::sst::file_purger::NoopFilePurger;
use crate::sst::index::bloom_filter::applier::BloomFilterIndexApplierBuilder;
use crate::sst::index::fulltext_index::applier::builder::FulltextIndexApplierBuilder;
use crate::sst::index::inverted_index::applier::builder::InvertedIndexApplierBuilder;
use crate::sst::index::{IndexBuildType, Indexer, IndexerBuilder, IndexerBuilderImpl};
use crate::sst::parquet::format::PrimaryKeyWriteFormat;
@@ -133,11 +145,13 @@ mod tests {
use crate::test_util::sst_util::{
assert_parquet_metadata_eq, build_test_binary_test_region_metadata, new_batch_by_range,
new_batch_with_binary, new_batch_with_custom_sequence, new_primary_key, new_source,
sst_file_handle, sst_file_handle_with_file_id, sst_region_metadata,
new_sparse_primary_key, sst_file_handle, sst_file_handle_with_file_id, sst_region_metadata,
sst_region_metadata_with_encoding,
};
use crate::test_util::{TestEnv, check_reader_result};
const FILE_DIR: &str = "/";
const REGION_ID: RegionId = RegionId::new(0, 0);
#[derive(Clone)]
struct FixedPathProvider {
@@ -1064,6 +1078,154 @@ mod tests {
FlatSource::Iter(Box::new(batches.into_iter().map(Ok)))
}
/// Creates a flat format RecordBatch for testing with sparse primary key encoding.
/// Similar to `new_record_batch_by_range` but without individual primary key columns.
fn new_record_batch_by_range_sparse(
tags: &[&str],
start: usize,
end: usize,
metadata: &Arc<RegionMetadata>,
) -> RecordBatch {
assert!(end >= start);
let flat_schema = to_flat_sst_arrow_schema(
metadata,
&FlatSchemaOptions::from_encoding(PrimaryKeyEncoding::Sparse),
);
let num_rows = end - start;
let mut columns: Vec<ArrayRef> = Vec::new();
// NOTE: Individual primary key columns (tag_0, tag_1) are NOT included in sparse format
// Add field column (field_0)
let field_values: Vec<u64> = (start..end).map(|v| v as u64).collect();
columns.push(Arc::new(UInt64Array::from(field_values)) as ArrayRef);
// Add time index column (ts)
let timestamps: Vec<i64> = (start..end).map(|v| v as i64).collect();
columns.push(Arc::new(TimestampMillisecondArray::from(timestamps)) as ArrayRef);
// Add encoded primary key column using sparse encoding
let table_id = 1u32; // Test table ID
let tsid = 100u64; // Base TSID
let pk = new_sparse_primary_key(tags, metadata, table_id, tsid);
let mut pk_builder = BinaryDictionaryBuilder::<UInt32Type>::new();
for _ in 0..num_rows {
pk_builder.append(&pk).unwrap();
}
columns.push(Arc::new(pk_builder.finish()) as ArrayRef);
// Add sequence column
columns.push(Arc::new(UInt64Array::from_value(1000, num_rows)) as ArrayRef);
// Add op_type column
columns.push(Arc::new(UInt8Array::from_value(OpType::Put as u8, num_rows)) as ArrayRef);
RecordBatch::try_new(flat_schema, columns).unwrap()
}
/// Helper function to create IndexerBuilderImpl for tests.
fn create_test_indexer_builder(
env: &TestEnv,
object_store: ObjectStore,
file_path: RegionFilePathFactory,
metadata: Arc<RegionMetadata>,
row_group_size: usize,
) -> IndexerBuilderImpl {
let puffin_manager = env.get_puffin_manager().build(object_store, file_path);
let intermediate_manager = env.get_intermediate_manager();
IndexerBuilderImpl {
build_type: IndexBuildType::Flush,
metadata,
row_group_size,
puffin_manager,
write_cache_enabled: false,
intermediate_manager,
index_options: IndexOptions {
inverted_index: InvertedIndexOptions {
segment_row_count: 1,
..Default::default()
},
},
inverted_index_config: Default::default(),
fulltext_index_config: Default::default(),
bloom_filter_index_config: Default::default(),
}
}
/// Helper function to write flat SST and return SstInfo.
async fn write_flat_sst(
object_store: ObjectStore,
metadata: Arc<RegionMetadata>,
indexer_builder: IndexerBuilderImpl,
file_path: RegionFilePathFactory,
flat_source: FlatSource,
write_opts: &WriteOptions,
) -> SstInfo {
let mut metrics = Metrics::new(WriteType::Flush);
let mut writer = ParquetWriter::new_with_object_store(
object_store,
metadata,
IndexConfig::default(),
indexer_builder,
file_path,
&mut metrics,
)
.await;
writer
.write_all_flat(flat_source, write_opts)
.await
.unwrap()
.remove(0)
}
/// Helper function to create FileHandle from SstInfo.
fn create_file_handle_from_sst_info(
info: &SstInfo,
metadata: &Arc<RegionMetadata>,
) -> FileHandle {
FileHandle::new(
FileMeta {
region_id: metadata.region_id,
file_id: info.file_id,
time_range: info.time_range,
level: 0,
file_size: info.file_size,
max_row_group_uncompressed_size: info.max_row_group_uncompressed_size,
available_indexes: info.index_metadata.build_available_indexes(),
indexes: info.index_metadata.build_indexes(),
index_file_size: info.index_metadata.file_size,
index_version: 0,
num_row_groups: info.num_row_groups,
num_rows: info.num_rows as u64,
sequence: None,
partition_expr: match &metadata.partition_expr {
Some(json_str) => partition::expr::PartitionExpr::from_json_str(json_str)
.expect("partition expression should be valid JSON"),
None => None,
},
num_series: 0,
},
Arc::new(NoopFilePurger),
)
}
/// Helper function to create test cache with standard settings.
fn create_test_cache() -> Arc<CacheManager> {
Arc::new(
CacheManager::builder()
.index_result_cache_size(1024 * 1024)
.index_metadata_size(1024 * 1024)
.index_content_page_size(1024 * 1024)
.index_content_size(1024 * 1024)
.puffin_metadata_size(1024 * 1024)
.build(),
)
}
#[tokio::test]
async fn test_write_flat_with_index() {
let mut env = TestEnv::new().await;
@@ -1238,4 +1400,709 @@ mod tests {
assert_eq!(*override_batch, expected_batch);
}
}
#[tokio::test]
async fn test_write_flat_read_with_inverted_index() {
let mut env = TestEnv::new().await;
let object_store = env.init_object_store_manager();
let file_path = RegionFilePathFactory::new(FILE_DIR.to_string(), PathType::Bare);
let metadata = Arc::new(sst_region_metadata());
let row_group_size = 100;
// Create flat format RecordBatches with non-overlapping timestamp ranges
// Each batch becomes one row group (row_group_size = 100)
// Data: ts tag_0 tag_1
// RG 0: 0-50 [a, d]
// RG 0: 50-100 [b, d]
// RG 1: 100-150 [c, d]
// RG 1: 150-200 [c, f]
let flat_batches = vec![
new_record_batch_by_range(&["a", "d"], 0, 50),
new_record_batch_by_range(&["b", "d"], 50, 100),
new_record_batch_by_range(&["c", "d"], 100, 150),
new_record_batch_by_range(&["c", "f"], 150, 200),
];
let flat_source = new_flat_source_from_record_batches(flat_batches);
let write_opts = WriteOptions {
row_group_size,
..Default::default()
};
let indexer_builder = create_test_indexer_builder(
&env,
object_store.clone(),
file_path.clone(),
metadata.clone(),
row_group_size,
);
let info = write_flat_sst(
object_store.clone(),
metadata.clone(),
indexer_builder,
file_path.clone(),
flat_source,
&write_opts,
)
.await;
assert_eq!(200, info.num_rows);
assert!(info.file_size > 0);
assert!(info.index_metadata.file_size > 0);
let handle = create_file_handle_from_sst_info(&info, &metadata);
let cache = create_test_cache();
// Test 1: Filter by tag_0 = "b"
// Expected: Only rows with tag_0="b"
let preds = vec![col("tag_0").eq(lit("b"))];
let inverted_index_applier = InvertedIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
&metadata,
HashSet::from_iter([0]),
env.get_puffin_manager(),
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_inverted_index_cache(cache.inverted_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.inverted_index_appliers([inverted_index_applier.clone(), None])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// Verify selection contains only RG 0 (tag_0="b", ts 0-100)
assert_eq!(selection.row_group_count(), 1);
assert_eq!(50, selection.get(0).unwrap().row_count());
// Verify filtering metrics
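// RG 1 (ts 100-200) only contains tag_0 values "c", so its min/max statistics already
// exclude "b" and it is pruned before the inverted index runs (rg_minmax_filtered = 1).
// Within RG 0, the inverted index drops the 50 rows with tag_0 = "a"
// (rows_inverted_filtered = 50) and keeps the 50 rows with tag_0 = "b"; no whole row
// group is removed by the inverted index itself (rg_inverted_filtered = 0).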
assert_eq!(metrics.filter_metrics.rg_total, 2);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 1);
assert_eq!(metrics.filter_metrics.rg_inverted_filtered, 0);
assert_eq!(metrics.filter_metrics.rows_inverted_filtered, 50);
}
#[tokio::test]
async fn test_write_flat_read_with_bloom_filter() {
let mut env = TestEnv::new().await;
let object_store = env.init_object_store_manager();
let file_path = RegionFilePathFactory::new(FILE_DIR.to_string(), PathType::Bare);
let metadata = Arc::new(sst_region_metadata());
let row_group_size = 100;
// Create flat format RecordBatches with non-overlapping timestamp ranges
// Each batch becomes one row group (row_group_size = 100)
// Data: ts tag_0 tag_1
// RG 0: 0-50 [a, d]
// RG 0: 50-100 [b, e]
// RG 1: 100-150 [c, d]
// RG 1: 150-200 [c, f]
let flat_batches = vec![
new_record_batch_by_range(&["a", "d"], 0, 50),
new_record_batch_by_range(&["b", "e"], 50, 100),
new_record_batch_by_range(&["c", "d"], 100, 150),
new_record_batch_by_range(&["c", "f"], 150, 200),
];
let flat_source = new_flat_source_from_record_batches(flat_batches);
let write_opts = WriteOptions {
row_group_size,
..Default::default()
};
let indexer_builder = create_test_indexer_builder(
&env,
object_store.clone(),
file_path.clone(),
metadata.clone(),
row_group_size,
);
let info = write_flat_sst(
object_store.clone(),
metadata.clone(),
indexer_builder,
file_path.clone(),
flat_source,
&write_opts,
)
.await;
assert_eq!(200, info.num_rows);
assert!(info.file_size > 0);
assert!(info.index_metadata.file_size > 0);
let handle = create_file_handle_from_sst_info(&info, &metadata);
let cache = create_test_cache();
// Filter by ts >= 50 AND ts < 200 AND tag_1 = "d"
// Expected: RG 0 (ts 0-100) and RG 1 (ts 100-200), both have tag_1="d"
let preds = vec![
col("ts").gt_eq(lit(ScalarValue::TimestampMillisecond(Some(50), None))),
col("ts").lt(lit(ScalarValue::TimestampMillisecond(Some(200), None))),
col("tag_1").eq(lit("d")),
];
let bloom_filter_applier = BloomFilterIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
&metadata,
env.get_puffin_manager(),
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_bloom_filter_index_cache(cache.bloom_filter_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.bloom_filter_index_appliers([None, bloom_filter_applier.clone()])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// Verify selection contains RG 0 and RG 1
assert_eq!(selection.row_group_count(), 2);
assert_eq!(50, selection.get(0).unwrap().row_count());
assert_eq!(50, selection.get(1).unwrap().row_count());
// Verify filtering metrics
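// Both row groups contain tag_1 = "d", so neither is pruned as a whole. The bloom filter
// skips the rows that cannot match: the 50 tag_1 = "e" rows in RG 0 and the 50 tag_1 = "f"
// rows in RG 1, hence rows_bloom_filtered = 100 while each selected row group still
// returns 50 rows.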
assert_eq!(metrics.filter_metrics.rg_total, 2);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 0);
assert_eq!(metrics.filter_metrics.rg_bloom_filtered, 0);
assert_eq!(metrics.filter_metrics.rows_bloom_filtered, 100);
}
#[tokio::test]
async fn test_write_flat_read_with_inverted_index_sparse() {
common_telemetry::init_default_ut_logging();
let mut env = TestEnv::new().await;
let object_store = env.init_object_store_manager();
let file_path = RegionFilePathFactory::new(FILE_DIR.to_string(), PathType::Bare);
let metadata = Arc::new(sst_region_metadata_with_encoding(
PrimaryKeyEncoding::Sparse,
));
let row_group_size = 100;
// Create flat format RecordBatches with non-overlapping timestamp ranges
// Each batch becomes one row group (row_group_size = 100)
// Data: ts tag_0 tag_1
// RG 0: 0-50 [a, d]
// RG 0: 50-100 [b, d]
// RG 1: 100-150 [c, d]
// RG 1: 150-200 [c, f]
let flat_batches = vec![
new_record_batch_by_range_sparse(&["a", "d"], 0, 50, &metadata),
new_record_batch_by_range_sparse(&["b", "d"], 50, 100, &metadata),
new_record_batch_by_range_sparse(&["c", "d"], 100, 150, &metadata),
new_record_batch_by_range_sparse(&["c", "f"], 150, 200, &metadata),
];
let flat_source = new_flat_source_from_record_batches(flat_batches);
let write_opts = WriteOptions {
row_group_size,
..Default::default()
};
let indexer_builder = create_test_indexer_builder(
&env,
object_store.clone(),
file_path.clone(),
metadata.clone(),
row_group_size,
);
let info = write_flat_sst(
object_store.clone(),
metadata.clone(),
indexer_builder,
file_path.clone(),
flat_source,
&write_opts,
)
.await;
assert_eq!(200, info.num_rows);
assert!(info.file_size > 0);
assert!(info.index_metadata.file_size > 0);
let handle = create_file_handle_from_sst_info(&info, &metadata);
let cache = create_test_cache();
// Test 1: Filter by tag_0 = "b"
// Expected: Only rows with tag_0="b"
let preds = vec![col("tag_0").eq(lit("b"))];
let inverted_index_applier = InvertedIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
&metadata,
HashSet::from_iter([0]),
env.get_puffin_manager(),
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_inverted_index_cache(cache.inverted_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.inverted_index_appliers([inverted_index_applier.clone(), None])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// RG 0 has 50 matching rows (tag_0="b")
assert_eq!(selection.row_group_count(), 1);
assert_eq!(50, selection.get(0).unwrap().row_count());
// Verify filtering metrics
// Note: With sparse encoding, tag columns aren't stored separately,
// so minmax filtering on tags doesn't work (only inverted index)
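// Both row groups therefore survive min-max pruning. The inverted index discards all 100
// rows of RG 1 (counted once in rg_inverted_filtered) plus the 50 tag_0 = "a" rows of
// RG 0, i.e. 150 rows in total, leaving only the 50 tag_0 = "b" rows.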
assert_eq!(metrics.filter_metrics.rg_total, 2);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 0); // No minmax stats for tags in sparse format
assert_eq!(metrics.filter_metrics.rg_inverted_filtered, 1);
assert_eq!(metrics.filter_metrics.rows_inverted_filtered, 150);
}
#[tokio::test]
async fn test_write_flat_read_with_bloom_filter_sparse() {
let mut env = TestEnv::new().await;
let object_store = env.init_object_store_manager();
let file_path = RegionFilePathFactory::new(FILE_DIR.to_string(), PathType::Bare);
let metadata = Arc::new(sst_region_metadata_with_encoding(
PrimaryKeyEncoding::Sparse,
));
let row_group_size = 100;
// Create flat format RecordBatches with non-overlapping timestamp ranges
// Each batch becomes one row group (row_group_size = 100)
// Data: ts tag_0 tag_1
// RG 0: 0-50 [a, d]
// RG 0: 50-100 [b, e]
// RG 1: 100-150 [c, d]
// RG 1: 150-200 [c, f]
let flat_batches = vec![
new_record_batch_by_range_sparse(&["a", "d"], 0, 50, &metadata),
new_record_batch_by_range_sparse(&["b", "e"], 50, 100, &metadata),
new_record_batch_by_range_sparse(&["c", "d"], 100, 150, &metadata),
new_record_batch_by_range_sparse(&["c", "f"], 150, 200, &metadata),
];
let flat_source = new_flat_source_from_record_batches(flat_batches);
let write_opts = WriteOptions {
row_group_size,
..Default::default()
};
let indexer_builder = create_test_indexer_builder(
&env,
object_store.clone(),
file_path.clone(),
metadata.clone(),
row_group_size,
);
let info = write_flat_sst(
object_store.clone(),
metadata.clone(),
indexer_builder,
file_path.clone(),
flat_source,
&write_opts,
)
.await;
assert_eq!(200, info.num_rows);
assert!(info.file_size > 0);
assert!(info.index_metadata.file_size > 0);
let handle = create_file_handle_from_sst_info(&info, &metadata);
let cache = create_test_cache();
// Filter by ts >= 50 AND ts < 200 AND tag_1 = "d"
// Expected: RG 0 (ts 0-100) and RG 1 (ts 100-200), both have tag_1="d"
let preds = vec![
col("ts").gt_eq(lit(ScalarValue::TimestampMillisecond(Some(50), None))),
col("ts").lt(lit(ScalarValue::TimestampMillisecond(Some(200), None))),
col("tag_1").eq(lit("d")),
];
let bloom_filter_applier = BloomFilterIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
&metadata,
env.get_puffin_manager(),
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_bloom_filter_index_cache(cache.bloom_filter_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.bloom_filter_index_appliers([None, bloom_filter_applier.clone()])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// Verify selection contains RG 0 and RG 1
assert_eq!(selection.row_group_count(), 2);
assert_eq!(50, selection.get(0).unwrap().row_count());
assert_eq!(50, selection.get(1).unwrap().row_count());
// Verify filtering metrics
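// Same expectation as the dense-encoding bloom filter test above: each row group keeps its
// 50 tag_1 = "d" rows and the bloom filter skips the other 100 rows.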
assert_eq!(metrics.filter_metrics.rg_total, 2);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 0);
assert_eq!(metrics.filter_metrics.rg_bloom_filtered, 0);
assert_eq!(metrics.filter_metrics.rows_bloom_filtered, 100);
}
/// Creates region metadata for testing fulltext indexes.
/// Schema: tag_0, text_bloom, text_tantivy, field_0, ts
fn fulltext_region_metadata() -> RegionMetadata {
let mut builder = RegionMetadataBuilder::new(REGION_ID);
builder
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
"tag_0".to_string(),
ConcreteDataType::string_datatype(),
true,
),
semantic_type: SemanticType::Tag,
column_id: 0,
})
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
"text_bloom".to_string(),
ConcreteDataType::string_datatype(),
true,
)
.with_fulltext_options(FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Bloom,
granularity: 1,
false_positive_rate_in_10000: 50,
})
.unwrap(),
semantic_type: SemanticType::Field,
column_id: 1,
})
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
"text_tantivy".to_string(),
ConcreteDataType::string_datatype(),
true,
)
.with_fulltext_options(FulltextOptions {
enable: true,
analyzer: FulltextAnalyzer::English,
case_sensitive: false,
backend: FulltextBackend::Tantivy,
granularity: 1,
false_positive_rate_in_10000: 50,
})
.unwrap(),
semantic_type: SemanticType::Field,
column_id: 2,
})
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
"field_0".to_string(),
ConcreteDataType::uint64_datatype(),
true,
),
semantic_type: SemanticType::Field,
column_id: 3,
})
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
"ts".to_string(),
ConcreteDataType::timestamp_millisecond_datatype(),
false,
),
semantic_type: SemanticType::Timestamp,
column_id: 4,
})
.primary_key(vec![0]);
builder.build().unwrap()
}
/// Creates a flat format RecordBatch with string fields for fulltext testing.
fn new_fulltext_record_batch_by_range(
tag: &str,
text_bloom: &str,
text_tantivy: &str,
start: usize,
end: usize,
) -> RecordBatch {
assert!(end >= start);
let metadata = Arc::new(fulltext_region_metadata());
let flat_schema = to_flat_sst_arrow_schema(&metadata, &FlatSchemaOptions::default());
let num_rows = end - start;
let mut columns = Vec::new();
// Add primary key column (tag_0) as dictionary array
let mut tag_builder = StringDictionaryBuilder::<UInt32Type>::new();
for _ in 0..num_rows {
tag_builder.append_value(tag);
}
columns.push(Arc::new(tag_builder.finish()) as ArrayRef);
// Add text_bloom field (fulltext with bloom backend)
let text_bloom_values: Vec<_> = (0..num_rows).map(|_| text_bloom).collect();
columns.push(Arc::new(StringArray::from(text_bloom_values)));
// Add text_tantivy field (fulltext with tantivy backend)
let text_tantivy_values: Vec<_> = (0..num_rows).map(|_| text_tantivy).collect();
columns.push(Arc::new(StringArray::from(text_tantivy_values)));
// Add field column (field_0)
let field_values: Vec<u64> = (start..end).map(|v| v as u64).collect();
columns.push(Arc::new(UInt64Array::from(field_values)));
// Add time index column (ts)
let timestamps: Vec<i64> = (start..end).map(|v| v as i64).collect();
columns.push(Arc::new(TimestampMillisecondArray::from(timestamps)));
// Add encoded primary key column
let pk = new_primary_key(&[tag]);
let mut pk_builder = BinaryDictionaryBuilder::<UInt32Type>::new();
for _ in 0..num_rows {
pk_builder.append(&pk).unwrap();
}
columns.push(Arc::new(pk_builder.finish()));
// Add sequence column
columns.push(Arc::new(UInt64Array::from_value(1000, num_rows)));
// Add op_type column
columns.push(Arc::new(UInt8Array::from_value(
OpType::Put as u8,
num_rows,
)));
RecordBatch::try_new(flat_schema, columns).unwrap()
}
#[tokio::test]
async fn test_write_flat_read_with_fulltext_index() {
let mut env = TestEnv::new().await;
let object_store = env.init_object_store_manager();
let file_path = RegionFilePathFactory::new(FILE_DIR.to_string(), PathType::Bare);
let metadata = Arc::new(fulltext_region_metadata());
let row_group_size = 50;
// Create flat format RecordBatches with different text content
// RG 0: 0-50 tag="a", bloom="hello world", tantivy="quick brown fox"
// RG 1: 50-100 tag="b", bloom="hello world", tantivy="quick brown fox"
// RG 2: 100-150 tag="c", bloom="goodbye world", tantivy="lazy dog"
// RG 3: 150-200 tag="d", bloom="goodbye world", tantivy="lazy dog"
let flat_batches = vec![
new_fulltext_record_batch_by_range("a", "hello world", "quick brown fox", 0, 50),
new_fulltext_record_batch_by_range("b", "hello world", "quick brown fox", 50, 100),
new_fulltext_record_batch_by_range("c", "goodbye world", "lazy dog", 100, 150),
new_fulltext_record_batch_by_range("d", "goodbye world", "lazy dog", 150, 200),
];
let flat_source = new_flat_source_from_record_batches(flat_batches);
let write_opts = WriteOptions {
row_group_size,
..Default::default()
};
let indexer_builder = create_test_indexer_builder(
&env,
object_store.clone(),
file_path.clone(),
metadata.clone(),
row_group_size,
);
let mut info = write_flat_sst(
object_store.clone(),
metadata.clone(),
indexer_builder,
file_path.clone(),
flat_source,
&write_opts,
)
.await;
assert_eq!(200, info.num_rows);
assert!(info.file_size > 0);
assert!(info.index_metadata.file_size > 0);
// Verify fulltext indexes were created
assert!(info.index_metadata.fulltext_index.index_size > 0);
assert_eq!(info.index_metadata.fulltext_index.row_count, 200);
// text_bloom (column_id 1) and text_tantivy (column_id 2)
info.index_metadata.fulltext_index.columns.sort_unstable();
assert_eq!(info.index_metadata.fulltext_index.columns, vec![1, 2]);
assert_eq!(
(
Timestamp::new_millisecond(0),
Timestamp::new_millisecond(199)
),
info.time_range
);
let handle = create_file_handle_from_sst_info(&info, &metadata);
let cache = create_test_cache();
// Helper functions to create fulltext function expressions
let matches_func = || {
Arc::new(
ScalarFunctionFactory::from(Arc::new(MatchesFunction::default()) as FunctionRef)
.provide(Default::default()),
)
};
let matches_term_func = || {
Arc::new(
ScalarFunctionFactory::from(
Arc::new(MatchesTermFunction::default()) as FunctionRef,
)
.provide(Default::default()),
)
};
// Test 1: Filter by text_bloom field using matches_term (bloom backend)
// Expected: RG 0 and RG 1 (rows 0-100), whose text_bloom contains the term "hello"
let preds = vec![Expr::ScalarFunction(ScalarFunction {
args: vec![col("text_bloom"), "hello".lit()],
func: matches_term_func(),
})];
let fulltext_applier = FulltextIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
env.get_puffin_manager(),
&metadata,
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_bloom_filter_cache(cache.bloom_filter_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.fulltext_index_appliers([None, fulltext_applier.clone()])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// Verify selection contains RG 0 and RG 1 (text_bloom="hello world")
assert_eq!(selection.row_group_count(), 2);
assert_eq!(50, selection.get(0).unwrap().row_count());
assert_eq!(50, selection.get(1).unwrap().row_count());
// Verify filtering metrics
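// The bloom-backed fulltext index prunes RG 2 and RG 3 entirely (their text_bloom values
// are all "goodbye world"), so 2 of the 4 row groups and 100 rows are filtered out.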
assert_eq!(metrics.filter_metrics.rg_total, 4);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 0);
assert_eq!(metrics.filter_metrics.rg_fulltext_filtered, 2);
assert_eq!(metrics.filter_metrics.rows_fulltext_filtered, 100);
// Test 2: Filter by text_tantivy field using matches (tantivy backend)
// Expected: RG 2 and RG 3 (rows 100-200), whose text_tantivy matches the "lazy" query
let preds = vec![Expr::ScalarFunction(ScalarFunction {
args: vec![col("text_tantivy"), "lazy".lit()],
func: matches_func(),
})];
let fulltext_applier = FulltextIndexApplierBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
object_store.clone(),
env.get_puffin_manager(),
&metadata,
)
.with_puffin_metadata_cache(cache.puffin_metadata_cache().cloned())
.with_bloom_filter_cache(cache.bloom_filter_index_cache().cloned())
.build(&preds)
.unwrap()
.map(Arc::new);
let builder = ParquetReaderBuilder::new(
FILE_DIR.to_string(),
PathType::Bare,
handle.clone(),
object_store.clone(),
)
.flat_format(true)
.predicate(Some(Predicate::new(preds)))
.fulltext_index_appliers([None, fulltext_applier.clone()])
.cache(CacheStrategy::EnableAll(cache.clone()));
let mut metrics = ReaderMetrics::default();
let (_context, selection) = builder.build_reader_input(&mut metrics).await.unwrap();
// Verify selection contains RG 2 and RG 3 (text_tantivy="lazy dog")
assert_eq!(selection.row_group_count(), 2);
assert_eq!(50, selection.get(2).unwrap().row_count());
assert_eq!(50, selection.get(3).unwrap().row_count());
// Verify filtering metrics
assert_eq!(metrics.filter_metrics.rg_total, 4);
assert_eq!(metrics.filter_metrics.rg_minmax_filtered, 0);
assert_eq!(metrics.filter_metrics.rg_fulltext_filtered, 2);
assert_eq!(metrics.filter_metrics.rows_fulltext_filtered, 100);
}
}


@@ -29,7 +29,7 @@ use tokio::sync::mpsc::Sender;
use crate::access_layer::{AccessLayer, AccessLayerRef};
use crate::cache::CacheManager;
use crate::compaction::CompactionScheduler;
use crate::compaction::memory_manager::{CompactionMemoryManager, new_compaction_memory_manager};
use crate::compaction::memory_manager::new_compaction_memory_manager;
use crate::config::MitoConfig;
use crate::error::Result;
use crate::flush::FlushScheduler;


@@ -27,6 +27,10 @@ use parquet::file::metadata::ParquetMetaData;
use store_api::metadata::{
ColumnMetadata, RegionMetadata, RegionMetadataBuilder, RegionMetadataRef,
};
use store_api::metric_engine_consts::{
DATA_SCHEMA_TABLE_ID_COLUMN_NAME, DATA_SCHEMA_TSID_COLUMN_NAME,
};
use store_api::storage::consts::ReservedColumnId;
use store_api::storage::{FileId, RegionId};
use crate::read::{Batch, BatchBuilder, Source};
@@ -36,11 +40,44 @@ use crate::test_util::{VecBatchReader, new_batch_builder, new_noop_file_purger};
/// Test region id.
const REGION_ID: RegionId = RegionId::new(0, 0);
/// Creates a new region metadata for testing SSTs.
/// Creates a new region metadata for testing SSTs with specified encoding.
///
/// Schema: tag_0, tag_1, field_0, ts
pub fn sst_region_metadata() -> RegionMetadata {
/// Dense schema: tag_0, tag_1, field_0, ts
/// Sparse schema: __table_id, __tsid, tag_0, tag_1, field_0, ts
pub fn sst_region_metadata_with_encoding(
encoding: store_api::codec::PrimaryKeyEncoding,
) -> RegionMetadata {
let mut builder = RegionMetadataBuilder::new(REGION_ID);
// For sparse encoding, add internal columns first
if encoding == store_api::codec::PrimaryKeyEncoding::Sparse {
builder
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
DATA_SCHEMA_TABLE_ID_COLUMN_NAME.to_string(),
ConcreteDataType::uint32_datatype(),
false,
)
.with_skipping_options(SkippingIndexOptions {
granularity: 1,
..Default::default()
})
.unwrap(),
semantic_type: SemanticType::Tag,
column_id: ReservedColumnId::table_id(),
})
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
DATA_SCHEMA_TSID_COLUMN_NAME.to_string(),
ConcreteDataType::uint64_datatype(),
false,
),
semantic_type: SemanticType::Tag,
column_id: ReservedColumnId::tsid(),
});
}
// Add user-defined columns (tag_0, tag_1, field_0, ts)
builder
.push_column_metadata(ColumnMetadata {
column_schema: ColumnSchema::new(
@@ -83,12 +120,32 @@ pub fn sst_region_metadata() -> RegionMetadata {
),
semantic_type: SemanticType::Timestamp,
column_id: 3,
})
.primary_key(vec![0, 1]);
});
// Set primary key based on encoding
if encoding == store_api::codec::PrimaryKeyEncoding::Sparse {
builder.primary_key(vec![
ReservedColumnId::table_id(),
ReservedColumnId::tsid(),
0, // tag_0
1, // tag_1
]);
} else {
builder.primary_key(vec![0, 1]); // Dense: just user tags
}
builder.primary_key_encoding(encoding);
builder.build().unwrap()
}
/// Encodes a primary key for specific tags.
/// Creates a new region metadata for testing SSTs.
///
/// Schema: tag_0, tag_1, field_0, ts
pub fn sst_region_metadata() -> RegionMetadata {
sst_region_metadata_with_encoding(store_api::codec::PrimaryKeyEncoding::Dense)
}
/// Encodes a primary key for specific tags using dense encoding.
pub fn new_primary_key(tags: &[&str]) -> Vec<u8> {
let fields = (0..tags.len())
.map(|idx| {
@@ -104,6 +161,31 @@ pub fn new_primary_key(tags: &[&str]) -> Vec<u8> {
.unwrap()
}
/// Encodes a primary key for specific tags using sparse encoding.
/// Includes internal columns (table_id, tsid) required by sparse format.
pub fn new_sparse_primary_key(
tags: &[&str],
metadata: &Arc<RegionMetadata>,
table_id: u32,
tsid: u64,
) -> Vec<u8> {
use mito_codec::row_converter::PrimaryKeyCodec;
let codec = mito_codec::row_converter::SparsePrimaryKeyCodec::new(metadata);
// Sparse encoding requires internal columns first, then user tags
let values = vec![
(ReservedColumnId::table_id(), ValueRef::UInt32(table_id)),
(ReservedColumnId::tsid(), ValueRef::UInt64(tsid)),
(0, ValueRef::String(tags[0])), // tag_0
(1, ValueRef::String(tags[1])), // tag_1
];
let mut buffer = Vec::new();
codec.encode_value_refs(&values, &mut buffer).unwrap();
buffer
}
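// A minimal usage sketch (hypothetical test body, not part of this change) for the helper
// above: encode the sparse key for tags ("a", "d") against the sparse test metadata.
#[allow(dead_code)]
fn sparse_primary_key_example() {
    let metadata = Arc::new(sst_region_metadata_with_encoding(
        store_api::codec::PrimaryKeyEncoding::Sparse,
    ));
    let pk = new_sparse_primary_key(&["a", "d"], &metadata, 1, 100);
    assert!(!pk.is_empty());
}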
/// Creates a [Source] from `batches`.
pub fn new_source(batches: &[Batch]) -> Source {
let reader = VecBatchReader::new(batches);


@@ -117,6 +117,11 @@ pub struct S3Connection {
/// By default, opendal will send API to https://s3.us-east-1.amazonaws.com/bucket_name
/// Enabled, opendal will send API to https://bucket_name.s3.us-east-1.amazonaws.com
pub enable_virtual_host_style: bool,
/// Disable the EC2 metadata service.
/// By default, opendal falls back to the EC2 instance metadata service to load credentials
/// when the access key id and secret access key are not provided.
/// If this option is enabled, opendal will *NOT* query the EC2 metadata service.
pub disable_ec2_metadata: bool,
}
impl From<&S3Connection> for S3 {
@@ -129,6 +134,10 @@ impl From<&S3Connection> for S3 {
.access_key_id(connection.access_key_id.expose_secret())
.secret_access_key(connection.secret_access_key.expose_secret());
if connection.disable_ec2_metadata {
builder = builder.disable_ec2_metadata();
}
if let Some(endpoint) = &connection.endpoint {
builder = builder.endpoint(endpoint);
}

File diff suppressed because it is too large


@@ -81,6 +81,12 @@ pub struct GrpcOptions {
/// Default to `None`, means infinite.
#[serde(with = "humantime_serde")]
pub max_connection_age: Option<Duration>,
/// The HTTP/2 keep-alive interval.
#[serde(with = "humantime_serde")]
pub http2_keep_alive_interval: Duration,
/// The HTTP/2 keep-alive timeout.
#[serde(with = "humantime_serde")]
pub http2_keep_alive_timeout: Duration,
}
impl GrpcOptions {
@@ -144,6 +150,8 @@ impl Default for GrpcOptions {
runtime_size: 8,
tls: TlsOption::default(),
max_connection_age: None,
http2_keep_alive_interval: Duration::from_secs(10),
http2_keep_alive_timeout: Duration::from_secs(3),
}
}
}
@@ -164,6 +172,8 @@ impl GrpcOptions {
runtime_size: 8,
tls: TlsOption::default(),
max_connection_age: None,
http2_keep_alive_interval: Duration::from_secs(10),
http2_keep_alive_timeout: Duration::from_secs(3),
}
}
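// A hedged sketch (illustrative values only) of overriding the new keep-alive settings
// while keeping the remaining defaults; it relies on the `Default` impl above and
// `std::time::Duration`.
fn example_grpc_keep_alive_options() -> GrpcOptions {
    GrpcOptions {
        http2_keep_alive_interval: Duration::from_secs(30),
        http2_keep_alive_timeout: Duration::from_secs(5),
        ..Default::default()
    }
}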


@@ -34,12 +34,10 @@ impl HeartbeatOptions {
pub fn frontend_default() -> Self {
Self {
// Frontend can send heartbeat with a longer interval.
interval: Duration::from_millis(
distributed_time_constants::FRONTEND_HEARTBEAT_INTERVAL_MILLIS,
),
retry_interval: Duration::from_millis(
distributed_time_constants::HEARTBEAT_INTERVAL_MILLIS,
interval: distributed_time_constants::frontend_heartbeat_interval(
distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
),
retry_interval: distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
}
}
}
@@ -47,10 +45,8 @@ impl HeartbeatOptions {
impl Default for HeartbeatOptions {
fn default() -> Self {
Self {
interval: Duration::from_millis(distributed_time_constants::HEARTBEAT_INTERVAL_MILLIS),
retry_interval: Duration::from_millis(
distributed_time_constants::HEARTBEAT_INTERVAL_MILLIS,
),
interval: distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
retry_interval: distributed_time_constants::BASE_HEARTBEAT_INTERVAL,
}
}
}


@@ -26,7 +26,7 @@ use arrow::datatypes::{Float64Type, TimestampMillisecondType};
use common_grpc::precision::Precision;
use common_query::prelude::{greptime_timestamp, greptime_value};
use common_recordbatch::{RecordBatch, RecordBatches};
use common_telemetry::tracing;
use common_telemetry::{tracing, warn};
use datafusion::dataframe::DataFrame;
use datafusion::prelude::{Expr, col, lit, regexp_match};
use datafusion_common::ScalarValue;
@@ -415,6 +415,10 @@ pub fn to_grpc_row_insert_requests(request: &WriteRequest) -> Result<(RowInsertR
table_data.add_row(one_row);
}
}
if !series.histograms.is_empty() {
warn!("Native histograms are not supported yet, data ignored");
}
}
Ok(multi_table_data.into_row_insert_requests())


@@ -362,13 +362,13 @@ mod tests {
cert_path: "/path/to/cert_path".to_string(),
key_path: "/path/to/key_path".to_string(),
ca_cert_path: String::new(),
watch: false
watch: false,
},
TlsOption::new(
Some(Disable),
Some("/path/to/cert_path".to_string()),
Some("/path/to/key_path".to_string()),
false
false,
)
);
}


@@ -285,6 +285,13 @@ pub enum Error {
location: Location,
},
#[snafu(display("Failed to set VECTOR index option"))]
SetVectorIndexOption {
source: datatypes::error::Error,
#[snafu(implicit)]
location: Location,
},
#[snafu(display(
"Invalid partition number: {}, should be in range [2, 65536]",
partition_num
@@ -394,7 +401,9 @@ impl ErrorExt for Error {
ConvertValue { .. } => StatusCode::Unsupported,
PermissionDenied { .. } => StatusCode::PermissionDenied,
SetFulltextOption { .. } | SetSkippingIndexOption { .. } => StatusCode::Unexpected,
SetFulltextOption { .. }
| SetSkippingIndexOption { .. }
| SetVectorIndexOption { .. } => StatusCode::Unexpected,
}
}


@@ -43,6 +43,7 @@ use crate::parser::{FLOW, ParserContext};
use crate::parsers::tql_parser;
use crate::parsers::utils::{
self, validate_column_fulltext_create_option, validate_column_skipping_index_create_option,
validate_column_vector_index_create_option,
};
use crate::statements::create::{
Column, ColumnExtensions, CreateDatabase, CreateExternalTable, CreateFlow, CreateTable,
@@ -60,6 +61,7 @@ pub const EXPIRE: &str = "EXPIRE";
pub const AFTER: &str = "AFTER";
pub const INVERTED: &str = "INVERTED";
pub const SKIPPING: &str = "SKIPPING";
pub const VECTOR: &str = "VECTOR";
pub type RawIntervalExpr = String;
@@ -928,6 +930,61 @@ impl<'a> ParserContext<'a> {
is_index_declared |= true;
}
// vector index
if let Token::Word(word) = parser.peek_token().token
&& word.value.eq_ignore_ascii_case(VECTOR)
{
parser.next_token();
// Consume `INDEX` keyword
ensure!(
parser.parse_keyword(Keyword::INDEX),
InvalidColumnOptionSnafu {
name: column_name.to_string(),
msg: "expect INDEX after VECTOR keyword",
}
);
ensure!(
column_extensions.vector_index_options.is_none(),
InvalidColumnOptionSnafu {
name: column_name.to_string(),
msg: "duplicated VECTOR INDEX option",
}
);
// Check that column is a vector type
let column_type = get_unalias_type(column_type);
let data_type = sql_data_type_to_concrete_data_type(&column_type, column_extensions)?;
ensure!(
matches!(data_type, ConcreteDataType::Vector(_)),
InvalidColumnOptionSnafu {
name: column_name.to_string(),
msg: "VECTOR INDEX only supports Vector type columns",
}
);
let options = parser
.parse_options(Keyword::WITH)
.context(error::SyntaxSnafu)?
.into_iter()
.map(parse_option_string)
.collect::<Result<Vec<_>>>()?;
for (key, _) in options.iter() {
ensure!(
validate_column_vector_index_create_option(key),
InvalidColumnOptionSnafu {
name: column_name.to_string(),
msg: format!("invalid VECTOR INDEX option: {key}"),
}
);
}
let options = OptionMap::new(options);
column_extensions.vector_index_options = Some(options);
is_index_declared |= true;
}
Ok(is_index_declared)
}
@@ -2714,7 +2771,8 @@ CREATE TABLE log (
#[test]
fn test_parse_column_extensions_vector() {
let sql = "VECTOR(128)";
// Test that vector options are parsed from data_type (no additional SQL needed)
let sql = "";
let dialect = GenericDialect {};
let mut tokenizer = Tokenizer::new(&dialect, sql);
let tokens = tokenizer.tokenize().unwrap();
@@ -2734,7 +2792,8 @@ CREATE TABLE log (
#[test]
fn test_parse_column_extensions_vector_invalid() {
let sql = "VECTOR()";
// Test that vector with no dimension fails
let sql = "";
let dialect = GenericDialect {};
let mut tokenizer = Tokenizer::new(&dialect, sql);
let tokens = tokenizer.tokenize().unwrap();
@@ -2912,4 +2971,174 @@ CREATE TABLE log (
.unwrap();
assert_eq!("SELECT '10 seconds'::INTERVAL", &stmts[0].to_string());
}
#[test]
fn test_parse_create_table_vector_index_options() {
// Test basic vector index
let sql = r"
CREATE TABLE vectors (
ts TIMESTAMP TIME INDEX,
vec VECTOR(128) VECTOR INDEX,
)";
let result =
ParserContext::create_with_dialect(sql, &GreptimeDbDialect {}, ParseOptions::default())
.unwrap();
if let Statement::CreateTable(c) = &result[0] {
c.columns.iter().for_each(|col| {
if col.name().value == "vec" {
assert!(
col.extensions
.vector_index_options
.as_ref()
.unwrap()
.is_empty()
);
}
});
} else {
panic!("should be create_table statement");
}
// Test vector index with options
let sql = r"
CREATE TABLE vectors (
ts TIMESTAMP TIME INDEX,
vec VECTOR(128) VECTOR INDEX WITH (metric='cosine', connectivity='32', expansion_add='256', expansion_search='128')
)";
let result =
ParserContext::create_with_dialect(sql, &GreptimeDbDialect {}, ParseOptions::default())
.unwrap();
if let Statement::CreateTable(c) = &result[0] {
c.columns.iter().for_each(|col| {
if col.name().value == "vec" {
let options = col.extensions.vector_index_options.as_ref().unwrap();
assert_eq!(options.len(), 4);
assert_eq!(options.get("metric").unwrap(), "cosine");
assert_eq!(options.get("connectivity").unwrap(), "32");
assert_eq!(options.get("expansion_add").unwrap(), "256");
assert_eq!(options.get("expansion_search").unwrap(), "128");
}
});
} else {
panic!("should be create_table statement");
}
}
#[test]
fn test_parse_create_table_vector_index_invalid_type() {
// Test vector index on non-vector type (should fail)
let sql = r"
CREATE TABLE vectors (
ts TIMESTAMP TIME INDEX,
col INT VECTOR INDEX,
)";
let result =
ParserContext::create_with_dialect(sql, &GreptimeDbDialect {}, ParseOptions::default());
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("VECTOR INDEX only supports Vector type columns")
);
}
#[test]
fn test_parse_create_table_vector_index_duplicate() {
// Test duplicate vector index (should fail)
let sql = r"
CREATE TABLE vectors (
ts TIMESTAMP TIME INDEX,
vec VECTOR(128) VECTOR INDEX VECTOR INDEX,
)";
let result =
ParserContext::create_with_dialect(sql, &GreptimeDbDialect {}, ParseOptions::default());
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("duplicated VECTOR INDEX option")
);
}
#[test]
fn test_parse_create_table_vector_index_invalid_option() {
// Test invalid option key (should fail)
let sql = r"
CREATE TABLE vectors (
ts TIMESTAMP TIME INDEX,
vec VECTOR(128) VECTOR INDEX WITH (metric='l2sq', invalid_option='foo')
)";
let result =
ParserContext::create_with_dialect(sql, &GreptimeDbDialect {}, ParseOptions::default());
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("invalid VECTOR INDEX option")
);
}
#[test]
fn test_parse_column_extensions_vector_index() {
// Test vector index on vector type
{
let sql = "VECTOR INDEX WITH (metric = 'l2sq')";
let dialect = GenericDialect {};
let mut tokenizer = Tokenizer::new(&dialect, sql);
let tokens = tokenizer.tokenize().unwrap();
let mut parser = Parser::new(&dialect).with_tokens(tokens);
let name = Ident::new("vec_col");
let data_type =
DataType::Custom(vec![Ident::new("VECTOR")].into(), vec!["128".to_string()]);
// Pre-fill vector_options as if the VECTOR(128) data type had already been parsed
let mut extensions = ColumnExtensions {
vector_options: Some(OptionMap::from([(
VECTOR_OPT_DIM.to_string(),
"128".to_string(),
)])),
..Default::default()
};
let result = ParserContext::parse_column_extensions(
&mut parser,
&name,
&data_type,
&mut extensions,
);
assert!(result.is_ok());
assert!(extensions.vector_index_options.is_some());
let vi_options = extensions.vector_index_options.unwrap();
assert_eq!(vi_options.get("metric"), Some("l2sq"));
}
// Test vector index on non-vector type (should fail)
{
let sql = "VECTOR INDEX";
let dialect = GenericDialect {};
let mut tokenizer = Tokenizer::new(&dialect, sql);
let tokens = tokenizer.tokenize().unwrap();
let mut parser = Parser::new(&dialect).with_tokens(tokens);
let name = Ident::new("num_col");
let data_type = DataType::Int(None); // Non-vector type
let mut extensions = ColumnExtensions::default();
let result = ParserContext::parse_column_extensions(
&mut parser,
&name,
&data_type,
&mut extensions,
);
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("VECTOR INDEX only supports Vector type columns")
);
}
}
}


@@ -222,6 +222,29 @@ pub fn validate_column_skipping_index_create_option(key: &str) -> bool {
.contains(&key)
}
/// Valid options for VECTOR INDEX:
/// - engine: Vector index engine (usearch)
/// - metric: Distance metric (l2sq, cosine, inner_product)
/// - connectivity: HNSW M parameter
/// - expansion_add: ef_construction parameter
/// - expansion_search: ef_search parameter
pub const COLUMN_VECTOR_INDEX_OPT_KEY_ENGINE: &str = "engine";
pub const COLUMN_VECTOR_INDEX_OPT_KEY_METRIC: &str = "metric";
pub const COLUMN_VECTOR_INDEX_OPT_KEY_CONNECTIVITY: &str = "connectivity";
pub const COLUMN_VECTOR_INDEX_OPT_KEY_EXPANSION_ADD: &str = "expansion_add";
pub const COLUMN_VECTOR_INDEX_OPT_KEY_EXPANSION_SEARCH: &str = "expansion_search";
pub fn validate_column_vector_index_create_option(key: &str) -> bool {
[
COLUMN_VECTOR_INDEX_OPT_KEY_ENGINE,
COLUMN_VECTOR_INDEX_OPT_KEY_METRIC,
COLUMN_VECTOR_INDEX_OPT_KEY_CONNECTIVITY,
COLUMN_VECTOR_INDEX_OPT_KEY_EXPANSION_ADD,
COLUMN_VECTOR_INDEX_OPT_KEY_EXPANSION_SEARCH,
]
.contains(&key)
}
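// A small usage sketch of the whitelist above: known option keys pass, anything else is
// rejected (the key names are the constants defined right above).
#[allow(dead_code)]
fn vector_index_option_key_examples() {
    assert!(validate_column_vector_index_create_option("metric"));
    assert!(validate_column_vector_index_create_option("connectivity"));
    assert!(!validate_column_vector_index_create_option("not_an_option"));
}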
/// Convert an [`IntervalMonthDayNano`] to a [`Duration`].
#[cfg(feature = "enterprise")]
pub fn convert_month_day_nano_to_duration(


@@ -55,7 +55,7 @@ use crate::ast::{
use crate::error::{
self, ConvertToGrpcDataTypeSnafu, ConvertValueSnafu, Result,
SerializeColumnDefaultConstraintSnafu, SetFulltextOptionSnafu, SetJsonStructureSettingsSnafu,
SetSkippingIndexOptionSnafu, SqlCommonSnafu,
SetSkippingIndexOptionSnafu, SetVectorIndexOptionSnafu, SqlCommonSnafu,
};
use crate::statements::create::{Column, ColumnExtensions};
pub use crate::statements::option_map::OptionMap;
@@ -147,6 +147,12 @@ pub fn column_to_schema(
.context(SetSkippingIndexOptionSnafu)?;
}
if let Some(options) = column.extensions.build_vector_index_options()? {
column_schema = column_schema
.with_vector_index_options(&options)
.context(SetVectorIndexOptionSnafu)?;
}
column_schema.set_inverted_index(column.extensions.inverted_index_options.is_some());
if matches!(column.data_type(), SqlDataType::JSON) {
@@ -710,6 +716,7 @@ mod tests {
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: None,
},
};
@@ -720,4 +727,82 @@ mod tests {
assert_eq!(fulltext_options.analyzer, FulltextAnalyzer::English);
assert!(fulltext_options.case_sensitive);
}
#[test]
fn test_column_to_schema_with_vector_index() {
use datatypes::schema::{VectorDistanceMetric, VectorIndexEngineType};
// Test with custom metric and parameters
let column = Column {
column_def: ColumnDef {
name: "embedding".into(),
data_type: SqlDataType::Custom(
vec![Ident::new(VECTOR_TYPE_NAME)].into(),
vec!["128".to_string()],
),
options: vec![],
},
extensions: ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::from([
("metric".to_string(), "cosine".to_string()),
("connectivity".to_string(), "32".to_string()),
("expansion_add".to_string(), "200".to_string()),
("expansion_search".to_string(), "100".to_string()),
])),
},
};
let column_schema = column_to_schema(&column, "ts", None).unwrap();
assert_eq!("embedding", column_schema.name);
assert!(column_schema.is_vector_indexed());
let vector_options = column_schema.vector_index_options().unwrap().unwrap();
assert_eq!(vector_options.engine, VectorIndexEngineType::Usearch);
assert_eq!(vector_options.metric, VectorDistanceMetric::Cosine);
assert_eq!(vector_options.connectivity, 32);
assert_eq!(vector_options.expansion_add, 200);
assert_eq!(vector_options.expansion_search, 100);
}
#[test]
fn test_column_to_schema_with_vector_index_defaults() {
use datatypes::schema::{VectorDistanceMetric, VectorIndexEngineType};
// Test with default values (empty options map)
let column = Column {
column_def: ColumnDef {
name: "vec".into(),
data_type: SqlDataType::Custom(
vec![Ident::new(VECTOR_TYPE_NAME)].into(),
vec!["64".to_string()],
),
options: vec![],
},
extensions: ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::default()),
},
};
let column_schema = column_to_schema(&column, "ts", None).unwrap();
assert_eq!("vec", column_schema.name);
assert!(column_schema.is_vector_indexed());
let vector_options = column_schema.vector_index_options().unwrap().unwrap();
// Verify defaults
assert_eq!(vector_options.engine, VectorIndexEngineType::Usearch);
assert_eq!(vector_options.metric, VectorDistanceMetric::L2sq);
assert_eq!(vector_options.connectivity, 16);
assert_eq!(vector_options.expansion_add, 128);
assert_eq!(vector_options.expansion_search, 64);
}
}

View File

@@ -17,7 +17,10 @@ use std::fmt::{Display, Formatter};
use common_catalog::consts::FILE_ENGINE;
use datatypes::json::JsonStructureSettings;
use datatypes::schema::{FulltextOptions, SkippingIndexOptions};
use datatypes::schema::{
FulltextOptions, SkippingIndexOptions, VectorDistanceMetric, VectorIndexEngineType,
VectorIndexOptions,
};
use itertools::Itertools;
use serde::Serialize;
use snafu::ResultExt;
@@ -133,6 +136,8 @@ pub struct ColumnExtensions {
///
/// Inverted index doesn't have options at present. There won't be any options in that map.
pub inverted_index_options: Option<OptionMap>,
/// Vector index options for HNSW-based vector similarity search.
pub vector_index_options: Option<OptionMap>,
pub json_datatype_options: Option<OptionMap>,
}
@@ -208,6 +213,15 @@ impl Display for Column {
write!(f, " INVERTED INDEX")?;
}
}
if let Some(vector_index_options) = &self.extensions.vector_index_options {
if !vector_index_options.is_empty() {
let options = vector_index_options.kv_pairs();
write!(f, " VECTOR INDEX WITH({})", format_list_comma!(options))?;
} else {
write!(f, " VECTOR INDEX")?;
}
}
Ok(())
}
}
@@ -233,6 +247,89 @@ impl ColumnExtensions {
))
}
pub fn build_vector_index_options(&self) -> Result<Option<VectorIndexOptions>> {
let Some(options) = self.vector_index_options.as_ref() else {
return Ok(None);
};
let options_map: HashMap<String, String> = options.clone().into_map();
let mut result = VectorIndexOptions::default();
if let Some(s) = options_map.get("engine") {
result.engine = s.parse::<VectorIndexEngineType>().map_err(|e| {
InvalidSqlSnafu {
msg: format!("invalid VECTOR INDEX engine: {e}"),
}
.build()
})?;
}
if let Some(s) = options_map.get("metric") {
result.metric = s.parse::<VectorDistanceMetric>().map_err(|e| {
InvalidSqlSnafu {
msg: format!("invalid VECTOR INDEX metric: {e}"),
}
.build()
})?;
}
if let Some(s) = options_map.get("connectivity") {
let value = s.parse::<u32>().map_err(|_| {
InvalidSqlSnafu {
msg: format!(
"invalid VECTOR INDEX connectivity: {s}, expected positive integer"
),
}
.build()
})?;
if !(2..=2048).contains(&value) {
return InvalidSqlSnafu {
msg: "VECTOR INDEX connectivity must be in the range [2, 2048].".to_string(),
}
.fail();
}
result.connectivity = value;
}
if let Some(s) = options_map.get("expansion_add") {
let value = s.parse::<u32>().map_err(|_| {
InvalidSqlSnafu {
msg: format!(
"invalid VECTOR INDEX expansion_add: {s}, expected positive integer"
),
}
.build()
})?;
if value == 0 {
return InvalidSqlSnafu {
msg: "VECTOR INDEX expansion_add must be greater than 0".to_string(),
}
.fail();
}
result.expansion_add = value;
}
if let Some(s) = options_map.get("expansion_search") {
let value = s.parse::<u32>().map_err(|_| {
InvalidSqlSnafu {
msg: format!(
"invalid VECTOR INDEX expansion_search: {s}, expected positive integer"
),
}
.build()
})?;
if value == 0 {
return InvalidSqlSnafu {
msg: "VECTOR INDEX expansion_search must be greater than 0".to_string(),
}
.fail();
}
result.expansion_search = value;
}
Ok(Some(result))
}
pub fn build_json_structure_settings(&self) -> Result<Option<JsonStructureSettings>> {
let Some(options) = self.json_datatype_options.as_ref() else {
return Ok(None);
@@ -893,4 +990,92 @@ AS SELECT number FROM numbers_input where number > 10"#,
_ => unreachable!(),
}
}
#[test]
fn test_vector_index_options_validation() {
use super::{ColumnExtensions, OptionMap};
// Test zero connectivity should fail
let extensions = ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::from([(
"connectivity".to_string(),
"0".to_string(),
)])),
};
let result = extensions.build_vector_index_options();
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("connectivity must be in the range [2, 2048]")
);
// Test zero expansion_add should fail
let extensions = ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::from([(
"expansion_add".to_string(),
"0".to_string(),
)])),
};
let result = extensions.build_vector_index_options();
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("expansion_add must be greater than 0")
);
// Test zero expansion_search should fail
let extensions = ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::from([(
"expansion_search".to_string(),
"0".to_string(),
)])),
};
let result = extensions.build_vector_index_options();
assert!(result.is_err());
assert!(
result
.unwrap_err()
.to_string()
.contains("expansion_search must be greater than 0")
);
// Test valid values should succeed
let extensions = ColumnExtensions {
fulltext_index_options: None,
vector_options: None,
skipping_index_options: None,
inverted_index_options: None,
json_datatype_options: None,
vector_index_options: Some(OptionMap::from([
("connectivity".to_string(), "32".to_string()),
("expansion_add".to_string(), "200".to_string()),
("expansion_search".to_string(), "100".to_string()),
])),
};
let result = extensions.build_vector_index_options();
assert!(result.is_ok());
let options = result.unwrap().unwrap();
assert_eq!(options.connectivity, 32);
assert_eq!(options.expansion_add, 200);
assert_eq!(options.expansion_search, 100);
}
}
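As a usage note, a small sketch, not part of the change, of the two paths through the new `build_vector_index_options` helper. It uses only the types shown in this diff; the fallback values are the ones asserted by the defaults test earlier in the diff:

#[test]
fn vector_index_options_presence_sketch() {
    use super::{ColumnExtensions, OptionMap}; // same-module imports, as in the tests above

    // `None` means the column carries no VECTOR INDEX clause at all: no options are built.
    let no_index = ColumnExtensions {
        fulltext_index_options: None,
        vector_options: None,
        skipping_index_options: None,
        inverted_index_options: None,
        json_datatype_options: None,
        vector_index_options: None,
    };
    assert!(no_index.build_vector_index_options().unwrap().is_none());

    // An empty map (a bare `VECTOR INDEX`) keeps `VectorIndexOptions::default()`,
    // i.e. the values the defaults test earlier in this diff asserts (assumption).
    let bare_index = ColumnExtensions {
        vector_index_options: Some(OptionMap::default()),
        ..no_index
    };
    let defaults = bare_index.build_vector_index_options().unwrap().unwrap();
    assert_eq!(defaults.connectivity, 16);
    assert_eq!(defaults.expansion_add, 128);
}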

View File

@@ -27,5 +27,8 @@ pub use datatypes::schema::{
pub use self::descriptors::*;
pub use self::file::{FileId, FileRef, FileRefsManifest, GcReport, IndexVersion, ParseIdError};
-pub use self::requests::{ScanRequest, TimeSeriesDistribution, TimeSeriesRowSelector};
+pub use self::requests::{
+    ScanRequest, TimeSeriesDistribution, TimeSeriesRowSelector, VectorDistanceMetric,
+    VectorIndexEngine, VectorIndexEngineType, VectorSearchMatches, VectorSearchRequest,
+};
pub use self::types::{SequenceNumber, SequenceRange};
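A short sketch of what the widened re-export enables for downstream code. The `store_api::storage` crate path is an assumption (the hunk does not name the file); only the exported item names come from the hunk:

// Assumed crate path; the exported names are taken from the re-export list above.
use store_api::storage::{VectorIndexEngine, VectorSearchMatches};

// Downstream code can accept any pluggable engine behind the re-exported trait.
fn knn(engine: &dyn VectorIndexEngine, query: &[f32], k: usize) -> Option<VectorSearchMatches> {
    engine.search(query, k).ok()
}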

View File

@@ -14,11 +14,66 @@
use std::fmt::{Display, Formatter};
use common_error::ext::BoxedError;
use common_recordbatch::OrderOption;
use datafusion_expr::expr::Expr;
// Re-export vector types from datatypes to avoid duplication
pub use datatypes::schema::{VectorDistanceMetric, VectorIndexEngineType};
use strum::Display;
-use crate::storage::SequenceNumber;
+use crate::storage::{ColumnId, SequenceNumber};
/// A hint for KNN vector search.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorSearchRequest {
/// Column ID of the vector column to search.
pub column_id: ColumnId,
/// The query vector to search for.
pub query_vector: Vec<f32>,
/// Number of nearest neighbors to return.
pub k: usize,
/// Distance metric to use (matches the index metric).
pub metric: VectorDistanceMetric,
}
/// Search results from vector index.
#[derive(Debug, Clone, PartialEq)]
pub struct VectorSearchMatches {
/// Keys (row offsets in the index).
pub keys: Vec<u64>,
/// Distances from the query vector.
pub distances: Vec<f32>,
}
/// Trait for vector index engines (HNSW implementations).
///
/// This trait defines the interface for pluggable vector index engines.
/// Implementations (e.g., UsearchEngine) are provided by storage engines like mito2.
pub trait VectorIndexEngine: Send + Sync {
/// Adds a vector with the given key.
fn add(&mut self, key: u64, vector: &[f32]) -> Result<(), BoxedError>;
/// Searches for k nearest neighbors.
fn search(&self, query: &[f32], k: usize) -> Result<VectorSearchMatches, BoxedError>;
/// Returns the serialized length.
fn serialized_length(&self) -> usize;
/// Serializes the index to a buffer.
fn save_to_buffer(&self, buffer: &mut [u8]) -> Result<(), BoxedError>;
/// Reserves capacity for vectors.
fn reserve(&mut self, capacity: usize) -> Result<(), BoxedError>;
/// Returns current size (number of vectors).
fn size(&self) -> usize;
/// Returns current capacity.
fn capacity(&self) -> usize;
/// Returns memory usage in bytes.
fn memory_usage(&self) -> usize;
}
/// A hint on how to select rows from a time-series.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Display)]
@@ -38,7 +93,7 @@ pub enum TimeSeriesDistribution {
PerSeries,
}
-#[derive(Default, Clone, Debug, PartialEq, Eq)]
+#[derive(Default, Clone, Debug, PartialEq)]
pub struct ScanRequest {
/// Indices of columns to read, `None` to read all columns. These indices are
/// based on the table schema.
@@ -66,6 +121,9 @@ pub struct ScanRequest {
pub sst_min_sequence: Option<SequenceNumber>,
/// Optional hint for the distribution of time-series data.
pub distribution: Option<TimeSeriesDistribution>,
/// Optional hint for KNN vector search. When set, the scan should use the
/// vector index to find the k nearest neighbors.
pub vector_search: Option<VectorSearchRequest>,
}
impl Display for ScanRequest {
@@ -138,6 +196,16 @@ impl Display for ScanRequest {
if let Some(distribution) = &self.distribution {
write!(f, "{}distribution: {}", delimiter.as_str(), distribution)?;
}
if let Some(vector_search) = &self.vector_search {
write!(
f,
"{}vector_search: column_id={}, k={}, metric={}",
delimiter.as_str(),
vector_search.column_id,
vector_search.k,
vector_search.metric
)?;
}
write!(f, " }}")
}
}
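A minimal sketch, not part of the diff, of attaching the new KNN hint to a scan. The `store_api::storage` import path is assumed, the column id is illustrative, and `ColumnId` is taken to be the numeric column identifier the storage API already uses:

// Assumed import path; the struct fields themselves are those defined above.
use store_api::storage::{ScanRequest, VectorDistanceMetric, VectorSearchRequest};

fn knn_scan_sketch() -> ScanRequest {
    ScanRequest {
        vector_search: Some(VectorSearchRequest {
            column_id: 1,                       // illustrative column id
            query_vector: vec![0.0_f32; 64],    // dimension must match the vector column
            k: 10,                              // number of nearest neighbors to return
            metric: VectorDistanceMetric::L2sq, // should match the metric the index was built with
        }),
        ..Default::default()
    }
}

With the `Display` arm added above, formatting such a request prints the hint's column id, k, and metric alongside the other scan fields.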

View File

@@ -19,7 +19,7 @@ use std::sync::Arc;
use std::time::Duration;
use arbitrary::{Arbitrary, Unstructured};
-use common_meta::distributed_time_constants;
+use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_telemetry::info;
use libfuzzer_sys::fuzz_target;
use rand::{Rng, SeedableRng};
@@ -254,10 +254,7 @@ async fn execute_failover(ctx: FuzzContext, input: FuzzInput) -> Result<()> {
recover_pod_failure(ctx.kube.clone(), &ctx.namespace, &chaos_name).await?;
wait_for_all_datanode_online(ctx.greptime.clone(), Duration::from_secs(60)).await;
-tokio::time::sleep(Duration::from_secs(
-    distributed_time_constants::REGION_LEASE_SECS,
-))
-.await;
+tokio::time::sleep(default_distributed_time_constants().region_lease).await;
// Validates value rows
info!("Validates num of rows");
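The same replacement recurs in the fuzz targets below. A short sketch of the new accessor, assuming `region_lease` is a `std::time::Duration` (its use in `tokio::time::sleep` implies as much):

use std::time::Duration;

use common_meta::distributed_time_constants::default_distributed_time_constants;

// The region lease is now read from the (configurable) defaults instead of the removed
// hard-coded `REGION_LEASE_SECS` constant.
fn region_lease_sketch() -> Duration {
    default_distributed_time_constants().region_lease
}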

View File

@@ -19,7 +19,7 @@ use std::sync::Arc;
use std::time::Duration;
use arbitrary::{Arbitrary, Unstructured};
-use common_meta::distributed_time_constants;
+use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_telemetry::info;
use common_time::util::current_time_millis;
use futures::future::try_join_all;
@@ -322,10 +322,7 @@ async fn execute_failover(ctx: FuzzContext, input: FuzzInput) -> Result<()> {
recover_pod_failure(ctx.kube.clone(), &ctx.namespace, &chaos_name).await?;
wait_for_all_datanode_online(ctx.greptime.clone(), Duration::from_secs(60)).await;
-tokio::time::sleep(Duration::from_secs(
-    distributed_time_constants::REGION_LEASE_SECS,
-))
-.await;
+tokio::time::sleep(default_distributed_time_constants().region_lease).await;
// Validates value rows
info!("Validates num of rows");
for (table_ctx, expected_rows) in table_ctxs.iter().zip(affected_rows) {

View File

@@ -19,7 +19,7 @@ use std::sync::Arc;
use std::time::Duration;
use arbitrary::{Arbitrary, Unstructured};
-use common_meta::distributed_time_constants;
+use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_telemetry::info;
use libfuzzer_sys::fuzz_target;
use rand::{Rng, SeedableRng};
@@ -275,10 +275,7 @@ async fn migrate_regions(ctx: &FuzzContext, migrations: &[Migration]) -> Result<
wait_for_migration(ctx, migration, &procedure_id).await;
}
-tokio::time::sleep(Duration::from_secs(
-    distributed_time_constants::REGION_LEASE_SECS,
-))
-.await;
+tokio::time::sleep(default_distributed_time_constants().region_lease).await;
Ok(())
}

View File

@@ -19,7 +19,7 @@ use std::sync::Arc;
use std::time::Duration;
use arbitrary::{Arbitrary, Unstructured};
-use common_meta::distributed_time_constants;
+use common_meta::distributed_time_constants::default_distributed_time_constants;
use common_telemetry::info;
use libfuzzer_sys::fuzz_target;
use rand::{Rng, SeedableRng};
@@ -274,10 +274,7 @@ async fn migrate_regions(ctx: &FuzzContext, migrations: &[Migration]) -> Result<
.await;
}
-tokio::time::sleep(Duration::from_secs(
-    distributed_time_constants::REGION_LEASE_SECS,
-))
-.await;
+tokio::time::sleep(default_distributed_time_constants().region_lease).await;
Ok(())
}

View File

@@ -259,9 +259,8 @@ impl GreptimeDbStandaloneBuilder {
let grpc_handler = instance.clone() as Arc<dyn GrpcQueryHandlerWithBoxedError>;
let weak_grpc_handler = Arc::downgrade(&grpc_handler);
frontend_instance_handler
-    .lock()
-    .unwrap()
-    .replace(weak_grpc_handler);
+    .set_handler(weak_grpc_handler)
+    .await;
let flow_streaming_engine = flownode.flow_engine().streaming_engine();
let invoker = flow::FrontendInvoker::build_from(

View File

@@ -1397,6 +1397,8 @@ max_recv_message_size = "512MiB"
max_send_message_size = "512MiB"
flight_compression = "arrow_ipc"
runtime_size = 8
http2_keep_alive_interval = "10s"
http2_keep_alive_timeout = "3s"
[grpc.tls]
mode = "disable"
@@ -1586,6 +1588,7 @@ fn drop_lines_with_inconsistent_results(input: String) -> String {
"endpoint =",
"region =",
"enable_virtual_host_style =",
"disable_ec2_metadata =",
"cache_path =",
"cache_capacity =",
"memory_pool_size =",