Commit Graph

23 Commits

Author SHA1 Message Date
Yingwen
e1156728fc perf(mito-codec): optimize SparseValues decode and lookup (#8057)
* perf: add benchmarks for SparsePrimaryKeyCodec::has_column

Add benchmarks covering table_id, tsid, first_tag, and last_tag
lookups across 5, 10, 50, and 100 tag counts to measure the cost of
the offset map construction in has_column.

Signed-off-by: evenyag <realevenyag@gmail.com>

* perf: handle table id/tsid specially

Signed-off-by: evenyag <realevenyag@gmail.com>

* perf: lazy decode

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: use vec for small tags

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add bench

Signed-off-by: evenyag <realevenyag@gmail.com>

* perf: use 32 as inline capacity

Signed-off-by: evenyag <realevenyag@gmail.com>

* perf: benchmark sparse value

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: change sparse values to use vec

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: reserve capacity

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: simplify comments

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update benchmark for map sparse values

Signed-off-by: evenyag <realevenyag@gmail.com>

* docs: update comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove empty check

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-05-08 08:35:34 +00:00
Yingwen
555741a9f1 feat: prune bulk memtable parts by first tag (#7911)
* feat: initial implementation for MultiBulkPart minmax pruning

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: add debug logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: prune bulk part

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fmt code

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address PR review comments for bulk part pruning

Document sparse encoding format in SparsePrimaryKeyCodec and add
comment explaining why primary_key.first() works for both encodings.
Remove noisy info-level pruning logs from the read path.

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add test for MultiBulkPart batch pruning behavior

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: don't return error when read() returns None

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: address review comments

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-04-21 10:08:16 +00:00
shuiyisong
ba32c5fe9e chore: remove unused deps using udeps (#7906)
* chore: remove unused deps using udeps

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fmt toml

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2026-04-02 06:49:27 +00:00
Ning Sun
e14404c677 chore: update rust toolchain to 2026-03-21 (#7849)
* chore: update rust toolchain to 2026-03-21

* chore: new format

* fix: lint

* chore: resolve lint issues

* chore: remove as_millis_f64

* chore: deps up
2026-03-30 12:13:14 +00:00
Yingwen
92e2d71f48 feat: implement prefilter framework and primary key prefilter (#7862)
* feat: prefilter basic framework

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move arguments to RowGroupBuildContext

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: skip prefiltered exprs in FlatPruneReader

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused functions

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: handle partition columns in prefilter

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: apply prefiltered selection by and_then

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle last row cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: don't ignore error in PrimaryKeyFilter

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2026-03-30 04:24:20 +00:00
Ruihang Xia
3ce7e7bc81 perf: accelerate pk filter (#7584)
* perf: accelerate pk filter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* add benchmark

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* unit tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* support compare ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refactor to reduce dense decode/encode duplication

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* refactor if-else chain

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2026-01-19 06:41:51 +00:00
Yingwen
fed6cb0806 fix: flat format use correct encoding in indexer for tags (#7440)
* test: add inverted and skipping test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: Add tests for fulltext index

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: index dictionary type in correct encoding in flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use encode_data_type() in SortField

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: refine imports

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add tests for sparse encoding

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove logs

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update list test

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: simplify tests

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-12-19 07:36:44 +00:00
Ruihang Xia
5bf72ab327 feat: decode_primary_key method for debugging (#7284)
* initial impl

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* third param

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: support convert Dictionary type to ConcreteDataType

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* change to list array

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify file_stream::create_stream

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify FileRegion

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* type alias

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix format

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove staled test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-11-24 12:41:54 +00:00
Weny Xu
47937961f6 feat(metric)!: enable sparse primary key encoding by default (#7195)
* feat(metric): enable sparse primary key encoding by default

Signed-off-by: WenyXu <wenymedia@gmail.com>

* chore: update config.md

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix: fix unit tests

Signed-off-by: WenyXu <wenymedia@gmail.com>

* fix sqlness

Signed-off-by: WenyXu <wenymedia@gmail.com>

* Update src/mito-codec/src/key_values.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

* feat: only allow setting primary key encoding for metric engine

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support deleting rows from logical region instead of physical region

This keeps the behavior the same as put. It's easier to support sparse
encoding for deleting logical regions. Now the metric engine doesn't
support delete rows from physical region directly.

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: update sqlness

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused error

Signed-off-by: WenyXu <wenymedia@gmail.com>

---------

Signed-off-by: WenyXu <wenymedia@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-11-11 06:33:51 +00:00
shuiyisong
a20ac4f9e5 feat: prefix option for timestamp index and value column (#7125)
* refactor: use GREPTIME_TIMESTAMP const

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* feat: add config for default ts col name

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* refactor: replace GREPTIME_TIMESTAMP with function get

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update config doc

* fix: test

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: remove opts on flownode and metasrv

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: add validation for ts column name

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: use get_or_init to avoid test error

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: fmt

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update docs

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: using empty string to disable prefix

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: update comment

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

* chore: address CR issues

Signed-off-by: shuiyisong <xixing.sys@gmail.com>

---------

Signed-off-by: shuiyisong <xixing.sys@gmail.com>
2025-10-27 08:00:03 +00:00
dennis zhuang
8a2371a05c feat: supports large string (#7097)
* feat: supports large string

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: add doc for extract_string_vector_values

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: refactor by cr comments

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: changes by cr comments

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* refactor: extract_string_vector_values

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: remove large string type and refactor string vector

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: revert some changes

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* feat: adds large string type

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: impl default for StringSizeType

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* fix: tests and test compatibility

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* test: update sqlness tests

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

* chore: remove panic

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>

---------

Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-10-17 01:46:11 +00:00
Ning Sun
2e6ea1167f refactor: update valueref coerce function name based on its semantics (#7098) 2025-10-16 09:11:40 +00:00
LFC
8fe17d43d5 chore: update rust to nightly 2025-10-01 (#7069)
* chore: update rust to nightly 2025-10-01

Signed-off-by: luofucong <luofc@foxmail.com>

* chore: nix update

---------

Signed-off-by: luofucong <luofc@foxmail.com>
Co-authored-by: Ning Sun <sunning@greptime.com>
2025-10-11 07:30:52 +00:00
Ning Sun
749a5ab165 feat: struct value and vector (#7033)
* feat: struct value

Signed-off-by: Ning Sun <sunning@greptime.com>

* feat: update for proto module

* feat: wip struct type

* feat: implement more vector operations

* feat: make datatype and api

* feat: reoslve some compilation issues

* feat: resolve all compilation issues

* chore: format update

* test: resolve tests

* test: test and refactor value-to-pb

* feat: add more tests and fix for value types

* chore: remove dbg

* feat: test and fix iterator

* fix: resolve struct_type issue

* refactor: use vec for struct items

* chore: update proto to main branch

* refactor: address some of review issues

* refactor: update for further review

* Add validation on new methods

* feat: update struct/list json serialization

* refactor: reimplement get in struct_vector

* refactor: struct vector functions

* refactor: fix lint issue

* refactor: address review comments

---------

Signed-off-by: Ning Sun <sunning@greptime.com>
2025-10-10 21:49:51 +00:00
Lei, HUANG
c92ab4217f fix: avoid truncating SST statistics during flush (#6977)
fix/disable-parquet-stats-truncate:
 - **Update `memcomparable` Dependency**: Switched from crates.io to a Git repository for `memcomparable` in `Cargo.lock`, `mito-codec/Cargo.toml`, and removed it from `mito2/Cargo.toml`.
 - **Enhance Parquet Writer Properties**: Added `set_statistics_truncate_length` and `set_column_index_truncate_length` to `WriterProperties` in `parquet.rs`, `bulk/part.rs`, `partition_tree/data.rs`, and `writer.rs`.
 - **Add Test for Corrupt Scan**: Introduced a new test module `scan_corrupt.rs` in `mito2/src/engine` to verify handling of corrupt data.
 - **Update Test Data**: Modified test data in `flush.rs` to reflect changes in file sizes and sequences.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-09-17 03:02:52 +00:00
Ruihang Xia
c9377e7c5a build: bump rust edition to 2024 (#6920)
* bump edition

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* format

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* gen keyword

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* lifetime and env var

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* one more gen fix

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* lifetime of temporaries in tail expressions

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* format again

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clippy nested if

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* clippy let and return

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

---------

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2025-09-08 02:37:18 +00:00
Yingwen
8fc42aeb27 feat: Update parquet writer and indexer to support the flat format (#6866)
* feat: implements method to write flat batch for ParquetWriter

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: add update method for flat RecordBatch in Indexer

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: calls indexer to write flat batch in ParquetWriter

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: handle empty projection for flat format

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: eval array in precise_filter_flat

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: cache column lookup result in inverted indexer

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: add test

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support dict type in dense codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: remove read part in test as it need modifying the reader

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: support dictionary type in other methods for dense codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: fulltext use string array directly

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-09-02 12:48:34 +00:00
Lei, HUANG
dd3432e6ca chore: change encode raw values signature (#6869)
* chore/change-encode-raw-values-sig:
 ### Update Sparse Encoding to Use Byte Slices

 - **`bench_sparse_encoding.rs`**: Modified the `encode_raw_tag_value` function to use byte slices instead of `Bytes` for tag values.
 - **`sparse.rs`**: Updated the `encode_raw_tag_value` method in `SparsePrimaryKeyCodec` to accept byte slices (`&[u8]`) instead of `Bytes`. Adjusted related test cases to reflect this change.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* chore/change-encode-raw-values-sig:
 ### Add `Clear` Trait Implementation for Byte Slices

 - Implemented the `Clear` trait for byte slices (`&[u8]`) in `repeated_field.rs` to enhance trait coverage and provide a default clear operation for byte slice types.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
2025-09-01 09:53:08 +00:00
Lei, HUANG
d99734b97b perf: sparse encoder (#6809)
* perf/sparse-encoder:
 - **Update Dependencies**: Updated `criterion-plot` to version `0.5.0` and added `criterion` version `0.7.0` in `Cargo.lock`. Added `bytes` to `Cargo.toml` in `src/metric-engine`.
 - **Benchmarking**: Added a new benchmark for sparse encoding in `bench_sparse_encoding.rs` and updated `Cargo.toml` in `src/mito-codec` to include `criterion` as a dev-dependency.
 - **Sparse Encoding Enhancements**: Modified `SparsePrimaryKeyCodec` in `sparse.rs` to include new methods `encode_raw_tag_value` and `encode_internal`. Added public constants `RESERVED_COLUMN_ID_TSID` and `RESERVED_COLUMN_ID_TABLE_ID`.
 - **HTTP Server**: Made `try_decompress` function public in `prom_store.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* perf/sparse-encoder:
 Improve buffer handling in `sparse.rs`

 - Refactored buffer reservation logic to use `value_len` for clarity.
 - Optimized chunk processing by calculating `num_chunks` and `remainder` for efficient data handling.
 - Enhanced manual serialization of bytes to avoid byte-by-byte operations, improving performance.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* Update src/mito-codec/src/row_converter/sparse.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
2025-08-26 04:10:11 +00:00
Yingwen
d5575d3fa4 feat: add FlatConvertFormat to convert record batches in old format to the flat format (#6786)
* feat: add convert format to FlatReadFormat

Signed-off-by: evenyag <realevenyag@gmail.com>

* test: test convert format

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: only convert string pks to dictionary

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-08-25 06:47:06 +00:00
LFC
f9d2a89a0c chore: update datafusion family (#6675)
* chore: update datafusion family

Signed-off-by: luofucong <luofc@foxmail.com>

* fix ci

Signed-off-by: luofucong <luofc@foxmail.com>

* use official otel-arrow-rust

Signed-off-by: luofucong <luofc@foxmail.com>

* rebase

Signed-off-by: luofucong <luofc@foxmail.com>

* use the official orc-rust

Signed-off-by: luofucong <luofc@foxmail.com>

* resolve PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

* remove the empty lines

Signed-off-by: luofucong <luofc@foxmail.com>

* try following PR comments

Signed-off-by: luofucong <luofc@foxmail.com>

---------

Signed-off-by: luofucong <luofc@foxmail.com>
2025-08-15 12:41:49 +00:00
discord9
f07b1daed4 feat: struct vector (#6595)
* feat: struct vector

Signed-off-by: discord9 <discord9@163.com>

* fix: array2vector&arrow type2concrete type

Signed-off-by: discord9 <discord9@163.com>

* chore: clippy

Signed-off-by: discord9 <discord9@163.com>

* chore: resolve some todos

Signed-off-by: discord9 <discord9@163.com>

* refactor: per review

Signed-off-by: discord9 <discord9@163.com>

---------

Signed-off-by: discord9 <discord9@163.com>
2025-07-29 08:22:27 +00:00
Yingwen
eaf1e1198f refactor: Extract mito codec part into a new crate (#6307)
* chore: add a new crate mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: port necessary mods for primary key codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: use codec utils in mito-codec

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove unused mods

Signed-off-by: evenyag <realevenyag@gmail.com>

* style: fix clippy

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove Partition::is_partition_column()

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: remove duplicated test utils

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: remove unused comment

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fix is_partition_column check

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
2025-06-13 07:14:29 +00:00