discord9
73325acfe4
fix: zh same underscore behavior ( #8002 )
...
* fix: zh same underscore behavior
Signed-off-by: discord9 <discord9@163.com >
* fix: only add token with _ from en analyzer
Signed-off-by: discord9 <discord9@163.com >
* test: neg sqlness case
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-04-22 03:21:36 +00:00
discord9
a8540ad39d
perf: better jieba cut ( #7984 )
...
* perf: better jieba cut
Signed-off-by: discord9 <discord9@163.com >
* fix: also filter punctuation marks
Signed-off-by: discord9 <discord9@163.com >
* chore
Signed-off-by: discord9 <discord9@163.com >
* docs: explain why
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-04-17 08:33:29 +00:00
discord9
3750819f93
fix: match term zh ( #7952 )
...
* fix: match term zh
Signed-off-by: discord9 <discord9@163.com >
* chore: per gemini
Signed-off-by: discord9 <discord9@163.com >
* chore: revert accident change
Signed-off-by: discord9 <discord9@163.com >
* feat: unicode script han
Signed-off-by: discord9 <discord9@163.com >
---------
Signed-off-by: discord9 <discord9@163.com >
2026-04-13 13:04:11 +00:00
cui
06e49961c7
fix(index): intersect bitmaps before early exit in predicates applier ( #7867 )
...
* fix(index): intersect bitmaps before early exit in predicates applier
The loop skipped intersecting when the next bitmap was empty, which left
the accumulator unchanged instead of zeroing it. Intersect first, then
break when the result is empty.
Signed-off-by: Weixie Cui <cuiweixie@gmail.com >
* per gemini
* style(index): format predicates applier loop
* fix(index): remove unused mut in predicates applier
---------
Signed-off-by: Weixie Cui <cuiweixie@gmail.com >
Co-authored-by: discord9 <55937128+discord9@users.noreply.github.com >
Co-authored-by: discord9 <discord9@163.com >
2026-04-10 09:22:12 +00:00
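Note on #7867: the behavior described in the commit message above can be illustrated with a minimal sketch. It assumes a `BTreeSet<u32>` as a stand-in for the real bitmap type, and the function names `intersect_all_buggy` and `intersect_all_fixed` are hypothetical; none of this is taken from the predicates applier itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a row bitmap (assumption: the real type is not shown in this log).
type Bitmap = BTreeSet<u32>;

/// Buggy shape: exits as soon as the next bitmap is empty, WITHOUT intersecting,
/// so the accumulator keeps rows that should have been eliminated.
fn intersect_all_buggy(bitmaps: &[Bitmap]) -> Bitmap {
    let mut iter = bitmaps.iter();
    let mut acc = match iter.next() {
        Some(first) => first.clone(),
        None => return Bitmap::new(),
    };
    for bm in iter {
        if bm.is_empty() {
            break; // early exit leaves `acc` unchanged instead of zeroing it
        }
        acc = acc.intersection(bm).copied().collect();
    }
    acc
}

/// Fixed shape: intersect first, then break once the result is empty.
fn intersect_all_fixed(bitmaps: &[Bitmap]) -> Bitmap {
    let mut iter = bitmaps.iter();
    let mut acc = match iter.next() {
        Some(first) => first.clone(),
        None => return Bitmap::new(),
    };
    for bm in iter {
        acc = acc.intersection(bm).copied().collect();
        if acc.is_empty() {
            break; // nothing can match any more, so later bitmaps are irrelevant
        }
    }
    acc
}
```

With the inputs {1, 2, 3}, {} and {2, 3, 4}, the buggy shape returns {1, 2, 3}, while the fixed shape returns the empty set.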
Lanqing Yang
24ab861052
chore: move Tantivy fulltext search to blocking thread pool ( #7919 )
...
perf: move Tantivy fulltext search to blocking thread pool
Wrap the synchronous Tantivy search (query parsing, posting list
traversal, stored field reads) in spawn_blocking_global to avoid
starving the tokio async runtime with CPU-bound work.
Signed-off-by: lyang24 <lanqingy93@gmail.com >
2026-04-09 11:12:05 +00:00
Ning Sun
e14404c677
chore: update rust toolchain to 2026-03-21 ( #7849 )
...
* chore: update rust toolchain to 2026-03-21
* chore: new format
* fix: lint
* chore: resolve lint issues
* chore: remove as_millis_f64
* chore: deps up
2026-03-30 12:13:14 +00:00
LFC
5eac4f10aa
chore: remove dependency on "atty" ( #7725 )
...
Signed-off-by: luofucong <luofc@foxmail.com >
2026-02-26 09:58:01 +00:00
dennis zhuang
8883022742
refactor(vector-index): use protobuf for metadata and align code ( #7648 )
...
* refactor(vector-index): use protobuf for metadata and introduce lifecycle traits
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: minor change
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* refactor: by suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: format
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: style
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: remove usearch from mito2
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: tweak errors
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* test: update index size in result
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: clippy
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: update proto deps
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
2026-02-05 02:41:48 +00:00
discord9
1afcddd5a9
chore: feature gate vector_index ( #7428 )
...
Signed-off-by: discord9 <discord9@163.com >
2025-12-17 07:14:25 +00:00
dennis zhuang
a35a39f726
feat(vector_index): adds the foundational types and SQL parsing support for vector index ( #7366 )
...
* feat: adds the foundational types and SQL parsing support for vector index
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* refactor: by suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: ensure index option values must be greater than zero
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* chore: validate connectivity strictly
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* fix: compile error
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
* feat: disable SIMD for ci
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com >
2025-12-16 22:45:36 +00:00
Yingwen
84e4e42ee7
feat: add more verbose metrics to scanners ( #7336 )
...
* feat: add inverted applier metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add metrics to bloom applier
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add metrics to fulltext index applier
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: implement BloomFilterReadMetrics for BloomFilterReader
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect read metrics for inverted index
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add metrics for range_read and metadata
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: rename elapsed to fetch_elapsed
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect metadata fetch metrics for inverted index
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect cache metrics for inverted and bloom index
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect read metrics in appliers
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect fulltext dir metrics for applier
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect parquet row group metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add parquet metadata metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add apply metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect more metrics for memory row group
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add fetch metrics to ReaderMetrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: init verbose metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: debug print metrics in ScanMetricsSet
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: implement debug for new metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: fix compiler errors
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: update parquet fetch metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: collect the whole fetch time
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: add file_scan_cost
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: parquet fetch add cache_miss counter
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: print index read metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: use actual bytes to increase counter
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: remove provided implementations for index reader traits
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: change get_parquet_meta_data() method to receive metrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: rename file_scan_cost to sst_scan_cost
Signed-off-by: evenyag <realevenyag@gmail.com >
* chore: refine ParquetFetchMetrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* style: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com >
* style: fmt code
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: remove useless inner method
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: collect page size actual needed
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify InvertedIndexReadMetrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify InvertedIndexApplyMetrics Debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify BloomFilterReadMetrics Debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify BloomFilterIndexApplyMetrics Debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify FulltextIndexApplyMetrics implementation
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify ParquetFetchMetrics Debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: simplify MetadataCacheMetrics Debug
Signed-off-by: evenyag <realevenyag@gmail.com >
* feat: only print verbose metrics when they are not empty.
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: use mutex to protect ParquetFetchMetrics
Signed-off-by: evenyag <realevenyag@gmail.com >
* style: fmt code
Signed-off-by: evenyag <realevenyag@gmail.com >
* refactor: use duration for elapsed in ParquetFetchMetricsData
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2025-12-04 13:40:18 +00:00
Zhenchi
7b396bb290
feat(mito2): expose puffin index metadata ( #7042 )
...
* Add encode/decode helpers for IndexTarget
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* Use IndexTarget encode for puffin index blob keys
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* Normalize puffin index blobs to use IndexTarget keys
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* feat(mito2): expose puffin index metadata
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* target json polish
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix header
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* add index path
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address copilot comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* reuse cached index metadata
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* parallelism for reading index meta
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-10-17 06:22:07 +00:00
LFC
8fe17d43d5
chore: update rust to nightly 2025-10-01 ( #7069 )
...
* chore: update rust to nightly 2025-10-01
Signed-off-by: luofucong <luofc@foxmail.com >
* chore: nix update
---------
Signed-off-by: luofucong <luofc@foxmail.com >
Co-authored-by: Ning Sun <sunning@greptime.com >
2025-10-11 07:30:52 +00:00
Ruihang Xia
c9377e7c5a
build: bump rust edition to 2024 ( #6920 )
...
* bump edition
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* format
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* gen keyword
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* lifetime and env var
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* one more gen fix
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* lifetime of temporaries in tail expressions
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* format again
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* clippy nested if
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* clippy let and return
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2025-09-08 02:37:18 +00:00
Ruihang Xia
e495c614f7
perf: improve bloom filter reader's byte reading logic ( #6658 )
...
* perf: improve bloom filter reader's byte reading logic
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* revert toml change
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* clarify comment
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* benchmark
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* update lock file
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* pub util fn
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* note endian
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2025-08-12 11:37:25 +00:00
Yingwen
bbab35f285
perf: Reduce fulltext bloom load time ( #6651 )
...
* perf: cached reader does not fetch pages concurrently
Otherwise, concurrent readers would all fetch the same pages in parallel
Signed-off-by: evenyag <realevenyag@gmail.com >
* perf: always disable zstd for bloom
Signed-off-by: evenyag <realevenyag@gmail.com >
---------
Signed-off-by: evenyag <realevenyag@gmail.com >
2025-08-06 08:25:31 +00:00
Ruihang Xia
757694ae38
feat: count underscore in English tokenizer and improve performance ( #6660 )
...
* feat: count underscore in English tokenizer and improve performance
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* update lock file
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* update test results
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* assert lookup table
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* handle utf8 alphanumeric
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* finalize
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2025-08-06 07:23:18 +00:00
yihong
e19493db4a
chore: update jieba tantivy-jieba and tantivy version ( #6637 )
...
* chore: update jieba tantivy-jieba and tantivy version
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
* fix: address comments
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
---------
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-08-03 19:08:36 +00:00
Zhenchi
599f289f59
feat: add granularity and false_positive_rate options for indexes ( #6416 )
...
* feat: add `granularity` and `false_positive_rate` options for indexes
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* upgrade proto
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-07-02 07:33:39 +00:00
Zhenchi
400229c384
feat: introduce index result cache ( #6110 )
...
* feat: introduce index result cache
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* Update src/mito2/src/sst/index/inverted_index/applier/builder.rs
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* optimize selector_len
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2025-05-20 01:45:42 +00:00
shuiyisong
3c943be189
chore: update rust toolchain ( #5818 )
...
* chore: update nightly version
* chore: sort lint lines
* chore: minor fix
* chore: update nix
* chore: update toolchain to 2024-04-14
* chore: update toolchain to 2024-04-15
* chore: remove unnecessary test
* chore: do not assert oid in sqlness test
* chore: fix margin issue
* chore: fix cr issues
* chore: fix cr issues
---------
Co-authored-by: Ning Sun <sunning@greptime.com >
2025-04-27 09:02:36 +00:00
Zhenchi
d5026f3491
perf: optimize fulltext zh tokenizer for ascii-only text ( #5975 )
...
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-04-24 23:31:26 +00:00
Zhenchi
e3675494b4
feat: apply terms with fulltext bloom backend ( #5884 )
...
* feat: apply terms with fulltext bloom backend
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* perf: preload jieba
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish doc
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-04-14 07:08:59 +00:00
Zhenchi
dce5e35d7c
feat: apply terms with fulltext tantivy backend ( #5869 )
...
* feat: apply terms with fulltext tantivy backend
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix test
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-04-10 07:32:15 +00:00
Ruihang Xia
c26e165887
refactor: check and fix super import ( #5846 )
...
* refactor: check and fix super import
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* add to makefile
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* change dir
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2025-04-08 11:48:52 +00:00
Zhenchi
f797de3497
feat: add backend field to fulltext options ( #5806 )
...
* feat: add backend field to fulltext options
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* update proto
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix option conv
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix display
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-04-02 09:15:54 +00:00
Zhenchi
aa486db8b7
refactor: allow bloom filter search to apply and conjunction ( #5770 )
...
* refactor: change bloom filter search from any to all match
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* place back in list
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* nit
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-04-01 12:50:34 +00:00
fys
2b2ea5bf72
chore: upgrade some dependencies ( #5777 )
...
* chore: upgrade some dependencies
* chore: upgrade some dependencies
* fix: cr
* fix: ci
* fix: test
* fix: cargo fmt
2025-03-27 02:48:44 +00:00
Zhenchi
7bcb01d269
feat: utilize blob metadata properties ( #5767 )
...
* feat: utilize blob metadata properties
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* Update src/puffin/src/puffin_manager/fs_puffin_manager/reader.rs
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-03-26 02:47:20 +00:00
Zhenchi
face361fcb
feat: introduce roaring bitmap to optimize sparse value scenarios ( #5603 )
...
* feat: introduce roaring bitmap to optimize sparse value scenarios
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix taplo
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-03-10 04:24:08 +00:00
Zhenchi
e714f7df6c
fix: out of bound during bloom search ( #5625 )
...
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-03-03 09:53:14 +00:00
Zhenchi
8d05fb3503
feat: unify puffin name passed to stager ( #5564 )
...
* feat: purge a given puffin file in staging area
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish log
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* ttl set to 2d
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* feat: expose staging_ttl to index config
* feat: unify puffin name passed to stager
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix test
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fallback to remote index
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* refactor
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
Co-authored-by: evenyag <realevenyag@gmail.com >
2025-02-21 09:27:03 +00:00
Zhenchi
421e38c481
feat: allow purging a given puffin file in staging area ( #5558 )
...
* feat: purge a given puffin file in staging area
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* polish log
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* ttl set to 2d
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* feat: expose staging_ttl to index config
* fix test
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* use `invalidate_entries_if` instead of maintaining map
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* run_pending_tasks after purging
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
Co-authored-by: evenyag <realevenyag@gmail.com >
2025-02-19 08:58:30 +00:00
Zhenchi
858dae7b23
feat: add stager notifier to collect metrics ( #5530 )
...
* feat: add stager notifier to collect metrics
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* apply prev commit
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* remove dup size
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* add load cost
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-02-14 07:49:26 +00:00
Yingwen
35b635f639
feat!: Bump datafusion, prost, hyper, tonic, tower, axum ( #5417 )
...
* change dep
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* feat: adapt to arrow's interval array
* chore: fix compile errors in datatypes crate
* chore: fix api crate compiler errors
* chore: fix compiler errors in common-grpc
* chore: fix common-datasource errors
* chore: fix deprecated code in common-datasource
* fix promql and physical plan related
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* wip: upgrading network deps
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* block on updating `sqlparser`
* upgrade sqlparser
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* adapt new df's trait requirements
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* chore: fix compiler errors in mito2
* chore: fix common-function crate errors
* chore: fix catalog errors
* change import path
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* chore: fix some errors in query crate
* chore: fix some errors in query crate
* aggr expr and some other tiny fixes
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* chore: fix expr related errors in query crate
* chore: fix query serializer and admin command
* chore: fix grpc services
* feat: axum serve
* chore: fix http server
* remove handle_error handler
* refactor timeout layer
* serve axum
* chore: fix flow aggr functions
* chore: fix flow
* feat: fix errors in meta-srv
* boxed()
* use TokioIo
* feat!: Remove script crate and python feature (#5321 )
* feat: exclude script crate
* chore: simplify feature
* feat: remove the script crate
* chore: remove python feature and some comments
* chore: fix warning
* chore: fix servers tests compiler errors
* feat: fix tests-integration errors
* chore: fix unused
* test: fix catalog test
* chore: fix compiler errors for crates using common-meta
the testing feature is enabled when checking with --workspace
* test: use display for logical plan test
* test: implement rewrite for ScanHintRule
* fix: http server build panic
* test: fix mito test
* fix: sql parser type alias error
* test: fix TestClient not listening
* test: some flow tests
* test(flow): more fix
* fix: test_otlp_logs
* test: fix promql test that uses deprecated method fun()
* fix: sql type replace supports Int8 ~ Int64, UInt8 ~ UInt64
* test: fix infer schema test case
* test: fix tests related to plan display
* chore: fix last flow test
* test: fix function format related assertion
* test: use larger port range for tests
* fix: test_otlp_traces
* fix: test_otlp_metrics
* fix range query and dist plan
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* fix: flow handle distinct use deprecated field
* fix: can't pass Join plan expressions to LogicalPlan::with_new_exprs
* test: fix deserialize test
* test: reduce split key case num
* tests: lower case aggr func name
* test: fix some sqlness tests
* tests: more sqlness fix
* tests: fixed sqlness test
* commit non-bug changes
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* fix: make our udf correct
* fix: implement empty methods of ContextProvider for DfContextProviderAdapter
* test: update sqlness test result
* chore: remove unused
* fix: provide alias name for AggregateExprBuilder in range plan
* test: update range query result
* fix: implement missing ContextProvider methods for DfContextProviderAdapter
* test: update timestamps, cte result
* fix: supports empty projection in mito
* test: update comment for cte test
* fix: support projection for numbers
* test: update test cases after projection fix
* fix: fix range select first_value/last_value
* fix: handle CAST and time index conflict
* fix: handle order by correctly in range first_value/last_value
* test: update sqlness result
* test: update view test result
* test: update decimal test
wait for https://github.com/apache/datafusion/pull/14126 to fix this
* feat: remove redundant physical optimization
todo(ruihang): Check if we can remove this.
* test: update sqlness test result
* chore: range select default sort use nulls_first = false
* test: update filter push down test result
* test: comment out decimal test to avoid different panic message
* test: update some distributed test result
* test: update test for distributed count and filter push down
* test: update subqueries test
* fix: SessionState may overwrite our UDFs
* chore: fix compiler errors after merging main
* fix: fix elasticsearch and dashboard router panic
* chore: fix common-functions tests
* chore: update sqlness result
* test: fix id keyword and update sqlness result
* test: fix flow_null test
* fix: enlarge thread size in debug mode to avoid overflow
* chore: fix warnings in common-function
* chore: fix warning in flow
* chore: fix warnings in query crate
* chore: remove unused warnings
* chore: fix deprecated warnings for parquet
* chore: fix deprecated warning in servers crate
* style: fix clippy
* test: enlarge mito cache ttl test ttl time
* chore: fix typo
* style: fmt toml
* refactor: reimplement PartialOrd for RangeSelect
* chore: remove script crate files introduced by merge
* fix: return error if sql option is not kv
* chore: do not use ..default::default()
* chore: per review
* chore: update error message in BuildAdminFunctionArgsSnafu
Co-authored-by: jeremyhi <jiachun_feng@proton.me >
* refactor: typed precision
* update sqlness view case
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* chore: flow per review
* chore: add example in comment
* chore: warn if parquet stats of timestamp is not INT64
* style: add a newline before derive to make the comment more clear
* test: update sqlness result
* fix: flow from substrait
* chore: change update_range_context log to debug level
* chore: move axum-extra axum-macros to workspace
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
Co-authored-by: Ruihang Xia <waynestxia@gmail.com >
Co-authored-by: luofucong <luofc@foxmail.com >
Co-authored-by: discord9 <discord9@163.com >
Co-authored-by: shuiyisong <xixing.sys@gmail.com >
Co-authored-by: jeremyhi <jiachun_feng@proton.me >
2025-01-23 06:15:40 +00:00
Zhenchi
f74a955504
feat: bloom filter as fulltext index v2 (Part 1) ( #5406 )
...
* feat: bloom filter as fulltext index v2
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* add unit tests for tokenizer
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* refactor dup vars
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-01-21 23:33:11 +00:00
Zhenchi
1acfb6ed1c
feat!: use indirect indices for bloom filter to reduce size ( #5377 )
...
* feat!(bloom-filter): use indirect indices to reduce size
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix format
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* update proto
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* nit
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* upgrade proto
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-01-16 13:18:29 +00:00
Zhenchi
5cf9d7b6ca
fix(bloom-filter): filter rows with segment precision ( #5286 )
...
* fix(bloom-filter): filter rows with segment precision
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* add case
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address TODO
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2025-01-06 11:45:15 +00:00
Weny Xu
96b2a5fb28
feat: introduce ParallelFstValuesMapper ( #5276 )
...
* refactor: `RangeReader` to use `&self`
* refactor: `InvertedIndexReader` to use `&self`
* refactor: `BloomFilterReader` to use `&self`
* feat: introduce `ParallelFstValuesMapper`
* chore: change prefetch size to 8KiB
* chore: add `file_size_hint` for cached blob reader
* chore: fix clippy
* refactor: remove `FstValuesMapper`
* chore: apply suggestions from CR
2025-01-06 07:33:35 +00:00
Zhenchi
f4b2d393be
feat(config): add bloom filter config ( #5237 )
...
* feat(bloom-filter): integrate indexer with mito2
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* feat(config) add bloom filter config
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix docs
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix docs
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* merge
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* remove cache config
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-26 04:38:45 +00:00
Ruihang Xia
00ad27dd2e
feat(bloom-filter): bloom filter applier ( #5220 )
...
* wip
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* draft search logic
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* use defined BloomFilterReader
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* fix clippy
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* round the range end
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* finish index applier
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* integrate applier into mito2 with cache layer
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* fix cache key and add unit test
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* provide bloom filter index size hint
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* revert BloomFilterReaderImpl::read_vec
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* remove dead code
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* ignore null on eq
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
* add more tests and fix bloom filter logic
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com >
2024-12-26 02:51:18 +00:00
Zhenchi
a9f21915ef
feat(bloom-filter): integrate indexer with mito2 ( #5236 )
...
* feat(bloom-filter): integrate indexer with mito2
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* rename skippingindextype
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-25 14:30:07 +00:00
Zhenchi
c96903e60c
feat(bloom-filter): impl batch push to creator ( #5225 )
...
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-25 07:53:53 +00:00
Zhenchi
d51b65a8bf
feat(index-cache): abstract IndexCache to be shared by multi types of indexes ( #5219 )
...
* feat(index-cache): abstract `IndexCache` to be shared by multi types of indexes
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix typo
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix: remove added label
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* refactor: simplify cached reader impl
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* rename func
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-24 05:10:30 +00:00
Zhenchi
4245bff8f2
feat(bloom-filter): add bloom filter reader ( #5204 )
...
* feat(bloom-filter): add bloom filter reader
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* chore: remove unused dep
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix conflict
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* address comments
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-20 08:29:18 +00:00
Zhenchi
3d4121aefb
feat(bloom-filter): add memory control for creator ( #5185 )
...
* feat(bloom-filter): add memory control for creator
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* refactor: remove meaningless buf
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* feat: add codec for intermediate
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-20 06:59:44 +00:00
Yohan Wal
7d1bcc9d49
feat: introduce Buffer for non-continuous bytes ( #5164 )
...
* feat: introduce Buffer for non-continuous bytes
* Update src/mito2/src/cache/index.rs
Co-authored-by: Weny Xu <wenymedia@gmail.com >
* chore: apply review comments
* refactor: use opendal::Buffer
---------
Co-authored-by: Weny Xu <wenymedia@gmail.com >
2024-12-18 03:45:38 +00:00
Zhenchi
d821dc5a3e
feat(bloom-filter): add basic bloom filter creator (Part 1) ( #5177 )
...
* feat(bloom-filter): add a simple bloom filter creator (Part 1)
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix: clippy
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* fix: header
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
* docs: add format comment
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
---------
Signed-off-by: Zhenchi <zhongzc_arch@outlook.com >
2024-12-17 06:55:42 +00:00
Yohan Wal
4b4c6dbb66
refactor: cache inverted index with fixed-size page ( #5114 )
...
* feat: cache inverted index by page instead of file
* fix: add unit test and fix bugs
* chore: typo
* chore: ci
* fix: math
* chore: apply review comments
* chore: renames
* test: add unit test for index key calculation
* refactor: use ReadableSize
* feat: add config for inverted index page size
* chore: update config file
* refactor: handle multiple range read and fix some related bugs
* fix: add config
* test: turn to a fs reader to match behaviors of object store
2024-12-13 07:34:24 +00:00
Weny Xu
8c1959c580
feat: add prefetch support to InvertedIndexFooterReader for reduced I/O time ( #5146 )
...
* feat: add prefetch support to `InvertedIndexFooterReader`
* chore: correct struct name
* chore: apply suggestions from CR
2024-12-12 03:49:54 +00:00