Paul Masurel
25d44fcec8
Revert "remove unused columnar api ( #2742 )" ( #2748 )
...
* Revert "remove unused columnar api (#2742 )"
This reverts commit 8725594d47 .
* Clippy comment + removing fill_vals
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2025-11-26 17:44:02 +01:00
PSeitz-dd
8725594d47
remove unused columnar api ( #2742 )
2025-11-21 18:07:25 +01:00
MassimilianoBaglioni
74334f9c9a
Fixed typo in documentation ( #2629 )
...
Co-authored-by: Massimiliano Baglioni <massimilianobaglioni@MacBook-Air-di-Massimiliano.local>
2025-07-11 14:45:59 +08:00
PSeitz
5379c99ea2
update edition to 2024 ( #2620 )
...
* update common to edition 2024
* update bitpacker to edition 2024
* update stacker to edition 2024
* update query-grammar to edition 2024
* update sstable to edition 2024 + fmt
* fmt
* update columnar to edition 2024
* cargo fmt
* use None instead of _
2025-04-18 04:56:31 +02:00
PSeitz
59084143ef
use optional index in multivalued index ( #2439 )
...
* use optional index in multivalued index
For mostly empty multivalued indices, there was a large overhead during
creation when iterating over all docids. This is alleviated by placing an
optional index in the multivalued index to mark documents that have values.
This adds some performance overhead when accessing values in a multivalued
index: the access cost is now optional index + multivalue index. The
sparse codec performs relatively badly with binary_search when accessing
data, which is reflected in the benchmarks below.
This changes the columnar format to v2, but code is added to handle the v1
format.
```
Running benches/bench_access.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_access-ea323c028db88db4)
multi sparse 1/13
access_values_for_doc Avg: 42.8946ms (+241.80%) Median: 42.8869ms (+244.10%) [42.7484ms .. 43.1074ms]
access_first_vals Avg: 42.8022ms (+421.93%) Median: 42.7553ms (+439.84%) [42.6794ms .. 43.7404ms]
multi 2x
access_values_for_doc Avg: 31.1244ms (+24.17%) Median: 30.8339ms (+23.46%) [30.7192ms .. 33.6059ms]
access_first_vals Avg: 24.3070ms (+70.92%) Median: 24.0966ms (+70.18%) [23.9328ms .. 26.4851ms]
sparse 1/13
access_values_for_doc Avg: 42.2490ms (+0.61%) Median: 42.2346ms (+2.28%) [41.8988ms .. 43.7821ms]
access_first_vals Avg: 43.6272ms (+0.23%) Median: 43.6197ms (+1.78%) [43.4920ms .. 43.9009ms]
dense 1/12
access_values_for_doc Avg: 8.6184ms (+23.18%) Median: 8.6126ms (+23.78%) [8.5843ms .. 8.7527ms]
access_first_vals Avg: 6.8112ms (+4.47%) Median: 6.8002ms (+4.55%) [6.7887ms .. 6.8991ms]
full
access_values_for_doc Avg: 9.4073ms (-5.09%) Median: 9.4023ms (-2.23%) [9.3694ms .. 9.4568ms]
access_first_vals Avg: 4.9531ms (+6.24%) Median: 4.9502ms (+7.85%) [4.9423ms .. 4.9718ms]
```
```
Running benches/bench_merge.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_merge-475697dfceb3639f)
merge_multi 2x_and_multi 2x Avg: 20.2280ms (+34.33%) Median: 20.1829ms (+35.33%) [19.9933ms .. 20.8806ms]
merge_multi sparse 1/13_and_multi sparse 1/13 Avg: 0.8961ms (-78.04%) Median: 0.8943ms (-77.61%) [0.8899ms .. 0.9272ms]
merge_dense 1/12_and_dense 1/12 Avg: 0.6619ms (-1.26%) Median: 0.6616ms (+2.20%) [0.6473ms .. 0.6837ms]
merge_sparse 1/13_and_sparse 1/13 Avg: 0.5508ms (-0.85%) Median: 0.5508ms (+2.80%) [0.5420ms .. 0.5634ms]
merge_sparse 1/13_and_dense 1/12 Avg: 0.6046ms (-4.64%) Median: 0.6038ms (+2.80%) [0.5939ms .. 0.6296ms]
merge_multi sparse 1/13_and_dense 1/12 Avg: 0.9111ms (-83.48%) Median: 0.9063ms (-83.50%) [0.9047ms .. 0.9663ms]
merge_multi sparse 1/13_and_sparse 1/13 Avg: 0.8451ms (-89.49%) Median: 0.8428ms (-89.43%) [0.8411ms .. 0.8563ms]
merge_multi 2x_and_dense 1/12 Avg: 10.6624ms (-4.82%) Median: 10.6568ms (-4.49%) [10.5738ms .. 10.8353ms]
merge_multi 2x_and_sparse 1/13 Avg: 10.6336ms (-22.95%) Median: 10.5925ms (-22.33%) [10.5149ms .. 11.5657ms]
```
* Update columnar/src/columnar/format_version.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
* Update columnar/src/column_index/mod.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
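The layout described in this commit (an optional index marking the documents that have values, followed by the multivalue offsets) can be illustrated with a minimal sketch. This is not tantivy's code: `MultiValueIndexSketch`, `docs_with_values` and `start_offsets` are hypothetical names, and a plain sorted `Vec` with `binary_search` stands in for the real optional index codecs.

```rust
use std::ops::Range;

/// Hypothetical, simplified model of a multivalued index with an optional index.
/// `docs_with_values` plays the role of the optional index: the sorted doc ids
/// that have at least one value.
/// `start_offsets` has one entry per such doc, plus a final end offset.
struct MultiValueIndexSketch {
    docs_with_values: Vec<u32>,
    start_offsets: Vec<u32>,
}

impl MultiValueIndexSketch {
    /// Builds the index from per-document value counts.
    /// Documents without values add nothing to either vector.
    fn from_value_counts(counts: &[u32]) -> Self {
        let mut docs_with_values = Vec::new();
        let mut start_offsets = vec![0u32];
        let mut next_offset = 0u32;
        for (doc, &count) in counts.iter().enumerate() {
            if count == 0 {
                continue;
            }
            docs_with_values.push(doc as u32);
            next_offset += count;
            start_offsets.push(next_offset);
        }
        Self { docs_with_values, start_offsets }
    }

    /// Returns the range of value positions for `doc`.
    /// Access costs one optional index lookup plus one offset lookup.
    fn range(&self, doc: u32) -> Range<u32> {
        match self.docs_with_values.binary_search(&doc) {
            Ok(rank) => self.start_offsets[rank]..self.start_offsets[rank + 1],
            Err(_) => 0..0,
        }
    }
}
```

In this sketch, creation only stores entries for documents that have values, while every access pays for the extra lookup, which mirrors the trade-off stated in the commit message.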
2024-06-19 14:54:12 +08:00
PSeitz
7ce950f141
add method to fetch block of first vals in columnar ( #2330 )
...
* add method to fetch block of first vals in columnar
add method to fetch block of first vals in columnar (this is way faster
than single calls for full columns)
add benchmark
fix import warnings
```
test bench_get_block_first_on_full_column ... bench: 56 ns/iter (+/- 26)
test bench_get_block_first_on_full_column_single_calls ... bench: 311 ns/iter (+/- 6)
test bench_get_block_first_on_multi_column ... bench: 378 ns/iter (+/- 15)
test bench_get_block_first_on_multi_column_single_calls ... bench: 546 ns/iter (+/- 13)
test bench_get_block_first_on_optional_column ... bench: 291 ns/iter (+/- 6)
test bench_get_block_first_on_optional_column_single_calls ... bench: 362 ns/iter (+/- 8)
```
* use remainder
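A rough sketch of the idea behind a block accessor for first values, compared with single calls. The type `ColumnSketch` and its methods are hypothetical and much simpler than the real columnar types; the point is only that the cardinality is matched once per block instead of once per document.

```rust
/// Hypothetical, simplified column: either every doc has exactly one value,
/// or each doc has a (possibly empty) list of values.
enum ColumnSketch {
    Full(Vec<u64>),
    Multi(Vec<Vec<u64>>),
}

impl ColumnSketch {
    /// Single-call accessor: first value of one document, if any.
    fn first(&self, doc: u32) -> Option<u64> {
        match self {
            ColumnSketch::Full(vals) => vals.get(doc as usize).copied(),
            ColumnSketch::Multi(vals) => vals.get(doc as usize)?.first().copied(),
        }
    }

    /// Block accessor: fills `output[i]` with the first value of `docs[i]`.
    /// For full columns the loop reads the values directly.
    fn first_vals(&self, docs: &[u32], output: &mut [Option<u64>]) {
        assert_eq!(docs.len(), output.len());
        match self {
            ColumnSketch::Full(vals) => {
                for (out, &doc) in output.iter_mut().zip(docs) {
                    *out = vals.get(doc as usize).copied();
                }
            }
            ColumnSketch::Multi(_) => {
                // Assumption: other cardinalities fall back to per-doc access.
                for (out, &doc) in output.iter_mut().zip(docs) {
                    *out = self.first(doc);
                }
            }
        }
    }
}
```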
2024-03-15 08:01:47 +01:00
PSeitz
b0e65560a1
handle ip addresses in term aggregation ( #2319 )
...
* handle ip addresses in term aggregation
Store IP addresses during the segment term aggregation via their u64 representation
and convert them to u128 (IPv6 address) via downcast when converting to intermediate results.
Enable downcasting on `ColumnValues`.
Expose a u64 variant for u128-encoded data via the `open_u64_lenient` method.
Remove the lifetime in VecColumn, to avoid the 'static lifetime requirement coming
from the downcast trait.
* rename method
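To illustrate the approach of aggregating on a u64 representation and converting to IP addresses only for the results, here is a small sketch. It is not the tantivy implementation: `IpColumnSketch`, `to_code`, `to_ip` and `count_ips` are hypothetical, and the u64 code is assumed to be the rank of the value among the sorted distinct u128 values.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Hypothetical column exposing u128 IP values through compact u64 codes.
/// Assumption: the code is the rank among the sorted distinct values.
struct IpColumnSketch {
    sorted_distinct: Vec<u128>,
}

impl IpColumnSketch {
    fn new(ips: &[Ipv6Addr]) -> Self {
        let mut sorted_distinct: Vec<u128> = ips.iter().map(|ip| u128::from(*ip)).collect();
        sorted_distinct.sort_unstable();
        sorted_distinct.dedup();
        Self { sorted_distinct }
    }

    /// Maps an IP address to its u64 code, if the column contains it.
    fn to_code(&self, ip: Ipv6Addr) -> Option<u64> {
        self.sorted_distinct
            .binary_search(&u128::from(ip))
            .ok()
            .map(|rank| rank as u64)
    }

    /// Maps a u64 code back to the IP address it stands for.
    fn to_ip(&self, code: u64) -> Option<Ipv6Addr> {
        self.sorted_distinct.get(code as usize).map(|&v| Ipv6Addr::from(v))
    }
}

/// Counts terms on u64 codes, then converts the keys to IP addresses
/// only when building the (sorted) result.
fn count_ips(column: &IpColumnSketch, ips: &[Ipv6Addr]) -> Vec<(Ipv6Addr, u64)> {
    let mut counts: HashMap<u64, u64> = HashMap::new();
    for &ip in ips {
        if let Some(code) = column.to_code(ip) {
            *counts.entry(code).or_insert(0) += 1;
        }
    }
    let mut result: Vec<(Ipv6Addr, u64)> = counts
        .into_iter()
        .filter_map(|(code, count)| column.to_ip(code).map(|ip| (ip, count)))
        .collect();
    result.sort();
    result
}
```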
2024-03-14 09:41:18 +01:00
PSeitz
ec37295b2f
add fast path for full columns in fetch_block ( #2328 )
...
Spotted in `range_date_histogram` query in quickwit benchmark:
5% of time copying docs around, which is not needed in the full index case
remove Column to ColumnIndex deref
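A minimal sketch of what such a fast path can look like, assuming that in a full column the doc id is also the row id. `ColumnIndexSketch` and `BlockAccessorSketch` are hypothetical stand-ins, not the actual tantivy types.

```rust
/// Hypothetical, simplified column index.
enum ColumnIndexSketch {
    /// Every doc has exactly one value: doc id == row id.
    Full,
    /// Sorted doc ids that have a value; the position in the vector is the row id.
    Optional(Vec<u32>),
}

/// Hypothetical block accessor with reusable buffers.
#[derive(Default)]
struct BlockAccessorSketch {
    row_ids: Vec<u32>,
    vals: Vec<u64>,
}

impl BlockAccessorSketch {
    /// Fetches the values of `docs` into `self.vals`.
    fn fetch_block(&mut self, docs: &[u32], index: &ColumnIndexSketch, values: &[u64]) {
        self.vals.clear();
        self.row_ids.clear();
        match index {
            ColumnIndexSketch::Full => {
                // Fast path: read values directly, without copying docs into row_ids.
                self.vals.extend(docs.iter().map(|&doc| values[doc as usize]));
            }
            ColumnIndexSketch::Optional(docs_with_value) => {
                // General path: translate doc ids to row ids first.
                for &doc in docs {
                    if let Ok(row) = docs_with_value.binary_search(&doc) {
                        self.row_ids.push(row as u32);
                    }
                }
                self.vals.extend(self.row_ids.iter().map(|&row| values[row as usize]));
            }
        }
    }
}
```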
2024-03-14 04:07:11 +01:00
PSeitz
b1d8b072db
add missing aggregation part 2 ( #2149 )
...
* add missing aggregation part 2
Add missing support for:
- Mixed types columns
- Key of type string on numerical fields
The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.
* simplify, add num_docs to empty
2023-08-31 07:55:33 +02:00
PSeitz
2e109018b7
add missing parameter to term agg ( #2103 )
...
* add missing parameter to term agg
* move missing handling to block accessor
* add multivalue test, fix multivalue case, add comments
* add documentation, deactivate special case
* cargo fmt
* resolve merge conflict
2023-08-14 14:22:18 +02:00
Paul Masurel
821208480b
Adding Debug/Display impl. Refining the ColumnIndex::get_cardinality
2023-03-26 14:40:37 +09:00
Paul Masurel
a2e3c2ed5b
Renaming Column::idx -> Column::index ( #1961 )
...
There was some variable name ghosting happening.
2023-03-26 13:58:50 +09:00
Paul Masurel
2b6a4da640
Exposing empty column builder. ( #1959 )
2023-03-24 16:34:41 +09:00
PSeitz
da2804644f
fetch blocks of vals in aggregation for all cardinality ( #1950 )
...
* fetch blocks of vals in aggregation for all cardinality
* move caching in common accessor
2023-03-23 08:41:11 +01:00
Paul Masurel
0a726a0897
Added Empty ColumnIndex ( #1910 )
2023-02-27 13:59:22 +09:00
Paul Masurel
06850719dc
Renaming .values(DocId) to .values_for_doc(DocId) ( #1906 )
2023-02-27 12:15:13 +09:00
Paul Masurel
0274c982d5
Refactoring. ( #1881 )
...
`ColumnValues`, wrongly located in column_values/column.rs for
historical reasons, moves to column_values/mod.rs.
u128 stuff gets its own directory, like u64 stuff.
2023-02-17 21:57:14 +09:00
PSeitz
74bf60b4f7
implement SegmentAggregationCollector on bucket aggs ( #1878 )
2023-02-17 12:53:29 +01:00
PSeitz
111f25a8f7
clippy ( #1879 )
...
* fix clippy
* fix clippy
* fmt
2023-02-17 11:34:21 +01:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
PSeitz
1cfb9ce59a
improve range query performance ( #1864 )
...
fix RowId vs DocId naming
fixes #1863
2023-02-14 13:25:39 +09:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
PSeitz
b31fd389d8
collect columns for merge ( #1812 )
...
* collect columns for merge
* return column_type from, fix visibility
* fix
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-01-20 07:58:29 +01:00
Paul Masurel
89cec79813
Make it possible to force a column type and intricate bugfix. ( #1815 )
2023-01-20 14:30:56 +09:00
Paul Masurel
e3d504d833
Minor code cleanup ( #1810 )
2023-01-19 17:47:26 +09:00
Paul Masurel
5a42c5aae9
Add support for multivalues ( #1809 )
2023-01-19 16:55:01 +09:00
Paul Masurel
a86b104a40
Differentiating between str and bytes, + unit test
2023-01-19 14:38:12 +09:00
PSeitz
f9abd256b7
add ip addr to columnar ( #1805 )
2023-01-19 05:36:06 +01:00
Paul Masurel
9f42b6440a
Completed unit test for dictionary encoded column
2023-01-19 12:15:27 +09:00
Paul Masurel
25bad784ad
Integrated fastfield codecs into columnar. ( #1782 )
...
Introduced asymmetric OptionalCodec / SerializableOptionalCodec
Removed cardinality from the columnar sstable.
Added DynamicColumn
Reorganized all files
Change DenseCodec serialization logic.
Renamed methods to rank/select
Moved versioning footer to the columnar level
2023-01-16 17:24:49 +09:00