Paul Masurel
c838aa808b
Removed the extra nesting in unit test file ( #1907 )
2023-02-27 12:17:52 +09:00
Paul Masurel
06850719dc
Renaming .values(DocId) to .values_for_doc(DocId) ( #1906 )
2023-02-27 12:15:13 +09:00
PSeitz
5f23bb7e65
switch to sparse collection for histogram ( #1898 )
...
* switch to sparse collection for histogram
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
* refactor, clippy
* fix bucket_pos overflow issue
2023-02-23 07:02:58 +01:00
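To illustrate the trade-off described in #1898 above, here is a minimal sketch that contrasts a dense, vec-based histogram with a sparse, hashmap-based one. It is not tantivy's implementation: the function names `dense_histogram` and `sparse_histogram`, the `f64` value type, and the bucket-position arithmetic are assumptions made for illustration only.

```rust
use std::collections::HashMap;

/// Dense collection (hypothetical): allocates one counter per bucket
/// between `min` and `max`, even if most buckets stay empty.
/// Assumes every value lies within `min..=max`.
fn dense_histogram(values: &[f64], interval: f64, min: f64, max: f64) -> Vec<u64> {
    let num_buckets = ((max - min) / interval).floor() as usize + 1;
    let mut counts = vec![0u64; num_buckets];
    for &v in values {
        let pos = ((v - min) / interval).floor() as usize;
        counts[pos] += 1;
    }
    counts
}

/// Sparse collection (hypothetical): only buckets that actually receive
/// a value get an entry, keyed by their bucket position.
fn sparse_histogram(values: &[f64], interval: f64) -> HashMap<i64, u64> {
    let mut counts: HashMap<i64, u64> = HashMap::new();
    for &v in values {
        let bucket_pos = (v / interval).floor() as i64;
        *counts.entry(bucket_pos).or_insert(0) += 1;
    }
    counts
}
```

With a small interval and a few widely spread values, the dense variant allocates a counter for every bucket in the range, while the sparse variant stores only the populated buckets. The hashing cost per value is the likely source of the slowdown on dense data mentioned in the commit message.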
trinity-1686a
533ad99cd5
add PhrasePrefixQuery ( #1842 )
...
* add PhrasePrefixQuery
2023-02-22 11:18:33 +01:00
PSeitz
c7278b3258
remove schema in aggs ( #1888 )
...
* switch to ColumnType, move tests
* remove Schema dependency in agg
2023-02-22 04:50:28 +01:00
Paul Masurel
6b403e3281
Re-export of columnar
2023-02-22 11:23:54 +09:00
Paul Masurel
789cc8703e
Adding unit test testing docfreq after merge ( #1895 )
2023-02-22 11:05:34 +09:00
Paul Masurel
e5098d9fe8
Moving tests around and re-enabling tests that were disabled. ( #1894 )
2023-02-22 10:31:52 +09:00
Paul Masurel
f537334e4f
Adding a write schema to columnar's merge operations. ( #1884 )
...
* Adding a write schema to columnar's merge operations.
* Added unit test checking min/max when columns are empty.
* CR comment
* Rename to value_type_to_column_type
2023-02-21 18:25:16 +09:00
Paul Masurel
e2aa5af075
Clippy warnings fixes ( #1885 )
2023-02-20 19:04:13 +09:00
PSeitz
74bf60b4f7
implement SegmentAggregationCollector on bucket aggs ( #1878 )
2023-02-17 12:53:29 +01:00
PSeitz
111f25a8f7
clippy ( #1879 )
...
* fix clippy
* fix clippy
* fmt
2023-02-17 11:34:21 +01:00
PSeitz
019db10e8e
refactor aggregations ( #1875 )
...
* add specialized version for full cardinality
Pre Columnar
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380)
Current
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771)
Post Change
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547)
* refactor aggregation collection
* add buffering collector
2023-02-16 13:15:16 +01:00
Paul Masurel
7423f99719
Issue/columnar for json ( #1876 )
...
Adding support for JSON fast field.
2023-02-16 20:38:32 +09:00
Alex Cole
f2f38c43ce
Make BM25 scoring more flexible ( #1855 )
...
* Introduce Bm25StatisticsProvider to inject statistics
* fix formatting I accidentally changed
2023-02-16 19:14:12 +09:00
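The idea behind #1855 above, injecting collection statistics into BM25 scoring instead of reading them from the local index, can be sketched as follows. This is a standalone illustration, not tantivy's API: the trait `StatsProvider`, the struct `FixedStats`, and the function signatures are hypothetical, and only the commit's name `Bm25StatisticsProvider` comes from the source. The constants K1 = 1.2 and B = 0.75 are the commonly used BM25 defaults and are assumed here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a statistics provider: anything that can
/// report collection-wide counts can drive the scoring function.
trait StatsProvider {
    fn total_num_docs(&self) -> u64;
    fn total_num_tokens(&self) -> u64;
    fn doc_freq(&self, term: &str) -> u64;
}

/// Statistics supplied from outside, e.g. aggregated over several indexes.
struct FixedStats {
    num_docs: u64,
    num_tokens: u64,
    doc_freqs: HashMap<String, u64>,
}

impl StatsProvider for FixedStats {
    fn total_num_docs(&self) -> u64 {
        self.num_docs
    }
    fn total_num_tokens(&self) -> u64 {
        self.num_tokens
    }
    fn doc_freq(&self, term: &str) -> u64 {
        self.doc_freqs.get(term).copied().unwrap_or(0)
    }
}

/// Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)).
fn idf(doc_freq: u64, doc_count: u64) -> f32 {
    let x = (doc_count.saturating_sub(doc_freq) as f32 + 0.5) / (doc_freq as f32 + 0.5);
    (1.0 + x).ln()
}

/// BM25 score of one term in one document, using injected statistics.
fn bm25_score(stats: &dyn StatsProvider, term: &str, term_freq: u32, doc_len: u32) -> f32 {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    let avg_len = stats.total_num_tokens() as f32 / stats.total_num_docs() as f32;
    let tf = term_freq as f32;
    let norm = K1 * (1.0 - B + B * doc_len as f32 / avg_len);
    idf(stats.doc_freq(term), stats.total_num_docs()) * tf * (K1 + 1.0) / (tf + norm)
}
```

Because the scoring function depends only on the trait, the same code can score against local statistics or against statistics gathered elsewhere, which is the flexibility the commit title refers to.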
PSeitz
347614c841
test error for avg agg on ip field ( #1873 )
...
closes #1835
2023-02-14 23:22:56 +08:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
PSeitz
01e5a22759
switch to new ff api ( #1868 )
2023-02-14 15:57:32 +08:00
Yukun Guo
dfe4e95fde
Make index compatible with virtual drives on Windows ( #1843 )
...
* Make index compatible with virtual drives on Windows
* Get rid of normpath
2023-02-14 16:41:48 +09:00
Paul Masurel
60cc2644d6
Fixing test_fail_on_flush_segment_but_one_worker_remains ( #1869 )
...
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first document to trigger a flush of the segment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
used in the columnar writer.
Closes #1859
2023-02-14 16:09:42 +09:00
Paul Masurel
10bccac61b
Bugfix in parse_into_milliseconds ( #1867 )
2023-02-14 15:06:40 +09:00
PSeitz
1cfb9ce59a
improve range query performance ( #1864 )
...
fix RowId vs DocId naming
fixes #1863
2023-02-14 13:25:39 +09:00
trinity-1686a
539ff08a79
move DateTime to tantivy_common ( #1861 )
...
* move DateTime to tantivy_common
* resolve imports of columnar::DateTime as import of common::DateTime
2023-02-11 17:03:06 +01:00
PSeitz
dab93df94e
fix benchmarks ( #1862 )
2023-02-11 15:44:47 +09:00
PSeitz
cbcafae04c
fix: doc store for files larger than 4GB ( #1856 )
...
Fixes an issue in the skip list deserialization, which incorrectly deserialized the byte start offset as u32.
`get_doc` will fail for any docs that live in a block with a start offset larger than u32::MAX (~4GB).
Causes index corruption if a segment with a doc store larger than 4GB is merged.
tantivy version 0.19 is affected.
2023-02-10 14:29:43 +01:00
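The class of bug fixed in #1856 above, an offset read back through a 32-bit type, can be reproduced in a few lines. This sketch does not reflect tantivy's on-disk format: the fixed-width little-endian encoding and the function names `write_offset`, `read_offset_buggy`, and `read_offset_fixed` are assumptions chosen to keep the example small.

```rust
/// Serializes a byte offset as 8 little-endian bytes (hypothetical format).
fn write_offset(offset: u64) -> [u8; 8] {
    offset.to_le_bytes()
}

/// Buggy reader: interprets only the low 4 bytes as a u32, so any
/// offset above u32::MAX silently wraps around.
fn read_offset_buggy(bytes: &[u8; 8]) -> u64 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as u64
}

/// Fixed reader: decodes the full 64-bit offset.
fn read_offset_fixed(bytes: &[u8; 8]) -> u64 {
    u64::from_le_bytes(*bytes)
}
```

For offsets below 4GB both readers agree, which is why a bug of this kind only surfaces once a doc store grows past u32::MAX bytes. Beyond that point the buggy reader returns a wrapped offset that points at unrelated data near the start of the file.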
PSeitz
36c6138e7f
fix: auto downgrade index record option, instead of vint error ( #1857 )
...
Prev: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: InvalidData, error: "Reach end of buffer while reading VInt" })', src/main.rs:46:14
Now: Automatic downgrade to next available level
2023-02-10 13:45:23 +01:00
PSeitz
7a9befd18d
fix sort order test for term aggregation ( #1858 )
...
fix sort order test for term aggregation
fix invalid request test
2023-02-10 10:26:58 +01:00
PSeitz
03345f0aa2
fmt code, update lz4_flex ( #1838 )
...
formatting on nightly changed
2023-02-10 01:42:32 +09:00
Paul Masurel
b7bfa20e38
Fixed test performance.
2023-02-09 17:39:55 +01:00
trinity-1686a
1390834ae8
make Term::as_slice public ( #1846 )
2023-02-09 15:37:07 +01:00
trinity-1686a
3ac973bea4
fix invalid endianness in documentation ( #1845 )
...
* fix doc about term endianness
* rustfmt
2023-02-09 15:36:38 +01:00
Paul Masurel
405e2cf4d9
Merge with main
2023-02-09 14:28:57 +01:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
PSeitz
0f20787917
fix doc store cache docs ( #1821 )
...
* fix doc store cache docs
addresses an issue reported in #1820
* rename doc_store_cache_size
2023-01-23 07:06:49 +01:00
Paul Masurel
08919a2900
Improvement on the scalar / random bitpacker code. ( #1781 )
...
* Improvement on the scalar / random bitpacker code.
Added proptesting
Added simple benchmark
Added assert and comments on the very non trivial hidden contract
Remove the need for an extra padding.
The last point introduces a small performance regression (~10%).
* Fixing unit tests
2023-01-19 18:09:13 +09:00
Lonre Wang
8ba333f1b4
Typo fix ( #1803 )
...
* Update text_options.rs
* Update src/schema/text_options.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-01-19 17:56:05 +09:00
PSeitz
a2ca12995e
update aggregation docs ( #1807 )
2023-01-19 09:52:47 +01:00
Paul Masurel
5180b612ef
Removing the demuxer code ( #1799 )
2023-01-18 16:12:35 +09:00
PSeitz
f687b3a5aa
start migrate Field to &str ( #1772 )
...
start migrate Field to &str in preparation of columnar
return Result for get_field
2023-01-18 16:12:07 +09:00
Adrien Guillo
c51d9f9f83
Fix some Clippy warnings
2023-01-17 10:17:51 -05:00
Adrien Guillo
0caaf13a90
Remove standard deviation from stats aggregation
2023-01-16 22:58:23 -05:00
Adrien Guillo
f2dad194ea
Add count, min, max, and sum aggregations
2023-01-16 12:22:20 -05:00
PSeitz
6ca9a477f3
reuse stats for average ( #1785 )
...
* reuse stats for average
* fix count type
2023-01-13 23:32:27 +08:00
Shikhar Bhushan
2650111b76
EnableScoring::Disabled - optional Searcher ( #1780 )
2023-01-12 09:26:50 -05:00
PSeitz
1176555eff
handle user input on get_docid_for_value_range ( #1760 )
...
* handle user input on get_docid_for_value_range
fixes #1757
* pass range as parameter
2023-01-12 14:20:16 +01:00
Adrien Guillo
e17996f2fd
Allow range queries via fast fields on non-indexed fields
2023-01-11 09:56:13 -05:00
Adrien Guillo
14222a47a3
Fix typo ( #1776 )
2023-01-11 00:49:13 +09:00
Adam Reichold
8312c882a5
More cosmetic fixes for upcoming Clippy lints. ( #1771 )
2023-01-10 10:32:45 +01:00
Paul Masurel
7a8fce0ae7
Minor mini fixes
2023-01-10 14:15:30 +09:00
Michael Kleen
196e42f33e
Add regex tokenizer ( #1759 )
...
This adds a regex tokenizer which tokenizes the text by using a
regex pattern to split.
Co-authored-by: Michael Kleen <mkleen@gmailw.com>
2023-01-10 13:38:37 +09:00