PSeitz
8459efa32c
split term collection count and sub_agg ( #1921 )
...
use unrolled ColumnValues::get_vals
2023-03-13 04:37:41 +01:00
PSeitz
61cfd8dc57
fix clippy ( #1927 )
2023-03-13 03:12:02 +01:00
trinity-1686a
064518156f
refactor tokenization pipeline to use GATs ( #1924 )
...
* refactor tokenization pipeline to use GATs
* fix doctests
* fix clippy lints
* remove commented code
2023-03-09 09:39:37 +01:00
PSeitz
a42a96f470
fix panic in dict column merge ( #1930 )
...
* fix panic in dict column merge
* Bugfix and added unit test
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-08 22:04:37 +09:00
trinity-1686a
fcf5a25d93
use DeltaReader directly to implement Dictionary::ord_to_term ( #1928 )
2023-03-08 11:15:56 +09:00
dependabot[bot]
c0a5b28fd3
Update lru requirement from 0.9.0 to 0.10.0 ( #1932 )
...
Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version.
- [Release notes](https://github.com/jeromefroe/lru-rs/releases)
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jeromefroe/lru-rs/compare/0.9.0...0.10.0)
---
updated-dependencies:
- dependency-name: lru
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-07 15:09:02 +09:00
trinity-1686a
a4f7ca8309
use DeltaReader directly to implement Dictionary::term_ord ( #1925 )
...
* use DeltaReader directly to implement Dictionary::term_ord
* add some additional test cases for Dictionary::term_ord
2023-03-06 09:45:22 +01:00
Paul Masurel
364e321415
Clippy fix ( #1926 )
2023-03-06 10:37:17 +09:00
Paul Masurel
ed5a3b3172
Bumped murmurhash version
2023-03-03 21:24:32 +09:00
PSeitz
ca20bfa776
add date_histogram ( #1900 )
...
* add date_histogram
* add return result
2023-03-02 05:17:35 +01:00
PSeitz
faa706d804
add coerce option for text and numbers types ( #1904 )
...
* add coerce option for text and numbers types
allows coercing the field type when indexing if the type does not match
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add tests,add COERCE flag, include bool in coercion
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-01 11:36:59 +01:00
PSeitz
850a0d7ae2
add agg benchmark for optional and multi value ( #1916 )
...
closes #1870
2023-03-01 17:01:52 +09:00
Paul Masurel
7fae4d98d7
Adapting for quickwit2 ( #1912 )
...
* Adapting tantivy so that it can be plugged into quickwit.
* Apply suggestions from code review
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
* Added unit test
---------
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2023-03-01 16:27:46 +09:00
PSeitz
bc36458334
move buffer in front of dynamic dispatch ( #1915 )
...
Dynamic dispatch seems to be really expensive. Move the buffer in front of the dynamic dispatch to reduce the number of calls into the dynamically dispatched collector.
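
A minimal sketch of the idea in this commit message, not the code from the PR: documents are pushed into a buffer through a statically dispatched call, and the trait object is only invoked once per full buffer. The names `Collector`, `CountingCollector`, `BufferedCollector`, `BUFFER_LEN`, and `run` are hypothetical and chosen for illustration; they are not tantivy's API.

```rust
/// Hypothetical collector trait, used here only through `dyn Collector`.
trait Collector {
    fn collect_block(&mut self, docs: &[u32]);
}

/// Toy collector that records how many docs and how many calls it received.
struct CountingCollector {
    docs_seen: usize,
    calls: usize,
}

impl Collector for CountingCollector {
    fn collect_block(&mut self, docs: &[u32]) {
        self.docs_seen += docs.len();
        self.calls += 1;
    }
}

/// Assumed buffer size; the real value is a tuning choice.
const BUFFER_LEN: usize = 64;

/// The buffer sits in front of the trait object, so `collect` itself
/// is a plain method call and the virtual call happens once per block.
struct BufferedCollector<'a> {
    inner: &'a mut dyn Collector,
    buffer: Vec<u32>,
}

impl<'a> BufferedCollector<'a> {
    fn new(inner: &'a mut dyn Collector) -> Self {
        Self { inner, buffer: Vec::with_capacity(BUFFER_LEN) }
    }

    #[inline]
    fn collect(&mut self, doc: u32) {
        self.buffer.push(doc);
        if self.buffer.len() == BUFFER_LEN {
            self.flush();
        }
    }

    /// Hands any buffered docs to the dynamically dispatched collector.
    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            self.inner.collect_block(&self.buffer);
            self.buffer.clear();
        }
    }
}

/// Feeds `num_docs` doc ids through the buffer and returns
/// (docs seen by the inner collector, number of dynamic calls).
fn run(num_docs: u32) -> (usize, usize) {
    let mut counting = CountingCollector { docs_seen: 0, calls: 0 };
    {
        let mut buffered = BufferedCollector::new(&mut counting);
        for doc in 0..num_docs {
            buffered.collect(doc);
        }
        buffered.flush();
    }
    (counting.docs_seen, counting.calls)
}
```

With these assumptions, `run(1000)` reaches the inner collector through 16 dynamic calls instead of 1000: 15 full blocks of 64 plus one final partial flush.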
2023-02-28 13:07:50 +08:00
trinity-1686a
8a71e00da3
allow limiting the number of matched terms in range query ( #1899 )
2023-02-27 10:44:08 +01:00
PSeitz
e510f699c8
feat: add support for u64,i64,f64 fields in term aggregation ( #1883 )
...
* feat: add support for u64,i64,f64 fields in term aggregation
* hash enum values
* fix build
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-02-27 15:04:41 +08:00
Paul Masurel
d25fc155b2
Making some of the column/termdict operations async-friendly ( #1902 )
2023-02-27 15:34:47 +09:00
Paul Masurel
8ea97e7d6b
Minor refactoring preparing for getting columnar integrated in quickwit. ( #1911 )
2023-02-27 14:23:30 +09:00
Paul Masurel
0a726a0897
Added Empty ColumnIndex ( #1910 )
2023-02-27 13:59:22 +09:00
Paul Masurel
66ff53b0f4
Various minor code cleanup ( #1909 )
2023-02-27 13:48:34 +09:00
Paul Masurel
d002698008
Re-export of query grammar. ( #1908 )
2023-02-27 12:26:34 +09:00
Paul Masurel
c838aa808b
Removed the extra nesting in unit test file ( #1907 )
2023-02-27 12:17:52 +09:00
Paul Masurel
06850719dc
Renaming .values(DocId) to .values_for_doc(DocId) ( #1906 )
2023-02-27 12:15:13 +09:00
PSeitz
5f23bb7e65
switch to sparse collection for histogram ( #1898 )
...
* switch to sparse collection for histogram
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
* refactor, clippy
* fix bucket_pos overflow issue
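
A minimal sketch of sparse histogram collection as described in this commit message, under stated assumptions: buckets are keyed by an integer position in a `HashMap`, so memory grows with the number of non-empty buckets instead of with the value range. The function names `bucket_pos` and `sparse_histogram` and the floor-based bucketing are illustrative choices, not the code from the PR.

```rust
use std::collections::HashMap;

/// Position of the bucket containing `value` for a given interval width.
/// (Illustrative; offsets and bounds handling are omitted.)
fn bucket_pos(value: f64, interval: f64) -> i64 {
    (value / interval).floor() as i64
}

/// Counts values per bucket, storing only buckets that received a value.
/// Returns (bucket lower bound, doc count) pairs sorted by bucket.
fn sparse_histogram(values: &[f64], interval: f64) -> Vec<(f64, u64)> {
    let mut counts: HashMap<i64, u64> = HashMap::new();
    for &value in values {
        *counts.entry(bucket_pos(value, interval)).or_insert(0) += 1;
    }

    // HashMap iteration order is unspecified, so sort before returning.
    let mut buckets: Vec<(i64, u64)> = counts.into_iter().collect();
    buckets.sort_unstable_by_key(|&(pos, _)| pos);

    buckets
        .into_iter()
        .map(|(pos, count)| (pos as f64 * interval, count))
        .collect()
}
```

For the values `[1.0, 2.0, 1_000_000.0, 1_000_003.0, 12.0]` with an interval of `10.0`, this keeps three map entries, whereas a dense vector covering the same range would need about 100,000 slots. The trade-off matches the message above: hashing costs more per value when the data is dense.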
2023-02-23 07:02:58 +01:00
trinity-1686a
533ad99cd5
add PhrasePrefixQuery ( #1842 )
...
* add PhrasePrefixQuery
2023-02-22 11:18:33 +01:00
PSeitz
c7278b3258
remove schema in aggs ( #1888 )
...
* switch to ColumnType, move tests
* remove Schema dependency in agg
2023-02-22 04:50:28 +01:00
Paul Masurel
6b403e3281
Re-export of columnar
2023-02-22 11:23:54 +09:00
Paul Masurel
789cc8703e
Adding unit test testing docfreq after merge ( #1895 )
2023-02-22 11:05:34 +09:00
Paul Masurel
e5098d9fe8
Moving tests around, re-enabling tests that were disabled. ( #1894 )
2023-02-22 10:31:52 +09:00
Paul Masurel
f537334e4f
Adding a write schema to columnar's merge operations. ( #1884 )
...
* Adding a write schema to columnar's merge operations.
* Added unit test checking min/max when columns are empty.
* CR comment
* Rename to value_type_to_column_type
2023-02-21 18:25:16 +09:00
Paul Masurel
e2aa5af075
Clippy warnings fixes ( #1885 )
2023-02-20 19:04:13 +09:00
Paul Masurel
02bebf4ff5
Cargo fmt
2023-02-20 09:40:04 +09:00
Paul Masurel
0274c982d5
Refactoring. ( #1881 )
...
`ColumnValues`, wrongly located in column_values/column.rs for
historical reasons, moves to column_values/mod.rs.
u128 stuff gets its own directory, like u64 stuff.
2023-02-17 21:57:14 +09:00
PSeitz
74bf60b4f7
implement SegmentAggregationCollector on bucket aggs ( #1878 )
2023-02-17 12:53:29 +01:00
PSeitz
bf1449b22d
update examples for literate docs ( #1880 )
2023-02-17 11:48:22 +01:00
PSeitz
111f25a8f7
clippy ( #1879 )
...
* fix clippy
* fix clippy
* fmt
2023-02-17 11:34:21 +01:00
PSeitz
019db10e8e
refactor aggregations ( #1875 )
...
* add specialized version for full cardinality
Pre Columnar
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380)
Current
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771)
Post Change
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547)
* refactor aggregation collection
* add buffering collector
2023-02-16 13:15:16 +01:00
Paul Masurel
7423f99719
Issue/columnar for json ( #1876 )
...
Adding support for JSON fast field.
2023-02-16 20:38:32 +09:00
Alex Cole
f2f38c43ce
Make BM25 scoring more flexible ( #1855 )
...
* Introduce Bm25StatisticsProvider to inject statistics
* fix formatting I accidentally changed
2023-02-16 19:14:12 +09:00
PSeitz
71f43ace1d
fix dynamic dispatch regression for range queries ( #1871 )
2023-02-14 16:56:40 +01:00
PSeitz
347614c841
test error for avg agg on ip field ( #1873 )
...
closes #1835
2023-02-14 23:22:56 +08:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
PSeitz
01e5a22759
switch to new ff api ( #1868 )
2023-02-14 15:57:32 +08:00
Antoine Gauthier
b60b7d2afe
fix(CI) enable coverage on doctest ( #1839 )
...
* fix(CI) enable coverage on doctest
⚠️ Marked as [unstable](https://github.com/taiki-e/cargo-llvm-cov/issues/2)
refs #1761
* remove obsolete CI directory
2023-02-14 16:42:44 +09:00
Yukun Guo
dfe4e95fde
Make index compatible with virtual drives on Windows ( #1843 )
...
* Make index compatible with virtual drives on Windows
* Get rid of normpath
2023-02-14 16:41:48 +09:00
Paul Masurel
60cc2644d6
Fixing test_fail_on_flush_segment_but_one_worker_remains ( #1869 )
...
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first document to trigger a flush of a segment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
used in the columnar writer.
Closes #1859
2023-02-14 16:09:42 +09:00
Paul Masurel
10bccac61b
Bugfix in parse_into_milliseconds ( #1867 )
2023-02-14 15:06:40 +09:00
PSeitz
1cfb9ce59a
improve range query performance ( #1864 )
...
fix RowId vs DocId naming
fixes #1863
2023-02-14 13:25:39 +09:00
trinity-1686a
539ff08a79
move DateTime to tantivy_common ( #1861 )
...
* move DateTime to tantivy_common
* resolve imports of columnar::DateTime as import of common::DateTime
2023-02-11 17:03:06 +01:00
PSeitz
dab93df94e
fix benchmarks ( #1862 )
2023-02-11 15:44:47 +09:00