PSeitz
2e109018b7
add missing parameter to term agg ( #2103 )
...
* add missing parameter to term agg
* move missing handling to block accessor
* add multivalue test, fix multivalue case, add comments
* add documentation, deactivate special case
* cargo fmt
* resolve merge conflict
2023-08-14 14:22:18 +02:00
Adam Reichold
22c35b1e00
Fix explanation of boost queries seeking beyond query result. ( #2142 )
...
* Make current nightly Clippy happy.
* Fix explanation of boost queries seeking beyond query result.
2023-08-14 11:59:11 +09:00
trinity-1686a
b92082b748
implement lenient parser ( #2129 )
...
* move query parser to nom
* add support for term grouping
* initial work on infallible parser
* fmt
* add tests and fix minor parsing bugs
* address review comments
* add support for lenient queries in tantivy
* make lenient parser report errors
* allow mixing occur and bool in query
2023-08-08 15:41:29 +02:00
PSeitz
c2be6603a2
alternative mixed field aggregation collection ( #2135 )
...
* alternative mixed field aggregation collection
Instead of holding multiple accessors in one AggregationWithAccessor, split it into
multiple independent AggregationWithAccessor instances.
* Update src/aggregation/agg_req_with_accessor.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-07-27 12:25:31 +02:00
Adam Reichold
c805f08ca7
Fix a few more upcoming Clippy lints ( #2133 )
2023-07-24 17:07:57 +09:00
Adam Reichold
ccc0335158
Minor improvements to OwnedBytes ( #2134 )
...
This makes it obvious where the `StableDerefTrait` is invoked and avoids
`transmute` when only a lifetime needs to be extended. Furthermore, it makes use
of `slice::split_at` where that seemed appropriate.
2023-07-24 17:06:33 +09:00
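The `slice::split_at` usage mentioned in the OwnedBytes commit above can be illustrated with a minimal, std-only sketch. It is not the OwnedBytes code itself: the function name `read_len_prefixed` and the length-prefixed layout are assumptions chosen only to show how `split_at` replaces manual index arithmetic without any `unsafe`.

```rust
/// Hypothetical helper (not part of tantivy): splits a buffer that starts with
/// a little-endian u32 length prefix into (payload, remainder).
/// Returns None if the buffer is too short for the prefix or the payload.
pub fn read_len_prefixed(bytes: &[u8]) -> Option<(&[u8], &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    // split_at yields two non-overlapping borrows of the same buffer.
    let (len_bytes, rest) = bytes.split_at(4);
    let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    if rest.len() < len {
        return None;
    }
    // The bounds were checked above, so split_at cannot panic here.
    Some(rest.split_at(len))
}
```

Both returned slices borrow from the input, so no copying or lifetime extension is involved.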
Adam Reichold
42acd334f4
Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint ( #2131 )
2023-07-21 11:36:17 +09:00
Adam Reichold
820f126075
Remove support for Brotli and Snappy compression ( #2123 )
...
LZ4 provides fast and simple compression whereas Zstd is exceptionally flexible
so that the additional support for Brotli and Snappy does not really add
any distinct functionality on top of those two algorithms.
Removing them reduces our maintenance burden and reduces the number of choices
users have to make when setting up their project based on Tantivy.
2023-07-14 16:54:59 +09:00
Adam Reichold
7e6c4a1856
Include only built-in compression algorithms as enum variants ( #2121 )
...
* Include only built-in compression algorithms as enum variants
This enables compile-time errors when a compression algorithm is requested which
is not actually enabled for the current Cargo project. The cost is that indexes
using other compression algorithms cannot even be loaded (even though they
are not fully accessible in any case).
As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`.
* Provide more instructive error messages for unsupported, but not unknown compression variants.
2023-07-14 11:02:49 +09:00
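The idea described in the commit above, exposing only the enabled compression algorithms as enum variants, can be sketched roughly as follows. This is an illustration under assumptions, not the actual tantivy definition: the enum name `Compressor`, the function `from_name`, the feature names, and the error strings are all hypothetical.

```rust
/// Hypothetical enum: a variant only exists when its Cargo feature is enabled,
/// so referring to a disabled algorithm in code is a compile-time error.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Compressor {
    None,
    #[cfg(feature = "lz4-compression")]
    Lz4,
    #[cfg(feature = "zstd-compression")]
    Zstd,
}

impl Compressor {
    /// Parses a stored algorithm name, distinguishing "known but not enabled"
    /// from "unknown" so that the error message is more instructive.
    pub fn from_name(name: &str) -> Result<Compressor, String> {
        match name {
            "none" => Ok(Compressor::None),
            #[cfg(feature = "lz4-compression")]
            "lz4" => Ok(Compressor::Lz4),
            #[cfg(not(feature = "lz4-compression"))]
            "lz4" => Err(format!("compression `{}` is known but not enabled in this build", name)),
            #[cfg(feature = "zstd-compression")]
            "zstd" => Ok(Compressor::Zstd),
            #[cfg(not(feature = "zstd-compression"))]
            "zstd" => Err(format!("compression `{}` is known but not enabled in this build", name)),
            _ => Err(format!("unknown compression algorithm `{}`", name)),
        }
    }
}
```

The trade-off matches the one stated in the commit message: data written with a disabled algorithm fails at load time with an error rather than being represented by a variant that cannot be used.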
Adam Reichold
5fafe4b1ab
Add missing query_terms impl for TermSetQuery. ( #2120 )
2023-07-13 14:54:29 +02:00
PSeitz
1e7cd48cfa
remove allocations in split compound words ( #2080 )
...
* remove allocations in split compound words
* clear reused data
2023-07-13 09:43:02 +09:00
dependabot[bot]
7f51d85bbd
Update lru requirement from 0.10.0 to 0.11.0 ( #2117 )
...
Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version.
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jeromefroe/lru-rs/compare/0.10.0...0.11.0)
---
updated-dependencies:
- dependency-name: lru
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-13 09:42:21 +09:00
PSeitz
ad76e32398
Update CHANGELOG.md ( #2091 )
...
* Update CHANGELOG.md
* Update CHANGELOG.md
2023-07-11 13:58:49 +08:00
dependabot[bot]
7575f9bf1c
Update itertools requirement from 0.10.3 to 0.11.0 ( #2098 )
...
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)
---
updated-dependencies:
- dependency-name: itertools
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 11:14:46 +02:00
Naveen Aiathurai
67bdf3f5f6
fixes order_by_u64_field and order_by_fast_field should allow sorting in ascending order #1676 ( #2111 )
...
* feat: order_by_fast_field allows sorting using parameter order
* chore: change the corresponding values to original one
* chore: fix formatting issues
* fix: first_or_default_col should also sort by order
* chore: empty doc to testcase and doctest fixes
* chore: fix failure tests
* core: add empty document without fastfield
* chore: fix fmt
* chore: change variable name
2023-07-06 05:10:10 +02:00
François Massot
3c300666ad
Merge pull request #2110 from quickwit-oss/fulmicoton/dynamic-follow-up
...
Add dynamic filters to text analyzer builder.
2023-07-03 21:49:24 +02:00
François Massot
b91d3f6be4
Clean comment on 'TextAnalyzerBuilder::filter_dynamic' method.
2023-07-03 18:45:59 +02:00
François Massot
a8e76513bb
Remove useless clone.
2023-07-03 22:05:11 +09:00
François Massot
0a23201338
Fix stack overflow and add docs.
2023-07-03 22:05:11 +09:00
François Massot
81330aaf89
WIP
2023-07-03 22:05:10 +09:00
Paul Masurel
98a3b01992
Removing the BoxedTokenizer
2023-07-03 22:05:10 +09:00
Paul Masurel
d341520938
Dynamic follow up
2023-07-03 22:05:10 +09:00
François Massot
5c9af73e41
Followup fulmicoton poc.
2023-07-03 22:05:10 +09:00
Paul Masurel
ad4c940fa3
proof of concept for dynamic tokenizer.
2023-07-03 22:05:10 +09:00
Paul Masurel
910b0b0c61
Cargo fmt
2023-07-03 22:03:31 +09:00
PSeitz
3fef052bf1
fix flaky test ( #2107 )
...
closes #2099
2023-06-29 14:30:56 +08:00
PSeitz
040554f2f9
Update to lz4_flex 0.11 ( #2106 )
2023-06-29 14:16:00 +08:00
PSeitz
17186ca9c9
improve docs ( #2105 )
2023-06-27 13:37:14 +08:00
François Massot
212d59c9ab
Merge pull request #2102 from quickwit-oss/fmassot/ngram-new-should-return-error
...
Ngram tokenizer now returns an error with invalid arguments.
2023-06-27 05:36:09 +02:00
dependabot[bot]
1a1f252a3f
Update memmap2 requirement from 0.6.0 to 0.7.1 ( #2104 )
...
Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version.
- [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.6.0...v0.7.1)
---
updated-dependencies:
- dependency-name: memmap2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 05:15:43 +02:00
François Massot
d73706dede
Ngram tokenizer now returns an error with invalid arguments.
2023-06-25 20:13:24 +02:00
PSeitz
44850e1036
move fail dep to dev only ( #2094 )
...
wasm compilation fails with dep only
2023-06-22 06:59:11 +02:00
Adam Reichold
3b0cbf8102
Cosmetic updates to the warmer example. ( #2095 )
...
Just some cosmetic tweaks to make the example easier on the eyes as a colleague
was staring at this for quite some time this week.
2023-06-22 11:25:01 +09:00
Adam Reichold
4aa131c3db
Make TextAnalyzerBuilder publicly accessible ( #2097 )
...
This way, client code can name the type to e.g. store it inside structs without
resorting to generics and it means that its documentation is part of the crate
documentation generated by `cargo doc`.
2023-06-22 11:24:21 +09:00
Naveen Aiathurai
59962097d0
fix : #2078 return error when tokenizer not found while indexing ( #2093 )
...
* fix : #2078 return error when tokenizer not found while indexing
* chore: formatting issues
* chore: fix review comments
2023-06-16 04:33:55 +02:00
Adam Reichold
ebc78127f3
Add BytesFilterCollector to support filtering based on a bytes fast field ( #2075 )
...
* Do some Clippy- and Cargo-related boy-scouting.
* Add BytesFilterCollector to support filtering based on a bytes fast field
This is basically a copy of the existing FilterCollector but modified and
specialised to work on a bytes fast field.
* Changed semantics of filter collectors to consider multi-valued fields
2023-06-13 14:19:58 +09:00
PSeitz
8199aa7de7
bump version to 0.20.2 ( #2089 )
0.20.2
2023-06-12 18:56:54 +08:00
PSeitz
657f0cd3bd
add missing Bytes validation to term_agg ( #2077 )
...
returns empty for now instead of failing like before
2023-06-12 16:38:07 +08:00
Adam Reichold
3a82ef2560
Fix is_child_of function not considering the root facet. ( #2086 )
2023-06-12 08:35:18 +02:00
PSeitz
3546e7fc63
small agg limit docs improvement ( #2073 )
...
small docs improvement as follow up on bug https://github.com/quickwit-oss/quickwit/issues/3503
2023-06-12 10:55:24 +09:00
PSeitz
862f367f9e
release without Alice in Wonderland, bump version to 0.20.1 ( #2087 )
...
* Release without Alice in Wonderland
* bump version to 0.20.1
2023-06-12 10:54:03 +09:00
PSeitz
14137d91c4
Update CHANGELOG.md ( #2081 )
2023-06-12 10:53:40 +09:00
François Massot
924fc70cb5
Merge pull request #2088 from quickwit-oss/fmassot/align-type-priorities-for-json-numbers
...
Align numerical type priority order on the search side.
2023-06-11 22:04:54 +02:00
François Massot
07023948aa
Add test that indexes and searches a JSON field.
2023-06-11 21:47:52 +02:00
François Massot
0cb53207ec
Fix tests.
2023-06-11 12:13:35 +02:00
François Massot
17c783b4db
Align numerical type priority order on the search side.
2023-06-11 11:49:27 +02:00
Harrison Burt
7220df8a09
Fix building on windows with mmap ( #2070 )
...
* Fix windows build
* Make pub
* Update docs
* Re arrange
* Fix compilation error on unix
* Fix unix borrows
* Revert "Fix unix borrows"
This reverts commit c1d94fd12b.
* Fix unix borrows and revert original change
* Fix warning
* Cleaner code.
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
0.20.1
2023-06-10 18:32:39 +02:00
PSeitz
e3eacb4388
release tantivy ( #2083 )
...
* prerelease
* chore: Release
0.20
2023-06-09 10:47:46 +02:00
PSeitz
fdecb79273
tokenizer-api: reduce Tokenizer overhead ( #2062 )
...
* tokenizer-api: reduce Tokenizer overhead
Previously, a new `Token` was created for each text encountered, and each one
allocates via `String::with_capacity(200)`.
In the new API, the token_stream gets mutable access to the tokenizer, which
allows state to be shared (in this PR, the Token is shared).
Ideally, the allocation for the BoxTokenStream would also be removed, but
this may require some lifetime tricks.
* simplify api
* move lowercase and ascii folding buffer to global
* empty Token text as default
2023-06-08 18:37:58 +08:00
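The token-reuse idea from the tokenizer-api commit above can be sketched with std-only Rust. This is a simplified illustration, not the tantivy tokenizer API: `Token`, `WhitespaceTokenizer`, `WhitespaceTokenStream`, and `collect_tokens` are hypothetical stand-ins. The point is that the tokenizer owns one `Token` buffer and the stream borrows it mutably, so no new `String` is allocated per text.

```rust
/// Hypothetical token type with a reusable text buffer.
#[derive(Default, Debug)]
pub struct Token {
    pub offset_from: usize,
    pub offset_to: usize,
    pub text: String,
}

/// Hypothetical tokenizer that owns the shared Token.
#[derive(Default)]
pub struct WhitespaceTokenizer {
    token: Token,
}

/// Stream that borrows the tokenizer's Token instead of allocating its own.
pub struct WhitespaceTokenStream<'a> {
    text: &'a str,
    cursor: usize,
    token: &'a mut Token,
}

impl WhitespaceTokenizer {
    /// Taking `&mut self` is what allows the stream to reuse tokenizer state.
    pub fn token_stream<'a>(&'a mut self, text: &'a str) -> WhitespaceTokenStream<'a> {
        self.token.text.clear();
        WhitespaceTokenStream { text, cursor: 0, token: &mut self.token }
    }
}

impl<'a> WhitespaceTokenStream<'a> {
    /// Moves to the next token; returns false when the text is exhausted.
    pub fn advance(&mut self) -> bool {
        let trimmed = self.text[self.cursor..].trim_start();
        if trimmed.is_empty() {
            return false;
        }
        let start = self.text.len() - trimmed.len();
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        self.token.offset_from = start;
        self.token.offset_to = start + len;
        // Reuse the existing buffer: clear keeps the allocated capacity.
        self.token.text.clear();
        self.token.text.push_str(&trimmed[..len]);
        self.cursor = start + len;
        true
    }

    pub fn token(&self) -> &Token {
        &*self.token
    }
}

/// Convenience helper for demonstration: copies each token's text out.
pub fn collect_tokens(tokenizer: &mut WhitespaceTokenizer, text: &str) -> Vec<String> {
    let mut stream = tokenizer.token_stream(text);
    let mut out = Vec::new();
    while stream.advance() {
        out.push(stream.token().text.clone());
    }
    out
}
```

The same tokenizer value can then be used for many texts in a row, and the `Token` buffer grows at most to the size of the longest token seen.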
PSeitz
27f202083c
Improve Termmap Indexing Performance +~30% ( #2058 )
...
* update benchmark
* Improve Termmap Indexing Performance +~30%
This contains many small changes to improve Termmap performance.
Most notably:
* Specialized byte compare and equality versions, instead of glibc calls.
* ExpUnrolledLinkedList no longer contains inline items.
Allow comparing by hash only via the feature flag compare_hash_only:
with a good hash function, 64 bits should be enough to compare strings by
their hashes instead of comparing the strings themselves. Disabled by default.
CreateHashMap/alice/174693
time: [642.23 µs 643.80 µs 645.24 µs]
thrpt: [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s]
change:
time: [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05)
thrpt: [+14.088% +15.344% +16.862%]
Performance has improved.
CreateHashMap/alice_expull/174693
time: [877.03 µs 880.44 µs 884.67 µs]
thrpt: [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s]
change:
time: [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05)
thrpt: [+35.301% +35.637% +35.981%]
Performance has improved.
CreateHashMap/numbers_zipf/8000000
time: [9.1198 ms 9.1573 ms 9.1961 ms]
thrpt: [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s]
change:
time: [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05)
thrpt: [+52.403% +53.440% +54.390%]
Performance has improved.
* clippy
* add bench for ids
* inline(always) to inline whole block with bounds checks
* cleanup
2023-06-08 11:13:52 +02:00
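The compare_hash_only option described in the Termmap commit above can be sketched as follows. This is a rough illustration, not the tantivy hash map: `Entry`, `hash_key`, and `matches` are hypothetical, and `DefaultHasher` from the standard library is swapped in for whatever hash function the real implementation uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hashes a key to 64 bits (std hasher used here purely for illustration).
pub fn hash_key(key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(key);
    hasher.finish()
}

/// Hypothetical table entry that stores the hash next to the key bytes.
pub struct Entry {
    hash: u64,
    key: Vec<u8>,
}

impl Entry {
    pub fn new(key: &[u8]) -> Entry {
        Entry { hash: hash_key(key), key: key.to_vec() }
    }

    /// Decides whether a probed key refers to this entry.
    pub fn matches(&self, probe_hash: u64, probe_key: &[u8]) -> bool {
        if self.hash != probe_hash {
            return false;
        }
        // With the (assumed) feature enabled, equal 64-bit hashes are trusted
        // and the byte-wise key comparison is skipped entirely.
        if cfg!(feature = "compare_hash_only") {
            return true;
        }
        self.key == probe_key
    }
}
```

Skipping the byte comparison trades a very small collision risk for fewer memory accesses, which is presumably why the commit leaves the flag disabled by default.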