Commit Graph

3284 Commits

Author SHA1 Message Date
trinity Pointard
0368162ef0 make DateHistogramAggregationReq buildable 2025-02-18 11:45:24 +01:00
trinity-1686a
e843c71015 Merge pull request #2568 from quickwit-oss/trinity/wildcard-query-parser
allow term starting with wildcard in query parser
2025-02-12 16:47:25 +01:00
trinity Pointard
5cea16ef9f improve handling of spcial char after exist query 2025-01-22 16:04:31 +01:00
dependabot[bot]
4aa8cd2470 Update downcast-rs requirement from 1.2.1 to 2.0.1 (#2566)
Updates the requirements on [downcast-rs](https://github.com/marcianx/downcast-rs) to permit the latest version.
- [Changelog](https://github.com/marcianx/downcast-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/marcianx/downcast-rs/compare/v1.2.1...v2.0.1)

---
updated-dependencies:
- dependency-name: downcast-rs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-22 10:32:24 +01:00
trinity Pointard
4d4ee1b0ac allow term starting with wildcard in query parser 2025-01-15 10:27:48 +01:00
dependabot[bot]
43c89b4360 Update itertools requirement from 0.13.0 to 0.14.0 (#2563)
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.13.0...v0.14.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-08 17:11:46 +01:00
trinity-1686a
d281ca3e65 Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton
allow warming partially an sstable for an automaton
2025-01-08 16:35:35 +01:00
trinity Pointard
be17daf658 split iterator 2025-01-08 16:24:34 +01:00
trinity Pointard
6ca84a61fa make termdict always clone 2025-01-08 16:19:54 +01:00
trinity Pointard
037d12c9c9 fix deadlocking on automaton warmup 2025-01-06 11:58:58 +01:00
Remi Dettai
71cf19870b Exist queries match subpath fields (#2558)
* Exist queries match subpath fields

* Make subpath check optional

* Add async subpath listing
2025-01-06 10:17:39 +01:00
trinity Pointard
175a529c41 use executor for cpu-heavy sstable decompression for automaton 2025-01-03 19:14:07 +01:00
trinity Pointard
fe0c7c5408 change rangebound style 2025-01-02 11:56:05 +01:00
Harrison Burt
148594f0f9 Improve IndexWriter customisation via builder (#2562)
* Improve `IndexWriter` customisation via builder

* Remove change noise from PR

* Correct documentation

* Resolve comments and add test
2025-01-02 09:43:22 +01:00
dependabot[bot]
8edb439440 Update rustc-hash requirement from 1.1.0 to 2.1.0 (#2551)
---
updated-dependencies:
- dependency-name: rustc-hash
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 10:25:05 +01:00
trinity Pointard
dfff5f3bcb rename merge_holes_under => merge_holes_under_bytes 2024-12-23 16:17:44 +01:00
trinity-1686a
ebf4d84553 add comment about cpu-intensive operation in async context 2024-12-20 12:23:49 +01:00
trinity-1686a
42efc7f7c8 clippy 2024-12-20 11:00:11 +01:00
trinity-1686a
192395c311 attempt at simplifying can_block_match_automaton 2024-12-20 10:25:38 +01:00
trinity-1686a
a1447cc9c2 remove breaking change in sstable public api 2024-12-19 17:30:05 +01:00
trinity-1686a
c39d91f827 Merge pull request #2547 from quickwit-oss/trinity/count-str
add support for counting non integer in aggregation
2024-12-17 15:27:30 +01:00
trinity Pointard
32b6e9711b add tests 2024-12-13 16:06:24 +01:00
trinity-1686a
24c5dc2398 allow warming up automaton 2024-12-10 13:32:12 +01:00
trinity-1686a
9e2ddec4b3 merge adjacent block when building delta for automaton 2024-12-10 13:32:12 +01:00
trinity-1686a
1f6a8e74bb support iterating over partially loaded sstable 2024-12-10 13:32:12 +01:00
trinity-1686a
7e901f523b get iter for blocks of sstable matching automaton 2024-12-10 13:32:12 +01:00
trinity-1686a
3c30a41c14 add helper to figure if block can match automaton 2024-12-10 13:32:12 +01:00
dependabot[bot]
0f99d4f420 Update measure_time requirement from 0.8.2 to 0.9.0 (#2557)
---
updated-dependencies:
- dependency-name: measure_time
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-09 21:39:01 +01:00
Pierre Barre
6e02c5cb25 Make NUM_MERGE_THREADS configurable (#2535)
* Make `NUM_MERGE_THREADS` configurable

* Remove unused import

* Reword comment src/index/index.rs

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

---------

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2024-12-09 16:53:11 +08:00
PSeitz
876a579e5d queryparser: add field respecification test (#2550) 2024-12-02 14:17:12 +01:00
PSeitz
4c52499622 clippy (#2549) 2024-11-29 16:08:21 +08:00
trinity-1686a
0bac391291 add support for counting non integer in aggregation 2024-11-28 19:52:47 +01:00
PSeitz
52d4e81e70 update CHANGELOG (#2546) 2024-11-27 20:49:35 +08:00
dependabot[bot]
c71ea7b2ef Update thiserror requirement from 1.0.30 to 2.0.1 (#2542)
Updates the requirements on [thiserror](https://github.com/dtolnay/thiserror) to permit the latest version.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.30...2.0.1)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-09 08:08:34 +08:00
Paul Masurel
c35a782747 Updating rustc-hash and clippy fixes (#2532)
* Updating rustc-hash and clippy fixes

* fix terms_aggregation_min_doc_count_special_case

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-11-01 13:46:26 +08:00
dependabot[bot]
c66af2c0a9 Update binggan requirement from 0.12.0 to 0.14.0 (#2530)
* Update binggan requirement from 0.12.0 to 0.14.0

---
updated-dependencies:
- dependency-name: binggan
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix build

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-10-24 09:41:35 +08:00
Joan Antoni RE
f9ac055847 Fix some links in architecture docs (#2528) 2024-10-23 21:06:54 +09:00
PSeitz
21d057059e clippy (#2527)
* clippy

* clippy

* clippy

* clippy

* convert allow to expect and remove unused

* cargo fmt

* cleanup

* export sample

* clippy
2024-10-22 09:26:54 +08:00
PSeitz
dca508b4ca remove read_postings_no_deletes (#2526)
closes #2525
2024-10-22 09:52:43 +09:00
PSeitz
aebae9965d add RegexPhraseQuery (#2516)
* add RegexPhraseQuery

RegexPhraseQuery supports phrase queries with regex. It supports regex
and wildcards. E.g. a query with wildcards:
"b* b* wolf" matches "big bad wolf"
Slop is supported as well:
"b* wolf"~2 matches "big bad wolf"

Regex queries may match a lot of terms where we still need to
keep track which term hit to load the positions.
The phrase query algorithm groups terms by their frequency
together in the union to prefilter groups early.

This PR comes with some new datastructures:

SimpleUnion - A union docset for a list of docsets. It doesn't do any
caching and is therefore well suited for datasets with lots of skipping.
(phrase search, but intersections in general)

LoadedPostings - Like SegmentPostings, but all docs and positions are loaded in
memory. SegmentPostings uses 1840 bytes per instance with its caches,
which is equivalent to 460 docids.
LoadedPostings is used for terms which have less than 100 docs.
LoadedPostings is only used to reduce memory consumption.

BitSetPostingUnion - Creates a `Posting` that uses the bitset for docid
hits and the docsets for positions. The BitSet is the precalculated
union of the docsets
In the RegexPhraseQuery there is a size limit of 512 docsets per PreAggregatedUnion,
before creating a new one.

Renamed Union to BufferedUnionScorer
Added proptests to test different union types.

* cleanup

* use Box instead of Vec

* use RefCell instead of term_freq(&mut)

* remove wildcard mode

* move RefCell to outer

* clippy
2024-10-21 18:29:17 +08:00
Marvin
e7e3e3f44c make casing in docs more consistent (#2524)
* make casing in docs more consistent

* more

* lowercase tantivy
2024-10-21 17:59:41 +09:00
PSeitz
2f2db16ec1 store DateTime as nanoseconds in doc store (#2486)
* store DateTime as nanoseconds in doc store

The doc store DateTime was truncated to microseconds previously. This
removes this truncation, while still keeping backwards compatibility.

This is done by adding the trait `ConfigurableBinarySerializable`, which
works like `BinarySerializable`, but with a config that allows de/serialize
as different date time precision currently.

bump version format to 7.
add compat test to check the date time truncation.

* remove configurable binary serialize, add enum for doc store version

* test doc store version ord
2024-10-18 10:50:20 +08:00
Paul Masurel
d152e29687 Fixed citation (#2523) 2024-10-17 10:19:50 +09:00
Paul Masurel
285bcc25c9 Added citation.cff (#2522) 2024-10-17 09:43:35 +09:00
PSeitz
7b65ad922d use binggan for stacker bench (#2492)
* use binggan for stacker bench

```
alice (num terms: 174693)
hashmap                    Memory: 1.3 MB     Avg: 367.19 MiB/s (-1.34%)    Median: 368.10 MiB/s (-1.34%)    [378.75 MiB/s .. 352.81 MiB/s]
hasmap with postings       Memory: 2.4 MB     Avg: 237.29 MiB/s (-2.19%)    Median: 240.22 MiB/s (-1.61%)    [248.26 MiB/s .. 210.66 MiB/s]
fxhashmap ref postings     Memory: 2.9 MB     Avg: 171.94 MiB/s (-3.22%)    Median: 174.13 MiB/s (-2.69%)    [185.94 MiB/s .. 152.43 MiB/s]
fxhasmap owned postings    Memory: 3.5 MB     Avg: 96.993 MiB/s (-4.20%)    Median: 97.410 MiB/s (-4.48%)    [102.78 MiB/s .. 82.745 MiB/s]
numbers unique 100k
hashmap                 Memory: 5.2 MB     Avg: 334.17 MiB/s (-3.06%)    Median: 352.61 MiB/s (+0.77%)    [362.60 MiB/s .. 213.03 MiB/s]
hasmap with postings    Memory: 6.3 MB     Avg: 316.96 MiB/s (-0.02%)    Median: 325.16 MiB/s (-0.04%)    [338.36 MiB/s .. 218.60 MiB/s]
zipfs numbers 100k
hashmap                 Memory: 1.3 MB     Avg: 1.2342 GiB/s (+2.87%)    Median: 1.2677 GiB/s (+4.66%)    [1.3130 GiB/s .. 915.93 MiB/s]
hasmap with postings    Memory: 2.4 MB     Avg: 485.16 MiB/s (+2.68%)    Median: 494.70 MiB/s (+4.42%)    [505.31 MiB/s .. 413.14 MiB/s]
numbers unique 1mio
hashmap                 Memory: 35.7 MB     Avg: 169.68 MiB/s (-1.08%)    Median: 166.80 MiB/s (-3.87%)    [201.33 MiB/s .. 154.26 MiB/s]
hasmap with postings    Memory: 39.8 MB     Avg: 149.49 MiB/s (-3.07%)    Median: 150.85 MiB/s (-1.45%)    [160.76 MiB/s .. 130.94 MiB/s]
zipfs numbers 1mio
hashmap                 Memory: 1.3 MB     Avg: 1.2185 GiB/s (-2.33%)     Median: 1.2291 GiB/s (-2.33%)     [1.2905 GiB/s .. 1.0742 GiB/s]
hasmap with postings    Memory: 5.5 MB     Avg: 358.43 MiB/s (-11.63%)    Median: 356.95 MiB/s (-12.85%)    [444.94 MiB/s .. 302.46 MiB/s]
numbers unique 2mio
hashmap                 Memory: 70.3 MB     Avg: 163.65 MiB/s (+8.37%)    Median: 162.83 MiB/s (+8.80%)    [190.20 MiB/s .. 144.70 MiB/s]
hasmap with postings    Memory: 78.6 MB     Avg: 148.00 MiB/s (+7.75%)    Median: 151.53 MiB/s (+9.11%)    [166.92 MiB/s .. 120.09 MiB/s]
zipfs numbers 2mio
hashmap                 Memory: 1.3 MB     Avg: 1.2535 GiB/s (+2.59%)    Median: 1.2654 GiB/s (+0.36%)    [1.2938 GiB/s .. 1.0592 GiB/s]
hasmap with postings    Memory: 9.7 MB     Avg: 377.96 MiB/s (-4.94%)    Median: 381.82 MiB/s (-3.67%)    [426.14 MiB/s .. 335.66 MiB/s]
numbers unique 5mio
hashmap                 Memory: 277.9 MB     Avg: 121.30 MiB/s (+2.00%)    Median: 121.99 MiB/s (+2.99%)    [132.51 MiB/s .. 110.32 MiB/s]
hasmap with postings    Memory: 295.7 MB     Avg: 114.23 MiB/s (+2.13%)    Median: 115.26 MiB/s (+2.94%)    [124.08 MiB/s .. 103.38 MiB/s]
zipfs numbers 5mio
hashmap                 Memory: 1.3 MB      Avg: 1.2326 GiB/s (+0.63%)    Median: 1.2400 GiB/s (+0.71%)    [1.2755 GiB/s .. 1.0923 GiB/s]
hasmap with postings    Memory: 25.4 MB     Avg: 360.49 MiB/s (+1.07%)    Median: 363.44 MiB/s (+1.27%)    [404.88 MiB/s .. 300.38 MiB/s]
```

* rename bench

* update binggan

* rename to HASHMAP_CAPACITY
2024-10-16 11:41:33 +08:00
dependabot[bot]
99be20cedd Update binggan requirement from 0.10.0 to 0.12.0 (#2519)
* Update binggan requirement from 0.10.0 to 0.12.0

---
updated-dependencies:
- dependency-name: binggan
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix build

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-10-16 11:36:04 +08:00
Bruce Mitchener
5f026901b8 Update MSRV to 1.75 (#2515)
This is required by the `fs4` dependency. There are other
things that need something later than 1.66.

Both quickwit and the Python binding already require something
newer.
2024-10-16 10:32:16 +08:00
baishen
6dfa2df06f fix OwnedBytes debug panic (#2512) 2024-10-16 10:31:40 +08:00
Bruce Mitchener
c17e513377 Reduce typo count. (#2510) 2024-10-10 09:55:37 +08:00
PSeitz
2f5a269c70 update packages (#2500)
fixes some warnings
2024-09-25 17:46:18 +08:00