Commit Graph

3527 Commits

Author SHA1 Message Date
dependabot[bot]
166fc15239 Update memmap2 requirement from 0.7.1 to 0.9.0 (#2204)
Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version.
- [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.9.0)

---
updated-dependencies:
- dependency-name: memmap2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-04 05:00:46 +02:00
PSeitz
514a6e7fef fix bench compile, fix Document reexport (#2203) 2023-10-03 17:28:36 +02:00
dependabot[bot]
82d9127191 Update fs4 requirement from 0.6.3 to 0.7.0 (#2199)
Updates the requirements on [fs4](https://github.com/al8n/fs4-rs) to permit the latest version.
- [Release notes](https://github.com/al8n/fs4-rs/releases)
- [Commits](https://github.com/al8n/fs4-rs/commits/0.7.0)

---
updated-dependencies:
- dependency-name: fs4
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 04:43:09 +02:00
PSeitz
03a1f40767 rename DocValue to Value (#2197)
rename DocValue to Value to avoid confusion with lucene DocValues
rename Value to OwnedValue
2023-10-02 17:03:00 +02:00
Harrison Burt
1c7c6fd591 POC: Tantivy documents as a trait (#2071)
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2023-10-02 10:01:16 +02:00
PSeitz
b525f653c0 replace BinaryHeap for TopN (#2186)
* replace BinaryHeap for TopN

replace BinaryHeap for TopN with variant that selects the median with QuickSort,
which runs in O(n) time.

add merge_fruits fast path

* call truncate unconditionally, extend test

* remove special early exit

* add TODO, fmt

* truncate top n instead median, return vec

* simplify code
2023-09-27 09:25:30 +02:00
ethever.eth
90586bc1e2 chore: remove unused Seek impl for Writers (#2187) (#2189)
Co-authored-by: famouscat <onismaa@gmail.com>
2023-09-26 17:03:28 +09:00
PSeitz
832f1633de handle exclusive out of bounds ranges on fastfield range queries (#2174)
closes https://github.com/quickwit-oss/quickwit/issues/3790
2023-09-26 08:00:40 +02:00
PSeitz
38db53c465 make column_index pub (#2181) 2023-09-22 08:06:45 +02:00
PSeitz
34920d31f5 Fix DateHistogram bucket gap (#2183)
* Fix DateHistogram bucket gap

Fixes a computation issue of the number of buckets needed in the
DateHistogram.

This is due to a missing normalization from request values (ms) to fast field
values (ns), when converting an intermediate result to the final result.
This results in a wrong computation by a factor 1_000_000.
The Histogram normalizes values to nanoseconds, to make the user input like
extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible.
This normalization happens only for date type fields, as other field types don't have precision settings.
The normalization does not happen due a missing `column_type`, which is not
correctly passed after merging an empty aggregation (which does not have a `column_type` set), with a regular aggregation.

Another related issue is an empty aggregation, which will not have
`column_type` set, will not convert the result to human readable format.

This PR fixes the issue by:
- Limit the allowed field types of DateHistogram to DateType
- Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as `is_date_agg`.
- Fix the merge logic

Add a flag to to normalization only once. This is not an issue
currently, but it could become easily one.

closes https://github.com/quickwit-oss/quickwit/issues/3837

* use older nightly for time crate (breaks build)
2023-09-21 10:41:35 +02:00
trinity-1686a
0241a05b90 add support for exists query syntax in query parser (#2170)
* add support for exists query syntax in query parser

* rustfmt

* make Exists require a field
2023-09-19 11:10:39 +02:00
PSeitz
e125f3b041 fix test (#2178) 2023-09-19 08:21:50 +02:00
PSeitz
c520ac46fc add support for date in term agg (#2172)
support DateTime in TermsAggregation
Format dates with Rfc3339
2023-09-14 09:22:18 +02:00
PSeitz
2d7390341c increase min memory to 15MB for indexing (#2176)
With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to
12MB. 7MB are for the different fast field collectors types (they could be
lazily created). Increase the minimum memory from 3MB to 15MB.

Change memory variable naming from arena to budget.

closes #2156
2023-09-13 07:38:34 +02:00
dependabot[bot]
03fcdce016 Bump actions/checkout from 3 to 4 (#2171)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-11 10:47:33 +02:00
Ping Xia
e4e416ac42 extend FuzzyTermQuery to support json field (#2173)
* extend fuzzy search for json field

* comments

* comments

* fmt fix

* comments
2023-09-11 05:59:40 +02:00
Igor Motov
19325132b7 Fast-field based implementation of ExistsQuery (#2160)
Adds an implementation of ExistsQuery that takes advantage of fast fields.

Fixes #2159
2023-09-07 11:51:49 +09:00
Paul Masurel
389d36f760 Added comments 2023-09-04 11:06:56 +09:00
PSeitz
49448b31c6 chore: Release (#2168)
* chore: Release

* update CHANGELOG
0.21
2023-09-01 13:58:58 +02:00
PSeitz
ebede0bed7 update CHANGELOG (#2167) 2023-08-31 10:01:44 +02:00
PSeitz
b1d8b072db add missing aggregation part 2 (#2149)
* add missing aggregation part 2

Add missing support for:
- Mixed types columns
- Key of type string on numerical fields

The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.

* simplify, add num_docs to empty
2023-08-31 07:55:33 +02:00
ethever.eth
ee6a7c2bbb fix a small typo (#2165)
Co-authored-by: famouscat <onismaa@gmail.com>
2023-08-30 20:14:26 +02:00
PSeitz
c4e2708901 fix clippy, fmt (#2162) 2023-08-30 08:04:26 +02:00
PSeitz
5c8cfa50eb add missing parameter for percentiles (#2157) 2023-08-29 13:04:24 +02:00
PSeitz
73cb71762f add missing parameter for stats,min,max,count,sum,avg (#2151)
* add missing parameter for stats,min,max,count,sum,avg

add missing parameter for stats,min,max,count,sum,avg
closes #1913
partially #1789

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-08-28 08:59:51 +02:00
Harrison Burt
267dfe58d7 Fix testing on windows (#2155)
* Fix missing trait imports

* Fix building tests on windows

* Revert other PR change
2023-08-27 09:20:44 +09:00
Harrison Burt
131c10d318 Fix missing trait imports (#2154) 2023-08-27 09:20:26 +09:00
Chris Tam
e6cacc40a9 Remove outdated fast field documentation (#2145) 2023-08-24 07:49:49 +02:00
PSeitz
48d4847b38 Improve aggregation error message (#2150)
* Improve aggregation error message

Improve aggregation error message by wrapping the deserialization with a
custom struct. This deserialization variant is slower, since we need to
keep the deserialized data around twice with this approach.
For now the valid variants list is manually updated. This could be
replaced with a proc macro.
closes #2143

* Simpler implementation

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-08-23 20:52:15 +02:00
PSeitz
59460c767f delayed column opening during merge (#2132)
* lazy columnar merge

This is the first part of addressing #3633
Instead of loading all Column into memory for the merge, only the current column_name
group is loaded. This can be done since the sstable streams the columns lexicographically.

* refactor

* add rustdoc

* replace iterator with BTreeMap
2023-08-21 08:55:35 +02:00
Paul Masurel
756156beaf Fix doc 2023-08-17 17:47:45 +09:00
PSeitz
480763db0d track memory arena memory usage (#2148) 2023-08-16 18:19:42 +02:00
PSeitz
62ece86f24 track ff dictionary indexing memory consumption (#2147) 2023-08-16 14:00:08 +02:00
Caleb Hattingh
52d9e6f298 Fix doc typos in count aggregation metric (#2127) 2023-08-15 08:50:23 +02:00
Caleb Hattingh
47b315ff18 doc: escape the backslash (#2144) 2023-08-14 19:10:07 +02:00
PSeitz
ed1deee902 fix sort index by date (#2124)
closes #2112
2023-08-14 17:36:52 +02:00
PSeitz
2e109018b7 add missing parameter to term agg (#2103)
* add missing parameter to term agg

* move missing handling to block accessor

* add multivalue test, fix multivalue case, add comments

* add documentation, deactivate special case

* cargo fmt

* resolve merge conflict
2023-08-14 14:22:18 +02:00
Adam Reichold
22c35b1e00 Fix explanation of boost queries seeking beyond query result. (#2142)
* Make current nightly Clippy happy.

* Fix explanation of boost queries seeking beyond query result.
2023-08-14 11:59:11 +09:00
trinity-1686a
b92082b748 implement lenient parser (#2129)
* move query parser to nom

* add suupport for term grouping

* initial work on infallible parser

* fmt

* add tests and fix minor parsing bugs

* address review comments

* add support for lenient queries in tantivy

* make lenient parser report errors

* allow mixing occur and bool in query
2023-08-08 15:41:29 +02:00
PSeitz
c2be6603a2 alternative mixed field aggregation collection (#2135)
* alternative mixed field aggregation collection

instead of having multiple accessor in one AggregationWithAccessor split it into
multiple independent AggregationWithAccessor

* Update src/aggregation/agg_req_with_accessor.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-07-27 12:25:31 +02:00
Adam Reichold
c805f08ca7 Fix a few more upcoming Clippy lints (#2133) 2023-07-24 17:07:57 +09:00
Adam Reichold
ccc0335158 Minor improvements to OwnedBytes (#2134)
This makes it obvious where the `StableDerefTrait` is invoked and avoids
`transmute` when only a lifetime needs to be extended. Furthermore, it makes use
of `slice::split_at` where that seemed appropriate.
2023-07-24 17:06:33 +09:00
Adam Reichold
42acd334f4 Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint (#2131) 2023-07-21 11:36:17 +09:00
Adam Reichold
820f126075 Remove support for Brotli and Snappy compression (#2123)
LZ4 provides fast and simple compression whereas Zstd is exceptionally flexible
so that the additional support for Brotli and Snappy does not really add
any distinct functionality on top of those two algorithms.

Removing them reduces our maintenance burden and reduces the number of choices
users have to make when setting up their project based on Tantivy.
2023-07-14 16:54:59 +09:00
Adam Reichold
7e6c4a1856 Include only built-in compression algorithms as enum variants (#2121)
* Include only built-in compression algorithms as enum variants

This enables compile-time errors when a compression algorithm is requested which
is not actually enabled for the current Cargo project. The cost is that indexes
using other compression algorithms cannot even be loaded (even though they
are not fully accessible in any case).

As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`.

* Provide more instructive error messages for unsupported, but not unknown compression variants.
2023-07-14 11:02:49 +09:00
Adam Reichold
5fafe4b1ab Add missing query_terms impl for TermSetQuery. (#2120) 2023-07-13 14:54:29 +02:00
PSeitz
1e7cd48cfa remove allocations in split compound words (#2080)
* remove allocations in split compound words

* clear reused data
2023-07-13 09:43:02 +09:00
dependabot[bot]
7f51d85bbd Update lru requirement from 0.10.0 to 0.11.0 (#2117)
Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version.
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jeromefroe/lru-rs/compare/0.10.0...0.11.0)

---
updated-dependencies:
- dependency-name: lru
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-13 09:42:21 +09:00
PSeitz
ad76e32398 Update CHANGELOG.md (#2091)
* Update CHANGELOG.md

* Update CHANGELOG.md
2023-07-11 13:58:49 +08:00
dependabot[bot]
7575f9bf1c Update itertools requirement from 0.10.3 to 0.11.0 (#2098)
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 11:14:46 +02:00