`{ fast: true }` now results in the use of the default fast field tokenizer (instead of no tokenizer).
The default tokenizer lowercases.
Fast fields get a separate default tokenizer manager from the one used for the regular tokenizers.
The serialization of the fast field options is unchanged.
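As a minimal sketch of the described behavior, assuming the `tantivy::tokenizer` builder API, the default fast field tokenizer can be approximated as a raw tokenizer chained with a lowercasing filter. This mirrors the behavior described above; it is not necessarily the exact analyzer Tantivy registers internally.

```rust
use tantivy::tokenizer::{LowerCaser, RawTokenizer, TextAnalyzer};

fn main() {
    // Sketch only: an analyzer that lowercases the whole value,
    // approximating the default fast field tokenizer described above.
    let mut analyzer = TextAnalyzer::builder(RawTokenizer::default())
        .filter(LowerCaser)
        .build();
    let mut stream = analyzer.token_stream("Hello, World!");
    while stream.advance() {
        println!("{:?}", stream.token().text); // "hello, world!"
    }
}
```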
LZ4 provides fast and simple compression, whereas Zstd is exceptionally flexible,
so the additional support for Brotli and Snappy does not add any distinct
functionality on top of those two algorithms.
Removing them reduces our maintenance burden and the number of choices
users have to make when setting up a project based on Tantivy.
* Include only built-in compression algorithms as enum variants
This enables compile-time errors when a compression algorithm is requested that
is not enabled for the current Cargo project. The cost is that indexes
using other compression algorithms cannot even be loaded (even though they
would not be fully accessible in any case).
As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`.
* Provide more instructive error messages for unsupported, but not unknown, compression variants.
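As a standalone sketch of that pattern (illustrative names, not Tantivy's actual definitions), variants exist only when the corresponding Cargo feature is compiled in, so selecting a disabled algorithm fails at compile time rather than at runtime:

```rust
// Illustrative sketch, not tantivy's actual code: compression algorithms
// appear as enum variants only when their Cargo feature is enabled.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Compressor {
    /// No compression.
    None,
    /// Available only with `--features lz4-compression`.
    #[cfg(feature = "lz4-compression")]
    Lz4,
    /// Available only with `--features zstd-compression`.
    #[cfg(feature = "zstd-compression")]
    Zstd,
}

fn main() {
    // Referencing `Compressor::Zstd` without the `zstd-compression`
    // feature is a compile-time error rather than a runtime failure.
    let _compressor = Compressor::None;
}
```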
* feat: order_by_fast_field allows sorting via the order parameter (a usage sketch follows this list)
* chore: change the corresponding values to the original ones
* chore: fix formatting issues
* fix: first_or_default_col should also sort by order
* chore: add an empty doc to the test case and fix doctests
* chore: fix failing tests
* core: add empty document without fast field
* chore: fix fmt
* chore: change variable name
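A hedged usage sketch of the feature above, assuming `TopDocs::order_by_fast_field` takes the sort direction as an `Order` parameter; the field name "timestamp" is made up:

```rust
use tantivy::collector::TopDocs;
use tantivy::Order;

fn main() {
    // Sketch: collect the top 10 documents ordered by a u64 fast field,
    // with the sort direction supplied via the `order` parameter.
    let _collector = TopDocs::with_limit(10)
        .order_by_fast_field::<u64>("timestamp", Order::Asc);
}
```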
This way, client code can name the type, e.g. to store it inside structs without
resorting to generics, and its documentation becomes part of the crate
documentation generated by `cargo doc`.
* Do some Clippy- and Cargo-related boy-scouting.
* Add BytesFilterCollector to support filtering based on a bytes fast field
This is basically a copy of the existing FilterCollector but modified and
specialised to work on a bytes fast field.
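A rough usage sketch, assuming a constructor of the form `BytesFilterCollector::new(field, predicate, collector)` with a predicate over the raw bytes; the field name "payload" is made up:

```rust
use tantivy::collector::{BytesFilterCollector, TopDocs};

fn main() {
    // Sketch: keep only documents whose "payload" bytes fast field starts
    // with a given prefix, then collect the top 10 of what remains.
    // Field name and exact constructor signature are assumptions.
    let _collector = BytesFilterCollector::new(
        "payload".to_string(),
        |bytes: &[u8]| bytes.starts_with(b"\x01"),
        TopDocs::with_limit(10),
    );
}
```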
* Changed semantics of filter collectors to consider multi-valued fields
* tokenizer-api: reduce Tokenizer overhead
Previously, a new `Token` was created for each text encountered, each carrying a
`String::with_capacity(200)` allocation.
In the new API, the token_stream gets mutable access to the tokenizer, which
allows state to be shared (in this PR, the `Token` is shared); a standalone
sketch of the pattern follows this list.
Ideally the allocation for the BoxTokenStream would also be removed, but
this may require some lifetime tricks.
* simplify api
* move lowercase and ascii folding buffer to global
* empty Token text as default
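Below is a self-contained sketch of the pattern (not Tantivy's actual trait): because the stream borrows the tokenizer mutably, a single `Token` whose text is empty by default can be cleared and refilled for every token instead of being allocated anew per text.

```rust
#[derive(Default)]
struct Token {
    text: String,
    position: usize,
}

#[derive(Default)]
struct LowercasingTokenizer {
    // Shared state: reused for every token of every text.
    token: Token,
}

struct TokenStream<'a> {
    tokenizer: &'a mut LowercasingTokenizer,
    words: std::str::SplitWhitespace<'a>,
    position: usize,
}

impl LowercasingTokenizer {
    // Mutable access lets the stream write into the shared Token buffer.
    fn token_stream<'a>(&'a mut self, text: &'a str) -> TokenStream<'a> {
        TokenStream {
            tokenizer: self,
            words: text.split_whitespace(),
            position: 0,
        }
    }
}

impl<'a> TokenStream<'a> {
    /// Advances to the next token; returns false once the text is exhausted.
    fn advance(&mut self) -> bool {
        match self.words.next() {
            Some(word) => {
                let token = &mut self.tokenizer.token;
                token.text.clear(); // reuse the existing allocation
                token.text.extend(word.chars().flat_map(char::to_lowercase));
                token.position = self.position;
                self.position += 1;
                true
            }
            None => false,
        }
    }

    fn token(&self) -> &Token {
        &self.tokenizer.token
    }
}

fn main() {
    let mut tokenizer = LowercasingTokenizer::default();
    let mut stream = tokenizer.token_stream("Reduce Tokenizer OVERHEAD");
    while stream.advance() {
        println!("{} @ {}", stream.token().text, stream.token().position);
    }
}
```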
* Expose phrase-prefix queries via the built-in query parser
This proposes the less-than-imaginative syntax `field:"phrase ter"*` to
perform a phrase prefix query against `field` using `phrase` and `ter` as the
terms. The aim of this is to make this type of query more discoverable and
simplify manual testing.
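Here is a hedged end-to-end sketch of parsing the proposed syntax; the schema and field name are made up, while `QueryParser::parse_query` is the existing entry point:

```rust
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, TEXT};
use tantivy::Index;

fn main() -> tantivy::Result<()> {
    // Sketch: the quoted phrase followed by `*` should parse as a
    // phrase-prefix query. Schema and field name are illustrative.
    let mut schema_builder = Schema::builder();
    let field = schema_builder.add_text_field("field", TEXT);
    let schema = schema_builder.build();
    let index = Index::create_in_ram(schema);
    let query_parser = QueryParser::for_index(&index, vec![field]);
    let query = query_parser.parse_query("field:\"phrase ter\"*")?;
    println!("{query:?}");
    Ok(())
}
```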
I did consider exposing the `max_expansions` parameter similar to how slop is
handled, but I think that this is rather something that should be configured via
the query parser (similar to `set_field_boost` and `set_field_fuzzy`), as
choosing it requires rather intimate knowledge of the backing index.
* Prevent construction of zero or one term phrase-prefix queries via the query parser.
* Add example using phrase-prefix search via surface API to improve feature discoverability.