tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-18 17:20:41 +00:00

Author	SHA1	Message	Date
PSeitz	21d057059e	clippy (#2527) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
PSeitz	fdecb79273	tokenizer-api: reduce Tokenizer overhead (#2062) * tokenizer-api: reduce Tokenizer overhead Previously a new `Token` for each text encountered was created, which contains `String::with_capacity(200)` In the new API the token_stream gets mutable access to the tokenizer, this allows state to be shared (in this PR Token is shared). Ideally the allocation for the BoxTokenStream would also be removed, but this may require some lifetime tricks. * simplify api * move lowercase and ascii folding buffer to global * empty Token text as default	2023-06-08 18:37:58 +08:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
Tomoko Uchida	74e36c7e97	Add unit tests for tokenizers and filters (#1156) * add unit test for SimpleTokenizer * add unit tests for tokenizers and filters.	2021-09-27 10:22:01 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Paul Masurel	dac50c6aeb	Dds merged (#539) * add ascii folding support * Minor change and added Changelog. * add additional tests * Add tests for ascii folding (#533) * first tests for ascii folding * use a `RawTokenizer` for tokens using punctuation * add test for all (?) folding, inspired by Lucene * Simplification of the unit test code	2019-04-26 10:25:08 +09:00
Paul Masurel	37e4280c0a	Cargo Format (#420)	2018-09-15 07:44:22 +09:00
Vignesh Sarma K	09e00f1d42	add position_length to Token (#337) * add position_length to Token refer #291 * Add term offset to `PhraseQuery` ref #291 * Add new constructor for `PhraseQuery` that allows custom offset * fix the method name as per pr comment * Closes #291 Added unit test. Using offsets from the analyzer in QueryParser.	2018-08-13 10:14:50 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00
Paul Masurel	f24e5f405e	NOBUG intellij misc lint	2017-12-14 18:23:35 +09:00
Paul Masurel	974c321153	cargo fmt	2017-11-26 11:02:02 +09:00
Paul Masurel	f30ec9b36b	Merge branch 'master' of github.com:tantivy-search/tantivy Conflicts: src/analyzer/mod.rs src/schema/index_record_option.rs src/tokenizer/lower_caser.rs src/tokenizer/tokenizer.rs	2017-11-26 10:54:05 +09:00
Paul Masurel	acd7c1ea2d	Added comments	2017-11-26 10:44:49 +09:00
Paul Masurel	ac4d433fad	Renamed analyzer to tokenizer	2017-11-24 16:50:32 +09:00

14 Commits