tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-29 14:40:40 +00:00

Author	SHA1	Message	Date
PSeitz	fdecb79273	tokenizer-api: reduce Tokenizer overhead (#2062 ) * tokenizer-api: reduce Tokenizer overhead Previously a new `Token` for each text encountered was created, which contains `String::with_capacity(200)` In the new API the token_stream gets mutable access to the tokenizer, this allows state to be shared (in this PR Token is shared). Ideally the allocation for the BoxTokenStream would also be removed, but this may require some lifetime tricks. * simplify api * move lowercase and ascii folding buffer to global * empty Token text as default	2023-06-08 18:37:58 +08:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924 ) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
Antoine G	11e4225f23	doc fix (#1391 ) Documentation fix.	2022-06-21 15:53:33 +09:00
PSeitz	7f45a6ac96	allow setting tokenizer manager on index (#1362 ) handle json in tokenizer_for_field	2022-05-09 18:15:45 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266 )	2022-01-28 15:55:55 +09:00
Tomoko Uchida	dd81e38e53	Add WhitespaceTokenizer (#1147 ) * Add WhitespaceTokenizer.	2021-08-29 18:20:49 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755 ) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Paul Masurel	039c0a0863	Introducing a wrapper struct instead of Boxed<BoxableTokenizer> (#631 ) Closes #629	2019-08-15 16:37:04 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583 ) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	66b4615e4e	Issue/542 (#543 ) * Closes 542. Fast fields are all loaded when the segment reader is created.	2019-05-05 13:52:43 +09:00
Paul Masurel	63b593bd0a	Lower RAM usage in tests.	2019-01-24 09:10:38 +09:00
Paul Masurel	0b0bf59a32	Allow stemmers in languages other than English (#478 ) Allow users to create stemmers for languages other than English. Add a default stemmer for English. Closes #478	2019-01-23 22:21:00 +09:00
Paul Masurel	dd37e109f2	Merge branch 'issue/368b'	2018-09-11 20:16:14 +09:00
Paul Masurel	63868733a3	Added SnippetGenerator	2018-09-11 09:45:27 +09:00
Paul Masurel	7e5f697d00	Closes #387	2018-09-09 16:23:56 +09:00
Paul Masurel	ede97eded6	Removed use	2018-08-28 09:54:04 +09:00
Dru Sellers	af593b1116	Add default EN stopwords to the default analyzer (#381 ) * Add a default list of en stopwords * Add the default en stopword filter to the standard tokenizers * code review feedback	2018-08-22 10:49:39 +09:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	49519c3f61	added comments	2018-01-04 12:53:20 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00
Paul Masurel	f24e5f405e	NOBUG intellij misc lint	2017-12-14 18:23:35 +09:00
Paul Masurel	05ce093f97	doc	2017-11-26 11:43:11 +09:00
Paul Masurel	974c321153	cargo fmt	2017-11-26 11:02:02 +09:00
Paul Masurel	aaeeda2bc5	Editing rustdoc	2017-11-25 13:23:32 +09:00
Paul Masurel	ac4d433fad	Renamed analyzer to tokenizer	2017-11-24 16:50:32 +09:00

25 Commits