tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-28 22:20:42 +00:00

Author	SHA1	Message	Date
PSeitz	fdecb79273	tokenizer-api: reduce Tokenizer overhead (#2062 ) * tokenizer-api: reduce Tokenizer overhead Previously a new `Token` for each text encountered was created, which contains `String::with_capacity(200)` In the new API the token_stream gets mutable access to the tokenizer, this allows state to be shared (in this PR Token is shared). Ideally the allocation for the BoxTokenStream would also be removed, but this may require some lifetime tricks. * simplify api * move lowercase and ascii folding buffer to global * empty Token text as default	2023-06-08 18:37:58 +08:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924 ) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266 )	2022-01-28 15:55:55 +09:00
Tomoko Uchida	74e36c7e97	Add unit tests for tokenizers and filters (#1156 ) * add unit test for SimpleTokenizer * add unit tests for tokenizers and filters.	2021-09-27 10:22:01 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755 ) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Paul Masurel	ef3eddf3da	clippy first stab (#711 )	2019-11-22 13:09:35 +09:00
Joshua Dutton	9f74786db2	Update import statements in examples, doctests (#633 ) Update import statements to edition 2018, including removing `extern crate` and `#[macro_use]`. Alphabetize the statements.	2019-08-19 07:26:35 +09:00
Paul Masurel	dac50c6aeb	Dds merged (#539 ) * add ascii folding support * Minor change and added Changelog. * add additional tests * Add tests for ascii folding (#533) * first tests for ascii folding * use a `RawTokenizer` for tokens using punctuation * add test for all (?) folding, inspired by Lucene * Simplification of the unit test code	2019-04-26 10:25:08 +09:00
Dru Sellers	82d87416c2	Implement StopWords Filter (#292 ) * Implement StopWords Filter - added example doctest for alphanum_only.rs so that I could drive my own test of the stopword filter * Style Cop * Switch HashSet Hasher to FNV for speed * Update Change Log * fix missed location renaming	2018-05-09 18:40:41 -07:00
Paul Masurel	cb11b92505	Added comments	2018-01-04 12:27:14 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00
Paul Masurel	f24e5f405e	NOBUG intellij misc lint	2017-12-14 18:23:35 +09:00
Paul Masurel	974c321153	cargo fmt	2017-11-26 11:02:02 +09:00
Paul Masurel	acd7c1ea2d	Added comments	2017-11-26 10:44:49 +09:00
Paul Masurel	ac4d433fad	Renamed analyzer to tokenizer	2017-11-24 16:50:32 +09:00

15 Commits