trinity-1686a
064518156f
refactor tokenization pipeline to use GATs (#1924)
* refactor tokenization pipeline to use GATs
* fix doctests
* fix clippy lints
* remove commented code
2023-03-09 09:39:37 +01:00
Paul Masurel
eca6628b3c
Minor refactoring (#1266)
2022-01-28 15:55:55 +09:00
Tomoko Uchida
74e36c7e97
Add unit tests for tokenizers and filters (#1156)
* add unit test for SimpleTokenizer
* add unit tests for tokenizers and filters.
2021-09-27 10:22:01 +09:00
Paul Masurel
811fd0cb9e
Dynamic analyzer (#755)
* Removed generics in tokenizers
* lowercaser
* Added TokenizerExt
* Introducing BoxedTokenizer
* Introducing BoxXXXXX helper struct
* Closes #762.
* Introducing a TextAnalyzer
2020-01-29 18:23:37 +09:00
Paul Masurel
ef3eddf3da
clippy first stab (#711)
2019-11-22 13:09:35 +09:00
Joshua Dutton
9f74786db2
Update import statements in examples, doctests (#633)
Update import statements to edition 2018, including removing
`extern crate` and `#[macro_use]`. Alphabetize the statements.
2019-08-19 07:26:35 +09:00
Paul Masurel
dac50c6aeb
Dds merged (#539)
* add ascii folding support
* Minor change and added Changelog.
* add additional tests
* Add tests for ascii folding (#533)
* first tests for ascii folding
* use a `RawTokenizer` for tokens using punctuation
* add test for all (?) folding, inspired by Lucene
* Simplification of the unit test code
2019-04-26 10:25:08 +09:00
Dru Sellers
82d87416c2
Implement StopWords Filter (#292)
* Implement StopWords Filter
- added example doctest for alphanum_only.rs so that I could
drive my own test of the stopword filter
* Style Cop
* Switch HashSet Hasher to FNV for speed
* Update Change Log
* fix missed location renaming
2018-05-09 18:40:41 -07:00
Paul Masurel
cb11b92505
Added comments
2018-01-04 12:27:14 +09:00
Paul Masurel
1e55189db1
NOBUG rustfmt
2017-12-14 19:30:31 +09:00
Paul Masurel
f24e5f405e
NOBUG intellij misc lint
2017-12-14 18:23:35 +09:00
Paul Masurel
974c321153
cargo fmt
2017-11-26 11:02:02 +09:00
Paul Masurel
acd7c1ea2d
Added comments
2017-11-26 10:44:49 +09:00
Paul Masurel
ac4d433fad
Renamed analyzer to tokenizer
2017-11-24 16:50:32 +09:00