Added handling of pre-tokenized text fields (#642). (#669)

* Added handling of pre-tokenized text fields (#642).

* Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.

* Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.

* Minor code refactoring. Test improvements.
kkoziara
2019-11-07 02:10:56 +01:00
committed by Paul Masurel
parent 7305ad575e
commit 0519056bd8
9 changed files with 534 additions and 21 deletions
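
The commit message mentions a pre-tokenized string type and a `From` conversion into a token stream. The sketch below shows that idea in isolation. It is not the crate's code: the `Token` fields, the `Iterator`-based stream, and the `pre_tokenize` helper are simplified, hypothetical stand-ins that only reuse the type names visible in the diff.

```rust
// Illustrative stand-ins only. Field names and the Iterator-based design
// are assumptions, not the definitions introduced by this commit.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub offset_from: usize, // byte offset of the first byte of the token
    pub offset_to: usize,   // byte offset one past the last byte
    pub position: usize,    // ordinal position of the token in the text
    pub text: String,
}

/// A text bundled with tokens that were computed outside the index.
#[derive(Debug, Clone, PartialEq)]
pub struct PreTokenizedString {
    pub text: String,
    pub tokens: Vec<Token>,
}

/// Replays the stored tokens in order instead of running a tokenizer.
pub struct PreTokenizedStream {
    tokenized_string: PreTokenizedString,
    next_index: usize,
}

impl From<PreTokenizedString> for PreTokenizedStream {
    fn from(tokenized_string: PreTokenizedString) -> Self {
        PreTokenizedStream {
            tokenized_string,
            next_index: 0,
        }
    }
}

impl Iterator for PreTokenizedStream {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let token = self.tokenized_string.tokens.get(self.next_index).cloned();
        if token.is_some() {
            self.next_index += 1;
        }
        token
    }
}

/// Hypothetical helper: tokenize on whitespace ahead of time.
pub fn pre_tokenize(text: &str) -> PreTokenizedString {
    let mut tokens = Vec::new();
    let mut search_from = 0;
    for (position, word) in text.split_whitespace().enumerate() {
        // Only whitespace lies between `search_from` and the next word,
        // so the first match is the start of that word.
        let offset_from = search_from + text[search_from..].find(word).unwrap();
        let offset_to = offset_from + word.len();
        tokens.push(Token {
            offset_from,
            offset_to,
            position,
            text: word.to_string(),
        });
        search_from = offset_to;
    }
    PreTokenizedString {
        text: text.to_string(),
        tokens,
    }
}
```

Under these assumptions, `PreTokenizedStream::from(pre_tokenize("some text"))` yields the stored tokens one by one, so an indexing pipeline could consume them without tokenizing the text again.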

@@ -136,6 +136,7 @@ mod simple_tokenizer;
mod stemmer;
mod stop_word_filter;
mod token_stream_chain;
+mod tokenized_string;
mod tokenizer;
mod tokenizer_manager;
@@ -152,7 +153,9 @@ pub use self::stop_word_filter::StopWordFilter;
pub(crate) use self::token_stream_chain::TokenStreamChain;
pub use self::tokenizer::BoxedTokenizer;
+pub use self::tokenized_string::{PreTokenizedStream, PreTokenizedString};
pub use self::tokenizer::{Token, TokenFilter, TokenStream, Tokenizer};
pub use self::tokenizer_manager::TokenizerManager;
/// Maximum authorized len (in bytes) for a token.