Commit Graph

8 Commits

Author SHA1 Message Date
Paul Masurel
7234bef0eb Issue/1198 (#1201)
* Unit test reproducing #1198
* Fixing unit test to handle the error from add_document.
* Bump project version
2021-11-11 16:42:19 +09:00
Pascal Seitz
1e4df54ab3 fix clippy 2021-07-01 17:41:53 +02:00
Vishal Sodani
4bcdca8545 Fixed spelling 2021-02-08 19:51:36 +05:30
Paul Masurel
486b8fa9c5 Removing serde-derive dependency (#786) 2020-03-06 23:33:58 +09:00
Paul Masurel
811fd0cb9e Dynamic analyzer (#755)
* Removed generics in tokenizers

* lowercaser

* Added TokenizerExt

* Introducing BoxedTokenizer

* Introducing BoxXXXXX helper struct

* Closes #762.

* Introducing a TextAnalyzer
2020-01-29 18:23:37 +09:00
Paul Masurel
afe0134d0f Kkoziara remove tokens from doc store (#715)
* Prevent tokens from being stored in the document store.

Commit adds prepare_for_store method to Document, which changes all
PreTokenizedString values into String values. The method is called
before adding document to the document store to prevent tokens from
being saved there. Commit also adds small changes to comments in
pre_tokenized_text example.

* Avoid storing the pretokenized text.
2019-11-25 22:39:12 +09:00
Paul Masurel
fb3d6fa332 Adding Value::From<PretokenizedText> (#697) 2019-11-10 14:39:44 +09:00
kkoziara
0519056bd8 Added handling of pre-tokenized text fields (#642). (#669)
* Added handling of pre-tokenized text fields (#642).

* * Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.

* * Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.

* Minor code refactoring. Test improvements.
2019-11-07 10:10:56 +09:00