tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2025-12-23 02:29:57 +00:00

Author	SHA1	Message	Date
PSeitz	514d23a20c	move tokenizer API to separate crate (#1767) closes #1766 Finding tantivy tokenizers is a frustrating experience currently, since they need to be updated for each tantivy version. That's unnecessary since the API is rather stable anyway.	2023-01-09 06:37:38 +01:00
Adam Reichold	cd952429d2	Add dictionary-based SplitCompoundWords token filter.	2022-10-27 08:30:33 +02:00
Bruce Mitchener	cf02e32578	Improvements to doc linking, grammar, etc.	2022-09-19 18:10:22 +07:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
Kanji Yomoda	af84e74284	Replace deprecated std package's constants on floats and integers (#1420)	2022-07-22 08:05:08 +09:00
Paul Masurel	d7b46d2137	Added JSON Type (#1270) - Removed useless copy when ingesting JSON. - Bugfix in phrase query with a missing field norms. - Disabled range query on default fields Closes #1251	2022-02-24 16:25:22 +09:00
Paul Masurel	4dc80cfa25	Removes TokenStream chain. (#1283) This change is mostly motivated by the introduction of json object. We need to be able to inject a position object to make the position shift.	2022-02-21 09:51:27 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
Paul Masurel	732f6847c0	Field type with codes (#1255) * Terms are now typed. This change is backward compatible: While the Term has a byte representation that is modified, a Term itself is a transient object that is not serialized as is in the index. Its .field() and .value_bytes() on the other hand are unchanged. This change offers better Debug information for terms. While not necessary it also will help in the support for JSON types. * Renamed Hierarchical Facet -> Facet	2022-01-07 20:49:00 +09:00
Tomoko Uchida	dd81e38e53	Add WhitespaceTokenizer (#1147) * Add WhitespaceTokenizer.	2021-08-29 18:20:49 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Paul Masurel	ef3eddf3da	clippy first stab (#711)	2019-11-22 13:09:35 +09:00
kkoziara	0519056bd8	Added handling of pre-tokenized text fields (#642). (#669) * Added handling of pre-tokenized text fields (#642). * Updated changelog and examples concerning #642. * Added tokenized_text method to Value implementation. * Implemented From<TokenizedString> for TokenizedStream. * Removed tokenized flag from TextOptions and code reliance on the flag. * Changed naming to use word "pre-tokenized" instead of "tokenized". * Updated example code. * Fixed comments. * Minor code refactoring. Test improvements.	2019-11-07 10:10:56 +09:00
Paul Masurel	5c6580eb15	fmt (#661)	2019-10-04 12:10:01 +09:00
Joshua Dutton	9f74786db2	Update import statements in examples, doctests (#633) Update import statements to edition 2018, including removing `extern crate` and `#[macro_use]`. Alphabetize the statements.	2019-08-19 07:26:35 +09:00
Paul Masurel	039c0a0863	Introducing a wrapper struct instead of Boxed<BoxableTokenizer> (#631) Closes #629	2019-08-15 16:37:04 +09:00
Paul Masurel	dac50c6aeb	Dds merged (#539) * add ascii folding support * Minor change and added Changelog. * add additional tests * Add tests for ascii folding (#533) * first tests for ascii folding * use a `RawTokenizer` for tokens using punctuation * add test for all (?) folding, inspired by Lucene * Simplification of the unit test code	2019-04-26 10:25:08 +09:00
Paul Masurel	96a4f503ec	Closes #526 (#535)	2019-04-24 20:59:48 +09:00
Panagiotis Ktistakis	2cd31bcda2	Fix non english stemmers (#521)	2019-03-27 08:54:16 +09:00
Paul Masurel	63b593bd0a	Lower RAM usage in tests.	2019-01-24 09:10:38 +09:00
Paul Masurel	0b0bf59a32	Allow stemmers in languages other than English (#478) Allow users to create stemmers for languages other than English. Add a default stemmer for English. Closes #478	2019-01-23 22:21:00 +09:00
Paul Masurel	a6e767c877	Cargo fmt	2018-11-30 22:52:45 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying API * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit tests * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
Dru Sellers	e75bb1d6a1	Fix NGram processing of non-ascii characters (#430) * A working version * optimize the ngram parsing * Decoding codepoint only once. * Closes #429 * using leading_zeros to make code less cryptic * lookup in a table	2018-10-31 08:35:27 +09:00
Paul Masurel	37e4280c0a	Cargo Format (#420)	2018-09-15 07:44:22 +09:00
Paul Masurel	dd37e109f2	Merge branch 'issue/368b'	2018-09-11 20:16:14 +09:00
Paul Masurel	63868733a3	Added SnippetGenerator	2018-09-11 09:45:27 +09:00
Paul Masurel	7e5f697d00	Closes #387	2018-09-09 16:23:56 +09:00
Vignesh Sarma K (വിഘ്നേഷ് ശ൪മ കെ)	835cdc2fe8	Initial version of snippet refer #368	2018-08-28 20:41:41 +05:30
Dru Sellers	82d87416c2	Implement StopWords Filter (#292) * Implement StopWords Filter - added example doctest for alphanum_only.rs so that I could drive my own test of the stopword filter * Style Cop * Switch HashSet Hasher to FNV for speed * Update Change Log * fix missed location renaming	2018-05-09 18:40:41 -07:00
Dru Sellers	ca74c14647	Simple Implementation of NGram Tokenizer (#278) * Simple Implementation of NGram Tokenizer It does not yet support edges It could probably be better in many "rusty" ways But the test is passing, so I'll call this a good stopping point for the day. * Remove Ngram from manager. Too many variations * Basic configuration model Should the extensive tests exist here? * Add Sample to provide an End to End testing * Basic Edgegram support * cleanup * code feedback * More code review feedback processed	2018-05-06 09:47:49 -07:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	3edb3dce6a	Test not passing	2018-01-25 12:46:32 +09:00
Paul Masurel	44e5c4dfd3	Added alphanum only token filter	2017-12-31 13:43:10 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00
Paul Masurel	8b1b389a76	NOBUG Clippy	2017-12-14 19:25:12 +09:00
Paul Masurel	8023445b63	docs	2017-11-26 11:52:03 +09:00
Paul Masurel	05ce093f97	doc	2017-11-26 11:43:11 +09:00
Paul Masurel	974c321153	cargo fmt	2017-11-26 11:02:02 +09:00
Paul Masurel	f30ec9b36b	Merge branch 'master' of github.com:tantivy-search/tantivy Conflicts: src/analyzer/mod.rs src/schema/index_record_option.rs src/tokenizer/lower_caser.rs src/tokenizer/tokenizer.rs	2017-11-26 10:54:05 +09:00
Paul Masurel	acd7c1ea2d	Added comments	2017-11-26 10:44:49 +09:00
Paul Masurel	aaeeda2bc5	Editing rustdoc	2017-11-25 13:23:32 +09:00
Paul Masurel	ac4d433fad	Renamed analyzer to tokenizer	2017-11-24 16:50:32 +09:00

43 Commits