tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-10 19:12:54 +00:00

Author	SHA1	Message	Date
Tomoko Uchida	dd81e38e53	Add WhitespaceTokenizer (#1147) * Add WhitespaceTokenizer.	2021-08-29 18:20:49 +09:00
Pascal Seitz	9b3e508753	fix clippy	2021-07-01 18:06:09 +02:00
Pascal Seitz	1e4df54ab3	fix clippy	2021-07-01 17:41:53 +02:00
Paul Masurel	486b8fa9c5	Removing serde-derive dependency (#786)	2020-03-06 23:33:58 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Christian Hunstad	02af28b3b7	add norwegian stemmer (#717)	2019-11-27 21:08:59 +09:00
Paul Masurel	ef3eddf3da	clippy first stab (#711)	2019-11-22 13:09:35 +09:00
kkoziara	0519056bd8	Added handling of pre-tokenized text fields (#642). (#669) * Added handling of pre-tokenized text fields (#642). * * Updated changelog and examples concerning #642. * Added tokenized_text method to Value implementation. * Implemented From<TokenizedString> for TokenizedStream. * * Removed tokenized flag from TextOptions and code reliance on the flag. * Changed naming to use word "pre-tokenized" instead of "tokenized". * Updated example code. * Fixed comments. * Minor code refactoring. Test improvements.	2019-11-07 10:10:56 +09:00
Paul Masurel	5c6580eb15	fmt (#661)	2019-10-04 12:10:01 +09:00
Joshua Dutton	9f74786db2	Update import statements in examples, doctests (#633) Update import statements to edition 2018, including removing `extern crate` and `#[macro_use]`. Alphabetize the statements.	2019-08-19 07:26:35 +09:00
Paul Masurel	039c0a0863	Introducing a wrapper struct instead of Boxed<BoxableTokenizer> (#631) Closes #629	2019-08-15 16:37:04 +09:00
Paul Masurel	498057c5b7	Refactor deletes (#597) * Refactor deletes * Removing generation from SegmentUpdater. These have been obsolete for a long time * Number literal clippy * Removed clippy useless allow statement	2019-07-17 13:06:44 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	66b4615e4e	Issue/542 (#543) * Closes 542. Fast fields are all loaded when the segment reader is created.	2019-05-05 13:52:43 +09:00
Paul Masurel	dac50c6aeb	Dds merged (#539) * add ascii folding support * Minor change and added Changelog. * add additional tests * Add tests for ascii folding (#533) * first tests for ascii folding * use a `RawTokenizer` for tokens using punctuation * add test for all (?) folding, inspired by Lucene * Simplification of the unit test code	2019-04-26 10:25:08 +09:00
Paul Masurel	96a4f503ec	Closes #526 (#535)	2019-04-24 20:59:48 +09:00
Panagiotis Ktistakis	2cd31bcda2	Fix non english stemmers (#521)	2019-03-27 08:54:16 +09:00
Panagiotis Ktistakis	76609deadf	Add Greek stemmer (#486)	2019-02-01 06:30:49 +01:00
Paul Masurel	bf94fd77db	Issue/471 (#481) * Closes 471 Removing writing_segments in the segment manager as it is now useless. Removing the target merged segment id as it is useless as well. * RAII for tracking which segment is in merge. Closes #471 * fmt * Using Inventory::default().	2019-01-27 12:18:59 +09:00
Paul Masurel	1fd46c1e9b	Clippy	2019-01-28 03:46:23 +01:00
Paul Masurel	63b593bd0a	Lower RAM usage in tests.	2019-01-24 09:10:38 +09:00
Paul Masurel	0b0bf59a32	Allow stemmers in languages other than English (#478) Allow users to create stemmers for languages other than English. Add a default stemmer for English. Closes #478	2019-01-23 22:21:00 +09:00
Paul Masurel	a3042e956b	Facet remove unsafe (#454) * Removing some unsafe * Removing some unsafe (2)	2018-12-17 09:31:09 +09:00
Paul Masurel	a6e767c877	Cargo fmt	2018-11-30 22:52:45 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437 ) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying APi * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit testst * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
Dru Sellers	e75bb1d6a1	Fix NGram processing of non-ascii characters (#430) * A working version * optimize the ngram parsing * Decoding codepoint only once. * Closes #429 * using leading_zeros to make code less cryptic * lookup in a table	2018-10-31 08:35:27 +09:00
Paul Masurel	10f6c07c53	Clippy (#422) * Cargo Format * Clippy	2018-09-15 20:20:22 +09:00
Paul Masurel	37e4280c0a	Cargo Format (#420)	2018-09-15 07:44:22 +09:00
Paul Masurel	dd37e109f2	Merge branch 'issue/368b'	2018-09-11 20:16:14 +09:00
Paul Masurel	63868733a3	Added SnippetGenerator	2018-09-11 09:45:27 +09:00
Paul Masurel	7e5f697d00	Closes #387	2018-09-09 16:23:56 +09:00
Vignesh Sarma K	9ccba9f864	Merge branch 'master' into issue/368	2018-09-07 20:27:38 +05:30
Paul Masurel	c64972e039	Apply unicode lowercasing. (#408) Checks if the str is ASCII, and uses a fast track if it is the case. If not, the std's definition of a lowercase character. Closes #406	2018-09-05 09:43:56 +09:00
Vignesh Sarma K (വിഘ്നേഷ് ശ൪മ കെ)	835cdc2fe8	Initial version of snippet refer #368	2018-08-28 20:41:41 +05:30
Paul Masurel	ede97eded6	Removed use	2018-08-28 09:54:04 +09:00
Dru Sellers	af593b1116	Add default EN stopwords to the default analyzer (#381) * Add a default list of en stopwords * Add the default en stopword filter to the standard tokenizers * code review feedback	2018-08-22 10:49:39 +09:00
Vignesh Sarma K	09e00f1d42	add position_length to Token (#337) * add position_length to Token refer #291 * Add term offset to `PhraseQuery` ref #291 * Add new constructor for `PhraseQuery` that allows custom offset * fix the method name as per pr comment * Closes #291 Added unit test. Using offsets from the analyzer in QueryParser.	2018-08-13 10:14:50 +09:00
Paul Masurel	811ddf2226	Closes #364 (#365) * Closes #364 * Trying to raise the recursion limit * Better unit test and bug fix on token offsets	2018-08-08 11:15:20 +09:00
Paul Masurel	b59132966f	Better heap (#311) * Changed the heap to a paged memory arena. * Trying to simplify the indexing term hashmap * Exploding datastruct * Removed some complexity in bitpacker	2018-06-04 09:39:18 +09:00
Paul Masurel	bc69dab822	cargo fmt	2018-05-18 10:08:05 +09:00
Dru Sellers	82d87416c2	Implement StopWords Filter (#292) * Implement StopWords Filter - added example doctest for alphanum_only.rs so that I could drive my own test of the stopword filter * Style Cop * Switch HashSet Hasher to FNV for speed * Update Change Log * fix missed location renaming	2018-05-09 18:40:41 -07:00
Paul Masurel	24050d0eb5	Remove some unsafe stuff, justified some of it.	2018-05-07 23:57:53 -07:00
Paul Masurel	9a0b7f9855	Rustfmt	2018-05-07 19:50:35 -07:00
Paul Masurel	99c0b84036	Integrating #274, #280, #289 into master (#290) * Integrating bugfixes into master Closes #274 Closes #280 Closes #289 * Next version will be 0.6	2018-05-06 09:48:25 -07:00
Dru Sellers	ca74c14647	Simple Implementation of NGram Tokenizer (#278) * Simple Implementation of NGram Tokenizer It does not yet support edges It could probably be better in many "rusty" ways But the test is passing, so I'll call this a good stopping point for the day. * Remove Ngram from manager. Too many variations * Basic configuration model Should the extensive tests exist here? * Add Sample to provide an End to End testing * Basic Edgegram support * cleanup * code feedback * More code review feedback processed	2018-05-06 09:47:49 -07:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	e44782bf14	No more	2018-04-12 13:01:11 +09:00
Paul Masurel	0cf274135b	Clippy	2018-03-10 13:07:18 +09:00
Paul Masurel	a7ffc0e610	Rustfmt	2018-02-12 10:31:29 +09:00
Paul Masurel	df53dc4ceb	Format	2018-02-03 00:21:05 +09:00

68 Commits