tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-06-02 08:30:41 +00:00

Author	SHA1	Message	Date
Paul Masurel	77505c3d03	Making stemming optional. (#2791) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz	21d057059e	clippy (#2527) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
gezihuzi	95a4ddea3e	Fix: Improve collapse_overlapped_ranges function (#2474) * Fix: Improve collapse_overlapped_ranges function - Refactor into separate sort_and_deduplicate_ranges and merge_overlapping_ranges functions - Enhance sorting to consider both start and end of ranges - Optimize merging logic to handle adjacent ranges - Add comprehensive examples in function documentation - Ensure proper handling of duplicate and unsorted input ranges - Improve overall efficiency and readability of range collapsing algorithm * move debug_assert --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2024-09-04 12:39:13 +08:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277)	2024-01-31 02:47:06 +01:00
PSeitz	c2b0469180	improve docs, rework exports (#2220) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
PSeitz	480763db0d	track memory arena memory usage (#2148)	2023-08-16 18:19:42 +02:00
François Massot	d73706dede	Ngram tokenizer now returns an error with invalid arguments.	2023-06-25 20:13:24 +02:00
PSeitz	fdecb79273	tokenizer-api: reduce Tokenizer overhead (#2062) * tokenizer-api: reduce Tokenizer overhead Previously a new `Token` for each text encountered was created, which contains `String::with_capacity(200)` In the new API the token_stream gets mutable access to the tokenizer, this allows state to be shared (in this PR Token is shared). Ideally the allocation for the BoxTokenStream would also be removed, but this may require some lifetime tricks. * simplify api * move lowercase and ascii folding buffer to global * empty Token text as default	2023-06-08 18:37:58 +08:00
Sergei Lavrentev	8cf26da4b2	Add possibility to set up highlighten prefix and postfix for snippet (#1422) * add possibility to change highlight prefix and postfix * add comment to Snippet::new * add test for highlighten elements * add default highlight prefix and postfix constants * fix spelling * fix tests * fix spelling * do fixes after code review * reduce test_snippet_generator_custom_highlighted_elements code * fix fmt * change names to more convenient --------- Co-authored-by: Sergei Lavrentev <23312691+lavrxxx@users.noreply.github.com>	2023-05-23 15:09:24 +02:00
PSeitz	74f9eafefc	refactor Term (#2006) * refactor Term add ValueBytes for serialized term values add missing debug for ip skip unnecessary json path validation remove code duplication add DATE_TIME_PRECISION_INDEXED constant add missing Term clarification remove weird value_bytes_mut() API * fix naming	2023-04-20 15:31:43 +02:00
Adam Reichold	2080c370c2	Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750) * Make nightly Clippy mostly happy. * Document how to produce TermSetQuery queries using QueryParser. * Enable construction of queries using FuzzyTermQuery via the QueryParser * Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks. * Use a struct instead of a tuple to improve readability.	2023-01-04 18:11:27 +09:00
Bruce Mitchener	cb252a42af	docs: "associated to" -> "associated with" (#1557) This reads better this way.	2022-09-26 20:23:37 +09:00
Paul Masurel	4e350c5f1b	Clippy	2022-09-02 13:05:00 +09:00
Paul Masurel	3a9727aa91	Pleasing Clippy	2022-08-27 11:33:03 +02:00
UEDA Akira	17093e8ffe	Collapse overlapped highlighted ranges (#1473)	2022-08-26 14:37:08 +09:00
Paul Masurel	21519788ea	Build fix (#1470)	2022-08-24 07:16:38 +09:00
Adam Reichold	71ab482720	RFC: Use a more general but still object-safe signature for Query::query_terms. (#1468) * Use a more general but still object-safe signature for Query::query_terms. * Further constraint the generalized Query::query_terms signature to allow extracting references to terms.	2022-08-24 06:34:07 +09:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
Evance Soumaoro	fad3faefe2	added InvertedIndexReader::doc_freq_async and SnippetGenerator::new methods	2022-08-12 06:39:10 +00:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
Paul Masurel	732f6847c0	Field type with codes (#1255) * Term are now typed. This change is backward compatible: While the Term has a byte representation that is modified, a Term itself is a transient object that is not serialized as is in the index. Its .field() and .value_bytes() on the other hand are unchanged. This change offers better Debug information for terms. While not necessary it also will help in the support for JSON types. * Renamed Hierarchical Facet -> Facet	2022-01-07 20:49:00 +09:00
Liam Warfield	17e00df112	Change Snippet.fragments -> Snippet.fragment (#1243) * Change Snippet.fragments -> Snippet.fragment * Apply suggestions from code review Co-authored-by: Liam Warfield <lwarfield@arista.com>	2022-01-03 22:23:51 +09:00
Paul Masurel	7234bef0eb	Issue/1198 (#1201) * Unit test reproducing #1198 * Fixing unit test to handle the error from add_document. * Bump project version	2021-11-11 16:42:19 +09:00
Paul Masurel	ffe4446d90	Minor lint comments (#1166)	2021-10-06 11:27:48 +09:00
Pascal Seitz	9b3e508753	fix clippy	2021-07-01 18:06:09 +02:00
Pascal Seitz	1e4df54ab3	fix clippy	2021-07-01 17:41:53 +02:00
Pascal Seitz	2de249af74	clippy fixes	2021-07-01 17:37:37 +02:00
Paul Masurel	6e4b61154f	Issue/1070 (#1071) Add a boolean flag in the Query::query_terms informing on whether position information is required. Closes #1070	2021-06-03 22:33:20 +09:00
Paul Masurel	31137beea6	Replacing (start, end) by Range	2021-03-10 14:06:21 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	96f946d4c3	Raultang master (#879) * add support for indexed bytes fast field * remove backup code file * refine test cases * Simplified unit test. Renamed it as it is testing the storable part. Not the indexed part. * Small refactoring and added unit test. If multivalued we only retain the first FAST value. Co-authored-by: Raul <raul.tang.lc@gmail.com>	2020-10-01 18:03:18 +09:00
Paul Masurel	73024a8af3	Fixing compilation of bench and doctests.	2020-09-08 07:18:43 +09:00
Paul Masurel	439d6956a9	Returning Result in some of the API (#880) * Returning Result in some of the API * Introducing `.writer_for_test(..)`	2020-09-07 15:52:34 +09:00
Paul Masurel	2481c87be8	Block wand (#856)	2020-08-19 22:36:36 +09:00
Paul Masurel	ae14022bf0	Removed `use::Result`. (#771)	2020-01-31 18:47:02 +09:00
Paul Masurel	811fd0cb9e	Dynamic analyzer (#755) * Removed generics in tokenizers * lowercaser * Added TokenizerExt * Introducing BoxedTokenizer * Introducing BoxXXXXX helper struct * Closes #762. * Introducing a TextAnalyzer	2020-01-29 18:23:37 +09:00
Paul Masurel	1868fc1e2c	Text fix	2019-11-20 23:00:39 +09:00
Paul Masurel	451a0252ab	thread pool merge (#704)	2019-11-20 21:18:05 +09:00
Joshua Dutton	9f74786db2	Update import statements in examples, doctests (#633) Update import statements to edition 2018, including removing `extern crate` and `#[macro_use]`. Alphabetize the statements.	2019-08-19 07:26:35 +09:00
Paul Masurel	039c0a0863	Introducing a wrapper struct instead of Boxed<BoxableTokenizer> (#631) Closes #629	2019-08-15 16:37:04 +09:00
Paul Masurel	0bc2c64a53	2018 (#585) * removing macro import for fail-rs * Downcast-rs * matches	2019-07-07 17:09:04 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	66b4615e4e	Issue/542 (#543) * Closes 542. Fast fields are all loaded when the segment reader is created.	2019-05-05 13:52:43 +09:00
Paul Masurel	663dd89c05	Feature/reader (#517) Adding IndexReader to the API. Making it possible to watch for changes. * Closes #500	2019-03-20 08:39:22 +09:00
Paul Masurel	63b593bd0a	Lower RAM usage in tests.	2019-01-24 09:10:38 +09:00
Paul Masurel	279a9eb5e3	Closes #449 (#450) Clippy working on stable. Clippy warnings addressed	2018-12-10 12:20:59 +09:00
fdb-hiroshima	21a24672d8	Add accessors for Snippet and HighlightSection (#448) * Add accessors for Snippet and HighlightSection And add an example of custom highlighter * Remove inline(always) and unnecessary empty lines	2018-12-02 18:00:16 +09:00
Paul Masurel	a6e767c877	Cargo fmt	2018-11-30 22:52:45 +09:00

77 Commits