tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2025-12-23 02:29:57 +00:00

Author	SHA1	Message	Date
Paul Masurel	03d31f6713	Update CHANGELOG	2019-12-19 10:07:43 +09:00
Paul Masurel	f6000aece7	Closes #732 The future returned by `IndexWriter::merge` does not borrow `&mut self`	2019-12-18 21:48:51 +09:00
Paul Masurel	2b3fe3a2b5	Bumped version for hotfix	2019-12-17 21:10:50 +09:00
petr-tik	431c187a60	Make error handling richer in Footer::is_compatible (#724) * WIP implemented is_compatible hide Footer::from_bytes from public consumption - only found Footer::extract used outside the module Add a new error type for IncompatibleIndex add a prototypical call to footer.is_compatible() in ManagedDirectory::open_read to make sure we error before reading it further * Make error handling more ergonomic Add an error subtype for OpenReadError and converters to TantivyError * Remove an unnecessary assert it's followed by the same check that Errors instead of panicking * Correct the compatibility check logic Leave a defensive versioned footer check to make sure we add new logic handling when we add possible footer versions Restricted VersionedFooter::from_bytes to be used inside the crate only remove a half-baked test * WIP. * Return an error if index incompatible - closes #662 Enrich the error type with incompatibility Change return type to Result<bool, TantivyError>, instead of bool Add an Incompatibility enum that enriches the IncompatibleIndex error variant with information, which then allows us to generate a developer-friendly hint how to upgrade library version or switch feature flags for a different compression algorithm Updated changelog Change the signature of is_compatible Added documentation to the Incompatibility Added a conditional test on a Footer with lz4 erroring	2019-12-14 09:14:33 +09:00
kkoziara	0519056bd8	Added handling of pre-tokenized text fields (#642). (#669) * Added handling of pre-tokenized text fields (#642). * * Updated changelog and examples concerning #642. * Added tokenized_text method to Value implementation. * Implemented From<TokenizedString> for TokenizedStream. * * Removed tokenized flag from TextOptions and code reliance on the flag. * Changed naming to use word "pre-tokenized" instead of "tokenized". * Updated example code. * Fixed comments. * Minor code refactoring. Test improvements.	2019-11-07 10:10:56 +09:00
Paul Masurel	67bce6cbf2	Fixing the construction of the DeleteBitset. (#683) Closes #681	2019-11-04 15:39:11 +09:00
Alberto Piai	3a65dc84c8	TopDocs: ensure stable sorting on equal score (#675) * TopDocs: ensure stable sorting on equal score When selecting the top K documents by score, we need to ensure stable sorting. Until now, for documents with the same score, we were relying on the (arbitrary) order returned by the BinaryHeap used to implement the collectors. This patch fixes the problem by explicitly using the doc address when harvesting the `TopSegmentCollector` and when merging the results in `TopCollector::merge_fruits()`. This is important (for example) to implement pagination correctly using the TopDocs collector. If sorting isn't stable, documents that have the same score might be ranked in different positions depending on the specific K that was used, thus appearing in two different pages, or in none at all. Fixes gh-671 * TMP: alternative solution (see previous commit) If we add the constraint that D is also PartialOrd in ComparableDoc<T, D>, then we can move the comparison by doc address directly in the cmp implementation of ComparableDoc. * TMP rebase as first commit: add benchmarks for TopSegmentCollector * fixup! TMP: alternative solution (see previous commit) * TMP add changelog entry * TMP run cargo fmt	2019-10-26 15:27:25 +09:00
Paul Masurel	2ea8e618f2	Merge branch 'hotfix-656'	2019-10-01 09:44:56 +09:00
Paul Masurel	94f27f990b	Address #656 Broke the reference loop to make sure that the watch_router can be dropped, and the thread exits.	2019-10-01 09:34:22 +09:00
Paul Masurel	cde9b78b8d	Fixing the issue associated with the Regex performance change	2019-09-18 18:29:27 +09:00
fdb-hiroshima	d8894f0bd2	add checksum check in ManagedDirectory (#605) * add checksum check in ManagedDirectory fix #400 * flush after writing checksum * don't checksum atomic file access and clone managed_paths * implement a footer storing metadata about a file this is more of a poc, it requires some refactoring into multiple files `terminate(self)` is implemented, but not used anywhere yet * address comments and simplify things with new contract use BitOrder for integer to raw byte conversion consider atomic write imply atomic read, which might not actually be true use some indirection to have a boxable terminating writer * implement TerminatingWrite and make terminate() be called where it should add dependency to drop_bomb to help find where terminate() should be called implement TerminatingWrite for wrapper writers make tests pass /!\ some tests seem to pass where they shouldn't * remove usage of drop_bomb * fmt * add test for checksum * address some review comments * update changelog * fmt	2019-09-18 18:26:25 +09:00
Paul Masurel	c1635c13f6	RegexQuery performance: make it possible to cache Regexes - remastered by fulmicoton (Closes #639) (#641) * small docs cleanup * only compile a regex once per RegexQuery Building a `Regex` is an expensive operation. Users of `RegexQuery` need to cache and reuse regexes when searching across multiple fields. This is the first step towards allowing that: we can store the `Regex` directly in the `RegexQuery`, instead of the string pattern. * RegexQuery: account for possible failure in the constructor When building a regex from a str pattern, we have to account for the possibility that the pattern is invalid. Before the previous commit, the failure would happen in the `specialized_weight` method. Now that we store a compiled `Regex` in `RegexQuery`, `specialized_weight` doesn't fail anymore, and we can fail early while constructing `RegexQuery` if the pattern is invalid. This is a breaking change for users of `RegexQuery::new`. * add RegexQuery::from_regex method This builds a `RegexQuery` from an already compiled `Regex`. The use of `Into<Arc<Regex>>` is to allow the caller to either simply pass a `Regex`, or an `Arc<Regex>`, in case it needs to be cached and shared on the caller's side. * Using an Arc in AutomatonWeight Closes #639	2019-08-22 16:14:01 +09:00
Paul Masurel	039c0a0863	Introducing a wrapper struct instead of Boxed<BoxableTokenizer> (#631) Closes #629	2019-08-15 16:37:04 +09:00
petr-tik	028b0a749c	Elastic unbounded range query (#624) * Tidy up fmt remove unnecessary -> Result<()> followed by run.unwrap() in a test * Adding support for elasticsearch-style unbounded queries Extend the UserInputBound to include Unbounded, so we can reuse formatting and internal query format * Still working on elastic-style range queries Fixes #498 Merge the elastic_range into range Reformat to make code easier to follow, use optional() macro to return Some * Fixed bugs Made the range parser insensitive to whitespace between the ":" and the range. Removed optional parsing of field. Added a unit test for the range parser. Derived PartialEq to compare the results of parsing as structs, instead of strings. Found a bug with that unit test - "}" was parsed as a UserInputBound::Exclusive, instead of UserInputBound::Unbounded. Added an early detection-and-return for in the original range parser * Correct failing test Assume that we will use "{" for Unbounded ranges Add a note in the changelog cargo-fmt * Moved parenthesis to a newline to make nested if-else more visible	2019-08-12 08:24:47 +09:00
Paul Masurel	f428f344da	Various bugfix in the query parser (#619)	2019-08-08 17:48:21 +09:00
Paul Masurel	143f78eced	Trying to fix #609 (#616)	2019-08-06 20:33:30 +09:00
Paul Masurel	efd1af1325	Closes #544. (#607) Prepare for release 0.10.1	2019-07-30 13:38:06 +09:00
fdb-hiroshima	6eb4e08636	add support for float (#603) * add basic support for float as for i64, they are mapped to u64 for indexing query parser don't work yet * Update value.rs * implement support for float in query parser * Update README.md	2019-07-27 17:57:33 +09:00
Paul Masurel	697c7e721d	Only compile bitpacker4x (#589)	2019-07-10 18:53:46 +09:00
Paul Masurel	3e368d92cb	Issue/479 (#578) * Sort by field relying on tweaked score * Sort by u64/i64 get independent methods.	2019-07-07 17:12:31 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	3e0907fe05	Fixed CHANGELOG and disable one test on windows (#577)	2019-06-27 09:48:53 +09:00
Paul Masurel	e2da92fcb5	Petr tik n510 clear index (#566) * Enables clearing the index Closes #510 * Adds an example to clear and rebuild index * Addressing code review Moved the example from examples/ to docstring above `clear` * Corrected minor typos and missed/duplicate words * Added stamper.revert method to be used for rollback Added type alias for Opstamp Moved to AtomicU64 on stable rust (since 1.34) * Change the method name and doc-string * Remove rollback from delete_all_documents test_add_then_delete_all_documents fails with --test-threads 2 * Passes all the tests with any number of test-threads (ran locally 5 times) * Addressed code review Deleted comments with debug info changed ReloadPolicy to Manual * Removing useless garbage_collect call and updated CHANGELOG	2019-06-12 09:40:03 +09:00
Paul Masurel	4822940b19	Issue/36 (#559) * Added explanation * Explain * Splitting weight and idf * Added comments Closes #36	2019-06-06 10:03:54 +09:00
Paul Masurel	444662485f	Remove mut in add_document and delete_term. Made stamper ordering rel… (#551) * Remove mut in add_document and delete_term. Made stamper ordering relaxed. * Made batch operations &mut self -> &self * Added example	2019-05-28 10:26:00 +09:00
Paul Masurel	66b4615e4e	Issue/542 (#543) * Closes 542. Fast fields are all loaded when the segment reader is created.	2019-05-05 13:52:43 +09:00
Paul Masurel	dac50c6aeb	Dds merged (#539) * add ascii folding support * Minor change and added Changelog. * add additional tests * Add tests for ascii folding (#533) * first tests for ascii folding * use a `RawTokenizer` for tokens using punctuation * add test for all (?) folding, inspired by Lucene * Simplification of the unit test code	2019-04-26 10:25:08 +09:00
Paul Masurel	96a4f503ec	Closes #526 (#535)	2019-04-24 20:59:48 +09:00
Paul Masurel	b7c2d0de97	Clippy2 (#534) * Clippy comments Clippy complains about the cast of &[u32] to a const __m128i, because of the lack of alignment constraints. This commit passes the OutputBuffer object (which enforces proper alignment) instead of `&[u32]`. Clippy. Block alignment * Code simplification * Added comment. Code simplification * Removed the extraneous freq block len hack.	2019-04-24 12:31:32 +09:00
Paul Masurel	79f3cd6cf4	Added instructions to update	2019-03-24 09:10:31 +09:00
Paul Masurel	a8cc5208f1	Linear simd (#519) * linear simd search within block	2019-03-20 22:10:05 +09:00
Paul Masurel	663dd89c05	Feature/reader (#517) Adding IndexReader to the API. Making it possible to watch for changes. * Closes #500	2019-03-20 08:39:22 +09:00
barrotsteindev	a934577168	WIP: date field (#487) * initial version, still a work in progress * remove redundant or * add chrono::DateTime and index i64 * add more tests * fix tests * pass DateTime by ptr * remove println! * document query_parser rfc 3339 date support * added some more docs about implementation to schema.rs * enforce DateTime is UTC, and re-export chrono * added DateField to changelog * fixed conflict * use INDEXED instead of INT_INDEXED for date fields	2019-03-15 22:10:37 +09:00
Paul Masurel	94f1885334	Issue/513 (#514) * Closes #513 * Clean up and doc * Updated changelog	2019-03-07 09:39:30 +09:00
Jason Goldberger	788b3803d9	updated changelog (#501) * updated changelog * Update CHANGELOG.md * Update CHANGELOG.md	2019-02-19 00:25:18 +09:00
Paul Masurel	515adff644	Merge branch 'hotfix/0.8.2'	2019-02-15 08:30:27 +09:00
Paul Masurel	e70a45426a	0.8.2 release Backporting a fix for non x86_64 platforms	2019-02-14 09:16:27 +09:00
Paul Masurel	96eaa5bc63	Positions	2019-02-05 14:50:16 +01:00
Paul Masurel	2fb219d017	Changelog	2019-01-24 09:12:07 +09:00
Paul Masurel	286bb75a0c	Updated changelog	2019-01-24 09:03:58 +09:00
Paul Masurel	c0cc6aac83	Updated changelog	2019-01-23 22:22:34 +09:00
Paul Masurel	7df3260a15	Version bump	2019-01-23 10:13:18 +09:00
Paul Masurel	b8241c5603	0.8.0	2018-12-26 10:18:34 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying API * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit tests * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
Paul Masurel	14908479d5	Release 0.7.1	2018-11-02 17:56:25 +09:00
Paul Masurel	21a9940726	Update Changelog with #388 (#418)	2018-09-14 09:31:11 +09:00
Paul Masurel	cc23194c58	Editing document	2018-09-11 20:15:38 +09:00
Paul Masurel	2649c8a715	Issue/246 (#393) * Moving Range and All to Leaves * Parsing OR/AND * Simplify user input ast * AND and OR supported. Returning an error when mixing syntax Closes #246 * Added support for NOT * Updated changelog	2018-08-28 11:03:54 +09:00
Dru Sellers	ef3a16a129	Switch from error-chain to failure crate (#376) * Switch from error-chain to failure crate * Added deprecated alias for * Started editing the changelog	2018-08-20 09:40:45 +09:00
Paul Masurel	31655e92d7	Preparing release 0.6.1	2018-07-10 09:12:26 +09:00


171 Commits