tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2025-12-27 20:42:54 +00:00

Author	SHA1	Message	Date
PSeitz	c3b92a5412	fix compiler warning, cleanup (#2393 ) fix compiler warning for missing feature flag remove unused variables cleanup unused methods	2024-06-11 16:03:50 +08:00
PSeitz	2f55511064	extend indexwriter proptests (#2342 ) * index random values in proptest * add proptest with multiple docs	2024-06-11 16:02:57 +08:00
PSeitz	714f363d43	add bench & test for columnar merging (#2428 ) * add merge columnar proptest * add columnar merge benchmark	2024-06-10 16:26:16 +08:00
PSeitz	93ff7365b0	reduce top hits aggregation memory consumption (#2426 ) move request structure out of top hits aggregation collector and use from the passed structure instead full terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 425.9680ms (-21.38%) Median: 415.1097ms (-23.56%) [395.5303ms .. 484.6325ms] dense terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 440.0817ms (-19.68%) Median: 432.2286ms (-21.10%) [403.5632ms .. 497.7541ms] sparse terms_many_with_top_hits Memory: 13.1 MB (-49.31%) Avg: 33.3568ms (-32.19%) Median: 33.0834ms (-31.86%) [32.5126ms .. 35.7397ms] multivalue terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 414.2340ms (-25.44%) Median: 413.4144ms (-25.64%) [403.9919ms .. 430.3170ms]	2024-06-06 22:32:58 +08:00
Adam Reichold	8151925068	Panicking in spawned Rayon tasks will abort the process by default. (#2409 )	2024-06-04 17:04:30 +09:00
giovannicuccu	1095c9b073	Issue 1787 extended stats (#2247 ) * first version of extended stats along with its tests * using IntermediateExtendStats instead of IntermediateStats with all tests passing * Created struct for request and response * first test with extended_stats * kahan summation and tests with approximate equality * version ready for merge * removed approx dependency * refactor for using ExtendedStats only when needed * interim version * refined version with code formatted * refactored a struct * cosmetic refactor * fix after merge * fix format * added extended_stat bench * merge and new benchmark for extended stats * split stat segment collectors * wrapped intermediate extended stat with a box to limit memory usage * Revert "wrapped intermediate extended stat with a box to limit memory usage" This reverts commit `5b4aa9f393`. * some code reformat, commented kahan summation * refactor after review * refactor after code review * fix after incorrectly restoring kahan summation * modifications for code review + bug fix in merge_fruit * refactor assert_nearly_equals macro * update after code review --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2024-06-04 14:25:17 +08:00
Hamir Mahal	0c634adbe1	style: simplify strings with string interpolation (#2412 ) * style: simplify strings with string interpolation * fix: formatting	2024-05-27 09:16:47 +02:00
PSeitz	2e3641c2ae	return CompactDocValue instead of trait (#2410 ) The CompactDocValue is easier to handle than the trait in some cases like comparison and conversion	2024-05-27 07:33:50 +02:00
Paul Masurel	b806122c81	Fixing flaky test (#2407 )	2024-05-22 10:10:55 +09:00
PSeitz	e1679f3fb9	compact doc (#2402 ) * compact doc * add any value type * pass references when building CompactDoc * remove OwnedValue from API * clippy * clippy * fail on large documents * fmt * cleanup * cleanup * implement Value for different types fix serde_json date Value implementation * fmt * cleanup * fmt * cleanup * store positions instead of pos+len * remove nodes array * remove mediumvec * cleanup * infallible serialize into vec * remove positions indirection * remove 24MB limitation in document use u32 for Addr Remove the 3 byte addressing limitation and use VInt instead * cleanup * extend test * cleanup, add comments * rename, remove pub	2024-05-21 10:16:08 +02:00
PSeitz	5b7cca13e5	lower contention on AggregationLimits (#2394 ) PR https://github.com/quickwit-oss/quickwit/pull/4962 fixes an issue where the AggregationLimits are not passed correctly. Since the AggregationLimits are shared properly we run into contention issues. This PR includes some straightforward improvement to reduce contention, by only calling if the memory changed and avoiding the second read. We probably need some sharding with multiple counters or local caching before updating the global after some threshold.	2024-05-15 12:25:40 +02:00
Paul Masurel	6181c1eb5e	Small changes in the Executor API. (#2391 ) Warning, this change is mildly not backward compatible so I bumped tantivy's version.	2024-05-10 17:19:12 +09:00
Adam Reichold	1ee5f90761	Give allocation control to the caller instead of force a clone (#2389 ) Achieved by moving the boxes out of the temporary reference wrappers which are cloneable themselves, i.e. if required the caller can clone them already or consume them to reuse existing allocations.	2024-05-09 16:01:13 +09:00
PSeitz	71f3b4e4e3	fix ReferenceValue API flaw (#2372 ) * fix ReferenceValue API flaw Remove `Facet` and `TokenizedString` values from the `ReferenceValue` API, as this requires the trait value to have them stored somewhere. Since `TokenizedString` is quite niche, I just copy it into a Box, instead of designing a reference API around it. * fix comment link	2024-05-09 06:14:42 +02:00
trinity-1686a	8cd7ddc535	run block decompression from executor (#2386 ) * run block decompression from executor * add a wrapper with is_closed to oneshot channel * add cancelation test to Executor::spawn_blocking	2024-05-08 12:22:44 +02:00
Paul Masurel	2b76335a95	Removed usage of num_cpus (#2387 ) * Removed usage of num_cpus * handling error	2024-05-08 13:32:52 +09:00
PSeitz	c6b213d8f0	use bingang for agg benchmark (#2378 ) * use bingang for agg benchmark use bingang for agg benchmark, which includes memory consumption Output: ``` full histogram Memory: 15.8 KB Avg: 10.9322ms (+5.44%) Median: 10.8790ms (+9.28%) Min: 10.7470ms Max: 11.3263ms histogram_hard_bounds Memory: 15.5 KB Avg: 5.1939ms (+6.61%) Median: 5.1722ms (+10.98%) Min: 5.0432ms Max: 5.3910ms histogram_with_avg_sub_agg Memory: 48.7 KB Avg: 23.8165ms (+4.57%) Median: 23.7264ms (+10.06%) Min: 23.4995ms Max: 24.8107ms dense histogram Memory: 17.3 KB Avg: 15.6810ms (-8.54%) Median: 15.6174ms (-8.89%) Min: 15.4953ms Max: 16.0702ms histogram_hard_bounds Memory: 15.4 KB Avg: 10.0720ms (-7.33%) Median: 10.0572ms (-7.06%) Min: 9.8500ms Max: 10.4819ms histogram_with_avg_sub_agg Memory: 50.1 KB Avg: 33.0993ms (-7.04%) Median: 32.9499ms (-6.86%) Min: 32.8284ms Max: 34.0529ms sparse histogram Memory: 16.3 KB Avg: 19.2325ms (-0.44%) Median: 19.1211ms (-1.26%) Min: 19.0348ms Max: 19.7902ms histogram_hard_bounds Memory: 16.1 KB Avg: 18.5179ms (-0.61%) Median: 18.4552ms (-0.90%) Min: 18.3799ms Max: 19.0535ms histogram_with_avg_sub_agg Memory: 34.7 KB Avg: 21.2589ms (-0.69%) Median: 21.1867ms (-1.05%) Min: 21.0342ms Max: 21.9900ms ``` * add more bench with term as sub agg	2024-05-07 11:29:49 +02:00
PSeitz	eea70030bf	cleanup top level exports (#2382 ) remove some top level exports	2024-05-07 09:59:41 +02:00
PSeitz	92b5526310	allow more JSON values, fix i64 special case (#2383 ) This changes three things: - Reuse positions_per_path hashmap instead of allocating one per indexed JSON value - Try to cast u64 values to i64 to streamline with search behaviour - Allow top level json values to be of any type, instead of limiting it to JSON objects. Remove special JSON object handling method. TODO: We probably should also try to check f64 to i64 and u64 when indexing, as values may get converted to f64 by the JSON parser	2024-05-01 12:08:12 +02:00
PSeitz	99a59ad37e	remove zero byte check (#2379 ) remove zero byte checks in columnar. zero bytes are converted during serialization now. unify code paths extend test for expected column names	2024-04-26 06:03:28 +02:00
trinity-1686a	6a66a71cbb	modify fastfield range query heuristic (#2375 )	2024-04-25 10:06:11 +02:00
PSeitz	ff40764204	make convert_to_fast_value_and_append_to_json_term pub (#2370 ) * make convert_to_fast_value_and_append_to_json_term pub * clippy	2024-04-23 04:05:41 +02:00
PSeitz	047da20b5b	add json path constructor to term (#2367 )	2024-04-22 12:23:35 +02:00
PSeitz	4f8493d2de	improve document docs (#2359 )	2024-04-22 12:05:16 +02:00
Paul Masurel	8861366137	Owned value relying on Vec instead of BTreeMap (#2364 ) * Owned value relying on Vec instead of BTreeMap * fmt * fix build * fix serialization --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2024-04-22 09:38:05 +02:00
PSeitz	0e9fced336	remove JsonTermWriter (#2238 ) * remove JsonTermWriter remove JsonTermWriter remove path truncation logic, add assertion * fix json_path_writer add sep logic	2024-04-18 16:28:05 +02:00
PSeitz	b257b960b3	validate sort by field type (#2336 ) * validate sort by field type * Update src/index/index.rs Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com> --------- Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com>	2024-04-16 04:42:24 +02:00
Adam Reichold	4708171a32	Fix some of the things current Clippy complains about (#2363 )	2024-04-16 04:27:06 +02:00
Adam Reichold	b493743f8d	Fix trait bound of StoreReader::iter (#2360 ) * Fix trait bound of StoreReader::iter Similar to `StoreReader::get`, `StoreReader::iter` should only require `DocumentDeserialize` and not `Document`. * Mark the iterator returned by SegmentReader::doc_ids_alive as Send so it can be used in impls of Stream/AsyncIterator.	2024-04-15 15:50:02 +02:00
PSeitz	dfa3aed32d	check unsupported parameters top_hits (#2351 ) * check unsupported parameters top_hits * move to function	2024-04-10 08:20:52 +02:00
PSeitz	398817ce7b	add index sorting deprecation warning (#2353 ) * add index sorting deprecation warning * remove deprecated IntOptions and DatePrecision	2024-04-10 08:09:09 +02:00
PSeitz	74940e9345	clippy (#2349 ) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	92c32979d2	fix postcard compatibility for top_hits, add postcard test (#2346 ) * fix postcard compatibility for top_hits, add postcard test * fix top_hits naming, delay data fetch closes #2347 * fix import	2024-04-09 06:17:25 +02:00
PSeitz	b644d78a32	fix null byte handling in JSON paths (#2345 ) * fix null byte handling in JSON paths closes https://github.com/quickwit-oss/tantivy/issues/2193 closes https://github.com/quickwit-oss/tantivy/issues/2340 * avoid repeated term truncation * fix test * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add comment --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2024-04-05 09:53:35 +02:00
PSeitz	4e79e11007	add collect_block to BoxableSegmentCollector (#2331 )	2024-03-21 09:10:25 +01:00
PSeitz	67ebba3c3c	expose collect_block buffer size (#2326 ) * expose buffer of collect_block * flip shard_size segment_size	2024-03-15 08:02:08 +01:00
PSeitz	7ce950f141	add method to fetch block of first vals in columnar (#2330 ) * add method to fetch block of first vals in columnar add method to fetch block of first vals in columnar (this is way faster than single calls for full columns) add benchmark fix import warnings ``` test bench_get_block_first_on_full_column ... bench: 56 ns/iter (+/- 26) test bench_get_block_first_on_full_column_single_calls ... bench: 311 ns/iter (+/- 6) test bench_get_block_first_on_multi_column ... bench: 378 ns/iter (+/- 15) test bench_get_block_first_on_multi_column_single_calls ... bench: 546 ns/iter (+/- 13) test bench_get_block_first_on_optional_column ... bench: 291 ns/iter (+/- 6) test bench_get_block_first_on_optional_column_single_calls ... bench: 362 ns/iter (+/- 8) ``` * use remainder	2024-03-15 08:01:47 +01:00
PSeitz	b0e65560a1	handle ip addresses in term aggregation (#2319 ) * handle ip addresses in term aggregation Stores IpAddresses during the segment term aggregation via u64 representation and convert to u128(IpV6Address) via downcast when converting to intermediate results. Enable Downcasting on `ColumnValues` Expose u64 variant for u128 encoded data via `open_u64_lenient` method. Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming from downcast trait. * rename method	2024-03-14 09:41:18 +01:00
PSeitz	ec37295b2f	add fast path for full columns in fetch_block (#2328 ) Spotted in `range_date_histogram` query in quickwit benchmark: 5% of time copying docs around, which is not needed in the full index case remove Column to ColumnIndex deref	2024-03-14 04:07:11 +01:00
PSeitz	7e41d31c6e	agg: support to deserialize f64 from string (#2311 ) * agg: support to deserialize f64 from string * remove visit_string * disallow NaN	2024-03-05 05:49:41 +01:00
Adam Reichold	40aa4abfe5	Make FacetCounts defaultable and cloneable. (#2322 )	2024-03-05 04:11:11 +01:00
Paul Masurel	6739357314	Removing split_size and adding split_size and shard_size as segment_size (#2320 ) aliases.	2024-02-26 11:35:22 +01:00
PSeitz	d57622d54b	support bool type in term aggregation (#2318 ) * support bool type in term aggregation * add Bool to Intermediate Key	2024-02-20 03:22:22 +01:00
PSeitz	f745dbc054	fix Clone for TopNComputer, add top_hits bench (#2315 ) * fix Clone for TopNComputer, add top_hits bench add top_hits agg bench test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg ... bench: 123,475,175 ns/iter (+/- 30,608,889) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_multi ... bench: 194,170,414 ns/iter (+/- 36,495,516) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_opt ... bench: 179,742,809 ns/iter (+/- 29,976,507) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_sparse ... bench: 27,592,534 ns/iter (+/- 2,672,370) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg ... bench: 552,851,227 ns/iter (+/- 71,975,886) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_multi ... bench: 558,616,384 ns/iter (+/- 100,890,124) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_opt ... bench: 554,031,368 ns/iter (+/- 165,452,650) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_sparse ... bench: 46,435,919 ns/iter (+/- 13,681,935) * add comment	2024-02-20 03:22:00 +01:00
PSeitz	79b041f81f	clippy (#2314 )	2024-02-13 05:56:31 +01:00
PSeitz	0e16ed9ef7	Fix serde for TopNComputer (#2313 ) * Fix serde for TopNComputer The top hits aggregation changed the TopNComputer to be serializable, but capacity needs to be carried over, as it contains logic which is checked against when pushing elements (capacity == 0 is not allowed). * use serde from deser * remove pub, clippy	2024-02-07 12:52:06 +01:00
mochi	88a3275dbb	add shared search executor (#2312 )	2024-02-05 09:33:00 +01:00
PSeitz	48630ceec9	move into new index module (#2259 ) move core modules to index module	2024-01-31 10:30:04 +01:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277 )	2024-01-31 02:47:06 +01:00
trinity-1686a	3c9297dd64	report if posting list was actually loaded when warming it up (#2309 )	2024-01-29 15:23:16 +01:00

2410 Commits