tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-06 17:22:54 +00:00

Author	SHA1	Message	Date
PSeitz	714f363d43	add bench & test for columnar merging (#2428) * add merge columnar proptest * add columnar merge benchmark	2024-06-10 16:26:16 +08:00
PSeitz	2e3641c2ae	return CompactDocValue instead of trait (#2410) The CompactDocValue is easier to handle than the trait in some cases like comparison and conversion	2024-05-27 07:33:50 +02:00
PSeitz	e1679f3fb9	compact doc (#2402) * compact doc * add any value type * pass references when building CompactDoc * remove OwnedValue from API * clippy * clippy * fail on large documents * fmt * cleanup * cleanup * implement Value for different types fix serde_json date Value implementation * fmt * cleanup * fmt * cleanup * store positions instead of pos+len * remove nodes array * remove mediumvec * cleanup * infallible serialize into vec * remove positions indirection * remove 24MB limitation in document use u32 for Addr Remove the 3 byte addressing limitation and use VInt instead * cleanup * extend test * cleanup, add comments * rename, remove pub	2024-05-21 10:16:08 +02:00
PSeitz	eea70030bf	cleanup top level exports (#2382) remove some top level exports	2024-05-07 09:59:41 +02:00
PSeitz	74940e9345	clippy (#2349) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	48630ceec9	move into new index module (#2259) move core modules to index module	2024-01-31 10:30:04 +01:00
PSeitz	9c75942aaf	fix merge panic for JSON fields (#2284) Root cause was the positions buffer had residue positions from the previous term, when the terms were alternating between having and not having positions in JSON (terms have positions, but not numerics). Fixes #2283	2023-12-21 11:05:34 +01:00
Paul Masurel	6b59ec6fd5	Fix bug occurring when merging JSON object indexed with positions. In a JSON object field, the presence of term frequencies depends on the field. Typically, a string with positions indexed will have positions while numbers won't. The presence or absence of term freqs for a given term is unfortunately encoded in a very passive way. It is given by the presence of extra information in the skip info, or the lack of term freqs after decoding vint blocks. Before, after writing a segment, we would encode the segment correctly (without any term freq for number in json object field). However during merge, we would get the default term freq=1 value. (this is default in the absence of encoded term freqs) The merger would then proceed and attempt to decode 1 position when there are in fact none. This PR requires explicitly telling the postings serializer whether term frequencies should be serialized for each new term. Closes #2251	2023-11-14 22:41:48 +09:00
PSeitz	03a1f40767	rename DocValue to Value (#2197) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133)	2023-07-24 17:07:57 +09:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
Paul Masurel	06850719dc	Renaming .values(DocId) to .values_for_doc(DocId) (#1906)	2023-02-27 12:15:13 +09:00
Paul Masurel	f537334e4f	Adding a write schema to columnar's merge operations. (#1884) * Adding a write schema to columnar's merge operations. * Added unit test checking min/max when columns are empty. * CR comment * Rename to value_type_to_column_type	2023-02-21 18:25:16 +09:00
Alex Cole	f2f38c43ce	Make BM25 scoring more flexible (#1855) * Introduce Bm25StatisticsProvider to inject statistics * fix formatting I accidentally changed	2023-02-16 19:14:12 +09:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872)	2023-02-14 23:12:45 +09:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	f687b3a5aa	start migrate Field to &str (#1772) start migrate Field to &str in preparation of columnar return Result for get_field	2023-01-18 16:12:07 +09:00
PSeitz	07a51eb7c8	refactor multivalue fastfield, refactor range query (#1749) Introduce MakeZero trait, remove make_zero from FastValue Merge two multivalue fastfield implementations into one prepare range query on fastfield for different types	2023-01-05 12:09:50 +01:00
PSeitz	f9171a3981	fix clippy (#1725) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Paul Masurel	3edf0a2724	Using the manual reload policy in IndexWriter. (#1667)	2022-11-09 11:20:41 +01:00
PSeitz	3e9c806890	Merge pull request #1665 from quickwit-oss/fix_num_vals fix num_vals on u128 value index after merge	2022-11-07 21:46:02 +08:00
Pascal Seitz	c69a873dd3	fix num_vals on value index after merge	2022-11-07 21:05:21 +08:00
Pascal Seitz	38ad46e580	fix clippy	2022-11-07 16:09:55 +08:00
Pascal Seitz	83325d8f3f	move multivalue index to own file start_doc parameter in positions to docids	2022-11-01 10:36:13 +08:00
Pascal Seitz	e772d3170d	switch get_val() to u32 Fixes #1638	2022-10-24 19:05:57 +08:00
Pascal Seitz	791350091c	switch num_vals() to u32 fixes #1630	2022-10-20 19:44:28 +08:00
Pascal Seitz	2864bf7123	use serializer for u128	2022-10-07 16:25:01 +08:00
Pascal Seitz	0b86658389	rename ip addr, use buffer	2022-10-07 16:25:01 +08:00
Pascal Seitz	5d6602a8d9	mark null handling TODO	2022-10-07 16:25:01 +08:00
Pascal Seitz	eeb1f19093	rename to iter_gen	2022-10-07 16:25:01 +08:00
Pascal Seitz	309449dba3	rename to IpAddr	2022-10-07 16:25:01 +08:00
Pascal Seitz	c8713a01ed	use iter api	2022-10-07 16:25:01 +08:00
Pascal Seitz	400a20b7af	add ip field add u128 multivalue reader and writer add ip to schema add ip writers, handle merge	2022-10-07 16:25:01 +08:00
Pascal Seitz	d742275048	renames	2022-10-05 19:16:49 +08:00
Pascal Seitz	8b42c4c126	disable linear codec for multivalue value index don't materialize index column on merge use simpler chain() variant	2022-10-05 19:09:17 +08:00
Pascal Seitz	6d9a123cf2	remove get_val in serialization remove get_val in serialization and mark as unimplemented!() replace get_val with iter in linear codec remove MultivalueStartIndexRandomSeeker replace MultivalueStartIndexIter with closure Sample 100 values in linear codec	2022-10-04 12:01:25 +08:00
Paul Masurel	1998111521	Minor refactoring fast fields (#1537)	2022-09-21 12:46:11 +09:00
Paul Masurel	64f08a1a5c	Hiding useless symbols and removing code. (#1522)	2022-09-16 14:42:27 +09:00
Pascal Seitz	edd9155b88	return `Write`, add documentation	2022-09-08 12:41:55 +08:00
Pascal Seitz	29d56111de	refactor, fix api refactor fix clippy fix docs remove unused code fix bytesfield index api flaw	2022-09-07 18:43:04 +08:00
PSeitz	54696da771	Merge pull request #1505 from quickwit-oss/refact-fast-field Refact fast field	2022-09-07 02:07:42 -07:00
Paul Masurel	c632fc014e	Refactoring fast fields codecs. This removes the GCD part as a codec, and makes it so that fastfield codecs all share the same normalization part (shift + gcd).	2022-09-05 23:07:12 +09:00
Pascal Seitz	f6f23ba684	optionally create segment on merge create a new segment only if it contains data fixes #1189	2022-09-05 15:07:03 +08:00
Paul Masurel	8e775b6c3d	Refactoring dyn Column (#1502)	2022-09-02 17:26:30 +09:00
Pascal Seitz	54972caa7c	remove Column impl on Vec remove Column impl on Vec to avoid function shadowing	2022-08-29 11:57:41 +02:00
Paul Masurel	5331be800b	Introducing a column trait	2022-08-28 14:14:27 +02:00
Paul Masurel	298b5dd726	GCD wrapper uses DividerU64 (#1478)	2022-08-25 02:29:13 +09:00
Pascal Seitz	00ebff3c16	move fastfield stats to trait	2022-08-24 15:29:55 +02:00


281 Commits