Adam Reichold
19a773da47
Allow cheaply cloning a StoreReader to enable user control over block cache usage.
2023-12-11 10:28:50 +01:00
PSeitz
bff7c58497
improve indexing benchmark ( #2275 )
2023-12-11 09:04:42 +01:00
trinity-1686a
9ebc5ed053
use fst for sstable index ( #2268 )
...
* read path for new fst based index
* implement BlockAddrStoreWriter
* extract slop/derivation computation
* use better linear approximator and allow negative correction to approximator
* document format and reorder some fields
* optimize single block sstable size
* plug backward compat
2023-12-04 15:13:15 +01:00
PSeitz
0b56c88e69
Revert "Preparing for 0.21.2 release." ( #2258 )
...
* Revert "Preparing for 0.21.2 release. (#2256 )"
This reverts commit 9caab45136 .
* bump version to 0.21.1
* set version to 0.22.0-dev
2023-12-01 13:46:12 +01:00
PSeitz
24841f0b2a
update bitpacker dep ( #2269 )
2023-12-01 13:45:52 +01:00
PSeitz
1a9fc10be9
add fields_metadata to SegmentReader, add columnar docs ( #2222 )
...
* add fields_metadata to SegmentReader, add columnar docs
* use schema to resolve field, add test
* normalize paths
* merge for FieldsMetadata, add fields_metadata on Index
* Update src/core/segment_reader.rs
Co-authored-by: Paul Masurel <paul@quickwit.io >
* merge code paths
* add Hash
* move function outside
---------
Co-authored-by: Paul Masurel <paul@quickwit.io >
2023-11-22 12:29:53 +01:00
PSeitz
07573a7f19
update fst ( #2267 )
...
update fst to 0.5 (deduplicates regex-syntax in the dep tree)
deps cleanup
2023-11-21 16:06:57 +01:00
BlackHoleFox
daad2dc151
Take string references instead of owned values building Facet paths ( #2265 )
2023-11-20 09:40:44 +01:00
PSeitz
054f49dc31
support escaped dot, add agg test ( #2250 )
...
add agg test for nested JSON
allow escaping of dot
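A minimal sketch of what escaping a dot in a JSON path could look like, assuming a backslash is the escape character and that an unescaped dot separates path segments. The function name `split_json_path` is hypothetical and not taken from the codebase.

```rust
/// Split a JSON path on unescaped dots.
/// Assumption: `\.` denotes a literal dot inside a segment.
fn split_json_path(path: &str) -> Vec<String> {
    let mut segments = Vec::new();
    let mut current = String::new();
    let mut chars = path.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // A backslash followed by a dot yields a literal dot.
            '\\' if chars.peek() == Some(&'.') => {
                chars.next();
                current.push('.');
            }
            // An unescaped dot closes the current segment.
            '.' => segments.push(std::mem::take(&mut current)),
            other => current.push(other),
        }
    }
    segments.push(current);
    segments
}
```

Under these assumptions, the path `k8s.node\.name` splits into the two segments `k8s` and `node.name`.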
2023-11-20 03:00:57 +01:00
PSeitz
47009ed2d3
remove unused deps ( #2264 )
...
found with cargo machete
remove pprof (doesn't work)
2023-11-20 02:59:59 +01:00
PSeitz
0aae31d7d7
reduce number of allocations ( #2257 )
...
* reduce number of allocations
Explanation makes up around 50% of all allocations (by count, not by performance impact).
It's created during serialization but not called.
- Make Explanation optional in BM25
- Avoid allocations when using Explanation
* use Cow
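The two ideas above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the types `Explanation` and `Bm25Weight`, their fields, and `compute_idf` are simplified, hypothetical stand-ins. The explanation is only built when requested, and `Cow<'static, str>` lets static descriptions avoid a heap allocation.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified explanation node.
struct Explanation {
    /// Borrowed for static text, owned only when the text is built at runtime.
    description: Cow<'static, str>,
    value: f32,
}

impl Explanation {
    fn new(description: impl Into<Cow<'static, str>>, value: f32) -> Self {
        Explanation { description: description.into(), value }
    }
}

/// Hypothetical BM25 weight that carries an explanation only on request.
struct Bm25Weight {
    idf: f32,
    idf_explain: Option<Explanation>,
}

/// Common BM25 idf form: ln(1 + (N - n + 0.5) / (n + 0.5)).
fn compute_idf(doc_freq: u64, doc_count: u64) -> f32 {
    let n = doc_freq as f32;
    let total = doc_count as f32;
    (1.0 + (total - n + 0.5) / (n + 0.5)).ln()
}

impl Bm25Weight {
    fn new(doc_freq: u64, doc_count: u64, explain: bool) -> Self {
        let idf = compute_idf(doc_freq, doc_count);
        // The explanation is skipped entirely unless the caller asks for it.
        let idf_explain = explain.then(|| Explanation::new("idf", idf));
        Bm25Weight { idf, idf_explain }
    }
}
```

Passing a `&'static str` produces `Cow::Borrowed`, while passing a `String` (for example from `format!`) produces `Cow::Owned`, so only dynamically built descriptions allocate.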
2023-11-16 13:47:36 +01:00
Paul Masurel
9caab45136
Preparing for 0.21.2 release. ( #2256 )
2023-11-15 10:43:36 +09:00
Chris Tam
6d9a7b7eb0
Derive Debug for SchemaBuilder ( #2254 )
2023-11-15 01:03:44 +01:00
dependabot[bot]
7a2c5804b1
Update itertools requirement from 0.11.0 to 0.12.0 ( #2255 )
...
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools ) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md )
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.11.0...v0.12.0 )
---
updated-dependencies:
- dependency-name: itertools
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-11-15 01:03:08 +01:00
François Massot
5319977171
Merge pull request #2253 from quickwit-oss/issue/2251-bug-merge-json-object-with-number
...
Fix bug occurring when merging JSON object indexed with positions.
2023-11-14 17:28:29 +01:00
trinity-1686a
828632e8c4
rustfmt
2023-11-14 15:05:16 +01:00
Paul Masurel
6b59ec6fd5
Fix bug occurring when merging JSON object indexed with positions.
...
In a JSON object field, the presence of term frequencies depends on the
field.
Typically, a string with positions indexed will have positions
while numbers won't.
The presence or absence of term freqs for a given term is unfortunately
encoded in a very passive way.
It is given by the presence of extra information in the skip info, or
the lack of term freqs after decoding vint blocks.
Before, after writing a segment, we would encode the segment correctly
(without any term freq for number in json object field).
However during merge, we would get the default term freq=1 value.
(this is default in the absence of encoded term freqs)
The merger would then proceed and attempt to decode 1 position when
there are in fact none.
This PR requires explicitly telling the posting serializer whether
term frequencies should be serialized for each new term.
Closes #2251
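A toy model of the fix described above, under loudly stated assumptions: `Posting`, `PostingsSerializer`, `new_term`, and `write_doc` are hypothetical simplifications, not the real types or signatures. The point it illustrates is that the decoded term frequency cannot be trusted on its own (it falls back to 1), so the caller states per term whether term frequencies are recorded.

```rust
/// Hypothetical decoded posting. `term_freq` falls back to 1 when
/// no term frequency was encoded for the term.
struct Posting {
    doc: u32,
    term_freq: u32,
}

/// Hypothetical serializer state for the term currently being written.
struct PostingsSerializer {
    record_term_freq: bool,
    positions_expected: u32,
    docs: Vec<u32>,
}

impl PostingsSerializer {
    /// The caller says explicitly whether this term carries term frequencies.
    fn new_term(record_term_freq: bool) -> Self {
        PostingsSerializer { record_term_freq, positions_expected: 0, docs: Vec::new() }
    }

    fn write_doc(&mut self, posting: &Posting) {
        self.docs.push(posting.doc);
        // Positions are only expected when term frequencies are recorded,
        // so a default term_freq of 1 no longer triggers a position read.
        if self.record_term_freq {
            self.positions_expected += posting.term_freq;
        }
    }
}
```

In this model, a numeric term written with `new_term(false)` expects zero positions even though every decoded posting reports the default `term_freq` of 1.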
2023-11-14 22:41:48 +09:00
PSeitz
b60d862150
docid deltas while indexing ( #2249 )
...
* docid deltas while indexing
storing deltas is especially helpful for repetitive data like logs.
In those cases, recording a doc on a term previously cost 4 bytes; it
now costs 1 byte.
HDFS Indexing 1.1GB Total memory consumption:
Before: 760 MB
Now: 590 MB
* use scan for delta decoding
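A minimal sketch of delta encoding doc ids and decoding them with `Iterator::scan`. The function names are hypothetical, and the byte counts assume the deltas are stored as variable-length integers with 7 payload bits per byte, which the commit message does not spell out.

```rust
/// Delta-encode an ascending list of doc ids.
/// The first id is kept as-is; each later entry stores the gap to its predecessor.
fn encode_deltas(doc_ids: &[u32]) -> Vec<u32> {
    let mut prev = 0u32;
    doc_ids
        .iter()
        .map(|&doc| {
            let delta = doc - prev;
            prev = doc;
            delta
        })
        .collect()
}

/// Decode with `scan`, which carries the running sum as its state.
fn decode_deltas(deltas: &[u32]) -> Vec<u32> {
    deltas
        .iter()
        .scan(0u32, |acc, &delta| {
            *acc += delta;
            Some(*acc)
        })
        .collect()
}

/// Bytes needed for a value in a 7-bits-per-byte variable-length encoding
/// (an assumption about the storage format, used only for illustration).
fn vint_len(mut value: u32) -> usize {
    let mut len = 1;
    while value >= 0x80 {
        value >>= 7;
        len += 1;
    }
    len
}
```

With repetitive data such as logs, a term tends to occur in many nearby documents, so most deltas are small and fit in a single byte under this encoding, while a fixed-width `u32` doc id always takes 4 bytes.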
2023-11-13 05:14:27 +01:00
PSeitz
4837c7811a
add missing inlines ( #2245 )
2023-11-10 08:00:42 +01:00
PSeitz
5a2397d57e
add sstable ord_to_term benchmark ( #2242 )
2023-11-10 07:27:48 +01:00
PSeitz
927b4432c9
Perf: use term hashmap in fastfield ( #2243 )
...
* add shared arena hashmap
* bench fastfield indexing
* use shared arena hashmap in columnar
lower minimum resize in hashtable
* clippy
* add comments
2023-11-09 13:44:02 +01:00
trinity-1686a
7a0064db1f
bump index version ( #2237 )
...
* bump index version
and add constant for lowest supported version
* use range instead of handcoded bounds
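The range-based check can be sketched like this. The constant names and version numbers below are placeholders chosen for illustration, not the values used in the codebase.

```rust
/// Placeholder version numbers, chosen only for this example.
const INDEX_FORMAT_VERSION: u32 = 6;
const INDEX_FORMAT_OLDEST_SUPPORTED_VERSION: u32 = 4;

/// Check a version against an inclusive range instead of two handcoded comparisons.
fn is_supported(version: u32) -> bool {
    (INDEX_FORMAT_OLDEST_SUPPORTED_VERSION..=INDEX_FORMAT_VERSION).contains(&version)
}
```

Keeping both bounds as named constants means a later version bump touches one place, and the range makes the inclusive upper bound explicit.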
2023-11-06 19:02:37 +01:00
PSeitz
2e7327205d
fix coverage run ( #2232 )
...
coverage run uses the compare_hash_only feature which is not compatible
with the test_hashmap_size test
2023-11-06 11:18:38 +00:00
Paul Masurel
7bc5bf78e2
Fixing functional tests. ( #2239 )
2023-11-05 18:18:39 +09:00
giovannicuccu
ef603c8c7e
rename ReloadPolicy onCommit to onCommitWithDelay ( #2235 )
...
* rename ReloadPolicy onCommit to onCommitWithDelay
* fix format issues
---------
Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it >
2023-11-03 12:22:10 +01:00
PSeitz
28dd6b6546
collect json paths in indexing ( #2231 )
...
* collect json paths in indexing
* remove unsafe iter_mut_keys
2023-11-01 11:25:17 +01:00
trinity-1686a
1dda2bb537
handle * inside term in query parser ( #2228 )
2023-10-27 08:57:02 +02:00
PSeitz
bf6544cf28
fix mmap::Advice reexport ( #2230 )
2023-10-27 14:09:25 +09:00
PSeitz
ccecf946f7
tantivy 0.21.1 ( #2227 )
2023-10-27 05:01:44 +02:00
PSeitz
19a859d6fd
term hashmap remove copy in is_empty, unused unordered_id ( #2229 )
2023-10-27 05:01:32 +02:00
PSeitz
83af14caa4
Fix range query ( #2226 )
...
Fix range query end check in advance
Rename vars to reduce ambiguity
add tests
Fixes #2225
2023-10-25 09:17:31 +02:00
PSeitz
4feeb2323d
fix clippy ( #2223 )
2023-10-24 10:05:22 +02:00
PSeitz
07bf66a197
json path writer ( #2224 )
...
* refactor logic to JsonPathWriter
* use in encode_column_name
* add inlines
* move unsafe block
2023-10-24 09:45:50 +02:00
trinity-1686a
0d4589219b
encode some part of posting list as -1 instead of direct values ( #2185 )
...
* add support for delta-1 encoding posting list
* encode term frequency minus one
* don't emit tf for json integer terms
* make skipreader not pub(crate) mutable
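A rough sketch of the "minus one" idea from the first two bullets, assuming doc ids within a posting list are strictly increasing (so every gap is at least 1) and every stored term frequency is at least 1. The function names are hypothetical, and storing the first doc id unchanged is an assumption made for this example.

```rust
/// Encode strictly increasing doc ids as "gap minus one".
/// Assumption: the first doc id is stored unchanged.
fn encode_doc_ids_minus_one(doc_ids: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(doc_ids.len());
    let mut prev: Option<u32> = None;
    for &doc in doc_ids {
        let value = match prev {
            None => doc,
            // Gaps are >= 1, so subtracting one never underflows.
            Some(p) => doc - p - 1,
        };
        out.push(value);
        prev = Some(doc);
    }
    out
}

/// Inverse of `encode_doc_ids_minus_one`.
fn decode_doc_ids_minus_one(encoded: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(encoded.len());
    let mut prev: Option<u32> = None;
    for &value in encoded {
        let doc = match prev {
            None => value,
            Some(p) => p + value + 1,
        };
        out.push(doc);
        prev = Some(doc);
    }
    out
}

/// Term frequencies are >= 1, so the stored value is `tf - 1`.
fn encode_term_freq(tf: u32) -> u32 {
    tf - 1
}

fn decode_term_freq(stored: u32) -> u32 {
    stored + 1
}
```

Shifting every value down by one turns runs of consecutive doc ids and single-occurrence terms into zeros, and smaller values need fewer bits when a block is bit-packed.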
2023-10-20 16:58:26 +02:00
PSeitz
c2b0469180
improve docs, rework exports ( #2220 )
...
* rework exports
move snippet and advice
make indexer pub, remove indexer reexports
* add deprecation warning
* add architecture overview
2023-10-18 09:22:24 +02:00
PSeitz
7e1980b218
run coverage only after merge ( #2212 )
...
* run coverage only after merge
Coverage is quite a slow step in CI, so it can be run only after merging.
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io >
---------
Co-authored-by: Paul Masurel <paul@quickwit.io >
2023-10-18 07:19:36 +02:00
PSeitz
ecb9a89a9f
add compat mode for JSON ( #2219 )
2023-10-17 10:00:55 +02:00
PSeitz
5e06e504e6
split into ReferenceValueLeaf ( #2217 )
2023-10-16 16:31:30 +02:00
PSeitz
182f58cea6
remove Document: DocumentDeserialize dependency ( #2211 )
...
* remove Document: DocumentDeserialize dependency
The dependency requires users to implement an API they may not use.
* remove unnecessary Document bounds
2023-10-13 07:59:54 +02:00
dependabot[bot]
337ffadefd
Update lru requirement from 0.11.0 to 0.12.0 ( #2208 )
...
Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs ) to permit the latest version.
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/jeromefroe/lru-rs/compare/0.11.0...0.12.0 )
---
updated-dependencies:
- dependency-name: lru
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-12 12:09:56 +02:00
dependabot[bot]
22aa4daf19
Update zstd requirement from 0.12 to 0.13 ( #2214 )
...
Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs ) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases )
- [Commits](https://github.com/gyscos/zstd-rs/compare/v0.12.0...v0.13.0 )
---
updated-dependencies:
- dependency-name: zstd
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-12 04:24:44 +02:00
PSeitz
493f9b2f2a
Read list of JSON fields encoded in dictionary ( #2184 )
...
* Read list of JSON fields encoded in dictionary
add method to get list of fields on InvertedIndexReader
* add field type
2023-10-09 12:06:22 +02:00
PSeitz
e246e5765d
replace ReferenceValue with Self in Value ( #2210 )
2023-10-06 08:22:15 +02:00
PSeitz
6097235eff
fix numeric order, refactor Document ( #2209 )
...
fix numeric order to prefer i64
rename and move Document stuff
2023-10-05 16:39:56 +02:00
PSeitz
b700c42246
add AsRef, expose object and array iter on Value ( #2207 )
...
add AsRef
expose object and array iter
add to_json on Document
2023-10-05 03:55:35 +02:00
PSeitz
5b1bf1a993
replace Field with field name ( #2196 )
2023-10-04 06:21:40 +02:00
PSeitz
041d4fced7
move to_named_doc to Document trait ( #2205 )
2023-10-04 06:03:07 +02:00
dependabot[bot]
166fc15239
Update memmap2 requirement from 0.7.1 to 0.9.0 ( #2204 )
...
Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs ) to permit the latest version.
- [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md )
- [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.9.0 )
---
updated-dependencies:
- dependency-name: memmap2
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-04 05:00:46 +02:00
PSeitz
514a6e7fef
fix bench compile, fix Document reexport ( #2203 )
2023-10-03 17:28:36 +02:00
dependabot[bot]
82d9127191
Update fs4 requirement from 0.6.3 to 0.7.0 ( #2199 )
...
Updates the requirements on [fs4](https://github.com/al8n/fs4-rs ) to permit the latest version.
- [Release notes](https://github.com/al8n/fs4-rs/releases )
- [Commits](https://github.com/al8n/fs4-rs/commits/0.7.0 )
---
updated-dependencies:
- dependency-name: fs4
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-03 04:43:09 +02:00