Ming Ying
bc24bda408
Make DeleteMeta pub
2025-01-23 21:38:02 -08:00
Ming Ying
f8b6444f3f
make save_metas provide previous metas
2025-01-23 21:38:02 -08:00
Ming Ying
988b10bc0b
undo changes to segment_updater.rs
2025-01-23 21:38:02 -08:00
Eric B. Ridge
db989cf0dd
no pgrx, please
2025-01-23 21:38:02 -08:00
Ming Ying
a8ae71f780
quickwit compiles
2025-01-23 21:38:02 -08:00
Ming Ying
dbab871792
Directory trait can read/write meta/managed
2025-01-23 21:38:02 -08:00
Eric Ridge
97e0a20cbe
adjust Dictionary::sorted_ords_to_term_cb() to allow duplicates (#8)
2025-01-23 21:38:01 -08:00
Ming
b48ff430e6
expose AddOperation and with_max_doc (#7)
2025-01-23 21:38:01 -08:00
Ming
feab1647a2
Fix managed paths (#5)
2025-01-23 21:38:01 -08:00
Alexander Alexandrov
17d366eb51
feat: implement TokenFilter for Option<F> (#4)
2025-01-23 21:38:01 -08:00
Neil Hansen
0c9189b684
Use Levenshtein distance to score documents in fuzzy term queries
2025-01-23 21:38:00 -08:00
dependabot[bot]
4aa8cd2470
Update downcast-rs requirement from 1.2.1 to 2.0.1 (#2566)
Updates the requirements on [downcast-rs](https://github.com/marcianx/downcast-rs) to permit the latest version.
- [Changelog](https://github.com/marcianx/downcast-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/marcianx/downcast-rs/compare/v1.2.1...v2.0.1)
---
updated-dependencies:
- dependency-name: downcast-rs
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-22 10:32:24 +01:00
dependabot[bot]
43c89b4360
Update itertools requirement from 0.13.0 to 0.14.0 (#2563)
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.13.0...v0.14.0)
---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-08 17:11:46 +01:00
trinity-1686a
d281ca3e65
Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton
allow warming partially an sstable for an automaton
2025-01-08 16:35:35 +01:00
trinity Pointard
be17daf658
split iterator
2025-01-08 16:24:34 +01:00
trinity Pointard
6ca84a61fa
make termdict always clone
2025-01-08 16:19:54 +01:00
trinity Pointard
037d12c9c9
fix deadlocking on automaton warmup
2025-01-06 11:58:58 +01:00
Remi Dettai
71cf19870b
Exist queries match subpath fields (#2558)
* Exist queries match subpath fields
* Make subpath check optional
* Add async subpath listing
2025-01-06 10:17:39 +01:00
trinity Pointard
175a529c41
use executor for cpu-heavy sstable decompression for automaton
2025-01-03 19:14:07 +01:00
trinity Pointard
fe0c7c5408
change rangebound style
2025-01-02 11:56:05 +01:00
Harrison Burt
148594f0f9
Improve IndexWriter customisation via builder (#2562)
* Improve `IndexWriter` customisation via builder
* Remove change noise from PR
* Correct documentation
* Resolve comments and add test
2025-01-02 09:43:22 +01:00
dependabot[bot]
8edb439440
Update rustc-hash requirement from 1.1.0 to 2.1.0 (#2551)
---
updated-dependencies:
- dependency-name: rustc-hash
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-26 10:25:05 +01:00
trinity Pointard
dfff5f3bcb
rename merge_holes_under => merge_holes_under_bytes
2024-12-23 16:17:44 +01:00
trinity-1686a
ebf4d84553
add comment about cpu-intensive operation in async context
2024-12-20 12:23:49 +01:00
trinity-1686a
42efc7f7c8
clippy
2024-12-20 11:00:11 +01:00
trinity-1686a
192395c311
attempt at simplifying can_block_match_automaton
2024-12-20 10:25:38 +01:00
trinity-1686a
a1447cc9c2
remove breaking change in sstable public api
2024-12-19 17:30:05 +01:00
trinity-1686a
c39d91f827
Merge pull request #2547 from quickwit-oss/trinity/count-str
add support for counting non integer in aggregation
2024-12-17 15:27:30 +01:00
trinity Pointard
32b6e9711b
add tests
2024-12-13 16:06:24 +01:00
trinity-1686a
24c5dc2398
allow warming up automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
9e2ddec4b3
merge adjacent block when building delta for automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
1f6a8e74bb
support iterating over partially loaded sstable
2024-12-10 13:32:12 +01:00
trinity-1686a
7e901f523b
get iter for blocks of sstable matching automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
3c30a41c14
add helper to figure if block can match automaton
2024-12-10 13:32:12 +01:00
dependabot[bot]
0f99d4f420
Update measure_time requirement from 0.8.2 to 0.9.0 (#2557)
---
updated-dependencies:
- dependency-name: measure_time
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-12-09 21:39:01 +01:00
Pierre Barre
6e02c5cb25
Make NUM_MERGE_THREADS configurable (#2535)
* Make `NUM_MERGE_THREADS` configurable
* Remove unused import
* Reword comment src/index/index.rs
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
---------
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2024-12-09 16:53:11 +08:00
PSeitz
876a579e5d
queryparser: add field respecification test (#2550)
2024-12-02 14:17:12 +01:00
PSeitz
4c52499622
clippy (#2549)
2024-11-29 16:08:21 +08:00
trinity-1686a
0bac391291
add support for counting non integer in aggregation
2024-11-28 19:52:47 +01:00
PSeitz
52d4e81e70
update CHANGELOG (#2546)
2024-11-27 20:49:35 +08:00
dependabot[bot]
c71ea7b2ef
Update thiserror requirement from 1.0.30 to 2.0.1 (#2542)
Updates the requirements on [thiserror](https://github.com/dtolnay/thiserror) to permit the latest version.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.30...2.0.1)
---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-11-09 08:08:34 +08:00
Paul Masurel
c35a782747
Updating rustc-hash and clippy fixes (#2532)
* Updating rustc-hash and clippy fixes
* fix terms_aggregation_min_doc_count_special_case
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-11-01 13:46:26 +08:00
dependabot[bot]
c66af2c0a9
Update binggan requirement from 0.12.0 to 0.14.0 (#2530)
* Update binggan requirement from 0.12.0 to 0.14.0
---
updated-dependencies:
- dependency-name: binggan
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix build
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-10-24 09:41:35 +08:00
Joan Antoni RE
f9ac055847
Fix some links in architecture docs (#2528)
2024-10-23 21:06:54 +09:00
PSeitz
21d057059e
clippy (#2527)
* clippy
* clippy
* clippy
* clippy
* convert allow to expect and remove unused
* cargo fmt
* cleanup
* export sample
* clippy
2024-10-22 09:26:54 +08:00
PSeitz
dca508b4ca
remove read_postings_no_deletes (#2526)
closes #2525
2024-10-22 09:52:43 +09:00
PSeitz
aebae9965d
add RegexPhraseQuery (#2516)
* add RegexPhraseQuery
RegexPhraseQuery supports phrase queries with regex. It supports regex
and wildcards. E.g. a query with wildcards:
"b* b* wolf" matches "big bad wolf"
Slop is supported as well:
"b* wolf"~2 matches "big bad wolf"
Regex queries may match a lot of terms, and we still need to
keep track of which term hit in order to load the positions.
The phrase query algorithm groups terms by frequency
within the union so that groups can be prefiltered early.
This PR comes with some new data structures:
SimpleUnion - A union docset for a list of docsets. It doesn't do any
caching and is therefore well suited for datasets with lots of skipping
(phrase search, but intersections in general).
LoadedPostings - Like SegmentPostings, but all docs and positions are loaded in
memory. SegmentPostings uses 1840 bytes per instance with its caches,
which is equivalent to 460 docids.
LoadedPostings is used for terms which have fewer than 100 docs.
LoadedPostings is only used to reduce memory consumption.
BitSetPostingUnion - Creates a `Posting` that uses the bitset for docid
hits and the docsets for positions. The BitSet is the precalculated
union of the docsets.
In the RegexPhraseQuery there is a size limit of 512 docsets per PreAggregatedUnion,
before creating a new one.
Renamed Union to BufferedUnionScorer.
Added proptests to test different union types.
* cleanup
* use Box instead of Vec
* use RefCell instead of term_freq(&mut)
* remove wildcard mode
* move RefCell to outer
* clippy
2024-10-21 18:29:17 +08:00
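The SimpleUnion idea from commit aebae9965d (a union over a list of docsets that keeps no buffer or cache) can be illustrated with a minimal sketch. This is not the code from the PR: `VecDocSet`, the `TERMINATED` sentinel, and `collect_all` are hypothetical names made up for this example, and the real implementation works against the library's own docset abstraction.

```rust
/// Sentinel doc id returned once a docset is exhausted (an assumption of this sketch).
const TERMINATED: u32 = u32::MAX;

/// Hypothetical minimal docset: a cursor over sorted doc ids.
struct VecDocSet {
    docs: Vec<u32>,
    pos: usize,
}

impl VecDocSet {
    fn new(docs: Vec<u32>) -> Self {
        VecDocSet { docs, pos: 0 }
    }

    /// Current doc id, or TERMINATED when exhausted.
    fn doc(&self) -> u32 {
        self.docs.get(self.pos).copied().unwrap_or(TERMINATED)
    }

    fn advance(&mut self) -> u32 {
        if self.pos < self.docs.len() {
            self.pos += 1;
        }
        self.doc()
    }

    /// Move forward to the first doc id >= target (never moves backwards).
    fn seek(&mut self, target: u32) -> u32 {
        while self.doc() < target {
            self.pos += 1;
        }
        self.doc()
    }
}

/// Unbuffered union: only the current minimum doc id is tracked.
struct SimpleUnion {
    docsets: Vec<VecDocSet>,
    current: u32,
}

impl SimpleUnion {
    fn new(docsets: Vec<VecDocSet>) -> Self {
        let current = docsets.iter().map(|d| d.doc()).min().unwrap_or(TERMINATED);
        SimpleUnion { docsets, current }
    }

    fn doc(&self) -> u32 {
        self.current
    }

    /// Advance every child positioned on the current doc, then take the new minimum.
    fn advance(&mut self) -> u32 {
        if self.current == TERMINATED {
            return TERMINATED;
        }
        let mut next = TERMINATED;
        for d in &mut self.docsets {
            if d.doc() == self.current {
                d.advance();
            }
            next = next.min(d.doc());
        }
        self.current = next;
        next
    }

    /// Seek all children; with no buffer there is nothing to invalidate.
    fn seek(&mut self, target: u32) -> u32 {
        let mut next = TERMINATED;
        for d in &mut self.docsets {
            next = next.min(d.seek(target));
        }
        self.current = next;
        next
    }
}

/// Drain the union into a vector of doc ids.
fn collect_all(mut union: SimpleUnion) -> Vec<u32> {
    let mut out = Vec::new();
    let mut doc = union.doc();
    while doc != TERMINATED {
        out.push(doc);
        doc = union.advance();
    }
    out
}
```

Because no block of doc ids is precomputed, a `seek` costs one forward scan per child and discards nothing, which is the trade-off the commit message points at for skip-heavy workloads such as phrase search.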
Marvin
e7e3e3f44c
make casing in docs more consistent (#2524)
* make casing in docs more consistent
* more
* lowercase tantivy
2024-10-21 17:59:41 +09:00
PSeitz
2f2db16ec1
store DateTime as nanoseconds in doc store (#2486)
* store DateTime as nanoseconds in doc store
The doc store DateTime was previously truncated to microseconds. This
removes the truncation while keeping backwards compatibility.
This is done by adding the trait `ConfigurableBinarySerializable`, which
works like `BinarySerializable`, but with a config that currently allows
de/serializing with different date time precisions.
bump version format to 7.
add compat test to check the date time truncation.
* remove configurable binary serialize, add enum for doc store version
* test doc store version ord
2024-10-18 10:50:20 +08:00
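The effect described in commit 2f2db16ec1 (microsecond truncation in older doc stores, full nanoseconds in newer ones, selected by a doc store version enum) can be sketched roughly as follows. Everything here is an illustrative assumption: the enum variants, `encode`, `decode`, and `truncate_to_micros` are hypothetical names, and the actual on-disk encoding in the library may differ.

```rust
/// Hypothetical doc store format versions; the names are illustrative only.
/// Declaration order drives the derived `Ord`, so older < newer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DocStoreVersion {
    MicrosLegacy,
    Nanos,
}

/// Drop sub-microsecond digits from a nanosecond timestamp (rounds toward -infinity).
fn truncate_to_micros(ts_nanos: i64) -> i64 {
    ts_nanos.div_euclid(1_000) * 1_000
}

/// Encode a nanosecond timestamp the way each (assumed) version stores it.
fn encode(ts_nanos: i64, version: DocStoreVersion) -> i64 {
    match version {
        // Legacy format keeps microseconds only, so precision is lost here.
        DocStoreVersion::MicrosLegacy => ts_nanos.div_euclid(1_000),
        DocStoreVersion::Nanos => ts_nanos,
    }
}

/// Decode a stored value back to nanoseconds, based on the version it was written with.
fn decode(raw: i64, version: DocStoreVersion) -> i64 {
    match version {
        DocStoreVersion::MicrosLegacy => raw.saturating_mul(1_000),
        DocStoreVersion::Nanos => raw,
    }
}
```

Reading with the version that wrote the data is what keeps old segments readable: a legacy value decodes to the truncated timestamp, while a value written in the newer format round-trips exactly.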
Paul Masurel
d152e29687
Fixed citation (#2523)
2024-10-17 10:19:50 +09:00