682 Commits

Author SHA1 Message Date
PSeitz
f687b3a5aa start migrate Field to &str (#1772)
start migrate Field to &str in preparation of columnar
return Result for get_field
2023-01-18 16:12:07 +09:00
Shikhar Bhushan
2650111b76 EnableScoring::Disabled - optional Searcher (#1780) 2023-01-12 09:26:50 -05:00
Paul Masurel
bb48c3e488 Refactoring to prepare for the addition of dynamic fast field (#1730)
* Refactoring to prepare for the addition of dynamic fast field

- Exposing insert_key / insert_value
- Renamed SSTable::{Reader/Writer}-> SSTable::{ValueReader/ValueWriter}
- Added a generic Dictionary object in the sstable crate
- Removing the TermDictionary wrapper from tantivy, relying directly on
  an alias of the generic Dictionary object.
- dropped the use of byteorder in sstable.
- Stopped scanning / reading the entire dictionary when streaming a range.

* Added a benchmark for streaming sstable ranges.

* CR comments.

Rename deserialize_u64 -> deserialize_vint_u64

* Removed needless allocation, split serialize into serialize and clear.
2022-12-22 12:25:46 +09:00
Paul Masurel
32cb1d22da Removed AsyncIoResult. (#1728) 2022-12-21 16:01:17 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
PSeitz
0281b22b77 update create_in_ram docs (#1695) 2022-11-24 17:30:09 +01:00
trinity-1686a
5765c261aa allow warming up of the full posting list (#1673)
* allow warming up of the full posting list

* cargo fmt
2022-11-14 10:27:56 +09:00
Paul Masurel
3edf0a2724 Using the manual reload policy in IndexWriter. (#1667) 2022-11-09 11:20:41 +01:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00
Bruce Mitchener
b3bf9a5716 Documentation improvements. 2022-10-05 14:18:10 +07:00
Pascal Seitz
0e94213af0 validate index settings on create 2022-09-29 18:58:09 +08:00
Bruce Mitchener
cb252a42af docs: "associated to" -> "associated with" (#1557)
This reads better this way.
2022-09-26 20:23:37 +09:00
Bruce Mitchener
ea8e6d7b1d Tidy up clippy config. (#1547)
* Checking cfg_attr is no longer necessary.
* Don't need multiple `clippy::` prefixes on a name.
2022-09-26 09:37:55 +09:00
Bruce Mitchener
d231671fe2 clippy: Remove borrows that the compiler will do.
This started showing up with clippy in rust 1.64.
2022-09-22 22:38:23 +07:00
Bruce Mitchener
cf02e32578 Improvements to doc linking, grammar, etc. 2022-09-19 18:10:22 +07:00
Bruce Mitchener
6a88ac3fe3 Documentation improvements.
Fix some linking, some grammar, some typos, etc.
2022-09-18 18:05:37 +07:00
Paul Masurel
817225edfb Allow for a same-thread doc compressor. (#1510)
In addition, it isolates the doc compressor logic,
better reports io::Result.

In the case of the same-thread doc compressor,
the blocks are also not copied.
2022-09-13 15:32:48 +09:00
Paul Masurel
4d634d61ff Expose memory usage in SingleSegmentIndexWriter (#1508) 2022-09-07 18:33:52 +09:00
Paul Masurel
08c4412d73 Adding dragon API to build index without any thread. (#1496)
Closes #1487
2022-09-01 10:32:36 +09:00
Paul Masurel
a451f6d60d Minor refactoring. (#1495) 2022-08-31 12:00:58 +09:00
PSeitz
bb01e99e05 Fixes race condition in Searcher (#1464)
Fixes a race condition in Searcher, by avoiding repeated calls to open_segment_readers and passing them instead as argument

Closes #1461
2022-08-24 21:17:37 +09:00
Kian-Meng Ang
625bcb4877 Fix typos and markdowns
Found via these commands:

    codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
    markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Evance Soumaoro
fad3faefe2 added InvertedIndexReader::doc_freq_async and SnippetGenerator::new methods 2022-08-12 06:39:10 +00:00
boraarslan
d4b2b7de8b Expose inner file slice 2022-08-04 18:13:17 +03:00
PSeitz
23fe73a6c0 remove searcher pool and make Searcher cloneable (#1411)
* remove searcher pool and make Searcher cloneable

closes #1410

* use SearcherInner in InnerIndexReader
2022-07-12 18:07:48 +09:00
Pascal Seitz
5750224d4c set docstore cache size at construction 2022-07-04 14:27:55 +08:00
Pascal Seitz
9db2f0e82b expose doc store cache size
expose lru doc store cache size
optimize doc store cache size
2022-07-04 13:54:41 +08:00
PSeitz
de178a1901 Merge pull request #1395 from PSeitz/fix_clippy
fix clippy
2022-06-21 16:30:59 +08:00
Antoine G
11e4225f23 doc fix (#1391)
Documentation fix.
2022-06-21 15:53:33 +09:00
Pascal Seitz
1440f3243b fix clippy 2022-06-21 14:47:01 +08:00
Kanji Yomoda
83d0c13fb0 Fix outdated variable naming and comments to alive bitset (#1387)
* Fix outdated variables and comments for alive bitset

* Fix expired link to delete bitset
2022-06-14 15:59:15 +09:00
Pascal Seitz
4d9d2b6db0 split into compressor/decompressor
use custom de/serializer for compressor
accept parameters like zstd(compression_level=5) as compressor
2022-06-02 23:29:24 +08:00
Pascal Seitz
ed868f93a3 enable setting compression level 2022-06-02 16:47:29 +08:00
Paul Masurel
f0a2b1cc44 Bumped tantivy and subcrate versions. 2022-05-25 22:50:33 +09:00
Kryesh
03040ed81d Add Zstd compression support 2022-05-18 14:04:43 +10:00
Kryesh
aaa22ad225 Make block size configurable to allow for better compression ratios on large documents 2022-05-18 11:13:15 +10:00
PSeitz
7f45a6ac96 allow setting tokenizer manager on index (#1362)
handle json in tokenizer_for_field
2022-05-09 18:15:45 +09:00
Paul Masurel
be70804d17 Removed AtomicUsize. 2022-05-04 16:45:24 +09:00
PSeitz
a1afc80600 Update src/core/executor.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-05-04 08:39:44 +02:00
Pascal Seitz
4db655ae82 update dependencies, update edition 2022-04-28 22:50:55 +08:00
PSeitz
ae83fc8298 bump uuid to 1.0 (#1345) 2022-04-22 10:02:24 +09:00
Pascal Seitz
bb5254de12 always serialize, use enum as param 2022-04-04 13:50:23 +08:00
Pascal Seitz
8807bfd13d fast field on string
enables FAST on string fields, which creates a fastfield containing the term ordinals
2022-03-29 12:40:10 +08:00
Paul Masurel
46d5de920d Removes all usage of block_on, and use a oneshot channel instead. (#1315)
* Removes all usage of block_on, and use a oneshot channel instead.

Calling `block_on` panics in certain context.
For instance, it panics when it is called in a the context of another
call to block.

Using it in tantivy is unnecessary. We replace it by a thin wrapper
around a oneshot channel that supports both async/sync.

* Removing needless uses of async in the API.

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-03-18 16:54:58 +09:00
PSeitz
b105bf72e1 use defaults in meta.json (#1310)
This change allows to have unset fields in meta.json and fall back to their defaults
Currently it is required to explicitly put e.g. fieldnorms: false
2022-03-14 13:54:06 +09:00
Antoine G
e37775fe21 iff->if or if and only if (#1298)
* has_xxx is_xxx -> if, these function usualy define equivalence
xxx returns bool -> specify equivalence when appropriate

* fix doc
2022-03-02 11:00:00 +09:00
Paul Masurel
5004290daa Return an error on certain type of corruption. (#1296) 2022-03-01 11:35:56 +09:00
Paul Masurel
2ead010c83 Tantivy quickwit (#1293)
* Added sstable and enabling it by default, and parallel boolean query.
* Added async API for FileSlice.
* Added async get_doc
* Reduce blocksize to 32_000
* Added debug logs

Quickwit specific feature a hidden behind the quickwit feature flag.
2022-02-25 17:32:49 +09:00
Paul Masurel
d7b46d2137 Added JSON Type (#1270)
- Removed useless copy when ingesting JSON.
- Bugfix in phrase query with a missing field norms.
- Disabled range query on default fields

Closes #1251
2022-02-24 16:25:22 +09:00
Pascal Seitz
704498a1ac rename IntOptions to NumericOptions
keep IntOptions with deprecation warning
Fixes #1286
2022-02-21 22:20:07 +01:00