Commit Graph

71 Commits

Author SHA1 Message Date
PSeitz
28dd6b6546 collect json paths in indexing (#2231)
* collect json paths in indexing

* remove unsafe iter_mut_keys
2023-11-01 11:25:17 +01:00
PSeitz
07bf66a197 json path writer (#2224)
* refactor logic to JsonPathWriter

* use in encode_column_name

* add inlines

* move unsafe block
2023-10-24 09:45:50 +02:00
Harrison Burt
1c7c6fd591 POC: Tantivy documents as a trait (#2071)
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2023-10-02 10:01:16 +02:00
PSeitz
49448b31c6 chore: Release (#2168)
* chore: Release

* update CHANGELOG
2023-09-01 13:58:58 +02:00
Harrison Burt
131c10d318 Fix missing trait imports (#2154) 2023-08-27 09:20:26 +09:00
PSeitz
59460c767f delayed column opening during merge (#2132)
* lazy columnar merge

This is the first part of addressing #3633
Instead of loading all Column into memory for the merge, only the current column_name
group is loaded. This can be done since the sstable streams the columns lexicographically.

* refactor

* add rustdoc

* replace iterator with BTreeMap
2023-08-21 08:55:35 +02:00
Paul Masurel
7ee78bda52 Readding s in datetime precision variant names (#2065)
There is no clear win and it change some serialization in quickwit.
2023-06-01 06:39:46 +02:00
Adrien Guillo
a789ad9aee Rename DatePrecision to DateTimePrecision (#2051) 2023-05-23 17:09:11 +02:00
Yuri Astrakhan
74275b76a6 Inline format arguments where makes sense (#2038)
Applied this command to the code, making it a bit shorter and slightly
more readable.

```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
2023-05-10 18:03:59 +09:00
PSeitz
ba309e18a1 switch to nanosecond precision (#2016) 2023-05-01 03:32:20 +02:00
PSeitz
7b31100208 refactor vint (#2010)
- improve performance of vint
vint serialization shows up in performance profiles during indexing.
It would also make sense to limit the value space to u29 and operate on 4 bytes only.
- remove unused code
- add missing inlines
- fix regex test
2023-04-25 08:49:36 +02:00
trinity-1686a
205e8a0a92 encode dictionary type in fst footer (#1968)
* encode additional footer for dictionary kind in fst
2023-04-12 09:43:01 +02:00
Paul Masurel
059fc767ea Added ::MIN ::MAX DateTime. (#1965) 2023-03-27 15:32:53 +09:00
trinity-1686a
e5e50603a8 new sstable format (#1943)
* document a new sstable format

* add support for changing target block size

* use new format for sstable index

* handle sstable version errror

* use very small blocks for proptests

* add a footer structure
2023-03-21 15:03:52 +01:00
PSeitz
8f7f1d6be4 add Display for ByteCount (#1949)
* add Display for ByteCount

* export missing AggregationLimits
2023-03-21 08:02:35 +01:00
PSeitz
9e2faecf5b add memory limit for aggregations (#1942)
* add memory limit for aggregations

introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add ByteCount with human readable format

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-16 06:21:07 +01:00
Paul Masurel
7fae4d98d7 Adapting for quickwit2 (#1912)
* Adapting tantivy to make it possible to be plugged to quickwit.

* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

* Added unit test

---------

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2023-03-01 16:27:46 +09:00
Paul Masurel
66ff53b0f4 Various minor code cleanup (#1909) 2023-02-27 13:48:34 +09:00
Paul Masurel
7423f99719 Issue/columnar for json (#1876)
Adding support for JSON fast field.
2023-02-16 20:38:32 +09:00
trinity-1686a
539ff08a79 move DateTime to tantivy_common (#1861)
* move DateTime to tantivy_common

* resolve imports of columnar::DateTime as import of common::DateTime
2023-02-11 17:03:06 +01:00
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
PSeitz
45156fd869 use group_by in translate_codec_idx_to_original_id (#1736) 2022-12-26 06:13:29 +01:00
Paul Masurel
3339a3ec05 Removed feature(quickwit) in tantivy-common. 2022-12-22 10:19:57 +09:00
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
PSeitz
509adab79d Bump version (#1715)
* group workspace deps

* update cargo.toml

* revert tant version

* chore: Release
2022-12-12 04:39:43 +01:00
PSeitz
1119e59eae prepare fastfield format for null index (#1691)
* prepare fastfield format for null index
* add format version for fastfield
* Update fastfield_codecs/src/compact_space/mod.rs
* switch to variable size footer
* serialize delta of end
2022-11-28 17:15:24 +09:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00
Pascal Seitz
5171ff611b serialize ip as u128, add test for positions_to_docid 2022-10-07 16:25:01 +08:00
Bruce Mitchener
cb252a42af docs: "associated to" -> "associated with" (#1557)
This reads better this way.
2022-09-26 20:23:37 +09:00
Bruce Mitchener
e9a384bb15 Use u8::from(bool), u64::from(bool). 2022-09-22 22:44:53 +07:00
Bruce Mitchener
6a88ac3fe3 Documentation improvements.
Fix some linking, some grammar, some typos, etc.
2022-09-18 18:05:37 +07:00
Pascal Seitz
bc85947105 add ip codec 2022-09-16 16:38:01 +08:00
Paul Masurel
08c4412d73 Adding dragon API to build index without any thread. (#1496)
Closes #1487
2022-09-01 10:32:36 +09:00
Kian-Meng Ang
014b1adc3e cargo +nightly fmt 2022-08-17 22:33:44 +08:00
Kian-Meng Ang
84295d5b35 cargo fmt 2022-08-15 21:07:01 +08:00
Kian-Meng Ang
625bcb4877 Fix typos and markdowns
Found via these commands:

    codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
    markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Kanji Yomoda
af84e74284 Replace deprecated std package's constants on floats and integers (#1420) 2022-07-22 08:05:08 +09:00
Pascal Seitz
02691f2445 edition 2021 for subcrates 2022-07-04 14:19:32 +08:00
Pascal Seitz
8b6647e908 move writer to compressor thread 2022-06-23 15:34:21 +08:00
boraarslan
fc43ab9280 Add tests 2022-06-07 10:09:37 +03:00
Paul Masurel
f0a2b1cc44 Bumped tantivy and subcrate versions. 2022-05-25 22:50:33 +09:00
Paul Masurel
2e255c4bef Preparing for release 2022-03-09 09:59:08 +09:00
Paul Masurel
13a4473faa Removing obsolete clippy allow thingy. 2022-02-01 11:54:01 +09:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
3ea6800ac5 Pleasing clippy (#1253) 2022-01-06 16:41:24 +09:00
Antoine G
395303b644 Collector + directory doc fixes (#1247)
* doc(collector)

* doc(directory)

* doc(misc)

* wording
2022-01-04 09:22:58 +09:00
Pascal Seitz
70283dc6c8 fix incorrect padding in bitset for multiple of 64 2021-10-29 16:49:22 +08:00
Paul Masurel
b5b1244857 More functionality in the ownedbytes crate (#1172) 2021-10-07 18:14:49 +09:00
Paul Masurel
02cffa4dea Code simplification. (#1169)
Code simplification and Clippy
2021-10-07 14:11:44 +09:00