Commit Graph

15 Commits

Author SHA1 Message Date
PSeitz
33794a114c chore: Release (#2686)
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-08-20 18:29:37 +08:00
PSeitz
4a6123d3ff release tantivy: bump versions (#2625)
* chore: Release

* chore: Release

---------

Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-06-10 15:34:39 +02:00
PSeitz
4c52499622 clippy (#2549) 2024-11-29 16:08:21 +08:00
Bruce Mitchener
c17e513377 Reduce typo count. (#2510) 2024-10-10 09:55:37 +08:00
PSeitz
17d5869ad6 update CHANGELOG, use github API in cliff (#2354)
* update CHANGELOG, use github API in cliff

* reset version to 0.21.1, before release

* chore: Release

* remove unreleased from CHANGELOG
2024-04-15 10:07:20 +02:00
PSeitz
c2b0469180 improve docs, rework exports (#2220)
* rework exports

move snippet and advice
make indexer pub, remove indexer reexports

* add deprecation warning

* add architecture overview
2023-10-18 09:22:24 +02:00
PSeitz
49448b31c6 chore: Release (#2168)
* chore: Release

* update CHANGELOG
2023-09-01 13:58:58 +02:00
Paul Masurel
d341520938 Dynamic follow up 2023-07-03 22:05:10 +09:00
Paul Masurel
ad4c940fa3 proof of concept for dynamic tokenizer. 2023-07-03 22:05:10 +09:00
PSeitz
e3eacb4388 release tantivy (#2083)
* prerelease

* chore: Release
2023-06-09 10:47:46 +02:00
PSeitz
fdecb79273 tokenizer-api: reduce Tokenizer overhead (#2062)
* tokenizer-api: reduce Tokenizer overhead

Previously a new `Token` for each text encountered was created, which
contains `String::with_capacity(200)`
In the new API the token_stream gets mutable access to the tokenizer,
this allows state to be shared (in this PR Token is shared).
Ideally the allocation for the BoxTokenStream would also be removed, but
this may require some lifetime tricks.

* simplify api

* move lowercase and ascii folding buffer to global

* empty Token text as default
2023-06-08 18:37:58 +08:00
PSeitz
e56addc63e enable tokenizer on json fields (#2053)
* enable tokenizer on json fields

enable tokenizer on json fields for type text

* Avoid making the tokenizer within the TextAnalyzer pub(crate)

* Moving BoxableTokenizer to tantivy.

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-05-24 10:47:39 +02:00
trinity-1686a
064518156f refactor tokenization pipeline to use GATs (#1924)
* refactor tokenization pipeline to use GATs

* fix doctests

* fix clippy lints

* remove commented code
2023-03-09 09:39:37 +01:00
Adrien Guillo
f3621c0487 Add license to tokenizer-api crate (#1778) 2023-01-11 05:26:41 +01:00
PSeitz
514d23a20c move tokenizer API to seperate crate (#1767)
closes #1766

Finding tantivy tokenizers is a frustrating experience currently, since
they need be updated for each tantivy version. That's unnecessary since
the API is rather stable anyway.
2023-01-09 06:37:38 +01:00