mirror of
https://github.com/quickwit-oss/tantivy.git
synced 2026-05-31 07:30:39 +00:00
This commit adds support for Polish language stemming.

The previously used rust-stemmers crate is abandoned and unmaintained, which blocked the addition of new languages. This change addresses a user request for Polish stemming to improve BM25 recall in their use case. The tantivy-stemmers crate is a modern, maintained alternative that also opens the door for supporting many other languages in the future.

- Added the tantivy-stemmers crate as a dependency to the workspace, alongside the existing rust-stemmers dependency (for backward compatibility)
- Introduced an internal enum that can hold an algorithm from either rust-stemmers or tantivy-stemmers
- Added Polish to the main Language enum, mapped to the new tantivy-stemmers implementation
- Updated the token stream to handle both types of stemmers internally
- Added the POLISH variant to the stopwords list

- Existing tests pass
- Added test_pl_tokenizer to verify that the Polish stemmer works correctly
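The dual-backend dispatch described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the actual code of this commit: `LegacyAlgorithm`, `NewAlgorithm`, `StemmerBackend`, `strip_known_suffix`, and `stem_tokens` are hypothetical names, and the suffix stripping is a toy stand-in for real stemming, not the API of rust-stemmers or tantivy-stemmers. It only shows how a public `Language` value might map to an internal enum that wraps an algorithm from one of two unrelated crates.

```rust
// Hypothetical stand-ins for the two stemming crates. These are NOT the real
// rust-stemmers or tantivy-stemmers types; they only model two unrelated
// sources of stemming algorithms.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LegacyAlgorithm {
    English,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NewAlgorithm {
    Polish,
}

/// Public-facing language enum (reduced to two variants for the sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Language {
    English,
    Polish,
}

/// Internal enum that can hold an algorithm from either backend.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StemmerBackend {
    Legacy(LegacyAlgorithm),
    New(NewAlgorithm),
}

impl Language {
    /// Map each public language to the backend that implements it.
    fn backend(self) -> StemmerBackend {
        match self {
            Language::English => StemmerBackend::Legacy(LegacyAlgorithm::English),
            Language::Polish => StemmerBackend::New(NewAlgorithm::Polish),
        }
    }
}

impl StemmerBackend {
    /// Dispatch to the wrapped algorithm. The suffix lists are toy
    /// placeholders (an assumption for illustration), not real stemming rules.
    fn stem(self, token: &str) -> String {
        match self {
            StemmerBackend::Legacy(LegacyAlgorithm::English) => {
                strip_known_suffix(token, &["ing", "s"])
            }
            StemmerBackend::New(NewAlgorithm::Polish) => {
                strip_known_suffix(token, &["ami", "ów"])
            }
        }
    }
}

/// Remove the first matching suffix, unless that would leave an empty stem.
fn strip_known_suffix(token: &str, suffixes: &[&str]) -> String {
    for suffix in suffixes.iter().copied() {
        if let Some(stem) = token.strip_suffix(suffix) {
            if !stem.is_empty() {
                return stem.to_string();
            }
        }
    }
    token.to_string()
}

/// Lowercase and stem whitespace-separated tokens with the chosen language.
pub fn stem_tokens(language: Language, text: &str) -> Vec<String> {
    let backend = language.backend();
    text.split_whitespace()
        .map(|token| backend.stem(&token.to_lowercase()))
        .collect()
}
```

With this shape, callers only ever see `Language`; which crate serves a given language stays an internal detail, so adding another language from the new crate would mean one more variant and one more match arm.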