* TopDocs: ensure stable sorting on equal score

  When selecting the top K documents by score, we need to ensure stable sorting. Until now, for documents with the same score, we were relying on the (arbitrary) order returned by the BinaryHeap used to implement the collectors.

  This patch fixes the problem by explicitly using the doc address when harvesting the `TopSegmentCollector` and when merging the results in `TopCollector::merge_fruits()`.

  This is important (for example) to implement pagination correctly using the TopDocs collector. If sorting isn't stable, documents that have the same score might be ranked in different positions depending on the specific K that was used, thus appearing in two different pages, or in none at all.

  Fixes gh-671

* TMP: alternative solution (see previous commit)

  If we add the constraint that D is also PartialOrd in ComparableDoc<T, D>, then we can move the comparison by doc address directly in the cmp implementation of ComparableDoc.

* TMP rebase as first commit: add benchmarks for TopSegmentCollector
* fixup! TMP: alternative solution (see previous commit)
* TMP add changelog entry
* TMP run cargo fmt
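The tie-breaking idea described in the commit message can be sketched with the standard library alone. This is a minimal illustration, not Tantivy's actual collector code: `ScoredDoc` and `top_k` are hypothetical names, the doc address is simplified to a plain `u32`, and scores are assumed never to be NaN.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a scored hit; `doc` plays the role of a doc address.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ScoredDoc {
    score: f32,
    doc: u32,
}

// Assumption: scores are never NaN, so equality is well behaved.
impl Eq for ScoredDoc {}

impl Ord for ScoredDoc {
    // "Greater" means "ranks worse": lower score first, then higher doc.
    // With this ordering the max-heap's top is the worst hit kept so far.
    fn cmp(&self, other: &Self) -> Ordering {
        other
            .score
            .partial_cmp(&self.score)
            .unwrap_or(Ordering::Equal)
            .then_with(|| self.doc.cmp(&other.doc))
    }
}

impl PartialOrd for ScoredDoc {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the best `k` hits, best first. Hits with equal scores are
/// ordered by ascending doc, so the result does not depend on heap internals.
fn top_k(hits: &[ScoredDoc], k: usize) -> Vec<ScoredDoc> {
    let mut heap = BinaryHeap::new();
    for &hit in hits {
        heap.push(hit);
        if heap.len() > k {
            heap.pop(); // evict the current worst hit
        }
    }
    heap.into_sorted_vec() // ascending in `Ord`, i.e. best hit first
}
```

Because the doc address takes part in the comparison, the top 2 of a result set is always a prefix of its top 3, which is the property pagination relies on.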
Tantivy is a full text search engine library written in Rust.
It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.
Tantivy is, in fact, strongly inspired by Lucene's design.
Benchmark
Tantivy is typically faster than Lucene, but the results depend on the nature of the queries in your workload.
The following benchmark breaks down performance for different types of queries and collections.
Features
- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages, with third-party support for Chinese (tantivy-jieba and cang-jie) and Japanese)
- Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
- Tiny startup time (<10ms), perfect for command line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. (michael AND jackson) OR "king of pop")
- Phrase queries search (e.g. "michael jackson")
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
- &[u8] fast fields
- Text, i64, u64, f64, dates, and hierarchical facet fields
- LZ4 compressed document store
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)
- Cheesy logo with a horse
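For readers unfamiliar with the BM25 scoring mentioned in the list above, here is a minimal sketch of the classic per-term formula. It is not taken from Tantivy's source: the function name and parameters are hypothetical, and `K1 = 1.2` and `B = 0.75` are the conventional defaults, assumed here for illustration.

```rust
/// Classic BM25 contribution of a single query term to one document's score.
/// Hypothetical helper for illustration only.
///
/// * `tf`          - number of times the term occurs in the document
/// * `doc_len`     - length of the document, in tokens
/// * `avg_doc_len` - average document length over the index
/// * `num_docs`    - total number of documents in the index
/// * `doc_freq`    - number of documents containing the term
fn bm25_term_score(tf: f32, doc_len: f32, avg_doc_len: f32, num_docs: f32, doc_freq: f32) -> f32 {
    const K1: f32 = 1.2; // term-frequency saturation (assumed default)
    const B: f32 = 0.75; // strength of length normalization (assumed default)

    // Rare terms get a higher inverse document frequency.
    let idf = (1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Documents longer than average are penalized, shorter ones boosted.
    let length_norm = 1.0 - B + B * doc_len / avg_doc_len;

    idf * tf * (K1 + 1.0) / (tf + K1 * length_norm)
}
```

The score grows with term frequency but saturates, and a match in a shorter document counts for more than the same match in a longer one.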
Non-features
- Distributed search is out of the scope of Tantivy. That being said, Tantivy is a library upon which one could build a distributed search. Serializable/mergeable collector state, for instance, is within the scope of Tantivy.
Supported OS and compiler
Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.
Getting started
- Tantivy's simple search example
- tantivy-cli and its tutorial - tantivy-cli is an actual command line interface that makes it easy for you to create a search engine, index documents, and search via the CLI or a small server with a REST API. It walks you through getting a Wikipedia search engine up and running in a few minutes.
- Reference doc for the last released version
How can I support this project?
There are many ways to support this project.
- Use Tantivy and tell us about your experience on Gitter or by email (paul.masurel@gmail.com)
- Report bugs
- Write a blog post
- Help with documentation by asking questions or submitting PRs
- Contribute code (you can join our Gitter)
- Talk about Tantivy around you

Contributing code
We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
Clone and build locally
Tantivy compiles on stable Rust but requires Rust >= 1.27.
To check out and build the project, you can simply run:
git clone https://github.com/tantivy-search/tantivy.git
cd tantivy
cargo build
Run tests
Some tests will not run with just cargo test because of fail-rs.
To run the tests exhaustively, run ./run-tests.sh.
Debug
You might find it useful to step through the programme with a debugger.
A failing test
Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:
find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY
Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:
(gdb) run --test-threads 1 --test $NAME_OF_TEST
An example
By default, rustc compiles everything in the examples/ directory in debug mode. This makes it easy for you to make examples to reproduce bugs:
rust-gdb target/debug/examples/$EXAMPLE_NAME
(gdb) run
