mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-26 05:00:41 +00:00

petr-tik 431c187a60 Make error handling richer in Footer::is_compatible (#724 )

* WIP implemented is_compatible

hide Footer::from_bytes from public consumption - only found Footer::extract
used outside the module

Add a new error type for IncompatibleIndex
add a prototypical call to footer.is_compatible() in ManagedDirectory::open_read
to make sure we error before reading it further

* Make error handling more ergonomic

Add an error subtype for OpenReadError and converters to TantivyError

* Remove an unnecessary assert

it's follower by the same check that Errors instead of panicking

* Correct the compatibility check logic

Leave a defensive versioned footer check to make sure we add new logic handling
when we add possible footer versions

Restricted VersionedFooter::from_bytes to be used inside the crate only

remove a half-baked test

* WIP.

* Return an error if index incompatible - closes #662

Enrich the error type with incompatibility

Change return type to Result<bool, TantivyError>, instead of bool

Add an Incompatibility enum that enriches the IncompatibleIndex error variant
with information, which then allows us to generate a developer-friendly hint how
to upgrade library version or switch feature flags for a different compression
algorithm

Updated changelog

Change the signature of is_compatible

Added documentation to the Incompatibility
Added a conditional test on a Footer with lz4 erroring

2019-12-14 09:14:33 +09:00
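
The commit message above describes enriching the incompatibility error so that it can carry an upgrade hint. A minimal sketch of that idea follows. It is not the actual tantivy code: the variant names, their fields, the `Footer` struct, the constants, and `check_compatible` are all hypothetical, chosen only to show how an enum can carry enough context to produce a developer-friendly message.

```rust
use std::fmt;

/// Hypothetical reasons why an index cannot be read by the current build.
#[derive(Debug, PartialEq)]
enum Incompatibility {
    /// The index was written with a different compression algorithm.
    CompressionMismatch { library: String, index: String },
    /// The index was written with a different index format version.
    IndexMismatch { library_format: u32, index_format: u32 },
}

impl fmt::Display for Incompatibility {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Incompatibility::CompressionMismatch { library, index } => write!(
                f,
                "index uses {} compression but this build uses {}; \
                 switch feature flags to match the index",
                index, library
            ),
            Incompatibility::IndexMismatch { library_format, index_format } => write!(
                f,
                "index format version is {} but this library supports {}; \
                 switch to a matching library version",
                index_format, library_format
            ),
        }
    }
}

/// Hypothetical, simplified footer metadata stored alongside an index file.
struct Footer {
    format_version: u32,
    compression: String,
}

// Assumed values for what "this build" supports.
const LIBRARY_FORMAT: u32 = 1;
const LIBRARY_COMPRESSION: &str = "lz4";

/// Returns `Ok(())` if the footer is readable, or the reason it is not.
fn check_compatible(footer: &Footer) -> Result<(), Incompatibility> {
    if footer.format_version != LIBRARY_FORMAT {
        return Err(Incompatibility::IndexMismatch {
            library_format: LIBRARY_FORMAT,
            index_format: footer.format_version,
        });
    }
    if footer.compression != LIBRARY_COMPRESSION {
        return Err(Incompatibility::CompressionMismatch {
            library: LIBRARY_COMPRESSION.to_string(),
            index: footer.compression.clone(),
        });
    }
    Ok(())
}

fn main() {
    let footer = Footer { format_version: 1, compression: "snappy".to_string() };
    match check_compatible(&footer) {
        Ok(()) => println!("index is compatible"),
        Err(reason) => println!("incompatible index: {}", reason),
    }
}
```

Returning the reason as a value, rather than a bare `bool`, lets the caller decide whether to print the hint, convert it into a broader error type, or match on the variant.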

.github

Create FUNDING.yml

2019-11-05 16:26:12 +09:00

Moving queyr grammar to a different crate. (#645 )

2019-09-05 09:37:28 +09:00

doc

updating doc

2018-09-09 17:23:30 +09:00

examples

Kkoziara remove tokens from doc store (#715 )

2019-11-25 22:39:12 +09:00

query-grammar

Add a doctest to BooleanQuery (#630 )

2019-10-07 10:05:12 +09:00

src

Make error handling richer in Footer::is_compatible (#724 )

2019-12-14 09:14:33 +09:00

tests

Removing futures-cpupool and upgrading to futures-0.3

2019-11-15 18:35:31 +09:00

.gitattributes

Mark "cpp" folder as linguist-vendored in .gitattributes

2017-03-30 13:43:03 +01:00

.gitignore

Added iml filewq

2018-09-16 13:26:54 +09:00

.travis.yml

Added cargo-fmt to CI runs (#627 )

2019-08-12 08:25:47 +09:00

appveyor.yml

Failrs (#600 )

2019-07-22 13:17:21 +09:00

AUTHORS

Added an AUTHORS file. Closes #315 (#316 )

2018-06-11 22:21:58 +09:00

Cargo.toml

Optimize deletes (#723 )

2019-12-13 09:50:00 +09:00

CHANGELOG.md

Make error handling richer in Footer::is_compatible (#724 )

2019-12-14 09:14:33 +09:00

LICENSE

Added an AUTHORS file. Closes #315 (#316 )

2018-06-11 22:21:58 +09:00

Makefile

Moving queyr grammar to a different crate. (#645 )

2019-09-05 09:37:28 +09:00

README.md

Fix grammar / punctuation (#668 )

2019-10-21 10:50:53 +09:00

run-tests.sh

Failrs (#600 )

2019-07-22 13:17:21 +09:00

rustfmt.toml

rustfmt

2018-02-16 17:50:05 +09:00

README.md

Tantivy is a full text search engine library written in Rust.

It is closer to Apache Lucene than to Elasticsearch or Apache Solr in the sense that it is not an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

Tantivy is, in fact, strongly inspired by Lucene's design.

Benchmark

Tantivy is typically faster than Lucene, but the results depend on the nature of the queries in your workload.

The following benchmark breaks down performance for different types of queries and collections.

Features

Full-text search
Configurable tokenizer (stemming available for 17 Latin languages with third party support for Chinese (tantivy-jieba and cang-jie) and Japanese)
Fast (check out the 🐎 ✨ benchmark ✨ 🐎)
Tiny startup time (<10ms), perfect for command line tools
BM25 scoring (the same as Lucene)
Natural query language (e.g. (michael AND jackson) OR "king of pop")
Phrase queries search (e.g. "michael jackson")
Incremental indexing
Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
Mmap directory
SIMD integer compression when the platform/CPU includes the SSE2 instruction set
Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
&[u8] fast fields
Text, i64, u64, f64, dates, and hierarchical facet fields
LZ4 compressed document store
Range queries
Faceted search
Configurable indexing (optional term frequency and position indexing)
Cheesy logo with a horse
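
To make the BM25 scoring item in the list above more concrete, here is a minimal sketch of the textbook BM25 formula for a single term. It is standalone Rust and does not use Tantivy's API. The function names are hypothetical, and the constants `K1 = 1.2` and `B = 0.75` are commonly used defaults that are assumed here, not values taken from Tantivy's source.

```rust
/// Term-frequency saturation parameter (assumed common default).
const K1: f64 = 1.2;
/// Document-length normalization parameter (assumed common default).
const B: f64 = 0.75;

/// Inverse document frequency: rarer terms get a larger weight.
fn idf(doc_count: u64, docs_with_term: u64) -> f64 {
    let n = doc_count as f64;
    let df = docs_with_term as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to one document's score.
fn bm25_term_score(term_freq: u32, doc_len: u32, avg_doc_len: f64, idf: f64) -> f64 {
    let tf = term_freq as f64;
    // Longer-than-average documents are penalized through this factor.
    let norm = K1 * (1.0 - B + B * doc_len as f64 / avg_doc_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    let rare = idf(1_000, 10);
    let common = idf(1_000, 900);
    println!("idf rare = {:.3}, idf common = {:.3}", rare, common);

    // Same term frequency, different document lengths.
    let short_doc = bm25_term_score(2, 50, 100.0, rare);
    let long_doc = bm25_term_score(2, 400, 100.0, rare);
    println!("short doc = {:.3}, long doc = {:.3}", short_doc, long_doc);
}
```

A multi-term query score is the sum of the per-term contributions. The term-frequency part saturates: repeating a term many times approaches, but never exceeds, `idf * (K1 + 1)`.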

Non-features

Distributed search is out of the scope of Tantivy. That said, Tantivy is a library upon which one could build a distributed search engine. Serializable and mergeable collector state, for instance, is within the scope of Tantivy.

Supported OS and compiler

Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.

Getting started

Tantivy's simple search example
tantivy-cli and its tutorial - tantivy-cli is an actual command line interface that makes it easy for you to create a search engine, index documents, and search via the CLI or a small server with a REST API. It walks you through getting a wikipedia search engine up and running in a few minutes.
Reference doc for the last released version

How can I support this project?

There are many ways to support this project.

Use Tantivy and tell us about your experience on Gitter or by email (paul.masurel@gmail.com)
Report bugs
Write a blog post
Help with documentation by asking questions or submitting PRs
Contribute code (you can join our Gitter)
Talk about Tantivy around you
Drop a word on Say Thanks! or even become a patron

Contributing code

We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.

Clone and build locally

Tantivy compiles on stable Rust but requires Rust >= 1.27. To check out and build the project, run:

    git clone https://github.com/tantivy-search/tantivy.git
    cd tantivy
    cargo build

Run tests

Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.

Debug

You might find it useful to step through the programme with a debugger.

A failing test

Make sure you haven't run cargo clean after the most recent cargo test or cargo build to guarantee that the target/ directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under rust-gdb:

    find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY

Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to cargo test like this:

    (gdb) run --test-threads 1 --test $NAME_OF_TEST

An example

Cargo builds everything in the examples/ directory in debug mode when you run cargo test or cargo build --examples. This makes it easy for you to write examples that reproduce bugs:

    rust-gdb target/debug/examples/$EXAMPLE_NAME
    (gdb) run