Paul Masurel 07d87e154b Collector refactoring and multithreaded search (#437)
* Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T.
2018-11-30 22:46:59 +09:00

Join the chat at https://gitter.im/tantivy-search/tantivy. License: MIT.

Tantivy

Tantivy is a full-text search engine library written in Rust.

It is closer to Apache Lucene than to Elasticsearch or Apache Solr, in the sense that it is not an off-the-shelf search engine server but a crate that can be used to build one.

Tantivy is, in fact, strongly inspired by Lucene's design.

Features

  • Full-text search
  • Fast (check out the 🐎 benchmark 🐎)
  • Tiny startup time (<10ms), perfect for command line tools
  • BM25 scoring (the same as Lucene)
  • Natural query language, e.g. (michael AND jackson) OR "king of pop"
  • Phrase queries ("michael jackson")
  • Incremental indexing
  • Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
  • Mmap directory
  • SIMD integer compression when the platform/CPU includes the SSE2 instruction set
  • Single-valued and multivalued u64 and i64 fast fields (equivalent of doc values in Lucene)
  • &[u8] fast fields
  • LZ4 compressed document store
  • Range queries
  • Faceted search
  • Configurable indexing (optional term frequency and position indexing)
  • Cheesy logo with a horse
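
To make the BM25 item above concrete, here is a minimal sketch of the textbook BM25 term score in Rust. It is not tantivy's implementation: the function names, the parameter names, and the constants k1 = 1.2 and b = 0.75 (the commonly used defaults) are assumptions made for illustration.

```rust
/// Inverse document frequency in the smoothed form commonly used with BM25.
/// `doc_count` is the number of documents, `doc_freq` the number of
/// documents containing the term. (Hypothetical helper, not tantivy's API.)
pub fn idf(doc_count: u64, doc_freq: u64) -> f32 {
    let n = doc_count as f32;
    let df = doc_freq as f32;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}

/// BM25 contribution of one term to one document's score.
/// Assumes the usual defaults k1 = 1.2 and b = 0.75.
pub fn bm25_term_score(term_freq: u32, field_len: u32, avg_field_len: f32, idf: f32) -> f32 {
    const K1: f32 = 1.2;
    const B: f32 = 0.75;
    let tf = term_freq as f32;
    // Length normalization: fields longer than average are penalized.
    let norm = K1 * (1.0 - B + B * field_len as f32 / avg_field_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}
```

The score of a document for a multi-term query is the sum of these per-term scores. Term frequency saturates: each additional occurrence of a term adds less than the previous one.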

Non-features

  • Distributed search is out of the scope of tantivy. That being said, tantivy is meant as a library upon which one could build a distributed search engine. Serializable and mergeable collector state, for instance, is within the scope of tantivy.
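
As an illustration of what mergeable collector state can look like, here is a minimal sketch in Rust. The traits and types below (`Collector`, `SegmentCollector`, `Count`, `search`) are hypothetical and written for this example only; they are not tantivy's actual API. Each segment (or, in a distributed setting, each shard) produces a partial result, and the partial results are merged at the end.

```rust
/// Collects hits for a single segment. (Hypothetical trait.)
pub trait SegmentCollector {
    type Fruit;
    fn collect(&mut self, doc: u32, score: f32);
    fn harvest(self) -> Self::Fruit;
}

/// Creates per-segment collectors and merges their results. (Hypothetical trait.)
pub trait Collector {
    type Fruit;
    type Child: SegmentCollector<Fruit = Self::Fruit>;
    fn for_segment(&self, segment_ord: u32) -> Self::Child;
    fn merge_fruits(&self, fruits: Vec<Self::Fruit>) -> Self::Fruit;
}

/// Example collector: counts matching documents.
pub struct Count;
pub struct SegmentCount(usize);

impl SegmentCollector for SegmentCount {
    type Fruit = usize;
    fn collect(&mut self, _doc: u32, _score: f32) {
        self.0 += 1;
    }
    fn harvest(self) -> usize {
        self.0
    }
}

impl Collector for Count {
    type Fruit = usize;
    type Child = SegmentCount;
    fn for_segment(&self, _segment_ord: u32) -> SegmentCount {
        SegmentCount(0)
    }
    fn merge_fruits(&self, fruits: Vec<usize>) -> usize {
        fruits.into_iter().sum()
    }
}

/// Runs a collector over pre-computed (doc, score) hits, one Vec per segment.
pub fn search<C: Collector>(collector: &C, segments: &[Vec<(u32, f32)>]) -> C::Fruit {
    let fruits: Vec<C::Fruit> = segments
        .iter()
        .enumerate()
        .map(|(ord, hits)| {
            let mut child = collector.for_segment(ord as u32);
            for &(doc, score) in hits {
                child.collect(doc, score);
            }
            child.harvest()
        })
        .collect();
    collector.merge_fruits(fruits)
}
```

Because the per-segment results are plain values that can be merged, the same merge step could in principle run on results received from other machines, provided the result type is serializable.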

Supported OS and compiler

Tantivy works on stable Rust (>= 1.27) and supports Linux, macOS and Windows.

Getting started

Compiling

Development

Tantivy compiles on stable Rust (>= 1.27). To check out and build the project, run:

git clone git@github.com:tantivy-search/tantivy.git
cd tantivy
cargo build

Running tests

Some tests will not run with just cargo test because of fail-rs. To run the tests exhaustively, run ./run-tests.sh.

Contribute

Send me an email (paul.masurel at gmail.com) if you want to contribute to tantivy.
