Compare commits

..

2 Commits

Author        SHA1        Message                   Date
Paul Masurel  9fd23f3abf  Fixing bench compilation  2019-10-04 16:36:17 +09:00
Paul Masurel  c030990d00  fmt                       2019-10-02 09:50:20 +09:00
8 changed files with 49 additions and 150 deletions

View File

@@ -34,7 +34,7 @@ itertools = "0.8"
levenshtein_automata = {version="0.1", features=["fst_automaton"]}
notify = {version="4", optional=true}
bit-set = "0.5"
uuid = { version = "0.8", features = ["v4", "serde"] }
uuid = { version = "0.7.2", features = ["v4", "serde"] }
crossbeam = "0.7"
futures = "0.1"
futures-cpupool = "0.1"
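The `uuid` line is the one the two compared revisions disagree on (`0.8` vs `0.7.2`); the enabled features are the same on both sides. A minimal sketch of why the `"v4"` and `"serde"` features matter, assuming `serde_json` as an extra dependency purely for illustration:

```rust
use uuid::Uuid;

fn main() {
    // The "v4" feature enables random (version 4) UUID generation,
    // e.g. for minting fresh segment ids.
    let id = Uuid::new_v4();

    // The "serde" feature makes Uuid implement Serialize/Deserialize,
    // so ids can be round-tripped through JSON metadata.
    let json = serde_json::to_string(&id).expect("serialization should not fail");
    println!("{} -> {}", id, json);
}
```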

View File

@@ -21,9 +21,9 @@
[![Become a patron](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/fulmicoton)
**Tantivy** is a **full text search engine library** written in Rust.
**Tantivy** is a **full text search engine library** written in rust.
It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) and [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
an off-the-shelf search engine server, but rather a crate that can be used
to build such a search engine.
@@ -31,7 +31,7 @@ Tantivy is, in fact, strongly inspired by Lucene's design.
# Benchmark
Tantivy is typically faster than Lucene, but the results depend on
Tantivy is typically faster than Lucene, but the results will depend on
the nature of the queries in your workload.
The following [benchmark](https://tantivy-search.github.io/bench/) break downs
@@ -40,19 +40,19 @@ performance for different type of queries / collection.
# Features
- Full-text search
- Configurable tokenizer (stemming available for 17 Latin languages with third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)) and [Japanese](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter))
- Configurable tokenizer. (stemming available for 17 latin languages. Third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)) and [Japanese](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter)
- Fast (check out the :racehorse: :sparkles: [benchmark](https://tantivy-search.github.io/bench/) :sparkles: :racehorse:)
- Tiny startup time (<10ms), perfect for command line tools
- BM25 scoring (the same as Lucene)
- Natural query language (e.g. `(michael AND jackson) OR "king of pop"`)
- Phrase queries search (e.g. `"michael jackson"`)
- BM25 scoring (the same as lucene)
- Natural query language `(michael AND jackson) OR "king of pop"`
- Phrase queries search (`"michael jackson"`)
- Incremental indexing
- Multithreaded indexing (indexing English Wikipedia takes < 3 minutes on my desktop)
- Mmap directory
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set
- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
- SIMD integer compression when the platform/CPU includes the SSE2 instruction set.
- Single valued and multivalued u64, i64 and f64 fast fields (equivalent of doc values in Lucene)
- `&[u8]` fast fields
- Text, i64, u64, f64, dates, and hierarchical facet fields
- Text, i64, u64, f64, dates and hierarchical facet fields
- LZ4 compressed document store
- Range queries
- Faceted search
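The "natural query language" and BM25 bullets above correspond to tantivy's `QueryParser` and its default scoring. A hedged sketch of what that looks like in code, reusing the in-RAM index and `doc!` macro that appear elsewhere in this diff (the schema and document are invented for illustration):

```rust
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, TEXT};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let schema = schema_builder.build();

    let index = Index::create_in_ram(schema);
    let mut index_writer = index.writer(3_000_000)?;
    index_writer.add_document(doc!(title => "Michael Jackson, king of pop"));
    index_writer.commit()?;

    let reader = index.reader()?;
    let searcher = reader.searcher();

    // The query string uses the natural query language from the feature list.
    let query_parser = QueryParser::for_index(&index, vec![title]);
    let query = query_parser
        .parse_query(r#"(michael AND jackson) OR "king of pop""#)
        .expect("query should parse");

    // Results are ranked with the default (BM25) scoring.
    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
    assert!(!top_docs.is_empty());
    Ok(())
}
```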
@@ -61,42 +61,43 @@ performance for different type of queries / collection.
# Non-features
- Distributed search is out of the scope of Tantivy. That being said, Tantivy is a
- Distributed search is out of the scope of tantivy. That being said, tantivy is meant as a
library upon which one could build a distributed search. Serializable/mergeable collector state for instance,
are within the scope of Tantivy.
are within the scope of tantivy.
# Supported OS and compiler
Tantivy works on stable Rust (>= 1.27) and supports Linux, MacOS, and Windows.
Tantivy works on stable rust (>= 1.27) and supports Linux, MacOS and Windows.
# Getting started
- [Tantivy's simple search example](https://tantivy-search.github.io/examples/basic_search.html)
- [tantivy-cli and its tutorial](https://github.com/tantivy-search/tantivy-cli) - `tantivy-cli` is an actual command line interface that makes it easy for you to create a search engine,
index documents, and search via the CLI or a small server with a REST API.
It walks you through getting a wikipedia search engine up and running in a few minutes.
- [Reference doc for the last released version](https://docs.rs/tantivy/)
- [tantivy's simple search example](https://tantivy-search.github.io/examples/basic_search.html)
- [tantivy-cli and its tutorial](https://github.com/tantivy-search/tantivy-cli).
`tantivy-cli` is an actual command line interface that makes it easy for you to create a search engine,
index documents and search via the CLI or a small server with a REST API.
It will walk you through getting a wikipedia search engine up and running in a few minutes.
- [reference doc for the last released version](https://docs.rs/tantivy/)
# How can I support this project?
There are many ways to support this project.
- Use Tantivy and tell us about your experience on [Gitter](https://gitter.im/tantivy-search/tantivy) or by email (paul.masurel@gmail.com)
- Use tantivy and tell us about your experience on [gitter](https://gitter.im/tantivy-search/tantivy) or by email (paul.masurel@gmail.com)
- Report bugs
- Write a blog post
- Help with documentation by asking questions or submitting PRs
- Contribute code (you can join [our Gitter](https://gitter.im/tantivy-search/tantivy))
- Talk about Tantivy around you
- Contribute code (you can join [our gitter](https://gitter.im/tantivy-search/tantivy) )
- Talk about tantivy around you
- Drop a word on on [![Say Thanks!](https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg)](https://saythanks.io/to/fulmicoton) or even [![Become a patron](https://c5.patreon.com/external/logo/become_a_patron_button.png)](https://www.patreon.com/fulmicoton)
# Contributing code
We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
We use the GitHub Pull Request workflow - reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
## Clone and build locally
Tantivy compiles on stable Rust but requires `Rust >= 1.27`.
To check out and run tests, you can simply run:
Tantivy compiles on stable rust but requires `Rust >= 1.27`.
To check out and run tests, you can simply run :
```bash
git clone https://github.com/tantivy-search/tantivy.git
@@ -107,7 +108,7 @@ To check out and run tests, you can simply run:
## Run tests
Some tests will not run with just `cargo test` because of `fail-rs`.
To run the tests exhaustively, run `./run-tests.sh`.
To run the tests exhaustively, run `./run-tests.sh`
## Debug
@@ -115,13 +116,13 @@ You might find it useful to step through the programme with a debugger.
### A failing test
Make sure you haven't run `cargo clean` after the most recent `cargo test` or `cargo build` to guarantee that the `target/` directory exists. Use this bash script to find the name of the most recent debug build of Tantivy and run it under `rust-gdb`:
Make sure you haven't run `cargo clean` after the most recent `cargo test` or `cargo build` to guarantee that `target/` dir exists. Use this bash script to find the most name of the most recent debug build of tantivy and run it under rust-gdb.
```bash
find target/debug/ -maxdepth 1 -executable -type f -name "tantivy*" -printf '%TY-%Tm-%Td %TT %p\n' | sort -r | cut -d " " -f 3 | xargs -I RECENT_DBG_TANTIVY rust-gdb RECENT_DBG_TANTIVY
```
Now that you are in `rust-gdb`, you can set breakpoints on lines and methods that match your source code and run the debug executable with flags that you normally pass to `cargo test` like this:
Now that you are in rust-gdb, you can set breakpoints on lines and methods that match your source-code and run the debug executable with flags that you normally pass to `cargo test` to like this
```bash
$gdb run --test-threads 1 --test $NAME_OF_TEST
@@ -129,7 +130,7 @@ $gdb run --test-threads 1 --test $NAME_OF_TEST
### An example
By default, `rustc` compiles everything in the `examples/` directory in debug mode. This makes it easy for you to make examples to reproduce bugs:
By default, rustc compiles everything in the `examples/` dir in debug mode. This makes it easy for you to make examples to reproduce bugs.
```bash
rust-gdb target/debug/examples/$EXAMPLE_NAME

View File

@@ -2,7 +2,7 @@ use std::fmt;
use std::fmt::Write;
/// Defines whether a term in a query must be present,
/// should be present or must be not present.
/// should be present or must not be present.
#[derive(Debug, Clone, Hash, Copy, Eq, PartialEq)]
pub enum Occur {
/// For a given document to be considered for scoring,
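For reference, a small sketch of how the three `Occur` variants map onto the `+`/`-` notation used in the BooleanQuery example further down this diff (`title:+diary title:-girl`); the helper function is illustrative only:

```rust
use tantivy::query::Occur;

/// Prefix the query language uses for each kind of clause.
fn occur_prefix(occur: Occur) -> &'static str {
    match occur {
        Occur::Must => "+",    // the document must match this subquery
        Occur::Should => "",   // matching is optional but improves the score
        Occur::MustNot => "-", // the document must not match this subquery
    }
}

fn main() {
    assert_eq!(occur_prefix(Occur::Must), "+");
    assert_eq!(occur_prefix(Occur::MustNot), "-");
}
```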

View File

@@ -76,7 +76,7 @@ impl SegmentId {
}
/// Error type used when parsing a `SegmentId` from a string fails.
pub struct SegmentIdParseError(uuid::Error);
pub struct SegmentIdParseError(uuid::parser::ParseError);
impl Error for SegmentIdParseError {}
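The wrapped error type tracks the `uuid` version from the Cargo.toml hunk: uuid 0.8 exposes a single `uuid::Error`, while 0.7 nested it as `uuid::parser::ParseError`. A hedged sketch of how such a wrapper is typically produced when parsing a segment id; the `FromStr` impl below is illustrative, not copied from the diff:

```rust
use std::str::FromStr;
use uuid::Uuid;

pub struct SegmentId(Uuid);

// With uuid 0.7 the inner type would be `uuid::parser::ParseError` instead.
pub struct SegmentIdParseError(uuid::Error);

impl FromStr for SegmentId {
    type Err = SegmentIdParseError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // `parse_str` surfaces the uuid crate's own error,
        // which the wrapper simply carries along.
        Uuid::parse_str(s).map(SegmentId).map_err(SegmentIdParseError)
    }
}
```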

View File

@@ -212,13 +212,15 @@ pub type Score = f32;
pub type SegmentLocalId = u32;
impl DocAddress {
/// Return the segment ordinal id that identifies the segment
/// hosting the document in the `Searcher` it is called from.
/// Return the segment ordinal.
/// The segment ordinal is an id identifying the segment
/// hosting the document. It is only meaningful, in the context
/// of a searcher.
pub fn segment_ord(self) -> SegmentLocalId {
self.0
}
/// Return the segment-local `DocId`
/// Return the segment local `DocId`
pub fn doc(self) -> DocId {
self.1
}
@@ -227,11 +229,11 @@ impl DocAddress {
/// `DocAddress` contains all the necessary information
/// to identify a document given a `Searcher` object.
///
/// It consists of an id identifying its segment, and
/// a segment-local `DocId`.
/// It consists in an id identifying its segment, and
/// its segment-local `DocId`.
///
/// The id used for the segment is actually an ordinal
/// in the list of `Segment`s held by a `Searcher`.
/// in the list of segment hold by a `Searcher`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DocAddress(pub SegmentLocalId, pub DocId);
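Since both fields are public, a `DocAddress` can be built and inspected directly. A minimal illustration of the accessors documented above (the concrete values are arbitrary):

```rust
use tantivy::DocAddress;

fn main() {
    // Third segment in the searcher's list, 42nd document within that segment.
    let addr = DocAddress(2, 42);
    assert_eq!(addr.segment_ord(), 2);
    assert_eq!(addr.doc(), 42);
}
```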

View File

@@ -9,8 +9,7 @@ use crate::Result;
use crate::Searcher;
use std::collections::BTreeSet;
/// The boolean query returns a set of documents
/// that matches the Boolean combination of constituent subqueries.
/// The boolean query combines a set of queries
///
/// The documents matched by the boolean query are
/// those which
@@ -20,113 +19,6 @@ use std::collections::BTreeSet;
/// `MustNot` occurence.
/// * match at least one of the subqueries that is not
/// a `MustNot` occurence.
///
///
/// You can combine other query types and their `Occur`ances into one `BooleanQuery`
///
/// ```rust
///use tantivy::collector::Count;
///use tantivy::doc;
///use tantivy::query::{BooleanQuery, Occur, PhraseQuery, Query, TermQuery};
///use tantivy::schema::{IndexRecordOption, Schema, TEXT};
///use tantivy::Term;
///use tantivy::{Index, Result};
///
///fn main() -> Result<()> {
/// let mut schema_builder = Schema::builder();
/// let title = schema_builder.add_text_field("title", TEXT);
/// let body = schema_builder.add_text_field("body", TEXT);
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
/// {
/// let mut index_writer = index.writer(3_000_000)?;
/// index_writer.add_document(doc!(
/// title => "The Name of the Wind",
/// ));
/// index_writer.add_document(doc!(
/// title => "The Diary of Muadib",
/// ));
/// index_writer.add_document(doc!(
/// title => "A Dairy Cow",
/// body => "hidden",
/// ));
/// index_writer.add_document(doc!(
/// title => "A Dairy Cow",
/// body => "found",
/// ));
/// index_writer.add_document(doc!(
/// title => "The Diary of a Young Girl",
/// ));
/// index_writer.commit().unwrap();
/// }
///
/// let reader = index.reader()?;
/// let searcher = reader.searcher();
///
/// // Make TermQuery's for "girl" and "diary" in the title
/// let girl_term_query: Box<dyn Query> = Box::new(TermQuery::new(
/// Term::from_field_text(title, "girl"),
/// IndexRecordOption::Basic,
/// ));
/// let diary_term_query: Box<dyn Query> = Box::new(TermQuery::new(
/// Term::from_field_text(title, "diary"),
/// IndexRecordOption::Basic,
/// ));
/// // A TermQuery with "found" in the body
/// let body_term_query: Box<dyn Query> = Box::new(TermQuery::new(
/// Term::from_field_text(body, "found"),
/// IndexRecordOption::Basic,
/// ));
/// // TermQuery "diary" must and "girl" must not be present
/// let queries_with_occurs1 = vec![
/// (Occur::Must, diary_term_query.box_clone()),
/// (Occur::MustNot, girl_term_query),
/// ];
/// // Make a BooleanQuery equivalent to
/// // title:+diary title:-girl
/// let diary_must_and_girl_mustnot = BooleanQuery::from(queries_with_occurs1);
/// let count1 = searcher.search(&diary_must_and_girl_mustnot, &Count)?;
/// assert_eq!(count1, 1);
///
/// // TermQuery for "cow" in the title
/// let cow_term_query: Box<dyn Query> = Box::new(TermQuery::new(
/// Term::from_field_text(title, "cow"),
/// IndexRecordOption::Basic,
/// ));
/// // "title:diary OR title:cow"
/// let title_diary_or_cow = BooleanQuery::from(vec![
/// (Occur::Should, diary_term_query.box_clone()),
/// (Occur::Should, cow_term_query),
/// ]);
/// let count2 = searcher.search(&title_diary_or_cow, &Count)?;
/// assert_eq!(count2, 4);
///
/// // Make a `PhraseQuery` from a vector of `Term`s
/// let phrase_query: Box<dyn Query> = Box::new(PhraseQuery::new(vec![
/// Term::from_field_text(title, "dairy"),
/// Term::from_field_text(title, "cow"),
/// ]));
/// // You can combine subqueries of different types into 1 BooleanQuery:
/// // `TermQuery` and `PhraseQuery`
/// // "title:diary OR "dairy cow"
/// let term_of_phrase_query = BooleanQuery::from(vec![
/// (Occur::Should, diary_term_query.box_clone()),
/// (Occur::Should, phrase_query.box_clone()),
/// ]);
/// let count3 = searcher.search(&term_of_phrase_query, &Count)?;
/// assert_eq!(count3, 4);
///
/// // You can nest one BooleanQuery inside another
/// // body:found AND ("title:diary OR "dairy cow")
/// let nested_query = BooleanQuery::from(vec![
/// (Occur::Must, body_term_query),
/// (Occur::Must, Box::new(term_of_phrase_query))
/// ]);
/// let count4 = searcher.search(&nested_query, &Count)?;
/// assert_eq!(count4, 1);
/// Ok(())
///}
/// ```
#[derive(Debug)]
pub struct BooleanQuery {
subqueries: Vec<(Occur, Box<dyn Query>)>,

View File

@@ -40,7 +40,7 @@ impl PhraseQuery {
PhraseQuery::new_with_offset(terms_with_offset)
}
/// Creates a new `PhraseQuery` given a list of terms and their offsets.
/// Creates a new `PhraseQuery` given a list of terms and there offsets.
///
/// Can be used to provide custom offset for each term.
pub fn new_with_offset(mut terms: Vec<(usize, Term)>) -> PhraseQuery {
@@ -73,7 +73,7 @@ impl PhraseQuery {
.collect::<Vec<Term>>()
}
/// Returns the `PhraseWeight` for the given phrase query given a specific `searcher`.
/// Returns the `PhraseWeight` for the given phrase query given a specific `searcher`.
///
/// This function is the same as `.weight(...)` except it returns
/// a specialized type `PhraseWeight` instead of a Boxed trait.
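To make the `new` and `new_with_offset` constructors above concrete, here is a hedged sketch of building a phrase query over a text field, both from plain terms and from explicit `(offset, term)` pairs; the field and terms are invented for illustration:

```rust
use tantivy::query::PhraseQuery;
use tantivy::schema::{Schema, TEXT};
use tantivy::Term;

fn main() {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let _schema = schema_builder.build();

    // Terms at consecutive token positions: equivalent to "michael jackson".
    let _phrase = PhraseQuery::new(vec![
        Term::from_field_text(title, "michael"),
        Term::from_field_text(title, "jackson"),
    ]);

    // Explicit offsets give each term its own position
    // (here "michael" at position 0 and "jackson" at position 2).
    let _phrase_with_gap = PhraseQuery::new_with_offset(vec![
        (0, Term::from_field_text(title, "michael")),
        (2, Term::from_field_text(title, "jackson")),
    ]);
}
```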

View File

@@ -3,8 +3,12 @@ use std::io;
use std::io::Read;
use std::io::Write;
/// `Field` is represented by an unsigned 32-bit integer type
/// The schema holds the mapping between field names and `Field` objects.
/// `Field` is actually a `u8` identifying a `Field`
/// The schema is in charge of holding mapping between field names
/// to `Field` objects.
///
/// Because the field id is a `u8`, tantivy can only have at most `255` fields.
/// Value 255 is reserved.
#[derive(Copy, Clone, Debug, PartialEq, PartialOrd, Eq, Ord, Hash, Serialize, Deserialize)]
pub struct Field(pub u32);
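The two revisions disagree on how to describe the identifier's width (`u32` vs `u8`), but in both the `Field` value is only a handle that the schema resolves back to a field name. A short hedged sketch of how `Field` values are normally obtained and used (the field names are made up):

```rust
use tantivy::schema::{Schema, TEXT};

fn main() {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let body = schema_builder.add_text_field("body", TEXT);
    let schema = schema_builder.build();

    // A `Field` is just a small id; the schema maps it back to its name.
    assert_eq!(schema.get_field_name(title), "title");
    assert_eq!(schema.get_field_name(body), "body");
}
```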